Это мои данные.
Мне нужно рассчитать время пролистывания и пролистывания. Иногда мы получаем частичную запись, и нам необходимо также включить эти записи.
Мой ожидаемый результат:
Затем используются оконные функции, такие как lead
, lag
. После использования этих функций я не могу продолжить получение этого результата.
данные прилагаемой таблицы:
select 'user-1' as username,'building-1' as building,'IN' as action,'2024-04-10 01:00:00'::timestamp as timestamp
union all
select 'user-1' as username,'building-1' as building,'OUT' as action,'2024-04-10 02:00:00'::timestamp as timestamp
union all
select 'user-1' as username,'building-1' as building,'IN' as action,'2024-04-10 02:30:00'::timestamp as timestamp
union all
select 'user-1' as username,'building-1' as building,'OUT' as action,'2024-04-10 04:00:00'::timestamp as timestamp
union all
select 'user-1' as username,'building-1' as building,'IN' as action,'2024-04-11 10:00:00'::timestamp as timestamp
union all
select 'user-1' as username,'building-1' as building,'OUT' as action,'2024-04-11 11:00:00'::timestamp as timestamp
union all
select 'user-2' as username,'building-2' as building,'IN' as action,'2024-04-10 08:00:00'::timestamp as timestamp
union all
select 'user-2' as username,'building-2' as building,'OUT' as action,'2024-04-10 09:00:00'::timestamp as timestamp
union all
select 'user-2' as username,'building-3' as building,'OUT' as action,'2024-04-11 02:30:00'::timestamp as timestamp
union all
select 'user-2' as username,'building-4' as building,'IN' as action,'2024-04-11 04:00:00'::timestamp as timestamp
union all
select 'user-2' as username,'building-1' as building,'IN' as action,'2024-04-12 10:00:00'::timestamp as timestamp
union all
select 'user-2' as username,'building-1' as building,'OUT' as action,'2024-04-12 11:00:00'::timestamp as timestamp
)
select * from _data
Для этого вы можете использовать опережение/задержку вместе с CASE/WHEN/END.
Схема (PostgreSQL v15)
CREATE TABLE swipes (
"username" VARCHAR(6),
"building" VARCHAR(10),
"action" VARCHAR(3),
"timestamp" TIMESTAMP
);
INSERT INTO swipes
("username", "building", "action", "timestamp")
VALUES ('user-1', 'building-1', 'IN', '2024-04-10 01:00:00.000')
, ('user-1', 'building-1', 'OUT', '2024-04-10 02:00:00.000')
, ('user-1', 'building-1', 'IN', '2024-04-10 02:30:00.000')
, ('user-1', 'building-1', 'OUT', '2024-04-10 04:00:00.000')
, ('user-1', 'building-1', 'IN', '2024-04-11 10:00:00.000')
, ('user-1', 'building-1', 'OUT', '2024-04-11 11:00:00.000')
, ('user-2', 'building-2', 'IN', '2024-04-10 08:00:00.000')
, ('user-2', 'building-2', 'OUT', '2024-04-10 09:00:00.000')
, ('user-2', 'building-3', 'OUT', '2024-04-11 02:30:00.000')
, ('user-2', 'building-4', 'IN', '2024-04-11 04:00:00.000')
, ('user-2', 'building-1', 'IN', '2024-04-12 10:00:00.000')
, ('user-2', 'building-1', 'OUT', '2024-04-12 11:00:00.000')
-- added those rows to showcase the results when the same action happens twice in a row
, ('user-3', 'building-1', 'IN', '2024-04-12 11:00:00.000')
, ('user-3', 'building-1', 'IN', '2024-04-12 12:00:00.000')
, ('user-3', 'building-1', 'OUT', '2024-04-12 13:00:00.000')
;
Запрос №1
-- Since the information may be partial we can't simply look at all "in"s or "out"s,
-- so we look at all the rows & end up with everything twice, hence the DISTINCT.
SELECT DISTINCT
username
, building
, CASE
WHEN action = 'IN' THEN timestamp
-- To find the entry time corresponding to this "out" row, we'll look at the previous row.
-- If it's an "in", then we keep the timestamp.
-- If it's an "out", then there were 2 "outs" in a row, so the "in" value is set to null (see user-3's rows for an example of this).
WHEN action = 'OUT' THEN
CASE WHEN LAG(action) OVER user_actions_per_building_by_timestamp = 'IN'
THEN LAG(timestamp) OVER user_actions_per_building_by_timestamp
END
END AS "in"
, CASE
-- same concept as above, with IN/OUT & LAG/LEAD flipped.
WHEN action = 'IN' THEN
CASE WHEN LEAD(action) OVER user_actions_per_building_by_timestamp = 'OUT'
THEN LEAD(timestamp) OVER user_actions_per_building_by_timestamp
END
WHEN action = 'OUT' THEN timestamp
END AS "out"
FROM swipes
WINDOW user_actions_per_building_by_timestamp AS (PARTITION BY username, building ORDER BY timestamp)
ORDER BY username, building, "in", "out";
Другим способом может быть использование рекурсивного CTE, который может быть легче понять, если условия станут более сложными (и он также будет работать по-другому, хотя вам нужно будет проверить это на некоторых реальных данных).
@Marth..Отлично, большое спасибо... я пробую эту формулу за последние два дня. Ценится.