Presto Window Function - (sum over partition by)
Publish date: 2024-01-30
I have the following two Presto tables: one stores budget information by client and day, the other stores spend information by client and day.

```sql
select day, client_id, budget_id, budget_period, budget_amount from budget_table
```
day | client_id | budget_id | budget_period | budget_amount |
---|---|---|---|---|
2021-02-27 | 1 | 1-1 | daily | 10 |
2021-02-28 | 1 | 1-1 | daily | 10 |
2021-03-01 | 1 | 1-1 | daily | 10 |
2021-03-02 | 1 | 1-1 | daily | 10 |
2021-03-03 | 1 | 1-1 | daily | 10 |
2021-03-04 | 1 | 1-2 | monthly | 500 |
2021-03-05 | 1 | 1-2 | monthly | 500 |
2021-03-06 | 1 | 1-2 | monthly | 500 |
2021-02-27 | 2 | 2-1 | monthly | 400 |
2021-02-28 | 2 | 2-1 | monthly | 400 |
2021-03-01 | 2 | 2-1 | monthly | 400 |
2021-03-02 | 2 | 2-1 | monthly | 400 |
2021-03-03 | 2 | 2-2 | one_time | 1000 |
2021-03-04 | 2 | 2-2 | one_time | 1000 |
2021-03-05 | 2 | 2-2 | one_time | 1000 |
2021-03-06 | 2 | 2-2 | one_time | 1000 |
```sql
select day, client_id, spend from spend_table
```
day | client_id | spend |
---|---|---|
2021-02-27 | 1 | 8 |
2021-02-28 | 1 | 9 |
2021-03-01 | 1 | 10 |
2021-03-02 | 1 | 7 |
2021-03-03 | 1 | 6 |
2021-03-04 | 1 | 16 |
2021-03-05 | 1 | 19 |
2021-03-06 | 1 | 18 |
2021-02-27 | 2 | 13 |
2021-02-28 | 2 | 15 |
2021-03-01 | 2 | 14 |
2021-03-02 | 2 | 15 |
2021-03-03 | 2 | 20 |
2021-03-04 | 2 | 25 |
2021-03-05 | 2 | 18 |
2021-03-06 | 2 | 27 |
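Below is the desired output. `spend_over_period` should be the day's own spend for `daily` budgets, a running total that restarts each calendar month for `monthly` budgets, and a running total over the whole budget for `one_time` budgets: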
day | client_id | budget_id | budget_period | budget_amount | spend | spend_over_period |
---|---|---|---|---|---|---|
2021-02-27 | 1 | 1-1 | daily | 10 | 8 | 8 |
2021-02-28 | 1 | 1-1 | daily | 10 | 9 | 9 |
2021-03-01 | 1 | 1-1 | daily | 10 | 10 | 10 |
2021-03-02 | 1 | 1-1 | daily | 10 | 7 | 7 |
2021-03-03 | 1 | 1-1 | daily | 10 | 6 | 6 |
2021-03-04 | 1 | 1-2 | monthly | 500 | 16 | 16 |
2021-03-05 | 1 | 1-2 | monthly | 500 | 19 | 35 |
2021-03-06 | 1 | 1-2 | monthly | 500 | 18 | 53 |
2021-02-27 | 2 | 2-1 | monthly | 400 | 13 | 13 |
2021-02-28 | 2 | 2-1 | monthly | 400 | 15 | 28 |
2021-03-01 | 2 | 2-1 | monthly | 400 | 14 | 14 |
2021-03-02 | 2 | 2-1 | monthly | 400 | 15 | 29 |
2021-03-03 | 2 | 2-2 | one_time | 1000 | 20 | 20 |
2021-03-04 | 2 | 2-2 | one_time | 1000 | 25 | 45 |
2021-03-05 | 2 | 2-2 | one_time | 1000 | 18 | 63 |
2021-03-06 | 2 | 2-2 | one_time | 1000 | 27 | 90 |
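To pin down exactly what the table above asks for, here is a small Python sketch (not Presto) that recomputes `spend_over_period` from the sample rows. It assumes an inner join on `day` and `client_id`, as in the query attempted below, and it keys monthly totals by year and month (an assumption; the sample data covers a single year). The function name `spend_over_period` and the in-memory layout are illustrative only.

```python
from collections import defaultdict

# Sample rows copied from the budget_table listing above:
# (day, client_id, budget_id, budget_period, budget_amount)
BUDGET = [
    ("2021-02-27", 1, "1-1", "daily", 10),
    ("2021-02-28", 1, "1-1", "daily", 10),
    ("2021-03-01", 1, "1-1", "daily", 10),
    ("2021-03-02", 1, "1-1", "daily", 10),
    ("2021-03-03", 1, "1-1", "daily", 10),
    ("2021-03-04", 1, "1-2", "monthly", 500),
    ("2021-03-05", 1, "1-2", "monthly", 500),
    ("2021-03-06", 1, "1-2", "monthly", 500),
    ("2021-02-27", 2, "2-1", "monthly", 400),
    ("2021-02-28", 2, "2-1", "monthly", 400),
    ("2021-03-01", 2, "2-1", "monthly", 400),
    ("2021-03-02", 2, "2-1", "monthly", 400),
    ("2021-03-03", 2, "2-2", "one_time", 1000),
    ("2021-03-04", 2, "2-2", "one_time", 1000),
    ("2021-03-05", 2, "2-2", "one_time", 1000),
    ("2021-03-06", 2, "2-2", "one_time", 1000),
]

# Sample rows copied from the spend_table listing above: (day, client_id) -> spend
SPEND = {
    ("2021-02-27", 1): 8, ("2021-02-28", 1): 9, ("2021-03-01", 1): 10,
    ("2021-03-02", 1): 7, ("2021-03-03", 1): 6, ("2021-03-04", 1): 16,
    ("2021-03-05", 1): 19, ("2021-03-06", 1): 18,
    ("2021-02-27", 2): 13, ("2021-02-28", 2): 15, ("2021-03-01", 2): 14,
    ("2021-03-02", 2): 15, ("2021-03-03", 2): 20, ("2021-03-04", 2): 25,
    ("2021-03-05", 2): 18, ("2021-03-06", 2): 27,
}


def spend_over_period(budget_rows, spend):
    """Inner-join budget rows with spend on (day, client_id) and add spend_over_period."""
    running = defaultdict(int)  # running total per (budget_id, reset bucket)
    out = []
    # ISO date strings sort chronologically, so this walks each budget day by day.
    for day, client, budget_id, period, amount in sorted(
        budget_rows, key=lambda r: (r[1], r[2], r[0])
    ):
        if (day, client) not in spend:
            continue  # inner join: skip budget days without a spend row
        day_spend = spend[(day, client)]
        if period == "daily":
            total = day_spend
        else:
            # monthly resets per year-month ("2021-03"); one_time (or anything else) never resets.
            bucket = day[:7] if period == "monthly" else None
            running[(budget_id, bucket)] += day_spend
            total = running[(budget_id, bucket)]
        out.append((day, client, budget_id, period, amount, day_spend, total))
    return out


if __name__ == "__main__":
    for row in spend_over_period(BUDGET, SPEND):
        print(row)
```

Running it prints the rows in the same order as the desired output, with the last column matching `spend_over_period`.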
I have tried
```sql
select s.day, s.client_id, b.budget_id, b.budget_period, b.budget_amount, s.spend,
       case
         when b.budget_period = 'daily' then s.spend
         when b.budget_period = 'monthly' then sum(s.spend) over (partition by b.budget_id, month(date(s.day)))
         when b.budget_period = 'one_time' then sum(s.spend) over (partition by b.budget_id)
       end as spend_over_period
from spend_table as s
join budget_table as b
  on s.day = b.day and s.client_id = b.client_id
group by 1, 2, 3, 4, 5, 6
```
But I get an `EXPRESSION_NOT_AGGREGATE` error. Does anybody know how to write the query to get the desired output in Presto?
1 Answer
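You can remove the `group by` clause completely and instead give your window functions an ordering and a frame, with `order by date(s.day)` and `range between unbounded preceding and current row`. Without an `order by`, each `sum` covers the whole partition; with it, the frame ends at the current row, so each row gets a running total: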
```sql
select s.day, s.client_id, b.budget_id, b.budget_period, b.budget_amount, s.spend,
       case
         when b.budget_period = 'daily' then s.spend
         -- running total that restarts every month; month() returns 1-12 only,
         -- so add year(date(s.day)) to the partition if the data spans several years
         when b.budget_period = 'monthly' then sum(s.spend) over (
           partition by b.budget_id, month(date(s.day))
           order by date(s.day)
           range between unbounded preceding and current row)
         -- running total over the whole budget
         when b.budget_period = 'one_time' then sum(s.spend) over (
           partition by b.budget_id
           order by date(s.day)
           range between unbounded preceding and current row)
       end as spend_over_period
from spend_table as s
join budget_table as b
  on s.day = b.day and s.client_id = b.client_id
order by 2, 3, 1
```
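The choice of a `range` frame over a `rows` frame only matters when several rows in one partition share the same `order by` value: with `range`, those peer rows all receive the same total. The Python sketch below imitates `sum(...) over (partition by ... order by ... <frame> between unbounded preceding and current row)` on a list of dicts. The helper `window_running_sum` and its parameters are hypothetical, written only to illustrate the frame semantics, not taken from Presto.

```python
from collections import defaultdict
from itertools import groupby


def window_running_sum(rows, partition_key, order_key, value, frame="range"):
    """Imitate SUM(value) OVER (PARTITION BY ... ORDER BY ...
    <frame> BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).

    frame="range": rows with an equal order key (peers) share one total.
    frame="rows":  each row adds only itself and the rows sorted before it
                   (ties keep input order here; SQL leaves that order unspecified).
    Returns one total per input row, in input order.
    """
    totals = [None] * len(rows)
    partitions = defaultdict(list)
    for i, row in enumerate(rows):
        partitions[partition_key(row)].append(i)
    for idxs in partitions.values():
        idxs.sort(key=lambda i: order_key(rows[i]))  # stable sort
        running = 0
        if frame == "rows":
            for i in idxs:
                running += value(rows[i])
                totals[i] = running
        else:
            # Consume one group of peers at a time so they all see the same total.
            for _, peers in groupby(idxs, key=lambda i: order_key(rows[i])):
                peers = list(peers)
                running += sum(value(rows[i]) for i in peers)
                for i in peers:
                    totals[i] = running
    return totals


if __name__ == "__main__":
    # Budget 2-1 from the question, partitioned by budget and year-month.
    budget_2_1 = [
        {"budget_id": "2-1", "day": "2021-02-27", "spend": 13},
        {"budget_id": "2-1", "day": "2021-02-28", "spend": 15},
        {"budget_id": "2-1", "day": "2021-03-01", "spend": 14},
        {"budget_id": "2-1", "day": "2021-03-02", "spend": 15},
    ]
    print(window_running_sum(
        budget_2_1,
        partition_key=lambda r: (r["budget_id"], r["day"][:7]),
        order_key=lambda r: r["day"],
        value=lambda r: r["spend"],
    ))  # [13, 28, 14, 29]
```

In the question's data each budget has exactly one row per day, so both frames give the same result here; `range` only changes the outcome when a partition contains duplicate days.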
Output:
day | client_id | budget_id | budget_period | budget_amount | spend | spend_over_period |
---|---|---|---|---|---|---|
2021-02-27 | 1 | 1-1 | daily | 10 | 8 | 8 |
2021-02-28 | 1 | 1-1 | daily | 10 | 9 | 9 |
2021-03-01 | 1 | 1-1 | daily | 10 | 10 | 10 |
2021-03-02 | 1 | 1-1 | daily | 10 | 7 | 7 |
2021-03-03 | 1 | 1-1 | daily | 10 | 6 | 6 |
2021-03-04 | 1 | 1-2 | monthly | 500 | 16 | 16 |
2021-03-05 | 1 | 1-2 | monthly | 500 | 19 | 35 |
2021-03-06 | 1 | 1-2 | monthly | 500 | 18 | 53 |
2021-02-27 | 2 | 2-1 | monthly | 400 | 13 | 13 |
2021-02-28 | 2 | 2-1 | monthly | 400 | 15 | 28 |
2021-03-01 | 2 | 2-1 | monthly | 400 | 14 | 14 |
2021-03-02 | 2 | 2-1 | monthly | 400 | 15 | 29 |
2021-03-03 | 2 | 2-2 | one_time | 1000 | 20 | 20 |
2021-03-04 | 2 | 2-2 | one_time | 1000 | 25 | 45 |
2021-03-05 | 2 | 2-2 | one_time | 1000 | 18 | 63 |
2021-03-06 | 2 | 2-2 | one_time | 1000 | 27 | 90 |