Presto Window Function - (sum over partition by)

Publish date: 2024-01-30

I have the following two Presto tables: one stores budget information by client and day, and the other stores spend information by client and day.

select day, client_id, budget_id, budget_period, budget_amount from budget_table

day         client_id  budget_id  budget_period  budget_amount
2021-02-27  1          1-1        daily          10
2021-02-28  1          1-1        daily          10
2021-03-01  1          1-1        daily          10
2021-03-02  1          1-1        daily          10
2021-03-03  1          1-1        daily          10
2021-03-04  1          1-2        monthly        500
2021-03-05  1          1-2        monthly        500
2021-03-06  1          1-2        monthly        500
2021-02-27  2          2-1        monthly        400
2021-02-28  2          2-1        monthly        400
2021-03-01  2          2-1        monthly        400
2021-03-02  2          2-1        monthly        400
2021-03-03  2          2-2        one_time       1000
2021-03-04  2          2-2        one_time       1000
2021-03-05  2          2-2        one_time       1000
2021-03-06  2          2-2        one_time       1000

select day, client_id, spend from spend_table

day         client_id  spend
2021-02-27  1          8
2021-02-28  1          9
2021-03-01  1          10
2021-03-02  1          7
2021-03-03  1          6
2021-03-04  1          16
2021-03-05  1          19
2021-03-06  1          18
2021-02-27  2          13
2021-02-28  2          15
2021-03-01  2          14
2021-03-02  2          15
2021-03-03  2          20
2021-03-04  2          25
2021-03-05  2          18
2021-03-06  2          27

Below is the desired output:

day         client_id  budget_id  budget_period  budget_amount  spend  spend_over_period
2021-02-27  1          1-1        daily          10             8      8
2021-02-28  1          1-1        daily          10             9      9
2021-03-01  1          1-1        daily          10             10     10
2021-03-02  1          1-1        daily          10             7      7
2021-03-03  1          1-1        daily          10             6      6
2021-03-04  1          1-2        monthly        500            16     16
2021-03-05  1          1-2        monthly        500            19     35
2021-03-06  1          1-2        monthly        500            18     53
2021-02-27  2          2-1        monthly        400            13     13
2021-02-28  2          2-1        monthly        400            15     28
2021-03-01  2          2-1        monthly        400            14     14
2021-03-02  2          2-1        monthly        400            15     29
2021-03-03  2          2-2        one_time       1000           20     20
2021-03-04  2          2-2        one_time       1000           25     45
2021-03-05  2          2-2        one_time       1000           18     63
2021-03-06  2          2-2        one_time       1000           27     90

I have tried

select s.day,
       s.client_id,
       b.budget_id,
       b.budget_period,
       b.budget_amount,
       s.spend,
       case
           when b.budget_period = 'daily' then s.spend
           when b.budget_period = 'monthly' then sum(s.spend) over (partition by b.budget_id, month(date(s.day)))
           when b.budget_period = 'one_time' then sum(s.spend) over (partition by b.budget_id)
       end as spend_over_period
from spend_table as s
join budget_table as b
  on s.day = b.day and s.client_id = b.client_id
group by 1, 2, 3, 4, 5, 6

However, I get an 'EXPRESSION_NOT_AGGREGATE' error. Does anybody know how to write a query that produces the desired output in Presto?


1 Answer

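You can remove the group by clause completely and give each window function an ordering and a frame, using order by date(s.day) and range between unbounded preceding and current row. With these, each sum becomes a running total within its partition instead of the total for the whole partition: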

select s.day,
       s.client_id,
       b.budget_id,
       b.budget_period,
       b.budget_amount,
       s.spend,
       case
           when b.budget_period = 'daily' then s.spend
           when b.budget_period = 'monthly' then sum(s.spend) over (
               partition by b.budget_id, month(date(s.day))
               order by date(s.day)
               range between unbounded preceding and current row)
           when b.budget_period = 'one_time' then sum(s.spend) over (
               partition by b.budget_id
               order by date(s.day)
               range between unbounded preceding and current row)
       end as spend_over_period
from spend_table as s
join budget_table as b
  on s.day = b.day and s.client_id = b.client_id
order by 2, 3, 1

Output:

day         client_id  budget_id  budget_period  budget_amount  spend  spend_over_period
2021-02-27  1          1-1        daily          10             8      8
2021-02-28  1          1-1        daily          10             9      9
2021-03-01  1          1-1        daily          10             10     10
2021-03-02  1          1-1        daily          10             7      7
2021-03-03  1          1-1        daily          10             6      6
2021-03-04  1          1-2        monthly        500            16     16
2021-03-05  1          1-2        monthly        500            19     35
2021-03-06  1          1-2        monthly        500            18     53
2021-02-27  2          2-1        monthly        400            13     13
2021-02-28  2          2-1        monthly        400            15     28
2021-03-01  2          2-1        monthly        400            14     14
2021-03-02  2          2-1        monthly        400            15     29
2021-03-03  2          2-2        one_time       1000           20     20
2021-03-04  2          2-2        one_time       1000           25     45
2021-03-05  2          2-2        one_time       1000           18     63
2021-03-06  2          2-2        one_time       1000           27     90
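
If you want to see what the order by and frame change, here is a minimal sketch that runs both forms of the window side by side. It is an illustration under assumptions, not Presto itself: it uses Python's built-in sqlite3 module as a stand-in engine, a hypothetical table named t holding only the rows for budgets 1-2 and 2-1 from the sample data, and SQLite's strftime('%m', day) in place of Presto's month(date(day)).

```python
import sqlite3

# Spend rows for the two monthly budgets in the sample data (1-2 and 2-1).
rows = [
    ("2021-03-04", "1-2", 16),
    ("2021-03-05", "1-2", 19),
    ("2021-03-06", "1-2", 18),
    ("2021-02-27", "2-1", 13),
    ("2021-02-28", "2-1", 15),
    ("2021-03-01", "2-1", 14),
    ("2021-03-02", "2-1", 15),
]

# Hypothetical scratch table "t"; SQLite stands in for Presto here.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (day text, budget_id text, spend integer)")
conn.executemany("insert into t values (?, ?, ?)", rows)

query = """
select budget_id,
       day,
       spend,
       -- no order by: the sum covers the whole partition
       sum(spend) over (
           partition by budget_id, strftime('%m', day)
       ) as partition_total,
       -- order by plus frame: the sum runs up to the current row
       sum(spend) over (
           partition by budget_id, strftime('%m', day)
           order by day
           range between unbounded preceding and current row
       ) as running_total
from t
order by budget_id, day
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The running_total column matches spend_over_period for the monthly budgets above (16, 35, 53 and 13, 28, 14, 29), while partition_total repeats the same month total on every row of its partition.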
