# Shantanu's Blog

Database Consultant

## pandas case study 34

moving average for the variable length window

How do I calculate the moving average where the set of previous rows is not fixed? For e.g. this SQL query will calculate the average for the totalprice column for the last 1 month orders.

SELECT avg(totalprice) OVER (
PARTITION BY custkey
ORDER BY orderdate
RANGE BETWEEN interval '1' month PRECEDING AND CURRENT ROW)
FROM orders

I have this dataframe:

from io  import StringIO
import pandas as pd

myst="""cust_1,2020-10-10,100
cust_2,2020-10-10,15
cust_1,2020-10-15,200
cust_1,2020-10-16,240
cust_2,2020-12-20,25
cust_1,2020-12-25,140
cust_2,2021-01-01,5
"""

u_cols=['custkey', 'orderdate', 'totalprice']

myf = StringIO(myst)
import pandas as pd
orders = pd.read_csv(StringIO(myst), sep=',', names = u_cols)
orders['orderdate'] = pd.to_datetime(orders['orderdate'])
df=df.sort_values(list(df.columns))

Answer:

orders['my_average'] = (orders.groupby('custkey')
.apply(lambda d: d.rolling('30D', on='orderdate')['totalprice'].mean())
.reset_index(level=0, drop=True)
.astype(int)
)

https://stackoverflow.com/questions/68268531/window-function-for-moving-average

Labels:

## Quick test tables in Athena

If you quickly ant to test data using athena query, use with syntax as shown below:

WITH countries(country_code) AS (VALUES 'pol', 'CAN', 'USA')
SELECT upper(country_code) AS country_code
FROM countries

In this example aggregate expression uses "OVER" function.

WITH students_results(student_id, result) AS (VALUES
('student_1', 17),
('student_2', 16),
('student_3', 18),
('student_4', 18),
('student_5', 10),
('student_6', 20),
('student_7', 16))
SELECT
student_id,
result,
count(*) OVER (
ORDER BY result
) AS close_better_scores_count
FROM students_results

Labels: ,