A fun exercise to export all the data from Strava over to a dataset so that I can create fun views and charts on top.
Pre-requisites:
- Create an API application: https://www.strava.com/settings/api in your strava account (set
localhost
as the website and other details) - Set API keys as Environment variables
STRAVA_CLIENT_ID
&STRAVA_CLIENT_SECRET
You can set the env variables simply by export STRAVA_CLIENT_ID=...
& export STRAVA_CLIENT_SECRET=...
Find the codebase here: https://github.com/Vaibhavs10/strava-analyse
Setup the env and scraper:
- Clone the github repository via:
git clone https://github.com/Vaibhavs10/strava-analyse.git
- Setup the python env with
uv
:uv venv --python 3.12
, followed by,source .venv/bin/activate
- Install all required packages via
uv pip install requests huggingface_hub
- Run
huggingface-cli login
(required to upload the dataset to Hugging Face) - Run the python script and follow the instructions:
python upload-strava-to-hf.py
Note: when you run the script you’ll be prompted to authorise access to your strava App. Once you click Authorise it’ll redirect you too a page that doesn’t exist 👀.
Don’t worry about it and look at the URL it is trying to redirect too, it should look something like http://localhost:8000/?state=&code=e55c038bcf96ea6deff15c68649afc9554e6fbd6&scope=read,activity:read_all,profile:read_all
The thing that matters to us is the string after code
, just copy and paste it to the script and that’s it!
If all goes well, you should be able to go to your dataset on Hugging Face, here’s mine for example: https://huggingface.co/datasets/reach-vb/strava-stats
With the Hugging Face dataset comes a lot of interesting possiblilities to identify trends using SQL or even pandas/ polars too. Since the dataset is small we can just run SQL queries via DataStudio.
Let’s crunch some numbers!
To start with, how many cummulative kms have I run through the years?
Best part about Datastudio is you can just ask the AI to write a SQL query for you 🔥
WITH run_stats AS (
SELECT
EXTRACT(YEAR FROM start_date::DATE) AS year,
SUM(distance) / 1000 AS total_distance_km,
SUM(moving_time) AS total_moving_time_seconds
FROM
trainWHERE
type = 'Run'
AND distance > 0
AND moving_time > 0
GROUP BY
year
)SELECT
year,
total_distance_km,-- Calculate minutes and seconds separately for proper pace format
FLOOR((total_moving_time_seconds/60) / total_distance_km) || ':' ||
LPAD(FLOOR(MOD((total_moving_time_seconds/60) / total_distance_km, 1) * 60)::VARCHAR, 2, '0') AS avg_pace_per_km,
0, MAX(total_distance_km) OVER (), 30) AS distance_chart,
bar(total_distance_km, /60) / total_distance_km, 0, MAX((total_moving_time_seconds/60) / total_distance_km) OVER (), 30) AS pace_chart
bar((total_moving_time_secondsFROM
run_statsORDER BY
year;
Result:
year | total_distance_km | avg_pace_per_km | distance_chart | pace_chart |
---|---|---|---|---|
2020 | 78.129 | 7:18 | ████▌ | ████████████████████████████████ |
2021 | 414.0414 | 6:27 | ████████████████████████▏ | ████████████████████████████▌ |
2022 | 169.5316 | 5:59 | █████████ | ████████████████████████████▌ |
2023 | 73.4308 | 6:48 | ████▎ | █████████████████████████████▉ |
2024 | 363.506 | 6:31 | █████████████████████▎ | ████████████████████████████▊ |
2025 | 512.7066 | 7:16 | ████████████████████████████████ | ██████████████████████████████▊ |
I was at my fastest self in 2022 (though with much less kms). 2023 was the least I’ve ran across the years and 2025 (so far) has been the most I’ve ran (albeit with a much slower pace owing too Zone 2 training).
Next, let’s look more deeply into the year so far and how the runs have been shaping up.
SELECT
EXTRACT(MONTH FROM start_date::DATE) AS month,
COUNT(*) AS num_runs,
ROUND(SUM(distance) / 1000, 1) AS total_distance_km,
ROUND(AVG(average_heartrate), 1) AS avg_heartrate,
-- Calculate pace in min:sec per km from moving_time and distance
CONCAT(
FLOOR((SUM(moving_time) / SUM(distance) * 1000) / 60), -- minutes
':',
LPAD(CAST(FLOOR(MOD(SUM(moving_time) / SUM(distance) * 1000, 60)) AS VARCHAR), 2, '0') -- seconds with leading zero
AS avg_pace_min_per_km,
) ROUND(AVG(distance) / 1000, 2) AS avg_distance_per_run_km,
ROUND(AVG(average_watts), 1) AS avg_power
FROM
trainWHERE
type = 'Run'
AND distance > 2000
AND EXTRACT(YEAR FROM start_date::DATE) = 2025
GROUP BY
EXTRACT(MONTH FROM start_date::DATE)
ORDER BY
month;
Result:
month | num_runs | total_distance_km | avg_heartrate | avg_pace_min_per_km | avg_distance_per_run_km | avg_power |
---|---|---|---|---|---|---|
1 | 8 | 70.7 | 161.6 | 6:24 | 8.84 | 176.2 |
2 | 9 | 52.2 | 163.3 | 6:27 | 5.81 | 184.4 |
3 | 8 | 46.3 | 153.9 | 7:15 | 5.78 | 161.8 |
4 | 14 | 82.7 | 157.5 | 6:46 | 5.91 | 178.7 |
5 | 15 | 104.2 | 152.7 | 7:59 | 6.94 | 157.7 |
6 | 16 | 119.1 | 154.5 | 7:51 | 7.44 | 162.7 |
7 | 4 | 34.0 | 152.0 | 7:19 | 8.50 | 160.6 |
June was the month with the most kms so far. It’s nice to see Zone 2 progression a bit from May -> July (7:59 to 7:19 min/ km). As someone with the aerobic base of a potato it’s quite nice to see some progress.
Alright, let’s delve a bit more deeper into how my Zone 2 running transformation is going
SELECT
:DATE AS date,
start_date:ROUND(SUM(distance) / 1000, 1) AS total_distance_km,
ROUND(AVG(average_heartrate), 1) AS avg_heartrate,
-- Properly formatted pace as MM:SS with leading zero for single-digit seconds
CONCAT(
FLOOR(SUM(moving_time) / SUM(distance) * 1000 / 60)::INTEGER, -- minutes
':',
LPAD(
ROUND(SUM(moving_time) / SUM(distance) * 1000) % 60)::INTEGER::VARCHAR, -- seconds converted to integer first
(2, '0'
)AS avg_pace_min_per_km,
) ROUND(AVG(average_watts), 1) AS avg_power
FROM
trainWHERE
type = 'Run'
AND distance > 7500 -- Filter for runs greater than 7.5 km
AND EXTRACT(YEAR FROM start_date::DATE) IN (2024, 2025) -- Only 2024 and 2025 runs
GROUP BY
:DATE
start_date:ORDER BY
date ASC -- Order from oldest to latest
Results:
date | total_distance_km | avg_heartrate | avg_pace_min_per_km | avg_power |
---|---|---|---|---|
2025-01-18 | 17.1 | 164.3 | 6:23 | 177.5 |
2025-02-15 | 10.0 | 170.1 | 6:30 | 184.3 |
2025-02-20 | 7.5 | 167.3 | 6:29 | 184.3 |
2025-04-12 | 10.0 | 170.0 | 6:51 | 181.0 |
2025-04-17 | 8.1 | 156.8 | 6:41 | 181.3 |
2025-05-11 | 9.2 | 156.0 | 8:06 | 152.0 |
2025-05-13 | 7.5 | 153.5 | 8:50 | 139.8 |
2025-05-15 | 8.0 | 153.3 | 8:26 | 149.9 |
2025-05-17 | 8.0 | 150.1 | 8:07 | 157.0 |
2025-05-20 | 9.1 | 155.1 | 8:26 | 149.2 |
2025-05-24 | 7.8 | 152.0 | 7:58 | 157.0 |
2025-05-27 | 10.1 | 161.7 | 7:13 | 171.4 |
2025-06-03 | 8.5 | 153.0 | 7:54 | 159.2 |
2025-06-06 | 8.0 | 150.8 | 8:17 | 152.2 |
2025-06-08 | 10.5 | 153.3 | 8:15 | 152.3 |
2025-06-14 | 12.1 | 150.4 | 7:54 | 164.3 |
2025-06-21 | 9.0 | 161.0 | 7:27 | 167.3 |
2025-06-22 | 11.0 | 154.0 | 7:08 | 174.1 |
2025-06-24 | 8.0 | 158.7 | 7:09 | 181.3 |
2025-06-28 | 10.0 | 152.9 | 8:07 | 153.9 |
2025-07-01 | 10.1 | 154.0 | 7:11 | 0.0 |
2025-07-05 | 13.2 | 152.5 | 7:37 | 160.6 |
Pretty dope to see the heart rate go down and the pace slightly improve, hoping to double down on this more as the year continues!
These were just simple examples but you can really do a lot more with this, checkout segemnts, bike rides power comparisons based on elevation and more.
P.S. You can save all of these queries in your Hugging Face Data Studio as well, so you can just refresh your dataset monthly/ weekly and analyse trends.
Now go ahead and play with it and let me know how it goes. 🤙