Comparing my 2021 Spotify Wrapped to my Raw Streaming Data
How I ran basic analytics on my own Spotify streaming data and compared it to my 2021 "wrapped"
While scrolling through my Spotify Wrapped for 2021, some things struck me as being a little off. I wanted to get my hands on the raw data to check things for myself. Unfortunately, the Spotify API (at the time of writing this) does not have the ability to pull a user's entire streaming history. You can, however, request an export of your own data from Spotify. So that's what I did!
A few days after requesting the data from Spotify I received an email with a link to download the exported data. It contained a number of JSON data files, including every stream from the past year. That data looks like this:
[
{
"endTime" : "2020-12-14 19:55",
"artistName" : "Alex Chilton",
"trackName" : "Oogum Boogum",
"msPlayed" : 206840
},
{
"endTime" : "2020-12-14 20:04",
"artistName" : "The Pastels",
"trackName" : "Nothing To Be Done",
"msPlayed" : 232133
}
]
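For anyone following along, here is a minimal sketch of reading records in this shape into one Python list. It assumes the export splits the history across files named `StreamingHistory0.json`, `StreamingHistory1.json`, and so on; both that naming and the `load_streams` helper are assumptions for illustration, so adjust the glob pattern to whatever your export contains.

```python
import json
from pathlib import Path


def load_streams(export_dir):
    """Combine every StreamingHistory*.json file in export_dir into one list."""
    streams = []
    # Assumed file naming: StreamingHistory0.json, StreamingHistory1.json, ...
    for path in sorted(Path(export_dir).glob("StreamingHistory*.json")):
        with open(path, encoding="utf-8") as f:
            # Each file holds a JSON array of stream records.
            streams.extend(json.load(f))
    return streams
```

Calling `load_streams("MyData")` (a hypothetical folder name) would return a list of dictionaries with the `endTime`, `artistName`, `trackName` and `msPlayed` keys shown above.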
So I started playing with the data to see how it compares to what Spotify included in my "wrapped".
Top 10 Artists by Stream Count
Below is a table showing my calculations and a screenshot from my "wrapped".
Artist Name | # of Streams |
---|---|
The Murlocs | 126 |
Shannon & The Clams | 114 |
Devendra Banhart | 82 |
Tennis | 70 |
Daft Punk | 69 |
Alan Jackson | 67 |
The Beatles | 61 |
BRONCHO | 58 |
Khruangbin | 58 |
King Gizzard & The Lizard Wizard | 56 |
If you compare the table data to the screenshot, you can see that Shannon & The Clams and Daft Punk are missing from the official "wrapped". What could be the reason for this? Maybe Top Artists for "wrapped" are calculated by listen time rather than # of streams.
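A ranking like the one in the table can be sketched by counting records per artist. This is only one plausible way to do it, not necessarily how the numbers above were produced: the `top_artists_by_streams` name is made up for illustration, and it treats every record as one stream, including tracks that were skipped after a few seconds.

```python
from collections import Counter


def top_artists_by_streams(streams, n=10):
    """Return the n artists with the most records, as (artist, count) pairs."""
    counts = Counter(s["artistName"] for s in streams)
    return counts.most_common(n)


# Records in the export's shape, trimmed to the one field this function reads.
sample = [
    {"artistName": "The Murlocs"},
    {"artistName": "Tennis"},
    {"artistName": "The Murlocs"},
]
print(top_artists_by_streams(sample, n=2))
# [('The Murlocs', 2), ('Tennis', 1)]
```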
Top 10 Artists by Play Time
Artist Name | Minutes Streamed |
---|---|
Last Podcast On The Left | 753 |
The Murlocs | 398 |
Shannon & The Clams | 295 |
Devendra Banhart | 253 |
Tennis | 236 |
Khruangbin | 222 |
Alan Jackson | 214 |
King Gizzard & The Lizard Wizard | 194 |
The Beatles | 175 |
Daft Punk | 174 |
Even ignoring the podcast in the first row of the table, my raw streaming data still yields different results from "wrapped". If "time spent streaming" an artist were how the wrapped rankings work, then Khruangbin would be in my top 5 artists, but they are not.
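The play-time ranking can be sketched the same way by summing `msPlayed` per artist and converting milliseconds to minutes. The helper name, the `exclude` parameter for dropping podcast shows, and the rounding to whole minutes are all assumptions made for this example.

```python
from collections import defaultdict


def top_artists_by_minutes(streams, n=10, exclude=()):
    """Return the n artists with the most play time, as (artist, minutes) pairs."""
    total_ms = defaultdict(int)
    for s in streams:
        if s["artistName"] in exclude:
            continue  # e.g. podcast shows, which are listed under artistName
        total_ms[s["artistName"]] += s["msPlayed"]
    ranked = sorted(total_ms.items(), key=lambda kv: kv[1], reverse=True)
    # 60,000 ms per minute; rounding to whole minutes is an assumption.
    return [(artist, round(ms / 60_000)) for artist, ms in ranked[:n]]


# The two records from the export sample earlier in the post.
sample = [
    {"artistName": "Alex Chilton", "msPlayed": 206840},
    {"artistName": "The Pastels", "msPlayed": 232133},
]
print(top_artists_by_minutes(sample))
# [('The Pastels', 4), ('Alex Chilton', 3)]
```

Passing `exclude=("Last Podcast On The Left",)` would leave only music artists in the ranking.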
Top 10 Songs by Stream Count
Here's the data breakdown for top songs with the corresponding screenshot from my Spotify Wrapped.
Track Name | Artist Name | Stream Count |
---|---|---|
Brother Father Mother Sister | Tim Maia | 19 |
Francesca | The Murlocs | 19 |
Rolling On | The Murlocs | 19 |
The Boy | Shannon & The Clams | 17 |
Get in My Car | BRONCHO | 16 |
My Lady's On Fire | Ty Segall | 15 |
I Want To See The Bright Lights Tonight | Richard & Linda Thompson | 15 |
Nighttime in the Switching Yard - 2007 Remaster | Warren Zevon | 15 |
Chnam oun Dop-Pram Muy (I'm 16) | Ros Serey Sothea | 15 |
Turtles Have Short Legs | CAN | 14 |
Here we can see that "The Boy" by Shannon & The Clams is missing from the screenshot, but according to the raw data I streamed it 17 times. The top 3 songs also appear to be tied (at 19 streams each), which does match up well with the official "wrapped".
There also appears to be a tie (at 15 streams) for Chnam oun Dop-Pram Muy (I'm 16), My Lady's On Fire and Nighttime in the Switching Yard. Apparently, Spotify chose Nighttime in the Switching Yard to win that tie.
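Ties like these are easy to surface in code. The sketch below counts streams per (track, artist) pair and then groups tracks that share a count. Both helper names are hypothetical, and the alphabetical tie-break is simply a way to get a stable order; how Spotify actually breaks ties is not something the export reveals.

```python
from collections import Counter


def top_tracks(streams, n=10):
    """Return the n most-streamed tracks as ((track, artist), count) pairs."""
    counts = Counter((s["trackName"], s["artistName"]) for s in streams)
    # Sort by count (descending), then by (track, artist) so that tied tracks
    # come out in a stable alphabetical order. This tie-break is an assumption.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def tied_groups(ranking):
    """Map each stream count shared by two or more tracks to those tracks."""
    groups = {}
    for track, count in ranking:
        groups.setdefault(count, []).append(track)
    return {count: tracks for count, tracks in groups.items() if len(tracks) > 1}
```

Running `tied_groups(top_tracks(streams))` on a full history would list every tie in the top 10, which makes it easier to see where a ranking had to pick one song over another.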
Top Song
The raw data shows that I listened to Brother Father Mother Sister by Tim Maia 19 times, while the wrapped screenshot shows 18 times. Most likely I listened to the song once more between Spotify creating my "wrapped" and when the raw data export happened.
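One way to test that theory is to count only the streams that ended before a cutoff date. The sketch below assumes `endTime` always uses the `YYYY-MM-DD HH:MM` format seen in the export sample; the helper name and the cutoff are placeholders, since the date Spotify actually stops counting is not part of the export.

```python
from datetime import datetime


def count_track_streams(streams, track, artist, before=None):
    """Count streams of one track, optionally only those ending before a cutoff."""
    total = 0
    for s in streams:
        if s["trackName"] != track or s["artistName"] != artist:
            continue
        ended = datetime.strptime(s["endTime"], "%Y-%m-%d %H:%M")
        if before is None or ended < before:
            total += 1
    return total


# Made-up timestamps, purely to show the effect of a cutoff.
sample = [
    {"endTime": "2021-03-01 10:00", "artistName": "Tim Maia",
     "trackName": "Brother Father Mother Sister"},
    {"endTime": "2021-12-05 18:20", "artistName": "Tim Maia",
     "trackName": "Brother Father Mother Sister"},
]
cutoff = datetime(2021, 11, 1)  # hypothetical cutoff date
print(count_track_streams(sample, "Brother Father Mother Sister", "Tim Maia"))
# 2
print(count_track_streams(sample, "Brother Father Mother Sister", "Tim Maia", before=cutoff))
# 1
```

If the count before some plausible cutoff comes out at 18, the one-extra-listen explanation holds up.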
"Wrapping Up"
Overall, I think that Spotify Wrapped was fairly accurate when compared to the raw data, but I don't understand why certain artists and songs are just not represented in my "wrapped" rollup. I clearly listened to Shannon & The Clams and Daft Punk a lot last year, so what's the deal? Why does Spotify hate Shannon & The Clams and Daft Punk so much?
Possible explanations:
- Spotify uses more data points than just stream counts & play time (very likely)
- Their tie breaking algorithm takes metrics I don't have into account
- Some artists opt out of being included in wrapped?
- Some artists are excluded for some internal reason at Spotify
- I have bad / out of sync data
- They don't want me to listen to Shannon & The Clams anymore :(
If you're interested, you can view the source code and Spotify data on GitHub.