Music, Metadata and Magic!

Oswin Rahadiyan Hartono
May 10, 2019

Monday mornings can be transformed from mundane to magical with the simple act of tuning in to Spotify’s Discover Weekly playlist. While it may not consistently leave you in awe, it never fails to surprise with its unexpectedly good selections. Through this curated collection, I’ve discovered numerous talented musicians I had never encountered before, some of whom have quickly become favorites.

https://www.spotify.com/id/redirect/discover-weekly/

Of course, there’s AI at work behind that feature (specifically, collaborative filtering), but I’ll save that discussion for a future post. What particularly caught my attention was Spotify’s release of Your 2018 Wrapped at the end of 2018. It let users see their top songs of the year, along with playlists featuring genres and artists they don’t typically explore. It also got me excited about the wealth of data available for analysis through the Spotify API.

https://newsroom.spotify.com/2018-12-06/relive-your-year-in-music-with-spotify-wrapped-2018/

To begin, log in to Spotify for Developers with your Spotify account. The developer dashboard is where you create an app and get the credentials your code needs to call the Spotify API.

https://developer.spotify.com/dashboard/applications

Next, click on ‘My New App’ to initiate the process of creating a new app. Provide essential details such as the app’s name, description, and intended usage. When prompted to select between commercial or non-commercial purposes, opt for the non-commercial option if your usage will be for research or personal use.

Voilà! With those steps completed, your app is now ready for action.

Be sure to make note of your Client ID and Client Secret once your app has been created. You’ll need these credentials when working with the Python Notebook.
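Hard-coding the Client Secret in a notebook makes it easy to leak when the notebook is shared. One option, sketched below with a hypothetical helper name, is to keep both values in environment variables. SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET are the variable names Spotipy reads when SpotifyClientCredentials() is created without arguments, so the same setup also works with the library directly.

```python
import os
from typing import Tuple


def load_spotify_credentials() -> Tuple[str, str]:
    """Return (client_id, client_secret) from environment variables.

    Hypothetical helper; the variable names match the ones Spotipy itself
    looks for when SpotifyClientCredentials() gets no arguments.
    """
    try:
        return os.environ["SPOTIPY_CLIENT_ID"], os.environ["SPOTIPY_CLIENT_SECRET"]
    except KeyError as missing:
        # Name the variable that is missing instead of a bare KeyError
        raise RuntimeError(f"Set the {missing.args[0]} environment variable") from None
```

In the notebook, cid, secret = load_spotify_credentials() would then replace the literal 'xxxx' and 'zzzz' values.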

Data Wrangling with Spotify API + Spotipy Library

Spotipy is a lightweight Python library for the Spotify Web API. With Spotipy, developers get full access to all of the music data the Spotify platform provides. For the Python notebook, we can use Databricks Community Edition, a free hosted environment for data analysis and manipulation.

Install Spotipy:

pip install spotipy

First, authentication:

import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import spotipy.util as util

# Client ID; copy this from your app
cid = 'xxxx'
# Client Secret; copy this from your app
secret = 'zzzz'
# Your Spotify username
username = 'oswin'
# For available scopes see https://developer.spotify.com/web-api/using-scopes/
scope = 'user-library-read playlist-modify-public playlist-read-private'
# Your Redirect URI; it must match one registered in your app's settings
redirect_uri = 'http://mysite.com/callback/'

# App-only client: enough for public catalog data (artists, albums, tracks)
client_credentials_manager = SpotifyClientCredentials(client_id=cid, client_secret=secret)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)

# User client: needed for the scopes above (e.g. private playlists).
# Opens a browser to log in, then asks for the URL you were redirected to.
token = util.prompt_for_user_token(username, scope, cid, secret, redirect_uri)
if token:
    sp = spotipy.Spotify(auth=token)
else:
    print("Can't get token for", username)
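
SpotifyClientCredentials performs the app-only token exchange for you. To show roughly what happens under the hood, here is a minimal sketch of the Client Credentials request as described in Spotify's authorization guide: a POST to the accounts token endpoint with grant_type=client_credentials and the app credentials in an HTTP Basic header. The function name is hypothetical (not part of Spotipy) and it only builds the request without sending it. A token from this flow covers public catalog data but not user data such as private playlists, which is why the code above also asks for a user token.

```python
import base64
import urllib.parse
import urllib.request

TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_client_credentials_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a Client Credentials token request."""
    # HTTP Basic auth: base64 of "client_id:client_secret"
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    # Form-encoded body asking for an app-only token
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode()
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


req = build_client_credentials_request("xxxx", "zzzz")
```

Sending it with urllib.request.urlopen(req) would return JSON containing a short-lived access_token, which is what Spotipy attaches to each API call on your behalf.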

Here’s a quick example of using Spotipy to list all the albums released by the artist ‘Birdy’:

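import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

birdy_uri = 'spotify:artist:2WX2uTcsvV5OnS0inACecP'
# The Web API rejects unauthenticated requests, so reuse the app credentials
auth_manager = SpotifyClientCredentials(client_id=cid, client_secret=secret)
spotify = spotipy.Spotify(client_credentials_manager=auth_manager)

results = spotify.artist_albums(birdy_uri, album_type='album')
albums = results['items']
# Results are paginated; follow 'next' until every page is collected
while results['next']:
    results = spotify.next(results)
    albums.extend(results['items'])

for album in albums:
    print(album['name'])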
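
The while results['next'] loop works for any paging object the Web API returns (artist albums, playlist tracks, saved tracks). A small reusable helper, sketched here with a hypothetical name, makes the pattern explicit. It is meant to be passed spotify.next, which in Spotipy returns None when there is no further page.

```python
from typing import Callable, Optional


def collect_all_items(first_page: dict, fetch_next: Callable[[dict], Optional[dict]]) -> list:
    """Gather 'items' from a paging object and every page after it.

    first_page: a paging object, e.g. the result of spotify.artist_albums(...)
    fetch_next: returns the page after the one it is given, e.g. spotify.next
    """
    items = list(first_page["items"])  # copy so the caller's page is untouched
    page = first_page
    while page.get("next"):
        page = fetch_next(page)
        if page is None:  # no further page
            break
        items.extend(page["items"])
    return items
```

For example: albums = collect_all_items(spotify.artist_albums(birdy_uri, album_type='album'), spotify.next).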

Here’s another example demonstrating how to retrieve 30-second samples and cover art for the top 10 tracks by Led Zeppelin:

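import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

lz_uri = 'spotify:artist:36QJpDe2go2KgaRleHCDTp'

auth_manager = SpotifyClientCredentials(client_id=cid, client_secret=secret)
spotify = spotipy.Spotify(client_credentials_manager=auth_manager)
results = spotify.artist_top_tracks(lz_uri)

for track in results['tracks'][:10]:
    # f-strings also cope with tracks that have no preview (preview_url is None)
    print(f"track    : {track['name']}")
    print(f"audio    : {track['preview_url']}")
    print(f"cover art: {track['album']['images'][0]['url']}")
    print()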

Here’s an example that will get the URL for an artist image given the artist’s name:

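import sys

import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# As a standalone script, credentials come from the SPOTIPY_CLIENT_ID and
# SPOTIPY_CLIENT_SECRET environment variables
spotify = spotipy.Spotify(client_credentials_manager=SpotifyClientCredentials())

# Usage: python artist_image.py <artist name>  (in a notebook, set name directly)
if len(sys.argv) > 1:
    name = ' '.join(sys.argv[1:])
else:
    name = 'Radiohead'

results = spotify.search(q='artist:' + name, type='artist')
items = results['artists']['items']
if items:
    artist = items[0]
    # Some artists have no picture, so guard against an empty images list
    image_url = artist['images'][0]['url'] if artist['images'] else None
    print(artist['name'], image_url)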

Create a DataFrame of a playlist's audio features, indexed by track name:

import pandas as pd

sourcePlaylistID = '1fCOovfyVaAbUiPlqtRF09'
sourcePlaylist = sp.user_playlist(username, sourcePlaylistID)
tracks = sourcePlaylist["tracks"]
songs = tracks["items"]
# A single response holds at most 100 tracks, so follow the pagination
while tracks['next']:
    tracks = sp.next(tracks)
    songs.extend(tracks['items'])

track_ids = []
track_names = []
for song in songs:
    track = song['track']
    # Skip local files in the playlist (they have no Spotify ID)
    if track is not None and track['id'] is not None:
        track_ids.append(track['id'])
        track_names.append(track['name'])

features = []
# audio_features accepts up to 100 IDs per request
for i in range(0, len(track_ids), 100):
    features.extend(sp.audio_features(track_ids[i:i + 100]))

playlist_df = pd.DataFrame(features, index=track_names)
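
Once playlist_df exists, a couple of one-liners give a quick "mood profile" of the playlist. The sketch below assumes the column names of the Web API's audio features object (danceability, energy, valence and acousticness, each a value between 0.0 and 1.0); the helper names are hypothetical.

```python
import pandas as pd

# Audio-feature columns on a 0.0-1.0 scale, as returned by the Web API
MOOD_COLUMNS = ["danceability", "energy", "valence", "acousticness"]


def playlist_profile(playlist_df: pd.DataFrame) -> pd.Series:
    """Average each mood feature across the playlist, highest first."""
    return playlist_df[MOOD_COLUMNS].mean().sort_values(ascending=False)


def most_extreme_track(playlist_df: pd.DataFrame, feature: str) -> str:
    """Return the track name (the DataFrame index) with the highest value of feature."""
    return playlist_df[feature].idxmax()
```

For example, playlist_profile(playlist_df) shows which qualities dominate the playlist, and most_extreme_track(playlist_df, 'energy') names its most energetic track.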

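Check how many unique songs you listened to before December 2018. Here df is your listening history, with one row per play and date, artist, title, album, track and duration (milliseconds) columns: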

unique_songs = df[df.date < '2018-12-01'].drop_duplicates(subset=['artist', 'title', 'album']).shape[0]
unique_songs

How many minutes you spent listening:

df[df.date < '2018-12-01'].duration.astype('int').sum() / 1000 / 60

How many hours you spent on your favorite artist:

top_listened = (df[df.date < '2018-12-01']
                .assign(duration=lambda d: d.duration.astype('int'))  # sum numbers, not strings
                .groupby('artist')['duration'].sum()
                .sort_values(ascending=False)[:1])
top_listened = top_listened / 1000 / 60 / 60  # ms -> s -> min -> h
top_listened

Top 100 artists by number of plays:

df[df.date < '2018-12-01'].groupby('artist')['track'].count().sort_values(ascending=False)[:100]
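
The snippets above rely on a DataFrame df whose construction isn't shown in this post. Assuming it holds one row per play with date, artist, title, album and duration (milliseconds) columns, they can be bundled into a single Wrapped-style summary. This is a sketch with a hypothetical function name; it parses dates explicitly and casts durations to integers so the sums are numeric. Plays per artist are counted as rows per artist.

```python
import pandas as pd


def listening_summary(history: pd.DataFrame, cutoff: str = "2018-12-01") -> dict:
    """Wrapped-style totals for plays before cutoff.

    Assumes one row per play with columns date, artist, title, album
    and duration (track length in milliseconds).
    """
    plays = history[pd.to_datetime(history["date"]) < pd.Timestamp(cutoff)].copy()
    plays["duration"] = plays["duration"].astype("int64")

    # ms -> s -> min -> h, per artist
    hours_per_artist = plays.groupby("artist")["duration"].sum() / 1000 / 60 / 60
    has_plays = not hours_per_artist.empty
    return {
        "unique_songs": plays.drop_duplicates(subset=["artist", "title", "album"]).shape[0],
        "minutes": plays["duration"].sum() / 1000 / 60,
        "top_artist": hours_per_artist.idxmax() if has_plays else None,
        "top_artist_hours": hours_per_artist.max() if has_plays else 0.0,
        "plays_per_artist": plays["artist"].value_counts(),
    }
```

Calling listening_summary(df) returns the same figures as the individual snippets in one dictionary.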
