
Fun, understanding the Spotify API and data representation.




Spotify Music Analysis Project


This project uses the Spotify API to obtain a user's music preferences and analyses the data to learn about the user's listening trends.

Project Structure

  • Dataset/dataset.csv: The dataset used for the project, sourced from Kaggle. More data will be added later.
  • graphingtables/: Generates a pandas table of the audio features, like the one shown in dataframe.png.
  • This explains every feature shown in dataframe.png.
  • photogrid/: Generates a 3x3 grid of album covers from the user's top tracks and saves it as 'recentlyplayed.jpg'.
  • recommendation_model/: Generates song recommendations based on the user's recently played tracks (not ready yet).
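The album-cover grid that photogrid/ produces can be illustrated with NumPy alone. This is a minimal sketch, not the script's actual code: it assumes the nine covers have already been downloaded and resized to the same size, and the helper name `make_grid` is hypothetical.

```python
import numpy as np

def make_grid(covers, rows=3, cols=3):
    """Tile equally sized RGB images (H x W x 3 arrays) into a rows x cols grid.

    Hypothetical helper: covers are assumed to be pre-downloaded and resized.
    """
    if len(covers) != rows * cols:
        raise ValueError(f"expected {rows * cols} images, got {len(covers)}")
    # Join each row of images side by side, then stack the rows vertically.
    row_strips = [np.hstack(covers[r * cols:(r + 1) * cols]) for r in range(rows)]
    return np.vstack(row_strips)

# Usage with nine dummy 64x64 "covers", each filled with a different grey level.
covers = [np.full((64, 64, 3), i * 25, dtype=np.uint8) for i in range(9)]
grid = make_grid(covers)
print(grid.shape)  # (192, 192, 3)

# With scikit-image installed, the grid could then be saved like the README describes:
# from skimage import io; io.imsave("recentlyplayed.jpg", grid)
```

Stacking row by row keeps the cover order left-to-right, top-to-bottom, which matches the order of a top-tracks list.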

Table of Contents

  1. Installation
  2. Usage
  3. Functions and snippets


Installation

To run this project, follow these steps:

  1. Clone the repository:

(The repository link will be added at the end.)

  2. Install the required packages:

    pip install matplotlib spotipy pandas scikit-image

  3. Replace client_id and client_secret in spotify_auth() and in the .env file with your own Spotify API credentials. To obtain the credentials, go to the Spotify Developer Dashboard, log in, and create an app.
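If you keep the credentials in a .env file, one way to load them without extra packages is sketched below. This is an assumption-laden sketch: the variable names follow Spotipy's convention of reading SPOTIPY_CLIENT_ID, SPOTIPY_CLIENT_SECRET and SPOTIPY_REDIRECT_URI from the environment, the `load_env` helper is hypothetical, and the repository may read the file differently (for example with the python-dotenv package).

```python
import os
import tempfile

def load_env(path=".env"):
    """Minimal .env reader: one KEY=VALUE per line; '#' comments and blank lines are skipped.

    Hypothetical helper. Variables already set in os.environ are not overwritten.
    """
    loaded = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            key, value = key.strip(), value.strip().strip('"').strip("'")
            loaded[key] = value
            os.environ.setdefault(key, value)
    return loaded

# Usage: write an example .env (placeholder values) to a temporary file and load it.
example = """# Spotify API credentials
SPOTIPY_CLIENT_ID=your_client_id
SPOTIPY_CLIENT_SECRET=your_client_secret
SPOTIPY_REDIRECT_URI=http://127.0.0.1:8888/callback
"""
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as tmp:
    tmp.write(example)
creds = load_env(tmp.name)
os.remove(tmp.name)
print(sorted(creds))
```

Keeping the secrets in .env (and out of version control) means spotify_auth() never needs hard-coded credentials.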


Usage

  1. Run photogrid/ to generate a grid of album covers from your top tracks.

  2. Run recommendation_model/ to get song recommendations based on your recently played tracks.

  3. Run graphingtables/ to get a table with a lot of information (kinda useless for humans for now, but I'll try to make cooler representations).

Functions and snippets

Apart from these, we can write custom code that uses the functions in user_playlist_integration to obtain users' playlist data, generate a playlist feature vector, and recommend songs from a playlist.

We can also display a playlist's album covers using the visualization function described at the end of this section.

Each of the functions in user_playlist_integration is explained below with sample usage and outputs.



spotify_data()

Reads in the Spotify dataset from the specified location.


spotify_auth()

Authenticates and returns a Spotify object (sp) using Spotipy and OAuth2.
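A possible shape for spotify_auth(), sketched under assumptions: credentials come from the SPOTIPY_* environment variables, and the requested scopes are a guess at what the top-tracks, recently-played and playlist features need. `oauth_settings` and `spotify_auth_sketch` are hypothetical names, not the repository's code; Spotipy is imported inside the function so the settings helper can be checked without it.

```python
import os

# Assumed scopes: top tracks, recently played tracks and private playlists.
DEFAULT_SCOPE = "user-top-read user-read-recently-played playlist-read-private"

def oauth_settings(env=None, scope=DEFAULT_SCOPE):
    """Collect keyword arguments for spotipy's SpotifyOAuth from an environment mapping."""
    env = os.environ if env is None else env
    required = ("SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET", "SPOTIPY_REDIRECT_URI")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise KeyError(f"missing credentials: {', '.join(missing)}")
    return {
        "client_id": env["SPOTIPY_CLIENT_ID"],
        "client_secret": env["SPOTIPY_CLIENT_SECRET"],
        "redirect_uri": env["SPOTIPY_REDIRECT_URI"],
        "scope": scope,
    }

def spotify_auth_sketch():
    """Hypothetical stand-in for spotify_auth(): returns an authenticated spotipy.Spotify."""
    import spotipy
    from spotipy.oauth2 import SpotifyOAuth
    return spotipy.Spotify(auth_manager=SpotifyOAuth(**oauth_settings()))

# Usage with an explicit mapping instead of the real environment.
settings = oauth_settings({
    "SPOTIPY_CLIENT_ID": "id",
    "SPOTIPY_CLIENT_SECRET": "secret",
    "SPOTIPY_REDIRECT_URI": "http://127.0.0.1:8888/callback",
})
print(settings["client_id"], settings["scope"])
```

Failing early with a clear KeyError is friendlier than letting the OAuth flow start with empty credentials.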

create_album_outputs(album_name, id_dic, df, sp)

Pulls songs from a specific album on Spotify and returns those available in the provided dataset.

  • Parameters:

  • album_name (str): Name of the album to pull songs from.

  • id_dic (dict): Dictionary mapping album names to album IDs.

  • df (pandas DataFrame): Spotify dataset.

  • sp: Spotify object authenticated with spotify_auth().

  • Returns:

  • album (pandas DataFrame): DataFrame containing songs from the specified album available in the provided dataset.

Note that albums and playlists are different objects in Spotify, so the album and playlist functions cannot be used interchangeably.
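The step where create_album_outputs keeps only the songs that are also in the dataset can be sketched as a pandas filter. Assumptions: the dataset identifies songs by a track_id column, the track IDs come from Spotipy's album_tracks(), and `keep_tracks_in_dataset` is a hypothetical helper.

```python
import pandas as pd

def keep_tracks_in_dataset(track_ids, df, id_col="track_id"):
    """Return the rows of df whose id_col appears in track_ids, keeping one row per track."""
    ids = set(track_ids)
    return df[df[id_col].isin(ids)].drop_duplicates(subset=id_col)

# With a live client, the IDs could come from Spotipy (not run here):
# items = sp.album_tracks(id_dic[album_name])["items"]
# track_ids = [t["id"] for t in items]

# Toy dataset: a song may appear more than once (here under two genres).
df = pd.DataFrame({
    "track_id": ["t1", "t2", "t2", "t3"],
    "track_name": ["Song A", "Song B", "Song B", "Song C"],
    "track_genre": ["pop", "rock", "indie", "jazz"],
})
album = keep_tracks_in_dataset(["t2", "t3", "t9"], df)
print(album["track_name"].tolist())  # ['Song B', 'Song C']
```

Tracks missing from the dataset (t9 above) are silently dropped, which is why the function only returns songs "available in the provided dataset".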

create_playlist_outputs(playlist_name, id_dic, df, sp)

Pulls songs from a specific playlist on Spotify and returns those available in the provided dataset.


  • Parameters:
  • playlist_name (str): Name of the playlist to pull songs from.
  • id_dic (dict): Dictionary mapping playlist names to playlist IDs.
  • df (pandas DataFrame): Spotify dataset.
  • sp: Spotify object authenticated with spotify_auth().


  • Returns:
  • playlist (pandas DataFrame): DataFrame containing songs from the specified playlist available in the provided dataset.

create_playlist_outputs_by_id(playlist_id, df, sp)

Pulls songs from a specific playlist on Spotify using its ID and returns those available in the provided dataset.


  • Parameters:
  • playlist_id (str): ID of the playlist to pull songs from.
  • df (pandas DataFrame): Spotify dataset.
  • sp: Spotify object authenticated with spotify_auth().


  • Returns:
  • playlist (pandas DataFrame): DataFrame containing songs from the specified playlist available in the provided dataset.
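A playlist can hold more tracks than one API response returns, so collecting its track IDs usually means following Spotify's paging. A minimal sketch, assuming Spotipy's playlist_items() and next() calls; `playlist_track_ids` is a hypothetical helper, and the stub client only mimics the response shape so the logic can be checked offline.

```python
def playlist_track_ids(sp, playlist_id):
    """Collect track IDs from every page of a playlist."""
    ids = []
    page = sp.playlist_items(playlist_id)
    while page:
        for item in page["items"]:
            track = item.get("track")
            # Removed tracks can come back as None and local files have no ID; skip both.
            if track and track.get("id"):
                ids.append(track["id"])
        # Follow the 'next' link until the last page.
        page = sp.next(page) if page.get("next") else None
    return ids

class StubSpotify:
    """Offline stand-in returning two pages shaped like the Web API response."""
    def __init__(self):
        self._pages = [
            {"items": [{"track": {"id": "t1"}}, {"track": None}], "next": "page-2"},
            {"items": [{"track": {"id": "t2"}}, {"track": {"id": None}}], "next": None},
        ]

    def playlist_items(self, playlist_id):
        return self._pages[0]

    def next(self, page):
        return self._pages[1]

print(playlist_track_ids(StubSpotify(), "some-playlist-id"))  # ['t1', 't2']
```

The resulting ID list can then be matched against the dataset's track IDs, the same way the album outputs are filtered.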

Playlist IDs can be obtained either by copying the last part of a playlist link (the string after /playlist/), or by using the Spotipy module. Below is a snippet that fetches a user's playlists and prints the ID of each one.

from SpotipyK.recommendation_model.user_playlist_integration import spotify_auth
#initialising spotify object
sp = spotify_auth()
user_playlists = sp.current_user_playlists()

#printing the name and ID of each playlist
for playlist in user_playlists['items']:
    print(playlist['name'], playlist['id'])

create_playlist_outputs_by_link(playlist_link, df, sp)

Pulls songs from a specific playlist on Spotify using its link and returns those available in the provided dataset.


  • Parameters:
  • playlist_link (str): Spotify link of the playlist to pull songs from.
  • df (pandas DataFrame): Spotify dataset.
  • sp: Spotify object authenticated with spotify_auth().


  • Returns:
  • playlist (pandas DataFrame): DataFrame containing songs from the specified playlist available in the provided dataset.
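Turning a playlist link into an ID can be sketched with the standard library. The helper name is hypothetical and this is not necessarily how create_playlist_outputs_by_link does it: the sketch takes the path segment after /playlist/ and ignores query parameters such as ?si=. Spotipy's playlist methods also accept full playlist URLs, so this step may not be needed when calling them directly.

```python
from urllib.parse import urlparse

def playlist_id_from_link(link):
    """Extract the playlist ID from an open.spotify.com playlist link (hypothetical helper)."""
    parts = [p for p in urlparse(link.strip()).path.split("/") if p]
    if "playlist" not in parts or parts.index("playlist") == len(parts) - 1:
        raise ValueError(f"not a playlist link: {link!r}")
    return parts[parts.index("playlist") + 1]

print(playlist_id_from_link("https://open.spotify.com/playlist/37i9dQZF1DXcBWIGoYBM5M?si=abc123"))
# 37i9dQZF1DXcBWIGoYBM5M
```

Parsing the path rather than splitting on "/" by hand keeps the query string out of the ID.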


from SpotipyK.recommendation_model.user_playlist_integration import spotify_data,spotify_auth,create_playlist_outputs_by_link
import pandas as pd

#importing dataset
df = spotify_data()

#initialising spotify object
sp = spotify_auth()

playlist_link = ""  # paste a Spotify playlist link here

#getting playlist dataframe
playlist = create_playlist_outputs_by_link(playlist_link,df,sp)

#printing head of playlist
print(playlist.head())

generate_playlist_feature(complete_feature_set, playlist_df, weight_factor)

Summarizes a user's playlist into a single vector and returns the summarized playlist features and the non-playlist features.


  • Parameters:
  • complete_feature_set (pandas DataFrame): DataFrame including all features for Spotify songs.
  • playlist_df (pandas DataFrame): DataFrame of songs in the playlist.
  • weight_factor (float): Float value representing the recency bias (closer to 1 gives more priority to recent songs).


  • Returns:
  • playlist_feature_set_weighted_final (pandas Series): Summarized feature vector representing the playlist.
  • complete_feature_set_nonplaylist (pandas DataFrame): DataFrame of songs that are not in the selected playlist.
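One way to collapse a playlist into a single recency-weighted vector is sketched below. It is a hypothetical illustration, not the repository's implementation: it assumes both frames share an id column, that playlist_df has a datetime date_added column, and that each song is weighted by weight_factor ** (-months since the most recent addition). Under this particular scheme, weight_factor = 1 weights every song equally and larger values favour recently added songs; check the repository's code for how its own weight_factor behaves.

```python
import pandas as pd

def summarize_playlist(complete_feature_set, playlist_df, weight_factor):
    """Return (weighted playlist vector, feature rows of songs not in the playlist).

    Hypothetical sketch: assumes an 'id' column in both frames and a
    datetime 'date_added' column in playlist_df.
    """
    in_playlist = complete_feature_set["id"].isin(playlist_df["id"])
    nonplaylist = complete_feature_set[~in_playlist]
    playlist_features = complete_feature_set[in_playlist].merge(
        playlist_df[["id", "date_added"]], on="id"
    )
    # Age of each song relative to the most recently added one, in 30-day "months".
    months = (playlist_features["date_added"].max() - playlist_features["date_added"]).dt.days / 30
    weights = weight_factor ** (-months)
    feature_cols = [c for c in complete_feature_set.columns if c != "id"]
    # Scale each song's features by its weight, then sum into one vector.
    vector = playlist_features[feature_cols].mul(weights, axis=0).sum(axis=0)
    return vector, nonplaylist

features = pd.DataFrame({"id": ["a", "b", "c"], "energy": [1.0, 0.5, 0.2]})
playlist_df = pd.DataFrame({
    "id": ["a", "b"],
    "date_added": pd.to_datetime(["2024-01-31", "2024-01-01"]),
})
vector, rest = summarize_playlist(features, playlist_df, 2.0)
print(vector["energy"], rest["id"].tolist())  # 1.25 ['c']
```

Here song b was added 30 days before song a, so with weight_factor = 2.0 it counts half as much: 1.0 * 1 + 0.5 * 0.5 = 1.25.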

To generate the feature set after getting the playlist (you can continue the code from the playlist link example):

#assuming both functions are importable from the same module
from SpotipyK.recommendation_model.user_playlist_integration import create_feature_set, generate_playlist_feature

#preprocessing the dataset
df['consolidates_genre_lists'] = df['track_genre'].apply(lambda x: x.split("|"))
df['popularity_red'] = pd.qcut(df['popularity'], q=5, labels=False)
float_cols = ['acousticness', 'danceability', 'energy', 'instrumentalness', 'liveness', 'loudness', 'speechiness', 'tempo', 'valence']
complete_set = create_feature_set(df, float_cols=float_cols)


#generate feature set of playlist and nonplaylist songs
playlist_weighted, non_playlist = generate_playlist_feature(complete_set, playlist, 1.4)
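create_feature_set() is used above but not documented in this README. As a rough, hypothetical picture of what such a step can do with the columns prepared above: multi-hot encode the genre lists, one-hot encode the popularity bucket, and min-max scale the float columns into [0, 1] so no single feature dominates the later cosine similarity. The function name and column prefixes are assumptions, and the repository's version may differ.

```python
import pandas as pd

def create_feature_set_sketch(df, float_cols):
    """Build one numeric row per track (hypothetical stand-in for create_feature_set)."""
    # Multi-hot genres: one column per genre, 1.0 if the track lists it.
    genres = pd.get_dummies(df["consolidates_genre_lists"].explode()).groupby(level=0).max()
    genres = genres.add_prefix("genre|").astype(float)
    # One-hot popularity bucket produced by pd.qcut(..., labels=False).
    popularity = pd.get_dummies(df["popularity_red"], prefix="pop").astype(float)
    # Min-max scale each float column; constant columns divide by 1 instead of 0.
    floats = df[float_cols].astype(float)
    spread = (floats.max() - floats.min()).replace(0, 1)
    floats = (floats - floats.min()) / spread
    features = pd.concat([genres, popularity, floats], axis=1)
    features.insert(0, "id", df["track_id"])  # aligned on the index
    return features

toy = pd.DataFrame({
    "track_id": ["t1", "t2"],
    "consolidates_genre_lists": [["pop"], ["rock", "pop"]],
    "popularity_red": [0, 1],
    "energy": [0.2, 0.8],
})
fs = create_feature_set_sketch(toy, float_cols=["energy"])
print(fs.columns.tolist())
```

Scaling matters here because columns such as tempo and loudness would otherwise swamp the 0/1 genre flags.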


generate_playlist_recommendations(df, features, nonplaylist_features, sp)

Generates and returns the top 10 recommendations for a playlist based on cosine similarity.


  • Parameters:
  • df (pandas DataFrame): Spotify dataset.
  • features (pandas Series): Summarized playlist feature vector.
  • nonplaylist_features (pandas DataFrame): Feature set of songs that are not in the selected playlist.
  • sp: Spotify object authenticated with spotify_auth().


  • Returns:
  • non_playlist_df_top_10 (pandas DataFrame): DataFrame containing top 10 recommendations for the playlist.
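The ranking step can be illustrated with NumPy: compute the cosine similarity between the playlist vector and every non-playlist song, then keep the highest-scoring n. This sketch assumes the non-playlist frame has an id column plus the same feature columns as the playlist vector; `top_n_by_cosine` is a hypothetical name, and the real function also takes sp, which this sketch does not need.

```python
import numpy as np
import pandas as pd

def top_n_by_cosine(playlist_vector, nonplaylist_features, n=10):
    """Rank non-playlist songs by cosine similarity to the playlist vector."""
    feature_cols = [c for c in nonplaylist_features.columns if c != "id"]
    X = nonplaylist_features[feature_cols].to_numpy(dtype=float)
    v = playlist_vector[feature_cols].to_numpy(dtype=float)
    norms = np.linalg.norm(X, axis=1) * np.linalg.norm(v)
    # cos = (x . v) / (|x| |v|); all-zero rows have no direction, so they score 0.
    sim = np.divide(X @ v, norms, out=np.zeros(len(X)), where=norms > 0)
    ranked = nonplaylist_features.assign(sim=sim).sort_values("sim", ascending=False)
    return ranked.head(n)

playlist_vector = pd.Series({"energy": 1.0, "valence": 0.0})
candidates = pd.DataFrame({
    "id": ["x", "y", "z"],
    "energy": [0.9, 0.1, 0.0],
    "valence": [0.1, 0.9, 0.0],
})
print(top_n_by_cosine(playlist_vector, candidates, n=2)["id"].tolist())  # ['x', 'y']
```

Cosine similarity compares the direction of the feature vectors rather than their size, so a long weighted playlist sum is still comparable with a single song's features.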

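To generate recommendations using the feature set obtained previously:

from SpotipyK.recommendation_model.user_playlist_integration import generate_playlist_recommendations

recommendation_for_playlist = generate_playlist_recommendations(df, playlist_weighted, non_playlist, sp)
print(recommendation_for_playlist)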

We can also visualise the recommendations using the function in the SpotipyK.recommendation_model.visualizing module. Below is the function's description and example usage.


Visualizes song cover art alongside track names from a given pandas DataFrame (df).


  • Parameters:
  • df (pandas DataFrame): DataFrame containing 'url' (cover art URLs) and 'track_name' (track names).


  • Returns:
  • plt (matplotlib.pyplot object): Matplotlib figure object displaying the cover arts with track names.
#import function from SpotipyK.recommendation_model.visualizing


