Aniket Dash
Diploma Level Student, BS in DS
What are the necessary and sufficient conditions for classifying something as music? I read somewhere that the ‘aesthetics of music’ explores the mathematical and cosmological dimensions of rhythmic and harmonic organization. I have tried to understand musical behaviour and experience, and my aim here is to understand the correlations between musical features and emotional states. A “mood” is a temporary state of mind or feeling. I always thought of it as something grander, but then I googled it and realized it really is just a temporary state of mind, one that can easily be manipulated. I wish to understand, as I write this, how music plays a vital role in manipulating one’s mood. I’ll try to mathematically formulate some of the theories I have in mind. We will design a model that generates recommendations based on our preferences and then tune those recommendations according to our theories. First, we will revisit some of the basic nature of feelings so that we can formulate them mathematically.
Feelings are closely related to, but not the same as, emotions. A fundamental difference is that feelings are experienced consciously, while emotions manifest either consciously or subconsciously. Some people may spend years, or even a lifetime, without understanding the depth of their emotions. Obviously, I googled that too; I don’t fully understand the difference myself. When somebody cries, how do we figure out whether that person is “feeling sad” or “being emotional”? I am sure there are ways to figure that out, but we won’t get into them here. The point of drawing the distinction was to say, roughly, that feelings are easier to formulate than emotions. The mutability of feelings gives us diverse data, whereas emotions are known for blending and hiding in plain sight, so I’ll drop them for the time being.
OBSERVATIONS
If you ask me, music adds essence to every feeling. When understood and played correctly, it complements the feeling immensely; imagine watching a movie with no soundtrack or background score. As someone who prefers to add a touch of music to everything, I can tell you some of the things I’ve observed. These are complex constructs and are difficult to read, and my efforts are directed solely towards simplifying some of their aspects. Here’s what I got, but before that, let’s assume feelings to be sinusoidal waves:

- Happiness can quietly turn into an episode of lament through reminiscing. When that happens, we effectively live in the future, missing the current moment even while we are in it. This transformation is often triggered when we try to acknowledge the moment instead of living in it, or when we use the wrong set of songs or soundtracks in an attempt to amplify the feeling of being happy. What is the “wrong” set of songs? We can easily classify music into “slow” or “upbeat” songs by tempo. The tempo of a song is typically measured in beats per minute (BPM), which indicates the number of beats or pulses that occur in one minute of the song. There are several ways to measure tempo, but our mind does it simply by binning songs into slow or fast. Upbeat songs often have a positive impact on our mood; when we are happy, they amplify the feeling. I like to imagine this as constructive interference. My happiness and the song I’m listening to can both be viewed as waves. Happiness has its own natural frequency and amplitude, and if the song’s wave doesn’t constructively interfere with it, it ruins the mood or, in the worst case, turns it into the episode we talked about (a small sketch after this list illustrates the idea). Slow songs with guitar arpeggios might be good for relaxing or reminiscing, but if we listen to them when we’re happy, they project the same effect: we start to live in the future, making the current scene something to reminisce about. Well, unless you are absolutely crazy, I’m assuming you don’t listen to songs with sad lyrics or a slow violin when you’re energetically happy.
- Sadness is not as easily transformed into happiness. It doesn’t always budge the way we want it to when we listen to upbeat songs while being sad; however, it is somehow easier to transform it into rage. Intuition tells us that listening to songs with a lower tempo, or with sad lyrics, might nurture the feeling of being sad, but it often pushes us into a spiral. I could try to explain this with waves and interference again, but I’ll take another angle. In a noisy environment, the ability to hear a specific sound or voice amidst the background noise primarily depends on that sound’s loudness. In our noisy classroom, my teacher always caught one specific student even though everybody in the room was shouting, because he was louder. Following the same line of thought, I noticed that when we feel sad or low, the brain is particularly noisy. What we need is a loud student who stands apart from the background noise: something loud and fast. Listening to rock, metal or hip-hop often neutralizes the background noise in the brain.
- When I work or relax, I prefer music with less speechiness, meaning music with fewer vocals or spoken words. I am sure I am not alone. The mind talks to itself when problem solving, thinking critically, brainstorming or just contemplating, and when words from the music flood the brain-space it becomes difficult to focus on the words of our own internal thoughts.
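To make the wave analogy from the first observation a little more concrete, here is a minimal, purely illustrative sketch. The function names, and the 120 BPM cut-off for “upbeat”, are my own assumptions rather than any standard; the second function simply models a feeling and a song as sine waves and shows how their superposition grows when the waves are in phase and collapses when they are out of phase.

import numpy as np

def tempo_bin(bpm, threshold=120):
    # Arbitrary cut-off: anything at or above the threshold counts as "upbeat".
    return "upbeat" if bpm >= threshold else "slow"

def superposed_amplitude(freq_feeling, freq_song, phase=0.0, seconds=5, rate=1000):
    # Sample both "waves" and measure the peak of their sum.
    t = np.linspace(0, seconds, seconds * rate)
    feeling = np.sin(2 * np.pi * freq_feeling * t)
    song = np.sin(2 * np.pi * freq_song * t + phase)
    return np.max(np.abs(feeling + song))

print(tempo_bin(140))                           # upbeat
print(superposed_amplitude(2, 2, phase=0.0))    # ~2.0, constructive interference
print(superposed_amplitude(2, 2, phase=np.pi))  # ~0.0, destructive interference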
DATA PREPARATION
For creating the data, I’ll classify feelings into very broad categories and study their branches later. So the initial bins would be happy, sad and angry. Of course, one can feel a mix of things, but let’s assume that feelings are mutually exclusive. To collect feelings and eventually engineer them to fit our comparison space, we need data from people, and this data should somehow reflect people’s feelings through their choice of music. I need to build a preference-vector in the same space in which our song-vector resides.
For song data, many datasets use melody, key, energy, danceability and so on as important characteristics or features, all of which are represented as numbers (floats).
Let us get familiar with the terminology used to understand music:
- Energy : The measure of the song’s intensity or activity level.
- Loudness : The perceived volume or intensity of the song.
- Valence : The emotional positivity or negativity of a song.
- Tempo : The speed or pace of a song, often measured in BPM, though we will scale it down to a float between 1.0 and 10.0.
- Acousticness : A numerical measure of the degree to which a song exhibits acoustic or unplugged characteristics. Songs with a higher acousticness value have more organic sound elements, such as acoustic guitars, pianos or vocals without heavy electronic effects.
- Liveness : The perceived presence of a live performance or audience in the song.
- Speechiness : The presence of spoken words or speech in the song.
- Instrumentalness : The extent to which a song contains no vocals; higher values indicate a more purely instrumental track.
There are a lot of other features that act as technical descriptors and metadata for a song, such as popularity, year of release, artist and duration. We can include them to make the model more accurate and robust.
FEATURE ENGINEERING
I want all the feature values to be on the same scale so that the model isn’t skewed towards one attribute. For example, if the values of the energy feature lie in the range 0.01 to 0.09 but the values of the loudness feature lie in the range -10 to 10, the model becomes skewed towards loudness simply because of the difference in magnitude. So, first, we need to normalize the float values. Initially I was satisfied with the features we selected for the songs, but it looks like we need some additional help to clear things up: we need to add the consolidated genre of each song into the feature set. A greater level of granularity lets us learn more about what types of songs sit inside a specific preference-vector. For example, Rock is a generic genre, but if subgenres like ‘progressive rock’, ‘indie rock’ and ‘punk rock’ appear in our dataset, they add a great deal of detail to the model’s input space. Opening everything up gives us roughly 2600 elements in our feature set. Imagine the level of granularity Spotify looks at to create such accurate recommendations.
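A minimal sketch of the min-max normalization described above, using made-up energy and loudness values that mirror the example; each value is mapped to (x - min) / (max - min), so every column ends up in the 0 to 1 range regardless of its original magnitude.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy values only, chosen to mirror the example above.
toy = pd.DataFrame({"energy": [0.01, 0.05, 0.09], "loudness": [-10.0, 0.0, 10.0]})

scaler = MinMaxScaler()  # x' = (x - min) / (max - min) per column
scaled = pd.DataFrame(scaler.fit_transform(toy), columns=toy.columns)
print(scaled)  # both columns now span 0.0 to 1.0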

Now we can roughly see a user’s preference by applying a kind of frequency bias to genres. We will use TF-IDF to do this. TF-IDF stands for Term Frequency – Inverse Document Frequency; it weighs how important a term is to one document relative to how frequently that term appears across the whole collection of documents. In our consolidated genre lists, if a generic genre like “rock” or “pop” shows up everywhere, we don’t want it to carry a high weight and dominate the model. On the other hand, if something specific shows up that really hints at a user’s taste in music, like “Indian Classical”, we want it to influence the direction of the model. This simple yet elegant weighting lets us easily assign weights to genres.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MinMaxScaler

def create_feature_set(df, float_cols):
    # TF-IDF weights over each song's consolidated genre list
    tfidf = TfidfVectorizer()
    tfidf_matrix = tfidf.fit_transform(df['consolidates_genre_lists'].apply(lambda x: " ".join(x)))
    genre_df = pd.DataFrame(tfidf_matrix.toarray())
    genre_df.columns = ['genre' + "|" + i for i in tfidf.get_feature_names_out()]
    genre_df.reset_index(drop=True, inplace=True)
    # One-hot encode year and popularity (ohe_prep is a helper defined elsewhere), then down-weight them
    year_ohe = ohe_prep(df, 'year', 'year') * 0.5
    popularity_ohe = ohe_prep(df, 'popularity_red', 'pop') * 0.15
    # Min-max scale the float audio features and down-weight them
    floats = df[float_cols].reset_index(drop=True)
    scaler = MinMaxScaler()
    floats_scaled = pd.DataFrame(scaler.fit_transform(floats), columns=floats.columns) * 0.2
    # Stitch everything into one feature vector per song
    final = pd.concat([genre_df, floats_scaled, popularity_ohe, year_ohe], axis=1)
    final['id'] = df['id'].values
    return final

complete_feature_set = create_feature_set(spotify_df, float_cols=float_cols)
BUILDING PREFERENCE VECTOR
We click on a song or a soundtrack and our music app plays it for us. Apps like Spotify, when we are not playing from a playlist, create a queue of similar songs to play after the one we’re listening to.
To vectorize this queue, we add columns like ‘Date’, ‘Months Behind’ and ‘Weight’.
The reason for adding these three columns: I want to apply a recency bias, prioritizing the songs that were added more recently. Taking ‘Months Behind’ as input, I create a ‘Weight’ that determines how much a given row contributes to our main model; the weight of each row is multiplied across the entire row. The final preference-vector is just an aggregate of all these weighted rows, and it summarizes all the songs that I prefer. A sketch of this weighting follows.
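Here is a minimal sketch of that recency weighting. The function name, the listened_df / feature_cols arguments and the exponential half-life decay are my own illustrative choices; it only assumes a feature matrix for the listened-to songs plus a ‘date_added’ column like the one produced later.

import pandas as pd

def build_preference_vector(listened_df, feature_cols, half_life_months=6):
    # Months behind the most recently added song
    dates = pd.to_datetime(listened_df['date_added'])
    months_behind = (dates.max() - dates).dt.days / 30.0
    # Recency bias: a row's weight halves every `half_life_months` months
    weights = 0.5 ** (months_behind / half_life_months)
    # Multiply each row by its weight, then aggregate into a single vector
    weighted = listened_df[feature_cols].mul(weights, axis=0)
    return weighted.sum() / weights.sum()

The resulting vector lives in the same space as the song vectors, so it can be compared with them directly.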

CALCULATING SCORE FOR SONGS
We can visualize the idea behind the comparison using a 2D representation of these vectors, taking insights from Madhav Thaker’s visualization of similar vectors. This depiction shows how we can calculate a “score” for a song with respect to a user’s preference-vector. Cosine similarity is a metric used to measure the similarity between two vectors in a multidimensional space: essentially, we calculate the cosine of the angle between the two vectors, which represents their similarity in terms of direction. Here, the angle between the preference-vector and a song vector yields a personalized score for that song; the smaller the angle, the higher the score. As far as general recommendation goes, we are essentially done. The model is expected to provide recommendations that align with our listening history, but that isn’t what we are aiming for. We want to modify the recommendations so that the model suggests appropriate songs with respect to our feelings. For me, the model fails if the user listens to a series of sad songs and the model keeps fueling the fire by suggesting similar ones. We need to roughly estimate the user’s feelings to tune the recommendations appropriately.
# sp is assumed to be an authenticated spotipy client; id_dic maps playlist names to playlist ids.
def create_necessary_outputs(name, id_dic, df):
    preference = pd.DataFrame()
    # Pull every track of the chosen playlist from the Spotify API
    for ix, i in enumerate(sp.playlist(id_dic[name])['tracks']['items']):
        preference.loc[ix, 'artist'] = i['track']['artists'][0]['name']
        preference.loc[ix, 'name'] = i['track']['name']
        preference.loc[ix, 'id'] = i['track']['id']  # ['uri'].split(':')[2]
        preference.loc[ix, 'url'] = i['track']['album']['images'][1]['url']
        preference.loc[ix, 'date_added'] = i['added_at']
    preference['date_added'] = pd.to_datetime(preference['date_added'])
    # Keep only songs present in our feature set, most recently added first
    preference = preference[preference['id'].isin(df['id'].values)].sort_values('date_added', ascending=False)
    return preference
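With the preference playlist in hand, a minimal sketch of the cosine-similarity scoring described above could look like this. The function name score_songs is mine, and it assumes that complete_feature_set from earlier and the aggregated preference vector share the same feature columns; the similarity itself comes from scikit-learn’s cosine_similarity.

import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

def score_songs(complete_feature_set, preference_vector, top_n=20):
    # Separate song ids from the numeric features
    ids = complete_feature_set['id']
    features = complete_feature_set.drop(columns=['id'])
    # Cosine of the angle between each song vector and the single preference vector
    scores = cosine_similarity(features.values, preference_vector.values.reshape(1, -1))[:, 0]
    ranked = pd.DataFrame({'id': ids, 'score': scores}).sort_values('score', ascending=False)
    return ranked.head(top_n)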

MODIFIED SCORE
Creating a dataset by surveying people about the songs they listen to when they feel a certain way, training a model on it with supervised learning techniques, and learning a threshold for each feature we discussed (so that checking whether a feature value exceeds its threshold tells us what a person is feeling) is a daunting task. If we had such a dataset and an aggregate value for each feature, tagging a person as happy, sad or angry would just be a matter of an if-else statement. Since we don’t have such a dataset, I will rely on an estimate of the user’s state derived from the output of the model we designed, and develop a separate scoring system. Here are the assumptions we are going to rely on for building our modified score:
If the preference vector spits out higher numbers for features like ‘Energy’, ‘Danceability’ and ‘Valence’, then we stick to the scoring system we developed earlier, because higher values on these features let us assume that the person is happy and that similar songs would be good recommendations.
If we get lower numbers on the features discussed above, especially valence, and additionally higher numbers on features like acousticness, then we deliberately assign low scores to similar song suggestions. In this way, we keep our theory alive and possibly prevent the user from going down a spiral. Just kidding. A rough sketch of this adjustment follows.
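A minimal sketch of this modified score, reusing the ranked output of score_songs above together with a dataframe of raw audio features per song. The column names ‘valence’, ‘energy’ and ‘acousticness’, the 0.4/0.6 thresholds and the 0.5 penalty are illustrative assumptions of mine, not tuned values.

def modified_score(ranked, preference_vector, song_features,
                   low=0.4, high=0.6, penalty=0.5):
    # Rough mood estimate taken from the preference vector itself
    seems_low = (preference_vector['valence'] < low and
                 preference_vector['energy'] < low and
                 preference_vector['acousticness'] > high)
    if not seems_low:
        return ranked  # user seems fine, keep the plain cosine ranking
    # Otherwise, dampen the scores of songs that resemble the current low mood
    adjusted = ranked.merge(song_features[['id', 'valence', 'energy']], on='id')
    similar_and_sad = (adjusted['valence'] < low) & (adjusted['energy'] < low)
    adjusted.loc[similar_and_sad, 'score'] *= penalty
    return adjusted.sort_values('score', ascending=False)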
These theories are based on personal experience and philosophy. When the heart is heavy, instead of making it heavier, it is better to shake it out of apathy; meaningful noise is always better than the brain noise that induces self-doubt. This was my effort at building something with a more personal touch, something that relates to your feelings and takes appropriate action. If mathematics can help represent complex cosmological phenomena, I’m sure there are ways we can represent feelings as well. Thanks for reading!