Limitations | QM Group 16

The findings of this project should be considered in light of certain limitations.

Methodological limitations

The sample of songs selected likely does not provide a true representation of the differences between all popular male and female artists, for several reasons. Male artists dominate the top 100 songs, so female lyrical sentiment is underrepresented; a larger sample would provide more data for a more robust analysis. In addition, the top 100 songs from January of each year were chosen for analysis, so a significant number of them are Christmas related, which skews the overall sentiment irrespective of gender. Gender-based sentiment differences are much less prevalent in Christmas songs because of their generally festive and joyful sentiment, so this is a significant barrier to determining the extent of gendered lyrical differences.

Both lexicons were designed for English words, so we would expect non-English songs to be classified incorrectly, and we have tried to filter them out as far as possible. This is harder when a song mixes languages. For example, the song Mic Drop by BTS is in a mix of English and Korean. Although the individual English parts may be classified correctly, the Korean parts make up a significant portion of the song, so the overall classification may not reflect the song as a whole.

Mixed languages can also cause minor errors when a song contains only a short section in another language, such as a single line, particularly when a non-English word is spelled the same as an English word. For example, ‘sin’ (meaning ‘without’ in Spanish) is classified with disgust.
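As a rough illustration of how mixed-language songs could be flagged before classification, the sketch below estimates the share of word tokens that contain non-ASCII letters. It is only a sketch under stated assumptions, not the filter used in this project: the function names and the 0.2 threshold are hypothetical, and the check only detects a different script (such as Korean), so it would not catch a Spanish word like ‘sin’.

```python
import re


def non_ascii_share(lyrics: str) -> float:
    """Return the fraction of word tokens containing a non-ASCII letter."""
    # [^\W\d_]+ matches runs of Unicode letters (no digits or underscores).
    tokens = re.findall(r"[^\W\d_]+", lyrics)
    if not tokens:
        return 0.0
    foreign = sum(1 for token in tokens if not token.isascii())
    return foreign / len(tokens)


def is_mostly_english(lyrics: str, threshold: float = 0.2) -> bool:
    """Hypothetical rule: keep a song if few tokens use a non-Latin script."""
    # Assumption: 0.2 is an illustrative cut-off, not a tuned value.
    return non_ascii_share(lyrics) <= threshold
```

With an invented test string such as "hello world 안녕 세상", half of the tokens are flagged, so the song would be excluded under this rule. Accented Latin letters also count as non-ASCII, so a stricter or looser rule may be needed in practice.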

The tendency of sentiment scores towards the extremes is likely because music uses more passionate language, and may not be the case in other mediums. However, a lexicon-based approach is inherently limited in its ability to understand context because words are treated individually. A better approach in future could be to use a transformer-based language model such as GPT-3 and fine-tune it specifically for lyrical analysis (Brown et al., 2020).

“You’re a mean one, Mr. Grinch” is a song described by Genius annotators as “adding emphasis to the Grinch’s nastiness and sick nature” (Genius, 2016). Although this song was correctly classified by our algorithm, with a positivity score of -0.9313 and disgust as the top emotion, it exposes some of the limitations of the NRC sentiment analysis. The number of words in the lexicon is severely limited, which is evident from the many obviously emotionally charged words that have no classification (circled in red). In addition, many words throughout the lexicon have odd-seeming classifications, for example the word “quote” (circled in yellow), which is classified with ‘anticipation’. Perhaps this makes sense in a context other than music, such as literature.

The sentiment analysis can also lead to incorrect classifications when words that are present in the lexicon are used with a different meaning.

For example, ‘wan’ is classified as sad (following its definition of ‘pale and sickly’), but here it is used as an abbreviation of ‘want’.

Similarly, the word ‘bout’ (meaning a fight) is classified as angry, but here it is used as an abbreviation of ‘about’.

Finally, the word ‘die’ is always classified as sad. This is normally fine, but in certain contexts it clearly has a different meaning and the classification no longer fits.

These examples show that the lexicon is not designed for slang and more modern words and, more importantly, that it cannot handle homonyms. Homonyms are an issue because the sentiment analysis function has no concept of context, so two words with the same spelling are treated as the same word. This lack of contextual awareness can also lead to the incorrect classification of a single word.

Here, for example, the word ‘smell’ is classified with disgust even though it is being used to describe a pleasant smell.
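The context problem can be seen in a minimal sketch of a word-by-word lexicon lookup. This is not the project's code or the real NRC lexicon: `TOY_LEXICON` is a hypothetical five-word stand-in that only reuses the classifications mentioned above, and `emotion_counts` is an illustrative name.

```python
import re
from collections import Counter

# Hypothetical toy lexicon, limited to the examples discussed in this section.
TOY_LEXICON = {
    "wan": {"sadness"},
    "bout": {"anger"},
    "die": {"sadness"},
    "smell": {"disgust"},
    "sin": {"disgust"},
}


def emotion_counts(text: str, lexicon=TOY_LEXICON) -> Counter:
    """Count emotions by looking up each word on its own, ignoring context."""
    counts = Counter()
    for token in re.findall(r"[a-z]+", text.lower()):
        # The lookup only sees the spelling, so slang and homonyms
        # receive the dictionary meaning regardless of how they are used.
        for emotion in lexicon.get(token, ()):
            counts[emotion] += 1
    return counts
```

For an invented line such as "i wan know what it's all bout", the lookup reports one count of sadness and one of anger, even though neither emotion is present in the intended meaning.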

The sentiment analysis functions can pick up certain contextual clues, namely negation words and intensifiers (e.g., ‘never die’ and ‘really good’), and adjust the sentiment accordingly. The significance of intensifiers is evident in the following graph, which shows that sentiment scores tended towards the extremes.
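To make the idea of negation and intensifier handling concrete, the sketch below shows one simple way such an adjustment could work. It is an assumption-laden illustration, not the algorithm inside the functions we used: the word lists, the polarity values, the 1.5 multiplier, and the name `adjusted_score` are all hypothetical.

```python
import re

# Hypothetical word lists and weights, chosen only for illustration.
NEGATORS = {"not", "never", "no"}
INTENSIFIERS = {"really": 1.5, "very": 1.5}
POLARITY = {"good": 1.0, "die": -1.0}


def adjusted_score(text: str) -> float:
    """Sum word polarities, flipping after a negator and scaling after an intensifier."""
    total = 0.0
    negate = False
    boost = 1.0
    for token in re.findall(r"[a-z']+", text.lower()):
        if token in NEGATORS:
            negate = True
        elif token in INTENSIFIERS:
            boost *= INTENSIFIERS[token]
        elif token in POLARITY:
            value = POLARITY[token] * boost
            total += -value if negate else value
            negate, boost = False, 1.0  # modifiers apply to one word only
        else:
            negate, boost = False, 1.0  # an unrelated word cancels the modifiers
    return total


print(adjusted_score("really good"))  # 1.5
print(adjusted_score("never die"))    # 1.0
```

In this sketch an intensifier pushes a score further from zero, which is consistent with scores tending towards the extremes when intensifiers are common.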

Conclusion