Methodology for Visualisations
The visualisations were mostly created using Matplotlib and Seaborn. Generic examples of the graph types used (violin plots, bar charts, scatter plots, and density charts) can be found at https://www.python-graph-gallery.com/. The visualisations draw on our main dataset, which was manipulated where needed, as detailed below.
All Graphs
Throughout the visualisations and accompanying analysis, the songs in our dataset are separated into three gender categories: male, female, and group. Our code can only assign ‘male’ or ‘female’ to songs with a single artist. Songs that are collaborations between artists or are sung by a group are given a ‘NaN’ value. To make use of this data, I have labelled these songs as ‘group’ using the following code:
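A minimal sketch of this relabelling step is given here. It assumes the main dataset is a pandas DataFrame with a gender column; the names `songs` and `gender` and the sample rows are illustrative, not taken from the project.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the main dataset; column names are assumptions.
songs = pd.DataFrame({
    "song": ["A", "B", "C", "D"],
    "gender": ["male", np.nan, "female", np.nan],
})

# Songs without a single-artist gender label (NaN) are relabelled as 'group'.
songs["gender"] = songs["gender"].fillna("group")
```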

‘Group’ therefore refers to any song performed by a band or by multiple collaborating artists, regardless of the genders of its members.
Bar Charts - Emotional Category Data
To make the bar charts showing the distribution of gender in each emotional category, the following for loop was written to count the number of male, female, and group (mixed-artist) songs in each year and store the counts in an Excel file.
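A counting loop of this kind might look like the sketch below. The column names `year` and `gender`, the sample rows, and the output file name are all assumptions; in practice the loop would presumably be run on the songs belonging to one emotional category at a time.

```python
import pandas as pd

# Hypothetical data; the 'year' and 'gender' column names are assumptions.
songs = pd.DataFrame({
    "year": [1960, 1960, 1960, 1961, 1961],
    "gender": ["male", "female", "male", "group", "male"],
})

rows = []
for year in sorted(songs["year"].unique()):
    in_year = songs[songs["year"] == year]
    # Count the songs in this year for each gender category.
    rows.append({
        "year": int(year),
        "male": int((in_year["gender"] == "male").sum()),
        "female": int((in_year["gender"] == "female").sum()),
        "group": int((in_year["gender"] == "group").sum()),
    })

counts = pd.DataFrame(rows)

# Writing to Excel needs an engine such as openpyxl; the file name is illustrative.
# counts.to_excel("gender_counts_by_year.xlsx", index=False)
```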


Bar Charts - Genre Data
Many songs are labelled with several genres, for example ['hard bop', 'jazz', 'jazz fusion', 'jazz guitar']. In our main dataset, the genres for each song were stored together as a single value rather than in a way that allows access to each individual genre, which is not useful for creating visualisations. Therefore, the following method (Hilsdorf, 2020) was used to convert the genre data into a usable format.
The following shows how the male genre data was gathered:
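One way to gather per-genre data for male artists is sketched below. It assumes the genres are stored as the text of a Python list in a column named `genres`; both the storage format and the column names are assumptions. It uses pandas' `Series.explode`, which may differ from the exact steps in the cited method.

```python
import ast
import pandas as pd

# Hypothetical data; genre lists are assumed to be stored as strings.
songs = pd.DataFrame({
    "gender": ["male", "female", "male"],
    "genres": [
        "['hard bop', 'jazz', 'jazz fusion', 'jazz guitar']",
        "['pop']",
        "['jazz', 'rock']",
    ],
})

# Parse each string into a real Python list.
songs["genres"] = songs["genres"].apply(ast.literal_eval)

# Keep only male artists, then give every genre its own row.
male_genres = songs.loc[songs["gender"] == "male", "genres"].explode()

# Number of male-artist songs labelled with each genre.
male_genre_counts = male_genres.value_counts()
```

The resulting counts can be plotted directly as a bar chart, and the same steps apply to the female and group subsets.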
Scatter Plot - Decade Data
Our dataset consists of the top 100 songs in each year (January), not each decade. However, since there are 62 years' worth of data, it would be practically impossible to tell individual years apart if the colour-coding method of the scatter plot above were used to distinguish them. Therefore, a new list of decades was created and added as a column to the main data frame, so that the 'Song positive and negative salience' graph could be colour-coded by song release decade.
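Building such a decade column might look like the sketch below, assuming the data frame has a numeric `year` column; the column names and sample years are illustrative.

```python
import pandas as pd

# Hypothetical data; the 'year' column name is an assumption.
songs = pd.DataFrame({"year": [1960, 1969, 1975, 1999, 2021]})

# Build a list of decade labels, one per song, e.g. 1969 -> '1960s'.
decades = []
for year in songs["year"]:
    decades.append(f"{(year // 10) * 10}s")

# Add the list to the data frame as a new column.
songs["decade"] = decades
```

The new column can then be passed as the `hue` argument of a Seaborn scatter plot, which gives a handful of decade colours rather than 62 year colours.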

