STAT 503 Case Study: Clustering of music clips

1 Description

This data was collected by Dr Cook from her own CDs. Using a Mac, she read each track into the music editing software Amadeus II, then snipped and saved the first 40 seconds as a WAV file. (WAV is an audio format developed by Microsoft, commonly used on Windows, but it is getting less popular.) These files were read into R using the package tuneR, which converts the audio file into numeric data. All of the CDs contained left and right channels, and variables were calculated on both channels. The resulting data has 62 rows (cases) and 7 columns (variables).

- LVar, LAve, LMax: variance, average and maximum of the frequencies of the left channel.
- LFEner: an indicator of the amplitude, or loudness, of the sound.
- LFreq: median of the locations of the 15 highest peaks in the periodogram.

There are 11 tracks by Abba, 11 from the Beatles and 10 from the Eels, which would be considered to be Rock, and 13 tracks by Vivaldi, 6 by Mozart and 8 by Beethoven, considered to be Classical. There are also 3 tracks from Enya, considered to be New Wave.

The main question we want to answer is:

- Can we group the tracks into a small number of clusters according to their similarity on audio characteristics?

This information might be used to arrange tracks on a digital music player. Other questions of interest might be:

- Do the rock tracks have different characteristics than classical tracks?
- How does Enya compare to rock and classical tracks?
- Are there differences between the tracks of different artists?

2 Plan for Analysis

Approach and reason:

- Summary statistics (marginal and conditional). Reason: extract location and scale information.
- Plots. Reason: explore data distributions.
- Numerical clustering. Reason: group the tracks into clusters of similar audio attributes, using hierarchical, k-means, model-based and self-organizing map methods.

Type of questions addressed:

- How are rock tracks different on average from classical tracks?
- What is the average LAve for Abba relative to other artists?
- Are there unusual tracks?
- Is there any obvious clustering of the tracks?
- Which tracks might be considered alike?

3 Results

3.1 Summary Statistics

        LVar          LAve     LMax          LFEner    LFreq
Mean    1.99 × 10^7    7.81    2.25 × 10^4   104.03    231.39
SD      2.64 × 10^7   47.22    8.76 × 10^3     5.48    176.69

Table 1: Overall means and standard deviations of the variables.

Artist       LVar          LAve     LMax          LFEner    LFreq
Abba         8.52 × 10^6   -81.5    2.35 × 10^4   103       135
Beatles      4.45 × 10^7     5.99   2.76 × 10^4   108       147
Eels         5.11 × 10^7     4.59   3.13 × 10^4   108       181
Beethoven    7.61 × 10^6     0.74   2.11 × 10^4   101       350
Mozart       4.69 × 10^6     5.94   1.89 × 10^4   101       396
Vivaldi      3.00 × 10^6    39.1    1.45 × 10^4   102       305
Enya         5.03 × 10^7    11.8    1.61 × 10^4   103        95

Table 2: Means of the variables by artist.

The classical tracks in general have lower LVar than rock tracks. Abba has substantially lower LAve on average than all other artists, and Vivaldi has substantially higher values on average. The LMax values are similar on average for all artists. Beatles and Eels have higher LFEner values on average. Classical tracks have higher LFreq values on average than rock tracks.

3.2 Plots

The dotplots in Figure 1 show the distribution of values for each artist. Abba tracks have unusually low values of LAve. Two Eels tracks have unusually large LVar values. One Beatles track has an unusually low LFEner value.

[Figure 1: Dotplots of each variable (LVar, LAve, LMax, LFEner, LFreq) by Artist.]

This is a snapshot from the tour that reveals a number of features in the data:

- Saturday Morning and V6 are two unusual tracks that are simply outliers.
- Several tracks are different from their type of music: Hey Jude, B4, B8.
- There is some obvious clustering. The Abba tracks are distinguishable from the tracks of other artists, mostly due to LAve. There is a cluster of rock tracks, a mixture of Eels and Beatles tracks.

4 Cluster Analysis

4.1 Hierarchical

[Figure: Dendrograms of the music tracks from hierarchical clustering (hclust) with Ward, single and complete linkage.]

Ward's linkage suggests two clusters are suitable to summarize the data. This would result in one cluster of 14 purely rock tracks and a second cluster of 48 mixed tracks. A three-cluster solution would break the large cluster into two: one with 12 tracks (8 rock, 3 classical, 1 new wave) and the other with 36 tracks (10 rock, 24 classical, 2 new wave).

With single linkage, individual tracks are sequentially peeled off the pack, illustrating the skewed nature of the data. Saturday Morning and Girl are singleton clusters. The other 12 tracks from the Ward's linkage first cluster are grouped together by single …
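
The way single linkage peels individual cases off a skewed data set can be seen on a small example. The sketch below is a naive, standard-library Python version of single-linkage agglomerative clustering. It is not the code used for this case study, and the function name single_linkage and the toy values are made up for illustration; they are not the music data.

```python
from itertools import combinations
from math import dist


def single_linkage(points, n_clusters):
    """Naive single-linkage agglomerative clustering.

    points: list of equal-length numeric tuples.
    Returns a list of clusters, each a sorted list of row indices.
    """
    # Start with every case in its own cluster.
    clusters = [[i] for i in range(len(points))]

    def gap(pair):
        # Single linkage: the distance between two clusters is the
        # smallest distance between any two of their members.
        a, b = pair
        return min(dist(points[i], points[j])
                   for i in clusters[a] for j in clusters[b])

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters (a < b by construction).
        a, b = min(combinations(range(len(clusters)), 2), key=gap)
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


# Hypothetical right-skewed, one-variable toy data (NOT the music data).
toy = [(1.0,), (1.2,), (1.5,), (1.9,), (2.4,), (3.0,), (6.0,), (12.0,)]

for k in (3, 2):
    sizes = sorted(len(c) for c in single_linkage(toy, k))
    print(k, "clusters, sizes:", sizes)
# 3 clusters, sizes: [1, 1, 6]
# 2 clusters, sizes: [1, 7]
```

On this toy data the two largest values end up as singleton clusters while everything else stays in one pack, which is the same pattern as the singleton tracks in the single-linkage dendrogram.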