
STAT 503X Case Study 2: Italian Olive Oils

1 Description

This data consists of the percentage composition of 8 fatty acids (palmitic, palmitoleic, stearic, oleic, linoleic, linolenic, arachidic, eicosenoic) found in the lipid fraction of 572 Italian olive oils. An analysis of this data is given in Forina, Armanino, Lanteri and Tiscornia (1983). There are 9 collection areas: 4 from southern Italy (North and South Apulia, Calabria, Sicily), 2 from Sardinia (Inland and Coastal) and 3 from northern Italy (Umbria, East and West Liguria).

The data available are:

  Region             South, North or Sardinia
  Area               Sub-regions within the larger regions (North and South Apulia, Calabria, Sicily, Inland and Coastal Sardinia, Umbria, East and West Liguria)
  Palmitic Acid      Percentage × 100 in sample
  Palmitoleic Acid   Percentage × 100 in sample
  Stearic Acid       Percentage × 100 in sample
  Oleic Acid         Percentage × 100 in sample
  Linoleic Acid      Percentage × 100 in sample
  Linolenic Acid     Percentage × 100 in sample
  Arachidic Acid     Percentage × 100 in sample
  Eicosenoic Acid    Percentage × 100 in sample

The primary question is: How do we distinguish the oils from different regions and areas in Italy based on their combinations of the fatty acids?

2 Suggested Approaches

  Approach: Data Restructuring
  Reason:   This is very clean data, so I don't see any need to restructure.

  Approach: Summary Statistics
  Reason:   To get at location and scale information for each variable, and by groups.

  Approach: Visual Inspection. The aim of visual methods is to understand separations, help decide the best classification method, and hence interpret solutions.
  Reason:   Univariate Plots, Bivariate Plots, Touring Plots.

  Approach: Numerical Analysis. The aim of numerical solutions is to get the best predictive results.
  Reason:   Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), Classification and Regression Trees (CART), Forests, Feed-Forward Neural Networks, and Support Vector Machines.

Type of questions addressed

  - What is the average percent composition of eicosenoic acid overall?
  - Is there a difference in the average percentage of eicosenoic acid
    for olives from different growing regions?
  - Are there differences in the fatty acid composition between the olives from different growing regions?
  - Can we define the percentage composition of fatty acids that distinguishes olives from different growing regions?

3 Actual Approaches

3.1 Summary Statistics

The following tables contain information on the minimum, maximum, median, mean and standard deviation for the fatty acids, in total and broken down by growing region.

All 572 observations (values reported as percentages):

                 Min  Median     Max    Mean  Std Dev
  palmitic      6.10   12.01   17.53   12.32     1.69
  palmitoleic   0.15    1.10    2.80    1.26     0.53
  stearic       1.52    2.23    3.75    2.29     0.37
  oleic        63.00   73.02   84.10   73.12     4.06
  linoleic      4.48   10.30   14.70    9.81     2.43
  linolenic     0.00    0.33    0.74    0.32     0.13
  arachidic      0.0    0.61    1.05    0.58     0.22
  eicosenoic    0.01    0.17    0.58    0.16     0.14

From the summary statistics we notice that olive oils consist mostly of oleic acid. Palmitic acid accounts for about 12% on average, and each of the remaining acids for less than 10% on average.

Means by region and by area:

                   South  Sardinia     North  Nth Apul  Calabria  Sth Apul    Sicily Inla Sard Coas Sard  East Lig  West Lig    Umbria
  n                  323        98       151        25        56       206        36        65        33        50        50        51
  palmitic         13.32     11.11     10.95     10.27     13.02     13.96      12.3     10.98     11.38     11.45      10.3     10.86
  palmitoleic       1.55      0.97      0.84      0.62      1.21      1.84      1.05      0.95      1.01      0.84      1.08      0.60
  stearic           2.29      2.26      2.31      2.35      2.63      2.11      2.74      2.17      2.44      2.41      2.57      1.94
  oleic            71.00     72.68     77.93     78.20     73.07     69.11     73.58     73.61     70.86     77.46     76.74     79.56
  linoleic         10.33     11.97      7.27      7.06      8.19     11.66      8.35     11.25     13.37      6.89      8.97      5.97
  linolenic         0.38      0.27      0.22      0.43      0.46      0.35      0.42      0.29      0.24      0.26      0.05      0.34
  arachidic         0.63      0.73      0.38      0.72      0.64      0.60      0.76      0.74      0.72      0.64      0.07      0.42
  eicosenoic        0.27      0.02      0.02      0.35      0.28      0.24      0.38      0.02      0.02      0.02      0.02      0.02

Southern oils have much higher eicosenoic acid content on average, and slightly higher palmitic and palmitoleic acid content. The northern and Sardinian oils differ somewhat in their average oleic, linoleic and arachidic acid content. Among the southern oils there is some
difference in most of the averages. The northern oils also show some difference in most of the averages.

3.2 Visual Inspection

The objective here is to find differences in the measured variables amongst the classes. Differences here might be actual separations between classes on a variable, or on a linear combination of variables. The approach is relatively simple:

  1. Use color and/or symbol to code the categorical class information into the plot.
  2. Begin with low-dimensional plots (histogram, density plot, dot plot, scatterplot) of the measured variables and work up to high-dimensional plots (parallel coordinate plot, tours), exploring class structure in relation to the data space.

3.2.1 Regions: Univariate Plots

Using 1D Plot mode, work through the variables sequentially, either by manually selecting variables or by cycling through them automatically, to examine separations between regions. It is possible to neatly separate the oils from southern Italy from the other two regions using just one variable, eicosenoic acid. Figure 1 displays a textured dotplot and an ASH plot of this variable. The oils from southern Italy are then removed, and we concentrate on separating the oils from northern Italy and Sardinia. Although a clear separation between these two regions cannot be found using one variable, two of the variables, oleic and linoleic acid, appear to be important for the separation.

Figure 1.

3.2.2 Regions: Bivariate Plots

Starting from the two variables identified by the univariate plots as important for separating northern Italian oils from Sardinian oils, the remaining variables are explored in relation to these two using scatterplots. Oleic acid and linoleic acid show some separation between the regions, but not a clean one. Arachidic acid and linoleic acid display a clear separation between the regions, but the boundary is very non-linear.

3.2.3 Regions: Multivariate Plots

Starting from the three variables found from bivariate plots to be important for separating northern Italian and Sardinian oils, we use a higher dimensional technique to
explore them. Using either Rotation, Tour1D or Tour2D, examine the separation between the two regions in the 3-dimensional space. Figure 2 shows the results of using Tour1D on the three variables. The two regions can be separated cleanly by a linear combination of linoleic and arachidic acid, with coefficients of roughly 0.957 on linoleic and 0.289 on arachidic.

3.2.4 Areas: Northern Italy

There are three areas in the region: Umbria, East and West Liguria. From univariate plots there are no clear separations between areas, although
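
As a small worked check on the Tour1D result for northern versus Sardinian oils, the sketch below evaluates the quoted linear combination of linoleic and arachidic acid. Several things here are assumptions rather than part of the case study: the sign joining the two terms is not recoverable from the text, so a plus sign is assumed; the function names `projection_norm` and `project` are hypothetical; and the projection is applied to the raw group means from the by-region summary table, whereas tour software often works on standardized variables. It is meant only to illustrate what "a linear combination" means here, not to reproduce the figure.

```python
import math

# Coefficients quoted for the Tour1D projection (rounded in the text).
# ASSUMPTION: both coefficients are taken as positive; the original sign
# between the two terms did not survive in the source.
COEFFS = {"linoleic": 0.957, "arachidic": 0.289}


def projection_norm(coeffs):
    """Euclidean length of the coefficient vector."""
    return math.sqrt(sum(c * c for c in coeffs.values()))


def project(sample, coeffs):
    """1D projection: weighted sum of the named variables in `sample`."""
    return sum(c * sample[name] for name, c in coeffs.items())


# Group means (percent) copied from the by-region summary table.
sardinia_mean = {"linoleic": 11.97, "arachidic": 0.73}
north_mean = {"linoleic": 7.27, "arachidic": 0.38}

# The coefficients form an (approximately) unit-length direction.
print(round(projection_norm(COEFFS), 3))  # 1.0

# Under the assumed signs, the Sardinian mean projects higher than the
# northern mean along this direction.
print(project(sardinia_mean, COEFFS) > project(north_mean, COEFFS))  # True
```

Comparing two group means says nothing about overlap between individual oils; the clean separation reported in the text comes from viewing all observations in the projection.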



ISU STAT 503 - Italian Olive Oils
