Chapter 2
Getting started with qplot

2.1 Introduction

In this chapter, you will learn to make a wide variety of plots with your first ggplot2 function, qplot(), short for quick plot. qplot() makes it easy to produce complex plots, often requiring several lines of code using other plotting systems, in one line. qplot() can do this because it's based on the grammar of graphics, which allows you to create a simple, yet expressive, description of the plot. In later chapters you'll learn to use all of the expressive power of the grammar, but here we'll start simple so you can work your way up. You will also start to learn some of the ggplot2 terminology that will be used throughout the book.

qplot() has been designed to be very similar to plot(), which should make it easy if you're already familiar with plotting in R. Remember, during an R session you can get a summary of all the arguments to qplot() with R help, ?qplot.

In this chapter you'll learn:

• The basic use of qplot(). If you're already familiar with plot(), this will be particularly easy, § 2.3.
• How to map variables to aesthetic attributes, like colour, size and shape, § 2.4.
• How to create many different types of plots by specifying different geoms, and how to combine multiple types in a single plot, § 2.5.
• The use of faceting, also known as trellising or conditioning, to break apart subsets of your data, § 2.6.
• How to tune the appearance of the plot by specifying some basic options, § 2.7.
• A few important differences between plot() and qplot(), § 2.8.

2.2 Datasets

In this chapter we'll just use one data source, so you can get familiar with the plotting details rather than having to familiarise yourself with different datasets. The diamonds dataset consists of prices and quality information about 54,000 diamonds, and is included in the ggplot2 package.
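To get a first look at the dataset before plotting, something like the following sketch may help. It assumes the ggplot2 package is already installed; the calls used (library(), dim(), str(), head()) are standard R functions, and nothing here is specific to the book's own code.

```r
# Assumption: ggplot2 is installed. The diamonds data frame ships with it.
library(ggplot2)

# Number of rows (one per diamond) and columns (one per variable).
dim(diamonds)

# Column names and types: carat, cut, color, clarity, depth, table, price, x, y, z.
str(diamonds)

# The first six rows of the data.
head(diamonds)
```

Running these three calls once at the start of a session is a quick way to confirm the data loaded as expected.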
The data contains the four C's of diamond quality, carat, cut, colour and clarity; and five physical measurements, depth, table, x, y and z, as described in Figure 2.1. The first few rows of the data are shown in Table 2.1.

carat  cut        color  clarity  depth  table  price  x     y     z
0.2    Ideal      E      SI2      61.5   55.0   326    3.95  3.98  2.43
0.2    Premium    E      SI1      59.8   61.0   326    3.89  3.84  2.31
0.2    Good       E      VS1      56.9   65.0   327    4.05  4.07  2.31
0.3    Premium    I      VS2      62.4   58.0   334    4.20  4.23  2.63
0.3    Good       J      SI2      63.3   58.0   335    4.34  4.35  2.75
0.2    Very Good  J      VVS2     62.8   57.0   336    3.94  3.96  2.48

Table 2.1: diamonds dataset. The variables depth, table, x, y and z refer to the dimensions of the diamond as shown in Figure 2.1.

[Figure 2.1: diagram of a diamond labelled with x, y, z, table width and z depth. depth = z depth / z * 100; table = table width / x * 100.]

Fig. 2.1: How the variables x, y, z, table and depth are measured.

The dataset has not been well cleaned, so as well as demonstrating interesting relationships about diamonds, it also demonstrates some data quality problems. We'll also use another dataset, dsmall, which is a random sample of 100 diamonds. We'll use this data for plots that are more appropriate for smaller datasets.

> set.seed(1410) # Make the sample reproducible
> dsmall <- diamonds[sample(nrow(diamonds), 100), ]

2.3 Basic use

As with plot(), the first two arguments to qplot() are x and y, giving the x- and y-coordinates for the objects on the plot. There is also an optional data argument. If this is specified, qplot() will look inside that data frame before looking for objects in your workspace. Using the data argument is recommended: it's a good idea to keep related data in a single data frame. If you don't specify one, qplot() will try to build one up for you and may look in the wrong place.

Here is a simple example of the use of qplot(). It produces a scatterplot showing the relationship between the price and carats (weight) of a diamond.

> qplot(carat, price, data = diamonds)

The plot shows a strong correlation with notable outliers and some interesting vertical striation.
The relationship looks exponential, though, so the first thing we'd like to do is to transform the variables. Because qplot() accepts functions of variables as arguments, we plot log(price) vs. log(carat):

> qplot(log(carat), log(price), data = diamonds)

The relationship now looks linear. With this much overplotting, though, we need to be cautious about drawing firm conclusions.

Arguments can also be combinations of existing variables, so, if we are curious about the relationship between the volume of the diamond (approximated by x × y × z) and its weight, we could do the following:

> qplot(carat, x * y * z, data = diamonds)

We would expect the density (weight/volume) of diamonds to be constant, and so see a linear relationship between volume and weight. The majority of diamonds do seem to fall along a line, but there are some large outliers.

2.4 Colour, size, shape and other aesthetic attributes

The first big difference when using qplot() instead of plot() comes when you want to assign colours (or sizes or shapes) to the points on your plot. With plot(), it's your responsibility to convert a categorical variable in your data (e.g., "apples", "bananas", "pears") into something that plot() knows how to use (e.g., "red", "yellow", "green"). qplot() can do this for you automatically, and it will automatically provide a legend that maps the displayed attributes to the data values. This makes it easy to include additional data on the plot.

In the next example, we augment the plot of carat and price with information about diamond colour and cut. The results are shown in Figure 2.2.

> qplot(carat, price, data = dsmall, colour = color)
> qplot(carat, price, data = dsmall, shape = cut)

[Figure 2.2: two panels, each plotting price (y-axis) against carat (x-axis). Left legend: color, D to J. Right legend: cut, Fair, Good, Very Good, Premium, Ideal.]

Fig. 2.2: Mapping point colour to diamond colour (left), and point shape to cut quality (right).

Colour, size and shape are all examples of aesthetic attributes, visual properties that affect the way observations are displayed. For every aesthetic attribute, there is a function, called a scale, which maps data values to valid values for that aesthetic. It is this scale that controls the appearance of the points and associated legend. For example, in
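As a rough illustration of what a scale does, and of the manual conversion that plot() would otherwise leave to you, here is a toy sketch in base R. The helper make_colour_scale() is hypothetical and written only for this example; it is not a ggplot2 function and does not reflect how ggplot2 implements its scales.

```r
# Hypothetical helper (not part of ggplot2): builds a toy "scale",
# i.e. a function that maps each distinct data value to a colour.
make_colour_scale <- function(values, palette = grDevices::rainbow) {
  lv <- levels(factor(values))   # the distinct data values
  cols <- palette(length(lv))    # one colour per distinct value
  names(cols) <- lv
  # The returned function looks up the colour for each input value.
  function(x) unname(cols[as.character(x)])
}

# The seven diamond colour grades shown in the Figure 2.2 legend.
colour_grades <- c("D", "E", "F", "G", "H", "I", "J")
scale_fn <- make_colour_scale(colour_grades)

# The same grade always maps to the same colour.
out <- scale_fn(c("D", "J", "D"))
out
```

With plot(), a vector like out would have to be built by hand and passed as the col argument, and the legend drawn separately; qplot() handles both steps when given colour = color.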



UW-Madison STAT 849 - Chapter 2 Getting started with qplot
