Datavis with ggplot2

The ggplot2 package is a plotting system for R, based on the grammar of graphics, which tries to take the good parts of base and lattice graphics and none of the bad parts.


1. Introduction: Plotting with ggplot2

First, install the ggplot2 package (this only needs to be done once) and load it in each session:

install.packages("ggplot2")
library(ggplot2)
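If you rerun the script often, reinstalling every time is unnecessary. A minimal sketch, using base R's requireNamespace(), that installs ggplot2 only when it is missing:

```r
# Install ggplot2 only if it is not already installed, then attach it
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
library(ggplot2)
```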

For this session, we will explore the iris dataset, which comes pre-loaded with R (it is part of the datasets package), so nothing needs to be imported.

head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

You can read about the data by typing

?iris
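ggplot2 chooses scales from column types (numeric columns get continuous scales, factors get discrete ones), so it helps to check the structure of the data before plotting. A quick check with base R:

```r
# Column types: four numeric measurements and one factor (Species)
str(iris)

# The factor levels that will become the color groups in later plots
levels(iris$Species)
# returns "setosa" "versicolor" "virginica"
```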


1.1. A simple function: qplot

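The general form is qplot(x, y, data, ...). Note that qplot() has been deprecated since ggplot2 3.4.0: it still works but prints a deprecation warning, and ggplot() (section 1.2) is the recommended interface. To produce a basic scatter plot, type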

qplot(x=Sepal.Length, y=Petal.Length, data=iris)   

Next, color the points by species and scale their size by petal width:

qplot(Sepal.Length, Petal.Length, data=iris,
      color=Species, size=Petal.Width, 
      xlab="Sepal", ylab="Petal", main="Iris dataset")

The main, xlab and ylab arguments add a title and labels for the x- and y-axis.

To make the y-axis start at 0, add ylim to the call, e.g. qplot(..., ylim=c(0, 7)).
This sets the lower and upper bounds of the y-axis (the largest petal length in iris is 6.9).
The xlim argument does the same for the x-axis.
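A minimal sketch setting both limits in one qplot() call. The bounds are illustrative choices picked to cover the largest sepal length (7.9) and petal length (6.9) in iris; points falling outside the limits would be dropped with a warning.

```r
library(ggplot2)

# Start both axes at 0; upper bounds of 8 and 7 keep every iris point visible
p <- qplot(Sepal.Length, Petal.Length, data = iris,
           xlim = c(0, 8), ylim = c(0, 7),
           xlab = "Sepal Length", ylab = "Petal Length")
print(p)
```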

qplot(Sepal.Length, Petal.Length, data=iris, 
      color=Species, size=Petal.Width, alpha=I(0.7), 
      xlab="Sepal Length", ylab="Petal Length", main="Iris dataset")

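By setting the alpha (opacity) of each point to 0.7, we reduce the effects of overplotting. Wrapping the value in I() makes it a fixed setting rather than a mapping, so qplot does not treat 0.7 as data or add a legend for it.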


1.2. A robust function: ggplot

General formula:
ggplot(data, aes(x, y)) + geom_*()
ggplot() starts a plot, which you finish by adding layers with +, typically one or more geom_*() functions. ggplot() offers more control than qplot().
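To make the layering concrete, here is a sketch that rebuilds the iris plot from section 1.1 one component at a time. labs() is ggplot2's counterpart to qplot's xlab, ylab and main; the variable name p is an arbitrary choice.

```r
library(ggplot2)

# ggplot() sets the data and default mappings but draws no layers yet
p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length))

# Adding a geom layer turns it into a scatter plot; color is mapped per layer
p <- p + geom_point(aes(color = Species))

# Axis labels and title are one more component added with +
p <- p + labs(x = "Sepal Length", y = "Petal Length", title = "Iris dataset")
print(p)
```

Because each + returns a new plot object, you can keep a base plot and try different geoms or themes on it without retyping the data and mappings.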

  • aes(): aesthetic mappings, which link variables in the data to visual properties of the graph
    • common aesthetics: color, fill, shape, size
  • geom: the geometric object used to draw the data
    • geom_point; geom_line; geom_bar; geom_histogram; geom_hex; etc.
    • Each geom has a default stat and position adjustment, and can take its own aesthetic mappings: geom_*(aes(color=, fill=, size=, ...))
  • additional elements:
    • add a smoothing trend: + geom_smooth(method="lm")
    • change the background with a theme: + theme_bw(), + theme_classic()
For example, a scatter plot of engine displacement against fuel economy from the built-in mtcars dataset:

ggplot(mtcars, aes(x=disp, y=mpg)) + geom_point()
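The additional elements from the list are just more + terms. A sketch combining them on the mtcars example: geom_smooth(method = "lm") fits a linear regression and, by default, shades its confidence band, while theme_bw() replaces the grey background with a white one. Passing formula = y ~ x only silences the message ggplot2 prints about the formula it picked.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point() +
  # Linear trend; the shaded band is the confidence interval (se = TRUE by default)
  geom_smooth(method = "lm", formula = y ~ x) +
  # White background with grey grid lines instead of the default grey theme
  theme_bw()
print(p)
```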

From now on, we will only work using ggplot.



2. Scatterplot

The diamonds dataset ships with ggplot2. A basic scatter plot of price against carat:

ggplot(diamonds, aes(x=carat, y=price)) + geom_point()

ggplot(diamonds, aes(x=carat, y=price, color=clarity)) + geom_point()

Adding color: the color of the points is determined by the clarity of the diamonds.

ggplot(diamonds, aes(x=carat, y=price, color=cut)) + geom_point()

Here the points are colored by the cut variable of the diamonds dataset instead.

ggplot(diamonds, aes(x=carat, y=price, color=clarity, size=cut)) + geom_point()
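With 53,940 rows, the diamonds plots above suffer heavily from overplotting. The alpha idea from section 1.1 carries over to ggplot(): pass alpha to geom_point() outside aes(), which sets a fixed opacity for every point instead of mapping it to a variable, so no legend is added. The values 0.3 and 0.8 below are illustrative choices, not tuned settings.

```r
library(ggplot2)

# alpha and size are set outside aes(): fixed values with no legend entries
p <- ggplot(diamonds, aes(x = carat, y = price, color = clarity)) +
  geom_point(alpha = 0.3, size = 0.8)
print(p)
```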