Basic Scatter Plot and Linear Fitted Line

Let's scatter some data points in xy-space. Data are scattered everywhere, but what relationship does one specific variable have with another? To keep it simple and stick to the heading, we can use the mtcars dataset in R.

The dataset, taken from the 1974 Motor Trend US magazine, comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles of different models. I will draw a scatter plot of two of its variables and add the fitted line of a linear model.

In R, there are three popular systems for obtaining plots: base graphics, lattice and ggplot2. Here we will create a scatter plot of two variables, mpg (miles per gallon) and disp (displacement), along with the fitted regression line, its equation and the \(R^2\) value, using all three graphics packages.

Let's first fit a linear model and build the text labels that the plots will show:

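# Fit mpg on disp and keep the summary for the R-squared values
mdl <- lm(mpg ~ disp, data = mtcars)
sumry <- summary(mdl)
cf <- round(coef(mdl), 2)

# Equation label; the sign is pasted separately so that a negative
# slope reads "a - b x" rather than "a -b x"
eqn <- paste(terms(mdl)[[2]],
             paste0(cf[1], ifelse(cf[2] < 0, " - ", " + "),
                    abs(cf[2]), " ", terms(mdl)[[3]]), sep = " = ")
# Label with R-squared and adjusted R-squared
sumry.lbl <- paste0("R^2: ", round(sumry$r.squared, 2),
                    ", adj R^2: ", round(sumry$adj.r.squared, 2))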

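The labels above are assembled from a few pieces of the fitted model object. The following is a minimal sketch for inspecting those pieces one by one; it refits the same model so it runs on its own, and the names `resp`, `pred`, `r2` and `adj_r2` are introduced here only for illustration.

```r
# Refit the same model so this snippet is self-contained
mdl <- lm(mpg ~ disp, data = mtcars)

# Named vector of estimates: "(Intercept)" and "disp"
coef(mdl)

# terms() carries the model formula: element 2 is the response,
# element 3 is the right-hand side (a single predictor here)
resp <- as.character(terms(mdl)[[2]])
pred <- as.character(terms(mdl)[[3]])

# R-squared and adjusted R-squared are stored in the summary object
r2 <- summary(mdl)$r.squared
adj_r2 <- summary(mdl)$adj.r.squared
round(c(r2 = r2, adj_r2 = adj_r2), 2)
```

Note that `terms(mdl)[[3]]` is a single name only because the model has one predictor; with several predictors it would be a call such as `disp + wt`, and the label would need a different construction.
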
Plots

Base Graphics

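with(mtcars, {
  # Scatter plot with filled square markers
  plot(disp, mpg, pch = 22, bg = "gray",
       xlab = "Displacement", ylab = "Miles per Gallon",
       main = "Displacement vs Miles per Gallon")
  # Fitted regression line taken from the model object
  abline(mdl, col = "red", lty = 2, lwd = 2)
  # Equation and R-squared in the top-right corner;
  # "mono" is the device-independent monospace family
  text(max(disp), max(mpg), adj = c(1, 1), family = "mono",
       labels = paste(eqn, sumry.lbl, sep = "\n"))
})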

Lattice Plot

library(lattice)
# Custom panel function: points, label and fitted line
lm.panel <- function(x, y, ...) {
  panel.xyplot(x, y, pch = 22, fill = "gray",
               cex = 1.2, col = "black")
  # Equation and R-squared, placed to the left of the top-right corner
  panel.text(max(x), max(y), pos = 2,
             fontfamily = "mono",
             labels = paste(eqn, sumry.lbl, sep = "\n"))
  # Fitted regression line taken from the model object
  panel.abline(mdl, col = "red", lty = 2, lwd = 2)
}
xyplot(mpg ~ disp, data = mtcars,
       panel = lm.panel,
       main = "Displacement vs Miles per Gallon",
       xlab = "Displacement", ylab = "Miles per Gallon")

ggplot2

library(ggplot2)
# qplot() is deprecated since ggplot2 3.4.0, so build the plot with ggplot()
plt <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point(size = 3, shape = 22, fill = "grey") +
  labs(x = "Displacement",
       y = "Miles per Gallon",
       title = "Displacement vs Miles per Gallon")
# Add the fitted line and the equation / R-squared label (top-right corner)
plt + theme_bw() +
  geom_smooth(method = "lm", formula = y ~ x,
              color = "red", linetype = 2) +
  annotate(x = Inf, y = Inf, geom = "text",
           hjust = 1.2, vjust = 1.2,
           family = "mono",
           label = paste(eqn, sumry.lbl, sep = "\n"))
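One difference from the base and lattice versions is worth noting: `geom_smooth(method = "lm")` refits the regression internally and, by default, shades a confidence band around the line, whereas the other two plots draw the line straight from `mdl`. If the aim is to draw exactly the line of the fitted model object in ggplot2 as well, one possible sketch uses `geom_abline()` with the estimated coefficients. The object name `p` is only illustrative, and the model is refitted so the snippet runs on its own.

```r
library(ggplot2)

# Refit the same model so this snippet is self-contained
mdl <- lm(mpg ~ disp, data = mtcars)
cf <- coef(mdl)

# Draw the points, then the line defined by the model's own coefficients
p <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point(size = 3, shape = 22, fill = "grey") +
  geom_abline(intercept = cf[["(Intercept)"]], slope = cf[["disp"]],
              color = "red", linetype = 2) +
  theme_bw()
print(p)
```

Alternatively, keeping `geom_smooth()` and passing `se = FALSE` hides the confidence band while still letting ggplot2 fit the line itself.
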

The fitted regression summary is:


Call:
lm(formula = mpg ~ disp, data = mtcars)

Residuals:
         Min           1Q       Median           3Q          Max 
-4.892200650 -2.202190927 -0.963085639  1.627154680  7.230540273 

Coefficients:
                  Estimate     Std. Error  t value   Pr(>|t|)
(Intercept) 29.59985475616  1.22971951531 24.07041 < 2.22e-16
disp        -0.04121511996  0.00471183331 -8.74715 9.3803e-10

Residual standard error: 3.25145449 on 30 degrees of freedom
Multiple R-squared:  0.71834334,    Adjusted R-squared:  0.708954785 
F-statistic:   76.51266 on 1 and 30 DF,  p-value: 9.38032654e-10

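This means that the estimated effect of displacement on miles per gallon is negative, with a magnitude of about 0.04. In other words, for each one-unit (one cubic inch) increase in displacement, the model predicts that a car travels about 0.04 fewer miles per gallon. The \(R^2\) of 0.72 indicates that displacement alone accounts for roughly 72% of the variation in mpg.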
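The slope interpretation can be checked numerically. The sketch below, which refits the same model so it runs on its own, predicts mpg for two hypothetical cars whose displacement differs by one unit (the values 200 and 201 are arbitrary choices for illustration) and compares the difference with the estimated slope.

```r
# Refit the same model so this snippet is self-contained
mdl <- lm(mpg ~ disp, data = mtcars)

# Two hypothetical cars whose displacement differs by one unit
new_cars <- data.frame(disp = c(200, 201))
pred <- predict(mdl, newdata = new_cars)

# The difference between the two predictions equals the slope
slope_from_pred <- unname(diff(pred))
slope_from_coef <- coef(mdl)[["disp"]]
c(from_predictions = slope_from_pred, from_coefficients = slope_from_coef)

# 95% confidence interval for the slope, for a sense of its uncertainty
confint(mdl, "disp")
```

Because the model is linear, the same reasoning scales: a difference of 100 units of displacement corresponds to a predicted difference of roughly 4 miles per gallon.
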