
Basic Scatter Plot and Linear Fitted Line

Let's look at some points scattered in xy-space. Data can be scattered everywhere, but what relationship does one specific variable have with another? To keep things simple and stick to the heading, we can use the mtcars dataset in R.

The dataset, from the 1974 Motor Trend US magazine, comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles of different models. I will draw a scatter plot of the data and add the fitted line from a linear model.

In R, there are three popular systems for producing plots: base graphics, lattice and ggplot2. Here we will create a scatter plot of two variables, mpg (miles per gallon) and disp (displacement), along with the fitted regression line, its equation and the \(R^2\) value, using all three graphics systems.

Let's first fit a linear model:

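# Fit the simple linear regression of mpg on disp
mdl <- lm(mpg ~ disp, data = mtcars)
sumry <- summary(mdl)
cf <- round(coef(mdl), 2)

# Equation label built from the response, the rounded coefficients and the predictor
eqn <- paste(terms(mdl)[[2]],
             paste0(cf[1], ifelse(cf[2] < 0, " ", " + "),
                    cf[2], " ", terms(mdl)[[3]]), sep = " = ")
# Label holding R-squared and adjusted R-squared
sumry.lbl <- paste0("R^2: ", round(sumry$r.squared, 2),
                    ", adj R^2: ", round(sumry$adj.r.squared, 2))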
# Base graphics: scatter plot, fitted line and label in the top-right corner
with(mtcars, {
  plot(disp, mpg, pch = 22, bg = "gray",
       xlab = "Displacement", ylab = "Miles per Gallon",
       main = "Displacement vs Miles per Gallon")
  abline(mdl, col = "red", lty = 2, lwd = 2)
  # "mono" is the device-independent monospace family in base graphics
  text(max(disp), max(mpg), adj = c(1, 1), family = "mono",
       labels = paste(eqn, sumry.lbl, sep = "\n"))
})

library(lattice)
# lattice: a custom panel function draws the points, the label and the fitted line
lm.panel <- function(x, y, ...) {
  panel.xyplot(x, y, pch = 22, fill = "gray",
               cex = 1.2, col = "black")
  panel.text(max(x), max(y), pos = 2,
             fontfamily = "mono",
             labels = paste(eqn, sumry.lbl, sep = "\n"))
  panel.abline(mdl, col = "red", lty = 2, lwd = 2)
}
xyplot(mpg ~ disp, data = mtcars,
       panel = lm.panel,
       main = "Displacement vs Miles per Gallon",
       xlab = "Displacement", ylab = "Miles per Gallon")

library(ggplot2)
# ggplot2: qplot() is deprecated since ggplot2 3.4.0, so the plot is built with ggplot()
ggplot(mtcars, aes(disp, mpg)) +
  geom_point(size = 3, shape = 22, fill = "grey") +
  geom_smooth(method = "lm", formula = y ~ x, color = "red", linetype = 2) +
  labs(x = "Displacement", y = "Miles per Gallon",
       title = "Displacement vs Miles per Gallon") +
  theme_bw() +
  annotate(geom = "text", x = Inf, y = Inf,
           hjust = 1.2, vjust = 1.2,
           family = "mono",
           label = paste(eqn, sumry.lbl, sep = "\n"))
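
All three plots reuse the same `eqn` label, which pastes the rounded slope directly after the intercept, so a negative slope prints as `29.6 -0.04 disp`. As a minimal sketch of an alternative (the name `eqn2` is made up here and is not part of the code above), `sprintf()` can put an explicit sign between the terms and keep two decimals:

```r
# Refit the same model so this snippet runs on its own
mdl <- lm(mpg ~ disp, data = mtcars)
cf <- coef(mdl)
vars <- all.vars(formula(mdl))  # response first, then predictor

# eqn2 is a hypothetical name for the reformatted label
eqn2 <- sprintf("%s = %.2f %s %.2f %s",
                vars[1], cf[1],
                if (cf[2] < 0) "-" else "+",  # explicit sign of the slope
                abs(cf[2]), vars[2])
print(eqn2)
```

Passing `eqn2` in place of `eqn` in any of the three plotting calls should show the label as `mpg = 29.60 - 0.04 disp`.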

The fitted regression summary is:


Call:
lm(formula = mpg ~ disp, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.8922 -2.2022 -0.9631  1.6272  7.2305 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 29.599855   1.229720  24.070  < 2e-16 ***
disp        -0.041215   0.004712  -8.747 9.38e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.251 on 30 degrees of freedom
Multiple R-squared:  0.7183,    Adjusted R-squared:  0.709 
F-statistic: 76.51 on 1 and 30 DF,  p-value: 9.38e-10
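
The numbers in this printout can also be pulled out of the fitted objects instead of being read off by eye. The sketch below uses only base R; the names `coef.tbl`, `slope` and `ci` are chosen here for illustration.

```r
mdl <- lm(mpg ~ disp, data = mtcars)
sumry <- summary(mdl)

# Coefficient table as a numeric matrix:
# estimate, standard error, t value and p-value per term
coef.tbl <- coef(sumry)
slope <- coef.tbl["disp", "Estimate"]

# 95% confidence intervals for the intercept and the slope
ci <- confint(mdl, level = 0.95)

print(round(slope, 4))
print(round(sumry$sigma, 3))      # residual standard error
print(round(sumry$r.squared, 4))  # multiple R-squared
print(ci)
```

If the upper confidence limit for `disp` is below zero, that agrees with the very small p-value reported for the slope.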

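This means that the estimated effect of displacement on miles per gallon is negative, with a magnitude of about 0.04. In other words, for each one-unit (one cubic inch) increase in displacement, the model predicts that a car travels about 0.04 fewer miles per gallon. The \(R^2\) of 0.72 indicates that displacement alone accounts for roughly 72% of the variation in mpg across these 32 cars.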
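
To make the slope concrete, the fitted model can be used to predict mpg at chosen displacements. This is a small sketch; the displacement values 100 and 200 and the name `new.cars` are arbitrary illustrations, not values taken from the analysis above.

```r
mdl <- lm(mpg ~ disp, data = mtcars)

# Hypothetical displacements (cubic inches) picked for illustration
new.cars <- data.frame(disp = c(100, 200))
pred <- predict(mdl, newdata = new.cars)
print(round(pred, 2))

# The gap between the two predictions equals 100 times the slope
print(round(diff(pred), 2))
```

With the coefficients reported in the summary, a 100 cubic inch increase in displacement corresponds to a predicted drop of about 4.1 miles per gallon.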