- R Tutorial
- R - Home
- R - Overview
- R - Environment Setup
- R - Basic Syntax
- R - Data Types
- R - Variables
- R - Operators
- R - Decision Making
- R - Loops
- R - Functions
- R - Strings
- R - Vectors
- R - Lists
- R - Matrices
- R - Arrays
- R - Factors
- R - Data Frames
- R - Packages
- R - Data Reshaping
- R Data Interfaces
- R - CSV Files
- R - Excel Files
- R - Binary Files
- R - XML Files
- R - JSON Files
- R - Web Data
- R - Database
- R Charts & Graphs
- R - Pie Charts
- R - Bar Charts
- R - Boxplots
- R - Histograms
- R - Line Graphs
- R - Scatterplots
- R Statistics Examples
- R - Mean, Median & Mode
- R - Linear Regression
- R - Multiple Regression
- R - Logistic Regression
- R - Normal Distribution
- R - Binomial Distribution
- R - Poisson Regression
- R - Analysis of Covariance
- R - Time Series Analysis
- R - Nonlinear Least Square
- R - Decision Tree
- R - Random Forest
- R - Survival Analysis
- R - Chi Square Tests
- R Useful Resources
- R - Interview Questions
- R - Quick Guide
- R - Useful Resources
- R - Discussion
R - Multiple Regression
Multiple regression is an extension of linear regression into relationship between more than two variables. In simple linear relation we have one predictor and one response variable, but in multiple regression we have more than one predictor variable and one response variable.
The general mathematical equation for multiple regression is −
y = a + b1x1 + b2x2 +...bnxn
Following is the description of the parameters used −
y is the response variable.
a, b1, b2...bn are the coefficients.
x1, x2, ...xn are the predictor variables.
We create the regression model using the lm() function in R. The model determines the value of the coefficients using the input data. Next we can predict the value of the response variable for a given set of predictor variables using these coefficients.
lm() Function
This function creates the relationship model between the predictor and the response variable.
Syntax
The basic syntax for lm() function in multiple regression is −
lm(y ~ x1+x2+x3...,data)
Following is the description of the parameters used −
formula is a symbol presenting the relation between the response variable and predictor variables.
data is the vector on which the formula will be applied.
Example
Input Data
Consider the data set "mtcars" available in the R environment. It gives a comparison between different car models in terms of mileage per gallon (mpg), cylinder displacement("disp"), horse power("hp"), weight of the car("wt") and some more parameters.
The goal of the model is to establish the relationship between "mpg" as a response variable with "disp","hp" and "wt" as predictor variables. We create a subset of these variables from the mtcars data set for this purpose.
input <- mtcars[,c("mpg","disp","hp","wt")] print(head(input))
When we execute the above code, it produces the following result −
mpg disp hp wt Mazda RX4 21.0 160 110 2.620 Mazda RX4 Wag 21.0 160 110 2.875 Datsun 710 22.8 108 93 2.320 Hornet 4 Drive 21.4 258 110 3.215 Hornet Sportabout 18.7 360 175 3.440 Valiant 18.1 225 105 3.460
Create Relationship Model & get the Coefficients
input <- mtcars[,c("mpg","disp","hp","wt")] # Create the relationship model. model <- lm(mpg~disp+hp+wt, data = input) # Show the model. print(model) # Get the Intercept and coefficients as vector elements. cat("# # # # The Coefficient Values # # # ","\n") a <- coef(model)[1] print(a) Xdisp <- coef(model)[2] Xhp <- coef(model)[3] Xwt <- coef(model)[4] print(Xdisp) print(Xhp) print(Xwt)
When we execute the above code, it produces the following result −
Call: lm(formula = mpg ~ disp + hp + wt, data = input) Coefficients: (Intercept) disp hp wt 37.105505 -0.000937 -0.031157 -3.800891 # # # # The Coefficient Values # # # (Intercept) 37.10551 disp -0.0009370091 hp -0.03115655 wt -3.800891
Create Equation for Regression Model
Based on the above intercept and coefficient values, we create the mathematical equation.
Y = a+Xdisp.x1+Xhp.x2+Xwt.x3 or Y = 37.15+(-0.000937)*x1+(-0.0311)*x2+(-3.8008)*x3
Apply Equation for predicting New Values
We can use the regression equation created above to predict the mileage when a new set of values for displacement, horse power and weight is provided.
For a car with disp = 221, hp = 102 and wt = 2.91 the predicted mileage is −
Y = 37.15+(-0.000937)*221+(-0.0311)*102+(-3.8008)*2.91 = 22.7104