Machine Learning - Multiple Linear Regression

Multiple linear regression extends simple linear regression to predict a response using two or more features. Mathematically, it can be described as follows −

Consider a dataset with n observations, p features (the independent variables), and one response y (the dependent variable). The regression line for p features can be calculated as follows −

$$h\left ( x_{i} \right )=b_{0}+b_{1}x_{i1}+b_{2}x_{i2}+\cdot \cdot \cdot +b_{p}x_{ip}$$

Here, $h\left ( x_{i} \right )$ is the predicted response value for the i-th observation, $x_{ij}$ is the value of the j-th feature for that observation, and $b_{0},b_{1},b_{2},\ldots,b_{p}$ are the regression coefficients.
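As a minimal sketch of how this formula is evaluated, the snippet below computes $h(x_i)$ for every observation at once with NumPy. The data matrix `X` and the values of `b0` and `b` are made up purely for illustration; they do not come from any fitted model.

```python
import numpy as np

# Hypothetical data: n = 3 observations, p = 2 features
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

b0 = 0.5                    # intercept b_0 (illustrative value)
b = np.array([2.0, -1.0])   # coefficients b_1, b_2 (illustrative values)

# h(x_i) = b_0 + b_1 * x_i1 + b_2 * x_i2, evaluated for all rows of X
h = b0 + X @ b

print(h)  # [0.5 2.5 4.5]
```

Each entry of `h` is the prediction for one row of `X`, so the matrix product replaces an explicit loop over observations and features.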

Real data rarely falls exactly on the regression line, so a multiple linear regression model includes a residual error term $e_{i}$ for each observation. The observed response $y_{i}$ is then written as follows −

$$y_{i}=b_{0}+b_{1}x_{i1}+b_{2}x_{i2}+\cdot \cdot \cdot +b_{p}x_{ip}+e_{i}$$

We can also write the above equation as follows −

$$y_{i}=h\left ( x_{i} \right )+e_{i}\quad \text{or} \quad e_{i}=y_{i}-h\left ( x_{i} \right )$$
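The residual definition can be sketched in code as follows. This example assumes synthetic data generated from known coefficients (`true_b0`, `true_b`) and estimates the coefficients with ordinary least squares via `np.linalg.lstsq`, which chooses the values that minimize the sum of squared residuals. All names and numbers here are illustrative assumptions.

```python
import numpy as np

# Synthetic data (assumed for illustration): n = 50 observations, p = 3 features
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
true_b0 = 1.0
true_b = np.array([2.0, -1.0, 0.5])
y = true_b0 + X @ true_b + rng.normal(scale=0.1, size=n)

# Prepend a column of ones so that the intercept b_0 is estimated too
X_design = np.column_stack([np.ones(n), X])

# Ordinary least squares: minimizes the sum of squared residuals
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)

h = X_design @ coef   # predicted responses h(x_i)
e = y - h             # residuals e_i = y_i - h(x_i)

print("Estimated coefficients (b0, b1, ..., bp):", coef)
print("Sum of squared residuals:", np.sum(e ** 2))
```

Because the model includes an intercept, the least-squares residuals average to zero (up to floating-point error), which is a quick sanity check on the fit.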

Python Implementation

To implement multiple linear regression in Python using Scikit-Learn, we can use the same LinearRegression class as in simple linear regression, but this time we need to provide multiple independent variables as input.
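Before turning to a real dataset, here is a small sketch of passing several independent variables to `LinearRegression`. The feature matrix and the coefficients used to generate `y` are synthetic and chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumed for illustration): 200 observations, 3 features
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 3))
y = (4.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + 0.3 * X[:, 2]
     + rng.normal(scale=0.5, size=200))

# X has one column per feature; fit() estimates one coefficient per column
model = LinearRegression()
model.fit(X, y)

print("Intercept (b0):", model.intercept_)
print("Coefficients (b1..bp):", model.coef_)
```

The only difference from simple linear regression is the shape of `X`: it has one column per feature, and `coef_` holds one estimated coefficient for each of those columns.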

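Let's consider the Boston Housing dataset from Scikit-Learn and implement multiple linear regression using it. Note that load_boston() was deprecated in Scikit-Learn 1.0 and removed in version 1.2, so the example below runs as written only on older versions.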

Example

from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np
import matplotlib.pyplot as plt

# Load the Boston Housing dataset
boston = load_boston()

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    boston.data, boston.target, test_size=0.2, random_state=0)

# Create a linear regression object
lr_model = LinearRegression()

# Fit the model on the training data
lr_model.fit(X_train, y_train)

# Make predictions on the test data
y_pred = lr_model.predict(X_test)

# Calculate the mean squared error
mse = mean_squared_error(y_test, y_pred)

# Calculate the coefficient of determination
r2 = r2_score(y_test, y_pred)

print('Mean Squared Error:', mse)
print('Coefficient of Determination:', r2)

# Plot the predicted values against the actual values
plt.figure(figsize=(7.5, 3.5))
plt.scatter(y_test, y_pred)
plt.xlabel('Actual Values')
plt.ylabel('Predicted Values')

# Add the reference line y = x, on which a perfect prediction would fall
x = np.linspace(0, 50, 100)
y = x
plt.plot(x, y, color='red')

# Show the plot
plt.show()

In this code, we first load the Boston Housing dataset using the load_boston() function from Scikit-Learn. We then split the dataset into training and testing sets using the train_test_split() function.

Next, we create a LinearRegression object and fit it on the training data using the fit() method. We then make predictions on the test data using the predict() method and calculate the mean squared error and coefficient of determination using the mean_squared_error() and r2_score() functions, respectively.
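To make the two metrics concrete, the sketch below computes the mean squared error and the coefficient of determination by hand and compares them with the Scikit-Learn functions. The arrays `y_true` and `y_hat` are small made-up values, not results from the housing example.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up actual and predicted values, for illustration only
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 8.0])

# Mean squared error: the average of the squared residuals
mse_manual = np.mean((y_true - y_hat) ** 2)

# Coefficient of determination: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(mse_manual, mean_squared_error(y_true, y_hat))
print(r2_manual, r2_score(y_true, y_hat))
```

A lower mean squared error means predictions are closer to the actual values, while a coefficient of determination closer to 1 means the model explains more of the variance in the response.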

Finally, we plot the predicted values against the actual values using the scatter() function and add the reference line y = x using the plot() function. Points on this line are perfect predictions, so the closer the points lie to it, the better the fit. We label the x-axis and y-axis using the xlabel() and ylabel() functions and display the plot using the show() function.

Output

When you execute the program, it displays the scatter plot of predicted against actual values and prints the Mean Squared Error and the Coefficient of Determination on the terminal −

Mean Squared Error: 33.44897999767653
Coefficient of Determination: 0.5892223849182507
