Statistics - Linear regression
Once the degree of relationship between variables has been established using correlation analysis, it is natural to explore the nature of that relationship. Regression analysis helps determine the cause-and-effect relationship between variables: the value of the dependent variable can be predicted from the values of the independent variable, using either a graphical method or an algebraic method.
Graphical Method
It involves drawing a scatter diagram with the independent variable on the X-axis and the dependent variable on the Y-axis. A line is then drawn so that it passes through most of the distribution, with the remaining points spread almost evenly on either side of the line.
The regression line is known as the line of best fit, summarizing the general movement of the data. It shows the best mean values of one variable corresponding to the mean values of the other. The regression line is based on the criterion that it is the straight line minimizing the sum of squared deviations between the predicted and observed values of the dependent variable.
Algebraic Method
The algebraic method develops two regression equations: one of Y on X, and one of X on Y.
Regression equation of Y on X
${Y = a+bX}$
Where −
${Y}$ = Dependent variable
${X}$ = Independent variable
${a}$ = Constant showing Y-intercept
${b}$ = Constant showing slope of line
The values of a and b are obtained from the following normal equations:
${\sum Y = Na + b\sum X \\[7pt] \sum XY = a \sum X + b \sum X^2 }$
Where −
${N}$ = Number of observations
Regression equation of X on Y
${X = a+bY}$
Where −
${X}$ = Dependent variable
${Y}$ = Independent variable
${a}$ = Constant showing Y-intercept
${b}$ = Constant showing slope of line
The values of a and b are obtained from the following normal equations:
${\sum X = Na + b\sum Y \\[7pt] \sum XY = a \sum Y + b \sum Y^2 }$
Where −
${N}$ = Number of observations
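Either pair of normal equations is a 2×2 linear system in the unknowns a and b, so it can be solved directly. The following is a minimal Python sketch of this approach (the function name is illustrative, not part of any library):

```python
def fit_line(x, y):
    """Solve the normal equations
        sum(y)  = N*a + b*sum(x)
        sum(xy) = a*sum(x) + b*sum(x^2)
    for the regression line y = a + b*x, returning (a, b)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(u * v for u, v in zip(x, y))
    det = n * sxx - sx * sx          # determinant of the 2x2 system
    a = (sy * sxx - sx * sxy) / det  # Cramer's rule for a
    b = (n * sxy - sx * sy) / det    # Cramer's rule for b
    return a, b
```

Calling `fit_line(y, x)` with the arguments swapped yields the regression equation of X on Y.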
Example
Problem Statement:
A researcher has found that there is a correlation between the weights of fathers and their sons. He is now interested in developing regression equations for the two variables from the given data:
Weight of father (in Kg) | 69 | 63 | 66 | 64 | 67 | 64 | 70 | 66 | 68 | 67 | 65 | 71 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Weight of Son (in Kg) | 70 | 65 | 68 | 65 | 69 | 66 | 68 | 65 | 71 | 67 | 64 | 72 |
Develop
Regression equation of Y on X.
Regression equation of X on Y.
Solution:
${X}$ | ${X^2}$ | ${Y}$ | ${Y^2}$ | ${XY}$ |
---|---|---|---|---|
69 | 4761 | 70 | 4900 | 4830 |
63 | 3969 | 65 | 4225 | 4095 |
66 | 4356 | 68 | 4624 | 4488 |
64 | 4096 | 65 | 4225 | 4160 |
67 | 4489 | 69 | 4761 | 4623 |
64 | 4096 | 66 | 4356 | 4224 |
70 | 4900 | 68 | 4624 | 4760 |
66 | 4356 | 65 | 4225 | 4290 |
68 | 4624 | 71 | 5041 | 4828 |
67 | 4489 | 67 | 4489 | 4489 |
65 | 4225 | 64 | 4096 | 4160 |
71 | 5041 | 72 | 5184 | 5112 |
${\sum X = 800}$ | ${\sum X^2 = 53,402}$ | ${\sum Y = 810}$ | ${\sum Y^2 = 54,750}$ | ${\sum XY = 54,059}$ |
Regression equation of Y on X
${Y = a+bX}$
where a and b are obtained from the normal equations:
${\Rightarrow}$ 810 = 12a + 800b ... (i)
${\Rightarrow}$ 54,059 = 800a + 53,402b ... (ii)
Multiplying equation (i) by 800 and equation (ii) by 12, we get:
9600a + 640,000b = 648,000 ... (iii)
9600a + 640,824b = 648,708 ... (iv)
Subtracting equation (iii) from (iv):
824b = 708
${\Rightarrow}$ b = 0.8592
Substituting the value of b in equation (i):
810 = 12a + 800(0.8592)
810 = 12a + 687.38
12a = 122.62
${\Rightarrow}$ a = 10.22
Hence the regression equation of Y on X is:
${Y = 10.22 + 0.8592X}$
Regression equation of X on Y
${X = a+bY}$
where a and b are obtained from the normal equations:
${\Rightarrow}$ 800 = 12a + 810b ... (v)
${\Rightarrow}$ 54,059 = 810a + 54,750b ... (vi)
Multiplying equation (v) by 810 and equation (vi) by 12, we get:
9720a + 656,100b = 648,000 ... (vii)
9720a + 657,000b = 648,708 ... (viii)
Subtracting equation (vii) from (viii):
900b = 708
${\Rightarrow}$ b = 0.7867
Substituting the value of b in equation (v):
800 = 12a + 810(0.7867)
800 = 12a + 637.2
12a = 162.8
${\Rightarrow}$ a = 13.57
Hence the regression equation of X on Y is:
${X = 13.57 + 0.7867Y}$
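As a cross-check on the arithmetic, the product of the two regression slopes equals the square of the correlation coefficient (a standard identity, since ${b_{YX} = r\sigma_Y/\sigma_X}$ and ${b_{XY} = r\sigma_X/\sigma_Y}$). A short Python sketch on the example data:

```python
# Fathers' (x) and sons' (y) weights from the example above
x = [69, 63, 66, 64, 67, 64, 70, 66, 68, 67, 65, 71]
y = [70, 65, 68, 65, 69, 66, 68, 65, 71, 67, 64, 72]

n = len(x)
sx, sy = sum(x), sum(y)
sxx = sum(v * v for v in x)
syy = sum(v * v for v in y)
sxy = sum(u * v for u, v in zip(x, y))

b_yx = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope of Y on X
b_xy = (n * sxy - sx * sy) / (n * syy - sy * sy)  # slope of X on Y

r = (b_yx * b_xy) ** 0.5  # correlation coefficient
print(round(b_yx, 4), round(b_xy, 4), round(r, 4))
```

Both slopes are positive, so r is taken as positive; a value of r close to 1 confirms the strong relationship the researcher observed.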