Best Practices, Bigdata, Data Science, Exploratory Data Analysis, Machine Learning

Ordinary least squares regression (OLSR)

Ordinary least squares regression (OLSR) 

Invented in 1795 by Carl Friedrich Gauss, it is considered one of the earliest known general prediction methods. OLSR is a generalized linear modeling technique. It is used for estimating all unknown parameters involved in a linear regression model, the goal of which is to minimize the sum of the squares of the difference of the observed variables and the explanatory variables.

Ordinary least squares regression is also known as ordinary least squares or least squared errors regression.

Lets start with a Linear regression model like below:-

Here is few terminology we use when we calculate OLS:

Dependent variable (DV) = response variable = left-hand side (LHS) variable = regressand variable = Output variable = criterion variable.

Independent variables (IV) = explanatory variables = right-hand side (RHS) variables = regressor (excluding a or b0) = Input variable = predictor variable.

a (b0) is an estimator of parameter α, β0

b (b1) is an estimator of parameter β, β1

a and b are the intercept and slope

A positive relationship in a regression means that when the independent variable increases, the dependent variable tends to increase. A negative relationship means that when the independent variable increases, the dependent variable tends to decrease, and vice versa. Regressions can be run to estimate or test many different relationships. 

This is the formula for a regression that contains only two variables:


The Y on the left side of the equation is the dependent variable. The a or Alpha coefficient represents the intercept, which is where the line hits the y-axis in the graph, i.e., the predicted value of Y when X equals 0. The ß or Beta coefficient represents the slope, the predicted change in Y for each one-unit increase in X.

A regression with one dependent variable and more than one independent variable is called a multiple regression. This type of regression is very commonly used. It is a statistical tool to predict the value of the dependent variable, using several independent variables. The independent variables can include quadratic or other nonlinear transformations: for example, if the dependent variable Y is earnings, we might include gender, age, and the square of age as independent variables, in which case the assumption of a “linear” relationship between Y and the three regressors actually allows the possibility of a quadratic relationship with age.

The main purpose of regression for many distinct purpose.

  1. To give descriptive summery of how the outcome varies with the explanatory variables.
  2. To predict outcome given a set of values for explanatory variables.
  3. Estimate the parameters of a model where we describe the process that generates the outcome.
  4. To study casual relationship.

For descriptive summary and prediction as mentioned in above point #1 and #2 there is some technical sense which OLS can do the good job. OLS help us to draw a line based on data points observed. Suppose we have a imaginary line of y= a + bx as shown in below figure. Imagine a vertical distance (or error) between the line and a data point. E=Y-E(Y) is error (or gap) of the deviation of the data point from the imaginary line, regression line. So what is the best values of a and ba and b that minimizes the sum of such errors (deviations of individual data points from the line)

To Perform OLS method we need to do following steps:-

Using the method least squares we can get a and b that can minimize the sum of squared deviations rather than the sum of deviations. So a and b are called least squares estimators (estimators of parameters α and β). The process of getting parameter estimators (e.g., a and b) is called estimation. Least squares method is the estimation method of ordinary least squares (OLS).

Compute a and b so that partial derivatives with respect to a and b are equal to zero 

Method 2 is to take a partial derivative with respect to b and plug in a you got, a=Ybar –b*Xbar

 Now let us take above to an example and calculate the a and b from a given dataset.

Now, let us go to more practical example, suppose we start out knowing the height and hand size of a bunch of individuals in a “sample population,” and that we want to figure out a way to predict hand size from height for individuals not in the sample. By applying OLS, we’ll get an equation that takes hand size—the ‘independent’ variable—as an input, and gives height—the ‘dependent’ variable—as an output.

Below, OLS is done behind-the-scenes to produce the regression equation. The constants in the regression—called ‘betas’—are what OLS spits out. Here, beta_1 is an intercept; it tells what height would be even for a hand size of zero. And beta_2 is the coefficient on hand size; it tells how much taller we should expect someone to be for a given increment in their hand size.

As we discussed our equation, the Error is the difference between prediction and reality: the vertical distance between a real data point and the regression line. OLS is concerned with the squares of the errors. It tries to find the line going through the sample data that minimizes the sum of the squared errors. Below, the squared errors are represented as squares, and our job is to choose betas (the slope and intercept of the regression line) so that the total area of all the squares (the sum of the squared errors) is as small as possible.

Now, real scientists rarely do regression with just one independent variable, but OLS works exactly the same with more. Below is OLS with two independent variables. Instead of the errors being relative to a line, though, they’re now relative to a plane in 3D space. So OLS is to find the equation for that plane. The slice of the plane through each axis is shown in the first two figures.

That’s OLS!

Happy Machine Learning…

Leave a Reply

Your email address will not be published. Required fields are marked *