September 25, 2018
In statistics, linear regression is a linear approach to modelling the relationship between a dependent variable and one or more independent variables. If you would like to know more about linear regression and how it is implemented, check out these two methods to perform Linear Regression from scratch:
Today, to perform Linear Regression quickly, we will be using the library scikit-learn. If you don’t have it already, you can install it using pip:
pip install scikit-learn
So now let’s start by making a few imports:
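A minimal sketch of those imports, assuming the conventional aliases (np, pd, plt), which are a common convention rather than a requirement:

```python
import numpy as np                 # numerical calculations
import pandas as pd                # loading the CSV data set
import matplotlib.pyplot as plt    # plotting the data and the regression line
from sklearn.linear_model import LinearRegression  # the regression model
```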
We need numpy to perform calculations, pandas to import the data set (which is in CSV format in this case), and matplotlib to visualize our data and regression line. We will use the LinearRegression class to perform the linear regression.
Now let’s perform the regression:
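One way this step could look is sketched below. The data here is a small synthetic DataFrame standing in for the CSV file, and the column layout (first column independent, second column dependent) is an assumption; with a real file you would load it with pd.read_csv instead.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for the CSV data set. With a real file you might use:
# data = pd.read_csv("data.csv")  # file name is an assumption
x_values = np.arange(10, dtype=float)
data = pd.DataFrame({"x": x_values, "y": 2.0 * x_values + 1.0})

# scikit-learn expects features as a 2-D array of shape (n_samples, n_features),
# so the single column is reshaped from (n,) to (n, 1).
X = data.iloc[:, 0].values.reshape(-1, 1)
Y = data.iloc[:, 1].values.reshape(-1, 1)

linear_regressor = LinearRegression()   # create the model object
linear_regressor.fit(X, Y)              # fit slope and intercept to the data
Y_pred = linear_regressor.predict(X)    # predictions for the same X values
```

After fitting, the slope and intercept are available as linear_regressor.coef_ and linear_regressor.intercept_.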
We have our predictions in Y_pred. Now let’s visualize the data set and the regression line:
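A possible version of the plot is sketched here. It regenerates a small noisy synthetic data set so that it runs on its own; the data, the labels, and the output file name are all illustrative assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Synthetic, illustrative data (an assumption; substitute your own X and Y).
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 30).reshape(-1, 1)
Y = 2.0 * X + 1.0 + rng.normal(scale=1.0, size=X.shape)

# Fit the model and predict on the same inputs.
Y_pred = LinearRegression().fit(X, Y).predict(X)

fig, ax = plt.subplots()
ax.scatter(X, Y, label="data")                                # raw points
ax.plot(X, Y_pred, color="red", label="regression line")     # fitted line
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.legend()
fig.savefig("regression_line.png")  # hypothetical output file name
```

In an interactive session, calling plt.show() after building the figure displays it in a window instead of (or in addition to) saving it.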
That’s it! You can use any data set of your choice, and even perform Multiple Linear Regression (more than one independent variable) using the LinearRegression class in sklearn.linear_model. Note that this class uses the ordinary least squares method, which fits a straight line (or a plane, when there are several independent variables). So accuracy may be lower than that of more flexible techniques when the relationship in the data is not linear. But if you want to make some quick predictions and get some insight into the data set given to you, then this is a very handy tool.
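As a rough illustration of both points, the sketch below fits a model with two independent variables and checks the result against an ordinary least squares solution computed directly with NumPy. The data is synthetic and the true coefficients (3, -2, intercept 5) are made up for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with two independent variables (illustrative assumption).
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0 + rng.normal(scale=0.5, size=100)

# Multiple linear regression: same class, X simply has more columns.
model = LinearRegression().fit(X, y)

# Ordinary least squares solved directly. A column of ones is appended
# so that the last entry of the solution plays the role of the intercept.
A = np.column_stack([X, np.ones(len(X))])
solution, *_ = np.linalg.lstsq(A, y, rcond=None)

print("sklearn coefficients:", model.coef_, "intercept:", model.intercept_)
print("lstsq solution:      ", solution)
```

The two sets of numbers should agree up to floating-point error, since both minimize the same sum of squared residuals.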
Find the data set and code here.