DS Unit 2 Sprint 5: Linear Models

Welcome to Linear Models!

Unit 2 is about predictive modeling, specifically supervised machine learning with labeled, tabular data!

We can build models that predict continuous numbers and answer questions like "How much?" or "How many?". This modeling task is called regression. We can also build models that predict discrete classes and answer questions like "Is this A or B or C?" This modeling task is called classification.

We'll learn about both regression and classification tasks in the following modules.

Modules

This sprint is organized into four modules on linear models:

Module 1

Linear Regression 1

We’ll begin our study of predictive modeling with linear models for regression tasks. In this module, we'll learn why it's important to establish a baseline before building a model. As we introduce scikit-learn, we'll practice preparing data for modeling and using a scikit-learn predictor. We'll wrap up the module by taking a closer look at linear regression coefficients and at what we're actually fitting when we train a linear regression model.
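
As a rough illustration of these steps, here is a minimal sketch that compares a mean-prediction baseline with a scikit-learn `LinearRegression` fit and then reads off the fitted coefficients. The synthetic data (a target of roughly 3x + 2 plus noise), the mean baseline, and mean absolute error as the metric are assumptions for illustration, not necessarily what the module itself uses.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Hypothetical toy data: one feature, target is roughly 3*x + 2 plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 1, size=100)

# Baseline (an illustrative choice): always predict the mean of the target
baseline_pred = np.full_like(y, y.mean())
baseline_mae = mean_absolute_error(y, baseline_pred)

# Fit a linear regression with scikit-learn's fit/predict pattern
model = LinearRegression()
model.fit(X, y)
model_mae = mean_absolute_error(y, model.predict(X))

# The fitted slope and intercept are the model's coefficients
print(f"Baseline MAE: {baseline_mae:.2f}")
print(f"Model MAE:    {model_mae:.2f}")
print(f"Slope: {model.coef_[0]:.2f}, intercept: {model.intercept_:.2f}")
```

Because the data were generated with slope 3 and intercept 2, the fitted coefficients should land close to those values, and the model's error should be well below the baseline's.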

Module 2

Linear Regression 2

In this module, we will focus on how to split datasets into training and test sets so we can train and test regression models. We'll also build on the previous module by fitting a multiple regression model, which uses several features to make predictions. With a better understanding of regression, we can take a deeper look at the math behind it and at how scikit-learn predictors work. To wrap up the module, we will cover overfitting and underfitting: how to recognize each and how to correct for them.
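
A minimal sketch of a train/test split and a multiple regression, assuming synthetic two-feature data and scikit-learn's `train_test_split` and `LinearRegression`. The last lines solve the same least-squares problem directly with `np.linalg.lstsq`, as one way to peek at the math behind the predictor; the data and the 80/20 split are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical toy data: two features, target is a linear combination plus noise
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 2))
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + 4 + rng.normal(0, 1, size=200)

# Hold out 20% of the rows so the model is evaluated on data it never saw
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Multiple regression: one coefficient per feature, plus an intercept
model = LinearRegression().fit(X_train, y_train)
train_mae = mean_absolute_error(y_train, model.predict(X_train))
test_mae = mean_absolute_error(y_test, model.predict(X_test))
print("Coefficients:", model.coef_.round(2))
print(f"Train MAE: {train_mae:.2f}, test MAE: {test_mae:.2f}")

# Ordinary least squares by hand: prepend a column of ones for the intercept
X_design = np.column_stack([np.ones(len(X_train)), X_train])
beta, *_ = np.linalg.lstsq(X_design, y_train, rcond=None)
print("Matches scikit-learn:",
      np.allclose(beta[1:], model.coef_) and np.isclose(beta[0], model.intercept_))
```

Comparing training and test error is the basic check for fit: a test error much higher than the training error hints at overfitting, while high error on both hints at underfitting.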

Module 3

Ridge Regression

In this module, we will cover how to encode different types of data, including one-hot encoding and other categorical encodings. We'll also cover how to reduce model complexity with feature selection. Finally, we will introduce regularization and how it can help prevent overfitting. We'll practice using ridge regression and learn how to implement it in scikit-learn.
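
Here is a minimal sketch of one-hot encoding combined with ridge regression in scikit-learn, assuming a hypothetical toy housing dataset with one numeric column (`sqft`) and one categorical column (`neighborhood`). The column names, the `make_model` helper, and `alpha=100.0` are illustrative choices; the point is that the ridge penalty shrinks the coefficients relative to ordinary least squares.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy housing data: one numeric and one categorical feature
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "sqft": rng.uniform(500, 3000, size=n),
    "neighborhood": rng.choice(["A", "B", "C"], size=n),
})
offsets = df["neighborhood"].map({"A": 0, "B": 20_000, "C": 50_000})
y = 100 * df["sqft"] + offsets + rng.normal(0, 10_000, size=n)


def make_model(regressor):
    """Hypothetical helper: build a fresh encode-and-scale pipeline around a regressor."""
    preprocess = ColumnTransformer([
        # drop="first" keeps k-1 dummy columns, so OLS coefficients are unique
        ("onehot", OneHotEncoder(drop="first"), ["neighborhood"]),
        ("scale", StandardScaler(), ["sqft"]),
    ])
    return make_pipeline(preprocess, regressor)


ols = make_model(LinearRegression()).fit(df, y)
# Ridge minimizes squared error + alpha * (sum of squared coefficients)
ridge = make_model(Ridge(alpha=100.0)).fit(df, y)

# Coefficient order: neighborhood_B, neighborhood_C, scaled sqft
print("OLS coefficients:  ", ols[-1].coef_.round(0))
print("Ridge coefficients:", ridge[-1].coef_.round(0))
```

A new pipeline is built for each model so the two fits don't share (and overwrite) the same preprocessing object. In practice, `alpha` would be tuned on validation data rather than fixed.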

Module 4

Logistic Regression

To wrap up the sprint, we’ll continue our study of predictive modeling with a linear model for classification tasks, called logistic regression. We'll learn how to establish a baseline for classification. As we did with linear regression, we'll take a closer look at the logistic regression coefficients and learn how to interpret the resulting model. Finally, we'll implement a logistic regression model using scikit-learn.
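
A minimal sketch of a majority-class baseline and a logistic regression fit, assuming synthetic binary data whose true log-odds are 3x - 1. Accuracy as the metric and the 75/25 split are illustrative assumptions; note that scikit-learn's `LogisticRegression` applies L2 regularization by default (`C=1.0`).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical toy data: the positive class becomes more likely as x grows
rng = np.random.default_rng(0)
X = rng.normal(0, 1, size=(1000, 1))
p = 1 / (1 + np.exp(-(3.0 * X[:, 0] - 1.0)))  # true log-odds: 3x - 1
y = rng.binomial(1, p)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Baseline: always predict the most common class in the training labels
majority = np.bincount(y_train).argmax()
baseline_acc = accuracy_score(y_test, np.full_like(y_test, majority))

# Logistic regression (L2-regularized by default in scikit-learn)
model = LogisticRegression().fit(X_train, y_train)
model_acc = accuracy_score(y_test, model.predict(X_test))

# Coefficients are on the log-odds scale; exp(coef) is the odds ratio per unit of x
coef = model.coef_[0, 0]
print(f"Baseline accuracy: {baseline_acc:.2f}, model accuracy: {model_acc:.2f}")
print(f"Coefficient: {coef:.2f}, odds ratio: {np.exp(coef):.2f}")
```

Because logistic regression coefficients live on the log-odds scale, exponentiating one gives the multiplicative change in the odds for a one-unit increase in that feature.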

Sprint Resources