DS Unit 2 - Sprint 7: Applied Modeling

Welcome to Applied Modeling!

For your portfolio project (Sprint project), you will choose your own labeled, tabular dataset, train a predictive model, and publish a web app or blog post with visualizations to explain your model.

You will use your chosen dataset for all assignments during the Applied Modeling sprint. You will learn how to define machine learning problems, begin the modeling process, choose targets, choose evaluation metrics, and avoid leakage.

You will improve your model's predictions with powerful models like gradient boosting and feature selection techniques such as permutation importance. You will improve your model interpretation with visualizations like partial dependence plots and Shapley value force plots.

Applying predictive modeling to real decisions isn't easy, but these are the skills employers are looking for! Here is what we'll cover in each of the modules in this sprint.

Sprint Overview

Module 1

Define ML Problems

In this module, we'll think more carefully about how to choose a target for a particular dataset and how the characteristics of that target affect the type of model and the evaluation metrics we choose.
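
As a small illustration of how a target's characteristics can drive the choice of metric, the sketch below computes a majority-class baseline for a classification target. The labels are made-up data, and `majority_baseline` is a hypothetical helper name, not part of any library.

```python
from collections import Counter


def majority_baseline(labels):
    """Return the most common class and the accuracy of always predicting it."""
    counts = Counter(labels)
    majority_class, majority_count = counts.most_common(1)[0]
    return majority_class, majority_count / len(labels)


# Hypothetical, imbalanced binary target: 95 negatives and 5 positives.
y = [0] * 95 + [1] * 5

majority_class, baseline_accuracy = majority_baseline(y)
print(majority_class, baseline_accuracy)
```

When the baseline accuracy is this high, accuracy alone says little about a model, and a metric such as precision, recall, or ROC AUC is usually more informative.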

Module 2

Wrangle ML Datasets

This module focuses on the basics: we'll review how to prepare data for modeling and how to combine more than one dataset.

Module 3

Permutation and Boosting

In this module, we'll move on to more advanced models and learn about bagging and boosting techniques, including the XGBoost model.
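
To make the "permutation" part of this module concrete, here is a minimal sketch of permutation importance in plain Python: shuffle one feature at a time and measure how much the model's accuracy drops. The data, the `toy_model` stand-in, and the helper names are all assumptions made for illustration. In practice you would score a fitted model on validation data, for example with scikit-learn's `sklearn.inspection.permutation_importance`.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each column is shuffled, one column at a time."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only this column, leaving the other columns untouched.
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [
                row[:col] + [value] + row[col + 1:]
                for row, value in zip(X, shuffled)
            ]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical data: the label depends only on feature 0; feature 1 is noise.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]


def toy_model(row):
    """Stand-in for a fitted classifier; it only looks at feature 0."""
    return int(row[0] > 0.5)


importances = permutation_importance(toy_model, X, y)
print(importances)
```

Because the stand-in model ignores feature 1, shuffling that column cannot change its predictions, so its importance is zero, while shuffling feature 0 breaks the link to the label and lowers accuracy.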

Module 4

Model Interpretation

We'll end this sprint by learning how to visualize our models in more detail using partial dependence plots and Shapley value plots. These visualizations can help us understand how a model behaves internally.
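
The idea behind a partial dependence plot can be sketched in a few lines: force one feature to each value on a grid, average the model's predictions over the dataset, and plot the averages against the grid. The example below is a minimal sketch under stated assumptions. The data, the `toy_model` stand-in, and the `partial_dependence` helper are hypothetical, not a library API.

```python
def partial_dependence(model, X, feature, grid):
    """Average prediction over X with one feature forced to each grid value."""
    averages = []
    for value in grid:
        predictions = [
            model(row[:feature] + [value] + row[feature + 1:]) for row in X
        ]
        averages.append(sum(predictions) / len(predictions))
    return averages


def toy_model(row):
    """Stand-in for a fitted regressor: 2 * x0 + x1."""
    return 2 * row[0] + row[1]


# Hypothetical data with two numeric features.
X = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
grid = [0.0, 1.0, 2.0]

pd_feature_0 = partial_dependence(toy_model, X, feature=0, grid=grid)
print(pd_feature_0)  # prints [3.0, 5.0, 7.0]
```

The averages rise by 2 for each unit increase in feature 0, which matches the coefficient in the stand-in model. With a real model, the curve shows the average effect the model has learned for that feature.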

Sprint Resources

Primary Resources

Code-Along Sessions

Sprint Challenge

Documentation

Additional Learning

Interpretable Machine Learning Book