DS7 Sprint Challenge - Applied Modeling

Sprint Challenge Overview

This sprint challenge will assess your understanding of the concepts covered throughout this sprint on Applied Modeling. You'll demonstrate your ability to define machine learning problems, wrangle datasets, apply ensemble methods, and interpret model results using real-world data.

Challenge Setup

To get started with the Sprint Challenge, follow these steps:

Access the Jupyter notebook using the link below.
Complete all tasks in the notebook, demonstrating your understanding of the sprint concepts.
You can complete the assignment locally or in Google Colab (if you use Colab, save a copy to your Google Drive first).

Challenge Notebook

Challenge Expectations

The Sprint Challenge is designed to test your mastery of the following key concepts:

Define ML problems: Choosing appropriate targets, evaluation metrics, and avoiding data leakage
Data wrangling: Exploring tabular data and joining relational datasets for machine learning
Ensemble methods: Understanding bagging vs. boosting and implementing gradient boosting models
Feature importance: Using both default and permutation importance to understand model behavior
Model interpretation: Creating partial dependence plots and SHAP value visualizations
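
As a refresher on what permutation importance measures, here is a minimal, library-free sketch. It is not the challenge solution: the data, the `toy_model` function, and the helper names are all made up for illustration, and in the notebook you would more likely call scikit-learn's `sklearn.inspection.permutation_importance` on a fitted estimator. The idea is to shuffle one column of the validation data at a time and record how much the score drops.

```python
import random

def accuracy(model, X, y):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only this column, leaving the others untouched.
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:col] + [v] + row[col + 1:]
                      for row, v in zip(X, shuffled)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical validation data: column 0 determines the label, column 1 is noise.
data_rng = random.Random(42)
X_val = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y_val = [int(row[0] > 0.5) for row in X_val]

# Stand-in for a fitted model: thresholds column 0 and ignores column 1.
def toy_model(row):
    return int(row[0] > 0.5)

importances = permutation_importance(toy_model, X_val, y_val)
print(importances)
```

Because the stand-in model never reads column 1, shuffling that column cannot change its predictions, so its importance is exactly zero. Shuffling column 0 breaks the link between feature and label, so accuracy falls sharply. Computing the drop on held-out data rather than training data is what distinguishes this from the default, training-time importances of tree ensembles.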

What to Expect

In this sprint challenge, you'll apply what you've learned about applied modeling to a real-world dataset. This challenge will test your ability to:

Define a machine learning problem appropriately
Engineer meaningful features for your models
Handle class imbalance effectively
Apply and interpret permutation importance
Build and interpret ensemble models
Leverage model interpretation tools to explain predictions
Present your findings and recommendations in a clear, concise manner
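
For the class imbalance item, one common starting point is to compare your model against a majority-class baseline and to weight classes inversely to their frequency. The sketch below is illustrative only: the label counts are invented, and the helper names are hypothetical. The weighting formula, n_samples / (n_classes * class_count), is the same heuristic scikit-learn applies when you pass `class_weight="balanced"`.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count)
            for cls, count in counts.items()}

def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent class."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

# Hypothetical imbalanced target: 90 negatives, 10 positives.
y_train = [0] * 90 + [1] * 10

weights = balanced_class_weights(y_train)
baseline = majority_baseline_accuracy(y_train)

print(weights)
print(baseline)
```

With labels this skewed, always predicting the majority class already scores high on accuracy, which is why accuracy alone is a weak metric for imbalanced targets. The minority class receives a much larger weight than the majority class, so mistakes on it cost more during training.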

Remember to demonstrate your understanding of the concepts from all four modules in this sprint!

Sprint Challenge Resources

Machine Learning Problem Definition

Data Wrangling and Feature Engineering

Model Interpretation