Sprint Challenge: Applied Modeling
Sprint Challenge Overview
This sprint challenge will assess your understanding of the concepts covered throughout this sprint on Applied Modeling. You'll demonstrate your ability to define machine learning problems, wrangle datasets, apply ensemble methods, and interpret model results using real-world data.
Challenge Setup
To get started with the Sprint Challenge, follow these steps:
- Access the Jupyter notebook using the link below.
- Complete all tasks in the notebook, demonstrating your understanding of the sprint concepts.
- You can complete the assignment locally or in Google Colab (if you use Colab, save a copy of the notebook to your Google Drive first).
Challenge Expectations
The Sprint Challenge is designed to test your mastery of the following key concepts:
- Define ML problems: Choosing appropriate targets, evaluation metrics, and avoiding data leakage
- Data wrangling: Exploring tabular data and joining relational datasets for machine learning
- Ensemble methods: Understanding bagging vs. boosting and implementing gradient boosting models
- Feature importance: Using both default and permutation importance to understand model behavior
- Model interpretation: Creating partial dependence plots and SHAP value visualizations
What to Expect
In this sprint challenge, you'll apply what you've learned about applied modeling to a real-world dataset. The challenge will test your ability to:
- Define a machine learning problem appropriately
- Engineer meaningful features for your models
- Handle class imbalance effectively
- Apply and interpret permutation importance
- Build and interpret ensemble models
- Leverage model interpretation tools to explain predictions
- Present your findings and recommendations in a clear, concise manner
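For the class imbalance item above, one common first step is to check the majority-class baseline and compute balanced class weights. The sketch below assumes a made-up label vector with a 90/10 split and a hypothetical helper named `imbalance_summary`. Class weighting is only one of several possible approaches and is not necessarily the one the notebook expects.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight


def imbalance_summary(y):
    """Hypothetical helper: majority-class baseline and balanced class weights."""
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    # Accuracy of always predicting the most frequent class.
    baseline_accuracy = counts.max() / counts.sum()
    # "balanced" weights are n_samples / (n_classes * count_per_class).
    weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
    return baseline_accuracy, dict(zip(classes.tolist(), weights.tolist()))


# Made-up labels: 90 negatives and 10 positives (illustrative assumption).
y = np.array([0] * 90 + [1] * 10)
baseline, weights = imbalance_summary(y)
print("majority baseline accuracy:", baseline)
print("balanced class weights:", weights)
```

A model has to beat the majority baseline to be useful, and with labels this skewed, accuracy alone can hide poor performance on the minority class. Many scikit-learn estimators accept weights like these through a `class_weight` parameter.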
Remember to demonstrate your understanding of the concepts from all four modules in this sprint!