Module 1: Decision Trees
Module Overview
In this module, you will learn about decision trees, one of the most intuitive and widely used machine learning algorithms. Decision trees are versatile models that can be used for both classification and regression tasks, making them essential tools in a data scientist's toolkit.
You'll explore how decision trees work, from the basic concepts of node splitting to practical implementation using scikit-learn. You'll also learn about the strengths and limitations of decision trees, setting the foundation for more advanced tree-based methods in later modules.
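To make the idea of node splitting concrete, here is a minimal sketch of how a classification tree scores a candidate split with Gini impurity. The function names (`gini`, `split_impurity`) and the toy label arrays are illustrative, not part of scikit-learn:

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_impurity(left, right):
    """Weighted average impurity of the two child nodes of a split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# A pure node has impurity 0; a 50/50 binary node has impurity 0.5.
print(gini([0, 0, 0, 0]))  # 0.0
print(gini([0, 0, 1, 1]))  # 0.5

# The tree greedily chooses the split with the lowest weighted impurity.
print(split_impurity([0, 0, 0], [1, 1, 0]))
```

Scikit-learn's `DecisionTreeClassifier` uses this criterion by default (`criterion="gini"`), evaluating candidate thresholds on each feature and keeping the split that most reduces impurity.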
Learning Objectives
- Clean data with outliers and missing values
- Build a Decision Tree using scikit-learn
- Get and interpret feature importances of a tree-based model
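The three objectives above can be sketched together in one short workflow: handle an outlier and a missing value, fit a tree with scikit-learn, and read off feature importances. The toy `DataFrame`, column names, and clipping threshold are hypothetical stand-ins for the module's real dataset:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data with one missing value and one obvious outlier.
df = pd.DataFrame({
    "age":    [25, 32, 47, np.nan, 51, 38, 29, 300],  # 300 is an outlier
    "income": [40, 55, 80, 62, 90, 70, 45, 65],
    "bought": [0, 0, 1, 1, 1, 1, 0, 1],
})

# One simple cleaning strategy: clip the outlier to a plausible range.
df["age"] = df["age"].clip(upper=100)

X = df[["age", "income"]]
y = df["bought"]

# Impute the remaining missing value, then fit the decision tree.
model = make_pipeline(
    SimpleImputer(strategy="median"),
    DecisionTreeClassifier(max_depth=3, random_state=42),
)
model.fit(X, y)

# Feature importances come from the fitted tree step of the pipeline;
# they sum to 1 and measure each feature's share of impurity reduction.
tree = model.named_steps["decisiontreeclassifier"]
for name, importance in zip(X.columns, tree.feature_importances_):
    print(f"{name}: {importance:.2f}")
```

Wrapping the imputer and the tree in one pipeline keeps the cleaning step coupled to the model, so the same imputation is applied automatically when predicting on new data.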
Guided Project
Open JDS_SHR_221_guided_project_notes.ipynb in the GitHub repository below to follow along with the guided project:
Guided Project Video
Module Assignment
Complete the Module 1 assignment to practice the decision tree techniques you've learned.
It's Kaggle competition time! In this assignment, you'll apply what you've learned about decision trees to a real-world dataset.
Getting Started with Kaggle
If this is your first time using Kaggle, here's how to get started:
- Create an Account: Visit Kaggle.com and register with your email
- Join the Competition: Navigate to the competition page and click "Join Competition"
- Download Data: Go to the "Data" tab and download the dataset files
Watch this walkthrough video for detailed instructions:
Additional Kaggle Resources
Assignment Solution Video
Resources
Documentation
- Scikit-learn: Decision Trees
- Scikit-learn: Imputation of missing values
- Pandas: Working with missing data