Module 1: Natural Language Processing - Introduction

Module Overview

Natural Language Processing (NLP) is a field of artificial intelligence that focuses on enabling computers to understand, interpret, and generate human language. In this module, we'll explore the foundational concepts of NLP, including text preprocessing techniques that are essential for any text-based analysis. We'll learn how to tokenize text, remove stop words, and apply stemming or lemmatization to prepare text data for more advanced NLP applications.
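
As a minimal sketch of the first two preprocessing steps, the following uses only the Python standard library. A regular expression splits lowercased text into word tokens, and a small hand-written stop word set filters them out. The regex pattern and the `STOP_WORDS` set are assumptions for illustration; libraries such as spaCy and NLTK ship their own tokenizers and much larger stop word lists.

```python
import re

# A tiny illustrative stop word set (an assumption); real NLP libraries
# provide far larger, curated lists.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "it", "this"}

def tokenize(text):
    """Lowercase the text and return runs of letters, digits, and apostrophes as tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop word set."""
    return [t for t in tokens if t not in stop_words]

text = "The coffee is strong, and the staff is friendly!"
tokens = tokenize(text)
print(tokens)                     # punctuation is dropped by the regex
print(remove_stop_words(tokens))  # only content-bearing words remain
```

Punctuation disappears because the pattern only matches word characters, and filtering happens after lowercasing so that "The" and "the" are treated alike.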

Learning Objectives

Tokenize Text
Remove Stop Words From a List of Tokens
Stem and Lemmatize Text
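
To contrast the two normalization approaches, here is a deliberately simplified sketch. `simple_stem` is a hypothetical toy suffix stripper (it is not the Porter or Snowball algorithm), and `simple_lemmatize` is a hypothetical lookup in a tiny hand-made dictionary. Real lemmatizers, such as those in spaCy or NLTK's WordNet lemmatizer, use a full vocabulary and, ideally, part-of-speech information.

```python
# Toy suffix rules, checked in order; NOT the Porter stemming algorithm.
SUFFIXES = ("ing", "ies", "ed", "es", "s")

def simple_stem(word):
    """Strip the first matching suffix, but only if at least 3 characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# A tiny hand-made lemma dictionary (hypothetical); unknown words map to themselves.
LEMMAS = {"was": "be", "were": "be", "better": "good",
          "studies": "study", "brewing": "brew", "cups": "cup"}

def simple_lemmatize(word):
    """Return the dictionary form of a word if known, else the word unchanged."""
    return LEMMAS.get(word, word)

for w in ["studies", "brewing", "was", "better", "cups"]:
    print(w, simple_stem(w), simple_lemmatize(w))
```

The stem of "studies" comes out as "stud", which is not a real word, while the lemma is "study". That is the usual trade-off: stemming is fast and rule-based but can produce non-words, whereas lemmatization returns dictionary forms (and handles irregular forms like "was" to "be") at the cost of needing vocabulary knowledge.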

Guided Project

Open DS_411_Text_Data_Lecture_GP.ipynb in the GitHub repository to follow along with the guided project.

GitHub Repo, Slides, Guided Project Solution

Module Assignment

Analyze coffee shop reviews to identify attributes of the best- and worst-rated establishments. Apply text preprocessing techniques, including tokenization, lemmatization, and custom stop word removal, to clean the data, then create visualizations showing token frequency patterns across different star ratings.
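
The core counting step of the assignment can be sketched in plain Python. The `reviews` list is invented sample data standing in for the real review dataset, and the extra stop words ("coffee", "latte") are an assumed example of custom stop words: terms that appear at every rating level and so say little about what separates good shops from bad ones. The function name `token_counts_by_rating` is hypothetical.

```python
import re
from collections import Counter, defaultdict

# Invented sample data: (star_rating, review_text) pairs.
reviews = [
    (5, "Great coffee and friendly staff. Great vibe!"),
    (5, "Friendly baristas, great latte."),
    (1, "Slow service and cold coffee."),
    (1, "Rude staff, slow line, cold latte."),
]

BASE_STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "it"}
# Custom additions (assumed): domain words common to every rating.
CUSTOM_STOP_WORDS = BASE_STOP_WORDS | {"coffee", "latte"}

def tokenize(text):
    """Lowercase the text and return runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

def token_counts_by_rating(reviews, stop_words):
    """Build one Counter of non-stop-word tokens per star rating."""
    counts = defaultdict(Counter)
    for stars, text in reviews:
        counts[stars].update(t for t in tokenize(text) if t not in stop_words)
    return counts

counts = token_counts_by_rating(reviews, CUSTOM_STOP_WORDS)
for stars in sorted(counts, reverse=True):
    print(stars, counts[stars].most_common(3))
```

In a fuller pipeline, each token would typically be lemmatized before counting (for example with spaCy's `token.lemma_`), and the per-rating `Counter` objects could then feed a bar chart of the most common tokens.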
