DS Unit 3 Sprint 9: Software Engineering

Welcome to Sprint 9

"A data scientist knows more about statistics than a software engineer and more about programming than a statistician."

Being a data scientist means applying statistics and data analysis by writing actual working code that runs and produces results. You've been doing that your entire time at BloomTech, but much of our work has been in Python notebooks, a valuable but limited environment intended for exploration rather than engineering.

There is a divide between science and engineering: theory and practice, ideas and application. A skilled data scientist masters both. Science informs engineering, and engineering increases the rigor of the science by making it reproducible and scalable.

In this unit, we will build the core skills needed to communicate and work with software engineers. You may pleasantly surprise colleagues if you not only know the latest and greatest machine learning models but can also build them following software development best practices. To get there, we will go beyond Python notebooks into the world of modules, packages, containers, and more. Onwards!

Sprint Modules

Module 1

Python Modules, Packages, and Environments

Python notebooks are essentially a glorified REPL (read-eval-print loop). What if you want code that lives on and can be reused in different circumstances? Enter modules, packages, and environments!
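As a minimal sketch of what "code that lives on" can look like, a plain `.py` file can expose functions for other code to import, while the `if __name__ == "__main__":` guard keeps script-only behavior from running on import. The file name `stats_utils.py` and the function `zscores` are hypothetical, chosen only for illustration.

```python
# stats_utils.py (hypothetical module name, for illustration only)
"""Small reusable helpers that other scripts can import."""

from statistics import mean, pstdev


def zscores(values):
    """Return the z-score of each value, using the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    # Runs only when the file is executed directly (python stats_utils.py),
    # not when another file does `from stats_utils import zscores`.
    print(zscores([2, 4, 4, 4, 5, 5, 7, 9]))
```

Another script in the same folder could then reuse the function with `from stats_utils import zscores`, without copying cells between notebooks.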


Module 2

OOP, Code Style, and Reviews

As a software engineer, you build systems that others will work on, from coworkers to future you. We'll learn the principles of object-oriented programming, one of the most common paradigms for large codebases, along with standard code style conventions and the code review process.
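A minimal sketch of the object-oriented ideas mentioned above, using only the standard library: a base class defines a shared interface, and a subclass implements it. The names `BaseModel` and `MeanRegressor` are hypothetical and not taken from any course material or library, though they loosely echo the fit/predict convention popularized by scikit-learn.

```python
from abc import ABC, abstractmethod


class BaseModel(ABC):
    """Hypothetical interface: every model can be fit, then used to predict."""

    @abstractmethod
    def fit(self, X, y):
        """Learn parameters from features X and targets y; return self."""

    @abstractmethod
    def predict(self, X):
        """Return one prediction per row of X."""


class MeanRegressor(BaseModel):
    """Baseline model that always predicts the mean of the training targets."""

    def __init__(self):
        self.mean_ = None  # set by fit(); trailing underscore marks a learned attribute

    def fit(self, X, y):
        if not y:
            raise ValueError("y must contain at least one value")
        self.mean_ = sum(y) / len(y)
        return self  # returning self allows chaining: MeanRegressor().fit(X, y).predict(X)

    def predict(self, X):
        if self.mean_ is None:
            raise RuntimeError("call fit() before predict()")
        return [self.mean_ for _ in X]


model = MeanRegressor().fit([[1], [2], [3]], [10, 20, 30])
print(model.predict([[4], [5]]))  # [20.0, 20.0]
```

Because calling code only relies on `fit` and `predict`, a different `BaseModel` subclass can be swapped in without changing it, which is the kind of property that keeps a shared codebase manageable.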


Module 3

Containers and Reproducible Builds

"Works on my machine" is a common problem for code written without software engineering practices. For reproducible, deployable code, containers are the tool of choice. We'll use Docker to build Linux containers that run consistently regardless of the host environment.


Module 4

Software Testing, Documentation, and Licensing

Code that runs isn't enough: to remain useful over time, it needs tests and documentation. These aren't "overhead" but core engineering practices. We'll also cover choosing an appropriate license and understanding the legal landscape of open source dependencies.
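As a minimal sketch of testing and documentation working together, the hypothetical function `normalize` below states its contract in a docstring, and a standard-library `unittest` test case checks that contract, including edge cases. The function and test names are illustrative only.

```python
import unittest


def normalize(values):
    """Scale a list of numbers so they sum to 1.

    Args:
        values: A non-empty list of non-negative numbers with a positive sum.

    Returns:
        A new list where each element is divided by the total.

    Raises:
        ValueError: If the list is empty or sums to zero.
    """
    total = sum(values)
    if not values or total == 0:
        raise ValueError("values must be non-empty with a nonzero sum")
    return [v / total for v in values]


class TestNormalize(unittest.TestCase):
    def test_sums_to_one(self):
        result = normalize([1, 1, 2])
        self.assertEqual(result, [0.25, 0.25, 0.5])
        self.assertAlmostEqual(sum(result), 1.0)

    def test_rejects_zero_sum(self):
        with self.assertRaises(ValueError):
            normalize([0, 0])

    def test_rejects_empty(self):
        with self.assertRaises(ValueError):
            normalize([])
```

Saved as, say, `test_normalize.py`, the tests run with `python -m unittest test_normalize`. Each test documents an expectation in executable form, so a future change that breaks the contract fails loudly instead of silently.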


Sprint Resources