DS10 Sprint Challenge

Sprint Challenge Overview

This sprint challenge will assess your understanding of the concepts covered throughout this sprint on Databases and SQL. You'll demonstrate your ability to create and populate SQLite databases, write complex SQL queries for data analysis, work with multi-table relationships using JOINs, and understand the differences between SQL and NoSQL database systems.

Challenge Instructions

Databases and SQL

In this sprint challenge you will write code to interact with SQLite databases. You may use any tools and references you wish, but your final code should reflect your work and be saved in .py files (not notebooks), and be uploaded below.

For all your code, you may only import/use the following:

other modules (python files) that you write
sqlite3 (from the standard library)

Part 1 - Making and populating a Database

Consider the following table of data (column headers are in bold):

s	x	y
'g'	3	9
'v'	5	7
'f'	8	7

Using the standard sqlite3 module:

Open a connection to a new (blank) database file called demo_data.sqlite3
Make a cursor, and execute an appropriate CREATE TABLE statement to accept the above data (name the table demo)
Write and execute appropriate INSERT INTO statements to add the data (as shown above) to the database

Make sure to commit() so your data is saved! The file size should be non-zero.

Then write the following queries (also with sqlite3) to test the demo database and save the results of these queries under the following variables names:

row_count: How many rows are in the table?
xy_at_least_5: How many rows are there where both x and y are at least 5?
unique_y: How many unique values of y are there (hint - COUNT() can accept a keyword DISTINCT)?

If you print out the values to these variables you may see them displayed in a format something like: [(<some_number>, )]. You can leave your answers in this format, no need to index into the list or tuple to access the integer value directly. We will be testing your code by executing queries against your demo_data.sqlite3 database so CodeGrade will be expecting the answers to be in this "list of tuples" format.

Also, CodeGrade will be attempting to import your demo_data.py file as a module to read the values of row_count, xy_at_least_5, and unique_y. Due to this, please do not declare these variables inside of an if __name__ == '__main__': statement. You can print them out inside of that if statement to ensure that you're getting the correct answers, but please declare these three variables and assign them their values in the main body of the demo_data.py file.

Your code (to reproduce all above steps) should be saved in demo_data.py and turned in along with the generated SQLite database (demo_data.sqlite3).

If you wish, you can write your queries in a queries.py file, but this is not required. If you choose to do this, please submit the queries.py file along with the other two required files.

Part 2 - The Northwind Database - Basic Queries

Using sqlite3, connect to the given northwind_small.sqlite3 database.

Above is an entity-relationship diagram - a picture summarizing the schema and relationships in the database. Note that it was generated using Microsoft Access, and some of the specific table/field names are different in the provided data. You can see all the tables available to SQLite as follows:

>>> curs.execute("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;").fetchall()
[('Category',), ('Customer',), ('CustomerCustomerDemo',),
('CustomerDemographic',), ('Employee',), ('EmployeeTerritory',), ('Order',),
('OrderDetail',), ('Product',), ('Region',), ('Shipper',), ('Supplier',),
('Territory',)]

Warning: unlike the diagram, the tables in SQLite are singular and not plural (do not end in s). And you can see the schema (CREATE TABLE statement) behind any given table with:

>>> curs.execute('SELECT sql FROM sqlite_master WHERE name="Customer";').fetchall()
[('CREATE TABLE "Customer" \n(\n  "Id" VARCHAR(8000) PRIMARY KEY, \n
"CompanyName" VARCHAR(8000) NULL, \n  "ContactName" VARCHAR(8000) NULL, \n
"ContactTitle" VARCHAR(8000) NULL, \n  "Address" VARCHAR(8000) NULL, \n  "City"
VARCHAR(8000) NULL, \n  "Region" VARCHAR(8000) NULL, \n  "PostalCode"
VARCHAR(8000) NULL, \n  "Country" VARCHAR(8000) NULL, \n  "Phone" VARCHAR(8000)
NULL, \n  "Fax" VARCHAR(8000) NULL \n)',)]

In particular note that the primary key is Id, and not CustomerId or SupplierId. On other tables (where it is a foreign key) it will be CustomerId. Also note - the Order table conflicts with the ORDER keyword! We'll just avoid that particular table, but it's a good lesson in the danger of keyword conflicts.

Answer the following questions (each is from a single table) and then save each query under the following variable name:

expensive_items: What are the ten most expensive items (per unit price) in the database? Please return all columns in the table, not just the price and name but all columns.
avg_hire_age: What is the average age of an employee at the time of their hiring? (Hint: a lot of arithmetic works with dates.)

Your code (to load and query the data) should be saved in northwind.py. PLEASE NOTE that in parts 2 and 3 you will be saving the query itself to the indicated variable names, not the results of the query. You can still execute the query to check your work, but please save the query string itself to the variable names above. You're also encouraged to include the output from your queries as docstring comments, to facilitate grading and feedback, but this will not be checked by the autograder. Again, please make sure that these queries are declared in the main body of the northwind.py file and not in an if __name__ == "__main__": if-statement.

Part 3 - Sailing the Northwind Seas (JOIN statements)

You've answered some basic questions from the Northwind database, looking at individual tables - now it's time to put things together with JOIN!

Using sqlite3 in northwind.py, answer the following:

ten_most_expensive: What are the ten most expensive items (per unit price) in the database and their suppliers? Please return only the ProductName, UnitPrice, and CompanyName columns.
largest_category: What is the largest category (by number of unique products in it)?

Your code (to load and query the data) should be saved in northwind.py. PLEASE NOTE that in parts 2 and 3 you will be saving the query itself to the indicated variable names, not the results of the query. You can still execute the query to check your work, but please save the query string itself to the variable names above. You're also encouraged to include the output from your queries as docstring comments to facilitate grading and feedback, but this will not be checked by the autograder. Again, please make sure that these queries are declared in the main body of the northwind.py file and not in an if __name__ == "__main__": if-statement.

Part 4 - Turn it in!

Submit the following files:

demo_data.py
northwind.py
demo_data.sqlite3
northwind_small.sqlite3

You're also encouraged to include the output from your queries as docstring comments, to facilitate grading and feedback. Thanks for your hard work!

Challenge Expectations

The Sprint Challenge is designed to test your mastery of the following key concepts:

Database Creation and Management: Creating SQLite databases, tables, and populating them with data
SQL Query Writing: Writing effective SELECT statements with filtering, aggregation, and sorting
Multi-table Operations: Using JOIN statements to combine data from multiple related tables
Data Analysis with SQL: Performing analytical queries to extract meaningful insights from databases
Database Schema Understanding: Reading and interpreting entity-relationship diagrams and table structures

What to Expect

In this sprint challenge, you'll apply everything you've learned about databases and SQL to work with real-world data scenarios. This challenge will test your ability to:

Create and populate SQLite databases from scratch using Python
Write complex SQL queries to extract specific information from databases
Use aggregate functions like COUNT, AVG, and DISTINCT for data analysis
Perform JOIN operations to combine data from multiple tables
Work with the Northwind database, a classic example used in database education
Handle database connections, cursors, and transactions properly
Save query results in the appropriate format for automated testing
Structure your code professionally with proper variable declarations

Remember to demonstrate your understanding of the concepts from all four modules in this sprint: SQL fundamentals, advanced analysis techniques, NoSQL concepts, and database scalability considerations!

Sprint Challenge: Databases and SQL