Sprint Challenge: Databases and SQL
Sprint Challenge Overview
This sprint challenge will assess your understanding of the concepts covered throughout this sprint on Databases and SQL. You'll demonstrate your ability to create and populate SQLite databases, write complex SQL queries for data analysis, work with multi-table relationships using JOINs, and understand the differences between SQL and NoSQL database systems.
Challenge Instructions
Databases and SQL
In this sprint challenge you will write code to interact with SQLite databases. You may use any tools and references you wish, but your final code should reflect your work and be saved in .py
files (not notebooks), and be uploaded below.
For all your code, you may only import/use the following:
- other modules (python files) that you write
sqlite3
(from the standard library)
Part 1 - Making and populating a Database
Consider the following table of data (column headers are in bold):
s | x | y |
'g' | 3 | 9 |
'v' | 5 | 7 |
'f' | 8 | 7 |
Using the standard sqlite3
module:
- Open a connection to a new (blank) database file called
demo_data.sqlite3
- Make a cursor, and execute an appropriate
CREATE TABLE
statement to accept the above data (name the tabledemo
) - Write and execute appropriate
INSERT INTO
statements to add the data (as shown above) to the database
Make sure to commit()
so your data is saved! The file size should be non-zero.
Then write the following queries (also with sqlite3
) to test the demo database and save the results of these queries under the following variables names:
row_count
: How many rows are in the table?xy_at_least_5
: How many rows are there where bothx
andy
are at least 5?unique_y
: How many unique values ofy
are there (hint -COUNT()
can accept a keywordDISTINCT
)?
If you print out the values to these variables you may see them displayed in a format something like: [(<some_number>, )]
. You can leave your answers in this format, no need to index into the list or tuple to access the integer value directly. We will be testing your code by executing queries against your demo_data.sqlite3
database so CodeGrade will be expecting the answers to be in this "list of tuples" format.
Also, CodeGrade will be attempting to import your demo_data.py
file as a module to read the values of row_count
, xy_at_least_5
, and unique_y
. Due to this, please do not declare these variables inside of an if __name__ == '__main__':
statement. You can print them out inside of that if statement to ensure that you're getting the correct answers, but please declare these three variables and assign them their values in the main body of the demo_data.py
file.
Your code (to reproduce all above steps) should be saved in demo_data.py
and turned in along with the generated SQLite database (demo_data.sqlite3
).
If you wish, you can write your queries in a queries.py
file, but this is not required. If you choose to do this, please submit the queries.py
file along with the other two required files.
Part 2 - The Northwind Database - Basic Queries
Using sqlite3
, connect to the given northwind_small.sqlite3
database.

Above is an entity-relationship diagram - a picture summarizing the schema and relationships in the database. Note that it was generated using Microsoft Access, and some of the specific table/field names are different in the provided data. You can see all the tables available to SQLite as follows:
>>> curs.execute("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;").fetchall()
[('Category',), ('Customer',), ('CustomerCustomerDemo',),
('CustomerDemographic',), ('Employee',), ('EmployeeTerritory',), ('Order',),
('OrderDetail',), ('Product',), ('Region',), ('Shipper',), ('Supplier',),
('Territory',)]
Warning: unlike the diagram, the tables in SQLite are singular and not plural (do not end in s
). And you can see the schema (CREATE TABLE
statement) behind any given table with:
>>> curs.execute('SELECT sql FROM sqlite_master WHERE name="Customer";').fetchall()
[('CREATE TABLE "Customer" \n(\n "Id" VARCHAR(8000) PRIMARY KEY, \n
"CompanyName" VARCHAR(8000) NULL, \n "ContactName" VARCHAR(8000) NULL, \n
"ContactTitle" VARCHAR(8000) NULL, \n "Address" VARCHAR(8000) NULL, \n "City"
VARCHAR(8000) NULL, \n "Region" VARCHAR(8000) NULL, \n "PostalCode"
VARCHAR(8000) NULL, \n "Country" VARCHAR(8000) NULL, \n "Phone" VARCHAR(8000)
NULL, \n "Fax" VARCHAR(8000) NULL \n)',)]
In particular note that the primary key is Id
, and not CustomerId
or SupplierId
. On other tables (where it is a foreign key) it will be CustomerId
. Also note - the Order
table conflicts with the ORDER
keyword! We'll just avoid that particular table, but it's a good lesson in the danger of keyword conflicts.
Answer the following questions (each is from a single table) and then save each query under the following variable name:
expensive_items
: What are the ten most expensive items (per unit price) in the database? Please return all columns in the table, not just the price and name but all columns.avg_hire_age
: What is the average age of an employee at the time of their hiring? (Hint: a lot of arithmetic works with dates.)
Your code (to load and query the data) should be saved in northwind.py
. PLEASE NOTE that in parts 2 and 3 you will be saving the query itself to the indicated variable names, not the results of the query. You can still execute the query to check your work, but please save the query string itself to the variable names above. You're also encouraged to include the output from your queries as docstring comments, to facilitate grading and feedback, but this will not be checked by the autograder. Again, please make sure that these queries are declared in the main body of the northwind.py
file and not in an if __name__ == "__main__":
if-statement.
Part 3 - Sailing the Northwind Seas (JOIN statements)
You've answered some basic questions from the Northwind database, looking at individual tables - now it's time to put things together with JOIN
!
Using sqlite3
in northwind.py
, answer the following:
ten_most_expensive
: What are the ten most expensive items (per unit price) in the database and their suppliers? Please return only the ProductName, UnitPrice, and CompanyName columns.largest_category
: What is the largest category (by number of unique products in it)?
Your code (to load and query the data) should be saved in northwind.py
. PLEASE NOTE that in parts 2 and 3 you will be saving the query itself to the indicated variable names, not the results of the query. You can still execute the query to check your work, but please save the query string itself to the variable names above. You're also encouraged to include the output from your queries as docstring comments to facilitate grading and feedback, but this will not be checked by the autograder. Again, please make sure that these queries are declared in the main body of the northwind.py
file and not in an if __name__ == "__main__":
if-statement.
Part 4 - Turn it in!
Submit the following files:
demo_data.py
northwind.py
demo_data.sqlite3
northwind_small.sqlite3
You're also encouraged to include the output from your queries as docstring comments, to facilitate grading and feedback. Thanks for your hard work!
Challenge Expectations
The Sprint Challenge is designed to test your mastery of the following key concepts:
- Database Creation and Management: Creating SQLite databases, tables, and populating them with data
- SQL Query Writing: Writing effective SELECT statements with filtering, aggregation, and sorting
- Multi-table Operations: Using JOIN statements to combine data from multiple related tables
- Data Analysis with SQL: Performing analytical queries to extract meaningful insights from databases
- Database Schema Understanding: Reading and interpreting entity-relationship diagrams and table structures
What to Expect
In this sprint challenge, you'll apply everything you've learned about databases and SQL to work with real-world data scenarios. This challenge will test your ability to:
- Create and populate SQLite databases from scratch using Python
- Write complex SQL queries to extract specific information from databases
- Use aggregate functions like COUNT, AVG, and DISTINCT for data analysis
- Perform JOIN operations to combine data from multiple tables
- Work with the Northwind database, a classic example used in database education
- Handle database connections, cursors, and transactions properly
- Save query results in the appropriate format for automated testing
- Structure your code professionally with proper variable declarations
Remember to demonstrate your understanding of the concepts from all four modules in this sprint: SQL fundamentals, advanced analysis techniques, NoSQL concepts, and database scalability considerations!