Sprint Challenge: Databases and SQL

Sprint Challenge Overview

This sprint challenge will assess your understanding of the concepts covered throughout this sprint on Databases and SQL. You'll demonstrate your ability to create and populate SQLite databases, write complex SQL queries for data analysis, work with multi-table relationships using JOINs, and understand the differences between SQL and NoSQL database systems.

Challenge Instructions

Databases and SQL

In this sprint challenge you will write code to interact with SQLite databases. You may use any tools and references you wish, but your final code should reflect your work and be saved in .py files (not notebooks), and be uploaded below.

For all your code, you may only import/use the following:

Part 1 - Making and populating a Database

Consider the following table of data (column headers are in bold):

s x y
'g' 3 9
'v' 5 7
'f' 8 7

Using the standard sqlite3 module:

Make sure to commit() so your data is saved! The file size should be non-zero.

Then write the following queries (also with sqlite3) to test the demo database and save the results of these queries under the following variables names:

If you print out the values to these variables you may see them displayed in a format something like: [(<some_number>, )]. You can leave your answers in this format, no need to index into the list or tuple to access the integer value directly. We will be testing your code by executing queries against your demo_data.sqlite3 database so CodeGrade will be expecting the answers to be in this "list of tuples" format.

Also, CodeGrade will be attempting to import your demo_data.py file as a module to read the values of row_count, xy_at_least_5, and unique_y. Due to this, please do not declare these variables inside of an if __name__ == '__main__': statement. You can print them out inside of that if statement to ensure that you're getting the correct answers, but please declare these three variables and assign them their values in the main body of the demo_data.py file.

Your code (to reproduce all above steps) should be saved in demo_data.py and turned in along with the generated SQLite database (demo_data.sqlite3).

If you wish, you can write your queries in a queries.py file, but this is not required. If you choose to do this, please submit the queries.py file along with the other two required files.

Part 2 - The Northwind Database - Basic Queries

Using sqlite3, connect to the given northwind_small.sqlite3 database.

Northwind ERD

Above is an entity-relationship diagram - a picture summarizing the schema and relationships in the database. Note that it was generated using Microsoft Access, and some of the specific table/field names are different in the provided data. You can see all the tables available to SQLite as follows:

>>> curs.execute("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;").fetchall()
[('Category',), ('Customer',), ('CustomerCustomerDemo',),
('CustomerDemographic',), ('Employee',), ('EmployeeTerritory',), ('Order',),
('OrderDetail',), ('Product',), ('Region',), ('Shipper',), ('Supplier',),
('Territory',)]

Warning: unlike the diagram, the tables in SQLite are singular and not plural (do not end in s). And you can see the schema (CREATE TABLE statement) behind any given table with:

>>> curs.execute('SELECT sql FROM sqlite_master WHERE name="Customer";').fetchall()
[('CREATE TABLE "Customer" \n(\n  "Id" VARCHAR(8000) PRIMARY KEY, \n
"CompanyName" VARCHAR(8000) NULL, \n  "ContactName" VARCHAR(8000) NULL, \n
"ContactTitle" VARCHAR(8000) NULL, \n  "Address" VARCHAR(8000) NULL, \n  "City"
VARCHAR(8000) NULL, \n  "Region" VARCHAR(8000) NULL, \n  "PostalCode"
VARCHAR(8000) NULL, \n  "Country" VARCHAR(8000) NULL, \n  "Phone" VARCHAR(8000)
NULL, \n  "Fax" VARCHAR(8000) NULL \n)',)]

In particular note that the primary key is Id, and not CustomerId or SupplierId. On other tables (where it is a foreign key) it will be CustomerId. Also note - the Order table conflicts with the ORDER keyword! We'll just avoid that particular table, but it's a good lesson in the danger of keyword conflicts.

Answer the following questions (each is from a single table) and then save each query under the following variable name:

Your code (to load and query the data) should be saved in northwind.py. PLEASE NOTE that in parts 2 and 3 you will be saving the query itself to the indicated variable names, not the results of the query. You can still execute the query to check your work, but please save the query string itself to the variable names above. You're also encouraged to include the output from your queries as docstring comments, to facilitate grading and feedback, but this will not be checked by the autograder. Again, please make sure that these queries are declared in the main body of the northwind.py file and not in an if __name__ == "__main__": if-statement.

Part 3 - Sailing the Northwind Seas (JOIN statements)

You've answered some basic questions from the Northwind database, looking at individual tables - now it's time to put things together with JOIN!

Using sqlite3 in northwind.py, answer the following:

Your code (to load and query the data) should be saved in northwind.py. PLEASE NOTE that in parts 2 and 3 you will be saving the query itself to the indicated variable names, not the results of the query. You can still execute the query to check your work, but please save the query string itself to the variable names above. You're also encouraged to include the output from your queries as docstring comments to facilitate grading and feedback, but this will not be checked by the autograder. Again, please make sure that these queries are declared in the main body of the northwind.py file and not in an if __name__ == "__main__": if-statement.

Part 4 - Turn it in!

Submit the following files:

You're also encouraged to include the output from your queries as docstring comments, to facilitate grading and feedback. Thanks for your hard work!

Challenge Expectations

The Sprint Challenge is designed to test your mastery of the following key concepts:

What to Expect

In this sprint challenge, you'll apply everything you've learned about databases and SQL to work with real-world data scenarios. This challenge will test your ability to:

Remember to demonstrate your understanding of the concepts from all four modules in this sprint: SQL fundamentals, advanced analysis techniques, NoSQL concepts, and database scalability considerations!

Additional Resources