Project 3: The Data Engineer
In this project, you will move from consuming data to architecting it. You have been given a messy, denormalized CSV file containing sales records for a global logistics company. Your job is to design and build a normalized Relational Database (3rd Normal Form) to store this data efficiently.
Scenario
Section titled “Scenario”The raw file logistics_dump.csv has 50 columns. Every row contains redundant info like the Customer’s address, the Shipper’s phone number, and the Product’s weight. This is causing data errors and wasting space.
The Goal: Normalization
Section titled “The Goal: Normalization”You need to split this one giant table into at least four related tables:
- Customers (ID, Name, Email, Address)
- Products (ID, Name, Weight, Price)
- Shippers (ID, CompanyName, Phone)
- Orders (ID, Date, CustomerID, ShipperID, Total)
Project Steps
Section titled “Project Steps”-
Draft the Schema
Define the data types and Primary/Foreign Key relationships for your four tables. -
Create the Objects
Write theCREATE TABLEstatements, including all necessaryNOT NULLandUNIQUEconstraints. -
The ETL (Extract, Transform, Load) Process
WriteINSERT INTO ... SELECT DISTINCT ...statements to move data from theRawDumptable into your new normalized tables. -
Enforce Integrity
AddCHECKconstraints to ensure prices are positive andFOREIGN KEYconstraints withON DELETE CASCADErules.
Key Skills Required
Section titled “Key Skills Required”- Data Definition (DDL): Building the table structures.
- Data Manipulation (DML): Using subqueries to populate your new tables without duplicates.
- Integrity Management: Setting up the “guards” that protect your new architecture.
The SQL Challenge: “The Audit Trail”
Section titled “The SQL Challenge: “The Audit Trail””Question:
Section titled “Question:”Create a Trigger or a View that automatically calculates the “Total Weight” for an order by summing the weight of all products associated with that order ID.
CREATE VIEW OrderWeightSummary ASSELECT o.OrderID, SUM(p.Weight * oi.Quantity) AS TotalWeightFROM Orders oJOIN OrderItems oi ON o.OrderID = oi.OrderIDJOIN Products p ON oi.ProductID = p.IDGROUP BY 1;Final Deliverable
Section titled “Final Deliverable”A single SQL script that, when run on a fresh database:
- Creates the schema.
- Populates the data from a temporary table.
- Verifies the integrity of the data (e.g., checking for orphaned records).