Skip to content

The Relational Model

The foundation of SQL lies in the Relational Model, a mathematical approach to managing data proposed by Edgar F. Codd in 1970. To master SQL, you must first understand how data is organized and, more importantly, how to think about that data.

In simple terms, the Relational Model represents data as a collection of relations. In the world of SQL, we call these relations Tables.

A table is a structured set of data consisting of:

  • Columns (Attributes): The categories of data being stored (e.g., Email, OrderDate, Price).
  • Rows (Tuples): Individual records or instances of those categories (e.g., a specific customer or a single transaction).

Before the Relational Model, data was often stored in hierarchical or network structures where you had to know the physical path to the data to retrieve it. If the structure changed, your code broke.

The Relational Model decouples the logical structure (how we see the data) from the physical storage (how the computer saves it). This allows you to ask for “All customers from New York” without caring whether they are stored on a hard drive in California or a cloud server in Ireland.

The biggest hurdle for most programmers is moving from Imperative Thinking to Declarative Set-Based Thinking.

In languages like Python or Java, you often use loops to process data:

  1. Start at the first row.
  2. Check if the city is “New York”.
  3. If yes, add it to a list.
  4. Move to the next row.
  5. Repeat until the end.

In SQL, you describe the result set you want: “Give me the subset of the ‘Customers’ table where the ‘City’ attribute is ‘New York’.”

The database engine handles the “how”—it might use an index, scan the whole table, or use multiple processor cores. Your job is only to define the criteria.

Let’s look at a sample table called Employees:

EmployeeIDFirstNameDepartmentSalary
101AliceEngineering95000
102BobMarketing72000
103CharlieEngineering88000

The schema is the blueprint of the table. It defines the names of the columns and the type of data they can hold (e.g., Integers for EmployeeID, Strings for FirstName).

  • Cardinality: The number of rows in the table (In our example: 3).
  • Degree (Arity): The number of columns in the table (In our example: 4).

Every table should have a Primary Key—a column (or set of columns) that uniquely identifies each row. No two rows can have the same Primary Key. In our example, EmployeeID is the Primary Key.

Imagine two sets of data:

  • Set A: All Employees.
  • Set B: All people in the Engineering Department.

A SQL query to find Engineering employees is essentially finding the Intersection of these sets or selecting a Subset based on a predicate (a condition).

Subset Selection
-- This is how we express that set-based requirement in SQL
SELECT *
FROM Employees
WHERE Department = 'Engineering';

The result is a new, virtual table containing only the rows that meet the criteria:

EmployeeIDFirstNameDepartmentSalary
101AliceEngineering95000
103CharlieEngineering88000

SQL is based on Relational Algebra and Tuple Relational Calculus. When you write a query, the database’s “Query Optimizer” converts your declarative statement into a series of algebraic operations (like Selection, Projection, and Join).

Because tables are mathematically “sets,” the order of rows in a table technically doesn’t matter. Unless you explicitly ask the database to sort the results (using ORDER BY), you should never rely on the order in which rows are returned. They are a bag of data, not a sorted list.