Skip to content

Set Theory Basics

SQL is based on Relational Algebra, which is a specialized branch of Set Theory. To understand how JOINs and Set Operators work, you must first understand the three core ways two sets of data can interact.

In the context of a database, a Set is a collection of distinct objects. Every table can be viewed as a set of rows.

Imagine we have two sets of data:

  • Set A: 3
  • Set B: 5

Combining all elements from both sets. In pure set theory, duplicates are removed.

  • Result: 5

Finding only the elements that exist in both sets.

  • Result: 3

Finding elements that are in Set A but not in Set B.

  • Result: 2

Detailed Explanation: Relational Cardinality

Section titled “Detailed Explanation: Relational Cardinality”

When we combine two tables, we must consider the Cardinality (the relationship) between them. This determines how many rows the resulting “set” will have.

Each row in Table A matches exactly one row in Table B.

  • Example: Users and UserProfiles.

A single row in Table A can match multiple rows in Table B.

  • Example: A Customer can have many Orders.

Multiple rows in Table A can match multiple rows in Table B.

  • Example: Students and Courses. (A student takes many courses; a course has many students).
  • Note: In database design, M:M relationships are usually handled by a “Junction Table” in the middle.

Example: Conceptualizing Table Interaction

Section titled “Example: Conceptualizing Table Interaction”

Imagine two tables in a library system:

Table: Authors

  • Set of all writers.

Table: Books

  • Set of all books.

If we want to find “Authors who have written at least one book,” we are looking for the Intersection of the set of all people and the set of book creators.

If we want to find “Authors who have never written a book,” we are looking for the Difference between the Authors set and the Books set.

In Set Theory, we use logic to define membership in a set. In SQL, this is the ON clause in a JOIN or the WHERE clause in a query.

When you join Table A and Table B, the database engine is essentially creating a Cartesian Product (every possible combination of rows) and then using your logic to filter out the combinations that don’t belong in the set you requested.

The larger the sets, the more combinations the database has to evaluate. This is why multi-table queries can become slow very quickly if not indexed properly.