Set Theory Basics
SQL is based on Relational Algebra, which is a specialized branch of Set Theory. To understand how JOINs and Set Operators work, you must first understand the three core ways two sets of data can interact.
Definition: What is a Set?
Section titled “Definition: What is a Set?”In the context of a database, a Set is a collection of distinct objects. Every table can be viewed as a set of rows.
Context: The Three Fundamental Operations
Section titled “Context: The Three Fundamental Operations”Imagine we have two sets of data:
- Set A: 3
- Set B: 5
1. Union ($A \cup B$)
Section titled “1. Union ($A \cup B$)”Combining all elements from both sets. In pure set theory, duplicates are removed.
- Result: 5
2. Intersection ($A \cap B$)
Section titled “2. Intersection ($A \cap B$)”Finding only the elements that exist in both sets.
- Result: 3
3. Difference ($A - B$)
Section titled “3. Difference ($A - B$)”Finding elements that are in Set A but not in Set B.
- Result: 2
Detailed Explanation: Relational Cardinality
Section titled “Detailed Explanation: Relational Cardinality”When we combine two tables, we must consider the Cardinality (the relationship) between them. This determines how many rows the resulting “set” will have.
1. One-to-One (1:1)
Section titled “1. One-to-One (1:1)”Each row in Table A matches exactly one row in Table B.
- Example:
UsersandUserProfiles.
2. One-to-Many (1:M)
Section titled “2. One-to-Many (1:M)”A single row in Table A can match multiple rows in Table B.
- Example: A
Customercan have manyOrders.
3. Many-to-Many (M:M)
Section titled “3. Many-to-Many (M:M)”Multiple rows in Table A can match multiple rows in Table B.
- Example:
StudentsandCourses. (A student takes many courses; a course has many students). - Note: In database design, M:M relationships are usually handled by a “Junction Table” in the middle.
Example: Conceptualizing Table Interaction
Section titled “Example: Conceptualizing Table Interaction”Imagine two tables in a library system:
Table: Authors
- Set of all writers.
Table: Books
- Set of all books.
If we want to find “Authors who have written at least one book,” we are looking for the Intersection of the set of all people and the set of book creators.
If we want to find “Authors who have never written a book,” we are looking for the Difference between the Authors set and the Books set.
Under the Hood: The Predicate
Section titled “Under the Hood: The Predicate”In Set Theory, we use logic to define membership in a set. In SQL, this is the ON clause in a JOIN or the WHERE clause in a query.
When you join Table A and Table B, the database engine is essentially creating a Cartesian Product (every possible combination of rows) and then using your logic to filter out the combinations that don’t belong in the set you requested.
Performance Hint
Section titled “Performance Hint”The larger the sets, the more combinations the database has to evaluate. This is why multi-table queries can become slow very quickly if not indexed properly.