Indexing
An index is a powerful data structure that allows the database to find specific rows significantly faster than scanning the entire table. Think of it as the index at the back of a textbook: instead of reading every page to find “Subqueries,” you look it up in the alphabetical list and jump straight to the correct page.
Definition: The Index
Section titled “Definition: The Index”An index is a pointer to data in a table. It is a separate physical structure that stores a sorted version of one or more columns, along with a “Row ID” (the physical address of the full row).
Context: The Read/Write Trade-off
Section titled “Context: The Read/Write Trade-off”Every time you add an index:
- Read Speed Increases: Queries with
WHEREorJOINon that column become much faster. - Write Speed Decreases: When you
INSERTa row, the database must update the table and every index associated with it. - Storage Increases: Indexes take up disk space, sometimes as much as the table itself.
Detailed Explanation: Types of Indexes
Section titled “Detailed Explanation: Types of Indexes”1. B-Tree Indexes (The Default)
Section titled “1. B-Tree Indexes (The Default)”B-Tree (Balanced Tree) is the most common index type. It keeps data sorted and allows for searches, sequential access, and range queries (e.g., BETWEEN or >).
- How it works: It uses a tree structure where each “node” leads to a narrower range of values until it hits a “leaf” containing the Row ID.
- Efficiency: $O(\log n)$. Finding a row in 1 billion records takes about 30 “hops.”
2. Hash Indexes
Section titled “2. Hash Indexes”Hash indexes use a hash table to map values to Row IDs.
- Efficiency: $O(1)$. It is near-instant.
- Limitation: It only works for equality (
=). You cannot use a Hash index for range searches (>) or sorting.
3. Clustered vs. Non-Clustered
Section titled “3. Clustered vs. Non-Clustered”- Clustered Index: This is the table itself. The data is physically stored on the disk in the order of the clustered index. A table can only have one clustered index (usually the Primary Key).
- Non-Clustered Index: A separate structure that points back to the clustered index or the physical row address. You can have many non-clustered indexes.
Example: Creating an Index
Section titled “Example: Creating an Index”-- Create a unique index on EmailCREATE UNIQUE INDEX idx_user_email ON Users(Email);
-- Create a composite index for searching by Last then First nameCREATE INDEX idx_staff_name ON Staff(LastName, FirstName);Under the Hood: Index Seek vs. Index Scan
Section titled “Under the Hood: Index Seek vs. Index Scan”When the database uses an index, it can do two things:
- Index Seek: The engine uses the B-Tree to jump straight to the relevant row. This is the “Gold Standard” of performance.
- Index Scan: The engine reads the entire index from start to finish. This is faster than a Table Scan because the index is smaller, but it’s still slower than a Seek.
The Composite Index “Left-to-Right” Rule
Section titled “The Composite Index “Left-to-Right” Rule”If you have an index on (LastName, FirstName), the index is sorted primarily by LastName.
WHERE LastName = 'Smith'$ ightarrow$ Seek (Fast).WHERE LastName = 'Smith' AND FirstName = 'John'$ ightarrow$ Seek (Very Fast).WHERE FirstName = 'John'$ ightarrow$ Scan (Slow). The database cannot use the index effectively because it’s not sorted by First Name first.