Indexing

An index is a powerful data structure that allows the database to find specific rows significantly faster than scanning the entire table. Think of it as the index at the back of a textbook: instead of reading every page to find “Subqueries,” you look it up in the alphabetical list and jump straight to the correct page.

Definition: The Index

An index is a pointer to data in a table. It is a separate physical structure that stores a sorted version of one or more columns, along with a “Row ID” (the physical address of the full row).

Context: The Read/Write Trade-off

Every time you add an index:

Read Speed Increases: Queries with WHERE or JOIN on that column become much faster.
Write Speed Decreases: When you INSERT a row, the database must update the table and every index associated with it.
Storage Increases: Indexes take up disk space, sometimes as much as the table itself.

Detailed Explanation: Types of Indexes

1. B-Tree Indexes (The Default)

B-Tree (Balanced Tree) is the most common index type. It keeps data sorted and allows for searches, sequential access, and range queries (e.g., BETWEEN or >).

How it works: It uses a tree structure where each “node” leads to a narrower range of values until it hits a “leaf” containing the Row ID.
Efficiency: $O(\log n)$. Finding a row in 1 billion records takes about 30 “hops.”

2. Hash Indexes

Hash indexes use a hash table to map values to Row IDs.

Efficiency: $O(1)$. It is near-instant.
Limitation: It only works for equality (=). You cannot use a Hash index for range searches (>) or sorting.

3. Clustered vs. Non-Clustered

Clustered Index: This is the table itself. The data is physically stored on the disk in the order of the clustered index. A table can only have one clustered index (usually the Primary Key).
Non-Clustered Index: A separate structure that points back to the clustered index or the physical row address. You can have many non-clustered indexes.

Example: Creating an Index

-- Create a unique index on Email
CREATE UNIQUE INDEX idx_user_email ON Users(Email);

-- Create a composite index for searching by Last then First name
CREATE INDEX idx_staff_name ON Staff(LastName, FirstName);

Under the Hood: Index Seek vs. Index Scan

When the database uses an index, it can do two things:

Index Seek: The engine uses the B-Tree to jump straight to the relevant row. This is the “Gold Standard” of performance.
Index Scan: The engine reads the entire index from start to finish. This is faster than a Table Scan because the index is smaller, but it’s still slower than a Seek.

The Composite Index “Left-to-Right” Rule

If you have an index on (LastName, FirstName), the index is sorted primarily by LastName.

WHERE LastName = 'Smith' $ ightarrow$ Seek (Fast).
WHERE LastName = 'Smith' AND FirstName = 'John' $ ightarrow$ Seek (Very Fast).
WHERE FirstName = 'John' $ ightarrow$ Scan (Slow). The database cannot use the index effectively because it’s not sorted by First Name first.