Skip to content

Indexing

An index is a powerful data structure that allows the database to find specific rows significantly faster than scanning the entire table. Think of it as the index at the back of a textbook: instead of reading every page to find “Subqueries,” you look it up in the alphabetical list and jump straight to the correct page.

An index is a pointer to data in a table. It is a separate physical structure that stores a sorted version of one or more columns, along with a “Row ID” (the physical address of the full row).

Every time you add an index:

  1. Read Speed Increases: Queries with WHERE or JOIN on that column become much faster.
  2. Write Speed Decreases: When you INSERT a row, the database must update the table and every index associated with it.
  3. Storage Increases: Indexes take up disk space, sometimes as much as the table itself.

B-Tree (Balanced Tree) is the most common index type. It keeps data sorted and allows for searches, sequential access, and range queries (e.g., BETWEEN or >).

  • How it works: It uses a tree structure where each “node” leads to a narrower range of values until it hits a “leaf” containing the Row ID.
  • Efficiency: $O(\log n)$. Finding a row in 1 billion records takes about 30 “hops.”

Hash indexes use a hash table to map values to Row IDs.

  • Efficiency: $O(1)$. It is near-instant.
  • Limitation: It only works for equality (=). You cannot use a Hash index for range searches (>) or sorting.
  • Clustered Index: This is the table itself. The data is physically stored on the disk in the order of the clustered index. A table can only have one clustered index (usually the Primary Key).
  • Non-Clustered Index: A separate structure that points back to the clustered index or the physical row address. You can have many non-clustered indexes.
Indexing Syntax
-- Create a unique index on Email
CREATE UNIQUE INDEX idx_user_email ON Users(Email);
-- Create a composite index for searching by Last then First name
CREATE INDEX idx_staff_name ON Staff(LastName, FirstName);

When the database uses an index, it can do two things:

  1. Index Seek: The engine uses the B-Tree to jump straight to the relevant row. This is the “Gold Standard” of performance.
  2. Index Scan: The engine reads the entire index from start to finish. This is faster than a Table Scan because the index is smaller, but it’s still slower than a Seek.

The Composite Index “Left-to-Right” Rule

Section titled “The Composite Index “Left-to-Right” Rule”

If you have an index on (LastName, FirstName), the index is sorted primarily by LastName.

  • WHERE LastName = 'Smith' $ ightarrow$ Seek (Fast).
  • WHERE LastName = 'Smith' AND FirstName = 'John' $ ightarrow$ Seek (Very Fast).
  • WHERE FirstName = 'John' $ ightarrow$ Scan (Slow). The database cannot use the index effectively because it’s not sorted by First Name first.