Skip to content

Semi-Structured Data

While SQL is traditionally for “Structured” data (rows and columns), modern databases can also store and query “Semi-Structured” data, primarily in the form of JSON (JavaScript Object Notation). This allows you to combine the rigidity of a schema with the flexibility of a document store.

Most modern databases (like PostgreSQL) support two types of JSON:

  1. JSON: Stores the exact text of the JSON blob. Every time you query it, the database has to re-parse the text.
  2. JSONB (JSON Binary): Stores the data in a decomposed binary format. It is slower to insert but much faster to query because it supports Indexing.
  • Flexible Metadata: Storing user preferences or dynamic attributes that change frequently.
  • API Integration: Storing raw responses from third-party services.
  • E-commerce: Storing product specifications that differ wildly between categories (e.g., a “Laptop” has a CPU, but a “Shirt” has a Fabric).

Detailed Explanation: Extraction Syntax (Postgres)

Section titled “Detailed Explanation: Extraction Syntax (Postgres)”

PostgreSQL uses specific operators to reach into JSON objects:

  • ->: Returns a JSON object/array.
  • ->>: Returns the value as Text.
JSON Extraction
SELECT
data->'user' AS UserObject,
data->>'email' AS EmailText
FROM RawLogs;

We have a Products table with a Metadata JSONB column:

IDNameMetadata
1Laptop{"color": "silver", "ports": 4, "ram": "16GB"}
2Mouse{"color": "black", "wireless": true}

Task: Find the color of all products and filter for wireless items.

Section titled “Task: Find the color of all products and filter for wireless items.”
JSON Filtering
SELECT
Name,
Metadata->>'color' AS Color
FROM Products
WHERE (Metadata->>'wireless')::BOOLEAN = true;

How do you make JSON queries fast? You can’t use a standard B-Tree index because the data isn’t a single value.

Instead, we use a GIN (Generalized Inverted Index).

  • A GIN index breaks the JSON blob into its individual keys and values and maps them back to the row ID.
  • This allows you to run a query like “Find all rows where the metadata contains the key ‘color’” in milliseconds, even on millions of rows.

Sometimes you have a JSON array inside a column and you want to turn it into regular SQL rows. We use the jsonb_array_elements() function for this.

JSON Unnesting
SELECT jsonb_array_elements(Metadata->'tags') FROM Products;