Semi-Structured Data
While SQL is traditionally for “Structured” data (rows and columns), modern databases can also store and query “Semi-Structured” data, primarily in the form of JSON (JavaScript Object Notation). This allows you to combine the rigidity of a schema with the flexibility of a document store.
Definition: JSON and JSONB
Section titled “Definition: JSON and JSONB”Most modern databases (like PostgreSQL) support two types of JSON:
JSON: Stores the exact text of the JSON blob. Every time you query it, the database has to re-parse the text.JSONB(JSON Binary): Stores the data in a decomposed binary format. It is slower to insert but much faster to query because it supports Indexing.
Context: Why use JSON in SQL?
Section titled “Context: Why use JSON in SQL?”- Flexible Metadata: Storing user preferences or dynamic attributes that change frequently.
- API Integration: Storing raw responses from third-party services.
- E-commerce: Storing product specifications that differ wildly between categories (e.g., a “Laptop” has a CPU, but a “Shirt” has a Fabric).
Detailed Explanation: Extraction Syntax (Postgres)
Section titled “Detailed Explanation: Extraction Syntax (Postgres)”PostgreSQL uses specific operators to reach into JSON objects:
->: Returns a JSON object/array.->>: Returns the value as Text.
SELECT data->'user' AS UserObject, data->>'email' AS EmailTextFROM RawLogs;Example: Querying a Metadata Column
Section titled “Example: Querying a Metadata Column”We have a Products table with a Metadata JSONB column:
| ID | Name | Metadata |
|---|---|---|
| 1 | Laptop | {"color": "silver", "ports": 4, "ram": "16GB"} |
| 2 | Mouse | {"color": "black", "wireless": true} |
Task: Find the color of all products and filter for wireless items.
Section titled “Task: Find the color of all products and filter for wireless items.”SELECT Name, Metadata->>'color' AS ColorFROM ProductsWHERE (Metadata->>'wireless')::BOOLEAN = true;Under the Hood: GIN Indexes
Section titled “Under the Hood: GIN Indexes”How do you make JSON queries fast? You can’t use a standard B-Tree index because the data isn’t a single value.
Instead, we use a GIN (Generalized Inverted Index).
- A GIN index breaks the JSON blob into its individual keys and values and maps them back to the row ID.
- This allows you to run a query like “Find all rows where the metadata contains the key ‘color’” in milliseconds, even on millions of rows.
Shredding (JSON to Rows)
Section titled “Shredding (JSON to Rows)”Sometimes you have a JSON array inside a column and you want to turn it into regular SQL rows. We use the jsonb_array_elements() function for this.
SELECT jsonb_array_elements(Metadata->'tags') FROM Products;