String Manipulation
Text data is often “dirty”—containing extra spaces, inconsistent casing, or combined information that needs to be separated. SQL provides a suite of scalar functions to handle these common text processing tasks.
Definition: String Functions
Section titled “Definition: String Functions”String functions are built-in tools that accept text as input and return modified text or metadata (like length) as output.
Context: Dialect Fragmentation
Section titled “Context: Dialect Fragmentation”String functions are unfortunately one of the least standardized parts of SQL. While the logic remains the same, the function names and argument orders often differ between PostgreSQL, SQL Server, and MySQL.
Detailed Explanation: Common Functions
Section titled “Detailed Explanation: Common Functions”1. Concatenation (Joining Strings)
Section titled “1. Concatenation (Joining Strings)”Combining two or more strings into one.
- ANSI/Postgres:
First || ' ' || Last - SQL Server:
First + ' ' + Last - MySQL:
CONCAT(First, ' ', Last)
2. Substring (Extracting Parts)
Section titled “2. Substring (Extracting Parts)”Taking a slice of a string.
SUBSTRING(string FROM start FOR length)- Example:
SUBSTRING('Technogic' FROM 1 FOR 4)returns'Tech'.
3. Cleaning (Trim, Upper, Lower)
Section titled “3. Cleaning (Trim, Upper, Lower)”TRIM(' text ')$ ightarrow$'text'UPPER('sql')$ ightarrow$'SQL'LOWER('SQL')$ ightarrow$'sql'
4. Length
Section titled “4. Length”Returns the number of characters in a string.
LENGTH('Data')returns4.
5. Replace
Section titled “5. Replace”Swaps a specific substring with another.
REPLACE('01-01-2023', '-', '/')$ ightarrow$'01/01/2023'
Example: Cleaning a Customer List
Section titled “Example: Cleaning a Customer List”We have a RawCustomers table:
| FullName | EmailAddress |
|---|---|
| alice smith | ALICE@GMAIL.COM |
| Bob Jones | bob.j@work.co |
Task: Format names to Uppercase and extract the email domain.
Section titled “Task: Format names to Uppercase and extract the email domain.”SELECT UPPER(TRIM(FullName)) AS CleanName, SUBSTRING(EmailAddress FROM POSITION('@' IN EmailAddress) + 1) AS DomainFROM RawCustomers;Output
Section titled “Output”| CleanName | Domain |
|---|---|
| ALICE SMITH | GMAIL.COM |
| BOB JONES | work.co |
Under the Hood: String Storage and Encoding
Section titled “Under the Hood: String Storage and Encoding”When you manipulate strings, you are interacting with the database’s Collation and Encoding.
1. Collation
Section titled “1. Collation”Collation determines how text is compared and sorted. It defines if 'a' is equal to 'A' (Case-Insensitivity) and if 'a' comes before 'b'. If you run a string function in a WHERE clause (e.g., WHERE LOWER(Email) = 'test@test.com'), the database might not be able to use a standard index because the index was built on the original casing, not the lowercase version.
2. Fixed vs. Variable Length
Section titled “2. Fixed vs. Variable Length”CHAR(10): Always takes 10 bytes. If you store ‘Hi’, it adds 8 spaces.VARCHAR(10): Only takes as much space as the text + a few bytes for length metadata.
When using string functions, CHAR columns often require TRIM() before they can be successfully compared or concatenated without trailing spaces causing issues.