Skip to content

String Manipulation

Text data is often “dirty”—containing extra spaces, inconsistent casing, or combined information that needs to be separated. SQL provides a suite of scalar functions to handle these common text processing tasks.

String functions are built-in tools that accept text as input and return modified text or metadata (like length) as output.

String functions are unfortunately one of the least standardized parts of SQL. While the logic remains the same, the function names and argument orders often differ between PostgreSQL, SQL Server, and MySQL.

Combining two or more strings into one.

  • ANSI/Postgres: First || ' ' || Last
  • SQL Server: First + ' ' + Last
  • MySQL: CONCAT(First, ' ', Last)

Taking a slice of a string.

  • SUBSTRING(string FROM start FOR length)
  • Example: SUBSTRING('Technogic' FROM 1 FOR 4) returns 'Tech'.
  • TRIM(' text ') $ ightarrow$ 'text'
  • UPPER('sql') $ ightarrow$ 'SQL'
  • LOWER('SQL') $ ightarrow$ 'sql'

Returns the number of characters in a string.

  • LENGTH('Data') returns 4.

Swaps a specific substring with another.

  • REPLACE('01-01-2023', '-', '/') $ ightarrow$ '01/01/2023'

We have a RawCustomers table:

FullNameEmailAddress
alice smithALICE@GMAIL.COM
Bob Jonesbob.j@work.co

Task: Format names to Uppercase and extract the email domain.

Section titled “Task: Format names to Uppercase and extract the email domain.”
String Transformation
SELECT
UPPER(TRIM(FullName)) AS CleanName,
SUBSTRING(EmailAddress FROM POSITION('@' IN EmailAddress) + 1) AS Domain
FROM RawCustomers;
CleanNameDomain
ALICE SMITHGMAIL.COM
BOB JONESwork.co

Under the Hood: String Storage and Encoding

Section titled “Under the Hood: String Storage and Encoding”

When you manipulate strings, you are interacting with the database’s Collation and Encoding.

Collation determines how text is compared and sorted. It defines if 'a' is equal to 'A' (Case-Insensitivity) and if 'a' comes before 'b'. If you run a string function in a WHERE clause (e.g., WHERE LOWER(Email) = 'test@test.com'), the database might not be able to use a standard index because the index was built on the original casing, not the lowercase version.

  • CHAR(10): Always takes 10 bytes. If you store ‘Hi’, it adds 8 spaces.
  • VARCHAR(10): Only takes as much space as the text + a few bytes for length metadata.

When using string functions, CHAR columns often require TRIM() before they can be successfully compared or concatenated without trailing spaces causing issues.