Skip to main content

Helpful Superset SQL Queries for Charts

String Formatting and Manipulation

Remove underscore and capitalize first letter of the string:

CONCAT(UPPER(SUBSTRING(REPLACE("Column_string", '_', ' ') FROM 1 FOR 1)),
LOWER(SUBSTRING(REPLACE("Column_string", '_', ' ') FROM 2)))
BeforeAfter
"example_string""Example string"

Capitalize first letter of each word and replace _ with a space

INITCAP(REPLACE("Column_string", '_', ' '))
BeforeAfter
"example_string""Example String"

Unnest values based on regex, and then capitalize each first letter of each word.

REGEXP_REPLACE(INITCAP(unnest(ARRAY_REMOVE(string_to_array(Column_string, ' '), ''))), '_', ' ', 'g')
BeforeAfter
"example_string,another_example""Example_String", "Another_Example"

Combine the values of two fields, and then apply regex to replace _ with ", " and capitalize the first letter of each word.

CASE
WHEN "Which_ethnicity_does_the_person_identify_with" IN ('other', 'other, 1') THEN 'Other'
ELSE REGEXP_REPLACE(INITCAP(REPLACE("Which_ethnicity_does_the_person_identify_with", '_', ', ')), ', ([a-z])', ', ' || UPPER(SUBSTRING('\1' FROM 1 FOR 1)) || SUBSTRING('\1' FROM 2))
END
BeforeAfter
"dutch_american""Dutch, American"
"german_french""German, French"
"other""Other"
"other, 1""Other"

Data Type Conversion

Convert a TEXT date field (with format 2019-02-24) into DATETIME

"date"::timestamp
BeforeAfter
"2019-02-24""2019-02-24 00:00:00"
"2020-07-15""2020-07-15 00:00:00"
"2021-11-30""2021-11-30 00:00:00"

Cast a Kobo timestamp to YYYY-MM-DD

MAX(TO_TIMESTAMP(_submission_time, 'YYYY-MM-DD"T"HH24:MI:SS'))
_submission_time (Before)_submission_time
2024-06-25T15:27:322024-06-05

Creating Categories and Ranges

Recast a string column as numeric, and use specific ranges for the values:

CASE 
WHEN CAST("Age" as INTEGER) BETWEEN 0 AND 9 THEN '0-10'
WHEN CAST("Age" as INTEGER) BETWEEN 10 AND 19 THEN '10-20'
WHEN CAST("Age" as INTEGER) BETWEEN 20 AND 29 THEN '20-30'
WHEN CAST("Age" as INTEGER) BETWEEN 30 AND 39 THEN '30-40'
WHEN CAST("Age" as INTEGER) BETWEEN 40 AND 49 THEN '40-50'
WHEN CAST("Age" as INTEGER) BETWEEN 50 AND 59 THEN '50-60'
WHEN CAST("Age" as INTEGER) BETWEEN 60 AND 69 THEN '60-70'
WHEN CAST("Age" as INTEGER) BETWEEN 70 AND 79 THEN '70-80'
WHEN CAST("Age" as INTEGER) >= 80 THEN '80 and over'
ELSE 'Other'
END
Age (Before)Age Range (After)
"5""0-10"
"15""10-20"
"27""20-30"
"31""30-40"
"44""40-50"
"58""50-60"
"65""60-70"
"79""70-80"
"85""80 and over"
"9000""80 and over"
"unknown""Other"

Recast a birthdate text field that is "YYYY-MM-DD" to only YYYY.

CASE
WHEN CAST(SUBSTRING("Birthdate", 1, 4) AS INTEGER) <= 1940 THEN 'Before 1940'
WHEN CAST(SUBSTRING("Birthdate", 1, 4) AS INTEGER) BETWEEN 1940 AND 1949 THEN '1940 - 1949'
WHEN CAST(SUBSTRING("Birthdate", 1, 4) AS INTEGER) BETWEEN 1950 AND 1959 THEN '1950 - 1959'
WHEN CAST(SUBSTRING("Birthdate", 1, 4) AS INTEGER) BETWEEN 1960 AND 1969 THEN '1960 - 1969'
WHEN CAST(SUBSTRING("Birthdate", 1, 4) AS INTEGER) BETWEEN 1970 AND 1979 THEN '1970 - 1979'
WHEN CAST(SUBSTRING("Birthdate", 1, 4) AS INTEGER) BETWEEN 1980 AND 1989 THEN '1980 - 1989'
WHEN CAST(SUBSTRING("Birthdate", 1, 4) AS INTEGER) BETWEEN 1990 AND 1999 THEN '1990 - 1999'
WHEN CAST(SUBSTRING("Birthdate", 1, 4) AS INTEGER) BETWEEN 2000 AND 2009 THEN '2000 - 2009'
WHEN CAST(SUBSTRING("Birthdate", 1, 4) AS INTEGER) BETWEEN 2010 AND 2019 THEN '2010 - 2019'
WHEN CAST(SUBSTRING("Birthdate", 1, 4) AS INTEGER) BETWEEN 2020 AND 2029 THEN '2020 - 2029'
ELSE 'After 2029'
END
Birthdate (Before)Birthyear (After)
"1935-06-15""Before 1940"
"1945-08-20""1940 - 1949"
"1955-12-01""1950 - 1959"
"1965-03-22""1960 - 1969"
"1975-07-30""1970 - 1979"
"1985-11-05""1980 - 1989"
"1995-04-18""1990 - 1999"
"2005-09-25""2000 - 2009"
"2015-02-14""2010 - 2019"
"2025-10-10""2020 - 2029"

Categorize distance based on meter ranges

CASE 
WHEN "how_many_meters_is_the_nest_from_bush" = '' THEN 'Not recorded'
WHEN CAST("how_many_meters_is_the_nest_from_bush" as REAL) BETWEEN 0 AND 0.9 THEN '0-1m'
WHEN CAST("how_many_meters_is_the_nest_from_bush" as REAL) BETWEEN 1 AND 1.9 THEN '1-2m'
WHEN CAST("how_many_meters_is_the_nest_from_bush" as REAL) BETWEEN 2 AND 2.9 THEN '2-3m'
WHEN CAST("how_many_meters_is_the_nest_from_bush" as REAL) BETWEEN 3 AND 3.9 THEN '3-4m'
WHEN CAST("how_many_meters_is_the_nest_from_bush" as REAL) BETWEEN 4 AND 4.9 THEN '4-5m'
WHEN CAST("how_many_meters_is_the_nest_from_bush" as REAL) >= 5 THEN '5m and over'
ELSE 'Not recorded'
END
Distance (Before)Distance Range (After)
"""Not recorded"
"0.5""0-1m"
"1.2""1-2m"
"2.8""2-3m"
"3.5""3-4m"
"4.7""4-5m"
"5.0""5m and over"
"6.3""5m and over"
"unknown""Not recorded"

Data Cleaning and Value Transformation

Convert 0 and 1 form responses to "No" and "Yes"

CASE
WHEN "Is_the_person_a_member_of_the_community" = '0' THEN 'No'
WHEN "Is_the_person_a_member_of_the_community" = '1' THEN 'Yes'
END
BeforeAfter
"0""No"
"1""Yes"

Convert gender form responses to strings in the Indigenous language, with "no answer" fallback

CASE
WHEN "Gender" = 'man__eluwa' THEN 'Man (Eluwa)'
WHEN "Gender" = 'vrouw__w_lei' THEN 'Vrouw (Wëlei)'
ELSE 'No answer'
END
BeforeAfter
"man__eluwa""Man (Eluwa)"
"vrouw__w_lei""Vrouw (Wëlei)"
NULL"No answer"

Set null values to "No answer"

COALESCE("Did_you_take_a_course", 'No answer')
BeforeAfter
NULL"No answer"

Geospatial Data Processing

Create a virtual database with latitude and longitude (for Mapbox Map chart)

For a field Record_your_current_location with values like "-1.234567 8.901234 124.4 15.899999618530273" (latitude, longitude, altitude, accuracy)

SELECT 
CAST(SPLIT_PART("Record_your_current_location", ' ', 1) AS FLOAT) AS latitude,
CAST(SPLIT_PART("Record_your_current_location", ' ', 2) AS FLOAT) AS longitude
FROM my_kobo_form