Last Updated: 2024-04-05
In this tutorial, you will learn how to add conditional logic to your SQL queries using two different expressions, IF
and CASE
. You will work through a series of scenario-based activities to facilitate your learning.
IF
expressionThe IF
expression allows you to conditionally return a value based on a specified condition. It is simple and concise, and is suitable for basic conditional checks with a single condition.
CASE
expressionThe CASE
expression evaluates multiple conditions in order and returns a result based on the first condition that is true. It also supports an ELSE
clause to specify a default result if none of the conditions are met.
The CASE
expression is more versatile and allows for multiple conditions and different result values for each condition. It can also be used in various parts of a query, such as in the SELECT
clause, WHERE
clause, or ORDER BY
clause.
IF
and CASE
In queries where you have a single boolean condition to evaluate, with two possible results, the IF
expression can make your query shorter and easier to read. Otherwise, it's better to use the CASE
statement.
This tutorial uses Starburst Galaxy to teach you how to use conditional logic in your SQL queries. It is suitable for those that already have a basic understanding of SQL.
Once you've completed this tutorial, you will be able to:
IF
expression to add conditional logic to a query.CASE
expression to add conditional logic to a query.IF
and CASE
based on the query requirements.You need a Starburst Galaxy account to complete this tutorial. Please see Starburst Galaxy: Getting started for instructions on setting up a free account.
Starburst tutorials are designed to get you up and running quickly by providing bite-sized, hands-on educational resources. Each tutorial explores a single feature or topic through a series of guided, step-by-step instructions.
As you navigate through the tutorial you should follow along using your own Starburst Galaxy account. This will help consolidate the learning process by mixing theory and practice.
You'll begin your exploration by learning how to use the IF
expression. You can use the IF
expression to construct a SQL query that behaves differently based on a given boolean condition.
The syntax for the IF
expression is as follows:
IF(condition, true_value, false_value)
If the condition evaluates to true
, the true_value
argument will be returned. If it evaluates to false
, false_value
will be returned.
Imagine you are a data analyst at a firm that sells parts to other businesses.
The company's customers are classified into one of five segments, based on the type of work they do: building, automotive, machinery, household, or furniture.
You have been asked to return a report that classifies the company's customers into one of two segment types, based on their segment. Customers in the building, automotive, and machinery segments will be assigned the "heavy" segment type indicating their connection to "heavy industry". Customers in the household or furniture segments should be assigned the "light" segment type.
This is the perfect opportunity to use the IF
expression in a query.
It's time to begin writing queries that use the IF
expression. We're going to use the query editor in Starburst Galaxy to complete this activity.
Starburst Galaxy comes with some sample catalogs to get you started. We'll be using one of these, the tpch
catalog, for this exercise.
The tutorial requires you to have a Starburst Galaxy account set up.
When you write a query, Starburst Galaxy needs to know which datasets you're accessing, and which schemas and tables you need.
Usually this means using the fully-qualified table name, which is written in a specific format:
.
You can think of this as the full name of the table. But just like a full name for a person, the fully-qualified table name is long. To avoid having to write this out each time, you can easily set the cluster, catalog, and schema for your session ahead of time and operate using shorter queries.
free-cluster
.tpch
.sf1
. To see the IF
expression in action, let's run a sample query. This one will return the custkey
and acctbal
from the customer
table. It will also return a third column called goodstanding
that will have a value of 1
if the acctbal
is less than 1000 and a value of 0
otherwise. We're going to limit the results to 5.
SELECT custkey, acctbal,
IF(acctbal < 1000, 1, 0) AS goodstanding
FROM customer
LIMIT 5;
Now that you have practiced with a simple IF
expression, you will test the query that assigns a segment type based on the customer's market segment.
For testing purposes, you will limit the results to 10 rows, rather than returning all customers.
SELECT custkey, mktsegment,
IF
(mktsegment IN
('BUILDING','AUTOMOBILE','MACHINERY'),
'heavy','light')
AS seg_type
FROM customer
LIMIT 10;
Now that you've learned how to use the simple IF
expression, it's time to move on to the more versatile CASE
expression.
In the simple form of the CASE
expression, a single input value is evaluated for equivalence against a series of possible matching values. When a matching value is found, Starburst returns the given result.
If none of the values equal the input value, the ELSE
result will be returned. If there is no ELSE
clause and no matching value, NULL
is returned.
All the possible return values must be of the same data type.
If there is more than one possible match, Starburst Galaxy will stop at the first match it finds, reading from left to right (or top to bottom).
The syntax for the simple CASE expression is as follows:
CASE input_value
WHEN match_value THEN result
[ WHEN ... ]
[ ELSE result ]
END
Example:
SELECT a,
CASE a
WHEN 1 THEN 'one'
WHEN 2 THEN 'two'
ELSE 'many'
END AS b
In the example above, there will be two columns in the result set, a
and b
. The first column is the value of a
. The second column uses a CASE
expression to return one of three different results, depending on the value of a
.
If a
is equal to 1
, the CASE
expression will return the string one
. If a
is equal to 2
, it will return two
. If a
is equal to any other value, the ELSE
result will be returned – in this example, the string many
.
CASE
expressionsThe searched form of the CASE
expression is more powerful and flexible than the simple form. With the simple form, you can only compare one value against the others, and you can only evaluate for equivalence. This is equivalent to asking"is this value equal to that other value?".
With a searched
CASE
expression, you can evaluate a series of boolean expressions – in other words, any expression that results in TRUE
or FALSE
.
As with the simple form, searched CASE
expressions are evaluated from left to right (or top to bottom). Evaluation stops and a result is returned for the first expression that evaluates to TRUE
.
If no conditions are true, the ELSE
result is returned (or NULL
if there is no ELSE
clause).
The syntax for the searched CASE
expression is as follows:
CASE
WHEN condition THEN result
[ WHEN ... ]
[ ELSE result ]
END
Example:
SELECT a, b,
CASE
WHEN a > b THEN 'more'
WHEN a < b THEN 'less'
ELSE 'same'
END AS c
In the example above, the result will have three columns. The first column is the value of a
, and the second column is the value of b
.
The third column uses a searched CASE
expression to return a string. If the value of a
is greater than b
, it returns more
. If a
is less than b
, it returns less
. The ELSE
clause will only be invoked when a
is equal to b
and it returns the string same
.
Imagine that you are a data analyst at a retail firm.
You have been asked to return a report that classifies the company's customers into one of three segments based on the value of their customer ID key, which indicates how early in the company's life they became your customer.
The three segments are "innovators", "early adopters", and "majority." Customers with IDs equal to or below 10,000 will be classed as innovators, customers with IDs between 10,000 and 50,000 will be early adopters, and those with IDs above 50,000 will be in the majority segment.
You can use the CASE
expression to achieve this goal.
You're going to use the Starburst Galaxy query editor again for this activity. You'll continue using the built-in tcph
catalog.
Before jumping into writing the query for segment classification, you're going to run some queries to experiment with the simple and searched forms of the CASE
expression.
CASE
expressionThe following SQL queries the region
table from the tpch
catalog. It maps region names to their commonly-used abbreviations using the simple form of CASE
.
SELECT regionkey,
name,
CASE name
WHEN 'AFRICA' THEN 'EMEA'
WHEN 'AMERICA' THEN 'AMER'
WHEN 'ASIA' THEN 'APAC'
WHEN 'EUROPE' THEN 'EMEA'
WHEN 'MIDDLE EAST' THEN 'EMEA'
ELSE 'UNKNOWN'
END AS abbreviation
FROM region;
CASE
expressionNow you will try out the searched form of CASE
.
You may have noticed in the previous query that some of the regions mapped to the same abbreviation. You can simplify that query by using a searched CASE
expression instead.
SELECT regionkey,
name,
CASE
WHEN name IN ('AFRICA', 'EUROPE', 'MIDDLE EAST') THEN 'EMEA'
WHEN name = 'AMERICA' THEN 'AMER'
WHEN name = 'ASIA' THEN 'APAC'
ELSE 'UNKNOWN'
END AS abbreviation
FROM region;
It's time to write the query that segments customers based on their time with the company.
Question: Given what you have learned so far about using CASE
, which form do you think is more appropriate for this query – simple or searched?
If you answered "searched" you are correct! We need to use inequality comparisons in our query to segment customers, so the searched CASE
expression is the most appropriate.
WHERE
clause to limit the number of records returned:SELECT custkey,
name,
CASE
WHEN custkey <= 10000 THEN 'innovator'
WHEN custkey > 10000 AND custkey <= 50000 THEN 'early adopter'
WHEN custkey > 50000 THEN 'majority'
END AS adopt_segment
FROM customer
WHERE custkey IN (1000,10000,15000,50000,100000)
ORDER BY custkey ASC;
Congratulations! You have reached the end of this tutorial, and the end of this stage of your journey.
You have learned how to write queries that use powerful conditional logic. As a reminder, the IF
expression is a good choice when you need to include a calculated column in your query results that will have one of two possible values, based on a single boolean condition.
For more complex conditional logic, the CASE
expression is the recommended option.
At Starburst, we believe in continuous learning. This tutorial provides the foundation for further training available on this platform, and you can return to it as many times as you like. Future tutorials will make use of the concepts used here.
Starburst has lots of other tutorials to help you get up and running quickly. Each one breaks down an individual problem and guides you to a solution using a step-by-step approach to learning.
Visit the Tutorials section to view the full list of tutorials and keep moving forward on your journey!