Last Updated: 2023-12-20
Starburst Galaxy's built-in attribute-based access control (ABAC) feature allows business domain owners, platform administrators, and data engineers to apply fine-grained access controls to various data entities. This is done by creating policies around the tags applied to those data entities. These controls are combined with roles and privileges, allowing organizations to enact precise, reusable policies around specific data entities.
The following diagram illustrates this architecture.
You need a Starburst Galaxy account to complete this tutorial. Please see Starburst Galaxy: Getting started for instructions on setting up a free account.
Upon successful completion of this tutorial, you will be able to:
Starburst tutorials are designed to get you up and running quickly by providing bite-sized, hands-on educational resources. Each tutorial explores a single feature or topic through a series of guided, step-by-step instructions.
As you navigate through the tutorial, you should follow along using your own Starburst Galaxy account. This will help consolidate the learning process by mixing theory and practice.
The Information Security (InfoSec) team at Chryse Corp. requires all departments to hide personally identifiable information (pii) from unauthorized users.
Fortunately for the InfoSec team, Chryse Corp. uses Starburst Galaxy, so this is easy to implement. All they need to do is create a role and policy that denies access to data entities with a pii tag. Once the role is in place, it can be inherited by other roles across the organization via Starburst Galaxy's role-based access control (RBAC) features, denying pii access to unauthorized individuals.
In this tutorial, you'll help Chryse Corp. by tagging data entities with a pii tag, and then creating a role with a policy that denies pii access to unauthorized users. You will also use role inheritance to deny pii access.
Attribute-based access control works by tagging data to create a group, then applying a policy to specific tags. These controls are especially useful for domain experts and line-of-business owners who work closely with the data associated with their domain.
Policies work by matching expressions to tags, and allowing or denying privileges based on matches. Starburst Galaxy allows two hierarchical levels of tags. To apply a policy to a top-level tag x, use the matching expression has_tag(x). To apply a policy to a specific nested tag y, use the expression has_tag(x.y), or, to cover all nested tags, use has_tag(x.*). To learn more about policies, see our technical documentation page.
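To make the three expression forms concrete, here is a small Python model of how has_tag() patterns might match the tags on a single data entity. It is a sketch, not Galaxy's implementation: the has_tag function below is hypothetical, and it assumes that has_tag(x) matches only the exact tag x and not its nested tags, which is consistent with this tutorial's final step needing to combine has_tag(pii.*) with has_tag(pii).

```python
# Hypothetical model of has_tag() matching; not Starburst Galaxy's actual code.


def has_tag(pattern: str, entity_tags: set[str]) -> bool:
    """Return True if any tag on the entity matches the pattern.

    "x"   -> matches the top-level tag x exactly (assumed)
    "x.y" -> matches the nested tag y under x
    "x.*" -> matches any nested tag under x, but not x itself
    """
    if pattern.endswith(".*"):
        parent = pattern[:-2]
        # Any tag nested under the parent qualifies; the parent itself does not.
        return any(tag.startswith(parent + ".") for tag in entity_tags)
    # Top-level and fully specified nested patterns require an exact match.
    return pattern in entity_tags


print(has_tag("pii", {"pii"}))              # True
print(has_tag("pii.phone", {"pii.phone"}))  # True
print(has_tag("pii.*", {"pii.ssn"}))        # True
print(has_tag("pii.*", {"pii"}))            # False: the parent tag is excluded
```

The last line is the detail that matters later in this tutorial: a wildcard over nested tags does not reach the parent tag.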
The following video walks through all of the steps in this tutorial. Please feel free to watch and follow along with the steps in your own account, or skip to the written instructions if you prefer.
Tags are the foundation of attribute-based access control. In this section, you will create three tags to identify personally identifiable information (pii): a parent pii tag, plus nested tags for customer phone numbers and Social Security numbers, as your team has requested.
Sign in to Starburst Galaxy in the usual way. If you have not already set up an account, you can do that here.
Only the data entity owner can add metadata to data entities. In this tutorial, you'll add tags from the accountadmin role.
Now it's time to create a tag for pii data. The phone and ssn tags will be nested under the main pii tag. Phone numbers and SSNs are specific types of pii data, so the tag hierarchy reflects this relationship.
Now you can create the two nested tags. These will tag phone numbers and Social Security Number (SSN) data.
It's best to check that the new tags have been created successfully.
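The two-level hierarchy you just built can be pictured as a tiny registry. The Python sketch below is purely illustrative: TagRegistry is a hypothetical class, and its rules (at most two levels, and a nested tag needs an existing parent) are assumptions drawn from the description above rather than Galaxy's documented validation logic.

```python
# Hypothetical in-memory model of a two-level tag hierarchy.


class TagRegistry:
    def __init__(self) -> None:
        self._tags: set[str] = set()

    def create(self, name: str) -> None:
        parts = name.split(".")
        # Assumption: tags may be at most two levels deep (parent.child).
        if len(parts) > 2:
            raise ValueError(f"{name!r}: only two tag levels are allowed")
        # Assumption: a nested tag can only be created under an existing parent.
        if len(parts) == 2 and parts[0] not in self._tags:
            raise ValueError(f"{name!r}: parent tag {parts[0]!r} does not exist")
        self._tags.add(name)

    def children(self, parent: str) -> list[str]:
        """Return the nested tags under parent, sorted by name."""
        return sorted(t for t in self._tags if t.startswith(parent + "."))

    def __contains__(self, name: str) -> bool:
        return name in self._tags


registry = TagRegistry()
for tag in ("pii", "pii.phone", "pii.ssn"):  # parent first, then nested tags
    registry.create(tag)

# The "check that the tags exist" step, expressed in code.
print("pii" in registry, registry.children("pii"))  # True ['pii.phone', 'pii.ssn']
```

Creating the parent before the nested tags mirrors the order used in this section.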
You are going to assign the tags that you just created to tables, so that you can later create policies based on those tags.
In this scenario, the table that you have been asked to tag is the customer table, which contains several types of sensitive customer information. You're going to tag each of those types based on their attributes and then restrict access to them based on policies and roles.
In this example, you will add tags to the lakehouse_burst_bank.burst_bank.customer table.
Navigate to the lakehouse_burst_bank catalog, then the burst_bank schema, the customer table, and finally the phone column.
Now it's time to assign your tags, starting with the phone tag. You will assign this tag to the phone column.
Now it's time to assign the ssn tag, this time to the ssn column.
Next, assign the parent pii tag to the last_name column.
Again, it's best to confirm that all of these changes have been added successfully by reviewing the column tags on the customer table.
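The confirmation step amounts to comparing the tags you meant to assign with the tags you actually see on the customer table. Here is a hypothetical Python sketch of that comparison; the actual dictionary is invented sample data in which the last_name tag was forgotten, included only to show what a gap would look like.

```python
# Hypothetical check of column tag assignments on the customer table.

# Column -> tag assignments this section asks for.
EXPECTED = {"phone": "pii.phone", "ssn": "pii.ssn", "last_name": "pii"}


def missing_assignments(actual: dict[str, set[str]], expected: dict[str, str]) -> list[str]:
    """Return 'column -> tag' entries from expected that are absent from actual."""
    return [
        f"{column} -> {tag}"
        for column, tag in sorted(expected.items())
        if tag not in actual.get(column, set())
    ]


# Invented read-back of the column tags, with the last_name tag missing.
actual = {"phone": {"pii.phone"}, "ssn": {"pii.ssn"}, "last_name": set()}
print(missing_assignments(actual, EXPECTED))  # ['last_name -> pii']

# After tagging last_name, nothing is missing.
actual["last_name"].add("pii")
print(missing_assignments(actual, EXPECTED))  # []
```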
Now, you'll bring it all together by creating a role that uses the tags to deny access to pii data. You will also create a second role that inherits the first role to see how privileges are inherited.
You're creating the role in this step, and in the next step you'll add the policy to deny access.
Now it's time to add a new policy to the deny_pii role so that the tag-based restriction can be enforced.
New policies require a definition. Complete the required fields to create the new policy.
Each policy in Starburst Galaxy has a defined scope. When you create a new policy, you must outline this scope as part of the creation process.
Set the policy scope to the lakehouse_burst_bank catalog, and set the matching expression to has_tag(pii.*).
In this step, you are going to add a new privilege that denies Select from table access to the tagged data.
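Taken together, the policy has three parts: a scope (the lakehouse_burst_bank catalog), a matching expression (has_tag(pii.*)), and a denied privilege. The Python sketch below models how such a policy might select columns in the customer table. Policy, denied_columns, and has_tag are hypothetical names, the expression is reduced to a single has_tag() pattern, and the column tags mirror the assignments made earlier in this tutorial. Pay attention to which tagged columns the expression does not cover.

```python
# Hypothetical model of a tag-based deny policy; not Galaxy's API.
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    name: str
    scope_catalog: str     # the policy only applies inside this catalog
    pattern: str           # a single has_tag() pattern, e.g. "pii.*"
    denied_privilege: str  # the privilege the policy denies


def has_tag(pattern: str, tags: set[str]) -> bool:
    # "x.*" matches nested tags under x only; other patterns must match exactly.
    if pattern.endswith(".*"):
        return any(t.startswith(pattern[:-2] + ".") for t in tags)
    return pattern in tags


def denied_columns(policy: Policy, catalog: str, column_tags: dict[str, set[str]]) -> list[str]:
    """Return the columns whose tags match the policy inside its scope."""
    if catalog != policy.scope_catalog:
        return []  # outside the policy's scope, nothing is denied
    return sorted(c for c, tags in column_tags.items() if has_tag(policy.pattern, tags))


deny_pii = Policy("deny_pii", "lakehouse_burst_bank", "pii.*", "Select from table")
customer_tags = {"phone": {"pii.phone"}, "ssn": {"pii.ssn"}, "last_name": {"pii"}}

print(denied_columns(deny_pii, "lakehouse_burst_bank", customer_tags))  # ['phone', 'ssn']
```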
Now it's time to add a new role. In this scenario, you're creating a role for the marketing department.
After the role is created, you will then add the deny_pii role to the marketing role to see how the privileges combine.
You don't want marketing to have access to pii, so you're going to add the deny_pii role to the new marketing role.
Now it's time to add privileges that allow the marketing role to select from all schemas inside the lakehouse_burst_bank catalog.
Remember that some columns will be hidden for this role. This is because the marketing role has inherited privileges from the deny_pii role.
Now it's time to define the details of the new privilege for the marketing role. These details specify exactly what the privilege allows and what it restricts.
Set the scope to the lakehouse_burst_bank catalog. This automatically ensures that the privilege applies to all schemas within the catalog.
Now it's time to check whether the new privilege is working properly.
Navigate to the lakehouse_burst_bank catalog, then the burst_bank schema, and then the customer table. Notice that for both phone and ssn, the Select from table column shows the privilege as denied. If you hover over either of them, you can see that this restriction is inherited from the deny_pii policy.
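Role inheritance is what makes the deny_pii restriction reusable: marketing gets its own catalog-wide SELECT grant and also picks up every denial from deny_pii. The Python sketch below models that combination. Role and can_select are hypothetical, the deny_pii denials are listed directly as the phone and ssn columns shown in the privileges view, and the rule that a deny always overrides an allow is an assumption that matches what that view shows, not a statement of Galaxy's full evaluation order.

```python
# Hypothetical model of role inheritance with deny-over-allow resolution.
from dataclasses import dataclass, field

TABLE = "lakehouse_burst_bank.burst_bank.customer"


@dataclass
class Role:
    name: str
    allowed_catalogs: set[str] = field(default_factory=set)  # catalog-wide SELECT grants
    denied_columns: set[str] = field(default_factory=set)    # fully qualified column names
    inherits: list["Role"] = field(default_factory=list)     # roles granted to this role

    def all_allowed(self) -> set[str]:
        allowed = set(self.allowed_catalogs)
        for parent in self.inherits:
            allowed |= parent.all_allowed()
        return allowed

    def all_denied(self) -> set[str]:
        denied = set(self.denied_columns)
        for parent in self.inherits:
            denied |= parent.all_denied()
        return denied


def can_select(role: Role, column: str) -> bool:
    """Allow only if the catalog is granted and no (inherited) deny applies."""
    catalog = column.split(".")[0]
    # Assumption: a deny always wins over an allow.
    return catalog in role.all_allowed() and column not in role.all_denied()


deny_pii = Role("deny_pii", denied_columns={f"{TABLE}.phone", f"{TABLE}.ssn"})
marketing = Role("marketing", allowed_catalogs={"lakehouse_burst_bank"}, inherits=[deny_pii])

for col in ("last_name", "phone", "ssn"):
    print(col, can_select(marketing, f"{TABLE}.{col}"))
# last_name True
# phone False
# ssn False
```

Keeping the denial in its own role means any other role can inherit it the same way marketing does.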
Next, you're going to assign yourself to the role so that you can test it out later.
Your team wants to make sure that the marketing role does not have access to sensitive customer information. This is an opportunity to test the policies that you just created and confirm that they work: the new privileges should allow access to the intended data while restricting the data you meant to protect.
Let's get going!
First, you'll start by level-setting in the accountadmin role. Because accountadmin has broad privileges, you would expect to see all columns and all tables.
You're going to check that this is the case before proceeding.
Make sure that the aws-us-east-1-free cluster is running, and that the lakehouse_burst_bank catalog and the burst_bank schema have been selected.
Write two SQL statements: one that selects all columns, and one that selects only the last_name, phone, and ssn columns from the customer table. Each of these tests the attribute-based access in a different way.
SELECT * FROM customer;
SELECT last_name, phone, ssn FROM customer;
Because accountadmin has total access to the columns in your query, you should see all three columns in your query results.
Confirm that the last_name, phone, and ssn columns all show results.
Now it's time to test the new marketing role to see whether access has been granted and restricted appropriately.
To do this, you're first going to have to switch roles to marketing so you can test its access as someone using that role.
Now you're in the marketing role. It's time to test the privileges unique to that role, starting with the second SQL statement.
Recall that the marketing role should be denied access to the phone and ssn columns in the customer table. You're going to re-run your query, but this time expect a different result: because the phone and ssn columns cannot be returned, the query should fail with an error.
Run the second statement. It fails with an Access Denied error, and the message identifies the columns in the SELECT statement that you are not allowed to access.
Now it's time to turn to the first query, the one that returned all results. Last time, with accountadmin enabled, you saw results from every column.
This time, you're in the marketing role, so you'd expect the restricted columns to be blocked. Specifically, you'd expect these columns to be absent from the result set, while all other columns are still present.
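The two queries react differently to the same restriction: naming a denied column fails outright, while SELECT * leaves it out of the result set. The Python sketch below captures that contrast. run_select and AccessDenied are hypothetical, the error text is invented rather than Galaxy's real message, and the column list is limited to the three columns this tutorial discusses.

```python
# Hypothetical model of explicit-column vs SELECT * behavior under a deny.
from typing import Optional


class AccessDenied(Exception):
    """Hypothetical stand-in for the Access Denied error."""


def run_select(requested: Optional[list[str]], table_columns: list[str], denied: set[str]) -> list[str]:
    """Return the columns a query would return; requested=None models SELECT *."""
    if requested is None:
        # SELECT *: restricted columns are dropped from the result set.
        return [c for c in table_columns if c not in denied]
    # Explicit column list: any restricted column makes the whole query fail.
    blocked = [c for c in requested if c in denied]
    if blocked:
        raise AccessDenied(f"Access Denied: cannot select from columns {blocked}")
    return list(requested)


columns = ["last_name", "phone", "ssn"]
denied_for_marketing = {"phone", "ssn"}

print(run_select(None, columns, denied_for_marketing))  # ['last_name']
try:
    run_select(["last_name", "phone", "ssn"], columns, denied_for_marketing)
except AccessDenied as err:
    print(err)  # Access Denied: cannot select from columns ['phone', 'ssn']
```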
Time to test your hypothesis!
Notice that there are no phone and ssn columns in the results of the first SQL statement. This is because the marketing role does not have permission to view these columns.
However, the results of the last_name column are still visible. This is a bit unexpected: it makes sense, but it might not be what you were imagining for the marketing role.
The reason is that the deny_pii policy uses the matching expression has_tag(pii.*), which excludes parent tags. It matches pii.phone and pii.ssn, but not pii generally.
The ABAC policies you set up for the marketing role worked to deny access to customer phone numbers and Social Security numbers as expected.
However, your team wanted the marketing role to be denied access to customers' last names as well. The policy behaves correctly given its expression, but it's not quite what the company wanted.
Try to fix the matching expression in the deny_pii policy so that the last_name
column is hidden from the marketing role.
If you get stuck, view the last step below.
Let's edit the matching expression for the deny_pii policy to add logic that resolves the problem you identified.
Now it's time to change the logic governing the scope of the policy, so that the marketing role is also denied access to the last_name column.
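Before editing the policy, it helps to see why adding a second check works. The Python sketch below compares which customer columns match has_tag(pii.*) alone versus has_tag(pii.*) OR has_tag(pii). It is a hypothetical model: has_tag and matches_any are illustrative names, the OR is modeled with Python's any(), and has_tag(pii) is assumed to match only the exact parent tag.

```python
# Hypothetical model of OR-combined has_tag() expressions.


def has_tag(pattern: str, tags: set[str]) -> bool:
    # "x.*" matches nested tags under x only; other patterns must match exactly.
    if pattern.endswith(".*"):
        return any(t.startswith(pattern[:-2] + ".") for t in tags)
    return pattern in tags


def matches_any(patterns: list[str], tags: set[str]) -> bool:
    """Model of has_tag(p1) OR has_tag(p2) OR ..."""
    return any(has_tag(p, tags) for p in patterns)


customer_tags = {"phone": {"pii.phone"}, "ssn": {"pii.ssn"}, "last_name": {"pii"}}

before = sorted(c for c, t in customer_tags.items() if matches_any(["pii.*"], t))
after = sorted(c for c, t in customer_tags.items() if matches_any(["pii.*", "pii"], t))

print(before)  # ['phone', 'ssn']
print(after)   # ['last_name', 'phone', 'ssn']
```

The wildcard still covers every nested pii tag, and the extra exact match picks up columns tagged with the parent alone.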
Change the matching expression to has_tag(pii.*) OR has_tag(pii).
Notice that the new expression includes both the nested pii.* expression and the parent pii expression.
Congratulations! You have reached the end of this tutorial, and the end of this stage of your journey.
Now that you've completed this tutorial, you should have a better understanding of just how easy and convenient it is to use attribute-based access control in Starburst Galaxy.
At Starburst, we believe in continuous learning. This tutorial provides the foundation for further training available on this platform, and you can return to it as many times as you like. Future tutorials will make use of the concepts used here.
Starburst has lots of other tutorials to help you get up and running quickly. Each one breaks down an individual problem and guides you to a solution using a step-by-step approach to learning.
Visit the Tutorials section to view the full list of tutorials and keep moving forward on your journey!