Last Updated: 2024-04-16
Background
Azure Private Link is a Microsoft Azure service that enables you to securely connect your Azure Virtual Network to Azure Platform as a Service (PaaS) resources, Azure Virtual Machine (VM) instances, and Azure Kubernetes Service (AKS) clusters. This approach provides a secure way to access these services over a private endpoint located inside your virtual network, eliminating the need to expose connections to the public internet.
Starburst Galaxy supports Azure Private Link for specific catalog types. This tutorial guides you through the process of configuring Private Link for Azure Data Lake Storage (ADLS).
Scope of tutorial
In this tutorial, you will learn how to configure Azure Private Link for Azure Data Lake Storage (ADLS). The internal steps performed by Starburst technical support are not covered.
When you configure an ADLS catalog in Starburst Galaxy, you have two metastore options: the Starburst Galaxy Metastore or your own Hive Metastore (HMS). If you plan to use your own HMS, you must also configure Private Link access to it. The steps for doing so are included in this tutorial for those who need them.
Learning objectives
Once you've completed this tutorial, you will be able to:
Configure an Azure Private Link connection between Starburst Galaxy and your Azure Data Lake Storage account.
Use Private Link to securely connect Starburst Galaxy to your Azure Data Lake Storage account.
Prerequisites
You need a Starburst Galaxy account to complete this tutorial. Please see Starburst Galaxy: Getting started for instructions on setting up a free account.
This tutorial has a bring-your-own-storage requirement. Before continuing, you need to set up an Azure Data Lake Storage account.
If your data source is configured with an internal firewall for access control, you will need to create an inbound rule for the Starburst Galaxy CIDR 10.0.0.0/8.
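Before adding the inbound rule, you may want to check which source addresses it covers. The Python sketch below tests an address against the Starburst Galaxy CIDR using the standard-library `ipaddress` module. Only the 10.0.0.0/8 range comes from this tutorial; the helper name `is_from_galaxy` and the sample addresses are hypothetical.

```python
import ipaddress

# Starburst Galaxy CIDR from the prerequisites above.
GALAXY_CIDR = ipaddress.ip_network("10.0.0.0/8")


def is_from_galaxy(source_ip: str) -> bool:
    """Return True if source_ip falls inside the Starburst Galaxy CIDR.

    Hypothetical helper: mirrors the match an inbound firewall rule for
    10.0.0.0/8 would perform on a connection's source address.
    """
    return ipaddress.ip_address(source_ip) in GALAXY_CIDR


print(is_from_galaxy("10.42.7.19"))  # True
print(is_from_galaxy("172.16.0.5"))  # False
```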
About Starburst tutorials
Starburst tutorials are designed to get you up and running quickly by providing bite-sized, hands-on educational resources. Each tutorial explores a single feature or topic through a series of guided, step-by-step instructions.
Background
If you are configuring Private Link for the first time, you are encouraged to work with a Starburst technical resource. This individual will work with you to set up the environment needed to complete the tutorial.
Contacting your technical resource
To be assigned this resource, reach out to your regular Starburst account team.
Working together
Once assigned, your Starburst technical resource will work with you to set up an environment where you can complete the tutorial.
Please review the following overview of this process before beginning the tutorial.
Your responsibilities:
Ensure that you have an ADLS Gen2-enabled storage account.
Determine your public network access setting.
Record your storage account resource ID.
Submit a support request via Starburst Galaxy to have two private endpoints created.
Accept both endpoint connections.
If you are using an HMS on a VM:
Configure a public load balancer.
Configure an internal load balancer.
Submit a support request via Starburst Galaxy to have a private endpoint created.
Accept the endpoint connection.
Background
Understanding the Azure Private Link architecture is important when completing the steps in this tutorial. In this section you will learn about this architecture and the way that Starburst Galaxy uses it to securely connect private clouds.
This tutorial also follows a corresponding Azure quickstart on the same topic. It is recommended that you consult this documentation if you want to learn more about Azure Private Link.
Reference architecture
The following diagram illustrates a Private Link connection to an ADLS account.
Review the diagram to ensure that you understand the architecture that you will create in this tutorial.
Background
It's time to get started. In this section, you'll begin by confirming that your Azure storage account is configured for ADLS Gen2. After that, you'll obtain your Azure storage account resource ID.
You'll need to provide this information to the Starburst support team so that they can create the private endpoints.
Step 1: Sign in to Azure portal
You're going to start by signing in to the Azure portal. Remember to sign in to the account containing the Azure storage account that you would like to connect using Private Link. If you use multiple Azure accounts, ensure that you pick the correct one.
Sign in to your Azure account.
Step 2: Select storage account
Now it's time to find the right storage account. Depending on your workflow, you might have multiple storage accounts in the same Azure account. Once again, make sure you select the correct one.
Using the search bar at the top of the screen, search for storage accounts.
Select Storage accounts from the list of results.
Using the filter bar, type the name of your storage account.
Select your storage account from the filtered list.
Step 3: Confirm hierarchical namespace property
Azure Data Lake Storage Gen2 is a set of capabilities built on the Blob Storage service of your Azure Storage account. Notably, ADLS Gen2 adds hierarchical namespace functionality, allowing users to organize their data into directories and subdirectories. This makes data easier to organize and manage than in the flat namespace of standard Blob Storage.
This step will show you how to confirm that a hierarchical namespace has been enabled on your ADLS account.
Select the Properties tab.
In the Data Lake Storage section, ensure that the Hierarchical namespace property is Enabled.
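If you prefer to confirm this property from a script, the Azure CLI command `az storage account show --name <account> --resource-group <group>` prints the account as JSON. The minimal Python sketch below assumes that output contains an `isHnsEnabled` field and that you have captured it as a string; the helper name `hns_enabled` and the sample JSON are illustrative only.

```python
import json


def hns_enabled(account_json: str) -> bool:
    """Return True if the storage account has a hierarchical namespace.

    Assumes account_json is the JSON printed by `az storage account show`,
    where the Data Lake Storage Gen2 flag appears as "isHnsEnabled".
    A missing or null value is treated as not enabled.
    """
    account = json.loads(account_json)
    return account.get("isHnsEnabled") is True


# Trimmed, illustrative output for a hypothetical account.
sample = '{"name": "mydatalake", "kind": "StorageV2", "isHnsEnabled": true}'
print(hns_enabled(sample))  # True
```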
Step 4: Determine public network access setting
ADLS accounts provide three options for public network access:
Enabled from all networks
Enabled from selected virtual networks and IP addresses
Disabled
You need to determine the public network access configuration for the storage account you are about to configure for Private Link.
Using the left-hand navigation menu, select Networking.
Select the Firewalls and virtual networks tab.
Confirm your Public network access setting.
Click the Private endpoint connections tab.
Confirm whether there are any existing private endpoint connections. If none are listed, this ADLS account has no private endpoint connections.
If you do see one or more private endpoint connections, we advise you to take the appropriate action to enable private endpoints for your internal virtual networks before continuing with this tutorial.
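The same `az storage account show` JSON can also be used to double-check this step. The Python sketch below is a hedged illustration, not an official mapping: it assumes the portal's three options correspond to the `publicNetworkAccess` field combined with `networkRuleSet.defaultAction`, and that existing connections appear in a `privateEndpointConnections` list. Function names and sample data are hypothetical.

```python
import json


def public_network_access(account: dict) -> str:
    """Map storage account JSON to the portal's public network access options.

    Assumption: the portal setting is derived from "publicNetworkAccess" and
    "networkRuleSet.defaultAction" in `az storage account show` output.
    Any value other than "Disabled" is treated as enabled.
    """
    if account.get("publicNetworkAccess") == "Disabled":
        return "Disabled"
    rules = account.get("networkRuleSet") or {}
    if rules.get("defaultAction") == "Deny":
        return "Enabled from selected virtual networks and IP addresses"
    return "Enabled from all networks"


def existing_private_endpoints(account: dict) -> int:
    """Count private endpoint connections already attached to the account."""
    return len(account.get("privateEndpointConnections") or [])


# Trimmed, illustrative output for a hypothetical account.
sample = json.loads(
    '{"publicNetworkAccess": "Enabled",'
    ' "networkRuleSet": {"defaultAction": "Deny"},'
    ' "privateEndpointConnections": []}'
)
print(public_network_access(sample))
print(existing_private_endpoints(sample))  # 0
```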
Step 5: Record storage account resource ID
Next, it's time to record your storage account resource ID. Starburst support needs this ID to create two private endpoints in the Starburst Galaxy VNet.
Using the left-hand navigation menu, select Endpoints.
Copy the Storage account resource ID and paste it into a text editor for later reference.
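A storage account resource ID follows the standard Azure Resource Manager pattern `/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<account-name>`. Before pasting the value into a support ticket, a quick format check can catch a truncated copy. The Python sketch below is one way to do that; the helper name and the all-zero example ID are hypothetical, and the regular expression is stricter than ARM itself (for example, it expects segment names in the casing the portal shows).

```python
import re

# Storage account names are 3 to 24 lowercase letters and digits.
STORAGE_ACCOUNT_ID = re.compile(
    r"^/subscriptions/(?P<subscription>[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12})"
    r"/resourceGroups/(?P<resource_group>[^/]+)"
    r"/providers/Microsoft\.Storage/storageAccounts/(?P<account>[a-z0-9]{3,24})$"
)


def parse_storage_account_id(resource_id: str) -> dict:
    """Split a storage account resource ID into its parts, or raise ValueError."""
    match = STORAGE_ACCOUNT_ID.match(resource_id.strip())
    if match is None:
        raise ValueError(f"not a storage account resource ID: {resource_id!r}")
    return match.groupdict()


example = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/analytics-rg"
    "/providers/Microsoft.Storage/storageAccounts/mydatalake"
)
print(parse_storage_account_id(example)["account"])  # mydatalake
```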
Step 6: Open support ticket
You are going to use the automated assistant in Starburst Galaxy to open a support ticket and provide support with the Storage account resource ID that you just copied. You will also need to provide your preferred Starburst Galaxy Private Link connection name.
Log in to Starburst Galaxy.
Click the support icon located at the bottom right of the screen.
Select Chat with technical support.
Select Submit a Support Ticket.
The automated assistant will ask you to provide your email address, first name, and last name.
When you receive the prompt to describe your issue, note that you would like support to create two private endpoints for you. Be sure to include the Storage account resource ID you just copied and your preferred Starburst Galaxy Private Link connection name.
Wait for Starburst support to confirm that they have created the endpoints in Starburst Galaxy. This should take no more than 24 to 48 hours.
Background
When working with private endpoints in Azure Data Lake Storage Gen2, it is considered best practice to create two private endpoints:
One for the Data Lake Storage Gen2 (dfs) sub-resource;
The other for the Blob Storage (blob) sub-resource.
This is because operations that target the Data Lake Storage Gen2 resource may be redirected to Blob Storage and vice versa. Creating two private endpoints ensures that all operations will complete successfully.
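To make the two endpoints concrete: each private endpoint targets one sub-resource of the storage account, `dfs` or `blob`, and each sub-resource has its own hostname and its own `privatelink` DNS zone. The Python sketch below only derives those names for a hypothetical account called `mydatalake` in the Azure public cloud (sovereign clouds use different DNS suffixes); it does not call Azure, and Starburst support creates the actual endpoints.

```python
# Azure public cloud DNS suffix; sovereign clouds use different suffixes.
SUFFIX = "core.windows.net"

# The two target sub-resources that each need a private endpoint.
SUB_RESOURCES = ("dfs", "blob")


def endpoint_hosts(account: str) -> dict:
    """Return the hostname and private DNS zone for each sub-resource."""
    return {
        sub: {
            "host": f"{account}.{sub}.{SUFFIX}",
            "private_dns_zone": f"privatelink.{sub}.{SUFFIX}",
        }
        for sub in SUB_RESOURCES
    }


for sub, names in endpoint_hosts("mydatalake").items():
    print(sub, names["host"], names["private_dns_zone"])
# dfs mydatalake.dfs.core.windows.net privatelink.dfs.core.windows.net
# blob mydatalake.blob.core.windows.net privatelink.blob.core.windows.net
```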
Starburst support will use the Storage account resource ID that you provided to create these private endpoints. You will then need to manually accept the endpoint connections.
Note that when you accept the first private endpoint connection to an ADLS storage account, existing access will be blocked. Keep this in mind if you are about to accept the first endpoint connection.
Step 1: Select private endpoint connections
You're going to begin by selecting your private endpoint connection settings, found in the Networking section of the Azure portal.
Using the left-hand navigation menu, select Networking.
Select the Private endpoint connections tab.
Step 2: Accept connections
Once Starburst support has created the private endpoints, you will see the connections listed as Pending.
Confirm with Starburst support that the endpoints have been created.
In the Private endpoint connections section, click the Refresh button until your connections appear.
When they have appeared, select both new connections.
Click the Approve button.
In the Description field, enter a meaningful description.
Click the Yes button.
In the Connection state column, confirm that the status of both endpoints has changed to Approved.
Note: Click the Refresh button if necessary.
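If you want to verify the result outside the portal, the connection states are also visible in the storage account JSON. The Python sketch below counts connections by state; it assumes each entry is shaped like an item of the `privateEndpointConnections` list from `az storage account show`, and the endpoint names in the sample data are hypothetical.

```python
from collections import Counter


def connection_states(connections: list) -> Counter:
    """Count private endpoint connections by their connection state.

    Assumes each item carries its state under
    privateLinkServiceConnectionState.status (for example, "Pending" or
    "Approved"). Entries without a state are counted under None.
    """
    return Counter(
        (c.get("privateLinkServiceConnectionState") or {}).get("status")
        for c in connections
    )


# Illustrative data for the two Starburst-created endpoints after approval.
connections = [
    {"name": "galaxy-dfs", "privateLinkServiceConnectionState": {"status": "Approved"}},
    {"name": "galaxy-blob", "privateLinkServiceConnectionState": {"status": "Approved"}},
]
states = connection_states(connections)
print(states["Approved"] >= 2 and states["Pending"] == 0)  # True
```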
You are now ready to configure an ADLS catalog in Starburst Galaxy using a Private Link connection.
Background
You should only complete this section if both of the following are true:
You have an HMS that is deployed on an Azure virtual machine.
You plan to use that HMS when you configure your Azure ADLS catalog using Private Link.
If you meet both of the above criteria, you need to make sure that your HMS is set up to use Private Link.
In this section, you'll begin by determining whether your HMS is already set up for Private Link. If it isn't, you can follow the steps provided to set it up. This requires you to configure both a public and an internal load balancer for your HMS.
Step 1: Confirm HMS Private Link status
You can confirm the Private Link status of your HMS with a few quick clicks in the Azure portal.
Using the search bar at the top of the screen, search for virtual machines.
Select Virtual machines from the list of results.
Using the filter bar, type the name of your virtual machine.
Click on your virtual machine.
Using the left-hand navigation menu, select Load balancing.
If you see two load balancers, one for public connectivity and one for internal connectivity, your HMS is already set up for Private Link and you can skip to Step 10 in this section.
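As a scripted alternative to inspecting the Load balancing blade, you can classify each load balancer by its frontend IP configuration: a public load balancer's frontend references a public IP address, while an internal one uses a private IP from a subnet. The Python sketch below assumes JSON shaped like the output of `az network lb show`, with `frontendIPConfigurations`, `publicIPAddress`, and `privateIPAddress` fields; the helper names and sample values are hypothetical.

```python
def lb_kind(lb: dict) -> str:
    """Classify a load balancer as "public", "internal", or "unknown".

    Assumption: a public frontend carries a "publicIPAddress" reference,
    while an internal frontend has a "privateIPAddress" and subnet instead.
    """
    frontends = lb.get("frontendIPConfigurations") or []
    if not frontends:
        return "unknown"
    if any(f.get("publicIPAddress") for f in frontends):
        return "public"
    return "internal"


def has_public_and_internal(load_balancers: list) -> bool:
    """True if the VM has at least one public and one internal load balancer."""
    kinds = {lb_kind(lb) for lb in load_balancers}
    return {"public", "internal"} <= kinds


public_lb = {
    "name": "hms01-public",
    "frontendIPConfigurations": [{"publicIPAddress": {"id": "hypothetical-public-ip-id"}}],
}
internal_lb = {
    "name": "hms01-internal",
    "frontendIPConfigurations": [
        {"privateIPAddress": "10.1.0.10", "subnet": {"id": "hypothetical-subnet-id"}}
    ],
}
print(has_public_and_internal([public_lb, internal_lb]))  # True
```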
Step 2: Add public load balancer
If this is your first time setting up a public load balancer for this VM, understand that once it is configured, all outbound traffic from the VM is directed through this load balancer. By default, however, a public load balancer does not include an outbound rule. This means that from the moment the load balancer is added until the outbound rule is created, your HMS cannot reach the public internet.
This matters because Azure uses a public endpoint to authenticate your Service Principal Name (SPN) credentials or the access key for your ADLS account. Any action initiated from your HMS during this interim period, such as running a CREATE SCHEMA command, will fail.
Fortunately, creating the outbound rule typically takes only a few minutes after the load balancer is set up. Once this rule is in place, CREATE SCHEMA commands from your HMS will no longer fail.
One final caution: Azure VMs cache credential verifications for a period of time by default. If you test the CREATE SCHEMA command after completing all of the steps above, either run it against a completely new ADLS storage account or restart your HMS to clear the credential cache.
Using the left-hand navigation menu, select Load balancing.
Click to expand the Add load balancing dropdown menu.
Select Create new.
Select Load Balancer.
Step 3: Configure public load balancer
It's time to add configuration details for the load balancer.
Provide a meaningful Load balancer name. We recommend adding -public to the end: you'll be creating two load balancers, and the suffix helps you tell which is public and which is internal (for example, kyle-payne-hms01-public).
For Type, select Public.
For Protocol, select TCP.
For both Port and Backend port, type 443.
Click the Create button.
Wait for the load balancer to be created and added. Both operations take a few minutes to complete.
Step 4: Add outbound rule to public load balancer
This step ensures that outbound traffic is allowed. It's OK if your environment requires a slight deviation to adhere to security protocols; what matters is that you create an outbound rule that allows traffic on port 443 from the VM running your HMS.
Click on your load balancer name (for example, kyle-payne-hms01-public).
Using the left-hand navigation menu, select Outbound rules.
Click + Add.
Provide a meaningful Name (for example, all-outbound).
For IP version, select IPv4.
Expand the Frontend IP address dropdown menu, and select the only address in the list.
For Protocol, select All.
For Backend pool, select the only pool in the list.
For Outbound ports Choose by, select Ports per instance.
For Ports per instance, type 63992 (or the maximum value allowed, if it differs).
Click the Add button.
At this point your HMS should be able to validate SPN credentials and ADLS account keys.
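To confirm that outbound traffic now flows, you could run a plain TCP check on port 443 from the HMS VM. The Python sketch below uses only the standard library; a successful connection shows that the network path is open, not that your credentials are valid. The helper name `can_reach` is hypothetical, and the hostnames in the usage comment are examples: `login.microsoftonline.com` is the Microsoft Entra ID sign-in endpoint, and the storage hostname assumes an account named `mydatalake`.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    DNS failures, refused connections, and timeouts are all OSError
    subclasses, so each of them is reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example, run on the HMS VM (hostnames are illustrative):
#   can_reach("login.microsoftonline.com")
#   can_reach("mydatalake.blob.core.windows.net")
```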
Step 5: Add internal load balancer for Private Link
Now it's time to add the internal load balancer that is required for Private Link. This follows a very similar process to adding a public load balancer.
Using the search bar at the top of the screen, search for virtual machines.
Select Virtual machines from the list of results.
Using the filter bar, type the name of your virtual machine.
Click on your virtual machine.
Using the left-hand navigation menu, select Load balancing.
Click to expand the Add load balancing dropdown menu.
Select Create new.
Select Load Balancer.
Step 6: Configure internal load balancer
Provide a meaningful Load balancer name. We recommend adding -internal to the end to help differentiate between the public and internal load balancers (for example, kyle-payne-hms01-internal).
For Type, select Internal.
For Protocol, select TCP.
For both Port and Backend port, type the port your HMS Thrift service is currently listening on. The HMS default is port 9083.
Click the Create button.
Wait for the load balancer to be created and added. Both operations take a few minutes to complete.
Step 7: Create Private Link service
Now that you have your load balancers set up, you're ready to create the Private Link service for your HMS.
Using the search bar at the top of the screen, search for Private link services.
Select Private link services from the list of results.
Click + Create.
Step 8: Configure Private Link outbound settings
Expand the Resource group dropdown menu, and select the resource group to put the Private Link service into. If you are in doubt, use the same resource group as the HMS VM, because there is no reason to keep this Private Link service once the HMS is destroyed.
Provide a meaningful Name. We recommend using the same name as the internal load balancer you created earlier, since this Private Link service is directly tied to it (for example, kyle-payne-hms01-internal).
Click the Next button.
In the box next to Load balancer, select the internal load balancer you created earlier for this VM (for example, kyle-payne-hms01-internal).
Expand the Load balancer frontend IP address dropdown menu, and select the only IP address in the list.
The Source NAT Virtual Network is automatically selected for you.
Expand the Source NAT subnet menu, and select a subnet that can route to your load balancer. When in doubt, select the same subnet the load balancer is using.
Click the Next button.
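Whether a subnet can route to the load balancer depends on your network design. As a rough first check, and assuming Azure's default system routes (which allow traffic between subnets in the same virtual network), you can compare the candidate Source NAT subnet and the load balancer's frontend IP against the VNet address space. The Python sketch below does this with the standard-library `ipaddress` module; custom route tables and network security groups are not considered, and the helper name and all addresses are hypothetical.

```python
import ipaddress


def check_nat_subnet(vnet_cidrs: list, nat_subnet: str, lb_frontend_ip: str) -> dict:
    """Report where the NAT subnet and load balancer frontend sit in the VNet.

    Assumption: with default system routes, subnets in one VNet reach each
    other, so "same_vnet" approximates "can route". Route tables and network
    security groups are not checked.
    """
    spaces = [ipaddress.ip_network(c) for c in vnet_cidrs]
    subnet = ipaddress.ip_network(nat_subnet)
    frontend = ipaddress.ip_address(lb_frontend_ip)
    return {
        "same_vnet": (
            any(subnet.subnet_of(s) for s in spaces)
            and any(frontend in s for s in spaces)
        ),
        "same_subnet": frontend in subnet,
    }


print(check_nat_subnet(["10.1.0.0/16"], "10.1.0.0/24", "10.1.0.10"))
# {'same_vnet': True, 'same_subnet': True}
```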
Step 9: Complete Private Link service configuration
We recommend leaving Who can request access to your service set to Role-based access control only, as other options have not been tested. If your security requirements dictate a different option, you can select it now, but you may need to take additional measures later to get things working.
Click the Next button.
Configure any Tags you wish to use.
Click the Next button.
Click the Create button.
Step 10: Record Private Link service Alias
Select your Private Link service in the Azure portal if you aren't already on the appropriate page.
Ensure you are in the Overview section.
Click the copy button next to the Alias.
Open a Starburst Galaxy support ticket, and provide the Alias to support. In the ticket, be sure to indicate that you would like a Starburst Galaxy Private Endpoint to be configured to your internal HMS's Private Link service.
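Azure builds a Private Link service alias from three parts: a prefix based on the service name, a GUID, and a suffix of the form `<region>.azure.privatelinkservice`. A quick format check before pasting the alias into the ticket can catch a partial copy. The Python sketch below assumes that format; the helper name `parse_alias` and the example alias are hypothetical.

```python
import re

# Assumed alias layout: <prefix>.<GUID>.<region>.azure.privatelinkservice
ALIAS = re.compile(
    r"^(?P<prefix>.+)\."
    r"(?P<guid>[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12})\."
    r"(?P<region>[a-z0-9]+)\.azure\.privatelinkservice$"
)


def parse_alias(alias: str) -> dict:
    """Split a Private Link service alias into parts, or raise ValueError."""
    match = ALIAS.match(alias.strip())
    if match is None:
        raise ValueError(f"unexpected Private Link service alias: {alias!r}")
    return match.groupdict()


example = (
    "kyle-payne-hms01-internal.00000000-0000-0000-0000-000000000000"
    ".eastus.azure.privatelinkservice"
)
print(parse_alias(example)["region"])  # eastus
```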
Step 11: Accept connection for Private Link service
Once Starburst support has created the private endpoint to your Private Link service, you will see the connection listed as Pending under Private endpoint connections.
The screenshots below show an example of two private endpoint connections. Yours will only have one.
Confirm with Starburst support that the endpoint has been created.
In the Private endpoint connections section, click the Refresh button until your connection appears.
When it has appeared, select the new connection.
Click the Approve button.
In the Description field, enter a meaningful description.
Click the Yes button.
In the Connection state column, confirm that the status of the endpoint has changed to Approved.
Note: Click the Refresh button if necessary.
You are now ready to configure an ADLS catalog with your own HMS in Starburst Galaxy using a Private Link connection.
Tutorial complete
Congratulations! You have reached the end of this tutorial, and the end of this stage of your journey.
You're all set! Now you can use Private Link to configure access to data in your Azure Data Lake Storage account.
Continuous learning
At Starburst, we believe in continuous learning. This tutorial provides the foundation for further training available on this platform, and you can return to it as many times as you like. Future tutorials will make use of the concepts used here.
Next steps
Starburst has lots of other tutorials to help you get up and running quickly. Each one breaks down an individual problem and guides you to a solution using a step-by-step approach to learning.
Tutorials available
Visit the Tutorials section to view the full list of tutorials and keep moving forward on your journey!