1. Tutorial overview
Last Updated: 2024-02-09
Background
AWS PrivateLink provides private connectivity between virtual private clouds (VPCs), supported AWS services, and on-premises networks. Traffic over this connection is not exposed to the public internet, which makes it well suited to data federation across cloud and on-premises networks, among other use cases.
Starburst Galaxy supports AWS PrivateLink for certain catalogs. This tutorial guides you through configuring PrivateLink for an Amazon RDS instance. The steps also apply to a database running on an EC2 instance.
Scope of tutorial
In this tutorial, you will learn how to configure AWS PrivateLink for an Amazon RDS instance or database running on an EC2 instance.
Learning objectives
Once you've completed this tutorial, you will be able to:
- Configure AWS PrivateLink for connectivity from Starburst Galaxy to your Amazon RDS instance or database running on an EC2 instance.
- Use PrivateLink to securely connect Starburst Galaxy to your Amazon RDS instance or database running on an EC2 instance.
Prerequisites
- You need a Starburst Galaxy account to complete this tutorial. Please see Starburst Galaxy: Getting started for instructions on setting up a free account.
- This tutorial has a bring-your-own-storage requirement. Before proceeding, you must already have an Amazon RDS instance or a database running on an EC2 instance.
About Starburst tutorials
Starburst tutorials are designed to get you up and running quickly by providing bite-sized, hands-on educational resources. Each tutorial explores a single feature or topic through a series of guided, step-by-step instructions.
As you work through the tutorial, follow along using your own Starburst Galaxy account. Mixing theory and practice in this way helps consolidate what you learn.
2. Working with a Starburst technical resource
Background
If you are configuring PrivateLink for the first time, we encourage you to work with a Starburst technical resource. This person will help you set up the environment needed to complete the tutorial.
Contacting your technical resource
To be assigned a technical resource, contact your regular Starburst account team.
Working together
Once assigned, your Starburst technical resource will work with you to set up an environment where you can complete the tutorial.
Please review the following overview of this process before beginning the tutorial.
Your responsibilities:
- If you are configuring for RDS, locate your RDS endpoint.
- Record the IP address of the RDS endpoint or EC2 instance.
- Create a target group.
- Create a network load balancer.
- Create an endpoint service.
- Allow the Starburst Galaxy AWS account principal to use the endpoint service.
- Submit a support request via Starburst Galaxy to have an endpoint connection created.
- Accept the endpoint connection in your AWS account.
3. RDS PrivateLink architecture
Background
Understanding the AWS PrivateLink RDS architecture is important when completing the steps in this tutorial. In this section you will learn about this architecture and the way that Starburst Galaxy uses it to securely connect private clouds.
This tutorial follows the corresponding AWS documentation on this topic. Consult that documentation if you want to learn more about AWS PrivateLink in general.
Reference architecture
The following diagram illustrates a PrivateLink connection between the Starburst Galaxy VPC and the Amazon RDS VPC.
Review the diagram and corresponding notes below for more information.
- Once the PrivateLink configuration is complete, an endpoint is created in the Starburst Galaxy VPC (Source).
This endpoint connects, through an endpoint service, to a Network Load Balancer in the RDS instance VPC (Destination).
This establishes a private connection between Starburst Galaxy and the RDS instance, enabling PrivateLink functionality.
- In this reference architecture, the Starburst Galaxy VPC is the source.
- In this reference architecture, the RDS instance VPC is the destination.
4. Obtain RDS instance details
Background
It's time to get started. In this section, you'll begin by obtaining some key information about your Amazon RDS instance, including:
- RDS endpoint
- Port number
- AWS Availability Zone
- RDS endpoint IP address
You'll need this information to create a target group and load balancer in the next sections of this tutorial.
Step 1: Sign in to AWS console
You're going to start by signing in to your AWS console. Use the AWS account that contains the RDS instance you want to connect using PrivateLink. If you use multiple AWS accounts, make sure you pick the correct one.
- Sign in to your AWS account.
- In the AWS console landing page, in the Database section, select RDS.
Note: You can also use the search bar in the AWS console to find RDS.
Step 2: Select RDS instance
Now it's time to find the right RDS instance. Depending on your workflow, you might have multiple instances in the same AWS account, so make sure you select the correct one.
- Using the Amazon RDS menu on the left, select Databases.
- Search for your RDS instance by name.
- Select the RDS instance to view its details.
Step 3: Record RDS details
Now it's time to record details about your RDS instance. This includes the RDS engine, endpoint, port, and Availability Zone.
- In the Summary section, record the name of the RDS Engine.
- In the Connectivity & Security section, record the Endpoint, Port, and Availability Zone.
For example:
- Engine: PostgreSQL
- Endpoint: erin-rosas-bootcamp.cs8j7iukogcy.us-east-2.rds.amazonaws.com
- Port: 5432
- Availability Zone: us-east-2a
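If you prefer to script this lookup, the RDS API returns the same details. The following is a minimal sketch that assumes you use the AWS SDK for Python (boto3) and pass in an RDS client you create yourself. The helper name `rds_connection_details` is hypothetical. The helper also returns the VPC ID, which you will need later when you create the target group and load balancer.

```python
from typing import Any, Dict


def rds_connection_details(rds_client: Any, instance_id: str) -> Dict[str, Any]:
    """Return the RDS details recorded in Step 3 (hypothetical helper).

    `rds_client` is assumed to be a boto3 RDS client, for example
    boto3.client("rds", region_name="us-east-2").
    """
    response = rds_client.describe_db_instances(DBInstanceIdentifier=instance_id)
    # A specific DBInstanceIdentifier matches exactly one instance.
    db = response["DBInstances"][0]
    return {
        "engine": db["Engine"],
        "endpoint": db["Endpoint"]["Address"],
        "port": db["Endpoint"]["Port"],
        "availability_zone": db["AvailabilityZone"],
        # VPC of the instance, needed for the target group and load balancer.
        "vpc_id": db["DBSubnetGroup"]["VpcId"],
    }
```

With real credentials, you might call `rds_connection_details(boto3.client("rds", region_name="us-east-2"), "your-instance-id")`.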
Step 4: Record RDS IP address
Next, you will use your RDS endpoint to determine your RDS IP address.
To do this, you'll use a terminal window and again copy the result into your text editor.
- Open a Terminal window on your desktop.
- Run one of the following commands to retrieve the IP address.
Note: The command you choose depends on your operating system. Be sure to replace [rds-endpoint] with your actual RDS endpoint.
- In Windows, run the command
nslookup [rds-endpoint]
- In Linux or macOS, run the command
dig +short [rds-endpoint]
- Record the IP address of the RDS instance.
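You can also do the lookup from Python. The following minimal sketch uses only the standard library. The function name `resolve_ipv4` is hypothetical, and the function asks the operating system resolver for IPv4 addresses only, which matches the IPv4 target group you create later. An RDS endpoint's IP address can change, for example after a failover or instance replacement, so a script like this is also handy for re-checking the address later.

```python
import socket
from typing import List


def resolve_ipv4(hostname: str) -> List[str]:
    """Resolve a hostname, such as an RDS endpoint, to its IPv4 addresses.

    Uses the operating system resolver (which may also consult the hosts file).
    """
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    addresses: List[str] = []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    for *_, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses


if __name__ == "__main__":
    import sys

    # Pass your RDS endpoint as the first command-line argument.
    print(resolve_ipv4(sys.argv[1]))
```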
5. Obtain EC2 instance details
Background
It's time to get started. In this section, you'll begin by obtaining some key information about your EC2 instance, including:
- VPC ID
- AWS Availability Zone
- Private IP address
You'll need this information to create a target group and load balancer in the next sections of this tutorial.
Step 1: Sign in to AWS console
You're going to start by signing in to your AWS console. Use the AWS account that contains the EC2 instance you want to connect using PrivateLink. If you use multiple AWS accounts, make sure you pick the correct one.
- Sign in to your AWS account.
- Use the search bar at the top of the screen to search for EC2. Select EC2 from the list of options.
Step 2: Select EC2 instance
Now it's time to find the right EC2 instance. Depending on your workflow, you might have multiple instances in the same AWS account, so make sure you select the correct one.
- Using the menu on the left, select Instances.
- Search for your EC2 instance by name.
- Select the EC2 instance to view its details.
Step 3: Record EC2 details
Now it's time to record details about your EC2 instance. This includes the private IPv4 address, VPC ID, and Availability Zone.
- In the Summary section, record the VPC ID and Private IPv4 address.
- In the Networking section, record the Availability Zone.
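The EC2 API returns the same details as the console. The following minimal sketch assumes boto3 and an EC2 client that you pass in. The helper name `ec2_connection_details` is hypothetical. The helper also returns the subnet ID, which you select when you create the load balancer.

```python
from typing import Any, Dict


def ec2_connection_details(ec2_client: Any, instance_id: str) -> Dict[str, str]:
    """Return the EC2 details recorded in Step 3 (hypothetical helper).

    `ec2_client` is assumed to be a boto3 EC2 client, for example
    boto3.client("ec2", region_name="us-east-2").
    """
    response = ec2_client.describe_instances(InstanceIds=[instance_id])
    # A single instance ID yields one reservation containing one instance.
    instance = response["Reservations"][0]["Instances"][0]
    return {
        "private_ip": instance["PrivateIpAddress"],
        "vpc_id": instance["VpcId"],
        "subnet_id": instance["SubnetId"],
        "availability_zone": instance["Placement"]["AvailabilityZone"],
    }
```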
6. Create a target group
Background
Now it's time to set up a target group. In AWS, a target group routes incoming traffic from a load balancer to registered targets, such as EC2 instances, containers, or IP addresses.
In this tutorial, the target group you create routes traffic from the load balancer to the IP address of your RDS instance (or EC2 instance) on your database port.
Step 1: Start the target group wizard
- Navigate to the EC2 dashboard in the AWS console. This can be done by searching for EC2 and clicking EC2 in the results list.
- From the left-hand navigation menu, expand Load Balancing and click Target Groups.
- Click the Create target group button on the right.
Step 2: Provide a target group name
In this step, AWS will ask you to select a target type and provide a name.
- Select IP addresses as the target type.
- Provide a meaningful Target group name.
Step 3: Configure the target group
Next, you're going to configure your target group for use with your RDS instance or database running on an EC2 instance. To do this, you're going to use some of the details that you copied into your text editor earlier in this tutorial.
- Using the Protocol drop-down menu, select TCP.
- Enter the port number used by your RDS or EC2 instance.
- Select IPv4.
- Select the VPC for your RDS or EC2 instance.
- Using the Health check protocol drop-down menu, select TCP.
- Click Next.
Step 4: Complete configuration process
Almost there! For the final step, you're going to finish the configuration process and create the target group.
- In the IPv4 address field, enter the IP address of your RDS endpoint or EC2 instance.
- In the Ports section, click the Include as pending below button.
- Confirm that your RDS endpoint IP or EC2 instance IP is now listed under Targets and that its Health status is shown as Pending.
- Click Create target group.
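The console choices in Steps 2 to 4 map onto two Elastic Load Balancing API calls. The following minimal sketch assumes boto3 and an `elbv2` client that you pass in. The helper names and the sample values in the usage note are hypothetical. The first helper only builds the arguments, so you can review them before anything is created.

```python
from typing import Any, Dict


def target_group_params(name: str, port: int, vpc_id: str) -> Dict[str, Any]:
    """Arguments mirroring the console choices in Steps 2 and 3 (hypothetical helper)."""
    return {
        "Name": name,
        "TargetType": "ip",            # Step 2: IP addresses
        "Protocol": "TCP",             # Step 3: TCP
        "Port": port,                  # database port, e.g. 5432 for PostgreSQL
        "IpAddressType": "ipv4",       # Step 3: IPv4
        "VpcId": vpc_id,               # VPC of the RDS or EC2 instance
        "HealthCheckProtocol": "TCP",  # Step 3: TCP health checks
    }


def create_ip_target_group(elbv2_client: Any, name: str, port: int,
                           vpc_id: str, target_ip: str) -> str:
    """Create the target group and register the database IP (Step 4).

    `elbv2_client` is assumed to be boto3.client("elbv2"). Returns the target group ARN.
    """
    response = elbv2_client.create_target_group(**target_group_params(name, port, vpc_id))
    arn = response["TargetGroups"][0]["TargetGroupArn"]
    # Register the RDS endpoint IP or EC2 private IP as the only target.
    elbv2_client.register_targets(TargetGroupArn=arn, Targets=[{"Id": target_ip, "Port": port}])
    return arn
```

For example, `create_ip_target_group(boto3.client("elbv2"), "pl-db-tg", 5432, "vpc-123", "10.0.1.25")` registers the IP you recorded earlier.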
7. Create a load balancer
Background
Now it's time to create a network load balancer. In AWS, a Network Load Balancer (NLB) is a service that automatically distributes incoming network traffic across multiple targets based on IP protocol data. This includes Amazon EC2 instances, containers, and IP addresses. Load balancers are also configurable across either a single AWS Availability Zone or multiple Availability Zones.
After you configure PrivateLink, an endpoint in the Starburst Galaxy VPC will connect to your Network Load Balancer through the endpoint service that you create in the RDS instance or EC2 instance VPC.
Step 1: Start the load balancer wizard
- From the left-hand navigation menu, click the Load Balancers link.
- Click the Create load balancer button on the right side of the dashboard.
Step 2: Select load balancer type
AWS load balancers come in several different types. These include Application Load Balancers, Network Load Balancers, and Gateway Load Balancers.
For this tutorial, you're going to select the Network Load Balancer.
- Select the Network Load Balancer by clicking the corresponding Create button.
Step 3: Name your load balancer
It's time to start configuring your new load balancer, starting with a name.
- Enter your Load balancer name in the field provided.
Step 4: Configure the load balancer
Next, you're going to configure your load balancer for use with your RDS instance or EC2 instance.
- In the Scheme section, select Internal.
- In the IP address type field, select IPv4.
- Select the VPC for your RDS instance or EC2 instance.
Step 5: Select the AWS availability zone and subnet(s)
Now it's time to select an AWS Availability Zone (AZ) for your load balancer. This is the same AZ that you recorded for your RDS instance or EC2 instance earlier in this tutorial.
- Select the Availability Zone of your RDS instance or EC2 instance.
- Select the subnet of your RDS instance or EC2 instance.
- Leave the Private IPv4 address field unchanged.
Step 6: Configure security group
Next, select a security group to control access to your load balancer. Without a security group, your network load balancer accepts all connections, which is a security concern in production environments.
- Select a security group with an inbound rule that allows the CIDR 172.16.0.0/16 on your database port (for example, 3306 for MySQL or 5432 for PostgreSQL).
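If you need to create that inbound rule rather than reuse an existing security group, the rule can be expressed as one EC2 API call. The following minimal sketch assumes boto3 and an EC2 client that you pass in. The helper names and the rule description text are hypothetical. The standard library `ipaddress` module validates the CIDR before the rule is sent.

```python
import ipaddress
from typing import Any, Dict, List

STARBURST_CIDR = "172.16.0.0/16"  # CIDR from this step of the tutorial


def ingress_permissions(db_port: int, cidr: str = STARBURST_CIDR) -> List[Dict[str, Any]]:
    """Build an inbound TCP rule for the database port (hypothetical helper)."""
    # ip_network raises ValueError for a malformed CIDR or one with host bits set.
    network = str(ipaddress.ip_network(cidr))
    return [{
        "IpProtocol": "tcp",
        "FromPort": db_port,
        "ToPort": db_port,
        "IpRanges": [{"CidrIp": network, "Description": "Starburst Galaxy PrivateLink"}],
    }]


def allow_database_port(ec2_client: Any, security_group_id: str, db_port: int) -> None:
    """Add the rule to an existing security group; `ec2_client` is assumed to be boto3.client("ec2")."""
    ec2_client.authorize_security_group_ingress(
        GroupId=security_group_id,
        IpPermissions=ingress_permissions(db_port),
    )
```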
Step 7: Configure port and target group
- Enter the Port number of your RDS or EC2 instance.
- Using the Forward to drop-down menu, select the target group you just created.
- Click the Create load balancer button.
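Steps 4 to 7 correspond to creating an internal network load balancer and a TCP listener that forwards to your target group. The following minimal sketch assumes boto3 and an `elbv2` client that you pass in. The helper name `create_internal_nlb` is hypothetical, and the subnet and security group IDs are the ones you chose above.

```python
from typing import Any, List


def create_internal_nlb(elbv2_client: Any, name: str, subnet_ids: List[str],
                        security_group_ids: List[str], port: int,
                        target_group_arn: str) -> str:
    """Create the load balancer and its listener (hypothetical helper). Returns the NLB ARN."""
    response = elbv2_client.create_load_balancer(
        Name=name,
        Type="network",
        Scheme="internal",                  # Step 4: Internal
        IpAddressType="ipv4",               # Step 4: IPv4
        Subnets=subnet_ids,                 # Step 5: subnet in the database's AZ
        SecurityGroups=security_group_ids,  # Step 6
    )
    lb_arn = response["LoadBalancers"][0]["LoadBalancerArn"]
    # Step 7: listen on the database port and forward to the target group.
    elbv2_client.create_listener(
        LoadBalancerArn=lb_arn,
        Protocol="TCP",
        Port=port,
        DefaultActions=[{"Type": "forward", "TargetGroupArn": target_group_arn}],
    )
    return lb_arn
```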
Step 8: Wait for load balancer to activate
That's it! Your load balancer is now being created. This process takes three to five minutes.
- Wait for the State to change from Provisioning to Active before moving to the next step.
- Click the Refresh button to view status updates.
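Clicking Refresh is the manual version of a polling loop. The following minimal sketch assumes boto3 and an `elbv2` client that you pass in. The helper name, poll interval, and timeout are hypothetical choices. The `sleep` parameter is injectable so you can test the loop without waiting.

```python
import time
from typing import Any, Callable


def wait_until_active(elbv2_client: Any, lb_arn: str, poll_seconds: float = 15.0,
                      timeout_seconds: float = 900.0,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll the load balancer until its state is "active" (hypothetical helper)."""
    waited = 0.0
    while True:
        response = elbv2_client.describe_load_balancers(LoadBalancerArns=[lb_arn])
        state = response["LoadBalancers"][0]["State"]["Code"]
        if state == "active":
            return state
        if state == "failed":
            raise RuntimeError(f"Load balancer {lb_arn} failed to provision")
        if waited >= timeout_seconds:
            raise TimeoutError(f"Load balancer still {state!r} after {waited:.0f} s")
        # Still provisioning: wait and check again, like clicking Refresh.
        sleep(poll_seconds)
        waited += poll_seconds
```

boto3 also provides a built-in `load_balancer_available` waiter for `elbv2` clients. The explicit loop above simply makes the state transition visible.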
8. Create an endpoint service
Background
Now it's time to create an endpoint service.
In the context of AWS PrivateLink, an endpoint service lets you expose services running in your VPC to other AWS accounts in the same AWS Region over a private connection.
Step 1: Start the endpoint service wizard
- Navigate to the VPC dashboard in the AWS console. This can be done by searching for VPC and clicking on VPC in the results list.
- From the left-hand navigation menu, expand Virtual private cloud, and click Endpoint services.
- Click the Create endpoint service button on the right side of the dashboard.
Step 2: Name your endpoint service
It's time to start configuring your new endpoint service, starting with a name.
- Enter your endpoint service name in the field provided.
- In the Load balancer type field, select Network.
Step 3: Configure endpoint service
Now it's time to configure your endpoint service. You're going to make sure that it connects to your network load balancer and uses the correct IP address type.
- Select your network load balancer.
- In the Supported IP address type field, select IPv4.
- Click the Create button.
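The same endpoint service can be created through the EC2 API. The following minimal sketch assumes boto3 and an EC2 client that you pass in. The helper name is hypothetical. `AcceptanceRequired=True` is an assumption based on the later step in which you manually accept the Starburst Galaxy connection request. The returned service name is the value you later send to Starburst support.

```python
from typing import Any, Dict


def create_endpoint_service(ec2_client: Any, nlb_arn: str) -> Dict[str, str]:
    """Create an endpoint service backed by the NLB (hypothetical helper)."""
    response = ec2_client.create_vpc_endpoint_service_configuration(
        NetworkLoadBalancerArns=[nlb_arn],  # Step 3: your network load balancer
        SupportedIpAddressTypes=["ipv4"],   # Step 3: IPv4
        AcceptanceRequired=True,            # connections must be accepted manually
    )
    config = response["ServiceConfiguration"]
    return {"service_id": config["ServiceId"], "service_name": config["ServiceName"]}
```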
9. Submit Starburst Galaxy support ticket
Background
Time to switch gears. You've completed all of the steps required on your own. Now it's time to contact the Starburst support team to finish the last steps.
Step 1: Enter the Starburst Galaxy ARN
In the previous section, you created your endpoint service. At the end of that process, AWS displays a page with the details of that service.
You'll use this page to enter the Starburst Galaxy Amazon Resource Name (ARN) as an allowed principal.
- Select the Allow principals tab under the Details box.
- Select the Allow principals button.
- Enter the following ARN in the ARN field:
arn:aws:iam::179619298502:root
- Select the Allow principals button.
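Allowing the principal through the API takes a single call. The following minimal sketch assumes boto3 and an EC2 client that you pass in, and uses the service ID of the endpoint service you just created. The helper name is hypothetical, and the ARN is the one given in this step.

```python
from typing import Any

STARBURST_GALAXY_PRINCIPAL = "arn:aws:iam::179619298502:root"  # ARN from this step


def allow_starburst_principal(ec2_client: Any, service_id: str,
                              principal: str = STARBURST_GALAXY_PRINCIPAL) -> None:
    """Add Starburst Galaxy as an allowed principal on the endpoint service (hypothetical helper)."""
    ec2_client.modify_vpc_endpoint_service_permissions(
        ServiceId=service_id,
        AddAllowedPrincipals=[principal],
    )
```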
Step 2: Record Service name
Now it's time to locate and copy the service name for your endpoint service. This is one of the endpoint service details listed in AWS.
The Starburst support team will use it to create the endpoint in Starburst Galaxy.
- Scroll up and copy the Service name.
Step 3: Open support ticket
You are going to use the automated assistant in Starburst Galaxy to open a support ticket and provide support with the Service name that you just copied. You will also need to provide the port your database is listening on and your preferred Starburst Galaxy PrivateLink configuration name.
- Log in to Starburst Galaxy.
- Click the support icon located at the bottom right of the screen.
- Select Chat with technical support.
- Select Submit a Support Ticket.
- The automated assistant will ask you to provide your email address, first name, and last name.
- When you receive the prompt to describe your issue, note that you would like support to create a private endpoint connection for you. Be sure to include the Service name you just copied, the port your database is listening on, and your preferred Starburst Galaxy PrivateLink connection name.
- Wait for Starburst support to confirm that they have created the endpoint in Starburst Galaxy. This should take no longer than 24 to 48 hours.
Step 4: Select the Starburst Galaxy endpoint
Do not begin this step until you receive confirmation that the Starburst Galaxy endpoint has been created successfully.
- Scroll down and select the Endpoint connections tab.
- Wait for the connection to appear in the list.
Note: You may need to click the Refresh button.
- Select the endpoint from the list.
Step 5: Accept the endpoint connection request
Now that you've selected the Starburst Galaxy endpoint, it's time to accept the connection request.
- Select the Actions drop-down menu.
- Select Accept endpoint connection request.
- Type accept in the confirmation field.
- Click the Accept button.
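Steps 4 and 5 can also be done through the API: list the connections to your endpoint service and accept the ones awaiting acceptance. The following minimal sketch assumes boto3 and an EC2 client that you pass in. The helper name is hypothetical. The state comparison ignores case as a defensive assumption about how the state string is capitalized. Only run it after Starburst support confirms that they created the endpoint.

```python
from typing import Any, List


def accept_pending_connections(ec2_client: Any, service_id: str) -> List[str]:
    """Accept endpoint connections awaiting acceptance (hypothetical helper).

    Returns the IDs of the VPC endpoints that were accepted.
    """
    response = ec2_client.describe_vpc_endpoint_connections(
        Filters=[{"Name": "service-id", "Values": [service_id]}]
    )
    pending = [
        conn["VpcEndpointId"]
        for conn in response["VpcEndpointConnections"]
        if conn["VpcEndpointState"].lower() == "pendingacceptance"
    ]
    if pending:
        ec2_client.accept_vpc_endpoint_connections(ServiceId=service_id, VpcEndpointIds=pending)
    return pending
```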
Step 6: Confirm endpoint connection
That's it. The connection is now being created. This process takes one to three minutes to complete.
When this process is complete, you are finished and ready to start using PrivateLink.
- Wait for the State to change from Pending to Available.
- Click the Refresh button to view status updates.
10. Important information for MySQL and MariaDB
Background
If your database is either MySQL or MariaDB, you will likely run into an error when trying to connect to your database from Starburst Galaxy via PrivateLink.
Network load balancer health checks
MySQL and MariaDB count the health checks performed by a network load balancer as connection errors. The network load balancer you created in this tutorial performs health checks at a default interval of 30 seconds, and each check results in approximately 6 errors. With the default max_connect_errors limit of 100, you will exceed the maximum allowed connection errors in under 10 minutes if all default settings are in place. Once this limit is exceeded, the database blocks connections from your network load balancer.
The PrivateLink connection goes through the network load balancer, so when the load balancer is blocked, connectivity from Starburst Galaxy to your database over PrivateLink is blocked as well.
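The arithmetic behind "under 10 minutes" is straightforward: count the health checks per hour, multiply by the errors each check produces, and divide the error limit by that rate. The following is a minimal sketch with a hypothetical function name. The figure of about 6 errors per check is this tutorial's approximation, not a fixed constant, so adjust `errors_per_check` if you observe a different rate.

```python
def hours_until_blocked(max_connect_errors: int, interval_seconds: float,
                        errors_per_check: float = 6.0) -> float:
    """Estimate hours until the load balancer's connection errors reach the limit."""
    checks_per_hour = 3600 / interval_seconds
    errors_per_hour = checks_per_hour * errors_per_check
    return max_connect_errors / errors_per_hour


# Defaults: 100 / (120 checks * 6 errors) ≈ 0.139 hours, or about 8.3 minutes.
default_minutes = hours_until_blocked(100, 30) * 60
# Tuned settings from the steps below: 5000 / (12 checks * 6 errors) ≈ 69.4 hours.
tuned_hours = hours_until_blocked(5000, 300)
```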
Resulting error in Starburst Galaxy
Here are the errors you will see in Starburst Galaxy when the network load balancer is blocked.
- When using the "Test connection" button during catalog configuration, you will receive an error similar to the following:
Could not connect to MySQL server
- When expanding a catalog in the Query editor, you will receive an error similar to the following:
Error listing schemas for catalog pl_mysql_extpartner: java.sql.SQLNonTransientConnectionException: Could not create connection to database server. Attempted reconnect 3 times, Giving up.
There are steps you can take to resolve this error, which are outlined below and also covered in this AWS Database Blog.
Step 1: Confirm issue
One easy way to confirm that you're hitting the health check issue is to run a SQL query that shows which IPs are registering connection errors.
- Run the following SQL in the Starburst Galaxy Query editor:
SELECT *
FROM performance_schema.host_cache
WHERE SUM_CONNECT_ERRORS > 1;
- Check the results for the IP address of your load balancer.
Step 2: Temporarily resolve issue
You can reset the error counters by running the FLUSH HOSTS; command in the MySQL CLI, MySQL Workbench, or DBeaver. However, this is only a temporary fix, because you will quickly hit the limit and be blocked again.
- Open the MySQL CLI, MySQL Workbench, or DBeaver.
- Run the command
FLUSH HOSTS;
- Continue with the next steps to resolve the issue permanently.
Step 3: Increase health check interval
To permanently resolve the issue, you will have to complete three steps. The first step is to increase the health check interval. The default value for this interval is 30 seconds, and the maximum value is 300 seconds.
- Increase the health check interval from the default value of 30 seconds. Since RDS is generally very reliable, we recommend starting with 300 seconds. You can always lower it again if needed. At this interval, a health check runs only 12 times per hour. However, each check still produces approximately 6 errors, or about 72 errors per hour.
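You can also change the interval on the target group through the API. The following minimal sketch assumes boto3 and an `elbv2` client that you pass in. The helper name is hypothetical. The guard rejects values above the 300-second maximum stated in this step.

```python
from typing import Any


def set_health_check_interval(elbv2_client: Any, target_group_arn: str,
                              seconds: int = 300) -> None:
    """Set the target group's health check interval (hypothetical helper)."""
    if not 0 < seconds <= 300:
        raise ValueError("health check interval must be between 1 and 300 seconds")
    elbv2_client.modify_target_group(
        TargetGroupArn=target_group_arn,
        HealthCheckIntervalSeconds=seconds,
    )
```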
Step 4: Increase max_connect_errors
The second step is to increase the value of max_connect_errors from the default value of 100 to 5000. With the health check interval set to 300 seconds, you should register approximately 72 errors per hour. Allowing 5000 errors gives you just over 69 hours before the database blocks the network load balancer, which in turn blocks Starburst Galaxy's access.
- Set the value of max_connect_errors to 5000. To edit the max_connect_errors value, create a new RDS parameter group and attach it to your database. Alternatively, update an existing RDS parameter group that is already attached to your database.
- Reboot your database for the change to take effect.
- Run the following SQL to confirm the change is in effect.
SHOW GLOBAL VARIABLES LIKE 'max_connect_errors';
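For RDS, you can also make the parameter change through the API. The following minimal sketch assumes boto3, an RDS client that you pass in, and a custom parameter group already attached to your database, because default RDS parameter groups cannot be modified. The helper name is hypothetical. The optional reboot mirrors the step above, though you may prefer to reboot separately once the console shows the change as pending a reboot.

```python
from typing import Any, Optional


def set_max_connect_errors(rds_client: Any, parameter_group_name: str, value: int = 5000,
                           instance_id: Optional[str] = None) -> None:
    """Set max_connect_errors in a custom DB parameter group (hypothetical helper)."""
    rds_client.modify_db_parameter_group(
        DBParameterGroupName=parameter_group_name,
        Parameters=[{
            "ParameterName": "max_connect_errors",
            "ParameterValue": str(value),     # the RDS API takes parameter values as strings
            "ApplyMethod": "pending-reboot",  # applied when the database reboots
        }],
    )
    if instance_id is not None:
        # Reboot so the new value takes effect, as described above.
        rds_client.reboot_db_instance(DBInstanceIdentifier=instance_id)
```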
Step 5: Schedule the FLUSH HOSTS command
The third and final step is to schedule the FLUSH HOSTS; command to run at an interval of less than 69 hours. There are multiple ways to accomplish this, so choose the approach that best fits your environment.
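One possible approach is a small Python job, shown here as a minimal sketch. It assumes a DB-API (PEP 249) connection from a MySQL driver of your choice, and the helper names are hypothetical. The loop refuses intervals of 69 hours or more, matching the limit above. On MySQL 8.0.23 and later, FLUSH HOSTS is deprecated, and `TRUNCATE TABLE performance_schema.host_cache` is the documented replacement, which you can pass as `statement`. A scheduler you already run, such as cron, may suit you better than a long-running loop.

```python
import time
from typing import Any, Callable, Optional


def flush_host_cache(connection: Any, statement: str = "FLUSH HOSTS") -> None:
    """Reset connection-error counters over a DB-API connection (hypothetical helper)."""
    cursor = connection.cursor()
    try:
        cursor.execute(statement)
    finally:
        cursor.close()


def run_every(task: Callable[[], None], interval_hours: float,
              iterations: Optional[int] = None,
              sleep: Callable[[float], None] = time.sleep) -> int:
    """Run `task` every `interval_hours`; run forever when `iterations` is None."""
    if not 0 < interval_hours < 69:
        raise ValueError("interval must be shorter than the ~69-hour error budget")
    runs = 0
    while True:
        task()
        runs += 1
        if iterations is not None and runs >= iterations:
            return runs
        sleep(interval_hours * 3600)
```

For example, `run_every(lambda: flush_host_cache(conn), interval_hours=24)` flushes once a day, well inside the 69-hour budget.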
11. Tutorial wrap-up
Tutorial complete
Congratulations! You have reached the end of this tutorial, and the end of this stage of your journey.
You're all set! Now you can use PrivateLink to configure access to data in your Amazon RDS instance or database running on an EC2 instance.
Continuous learning
At Starburst, we believe in continuous learning. This tutorial provides the foundation for further training available on this platform, and you can return to it as many times as you like. Future tutorials will make use of the concepts used here.
Next steps
Starburst has lots of other tutorials to help you get up and running quickly. Each one breaks down an individual problem and guides you to a solution using a step-by-step approach to learning.
Tutorials available
Visit the Tutorials section to view the full list of tutorials and keep moving forward on your journey!