Last Updated: 2024-09-18

Background

Starburst Galaxy uses clients and drivers to connect to a variety of applications, tools, and technologies. Organizations typically use these tools to meet their third-party integration needs.

Clients and drivers are also useful because they help abstract and simplify sometimes complicated underlying data architecture and logic. Overall, the use of clients and drivers expands the use case for Starburst Galaxy.

Scope of tutorial

This tutorial will show you how to connect a Trino object instance to a Starburst Galaxy cluster using the trino-js-client driver. This will allow you to access all of the data configured inside the cluster's catalogs. The trino-js-client is a Trino client written in TypeScript operating in a Node.js environment using a JavaScript run-time environment.

Learning objectives

Once you've completed this tutorial, you will be able to:

Recommended knowledge

This tutorial is highly technical in nature.Because of this, it is recommended that you have a basic understanding of JavaScript, including its data structures like objects and arrays, before proceeding.

The project is built in Node.js.For this reason, you should understand both the relationship between Node.js and JavaScript and the process required to create and run projects in this environment before beginning.

Finally, you will need some knowledge and familiarity regarding package management systems, such as npm. Prior familiarity with this topic will make it easier to install and manage the necessary dependencies needed to successfully complete this project.

Prerequisites

Background

Node.js is a JavaScript runtime environment that allows you to run JavaScript or TypeScript code outside of a browser environment. It is designed to make web and mobile development accessible. In this tutorial, you will be using Node.js to run and access the trino-js-client. This section will help you get your environment set up.

Step 1: Download an IDE on your local machine

Let's begin by downloading an IDE of your choice. This will allow you to run your project locally. There are many popular IDEs, including Visual Studio Code or IntelliJ IDEA.

Step 2: Install Node.js

This step will ensure that Node.js is installed correctly on your local machine. To do this, you're going to use code from the Node.js website to download the required packages.

# installs nvm (Node Version Manager)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.0/install.sh | bash

# download and install Node.js (you may need to restart the terminal)
nvm install 20

# verifies the right Node.js version is in the environment
node -v # should print `v20.17.0`

# verifies the right npm version is in the environment
npm -v # should print `10.8.2`

Your terminal should return node and npm version numbers to ensure that your download is successful.

Step 3: Initialize Node project

Next, you need to initialize a new Node project and create a JavaScript file that will allow you to interact with Trino.

If you haven't created a folder for your project to live, please do so now.

cd path/to/your/project-directory
npm init -y

This command creates a package.json file that stores your project's basic information and tracks any dependencies you add.

Step 4: Create a .js file to run logic

Next, you need to create a .js file that will allow you to run all of your connection and data retrieval logic.

Step 5: Install the trino-js-client

Now it's time to install the trino-js-client in your environment. You can use npm or yarn via the command line to download the client driver into your project.

It's important to note that yarn requires separate download scripts. For more detailed information on how to download yarn, refer to yarn's official docs.

Command for npm:

npm install trino-client 

Command for yarn:

yarn add trino-client

Note: You can ignore any vulnerability warnings.

Step 6: Install dotenv

Now it's time to download the dotenv package. This package will help you store and reference confidential information.

npm install dotenv

Note: You can ignore any vulnerability warnings.

Step 7: Add module type in package.json

By default, Node.js treats files with .js extension as an older module system. This system uses require() statements to import modules. However, more modern versions of JavaScript will use import statements instead. To make sure your file is up to date with the ES6 version of JavaScript, you need to update your package.json file.

"type": "module"

Your package.json file should now look like this:

{
"name": your project name,
"version": your project version,
"type": "module",
...

}

Step 8: Import packages

Now it's time to import packages. These allow you to utilize the downloaded dependencies within your project. In this case, you'll want to import the dotenv and Trino client package.

"dependencies": {
  "dotenv": "^16.4.5",
  "trino-client": "^0.2.3"
}
import {BasicAuth, Trino} from ‘trino-client';
import dotenv from ‘dotenv';

These import statements offer a global reference point to your dependencies that can be used throughout your project.

Background

You will use the trino-js-client to connect to your Starburst Galaxy catalogs. In this section, you'll instantiate the Trino client object, enabling you to query and retrieve your data efficiently.

Let's take a look at the example given in the trino-js-client documentation:

const trino: Trino = Trino.create({
  server: 'http://localhost:8080',
  catalog: 'tpcds',
  schema: 'sf100000',
  auth: new BasicAuth('test'),
});

Notice how the server parameter specifies the location of your catalogs. In this case, you will be replacing this with your Galaxy server URL. Additionally, the catalog and schema define the name and structure of the data source you're querying. Furthermore, the auth parameters, which will authenticate your access to the server itself, will be replaced with your own Galaxy credentials.

Let's dive into setting up the client in your environment! These next few steps will outline how to find the necessary information to replace these placeholders with your own parameters.

Step 1: Create a .env file to store username and password

Before we begin, you need to modify your environment variables to store your username and password. When dealing with sensitive credentials, it's important to add a layer of security in your project's root directory.

GALAXY_USERNAME='yourusername'
GALAXY_PASSWORD='yourpassword'

Step 2: Add .env to .gitignore

Now it's time to set up your .gitignore. Adding your .env file to a .gitignore lets you protect your sensitive data when using Github.

# local env files
.env

Please refer to this article for more information, if needed.

Step 3: Copy Starburst cluster URL from Partner Connect

Your Starburst server URL can be found in the Partner Connect section of your Starburst Galaxy account.

Screenshot showing the Partner connect area of Starburst Galaxy

Note: It is important to make sure that you are selecting the correct cluster containing the catalogs you want to use for this tutorial.

Step 4: Configure environment variables in index.js

You will need to reference hidden environment variables in your index.js file. To do this, you'll instantiate your imported dotenv module using the config method.

dotenv.config()

To access your environment variables, you're going to use the process.env object in Node.js. To authenticate using your credentials, you'll use the BasicAuth method from the trino-client.

const auth = new BasicAuth(process.env.GALAXY_USERNAME, process.env.GALAXY_PASSWORD)

Step 5: Populate Trino object

Now it's time to populate the Trino object. This will allow you to establish a working connection to your Starburst Galaxy cluster and access the data that you will query. Recall the example Trino object from the trino-js-client documentation:

const trino: Trino = Trino.create({
  server: 'http://localhost:8080',
  catalog: 'tpcds',
  schema: 'sf100000',
  auth: new BasicAuth('test'),
});

You will now reconfigure this object to add your own credentials obtained from previous steps.

const trino = Trino.create({
  server:    "https://yourusername-yourclustername.trino.galaxy.starburst.io",
  catalog: "catalog name",
  schema: ‘schema name',
  auth: auth
});

Step 6: Configure SSL Options

Now it's time to manually configure the SSL options. This is the last step needed to instantiate the object. This important and crucial step ensures that your application will only accept connection requests from servers that have valid SSL authorization.

const sslOptions = {
  rejectUnauthorized: true

}

After this, you will set the ssl key to the sslOptions object you just created by including the key-value pair ssl: sslOptions. This will properly ensure that your code remains secure and our data is protected.

ssl: sslOptions

The completed index.js file should look similar to the following:

Great job! You have successfully connected to Starburst Galaxy and Trino. The next steps will involve testing your connection and querying your data.

Background

So far, you have populated your Trino object, allowing it to accept the correct parameters. You have also set up your coding environment to host this object and created a pathway that allows it to store the logic needed to query your data.

Next, it's time to test your connection and query your data using an index file.

Step 1: Use executeQuery function

At the start of this tutorial, we encouraged you to review the trino-js-client documentation. This documentation outlines the executeQuery() function. You'll use this function to retrieve and manage your data.

const executeQuery = async (query) => {
  const iter = await trino.query(query);
  const data = await iter
      .map(results => results.data ?? [])
      .fold([], (row, acc) => [...acc, ...row]);
  console.log("testing return of query on server", data)
return data;
}

Step 2: Pass query string into executeQuery function

Passing a query string into executeQuery() will execute the query via the trino.query() method. The results are processed within an asynchronous loop and mapped to extract the data and fold (or flatten) them into a single array.

You will be using a basic query to first check to see if your object has access to the data stored in your Galaxy catalogs.

const testQuery = 'SHOW CATALOGS'
console.log(executeQuery(testQuery));
node index.js

Step 2: Query data

It's time to execute more complex SQL queries. To do this, you're going to use a modified version of the previous step, replacing the testQuery string with a modified one that retrieves data from the tpcds catalog.

const testQuery = `
SELECT
   c.custkey AS customer_id,
   c.name AS customer_name,
   SUM(o.totalprice) AS total_order_amount
FROM
   tpch.sf1.orders o
JOIN
   tpch.sf1.customer c
ON
   o.custkey = c.custkey
GROUP BY
   c.custkey,
   c.name
ORDER BY
   total_order_amount DESC
`
console.log(executeQuery(testQuery));

Your terminal should log the return values, similar to the following:

You have now successfully run a SQL query in Node.js and confirmed that your Trino client is connected and properly referencing your data. This basic example demonstrates the power of Trino and its ability to integrate with a variety of different tools. You've officially laid the groundwork for any future projects or applications you wish to build using Node.js, Trino, and Starburst Galaxy.

Tutorial complete

Congratulations on finishing this tutorial! You should now have a comprehensive understanding of how to configure a Trino client within a Node.js environment to access data in Starburst Galaxy.

Continuous learning

At Starburst, we believe in continuous learning. This tutorial provides the foundation for further training available on this platform, and you can return to it as many times as you like. Future tutorials will make use of the concepts used here.

Next steps

Starburst has lots of other tutorials to help you get up and running quickly. Each one breaks down an individual problem and guides you to a solution using a step-by-step approach to learning.

Tutorials available

Visit the Tutorials section to view the full list of tutorials and keep moving forward on your journey!

Start Free with
Starburst Galaxy

Up to $500 in usage credits included

  • Query your data lake fast with Starburst's best-in-class MPP SQL query engine
  • Get up and running in less than 5 minutes
  • Easily deploy clusters in AWS, Azure and Google Cloud
For more deployment options:
Download Starburst Enterprise

Please fill in all required fields and ensure you are using a valid email address.