We’ve added new features, functionality and connectivity to Starburst Galaxy over the last two months. These additions improve your ability to manage your data lakehouse, strengthen security, give you more options for managing your clusters and enhance the overall user experience. Together, they help Starburst Galaxy users accelerate time to value on their data lakehouse and spend less time on management. In this article, I’ll give you an overview of the recent additions and a look at what’s coming to Starburst Galaxy in the next month.
Connectivity
This month we’re excited to announce support for Apache Hudi. Hudi is a streaming data lake platform that brings core data warehouse and database functionality to the data lake. It is a great option for organizations with streaming workloads as well as incremental batch pipelines. Hudi joins Iceberg, Delta Lake and Hive as a table format option in Starburst Galaxy Great Lakes Connectivity. Starburst Galaxy lets you connect to the most popular open table formats on the data lake of your choice (Amazon S3, Azure Data Lake Storage or Google Cloud Storage), so your organization can choose the best solution for its needs.
Additionally, over the last two months we’ve added connectivity to some of the most popular cloud data warehouses, including Snowflake and Google BigQuery. These connectors add to our existing cloud data warehouse options, which include connectors to Amazon Redshift and Azure Synapse. You can also now access the document database MongoDB from Starburst Galaxy.
Lakehouse
One of the key benefits of using Starburst Galaxy as your managed Trino platform is quick and easy access to new features released in the open source project. MERGE support was recently added to open source Trino and is now available in Starburst Galaxy. The MERGE statement modifies a target table based on a comparison with a source table: rows from the two tables are matched on a key field, and each row is then inserted, updated or deleted depending on whether a match was found. In effect, MERGE combines the INSERT, UPDATE and DELETE operations in a single statement. This addition gives Starburst Galaxy users even more data warehouse-like capabilities on the data lake.
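To make the matching logic concrete, here is a minimal Python sketch of MERGE-style semantics on in-memory data. It is an illustration only, not how Trino or Starburst Galaxy implements MERGE; the function name `merge_accounts`, the account fields and the "closed" rule are all hypothetical choices made for this example.

```python
def merge_accounts(target, source):
    """Return a new table with MERGE-like changes applied.

    target: dict mapping account id -> balance (the table being modified)
    source: list of change rows, each a dict with an "id" key and either
            a "balance" value or a "closed" flag (hypothetical layout)
    """
    merged = dict(target)  # copy so the original target stays untouched
    for change in source:
        key = change["id"]
        closed = change.get("closed", False)
        if key in merged:
            if closed:
                # matched and flagged as closed: behaves like DELETE
                del merged[key]
            else:
                # matched: behaves like UPDATE
                merged[key] = change["balance"]
        elif not closed:
            # not matched: behaves like INSERT
            merged[key] = change["balance"]
    return merged


target = {1: 100, 2: 250, 3: 75}
source = [
    {"id": 2, "balance": 300},  # existing key, so the balance is updated
    {"id": 3, "closed": True},  # existing key flagged closed, so it is deleted
    {"id": 4, "balance": 50},   # new key, so it is inserted
]
print(merge_accounts(target, source))  # {1: 100, 2: 300, 4: 50}
```

In SQL, the same three branches would be expressed as WHEN MATCHED and WHEN NOT MATCHED clauses of a single MERGE statement.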
Security
On the security side, Starburst Galaxy has added location-based access control for object storage. You can now set up access control based on the file location within object storage, which makes it easier for admins to define access control policies on the data lake with less complexity.
Last month we released Single Sign-On (SSO) support for Okta, Google and Azure Active Directory, and this month we’ve extended that functionality to clients like Tableau, Looker and more. This allows Starburst Galaxy users to take advantage of single sign-on at the client level and reduces friction when using those business intelligence and analytics tools.
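As a rough illustration of the idea behind location-based access control, the sketch below checks an object path against a set of granted location prefixes. It is an assumption-laden example, not Starburst Galaxy’s actual policy model or API: the `grants` structure, the role names, the bucket paths and the function `is_location_allowed` are all hypothetical.

```python
def is_location_allowed(role, location, grants):
    """Return True if `role` has been granted a prefix that covers `location`.

    grants: dict mapping a role name -> list of object storage location
            prefixes (hypothetical structure for this sketch)
    """
    for prefix in grants.get(role, []):
        base = prefix.rstrip("/")
        # Match the granted location itself or anything beneath it.
        # Requiring the trailing "/" keeps "sales" from matching "sales-archive".
        if location == base or location.startswith(base + "/"):
            return True
    return False


grants = {"analyst": ["s3://lake/sales/"]}

print(is_location_allowed("analyst", "s3://lake/sales/2022/orders.parquet", grants))  # True
print(is_location_allowed("analyst", "s3://lake/sales-archive/old.parquet", grants))  # False
```

The appeal of this style of policy is that one rule on a location covers every file under it, so admins do not need to maintain a separate rule per table or file.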
Experience
Just this past week, we added query history reports to Starburst Galaxy. The initial launch of query history reports lets users analyze who is querying which data sources. Query history reports allow you to see the top users, the most viewed data sources and the most queried tables for a set period of time. This new feature helps users understand even more about their data environment.
If you are new to Starburst and Trino, we’ve added two new tutorials: Data Lake Analytics and Data Federation. The Data Lake Analytics tutorial provides a detailed walkthrough of connecting to a sample data lake, in this case Amazon S3, and creating and querying tables within the data lake. The Data Federation tutorial provides an overview of how to use Starburst Galaxy’s interactive query engine to federate multiple data sources together.
What’s next?
Next week, we’ll add the critical ability to schedule your cluster uptime in Starburst Galaxy. Cluster scheduling lets you remove cold start delays by setting your cluster to run on a predefined schedule: you pick the specific days and times when your selected cluster will be available. This new feature will allow you to get started as soon as you’d like and will make the platform even more cost efficient.
We have more exciting announcements to come over the next month. As a preview, we’ll be launching new data discovery features that address a number of data lakehouse issues, including understanding where data is located and how it’s being consumed, without the need for a running cluster. We’re also planning to launch a much-requested connector to Elastic, providing more log analytics capabilities on the platform. More performance improvements will be coming over the next few weeks. And finally, column-level access control will be shipping in November.
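To picture what a predefined schedule of days and times means in practice, here is a small Python sketch that decides whether a cluster should be up at a given moment. It is only a sketch under stated assumptions: the `schedule` layout, the weekday/time window and the function `cluster_should_run` are hypothetical and do not reflect how Starburst Galaxy stores or evaluates schedules.

```python
from datetime import datetime, time

# Hypothetical schedule: weekdays (Monday=0 ... Friday=4), 08:00 to 18:00.
schedule = {
    "days": {0, 1, 2, 3, 4},
    "start": time(8, 0),
    "end": time(18, 0),
}


def cluster_should_run(now, schedule):
    """Return True if `now` falls on a scheduled day inside the time window."""
    on_scheduled_day = now.weekday() in schedule["days"]
    # The window is half-open: the start time is included, the end time is not.
    in_time_window = schedule["start"] <= now.time() < schedule["end"]
    return on_scheduled_day and in_time_window


print(cluster_should_run(datetime(2022, 10, 3, 9, 0), schedule))  # Monday morning: True
print(cluster_should_run(datetime(2022, 10, 8, 9, 0), schedule))  # Saturday morning: False
```

Keeping the cluster up only inside such a window is what avoids both the cold start at the beginning of the workday and the cost of running outside it.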
Keep your eyes on this blog for all your information on new Starburst Galaxy releases. And remember: You can try Starburst Galaxy for free with up to $500 in usage credits today!