We recently launched Starburst Galaxy on Google Cloud. With Starburst Galaxy, you can unlock your data sources on Google Cloud. You can operate your Starburst Galaxy clusters on the Google Cloud platform, and take advantage of the proximity of your clusters and the data sources, all on one platform. As a result, the high-performance SQL query engine inside Starburst Galaxy delivers optimal query performance. Starburst Galaxy is the fully-managed SaaS platform designed to allow you to query your data at interactive speeds across your data sources using the business intelligence and analytics tools you already know. And all those benefits are available without any need to think about cluster deployments, upgrades, and other maintenance. Starburst takes care of all that for you.
“Organizations are increasingly seeking out solutions that help them derive insights and value from their business data,” said Naveen Punjabi, Head of Analytics Partnerships at Google Cloud. “We’re thrilled to have Starburst Galaxy available on Google Cloud to continue providing our customers with the solutions and technologies they need to do more with their data.”
Starburst Galaxy on Google Cloud allows you to get up and running fast. Start querying your data in Google Cloud Storage (GCS) quickly with a guided, intuitive interface in your web browser. All you need is an email address. With Starburst Galaxy, a single query can access data in Google Cloud Storage and in your Cloud SQL for MySQL or PostgreSQL databases at the same time. This lets you reach critical business insights faster and with less effort. You can even query and join data across your clouds. The platform is designed to let you query data wherever it lives, with no data movement required.
Starburst Galaxy is not a black box and has been designed with your needs in mind. It lets you adjust your clusters to match your preferred cost-performance ratio. You can create a cluster for any team or individual in your organization. There are no limits on the number of clusters, concurrently running queries, or queued queries. And when your queries finish, the idle-shutdown feature cuts costs further by automatically suspending idle clusters. Suspended clusters restart automatically in seconds whenever a user tries to run a query, so you never pay for cluster capacity sitting idle. Starburst Galaxy gives you the flexibility to get your work done in line with your organization’s needs.
The built-in query editor provides a convenient console for writing and running SQL queries. In addition, the available drivers and clients let you query your data from the business intelligence tools you already know. Looker users can connect to the platform in a few steps. Starburst Galaxy combined with Looker allows your data teams to create engaging data visualizations with the most timely and comprehensive data available.
Starburst Galaxy on Google Cloud was conceived to provide data analysts with the freedom to be curious. Its ease of getting started, simplicity of use, and integrations with analytics tools mark another step towards access to all your data, wherever it is stored. Starburst Galaxy gives you the tools you need to ask new questions in a friction-free environment. The platform enables the immediate analysis of siloed data.
More on Google Cloud Storage
Google Cloud Storage provides an affordable, scalable way to consolidate enterprise data into an object-based data lake. This guide will explain what Google Cloud Storage is, its role in a data lake architecture, and how to use Starburst Galaxy to query your Google Cloud Storage data lake.
What is Google Cloud Storage?
Google Cloud Storage is an object storage service for the Google Cloud platform. Enterprises can store unstructured and structured data on Google’s infrastructure-as-a-service platform to create apps, implement disaster recovery systems, or make data analytics services available across their organizations.
What is the difference between Google storage and cloud storage?
Cloud storage is a general term for using cloud-based platforms to store data. At one end of the spectrum, cloud storage can refer to the personal file systems provided by services like Google Drive or the shared file systems for workgroups and small businesses provided by services like Google Workspace. At the enterprise end of the spectrum, cloud storage can refer to petabyte-scale services like Microsoft’s Azure Files and Azure Blob Storage.
Google’s Colossus infrastructure, which replaced Google File System, is the foundation upon which the company offers its various cloud storage services, including file, block, and object storage options.
Google Workspace and Google Drive scale to replace on-premises file storage for the largest organizations. Another file system option, Filestore, provides scalable, low-latency storage for high-performance computing applications.
The cloud provider also offers block storage services for application and database development on Google’s cloud computing platform. Persistent Disk and Local SSD integrate with Google Compute Engine virtual machines or Kubernetes Engine. As its name implies, Persistent Disk provides persistent storage. Local SSD delivers faster performance but is ephemeral.
Google Cloud Storage is an object-based data storage system. Its flat structure allows storing a large amount of data, whether structured or unstructured. An object storage system’s rich metadata allows queries to return results quickly and efficiently. Google’s service uses a project-bucket-object hierarchy.
A project organizes an enterprise application’s Google Cloud resources. Projects can contain multiple cloud storage buckets, the flat storage containers for the application’s data.
A Google object consists of the data itself and the object’s metadata. This metadata may be the uneditable properties Google’s systems generate, standard properties authorized users may edit, or custom metadata that users define. It’s this detailed metadata that makes object storage services so useful for enterprise applications and analytics.
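The project-bucket-object hierarchy described above can be pictured with a small in-memory model. This is only an illustrative sketch: the class names `Project`, `Bucket`, and `StorageObject`, and every field on them, are hypothetical and are not part of any Google client library. It shows two ideas from the text: a bucket is a flat namespace keyed by object name, and each object carries metadata alongside its data.

```python
from dataclasses import dataclass, field


@dataclass
class StorageObject:
    """Hypothetical model of an object: the data plus its metadata."""
    name: str
    data: bytes
    # User-defined key-value pairs, standing in for custom metadata.
    custom_metadata: dict[str, str] = field(default_factory=dict)

    @property
    def size(self) -> int:
        # Stands in for a system-generated property users cannot edit.
        return len(self.data)


@dataclass
class Bucket:
    """Hypothetical flat container: names map directly to objects."""
    name: str
    objects: dict[str, StorageObject] = field(default_factory=dict)

    def put(self, obj: StorageObject) -> None:
        self.objects[obj.name] = obj

    def list_prefix(self, prefix: str) -> list[str]:
        # "Folders" are just a naming convention over the flat namespace.
        return sorted(n for n in self.objects if n.startswith(prefix))


@dataclass
class Project:
    """Hypothetical project that groups several buckets."""
    name: str
    buckets: dict[str, Bucket] = field(default_factory=dict)


project = Project("analytics-demo")
bucket = Bucket("sales-lake")
project.buckets[bucket.name] = bucket

bucket.put(StorageObject("orders/2024/01.parquet", b"abc", {"owner": "finance"}))
bucket.put(StorageObject("orders/2024/02.parquet", b"abcde"))
bucket.put(StorageObject("customers/all.parquet", b"a"))

print(bucket.list_prefix("orders/"))
```

Listing by prefix returns only the two `orders/` objects, even though nothing resembling a directory exists in the model.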
Data teams can assign objects to different storage classes with pricing based on availability and how long data must remain in that class.
Standard Storage – The most expensive option, Standard Storage is best used for frequently accessed data or data with brief lifecycles.
Nearline Storage – This more affordable option works for less-frequently accessed data that will remain in place for at least a month.
Coldline Storage – Data that must remain accessible for at least a quarter, even though it’s rarely used, is ideal for Coldline Storage.
Archive Storage – The cheapest option for backup or disaster recovery systems, data held long-term in Archive Storage is still accessible within milliseconds.
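As a rough illustration of how the retention periods above could guide a choice of class, the sketch below picks the coldest class whose minimum storage duration fits the time you expect to keep the data. The 30-day and 90-day figures correspond to the "month" and "quarter" mentioned above; the 365-day figure for the archive class is an assumption based on Google's published minimums and is worth verifying. The function name is hypothetical, and the heuristic deliberately ignores access frequency and retrieval fees, which matter in practice.

```python
# Minimum storage durations in days (assumed from Google's published
# storage class documentation; verify before relying on them).
MIN_DURATION_DAYS = {
    "STANDARD": 0,
    "NEARLINE": 30,
    "COLDLINE": 90,
    "ARCHIVE": 365,
}


def coldest_eligible_class(expected_retention_days: int) -> str:
    """Return the coldest class whose minimum duration is satisfied.

    Simplified heuristic: it looks only at how long the data will be
    kept, not at how often it will be read.
    """
    if expected_retention_days < 0:
        raise ValueError("expected_retention_days must be non-negative")
    eligible = [
        name
        for name, min_days in MIN_DURATION_DAYS.items()
        if min_days <= expected_retention_days
    ]
    # The eligible class with the longest minimum duration is the coldest.
    return max(eligible, key=lambda name: MIN_DURATION_DAYS[name])


for days in (7, 45, 90, 400):
    print(days, coldest_eligible_class(days))
```

Data that is read often usually belongs in Standard Storage regardless of how long it is retained, so treat the result as a starting point rather than a recommendation.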
What is the difference between data storage and a data lake?
Data storage is another broad term generally referring to the physical technology used to record data. Whether on-premises or in the cloud, data storage systems may use spinning magnetic platters, solid-state semiconductor devices, or magnetic tape.
More colloquially, data storage can refer to how systems store data (as files, blocks, or objects) or to the software systems used to access and manage it, such as databases, data warehouses, or data lakes.
Enterprises often use Google Cloud Storage as the object storage component of a data lake. This data repository consolidates structured and unstructured data from multiple sources to provide a centralized resource for the company’s data analytics initiatives. Business intelligence analysts can use tools like Tableau to easily support decision-makers, while data scientists can access the lake’s petabyte-scale capacity to develop machine learning and artificial intelligence algorithms.
By itself, however, Google Cloud Storage is not a data lake. In addition to commodity object-based cloud storage, a data lake architecture combines open file and table formats like ORC and Hive with a scalable, massively parallel analytics engine like Apache Spark or Trino.
What are the benefits of Google Cloud Storage?
Like Azure Blob Storage and Amazon S3, Google Cloud Storage delivers the benefits of an object-based cloud storage service, including:
Scalability – Capacity limits are rarely an issue as these services can instantly add or remove capacity.
Flexibility – Cloud storage services seamlessly integrate with enterprise IT systems and applications.
Affordability – Companies only pay for the storage they use and eliminate many infrastructure and maintenance costs.
Security – Service providers protect their infrastructure with teams of highly-trained data security professionals.
Resilience – Redundant, global infrastructure and other features keep data accessible.
Google Cloud Storage builds upon these standard benefits with features including:
Object lifecycle management – Customers can optimize costs by automating data deletion or moving data to cheaper storage classes.
Security – Object-level and bucket-level permissions, access control lists (ACLs), and integration with identity and access management (IAM) systems provide granular access controls to sensitive data.
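To make object lifecycle management more concrete, here is a minimal sketch that builds a lifecycle configuration with two rules: move Standard objects to Nearline after a given age, then delete them later. The JSON layout (`rule`, `action`, `condition`, `age`, `matchesStorageClass`) follows Google's lifecycle configuration format as far as I know it, so check it against the current documentation before use. The helper name and the 30-day and 365-day thresholds are illustrative assumptions.

```python
import json


def lifecycle_config(nearline_after_days: int, delete_after_days: int) -> dict:
    """Build a two-rule lifecycle configuration (hypothetical helper)."""
    if delete_after_days <= nearline_after_days:
        raise ValueError("deletion must come after the move to Nearline")
    return {
        "rule": [
            {
                # Move aging Standard objects to a cheaper class.
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {
                    "age": nearline_after_days,
                    "matchesStorageClass": ["STANDARD"],
                },
            },
            {
                # Delete objects once they pass the retention window.
                "action": {"type": "Delete"},
                "condition": {"age": delete_after_days},
            },
        ]
    }


config = lifecycle_config(nearline_after_days=30, delete_after_days=365)
print(json.dumps(config, indent=2))
```

The printed JSON could be saved to a file and applied to a bucket with Google's tooling. Rules like these act on object age in days, so they complement the storage class minimums rather than replace them.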
Is Google Cloud Storage a data warehouse?
Data warehouses only store structured data, which limits the kinds of insights analysts can generate. Still, a data warehouse’s limitations let it quickly return query results, serving the needs of data experts as well as non-technical users.
Companies can use Google Cloud Storage to build a data warehouse by limiting data objects to structured or semi-structured data and using Google’s BigQuery serverless data warehouse services. Data scientists and engineers can use BigQuery’s command-line tool and API to develop big data analytics projects, while analysts can run SQL queries from the Google Cloud console.
How do I query a Cloud Storage data lake?
A robust data lake architecture built on Google Cloud Storage allows a company to maximize its innovation and decision-making potential by generating insights from structured, semi-structured, and unstructured data. As mentioned earlier, object-based cloud storage is a necessary, but not sufficient, element of a data lake. You must configure the open data formats and tables. And you must have an efficient, performant query engine that can operate at scale.
Starburst Galaxy is a modern data lake analytics platform made available in a software-as-a-service (SaaS) model. Galaxy lets you interactively query any data in any data source using the business intelligence and analytics tools you already know. Galaxy’s benefits include:
Single point of access – More than fifty connectors let you integrate any enterprise data source within Starburst’s virtualized access layer, making all your data available through a single pane of glass.
Performance – Building upon Trino’s massively parallel SQL query engine, enhancements like smart indexing accelerate queries and let Starburst deliver warehouse-level performance on a data lake’s object storage architecture.
Security and governance – Starburst adds role-based and attribute-based access controls that use your data lake’s metadata to create fine-grained governance policies that secure data, protect privacy, and ensure compliance.
Management – Starburst accesses data where it lives, eliminating the need for pipeline development and the risks of data movement.
Querying Google Cloud data lakes with Starburst Galaxy
Starburst Galaxy is available on the Google Cloud platform, letting you operate Galaxy clusters close to where your data resides with even faster SQL query performance. Integrations with Dataplex, BigQuery, and Looker let Starburst turn your Google Cloud data lake into your company’s data center of gravity. Business and data teams get access to more data while avoiding the resource demands of data migrations.
Getting started is a simple four-step process:
- Create a Starburst Galaxy account.
- Create a Google Cloud Storage catalog.
- Create a cluster and add the catalog.
- Begin querying with your preferred SQL tools.
Let’s get into the details.
Create a Starburst Galaxy account
Your name and email address are all you need to create a free Starburst Galaxy account.
Create a Google Cloud Storage catalog
From the Starburst Galaxy interface, you can create and configure your Google Cloud Storage catalog. Once you’ve added a name and description, provide a Google Cloud Storage JSON key to grant Starburst access to your object storage. After configuring the metastore and table format, you can complete the connection and set permissions.
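Before pasting a JSON key into the catalog form, it can help to confirm the file at least looks like a service account key. The sketch below is a local sanity check, not part of Starburst Galaxy or any Google library. It assumes the key follows the usual service account layout with `type`, `project_id`, `private_key`, and `client_email` fields. The function name and the sample values are hypothetical placeholders.

```python
import json

# Fields a service account key file normally contains (assumed layout).
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(raw_json: str) -> list[str]:
    """Return a list of problems; an empty list means the key looks plausible."""
    try:
        key = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value is not an object"]

    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)]
    key_type = key.get("type")
    if key_type and key_type != "service_account":
        problems.append(f"unexpected type: {key_type!r}")
    return problems


# Placeholder content only; this is not a working credential.
sample_key = json.dumps(
    {
        "type": "service_account",
        "project_id": "demo-project",
        "private_key": "PLACEHOLDER",
        "client_email": "reader@demo-project.iam.gserviceaccount.com",
    }
)

print(check_service_account_key(sample_key))
print(check_service_account_key("{}"))
```

A check like this only catches malformed or truncated files. It cannot tell whether the service account actually has permission to read the bucket.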
Create a cluster and add the catalog
Use a drop-down list to add the new catalog to an existing Starburst Galaxy cluster running on Google Cloud. Starburst’s interface lets you dial in your preferred cost-performance ratio. Other management features like idle-shutdown automatically reduce costs and optimize query performance.
Begin querying with your preferred SQL tools
Authorized users can create SQL queries within the Starburst interface or through the business intelligence tools of their choice. They can also use Starburst Galaxy with Google’s Looker to build visualizations, dashboards, and other data experiences that support data-driven decision-making.
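For readers who prefer a script to a BI tool, the sketch below shows what a query joining object storage data with a relational source might look like from Python. Everything named here is an assumption for illustration: the catalog names `gcs_lake` and `cloudsql_pg`, the schemas, tables, and columns are hypothetical. The `run_query` helper relies on the open-source `trino` Python client (installed separately), and the host, user, and password must come from your own cluster's connection details.

```python
def federated_join_sql(lake_catalog: str, db_catalog: str) -> str:
    """Build a SQL join across two catalogs (hypothetical schema and tables)."""
    for name in (lake_catalog, db_catalog):
        # Keep the sketch safe: accept only simple identifier-like names.
        if not name.isidentifier():
            raise ValueError(f"unexpected catalog name: {name!r}")
    return (
        "SELECT c.region, sum(o.total) AS revenue\n"
        f"FROM {lake_catalog}.sales.orders AS o\n"
        f"JOIN {db_catalog}.public.customers AS c ON o.customer_id = c.id\n"
        "GROUP BY c.region\n"
        "ORDER BY revenue DESC"
    )


def run_query(sql: str, host: str, user: str, password: str) -> list:
    """Run the SQL on a cluster; requires the third-party `trino` package."""
    from trino.auth import BasicAuthentication
    from trino.dbapi import connect

    conn = connect(
        host=host,  # placeholder: copy from your cluster's connection info
        port=443,
        user=user,
        http_scheme="https",
        auth=BasicAuthentication(user, password),
    )
    cursor = conn.cursor()
    cursor.execute(sql)
    return cursor.fetchall()


sql = federated_join_sql("gcs_lake", "cloudsql_pg")
print(sql)
```

Only the query text is built when the script runs. Calling `run_query(sql, host, user, password)` is left to the reader because it needs a live cluster and real credentials.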
Starburst Galaxy on Google Cloud lets your data analysts explore and discover data independently. They no longer need data engineers to develop pipelines and wrangle data. Instead, they have easy, instant access to everything in the data lake — and other sources that were once siloed.