Starburst Stargate: One Cluster to Rule Them All


I think of Starburst Stargate as the Lord of the Rings feature. Or the galactic empire feature.

In a prior blog post, I introduced dynamic filtering, an important feature for achieving good performance across complex data landscapes. We continue that discussion here with Starburst Stargate. Many enterprises today have data spread across multiple global locations, either as a result of strategic initiatives or because their environments have grown complex over time. A few examples:

  • A strategic multi-cloud approach adopted to meet the demands of the business or to reduce risk
  • Legacy on-premises data centers combined with cloud-based environments
  • A single cloud provider, but with storage requirements that dictate data live in multiple regions spanning the globe
  • Technology groups that are mid-flight in restructuring their data landscape but must keep critical business functions running in the meantime

Each of these scenarios shares a common challenge: providing access to data in a complex global environment. Fundamentally, the question is: “How do we provide timely access to data, to the right people, in the right form, when that data is decentralized?” Historically, the answer has been to further consolidate the data needed for each business requirement, typically through costly ETL pipelines into a centralized location, which duplicates data. Moreover, for sovereignty and security reasons, moving data can significantly increase risk, and in some cases it is a non-starter. Starburst Stargate provides a powerful alternative to these historical approaches.

Starburst Stargate lets architects and data engineers adopt a multi-cloud or hybrid approach while still giving analysts access to global, geographically diverse data. It does this by allowing multiple Starburst clusters to communicate with each other.

Figure: Starburst Stargate cluster diagram

With Starburst Stargate, a cluster is deployed in each data center. Processing therefore happens as close to the data as possible, reducing the data volume before anything crosses the network between clusters. This eliminates the need to pull raw remote data to a single center of gravity for processing.
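To make the locality argument concrete, here is a toy model in plain Python, not Starburst code. Lists stand in for a table on a remote cluster, "rows shipped" stands in for network transfer, and the function names (`central_plan`, `local_plan`) are hypothetical. It compares pulling every raw row to a central cluster with aggregating next to the data and sending only the reduced result; both plans produce the same answer.

```python
"""Toy model: reduce data where it lives before sending it over the network.

All names and data are hypothetical; plain lists stand in for a table
on a remote cluster, and the row count shipped stands in for network transfer.
"""
from collections import defaultdict


def aggregate(rows):
    """Sum the amount per key."""
    totals = defaultdict(float)
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)


def central_plan(remote_rows):
    """Ship every raw row to the central cluster, then aggregate there."""
    shipped = list(remote_rows)        # every row crosses the network
    return aggregate(shipped), len(shipped)


def local_plan(remote_rows):
    """Aggregate on the remote cluster, then ship only the partial result."""
    partial = aggregate(remote_rows)   # runs next to the data
    shipped = list(partial.items())    # one row per group crosses the network
    return aggregate(shipped), len(shipped)


if __name__ == "__main__":
    # Hypothetical remote table: 10,000 sales rows across three countries.
    rows = [(("DE", "FR", "IT")[i % 3], 1.0) for i in range(10_000)]
    central_result, central_shipped = central_plan(rows)
    local_result, local_shipped = local_plan(rows)
    assert central_result == local_result
    print(f"central: {central_shipped} rows shipped")  # central: 10000 rows shipped
    print(f"local:   {local_shipped} rows shipped")    # local:   3 rows shipped
```

Summing is used here because partial sums combine exactly; operations such as averages would need to ship sum and count separately.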

Ultimately, that means both lower cost and better performance. Cost savings come from lower egress charges when querying across clouds and from a reduced need to cache data locally to achieve good performance. Not having to provision SSDs or copy large amounts of data helps from both an administrative and a cost perspective.
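The egress argument is simple arithmetic: cost scales with the bytes that leave a cloud region. The sketch below assumes per-GB billing with decimal gigabytes, and `PRICE_PER_GB_USD` is a hypothetical placeholder rather than any provider's published rate; the 500 GB and 2 GB volumes are likewise illustrative, not measurements.

```python
"""Back-of-the-envelope egress comparison (all numbers hypothetical).

PRICE_PER_GB_USD is a placeholder, not any cloud provider's published rate;
substitute your own contract pricing.
"""

PRICE_PER_GB_USD = 0.09  # hypothetical placeholder rate
GB = 10**9               # decimal gigabyte assumed for billing


def egress_cost_usd(bytes_out: int, price_per_gb: float = PRICE_PER_GB_USD) -> float:
    """Return the cost of sending `bytes_out` bytes across a cloud boundary."""
    return bytes_out / GB * price_per_gb


def savings_usd(raw_bytes: int, reduced_bytes: int,
                price_per_gb: float = PRICE_PER_GB_USD) -> float:
    """Cost avoided by shipping a locally reduced result instead of raw data."""
    if reduced_bytes > raw_bytes:
        raise ValueError("reduced result should not be larger than the raw data")
    return egress_cost_usd(raw_bytes - reduced_bytes, price_per_gb)


if __name__ == "__main__":
    raw = 500 * GB    # hypothetical: raw rows pulled to a central cluster
    reduced = 2 * GB  # hypothetical: aggregated result from the remote cluster
    print(f"ship raw data:     ${egress_cost_usd(raw):.2f}")           # $45.00
    print(f"ship reduced data: ${egress_cost_usd(reduced):.2f}")       # $0.18
    print(f"avoided per query: ${savings_usd(raw, reduced):.2f}")      # $44.82
```

Because the saving is linear in bytes avoided, it compounds with query frequency: a dashboard refreshed many times a day multiplies the per-query difference.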

Under the hood, each cluster can view another cluster's data sources as it would a standard catalog. Administrators configure which sources are visible between clusters as part of deployment. As a result, access to data across regions and clusters is seamless to end users: queries are submitted as normal, and Starburst federates between clusters without analysts having to know where the underlying data resides.
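The visibility model described above can be sketched as a small data structure. This is a conceptual illustration only, not Starburst's configuration format: `Cluster`, `Federation`, `export`, `add_remote_catalog`, and `resolve` are hypothetical names. A remote cluster exposes only the catalogs an administrator has exported, the home cluster maps a local alias to each exposed catalog, and a fully qualified `catalog.schema.table` name resolves to the cluster that should scan it.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """A hypothetical cluster with its own catalogs."""
    name: str
    local_catalogs: set[str]
    # Catalogs an administrator has chosen to make visible to other clusters.
    exported: set[str] = field(default_factory=set)

    def export(self, catalog: str) -> None:
        if catalog not in self.local_catalogs:
            raise ValueError(f"{catalog!r} is not a catalog on {self.name}")
        self.exported.add(catalog)


@dataclass
class Federation:
    """The view from one 'home' cluster of its own and remote catalogs."""
    home: Cluster
    # Alias on the home cluster -> (remote cluster, catalog on that cluster).
    remote_aliases: dict[str, tuple[Cluster, str]] = field(default_factory=dict)

    def add_remote_catalog(self, alias: str, remote: Cluster, catalog: str) -> None:
        # Only catalogs the remote administrator exported can be attached.
        if catalog not in remote.exported:
            raise PermissionError(f"{remote.name} does not expose {catalog!r}")
        self.remote_aliases[alias] = (remote, catalog)

    def resolve(self, qualified_name: str) -> tuple[str, str]:
        """Map 'catalog.schema.table' to (cluster that runs the scan, catalog there)."""
        catalog, _schema, _table = qualified_name.split(".")
        if catalog in self.home.local_catalogs:
            return self.home.name, catalog
        if catalog in self.remote_aliases:
            remote, remote_catalog = self.remote_aliases[catalog]
            return remote.name, remote_catalog
        raise LookupError(f"catalog {catalog!r} is not visible from {self.home.name}")


if __name__ == "__main__":
    us = Cluster("us-east", {"hive", "postgres"})
    eu = Cluster("eu-west", {"hive", "crm"})
    eu.export("crm")  # the EU admin exposes only the CRM catalog
    fed = Federation(home=us)
    fed.add_remote_catalog("eu_crm", eu, "crm")
    print(fed.resolve("postgres.public.orders"))  # ('us-east', 'postgres')
    print(fed.resolve("eu_crm.sales.accounts"))   # ('eu-west', 'crm')
```

The analyst writes the same three-part name either way; only the resolver knows which cluster holds the data.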

Semantic layers can also be built as views that further streamline access. These semantic layers can now span sources both within and between Starburst clusters, giving an added level of control and easy, reproducible access to data.
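As a local analogy for a view that spans clusters, the sketch below uses Python's built-in `sqlite3`. Each attached in-memory database stands in for a catalog served by a different cluster, and a single view unions them so analysts query one name. This is SQLite syntax, not Starburst SQL, and the catalog, table, and view names (`us_east`, `eu_west`, `orders`, `global_orders`) are hypothetical.

```python
import sqlite3


def build_semantic_layer() -> sqlite3.Connection:
    """Create two stand-in 'catalogs' and one view spanning both."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACHed in-memory database stands in for a catalog on another cluster.
    conn.execute("ATTACH DATABASE ':memory:' AS us_east")
    conn.execute("ATTACH DATABASE ':memory:' AS eu_west")
    for catalog in ("us_east", "eu_west"):
        conn.execute(f"CREATE TABLE {catalog}.orders (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO us_east.orders VALUES (?, ?)", [(1, 10.0), (2, 15.0)])
    conn.executemany("INSERT INTO eu_west.orders VALUES (?, ?)", [(3, 7.5)])
    # A TEMP view is used because SQLite does not let a non-temporary view
    # reference tables in a different attached database.
    conn.execute("""
        CREATE TEMP VIEW global_orders AS
        SELECT 'us_east' AS region, id, amount FROM us_east.orders
        UNION ALL
        SELECT 'eu_west' AS region, id, amount FROM eu_west.orders
    """)
    return conn


if __name__ == "__main__":
    conn = build_semantic_layer()
    # Analysts query one name; the view decides which sources are read.
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM global_orders GROUP BY region ORDER BY region"
    ).fetchall()
    print(rows)  # [('eu_west', 7.5), ('us_east', 25.0)]
```

The design point carries over: the view definition, not each analyst's query, encodes where the data lives, so relocating a source means changing one definition.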

For security purposes, administrators control which catalogs are accessible across clusters. Sources that are visible to end users are subject to Starburst's full role-based access control. Security policies can be written in one centralized location within Starburst to govern global access to data at the catalog, schema, table, column, and row level, including row-level filtering and column-level masking.
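The two fine-grained controls named above, row-level filtering and column-level masking, can be illustrated with a minimal sketch. This is not Starburst's policy language or API: `Policy`, `apply_policy`, `mask`, the `POLICIES` registry, and the role names are hypothetical. It assumes one central mapping from role to policy, filters rows first, and then masks sensitive columns in the rows that remain.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


@dataclass(frozen=True)
class Policy:
    """A hypothetical per-role policy: which rows are visible, which columns are masked."""
    row_filter: Callable[[Row], bool]
    masked_columns: frozenset[str] = frozenset()


def mask(value: object) -> str:
    """Replace all but the last four characters with '*'."""
    text = str(value)
    return "*" * max(len(text) - 4, 0) + text[-4:]


def apply_policy(rows: list[Row], policy: Policy) -> list[Row]:
    """Filter rows first, then mask sensitive columns in what remains."""
    visible = [row for row in rows if policy.row_filter(row)]
    return [
        {col: mask(val) if col in policy.masked_columns else val for col, val in row.items()}
        for row in visible
    ]


# One central place for policies (hypothetical roles):
# EU analysts see only EU rows with card numbers masked; auditors see everything.
POLICIES = {
    "eu_analyst": Policy(row_filter=lambda r: r["region"] == "eu",
                         masked_columns=frozenset({"card"})),
    "auditor": Policy(row_filter=lambda r: True),
}

if __name__ == "__main__":
    rows = [
        {"id": 1, "region": "eu", "card": "4111111111111111"},
        {"id": 2, "region": "us", "card": "5500000000000004"},
    ]
    print(apply_policy(rows, POLICIES["eu_analyst"]))
    # [{'id': 1, 'region': 'eu', 'card': '************1111'}]
```

Filtering before masking matters: a row the role may not see is dropped entirely rather than returned with masked values.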

Security policies can also be applied to the semantic layer described above, which federates between clusters. Overall, this significantly simplifies the security landscape, which reduces administrative burden, cost, and risk. Starburst Stargate also helps address data sovereignty laws under which copying data from one location to another isn't feasible for governance reasons: the data stays where it lies and is queried remotely through Starburst Stargate.

Starburst Stargate gives data leaders significant flexibility when designing environments for the future. It is a paradigm-shifting approach that makes previously infeasible architectures possible. Starburst Stargate works in conjunction with Starburst's full suite of performance features, such as query pushdown, dynamic filtering, parallelization, and cost-based optimization, as well as materialized views and caching.

These features work together to deliver performance that scales while adding flexibility in deploying the best solution for a given use case. Ultimately, the result is dramatically faster time to insight, together with lower cost and risk.