Not-for-profit regulatory organization

Scalable analytics to ensure market integrity

This organization governs brokers and broker-dealer firms in the United States to ensure the integrity of America’s financial system. Starburst serves as a scalable, cost-effective way for the organization to analyze its constantly growing volumes of data.

80 million trade records per day

25+ data sources

20X faster queries


Region: Americas
Industry: Financial Services
Environment: AWS
Solution: Enterprise
Employees: 1000+

“Starburst separates compute and storage, making it possible to scale economically and analyze 25PB of data: 100B rows of new data per day from 25+ sources.”

Anonymous

Director of Data Analysis

About

This not-for-profit organization is authorized by the U.S. Congress to regulate a critical part of the securities industry – brokerage firms doing business with the public. One way that the self-regulatory organization carries out this mission is by analyzing close to 80 billion trading events daily from financial institutions to detect fraud, insider trading, and abuse. As data must be stored for years, the addition of TBs of new data daily leads to the accumulation of many PBs over time.

To address the challenges of massive data growth and increasing demand for efficient computing, the customer migrated its legacy data warehousing systems to an Amazon Web Services (AWS) data lake. When redesigning its data platform, the customer chose to separate compute and storage and query its multi-petabyte AWS data lake using Starburst Enterprise, the world’s fastest distributed SQL query engine.

Challenge

Until a few years ago, the customer ran its data warehousing infrastructure on premises. Organizational barriers and scalability limitations forced them to create separate analytic silos, with each handling a subset of the entire dataset. The resulting data fragmentation made analytics difficult. The growing data volume and analytical needs also started to exceed the capacity of its legacy systems. Scaling was expensive and difficult. Solutions were sized to handle peak capacity, which meant that they became very costly. Expanding them was only possible through long procurement processes, and data had to be moved constantly, so it took far too long to perform analytics, slowing time to insight.

Solution

Shifting to a Scalable AWS Data Lake

To address its mounting data storage and processing challenges, the customer decided to completely rethink its data platform. In 2014, it decided to move from on-premises infrastructure to an AWS data lake model. Today, its cloud data lake consists of:

  • Elastic Compute Clusters (both long-standing and transient ones)
  • Central Catalog (metadata repository)
  • Amazon Simple Storage Service (Amazon S3) Cloud Storage (object store)

Starburst Enterprise

One remaining challenge was to select an interactive SQL engine that would match the performance of the legacy MPP SQL systems. For ad-hoc analytics, the query SLAs are measured in seconds. They chose Starburst Enterprise because it was the only SQL engine able to operate at petabyte scale in the cloud and execute concurrent queries interactively against data stored on Amazon S3. Strong references from other well-known Trino users such as Facebook, Netflix, and Airbnb, combined with Starburst Enterprise enhancements and enterprise support, were crucial.

Starburst Enterprise’s proven integration with AWS was another essential feature. “Starburst was very data-lake friendly,” says the Director of Data Analysis. “It was as if it was built for that model. That was a key differentiator for us. We were very invested in the data lake.”

Today, they use Starburst Enterprise for ad-hoc data profiling, BI, and reporting. Teams of data analysts and scientists execute multiple concurrent SQL queries via JDBC and ODBC clients. Starburst Enterprise then authenticates requests with Active Directory using LDAP and authorizes them via Hive Metastore table permission checks. Finally, during query execution, Trino reads the ORC table data directly off Amazon S3.
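
The request path described above (authenticate, then authorize, then read) can be sketched in a few lines of Python. This is only an illustration of the ordering of the checks, not Starburst's implementation: the `QueryGateway` class, its fields, and the sample table are hypothetical, and plain dictionaries stand in for Active Directory, the Hive Metastore permissions, and the ORC files on Amazon S3.

```python
from dataclasses import dataclass


@dataclass
class QueryGateway:
    """Hypothetical stand-in for the checks a query passes through."""

    directory: dict[str, str]            # user -> password (stands in for Active Directory via LDAP)
    table_grants: dict[str, set[str]]    # table -> users allowed to read it (stands in for metastore permissions)
    object_store: dict[str, list[dict]]  # table -> rows (stands in for ORC data on Amazon S3)

    def authenticate(self, user: str, password: str) -> bool:
        # Unknown users are rejected before any password comparison.
        return user in self.directory and self.directory[user] == password

    def authorize(self, user: str, table: str) -> bool:
        return user in self.table_grants.get(table, set())

    def select_all(self, user: str, password: str, table: str) -> list[dict]:
        # Step 1: who is asking?
        if not self.authenticate(user, password):
            raise PermissionError("authentication failed")
        # Step 2: may they read this table?
        if not self.authorize(user, table):
            raise PermissionError(f"{user} may not read {table}")
        # Step 3: only now is the table data read from storage.
        return list(self.object_store[table])


gateway = QueryGateway(
    directory={"analyst": "s3cret", "intern": "pw"},
    table_grants={"trades": {"analyst"}},
    object_store={"trades": [{"symbol": "ABC", "qty": 100}]},
)
rows = gateway.select_all("analyst", "s3cret", "trades")
```

The point of the ordering is that storage is never touched for a request that fails either check, which is why both failures raise before the read.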

The customer has also built several interactive web applications that use Starburst as their backend SQL query engine to access data in the Amazon S3 data lake.

Results

Faster Insights at Lower Cost

In addition to various added features and optimizations, moving to AWS and partnering with Starburst provide the customer with several advantages over its legacy platform, including:

  • Scalability – no need to worry about data storage or compute resources
  • Elasticity – scaling compute up and down as desired, no need to provision for peak usage anymore
  • Accessibility – no silos and time to insight vastly reduced
  • Performance – upgrades to Starburst Enterprise have resulted in 20X faster queries
  • Flexibility – ability to use the best tool for a given analytical use case
  • Cost efficiency – thanks to AWS and cloud economies of scale
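
The elasticity point can be made concrete with a small calculation. The sketch below compares a cluster sized for the busiest hour against one resized every hour. The demand profile is entirely hypothetical and chosen only to show the arithmetic; it is not the customer's workload, and the function names are illustrative.

```python
def node_hours_peak_provisioned(hourly_demand: list[int]) -> int:
    """Fixed cluster sized for the busiest hour and kept running all day."""
    return max(hourly_demand) * len(hourly_demand)


def node_hours_elastic(hourly_demand: list[int]) -> int:
    """Cluster resized every hour to match that hour's demand."""
    return sum(hourly_demand)


# Hypothetical profile: nodes needed in each of 24 hours
# (quiet night, busy trading day, moderate evening).
demand = [2] * 8 + [10] * 8 + [4] * 8

fixed = node_hours_peak_provisioned(demand)  # 10 nodes * 24 hours = 240
elastic = node_hours_elastic(demand)         # 16 + 80 + 32 = 128
savings = 1 - elastic / fixed

print(f"peak-provisioned: {fixed} node-hours")
print(f"elastic:          {elastic} node-hours")
print(f"saved:            {savings:.0%}")
```

Under this assumed profile the elastic cluster uses roughly half the node-hours. The real ratio depends on how spiky the workload is: the flatter the demand, the smaller the gap.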

Using Starburst together with Amazon S3 cloud storage eliminates the need to invest in expensive proprietary Big Data appliances to support ever-increasing volumes of data. Working with Starburst also results in a significant reduction of Amazon Elastic Compute Cloud (Amazon EC2) costs.

The company can analyze its data interactively in an ad-hoc manner without the data copying and loading required in the past. The migration from legacy data warehousing systems was seamless to end users, and the process of researching market manipulation and investigating potential fraud is now faster than before.

Overall, Starburst gives the customer a scalable, cost-effective way to analyze its constantly growing volumes of data, which is needed to investigate potential abuse cases and conduct ad-hoc exploratory analyses looking for new fraud schemes. “We monitor market data for trading fraud,” says the Director of Data Analysis. “Starburst separates compute and storage, making it possible to scale economically and analyze 25PB of data — 100 billion rows of new data per day from 25+ sources.”
