Trino Summit is a two-day virtual conference on the 11th and 12th of December 2024. It's an event that brings together engineers, analysts, data scientists, and anyone interested in using or contributing to Trino.
Americas
Financial Services
AWS
Enterprise
1000+
Starburst separates compute and storage, making it possible to scale economically and analyze 25PB of data: 100B rows of new data per day from 25+ sources.
Anonymous
Director of Data Analysis
This not-for-profit organization is authorized by the U.S. Congress to regulate a critical part of the securities industry: brokerage firms doing business with the public. One way the self-regulatory organization carries out this mission is by analyzing close to 80 billion trading events from financial institutions every day to detect fraud, insider trading, and abuse. Because the data must be retained for years, the terabytes of new data added each day accumulate into many petabytes over time.
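The scale described above can be checked with back-of-the-envelope arithmetic. The sketch below converts the figure of roughly 80 billion trading events per day into an average per-second rate and shows how daily ingest accumulates under multi-year retention. The 10 TB/day ingest volume and the 7-year retention period are hypothetical values chosen for illustration; the case study gives only an order of magnitude (terabytes per day, retained for years).

```python
# Back-of-the-envelope sizing for the data volumes described in the case study.
# EVENTS_PER_DAY comes from the text ("close to 80 billion trading events daily");
# ASSUMED_TB_PER_DAY and ASSUMED_RETENTION_YEARS are HYPOTHETICAL assumptions.

SECONDS_PER_DAY = 24 * 60 * 60          # 86,400 seconds
EVENTS_PER_DAY = 80_000_000_000          # ~80 billion events/day (from the text)

ASSUMED_TB_PER_DAY = 10                  # hypothetical daily ingest, in TB
ASSUMED_RETENTION_YEARS = 7              # hypothetical retention period


def average_events_per_second(events_per_day: int) -> float:
    """Average sustained rate if events were spread evenly across a day."""
    return events_per_day / SECONDS_PER_DAY


def retained_petabytes(tb_per_day: float, years: int) -> float:
    """Total data retained after `years` of steady daily ingest (1 PB = 1,000 TB)."""
    return tb_per_day * 365 * years / 1_000


if __name__ == "__main__":
    rate = average_events_per_second(EVENTS_PER_DAY)
    total = retained_petabytes(ASSUMED_TB_PER_DAY, ASSUMED_RETENTION_YEARS)
    print(f"~{rate:,.0f} events/second on average")
    print(f"~{total:.1f} PB retained after {ASSUMED_RETENTION_YEARS} years")
```

Even at a modest assumed 10 TB/day, seven years of retention lands in the same range as the 25PB the customer cites. This is a consistency check on orders of magnitude, not a reconstruction of the customer's actual ingest figures, and real traffic is far burstier than an even per-second average.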
To cope with massive data growth and rising demand for efficient compute, the customer migrated its legacy data warehousing systems to a data lake on Amazon Web Services (AWS). In redesigning its data platform, the customer chose to separate compute from storage and to query its multi-petabyte AWS data lake with Starburst Enterprise, a distributed SQL query engine based on Trino.
Until a few years ago, the customer ran its data warehousing infrastructure on premises. Organizational barriers and scalability limits forced it to create separate analytic silos, each holding a subset of the full dataset, and the resulting data fragmentation made analytics difficult. Growing data volumes and analytical needs also began to exceed the capacity of the legacy systems, and scaling them was expensive and slow. The systems had to be sized for peak capacity, which made them very costly, and expanding them required long procurement processes. Because data also had to be moved between systems constantly, analytics took far too long, delaying time to insight.
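Why sizing for peak capacity is so costly can be shown with a toy model. The sketch below compares a fixed cluster provisioned for the busiest hour of the day with elastic compute billed only for the nodes each hour actually needs, which is what separating compute from storage makes possible. The hourly demand profile and the per-node-hour price are hypothetical numbers for illustration only; they are not figures from the customer.

```python
# Toy cost model: fixed capacity sized for peak vs. elastic capacity.
# All inputs below are HYPOTHETICAL and for illustration only.
from typing import Sequence


def fixed_peak_cost(demand: Sequence[int], price_per_node_hour: float) -> float:
    """Cost when the cluster is always provisioned for the peak hour."""
    peak = max(demand)
    return peak * len(demand) * price_per_node_hour


def elastic_cost(demand: Sequence[int], price_per_node_hour: float) -> float:
    """Cost when compute scales to each hour's demand (storage billed separately)."""
    return sum(demand) * price_per_node_hour


if __name__ == "__main__":
    # Hypothetical nodes needed per hour over one day: quiet overnight,
    # busy during market hours, a short analytics spike, then an evening lull.
    hourly_nodes = [4] * 8 + [20] * 8 + [50] * 2 + [10] * 6
    price = 1.0  # hypothetical cost per node-hour, arbitrary units

    fixed = fixed_peak_cost(hourly_nodes, price)
    elastic = elastic_cost(hourly_nodes, price)
    print(f"fixed (peak-sized): {fixed:.0f}")
    print(f"elastic:            {elastic:.0f}")
    print(f"elastic / fixed:    {elastic / fixed:.0%}")
```

With these made-up inputs the peak-sized cluster costs 1,200 units per day against 352 for elastic compute, roughly 29%. The real ratio depends entirely on how spiky the workload is; a perfectly flat workload gives no savings at all.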
Shifting to a Scalable AWS Data Lake
To address its mounting data storage and processing challenges, the organization decided to rethink its data platform completely. In 2014, it decided to move from on-premises systems to an AWS data lake model. Today, its cloud data lake consists of:
Starburst Enterprise
The remaining challenge was to select an interactive SQL engine that could match the performance of the legacy MPP SQL systems; for ad-hoc analytics, query SLAs are measured in seconds. The customer chose Starburst Enterprise because, in its evaluation, it was the only SQL engine able to operate at petabyte scale in the cloud and run concurrent queries interactively against data stored on Amazon S3. Strong references from well-known users of Presto, the engine Trino grew out of, such as Facebook, Netflix, and Airbnb, together with Starburst Enterprise's enhancements and enterprise support, were also crucial to the decision.
Starburst Enterprise’s integration with AWS was another deciding factor. “Starburst was very data-lake friendly,” says the Director of Data Analysis. “It was as if it was built for that model. That was a key differentiator for us. We were very invested in the data lake.”
Today, the organization uses Starburst Enterprise for ad-hoc data profiling, BI, and reporting. Teams of data analysts and data scientists run multiple concurrent SQL queries through JDBC and ODBC clients. Starburst Enterprise authenticates each request against Active Directory using LDAP and authorizes it through Hive Metastore table permission checks. Finally, during query execution, the engine reads ORC table data directly from Amazon S3.
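The request path described above (authenticate, authorize, then read table data straight from object storage) can be modeled as a small pipeline. The sketch below is a simplified, in-memory stand-in: `DIRECTORY`, `TABLE_PERMISSIONS`, `OBJECT_STORE`, `authenticate_ldap`, and `authorize` are hypothetical placeholders for Active Directory, Hive Metastore permission checks, and ORC files on Amazon S3. It does not call any real Trino, LDAP, or AWS API. A thread pool mimics several analysts submitting queries at the same time.

```python
# Simplified model of the query path: authenticate -> authorize -> read.
# Every name here is a HYPOTHETICAL stand-in, not a real Trino, LDAP, or S3 API.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Stand-in for Active Directory credentials verified over LDAP.
DIRECTORY = {"alice": "pw-alice", "bob": "pw-bob"}

# Stand-in for table-level permissions checked via the Hive Metastore.
TABLE_PERMISSIONS = {"trades": {"alice"}, "orders": {"alice", "bob"}}

# Stand-in for ORC table data stored as objects on Amazon S3.
OBJECT_STORE = {
    "trades": [("AAPL", 100), ("MSFT", 250)],
    "orders": [("AAPL", "buy"), ("MSFT", "sell"), ("GOOG", "buy")],
}


@dataclass(frozen=True)
class QueryRequest:
    user: str
    password: str
    table: str


def authenticate_ldap(user: str, password: str) -> bool:
    """Hypothetical directory check: succeeds only if the credentials match."""
    return DIRECTORY.get(user) == password


def authorize(user: str, table: str) -> bool:
    """Hypothetical per-table permission check."""
    return user in TABLE_PERMISSIONS.get(table, set())


def execute(request: QueryRequest) -> list:
    """Run one request through the pipeline; raise PermissionError on failure."""
    if not authenticate_ldap(request.user, request.password):
        raise PermissionError(f"authentication failed for {request.user}")
    if not authorize(request.user, request.table):
        raise PermissionError(f"{request.user} may not read {request.table}")
    # Mirrors compute reading table data in place, with no copy into a warehouse.
    return list(OBJECT_STORE[request.table])


def run_concurrently(requests: list) -> list:
    """Submit requests in parallel; results come back in submission order."""
    def safe_execute(req: QueryRequest):
        try:
            return execute(req)
        except PermissionError as err:
            return err

    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(safe_execute, requests))


if __name__ == "__main__":
    results = run_concurrently([
        QueryRequest("alice", "pw-alice", "trades"),
        QueryRequest("bob", "pw-bob", "orders"),
        QueryRequest("bob", "pw-bob", "trades"),      # denied: no table permission
        QueryRequest("mallory", "guess", "orders"),   # denied: bad credentials
    ])
    for result in results:
        print(result)
```

Keeping authentication and authorization as separate steps reflects the split in the text: identity comes from the directory, while what a user may read is decided per table. In a real deployment these checks are configured in the query engine rather than written by hand.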
The customer has also built several interactive web applications that use Starburst as their backend SQL query engine to access data in the Amazon S3 data lake.
Faster Insights at Lower Cost
Beyond Starburst Enterprise’s added features and optimizations, moving to AWS and working with Starburst gives the organization several advantages over its legacy platform, including:
Using Starburst together with Amazon S3 cloud storage removes the need to invest in expensive proprietary big data appliances to keep up with ever-increasing data volumes. Working with Starburst has also significantly reduced Amazon Elastic Compute Cloud (Amazon EC2) costs.
The organization can now analyze its data interactively and ad hoc, without the data copying and loading its legacy systems required. The migration from the legacy data warehousing systems was seamless for end users, and researching market manipulation and investigating potential fraud is now faster than before.
Overall, Starburst gives the customer a scalable, cost-effective way to analyze its constantly growing data volumes, both to investigate potential abuse cases and to run ad-hoc exploratory analyses that look for new fraud schemes. “We monitor market data for trading fraud,” says the Director of Data Analysis. “Starburst separates compute and storage, making it possible to scale economically and analyze 25PB of data: 100 billion rows of new data per day from 25+ sources.”
© Starburst Data, Inc. Starburst and Starburst Data are registered trademarks of Starburst Data, Inc. All rights reserved. Presto®, the Presto logo, Delta Lake, and the Delta Lake logo are trademarks of LF Projects, LLC