
Has the notion of a single data source for Financial Services run its course?


Manveer Singh Sahota
Director of Product Marketing
Starburst
Pat Bates
Solutions Architect
Starburst
Fraud and financial crimes are like whack-a-mole: the game never ends. New patterns and methods of deception appear daily, whether the crime is new account fraud, account takeover, embezzlement, identity theft, or money laundering.¹ The most serious of financial crimes is money laundering, and it is big business. According to the UN Office on Drugs and Crime, 2-5% of global GDP is laundered yearly, or between $800 billion and $2 trillion annually. In 2020, the UK Financial Intelligence Unit (UKFIU) received and processed almost 600,000 suspicious activity reports (SARs). Over 95% of these reports came from financial institutions, costing them £28.7 billion.²
The banking landscape has thousands of financial products and services, with new ones coming online daily, which together account for hundreds of billions of transactions daily. Each product and service creates an opportunity for money laundering. Furthermore, fraudsters aggressively seek loopholes and vulnerabilities, using new techniques to exploit these products and services to launder funds.
As financial institutions continue to revamp their AML efforts to combat fraudsters across know-your-customer (KYC) processes, monitoring, and investigations, this blog explores the architectural considerations that can strengthen their analytics muscles for AML monitoring. The recommendations below are based on how leading financial services customers use Starburst to access and query their relevant federated data to improve their toolkits in the fight against financial crimes.
Monitoring and detection rely on identifying sequences of events and transactions that raise red flags and prompt an investigation of potentially fraudulent activity. Identifying what constitutes a red flag is driven by:
As you might expect, this complexity is constantly changing and evolving, as is the regulatory landscape, e.g., new country-level policies on cryptocurrencies.
Therefore, one of the most critical technology factors affecting the effectiveness of money laundering monitoring is the ability to assess customer activity against historical activity. That requires reaching across various data silos, and doing so fast enough not to impede business. In an environment of rapidly changing conditions, e.g., billions of transactions per hour, hundreds of thousands of changing financial products, and millions of customers and entities, rapidly feeding and maintaining detection models depends on the ability to access and process petabytes of data quickly.
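As a minimal sketch of what comparing current activity against historical activity can look like, the snippet below flags an amount that sits far above a customer's own baseline. The function name `deviates_from_history`, the z-score approach, and the threshold of 3.0 are illustrative assumptions, not a detection rule taken from this article or from any regulation.

```python
from statistics import mean, stdev

def deviates_from_history(history, amount, z_threshold=3.0):
    """Return True when `amount` is more than `z_threshold` standard
    deviations above the mean of the customer's historical amounts."""
    if len(history) < 2:
        return False  # not enough history to form a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # All historical amounts are identical: any larger amount stands out.
        return amount > baseline
    return (amount - baseline) / spread > z_threshold

# Hypothetical historical transaction amounts for one customer.
history = [120.0, 80.0, 100.0, 95.0, 105.0]

print(deviates_from_history(history, 9500.0))
print(deviates_from_history(history, 110.0))
```

A production system would draw the history from many silos and use far richer features than a single amount, which is why fast access across sources matters.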
A typical data architecture for anti-money laundering is summarized in the following figure.
All post-transaction AML functions (KYC, Monitoring, and Investigations) are carried out on curated data populated into a data lake of some form of distributed object storage (never directly against the transactional source systems). Relevant data from the transactional source feeds (transactions, products, customers, etc.) is typically dumped in raw form directly into a staging area of the data lake (Land). Data integration processes (ELT) cleanse and transform the data into dimensional model structures familiar to a data warehouse for high-performance access and processing (Structure). To satisfy the high-performance requirements of AML functions, the data is further refined into analytical data objects precisely tuned for specific AML operations (Consume).
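The Land, Structure, and Consume stages described above can be sketched in a few lines. This is a toy, in-memory illustration only: the field names (`txn_id`, `customer_id`, `amount`, `txn_date`) and the functions `structure` and `consume` are hypothetical, and a real pipeline would run as ELT jobs over object storage rather than Python lists.

```python
from collections import defaultdict
from datetime import date

# Land: raw records as they might arrive from a transactional feed.
raw_feed = [
    {"txn_id": "T1", "customer_id": " c1 ", "amount": "12000.50", "txn_date": "2023-03-01"},
    {"txn_id": "T2", "customer_id": "C1", "amount": "300", "txn_date": "2023-03-02"},
    {"txn_id": "T3", "customer_id": "c2", "amount": "n/a", "txn_date": "2023-03-02"},
]

def structure(raw_rows):
    """Structure: cleanse and type the raw rows; rows that fail parsing are skipped."""
    cleaned = []
    for row in raw_rows:
        try:
            cleaned.append({
                "txn_id": row["txn_id"],
                "customer_id": row["customer_id"].strip().upper(),
                "amount": float(row["amount"]),
                "txn_date": date.fromisoformat(row["txn_date"]),
            })
        except ValueError:
            continue  # unparseable amount or date
    return cleaned

def consume(fact_rows):
    """Consume: aggregate per customer into an object shaped for monitoring."""
    summary = defaultdict(lambda: {"txn_count": 0, "total_amount": 0.0})
    for row in fact_rows:
        entry = summary[row["customer_id"]]
        entry["txn_count"] += 1
        entry["total_amount"] += row["amount"]
    return dict(summary)

facts = structure(raw_feed)
customer_activity = consume(facts)
print(customer_activity)
```

Each stage produces another copy of the data, which is the cost the rest of this article returns to.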
The significant data-related challenges in this architecture that affect AML effectiveness include, but are not limited to, the following:
Failing to address these challenges adequately leads to subpar monitoring capabilities, and the resulting impact and costs are significant.
Built on Trino, the open-source standard for SQL query engines, Starburst easily integrates into your existing AML architecture by connecting to 50+ data stores. Starburst eliminates the need for extraneous copying and moving data. Instead, individual Starburst connectors are enhanced with table statistics, aggregate pushdown, dynamic filtering, parallelism, and more. Together they provide a single point of access to all your data and create a data consumption layer for your data team. A single query can return results from data in Hadoop, S3, Snowflake, ADLS, Delta Lake, BigQuery, Kafka, Redshift, and many others.
Traditional SQL engines strain when querying data lakes and have difficulty with large data sets spread across multiple sources. Starburst provides a highly efficient, parallelized execution path that speeds up queries and cuts time to insight to just minutes. With the true separation of data storage and compute, Starburst future-proofs any analytics architecture so it can make better use of best-of-breed BI applications, today and tomorrow. Advanced caching to memory and other data sources, in addition to aggregate pushdown, dramatically improves performance on RDBMS sources. And with federated cost-based query optimization, Starburst brings the efficiency and flexibility to accelerate time to insight with high concurrency.
Lastly, Starburst reinforces security within your AML architecture. Security features such as end-to-end encryption, different authentication types, fine-grained access control, detailed security auditing, and more continue to provide security and AML teams with the needed reassurance that the right people have the right level of access at any time. This enables companies to integrate with a centralized security framework to manage fine-grained access control across an enterprise data lake and other data sources. In addition to our out-of-the-box security features, AML teams can leverage our deep integrations with our premier security partners.
In addition to the core platform capabilities, Starburst further enhances AML monitoring with the following:
The goal of any successful AML practice, beyond civic duty, is to detect money laundering activities before the regulators do and file the necessary documentation like SARs to avoid the millions in fines from a third infraction.
A Starburst-powered AML monitoring practice could use the following approach:
With Starburst, users can access and analyze more complete data using their existing tools and skills based on ANSI SQL. Furthermore, with capabilities like Data Products, Starburst Stargate, and Query Federation, users can query data across multiple sources with simple SQL syntax for cross-cluster queries. This avoids having to write complex SQL queries and data integration operations, which introduce multiple opportunities for query failure and risk, to get the same results.
Without Starburst:
This query selects information from two tables: transactions and customers. It returns the customer’s name, address, ID, and transaction information, including the transaction ID, amount, and date.
The WHERE clause includes two conditions. First, it selects transactions with an amount greater than or equal to $10,000. This threshold is often used as an indicator of potential money laundering activity. Second, it selects transactions that occurred within the last 30 days. This is based on the requirement of most AML regulations to monitor transactions in near-real-time.

With Starburst:
This query is similar to the previous one, but it uses Starburst. The FROM clause specifies the data products created with Starburst, where the transactions and customers tables are stored. The JOIN clause joins the two tables on the customer_id column.
The query uses the WHERE clause to filter transactions with an amount greater than or equal to $10,000 and a transaction date within the last 30 days. In addition, the query can take advantage of Starburst Stargate, a universal data access layer that allows users to execute cross-cluster queries without data movement, helping to enforce data sovereignty requirements. The transactions and customers tables can come from different data sources, such as a Hadoop data store in India, cloud object storage in the US, or other databases worldwide.

*These are example queries; specific AML requirements can vary depending on the jurisdiction and industry. It’s important to consult with legal and compliance experts to ensure that your AML monitoring process meets all applicable regulations.
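The query shape described above (a join on customer_id, an amount of at least $10,000, and a date within the last 30 days) can be sketched as follows. This is not the article's original query and it does not run against Starburst: it uses Python's built-in `sqlite3` module, with a second attached in-memory database (`crm`) standing in for a separate data source. All table names, column names, and sample rows are assumptions. In Trino-style SQL the two tables would instead be referenced by fully qualified catalog.schema.table names.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
# A second in-memory database stands in for a separate source system.
conn.execute("ATTACH DATABASE ':memory:' AS crm")

conn.execute(
    "CREATE TABLE main.transactions "
    "(transaction_id TEXT, customer_id TEXT, amount REAL, transaction_date TEXT)"
)
conn.execute("CREATE TABLE crm.customers (customer_id TEXT, name TEXT, address TEXT)")

today = date.today()
recent = (today - timedelta(days=5)).isoformat()
old = (today - timedelta(days=90)).isoformat()

conn.executemany(
    "INSERT INTO main.transactions VALUES (?, ?, ?, ?)",
    [
        ("T1", "C1", 15000.0, recent),  # large and recent: flagged
        ("T2", "C1", 500.0, recent),    # recent but below the threshold
        ("T3", "C2", 25000.0, old),     # large but outside the 30-day window
        ("T4", "C2", 10000.0, recent),  # exactly at the threshold: flagged
    ],
)
conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?, ?)",
    [("C1", "Alice Example", "1 Main St"), ("C2", "Bob Example", "2 High St")],
)

# ISO-formatted date strings compare correctly as text.
cutoff = (today - timedelta(days=30)).isoformat()
flagged = conn.execute(
    """
    SELECT c.name, c.address, c.customer_id,
           t.transaction_id, t.amount, t.transaction_date
    FROM main.transactions AS t
    JOIN crm.customers AS c ON t.customer_id = c.customer_id
    WHERE t.amount >= 10000
      AND t.transaction_date >= ?
    ORDER BY t.transaction_id
    """,
    (cutoff,),
).fetchall()

for row in flagged:
    print(row)
```

The point of the sketch is that the join spans two separately stored tables in a single statement, with no intermediate copy step.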
As banks and other financial institutions continue to ramp up their AML practices with modern capabilities, here are some architecture considerations to factor in when making updates:
Read “Has the notion of a single data source for Financial Services run its course?” for additional thoughts on the future of data architectures in Financial Services.
AML monitoring is a critical aspect of financial institutions’ operations. However, with the constantly changing landscape of financial crimes and increasing regional regulations and scrutiny, monitoring systems must be continuously updated and improved to detect and prevent fraudulent activities effectively. Financial institutions must leverage technology and data analytics to enhance their AML monitoring systems and establish a robust data governance strategy to ensure the accuracy and reliability of their data sources. By adopting Starburst’s data lake analytics platform, financial institutions gain simplicity, access, agility, and optionality to establish a single point of access for a modern AML monitoring center of excellence that can keep up with the ever-changing landscape of financial crimes. To learn more, visit Starburst for financial services and read how FINRA uses Starburst to process 100 billion new rows of data from 25+ countries daily to catch fraud, insider trading, and abuse.