4 Key Things You Should Know About Indexing

Strategy
  • Roman Vainbrand

    Director, Cache Strategy

    Starburst


Data indexing radically accelerates query run time and concurrency without the need for massive compute resources. But before you expect indexing to solve every problem, here are the four things you need to know for this solution to make the desired impact:

1. Indexing is useless if your queries need to perform a full scan

To benefit from indexing, first, you have to intimately understand your business.

Take a telephone book, for example. If you know the family name, the first name and area a person lives in, you will not have to scan the entire list of entries in the phonebook. However, if you don’t know the family name and, let’s say, only know the address, the way the phonebook is indexed is useless to your search. You will have to scan the phone book line by line.

The same is true of indexing databases. If they are indexed in a specific way and you want to run queries that do not match the indexing parameters, you will have to scan line by line.

For indexing to deliver optimum efficiency, the indexes must cover your query patterns and business needs. Everything needs to be indexed according to the questions you need answered from your data. This eliminates the need for full scans, which burn CPU resources and money.
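The phonebook analogy can be sketched in a few lines. This is an illustrative toy (the records and column names are made up), showing how an index answers a matching query directly while a non-matching query falls back to a full scan:

```python
# Illustrative sketch: an index turns an O(n) scan into a direct lookup.
records = [
    {"family": "Smith", "first": "Ann", "address": "12 Oak St"},
    {"family": "Jones", "first": "Bo", "address": "3 Elm Ave"},
    {"family": "Smith", "first": "Cal", "address": "9 Pine Rd"},
]

# Build an index on family name (the column our queries filter on).
by_family = {}
for rec in records:
    by_family.setdefault(rec["family"], []).append(rec)

# Query that matches the index: direct lookup, no scan.
smiths = by_family.get("Smith", [])

# Query on an unindexed column: forced back to a line-by-line scan.
at_elm = [rec for rec in records if rec["address"] == "3 Elm Ave"]
```

The index only helps the first query; the second still touches every record, which is exactly the full-scan cost described above.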

2. The way you write your SQL query matters…a lot

An inefficiently written query quickly turns a good query into a slow one.

When writing SQL queries, two things can go wrong. First, choosing the wrong join strategy (partitioned or replicated) can lead to poor performance. The second, which is easier to act upon, is the order of the tables in your join.

So, you need to start with the big table, the probe side, and join the smaller tables, which form the “build side,” against it. The engine builds an in-memory hash table from the build side, so keeping that side small is the most efficient order. Doing the opposite can be catastrophic.

You can solve this in two ways. You can educate your users to write queries in a smart and efficient way, which is hard, because people want to get their job done and are not always thinking about writing the SQL in the most optimized way. The second option is to maintain table statistics, so the optimizer can reorder the join itself.

And it’s hard for companies to maintain these statistics, because you need to run separate procedures, which takes time and costs money. We have come across cases where simply rewriting the SQL accelerated the query by three to five times.
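The build/probe distinction above can be sketched as a simple hash join. The table names and sizes here are illustrative, not from any real schema:

```python
# Illustrative hash join: build an in-memory hash table on the SMALL
# side, then stream (probe) the large side against it. Building on the
# large table instead would blow up memory and slow the join down.
small_dim = [(1, "US"), (2, "EU")]                   # build side: small
big_fact = [(i, i % 2 + 1) for i in range(10_000)]   # probe side: large

# Build phase: hash the small table on the join key.
build = {key: region for key, region in small_dim}

# Probe phase: a single pass over the large table.
joined = [(row_id, build[dim_key]) for row_id, dim_key in big_fact]
```

This is why table order matters: the engine has to hold one side of the join in memory, and good statistics let the optimizer pick the small side automatically.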

3. Manage indexes to correspond to changes in query requirements

For the questions we are always asking, it is relatively easy to optimize a data set using indexing. However, business changes, and research questions gain more and more dimensions. These dimensions translate into columns in a table.

Traditional indexes are generally optimized for row-based data layouts, not for the columnar layouts typically used with big data. With columnar data, you cannot index every column without rapidly expanding your storage and grinding your load times to a halt.
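One way to see the trade-off: in a columnar layout, each indexed column carries its own extra structure, so indexing everything multiplies storage and load work. This toy sketch (the table and helper are made up for illustration, not the vendor's implementation) indexes only the column that queries actually filter on:

```python
# Toy columnar table: each column is stored as its own list.
columns = {
    "user_id": list(range(100)),
    "country": ["US" if i % 2 == 0 else "EU" for i in range(100)],
    "score":   [i * 0.5 for i in range(100)],
}

def build_value_index(col):
    # Map each value to the row positions holding it (an inverted index).
    idx = {}
    for pos, val in enumerate(col):
        idx.setdefault(val, []).append(pos)
    return idx

# Index only the column our queries filter on, not every column:
# indexing all three would roughly double the stored data here.
indexed = {"country": build_value_index(columns["country"])}

# Query: rows where country == "EU", answered straight from the index.
eu_rows = indexed["country"]["EU"]
```

If tomorrow's questions filter on `score` instead, this static scheme forces either a full scan or a costly re-index, which is exactly the rigidity the next section addresses.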

The key to big data indexing solutions these days is to have a dynamic, intelligent indexing system that can cope with the changing needs of business analytics.

4. A new way to index big data

Nano-blocks are written independently and read in parallel at query time. Users can create big data indexes on any column, adding and removing column indexes without updating the primary dataset.

By building nano-block indexing deep into a query engine that runs directly on data lake solutions, Smart Indexing and Caching can deliver faster big data analytics than is possible with partitioning, along with the flexibility to change “partitions” when needed.

In fact, with the nano-block indexing approach, partitioning becomes unnecessary, since every query will find its index. The approach also exploits the flexibility inherent in nano-block indexing by dynamically and automatically adding and removing indexes as workloads change, making it future-proof against schema changes.
