Database Indexing makes DB Query faster

Database Indexing is one of the most common technique to make database query faster. In this post we will learn about how indexing makes DB Query faster.

Let’s make a user table,

id	name	username	email	password
1	Lahin	lahin31	lahin@gmail.com	ndiewf22r
2	Sadikul	sadik	sadik@gmail.com	jceu2j8e2
3	Shaju	shaju	shaju@gmail.com	jdiw8dhw
4	Rubel	rubel	rubel@gmail.com	i2jxejx8ed
5	Forhad	forhad	forhad@gmail.com	jde2idj38d
6	Juman	juman	juman@gmail.com	edeuhdueh

You can consider each column has,

id	name	username	email	password
4 bytes	32 bytes	16 bytes	48 bytes	32 bytes

Each column has different bytes. The total bytes will be (4 + 32 + 16 + 48 + 32) = 132 bytes. Since we have 6 rows so we need (6 * 132) = 792 bytes in order to store the entire user table.

Generally our data is stored inside the Disk and inside of the disk we have a collection of blocks. Each block holds the certain amount of data. Depending on how many rows, there are number of blocks available.

In our example we have 6 rows and let’s say each block can hold 265 bytes of data. Since we’ve 6 rows and each row has 132 bytes,

265 / 132 = 2, so we can say each block can hold 2 rows.

This is how data stored inside blocks.

Let’s write a query,

SELECT * FROM users WHERE username=juman

We are collecting every information who has juman username. The query process will be,

Search the Block 1, not found.
Search the Block 2, not found.
Search the Block 3, found.
Then put the entire Block 3 to memory for further process.

Let’s suppose reading/searching a block can take 1 sec, since we searched 3 blocks it will take 1 * 3 = 3 sec. (What if we have 100 blocks then it will take 100 sec in worst case scenario.)

Now let’s optimize this through index.

Index is a referential table that holds row reference against a particular index value. Index is also stored inside the disk block. So for username the index table will be,

username	block_id
forhad	3
juman	3
lahin31	1
rubel	2
sadik	1
shaju	2

For each column it the size can be,

username	block_id
4 bytes	4 bytes

So username has 4 bytes and block_id has also 4 bytes, so each column requires 8 bytes.

As we have 6 rows so the total size of the index table will be 6 * 8 = 48. Let’s suppose each block can hold 25 bytes of data. So we total number of blocks required for index is 48 / 25 = 2.

As our query is to find information for username juman. So the query process will be,

Searching the first index block, it will see username juman has block_id 3.
It will go to the data block 3.
Then put the entire Block 3 to memory for further process.

So we just searched index block 1 and data block 3. If we suppose each block read can take 1 second reading 2 blocks (index block 1 and data block 3) can take 2 seconds, in worst case scenario this is better than searching 100 blocks in 100 seconds.

This is how the index works.