Database Indexing is one of the most common technique to make database query faster. In this post we will learn about how indexing makes DB Query faster.
Let’s make a user table,
id | name | username | password | |
1 | Lahin | lahin31 | lahin@gmail.com | ndiewf22r |
2 | Sadikul | sadik | sadik@gmail.com | jceu2j8e2 |
3 | Shaju | shaju | shaju@gmail.com | jdiw8dhw |
4 | Rubel | rubel | rubel@gmail.com | i2jxejx8ed |
5 | Forhad | forhad | forhad@gmail.com | jde2idj38d |
6 | Juman | juman | juman@gmail.com | edeuhdueh |
You can consider each column has,
id | name | username | password | |
4 bytes | 32 bytes | 16 bytes | 48 bytes | 32 bytes |
Each column has different bytes. The total bytes will be (4 + 32 + 16 + 48 + 32) = 132 bytes. Since we have 6 rows so we need (6 * 132) = 792 bytes in order to store the entire user table.
Generally our data is stored inside the Disk and inside of the disk we have a collection of blocks. Each block holds the certain amount of data. Depending on how many rows, there are number of blocks available.
In our example we have 6 rows and let’s say each block can hold 265 bytes of data. Since we’ve 6 rows and each row has 132 bytes,
265 / 132 = 2, so we can say each block can hold 2 rows.
This is how data stored inside blocks.
Let’s write a query,
SELECT * FROM users WHERE username=juman
We are collecting every information who has juman username. The query process will be,
- Search the Block 1, not found.
- Search the Block 2, not found.
- Search the Block 3, found.
- Then put the entire Block 3 to memory for further process.
Let’s suppose reading/searching a block can take 1 sec, since we searched 3 blocks it will take 1 * 3 = 3 sec. (What if we have 100 blocks then it will take 100 sec in worst case scenario.)
Now let’s optimize this through index.
Index is a referential table that holds row reference against a particular index value. Index is also stored inside the disk block. So for username the index table will be,
username | block_id |
forhad | 3 |
juman | 3 |
lahin31 | 1 |
rubel | 2 |
sadik | 1 |
shaju | 2 |
For each column it the size can be,
username | block_id |
4 bytes | 4 bytes |
So username has 4 bytes and block_id has also 4 bytes, so each column requires 8 bytes.
As we have 6 rows so the total size of the index table will be 6 * 8 = 48. Let’s suppose each block can hold 25 bytes of data. So we total number of blocks required for index is 48 / 25 = 2.
As our query is to find information for username juman. So the query process will be,
- Searching the first index block, it will see username juman has block_id 3.
- It will go to the data block 3.
- Then put the entire Block 3 to memory for further process.
So we just searched index block 1 and data block 3. If we suppose each block read can take 1 second reading 2 blocks (index block 1 and data block 3) can take 2 seconds, in worst case scenario this is better than searching 100 blocks in 100 seconds.
This is how the index works.
Thanks for making this concept easier. Good read…
Most welcome.
Which order index block will keep the data.? So far we see without index block a keeping data ordering by id asc.
This can be assc or desc. By default it is assc. You can specifically say what order (assc or desc) you want.