Design a scalable, maintainable database schema with Database Normalization

When you build a large-scale system, you want your schema to be scalable and maintainable so that, in the future, you can extend the schema without any hassle. If your database schema contains data redundancy, then (depending on your system requirements) you will face a hard time in the future. So as engineer, it is our prime responsibility to ensure that our database schema is well-architected.

To achieve that, we need to follow database normalization. It helps us to minimize data redundancy and improves data integrity by eliminating different anomalies.

There are three kinds of anomalies. Insertion Anomalies, which occur when you add new data to a database table, due to a lack of foreign key constraints, you will face difficulties. Deletion Anomalies, which occur when losing necessary data due to deleting a record in a table that is linked to other records through relationships. Update Anomalies, which occur when updating a record in one table and forgetting to update it in a different table, resulting in data inconsistency.

Databases like MySQL, PostgreSQL, or any other relational database are recommended to follow normalization, while databases like MongoDB or any other non-relational database strictly don’t recommend to follow normalization.

There are six kinds of normalization. Let’s explore the first three normalizations with a scenario where we are creating a schema (mysql) for an e-commerce solution.

Let’s say we have users and products tables. A user has a name, email, password, etc., and a product has name, image, price, category, brand, and quantity.

To maintain database normalization let’s explore normalization form for users and products tables.

First Normal Form (1NF)

It states that each column must contain an atomic value. For instance, the product_name column should contain the product name, not the product price or product quantity. Also, 1NF ensures it contains one name rather than a comma-separated list of names.

Second Normal Form (2NF)

To achieve a second normal form, the table must be the first normal form, or 1NF and if any non-key column is partially dependent with any primary key, we must split that column with that primary key. This ensures that each attribute in the table is fully functionally dependent on the entire primary key.

For our e-commerce example this is the products table,

idnamecategorypricebrandquantity
10Macbook Air M1 8gbLaptop130000Apple50000
11Dell InspironLaptop75000Dell2000
12Apple Magic MouseMouse15000Apple1000
13HP 250 G8 IntelLaptop30000HP20000

Now does this table follow the second normal form? Answer is NO. Why?

Here category Laptop appears in three different rows as well as brand Apple appears in two rows. In that case category and brand partially depend on the id or name column, together they formed a Composite Key. This is totally the inverse of 2NF or Second Normal Form.

Let’s update products table to make it follow 2NF,

idnamepricequantity
10Macbook Air M1 8gb13000050000
11Dell Inspiron750002000
12Apple Magic Mouse150001000
13HP 250 G8 Intel3000020000

Now the product table contains information that is directly related to the product. Create a new table called categories.

idname
20Laptop
21Mouse

Now the categories table only contains category information. Let’s create a new table for brands,

idname
30Apple
31Dell
32HP

Now the brands table contains only brand information.

This is optional, create two new tables called product_categories and product_brands. These tables contain the connection between products with categories and product with brands.

idproduct_idcategory_id
501020
511120
521221
531320
idproduct_idbrand_id
601030
611131
621230
631332

By splitting into different tables now every table follows Second Normal Form or 2NF. Now none of the tables has partial dependency.

Third Normal Form (3NF)

To maintain the third normal form, a table must follow the second normal form, and the table must not contain transitive dependency.

What is Transitive Dependency?

In a table if a column is dependent with any non-primary key column then we can say the table has Transitive Dependency.

For our products table,

  • All non-key attributes (name, price, quantity) are fully dependent on the primary key (id).
  • There are no transitive dependencies, which means each non-key attribute depends only on the primary key.

So therefore the products table already maintains Third Normal Form or 3NF.

That’s it. This was a practical example of Database Normalization. I hope you understand. If you have any question or suggestion please comment.

Leave a Comment

Your email address will not be published. Required fields are marked *