Insight into 3NF v Star v Snowflake Schemas:

In this article
  1. 3NF
  2. Star Schema
  3. Snowflake Schema
1.Third Normal Form (3NF)

Purpose & Use Case

3NF eliminates data redundancy and ensures data integrity for transactional systems. Perfect for order processing, inventory management, and customer relationship management. It is a normalised approach to data which is mainly seen to be used in OLTP databases, it aims to eliminate redundancies & maintain integrity which avoids anomalies.

Advantages: 

  • Eliminates data redundancy
  • Ensures data integrity and consistency
  • Reduces storage requirements
  • Prevents update/insert/delete anomalies
  • Optimised for transactional operations

Disadvantages: 

  • Complex queries require multiple joins
  • Poor performance for analytical queries
  • Difficult for business users to understand
  • Not optimized for aggregations
  • Higher query complexity for reporting
2. Star Schema

Purpose & Use Case

Star schema introduces controlled redundancy to optimise analytical queries. Perfect for business intelligence, reporting, and data analysis. The “star” shape comes from a central fact table surrounded by dimension tables. This type of schema is usually used for analytics within an OLAP database (such as snowflake, redshift, big query).

 

The above shows an example of the star schema approach. This schema (using the same data set from previous 3NF) introduces an additional layer from the 3NF, where the 3NF is more related to transactions from data source in its raw form (hence OLTP), the star schema approach introduces an additional transformation which in usually used within ELT/ETL process to reshape the data making it more business friendly structure for reporting and analytics. 
This layer introduces joins hence denormalising the data:

Denormalise relationships:

  • products + categoriesdim_product
  • customers + addressesdim_customer
 

Create calculated fields:

  • order_items.quantity * order_items.unit_pricefact_sales.total_amount
  • total_amount - cost_amountfact_sales.profit_amount
 

Add business logic:

  • Customer segmentation rules → dim_customer.customer_segment
  • Product performance categories → dim_product.performance_tier
 

Create time dimensions:

  • Extract date parts from orders.order_datedim_date with quarters, fiscal years, etc.

Advantages: 

  • Eliminates data redundancy
  • Ensures data integrity and consistency
  • Reduces storage requirements
  • Prevents update/insert/delete anomalies
  • Optimised for transactional operations

Disadvantages: 

  • Data redundancy increases storage
  • More complex ETL processes
  • Potential data inconsistency
  • Higher maintenance overhead
  • Not suitable for transactional systems
Key Benefits of This Approach
    1. Separation of Concerns: OLTP systems focus on operations, OLAP on analytics
    2. Performance: Each system optimized for its purpose
    3. Data Quality: Raw data preserved while transformed data serves business needs
    4. Flexibility: Can create multiple transformed views from same raw data
    5. Historical Preservation: Can rebuild analytics if business rules change
3 Snowflake Schema

Purpose & Use Case

Snowflake schema normalises dimension tables to reduce redundancy while maintaining analytical performance. Used when storage costs are high or when dimension tables are very large

The above shows an example of the snowflake schema approach. albeit normalised for OLAP which a storage conscious approach:

Selective Normalisation:

  • Keep customers separate from addressesdim_customer + dim_address + dim_city + dim_state

  • Keep products separate from categoriesdim_product + dim_category + dim_department
  • Split geographic data → dim_store + dim_region + dim_country
 

Create Calculated Fields (Same as Star):

  • order_items.quantity * order_items.unit_pricefact_sales.total_amount
  • total_amount - cost_amountfact_sales.profit_amount
 

Hierarchical Business Logic:

  • Customer data → dim_customer.customer_segment (references dim_segment table)
  • Geographic rules → dim_city.metro_area_flag (references dim_metro_area)
  • Product hierarchy → dim_category.department_id (references dim_department)
 

Create Normalised Time Dimensions:

  • Extract from orders.order_datedim_date (links to dim_monthdim_quarterdim_year)

Key Differences: Star vs Snowflake Transformations

 

AspectStar SchemaSnowflake Schema
Dimension CreationDenormalise everything into single dimensionsNormalise dimensions into multiple related tables
Customer Datacustomers + addressesdim_customer (all in one)customersdim_customerdim_addressdim_city
Product Dataproducts + categoriesdim_product (all in one)productsdim_productdim_categorydim_department
Storage Trade-offHigher redundancy, faster queriesLower redundancy, more joins
ETL ComplexitySimpler transformationsMore complex relationship management

Advantages: 

  • Reduced storage requirements
  • Better data integrity than star schema
  • Easier maintenance of dimension attributes
  • Eliminates redundancy in dimensions
  • More flexible for complex hierarchies

Disadvantages: 

  • More complex queries with multiple joins
  • Potentially slower query performance
  • More complex to understand and navigate
  • Higher ETL complexity
  • Less intuitive for business users
Key Takeaways for Implementation

When to Use Each Approach:

  • 3NF: Use for operational systems requiring data integrity and frequent transactions
  • Star Schema: Use for data warehouses prioritising query performance and user experience
  • Snowflake Schema: Use when storage costs are critical or dimensions are extremely large

The Value of Redundancy in OLAP:

Unlike OLTP systems where redundancy is harmful, OLAP systems benefit from strategic redundancy because:

  • Read Optimisation: Data is primarily read, not updated
  • Column Storage: Repeated values compress extremely well
  • Query Speed: Fewer joins mean faster aggregations
  • Business Logic: Pre-calculated metrics reduce computation time
  • Historical Preservation: Slowly Changing Dimensions maintain data lineage

Get in touch!

What type of project are you interested in?
Where can I reach you?
Where would you like to discuss?