Purpose & Use Case
3NF eliminates data redundancy and ensures data integrity for transactional systems. Perfect for order processing, inventory management, and customer relationship management. It is a normalised approach to data which is mainly seen to be used in OLTP databases, it aims to eliminate redundancies & maintain integrity which avoids anomalies.
Advantages:
Disadvantages:
Purpose & Use Case
Star schema introduces controlled redundancy to optimise analytical queries. Perfect for business intelligence, reporting, and data analysis. The “star” shape comes from a central fact table surrounded by dimension tables. This type of schema is usually used for analytics within an OLAP database (such as snowflake, redshift, big query).
The above shows an example of the star schema approach. This schema (using the same data set from previous 3NF) introduces an additional layer from the 3NF, where the 3NF is more related to transactions from data source in its raw form (hence OLTP), the star schema approach introduces an additional transformation which in usually used within ELT/ETL process to reshape the data making it more business friendly structure for reporting and analytics.
This layer introduces joins hence denormalising the data:
Denormalise relationships:
products
+ categories
→ dim_product
customers
+ addresses
→ dim_customer
Create calculated fields:
order_items.quantity * order_items.unit_price
→ fact_sales.total_amount
total_amount - cost_amount
→ fact_sales.profit_amount
Add business logic:
dim_customer.customer_segment
dim_product.performance_tier
Create time dimensions:
orders.order_date
→ dim_date
with quarters, fiscal years, etc.Advantages:
Disadvantages:
Purpose & Use Case
Snowflake schema normalises dimension tables to reduce redundancy while maintaining analytical performance. Used when storage costs are high or when dimension tables are very large
The above shows an example of the snowflake schema approach. albeit normalised for OLAP which a storage conscious approach:
Selective Normalisation:
Keep customers
separate from addresses
→ dim_customer
+ dim_address
+ dim_city
+ dim_state
products
separate from categories
→ dim_product
+ dim_category
+ dim_department
dim_store
+ dim_region
+ dim_country
Create Calculated Fields (Same as Star):
order_items.quantity * order_items.unit_price
→ fact_sales.total_amount
total_amount - cost_amount
→ fact_sales.profit_amount
Hierarchical Business Logic:
dim_customer.customer_segment
(references dim_segment
table)dim_city.metro_area_flag
(references dim_metro_area
)dim_category.department_id
(references dim_department
)Create Normalised Time Dimensions:
orders.order_date
→ dim_date
(links to dim_month
→ dim_quarter
→ dim_year
)Key Differences: Star vs Snowflake Transformations
Aspect | Star Schema | Snowflake Schema |
---|---|---|
Dimension Creation | Denormalise everything into single dimensions | Normalise dimensions into multiple related tables |
Customer Data | customers + addresses → dim_customer (all in one) | customers → dim_customer → dim_address → dim_city |
Product Data | products + categories → dim_product (all in one) | products → dim_product → dim_category → dim_department |
Storage Trade-off | Higher redundancy, faster queries | Lower redundancy, more joins |
ETL Complexity | Simpler transformations | More complex relationship management |
Advantages:
Disadvantages:
When to Use Each Approach:
The Value of Redundancy in OLAP:
Unlike OLTP systems where redundancy is harmful, OLAP systems benefit from strategic redundancy because: