Data Modelling Concepts - Renne Botchway

Insight into 3NF v Star v Snowflake Schemas:

3NF
Star Schema
Snowflake Schema

Purpose & Use Case

3NF eliminates data redundancy and ensures data integrity for transactional systems. Perfect for order processing, inventory management, and customer relationship management. It is a normalised approach to data which is mainly seen to be used in OLTP databases, it aims to eliminate redundancies & maintain integrity which avoids anomalies.

Advantages:

Eliminates data redundancy
Ensures data integrity and consistency
Reduces storage requirements
Prevents update/insert/delete anomalies
Optimised for transactional operations

Disadvantages:

Complex queries require multiple joins
Poor performance for analytical queries
Difficult for business users to understand
Not optimized for aggregations
Higher query complexity for reporting

Purpose & Use Case

Star schema introduces controlled redundancy to optimise analytical queries. Perfect for business intelligence, reporting, and data analysis. The “star” shape comes from a central fact table surrounded by dimension tables. This type of schema is usually used for analytics within an OLAP database (such as snowflake, redshift, big query).

The above shows an example of the star schema approach. This schema (using the same data set from previous 3NF) introduces an additional layer from the 3NF, where the 3NF is more related to transactions from data source in its raw form (hence OLTP), the star schema approach introduces an additional transformation which in usually used within ELT/ETL process to reshape the data making it more business friendly structure for reporting and analytics.
This layer introduces joins hence denormalising the data:

Denormalise relationships:

products + categories → dim_product
customers + addresses → dim_customer

Create calculated fields:

order_items.quantity * order_items.unit_price → fact_sales.total_amount
total_amount - cost_amount → fact_sales.profit_amount

Add business logic:

Customer segmentation rules → dim_customer.customer_segment
Product performance categories → dim_product.performance_tier

Create time dimensions:

Extract date parts from orders.order_date → dim_date with quarters, fiscal years, etc.

Advantages:

Eliminates data redundancy
Ensures data integrity and consistency
Reduces storage requirements
Prevents update/insert/delete anomalies
Optimised for transactional operations

Disadvantages:

Data redundancy increases storage
More complex ETL processes
Potential data inconsistency
Higher maintenance overhead
Not suitable for transactional systems

Key Benefits of This Approach

1. Separation of Concerns: OLTP systems focus on operations, OLAP on analytics
2. Performance: Each system optimized for its purpose
3. Data Quality: Raw data preserved while transformed data serves business needs
4. Flexibility: Can create multiple transformed views from same raw data
5. Historical Preservation: Can rebuild analytics if business rules change

Purpose & Use Case

Snowflake schema normalises dimension tables to reduce redundancy while maintaining analytical performance. Used when storage costs are high or when dimension tables are very large

The above shows an example of the snowflake schema approach. albeit normalised for OLAP which a storage conscious approach:

Selective Normalisation:

Keep customers separate from addresses → dim_customer + dim_address + dim_city + dim_state
Keep products separate from categories → dim_product + dim_category + dim_department
Split geographic data → dim_store + dim_region + dim_country

Create Calculated Fields (Same as Star):

order_items.quantity * order_items.unit_price → fact_sales.total_amount
total_amount - cost_amount → fact_sales.profit_amount

Hierarchical Business Logic:

Customer data → dim_customer.customer_segment (references dim_segment table)
Geographic rules → dim_city.metro_area_flag (references dim_metro_area)
Product hierarchy → dim_category.department_id (references dim_department)

Create Normalised Time Dimensions:

Extract from orders.order_date → dim_date (links to dim_month → dim_quarter → dim_year)

Key Differences: Star vs Snowflake Transformations

Aspect	Star Schema	Snowflake Schema
Dimension Creation	Denormalise everything into single dimensions	Normalise dimensions into multiple related tables
Customer Data	`customers` + `addresses` → `dim_customer` (all in one)	`customers` → `dim_customer` → `dim_address` → `dim_city`
Product Data	`products` + `categories` → `dim_product` (all in one)	`products` → `dim_product` → `dim_category` → `dim_department`
Storage Trade-off	Higher redundancy, faster queries	Lower redundancy, more joins
ETL Complexity	Simpler transformations	More complex relationship management

Advantages:

Reduced storage requirements
Better data integrity than star schema
Easier maintenance of dimension attributes
Eliminates redundancy in dimensions
More flexible for complex hierarchies

Disadvantages:

More complex queries with multiple joins
Potentially slower query performance
More complex to understand and navigate
Higher ETL complexity
Less intuitive for business users

When to Use Each Approach:

3NF: Use for operational systems requiring data integrity and frequent transactions
Star Schema: Use for data warehouses prioritising query performance and user experience
Snowflake Schema: Use when storage costs are critical or dimensions are extremely large

The Value of Redundancy in OLAP:

Unlike OLTP systems where redundancy is harmful, OLAP systems benefit from strategic redundancy because:

Read Optimisation: Data is primarily read, not updated
Column Storage: Repeated values compress extremely well
Query Speed: Fewer joins mean faster aggregations
Business Logic: Pre-calculated metrics reduce computation time
Historical Preservation: Slowly Changing Dimensions maintain data lineage

Insight into 3NF v Star v Snowflake Schemas:

Key Benefits of This Approach

Get in touch!