Introduction
In data warehousing, surrogate keys and slowly changing dimensions (SCD) are fundamental concepts that ensure data consistency and historical tracking. Surrogate keys provide unique, stable identifiers for dimension records, while SCD techniques handle changes in descriptive attributes over time. This blog explains their importance, how they work, and why they are essential for a robust data warehouse.

What are Surrogate Keys?
Surrogate keys are system-generated identifiers used in dimension tables to maintain unique records. Unlike natural keys (from source systems), surrogate keys are stable and unaffected by backend changes.
Key Advantages of Surrogate Keys:
- Independence: They don’t depend on source system keys, ensuring stability during source system updates.
- Conflict Resolution: Eliminate key conflicts when data comes from multiple sources.
- SCD Compatibility: Support tracking historical changes by allowing duplicate natural keys with distinct surrogate keys.
Example:
Suppose a customer’s ID (CUST1001
) from a CRM system is updated to CUST2001
. With surrogate keys, the data warehouse can maintain the original key (1
) without disruptions.
What are Slowly Changing Dimensions (SCD)?
Slowly Changing Dimensions refer to attributes in dimension tables that change infrequently over time. SCD ensures these changes are recorded to preserve historical accuracy.
Types of SCD:
- Type 1 (Overwrite): Updates the existing value, losing historical data.
- Example: Correcting a spelling error in a customer’s name.
- Type 2 (History Tracking): Adds a new row for each change, maintaining history.
- Example: Tracking a customer’s address changes over time.
- Type 3 (Limited History): Adds a new column for the previous value, capturing only the latest change.
- Example: Storing both the current and previous product categories.
How Surrogate Keys Support SCD:
For Type 2, surrogate keys are critical to differentiate rows with the same natural key (e.g., Customer ID) but different historical records (e.g., Address A vs. Address B).
Examples for Surrogate Keys and SCD
1. Surrogate Keys Example
A dimension table with Customer Information that uses both surrogate and natural keys.
Surrogate Key | Customer ID (Natural Key) | Customer Name | Address | Phone Number |
---|---|---|---|---|
1 | CUST1001 | John Doe | 123 Maple St | 123-456-7890 |
2 | CUST1002 | Jane Smith | 456 Oak Ave | 987-654-3210 |
3 | CUST1003 | Alice Johnson | 789 Pine Blvd | 555-123-4567 |
Key Insight:
- The Surrogate Key is a system-generated unique identifier independent of the Customer ID from the source system.
- If the Customer ID changes in the source system, it does not impact the surrogate key.
2. Slowly Changing Dimensions (SCD) Example
SCD Type 1 (Overwrite):
Customer ID | Customer Name | Address |
---|---|---|
CUST1001 | John Doe | 456 Oak Ave |
Scenario:
- If John Doe moves to a new address, the previous address is overwritten. No historical data is retained.
SCD Type 2 (History Tracking):
Surrogate Key | Customer ID | Customer Name | Address | Effective Date | End Date |
---|---|---|---|---|---|
1 | CUST1001 | John Doe | 123 Maple St | 2024-01-01 | 2024-05-31 |
2 | CUST1001 | John Doe | 456 Oak Ave | 2024-06-01 | NULL |
Scenario:
- When John Doe moves to a new address, a new row is added with updated information and an effective date.
- The historical record is preserved by updating the end date of the previous record.
SCD Type 3 (Limited History):
Customer ID | Customer Name | Current Address | Previous Address |
---|---|---|---|
CUST1001 | John Doe | 456 Oak Ave | 123 Maple St |
Scenario:
- Only the latest change is recorded, and the previous address is stored in a separate column. Historical data is limited to the last update.
Combining Surrogate Keys and SCD
When designing a data warehouse:
- Use surrogate keys as primary keys in dimension tables to avoid reliance on mutable natural keys.
- Implement SCD strategies to retain accurate historical data for reporting and analysis.
- Store natural keys as reference columns to map back to the source system if needed.
Customer Dimension Table with Surrogate Keys and SCD Type 2:
Surrogate Key | Customer ID | Customer Name | Address | Effective Date | End Date |
---|---|---|---|---|---|
1 | CUST1001 | John Doe | 123 Maple St | 2024-01-01 | 2024-05-31 |
2 | CUST1001 | John Doe | 456 Oak Ave | 2024-06-01 | NULL |
Fact Table Example:
Transaction ID | Surrogate Key (Customer) | Transaction Amount | Transaction Date |
---|---|---|---|
101 | 1 | $100 | 2024-03-15 |
102 | 2 | $200 | 2024-07-20 |
Key Insight:
- The Fact Table uses the Surrogate Key to reference the dimension table.
- Transactions before June 2024 are linked to the old address (key 1), and later transactions are linked to the updated address (key 2).
Surrogate keys and SCD are essential for building a reliable, scalable, and history-aware data warehouse. Surrogate keys provide stability, while SCD techniques ensure accurate tracking of changes. Together, they enable organizations to make data-driven decisions with confidence.