Introduction
A well-structured database schema is the foundation of any efficient data management system. Whether you are designing a relational database for an application or a data warehouse for analytical processing, understanding schema design principles is crucial. This blog explores database schemas, common data types, database keys, data warehouse schema designs, and compression techniques in Amazon Redshift.
What is a Database?
A database is an organized collection of data stored electronically, allowing easy access, management, modification, and retrieval of information. It serves as a structured repository where data can be efficiently queried and analyzed.
What is a Schema in a Database?
A database schema defines the logical structure of a database, outlining how data is organized, stored, and related. A schema includes:
- Tables and Columns: Defines the structure of data storage.
- Data Types: Specifies the kind of data stored in each column.
- Relationships: Determines how tables relate to each other.
- Constraints: Enforces rules such as primary keys, foreign keys, and unique constraints.
- Views, Indexes, Stored Procedures, and Functions: Enhance data retrieval efficiency.
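As a quick illustration, the sketch below shows how some of these pieces fit together in SQL (the sales schema, table, and view names are hypothetical, purely for demonstration):
-- Create a named schema to group related objects
CREATE SCHEMA sales;
-- A table inside the schema, with data types and constraints
CREATE TABLE sales.customers (
  customer_id INT PRIMARY KEY,      -- primary key constraint
  email       VARCHAR(100) UNIQUE,  -- unique constraint
  created_at  DATE
);
-- A view that simplifies a common query over the table
CREATE VIEW sales.recent_customers AS
SELECT customer_id, email
FROM sales.customers
WHERE created_at >= '2024-01-01';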
Data Types in a Database
Data types define the nature of data that can be stored in a column. They determine:
- The type of data (e.g., numbers, text, dates).
- Storage space required.
- Permissible operations on the data.
Common Data Types in Amazon Redshift:
- Numeric Types: SMALLINT, INTEGER, BIGINT, DECIMAL, REAL, DOUBLE PRECISION
- Character Types: CHAR, VARCHAR, TEXT
- Date/Time Types: DATE, TIME, TIMETZ, TIMESTAMP, TIMESTAMPTZ
- Boolean Type: BOOLEAN
- Spatial Data Types: GEOMETRY, GEOGRAPHY
- Complex Types: SUPER, VARBYTE
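As an illustration, a single (hypothetical) events table can combine several of these types:
CREATE TABLE events (
  event_id    BIGINT,        -- numeric type
  event_name  VARCHAR(100),  -- character type
  occurred_at TIMESTAMPTZ,   -- date/time type with time zone
  is_active   BOOLEAN,       -- boolean type
  payload     SUPER          -- semi-structured data
);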
Types of Database Keys
Keys play a crucial role in ensuring data integrity and relationships in relational databases.
1. Primary Key (PK)
- Uniquely identifies each record in a table.
- Cannot have duplicate or NULL values.
- Example:
CREATE TABLE Customers (
  CustomerID INT PRIMARY KEY,
  Name VARCHAR(50),
  Email VARCHAR(50)
);

2. Foreign Key (FK)
- Establishes relationships between tables.
- References a primary key in another table.
- Example:
CREATE TABLE Orders (
  OrderID INT PRIMARY KEY,
  CustomerID INT,
  FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);

3. Candidate Key
- A minimal set of attributes that uniquely identifies a record.
- A table can have multiple candidate keys, but only one is chosen as the primary key.
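For example, in the hypothetical table below both StudentID and NationalID uniquely identify a student; StudentID is chosen as the primary key, and NationalID remains an alternate candidate key enforced with a unique constraint:
CREATE TABLE Students (
  StudentID  INT PRIMARY KEY,     -- chosen candidate key
  NationalID VARCHAR(20) UNIQUE,  -- alternate candidate key
  Name       VARCHAR(50)
);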
4. Composite Key
- A primary key consisting of multiple columns.
- Example:
CREATE TABLE CourseRegistrations (
  StudentID INT,
  CourseID INT,
  PRIMARY KEY (StudentID, CourseID)
);
5. Unique Key
- Ensures unique values in a column but allows NULL values.
- Example:
CREATE TABLE Employees (
  EmployeeID INT PRIMARY KEY,
  Email VARCHAR(100) UNIQUE
);
6. Super Key
- Any set of attributes that uniquely identifies a record. Every candidate key is a super key, but not every super key is a candidate key, since a super key may contain extra attributes. For example, if CustomerID alone identifies a customer, then both {CustomerID} and {CustomerID, Email} are super keys, but only {CustomerID} is a candidate key.
7. Surrogate Key
- A system-generated key, usually an auto-incremented number.
- Example:
CREATE TABLE Products (
  ProductID SERIAL PRIMARY KEY,
  ProductName VARCHAR(50)
);
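Note that SERIAL is PostgreSQL syntax; Amazon Redshift does not support it, and the usual equivalent there is an IDENTITY column, roughly as sketched below:
CREATE TABLE Products (
  ProductID   BIGINT IDENTITY(1,1),  -- auto-generated surrogate key (seed 1, step 1)
  ProductName VARCHAR(50),
  PRIMARY KEY (ProductID)            -- informational only; Redshift does not enforce it
);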
Data Warehouse Schema Design
Schema design is vital in data warehouses to optimize query performance.
1. Star Schema
A star schema consists of a central fact table connected to multiple dimension tables.

Components:
- Fact Table: Stores measurable data (e.g., sales, revenue).
- Dimension Tables: Store descriptive attributes (e.g., customers, products, time).
Example:
Fact Table: sales_fact
- sale_id, product_id, customer_id, date_id, amount
Dimension Tables:
- dim_customer (customer details)
- dim_product (product details)
- dim_date (date information)
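A rough DDL sketch of this star schema, using the table names from the example above (column lists are abbreviated and illustrative):
CREATE TABLE dim_customer (
  customer_id   INT PRIMARY KEY,
  customer_name VARCHAR(100),
  region        VARCHAR(50)   -- descriptive attribute stored directly (denormalized)
);
CREATE TABLE dim_product (
  product_id   INT PRIMARY KEY,
  product_name VARCHAR(100),
  category     VARCHAR(50)
);
CREATE TABLE dim_date (
  date_id   INT PRIMARY KEY,
  full_date DATE
);
CREATE TABLE sales_fact (
  sale_id     INT PRIMARY KEY,
  product_id  INT REFERENCES dim_product(product_id),
  customer_id INT REFERENCES dim_customer(customer_id),
  date_id     INT REFERENCES dim_date(date_id),
  amount      DECIMAL(10,2)   -- the measurable fact
);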
Advantages:
- Simple and intuitive design.
- Faster query performance due to denormalization.
Disadvantages:
- Data redundancy leads to increased storage requirements.
2. Snowflake Schema
A snowflake schema is a normalized version of the star schema, reducing redundancy by breaking down dimension tables into sub-dimensions.

Example:
Fact Table: sales_fact
- sale_id, product_id, customer_id, date_id, amount
Dimension Tables:
- dim_customer → linked to customer_region
- dim_product → linked to product_category
- dim_date
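In DDL terms, the snowflake variant moves those attributes into sub-dimension tables that the dimensions reference (column details are illustrative):
CREATE TABLE customer_region (
  region_id   INT PRIMARY KEY,
  region_name VARCHAR(50)
);
CREATE TABLE dim_customer (
  customer_id   INT PRIMARY KEY,
  customer_name VARCHAR(100),
  region_id     INT REFERENCES customer_region(region_id)  -- normalized out of dim_customer
);
CREATE TABLE product_category (
  category_id   INT PRIMARY KEY,
  category_name VARCHAR(50)
);
CREATE TABLE dim_product (
  product_id   INT PRIMARY KEY,
  product_name VARCHAR(100),
  category_id  INT REFERENCES product_category(category_id)
);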
Advantages:
- Reduced data redundancy.
- Better data integrity.
Disadvantages:
- Complex queries due to multiple joins.
- Slightly slower query performance.
Star Schema vs. Snowflake Schema
| Criteria | Star Schema | Snowflake Schema |
|---|---|---|
| Complexity | Simple and denormalized | Complex with normalization |
| Query Performance | Faster due to fewer joins | Slower due to more joins |
| Storage | More storage required | Optimized storage |
| Use Case | Best for OLAP, dashboards | Best for large data models |
Columnar Compression in Amazon Redshift
Amazon Redshift uses columnar storage and compression to improve performance and reduce storage costs.
Types of Compression in Redshift:
1. Automatic Compression
- Redshift automatically applies the best compression encodings during data loading.
- Example:
COPY table_name FROM 's3://bucket/path'
  CREDENTIALS 'aws_credentials'
  COMPUPDATE ON;
2. Manual Compression
- Users can specify compression encodings manually.
- Example:
CREATE TABLE table_name (
  column1 INT ENCODE AZ64,
  column2 VARCHAR(255) ENCODE BYTEDICT,
  column3 TIMESTAMP ENCODE DELTA
);
Common Compression Encodings:
- BYTEDICT: Best for low-cardinality columns.
- DELTA: Suitable for sorted numeric or timestamp columns.
- ZSTD & AZ64: General-purpose compression for numeric and text data.
Benefits of Compression:
- Reduced storage costs.
- Faster query execution due to reduced I/O.
- Optimized performance for large datasets.
Conclusion
Database schema design is fundamental for data integrity, efficiency, and performance. Whether working with OLTP or OLAP databases, understanding data types, keys, and schema structures like star and snowflake schemas ensures better data management. In Amazon Redshift, columnar storage and compression further enhance query speeds and storage efficiency, making it a powerful choice for big data processing.
References:
Amazon Redshift Documentation: AWS Redshift Compression