What is the Star Schema?
The Star Schema is basically the simplest form of a data warehouse. This schema is made up of fact tables and dimension table. We have covered dimension tables in previous articles but the concept of fact tables is fairly new.
A fact table contains measurable or factual data about an organization. The information contained in the schema is usually numerical, additive measurements, the tables can consist of numerous columns and an extensive amounts of rows.
The two tables are different from each other only in the way that they are used in the schema. They are actually made up of the same structure and the same SQL syntax is used to create them as well.
Interestingly enough in some schemas a fact table can also play the role of a dimension table in certain conditions and vice versa. Though they may be physically a like it is vital that we also understand the differences between fact table and dimension tables.
A fact table in a sales database, used with the star schema, could deal with the revenue for products of an organization from each customer in each market over a period of time. However, a dimension table in the same database would define the organizations customers, the markets, the products, and the time periods that are found in the fact tables.
When a schema is designed right it will offer dimensions tables that enable the user to leaf through the database and get comfortable with the information that it contains. This helps the user when they need to write queries with constraints so that the information that gratifies those constraints is routed back into the database.
Star Schema Important Issues
As with any other schema performance is a big deal with the Star Schema. The decision support system is particularly important; users utilize this system to query large quantities of data. Star Schema’s happen to perform the most adequate decision support applications.
Another issue that is important mention are the roles that fact and dimension tables play in a schema. When considering the material databases, the fact table is essentially a referencing table, where as the dimension table plays the role of a referenced table.
We can correctly come to the conclusion that a fact table has a foreign key to reference other tables and a dimension table is the foreign key reference from one or multiple tables.
Tables that are references or are referenced by other tables have what is known as a primary key. A primary key is a column or columns with contents that specifically identify the rows. With simple star schemas, the fact table’s primary key can have multiple foreign keys.
The foreign key can be a column or a group of columns in a table which has values that are identified by the primary key of another table. When a database is developed the statements used to make the tables should select the columns that are meant to form the primary keys as well as the foreign keys. Below is an example of a Star Schema.
{qbapagebreak title=Simple Star Schema}
Simple Star Schema
- The Bold column name Indicates the primary key
- Lines indicate one to many foreign key relationships
- Bold italic column names indicate the primary key that is a foreign key to another table
Let’s point out a few things about the Star Schema above:
- Items listed in the boxes above are columns in the tables with the same names as the box names.
- The Primary key columns are in bold text.
- The foreign key columns are in italic text (you can see that the primary key from the green Dimension box is also a key in the orange box, the primary key from the turquoise box is also a foreign key in the orange box.)
- You can see that columns that are part of the primary key and the foreign keys are labeled in bold and italic text, like the key 1 in orange box.
- The foreign key relationships are identified by the lines that are used to connect the boxes that represent tables.
Even though a primary key value must be one of a kind in the rows of a dimension table the value can take place many times in a foreign key of a fact table, as in a many to one relationship. The many to one relationship can be present between the foreign keys of the fact table and the primary key they refer to in the dimension tables.
The star schema can hold many fact tables as well. Multiple fact tables are present because the have unrelated facts, like invoices and sales. With some situations multiple fact tables are present simply to support performance.
You can see multiple fact tables serving this purpose when they are used to support levels of summary data, more specifically when the amount is large, like with daily sales data.
Referencing tables are also used to define many-to-many relationships between dimensions. This is usually referred to as an associative table or even a cross-reference table. This can be seen at work in the sales database as well. In a sales database each product has one or more groups that is belongs to, each of those groups also contain many products.
The many-to-many relationships is designed through the establishment of a referencing table that is meant to define the various combinations of the products and groups within the organization.
We can also identify many-to-many relationships by having dimension tables with multicolumn primary keys that serve as foreign key references in fact tables.
A rough example of this would be yet again with the sale database, as we said before each product is in one or more groups and each of those grouse have multiple products, which is a many-to-many relationship.
Designing a Star Schema
When designing a schema for a database we must keep in mind that the design affects the way in which it can be used as well as the performance.
Due to this fact it is vital that one makes the preliminary investment in time and research they dedicate to the design a database one that is beneficial to the needs of its user. Let’s wrap things up with a few suggestions about things to consider when designing a schema:
- What is the function of the organization? Identify what the main processes are for the organization; it may be sales, product orders, or even product assembly, to name a few. This is a vital step; the processes must be identified in order to create a useful database.
- What is meant to be accomplished? As all databases, a schema should reflect the organization, in what it measures as well as what it tracks.
- Where is the data coming from? It is imperative to consider projected put in data and its sources will disclose whether the existing data can support the projected schema.
- What dimensions and attributes of the organization will be reflected by the dimension tables?
- Will there be dimensions that may change in time? If the organization contains dimensions that change often then it is better to measure it as a fact, rather then have it stored as a dimension.
- What is the level of detail of the facts? Each row should contain the same kind of data. Differing data would be addressed with a multiple fact table design or even by modifying the single table so that there is a flag to identify the differences can be stored with the data. You want to consider the amount of data, the space, and the performance needs when deciding how to deal with different levels of detail in data.
- If there are changes how will they be addressed, and how significant is historical information?