A data schema is a structure that defines the relationships of data in a database or other data storage system.
A fact table is the ‘main table’, usually describing events that have taken place.
- Usually the largest in terms of data fields.
- This is a table that is likely to change/updates.
- Duplicates are likely in fact tables, as an event can occur multiple times.
A dimension table is a table that holds information about categorical fields in the fact table.
- It should be smaller than a fact table, as it is used to slice or group it.
- A dimension table should not contain duplicates.
A star schema has one fact table, and multiple dimension tables all coming off the fact table.
A snowflake schema has one fact table, and multiple dimension tables. Some dimension tables may reference other dimension tables.
In PowerBI, these are important as it prefers tables that are tall and slim, than wide. So setting up schemas is important for importance.
A thing to note it in PowerBI, unlike Tableau, when forming relationships you can use only one field to link tables together.
This means if you have two tables, e.g. Orders and Shipping, and the granularity of both is OrderID and ProductID, you would need to create a field ‘OrderIDProductID’ which merges them, and then use that to form the relationship.
Forming relationships, you will also need to define the cardinality. Cardinality tells us how many records from one table will meet how many records from another table.
This can be
- One to One
- One to Many
- Many to One
- Many to Many
You should always avoid Many to Many is possible, as Many to Many = Many errors.
A Dimension → Fact Table relationship is a one to many relationships.
Cardinality in PowerBI is represented by the 1’s and *’s at the end of each line. 1 → * being one to many, * → * being many to many etc.