joel.fail

The stories of my failures

Using mini dimension for fast-changing attributes

Introduction

This blog post will be about the use of mini-dimensions for fast-changing attributes, also known as the type 4 slowly changing dimension.
Dimension tables in a dimensional model are usually wide and contain alot of attributes. If the case is there where one or a few of the attributes are fast changing, then this approach is a good choice.
An example when this could be used, is in the supermarket industry, where the product catalog is more or less stable, but the prices are not. Due to the high nature of the competitive market and the thin margins supermarkets operate in they often have to adjust their prices to inflations or deflations from their supplier, who on their turn often have to do it aswell because of increasing or decreasing raw material cost or other supply chain costs.

The design

The modeling technique of the type 4 SCD is very similar to that for the type 2 SCD. In there type 4 SCD mini-dimension there will be the capture of the fast changing attributes with a reference to the original key. There will be however another table that will be the master table where all the other fairly static attributes will be stored, this will be a type 1 SCD (In this type the new data overwrites the existing data, this is usually the default dimension).

When designing the mini-dimension, it is important to first identify the fast changing attributes, such as the price in the example of the supermarket products, and split them off. Let’s create an example. We will base this on our imaginary coffee specialty shop CoffeeMart.

use master
Create database prd_db_coffeemart;
go
use prd_db_coffeemart;
go
create schema dim;
go
create table dim.Products
(
    DimProductId int identity (1,1),
    Internal_Product_Code varchar(50),
    Product_Name varchar(50),
    Product_Category varchar(50),
    Gross_Weight_Grams int,
    Sale_Price money,
	Purch_Price money,
    Origin varchar(20),
    Biologic varchar(10)
)


INSERT INTO dim.Products
    (Internal_product_Code, Product_Name, Product_Category, Gross_Weight_Grams, Sale_Price, Purch_Price, Origin, Biologic)
VALUES
    (round((rand()*100000),0), 'El Padrón Arabica', 'Arabica', 500, '11.95', '3.22', 'Nicaragua', 'Yes'),
    (round((rand()*100000),0), 'El Padrón Arabica', 'Arabica', 250, '7.95', '3.77', 'Nicaragua', 'Yes'),
    (round((rand()*100000),0), 'Kopi Luwak', 'Arabica', 250, '44.95', '31.53', 'Indonesia', 'No'),
    (round((rand()*100000),0), 'Kopi Luwak', 'Arabica', 100, '24.95', '18.41', 'Indonesia', 'No'),
    (round((rand()*100000),0), 'CoffeeMart Best', 'Blended', 2000, '19.95', '7.39', 'Uganda', 'Yes'),
    (round((rand()*100000),0), 'CoffeeMart Best', 'Blended', 1000, '12.95', '5.46','Uganda', 'Yes');

Select * from dim.products

After running the query you should see some similar output to here above. This would be the base for the master dimension, the type 1 dimension that gets overwritten.
However we would like to keep monitoring our margins over time correct? And if needed we will need to adjust our selling price.
Like we said above often products like supermarket products and in this case coffee are very susceptible to price fluctuations due to the cost of the raw material and other supply chain costs.
So we need to identify the fast changing attributes, I have highlighted them in the above screenshot. The sale price is a susceptible to change due to the possibility of the purchase price rising a lot. So we need to take that in mind.
About the other attributes we can be pretty sure they are mostly immutable and will not change much in our interest in the future.

So now we have identified the fast changing attributes we need to split them off to a new table, this will be the mini dimension. First of all we need to redesign our master product table, at the same time we’ll create the mini dimension.

create schema stg
go
create table stg.Products
(
    Internal_Product_Code varchar(50),
    Product_Name varchar(50),
    Product_Category varchar(50),
    Gross_Weight_Grams int,
    Sale_Price money,
    Purch_Price money,
    Origin varchar(20),
    Biologic varchar(10)
)
create table dim.Products
(
    DimProductId int identity (1,1),
    Internal_Product_Code varchar(50),
    Product_Name varchar(50),
    Product_Category varchar(50),
    Gross_Weight_Grams int,
    Origin varchar(20),
    Biologic varchar(10)
)

Create table dim.ProductsPricing
(
    DimProductPricingId int identity(1,1),
    DimProductId int,
    Sale_Price money,
    Purch_Price money,
    date_price datetime
)
go
INSERT INTO stg.Products
    (Internal_product_Code, Product_Name, Product_Category, Gross_Weight_Grams, Sale_Price, Purch_Price, Origin, Biologic)
VALUES
    (round((rand()*100000),0), 'El Padrón Arabica', 'Arabica', 500, '11.95', '3.22', 'Nicaragua', 'Yes'),
    (round((rand()*100000),0), 'El Padrón Arabica', 'Arabica', 250, '7.95', '3.77', 'Nicaragua', 'Yes'),
    (round((rand()*100000),0), 'Kopi Luwak', 'Arabica', 250, '44.95', '31.53', 'Indonesia', 'No'),
    (round((rand()*100000),0), 'Kopi Luwak', 'Arabica', 100, '24.95', '18.41', 'Indonesia', 'No'),
    (round((rand()*100000),0), 'CoffeeMart Best', 'Blended', 2000, '19.95', '7.39', 'Uganda', 'Yes'),
    (round((rand()*100000),0), 'CoffeeMart Best', 'Blended', 1000, '12.95', '5.46', 'Uganda', 'Yes');

INSERT INTO dim.Products
    (Internal_product_Code, Product_Name, Product_Category, Gross_Weight_Grams, Origin, Biologic)
SELECT Internal_product_Code, Product_Name, Product_Category, Gross_Weight_Grams, Origin, Biologic
from stg.Products

SELECT *
from dim.Products



Great! so now our dim.Products table should look something like this right?

OK, now it’s all set and we are prepared to load some data into the pricing table. For the purpose of showing the effect of the history variation, you can run this query to create some synthetic data.
In this scenario we assumed the wholesaler provides the daily purchasing prices of the coffee to CoffeeMart through EDI for example.

declare @days int = 365
declare @i int = 1

while @i < @days begin
    Insert into dim.ProductsPricing
        (DimProductId, Sale_Price, Purch_Price, date_price)
    SELECT coalesce(prd.DimProductId,-1) as DimProductId
      , stg.[Sale_Price]
      , stg.[Purch_Price] * (Rand() + 0.4)
	  , DATEADD(dd, @i, getdate()) Inserted_Date
    FROM [prd_db_coffeemart].[stg].[Products] stg
        left join dim.Products prd on stg.internal_product_code = prd.Internal_Product_Code

    set @i = @i + 1
end

You should have a similar result as above. There will be fluctuations in the purchasing prices, if you look close you’ll even see days where there is a negative margin made!
If you would look closely on the profit margin, considering they would buy product every day (often not the case) as usually it will depend on the reorder point at a certain level of stock. Visualized it would look something like this (of course this is a very exegerated fluctuaction and not realistic) but that’s because of the randomized data. This is the visual for Q1 2022 of the product CoffeeMart Best products.

Final word

I hope you enjoyed reading this blog post about mini dimensions, as you can imagine implenting a SCD type 2 in a environment where the attributes are changing alot and it only implies a couple of attributes, like in the above example. This possibility is more efficient, as it saves alot of space on huge dimension tables.

Leave a Comment

Your email address will not be published.