I was wondering what’s the best method to store empty fields from source systems in a datawarehouse. For example;a customer record without a first name stored in the database. When loading this record in my datawarehouse, should I set this field to NULL (to save space) or should I load the original value, maybe an empty string?
There is no good answer to this question. It depends on the architecture and rules. When you use a Data Vault to store your data, you should always use the original value from source without cleansing this data, that’s a rule. But after this data vault, there is a data mart with clean data. Should I set these kinds of values to NULL or should I use empty strings?
My experience is when using NULL values you need to remember a rule. When concatenating fields like building an adress or full name, you will do someting like this:
SELECT FirstName + ‘ ‘ + LastName FROM dbo.Customer
When the FirstName is empty, the result will be NULL. So when you use NULL as FirstName, you should alway do an extra check for non empty values:
SELECT ISNULL(FirstName,”) + ‘ ‘ + ISNULL(LastName,”) FROM dbo.Customer
Also a very important note is when using numbers. Maybe you have three columns in your datamart for profit which you need to SUM in your report. When one column contains a NULL value, the result will be NULL!
Column1 Column2 Column3
3 NULL 2
SELECT Column1 + Column2 + Column3 FROM dbo.Table
SELECT ISNULL(Column1,0) + ISNULL(Column2,0) + ISNULL(Column3,0) FROM dbo.Table
Consided to store an empty string in a NULL value to prevent concatenation problemns and consider to store the value 0 in columns which are used to sum. Maybe a better solution is the store extra fields in the datamart with concatenated data!