When you first start using QlikView (and this applies to the start of a new project) your data structure will probably be a simple snowflake schema where data is loaded ‘as is’ table by table and linked by individual fields.
This is great for checking the data and for initial development of the dashboard user interface (UI) although as I’ve discussed in a recent post (The Optimal QlikView Data Structure) you’ll probably find it necessary to develop that structure into a Star Schema.
QlikView offers a number of ways to connect data within a table; Joins, Concatenation and the ApplyMap function are the methods I’m going to talk through here.
By joining data you’re taking two feeds of information and connecting them to the same line of data as defined by a Key which is defined by any number of identically named fields from both tables.
If we focus on the left join (as it tends to be the most common) then the second table loaded is joined to the first. Any matching rows (as defined by the key) will be loaded from the second table and rows which don’t match will not be loaded.
We can see this clearly by looking at the results from left joining these two tables
Left Join (TableA)
• ID 4 doesn’t exist in the second table so Field B is a null value
• ID 5 doesn’t have a corresponding record in the first table so it’s not loaded
This is a simple example and from here we can quickly start joining information together. However there’s more going on which isn’t immediately clear. If there is duplication of the Keys the join behaves differently.
Left Join (TableA)
Here we can see a potential problem. The original data of TableA has a FieldA value of 20 for ID2. Now because there was a duplication of ID2 in the second table the output shows two records for ID2 and both with a value of FieldA value of 20. If you were to sum this within QlikView you would return 40 which is incorrect. I’ve created a detailed blog video explaining this.
It’s therefore important that you understand the data you’re linking and ensure the integrity of the final output.
There are four main types of join available in QlikView:
• Left Join – all ID’s have to be present in the first table.
• Right Join – all ID’s have to be present in the second table.
• Inner Join – Links data where the ID can be found in both tables
• Outer Join / Join – Keeps data from both tables linking where it can (same as join)
The ApplyMap function has clear benefits although If overused and can cause confusion within the dashboard script due to the fact it behaves a little like a subroutine (so use sparingly)
• Speed (compared to left Join)
• Null Handling
• Data Integrity
Often used within QlikView examples as a way to tidy up data such as various spellings of the same word the ApplyMap function can also be used to Join a single field. (I also use it to determine the latest record within a table.)
If we take the first example of Left Join and use the applymap function instead we can handle the null value left in the output table
Over a large data set the speed also increases by using the applymap function and now we have the null value populated these records can be selected within QlikViews User Interface.
If we had duplicate Key fields (ID) in table a then the output records wouldn’t be duplicated although we would only map the first value encountered so again ensure you understand your data and make steps to combat this. You can link on multiple fields by creating a composite key (concatenating fields together).
Joins and ApplyMap links data together on matching rows (as defined by the key). Concatenation stacks tables one on top of the other sharing fields where they’re named consistently.
In this image the black coloured cells denote null values and you may become uneasy regarding the amount of null fields you may have in your final table. Don’t worry, QlikView handles Null values very efficiently and there is no real performance issues to this type of structure (For further information please check out my blog page – The Optimal Data Structure).
I’ve also written a post that takes you through some real data and highlight the pro’s and con’s of linking and concatenating and how to overcome the challenges you may face