Before development can begin you need to know your data. Whilst working on a contract a number of years ago I created an application that may help.
Be warned it can take a long time to run the script with a large data set, you may want to limit the load size to begin with.
The Application can be found here – http://community.qlik.com/docs/DOC-3352
The QVW loads a CVS or QVD data into memory, loops through each column and field performing several integrity tests.
Date Check: The code checks each column where the name includes the word “Date”. This can be modified in the code. Each field value is checked against possible date formats and identified as either a good or bad date. Bad date distinct values are recorded for further analysis
Value Frequency: The code loads each distinct value for each column and counts the frequency they’re encountered.
Distinct Values: Counts the number of distinct values for each field, again after converting the value to lower case and again after trimming leading & trailing spaces from the value.
Distinct N/A: Uses a mapping table to convert possible N/A entries into a consistent value and counts the number found for each column.
Null Values: Counts the number of empty records for each column
Data Types: Checks to see which data type is found for each value of a column. This could be either a number or text. If all fields are returned as Number or Text the field is assigned this property. If a mixture is returned this is recorded.
This is still a work in progress and I keep adding to it as I encounter new data, I’d love to hear your comments.