Data Conversions with Microsoft Access Part Four

 

In part one we looked at an overview of the data conversion process; in parts two and three we looked at using Microsoft Access to facilitate the transformation step of the ETL (Extract, Transform and Load) process. In addition to transformation, Access provides a lot of useful functionality to assist with the important task of reconciliation. In this fourth and final part we will look at how Access can be used to simplify and automate much of the reconciliation process.

Reconciliation Purpose

Data reconciliation is the task that ensures both the accuracy and completeness of a data conversion. Ultimately you reconcile the target system back to its source system; however, this may be broken down into multiple steps during the conversion process of extraction, transformation and loading (each of which introduces the possibility of a loss in data completeness or accuracy). Data reconciliation is usually the responsibility of the business, as they are the owners of the data. However, this often leads to delays and even low-quality reconciliation because, with a new system, many users do not yet know their way around it, and many lack the knowledge of, or access to, tools like SAP Query and SE16N that assist with expedient reconciliation.

To overcome such limitations I prefer a shared approach whereby an IS technical resource prepares and performs some preliminary reconciliation before passing it on to the business users. However, the ultimate goal of the reconciliation process is an independent review and sign-off to ensure data accuracy and completeness, so it should not be left solely in the hands of the people performing the conversions.

Reconciliation Steps

As indicated above, reconciliation of the target system back to the source could happen as one single step or as a number of reconciliation steps. In the diagram in figure 1 below we see that reconciliation should happen at each step of the extract, transform and load process. In fact, given the timing, this may be the only way to ensure a clean conversion, because waiting until the load is complete to discover that the extract is wrong may be too late to fix any issues.

clip_image001

Figure 1 Reconciliation points

As previously stated, each reconciliation point should consist of a preliminary MIS reconciliation followed by a business-centric reconciliation and sign-off.

Although the Access databases we built to assist with transformation already house the majority of the data we need to reconcile, I prefer to build a separate reconciliation database and keep the transformation database specifically for transformation. In any given project you may perform a conversion four times or more, and each time you come to use the databases you want them as simple as possible so that they are easy to understand. Of course, in our reconciliation database we can link to our conversion database tables if we like, though again I like to start from scratch as an “independent” evaluation of the data.

Reconciliation Types

There are a number of different types of reconciliation that can be performed at each stage to ensure both accuracy and completeness.

Gross Level Data Checks

These are quantitative data checks that ensure, at a high level, that the correct data elements were created. For example, in an inventory conversion it may be possible to compare the total quantity of all inventory between the source and destination systems. If these match then there’s a good chance that the details match; if they do not, it may be necessary to look at the level below, perhaps inventory by storage location. This could highlight a problem with a specific storage location if all match except one. This concept can be used to drill down to a detailed level quite quickly.

Initial State Analysis

One important aspect of gross level data checks is to check the total set of data in addition to what is being added. For example, if I am converting materials for a roll-out, there are already materials in the target system before I start converting. If I am adding brand-new materials (no overlap), then after the conversion the total amount should be what was there before plus what was added. This is a good way of detecting runaway or leaky conversion programs that do more than they should. This kind of gross level check is often referred to as an initial state analysis and is more relevant during the testing cycles than during the production cutover.

Using Microsoft Access to facilitate gross level checks is a simple three-step process:

· Create an aggregate level query on the source data.

· Create an aggregate level query on the target data.

· Compare the two for differences.

To create an aggregate level query, first we have to determine what information we want to compare. Continuing the inventory conversion example, I will use the legacy batch-level inventory data.

My initial test will be at the highest level, i.e. checking total inventory. Starting with an extract from my legacy system, I will have a detail-level table like the one below, showing inventory at a material / plant / storage location / batch level.

clip_image002

Create a query based on this table with just the unrestricted inventory quantity in it, then set it as a totals query using the Sum function.
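In SQL view the totals query would look something like the sketch below (the table and field names, tblLegacyInventory and Unrestricted, are assumptions for illustration; substitute whatever your extract uses):

SELECT Sum(Unrestricted) AS TotalInventory
FROM tblLegacyInventory;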

clip_image003

This gives a result of 4,708,363 units

clip_image004

I can now extract the inventory data loaded into my target system and create a similar query. To save time I will reconcile against the output format file previously created.

Creating the same query on my output format table I get 4,708,256 (I manually fudged the numbers so they wouldn’t match).

clip_image005

At this level it is easy to see that the numbers are different. The third step is to create a query that calculates the difference, but this is only really useful once you are drilling into multi-line detail.

Now that we’ve identified a discrepancy we need to drill down to another level of detail. I could drill down by plant, then storage location, then material, then batch (starting at the highest level and progressively going to the next level of detail). However, my example has a relatively small data set, so I will go straight to the material level to identify where the deltas are.

Going back to my two totals queries, I will add material as another field set to Group By. This will total the inventory by material.
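Using the same assumed names as before, the material-level totals query in SQL view would be along these lines, and an equivalent query is created against the target (output format) table:

SELECT Material, Sum(Unrestricted) AS TotalInventory
FROM tblLegacyInventory
GROUP BY Material;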

clip_image006

Giving me the following results

clip_image007

clip_image008

Now that we have more data I want to create a third query to compare the two and highlight the differences.

I can do this in one of two ways. The first is to create a query using an outer join between the two totals queries; this will show me everything from my first query along with whatever matches in the second.

clip_image010

You can change the join type by right-clicking on the join line and selecting the middle option.

Now I can select both fields from the first query and both from the second (as we are using the legacy data as our system of record, the legacy query is the first table in the join).

clip_image012

Running this query gives me the following:

clip_image014

This highlights two materials for which the combination of material and stock didn’t have a match on the target side, i.e. these two materials are where my issue lies.

I can change my query to show me only the discrepancies, which is particularly useful if the list is long. To do this I just set an Is Null criterion on one of the fields from my second query.
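In SQL view the unmatched-rows query might look something like this (qryLegacyByMaterial and qryTargetByMaterial are hypothetical names for the two totals queries built above):

SELECT L.Material, L.TotalInventory,
       T.Material AS TargetMaterial, T.TotalInventory AS TargetInventory
FROM qryLegacyByMaterial AS L
LEFT JOIN qryTargetByMaterial AS T
  ON (L.Material = T.Material) AND (L.TotalInventory = T.TotalInventory)
WHERE T.Material Is Null;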

clip_image015

This report now only shows the differences

clip_image017

This method of comparison will quickly highlight the affected materials, but it doesn’t give me any clue as to how much each item is off by. A better method is to calculate the difference between the inventory in each query and then look for differences not equal to zero. To do this I create a new query based on both of my material-level queries.

I create a join between my two totals queries, but this time using only material as the key (as I know the quantities do not match).

clip_image019

I add the quantities from each query, then create a calculated field called diff that holds the difference between the two inventories.
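As a sketch, and again assuming the two totals queries are called qryLegacyByMaterial and qryTargetByMaterial, the SQL behind this query could be:

SELECT L.Material, L.TotalInventory AS LegacyQty, T.TotalInventory AS TargetQty,
       L.TotalInventory - T.TotalInventory AS diff
FROM qryLegacyByMaterial AS L
INNER JOIN qryTargetByMaterial AS T ON L.Material = T.Material;

The diff <> 0 criterion mentioned below simply becomes a WHERE clause on that same difference expression.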

clip_image020

This gives me a result like below

clip_image021

Now I can easily see that material 330.123.21 is 110 over and material 330.123.24 is three under. Again, for a large list I could add a criterion of diff <> 0 to show me only the deltas.

clip_image022

clip_image023

At this point I can go down to the next level of detail (the batch) to see which line items are different. We have, in effect, drilled down from a gross level check to 100% validation.

100% Validation

100% validation without a tool like Access would be an extremely difficult prospect. As we saw above, with our gross level checks we can quickly drill down and find problem areas. However, the gross level checks should be a simple pass/fail test; if the data fails, at some point we need to go down to 100% validation to determine what the difference is (and correct it). For this we use the same techniques, just without aggregating our data.

In our inventory example this would mean creating a query joining our legacy and target tables together at the lowest level of detail, i.e. joining on material / plant / storage location / batch. At this level we are at the line item level.

This may cause some issues in that we need to recreate some of the mapping. In my two example tables I have simplified this by keeping material and storage location constant; however, the plant mapping is different. In our transformation database we used a mapping table to change the plants based on the logic below.

clip_image024

This gives two possible source plants that point to a single target plant, making it a unidirectional transformation: we can easily and repeatedly transform from legacy to target, but we cannot accurately transform back from target to legacy, as the new plant 2101 ambiguously maps back to either 1001 or 1003 (there is no way of knowing which one it actually was).

Because of this unidirectional transformation, when reconciling we must always follow the same transformation direction. Looking at my legacy and target tables below, this means I need to transform the plant in the legacy table so it appears as it would in the target system.

clip_image025

clip_image026

With a complex table we may end up replicating many of the transformation steps that occurred in the transformation database, and it may be tempting to just use that database. However, by using a second database we also get the opportunity to discover any transformation errors we made in the first.

In our simple example we can use the original mapping table; we can even reference it from the original Access database.

Assume I have created a new reconciliation database that contains just my two tables: the legacy data and the target system data.

clip_image028

The mapping table can easily be linked or imported using the External Data tab and selecting Access as the source. If there are a number of conversion objects, it may be worthwhile to have a single Access database containing only the mapping tables; both the transformation and reconciliation databases can then link to these tables. This has the advantage that any updates to a map are usable for transformation and reconciliation alike.

clip_image029

Figure 2 Shared Mapping tables

Selecting Access as the external data source, you can link to a table from another Access database.

clip_image031

The mapping table can now be selected

clip_image032

The linked table now shows up

clip_image033

I can now create the 100% validation query, mapping the old plant to the new, as below.

clip_image035

In this query I have used the mapping table to create a join between my legacy table and my target table. I have set up joins between all of the fields that should match, added data from both tables to my output, and created a difference column.
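A rough SQL equivalent is sketched below; the table and field names (tblLegacyInventory, tblTargetInventory, and tblPlantMap with OldPlant/NewPlant columns) are assumptions for illustration:

SELECT L.Material, M.NewPlant, L.SLoc, L.Batch,
       L.Unrestricted AS LegacyQty, T.Unrestricted AS TargetQty,
       L.Unrestricted - T.Unrestricted AS diff
FROM (tblLegacyInventory AS L
      INNER JOIN tblPlantMap AS M ON L.Plant = M.OldPlant)
INNER JOIN tblTargetInventory AS T
  ON (M.NewPlant = T.Plant) AND (L.Material = T.Material)
  AND (L.SLoc = T.SLoc) AND (L.Batch = T.Batch);

Note that the legacy plant is pushed through the mapping table before being matched to the target plant, which keeps the reconciliation flowing in the same legacy-to-target direction as the transformation.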

clip_image037

The report shows the right number of records, but it is difficult to determine where the discrepancies are, so I can add a <> 0 criterion on the difference column to focus just on the discrepancies.

clip_image038

This would give the following 100% validation report highlighting issues.

clip_image040

Now I have quickly identified three batches that have issues. Drilling in, I can find out why these are different and easily create a file to fix the quantities by receiving or reversing some stock.

Being able to quickly identify the differences between two datasets is the core of reconciliation and Microsoft Access makes this task a breeze.

Going back to part one in this series, we need to view data conversion as a process and not just a single event; in the same way that we perform multiple mock conversions, we should perform multiple mock reconciliations. This allows us to create our reconciliation databases in advance during the testing phase and automate them with macros, so that when we get to the production cutover, reconciliation should be a relatively fast and simple process.

Statistically Relevant Sampling

The third kind of reconciliation we mentioned is statistically relevant sampling, by which we mean sampling a number of records in the target system and checking their data quality. Access cannot be used directly for this process, as it requires logging into the target system and visually checking a number of records. Where Access can help, however, is in selecting the sample: if it is decided that a statistically relevant sample is 50 materials, you can create the random list of 50 materials using a query.

To do this you need to create a top 50 query based on a random number column

To create the random number column you can use the Rnd function; however, you need to seed it with a number. If you have a numeric field that varies in your data (like the inventory quantity) you can use that; if you don’t, the best way is to use an AutoNumber field in your table. If your table doesn’t have one, create a new table with an AutoNumber field and then use an append query to copy your data into the new table.

Copy the table structure using Ctrl+C and Ctrl+V.

Then add an AutoNumber field and make it the primary key (so you don’t get duplicates).

clip_image041

Now create an append query to load this table
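In SQL view the append query is along these lines (tblSampleSource is a hypothetical name for the copied table; its AutoNumber field is deliberately left out of the field list so Access assigns it as the rows are appended):

INSERT INTO tblSampleSource (Material, Plant, SLoc, Batch, Unrestricted)
SELECT Material, Plant, SLoc, Batch, Unrestricted
FROM tblTargetInventory;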

clip_image043

Now the data has its AutoNumber field populated.

clip_image045

Now we can create a top 50 query based on a randomized result of our AutoNumber field.
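One way to do this (a sketch using the hypothetical tblSampleSource table and an AutoNumber field called ID) is to pass the AutoNumber into the Rnd function so each row gets its own random value, and then sort on that:

SELECT TOP 50 Material
FROM tblSampleSource
ORDER BY Rnd(ID);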

clip_image047

This query will now produce a different top 50 each time it is run. As I don’t care to see the AutoNumber or the random number, I can deselect them from the output, giving a report like the one below.

clip_image049

Because this query changes every time it is run, I need to create a macro to export the result to a spreadsheet so that I can send it to the business to perform the validation.

Summary

In this part we saw how, with a few simple queries, 100% data validation becomes a simple process using Access. As is often the case on a project, validation is left up to the business, who often lack the tools and the training to perform such analyses. Even if Access isn’t used as a transformation database, it should be considered as a reconciliation tool.
