
In part one we looked at an overview of the data conversion process; in parts two and three we looked at using Microsoft Access to facilitate the transformation step in the ETL (Extract, Transform and Load) process. In addition to transformation, Access provides a lot of useful functionality to assist with the important task of reconciliation. In this fourth and final part we will look at how Access can be used to simplify and automate much of the reconciliation process.

Reconciliation Purpose

Data reconciliation is the task that ensures both the accuracy and completeness of a data conversion. Ultimately you reconcile the target system back to its source system; however, this may be broken down into multiple steps during the conversion process of extraction, transformation and loading, each of which introduces the possibility of a loss in data completeness and accuracy. Data reconciliation is usually the responsibility of the business, as they are the owners of the data. However, this often leads to delays in reconciliation, and even low-quality reconciliation, because with a new system users do not yet know their way around it, and many do not have the knowledge of, or access to, tools like SAP Query and SE16N which make reconciliation quicker.

To overcome these limitations I prefer a shared approach, whereby an IS technical resource prepares and performs some preliminary reconciliation before passing it on to the business users. However, the ultimate goal of the reconciliation process is an independent review and sign-off to ensure data accuracy and completeness, so this should not be left solely in the hands of the people performing the conversions.

Reconciliation Steps

As indicated above, reconciliation of the target system back to the source could happen as one single step or as a number of reconciliation steps. In the diagram in Figure 1 below we see that reconciliation should happen at each step in the extract, transform and load process. In fact, given the timing, this might be the only way to ensure a clean conversion, because waiting until the load is complete to find out that the extract was wrong may be too late to fix any issues.

clip_image001

Figure 1 Reconciliation points

As previously stated, each reconciliation should have a preliminary MIS reconciliation followed by a business-centric reconciliation along with a sign-off.

Although the Access databases we built to assist with transformation already house the majority of the data we need to reconcile, I prefer to build a separate reconciliation database and keep the transformation database specifically for transformation. In any given project you may perform a conversion four times or more, and each time you come back to the databases you want them to be as simple as possible so that they are easy to understand. Of course, in our reconciliation database we can link to our conversion database tables if we like, though again I like to start from scratch as an "independent" evaluation of the data.

Reconciliation Types

There are a number of different types of reconciliation that can be performed at each stage to ensure both accuracy and completeness.

Gross Level Data Checks

These are quantitative checks that ensure that, at a high level, the correct data elements were created. For example, in an inventory conversion it may be possible to compare the total quantity of all inventory between the source and destination systems. If these match then there's a good chance that the details match; if they do not, then it is necessary to look at the level below, perhaps inventory by storage location. This could highlight a problem with a specific storage location if all locations match except one. This concept can be used to drill down to a detailed level quite quickly.

Initial State Analysis

One important aspect of gross level data checks is to check the total set of data in addition to what is being added. For example, if I am converting materials for a roll-out then there are already materials in the target system before I start converting. If I am adding brand new materials (no overlap) then after the conversion the total count should be what was there before plus what was added. This is a good way of detecting runaway or leaky conversion programs that do more than they should. This kind of gross level check is often referred to as an initial state analysis and is more relevant during the testing cycles than at the production cutover.

Using Microsoft Access to facilitate gross level checks is a simple three-step process:

· Create an aggregate level query on the source data.

· Create an aggregate level query on the target data.

· Compare the two for differences.

To create an aggregate level query, we first have to determine what information we want to compare. Using the inventory conversion example, I will use the legacy batch-level inventory data.

My initial test will be at the highest level, i.e. checking total inventory. Starting with an extract from my legacy system, I will have a detail-level table like the one below, showing inventory at a material / plant / storage location / batch level.

clip_image002

Create a query based on this table containing just the unrestricted inventory quantity, then set it as a totals query with the Sum function.

clip_image003

This gives a result of 4,708,363 units

clip_image004

I can now extract the inventory data loaded into my target system and create a similar query. To save time I will reconcile against the output format file previously created.

Creating the same query on my output format table I get 4,708,256 (I manually fudged the numbers so they wouldn't match).

clip_image005

At this level it is easy to see the numbers are different. The third step is to create a query that calculates the difference, but this is only really useful when you are drilling into multi-line details.

Now that we’ve identified a discrepancy we need to drill down to another level of detail. I could drill down by plant then storage location then material then batch, (starting at the highest level then progressively going to the next level of detail). However my example has a relatively small data set so I will go straight to the material level to identify where the deltas are.

Going back to my two totals queries, I will add material as another field, set up with the Group By function. This will total the inventory by material.

clip_image006

Giving me the following results

clip_image007

clip_image008

Now that we have more data I want to create a third query to compare the two and highlight the differences.

I can do this in one of two ways. First, I could create a query using an outer join between the two queries; this will show me everything in my first query along with any matching records from the second.

clip_image010

You can change the join type by right clicking on the join and selecting the middle option

Now I can select both fields from the first table and both from the second (as we are using the legacy as our system of record we will have the legacy table as the first table)

clip_image012

Running this query gives me the following

clip_image014

This highlights two materials for which the combination of material and stock didn't have a match on the target side, i.e. these two materials are where my issue lies.

I can change my query to show me only the discrepancies, which is particularly useful if the list is long. To do this I just set an Is Null criterion on one of the fields from my second query.

clip_image015

This report now only shows the differences

clip_image017
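As an aside, if you rebuild your reconciliation database for every conversion cycle, a query like this can also be created from VBA rather than in the designer. The sketch below is illustrative only: the query names (qry_Legacy_ByMaterial, qry_Target_ByMaterial) and the SumOfUnrestricted field are assumptions standing in for the objects shown in the screenshots.

Public Sub CreateUnmatchedQuery()
    ' Builds a left-join query listing legacy materials whose
    ' material/quantity combination has no match on the target side.
    ' Object and field names are illustrative assumptions.
    Dim db As DAO.Database
    Set db = CurrentDb
    On Error Resume Next
    db.QueryDefs.Delete "qry_Recon_Unmatched"   ' drop any previous version
    On Error GoTo 0
    db.CreateQueryDef "qry_Recon_Unmatched", _
        "SELECT L.Material, L.SumOfUnrestricted " & _
        "FROM qry_Legacy_ByMaterial AS L " & _
        "LEFT JOIN qry_Target_ByMaterial AS T " & _
        "ON (L.Material = T.Material) AND (L.SumOfUnrestricted = T.SumOfUnrestricted) " & _
        "WHERE T.Material Is Null;"
End Sub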

This method of comparison will quickly highlight the affected materials, but it doesn't give me any clue as to how much each item is off by. A better method is to calculate the difference between the inventory quantities in each query and then look for differences not equal to zero. To do this I create a new query based on both of my material-level queries.

I create a join between both my totals queries but only using material as a key (as I know quantity does not match)

clip_image019

I add the quantities from each query, then create a calculated field called diff which is the difference between the two inventory quantities.

clip_image020

This gives me a result like below

clip_image021

Now I can easily see that material 330.123.21 is 110 over and material 330.123.24 is three under. Again, for a large list I could add a criterion of diff <> 0 to show me only the deltas.

clip_image022

clip_image023
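Purely as an illustration, this is roughly how the difference query could be generated from VBA; again, the object and field names are assumptions standing in for those in the screenshots.

Public Sub CreateDifferenceQuery()
    ' Joins the two material-level totals queries on material only and
    ' calculates a diff column, keeping only rows where diff is not zero.
    Dim db As DAO.Database
    Set db = CurrentDb
    db.CreateQueryDef "qry_Recon_Diff", _
        "SELECT L.Material, L.SumOfUnrestricted AS LegacyQty, " & _
        "T.SumOfUnrestricted AS TargetQty, " & _
        "L.SumOfUnrestricted - T.SumOfUnrestricted AS diff " & _
        "FROM qry_Legacy_ByMaterial AS L " & _
        "INNER JOIN qry_Target_ByMaterial AS T ON L.Material = T.Material " & _
        "WHERE L.SumOfUnrestricted - T.SumOfUnrestricted <> 0;"
End Sub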

At this point I can go down to the next level of detail (the batch) to see which line items are different. We have, in effect, drilled down from a gross level check to 100% validation.

100% Validation

100% validation without a tool like Access would be an extremely difficult prospect. As we saw above, with our gross level checks we can quickly drill down and find problem areas. However, the gross level checks should be a simple pass/fail test; if the data fails, then at some point we need to go down to 100% validation to determine what the difference is (and correct it). For this we use the same techniques but without aggregating our data.

In our inventory example this would mean creating a query joining our legacy and target tables together at the lowest level of detail, i.e. joining on material / plant / storage location / batch. At this level we are at the line item level.

This may cause some issues, in that we need to recreate some of the mapping. In my two example tables I have simplified this by keeping material and storage location constant; however, the plant mapping is different. In our transformation database we used a mapping table to change the plants based on the logic below.

clip_image024

This gives two possible source plants that point to a single target plant. This is a unidirectional transformation, i.e. we can easily and repeatedly transform from legacy to target, but we cannot accurately transform back from target to legacy, as the new plant 2101 maps ambiguously to either 1001 or 1003 and there is no way of knowing which one it actually was.

Because of this unidirectional transformation, when reconciling we must always follow the same transformation direction. Looking at my legacy and target tables below, this means I need to transform the plant in the legacy table to appear as it would in the target system.

clip_image025

clip_image026

With a complex table we may start to replicate all the transformation steps that occurred in the transformation database and it may be tempting to just use that database, but by using a second database we also get the opportunity to discover any transformation errors we had in our first database.

In our simple example we can use the original mapping table; you can even reference it from the original Access database.

Assume I have created a new reconciliation database that just has my two tables: the legacy extract and the target system extract.

clip_image028

Linking or importing the mapping table can easily be done using the External Data tab and selecting Access as the source. If there are a number of conversion objects, it may be worthwhile to have a single Access database that contains only mapping tables; both the transformation and reconciliation databases can then link to these tables. This has the advantage that any updates to a map will be usable for transformation and reconciliation alike.

clip_image029

Figure 2 Shared Mapping tables

Selecting Access as the external data source, you can link to a table in another Access database.

clip_image031

The mapping table can now be selected

clip_image032

The linked table now shows up

clip_image033

I can now create a 100% validation query, mapping the old plant to the new as below.

clip_image035

In this query I have used the mapping table to create a join between my legacy table and my target table. I have set up keys between all the things that should match and added data from both tables to my output. I have also created a difference column

clip_image037

The report shows the right number of records, although it is difficult to determine where the discrepancies are, so I can add a <> 0 criterion to focus just on the discrepancies.

clip_image038

This would give the following 100% validation report highlighting issues.

clip_image040
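For those who like to script their reconciliation database, a sketch of how such a 100% validation query might be generated is shown below. The table and field names are illustrative rather than the exact names used in my example; the important point is that the legacy plant is routed through the mapping table so it is compared as it would appear in the target system.

Public Sub Create100PctValidationQuery()
    ' Joins legacy to target at material/plant/sloc/batch level, with the
    ' plant join going via the mapping table, and keeps only mismatches.
    ' All object names are assumptions for illustration.
    Dim db As DAO.Database
    Set db = CurrentDb
    db.CreateQueryDef "qry_Recon_100pct", _
        "SELECT L.Material, M.[New Plant], L.Sloc, L.Batch, " & _
        "L.Unrestricted AS LegacyQty, T.Unrestricted AS TargetQty, " & _
        "L.Unrestricted - T.Unrestricted AS diff " & _
        "FROM (tbl_Legacy_MCHB AS L " & _
        "INNER JOIN tbl_Plant_Mapping AS M ON L.Plant = M.[Old Plant]) " & _
        "INNER JOIN tbl_Target_Extract AS T " & _
        "ON (L.Material = T.Material) AND (M.[New Plant] = T.Plant) " & _
        "AND (L.Sloc = T.Sloc) AND (L.Batch = T.Batch) " & _
        "WHERE L.Unrestricted - T.Unrestricted <> 0;"
End Sub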

Now I have quickly identified three batches that have some issues. Drilling in I can find out why these are different and easily create a file to fix the quantities by receiving or reversing some quantities.

Being able to quickly identify the differences between two datasets is the core of reconciliation and Microsoft Access makes this task a breeze.

Going back to part one in this series, we need to view data conversion as a process and not just a single event. In the same way that we perform multiple mock conversions, we should perform multiple mock reconciliations. This allows us to create our reconciliation databases in advance, during the testing phase, and automate them with macros, so that when we get to the production cutover the reconciliation is a relatively fast and simple process.

Statistically Relevant Sampling

The third kind of reconciliation we mentioned is statistically relevant sampling, by which we mean sampling a number of records in the target system and checking their data quality. Access cannot be used directly for this process, as it requires logging into the target system and visually checking a number of records. Where Access can help is in generating the sample itself: if it is decided that a statistically relevant sample is 50 materials, you can create the random list of 50 materials using a query.

To do this you need to create a top 50 query based on a random number column

To create the random number column you can use the Rnd function; however, you need to seed it with a number. If you have a numeric field that varies in your data (like the inventory quantity) you can use this; if you don't, then the best way is to use an AutoNumber field in your table. If your table doesn't have one, create a new table with an AutoNumber field and then use an append query to copy your data into the new table.

Copy the table structure using Ctrl+C and Ctrl+V.

Then add an AutoNumber field and make it the primary key (so you don't get duplicates).

clip_image041

Now create an append query to load this table

clip_image043

Now the data has an autonumber field populated

clip_image045

Now we can create a top 50 query based on a randomized result of our auto number field.

clip_image047

This query will now produce a different top 50 each time it is run. As I don't care to see the AutoNumber or the random number, I can deselect them from the output, giving a report like the one below.

clip_image049

Because this query changes every time it's run, I create a macro to export it to a spreadsheet so that I can send it to the business to perform the validation.
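To give an idea of how the whole sampling step could hang together, here is a sketch in VBA. The table name, its AutoNumber field (ID), the fields selected and the output file name are all assumptions for illustration; the negative, Timer-based seed is what makes the Rnd ordering change from run to run.

Public Sub ExportRandomSample()
    ' Builds a TOP 50 query ordered by a random number seeded from the
    ' AutoNumber field, then exports the result to a spreadsheet.
    Dim db As DAO.Database
    Set db = CurrentDb
    On Error Resume Next
    db.QueryDefs.Delete "qry_Random_Sample"   ' drop any previous version
    On Error GoTo 0
    db.CreateQueryDef "qry_Random_Sample", _
        "SELECT TOP 50 Material, Plant, Batch " & _
        "FROM tbl_Target_Materials " & _
        "ORDER BY Rnd(-Timer() * [ID]);"
    DoCmd.TransferSpreadsheet acExport, acSpreadsheetTypeExcel12Xlsx, _
        "qry_Random_Sample", CurrentProject.Path & "\random_sample.xlsx", True
End Sub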

Summary

In this part we saw how, with a few simple queries, 100% data validation becomes a simple process using Access. As is often the case on a project, the validation is left up to the business, who often lack the tools and the training to perform such analyses. Even if Access isn't used as a transformation database, it should be considered as a reconciliation tool.


In this part we look in more detail at how to perform some specific transformation processes using Microsoft Access.

Advanced Transformation Topics

The database in this example was relatively simple, and you could argue that a single query written on the load file could complete all the transformation and formatting changes required. However, breaking the process into steps with staging tables makes the process more transparent. Also, some transformations require that staging tables be used.

Removing Duplicate Entries

One transformation that may often be required is the removal of duplicate entries. In our example database, if we were receiving multiple extracts from multiple legacy systems we might end up with duplicate entries.

The easiest way to remove duplicate entries is to create a table with the same structure as your data, with a primary key defined on the combination of fields that needs to be unique. In our example we will create a copy of tbl:Load:MCHB:Generic, using the copy-and-paste technique, to create a table tbl:Load:MCHB:Unique.

Then in the design view you select all the fields that combined need to be unique

clip_image002

In the example above we are saying that the combination of material / plant / storage location / batch represents a unique entry. So if we have multiple records with the same combination, they are duplicate entries that need to be removed.

Now all we need to do is create an append query to populate this table; the append process will eliminate the duplicate entries.

When running the append query in foreground you get a message similar to the one below indicating that records were removed due to key violations

clip_image004
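If you would rather script these two steps than work in the designer, a sketch is below. The field names and lengths are assumptions; the table names follow the example above. Running the append through DoCmd.RunSQL produces the same key-violation prompt shown above, and the violating (duplicate) rows are simply not appended.

Public Sub DeDuplicateMCHB()
    ' Create a copy of the generic table with a composite primary key,
    ' then append into it; duplicate key values are dropped.
    DoCmd.RunSQL "CREATE TABLE [tbl:Load:MCHB:Unique] (" & _
        "Material TEXT(18), Plant TEXT(4), Sloc TEXT(4), Batch TEXT(10), " & _
        "Unrestricted DOUBLE, " & _
        "CONSTRAINT pk_unique PRIMARY KEY (Material, Plant, Sloc, Batch));"
    DoCmd.RunSQL "INSERT INTO [tbl:Load:MCHB:Unique] " & _
        "SELECT Material, Plant, Sloc, Batch, Unrestricted " & _
        "FROM [tbl:Load:MCHB:Generic];"
End Sub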

Cleaning up Spaces

Data elements from the source system may often have multiple spaces or leading and trailing spaces. It is a simple task to remove these using the Trim function. Create an update query that includes the fields that you want, but do not include the field that you want to remove the leading and trailing spaces from directly. Instead, add a name for your cleaned field followed by a colon, then select the Expression Builder button.

clip_image006

Then insert the Trim$ function with the name of the field to be cleaned up in square brackets

clip_image008

Concatenating Data

Another common task is to concatenate fields together. This can be done within the Expression Builder using the ampersand symbol (&).

In the example below I have concatenated the material and description fields with an underscore symbol

clip_image009
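Putting the two techniques together, a single calculated field can trim and concatenate in one step; the output field name here is just illustrative:

mat_desc: Trim$([material]) & "_" & Trim$([description])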

Parsing Data

As well as concatenating data, it is often necessary to parse data into separate columns; a common example of this is an address held in one field that needs to be split into multiple fields. Parsing can be achieved by using certain characters to represent splits between data elements (such as spaces or commas), or it can be done based on the number of characters (fixed width), i.e. the first 10 characters represent the first element, the next 5 represent another element and the last 20 represent a third element.

In the example below I have an address that is all held in one single field, using commas as the delimiter.

clip_image010

The transformation required is to parse this into the table structure below with separate fields.

clip_image011

The first step is to create a query with some intermediate fields that identify where the commas are. To do this you can use the InStr function, which identifies the position of a specific character in a text string.

The instr function has four parameters

clip_image012

The InStr function has a start position, so once the first comma has been found the search for the second has to start after the first. In our example there are five fields separated by four commas, so four fields need to be defined to identify where the commas are. For this we create four fields, firstcom, secom, thircom and fourcom; each one uses the InStr function to identify where the next comma is.

clip_image014

The formulas are

firstcom: InStr(1,[address],",",1)

secom: InStr([firstcom]+1,[address],",",1)

thircom: InStr([secom]+1,[address],",",1)

fourcom: InStr([thircom]+1,[address],",",1)

Each one starting at the previous position plus one character

Additionally it is necessary to determine the length of the address field. This can be done using the len command.

length: Len([address])

Once all the commas have been identified, the Left$ and Mid$ functions can be used to parse out each field.

clip_image015

These use the following formulas

Field        Formula

Street_num   street_num: Left$([address],[firstcom]-1)

Street       street: Mid$([address],[firstcom]+1,[secom]-[firstcom]-1)

City         city: Mid$([address],[secom]+1,[thircom]-[secom]-1)

State        State: Mid$([address],[thircom]+1,[fourcom]-[thircom]-1)

Zip          zip: Mid$([address],[fourcom]+1,[length]-[fourcom])

Using this as an append query to update the parsed table will convert the data.

clip_image017

If this were a fixed-width file, the task could be completed with the Mid$ function alone, using constant positions to separate the fields.

If there are many fields to parse, this process could get quite involved, with a lot of intermediate fields to create, so it can be simplified with a custom function that parses out separate words.

Custom Parsing function

To create a custom function you need to use VBA in a module.

This function will have three parameters,

· The string to parse

· The delimiting character

· And the word number you want.

In our example the address has five words, so if I want to pull out word three I could use my custom function like so:

PARSE([address],",",3)

This will simplify our query significantly as the query will look like below

clip_image019

And this parse function can parse any string without the need for complex queries using different delimiters.

To create a custom function you must first create a module from the Create tab.

clip_image021

Within the module create a custom function called PARSE

So what kind of VBA code do you need to perform this function? Below is the function code.

clip_image023

The code basically does a couple of things. Firstly, it loops through every character in the string looking for the delimiter character. When it finds one, it notes the position of that character and stores it as the end point of a word. It then knows it is looking for the end point of the next word, so it continues looping until the next delimiter is found, which represents the end of the second word, and so on until the last delimiter is found. At that point the end of the last word is the total length of the string. The positions are stored in an array called words, which when filled with our example looks like below.

clip_image024

From this we can see that word one ends at position 3, word two ends at position 17, word three at 29 and word four at 38. There are in fact five words, so the last one ends at the end of the string, position 43.

Once it has a word count and the end positions, it uses the standard Left$ and Mid$ functions to separate out the words. It uses a Select Case statement to handle the different cases of the first word, the last word and everything in between. In my particular version, if someone asks for a word number higher than the number of words it returns a blank.
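Since the code itself only appears as a screenshot, here is a minimal sketch of an equivalent function. It uses VBA's built-in Split function rather than the manual character loop described above, but the behaviour is the same: it returns the requested word, or a blank if the word number is out of range.

Public Function PARSE(strIn As String, strDelim As String, intWord As Integer) As String
    ' Returns word number intWord from strIn, using strDelim as the delimiter.
    Dim words() As String
    words = Split(strIn, strDelim)
    If intWord >= 1 And intWord <= UBound(words) + 1 Then
        PARSE = words(intWord - 1)   ' Split is zero-based, word numbers start at 1
    Else
        PARSE = ""                   ' out-of-range word number returns a blank
    End If
End Function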

You could easily modify this function to simply return the word count, which would be useful in data conversion; in our example you could quickly analyze thousands of addresses to see how many words each contains. This would tell you whether they would all transform into the format you are expecting. For example, in some cities there may be a postal district, giving six words instead of five, immediately highlighting a problem.

Replacing Data

Replacing data based on its value can easily be achieved using the standard Replace function. In a similar way to the familiar Ctrl+H find-and-replace, you can find certain strings and replace them within a source string. The Replace function has six parameters:

clip_image025

Using the address table as an example if I wanted to replace the word Way with Street I can create a query using the following formula

clnadd: Replace([address],"way","Street")

clip_image026

Giving the results below comparing the original field with the transformed field.

clip_image028

Again using staging tables and update queries allows you to “show your working” and gives the option of going back a step if the destructive transformation has been done incorrectly.

Removing non ASCII characters

Another very frustrating data conversion issue arises when dealing with data extracted from mainframe systems that may or may not be Unicode compliant. In some extracts, characters may be embedded that do not show up in a text editor; you can look at the data and just see spaces. In many cases these are not real spaces (as defined by ASCII character 32) but are in fact characters that may be displayed quite differently in a Unicode system.

Below is a grid of the standard ASCII character set showing characters 1 -128

clip_image029

You may find that the extracted text contains characters outside of this range that need to be limited to the 128 characters in the standard set.

In this case we need to create a custom function that loops through each character in a text string to find its associated ASCII code; then, if the code is higher than 127, we can replace it with a safe character such as a space (ASCII 32).

In my example I have a table tbl:non Unicode that contains some characters outside the normal 0-127 ASCII range. In a simple ASCII system these display as spaces and it may never be apparent that there are additional characters embedded; however, in an extended ASCII or Unicode environment they will display.

In my example below the legacy system displayed a description

Rectangular blank 3 mm by 6

However there are additional ascii character higher than 127 which in our Unicode compliant system will display as

Rectangular blank 3 †œÇmm byàõ6

Using a custom clean-ASCII function, CLascii, I can limit the text to a maximum ASCII character of 127 and replace anything higher with 32 (space).

The query below uses the custom function CLascii to clean the text

clip_image030

Giving the result

clip_image031

The custom function is very similar to the parsing function but even simpler

clip_image033
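As the original code is again only shown as a screenshot, the following is a minimal sketch of how such a function might look.

Public Function CLascii(strIn As String) As String
    ' Walks each character of the input; anything above ASCII 127
    ' is replaced with a space (ASCII 32).
    Dim i As Long
    Dim strOut As String
    For i = 1 To Len(strIn)
        If Asc(Mid$(strIn, i, 1)) > 127 Then
            strOut = strOut & Chr$(32)
        Else
            strOut = strOut & Mid$(strIn, i, 1)
        End If
    Next i
    CLascii = strOut
End Function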

Depending on how the text files were imported into Access, you may or may not see these characters. I have been confounded for hours trying to work out what was going on because the text editor displayed these characters as spaces even though they were not.

Validations
Field Validations

In a similar way to creating a staging table to remove duplicates, a staging table can also be used to perform field-level validations. At a simple level these could just be format checks, such as dates or numeric values; at a more detailed level they could limit the values to a predefined list. Field validations can be added to any field in a table; within the validation area you define a formula that needs to be true for the data to pass the validation. One thing to note is that if you want to validate that a field contains only values from another table, you should not use validation rules for this purpose; rather, create a query with an outer join and look for the missing entries in the second table. Although validation rules can check against a list of values, you have to key the values in (as opposed to referencing a table).

clip_image035

In the example above, the validation checks that the first character of the batch is not equal to B; batches that violate this rule are eliminated from the load when running the append query.
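As only the screenshot shows the exact rule, my guess is that it is something along the lines of the field validation rule below, which rejects any batch value beginning with B:

Not Like "B*"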

clip_image037

Creating a validation staging table also allows a query to be created against the generic table to identify which records fell out during validation.

Lookup Table validations

If a certain data element cannot be automatically transformed, then unfortunately it will need to be manually transformed by a user going through the data line by line. In many cases the user will be selecting values from a configuration table, and in this case you can assist them by loading the data into a staging table for manual transformation, where you can limit the allowed values that can be entered in a field using lookup table validations.

For example, if the material group mapping is complex and is being performed manually by a user, then you might want to extract the material group configuration table and use it as a validation table. To set a lookup validation against a field, select the Lookup tab and select the list box option. You will now have the option of selecting the lookup table you want to validate against.

We have a table tbl:Material:manual_transform which has been loaded with some material data; however, assigning the material group will be a manual task.

To prepare for this there is a material group table that shows the allowed values

clip_image038

As you can see, it may not be possible to programmatically determine whether something is a paint, a shellac or a finish, or whether a fixture is a bolt or a nut. So in this case this will be a manual transformation.

To set up the material group field on tbl:Material:manual_transform, you select the Lookup tab, then select combo box and the table that holds the values.

clip_image039

You can select the number of columns to show from the lookup table and also limit the values to the list values using the limit to list option

Now in the manual transform table the user can enter values from the drop down

clip_image040

Splitting Data Files

We have looked at some of the transformation tasks that an Access database may be required to do, but another important task Access can help with is preparing multiple load files quickly and efficiently. We already saw that when creating a database we start with the end in mind, creating a table in the output format so we can simply export it as a text file to drive our load program. However, on a recent project I was required to create a load file for batch-level inventory where each material / batch combination had to be loaded separately into the target system. Some materials had so many batches that they would run for a great deal of time. As the inventory conversion must happen during the blackout, we wanted to run this program from multiple files in parallel to reduce the overall run time.

It was decided that we could spare 21 background processes during the inventory conversion, so to maximize throughput we would create 21 input files: one each for the top 20 materials in terms of number of batches, and one for the rest.

Without any automation this would be a daunting task. You could create 21 queries, each specifying a different material, with the 21st excluding those materials. However, the problem was that the top 20 materials were changing on a daily basis, so a more flexible approach was required.

Establishing the Top 20 materials

The first step is to dynamically generate a top 20 list. In my example I needed the count of batches by material, which can be achieved with a simple aggregation query.

Starting with my output format table it is a simple task to create an aggregate count of batches via a query

Create a query on the output format table and add the two fields Material and Batch, then select the totals (sigma) function.

clip_image042

Now we want to group by material and select a count of the batches

clip_image043

In my sample database I only have 24 materials defined with the batch counts below

clip_image044

So in this case we'll create a top 10 rather than a top 20.

To limit the result of this query to the top 10, back in the design view we set the Return value to 10.

clip_image045

This limits the report to the top 10, adding a descending sort I get the result below

clip_image046

I can now use this table to drive my individual outputs.

Parameter Queries

If I were trying to avoid using VBA, then the only viable way of creating separate reports for these would be to use a parameter query. This is a query where you specify a parameter value when you run it.

Create a normal query for the output format then press the parameter button

clip_image048

In the pop up create a parameter mat_t10

clip_image049

Now you can specify this as a criteria for the material field in the query using [mat_t10]

clip_image051

Now when running the query it will ask you for the parameter

clip_image052

You could create a macro to run this query and then output the file, but you would have to cut and paste the material number each time, which is not very automated. A much better way is to use a little VBA to completely automate the task, as in the next section.

VBA approach using Record Sets

In a similar way to the parameter query above, you can create a query that specifies a custom function as its criterion; in this way you can use VBA to control the material number being passed.

Below is a query based on our output format table

clip_image054

You can see that the criterion for material is Top_20_mat(). It is not in square brackets because it is not a parameter; it is a Visual Basic function.

In the same way that we made custom functions for parsing and cleaning up data, we can create a VB function to return the material value.

Before we dive into the function, let's take a step back and think about how we will automate our task.

Firstly, we want to run our top 10 query and load the result into a top 10 table; we can use standard update queries and a macro to perform these steps.

Secondly for each material in the top ten table we want to run our parameter query then export a text file with the naming convention that uses the material number.

Then we want to loop through the remainder of the top 10 doing the same two functions

For this we will need to set each material in the top 10 as a parameter and perform the above two steps.

To achieve this we will need to use a global variable that is available to all objects in the database

Within the Visual Basic editor create a new module (you don't need to create a new module, you could use the one already created, but to keep it simple I'm creating a new module for these output functions).

clip_image056

clip_image058

The first thing we need to do is declare a global variable that can be seen by multiple objects in the database.

clip_image059

This is a public variable and it is outside of any functions so it can be seen by multiple functions.

The next step is to create the function top_20_mat() which will be used in our parameter query

clip_image060

All this function does is set its value to whatever Gmat is

Now we create a subroutine that loops through the top 10 list and, for each record, sets Gmat to the current material.

It then runs the query and transfers it to a text file using the “output” specification we set up in a previous section.

This function uses a record set to be able to loop through the top 10 table.

clip_image062
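The code in the screenshot is not reproduced here, but a sketch of the module might look like the following. Gmat, Top_20_mat and the "Output" export specification follow the description above; the table name tbl_Top10, its Material field, the query name qry_Top20_Output and the file naming are assumptions for illustration.

Option Explicit

Public Gmat As String            ' global variable visible to all objects

Public Function Top_20_mat() As String
    ' Used as the criterion in the output query; simply returns
    ' whatever material Gmat is currently set to.
    Top_20_mat = Gmat
End Function

Public Sub outputt10()
    ' Loops through the top 10 table and, for each material, runs the
    ' output query and exports it to its own text file.
    Dim db As DAO.Database
    Dim rs As DAO.Recordset
    Set db = CurrentDb
    Set rs = db.OpenRecordset("tbl_Top10", dbOpenSnapshot)
    Do While Not rs.EOF
        Gmat = rs!Material
        DoCmd.TransferText acExportDelim, "Output", "qry_Top20_Output", _
            CurrentProject.Path & "\load_" & Gmat & ".txt", True
        rs.MoveNext
    Loop
    rs.Close
    Set rs = Nothing
End Sub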

Now all we need to do is create a macro that runs this function

To achieve this we use the runcode command

clip_image064

Then specify our outputt10 function.

Running this macro will create 10 output files instantly

clip_image066

Summary

So far we have seen how data conversion fits into the ERP implementation process. We have also taken a look at the ubiquitous Microsoft Access and how it can be used to develop a repeatable conversion process. There are many specialized ETL tools available that provide a full suite of features to simplify the conversion process, but these usually come with a hefty price tag and learning curve. Microsoft Access 2010 comes as part of Office 2010 Professional, which is usually already available, and can generally be considered free (or almost free at $140), or at least a sunk cost. So before you jump straight into Excel, consider a more appropriate tool such as MS Access.


In part one we looked at the general process of data conversion within the context of an SAP implementation. In part two we look at the use of Microsoft Access to facilitate data conversions and set up a sample conversion database

Transform and Reconciliation

There are two major tasks during data conversion that may benefit from the use of Microsoft Access: the transformation of data from the as-is design into the to-be, and the reconciliation of loads back to the source system.

Transform

Transformation covers two main tasks:

· Transforming the format of the data from that extracted from the source system to that required by the target system load programs

· Transforming the context of the data from the as is design to the to-be.

Format transformation is a relatively straightforward process: starting with the load program's required data format, it is a simple case of putting the fields into the required order and format. This may result in a flat file structure or a hierarchical structure. The more complex task is to transform the context of the data from the as-is to the to-be design. This may include such elements as substitution, field parsing and concatenation. At its most complex there may be no systematic way of transforming the data, and manual input may be required to complete the process.

Reconciliation

Reconciliation covers the systematic comparison of the loaded data set back to the source system data set. This is often a series of reports used to validate that the data set is complete and that the quality of the data is good. As mentioned in part one, reconciliation can broadly be broken down into three types of validation:

· Gross level checks

· Statistically relevant sampling

· 100% validation

When To Use MS Access Over Excel

When deciding to use Microsoft Access as part of the data conversion process, the immediate question is: why not use Microsoft Excel? Excel is fully capable of data transformation and reconciliation reporting, and most consultants are familiar with Excel whereas far fewer use Access.

The Lure Of Excel

It’s a fact that most consultants and computer literate project team members will be familiar at some level with Excel. Excel provides a fast and easily understandable way of preparing files in a tab delimited format which is often used by load programs. With the use of complex formulas and Vlookups it is a relatively straightforward task to be able to transform the context of the data into whatever flat file format you may require. Additionally SAP allows for data to be extracted directly into Excel format, which can then be easily summarized by the use of pivot tables to be able to easily compare back to a source system. This combined with the fact that excel is ubiquitously loaded on the majority of computers, so in all likely hood Excel would be the de-facto choice for transformation and reconciliation.

The Value Of MS Access

So one might question why you would want to use Microsoft Access, which is a less widely known tool, and depending on which version of Microsoft Office you have you may not even have Access installed on your computer.

The main advantages of using Access over Excel are transparency in the conversion process and the ability to build a repeatable process regardless of the data.

When transforming data with Excel you start with your extract format; you copy and paste fields into the correct order and use VLOOKUPs to change the context of the data, cutting and pasting into the desired to-be format. In most cases going through this process can take some time, and at the end you have destroyed the input file to make the output file. If, at the point of completing the transformation, someone provides an updated input file, more often than not you have to manually repeat the steps to get a new output file. Of course it is possible to automate things with macros and maintain separate input and output sheets so you can always link back to the input format, but in general transformation with Excel is a destructive process.

With Microsoft Access you build the structure independently from the data. This allows you to define tables in the input file formats and output file formats independently of any actual data. Although this may appear to be a more involved process the first time round, the real beauty is that once the transformation has been developed based on the structures, then changing the input files to create new outputs is a simple and controlled process. This becomes especially useful when you consider that you may be performing multiple data conversions.

Developing a Repeatable Conversion File with MS Access

Because the data is separate from the structure in Access, it is much easier to create a repeatable process. In fact, the data itself is incidental when building the initial database. This forces the developer to focus on the process of the transformation and not just on transforming the data at hand. SAP has many controls to ensure that configuration and code changes go through a formal change control process; however, the data feeding a conversion program generally has no such controls, leaving it entirely up to the user to ensure that they have correctly completed all the transformation steps.

Thought (design) process in developing a conversion database

Two good design principles to follow when designing conversion databases are:

· Build the database for a specific reason.

· Start with the end in mind

I find it is better to build a number of separate, focused databases that are conversion-specific rather than trying to build a large multi-purpose database. If a database performs just one function, you'll have a better chance of maintaining it throughout the project and even being able to hand it off to someone else during the course of the project. I would generally create one database per load program and use it just to prepare the load files for that program. Even though it might be tempting to make it more generic and incorporate reconciliation functions, I think it is easier to fault-find the simplest possible database when you encounter a problem at 2.00am during the production cutover.

The second principle of starting with the end in mind is straight out of the "seven habits" but holds true in this specific scenario. For instance, when building a transformation database the first table I create is one in the file format of the load program that will be the consumer of this data.

clip_image002

Figure 1 Start with the End in Mind

I generally start by creating a table with the format of the load file used by the load program. The next step is to create a generic table based on the source system extract. This can be done by importing a sample extract from the source system and giving the table a generic name. The third step I find useful is to duplicate this structure to represent the post transformation (of context) data prior to formatting into the output file format. This can be used to help with the lookup tables that need to be defined to transform data from the as-is context to the to-be context.

There may be opportunities to reduce the number of tables required, as queries can also be used to transform data, but I often find it useful to "show my working" with staging tables that show the data at various stages. The use of staging tables adds steps to the transformation process, but these can be automated via macros to ensure that the process is repeatable.

Sample conversion database

To walk through the process of creating a transformation database I'm using a hypothetical inventory conversion to demonstrate the principles. For simplicity's sake the source and target systems will both be SAP. Inventory is batch managed and will be received into unrestricted stock, and the conversion will include some vendor consignment stock. Material and batch conversions are assumed to be completed and validated as prerequisites to this process. As part of the transform task, material numbers, plants, vendors and storage locations will be transformed based on the following logic.

The old plants are numeric plants 1001, 1002 and 1003, and they map using the following logic:

Old Plant   New Plant

1001        2101

1002        2102

1003        2101

The new storage location in each plant will be “RAW” regardless of the source storage location

Materials cannot be mapped by logic; however, an extract of the material master from the target system allows mapping via the old material number field.

E.g. material 12345 is extracted from the source system and maps to material abcde in the target system, which has a value of 12345 in its old material number field.

With simple logic like this you could get the load programs to perform these transformations programmatically using tools like LSMW, but this is just a fabricated example to show how to do it in Access.

Preparing the Output Format

The load program has been developed to use a tab delimited file with the following fields

clip_image003

Figure 2 -Load File sample format

The program also expects a header row with the column names as indicated.

The first step is to create a table called tblOutput_Format.

Creating a table directly in design view

clip_image004

I create all the fields in the same order as the load program file, as text data types. There is no need to create a primary key for this table.

clip_image005

Extract

The input to the transformation process is the extracted file from the legacy system, in the legacy system's format. In our case the source system is also SAP. For simplicity's sake we will assume that 100% of our materials are batch managed and are not managed in warehouse management, so we will use table MCHB as the source table for our inventory levels. If we had a combination of batch and non-batch managed materials then we would have to use MARD along with MCHB, or build an SAP query based on an outer join between those tables to have one extract that provides data for both scenarios. As our example uses a single table, we will base the extract on the SE16N transaction output to a text file.

From the SE16N screen you can either select just the fields required or extract the table as a whole. If you extract only a reduced set of fields from the MCHB table then you will need to ensure that you consistently pull the same fields into your extract each time.

clip_image007

Figure 3 Sample Se16N extract

Extracting Data Into A Suitable Format

Once SE16N has run, you can select the export as local file, unconverted option. (Although Access will read Excel files directly, there is more flexibility in the import interface when loading from a text file. Plus, for large data sets the unconverted option processes much more quickly than the spreadsheet option.)

clip_image008

Unfortunately the unconverted file from SE16N has a few header rows that need to be removed prior to loading into Access. Open the file in Notepad and remove the header line and the separator line.

clip_image010

Figure 4 Raw file produced by SE16N

clip_image011

Figure 5 Text file cleaned in notepad

Now the file is in a format to import into Access

Importing the source extract into Access with a specification

From within Access select the external data import text file option

clip_image012

Select the MCHB extract file, choose the option to import the source data into a new table in the current database, and select OK.

clip_image014

From the text import wizard select delimited and next

clip_image016

In the choice of delimiter select the pipe symbol |, select first row contains field names and text qualifier of none

clip_image018

Then click on advanced to fine tune the import format

clip_image019

Now you can adjust the field names and formats and even skip fields. As we described in the first part of this series, data conversion is a process that happens multiple times on a project, not just once, so we want to make this process as streamlined and repeatable as possible. We can therefore save this import as a specification using the Save As button.

clip_image020

This will allow the import to be automated in a consistent manner. Continue with the import; at the prompt for defining primary keys, Access will default to creating an extra field that forms the primary key. I generally do not bother defining a primary key on import tables, as I just want the structure of the fields I have included in my extract.

clip_image022

However there can be one instance when letting Access define a primary key can be very useful, and that is if you get data type import errors.

When you use the import wizard, Access only reads the first 20 records and tries to determine each field's data type, and in many cases it can get the data type wrong. For example, the first 20 materials may be internally assigned numerical values, while there may be externally assigned alphanumeric values further into the data set.

clip_image023

If Access finds these errors it creates an import errors table that just indicates the row numbers that failed. If you let Access set the primary key as a counter field, then the row number will match the value of the primary key field.

clip_image025

This can often help in fixing the import. I generally set everything to text, except for quantity values which I set to doubles and dates which I set as date fields. However, in some cases you can still get errors, in which case I do a second import with everything set to text (you will not get errors with this data type) and allow Access to set a primary key. Then you can create a query joining the import error table and the text import to find the entries that fell out and determine what the issue was.

Assuming that you did not get any import errors you now have two tables in your database.

The extracted data from the legacy system and the required output format

clip_image026

The Use of Staging Tables

As we go through this conversion process it will become clear that it is possible to perform many of the transformation tasks with queries in a single step, so you may think having multiple tables is overkill. However, I have found it useful when dealing with multiple data conversions to have a similar design concept across all my databases, so if one day I'm working on a material master conversion and then switch to an inventory conversion that I last worked on a few days earlier, I can easily understand what the database does. To this end I like to break up the transformation process using staging tables.

The first staging table I like to have is a generic table in the source extract format. This is generally identical to the table we just imported, so why do we need it? Well as you go through data conversions you will be obtaining multiple extracts from multiple systems and points in time which you may need to process. So each time you extract from your source system you probably have a date and time stamp on the file. You could change the name of your import file to a generic name and then reimport data each time you want to update.

In the example below there are three time-stamped extracts. One way of importing them into Access would be to first rename each file to a generic name, run through the import wizard and save the import. In this way you can import new data files over the top of the old ones just by changing the file name to the generic name.

clip_image027

I prefer not to use this approach, as I like to have a record of which extract was used to generate the load files, and sometimes you want to compare extracts to see what the delta is (you could use this to generate a delta load file). So if you decide to keep the separate file names then you will need to import them separately, meaning they will become separate tables in Access. This means any transformation and output queries might need to point to a new table each time you import. To overcome this, the first thing I do is create a generic table and base all my transformation queries on the generic table (which never changes); then each time I import a new file, I add one new query to load my generic table and I'm done.

The quickest way to create the generic extract table is to select the table you just imported and do a copy and paste. Access will ask if you want to copy just the structure or the data as well. In the dialog, change the name to your generic table name and select Structure Only.

clip_image028

Now you have a new generic table, on which you can base all your transformations, and which won't change regardless of whether you are loading a QA system, the training system or the production system.

Using action queries to load and flush staging tables

As soon as you create a new staging table you need to create two action queries, one to delete the contents and one to load the table.

To create a deletion query you create a query based on your new staging table

clip_image029

Select the staging table then close the add table dialog

clip_image030

You now have a blank query based on your staging table

clip_image032

We want to delete all the fields, so we can either drag and drop the fields into the design grid at the bottom or just double-click on the asterisk to bring all fields into the query.

clip_image033

Now you can change the query type to a delete query and this query effectively empties the staging table.

clip_image035

When a query is changed to one of the action types (update, make table, delete, etc.), the Run button is used to actually perform the action.

Save the query with a suitable naming convention so that you’ll know what it does.

Next you need to load data from your extract table. This query will be specific to the file name you load each time so you’ll need to recreate this query each time you want to import a new file.

Create a query based on your extract table, select all the fields either individually or with the asterisk, then make this an append query, selecting the generic extract table as the destination.

clip_image037

clip_image038

The database now has three tables and two queries:

· The extract file with data

· A generic table in the extract format

· A table in the output file format

· A deletion query to empty the generic table and an append query to load the generic extract table

clip_image039

Setting Up A Repeatable Database

At this point before we create too many more tables and queries it is a good idea to organize how Access displays the objects. The default view separates objects by their type, e.g. tables are grouped together and queries are grouped together. Using the custom navigation options it is possible to split the database objects into functional groupings.

I like to organize the database with the following groupings

· Raw Extract Files

· Mapping Tables

· Generic Extract

· Transformation

· Output

· Staging Queries

· Validation Queries

· Macros

These can be set up by right clicking in the objects area and selecting Navigation Options

clip_image040

Then selecting the custom category and adding the custom groups as desired

clip_image042

Now you can activate the custom view from the objects area by clicking the down arrow then selecting custom as the view

clip_image043

clip_image044

You can now drag and drop the various objects into the sections as required. This will become especially useful as the database becomes more complex

clip_image045

Automating With Macros

For every staging table you define you will need at least two staging queries: one to delete the contents of the table and one to load it. If your database has five or more staging tables then you can see there are a lot of queries that need to be run in sequence to perform the transformation. To keep this process under control I create macros to run the various steps; doing this makes running the database a matter of a few macro clicks. The macros required may change based on the particular database's requirements, but generically I will have macros to do the following:

· Import extract data

· Load generic extracts

· Transform data

· Prepare output files

To keep track of which queries go in which macros I like to create the macros as I go and only move my queries to the staging queries folder when I have them referenced in a macro

As we have created the staging queries to load the generic table we can create the Load Generic Extract Macro.

To create a macro, select Macro from the Create tab.

clip_image047

Select the Show All Actions button, then add the SetWarnings action.

clip_image048

Once selected, set the parameter to No; this will prevent Access from popping up confirmation prompts when executing the macro.

clip_image050

Next you can drag and drop the delete query into the macro, followed by the append query to load the generic table.

clip_image052

This can also be achieved by using the OpenQuery action.

As the last action, you can switch SetWarnings back to On.

clip_image053
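If you prefer a few lines of VBA to macro actions, the equivalent steps look like the sketch below; the delete query name is an assumption, while the append query name is the one used later in this part.

Public Sub LoadGenericExtract()
    ' VBA equivalent of the Load Generic Extract macro: suppress prompts,
    ' empty the generic staging table, reload it, then restore warnings.
    DoCmd.SetWarnings False
    DoCmd.OpenQuery "DEL:tbl:Load:MCHB:Generic"   ' delete query (name assumed)
    DoCmd.OpenQuery "AP:tbl:Load:MCHB:Generic"    ' append query
    DoCmd.SetWarnings True
End Sub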

Identifying the Data Formats Required

Now that we have a generic extract table that we have populated with our specific extract data we can look at how we will perform the transformation into the to-be data context

Transform

Highlighting the mapped fields in green, the hard-coded fields in blue and the values passed through unchanged in yellow, we see the following.

clip_image055

Based on the above we have three fields that need to be mapped in different ways.

The plant and the material maps will both require mapping tables with a new for old substitution. The vendor can be mapped with a simple concatenation rule so a table is not necessary

Mapping Tables

Our first step is to create the plant mapping table. This has two fields, Old plant and New plant.

We can create a new table in Access with these two fields

clip_image056

Going into the data sheet view we can enter the values manually

clip_image057

For the material mapping we can base this table on an extract of the material master showing the old material number, the material number and the base unit of measure (BISMT, MATNR and MEINS).

clip_image058

Transforming Data with Queries

Now that we have the mapping tables required to transform our data set to the to be context we can use queries to translate the data.

Essentially we create a query around our Generic extract table that also has the plant and material lookup tables.

clip_image060

Next we draw relationships between the keys of the various tables, linking material to the old material number and plant to the old plant field.

clip_image061

The next step is to create the output fields. You could create the output in the same format as the generic extract table and create another staging table called tbl:Transformed:MCHB:Generic, or, as in this case where the transformation is fairly simple, the query can transform the data and convert it into the output format in one go.

Using the fields in the tblOutput_Format table, we can prepare the output format with the same query.

clip_image063

The first column is the movement type, which is hard-coded as 561. This can be achieved by creating a field called mvt typ, followed by a colon, then hard-coding "561" within quotes.

clip_image064

Next comes the new plant from the tbl:Plant_mapping table

Then SLOC is another hard-coded value, and so on, resulting in the query format below.

clip_image066

Testing this query to make sure the mappings and hard-codings work, we see the expected result.

clip_image068

One important thing to check is the record count; whenever you have table joins the record count depends on the join types and whether the mapping is complete or not. In this case we have the same number of records post-transformation as before, so we are good.

At this point we could export this query as a text output file and be done. However, I like to "show my working" via staging tables, so we can change this query to an append query to update the output format table.

When we convert this to an append query Access tries to match up the fields in the destination table. In some cases it finds a match but in others we manually need to add the target fields.

clip_image070

Running through each missing field, we can select the available fields from tblOutput_Format.

clip_image071

Once all the fields have been assigned the query can be run and it should update the output format table

clip_image073

Now we also need to create a deletion query and add both of these queries to a macro

We can now create a simple query on the output format table to produce the output file layout and export it as a text file.

Having created a query that has the format of the output file you can select to export this as a text file.

clip_image075

Select Delimited and Next

clip_image077

Select the tab delimiter, include header rows and set the text qualifier to none

clip_image079

In the same way that we saved our import specification, we can also save our export specification under the Advanced tab.

clip_image081

As you will be creating multiple output files at various points in time it is worth saving the export so it can be automated

clip_image083

This query can now be run and exported as part of a macro using the RunSavedImportExport command.

clip_image084

Preparing a new load file

Now that the database has been built based on the logic and structure of the data, it is a relatively simple task to prepare a new load file from a new data extract.

Because I have chosen to always link back to original load files and have created an interim generic staging table I will always import the new extract into its own table.

I can leverage the load specification to simplify the import into the new table.

In the example below I have imported the extract file dated 09 01 11 using the specification I used for my original sample. To load this data into the generic table I will need to update the append query AP:tbl:Load:MCHB:Generic to pull from the table MCHB_2600_09_01_11

clip_image085

I could of course have imported it in to the original table (MCHB_500_sample) though I would have lost my traceability and ability to compare subsequent extracts.

The easiest way I have found to update this append query is to open it in the SQL view

clip_image087

You can then cut and paste the SQL into a text editor such as notepad

clip_image089

Then use the replace function (ctrl H) to replace the table name to the new table

clip_image090

You can then paste it back into the SQL view of the query. You can validate the change by going back into the design view, where the query will show the new table.

clip_image091
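As an alternative to editing the SQL by hand, the table name can be swapped programmatically. The sketch below updates the saved append query's SQL directly, using the old and new table names from this example.

Public Sub RepointAppendQuery()
    ' Replace the source table name inside the saved append query's SQL
    ' so the same query now pulls from the newly imported extract table.
    Dim qdf As DAO.QueryDef
    Set qdf = CurrentDb.QueryDefs("AP:tbl:Load:MCHB:Generic")
    qdf.SQL = Replace(qdf.SQL, "MCHB_500_sample", "MCHB_2600_09_01_11")
End Sub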

At this point the macros previously created can be run in order to perform the transformation on the new data extract and create the output file automatically and almost instantaneously.

You have now seen how to set up an Access database to drive data conversion as a repeatable process. In part three we will look at some specific transformation techniques that can be used to automate many of the transformations that may be required.
