Data Conversions with Microsoft Access Part II

 

In part one we looked at the general process of data conversion within the context of an SAP implementation. In part two we look at the use of Microsoft Access to facilitate data conversions and set up a sample conversion database

Transform and Reconciliation

There are two major tasks during data conversion that may benefit from the use of Microsoft Access. The transformation of data from the as is design into the to-be, and the reconciliation of loads back to the source system.

Transform

Transformation covers two main criteria

· Transforming the format of the data from that extracted from the source system to that required by the target system load programs

· Transforming the context of the data from the as is design to the to-be.

Format transformation is relatively straightforward process, starting with the load programs required data format it is a simple case of ordering the fields in the required order and format. This may result in a flat file structure of a hierarchical structure. The more complex task is to transform the context of the data from the as-is to the to-be design. This may include such elements as substitution, field parsing and concatenation to transform data. At its most complex there may be no systematic way of transforming data and manual input may be required to complete this process.

Reconciliation

Reconciliation covers the systematic comparison of the loaded data set, back to the source system data set. This can often be a series of reports that are used to validate that the data set is complete and that the quality of the data is good. As mentioned in part one, reconciliation can broadly be broken down into three types of validation.

· Gross level checks

· Statistically relevant sampling

· 100% validation

When To Use MS Access Over Excel

When deciding to use Microsoft Access as part of the data conversion process the immediate question is why not use Microsoft Excel. Microsoft Excel is fully capable of data transformation and reconciliation reporting, and most consultants are familiar with excel whereas far fewer use Access.

The Lure Of Excel

It’s a fact that most consultants and computer literate project team members will be familiar at some level with Excel. Excel provides a fast and easily understandable way of preparing files in a tab delimited format which is often used by load programs. With the use of complex formulas and Vlookups it is a relatively straightforward task to be able to transform the context of the data into whatever flat file format you may require. Additionally SAP allows for data to be extracted directly into Excel format, which can then be easily summarized by the use of pivot tables to be able to easily compare back to a source system. This combined with the fact that excel is ubiquitously loaded on the majority of computers, so in all likely hood Excel would be the de-facto choice for transformation and reconciliation.

The Value Of MS Access

So one might question why you would want to use Microsoft Access, which is a less widely known tool, and depending on which version of Microsoft Office you have you may not even have Access installed on your computer.

The main advantages of using Access over Excel are transparency in the conversion process and the ability to build a repeatable process regardless of the data.

When transforming data with excel you start with your extract format, you can copy and paste fields into the correct format and use vlookups to change the context of data to the to-be copying data and cutting and pasting into the desired to be format. I most cases going through this process can take some time and at the end you had destroyed the input file to make the output file format. If at the point of completing the transformation someone provides an updated input file, more often than not you have to manually repeat the steps to get a new output file. Of course it is possible to automate things with macros and maintain separate input and output sheets so you can always link back to the input format, but in general this is a destructive process using excel.

With Microsoft Access you build the structure independently from the data. This allows you to define tables in the input file formats and output file formats independently of any actual data. Although this may appear to be a more involved process the first time round, the real beauty is that once the transformation has been developed based on the structures, then changing the input files to create new outputs is a simple and controlled process. This becomes especially useful when you consider that you may be performing multiple data conversions.

Developing a Repeatable Conversion File with MS Access

Because the data is separate from the structure using Access then it is much easier to create a repeatable process. In fact the data itself is incidental when building the initial database. This forces the developer to focus on the process of the transformation and not just transforming the data at hand. SAP has many controls to ensure that configuration and code changes go through a formal change control process; however the data feeding a conversion program generally has no such controls, leaving it entirely up to the user to ensure that they have correctly completed all the transformation steps.

Thought (design) process in developing a conversion database

Two good design principles to follow when designing conversion databases are;

· Build the database for a specific reason.

· Start with the end in mind

I find it is better to build a number of separate focused databases that are conversion specific rather than trying to build a large multi-purpose database. If a database performs just one function, you’ll have a better chance of maintaining it throughout the project and even being able to hand it off to someone else during the course of the project. I would generally create one database per load program and us it just to prepare the load files for that program. Even though it might be tempting to make it more generic and incorporate reconciliation functions I think it is easier to fault find the simplest database possible when you encounter a problem at 2.00am during the production cutover.

The second principle of starting with the end in mind is straight out of the “seven habits” but holds true in this specific scenario. For instance when building a transformation database the first table I create is a table that is in the file format of the load program that will be consumer of this data.

clip_image002

Figure 1 Start with the End in Mind

I generally start by creating a table with the format of the load file used by the load program. The next step is to create a generic table based on the source system extract. This can be done by importing a sample extract from the source system and giving the table a generic name. The third step I find useful is to duplicate this structure to represent the post transformation (of context) data prior to formatting into the output file format. This can be used to help with the lookup tables that need to be defined to transform data from the as-is context to the to-be context.

There may be opportunities to reduce the number of tales required as queries can also be used to transform data but I often find it useful to “show my working” with various staging tables that can be used to show the data at various stages. The use of staging tables adds steps to the transformation process these can be automated via macros that ensure that the process is repeatable.

Sample conversion database

To walk through the process of creating a transformation database I’m using a hypothetical inventory conversion to demonstrate the principle. For simplicities sake the source system and target systems will be SAP. Inventory is batch managed and will be received into unrestricted stock the conversion will include some vendor consignment. Material and batch conversions are assumed to be completed and validated as pre-requisites to this process. As part of the transform task, material numbers, plants, vendors and storage locations will be transformed based on the following logic.

The old plants are numeric plants 1001,1002,1003 and they will map using the following logic

Old Plant

New Plant

1001

2101

1002

2102

1003

2101

The new storage location in each plant will be “RAW” regardless of the source storage location

Materials cannot be mapped by logic, however an extract of the material master within the target system will allow mapping by use of the old material number field

EG material 12345 is extracted from the source system and it maps to material abcde in the target system which has a value of 12345 in the old material number field.

With simple logic like this you could get the load programs to perform these transformations programmatically using tools like LSMW but this is just a fabricated example to see how to do this in Access.

Preparing the Output Format

The load program has been developed to use a tab delimited file with the following fields

clip_image003

Figure 2 -Load File sample format

The program also expects a header row with the column names as indicated.

The first Step is to create a table called tblOutput_Format

Creating a table directly in design view

clip_image004

I create all the fields in the same order as the load program file as text types. There is no need to create a primary key for these fields.

clip_image005

Extract

The input to the transformation process is the extracted file from the legacy system in the legacy system format. In our case our source system is also SAP. For simplicities sake we will assume that 100% of our materials are batch managed and are not managed in warehouse management. Therefore we will use table MCHB to be the source table for our inventory levels. If we had a combination of batch and non-batch managed materials then we would have to use a combination of MARD along with MCHB. Or you could build an SAP query based on an outer table join between those tables to have one extract to provide data for both scenarios. As in our example we will be using a single table then we will base the extract on the SE16N transaction output to a text file.

From the SE16N screen you can either select just the fields required or just extract the table as a whole. If you extract only a reduced set of fields from the MCHB table then you will need to ensure that you consistently pull the same fields into you’re extract each time.

clip_image007

Figure 3 Sample Se16N extract

Extracting Data Into A Suitable Format

Once the SE16N is run you can select the export as local file unconverted option (although Access will read excel filed directly there is more flexibility in the import interface to load from a text file. Plus for large data sets the unconverted option processes much more quickly than the spreadsheet option.

clip_image008

Unfortunately the converted file format from SE16N has a few header rows that need to be removed prior to loading into Access. Open the file in Notepad and remove the header line and the separator line

clip_image010

Figure 4 Raw file produced by SE16N

clip_image011

Figure 5 Text file cleaned in notepad

Now the file is in a format to import into Access

Importing the source extract into Access with a specification

From within Access select the external data import text file option

clip_image012

Select the MCHB extract file and select the import source data into new table in the current database and select OK

clip_image014

From the text import wizard select delimited and next

clip_image016

In the choice of delimiter select the pipe symbol |, select first row contains field names and text qualifier of none

clip_image018

Then click on advanced to fine tune the import format

clip_image019

Now you can adjust the field names and formats and even skip fields. As we described in the first part of this article data conversion is a process that happens multiple times on a project and not just once, we want to make this process as streamlined and repeatable as possible. So we can now save this import as a specification using the save as button

clip_image020

This will allow the import to be automated in a consistent manner. Continue to save the import, at the prompt for defining primary keys; Access will default to creating an extra field that forms the primary key. I generally do not bother defining a primary key on import tables as I just want the structure of the fields I have included in my extract

clip_image022

However there can be one instance when letting Access define a primary key can be very useful, and that is if you get data type import errors.

When you select the import wizard and import data, Access only reads the first 20 records and tries to determine the data type in many cases access can get the data type wrong, for example the first 20 materials may be internally assigned numerical values, however there may be externally assigned alpha numeric values further into the data set.

clip_image023

If access finds these errors it creates an import error file that just indicates the row number that failed. If you let access set the primary key as a counter field then the row number is also the number of the primary key field.

clip_image025

This can often facilitate fixing the import. I generally set everything to text except for quantity values which I set to doubles and dates which I set as date fields, however in some cases you can still get errors in which case I do a second import with everything set to text (you will not get errors with this data type) and allow access to set a primary key. Then you can create a query using the import error table and the text import to find the entries that fell out and determine what the issue was.

Assuming that you did not get any import errors you now have two tables in your database.

The extracted data from the legacy system and the required output format

clip_image026

The Use of Staging Tables

As we go through this conversion process it will be clear that it is possible to perform many of the transformation tasks just by the use of queries in a single step, so you may think having multiple tables is overkill. However I have found it useful when dealing with multiple data conversions to have a similar design concept between all my databases, so If one day I’m working on a material master conversion then switch to a inventory conversion that I have worked on in a few days I can easily understand what the database does. To this end I like to separate the transformation process using staging tables.

The first staging table I like to have is a generic table in the source extract format. This is generally identical to the table we just imported, so why do we need it? Well as you go through data conversions you will be obtaining multiple extracts from multiple systems and points in time which you may need to process. So each time you extract from your source system you probably have a date and time stamp on the file. You could change the name of your import file to a generic name and then reimport data each time you want to update.

In the example below there are three time stamped extracts. One way of importing them into Access would be to first change the name to a generic name run through the import wizard and save the import. In this way you can import new data files over the top of the old just by changing the file name to a generic name.

clip_image027

I prefer not to use this approach as I like to have a record of what extract was used to generate the load files and sometimes you want to compare extracts to see what the delta is (you could use this to generate a delta load file) So if you decide to keep the separate file names then you will need to import them separately into Access, meaning they will be imported as separate tables in Access. This means any transformation and output queries might need to point to a new table each time you import. To overcome these changes the first thing I do is create a generic table, then base all my transformation queries on the generic table (which never changes) then each time I import a new file, I add one new update query to load my generic table then I’m done.

The quickest way to create the generic extract table is to select the table you just imported and do a copy and paste. Access will ask if you want to copy just the structure or the data as well. From the wizard change the name to your generic load and select structure only.

clip_image028

Now you have a new generic table which won’t change regardless of whether you are loading a QA system the training system or the production system you can base all your transformations on.

Using action queries to load and flush staging tables

As soon as you create a new staging table you need to create two action queries, one to delete the contents and one to load the table.

To create a deletion query you create a query based on your new staging table

clip_image029

Select the staging table then close the add table dialog

clip_image030

You now have a blank query based on your staging table

clip_image032

We want to delete all the fields so we can either drag and drop the fields into the bottom report section or just double click on the asterisk to bring all fields into the query

clip_image033

Now you can change the query type to a delete query and this query effectively empties the staging table.

clip_image035

When a query is changed to one of the action types (update make table delete etc) then the run button is used to actually perform the action.

Save the query with a suitable naming convention so that you’ll know what it does.

Next you need to load data from your extract table. This query will be specific to the file name you load each time so you’ll need to recreate this query each time you want to import a new file.

Create a query based on your extract table, select all the fields either individually or by the asterisk then make this an append query. Selecting the generic export table as the destination

clip_image037

clip_image038

The database now has three tables and two queries

The extract file with data

A generic table in the extract format

A table in the output file format

A deletion query to empty the generic table and an append query to load the generic extract table.

clip_image039

Setting Up A Repeatable Database

At this point before we create too many more tables and queries it is a good idea to organize how Access displays the objects. The default view separates objects by their type, e.g. tables are grouped together and queries are grouped together. Using the custom navigation options it is possible to split the database objects into functional groupings.

I like to organize the database with the following groupings

· Raw Extract Files

· Mapping Tables

· Generic Extract

· Transformation

· Output

· Staging Queries

· Validation Queries

· Macros

These can be set up by right clicking in the objects area and selecting Navigation Options

clip_image040

Then selecting the custom category and adding the custom groups as desired

clip_image042

Now you can activate the custom view from the objects area by clicking the down arrow then selecting custom as the view

clip_image043

clip_image044

You can now drag and drop the various objects into the sections as required. This will become especially useful as the database becomes more complex

clip_image045

Automating With Macros

For every staging table you define you will need at least two staging queries, one to delete the contents of the table and one to load it. If your database has five or more staging tables then you can see there are a lot of queries that need to be run in Sequence to perform the transformation. To keep this process in control I create macros to run the various steps, doing this can make running the database a simple number of macro clicks. The macros required may change based on the particular database requirements but generically I will have macros to do the following

· Import extract data

· Load generic extracts

· Transform data

· Prepare output files

To keep track of which queries go in which macros I like to create the macros as I go and only move my queries to the staging queries folder when I have them referenced in a macro

As we have created the staging queries to load the generic table we can create the Load Generic Extract Macro.

To create a macro select create macro from the create tab

clip_image047

Select the show all actions button then add the setwarnings action

clip_image048

Once selected set the parameter to No, this will prevent access from popping up questions when executing macros.

clip_image050

Next you can drag and drop the delete query into the macro, followed by the append query to load the generic table.

clip_image052

This can also be achieved by using the openquery command

As the last statement you can switch the setwarnings back to On

clip_image053

Identifying the Data Formats Required

Now that we have a generic extract table that we have populated with our specific extract data we can look at how we will perform the transformation into the to-be data context

Transform

Highlighting the mapped fields in green the hard coded fields in blue and the values passed through in yellow we see the following

clip_image055

Based on the above we have three fields that need to be mapped in different ways.

The plant and the material maps will both require mapping tables with a new for old substitution. The vendor can be mapped with a simple concatenation rule so a table is not necessary

Mapping Tables

Our first step is to create the plant mapping table. This has two fields, Old plant and New plant.

We can create a new table in Access with these two fields

clip_image056

Going into the data sheet view we can enter the values manually

clip_image057

For the material master mapping we can base this table on an extract of the material master that will show the old material number and material number and the base unit of measure (BISMT, MATNR and MEINS)

clip_image058

Transforming Data with Queries

Now that we have the mapping tables required to transform our data set to the to be context we can use queries to translate the data.

Essentially we create a query around our Generic extract table that also has the plant and material lookup tables.

clip_image060

Next we want to draw relationships between the keys of the various tables. Linking material to the old material number and plant to the old plant field

clip_image061

The next step is to create the output fields you can create the output in the same format as the Generic extract table and create another staging table called tbl:Transformed:MCHB:Generic or in this case as the transformation is fairly simple the query can easily transform the data and convert into the output format.

Using the fields in the tblOutput_Format table. We can prepare the output format with the same query

clip_image063

The first column is movement type and this is hardcoded as 561. This can be achieved by creating a field called mvt typ then placing a colon : then hardcoding “561” within quotes

clip_image064

Next comes the new plant from the tbl:Plant_mapping table

Then SLOC is another hard coding and so on. Resulting in the query format below

clip_image066

Testing this query to make sure the mappings and hardcoding’s work we can see the expectant result.

clip_image068

On important thing to check is the record count, whenever you have table joins the record count is dependent on the join types and whether there is a full map or not. In this case we have the same number of records post transformation as before so we are good.

At this point we could save this query as a text output file and be done. However I like to “show working” via the use of staging tables, so we can change this query to an append query to update the output format table

When we convert this to an append query Access tries to match up the fields in the destination table. In some cases it finds a match but in others we manually need to add the target fields.

clip_image070

Running through each missing field we can select the available fields from tblOutput_fields

clip_image071

Once all the fields have been assigned the query can be run and it should update the output format table

clip_image073

Now we also need to create a deletion query and add both of these queries to a macro

Now we can create a simple query on the tbl:output_format table to create the output file format and create an text file output

Having created a query that has the format of the output file you can select to export this as a text file.

clip_image075

Select Delimited and Next

clip_image077

Select the tab delimiter, include header rows and set the text qualifier to none

clip_image079

In the same way we saved our import specification we can also save our export specification in the same way under the advanced tab

clip_image081

As you will be creating multiple output files at various points in time it is worth saving the export so it can be automated

clip_image083

This query can now be run and exported as part of a macro using the RunSavedImportExport command.

clip_image084

Preparing a new load file

Now that the database has been built based on the logic and structure of the data, it is a relatively simple task to prepare a new load file from a new data extract.

Because I have chosen to always link back to original load files and have created an interim generic staging table I will always import the new extract into its own table.

I can leverage the load specification to simplify the import into the new table.

In the example below I have imported the extract file dated 09 01 11 using the specification I used for my original sample. To load this data into the generic table I will need to update the append query AP:tbl:Load:MCHB:Generic to pull from the table MCHB_2600_09_01_11

clip_image085

I could of course have imported it in to the original table (MCHB_500_sample) though I would have lost my traceability and ability to compare subsequent extracts.

The easiest way I have found to update this append query is to open it in the SQL view

clip_image087

You can then cut and paste the SQL into a text editor such as notepad

clip_image089

Then use the replace function (ctrl H) to replace the table name to the new table

clip_image090

You can then paste it back into the SQL view of the query. You can validate the change by going back into the design vie and the query will show the new table

clip_image091

At this point the macros previously created can be run in order and perform the transformation on the new data extract and create the output file automatically and almost instantaneously.

You have now understood how to set up an Access database to drive a data conversion process as a repeatable process, in part three; we will look at some specific transformation techniques that can be used to automate many of the transformations that may be required.

Post to Twitter

  1. alice
    September 28th, 2011 at 06:17
    Reply | Quote | #1

    Good stuff but would like to download. Is there a way to do this?

  2. Jim Ruggle
    October 9th, 2012 at 17:49
    Reply | Quote | #2

    Absolutely outstanding! Thanks

  3. Rita
    June 6th, 2013 at 13:33
    Reply | Quote | #3

    Great summary! Thanks.