Data Flow



The staging framework

The staging framework, together with the Task Scheduler, is a set of tools for moving and manipulating data all the way from the ERP database up to the production database. The framework lives in the staging database and is implemented in SQL. It lets users configure the data flow by creating elements and arranging those elements into runs. Each element performs a specific operation, and there are currently three types of operations:

  1. Data merge : This operation merges data from a source table or view into a destination table. The operation is given a source and a destination object name and will automatically generate and execute a merge statement. Advanced settings let you control how the merge statement is generated, but in its simplest form it finds the common columns and the primary key and uses those to generate the merge statement (a sketch of such a generated statement is shown after this list).
  2. Execute procedure : This operation will execute a procedure
  3. Server side execution : The above operations are purely SQL, but there are a few things that can’t be executed directly using SQL. What is to be executed is defined in the object column in core.stg_element. For the moment we offer two types of server side executions:
    • SCHEDULED_ORDERS : Executes scheduled orders; this type has no parameters
    • FORECASTS : Creates forecasts. There are a number of parameters you can set here. More info on that can be found here
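
To make the data merge operation more concrete, here is a simplified sketch of the kind of MERGE statement the framework generates. The table and column names (raw.location, erp_nav.Location, location_no, name) are purely illustrative; the real statement is generated from the element configuration described under core.stg_element below.

-- Simplified, illustrative sketch of an auto-generated merge statement.
-- Table and column names are assumptions for the example only.
MERGE raw.location AS tgt
USING (SELECT location_no, name FROM erp_nav.Location) AS src
    ON tgt.location_no = src.location_no                             -- joined on the primary key (or configured join columns)
WHEN MATCHED AND tgt.name <> src.name THEN
    UPDATE SET name = src.name                                       -- update rows that differ
WHEN NOT MATCHED BY TARGET THEN
    INSERT (location_no, name) VALUES (src.location_no, src.name)    -- insert new rows
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;                                                          -- or update a column instead, see data_merge_not_match_source_values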


Layers

The data flow passes through the following layers:

ERP -> RAW -> PREP -> INV | DBO

ERP Layer

Data from the customer’s ERP system. If data is provided straight from the ERP system (e.g. through a linked server), it is copied into tables in this layer.

RAW Layer

This layer defines the raw tables preferred for the AGR system. ERP layer data is mapped to the RAW layer. If the customer can provide already-mapped data, it should follow the RAW layout. In the future the ERP connector will also link straight into the RAW layer.

PREP Layer

The PREP layer is the start of AGR-specific data mapping. Many of the IDs used in AGR are generated here. The PREP layer is intended to be the layer where all AGR modules have access to the same data and ID structure. For example, the Location Financial Planner and Inventory modules should use the same location_id, even though not all locations are shared between modules. This might be the case if you want to include closed locations for the Location Financial Planner in order to keep all past sales history.

Final stg layer - INV, FP

The schema name for the final layer should represent the module the data is used for. This layer holds the final version of each table required for the system/module, which is then merged over to the prod database. All module-specific data mapping and business logic should reside here.

Inventory Example

In the Inventory module we have one table that stores information about both internal locations (warehouses, stores) and external locations (vendors). However, ERPs usually store this information in separate tables, so the RAW layer is designed to receive the data in that way (a sketch of such a PREP view is shown after the mapping below).

  • nav.location -> raw.location -> prep.location_and_vendor -> inv.location | dbo.location

  • nav.vendor -> raw.vendor -> prep.location_and_vendor -> inv.location | dbo.location
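
A hypothetical sketch of what such a PREP-layer view could look like, combining locations and vendors into one structure. The view name, column names and id handling are assumptions for illustration only, not the actual prep.location_and_vendor implementation:

-- Illustrative only: combine internal locations and external vendors into one PREP structure
CREATE VIEW prep.v_location_and_vendor_example AS
SELECT l.location_no AS source_reference,
       l.name        AS name,
       0             AS is_vendor     -- internal location (warehouse/store)
FROM raw.location AS l
UNION ALL
SELECT v.vendor_no   AS source_reference,
       v.name        AS name,
       1             AS is_vendor     -- external location (vendor)
FROM raw.vendor AS v;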

Two Module Example

If a customer has both the INV module and the FP module, you would use the same PREP layer but map it to both the INV tables and the FP tables:

  • nav.ItemCategory -> raw.PRODUCT_GROUP -> prep.product_group_column & prep.product_group_details -> Map to INV inv.item_group_columns & inv.item_group_details and FP fp.category_column & fp.category_details

  • nav.ProductGroup -> raw.PRODUCT_GROUP -> prep.product_group_column & prep.product_group_details -> Map to INV inv.item_group_columns & inv.item_group_details and FP fp.category_column & fp.category_details




STG Control Tables

The control tables to configure the AGR data flow are stored in the core schema on the STG database.

core.stg_run

Here a stg run is defined. Everything that should be run together in one scheduled task should be grouped into a stg run, for example a daily job. An example insert is shown after the column list below.

  • name (nvarchar) - required: The name you give the run, must be unique
  • description (nvarchar) - required: description of the run, is shown in the scheduler in the system. You can either hard code the description or enter a translation key that references the translations in the system
  • force_run (bit) - not required: If true, all stg elements in the run are executed even if they appear more than once, e.g. in separate groups.
  • run_stg_groups_in_sequence (bit) - required: If true, the stg elements in each group are all executed before starting the next group. If false, types across groups are executed in the run order defined in the core.stg_element_type table.
  • include_in_system_health (bit) - required: 1 if the status of the run should be included in the System Health feature on the Inventory dashboard; 0 otherwise
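
As an example (the values are illustrative and based on the columns above), a daily run could be defined like this:

-- Illustrative example: define a daily stg run
INSERT INTO core.stg_run (name, description, force_run, run_stg_groups_in_sequence, include_in_system_health)
VALUES (N'daily_run', N'Daily data transfer from ERP to AGR', 0, 1, 1);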

core.stg_group

In this table we define how we group together elements that are run as part of a stg_run. An example of a stg group is inventory_raw, which groups together all merges/procedures that need to be run to populate the RAW layer.

  • name (nvarchar) - required: The name you give the group, must be unique
  • description (nvarchar) - required: description of the stg group. You can either hard code the description or enter a translation key that references the translations in the system
  • system_health_group_id (int) - not required: if the stg_group belongs to a stg_run that should be displayed in the System Health Feature, here you enter the System Health group the elements should belong to.

core.stg_run_group

Here you link a stg_run (i.e. the daily job) to a stg_group.

  • stg_run_name (nvarchar) - required: The name of the stg run in the core.stg_run table
  • stg_group_name (nvarchar) - required: The name of the stg group in the core.stg_group table
  • run_order (int) - required: a number defining in what order the groups are run.
  • active(bit) - required: 1 the group should be run when the stg_run is executed; 0 otherwise.
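
Continuing the illustrative example from above, a group can be defined and attached to the run like this (names and run order are assumptions):

-- Illustrative example: create a group and link it to the daily run
INSERT INTO core.stg_group (name, description)
VALUES (N'inventory_raw', N'Populates the RAW layer for the Inventory module');

INSERT INTO core.stg_run_group (stg_run_name, stg_group_name, run_order, active)
VALUES (N'daily_run', N'inventory_raw', 10, 1);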

core.stg_element

In this control table the stg elements are defined. They can either be data merges (merge statements automatically generated in the stg run), fixed procedures or server side elements. For the data merges you define where you want data to go (object) and where it should come from (src_data_object). An example insert is shown after the column list below.

  • id (int) - required: the id of the stg element, must be unique
  • type_name (nvarchar) - required: the name of the stg element type the element belongs to
  • name (nvarchar) - required: The name you give the element, must be unique
  • description (nvarchar) - not required: a description of the element
  • object(nvarchar) - required: If auto merge, the object is the target table name; if you want to execute a procedure, you list the procedure name here.
  • operation_mode (int) - required: 0 - Auto detect if procedure or data merge, 1 - data merge, 2 - execute stored procedure, 3 - element executed by the application service.
  • operation (nvarchar) - autopopulated: ‘DATA_MERGE_AUTO_GENERATE’, ‘EXECUTE_PROCEDURE’ or ‘SERVER_SIDE_EXECUTION’
  • parameters (nvarchar) - not required: If executing a procedure directly you list the input parameters here
  • src_data_object (nvarchar) - not required: For autogenerated merge here list the source table/view you are merging from
  • break_on_error (bit) - required: 1 if the stg run should fail if this element fails; 0 if it should continue
  • object_full (nvarchar) - autopopulated full name of the database object
  • src_data_object_full (nvarchar) - autopopulated full name of the source database object if merge
The following columns should only be populated for elements that are auto merged:

  • data_merge_column_values (nvarchar) - Here it is possible to set values for columns, e.g. if a column does not exist in the source table. Example: column_name='value',column_name_2=0
  • data_merge_join_columns (nvarchar) - comma separated list of what columns to join on; if nothing is specified then the primary key of the table is used
  • data_merge_compare_columns (nvarchar) - comma separated list of what columns to compare in the merge statement generated. If you want to list what columns to omit place ^ in front of the list. If empty all columns available in the target table are used.
  • data_merge_update_columns (nvarchar) - comma separated list of what columns are updated in the merge statement generated, if matched and difference is in the comparison. If you want to list what columns to omit place ^ in front of the list. If empty all columns available in the target table are used.
  • data_merge_insert_columns (nvarchar) - comma separated list of what columns are used in the insert part of the merge statement generated, if not matched and present in source. If you want to list what columns to omit place ^ in front of the list. If empty all columns available in the target table are used.
  • data_merge_not_match_source_values (nvarchar) - If a row is not present in the source, you can set column values here instead of deleting the row in the target. E.g. active=0 sets the column active to 0 when the row is missing in the source.
  • data_merge_filter (nvarchar) - add a filter here if you don’t want to merge the whole table. For example for historical data you might only want to refresh the last 30 days. This filter is applied to both target and source.
  • data_merge_source_only_filter (nvarchar) - Filter only applied to source table.
  • data_merge_pre_sql (nvarchar) - not required: You can add code here that you want to run before the merge
  • data_merge_post_sql (nvarchar) - not required: You can add code here that you want to run after the merge
  • last_run_time (datetime) - not required: the last time the element was run
  • last_run_time_successful (datetime) - not required: the last time the element was run successfully
  • last_run_event_id (int) - reference to the last event that was logged for this element. See more here
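
As an illustration (the id, type name and object names are assumptions), a simple auto-merge element that fills raw.location from an ERP synonym could be registered like this:

-- Illustrative example: an auto-generated data merge element
INSERT INTO core.stg_element
    (id, type_name, name, description, object, operation_mode, src_data_object, break_on_error, data_merge_join_columns)
VALUES
    (9001, N'raw', N'raw_location', N'Merge locations from ERP into the RAW layer',
     N'raw.location', 1, N'erp_nav.Location', 1, N'location_no');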

core.stg_group_element

Here you link each stg element (merge/procedure/other) to one or more stg groups and define in what order the elements should run within the group. An example insert is shown after the column list below.

  • stg_group_name (nvarchar) - required: The name of the stg group in the core.stg_group table
  • stg_element_name (nvarchar) - required: The name of the stg element in the core.stg_element table
  • run_order (nvarchar) - required: a “number” defining in what order the elements are run within the group. It is possible to add sub-numbers (e.g. 3.2); this is useful when adding an element between two existing elements without having to shift the order of all following elements. These sub-numbers can go down many layers, e.g. 1.2.2.2.2.2, but each layer number can be at most 3 digits (999).
  • force_run (bit) - required: If true the element is always executed even if it is present in another group executed in the same run.
  • active(bit) - required: If true the element will be run when the group is executed. Default value for this column is true
  • run_serial (nvarchar) - required: Automatically Populated. This serial determines the order of execution of the elements. It is the run_order of the type of the element combined with the run_order set in this table
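
Continuing the illustrative example, linking the element into a group with a sub-numbered run_order places it between the existing elements 3 and 4 (names are assumptions):

-- Illustrative example: place the element between run_order 3 and 4 within the group
INSERT INTO core.stg_group_element (stg_group_name, stg_element_name, run_order, force_run, active)
VALUES (N'inventory_raw', N'raw_location', N'3.2', 0, 1);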

core.stg_element_type

Different stg element types, to group merge procedures to run_order.

  • name (nvarchar) - required: The name of the type, must be unique
  • description (nvarchar) - required: description of the element type
  • run_order (nvarchar) - required: a number defining in what order elements of this type are run.

core.stg_element_sql

This is not a configuration table but a table that stores a copy of the auto-generated merge statements. This can be useful for debugging purposes, e.g. if a merge is failing or is slow. If the merge statement changes, which might happen if you add a new column, a new version of the merge statement is stored, giving you the ability to view all previous versions of the merge statements.

  • stg_element_name (nvarchar) - required: The name of the staging element
  • sql_version (int) - required: The version of the merge statement. Look for the highest version number for each element to locate the current merge statement
  • sql_text (nvarchar) - required: The actual SQL merge statement.
  • insert_date (datetime) - required: The date and time the merge was first generated.
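
For example, to look up the merge statement currently being used for a given element (the element name is illustrative):

-- Fetch the latest generated merge statement for one element
SELECT TOP (1) sql_text, sql_version, insert_date
FROM core.stg_element_sql
WHERE stg_element_name = N'raw_location'
ORDER BY sql_version DESC;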




Running staging runs / running elements

The easiest way to run complete staging runs is by going to the Task Scheduler in the AGR website and pressing the play button. This will ensure that all elements are run, including the server side elements. That being said, there are cases where you will need to execute a single element manually, or a couple of them. Below is a guide on how to do so directly in the database.

Running server side element

It is not possible to run server side elements in SQL; you will need to do that in the Task Scheduler. There you can run a single element by using the partial run option.


Running one staging element

EXEC [core].[execute_stg_element] @stg_element_name = 'inv_items'

This procedure has 6 parameters:

  • @batch_id, @origin_id & @parent_id: These are the standard logging parameters. They can be ignored when running a single stg element.
  • @stg_element_name: The name of the staging element you wish to run, defined in the core.stg_element table.
  • @stg_group_name: Used internally when running runs, no need to use it when running a single element.
  • @debug: The good old debug bit. Default is false but if enabled it might print out some helpful information.

When you define a new data merge element in the core.stg_element table it is a good idea to use this procedure to test that the data flows correctly from the source to the target. If you set the debug flag to 1 you get a printout of the merge statement and a table result that shows which columns are used in the merge statement generation and how.
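
For example, to test the element from above with debug output enabled:

EXEC [core].[execute_stg_element] @stg_element_name = 'inv_items', @debug = 1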

Running a whole bunch of elements

EXEC [core].[execute_stg_run] @stg_run_name = N'daily_run'

This procedure has the following parameters:

  • @origin_id & @parent_id: These are the standard logging parameters. They can be ignored when running manually. The procedure generates a new @batch_id, and therefore it is not an input parameter as in other procedures.
  • @stg_run_name: The name of the run you wish to execute. Available runs can be found in the [core].[stg_run] table
  • @element_type: Optional - Allows you to filter the list down to an element type (layer). Available types can be found in the [core].[stg_element_type] table
  • @element_list & @group_list: If you use the above parameter, @stg_run_name, you can not use the @element_list or the @group_list. It’s an either/or scenario. If you choose to use the @element_list or @group_list you can choose to use either or both. These parameters will accept a comma separated list of element names or group names.
  • @debug: Default is false. If @debug = 1, the query will only print out the stg_elements in the result window, and not run them. This is e.g. useful to see in which order the stg_elements in a stg_run would run.
EXEC [core].[execute_stg_run] @element_list = N'inv_product,inv_product_article,inv_product_article_ready_to_prod'
EXEC [core].[execute_stg_run] @group_list = N'inventory_run_abc,inventory_data_transfer,inventory_run_orders'
EXEC [core].[execute_stg_run] @element_list = N'inv_product,inv_product_article', @group_list = N'inventory_run_abc,inventory_run_orders'

Notes:

  • The procedure will execute the elements and/or groups in the order that they appear in the list
  • If you use both the @element_list and @group_list parameters the procedure will first execute the elements and then the groups
  • If you try to execute server side elements, or if they are in the groups you are executing, the procedure will simply skip over them and give you a warning rather than an error




Multi ERP setup and table naming conventions

Multi ERP setup

Most setups connect to a single ERP system, but some setups connect to multiple ERP systems. There is support for this in the core.erp_db_info table. The best way to show this is by example. Let’s imagine we have a chain of stores in the Nordics and we need to connect to 3 ERP systems: one in Norway, one in Denmark and one in Sweden. The core.erp_db_info table would then contain one row per ERP system, each with its own multi_erp_schema_suffix (no, dk and se in this example).

Naming conventions

ERP data tables in the staging database are placed in separate schemas. For legacy reasons there is a difference between Nav setups and other ERP setups, as can be seen below:

  • Navision schemas:
    • [nav] - base tables (example: [nav].[Item])
    • [nav_cus] - custom (added) tables
    • [erp_nav] - synonyms
  • Other ERP schemas
    • [erp] - base tables
    • [erp_cus] - custom tables
    • [erp_syn] - synonyms
  • Multi ERP schemas (by example above)
    • [erp_no] / [erp_se] / [erp_dk] - base tables, each suffix for each ERP system
    • [erp_no_cus] / [erp_se_cus] / [erp_dk_cus] - custom tables
    • [erp_no_syn] / [erp_se_syn] / [erp_dk_syn] - synonyms
      • Note that the no / se / dk should match the multi_erp_schema_suffix column in core.erp_db_info

There is a procedure called core.create_erp_table (see example 1 below) that can help you create ERP tables, custom or not. We recommend that you use it when adding ERP tables.

Customisation

In older versions (6.1 and older) consultants had an almost clean slate in the staging database to customize whatever they wanted, making it almost impossible to upgrade. With the introduction of the new staging framework, which resides almost exclusively in the staging database, we needed to make sure it was in the upgrade path. This meant we needed to create better placeholders for customizations, so keep on reading if you are interested in knowing how we did just that:

The _cus schemas

To make it short and simple: If you need to customize you should do so in the appropriate _cus schema. This is best explained by showing examples:

Example 1: Adding a new ERP table

If you need to add a new ERP table we have a procedure that will help you with that. Let’s assume we are using Navision and the table we would like to add is Hagar$Additional Sales Data. What you will need to do to get the table over is call the procedure like so:

EXEC tb.create_erp_table @table = N'Additional Sales Data'

Note that you don’t need to include the company name prefix (Hagar$), it will be detected using info from the core.erp_db_info table

The procedure will do the following:

  • Locate the table in the ERP database (it doesn’t matter if the database is local or accessible via linked server) using the information found in the core.erp_db_info
  • Create the table in the staging database as nav_cus.AdditionalSalesData (note that it removes spaces) with the same columns and primary key as in the ERP database. Note that since ALL the columns from the source table are created you might want to remove unnecessary columns to speed up the data transfer.
  • Create a synonym, erp_nav.AdditionalSalesData that points to the ERP table
  • Create a merge staging element that merges the data from the ERP table to the staging table
  • Add that staging element to a group so it will be run as a part of the daily job

The tb.create_erp_table procedure has some additional optional parameters which you might find useful:

  • @erp_id: if no erp_id is specified, the first one is selected for you automatically, using information from the core.erp_db_info table
  • @stg_group_name: which group to insert the new staging element into, by default we will insert it into the inventory_data_transfer group
  • @custom: Default is true. If true, the table will be created in a _cus schema. Sometimes, when using clean setups on other ERP systems, you can use custom = 0 to create the base ERP tables
  • @execute: Default is false. If false, it will only print out the scripts for you to run; if true, it will print the scripts AND execute them
  • @postfix: A postfix for your table name
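
For example, to create the table in the default inventory_data_transfer group and execute the generated scripts right away instead of only printing them:

EXEC tb.create_erp_table @table = N'Additional Sales Data', @stg_group_name = N'inventory_data_transfer', @execute = 1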

Example 2: Changing a view without adding columns

Let’s say we want to change the [raw].[v_STOCK_LEVEL] view. This view is used to populate the [raw].[STOCK_LEVEL] table. The steps needed to customize the view are:

  • Make a copy of the original view and place it in the corresponding _cus schema, in this case: [raw_cus].[v_STOCK_LEVEL]. It is very important that you keep the same view name.
  • Customize the view

The staging element used to move data from the view to the table will automatically detect the presence of the new customized view. There is no need to change the setup of the element.

Before the customized view was created: (screenshot)

After the customized view was created: (screenshot)
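
A minimal sketch of such a customized copy follows. The column list and source table are placeholders; in practice you would start from the actual definition of [raw].[v_STOCK_LEVEL]:

-- Sketch only: a copy of the original view, with the same name, placed in the _cus schema.
-- Columns and source table are placeholders for the real view definition.
CREATE VIEW [raw_cus].[v_STOCK_LEVEL]
AS
SELECT item_no,
       location_no,
       stock_units
FROM erp_nav.ItemLedgerEntry              -- placeholder source
WHERE location_no <> N'OBSOLETE';         -- the actual customization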

Example 3: Adding columns - extra info

Check out the Extra info chapter for more info on standard extra info dataflows

Example 4: Customizing a procedure

I’m guessing that if you have read the examples above you can guess how this one goes: just make a copy of the procedure, place it in the appropriate _cus schema, and the framework will detect its presence.

Example 5: Adding new staging elements

Should the need arise to add new elements, the underlying objects should be added to the appropriate _cus schema. When adding lines to [core].[stg_element] for those new elements you should use the _cus name for the object. That way you can see that this object is an addition instead of a customization of an already existing object (an example insert is shown below). It is not strictly required, but a good idea, to also prefix the name of the staging element with cus or some other prefix to indicate that it is not part of the standard setup.
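
A hypothetical example of registering such a custom element (the id, names and group are assumptions):

-- Illustrative example: a custom element that executes a procedure from a _cus schema
INSERT INTO core.stg_element (id, type_name, name, object, operation_mode, break_on_error)
VALUES (9101, N'inv', N'cus_update_special_prices', N'inv_cus.update_special_prices', 2, 1);

INSERT INTO core.stg_group_element (stg_group_name, stg_element_name, run_order, force_run, active)
VALUES (N'inventory_data_transfer', N'cus_update_special_prices', N'50', 0, 1);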

The upgrade path

By always making a copy of the original views, tables or procedures we ensure that all customizations are easily traceable, and it also means that we can update the original objects and the framework without having to worry about messing everything up. There are, of course, pros and cons to this, but to focus on the pros:

  • Easy to find customizations
  • Easy to update
  • You always have the original un-customized object on the side as a reference.

Schema locking

To ensure that consultants don’t accidentally modify, create or drop objects in the non-customizable schemas (raw, prep, inv, core, etc.) we introduced the schema locking mechanism. This mechanism monitors all DDL requests and checks if the object being touched is in a locked schema. If so, the statement is blocked and you will get an error.

The schema locking is implemented with a database trigger (tr_ddl_core_schema_locking) and two tables (core.schema_locking & core.schema_locking_exception_list).


The schema locking is only meant to help you avoid making mistakes when customizing, but we realize there may come a time when you really need to make modifications to a locked schema. When that need arises you simply go into the [core].[schema_locking] table and disable the lock for the appropriate schema. Beware that nothing re-enables the lock automatically, so please remember to enable it again.

If there is an object that is frequently altered (possibly an auto-generated object) you can add that object to the [core].[schema_locking_exception_list] table and it will be excluded from the schema locking at all times.
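
A minimal sketch of what toggling a lock and registering an exception could look like. The column names used below (schema_name, is_locked, object_name) are assumptions made for illustration; check the actual definitions of the two tables before running anything like this:

-- Sketch only: column names (schema_name, is_locked, object_name) are assumptions
UPDATE core.schema_locking SET is_locked = 0 WHERE schema_name = N'raw';   -- unlock the raw schema
-- ... make the needed changes to objects in the raw schema ...
UPDATE core.schema_locking SET is_locked = 1 WHERE schema_name = N'raw';   -- remember to lock it again

-- Exclude a frequently altered (e.g. auto-generated) object from the schema locking permanently
INSERT INTO core.schema_locking_exception_list (object_name)
VALUES (N'raw.some_autogenerated_view');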

Notes:

  • If a schema is not in the [core].[schema_locking] table it is considered unlocked
  • In extreme cases you can disable the database trigger, which will disable the schema locking completely (see the example below)
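
If you really do need to go that far, the standard T-SQL statements for disabling and re-enabling a database-scoped DDL trigger look like this (the trigger name is the one mentioned above):

-- Disable the schema locking DDL trigger completely (extreme cases only)
DISABLE TRIGGER tr_ddl_core_schema_locking ON DATABASE;
-- ... perform the changes ...
-- Re-enable it afterwards
ENABLE TRIGGER tr_ddl_core_schema_locking ON DATABASE;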