LLamasoft Supply Chain Blog


Model Building: Approaches to Automation

By Dan Kogan  January 18, 2019

An early step in the typical model-building process is identifying data requirements and tracing them back to data sources.  Though it is generally good advice to start with the end in mind, tracing Supply Chain Guru tables and fields back to various data sources can be overwhelming, especially if you are unfamiliar with either the SCG tables or the organization’s source and reporting systems – or both.

The good news is that you can maximize coverage of the model requirements through a minimal amount of source data.  This approach will help you simplify your model building automation and accelerate your time to value.

Data Sourcing and Standard Data Objects

The following identifies some required, probable, and possible data sources useful in automating model-building from raw data.

Required sources:

  • Shipment History – 80-90% of a distribution optimization model can generally be built from transaction-level shipment history. This shipment history can serve as a master database of network flows, which can be decomposed and distilled into the basic entities – locations and products – of a supply chain model.  From these entities, any applicable relationships – current or future state – can be modeled accordingly.

Probable sources:

  • Locations – master data such as a customer master, vendor master, or facilities table can serve as a source to reconcile site names from the transactional shipment history, supplement attribution where required (e.g. address, sales channel, site function, capacity), and limit the transactional data to the subset of sites within scope of the modeling exercise
  • Products – master data such as an item master can map SKU numbers to other levels of the product hierarchy, supplement attribution not directly available in the transactional shipment history (e.g. physical characteristics, units of measure, inventory class), and limit the transactional data to the subset of products within scope of the modeling exercise

Possible sources:

  • Production capability – because history may not capture everything that is possible, it may be relevant to incorporate a data source which enumerates the feasible site-product combinations for production. This is more likely required for production-focused models.
  • Inventory history – optional supplementary source for ‘snapshot’ of starting inventory, especially useful for multi-period models
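The decomposition described above can be sketched in a few lines. This is a minimal illustration, not actual Supply Chain Guru schema – all field and site names are hypothetical:

```python
# Hypothetical sketch: distilling transaction-level shipment history into the
# basic entities (locations and products) of a supply chain model.
shipments = [
    {"origin": "DC_CHI", "destination": "CUST_001", "product": "SKU_A", "qty": 100},
    {"origin": "PLANT_ATL", "destination": "DC_CHI", "product": "SKU_A", "qty": 500},
    {"origin": "DC_CHI", "destination": "CUST_002", "product": "SKU_B", "qty": 40},
]

# Every origin and destination observed in the flows becomes a candidate site.
sites = sorted({s["origin"] for s in shipments} | {s["destination"] for s in shipments})

# Every product shipped becomes a product entity.
products = sorted({s["product"] for s in shipments})

print(sites)     # all distinct locations observed in the flows
print(products)  # all distinct products observed in the flows
```

From these two entity sets, any applicable relationships – current or future state – can then be enumerated.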


Network Flows

Depending on the scope of your models, you can append – or union – inbound, outbound, and intercompany shipments into a master table.  This master ‘network flows’ table should be normalized so that all fields share the same definitions.  The following identifies some required, probable, and possible fields to include in your shipment history query/network flows table:

Required fields:

  • Origin – the pickup location
  • Destination – the delivery location
  • Product – the material shipped. Sometimes ‘shipment history’ omits product-level detail, requiring a merge – or join – of shipments and orders.  If this is the case, ensure the report contains the relevant reference ID to join shipment data with product/order-level data.  Alternatively, sales history may be a valid proxy for shipment history if it contains the same required fields.  In some instances, the actual material won’t be important and a base unit of measure can suffice (e.g. case, pallet)
  • Quantity / Weight / Cube – depending on the key unit of measure for your organization, you will want to include one or more of these. If using product-level shipment history, you can derive the weight or cube per unit if you have the total weight or cube and the quantity of units.

Probable fields:

  • Mode – if mode decision will be a variable in your models or if cost structure or basis differs significantly among modes used, you may want to incorporate this element
  • Date – shipment transaction date; this field may not be strictly required if your model will be a single-period model, but it is best practice to incorporate and should be easy to obtain from most data sources

Possible fields:

  • Transportation Cost / Sourcing Cost – if transportation and/or sourcing costs will be drivers in your model, they can be incorporated into this table for a ‘top-down’ approach; however, it is not uncommon to have to query these from separate sources and integrate them into the data workflow in a ‘bottom-up’ approach
  • Shipment IDs – if shipment data is broken out by product/shipment lines, a shipment ID provides a mechanism for grouping and allocating across shipment characteristics (e.g. allocating shipment-level transportation cost by product line)
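The union of inbound and outbound extracts into one normalized table, along with deriving weight per unit from shipment-level totals, can be sketched as follows. Column names are illustrative assumptions, not an actual extract layout:

```python
# Hypothetical sketch: appending (unioning) inbound and outbound extracts into
# a single normalized 'network flows' table, deriving weight per unit where
# only shipment-level totals are available.
inbound = [
    {"origin": "SUPPLIER_X", "destination": "PLANT_ATL", "product": "RAW_1",
     "qty": 1000, "total_weight": 25000.0},
]
outbound = [
    {"origin": "DC_CHI", "destination": "CUST_001", "product": "SKU_A",
     "qty": 100, "total_weight": 1500.0},
]

network_flows = []
for record in inbound + outbound:   # append / union the extracts
    row = dict(record)
    # Derive weight per unit from the shipment-level total and unit quantity.
    row["weight_per_unit"] = row["total_weight"] / row["qty"]
    network_flows.append(row)

print(network_flows[0]["weight_per_unit"])  # weight per unit of RAW_1
```

In practice each extract would first be renamed to a shared column set so that the union is valid, which is the normalization step described above.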

Data Validation and Adjustments

Referential integrity:

Source data is seldom perfect, or at least seldom aligns perfectly to your model construct.  Records can be mis-keyed or duplicated.  Automation can check for duplication and for referential integrity – that no entities are missing among related tables.  For instance, the site name on inbound shipment data may differ from the name of the same site on intercompany shipment data.  SKU numbers and site IDs are typically more controlled than text fields like product description or customer name, so at this stage, you may also automate reconciliation against master data, such as item master, customer master, or vendor master data.  Automation can further segregate and report records which fail duplication and referential integrity checks so that they can be treated or excluded from the model.  The process can even be halted if invalid records breach a certain threshold.
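A minimal sketch of these checks, assuming illustrative record layouts and an invented failure threshold, might look like this:

```python
# Hypothetical sketch of automated duplication and referential-integrity
# checks: flag duplicate shipment lines and records referencing sites missing
# from master data, then halt if failures breach a threshold.
site_master = {"DC_CHI", "PLANT_ATL", "CUST_001", "CUST_002"}
shipments = [
    ("PLANT_ATL", "DC_CHI", "SKU_A", 500),
    ("DC_CHI", "CUST_001", "SKU_A", 100),
    ("DC_CHI", "CUST_001", "SKU_A", 100),   # exact duplicate line
    ("DC_CHl", "CUST_002", "SKU_A", 50),    # mis-keyed origin site
    ("DC_CHI", "CUST_002", "SKU_A", 75),
]

seen, duplicates, orphans = set(), [], []
for row in shipments:
    if row in seen:
        duplicates.append(row)              # duplication check
    seen.add(row)
    origin, destination = row[0], row[1]
    if origin not in site_master or destination not in site_master:
        orphans.append(row)                 # referential-integrity check

failure_rate = (len(duplicates) + len(orphans)) / len(shipments)
THRESHOLD = 0.5                             # illustrative tolerance
if failure_rate > THRESHOLD:                # optionally halt the workflow
    raise ValueError(f"{failure_rate:.0%} of records failed validation")
```

Flagged records would then be segregated for treatment or exclusion rather than silently dropped.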

As a best practice, any data clean-up, naming convention reconciliation, or additional attribution should be applied to the network flows table to ensure referential integrity among the downstream tables. The objective is to make this master shipments table comprehensive and clean so that it can serve as the primary – if not the sole – source for all subsequent ‘downstream’ tables required for the model.  Branching all other tables from the network flows table sidesteps some common pitfalls of tables becoming ‘out-of-sync.’  By performing the bulk of any data updates and transformations in this ‘upstream’ table, you can keep tight control of any changes and ensure synchronized propagation to downstream tables.

Flow imbalances:

Some of the most common causes of infeasibility in network optimization models stem from imbalances in the historic flow of products and missing links in your network.  Automation can check source data for inbound/outbound flow imbalances at the product-location level and reconcile them against site-level production capabilities and inventory levels.  Imbalances can be reviewed for legitimacy and treated by seeding initial inventory levels as applicable.
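The imbalance check can be sketched as a net-flow calculation per site-product combination. The data and the production-capability set are hypothetical:

```python
# Hypothetical sketch: checking inbound/outbound flow balance at the
# site-product level. A site shipping out more than it received, with no
# production capability, signals missing data or starting inventory to seed.
from collections import defaultdict

flows = [
    ("PLANT_ATL", "DC_CHI", "SKU_A", 500),   # (origin, destination, product, qty)
    ("DC_CHI", "CUST_001", "SKU_A", 450),
    ("DC_CHI", "CUST_002", "SKU_A", 120),    # DC ships 570 but received only 500
]
production_sites = {("PLANT_ATL", "SKU_A")}  # known make capabilities

balance = defaultdict(float)                 # (site, product) -> net inbound
for origin, dest, product, qty in flows:
    balance[(origin, product)] -= qty
    balance[(dest, product)] += qty

# Deficits at non-production sites are candidates for seeded initial inventory.
seed_inventory = {key: -net for key, net in balance.items()
                  if net < 0 and key not in production_sites}
print(seed_inventory)   # {('DC_CHI', 'SKU_A'): 70.0}
```

Production sites are excluded because their outbound surplus is legitimately explained by what they make rather than by missing inventory.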


Aggregation:

It is common to simplify your model through aggregation.  For instance, you may aggregate products from a very granular SKU or material level into groups which share common characteristics.  If your aggregation grouping relies on characteristics that are not embedded in the SKU code or material name, you may want to join these attributes (such as the product hierarchy) from the item master or other source to the SKU-level master shipment history.  From there, you can create an aggregation key, such as a concatenation of Product Division and Product Family.  Similarly, you may join attributes such as location details or sales channel to the customer destinations in the master shipment file.  For instance, a hypothetical aggregation key for customers could be [Country]_[Postal Code]_[Sales Channel].  Of course, more complex and multi-layered hierarchical aggregation keys can also be automated, such as aggregating the SKUs which comprise the bottom 20% of volume into “Other Low Volume Product.”

If your aggregation keys are fully defined on the master shipment table, you can aggregate across all applicable dimensions – products and sites – to prevent repeating this step on each subsequent table.  If aggregation strategies are subject to change for different models (for instance a customer-focused model vs. a supplier-focused model), you can create multiple aggregation keys as various options to employ depending on model needs.  These aggregation keys can be selected and modified ‘on-the-fly’ in Data Guru using various mechanisms.
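Building these keys on the master shipment table can be sketched as below. The key formats mirror the examples in the text; the master-data lookups and all names are illustrative:

```python
# Hypothetical sketch: building aggregation keys on the master shipment table
# after joining product-hierarchy and customer attributes from master data.
item_master = {
    "SKU_001": {"division": "Beverages", "family": "Juice"},
    "SKU_002": {"division": "Beverages", "family": "Juice"},
}
customers = {
    "CUST_A": {"country": "US", "postal": "60601", "channel": "Retail"},
}

shipments = [{"product": "SKU_001", "destination": "CUST_A", "qty": 10},
             {"product": "SKU_002", "destination": "CUST_A", "qty": 5}]

for row in shipments:
    item = item_master[row["product"]]
    cust = customers[row["destination"]]
    # Product aggregation key: concatenation of Division and Family.
    row["product_key"] = f'{item["division"]}_{item["family"]}'
    # Customer aggregation key: [Country]_[Postal Code]_[Sales Channel].
    row["customer_key"] = f'{cust["country"]}_{cust["postal"]}_{cust["channel"]}'

# Both SKUs roll up to the same aggregate product.
print({r["product_key"] for r in shipments})   # {'Beverages_Juice'}
```

Adding a second or third key column in the same pass is how multiple aggregation strategies can coexist on one table.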


Entity Tables – Sites and Products

With the master shipment history in a stable state, it can be distilled into the applicable entity tables.  These entities are the sites and products from which modeling relationships can be formed.

If the master shipment history is comprehensive, additional transformations should be quite limited at this stage.  For your sites, you may wish to merge fixed operating costs and capacities from an additional data source.  If you have sites or products which weren’t in the master shipment history (e.g. new products, potential sites), you can query these from your master locations and products tables, respectively.

You can also geocode – assign geographic coordinates via location references such as postal code – the sites using Data Guru to eliminate this manual step in Supply Chain Guru.  Since the workflow will be executed repeatedly, you can incorporate logic to only geocode new sites incremental to the model and cache the coordinates of previously geocoded sites.


Demand

Demand is the ‘pull’ factor which drives the model.  Demand is customer- and product-specific, but origin-agnostic, as optimization is frequently used to determine from where demand will be satisfied.  To that end, demand can simply be an aggregation or grouping of the master shipments table by customer-product combination – using the applicable aggregation keys – and summing the corresponding quantity.  For forward-looking scenarios, the Demand table contains additional sets of records derived from advanced business analytics, such as forecasts from the best-fitting statistical model or machine learning algorithm.
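The grouping described above can be sketched in a few lines, with illustrative field names:

```python
# Hypothetical sketch: deriving the Demand table by grouping the master
# shipments table by customer-product combination and summing quantity.
from collections import defaultdict

shipments = [
    {"destination": "CUST_A", "product": "SKU_1", "qty": 100},
    {"destination": "CUST_A", "product": "SKU_1", "qty": 50},
    {"destination": "CUST_B", "product": "SKU_1", "qty": 25},
]

demand = defaultdict(float)
for row in shipments:
    # Origin is deliberately dropped: demand is origin-agnostic.
    demand[(row["destination"], row["product"])] += row["qty"]

print(dict(demand))   # {('CUST_A', 'SKU_1'): 150.0, ('CUST_B', 'SKU_1'): 25.0}
```

With aggregation keys in place, the grouping columns would simply be the customer and product keys rather than the raw identifiers.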

Relationship Tables – Inventory, Sourcing, Production, and Transportation Policies

After you have decomposed the master shipment table into the necessary entities, you can build the appropriate relationships among these entities for use in the model.  A common approach is to create ‘all-to-all’ combinations using a Cartesian join, also known as a cross join.  A cross join defines no join criteria, enumerating all possible combinations.  Sometimes ‘all-to-all’ combinations are not desirable – for instance, customer-to-customer shipment lanes – so applicable filters and groupings can be used to control the cross join methodology.
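A cross join with filters can be sketched as follows; the site list and the filter rule (customers never act as origins) are illustrative:

```python
# Hypothetical sketch: enumerating 'all-to-all' candidate lanes with a cross
# (Cartesian) join, then filtering out undesired combinations such as
# customer-to-customer lanes. itertools.product performs the cross join.
from itertools import product

sites = [("DC_CHI", "DC"), ("DC_DAL", "DC"),
         ("CUST_A", "Customer"), ("CUST_B", "Customer")]

lanes = [(o, d) for (o, o_type), (d, d_type) in product(sites, sites)
         if o != d                    # no self-lanes
         and o_type == "DC"]          # filter: customers never ship outbound
print(lanes)
```

The same pattern extends to three-table cross joins (source site, destination site, product) for sourcing policies.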


  • Inventory Policies – these relationships are at the site-product level and are needed for any valid combination where inventory could be held – current state or future state
  • Sourcing Policies / Production Policies – these relationships define how material is sourced and/or made.
    • If the material is being made, this is a combination of production sites and the applicable products, again pending the validity of current and future states.
    • If the material is being procured, this is a combination of source site and destination site and product. Cross joins can be performed using more than two input tables.
  • Transportation Policies – transportation policies are similar to sourcing policies, although they are not required to be product-specific. Additionally, if transportation costs were not incorporated into the master shipment table, they can be joined at this stage if there is a corresponding rate matrix by origin-destination-mode.


Baseline Flow Constraints

As a refresher, in addition to acting as a base from which to alter the supply chain elements under consideration, the baseline model also serves as the ‘control group’ of the experiment from which to validate and ensure proper calibration against operational realities.

The trouble with optimization software is that it will provide an optimal solution!  Since your model is built using ‘all-to-all’ combinations to allow for future state scenarios, you will need a mechanism to force – or constrain – the model to mirror the sub-optimal decisions and inefficiencies of past policies.  Modeling with constraints allows us to ‘tell’ the model how to behave as opposed to ‘asking’ what is optimal, effectively converting Supply Chain Guru from a solver into a calculator.  This will allow for verification of a complete model and validation of costs and flows against financial and operational metrics.

You can use flow constraints to ‘lock’ the flows which occurred historically by applying a specific flow amount to an origin-destination-product combination.  But where can you get historic flow amounts by origin-destination-product?  From our master shipments table, of course!  The baseline flow constraints can leverage a copy of the master shipments table.
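Deriving those constraints from the master shipments table can be sketched as below; the fixed-flow representation (minimum equal to maximum) and all field names are illustrative assumptions:

```python
# Hypothetical sketch: turning the master shipments table into baseline flow
# constraints that lock each origin-destination-product combination to its
# historical amount.
from collections import defaultdict

shipments = [
    ("PLANT_ATL", "DC_CHI", "SKU_A", 300),
    ("PLANT_ATL", "DC_CHI", "SKU_A", 200),
    ("DC_CHI", "CUST_A", "SKU_A", 450),
]

historic = defaultdict(float)
for origin, dest, prod, qty in shipments:
    historic[(origin, dest, prod)] += qty    # total historic flow per lane

# Each historic lane becomes a fixed-flow constraint (min = max = historic qty).
flow_constraints = [{"origin": o, "destination": d, "product": p,
                     "min_flow": q, "max_flow": q}
                    for (o, d, p), q in historic.items()]
print(flow_constraints)
```

Relaxing a scenario then amounts to dropping or widening the relevant subset of these constraint records.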

Depending on the business question at hand, some or all of these constraints will be relaxed during the scenarios.  For instance, if the model is focused on optimizing DC-customer assignments, a scenario may remove or relax the DC-customer flow constraints, but retain the supplier to plant constraints. 

Supply Chain Guru Model Tables

At this point in the process, you should have the necessary tables – Sites (Customers, Facilities), Products, Demand, Inventory Policies, Sourcing Policies, Production Policies, Transportation Policies, and Flow Constraints – which can populate a Supply Chain Guru model database.  In short order, you should be able to validate your baseline and begin relaxing…your flow constraints to optimize your supply chain design.

Like this article? There is more to explore! If you are a current customer, please visit our support site for more practitioner insights. Future customers can request a demo here.