Working With Bulk Data


First published on: 06/18/2024/8:49 am

 

 

This page offers guidelines for sending large amounts of data using Saltbox.

 

Saltbox was purpose-built as a cloud-based integration platform as a service, and most integration scenarios involve sending smaller "snippets" of data (e.g., sales documents and master data updates like business partners or items/parts) between systems.

 

A typical sales document from an ERP system in XML format is approximately 3 MB of data. A CSV file with 120,000 records and 10 fields is approximately 10 MB.

 

The limit for a single Saltbox message is 50 MB.

 

However, this doesn't necessarily mean you can send 50 MB messages consistently and expect good workflow performance. Performance will always vary, depending on the scenarios and actions the workflow is undertaking. Another limiting factor is external systems' API performance. For example, you may be able to generate a message that is 50 MB in Saltbox, but the 3rd-party API might not be able to accept that volume of data in one go or may take a long time to process the request. (This is not a Saltbox issue but rather an API performance issue.)

 

Before building workflows, consider these questions:

 

  • How large is the data I'm transmitting in one go?
  • How frequently does this need to happen?
  • If the data is large, does it need to be transmitted in one go, or can we break it down into batches?
  • Do I need to transform the data?

 

Where possible, break large amounts of data down into smaller batches. There are many ways to do this and to simultaneously process multiple batches, so although the overall process time is the same, it reduces resource usage on the Saltbox Tenant.

 

There are multiple connectors and workflow options that let you do this.

 

Method 1: Using connector options and paging to create multiple messages.

 

Some connectors, like SAP Business One, let you use queries as the data source and let you split the query data into several batches to process as different messages using "Paging".

 

Below, if our query returned 2999 records, we would set the page size to 1000. This would generate three Saltbox messages: two messages with 1000 records each and a third message with the remaining 999 records.

 

 

Method 2: Using the trigger option "Data Field Used to Sub Divide Transactions".

 

This takes the data received by the workflow trigger and breaks it down into multiple workflow messages for individual processing.

 

Say the workflow trigger receives this data: 

 

<Root>

<Order>

<OrderNo>1</OrderNo>

<Customer>Vision33 Inc</Customer>

<Order>

<Order>

<OrderNo>2</OrderNo>

<Customer>Another Company</Customer>

<Order>

</Root>

 

That indicates two orders that can be processed separately.

 

Entering the XPATH to node where we want to split the data will result in two workflow messages. (See an example of splitting that data below.)

 

 

 

The "Split Query Results by" field lets you switch from a single message for all the query results to a single message per row or record of data within the query or "Column Groups" option. You can define a column from your query where, if the value changes, a new message will be generated.

 

 

Method 3: Within a workflow, loop through a node set and "route" the mode to another workflow to process in smaller batches.

 

Data can be broken down within a workflow using a "For Each Loop". However, loops have their own performance issues, as the amount of work a workflow does within a loop can increase the resources consumed exponentially. We recommend splitting data using a loop and then using "Route" to send that data to another workflow for processing.

 

For example, XML like this:

 

<Root>

<Order>

<OrderNo>1</OrderNo>

<Customer>Vision33 Inc</Customer>

<Order>

<Order>

<OrderNo>2</OrderNo>

<Customer>Another Company</Customer>

<Order>

</Root>

 

Can be split into batches of:

 

<Order>

<OrderNo>1</OrderNo>

<Customer>Vision33 Inc</Customer>

<Order>

 

Using a loop with the following settings:

 

 

This batch of data inside a loop is in the "Current Iteration" message, which can be used as the input message to a "Route to Another Workflow" action.

 

 

 

Method 4: Count the number of records and use a variable counter incrementing in a determined batch size and route that batch to another workflow.


 


DO NOT use Visual Data Mapper to transform large record sets in one go.

 

 

 

 

-