
Splitting an input row into multiple outputs based on input conditions
It is often necessary to split input data into multiple outputs according to given criteria, for instance, splitting customer data by region, as in this example, or by team. Another very common example is splitting the input data into validated records and records that have been rejected because they failed a quality check (see Checking a column against a list of allowed values in Chapter 3, Validating Data, for examples of using tMap to filter invalid rows).
This recipe shows how tMap output expression filters are used to perform the kind of filtering described above.
Getting ready
Open the job jo_cook_ch04_0060_multipleOutputs.
How to do it...
- When you open tMap, you will see three identical output tables.
- Click the Expression filter button for the table UK to open an expression field, as shown in the next screenshot.
- Drag the input column countryOfBirth into this box.
- Add .equals("UK") to the end of the expression to give the expression: customer.countryOfBirth.equals("UK")
- Your table should now look like the following:
- Repeat the same for the USA table to give the expression: customer.countryOfBirth.equals("USA")
- Click the tMap settings button for the final table, restOfWorld, to open the table properties.
- Set Catch output reject to true, as shown in the following screenshot:
- Exit tMap and run the job to see the results.
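The filter expressions in this recipe are ordinary Java boolean expressions, so customer.countryOfBirth.equals("UK") would throw a NullPointerException if countryOfBirth were null. The following is a minimal sketch, assuming the column can be null in your data; the class and method names are hypothetical and exist only to illustrate the two forms of the comparison.

```java
class CountryFilterSketch {

    // Same shape as the recipe's expression: fails with a
    // NullPointerException when the column value is null.
    static boolean recipeStyle(String countryOfBirth) {
        return countryOfBirth.equals("UK");
    }

    // Literal-first form: a null column value simply fails the test.
    static boolean nullSafe(String countryOfBirth) {
        return "UK".equals(countryOfBirth);
    }

    public static void main(String[] args) {
        System.out.println("UK matches:   " + nullSafe("UK"));
        System.out.println("USA matches:  " + nullSafe("USA"));
        System.out.println("null matches: " + nullSafe(null));
        try {
            recipeStyle(null);
        } catch (NullPointerException e) {
            System.out.println("recipeStyle(null) threw NullPointerException");
        }
    }
}
```

In the Expression filter field, the literal-first form would be written as "UK".equals(customer.countryOfBirth). With that form, rows with a null countryOfBirth fail the UK and USA filters and are treated like any other non-matching row.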
How it works…
tMap works through the output table list from the top downwards, passing each input row to an output according to that output's settings.
tMap will only pass a row to an output if one of the following is true:
- The output has no filter expression and is not a catch output reject
- The output has a filter expression, is not a catch output reject, and the condition is met
- The output is a catch output reject with a filter expression, the row has been rejected by the previous outputs, and the condition is met
- The output is a catch output reject with no filter expression and the row has been rejected by the previous outputs
It is sometimes easier to think of this list as a set of if-then-else criteria.
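The rules above can be sketched in plain Java. This is a simplified model, not the code that tMap generates: the Output class, the route method, and the recipeOutputs helper are hypothetical, and the sketch assumes that a row counts as rejected until an earlier output that is not a catch output reject has accepted it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class TMapRoutingSketch {

    // Hypothetical model of one tMap output table; this is not a Talend class.
    static class Output {
        final String name;
        final Predicate<String> filter; // null means "no filter expression"
        final boolean catchReject;

        Output(String name, Predicate<String> filter, boolean catchReject) {
            this.name = name;
            this.filter = filter;
            this.catchReject = catchReject;
        }
    }

    // Returns the names of the outputs that receive a row, checking the
    // outputs from the top of the list downwards.
    static List<String> route(String countryOfBirth, List<Output> outputs) {
        List<String> receivers = new ArrayList<>();
        boolean rejected = true; // assumption: rejected until a normal output accepts the row
        for (Output out : outputs) {
            boolean conditionMet = out.filter == null || out.filter.test(countryOfBirth);
            if (out.catchReject) {
                // Catch output reject: only rows rejected so far, and only if any filter passes.
                if (rejected && conditionMet) {
                    receivers.add(out.name);
                }
            } else if (conditionMet) {
                // Normal output: no filter, or the filter condition is met.
                receivers.add(out.name);
                rejected = false;
            }
        }
        return receivers;
    }

    // The three outputs configured in this recipe.
    static List<Output> recipeOutputs() {
        return Arrays.asList(
            new Output("UK", c -> "UK".equals(c), false),
            new Output("USA", c -> "USA".equals(c), false),
            new Output("restOfWorld", null, true));
    }

    public static void main(String[] args) {
        for (String country : Arrays.asList("UK", "USA", "France")) {
            System.out.println(country + " -> " + route(country, recipeOutputs()));
        }
    }
}
```

With the recipe's three outputs, a UK row reaches only UK, a USA row reaches only USA, and any other row reaches only restOfWorld, which is the if-then-else behavior described above.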
Tip
It is recommended that the list of outputs be ordered like an if-then-else chain to make it easier to understand. It is also recommended that multiple tMap components be used where many outputs are created from complex conditions. It is not that tMap cannot handle a high level of complexity; rather, the impact of changes may be difficult to assess if there are many inputs, outputs, joins, and conditions.
There's more…
In this recipe, we have multiple copies of the input being created using input criteria. It is worth noting that the outputs do not need to be copies of each other.
It is also worth noting that if no criteria are specified for any output, then tMap will copy every input row to every output. What's more, each of the outputs can have a different format and different rules for the same input row. In this instance, tMap becomes a means of creating multiple different views of the same input data.
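As an illustration of the idea of multiple views, the following minimal Java sketch maps one input row to two differently shaped outputs with no filter on either. The Customer class, its name column, and the two view methods are hypothetical; only countryOfBirth appears in this recipe.

```java
import java.util.Locale;

class MultipleViewsSketch {

    // Hypothetical input row; the name column is assumed for illustration.
    static class Customer {
        final String name;
        final String countryOfBirth;

        Customer(String name, String countryOfBirth) {
            this.name = name;
            this.countryOfBirth = countryOfBirth;
        }
    }

    // View 1: keeps both columns as a semicolon-delimited line.
    static String toFullView(Customer c) {
        return c.name + ";" + c.countryOfBirth;
    }

    // View 2: keeps only the country, normalized to upper case.
    static String toCountryView(Customer c) {
        return c.countryOfBirth.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Customer row = new Customer("Ann", "uk");
        // With no filters, the same row feeds every view.
        System.out.println(toFullView(row));
        System.out.println(toCountryView(row));
    }
}
```

Each method plays the role of one output table: the same input row is read by both, but each applies its own column selection and rules.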
It is also possible to set catch output reject on more than one output. This means that multiple views of rejected data can also be created.