Join (Merge columns, one-to-many template)

The Join building block merges columns.

For each row in the Primary dataset, it searches for a matching row in the Secondary dataset.

One-to-One vs One-to-Many

One-to-One, meaning one row in the Primary dataset corresponds to one row in the Secondary dataset.

One-to-Many, meaning one row in the Primary dataset corresponds to many rows in the Secondary dataset.

When your datasets fall into the first case, Join behaves like the Lookup building block. In the second case (One-to-Many), it behaves differently.

Example

Let's look at an example.

In the rev table, there is one row for each date.
The costs table has two rows for each date
The costs table also has a row for a date that is not in the rev table.

As a result, for each row in rev, two matching rows will be found in costs.

In the illustration, you can see the resulting dataset. This dataset has three drawbacks:

The rev rows are duplicated for each matching costs row.
Duplication results in doubling the Revenue amount, which can cause confusion.
The fifth row from costs is "missing". As a result a total for Costs is less than in raw data, which can cause confusion.

There are several approaches to address these issues.

The rev rows are duplicated for each matching costs row.

If you don't need this behavior, use the Lookup block instead.

Duplication results in doubling the Revenue amount

Use the "Apply pro-rata distribution for metrics" option.

Conduit divides the Revenue metric by the number of rows in the Costs dataset.

The fifth row from costs is "missing"

To ensure that the totals for all metrics match the totals in the raw data, you need to include the 5th row (from Costs) in the report as well.

To achieve this, use the "Right join" option.

Template

This template demonstrates how to merge columns from two datasets that have the different number of rows. You can replace the Pull blocks with your own data sources.

See also

Tags