See also

27 Nov 2024 08:40 PM

Map JSON to CSV

Remote APIs provide data in JSON format, while Conduit's internal API requires flat table inputs. To convert JSON into a flat table, you must define the process for interpreting and mapping the JSON structure.

Let's consider an example. An HTTP API returns a list of SpaceX rocket launches in JSON format:

[
  {
    "capsule_serial": "C101",
    "capsule_id": "dragon1",
    "status": "retired"
  },
  {
    "capsule_serial": "C102",
    "capsule_id": "dragon1",
    "status": "retired",
  }
]

We need to transform this data into a flat list.

capsule_serial capsule_id status
C101 dragon1 retired
C102 dragon1 retired

Define the mapping from JSON to CSV columns in the "mapping" section:

"mapping": {
    "columns": [
      {
        "accessor": "capsule_serial",
        "name": "capsule_serial",
        "type": "STRING"
      },
      {
        "accessor": "capsule_id",
        "name": "capsule_id",
        "type": "STRING"
      }
    ],
    "rows_accessor": "$.[*]"
  }
  • accessor: name of the JSON element (more about JSONpath)
  • name: name of a CSV column
  • rows_accessor: iterator on elements in JSON

Click the Test button. Now you can see the resulting table in the “CSV(Extracted Data)” panel.
clipboard-2024-11-20-21-36-15-410Z.png

JSONPath

To map data from JSON to CSV, we use a subset of the JSONPath standard: https://goessner.net/articles/JsonPath/. To experiment with JSONPath, you can use the site https://jsonpath.com/.

Let's say we have the following JSON:

{
  "people": [
    {
      "name": "Max",
      "city": "San Francisco",
      "phone_numbers": [
        {"number": "800-999-xxxx"}
      ]
    },
    {
      "name": "Alex",
      "city": "San Francisco",
      "phone_numbers": [
        {"number": "800-991-xxxx"}
      ]
    }
  ]
}

To build the CSV table from this JSON we need rows_accessor. We want to build a list of people. Therefore, for the rows, we need to specify $.people[*]

  1. from the root element $
  2. we access the people element
  3. which is an array []
  4. and we need all * elements of the array.

Next, we write expressions inside the array elements to define the columns.

CSV column name JSONPath Result
name name Max, Alex
city city San Francisco,San Francisco
phone phone_numbers[0].number 800-999-xxxx, 800-991-xxxx

Nested Columns

Accessing nested columns. In the following example , how can I retrieve the values of the nested product or plan details?

{
    "subscription_id": 1,
    "subscriber_code": "AAAAA001",
    "product": {
      "name": "Product 1",
      "id": 1,
      "ucode": "41a5538b-20f6-4948-819f-9bd15aad6f21"
    },
    "accession_date": 1591291899000,
    "plan": {
      "recurrency_period": 30,
      "max_charge_cycles": 10,
      "name": "Plan 1",
      "id": 1
    },
    "status": "ACTIVE"
  }

To read data from this JSON you will need to use JSONPath.

$.product.name
$.product.id
$.product.ucode

$.plan.recurrency_period
$.plan.name

You can test JSONpath expressions here: https://jsonpath.com/

Nested Arrays

To illustrate, consider the following JSON structure:

[
    {
        "id": 1,
        "commissions": [
            {"commission": 1},
            {"commission": 2}
        ]
    },
    {
        "id": 2,
        "commissions": [
            {"commission": 3},
            {"commission": 4}
        ]
    }
]

Before determining the appropriate JSONPath, we first need to decide on the flat table layout we want to achieve. Here are some options:

  1. List all commissions in one column, separated by commas.
  2. Create one column for the 1st commission.
  3. Create two columns if there are exactly two commissions.
  4. Separate tables linked by a key.

Options 2 and 3 are preferred due to their ease of implementation and user-friendly layout.

Option 1: List all commissions in one column, separated by commas

id commissions
1 1,2
2 3,4

This scenario isn’t supported.

Option 2: Create one column for the 1st commission

id commission_1
1 1
2 3

We support this option by using an index in JSONPath, like $..commissions[0].commission.

Option 3: Create two columns for two commissions.

id commission_1 commission_2 commission_3
1 1 2 0.21
2 3 4 0.48

This scenario is supported.
commissions[0].commission.value
commissions[1].commission.value
commissions[2].commission.value

I inserted these columns to configuration

Option 4: What if the number of commissions is unknown, making it unclear how many columns to create?

In this case, we can split into two linked tables:

Table 1: id

id
1
2

Table 2: id and commissions

id commissions
1 1
1 2
2 3
2 4

This scenario is supported, but it requires two separate integrations: one for each table.
For the second table use $..commissions[*].commission

Data Types

Source JSON example Conduit’s Datatype
"published": true BOOL
"price": 2.1 MONEY
"value": 2.1 DECIMAL
"value": 42 INTEGER
"share": 2.1 PERCENT
"name": "Flow 1" STRING
"created_by": "620a104f-d233-4389-a335-73b43bc36902" STRING
"created_at": "2021-04-05" DATE
"created_at": "2021-04-05T16:21:09Z" DATETIME
"createdAt": 1719246818000 DATETIME, also Use additional JSON element "convert_from": "UNIXEPOCHMILI"
close