AI-powered Text Extraction

Introduction to Conduit's AI-powered Text Extraction

Scanned documents often yield unstructured text that is hard to work with. Conduit addresses this with AI capabilities for post-processing scanned text, so that once your documents are scanned you can go beyond mere recognition.

Imagine you have the recognition results for various scanned documents, with the recognized text stored in table cells. Your goal is to transform this unstructured text into a table, with the essential information organized into separate columns.

Key Post-Processing Capabilities:

  • Data Structuring: Conduit's AI assistant effortlessly transforms raw, scanned text into structured data, making it easier to analyze and interpret.
  • Entity Recognition: Identify and extract specific entities or information from the scanned text, streamlining the retrieval of essential details.
  • Multi-column Mapping: Seamlessly map diverse data points within the scanned text to multiple columns, enhancing the granularity of your structured data.

A few brief examples of what Conduit's AI post-processing can do:

Invoice Data Extraction: Conduit excels in extracting critical details from scanned invoices, seamlessly mapping data like billing amounts, dates, and vendor information to distinct columns. This ensures quick and accurate processing for financial analysis.

Contract Parsing: Transform lengthy, unstructured contract texts into organized tables with Conduit's post-processing. Identify key clauses, parties involved, and contract dates, allowing for efficient contract management and review.

Survey Responses Analysis: Post-process scanned survey responses using Conduit to categorize and map textual answers to multiple columns. This simplifies sentiment analysis and aids in understanding trends and patterns within survey data.

Now, let's dive into the detailed instructions!

Selecting Essential Blocks for the Text Extracting Workflow


  1. Log into your Conduit account.

  2. Navigate to the Workflows tab and click on the 'Create New Workflow' button.

  3. Start by adding a 'Pull CSV Import' block to your canvas.

  4. Attach the link to your sheet to this block.

  5. Drag and drop the 'Ask AI (for each row)' block.

Configuring the Ask AI block

The Ask AI block in Conduit serves as an intelligent intermediary, prompting the AI to extract specific information from raw data. Placed in the workflow, it enables precise data mapping by instructing the AI on what to extract and where to place the results. This block has three configurable parameters:

Dataset: The dataset containing the digital documents.
Input Column: The column that contains the data you want to process. This is typically the column where your raw digital data is located.
Output Column: The column where you want the processed data to be placed.
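
To make the three parameters concrete, here is a minimal Python sketch that models them as a plain record. It is only an illustration: the class `AskAIBlockConfig`, the helper `input_column_present`, and the column name "Raw Text" are hypothetical and are not part of Conduit, which is configured through its visual editor.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AskAIBlockConfig:
    """Hypothetical stand-in for the Ask AI block settings (not a Conduit API)."""
    dataset: str        # the block or dataset that feeds this one
    input_column: str   # column holding the raw text to process
    output_column: str  # column that receives the processed result


def input_column_present(config: AskAIBlockConfig, header: List[str]) -> bool:
    """Return True if the configured input column exists in the sheet header."""
    return config.input_column in header


# Example values; "Raw Text" is an assumed column name.
config = AskAIBlockConfig(
    dataset="Pull CSV Import",
    input_column="Raw Text",
    output_column="Date",
)
```

Checking the input column against the sheet header before running is a cheap way to catch a mistyped column name early.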

Prompt Crafting Techniques

You can define precise instructions for the AI by writing natural language commands, commonly referred to as prompts. In the following example, we'll map the order details to several columns within the same sheet.

Here is what our raw data looks like:

clipboard-2024-07-18-23-34-57-149Z.png

Let's compose the following prompt to extract the date and time of the order, then run the workflow:

Extract the date of the order from the text in the dataset. Just print the date of the order {value}

Important: Do not modify the {value} parameter. It must remain in your prompt exactly as written.
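
One way to picture the role of {value} is as a slot that is filled with the contents of the input cell for each row. The Python sketch below illustrates that idea under this assumption; Conduit's actual substitution mechanism is not described here, and `build_prompt` and the sample rows are made up for illustration.

```python
PLACEHOLDER = "{value}"


def build_prompt(template: str, cell_text: str) -> str:
    """Fill the {value} slot of a prompt template with one cell's text."""
    if PLACEHOLDER not in template:
        # A template without the slot would send the same text for every row.
        raise ValueError("prompt must contain the {value} placeholder")
    # str.replace is used instead of str.format so that any braces
    # inside the cell text are left untouched.
    return template.replace(PLACEHOLDER, cell_text)


template = (
    "Extract the date of the order from the text in the dataset. "
    "Just print the date of the order {value}"
)

# Invented sample cells, standing in for rows of the input column.
sample_rows = ["first sample order text", "second sample order text"]

# One filled-in prompt per row, as the 'for each row' block name suggests.
prompts = [build_prompt(template, text) for text in sample_rows]
```

If the placeholder is renamed or removed, there is nothing for the row text to replace, which is why the sketch treats a missing {value} as an error.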

If everything was done correctly, you should receive a notification in the bottom-right corner of the screen once the workflow completes successfully. Here are the results of our workflow:

clipboard-2024-07-18-23-56-07-579Z.png

Mapping the multiple cells

In the previous example, we mapped the date and time of the order to the Date column. Our goal now is to modify that workflow to map the customer's name, the shipping address, and the full name of the fulfilling employee into separate columns.

To achieve this, add several Ask AI blocks to the workflow and assign to each:

  • Source Dataset
  • Input Column
  • Output Column
  • Prompt: Input prompt text

Important: For each subsequent Ask AI block, select the previous Ask AI block as its dataset.

clipboard-2024-07-19-00-01-30-327Z.png
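
The reason for chaining can be sketched in Python: each block receives the rows produced by the block before it and adds one more column, so columns filled earlier are carried forward. This is a simplified model under that assumption, not Conduit's implementation. `ask_ai_for_each_row`, `stub_ask`, and the column names are hypothetical, and the stub stands in for the real AI call so the sketch runs offline. The prompts are the ones used in this section.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def ask_ai_for_each_row(rows: List[Row], input_column: str, output_column: str,
                        prompt: str, ask: Callable[[str], str]) -> List[Row]:
    """Return new rows with output_column set to the answer for each row."""
    result = []
    for row in rows:
        filled = prompt.replace("{value}", row[input_column])
        new_row = dict(row)            # keep columns added by earlier blocks
        new_row[output_column] = ask(filled)
        result.append(new_row)
    return result


def stub_ask(prompt: str) -> str:
    # Stand-in for the model call: echoes the first two words of the prompt.
    return "stub:" + " ".join(prompt.split()[:2])


steps = [
    ("Customer", "Extract the full name of the customer and print it as is {value}"),
    ("Address", "Extract the shipping address of the customer and print it as is {value}"),
    ("Employee", "Extract the full name of the employee who fulfilled the order "
                 "and print its name as is {value}"),
]

# Invented single-row dataset; "Raw Text" is an assumed column name.
dataset = [{"Raw Text": "sample order text"}]
for output_column, prompt in steps:
    # Each step takes the previous step's output as its dataset.
    dataset = ask_ai_for_each_row(dataset, "Raw Text", output_column, prompt, stub_ask)
```

If a later step read from the original import instead of the previous step's output, the columns filled by earlier steps would be missing from its result.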

Now, let's add to the canvas the block responsible for extracting the customer's name, using the following prompt:

Extract the full name of the customer and print it as is {value}

clipboard-2024-07-19-00-03-40-160Z.png

Then, drag another block onto the canvas to extract the shipping address, using this prompt:

Extract the shipping address of the customer and print it as is {value}

clipboard-2024-07-19-00-07-34-975Z.png

Now, we need to add the final Ask AI block to the sequence. This one handles extracting the employee name. Here is the prompt for it:

Extract the full name of the employee who fulfilled the order and print its name as is {value}

clipboard-2024-07-19-00-09-29-157Z.png

Add the 'Save to Google Sheet' block and link it to the output sheet (this can be the same sheet that holds your raw data).

clipboard-2024-07-19-00-10-44-573Z.png

We're almost set. Click the 'Run Workflow' button and wait for the workflow to finish. Here is the output of the workflow:

clipboard-2024-07-19-00-11-18-606Z.png

19 Jul 2024 12:12 AM