Privacy and AI

How to mask sensitive information

Question: When sending data to the AI for analysis, can I mask my clients' sensitive information and create user IDs so that only that de-identified data is shared for analysis?

Answer:

There are two approaches to ensure data privacy when using AI for analysis.

  1. Local Analysis:
    Conduit offers two modes for working with AI. One of them, called Analyst, does not send data to the AI. Instead, it asks the AI to generate a program that performs the analysis locally on the server where the data resides, so your data is never sent to the cloud. More details are in the "Data Sharing with the AI / Local Analysis" section below.

  2. Data Cleanup Before Sending:
    Another approach is to sanitize the data before sending it to the cloud AI. You can create a workflow that removes sensitive information from the data: dropping entire columns, such as account names or IDs, or obfuscating parts of columns, like the first digits of account numbers and names. Connect these cleaning workflows to the AI instead of the raw data. A minimal sketch of such a workflow follows this list.
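
As an illustration of the second approach, here is a minimal cleanup step in Python with pandas. The column names (account_name, client_id, account_number, email) and the masking rules are hypothetical; adapt them to your own schema.

    import pandas as pd

    def sanitize(df: pd.DataFrame) -> pd.DataFrame:
        """Remove or obfuscate sensitive fields before data leaves your server."""
        clean = df.copy()

        # Drop columns that identify clients directly (hypothetical names).
        clean = clean.drop(columns=["account_name", "client_id"], errors="ignore")

        # Obfuscate part of a column: keep only the last 4 digits of the account number.
        if "account_number" in clean.columns:
            clean["account_number"] = "****" + clean["account_number"].astype(str).str[-4:]

        # Replace real identifiers with stable pseudonymous user IDs.
        if "email" in clean.columns:
            clean["user_id"] = pd.factorize(clean["email"])[0]
            clean = clean.drop(columns=["email"])

        return clean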

Data Sharing with the AI / Local Analysis

Conduit does not send data directly to the OpenAI API. Instead, it sends OpenAI only metadata (such as table column descriptions) and asks the AI to write a Python program that, given access to the data, can answer the question. Conduit then runs that program in a sandbox on a local server, where the program has access to the data.
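
To make this concrete, here is a simplified, hypothetical sketch of what such a metadata-only request might look like. The actual prompt format Conduit uses is internal, and the table and column names below are invented for illustration.

    # Hypothetical sketch: only schema-level metadata is sent, never the rows.
    metadata = {
        "table": "transactions",
        "columns": [
            {"name": "user_id", "type": "string", "description": "Pseudonymous client ID"},
            {"name": "amount", "type": "float", "description": "Transaction amount in USD"},
            {"name": "ts", "type": "datetime", "description": "Transaction timestamp"},
        ],
    }

    prompt = (
        "Write a Python program that answers the question below. "
        "The program will run locally with access to a DataFrame named df "
        f"with this schema: {metadata}\n\n"
        "Question: What was the total transaction amount per client last month?"
    )
    # The returned program is then executed in a sandbox next to the data.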

There are several reasons for this approach:

  • Context window limits: The model's context window is limited to 100k tokens, which restricts the data volume to roughly 1,000 rows, far too little for real business tasks.
  • Elimination of hallucinations: Having the AI write Python programs for arithmetic avoids the model's own calculation errors; the example below shows what such a generated program looks like.
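
To illustrate the second point, here is the kind of program the AI might return for a question like "total amount per client". The DataFrame and column names are hypothetical; in Conduit's sandbox, df would be supplied with the real data.

    import pandas as pd

    # In the sandbox, df would hold the real data; a tiny stand-in
    # makes this sketch runnable on its own.
    df = pd.DataFrame({
        "user_id": ["u1", "u1", "u2"],
        "amount": [100.0, 250.0, 75.5],
    })

    # The arithmetic is done exactly, by the interpreter,
    # instead of asking the language model to compute numbers.
    result = df.groupby("user_id")["amount"].sum()
    print(result)  # u1: 350.0, u2: 75.5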

However, for certain operations, such as automatically generating column descriptions, we may pass data samples. This happens only on an explicit command from the user and can be disabled, as it is not a core function of the product.

Other privacy topics

Data Processors

Under the GDPR, AWS acts as our data processor. We utilize AWS services like EC2, S3, and RDS. OpenAI also serves as a data processor, handling metadata and data samples.

We do not use any other data processors beyond AWS and OpenAI.

To improve our product, we use Mixpanel, SessionStack, and Google Analytics. These platforms store the email addresses of our logged-in users, but none of your data is sent to them.

Controlled Access to Internal Data

Access to internal data is strictly controlled within Conduit. Here’s how it ensures data security:

  • Authentication and Authorization: Conduit uses robust authentication and authorization mechanisms to safeguard your internal data. It integrates with your data systems through SQL and APIs, ensuring that different users have different access levels.
  • User-Specific Data Retrieval: When a user asks a question, the LLM generates a program to answer it. Conduit then provides the user's identity to your internal data API, which returns only the data that the user is authorized to see (see the sketch after this list).
  • Executing the Program: Conduit runs the generated program on the subset of data that the user is permitted to access, ensuring that the response is based on authorized data only.
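
A minimal sketch of the user-specific retrieval step, assuming a hypothetical internal data API that filters rows by the caller's identity (the endpoint, parameter, and function names are illustrative, not Conduit's actual interface):

    import pandas as pd
    import requests

    def fetch_authorized_data(user_token: str) -> pd.DataFrame:
        """Fetch only the rows this user is allowed to see.

        Authorization is enforced by the internal API itself; Conduit just
        forwards the user's identity. Endpoint and names are hypothetical.
        """
        resp = requests.get(
            "https://internal.example.com/api/transactions",
            headers={"Authorization": f"Bearer {user_token}"},
            timeout=30,
        )
        resp.raise_for_status()
        return pd.DataFrame(resp.json())

    # The generated program then runs only on this authorized subset:
    # df = fetch_authorized_data(current_user_token)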

Mitigating Risks of Sharing Sensitive Information

The system's design minimizes the risk of exposing clients' or patients' sensitive information. While the details depend on the specific use case, the combination of the measures above significantly reduces the potential for unauthorized access or exposure: by sharing only metadata with the AI and tightly controlling access to the actual data, Conduit provides a secure and compliant environment for data interaction.