Bulk Analysis
Credal’s Bulk Analysis feature lets users analyze a collection of documents or the rows of a spreadsheet simultaneously, using LLMs or Credal Copilots. Use it to automatically extract fields from, classify, or synthesize many documents at once.
In Depth Guide
This is a 20-minute guide to setting up and running a Bulk Analysis in Credal. The only prerequisite is that you have read the AI Copilots user guide.
Bulk Analysis is a powerful tool for accelerating the analysis of a large number of documents, transcripts, or spreadsheets using LLMs. If you have a large set of documents or data and want to classify each one, extract key fields, or otherwise ask the same questions of every document, this Credal feature accomplishes that by parallelizing an LLM request for each document.
In this guide, we will walk through how to set up and run Bulk Analysis in detail. For a quick overview of this feature, visit our blog!
The Basics
Introduction to Bulk Analysis
At its core, Bulk Analysis is about information consolidation: combining a substantial amount of data into a single repository that can then be used to generate meaningful findings and evaluations.
Bulk Analysis harnesses the capabilities of Credal Copilots to granularly analyze each document or spreadsheet row you want to dive deeper into. The analysis is driven by how you configure your Copilot.
By leveraging Credal’s Bulk Analysis, you can turn almost any data set into actionable insights with ease. This feature goes beyond what a single large language model (LLM) call can achieve, potentially saving weeks of manual labor while providing unparalleled depth of understanding.
Needless to say, this tool is perfect for text-based analysis of extremely large data sets.
Example Use Cases
- Categorizing and Tagging text across thousands of requests/questions/entries/etc.
- Performing Analyses and Identifying Trends on Jira Projects
- GTM Outbound Lead and Email Generation
- Analyzing Call Transcripts for Buying Indication
Setting up Your Copilot
There are two key things your Copilot needs for Bulk Analysis to work: Suggested Questions and a User Input. Each data source is passed into the Copilot as the User Input and is then asked each of the Suggested Questions. If these are not configured in your Copilot, your Bulk Analysis run (preview or not) will error.
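Conceptually, a Bulk Analysis run is a fan-out: each document (or spreadsheet row) is passed to the Copilot as the User Input and asked every Suggested Question, with the calls parallelized. The rough Python sketch below illustrates that idea only; the ask_copilot stub is hypothetical, and the real calls all happen inside Credal.

```python
from concurrent.futures import ThreadPoolExecutor

def ask_copilot(document_text: str, question: str) -> str:
    """Hypothetical stand-in for a Credal Copilot call; this stub only
    illustrates the fan-out, the real work happens inside Credal."""
    return f"<answer to {question!r} for this document>"

def bulk_analysis(documents: list[str], suggested_questions: list[str]) -> list[dict]:
    """Ask every Suggested Question of every document, in parallel."""
    def analyze(doc: str) -> dict:
        # Each document (or spreadsheet row) becomes the User Input;
        # each Suggested Question becomes a column in the output table.
        return {q: ask_copilot(doc, q) for q in suggested_questions}

    with ThreadPoolExecutor() as pool:
        return list(pool.map(analyze, documents))

rows = bulk_analysis(
    documents=["Ticket ABC-1 ...", "Ticket ABC-2 ..."],
    suggested_questions=["How many blockers stalled progress on this ticket?"],
)
```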
Defining User Inputs
For each row in a spreadsheet or source in a Document Collection, Bulk Analysis passes that content into your Copilot as a “User Input.” Navigate to your Copilot configuration and toggle on “Include full contents from documents.” This means your full document or spreadsheet row is passed to the LLM as-is, without any search performed on it.
Click on “Add User input” at the bottom right of this section and you will be prompted like so:
Add a name and description for your user input. What you name or describe it as won’t affect your Bulk Analysis run.
Writing Prompts
This is the heart of your Bulk Analysis: writing good prompts is the key to getting meaningful insights from your input data. Think carefully about which trends you want to uncover and which insights you want to extract from your source data. To create a prompt, navigate to the “Suggested questions” subsection of the Copilot configuration. Each question corresponds to a column in your final Bulk Analysis table containing an extracted insight.
Let’s dive deeper into what kinds of questions might be relevant. Say I am analyzing trends in Jira ticket blockers. I might want to count, for each ticket, how many blockers stalled progress over the course of its completion, and analyze which kinds of tickets experienced more blockers than others. The prompts I’d write would then look something like the following:
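For illustration only (your exact wording will differ), the suggested questions for this Jira analysis might be:
- How many blockers stalled progress on this ticket over the course of its completion?
- Briefly describe each blocker and what caused it.
- Was each blocker primarily related to frontend work, backend work, or something else?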
Notice that I do not ask for metadata such as “Assignee” or “Date,” since that is structured data on the ticket; Credal will automatically export that information for you. Once I run my Bulk Analysis, I can detect any outliers in the number of blockers, see what those blockers were, and understand whether the bottleneck sits with the frontend or backend team.
Testing and Refining your Prompts
After you’ve done a first pass at writing questions that you believe will extract meaningful data from each source or spreadsheet row, it’s time to refine the prompts. The Preview run (covered a few sections down) is very helpful for this, or you can navigate to the Preview/Evaluate tab to test the prompts you’ve written. The practice of refining prompts until they give you the responses you want is called prompt engineering. Stay tuned to our blog for a post on prompt engineering and how you can leverage tried-and-tested techniques to get the most value out of an LLM call. Don’t be afraid to spend some time rewriting these prompts until they best fit what you’re looking for.
Setting up Your Source Data
You can use either a Document Collection or a tabular data source as the input to your Bulk Analysis. If you choose a Document Collection, Bulk Analysis will flatten any folder hierarchies and treat each individual document as a row in the Bulk Analysis. If you choose a spreadsheet, each row is treated as a separate data source.
Using a Document Collection
Navigate to the “Document Collection” tab on the left of the Credal UI.
Create a Document Collection and upload the data you want to analyze to Credal, either manually or via the API. This might be sales transcripts as a series of documents, a Jira project, or a folder of meeting notes.
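If you are uploading via the API, the request is a straightforward authenticated HTTP call. Here is a minimal Python sketch; the endpoint URL and payload field names are assumptions for illustration, so check Credal’s API reference for the exact route and schema before using it.

```python
import requests

CREDAL_API_KEY = "YOUR_API_KEY"
# NOTE: the endpoint and field names below are illustrative assumptions;
# confirm the exact route, payload schema, and how to target your
# Document Collection in Credal's API reference.
UPLOAD_URL = "https://api.credal.ai/api/v0/documents/uploadDocumentContents"

def upload_transcript(name: str, text: str) -> None:
    response = requests.post(
        UPLOAD_URL,
        headers={"Authorization": f"Bearer {CREDAL_API_KEY}"},
        json={"documentName": name, "documentContents": text},
    )
    response.raise_for_status()

upload_transcript("acme-discovery-call-2024-03-01", "Full transcript text ...")
```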
Using Tabular Data
Alternatively, if your data is structured in a spreadsheet, you can opt to use this format as your input for Bulk Analysis. This is especially useful if your data is already neatly organized into rows and columns. As of today, Credal will only look at the first sheet in a Google Sheets file. More functionality to come!
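Because only the first sheet is read, one way to prepare data that is currently spread across several sheets is to consolidate everything into a single sheet before importing. A small pandas sketch, assuming a local Excel export whose sheets share the same columns (file names are illustrative):

```python
import pandas as pd

# Hypothetical local export of the spreadsheet you plan to analyze.
# Since only the first sheet is read, merge everything you want
# analyzed into one sheet before importing it into Credal.
sheets = pd.read_excel("support_requests.xlsx", sheet_name=None)  # dict of {sheet name: DataFrame}
combined = pd.concat(sheets.values(), ignore_index=True)
combined.to_excel("support_requests_single_sheet.xlsx", index=False)
```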
Configuring your Bulk Analysis
Navigate to the Bulk Analysis tab on the left of the Credal home page.
Create a new Bulk Analysis the same way you would create a new Document Collection or Copilot. The description isn’t used anywhere in the analysis itself, but being descriptive helps collaborators understand what the analysis is for.
Linking Copilot and Source Data
Select the Copilot you created from the dropdown and select the Document Collection or Spreadsheet that you want to analyze.
Validating Setup
After linking, always validate your setup to make sure nothing is missing: check that you have one User Input in the “Pinned sources” section, Suggested Questions set up (these are your prompts), and data connected to your configuration.
Running the Bulk Analysis
It really is as easy as clicking a button.
Performing a Preview Run
Before running the full analysis, it’s wise to do a preview run on a smaller subset of data. This helps you identify any issues early on. The preview will only display the first 5 documents/rows from your source data, which allows you to quickly iterate on your Bulk Analysis configuration. This is a good place to further tweak your prompts, adjusting for better output quality and structure.
Conducting a Full Run
Once satisfied with the preview, proceed to the “Run” tab to conduct a full run. This will enable you to analyze every document in your collection comprehensively, unlocking thorough and actionable insights.
Interpreting Results
Chatting with Results
Post-analysis, Credal allows you to interact with your results conversationally. You can ask, “What were some common themes around security and governance?” or “Exactly how many customers mentioned the Salesforce integration as being useful for them?” This interactive feature can highlight trends and deep-dive into specific insights seamlessly.
Downloading CSV
For further analysis, you can download the results as a CSV file. This is useful for creating charts, aggregations, or integrating results with other data tools. If you’ve crafted your questions well, you can even create numerical charts or extract mathematical findings by attaching the output spreadsheet to the web UI and turning on Code Interpreter!
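For example, reusing the Jira blocker prompts from earlier, a short pandas sketch could chart how many tickets hit each blocker count. The file path and column name here are illustrative; in practice the columns match the Suggested Questions you configured plus the metadata Credal exports.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Column names are illustrative: they will match the Suggested Questions
# you configured, plus any metadata Credal exports automatically.
df = pd.read_csv("bulk_analysis_results.csv")

blocker_counts = (
    df["How many blockers stalled progress on this ticket?"]
    .value_counts()
    .sort_index()
)
blocker_counts.plot(kind="bar", title="Tickets by number of blockers")
plt.xlabel("Number of blockers")
plt.ylabel("Ticket count")
plt.tight_layout()
plt.show()
```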
Exploring Further Capabilities
Beyond primary analysis, there are exciting future expansions on the horizon, such as:
- Transforming Bulk Analysis results into ongoing valuable data assets. This means your output table will be continuously updated without your supervision.
- Integrating the results into live dashboards like Tableau for continuous visual updates.