The Split Text processing component in this flow splits the incoming Data into chunks to be embedded into the vector store component. The component offers control over chunk size, overlap, and separator, which affect context and granularity in vector store retrieval results.
DataFrame Operations
This component performs operations on DataFrame rows and columns.
To use this component in a flow, connect a component that outputs a DataFrame to the DataFrame Operations component.
This example fetches JSON data from an API. The Smart function component extracts and flattens the results into a tabular DataFrame, and the DataFrame Operations component then works with the retrieved data.
The API Request component retrieves data with only source and result fields. For this example, the desired data is nested within the result field.
Connect a Smart function component to the API Request component, and connect a Language Model component to the Smart function component. This example connects a Groq model component.
In the Groq model component, add your Groq API key.
To filter the data, in the Smart function component, in the Instructions field, use natural language to describe how the data should be filtered.
Tip: Avoid punctuation in the Instructions field, as it can cause errors.
To run the flow, in the Smart function component, click Run component.
To inspect the filtered data, in the Smart function component, click Inspect output. The result is a structured DataFrame.
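Add a DataFrame Operations component and a Chat Output component to the flow. Connect the Smart function component's DataFrame output to the DataFrame Operations component, and connect the DataFrame Operations component's output to the Chat Output component.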
In the DataFrame Operations component, in the Operation field, select Filter.
To apply a filter, in the Column Name field, enter a column to filter on. This example filters by name.
Click Playground, and then click Run Flow. The flow extracts the values from the name column.
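To make the data shape in this walkthrough concrete, here is a minimal plain-Python sketch of the same steps. It assumes a hypothetical API response with records nested under result. It also uses a hypothetical flatten helper in place of the function the Smart function component generates, and a list of dicts as a stand-in for a DataFrame.

```python
# Hypothetical API response: only "source" and "result" at the top level,
# with the records of interest nested inside "result".
response = {
    "source": "example-api",
    "result": [
        {"name": "Ada", "address": {"city": "London"}},
        {"name": "Grace", "address": {"city": "Arlington"}},
    ],
}


def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining keys with '.' (hypothetical helper)."""
    flat = {}
    for key, value in record.items():
        full_key = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{full_key}."))
        else:
            flat[full_key] = value
    return flat


# One flat dict per row, similar to the rows of a tabular DataFrame.
rows = [flatten(item) for item in response["result"]]

# Pull the values of the "name" column out of the rows.
names = [row["name"] for row in rows]
print(rows)
print(names)
```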
This component can perform the following operations on a pandas DataFrame.
| Operation | Required inputs | Info |
|---|---|---|
| Add Column | new_column_name, new_column_value | Adds a new column with a constant value. |
| Drop Column | column_name | Removes a specified column. |
| Filter | column_name, filter_value | Filters rows based on column value. |
| Head | num_rows | Returns the first n rows. |
| Rename Column | column_name, new_column_name | Renames an existing column. |
| Replace Value | column_name, replace_value, replacement_value | Replaces values in a column. |
| Select Columns | columns_to_select | Selects specific columns. |
| Sort | column_name, ascending | Sorts the DataFrame by a column. |
| Tail | num_rows | Returns the last n rows. |
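The operations in the table can be approximated without pandas. The following sketch illustrates the behavior under stated assumptions and is not the component's implementation. Rows are plain dicts, and the dataframe_operation helper is hypothetical. Filter and Replace Value use exact equality, which is also an assumption. Head and Tail default to 5 rows, as described for num_rows.

```python
def dataframe_operation(rows, operation, **params):
    """Apply one operation from the table to a list of row dicts (hypothetical helper)."""
    col = params.get("column_name")
    if operation == "Add Column":
        # Every row gets the same constant value in the new column.
        return [{**r, params["new_column_name"]: params["new_column_value"]} for r in rows]
    if operation == "Drop Column":
        return [{k: v for k, v in r.items() if k != col} for r in rows]
    if operation == "Filter":
        # Exact-match filter (an assumption for this sketch).
        return [r for r in rows if r.get(col) == params["filter_value"]]
    if operation == "Head":
        return rows[: params.get("num_rows", 5)]
    if operation == "Tail":
        n = params.get("num_rows", 5)
        return rows[-n:] if n > 0 else []
    if operation == "Rename Column":
        new = params["new_column_name"]
        return [{(new if k == col else k): v for k, v in r.items()} for r in rows]
    if operation == "Replace Value":
        old, new = params["replace_value"], params["replacement_value"]
        return [{**r, col: new} if r.get(col) == old else dict(r) for r in rows]
    if operation == "Select Columns":
        return [{c: r[c] for c in params["columns_to_select"]} for r in rows]
    if operation == "Sort":
        return sorted(rows, key=lambda r: r[col], reverse=not params.get("ascending", True))
    raise ValueError(f"Unsupported operation: {operation}")


employees = [
    {"name": "Ada", "dept": "eng"},
    {"name": "Grace", "dept": "ops"},
    {"name": "Linus", "dept": "eng"},
]
engineers = dataframe_operation(employees, "Filter", column_name="dept", filter_value="eng")
by_name_desc = dataframe_operation(employees, "Sort", column_name="name", ascending=False)
print(engineers)
print(by_name_desc)
```

In pandas, each branch corresponds to a one-line DataFrame expression. The dict version only makes the row and column semantics visible.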
Parameters
Inputs
| Name | Display Name | Info |
|---|---|---|
| df | DataFrame | The input DataFrame to operate on. |
| operation | Operation | The DataFrame operation to perform. Options include Add Column, Drop Column, Filter, Head, Rename Column, Replace Value, Select Columns, Sort, and Tail. |
| column_name | Column Name | The column name to use for the operation. |
| filter_value | Filter Value | The value to filter rows by. |
| ascending | Sort Ascending | Whether to sort in ascending order. |
| new_column_name | New Column Name | The new column name when renaming or adding a column. |
| new_column_value | New Column Value | The value to populate the new column with. |
| columns_to_select | Columns to Select | A list of column names to select. |
| num_rows | Number of Rows | The number of rows to return for Head and Tail operations. The default is 5. |
Data Operations
This component performs operations on Data objects, including selecting keys, evaluating literals, combining data, filtering values, appending or updating data, removing keys, and renaming keys.
To use this component in a flow, connect a component that outputs Data to the Data Operations component’s input. All operations in the component require at least one Data input.
In the Operations field, select the operation you want to perform. For example, connect a Webhook component to the Data Operations component, and then send a POST request containing user data to the Webhook component. Replace YOUR_FLOW_ID with your flow ID.
In the Data Operations component, select the Select Keys operation to extract specific user information. To add more keys, click Add More.
Filter by name, username, and email to select the values from the request.
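The key-based operations of this component can be pictured as dictionary transformations on the contents of a Data object. This is a minimal sketch with a hypothetical payload and hypothetical helper names. It shows only the idea behind Select Keys, Remove Keys, and Rename Keys. It silently skips missing keys, and the component may handle them differently.

```python
# Hypothetical webhook payload, shaped like a user record.
payload = {
    "id": 1,
    "name": "Ada Lovelace",
    "username": "ada",
    "email": "ada@example.com",
    "address": {"city": "London"},
}


def select_keys(data, keys):
    """Keep only the listed keys; keys missing from data are skipped."""
    return {k: data[k] for k in keys if k in data}


def remove_keys(data, keys):
    """Drop the listed keys and keep everything else."""
    return {k: v for k, v in data.items() if k not in keys}


def rename_keys(data, mapping):
    """Rename keys using an {old_name: new_name} mapping."""
    return {mapping.get(k, k): v for k, v in data.items()}


user = select_keys(payload, ["name", "username", "email"])
print(user)
print(rename_keys(user, {"username": "handle"}))
```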
Data to DataFrame
This component converts one or more Data objects into a DataFrame. Each Data object corresponds to one row in the resulting DataFrame. Fields from the .data attribute become columns, and the .text field (if present) is placed in a ‘text’ column.
To use this component in a flow, connect a component that outputs Data to the Data to DataFrame component’s input. This example connects a Webhook component to convert text and data into a DataFrame.
To view the flow’s output, connect a Chat Output component to the Data to DataFrame component.
Send a POST request to the Webhook containing your JSON data. Replace YOUR_FLOW_ID with your flow ID. This example uses the default LLM Controls server address.
In the Playground, view the output of your flow. The Data to DataFrame component converts the webhook request into a DataFrame, with text and data fields as columns.
Send another employee data object.
In the Playground, this request is also converted to a DataFrame.
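The row-per-object conversion can be sketched with a small stand-in class. This is an illustration under assumptions. The Data dataclass below is a hypothetical substitute for the real Data object, and data_to_table is a hypothetical helper that returns column names and row lists instead of a pandas DataFrame.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Data:
    """Hypothetical stand-in for a Data object: a dict of fields plus optional text."""
    data: dict = field(default_factory=dict)
    text: Optional[str] = None


def data_to_table(items):
    """Return (columns, rows): one row per Data object, fields as columns."""
    if isinstance(items, Data):
        items = [items]  # Accept a single Data object or a list of them.
    records = []
    for item in items:
        record = dict(item.data)
        if item.text is not None:
            record["text"] = item.text  # .text goes into a 'text' column.
        records.append(record)
    # Columns are the union of all keys, in first-seen order.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    # Missing values are filled with None (pandas would use NaN).
    rows = [[record.get(col) for col in columns] for record in records]
    return columns, rows


columns, rows = data_to_table([
    Data({"name": "Ada", "role": "engineer"}, text="New hire"),
    Data({"name": "Grace", "role": "manager"}),
])
print(columns)
print(rows)
```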
Parameters
Inputs
| Name | Display Name | Info |
|---|---|---|
| data_list | Data or Data List | One or multiple Data objects to transform into a DataFrame. |
Outputs
| Name | Display Name | Info |
|---|---|---|
| dataframe | DataFrame | A DataFrame built from each Data object’s fields plus a text column. |
Parser
This component formats DataFrame or Data objects into text using templates. It also offers a Stringify mode that converts inputs directly to strings.
To use this component, create variables for values in the template the same way you would in a Prompt component. For DataFrames, use column names, for example Name: {Name}. For Data objects, use {text}.
To use the Parser component with a Structured Output component, do the following:
Connect a Structured Output component’s DataFrame output to the Parser component’s DataFrame input.
Connect the File component to the Structured Output component’s Message input.
Connect the OpenAI model component’s Language Model output to the Structured Output component’s Language Model input.
In the Structured Output component, click Open Table. This opens a pane for structuring your table. The table contains the rows Name, Description, Type, and Multiple.
Create a table that maps to the data you’re loading with the File component. For example, to create a table for employees, you might have the rows id, name, and email, all of type string.
In the Template field of the Parser component, enter a template for parsing the Structured Output component’s DataFrame output into structured text. Create variables for values in the template the same way you would in a Prompt component. For example, to present a table of employees in Markdown:
To run the flow, in the Parser component, click Run component.
To view your parsed text, in the Parser component, click Inspect output.
Optionally, connect a Chat Output component, and open the Playground to see the output.
For an additional example of using the Parser component to format a DataFrame from a Structured Output component, see the Market Research template flow.
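A Parser template can be thought of as Python string formatting applied once per row, with the formatted rows joined by the separator. The sketch below assumes that model. The employee rows, the Markdown header, and the parse_rows helper are all hypothetical and are not taken from the example above.

```python
def parse_rows(rows, template, sep="\n"):
    """Fill the template once per row (keys are column names) and join with sep."""
    return sep.join(template.format(**row) for row in rows)


employees = [
    {"id": "1", "name": "Ada", "email": "ada@example.com"},
    {"id": "2", "name": "Grace", "email": "grace@example.com"},
]

# A Markdown table: a fixed header, then one templated line per row.
header = "| id | name | email |\n|---|---|---|"
body = parse_rows(employees, "| {id} | {name} | {email} |")
markdown_table = header + "\n" + body
print(markdown_table)
```

For Data objects, the same idea applies with a template such as {text}.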
Parameters
Inputs
| Name | Display Name | Info |
|---|---|---|
| mode | Mode | The tab selection between “Parser” and “Stringify” modes. “Stringify” converts input to a string instead of using a template. |
| pattern | Template | The template for formatting using variables in curly brackets. For DataFrames, use column names, such as Name: {Name}. For Data objects, use {text}. |
| input_data | Data or DataFrame | The input to parse. Accepts either a DataFrame or Data object. |
| sep | Separator | The string used to separate rows or items. The default is a newline. |
| clean_data | Clean Data | When Stringify mode is enabled, this option cleans data by removing empty rows and lines. |
Regex Extractor
This component extracts patterns from text using regular expressions. It can be used to find and extract specific patterns or information from text data.
To use this component in a flow:
Connect the Regex Extractor to a URL component and a Chat Output component.
In the Regex Extractor component, enter a pattern to extract text from the URL component’s raw output. This example extracts the first paragraph from the “In the News” section of https://en.wikipedia.org/wiki/Main_Page:
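As a rough illustration of what the extractor does, the following sketch applies Python's re.findall to a hypothetical snippet of page text. Both the sample text and the pattern are assumptions made for illustration and are not the pattern from the example above. The component's regex flavor may also differ from Python's re module.

```python
import re

# Hypothetical raw text, standing in for a URL component's output.
page_text = """Welcome to the site
In the news
First story: a new bridge opened today.
Second story: local team wins.
"""

# Capture the first line after the "In the news" heading.
# "." does not match newlines by default, so the group stops at the line end.
pattern = r"In the news\n(.+)"

matches = re.findall(pattern, page_text)
print(matches)
```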
Save to File
To use this component in a flow, connect a component that outputs DataFrames, Data, or Messages to the Save to File component’s input. The following example connects a Webhook component to two Save to File components to demonstrate the different outputs.
In the Save to File component’s Input Type field, select the expected input type. This example expects Data from the Webhook.
In the File Format field, select the file type for your saved file. This example uses .md in one Save to File component, and .xlsx in another.
In the File Path field, enter the path for your saved file. This example uses ./output/employees.xlsx and ./output/employees.md to save the files in a directory relative to where LLM Controls is running. The component accepts both relative and absolute paths, and creates any necessary directories if they don’t exist.
Tip: If the file_path contains an extension that isn't accepted for the selected file_format, the component appends the correct extension to the file name. For example, if the selected file_format is csv and you enter file_path as ./output/test.txt, the file is saved as ./output/test.txt.csv so the file is not corrupted.
Send a POST request to the Webhook containing your JSON data. Replace YOUR_FLOW_ID with your flow ID. This example uses the default LLM Controls server address.
In your local filesystem, open the output directory. You should see two files created from the data you sent: one in .xlsx format for structured spreadsheets, and one in Markdown.
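The extension handling described in the tip can be sketched with pathlib. This is a minimal illustration that assumes a simple rule: append the selected format's extension when the path's extension doesn't match it. The resolve_output_path and prepare_output_path helpers are hypothetical, and the component may normalize paths differently. The second helper also creates missing parent directories, as the File Path step describes.

```python
from pathlib import Path


def resolve_output_path(file_path: str, file_format: str) -> Path:
    """Append the format's extension when file_path doesn't already end with it."""
    path = Path(file_path)
    expected = "." + file_format.lower().lstrip(".")
    if path.suffix.lower() != expected:
        # Keep the user's file name intact and add the correct extension,
        # for example test.txt with csv becomes test.txt.csv.
        path = path.with_name(path.name + expected)
    return path


def prepare_output_path(file_path: str, file_format: str) -> Path:
    """Resolve the path and create any missing parent directories."""
    path = resolve_output_path(file_path, file_format)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


print(resolve_output_path("./output/test.txt", "csv"))
print(resolve_output_path("./output/employees.md", "md"))
```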
Smart function
This component uses an LLM to generate a Lambda function for filtering or transforming structured data.
To use the Smart function component, you must connect it to a Language Model component. The Smart function component uses the connected model to generate a function based on the natural language instructions in the Instructions field.
This example gets JSON data from the https://jsonplaceholder.typicode.com/users API endpoint. The Instructions field in the Smart function component specifies the task extract emails. The connected LLM creates a filter based on the instructions and extracts a list of email addresses from the JSON data.
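For the extract emails instruction, the generated function might look something like the lambda below. This is a hypothetical example of possible LLM output, not the code the component actually produces. The users list is also made up and only mirrors the idea of records that carry an email field.

```python
# Made-up records shaped like entries from a /users endpoint.
users = [
    {"id": 1, "name": "Ada", "email": "ada@example.com"},
    {"id": 2, "name": "Grace", "email": "grace@example.com"},
]

# A function an LLM might generate for the instruction "extract emails".
# Records without an "email" key are skipped rather than raising KeyError.
extract_emails = lambda data: [user["email"] for user in data if "email" in user]

emails = extract_emails(users)
print(emails)
```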
Instructions: The natural language instructions for how to filter or transform the data using a Lambda function, such as Filter the data to only include items where the 'status' is 'active'.
| Name | Display Name | Info |
|---|---|---|
| sample_size | Sample Size | For large datasets, the number of characters to sample from the dataset head and tail. |
| max_size | Max Size | The number of characters for the data to be considered “large”, which triggers sampling by the sample_size value. |
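One way to read sample_size and max_size is shown below. Data whose string form exceeds max_size characters is reduced to its first and last sample_size characters before it reaches the LLM. The helper name, the strict greater-than comparison, and the "..." marker between the two samples are all assumptions made for illustration.

```python
def sample_for_prompt(data_str: str, max_size: int, sample_size: int) -> str:
    """Return data_str unchanged if small, otherwise its head and tail samples."""
    if len(data_str) <= max_size:
        return data_str
    head = data_str[:sample_size]
    # Guard against sample_size == 0, where [-0:] would return the whole string.
    tail = data_str[-sample_size:] if sample_size > 0 else ""
    return f"{head}\n...\n{tail}"


print(sample_for_prompt("abcdefghij", max_size=20, sample_size=3))
print(sample_for_prompt("abcdefghij", max_size=5, sample_size=3))
```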
Split Text
This component splits text into chunks based on specified criteria. It’s ideal for chunking data to be tokenized and embedded into vector databases.
The Split Text component outputs Chunks or DataFrame. The Chunks output returns a list of individual text chunks. The DataFrame output returns a structured data format with additional text and metadata columns.
To use this component in a flow, connect a component that outputs Data or DataFrame to the Split Text component’s Data port. This example uses the URL component, which fetches JSON placeholder data.
In the Split Text component, define your data splitting parameters.
This example splits incoming JSON data at the separator "},", so each chunk contains one JSON object.
The order of precedence is Separator, then Chunk Size, and then Chunk Overlap. If any segment after separator splitting is longer than chunk_size, it is split again to fit within chunk_size.
After chunk_size is applied, Chunk Overlap is applied between chunks to maintain context.
Connect a Chat Output component to the Split Text component’s DataFrame output to view its output.
Click Playground, and then click Run Flow. The output contains a table of JSON objects split at the "}," separator.
Clear the Separator field, and then run the flow again. Instead of JSON objects, the output contains 50-character lines of text with 10 characters of overlap.
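The precedence rules for Separator, Chunk Size, and Chunk Overlap can be sketched in plain Python. This is a simplified illustration, not the component's splitter. The split_text helper is hypothetical, and it drops the separator from the output. It also applies overlap only when a segment must be re-split to fit chunk_size. The chunk size of 50 and overlap of 10 match the values implied by the last step.

```python
def split_text(text, separator, chunk_size, chunk_overlap):
    """Split on separator, then re-split long segments into overlapping windows."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    # 1. Separator: an empty separator means the whole text is one segment.
    segments = text.split(separator) if separator else [text]
    step = chunk_size - chunk_overlap
    chunks = []
    for segment in segments:
        if not segment:
            continue
        # 2. Chunk Size: segments that already fit are kept whole.
        if len(segment) <= chunk_size:
            chunks.append(segment)
            continue
        # 3. Chunk Overlap: each window starts chunk_overlap characters
        #    before the previous window ended.
        for start in range(0, len(segment), step):
            chunks.append(segment[start:start + chunk_size])
            if start + chunk_size >= len(segment):
                break
    return chunks


json_text = '{"id": 1, "name": "Ada"},{"id": 2, "name": "Grace"}'
print(split_text(json_text, "},", chunk_size=50, chunk_overlap=10))

plain = "".join(str(i % 10) for i in range(120))
print([len(c) for c in split_text(plain, "", chunk_size=50, chunk_overlap=10)])
```

With the separator cleared, 120 characters become windows of 50, 50, and 40 characters. Each window repeats the last 10 characters of the previous one.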