Data components load data from a source into your flow. They may perform some processing or type checking, such as converting raw HTML data into text or verifying that a loaded file is of an acceptable type.
The URL data component loads content from a list of URLs. In the component's URLs field, enter the URL you want to load. To add multiple URL fields, click the add button. Alternatively, connect a component that outputs the Message type, like the Chat Input component, to supply your URLs from a component.
This component loads and parses files of various supported formats and converts the content into a Data object. It supports multiple file types and provides options for parallel processing and error handling.
To Load a Document
The path to files to load. Supports individual files or bundled archives.
file_path
Server File Path
A Data object with a file_path property pointing to the server file or a Message object with a path to the file. Supersedes ‘Path’ but supports the same file types.
separator
Separator
The separator to use between multiple outputs in Message format.
silent_errors
Silent Errors
If true, errors do not raise an exception.
delete_server_file_after_processing
Delete Server File After Processing
If true, the Server File Path is deleted after processing.
ignore_unsupported_extensions
Ignore Unsupported Extensions
If true, files with unsupported extensions are not processed.
ignore_unspecified_files
Ignore Unspecified Files
If true, Data with no file_path property is ignored.
use_multithreading
[Deprecated] Use Multithreading
Set ‘Processing Concurrency’ greater than 1 to enable multithreading. This option is deprecated.
concurrency_multithreading
Processing Concurrency
The number of files to process concurrently when multiple files are loaded. Default is 1; values greater than 1 enable parallel processing for 2 or more files.
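The Processing Concurrency behavior can be sketched with a thread pool. This is an illustrative sketch, not the component's actual implementation; the function names are invented, and the "parsing" step is a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor

def load_file(path):
    # Placeholder for parsing: the real component converts each file
    # into a Data object.
    return f"parsed:{path}"

def load_files(paths, concurrency=1):
    # With concurrency of 1 (the default), files are processed sequentially;
    # values greater than 1 process that many files in parallel when there
    # are 2 or more files.
    if concurrency <= 1 or len(paths) < 2:
        return [load_file(p) for p in paths]
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(load_file, paths))

print(load_files(["a.txt", "b.txt", "c.txt"], concurrency=2))
```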
This component fetches content from one or more URLs, processes the content, and returns it in various formats. It supports output in plain text or raw HTML. In the component's URLs field, enter the URL you want to load.
To use this component in a flow, connect the DataFrame output to a component that accepts the input. For example, connect the URL component to a Chat Output component.
In the URL component’s URLs field, enter the URL for your request. This example uses llmc.org.
Optionally, in the Max Depth field, enter how many pages away from the initial URL you want to crawl. Select 1 to crawl only the page specified in the URLs field. Select 2 to crawl all pages linked from that page. The component crawls by link traversal, not by URL path depth.
Click Playground, and then click Run Flow. The text contents of the URL are returned to the Playground as a structured DataFrame.
In the URL component, change the output port to Message, and then run the flow again. The text contents of the URL are returned as unstructured raw text, which you can extract patterns from with the Regex Extractor tool.
Connect the URL component to a Regex Extractor and Chat Output.
In the Regex Extractor tool, enter a pattern to extract text from the URL component’s raw output. This example extracts the first paragraph from the “In the News” section of https://en.wikipedia.org/wiki/Main_Page.
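The same kind of pattern can be tried outside the flow with Python's re module. The raw text below is an invented stand-in for the URL component's Message output, not actual Wikipedia content.

```python
import re

# Invented raw text standing in for the URL component's Message output.
raw = """In the News
A major storm made landfall on the coast early Tuesday.
On This Day
In 1969, the first crewed Moon landing took place."""

# Capture the first line that follows the "In the News" heading.
match = re.search(r"In the News\n(.+)", raw)
if match:
    print(match.group(1))
```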
Parameters
Inputs
Name
Display Name
Info
urls
URLs
Click the '+' button to enter one or more URLs to crawl recursively.
max_depth
Max Depth
Controls how many ‘clicks’ away from the initial page the crawler will go.
prevent_outside
Prevent Outside
If enabled, only crawls URLs within the same domain as the root URL.
use_async
Use Async
If enabled, uses asynchronous loading which can be significantly faster but might use more system resources.
format
Output Format
Use Text to extract the text from the HTML, or HTML for the raw HTML content.
timeout
Timeout
Timeout for the request in seconds.
headers
Headers
The headers to send with the request.
Outputs
Name
Display Name
Info
data
Data
A list of Data objects containing fetched content and metadata.
This component defines a webhook trigger that runs a flow when it receives an HTTP POST request. If the input is not valid JSON, the component wraps it in a payload object so that it can still be processed and trigger the flow. The component does not require an API key.
When you add a Webhook component to a flow, the flow's API access pane exposes an additional Webhook cURL tab that contains a POST /v1/webhook/$FLOW_ID code snippet. You can use this request to send data to the Webhook component and trigger the flow.
To test the Webhook component:
Add a Webhook component to the flow.
Connect the Webhook component’s Data output to the Data input of a Parser component.
Connect the Parser component’s Parsed Text output to the Text input of a Chat Output component.
In the Parser component, under Mode, select Stringify. This mode passes the webhook’s data as a string for the Chat Output component to print.
To send a POST request, copy the code from the Webhook cURL tab in the API pane and paste it into a terminal.
Send the POST request.
Open the Playground. Your JSON data is posted to the Chat Output component, which indicates that the webhook component is correctly triggering the flow.
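As described above, input that is not valid JSON is wrapped in a payload object before the flow runs. A minimal sketch of that normalization, with an invented function name (the component's internal logic may differ):

```python
import json

def normalize_payload(body: str) -> dict:
    # Valid JSON objects pass through unchanged; anything else is wrapped
    # in a payload object so the flow can still be triggered.
    try:
        parsed = json.loads(body)
        if isinstance(parsed, dict):
            return parsed
    except json.JSONDecodeError:
        pass
    return {"payload": body}

print(normalize_payload('{"any": "data"}'))   # valid JSON: passed through
print(normalize_payload("plain text input"))  # wrapped in a payload object
```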
Parameters
Inputs
Name
Display Name
Description
data
Payload
Receives a payload from external systems through HTTP POST requests.
curl
cURL
The cURL command template for making requests to this webhook.
endpoint
Endpoint
The endpoint URL where this webhook receives requests.
Outputs
Name
Display Name
Description
output_data
Data
Outputs processed data from the webhook input, and returns an empty Data object if no input is provided. If the input is not valid JSON, the component wraps it in a payload object.
This component recursively loads files from a directory and converts the content into Data objects. It supports filtering by file type, depth control, and optional multithreading for parallel processing.
To Load a Directory
Enter the directory path or use the default current directory
Optionally select specific file types to filter
Set the search depth and enable recursive mode if needed
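The recursive, type-filtered traversal the steps describe can be sketched with pathlib. This is an illustrative stand-in, not the component's implementation; the function name is invented.

```python
from pathlib import Path
import tempfile

def list_files(directory, types=None, recursive=True):
    # Collect files under the directory, optionally filtering by extension.
    # An empty types list loads all supported types, mirroring the
    # component's default.
    pattern = "**/*" if recursive else "*"
    files = [p for p in Path(directory).glob(pattern) if p.is_file()]
    if types:
        files = [p for p in files if p.suffix.lstrip(".") in types]
    return sorted(p.name for p in files)

# Demonstrate with a throwaway directory.
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "notes.txt").write_text("hello")
    (Path(d) / "data.csv").write_text("a,b")
    result = list_files(d, types=["txt"])
print(result)  # only the .txt file remains after filtering
```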
Parameters
Inputs
Name
Display Name
Info
path
Path
Path to the directory to load files from. Defaults to the current directory ('.').
types
File Types
File types to load. Select one or more types or leave empty to load all supported types.
This component loads and parses spreadsheet files (.xlsx, .xls) and converts the content into a searchable data format for analysis. It supports sheet selection, row limits, caching, and optimized loading for large files.
To Load a Spreadsheet
Click the Select files button and upload an .xlsx or .xls file
Optionally specify a sheet name (defaults to the first sheet)
Configure row limits and caching as needed
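The interaction of the Max Rows limit and chunked processing for large files can be sketched in plain Python. The rows here are invented stand-ins for parsed spreadsheet content, and the function is illustrative, not the component's code.

```python
def iter_chunks(rows, max_rows=0, chunk_size=2):
    # Apply the Max Rows limit first (0 means load all rows), then yield
    # the remaining rows in fixed-size chunks, as the large-file
    # optimization would.
    if max_rows > 0:
        rows = rows[:max_rows]
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]

# Invented rows standing in for parsed spreadsheet content.
rows = [{"text": f"row {i}"} for i in range(5)]
chunks = list(iter_chunks(rows, max_rows=4, chunk_size=2))
print(len(chunks))  # 4 rows in chunks of 2 -> 2 chunks
```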
Parameters
Inputs
Name
Display Name
Info
path
Files
The path to the spreadsheet file to load. Supports .xlsx and .xls files.
file_path
Server File Path
A Data object with a file_path property pointing to the server file or a Message object with a path to the file. Supersedes 'Path' but supports the same file types.
separator
Separator
Specify the separator to use between multiple outputs in Message format.
silent_errors
Silent Errors
If true, errors will not raise an exception.
delete_server_file_after_processing
Delete Server File After Processing
If true, the Server File Path will be deleted after processing.
ignore_unsupported_extensions
Ignore Unsupported Extensions
If true, files with unsupported extensions will not be processed.
ignore_unspecified_files
Ignore Unspecified Files
If true, Data with no file_path property will be ignored.
sheet_name
Sheet Name
Name of the sheet to load (default: first sheet).
max_rows
Max Rows
Maximum number of rows to load (0 = all rows).
include_index
Include Row Index
Include row index as a column.
text_key
Text Key
The key to use for the text column. Defaults to ‘text’.
enable_caching
Enable Caching
Cache loaded data to avoid reloading for subsequent queries.
cache_dir
Cache Directory
Directory to store cached data (default: system temp directory).
optimize_for_large_files
Optimize for Large Files
Use optimized loading for files with 100,000+ rows.
chunk_size
Chunk Size
Number of rows to process in chunks for large files.
Outputs
Name
Display Name
Info
data_list
Data List
The parsed spreadsheet content as a list of Data objects.
summary
Data Summary
Summary information about the loaded data including total rows, columns, and sample data.
dataframe
DataFrame
The spreadsheet content as a DataFrame object for SQL queries.
This component loads and parses child links from a root URL recursively. It crawls web pages up to a specified depth and extracts content in text or HTML format.
To Crawl URLs
Enter one or more URLs by clicking the + button
Set the Max Depth to control how many link levels to crawl
Configure output format (Text or HTML)
Parameters
Inputs
Name
Display Name
Info
urls
URLs
Enter one or more URLs to crawl recursively by clicking the '+' button.
max_depth
Max Depth
Controls how many ‘clicks’ away from the initial page the crawler will go: depth 1: only the initial page; depth 2: initial page + all pages linked directly from it; depth 3: initial page + direct links + links found on those pages. Note: This is about link traversal, not URL path depth.
prevent_outside
Prevent Outside
If enabled, only crawls URLs within the same domain as the root URL. This helps prevent the crawler from going to external websites.
use_async
Use Async
If enabled, uses asynchronous loading which can be significantly faster but might use more system resources.
format
Output Format
Use 'Text' to extract the text from the HTML, or 'HTML' for the raw HTML content.
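The Max Depth rule above (link traversal, not URL path depth) can be sketched as a breadth-first walk. The link graph below is an invented in-memory stand-in for real pages, so no HTTP is involved; the function is illustrative only.

```python
from collections import deque

# Invented link graph standing in for real pages: page -> linked pages.
links = {
    "root": ["a", "b"],
    "a": ["c"],
    "b": [],
    "c": ["d"],
}

def crawl(start, max_depth):
    # Breadth-first traversal by link count: depth 1 is only the start
    # page, depth 2 adds pages linked directly from it, and so on.
    seen = {start}
    queue = deque([(start, 1)])
    while queue:
        page, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return sorted(seen)

print(crawl("root", 1))  # only the initial page
print(crawl("root", 2))  # initial page plus directly linked pages
```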
This component uploads files to an Amazon S3 bucket. It supports two upload strategies: storing parsed data or uploading the original file as-is, with options for path prefixing and stripping.
To Upload Files to S3
Choose the strategy for uploading the file. By Data means that the source file is parsed and stored as LLM Controls data. By File Name means that the source file is uploaded as-is.
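The two strategies can be sketched as follows. A real upload would go through an S3 client such as boto3; here a plain dict stands in for the bucket so the strategy and prefix handling are visible, and the function name is invented.

```python
from pathlib import Path
import tempfile

bucket = {}  # dict standing in for an S3 bucket

def upload(path, strategy="By File Name", prefix=""):
    # "By Data": parse the file and store its text content.
    # "By File Name": store the original bytes unchanged.
    # An optional prefix is prepended to the object key.
    name = Path(path).name
    key = f"{prefix}{name}" if prefix else name
    if strategy == "By Data":
        bucket[key] = Path(path).read_text()
    else:
        bucket[key] = Path(path).read_bytes()
    return key

with tempfile.TemporaryDirectory() as d:
    f = Path(d) / "report.txt"
    f.write_text("quarterly numbers")
    key = upload(f, strategy="By Data", prefix="uploads/")
print(key)  # uploads/report.txt
```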