Vector databases store vector data, which backs AI workloads like chatbots and Retrieval Augmented Generation. Vector database components establish connections to existing vector databases or create in-memory vector stores for storing and retrieving vector data. Vector database components are distinct from memory components, which are built specifically for storing and retrieving chat messages from external databases.
This example uses the Astra DB vector store component. Your vector store component’s parameters and authentication may be different, but the document ingestion workflow is the same. A document is loaded from a local machine and chunked. The Astra DB vector store generates embeddings with the connected model component and stores them in the connected Astra DB database.
This vector data can then be retrieved for workloads like Retrieval Augmented Generation. The user’s chat input is embedded and compared to the vectors embedded during document ingestion for a similarity search. The results are output from the vector database component as a Data object and parsed into text. This text fills the {context} variable in the Prompt component, which informs the OpenAI model component’s responses.
Alternatively, connect the vector database component’s Retriever port to a retriever tool, and then to an agent component. This enables the agent to use your vector database as a tool and make decisions based on the available data.
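The ingestion and retrieval workflow can be illustrated with a minimal, self-contained sketch. It is not how the Astra DB component is implemented: the bag-of-words `embed` function is a toy stand-in for a real embedding model, and `store`, `retrieve`, `build_prompt`, and `PROMPT_TEMPLATE` are hypothetical names used only for illustration.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Ingestion: each chunk is embedded once and stored next to its text.
chunks = [
    "Astra DB stores vector embeddings for documents",
    "The Prompt component fills the context variable",
    "Bananas are rich in potassium",
]
store = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(query: str, k: int = 2) -> list:
    """Embed the query and return the k most similar stored chunks."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# Hypothetical prompt template with a {context} variable, as in the Prompt component.
PROMPT_TEMPLATE = "Answer using this context:\n{context}\n\nQuestion: {question}"

def build_prompt(question: str) -> str:
    """Fill {context} with the retrieved chunks, parsed into plain text."""
    context = "\n".join(retrieve(question))
    return PROMPT_TEMPLATE.format(context=context, question=question)
```

Calling `build_prompt("where are vector embeddings stored")` produces a prompt whose context section starts with the most similar chunk, which is the text a model component would receive.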
LLMC Vector DB helps you store and search through your personal documents in a way that feels fast, secure, and tailored just for you. Whether you’re uploading files, adding notes, or saving content from the web, everything is stored in a way that makes it easy to find later, without digging through folders.
Private and Secure
Your content is always separated from others. Everything you add is linked to your account only, ensuring complete privacy.
Organized Just for You
You’ll only see your own collections. No clutter. No mix-ups. It automatically shows the content that belongs to you, neatly filtered and personalized.
Flexible Setup
While you focus on your content, LLM Controls is configured to perform at its best, adapting to different needs and environments. Developers can still fine-tune things like speed, search preferences, and more if needed.
Seamless Content Capture
Just upload a document or paste a link, and LLM Controls takes care of the rest. It understands and organizes the content instantly, so you can find what you need later in just a few words.
This component implements a Vector Store using Astra DB with search capabilities. For more information, see the DataStax documentation.
Parameters
Inputs
| Name | Display Name | Info |
|------|--------------|------|
| token | Astra DB Application Token | The authentication token for accessing Astra DB. |
| environment | Environment | The environment for the Astra DB API Endpoint. For example, dev or prod. |
| database_name | Database | The database name for the Astra DB instance. |
| api_endpoint | Astra DB API Endpoint | The API endpoint for the Astra DB instance. This supersedes the database selection. |
| collection_name | Collection | The name of the collection within Astra DB where the vectors are stored. |
| keyspace | Keyspace | An optional keyspace within Astra DB to use for the collection. |
| embedding_choice | Embedding Model or Astra Vectorize | Choose an embedding model or use Astra vectorize. |
| embedding_model | Embedding Model | Specify the embedding model. Not required for Astra vectorize collections. |
| number_of_results | Number of Search Results | The number of search results to return. Default: 4. |
| search_type | Search Type | The search type to use. The options are Similarity, Similarity with score threshold, and MMR (Max Marginal Relevance). |
| search_score_threshold | Search Score Threshold | The minimum similarity score threshold for search results when using the Similarity with score threshold option. |
| advanced_search_filter | Search Metadata Filter | An optional dictionary of filters to apply to the search query. |
| autodetect_collection | Autodetect Collection | A boolean flag to determine whether to autodetect the collection. |
| content_field | Content Field | A field to use as the text content field for the vector store. |
| deletion_field | Deletion Based On Field | When provided, documents in the target collection with metadata field values matching the input metadata field value are deleted before new data is loaded. |
| ignore_invalid_documents | Ignore Invalid Documents | A boolean flag to determine whether to ignore invalid documents at runtime. |
| astradb_vectorstore_kwargs | AstraDBVectorStore Parameters | An optional dictionary of additional parameters for the AstraDBVectorStore. |
Outputs
| Name | Display Name | Info |
|------|--------------|------|
| vector_store | Vector Store | The Astra DB vector store instance configured with the specified parameters. |
| search_results | Search Results | The results of the similarity search as a list of Data objects. |
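The three search types listed for search_type differ in how they select results. The sketch below illustrates the general techniques on small 2-D vectors; it is not the component's implementation, and `similarity_search`, `mmr_search`, and the `lambda_mult` trade-off parameter are hypothetical names chosen for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def similarity_search(query, docs, k=4, score_threshold=None):
    """Return up to k document names by descending similarity.

    If score_threshold is set, drop results that score below it first.
    """
    scored = sorted(((cosine(query, vec), name) for name, vec in docs.items()), reverse=True)
    if score_threshold is not None:
        scored = [(s, n) for s, n in scored if s >= score_threshold]
    return [n for _, n in scored[:k]]

def mmr_search(query, docs, k=4, lambda_mult=0.5):
    """Max Marginal Relevance: trade relevance against redundancy with earlier picks."""
    selected = []
    candidates = dict(docs)
    while candidates and len(selected) < k:
        def mmr_score(name):
            relevance = cosine(query, candidates[name])
            redundancy = max((cosine(candidates[name], docs[s]) for s in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        del candidates[best]
    return selected

def unit(deg):
    """Unit vector at the given angle, used to build easy-to-reason-about test data."""
    rad = math.radians(deg)
    return [math.cos(rad), math.sin(rad)]

docs = {"doc_0": unit(0), "doc_5": unit(5), "doc_60": unit(60)}
query = unit(20)
```

With this data, plain similarity returns the two near-duplicate documents (`doc_5`, `doc_0`), while MMR returns `doc_5` and then `doc_60`, because `doc_0` adds little beyond what `doc_5` already covers.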
The Astra DB Vector Store component offers two methods for generating embeddings.
Embedding Model: Use your own embedding model by connecting an Embeddings component in LLM Controls.
Astra Vectorize: Use Astra DB’s built-in embedding generation service. When creating a new collection, choose the embeddings provider and model, including NVIDIA’s NV-Embed-QA model hosted by DataStax.
Important: The embedding model selection is made when creating a new collection and cannot be changed later.
The Astra DB component includes hybrid search, which is enabled by default. The component fields related to hybrid search are Search Query, Lexical Terms, and Reranker.
Search Query finds results by vector similarity.
Lexical Terms is a comma-separated string of keywords, like features, data, attributes, characteristics.
Reranker is the re-ranker model used in the hybrid search. The re-ranker model is nvidia/llama-3.2-nv.reranker.
Hybrid search performs a vector similarity search and a lexical search, compares the results of both searches, and then returns the most relevant results overall.
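To make the idea of combining two result sets concrete, here is a minimal sketch. It does not reproduce the reranker model: reciprocal rank fusion is swapped in as a simple, well-known way to merge two rankings, and the function names, the sample documents, and the precomputed vector ranking are all assumptions made for illustration.

```python
def lexical_search(lexical_terms: str, docs: dict) -> list:
    """Rank documents by how often the comma-separated terms occur in their text."""
    terms = [t.strip().lower() for t in lexical_terms.split(",") if t.strip()]
    scores = {doc_id: sum(text.lower().count(t) for t in terms) for doc_id, text in docs.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [doc_id for doc_id, score in ranked if score > 0]

def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Merge several rankings: each document earns 1 / (k + rank) per list it appears in."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)

docs = {
    "d1": "features and attributes of the data",
    "d2": "pricing table",
    "d3": "data characteristics",
}

# Assumed output of a vector similarity search, most similar first.
vector_ranking = ["d3", "d2", "d1"]
lexical_ranking = lexical_search("features, data, attributes, characteristics", docs)
hybrid_ranking = reciprocal_rank_fusion([vector_ranking, lexical_ranking])
```

Documents that rank well in both searches rise to the top of `hybrid_ranking`, while a document found by only one search stays lower.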
Important: To use hybrid search, your collection must be created with vector, lexical, and rerank capabilities enabled. These capabilities are enabled by default when you create a collection in a database in the AWS us-east-2 region. For more information, see the DataStax documentation.
To use hybrid search in the Astra DB component, do the following:
1. Click New Flow > RAG > Hybrid Search RAG.
2. In the OpenAI model component, add your OpenAI API key.
3. In the Astra DB vector store component, add your Astra DB Application Token.
4. In the Database field, select your database.
5. In the Collection field, select or create a collection with hybrid search capabilities enabled.
6. In the Playground, enter a question about your data, such as What are the features of my data? Your query is sent to two components: an OpenAI model component and the Astra DB vector database component. The OpenAI component contains a prompt for creating the lexical query from your input.
7. To view the keywords and questions the OpenAI component generates from your collection, in the OpenAI component, click Inspect output.
8. To view the DataFrame generated from the OpenAI component’s response, in the Structured Output component, click Inspect output. The DataFrame is passed to a Parser component, which parses the contents of the Keywords column into a string. This string of comma-separated words is passed to the Lexical Terms port of the Astra DB component. Note that the Search Query port of the Astra DB component is connected to the Chat Input component from step 6. This Search Query is vectorized, and both the Search Query and Lexical Terms content are sent to the reranker at the find_and_rerank endpoint. The reranker compares the vector search results against the string of terms from the lexical search. The highest-ranked results of your hybrid search are returned to the Playground.
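The parsing step that turns the Keywords column into a comma-separated string can be sketched as follows. The row layout (a list of dictionaries with a `Keywords` key) and the function name `keywords_to_lexical_terms` are assumptions for illustration; the actual DataFrame produced by the Structured Output component may be shaped differently.

```python
def keywords_to_lexical_terms(rows: list, column: str = "Keywords") -> str:
    """Join the values of one column into a single comma-separated string.

    Cells may hold one keyword or several separated by commas; duplicates are
    dropped while the first-seen order is kept.
    """
    terms = []
    for row in rows:
        for term in str(row.get(column, "")).split(","):
            term = term.strip()
            if term and term not in terms:
                terms.append(term)
    return ", ".join(terms)

# Hypothetical rows standing in for the DataFrame's contents.
rows = [
    {"Keywords": "features, data"},
    {"Keywords": "attributes"},
    {"Keywords": "data"},
]
lexical_terms = keywords_to_lexical_terms(rows)
```

The resulting string has the same shape as the Lexical Terms example given earlier: keywords separated by commas.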
This component implements a Cassandra Graph Vector Store with search capabilities.
Parameters
Inputs
| Name | Display Name | Info |
|------|--------------|------|
| database_ref | Contact Points / Astra Database ID | The contact points for the database or AstraDB database ID. Required. |
| username | Username | The username for the database. Leave this field empty for AstraDB. |
| token | Password / AstraDB Token | The user password for the database or AstraDB token. Required. |
| keyspace | Keyspace | The table Keyspace or AstraDB namespace. Required. |
| table_name | Table Name | The name of the table or AstraDB collection where vectors are stored. Required. |
| setup_mode | Setup Mode | The configuration mode for setting up the Cassandra table. The options are “Sync” or “Off”. Default: “Sync”. |
| cluster_kwargs | Cluster arguments | An optional dictionary of additional keyword arguments for the Cassandra cluster. |
| search_query | Search Query | The query string for similarity search. |
| ingest_data | Ingest Data | The list of data to be ingested into the vector store. |
| embedding | Embedding | The embedding model to use. |
| number_of_results | Number of Results | The number of results to return in similarity search. Default: 4. |
| search_type | Search Type | The search type to use. The options are “Traversal”, “MMR traversal”, “Similarity”, “Similarity with score threshold”, or “MMR (Max Marginal Relevance)”. Default: “Traversal”. |
| depth | Depth of traversal | The maximum depth of edges to traverse. Used for “Traversal” or “MMR traversal” search types. Default: 1. |
| search_score_threshold | Search Score Threshold | The minimum similarity score threshold for search results. Used for “Similarity with score threshold” search types. |
| search_filter | Search Metadata Filter | An optional dictionary of filters to apply to the search query. |
Outputs
| Name | Display Name | Info |
|------|--------------|------|
| vector_store | Vector Store | The Cassandra Graph vector store instance configured with the specified parameters. |
| search_results | Search Results | The results of the similarity search as a list of Data objects. |
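The effect of the depth parameter on the “Traversal” search type can be pictured with a depth-limited breadth-first walk. This is a conceptual sketch only, not the Cassandra Graph Vector Store's algorithm: `traverse`, the `edges` adjacency map, and the document names are hypothetical, and the seed documents are assumed to come from an initial similarity search.

```python
from collections import deque

def traverse(seeds: list, edges: dict, depth: int = 1) -> list:
    """Return the seeds plus every document reachable within `depth` edges.

    Results are listed in the order they are first discovered.
    """
    visited = {seed: 0 for seed in seeds}        # document -> distance from a seed
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue                              # do not follow edges past the limit
        for neighbor in edges.get(node, []):
            if neighbor not in visited:
                visited[neighbor] = dist + 1
                queue.append((neighbor, dist + 1))
    return list(visited)

# Hypothetical links between documents.
edges = {"intro": ["setup"], "setup": ["config"], "config": ["tuning"]}
```

With the default depth of 1, a search seeded at `intro` also returns `setup`; raising the depth to 2 pulls in `config` as well.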
This component creates a Chroma Vector Store with search capabilities. The Chroma DB component creates an ephemeral vector database for experimentation and vector storage.
To use this component in a flow, connect it to a component that outputs Data or DataFrame. This example splits text from a URL component, and computes embeddings with the connected OpenAI Embeddings component. Chroma DB computes embeddings by default, but you can connect your own embeddings model, as seen in this example.
In the Chroma DB component, in the Collection field, enter a name for your embeddings collection.
Optionally, to persist the Chroma database, in the Persist field, enter a directory to store the chroma.sqlite3 file. This example uses ./chroma-db to create a directory relative to where LLM Controls is running.
To load data and embeddings into your Chroma database, in the Chroma DB component, click Run component.
Tip: When loading duplicate documents, enable the Allow Duplicates option in Chroma DB if you want to store multiple copies of the same content, or disable it to automatically deduplicate your data.
To view the split data, in the Split Text component, click Inspect output.
To query your loaded data, open the Playground and query your database. Your input is converted to vector data and compared to the stored vectors in a vector similarity search.
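The behavior of the Allow Duplicates option can be illustrated with a small sketch. How the Chroma DB component actually detects duplicates is not specified here, so the content-hash comparison below is an assumption, and `ToyStore` is a hypothetical class rather than part of Chroma.

```python
import hashlib

class ToyStore:
    """In-memory stand-in for a vector store that can skip duplicate documents."""

    def __init__(self, allow_duplicates: bool = False):
        self.allow_duplicates = allow_duplicates
        self.documents = []
        self._seen = set()   # content hashes of documents already stored

    def add(self, texts: list) -> int:
        """Store the texts and return how many were actually added."""
        added = 0
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if not self.allow_duplicates and digest in self._seen:
                continue     # identical content is already stored
            self._seen.add(digest)
            self.documents.append(text)
            added += 1
        return added

deduplicated = ToyStore(allow_duplicates=False)
first_load = deduplicated.add(["alpha", "beta", "alpha"])
second_load = deduplicated.add(["beta"])

with_copies = ToyStore(allow_duplicates=True)
copies_load = with_copies.add(["alpha", "alpha"])
```

With duplicates disabled, reloading the same document leaves the store unchanged; with duplicates enabled, every copy is stored.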
The Local DB component is LLM Controls’ enhanced version of Chroma DB. The component adds a user-friendly interface with two modes (Ingest and Retrieve), automatic collection management, and built-in persistence in the LLM Controls cache directory.
The Ingest mode works similarly to Chroma DB, and persists your database to the LLM Controls cache directory. The LLM Controls cache directory location is specified in LLMC_CONFIG_DIR.
The Retrieve mode can query your Chroma DB collections.
For more information, see the Chroma documentation.
Parameters
Inputs
| Name | Type | Description |
|------|------|-------------|
| collection_name | String | The name of the Chroma collection. Default: “LLMC”. |
| persist_directory | String | Custom base directory to save the vector store. Collections are stored under $DIRECTORY/vector_stores/$COLLECTION_NAME. If not specified, it uses your system’s cache folder. |
| existing_collections | String | Select a previously created collection to search through its stored data. |
| embedding | Embeddings | The embedding function to use for the vector store. |
| allow_duplicates | Boolean | If false, will not add documents that are already in the Vector Store. |
| search_type | String | Type of search to perform: “Similarity” or “MMR”. |
| ingest_data | Data/DataFrame | Data to store. It is embedded and indexed for semantic search. |
| search_query | String | Enter text to search for similar content in the selected collection. |
| number_of_results | Integer | Number of results to return. Default: 10. |
| limit | Integer | Limit the number of records to compare when Allow Duplicates is False. |
Outputs
| Name | Type | Description |
|------|------|-------------|
| vector_store | Chroma | A local Chroma vector store instance configured with the specified parameters. |
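The storage layout described for persist_directory ($DIRECTORY/vector_stores/$COLLECTION_NAME) can be sketched as a small path helper. The function name `resolve_collection_path` is hypothetical, and the fallback order (LLMC_CONFIG_DIR, then a `~/.cache/llmc` folder) is an assumption, since the exact cache location depends on your system and configuration.

```python
import os
from pathlib import Path
from typing import Optional

def resolve_collection_path(collection_name: str = "LLMC",
                            persist_directory: Optional[str] = None) -> Path:
    """Build the directory where a collection would be stored."""
    if persist_directory:
        base = Path(persist_directory)
    else:
        # Assumption: use LLMC_CONFIG_DIR if set, else a hypothetical ~/.cache/llmc folder.
        base = Path(os.environ.get("LLMC_CONFIG_DIR") or Path.home() / ".cache" / "llmc")
    return base / "vector_stores" / collection_name

notes_path = resolve_collection_path("notes", "/tmp/db")
default_path = resolve_collection_path(persist_directory="./chroma-db")
```

Here `notes_path` points at `/tmp/db/vector_stores/notes`, and `default_path` ends in the default collection name, `LLMC`.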
This component performs Graph RAG (Retrieval Augmented Generation) traversal in a vector store, enabling graph-based document retrieval. For more information, see the Graph RAG documentation. For an example flow, see the Graph RAG template.
Parameters
Inputs
| Name | Display Name | Info |
|------|--------------|------|
| embedding_model | Embedding Model | Specify the embedding model. This is not required for collections embedded with Astra vectorize. |
| vector_store | Vector Store Connection | Connection to the vector store. |
| edge_definition | Edge Definition | Edge definition for the graph traversal. For more information, see the Graph RAG documentation. |
| strategy | Traversal Strategies | The strategy to use for graph traversal. Strategy options are dynamically loaded from available strategies. |
| search_query | Search Query | The query to search for in the vector store. |
| graphrag_strategy_kwargs | Strategy Parameters | Optional dictionary of additional parameters for the retrieval strategy. For more information, see the strategy documentation. |
Outputs
| Name | Type | Description |
|------|------|-------------|
| search_results | List[Data] | Results of the graph-based document retrieval as a list of Data objects. |
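One way to think about an edge definition is as a rule that links documents through their metadata. The sketch below assumes an edge is a pair of metadata field names (source field, target field) and that a document links to every other document whose target field value appears in its source field. This format is an assumption for illustration only; check the Graph RAG documentation for the syntax the component actually expects. `build_edges` and the sample metadata are hypothetical.

```python
def build_edges(docs: dict, edge_definition: tuple) -> dict:
    """Derive document-to-document links from metadata.

    edge_definition is (source_field, target_field): a document points at every
    other document whose target_field value is listed in its source_field.
    """
    source_field, target_field = edge_definition
    links = {doc_id: [] for doc_id in docs}
    for src_id, src_meta in docs.items():
        src_values = src_meta.get(source_field, [])
        if not isinstance(src_values, list):
            src_values = [src_values]            # allow a single value as well as a list
        for dst_id, dst_meta in docs.items():
            if dst_id != src_id and dst_meta.get(target_field) in src_values:
                links[src_id].append(dst_id)
    return links

# Hypothetical document metadata.
docs = {
    "a": {"id": "a", "mentions": ["b", "c"]},
    "b": {"id": "b", "mentions": ["c"]},
    "c": {"id": "c", "mentions": []},
}
links = build_edges(docs, ("mentions", "id"))
```

The resulting `links` map is the kind of structure a traversal strategy would walk, starting from the documents that best match the search query.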
This component implements a Vector Store using HCD.
To use the HCD vector store, add your deployment’s collection name, username, password, and HCD Data API endpoint. The endpoint must be formatted like http[s]://DOMAIN_NAME or IP_ADDRESS[:port], for example, http://192.0.2.250:8181. Replace DOMAIN_NAME or IP_ADDRESS with the domain name or IP address of your HCD Data API connection.
To use the HCD vector store for embeddings ingestion, connect it to an embeddings model and a file loader.
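The HCD Data API endpoint format, http[s]://DOMAIN_NAME or IP_ADDRESS[:port], can be checked before you paste it into the component. The sketch below uses only the Python standard library; `is_valid_hcd_endpoint` is a hypothetical helper, and rejecting endpoints that carry a path, query, or fragment is an assumption based on the format shown, not a documented rule.

```python
from urllib.parse import urlsplit

def is_valid_hcd_endpoint(endpoint: str) -> bool:
    """Check that an endpoint looks like http[s]://HOST[:port]."""
    try:
        parts = urlsplit(endpoint)
        parts.port                     # raises ValueError for a malformed port
    except ValueError:
        return False
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    # Assumption: the endpoint is a bare origin with no path, query, or fragment.
    return parts.path in ("", "/") and not parts.query and not parts.fragment
```

For example, `http://192.0.2.250:8181` passes the check, while `192.0.2.250:8181` fails because the scheme is missing.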
Parameters
Inputs
| Name | Display Name | Info |
|------|--------------|------|
| collection_name | Collection Name | The name of the collection within HCD where the vectors will be stored. Required. |
| username | HCD Username | Authentication username for accessing HCD. Default is “hcd-superuser”. Required. |
| password | HCD Password | Authentication password for accessing HCD. Required. |
| api_endpoint | HCD API Endpoint | API endpoint URL for the HCD service. Required. |
| search_input | Search Input | Query string for similarity search. |
| ingest_data | Ingest Data | Data to be ingested into the vector store. |
| namespace | Namespace | Optional namespace within HCD to use for the collection. Default is “default_namespace”. |
| ca_certificate | CA Certificate | Optional CA certificate for TLS connections to HCD. |
| metric | Metric | Optional distance metric for vector comparisons. Options are “cosine”, “dot_product”, “euclidean”. |
| batch_size | Batch Size | Optional number of records to process in a single batch. |
| bulk_insert_batch_concurrency | Bulk Insert Batch Concurrency | Optional concurrency level for bulk insert operations. |
| bulk_insert_overwrite_concurrency | Bulk Insert Overwrite Concurrency | Optional concurrency level for bulk insert operations that overwrite existing data. |
| bulk_delete_concurrency | Bulk Delete Concurrency | Optional concurrency level for bulk delete operations. |
| setup_mode | Setup Mode | Configuration mode for setting up the vector store. Options are “Sync”, “Async”, “Off”. Default is “Sync”. |
| pre_delete_collection | Pre Delete Collection | Boolean flag to determine whether to delete the collection before creating a new one. |
| metadata_indexing_include | Metadata Indexing Include | Optional list of metadata fields to include in the indexing. |
| embedding | Embedding or Astra Vectorize | Allows either an embedding model or an Astra Vectorize configuration. |
| metadata_indexing_exclude | Metadata Indexing Exclude | Optional list of metadata fields to exclude from the indexing. |
| collection_indexing_policy | Collection Indexing Policy | Optional dictionary defining the indexing policy for the collection. |
| number_of_results | Number of Results | Number of results to return in similarity search. Default is 4. |
| search_type | Search Type | Search type to use. Options are “Similarity”, “Similarity with score threshold”, “MMR (Max Marginal Relevance)”. Default is “Similarity”. |
| search_score_threshold | Search Score Threshold | Minimum similarity score threshold for search results. Default is 0. |
| search_filter | Search Metadata Filter | Optional dictionary of filters to apply to the search query. |
Outputs
| Name | Type | Description |
|------|------|-------------|
| vector_store | HyperConvergedDatabaseVectorStore | The HCD vector store instance. |
| search_results | List[Data] | The results of the similarity search as a list of Data objects. |
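The three metric options for the HCD component (“cosine”, “dot_product”, “euclidean”) correspond to standard ways of comparing vectors. The sketch below shows the textbook definitions so you can see how they differ; it does not reproduce how HCD converts these values into similarity scores, and the `METRICS` lookup table is a hypothetical convenience.

```python
import math

def dot_product(a, b):
    """Sum of element-wise products; grows with both alignment and magnitude."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Dot product of the normalized vectors; depends only on direction."""
    norm = math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    return dot_product(a, b) / norm if norm else 0.0

def euclidean(a, b):
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical lookup keyed by the option names from the Metric parameter.
METRICS = {"cosine": cosine, "dot_product": dot_product, "euclidean": euclidean}

same_direction = ([3.0, 4.0], [6.0, 8.0])   # same direction, different length
orthogonal = ([1.0, 0.0], [0.0, 1.0])
```

For the `same_direction` pair, cosine reports a perfect match while the dot product and Euclidean distance still reflect the difference in length, which is why the choice of metric matters when embeddings are not normalized.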
This component facilitates a Weaviate Vector Store setup, optimizing text and document indexing and retrieval. For more information, see the Weaviate Documentation.