When you ask the AI a question and it replies “I don’t know” or gives an off-topic answer, that’s where the Knowledge Base comes in. By converting your internal documents into a form the AI can reference directly, the Knowledge Base enables accurate, document-grounded answers.

Example

“What is our company’s annual leave policy?”
| State | Behavior | Result |
| --- | --- | --- |
| No Knowledge Base | Answers from general AI knowledge | Inaccurate or “I don’t know” |
| Knowledge Base connected | Searches hr-policy.pdf for related content, then answers | Accurate policy + citation |
Each Knowledge Base can use a different Document Processing Profile with its own extraction method and chunking strategy. See Document Processing Profile Selection below for details.
Knowledge Base list

RAG Pipeline

Uploaded documents go through this pipeline before becoming searchable.
| Stage | Description |
| --- | --- |
| Apply Profile | Applies the extraction/chunking strategy from the KB’s document processing profile |
| Extract Text | Extracts text from documents (default, OCR, LLM Vision, etc., depending on profile) |
| Chunking | Splits long documents into search-friendly sizes (fixed-size or semantic) |
| Preserve Tables | Keeps tables intact in adjacent chunks instead of splitting (per profile) |
| Preserve Context | Adds an LLM-generated summary of the document context to each chunk (per profile) |
| Embedding | Converts text to high-dimensional vectors |
| Similarity Search | Finds chunks most similar to the question |
| LLM Response Generation | Generates an answer using the retrieved documents as context |
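The retrieval half of the pipeline (embedding, similarity search) can be sketched with a toy bag-of-words embedding. This is an illustrative stand-in, not the product's actual implementation; `embed`, `cosine`, and `search` are hypothetical names.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a real embedding model: bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(question, index, top_k=3):
    """Similarity search: rank stored chunks against the question vector."""
    q = embed(question)
    ranked = sorted(index, key=lambda c: cosine(q, c["vector"]), reverse=True)
    return ranked[:top_k]

# Index two pre-chunked snippets, then retrieve the best match for a question.
chunks = [
    "Employees receive 15 days of annual leave per year.",
    "Book business travel through the internal portal.",
]
index = [{"text": c, "vector": embed(c)} for c in chunks]
hits = search("How many days of annual leave do we get?", index, top_k=1)
```

In the real pipeline the retrieved chunks would then be passed as context to the LLM for response generation.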

Creating a Knowledge Base

1. Create a new Knowledge Base

In Workspace > Knowledge Base, click the + button at the top-right.
Knowledge Base creation form
| Field | Description | Example |
| --- | --- | --- |
| Name | KB name (required) | “HR Policy 2024” |
| Description | Purpose and content (required) | “HR team policies and guidelines” |
| Access | Public/Private and groups/organizational units | Public, or restricted to specific groups/organizations |
| Document Processing Profile | Extraction/chunking strategy applied to this KB | “Default Extraction”, “LLM Vision High Precision”, etc. |
2. Upload documents

Add documents to the new Knowledge Base. Click the Add Content (+) button to choose an upload method.
Knowledge Base detail
Upload methods:
| Method | Description |
| --- | --- |
| Drag and Drop | Drag files onto the upload area |
| Upload Files | Select “Upload Files” from the “Add Content” menu |
| Upload Directory | Select “Upload Directory” — bulk-upload all files in a folder |
| Add Text | Write text directly to add as content |
| Cloud Storage | Google Drive, OneDrive, SharePoint (visible when the admin has configured it) |
3. Wait for processing

Uploaded documents go through text extraction → chunking → embedding → indexing automatically. A real-time notification appears when processing completes.
Files that take more than 10 minutes to process are automatically marked as failed; in that case, delete the file and re-upload it.
Bulk uploads:
  • 5+ files or directory uploads switch to batch mode
  • A progress bar shows steps (upload → processing) at the top, with failures shown in red
  • 3 files are processed in parallel
  • Progress state persists across page refreshes
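The batch behavior above (3 files in flight, failures collected rather than aborting the batch) can be sketched with a thread pool. `process_file` is a hypothetical placeholder for the extract/chunk/embed/index work:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_file(name):
    """Hypothetical placeholder for extract -> chunk -> embed -> index."""
    if name.endswith(".bad"):
        raise ValueError(f"extraction failed: {name}")
    return {"file": name, "status": "done"}

def bulk_upload(files, parallelism=3):
    """Process files with at most `parallelism` in flight; collect failures
    instead of aborting the whole batch."""
    results, failures = [], []
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        futures = {pool.submit(process_file, f): f for f in files}
        for fut in as_completed(futures):
            try:
                results.append(fut.result())
            except Exception as exc:
                failures.append((futures[fut], str(exc)))
    return results, failures

done, failed = bulk_upload(["a.pdf", "b.docx", "c.bad", "d.txt", "e.md"])
```

Failures surface per-file (shown in red in the UI) while the rest of the batch completes.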
4. Verify and validate

Click a document to view extracted text and connect it to an agent to validate retrieval quality. Toggle Summary in the file list to see the AI-generated document summary.

Supported File Formats

| Category | Formats | Max Size |
| --- | --- | --- |
| Documents | PDF, DOCX, PPTX, TXT, MD | 50MB |
| Spreadsheets | XLSX, CSV | 20MB |
| Web | HTML | - |
| Code | PY, JS, TS, JSON, YAML | 10MB |
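A pre-upload check against the format/size table could look like the following sketch; the `LIMITS_MB` map and `validate_upload` are hypothetical, mirroring the documented limits:

```python
# Hypothetical limits map derived from the supported-formats table (MB; None = no limit).
LIMITS_MB = {
    "pdf": 50, "docx": 50, "pptx": 50, "txt": 50, "md": 50,
    "xlsx": 20, "csv": 20,
    "html": None,
    "py": 10, "js": 10, "ts": 10, "json": 10, "yaml": 10,
}

def validate_upload(filename, size_bytes):
    """Return (ok, reason) for a candidate upload."""
    ext = filename.rsplit(".", 1)[-1].lower()
    if ext not in LIMITS_MB:
        return False, f"unsupported format: .{ext}"
    limit = LIMITS_MB[ext]
    if limit is not None and size_bytes > limit * 1024 * 1024:
        return False, f".{ext} files are limited to {limit}MB"
    return True, "ok"
```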

Dynamic Filters

Dynamic filters let you classify documents in a KB by metadata and automatically narrow the search scope.
For internal mechanics of dynamic filters — Manual vs. AI comparison, 5-step search flow, and more — see the Dynamic Filters Deep-Dive.

Defining the Filter Schema

Click “Add Filter” in Knowledge Base settings to define filter fields.
Filter schema definition
| Setting | Description | Example |
| --- | --- | --- |
| Name | Filter field name | “Department”, “Year” |
| Type | Data type | Enum, Collection, Number, Date |
| Options | List of allowed values (Enum/Collection types) | “Finance, HR, Engineering” |
| Description | Description so the AI understands the filter | “Filter by the document’s department” |
| Required | Marks files missing this field with a warning | Required check |

Filter Types

| Type | Description | Slot Limit |
| --- | --- | --- |
| Enum | Single selection from predefined options | Max 4 |
| Collection | Multiple selections from predefined options | Max 4 |
| Number | Integer filter | Max 2 |
| Date | Date range filter | Max 2 |
| Document Type (doc_type) | Document type auto-classified from chunk content (policy/guide/report/form, etc.) | Auto |
The Document Type (doc_type) filter is a system filter — the LLM auto-classifies based on chunk content. Without entering metadata, you can automatically separate search scope by document format — even when policy docs and manuals coexist in the same KB, the agent picks the right type for each question.

Per-File Metadata Settings

After defining the filter schema, set metadata values per file. The metadata state is shown by color in the file list.
File metadata input
| Color | Meaning |
| --- | --- |
| Green | All filter fields have values |
| Yellow | Only some fields are set |
| Orange | A required field is empty |
| Gray border | No metadata set |
| Purple spinner | AI extraction in progress |
When metadata changes, the vector index is updated automatically. Existing vectors are kept — no re-embedding required.
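The reason this is cheap can be shown with a minimal sketch of a chunk index: a metadata edit rewrites only the filter payload on each record, and the stored embedding is never recomputed. The record shape and `update_metadata` helper are illustrative, not the actual storage schema:

```python
# Hypothetical chunk records: one embedding plus a filter-metadata payload each.
index = [
    {"file": "hr-2024.pdf", "vector": [0.1, 0.9], "metadata": {"department": "HR"}},
    {"file": "hr-2024.pdf", "vector": [0.3, 0.7], "metadata": {"department": "HR"}},
]

def update_metadata(index, filename, new_fields):
    """Merge new filter values into each matching record; embeddings untouched."""
    for record in index:
        if record["file"] == filename:
            record["metadata"] = {**record["metadata"], **new_fields}

vectors_before = [list(r["vector"]) for r in index]
update_metadata(index, "hr-2024.pdf", {"year": 2024})
```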

AI Auto-Extraction

When you set an extraction prompt on the filter schema, the LLM analyzes document content and filename at upload time and auto-extracts metadata.
1. Set extraction mode

Click the Manual / AI toggle in the filter schema and switch to AI mode.
2. Write the extraction prompt

Write an extraction prompt for each filter. Example: “Extract the country name from the filename”
3. Pick an AI model

Select the LLM model for extraction.
4. Run extraction

Runs automatically on upload, or run single/all extraction manually.
| Method | Description |
| --- | --- |
| Auto-extraction | Runs automatically at upload (when AI mode is active) |
| Single extraction | Click the extract button on a file’s metadata edit screen |
| Bulk extraction | Re-extract metadata for all files at once |
You can use “filename” as a condition in the extraction prompt. Example: “If the filename starts with [XX] (country code), extract that country code”
When a connected KB has dynamic filters, the AI auto-infers filter conditions from the user’s question to narrow search scope.
Q: Show me the Finance team's 2024 policies
→ Filters auto-applied: department=Finance, year=2024
→ Searches only matching documents
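The narrowing step above amounts to a metadata pre-filter applied before any vector similarity is computed. A minimal sketch, with illustrative chunk records and a hypothetical `apply_filters` helper:

```python
# Hypothetical pre-filtered chunk set for a KB with department/year filters.
chunks = [
    {"text": "2024 Finance expense policy", "metadata": {"department": "Finance", "year": 2024}},
    {"text": "2023 Finance expense policy", "metadata": {"department": "Finance", "year": 2023}},
    {"text": "2024 HR leave policy",        "metadata": {"department": "HR", "year": 2024}},
]

def apply_filters(chunks, filters):
    """Keep only chunks whose metadata matches every inferred condition."""
    return [c for c in chunks
            if all(c["metadata"].get(k) == v for k, v in filters.items())]

# Conditions the AI inferred from "Show me the Finance team's 2024 policies".
candidates = apply_filters(chunks, {"department": "Finance", "year": 2024})
```

Vector search then runs over `candidates` only, which both speeds up retrieval and removes off-scope matches.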

Tool Description

The tool description is an AI-only description that tells the agent when and in what situations to use this Knowledge Base.
Examples:
  • “Use when there are questions about company HR policies and internal guidelines.”
  • “Reference for HR-related questions like leave, benefits, and travel policies.”
If no tool description is set, the KB’s general description is used instead. We recommend writing detailed tool descriptions so the AI can pick the right Knowledge Base among many.
AI auto-generation: click the auto-generate button next to the tool description field, and the AI drafts a description based on the KB name, description, and file list.

Knowledge Base Management

Document Management

| Action | Method |
| --- | --- |
| Add document | Add Content (+) button or drag-and-drop |
| Delete document | Select a document and click delete |
| View content | Click a document to preview the extracted text |
| Search | Search by filename (server-side, fast even on large KBs) |
| Sort | Newest (default) / Oldest / By name |
The file list uses infinite scroll to load 50 entries at a time. File state badges:
| Badge | Meaning |
| --- | --- |
| Failed (red) | Processing failed — click Retry to retry |
| Processing (orange) | Processing in progress |
| Summary (toggle) | Processed + summary available — click to view inline summary |
Re-uploading a file with the same name shows a duplicate confirmation dialog.
| Option | Behavior |
| --- | --- |
| Overwrite | Delete the existing file and replace |
| Skip | Skip the duplicate, keep the existing file |
| Cancel | Cancel upload |

Reindexing

To rebuild the vector index for the entire KB, run reindex from Admin Settings > Documents. This is admin-only and processes all KBs at once. When you edit and save an individual file’s content, only that file is automatically re-processed.

Using in Chat

Reference directly during chat with @kb-name.
@hr-policy What is the leave application process?
Click citation numbers in the AI response to view the source content.

Document Processing Profile Selection

New feature — Apply different document processing strategies (extraction engine, chunking method, table preservation) per Knowledge Base.
Choose one of the profiles defined by the admin in Admin > Settings > Documents. If you don’t pick one, the default profile applies.

Profile Use Cases

| Knowledge Base | Recommended Profile | Reason |
| --- | --- | --- |
| HR policy (text PDF) | Default Extraction | Plain text, no extra cost |
| Financial report (many tables) | Table Preservation enabled | Improves table data search accuracy |
| Scanned documents (image PDF) | LLM Vision | Accurately extracts text inside images |
| Technical docs (long reports) | Semantic Chunking + Context Preservation | Topic-based separation + preserved opening context |
Only admins can create profiles. If a needed profile is missing, ask your admin. See Admin > Settings > Documents for profile setup.

Advanced Settings

Adjust document processing and search parameters in admin settings.
These settings are global defaults for KBs without a Document Processing Profile. KBs with a profile use the profile settings instead.

Document Processing Options

| Setting | Description | Default |
| --- | --- | --- |
| Chunk Size | Document split unit (characters) | 1000 |
| Chunk Overlap | Overlapping characters between chunks | 100 |
| OCR Enabled | Extract text from images | Enabled |
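To see how chunk size and overlap interact, here is a sketch of the start/end offsets that fixed-size chunking with overlap would produce; `chunk_bounds` is an illustrative helper, not the product's code:

```python
def chunk_bounds(n_chars, size=1000, overlap=100):
    """Start/end offsets produced by fixed-size chunking with overlap.
    Each chunk after the first begins `overlap` characters before the
    previous chunk's end, so boundary context appears in both chunks."""
    bounds, start = [], 0
    step = size - overlap
    while start < n_chars:
        bounds.append((start, min(start + size, n_chars)))
        if start + size >= n_chars:
            break
        start += step
    return bounds

bounds = chunk_bounds(2500)  # a 2500-character document with the defaults
```

With the defaults, each new chunk starts 900 characters after the previous one, so a sentence split at a boundary is still fully contained in at least one chunk.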

Content Extraction Engine

Choose the engine for extracting text from documents in admin settings.
| Engine | Strengths | Best For |
| --- | --- | --- |
| Default (PyPDF/Langchain) | No setup needed | Plain-text PDF, DOCX |
| Tika | Server required, supports many formats | Mixed file formats |
| Docling | Server required | Complex layout documents |
| Azure Document Intelligence | Azure subscription required, high-precision OCR | Scanned documents, table-heavy PDFs |
| Google Document AI | GCP subscription required | Documents with embedded images |
| Mistral OCR | Mistral API required | PDF OCR |
| LLM Vision | Vision LLM-based, high precision | Complex layouts, charts |

Embedding Engine

| Engine | Strengths |
| --- | --- |
| Local (SentenceTransformer) | No external transmission, strong security |
| OpenAI | High quality, API costs |
| Azure OpenAI | Optimized for enterprise |
| Ollama | Local server, custom models |

Search Settings

Search settings have two layers — global (admin) and per-KB.
| Setting | Default | Description |
| --- | --- | --- |
| Top K | 10 | Chunks to retrieve via vector search |
| Reranker Top K | 3 | Final chunks after reranking |
| Reranker Threshold | 0.1 | Minimum reranker score (lower = more pass through) |
Override search settings per Knowledge Base. Click the Search Settings icon on the KB edit screen. Leaving values empty applies the global admin settings.
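The fallback behavior can be sketched as a simple settings-resolution step; the field names and `resolve_search_settings` helper are illustrative:

```python
# Global admin defaults, matching the table above.
GLOBAL_DEFAULTS = {"top_k": 10, "reranker_top_k": 3, "reranker_threshold": 0.1}

def resolve_search_settings(kb_overrides):
    """Per-KB values, when set, win; empty (None) fields fall back to the
    global admin defaults."""
    return {
        key: kb_overrides.get(key) if kb_overrides.get(key) is not None else default
        for key, default in GLOBAL_DEFAULTS.items()
    }

# This KB overrides Top K only; the other fields were left empty.
settings = resolve_search_settings({"top_k": 25, "reranker_threshold": None})
```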

Per-KB Document Summary Settings

Control document summary generation per KB. Configure under the “File Summary” section in the search settings modal.
| Setting | Description | Default |
| --- | --- | --- |
| Enable File Summary | Auto-generate AI summary on file processing completion | On |
| Summary Model | LLM model for summarization | Uses Task Model |
For files with summaries, click the “Summary” toggle in the list to view the summary inline.

When question generation is enabled, the LLM pre-generates “questions a user might ask to find this content” for each chunk and stores them as separate vectors.
| State | Search Method | Effect |
| --- | --- | --- |
| Disabled | Content vectors only | Standard search |
| Enabled | Weighted sum of content + question vectors | Improved accuracy with user-question-style phrasing |
Enabling question generation adds LLM calls during document processing. Processing time and cost may increase.
Configure question generation per KB independently of the global setting. Use the “Question Generation” section in the search settings modal to enable/disable and select the LLM model.
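The weighted-sum scoring can be sketched as follows. The 0.3 question weight, the record shape, and the choice of taking the best-matching question vector are illustrative assumptions, not the documented algorithm:

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def score(query_vec, chunk, question_weight=0.3):
    """Weighted sum of content-vector similarity and the best
    question-vector similarity (weight is a hypothetical choice)."""
    content = cosine(query_vec, chunk["content_vec"])
    if not chunk["question_vecs"]:
        return content
    best_q = max(cosine(query_vec, qv) for qv in chunk["question_vecs"])
    return (1 - question_weight) * content + question_weight * best_q

# A chunk whose generated question matches the query better than its content.
chunk = {"content_vec": [1.0, 0.0], "question_vecs": [[0.0, 1.0]]}
s = score([0.0, 1.0], chunk)
```

Because user queries are phrased as questions, matching against pre-generated question vectors lifts chunks whose raw content would otherwise score poorly.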

Best Practices

Document Preparation

  1. Clean format: Distinguish titles and subheadings clearly with consistent styling
  2. Keep current: Update documents regularly and remove old ones
  3. Right size: Split large documents by topic and group related content together

Knowledge Base Composition

  1. Separate by topic: Create separate KBs for “HR Policy”, “IT Guide”, “Product Manual”, etc.
  2. Granular access: Manage sensitive info separately and restrict access by department
  3. Write tool descriptions: Detailed tool descriptions help agents auto-pick the right KB

Use Cases

Build Knowledge Bases of HR policy, work manuals, and IT guides so new hires can adapt quickly by asking the AI.
  • Knowledge Bases: “HR Policy”, “Work Manual”, “IT Usage Guide”
  • Connect 3 KBs to the agent
  • Differentiate KB purposes via tool descriptions
Build Knowledge Bases of product manuals, FAQs, and technical docs to provide accurate answers to customer inquiries.
  • Dynamic filters: filter by product name, version
  • Agent: customer-support system prompt + Knowledge Base connection
  • Use citation display to ensure trust
Classify per-department documents with dynamic filters and search the right policies by department.
  • Dynamic filters: department, year, document type
  • AI auto-extraction for metadata
  • Access control to protect sensitive documents

FAQ

By default, there’s no limit on file count or capacity. Admins can set limits via environment variables.
When a file with the same name is detected, a duplicate confirmation dialog appears with Overwrite, Skip, or Cancel options.
The default extraction engine supports OCR, but for scanned documents or image-heavy PDFs, Azure Document Intelligence or Google Document AI engines are more accurate. Check the extraction engine setting with your admin.
No re-embedding is needed. Metadata changes only update the vector index’s filter fields; existing vectors stay intact, so the update is processed quickly.
When multiple KBs are connected, search settings merge as follows:
  • Top K, Reranker Top K: Use the largest value across KBs
  • Reranker Threshold: Use the lowest value across KBs (more results pass)
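The merge rule above can be sketched directly; `merge_search_settings` is an illustrative helper:

```python
def merge_search_settings(kb_settings):
    """Merge rule for multiple connected KBs: take the largest Top K values
    and the lowest threshold, so no KB's results are cut off."""
    return {
        "top_k": max(s["top_k"] for s in kb_settings),
        "reranker_top_k": max(s["reranker_top_k"] for s in kb_settings),
        "reranker_threshold": min(s["reranker_threshold"] for s in kb_settings),
    }

merged = merge_search_settings([
    {"top_k": 10, "reranker_top_k": 3, "reranker_threshold": 0.1},
    {"top_k": 20, "reranker_top_k": 5, "reranker_threshold": 0.05},
])
```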
Existing documents’ vectors are not auto-reprocessed when the profile changes. To apply the new profile to existing documents, delete and re-upload them, or run a full reindex from admin settings.
LLM call counts per processing option:
  • LLM Vision: page count × ~2 LLM calls (extraction + boundary correction)
  • Context Preservation: chunk count × 1 LLM call
For a 10-page PDF producing 20 chunks: LLM Vision (19 calls) + Context Preservation (20 calls) ≈ 39 LLM calls. Bulk uploading large documents can cost a lot, so use these options selectively for important docs.
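A rough cost estimator consistent with the worked example above. Reading "~2× pages" as one extraction call per page plus one boundary-correction call per page boundary reproduces the 19-call figure; that breakdown is an assumption, and `estimate_llm_calls` is a hypothetical helper:

```python
def estimate_llm_calls(pages, chunks, llm_vision=True, context_preservation=True):
    """Rough LLM call count for document processing options.
    Assumption: LLM Vision = per-page extraction + per-boundary correction
    (pages - 1); Context Preservation = one summary call per chunk."""
    calls = 0
    if llm_vision:
        calls += pages + max(pages - 1, 0)
    if context_preservation:
        calls += chunks
    return calls

total = estimate_llm_calls(pages=10, chunks=20)  # the worked example above
```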

Related pages

  • Dynamic Filters Deep-Dive: filter internals, Manual vs. AI, the 5-step search flow
  • Knowledge Graph: connect Knowledge Bases + glossaries + DBs into one graph
  • Agents: connect a Knowledge Base to an agent
  • Glossary: improve AI understanding by defining domain terms