When you ask the AI to “organize the customer email list”, there’s a real risk of personal information being exposed or leaked. Guardrails automatically validate AI conversation inputs and outputs to prevent these risks.

Example

“The customer’s inquiry email is hong@company.com and the card number is 1234-5678-9012-3456”
| State | Behavior | Result |
| --- | --- | --- |
| No guardrail | AI processes as-is | PII may be included in the response |
| Guardrail applied (masking) | Sensitive info auto-masked | Becomes h***@***.com, ****-****-****-3456 |

What is a Guardrail?

A guardrail combines rule-based detection and LLM-based judgment to ensure conversation safety.
Guardrail detail

Key Features

| Feature | Description |
| --- | --- |
| PII detection | Detects PII such as emails, credit card numbers, IP addresses, MAC addresses, URLs, and API keys |
| Custom patterns | Detects user-defined patterns via regex (employee ID, document number, etc.) |
| Banned words | Filters messages containing specific words/phrases |
| LLM-based judgment | Semantic content validation using AI (LLM-as-a-Judge) |
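
To make the rule-based half concrete, here is a minimal sketch of regex-driven PII detection. The pattern set and function names are illustrative assumptions, not the product's actual implementation.

```python
import re

# Illustrative patterns only -- a production guardrail would use far more
# robust expressions (and Luhn validation for card numbers).
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "credit_card": re.compile(r"\b(?:\d{4}-){3}\d{4}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def detect_pii(text: str) -> dict:
    """Return every match found in `text`, grouped by PII type."""
    return {
        name: pattern.findall(text)
        for name, pattern in PII_PATTERNS.items()
        if pattern.search(text)
    }
```

A hit from any pattern is what triggers the processing strategy (mask, redact, etc.) described later in this page.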

Use Cases

  • PII protection: Prevent customer emails, phone numbers, and credit card info from being exposed to the AI
  • Security hardening: Detect sensitive credentials like API keys, passwords, auth tokens
  • Content filtering: Block inappropriate expressions or prohibited topics
  • Compliance: Industry-specific regulations like GDPR, PIPA

Guardrail List

In Workspace > Guardrails, view all guardrails.
Guardrail list
| Element | Description |
| --- | --- |
| Name | Guardrail identifier name |
| Description | Guardrail purpose |
| LLM badge | Indicates LLM-based detection is enabled |
| Author | User who created the guardrail |
| Modified | Last modified time |

Creating a Guardrail

Step 1: Enter basic info

In Workspace > Guardrails, click the + button at the top-right.
Guardrail creation form
| Field | Description | Example |
| --- | --- | --- |
| Name | Guardrail name | "Customer Info Protection" |
| Description | Purpose | "Prevent customer PII leakage" |
| Visibility | Access permissions | Private / Shared with team |
Saving takes you to the guardrail edit screen.

Step 2: Pick PII detection types

Select pre-defined PII types in the edit screen.
| PII Type | Description | Example |
| --- | --- | --- |
| Email | Detects email addresses | user@example.com |
| Credit card | Detects credit card numbers (with Luhn validation) | 1234-5678-9012-3456 |
| IP address | IPv4 address detection | 192.168.1.1 |
| MAC address | MAC address detection | 00:1A:2B:3C:4D:5E |
| URL | Web address detection | https://example.com |
| API key | API key pattern detection | sk-xxxxxxxx |
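
The Luhn validation mentioned for credit cards can be sketched as follows; this is the standard checksum algorithm, shown for illustration rather than taken from the product's code.

```python
def luhn_valid(number: str) -> bool:
    """Standard Luhn checksum: double every second digit from the right,
    subtract 9 from doubled values above 9, and require the total to be
    divisible by 10."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for i, digit in enumerate(reversed(digits)):
        if i % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return len(digits) > 0 and total % 10 == 0
```

Checking the checksum before flagging a 16-digit match cuts false positives on arbitrary number sequences such as order IDs.
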

Step 3: Set the processing strategy

Pick how to handle detected sensitive info.
| Strategy | Description | Example Result |
| --- | --- | --- |
| Block | Block the entire message | "Sensitive info detected — cannot process" |
| Redact | Replace sensitive info with a label | "Contact: [REDACTED_EMAIL]" |
| Mask | Show only some characters | "j***@***.com", "****-****-****-1234" |
| Hash | Convert to a hash value | "Contact: <email_hash:a1b2c3d4e5f6>" |
| Log Only | Log without blocking | Original passed through |
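
As an illustration of how the five strategies differ, here is a hedged sketch handling just the email type. The output labels mirror the table above, but the function itself is an assumption, not the product's implementation.

```python
import hashlib
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def apply_strategy(text: str, strategy: str) -> str:
    """Apply one processing strategy to any detected email addresses."""
    if strategy == "block":
        if EMAIL.search(text):
            return "Sensitive info detected — cannot process"
        return text
    if strategy == "redact":
        return EMAIL.sub("[REDACTED_EMAIL]", text)
    if strategy == "mask":
        # keep the first character and the top-level domain
        return EMAIL.sub(
            lambda m: m.group()[0] + "***@***." + m.group().rsplit(".", 1)[-1],
            text)
    if strategy == "hash":
        return EMAIL.sub(
            lambda m: "<email_hash:%s>"
            % hashlib.sha256(m.group().encode()).hexdigest()[:12],
            text)
    return text  # "log only": original passes through unchanged
```

Note that Block discards the whole message while Redact, Mask, and Hash preserve the conversation around the sensitive span.
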
Recommended strategies by situation:
| Situation | Recommended | Reason |
| --- | --- | --- |
| Initial deployment / testing | Log Only | Identify false positives, then adjust |
| Customer-facing service | Mask or Block | Prevent PII exposure at the source |
| Internal analysis tool | Hash | Keep identifiers without originals (stats possible) |
| Compliance required | Block | Prevent processing of sensitive info entirely |
| Standard business | Redact | Preserve conversation flow while protecting info |

Step 4: Add custom patterns (optional)

Add user-defined patterns via regex.
| Name | Pattern | Use |
| --- | --- | --- |
| Employee ID | `EMP-\d{6}` | Detect employee numbers |
| Internal doc number | `DOC-[A-Z]{2}-\d{4}` | Detect document numbers |
| Project code | `PRJ-\d{4}` | Detect project codes |
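
The table above translates directly into regexes. A minimal sketch of how such custom patterns might be scanned (the dictionary keys are illustrative):

```python
import re

# Patterns taken from the examples above; names are illustrative.
CUSTOM_PATTERNS = {
    "employee_id": re.compile(r"EMP-\d{6}"),
    "doc_number": re.compile(r"DOC-[A-Z]{2}-\d{4}"),
    "project_code": re.compile(r"PRJ-\d{4}"),
}

def scan_custom(text: str) -> list:
    """Return (pattern_name, matched_text) pairs for every hit."""
    return [
        (name, match)
        for name, pattern in CUSTOM_PATTERNS.items()
        for match in pattern.findall(text)
    ]
```
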

Step 5: Register banned words (optional)

Filter messages containing specific words or phrases. Example: register competitor names, secret project names, internal codenames.

Step 6: Set application scope

Choose when the guardrail applies.
| Option | Description |
| --- | --- |
| Input validation | Applied to user messages |
| Output validation | Applied to AI responses |
Security-first: validate both input and output. Performance-first: validate input only (minimizes latency).

Step 7: Save

Click Save to save the settings.

Guardrail Execution Flow

When connected to an agent, the guardrail runs automatically on every conversation turn.

Internal Behaviors Worth Knowing

  • If the LLM Judge fails to make a decision or returns an unclear response, it defaults to PASS. This is by design, so that excessive blocking does not disrupt normal conversations.
  • After the AI response finishes streaming, the entire response is re-checked. When a violation is detected, the frontend message is automatically replaced with the processed content.
  • When multiple guardrails are connected to an agent, all must pass sequentially; if any one blocks, the message is blocked.
  • Tool execution results (SQL query results, Knowledge Base search results, etc.) are excluded from guardrail scanning. This is by design, to prevent false positives on IP addresses or numeric patterns in internal data.
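
The sequential multi-guardrail behavior described above can be sketched as a pipeline where each guardrail's processed output feeds the next and a single block stops the message (the `check` signature is an assumption):

```python
def run_guardrails(message, guardrails):
    """Run each guardrail in order; return (passed, final_text)."""
    for check in guardrails:
        passed, processed = check(message)
        if not passed:
            return False, processed   # one block -> whole message blocked
        message = processed           # masked/redacted text feeds the next check
    return True, message
```
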

LLM-based Detection

The AI judges complex patterns difficult to detect with rules. Using LLM-as-a-Judge, a separate LLM model judges message appropriateness.

Settings

| Item | Description |
| --- | --- |
| Enabled | Toggle LLM-based detection |
| Model | LLM model used for judgment |
| Prompt | Prompt defining judgment criteria |
| Allow examples | Examples that should be judged PASS |
| Block examples | Examples that should be judged BLOCK |
| Apply to input | Whether to apply the LLM Judge to user input |
| Apply to output | Whether to apply the LLM Judge to AI responses |

Example Prompt

You are a content reviewer. Determine whether the following message complies with corporate security policy.

## Block when
- Requests for confidential info (financial data, personnel info, etc.)
- Questions about system hacking or security bypass
- Requests for illegal or unethical actions

## Allow when
- General work questions
- Inquiries about public info
- Product/service questions

Return "PASS" if appropriate, "BLOCK" if not.
LLM Judge introduces additional LLM calls, increasing response time. For performance-sensitive cases, use rule-based detection only or pick a fast model as the Judge.
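
A hedged sketch of the fail-open behavior described earlier: any judge error or ambiguous verdict counts as PASS. `call_llm` stands in for whichever LLM client your deployment uses; nothing here reflects the product's internal code.

```python
JUDGE_PROMPT = 'You are a content reviewer... Return "PASS" or "BLOCK".'  # abridged

def judge(message, call_llm):
    """Return "PASS" or "BLOCK"; fail open so judge errors never block users."""
    try:
        verdict = call_llm(JUDGE_PROMPT + "\n\n" + message).strip().upper()
    except Exception:
        return "PASS"  # judge failure must not disrupt normal conversation
    return "BLOCK" if verdict == "BLOCK" else "PASS"  # unclear answers pass
```
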

Testing Guardrails

You can test directly with text in the guardrail settings screen.
Guardrail testing
  1. Enter test text
  2. Click the Test button
  3. Review results: detected items, processed text, blocked or not
| Input | Strategy | Result |
| --- | --- | --- |
| "My email is test@example.com" | Redact | "My email is [REDACTED_EMAIL]" |
| "Card number: 1234-5678-9012-3456" | Mask | "Card number: ****-****-****-3456" |
| "API key: sk-abc123def456" | Hash | "API key: a1b2c3d4..." |
| "Analyze CompetitorX internal docs" | Block | "Sensitive info detected — cannot process" |

Applying Guardrails

Guardrails can be applied at the agent level and group level. When both apply, all guardrails are merged and run sequentially.

Connect to an Agent

Step 1: Open the agent edit screen

In Workspace > Agents, open the target agent’s edit screen.

Step 2: Pick guardrails

In the Guardrails section, choose guardrails to apply. You can apply multiple guardrails to one agent — all must pass sequentially before the message is processed.

Step 3: Save

Save the agent settings. The guardrails apply to all conversations with this agent thereafter.

Connect to a Group

New feature — Setting a guardrail at the group level auto-applies to all conversations of users in that group, even when the agent has no guardrail.
In Admin > Users > Groups tab, open the group edit modal and pick a guardrail in Chat Guardrail.
| Item | Description |
| --- | --- |
| Applied to | All users in the group |
| Scope | All conversations of those users (regardless of agent) |
| Guardrail count | One per group |

Connect to an Organizational Unit

In Admin > Organizations, you can assign a guardrail to an Organizational Unit (OU). It auto-applies to users in that OU (determined by IdP sync).
| Item | Description |
| --- | --- |
| Applied to | All users in the OU (based on IdP sync result) |
| Scope | Same as group guardrails — all conversations of those users |
| Permission view modal | The guardrail assigned to the OU appears in the user's permission view |
Useful when different safety policies are needed per department — e.g., assign a strengthened PII guardrail to HR and a code-leak-prevention guardrail to Engineering, all by OU.

Application Priority

When a user chats, guardrails from these 4 paths are all merged and applied:
1. Agent level         — Guardrails directly connected to the agent
2. Group level         — Guardrails of groups the user belongs to
3. Organizational Unit — Guardrails of the user's OU
4. Global              — Guardrails set in admin settings
When the same guardrail is set in multiple paths, it is not executed twice (auto-deduplication). Guardrails selected at group/OU level are removed from the global guardrail selection list to prevent duplicate setup.
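
The merge-and-deduplicate behavior can be sketched as an ordered set union over the four paths; guardrail identifiers and the priority order mirror the list above, while the function itself is an illustration.

```python
def merge_guardrails(agent, group, org_unit, global_level):
    """Merge the four paths in priority order, skipping duplicate ids."""
    seen, merged = set(), []
    for source in (agent, group, org_unit, global_level):
        for guardrail_id in source:
            if guardrail_id not in seen:  # auto-deduplication: run each once
                seen.add(guardrail_id)
                merged.append(guardrail_id)
    return merged
```
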
The LLM-based guardrail’s detection model selector does not include agents or flows — guardrails can only use plain LLM models. To build guardrail detection with an agent, implement it via separate guardrail patterns (keyword/regex/LLM prompt).

File Upload Guardrails

A separate security check on uploaded files. Independent from chat guardrails — automatically runs 4 stages of checks when a file is uploaded. Configure in the Admin > Settings > File Guardrails tab.
| Stage | Feature | Target Files | On Violation |
| --- | --- | --- | --- |
| 1 | Macro detection | .doc, .docm, .xls, .xlsm, .ppt, .pptm | Block or warn |
| 2 | EXIF metadata removal | .jpg, .jpeg, .tiff, .webp | Auto-removed (not blocked) |
| 2 | NSFW image detection | All images | Block after Vision LLM judgment |
| 3 | Text guardrail | Entire document | Apply specified guardrail rules (always Block strategy) |
| 4 | Document sensitivity classification | Entire document | Allow/warn/block per classification |

Document Sensitivity Classification

The LLM analyzes document content and auto-classifies sensitivity. Default categories:
| Classification | Default Action | Description |
| --- | --- | --- |
| PUBLIC | Allow | Documents that can be made public |
| INTERNAL | Allow | Internal documents |
| CONFIDENTIAL | Warn (Flag) | Confidential — allow upload but log |
| RESTRICTED | Block | Top secret — block upload |
For large documents, only beginning/middle/end portions are sampled (max 8,000 characters) and passed to the LLM. Since the entire document isn’t analyzed, classification can be inaccurate if key sensitive info is concentrated in a specific section.
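
The sampling behavior might look roughly like this; the three-way split and the exact ratios are assumptions, and only the 8,000-character cap comes from the description above.

```python
def sample_document(text, limit=8000):
    """Sample beginning, middle, and end of a long document,
    capped at roughly `limit` characters in total."""
    if len(text) <= limit:
        return text
    part = limit // 3
    mid = len(text) // 2
    return (text[:part] + "\n...\n"
            + text[mid - part // 2: mid + part // 2] + "\n...\n"
            + text[-part:])
```

This is why sensitive info concentrated outside the sampled sections can be missed, as noted above.
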

Monitoring Integration

Auto-recorded in Tracing

When a guardrail detects sensitive info, the event is auto-recorded in the message trace.
| Field | Description |
| --- | --- |
| Run type | guardrail |
| Run name | guardrail:guardrail-name |
| Inputs | Detection type, original content (partial) |
| Outputs | Strategy, detection details, guardrail name |

Guardrail Logs

In Admin > Monitoring > Guardrail Logs, view all detection events as a dedicated log.
| Filter | Description |
| --- | --- |
| Time range | 1 hour to 30 days, or custom |
| Strategy | Block, Redact, Mask, Hash |
| Detection type | PII, banned words, LLM Judge |
| User | Filter by specific user |
Click a log entry to view details (original content, processed content, detection time, etc.). Use the Trace button to see the full processing context for that message.
For details on guardrail logs, see Guardrail Logs.

Best Practices

  1. Start with Log Only to gauge detection frequency
  2. Identify and adjust false-positive patterns
  3. Switch to Redact strategy
  4. After stabilizing, apply Block to required items

Troubleshooting

Too many false positives:
  1. Narrow the regex scope of custom patterns
  2. Refine the banned-words list (case-insensitive exact match)
  3. Add allow-case examples to the LLM Judge
  4. Identify false-positive patterns in guardrail logs and adjust rules

Sensitive info not being detected:
  1. Add missing PII types
  2. Register a new regex as a custom pattern
  3. Enable the LLM Judge and add block examples
  4. Describe semantic patterns that rules can't catch in the LLM Judge prompt

Responses too slow:
  1. Disable the LLM Judge and use rule-based detection only (near-zero latency)
  2. Disable output validation (validate input only)
  3. Choose a faster model as the Judge model
Guardrails connected to agents can’t be deleted. First disconnect from agents using the guardrail, then delete.

Next Steps

Apply to Agents

Connect created guardrails to agents

Guardrail Logs

View and analyze detection event logs

Tracing

See guardrail execution within the full conversation context