Guardrails - Cloosphere Guide

When you ask the AI to “organize the customer email list”, there’s a real risk of personal information being exposed or leaked. Guardrails automatically validate AI conversation inputs and outputs to prevent these risks.

Example

“The customer’s inquiry email is hong@company.com and the card number is 1234-5678-9012-3456”

State	Behavior	Result
No guardrail	AI processes as-is	PII may be included in the response
Guardrail applied (masking)	Sensitive info auto-masked	Becomes `h*@.com`, `*--**-3456`

What is a Guardrail?

A guardrail combines rule-based detection and LLM-based judgment to ensure conversation safety.

Key Features

Feature	Description
PII detection	Detects PII like email, credit card, IP address, MAC address, URL, API key
Custom patterns	Detects user-defined patterns via regex (employee ID, document number, etc.)
Banned words	Filters messages containing specific words/phrases
LLM-based judgment	Semantic content validation using AI (LLM-as-a-Judge)

Use Cases

PII protection: Prevent customer emails, phone numbers, and credit card info from being exposed to the AI
Security hardening: Detect sensitive credentials like API keys, passwords, auth tokens
Content filtering: Block inappropriate expressions or prohibited topics
Compliance: Industry-specific regulations like GDPR, PIPA

Guardrail List

In Workspace > Guardrails, view all guardrails.

Element	Description
Name	Guardrail identifier name
Description	Guardrail purpose
LLM badge	Indicates LLM-based detection is enabled
Author	User who created the guardrail
Modified	Last modified time

Creating a Guardrail

Enter basic info

In Workspace > Guardrails, click the + button at the top-right.

Field	Description	Example
Name	Guardrail name	”Customer Info Protection”
Description	Purpose	”Prevent customer PII leakage”
Visibility	Access permissions	Private / Shared with team

Saving takes you to the guardrail edit screen.

Pick PII detection types

Select pre-defined PII types in the edit screen.

PII Type	Description	Example
Email	Detect email addresses	`user@example.com`
Credit card	Credit card numbers (with Luhn validation)	`1234-5678-9012-3456`
IP address	IPv4 address detection	`192.168.1.1`
MAC address	MAC address detection	`00:1A:2B:3C:4D:5E`
URL	Web address detection	`https://example.com`
API key	API key pattern detection	`sk-xxxxxxxx`

Set the processing strategy

Pick how to handle detected sensitive info.

Strategy	Description	Example Result
Block	Block the entire message	”Sensitive info detected — cannot process”
Redact	Replace sensitive info with a label	”Contact: `[REDACTED_EMAIL]`”
Mask	Show only some characters	”`j*@.com`”, “`*--**-1234`”
Hash	Convert to a hash value	”Contact: `<email_hash:a1b2c3d4e5f6>`”
Log Only	Log without blocking	Original passed through

Recommended strategies by situation:

Situation	Recommended	Reason
Initial deployment / testing	Log Only	Identify false positives, then adjust
Customer-facing service	Mask or Block	Prevent PII exposure at the source
Internal analysis tool	Hash	Keep identifiers without originals (stats possible)
Compliance required	Block	Prevent processing of sensitive info entirely
Standard business	Redact	Preserve flow + protect info

Add custom patterns (optional)

Add user-defined patterns via regex.

Name	Pattern	Use
Employee ID	`EMP-\d{6}`	Detect employee numbers
Internal doc number	`DOC-[A-Z]{2}-\d{4}`	Document number detection
Project code	`PRJ-\d{4}`	Project code detection

Filter messages containing specific words or phrases. Example: register competitor names, secret project names, internal codenames.

Set application scope

Choose when the guardrail applies.

Option	Description
Input validation	Applied to user messages
Output validation	Applied to AI responses

Security-first: validate both input and output. Performance-first: validate input only (minimizes latency).

Save

Click Save to save the settings.

Guardrail Execution Flow

When connected to an agent, the guardrail runs automatically in this order during a conversation.

Internal Behaviors Worth Knowing

LLM Judge's safety-first principle

If the LLM Judge fails to make a decision or returns an unclear response, it defaults to PASS. This is by design to prevent excessive blocking from disrupting normal conversations.

How output guardrails work

After the AI response finishes streaming, the entire response is re-checked. When a violation is detected, the frontend message is automatically replaced with the processed content.

What happens with multiple guardrails?

When multiple guardrails are connected to an agent, all must pass sequentially. If any one blocks, the message is blocked.

Are tool execution results scanned?

No. SQL query results, Knowledge Base search results, etc. — tool execution results are excluded from guardrail scanning. This is by design to prevent false positives on internal data IP addresses or numeric patterns.

LLM-based Detection

The AI judges complex patterns difficult to detect with rules. Using LLM-as-a-Judge, a separate LLM model judges message appropriateness.

Settings

Item	Description
Enabled	Toggle LLM-based detection
Model	LLM model used for judgment
Prompt	Prompt defining judgment criteria
Allow examples	Examples that should be judged PASS
Block examples	Examples that should be judged BLOCK
Apply to input	Whether to apply LLM Judge to user input
Apply to output	Whether to apply LLM Judge to AI response

Example Prompt

You are a content reviewer. Determine whether the following message complies with corporate security policy.

## Block when
- Requests for confidential info (financial data, personnel info, etc.)
- Questions about system hacking or security bypass
- Requests for illegal or unethical actions

## Allow when
- General work questions
- Inquiries about public info
- Product/service questions

Return "PASS" if appropriate, "BLOCK" if not.

LLM Judge introduces additional LLM calls, increasing response time. For performance-sensitive cases, use rule-based detection only or pick a fast model as the Judge.

Testing Guardrails

You can test directly with text in the guardrail settings screen.

Enter test text
Click the Test button
Review results: detected items, processed text, blocked or not

Input	Strategy	Result
”My email is test@example.com”	Redact	”My email is `[REDACTED_EMAIL]`"
"Card number: 1234-5678-9012-3456”	Mask	”Card number: `**--**-3456`"
"API key: sk-abc123def456”	Hash	”API key: `a1b2c3d4...`"
"Analyze CompetitorX internal docs”	Block	”Sensitive info detected — cannot process”

Applying Guardrails

Guardrails can be applied at the agent level and group level. When both apply, all guardrails are merged and run sequentially.

Connect to an Agent

Open the agent edit screen

In Workspace > Agents, open the target agent’s edit screen.

Pick guardrails

In the Guardrails section, choose guardrails to apply. You can apply multiple guardrails to one agent — all must pass sequentially before the message is processed.

Save

Save the agent settings. The guardrails apply to all conversations with this agent thereafter.

Connect to a Group

New feature — Setting a guardrail at the group level auto-applies to all conversations of users in that group, even when the agent has no guardrail.

In Admin > Users > Groups tab, open the group edit modal and pick a guardrail in Chat Guardrail.

Item	Description
Applied to	All users in the group
Scope	All conversations of those users (regardless of agent)
Guardrail count	One per group

Connect to an Organizational Unit

In Admin > Organizations, you can assign a guardrail to an Organizational Unit (OU). It auto-applies to users in that OU (determined by IdP sync).

Item	Description
Applied to	All users in the OU (based on IdP sync result)
Scope	Same as group guardrail — all conversations of those users
Permission view modal	The guardrail assigned to the OU appears in the user’s [permission view]

Useful when different safety policies are needed per department — e.g., assign a strengthened PII guardrail to HR and a code-leak-prevention guardrail to Engineering, all by OU.

Application Priority

When a user chats, guardrails from these 4 paths are all merged and applied:

Agent level         — Guardrails directly connected to the agent
Group level         — Guardrails of groups the user belongs to
Organizational Unit — Guardrails of the user's OU
Global              — Guardrails set in admin settings

When the same guardrail is set in multiple paths, it is not executed twice (auto-deduplication). Guardrails selected at group/OU level are removed from the global guardrail selection list to prevent duplicate setup.

The LLM-based guardrail’s detection model selector does not include agents or flows — guardrails can only use plain LLM models. To build guardrail detection with an agent, implement it via separate guardrail patterns (keyword/regex/LLM prompt).

File Upload Guardrails

A separate security check on uploaded files. Independent from chat guardrails — automatically runs 4 stages of checks when a file is uploaded. Configure in the Admin > Settings > File Guardrails tab.

Stage	Feature	Target Files	On Violation
1	Macro detection	.doc, .docm, .xls, .xlsm, .ppt, .pptm	Block or warn
2	EXIF metadata removal	.jpg, .jpeg, .tiff, .webp	Auto-removed (not blocked)
2	NSFW image detection	All images	Block after Vision LLM judgment
3	Text guardrail	Entire document	Apply specified guardrail rules (always Block strategy)
4	Document sensitivity classification	Entire document	Allow/warn/block per classification

Document Sensitivity Classification

The LLM analyzes document content and auto-classifies sensitivity. Default categories:

Classification	Default Action	Description
PUBLIC	Allow	Documents that can be public
INTERNAL	Allow	Internal documents
CONFIDENTIAL	Warn (Flag)	Confidential — allow upload but log
RESTRICTED	Block	Top secret — block upload

For large documents, only beginning/middle/end portions are sampled (max 8,000 characters) and passed to the LLM. Since the entire document isn’t analyzed, classification can be inaccurate if key sensitive info is concentrated in a specific section.

Monitoring Integration

Auto-recorded in Tracing

When a guardrail detects sensitive info, the event is auto-recorded in the message trace.

Field	Description
Run type	`guardrail`
Run name	`guardrail:guardrail-name`
Inputs	Detection type, original content (partial)
Outputs	Strategy, detection details, guardrail name

Guardrail Logs

In Admin > Monitoring > Guardrail Logs, view all detection events as a dedicated log.

Filter	Description
Time range	1 hour ~ 30 days, custom
Strategy	Block, Redact, Mask, Hash
Detection type	PII, banned words, LLM Judge
User	Filter by specific user

Click a log entry to view details (original content, processed content, detection time, etc.). Use the Trace button to see the full processing context for that message.

For details on guardrail logs, see Guardrail Logs.

Best Practices

Phased rollout
Per-role guardrails
Periodic review

Start with Log Only to gauge detection frequency
Identify and adjust false-positive patterns
Switch to Redact strategy
After stabilizing, apply Block to required items

Role	Recommended
Customer support bot	All PII + Mask
Internal analysis tool	API keys only + Hash
HR assistant	All PII + LLM Judge + Block
General work bot	Default PII + Redact

Troubleshooting

Many false positives

Narrow the regex scope of custom patterns
Refine the banned-words list (case-insensitive exact match)
Add allow-case examples to the LLM Judge
Identify false-positive patterns in guardrail logs and adjust rules

Detection misses

Add missing PII types
Register a new regex as a custom pattern
Enable LLM Judge and add block examples
Specify semantic patterns hard to catch with rules in the LLM Judge prompt

Slow responses

Disable LLM Judge and use rule-based only (rule-based has near-zero latency)
Disable output validation (input only)
Choose a faster model as the Judge model

Cannot delete a guardrail

Guardrails connected to agents can’t be deleted. First disconnect from agents using the guardrail, then delete.

Next Steps

Apply to Agents

Connect created guardrails to agents

Guardrail Logs

View and analyze detection event logs

Tracing

See guardrail execution within the full conversation context

​Example

​What is a Guardrail?

​Key Features

​Use Cases

​Guardrail List

​Creating a Guardrail

​Guardrail Execution Flow

​Internal Behaviors Worth Knowing

​LLM-based Detection

​Settings

​Example Prompt

​Testing Guardrails

​Applying Guardrails

​Connect to an Agent

​Connect to a Group

​Connect to an Organizational Unit

​Application Priority

​File Upload Guardrails

​Document Sensitivity Classification

​Monitoring Integration

​Auto-recorded in Tracing

​Guardrail Logs

​Best Practices

​Troubleshooting

​Next Steps

Apply to Agents

Guardrail Logs

Tracing

Example

What is a Guardrail?

Key Features

Use Cases

Guardrail List

Creating a Guardrail

Guardrail Execution Flow

Internal Behaviors Worth Knowing

LLM-based Detection

Settings

Example Prompt

Testing Guardrails

Applying Guardrails

Connect to an Agent

Connect to a Group

Connect to an Organizational Unit

Application Priority

File Upload Guardrails

Document Sensitivity Classification

Monitoring Integration

Auto-recorded in Tracing

Guardrail Logs

Best Practices

Troubleshooting

Next Steps