VexAI Features
A comprehensive look at every major capability, from conversation memory and multi-provider LLM support to the security observer, performance caching, identity system, and live thinking indicators.
Conversation Intelligence
VexAI maintains rich, persistent context for every conversation, far beyond a simple chat log. A two-tier memory system, semantic search, and full Discord message mirroring ensure the bot always knows what happened, even across restarts.
Two-Tier Conversation History
Messages flow through two storage layers:
| Tier | Storage | Behaviour |
|---|---|---|
| Hot tier | In-memory | Fast access for the current session. Automatically populated on startup from the cold tier so no context is lost on restart. |
| Cold tier | SQLite | Persistent on disk. Every message is written here in real time, surviving process crashes and reboots. |
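The two tiers above can be sketched as a write-through store. This is a minimal illustration, not VexAI's actual implementation: the `ConversationHistory` class, the `history` table schema, and the `hot_limit` cap are all assumptions made for the example.

```python
import sqlite3
from collections import defaultdict, deque


class ConversationHistory:
    """Hypothetical two-tier store: in-memory hot tier backed by SQLite."""

    def __init__(self, db_path: str = ":memory:", hot_limit: int = 50):
        # hot_limit is an assumed per-channel cap, not a documented setting.
        self.hot = defaultdict(lambda: deque(maxlen=hot_limit))
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS history ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "channel_id TEXT NOT NULL, role TEXT NOT NULL, content TEXT NOT NULL)"
        )
        self._warm_up()

    def _warm_up(self) -> None:
        # Repopulate the hot tier from the cold tier on startup.
        rows = self.db.execute(
            "SELECT channel_id, role, content FROM history ORDER BY id"
        )
        for channel_id, role, content in rows:
            self.hot[channel_id].append((role, content))

    def append(self, channel_id: str, role: str, content: str) -> None:
        # Write to disk first so a crash cannot lose the message.
        with self.db:
            self.db.execute(
                "INSERT INTO history (channel_id, role, content) VALUES (?, ?, ?)",
                (channel_id, role, content),
            )
        self.hot[channel_id].append((role, content))

    def recent(self, channel_id: str) -> list:
        return list(self.hot[channel_id])
```

Opening a second `ConversationHistory` on the same file simulates a restart: the hot tier is rebuilt from SQLite before any new message arrives.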
Conversation Summaries
When a conversation grows long, VexAI generates a summary of the earlier exchanges and injects it into future prompts. This keeps token usage under control while preserving the essential context the bot needs to stay coherent.
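One way to picture the summarisation step is shown below. The thresholds (`max_messages`, `keep_recent`) and the `summarize` callable, which stands in for an LLM call, are assumptions for illustration; the documentation does not state when a summary is triggered.

```python
from typing import Callable


def compact_history(
    messages: list[str],
    summarize: Callable[[list[str]], str],  # stand-in for an LLM summary call
    max_messages: int = 40,                 # assumed trigger threshold
    keep_recent: int = 20,                  # assumed number of messages kept verbatim
) -> list[str]:
    """Replace older messages with one summary line once the history grows long."""
    if len(messages) <= max_messages:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [f"[Summary of earlier conversation] {summarize(older)}"] + recent
```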
Semantic Memory
Important facts, user preferences, key events, and decisions are stored in an FTS5-powered full-text search index. The bot can recall relevant memories on the fly, even if the original message was sent thousands of messages ago.
Local Message Database
VexAI maintains a full local mirror of Discord messages with FTS5 search. Messages are synced in real time as they arrive, and a background backfill process pulls in historical messages on first run.
- Real-time sync: every new message is written to the local DB instantly
- Background backfill: historical messages are fetched and indexed on startup
- FTS5 full-text search: fast keyword and phrase search across the entire message history
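A minimal sketch of an FTS5-backed message index follows. The table name, column names, and helper functions are hypothetical, and the example assumes the Python `sqlite3` module is linked against an SQLite build with FTS5 enabled.

```python
import sqlite3


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    # message_id is stored but not indexed for full-text matching.
    db.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS messages "
        "USING fts5(author, content, message_id UNINDEXED)"
    )
    return db


def index_message(db: sqlite3.Connection, message_id: str, author: str, content: str) -> None:
    # Real-time sync: called once for every message as it arrives.
    with db:
        db.execute(
            "INSERT INTO messages (author, content, message_id) VALUES (?, ?, ?)",
            (author, content, message_id),
        )


def search(db: sqlite3.Connection, query: str, limit: int = 5000) -> list:
    # FTS5 query syntax: bare words are keywords, "quoted text" is a phrase.
    return db.execute(
        "SELECT message_id, author, content FROM messages "
        "WHERE messages MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
```

For example, `search(db, "deploy")` matches any message containing the word, while `search(db, '"deploy failed"')` matches only the exact phrase.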
Message Audit Trail
Edits and deletions are never lost. VexAI preserves every revision of every message, including the original content and all subsequent edits. The get_message_edits tool shows the complete timeline with diffs.
Use get_message_edits to investigate who changed what and when, which is useful for moderation and accountability.
Local-First Search
Search queries hit the local database first, returning up to 5,000 results with zero API overhead. If local results are insufficient, a fallback to the Discord API returns up to 50 additional results.
| Feature | Detail |
|---|---|
| Local result cap | 5,000 messages |
| API fallback cap | 50 messages |
| Sort order | Ascending or descending |
| Match modes | Exact string match or FTS5 full-text |
| Date filters | Before / after / range |
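The local-first flow can be sketched as below. The two caps come from the table above; everything else is an assumption, including the `search_local` and `search_api` callables and the rule that "insufficient" means fewer results than the caller asked for.

```python
from typing import Callable

LOCAL_CAP = 5000        # local result cap from the table above
API_FALLBACK_CAP = 50   # API fallback cap from the table above

SearchFn = Callable[[str, int], list[dict]]  # (query, limit) -> messages with an "id" key


def search_messages(query: str, wanted: int, search_local: SearchFn, search_api: SearchFn) -> list[dict]:
    """Query the local DB first; call the API only if local results fall short."""
    results = list(search_local(query, min(wanted, LOCAL_CAP)))
    if len(results) >= wanted:
        return results[:wanted]
    # Assumed fallback rule: top up with API results, skipping duplicates.
    seen = {m["id"] for m in results}
    extra = search_api(query, min(wanted - len(results), API_FALLBACK_CAP))
    results.extend(m for m in extra if m["id"] not in seen)
    return results
```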
Semantic Search
Beyond keyword matching, VexAI supports AI embedding-powered semantic search. Queries are converted to vector embeddings and compared against stored message embeddings, finding results by meaning rather than exact words.
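The comparison step can be illustrated with cosine similarity over toy vectors. This sketch assumes embeddings are plain lists of floats held in memory; the embedding model, the storage format, and the similarity metric VexAI actually uses are not specified here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(query_vec: list[float], stored: dict[str, list[float]], top_k: int = 3) -> list[str]:
    """Return the IDs of the stored messages whose embeddings are closest to the query."""
    scored = [(cosine_similarity(query_vec, vec), msg_id) for msg_id, vec in stored.items()]
    scored.sort(reverse=True)
    return [msg_id for _, msg_id in scored[:top_k]]
```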
Multi-Provider LLM Support
VexAI is not locked to a single AI provider. An abstract provider layer lets you swap or mix models freely, with built-in resilience and per-user customisation.
Provider Architecture
All providers extend an abstract BaseProvider that handles retry logic with 3 attempts and exponential backoff, so transient API errors don't break conversations.
| Provider | Notes |
|---|---|
| OpenAI | GPT-4o, GPT-4, GPT-3.5, o1, o3, and other OpenAI models |
| Anthropic | Claude Sonnet, Opus, Haiku families |
| OpenRouter | Unified gateway to 100+ models from multiple vendors |
| OpenAI-compatible | Any API implementing the OpenAI chat completions spec (LM Studio, Ollama, vLLM, etc.) |
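The retry behaviour described for BaseProvider might look roughly like the sketch below. Only the class name, the 3 attempts, and the exponential backoff come from this page; the method names, the `TransientProviderError` type, and the 1-second base delay are assumptions.

```python
import abc
import time


class TransientProviderError(Exception):
    """Hypothetical error type for retryable failures (timeouts, 5xx, rate limits)."""


class BaseProvider(abc.ABC):
    max_attempts = 3     # from the description above
    base_delay = 1.0     # assumed; the real delay is not documented

    def __init__(self, sleep=time.sleep):
        self._sleep = sleep  # injectable so tests do not actually wait

    @abc.abstractmethod
    def _complete(self, prompt: str) -> str:
        """Provider-specific API call, implemented by each subclass."""

    def complete(self, prompt: str) -> str:
        for attempt in range(1, self.max_attempts + 1):
            try:
                return self._complete(prompt)
            except TransientProviderError:
                if attempt == self.max_attempts:
                    raise
                # Exponential backoff: base_delay, then 2x base_delay.
                self._sleep(self.base_delay * 2 ** (attempt - 1))
        raise AssertionError("unreachable")
```

Keeping the retry loop in the base class means each concrete provider only implements `_complete`.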
Per-User Model Overrides
Individual users can switch their own model with the /model set slash command. Overrides are stored in the database and persist across restarts.
# Set your personal model
/model set provider:openai model:gpt-4o
# Reset to server default
/model reset
Mode Presets
The /mode set command provides quick switching between pre-configured quality/cost profiles:
| Mode | Behaviour |
|---|---|
| fast | Lowest-latency model, prioritises response speed |
| cheap | Most cost-effective model available |
| balanced | Good trade-off between quality and cost |
| quality | Best available model, regardless of cost or latency |
Model-Specific Hints
Some models require special handling. VexAI includes built-in hints and adapters for:
- Kimi: extended context window handling
- MiniMax: custom tool format adapters
- DeepSeek: optimised prompt formatting
- Qwen: tool call compatibility patches
Security Observer
Every tool invocation passes through the security observer before execution. A tiered risk classification system minimises overhead while ensuring destructive actions always get full review.
4-Tier Risk Classification
| Tier | Level | Examples | Review |
|---|---|---|---|
| 0 | Exempt | Read-only tools (search, list, get) | Auto-approved, zero overhead |
| 1 | Low | Non-destructive writes (send message, set nickname) | Rule-based checks only |
| 2 | Medium | Impactful but reversible (mute, role assign) | LLM review + verdict caching (10 min TTL) |
| 3 | High | Destructive / irreversible (ban, delete role, purge) | Always full LLM review, never cached |
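The table above maps naturally onto a lookup from tool to tier to review path. In this sketch the tool names other than `search_messages` are hypothetical, and treating unknown tools as Tier 3 is an assumed fail-closed default rather than documented behaviour.

```python
from enum import IntEnum
from typing import Optional


class RiskTier(IntEnum):
    EXEMPT = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical tool-to-tier assignments for illustration.
TOOL_TIERS = {
    "search_messages": RiskTier.EXEMPT,
    "send_message": RiskTier.LOW,
    "mute_member": RiskTier.MEDIUM,
    "ban_member": RiskTier.HIGH,
}

# (review kind, verdict cache TTL in seconds or None when never cached)
REVIEW_BY_TIER = {
    RiskTier.EXEMPT: ("none", None),
    RiskTier.LOW: ("rules", None),
    RiskTier.MEDIUM: ("llm", 600),
    RiskTier.HIGH: ("llm", None),
}


def review_plan(tool_name: str) -> tuple[RiskTier, str, Optional[int]]:
    # Assumption: tools with no assigned tier get the strictest review.
    tier = TOOL_TIERS.get(tool_name, RiskTier.HIGH)
    kind, cache_ttl = REVIEW_BY_TIER[tier]
    return tier, kind, cache_ttl
```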
Rate Limiting
Per-user, per-tool rate limits prevent abuse even if the LLM is tricked into rapid actions:
- 10 messages/min: general tool invocations
- 3 DMs/min: direct message sends
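A per-user, per-tool limit like the ones listed above can be enforced with a sliding window. The sketch below is one common approach, not necessarily the algorithm VexAI uses; the `RateLimiter` class and its injectable clock are assumptions.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Sliding-window limiter keyed by (user, tool)."""

    def __init__(self, limit: int, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self._clock = clock
        self._hits = defaultdict(deque)

    def allow(self, user_id: str, tool: str) -> bool:
        now = self._clock()
        hits = self._hits[(user_id, tool)]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

A limiter for direct messages would be created as `RateLimiter(limit=3)`, and one for general tool invocations as `RateLimiter(limit=10)`.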
Performance Impact
The tiered system achieves an 80–85% reduction in security LLM calls compared to reviewing every tool invocation. Tier 0 and Tier 1 tools never touch the LLM, and Tier 2 verdicts are cached for 10 minutes.
Configuration
- Alert channel: escalations and denials are posted to a configurable Discord channel
- Exempt tools: specific tools can be marked as exempt from review
- Runtime toggle: the /observer slash command enables or disables the observer on the fly
Approval Gate
Destructive actions require explicit human confirmation before execution. When the bot needs to perform a high-risk action (ban a user, delete a role, purge messages, etc.), it posts an approval request in Discord.
This provides a critical safety net: even if the LLM misinterprets an instruction, a human must explicitly confirm before any irreversible damage occurs.
Performance & Caching
VexAI uses aggressive, multi-layer caching to minimise latency and reduce API costs without sacrificing accuracy.
Tool Result Cache
Tool outputs are cached with per-tool TTLs based on data volatility:
| Data Type | TTL | Examples |
|---|---|---|
| Static | 5 minutes | Server info, role list, channel list |
| Semi-static | 2 minutes | User profiles, member list |
| External | 1 minute | API responses, web fetches |
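The per-tool TTLs in the table could be applied with a small expiring cache such as the one below. The TTL values are taken from the table; the `ToolResultCache` class, its key format, and the hit and miss counters (the kind of numbers a tool like cache_stats might report) are assumptions.

```python
import time
from typing import Any, Callable

# TTLs in seconds, from the table above.
TTL_SECONDS = {"static": 300.0, "semi_static": 120.0, "external": 60.0}


class ToolResultCache:
    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def put(self, key: str, value: Any, data_type: str) -> None:
        # Store the value together with its absolute expiry time.
        self._entries[key] = (self._clock() + TTL_SECONDS[data_type], value)

    def get(self, key: str) -> Any:
        entry = self._entries.get(key)
        if entry is not None and self._clock() < entry[0]:
            self.hits += 1
            return entry[1]
        # Missing or expired: evict and report a miss.
        self._entries.pop(key, None)
        self.misses += 1
        return None
```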
Security Verdict Cache
Tier 2 security verdicts are cached for 10 minutes. If the same user invokes the same tool with similar parameters within that window, the cached verdict is reused, and no additional LLM call is required.
Prompt Optimizer
Not every tool description needs to be in every prompt. The prompt optimizer uses lazy skill descriptions based on message keywords, so only skills relevant to the current message are expanded in the system prompt.
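As a rough illustration of keyword-driven lazy expansion, the sketch below includes a one-line summary for every skill and swaps in the full description only when the message mentions a matching keyword. The skill names, keywords, and descriptions are invented for the example.

```python
import re

# Hypothetical skill catalogue; the real skills and keywords are not listed here.
SKILLS = {
    "moderation": {
        "keywords": {"ban", "mute", "kick"},
        "summary": "Moderation tools (details loaded on demand).",
        "full": "Moderation tools: mute, kick, and ban members, with reasons and durations.",
    },
    "search": {
        "keywords": {"search", "find", "history"},
        "summary": "Message search tools (details loaded on demand).",
        "full": "Message search tools: keyword, phrase, and date-filtered search over history.",
    },
}


def render_skills(message: str, skills: dict = SKILLS) -> str:
    """Build the skills section of the system prompt for one incoming message."""
    words = set(re.findall(r"[a-z0-9_]+", message.lower()))
    lines = []
    for name, skill in skills.items():
        expanded = bool(words & skill["keywords"])
        lines.append(f"{name}: {skill['full'] if expanded else skill['summary']}")
    return "\n".join(lines)
```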
Response Dedup Cache
A 30-second TTL deduplication cache prevents the bot from processing the same message multiple times in rapid succession (e.g., due to Discord webhook retries or race conditions).
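A deduplication check of this kind can be sketched as a set of recently seen message IDs that expire after 30 seconds. The `DedupCache` class and keying by message ID are assumptions.

```python
import time


class DedupCache:
    def __init__(self, ttl: float = 30.0, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._seen: dict[str, float] = {}

    def first_time(self, message_id: str) -> bool:
        """Return True only the first time a message ID is seen within the TTL."""
        now = self._clock()
        # Forget IDs whose TTL has elapsed.
        self._seen = {k: t for k, t in self._seen.items() if now - t < self._ttl}
        if message_id in self._seen:
            return False
        self._seen[message_id] = now
        return True
```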
Cache Monitoring
The built-in cache_stats tool shows real-time hit rates, miss rates, and eviction counts for all cache layers.
Local-First Queries
Searches against the local message database bypass the cache layers entirely: they query SQLite directly and make no API calls, so they are fast and add no API cost.
Identity System
VexAI's personality, rules, and persistent knowledge are defined by a set of Markdown files that are injected into every system prompt. This makes the bot's behaviour fully customisable without touching code.
Identity Files
| File | Path | Purpose |
|---|---|---|
| IDENTITY.md | data/IDENTITY.md | Bot personality: name, tone, speaking style, persona |
| SOUL.md | data/SOUL.md | Deeper behaviour patterns: values, priorities, how the bot "thinks" |
| RULES.md | data/RULES.md | Hard rules the bot must follow: safety boundaries, forbidden actions |
| CORE_MEMORY.md | data/CORE_MEMORY.md | Persistent facts: server-wide knowledge the bot should always know |
Per-User Memory
Each user gets a dedicated memory file at data/users/{userId}/MEMORY.md. The bot writes user-specific preferences, facts, and notes here, and this file is included in the system prompt when that user is in the conversation.
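Putting the identity files and per-user memory together, prompt assembly might look like the sketch below. The file names and paths come from this page; the ordering, the separator, the "Memory for user" label, and the `build_system_prompt` function are assumptions.

```python
from pathlib import Path

IDENTITY_FILES = ["IDENTITY.md", "SOUL.md", "RULES.md", "CORE_MEMORY.md"]


def build_system_prompt(data_dir: Path, user_ids: list[str]) -> str:
    """Concatenate the identity files, then the memory of each participating user."""
    parts = []
    for name in IDENTITY_FILES:
        path = data_dir / name
        if path.is_file():  # missing files are skipped (assumed behaviour)
            parts.append(path.read_text(encoding="utf-8").strip())
    for user_id in user_ids:
        memory = data_dir / "users" / user_id / "MEMORY.md"
        if memory.is_file():
            text = memory.read_text(encoding="utf-8").strip()
            parts.append(f"Memory for user {user_id}:\n{text}")
    return "\n\n".join(parts)
```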
Per-User Identity Overrides
Individual users can have custom identity overrides that modify how the bot interacts with them specifically, which is useful for different tones in different channels or for specific team members.
Live Thinking Indicator
When VexAI is processing a complex request involving tool calls, users see real-time progress instead of silence.
How It Works
- Temporary status message: appears immediately when tool execution begins
- LLM-summarised progress: the status updates every ~4 seconds with a human-readable summary of what the bot is doing
- Sub-agent tracking: if sub-agents are running, their progress is reflected in the status
- Auto-cleanup: the status message is automatically removed when the final reply is sent
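The lifecycle above can be sketched with asyncio. The `post_status`, `delete_status`, and `summarize` callables are hypothetical stand-ins for the Discord calls and the LLM summary, and the 4-second interval mirrors the approximate figure given above.

```python
import asyncio


async def run_with_status(work_coro, post_status, delete_status, summarize, interval: float = 4.0):
    """Run the work while refreshing a temporary status message, then clean up."""
    work = asyncio.ensure_future(work_coro)
    try:
        await post_status("Working...")
        while True:
            # Wait for the work to finish, but wake up every `interval` seconds.
            done, _ = await asyncio.wait({work}, timeout=interval)
            if done:
                return work.result()
            await post_status(summarize())
    finally:
        if not work.done():
            work.cancel()
        # Auto-cleanup: remove the status message whether the work succeeded or failed.
        await delete_status()
```

The `finally` block ensures the status message is removed even when the underlying work raises.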
Response Footer
Every response includes a footer with timing and tool usage breakdown:
⏱ 4.2s · Tools: 3× search_messages, 1× fetch_messages
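A footer in this shape could be produced from the elapsed time and the list of tool calls, as in the sketch below. The `format_footer` function is hypothetical, and ordering tools by call count is an assumption based on the example line.

```python
from collections import Counter


def format_footer(elapsed_seconds: float, tool_calls: list[str]) -> str:
    """Render a timing and tool-usage footer for a finished response."""
    footer = f"⏱ {elapsed_seconds:.1f}s"
    if tool_calls:
        counts = Counter(tool_calls)
        # Most-used tools first (assumed ordering).
        tools = ", ".join(f"{n}× {name}" for name, n in counts.most_common())
        footer += f" · Tools: {tools}"
    return footer
```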