VexAI Features

A comprehensive look at every major capability, from conversation memory and multi-provider LLM support to the security observer, performance caching, identity system, and live thinking indicators.

Conversation Intelligence

VexAI maintains rich, persistent context for every conversation, far beyond a simple chat log. A two-tier memory system, semantic search, and full Discord message mirroring ensure the bot always knows what happened, even across restarts.

Two-Tier Conversation History

Messages flow through two storage layers:

Tier | Storage | Behaviour
Hot tier | In-memory | Fast access for the current session. Automatically populated on startup from the cold tier so no context is lost on restart.
Cold tier | SQLite | Persistent on disk. Every message is written here in real time, surviving process crashes and reboots.
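The two tiers can be sketched roughly as follows. This is a minimal illustration, not VexAI's actual code: the class name `ConversationHistory`, the `messages` schema, and the `hot_limit` parameter are all assumptions, and the hot tier here is reloaded lazily per channel rather than in one pass at startup.

```python
import sqlite3
from collections import deque


class ConversationHistory:
    """Hypothetical two-tier store: in-memory hot tier backed by SQLite."""

    def __init__(self, db_path=":memory:", hot_limit=100):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "channel_id TEXT NOT NULL, author TEXT NOT NULL, content TEXT NOT NULL)"
        )
        self.hot_limit = hot_limit  # assumed cap on messages kept in memory
        self.hot = {}               # channel_id -> deque of (author, content)

    def append(self, channel_id, author, content):
        # Load existing history first so the new message is not counted twice.
        hot = self._hot_for(channel_id)
        # Write to the cold tier immediately so the message survives a crash.
        with self.db:
            self.db.execute(
                "INSERT INTO messages (channel_id, author, content) VALUES (?, ?, ?)",
                (channel_id, author, content),
            )
        hot.append((author, content))

    def recent(self, channel_id):
        return list(self._hot_for(channel_id))

    def _hot_for(self, channel_id):
        # Repopulate the hot tier from the cold tier (for example after a restart).
        if channel_id not in self.hot:
            rows = self.db.execute(
                "SELECT author, content FROM messages WHERE channel_id = ? "
                "ORDER BY id DESC LIMIT ?",
                (channel_id, self.hot_limit),
            ).fetchall()
            self.hot[channel_id] = deque(reversed(rows), maxlen=self.hot_limit)
        return self.hot[channel_id]
```

Opening a second `ConversationHistory` on the same database file returns the same recent messages, which is the restart behaviour the table describes.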

Conversation Summaries

When a conversation grows long, VexAI generates a summary of the earlier exchanges and injects it into future prompts. This keeps token usage under control while preserving the essential context the bot needs to stay coherent.
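One way to picture the summarisation step is the sketch below. The threshold `keep_recent`, the function names, and the `naive_summarize` stand-in (which replaces a real LLM call so the example runs offline) are assumptions for illustration only.

```python
def build_prompt_history(messages, summarize, keep_recent=20):
    """Return (summary, recent): older messages collapsed into one summary.

    `summarize` is a caller-supplied function (in practice an LLM call).
    `keep_recent` is an illustrative threshold (must be >= 1), not a VexAI setting.
    """
    if len(messages) <= keep_recent:
        return None, list(messages)
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    return summarize(older), list(recent)


def naive_summarize(messages):
    # Stand-in for an LLM call so the sketch runs without network access.
    return f"[summary of {len(messages)} earlier messages]"
```

The prompt then carries one summary string plus the most recent messages verbatim, instead of the full history.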

Semantic Memory

Important facts, user preferences, key events, and decisions are stored in an FTS5-powered full-text search index. The bot can recall relevant memories on the fly, even if the original message was sent thousands of messages earlier.

How it works: When the LLM identifies something worth remembering (a user preference, a decision, or a fact), it writes it to the semantic memory store. Future prompts automatically retrieve the most relevant memories via full-text search.
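A store-and-recall loop over FTS5 might look like the sketch below. The `memories` table, its columns, and the `remember` / `recall` helpers are hypothetical names, and the example assumes Python's bundled SQLite was built with the FTS5 extension.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Requires an SQLite build with the FTS5 extension compiled in.
db.execute("CREATE VIRTUAL TABLE memories USING fts5(user_id UNINDEXED, fact)")


def remember(user_id, fact):
    """Write one fact to the (hypothetical) semantic memory table."""
    with db:
        db.execute(
            "INSERT INTO memories (user_id, fact) VALUES (?, ?)", (user_id, fact)
        )


def recall(query, limit=5):
    """Return the best-matching facts; FTS5's `rank` orders best matches first."""
    rows = db.execute(
        "SELECT fact FROM memories WHERE memories MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [fact for (fact,) in rows]
```

Note that the query string is passed to FTS5 as-is, so input containing FTS5 operators or punctuation would need escaping in a real implementation.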

Local Message Database

VexAI maintains a full local mirror of Discord messages with FTS5 search. Messages are synced in real time as they arrive, and a background backfill process pulls in historical messages on first run.

  • Real-time sync: every new message is written to the local DB instantly
  • Background backfill: historical messages are fetched and indexed on startup
  • FTS5 full-text search: fast keyword and phrase search across the entire message history

Message Audit Trail

Edits and deletions are never lost. VexAI preserves every revision of every message, including the original content and all subsequent edits. The get_message_edits tool shows the complete timeline with diffs.

Tip: Use get_message_edits to investigate who changed what and when, which is useful for moderation and accountability.
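To illustrate what a revision timeline with diffs can look like, here is a small sketch using Python's standard `difflib`. The `edit_timeline` function and the `(timestamp, content)` input shape are assumptions; the actual output format of get_message_edits is not documented here.

```python
import difflib


def edit_timeline(revisions):
    """Return one unified diff per consecutive pair of revisions.

    `revisions` is a list of (timestamp, content) tuples, oldest first.
    """
    diffs = []
    for (old_ts, old), (new_ts, new) in zip(revisions, revisions[1:]):
        diff = difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile=old_ts,
            tofile=new_ts,
            lineterm="",
        )
        diffs.append("\n".join(diff))
    return diffs
```

Each diff is labelled with the timestamps of the two revisions it compares, so the sequence reads as a timeline of changes.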

Local-First Search

Search queries hit the local database first, returning up to 5,000 results with zero API overhead. If local results are insufficient, a fallback to the Discord API returns up to 50 additional results.

Feature | Detail
Local result cap | 5,000 messages
API fallback cap | 50 messages
Sort order | Ascending or descending
Match modes | Exact string match or FTS5 full-text
Date filters | Before / after / range
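The local-first flow with its two caps can be sketched as below. The `search` function, the two callables it takes, and the `min_results` threshold for what counts as "insufficient" are assumptions; only the 5,000 and 50 caps come from this section.

```python
LOCAL_CAP = 5000  # local result cap
API_CAP = 50      # API fallback cap


def search(query, search_local, search_api, min_results=1):
    """Try the local database first; call the API only if results fall short.

    `search_local` and `search_api` are caller-supplied callables taking
    (query, limit). `min_results` is an assumed threshold for "insufficient".
    """
    results = list(search_local(query, LOCAL_CAP))[:LOCAL_CAP]
    if len(results) >= min_results:
        return results  # no API call made
    seen = set(results)
    extra = [m for m in search_api(query, API_CAP) if m not in seen]
    return results + extra[:API_CAP]
```

When the local database answers the query, the API callable is never invoked, which is where the "zero API overhead" comes from.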

Semantic Search

Beyond keyword matching, VexAI supports AI embedding-powered semantic search. Queries are converted to vector embeddings and compared against stored message embeddings, finding results by meaning rather than exact words.

Example: Searching for "deployment issues" will also surface messages about "server problems", "Docker errors", or "hosting failures", even if those exact words weren't used.
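The ranking step behind embedding search is commonly a cosine-similarity comparison. The sketch below assumes that approach and uses tiny hand-written vectors in place of real embeddings; the function names and the `(text, embedding)` storage shape are illustrative.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(query_vec, stored, top_k=3):
    """Rank stored (text, embedding) pairs by similarity to the query vector."""
    scored = [(cosine(query_vec, vec), text) for text, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]


# Toy 3-dimensional "embeddings"; a real model produces hundreds of dimensions.
stored = [
    ("Docker errors on the build box", [0.9, 0.1, 0.0]),
    ("Pizza order for Friday", [0.0, 0.1, 0.9]),
    ("Server problems after the release", [0.8, 0.3, 0.1]),
]
top_two = semantic_search([1.0, 0.2, 0.0], stored, top_k=2)
```

With these toy vectors, the two infrastructure-related messages rank above the unrelated one even though they share no keywords with each other.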

Multi-Provider LLM Support

VexAI is not locked to a single AI provider. An abstract provider layer lets you swap or mix models freely, with built-in resilience and per-user customisation.

Provider Architecture

All providers extend an abstract BaseProvider that handles retry logic with 3 attempts and exponential backoff, so transient API errors don't break conversations.
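A base class with this retry behaviour could be sketched as follows. Only the three attempts and the exponential backoff come from the text; the `TransientError` type, the `_send` / `complete` method names, and the one-second base delay are assumptions.

```python
import time
from abc import ABC, abstractmethod


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, rate limits, 5xx)."""


class BaseProvider(ABC):
    max_attempts = 3   # three attempts in total
    base_delay = 1.0   # assumed: seconds to wait before the first retry

    @abstractmethod
    def _send(self, messages):
        """Provider-specific API call, implemented by each subclass."""

    def complete(self, messages, sleep=time.sleep):
        for attempt in range(self.max_attempts):
            try:
                return self._send(messages)
            except TransientError:
                if attempt == self.max_attempts - 1:
                    raise  # out of attempts: surface the error
                sleep(self.base_delay * 2 ** attempt)  # waits 1s, then 2s
```

Passing `sleep` as a parameter keeps the backoff testable without real delays. A production version would likely add jitter so that many clients do not retry in lockstep.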

Provider | Notes
OpenAI | GPT-4o, GPT-4, GPT-3.5, o1, o3, and all OpenAI models
Anthropic | Claude Sonnet, Opus, Haiku families
OpenRouter | Unified gateway to 100+ models from multiple vendors
OpenAI-compatible | Any API implementing the OpenAI chat completions spec (LM Studio, Ollama, vLLM, etc.)

Tip: Tool and message adapters automatically normalize format differences between providers. You don't need to worry about prompt format. VexAI handles it.

Per-User Model Overrides

Individual users can switch their own model with the /model set slash command. Overrides are stored in the database and persist across restarts.

# Set your personal model
/model set provider:openai model:gpt-4o

# Reset to server default
/model reset

Mode Presets

The /mode set command provides quick switching between pre-configured quality/cost profiles:

Mode | Behaviour
fast | Lowest-latency model, prioritises response speed
cheap | Most cost-effective model available
balanced | Good trade-off between quality and cost
quality | Best available model, regardless of cost or latency

Model-Specific Hints

Some models require special handling. VexAI includes built-in hints and adapters for:

  • Kimi: extended context window handling
  • MiniMax: custom tool format adapters
  • DeepSeek: optimised prompt formatting
  • Qwen: tool call compatibility patches

Security Observer

Every tool invocation passes through the security observer before execution. A tiered risk classification system minimises overhead while ensuring destructive actions always get full review.

4-Tier Risk Classification

Tier | Level | Examples | Review
0 | Exempt | Read-only tools (search, list, get) | Auto-approved, zero overhead
1 | Low | Non-destructive writes (send message, set nickname) | Rule-based checks only
2 | Medium | Impactful but reversible (mute, role assign) | LLM review + verdict caching (10 min TTL)
3 | High | Destructive / irreversible (ban, delete role, purge) | Always full LLM review, never cached

Fail-closed: If the security observer encounters an error (LLM timeout, parse failure, etc.), the tool invocation is denied by default. The system never fails open.
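The tier dispatch, Tier 2 verdict caching, and fail-closed behaviour can be sketched together. Everything named here is illustrative: the `TOOL_TIERS` mapping, the tool names other than search_messages, the `review` signature, and the choice to treat unknown tools as Tier 3. The cache key uses exact parameter equality, which is a simplification of "similar parameters".

```python
import time

# Illustrative mapping; the real tool-to-tier assignments are not listed in full.
TOOL_TIERS = {"search_messages": 0, "send_message": 1, "mute_user": 2, "ban_user": 3}
VERDICT_TTL = 600  # seconds (10 minutes), applies to Tier 2 only
_verdicts = {}     # (user_id, tool, params) -> (expires_at, allowed)


def review(user_id, tool, params, rule_check, llm_review, now=time.monotonic):
    """Return True if the call may run. Any error denies the call (fail-closed)."""
    try:
        tier = TOOL_TIERS.get(tool, 3)  # assumption: unknown tools get the strictest tier
        if tier == 0:
            return True                 # exempt: auto-approved
        if tier == 1:
            return bool(rule_check(tool, params))
        key = (user_id, tool, repr(sorted(params.items())))
        if tier == 2:
            cached = _verdicts.get(key)
            if cached and cached[0] > now():
                return cached[1]        # reuse cached verdict, no LLM call
        allowed = bool(llm_review(tool, params))
        if tier == 2:
            _verdicts[key] = (now() + VERDICT_TTL, allowed)
        return allowed
    except Exception:
        return False                    # fail closed on timeouts, parse errors, etc.
```

Tier 3 calls skip both the cache lookup and the cache write, so every destructive action triggers a fresh review.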

Rate Limiting

Per-user, per-tool rate limits prevent abuse even if the LLM is tricked into rapid actions:

  • 10 messages/min: general tool invocations
  • 3 DMs/min: direct message sends
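A sliding-window limiter is one common way to enforce limits like these. The sketch below assumes that technique; the `RateLimiter` class and the "general" / "dm" bucket names are hypothetical, and a per-tool limiter would simply use the tool name as the bucket.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Sliding-window limiter keyed by (user, bucket)."""

    def __init__(self, limits, window=60.0, clock=time.monotonic):
        self.limits = limits            # bucket name -> max calls per window
        self.window = window            # window length in seconds
        self.clock = clock
        self.calls = defaultdict(deque) # (user_id, bucket) -> call timestamps

    def allow(self, user_id, bucket):
        now = self.clock()
        history = self.calls[(user_id, bucket)]
        while history and now - history[0] >= self.window:
            history.popleft()           # drop calls that have left the window
        if len(history) >= self.limits[bucket]:
            return False                # over the limit: reject
        history.append(now)
        return True


limiter = RateLimiter({"general": 10, "dm": 3})
```

Because the check runs in ordinary code rather than in the prompt, it holds even when the LLM itself has been manipulated.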

Performance Impact

The tiered system achieves an 80–85% reduction in security LLM calls compared to reviewing every tool invocation. Tier 0 and Tier 1 tools never touch the LLM, and Tier 2 verdicts are cached for 10 minutes.

Configuration

  • Alert channel: escalations and denials are posted to a configurable Discord channel
  • Exempt tools: specific tools can be marked as exempt from review
  • Runtime toggle: the /observer slash command enables or disables the observer at runtime, with no restart

Approval Gate

Destructive actions require explicit human confirmation before execution. When the bot needs to perform a high-risk action (ban a user, delete a role, purge messages, etc.), it posts an approval request in Discord.

How it works: The bot sends a message describing exactly what it intends to do. The requesting user (or an admin) reacts to approve or deny. Only after approval does the action execute.

This provides a critical safety net: even if the LLM misinterprets an instruction, a human must explicitly confirm before any irreversible damage occurs.

Performance & Caching

VexAI uses aggressive, multi-layer caching to minimise latency and reduce API costs without sacrificing accuracy.

Tool Result Cache

Tool outputs are cached with per-tool TTLs based on data volatility:

Data Type | TTL | Examples
Static | 5 minutes | Server info, role list, channel list
Semi-static | 2 minutes | User profiles, member list
External | 1 minute | API responses, web fetches
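A per-tool TTL cache along these lines could be sketched as follows. The TTL values mirror the table; the `ToolResultCache` class, the `kind` labels, and the `get_or_compute` method are assumed names.

```python
import time

# TTLs in seconds, by data volatility.
TTL_BY_KIND = {"static": 300, "semi_static": 120, "external": 60}


class ToolResultCache:
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, kind, compute):
        entry = self.entries.get(key)
        if entry and entry[0] > self.clock():
            return entry[1]                   # cache hit
        value = compute()                     # miss or expired: run the tool
        self.entries[key] = (self.clock() + TTL_BY_KIND[kind], value)
        return value
```

A call such as `cache.get_or_compute("role_list", "static", fetch_roles)` would run the hypothetical `fetch_roles` at most once every five minutes.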

Security Verdict Cache

Tier 2 security verdicts are cached for 10 minutes. If the same user invokes the same tool with similar parameters within that window, the cached verdict is reused, and no additional LLM call is required.

Prompt Optimizer

Not every tool description needs to be in every prompt. The prompt optimizer loads skill descriptions lazily: it matches keywords in the incoming message against each skill, and only the skills relevant to that message are expanded in the system prompt.

Result: 30–50% token reduction in system prompts on average, significantly reducing LLM costs on every message.
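Keyword-driven expansion can be illustrated with the sketch below. The `SKILLS` registry, its keywords, and the description texts are invented for the example, and the whitespace tokenisation is deliberately naive.

```python
# Hypothetical skills: name -> (keywords, short line, full description)
SKILLS = {
    "moderation": (
        {"ban", "mute", "kick"},
        "moderation: manage members",
        "moderation: ban, mute, or kick members; requires approval for bans.",
    ),
    "search": (
        {"search", "find", "history"},
        "search: look up messages",
        "search: query the local message database by keyword, author, or date.",
    ),
}


def build_skill_section(message):
    """Expand only the skills whose keywords appear in the message."""
    words = set(message.lower().split())
    lines = []
    for keywords, short, full in SKILLS.values():
        lines.append(full if words & keywords else short)
    return "\n".join(lines)
```

For a message like "can you find what Sam said yesterday", only the search skill gets its full description; the moderation skill stays as a one-line stub.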

Response Dedup Cache

A 30-second TTL deduplication cache prevents the bot from processing the same message multiple times in rapid succession (e.g., due to Discord webhook retries or race conditions).

Cache Monitoring

The built-in cache_stats tool shows real-time hit rates, miss rates, and eviction counts for all cache layers.

Local-First Queries

Searches against the local message database have zero cache overhead and query SQLite directly with no API calls, making them both fast and free.

Identity System

VexAI's personality, rules, and persistent knowledge are defined by a set of Markdown files that are injected into every system prompt. This makes the bot's behaviour fully customisable without touching code.

Identity Files

File | Path | Purpose
IDENTITY.md | data/IDENTITY.md | Bot personality: name, tone, speaking style, persona
SOUL.md | data/SOUL.md | Deeper behaviour patterns: values, priorities, how the bot "thinks"
RULES.md | data/RULES.md | Hard rules the bot must follow: safety boundaries, forbidden actions
CORE_MEMORY.md | data/CORE_MEMORY.md | Persistent facts: server-wide knowledge the bot should always know

Per-User Memory

Each user gets a dedicated memory file at data/users/{userId}/MEMORY.md. The bot writes user-specific preferences, facts, and notes here, and this file is included in the system prompt when that user is in the conversation.

Tip: You can manually edit any of these files to fine-tune the bot's personality or add knowledge. Changes take effect on the next message, with no restart needed.
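Assembling the system prompt from these files might look like the sketch below. The file names and the data/users/{userId}/MEMORY.md path come from this section; the `build_system_prompt` function, the concatenation order, and the blank-line separator are assumptions.

```python
from pathlib import Path

IDENTITY_FILES = ["IDENTITY.md", "SOUL.md", "RULES.md", "CORE_MEMORY.md"]


def build_system_prompt(data_dir, user_id=None):
    """Concatenate the identity files, plus the user's MEMORY.md if present."""
    data_dir = Path(data_dir)
    paths = [data_dir / name for name in IDENTITY_FILES]
    if user_id is not None:
        paths.append(data_dir / "users" / str(user_id) / "MEMORY.md")
    # Reading from disk on every call is what lets manual edits take effect
    # on the next message without a restart. Missing files are skipped.
    sections = [p.read_text(encoding="utf-8").strip() for p in paths if p.is_file()]
    return "\n\n".join(sections)
```

A real implementation might cache file contents and invalidate on modification time to avoid re-reading unchanged files.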

Per-User Identity Overrides

Individual users can have custom identity overrides that modify how the bot interacts with them specifically, which is useful for different tones in different channels or for specific team members.

Live Thinking Indicator

When VexAI is processing a complex request involving tool calls, users see real-time progress instead of silence.

How It Works

  • Temporary status message: appears immediately when tool execution begins
  • LLM-summarised progress: the status updates every ~4 seconds with a human-readable summary of what the bot is doing
  • Sub-agent tracking: if sub-agents are running, their progress is reflected in the status
  • Auto-cleanup: the status message is automatically removed when the final reply is sent
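The lifecycle in the list above (post, update periodically, clean up) can be sketched with asyncio. The `run_with_status` function and its `post`, `edit`, `delete`, and `describe` callbacks are hypothetical stand-ins for Discord API calls and the LLM-generated summary.

```python
import asyncio


async def run_with_status(work, post, edit, delete, describe, interval=4.0):
    """Run `work` while keeping a temporary status message up to date.

    `work` is a coroutine; `post`, `edit`, and `delete` are async callbacks;
    `describe` returns the current progress text. All are caller-supplied.
    """
    status = await post("Working...")      # appears as soon as work begins
    task = asyncio.create_task(work)
    try:
        while True:
            done, _ = await asyncio.wait({task}, timeout=interval)
            if done:
                return task.result()       # final result (or re-raised error)
            await edit(status, describe()) # periodic progress update
    finally:
        await delete(status)               # auto-cleanup, even on failure
```

The `finally` block ensures the status message is removed whether the work succeeds or raises.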

Response Footer

Every response includes a footer with timing and tool usage breakdown:

โฑ 4.2s ยท Tools: 3ร— search_messages, 1ร— fetch_messages
Why it matters: Without a thinking indicator, users assume the bot is frozen after 2โ€“3 seconds of silence. The live status builds trust and keeps users informed, especially for operations that take 10+ seconds.
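A footer in the format shown in this section could be built from the elapsed time and the list of tool calls, as in this sketch. The `format_footer` function is a hypothetical helper, not part of VexAI's documented API.

```python
from collections import Counter


def format_footer(elapsed_seconds, tool_calls):
    """Build a response footer from elapsed time and a list of tool names."""
    footer = f"⏱ {elapsed_seconds:.1f}s"
    if tool_calls:
        counts = Counter(tool_calls)  # preserves first-seen order of tools
        tools = ", ".join(f"{n}× {name}" for name, n in counts.items())
        footer += f" · Tools: {tools}"
    return footer
```

Tools are listed in the order they were first called, each prefixed with its call count.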