BJORN — LLM Bridge, MCP Server & LLM Orchestrator
Complete architecture, operation, commands, fallbacks
Table of contents
- Overview
- Created / modified files
- LLM Bridge (llm_bridge.py)
- MCP Server (mcp_server.py)
- LLM Orchestrator (llm_orchestrator.py)
- Orchestrator & Scheduler integration
- Web Utils LLM (web_utils/llm_utils.py)
- EPD comment integration (comment.py)
- Configuration (shared.py)
- HTTP Routes (webapp.py)
- Web interfaces
- Startup (Bjorn.py)
- LaRuche / LAND Protocol compatibility
- Optional dependencies
- Quick activation & configuration
- Complete API endpoint reference
- Queue priority system
- Fallbacks & graceful degradation
- Call sequences
1. Overview
┌─────────────────────────────────────────────────────────────────────┐
│ BJORN (RPi) │
│ │
│ ┌─────────────┐ ┌──────────────────┐ ┌─────────────────────┐ │
│ │ Core BJORN │ │ MCP Server │ │ Web UI │ │
│ │ (unchanged) │ │ (mcp_server.py) │ │ /chat.html │ │
│ │ │ │ 7 exposed tools │ │ /mcp-config.html │ │
│ │ comment.py │ │ HTTP SSE / stdio │ │ ↳ Orch Log button │ │
│ │ ↕ LLM hook │ │ │ │ │ │
│ └──────┬──────┘ └────────┬─────────┘ └──────────┬──────────┘ │
│ └─────────────────────────────────────────────┘ │
│ │ │
│ ┌──────────────────────────▼─────────────────────────────────┐ │
│ │ LLM Bridge (llm_bridge.py) │ │
│ │ Singleton · Thread-safe │ │
│ │ │ │
│ │ Automatic cascade: │ │
│ │ 1. LaRuche node (LAND/mDNS → HTTP POST /infer) │ │
│ │ 2. Local Ollama (HTTP POST /api/chat) │ │
│ │ 3. External API (Anthropic / OpenAI / OpenRouter) │ │
│ │ 4. None (→ fallback templates in comment.py) │ │
│ │ │ │
│ │ Agentic tool-calling loop (stop_reason=tool_use, ≤6 turns) │ │
│ │ _BJORN_TOOLS: 7 tools in Anthropic format │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌──────────────────────────▼─────────────────────────────────┐ │
│ │ LLM Orchestrator (llm_orchestrator.py) │ │
│ │ │ │
│ │ mode = none → LLM has no role in scheduling │ │
│ │ mode = advisor → LLM suggests 1 action/cycle (prio 85) │ │
│ │ mode = autonomous→ own thread, loop + tools (prio 82) │ │
│ │ │ │
│ │ Fingerprint (hosts↑, vulns↑, creds↑, queue_id↑) │ │
│ │ → skip LLM if nothing new (token savings) │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌──────────────────────────▼─────────────────────────────────┐ │
│ │ Action Queue (SQLite) │ │
│ │ scheduler=40 normal=50 MCP=80 autonomous=82 advisor=85│ │
│ └─────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
↕ mDNS _ai-inference._tcp.local. (zeroconf)
┌──────────────────────────────────────────┐
│ LaRuche Swarm (LAN) │
│ Node A → Mistral 7B :8419 │
│ Node B → DeepSeek Coder :8419 │
│ Node C → Phi-3 Mini :8419 │
└──────────────────────────────────────────┘
Design principles:
- Everything is disabled by default — zero impact if not configured
- All dependencies are optional — silent import if missing
- Systematic fallback at every level — Bjorn never crashes because of the LLM
- The bridge is a singleton — one instance per process, thread-safe
- EPD comments preserve their exact original behaviour if LLM is disabled
- The LLM is the brain (decides what to do), the orchestrator is the arms (executes)
2. Created / modified files
Created files
| File | Approx. size | Role |
|---|---|---|
| llm_bridge.py | ~450 lines | LLM Singleton — backend cascade + agentic tool-calling loop |
| mcp_server.py | ~280 lines | FastMCP MCP Server — 7 Bjorn tools |
| web_utils/llm_utils.py | ~220 lines | LLM/MCP HTTP endpoints (web_utils pattern) |
| llm_orchestrator.py | ~410 lines | LLM Orchestrator — advisor & autonomous modes |
| web/chat.html | ~300 lines | Chat interface + Orch Log button |
| web/mcp-config.html | ~400 lines | LLM & MCP configuration page |
Modified files
| File | What changed |
|---|---|
| shared.py | +45 config keys (LLM bridge, MCP, orchestrator) |
| comment.py | LLM hook in get_comment() — 12 lines added |
| utils.py | +1 entry in lazy WebUtils registry: "llm_utils" |
| webapp.py | +9 GET/POST routes in _register_routes_once() |
| Bjorn.py | LLM Bridge warm-up + conditional MCP server start |
| orchestrator.py | +LLMOrchestrator lifecycle + advisor call in background tasks |
| action_scheduler.py | +skip scheduler if LLM autonomous only (llm_orchestrator_skip_scheduler) |
| requirements.txt | +3 comment lines (optional dependencies documented) |
3. LLM Bridge (llm_bridge.py)
Internal architecture
LLMBridge (Singleton)
├── __init__() Initialises singleton, launches LaRuche discovery
├── complete() Main API — cascades all backends
│ └── tools=None/[...] Optional param to enable tool-calling
├── generate_comment() Generates a short EPD comment (≤80 tokens)
├── chat() Stateful chat with per-session history
│ └── tools=_BJORN_TOOLS if llm_chat_tools_enabled=True
├── clear_history() Clears a session's history
├── status() Returns bridge state (for the UI)
│
├── _start_laruche_discovery() Starts mDNS thread in background
├── _discover_laruche_mdns() Listens to _ai-inference._tcp.local. continuously
│
├── _call_laruche() Backend 1 — POST http://[node]:8419/infer
├── _call_ollama() Backend 2 — POST http://localhost:11434/api/chat
├── _call_anthropic() Backend 3a — POST api.anthropic.com + AGENTIC LOOP
│ └── loop ≤6 turns: send → tool_use → execute → feed result → repeat
├── _call_openai_compat() Backend 3b — POST [base_url]/v1/chat/completions
│
├── _execute_tool(name, inputs) Dispatches to mcp_server._impl_*
│ └── gate: checks mcp_allowed_tools before executing
│
└── _build_system_prompt() Builds system prompt with live Bjorn context
_BJORN_TOOLS : List[Dict] Anthropic-format definitions for the 7 MCP tools
_BJORN_TOOLS — full list
_BJORN_TOOLS = [
{"name": "get_hosts", "description": "...", "input_schema": {...}},
{"name": "get_vulnerabilities", ...},
{"name": "get_credentials", ...},
{"name": "get_action_history", ...},
{"name": "get_status", ...},
{"name": "run_action", ...}, # gated by mcp_allowed_tools
{"name": "query_db", ...}, # SELECT only
]
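For reference, here is one entry of the list expanded. This is a hedged sketch: the schema keys (name, description, input_schema) are the Anthropic tool-use format named above, but the exact description strings and schema details are illustrative, not copied from llm_bridge.py.

```python
# Illustrative expansion of the get_hosts entry in _BJORN_TOOLS.
# The key names follow the Anthropic tool-use schema; wording is hypothetical.
GET_HOSTS_TOOL = {
    "name": "get_hosts",
    "description": "Return hosts discovered by Bjorn (IP, MAC, hostname, OS, open ports).",
    "input_schema": {
        "type": "object",
        "properties": {
            "alive_only": {
                "type": "boolean",
                "description": "Only return hosts currently marked alive.",
                "default": True,
            }
        },
        "required": [],
    },
}
```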
Backend cascade
llm_backend = "auto" → LaRuche → Ollama → API → None
llm_backend = "laruche" → LaRuche only
llm_backend = "ollama" → Ollama only
llm_backend = "api" → External API only
At each step, if a backend fails (timeout, network error, missing model), the next one is tried silently. If all fail, complete() returns None.
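The cascade can be sketched in a few lines. This is a simplified model, not the real bridge: the backend bodies are stubbed, and the real code splits the "api" step into _call_anthropic / _call_openai_compat.

```python
# Minimal sketch of the "auto" cascade in complete().
# Backend bodies are stubs that simulate failure modes; the real methods do HTTP calls.
class LLMBridgeSketch:
    def __init__(self, backend="auto"):
        self.backend = backend
        self._backends = {
            "laruche": self._call_laruche,
            "ollama": self._call_ollama,
            "api": self._call_api,
        }

    def complete(self, messages):
        order = ["laruche", "ollama", "api"] if self.backend == "auto" else [self.backend]
        for name in order:
            try:
                text = self._backends[name](messages)
                if text:
                    return text
            except Exception:
                continue  # silent fallthrough: timeout, network error, missing model
        return None  # all backends failed; caller falls back to templates

    def _call_laruche(self, messages):
        raise OSError("no LaRuche node on LAN")  # simulated discovery failure

    def _call_ollama(self, messages):
        return "mock ollama reply"  # simulated working local backend

    def _call_api(self, messages):
        return None  # simulated missing API key
```

With these stubs, `LLMBridgeSketch().complete(...)` skips the failing LaRuche stub and returns the Ollama reply, while `backend="laruche"` pins the cascade to the single failing backend and yields None.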
Agentic tool-calling loop (_call_anthropic)
When tools is passed to complete(), the Anthropic backend enters agentic mode:
_call_anthropic(messages, system, tools, max_tokens, timeout)
│
├─ POST /v1/messages {tools: [...]}
│
├─ [stop_reason = "tool_use"]
│ for each tool_use block:
│ result = _execute_tool(name, inputs)
│ append {role: "user", content: [{type: "tool_result", tool_use_id, content: result}]}
│ POST /v1/messages [messages + tool results] ← next turn
│
└─ [stop_reason = "end_turn"] → returns final text
[≥6 turns] → returns partial text + warning
_execute_tool() dispatches directly to mcp_server._impl_* (no network), checking mcp_allowed_tools for run_action.
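The loop above can be restated as a runnable sketch. The stop_reason values and the tool_use / tool_result content-block shapes follow the Anthropic Messages API; the `post` and `execute_tool` parameters are injected stand-ins for the real HTTP call and the mcp_server dispatch.

```python
# Schematic of the agentic loop in _call_anthropic(): stop_reason drives the loop,
# and tool results are fed back as tool_result blocks in a user turn (Anthropic format).
MAX_TURNS = 6

def agentic_loop(messages, tools, post, execute_tool):
    for _ in range(MAX_TURNS):
        resp = post(messages, tools)  # stands in for POST /v1/messages
        if resp["stop_reason"] != "tool_use":
            # end_turn: concatenate the text blocks and return the final answer
            return "".join(b["text"] for b in resp["content"] if b["type"] == "text")
        # Execute every tool_use block, then feed the results back for the next turn
        messages.append({"role": "assistant", "content": resp["content"]})
        results = [
            {"type": "tool_result", "tool_use_id": b["id"],
             "content": execute_tool(b["name"], b["input"])}
            for b in resp["content"] if b["type"] == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    return "(partial: tool-turn limit reached)"  # the ≥6-turn safety valve
```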
Tool-calling in chat (chat())
If llm_chat_tools_enabled = True, the chat passes tools=_BJORN_TOOLS to the backend, letting the LLM answer with real-time data (hosts, vulns, creds…) rather than relying only on its training knowledge.
Chat history
- Each session has its own history (key = session_id)
- Special session "llm_orchestrator": contains the autonomous orchestrator's reasoning
- Max size configurable: llm_chat_history_size (default: 20 messages)
- History is in-memory only — not persisted across restarts
- Thread-safe via _hist_lock
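A bounded, lock-protected session store like the one described can be sketched as follows. The attribute names (_chat_histories, _hist_lock) come from the architecture above; the trimming logic is an illustration, not the actual implementation.

```python
# Sketch of the per-session history store: bounded, thread-safe, in-memory only.
import threading
from collections import defaultdict

class ChatHistorySketch:
    def __init__(self, max_messages=20):  # mirrors llm_chat_history_size
        self._chat_histories = defaultdict(list)
        self._hist_lock = threading.Lock()
        self._max = max_messages

    def append(self, session_id, role, content):
        with self._hist_lock:
            hist = self._chat_histories[session_id]
            hist.append({"role": role, "content": content})
            del hist[:-self._max]  # keep only the newest N messages

    def clear(self, session_id):
        with self._hist_lock:
            self._chat_histories.pop(session_id, None)
```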
4. MCP Server (mcp_server.py)
What is MCP?
The Model Context Protocol (Anthropic) is an open-source protocol that lets AI agents (Claude Desktop, custom agents, etc.) use external tools via a standardised interface.
By enabling Bjorn's MCP server, any MCP client can query and control Bjorn — without knowing the internal DB structure.
Exposed tools
| Tool | Arguments | Description |
|---|---|---|
| get_hosts | alive_only: bool = True | Returns discovered hosts (IP, MAC, hostname, OS, ports) |
| get_vulnerabilities | host_ip: str = "", limit: int = 100 | Returns discovered CVE vulnerabilities |
| get_credentials | service: str = "", limit: int = 100 | Returns captured credentials (SSH, FTP, SMB…) |
| get_action_history | limit: int = 50, action_name: str = "" | History of executed actions |
| get_status | (none) | Real-time state: mode, active action, counters |
| run_action | action_name: str, target_ip: str, target_mac: str = "" | Queues a Bjorn action (MCP priority = 80) |
| query_db | sql: str, params: str = "[]" | Free SELECT against the SQLite DB (read-only) |
Security: each tool checks mcp_allowed_tools — unlisted tools return a clean error. query_db rejects anything that is not a SELECT.
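A SELECT-only gate of the kind query_db needs can be sketched with the stdlib. This is one plausible implementation of the rule stated above, not necessarily the one in mcp_server.py; a belt-and-braces alternative is to open the DB read-only with `sqlite3.connect("file:...?mode=ro", uri=True)`.

```python
# Sketch of the read-only gate on query_db: reject anything that is not a single SELECT.
import re

def is_select_only(sql: str) -> bool:
    stripped = sql.strip().rstrip(";").strip()
    if ";" in stripped:
        return False  # no statement stacking ("SELECT ...; DROP TABLE ...")
    return re.match(r"(?is)^\s*select\b", stripped) is not None
```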
_impl_run_action — priority detail
_MCP_PRIORITY = 80  # > normal(50) > scheduler(40)
sd.db.queue_action(
action_name=action_name,
mac=mac, # resolved from hosts WHERE ip=? if not supplied
ip=target_ip,
priority=_MCP_PRIORITY,
trigger="mcp",
metadata={"decision_method": "mcp", "decision_origin": "mcp"},
)
sd.queue_event.set() # wakes the orchestrator immediately
Available transports
| Transport | Config | Usage |
|---|---|---|
| http (default) | mcp_transport: "http", mcp_port: 8765 | Accessible from any MCP client on LAN via SSE |
| stdio | mcp_transport: "stdio" | Claude Desktop, CLI agents |
5. LLM Orchestrator (llm_orchestrator.py)
The LLM Orchestrator turns Bjorn from a scriptable tool into an autonomous agent. It is completely optional and can be disabled via llm_orchestrator_mode = "none".
Operating modes
| Mode | Config value | Operation |
|---|---|---|
| Disabled | "none" (default) | LLM plays no role in planning |
| Advisor | "advisor" | LLM consulted periodically, suggests 1 action |
| Autonomous | "autonomous" | Own thread, LLM observes + plans with tools |
Internal architecture
LLMOrchestrator
├── start() Starts autonomous thread if mode=autonomous
├── stop() Stops thread (join 15s max)
├── restart_if_mode_changed() Called from orchestrator.run() each iteration
├── is_active() True if autonomous thread is alive
│
├── [ADVISOR MODE]
│ advise() → called from orchestrator._process_background_tasks()
│ ├── _build_snapshot() → compact dict (hosts, vulns, creds, queue)
│ ├── LLMBridge().complete(prompt, system)
│ └── _apply_advisor_response(raw, allowed)
│ ├── parse JSON {"action": str, "target_ip": str, "reason": str}
│ ├── validate action ∈ allowed
│ └── db.queue_action(priority=85, trigger="llm_advisor")
│
└── [AUTONOMOUS MODE]
_autonomous_loop() Thread "LLMOrchestrator" (daemon)
└── loop:
_compute_fingerprint() → (hosts, vulns, creds, max_queue_id)
_has_actionable_change() → skip if nothing increased
_run_autonomous_cycle()
├── filter tools: read-only always + run_action if in allowed
├── LLMBridge().complete(prompt, system, tools=[...])
│ └── _call_anthropic() agentic loop
│ → LLM calls run_action via tools
│ → _execute_tool → _impl_run_action → queue
└── if llm_orchestrator_log_reasoning=True:
logger.info("[LLM_ORCH_REASONING]...")
_push_to_chat() → "llm_orchestrator" session in LLMBridge
sleep(llm_orchestrator_interval_s)
Fingerprint and smart skip
def _compute_fingerprint(self) -> tuple:
# (host_count, vuln_count, cred_count, max_completed_queue_id)
return (hosts, vulns, creds, last_id)
def _has_actionable_change(self, fp: tuple) -> bool:
if self._last_fingerprint is None:
return True # first cycle always runs
# Triggers ONLY if something INCREASED
# hosts going offline → not actionable
return any(fp[i] > self._last_fingerprint[i] for i in range(len(fp)))
Token savings: if llm_orchestrator_skip_if_no_change = True (default), the LLM cycle is skipped if no new hosts/vulns/creds and no action completed since the last cycle.
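Concretely, the increase-only rule plays out like this (a standalone restatement of the method above, with the tuple layout from _compute_fingerprint):

```python
# Worked example of the increase-only trigger:
last = (12, 3, 1, 47)  # (host_count, vuln_count, cred_count, max_completed_queue_id)

def has_actionable_change(fp, last):
    if last is None:
        return True  # first cycle always runs
    return any(fp[i] > last[i] for i in range(len(fp)))

assert has_actionable_change((12, 3, 1, 47), last) is False  # nothing new → skip, zero tokens
assert has_actionable_change((11, 3, 1, 47), last) is False  # host went offline → not actionable
assert has_actionable_change((12, 4, 1, 47), last) is True   # new vulnerability → run the cycle
```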
LLM priorities vs queue
_ADVISOR_PRIORITY = 85 # advisor > MCP(80) > normal(50) > scheduler(40)
_AUTONOMOUS_PRIORITY = 82 # autonomous slightly below advisor
Autonomous system prompt — example
"You are Bjorn's autonomous orchestrator, running on a Raspberry Pi network security tool.
Current state: 12 hosts discovered, 3 vulnerabilities, 1 credentials.
Operation mode: ATTACK. Hard limit: at most 3 run_action calls per cycle.
Only these action names may be queued: NmapScan, SSHBruteforce, SMBScan.
Strategy: prioritise unexplored services, hosts with high port counts, and hosts with no recent scans.
Do not queue duplicate actions already pending or recently successful.
Use Norse references occasionally. Be terse and tactical."
Advisor response format
// Action recommended:
{"action": "NmapScan", "target_ip": "192.168.1.42", "reason": "unexplored host, 0 open ports known"}
// Nothing to do:
{"action": null}
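The parse-validate-queue path of _apply_advisor_response can be sketched as below. The function name and the priority/trigger values come from the flow above; `queue_action` is an injected stub standing in for db.queue_action.

```python
# Sketch of _apply_advisor_response(): parse JSON, validate against the allow-list, queue.
import json

def apply_advisor_response(raw, allowed, queue_action):
    try:
        data = json.loads(raw)
    except (ValueError, TypeError):
        return None  # invalid JSON → DEBUG log, wait for the next cycle
    action = data.get("action")
    if not action:
        return None  # {"action": null} → nothing to do
    if action not in allowed:
        return None  # disallowed action → WARNING log, ignored
    queue_action(action, data.get("target_ip"), priority=85, trigger="llm_advisor")
    return action
```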
Reasoning log
When llm_orchestrator_log_reasoning = True:
- Full reasoning is logged via logger.info("[LLM_ORCH_REASONING]...")
- It is also injected into the "llm_orchestrator" session in LLMBridge._chat_histories
- Viewable in real time in chat.html via the Orch Log button
6. Orchestrator & Scheduler integration
orchestrator.py
# __init__
self.llm_orchestrator = None
self._init_llm_orchestrator()
# _init_llm_orchestrator()
if shared_data.config.get("llm_enabled") and shared_data.config.get("llm_orchestrator_mode") != "none":
from llm_orchestrator import LLMOrchestrator
self.llm_orchestrator = LLMOrchestrator(shared_data)
self.llm_orchestrator.start()
# run() — each iteration
self._sync_llm_orchestrator() # starts/stops thread according to runtime config
# _process_background_tasks()
if self.llm_orchestrator and mode == "advisor":
self.llm_orchestrator.advise()
action_scheduler.py — skip option
# In run(), each iteration:
_llm_skip = bool(
shared_data.config.get("llm_orchestrator_skip_scheduler", False)
and shared_data.config.get("llm_orchestrator_mode") == "autonomous"
and shared_data.config.get("llm_enabled", False)
)
if not _llm_skip:
self._publish_all_upcoming() # step 2: publish due actions
self._evaluate_global_actions() # step 3: global evaluation
self.evaluate_all_triggers() # step 4: per-host triggers
# Steps 1 (promote due) and 5 (cleanup/priorities) always run
When llm_orchestrator_skip_scheduler = True + mode = autonomous + llm_enabled = True:
- The scheduler no longer publishes automatic actions (no more B_require, B_trigger, etc.)
- The autonomous LLM becomes sole master of the queue
- Queue hygiene (promotions, cleanup) remains active
7. Web Utils LLM (web_utils/llm_utils.py)
Follows the exact same pattern as all other web_utils (constructor __init__(self, shared_data), methods called by webapp.py).
Methods
| Method | Type | Description |
|---|---|---|
| get_llm_status(handler) | GET | LLM bridge state (active backend, LaRuche URL…) |
| get_llm_config(handler) | GET | Current LLM config (api_key masked) |
| get_llm_reasoning(handler) | GET | llm_orchestrator session history (reasoning log) |
| handle_chat(data) | POST | Sends a message, returns LLM response |
| clear_chat_history(data) | POST | Clears a session's history |
| get_mcp_status(handler) | GET | MCP server state (running, port, transport) |
| toggle_mcp(data) | POST | Enables/disables MCP server + saves config |
| save_mcp_config(data) | POST | Saves MCP config (tools, port, transport) |
| save_llm_config(data) | POST | Saves LLM config (all parameters) |
8. EPD comment integration (comment.py)
Behaviour before modification
get_comment(status, lang, params)
└── if delay elapsed OR status changed
└── _pick_text(status, lang, params) ← SQLite DB
└── returns weighted text
Behaviour after modification
get_comment(status, lang, params)
└── if delay elapsed OR status changed
│
├── [if llm_comments_enabled = True]
│ └── LLMBridge().generate_comment(status, params)
│ ├── success → LLM text (≤12 words, ~8s max)
│ └── failure/timeout → text = None
│
└── [if text = None] ← SYSTEMATIC FALLBACK
└── _pick_text(status, lang, params) ← original behaviour
└── returns weighted DB text
Original behaviour preserved 100% if LLM disabled or failing.
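The ~12-line hook boils down to the control flow below. This is a sketch mirroring the diagram, with `generate_comment` and `pick_text` injected as stand-ins for LLMBridge().generate_comment and _pick_text:

```python
# Sketch of the LLM hook in get_comment(): try the LLM first, fall back to templates always.
def get_comment_sketch(status, lang, params, llm_comments_enabled, generate_comment, pick_text):
    text = None
    if llm_comments_enabled:
        try:
            text = generate_comment(status, params)  # ≤80 tokens, ~8 s timeout in the real hook
        except Exception:
            text = None  # any LLM failure → systematic fallback
    if text is None:
        text = pick_text(status, lang, params)  # original weighted-DB behaviour
    return text
```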
9. Configuration (shared.py)
LLM Bridge section (__title_llm__)
| Key | Default | Type | Description |
|---|---|---|---|
| llm_enabled | False | bool | Master toggle — activates the entire bridge |
| llm_comments_enabled | False | bool | Use LLM for EPD comments |
| llm_chat_enabled | True | bool | Enable /chat.html interface |
| llm_chat_tools_enabled | False | bool | Enable tool-calling in web chat |
| llm_backend | "auto" | str | auto \| laruche \| ollama \| api |
| llm_laruche_discovery | True | bool | Auto-discover LaRuche nodes via mDNS |
| llm_laruche_url | "" | str | Manual LaRuche URL (overrides discovery) |
| llm_ollama_url | "http://127.0.0.1:11434" | str | Local Ollama URL |
| llm_ollama_model | "phi3:mini" | str | Ollama model to use |
| llm_api_provider | "anthropic" | str | anthropic \| openai \| openrouter |
| llm_api_key | "" | str | API key (masked in UI) |
| llm_api_model | "claude-haiku-4-5-20251001" | str | External API model |
| llm_api_base_url | "" | str | Custom base URL (OpenRouter, proxy…) |
| llm_timeout_s | 30 | int | Global LLM call timeout (seconds) |
| llm_max_tokens | 500 | int | Max tokens for chat |
| llm_comment_max_tokens | 80 | int | Max tokens for EPD comments |
| llm_chat_history_size | 20 | int | Max messages per chat session |
MCP Server section (__title_mcp__)
| Key | Default | Type | Description |
|---|---|---|---|
| mcp_enabled | False | bool | Enable MCP server |
| mcp_transport | "http" | str | http (SSE) \| stdio |
| mcp_port | 8765 | int | HTTP SSE port |
| mcp_allowed_tools | [all] | list | List of authorised MCP tools |
LLM Orchestrator section (__title_llm_orch__)
| Key | Default | Type | Description |
|---|---|---|---|
| llm_orchestrator_mode | "none" | str | none \| advisor \| autonomous |
| llm_orchestrator_interval_s | 60 | int | Delay between autonomous cycles (min 30 s) |
| llm_orchestrator_max_actions | 3 | int | Max actions per autonomous cycle |
| llm_orchestrator_allowed_actions | [] | list | Actions the LLM may queue (empty = mcp_allowed_tools) |
| llm_orchestrator_skip_scheduler | False | bool | Disable scheduler when autonomous is active |
| llm_orchestrator_skip_if_no_change | True | bool | Skip cycle if fingerprint unchanged |
| llm_orchestrator_log_reasoning | False | bool | Log full LLM reasoning |
10. HTTP Routes (webapp.py)
GET routes
| Route | Handler | Description |
|---|---|---|
| GET /api/llm/status | llm_utils.get_llm_status | LLM bridge state |
| GET /api/llm/config | llm_utils.get_llm_config | LLM config (api_key masked) |
| GET /api/llm/reasoning | llm_utils.get_llm_reasoning | Orchestrator reasoning log |
| GET /api/mcp/status | llm_utils.get_mcp_status | MCP server state |
POST routes (JSON data-only)
| Route | Handler | Description |
|---|---|---|
| POST /api/llm/chat | llm_utils.handle_chat | Send a message to the LLM |
| POST /api/llm/clear_history | llm_utils.clear_chat_history | Clear a session's history |
| POST /api/llm/config | llm_utils.save_llm_config | Save LLM config |
| POST /api/mcp/toggle | llm_utils.toggle_mcp | Enable/disable MCP |
| POST /api/mcp/config | llm_utils.save_mcp_config | Save MCP config |
All routes respect Bjorn's existing authentication (webauth).
11. Web interfaces
/chat.html
Terminal-style chat interface (black/red, consistent with Bjorn).
Features:
- Auto-detects LLM state on load (GET /api/llm/status)
- Displays active backend (LaRuche URL, or mode)
- "Bjorn is thinking..." indicator during response
- Unique session ID per browser tab
- Enter = send, Shift+Enter = new line
- Textarea auto-resize
- "Clear history" button — clears server-side session
- "Orch Log" button — loads the autonomous orchestrator's reasoning
  - Calls GET /api/llm/reasoning
  - Renders each message (cycle prompt + LLM response) as chat bubbles
  - "← Back to chat" to return to normal chat
  - Helper message if log is empty (hint: enable llm_orchestrator_log_reasoning)
Access: http://[bjorn-ip]:8000/chat.html
/mcp-config.html
Full LLM & MCP configuration page.
LLM Bridge section:
- Master enable/disable toggle
- EPD comments, chat, chat tool-calling toggles
- Backend selector (auto / laruche / ollama / api)
- LaRuche mDNS discovery toggle + manual URL
- Ollama configuration (URL + model)
- External API configuration (provider, key, model, custom URL)
- Timeout and token parameters
- "TEST CONNECTION" button
MCP Server section:
- Enable toggle with live start/stop
- Transport selector (HTTP SSE / stdio)
- HTTP port
- Per-tool checkboxes
- "RUNNING" / "OFF" indicator
Access: http://[bjorn-ip]:8000/mcp-config.html
12. Startup (Bjorn.py)
# LLM Bridge — warm up singleton
try:
from llm_bridge import LLMBridge
LLMBridge() # Starts mDNS discovery if llm_laruche_discovery=True
logger.info("LLM Bridge initialised")
except Exception as e:
logger.warning("LLM Bridge init skipped: %s", e)
# MCP Server
try:
import mcp_server
if shared_data.config.get("mcp_enabled", False):
mcp_server.start() # Daemon thread "MCPServer"
logger.info("MCP server started")
else:
logger.info("MCP server loaded (disabled)")
except Exception as e:
logger.warning("MCP server init skipped: %s", e)
The LLM Orchestrator is initialised inside orchestrator.py (not Bjorn.py), since it depends on the orchestrator loop cycle.
13. LaRuche / LAND Protocol compatibility
LAND Protocol
LAND (Local AI Network Discovery) is the LaRuche protocol:
- Discovery: mDNS service type _ai-inference._tcp.local.
- Inference: POST http://[node]:8419/infer
What Bjorn implements on the Python side
# mDNS listening (zeroconf)
from zeroconf import Zeroconf, ServiceBrowser
ServiceBrowser(zc, "_ai-inference._tcp.local.", listener)
# → Auto-detects LaRuche nodes
# Inference call (urllib stdlib, zero dependency)
payload = {"prompt": "...", "capability": "llm", "max_tokens": 500}
req = urllib.request.Request(f"{url}/infer", data=json.dumps(payload).encode("utf-8"),
                             headers={"Content-Type": "application/json"})
urllib.request.urlopen(req)  # data must be bytes — json.dumps alone would raise TypeError
Scenarios
| Scenario | Behaviour |
|---|---|
| LaRuche node detected on LAN | Used automatically as priority backend |
| Multiple LaRuche nodes | First discovered is used |
| Manual URL configured | Used directly, discovery ignored |
| LaRuche node absent | Cascades to Ollama or external API |
| zeroconf not installed | Discovery silently disabled, DEBUG log |
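Putting the snippets above together, a discovery listener could look like the sketch below. The guarded import matches the "zeroconf not installed" scenario, and "first discovered wins" matches the multi-node row; the class and function names are illustrative, not Bjorn's actual code.

```python
# Sketch of LaRuche mDNS discovery: guarded import, first node found wins.
try:
    from zeroconf import Zeroconf, ServiceBrowser
    HAVE_ZEROCONF = True
except ImportError:
    HAVE_ZEROCONF = False  # discovery silently disabled (DEBUG log in the real bridge)

LAND_SERVICE = "_ai-inference._tcp.local."

class LaRucheListener:
    def __init__(self):
        self.url = None  # first discovered node wins

    def add_service(self, zc, service_type, name):
        info = zc.get_service_info(service_type, name)
        if info and self.url is None:
            addr = info.parsed_addresses()[0]
            self.url = f"http://{addr}:{info.port}"

    def remove_service(self, zc, service_type, name):
        pass  # keep the last known node; the cascade handles dead nodes

    def update_service(self, zc, service_type, name):
        pass

def start_discovery():
    if not HAVE_ZEROCONF:
        return None
    listener = LaRucheListener()
    ServiceBrowser(Zeroconf(), LAND_SERVICE, listener)  # browses in a background thread
    return listener
```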
14. Optional dependencies
| Package | Min version | Feature unlocked | Install command |
|---|---|---|---|
| mcp[cli] | ≥ 1.0.0 | Full MCP server | pip install "mcp[cli]" |
| zeroconf | ≥ 0.131.0 | LaRuche mDNS discovery | pip install zeroconf |
No new dependencies added for LLM backends:
- LaRuche / Ollama: uses urllib.request (Python stdlib)
- Anthropic / OpenAI: REST API via urllib — no SDK needed
15. Quick activation & configuration
Basic LLM chat
curl -X POST http://[bjorn-ip]:8000/api/llm/config \
-H "Content-Type: application/json" \
-d '{"llm_enabled": true, "llm_backend": "ollama", "llm_ollama_model": "phi3:mini"}'
# → http://[bjorn-ip]:8000/chat.html
Chat with tool-calling (LLM accesses live network data)
curl -X POST http://[bjorn-ip]:8000/api/llm/config \
-d '{"llm_enabled": true, "llm_chat_tools_enabled": true}'
LLM Orchestrator — advisor mode
curl -X POST http://[bjorn-ip]:8000/api/llm/config \
-d '{
"llm_enabled": true,
"llm_orchestrator_mode": "advisor",
"llm_orchestrator_allowed_actions": ["NmapScan", "SSHBruteforce"]
}'
LLM Orchestrator — autonomous mode (LLM as sole planner)
curl -X POST http://[bjorn-ip]:8000/api/llm/config \
-d '{
"llm_enabled": true,
"llm_orchestrator_mode": "autonomous",
"llm_orchestrator_skip_scheduler": true,
"llm_orchestrator_max_actions": 5,
"llm_orchestrator_interval_s": 120,
"llm_orchestrator_allowed_actions": ["NmapScan", "SSHBruteforce", "SMBScan"],
"llm_orchestrator_log_reasoning": true
}'
# → View reasoning: http://[bjorn-ip]:8000/chat.html → Orch Log button
With Anthropic API
curl -X POST http://[bjorn-ip]:8000/api/llm/config \
-d '{
"llm_enabled": true,
"llm_backend": "api",
"llm_api_provider": "anthropic",
"llm_api_key": "sk-ant-...",
"llm_api_model": "claude-haiku-4-5-20251001"
}'
With OpenRouter (access to all models)
curl -X POST http://[bjorn-ip]:8000/api/llm/config \
-d '{
"llm_enabled": true,
"llm_backend": "api",
"llm_api_provider": "openrouter",
"llm_api_key": "sk-or-...",
"llm_api_model": "meta-llama/llama-3.2-3b-instruct",
"llm_api_base_url": "https://openrouter.ai/api"
}'
Model recommendations by scenario
| Scenario | Backend | Recommended model | Pi RAM |
|---|---|---|---|
| Autonomous orchestrator + LaRuche on LAN | laruche | Mistral/Phi on the node | 0 (remote inference) |
| Autonomous orchestrator offline | ollama | qwen2.5:3b | ~3 GB |
| Autonomous orchestrator cloud | api | claude-haiku-4-5-20251001 | 0 |
| Chat + tools | ollama | phi3:mini | ~2 GB |
| EPD comments only | ollama | smollm2:360m | ~400 MB |
16. Complete API endpoint reference
GET
GET /api/llm/status
→ {"enabled": bool, "backend": str, "laruche_url": str|null,
"laruche_discovery": bool, "ollama_url": str, "ollama_model": str,
"api_provider": str, "api_model": str, "api_key_set": bool}
GET /api/llm/config
→ {all llm_* keys except api_key, + "llm_api_key_set": bool}
GET /api/llm/reasoning
→ {"status": "ok", "messages": [{"role": str, "content": str}, ...], "count": int}
→ {"status": "error", "message": str, "messages": [], "count": 0}
GET /api/mcp/status
→ {"enabled": bool, "running": bool, "transport": str,
"port": int, "allowed_tools": [str]}
POST
POST /api/llm/chat
Body: {"message": str, "session_id": str?}
→ {"status": "ok", "response": str, "session_id": str}
→ {"status": "error", "message": str}
POST /api/llm/clear_history
Body: {"session_id": str?}
→ {"status": "ok"}
POST /api/llm/config
Body: {any subset of llm_* and llm_orchestrator_* keys}
→ {"status": "ok"}
→ {"status": "error", "message": str}
POST /api/mcp/toggle
Body: {"enabled": bool}
→ {"status": "ok", "enabled": bool, "started": bool?}
POST /api/mcp/config
Body: {"allowed_tools": [str]?, "port": int?, "transport": str?}
→ {"status": "ok", "config": {...}}
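For scripting against these endpoints without curl, a stdlib-only Python client is enough. The endpoint path and body shape come from the reference above; the host name and helper function are placeholders.

```python
# Minimal stdlib client for POST /api/llm/chat; bjorn_host is a placeholder.
import json
import urllib.request

def build_chat_request(bjorn_host, message, session_id="docs-example"):
    body = json.dumps({"message": message, "session_id": session_id}).encode("utf-8")
    return urllib.request.Request(
        f"http://{bjorn_host}:8000/api/llm/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Send it and read the answer (requires a reachable Bjorn):
# resp = json.load(urllib.request.urlopen(build_chat_request("192.168.1.30", "status?")))
# resp["response"] holds the LLM answer when resp["status"] == "ok"
```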
17. Queue priority system
Priority Source Trigger
──────────────────────────────────────────────────────────────
85 LLM Advisor llm_orchestrator.advise()
82 LLM Autonomous _run_autonomous_cycle() via run_action tool
80 External MCP _impl_run_action() via MCP client or chat
50 Normal / manual queue_action() without explicit priority
40 Scheduler action_scheduler evaluates triggers
The scheduler always processes the highest-priority pending item first. LLM and MCP actions therefore preempt scheduler actions.
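The "highest priority first" rule amounts to a priority-then-FIFO ordering over the queue. The in-memory SQLite demo below illustrates the ordering with the tiers above; the table and column names are illustrative, not Bjorn's actual schema.

```python
# Demo of priority-ordered dequeue: highest priority first, FIFO within a tier.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY, action TEXT, priority INTEGER)")
rows = [("NmapScan", 40), ("ManualScan", 50), ("SSHBruteforce", 85), ("SMBScan", 82)]
db.executemany("INSERT INTO queue (action, priority) VALUES (?, ?)", rows)

# Advisor (85) preempts autonomous (82), which preempts normal (50) and scheduler (40)
order = [r[0] for r in db.execute(
    "SELECT action FROM queue ORDER BY priority DESC, id ASC")]
# order == ["SSHBruteforce", "SMBScan", "ManualScan", "NmapScan"]
```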
18. Fallbacks & graceful degradation
| Condition | Behaviour |
|---|---|
| llm_enabled = False | complete() returns None immediately — zero overhead |
| llm_orchestrator_mode = "none" | LLMOrchestrator not instantiated |
| mcp not installed | _build_mcp_server() returns None, WARNING log |
| zeroconf not installed | LaRuche discovery silently disabled, DEBUG log |
| LaRuche node timeout | Exception caught, cascade to next backend |
| Ollama not running | URLError caught, cascade to API |
| API key missing | _call_api() returns None, cascade |
| All backends fail | complete() returns None |
| LLM returns None for EPD | comment.py uses _pick_text() (original behaviour) |
| LLM advisor: invalid JSON | DEBUG log, returns None, next cycle |
| LLM advisor: disallowed action | WARNING log, ignored |
| LLM autonomous: no change | cycle skipped, zero API call |
| LLM autonomous: ≥6 tool turns | returns partial text + warning |
| Exception in LLM Bridge | try/except at every level, DEBUG log |
Timeouts
Chat / complete() → llm_timeout_s (default: 30s)
EPD comments → 8s (hardcoded, short to avoid blocking render)
Autonomous cycle → 90s (long: may chain multiple tool calls)
Advisor → 20s (short prompt + JSON response)
19. Call sequences
Web chat with tool-calling
Browser → POST /api/llm/chat {"message": "which hosts are vulnerable?"}
└── LLMUtils.handle_chat(data)
└── LLMBridge().chat(message, session_id)
└── complete(messages, system, tools=_BJORN_TOOLS)
└── _call_anthropic(messages, tools=[...])
├── POST /v1/messages → stop_reason=tool_use
│ └── tool: get_hosts(alive_only=true)
│ → _execute_tool → _impl_get_hosts()
│ → JSON of hosts
├── POST /v1/messages [+ tool result] → end_turn
└── returns "3 exposed SSH hosts: 192.168.1.10, ..."
← {"status": "ok", "response": "3 exposed SSH hosts..."}
LLM autonomous cycle
Thread "LLMOrchestrator" (daemon, interval=60s)
└── _run_autonomous_cycle()
├── fp = _compute_fingerprint() → (12, 3, 1, 47)
├── _has_actionable_change(fp) → True (vuln_count 2→3)
├── self._last_fingerprint = fp
│
└── LLMBridge().complete(prompt, system, tools=[read-only + run_action])
└── _call_anthropic(tools=[...])
├── POST → tool_use: get_hosts()
│ → [{ip: "192.168.1.20", ports: "22,80,443"}]
├── POST → tool_use: get_action_history()
│ → [...]
├── POST → tool_use: run_action("SSHBruteforce", "192.168.1.20")
│ → _execute_tool → _impl_run_action()
│ → db.queue_action(priority=82, trigger="llm_autonomous")
│ → queue_event.set()
└── POST → end_turn
→ "Queued SSHBruteforce on 192.168.1.20 (Mjolnir strikes the unguarded gate)"
→ [if log_reasoning=True] logger.info("[LLM_ORCH_REASONING]...")
→ [if log_reasoning=True] _push_to_chat(bridge, prompt, response)
Reading reasoning from chat.html
User clicks "Orch Log"
└── fetch GET /api/llm/reasoning
└── LLMUtils.get_llm_reasoning(handler)
└── LLMBridge()._chat_histories["llm_orchestrator"]
→ [{"role": "user", "content": "[Autonomous cycle]..."},
{"role": "assistant", "content": "Queued SSHBruteforce..."}]
← {"status": "ok", "messages": [...], "count": 2}
→ Rendered as chat bubbles in #messages
MCP from external client (Claude Desktop)
Claude Desktop → tool_call: run_action("NmapScan", "192.168.1.0/24")
└── FastMCP dispatch
└── mcp_server.run_action(action_name, target_ip)
└── _impl_run_action()
├── db.queue_action(priority=80, trigger="mcp")
└── queue_event.set()
← {"status": "queued", "action": "NmapScan", "target": "192.168.1.0/24", "priority": 80}
EPD comment with LLM
display.py → CommentAI.get_comment("SSHBruteforce", params={...})
└── delay elapsed OR status changed → proceed
├── llm_comments_enabled = True ?
│ └── LLMBridge().generate_comment("SSHBruteforce", params)
│ └── complete([{role:user, content:"Status: SSHBruteforce..."}],
│ max_tokens=80, timeout=8)
│ ├── LaRuche → "Norse gods smell SSH credentials..." ✓
│ └── [or timeout 8s] → None
└── text = None → _pick_text("SSHBruteforce", lang, params)
└── SELECT FROM comments WHERE status='SSHBruteforce'
→ "Processing authentication attempts..."