Sokrates Delivery Architecture
Summary
The Sókrates delivery architecture describes how the platform — an on-premises AI appliance providing automated data integration, a living knowledge graph, and continuous operational intelligence — is deployed, operated, and evolved at customer sites. This page covers the operational lifecycle from hardware provisioning through onboarding, steady-state operation, and cross-fleet learning. For the underlying system design, see Technical Architecture Whitepaper. For the commercial model, see Sokrates Product Bundles (Cowork, Code, Compound).
1. Delivery Model
Sókrates is delivered as a managed on-premises subscription. The customer receives a physical appliance that sits behind their firewall. Sókrates provides the hardware, the software stack, continuous model improvements, automatic schema healing, and fleet-wide intelligence updates. The customer gets the output of a dedicated AI and data engineering team without the payroll.
The delivery progresses through three service tiers — Cowork (augment existing staff), Code (expand capabilities), and Compound (deploy autonomous agents) — each representing a phase transition in how the business uses AI. See Sokrates Product Bundles (Cowork, Code, Compound) for the commercial rationale.
2. Hardware
Coordination Tier: CWWK N305
All deployments use the CWWK 4-LAN N305 fanless mini-servers (Intel i3-N305, 8C/8T up to 3.80 GHz, DDR5 up to 32 GB, 4× Intel i226-V 2.5GbE LAN, M.2 NVMe, 15 W TDP). These run the coordination stack: Hermes Agent, a local Eidos instance, Neo4j, MCP servers, and route complex inference to cloud providers via OpenRouter. See CWWK 4-LAN N305 (Sokrates Box) for full specifications.
Inference Tier: NVIDIA DGX Spark
The local inference appliance is the NVIDIA DGX Spark — a desktop-format AI computer powered by the GB10 Grace Blackwell Superchip. It provides 128 GB of unified coherent memory (shared CPU/GPU via NVLink-C2C, no PCIe bottleneck) and 1 PFLOP of FP4 AI performance.
This hardware enables full local inference:
- Gemma 4 31B Dense in bf16 occupies ~62 GB, leaving 66 GB for inference context, KV cache, Neo4j, and the rest of the stack.
- Client-specific LoRA fine-tuning is feasible on-device — the unified memory accommodates base weights, adapter weights, optimiser states, and gradient buffers.
- Continuous local inference — the Sókrates agent runs 24/7 without routing to external APIs for core reasoning tasks.
- Fine-tuned variants planned for vertical-specific deployments.
The DGX Spark retails at $5,000.
On pilot/dev deployments without DGX Spark, the CWWK N305 handles coordination while complex reasoning routes to cloud providers via OpenRouter.
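The memory budget above is simple arithmetic and can be sanity-checked in a few lines. This sketch uses decimal GB (as the figures in the text do) and the stated 31B parameter count; the numbers are illustrative, not measured.

```python
# Back-of-envelope memory budget for a 31B-parameter dense model in bf16
# on the DGX Spark's 128 GB unified memory. Decimal GB throughout, to
# match the figures quoted in the text.

PARAMS = 31e9          # parameter count of the dense model
BYTES_PER_PARAM = 2    # bf16 stores each parameter in 2 bytes
TOTAL_GB = 128         # DGX Spark unified coherent memory

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9   # base weights footprint
headroom_gb = TOTAL_GB - weights_gb           # KV cache, Neo4j, rest of stack

print(f"weights:  {weights_gb:.0f} GB")   # -> 62 GB
print(f"headroom: {headroom_gb:.0f} GB")  # -> 66 GB
```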
Fleet Command (Future)
A central DGX Station GB300 (Blackwell Ultra, 784 GB coherent memory) serves as fleet command — running larger models for cross-fleet learning, producing improved LoRA adapters, and distributing Basis updates. Edge boxes route complex inference to fleet command when local capacity is insufficient.
3. Operating System and Deployment
The entire software stack runs on NixOS, providing reproducible builds, declarative system configuration, and atomic rollbacks. The system is defined in a Nix flake — a customer box can be rebuilt from the flake definition to an identical state.
Two NixOS configurations exist:
- sokrates-dev — Development hardware (GMKtec), open internet, full development tooling. Used internally. See sokrates-dev.
- sokrates-box — Production appliance, locked-down egress whitelist, fleet management. See Sókrates Box NixOS Image.
All services are containerised via Docker. The host NixOS system reserves /var/lib/sokrates/ for persistent data, secrets, and MCP configurations. Images are built via Nix flakes and do not rely on host-side volume mounts for application code.
4. Software Stack
The deployed stack comprises four primary components, each with a distinct operational role:
4.1 Hermes Agent — Communication and Channel I/O
Hermes is the communication layer. It handles channel I/O across Slack, Microsoft Teams, Email, Telegram, Discord, and WhatsApp. It manages message routing, user interaction, and the agentic execution loop — but it never touches customer system credentials.
Hermes runs as a NixOS systemd service and maintains persistent state through three cognitive files (SOUL.md, USER.md, MEMORY.md) injected into its system prompt. It supports skill generation from successful task trajectories and delegates complex work to ephemeral subagents via delegate_task.
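The cognitive-file injection described above can be sketched in a few lines. This is an illustrative assembly function, not the Hermes implementation; the file names come from the text, while the directory layout, section headers, and function name are assumptions.

```python
from pathlib import Path

# The three cognitive files named in the text; their on-disk location
# and the prompt layout below are assumptions for illustration.
COGNITIVE_FILES = ["SOUL.md", "USER.md", "MEMORY.md"]

def build_system_prompt(state_dir: Path, base_prompt: str) -> str:
    """Concatenate the base prompt with each cognitive file that exists,
    so persistent state travels into every model invocation."""
    sections = [base_prompt]
    for name in COGNITIVE_FILES:
        path = state_dir / name
        if path.exists():
            sections.append(f"## {name}\n{path.read_text().strip()}")
    return "\n\n".join(sections)
```

Missing files are simply skipped, so a freshly provisioned box with an empty MEMORY.md directory still produces a valid prompt.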
See Hermes Agent Framework for the full cognitive architecture.
4.2 Eidos — Knowledge Graph and Operational Memory
Eidos is the knowledge graph — a FastAPI service backed by Neo4j with Voyage AI contextual embeddings, exposed via MCP. It stores and reasons over the customer’s operational topology: entities, processes, constraints, and observations, all typed through the Hyle ontological framework.
Eidos is seeded during onboarding and grows autonomously as the Sókrates agent maps the customer’s operations. A Curator Agent performs daily maintenance — consolidating duplicates, flagging contradictions, and proposing merges for human approval.
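The Curator's consolidation pass can be illustrated with a minimal sketch. The node shape, the normalisation rule, and the function name are assumptions; the key design point from the text is preserved: duplicates are proposed for merging, not merged automatically.

```python
from collections import defaultdict

# Toy duplicate-detection pass in the spirit of the Curator Agent.
# Nodes are plain dicts here; the real Eidos schema is typed via Hyle.
def propose_merges(nodes: list[dict]) -> list[tuple[str, str]]:
    """Bucket nodes by (type, normalised name) and emit (canonical, duplicate)
    pairs as proposals for human approval."""
    buckets: dict[tuple[str, str], list[str]] = defaultdict(list)
    for n in nodes:
        key = (n["type"], n["name"].strip().lower())
        buckets[key].append(n["id"])
    proposals: list[tuple[str, str]] = []
    for ids in buckets.values():
        canonical, *dupes = ids
        proposals.extend((canonical, d) for d in dupes)
    return proposals
```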
4.3 The Sókrates Agent — Continuous Operational Intelligence
On DGX Spark deployments, a fine-tuned Gemma 4 31B Dense model runs continuously as the Sókrates agent — a multi-mode orchestration system that cycles through four operational modes:
- Socratic Interrogation — examining the knowledge graph for structural anomalies, incomplete ontology regions, and cross-source entity resolution opportunities.
- Topology Mapping — evaluating materialised hyperedges on schedule and virtual hyperedges on trigger, updating the organisational model.
- Inefficiency Surfacing — authoring new generating queries that capture discovered patterns, encoding operational insights as living facts in the graph.
- Validation — verifying that new generating queries terminate, produce non-empty results, and introduce no cycles in the metalayer dependency graph.
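The four-mode cycle above can be sketched as a simple round-robin loop. The mode names come from the text; the scheduling policy (strict rotation, fixed order) and the `step` callback are assumptions for illustration — the real orchestrator may weight or interleave modes differently.

```python
from itertools import cycle
from typing import Callable

# Mode names as listed in the text; strict round-robin order is assumed.
MODES = [
    "socratic_interrogation",
    "topology_mapping",
    "inefficiency_surfacing",
    "validation",
]

def run_cycles(step: Callable[[str], None], n_cycles: int = 1) -> list[str]:
    """Invoke `step(mode)` once per mode, for `n_cycles` full passes,
    and return the execution order."""
    executed = []
    for mode, _ in zip(cycle(MODES), range(n_cycles * len(MODES))):
        step(mode)
        executed.append(mode)
    return executed
```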
On coordination-tier hardware (CWWK N305) without a DGX Spark, complex reasoning is routed to cloud providers via OpenRouter. The Sókrates agent’s continuous inference capability scales with the hardware tier.
4.4 The Hypergraph Metalayer
The metalayer computes organisational topology as living hyperedges — generating queries whose result sets change whenever the underlying data changes. This is Datalog semantics over a graph database: ground facts (Hyle nodes), derived facts (generating queries), and compositional rules (metalayer expressions) evaluated to a fixed point.
Hyperedges can be materialised (cached, refreshed on schedule — suitable for stable structures like org charts) or virtual (computed on demand — suitable for active bottlenecks and in-flight anomalies). Differential evaluation ensures that when a few input facts change, only the affected derived facts are recomputed.
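The materialised/virtual distinction can be captured in a small sketch: a materialised hyperedge caches its result set and refreshes on a schedule, while a virtual one recomputes on every evaluation. The class shape and TTL-based refresh policy are simplifications of the metalayer design, not its implementation.

```python
import time
from typing import Callable

class Hyperedge:
    """Minimal sketch: a generating query plus an evaluation policy."""

    def __init__(self, query: Callable[[], set], materialised: bool,
                 ttl_seconds: float = 3600.0):
        self.query = query
        self.materialised = materialised
        self.ttl = ttl_seconds
        self._cache: set | None = None
        self._cached_at = 0.0

    def evaluate(self) -> set:
        if not self.materialised:
            return self.query()          # virtual: computed on demand
        if self._cache is None or time.monotonic() - self._cached_at > self.ttl:
            self._cache = self.query()   # materialised: refreshed on schedule
            self._cached_at = time.monotonic()
        return self._cache
```

Stable structures (org charts) suit long TTLs; in-flight anomalies suit the virtual path, where staleness is never acceptable.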
See Technical Architecture Whitepaper, §4 for the full metalayer design.
5. Onboarding: The Zero-Integration Pipeline
Customer onboarding follows the DMCG (Datamodel Code Generator) pipeline described in the Technical Architecture Whitepaper, §2. The operational sequence:
- Connect — The customer’s operational systems (ERP, CRM, project management) expose OpenAPI specifications. Sókrates reads these specs directly.
- Generate — DMCG produces typed Pydantic v2 models with --base-class hyle.BaseNode. Every generated class inherits the metaclass, the registry, the query builder, and persistence methods automatically.
- Register — HyleMeta fires during class creation, auto-registering the node type. Eidos receives an observer callback, creates Neo4j constraints and indexes, and updates its schema cache.
- Enrich — The Gemma 4 model (or Claude on pilot hardware) performs semantic field enrichment, ontological classification, and cross-source entity resolution.
- Activate — The customer’s data is immediately queryable and persistable in the knowledge graph. No manual mapping. No integration sprint.
When a customer’s API schema changes — a field added, a type modified, an entity removed — the pipeline regenerates automatically. This is self-healing at the schema level: the knowledge graph’s type system evolves in lockstep with the source systems.
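The Register step's mechanics can be illustrated with a stripped-down metaclass. This is a sketch in the spirit of HyleMeta, not its implementation: the real class wires up Neo4j constraints and Eidos schema-cache updates, which are stubbed here as observer callbacks; all names below are illustrative.

```python
# Simplified metaclass-driven auto-registration: defining a class is the
# act that registers it, so regenerated models re-register themselves
# automatically when the DMCG pipeline reruns.
NODE_REGISTRY: dict[str, type] = {}
OBSERVERS: list = []  # e.g. Eidos creating constraints/indexes

class HyleMetaSketch(type):
    def __new__(mcls, name, bases, ns):
        cls = super().__new__(mcls, name, bases, ns)
        if bases:  # skip the abstract base itself
            NODE_REGISTRY[name] = cls
            for observer in OBSERVERS:
                observer(cls)
        return cls

class BaseNode(metaclass=HyleMetaSketch):
    pass

# A generated model registers itself at class-creation time:
class Invoice(BaseNode):
    pass
```

Because registration happens at class creation, a schema change that regenerates the models also refreshes the registry with no separate migration step.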
6. Self-Evolution
The Sókrates agent does not merely run inference — it improves itself through the Hermes self-evolution harness (DSPy + GEPA):
- A skill, prompt, or generating query with measurable performance is selected as a target.
- An evaluation dataset is mined from real session history or synthetically generated.
- The target is wrapped as a DSPy module.
- GEPA (Genetic-Pareto reflective prompt evolution) reads execution traces to understand why things fail and proposes targeted improvements. It works with as few as 3 examples and requires no GPU training.
- The optimised version is evaluated against baseline with statistical significance checks.
- Deployment via git commit, with optional A/B testing and rollback via git revert.
Sókrates extends this beyond the Hermes baseline with metalayer query evolution (generating queries that surface genuine inefficiencies are selected for; noisy ones are eliminated) and cognitive antibodies (execution traces analysed for confabulation signatures and circular reasoning patterns, which become negative examples for future generations).
7. The Basis — Cross-Fleet Intelligence
While customer data never crosses customer boundaries, structural intelligence compounds across the fleet:
- Common schema patterns (ERP entities, CRM structures, HR hierarchies) are recognised faster with each deployment.
- Metalayer query templates that surfaced bottlenecks at one company apply to others.
- LoRA adapters are refined with each deployment’s training signal.
- The time from “box arrives” to “first useful insight” decreases monotonically.
The Basis is a structured knowledge base of deployment principles classified as Tentative, Confirmed, Conditional, or Refuted. The Sókrates agent at a new engagement starts with “generation-zero” knowledge of common industry failure modes and optimisation patterns, consulted via API from fleet command.
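A Basis entry and its status lifecycle can be sketched minimally. The four statuses come from the text; the allowed transitions (e.g. Refuted as terminal) and the class shape are assumptions for illustration.

```python
from dataclasses import dataclass

# Assumed transition rules: every status is revisable except Refuted.
TRANSITIONS: dict[str, set[str]] = {
    "Tentative": {"Confirmed", "Conditional", "Refuted"},
    "Conditional": {"Confirmed", "Refuted"},
    "Confirmed": {"Conditional", "Refuted"},
    "Refuted": set(),
}

@dataclass
class Principle:
    """One deployment principle in the Basis knowledge base."""
    text: str
    status: str = "Tentative"  # generation-zero knowledge starts tentative

    def transition(self, new_status: str) -> None:
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"{self.status} -> {new_status} not allowed")
        self.status = new_status
```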
Data sovereignty is an architectural invariant, not a bolt-on. The NixOS egress whitelist and nftables security boundary enforce it at the OS level. What flows to fleet command (with explicit client consent) is pattern intelligence — “schemas with these characteristics tend to have these ontological patterns” — not row-level data.
8. Security Architecture
The deployed stack enforces strict separation between channel I/O and customer data:
| Component | Credentials | Network Access |
|---|---|---|
| Hermes Agent | Channel tokens (Slack, Telegram, etc.) | Internet (channel APIs only) |
| Eidos + Sókrates Agent | Customer system credentials (encrypted via sops-nix) | Customer LAN + Sókrates API only |
| Neo4j | Local auth | Localhost only |
Enforcement is at the OS level via nftables — this is network isolation, not application-level access control. Hermes cannot reach customer system credentials. Eidos cannot reach the open internet. See Hermes Agent Security Model and Sokrates Permission Model and Hermes Agent Privileges.
Additional hardening:
- Hermes runs under systemd with NoNewPrivileges, ProtectSystem=strict, and scoped ReadWritePaths.
- Tirith pre-execution scanning intercepts dangerous command patterns and requires operator approval.
- The Curator Agent’s proposed graph modifications are presented to a human via Telegram before execution.
9. Subscription Lifecycle
Onboarding: Hardware provisioned and shipped. NixOS image flashed. Customer selects which systems to connect (MCP server scoping) and which employees the agent may interact with. DMCG pipeline runs against connected APIs. Eidos seeded with existing documentation. Hermes configured for the customer’s channel topology.
Steady state: The Sókrates agent runs continuously — mapping topology, surfacing inefficiencies, authoring generating queries. Schema healing tracks source system changes automatically. The Curator Agent maintains graph hygiene daily. LoRA adapters refine over time. Basis consultations improve with each fleet-wide deployment.
Cancellation: The customer retains the physical box, the Eidos knowledge graph, the Hermes channel plumbing, and all MCP connectors. The Sókrates agent (the continuous intelligence layer) and Basis access are deactivated. The customer keeps their data and infrastructure; they lose the brain.
Related
- Technical Architecture Whitepaper
- Executive Summary — The Autonomous AI Department
- Sokrates Product Bundles (Cowork, Code, Compound)
- Sokrates Core Architecture and Service Layers
- CWWK 4-LAN N305 (Sokrates Box)
- Hermes Agent Framework
- Eidos Curator Agent and Hyle Ontological Framework
- Hyle Graph-Native ORM with Dynamic Schema Registry
- Hermes Agent Security Model
- Migration from OpenClaw to Hermes Agent