The Basis

Summary

The Basis is Sókrates’s accumulated body of operational principles: generalisable knowledge about how enterprises organise work, where workflows break down, and what interventions reliably improve them. It is Sókrates’s core competitive moat. It was seeded once from curated sources and grows continuously as the Sókrates Agent operates across customer deployments. The Basis is never stored on customer hardware; the agent’s intelligence layer consults it via API during reasoning.

What the Basis Contains

The Basis is a tiered, vectorised knowledge system of operational principles. Each principle is a generalisable statement about enterprise behaviour with associated evidence, confidence, and domain scope.
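To make the shape of a single principle concrete, here is a minimal sketch of one record as a Python dataclass. The field names (`statement`, `confidence`, `domains`, `conditions`, `evidence`) and the `Tier` enum are hypothetical illustrations of "evidence, confidence, and domain scope", not the Basis’s actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Tier(Enum):
    # One member per confidence tier (names are illustrative).
    CONFIRMED = "confirmed"
    TENTATIVE = "tentative"
    CONDITIONAL = "conditional"
    FORCED_SEED = "forced_seed"
    REFUTED = "refuted"


@dataclass
class Evidence:
    source: str        # hypothetical: a corpus document or an anonymised deployment
    domain: str        # hypothetical domain label, e.g. "erp" or "hr"
    excerpt: str = ""  # supporting text, if any


@dataclass
class Principle:
    statement: str                                          # the generalisable claim
    tier: Tier
    confidence: float                                       # calibrated score in [0, 1]
    domains: set[str] = field(default_factory=set)          # domain scope
    conditions: list[str] = field(default_factory=list)     # non-empty for Conditional
    evidence: list[Evidence] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject out-of-range confidence values as soon as a record is built.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
```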

Confidence Tiers

Tier | Criteria | Role
Confirmed | 3+ independent sources across 2+ domains | High-confidence defaults; the agent applies these without hesitation
Tentative | Single domain or limited evidence | Bulk of the seed; applied with monitoring, promoted on validation
Conditional | Context-dependent, with explicit conditions | Applied only when the conditions match the customer’s situation
Forced Seeds | Injected hypotheses without corpus evidence | Sókrates-specific methodology (Socratic elicitation, depth-first escalation, composable skill chains)
Refuted / Anti-patterns | Known failure modes and contradictions | Negative knowledge; prevents the agent from repeating known mistakes
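The tier criteria can be read as a small decision rule. The sketch below is one hypothetical encoding: `assign_tier` and its arguments are illustrative, distinct source identifiers stand in for "independent sources", and the order of the checks (refutation first, then forced seeds, then conditions, then evidence breadth) is an assumption rather than the documented procedure.

```python
def assign_tier(
    evidence: list[tuple[str, str]],  # (source_id, domain) pairs
    conditions: list[str],            # explicit applicability conditions
    forced: bool = False,             # injected by hand, not mined from the corpus
    contradicted: bool = False,       # known failure mode or refuted claim
) -> str:
    """Hypothetical mapping from a principle's evidence profile to a Basis tier."""
    if contradicted:
        return "refuted"  # negative knowledge is checked first
    if forced and not evidence:
        return "forced_seed"  # hypothesis without corpus evidence
    if conditions:
        return "conditional"  # only valid when its conditions hold
    sources = {source for source, _ in evidence}
    domains = {domain for _, domain in evidence}
    if len(sources) >= 3 and len(domains) >= 2:
        return "confirmed"  # 3+ independent sources across 2+ domains
    return "tentative"  # single domain or limited evidence
```

In this encoding a forced seed that later gathers corpus evidence falls through to the ordinary checks, which is one way "promoted on validation" could be realised.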

Two Knowledge Sources

1. Genesis Engine (Seed)

The Basis Genesis Engine (genesis-workspace/) is a 5-stage extraction pipeline that produces the generation-0 (gen-0) seed:

  1. Corpus collection: 1,565 quality-filtered documents from 14 repositories — enterprise process docs (Dynamics 365, SharePoint), agent frameworks (Anthropic Cookbook, Google ADK), organisational handbooks (SOPs, HR manuals, Basecamp handbook), and the Sókrates autoresearch specification.
  2. Principle extraction: Batched NLP scoring identifies principle-bearing sentences (imperative language, optimisation keywords, failure modes, conditions). ~19,000 raw principles extracted.
  3. Semantic clustering: Voyage-4-large embeddings (1024-dim) and DBSCAN clustering group semantically equivalent principles, yielding 2,108 clusters from 19,054 principles. DBSCAN labels 59% of the principles as noise and leaves them unclustered; this aggressive filtering is intentional.
  4. Pruning and tiering: Heuristic classification sorts clusters into include, quarantine, and discard bins, and tiers are assigned based on source diversity and confidence. Implementation details and platform-specific friction are quarantined for potential future use.
  5. Meta-review and calibration: Adversarial quality check — assess coverage gaps, calibrate confidence, inject forced seeds, document contradictions.
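Stage 2 scores sentences on cues such as imperative language, optimisation keywords, failure modes, and conditions. A minimal keyword-based sketch of that idea follows; the regular expressions, weights, threshold, and batch size are all hypothetical, and the pipeline’s actual NLP scoring may use entirely different features.

```python
import re

# Hypothetical cue patterns and weights for the four signal types named in stage 2.
CUES = {
    "imperative": (re.compile(r"^(always|never|avoid|ensure|prefer|use|do not|don't)\b", re.I), 2.0),
    "optimisation": (re.compile(r"\b(reduce|improve|faster|efficien\w*|optimi[sz]\w*|streamline)\b", re.I), 1.0),
    "failure": (re.compile(r"\b(fail\w*|break\w*|bottleneck\w*|delay\w*|error\w*|friction)\b", re.I), 1.0),
    "condition": (re.compile(r"\b(if|when|unless)\b", re.I), 0.5),
}


def score_sentence(sentence: str) -> float:
    """Sum the weights of every cue type that appears in the sentence."""
    text = sentence.strip()
    return sum(weight for pattern, weight in CUES.values() if pattern.search(text))


def extract_principles(sentences: list[str], threshold: float = 2.0, batch_size: int = 256) -> list[str]:
    """Score sentences batch by batch and keep those at or above the threshold."""
    kept: list[str] = []
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        kept.extend(s for s in batch if score_sentence(s) >= threshold)
    return kept
```

A sentence such as "Avoid manual re-entry when approvals fail, to reduce delays." triggers all four cue types, while descriptive sentences with no cues score zero and are dropped.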

The gen-0 seed contains ~54 calibrated principles. Quarantined clusters (100+) form a recovery queue for future extensions.
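Stage 3 can be illustrated with scikit-learn’s `DBSCAN` on cosine distance. In this sketch synthetic vectors stand in for Voyage-4-large embeddings (64 dimensions instead of 1024, to keep it fast), and `eps` and `min_samples` are hypothetical values, since the pipeline’s actual parameters are not given here. DBSCAN labels points that fall in no dense region `-1`, so the noise fraction is simply the share of those labels.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def cluster_principles(embeddings: np.ndarray, eps: float = 0.1, min_samples: int = 5):
    """Cluster principle embeddings with DBSCAN on cosine distance.

    Returns the per-principle labels, the number of clusters, and the noise fraction.
    """
    labels = DBSCAN(eps=eps, min_samples=min_samples, metric="cosine").fit_predict(embeddings)
    n_clusters = len(set(labels.tolist()) - {-1})  # -1 marks noise, not a cluster
    noise_fraction = float(np.mean(labels == -1))
    return labels, n_clusters, noise_fraction


# Synthetic stand-in for embeddings: three tight groups of near-duplicate
# "principles" (20 each) plus 15 unrelated vectors.
rng = np.random.default_rng(0)
centres = rng.normal(size=(3, 64))
grouped = np.repeat(centres, 20, axis=0) + 0.05 * rng.normal(size=(60, 64))
scattered = rng.normal(size=(15, 64))
embeddings = np.vstack([grouped, scattered])

labels, n_clusters, noise_fraction = cluster_principles(embeddings)
print(n_clusters, noise_fraction)
```

On this synthetic data the three tight groups come out as clusters and the 15 scattered vectors are labelled noise. Raising `eps` or lowering `min_samples` makes clustering more permissive; a high noise share like the one reported above is the result of choosing strict values.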

2. Runtime Discovery (Growth)

As the Sókrates Agent (via Hermes) works on customer knowledge graphs through Eidos, it discovers patterns about organisational topology that generalise beyond the individual customer:

  • Workflow structures that recur across companies of similar size and sector
  • Common friction points in cross-departmental handoffs
  • Patterns in how Icelandic SMEs organise around ERP systems (Dynamics, Navision)
  • Failure modes in AI adoption that repeat across deployments
  • Effective intervention patterns that transfer between customers

These discoveries are abstracted, anonymised, and promoted into the Basis when they meet the tiering criteria. The longer the fleet operates, the richer the Basis becomes; this tenure-driven compounding is what makes the service more valuable over time.
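One way to picture the promotion step: group abstracted observations by pattern and count how many distinct deployments and sectors support each one. The sketch below assumes deployments play the role of "independent sources" and sectors the role of "domains", so it reuses the Confirmed threshold (3+ sources across 2+ domains). The `Observation` type, its fields, `promotable`, and that mapping are all hypothetical, and `deployment_id` is assumed to be an opaque identifier that carries no customer name.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    pattern: str        # abstracted description, free of customer specifics
    deployment_id: str  # hypothetical opaque identifier; never a customer name
    sector: str         # e.g. "fisheries", "retail"


def promotable(
    observations: list[Observation], min_deployments: int = 3, min_sectors: int = 2
) -> list[str]:
    """Return patterns seen in enough distinct deployments and sectors (hypothetical rule)."""
    by_pattern: dict[str, list[Observation]] = defaultdict(list)
    for obs in observations:
        by_pattern[obs.pattern].append(obs)
    promoted = []
    for pattern, group in by_pattern.items():
        deployments = {o.deployment_id for o in group}  # repeats from one deployment count once
        sectors = {o.sector for o in group}
        if len(deployments) >= min_deployments and len(sectors) >= min_sectors:
            promoted.append(pattern)
    return sorted(promoted)
```

Counting distinct deployments rather than raw observations keeps one noisy customer from promoting a pattern on its own.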

Format

The Basis exists in multiple complementary representations:

  • Markdown (basis.md): Human-readable tiered principles for review and calibration
  • Vectorised embeddings (.npz): Dense vectors for semantic search and similarity-based retrieval during agent reasoning
  • Structured JSON: Clustered principles with metadata (source, domain, confidence, conditions) for programmatic access
  • Quarantine queue: Low-confidence or platform-specific principles held for future promotion
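A minimal sketch of keeping the vector and structured representations in sync, assuming each principle in the JSON carries an `id` and the .npz file stores the matching IDs alongside the embedding matrix. The `save_basis`/`load_basis` helpers, the file prefix, and the `ids`/`embeddings` array names are hypothetical; only `numpy.savez`, `numpy.load`, and the `json` module are real APIs.

```python
import json

import numpy as np


def save_basis(prefix: str, principles: list[dict], embeddings: np.ndarray) -> None:
    """Write metadata to <prefix>.json and row-aligned vectors to <prefix>.npz."""
    ids = np.array([p["id"] for p in principles])  # unicode array, so no pickling is needed
    np.savez(f"{prefix}.npz", ids=ids, embeddings=embeddings.astype(np.float32))
    with open(f"{prefix}.json", "w", encoding="utf-8") as f:
        json.dump(principles, f, ensure_ascii=False, indent=2)


def load_basis(prefix: str) -> tuple[list[dict], np.ndarray]:
    """Load both files and return principles ordered to match the embedding rows."""
    with np.load(f"{prefix}.npz") as data:
        ids = data["ids"].tolist()
        embeddings = data["embeddings"]
    with open(f"{prefix}.json", encoding="utf-8") as f:
        by_id = {p["id"]: p for p in json.load(f)}
    return [by_id[i] for i in ids], embeddings
```

Joining on IDs rather than relying on list position means the JSON can be reordered or edited during review without silently misaligning vectors and metadata.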

How the Agent Consults the Basis

The Sókrates Agent does not reason from scratch for every customer interaction. When encountering a new situation — a workflow to map, a friction point to diagnose, a skill to build — the agent queries the Basis for relevant principles:

  1. The situation is embedded with the same Voyage model used to build the Basis, so queries and principles share one vector space
  2. Nearest-neighbour search retrieves applicable principles from the Basis
  3. Confirmed principles are applied directly; Tentative principles are applied with monitoring
  4. Conditional principles are evaluated against the customer’s context
  5. Anti-patterns are checked to avoid known failure modes
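The consultation steps can be sketched as a cosine-similarity nearest-neighbour lookup followed by tier-specific handling. This is a hypothetical illustration: the `retrieve` function, the dictionary shape of each principle, representing conditions as required key/value pairs in the customer context, and brute-force search (a production system would more likely use a vector index) are all assumptions. Tiers not mentioned in the steps, such as forced seeds, are simply skipped here.

```python
import numpy as np


def retrieve(query_vec, basis_vecs, principles, k=5, context=None):
    """Hypothetical consult step: top-k cosine neighbours, then tier-specific handling."""
    q = query_vec / np.linalg.norm(query_vec)
    basis = basis_vecs / np.linalg.norm(basis_vecs, axis=1, keepdims=True)
    sims = basis @ q                          # cosine similarity per principle
    top = np.argsort(-sims)[:k]               # brute-force nearest neighbours
    context = context or {}
    plan = {"apply": [], "monitor": [], "avoid": []}
    for i in top:
        p = principles[i]
        if p["tier"] == "confirmed":
            plan["apply"].append(p["id"])     # applied directly
        elif p["tier"] == "tentative":
            plan["monitor"].append(p["id"])   # applied, with outcomes monitored
        elif p["tier"] == "conditional":
            conditions = p.get("conditions", {})
            if all(context.get(key) == value for key, value in conditions.items()):
                plan["apply"].append(p["id"])  # conditions match this customer
        elif p["tier"] == "refuted":
            plan["avoid"].append(p["id"])     # known failure mode to steer clear of
        # Other tiers (e.g. forced seeds) are not covered by the steps and are skipped.
    return plan
```

For example, with a context of `{"erp": "dynamics"}`, a Conditional principle requiring that ERP moves into `apply`; with an empty context it is left out of the plan.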

This reduces redundant reasoning, accelerates onboarding in established verticals, and ensures consistency across the fleet.

Why the Basis Is Defensible

The base bundle (pre-built Skill.md workflows, generic connectors, onboarding materials) is replicable by any competent AI implementation firm within weeks. The Basis is not.

The Basis requires:

  • The curated corpus and extraction pipeline (engineering investment)
  • The vectorisation and clustering infrastructure
  • The calibration and meta-review process (domain expertise)
  • Most critically: the fleet-wide operational data from real deployments that continuously enriches it

A competitor starting today gets the open-source frameworks. They do not get the accumulated operational knowledge from months of real enterprise deployments across the Icelandic SME market. The only way to acquire the Basis without growing it is to acquire the company that grew it.