Hyle: Graph-Native ORM with Dynamic Schema Registry

Summary

Hyle is the proposed data substrate layer for Eidos — the matter to Eidos’s form, borrowing Aristotelian terminology intentionally. It is a graph-native ORM built on top of Pydantic v2 and datamodel-code-generator (DMCG), using a metaclass + decorator architecture to provide a living, dynamic registry of BaseNode subclasses that can be hot-loaded from code-generated models at runtime. The design emerged from extracting and adapting the schema ETL pipeline originally built for Microsoft Fabric Lakehouses.

Details

Conceptual Foundation

The name pairing is deliberate and precise: Eidos (the knowledge graph — form, relationships, meaning) and Hyle (the schema substrate — matter, structure, the stuff that receives form). In Aristotelian terms, raw schemas are formless until the knowledge graph gives them relational structure. The architectural goal is a system where org topology is not modeled statically but computed continuously from two input streams: machine-discovered operational data from ERP systems (via OpenAPI specs) and bespoke ontology elements designed by Sokrates’s Socratic Workflow Archaeologist.

Node Registry Architecture

The core design challenge is a discriminated union registry that must be a living registry rather than a static type annotation — because DMCG regenerates models when schemas evolve, and those models must be hot-swappable in a running process.

The architecture uses two layers with a clean separation of concerns:

HyleMeta (metaclass, extending Pydantic’s ModelMetaclass): Handles structural concerns — intercepting class creation before finalization, enforcing the node contract (must have a node_type Literal field), injecting query/repository methods, and auto-registering concrete node types. The metaclass gives fail-at-construction semantics rather than fail-at-first-use, which is critical for a system loading code-generated classes into a running process.
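The fail-at-construction behaviour can be illustrated with a dependency-free sketch. It assumes a plain `type` subclass as a stand-in for the real HyleMeta (which would extend Pydantic's ModelMetaclass); the names `HyleMetaSketch`, `REGISTERED`, and the `abstract` class keyword are hypothetical and not part of the design above.

```python
from typing import Literal, get_args, get_origin, get_type_hints

# Hypothetical module-level registry used only by this sketch.
REGISTERED: dict = {}


class HyleMetaSketch(type):
    """Stand-in for HyleMeta, built on plain `type` to stay dependency-free."""

    def __new__(mcs, name, bases, namespace, abstract=False, **kwargs):
        cls = super().__new__(mcs, name, bases, namespace, **kwargs)
        if abstract:
            return cls  # abstract bases are exempt from the node contract
        hint = get_type_hints(cls).get("node_type")
        # Enforce the contract: node_type must be a single-value Literal.
        if get_origin(hint) is not Literal or len(get_args(hint)) != 1:
            raise TypeError(f"{name} must declare node_type as a single-value Literal")
        REGISTERED[get_args(hint)[0]] = cls  # auto-register the concrete type
        return cls


class BaseNode(metaclass=HyleMetaSketch, abstract=True):
    pass


class Customer(BaseNode):
    node_type: Literal["customer"] = "customer"
    name: str = ""
```

Defining a subclass of `BaseNode` without a `node_type` Literal raises `TypeError` at class-definition time, so a malformed generated module would fail while loading rather than on first query.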

@node decorator: Handles semantic concerns — graph label, indexes, constraints, repository binding, schema version, and source provenance (discovered vs designed). Splitting structural and semantic concerns across metaclass and decorator prevents the two from colliding and keeps each layer’s responsibilities legible.
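A minimal sketch of how the decorator might attach semantic metadata without touching class construction. The `NodeMeta` dataclass, the `__hyle_meta__` attribute, and the exact keyword names are assumptions made for illustration; a plain class stands in for a BaseNode subclass.

```python
from dataclasses import dataclass

VALID_SOURCES = ("discovered", "designed")


@dataclass(frozen=True)
class NodeMeta:
    """Hypothetical container for the semantic concerns listed above."""
    label: str
    source: str
    schema_version: int = 1
    indexes: tuple = ()
    unique: tuple = ()


def node(*, label, source, schema_version=1, indexes=(), unique=()):
    """Return a class decorator that records graph metadata on the class."""
    if source not in VALID_SOURCES:
        raise ValueError(f"source must be one of {VALID_SOURCES}, got {source!r}")
    meta = NodeMeta(label, source, schema_version, tuple(indexes), tuple(unique))

    def decorate(cls):
        cls.__hyle_meta__ = meta  # attribute name is an assumption of this sketch
        return cls

    return decorate


@node(label="Customer", source="discovered", schema_version=2, indexes=("email",))
class Customer:
    node_type = "customer"
```

Because the decorator runs after the class already exists, it never competes with the metaclass for control of class creation, which is the separation the paragraph above describes.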

NodeRegistry: A standalone, injectable class (not entangled in the metaclass) that holds _types (latest version per node type), _versions (full version history), and _hooks (observer callbacks). The observer pattern on on_register is how Eidos learns about Hyle schema changes — when Hyle registers a new node type, Eidos can automatically create Neo4j constraints, update its schema cache, and trigger relationship inference.
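The registry and its observer hook could look roughly like the following. The attribute names `_types`, `_versions`, and `_hooks` and the `on_register` hook come from the description above; the `register` and `get` signatures, integer versions, and the demo classes are assumptions.

```python
from collections import defaultdict
from typing import Callable, Optional

Hook = Callable[[str, int, type], None]


class NodeRegistry:
    def __init__(self) -> None:
        self._types: dict = {}                  # latest version per node type
        self._versions: dict = defaultdict(dict)  # node type -> {version: class}
        self._hooks: list = []                  # observer callbacks

    def on_register(self, hook: Hook) -> Hook:
        """Subscribe a callback; usable as a decorator."""
        self._hooks.append(hook)
        return hook

    def register(self, node_type: str, version: int, cls: type) -> None:
        history = self._versions[node_type]
        history[version] = cls
        self._types[node_type] = history[max(history)]  # highest version wins
        for hook in self._hooks:
            hook(node_type, version, cls)

    def get(self, node_type: str, version: Optional[int] = None) -> type:
        if version is None:
            return self._types[node_type]
        return self._versions[node_type][version]


registry = NodeRegistry()
events = []


@registry.on_register
def notify_eidos(node_type, version, cls):
    # Stand-in for Eidos creating constraints or refreshing its schema cache.
    events.append((node_type, version))


class CustomerV1: ...
class CustomerV2: ...

registry.register("customer", 1, CustomerV1)
registry.register("customer", 2, CustomerV2)
```

Keeping the registry as an ordinary injectable object means tests can construct a fresh one per case instead of sharing global metaclass state.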

The union_type property on NodeRegistry dynamically builds the discriminated union from whatever is currently registered, making it compatible with Pydantic’s Field(discriminator="node_type") deserialization path.
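As an illustration, a dynamically built discriminated union might be sketched as below. This assumes Pydantic v2 is installed and uses plain BaseModel subclasses in place of BaseNode; the `register` and `parse` helpers and the example models are hypothetical.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter


class Customer(BaseModel):
    node_type: Literal["customer"] = "customer"
    name: str


class Invoice(BaseModel):
    node_type: Literal["invoice"] = "invoice"
    total: float


class Supplier(BaseModel):
    node_type: Literal["supplier"] = "supplier"
    name: str


class NodeRegistry:
    def __init__(self) -> None:
        self._types: dict = {}

    def register(self, cls) -> None:
        # The discriminator tag is read from the field's default value.
        self._types[cls.model_fields["node_type"].default] = cls

    @property
    def union_type(self):
        members = tuple(self._types.values())
        if len(members) == 1:
            return members[0]  # a discriminator needs at least two variants
        # Rebuilt on every access, so it reflects whatever is registered now.
        return Annotated[Union[members], Field(discriminator="node_type")]

    def parse(self, payload: dict):
        return TypeAdapter(self.union_type).validate_python(payload)


registry = NodeRegistry()
registry.register(Customer)
registry.register(Invoice)
invoice = registry.parse({"node_type": "invoice", "total": 3.5})

registry.register(Supplier)  # later registrations widen the union
supplier = registry.parse({"node_type": "supplier", "name": "Acme"})
```

Building a new `TypeAdapter` on every call is the simplest option for a sketch; a real registry would probably cache the adapter and invalidate it on registration.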

Schema Evolution and Hot-Swapping

Schema healing loads regenerated models through importlib rather than raw exec, for four reasons: files on disk provide an audit trail, proper __module__ attributes aid debugging, importlib.reload() supports hot-swapping a module in place, and sys.modules namespacing enables version isolation. The pattern is to version-namespace module names (hyle.generated.customer_v2) and remove the old module from sys.modules before loading the new one, so the import cache cannot serve a stale version. The metaclass fires during exec_module, so registration happens as part of class construction rather than as a separate step.
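The loading pattern can be sketched with the standard importlib machinery. The helper name `load_generated`, its `supersedes` parameter, and the throwaway files written to a temporary directory are assumptions for illustration; only the module-naming scheme comes from the text above.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path
from typing import Optional


def load_generated(module_name: str, path: str, supersedes: Optional[str] = None):
    """Load a generated module from disk under a version-namespaced name."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {module_name} from {path}")
    module = importlib.util.module_from_spec(spec)
    if supersedes is not None:
        sys.modules.pop(supersedes, None)  # drop the stale version first
    sys.modules[module_name] = module
    try:
        spec.loader.exec_module(module)  # class bodies (and metaclasses) run here
    except BaseException:
        sys.modules.pop(module_name, None)  # do not leave a half-loaded module
        raise
    return module


with tempfile.TemporaryDirectory() as tmp:
    v1 = Path(tmp) / "customer_v1.py"
    v2 = Path(tmp) / "customer_v2.py"
    v1.write_text("class Customer:\n    fields = ('name',)\n")
    v2.write_text("class Customer:\n    fields = ('name', 'email')\n")

    old = load_generated("hyle.generated.customer_v1", str(v1))
    new = load_generated(
        "hyle.generated.customer_v2", str(v2),
        supersedes="hyle.generated.customer_v1",
    )
```

After the second call, only the v2 module remains in `sys.modules`, and `new.Customer.__module__` carries the versioned name, which is what makes tracebacks and logs attributable to a specific schema version.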

Dual Input Channels

Hyle accepts two structurally different inputs into the same registry with different trust profiles and evolution policies:

  • OpenAPI → DMCG → BaseNode subclass: Pre-validated, machine-discovered schemas from customer ERP systems. DMCG’s --base-class flag points at hyle.BaseNode, so generated models inherit the metaclass, registry, and query builder automatically. These heal automatically when the OpenAPI spec changes.
  • Sokrates-designed ontology elements: Hand-crafted nodes encoding org topology relationships that no single ERP system knows about — workflow bottlenecks, cross-system dependencies, organizational pathologies discovered by the Workflow Archaeologist. These require deliberate schema changes because their semantics are load-bearing in a way machine-discovered schemas are not.

The source field on the @node decorator ("discovered" vs "designed") encodes this policy difference at the class level.

Hypergraph Semantics and Datalog

A subsequent insight extends the design toward hypergraph semantics: a hyperedge in Eidos does not merely have a generating query, nor is it merely produced by one; it is its generating query. The membership of a hyperedge (which nodes it connects) is defined by executing a standing query, so the hyperedge is live and continuously recomputed rather than snapshotted. This maps directly to Datalog semantics: the knowledge graph is a Datalog program, and the org topology is its minimal model.

The resulting structure is stratified into three layers:

  • Layer 0 (Hyle/matter): Raw BaseNode instances from OpenAPI-discovered ERP schemas — ground facts.
  • Layer 1 (generating queries): Hyperedges as standing queries whose result sets are their membership — re-executed when underlying data changes.
  • Layer 2 (metalayer DSL): Compositions of Layer 1 queries, analogous to CTEs, where higher-order hyperedges reference lower-order ones by name.
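The three layers can be illustrated with a toy evaluation loop. Everything here is a hypothetical stand-in: nodes are plain dicts rather than BaseNode instances, queries are Python functions rather than GQL, and the edge names are invented for the example.

```python
# Layer 0: ground facts (plain dicts standing in for BaseNode instances).
nodes = [
    {"id": "u1", "node_type": "employee", "team": "ap"},
    {"id": "u2", "node_type": "employee", "team": "ap"},
    {"id": "u3", "node_type": "employee", "team": "sales"},
    {"id": "i1", "node_type": "invoice", "approver": "u1", "status": "blocked"},
]


# Layer 1: generating queries over ground facts.
def ap_team(nodes, edges):
    return frozenset(n["id"] for n in nodes if n.get("team") == "ap")


def blocked_approvers(nodes, edges):
    return frozenset(n["approver"] for n in nodes if n.get("status") == "blocked")


# Layer 2: a composition that refers to lower-order hyperedges by name.
def ap_bottleneck(nodes, edges):
    return edges["ap_team"] & edges["blocked_approvers"]


# Definitions are assumed to be listed in dependency order.
DEFINITIONS = [
    ("ap_team", ap_team),
    ("blocked_approvers", blocked_approvers),
    ("ap_bottleneck", ap_bottleneck),
]


def evaluate(nodes, definitions=DEFINITIONS):
    """Recompute every hyperedge's membership from the current facts."""
    edges = {}
    for name, query in definitions:
        edges[name] = query(nodes, edges)
    return edges


before = evaluate(nodes)
nodes.append({"id": "i2", "node_type": "invoice", "approver": "u2", "status": "blocked"})
after = evaluate(nodes)  # membership changes with the data, no reconciliation step
```

Re-running `evaluate` after the new invoice arrives moves `u2` into `ap_bottleneck` without any code that explicitly updates the edge, which is the sense in which the hyperedge is its query.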

The metalayer DSL needs stratification to prevent circular hyperedge dependencies (analogous to Datalog’s stratified negation). Materialization policy (MATERIALIZED vs VIRTUAL on hyperedge definitions) is a first-class concern: stable hyperedges get materialized for fast reads; actively evolving org structures stay virtual for always-fresh results.
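One way to sketch the stratification check and the materialization policy uses the standard library's graphlib. Note that this sketch rejects every dependency cycle, which is stricter than Datalog stratification (Datalog permits recursion that does not pass through negation). The `Policy` enum values mirror the MATERIALIZED and VIRTUAL keywords above; `stratify`, `EdgeStore`, and the edge names are hypothetical.

```python
from enum import Enum
from graphlib import CycleError, TopologicalSorter


class Policy(Enum):
    MATERIALIZED = "materialized"
    VIRTUAL = "virtual"


def stratify(dependencies: dict) -> list:
    """Return an evaluation order, or raise if hyperedges depend on each other circularly."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        raise ValueError(f"circular hyperedge dependency: {exc.args[1]}") from exc


class EdgeStore:
    """Caches MATERIALIZED hyperedges; always recomputes VIRTUAL ones."""

    def __init__(self) -> None:
        self._cache: dict = {}

    def read(self, name, policy, compute):
        if policy is Policy.VIRTUAL:
            return compute()
        if name not in self._cache:
            self._cache[name] = compute()
        return self._cache[name]

    def invalidate(self, name) -> None:
        self._cache.pop(name, None)


# Each hyperedge maps to the set of hyperedges it references by name.
order = stratify({
    "ap_bottleneck": {"ap_team", "blocked_approvers"},
    "ap_team": set(),
    "blocked_approvers": set(),
})

calls = []
store = EdgeStore()


def compute():
    calls.append(1)
    return frozenset({"u1"})


store.read("ap_team", Policy.MATERIALIZED, compute)
store.read("ap_team", Policy.MATERIALIZED, compute)  # served from cache
store.read("ap_team", Policy.VIRTUAL, compute)       # recomputed
```

A materialized edge stays stale until `invalidate` is called, so a real implementation would need to tie invalidation to changes in the underlying facts.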

Self-healing is definitional rather than mechanical under this model — re-evaluating generating queries produces the correct graph state without a separate reconciliation process.

Query Language: GQL over Cypher

The recommended compilation target for Hyle’s query builder is GQL (ISO/IEC 39075:2024) rather than Cypher directly. GQL is the first new ISO database language since SQL was standardized in 1987 and is effectively Cypher’s heir: Neo4j and AWS jointly endorsed it, and Cypher is converging toward it. Critically, Microsoft Fabric has native GQL support, so a single query builder AST gives Hyle both Neo4j compatibility and Fabric portability. The architecture has two stages: a method-chaining query builder produces a GQL AST, and dialect-specific drivers serialize that AST to Cypher (Neo4j), native GQL (Fabric), or future implementations.
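The builder-plus-serializer split might be sketched as follows. The AST shape, the `Query` builder API, and the `SERIALIZERS` table are all assumptions; only a Cypher serializer for a tiny MATCH/WHERE/RETURN subset is shown, and a native GQL serializer would be registered in the same table.

```python
from dataclasses import dataclass, replace
from typing import Optional

ALLOWED_OPS = ("=", "<>", "<", "<=", ">", ">=")


@dataclass(frozen=True)
class Predicate:
    prop: str
    op: str
    param: str  # name of a query parameter, never an inlined value


@dataclass(frozen=True)
class MatchQuery:
    label: str
    alias: str = "n"
    predicates: tuple = ()
    limit: Optional[int] = None


def to_cypher(ast: MatchQuery) -> str:
    parts = [f"MATCH ({ast.alias}:{ast.label})"]
    if ast.predicates:
        conds = " AND ".join(
            f"{ast.alias}.{p.prop} {p.op} ${p.param}" for p in ast.predicates
        )
        parts.append(f"WHERE {conds}")
    parts.append(f"RETURN {ast.alias}")
    if ast.limit is not None:
        parts.append(f"LIMIT {ast.limit}")
    return " ".join(parts)


# Dialect table: further serializers plug in here without touching the builder.
SERIALIZERS = {"cypher": to_cypher}


class Query:
    """Method-chaining builder; each call returns a new immutable query."""

    def __init__(self, ast: MatchQuery) -> None:
        self.ast = ast

    @classmethod
    def match(cls, label: str, alias: str = "n") -> "Query":
        return cls(MatchQuery(label, alias))

    def where(self, prop: str, op: str, param: str) -> "Query":
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op!r}")
        preds = self.ast.predicates + (Predicate(prop, op, param),)
        return Query(replace(self.ast, predicates=preds))

    def limit(self, n: int) -> "Query":
        return Query(replace(self.ast, limit=n))

    def compile(self, dialect: str = "cypher") -> str:
        return SERIALIZERS[dialect](self.ast)


text = Query.match("Customer", "c").where("region", "=", "region").limit(10).compile()
# text == "MATCH (c:Customer) WHERE c.region = $region RETURN c LIMIT 10"
```

Keeping values out of the AST and referring to them only by parameter name leaves parameter binding to the driver, which avoids string-escaping concerns in the serializers.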

Related Entities

  • Eidos: The knowledge graph layer that Hyle serves as substrate for.
  • Socratic Workflow Archaeologist: The agent that discovers org topology and feeds bespoke ontology elements into Hyle.
  • datamodel-code-generator: DMCG, the code generation tool that produces BaseNode subclasses from OpenAPI/JSON schemas.
  • NodeRegistry: The living discriminated union registry central to Hyle’s hot-swap architecture.
  • GQL (ISO/IEC 39075): The query language target for Hyle’s query builder.
  • Neo4j: The graph database backend for Eidos/Hyle in production.
  • Microsoft Fabric: The Lakehouse platform where the original schema ETL pipeline was built, informing Hyle’s design.
  • Hermes Agent: The agent operating at the metalayer, writing new generating queries that extend the org topology model.