Claude Session History and Context Synthesis

Summary

The process of identifying, extracting, and normalizing project history from Claude Code and Claude.ai to build the Sokrates knowledge graph. This workflow involves locating local session transcripts, exporting cloud-based conversations, and preparing them for multi-agent processing into structured markdown.

Details

The Sokrates project takes a systematic approach to capturing developer intent and architectural decisions by harvesting session data from the Claude ecosystem. This data serves as the primary source for the project’s internal wiki and Neo4j-backed knowledge graph.

Claude Code Local Storage

Claude Code stores session history locally on the developer’s machine. The primary storage location is within the user’s home directory under ~/.claude/. The structure is organized by project path: the absolute path to the project root is slugified (path separators replaced with hyphens) to create a unique directory name.

For the Sokrates project located at /home/rationallyprime/projects/sokrates, the transcripts are stored at: ~/.claude/projects/-home-rationallyprime-projects-sokrates/
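Based on the example above, the slug appears to be the absolute project path with each path separator replaced by a hyphen. A minimal sketch of that mapping (an assumption inferred from the directory name, not Claude Code’s documented algorithm):

```python
def project_slug(project_path: str) -> str:
    """Derive the Claude Code storage directory name from a project path.

    Assumption: the slug is the absolute path with '/' replaced by '-'.
    The exact rules for other characters (dots, underscores) may differ.
    """
    return project_path.replace("/", "-")

# The Sokrates project path maps to its transcript directory name:
slug = project_slug("/home/rationallyprime/projects/sokrates")
# → "-home-rationallyprime-projects-sokrates"
```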

The storage schema includes:

  • <uuid>.jsonl files: Individual session transcripts stored in JSON Lines format, where each line represents a single message or tool interaction.
  • <uuid>/ directories: Folders containing auxiliary session data, such as task lists, plans, and intermediate agent states.
  • memory/ directory: A project-specific location for the Claude Code auto-memory system, which persists across sessions but is distinct from the raw transcripts.
  • ~/.claude/history.jsonl: A global index file that tracks session metadata across all projects on the machine.
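Because each line of a `<uuid>.jsonl` transcript is a standalone JSON object, the files can be read line by line without loading a whole session into a parser at once. A minimal reader sketch (the helper name and strictness are choices for illustration; field names inside each record are not assumed here):

```python
import json
from pathlib import Path

def read_transcript(path: Path) -> list[dict]:
    """Read a JSON Lines session transcript, one JSON object per line.

    Skips blank lines; raises on malformed JSON so a corrupt
    transcript is caught early rather than silently truncated.
    """
    records = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```

The same approach works for ~/.claude/history.jsonl, since the global index uses the identical one-object-per-line format.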

Context Extraction Workflow

The synthesis process follows a multi-step pipeline to consolidate disparate data sources:

  1. Local Extraction: Session .jsonl files are copied from the ~/.claude directory into a dedicated, git-ignored docs/ directory within the project repository.
  2. Cloud Export: Conversations from the Claude.ai web interface are exported in JSON format. JSON is preferred over Markdown or plain text for these exports to maintain parity with the local .jsonl structure, ensuring easier parsing by downstream agents.
  3. Normalization and Processing: The combined dataset (local transcripts and cloud exports) is processed using a “team of agents” approach. This means using high-reasoning models (such as Claude 3.5 Sonnet) for synthesis and high-throughput models (such as Gemini 1.5 Flash) for initial data cleaning and chunking.
  4. Wiki Generation: The processed data is transformed into structured Markdown files. These files are initially organized within an Obsidian vault to visualize the knowledge graph before being finalized as a wiki for the Sokrates team.
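Step 1 of the pipeline above can be sketched as a small copy script. The source and destination paths are assumptions based on the locations named earlier in this article; adjust them to the actual machine and repository layout:

```python
import shutil
from pathlib import Path

# Assumed locations, per the storage layout described above.
CLAUDE_PROJECT_DIR = (
    Path.home() / ".claude" / "projects" / "-home-rationallyprime-projects-sokrates"
)
DEST = Path("docs") / "sessions"  # a dedicated, git-ignored directory in the repo

def extract_local_sessions(src: Path = CLAUDE_PROJECT_DIR, dest: Path = DEST) -> list[Path]:
    """Copy session .jsonl transcripts into the repository (pipeline step 1)."""
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for transcript in sorted(src.glob("*.jsonl")):
        target = dest / transcript.name
        shutil.copy2(transcript, target)  # preserve timestamps for later ordering
        copied.append(target)
    return copied
```

Keeping the copies inside a git-ignored docs/ directory makes them available to downstream agents without committing raw transcripts to version control.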

This process ensures that “ephemeral” developer conversations are converted into “persistent” organizational knowledge, adhering to the project’s goal of creating an AI-augmented operations department.