ETL Architecture Evaluation: connectwise-etl Selected as Canonical Foundation

Summary

A structured evaluation of three candidate ETL repositories (PSA, connectwise-etl, and BC_ETL) set out to identify the best PySpark/Delta Lake schema generation pipeline to serve as a canonical foundation. BC_ETL could not be accessed, so the actual comparison was between PSA and connectwise-etl. connectwise-etl was selected for its rigorous hexagonal architecture and clean protocol-based dependency injection. The local copy was then synced to origin/main, which was 4 commits ahead.

Details

Repositories Evaluated

Three candidate repos were identified for parallel analysis:

  • connectwise-etl: already present locally at /home/rationallyprime/projects/connectwise-etl/
  • PSA: cloned to /tmp/sokrates-scratchpad-hunt/ for evaluation
  • BC_ETL: inaccessible, because the wiselausnir org enforces SAML SSO and the GitHub CLI token would need re-authorization; excluded from the comparison

Architecture Comparison

| Dimension | PSA | connectwise-etl |
| --- | --- | --- |
| Architecture | uv workspace monorepo (unified-etl-core + per-integration packages) | Hexagonal, with 6 pure @runtime_checkable protocols |
| Protocols / DI | SchemaConverter protocol, factory registry, decorator composition | IntegrationPluginProtocol, DataFetcherProtocol, BronzeProcessorProtocol, etc.; full DI wiring via ETLRunner |
| Multi-integration | ConnectWise, Business Central, and Crayon, all as workspace packages | ConnectWise only, but the plugin contract supports adding more |
| Schema Generation | Registry-based auto-detection (OpenAPI, CDM, JSON) with a post-processing pipeline | Manual datamodel-codegen invocation |
| Config | Frozen Pydantic models, fail-fast, YAML-driven dimensions | LakehouseConfig with Unity Catalog support, RuntimeContext |
| Observability | structlog | Logfire + structlog (OpenTelemetry) |
| Error Handling | 6-layer error codes, decorator-based | Same pattern, plus typed ValidationErrorDetails, APIErrorDetails, etc. |
| Type Safety | basedpyright strict, full type hints | Full type hints, TypeAlias for domain types, TYPE_CHECKING guards |
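The "Schema Generation" and "Protocols / DI" rows contrast PSA's registry-based auto-detection with connectwise-etl's manual datamodel-codegen step. The sketch below shows what a registry-plus-protocol converter of that kind can look like. It is an illustration under assumptions, not PSA's code: SchemaConverter is the protocol name from the table, but its methods, the register decorator, the converter classes, and generate_fields are all hypothetical. Only OpenAPI and plain JSON Schema detection are shown (CDM is omitted), and the "post-processing pipeline" is reduced to a single de-duplication step.

```python
from typing import Any, Callable, Protocol, runtime_checkable

Spec = dict[str, Any]  # a parsed schema document (hypothetical alias)


@runtime_checkable
class SchemaConverter(Protocol):
    """Hypothetical contract: recognise a spec format, then flatten it into fields."""

    def can_handle(self, spec: Spec) -> bool: ...

    def convert(self, spec: Spec) -> list[tuple[str, str]]: ...


# Factory registry, consulted in registration order.
_REGISTRY: list[Callable[[], SchemaConverter]] = []


def register(cls: type) -> type:
    """Class decorator that adds a converter factory to the registry."""
    _REGISTRY.append(cls)
    return cls


@register
class OpenAPIConverter:
    def can_handle(self, spec: Spec) -> bool:
        # OpenAPI 3 documents declare a top-level "openapi" version string.
        return "openapi" in spec

    def convert(self, spec: Spec) -> list[tuple[str, str]]:
        # Flatten every component schema's properties into (name, type) pairs.
        schemas = spec.get("components", {}).get("schemas", {})
        return [
            (name, prop.get("type", "string"))
            for schema in schemas.values()
            for name, prop in schema.get("properties", {}).items()
        ]


@register
class JSONSchemaConverter:
    def can_handle(self, spec: Spec) -> bool:
        # A bare JSON Schema object lists its fields under "properties".
        return "properties" in spec

    def convert(self, spec: Spec) -> list[tuple[str, str]]:
        return [(n, p.get("type", "string")) for n, p in spec["properties"].items()]


def detect_converter(spec: Spec) -> SchemaConverter:
    """Return the first registered converter that recognises the spec."""
    for factory in _REGISTRY:
        converter = factory()
        if converter.can_handle(spec):
            return converter
    raise ValueError("no registered converter recognises this spec")


def generate_fields(spec: Spec) -> list[tuple[str, str]]:
    """Detect, convert, then post-process (here: drop duplicate field names)."""
    seen: set[str] = set()
    fields: list[tuple[str, str]] = []
    for name, type_ in detect_converter(spec).convert(spec):
        if name not in seen:
            seen.add(name)
            fields.append((name, type_))
    return fields
```

Adding a format in this design means registering one more class; the caller never names a converter explicitly. connectwise-etl instead runs datamodel-codegen by hand, which is simpler but not self-detecting.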

Decision

connectwise-etl was chosen as the winner. The deciding factors were its more rigorous protocol-based hexagonal architecture — 6 distinct protocol files with @runtime_checkable, explicit DI wiring through ETLRunner, a Processors frozen dataclass container, and cleaner domain/infrastructure separation. PSA is the more feature-complete monorepo evolution covering multiple integrations, but connectwise-etl provides the cleaner, more modular foundation with stronger adherence to protocol-driven DI.

The PSA clone was deleted from the scratchpad, and BC_ETL was never cloned. The throwaway clone at /tmp/sokrates-scratchpad-hunt/connectwise-etl/ was also removed, since the canonical local copy already exists.

Post-Evaluation Sync

At the time of evaluation, the local connectwise-etl working tree was 4 commits behind origin/main and had 7 unstaged modified files (pre-protocol versions superseded by the remote commits). The remote carried these commits:

  • 188f680 — Update ConnectWise base URL to EU cloud
  • 31769c4 — Implement protocol-based ETL architecture with ConnectWise plugin
  • 8841602 — fix: Add IncrementalHandler and resolve type checking issues
  • fd76037 — feat: Add etl_core framework with protocol-based architecture

Local changes were stashed and a fast-forward pull was performed. The stash was then dropped without being reapplied, since the dirty files were pre-protocol versions superseded by the remote. The local copy is now canonical and up to date.
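The sync steps above can be sketched as a small Python script. This is a minimal sketch and not the exact commands that were run: plan_sync, has_tracked_changes, commits_behind, and sync are hypothetical helpers, the remote and branch default to origin and main, and the stash message is made up. The git subcommands and flags themselves (fetch, status --porcelain, rev-list --count, stash push/drop, pull --ff-only) are standard git.

```python
import subprocess


def plan_sync(remote: str = "origin", branch: str = "main",
              has_local_changes: bool = True) -> list[list[str]]:
    """Return the git commands for the sync, in order, without running them."""
    steps = [["git", "fetch", remote]]
    if has_local_changes:
        # Park the obsolete tracked edits so the fast-forward is not blocked.
        steps.append(["git", "stash", "push", "-m", "pre-sync local edits"])
    # --ff-only aborts instead of creating a merge commit if histories diverged.
    steps.append(["git", "pull", "--ff-only", remote, branch])
    if has_local_changes:
        # Discard the stash rather than popping it: the remote supersedes it.
        steps.append(["git", "stash", "drop"])
    return steps


def _git(repo: str, *args: str) -> str:
    """Run one git command inside repo and return its stdout."""
    result = subprocess.run(["git", *args], cwd=repo, check=True,
                            capture_output=True, text=True)
    return result.stdout


def has_tracked_changes(repo: str) -> bool:
    """True if tracked files are modified (untracked files are ignored)."""
    return bool(_git(repo, "status", "--porcelain", "--untracked-files=no").strip())


def commits_behind(repo: str, remote: str = "origin", branch: str = "main") -> int:
    """Commits on remote/branch that HEAD lacks (meaningful after a fetch)."""
    return int(_git(repo, "rev-list", "--count", f"HEAD..{remote}/{branch}").strip())


def sync(repo: str) -> None:
    """Execute the plan, stopping at the first failing command."""
    for argv in plan_sync(has_local_changes=has_tracked_changes(repo)):
        subprocess.run(argv, cwd=repo, check=True)
```

Keeping the plan separate from execution makes it possible to print or review the commands first. Using --ff-only means the pull fails loudly, rather than merging, if the local branch had diverged from origin/main.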
