ETL Architecture Evaluation: connectwise-etl Selected as Canonical Foundation
Summary
A structured evaluation was conducted across three ETL repositories — PSA, connectwise-etl, and BC_ETL — to identify the best PySpark/Delta Lake schema generation pipeline to use as a canonical foundation. connectwise-etl was selected for its rigorous hexagonal architecture and clean protocol-based dependency injection. The local copy was subsequently synced to origin/main, which was 4 commits ahead.
Details
Repositories Evaluated
Three candidate repos were identified for parallel analysis:
- connectwise-etl: already present locally at /home/rationallyprime/projects/connectwise-etl/
- PSA: cloned to /tmp/sokrates-scratchpad-hunt/ for evaluation
- BC_ETL: inaccessible; the wiselausnir org is behind SAML SSO, which requires GitHub CLI re-authorization. Excluded from the comparison.
Architecture Comparison
| Dimension | PSA | connectwise-etl |
|---|---|---|
| Architecture | uv workspace monorepo (unified-etl-core + per-integration packages) | Hexagonal with 6 pure @runtime_checkable protocols |
| Protocols/DI | SchemaConverter protocol, factory registry, decorator composition | IntegrationPluginProtocol, DataFetcherProtocol, BronzeProcessorProtocol, etc. — full DI wiring via ETLRunner |
| Multi-integration | ConnectWise, Business Central, Crayon — all as workspace packages | ConnectWise only, but plugin contract supports adding more |
| Schema Generation | Registry-based auto-detection (OpenAPI, CDM, JSON) with post-processing pipeline | Manual datamodel-codegen invocation |
| Config | Frozen Pydantic models, fail-fast, YAML-driven dimensions | LakehouseConfig with Unity Catalog support, RuntimeContext |
| Observability | structlog | Logfire + structlog (OpenTelemetry) |
| Error Handling | 6-layer error codes, decorator-based | Same pattern, plus typed ValidationErrorDetails, APIErrorDetails, etc. |
| Type Safety | basedpyright strict, full hints | Full hints, TypeAlias for domain types, TYPE_CHECKING guards |
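The "registry-based auto-detection" row can be illustrated with a small sketch. This is not PSA's actual code: the `register` decorator, the `detect_format` function, and the marker keys used to recognize each format are all assumptions chosen for illustration. It only shows the general shape of a decorator-populated registry that is probed in registration order.

```python
import json
from typing import Any, Callable

# Hypothetical registry: format name -> predicate over the parsed document.
_DETECTORS: dict[str, Callable[[Any], bool]] = {}


def register(fmt: str):
    """Decorator that stores a detector predicate under the given format name."""
    def wrap(fn: Callable[[Any], bool]) -> Callable[[Any], bool]:
        _DETECTORS[fmt] = fn
        return fn
    return wrap


@register("openapi")
def _is_openapi(doc: Any) -> bool:
    # Assumed marker: OpenAPI 3 uses "openapi", Swagger 2 uses "swagger".
    return isinstance(doc, dict) and ("openapi" in doc or "swagger" in doc)


@register("cdm")
def _is_cdm(doc: Any) -> bool:
    # Assumed marker key for CDM documents; adjust to the real files in use.
    return isinstance(doc, dict) and "jsonSchemaSemanticVersion" in doc


@register("json-schema")
def _is_json_schema(doc: Any) -> bool:
    # Assumed markers: a "$schema" URI or a top-level "properties" object.
    return isinstance(doc, dict) and ("$schema" in doc or "properties" in doc)


def detect_format(text: str) -> str:
    """Return the first registered format whose predicate accepts the document."""
    doc = json.loads(text)
    for fmt, predicate in _DETECTORS.items():  # dicts keep registration order
        if predicate(doc):
            return fmt
    raise ValueError("unrecognized schema format")
```

By contrast, the manual datamodel-codegen invocation in connectwise-etl leaves the choice of input format to whoever runs the command.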
Decision
connectwise-etl was chosen as the winner. The deciding factor was its more rigorous protocol-based hexagonal architecture: 6 distinct protocol files with @runtime_checkable, explicit DI wiring through ETLRunner, a Processors frozen dataclass container, and cleaner domain/infrastructure separation. PSA is the more feature-complete monorepo evolution and covers multiple integrations, but connectwise-etl provides the cleaner, more modular foundation and adheres more strictly to protocol-driven DI.
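A minimal sketch of the pattern the decision rests on is given here. The class names DataFetcherProtocol, BronzeProcessorProtocol, Processors, and ETLRunner come from the evaluation, but every method name, signature, and the two in-memory implementations are assumptions made for illustration; the repository's real interfaces may differ.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Protocol, runtime_checkable

Record = dict[str, Any]


@runtime_checkable
class DataFetcherProtocol(Protocol):
    # Assumed method name and signature.
    def fetch(self, entity: str) -> Iterable[Record]: ...


@runtime_checkable
class BronzeProcessorProtocol(Protocol):
    # Assumed method name and signature.
    def process(self, records: Iterable[Record]) -> list[Record]: ...


@dataclass(frozen=True)
class Processors:
    """Immutable container for the processors handed to the runner."""
    bronze: BronzeProcessorProtocol


class ETLRunner:
    """Receives its collaborators through the constructor (dependency injection)."""

    def __init__(self, fetcher: DataFetcherProtocol, processors: Processors) -> None:
        # runtime_checkable allows a structural isinstance check at wiring time.
        # It only verifies that the method exists, not its signature.
        if not isinstance(fetcher, DataFetcherProtocol):
            raise TypeError("fetcher does not satisfy DataFetcherProtocol")
        self._fetcher = fetcher
        self._processors = processors

    def run(self, entity: str) -> list[Record]:
        return self._processors.bronze.process(self._fetcher.fetch(entity))


class InMemoryFetcher:
    """Hypothetical adapter; satisfies the protocol without inheriting from it."""

    def __init__(self, rows: Iterable[Record]) -> None:
        self._rows = list(rows)

    def fetch(self, entity: str) -> Iterable[Record]:
        return [{**row, "entity": entity} for row in self._rows]


class TagBronze:
    """Hypothetical processor that marks each record with its layer."""

    def process(self, records: Iterable[Record]) -> list[Record]:
        return [{**record, "layer": "bronze"} for record in records]


runner = ETLRunner(InMemoryFetcher([{"id": 1}]), Processors(bronze=TagBronze()))
result = runner.run("tickets")
```

Because the protocols are structural, a new integration only has to supply objects with matching methods. The runner and the domain code do not import any concrete adapter.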
PSA was deleted from the scratchpad. BC_ETL was never cloned. The /tmp/sokrates-scratchpad-hunt/connectwise-etl/ throwaway clone was also cleaned up since the canonical local copy exists.
Post-Evaluation Sync
At the time of evaluation, the local connectwise-etl working tree was 4 commits behind origin/main and had 7 unstaged modified files (pre-architecture versions superseded by remote commits). The remote carried:
- 188f680 Update ConnectWise base URL to EU cloud
- 31769c4 Implement protocol-based ETL architecture with ConnectWise plugin
- 8841602 fix: Add IncrementalHandler and resolve type checking issues
- fd76037 feat: Add etl_core framework with protocol-based architecture
Local changes were stashed and a fast-forward pull was performed. The stash was then dropped without being reapplied, because the locally modified files were pre-protocol versions superseded by the remote commits. The local copy is now canonical and up to date.
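The sync can be approximated by the sketch here. The exact commands that were run are not recorded, so this sequence is an assumption built from standard git subcommands; `sync_commands` and `run_sync` are hypothetical helper names, and the stash message is arbitrary. The helper defaults to a dry run so that the sequence can be inspected before anything is executed.

```python
import subprocess


def sync_commands(remote: str = "origin", branch: str = "main") -> list[list[str]]:
    """Build the assumed git sequence: inspect, stash, fast-forward, discard stash."""
    return [
        ["git", "fetch", remote],
        # Counts commits reachable from the remote branch but not from HEAD.
        ["git", "rev-list", "--count", f"HEAD..{remote}/{branch}"],
        ["git", "stash", "push", "-m", "pre-sync local changes"],
        # --ff-only refuses to create a merge commit if histories have diverged.
        ["git", "pull", "--ff-only", remote, branch],
        # Discards the stashed changes; only safe when they are truly superseded.
        ["git", "stash", "drop"],
    ]


def run_sync(repo_dir: str, dry_run: bool = True) -> list[list[str]]:
    """Return the command list, executing it in repo_dir unless dry_run is set."""
    commands = sync_commands()
    if not dry_run:
        for command in commands:
            subprocess.run(command, cwd=repo_dir, check=True)
    return commands
```

Dropping the stash is irreversible in normal use, so reviewing it first with `git stash show -p` is a reasonable precaution.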
Related
- connectwise-etl repository
- PSA repository
- BC_ETL repository
- hexagonal architecture
- ETLRunner dependency injection
- LakehouseConfig and Unity Catalog
- protocol-based plugin contracts
- PySpark Delta Lake schema generation
- Logfire observability