Real Estate Valuation Model Correction and Agent Spread Refinement
Summary
This work identified and resolved a critical discrepancy in the real estate valuation pipeline: predictions were deflated by a factor of about 7.6 because training and inference used different price indices. In addition, the “Agent Effect” metric was redefined from a tax-assessment baseline to a model-residual baseline to eliminate structural market bias.
Details
Price Index Mismatch and Retraining
During system validation, it was observed that a property with a fasteignamat (official tax assessment) of 140.2M ISK was receiving a model prediction of only 18.3M ISK. Investigation revealed a mismatch between the training and inference stages regarding the Icelandic House Price Index (HPI).
The training code in src/residual_modeling/modeling/dataset.py referenced an old Hagstofa price index (base March 2000 = 100) stored at data/price_index.xlsx. The inference code and the database’s sale_price_real column, however, had been updated to use the newer HMS (Húsnæðis- og mannvirkjastofnun) index (data/kaupvisitala.csv), whose base is January 2024 = 100. The model therefore learned real prices expressed in March 2000 terms, which are far smaller than the same prices expressed in January 2024 terms, and reinflating those outputs with the new index at prediction time left them deeply deflated.
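The effect of mixing the two index series can be reproduced with a minimal sketch. The index levels are hypothetical: 760 is an illustrative old-base level chosen so the ratio matches the observed ~7.6x gap (not the published Hagstofa figure), the sale and prediction dates are assumed to coincide, and the deflate/reinflate helpers assume a simple price / index × 100 convention rather than the pipeline's actual code.

```python
# Illustrative index levels for a single date (hypothetical values).
OLD_INDEX_AT_SALE = 760.0        # old Hagstofa series, base March 2000 = 100
NEW_INDEX_AT_SALE = 100.0        # HMS kaupvisitala, base January 2024 = 100
NEW_INDEX_AT_PREDICTION = 100.0  # HMS level at the prediction date


def deflate(nominal_price: float, index_level: float) -> float:
    """Express a nominal price in real terms of the index's base period."""
    return nominal_price / index_level * 100.0


def reinflate(real_price: float, index_level: float) -> float:
    """Convert a real price back to nominal ISK at the given index level."""
    return real_price * index_level / 100.0


nominal = 140_000_000.0  # nominal sale price, ISK

# Consistent: the same index series on both sides round-trips the price.
consistent = reinflate(deflate(nominal, NEW_INDEX_AT_SALE), NEW_INDEX_AT_PREDICTION)

# Mismatched: target deflated with the old series, reinflated with the new one.
mismatched = reinflate(deflate(nominal, OLD_INDEX_AT_SALE), NEW_INDEX_AT_PREDICTION)

print(f"consistent: {consistent:,.0f} ISK")
print(f"mismatched: {mismatched:,.0f} ISK")
print(f"deflation factor: {consistent / mismatched:.2f}x")
```

Under these assumptions the deflation factor is simply the ratio of the two index levels, which is why the error was a near-constant multiplier rather than random noise.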
The Retrain Agent executed the following fixes:
- Updated _PRICE_INDEX_PATH in the dataset logic to point to data/kaupvisitala.csv.
- Fixed a SQL join bug in matched_dataset(), where agent_statistics was missing a join on the agents table.
- Retrained the four-layer residual model (L1 Structural, L2 Spatial, L3 Temporal, L4 Presentation).
- Verified the fix: the prediction for the test property (Vogatunga 83) moved from 18.3M ISK to 139.7M ISK, within 0.4% of the official assessment.
Agent Spread Metric Redefinition
A secondary architectural flaw was identified in how “Agent Effects” (the value an agent adds to a sale) were calculated.
Original (Broken) Logic:
The system previously calculated spread as sale_price / fasteignamat - 1. Because fasteignamat is a tax assessment that systematically lags behind market prices, almost all properties sell above this value. This resulted in every agent appearing to have a massive positive “effect” (averaging +20.3%), which was actually just a measurement of the market-wide gap between tax assessments and actual sale prices.
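A small sketch shows why the old metric credited every agent. It assumes, purely for illustration, that fasteignamat sits a uniform 17% below market value and that every sale closes at exactly market value, so no agent adds anything.

```python
# Hypothetical: fasteignamat is 83% of market value for every property.
ASSESSMENT_TO_MARKET = 0.83


def old_spread(sale_price: float, fasteignamat: float) -> float:
    """Original (biased) agent spread: sale price relative to tax assessment."""
    return sale_price / fasteignamat - 1


# Three hypothetical sales, each at exactly market value (zero agent effect).
market_values = [45e6, 80e6, 140e6]
spreads = [old_spread(v, v * ASSESSMENT_TO_MARKET) for v in market_values]

for value, s in zip(market_values, spreads):
    print(f"market value {value / 1e6:.0f}M ISK -> old spread {s:+.1%}")
```

Every sale gets the same spread of 1/0.83 - 1, about +20.5%, regardless of price, so under this assumption the metric measures the assessment lag rather than anything the agent did.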
Revised (Correct) Logic:
The metric was redefined to isolate the agent’s actual contribution by comparing the sale price to the model’s base prediction (L1+L2+L3). The new formula is:
spread = sale_price / model_L1L2L3_prediction - 1
This change ensures that structural (size, type), spatial (location), and temporal (market timing) factors are controlled for before the remaining difference is attributed to the agent.
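The revised formula can be sketched as follows. LayerPredictions is a hypothetical container, and summing the layer outputs in ISK is an assumption about how L1+L2+L3 combine; the L4 Presentation output is carried along but, as the formula specifies, left out of the baseline. The layer values are made up.

```python
from dataclasses import dataclass


@dataclass
class LayerPredictions:
    """Hypothetical per-layer outputs of the residual model, in ISK."""
    l1_structural: float
    l2_spatial: float
    l3_temporal: float
    l4_presentation: float

    def base_prediction(self) -> float:
        # Assumption: layers contribute additively; L4 is excluded by design.
        return self.l1_structural + self.l2_spatial + self.l3_temporal


def agent_spread(sale_price: float, layers: LayerPredictions) -> float:
    """Revised spread: sale price relative to the L1+L2+L3 base prediction."""
    return sale_price / layers.base_prediction() - 1


# Hypothetical layer outputs for one sale.
layers = LayerPredictions(
    l1_structural=120e6, l2_spatial=15e6, l3_temporal=5e6, l4_presentation=2e6
)
print(f"base prediction: {layers.base_prediction() / 1e6:.1f}M ISK")
print(f"spread: {agent_spread(144.2e6, layers):+.1%}")
```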
Implementation and Results
The fix was implemented in s10_stats.py via a new recompute_agent_spreads() function, which is now wired to run automatically after the training pipeline completes in layers.py.
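A minimal sketch of what recompute_agent_spreads() might compute, assuming each sale already carries its L1+L2+L3 base prediction. The Sale row shape, the placeholder agent names, and the choice to take market_avg_spread as the mean over all sales (rather than the mean of per-agent averages) are assumptions, not the code in s10_stats.py. Running it after training matters because the base predictions have to come from the freshly retrained model.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable, NamedTuple


class Sale(NamedTuple):
    """Hypothetical row shape; the real pipeline reads from the database."""
    agent: str
    sale_price: float       # ISK
    base_prediction: float  # L1+L2+L3 model prediction, ISK


def recompute_agent_spreads(sales: Iterable[Sale]) -> tuple[dict[str, float], float]:
    """Return each agent's mean spread and the market-wide mean spread."""
    per_agent: dict[str, list[float]] = defaultdict(list)
    for sale in sales:
        per_agent[sale.agent].append(sale.sale_price / sale.base_prediction - 1)

    agent_spreads = {agent: mean(vals) for agent, vals in per_agent.items()}
    # Assumption: market average is taken over all individual sales.
    all_spreads = [s for vals in per_agent.values() for s in vals]
    return agent_spreads, mean(all_spreads)


# Usage with made-up sales for two placeholder agents.
sales = [
    Sale("Agent A", 103e6, 100e6),
    Sale("Agent A", 51e6, 50e6),
    Sale("Agent B", 95e6, 100e6),
]
by_agent, market_avg_spread = recompute_agent_spreads(sales)
print(by_agent, f"market average: {market_avg_spread:+.1%}")
```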
The impact on agent metrics was significant:
- Helga Pálsdóttir: Her calculated spread dropped from +19.5% (erroneous) to +3.0% (realistic).
- Market Distribution: The spread distribution shifted from being entirely positive (0.97% to 203%) to being centered near zero (-6.9% to +23.9%), correctly identifying agents who underperform or overperform relative to the model’s baseline.
- Market Average: The hardcoded market_avg_spread was updated from 0.2034 to 0.034.
Related
- Eidos
- HMS
- Fasteignamat
- Residual Modeling Pipeline
- ISK
- Kaupvísitala