Real Estate Valuation Model Correction and Agent Spread Refinement

Summary

This work identified and resolved a critical discrepancy in the real estate valuation pipeline: predictions were deflated by a factor of roughly 7.6 because training and inference used different house price indices. In addition, the “Agent Effect” metric was redefined to measure sales against the model’s own prediction rather than the tax assessment. This removes a market-wide bias that had inflated every agent’s score.

Details

Price Index Mismatch and Retraining

During system validation, it was observed that a property with a fasteignamat (official tax assessment) of 140.2M ISK was receiving a model prediction of only 18.3M ISK. Investigation revealed a mismatch between the training and inference stages regarding the Icelandic House Price Index (HPI).

The training code in src/residual_modeling/modeling/dataset.py referenced an older Hagstofa price index (base March 2000 = 100) stored at data/price_index.xlsx. The inference code and the database’s sale_price_real column, however, had already been switched to the newer HMS (Húsnæðis- og mannvirkjastofnun) index (data/kaupvisitala.csv), whose base is January 2024 = 100. The model therefore learned prices deflated by the old index, whose recent levels are far above 100. At prediction time those values were reinflated with the new index, which sits near 100. Because the two indices use different base periods, this round trip shrank every prediction by roughly the ratio of their levels, leaving the outputs deeply deflated.
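The deflate/reinflate round trip can be sketched in a few lines of Python. The index levels below are assumptions chosen only so that their ratio matches the observed ~7.6x gap. They are not published Hagstofa or HMS figures, and `deflate`/`reinflate` are hypothetical helpers, not functions from dataset.py.

```python
# Hypothetical index levels for one sale month (NOT real published values).
# OLD_INDEX_LEVEL: old Hagstofa index, base March 2000 = 100 (assumed ~760).
# NEW_INDEX_LEVEL: HMS kaupvísitala, base January 2024 = 100 (assumed ~100).
OLD_INDEX_LEVEL = 760.0
NEW_INDEX_LEVEL = 100.0


def deflate(nominal_isk: float, index_level: float) -> float:
    """Express a nominal price in base-period ("real") units of an index."""
    return nominal_isk * 100.0 / index_level


def reinflate(real_value: float, index_level: float) -> float:
    """Convert a real value back to nominal ISK using an index level."""
    return real_value * index_level / 100.0


# Stand-in for a correctly learned price: the validation case's fasteignamat.
nominal_price = 140.2e6

# Consistent pipeline: the same index is used at training and inference.
consistent = reinflate(deflate(nominal_price, NEW_INDEX_LEVEL), NEW_INDEX_LEVEL)

# Mismatched pipeline: deflated with the old index, reinflated with the new one.
mismatched = reinflate(deflate(nominal_price, OLD_INDEX_LEVEL), NEW_INDEX_LEVEL)

# The shrinkage equals the ratio of the two index levels.
deflation_factor = nominal_price / mismatched

print(f"consistent: {consistent / 1e6:.1f}M ISK")
print(f"mismatched: {mismatched / 1e6:.1f}M ISK (factor {deflation_factor:.1f}x)")
```

Because the factor depends only on the ratio of the two index levels, every prediction is off by roughly the same multiple, to the extent the two indices move together, rather than by a noisy amount.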

The Retrain Agent executed the following fixes:

  1. Updated _PRICE_INDEX_PATH in the dataset logic to point to data/kaupvisitala.csv.
  2. Fixed a SQL join bug in matched_dataset() where agent_statistics was missing a join on the agents table.
  3. Retrained the four-layer residual model (L1 Structural, L2 Spatial, L3 Temporal, L4 Presentation).
  4. Verified the fix: the prediction for the test property (Vogatunga 83) rose from 18.3M ISK to 139.7M ISK, within 0.4% of its 140.2M ISK official assessment.
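A regression check like the one in step 4 could be automated. This is a minimal sketch: `relative_gap` and `is_plausible` are hypothetical helper names. The ±50% tolerance is an assumed threshold, loose enough to allow for fasteignamat lagging the market but tight enough to catch a multiple-fold error.

```python
def relative_gap(prediction_isk: float, assessment_isk: float) -> float:
    """Signed relative difference between a prediction and its fasteignamat."""
    return prediction_isk / assessment_isk - 1.0


def is_plausible(prediction_isk: float, assessment_isk: float,
                 tolerance: float = 0.5) -> bool:
    """Return False when a prediction differs from the assessment by more
    than `tolerance` (the 0.5 default is an assumption, not a project setting).
    """
    return abs(relative_gap(prediction_isk, assessment_isk)) <= tolerance


assessment = 140.2e6  # Vogatunga 83 fasteignamat
before_fix = 18.3e6
after_fix = 139.7e6

print(f"before: {relative_gap(before_fix, assessment):+.1%}, "
      f"plausible={is_plausible(before_fix, assessment)}")
print(f"after:  {relative_gap(after_fix, assessment):+.1%}, "
      f"plausible={is_plausible(after_fix, assessment)}")
```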

Agent Spread Metric Redefinition

A secondary architectural flaw was identified in how “Agent Effects” (the value an agent adds to a sale) were calculated.

Original (Broken) Logic: The system previously calculated spread as sale_price / fasteignamat - 1. Because fasteignamat is a tax assessment that systematically lags behind market prices, almost all properties sell above it. As a result, every agent appeared to have a large positive “effect” (averaging +20.3%). In reality, this figure measured only the market-wide gap between tax assessments and sale prices.

Revised (Correct) Logic: The metric was redefined to isolate the agent’s actual contribution by comparing the sale price to the model’s base prediction (L1+L2+L3). The new formula is:

  spread = sale_price / model_L1L2L3_prediction - 1

Because this baseline already accounts for structural (size, type), spatial (location), and temporal (market timing) factors, only the residual left after those effects is attributed to the agent.
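The difference between the two definitions shows up on a single sale. The sketch below is illustrative only. `LayerOutputs` is a hypothetical container that assumes the layer outputs are additive ISK contributions. The “L1+L2+L3” notation suggests this, but the pipeline may instead combine layers in log space. All numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class LayerOutputs:
    """Hypothetical per-sale layer contributions, assumed additive in ISK."""
    structural: float    # L1: size, type
    spatial: float       # L2: location
    temporal: float      # L3: market timing
    presentation: float  # L4: not part of the agent baseline

    def base_prediction(self) -> float:
        # The agent baseline is L1+L2+L3; L4 is left out.
        return self.structural + self.spatial + self.temporal


def spread_vs_assessment(sale_price: float, fasteignamat: float) -> float:
    """Original (broken) definition: biased upward by assessment lag."""
    return sale_price / fasteignamat - 1.0


def spread_vs_model(sale_price: float, layers: LayerOutputs) -> float:
    """Revised definition: residual against the model's L1+L2+L3 prediction."""
    return sale_price / layers.base_prediction() - 1.0


# Invented example: the assessment lags the market, the model does not.
layers = LayerOutputs(structural=90e6, spatial=20e6, temporal=6e6,
                      presentation=1.5e6)
sale_price = 120e6
fasteignamat = 100e6

print(f"old spread: {spread_vs_assessment(sale_price, fasteignamat):+.1%}")
print(f"new spread: {spread_vs_model(sale_price, layers):+.1%}")
```

In this made-up case, most of the apparent +20% comes from the assessment lag. Only the roughly 3% left over after the baseline would be credited to the agent.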

Implementation and Results

The fix was implemented in s10_stats.py as a new recompute_agent_spreads() function, which layers.py now calls automatically once the training pipeline completes.
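The real recompute_agent_spreads() lives in s10_stats.py and is not reproduced here. The following is a minimal sketch of what such a step could look like. It assumes each sale record carries the agent, the sale price, and the model’s L1+L2+L3 prediction. The `Sale` record, the `run_pipeline` hook, and the choice to average the market figure over sales (rather than over agents) are all assumptions.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Iterable, NamedTuple


class Sale(NamedTuple):
    """Hypothetical sale record; field names are illustrative."""
    agent: str
    sale_price: float
    base_prediction: float  # model L1+L2+L3 prediction, ISK


def recompute_agent_spreads(
    sales: Iterable[Sale],
) -> tuple[dict[str, float], float]:
    """Return (mean spread per agent, market-wide mean spread over all sales)."""
    per_agent: dict[str, list[float]] = defaultdict(list)
    all_spreads: list[float] = []
    for sale in sales:
        spread = sale.sale_price / sale.base_prediction - 1.0
        per_agent[sale.agent].append(spread)
        all_spreads.append(spread)
    if not all_spreads:
        return {}, 0.0
    agent_spreads = {agent: mean(vals) for agent, vals in per_agent.items()}
    return agent_spreads, mean(all_spreads)


def run_pipeline(
    train: Callable[[], None],
    load_sales: Callable[[], list[Sale]],
) -> tuple[dict[str, float], float]:
    """Hypothetical hook: recompute spreads only after training has finished."""
    train()
    return recompute_agent_spreads(load_sales())


# Invented demo data; training is stubbed out.
demo_sales = [
    Sale("Agent A", 103e6, 100e6),
    Sale("Agent A", 105e6, 100e6),
    Sale("Agent B", 95e6, 100e6),
]
spreads, market_avg = run_pipeline(lambda: None, lambda: demo_sales)
print(spreads, f"market_avg_spread={market_avg:.3f}")
```

Returning the market average from the same pass is one way to keep a figure like market_avg_spread in sync with the agent spreads. The current pipeline instead keeps market_avg_spread as a hardcoded value that must be edited by hand.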

The impact on agent metrics was significant:

  • Helga Pálsdóttir: Her calculated spread dropped from +19.5% (erroneous) to +3.0% (realistic).
  • Market Distribution: The spread distribution shifted from entirely positive (+0.97% to +203%) to a range centered near zero (-6.9% to +23.9%). The metric now distinguishes agents who underperform the model’s baseline from those who outperform it.
  • Market Average: The hardcoded market_avg_spread was updated from 0.2034 (+20.3%) to 0.034 (+3.4%).
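To sanity-check a recomputed distribution, for example to confirm that it now straddles zero instead of being uniformly positive, you can summarize the per-agent spreads. `summarize_spreads` is a hypothetical helper, and the sample values are invented.

```python
from statistics import median


def summarize_spreads(agent_spreads: dict[str, float]) -> dict[str, float]:
    """Min, max, median, and share of agents below the model baseline."""
    if not agent_spreads:
        raise ValueError("no agent spreads to summarize")
    values = sorted(agent_spreads.values())
    return {
        "min": values[0],
        "max": values[-1],
        "median": median(values),
        "share_below_zero": sum(v < 0 for v in values) / len(values),
    }


# Invented spreads for four agents.
sample = {"Agent A": -0.05, "Agent B": 0.01, "Agent C": 0.04, "Agent D": 0.20}
summary = summarize_spreads(sample)
for key, value in summary.items():
    print(f"{key:>16}: {value:+.1%}")
```

Under the old assessment-based definition, share_below_zero would have been zero, since the reported range was entirely positive.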