```A Critical Examination of
Hedonic Analysis of a
Regression Model (HARM)
and
META-ANALYSIS
Albert R. Wilson
BSSE, MBA, CRE (Ret)
1
Regression Model
A model
intended to allow an exploration
of the hypothetical relationship
between possible explanatory variables
and the sales price
2
Regression Model
• Reflection of reality
• The touchstone of that reality? Actual
market participants
3
“Estimated” versus “Predicted”
• Estimated = Sale IN database
• Predicted = Sale NOT IN database
4
Predicted Sales Prices
At the mean
predicted sales price variance
is larger than estimated variance
by σ² (variance in the data)
5
Mean Confidence Intervals (MCI)
Estimated and Predicted
MCI FOR PREDICTED 4.38 TIMES MCI FOR ESTIMATED
6
DATABASE EDITING
GARBAGE IN => GARBAGE OUT
(GIGO)
7
Case Example
Influence of the Removal of
“Flipping Transactions” on the
Predicted Prices for 33 Properties
PREDICTED SALES PRICES
PROPERTY NO.        AS PRESENTED    FLIPS REMOVED         CHANGE
SUM                    5,069,239        4,018,112    (1,051,127)
n                            391              379            -12
Adj. R-squared            0.7684           0.7593        -0.0091
8
Editing and Confirmation of Data
STEP 1:
Edit to identify obvious issues (the desk edit)
Case Example
                                       R-Squared
Assessor’s Data      4,325             0.79
  Removed              747    17.3%    0.83
MLS Data             1,888
  Removed              779    44.3%
9
Editing and Confirmation of Data
STEP 2:
Identify sales that are not appropriate to the
analysis
10
Editing and Confirmation of Data
STEP 3:
Sales confirmation
• A values-neutral interview of sale participants
• OBJECT: to elicit the primary factors motivating
the conclusion of the sale price
MUST NOT INTRODUCE ANALYST OPINION
THIS IS THE ONLY MEANS OF
IDENTIFYING/CONFIRMING THE REASONS
FOR A CONCLUDED PRICE
11
Regression Model Considerations
Faithfully represent:
• Identified concerns of actual market
participants
• Restrictions imposed by the data
Estimates of prices
the ONLY VERIFIABLE OUTPUT
12
Coefficient Calculation
Result of iterative calculations
designed to provide the
most accurate estimates of sales prices
in database
13
Coefficient Calculation
Goodness of Fit
• Measures of the Goodness of Fit apply
only to the relationship between the
estimated and actual sales prices in the
database
• They do not apply to the coefficients
14
Most commonly-cited
Goodness-of-Fit Measure
R-Squared
(Coefficient of Determination)
15
R-Squared
• Generally-applied interpretation:
– R-Squared is the amount of variance
“explained” by the model
16
Low R-Squared Models
Mathematically, as the R-Squared
approaches 0.30, it becomes
more likely
that the model is only measuring
random effects
17
The Omitted and Additional
Variable Problem
• Omitting generally increases magnitude
and statistical significance of the
remaining coefficients
• Adding generally decreases the
magnitude and statistical significance of
the remaining variable coefficients
18
Illustration of Omitting or Adding a Variable
                       Base Model              Added Variable–APN           Omitted Variable–Pool
Variable          Coeff.   t-stat      Coeff.   t-stat   % Change      Coeff.   t-stat   % Change
Intercept         67,370    17.52    -663,632    -8.14  -1085.06%      66,293    17.14     -1.60%
APN                                     0.023     8.98
Fixtures           2,653     5.39       2,511     5.15     -5.35%       2,886     5.84      8.74%
NoPatio         (12,801)    -7.77     (5,036)    -2.73    -60.66%    (13,451)    -8.13      5.08%
SqFt               40.79    29.23       42.80    30.61      4.93%       41.59    29.72      1.96%
Pool               8,366     6.77       8,908     7.28      6.48%
Garage            19,382    12.90      20,153    13.54      3.98%      19,980    13.24      3.09%
Middle Ring     (16,141)   -11.24    (11,230)    -7.38    -30.43%    (15,276)   -10.61     -5.36%
Inner Ring       (8,875)    -4.52     (7,114)    -3.64    -19.84%     (8,012)    -4.06     -9.72%
2000                 207     0.08       1,787    -0.67    763.29%         271     0.10     30.92%
2001             (2,017)    -0.76         665    0.258   -132.97%     (2,028)    -0.76      0.55%
2002               (719)    -0.25       3,976     1.36   -652.99%       (615)    -0.21    -14.46%
2003               7,213     2.67       7,647     2.86      6.02%       7,258     2.71      0.62%
2004              41,149    15.50      40,380    15.37     -1.87%      40,901    15.31     -0.60%
2005             132,077    51.04     130,662    50.93     -1.07%     131,129    50.43     -0.72%
2006             160,367    45.29     159,842    45.63     -0.33%     159,897    44.89     -0.29%
R-Squared           0.83                 0.83                            0.83
19
Consequences of Variable Selection
Including the Assessor’s Parcel Number
APN Coefficient Value    0.023
t-statistic              8.98
Mean Value               30,834,360
R-Squared                0.83
Mean Sale Price          \$211,000
Results in an incremental increase in the sales price of
0.023 (APN Coef.) x 30,834,360 (Mean Value) = \$709,190 (Incremental Increase)
20
Consequences of Variable Selection
Omission of a Variable:
• Removal of “Pool”; present in 38% of properties
– SQFT Coefficient changed from \$40.79 to \$41.59
– Approximately the same t-statistic
• Removal of “Fixtures”; present in 100% of
properties
– SQFT Coefficient changed from \$40.79 to \$46.50
– t-statistic = 50.94
21
Coefficients
Coefficients are simply
multipliers for the explanatory variable
22
Causation in Real Estate
From the Real Estate Appraiser’s perspective:
1. Causation demonstrated through sales
confirmation interviews.
2. Causation NEVER proven through a
regression.
23
Strengths and Weaknesses
• Can never be better than the data
• Requires significant amount of data: five to 15 or more
sales
• Upper limit to the amount of data: too much may be
worse than too little
• Guide: Are the sales competitive to the subject?
• Estimate of sales prices most accurate at the mean value
of the data
• Variance of a predicted sales price larger than variance of
estimated
• Thousands of possible regression models
24
Further Considerations
• Absent standards, the “Rubber Ruler” may
apply
• When recognized and published standards
are not used, author must demonstrate
the accuracy and reliability of his/her work
25
Hedonic Analysis
The Hedonic Assumption
The coefficient accurately and only
represents the contribution of the
declared meaning of the
explanatory variable to the
sale price
27
Hedonic Analysis
The validity of the hedonic assumption
must be demonstrated
28
“Revealed Preference”
Idea cannot be supported
for real estate
Supporting Literature
Not a single paper demonstrated the validity
of the hedonic assumption
PLUS
• NO indication of confirmation of raw data
• NO indication of adherence to any recognized /
published standards
• NO indication of confirmation of results with the normal
or typical market participant
THE RUBBER RULER EFFECT IS MUCH IN EVIDENCE.
30
Regression Model Accuracy
If the regression model is inaccurate,
then there is no reason
to expect the coefficients to be
accurate or meaningful.
Therefore the HARM cannot be accurate.
31
CASE EXAMPLE
TO POOL OR NOT TO POOL
• Using the data from the previous case.
• Does a pool influence value?
• By how much?
• Under the Hedonic Approach, the coefficient is the
marginal contribution to value.
32
                           COMBINED POOL AND NO POOLS                 COMBINED POOL AND NO POOLS,
                                                                      POOL COEFFICIENT SET TO ZERO
Variable                   COEFFICIENT  MEAN VALUES  EXPECTED VALUES  COEFFICIENT  MEAN VALUES  EXPECTED VALUES
Intercept                    54,089.83            1           54,090    54,089.83            1           54,090
ORIG_FIXTURES                 2,805.33         8.73           24,491     2,805.33         8.73           24,491
ORIG_NOPATIO                -14,116.47         0.34           -4,800   -14,116.47         0.34           -4,800
ORIG_POOL                     9,161.98         0.38            3,482     9,161.98            0                0
ORIG_SQF                         41.52      2283.62           94,815        41.52      2283.62           94,815
ORIG_X_3GARAGE               16,212.83          0.4            6,485    16,212.83          0.4            6,485
SY2000                        5,980.33            1            5,980     5,980.33            1            5,980
EXPECTED MEAN SALE PRICE                                     184,543                                    181,061
Adj R2                                                        0.8816                                     0.8816
33
TO POOL OR NOT TO POOL (CONT.)
• What are the coefficients if there is no pool?
34
COMBINED WITH NO POOL VARIABLE
Variable                    COEFFICIENT  MEAN VALUES  EXPECTED VALUES
Intercept                   52,788.1063            1           52,788
ORIG_FIXTURES                3,087.8801         8.73           26,957
ORIG_NOPATIO               -14,724.7843         0.34           -5,006
ORIG_SQF                        42.3986      2283.62           96,822
ORIG_X_3GARAGE               16,924.691          0.4            6,770
SY2000                       5,727.7462            1            5,728
EXPECTED MEAN SALE PRICE                                      184,059
Adj R2                                                         0.8790
35
Comparison
                  WITH POOL VARIABLE  NO POOL VARIABLE
• Orig Fixt                    2,805             3,088
• Orig-nopatio               -14,116           -14,725
• Orig-pool                    9,162                NA
• Orig-sqf                     41.52             42.40
• Orig-garage                 16,213            16,925
• SY2000                       5,980             5,728
• ESP                      \$184,543         \$184,059
• R-sq                          0.88              0.88
36
TO POOL OR NOT TO POOL (CONT.)
• WHAT HAPPENS IF WE CONSIDER A DATABASE
WITH POOLS, AND SEPARATELY A DATABASE
WITHOUT POOLS?
37
                           WITH POOL ON PROPERTY                      WITHOUT POOL ON PROPERTY
Variable                   COEFFICIENT  MEAN VALUES  EXPECTED VALUES  COEFFICIENT  MEAN VALUES  EXPECTED VALUES
Intercept                    65,957.89         1.00           65,958    54,993.78         1.00           54,994
ORIG_FIXTURES                 2,505.59         9.65           24,179     2,784.14         8.16           22,719
ORIG_NOPATIO                -15,415.46         0.22           -3,391   -14,838.47         0.41           -6,084
ORIG_POOL
ORIG_SQF                         41.63     2,586.79          107,690        41.46     2,097.20           86,956
ORIG_X_3GARAGE               15,768.93         0.40            6,308    16,308.32         0.31            5,056
SY2000                        4,211.37         1.00            4,211     7,209.87         1.00            7,210
EXPECTED MEAN SALE PRICE                                     204,954                                    170,850
Adj R2                                                        0.8711                                     0.8895
38
POOLS AND NO POOLS
SEPARATELY
• ESTIMATED SALE PRICE WITH POOL \$204,954
– R-SQUARED 0.87
• ESTIMATED SALE PRICE W/O POOL \$170,850
– R-SQUARED 0.89
39
The Coefficient – What Counts?
ALL THAT STATISTICAL SIGNIFICANCE CAN TELL
US IS THAT
FOR THIS MODEL AND DATABASE
THE COEFFICIENT IS A SIGNIFICANT
(OR INSIGNIFICANT)
MULTIPLIER FOR THE EXPLANATORY VARIABLE.
NOTHING MORE.
40
The Appropriate Standard:
Economic Significance
For us, economic significance
is determined by
what the normal or typical participant
considers important to the
conclusion of the transaction.
41
A Criticality:
NOT ONE hedonic analysis encountered
to date has actually asked this question:
“What was important to you in
concluding your transaction?”
42
Hedonic Analysis of a Regression
Model (HARM) is:
• A highly inaccurate and unreliable method
• Not appropriate for appraisal work
Observations apply to hedonic analysis
NOT
regression models!
43
```
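The "Predicted Sales Prices" slide states that, at the mean, the variance of a predicted sales price exceeds the variance of an estimated sales price by σ². A sketch of the textbook ordinary-least-squares result behind that statement is given here. The symbols $x_0$ (attribute vector of the property), $X$ (design matrix of the database), and $n$ (number of sales) are notation assumed for this illustration and do not appear on the slides; the result also assumes independent errors with constant variance σ².

```latex
% Variance of the estimated (fitted) mean price at x_0, a sale IN the database's range:
\operatorname{Var}(\hat{y}_0) = \sigma^2 \, x_0^{\top} (X^{\top} X)^{-1} x_0

% Variance of the error when predicting a sale NOT in the database:
\operatorname{Var}(y_0 - \hat{y}_0) = \sigma^2 \left( 1 + x_0^{\top} (X^{\top} X)^{-1} x_0 \right)

% The gap between the two is the variance in the data:
\operatorname{Var}(y_0 - \hat{y}_0) - \operatorname{Var}(\hat{y}_0) = \sigma^2

% At the mean of the data, for a model with an intercept,
% x_0^{\top} (X^{\top} X)^{-1} x_0 = 1/n, so the two variances are
\operatorname{Var}(\hat{y}_0) = \frac{\sigma^2}{n},
\qquad
\operatorname{Var}(y_0 - \hat{y}_0) = \sigma^2 \left( 1 + \frac{1}{n} \right)
```

Because the estimated variance shrinks as $n$ grows while the added σ² does not, the confidence interval for a predicted price can be several times wider than the interval for an estimated price.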
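The "Including the Assessor's Parcel Number" slide multiplies the APN coefficient by the mean APN value to show the size of the implied price effect. The arithmetic can be reproduced with a minimal Python sketch. The constant and function names below are hypothetical and chosen only for illustration; the three figures are the ones printed on the slide.

```python
# Figures as printed on the "Including the Assessor's Parcel Number" slide.
APN_COEFFICIENT = 0.023
APN_MEAN_VALUE = 30_834_360
MEAN_SALE_PRICE = 211_000


def variable_contribution(coefficient: float, value: float) -> float:
    """Dollar contribution of one variable: coefficient times variable value."""
    return coefficient * value


# Implied "incremental increase" in price from a meaningless identifier.
apn_increment = variable_contribution(APN_COEFFICIENT, APN_MEAN_VALUE)

# How large that increment is relative to the mean sale price in the data.
ratio_to_mean_price = apn_increment / MEAN_SALE_PRICE

if __name__ == "__main__":
    print(f"APN contribution at its mean value: ${apn_increment:,.0f}")
    print(f"Multiple of the mean sale price: {ratio_to_mean_price:.2f}")
```

Read hedonically, a statistically significant parcel number would contribute more than three times the mean sale price, which is the slide's point that significance alone says nothing about meaning.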
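The "TO POOL OR NOT TO POOL" tables build each expected mean sale price as the sum of coefficient times mean value over all variables. The sketch below recomputes that sum from the figures printed on the "COMBINED POOL AND NO POOLS" slide, first as presented and then with the pool share set to zero. The dictionary and function names are hypothetical. Because the printed coefficients are rounded, the totals are expected to agree with the slide only to within a dollar or two.

```python
# (coefficient, mean value) pairs as printed on the combined-model slide.
COMBINED_MODEL = {
    "Intercept": (54089.83, 1.0),
    "ORIG_FIXTURES": (2805.33, 8.73),
    "ORIG_NOPATIO": (-14116.47, 0.34),
    "ORIG_POOL": (9161.98, 0.38),
    "ORIG_SQF": (41.52, 2283.62),
    "ORIG_X_3GARAGE": (16212.83, 0.4),
    "SY2000": (5980.33, 1.0),
}


def expected_mean_sale_price(model, mean_overrides=None):
    """Sum coefficient * mean value, optionally replacing some mean values."""
    mean_overrides = mean_overrides or {}
    return sum(
        coefficient * mean_overrides.get(name, mean_value)
        for name, (coefficient, mean_value) in model.items()
    )


# As presented: 38% of the properties have a pool.
price_with_pool_share = expected_mean_sale_price(COMBINED_MODEL)

# Same coefficients, but evaluated for a property without a pool.
price_no_pool = expected_mean_sale_price(COMBINED_MODEL, {"ORIG_POOL": 0.0})

# The pool term's share of the expected mean price.
pool_term = price_with_pool_share - price_no_pool

if __name__ == "__main__":
    print(f"Expected mean sale price, as presented: {price_with_pool_share:,.0f}")
    print(f"Expected mean sale price, pool set to zero: {price_no_pool:,.0f}")
    print(f"Pool term: {pool_term:,.0f}")
```

The same function can be pointed at the coefficients from the no-pool-variable model or from the separate with-pool and without-pool databases, which makes it easy to see how the other coefficients absorb the pool effect when the variable is dropped.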