- Notifications
You must be signed in to change notification settings - Fork2
eyayaw/cleaning-RWI-GEO-RED
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Helpers for cleaningImmoscout24/RWI-GEO-RED housing data
This repo contains a set of R scripts for cleaning housing data provided by RWI-GEO-RED, and for constructing hedonic price/rent indexes.
The data cleaning steps I followed in these scripts are ad hoc and thus you may need to adapt them to your need should you find them useful.Somes of the steps in the scripts might not be relevant for your particular use case, for example, extracting the zip files and parsing the variable/value labels, and translating variable names. You may read the scripts in the following order.
- parse-labels_from_Stata-log-files.R
- prepare_prices-rents_data.R
- clean_prices.R
- clean_rents.R
- hedonic-model_prices.R
- hedonic-model_rents.R
R 4.2
or higherDevelopment version of data.table (v 1.14.7) or higher:
install.packages("data.table")# latest development versiondata.table::update_dev_pkg()
Besides installing packages, you may need to create a.Renviron
file for the location of the RWI-GEO-RED data and the desired start and end year, for example:
RED_FOLDER=C:/Users/x/RWI-GEO-RED_v6YEAR_START=2007YEAR_END=2021
Note: Alternatively, you can define those constants inside the respective scripts.
Furthermore, you need to download additional data:
- For the CPI to work,download the monthly CPI from Destatis (make sure to select the years you need) and save it as
extra/cpi_61121-0002.csv
clean_prices.R andclean_rents.R filter and clean the data for house/apartment prices and rents respectively. Furthermore, datasets for houses and apartments are combined,
Filtering:
- for each observation which is classified as 'likely duplicate' (
dupID_gen == 1
), drop the previous one - rents: only keep observations with
grid_id > 0 & rent > 0 & floor_space > 0 & num_rooms > 0 & utilities > 0
- prices: only keep observations with
grid_id > 0 & price > 0 & floor_space > 0 & num_rooms > 0
- for each observation which is classified as 'likely duplicate' (
Combining rents for apartments and houses:
- assign code
999L
to object properties that only apply to the other category (apartments/houses)
- assign code
Cleaning:
- missing values (with FDZ codes < 0) are recoded to "na"
num_bedrooms
: limited between 1 and 7 bedrooms (>=7
recoded to7+
), missing recoded to "na or 0"num_bathrooms
,num_floors
: limited between 1 and 4 (rest seenum_bedrooms
)
Additional processing at the end ofclean_rents.R
:
- construction/renovation year
- construction year <-> renovation year in case the other one is NA
- drop construction years below 1900
- if NA, impute construction year and drop imputations < 1900
- discard properties with
- (i) a monthly rental price below 1e/m2 or above 50e/m2
- (ii) floor space below 30m2 or above 500m2
And finally:
- the distance to the central business district (CBD) is computed and
- the consumer price index (CPI) is used to adjust rents/prices for inflation
- kategorie_Wohnung has label value "11" that has no label (neither Stata nor csv, in the csv the label is "11")
The data come with a rich set of property characteristics which enable us to compute a hedonic (price) index to quality-adjust house prices. I construct a mix-adjusted house price index from the following panel hedonic regression
where
Additionally, in theextra/
folder, labor market regions (Kosfeld and Werner (2012)), the (1kmx1km) grid, municipality, and district information for Germany are provided. Note that the Kosfeld and Werner (2012)'s labor market regions are updated for the 2019 (end of the year) administrative structure (Verwaltungsgliederung am 31.12.2019) of districts.
About
R scripts for cleaning Immoscout24/RWI-GEO-RED data
Topics
Resources
Uh oh!
There was an error while loading.Please reload this page.
Stars
Watchers
Forks
Releases
Packages0
Contributors2
Uh oh!
There was an error while loading.Please reload this page.