R package to obtain past sales data from allhomes.com.au
License
Unknown, MIT licenses found
In mid-October 2022, there was a major, breaking change to how Allhomes/Domain Group makes past sales data available through allhomes.com.au. As a result, all methods provided by the `allhomes` package to extract past sales data have been invalidated. Until I have had an opportunity to carefully review these changes, I cannot say if or when a fix will be possible. Until then, sadly, you will not be able to download Allhomes past sales data through `allhomes`.
This is the repository for the `allhomes` R package. The main function that the package provides is `get_past_sales_data()`, which extracts past sales data from allhomes.com.au for one or more suburbs and years.
Install the package from CRAN
install.packages("allhomes")
Or directly from GitHub
remotes::install_github("mevers/allhomes")
The function `get_past_sales_data()` takes the following two arguments:
- `suburb`: A `character` vector denoting one or more suburbs. Every entry must be of the form "<suburb_name>, <state/territory_abbreviation>", e.g. "Balmain, NSW".
- `year`: A `numeric` or `integer` vector of the year(s) of the sales history.
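To make the expected `suburb` format concrete, here is a minimal base R sketch that checks and splits such strings. The helper `parse_suburb()` is hypothetical and not part of the `allhomes` package; the assumption that state/territory abbreviations are two or three letters is also mine.

```r
# Hypothetical helper (NOT part of the allhomes package): validate and split
# suburb strings of the form "<suburb_name>, <state/territory_abbreviation>".
parse_suburb <- function(suburb) {
  stopifnot(is.character(suburb))
  # A name that starts with a non-space character, one comma, then a
  # 2-3 letter abbreviation (assumed length of Australian state codes).
  pattern <- "^\\s*([^,\\s][^,]*?)\\s*,\\s*([A-Za-z]{2,3})\\s*$"
  ok <- grepl(pattern, suburb, perl = TRUE)
  if (!all(ok)) {
    stop("Malformed suburb entries: ", paste(suburb[!ok], collapse = "; "))
  }
  data.frame(
    name = sub(pattern, "\\1", suburb, perl = TRUE),
    state = toupper(sub(pattern, "\\2", suburb, perl = TRUE)),
    stringsAsFactors = FALSE
  )
}

parse_suburb(c("Balmain, NSW", "Acton, ACT"))
```

Entries without a comma, such as "Balmain NSW", raise an error instead of being guessed at.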
Example:
    get_past_sales_data("Balmain, NSW", 2019) %>% print(width = 100)
    #[2022-07-27 14:52:47] Looking up division ID for suburb='Balmain, NSW'...
    #[2022-07-27 14:52:47] URL: https://www.allhomes.com.au/svc/locality/searchallbyname?st=NSW&n=balmain
    #[2022-07-27 14:52:47] Finding data for ID=7857, year=2019...
    #[2022-07-27 14:52:47] URL: https://www.allhomes.com.au/ah/research/_/120785712/sale-history?year=2019
    #[2022-07-27 14:52:48] Found 229 entries.
    ## A tibble: 229 × 27
    #   divis…¹ state postc…² value  year address bedro…³ bathr…⁴ ensui…⁵ garages carpo…⁶ contr…⁷ trans…⁸
    #   <chr>   <chr> <chr>   <int> <dbl> <chr>     <dbl>   <dbl> <lgl>     <dbl> <lgl>   <chr>   <chr>
    # 1 Balmain NSW   2041     7857  2019 1 Long…      NA      NA NA           NA NA      06/12/… 02/04/…
    # 2 Balmain NSW   2041     7857  2019 7 Alex…      NA      NA NA           NA NA      30/08/… 16/10/…
    # 3 Balmain NSW   2041     7857  2019 29 Bir…      NA      NA NA           NA NA      25/10/… 06/12/…
    # 4 Balmain NSW   2041     7857  2019 2 Well…       6       3 NA            4 NA      25/05/… 26/08/…
    # 5 Balmain NSW   2041     7857  2019 109 Mo…       4       2 NA            2 NA      25/02/… 08/04/…
    # 6 Balmain NSW   2041     7857  2019 10 Tha…       4       2 NA            4 NA      05/10/… 16/12/…
    # 7 Balmain NSW   2041     7857  2019 3/100 …      NA      NA NA           NA NA      18/07/… 06/09/…
    # 8 Balmain NSW   2041     7857  2019 160 Be…       5       4 NA            1 NA      18/10/… 13/12/…
    # 9 Balmain NSW   2041     7857  2019 25 Isa…      NA      NA NA           NA NA      01/05/… 02/09/…
    #10 Balmain NSW   2041     7857  2019 71 Mor…       4       2 NA            2 NA      24/05/… 05/07/…
    ## … with 219 more rows, 14 more variables: list_date <chr>, price <dbl>, block_size <dbl>,
    ##   transfer_type <chr>, full_sale_price <dbl>, days_on_market <dbl>, sale_type <lgl>,
    ##   sale_record_source <chr>, building_size <lgl>, land_type <lgl>, property_type <lgl>,
    ##   purpose <chr>, unimproved_value <lgl>, unimproved_value_ratio <lgl>, and abbreviated variable
    ##   names ¹division, ²postcode, ³bedrooms, ⁴bathrooms, ⁵ensuites, ⁶carports, ⁷contract_date,
    ##   ⁸transfer_date
    ## ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
Under the hood, the function `get_past_sales_data()` first calls a helper function `get_ah_division_ids()` that determines the Allhomes "division" name and ID for every `suburb` entry. The division ID is then used to extract past sales data from the Allhomes website using the low-level function `extract_past_sales_data()`.
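As an illustration of the first step, the sketch below rebuilds the division lookup URL that appears in the example log output. The function `build_lookup_url()` is hypothetical and not taken from the package source; lower-casing the name and percent-encoding it are assumptions based only on that single logged URL, and the endpoint may no longer respond since the October 2022 change. The sale-history URL is deliberately not reconstructed, because the log does not show how its path segment is derived from the division ID.

```r
# Hypothetical sketch (NOT the package's implementation): rebuild the
# lookup URL seen in the example log from a suburb name and state.
build_lookup_url <- function(name, state) {
  base <- "https://www.allhomes.com.au/svc/locality/searchallbyname"
  query <- sprintf(
    "st=%s&n=%s",
    utils::URLencode(toupper(state), reserved = TRUE),
    # Assumption: the name is sent lower-cased, as in the logged URL
    utils::URLencode(tolower(name), reserved = TRUE)
  )
  paste0(base, "?", query)
}

build_lookup_url("Balmain", "NSW")
```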
Currently, there are limited sanity checks in place to verify if past sales data are available for a particular suburb and year. Allhomes does not have data for all suburbs and years (for example, Allhomes past sales data for Victoria is pretty much absent).
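Because data may be missing for a given suburb and year, callers may want their own guard around each request. The following is a minimal sketch of that idea in base R; `safe_fetch()` and `toy_fetch()` are hypothetical names introduced here, and `toy_fetch()` returns made-up placeholder rows rather than real Allhomes data.

```r
# Hypothetical wrapper: call `fetch(suburb, year)` and return NULL (with a
# message) when the call fails or yields no rows, instead of stopping.
safe_fetch <- function(fetch, suburb, year) {
  result <- tryCatch(
    fetch(suburb, year),
    error = function(e) {
      message("No data for ", suburb, " (", year, "): ", conditionMessage(e))
      NULL
    }
  )
  if (is.null(result) || NROW(result) == 0) {
    return(NULL)
  }
  result
}

# Toy stand-in for a real data source; it only knows a single suburb.
toy_fetch <- function(suburb, year) {
  if (suburb != "Balmain, NSW") stop("unknown suburb")
  data.frame(division = "Balmain", year = year)
}

safe_fetch(toy_fetch, "Balmain, NSW", 2019)  # one-row data frame
safe_fetch(toy_fetch, "Fitzroy, VIC", 2019)  # NULL, with a message
```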
`allhomes` also provides two datasets, `divisions_ACT` and `divisions_NSW`, that list division names and IDs for all Allhomes divisions (suburbs) in the ACT and NSW, respectively.
Please report any bugs as GitHub issues. If you would like to get involved, please get in touch and/or submit a PR.
The (unofficial) Allhomes API distinguishes between different types of "localities"; in increasing level of granularity these are: state > region > district > division > street > address. Divisions (roughly) correspond to suburbs. The `allhomes` package pulls in past sales data at the division (i.e. suburb) level.
Allhomes (which is part of Domain Group) receives historical past sales data from relevant state departments. Some details on Allhomes' data retention are given here.
While there seems to exist an (unofficial) Allhomes API to query IDs (which are necessary for looking up past sales data), past sales data themselves need to be scraped from somewhat awkwardly formatted static HTML tables. Data for every sale is stored within a `<tbody>` element; within every `<tbody>` element, individual values (address, price, dates, block size, etc.) are spread across 3 lines, each contained within a `<td>` element; unfortunately, the format of every line is not consistent.
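The general shape of such a scrape can be sketched with the `rvest` package. This is an illustration only, not the package's actual parser: the HTML below is a made-up toy table with one value per `<td>`, whereas the real pages pack several values into each line in an inconsistent format. All addresses, prices, and dates in it are invented placeholders.

```r
library(rvest)

# Toy HTML (invented values): one <tbody> per sale, three <td> "lines" each.
toy_html <- minimal_html('
  <table>
    <tbody>
      <tr><td>1 Example St</td></tr>
      <tr><td>$1,000,000</td></tr>
      <tr><td>01/02/2019</td></tr>
    </tbody>
    <tbody>
      <tr><td>2 Sample Ave</td></tr>
      <tr><td>$750,000</td></tr>
      <tr><td>15/03/2019</td></tr>
    </tbody>
  </table>')

# Turn every <tbody> into one row, reading its <td> cells in order.
records <- lapply(html_elements(toy_html, "tbody"), function(tb) {
  cells <- html_text2(html_elements(tb, "td"))
  data.frame(address = cells[1], price_raw = cells[2], date_raw = cells[3])
})
sales <- do.call(rbind, records)

# Clean up the raw strings: strip currency formatting, parse dd/mm/yyyy dates.
sales$price <- as.numeric(gsub("[^0-9.]", "", sales$price_raw))
sales$date <- as.Date(sales$date_raw, format = "%d/%m/%Y")
sales
```

With real pages, the per-cell cleanup step is where most of the effort would go, since each line would need its own parsing rules.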
This project is neither related to nor endorsed by allhomes.com.au. With changes to how Allhomes (and Domain Group) manages and formats data, some or all of the functions might break at any time. There is also no guarantee that historical past sales data won't change.
All data provided are subject to the allhomes "Advertising Sales Agreement terms and conditions - All Homes Pty Ltd".