awesome-archive/lianjia-scrawler
- This repo provides a tool to crawl house information from LianJia.com and store it in a MySQL database (SQLite and PostgreSQL are also supported). The data is easy to export to CSV or other formats.
- You can also sync MySQL to Elasticsearch and then use Kibana to analyze the data.
- The tool first collects community information for each region, then uses those communities to collect on-sale listings, price history, sold records and rental information.
- If LianJia blocks the tool because of heavy traffic from your IP address, update the cookie information.
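Because the crawled data ends up in ordinary relational tables, exporting it to CSV takes only a few lines. The sketch below uses Python's built-in `sqlite3` and `csv` modules and assumes the SQLite backend (`lianjia.db` in config.ini) and a table named `houseinfo`, which is the lowercase default table name peewee derives from the `Houseinfo` model. The function `export_table_to_csv` is hypothetical and not part of this repository; for MySQL or PostgreSQL the same approach works with that database's driver.

```python
import csv
import sqlite3


def export_table_to_csv(db_path, table, csv_path):
    """Write every row of `table` in the SQLite file `db_path` to `csv_path`.

    Hypothetical helper, not part of lianjia-scrawler. Returns the number of
    data rows written (the header row is not counted).
    """
    # Table names cannot be bound as SQL parameters, so accept only identifiers.
    if not table.isidentifier():
        raise ValueError("unexpected table name: %r" % table)
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute("SELECT * FROM %s" % table)
        header = [col[0] for col in cursor.description]
        count = 0
        # newline="" lets the csv module control line endings; utf-8-sig writes
        # a BOM so Excel recognizes the Chinese text as UTF-8.
        with open(csv_path, "w", newline="", encoding="utf-8-sig") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            for row in cursor:
                writer.writerow(row)
                count += 1
        return count
    finally:
        conn.close()
```

For example, after a crawl with SQLite enabled, `export_table_to_csv('lianjia.db', 'houseinfo', 'houseinfo.csv')` would write the on-sale listings to a CSV file.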
- Download the source code and install the package dependencies:

```bash
git clone https://github.com/XuefengHuang/lianjia-scrawler.git  # 1
cd lianjia-scrawler                                              # 2
# Steps 3 and 4 are optional; skip them if you don't want to use
# virtualenv (https://virtualenv.pypa.io/en/stable/).
virtualenv lianjia                                               # 3
source lianjia/bin/activate                                      # 4
pip install -r requirements.txt                                  # 5
```
- Set the database configuration in config.ini:

```ini
[Mysql]
enable = True
scheme = test
host = 127.0.0.1
port = 3306
user = root
password = secret

[Sqlite]
enable = False
dbname = lianjia.db

[Postgresql]
enable = False
scheme = test
host = 127.0.0.1
user = postgres
password = secret
```
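One way to read this file is Python's standard `configparser` module. The sketch below assumes exactly one section has `enable = True` (the README does not say what happens when several are enabled) and returns that section's settings with `port` converted to an integer. The helper name `enabled_backend` is hypothetical; this is not necessarily how lianjia-scrawler itself loads the file.

```python
import configparser


def enabled_backend(config):
    """Return (section_name, settings) for the single enabled backend.

    `config` is a configparser.ConfigParser that has already read config.ini.
    Hypothetical helper, shown only to illustrate the file's structure.
    """
    enabled = [
        name
        for name in ("Mysql", "Sqlite", "Postgresql")
        if config.has_section(name)
        and config.getboolean(name, "enable", fallback=False)
    ]
    # Assumption: exactly one backend should be active at a time.
    if len(enabled) != 1:
        raise ValueError("expected exactly one enabled backend, found %r" % enabled)
    name = enabled[0]
    settings = {key: value for key, value in config.items(name) if key != "enable"}
    # configparser returns strings; convert the port so a driver can use it directly.
    if "port" in settings:
        settings["port"] = int(settings["port"])
    return name, settings
```

With the sample file, `cfg = configparser.ConfigParser()`, `cfg.read('config.ini')` and `enabled_backend(cfg)` would select the `Mysql` section, returning its `scheme`, `host`, `port` (as the integer 3306), `user` and `password`.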
- Add the regions you want to crawl to `regionlist` in scrawl.py:

```python
regionlist = [u'chaoyang', u'xicheng', u'dongcheng']  # only pinyin names are supported
```
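Because only pinyin names are accepted, a quick check before a long crawl can catch a region typed in Chinese characters. This is a minimal sketch that assumes region names consist of lowercase ASCII letters only, as in the examples `chaoyang`, `xicheng` and `dongcheng`; `check_regions` is a hypothetical helper, not part of scrawl.py.

```python
import re

# Assumed rule based on the "only pinyin" note: lowercase ASCII letters only.
_PINYIN_NAME = re.compile(r"[a-z]+")


def check_regions(regions):
    """Return the regions as a list, or raise ValueError for non-pinyin names."""
    bad = [r for r in regions if not _PINYIN_NAME.fullmatch(r)]
    if bad:
        raise ValueError("regions must be written in pinyin, got: %r" % bad)
    return list(regions)
```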
- Start the crawler:

```bash
python scrawl.py
```

and enjoy! (If you have already collected the community list, comment out line 13 of scrawl.py.)
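Rather than commenting out the line by hand, a run could check whether the community list is already stored and skip that phase. The sketch assumes the SQLite backend and a table named `community` (peewee's default lowercase table name for the `Community` model); `has_communities` is a hypothetical helper, not part of the repository.

```python
import sqlite3


def has_communities(db_path):
    """Return True if the `community` table exists and already holds rows."""
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master lists every table in the database file.
        exists = conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'community'"
        ).fetchone()
        if exists is None:
            return False
        return conn.execute("SELECT 1 FROM community LIMIT 1").fetchone() is not None
    finally:
        conn.close()
```

A crawl script could then wrap its community step in `if not has_communities('lianjia.db'):` so a second run goes straight to the per-community listings.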
```python
import datetime

from peewee import (BigIntegerField, CharField, CompositeKey, DateField,
                    DateTimeField, IntegerField, Model, PrimaryKeyField,
                    SqliteDatabase)

# Assumption, so the snippet runs on its own: bind the models to a SQLite file.
# The project itself chooses MySQL, SQLite or PostgreSQL from config.ini.
db = SqliteDatabase('lianjia.db')


class BaseModel(Model):
    class Meta:
        database = db


# All communities in every region.
class Community(BaseModel):
    id = PrimaryKeyField()
    title = CharField()
    link = CharField(unique=True)
    district = CharField()
    bizcircle = CharField()
    tagList = CharField()


# All on-sale house information in every community.
class Houseinfo(BaseModel):
    houseID = BigIntegerField(primary_key=True)
    title = CharField()
    link = CharField()
    community = CharField()
    years = CharField()
    housetype = CharField()
    square = CharField()
    direction = CharField()
    floor = CharField()
    taxtype = CharField()
    totalPrice = IntegerField()
    unitPrice = IntegerField()
    followInfo = CharField()
    validdate = DateTimeField(default=datetime.datetime.now)


# Price history of all on-sale houses in every community.
class Hisprice(BaseModel):
    houseID = BigIntegerField()
    totalPrice = IntegerField()
    date = DateTimeField(default=datetime.datetime.now)

    class Meta:
        primary_key = CompositeKey('houseID', 'totalPrice')


# All sold house information in every community.
class Sellinfo(BaseModel):
    houseID = BigIntegerField(primary_key=True)
    title = CharField()
    link = CharField()
    community = CharField()
    years = CharField()
    housetype = CharField()
    square = CharField()
    direction = CharField()
    floor = CharField()
    status = CharField()
    source = CharField()
    totalPrice = IntegerField()
    unitPrice = IntegerField()
    dealdate = DateField()
    updatedate = DateTimeField(default=datetime.datetime.now)


# All rental house information in every community.
class Rentinfo(BaseModel):
    houseID = BigIntegerField(primary_key=True)
    title = CharField()
    link = CharField()
    region = CharField()
    zone = CharField()
    meters = CharField()
    other = CharField()
    subway = CharField()
    decoration = CharField()
    heating = CharField()
    price = IntegerField()
    pricepre = CharField()
    updatedate = DateTimeField(default=datetime.datetime.now)
```
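The composite primary key on `Hisprice` means a (`houseID`, `totalPrice`) pair can be stored only once, so repeated crawls at an unchanged price add no new history rows. The sketch below illustrates that behaviour with plain `sqlite3` and `INSERT OR IGNORE` instead of peewee; whether scrawl.py inserts history rows exactly this way is an assumption, and `create_hisprice`, `record_price` and `price_history` are hypothetical helpers.

```python
import sqlite3


def create_hisprice(conn):
    # Same columns and composite primary key as the Hisprice model.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hisprice ("
        " houseID INTEGER NOT NULL,"
        " totalPrice INTEGER NOT NULL,"
        " date TEXT DEFAULT CURRENT_TIMESTAMP,"
        " PRIMARY KEY (houseID, totalPrice))"
    )


def record_price(conn, house_id, total_price):
    """Insert the pair unless it already exists; return True if a row was added."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO hisprice (houseID, totalPrice) VALUES (?, ?)",
        (house_id, total_price),
    )
    # rowcount is 0 when the composite key made SQLite ignore the insert.
    return cur.rowcount == 1


def price_history(conn, house_id):
    """Return the recorded prices for one house in insertion order."""
    rows = conn.execute(
        "SELECT totalPrice FROM hisprice WHERE houseID = ? ORDER BY date, rowid",
        (house_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

One consequence of this key: if a price drops and later returns to an earlier value, the return is not recorded, because that (`houseID`, `totalPrice`) pair already exists.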
About
A tool to crawl house information from LianJia