Python script to fetch cars data


DhrumilShah98/CarsDataPythonScript


SCRAPING JSON WEBPAGE

author     = "Dhrumil Amish Shah"
copyright  = "Copyright 2019"
credits    = ["Dhrumil Amish Shah"]
version    = "1.0.0"
maintainer = "Dhrumil Amish Shah"
github     = "https://github.com/DhrumilShah98/"
linkedIn   = "https://linkedin.com/in/dhrumilshah98/"

NOTE: I CAN NOT SHARE THE NAME OF THE WEBSITE

I WAS ABLE TO GET A TOTAL OF 1238 CAR DETAILS. PRETTY AMAZING, RIGHT? :)

FOLDER: PythonExcelScript :- 1. CarCreateSpecsExcelScript

  • Fetch the data from the 'car_website_name' website. Lucky me: I was able to find a JSON file which contains a lot of data.
  • I invested a lot of time to find the main data source on 'car_website_name'.
  • I found one data source (a JSON file) that lists multiple URLs. I called those URLs and got chunks of data from them. [CONTAINS A LOT OF DATA, OUT OF WHICH I USED ONLY A FEW FIELDS]
  • Then I cleaned all the data and decided what I should keep and what I should not. [TEDIOUS TASK]
  • I used that file from disk. (I downloaded it first, although I could also have made a network call instead.)
  • The script in this folder calls the URLs one by one, gets the required data by parsing, and then makes an Excel structure out of it.
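The steps above can be sketched roughly as follows. This is only an illustration under assumptions, not the actual script from this repository: the index keys `variants` and `url`, the chosen `COLUMNS`, and all function names are hypothetical, because the real website and its JSON layout are not shared. To stay within the standard library it writes a CSV file (which Excel can open) as a stand-in for the Excel sheet, and the `fetch` argument can be swapped out so the logic can be tried without a network call.

```python
import csv
import json
import urllib.request

# Hypothetical subset of the columns described in this README.
COLUMNS = ["brandName", "name", "engine", "mileage"]


def fetch_json(url):
    """Download one URL and parse the response body as JSON."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def collect_rows(index, fetch=fetch_json):
    """Call every URL listed in the index, one by one, and keep only COLUMNS.

    Assumes the index looks like {"variants": [{"url": "..."}, ...]};
    the real layout is not documented, so adjust the keys as needed.
    """
    rows = []
    for entry in index.get("variants", []):
        detail = fetch(entry["url"])
        rows.append({col: detail.get(col, "") for col in COLUMNS})
    return rows


def write_sheet(rows, path):
    """Write the rows to a CSV file with one header line."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)


# Possible usage with a previously downloaded index file (hypothetical name):
#   with open("downloaded_index.json", encoding="utf-8") as fh:
#       index = json.load(fh)
#   write_sheet(collect_rows(index), "cars.csv")
```

Reading the index from disk mirrors the "downloaded it first" choice above; replacing that read with `fetch_json(index_url)` would be the network-call alternative.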

Summarizing....

  1. Got a data source (in JSON form) and called multiple URLs found by analyzing that JSON.
  2. From those URLs I got chunks of data, from which I created a simple Excel structure.
  3. You can contact me on LinkedIn if you want this data in either Excel or JSON.

NOTE: I HAVE NOT ADDED SPECIFICATIONS FOR EACH CAR VARIANT IN THE EXCEL SHEET. I AM WORKING ON IT CURRENTLY. BUT, I HAVE ADDED ALL THE SPECIFICATIONS IN THE JSON. I AM WORKING TO DO THE SAME IN THE EXCEL TOO.

TOTAL NUMBER OF COLUMNS: 38 COLUMNS

  1) brandName (Eg: Hyundai)
  2) name (Eg: Hyundai Grand i10)
  3) engine (Eg: 1197cc)
  4) mileage (Eg: 17.0kmpl)
  5) seating (Eg: 5 seater)
  6) modelShortName (Eg: Grand i10)
  7) subText (Eg: 1197cc, 18.9kmpl)
  8) fuelName (Eg: Petrol)
  9) carVariantId (Eg: Hyundai Grand i10 1.2 Kappa Sportz Dual Tone)
  10) title (Eg: Hyundai Grand i10 1.2 Kappa Sportz Dual Tone)
  11) highWayAvg (Eg: 18.9)
  12) urbanAvg (Eg: 19.1)
  13) displayCarVariantId (Eg: Hyundai Grand i10 1.2 Kappa Sportz Dual Tone)
  14) vehicleType (Eg: Hatchback)
  15) priceRange (Eg: 6.4 Lakh)
  16) modelPriceRange (Eg: 4.97 - 7.63 Lakh)
  17) oem_name (Eg: hyundai)
  18) model_name (Eg: hyundai grand i10)
  19) variant_name (Eg: hyundai-grand-i10-1.2-kappa-sportz-dual-tone)
  20) car_segment (Eg: hatchback cars)
  21) engine_cc (Eg: 1197)
  22) fuel_type (Eg: petrol)
  23) transmission_type (Eg: manual)
  24) brand_new (Eg: hyundai)
  25) model_new (Eg: hyundai grand i10)
  26) display_model_new (Eg: Hyundai Grand i10)
  27) variant_new (Eg: hyundai grand i10 1.2 kappa sportz dual tone)
  28) fuel_type_new (Eg: petrol)
  29) engine_capacity_new (Eg: 1000cc - 2000cc)
  30) max_engine_capacity_new (Eg: 1197)
  31) min_engine_capacity_new (Eg: 1197)
  32) transmission_type_new (Eg: manual)
  33) mileage_new (Eg: 15 kmpl and above)
  34) max_mileage_new (Eg: 18)
  35) min_mileage_new (Eg: 18)
  36) seating_capacity_new (Eg: 5)
  37) transmission_type (Eg: manual)
  38) url - JSON (Eg: url for many other data in json form)

45 MORE SPECIFICATIONS FOR EACH CAR VARIANT (variant_name). ALSO, THERE ARE MANY MORE FEATURES WHICH I HAVE NOT SCRAPED, BUT I WILL DO IT IN FUTURE IF I GET SOME MORE TIME ;)
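Several of the columns above look like normalized forms of each other, for example engine "1197cc" next to engine_cc "1197", or the title next to the hyphenated variant_name. How the website actually produced them is not known, so the following is only a guess at equivalent clean-up helpers; the function names `leading_number` and `slugify` are made up for this sketch.

```python
import re


def leading_number(text):
    """Return the first number found in a string such as '1197cc' or '17.0kmpl'.

    Gives an int when there is no decimal point, a float otherwise,
    and None when the text contains no digits at all.
    """
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        return None
    number = match.group()
    return float(number) if "." in number else int(number)


def slugify(title):
    """Lowercase a title and join its words with hyphens."""
    return "-".join(title.lower().split())
```

With the example values from the list, `leading_number("1197cc")` gives `1197`, and `slugify("Hyundai Grand i10 1.2 Kappa Sportz Dual Tone")` gives `hyundai-grand-i10-1.2-kappa-sportz-dual-tone`, which matches the variant_name example.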

FOLDER: PythonJsonScript :- 1. CarCreateSpecsJsonScript & 2. CarCleanJsonScript

The first file (1. CarCreateSpecsJsonScript) is responsible for getting all the data, while the second file (2. CarCleanJsonScript) is responsible for structuring and cleaning the data. [THIS CAN BE DONE IN ONE FILE, BUT I DID IT IN 2 FILES]

  • Fetch the data from the 'car_website_name' website. Lucky me: I was able to find a JSON file which contains a lot of data.
  • I invested a lot of time to find the main data source on 'car_website_name'.
  • I found one data source (a JSON file) that lists multiple URLs. I called those URLs and got chunks of data from them. [CONTAINS A LOT OF DATA, OUT OF WHICH I USED ONLY A FEW FIELDS]
  • Then I cleaned all the data and decided what I should keep and what I should not. [TEDIOUS TASK]
  • I used that file from disk. (I downloaded it first, although I could also have made a network call instead.)
  • The scripts in this folder call the URLs one by one, get the required data by parsing, and then make a JSON structure out of it.
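The split into a "get everything" step and a "clean and structure" step could look roughly like this. It is a sketch under assumptions rather than the code in the two scripts: the `KEEP` whitelist, the file paths, and the function names are all hypothetical, since this README does not say which keys were kept or dropped.

```python
import json

# Hypothetical whitelist; the real selection of keys is not documented.
KEEP = {"variantName", "Driver Airbag", "Body Type"}


def save_raw(records, path):
    """Stage 1: store everything that was fetched, untouched."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, indent=2)


def clean_record(record):
    """Keep only whitelisted keys and trim whitespace from string values."""
    cleaned = {}
    for key, value in record.items():
        if key in KEEP:
            cleaned[key] = value.strip() if isinstance(value, str) else value
    return cleaned


def clean_file(raw_path, clean_path):
    """Stage 2: read the raw dump, clean each record, and write the result."""
    with open(raw_path, encoding="utf-8") as fh:
        records = json.load(fh)
    cleaned = [clean_record(record) for record in records]
    with open(clean_path, "w", encoding="utf-8") as fh:
        json.dump(cleaned, fh, indent=2)
    return cleaned
```

Keeping the raw dump on disk means the cleaning rules can be changed and re-run without calling the website again, which is one reason to use two files instead of one.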

Summarizing....

  1. Got a data source (in JSON form) and called multiple URLs found by analyzing that JSON.
  2. From those URLs I got chunks of data, from which I created an awesome JSON structure.
  3. You can contact me on LinkedIn if you want this data in either Excel or JSON.

THESE 1238 cars were further modularized as below

Format of the new JSON which I created for simplicity.

{
  "make": [
    {
      "name": "makeName",
      "model": [
        {
          "modelName": "modelName1",
          "variant": [
            {
              "variantName": "variantName1",
              "specification1": "specification",
              "specification2": "specification",
              ...
            },
            {
              "variantName": "variantName2",
              "specification1": "specification",
              "specification2": "specification",
              ...
            }
          ]
        },
        {
          "modelName": "modelName2",
          "variant": [
            {
              "variantName": "variantName",
              "specification1": "specification",
              "specification2": "specification",
              ...
            }
          ]
        }
      ]
    }
  ]
}
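One way to get from flat per-variant records to the nested format above is to group first by make and then by model. The sketch below assumes each flat record carries hypothetical `make`, `model`, and `variant` keys plus any number of specification keys; the real intermediate format used by the scripts is not shown in this README.

```python
def nest(rows):
    """Group flat rows into the make -> model -> variant layout."""
    makes = {}  # make name -> {model name -> list of variant dicts}
    for row in rows:
        models = makes.setdefault(row["make"], {})
        variants = models.setdefault(row["model"], [])
        # Everything except the three grouping keys is treated as a specification.
        specs = {k: v for k, v in row.items() if k not in ("make", "model", "variant")}
        variants.append({"variantName": row["variant"], **specs})
    return {
        "make": [
            {
                "name": make,
                "model": [
                    {"modelName": model, "variant": variants}
                    for model, variants in models.items()
                ],
            }
            for make, models in makes.items()
        ]
    }
```

Because Python dictionaries keep insertion order, makes, models, and variants come out in the order they were first seen in the input.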

SAMPLE JSON

EACH VARIANT HAS MORE THAN 45 PROPERTIES. PRETTY COOL, RIGHT? I HAVE SHOWN A SMALL JSON SAMPLE BELOW.

{
 "make": [
  {
   "model": [
    {
     "variant": [
      {
       "variantName": "RXE",
       "variantUrl": "variant_url_with_many_different_properties_apart_from_this_in_json",
       "Driver Airbag": "Yes",
       "Rear Tread (mm)": "1545",
       "Fuel Supply System": "Multi Point Fuel Injection",
       ..45 plus
      },
      {
       "variantName": "RXL",
       "variantUrl": "variant_url_with_many_different_properties_apart_from_this_in_json",
       "Driver Airbag": "Yes",
       "Rear Tread (mm)": "1545",
       "Fuel Supply System": "Multi Point Fuel Injection",
       "Body Type": "MUV",
       ..45 plus
      }
     ],
     "modelName": "Triber"
    },
    {"model <2>": [{"variant": [{<1,2,....n>}]}], "modelName 2": "model_name"},
    {"model <2>": [{"variant": [{<1,2,....n>}]}], "modelName 2": "model_name"}
   ],
   "name": "Renault"
  }
 ]
}

'make' is an array of multiple objects. Each object in 'make' contains the "name" of the make and an array of 'model'.
'model' is an array of multiple objects. Each object in 'model' contains the "modelName" of the model and an array of 'variant'.
'variant' is an array of multiple objects. Each object in 'variant' contains multiple properties like 'variantName' and more...
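Given the make / model / variant nesting described above, reading the data back is a matter of three nested loops. The helpers below are a small sketch of that; `iter_variants` and `find_variant` are names invented for this example, and they rely only on the `make`, `name`, `model`, `modelName`, `variant`, and `variantName` keys shown in the sample.

```python
def iter_variants(data):
    """Yield (make name, model name, variant dict) for every variant."""
    for make in data.get("make", []):
        for model in make.get("model", []):
            for variant in model.get("variant", []):
                yield make.get("name"), model.get("modelName"), variant


def find_variant(data, make_name, model_name, variant_name):
    """Return the matching variant dict, or None if it is not present."""
    wanted = (make_name, model_name, variant_name)
    for make, model, variant in iter_variants(data):
        if (make, model, variant.get("variantName")) == wanted:
            return variant
    return None
```

For example, after loading the JSON file with `json.load`, `find_variant(data, "Renault", "Triber", "RXL")` would return the RXL object, and `sum(1 for _ in iter_variants(data))` would count all variants.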

45 MORE SPECIFICATIONS FOR EACH CAR VARIANT. ALSO, THERE ARE MANY MORE FEATURES WHICH I HAVE NOT SCRAPED, BUT I WILL DO IT IN FUTURE IF I GET SOME MORE TIME ;)

THIS PROJECT WAS FUN :)

