Splash is a JavaScript rendering service. I don't know much about what this service actually does under the hood yet. All I know is that it is one of many tools that can help me scrape sites that need JavaScript enabled to run, and that Splash works well alongside Scrapy, the web scraping framework I am currently learning. And as always, if a service can be installed using Docker, I'll give the Docker way a try.
Pulling the Image
As instructed on the Docker registry page, we can pull the latest Splash image using this Docker command (the image is quite large, so be prepared for a long download):
docker pull scrapinghub/splash
And when we check the image list using docker image ls, we can see that it has a huge size:
scrapinghub/splash latest 9364575df985 12 months ago 1.89GB
Run As Container Service
You can name the container anything you want, but here let's call it splash-test. We map port 8050 on the host to port 8050 in the container (-p 8050:8050) so we can access it in our browser. Here is the full command to create and run the container:
docker run --name splash-test -p 8050:8050 -d scrapinghub/splash
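Once the container is up, everything Splash does is exposed over plain HTTP, so you can talk to it from code as well as from the browser. As a minimal sketch (the render_url helper is mine, not part of Splash; it just builds a request URL for Splash's render.html endpoint, assuming the container above is listening on localhost:8050):

```python
from urllib.parse import urlencode

# Base address of the Splash container started above (assumption:
# it was run with -p 8050:8050 on this machine).
SPLASH_BASE = "http://localhost:8050"

def render_url(target, wait=0.5):
    """Build a URL for Splash's render.html endpoint.

    `url` is the page to render; `wait` tells Splash how many
    seconds to let the page's JavaScript run before returning HTML.
    """
    return f"{SPLASH_BASE}/render.html?" + urlencode({"url": target, "wait": wait})

print(render_url("https://www.transfermarkt.com/"))
```

Opening the printed URL in a browser (or fetching it with any HTTP client) should return the fully rendered HTML of the target page.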
Once it's created, you can check whether the service is running or stopped using docker container ls:
CONTAINER ID   IMAGE                COMMAND                  CREATED          STATUS          PORTS                                       NAMES
6e49662c03a7   scrapinghub/splash   "python3 /app/bin/sp…"   48 seconds ago   Up 46 seconds   0.0.0.0:8050->8050/tcp, :::8050->8050/tcp   splash-test
You can also check the resources used by the service using docker stats:
CONTAINER ID   NAME          CPU %   MEM USAGE / LIMIT     MEM %   NET I/O          BLOCK I/O   PIDS
6e49662c03a7   splash-test   0.08%   181.8MiB / 6.043GiB   2.94%   1.09MB / 987kB   0B / 0B     37
Render A Javascript-Required Site
You can access the service in your browser at http://localhost:8050/, and here is what it looks like:
If you have successfully followed along to this point, you can start rendering any website that needs JavaScript enabled to view its pages. For example, you can use https://www.transfermarkt.com/, because I found that this site can't be viewed when I disable JavaScript in the browser. So try it by filling in the URL form and hitting the green Render me! button.
As a result, you can see a snapshot image of the site, some statistics, and, more importantly, the raw HTML document that is ready for you to scrape.
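The same raw HTML that the web form shows can be fetched programmatically from the render.html endpoint, which is how you would feed it into a scraper. A minimal sketch, assuming Splash is running on localhost:8050 as set up above (the requests call and the extract_title helper are my own illustration, not part of Splash):

```python
import re

def extract_title(html):
    """Pull the <title> text out of a rendered HTML document, or None."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None

if __name__ == "__main__":
    # Needs the third-party `requests` package and the running container.
    import requests

    resp = requests.get(
        "http://localhost:8050/render.html",
        params={"url": "https://www.transfermarkt.com/", "wait": 2.0},
    )
    # resp.text is the HTML *after* JavaScript has run, unlike a plain GET
    # of the site, so parsing it works even for JS-only pages.
    print(extract_title(resp.text))
```

In a real Scrapy project you would point your requests at this endpoint (or use a Splash integration plugin) instead of calling the site directly.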
That's it, and have fun scraping!