Splash is a JavaScript rendering service. I don't know much about what this service actually does under the hood yet. All I know is that it is one of many tools that can help me scrape sites that need JavaScript enabled to run, and that Splash works well alongside Scrapy, the web scraping framework I am currently learning. And as always, if a service can be installed using Docker, I'll give the Docker way a try.
Pulling the Image
As instructed on the Docker registry page, we can pull the latest Splash image using this Docker command (the image is quite large, so be prepared for a long download):
docker pull scrapinghub/splash
And when we check the image list using docker image ls, we can see that it has a huge size:
scrapinghub/splash latest 9364575df985 12 months ago 1.89GB
Run As Container Service
You can name the container anything you want, but here let's call it splash-test. We map port 8050 on the host to port 8050 in the container (-p 8050:8050) so we can access it in our browser. Here is the full command to create and run the container:
docker run --name splash-test -p 8050:8050 -d scrapinghub/splash
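Once the container is up, everything Splash does is exposed over plain HTTP, so you can talk to it from code as well as from the browser. As a minimal sketch (the render_url helper is mine, not part of Splash; it just builds a request URL for Splash's render.html endpoint, assuming the container above is listening on localhost:8050):

```python
from urllib.parse import urlencode

# Base address of the Splash container started above (assumption:
# it was run with -p 8050:8050 on this machine).
SPLASH_BASE = "http://localhost:8050"

def render_url(target, wait=0.5):
    """Build a URL for Splash's render.html endpoint.

    `url` is the page to render; `wait` tells Splash how many
    seconds to let the page's JavaScript run before returning HTML.
    """
    return f"{SPLASH_BASE}/render.html?" + urlencode({"url": target, "wait": wait})

print(render_url("https://www.transfermarkt.com/"))
```

Opening the printed URL in a browser (or fetching it with any HTTP client) should return the fully rendered HTML of the target page.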
Once it's created, you can check whether the service is running or stopped using docker container ls:
CONTAINER ID   IMAGE                COMMAND                  CREATED          STATUS          PORTS                                       NAMES
6e49662c03a7   scrapinghub/splash   "python3 /app/bin/sp…"   48 seconds ago   Up 46 seconds   0.0.0.0:8050->8050/tcp, :::8050->8050/tcp   splash-test
You can also check the resources used by the service using docker stats:
CONTAINER ID   NAME          CPU %   MEM USAGE / LIMIT     MEM %   NET I/O          BLOCK I/O   PIDS
6e49662c03a7   splash-test   0.08%   181.8MiB / 6.043GiB   2.94%   1.09MB / 987kB   0B / 0B     37
Render A Javascript-Required Site
You can access the service in your browser at http://localhost:8050/, and here is what it looks like:
If you have successfully followed along to this point, you can start rendering any website that needs JavaScript enabled to view its pages. For example, you can use https://www.transfermarkt.com/, because I found that this site can't be viewed when I disable JavaScript in the browser. So try it by filling in the URL form and hitting the green Render me! button.
As a result, you can see a snapshot image of the site, some statistics, and, more importantly, the raw HTML document that is ready for you to scrape.
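The same raw HTML that the web form shows can be fetched programmatically from the render.html endpoint, which is how you would feed it into a scraper. A minimal sketch, assuming Splash is running on localhost:8050 as set up above (the requests call and the extract_title helper are my own illustration, not part of Splash):

```python
import re

def extract_title(html):
    """Pull the <title> text out of a rendered HTML document, or None."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None

if __name__ == "__main__":
    # Needs the third-party `requests` package and the running container.
    import requests

    resp = requests.get(
        "http://localhost:8050/render.html",
        params={"url": "https://www.transfermarkt.com/", "wait": 2.0},
    )
    # resp.text is the HTML *after* JavaScript has run, unlike a plain GET
    # of the site, so parsing it works even for JS-only pages.
    print(extract_title(resp.text))
```

In a real Scrapy project you would point your requests at this endpoint (or use a Splash integration plugin) instead of calling the site directly.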
That's it, and have fun scraping!