DEV Community

BK ☕

Automate Taking Website Screenshots With Selenium in Python

Update 09/11/2020

Updated Title for more clarity
Published on https://bilalkhoukhi.com/blog/automate-taking-website-screenshots-with-selenium-in-python

Update 01/01/2019

Here is a Node.js app that does the same job, with a React front-end: Node.js App with Selenium and React

Update 12/26/2018

If you wish to run the job using Headless Chrome on Windows, Mac or Linux (desktop or through SSH), I have added a new section at the end of the post.

Original Post

Long story short: I dread doing repetitive tasks manually, which is why I'd rather spend minutes (hours the first time) writing a script that automates the process for me. Otherwise I cannot focus, and if you're as horrible at focusing as I am, I recommend reading Deep Work by Cal Newport.

Problem:

I have a Photoshop mock-up of a laptop, a tablet and a phone that I need to use to showcase a website.

Image

That's an easy and quick job, 3 screenshots and 2 minutes later, you're done. But what if you have 10, 20, or 100 websites, and you have to supply screenshots of different screen sizes (a desktop, a mobile and a tablet)?

Solution:

Selenium is the solution. Since I've only used it with Python, I chose to do this tutorial in Python. It is also faster for me to set up than a Node.js project, but the functionality should be similar.

Let's start:

If you don't have Python installed on your system already, get it from here: Python 3.6+
You will also need Chrome Webdriver

Once you have Python set up, install Selenium by running the command:

$ pip install selenium

It's easy to get started with Python. All you need is a single file that imports the modules you will be using: selenium, and time, which lets us wait a few seconds after the page has loaded before we take the screenshot.

The time module is part of the Python standard library, so only selenium has to be installed.

Next, create a file with the extension .py and open it with your favorite editor.

First we'll start by importing webdriver from selenium and time.

from selenium import webdriver
import time

Now let's define a list of links we want to take screenshots of

links = ['dev.to', 'twitter.com', 'github.com']

It's important to note that the webdriver does not consider the links above valid, since they lack the https:// protocol. We will add it later; I left the protocol out so that I can also use the entries above as file names.
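The idea of keeping bare domains and deriving both the URL and the file name from them can be sketched as a small helper. This is only an illustration: the function name url_and_filename and its suffix parameter are made up for this example and do not appear in the original script.

```python
def url_and_filename(link, suffix):
    """Return the full URL and an output file name for a bare domain."""
    # The webdriver needs the protocol in front of the domain.
    url = 'https://' + link
    # The bare domain has no slashes or colons, so it works as a file name.
    filename = link + '-' + suffix + '.png'
    return url, filename


links = ['dev.to', 'twitter.com', 'github.com']
for link in links:
    print(url_and_filename(link, 'desktop'))
```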

We can start using the webdriver by calling webdriver.Chrome("/path/to/chromedriver") as follows:

with webdriver.Chrome("C:/chromedriver_win32/chromedriver") as driver:
    # code goes here using driver

The above reads as: with the result of this call defined as driver, do things with driver.
Which is somewhat equivalent to:

driver = webdriver.Chrome("C:/chromedriver_win32/chromedriver")
# code goes here using driver

As far as I can tell, the difference is that the first method automatically closes the browser once it is done doing its job, whereas with the second one you have to call driver.close() explicitly to close the browser (or driver.quit() to end the whole session). If anyone knows other reasons, please let me know in the comments.
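To make the difference concrete without opening a real browser, here is a sketch that uses a stand-in class. FakeDriver is hypothetical and only mimics the context-manager behaviour. As far as I know, Selenium's real driver calls quit() when the with block exits, which also covers the case where the code inside the block raises an error.

```python
class FakeDriver:
    """Hypothetical stand-in for a browser driver; it opens no browser."""

    def __init__(self):
        self.closed = False

    def quit(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.quit()      # runs even if the block raised an exception
        return False     # do not swallow the exception


# First method: cleanup is automatic when the block ends.
with FakeDriver() as driver:
    pass  # code goes here using driver
print(driver.closed)

# Second method: the equivalent cleanup has to be written by hand.
manual = FakeDriver()
try:
    pass  # code goes here using manual
finally:
    manual.quit()
print(manual.closed)
```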

For this tutorial I will be using the first method.
For those new to Python, it is important to note that the following lines will be indented, as they're contained in the statement above, similar to how you would use curly braces to contain a function/class body in C++, JavaScript and other languages.

For the purpose of taking a screenshot, we will be using 3 commands: set_window_size(), get() and save_screenshot()

# takes two arguments, the width and height of the browser window; it has to be called before using get()
driver.set_window_size(width, height)
# takes one argument, which is the URL of the website you want to open
driver.get(url)
# takes one argument, which is the path and file name concatenated. Important: the file name should end with .png
driver.save_screenshot('/output-dir/file_name.png')

To learn more about these commands, head to the Webdriver API documentation

Let's put it all together. We need to loop through the list of links we created above and define some variables for the file name, width and height, plus the link with the https:// protocol:

with webdriver.Chrome("C:/chromedriver_win32/chromedriver") as driver:
    for link in links:
        desktop = {'output': str(link) + '-desktop.png', 'width': 2200, 'height': 1800}
        tablet = {'output': str(link) + '-tablet.png', 'width': 1200, 'height': 1400}
        mobile = {'output': str(link) + '-mobile.png', 'width': 680, 'height': 1200}
        linkWithProtocol = 'https://' + str(link)

We have defined 3 dictionaries that contain the output file name, width and height for every screen size we need. You can change the dimensions to suit your needs.
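Since the three dictionaries differ only in their numbers, one possible variation is to keep the sizes in a single table and build the per-link settings from it. The names SIZES and settings_for are my own for this sketch, not part of the original script; the dimensions are the ones used above.

```python
# Screen sizes from the post: name -> (width, height)
SIZES = {
    'desktop': (2200, 1800),
    'tablet': (1200, 1400),
    'mobile': (680, 1200),
}


def settings_for(link):
    """Build one settings dictionary per screen size for a bare domain."""
    return [
        {'output': link + '-' + name + '.png', 'width': width, 'height': height}
        for name, (width, height) in SIZES.items()
    ]


for settings in settings_for('dev.to'):
    print(settings)
```

Adding a fourth screen size then means adding one entry to the table rather than another dictionary and another block of commands.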

Now we can start using the commands explained above

        # set the window size for desktop
        driver.set_window_size(desktop['width'], desktop['height'])
        driver.get(linkWithProtocol)
        time.sleep(2)
        driver.save_screenshot(desktop['output'])

Notice that we have used time.sleep(2), which asks Python to delay the execution of the next line until 2 seconds have passed. Even without the sleep command, the driver waits for the page to finish loading before it takes the screenshot. However, if the page has any animation, you may want to wait for things to settle down. If that's not needed, you can get rid of the line.
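If a fixed delay feels wasteful, an alternative is to poll for a condition and stop waiting as soon as it holds. The helper below is a plain-Python sketch of that idea; wait_until and its parameters are hypothetical names, not a Selenium function. Selenium ships its own version of this pattern (WebDriverWait), which is worth a look in its documentation.

```python
import time


def wait_until(condition, timeout=5.0, interval=0.1):
    """Call condition() repeatedly until it returns True or timeout seconds pass.

    Returns True if the condition was met, False if we gave up.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return bool(condition())   # one last check at the deadline


# Example: a condition that becomes true after about half a second.
start = time.monotonic()
ready = wait_until(lambda: time.monotonic() - start > 0.5, timeout=2.0)
print(ready)
```

With a real driver the condition would inspect the page, for example through driver.execute_script(); what to check depends on the animation you are waiting for.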

And this is what the tablet and mobile versions look like:

        # set the window size for tablet
        driver.set_window_size(tablet['width'], tablet['height'])
        driver.get(linkWithProtocol)
        time.sleep(2)
        driver.save_screenshot(tablet['output'])

        # set the window size for mobile
        driver.set_window_size(mobile['width'], mobile['height'])
        driver.get(linkWithProtocol)
        time.sleep(2)
        driver.save_screenshot(mobile['output'])

And with that, all you need to do is run the script from the command line

python script.py

This opens the Chrome browser, resizes it and takes a screenshot for each link and screen size, then saves each file in the same directory as script.py.
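For reference, the steps above can be folded into one function. This is a sketch under a few assumptions: the function name take_screenshots, the delay parameter and the RecordingDriver stand-in are mine, and the stand-in only records calls so the example runs without Chrome. With the real thing you would pass the driver from the with statement instead.

```python
import time

SIZES = {'desktop': (2200, 1800), 'tablet': (1200, 1400), 'mobile': (680, 1200)}


def take_screenshots(driver, links, delay=2):
    """Open every link at every screen size and save a screenshot of each."""
    saved = []
    for link in links:
        for name, (width, height) in SIZES.items():
            output = link + '-' + name + '.png'
            driver.set_window_size(width, height)   # resize before loading
            driver.get('https://' + link)
            time.sleep(delay)                       # let animations settle
            driver.save_screenshot(output)
            saved.append(output)
    return saved


class RecordingDriver:
    """Hypothetical stand-in that records calls instead of driving a browser."""

    def __init__(self):
        self.calls = []

    def set_window_size(self, width, height):
        self.calls.append(('set_window_size', width, height))

    def get(self, url):
        self.calls.append(('get', url))

    def save_screenshot(self, path):
        self.calls.append(('save_screenshot', path))
        return True


driver = RecordingDriver()
print(take_screenshots(driver, ['dev.to', 'github.com'], delay=0))
```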

The whole code put together can be found here on GitHub

Final note

Selenium is capable of great things: it can manipulate pages with CSS and JavaScript, fill out forms and submit them, and more. Learn it and put it to good use.

How to use Selenium with Headless Chrome

It can be annoying to have the browser open and close so many times while you're trying to finish other work. The solution is to use Headless Chrome.

Not much will change from the code above.
We need to define our ChromeOptions() and add an argument to it using the add_argument() command as follows:

options = webdriver.ChromeOptions()  # define options
options.add_argument("headless")  # pass headless argument to the options

Then we will pass a new argument to the command we defined before:

with webdriver.Chrome("C:/chromedriver_win32/chromedriver") as driver:
    # code here

becomes

with webdriver.Chrome("C:/chromedriver_win32/chromedriver", chrome_options=options) as driver:
    # code here

This is all you will need on Mac and Windows, both of which I have tested. Linux desktop has not been tested, but I assume it will work as well.

Linux Server through SSH

Things work differently between Linux desktop and server. The latter has no screen, and I am sure it is stripped of many drivers it does not need, so we have to pass a few more arguments to make it work.
As I have not tested this on Linux desktop, if the solution above does not work there, treat it the way we treat the Linux server below.

First install Chrome Browser (I never thought I'd need it here)

sudo apt-get install -y chromium-browser  # BTW I use Ubuntu (says a non-Arch user)

Then we add these arguments

options = webdriver.ChromeOptions()  # define options
options.add_argument("headless")  # pass headless argument to the options
options.binary_location = '/usr/bin/chromium-browser'  # location of the Chrome Browser binary
options.add_argument("disable-infobars")  # disabling infobars
options.add_argument("--disable-extensions")  # disabling extensions
options.add_argument("--disable-gpu")  # applicable to windows os only
options.add_argument("--disable-dev-shm-usage")  # overcome limited resource problems
options.add_argument("--no-sandbox")  # Bypass OS security model
# thanks to https://stackoverflow.com/a/50642913/2291648, this answer helped debug the issue

And just like we did above, pass chrome_options=options to webdriver.Chrome(..., chrome_options=options) as a second argument.
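One way to keep the desktop and server variants in a single script is to build the argument list from a flag. The sketch below is illustrative: chrome_arguments, apply_arguments and FakeOptions are hypothetical names, and FakeOptions stands in for webdriver.ChromeOptions() so the example runs without Selenium installed. The flags themselves are the ones listed above.

```python
def chrome_arguments(on_server=False):
    """Return the Chrome arguments from the post for desktop or server use."""
    arguments = ['headless']
    if on_server:
        arguments += [
            'disable-infobars',
            '--disable-extensions',
            '--disable-gpu',
            '--disable-dev-shm-usage',
            '--no-sandbox',
        ]
    return arguments


def apply_arguments(options, on_server=False):
    """Add the arguments to any object that offers add_argument()."""
    for argument in chrome_arguments(on_server):
        options.add_argument(argument)
    if on_server:
        # Location of the Chromium binary installed with apt-get above
        options.binary_location = '/usr/bin/chromium-browser'
    return options


class FakeOptions:
    """Hypothetical stand-in for webdriver.ChromeOptions()."""

    def __init__(self):
        self.arguments = []
        self.binary_location = ''

    def add_argument(self, argument):
        self.arguments.append(argument)


options = apply_arguments(FakeOptions(), on_server=True)
print(options.arguments)
print(options.binary_location)
```

In a real script you would pass webdriver.ChromeOptions() instead of FakeOptions() and hand the result to webdriver.Chrome as shown above.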

This worked for me, and I hope it does for you as well.

Github link (updated repo)

Top comments (10)

rhymes

Hey Bilal, nice overview. I noted something odd that I wanted to ask you about. The module time is in the standard library as time, and so is time.sleep(). I'm not sure what you're installing through pip. I checked on PyPI but it's not clear.

BK ☕

Hello rhymes, thank you for noticing that. I double checked, and indeed the time module is part of the standard library: docs.python.org/3/library/time.html
I have edited the post. Your feedback is appreciated.

Suhail

Thanks a lot, Bilal for sharing this. Curious to know if this solution is OS dependent.

BK ☕

Hello Suhail, you're very welcome.
I am on my Mac right now, so I had the chance to do a quick test.
All I had to do was download the Chrome Webdriver for macOS from the link provided in the post and install selenium with pip. Of course I had to change the path to the webdriver. I ran the Python script and it worked with no additional tweaking.

Another tip, if you'd like to run the job with Chrome headless:

# imports here
# links = [array of links]
options = webdriver.ChromeOptions()
options.add_argument('headless')
with webdriver.Chrome('/Users/userName/chromedriver', chrome_options=options) as driver:
    # code here

I tested headless mode on macOS, and I am trying to test it on Ubuntu on a remote server. I'll update the post if I succeed.

Suhail

Nice work Bilal, Thanks a lot for the effort.

BK ☕

Suhail, I have updated my post if you're interested in Headless Chrome.

Suhail

Thanks a ton Bilal :)

Mutale85

Using the solutions you provided here, I have been able to add a few things.

  1. Get URLs from a MySQL database
  2. Skip broken links or URLs that return no response

Thanks a lot, this has been my foundation for starting with Selenium. My code is hosted here: mutamuls.medium.com/screenshot-url...
 
Sambat LIM

for new version of chrome webdriver, you should change your url from "google.com" to "google.com".

Mutale85

Hello Sir. Thanks for this tutorial. I have learnt a lot.
