Karthik Subramanian for AWS Community Builders

Originally published at Medium

Web Scraping with Selenium & AWS Lambda

In my last post I created a Lambda function that accepts a request, stores it in a DynamoDB table, and sends a message to an SQS queue.

Let’s now create another Lambda function that reads from that queue and processes the request by scraping the URL with Selenium.

Installing Selenium

Create a new file under src called “chrome-deps.txt” and copy the following into it -

acl adwaita-cursor-theme adwaita-icon-theme alsa-lib at-spi2-atk at-spi2-core
atk avahi-libs cairo cairo-gobject colord-libs cryptsetup-libs cups-libs dbus
dbus-libs dconf desktop-file-utils device-mapper device-mapper-libs elfutils-default-yama-scope
elfutils-libs emacs-filesystem fribidi gdk-pixbuf2 glib-networking gnutls graphite2
gsettings-desktop-schemas gtk-update-icon-cache gtk3 harfbuzz hicolor-icon-theme hwdata jasper-libs
jbigkit-libs json-glib kmod kmod-libs lcms2 libX11 libX11-common libXau libXcomposite libXcursor libXdamage
libXext libXfixes libXft libXi libXinerama libXrandr libXrender libXtst libXxf86vm libdrm libepoxy
liberation-fonts liberation-fonts-common liberation-mono-fonts liberation-narrow-fonts liberation-sans-fonts
liberation-serif-fonts libfdisk libglvnd libglvnd-egl libglvnd-glx libgusb libidn libjpeg-turbo libmodman
libpciaccess libproxy libsemanage libsmartcols libsoup libthai libtiff libusbx libutempter libwayland-client
libwayland-cursor libwayland-egl libwayland-server libxcb libxkbcommon libxshmfence lz4 mesa-libEGL mesa-libGL
mesa-libgbm mesa-libglapi nettle pango pixman qrencode-libs rest shadow-utils systemd systemd-libs trousers ustr
util-linux vulkan vulkan-filesystem wget which xdg-utils xkeyboard-config

Create another file called “install-browser.sh” and copy the following -

#!/bin/bash
echo "Downloading Chromium..."
curl "https://www.googleapis.com/download/storage/v1/b/chromium-browser-snapshots/o/Linux_x64%2F$CHROMIUM_VERSION%2Fchrome-linux.zip?generation=1652397748160413&alt=media" > /tmp/chromium.zip
unzip /tmp/chromium.zip -d /tmp/
mv /tmp/chrome-linux/ /opt/chrome
curl "https://www.googleapis.com/download/storage/v1/b/chromium-browser-snapshots/o/Linux_x64%2F$CHROMIUM_VERSION%2Fchromedriver_linux64.zip?generation=1652397753719852&alt=media" > /tmp/chromedriver_linux64.zip
unzip /tmp/chromedriver_linux64.zip -d /tmp/
mv /tmp/chromedriver_linux64/chromedriver /opt/chromedriver

Update the Dockerfile to look like this -

FROM public.ecr.aws/lambda/python:3.9 as stage
# Hack to install chromium dependencies
RUN yum install -y -q sudo unzip
# Current stable version of Chromium
ENV CHROMIUM_VERSION=1002910
# Install Chromium
COPY install-browser.sh /tmp/
RUN /usr/bin/bash /tmp/install-browser.sh

FROM public.ecr.aws/lambda/python:3.9 as base
COPY chrome-deps.txt /tmp/
RUN yum install -y $(cat /tmp/chrome-deps.txt)
COPY --from=stage /opt/chrome /opt/chrome
COPY --from=stage /opt/chromedriver /opt/chromedriver
COPY create.py ${LAMBDA_TASK_ROOT}
COPY process.py ${LAMBDA_TASK_ROOT}
COPY requirements.txt ${LAMBDA_TASK_ROOT}
COPY db/ ${LAMBDA_TASK_ROOT}/db/
RUN python3.9 -m pip install -r requirements.txt -t .

Add the following line to the requirements.txt file -

selenium==4.4.2

Then install the dependency locally -

pip install -r src/requirements.txt
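
With the image built as above, Chromium ends up under /opt/chrome and ChromeDriver at /opt/chromedriver. Here is a minimal sketch of how Selenium 4 could be pointed at those two paths. The set of Chromium flags and the helper names (chrome_arguments, build_driver, scrape_title) are my own assumptions for illustration, not something the image or Selenium requires.

```python
# Paths produced by install-browser.sh and copied over in the Dockerfile.
CHROME_BINARY = "/opt/chrome/chrome"
CHROMEDRIVER_PATH = "/opt/chromedriver"


def chrome_arguments():
    """Flags often used to run Chromium headless in a container (an assumed set)."""
    return [
        "--headless",
        "--no-sandbox",
        "--disable-gpu",
        "--disable-dev-shm-usage",
        "--single-process",
    ]


def build_driver():
    """Create a headless Chrome WebDriver from the binaries baked into the image."""
    # Imported lazily so this module still loads where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    options.binary_location = CHROME_BINARY
    for argument in chrome_arguments():
        options.add_argument(argument)
    return webdriver.Chrome(service=Service(CHROMEDRIVER_PATH), options=options)


def scrape_title(url):
    """Load the page and return its title, always shutting the browser down."""
    driver = build_driver()
    try:
        driver.get(url)
        return driver.title
    finally:
        driver.quit()
```

Calling scrape_title("https://example.com") inside the container should return that page's title. Quitting the driver in a finally block keeps a failed page load from leaving a Chromium process behind.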

Process the request

Create a new file under src for the new lambda function called “process.py”
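
As a rough sketch of the shape such a function might take (not the original listing), the code below walks the SQS records in the incoming event, scrapes each URL and records the result. The message fields "id" and "url", and the injected scrape and mark_complete callables, are hypothetical names chosen for this example. In the real function they would be backed by Selenium and by the DynamoDB code from the previous post.

```python
import json


def extract_requests(event):
    """Return the decoded JSON body of every SQS record in the event."""
    return [json.loads(record["body"]) for record in event.get("Records", [])]


def process_event(event, scrape, mark_complete):
    """Scrape each requested URL and record the outcome.

    scrape(url) returns the scraped data and mark_complete(request_id, result)
    persists it. Both are passed in so the loop can be run without a browser
    or a database.
    """
    processed = []
    for request in extract_requests(event):
        # "id" and "url" are assumed field names for the queued message.
        result = scrape(request["url"])
        mark_complete(request["id"], result)
        processed.append(request["id"])
    return processed


if __name__ == "__main__":
    sample = {
        "Records": [
            {"body": json.dumps({"id": "abc123", "url": "https://example.com"})}
        ]
    }
    done = process_event(
        sample,
        scrape=lambda url: f"scraped {url}",
        mark_complete=lambda request_id, result: None,
    )
    print(done)  # prints ['abc123']
```

Passing the scraper and the database update in as arguments is a design choice for this sketch. It lets the queue-handling logic be exercised locally before Chromium is involved at all.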



Finally, modify the template.yaml file to tell SAM about the new Lambda function -



Since we created a new Lambda function, we need to tell AWS where to grab the image from. Modify the samconfig.toml file and add another entry to the image_repositories array for ProcessFunction, with exactly the same value as the one for CreateFunction. So if the row looked like this before -

image_repositories = ["CreateFunction=541434768954.dkr.ecr.us-east-2.amazonaws.com/serverlessarchexample8b9687a4/createfunction286a02c8repo"]

It should now look like this -

image_repositories = ["CreateFunction=541434768954.dkr.ecr.us-east-2.amazonaws.com/serverlessarchexample8b9687a4/createfunction286a02c8repo","ProcessFunction=541434768954.dkr.ecr.us-east-2.amazonaws.com/serverlessarchexample8b9687a4/createfunction286a02c8repo"]

Test the changes

Build the app -

sam build

To mimic receiving an event from the queue, we invoke the Lambda function locally and pass it a sample payload.

Under the events directory, update the contents of the event.json file -
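
For reference, a Lambda function triggered by SQS receives an object with a Records list, where each record carries the message text in its body field. The sketch below builds such a payload and prints it as JSON that could be pasted into event.json. The message fields "id" and "url" and the placeholder identifiers are assumptions made for this example. The body should match whatever the create function actually puts on the queue.

```python
import json


def build_sqs_event(message):
    """Wrap a message dict in the envelope Lambda receives from SQS."""
    return {
        "Records": [
            {
                # Placeholder values, good enough for a local invocation.
                "messageId": "00000000-0000-0000-0000-000000000000",
                "receiptHandle": "local-test",
                # SQS delivers the body as a string, so the message is JSON-encoded.
                "body": json.dumps(message),
                "attributes": {},
                "messageAttributes": {},
                "eventSource": "aws:sqs",
                "awsRegion": "us-east-2",
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical message shape: a request id plus the URL to scrape.
    event = build_sqs_event({"id": "abc123", "url": "https://example.com"})
    print(json.dumps(event, indent=2))
```

Note that the body is a JSON string nested inside the JSON document, so the function has to decode it a second time when it reads the record.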



Now we run the app locally with the following command -

sam local invoke --env-vars ./tests/env.json -e ./events/event.json ProcessFunction

The output should look like -

[Image: SAM output]

Check the local dynamodb table to verify that the request was marked complete -

[Image: DynamoDB table]

Deploying the changes

Deploy the changes to AWS with the following command -

sam deploy

The output should look like this -

[Image: SAM deploy output]

Just like before, test the changes by triggering a request from Postman and validating the data in the DynamoDB table -

[Image: DynamoDB table]

You’ll notice that the message from the last test was also processed successfully.

Source Code

Here is the source code for the project created here.

Next: Part 5: Writing a CSV to S3 from AWS Lambda
