DEV Community
Dennis O'Keeffe

Posted on • Originally published at blog.dennisokeeffe.com

     

Scraping websites with Xray

In this short post, we're going to scrape the website that this blog is hosted on to get all the links and posts back using Node.js and Xray.

Setup

We are going to keep things super minimal and bare. We just want a proof of concept on how to scrape the data from the rendered website HTML.

mkdir hello-xray
cd hello-xray
yarn init -y
yarn add x-ray
touch index.js

Scraping the website

Going to the blog and inspecting it with the Developer Tools, we can see that there aren't many classes to work with, but we can still use element selectors to decide how we are going to get the information back.

The website with developer tools

Open the index.js file and add the following:

const Xray = require("x-ray")

// Scrape the blog index and resolve with the post metadata.
function getPosts(url = "https://blog.dennisokeeffe.com/") {
  const x = Xray()
  return new Promise((resolve, reject) => {
    x(`${url}`, "main:last-child", {
      items: x("div", [
        {
          title: "h3 > a",
          description: "p",
          link: "h3 > a@href", // @href reads the attribute instead of the text
          date: "small",
        },
      ]),
    })((err, data) => {
      if (err) {
        // Return early so resolve is not also called after a rejection
        return reject(err)
      }
      resolve(data)
    })
  })
}

const main = async () => {
  const posts = await getPosts()
  console.log(posts)
}

main()

In the above script, we are simply running a main function that calls getPosts and waits for the Promise to resolve before logging out the results.
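As an aside, the hand-written Promise wrapper could probably be replaced with Node's util.promisify, since the scraper is invoked with a Node-style (err, data) callback. The following is only a sketch: fakeScrape is a hypothetical stand-in for the function that x(...) returns, so the example runs without the library or network access.

```javascript
const { promisify } = require("util")

// Hypothetical stand-in for the function returned by x(url, scope, selector).
// Assumption: like the real call in the script, it takes an (err, data) callback.
function fakeScrape(callback) {
  setImmediate(() => callback(null, { items: [] }))
}

// promisify turns the callback-style function into one that returns a Promise
const scrape = promisify(fakeScrape)

const main = async () => {
  const posts = await scrape()
  console.log(posts)
  return posts
}

main()
```

With the real library, the function passed to promisify would be the one that x(...) returns inside getPosts, assuming it keeps the callback convention shown in the script.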

The important part of the code comes from within the getPosts function:

x(`${url}`, "main:last-child", {
  items: x("div", [
    {
      title: "h3 > a",
      description: "p",
      link: "h3 > a@href",
      date: "small",
    },
  ]),
})((err, data) => {
  if (err) {
    // Return early so resolve is not also called after a rejection
    return reject(err)
  }
  resolve(data)
})

The x function fetches the blog URL, then scopes the scrape with main:last-child, which matches a main element that is the last child of its parent. You can see that element in the HTML DOM from the image shared above.

We are telling Xray to return an array of items, with one entry for each div inside that scope, shaped like the object we pass. In our case, I am using standard selectors to grab the title, description and date, but am using the extra @href helper with the link to fetch the URL to the blog post!
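To make the selector@attribute syntax concrete, here is a rough sketch of how such a string could be split into its two parts. The splitSelector helper is hypothetical and written only for illustration; it is not Xray's own parser. When no attribute is given, Xray's documented default is to select the element's text, which the sketch represents with null.

```javascript
// Hypothetical helper (not part of x-ray): splits "selector@attribute"
// into the CSS selector and the attribute name.
function splitSelector(spec) {
  const at = spec.lastIndexOf("@")
  if (at === -1) {
    // No attribute requested, so the element's text would be used
    return { selector: spec.trim(), attribute: null }
  }
  return {
    selector: spec.slice(0, at).trim(),
    attribute: spec.slice(at + 1).trim(),
  }
}

console.log(splitSelector("h3 > a"))
console.log(splitSelector("h3 > a@href"))
```

So title and link target the same h3 > a element; the only difference is that link asks for the href attribute rather than the text.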

That's it! Let's run the scraper now using node index.js.

Result

Perfect! Now you can take these same short tips and apply them to anything you need to scrape down the track. Looking for alternatives or want to use automation? You should also check out Puppeteer or Playwright (added to the resource links).
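One small follow-up you may want when reusing this approach: because the inner x("div", ...) call matches every div inside the scope, some returned items could be missing fields. Whether that happens depends on the page's markup, so treat it as an assumption. The sketch below filters such items out; the sample data and the keepCompletePosts helper are hypothetical placeholders, not real scraper output.

```javascript
// Hypothetical data shaped like the scraper's result; the values are placeholders.
const sample = {
  items: [
    {
      title: "Example post",
      description: "Placeholder text",
      link: "https://example.com/post/",
      date: "Placeholder date",
    },
    { description: "Text from a div that has no heading" },
  ],
}

// Keep only the items that have both a title and a link.
function keepCompletePosts(data) {
  return data.items.filter((item) => Boolean(item.title && item.link))
}

console.log(keepCompletePosts(sample))
```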

Resources and Further Reading

  1. GitHub - Xray
  2. GitHub - Puppeteer
  3. GitHub - Playwright
  4. Completed project

Originally posted on my blog. Follow me on Twitter for more hidden gems @dennisokeeffe92.

Top comments (1)

Functional Javascript
Nice one Dennis.
I tested it out and it works.

I converted it from an explicit Promise idiom to an async-await idiom....

const Xray = require('x-ray');

//util
const lpromise = p => p.then(o => console.log(o.items));

/*
@func
retrieve posts using xray

@typedef {{items: string[]}} itemsObj
@return {Promise<itemsObj>}
*/
const getPosts = async () => {
  const x = Xray();
  const url = "https://blog.dennisokeeffe.com";
  try {
    return await x(url, "main:last-child", {
      items: x("div", [
        {
          title: "h3 > a",
          description: "p",
          link: "h3 > a@href",
          date: "small",
        },
      ]),
    });
  } catch (err) {
    console.error(err);
  }
};

//@tests
lpromise(getPosts());
