DEV Community
Artur Daschevici


Sometimes things simply don't work

As I have previously mentioned, I am rather fond of puppeteer. It's a useful library for all kinds of web automation, but like any open source project it needs some TLC.

I am not in any way associated with the developers of puppeteer, but if you are looking for a way to contribute, the project is open source.

The frustration

I was looking at a rather long page (think vertically) and tried to take a screenshot of it. The optimist in me assumed it would simply work, so I went on as usual and planned my approach on the assumption that it would function as intended.

I checked the screenshot and found that it was a tiled image of a fixed-size crop from the top of the page. My first reaction was frustration, but I think it was aimed more at myself for not having allowed any margin for error in the experiment.

The insight

There is no reason to point fingers when something is not working, especially in OSS. If you have the chops, fix it for yourself and share it; if it is good enough, it might get adopted upstream. In other words, perfect is the enemy of good.

The bug

Before focusing on hacking my way out of the jam I scoured the web, as problems are usually not as unique as one might think. I am ashamed to admit it, but I'm not fond of documentation, and digging into the docs of the different related projects is the last step in my debugging journey.

I found that this was related to an old, still open bug in the puppeteer repo.

The discussion was ongoing until quite recently, but the bug is still open.

The consensus I could gather is to either use playwright or use a workaround to solve it in the puppeteer layer. The root cause of the bug is a websocket size limitation in CDP (the Chrome DevTools Protocol) for chromium.

I had intended to use playwright, but in some of my tests it failed to load some pages, so I decided to revisit the puppeteer idea and solve the issue where I could.

Hacking my way through it

I started with a height-based chunking method. To make it more generic, I wrote a chunker that returns a function, so that the chunk height is configurable via a parameter.

    // return a chunker function with the height for each chunk
    // number will be the full height of the element you want to
    // grab a screenshot of
    const chunkBy = (n) => (number) => {
      const chunks = new Array(Math.floor(number / n))
        .fill(n)
        .map((c, i) => ({ height: c, start: i * c }));
      // height already covered by the full-size chunks; computing it this way
      // also works when the element is shorter than one chunk
      const covered = chunks.length * n;
      const remainder = number - covered;
      if (remainder > 0) {
        chunks.push({ height: remainder, start: covered });
      }
      console.log('CHUNKS =', chunks);
      return chunks;
    };
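
To make the chunker's output concrete, here is a small self-contained sketch. `chunkHeights` is a stand-in name for the same idea as `chunkBy` (without the logging), and the 9500px element height is made up for illustration.

```javascript
// Stand-in for the chunker described above: same idea, no logging.
const chunkHeights = (n) => (total) => {
  const full = Math.floor(total / n);
  const chunks = Array.from({ length: full }, (_, i) => ({ height: n, start: i * n }));
  const remainder = total - full * n;
  if (remainder > 0) {
    chunks.push({ height: remainder, start: full * n });
  }
  return chunks;
};

const chunkBy4k = chunkHeights(4000);

// A hypothetical element that is 9500px tall.
const parts = chunkBy4k(9500);
console.log(parts);
// [ { height: 4000, start: 0 },
//   { height: 4000, start: 4000 },
//   { height: 1500, start: 8000 } ]

// Sanity check: the chunk heights add up to the full height.
const covered = parts.reduce((sum, p) => sum + p.height, 0);
console.log(covered === 9500); // true
```

Each `start` is where the previous chunk ended, so the clips tile the element without gaps or overlap.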

Afterwards I wrote the method for grabbing the screenshot so that it works regardless of the element's height, which works around the CDP limitation.

    // imports assumed by this snippet (CommonJS shown; adjust for ES modules)
    const crypto = require('crypto');
    const { readFile } = require('fs/promises');
    const puppeteer = require('puppeteer');
    // mergeImg is assumed to come from the merge-img package
    const mergeImg = require('merge-img');

    // chunker with a fixed chunk height of 4000
    const chunkBy4k = chunkBy(4000);

    async function grabSelectorScreenshot() {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3');
      const results = [];
      // urls is a list of string urls, defined elsewhere in the script
      for (const url of urls) {
        const hashed = crypto.createHash('sha256').update(url).digest('hex');
        await page.goto(url, { waitUntil: 'networkidle0' });
        // this is where the element is selected
        const element = await page.$('div#document1 div.eli-container');
        // get height and width for later iterating
        const { width, height } = await element.boundingBox();
        const designatedPathPng = `./screenshots/${hashed}-merged-ss.png`;
        // chunk by 4000 height
        const heights = chunkBy4k(height);
        // keep track of starting point and height
        // to have continuous mapping of the image
        const chunks = heights.map((h, i) => {
          return element.screenshot({
            clip: { x: 0, y: h.start, height: h.height, width },
            path: `./screenshots/${hashed}-${i}-ss.png`,
          });
        });
        // wait for all the part files to be written
        const filesResolved = await Promise.all(chunks);
        // merge all the parts in a vertical layout
        const mergedImage = await mergeImg(filesResolved, { direction: true });
        // this is interesting, the merged image is a promise,
        // but the write only worked via a function callback,
        // so it is wrapped in a promise to keep the loop sequential
        await new Promise((resolve, reject) => {
          mergedImage.write(designatedPathPng, (err) => (err ? reject(err) : resolve()));
        });
        const dataPng = await readFile(designatedPathPng);
        results.push(Buffer.from(dataPng).toString('base64'));
        // clean up the temporary files created
        await deleteFilesMatchingPattern('./screenshots', new RegExp(`^${hashed}-\\d+-ss\\.png$`));
      }
      // close the browser only after every url has been processed
      await browser.close();
      // one base64 encoded PNG per url
      return results;
    }
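
The comment about `write` only accepting a callback points at a common pattern: wrapping a callback-style method in a promise so it can be awaited and its result returned to the caller. Below is a minimal sketch, assuming an object with a `write(path, callback)` method like the merged image above. `writeAsync` and `fakeImage` are hypothetical names, and `fakeImage` only stands in for the real image object so the snippet runs without any dependencies.

```javascript
// Wrap a callback-style write(path, cb) in a promise.
// Assumes cb receives an error as its first argument on failure.
function writeAsync(image, filePath) {
  return new Promise((resolve, reject) => {
    image.write(filePath, (err) => (err ? reject(err) : resolve(filePath)));
  });
}

// Hypothetical stand-in for the merged image: it records the path it was
// asked to write and calls back asynchronously without touching the disk.
const fakeImage = {
  written: [],
  write(filePath, cb) {
    this.written.push(filePath);
    setImmediate(() => cb(null));
  },
};

async function demo() {
  const out = await writeAsync(fakeImage, './screenshots/example-merged-ss.png');
  console.log('written to', out);
  return out;
}

const demoResult = demo();
```

With the write awaited like this, anything that depends on the file (reading it back, cleaning up, closing the browser) can simply follow on the next line.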

Cleaning up temporary files

You probably want to clean up the files. One way to do that:

    // imports assumed by this snippet (fs is the promise-based API)
    const path = require('path');
    const fs = require('fs/promises');
    const { readdir } = fs;

    async function deleteFilesMatchingPattern(dirPath, regex) {
      try {
        // Read all files in the directory
        const files = await readdir(dirPath);
        for (const file of files) {
          // Check if the file matches the pattern
          if (regex.test(file)) {
            const filePath = path.join(dirPath, file);
            // Delete the file
            await fs.unlink(filePath);
            console.log(`Deleted: ${filePath}`);
          }
        }
      } catch (error) {
        console.error('Error:', error);
      }
    }
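
To see which files the cleanup pattern would touch, the regular expression can be tried against a few file names first. This is a small sketch; the `hashed` value and the file names are made up for illustration.

```javascript
// Made-up stand-in for the sha256 hex digest of a URL.
const hashed = 'abc123';
const partPattern = new RegExp(`^${hashed}-\\d+-ss\\.png$`);

const fileNames = [
  'abc123-0-ss.png',      // part file
  'abc123-12-ss.png',     // part file
  'abc123-merged-ss.png', // merged result, should be kept
  'def456-0-ss.png',      // part file that belongs to another URL
];

const toDelete = fileNames.filter((name) => partPattern.test(name));
console.log(toDelete); // [ 'abc123-0-ss.png', 'abc123-12-ss.png' ]
```

Only the numbered parts for the current hash match, so the merged image and the files of other URLs are left alone.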

In hindsight, a better way to do this is probably to use actual tmp files and decouple the cleanup, but this was good enough for a barebones script.
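
The temporary-file idea could look roughly like this using only Node's built-in modules. The post may have had a dedicated tmp library in mind; this sketch swaps in `fs.mkdtemp` instead. `withTempDir` is a hypothetical helper name, and the placeholder file stands in for the screenshot parts.

```javascript
const fsp = require('fs/promises');
const os = require('os');
const path = require('path');

// Run `work` with a fresh temporary directory and remove the directory
// afterwards, whether or not `work` throws.
async function withTempDir(work) {
  const dir = await fsp.mkdtemp(path.join(os.tmpdir(), 'screenshots-'));
  try {
    return await work(dir);
  } finally {
    await fsp.rm(dir, { recursive: true, force: true });
  }
}

// Usage: write a placeholder part file, read it back as base64,
// and let the helper take care of the cleanup.
const tempDemo = withTempDir(async (dir) => {
  const partPath = path.join(dir, 'part-0.png');
  await fsp.writeFile(partPath, 'placeholder');
  const data = await fsp.readFile(partPath);
  return { dir, base64: data.toString('base64') };
});

tempDemo.then(({ base64 }) => console.log(base64));
```

Because the cleanup lives in the helper's `finally` block, the screenshot code no longer needs to know a file-name pattern in order to delete its own leftovers.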

Conclusion

  • OSS needs some TLC
  • problems are rarely unique
  • it's better to hack at it and unblock yourself; switching libraries is more of a PITA, as there are no guarantees

AnyStack developer. Enjoy JavaScript, Python, Go and Rust. Bullish on AI, RAGs and Knowledge Graphs
  • Location
    Zurich
  • Work
    Wannabe Indie Hacker