Discussion on: Working alone is so exhausting so I created my own assistant


Replies for: Thank you Lionel-rowe! ( Still confused lol)

The parent Puppeteer script runs in the Node.js runtime, whereas the callback passed to page.evaluate runs in the Chromium browser runtime, which is headless by default ("headless" basically just means that it runs in the background, with no visible window). Passing complex data between runtimes is often not possible, because one runtime doesn't know how to interpret the other's internal objects, and DOM elements (the DOM is the way the browser represents HTML) are internally very complex. So to simplify the message passing, Puppeteer uses a serialized format that both runtimes can easily understand. The drawback is that any data that can't be converted to this serialized format is lost.

You can think of "serialized" as meaning something like flat, like a string of letters or binary digits. JSON is a typical serialization format and is useful because it allows the "flattening" of "deep" structures. For example, the JavaScript object { a: { b: 1 } } nests b within a, yet it can be serialized to the JSON string {"a":{"b":1}}. Why is this flat? Well, it's simply the character {, followed by ", followed by a, etc., so it can be read left to right, even though the object it represents is a tree structure.
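To make the "flattening" concrete, here's a minimal sketch you can run in plain Node.js, no Puppeteer needed. The variable names (nested, flat, rebuilt) are just mine for illustration.

```javascript
// The nested ("deep") object from the example above.
const nested = { a: { b: 1 } }

// Serialize: the tree becomes one flat string of characters.
const flat = JSON.stringify(nested)
console.log(flat) // {"a":{"b":1}}
console.log(typeof flat) // string

// Deserialize: the receiving side rebuilds an equivalent tree from the string.
const rebuilt = JSON.parse(flat)
console.log(rebuilt.a.b) // 1
console.log(rebuilt === nested) // false: it's a copy, not the same object
```

That last line is the important bit: what arrives on the other side is always a fresh copy built from the string, never the original object.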

Puppeteer does much of this JSON serialization "under the hood", so you often don't need to worry about it. But JSON can't serialize DOM nodes, because they contain circular structures, e.g. *{ a: { b: *{ a: ... } } } (where * represents a reference to the exact same object). So you need to return only things that JSON can represent: strings, numbers, booleans, null, arrays, and objects containing other JSON-able values.
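You can see both problems in plain Node.js with JSON.stringify. Treat this as a rough stand-in only: I'm assuming Puppeteer's internal serialization fails on the same kinds of values, but it may report the failure differently (for example, by giving you undefined instead of throwing). The names a, circularError, and lossy are made up for the sketch.

```javascript
// Build the circular shape described above: a.b.a points back to a itself.
const a = { b: {} }
a.b.a = a

// JSON.stringify refuses to serialize a structure that contains itself.
let circularError = null
try {
    JSON.stringify(a)
} catch (err) {
    circularError = err
}
console.log(circularError instanceof TypeError) // true

// Values JSON can't represent (functions, undefined) are silently dropped.
const lossy = JSON.parse(JSON.stringify({ n: 1, fn() {}, missing: undefined }))
console.log(Object.keys(lossy)) // [ 'n' ]
```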

const elementData = await page.evaluate(() => {
    const el = document.querySelector('p.fs-xs') // the <p> element shown in the output below

    return {
        textContent: el.textContent, // string — OK
        childElementCount: el.childElementCount, // number — OK
        className: el.className, // string — OK
        outerHTML: el.outerHTML, // string — OK
    }
})

console.log(elementData)
// {
//     textContent: 'Posted on Mar 15',
//     childElementCount: 1,
//     className: 'fs-xs color-base-60',
//     outerHTML: '<p class="fs-xs color-base-60">Posted on <time datetime="2022-03-15T02:18:47Z" class="date-no-year" title="Tuesday, March 15, 2022, 2:18:47 AM">Mar 15</time></p>',
// }
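If you find yourself copying lots of properties by hand like this, one option is a small helper that keeps only the primitive values. This is just a sketch: pickPrimitives is a hypothetical name of mine, not a Puppeteer API, and fakeElement is a plain-object stand-in for a DOM element so the example runs in Node.js without a browser.

```javascript
// Hypothetical helper: copy only the named properties whose values are
// strings, numbers, booleans, or null. Everything else is skipped.
function pickPrimitives(source, keys) {
    const result = {}
    for (const key of keys) {
        const value = source[key]
        if (value === null || ['string', 'number', 'boolean'].includes(typeof value)) {
            result[key] = value
        }
    }
    return result
}

// Stand-in for a DOM element (an assumption, so this runs without a browser).
const fakeElement = {
    textContent: 'Posted on Mar 15',
    childElementCount: 1,
    parentNode: {}, // object: skipped by the helper
    click() {}, // function: skipped by the helper
}
fakeElement.parentNode.firstChild = fakeElement // circular, like the real DOM

const picked = pickPrimitives(fakeElement, ['textContent', 'childElementCount', 'parentNode', 'click'])
console.log(JSON.stringify(picked)) // {"textContent":"Posted on Mar 15","childElementCount":1}
```

One catch if you try this with Puppeteer: the callback to page.evaluate runs in the browser, so a helper like this would have to be defined inside that callback, not in the surrounding Node.js script.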

Min • Mar 18 '22

Omg @lionelrowe
A big thank you for explaining this in such kind detail!
This is really easy to understand! You are the best!👍👍👍👍👍