DEV Community

chanon-mike


Extract text and images from an embedded Google Slides iframe with a Plasmo extension

In this post, I extract the text and images from an embedded Google Slides iframe using TypeScript.
You could run the code as a standalone script, but here I'm using Plasmo, a React-based extension framework.

I built this because taking notes for every class at school is tiring. I made it an extension because a download button can be added directly to the website, without switching tabs and running a script every time.

Environment setup

Follow the official Plasmo documentation to install the dependencies. If you don't have Node.js installed yet, install it first.

The Plasmo docs strongly recommend pnpm for package management, so that is what I use here. (If you don't have it, follow the pnpm documentation.)

I like to keep the code in a src directory, so I include the --with-src flag.

# Fill out the project description and install dependencies
pnpm create plasmo --with-src
pnpm install

Running development server

pnpm dev

Then, load the extension into Google Chrome.
Go to the extension management page (chrome://extensions), enable Developer mode if it is not already on, and click "Load unpacked" in the top left corner.

Load Unpacked

Then, select the build/chrome-mv3-dev folder.

Folder Structure

Done!
Now you can focus on developing the extension itself!

Development

Content Scripts

Create a contents directory inside the src directory. Scripts placed there run as content scripts on pages whose URL matches their config.
Here I create the download script as google-slide.ts.

import type { PlasmoCSConfig } from 'plasmo';

export const config: PlasmoCSConfig = {
    matches: ['https://docs.google.com/*']
};

window.addEventListener('load', () => {
    console.log('content script loaded');
});

This script runs on any page whose URL starts with https://docs.google.com/.

All that is left is the function that extracts the text from the slides. In my use case, I want to download the result as an HTML file, so I wrote the function in the same file.

Caution: Your slide structure may not be the same as mine, so adjust the selectors as needed.

const downloadSlide = async () => {
    console.log('downloadSlide run');

    let totalPage = '';
    const name = document.title;

    while (true) {
        const pageTitle =
            document
                .querySelector('.punch-viewer-svgpage-a11yelement')
                ?.getAttribute('aria-label')
                ?.replace(/\x0B/g, '<br>') || '';
        const svgContainer = document.querySelector('.punch-viewer-svgpage-svgcontainer:last-child svg') as SVGSVGElement;
        let text = '';

        // Get text in svg
        svgContainer?.querySelectorAll('g').forEach((g) => {
            const label = g.getAttribute('aria-label');
            if (label) {
                text += label + '<br />';
            }
        });

        // Add text to each page (leave the image out if no svg was found)
        const svg = svgContainer ? new XMLSerializer().serializeToString(svgContainer) : '';
        const page = `<div class="row"><h4>${pageTitle}</h4><div></div>${svg}<div>${text}</div></div>`;
        totalPage += page;

        // End the loop if the last slide is reached (or the slide counter is missing)
        const counter = document.querySelector('.docs-material-menu-button-flat-default-caption');
        if (!counter || counter.getAttribute('aria-setsize') === counter.getAttribute('aria-posinset')) {
            break;
        }

        // Go to the next slide and wait for slide transition
        document.dispatchEvent(new KeyboardEvent('keydown', { keyCode: 39 }));
        await new Promise((resolve) => setTimeout(resolve, 1));
    }

    const html = `<!DOCTYPE html><html><head><title>${name}</title></head><body><main class="container">${totalPage}</main></body></html>`;

    const link = document.createElement('a');
    link.href = URL.createObjectURL(new Blob([html], { type: 'text/html' }));
    link.download = name + '.html';
    link.click();
    window.close();
};

To explain the code: I use the title of the document as the file name.
Then a while loop collects the text from each page of the slide deck. This is the part that extracts the text:

// title
const pageTitle =
    document
        .querySelector('.punch-viewer-svgpage-a11yelement')
        ?.getAttribute('aria-label')
        ?.replace(/\x0B/g, '<br>') || '';

// text container in svg
const svgContainer = document.querySelector('.punch-viewer-svgpage-svgcontainer:last-child svg') as SVGSVGElement;
let text = '';

// Get text in svg
svgContainer?.querySelectorAll('g').forEach((g) => {
    const label = g.getAttribute('aria-label');
    if (label) {
        text += label + '<br />';
    }
});

I also want the slide image, so I use new XMLSerializer().serializeToString(svgContainer) to serialize the slide's svg element into markup that can be embedded directly in the HTML.
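
One thing to watch for: the labels read from aria-label are pasted into the HTML as-is, so a slide containing a character like < could break the generated page. Below is a minimal sketch of how the labels could be escaped first. The helper names escapeHtml and labelToHtml are my own for illustration and are not part of the original script.

```typescript
// Hypothetical helper: escape the characters that are special in HTML.
const escapeHtml = (raw: string): string =>
    raw
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;');

// Hypothetical helper: escape a label first, then turn vertical tabs (\x0B)
// into line breaks, the same replacement the script applies to the title.
const labelToHtml = (label: string): string =>
    escapeHtml(label).replace(/\x0B/g, '<br>');

console.log(labelToHtml('a < b\x0Bnext line')); // a &lt; b<br>next line
```

In the loop, this would mean writing text += labelToHtml(label) + '<br />' instead of appending the raw label.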

I repeat this for every page and stop when the last slide is reached. Finally, I create a link element pointing to the generated HTML and click it programmatically to trigger the download.

Now the only thing left is a condition that triggers the download. In my case, I add a download=true parameter to the URL. For example: https://docs.google.com/presentation/d/e/xxxx/embed?download=true
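
The loop waits a fixed 1 ms after sending the arrow key, which may be too short if a slide takes longer to render. As a sketch of an alternative, the script could poll until something on the page has changed. The waitFor helper and its default timings below are assumptions of mine, not part of the original code.

```typescript
// Hypothetical helper: call `condition` every `intervalMs` until it returns true
// or `timeoutMs` has elapsed. Resolves to whether the condition was met.
const waitFor = async (
    condition: () => boolean,
    timeoutMs = 2000,
    intervalMs = 50
): Promise<boolean> => {
    const deadline = Date.now() + timeoutMs;
    while (Date.now() < deadline) {
        if (condition()) return true;
        await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
    // One last check in case the condition became true during the final wait.
    return condition();
};
```

Inside downloadSlide, one possible use is to remember the aria-posinset value before dispatching the key event and then await waitFor(() => the value has changed) instead of the fixed timeout.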

window.addEventListener('load', () => {
    console.log('content script loaded');

    if (window.location.search.match(/download=true/)) {
        downloadSlide();
    }
});

You can add the parameter directly and download the slide now!
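
The regular expression above also matches URLs such as ?nodownload=true, because it only looks for the substring. If that matters for your pages, here is a small sketch of a stricter check using URLSearchParams. The function name hasDownloadFlag is a hypothetical one I chose for this example.

```typescript
// Hypothetical helper: true only when the query string contains
// a parameter named exactly "download" with the value "true".
const hasDownloadFlag = (search: string): boolean =>
    new URLSearchParams(search).get('download') === 'true';

console.log(hasDownloadFlag('?start=false&download=true')); // true
console.log(hasDownloadFlag('?nodownload=true')); // false
```

In the content script, the condition would then read if (hasDownloadFlag(window.location.search)).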

Download Button

I don't want to add the parameter every time, so I am going to add a download button at the bottom of the embedded iframe.

Here, I create a new .tsx file inside the src/contents directory.

src/contents/download-slide.tsx

import type { PlasmoCSConfig, PlasmoGetInlineAnchor } from 'plasmo';

export const config: PlasmoCSConfig = {
    matches: ['Your url'] // Change url here
};

export const getInlineAnchor: PlasmoGetInlineAnchor = async () => document.querySelector('.embed-responsive');

const DownloadSlideButton = () => {
    const url = document.querySelector(`.embed-responsive iframe`).getAttribute('src');

    return (
        <>
            <a href={`${url}&download=true`} target="_blank" rel="noopener noreferrer">
                <button>Download as HTML</button>
            </a>
        </>
    );
};

export default DownloadSlideButton;

After reloading your site, you will see the button at the bottom of your embedded slide iframe!
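
Note that the button builds its link with `${url}&download=true`, which only works when the iframe src already has a query string. As a hedged sketch, the URL API can add the parameter in both cases. The name withDownloadFlag is hypothetical, and I am assuming the iframe src is an absolute URL.

```typescript
// Hypothetical helper: return `src` with download=true added to its query string.
// Assumes `src` is an absolute URL; new URL() throws on a relative one.
const withDownloadFlag = (src: string): string => {
    const url = new URL(src);
    url.searchParams.set('download', 'true');
    return url.toString();
};

console.log(withDownloadFlag('https://docs.google.com/presentation/d/e/xxxx/embed'));
// https://docs.google.com/presentation/d/e/xxxx/embed?download=true
```

The anchor in the component could then use href={withDownloadFlag(url)} once url has been checked for null.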

Summary

You can now build an extension that downloads slides as HTML! If you want to publish it to the Chrome Web Store, follow the official Plasmo documentation.

Below is the complete code.

src/contents/google-slide.ts

import type { PlasmoCSConfig } from 'plasmo';

export const config: PlasmoCSConfig = {
    matches: ['https://docs.google.com/*']
};

window.addEventListener('load', () => {
    if (window.location.search.match(/download=true/)) {
        downloadSlide();
    }
});

const downloadSlide = async () => {
    console.log('downloadSlide run');

    let totalPage = '';
    const name = document.title;

    while (true) {
        const pageTitle =
            document
                .querySelector('.punch-viewer-svgpage-a11yelement')
                ?.getAttribute('aria-label')
                ?.replace(/\x0B/g, '<br>') || '';
        const svgContainer = document.querySelector('.punch-viewer-svgpage-svgcontainer:last-child svg') as SVGSVGElement;
        let text = '';

        // Get text in svg
        svgContainer?.querySelectorAll('g').forEach((g) => {
            const label = g.getAttribute('aria-label');
            if (label) {
                text += label + '<br />';
            }
        });

        // Add text to each page (leave the image out if no svg was found)
        const svg = svgContainer ? new XMLSerializer().serializeToString(svgContainer) : '';
        const page = `<div class="row"><h4>${pageTitle}</h4><div></div>${svg}<div>${text}</div></div>`;
        totalPage += page;

        // End the loop if the last slide is reached (or the slide counter is missing)
        const counter = document.querySelector('.docs-material-menu-button-flat-default-caption');
        if (!counter || counter.getAttribute('aria-setsize') === counter.getAttribute('aria-posinset')) {
            break;
        }

        // Go to the next slide and wait for slide transition
        document.dispatchEvent(new KeyboardEvent('keydown', { keyCode: 39 }));
        await new Promise((resolve) => setTimeout(resolve, 1));
    }

    const html = `<!DOCTYPE html><html><head><title>${name}</title><style>/* Your styles here */</style></head><body><main class="container">${totalPage}</main></body></html>`;

    const link = document.createElement('a');
    link.href = URL.createObjectURL(new Blob([html], { type: 'text/html' }));
    link.download = name + '.html';
    link.click();
    window.close();
};

src/contents/download-slide.tsx

import type { PlasmoCSConfig, PlasmoGetInlineAnchor } from 'plasmo';

export const config: PlasmoCSConfig = {
    matches: ['Your url'] // Change url here
};

export const getInlineAnchor: PlasmoGetInlineAnchor = async () => document.querySelector('.embed-responsive');

const DownloadSlideButton = () => {
    const url = document.querySelector(`.embed-responsive iframe`).getAttribute('src');

    return (
        <>
            <a href={`${url}&download=true`} target="_blank" rel="noopener noreferrer">
                <button>Download as HTML</button>
            </a>
        </>
    );
};

export default DownloadSlideButton;
