DEV Community

James Candan

Use JavaScript Regex to Find All IDs That Contain a String and Copy the Text to the Clipboard

Web scraping is a powerful tool. I sometimes find that spinning up a full-fledged Beautiful Soup Python script for a short task is unnecessary. Today I had an issue with a web page that would not allow me to select items in a table to copy, and, even if it did, I would have had the additional, unwanted column data in my clipboard.

Solution: Console web scraping

Let's break this down.

First, I wanted a way to capture each element. Each piece of text I wanted from the table was wrapped in a div such as <div id="edit-tid-24-view"></div>. I first tried targeting these elements with a "begins with" attribute selector:

document.querySelectorAll('[id^="edit-tid"]');

This got me part of the way there, but I needed to target ID values that not only started with this prefix but also ended with -view. With a regular expression, you might write something like /edit-tid.*-view/. That is a bit greedy, but it would have done the trick in my case. However, CSS selectors, and therefore querySelectorAll, do not support regular expressions. So, I combined two attribute selectors: one for the beginning portion (^=), the other for the ending portion ($=).

document.querySelectorAll('[id^="edit-tid"][id$="-view"]');
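For comparison, when a real regular expression is needed, one option is to select a broader set of elements and filter them in JavaScript. The following is a minimal sketch, not the approach used in this post: filterByIdPattern is a hypothetical helper name, and plain objects stand in for DOM elements so the snippet also runs outside a browser.

```javascript
// Hypothetical helper (not from the original post): keep only the items
// whose id matches a regular expression.
function filterByIdPattern(elements, pattern) {
    // Array.from accepts a NodeList as well as a plain array.
    return Array.from(elements).filter(function (el) {
        return pattern.test(el.id);
    });
}

// Stand-in data; in the console you could pass
// document.querySelectorAll('[id]') instead.
var sample = [
    { id: 'edit-tid-24-view' },
    { id: 'edit-tid-24-edit' },
    { id: 'other-view' }
];

// Anchored with ^ and $ so the id must start and end as required.
var matches = filterByIdPattern(sample, /^edit-tid.*-view$/);
console.log(matches.map(function (el) { return el.id; }));
```

For a fixed prefix and suffix, the two combined attribute selectors are shorter; the regex filter is only worth it when the pattern is more involved.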

After that, it was quite simple. I wanted to loop through and transform the NodeList that was returned. A NodeList does not have Array methods such as map, so I first converted it to an Array.

Array.from(someObject);
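Array.from works on any array-like or iterable object, which is why it accepts a NodeList. A small sketch, with a plain array-like object (a hypothetical stand-in for the NodeList) so it can be tried anywhere:

```javascript
// An array-like object: indexed keys plus a length, but no Array methods.
var arrayLike = { 0: 'first', 1: 'second', length: 2 };

// Convert it to a real Array so that map, filter, etc. become available.
var asArray = Array.from(arrayLike);

console.log(Array.isArray(arrayLike)); // false
console.log(Array.isArray(asArray));   // true
console.log(asArray.map(function (s) { return s.toUpperCase(); }));
```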

Once there, I could have mapped the innerText of each Node from the DOM to an array of the desired strings.

Array.from(someObject).map(function (item) { return item.innerText; });

However, I was not satisfied with that.

I wanted my list cleanly output and piped directly to my clipboard. JavaScript lets you select text and execute a copy command on the document object. However, I was working in the console and found something much simpler: the console provides its own copy() utility function.

I simply concatenated the strings together, separated by newline characters, and copied the result to my clipboard.
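The concatenation step can also be sketched with map and join. This is an illustrative variant, not the code used in this post: joinInnerText is a hypothetical helper name, and plain objects with an innerText property stand in for the matched DOM nodes.

```javascript
// Hypothetical helper: collect the innerText of each item, one per line.
function joinInnerText(nodes) {
    return Array.from(nodes)
        .map(function (node) { return node.innerText; })
        .join('\n');
}

// Stand-in data; in the console you would pass the result of
// document.querySelectorAll(...) instead.
var rows = [{ innerText: 'Apples' }, { innerText: 'Oranges' }];
var text = joinInnerText(rows);
console.log(text);
```

Unlike appending a newline after every item, join places the separator only between items, so the copied text has no trailing newline.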

Conclusion

Here's my Developer Tools Console web scraper in all its glory.

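// var (rather than an implicit global) so the snippet can be re-run in the console
var copyText = '';
Array.from(document.querySelectorAll('[id^="edit-tid"][id$="-view"]'))
    .forEach(function (x) {
        // innerText holds the rendered text of each matching div
        copyText += x.innerText + '\n';
    });
// copy() is a DevTools console utility; it is not available to page scripts
copy(copyText);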
