Boyinbode Ebenezer Ayomide
How I Built an AI-Driven Job Automation Engine: My Hardest Engineering Lessons

Building a job automation engine sounds like a fun project until you hit the first React-based form that ignores your scripts.

After months of building a job-automation core for a client, I've realized that modern web automation isn't about "botting"—it's about teaching code to behave with human-level nuance. If you're looking to build something similar, or just want to level up your browser automation game, here is the blueprint I followed and the "gotchas" I learned along the way.

1. Don't Build a Scraper. Build a System.

The biggest mistake I made early on was trying to run automation directly from my API routes. Browser instances are heavy; they spike memory and can kill your main service.

My Solution: I moved everything to a distributed microservice architecture using NestJS and Bull (Redis).

When you're building this, treat every automation as a background job. This gives you:

Retries: If a page fails to load, the system retries with exponential backoff.
Concurrency Control: You can limit how many browsers run at once so you don't melt your server.
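Both behaviors map to a few lines of configuration once the work lives in a queue. Here is a minimal sketch of the Bull job options involved (the queue name, attempt count, and delays are illustrative, not the exact production values); the pure helper just shows the doubling shape of an exponential schedule, since Bull computes its own internally:

```typescript
// Sketch of the options passed when enqueuing an automation job.
const jobOptions = {
  attempts: 5,                                   // retry a failed run up to 5 times
  backoff: { type: 'exponential', delay: 1000 }, // growing gaps between attempts
};

// Illustrates the doubling shape of an exponential backoff schedule:
// 1s, 2s, 4s, 8s, ... (Bull's exact internal formula may differ slightly).
function backoffDelay(attemptsMade: number, baseDelayMs: number): number {
  return baseDelayMs * 2 ** attemptsMade;
}

// Concurrency control lives on the worker side, e.g.:
// automationQueue.process(3, processorFn); // at most 3 browsers at once
```

Capping the processor's concurrency is what keeps a burst of incoming jobs from launching a burst of Chromium instances.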

2. Dealing with Brittle Selectors (The AI Bridge)

If you rely on id="first_name", your code will break next week. Modern UIs change too fast.

What I learned: Use LLMs as a "semantic translator." Instead of hardcoding selectors, I wrote a service that crawls the page, extracts all visible inputs/labels, and sends that "schema" to an AI.

Here’s how you can implement this "discovery" phase:

// How I extract form metadata for the AI to "understand"
const fields = await page.evaluate(() => {
  return Array.from(document.querySelectorAll('input, select, textarea')).map(el => ({
    id: el.id,
    // The nearest label text is the AI's main semantic clue
    label: el.closest('div')?.querySelector('label')?.innerText || '',
    type: el.tagName // 'INPUT', 'SELECT' or 'TEXTAREA'
  }));
});

// Then, I ask the AI to map the user's resume data to these IDs
const answers = await aiService.generateAnswers(fields, resumeData);


By doing this, I stopped caring if the developer changed the CSS or renamed a field. The AI just "gets it."
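For cheap pre-filtering, or as a fallback when the AI call fails, a deterministic fuzzy match over the same extracted labels goes a long way. This is a sketch with field names I made up, not the author's exact mapping code:

```typescript
// Hypothetical shape of the extracted fields (mirrors the discovery snippet above)
interface FormField { id: string; label: string; type: string; }

// Normalize a label so "First Name *" and "firstName" compare equal
function normalize(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]/g, '');
}

// Map resume keys to field ids by normalized-label match —
// a deterministic fallback to keep next to the AI path.
function mapByLabel(
  fields: FormField[],
  resume: Record<string, string>
): Record<string, string> {
  const answers: Record<string, string> = {};
  for (const field of fields) {
    const key = Object.keys(resume).find(k => normalize(k) === normalize(field.label));
    if (key) answers[field.id] = resume[key];
  }
  return answers;
}
```

The normalized comparison survives punctuation, casing, and required-field asterisks, which covers a surprising share of real forms before the AI is even needed.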

3. Outsmarting the "React Void"

This was my biggest "aha!" moment. I noticed that high-level commands like page.fill() often failed to trigger React state updates. The value would appear on the screen, but when I clicked "Submit," the form acted like it was empty.

The Fix: You have to simulate the actual hardware events. I learned that focusing, typing manually, and blurring the input is the only way to satisfy modern framework listeners (like Redux or React Hook Form).

// The "Human" way to fill a field
async function humanFill(page: Page, selector: string, value: string) {
    const input = await page.$(selector);
    await input.focus();
    await input.click();

    // Clear the field first
    await page.keyboard.press('Control+A');
    await page.keyboard.press('Backspace');

    // Type with a slight delay to mimic a human
    await page.keyboard.type(value, { delay: 40 });

    // Explicitly trigger the 'change' and 'input' events
    await input.evaluate(el => {
        el.dispatchEvent(new Event('input', { bubbles: true }));
        el.dispatchEvent(new Event('change', { bubbles: true }));
    });
}


4. The Stealth Layer (Staying Under the Radar)

Job boards don't like bots. If you use a default headless browser, you’ll get blocked.

My Tip: Always remove the navigator.webdriver flag and use a real User-Agent. But the real "pro tip" I discovered? CSP Bypassing. Sometimes site security blocks reCAPTCHA assets on certain network configurations. I used Playwright's route interceptor to fix this:

// A little trick to ensure reCAPTCHA always loads correctly
await page.route('**/*.recaptcha.net/**', route => {
    const url = route.request().url().replace('www.recaptcha.net', 'www.google.com');
    route.continue({ url });
});
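For the `navigator.webdriver` and User-Agent part, the context setup looks roughly like this. The viewport, locale, and UA handling here are placeholder choices of mine, not the exact production configuration:

```typescript
// Builds Playwright context options for a "human-looking" session.
// Pass in a current, real User-Agent string — never ship the headless default.
function stealthContextOptions(userAgent: string) {
  return {
    userAgent,
    viewport: { width: 1366, height: 768 },
    locale: 'en-US',
  };
}

// Injected before any page script runs, so detection code never sees the flag.
const STEALTH_INIT = `
  Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
`;

// Usage (sketch):
// const context = await browser.newContext(stealthContextOptions(REAL_UA));
// await context.addInitScript({ content: STEALTH_INIT });
```

The key detail is `addInitScript`: it runs before the page's own scripts, so fingerprinting code that checks `navigator.webdriver` on load already sees `undefined`.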

5. Trust but Verify (Success Detection)

A common mistake is assuming that if the URL changed, the submission worked. In my experience, job boards are full of "noise"—like diversity surveys that show a "Thank You" message even if your main application crashed.

I learned to clone the DOM, strip out the noise (footers, survey sections), and perform a strict text search.
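The production version does this on a cloned DOM inside the page context; the string-based sketch below just shows the idea. The selectors and success phrases are illustrative guesses you would tune per job board:

```typescript
// Phrases that indicate the main application actually went through.
const SUCCESS_PHRASES = ['application submitted', 'application received'];

// Strip footer and survey markup, then do a strict text search on what's left.
function detectSuccess(html: string): boolean {
  const stripped = html
    .replace(/<footer[\s\S]*?<\/footer>/gi, '')
    .replace(/<section[^>]*class="[^"]*survey[^"]*"[\s\S]*?<\/section>/gi, '');
  const text = stripped.replace(/<[^>]+>/g, ' ').toLowerCase();
  return SUCCESS_PHRASES.some(p => text.includes(p));
}
```

Stripping first matters: a "Thank you" inside a diversity-survey section must not count as a successful submission.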

Final Thoughts
The web is messy. Automation is the art of handling that mess. My biggest takeaway is that human-centric automation (keystrokes, AI-mapping, event-dispatching) beats brute-force scraping every single time.

If you’re building your own engine, start small, capture screenshots of every failure, and treat the browser like a living person interacting with your code.

Have questions about how I handled specific anti-bot blocks or AI prompts? Drop them in the comments!

Let's Connect

LinkedIn | GitHub | Twitter

Top comments (1)

wfgsss

The "React Void" section really resonated — I hit the exact same issue with Chinese e-commerce platforms where page.fill() silently fails because the site uses custom React components with internal state management. Your humanFill approach with explicit event dispatching is the right call.

One pattern I'd add: on sites like Yiwugo and DHgate, the DOM structure changes based on Accept-Language headers, so the same form can have completely different selectors depending on locale. I ended up building a two-pass system: first pass extracts all interactive elements with their visible labels (similar to your AI discovery phase), second pass maps user data to those labels using fuzzy matching. Works surprisingly well across language boundaries.

The CSP bypass for reCAPTCHA is a great trick. For Chinese platforms the equivalent challenge is usually SMS verification or slider CAPTCHAs — different beast entirely, but the principle of intercepting and rerouting requests still applies.

Curious about your retry strategy — do you use exponential backoff per-field or per-form? I found that per-field retries with a fresh page context between attempts gives much better success rates than retrying the entire form submission.