DEV Community

Faseeh Arif
Faseeh Arif

Posted on

Need Help!

Hi I am in need of help from someone who is proficient in webscraping as I am new to programing. I have an assignment to extract the "about client" section from the job links. My script extracts only one "about client" but for other links it doesnot do that and throws error. The thing is there is xml file link from which I am extracting the job links and when those links are opened the html code is under java script for wich i am using selenium. I hae tried everything but didn't got the solution.`def extract_client_info(job_url):
client_info = {'About the Client': np.nan}

if job_url and job_url != "N/A":
    try:
        # Open the job URL
        driver.get(job_url)

        # Wait for the page to load
        WebDriverWait(driver, 30).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, '.cfe-about-client-v2'))
        )

        # Extract specific details
        about_client_section = driver.find_element(By.CSS_SELECTOR, '.cfe-about-client-v2')
        client_location = about_client_section.find_element(By.CSS_SELECTOR, '[data-qa="client-location"]').text.strip()
        client_job_posting_stats = about_client_section.find_element(By.CSS_SELECTOR, '[data-qa="client-job-posting-stats"]').text.strip() if about_client_section.find_elements(By.CSS_SELECTOR, '[data-qa="client-job-posting-stats"]') else "N/A"
        client_company_profile = about_client_section.find_element(By.CSS_SELECTOR, '[data-qa="client-company-profile"]').text.strip()

        # Combine extracted information
        client_info['About the Client'] = (
            f"Location: {client_location}\n"
            f"Job Posting Stats: {client_job_posting_stats}\n"
            f"Company Profile: {client_company_profile}"
        )

    except Exception as e:
        print(f"Failed to get 'About the Client' for {job_url}: {e}")
        client_info['About the Client'] = np.nan

    finally:
        # Wait for 10 seconds before making the next request
        time.sleep(10)

return client_info`
Enter fullscreen mode Exit fullscreen mode

Heroku

Simplify your DevOps and maximize your time.

Since 2007, Heroku has been the go-to platform for developers as it monitors uptime, performance, and infrastructure concerns, allowing you to focus on writing code.

Learn More

Top comments (0)

Billboard image

The Next Generation Developer Platform

Coherence is the first Platform-as-a-Service you can control. Unlike "black-box" platforms that are opinionated about the infra you can deploy, Coherence is powered by CNC, the open-source IaC framework, which offers limitless customization.

Learn more

👋 Kindness is contagious

Please leave a ❤️ or a friendly comment on this post if you found it helpful!

Okay