DEV Community

Gcobani Mkontwana
Gcobani Mkontwana

Posted on

Python scripts fails to copy the website and translating text into Hindi on my Terminal

Hi team

I need some help with my script, the script does two things, first scraps the webiste. In this instance i am using selenium to scrap the site and this works. But when i tried to translate what the content of the site(title course and name) From english to Hindi using data frame its failes. See below exceptions and stuck since this morning;

                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

File "C:\Users\Zux\PycharmProjects\ClassCentral\ClassCentral\Lib\site-packages\googletrans\client.py", line 182, in translate
data = self._translate(text, dest, src, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Zux\PycharmProjects\ClassCentral\ClassCentral\Lib\site-packages\googletrans\client.py", line 78, in _translate
token = self.token_acquirer.do(text)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Zux\PycharmProjects\ClassCentral\ClassCentral\Lib\site-packages\googletrans\gtoken.py", line 194, in do
self._update()
File "C:\Users\Zux\PycharmProjects\ClassCentral\ClassCentral\Lib\site-packages\googletrans\gtoken.py", line 62, in _update
code = self.RE_TKK.search(r.text).group(1).replace('var ', '')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'group'

// code to implement scraping the website using selenium webdrive and pandas.
`from googletrans import Translator
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
import pandas as pd
import urllib.request as urllib2
import time

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument("window-size=1920,1080")
driver = webdriver.Chrome(options=chrome_options)
url = "https://www.classcentral.com/collection/top-free-online-courses"
driver.get(url)

translator = Translator()

try:
while True:
# wait until button is clickable
WebDriverWait(driver, 1).until(
expected_conditions.element_to_be_clickable((By.XPATH, "//button[@data-name='LOAD_MORE']"))
).click()
time.sleep(0.5)
except Exception as e:
pass

all_courses = driver.find_element(by=By.CLASS_NAME, value='catalog-grid__results')
courses = all_courses.find_elements(by=By.CSS_SELECTOR, value='[class="color-charcoal course-name"]')

df = pd.DataFrame([[course.text, course.get_attribute('href')] for course in courses],
columns=['Title (eng)', 'Link'])

df['Title (hin)'] = df['Title (eng)'].apply(lambda x: translator.translate(x, dest='hi').text)

print(df)`

Billboard image

The Next Generation Developer Platform

Coherence is the first Platform-as-a-Service you can control. Unlike "black-box" platforms that are opinionated about the infra you can deploy, Coherence is powered by CNC, the open-source IaC framework, which offers limitless customization.

Learn more

Top comments (0)

Image of Docusign

🛠️ Bring your solution into Docusign. Reach over 1.6M customers.

Docusign is now extensible. Overcome challenges with disconnected products and inaccessible data by bringing your solutions into Docusign and publishing to 1.6M customers in the App Center.

Learn more