DEV Community

Gcobani Mkontwana
Gcobani Mkontwana

Posted on

Python scripts fails to copy the website and translating text into Hindi on my Terminal

Hi team

I need some help with my script, the script does two things, first scraps the webiste. In this instance i am using selenium to scrap the site and this works. But when i tried to translate what the content of the site(title course and name) From english to Hindi using data frame its failes. See below exceptions and stuck since this morning;

                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

File "C:\Users\Zux\PycharmProjects\ClassCentral\ClassCentral\Lib\site-packages\googletrans\client.py", line 182, in translate
data = self._translate(text, dest, src, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Zux\PycharmProjects\ClassCentral\ClassCentral\Lib\site-packages\googletrans\client.py", line 78, in _translate
token = self.token_acquirer.do(text)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Zux\PycharmProjects\ClassCentral\ClassCentral\Lib\site-packages\googletrans\gtoken.py", line 194, in do
self._update()
File "C:\Users\Zux\PycharmProjects\ClassCentral\ClassCentral\Lib\site-packages\googletrans\gtoken.py", line 62, in _update
code = self.RE_TKK.search(r.text).group(1).replace('var ', '')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'group'

// code to implement scraping the website using selenium webdrive and pandas.
`from googletrans import Translator
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
import pandas as pd
import urllib.request as urllib2
import time

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument("window-size=1920,1080")
driver = webdriver.Chrome(options=chrome_options)
url = "https://www.classcentral.com/collection/top-free-online-courses"
driver.get(url)

translator = Translator()

try:
while True:
# wait until button is clickable
WebDriverWait(driver, 1).until(
expected_conditions.element_to_be_clickable((By.XPATH, "//button[@data-name='LOAD_MORE']"))
).click()
time.sleep(0.5)
except Exception as e:
pass

all_courses = driver.find_element(by=By.CLASS_NAME, value='catalog-grid__results')
courses = all_courses.find_elements(by=By.CSS_SELECTOR, value='[class="color-charcoal course-name"]')

df = pd.DataFrame([[course.text, course.get_attribute('href')] for course in courses],
columns=['Title (eng)', 'Link'])

df['Title (hin)'] = df['Title (eng)'].apply(lambda x: translator.translate(x, dest='hi').text)

print(df)`

Top comments (0)