DEV Community

Gcobani Mkontwana


Python script fails to scrape a website and translate the text into Hindi in my terminal

Hi team

I need some help with my script. It does two things. First, it scrapes the website; I am using Selenium for this, and that part works. But when I try to translate the scraped content (the course title and name) from English to Hindi in a pandas DataFrame, it fails. The exception is below, and I have been stuck on it since this morning:

  File "C:\Users\Zux\PycharmProjects\ClassCentral\ClassCentral\Lib\site-packages\googletrans\client.py", line 182, in translate
    data = self._translate(text, dest, src, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Zux\PycharmProjects\ClassCentral\ClassCentral\Lib\site-packages\googletrans\client.py", line 78, in _translate
    token = self.token_acquirer.do(text)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Zux\PycharmProjects\ClassCentral\ClassCentral\Lib\site-packages\googletrans\gtoken.py", line 194, in do
    self._update()
  File "C:\Users\Zux\PycharmProjects\ClassCentral\ClassCentral\Lib\site-packages\googletrans\gtoken.py", line 62, in _update
    code = self.RE_TKK.search(r.text).group(1).replace('var ', '')
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'group'
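
The last frame of the traceback calls `.group(1)` directly on the result of a regex `search`. In Python, `search` returns `None` when the pattern is not found, and `None` has no `group` method, which produces this kind of `AttributeError`. Here is a minimal sketch of that mechanism, using a made-up pattern and page text (`TOKEN_PATTERN` and `page_text` are assumptions for illustration, not the library's actual values):

```python
import re

# Hypothetical pattern and page text, for illustration only;
# these are not copied from googletrans.
TOKEN_PATTERN = re.compile(r"tkk:'(.+?)'")
page_text = "<html>page without the expected token</html>"

# The pattern does not occur in page_text, so search() returns None.
match = TOKEN_PATTERN.search(page_text)

try:
    # Calling .group() on None raises AttributeError.
    token = match.group(1)
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

So the failure seems to happen before any of my titles are actually translated: the response the library received did not contain what its regex expected.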

Code that scrapes the website with Selenium WebDriver and loads the results into pandas:

```python
from googletrans import Translator
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
import pandas as pd
import time

# Run Chrome headless with a fixed window size.
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--window-size=1920,1080')
driver = webdriver.Chrome(options=chrome_options)
url = "https://www.classcentral.com/collection/top-free-online-courses"
driver.get(url)

translator = Translator()

# Keep clicking "Load more" until the button is no longer clickable.
try:
    while True:
        # wait until button is clickable
        WebDriverWait(driver, 1).until(
            expected_conditions.element_to_be_clickable((By.XPATH, "//button[@data-name='LOAD_MORE']"))
        ).click()
        time.sleep(0.5)
except Exception:
    # The wait timed out or the click failed: assume all courses are loaded.
    pass

# Collect the course name elements from the results grid.
all_courses = driver.find_element(by=By.CLASS_NAME, value='catalog-grid__results')
courses = all_courses.find_elements(by=By.CSS_SELECTOR, value='[class="color-charcoal course-name"]')

df = pd.DataFrame([[course.text, course.get_attribute('href')] for course in courses],
                  columns=['Title (eng)', 'Link'])

# Translate each title to Hindi; the traceback above comes from this translate call.
df['Title (hin)'] = df['Title (eng)'].apply(lambda x: translator.translate(x, dest='hi').text)

print(df)
```
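
While the cause is being tracked down, one option is to stop a single failed translation from aborting the whole `apply`. The sketch below is only an illustration: `safe_translate` and `fake_translate` are hypothetical helpers, and `fake_translate` stands in for the real translator so the example runs without network access.

```python
from typing import Callable, Optional


def safe_translate(text: str, translate: Callable[[str], str]) -> Optional[str]:
    """Return translate(text), or None if the call raises any exception."""
    try:
        return translate(text)
    except Exception as exc:
        # Report the failure and carry on with the remaining titles.
        print(f"Translation failed for {text!r}: {exc}")
        return None


def fake_translate(text: str) -> str:
    """Hypothetical stand-in for a real translator; fails on the input 'bad'."""
    if text == "bad":
        raise AttributeError("'NoneType' object has no attribute 'group'")
    return text.upper()


titles = ["python basics", "bad", "data science"]
results = [safe_translate(title, fake_translate) for title in titles]
print(results)
```

In the script, the same wrapper could be passed to `apply`, for example `df['Title (eng)'].apply(lambda x: safe_translate(x, lambda t: translator.translate(t, dest='hi').text))`. This hides the error rather than fixing it, so the `None` values would still need to be handled afterwards.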
