Overall
I was developing my AI application.
Problem faced
I found that Jinja conflicted with jsonify(), which caused errors when uploading files and chatting with GPT. My mentor helped me solve it.
Learn
Since I wanted to extract all the company IDs from https://www.ctgoodjobs.hk/, I tried web crawling, and it worked.
```python
import requests
import parsel  # third-party module for CSS/XPath selectors


def main():
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36',
    }
    # Fetch the HTML of the top-companies page
    url = "https://www.ctgoodjobs.hk/top-companies"
    response = requests.get(url=url, headers=headers, timeout=10)
    response.raise_for_status()
    selector = parsel.Selector(text=response.text)

    # .get() returns the first match as a string (or None); .getall() returns every match as a list.
    # Without .get()/.getall(), .css() returns a SelectorList.
    lists = selector.css('.sub-sec li')
    # ::text selects the text node; without it, .get() returns the element's HTML including its tags
    extra_data = selector.css('div.sub-sec ul.extra::text').get()

    # Split the comma-separated IDs stored in ul.extra (guard against a missing element)
    company_ids = []
    if extra_data:
        company_ids = [cid.strip() for cid in extra_data.split(',') if cid.strip()]

    # Read the data-company-id attribute from each company link
    for item in lists:  # avoid shadowing the built-in name `list`
        company_id = item.css('a::attr(data-company-id)').get()
        if company_id:
            company_ids.append(company_id)

    print(company_ids)
    print(len(company_ids))


if __name__ == "__main__":
    main()
```
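To check the extraction logic without requesting the live site, here is a minimal, dependency-free sketch that swaps parsel for Python's built-in html.parser. The sample markup, the CompanyIdParser class and extract_company_ids are hypothetical; the sketch only assumes what the selectors above suggest: links carry a data-company-id attribute, and ul.extra holds a comma-separated list of IDs. Unlike ::text, it collects all text inside ul.extra, and IDs come out in document order rather than "extra" IDs first.

```python
from html.parser import HTMLParser


class CompanyIdParser(HTMLParser):
    """Hypothetical helper: collects company IDs from data-company-id
    attributes on <a> tags and from the comma-separated text inside
    <ul class="extra"> (assumes ul.extra contains no nested <ul>)."""

    def __init__(self):
        super().__init__()
        self.company_ids = []
        self._in_extra = False
        self._extra_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "ul" and "extra" in classes:
            self._in_extra = True
        elif tag == "a" and attrs.get("data-company-id"):
            self.company_ids.append(attrs["data-company-id"])

    def handle_endtag(self, tag):
        if tag == "ul" and self._in_extra:
            # Split the collected text on commas and drop empty pieces
            self._in_extra = False
            text = "".join(self._extra_text)
            self.company_ids.extend(cid.strip() for cid in text.split(",") if cid.strip())
            self._extra_text = []

    def handle_data(self, data):
        if self._in_extra:
            self._extra_text.append(data)


def extract_company_ids(html):
    parser = CompanyIdParser()
    parser.feed(html)
    parser.close()
    return parser.company_ids


# Hypothetical markup shaped like the selectors used in the script above
SAMPLE_HTML = """
<div class="sub-sec">
  <ul>
    <li><a href="#" data-company-id="101">Company A</a></li>
    <li><a href="#" data-company-id="102">Company B</a></li>
  </ul>
  <ul class="extra">201, 202,203</ul>
</div>
"""

if __name__ == "__main__":
    print(extract_company_ids(SAMPLE_HTML))  # ['101', '102', '201', '202', '203']
```

Feeding a saved copy of the real page into extract_company_ids is a quick way to compare its result with the parsel version's output.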