I found a big problem. When I read the data extracted from ctgoodjobs.hk in Chroma DB, not a single record had a real salary value; all of them were N/A. Yet there are jobs that do have a real salary value.
e.g.
Hash: deef24dd43347dfd6077827d6bdfa86d
https://jobs.ctgoodjobs.hk/job/08976889/licensing-officer-shatin-22k-transport-department
On the job page, the salary shows a real number.
In the output of streamlit run app.py, the salary does not show a real number.
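To see how widespread the missing values are, the stored metadata can be tallied. The following is only a minimal sketch: it assumes each record keeps its salary under a metadata key named "salary" and that missing values are stored as the string "N/A". Neither detail is shown in this post, so both names are hypothetical, as are the helper names `summarize_salaries` and `load_metadatas`.

```python
def summarize_salaries(metadatas, key="salary", missing="N/A"):
    """Count records with a missing salary versus a real salary value.

    `key` and `missing` are assumptions about how the crawler stores data.
    """
    counts = {"missing": 0, "real": 0}
    for meta in metadatas:
        # Treat absent metadata, an absent key, an empty string, or the
        # placeholder text as a missing salary.
        value = (meta or {}).get(key)
        if value is None or str(value).strip() in ("", missing):
            counts["missing"] += 1
        else:
            counts["real"] += 1
    return counts


def load_metadatas(path="./job_posts", collection_name="jobs"):
    """Read all metadata dicts from the same Chroma DB used in this post."""
    import chromadb  # imported here so summarize_salaries works without it

    client = chromadb.PersistentClient(path=path)
    collection = client.get_collection(name=collection_name)
    return collection.get(include=["metadatas"])["metadatas"]
```

Calling `summarize_salaries(load_metadatas())` would then show whether every record is affected or only some of them.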
I tried running ‘crawler.py’ and ‘app.py’ again.
Another problem occurred. The previous day, I thought ‘crawler.py’ and ‘app.py’ had run successfully, but now I was worried that they might not have worked.
I did not understand the query behaviour: when I searched by job title, the output was not correct.
To check whether records were successfully added to Chroma DB, I ran this program
# job_counter.py
import chromadb


def connect_to_chromadb():
    # Make sure this path is the same as the path used in the main application
    client = chromadb.PersistentClient(path="./job_posts")
    return client


def count_jobs(client):
    # Return the number of records in the "jobs" collection
    collection_name = "jobs"
    collection = client.get_collection(name=collection_name)
    return collection.count()


if __name__ == "__main__":
    client = connect_to_chromadb()
    total_jobs = count_jobs(client)
    print(f"Total number of jobs in the database: {total_jobs}")
before and after running the crawler. The check succeeded.
After I modified the code of ‘crawler.py’, the salary is now shown.
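A count alone cannot tell which records changed. A possible extension is to compare the record IDs before and after a crawl. This is a rough sketch, not part of the original scripts: `diff_ids` and `snapshot_ids` are hypothetical helpers, and the path and collection name are simply reused from job_counter.py.

```python
def diff_ids(before_ids, after_ids):
    """Compare two ID snapshots and report what the crawl changed."""
    before, after = set(before_ids), set(after_ids)
    return {
        "added": sorted(after - before),
        "removed": sorted(before - after),
        "unchanged": len(before & after),
    }


def snapshot_ids(path="./job_posts", collection_name="jobs"):
    """Return the IDs currently stored in the collection."""
    import chromadb  # imported here so diff_ids works without it

    client = chromadb.PersistentClient(path=path)
    collection = client.get_collection(name=collection_name)
    return collection.get()["ids"]
```

One way to use it is to call `snapshot_ids()` once before the crawler runs and once after, then pass both lists to `diff_ids`.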