DEV Community

MAK KA WAI

Daily Log - 19/08/2024

I found a big problem. When I read the data extracted from ctgoodjobs.hk in Chroma DB, not a single record had a real salary value; every one was N/A. Yet some jobs on the site do list a real salary.

e.g.
Hash: deef24dd43347dfd6077827d6bdfa86d
https://jobs.ctgoodjobs.hk/job/08976889/licensing-officer-shatin-22k-transport-department

The posting on the website shows a real salary figure.

Output from streamlit run app.py:

Here the salary has no real number.
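One way to measure how widespread the N/A problem is would be to count it directly in the stored records. The sketch below assumes the crawler stores the salary in each record's metadata under a key named "salary" and uses the string "N/A" as the placeholder; both are guesses, so adjust them to match what ‘crawler.py’ actually writes. The helper names are hypothetical.

```python
def salary_summary(result, field="salary", missing="N/A"):
    """Count records with a real salary versus the placeholder value.

    `result` is the dict returned by a Chroma collection's get() call.
    The metadata key and the placeholder string are assumptions.
    """
    metadatas = result.get("metadatas") or []
    total = len(metadatas)
    missing_count = sum(
        1
        for meta in metadatas
        # A record counts as missing if it has no metadata at all,
        # or its salary field is absent, empty, or the placeholder.
        if not meta or str(meta.get(field, missing)).strip() in ("", missing)
    )
    return {
        "total": total,
        "missing": missing_count,
        "with_salary": total - missing_count,
    }


def load_jobs(path="./job_posts", name="jobs"):
    """Fetch the metadata of all records from the persistent collection."""
    import chromadb  # imported here so salary_summary works without it

    client = chromadb.PersistentClient(path=path)
    return client.get_collection(name=name).get(include=["metadatas"])
```

Calling `print(salary_summary(load_jobs()))` would then show at a glance whether every record is missing its salary or only some of them.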

I tried running ‘crawler.py’ and ‘app.py’ again.
Another problem occurred. The previous day I thought both ‘crawler.py’ and ‘app.py’ had run successfully, but now I was worried that they might not be working.

I didn’t understand how the query worked. When I searched by job title, the output was not correct.
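A possible explanation, offered only as a guess: a Chroma query returns the nearest records by embedding similarity, not exact title matches, so it always returns up to n_results records even when none of them is the job searched for. Printing the distances next to the documents makes this easier to see. The sketch below flattens the nested result of collection.query() into one row per hit; the helper names are hypothetical, and how ‘app.py’ actually builds its query is not known here.

```python
def flatten_query_result(result, query_index=0):
    """Turn the nested dict from collection.query() into a list of rows.

    Chroma returns one inner list per query text; this picks the list
    for `query_index` and pairs up ids, documents, metadata, distances.
    """
    ids = result["ids"][query_index]

    def column(key):
        # Some keys may be absent or None depending on `include`.
        values = result.get(key)
        return values[query_index] if values else [None] * len(ids)

    rows = zip(ids, column("documents"), column("metadatas"), column("distances"))
    return [
        {"id": i, "document": doc, "metadata": meta, "distance": dist}
        for i, doc, meta, dist in rows
    ]


def search_jobs(collection, title, n_results=5):
    """Run a similarity search for a job title and return flat rows."""
    result = collection.query(query_texts=[title], n_results=n_results)
    return flatten_query_result(result)
```

If the rows come back with large distances, the title may simply not be close to anything stored. Passing where_document={"$contains": title} to collection.query() is one way to require a literal substring match as well.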

Image description


To check whether records were actually being added to Chroma DB, I ran this program:

# job_counter.py

import chromadb

def connect_to_chromadb():
    client = chromadb.PersistentClient(path="./job_posts")  # Make sure this path is the same as the one used in the main application
    return client

def count_jobs(client):
    collection_name = "jobs"
    collection = client.get_collection(name=collection_name)
    return collection.count()

if __name__ == "__main__":
    client = connect_to_chromadb()
    total_jobs = count_jobs(client)
    print(f"Total number of jobs in the database: {total_jobs}")

I ran it before and after running the crawler. The result confirmed that records were added successfully.
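A total count shows that something was added but not which records. A small variation is to compare the record IDs before and after the crawl. This is only a sketch: diff_ids is a hypothetical helper, and it assumes the IDs are read with collection.get()["ids"] before and after running ‘crawler.py’.

```python
def diff_ids(before, after):
    """Compare two lists of record IDs taken before and after a crawl."""
    before_set = set(before)
    # Keep the order in which the new IDs appear in `after`.
    added = [record_id for record_id in after if record_id not in before_set]
    return {"before": len(before), "after": len(after), "added": added}


# Example with made-up IDs, not real data from the job database.
report = diff_ids(["id-1", "id-2"], ["id-1", "id-2", "id-3"])
print(report)
```

If the crawler reuses the same ID for a job it has already stored, the count would stay the same on a second run even though the crawl worked, so an empty "added" list does not necessarily mean failure.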


After I modified the code in ‘crawler.py’, the salary is now shown.
