DEV Community

shalinibhavi525-sudo

Posted on
How do you handle the Cold Start Problem for a semantic knowledge graph? (TF-IDF vs. Fine-Tuned BERT)

I’m working on a personal Augmented Intelligence System that builds a semantic knowledge graph from a user's unstructured notes. The goal is to surface hidden connections between concepts.

Currently, I'm using TF-IDF (Term Frequency-Inverse Document Frequency) for embeddings, but it suffers from a massive cold start problem: it needs hundreds of documents to establish meaningful weightings before it becomes useful to a new user.

The alternative is a tiny, fine-tuned BERT model for zero-shot classification, but the computational cost and latency for real-time analysis in Python/Streamlit is terrifying.

For a single-user, solo-developer project with zero budget, which architecture path is more sustainable and effective?
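To make the cold start concrete, here is a minimal sketch (assuming scikit-learn, with a hypothetical three-note corpus standing in for a brand-new user) showing why TF-IDF similarities are weak on a tiny corpus: with so few documents, the IDF weights are nearly uniform, so "connections" reduce to surface word overlap rather than meaning.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical first notes of a new user -- far too few for IDF
# to learn which terms are distinctive.
notes = [
    "graph embeddings capture semantic structure",
    "semantic search over my personal notes",
    "cold start problem in recommender systems",
]

vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(notes)

# Pairwise cosine similarities between the notes. Notes 1 and 2 only
# link through the shared token "semantic"; note 3 is near-orthogonal
# despite being conceptually related to the others.
sim = cosine_similarity(tfidf_matrix)
print(sim.round(2))
```

A pretrained sentence-embedding model (e.g. a small sentence-transformers checkpoint) would sidestep this by encoding meaning learned elsewhere, at the latency/compute cost described above; whether that trade-off is worth it for a single-user Streamlit app is exactly the question.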

(P.S. If anyone wants to see the current TF-IDF implementation, the project is live at https://calyxbhavi.streamlit.app/)
