Hi community,
I have a few questions about LLMs and fine-tuning. I am new to this field and these might be noob questions, but I want to understand this clearly.
1. When someone says Llama-7B, does that mean it has ~7 billion parameters? What do these parameters actually mean?
2. Suppose someone fine-tunes a 7B model with 2M tokens. Does that change the original parameters?
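For context on what I mean by "parameters": my current understanding is that they are the learned weight values stored in the model files. Here is a toy sketch of how I picture the counting (plain Python, made-up shapes, not the real Llama architecture) — please correct me if this picture is wrong:

```python
# Toy illustration: a model's "parameters" are its learned weight values.
# A single linear layer mapping 4096 inputs to 4096 outputs has a
# 4096 x 4096 weight matrix plus a 4096-entry bias vector.
d_in, d_out = 4096, 4096
weight_count = d_in * d_out   # entries in the weight matrix
bias_count = d_out            # entries in the bias vector
layer_params = weight_count + bias_count
print(layer_params)  # -> 16781312, i.e. ~16.8M parameters in one layer

# A 7B model stacks many such layers (attention, MLPs, embeddings)
# until the total number of learned values reaches ~7 billion.
```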
3. I have seen people fine-tuning a model (say a pretrained Llama-7B) with their own dataset. Does that override the safetensors files? For example, if the model originally came as 3 files of ~5GB each, will those files be deleted after fine-tuning, or will I get new safetensors files with the previous ones still in place?
4. Follow-up to question 3: if it overrides them, wouldn't that change the original behaviour of the model? Suppose I fine-tune on interview data. Originally, if I asked "The sun rises in which direction?", the model would say "East"; after fine-tuning, would it forget what it knew before? Precisely, what I am asking is: if I train on data D1 and then, after some days, train on data D2, will the model still retain D1, or will it forget it? How do I preserve the original knowledge while training on D2?
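To make questions 3 and 4 concrete, here is how I currently imagine adapter-style fine-tuning (something like LoRA) keeping the base weights untouched on disk, with the fine-tune saved as a small separate file. This is only a toy sketch in plain Python — not the real LoRA math, and the numbers are made up — is this roughly the right mental model?

```python
# Toy sketch of the adapter idea: the base model's weights stay unchanged,
# and fine-tuning produces a small separate "adapter" of deltas that is
# combined with the base weights at inference time.
base_weights = {"layer1": [1.0, 2.0, 3.0]}   # pretend pretrained model
adapter      = {"layer1": [0.1, -0.2, 0.0]}  # pretend fine-tuned delta

def effective_weights(base, delta):
    """Combine base + adapter for inference; the base dict is never modified."""
    return {k: [b + d for b, d in zip(base[k], delta[k])] for k in base}

merged = effective_weights(base_weights, adapter)
print(merged["layer1"])        # -> [1.1, 1.8, 3.0]
print(base_weights["layer1"])  # -> [1.0, 2.0, 3.0], still untouched
```

If this is right, then training on D2 this way would not overwrite what was learned from D1, since only a new adapter file is written.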
5. I have seen people using pipelines for LLM inference to answer prompts. With a single GPU, the pipeline only answers one question at a time. Suppose I host a server and multiple requests start coming in; wouldn't that become a bottleneck? Is there some way to avoid this? Are multiple GPUs the only answer?
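On this one, my guess is that servers avoid the bottleneck by batching: requests that arrive around the same time get grouped and run through the model together. Here is a toy sketch of what I imagine, where `batch_generate` is a made-up stand-in for a real batched model call — is this the right idea?

```python
from queue import Queue, Empty

def batch_generate(prompts):
    # Stand-in for a real batched forward pass on the GPU;
    # a real server would run all prompts in one model call here.
    return [f"answer to: {p}" for p in prompts]

def serve_once(request_queue, max_batch=8):
    """Drain up to max_batch pending requests and answer them together."""
    batch = []
    while len(batch) < max_batch:
        try:
            batch.append(request_queue.get_nowait())
        except Empty:
            break
    return batch_generate(batch) if batch else []

q = Queue()
for prompt in ["sun rises in which direction?", "capital of France?"]:
    q.put(prompt)

# Both waiting prompts are answered by a single batched call.
print(serve_once(q))
```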
6. How can I speed this up when running on CPU? In my setup, llama.cpp uses only one CPU core to answer a query. Can I make it use all cores when multiple queries come in at the same time on a hosted server?
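What I have tried to picture for this: keep one worker per core and spread concurrent queries across them with a thread pool. `answer_query` below is a made-up placeholder for a real llama.cpp call (I believe llama.cpp also has its own threads setting for using several cores on a single query, but I haven't confirmed the details). A toy sketch:

```python
from concurrent.futures import ThreadPoolExecutor
import os

def answer_query(prompt):
    # Placeholder for a real llama.cpp generation call; in a real server
    # each worker would hold its own inference context.
    return f"answer to: {prompt}"

queries = [f"question {i}" for i in range(4)]
workers = os.cpu_count() or 1  # one worker per available core

# Concurrent queries are handled in parallel across the pool.
with ThreadPoolExecutor(max_workers=workers) as pool:
    answers = list(pool.map(answer_query, queries))

print(answers)
```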