Introducing LoadTime: Bringing Clarity to HuggingFace Model Loading Times

Are you tired of not knowing how long it will take to load a large pre-trained language model, such as one from HuggingFace, into GPU or CPU memory? Do you find yourself staring at a frozen screen, with no idea when loading will finally finish? Let me present the solution: LoadTime.

LoadTime is a library that I've developed to tackle this very problem. It provides a progress bar during the memory loading process, bringing an end to your uncertainty. Now, let's dive a little deeper into how it works and what it has to offer.

While HuggingFace shows a progress bar while a model's weights are downloading, it leaves you in the dark during memory loading. This is where LoadTime comes into play. The mechanism is simple yet effective: during the initial load, LoadTime caches the total loading time. When the same model is loaded again, LoadTime uses this cached time as a reference to display a progress bar.

Unlike progress-bar libraries such as tqdm, which work best when the total count is known in advance, LoadTime is designed for situations where the total is unknown. It estimates progress in real time from past load times, making it a versatile tool for your tasks.
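To make the mechanism concrete, here is a minimal sketch of the idea, not LoadTime's actual implementation: the cache file location, function name, and update interval below are invented for illustration.

import json
import threading
import time
from pathlib import Path

CACHE = Path.home() / ".loadtime_cache.json"  # hypothetical cache location

def load_with_progress(name, fn):
    """Run fn() while showing a progress bar scaled by the cached load time."""
    cache = json.loads(CACHE.read_text()) if CACHE.exists() else {}
    estimate = cache.get(name)  # seconds the previous load took, if known
    stop = threading.Event()

    def show_bar():
        start = time.time()
        while not stop.is_set():
            pct = min(99, int(100 * (time.time() - start) / estimate))
            print(f"\rLoading {name}: {pct}%", end="", flush=True)
            time.sleep(0.5)
        print(f"\rLoading {name}: 100%")

    if estimate:  # only show a bar if a past load time is cached
        bar = threading.Thread(target=show_bar, daemon=True)
        bar.start()

    start = time.time()
    result = fn()  # the actual, blocking model load
    cache[name] = time.time() - start  # refresh the cache for next time

    stop.set()
    if estimate:
        bar.join()
    CACHE.write_text(json.dumps(cache))
    return result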

Example

First, install the package from PyPI:

pip install loadtime

Using it is simple: you just wrap the model-loading call with LoadTime. It's as easy as that! Here is a minimal example:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from loadtime import LoadTime

model_path = "togethercomputer/RedPajama-INCITE-Chat-3B-v1"

# Without LoadTime, the load would look like this (no progress bar):
# model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)

# With LoadTime, wrap the loading call in a lambda and invoke the wrapper:
model = LoadTime(name=model_path,
                 fn=lambda: AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16))()

tokenizer = AutoTokenizer.from_pretrained(model_path)  # Important: load the tokenizer after the model.

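The object returned by calling the LoadTime wrapper is the loaded model itself, so you can use it as you normally would. As a quick sanity check, here is a minimal sketch using the standard transformers generation API (the prompt shown is just an illustration):

prompt = "<human>: Hello, who are you?\n<bot>:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))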

Thanks.
