🤗 How to download open-source LLM models from Hugging Face and use them locally on your machine

Download any open-source LLM on your local machine.

Article by @Rohan Handore

19th April 2024


Hugging Face is an excellent platform for trying, testing, and contributing to open-source LLM models.

Acquiring models from Hugging Face is straightforward thanks to the transformers library. This Python library, developed by Hugging Face, provides a unified, user-friendly API for downloading and using its pre-trained models.

Let’s walk through an example.

1. Install the transformers library

pip install transformers

2. Import the library as well as the specific model you wish to obtain.

from transformers import AutoModel, AutoTokenizer

3. Specify the model you want to download

model_name = 'bert-base-uncased'  # change the name if you want to use some other model

4. Download the model and tokenizer

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
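A quick way to sanity-check what you downloaded is to count the model’s parameters (a small sketch, assuming the steps above have run; bert-base-uncased has roughly 110M parameters):

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")
# each p is a weight tensor; numel() is its element count
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.0f}M parameters")
```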
5. Save the model locally, then load it back from disk

local_model_directory = "."  # mention the location where you want to save the model
model.save_pretrained(local_model_directory)
tokenizer.save_pretrained(local_model_directory)

model = AutoModel.from_pretrained(local_model_directory)
tokenizer = AutoTokenizer.from_pretrained(local_model_directory)
6. Now let’s try to generate a response. Note that bert-base-uncased is an encoder-only model with no text-generation head, so for this step we load a small causal (generative) model such as gpt2 instead, and drop the .to("cuda") call so the example also runs on machines without a GPU.

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
inputs = tokenizer.encode("rephrase like a data analyst. what is MTD sales in Singapore", return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
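If you just want quick results, the pipeline helper wraps tokenization, generation, and decoding in a single call (a sketch using gpt2 as an example model; any causal LM from the Hub, or a local directory produced by save_pretrained, works as the model argument):

```python
from transformers import pipeline

# "text-generation" selects a causal-LM pipeline; generation kwargs
# such as max_new_tokens are passed straight through to generate()
generator = pipeline("text-generation", model="gpt2")
result = generator("Monthly sales in Singapore were", max_new_tokens=20)
print(result[0]["generated_text"])
```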

Keep the following additional considerations in mind when obtaining and using open-source LLM models:

  1. A model’s size is determined by its number of parameters, which drives the storage space and computational resources required. Larger models necessitate more of both.

  2. Some models, such as LLaMA, are gated and may require you to request access or accept a license agreement from the model provider before you can download them.

  3. Models may exhibit varying proficiencies across different tasks. Some excel in text generation, while others are better suited for tasks such as question answering.

  4. The model may have limitations in understanding the context of the provided text, leading to potential errors in its output.

  5. Prior to deploying the model in a production environment, thorough evaluation is essential. This involves testing the model on a diverse array of tasks and datasets to assess its performance and reliability.
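The first point can be made concrete with a back-of-envelope calculation: the weights of a model with N parameters occupy roughly N × bytes-per-parameter of memory (a sketch; actual usage is higher once activations and the KV cache are included):

```python
def weight_memory_gb(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate memory for weights alone: fp32 = 4 bytes/param, fp16/bf16 = 2."""
    return num_params * bytes_per_param / 1024**3

# a 7B-parameter model in fp32 vs fp16
print(f"fp32: {weight_memory_gb(7_000_000_000):.0f} GB")     # ~26 GB
print(f"fp16: {weight_memory_gb(7_000_000_000, 2):.0f} GB")  # ~13 GB
```

This is one reason half-precision (or quantized) checkpoints are the usual choice for running larger models on consumer hardware.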
