Setting Up and Using Open Source LLM
Article by @Rohan Handore 22th April 2024
Last updated
Article by @Rohan Handore 22th April 2024
Last updated
In the contest between Closed Source LLM models like OpenAI and Claude, and Open Source LLM models such as LLAMA and Falcon, we are observing firsthand the dynamic evolution of linguistic technologies, as they redefine boundaries and set new standards for natural language processing. How to start with OpenAI API guide you can find Here! In this post we’ll describe different ways to start using Open Source LLM’s.
Where to find the Best Open Source LLM’s?
The most well known Open Source platform for transformer based models is HuggingFace.
In HuggingFace you can find Datasets, all kinds of open source models like NLP, Computer vision, Multimodal, Audio, Tabular, Reinforcement Learning. You can filter the models by the relevant task:
Let’s have a look on Meta’s LLM — the LLAMA-2.
The latest Llama models have made significant advancements in the realm of large language models. There are three variations of Llama-2, boasting 7, 13, and 70 billion parameters. Notably, while most of these models are open-sourced for both commercial and research use, the Llama 2 34B version remains unreleased to the public.Complementing these are the refined conversational models, Llama-2-Chat, available in 7B, 34B, and 70B configurations.
The number of parameters is a rough indicator of:
Complexity: Larger models can, in general, capture more intricate patterns in data. A model with billions of parameters, like 7b, would be considered very large and complex.
Computational Requirements: Larger models need more memory and processing power, both for training and inference. Therefore, a 7b model would require substantial hardware resources, especially when compared to smaller models.
2. Using Open Source model in your application
Let’s explore how to execute the open-source LLAMA-2 model on Google Colab for inference. (Note: We won’t be fine-tuning models in this guide). As previously mentioned, the LLM model demands significant computational resources like GPU, ample RAM, and disk space. To run it effectively on Google Colab, a ‘Pro’ account might be necessary.
Every month, Google Colab Pro allocates 90 credit points, allowing a balance to grow up to 200 points. For running the ‘Llama-2–7b-chat’ model, selecting a T4 GPU is advisable. It’s approximate usage rate stands at 2.05 credit points per hour.
Let’s start with installing some dependencies:
To use LLAMA models, you’ll need access permission from Meta. You can apply for it Here! Once you’ve gained access, you can utilize your HuggingFace access key for authentication.
After executing the code, you’ll receive a message prompting you to input your HuggingFace access token. Please copy and paste it into the indicated field.
Then you can run this pipeline code configuring the arguments:
Then you can ask any question you want, for example:
This is the response we get:
How cool is that?!
I hope you enjoyed this article and are now inspired to run your own Open Source model on your device :)