First hands-on experience with running Llama 2
This post walks through the step-by-step process of running the Llama 2 model locally on a Mac. It reflects my ongoing learning about large language models and their practical applications.
This post assumes your system already has Python installed.
Installing required libraries
Clone the llama.cpp repo
git clone https://github.com/ggerganov/llama.cpp.git
You can create a virtual environment for setting up llama.cpp. Here I have used the conda command to create a new environment for my setup.
conda create -n llama
conda activate llama
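If you want the new environment to come with its own Python interpreter, you can also pin a version when creating it; the exact version number below is just an example.
conda create -n llama python=3.11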
Next, go into the repository and run the make command.
cd llama.cpp
make
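On Apple Silicon Macs, llama.cpp can also be built with Metal GPU acceleration. The exact flag depends on your llama.cpp version; at the time I tried this, the build looked like the following.
make LLAMA_METAL=1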
Install Python packages
pip3 install llama-cpp-python
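A quick way to confirm that the Python binding installed correctly is to import it from the same environment; if this prints the message without errors, the install worked.
python3 -c "import llama_cpp; print('llama-cpp-python imported OK')"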
Accessing Llama 2 model files
We need the Llama 2 model files. To access them, you have to submit a request by filling out the form given here. On submission, you will receive an email with instructions for downloading the Llama 2 models.
The email will contain a URL that you will be asked for while downloading the models.
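If your email follows the standard flow, it will point you at Meta's llama repository and its download script, which prompts for that URL. A rough sketch of that flow is below; follow the instructions in your own email if they differ.
git clone https://github.com/facebookresearch/llama.git
cd llama
./download.sh   # paste the URL from the email when prompted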
Converting model files
Now we have the Llama 2 models and the llama.cpp Python binding installed. Next, we will convert the downloaded model files.
The first step is to run the following command from inside the llama.cpp directory. It needs the path of the directory containing the downloaded models.
python3 convert.py <directory_containing_llama_model>
This command generates a file named ggml-model-f16.gguf and saves it in the same directory that contains the downloaded models.
The second step is to run the quantize command, which compresses the 16-bit model down to 4-bit (q4_0) weights.
./quantize <directory_containing_llama_model>/ggml-model-f16.gguf <directory_containing_llama_model>/ggml-model-q4_0.gguf q4_0
If you get any errors while running the above two steps, you can also use the already converted file available here.
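Before wiring the model into LangChain, it can be worth checking that llama-cpp-python can load the quantized file at all. A minimal sketch is below; the model path is a placeholder, so point it at wherever your .gguf file ended up.
from llama_cpp import Llama

# placeholder path: adjust to your quantized model file
llm = Llama(model_path="<directory_containing_llama_model>/ggml-model-q4_0.gguf")
result = llm("Q: What is the capital of France? A:", max_tokens=16)
print(result["choices"][0]["text"])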
Installing LangChain library to build applications
LangChain makes it easier to develop applications using language models. You can learn more about it here.
pip3 install langchain
Building applications on top of a language model
Now for the fun part: asking Llama 2 a question and getting its answer.
The following Python script is adapted from the tutorial available here.
from langchain.llms import LlamaCpp
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

llm = LlamaCpp(model_path='../ggml-model-q4_0.gguf',
               temperature=0.0,
               top_p=1,
               n_ctx=6000,
               callback_manager=callback_manager,
               verbose=True,
               )

question = "who is mr. narendra modi?"
answer = llm(question)

print('Q:', question)
print('A:', answer)
Output:
Mr. Narendra Modi is the current Prime Minister of India, serving since May 2014. He is known ...
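Once the raw call works, LangChain's prompt and chain abstractions can be layered on top of the same llm object. Here is a minimal sketch; the prompt wording and the question are just placeholders I chose for illustration.
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# reuse the LlamaCpp instance created above
prompt = PromptTemplate.from_template("Answer in one sentence: {question}")
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(question="What is the capital of India?"))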
References
https://medium.com/@karankakwani/build-and-run-llama2-llm-locally-a3b393c1570e
https://github.com/facebookresearch/llama-recipes/blob/main/demo_apps/HelloLlamaLocal.ipynb