Running LLMs Locally Just Got Better

January 4, 2024

It’s the first week of 2024, and I can’t think of a better way to start the year than to write about the current hot thing: LLMs (Large Language Models).

Large Language Models

A large language model (LLM) is an AI system designed to process and understand vast amounts of natural language data. It consists of multiple layers of neural networks that are trained on massive datasets, for example:

  • Academic datasets
  • Code
  • Financial Data
  • Books
  • And more

Each dataset is used for specific tasks, enabling the model to learn patterns, grammar rules, and semantics from the text. Today, the most popular deep learning architecture for large language models is the Transformer, introduced by Google back in 2017.

Some well-known examples of LLMs include OpenAI’s GPT (Generative Pre-trained Transformer) series, Google’s BERT (Bidirectional Encoder Representations from Transformers), Google’s latest, Bard, and Meta’s LLaMA. These models have demonstrated impressive results in a wide range of natural language processing tasks, such as summarization, question-answering, and more.

Okay, we get the point. We have big tech companies with huge numbers of Nvidia A100 40/80GB GPUs, which are stupidly expensive and overpriced due to the high demand. They are training and fine-tuning these models to their needs, and I personally think that these models are biased, at least to some level.

But What About The Little Guy?

Back in 2016, three French founders set out to develop a chatbot app for teenagers. Later on, they open-sourced their model and pivoted to being a platform for machine learning, aka Hugging Face 🤗.

Today, Hugging Face is basically the “GitHub” for the AI community, where one can find open-source datasets, models, and most importantly, Transformers.

Transformers provides APIs and tools to easily download and train state-of-the-art pretrained models. Using pretrained models can save a lot of the time and resources required to train a model from scratch, and some models can even be adjusted further (weights, etc.). These models support common tasks in different modalities, such as Natural Language Processing (NLP), Computer Vision, Audio, and Multimodal (for example, vision and text).
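As a quick sketch of what this looks like in practice, the snippet below runs a sentiment-analysis pipeline with Transformers (this assumes a working Python environment; the default model is downloaded from the Hugging Face Hub on first use, so it needs network access):

```shell
# Install the library (torch is the backend Transformers will use here)
pip install transformers torch

# Run a pretrained pipeline; the sentence is just an illustrative input
python - <<'EOF'
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("Running LLMs locally just got better!"))
EOF
```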

The Star of The Show - Ollama

Written mostly in Go, Ollama is a user-friendly tool designed to run large language models (LLMs) locally on your computer. It supports a wide range of models, such as:

  • phi-2
  • Starling-lm (trained by Berkeley)
  • LLaMA-2
  • CodeLLaMA
  • Mistral
  • Mixtral
  • Dolphin-mixtral
  • Multimodal models (llava)
  • Uncensored models (dolphin, llama and more)
  • And at least 20 more models.

Each one of them has different tags for parameter counts (7b, 30b, etc.), and one can simply pull any model they want and run it. Before we get to the “how-to,” I want to talk more about the features Ollama provides and why it’s a game-changer:

  • Querying models through the CLI, a RESTful API, or Docker
  • Tweaking models using the Modelfile, which is the blueprint to create and share models with Ollama
  • Importing models using source model weights found on Hugging Face or similar
  • Many community integrations
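To make the Modelfile idea concrete, here is a minimal sketch (the model name, temperature, and system prompt are illustrative; see Ollama’s docs for the full list of directives):

```shell
# Write a minimal Modelfile that derives a custom model from llama2
cat > Modelfile <<'EOF'
FROM llama2
PARAMETER temperature 0.7
SYSTEM You are a concise assistant that answers in one short paragraph.
EOF

# Build and run it (requires a running Ollama installation):
# ollama create concise-llama -f Modelfile
# ollama run concise-llama "What is a Transformer?"
```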

The landscape of Large Language Models (LLMs) is rapidly evolving, and tools like Ollama and Hugging Face’s Transformers are making these powerful technologies more accessible than ever before. With Ollama and Transformers, individuals and small teams now have the capability to run state-of-the-art models locally, without the need for expensive hardware or extensive technical know-how. This democratization of AI tools is a game-changer, leveling the playing field and allowing a wider range of people to explore, innovate, and contribute to the field of artificial intelligence.

I run it on my MacBook Pro M1 Max, and the installation is fairly easy. All you need to do is download the binary from ollama.ai and run it. Currently, they only offer support for macOS and Linux.
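On Linux, the project also publishes an install script; a minimal sketch (the URL reflects ollama.ai at the time of writing; as always, review scripts before piping them into a shell):

```shell
# macOS: download the app from ollama.ai and run it.
# Linux: install via the official script:
curl -fsSL https://ollama.ai/install.sh | sh
```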

After running, the Ollama logo will appear at the top, indicating that it’s running and ready.

ollama-logo

Now all you need to do is run the following command:

Run ollama pull <model>:<tag> to pull a model with a specific tag. For example, ollama pull llama2:70b downloads the 70B version of llama2.

The list of available models can be found in Ollama’s library.
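Once a model is pulled, you can chat with it from the CLI or query the local REST API that Ollama serves (port 11434 by default). A minimal sketch, assuming the Ollama server is running and llama2 has been pulled:

```shell
# One-shot prompt from the CLI (run `ollama run llama2` with no prompt
# for an interactive chat session):
ollama run llama2 "Explain what a Transformer is in one sentence."

# The same thing over the local REST API:
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Explain what a Transformer is in one sentence.",
  "stream": false
}'
```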

Ollama Web UI

I found this tool to be very helpful and friendly to use. It’s a ChatGPT-like interface for Ollama: Ollama Web UI.

It’s packed with a huge amount of features, like (taken from the repo):

  • 🖥️ Intuitive Interface

  • 🚀 Effortless Setup

  • 💻 Code Syntax Highlighting

  • ✒️🔢 Full Markdown and LaTeX Support

  • 📜 Prompt Preset Support

  • 👍👎 RLHF Annotation

  • 📥🗑️ Download/Delete Models

  • ⬆️ GGUF File Model Creation

  • 🤖 Multiple Model Support

  • 🔄 Multi-Modal Support

  • 🧩 Modelfile Builder

  • ⚙️ Conversations With Multiple Models

  • 🤝 OpenAI API Integration

  • 🔄 Regeneration History Access

  • 📜 Chat History

  • 📤📥 Import/Export Chat History

  • 🗣️ Voice Input Support

  • ⚙️ Fine-Tuned Control with Advanced Parameters

  • 🔗 External Ollama Server Connection

  • 🔐 Role-Based Access Control (RBAC)

  • 🔒 Backend Reverse Proxy Support

To install it, all you need to do is:

git clone https://github.com/ollama-webui/ollama-webui.git
cd ollama-webui/

# Copying required .env file
cp -RPp example.env .env

# Building Frontend - Assuming you have node.js installed on your machine.
npm i
npm run build

# Serving Frontend with the Backend
cd ./backend
pip install -r requirements.txt
sh start.sh

Conclusion

Ollama’s features, from its user-friendly CLI and RESTful API to its comprehensive open-source WebUI, allow users to leverage LLMs in new and exciting ways. Whether it’s natural language processing or multimodal tasks, Ollama provides an intuitive and flexible platform for experimentation and development.

As we move forward, the continued growth and improvement of platforms like Ollama will be crucial to an innovative AI community. By lowering the barriers to entry and providing robust, user-friendly tools, we can expect to see a surge in creative and diverse applications of LLMs. The future of AI is not just in the hands of big tech companies but also in the creative minds of individual developers, researchers, and enthusiasts around the world, thanks to tools like Ollama and Transformers.

Thank you for reading! See ya in the next one! 🤗👨‍💻
