Running AI locally

Martin Průcha, 16. 04. 2024


Shifting AI applications towards local operation brings several benefits, such as better data security and the ability to feed the model with internal data more easily. It is part of the broader trend of edge computing, which moves computation and data storage closer to where they are needed, improving response times and saving bandwidth. There are several ways to integrate AI into software of your own, from downloading an open-source AI model to buying a specialized chip with a chatbot built in.

To keep the topic concise, we will focus on the Large Language Model (LLM). An LLM can be conceptualized as a vast network of interconnected digital neurons that loosely resembles the neural networks of the human brain – the difference being that where the neurons in a human brain are physical, in an LLM everything is virtual (the actual computation does not run on neurons at all but on conventional chips). During training, the connections between these nodes are weighted on a massive dataset of text, often followed by fine-tuning on labeled question-answer pairs. Through this data, the model learns the intricacies of language usage. 'Weighting' in this context refers to adjusting the numerical values associated with the connections between the nodes.
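To make 'weighting' concrete, here is a minimal sketch of a single artificial neuron in Python. The input and weight values are invented for illustration; a real LLM adjusts billions of such weights during training.

```python
# A toy illustration of 'weighting': one artificial neuron computing
# a weighted sum of its inputs. The numbers are made up for the example.

inputs = [0.5, -1.2, 3.0]    # signals arriving from other neurons
weights = [0.8, 0.1, -0.4]   # the adjustable numerical values ('weights')
bias = 0.2                   # a per-neuron offset, also learned

# Weighted sum plus bias, passed through a simple activation function.
z = sum(x * w for x, w in zip(inputs, weights)) + bias
activation = max(0.0, z)     # ReLU, a common choice in neural networks

print(activation)            # training nudges the weights so outputs improve
```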

Downloading an LLM means obtaining this complex piece of software, which contains a pre-trained neural network. Many open-source LLMs are published on GitHub. Although these initial models come pre-trained, they can be fine-tuned to specialize further in particular topics or styles of communication.
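As a rough sketch of what downloading and running such a model can look like in practice, the snippet below uses the Hugging Face transformers library (one common distribution channel for open-source models, alongside GitHub); the tiny "gpt2" model is just a stand-in, not a recommendation.

```python
# Minimal sketch: fetch and run a small pre-trained model with the
# Hugging Face `transformers` library (pip install transformers torch).
from transformers import pipeline

# The first call downloads the model weights and caches them locally;
# "gpt2" is a small stand-in, larger open-source LLMs work the same way.
generator = pipeline("text-generation", model="gpt2")

result = generator("Running AI locally means", max_new_tokens=30)
print(result[0]["generated_text"])
```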

Implementing an LLM locally requires a certain level of programming expertise, a powerful computer capable of handling intensive processing tasks, and a user interface (UI) to interact with the model (though a simple command-line interface can do the job, as sketched below). For smaller-scale projects, Neurochat by Reddit user Alfredo Ortega may be enough.
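To show how little UI is actually needed, here is a bare-bones command-line loop around a local model. It keeps the stand-in model from the previous snippet; a real setup would swap in a larger model (or a dedicated runtime such as llama.cpp) while keeping the same read-generate-print loop.

```python
# A bare-bones command-line interface: read a prompt, generate, repeat.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # stand-in model

while True:
    prompt = input("> ")
    if prompt.strip().lower() in {"quit", "exit"}:
        break
    reply = generator(prompt, max_new_tokens=60, do_sample=True)
    print(reply[0]["generated_text"])
```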

However, one limitation of running LLMs locally is that open-source models may not be as powerful as their commercial counterparts:

While open-source models are a great starting point, they typically have a smaller token window – the span of text the model can consider at one time – which can limit their effectiveness for complex tasks. Commercial models can operate with windows of up to a million tokens, enabling a deeper and more nuanced understanding of the text.
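The token window is easy to see in code: the input text is split into tokens, and only a fixed number of them fit into the model at once. A sketch using the GPT-2 tokenizer, whose window is 1,024 tokens, as a stand-in:

```python
# Sketch: measuring text against a model's token window.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
context_window = 1024   # GPT-2's limit; commercial models go far higher

text = "some long document " * 500
token_ids = tokenizer.encode(text)
print(f"{len(token_ids)} tokens in the input")

# Anything beyond the window has to be dropped; here we keep the newest tokens.
truncated = token_ids[-context_window:]
print(f"{len(truncated)} tokens actually fit into the model")
```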

For those who do not have a background in programming or the resources to run an LLM, there are alternatives. Companies like NVIDIA offer platforms that make it easier to tap the power of AI without in-depth technical knowledge: NVIDIA's Chat with RTX invites users to chat with a generative AI model powered by their RTX GPUs. Another interesting project is Flitig, which promises to build autonomous AI coworkers designed for the user, as is shown in this video.

Running AI locally offers a range of benefits, from improved latency to enhanced privacy, but it requires the right expertise and hardware. For those who cannot overcome these hurdles, the alternatives offer a taste of what AI can do.

Author: Priklenk
