Easy Non-Coding Local LLM
Compared to hosted services such as ChatGPT, Claude, or Bard, local LLMs offer data privacy, security, offline availability, customization, and more.

0. Before you continue
AI tooling evolves rapidly, so this article is very time-sensitive; specific versions and formats may have changed since publication.
1. Background
- Hugging Face is a machine learning (ML) and data science platform and community that helps users build, deploy, and train machine learning models. It provides the infrastructure to demo, run, and deploy artificial intelligence (AI) in live applications. (Source: Hugging Face)
- LM Studio is a user-friendly desktop application designed for experimenting with local and open-source Large Language Models (LLMs). It allows users to discover, download, and run ggml/gguf-compatible models from Hugging Face. (Source: LM Studio)
2. LLM Download
Note: this article uses Windows as the operating system; LM Studio is also available for Apple Silicon (M1/M2) Macs.
- For the demo, I will use the Nous Hermes Llama 2 7B - GGUF model. All Nous-Hermes-Llama-2-7B-GGUF variants:
| Name | Quant method | Bits | Size | Max RAM required | Use case |
|---|---|---|---|---|---|
| nous-hermes-llama-2-7b.Q2_K.gguf | Q2_K | 2 | 2.83 GB | 5.33 GB | smallest, significant quality loss - not recommended for most purposes |
| nous-hermes-llama-2-7b.Q3_K_S.gguf | Q3_K_S | 3 | 2.95 GB | 5.45 GB | very small, high quality loss |
| nous-hermes-llama-2-7b.Q3_K_M.gguf | Q3_K_M | 3 | 3.30 GB | 5.80 GB | very small, high quality loss |
| nous-hermes-llama-2-7b.Q3_K_L.gguf | Q3_K_L | 3 | 3.60 GB | 6.10 GB | small, substantial quality loss |
| nous-hermes-llama-2-7b.Q4_0.gguf | Q4_0 | 4 | 3.83 GB | 6.33 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
| nous-hermes-llama-2-7b.Q4_K_S.gguf | Q4_K_S | 4 | 3.86 GB | 6.36 GB | small, greater quality loss |
| nous-hermes-llama-2-7b.Q4_K_M.gguf | Q4_K_M | 4 | 4.08 GB | 6.58 GB | medium, balanced quality - recommended |
| nous-hermes-llama-2-7b.Q5_0.gguf | Q5_0 | 5 | 4.65 GB | 7.15 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
| nous-hermes-llama-2-7b.Q5_K_S.gguf | Q5_K_S | 5 | 4.65 GB | 7.15 GB | large, low quality loss - recommended |
| nous-hermes-llama-2-7b.Q5_K_M.gguf | Q5_K_M | 5 | 4.78 GB | 7.28 GB | large, very low quality loss - recommended |
| nous-hermes-llama-2-7b.Q6_K.gguf | Q6_K | 6 | 5.53 GB | 8.03 GB | very large, extremely low quality loss |
| nous-hermes-llama-2-7b.Q8_0.gguf | Q8_0 | 8 | 7.16 GB | 9.66 GB | very large, extremely low quality loss - not recommended |
- For this demo, I'm using the `nous-hermes-llama-2-7b.Q4_K_M.gguf` model.
- Put the file under `/your_path/author_name/repo_name/`, matching LM Studio's author/repo folder layout. A scripted way to download and place the file is sketched below.
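If you prefer to script the download, here is a minimal sketch using the `huggingface_hub` package. The repo id `TheBloke/Nous-Hermes-Llama-2-7B-GGUF` and the destination path are assumptions; adjust them to the model page you actually use.

```python
# Minimal download sketch (assumptions: the model lives at
# TheBloke/Nous-Hermes-Llama-2-7B-GGUF, and huggingface_hub is
# installed via `pip install huggingface_hub`).
from pathlib import Path

from huggingface_hub import hf_hub_download

# Mirror the author_name/repo_name layout described above.
models_dir = Path("/your_path/TheBloke/Nous-Hermes-Llama-2-7B-GGUF")
models_dir.mkdir(parents=True, exist_ok=True)

path = hf_hub_download(
    repo_id="TheBloke/Nous-Hermes-Llama-2-7B-GGUF",  # assumed repo id
    filename="nous-hermes-llama-2-7b.Q4_K_M.gguf",
    local_dir=models_dir,
)
print(f"Model saved to {path}")
```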
3. Essentials
- As of 2023-10-01: GGUF is a new format introduced by the llama.cpp team on August 21st, 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. GGUF offers numerous advantages over GGML, such as better tokenisation and support for special tokens. It also supports metadata and is designed to be extensible; you can inspect that metadata yourself, as shown below.
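Because GGUF embeds its metadata in the file itself, you can inspect a downloaded model directly. A minimal sketch, assuming the `gguf` package published alongside llama.cpp (`pip install gguf`); the file path is a placeholder:

```python
# Sketch of inspecting GGUF metadata, assuming the `gguf` package
# from the llama.cpp project; the path below is a placeholder.
from gguf import GGUFReader

reader = GGUFReader("nous-hermes-llama-2-7b.Q4_K_M.gguf")

# Print the metadata keys embedded in the file (architecture,
# tokenizer settings, special tokens, and so on).
for name in reader.fields:
    print(name)

# The tensor count is a rough sanity check that the file is intact.
print(f"{len(reader.tensors)} tensors")
```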
- Download the latest version of Visual Studio and install the following components:
  - Desktop development with C++
  - MSVC vxxx - VS 2022 C++ x64/x86 build tools
  - C++ Clang Compiler for Windows
  - C++ CMake tools for Windows
- Run `pip install llama-cpp-python`. A quick smoke test of the installation is sketched below.
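Once installed, a short script can confirm the model loads and generates. A minimal sketch; the model path is a placeholder, and the Alpaca-style prompt format is an assumption based on the Nous Hermes model card:

```python
# Smoke test for llama-cpp-python; the model path is a placeholder,
# and the Alpaca-style prompt format is an assumption based on the
# Nous Hermes model card.
from llama_cpp import Llama

llm = Llama(
    model_path="/your_path/author_name/repo_name/nous-hermes-llama-2-7b.Q4_K_M.gguf",
    n_ctx=2048,  # Llama 2's native context window
)

output = llm(
    "### Instruction:\nExplain in one sentence why local LLMs help with privacy.\n\n### Response:\n",
    max_tokens=128,
    stop=["### Instruction:"],  # stop before the model starts a new turn
)
print(output["choices"][0]["text"].strip())
```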
4. LM Studio Download
- Go to the official website (lmstudio.ai) and download the latest version of LM Studio.
- Load your model and enjoy. If you later want to call the model from your own code, LM Studio also ships an OpenAI-compatible local server; see the sketch below.
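Although this guide is non-coding, LM Studio's local inference server speaks the OpenAI chat API. A minimal sketch, assuming the server has been started from LM Studio's server tab on the default port 1234:

```python
# Sketch of querying LM Studio's OpenAI-compatible local server,
# assuming it is running on the default port 1234.
import requests

resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "local-model",  # placeholder; the loaded model is used
        "messages": [{"role": "user", "content": "Hello from my local LLM!"}],
        "temperature": 0.7,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```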

Copyright statement: Unless otherwise stated, all articles on this blog adopt the CC BY-NC-SA 4.0 license agreement. For non-commercial reprints and citations, please indicate the author: Henry, and original article URL. For commercial reprints, please contact the author for authorization.