Unlocking the Future of AI: Harnessing Local LLMs and Autogen for Unprecedented Power

Mastering the Art of AI Deployment: Empowering Your Arsenal with Local LLMs such as Llama2 and Mistral-7B

Unlocking the Future of AI: Harnessing Local LLMs and Autogen for Unprecedented Power
AutoGen - Enable Next-Gen LLM Apps
a framework that enables development of LLM applications using multiple agents that can converse with each other to solve task
AutoGen Example II - Snake Game
Classic snake game generated by AutoGen

1. Background

Are you in search of a way to create a formidable army of organized AI agents with Autogen, using local LLMs, instead of relying on the paid OpenAI service? Well, you've come to the right place!

While Chat LLMs are undeniably impressive, giving them the capability to act as intelligent agents takes things to a whole new level. What if you could harness the power of multiple such agents? Enter Microsoft's cutting-edge Autogen project.

However, there's a significant hurdle to overcome. Autogen was initially designed to be closely integrated with OpenAI, but this approach has its limitations. It can be quite expensive, and it's subject to censorship and lacks sentience.

This is where the simplicity and versatility of using a local LLM like Mistral-7B come into play. You're not limited to just one model; you have the freedom to choose from a variety of options, including Llama2, Falcon, Vicuna, Alpaca, and more. The only limit is your hardware's capabilities.

The key to making this work is to employ the OpenAI JSON format for output in your local LLM server, such as Oobabooga's text-generation-webui, and then seamlessly connect it to Autogen. That's precisely what we're going to guide you through today.

It's worth noting that there are other methods available for making LLMs generate text in OpenAI API format, such as using the llama.cpp Python bindings.

In this comprehensive tutorial, we'll cover the following steps:

  1. Acquiring Oobabooga's text-generation-webui, an LLM (Mistral-7B), and Autogen.
  2. Configuring the OpenAI format extension on Oobabooga.
  3. Initiating the local LLM server with the OpenAI format.
  4. Connecting it to Autogen.

So, without further ado, let's get started on this exciting journey.

2. Setting Up Oobabooga's Text Generation Web UI with LLM (Mistral-7B) and Autogen

Before you begin, it's a good idea to set up a virtual environment to manage your Python packages. If you're new to this concept, create a new virtual environment and activate it. It's a neat way to keep your project's dependencies isolated.

Getting Oobabooga's Text Generation Web UI:

Oobabooga's Text Generation Web UI is a well-known tool for hosting Large Language Models (LLMs) on your local machine. You can get started by visiting the web page dedicated to this tool and following their straightforward installation guide. Additionally, if you're utilizing an NVIDIA GPU for acceleration, consider downloading CUDA for better performance.

Getting an LLM (Mistral-7B-Instruct):

After downloading the Text Generation Web UI, hold off on running it for now. You'll need to obtain an LLM to bring your project to life.

Today, we're diving into Mistral-7B, specifically Mistral-7B-instruct-v0.1.Q4_K_S.gguf, which is an optimized version of the model developed by TheBloke. You can choose the optimized model that best suits your needs based on the information provided in the description.

Depending on your hardware capabilities, you have the flexibility to select either a smaller or larger LLM model. Feel free to experiment on your computer; after all, we're exploring the realms of science here.

To acquire the LLM, head over to the Files and Versions page and download the following files:

  • config.json
  • Mistral-7B-instruct-v0.1.Q4_K_S.gguf (this version performs well on most setups)

Once you've downloaded these files, navigate to the installation folder of Text Generation Web UI and find the "models" folder. Within this folder, create a new directory with a name of your choice, such as "mistral-7b-instruct." The path will resemble this:


Place both the config.json file and the model.gguf in this newly created folder.

Getting Autogen:

To install Microsoft's multi-agent Python library for generating content, simply use the pip package installer in your terminal:

pip install pyautogen

With these components in place, you'll be ready to harness the power of Text Generation Web UI, your chosen LLM, and the Autogen library for your text generation endeavors.

3. Setting Up OpenAI Format Extension on Oobabooga

Now that you've successfully installed the text-generation-webui and acquired the LLM, the next step is to configure your local Oobabooga server to communicate in the OpenAI JSON format. OpenAI's API formats and features are well-documented, offering you a wide array of possibilities to explore.

To integrate Autogen with your local server, you'll need to activate the "openai" extension in the Oobabooga's text-generation-webui extensions folder.

Here's how you can set it up:

  1. Locate the OpenAI Extension: Open your terminal and navigate to the "text-generation-webui/extensions/openai" folder within your Oobabooga installation directory.

  2. Install Extension Requirements: In this directory, you'll find a "requirements.txt" file. Install the necessary requirements by running the following command in your terminal:

pip install -r requirements.txt

By following these steps, you'll enable your Oobabooga server to speak in the OpenAI JSON format, allowing for seamless integration with Autogen and enhancing your text generation capabilities. This step brings you closer to harnessing the full power of your local server for AI-driven content generation.

4. Starting the Local LLM Server in OpenAI Format

Now that you've prepared the OpenAI extension and configured the Oobabooga server, it's time to get the LLM server up and running from the text-generation-webui root folder.

While the term "webui" implies a web-based user interface, you can also use it as a standalone server to access APIs from other programs you develop.

To initiate the local server with the OpenAI API extension, execute the following command based on your operating system:

For Windows:

./start_windows.bat --extensions openai --listen --loader llama.cpp --model mistral-7b-instruct

For Linux:

./start_linux.sh --extensions openai --listen --loader llama.cpp --model mistral-7b-instruct

For MacOS:

./start_macos.sh --extensions openai --listen --loader llama.cpp --model mistral-7b-instruct

Let's break down the command:

  • We include the "extensions openai" parameter to load the OpenAI extension.
  • "Listen" is used to start a server that Autogen can query.
  • "Loader" and "model" are used to specify the loader for the model and the model folder name we created earlier, which contains the config.json and model.gguf files.

The web interface is now running on your localhost at port 7860, but more importantly, your OpenAI-compatible API is also ready for Autogen to access at your local host via This setup enables seamless communication between Autogen and your local LLM server, allowing you to unleash the full potential of text generation for your projects.

5. Connecting Autogen to Your Local LLM Server

By now, you have Autogen installed, and it's time to connect it to your local LLM server. Start by creating a new directory wherever it's convenient for you, and add a new Python file named autogen.py (you can rename it as you like).

Typically, if you were connecting to OpenAI GPT's API, your script would begin like this:

import autogen  # Start by importing the autogen library

config_list = [
        'model': 'gpt-3.5-turbo',
        'api_key': 'your OpenAI API key'

However, for your local server, you'll initiate it as follows:

import autogen  # Start by importing the autogen library

config_list = [
        "model": "mistral-instruct-7b",  # The name of your running model
        "api_base": "",  # The local address of the API
        "api_type": "open_ai",
        "api_key": "sk-12345678901234567890",  # Just a placeholder

Since you're working locally, you don't need a real API key, so the "sk-1234567..." placeholder will suffice.

Next, let's set up the agent and the human user. Please read the comments for a better understanding:

import autogen  # Start by importing the autogen library

config_list = [
        "model": "mistral-instruct-7b",  # The name of your running model
        "api_base": "",  # The local address of the API
        "api_type": "open_ai",
        "api_key": "sk-12345678901234567890",  # Just a placeholder

# Create an AI AssistantAgent named "assistant"
assistant = autogen.AssistantAgent(
        "seed": 42,  # Seed for caching and reproducibility
        "config_list": config_list,  # A list of OpenAI API configurations
        "temperature": 0,  # Temperature for sampling
        "request_timeout": 400,  # Timeout
    },  # Configuration for Autogen's enhanced inference API, compatible with OpenAI API

# Create a human UserProxyAgent instance named "user_proxy"
user_proxy = autogen.UserProxyAgent(
    is_termination_msg=lambda x: x.get("content", "").rstrip().endswith("TERMINATE"),
        "work_dir": "agents-workspace",  # Set the working directory for the agents to create files and execute
        "use_docker": False,  # Set to True or specify an image name like "python:3" to use Docker

# The assistant receives a message from the user_proxy, which contains the task description
    message="""Create a posting schedule with captions in Instagram for a week and store it in a .csv file.""",

Make sure to change the message in user_proxy.initiate_chat() to reflect your specific instructions.

When you run the script with the given message, you may notice a new directory called "agents-workspace" with a .csv file inside, created "manually" by the agent. This setup facilitates communication between Autogen and your local LLM server for effective text generation.

6. Setting Up Multiple Agents with Roles and Contexts

In this more advanced scenario, we will create a chat group comprising various agents and humans, each with specific roles and contexts. This approach is akin to a messaging app where their contexts (system messages) dictate their behavior and hierarchy. Here's how we'll set it up:

import autogen

# Use the local LLM server, as before
config_list = [
        "model": "mistral-instruct-7b",  # The name of your running model
        "api_base": "",  # The local address of the API
        "api_type": "open_ai",
        "api_key": "sk-12345678901234567890",  # Just a placeholder

# Set a "universal" config for the agents
agent_config = {
    "seed": 42,  # Change the seed for different trials
    "temperature": 0,
    "config_list": config_list,
    "request_timeout": 120,

# Humans
user_proxy = autogen.UserProxyAgent(
   system_message="A human admin. Engage with the planner for plan discussions. Approval from this administrator is required for plan execution.",

executor = autogen.UserProxyAgent(
    system_message="Executor. Execute the code authored by the engineer and provide a report on the outcome.",
    code_execution_config={"last_n_messages": 3, "work_dir": "paper"},

# Agents
engineer = autogen.AssistantAgent(
    system_message='''Engineer. Please adhere to the following coding guidelines:

1. Only proceed with approved plans.
2. When providing Python/shell code for task solutions, encapsulate the code within a code block and specify the script type.
3. Ensure that the code provided is complete and functional, requiring no modifications from the user.
4. Utilize code blocks only when the intention is to execute the code.
5. Do not include multiple code blocks in a single response.
6. Avoid instructing others to copy and paste results.
7. If an error is encountered during execution, rectify the issue and provide the corrected code.
8. Offer the full code for solutions, not partial or code changes.
9. If the error persists or the task remains unsolved even after successful code execution, critically analyze the problem, revisit initial assumptions, gather necessary additional information, and consider alternative approaches.

scientist = autogen.AssistantAgent(
    system_message="""Scientist. You are expected to follow an approved plan and possess the ability to categorize papers based on their abstracts, without the need for coding."""

planner = autogen.AssistantAgent(
    system_message='''Planner. Recommend a plan and iterate on it in response to feedback from both the administrator and a critic until it gains administrative approval. This plan may encompass collaboration with an engineer, capable of coding, and a scientist who does not write code. Begin by presenting the plan, ensuring clarity regarding the responsibilities of the engineer and the scientist at each step.

critic = autogen.AssistantAgent(
    system_message="Critic. Thoroughly review the plan, claims, and code submitted by other agents and offer constructive feedback. Ensure that the plan encompasses verifiable information, such as source URLs, where applicable.",

# Start the "group chat" between agents and humans
groupchat = autogen.GroupChat(agents=[user_proxy, engineer, scientist, planner, executor, critic], messages=[], max_round=50)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=agent_config)

# Start the Chat!
Find recent papers about Multi-agent from arXiv in the last week. Predict the trends.

# To follow up on the previous question, use:
# user_proxy.send(
#     recipient=assistant,
#     message="""Please provide your subsequent response in this space""",
# )

There you have it - your new ensemble of agents ready to tackle complex tasks with clearly defined roles and contexts.

7. Some thoughts

I highly recommend delving deeper into the Autogen documentation to explore the full potential of agency automation. Autogen offers a wide range of capabilities and features, and understanding its capabilities can help you harness its power for a variety of applications.

Once you've gained a solid grasp of how Autogen functions at its core, you might consider using it through a user-friendly interface like Autogen-UI. This can streamline your interactions with Autogen and make it even more accessible for various tasks.

Furthermore, if you're working within a company or organization, you could explore the possibility of integrating Autogen into your company's dashboard or creating a custom interface. This can provide a seamless and customized experience tailored to your specific needs, ensuring that you can efficiently leverage Autogen's capabilities for your projects and tasks.

By thoroughly exploring Autogen and its various interfaces, you'll be better equipped to automate tasks, streamline workflows, and enhance productivity across a wide range of applications, ultimately making the most of this powerful automation tool.

Copyright statement: Unless otherwise stated, all articles on this blog adopt the CC BY-NC-SA 4.0 license agreement. For non-commercial reprints and citations, please indicate the author: Henry, and original article URL. For commercial reprints, please contact the author for authorization.