How to Build and Run a Self-Improving AI Agent with Hermes on NVIDIA Hardware


Introduction

Agentic AI is changing how people work, and the open-source community has embraced frameworks that make self-improving agents practical. Hermes Agent, developed by Nous Research, has quickly become the most used agent worldwide on OpenRouter, with over 140,000 GitHub stars in under three months. It is designed for reliability and continuous learning, two capabilities that have historically been difficult to achieve. Running it locally on NVIDIA RTX PCs, RTX PRO workstations, or DGX Spark gives you always-on, high-speed performance. This guide walks through setting up Hermes, using its distinctive features such as self-evolving skills and contained sub-agents, and pairing it with Qwen 3.6 models to build a capable local AI assistant.

Source: blogs.nvidia.com

What You Need

Hardware: An NVIDIA RTX GPU (e.g., RTX 30-series or newer), NVIDIA RTX PRO workstation, or NVIDIA DGX Spark. At least 20GB of VRAM is recommended for the Qwen 3.6 35B model; 8GB or more is enough for smaller models.
Software: Python 3.10+, Git, and NVIDIA CUDA Toolkit (12.x) with cuDNN. A package manager like pip or conda.
Optional: A local LLM such as Qwen 3.6 27B or 35B (Alibaba) to maximize Hermes’s self-improvement capabilities.
Network: Internet connection for initial downloads; Hermes runs offline after setup.

Step-by-Step Guide

Step 1: Verify Your Hardware Environment

Confirm that your system can run a local agent before installing anything. Check that your GPU has enough VRAM for the model you plan to use: at least 8GB for smaller models, and 20GB or more for Qwen 3.6 35B. Hermes is optimized for always-on use on NVIDIA hardware, so an NVIDIA RTX GPU or a DGX Spark lets you run it around the clock without cloud dependencies. Update your NVIDIA drivers to the latest version to ensure compatibility with CUDA and PyTorch.
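The VRAM check can be scripted. The sketch below shells out to nvidia-smi with its standard --query-gpu flags and sorts each GPU into the tiers this guide mentions (20GB for the 35B model, 8GB for smaller ones). The function names and the half-gigabyte tolerance are illustrative choices, not part of Hermes.

```python
import subprocess

# Standard nvidia-smi query flags; output is one CSV row per GPU, memory in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total,memory.used",
    "--format=csv,noheader,nounits",
]


def _to_int(field):
    """Return an int, or None when nvidia-smi reports a non-numeric value such as [N/A]."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_gpu_rows(csv_text):
    """Turn nvidia-smi CSV text into (name, total_mib, used_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        # Split from the right so a comma inside the GPU name cannot break parsing.
        name, total, used = line.rsplit(",", 2)
        rows.append((name.strip(), _to_int(total), _to_int(used)))
    return rows


def model_tier(total_mib):
    """Map total VRAM to the guide's tiers, with a 0.5 GiB tolerance (an assumption)."""
    if total_mib is None:
        return "unknown"
    gib = total_mib / 1024
    if gib >= 19.5:
        return "35B"
    if gib >= 7.5:
        return "smaller models"
    return "below the guide's minimum"


def main():
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        print(f"Could not query nvidia-smi: {err}")
        return
    for name, total, used in parse_gpu_rows(result.stdout):
        print(f"{name}: total={total} MiB, used={used} MiB, tier={model_tier(total)}")


if __name__ == "__main__":
    main()
```

If nvidia-smi is missing or fails, the script reports that instead of raising, which is itself a hint that the driver installation needs attention.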

Step 2: Install Hermes Agent Framework

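Clone the official Hermes repository from GitHub (confirm the exact repository name on the Nous Research GitHub organization page, since it may differ from the URL shown here). Open a terminal and run: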

git clone https://github.com/NousResearch/Hermes.git
cd Hermes

Create a Python virtual environment to avoid conflicts:

python -m venv herm-env
source herm-env/bin/activate  # On Windows: herm-env\Scripts\activate

Install the required dependencies:

pip install -r requirements.txt

This pulls in libraries for model loading, tool integration, and GPU acceleration.

Step 3: Configure Hermes for Local Execution

Hermes is provider- and model-agnostic, but for local use you’ll need to set it to run on your NVIDIA GPU. Edit the configuration file (usually config.yaml or .env) to specify:

Model path: Point to a local LLM (e.g., Qwen 3.6 35B or another model you have downloaded).
Device: Set to cuda to leverage GPU acceleration.
Always-on mode: Enable persistent execution so Hermes runs continuously.

Example snippet:

model:
  path: "/path/to/qwen3.6-35b"
  device: "cuda"
agent:
  persistent: true

Save the file and test the configuration by running a simple command like python herm.py --check.
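Before launching, it can help to sanity-check the configuration values yourself. The sketch below validates a dictionary shaped like the example snippet above. The key names (model.path, model.device, agent.persistent) are taken from that snippet and have not been verified against Hermes's actual schema; validate_config is a hypothetical helper, not a Hermes function.

```python
import os


def validate_config(cfg, check_path=True):
    """Return a list of problems found in a config dict; an empty list means it looks sane.

    The expected keys mirror the example snippet in this guide (an assumption).
    """
    problems = []
    model = cfg.get("model")
    agent = cfg.get("agent")
    if not isinstance(model, dict):
        model = {}
    if not isinstance(agent, dict):
        agent = {}

    path = model.get("path")
    if not isinstance(path, str) or not path:
        problems.append("model.path must be a non-empty string")
    elif check_path and not os.path.isdir(path):
        problems.append(f"model.path is not an existing directory: {path}")

    if model.get("device") not in ("cuda", "cpu"):
        problems.append("model.device should be 'cuda' (GPU) or 'cpu'")

    if not isinstance(agent.get("persistent"), bool):
        problems.append("agent.persistent must be true or false")

    return problems


if __name__ == "__main__":
    example = {
        "model": {"path": "/path/to/qwen3.6-35b", "device": "cuda"},
        "agent": {"persistent": True},
    }
    # check_path=False because the placeholder path does not exist on disk.
    for problem in validate_config(example, check_path=False) or ["no problems found"]:
        print(problem)
```

To run this against the real file, the dictionary could be produced with a YAML parser such as PyYAML's yaml.safe_load, which is a third-party dependency and is left out here to keep the sketch standard-library only.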

Step 4: (Optional) Download and Integrate Qwen 3.6 Models

For the best results with Hermes, use the Qwen 3.6 series from Alibaba. These open-weight LLMs are designed for local agents. The 35B model runs in roughly 20GB of VRAM and outperforms previous 120B-parameter models. Download the model from Hugging Face or the official repository:

pip install huggingface-hub
huggingface-cli download Qwen/Qwen3.6-35B

Then update your Hermes configuration to point to the downloaded model folder. The 27B variant is also available and delivers accuracy matching 400B-parameter predecessors, making it ideal for lower-memory systems.
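The VRAM figures above can be checked with simple arithmetic: weight memory is roughly parameter count times bits per weight, divided by eight. The sketch below assumes the roughly 20GB figure for the 35B model refers to 4-bit quantized weights plus runtime overhead; the guide does not state the precision, so treat that as an assumption.

```python
def weight_memory_gb(params_billions, bits_per_weight):
    """Approximate memory for model weights alone, in decimal gigabytes.

    params_billions * 1e9 weights * (bits / 8) bytes each, divided by 1e9 bytes per GB.
    Excludes the KV cache, activations, and framework overhead.
    """
    return params_billions * bits_per_weight / 8


if __name__ == "__main__":
    for params in (27, 35):
        for bits in (4, 8, 16):
            gb = weight_memory_gb(params, bits)
            print(f"{params}B at {bits}-bit: about {gb:.1f} GB for weights")
```

At 4 bits, 35 billion parameters come to 17.5GB of weights, which leaves a couple of gigabytes of headroom on a 20GB card for the KV cache and runtime. The same model at 16 bits would need 70GB for weights alone.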

Step 5: Launch Hermes and Explore Core Capabilities

Start the agent with:

python herm.py

Once running, you can interact via command line, integrate with messaging apps, or allow file access. Hermes’s unique features become active automatically:

Self-Evolving Skills: When you give feedback or assign a complex task, Hermes writes a new skill and saves it for future use. It learns and adapts over time.
Contained Sub-Agents: For multi-step tasks, Hermes spawns isolated sub-agents with focused contexts. This keeps the main agent efficient and avoids confusion.
Reliability by Design: Every skill and tool is curated by Nous Research, so you don’t need constant debugging—even with 30B-parameter local models.

Try giving it a challenging task like “Organize my documents by project and summarize each folder” and observe how it refines its approach.
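To make the idea of contained sub-agents concrete, here is a toy model of context isolation. It is a conceptual illustration only and not Hermes's implementation: SubAgent, MainAgent, and the worker callable (a stand-in for a model call) are all hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class SubAgent:
    """Handles one subtask with its own private context."""
    task: str
    context: list = field(default_factory=list)

    def run(self, worker):
        # Intermediate notes stay inside this sub-agent's context.
        self.context.append(f"task: {self.task}")
        result = worker(self.task)
        self.context.append(f"result: {result}")
        return result


class MainAgent:
    """Delegates subtasks and keeps only a one-line summary of each."""

    def __init__(self):
        self.context = []

    def delegate(self, subtasks, worker):
        for task in subtasks:
            sub = SubAgent(task)
            result = sub.run(worker)
            # Only the summary enters the main context; sub.context is discarded.
            self.context.append(f"{task} -> {result}")
        return self.context


if __name__ == "__main__":
    main_agent = MainAgent()
    # str.upper stands in for real work done by a model.
    for line in main_agent.delegate(["list folders", "summarize folder"], str.upper):
        print(line)
```

The point of the design is that the main agent's context grows by one line per subtask, no matter how much intermediate work each sub-agent does.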

Step 6: Enable Self-Improvement Through Feedback

The real power of Hermes is its ability to improve itself. After each interaction, provide explicit feedback (e.g., “That worked well, save the method” or “Please try a different approach”). Hermes records these learnings as new skills. Over time, it becomes more efficient and accurate without manual reprogramming. You can also review the skill library by calling herm.skills.list() to see what it has learned.
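The feedback loop described above can be pictured as a small persistent store: approved approaches are saved under a name, rejected ones are not. The sketch below is a conceptual toy under that assumption. SkillStore and its JSON file format are hypothetical and do not reflect how Hermes actually stores skills.

```python
import json
import tempfile
from pathlib import Path


class SkillStore:
    """Toy skill library persisted as a JSON file (hypothetical format)."""

    def __init__(self, path):
        self.path = Path(path)
        # Load previously saved skills so learning survives restarts.
        if self.path.exists():
            self.skills = json.loads(self.path.read_text())
        else:
            self.skills = {}

    def record_feedback(self, name, steps, approved):
        """Save the steps as a named skill only when the user approved them."""
        if approved:
            self.skills[name] = list(steps)
            self.path.write_text(json.dumps(self.skills, indent=2))
        return approved

    def names(self):
        """Return the saved skill names in alphabetical order."""
        return sorted(self.skills)


if __name__ == "__main__":
    store_path = Path(tempfile.mkdtemp()) / "skills.json"
    store = SkillStore(store_path)
    store.record_feedback("organize-docs", ["scan folders", "group by project"], approved=True)
    store.record_feedback("rejected-idea", ["guess"], approved=False)
    # A fresh instance reads the same file, simulating a restart.
    print(SkillStore(store_path).names())
```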

Tips for Success

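Start with Qwen 3.6 27B if your GPU has less than 20GB of VRAM; it matches the accuracy of much larger models. Check that the quantized weights fit your card first: at 4-bit precision, 27 billion parameters take about 13.5GB for weights alone, so a 12GB card such as the RTX 3060 would need partial CPU offloading or a smaller model.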
Monitor VRAM usage with nvidia-smi to avoid out-of-memory errors. Stop other GPU tasks while running Hermes.
Consider DGX Spark for 24/7 operation; it is designed for always-on local agents, so sustained load is less likely to cause overheating or power-draw problems.
Back up your skill repository regularly; it contains your customized improvements and can be shared across installations.
To see Hermes’s orchestration advantage, run the same model on the same task in another framework and compare the results; Hermes is intended to deliver better results without extra tuning.
Join the community on Nous Research’s Discord or GitHub for shared skills and troubleshooting tips.
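Regular backups of the skill repository are easy to automate. The sketch below zips a directory into a timestamped archive using only the standard library. The guide does not say where Hermes stores its skills, so the directory you pass in is an assumption you will need to confirm for your installation; backup_skills is a hypothetical helper.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def backup_skills(skills_dir, backup_dir, now=None):
    """Zip skills_dir into backup_dir as skills-YYYYMMDD-HHMMSS.zip and return the path.

    Keep backup_dir outside skills_dir so the archive does not include itself.
    """
    skills_dir = Path(skills_dir)
    if not skills_dir.is_dir():
        raise FileNotFoundError(f"skills directory not found: {skills_dir}")
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    # make_archive appends the .zip extension and returns the archive path.
    archive = shutil.make_archive(
        str(backup_dir / f"skills-{stamp}"), "zip", root_dir=str(skills_dir)
    )
    return Path(archive)


if __name__ == "__main__":
    # Demo with temporary directories standing in for the real locations.
    source = Path(tempfile.mkdtemp())
    (source / "example-skill.md").write_text("steps go here")
    print(backup_skills(source, Path(tempfile.mkdtemp())))
```

Because each archive name carries a timestamp, repeated runs never overwrite earlier backups, and any archive can be unpacked on another installation to share skills.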

By following these steps, you turn your NVIDIA-powered PC into a self-improving AI assistant that works locally, privately, and reliably. Whether you’re automating workflows, managing files, or exploring agentic AI, Hermes with Qwen 3.6 unlocks a new level of productivity.