September 29, 2025

Building AI Agents with Agno and GPT-OSS 120B

Introduction

Modern AI applications increasingly rely on intelligent agents that do more than chat; they reason, search, and collaborate. By using Agno, a lightweight agent framework, together with GPT-OSS 120B, an open-weight large language model hosted on Clarifai and accessible through an OpenAI-compatible API, you can create sophisticated agents with minimal setup.

This tutorial walks you through three progressively advanced examples:

  1. A web-search agent that answers current events questions.

  2. A knowledge-based agent that accesses domain-specific information.

  3. A multi-agent system where specialized agents work together.

You will also find instructions for setting up your environment and a link to a Colab notebook with the full code so you can follow along.

Setting Up the Environment

To get started, install Agno along with the libraries for web search, PDF processing, vector storage, and finance data, plus the Clarifai SDK.

Make sure you have a Clarifai Personal Access Token (PAT) and set it as an environment variable so that your agents can authenticate with Clarifai and access the GPT-OSS-120B model.
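A small check like the following can catch a missing token before any agent is created. It is a minimal sketch: the variable name `CLARIFAI_PAT` and the helper `get_clarifai_pat` are assumptions for illustration, not names taken from this tutorial.

```python
import os


def get_clarifai_pat(env=os.environ):
    """Return the Clarifai PAT from the given environment mapping.

    CLARIFAI_PAT is an assumed variable name; adjust it to match your setup.
    """
    pat = env.get("CLARIFAI_PAT", "").strip()
    if not pat:
        raise RuntimeError(
            "CLARIFAI_PAT is not set; export it before creating agents."
        )
    return pat


# Example with an explicit mapping, so the call does not depend on your shell.
print(get_clarifai_pat({"CLARIFAI_PAT": "example-token"}))
```

In real use you would call `get_clarifai_pat()` with no arguments so that it reads the process environment.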

1. A Simple Agent with Web Search

The first example creates an agent that combines GPT-OSS 120B with DuckDuckGo search to answer questions about recent events. The language model interprets the query, the search tool fetches live information, and the agent then assembles a coherent response.

This straightforward setup demonstrates how easily you can combine reasoning with web search. It serves as the foundation for more complex agents.
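The setup described above might look roughly like the sketch below. The endpoint URL, the model identifier, and the helper names `model_settings` and `build_search_agent` are assumptions rather than values confirmed by this tutorial, and the Agno imports reflect one version of its API, so check them against the notebook and the current Agno documentation.

```python
# Assumed values: Clarifai's OpenAI-compatible endpoint and the model URL
# used as the model id. Verify both against the Clarifai documentation.
CLARIFAI_BASE_URL = "https://api.clarifai.com/v2/ext/openai/v1"
GPT_OSS_120B_ID = "https://clarifai.com/openai/chat-completion/models/gpt-oss-120b"


def model_settings(pat):
    """Collect the keyword arguments passed to the OpenAI-compatible model."""
    return {"id": GPT_OSS_120B_ID, "base_url": CLARIFAI_BASE_URL, "api_key": pat}


def build_search_agent(pat):
    """Create an agent that pairs the model with DuckDuckGo search."""
    # Imported lazily so this module loads even where Agno is not installed.
    from agno.agent import Agent
    from agno.models.openai import OpenAIChat
    from agno.tools.duckduckgo import DuckDuckGoTools

    return Agent(
        model=OpenAIChat(**model_settings(pat)),
        tools=[DuckDuckGoTools()],  # lets the model fetch live search results
        markdown=True,
    )
```

With Agno installed and a valid token, `build_search_agent(pat).print_response("your question")` would send a query through the agent.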

2. Adding a Knowledge Base

Real-world applications often require access to proprietary or specialized data. In this example, you’ll build a Thai cuisine expert using a recipes PDF. The process includes:

  • Embedding the document with text-embedding-ada-002 from the Clarifai community. 

  • Storing the vectors in LanceDB for efficient retrieval.

  • Configuring the agent to consult its knowledge base first, and only fall back to web search if necessary.

The agent returns a grounded recipe from the PDF and uses web search as a fallback. This approach is essential for building domain experts that rely on proprietary or internal data sources.
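The "knowledge base first, web search as fallback" routing can be illustrated without any external services. The sketch below is a deliberately simplified stand-in: word-count vectors replace the real embedding model, an in-memory list replaces LanceDB, and a fixed similarity threshold replaces the agent's own judgment about when to fall back. All names (`TinyKnowledgeBase`, `answer`, `min_score`) are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyKnowledgeBase:
    """In-memory vector store (stand-in for LanceDB)."""

    def __init__(self, chunks):
        self.rows = [(chunk, embed(chunk)) for chunk in chunks]

    def search(self, query):
        """Return (score, chunk) for the most similar stored chunk."""
        q = embed(query)
        scored = ((cosine(q, vec), chunk) for chunk, vec in self.rows)
        return max(scored, default=(0.0, ""))


def answer(query, kb, web_search, min_score=0.3):
    """Consult the knowledge base first; fall back to web search if weak."""
    score, chunk = kb.search(query)
    if score >= min_score:
        return ("knowledge_base", chunk)
    return ("web_search", web_search(query))


kb = TinyKnowledgeBase([
    "tom kha gai is a coconut chicken soup",
    "pad thai uses rice noodles and tamarind",
])
fake_search = lambda q: f"search results for: {q}"
print(answer("how to make tom kha gai soup", kb, fake_search))
print(answer("latest michelin stars in bangkok", kb, fake_search))
```

The first query overlaps strongly with a stored chunk and is answered from the knowledge base; the second shares no words with either chunk and is routed to the fallback.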

3. Coordinating Multiple Agents

For complex scenarios, multi-agent orchestration can help divide and conquer tasks. Agno supports teams of agents, enabling specialization and collaboration. In this example:

  • A Web Research Agent fetches news and current information.

  • A Financial Analysis Agent pulls stock and market data.

  • A Coordinator synthesizes their outputs into a single response.

Here, each agent plays a distinct role, demonstrating how specialization leads to more comprehensive answers. This architecture is ideal for domains such as market research, technical analysis, or any multi-faceted problem that benefits from teamwork.
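The coordinator pattern itself is framework-independent. As a rough sketch, the code below uses plain functions as stand-ins for the two specialist agents and a simple string merge in place of an LLM-driven synthesis step; it does not use Agno's team API, and every name in it is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def web_research_agent(question):
    """Stand-in specialist: would normally search the web for news."""
    return f"[news] placeholder headlines for: {question}"


def financial_analysis_agent(question):
    """Stand-in specialist: would normally pull stock and market data."""
    return f"[finance] placeholder market data for: {question}"


def coordinator(question, specialists):
    """Send the question to every specialist and merge their findings."""
    # Specialists are independent, so they can run concurrently.
    with ThreadPoolExecutor() as pool:
        futures = {
            name: pool.submit(fn, question) for name, fn in specialists.items()
        }
        findings = {name: future.result() for name, future in futures.items()}
    report = "\n".join(f"{name}: {text}" for name, text in findings.items())
    return f"Question: {question}\n{report}"


print(coordinator(
    "NVDA outlook",
    {"web": web_research_agent, "finance": financial_analysis_agent},
))
```

In a real team, the merge step would be another model call that reconciles and summarizes the specialists' outputs instead of concatenating them.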

Conclusion

This walkthrough showcased how to build progressively more capable agents with Agno and GPT-OSS 120B:

  • Simple Web-Search Agent: A quick way to combine language understanding with live data.

  • Knowledge-Based Domain Expert: An agent that draws from proprietary data and uses web search only when needed.

  • Multi-Agent System: A coordinated approach where specialized agents collaborate to solve complex problems.

Each stage adds new capabilities, enabling you to build more advanced systems. For many use cases, a simple web-search agent may suffice. For specialized assistants or research tools, embedding your own data is crucial. And for multi-domain tasks, orchestrating multiple agents can be incredibly powerful.

There is no one-size-fits-all agent; each implementation can be customized to your specific needs, business objectives, and domain requirements.

You can extend these patterns by building larger multi-agent teams, integrating domain-specific APIs, or experimenting with different agent designs such as coordinator, collaborative, or specialized-task agents. These approaches let you build flexible, adaptive AI systems tailored to complex, real-world problems. To explore the examples in this tutorial, check out the accompanying Colab notebook.

Agentic AI workflows are computationally demanding because they involve multiple agents interacting, reasoning over large contexts, and responding in real time. To operate effectively, these workloads require both high throughput and low latency.

The Clarifai Reasoning Engine provides the computational efficiency required for such workflows. Independent benchmarks by Artificial Analysis of GPT-OSS-120B running on the engine report a throughput of over 500 tokens per second with a time to first token of 0.3 seconds, the kind of performance that enables responsive and scalable multi-agent systems. You can try out the GPT-OSS-120B model on Clarifai.
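To see what those figures mean for an agent workflow, a back-of-the-envelope estimate helps. The sketch below assumes a simple model in which response time equals time to first token plus output tokens divided by throughput. It ignores network overhead, tool-call time, and prompt length, and the function names are hypothetical.

```python
def estimated_response_seconds(
    output_tokens, tokens_per_second=500.0, time_to_first_token=0.3
):
    """Rough time for one model response: startup delay plus generation."""
    return time_to_first_token + output_tokens / tokens_per_second


def sequential_pipeline_seconds(tokens_per_step):
    """Total time when each agent step waits for the previous one."""
    return sum(estimated_response_seconds(n) for n in tokens_per_step)


# One 1,000-token answer: 0.3 s + 1000 / 500 s = 2.3 s.
print(round(estimated_response_seconds(1000), 2))

# Three sequential agent steps producing 400, 400, and 600 tokens:
# (0.3 + 0.8) + (0.3 + 0.8) + (0.3 + 1.2) = 3.7 s.
print(round(sequential_pipeline_seconds([400, 400, 600]), 2))
```

Because the benchmark reports throughput of over 500 tokens per second, these numbers are best read as approximate upper bounds on generation time under this simplified model.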