January 26, 2026


How to Access Arcee Trinity Mini via API

TL;DR

Arcee Trinity Mini is an advanced AI model designed to deliver strong reasoning, coding, and math capabilities while being efficient with computing resources. It uses a mixture-of-experts architecture, activating only about 3 billion of its 26 billion parameters per token. This approach makes it faster and more cost-effective to run than many larger models.

You can run Trinity Mini directly on Clarifai using the Playground for quick tests and experimentation or access the model through Clarifai’s OpenAI-compatible API for seamless integration into your applications and workflows.

Introduction

When we think of reasoning models, top-tier models like OpenAI GPT-5.2 and Google Gemini 3 Pro usually come to mind. However, open-weight models offer comparable performance while giving developers greater control and customization options.

One such model is Arcee Trinity Mini, a U.S.-built, open-weight model from Arcee AI designed specifically for real-world production workflows. It excels at multi-step reasoning, coding, and generating structured outputs, making it an excellent choice for applications requiring precision and efficiency.

In this guide, you will learn how Trinity Mini works, how to access it via Clarifai's API, and how to start using it in your own applications.

What is Arcee Trinity Mini?

Arcee Trinity Mini is a powerful open‑weight language model developed by Arcee AI. It is part of the Trinity family of models that are built for real‑world applications such as multi‑turn conversations, tool use, structured outputs, and reasoning tasks. Trinity Mini is designed to perform reliably in production environments, whether you run it in the cloud, on‑premises, or through a hosted API. Its consistent capabilities make it a strong choice for developers and teams aiming to build advanced AI systems with predictable performance.

While major closed models often dominate the spotlight, Trinity Mini provides an open‑weight alternative that offers developers more control and flexibility. It lets you tailor the model for your workflows without being locked into proprietary ecosystems. 

Key Features and Benefits

Trinity Mini fills a growing need for efficient and customizable models that can be deployed at scale. Here are the key features that make it valuable for both developers and businesses:

Multi-step Reasoning and Tool Orchestration
Trinity Mini is built to manage complex tasks that require multiple reasoning steps and interaction with external tools. This makes it ideal for building agent pipelines where the model needs to perform sequences of actions, such as querying databases, calling APIs, or generating code dynamically.

Long Context Support (128K Tokens)
The model supports a context window of up to 128,000 tokens. This allows it to maintain continuity over long documents, multi-turn conversations, or detailed workflows without losing track of relevant information. Such extended context capabilities are valuable for use cases like legal document review, research summaries, or any scenario that demands deep understanding over lengthy inputs.

Structured Output with JSON Schema Enforcement
Trinity Mini enforces output formats through native JSON schema adherence. This means the responses conform to predefined structures, minimizing the need for complex parsing or error handling on the client side. This feature is essential for integrating the model’s output directly into automated systems and pipelines, improving reliability and reducing development overhead.
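To make this concrete, here is an illustrative sketch of a schema-pinned request body in the OpenAI-style `response_format` convention. The `support_ticket` schema is a made-up example, not part of Trinity Mini's API; replace it with whatever structure your pipeline consumes.

```python
def build_schema_request(prompt: str) -> dict:
    """Build an OpenAI-style chat request body that pins the reply to a JSON schema.

    The support_ticket schema below is a hypothetical example; swap in the
    structure your downstream systems expect.
    """
    return {
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "support_ticket",
                "schema": {
                    "type": "object",
                    "properties": {
                        "summary": {"type": "string"},
                        "priority": {
                            "type": "string",
                            "enum": ["low", "medium", "high"],
                        },
                    },
                    "required": ["summary", "priority"],
                },
            },
        },
    }
```

Because the schema travels with the request, the client code that consumes the reply can parse it directly instead of scraping free-form text.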

Efficient Performance and Throughput
Thanks to its sparse Mixture-of-Experts (MoE) architecture, Trinity Mini activates only a fraction of its total parameters per token, allowing it to deliver reasoning power comparable to much larger dense models at a fraction of the compute cost. This design enables it to handle hundreds of API requests per second on a single Nvidia A100 GPU, supporting scalable and cost-effective deployment in production environments.

Accessing Arcee Trinity Mini via Clarifai 

Prerequisites

Getting started with Arcee Trinity Mini through the Clarifai API is straightforward. Follow these steps to set up your environment and authenticate.

  1. Clarifai Account: Sign up at clarifai.com to gain access to the platform’s AI models. 
  2. Personal Access Token (PAT): You need a PAT to authenticate your API requests. Get one by navigating to Settings > Secrets in your Clarifai dashboard and creating or copying your token.
  3. SDKs: Clarifai provides SDKs for Python and Node.js, and also supports OpenAI-compatible clients. For detailed instructions and to install other SDKs, visit the Clarifai Quickstart Guide.
  4. Authentication and Setup: To authenticate your API requests, set your Personal Access Token as an environment variable:
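For example, on macOS or Linux:

```shell
export CLARIFAI_PAT="your_personal_access_token_here"
```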

API Usage

Here’s how to make your first API call to the Arcee Trinity Mini model using different methods.

Using Python SDK:
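Below is a minimal sketch using the Clarifai Python SDK (`pip install clarifai`). The model URL is a placeholder; copy the exact URL from the Trinity Mini model page on Clarifai. The network call is guarded on `CLARIFAI_PAT` so the snippet does nothing without credentials.

```python
import os

# Placeholder -- copy the real model URL from the Trinity Mini page on Clarifai.
MODEL_URL = "https://clarifai.com/arcee-ai/chat-completion/models/trinity-mini"

def build_prompt(question: str) -> bytes:
    """Encode the prompt as the raw bytes the SDK's predict call expects."""
    return question.encode("utf-8")

if os.environ.get("CLARIFAI_PAT"):
    from clarifai.client.model import Model

    model = Model(url=MODEL_URL, pat=os.environ["CLARIFAI_PAT"])
    response = model.predict_by_bytes(
        build_prompt("Explain mixture-of-experts in two sentences."),
        input_type="text",
    )
    # The generated text lives on the first output's text field.
    print(response.outputs[0].data.text.raw)
```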

Using Node.js SDK:

Using OpenAI-Compatible Python Client
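Because the endpoint is OpenAI-compatible, the standard `openai` package (`pip install openai`) works by pointing `base_url` at Clarifai. The model identifier below is a placeholder; copy the exact one from the Trinity Mini model page. As above, the call is guarded on `CLARIFAI_PAT`.

```python
import os

# Clarifai's OpenAI-compatible endpoint.
BASE_URL = "https://api.clarifai.com/v2/ext/openai/v1"
# Placeholder -- copy the exact model identifier from the Trinity Mini page.
MODEL = "https://clarifai.com/arcee-ai/chat-completion/models/trinity-mini"

def build_messages(prompt: str) -> list:
    """Assemble an OpenAI-style chat message list."""
    return [{"role": "user", "content": prompt}]

if os.environ.get("CLARIFAI_PAT"):
    from openai import OpenAI

    client = OpenAI(base_url=BASE_URL, api_key=os.environ["CLARIFAI_PAT"])
    completion = client.chat.completions.create(
        model=MODEL,
        messages=build_messages("Write a haiku about sparse experts."),
    )
    print(completion.choices[0].message.content)
```

Reusing the `openai` client this way means existing OpenAI-based code can switch to Trinity Mini by changing only the `base_url`, `api_key`, and `model` values.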

Using the Playground

For quick experimentation and validation, you can use the Clarifai Playground to interact with Arcee Trinity Mini directly in the browser. This is useful for testing prompts, exploring model behavior, and verifying outputs without writing any code. 


Benchmark Performance of Trinity Mini

Arcee Trinity Mini delivers impressive reasoning and tool-calling capabilities while maintaining high efficiency. Here’s how it performs across several challenging benchmarks:

Reasoning Accuracy

  • MMLU (Zero-Shot): Trinity Mini scores 84.95% across 57 subjects, including math, law, and science, demonstrating strong general knowledge and reasoning skills without task-specific training.

  • Math-500: It achieves 92.10% on this advanced math reasoning benchmark, showing solid proficiency in complex calculations and problem-solving.

  • GPQA-Diamond: On graduate-level science questions, Trinity Mini reaches 58.55%, reflecting its ability to handle specialized and technical content.

Tool Calling and Structured Output

  • BFCL v3 (Function Calling): Trinity Mini scores 59.67%, generating function calls that adhere to JSON schema requirements, which makes it well suited for agent workflows that depend on structured data.

  • MUSR (Multi-Step Reasoning): The model attains 63.49% accuracy on tasks requiring sequential, logical steps, highlighting its multi-turn reasoning strength.

Throughput and Scalability

  • Processes over 200 tokens per second on a single A100 GPU using bfloat16 precision.

  • Activates only about 3 billion parameters per token, compared to 8–14 billion for similar dense models, resulting in significant compute savings.

  • Supports an extended 128,000-token context window without the memory overhead typically associated with long contexts, enabling robust understanding of large documents or conversations.

Benchmark Comparison Table 

| Benchmark        | Trinity Mini | LLaMA-3.1-8B | Qwen-2.5-7B | Mistral-class | Gemini-class |
|------------------|--------------|--------------|-------------|---------------|--------------|
| SimpleQA         | 8.90         | 9.10         | 6.50        | 10.70         |              |
| MUSR             | 63.49        | 64.40        | 64.47       | 56.30         |              |
| MMLU (Zero-Shot) | 84.95        | 87.26        | 85.58       | 82.30         | 83.02        |
| Math-500         | 92.10        | 95.00        | 90.20       | 87.40         | 95.80        |
| GPQA-Diamond     | 58.55        | 70.05        | 65.40       | 55.00         | 60.91        |
| BFCL v3          | 59.67        | 53.01        | 48.25       |               |              |

Applications and Use Cases

Arcee Trinity Mini is well suited for a wide range of real-world applications where reasoning quality, long context handling, and structured outputs are essential.

Conversational AI Applications

Trinity Mini can power conversational systems that go beyond simple question answering. Its ability to maintain long context makes it ideal for multi-turn customer support chatbots that need to remember prior messages, user preferences, or earlier troubleshooting steps. It also works well for virtual assistants that integrate with tools or APIs, such as fetching data, triggering actions, or returning structured responses. In addition, the model can support interactive documentation or knowledge base experiences, where users explore technical content through natural language conversations.

Agentic Workflows

For agent-based systems, Trinity Mini provides strong multi-step reasoning and reliable tool calling. This enables agent workflows that plan actions, invoke external tools, and refine results over several steps. It is particularly useful for workflow automation, where the model generates structured outputs that downstream systems can consume without extra parsing. Trinity Mini also fits naturally into retrieval-augmented generation (RAG) pipelines, where its extended context window allows it to reason over large retrieved documents while maintaining coherence.

Enterprise Integration

In enterprise environments, Trinity Mini offers an efficient path to production deployment. Its performance characteristics make it suitable for cost-conscious, high-throughput applications accessed through APIs. Teams can use it to build internal tools with natural language interfaces, allowing employees to query systems or generate insights without specialized training. The model is also well suited for document analysis and processing pipelines, where its 128K context support enables it to handle long reports, contracts, or technical documents in a single pass.

Conclusion

Arcee Trinity Mini offers a powerful combination of efficient architecture, advanced reasoning capabilities, and support for long-context understanding. It is an excellent choice for developers and businesses looking to build sophisticated AI applications. Its sparse mixture-of-experts design delivers high performance on challenging benchmarks while keeping compute costs manageable. With native support for structured outputs and function calling, Trinity Mini fits naturally into agent workflows, conversational AI, and complex document processing pipelines.

By accessing Trinity Mini through Clarifai’s robust API, you can quickly integrate these capabilities into your projects, whether you are building chatbots, automation systems, or data analysis tools. Start experimenting today in the Clarifai Playground or dive straight into API integration to unlock the full potential of this versatile model.

To learn more and get started: