
This blog post focuses on new features and improvements. For a comprehensive list, including bug fixes, please see the release notes.
A new Python-based method for model uploading and inference
We have completely revamped the way models are uploaded and used for inference with a new Python-based method that prioritizes simplicity, speed, and developer experience.
Built with a Python-first, user-centric design, this flexible approach simplifies the process of working with models. It allows users to focus more on building and iterating, and less on navigating API mechanics. The new method streamlines inference, accelerates development, and significantly improves overall usability.
Model Upload
The Clarifai Python SDK now makes it even easier to upload custom models. Whether you're using a pre-trained model from Hugging Face or OpenAI, or one you've developed from scratch, integration is seamless. Once uploaded, your model can immediately take advantage of Clarifai's robust platform features.
After import, your model is automatically deployed and ready for use. You can evaluate it, connect it with other models and agent operators in a workflow, or serve inference requests directly.
As part of this release, we’ve significantly simplified how you define the model.py file for custom model uploads. The new ModelClass pattern allows you to implement predict, generate, and streaming methods without extra abstraction or boilerplate, so you can get started in just a few lines of code.
Here’s a quick example: a simple method that appends “Hello World” to any input text, with built-in support for different types of streaming responses. Check out the full documentation here.
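As a rough plain-Python sketch of the behavior described (in a real model.py you would subclass the SDK’s ModelClass and register these methods as shown in the documentation; the class and parameter names below are illustrative, not the SDK’s own), the three method styles look like this:

```python
from typing import Iterator

class MyModel:  # illustrative; a real model.py subclasses Clarifai's ModelClass
    def predict(self, text1: str = "") -> str:
        """Unary call: return the input with 'Hello World' appended."""
        return f"{text1} Hello World"

    def generate(self, text1: str = "") -> Iterator[str]:
        """Server streaming: yield the response one chunk at a time."""
        for token in f"{text1} Hello World".split():
            yield token + " "

    def stream(self, input_stream: Iterator[str]) -> Iterator[str]:
        """Bidirectional streaming: transform each incoming chunk as it arrives."""
        for text in input_stream:
            yield f"{text} Hello World"
```

Each method stays plain Python; the platform handles serialization and routing, so no extra wrapper classes or request-parsing boilerplate are needed.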
Inference
The new inference approach offers an efficient, scalable, and simplified way to run predictions with your models.
Designed with a Python-first, developer-friendly focus, it reduces complexity so you can spend more time building and iterating, and less time dealing with low-level API details.
Below is an example of how to make a client-side predict call that corresponds to the predict method defined in the previous section. Check out the docs here.
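To illustrate the idea that the client call mirrors the server-side method signature, here is a self-contained stand-in (LocalClient and HelloWorldModel are hypothetical names for this sketch; a real call goes through the Clarifai SDK’s client over the network):

```python
class HelloWorldModel:
    """Stand-in for the model defined in model.py."""
    def predict(self, text1: str) -> str:
        return f"{text1} Hello World"

class LocalClient:
    """Stand-in for an RPC client: forwards keyword arguments to the
    matching server-side method, just as the SDK maps a client-side
    predict(...) call onto the model's predict(...) definition."""
    def __init__(self, model):
        self._model = model

    def predict(self, **kwargs):
        # In production these kwargs are serialized, sent to the deployed
        # model, and the runner invokes the matching method remotely.
        return self._model.predict(**kwargs)

client = LocalClient(HelloWorldModel())
print(client.predict(text1="Hi"))  # → "Hi Hello World"
```

The design choice this mirrors: because the client call shape is derived from your own method signature, there is no separate request schema to learn or keep in sync.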
New Published Models
- Published Llama-4-Scout-17B-16E-Instruct, a powerful model in the Llama 4 series featuring 17 billion parameters and 16 experts for advanced instruction tuning. It supports a native 10 million-token context window (currently 8k supported on Clarifai), making it ideal for multi-document analysis, complex codebase understanding, and personalized, intelligent workflows.
- Published Qwen3-30B-A3B-GGUF, the latest addition to the Qwen series. This new release features both dense and mixture-of-experts (MoE) models, with significant improvements in reasoning, instruction-following, agent-based tasks, and multilingual capabilities. The Qwen3-30B-A3B outperforms larger models like QwQ-32B, leveraging fewer active parameters while maintaining strong performance across coding and reasoning benchmarks.

- Published OpenAI’s latest o3 model, a powerful and well-rounded LLM that sets a new standard for performance across math, science, coding, and visual reasoning tasks. It is built for complex, multi-step thinking and excels at technical problem-solving, interpreting visual data such as charts and diagrams, high-stakes decision-making, and creative ideation.
- Published o4-mini, a smaller model optimized for fast, cost-efficient reasoning. Despite its compact size, o4-mini delivers impressive accuracy on math and coding benchmarks like AIME 2025. It is ideal for use cases that require strong reasoning capabilities while keeping latency and cost low. Both models are also available in the Playground. Try them out here.
Enhanced the Playground experience
- Added automatic mode detection based on the selected model — now intelligently switches between Chat and Vision modes for predictions.
- Improved model search and identification for a faster, more accurate selection experience.
- Introduced a Personal Access Token (PAT) dropdown, enabling users to easily insert their PATs into code snippets.

- Implemented dynamic pricing display that updates based on the selected deployment.
- The selected deployment ID is now automatically injected into the inference code.
Enhanced the Control Center
- We’re bringing a major update to the Control Center with detailed Compute time metrics for Compute Orchestration. This enhancement gives you deeper visibility into how compute resources are used and billed across your workflows:
1. Added Compute Hours in the Overview tab.
2. Added Compute Hours costs in the Costs tab.
3. Added Compute Hours usage details in the Usage tab.

- Added Compute Orchestration operations to audit logging: Operations related to clusters, nodepools, and model deployments are now tracked and visible in the Teams & Logs tab within the Control Center.
- Introduced new, more efficient and stable chart types with improved tooltips for better data visualization and user experience.
- Enhanced the design of the "Total Model Predictions by ID" chart by making the chart clickable, allowing users to navigate directly to the corresponding model. Also introduced other UI refinements for a more intuitive experience.
- Adjusted hover cards on charts to stay within the viewport by dynamically lowering their position and adding scrollbars when content exceeds the visible area.
Improved the Community platform
- Revamped the Explore page with refreshed visual designs, a featured models showcase, and categorized use cases such as LLMs and VLMs.
- Updated the individual model viewer page with an improved UI, direct access to the Playground, deployment listings, and additional enhancements.

Additional Changes
- The Home page is now accessible to all users, with sections requiring login automatically hidden for non-logged-in users. A new "Recent Activity" section shows users their most recent actions and operations. We also made improvements to usability, performance, and overall user experience.
- New organization accounts now start on the Community plan by default, instead of inheriting the user’s personal plan. This change applies to users on the Community, Essential, and Professional plans. Enterprise users are not affected. The "Member Since" column now shows when a member joined the organization, and Settings pages are hidden from users without the required permissions.
- The billing section has been redesigned for a more intuitive credit card management experience. We've added validation to prevent duplicate card entries and support for setting or changing the default credit card.
- The Python SDK now supports Pythonic models for a more native experience. We fixed failing tests to improve stability. The CLI is now ~20x faster for most operations, includes config contexts, improved error messages, and corrected return arguments in the model builder. Learn more here.
Ready to start building?
With this Python-first release, uploading and running inference on your custom models is now faster, simpler, and more intuitive than ever. Whether you're integrating a pre-trained model or deploying one you've built from scratch, the Clarifai Python SDK gives you the tools to move from prototype to production with minimal overhead.
Explore the documentation and start building today.