October 24, 2025

Run DeepSeek: How to Use the DeepSeek API


DeepSeek API: How to Use DeepSeek via API – Developer Guide

TL;DR

DeepSeek models, including DeepSeek‑R1 and DeepSeek‑V3.1, are accessible directly through the Clarifai platform. You can get started without needing a separate DeepSeek API key or endpoint.

  • Experiment in the Playground: Sign up for a Clarifai account and open the Playground. This lets you test prompts interactively, adjust parameters, and understand the model behavior before integration.
  • Integrate via API: Integrate models via Clarifai’s OpenAI-compatible endpoint by specifying the model URL and authenticating with your Personal Access Token (PAT).


https://api.clarifai.com/v2/ext/openai/v1

Authenticate with your Personal Access Token (PAT) and specify the model URL, such as DeepSeek‑R1 or DeepSeek‑V3.1.

Clarifai handles all hosting, scaling, and orchestration, letting you focus purely on building your application and using the model’s reasoning and chat capabilities.
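
If you just want to see the shape of a call before reading further, here is a minimal sketch. It assumes the openai Python package is installed and your PAT is exported as the CLARIFAI_PAT environment variable; fully commented examples follow later in this guide.

import os
from openai import OpenAI

# Minimal quick start: point the standard OpenAI client at Clarifai's endpoint
client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",
    api_key=os.environ["CLARIFAI_PAT"],  # your Clarifai Personal Access Token
)

reply = client.chat.completions.create(
    model="https://clarifai.com/deepseek-ai/deepseek-chat/models/DeepSeek-R1",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(reply.choices[0].message.content)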

DeepSeek in 90 Seconds—What and Why

DeepSeek encompasses a range of large language models (LLMs) designed with diverse architectural strategies to optimize performance across various tasks. While some models employ a Mixture-of-Experts (MoE) approach, others utilize dense architectures to balance efficiency and capability.

1. DeepSeek-R1

DeepSeek-R1 is a dense model that integrates reinforcement learning (RL) with knowledge distillation to enhance reasoning capabilities. It employs a standard transformer architecture augmented with Multi-Head Latent Attention (MLA) to improve context handling and reduce memory overhead. This design enables the model to achieve high performance in tasks requiring deep reasoning, such as mathematics and logic.

2. DeepSeek-V3

DeepSeek-V3 adopts a hybrid approach, combining both dense and MoE components. The dense part handles general conversational tasks, while the MoE component activates specialized experts for complex reasoning tasks. This architecture allows the model to efficiently switch between general and specialized modes, optimizing performance across a broad spectrum of applications.

3. Distilled Models

To provide more accessible options, DeepSeek offers distilled versions of its models, such as DeepSeek-R1-Distill-Qwen-7B. These models are smaller in size but retain much of the reasoning and coding capabilities of their larger counterparts. For instance, DeepSeek-R1-Distill-Qwen-7B is based on the Qwen 2.5 architecture and has been fine-tuned with reasoning data generated by DeepSeek-R1, achieving strong performance in mathematical reasoning and general problem-solving tasks.

DeepSeek models were designed to balance reasoning power with efficiency:

  • Dense vs. Mixture‑of‑Experts (MoE): A dense model like R1 uses all its parameters for every token prediction, providing strong reasoning at higher computational cost. MoE architectures (V3) contain multiple “experts”; only a subset of them activates per token, reducing cost while scaling to very large parameter counts. V3 reportedly uses 671 billion total parameters with only 37 billion active at inference and offers a 128 k token context window, making it well suited to long documents (a back‑of‑the‑envelope cost comparison follows this list).

  • Training strategies: DeepSeek combines supervised learning, RL and knowledge distillation. R1 fine‑tunes on reasoning‑heavy tasks, while V3 uses MoE for dynamic routing. Distilled models compress knowledge from larger versions to 7 billion parameters or less, making them easier to deploy on consumer GPUs.

  • Simple example: Suppose you need to prove a mathematical lemma. R1’s dense reasoning helps chain multiple steps logically. By contrast, summarizing a 50‑page contract is where V3’s long context shines. Distilled models might power an edge device summarizer.
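
To make the dense‑versus‑MoE cost intuition concrete, here is the back‑of‑the‑envelope sketch promised above. It rests on a simplifying assumption, namely that per‑token compute scales roughly with the number of parameters touched per token, and it ignores attention cost, routing overhead and memory effects:

# Rough per-token compute comparison, dense vs. MoE (illustrative, not a benchmark)
MOE_TOTAL_PARAMS = 671e9   # V3's reported total parameter count
MOE_ACTIVE_PARAMS = 37e9   # V3's reported active parameters per token

active_fraction = MOE_ACTIVE_PARAMS / MOE_TOTAL_PARAMS
speedup = MOE_TOTAL_PARAMS / MOE_ACTIVE_PARAMS  # vs. an equally large dense model

print(f"Active fraction per token: {active_fraction:.1%}")    # ~5.5%
print(f"Rough per-token compute reduction: ~{speedup:.0f}x")  # ~18x

Under these assumptions, a hypothetical dense model of the same total size would do roughly 18 times more work per token, which is the core economic argument for MoE.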

| Architecture type | Characteristics | Pros | Cons |
|---|---|---|---|
| Dense (e.g., R1) | All parameters participate in every inference step; RL‑fine‑tuned for reasoning | Excellent logical consistency; good for mathematics, coding, algorithm design | Higher latency and cost; cannot scale beyond ~70 B parameters without huge infrastructure |
| Hybrid MoE (e.g., V3) | Combines dense core with MoE specialists; only a fraction of experts activate per token | Scales to hundreds of billions of parameters with manageable inference cost; long context windows | Slightly higher engineering complexity; slower to converge during training |
| Distilled | Smaller models distilled from R1 or V3; typically 7–13 B parameters | Low latency and cost; easy to deploy on edge devices | Slightly reduced reasoning depth; context window limited compared with V3 |

Expert Insights

  • Stanford HAI notes that corporate investment in generative AI reached US $33.9 billion and “AI adoption is now mainstream”. Analysts suggest that such investment is driven by cost reductions from MoE architectures.
  • McKinsey’s 2025 state of AI report observes that more than three‑quarters of organizations use AI, and 21 % have redesigned processes to capitalize on these technologies. This underscores the importance of understanding model architectures.
  • Arch Partnership emphasizes that hybrid MoE models like DeepSeek‑V3 outperform peers on MMLU, DROP and other reasoning benchmarks, indicating that a hybrid design can deliver both breadth and depth.
  • Model size & performance: DeepSeek‑V3 has 671 billion total parameters (37 billion active) and a 128 k context window. Benchmarks show V3 leads on tasks such as MMLU (88.5), DROP (F1 91.6), Codeforces (51.6) and AIME (39.2), outperforming GPT‑4o on some benchmarks.

How to Access DeepSeek API on Clarifai

DeepSeek models can be accessed on Clarifai in three ways: through the Clarifai Playground UI, via the OpenAI-compatible API, or using the Clarifai SDK. Each method provides a different level of control and flexibility, allowing you to experiment, integrate, and deploy models according to your development workflow.

Clarifai Playground

The Playground provides a fast, interactive environment to test prompts and explore model behavior. 

You can select any DeepSeek model, including DeepSeek‑R1, DeepSeek‑V3.1, or the distilled versions available in the Clarifai Community. You can input prompts, adjust parameters such as temperature and streaming, and immediately see the model's responses. The Playground also lets you compare multiple models side by side to test and evaluate their responses.


Within the Playground itself, you have the option to view the API section, where you can access code snippets in multiple languages, including cURL, Java, JavaScript, Node.js, the OpenAI-compatible API, the Clarifai Python SDK, PHP, and more. 

You can select the language you need, copy the snippet, and directly integrate it into your applications. For more details on testing models and using the Playground, see the Clarifai Playground Quickstart.


Try it: The Clarifai Playground is the fastest way to test prompts. Navigate to the model page and click “Test in Playground”.

Via the OpenAI‑Compatible API

Clarifai provides a drop-in replacement for the OpenAI API, allowing you to use the same Python or TypeScript client libraries you are familiar with while pointing to Clarifai’s OpenAI-compatible endpoint. Once you have your PAT set as an environment variable, you can call any Clarifai-hosted DeepSeek model by specifying the model URL.

Python Example

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",
    api_key=os.environ["CLARIFAI_PAT"]
)

response = client.chat.completions.create(
    model="https://clarifai.com/deepseek-ai/deepseek-chat/models/DeepSeek-R1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a three sentence bedtime story about a unicorn."}
    ],
    max_completion_tokens=100,
    temperature=0.7
)

print(response.choices[0].message.content)

TypeScript Example

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.clarifai.com/v2/ext/openai/v1",
  apiKey: process.env.CLARIFAI_PAT,
});

const response = await client.chat.completions.create({
  model: "https://clarifai.com/deepseek-ai/deepseek-chat/models/DeepSeek-R1",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Who are you?" }
  ],
});

console.log(response.choices?.[0]?.message.content);

Clarifai Python SDK

Clarifai’s Python SDK simplifies authentication and model calls, allowing you to interact with DeepSeek models using concise Python code. After setting your PAT, you can initialize a model client and make predictions with just a few lines.

import os
from clarifai.client import Model

model = Model(
    url="https://clarifai.com/deepseek-ai/deepseek-chat/models/DeepSeek-V3_1",
    pat=os.environ["CLARIFAI_PAT"]
)

response = model.predict(
    prompt="What is the future of AI?",
    max_tokens=512,
    temperature=0.7,
    top_p=0.95,
    thinking="False"
)

print(response)

Vercel AI SDK

For modern web applications, the Vercel AI SDK provides a TypeScript toolkit to interact with Clarifai models. It supports the OpenAI-compatible provider, enabling seamless integration.

import { createOpenAICompatible } from "@ai-sdk/openai-compatible";
import { generateText } from "ai";

const clarifai = createOpenAICompatible({
  baseURL: "https://api.clarifai.com/v2/ext/openai/v1",
  apiKey: process.env.CLARIFAI_PAT,
});

const model = clarifai("https://clarifai.com/deepseek-ai/deepseek-chat/models/DeepSeek-R1");

const { text } = await generateText({
  model,
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is photosynthesis?" }
  ],
});

console.log(text);

This SDK also supports streaming responses, tool calling, and other advanced features. In addition to the above, DeepSeek models can also be accessed via cURL, PHP, Java, and other languages. For a complete list of integration methods, supported providers, and advanced usage examples, refer to the documentation.

Advanced Inference Patterns - DeepSeek API

DeepSeek models on Clarifai support advanced inference features that make them suitable for production-grade workloads. You can enable streaming for low-latency responses, and tool calling to let the model interact dynamically with external systems or APIs. These capabilities work seamlessly through Clarifai’s OpenAI-compatible API.

Streaming Responses

Streaming returns model output token by token, improving responsiveness in real-time applications like chat interfaces or dashboards. The example below shows how to stream responses from a DeepSeek model hosted on Clarifai.

import os
from openai import OpenAI

# Initialize the OpenAI-compatible client for Clarifai
client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",
    api_key=os.environ["CLARIFAI_PAT"]
)

# Create a chat completion request with streaming enabled
response = client.chat.completions.create(
    model="https://clarifai.com/deepseek-ai/deepseek-chat/models/DeepSeek-V3_1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain how transformers work in simple terms."}
    ],
    max_completion_tokens=150,
    temperature=0.7,
    stream=True
)

print("Assistant's Response:")
for chunk in response:
    if chunk.choices and chunk.choices[0].delta and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
print("\n")

Streaming helps you render partial responses as they arrive instead of waiting for the entire output, reducing perceived latency.

Streaming vs. non‑streaming:
In traditional requests, the server returns the entire completion once it is finished. For interactive applications (e.g., chatbots or dashboards), waiting for long responses can harm perceived latency. Streaming returns tokens as soon as they are generated, letting you start rendering partial outputs. The table below highlights the differences:

| Response mode | Pros | Cons | Use cases |
|---|---|---|---|
| Non‑streaming | Simple to implement; no need to aggregate partial responses | User waits until entire response finishes; less responsive | Batch processing, offline summarisation |
| Streaming | Low perceived latency; allows progressive rendering; can interrupt early | Slightly more code complexity; may require asynchronous handling | Chatbots, dashboards, generative UIs |

Tool Calling

Tool calling enables a model to invoke external functions during inference, which is especially useful for building AI agents that can interact with APIs, fetch live data, or perform dynamic reasoning. DeepSeek-V3.1 supports tool calling, allowing your agents to make context-aware decisions. Below is an example of defining and using a tool with a DeepSeek model.

import os
from openai import OpenAI

# Initialize the OpenAI-compatible client for Clarifai
client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",
    api_key=os.environ["CLARIFAI_PAT"]
)

# Define a simple function the model can call
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Returns the current temperature for a given location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and country, for example 'New York, USA'"
                    }
                },
                "required": ["location"],
                "additionalProperties": False
            }
        }
    }
]

# Create a chat completion request with tool-calling enabled
response = client.chat.completions.create(
    model="https://clarifai.com/deepseek-ai/deepseek-chat/models/DeepSeek-V3_1",
    messages=[
        {"role": "user", "content": "What is the weather like in New York today?"}
    ],
    tools=tools,
    tool_choice="auto"
)

# Print the tool call proposed by the model
tool_calls = response.choices[0].message.tool_calls
print("Tool calls:", tool_calls)

For more advanced inference patterns, including multi-turn reasoning, structured output generation, and extended examples of streaming and tool calling, refer to the documentation.

Choosing the right access method. While the Playground is ideal for experimentation, the API offers fine‑grained control and the SDK abstracts away low‑level details. The table below summarizes the considerations:

| Access method | Best for | Advantages | Limitations |
|---|---|---|---|
| Playground | Early exploration and prompt engineering | Interactive UI; easy parameter tuning; model comparison; shows code snippets | Not suitable for automated workflows; manual input |
| OpenAI‑compatible API | Production and server‑side integration | Drop‑in replacement for the OpenAI API; supports Python, TypeScript and cURL; Clarifai handles scaling | Requires writing boilerplate code and handling rate limits |
| Clarifai Python SDK | Python developers wanting convenience | Simplifies authentication and predictions; supports batching and streaming; handles error management | Less control over HTTP details; only Python |
| Vercel AI SDK | Modern web and edge applications | TypeScript wrapper; integrates with the ai package; supports streaming & tool calling | Requires Node.js environment; adds dependency |

Which DeepSeek Model Should I Pick?

Clarifai hosts multiple DeepSeek variants. Choosing the right one depends on your task:

  • DeepSeek‑R1 – use for reasoning and complex code. It excels at mathematical proofs, algorithm design, debugging and logical inference. Expect slower responses due to the extended “thinking mode” and higher token usage.

  • DeepSeek‑V3.1 – use for general chat and lightweight coding. V3.1 is a hybrid: it can switch between non‑thinking mode (faster, cheaper) and thinking mode (deeper reasoning) within a single model. Ideal for summarization, Q&A and everyday assistant tasks.

  • Distilled models (R1‑Distill‑Qwen‑7B, etc.) – these are smaller versions of the base models, offering lower latency and cost with slightly reduced reasoning depth. Use them when speed matters more than maximal performance.

| Model | Architecture & parameters | Context window | Strengths | Recommended tasks | Considerations |
|---|---|---|---|---|---|
| DeepSeek‑R1 | Dense transformer, RL‑fine‑tuned; ~32 B parameters | 32 k tokens | Excellent reasoning and coding ability | Mathematical reasoning, algorithm design, debugging, complex problem solving | Higher latency and cost; smaller context window |
| DeepSeek‑V3.1 | Hybrid MoE + dense; total 671 B (37 B active) | 128 k tokens | Balances chat and reasoning; long context; strong performance on MMLU, DROP, code tasks | Summarisation, long‑document Q&A, general chat, reasoning tasks with long inputs | More complex architecture; might require careful prompt engineering |
| Distilled models (e.g., R1‑Distill‑Qwen‑7B) | Distillation of R1 or V3; ~7 B parameters | 8–16 k tokens | Fast and cost‑effective; easy to host on consumer hardware | Real‑time applications, edge devices, moderate reasoning | Lower reasoning depth; shorter context |
| DeepSeek‑OCR (planned) | OCR model for image‑to‑text | N/A (image input) | Extracts text from images; extends DeepSeek capabilities | Document scanning, receipt digitisation, ID verification | Not yet available on Clarifai (as of Oct 2025) |

Benefits of DeepSeek API

Why should you choose DeepSeek via Clarifai over self‑hosting or other providers? Key benefits include:

  1. Cost efficiency: MoE architectures like V3 dramatically reduce compute cost per token. The Stanford AI Index notes a 280× drop in inference costs between 2022 and 2024. You pay only for active experts rather than entire dense models.

  2. Scalability & orchestration: Clarifai manages hosting, auto‑scaling and fault tolerance, freeing you from maintaining expensive infrastructure. Combined with Clarifai’s OpenAI‑compatible endpoint, you integrate once and scale instantly.

  3. Open source & MIT license: DeepSeek models are open source under the MIT licence, giving you flexibility to fine‑tune or self‑host if needed.

  4. Reasoning performance: R1 and V3 outperform many peers on reasoning tasks (MMLU, DROP, Codeforces). Distilled models provide near‑comparable performance with lower latency.

  5. Improved productivity: Studies show generative AI adoption yields 15–30 % productivity improvements and average ROI of 3.7×, highlighting tangible returns.


Expert Insights

  • IDC and Microsoft emphasise that leading organisations see an average ROI of 3.7×, with top performers achieving 10.3×.

  • McKinsey notes that AI adoption is mainstream and business leaders view generative AI as a lever for efficiency and innovation.

  • AmplifAI cautions that 75% of customers worry about data security, and nearly 45% of companies lack AI talent—another reason to use managed platforms like Clarifai that offer secure environments and simplified tooling.

  • Productivity & ROI: 74% of companies meet or exceed their generative‑AI ROI expectations, and productivity improves by 15–30%.

  • Customer interactions: 59% of surveyed firms say generative AI is transforming customer interactions.

Quick Summary: Why use DeepSeek via Clarifai?

Running DeepSeek models on Clarifai offers cost savings, scalability, open‑source flexibility, and leading reasoning performance. AI adoption is mainstream, investment is surging and generative‑AI projects deliver substantial ROI. Clarifai’s managed infrastructure lowers the barrier to entry while ensuring security and compliance.


Use Cases of DeepSeek Models—What You Can Build

DeepSeek’s reasoning abilities and long context make it suitable for various applications:

  1. AI assistants & chatbots: Build customer‑service bots that answer questions, generate summaries and schedule appointments. Streaming responses provide real‑time feedback and improved user satisfaction.

  2. Coding copilots: Use R1 for complex code generation, debugging and algorithm design. Its dense architecture excels at step‑wise reasoning.

  3. Document analysis & summarisation: V3’s 128 k context window allows ingestion of long contracts or research papers for summarisation, extraction and Q&A.

  4. Education & tutoring: Create educational tutors that explain concepts step by step and check mathematical proofs.

  5. Content moderation & policy enforcement: Distilled models can classify text or images quickly, making them ideal for edge deployments.

Additionally, upcoming models such as DeepSeek‑OCR (announced but not yet available at the time of writing) will enable text extraction from images and documents, further broadening use cases.

Framework: LLM Use‑Case Decision Checklist

  1. Task complexity: Does the task require deep reasoning (R1), general chat (V3) or quick responses (distilled)?

  2. Context length: Does the input exceed ~8 k tokens? If so, choose a model with a large context window like V3.

  3. Latency & cost constraints: If your application runs on edge devices or needs fast responses, consider distilled models.

  4. Data sensitivity: Ensure you have appropriate data anonymisation and security in place before sending prompts.

  5. Integration path: Determine whether a chat interface (streaming) or tool‑calling agent is needed.
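
As a rough illustration, the checklist collapses into a few lines of code. The thresholds and model mapping below are simplifying assumptions rather than official guidance, and the distilled model URL is illustrative; check the Clarifai Community for exact paths:

def pick_deepseek_model(deep_reasoning: bool, input_tokens: int, low_latency: bool) -> str:
    """Toy model picker following the checklist above (illustrative thresholds)."""
    if input_tokens > 8_000:
        # Long inputs call for V3.1's 128k context window
        return "https://clarifai.com/deepseek-ai/deepseek-chat/models/DeepSeek-V3_1"
    if low_latency and not deep_reasoning:
        # Distilled variants trade reasoning depth for speed and cost (URL is illustrative)
        return "https://clarifai.com/deepseek-ai/deepseek-chat/models/DeepSeek-R1-Distill-Qwen-7B"
    if deep_reasoning:
        return "https://clarifai.com/deepseek-ai/deepseek-chat/models/DeepSeek-R1"
    return "https://clarifai.com/deepseek-ai/deepseek-chat/models/DeepSeek-V3_1"

print(pick_deepseek_model(deep_reasoning=True, input_tokens=2_000, low_latency=False))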

Stats & Data

  • Workflow redesign: McKinsey reports that 21 % of organisations have redesigned workflows to incorporate AI.
  • Customer transformation: 59 % of companies see generative AI transforming customer interactions and expect improvements in customer service.
  • Revenue & cost impacts: Sequencr’s analysis shows that generative AI leads to 15.2 % cost savings, 10 % revenue increase and productivity gains of 15 %–30 %.

Expert Insights

  • NVIDIA points out that models with long context windows like V3 enable new workflows such as contract review and code‑base analysis that were previously difficult with 4 k–8 k context limits.
  • Microsoft/IDC note that the average company sees results within eight months and that the top ROI comes from use cases that streamline knowledge‑worker tasks.

Quick Summary: DeepSeek Use Cases

DeepSeek models can power chatbots, coding copilots, document analysers, tutors and moderation tools. Evaluate task complexity, context length, latency and integration requirements. Adoption is growing: 21 % of organisations have redesigned workflows, and generative AI delivers significant cost savings and revenue gains.

Frequently Asked Questions (FAQs)

Q1: Do I need a DeepSeek API key?
No. When using Clarifai, you only need a Clarifai Personal Access Token (PAT). A DeepSeek API key is required only if you call DeepSeek's own endpoint directly, which this guide does not cover.

Q2: How do I switch between models in code?
Change the model value to the Clarifai model URL, such as https://clarifai.com/deepseek-ai/deepseek-chat/models/DeepSeek-R1 for R1 or https://clarifai.com/deepseek-ai/deepseek-chat/models/DeepSeek-V3_1 for V3.1.

Q3: What parameters can I tweak?
You can adjust temperature, top_p and max_tokens to control randomness, sampling breadth and output length. For streaming responses, set stream=True. Tool calling requires defining a tool schema.

Q4: Are there rate limits?
Clarifai enforces soft rate limits per PAT. Implement exponential backoff and avoid retrying 4XX errors. For high throughput, contact Clarifai to increase quotas.
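
A minimal backoff sketch, assuming the OpenAI‑compatible client from the earlier examples (openai v1.x raises RateLimitError on 429 responses):

import time
import openai

def create_with_backoff(client, max_retries=5, **kwargs):
    """Retry chat completions on rate limits with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except openai.RateLimitError:
            time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, ... before retrying
    raise RuntimeError("Still rate limited after repeated retries")

Other 4XX errors (authentication failures, malformed requests) are deliberately not retried here, since they will not succeed on a second attempt.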

Q5: Is my data secure?
Clarifai processes requests in secure environments and complies with major data‑protection standards. Store your PAT securely and avoid including sensitive data in prompts unless necessary.

Q6: Can I fine‑tune DeepSeek models?
DeepSeek models are MIT‑licensed. Clarifai plans to offer private hosting and fine‑tuning for enterprise customers in the near future. Until then, you can download and fine‑tune the open‑source models on your own infrastructure.

Conclusion

You now have a fast, standard way to integrate DeepSeek models, including R1, V3.1, and distilled variants, into your applications. Clarifai handles all infrastructure, scaling, and orchestration. No separate DeepSeek key or complex setup is needed. Try the models today through the Clarifai Playground or API and integrate them into your applications.

WRITTEN BY

Sumanth Papareddy

ML/DEVELOPER ADVOCATE AT CLARIFAI

Developer advocate specializing in machine learning. Sumanth works at Clarifai, where he helps developers get the most out of their ML efforts. He usually writes about compute orchestration, computer vision, and new trends in AI and technology.