By DigitalOcean
Today, we’re announcing that Arcee AI’s Trinity Large-Thinking is now available in Public Preview on DigitalOcean’s Agentic Inference Cloud, giving developers the ability to run frontier-class reasoning workloads without managing infrastructure or stitching together complex systems.
DigitalOcean is proud to partner with Arcee to bring Trinity Large-Thinking to AI builders via Serverless Inference on day one. It's instantly available and can be queried directly through the DigitalOcean Cloud Console or API, alongside the compute, data, and services you already run on DigitalOcean.
Trinity Large-Thinking didn’t emerge in a vacuum. It’s been pressure-tested in exactly the kind of workloads DigitalOcean is built for.
Arcee is a 26-person San Francisco startup that spent nine months building a full open-weight model family from the ground up, with the explicit goal of producing models developers and enterprises could actually own. The result is a family ranging from 4.5B to 400B parameters, and a top-of-stack reasoning model that has earned its place in production.
In its first two months, Trinity served over 3.4 trillion tokens on OpenRouter, becoming the most-used open weight model in the U.S., driven by always-on, agentic workloads running continuously.
Trinity Large-Thinking builds on that foundation with extended reasoning, stronger multi-turn tool use, and more stable long-running behavior. It ranks #2 on PinchBench (Kilo's benchmark for agentic model capability) at a price roughly 96% lower than the top-ranked model's.
Developers shouldn’t have to choose between a model that can reason and one they can afford to run at scale. Thanks to the partnership between DigitalOcean and Arcee, they don’t have to.
Reasoning workloads are long-running, multi-step, and deeply integrated into the rest of your stack. That integration is crucial for building agents and complex applications that dynamically interpret unstructured data and execute multi-step action sequences.
On DigitalOcean’s Agentic Inference Cloud, Trinity Large-Thinking runs as part of a complete system and not a standalone model endpoint you have to wire up yourself.
With this launch, you get:
Frontier reasoning at usable economics: #2 on PinchBench for agentic tasks at ~$0.90/M output tokens. Capable enough for complex systems, affordable enough to run continuously.
Integrated infrastructure: Run agents alongside your Kubernetes clusters, databases, and storage. No stitching across vendors.
Instant, serverless access: No provisioning or scaling. Query immediately via API or console, your infrastructure adapts to your workload.
Full model control: Apache 2.0 licensed weights available on Hugging Face. Inspect, fine-tune, distill, or self-host as needed.
This is what the next phase of AI infrastructure looks like: integrated systems where reasoning, data, and compute run together.
As more workloads shift toward continuous, agent-driven execution, the platform they run on matters just as much as the model itself.
Hear more about Trinity Large-Thinking and the partnership between DigitalOcean and Arcee from CEO Mark McQuade at Deploy on April 28th in San Francisco. Save your spot to attend live.
Trinity Large-Thinking is live now in Public Preview on DigitalOcean Serverless Inference. You can start running advanced reasoning workloads immediately, without managing infrastructure and without compromising on cost.
Get started quickly using the request below:
curl --location 'https://inference.do-ai.run/v1/chat/completions' \
--header "Authorization: Bearer $DO_API_TOKEN" \
--header 'Content-Type: application/json' \
--data '{
  "model": "trinity-large-thinking",
  "messages": [
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ],
  "temperature": 0.7,
  "max_completion_tokens": 256
}'
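If you'd rather call the endpoint from application code, the same request can be sketched in Python using only the standard library. This assumes the endpoint accepts the OpenAI-style chat-completions schema shown in the curl example above, and that your token is exported as `DO_API_TOKEN`; the `build_payload` and `ask` helper names are illustrative, not part of any SDK.

```python
import json
import os
import urllib.request

# Endpoint from the curl example above.
API_URL = "https://inference.do-ai.run/v1/chat/completions"


def build_payload(prompt: str) -> dict:
    """Assemble the same request body as the curl example."""
    return {
        "model": "trinity-large-thinking",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "max_completion_tokens": 256,
    }


def ask(prompt: str) -> str:
    """POST the request and return the assistant's reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={
            # Token is read from the environment, never hard-coded.
            "Authorization": f"Bearer {os.environ['DO_API_TOKEN']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Standard chat-completions response shape (assumed).
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__" and os.environ.get("DO_API_TOKEN"):
    print(ask("What is the capital of France?"))
```

Because the API follows the chat-completions convention, existing OpenAI-compatible client libraries should also work by pointing their base URL at `https://inference.do-ai.run/v1`.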