Deepseek V3.1 Terminus (Tested) + Roo,Kilo,Cline: This is THE BEST OPEN CODER RIGHT NOW!

Summary of Deepseek V3.1 Terminus Model Review

This video provides an in-depth review of the new Deepseek V3.1 Terminus model, highlighting its improvements in tool calling, function calling, and overall reliability, particularly in coding tasks.

Main Points

Deepseek V3.1 Terminus Launch: Deepseek has released its latest model, V3.1 Terminus, with dropped weights and resources. [0:04-0:36]
"Terminus" Naming: The name "Terminus" is speculated to signify its position as the final model in the V3 lineup, with V4 to follow. [0:34-1:08]
Key Improvements:
- Enhanced tool and function calling capabilities. [0:04-0:36]
- Elimination of "finicky Chinese token things" for increased reliability. [0:04-0:36]
- Improved performance in coding tasks, making it suitable for AI coders like those in Kilo Code or Rue. [0:34-1:08]
Performance Benchmarks:
- Scores third position on non-agentic benchmark tasks. [2:07-2:44]
- Reasoning Model: The reasoning model has seen a decline in performance compared to previous versions, struggling to complete math questions and generally performing poorly. [2:07-2:44, 4:14-4:47]
- Non-Reasoning Model: The non-reasoning model is highlighted as the major improvement and performs "quite good." [2:07-2:44, 5:46-6:19]
Agentic and Non-Agentic Test Results:
- Successful Tests: Floor plan generation, SVG Panda with a burger, chess board with autoplay, Kandinsky style Minecraft, butterfly in a garden, CLI tool in Rust. [2:40-3:46]
- Unsuccessful Tests: Pokeball generation. [2:40-3:15]
- General Tasks: Performs well and avoids unnecessary reasoning, which was an issue with previous models leading to increased costs and poor user experience. [3:43-4:17]
Accessing the Model:
- Recommended Method: Use Deepseek's official API or Requesty's Deepseek chat endpoint for optimal performance, especially for the non-reasoning model. [4:44-5:18, 5:15-5:49]
- Issues with Other Platforms: Kilo Code, Rue, Open Router, and Klein are noted to force the reasoning model, leading to a worse experience. [4:44-5:18, 5:46-6:19]
Coding Test Results (Non-Reasoning Model via Requesty/Official API):
- Movie Tracker App: No diff failures or terminal errors, with human-like and colorful design, featuring search and calendar views (though the calendar has a minor font/background issue). [6:16-7:53]
- TUI Calculator in Go: Implemented well, looks like a proper calculator. [7:51-8:27]
- Godo Game Implementation: One of the few open models capable of working with Godo, successfully implementing a life bar affected by jumps and a step counter. [8:23-8:58]
- Open Code: Fails on massive tasks due to context limits, placing second after Codebuff. [8:54-9:27]
Overall Coding Performance: The non-reasoning model is considered the best open coding model yet, outperforming GLM, Kimmy, Codeex, and others, despite a 50-cent cost for one task. [8:54-9:27, 9:24-10:00]
Context Window: Still has a 128,000 context window, which can be limiting. Biterover can be used to extend memory. [9:24-10:00, 10:27-10:31]

Key Takeaways

Deepseek V3.1 Terminus is a significant upgrade, especially for coding tasks, when using the non-reasoning model. [5:46-6:19]
Avoid platforms that force the reasoning model (e.g., Open Router, Kilo Code, Rue) if you want to leverage the strengths of Terminus. [4:44-5:18]
Requesty or Deepseek's official API are the recommended ways to access the model for optimal results. [4:44-5:18, 5:15-5:49]
The reasoning model is currently a weak point and should be avoided for complex reasoning tasks. [4:14-4:47]
Biterover can be integrated to manage and share memories for long-running coding projects. [9:24-10:00, 10:27-10:31]

Ninja Chat Promotion

The video also features a promotion for Ninja Chat, an AI platform offering access to models like GPT40, Claude 4 Sonnet, and Gemini 2.5 Pro for $11/month, with a comparison playground and mind map generator. [1:06-1:38] Discount codes are provided. [1:36-2:10]