DEV Community

xiaoqiangapi

How I Tested DeepSeek, Zhipu, and MiniMax API Latency from Overseas: Full Data & Method

DeepSeek: 1.45s avg TTFT. Zhipu: 1.98s. MiniMax: 2.30s. Here's how I tested them as a non-coder, and what I learned.

🎯 **Are you confused too?**
1. Is this new API relay actually fast? Is there any objective, comparable data?
2. Professional testing tools are complex to configure and require writing scripts; I simply don't have time to tinker with them.
3. Is there a simple, reproducible way to verify the claims myself, without deep technical knowledge?

**What this article provides:**
βœ… A simple, easy-to-follow TTFT test method (not the definitive one)
βœ… The specific test tool and step-by-step instructions
βœ… A cost breakdown: about $0.005 for three calls

❌ This is not an "authoritative performance testing guide." I am a developer transitioning from a career as a Chinese teacher. **Tech experts are invited to share more professional testing methods!**


1. Why Test It Myself? Who Am I? And My Real Experience Choosing Tools

I'm @xiaoqiangapi, an entrepreneur who has taught Chinese for over a decade and now works as a Chinese LLM API intermediary for global developers.

My API is connected to DeepSeek, Zhipu GLM, and MiniMax. Users often ask: "How fast is your relay when I use it overseas? What are the actual numbers?"

To be honest, I can't just say "very fast." So I decided to **test it myself** to answer the question in the most intuitive way.

But at the beginning, I also encountered some difficulties.
- **Choosing a tool**: When I first searched for "API testing tools," Postman, Insomnia, and Apidog popped up. I opened the Postman page and saw a screen full of tabs, environment variables, collections, and scripts; to be honest, as a newly transitioned teacher, my first reaction was "I probably can't handle this." I didn't want to be stuck configuring one tool for days. So I read more articles and comparisons and finally chose Apidog, because it offers a graphical interface and a free plan. There's no need to learn scripting from scratch; once it's open, you can start debugging in a few clicks, which is friendly to beginners like me.
- **Knowing which metric is "fair"**: At first I only looked at total time (the full duration shown on the Timeline). Later I realized that for streaming LLMs, **time to first token (TTFT)** is the metric that most affects user experience: the wait for the first word determines whether a response *feels* fast. If generation is quick overall but the first word takes three seconds, users have already formed the impression "slow."

Based on these experiences, I've figured out a very simple method.

2. Test Environment and Method (Reproducible, Comparable)

**Network environment**: regular broadband in Seoul, South Korea, to simulate the conditions most overseas developers will see.

**Parameter configuration:**

- **API address**: https://api.xiaoqiangonline.shop/v1/chat/completions (my relay gateway)
- **Test tool**: Apidog + a phone stopwatch
- **Test models**:
  - DeepSeek (`deepseek-chat`)
  - Zhipu GLM (`glm-4-flash-250414`)
  - MiniMax (`minimax-M2.7`)
- **Test method**: streaming output (`"stream": true`); record the **time to first token (TTFT)** three consecutive times
- **Test prompt**: everyday Korean conversation: "μ•ˆλ…•ν•˜μ„Έμš”? 였늘 날씨가 μ’‹λ„€μš”. κ°„λ‹¨ν•œ 인사 ν•œλ§ˆλ”” ν•΄ 쀄 수 μžˆμ–΄μš”?" (Hello, the weather is nice today. Could you give a brief greeting?)

πŸ“Œ **A special note on Apidog**: According to Apidog's official 2026 figures, it has become a trusted full-lifecycle API development platform for over 500,000 teams worldwide, integrating design, debugging, testing, mocking, and documentation. But the total time it shows (overall response time) is not the first-token delay, so I measured TTFT with a phone stopwatch.

3. Measured Data: The TTFT Values I Recorded (Honest and Open)

Here are the three TTFT runs for each model, along with their averages:

| Model | Run 1 | Run 2 | Run 3 | **Average TTFT** |
|------|-------|-------|-------|-----------------|
| **DeepSeek** | 1.55s | 1.38s | 1.42s | **1.45s** ⚑ |
| **Zhipu GLM** | 1.95s | 2.02s | 1.97s | **1.98s** |
| **MiniMax** | 2.28s | 2.35s | 2.27s | **2.30s** |

The data is honest; I don't embellish or fabricate. DeepSeek leads in time to first token, Zhipu is stable, and MiniMax is slightly slower but still in the smooth range.
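If you want to double-check the averages in the table, the arithmetic is simple. This is just a sanity check, with the run values copied from my table above:

```python
from statistics import mean

# Three TTFT runs per model, in seconds (values from the table above)
runs = {
    "DeepSeek":  [1.55, 1.38, 1.42],
    "Zhipu GLM": [1.95, 2.02, 1.97],
    "MiniMax":   [2.28, 2.35, 2.27],
}
for model, times in runs.items():
    print(f"{model}: {mean(times):.2f}s")
# DeepSeek: 1.45s / Zhipu GLM: 1.98s / MiniMax: 2.30s
```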

4. Step-by-Step Instructions (You Can Reproduce and Verify Immediately)

(1) Create a new POST request in Apidog

URL: https://api.xiaoqiangonline.shop/v1/chat/completions
Method: POST

(2) Add headers

```text
Authorization: Bearer <your API key>
Content-Type: application/json
```

(3) Fill in the request body (JSON; it must include `"stream": true`)

```json
{
  "model": "deepseek-chat",
  "messages": [
    {
      "role": "user",
      "content": "μ•ˆλ…•ν•˜μ„Έμš”? 였늘 날씨가 μ’‹λ„€μš”. κ°„λ‹¨ν•œ 인사 ν•œλ§ˆλ”” ν•΄ 쀄 수 μžˆμ–΄μš”?"
    }
  ],
  "stream": true
}
```

(4) Send the request and time it

Click the Send button and start your phone's stopwatch at the same moment.
Watch the Apidog response area: stop the stopwatch the instant the first text fragment appears.
Record that time; it is the TTFT (first-token delay).
Repeat three times per model and take the average.
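If you'd rather not juggle a stopwatch, the same measurement can be scripted. This is a minimal sketch of my manual steps, not an official tool: it assumes the gateway speaks the standard OpenAI-style SSE streaming format (`data: {...}` lines), and `api_key` is a placeholder you must supply yourself.

```python
import json
import time
import urllib.request

def ttft_seconds(sse_lines, start=None):
    """Scan raw SSE byte lines and return elapsed seconds until the
    first chunk carrying visible text; None if the stream ends first."""
    if start is None:
        start = time.perf_counter()
    for raw in sse_lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip comments / keep-alive lines
        payload = line[len("data:"):].strip()
        if not payload or payload == "[DONE]":
            continue
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if delta.get("content"):  # first visible token arrived
            return time.perf_counter() - start
    return None

def measure_ttft(url, api_key, model, prompt):
    """Send one streaming chat request and time the first token."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode("utf-8")
    req = urllib.request.Request(url, data=body, headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    })
    start = time.perf_counter()  # start before the connection opens
    with urllib.request.urlopen(req) as resp:
        return ttft_seconds(resp, start=start)  # responses iterate by line
```

Call `measure_ttft("https://api.xiaoqiangonline.shop/v1/chat/completions", api_key, "deepseek-chat", ...)` three times per model and average the results, exactly as in the manual steps above.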

(5) Notes

⚠️ Don't read off the "total time" Apidog displays automatically; that is the full response time, not the first-token delay.
⚠️ If the API does not stream (i.e., it returns one complete JSON body at once), TTFT cannot be measured; only total time can.
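One quick way to confirm an endpoint is actually streaming before you trust a TTFT number (a sketch, not part of my Apidog workflow): an SSE stream is served with the `text/event-stream` content type, while a one-shot reply comes back as `application/json`.

```python
def is_streaming_response(content_type: str) -> bool:
    """True if the Content-Type header indicates a server-sent-event
    stream (token-by-token chunks) rather than a single JSON body."""
    return "text/event-stream" in content_type.lower()

# In Apidog, check the response headers panel; in code, inspect the
# Content-Type header of the HTTP response object.
print(is_streaming_response("text/event-stream; charset=utf-8"))  # True
print(is_streaming_response("application/json"))                  # False
```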

Here are test screenshots for the three Chinese models.

DeepSeek test data
Apidog screenshot: the DeepSeek request shows 3.58 seconds total time; the first-token delay measured with the phone stopwatch is 1.4 seconds

πŸ“Œ Note: Apidog in the screenshot shows the "total time" (3.58 seconds), not the first-token delay, which was measured with a phone stopwatch. Only one screenshot per model is shown as an example.

Zhipu GLM test data
Apidog screenshot: the Zhipu GLM request shows 4.61 seconds total time; the first-token delay measured with the phone stopwatch is 1.95 seconds

πŸ“Œ Note: The "total time" (4.61 seconds) shown in Apidog is end-to-end latency, not the first-token delay, which was measured with a phone stopwatch.

MiniMax test data
Apidog screenshot: the MiniMax request shows 5.29 seconds total time; the first-token delay measured with the phone stopwatch is 2.28 seconds

πŸ“Œ Note: The "total time" (5.29 seconds) shown in Apidog is end-to-end latency, not the first-token delay, which was measured with a phone stopwatch.

5. A Quick Cost Accounting: Is It Really "Fast and Economical"?

![Token usage screenshot](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nnmb119ulyj3xjgckb7h.png)
Here are the tokens and amounts I actually consumed:
Total consumption for 3 tests: 528 tokens
Current experience package pricing: $5/500,000 Token
Equivalent cost β‰ˆ 0.005 US dollars (half a cent)
If the API is called 1,000 times a day, the total cost of the Token each month would be about $5.
The conclusion is that the intermediary channels for AI in China are not only fast but also very cost-effective for start-up individuals and teams.
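You can sanity-check the per-call cost yourself. A tiny sketch, using the $5 per 500,000-token package price quoted above:

```python
PRICE_USD = 5.0            # package price quoted above
TOKENS_PER_PACKAGE = 500_000

def cost_usd(tokens: int) -> float:
    """Cost in USD at the quoted package rate."""
    return tokens * PRICE_USD / TOKENS_PER_PACKAGE

print(f"{cost_usd(528):.5f}")  # the three test calls: 0.00528
```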


(More test data I will keep updating on my X account)

6. Does Each Model Have Its Own "Personality"?

During testing I noticed that, beyond speed, the models also differ noticeably in style:

- DeepSeek (1.45s): straightforward and concise; suited to real-time customer service, chatbots, and other scenarios sensitive to first-token delay.
- Zhipu GLM (1.98s): logical and well structured; suited to long-form content and report writing.
- MiniMax (2.30s): smooth and natural, with rich detail; suited to open-ended scenarios such as casual chat and creative writing.

7. Speed and Style Are Only the Surface: Three Quantifiable Takeaways for Developers

βœ… The whole test took under an hour, much faster than you might think.
βœ… With Apidog's graphical interface and a phone stopwatch, anyone can reproduce it, even with zero coding experience.
βœ… How is the overseas latency of Chinese LLM APIs? Based on my tests through the relay gateway: DeepSeek has a TTFT of about 1.45 seconds, Zhipu about 1.98 seconds, and MiniMax about 2.30 seconds; all of them meet the basic responsiveness needs of production scenarios. If first-token latency is critical for you, DeepSeek currently has the clearest advantage.

Other people's data is theirs; the experience you gain by testing yourself is yours. If you hit any problems reproducing this, leave a comment and I'll do my best to answer.

Closing Thoughts

I'm a Chinese teacher, not an expert in operations or backend data. My testing method is very "clumsy": I chose my tools by searching, reading, asking, and trying, with no shortcuts. But I'm willing to lay every detail out in the open.

If you find this kind of honest sharing valuable, give it a like and share it with independent developers who need it.

Next topic preview: "Is the LLM you're calling really secure? How to check." Interested friends are welcome to run the tests together!

I tested from a single location with basic tools. If you've run similar tests in production, what latency are you seeing? Let me know in the comments.

Top comments (1)

xiaoqiangapi

Thank you for reading! This article is a true test record of my transformation as a teacher. The methods might be a bit "clumsy", but all the data are honest.

If you have more professional API latency testing methods (such as using scripts to measure TTFT, conducting concurrent stress tests, etc.), I would be very grateful if you could share them in the comment section. I also want to learn.

Also, if you are also working on the overseas access of Chinese LLMs, we welcome you to share your actual test data. We can work together to compile a more comprehensive comparison.

Thank you for your support. πŸ™