
Asking n People (n Generative AI Models) Simultaneously

Summary

  • Discussion on using Generative AI APIs
  • Efficiently querying multiple models in parallel
  • Concept:
    • Generative AI models are like humans, and having n models equates to having "n different people".
    • By making parallel inquiries to n models, you can experience asking n people simultaneously.
    • This can be achieved by making API calls in parallel.
  • Example: pgpt.py, a small tool that sends a single prompt to seven OpenAI models at once (shown below).

Background

As Fast as Possible. As Smooth as Possible.

The back-and-forth with ChatGPT can feel primitive, leaving room for improvement. Engineers using editor-integrated AI agents like Copilot, Cline, and Cursor might never want to return to rudimentary copy-pasting.

This isn't limited to coding, but applies to inquiries in general. Whether asking about unfamiliar terms or getting ten ideas for brainstorming, being able to invoke these directly from an editor or IDE is obviously better.

It's crucial to make things as seamless as possible. Everyday convenience and magical innovation lie in efficiency, with seamlessness as a cornerstone. Minimizing hassle and ensuring smooth usability is one of the essential pursuits in engineering.

Let's Go Parallel

As a Knowledge Architect, I often engage in divergent and creative work. Thus, being able to seamlessly call upon Generative AI is directly tied to productivity.

I focused on the concept of parallelism. In other words, making inquiries to n models simultaneously.

Approach

1: Consider Models as People

Let's use OpenAI API as an example.

Here are some models:

  • GPT-4o
  • GPT-4.1
  • GPT-5
  • GPT-5.1
  • o1
  • o3-mini
  • o4-mini

Each model has slightly different characteristics. It's as if they are people. Let's just consider them as such.

  • Mr. 4o
  • Mr. 4.1
  • Mr. 5
  • Mr. 5.1
  • Mr. o1
  • Mr. o3-mini
  • Mr. o4-mini

In this scenario, it's like having seven different people.

As a side note, you can list the OpenAI API models available to you via the models endpoint of the API.
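
For instance, with the official openai Python SDK, a short sketch like this prints the model IDs your key can access (it assumes OPENAI_API_KEY is set in your environment):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Print every model ID this API key can access.
for model in client.models.list():
    print(model.id)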

2: Ask Simultaneously

Think of asking these seven people simultaneously.

Here's some sample code from a small tool called pgpt.py.

import concurrent.futures

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def request_to_model(model_name, prompt, timeout=130):
    # Ask one model and return its reply, or an error string on failure.
    try:
        response = client.chat.completions.create(
            model=model_name,
            messages=[
                {'role': 'user', 'content': prompt},
            ],
            timeout=timeout
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"[ERROR in {model_name}]: {str(e)}"

def main():
    # MODELS, prompt, args, and save_response are defined elsewhere in pgpt.py.
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # Submit one request per model; map each future back to its model name.
        future_to_model = {
            executor.submit(request_to_model, model, prompt): model
            for model in MODELS
        }
        # Save each response as soon as it completes, regardless of submission order.
        for future in concurrent.futures.as_completed(future_to_model):
            model = future_to_model[future]
            content = future.result()
            save_response(args.input, model, content)

Using Python's concurrent.futures, the requests are sent simultaneously, and responses from the completed models (people) are saved as they arrive.
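
For reference, a hypothetical MODELS list covering the seven people above might look like the following; the actual list in pgpt.py, and the models your key can access, may differ:

# Hypothetical list of the seven "people"; adjust to the models available to you.
MODELS = [
    'gpt-4o',
    'gpt-4.1',
    'gpt-5',
    'gpt-5.1',
    'o1',
    'o3-mini',
    'o4-mini',
]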

3: Acquire Results from Simultaneous Inquiries

This is simple: just open all the files where the results from seven people are output.

In pgpt.py, results are saved with filenames like p-gpt-4.1.md, p-gpt-4o.md, and so on. Open all of these in VSCode. Any other editor will work too.
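
As a minimal sketch, a save_response along these lines would produce that naming scheme (the real implementation in pgpt.py may differ):

from pathlib import Path

def save_response(input_path, model, content):
    # Write each model's answer to p-<model>.md next to the input file.
    out_path = Path(input_path).parent / f"p-{model}.md"
    out_path.write_text(content, encoding='utf-8')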

Modern editors can auto-reload a file when it changes on disk, so this setup is enough: responses from the parallel inquiry appear automatically as each file is written. I can invoke pgpt with a shortcut key in the editor and skim the answers from seven people right away. It's a far better experience than switching back and forth between browser tabs with ChatGPT.

Side Note: Personal Impressions

Here are my impressions of each individual.

  • Mr. 4o responds incredibly fast.
    • The quality of its creative output is honestly mediocre, but it's well-versed in most topics apart from concepts introduced in the past few years, and it can usually provide an answer.
  • Mr. 5 responds slowly but delivers dense information.
    • However, as often criticized in the Keep 4o movement, it lacks warmth. It also tends to push frameworks and improvement ideas rather readily.
  • Mr. o1, Mr. o3-mini, and Mr. o4-mini offer a balance between Mr. 4o and Mr. 5.
    • Each provides slightly different responses, which is extremely insightful.
    • By comparing responses from these three, you generally get a good starting point. If not, it's likely due to my own inexperience (including insufficient prompting).
  • Mr. 4.1 doesn't leave much of an impression.
  • Mr. 5.1 doesn't either. It's still new as of 2025/11/21, so there's hope for the future.

Conclusion

By making requests to n models in parallel, you can experience swiftly gathering opinions from n people.

As a Knowledge Architect, I usually delve into the realm of concept creation, but I do implement these small tools. As an engineer, I'm not very skilled, so take my implementation as a mere reference.

More importantly, consider the mindset of “treating models as people and asking n people simultaneously”. If you're more adept than I am, you'd likely implement something even more convenient. Feel the power of the experience of asking n people at once!
