MoA does not require any fine-tuning; it only uses the prompting and generation interface of LLMs.
We do not need to concatenate the prompt with all model responses, so only one LLM needs to be used in the last layer.
⚡ design of Mixture-of-Agents.
2 parts : proposers & aggregators
Proposers : generate useful reference responses for use by other models. A proposer may not produce high-scoring responses by itself, but it should offer more context and diverse perspectives, ultimately contributing to better final responses when used by an aggregator.
Aggregators : synthesize responses from other models into a single, high-quality output. An effective aggregator should maintain or enhance output quality even when integrating inputs that are of lesser quality than its own.
By incorporating more aggregators into the process, we can iteratively synthesize and refine the responses, leveraging the strengths of multiple models to produce superior outcomes.
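To make the aggregator role concrete, here is a minimal sketch of how proposer outputs might be packed into one aggregation prompt. The function name `build_aggregator_prompt` and the instruction wording are assumptions made for illustration; this is not the prompt used by the MoA authors.

```python
def build_aggregator_prompt(user_prompt: str, references: list[str]) -> str:
    """Combine the user prompt and proposer responses into one aggregation prompt.

    The instruction text below is hypothetical, not taken from the paper.
    """
    if not references:
        # Nothing to aggregate: fall back to the plain prompt.
        return user_prompt
    # Number the proposer responses so the aggregator can tell them apart.
    numbered = "\n".join(f"{i}. {r}" for i, r in enumerate(references, start=1))
    return (
        "You are given several candidate responses to the user's request. "
        "Some may be wrong or of lower quality than what you could write yourself. "
        "Critically evaluate them and synthesize a single, high-quality answer.\n\n"
        f"Candidate responses:\n{numbered}\n\n"
        f"User request:\n{user_prompt}"
    )
```

The instruction explicitly warns that some candidates may be weak, which matches the requirement that an aggregator should not be dragged down by lower-quality inputs.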
Initially, LLMs in the first layer, denoted as agents A1,1, ..., A1,n, independently generate responses to a given prompt. These responses are then presented to agents in the next layer (A2,1, ..., A2,n) for further refinement.
This iterative refinement process continues for several cycles until a more robust and comprehensive response is obtained.
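The layered flow can be sketched roughly as below. This is a simplified illustration, not the authors' implementation: an "agent" is modeled as any function from text to text (in practice it would wrap an LLM API call), and the names `Agent`, `with_references`, and `run_moa` are made up for this example.

```python
from typing import Callable, Sequence

# Hypothetical stand-in for an LLM: any function from prompt text to response text.
Agent = Callable[[str], str]


def with_references(prompt: str, references: Sequence[str]) -> str:
    """Attach the previous layer's responses to the original prompt."""
    if not references:
        return prompt
    listed = "\n".join(f"{i}. {r}" for i, r in enumerate(references, start=1))
    return f"Previous responses:\n{listed}\n\nOriginal prompt:\n{prompt}"


def run_moa(prompt: str, layers: Sequence[Sequence[Agent]], aggregator: Agent) -> str:
    """Run the proposer layers in order, then a single final aggregator."""
    references: list[str] = []
    for layer in layers:
        layer_input = with_references(prompt, references)
        # Agents within a layer do not depend on each other, so these calls
        # could run in parallel; they are sequential here for simplicity.
        references = [agent(layer_input) for agent in layer]
    # Only one model is needed in the last layer to produce the final answer.
    return aggregator(with_references(prompt, references))
```

Each layer sees only the original prompt plus the responses of the layer directly before it, so the input does not grow with the number of layers.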
The selection of LLMs for each MoA layer is guided by two primary criteria:
Performance Metrics: The average win rate of models in layer i plays a significant role in determining their suitability for inclusion in layer i + 1.
Diversity Considerations: The diversity of model outputs is also crucial. Responses generated by heterogeneous models contribute significantly more than those produced by the same model.
By leveraging these criteria — performance and diversity — MoA aims to mitigate individual model deficiencies and enhance overall response quality through collaborative synthesis.
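The notes above do not describe a concrete selection procedure, so the following is only one possible heuristic, assuming each candidate comes with a measured win rate and a "family" label used as a crude proxy for diversity. The `Candidate` fields and `select_layer` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str        # hypothetical model identifier
    family: str      # crude diversity proxy: same family counts as similar
    win_rate: float  # average win rate measured on the previous layer, in [0, 1]


def select_layer(candidates: list[Candidate], n: int) -> list[Candidate]:
    """Pick up to n models, favoring high win rate and distinct families."""
    ranked = sorted(candidates, key=lambda c: c.win_rate, reverse=True)
    chosen: list[Candidate] = []
    seen_families: set[str] = set()
    # First pass: take the best model of each family, in win-rate order.
    for c in ranked:
        if len(chosen) == n:
            return chosen
        if c.family not in seen_families:
            chosen.append(c)
            seen_families.add(c.family)
    # Second pass: fill any remaining slots with the best leftovers.
    for c in ranked:
        if len(chosen) == n:
            break
        if c not in chosen:
            chosen.append(c)
    return chosen
```

With this heuristic a slightly weaker model from a new family is preferred over a second model from a family already in the layer, which reflects the idea that heterogeneous responses contribute more than repeated ones.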
⚡ limitations:
The model cannot decide the first token until the last MoA layer is reached. This potentially results in a high Time to First Token (TTFT), which can negatively impact user experience.
To mitigate this issue, we can limit the number of MoA layers, as the first response aggregation has the most significant boost on generation quality.
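A back-of-the-envelope way to see why fewer layers help is sketched below. It assumes that agents within a layer run in parallel, that each proposer layer must finish completely before the next one starts, and that only the final aggregator streams its output. The function `estimate_ttft` and its inputs are hypothetical.

```python
def estimate_ttft(layer_latencies: list[list[float]], final_ttft: float) -> float:
    """Estimate seconds until the user sees the first token.

    layer_latencies[i] holds the full generation time of each agent in
    proposer layer i. Agents in a layer are assumed to run in parallel, so a
    layer costs as much as its slowest agent. The final aggregator can stream,
    so it only adds its own time to first token.
    """
    waiting = sum(max(layer) for layer in layer_latencies if layer)
    return waiting + final_ttft
```

Under these assumptions every extra proposer layer adds the full generation time of its slowest agent to the wait, so dropping a layer removes that whole term from the TTFT.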