ReMoM (Reasoning for Mixture of Models)

Overview

remom is a looper algorithm for breadth-controlled multi-model orchestration with intelligent synthesis. It performs multi-round parallel reasoning and synthesizes the best answer from all responses.

It corresponds to the configuration file `config/algorithm/looper/remom.yaml`.

Inspired by: PaCoRe — extended to support mixture of models.

Key Advantages

  • Multi-round parallel reasoning with configurable breadth schedule.
  • Intelligent synthesis from multiple model responses.
  • Model distribution strategies: weighted, equal, or first_only.
  • Compaction strategy to manage token budgets across rounds.
  • Customizable synthesis templates.

Algorithm Principle

ReMoM orchestrates multiple rounds of parallel model calls:

  1. Round 1: Launch breadth_schedule[0] parallel calls across candidate models.
  2. Compaction: Optionally compact intermediate responses (full or last_n_tokens).
  3. Round 2: Launch breadth_schedule[1] calls, feeding compacted responses as context.
  4. Final Synthesis: One final call synthesizes all intermediate results into a coherent answer.

The breadth schedule controls how many calls happen per round. For example [32, 4] means 32 calls in round 1, 4 in round 2, then 1 final synthesis call.
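As a quick sanity check, the total number of model calls implied by a schedule is the sum of the per-round breadths plus the one synthesis call. A minimal sketch (the `total_calls` helper is illustrative, not part of the router's API):

```python
def total_calls(breadth_schedule):
    """Total model calls: every round's parallel calls plus one final synthesis call."""
    return sum(breadth_schedule) + 1

print(total_calls([32, 4]))    # 37 calls: 32 + 4 round calls, plus 1 synthesis
print(total_calls([3, 2, 1]))  # 7 calls
```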

Execution Flow
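The rounds run sequentially while calls within a round run in parallel. The flow can be sketched as follows; `call_model`, `compact`, and `synthesize` are hypothetical stand-ins for the router's actual invocation, compaction, and synthesis steps, not its real API (the sketch also runs the per-round calls serially for clarity):

```python
def run_remom(prompt, breadth_schedule, models, call_model, compact, synthesize):
    """Sketch of the ReMoM loop: parallel rounds (shown serially here),
    compaction between rounds, and a single final synthesis call."""
    context = ""
    responses = []
    for breadth in breadth_schedule:
        # Round N: launch `breadth` calls, feeding compacted prior output as context.
        responses = [call_model(models[i % len(models)], prompt, context)
                     for i in range(breadth)]
        # Compaction: shrink intermediate responses to fit the token budget.
        context = compact(responses)
    # Final synthesis: one call merges the remaining responses into one answer.
    return synthesize(prompt, responses)

# Toy usage with stub callables:
answer = run_remom(
    "2+2?",
    [2, 1],
    ["model-a", "model-b"],
    call_model=lambda m, p, ctx: f"{m} says 4",
    compact=lambda rs: " | ".join(rs),
    synthesize=lambda p, rs: rs[0],
)
```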

Model Distribution Strategies

| Strategy | Description |
| --- | --- |
| `weighted` | Distribute calls in proportion to model weights in `modelRefs` |
| `equal` | Distribute calls equally across all candidate models |
| `first_only` | Send all calls to the first (highest-weight) model |
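One plausible way to implement these three strategies, assuming proportional allocation with remainders handed to the heaviest models first (the exact rounding behavior of the router is not documented here):

```python
def distribute_calls(breadth, model_weights, strategy="weighted"):
    """Split `breadth` calls across models. `model_weights` maps model name -> weight,
    listed in descending weight order (mirroring modelRefs). A sketch, not the router's code."""
    models = list(model_weights)
    if strategy == "first_only":
        return {models[0]: breadth}  # everything goes to the highest-weight model
    if strategy == "equal":
        base, extra = divmod(breadth, len(models))
        return {m: base + (1 if i < extra else 0) for i, m in enumerate(models)}
    # weighted: proportional share, leftover calls to the heaviest models first
    total = sum(model_weights.values())
    counts = {m: int(breadth * w / total) for m, w in model_weights.items()}
    leftover = breadth - sum(counts.values())
    for m in models[:leftover]:
        counts[m] += 1
    return counts

print(distribute_calls(8, {"a": 3, "b": 1}))                # {'a': 6, 'b': 2}
print(distribute_calls(5, {"a": 1, "b": 1}, "equal"))       # {'a': 3, 'b': 2}
print(distribute_calls(5, {"a": 1, "b": 1}, "first_only"))  # {'a': 5}
```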

What Problem Does It Solve?

Some tasks benefit from parallel exploration and later synthesis rather than one-shot selection of a single model. remom gives the router a breadth-controlled way to explore multiple reasoning paths and merge them into one final answer.

When to Use

  • One route should coordinate multiple models over several passes.
  • You need a configurable breadth schedule instead of one-step escalation.
  • Intermediate responses should be included or excluded explicitly.
  • Multi-round reasoning with synthesis produces better answers than single-shot.

Known Limitations

  • High token consumption: each round generates multiple responses.
  • Synthesis quality depends on the synthesis template and model capability.
  • Longer latency due to sequential round execution.
  • Requires careful tuning of breadth_schedule to balance quality vs. cost.

Configuration

```yaml
algorithm:
  type: remom
  remom:
    breadth_schedule: [3, 2, 1]            # Parallel calls per round
    model_distribution: weighted           # weighted, equal, or first_only
    temperature: 0.7                       # Temperature for model calls
    include_reasoning: false               # Include reasoning in synthesis
    compaction_strategy: full              # full or last_n_tokens
    compaction_tokens: 1000                # Tokens to keep for last_n_tokens
    synthesis_template: ""                 # Custom synthesis template (optional)
    max_concurrent: 3                      # Max concurrent calls per round
    shuffle_seed: 42                       # Seed for response shuffling
    include_intermediate_responses: false  # Include intermediate responses in output
    max_responses_per_round: null          # Limit responses per round
    on_error: skip                         # skip or fail
```
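The two compaction strategies can be sketched as below. Whitespace tokenization is a simplification for illustration; the router presumably counts real model tokens when applying `compaction_tokens`:

```python
def compact_response(text, strategy="full", compaction_tokens=1000):
    """Sketch of the two compaction strategies. Whitespace splitting stands in
    for real tokenization, which is an assumption of this example."""
    if strategy == "full":
        return text  # keep the entire response
    if strategy == "last_n_tokens":
        tokens = text.split()
        return " ".join(tokens[-compaction_tokens:])  # keep only the tail
    raise ValueError(f"unknown compaction strategy: {strategy}")

print(compact_response("a b c d e"))                                      # a b c d e
print(compact_response("a b c d e", "last_n_tokens", compaction_tokens=3))  # c d e
```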

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `breadth_schedule` | list[int] | required | Parallel calls per round (e.g., `[3, 2, 1]`) |
| `model_distribution` | string | `weighted` | Strategy: `weighted`, `equal`, or `first_only` |
| `temperature` | float | `1.0` | Temperature for model calls |
| `include_reasoning` | bool | `false` | Include reasoning content in synthesis prompts |
| `compaction_strategy` | string | `full` | Strategy: `full` or `last_n_tokens` |
| `compaction_tokens` | int | `1000` | Tokens to keep for `last_n_tokens` compaction |
| `synthesis_template` | string | | Custom synthesis prompt template |
| `max_concurrent` | int | | Maximum concurrent model calls per round |
| `shuffle_seed` | int | `42` | Random seed for response shuffling |
| `include_intermediate_responses` | bool | `true` | Include intermediate responses in output |
| `max_responses_per_round` | int | | Maximum responses to keep per round |
| `on_error` | string | `skip` | Behavior on failure: `skip` or `fail` |