ReMoM (Reasoning for Mixture of Models)

Overview

remom is a looper algorithm for breadth-controlled multi-model orchestration with intelligent synthesis. It performs multi-round parallel reasoning and synthesizes the best answer from all responses.

It corresponds to the configuration file `config/algorithm/looper/remom.yaml`.

Inspired by: PaCoRe — extended to support mixture of models.

Key Advantages

  • Multi-round parallel reasoning with configurable breadth schedule.
  • Intelligent synthesis from multiple model responses.
  • Model distribution strategies: weighted, equal, or first_only.
  • Compaction strategy to manage token budgets across rounds.
  • Customizable synthesis templates.

Algorithm Principle

ReMoM orchestrates multiple rounds of parallel model calls:

  1. Round 1: Launch breadth_schedule[0] parallel calls across candidate models.
  2. Compaction: Optionally compact intermediate responses (full or last_n_tokens).
  3. Round 2: Launch breadth_schedule[1] calls, feeding compacted responses as context.
  4. Final Synthesis: One final call synthesizes all intermediate results into a coherent answer.

The breadth schedule controls how many calls happen per round. For example [32, 4] means 32 calls in round 1, 4 in round 2, then 1 final synthesis call.
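As a quick sanity check, the total number of model calls implied by a schedule is the sum of the per-round breadths plus the one synthesis call. A minimal sketch (the `total_calls` helper is illustrative, not part of the router's API):

```python
def total_calls(breadth_schedule):
    """Total model calls: every round's parallel calls plus one final synthesis call."""
    return sum(breadth_schedule) + 1

print(total_calls([32, 4]))    # 37 calls: 32 + 4 round calls, plus 1 synthesis
print(total_calls([3, 2, 1]))  # 7 calls
```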

Execution Flow
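The rounds run sequentially while calls within a round run in parallel. The flow can be sketched as follows; `call_model`, `compact`, and `synthesize` are hypothetical stand-ins for the router's actual invocation, compaction, and synthesis steps, not its real API (the sketch also runs the per-round calls serially for clarity):

```python
def run_remom(prompt, breadth_schedule, models, call_model, compact, synthesize):
    """Sketch of the ReMoM loop: parallel rounds (shown serially here),
    compaction between rounds, and a single final synthesis call."""
    context = ""
    responses = []
    for breadth in breadth_schedule:
        # Round N: launch `breadth` calls, feeding compacted prior output as context.
        responses = [call_model(models[i % len(models)], prompt, context)
                     for i in range(breadth)]
        # Compaction: shrink intermediate responses to fit the token budget.
        context = compact(responses)
    # Final synthesis: one call merges the remaining responses into one answer.
    return synthesize(prompt, responses)

# Toy usage with stub callables:
answer = run_remom(
    "2+2?",
    [2, 1],
    ["model-a", "model-b"],
    call_model=lambda m, p, ctx: f"{m} says 4",
    compact=lambda rs: " | ".join(rs),
    synthesize=lambda p, rs: rs[0],
)
```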

Model Distribution Strategies

| Strategy | Description |
| --- | --- |
| `weighted` | Distribute calls in proportion to model weights in `modelRefs` |
| `equal` | Distribute calls equally across all candidate models |
| `first_only` | Send all calls to the first (highest-weight) model |
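One plausible way to implement these three strategies, assuming proportional allocation with remainders handed to the heaviest models first (the exact rounding behavior of the router is not documented here):

```python
def distribute_calls(breadth, model_weights, strategy="weighted"):
    """Split `breadth` calls across models. `model_weights` maps model name -> weight,
    listed in descending weight order (mirroring modelRefs). A sketch, not the router's code."""
    models = list(model_weights)
    if strategy == "first_only":
        return {models[0]: breadth}  # everything goes to the highest-weight model
    if strategy == "equal":
        base, extra = divmod(breadth, len(models))
        return {m: base + (1 if i < extra else 0) for i, m in enumerate(models)}
    # weighted: proportional share, leftover calls to the heaviest models first
    total = sum(model_weights.values())
    counts = {m: int(breadth * w / total) for m, w in model_weights.items()}
    leftover = breadth - sum(counts.values())
    for m in models[:leftover]:
        counts[m] += 1
    return counts

print(distribute_calls(8, {"a": 3, "b": 1}))                # {'a': 6, 'b': 2}
print(distribute_calls(5, {"a": 1, "b": 1}, "equal"))       # {'a': 3, 'b': 2}
print(distribute_calls(5, {"a": 1, "b": 1}, "first_only"))  # {'a': 5}
```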

What Problem Does It Solve?

Some tasks benefit from parallel exploration and later synthesis rather than one-shot selection of a single model. remom gives the router a breadth-controlled way to explore multiple reasoning paths and merge them into one final answer.

When to Use

  • One route should coordinate multiple models over several passes.
  • You need a configurable breadth schedule instead of one-step escalation.
  • Intermediate responses should be included or excluded explicitly.
  • Multi-round reasoning with synthesis produces better answers than single-shot.

Known Limitations

  • High token consumption: each round generates multiple responses.
  • Synthesis quality depends on the synthesis template and model capability.
  • Longer latency due to sequential round execution.
  • Requires careful tuning of breadth_schedule to balance quality vs. cost.

Configuration

```yaml
algorithm:
  type: remom
  remom:
    breadth_schedule: [3, 2, 1]            # Parallel calls per round
    model_distribution: weighted           # weighted, equal, or first_only
    temperature: 0.7                       # Temperature for model calls
    include_reasoning: false               # Include reasoning in synthesis
    compaction_strategy: full              # full or last_n_tokens
    compaction_tokens: 1000                # Tokens to keep for last_n_tokens
    synthesis_template: ""                 # Custom synthesis template (optional)
    max_concurrent: 3                      # Max concurrent calls per round
    shuffle_seed: 42                       # Seed for response shuffling
    include_intermediate_responses: false  # Include intermediate responses in output
    max_responses_per_round: null          # Limit responses per round
    on_error: skip                         # skip or fail
```
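The two compaction strategies can be sketched as below. Whitespace tokenization is a simplification for illustration; the router presumably counts real model tokens when applying `compaction_tokens`:

```python
def compact_response(text, strategy="full", compaction_tokens=1000):
    """Sketch of the two compaction strategies. Whitespace splitting stands in
    for real tokenization, which is an assumption of this example."""
    if strategy == "full":
        return text  # keep the entire response
    if strategy == "last_n_tokens":
        tokens = text.split()
        return " ".join(tokens[-compaction_tokens:])  # keep only the tail
    raise ValueError(f"unknown compaction strategy: {strategy}")

print(compact_response("a b c d e"))                                      # a b c d e
print(compact_response("a b c d e", "last_n_tokens", compaction_tokens=3))  # c d e
```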

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `breadth_schedule` | list[int] | required | Parallel calls per round (e.g., `[3, 2, 1]`) |
| `model_distribution` | string | `weighted` | Strategy: `weighted`, `equal`, or `first_only` |
| `temperature` | float | `1.0` | Temperature for model calls |
| `include_reasoning` | bool | `false` | Include reasoning content in synthesis prompts |
| `compaction_strategy` | string | `full` | Strategy: `full` or `last_n_tokens` |
| `compaction_tokens` | int | `1000` | Tokens to keep for `last_n_tokens` compaction |
| `synthesis_template` | string | | Custom synthesis prompt template |
| `max_concurrent` | int | | Maximum concurrent model calls per round |
| `shuffle_seed` | int | `42` | Random seed for response shuffling |
| `include_intermediate_responses` | bool | `true` | Include intermediate responses in output |
| `max_responses_per_round` | int | | Maximum responses to keep per round |
| `on_error` | string | `skip` | Behavior on failure: `skip` or `fail` |