Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

🐏 RamParILS

A parallel Rust rewrite of ParamILS — automated algorithm configuration via Iterated Local Search.

Used as the inner tuner in Grackle, a strategy portfolio invention system for automated reasoning solvers.

💡 What it does

Given a target algorithm with configurable parameters, RamParILS searches for the parameter setting that minimises runtime or a numeric solution cost on a set of training instances. It uses FocusedILS by default. RamParILS starts by scoring configurations on a configurable prefix of the training instances and increases that shared fidelity when the incumbent survives a challenge. Neighbours are submitted to a bounded worker pool, and the first fully evaluated improvement is accepted. Results can be stored in a persistent SQLite cache for reuse by compatible tuning runs.

Cache entries are keyed only by the active configuration and instance path. Use a separate cache when the algorithm command, cutoff, objective, solver version, wrapper behavior, or random seed changes. Reusing a cache across incompatible scenarios can silently return stale results.

🚀 Key differences from Ruby ParamILS

Ruby ParamILSRamParILS
EvaluationSequentialParallel over all (neighbor, instance) pairs
CacheIn-memory, per-runPersistent SQLite, shared across runs
Python APIsubprocess callNative extension via PyO3
Non-deterministic algorithms (multiple seeds)SupportedNot (yet) supported

Parallel evaluation is the primary motivation for the rewrite. Actual speedup depends on worker count, neighbourhood width, current fidelity, solver runtimes, early acceptance, and cache hits. The persistent cache compounds this advantage across compatible Grackle tuning runs on overlapping problem sets.

🤝 Acknowledgements

This project is part of DEEPER and supported by the DEEPER grant from Renaissance Philanthropy.

⚙️ Installation

📦 Python extension (pip)

pip install ramparils

🦀 CLI binaries

Clone the repository and install both command-line tools:

git clone https://github.com/deeper4ai/ramparils.git
cd ramparils
cargo install --path . --locked

This installs ramparils and ramparils-db into Cargo’s binary directory, normally ~/.cargo/bin. Make sure that directory is on PATH.

For a repository-local build instead:

cargo build --release --locked
./target/release/ramparils --help
./target/release/ramparils-db --help

Both methods require Rust 1.85+.

🛠️ Python extension (from source)

git clone https://github.com/deeper4ai/ramparils.git
cd ramparils
pip install maturin
maturin develop

maturin develop installs the extension into the active Python environment. Using a virtual environment is recommended.

✅ Verify

Verify the Python package:

python -c "import ramparils; help(ramparils.specialize)"

Verify a Cargo installation:

ramparils --help
ramparils-db --help

The Python package requires Python 3.9 or newer. The command-line tools do not require Python.

⌨️ CLI

ramparils --scenariofile path/to/scenario.yaml

All tuning options — instances, cutoff times, algorithm settings, cache location, debug flags — live in a single YAML scenario file. The CLI has no other flags: every knob is a field in the YAML, making scenarios self-contained, reproducible, and easy to share.


🧭 Scenario file reference

# Required: use instance_file or instances
algo:          "ruby /path/to/solver_wrapper.rb"
paramfile:     "/path/to/solver.params"
instance_file: "data/train.txt"
# instances:   ["data/one.cnf", "data/two.cnf"]
cutoff_time:   5.0
tuner_timeout: 300.0

# Optional — shown with defaults
run_obj:               runtime   # runtime | quality
overall_obj:           mean      # mean | median
approach:              focused   # focused | basic | random
perturbation_strength: 4
initial_fidelity:      1
fidelity_step:         1
bound_multiplier:      10.0
pruning:               true
iterative_deepening:   false
lambda_n:              0.5
lambda_c:              0.5
lambda_t:              0.5
cores:                 0         # 0 = all available
num_run:               0
cache_db:              ":memory:"    # use a file path to persist across runs
debug:                 false
debug_wrapper:         false
debug_solver:          false
debug_log:             ~         # path or null
error_log:             ~         # path or null

🔑 Required fields

FieldTypeDescription
algostringShell command used to invoke the target algorithm. Invoked as <algo> <instance> <cutoff_time> -p1 v1 … via sh -c.
paramfilestringPath to the .params file describing the parameter space (domains, defaults, conditionals, forbidden combinations).
instance_filestringPath to a text file listing training instance paths, one per line. Blank lines and # comments are ignored.
instanceslist of stringsInline training-instance paths. This works in YAML as well as Python. If both instance fields are set, instances takes precedence.
cutoff_timefloatPer-run time limit in seconds. Passed to the target algorithm; the solver wrapper is expected to respect it.
tuner_timeoutfloatTotal wall-clock budget for the tuner in seconds. RamParILS stops launching new evaluations once this is exceeded and returns the best configuration found.

Paths in the scenario, parameter file, and instance list are interpreted from the directory where ramparils is started, not from the scenario file’s directory. Use absolute paths or run from a documented working directory when the scenario must be portable.

🎯 Objective

FieldDefaultDescription
run_objruntimeWhich numeric value to minimise: runtime or quality. For maximisation problems, make the wrapper convert utility to a cost, for example by negating it.
overall_objmeanHow per-run results are aggregated across instances: mean or median. median is more robust to outliers but ignores magnitude.

🔍 Algorithm

FieldDefaultDescription
approachfocusedSearch mode. focused (default) starts at initial_fidelity instances and increases fidelity when the incumbent survives a challenge. basic uses all instances from the start. random currently follows the same full-fidelity ILS path as basic; it is not an independent random-search baseline. See Algorithm.
perturbation_strength4Number of random parameter changes applied during perturbation to escape a local optimum. Larger values jump further in the space; smaller values stay closer to the current local optimum.
initial_fidelity1Initial number of instances used to score each configuration in FocusedILS. Larger values shift worker capacity from speculative neighbor evaluation toward parallel instance evaluation. Values are clamped to the available instance count.
fidelity_step1Number of instances added when FocusedILS increases fidelity after the incumbent survives a challenge. Values of 0 are treated as 1.
bound_multiplier10.0Heuristic capping threshold. A candidate is abandoned when its partial mean exceeds bound_multiplier × incumbent_score. Lower values prune more aggressively and can change search results.
pruningtrueEnable heuristic capping. Disable it when exact uncapped comparisons are required or when partial means are not meaningful for the objective.

FocusedILS uses the first N entries from instance_file, not a random sample. Order the file deliberately or shuffle it before a run when early prefixes should represent the full training set.

📈 Iterative deepening

Runs multiple ILS phases with an exponential schedule. Early phases use fewer instances and a shorter cutoff to rapidly explore the space; later phases refine the best region with the full budget. Useful when the training set is large or cutoff_time is long. See Iterative deepening.

FieldDefaultDescription
iterative_deepeningfalseEnable iterative deepening.
lambda_n0.5Geometric instance-count factor. Each later phase grows toward the full instance set; 0.5 produces approximate doubling.
lambda_c0.5Geometric cutoff factor. Each later phase grows toward cutoff_time; 0.5 produces approximate doubling.
lambda_t0.5Geometric cumulative-deadline factor. Each phase receives the time remaining before its scheduled deadline; 0.5 doubles successive deadlines toward tuner_timeout.

⚙️ Execution

FieldDefaultDescription
cores0Number of parallel worker threads. 0 uses all available CPU cores. Set to a specific number to limit parallelism on shared machines.
cache_db":memory:"Path to the SQLite cache file. Defaults to an in-memory cache (not persisted). Set to a file path to share cached results across runs on the same benchmark.
num_run0Run index, reserved for future use as a random seed. Has no effect currently.

Cache entries are keyed only by the active configuration and instance path. The algorithm command, cutoff, objective, solver version, wrapper behavior, and random seed are not included. Use a separate cache if any of them change; otherwise stale results may be reused without warning.

🩺 Debug

FieldDefaultDescription
debugfalsePrint structured debug output to stderr: new incumbents, scores, and timing.
debug_wrapperfalsePrint one line per solver wrapper invocation (instance, parameters). Verbose; useful for tracing evaluation order.
debug_solverfalsePrint one line per solver result (status, runtime, quality). Verbose; useful for diagnosing wrapper output.
debug_lognullWrite debug output to this file in addition to (or instead of) stderr. Independent of debug — file logging can be active without stderr logging.
error_lognullWrite details of failed solver runs (non-zero exit, missing result line) to this file for post-hoc diagnosis.
test_instance_filenullReserved for future use.

📤 Output

The best configuration found is printed to stdout as -param1 val1 -param2 val2 … (active parameters only, alphabetically sorted):

-alpha 1.256 -ps 0.1 -rho 0.5 -wp 0.03

This format is directly compatible with Grackle’s strategy parsing. Inactive conditional parameters are omitted — they are irrelevant given the active parameter values.

🐍 Python API

import ramparils

The Python extension provides a single function, specialize, that runs FocusedILS from an initial strategy and returns the best configuration found within the time budget. It is a native Rust extension built with PyO3, so there is no subprocess overhead — the ILS loop, parallel evaluation, and SQLite cache all run in-process. All tuning options are passed as fields in the scenario dict, matching the YAML keys documented in the CLI reference.

🐏 specialize

ramparils.specialize(strategy, scenario) -> dict[str, str]

Specialize a strategy on a set of benchmark instances using ILS.

Runs Iterated Local Search starting from strategy, evaluating (configuration, instance) pairs in parallel, and returns the best configuration found within the time budget. Results may be cached in SQLite when cache_db names a file.

Cache entries are keyed only by the active configuration and instance path. Use a separate cache when the algorithm command, cutoff, objective, solver version, wrapper behavior, or random seed changes. Reusing an incompatible cache can silently return stale results.

📥 Arguments

strategydict[str, str]

Initial parameter configuration as {name: value} strings. Must contain every parameter defined in scenario["paramfile"]. Values are strings even for numeric parameters (e.g. "1.189").

scenariodict

Tuning scenario. All keys match the YAML fields in the CLI reference.

Required keys:

KeyTypeDescription
algostrCommand to invoke the target algorithm. Invoked as <algo> <instance> <cutoff_time> -p1 v1 …
paramfilestrPath to the .params file describing the parameter space.
cutoff_timefloatPer-run time limit in seconds, passed to the target algorithm.
tuner_timeoutfloatTotal wall-clock budget for the tuner in seconds.

At least one of the following must be supplied to specify instances:

KeyTypeDescription
instanceslist[str]List of instance paths directly.
instance_filestrPath to a text file with one instance path per line.

If both are present, instances takes precedence. Relative paths are resolved from the Python process’s current working directory, not from the parameter or instance-list file.

Optional keys (all have defaults matching the CLI):

KeyDefaultDescription
run_obj"runtime"Numeric value to minimise: "runtime" or "quality". Convert maximisation objectives to costs in the wrapper.
overall_obj"mean""mean" or "median".
approach"focused""focused", "basic", or "random"; currently random follows the same full-fidelity ILS path as basic.
perturbation_strength4Neighbourhood steps per perturbation.
initial_fidelity1Initial instances per configuration in FocusedILS, capped by the instance count.
fidelity_step1Instances added at each FocusedILS fidelity increase.
bound_multiplier10.0Adaptive capping multiplier.
pruningTrueEnable adaptive capping.
iterative_deepeningFalseEnable iterative deepening.
lambda_n0.5Iterative deepening instance-count factor.
lambda_c0.5Iterative deepening cutoff-time factor.
lambda_t0.5Iterative deepening timeout factor.
cores0Parallel workers; 0 uses all available cores.
num_run0Run index / random seed (reserved).
cache_db":memory:"Path to the SQLite cache. Defaults to in-memory (not persisted). Set to a file path to share results across calls.
debugFalsePrint new incumbents and scores to stderr.
debug_wrapperFalsePrint every solver invocation.
debug_solverFalsePrint every solver result.
debug_logNoneWrite debug output to this file.
error_logNoneWrite failed solver runs to this file.
test_instance_fileNoneReserved for future use.

FocusedILS uses the first N entries from instances or instance_file while fidelity grows. Put a representative ordering in the list; RamParILS does not shuffle it automatically.

📤 Returns

dict[str, str] — The best configuration found. Only active parameters are included (inactive conditional parameters are omitted).

⚠️ Raises

RuntimeError — If the scenario is invalid, the instance list is empty, the paramfile cannot be parsed, or the cache cannot be opened.

🧪 Example

import ramparils

result = ramparils.specialize(
    strategy={
        "alpha": "1.189",
        "rho":   "0.5",
        "ps":    "0.1",
        "wp":    "0.03",
    },
    scenario={
        # Required
        "algo":          "ruby /path/to/saps_wrapper.rb",
        "paramfile":     "/path/to/saps.params",
        "instances":     [
            "/path/to/instances/inst1.cnf",
            "/path/to/instances/inst2.cnf",
        ],
        "cutoff_time":   5.0,
        "tuner_timeout": 120.0,
        # Optional — defaults shown
        "cache_db":      "/tmp/ramparils_cache.db",
        "cores":         0,       # 0 = all available
        "approach":      "focused",
        "debug":         False,
    },
)

print("Best config found:")
for k, v in sorted(result.items()):
    print(f"  {k} = {v}")

See examples/saps_python.py for a runnable example.

🧠 Algorithm

RamParILS implements Iterated Local Search (ILS) for automated algorithm configuration. The goal is to find the parameter setting of a target algorithm that minimises runtime or a numeric quality cost on a set of training instances. Both objective modes are minimisation; wrappers for maximisation problems must transform utility into a cost.


ILS alternates between two phases: local search finds a local optimum in the configuration space, and perturbation escapes it by applying a random walk. Starting from an initial configuration (typically the parameter file’s declared defaults), local search explores the neighbourhood one parameter at a time, greedily accepting any neighbor that improves performance. When no improving neighbor exists, perturbation applies perturbation_strength random steps to escape the local optimum, and the cycle repeats until the tuner timeout expires.

The neighbourhood of a configuration is all configurations that differ in exactly one parameter. For a space with P parameters and average domain size D, each configuration has at most (D−1)×P neighbors. RamParILS submits neighbours to a bounded worker pool and accepts the first fully evaluated improvement. The wall-clock cost still depends on neighbourhood width, fidelity, worker count, cache hits, and solver runtimes.


⚖️ BasicILS vs FocusedILS

The approach field selects the ILS variant.

BasicILS (approach: basic) evaluates candidates on the complete training-instance set before comparing their aggregate scores.

FocusedILS (approach: focused, the default) uses progressive global fidelity. Candidates are compared by their aggregate score on the current prefix of the training-instance list. When the incumbent survives a challenge, the prefix grows by fidelity_step until all instances are used. A challenger must have a strictly lower score at the current fidelity to replace the current configuration.

RamParILS starts FocusedILS at initial_fidelity instances per configuration and increases that global fidelity by fidelity_step, up to the number of available instances. With W workers and fidelity F, the current scheduler can approximately evaluate ceil(W/F) different neighbors at once. Increasing F therefore uses more workers on instances of the same neighbor and reduces speculative work on neighbors that may become irrelevant after the first improving move.

Fidelity always uses the first F entries in the supplied instance list. The list is not shuffled, so its early prefixes should be reasonably representative of the complete training set.

Random (approach: random) currently uses all instances from the start and otherwise follows the same ILS path as basic. It should not currently be interpreted as an independent uniform random-search baseline.


✂️ Adaptive capping

Adaptive capping (controlled by pruning and bound_multiplier) stops evaluating a configuration early if its partial mean exceeds bound_multiplier × incumbent_score. This is a heuristic: later results could lower the final mean, so capping can change which configurations are explored and accepted.

A lower bound_multiplier (e.g., 2.0) prunes more aggressively and speeds up search, but may occasionally discard a configuration that would have been competitive on later instances. The default of 10.0 is conservative for positive cost objectives. Set pruning: false when exact uncapped comparisons are required or when partial means and multiplier bounds are inappropriate for the objective.


📈 Iterative deepening

Iterative deepening (iterative_deepening: true) runs ILS in multiple phases with an exponential schedule rather than a single run. Early phases use a small fraction of the training instances and a short per-run cutoff — just enough to quickly filter the search space and find a good starting point. Later phases gradually increase the instance count, cutoff time, and per-phase budget, refining the best region found so far. The incumbent from each phase seeds the next, so early exploration and late refinement share information.

Three growth factors control the schedule:

FieldControlsEffect of larger value
lambda_ngeometric instance-count growthlarger values use more instances in early phases
lambda_cgeometric cutoff growthlarger values use longer cutoffs in early phases
lambda_tgeometric cumulative-deadline growthlarger values give earlier phases later deadlines

All three default to 0.5, giving an approximate geometric doubling schedule. The timeout values are cumulative deadlines measured from the start of iterative deepening, not independent budgets added together. Each phase gets the time remaining before its deadline.

Iterative deepening is most useful when the training set is large (hundreds of instances) and the cutoff time is long (tens of seconds), making a full-budget single run prohibitively slow at the start.

🎛️ Parameter file format

Parameter files (.params) describe the configuration space: which parameters exist, their domains, defaults, conditional activation, and forbidden combinations.

RamParILS supports the discrete parameter syntax used by the original Ruby ParamILS implementation. Continuous ranges and other ParamILS variants are not supported; enumerate every allowed value explicitly.

🔢 Discrete parameters

name {val1, val2, val3, ...} [default]

Example:

alpha {1.01, 1.066, 1.126, 1.189, 1.256, 1.326, 1.4} [1.189]
rho   {0, 0.17, 0.5, 1}                               [0.5]

Values are always strings. Numeric values are parsed by the target algorithm.

🔀 Conditional parameters

A conditional parameter is only active (included in the command line) when its parent has a specific value:

child {val1, val2} [default] | parent in {allowed1, allowed2}

Example:

noise_type {random, walk} [random]
noise_param {0.0, 0.1, 0.5} [0.1] | noise_type in {walk}

noise_param is only passed to the algorithm when noise_type = walk. Otherwise it is omitted from the command line.

⛔ Forbidden combinations

A forbidden combination prevents specific joint assignments from being evaluated:

{param1=val1, param2=val2, ...}

Example:

{alpha=1.01, rho=0}

Any configuration where alpha=1.01 and rho=0 simultaneously is skipped during search.

💬 Comments

Lines starting with # and trailing #... are ignored:

# This is a comment
alpha {1.01, 1.189} [1.189]   # inline comment

🧩 Full example

# SAPS parameters
alpha {1.01, 1.066, 1.126, 1.189, 1.256, 1.326, 1.4} [1.189]
rho   {0, 0.17, 0.5, 1}                               [0.5]
ps    {0.0, 0.01, 0.05, 0.1, 0.2, 0.5}                [0.1]
wp    {0.0, 0.01, 0.05, 0.1, 0.2}                     [0.03]

# Forbidden: degenerate case
{alpha=1.01, rho=0}

🔌 Solver wrapper protocol

RamParILS communicates with the target algorithm via a simple subprocess protocol.

📥 Invocation

The solver is invoked as a shell command:

<algo> <instance> <cutoff_time> -param1 val1 -param2 val2 …
  • <algo> — the command from the scenario’s algo field (passed to sh -c)
  • <instance> — path to the instance file
  • <cutoff_time> — per-run time limit in seconds
  • -param val pairs — active parameters in alphabetical order

The complete command is passed to sh -c. Scenario files and parameter values must therefore be trusted, and wrappers should avoid paths or values that need shell quoting.

Example:

ruby /path/to/saps_wrapper.rb /data/inst1.cnf 5.0 -alpha 1.189 -rho 0.5 -ps 0.1 -wp 0.03

📤 Output

The solver must print a result line to stdout:

#%# RamParIls #%# <status>, <runtime>, <quality>
FieldValuesDescription
statustextOutcome of the run; stored verbatim in the cache
runtimefloatActual runtime in seconds (capped to cutoff_time)
qualityfloatNumeric cost to minimise when run_obj: quality

The result line may appear anywhere in stdout; other output is ignored. RamParILS stores status for reporting but does not use it when scoring. The wrapper must therefore encode timeouts, crashes, and unsolved cases as an appropriate runtime or quality penalty.

If no valid result line is found, the status becomes UNKNOWN, runtime is set to cutoff_time, and quality is set to 10000000. UNKNOWN results are not written to the persistent cache.

🧪 Examples

Successful run:

#%# RamParIls #%# OK, 1.23, 0.0

Timeout:

#%# RamParIls #%# TIMEOUT, 5.0, 0.0

🦴 Minimal wrapper skeleton

#!/usr/bin/env ruby
instance, cutoff, *params = ARGV

# ... run your solver here ...

puts "#%# RamParIls #%# OK, #{runtime}, 0.0"

📖 Glossary


Active parameter A parameter that is included in the solver invocation for a given configuration. Conditional parameters are only active when their parent parameter has the required value; inactive parameters are omitted from the command line entirely.


Adaptive capping (see also: pruning) A heuristic early-stopping rule that abandons evaluation when a candidate’s partial mean exceeds bound_multiplier × incumbent_score. Later results could lower the final mean, so capping can change the search trajectory. Controlled by the pruning and bound_multiplier scenario fields.


Configuration A complete assignment of values to all parameters in the parameter space. Also called a strategy in the context of solver portfolios. Represented internally as {name → value} string maps.


Conditional parameter A parameter whose domain is only meaningful when a parent parameter has a specific value. Declared in the .params file as child {…} [default] | parent in {val}. Conditional parameters that are inactive are omitted from the solver command line.


Cutoff time (cutoff_time) The per-run time limit in seconds passed to the target algorithm. The solver wrapper is expected to respect this limit and report TIMEOUT if reached. Adaptive capping uses this as the ceiling for individual run runtimes.


Dominance (FocusedILS) In the current implementation, configuration θ₁ dominates θ₂ when it has been evaluated at at least the same fidelity and has a strictly lower aggregate score. Ties do not count as improvements.


Fidelity The number of leading training instances used to score each configuration. FocusedILS starts at initial_fidelity and grows by fidelity_step when the incumbent survives a challenge. Instances are taken in list order and are not shuffled.


Forbidden combination A joint assignment of parameter values that is excluded from the search. Declared in the .params file as {param1=val1, param2=val2}. Any configuration containing a forbidden combination is skipped during neighbourhood exploration.


Incumbent The best configuration found so far during the ILS run. Updated whenever a new configuration has a strictly lower aggregate score at the required fidelity. The final incumbent is returned as the result.


Instance A benchmark problem on which the target algorithm is evaluated. Passed as a file path to the solver wrapper. RamParILS evaluates configurations across the training instance set to estimate generalisation performance.


Iterated Local Search (ILS) The search algorithm at the core of RamParILS. Alternates between local search (greedy improvement within the neighbourhood) and perturbation (random escape from a local optimum). See Algorithm for a full description.


Local optimum A configuration whose entire neighbourhood contains no strictly better configuration. ILS escapes local optima via perturbation rather than accepting them as the final answer.


Neighbourhood The set of all configurations that differ from the current configuration in exactly one parameter value. Local search explores the neighbourhood at each step, evaluating all neighbours in parallel.


Objective What the tuner is trying to optimise. run_obj: runtime minimises the mean (or median) solver runtime across instances; run_obj: quality minimises the mean (or median) numeric cost returned by the solver. See also: overall objective.


Overall objective (overall_obj) How per-instance results are aggregated into a single scalar for comparison. mean is sensitive to all instances including outliers; median is more robust but ignores magnitude differences.


Parameter space The set of all configurations defined by the .params file: parameter names, discrete domains, defaults, conditional activations, and forbidden combinations. RamParILS searches this space to find a good configuration.


Perturbation A random walk of perturbation_strength steps applied to the current local optimum to escape it and seed the next local search. Each step randomly changes one parameter to a uniformly sampled value from its domain. Larger values jump further in the space.


Pruning (see also: adaptive capping) Shorthand for adaptive capping: heuristic early termination when a candidate’s partial mean exceeds its configured bound. Enabled by default (pruning: true).


Run objective (run_obj) What a single solver invocation measures: runtime (wall-clock seconds) or quality (a scalar cost returned by the solver). Both objectives are minimised.


Solver wrapper A script or executable that invokes the target algorithm with a given instance and parameter setting, then prints a result line in RamParILS format. See Solver protocol for the exact interface.


Strategy Synonym for configuration, commonly used in the context of automated reasoning solver portfolios (e.g., Grackle). A strategy is a complete parameter setting that defines the solver’s behaviour.


Strategy hash A compact fingerprint (64-bit integer) of the active configuration used as part of the cache key. It is computed from the sorted active param=value pairs, so inactive conditional values do not create duplicate cache entries. The hash is not guaranteed to remain portable across Rust versions.


Tuner timeout (tuner_timeout) The total wall-clock budget for the RamParILS run in seconds. Once elapsed, no new evaluations are started and the incumbent is returned. Distinct from cutoff_time, which limits individual solver runs.