Optimization API Reference

SFS Optimization

demestats.fit.fit_sfs.fit(demo, paths: Mapping[Tuple[Any, ...] | Set[Tuple[Any, ...]], float], afs, afs_samples, cons, lb, ub, *, folded=False, method: str = 'trust-constr', sequence_length: float = None, theta: float = None, projection: bool = False, num_projections: float = 200, seed: float = 5, gtol: float = 1e-05, xtol: float = 1e-05, maxiter: int = 1000, barrier_tol: float = 1e-05)[source]

Fit demographic model parameters using SFS likelihood optimization. Main optimization function that estimates demographic parameters by maximizing the likelihood of the observed allele frequency spectrum. Supports both full SFS computation and accelerated projected methods.

Parameters:
  • demo (demes.Graph) – demes model graph.

  • paths (Params) – Parameter paths to optimize. Each path specifies a demographic parameter in the model.

  • afs (array_like) – Observed allele frequency spectrum.

  • afs_samples (dictionary) – Dictionary specifying the number of haploids in each population for the afs

  • cons (dict) – Dictionary containing equality and inequality constraints. Expected keys: ‘eq’ for (Aeq, beq) equality constraints Aeq@x = beq, and ‘ineq’ for (G, h) inequality constraints G@x <= h.

  • lb (array_like) – Lower bounds for parameters.

  • ub (array_like) – Upper bounds for parameters.

  • folded (bool, optional) – Set equal to True if you have a folded SFS, otherwise the default is False.

  • method (str, optional) – Optimization method (default: “trust-constr”).

  • sequence_length (float, optional) – Sequence length. Required for Poisson likelihood when theta is given.

  • theta (float, optional) – Population-scaled mutation rate. If provided, Poisson likelihood is used instead of multinomial.

  • projection (bool, optional) – Whether to use random projections for acceleration (default: False).

  • num_projections (int, optional) – Number of random projections to use if projection=True (default: 200).

  • seed (int, optional) – Random seed for projection matrix generation (default: 5).

  • gtol (float, optional) – Gradient tolerance for convergence (default: 1e-5).

  • xtol (float, optional) – Parameter tolerance for convergence (default: 1e-5).

  • maxiter (int, optional) – Maximum number of iterations (default: 1000).

  • barrier_tol (float, optional) – Barrier tolerance for interior-point methods (default: 1e-5).

Returns:

(params_opt, opt_value, x_opt) where: - params_opt: Dictionary of optimized parameters - opt_value: Optimal negative log-likelihood value - x_opt: Optimized parameter vector

Return type:

tuple

Notes

This function implements a sophisticated optimization pipeline: 1. Parameter space transformation using Hessian-based whitening 2. Constraint handling with equality and inequality constraints 3. Optional random projections for computational efficiency 4. Boundary enforcement with penalty gradients

The optimization is performed in a transformed space where the Hessian is approximately identity, improving convergence rates.

ICR Optimization

demestats.fit.fit_icr.get_tree_from_positions_data_efficient(ts, num_samples=50, gap=150000, k=2, seed=5, option='random')[source]

Extract coalescence-time data from trees sampled at regularly spaced genome positions, together with the corresponding population sampling configurations. This function will return an error if there’s missing data (i.e. where a genomic position doesn’t have a local tree).

Parameters:
  • ts (tskit.TreeSequence) – The input tree sequence from which local trees are sampled.

  • num_samples (int, optional) – The number of sampling replicates to generate when option="random". Default is 50.

  • gap (int or float, optional) – The spacing between successive genomic positions at which trees are queried. Default is 150000.

  • k (int, optional) – The number of sampled nodes. Default is 2.

  • seed (int, optional) – The random seed used to initialize JAX-based sampling. Default is 5.

  • option ({"random", "all"}, optional) – Strategy used to generate samples. With "random", num_samples random samples of size k are drawn without replacement. With "all", every possible combination of k samples is used. If the total number of haploids (N) in the tree is large, do not use “all” as the number of total sampling configurations is N choose k.

Returns:

A pair (data, cfg_list) where:

data is a JAX array containing the extracted coalescence times for each sample across queried genomic positions.

cfg_list is a list of dictionaries giving the per-population sample counts.

Return type:

tuple

Notes

The queried positions are constructed by drawing a random starting position in the first interval of length gap and then stepping across the genome in increments of gap. For each queried position and each sampled tree sequence, the function records the time to first coalescence across the sampled nodes.

When option="random", repeated random samples are drawn. When option="all", all N choose k sample combinations are enumerated, which may be expensive for large ts.num_samples.

::
data, cfgs = get_tree_from_positions_data_efficient(

ts, num_samples=100, gap=100000, k=2, seed=42, option=”random”,

)

data, cfgs = get_tree_from_positions_data_efficient(

ts, gap=50000, k=2, option=”all”,

)

demestats.fit.fit_icr.process_data(cfg_list)[source]

Convert a list of dictionary sampling configurations into a vectorized form for comptability with JAX.

Parameters:

cfg_list (sequence of dict) – A sequence of dictionaries where each dictionary maps deme names to the number of sampled haploids in that configuration.

Returns:

A pair (cfg_mat, deme_names) where:

cfg_mat is a JAX integer array of shape (num_samples, D) containing the sampling counts for each configuration and deme.

deme_names is the ordered collection of deme names corresponding to the columns of cfg_mat.

Return type:

tuple

Notes

The deme ordering is taken from the keys of the first configuration in cfg_list and is used consistently for every row in the output matrix. If a deme is missing from a later configuration, its count is filled with 0.

This function is used for converting a list-based representation of sampling configurations into a compact array form suitable for downstream numerical computation.

::
cfg_mat, deme_names = process_data([

{“P0”: 2, “P1”: 0}, {“P0”: 1, “P1”: 1}, {“P0”: 0, “P1”: 2},

])

demestats.fit.fit_icr.fit(demo, paths: Mapping[Tuple[Any, ...] | Set[Tuple[Any, ...]], float], data, cfg_list, cons, lb, ub, k, *, method: str = 'trust-constr', gtol: float = 1e-08, xtol: float = 1e-08, maxiter: int = 1000, barrier_tol: float = 1e-08)[source]

Fit demographic model parameters using ICR likelihood optimization.

Parameters:
  • demo (demes.Graph) – demes model graph.

  • paths (Params) – Parameter paths to optimize. Each path specifies a demographic parameter in the model.

  • data (array_like) – array containing the extracted coalescence times for each subsample across queried genomic positions

  • cfg_list (list of dictionaries) – list of dictionaries giving the per-population sample counts associated with each sample

  • cons (dict) – Dictionary containing equality and inequality constraints. Expected keys: ‘eq’ for (Aeq, beq) equality constraints Aeq@x = beq, and ‘ineq’ for (G, h) inequality constraints G@x <= h.

  • lb (array_like) – Lower bounds for parameters.

  • ub (array_like) – Upper bounds for parameters.

  • k (int) – Sample size

  • method (str, optional) – Optimization method (default: “trust-constr”).

  • gtol (float, optional) – Gradient tolerance for convergence (default: 1e-5).

  • xtol (float, optional) – Parameter tolerance for convergence (default: 1e-5).

  • maxiter (int, optional) – Maximum number of iterations (default: 1000).

  • barrier_tol (float, optional) – Barrier tolerance for interior-point methods (default: 1e-5).

Returns:

(params_opt, opt_value, x_opt) where: - params_opt: Dictionary of optimized parameters - opt_value: Optimal negative log-likelihood value - x_opt: Optimized parameter vector

Return type:

tuple

Notes

This function implements a sophisticated optimization pipeline: 1. Parameter space transformation using Hessian-based whitening 2. Constraint handling with equality and inequality constraints 3. Optional random projections for computational efficiency 4. Boundary enforcement with penalty gradients

The optimization is performed in a transformed space where the Hessian is approximately identity, improving convergence rates.