Model Builder Class

class starlord.ModelBuilder

Builds and fits a Bayesian model to the given specification.

Variables are defined implicitly: if you use a variable, the ModelBuilder declares it for you based on its category, which is determined by a prefix (e.g. p.foo is a parameter named foo). The categories of variable are:

Parameters:: p.[name], these are model parameters to be sampled from.
Constants:: c.[name], these are set when the sampler is run and don’t change.
Local Variables:: l.[name], these are calculated for each log-likelihood call but not recorded.
Grid Variables:: d.[grid_name].[output_name], these indicate the grid should be interpolated to get the value, which will often result in more parameters being implicitly defined.
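
As a rough illustration of the prefix convention above, the following sketch groups the variables found in an expression by category. The regular expression and the categorize helper are hypothetical and written only for this example; they are not Starlord's own parsing code, and d.mist.logG is just a placeholder grid variable.

```python
import re

# Hypothetical pattern for illustration only; it is not Starlord's own regex.
PREFIXES = {"p": "parameter", "c": "constant", "l": "local", "d": "grid"}
VAR_PATTERN = re.compile(r"\b([pcld])\.([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)?)")


def categorize(expr: str) -> dict[str, set[str]]:
    """Group the prefixed variable names found in an expression by category."""
    found = {category: set() for category in PREFIXES.values()}
    for prefix, name in VAR_PATTERN.findall(expr):
        found[PREFIXES[prefix]].add(name)
    return found


result = categorize("l.logm = math.log10(p.mass) + c.offset + d.mist.logG")
print(result)
```

Note that math.log10 is not picked up as a variable, because the character before its dot is not one of the four category prefixes.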

Typically, you initialize the builder (there are no significant options at init) and use constraint(), assign(), and sometimes expression() to define the model’s likelihood. Then you can look at how the model is set up with summary(); using grid variables often automatically defines new parameters for their inputs. If you don’t like the default inputs, you can override them with override_mapping(). Finally, you must define priors with prior() before you can get a sampler for the model with build_sampler().

Distributions (for priors and likelihoods) are specified from the following list, with the number of expected parameters in parentheses:

Normal (2):: The common normal distribution with a mean and standard deviation.
Uniform (2):: A uniform distribution with a lower and upper bound.
Beta (2):: A Beta distribution parameterized by \(\alpha\) and \(\beta\).
Gamma (2):: A Gamma distribution parameterized by \(\alpha\) and \(\beta\). Note that unlike scipy.stats.gamma, this is in terms of the shape and rate rather than shape and scale \(\theta = 1/\beta\).
Exponential (1):: The exponential distribution parameterized by a rate parameter \(\lambda\).
Trunc_power (3):: A power law distribution with power \(k\) and lower and upper bounds \(a\) and \(b\).
Trunc_normal (4):: A normal distribution with a mean and standard deviation as well as lower and upper bounds \(a\) and \(b\).
Trunc_exponential (3):: The exponential distribution parameterized by a rate parameter \(\lambda\), and lower and upper bounds \(a\) and \(b\).
Chabrier (4):: A piecewise prior for stellar log-masses based on a log-normal and power-law distribution from Chabrier (2002). It has four parameters: the log-normal to power-law boundary point, the log-normal mean and sigma, and the rate of the power-law component, where rate = (1+power)*ln(10).
Chabrier_disk (0):: The Chabrier prior using the parameters for disk and young cluster stars from Chabrier (2002), table 2.
Chabrier_globular (0):: The Chabrier prior using the parameters for globular cluster stars from Chabrier (2002), table 2.
Chabrier_spheroid (0):: The Chabrier prior using the parameters for spheroid stars from Chabrier (2002), table 2.
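
The shape/rate convention for the Gamma distribution is easy to mix up with the shape/scale convention, so here is a minimal sketch of the relationship using only the standard library. The two helper functions are hypothetical and written for this example; they are not part of Starlord.

```python
import math


def gamma_logpdf_rate(x: float, alpha: float, beta: float) -> float:
    """Log-density of Gamma(shape=alpha, rate=beta) at x > 0."""
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + (alpha - 1.0) * math.log(x) - beta * x)


def gamma_logpdf_scale(x: float, alpha: float, theta: float) -> float:
    """Log-density of Gamma(shape=alpha, scale=theta) at x > 0."""
    return (-alpha * math.log(theta) - math.lgamma(alpha)
            + (alpha - 1.0) * math.log(x) - x / theta)


alpha, beta = 3.0, 2.0
x = 1.5
rate_form = gamma_logpdf_rate(x, alpha, beta)
scale_form = gamma_logpdf_scale(x, alpha, 1.0 / beta)  # theta = 1 / beta
mean = alpha / beta  # mean of the distribution in the rate convention
```

Both forms give the same log-density once the scale is set to 1/beta. With SciPy, the corresponding frozen distribution would be scipy.stats.gamma(a=alpha, scale=1/beta).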

__init__(verbose=False, fancy_text=True)

Parameters:

verbose (bool) – If True, print extra debugging info
fancy_text (bool) – If True, color and style terminal output text

assign(var, expr)

Adds a likelihood component that sets a local variable to the given expression.

Parameters:

var (str) – The variable to be assigned (e.g. l.varname)
expr (str) – The value or expression to set the variable to (e.g. math.log10(p.mass))

Return type:

None

Example

Suppose a grid named foo outputs bar, but what you have measured is sqrt(bar). Rather than propagating uncertainties (an approximation), you could instead use:

builder.assign("l.sqrt_bar", "math.sqrt(foo.bar)")
builder.constraint("l.sqrt_bar", "normal", ["c.sqrt_bar_mu", "c.sqrt_bar_sigma"])

Grid names are resolved in expr as usual. The mean and uncertainty are written here as (arbitrarily named) constants to set later, but you can use literals instead if you want to.
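
To see why propagating the uncertainty is only an approximation, the sketch below compares the residual (in units of sigma) from constraining sqrt(bar) directly with the residual from a first-order conversion of the measurement to bar. The numbers and helper names are made up for illustration and do not come from Starlord.

```python
import math


def z_exact(bar: float, s_mu: float, s_sigma: float) -> float:
    """Residual in sigma units when the constraint is applied to sqrt(bar)."""
    return (math.sqrt(bar) - s_mu) / s_sigma


def z_linearized(bar: float, s_mu: float, s_sigma: float) -> float:
    """Residual when the measurement is first converted to bar at first order."""
    bar_mu = s_mu ** 2
    bar_sigma = 2.0 * s_mu * s_sigma  # |d(s^2)/ds| * sigma_s, evaluated at s_mu
    return (bar - bar_mu) / bar_sigma


s_mu, s_sigma = 3.0, 0.5                 # hypothetical measurement of sqrt(bar)
z_a = z_exact(16.0, s_mu, s_sigma)       # (4 - 3) / 0.5
z_b = z_linearized(16.0, s_mu, s_sigma)  # (16 - 9) / 3
```

The two residuals differ (2.0 versus about 2.33 here), which is the discrepancy that constraining the local variable directly avoids.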

build_sampler(sampler_type, constants={}, **init_args)

Construct an MCMC sampler for the model.

Parameters:

sampler_type (str) – selects the sampler; must be “dynesty” or “emcee”
constants (dict) – a dict of constant names and the values they should take

Returns:

A properly-initialized SamplerNested if sampler_type is “dynesty” or a SamplerEnsemble if it is “emcee”

Raises:

KeyError – if a required constant was not provided in constants
ValueError – if the sampler_type was not one of “dynesty” or “emcee”

constraint(var, dist, params)

Adds a constraint term to the log-likelihood for the given distribution and variable.

Parameters:

var (str) – The variable to which the distribution applies
dist (str) – The distribution to be used; usually normal or uniform, but the full list may be found in starlord.ModelBuilder.
params (list[str | float]) – The parameters of the distribution.

Return type:

None

Example

Suppose you are fitting a stellar model and the 2MASS_H magnitude is 6.5 +/- 0.05. If you’re using the MIST grid, you could add this constraint to the model with:

builder.constraint("mist.2MASS_H", "normal", [6.5, 0.05])

expression(expr)

Directly insert an expression into the generated code.

Starlord will identify any variables assigned or used within to ensure the code is sorted properly by dependency (see ModelBuilder docstring). Most of the time you can use assign() or constraint() instead, but this gives you the flexibility to add more complicated log-likelihood calculations. For now this is the only way to implement a for loop.

Parameters:: expr (str) – The expression to be inserted into the code, as a str.
Return type:: None
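
Sorting code by dependency can be pictured as a topological sort over the variables each statement assigns and reads. The sketch below uses the standard-library graphlib module on a hypothetical set of statements; it shows the general idea only and is not Starlord's actual implementation.

```python
from graphlib import TopologicalSorter

# Hypothetical statements: each assigned variable maps to the variables it reads.
statements = {
    "l.total": {"l.a", "l.b"},
    "l.b": {"l.a", "p.y"},
    "l.a": {"p.x"},
}


def order_statements(deps: dict[str, set[str]]) -> list[str]:
    """Return assigned variables so that each comes after everything it reads."""
    # Parameters and constants are inputs, so only assigned variables form edges.
    graph = {var: {d for d in reads if d in deps} for var, reads in deps.items()}
    return list(TopologicalSorter(graph).static_order())


order = order_statements(statements)
print(order)  # ['l.a', 'l.b', 'l.total']
```

If two statements depended on each other, TopologicalSorter would raise graphlib.CycleError rather than return an order.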

generate_code()

Generates the code for the model.

Returns:: A string containing the generated Cython code.
Raises:: AssertionError – if one of the various consistency checks fails.
Return type:: str

static is_valid_param(var)

Parameters:: var (str)
Return type:: bool

override_mapping(key, value)

Sets the value or symbol to use for a deferred variable, often a grid variable.

This can be used to fix grid axes to a particular value, or make them depend on some additional grid output or calculation. Grid inputs are set by default according to their entry in the input_mappings grid metadata. If there is no entry, they default to being a parameter named “p.{input_name}”.

Parameters:

key (str) – The deferred variable key, e.g. “d.grid.output_var” or “d.nongrid_var”.
value (str) – What to set the variable to wherever it appears.

Examples

Suppose you are fitting a stellar model and wish to lock the metallicity to solar. If you’re using the mist grid, you could do this with:

builder.override_mapping("mist.feh", "0")

In the same circumstance, if you wanted to set logG to 2% higher than what the evolution tracks output (as a sensitivity test, perhaps), you could use:

builder.override_mapping("mist.logG", "1.02*d.mistTracks.logG")

Note that this uses another grid via a deferred variable. Starlord detects this via the “d.” prefix. In fact, the default input refers to d.mistTracks.logG already.
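
The order of precedence described above (an explicit override, then the grid's input_mappings metadata, then a default parameter) can be sketched as a small lookup. The resolve_input function, its key format, and the some_input name are assumptions made for this example and are not Starlord's internals.

```python
def resolve_input(grid: str, input_name: str,
                  input_mappings: dict[str, str],
                  overrides: dict[str, str]) -> str:
    """Pick the symbol for a grid input: override, then metadata, then a parameter."""
    key = f"d.{grid}.{input_name}"  # assumed key format, for illustration only
    if key in overrides:
        return overrides[key]
    if input_name in input_mappings:
        return input_mappings[input_name]
    return f"p.{input_name}"


mappings = {"logG": "d.mistTracks.logG"}  # default mapping from grid metadata
overrides = {"d.mist.feh": "0"}           # metallicity locked to solar

feh = resolve_input("mist", "feh", mappings, overrides)
logg = resolve_input("mist", "logG", mappings, overrides)
other = resolve_input("mist", "some_input", mappings, overrides)
```

Here feh resolves to the literal "0", logG falls back to the metadata entry, and the hypothetical some_input becomes the parameter p.some_input.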

prior(param, dist, params)

Sets the prior for a model parameter. All parameters must have a prior.

Parameters:

param (str) – The name of the parameter to set, e.g. p.some_param
dist (str) – The distribution to be used; usually normal or uniform, but the full list may be found in starlord.ModelBuilder.
params (list[str | float]) – The parameters of the distribution.

Return type:

None

set_from_dict(model)

Load model description from a dict following the TOML input spec.

Parameters:: model (dict) – The model dict to be loaded. It should only have the keys ‘expr’, ‘var’, ‘prior’, ‘override’, ‘options’ or the name of a grid.
Return type:: None

Example

Loading the model from a TOML file to be used within the Python API:

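with open("mymodel.toml", "rb") as f:  # tomllib.load needs a binary file object
    model = tomllib.load(f)["model"]
builder = ModelBuilder()
builder.set_from_dict(model)  # returns None, so it cannot be chained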

summary()

Generates a summary of the model currently defined.

The model does not need to be finalized for the summary to be generated, so it may help to check it periodically as you build the model.

Returns:: The model summary.
Return type:: str

validate_constants(constants, print_summary=False)

Check that the constants provided match those that were expected.

Parameters:

constants (dict) – a dict of the constant names and values (without the ‘c.’) to test.
print_summary (bool) – if True, print a list of the constants and values, noting extra or missing constants.

Returns:

A set() of any missing constant names
A set() of any extra constant names that weren’t expected

Return type:

Tuple[set[str], set[str]]
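
The missing/extra pair returned here amounts to two set differences. A minimal stand-alone sketch of that logic follows; check_constants and the constant names are hypothetical, and this is not the library's own code.

```python
def check_constants(expected: set[str], provided: dict) -> tuple[set[str], set[str]]:
    """Return (missing, extra) constant names, given names without the 'c.' prefix."""
    given = set(provided)
    missing = expected - given  # required by the model but not supplied
    extra = given - expected    # supplied but never used by the model
    return missing, extra


expected = {"mu", "sigma"}
provided = {"mu": 6.5, "typo": 0.05}
missing, extra = check_constants(expected, provided)
```

In this example sigma is reported as missing and typo as extra, which is the kind of mismatch worth catching before calling build_sampler().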

property code_generator: CodeGenerator

distribution_name = re.compile('[a-zA-z_]+')

outname_regex = re.compile('(l|d.[a-zA-Z]\\w+).[a-zA-Z1-9]\\w*(--[a-zA-Z0-9]+)?')

overridable_regex = re.compile('(?:([a-zA-Z]\\w+)__)?([a-zA-Z]\\w+)(?:--([a-zA-Z0-9]+))?')

property txt: _TextFormatCodes_

varname_regex = re.compile('([pcld]).([a-zA-Z1-9]\\w*)(?:--([a-zA-Z0-9]+))?')
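
To make the variable-name pattern easier to read, the sketch below compiles the same pattern as varname_regex (as printed above) and shows which groups it captures. The test strings are made up for illustration; the meaning Starlord attaches to the optional “--” suffix is not covered here.

```python
import re

# Same pattern as varname_regex above, written as a raw string.
varname_regex = re.compile(r"([pcld]).([a-zA-Z1-9]\w*)(?:--([a-zA-Z0-9]+))?")

plain = varname_regex.fullmatch("p.mass").groups()
suffixed = varname_regex.fullmatch("c.sigma--H").groups()

# As printed, the separator is an unescaped ".", so it matches any character.
loose = varname_regex.fullmatch("p_mass")

# The first character class is 1-9, so a name starting with 0 does not match.
leading_zero = varname_regex.fullmatch("p.0x")

print(plain, suffixed, loose is not None, leading_zero)
```

The groups are the category prefix, the variable name, and the optional suffix (None when absent).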