batchflow.research
Research module.
Note
This module requires the multiprocess package (http://multiprocess.rtfd.io/).
Research
- class Research(name='research', domain=None, experiment=None, n_configs=None, n_reps=1, repeat_each=None)
Research is an instrument to run multiple parallel experiments with different combinations of parameters called experiment configs. Configs are produced by domain.Domain (some kind of parameters grid).
- Parameters
name (str, optional) – name (relative path) of the research and corresponding folder to store results, by default ‘research’.
domain (Domain, optional) – grid of parameters (see domain.Domain) to produce experiment configs, by default None.
experiment (Experiment, optional) – description of the experiment (see experiment.Experiment), by default None. An experiment can be passed explicitly as a parameter or constructed by Research methods (add_callable(), add_generator(), etc.).
n_configs (int, optional) – the number of configs to get from domain (see n_items of domain.Domain.set_iter_params()), by default None.
n_reps (int, optional) – the number of repetitions for each config (see n_reps of domain.Domain.set_iter_params()), by default 1.
repeat_each (int, optional) – see repeat_each of domain.Domain.set_iter_params(), by default None.
- attach_env_meta()
Get versions of installed packages (via “pip list” and “conda list”) and the Python version. Results will be stored in the research folder (if it is created) or in the _env attribute.
- attach_git_meta(cwd='.')
Get git repo state (current commit, diff and status). Results will be stored in the research folder (if it is created) or in the _env attribute.
- Parameters
cwd (str, optional) – path to repo, by default ‘.’
- get_devices(devices)
Return a nested list of devices. Each innermost list contains the devices for one branch.
- Parameters
devices (int, str, None or list of them) – devices to split between workers and branches. (see Example below)
- Returns
The first nesting level corresponds to workers. The second to branches. The third is a list of devices for current branch. For example, worker with index 2 and its branch with index 3 will get list of devices devices[2][3].
- Return type
list of lists of lists
Examples
For 3 workers and 2 branches:
None -> [[[None], [None]], [[None], [None]], [[None], [None]]]
1 -> [[['1'], ['1']], [['1'], ['1']], [['1'], ['1']]]
[1, 2] -> [[['1'], ['1']], [['1'], ['2']], [['2'], ['2']]]
[1, 2, 3, 4, 5] -> [[['1'], ['2']], [['3'], ['4']], [['5'], ['1']]]
[0, 1, ..., 12] -> [[['0', '1'], ['2', '3']], [['4', '5'], ['6', '7']], [['8', '9'], ['10', '11']]]
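The splitting rule behind these examples can be pictured with a small pure-Python sketch. It is not batchflow's implementation: the function name split_devices and the explicit n_workers and n_branches arguments are hypothetical, and the rule was inferred only from the examples listed here, so behaviour for other inputs is an assumption.

```python
def split_devices(devices, n_workers, n_branches):
    """Illustrative re-creation of the documented examples (not batchflow code)."""
    total = n_workers * n_branches  # number of (worker, branch) slots to fill
    if devices is None:
        flat = [[None] for _ in range(total)]
    else:
        if isinstance(devices, (int, str)):
            devices = [devices]
        devices = [str(d) for d in devices]
        n = len(devices)
        if n >= total:
            # Enough devices: give each slot an equal share, drop the remainder.
            per_slot = n // total
            flat = [devices[i * per_slot:(i + 1) * per_slot] for i in range(total)]
        else:
            # Too few devices: repeat each one, then wrap around to fill the rest.
            flat = [[d] for d in devices for _ in range(total // n)]
            flat += [[d] for d in devices[:total % n]]
    # First level: workers. Second level: branches. Third level: devices.
    return [flat[w * n_branches:(w + 1) * n_branches] for w in range(n_workers)]


nested = split_devices([1, 2, 3, 4, 5], n_workers=3, n_branches=2)
print(nested[2][1])  # devices for worker 2, branch 1
```

Under this reading, the entry nested[i][j] is the device list for branch j of worker i, which matches the indexing described in the Returns section.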
- run(name=None, workers=1, branches=1, n_iters=None, devices=None, executor_class=<class 'batchflow.research.experiment.Executor'>, dump_results=True, parallel=True, executor_target='threads', loglevel=None, bar=True, detach=False, debug=False, finalize=True, git_meta=False, env_meta=False, seed=None, profile=False, memory_ratio=None, n_gpu_checks=3, gpu_check_delay=5, create_id_prefix=False, redirect_stdout=True, redirect_stderr=True)
Run research.
- Parameters
name (str, optional) – redefine name of the research (if needed), by default None.
workers (int or list of Config instances, optional) – number of parallel workers, by default 1. If int, number of parallel workers to execute experiments. If list of Configs, list of configs for each worker which will be appended to configs from domain. Each element corresponds to one worker.
branches (int or list of Config instances, optional) – number of different branches with different configs with the same root, by default 1. If list of Configs, list of configs for each branch which will be appended to configs from domain. Each element corresponds to one branch.
n_iters (int, optional) – number of experiment iterations, by default None. None means that the experiment will be executed until a StopIteration exception is raised.
devices (str or list, optional) – devices to split between workers and branches, by default None.
executor_class (Executor-inherited class, optional) – executor for experiments, by default Executor.
dump_results (bool, optional) – dump results or not, by default True.
parallel (bool, optional) – execute experiments in parallel in separate processes or not, by default True.
executor_target ('for' or 'threads', optional) – how to execute branches, by default ‘threads’.
loglevel (str, optional) – logging level, by default ‘debug’.
bar (bool or class) – use or not progress bar.
detach (bool, optional) – run research in separate process or not, by default False.
debug (bool, optional) – If False, continue research after exceptions. If True, raise Exception. Can be used only with parallel=False and executor_target=’for’, by default False.
finalize (bool, optional) – continue experiment iteration after exception in some unit or not, by default True.
git_meta (bool, optional) – attach git repo state or not (see Research.attach_git_meta()).
env_meta (bool, optional) – attach env meta or not (see Research.attach_env_meta()).
seed (bool or int or object with a seed sequence attribute) – see make_seed_sequence().
profile (bool, optional) – perform Research profiling or not, by default False.
memory_ratio (float or None, optional) – the ratio of free memory for all devices in worker to start experiment. If None, check will be skipped.
n_gpu_checks (int, optional) – the number of such checks
gpu_check_delay (float, optional) – time in seconds between checks.
create_id_prefix (bool or int, optional) – add prefix to experiment id to allow to sort them by the order of parameters in domain. If int, the number of digits for the parameter code formatting.
redirect_stdout (int or bool, optional) – how to redirect stdout/stderr to files:
  - 0 or False: no redirection.
  - True: redirect to the common research file “stdout.txt”/“stderr.txt” when dump_results=True, or to separate items in research.storage.experiments_stdout when dump_results=False.
  - 1: redirect to the common research file “stdout.txt”/“stderr.txt” (only when dump_results=True).
  - 2: redirect the output streams of experiments into separate files in the experiment folders.
  - 3: redirect to the common file and to separate experiment files (only when dump_results=True).
redirect_stderr (int or bool, optional) – how to redirect stderr to files; the options are the same as for redirect_stdout.
- Returns
Research instance
How it works
At each iteration, all units of the experiment are executed in the order in which they were added. If an update_domain callable is defined, the domain is updated by that function according to the when parameter of update_domain().
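The iteration order described here, together with the n_iters=None behaviour ("run until StopIteration"), can be sketched in plain Python. This is only a simplified picture, not the library's executor: run_units and the (name, callable) unit format are hypothetical, and workers, branches, the when parameter and domain updates are all ignored.

```python
def run_units(units, n_iters=None):
    """Run every unit, in the order it was added, once per iteration.

    units   -- list of (name, callable) pairs (hypothetical format)
    n_iters -- number of iterations, or None to run until StopIteration
    """
    history = []
    iteration = 0
    while n_iters is None or iteration < n_iters:
        outputs = {}
        try:
            for name, unit in units:
                outputs[name] = unit()
        except StopIteration:
            # A unit ran out of data: the experiment is finished.
            break
        history.append(outputs)
        iteration += 1
    return history


batches = iter([10, 20, 30])
units = [
    ("load", lambda: next(batches)),  # raises StopIteration when exhausted
    ("report", lambda: "ok"),
]
history = run_units(units)
print(len(history))
```

With an exhaustible unit such as "load", leaving n_iters as None lets the data source decide when the experiment stops.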
- property results
- property profiler
- property is_finished
Whether all tasks are completed or not.
Domain
- class Domain(domain=None, **kwargs)
Domain of parameters to generate configs for experiments.
- Parameters
domain (dict) – parameter values to try. Each key is a parameter name and each value is a list of parameter values or a batchflow.Sampler.
**kwargs – the same as the domain dict. Using domain is preferable when a parameter name includes symbols like ‘/’.
Note
Domain generates configs of parameters. The simplest example is Domain(a=[1,2,3]). That domain defines parameter ‘a’ and its possible values [1,2,3]. You can iterate over all possible configs (3 configs in our example) and repeat generated configs in the same order several times (see n_reps in set_iter_params()).
Besides, parameter values can be a batchflow.Sampler, e.g. Domain(a=NumpySampler(‘normal’)). In that case values for parameter ‘a’ will be sampled from normal distribution.
Dict in domain definition can consist of several elements, then we will get all possible combinations of parameters, e.g. Domain(a=[1,2], b=[3,4]) will produce four configs. If domain has parameters with array-like values and with sampler as values simultaneously, domain will produce all possible combinations of parameters with array-like values and for each combination values of other parameters will be sampled.
To get configs from Domain use iterator(). It produces configs wrapped by ConfigAlias.
Additional parameters like the number of repetitions or the number of samples for domains with samplers are defined in set_iter_params().
Operations with Domain
sum by +: Concatenate two domains. For example, the resulting domain Domain(a=[1]) + Domain(b=[1]) will produce two configs: {‘a’: 1}, {‘b’: 1} (not one dict with ‘a’ and ‘b’).
multiplication by *: Cartesian product of options in Domain. For example, if domain1 = Domain({‘a’: [1, 2]}), domain2 = Domain({‘b’: [3, 4]}) and domain3 = Domain({‘c’: bf.Sampler(‘n’)}) then domain1 * domain2 * domain3 will have all options and generate 4 configs: {‘a’: 1, ‘b’: 3, ‘c’: xi_1}, {‘a’: 1, ‘b’: 4, ‘c’: xi_2}, {‘a’: 2, ‘b’: 3, ‘c’: xi_3}, {‘a’: 2, ‘b’: 4, ‘c’: xi_4} where xi_i are independent samples from normal distribution. The same resulting domain can be defined as Domain({‘a’: [1, 2], ‘b’: [3, 4], ‘c’: bf.Sampler(‘n’)}).
multiplication by @: element-wise multiplication of array-like options. For example, if domain1 = Domain({‘a’: [1, 2]}) and domain2 = Domain({‘b’: [3, 4]}) then domain1 @ domain2 will have two configs: {‘a’: 1, ‘b’: 3}, {‘a’: 2, ‘b’: 4}.
multiplication with weights: can be used to sample configs from sum of domains. For example, the first ten configs from 0.3 * Domain({‘p1’: NS(‘n’, loc=-10)}) + 0.2 * Domain({‘p2’: NS(‘u’)}) + 0.5 * Domain({‘p3’: NS(‘n’, loc=10)}) will be {‘p1’: -10.3059}, {‘p3’: 8.9959}, {‘p3’: 9.1302}, {‘p3’: 10.2611}, {‘p1’: -7.9388}, {‘p2’: 0.5455}, {‘p1’: -9.2497}, {‘p3’: 9.9769}, {‘p2’: 0.3510}, {‘p3’: 8.8519} (depends on seed).
If you sum options with and without weights, they are split into consecutive groups in which either all options have weights or none do. For each group, configs are generated sequentially (for groups without weights) or sampled as described above (for groups with weights). For example, for domain = domain1 + 1.2 * domain2 + 2.3 * domain3 + domain4 + 1. * domain5 we will get:
all configs from domain1
configs will be sampled from 1.2 * domain2 + 2.3 * domain3
all configs from domain4
configs will be sampled from 1. * domain5
If one of the domains here is a sampler-like domain, then samples from that domain will be generated endlessly.
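The unweighted operations described above (+, * and @) can be pictured as operations on plain lists of config dicts. The sketch below uses no batchflow code: the helper names options, domain_sum, domain_product and domain_elementwise are hypothetical, samplers and weights are left out, and truncating mismatched lengths in the element-wise case is an assumption of this sketch rather than documented behaviour.

```python
from itertools import product


def options(name, values):
    """One parameter with array-like values, as a list of single-key configs."""
    return [{name: value} for value in values]


def domain_sum(first, second):
    """'+': concatenate the configs of two domains."""
    return first + second


def domain_product(first, second):
    """'*': every config of the first domain merged with every config of the second."""
    return [{**a, **b} for a, b in product(first, second)]


def domain_elementwise(first, second):
    """'@': merge configs pairwise (zip stops at the shorter list in this sketch)."""
    return [{**a, **b} for a, b in zip(first, second)]


d1 = options("a", [1, 2])
d2 = options("b", [3, 4])

print(domain_sum(options("a", [1]), options("b", [1])))
print(domain_product(d1, d2))
print(domain_elementwise(d1, d2))
```

The sum keeps ‘a’ and ‘b’ in separate configs, the product yields four merged configs, and the element-wise form yields two, mirroring the examples in the list above.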
- create_aliases(options)
Create aliases by wrapping into Alias class for each key and value of the dict.
- set_iter_params(n_items=None, n_reps=1, repeat_each=None, produced=0, additional=True, create_id_prefix=False, seed=None)
Set parameters for iterator.
- Parameters
n_items (int or None) – the number of configs that will be generated from the domain. If the size of the domain is less than n_items, elements will be repeated. If n_items is None and there is no cube that consists only of sampler options, n_items will be set to the number of configs that can be produced from the domain. If n_items is None and there is a cube that consists only of sampler options, the domain will produce an infinite number of configs.
n_reps (int) – each element will be repeated n_reps times.
repeat_each (int) – if there is no cube that consists only of sampler options, elements will be repeated after producing repeat_each configs. Otherwise repeat_each will be set to the number of configs that can be produced from the domain.
produced (int) – how many configs were produced before (needed after a domain update).
additional (bool) – append ‘repetition’ and ‘updates’ to config or not.
seed (bool or int or object with a seed sequence attribute) – see make_seed_sequence().
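One way to read the interplay of n_items, n_reps and repeat_each for a finite domain is sketched below. This is an interpretation of the parameter descriptions, not code taken from batchflow: iterate_configs is a hypothetical helper, samplers, seeds and the ‘updates’ key are ignored, and the exact ordering produced by the library may differ.

```python
from itertools import cycle, islice


def iterate_configs(configs, n_items=None, n_reps=1, repeat_each=None):
    """Produce configs in blocks of repeat_each, repeating each block n_reps times."""
    n_items = len(configs) if n_items is None else n_items
    # If the domain is smaller than n_items, its elements are repeated.
    items = list(islice(cycle(configs), n_items))
    if not items:
        return []
    repeat_each = len(items) if repeat_each is None else repeat_each
    produced = []
    for start in range(0, len(items), repeat_each):
        block = items[start:start + repeat_each]
        for repetition in range(n_reps):
            # 'repetition' mirrors the auxiliary key mentioned for `additional`.
            produced.extend({**config, "repetition": repetition} for config in block)
    return produced


configs = [{"a": 1}, {"a": 2}, {"a": 3}]
whole_pass = iterate_configs(configs, n_reps=2)
one_by_one = iterate_configs(configs, n_reps=2, repeat_each=1)
print([c["a"] for c in whole_pass])
print([c["a"] for c in one_by_one])
```

In this reading, the default repeat_each repeats the whole domain in the same order, while repeat_each=1 repeats each config immediately after it is first produced.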
- update(generated, research)
Update domain by update_func. If update_func returns None, the domain will not be updated.
- property size
Return the number of configs that will be produced from domain.
- property len
Return the number of configs that will be produced from domain without repetitions. None if infinite.
- property iterator
Get domain iterator.
- option_items(name, values)
Return all possible ConfigAlias instances which can be created from the option.
- Return type
list of ConfigAlias objects.
- option_sample(name, values, size=None)
Return ConfigAlias objects created from a Sampler option.
- Return type
ConfigAlias (if size is None) or list of ConfigAlias objects (otherwise)
Alias
ConfigAlias
- class ConfigAlias(config=None)
Wrapper for Config to infer its aliased version. Each key and value from the initial config will be wrapped with the Alias class (if it is not wrapped already).
- Parameters
config (dict, list of tuple) – each tuple is a pair (key, value), key is Alias or str, value is Alias or object.
Notes
ConfigAlias has two main methods: config and alias. config returns the initial config as a Config instance. alias returns the aliased version of the config or its string representation.
- pop_config(key)
Pop item from ConfigAlias by config value (not by alias).
- Returns
ConfigAlias for popped keys. None if key doesn’t exist.
- Return type
ConfigAlias or None
- pop_alias(key)
Pop item from ConfigAlias by alias (not by value).
- Returns
ConfigAlias for popped keys. None if key doesn’t exist.
- Return type
ConfigAlias or None
Experiment
- class Experiment(instance_creators=None, actions=None, namespaces=None)
Experiment description which consists of lists of instances to create and actions to execute. Each action defines an executable unit (callable or generator) and the corresponding execution parameters. Actions will be executed in the order defined by the list. Actions can be defined as attributes of some instance (e.g., see name of add_callable()).
- Parameters
instance_creators (list, optional) – list of instance_creators, by default None. Can be extended by add_instance().
actions (list, optional) – list of actions, by default None. Can be extended by add_executable_unit() and other methods.
namespaces (list, optional) – list of namespaces, by default None. If None, then global namespace will be added.
- property is_alive
- property is_failed
- add_executable_unit(name, src=None, mode='func', when=1, save_to=None, dump=None, args=None, **kwargs)
Add executable unit to experiment.
- Parameters
name (str) – name of unit to use inside of the research. Can be ‘instance_name.attr’ to refer to instance attr.
src (callable or generator, optional) – callable or generator to wrap into ExecutableUnit, by default None.
mode (str, optional) – type of src (‘func’ or ‘generator’), by default ‘func’
when (int, str or list, optional) – iterations to execute callable (see when of ExecutableUnit), by default 1.
save_to (str or list, optional) – destination to save output of the unit (if needed), by default None.
dump (int, str or list, optional) – iterations to dump results (see when of ExecutableUnit), by default 1.
args (list, optional) – args to execute unit, by default None.
kwargs (dict) – kwargs to execute unit.
- add_callable(name, func=None, args=None, when=1, save_to=None, dump=None, **kwargs)
Add callable to experiment.
- Parameters
name (str) – name of callable to use inside of the research. Can be ‘instance_name.method’ to refer to instance method.
func (callable, optional) – callable to add into experiment, by default None.
args (list, optional) – args to execute callable, by default None.
when (int, str or list, optional) – iterations to execute callable (see when of ExecutableUnit), by default 1.
save_to (str or list, optional) – destination to save output of the callable (if needed), by default None.
dump (int, str or list, optional) – iterations to dump results (see when of ExecutableUnit), by default 1.
root (bool, optional) – whether the unit is the same for all branches or not, by default False.
kwargs (dict) – kwargs to execute callable.
- add_generator(name, generator=None, args=None, **kwargs)
Add generator to experiment.
- Parameters
name (str) – name of generator to use inside of the research. Can be ‘instance_name.method’ to refer to instance method.
generator (generator, optional) – generator to add into experiment, by default None.
args (list, optional) – args to create iterator, by default None.
when (int, str or list, optional) – iterations to get item from generator (see when of ExecutableUnit), by default 1.
save_to (NamedExpression, optional) – destination to save generated item (if needed), by default None.
root (bool, optional) – whether the unit is the same for all branches or not, by default False.
kwargs (dict) – kwargs to create iterator.
- add_instance(name, creator, root=False, **kwargs)
Add instance of some class into research.
- add_pipeline(name, root=None, branch=None, run=False, variables=None, dump=None, when=1, **kwargs)
Add pipeline to experiment.
- Parameters
name (str) – name of pipeline to use inside of the research. Can be ‘instance_name.attribute’ to refer to instance attribute.
root (batchflow.Pipeline, optional) – a pipeline to execute, by default None. It must contain a run action with lazy=True or run_later. Only if branch is None may root contain parameters that are defined by a config from the domain.
branch (Pipeline, optional) – a parallelized pipeline to execute, by default None. Several copies of branch pipeline will be executed in parallel per each batch received from the root pipeline. May contain parameters that can be defined by domain, all branch pipelines will correspond to different experiments and will have different configs from domain.
run (bool, optional) – if False then .next_batch() will be applied to pipeline, else .run() , by default False.
dump (int, str or list, optional) – iterations to dump results (see when of ExecutableUnit), by default 1.
variables (str, list or None, optional) – variables of pipeline to save.
when (int, str or list, optional) – iterations to execute (see when of ExecutableUnit), by default 1.
- save(src, dst, when=1, save_output_dict=False, copy=False)
Save something to research results.
- property only_callables
Check if experiment has only callables.
- property results
- property profile_info
ResearchResults
- class ResearchResults(name, dump_results=True, **kwargs)
Class to collect, load and process research results.
- load(**kwargs)
Load (filtered if needed) results, configs and artifacts paths if they were dumped.
- load_results(experiment_id=None, name=None, iterations=None, config=None, alias=None, domain=None, **kwargs)
Load and filter experiment results.
- Parameters
experiment_id (str or list, optional) – experiments to load, by default None.
name (str or list, optional) – keys of results to load, by default None.
iterations (int or list, optional) – iterations to load, by default None.
config (Config, optional) – config with parameters values to load, by default None.
alias (Config, optional) – the same as config but with aliased values, by default None.
domain (Domain, optional) – domain with parameters values to load, by default None.
kwargs (dict) – used as config. If config is not defined but alias is, kwargs will be concatenated to alias.
- load_artifacts(experiment_id=None, name=None, config=None, alias=None, domain=None, **kwargs)
Load and filter experiment artifacts (all files/folders in experiment folder except standard ‘results’, ‘config.dill’, ‘config.json’, ‘experiment.log’).
- Parameters
experiment_id (str or list, optional) – experiments to load, by default None
name (str or list, optional) – names of artifacts to load into artifacts list, by default None
config (Config, optional) – config with parameters values to load, by default None
alias (Config, optional) – the same as config but with aliased values, by default None
domain (Domain, optional) – domain with parameters values to load, by default None
kwargs (dict) – used as config. If config is not defined but alias is, kwargs will be concatenated to alias.
- filter(experiment_id=None, name=None, iterations=None, config=None, alias=None, domain=None, **kwargs)
Filter experiment_id by specified parameters and convert name, iterations to lists.
- Parameters
experiment_id (str or list, optional) – experiments to load, by default None
name (str or list, optional) – keys of results to load, by default None
iterations (int or list, optional) – iterations to load, by default None
config (Config, optional) – config with parameters values to load, by default None
alias (Config, optional) – the same as config but with aliased values, by default None
domain (Domain, optional) – domain with parameters values to load, by default None
kwargs (dict) – used as config. If config is not defined but alias is, kwargs will be concatenated to alias.
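The idea of selecting experiments by id and by config values passed as keyword arguments can be illustrated without the library. The sketch below is an assumption-laden picture, not the ResearchResults implementation: filter_ids, the experiment ids and the config keys are all hypothetical, and alias and domain filtering are left out.

```python
def filter_ids(configs, experiment_id=None, **kwargs):
    """Return ids of experiments whose config matches every key=value in kwargs.

    configs       -- dict mapping experiment id to its config dict (hypothetical layout)
    experiment_id -- a single id, a list of ids, or None for "all experiments"
    """
    if isinstance(experiment_id, str):
        experiment_id = [experiment_id]
    selected = []
    for exp_id, config in configs.items():
        if experiment_id is not None and exp_id not in experiment_id:
            continue
        if all(key in config and config[key] == value for key, value in kwargs.items()):
            selected.append(exp_id)
    return selected


# Hypothetical experiment ids and configs, for illustration only.
configs = {
    "exp_0": {"lr": 0.1, "layers": 2},
    "exp_1": {"lr": 0.01, "layers": 2},
    "exp_2": {"lr": 0.1, "layers": 4},
}
print(filter_ids(configs, lr=0.1))
print(filter_ids(configs, experiment_id="exp_2", lr=0.1))
```

Keyword arguments play the role of a config here, so lr=0.1 keeps only the experiments that were run with that value.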
- property df
Create pandas.DataFrame from results.
- to_df(pivot=True, include_config=True, use_alias=False, concat_config=False, remove_auxilary=True, drop_columns=True, **kwargs)
Create pandas.DataFrame from filtered results.
- Parameters
pivot (bool, optional) – if True, two columns will be created: name (for results variable) and value. If False, for each variable separate column will be created. By default True
include_config (bool, optional) – include config into dataframe or not, by default True
use_alias (bool, optional) – use alias of config values or not, by default False
concat_config (bool, optional) – create one column for config (it will be concatenated) or create columns for each config parameter, by default False
remove_auxilary (bool, optional) – remove columns ‘repetition’, ‘device’, ‘updates’ or not, by default True
drop_columns (bool, optional) – remove or not separate columns for config parameters when concat_config=True.
- Return type
pandas.DataFrame
- load_iteration_files(path, iterations)
Load files for specified iterations from specified path.
- configs_to_df(use_alias=True, concat_config=False, remove_auxilary=True, drop_columns=True)
Create pandas.DataFrame with configs.
- Parameters
use_alias (bool, optional) – use alias of config values or not, by default True
concat_config (bool, optional) – create one column for config (it will be concatenated) or create columns for each config parameter, by default False
remove_auxilary (bool, optional) – remove columns ‘repetition’, ‘device’, ‘updates’ or not, by default True
drop_columns (bool, optional) – remove or not separate columns for config parameters when concat_config=True.
- Return type
pandas.DataFrame
- artifacts_to_df(include_config=True, use_alias=False, concat_config=False, remove_auxilary=True, drop_columns=True, **kwargs)
Create pandas.DataFrame with experiment artifacts (everything in the experiment folder except standard ‘results’, ‘config.dill’, ‘config.json’, ‘experiment.log’).
- Parameters
use_alias (bool, optional) – use alias of config values or not, by default False
concat_config (bool, optional) – create one column for config (it will be concatenated) or create columns for each config parameter, by default False
remove_auxilary (bool, optional) – remove columns ‘repetition’, ‘device’, ‘updates’ or not, by default True
drop_columns (bool, optional) – remove or not separate columns for config parameters when concat_config=True.
kwargs (dict, optional) – filtering kwargs for load_artifacts().
- Returns
dataframe with the experiment id, the artifact's full path (including the path to the research folder) and its relative path (inside the research folder). It can also include the experiment config.
- Return type
pandas.DataFrame
- filter_ids_by_configs(config=None, alias=None, domain=None, **kwargs)
Filter configs.
- Parameters
repetition (int, optional) – index of the repetition to load, by default None
experiment_id (str or list, optional) – experiment id to load, by default None
configs (dict, optional) – specify keys and corresponding values to load results, by default None
aliases (dict, optional) – the same as configs but specify aliases of parameters, by default None
- Returns
filtered list of configs
Named expressions
Contains named expression classes for Research
- class E(unit=None, all=False, **kwargs)
NamedExpression for Experiment or its unit in Research.
- class EC(name=None, full=False, **kwargs)
NamedExpression for Experiment config.
- class O(name, **kwargs)
NamedExpression for ExecutableUnit output.
- Parameters
name (str) – name of the unit to get output.