nbtools

Linting and execution

Functions for running Jupyter Notebooks programmatically.

nbtools.exec_notebook.exec_notebook(path, inputs=None, outputs=None, inputs_pos=1, replace_inputs_pos=False, display_inputs=False, display_outputs=False, working_dir='./', execute_kwargs=None, out_path_db=None, out_path_ipynb=None, out_path_html=None, remove_db='always', add_timestamp=True, hide_code_cells=False, display_links=True, raise_exception=False, return_notebook=False, _output_queue=None)

Execute a Jupyter Notebook programmatically. Heavily inspired by https://github.com/tritemio/nbrun.

Intended as an analog of exec, with a way to inject variables into the execution and extract variables from it. For details, see the inputs and outputs parameters. The executed notebook is optionally saved to disk as an .ipynb and / or .html file; we strongly recommend always doing so.

The raise_exception flag defines the behavior if the execution of the notebook fails due to an exception.

Under the hood, this function does the following:

1. Create an internal database to communicate variables (both inputs and outputs). Save inputs to it.
2. Add a cell for reading inputs from the internal database and a cell for saving outputs to it.
3. Execute the notebook.
4. Handle exceptions.
5. Read outputs from the database.
6. Add a timestamp cell to the notebook, if needed.
7. Save the executed notebook as .ipynb and / or .html.
8. Return a dictionary with intermediate results, execution info and values of the output variables.

If there are no inputs or outputs, a database is not created and additional cells are not inserted. Note that if either of them is provided, then one of out_path_ipynb or out_path_db must be given explicitly.
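The variable exchange described above can be pictured with the standard-library shelve module, which the parameter descriptions name as the backing store. This is only a minimal sketch under stated assumptions, not the nbtools implementation: the helpers save_inputs and load_variables are hypothetical, and the cells that nbtools actually inserts may look different.

```python
import os
import shelve
import tempfile


def save_inputs(db_path, inputs):
    """Store each input variable under its name (hypothetical helper)."""
    for name in inputs:
        if not name.isidentifier():
            raise ValueError(f"{name!r} is not a valid Python identifier")
    with shelve.open(db_path) as db:
        db.update(inputs)


def load_variables(db_path, names=None):
    """Read variables back; names that were never stored are skipped silently."""
    with shelve.open(db_path) as db:
        keys = list(db.keys()) if names is None else names
        return {key: db[key] for key in keys if key in db}


with tempfile.TemporaryDirectory() as tmp_dir:
    # Path without a file extension, in the spirit of out_path_db.
    db_path = os.path.join(tmp_dir, "notebook_db")

    # Caller side: shelve pickles the values, so they must be serializable.
    save_inputs(db_path, {"n_epochs": 3, "learning_rate": 0.01})

    # Notebook side: an inputs cell could read everything back like this.
    restored = load_variables(db_path)

    # Asking for a name that does not exist raises no error.
    partial = load_variables(db_path, names=["n_epochs", "missing_name"])

print(restored)
print(partial)
```

The same database can carry values in both directions, which is why the values on either side need to be picklable.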

Parameters:

path (str) – Path to the notebook to execute.
inputs (dict, optional) – Inputs for execution are essentially equivalent to notebook globals. Must be a dictionary mapping variable names to their values, so keys must be valid Python identifiers. Under the hood, inputs are saved to a database and loaded in the notebook by a separate cell inserted at the inputs_pos position; values must therefore be serializable.
outputs (str or iterable of str, optional) – The list of variable names to return from the notebook. Extracted from the notebook in a separate cell, which is inserted at the last position. Note, if some of the variables don’t exist, no errors are raised.
inputs_pos (int, optional) – Position to insert the cell with inputs loading into the notebook.
replace_inputs_pos (bool, optional) – Whether to replace the code cell at inputs_pos with the inputs cell instead of inserting a new one.
display_inputs (bool, optional) – Whether to display inputs or not. Under the hood, inputs are provided using a shelve database. If display_inputs=True, variables are written into the cell in the form input_name = input_value, instead of code that loads them from shelve.
display_outputs (bool, optional) – Whether to display outputs or not. Under the hood, outputs are saved using a shelve database. If display_outputs=True, variables are shown in the last cell in the form print(output_name), instead of code that dumps them into the database.
working_dir (str) – The working directory in which the kernel is started.
out_path_db (str, optional) – Path to save the internal database files (without file extension). If not provided, then it is inferred from out_path_ipynb.
out_path_ipynb (str, optional) – Path to save the output ipynb file.
out_path_html (str, optional) – Path to save the output html file.
remove_db (str, optional) –
Whether to remove the internal database after notebook execution. Possible options are:
- 'always': remove the database after notebook execution
- 'not_failed_case': remove the database if there wasn’t any execution failure
- 'never': don’t remove the database after notebook execution
Running exec_notebook() with the 'not_failed_case' or 'never' option helps to reproduce failures in the out_path_ipynb notebook: it will take the inputs from the saved shelve database. Note that the database exists only if inputs and / or outputs are provided.
execute_kwargs (dict, optional) – Parameters of nbconvert.preprocessors.ExecutePreprocessor. For example, you can provide timeout, kernel_name, resources (such as metadata) and other nbclient.client.NotebookClient arguments.
add_timestamp (bool, optional) – Whether to add a cell with execution information at the beginning of the saved notebook.
hide_code_cells (bool, optional) – Whether to hide the code cells in the saved notebook.
display_links (bool, optional) – Whether to display links to the executed notebook and html at execution.
raise_exception (bool, optional) – Whether to re-raise exceptions from the notebook.
return_notebook (bool, optional) – Whether to return the notebook object from this function.
_output_queue (None) – Placeholder for the run_in_process() decorator to return this function result.

Returns:

exec_res – Dictionary with the notebook execution results. It provides the following information:

'failed' (bool) – Whether the notebook execution failed.
'outputs' (dict) – Saved notebook local variables. Not present in exec_res if the outputs argument is None.
'failed cell number' (int) – Execution number of the cell that raised the error (if the notebook failed).
'traceback' (str) – Traceback message from the notebook (if the notebook failed).
'notebook' (nbformat.notebooknode.NotebookNode, optional) – Executed notebook object. Provided only if return_notebook is True.

Return type:

dict
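A caller will usually branch on 'failed' before touching the other keys. The sketch below works on plain dictionaries shaped like the documented exec_res. The summarize_exec_res helper and both sample dictionaries are hypothetical, written by hand for illustration rather than taken from a real run.

```python
def summarize_exec_res(exec_res):
    """Turn an exec_res-like dict into a one-line status (hypothetical helper)."""
    if exec_res["failed"]:
        cell = exec_res.get("failed cell number")
        traceback = (exec_res.get("traceback") or "").strip()
        last_line = traceback.splitlines()[-1] if traceback else "no traceback"
        return f"failed in cell {cell}: {last_line}"

    # 'outputs' is absent when no outputs were requested, so use .get().
    outputs = exec_res.get("outputs", {})
    names = ", ".join(sorted(outputs)) or "none"
    return f"succeeded, outputs: {names}"


# Hand-written sample dictionaries, not results of a real execution.
ok_res = {"failed": False, "outputs": {"accuracy": 0.93, "loss": 0.21}}
bad_res = {
    "failed": True,
    "failed cell number": 4,
    "traceback": "Traceback (most recent call last):\n  ...\nZeroDivisionError: division by zero",
}

print(summarize_exec_res(ok_res))   # succeeded, outputs: accuracy, loss
print(summarize_exec_res(bad_res))  # failed in cell 4: ZeroDivisionError: division by zero
```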

nbtools.exec_notebook.extract_traceback(notebook)

Extracts information about an error from the notebook.

Parameters:

notebook (nbformat.notebooknode.NotebookNode) – Executed notebook to search for an error traceback.

Returns:

Tuple of three elements:

bool
Whether the executed notebook has an error traceback.

int or None
Number of the cell with a traceback. If None, the notebook doesn’t contain an error traceback.

str
Error traceback, if one exists.

Return type:

tuple
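To make the three-element result concrete, here is a sketch of a similar search over a notebook given as a plain dictionary in the nbformat v4 layout (a NotebookNode is dict-like, so the same lookups apply). It is not the library's code: the find_error name is hypothetical, and reporting the cell's execution_count and returning an empty string when nothing failed are assumptions.

```python
def find_error(notebook):
    """Return (failed, cell_number, traceback) for a notebook-like dict.

    Assumes the nbformat v4 layout: code cells carry an 'outputs' list, and a
    failed cell has an output whose output_type is 'error'.
    """
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                # The traceback is stored as a list of lines.
                traceback = "\n".join(output.get("traceback", []))
                return True, cell.get("execution_count"), traceback
    return False, None, ""


# Hand-written sample notebook, not produced by a real kernel.
sample_notebook = {
    "cells": [
        {"cell_type": "markdown", "source": "# Demo"},
        {"cell_type": "code", "execution_count": 1, "source": "x = 1", "outputs": []},
        {
            "cell_type": "code",
            "execution_count": 2,
            "source": "1 / 0",
            "outputs": [
                {
                    "output_type": "error",
                    "ename": "ZeroDivisionError",
                    "evalue": "division by zero",
                    "traceback": [
                        "Traceback (most recent call last):",
                        "ZeroDivisionError: division by zero",
                    ],
                }
            ],
        },
    ]
}

failed, cell_number, traceback = find_error(sample_notebook)
print(failed, cell_number)
print(traceback)
```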

nbtools.exec_notebook.run_in_process(func): Decorator that runs func in a separate process so that all related processes are terminated properly.

Functions for code quality control of Jupyter Notebooks.

nbtools.pylint_notebook.pylint_notebook(path=None, options=(), config=None, disable=(), enable=(), printer=print, remove_files=True, return_info=False, **pylint_params)

Execute pylint for a provided Jupyter Notebook.

Under the hood, roughly does the following:

1. Creates a .pylintrc file next to the path, if needed.
2. Converts the notebook to a .py file next to the path.
3. Runs pylint with additional options.
4. Creates a report and displays it, if needed.
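The conversion step can be approximated with the standard library alone, since an .ipynb file is JSON. The sketch below is an illustration under assumptions, not what nbtools does internally: notebook_to_code is a hypothetical helper, and commenting out IPython magics and shell escapes is one simple way to keep the result valid Python.

```python
import json


def notebook_to_code(notebook_json):
    """Concatenate the code cells of a notebook given as a JSON string."""
    notebook = json.loads(notebook_json)
    chunks = []
    for number, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # on disk, source is often a list of lines
            source = "".join(source)
        lines = []
        for line in source.splitlines():
            # Magics (%...) and shell escapes (!...) are not valid Python.
            if line.lstrip().startswith(("%", "!")):
                line = "# " + line
            lines.append(line)
        chunks.append(f"# --- cell {number} ---\n" + "\n".join(lines))
    return "\n\n".join(chunks) + "\n"


# Hand-written sample notebook serialized to JSON.
sample = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["%matplotlib inline\n", "import math\n", "x = math.sqrt(16)"]},
        {"cell_type": "code", "source": "print(x)"},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
})

code = notebook_to_code(sample)
print(code)
```

Keeping a marker comment per cell makes it easier to map a pylint message on a line of the .py file back to a notebook cell.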

Parameters:

path (str, optional) – Path to the Jupyter notebook. If not provided, the current notebook is used.
options (sequence) – Additional options for pylint execution.
config (str, None) – Path to a pylint config in the .pylintrc format. Note, if config is not None, then disable and enable are not used.
printer (callable or None) – Function to display the report.
remove_files (bool) – Whether to remove .pylintrc and .py files after the execution.
return_info (bool) – Whether to return a dictionary with intermediate results. It contains the notebook code string, as well as pylint stdout and stderr.
disable (sequence) – Which checks to disable. Each element should be either a code or a name of the check.
enable (sequence) – Which checks to enable. Each element should be either a code or a name of the check. Has priority over disable.
max_line_length (int) – Allowed line length.
pylint_params (dict) – Additional linting parameters. Each is converted to a separate valid entry in the .pylintrc file.
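As a rough picture of how disable, enable and a keyword such as max_line_length could end up in a .pylintrc file, consider the sketch below. The make_pylintrc helper is hypothetical and the exact file nbtools writes is not shown in this reference. The section and option names (MESSAGES CONTROL, FORMAT, max-line-length) are standard pylint ones, and the sketch leaves the resolution between enable and disable to pylint itself.

```python
def make_pylintrc(disable=(), enable=(), max_line_length=None):
    """Build the text of a small .pylintrc file (hypothetical helper)."""
    lines = ["[MESSAGES CONTROL]"]
    if disable:
        # Checks can be given by code (C0114) or by name (invalid-name).
        lines.append("disable=" + ",".join(disable))
    if enable:
        lines.append("enable=" + ",".join(enable))
    if max_line_length is not None:
        # Python-style keyword -> pylint option name with dashes.
        lines += ["", "[FORMAT]", f"max-line-length={max_line_length}"]
    return "\n".join(lines) + "\n"


rc_text = make_pylintrc(
    disable=("invalid-name", "C0114"),
    enable=("line-too-long",),
    max_line_length=120,
)
print(rc_text)
```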

GPU utils

Core utility functions to work with Jupyter Notebooks.

nbtools.core.free_gpus(devices=None)

Terminate all processes on GPU devices.

Parameters:

devices (iterable of ints) – Indices of the devices on which to terminate processes. If None, then all available GPUs are freed.

nbtools.core.get_available_gpus(n=1, min_free_memory=0.9, max_processes=2, verbose=False, raise_error=False, return_memory=False)

Select n GPUs from the available and free devices.

Parameters:

n (int, str) –
- If 'max', then use maximum number of available devices.
- If int, then number of devices to select.
min_free_memory (int, float) –
- If int, minimum amount of free memory (in MB) on a device to consider it free.
- If float, minimum percentage of free memory.
max_processes (int) – Maximum number of compute processes on a device to consider it free.
verbose (bool) – Whether to show individual device information.
raise_error (bool) – Whether to raise an exception if not enough devices are available.
return_memory (bool) – Whether to return memory available on each GPU.

Returns:

available_devices – List of available GPU indices, or a dict mapping indices to their 'available' and 'max' memory (in MB).

Return type:

list
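The selection rules for n, min_free_memory and max_processes can be sketched over made-up device statistics. Everything here is an assumption for illustration: select_devices and the dictionary keys index, free_mb, total_mb and n_processes are hypothetical, the inclusive comparisons are a guess, and the real function queries the actual hardware.

```python
def select_devices(devices, n=1, min_free_memory=0.9, max_processes=2, raise_error=False):
    """Pick free device indices from a list of per-device stats (hypothetical helper)."""
    free = []
    for device in devices:
        if isinstance(min_free_memory, float):
            # Float: threshold on the fraction of free memory.
            enough_memory = device["free_mb"] / device["total_mb"] >= min_free_memory
        else:
            # Int: threshold on the absolute amount of free memory, in MB.
            enough_memory = device["free_mb"] >= min_free_memory
        if enough_memory and device["n_processes"] <= max_processes:
            free.append(device["index"])

    if n == "max":
        return free
    if len(free) < n and raise_error:
        raise ValueError(f"Requested {n} devices, but only {len(free)} are free")
    return free[:n]


# Made-up statistics for three devices.
stats = [
    {"index": 0, "free_mb": 15000, "total_mb": 16000, "n_processes": 0},
    {"index": 1, "free_mb": 4000, "total_mb": 16000, "n_processes": 3},
    {"index": 2, "free_mb": 16000, "total_mb": 16000, "n_processes": 1},
]

print(select_devices(stats, n=1))                                             # [0]
print(select_devices(stats, n="max"))                                         # [0, 2]
print(select_devices(stats, n="max", min_free_memory=3000, max_processes=5))  # [0, 1, 2]
```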

nbtools.core.get_gpu_free_memory(index, ratio=True): Get free memory of a device (ratio or size in MB).

nbtools.core.set_gpus(n=1, min_free_memory=0.9, max_processes=2, verbose=False, raise_error=False)

Set the CUDA_VISIBLE_DEVICES variable to n available devices.

Parameters:

n (int, str) –
- If 'max', then use maximum number of available devices.
- If int, then number of devices to select.
min_free_memory (int, float) –
- If int, minimum amount of free memory (in MB) on a device to consider it free.
- If float, minimum percentage of free memory.
max_processes (int) – Maximum number of compute processes on a device to consider it free.
verbose (bool or int) –
Whether to show individual device information.
- If 0 or False, then no information is displayed.
- If 1 or True, then display the value assigned to CUDA_VISIBLE_DEVICES variable.
- If 2, then display memory and process information for each device.
raise_error (bool) – Whether to raise an exception if not enough devices are available.

Returns:

devices – Indices of selected and reserved GPUs.

Return type:

list
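Once the indices are chosen, restricting a process to them comes down to one environment variable. The sketch below shows only that final step; set_visible_devices is a hypothetical helper, and the indices passed to it are hard-coded for illustration.

```python
import os


def set_visible_devices(indices):
    """Expose only the given GPU indices to CUDA (hypothetical helper)."""
    value = ",".join(str(index) for index in indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Hard-coded indices for illustration; a real caller would select them first.
visible = set_visible_devices([0, 2])
print(visible)  # 0,2
```

The variable is read when CUDA is initialized, so it needs to be set before a framework that uses the GPU touches the device for the first time.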

Monitoring tools

class nbtools.nbstat.resource_inspector.ResourceInspector(formatter=None): A class to control the process of gathering information about system resources into ResourceTables, merging them into views, and formatting them into nice colored strings.