nbtools
Linting and execution
Functions for running Jupyter Notebooks programmatically.
- nbtools.exec_notebook.exec_notebook(path, inputs=None, outputs=None, inputs_pos=1, replace_inputs_pos=False, display_inputs=False, display_outputs=False, working_dir='./', execute_kwargs=None, out_path_db=None, out_path_ipynb=None, out_path_html=None, remove_db='always', add_timestamp=True, hide_code_cells=False, display_links=True, raise_exception=False, return_notebook=False, _output_queue=None)
Execute a Jupyter Notebook programmatically. Heavily inspired by https://github.com/tritemio/nbrun.
Intended to be an analog of exec, providing a way to inject / extract variables from the execution. For a detailed description of how to do that, check the inputs and outputs parameters. The executed notebook is optionally saved to disk as an .ipynb / .html file: we strongly recommend always doing so.
The raise_exception flag defines the behavior if the execution of the notebook fails due to an exception.
- Under the hood, this function does the following:
Create an internal database to communicate variables (both inputs and outputs). Save inputs to it.
Add a cell for reading inputs from the internal database, add a cell for saving outputs to it.
Execute notebook.
Handle exceptions.
Read outputs from the database.
Add a timestamp cell to the notebook, if needed.
Save the executed notebook as .ipynb and / or .html.
Return a dictionary with intermediate results, execution info and values of outputs variables.
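The way variables travel through the internal database can be sketched with the standard-library shelve module, which this page names as the storage backend. This is a minimal sketch of the idea, not the nbtools implementation: the helper names save_inputs, make_inputs_cell and make_outputs_cell, the generated cell sources and the database keys are assumptions made for illustration, and the notebook is simulated by running the cell sources with exec instead of a real kernel.

```python
import os
import shelve
import tempfile


def save_inputs(db_path, inputs):
    """Store the `inputs` dict in a shelve database (hypothetical helper)."""
    with shelve.open(db_path) as db:
        db['inputs'] = inputs


def make_inputs_cell(db_path):
    """Source of a cell that loads the saved inputs into the notebook globals."""
    return (
        "import shelve\n"
        f"with shelve.open({db_path!r}) as _db:\n"
        "    globals().update(_db['inputs'])\n"
    )


def make_outputs_cell(db_path, outputs):
    """Source of a cell that saves the requested variables, skipping missing names."""
    names = list(outputs)
    return (
        "import shelve\n"
        f"_names = {names!r}\n"
        f"with shelve.open({db_path!r}) as _db:\n"
        "    _db['outputs'] = {name: globals()[name] for name in _names if name in globals()}\n"
    )


def run_demo():
    """Simulate a notebook as a list of cell sources executed in one namespace."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        db_path = os.path.join(tmp_dir, 'demo_db')
        save_inputs(db_path, {'a': 2, 'b': 3})

        cells = [
            make_inputs_cell(db_path),                             # inserted inputs cell
            "total = a + b",                                       # the user's own cell
            make_outputs_cell(db_path, ['total', 'missing_name']), # inserted last cell
        ]
        namespace = {}
        for source in cells:
            exec(source, namespace)

        # Read the outputs back from the database after "execution".
        with shelve.open(db_path) as db:
            return db['outputs']


demo_outputs = run_demo()
print(demo_outputs)
```

The name missing_name is silently skipped, mirroring the documented behavior that missing output variables do not raise errors.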
If there are no inputs or outputs, a database is not created and additional cells are not inserted. Note that if either of them is provided, then one of out_path_ipynb or out_path_db must be explicitly defined.
- Parameters:
path (str) – Path to the notebook to execute.
inputs (dict, optional) – Inputs for execution are essentially equivalent to notebook globals. Must be a dictionary with variable names and their values; therefore, keys must be valid Python identifiers. Under the hood, inputs are saved into a database and loaded in the notebook in a separate cell, which is inserted at the inputs_pos position. Therefore, values must be serializable.
outputs (str or iterable of str, optional) – The list of variable names to return from the notebook. Extracted from the notebook in a separate cell, which is inserted at the last position. Note that if some of the variables don’t exist, no errors are raised.
inputs_pos (int, optional) – Position at which to insert the cell that loads inputs into the notebook.
replace_inputs_pos (bool, optional) – Whether to replace the code cell at inputs_pos with the inputs cell or insert a new one.
display_inputs (bool, optional) – Whether to display inputs or not. Under the hood, inputs are provided using a shelve database. If display_inputs=True, variables will be inserted in the cell in the following manner: input_name = input_value, instead of the code that imports them from shelve.
display_outputs (bool, optional) – Whether to display outputs or not. Under the hood, outputs are saved using a shelve database. If display_outputs=True, variables will be shown in the last cell in the following manner: print(output_name), instead of the code that dumps them into the database.
working_dir (str) – The working directory in which the kernel is started.
out_path_db (str, optional) – Path to save the internal database files (without file extension). If not provided, then it is inferred from out_path_ipynb.
out_path_ipynb (str, optional) – Path to save the output ipynb file.
out_path_html (str, optional) – Path to save the output html file.
remove_db (str, optional) – Whether to remove the internal database after notebook execution. Possible options are:
'always': remove the database after notebook execution
'not_failed_case': remove the database if there wasn’t any execution failure
'never': don’t remove the database after notebook execution
Running exec_notebook() with the 'not_failed_case' or 'never' option helps to reproduce failures in the out_path_ipynb notebook: it will take the inputs from the saved shelve database. Note that the database exists only if inputs and / or outputs are provided.
execute_kwargs (dict, optional) – Parameters of nbconvert.preprocessors.ExecutePreprocessor. For example, you can provide timeout, kernel_name, resources (such as metadata) and other nbclient.client.NotebookClient arguments.
add_timestamp (bool, optional) – Whether to add a cell with execution information at the beginning of the saved notebook.
hide_code_cells (bool, optional) – Whether to hide the code cells in the saved notebook.
display_links (bool, optional) – Whether to display links to the executed notebook and html at execution.
raise_exception (bool, optional) – Whether to re-raise exceptions from the notebook.
return_notebook (bool, optional) – Whether to return the notebook object from this function.
_output_queue (None) – Placeholder for the run_in_process() decorator to return this function's result.
- Returns:
exec_res – Dictionary with the notebook execution results. It provides the following information:
'failed' (bool) – Whether the notebook execution failed.
'outputs' (dict) – Saved notebook local variables. Not present in the exec_res dict if the outputs argument is None.
'failed cell number' (int) – Execution number of the cell that raised the error (if the notebook failed).
'traceback' (str) – Traceback message from the notebook (if the notebook failed).
'notebook' (nbformat.notebooknode.NotebookNode, optional) – Executed notebook object. Note that this output is provided only if return_notebook is True.
- Return type: dict
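As an illustration of how the returned dictionary might be consumed, the sketch below builds a one-line report from the keys documented above. The helper summarize_exec_res and the two sample dictionaries are hypothetical and hand-written; only the key names come from this page.

```python
def summarize_exec_res(exec_res):
    """Build a one-line report from an exec_notebook-style result dictionary."""
    if exec_res['failed']:
        # These two keys are documented as describing the failure.
        cell_number = exec_res.get('failed cell number')
        traceback = exec_res.get('traceback', '').strip()
        last_line = traceback.splitlines()[-1] if traceback else 'no traceback'
        return f"FAILED in cell {cell_number}: {last_line}"

    # 'outputs' is absent when no outputs were requested.
    outputs = exec_res.get('outputs', {})
    names = ', '.join(sorted(outputs)) or 'none'
    return f"OK, outputs: {names}"


# Hand-written sample results (not produced by a real notebook run).
ok_result = {'failed': False, 'outputs': {'accuracy': 0.93, 'loss': 0.21}}
failed_result = {
    'failed': True,
    'failed cell number': 4,
    'traceback': "Traceback (most recent call last):\n  ...\nZeroDivisionError: division by zero",
}

print(summarize_exec_res(ok_result))      # OK, outputs: accuracy, loss
print(summarize_exec_res(failed_result))  # FAILED in cell 4: ZeroDivisionError: division by zero
```

With nbtools installed, such a dictionary would come from a call like exec_notebook('notebook.ipynb', outputs=['accuracy', 'loss'], out_path_ipynb='notebook_out.ipynb'), where both file names are placeholders.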
- nbtools.exec_notebook.extract_traceback(notebook)
Extracts information about an error from the notebook.
- Parameters:
notebook (nbformat.notebooknode.NotebookNode) – Executed notebook to find an error traceback.
- Returns:
Tuple of three elements:
bool – Whether the executed notebook has an error traceback.
int or None – Number of a cell with a traceback. If None, then the notebook doesn’t contain an error traceback.
str – Error traceback, if it exists.
- Return type: tuple
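The kind of search described above can be sketched over the plain nbformat v4 structure, where a failed code cell carries an output with output_type equal to 'error' and a traceback list. This is an assumption-laden sketch, not the nbtools implementation: the function name find_error is hypothetical, and using the cell's execution_count as the reported cell number is an assumption.

```python
def find_error(notebook):
    """Return (failed, cell_number, traceback) for an nbformat v4-style notebook."""
    for cell in notebook.get('cells', []):
        if cell.get('cell_type') != 'code':
            continue
        for output in cell.get('outputs', []):
            if output.get('output_type') == 'error':
                # The traceback is stored as a list of lines.
                traceback = '\n'.join(output.get('traceback', []))
                return True, cell.get('execution_count'), traceback
    return False, None, ''


# Hand-written notebook structure used only for illustration.
sample_notebook = {
    'cells': [
        {'cell_type': 'markdown', 'source': '# Demo'},
        {'cell_type': 'code', 'execution_count': 1, 'source': 'x = 1', 'outputs': []},
        {
            'cell_type': 'code',
            'execution_count': 2,
            'source': '1 / 0',
            'outputs': [{
                'output_type': 'error',
                'ename': 'ZeroDivisionError',
                'evalue': 'division by zero',
                'traceback': ['Traceback (most recent call last):',
                              'ZeroDivisionError: division by zero'],
            }],
        },
    ],
}

failed, cell_number, traceback = find_error(sample_notebook)
print(failed, cell_number)
```

Because NotebookNode objects behave like dictionaries, the same function should also accept a notebook loaded with nbformat, although that is not exercised here.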
- nbtools.exec_notebook.run_in_process(func)
Decorator to run func in a separate process, so that all related processes are terminated properly.
Functions for code quality control of Jupyter Notebooks.
- nbtools.pylint_notebook.pylint_notebook(path=None, options=(), config=None, disable=(), enable=(), printer=print, remove_files=True, return_info=False, **pylint_params)
Execute pylint for a provided Jupyter Notebook.
- Under the hood, roughly does the following:
Creates a .pylintrc file next to the path, if needed.
Converts the notebook to a .py file next to the path.
Runs pylint with additional options.
Creates a report and displays it, if needed.
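The notebook-to-.py conversion step can be illustrated with the standard-library json module alone. The sketch below is an assumption about what such a conversion might look like, not the code nbtools uses: the function name notebook_to_code is hypothetical, and commenting out IPython magics and shell escapes is a design choice made here so that the result stays valid Python.

```python
import json


def notebook_to_code(notebook_json):
    """Concatenate the code cells of a notebook (given as JSON text) into Python source."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook['cells']:
        if cell['cell_type'] != 'code':
            continue
        # nbformat stores cell source either as one string or as a list of lines.
        source = cell['source']
        lines = source if isinstance(source, list) else source.splitlines(keepends=True)
        cleaned = []
        for line in lines:
            # IPython magics and shell escapes are not valid Python: comment them out.
            if line.lstrip().startswith(('%', '!')):
                line = '# ' + line
            cleaned.append(line)
        chunks.append(''.join(cleaned).rstrip('\n'))
    return '\n\n'.join(chunks) + '\n'


# Hand-written notebook used only for illustration.
sample = json.dumps({
    'cells': [
        {'cell_type': 'markdown', 'source': '# Title'},
        {'cell_type': 'code', 'source': ['%matplotlib inline\n', 'import os\n']},
        {'cell_type': 'code', 'source': 'x = 1\nprint(x)'},
    ],
})

code = notebook_to_code(sample)
print(code)
```

The resulting string could then be written to a .py file next to the notebook and passed to pylint.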
- Parameters:
path (str, optional) – Path to the Jupyter notebook. If not provided, the current notebook is used.
options (sequence) – Additional options for pylint execution.
config (str, None) – Path to a pylint config in the .pylintrc format. Note that if config is not None, then disable and enable are not used.
printer (callable or None) – Function to display the report.
remove_files (bool) – Whether to remove the .pylintrc and .py files after the execution.
return_info (bool) – Whether to return a dictionary with intermediate results. It contains the notebook code string, as well as pylint stdout and stderr.
disable (sequence) – Which checks to disable. Each element should be either a code or a name of the check.
enable (sequence) – Which checks to enable. Each element should be either a code or a name of the check. Has priority over disable.
max_line_length (int) – Allowed line length.
pylint_params (dict) – Additional linting parameters. Each is converted to a separate valid entry in the .pylintrc file.
GPU utils
Core utility functions to work with Jupyter Notebooks.
- nbtools.core.free_gpus(devices=None)
Terminate all processes on GPU devices.
- Parameters:
devices (iterable of ints) – Indices of the devices on which to terminate processes. If None, then free all available GPUs.
- nbtools.core.get_available_gpus(n=1, min_free_memory=0.9, max_processes=2, verbose=False, raise_error=False, return_memory=False)
Select n GPUs from available and free devices.
- Parameters:
n (int or str) – If 'max', then use the maximum number of available devices. If int, then the number of devices to select.
min_free_memory (int, float) – If int, minimum amount of free memory (in MB) on a device to consider it free. If float, minimum share of free memory, given as a ratio (the default 0.9 means 90%).
max_processes (int) – Maximum number of compute processes on a device to consider it free.
verbose (bool) – Whether to show individual device information.
raise_error (bool) – Whether to raise an exception if not enough devices are available.
return_memory (bool) – Whether to return memory available on each GPU.
- Returns:
available_devices – List with the indices of available GPUs, or a dict of indices and their 'available' and 'max' memory (in MB).
- Return type: list or dict
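The selection rule described by n, min_free_memory and max_processes can be sketched on hand-written device statistics, without querying any real GPU. Everything here is illustrative: the function select_devices and the keys 'index', 'free_mb', 'total_mb' and 'n_processes' are hypothetical, treating a device with exactly max_processes processes as free is an assumption, and the raise_error behavior is left out.

```python
def select_devices(devices, n=1, min_free_memory=0.9, max_processes=2):
    """Pick up to `n` indices of devices that look free.

    `devices` is a list of dicts with the hypothetical keys
    'index', 'free_mb', 'total_mb' and 'n_processes'.
    """
    selected = []
    for device in devices:
        if isinstance(min_free_memory, float):
            # A float threshold is a ratio of free memory to total memory.
            enough_memory = device['free_mb'] / device['total_mb'] >= min_free_memory
        else:
            # An int threshold is an absolute amount in MB.
            enough_memory = device['free_mb'] >= min_free_memory
        if enough_memory and device['n_processes'] <= max_processes:
            selected.append(device['index'])
    # 'max' keeps every suitable device; an int keeps the first `n` of them.
    return selected if n == 'max' else selected[:n]


# Hand-written statistics used only for illustration.
stats = [
    {'index': 0, 'free_mb': 15000, 'total_mb': 16000, 'n_processes': 0},
    {'index': 1, 'free_mb': 4000, 'total_mb': 16000, 'n_processes': 3},
    {'index': 2, 'free_mb': 16000, 'total_mb': 16000, 'n_processes': 1},
]

print(select_devices(stats, n=1))      # [0]
print(select_devices(stats, n='max'))  # [0, 2]
```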
- nbtools.core.get_gpu_free_memory(index, ratio=True)
Get free memory of a device (ratio or size in MB).
- nbtools.core.set_gpus(n=1, min_free_memory=0.9, max_processes=2, verbose=False, raise_error=False)
Set the CUDA_VISIBLE_DEVICES variable to n available devices.
- Parameters:
n (int or str) – If 'max', then use the maximum number of available devices. If int, then the number of devices to select.
min_free_memory (int, float) – If int, minimum amount of free memory (in MB) on a device to consider it free. If float, minimum share of free memory, given as a ratio (the default 0.9 means 90%).
max_processes (int) – Maximum number of compute processes on a device to consider it free.
verbose (bool or int) – Whether to show individual device information. If 0 or False, then no information is displayed. If 1 or True, then display the value assigned to the CUDA_VISIBLE_DEVICES variable. If 2, then display memory and process information for each device.
raise_error (bool) – Whether to raise an exception if not enough devices are available.
- Returns:
devices – Indices of selected and reserved GPUs.
- Return type:
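For readers unfamiliar with CUDA_VISIBLE_DEVICES, the last step of set_gpus can be pictured as writing a comma-separated list of device indices into the environment. The helper below is a hypothetical stand-in that only shows this final assignment for already chosen indices; it does not select devices and is not the nbtools implementation.

```python
import os


def set_visible_devices(devices):
    """Expose only the given device indices through CUDA_VISIBLE_DEVICES."""
    value = ','.join(str(index) for index in devices)
    os.environ['CUDA_VISIBLE_DEVICES'] = value
    return value


visible = set_visible_devices([0, 2])
print(os.environ['CUDA_VISIBLE_DEVICES'])  # 0,2
```

The variable is read when CUDA is initialized, so it generally needs to be set before a framework first touches the GPU.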