nbtools
Linting and execution
Functions for running Jupyter Notebooks programmatically.
- nbtools.exec_notebook.exec_notebook(path, inputs=None, outputs=None, inputs_pos=1, replace_inputs_pos=False, display_inputs=False, display_outputs=False, working_dir='./', execute_kwargs=None, out_path_db=None, out_path_ipynb=None, out_path_html=None, remove_db='always', add_timestamp=True, hide_code_cells=False, display_links=True, raise_exception=False, return_notebook=False, _output_queue=None)
Execute a Jupyter Notebook programmatically. Heavily inspired by https://github.com/tritemio/nbrun.
Intended to be an analog of exec, providing a way to inject / extract variables from the execution. For a detailed description of how to do that, check the inputs and outputs parameters. The executed notebook is optionally saved to disk as an .ipynb / .html file: we strongly recommend always doing so.
The raise_exception flag defines the behavior if the execution of the notebook fails due to an exception.
- Under the hood, this function does the following:
Create an internal database to communicate variables (both inputs and outputs). Save inputs to it.
Add a cell for reading inputs from the internal database, add a cell for saving outputs to it.
Execute notebook.
Handle exceptions.
Read outputs from the database.
Add a timestamp cell to the notebook, if needed.
Save the executed notebook as .ipynb and / or .html.
Return a dictionary with intermediate results, execution info and values of outputs variables.
If there are no inputs or outputs, a database is not created and additional cells are not inserted. Note that if either of them is provided, one of out_path_ipynb or out_path_db must be explicitly defined.
- Parameters:
path (str) – Path to the notebook to execute.
inputs (dict, optional) – Inputs for execution are essentially equivalent to notebook globals. Must be a dictionary with variable names and their values; therefore, keys must be valid Python identifiers. Under the hood, inputs are saved into a database and loaded in the notebook in a separate cell, which is inserted at the inputs_pos position. Therefore, values must be serializable.
outputs (str or iterable of str, optional) – The list of variable names to return from the notebook. Extracted from the notebook in a separate cell, which is inserted at the last position. Note that if some of the variables don’t exist, no errors are raised.
inputs_pos (int, optional) – Position at which to insert the cell that loads inputs into the notebook.
replace_inputs_pos (bool, optional) – Whether to replace the code cell at inputs_pos with the inputs cell or insert a new one.
display_inputs (bool, optional) – Whether to display inputs or not. Under the hood, inputs are provided using a shelve database. If display_inputs=True, variables will be inserted in the cell in the following manner: input_name = input_value, instead of code that loads them from shelve.
display_outputs (bool, optional) – Whether to display outputs or not. Under the hood, outputs are saved using a shelve database. If display_outputs=True, variables will be shown in the last cell in the following manner: print(input_name), instead of code that dumps them into the database.
working_dir (str) – The working directory in which the kernel is started.
out_path_db (str, optional) – Path to save the internal database files (without file extension). If not provided, then it is inferred from out_path_ipynb.
out_path_ipynb (str, optional) – Path to save the output ipynb file.
out_path_html (str, optional) – Path to save the output html file.
remove_db (str, optional) – Whether to remove the internal database after notebook execution. Possible options are:
'always': remove the database after notebook execution
'not_failed_case': remove the database if there wasn’t any execution failure
'never': don’t remove the database after notebook execution
Running exec_notebook() with the 'not_failed_case' or 'never' option helps to reproduce failures in the out_path_ipynb notebook: it will take the inputs from the saved shelve database. Note that the database exists only if inputs and / or outputs are provided.
execute_kwargs (dict, optional) – Parameters of nbconvert.preprocessors.ExecutePreprocessor. For example, you can provide timeout, kernel_name, resources (such as metadata) and other nbclient.client.NotebookClient arguments.
add_timestamp (bool, optional) – Whether to add a cell with execution information at the beginning of the saved notebook.
hide_code_cells (bool, optional) – Whether to hide the code cells in the saved notebook.
display_links (bool, optional) – Whether to display links to the executed notebook and html at execution.
raise_exception (bool, optional) – Whether to re-raise exceptions from the notebook.
return_notebook (bool, optional) – Whether to return the notebook object from this function.
_output_queue (None) – Placeholder for the run_in_process() decorator to return this function's result.
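The shelve-based exchange mentioned in the parameters above can be illustrated with the standard library alone. This is a minimal sketch under stated assumptions, not the nbtools implementation: the helpers save_inputs and load_inputs and the database file name are hypothetical.

```python
import os
import shelve
import tempfile


def save_inputs(db_path, inputs):
    """Store each input variable under its name in a shelve database."""
    with shelve.open(db_path) as db:
        for name, value in inputs.items():
            if not name.isidentifier():
                raise ValueError(f"{name!r} is not a valid Python identifier")
            db[name] = value  # values are pickled, so they must be serializable


def load_inputs(db_path):
    """Read all stored variables back as a plain dictionary."""
    with shelve.open(db_path) as db:
        return dict(db)


# Round trip through a temporary database (hypothetical file name).
with tempfile.TemporaryDirectory() as tmp_dir:
    db_path = os.path.join(tmp_dir, "notebook_db")
    save_inputs(db_path, {"n_epochs": 10, "model_name": "unet"})
    restored = load_inputs(db_path)
```

In a notebook, the loading side would run inside the inserted cell and assign each restored value to a variable of the same name, which is why keys have to be valid identifiers.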
- Returns:
exec_res – Dictionary with the notebook execution results. It provides the following information:
'failed' (bool) – Whether the notebook execution failed.
'outputs' (dict) – Saved notebook local variables. Not present in the exec_res dict if the outputs argument is None.
'failed cell number' (int) – Execution number of the cell that raised the error (if the notebook failed).
'traceback' (str) – Traceback message from the notebook (if the notebook failed).
'notebook' (nbformat.notebooknode.NotebookNode, optional) – Executed notebook object. Note that this output is provided only if return_notebook is True.
- Return type:
dict
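A caller would typically branch on the 'failed' key of the returned dictionary. The helper below is a hypothetical sketch (summarize_exec_res is not part of nbtools) that relies only on the keys documented above; the sample dictionaries are hand-written for illustration and are not real execution results.

```python
def summarize_exec_res(exec_res):
    """Build a one-line summary from an exec_notebook-style result dict."""
    if exec_res["failed"]:
        # These two keys are documented as available when the notebook failed.
        cell = exec_res.get("failed cell number")
        last_line = exec_res.get("traceback", "").strip().splitlines()[-1:]
        reason = last_line[0] if last_line else "unknown error"
        return f"failed in cell {cell}: {reason}"
    # 'outputs' is absent when no outputs were requested.
    outputs = exec_res.get("outputs", {})
    return f"ok, outputs: {sorted(outputs)}"


# Hand-written sample results (illustrative only).
ok_res = {"failed": False, "outputs": {"loss": 0.1, "accuracy": 0.9}}
bad_res = {
    "failed": True,
    "failed cell number": 3,
    "traceback": "Traceback (most recent call last):\nZeroDivisionError: division by zero",
}
ok_summary = summarize_exec_res(ok_res)
bad_summary = summarize_exec_res(bad_res)
```

Using .get() for the optional keys keeps the helper safe when outputs were not requested or the failure details are missing.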
- nbtools.exec_notebook.extract_traceback(notebook)
Extracts information about an error from the notebook.
- Parameters:
notebook (nbformat.notebooknode.NotebookNode) – Executed notebook in which to look for an error traceback.
- Returns:
Tuple of three elements:
bool – Whether the executed notebook has an error traceback.
int or None – Number of the cell with a traceback. If None, then the notebook doesn’t contain an error traceback.
str – Error traceback, if it exists.
- Return type:
tuple
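The kind of scan described above can be sketched on plain dictionaries laid out like an nbformat v4 notebook. This is an illustration under stated assumptions rather than the library's code: find_error is a hypothetical name, and taking the cell number from execution_count is an assumption.

```python
def find_error(notebook):
    """Return (failed, cell_number, traceback) for a notebook-like dict."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                # nbformat v4 stores the traceback as a list of strings.
                traceback = "\n".join(output.get("traceback", []))
                # Assumption: report the cell's execution count as its number.
                return True, cell.get("execution_count"), traceback
    return False, None, ""


# Hand-built notebook in the nbformat v4 layout (illustrative only).
notebook = {
    "cells": [
        {"cell_type": "markdown", "source": "# Title"},
        {"cell_type": "code", "execution_count": 1, "source": "x = 1", "outputs": []},
        {
            "cell_type": "code",
            "execution_count": 2,
            "source": "1 / 0",
            "outputs": [
                {
                    "output_type": "error",
                    "ename": "ZeroDivisionError",
                    "evalue": "division by zero",
                    "traceback": ["ZeroDivisionError: division by zero"],
                }
            ],
        },
    ]
}
failed, cell_number, traceback = find_error(notebook)
```

The three returned values mirror the documented tuple: a flag, a cell number or None, and the traceback text.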
- nbtools.exec_notebook.run_in_process(func)
Decorator to run func in a separate process so that all related processes are terminated properly.
Functions for code quality control of Jupyter Notebooks.
- nbtools.pylint_notebook.pylint_notebook(path=None, options=(), config=None, disable=(), enable=(), printer=print, remove_files=True, return_info=False, **pylint_params)
Execute pylint for a provided Jupyter Notebook.
- Under the hood, roughly does the following:
Creates a .pylintrc file next to the path, if needed.
Converts the notebook to a .py file next to the path.
Runs pylint with additional options.
Creates a report and displays it, if needed.
- Parameters:
path (str, optional) – Path to the Jupyter notebook. If not provided, the current notebook is used.
options (sequence) – Additional options for pylint execution.
config (str, None) – Path to a pylint config in the .pylintrc format. Note that if config is not None, then disable and enable are not used.
printer (callable or None) – Function to display the report.
remove_files (bool) – Whether to remove .pylintrc and .py files after the execution.
return_info (bool) – Whether to return a dictionary with intermediate results. It contains the notebook code string, as well as pylint stdout and stderr.
disable (sequence) – Which checks to disable. Each element should be either a code or a name of the check.
enable (sequence) – Which checks to enable. Each element should be either a code or a name of the check. Has priority over disable.
max_line_length (int) – Allowed line length.
pylint_params (dict) – Additional linting parameters. Each is converted to a separate valid entry in the .pylintrc file.
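The first two steps listed for pylint_notebook (writing a config and converting the notebook to Python code) can be sketched with the standard library. Everything here is an assumption made for illustration: notebook_to_code and make_pylintrc are hypothetical helpers, and the default line length of 120 is arbitrary rather than the library's default.

```python
import json


def notebook_to_code(notebook_json):
    """Join the sources of all code cells into one Python code string."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            source = cell.get("source", "")
            # nbformat allows the source to be a string or a list of lines.
            chunks.append(source if isinstance(source, str) else "".join(source))
    return "\n\n".join(chunks) + "\n"


def make_pylintrc(disable=(), enable=(), max_line_length=120):
    """Render a minimal .pylintrc with message control and line length."""
    lines = ["[MESSAGES CONTROL]"]
    if disable:
        lines.append("disable=" + ",".join(disable))
    if enable:
        lines.append("enable=" + ",".join(enable))
    lines += ["", "[FORMAT]", f"max-line-length={max_line_length}"]
    return "\n".join(lines) + "\n"


# Tiny hand-written notebook (illustrative only).
notebook_json = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Demo"]},
        {"cell_type": "code", "source": ["import os\n", "print(os.getcwd())"]},
        {"cell_type": "code", "source": "x = 1"},
    ]
})
code = notebook_to_code(notebook_json)
config = make_pylintrc(disable=["invalid-name"], enable=["C0114"])
```

The resulting code string and config text could then be written next to the notebook and passed to pylint; that step is left out here to keep the sketch free of external dependencies.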
GPU utils
Core utility functions to work with Jupyter Notebooks.
- nbtools.core.free_gpus(devices=None)
Terminate all processes on gpu devices.
- Parameters:
devices (iterable of ints) – Indices of the devices on which to terminate processes. If None, then free all available GPUs.
- nbtools.core.get_available_gpus(n=1, min_free_memory=0.9, max_processes=2, verbose=False, raise_error=False, return_memory=False)
Select n GPUs from available and free devices.
- Parameters:
n (int, str) –
If 'max', then use the maximum number of available devices.
If int, then the number of devices to select.
min_free_memory (int, float) –
If int, minimum amount of free memory (in MB) on a device to consider it free.
If float, minimum fraction of free memory (for example, 0.9 means 90%).
max_processes (int) – Maximum number of compute processes on a device to consider it free.
verbose (bool) – Whether to show individual device information.
raise_error (bool) – Whether to raise an exception if not enough devices are available.
return_memory (bool) – Whether to return memory available on each GPU.
- Returns:
available_devices – List with indices of available GPUs, or a dict of indices with their 'available' and 'max' memory (in MB).
- Return type:
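The filtering rules described by the parameters above can be sketched in pure Python. This is an assumption-laden illustration, not the library's code: select_devices is a hypothetical helper, device statistics are passed in as a hand-written list instead of being queried from the driver, and a device is treated as free when its process count is less than or equal to max_processes.

```python
def select_devices(devices, n=1, min_free_memory=0.9, max_processes=2, raise_error=False):
    """Pick up to `n` device indices that pass the memory and process limits.

    `devices` is a hypothetical list of dicts with the keys
    'index', 'free' (MB), 'total' (MB) and 'processes'.
    """
    suitable = []
    for device in devices:
        if isinstance(min_free_memory, float):
            # A float is a required fraction of the total memory.
            enough_memory = device["free"] / device["total"] >= min_free_memory
        else:
            # An int is a required amount of free memory in MB.
            enough_memory = device["free"] >= min_free_memory
        if enough_memory and device["processes"] <= max_processes:
            suitable.append(device["index"])

    wanted = len(suitable) if n == "max" else n
    if raise_error and len(suitable) < wanted:
        raise ValueError(f"Requested {wanted} devices, only {len(suitable)} are free")
    return suitable[:wanted]


# Hand-written device statistics (illustrative only).
devices = [
    {"index": 0, "free": 1000, "total": 16000, "processes": 3},
    {"index": 1, "free": 15500, "total": 16000, "processes": 0},
    {"index": 2, "free": 15000, "total": 16000, "processes": 1},
]
by_ratio = select_devices(devices, n="max", min_free_memory=0.9)
by_size = select_devices(devices, n=1, min_free_memory=15200)
```

Branching on the type of min_free_memory keeps the two documented meanings (a size in MB versus a fraction) in a single parameter.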
- nbtools.core.get_gpu_free_memory(index, ratio=True)
Get free memory of a device (ratio or size in MB).
- nbtools.core.set_gpus(n=1, min_free_memory=0.9, max_processes=2, verbose=False, raise_error=False)
Set the CUDA_VISIBLE_DEVICES variable to n available devices.
- Parameters:
n (int, str) –
If 'max', then use the maximum number of available devices.
If int, then the number of devices to select.
min_free_memory (int, float) –
If int, minimum amount of free memory (in MB) on a device to consider it free.
If float, minimum fraction of free memory (for example, 0.9 means 90%).
max_processes (int) – Maximum number of compute processes on a device to consider it free.
verbose (bool, int) – Whether to show individual device information.
If 0 or False, then no information is displayed.
If 1 or True, then display the value assigned to the CUDA_VISIBLE_DEVICES variable.
If 2, then display memory and process information for each device.
raise_error (bool) – Whether to raise an exception if not enough devices are available.
- Returns:
devices – Indices of selected and reserved GPUs.
- Return type:
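Once device indices are chosen, exposing them comes down to writing a comma-separated list into the CUDA_VISIBLE_DEVICES environment variable. The helper below is a hypothetical sketch of that last step only (set_visible_devices is not part of nbtools), and the indices are made up.

```python
import os


def set_visible_devices(indices):
    """Expose only the given GPU indices to libraries started afterwards."""
    value = ",".join(str(index) for index in indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Illustrative indices; in practice they would come from a selection step.
visible = set_visible_devices([1, 2])
```

The variable has to be set before a CUDA-based library initializes the devices in the current process, otherwise the change has no effect on it.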