nbtools

Linting and execution

Functions for running Jupyter Notebooks programmatically.

nbtools.exec_notebook.exec_notebook(path, inputs=None, outputs=None, inputs_pos=1, replace_inputs_pos=False, display_inputs=False, display_outputs=False, working_dir='./', execute_kwargs=None, out_path_db=None, out_path_ipynb=None, out_path_html=None, remove_db='always', add_timestamp=True, hide_code_cells=False, display_links=True, raise_exception=False, return_notebook=False, _output_queue=None)

Execute a Jupyter Notebook programmatically. Heavily inspired by https://github.com/tritemio/nbrun.

Intended to be an analog of exec, providing a way to inject / extract variables from the execution. For a detailed description of how to do that, check the inputs and outputs parameters. The executed notebook is optionally saved to disk as .ipynb / .html file: we strongly recommend always doing so.

The raise_exception flag defines the behavior if the execution of the notebook fails due to an exception.

Under the hood, this function does the following:
  • Create an internal database to communicate variables (both inputs and outputs). Save inputs to it.

  • Add a cell for reading inputs from the internal database, add a cell for saving outputs to it.

  • Execute notebook.

  • Handle exceptions.

  • Read outputs from the database.

  • Add a timestamp cell to the notebook, if needed.

  • Save the executed notebook as .ipynb and / or .html.

  • Return a dictionary with intermediate results, execution info and values of outputs variables.

If neither inputs nor outputs are provided, the database is not created and no additional cells are inserted. If either of them is provided, at least one of out_path_ipynb or out_path_db must be specified explicitly.
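The variable exchange described above can be illustrated with the standard library alone. The following is a minimal sketch, not the library's implementation: run_with_shelve is a hypothetical helper, exec stands in for the notebook kernel, and the 'inputs' / 'outputs' database keys are assumptions made for the illustration.

```python
import os
import shelve
import tempfile


def run_with_shelve(code, inputs, output_names, db_path):
    """Mimic the inputs/outputs exchange through a shelve database (illustration only)."""
    # 1. Save the inputs to the database before "execution".
    with shelve.open(db_path) as db:
        db["inputs"] = dict(inputs)

    # 2. The "inputs cell": load the inputs into the namespace of the executed code.
    namespace = {}
    with shelve.open(db_path) as db:
        namespace.update(db["inputs"])

    # 3. Execute the user code (a stand-in for the notebook cells).
    exec(code, namespace)

    # 4. The "outputs cell": save the requested variables; missing names are skipped silently.
    with shelve.open(db_path) as db:
        db["outputs"] = {name: namespace[name] for name in output_names if name in namespace}

    # 5. Read the outputs back on the caller side.
    with shelve.open(db_path) as db:
        return db["outputs"]


with tempfile.TemporaryDirectory() as tmp_dir:
    result = run_with_shelve(
        code="total = a + b",
        inputs={"a": 2, "b": 3},
        output_names=["total", "missing_variable"],
        db_path=os.path.join(tmp_dir, "exec_db"),
    )

print(result)  # {'total': 5}
```

Because the values pass through a database, they have to be serializable, which matches the requirement stated for the inputs parameter.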

Parameters:
  • path (str) – Path to the notebook to execute.

  • inputs (dict, optional) – Inputs for execution are essentially equivalent to notebook globals. Must be a dictionary mapping variable names to their values; therefore, keys must be valid Python identifiers. Under the hood, inputs are saved into a database and loaded in the notebook by a separate cell, which is inserted at the inputs_pos position. Therefore, values must be serializable.

  • outputs (str or iterable of str, optional) – The list of variable names to return from the notebook. Extracted from the notebook in a separate cell, which is inserted at the last position. Note, if some of the variables don’t exist, no errors are raised.

  • inputs_pos (int, optional) – Position to insert the cell with inputs loading into the notebook.

  • replace_inputs_pos (bool, optional) – Whether to replace the code cell at inputs_pos with the inputs cell instead of inserting a new one.

  • display_inputs (bool, optional) – Whether to display inputs or not. Under the hood, inputs are provided using a shelve database. If display_inputs=True, variables are written into the cell explicitly as input_name = input_value, instead of the code that loads them from the shelve database.

  • display_outputs (bool, optional) – Whether to display outputs or not. Under the hood, outputs are saved using a shelve database. If display_outputs=True, variables are shown in the last cell as print(output_name), instead of the code that dumps them into the database.

  • working_dir (str) – The working directory in which the kernel is started.

  • out_path_db (str, optional) – Path to save the internal database files (without file extension). If not provided, then it is inferred from out_path_ipynb.

  • out_path_ipynb (str, optional) – Path to save the output ipynb file.

  • out_path_html (str, optional) – Path to save the output html file.

  • remove_db (str, optional) –

    Whether to remove the internal database after notebook execution. Possible options are:
    • 'always': remove the database after notebook execution

    • 'not_failed_case': remove the database if there wasn’t any execution failure

    • 'never': don’t remove the database after notebook execution

    Running exec_notebook() with the 'not_failed_case' or 'never' option helps to reproduce failures in the out_path_ipynb notebook: it will take the inputs from the saved shelve database. Note that the database exists only if inputs and / or outputs are provided.

  • execute_kwargs (dict, optional) – Parameters of nbconvert.preprocessors.ExecutePreprocessor. For example, you can provide timeout, kernel_name, resources (such as metadata) and other nbclient.client.NotebookClient arguments.

  • add_timestamp (bool, optional) – Whether to add a cell with execution information at the beginning of the saved notebook.

  • hide_code_cells (bool, optional) – Whether to hide the code cells in the saved notebook.

  • display_links (bool, optional) – Whether to display links to the executed notebook and html at execution.

  • raise_exception (bool, optional) – Whether to re-raise exceptions from the notebook.

  • return_notebook (bool, optional) – Whether to return the notebook object from this function.

  • _output_queue (None) – Placeholder for the run_in_process() decorator to return this function result.

Returns:

exec_res – Dictionary with the notebook execution results. It provides the following information:

  • 'failed' (bool)

    Whether the notebook execution failed.

  • 'outputs' (dict)

    Values of the requested notebook variables. Absent from exec_res if the outputs argument is None.

  • 'failed cell number' (int)

    Execution number of the cell that raised the error (only if the notebook failed).

  • 'traceback' (str)

    Traceback message from the notebook (only if the notebook failed).

  • 'notebook' (nbformat.notebooknode.NotebookNode, optional)

    Executed notebook object. Provided only if return_notebook is True.

Return type:

dict
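As an illustration of how the returned dictionary might be consumed, here is a minimal sketch. summarize_exec_res is a hypothetical helper, and the two dictionaries are hand-written stand-ins shaped like the documented exec_res, not real execution results.

```python
def summarize_exec_res(exec_res):
    """Build a one-line summary from a dictionary shaped like `exec_res`."""
    if exec_res["failed"]:
        # Both keys below are documented as filled in when the notebook fails.
        cell_number = exec_res.get("failed cell number")
        traceback = exec_res.get("traceback", "").strip()
        last_line = traceback.splitlines()[-1] if traceback else "no traceback"
        return f"failed in cell {cell_number}: {last_line}"
    # 'outputs' is absent when the `outputs` argument was None.
    outputs = exec_res.get("outputs", {})
    return f"succeeded with outputs: {sorted(outputs)}"


# Hand-written examples (hypothetical data).
ok = {"failed": False, "outputs": {"accuracy": 0.93, "n_epochs": 10}}
bad = {"failed": True, "failed cell number": 4, "traceback": "Traceback ...\nValueError: bad shape"}

print(summarize_exec_res(ok))   # succeeded with outputs: ['accuracy', 'n_epochs']
print(summarize_exec_res(bad))  # failed in cell 4: ValueError: bad shape
```

Using .get for the optional keys keeps the helper safe for calls made without outputs.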

nbtools.exec_notebook.extract_traceback(notebook)

Extracts information about an error from the notebook.

Parameters:

notebook (nbformat.notebooknode.NotebookNode) – Executed notebook to search for an error traceback.

Returns:

Tuple of three elements:

  • bool

    Whether the executed notebook has an error traceback.

  • int or None

    Number of a cell with a traceback. If None, then the notebook doesn’t contain an error traceback.

  • str

    Error traceback, if one exists.

Return type:

tuple
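The kind of search this function performs can be sketched on plain dictionaries laid out like an nbformat v4 notebook, where a failed code cell carries an output with output_type 'error' and a 'traceback' list. find_error is a hypothetical stand-in written for this illustration; the library's actual logic may differ.

```python
def find_error(notebook):
    """Return (has_error, cell_execution_number, traceback) for a notebook-like dict."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                # The traceback is stored as a list of lines.
                traceback = "\n".join(output.get("traceback", []))
                return True, cell.get("execution_count"), traceback
    return False, None, ""


# A tiny hand-written notebook (hypothetical data).
notebook = {
    "cells": [
        {"cell_type": "markdown", "source": "# Title"},
        {"cell_type": "code", "execution_count": 1, "outputs": []},
        {
            "cell_type": "code",
            "execution_count": 2,
            "outputs": [
                {
                    "output_type": "error",
                    "ename": "ZeroDivisionError",
                    "evalue": "division by zero",
                    "traceback": [
                        "Traceback (most recent call last):",
                        "ZeroDivisionError: division by zero",
                    ],
                }
            ],
        },
    ]
}

failed, cell_number, traceback = find_error(notebook)
print(failed, cell_number)  # True 2
```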

nbtools.exec_notebook.run_in_process(func)

Decorator that runs func in a separate process, so that all related processes can be terminated properly.
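The general pattern of running a function in a child process and passing its result back through a queue can be sketched as follows. This is only an illustration under stated assumptions: run_in_process_sketch and _worker are hypothetical names, the 'fork' start method restricts the sketch to POSIX systems, and the cleanup of related processes that the real decorator is meant for is not shown.

```python
import multiprocessing as mp
from functools import wraps


def _worker(func, queue, args, kwargs):
    """Run `func` in the child process and send the outcome back through the queue."""
    try:
        queue.put((True, func(*args, **kwargs)))
    except Exception as exc:  # report failures instead of leaving the parent waiting
        queue.put((False, repr(exc)))


def run_in_process_sketch(func):
    """Hypothetical decorator: execute `func` in a separate process and return its result."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        ctx = mp.get_context("fork")  # assumption: POSIX system where 'fork' is available
        queue = ctx.Queue()
        process = ctx.Process(target=_worker, args=(func, queue, args, kwargs))
        process.start()
        ok, payload = queue.get()  # read before join() so the child is never blocked on the pipe
        process.join()
        if not ok:
            raise RuntimeError(f"{func.__name__} failed in the child process: {payload}")
        return payload
    return wrapper


@run_in_process_sketch
def square(x):
    return x * x


print(square(7))  # 49
```

The result has to be picklable to travel through the queue, which is the same kind of restriction as the serializable inputs of exec_notebook.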

Functions for code quality control of Jupyter Notebooks.

nbtools.pylint_notebook.pylint_notebook(path=None, options=(), config=None, disable=(), enable=(), printer=<built-in function print>, remove_files=True, return_info=False, **pylint_params)

Execute pylint for a provided Jupyter Notebook.

Under the hood, roughly does the following:
  • Creates a .pylintrc file next to the path, if needed.

  • Converts the notebook to .py file next to the path.

  • Runs pylint with additional options.

  • Creates a report and displays it, if needed.

Parameters:
  • path (str, optional) – Path to the Jupyter notebook. If not provided, the current notebook is used.

  • options (sequence) – Additional options for pylint execution.

  • config (str, None) – Path to a pylint config in the .pylintrc format. Note, if config is not None, then disable and enable are not used.

  • printer (callable or None) – Function to display the report.

  • remove_files (bool) – Whether to remove .pylintrc and .py files after the execution.

  • return_info (bool) – Whether to return a dictionary with intermediate results. It contains the notebook code string, as well as pylint stdout and stderr.

  • disable (sequence) – Which checks to disable. Each element should be either a code or a name of the check.

  • enable (sequence) – Which checks to enable. Each element should be either a code or a name of the check. Has priority over disable.

  • max_line_length (int) – Allowed line length.

  • pylint_params (dict) – Additional linting parameters. Each one is converted to a separate valid entry in the .pylintrc file.
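Two of the steps above can be sketched with the standard library. notebook_to_code and pylint_flags are hypothetical helpers written for this illustration. Note the swapped-in technique: the sketch turns the parameters into pylint command-line flags (--disable, --enable, --max-line-length), whereas the function itself is documented as writing them to a .pylintrc file. Handling of IPython magics is not covered.

```python
import json


def notebook_to_code(notebook_json):
    """Join the sources of all code cells of a notebook given as a JSON string."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # In the .ipynb format, `source` may be a single string or a list of lines.
        chunks.append(source if isinstance(source, str) else "".join(source))
    return "\n\n".join(chunks) + "\n"


def pylint_flags(disable=(), enable=(), **params):
    """Turn check names and keyword parameters into pylint command-line flags."""
    flags = []
    if disable:
        flags.append("--disable=" + ",".join(disable))
    if enable:
        flags.append("--enable=" + ",".join(enable))
    for name, value in params.items():
        # Python keyword `max_line_length` becomes the option `max-line-length`.
        flags.append(f"--{name.replace('_', '-')}={value}")
    return flags


# A tiny hand-written notebook (hypothetical data).
notebook_json = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Demo"]},
        {"cell_type": "code", "source": ["import os\n"]},
        {"cell_type": "code", "source": "print(os.getcwd())"},
    ]
})

code = notebook_to_code(notebook_json)
flags = pylint_flags(disable=("C0114", "invalid-name"), max_line_length=120)
print(flags)  # ['--disable=C0114,invalid-name', '--max-line-length=120']
```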

GPU utils

Core utility functions to work with Jupyter Notebooks.

nbtools.core.free_gpus(devices=None)

Terminate all processes on GPU devices.

Parameters:

devices (iterable of ints) – Indices of the devices on which to terminate processes. If None, then all available GPUs are freed.

nbtools.core.get_available_gpus(n=1, min_free_memory=0.9, max_processes=2, verbose=False, raise_error=False, return_memory=False)

Select n gpus from available and free devices.

Parameters:
  • n (int, str) –

    • If 'max', then use maximum number of available devices.

    • If int, then number of devices to select.

  • min_free_memory (int, float) –

    • If int, minimum amount of free memory (in MB) on a device to consider it free.

    • If float, minimum percentage of free memory.

  • max_processes (int) – Maximum number of compute processes on a device to consider it free.

  • verbose (bool) – Whether to show individual device information.

  • raise_error (bool) – Whether to raise an exception if not enough devices are available.

  • return_memory (bool) – Whether to return memory available on each GPU.

Returns:

available_devices – List of available GPU indices or, if return_memory is True, dict mapping indices to 'available' and 'max' memory (in MB).

Return type:

list or dict
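The selection rules described by the parameters can be sketched on hand-written device statistics. select_devices is a hypothetical helper, not the library's code. It assumes that a device qualifies when its process count is less than or equal to max_processes and that devices are taken in index order; raise_error and return_memory are left out.

```python
def select_devices(devices, n=1, min_free_memory=0.9, max_processes=2):
    """Pick device indices from dicts with 'index', 'free' (MB), 'total' (MB), 'processes'."""
    selected = []
    for device in devices:
        if isinstance(min_free_memory, float):
            # Float threshold: share of free memory on the device.
            enough_memory = device["free"] / device["total"] >= min_free_memory
        else:
            # Int threshold: absolute amount of free memory in MB.
            enough_memory = device["free"] >= min_free_memory
        if enough_memory and device["processes"] <= max_processes:
            selected.append(device["index"])
    # 'max' keeps every suitable device, an int keeps the first n of them.
    return selected if n == "max" else selected[:n]


# Hypothetical statistics for three devices.
devices = [
    {"index": 0, "free": 2000, "total": 16000, "processes": 3},
    {"index": 1, "free": 15500, "total": 16000, "processes": 0},
    {"index": 2, "free": 15000, "total": 16000, "processes": 1},
]

print(select_devices(devices, n="max"))                                       # [1, 2]
print(select_devices(devices, n=1))                                           # [1]
print(select_devices(devices, n="max", min_free_memory=1000, max_processes=3))  # [0, 1, 2]
```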

nbtools.core.get_gpu_free_memory(index, ratio=True)

Get free memory of a device (ratio or size in MB).

nbtools.core.set_gpus(n=1, min_free_memory=0.9, max_processes=2, verbose=False, raise_error=False)

Set the CUDA_VISIBLE_DEVICES variable to n available devices.

Parameters:
  • n (int, str) –

    • If 'max', then use maximum number of available devices.

    • If int, then number of devices to select.

  • min_free_memory (int, float) –

    • If int, minimum amount of free memory (in MB) on a device to consider it free.

    • If float, minimum percentage of free memory.

  • max_processes (int) – Maximum number of compute processes on a device to consider it free.

  • verbose (bool or int) –

    Whether to show individual device information.

    • If 0 or False, then no information is displayed.

    • If 1 or True, then display the value assigned to CUDA_VISIBLE_DEVICES variable.

    • If 2, then display memory and process information for each device.

  • raise_error (bool) – Whether to raise an exception if not enough devices are available.

Returns:

devices – Indices of selected and reserved GPUs.

Return type:

list
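The final step, exposing the selected devices through CUDA_VISIBLE_DEVICES, comes down to writing a comma-separated list of indices into the environment. A minimal sketch follows; set_visible_devices is a hypothetical helper and the indices are made up for the example.

```python
import os


def set_visible_devices(devices):
    """Write device indices to CUDA_VISIBLE_DEVICES as a comma-separated string."""
    value = ",".join(str(index) for index in devices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


set_visible_devices([1, 2])
print(os.environ["CUDA_VISIBLE_DEVICES"])  # 1,2
```

The variable is read when CUDA is initialized in the process, so it should be set before the first use of a CUDA-based framework.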

Monitoring tools

class nbtools.nbstat.resource_inspector.ResourceInspector(formatter=None)

A class to control the process of gathering information about system resources into ResourceTables, merging them into views, and formatting them into nice colored strings.