
Batch class CTImagesMaskedBatch for storing CT-scans with masks.

class radio.preprocessing.ct_masked_batch.CTImagesMaskedBatch(index, *args, **kwargs)[source]

Bases: radio.preprocessing.ct_batch.CTImagesBatch

Batch class for storing batch of ct-scans with masks for nodules.

Loads info about cancer nodules, then creates cancer masks for each patient. The created masks are stored in self.masks.

Parameters:index (dataset.index) – ids of scans to be put in a batch

tuple of strings – names of the data components of a batch: images, masks, origin and spacing. NOTE: the base class requires this attribute to be implemented.


int – number of nodules in batch


ndarray – contains ct-scans for all patients in batch.


ndarray – contains masks for all patients in batch.


np.recarray – contains info on cancer nodule locations. The record array contains the following information about nodules:

  • self.nodules.nodule_center – ndarray(num_nodules, 3) centers of nodules in world coords;
  • self.nodules.nodule_size – ndarray(num_nodules, 3) sizes of nodules along z, y, x in world coord;
  • self.nodules.img_size – ndarray(num_nodules, 3) sizes of images of patient data corresponding to nodules;
  • self.nodules.offset – ndarray(num_nodules, 3) position of individual patient scan inside batch;
  • self.nodules.spacing – ndarray(num_nodules, 3) of spacing attribute of patients which correspond to nodules;
  • self.nodules.origin – ndarray(num_nodules, 3) of origin attribute of patients which correspond to nodules.
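The record layout listed above can be reproduced with plain NumPy. The following is a minimal sketch, not the library's own construction code: the dtype is copied from the `nodules_dtype` class attribute documented in this reference, while the two nodules and their values are made up for illustration.

```python
import numpy as np

# dtype copied from the `nodules_dtype` class attribute documented in this reference
nodules_dtype = np.dtype([
    ('patient_pos', '<i4'),
    ('offset', '<i4', (3,)),
    ('img_size', '<i4', (3,)),
    ('nodule_center', '<f8', (3,)),
    ('nodule_size', '<f8', (3,)),
    ('spacing', '<f8', (3,)),
    ('origin', '<f8', (3,)),
])

# Two hypothetical nodules, zero-initialised, exposed as a record array
nodules = np.zeros(2, dtype=nodules_dtype).view(np.recarray)

# Fields are reachable as attributes; each 3-vector field has shape (num_nodules, 3)
nodules.nodule_center[0] = (-120.5, 35.0, 60.25)  # z, y, x in world coordinates
nodules.nodule_size[0] = (6.0, 6.0, 6.0)
nodules.patient_pos[1] = 1

print(nodules.nodule_center.shape)
```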

Binarize masks by threshold.

Parameters:threshold (float) – threshold for masks binarization.
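
Thresholding a mask can be illustrated with NumPy alone. This is a sketch under assumptions: the helper name `binarize` is hypothetical, and whether the library uses a strict (`>`) or non-strict (`>=`) comparison is not stated in this reference; a strict comparison is assumed here.

```python
import numpy as np

def binarize(masks, threshold):
    """Return an array of the same dtype with 1 where masks > threshold, else 0.

    Hypothetical helper; the strict comparison is an assumption.
    """
    return (masks > threshold).astype(masks.dtype)

masks = np.array([[0.1, 0.4], [0.9, 0.35]])
binary = binarize(masks, threshold=0.35)
```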
central_crop(crop_size, crop_mask=False, **kwargs)[source]

Make crop of crop_size from center of images.

  • crop_size (tuple, list or ndarray of int) – (z,y,x)-shape of central crop along three axes (z,y,x order is used).
  • crop_mask (bool) – if True, crop the mask in the same way.
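
For a single 3D scan, a central crop reduces to computing a start offset per axis. The sketch below assumes one scan rather than a whole batch; `central_crop_3d` is a hypothetical helper, not the method documented here.

```python
import numpy as np

def central_crop_3d(image, crop_size):
    """Crop a (z, y, x) array around its center. Hypothetical helper."""
    crop_size = np.asarray(crop_size)
    shape = np.asarray(image.shape)
    if np.any(crop_size > shape):
        raise ValueError("crop_size must not exceed the image shape")
    start = (shape - crop_size) // 2   # offset of the crop along each axis
    stop = start + crop_size
    return image[start[0]:stop[0], start[1]:stop[1], start[2]:stop[2]]

scan = np.arange(4 * 6 * 6).reshape(4, 6, 6)
crop = central_crop_3d(scan, (2, 4, 4))
```

Applying the same function to the mask array gives the behaviour described for crop_mask=True.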

Return type:


classification_targets(threshold=10, **kwargs)[source]

Unpack data from batch in format suitable for classification task.

Parameters:threshold (int) – minimum number of ‘1’ pixels in mask to consider it cancerous.
Returns:targets for classification task: labels corresponding to cancerous nodules (‘1’) and non-cancerous nodules (‘0’).
Return type:ndarray(batch_size, 1)
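
The labelling rule can be sketched as a pixel count per crop. Assumptions: masks are already split into equally shaped crops, the function name is hypothetical, and the comparison with threshold is taken to be strict, which this reference does not confirm.

```python
import numpy as np

def classification_labels(mask_crops, threshold=10):
    """Label each crop 1 if it has more than `threshold` mask pixels, else 0.

    mask_crops: binary array of shape (batch_size, z, y, x).
    Returns an integer array of shape (batch_size, 1).
    """
    counts = mask_crops.reshape(len(mask_crops), -1).sum(axis=1)
    return (counts > threshold).astype(np.int64).reshape(-1, 1)

mask_crops = np.zeros((2, 4, 4, 4))
mask_crops[0, :3, :3, :3] = 1          # 27 mask pixels in the first crop
labels = classification_labels(mask_crops)
```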
components = ('images', 'masks', 'spacing', 'origin')

Create masks component from nodules component.


nodules must not be None before calling this method. See fetch_nodules_info() for more details.


Create masks component of a different size than images, using nodules component.

Parameters:shape (tuple, list or ndarray of int.) – (z_dim,y_dim,x_dim), shape of mask to be created.
  • ndarray – 3d array with masks in form of skyscraper.
  • # TODO (one part of code from here repeats create_mask function) – better to unify these two func

Fetch nodules info (centers and sizes) from masks.

Runs skimage.measure.label to fetch nodule regions from masks. Extracts nodules info from the segmented regions and puts this information into the self.nodules np.recarray.

Parameters:images_loaded (bool) – if True, the images component is assumed to be loaded and image_size is used to compute the correct nodule locations inside the skyscraper. If False, the location info inside the skyscraper is not updated.
Return type:batch


Sizes along [zyx] will be the same.

fetch_nodules_info(nodules=None, nodules_records=None, update=False, images_loaded=True)[source]

Extract nodules’ info from nodules into attribute self.nodules.

  • nodules (pd.DataFrame) – DataFrame with the following columns:
    • ’seriesuid’: index of patient or series.
    • ’coordZ’,’coordY’,’coordX’: coordinates of nodules center.
    • ’diameter_mm’: diameter, in mm.
  • nodules_records (np.recarray) – if not None, should contain the same fields as described in Note.
  • update (bool) – if False, a warning appears to remind that nodules info will be erased and recomputed.
  • images_loaded (bool) – if True, the images component is assumed to be loaded and image_size is used to compute the correct nodule locations inside the skyscraper. If False, the location info inside the skyscraper is not updated.

Return type:



Run this action only after load(). The method fills in record array self.nodules that contains the following information about nodules:

  • self.nodules.nodule_center – ndarray(num_nodules, 3) centers of nodules in world coords;
  • self.nodules.nodule_size – ndarray(num_nodules, 3) sizes of nodules along z, y, x in world coord;
  • self.nodules.img_size – ndarray(num_nodules, 3) sizes of images of patient data corresponding to nodules;
  • self.nodules.offset – ndarray(num_nodules, 3) of biases of patients which correspond to nodules;
  • self.nodules.spacing – ndarray(num_nodules, 3) of spacing attribute of patients which correspond to nodules;
  • self.nodules.origin – ndarray(num_nodules, 3) of origin attribute of patients which correspond to nodules;
  • self.nodules.patient_pos – ndarray(num_nodules, 1) refers to positions of patients which correspond to stored nodules.

Invert the order of slices for each patient

Return type:batch


>>> batch = batch.flip()
get_axial_slice(patient_pos, height)[source]

Get tuple of images slice and masks slice by patient and slice position.

  • patient_pos (int) – patient position in the batch
  • height (float) – slice position along the z-axis, scaled to [0, 1]. The slice with index int(height * number_of_slices_for_patient) is taken from the patient’s scan and mask.

(images_slice, masks_slice) for the given patient_pos and slice number
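
The index arithmetic for height can be sketched on a single patient's arrays. The helper below is hypothetical; clamping the index so that height=1.0 selects the last slice is an addition of this sketch and is not stated in the reference.

```python
import numpy as np

def axial_slice(images, masks, height):
    """Pick the slice at relative position `height` from a (z, y, x) scan and mask."""
    n_slices = images.shape[0]
    # int(height * n_slices) as in the parameter description;
    # the clamp is an assumption made so that height == 1.0 stays in range
    idx = min(int(height * n_slices), n_slices - 1)
    return images[idx], masks[idx]

images = np.arange(5 * 2 * 2).reshape(5, 2, 2)
masks = np.zeros_like(images)
image_slice, mask_slice = axial_slice(images, masks, height=0.5)
last_slice, _ = axial_slice(images, masks, height=1.0)
```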

Return type:


get_pos(data, component, index)[source]

Return the position of an item for a given index in data or in self.`component`.

Fetches the correct position of an item inside the batch, looking for it in data, if provided, or in the given component of self.

  • data (None or ndarray) – data from which subsetting is done. If None, retrieve position from component of batch, if ndarray, returns index.
  • component (str) – name of a component, e.g. ‘images’. If component is provided, data should be None.
  • index (str or int) – index of an item to be looked for. May be a key from the dataset (str) or an index inside the batch (int).

Position of item

Return type:



This is an overload of get_pos from base Batch-class, see corresponding docstring for detailed explanation.

static make_data_keras(batch, model=None, mode='segmentation', is_training=True, **kwargs)[source]

Prepare data in batch for training neural network implemented in keras.

  • mode (str) – one of the following: ‘classification’, ‘regression’ or ‘segmentation’. Default is ‘segmentation’.
  • data_format (str) – data format of batch data. Can be ‘channels_last’ or ‘channels_first’. Default is ‘channels_last’.
  • is_training (bool) – whether model is in training or prediction mode. Default is True.
  • threshold (int) – threshold value of ‘1’ pixels in masks to consider it cancerous. Default is 10.

kwargs for keras model train method: {‘x’: ndarray(…), ‘y’: ndarray(…)} for training neural network.

Return type:

dict or None

static make_data_tf(batch, model=None, mode='segmentation', is_training=True, **kwargs)[source]

Prepare data in batch for training neural network implemented in tensorflow.

  • mode (str) – one of the following: ‘classification’, ‘regression’ or ‘segmentation’. Default is ‘segmentation’.
  • data_format (str) – data format of batch data. Can be ‘channels_last’ or ‘channels_first’. Default is ‘channels_last’.
  • is_training (bool) – whether model is in training or prediction mode. Default is True.
  • threshold (int) – threshold value of ‘1’ pixels in masks to consider it cancerous. Default is 10.

feed dict and fetches for training neural network.

Return type:

dict or None

static make_indices(size)[source]

Generate list of batch indices of given size.

Parameters:size (int) – size of list with indices
Returns:list of random indices
Return type:list


>>> indices = CTImagesMaskedBatch.make_indices(20)
>>> indices
array(['3c3eb09b', '5b192d1f', 'f28ddbb0', '14460196', '31a92510',
       '3f324e44', '066ccf28', '5570938d', '5d1fb8f6', '539ea09c',
       '68f9f235', '8f7b0c49', 'c7903591', 'dc8e9504', '54e9eebc',
       '778abd5a', '99691fc6', '7da49e85', '0f343345', '876fb9e6'], dtype='<U8')
make_xip(depth, stride=1, mode='max', projection='axial', padding='reflect', **kwargs)[source]

Make intensity projection (maximum, minimum, mean or median).

Notice that axis is chosen according to projection argument.

  • depth (int) – number of slices over which xip operation is performed.
  • stride (int) – stride-step along projection dimension.
  • mode (str) – Possible values are ‘max’, ‘min’, ‘mean’ or ‘median’.
  • projection (str) – Possible values: ‘axial’, ‘coronal’, ‘sagital’. In case of ‘coronal’ and ‘sagital’ projections tensor will be transposed from [z,y,x] to [x,z,y] and [y,z,x].
  • padding (str) – padding mode that will be passed to the numpy.pad function.
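
The interplay of depth, stride and mode can be sketched for the axial case with NumPy. This is an illustration under assumptions, not the library's implementation: `axial_xip` is a hypothetical helper, it handles a single (z, y, x) volume, and it applies no padding, so windows that would run past the last slice are dropped.

```python
import numpy as np

def axial_xip(volume, depth, stride=1, mode='max'):
    """Intensity projection over windows of `depth` slices along the z-axis."""
    reducers = {'max': np.max, 'min': np.min, 'mean': np.mean, 'median': np.median}
    reduce = reducers[mode]
    # Start index of every full window of `depth` slices, stepping by `stride`
    starts = range(0, volume.shape[0] - depth + 1, stride)
    return np.stack([reduce(volume[s:s + depth], axis=0) for s in starts])

volume = np.arange(4 * 1 * 2).reshape(4, 1, 2)     # voxel value = 2 * z + x
mip = axial_xip(volume, depth=2, stride=1, mode='max')
mip_strided = axial_xip(volume, depth=2, stride=2, mode='max')
```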
nodules_dtype = dtype([('patient_pos', '<i4'), ('offset', '<i4', (3,)), ('img_size', '<i4', (3,)), ('nodule_center', '<f8', (3,)), ('nodule_size', '<f8', (3,)), ('spacing', '<f8', (3,)), ('origin', '<f8', (3,))])

Convert nodules_info ndarray into pandas dataframe.

Pandas DataFrame will contain the following columns:

  • ‘source_id’ – id of source element of batch;
  • ‘nodule_id’ – generated id for nodules;
  • ‘locZ’, ‘locY’, ‘locX’ – coordinates of nodules’ centers;
  • ‘diamZ’, ‘diamY’, ‘diamX’ – sizes of nodules along zyx axes.

Parameters:nodules (ndarray of type nodules_info) – nodules_info type is defined inside of CTImagesMaskedBatch class.
Returns:centers, ids and sizes of nodules.
Return type:pd.DataFrame

Get number of nodules in CTImagesMaskedBatch.

Returns:number of nodules in CTImagesMaskedBatch. If the fetch_nodules_info method has not been called yet, returns 0.
Return type:int
predict_on_scan(model_name, strides=(16, 32, 32), crop_shape=(32, 64, 64), batch_size=4, targets_mode='segmentation', data_format='channels_last', show_progress=True, model_type='tf')[source]

Get predictions of the model on data contained in batch.

Transforms scan data into patches of shape crop_shape and then feeds these patches sequentially into the model with the name specified by argument ‘model_name’; after that, loads the predicted masks or probabilities into the ‘masks’ component of the current batch and returns it.

  • model_name (str) – name of model that will be used for predictions.
  • strides (tuple, list or ndarray of int) – (z,y,x)-strides for patching operation.
  • crop_shape (tuple, list or ndarray of int) – (z,y,x)-shape of crops.
  • batch_size (int) – number of patches to feed in model in one iteration.
  • targets_mode (str) – type of targets ‘segmentation’, ‘regression’ or ‘classification’.
  • data_format (str) – format of neural network input data, can be ‘channels_first’ or ‘channels_last’.
  • model_type (str) – represents type of model that will be used for prediction. Possible values are ‘keras’ or ‘tf’.
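
How strides and crop_shape determine the patch grid can be sketched in pure Python. The helper `patch_origins` is hypothetical, and the sketch assumes that only patches lying fully inside the scan are used; how the library treats scan borders (for example by padding) is not covered by this reference.

```python
from itertools import product

def patch_origins(scan_shape, crop_shape=(32, 64, 64), strides=(16, 32, 32)):
    """List (z, y, x) start positions of patches that fit fully inside the scan."""
    axes = [range(0, dim - crop + 1, step)
            for dim, crop, step in zip(scan_shape, crop_shape, strides)]
    return list(product(*axes))

origins = patch_origins((64, 128, 128))
print(len(origins))
```

With strides equal to half the crop shape, as in the default arguments, neighbouring patches overlap by half along each axis.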

Return type:


regression_targets(threshold=10, **kwargs)[source]

Unpack data from batch in format suitable for regression task.

Parameters:threshold (int) – minimum number of ‘1’ pixels in mask to consider it cancerous.
Returns:targets for regression task: cancer center, size and label (1 for cancerous and 0 for non-cancerous). Note that for a non-cancerous crop the first 6 columns of the output array will be set to zero.
Return type:ndarray(batch_size, 7)
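
The layout of the (batch_size, 7) array can be sketched as follows. The function name and input values are hypothetical; the column order (center, size, label) and the zeroing of the first 6 columns for non-cancerous crops follow the description above.

```python
import numpy as np

def stack_regression_targets(centers, sizes, labels):
    """Stack (batch_size, 3) centers, (batch_size, 3) sizes and labels into (batch_size, 7)."""
    labels = np.asarray(labels, dtype=float).reshape(-1, 1)
    targets = np.hstack([np.asarray(centers, dtype=float),
                         np.asarray(sizes, dtype=float),
                         labels])
    # Non-cancerous crops carry no nodule, so their center and size columns are zeroed
    targets[labels[:, 0] == 0, :6] = 0.0
    return targets

targets = stack_regression_targets(
    centers=[[16.0, 32.0, 32.0], [5.0, 5.0, 5.0]],
    sizes=[[6.0, 6.0, 6.0], [1.0, 1.0, 1.0]],
    labels=[1, 0],
)
```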
sample_dump(dst, n_iters, nodule_size=(32, 64, 64), batch_size=20, share=0.8, **kwargs)[source]

Perform sample_nodules and dump on the same batch n_iters times.

Can be used for fast creation of large datasets of cancerous/non-cancerous crops.

  • dst (str) – folder to dump nodules in.
  • n_iters (int) – number of iterations to be performed.
  • nodule_size (tuple, list or ndarray of int) – (z,y,x)-shape of sampled nodules.
  • batch_size (int or None) – size of generated batches.
  • share (float) – share of cancer nodules. See docstring of sample_nodules for more info about possible combinations of parameters share and batch_size.
  • **kwargs (dict) – additional arguments supplied into sample_nodules. See docstring of sample_nodules for more info.
sample_nodules(batch_size, nodule_size=(32, 64, 64), share=0.8, variance=None, mask_shape=None, histo=None)[source]

Sample random crops of images and masks from batch.

Create random crops, both with and without nodules in it, from input batch.

  • batch_size (int) – number of nodules in the output batch. Required, if share=0.0. If None, resulting batch will include all cancerous nodules.
  • nodule_size (tuple, list or ndarray of int) – crop shape along (z,y,x).
  • share (float) – share of cancer crops in the batch. If the input CTImagesBatch contains fewer cancer nodules than needed, random nodules will be taken.
  • variance (tuple, list or ndarray of float) – variances of normally distributed random shifts of nodules’ start positions.
  • mask_shape (tuple, list or ndarray of int) – size of masks crop in (z,y,x)-order. If not None, crops with masks would be of mask_shape. If None, mask crop shape would be equal to crop_size.
  • histo (tuple) – np.histogram()’s output. Used for sampling non-cancerous crops.

batch with cancerous and non-cancerous crops in a proportion defined by share, with batch_size nodules in total. If share == 1.0 and batch_size is None, the resulting batch consists of all cancerous crops stored in the batch.

Return type:


sample_random_nodules(num_nodules, nodule_size, histo=None)[source]

Sample random nodule positions in CTImagesMaskedBatch.

Samples random nodule positions in ndarray. Each nodule has the shape defined by nodule_size. If the size of patients’ data along the z-axis is not the same for different patients, NotImplementedError will be raised.

  • num_nodules (int) – number of nodules to sample from dataset.
  • nodule_size (ndarray(3, )) – crop shape along (z,y,x).
  • histo (tuple) – np.histogram()’s output. 3d-histogram, represented by tuple (bins, edges).

ndarray(num_nodules, 3). The first dimension indexes the sampled nodules; the second holds the start positions (integers) of nodules in the batch skyscraper.
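
Sampling start positions so that every crop fits inside the data can be sketched for one volume. Assumptions: uniform sampling (the histo=None case), a single (z, y, x) volume instead of a batch skyscraper, and a hypothetical helper name.

```python
import numpy as np

def sample_start_positions(volume_shape, nodule_size, num_nodules, rng):
    """Uniformly sample integer (z, y, x) start positions of crops that fit in the volume."""
    # Exclusive upper bound of valid start indices along each axis
    upper = np.asarray(volume_shape) - np.asarray(nodule_size) + 1
    columns = [rng.integers(0, u, size=num_nodules) for u in upper]
    return np.stack(columns, axis=1)

rng = np.random.default_rng(0)
positions = sample_start_positions((64, 128, 128), (32, 64, 64), num_nodules=5, rng=rng)
```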

Return type:


segmentation_targets(data_format='channels_last', **kwargs)[source]

Unpack data from batch in format suitable for segmentation task.

Parameters:data_format (str) – data_format shows where to put new axis for channels dimension: can be ‘channels_last’ or ‘channels_first’.
Returns:batch array with masks.
Return type:ndarray(batch_size, ..)
unpack(component='images', **kwargs)[source]

Basic way for unpacking components from batch.

  • component (str) – component to unpack, can be ‘images’ or ‘masks’.
  • data_format (str) – can be ‘channels_last’ or ‘channels_first’. Reflects where to put channels dimension: right after batch dimension or after all spatial axes.
  • kwargs (dict) – keyword arguments that will be passed to the callable if the component argument refers to a method of the batch class.
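
The effect of data_format on the unpacked array shape can be sketched with NumPy. `add_channels_axis` is a hypothetical helper, and a single channel is assumed.

```python
import numpy as np

def add_channels_axis(batch_array, data_format='channels_last'):
    """Insert a channels axis of length 1 into a (batch_size, z, y, x) array."""
    if data_format == 'channels_last':
        return batch_array[..., np.newaxis]     # (batch_size, z, y, x, 1)
    if data_format == 'channels_first':
        return batch_array[:, np.newaxis]       # (batch_size, 1, z, y, x)
    raise ValueError("data_format must be 'channels_last' or 'channels_first'")

crops = np.zeros((2, 4, 8, 8))
last = add_channels_axis(crops, 'channels_last')
first = add_channels_axis(crops, 'channels_first')
```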

Return type:

ndarray(batch_size, ..) or None


Update histogram of nodules’ locations using nodules locations from batch.

Parameters:histo (list) – list(np.histogram()), used for sampling cancerous locations.


Execute action only after .fetch_nodules_info().


Fetch nodules from array by starting positions.

Takes array with data of shape (z, y, x) from batch, ndarray(p, 3) with starting indices of nodules where p is number of nodules and size of type ndarray(3, ) which contains sizes of nodules along each axis. The output is 3d ndarray with nodules put in CTImagesBatch-compatible skyscraper structure.

  • data (ndarray) – CTImagesBatch skyscraper represented by 3D ndarray.
  • positions (ndarray(p, 3) of int) – Contains nodules’ starting indices in data along the [zyx] axes.
  • size (ndarray(3,) of int) – Contains nodules’ sizes along each axis (z,y,x).


Dtypes of positions and size arrays must be the same.

Returns:3d ndarray with nodules
Return type:ndarray
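
Cutting crops by start position and stacking them can be sketched as below. This is an illustration, not the library's routine: the helper name is hypothetical, and the skyscraper is assumed to be formed by concatenating crops along the z-axis.

```python
import numpy as np

def crops_to_skyscraper(data, positions, size):
    """Cut crops of shape `size` at the given (z, y, x) starts and stack them along z."""
    crops = [data[z:z + size[0], y:y + size[1], x:x + size[2]]
             for z, y, x in positions]
    return np.concatenate(crops, axis=0)

data = np.arange(6 * 6 * 6).reshape(6, 6, 6)
positions = np.array([[0, 0, 0], [2, 1, 1]])
size = np.array([2, 3, 3])
skyscraper = crops_to_skyscraper(data, positions, size)
```

For p crops of shape (z, y, x) the result has shape (p * z, y, x), so crop i occupies slices i * z to (i + 1) * z.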