CTImagesMaskedBatch

Batch class CTImagesMaskedBatch for storing CT-scans with masks.

class radio.preprocessing.ct_masked_batch.CTImagesMaskedBatch(index, *args, **kwargs)[source]

Bases: radio.preprocessing.ct_batch.CTImagesBatch

Batch class for storing batch of ct-scans with masks for nodules.

Loads info about cancer nodules, then creates cancer masks for each patient. The created masks are stored in self.masks.

Parameters:index (dataset.index) – ids of scans to be put in a batch
components

tuple of str – names of the data components of a batch: images, masks, origin and spacing. NOTE: Implementation of this attribute is required by the base class.

num_nodules

int – number of nodules in batch

images

ndarray – contains ct-scans for all patients in batch.

masks

ndarray – contains masks for all patients in batch.

nodules

np.recarray – contains info on the locations of cancer nodules. The record array contains the following information about nodules:

  • self.nodules.nodule_center – ndarray(num_nodules, 3) centers of nodules in world coords;
  • self.nodules.nodule_size – ndarray(num_nodules, 3) sizes of nodules along z, y, x in world coord;
  • self.nodules.img_size – ndarray(num_nodules, 3) sizes of images of patient data corresponding to nodules;
  • self.nodules.offset – ndarray(num_nodules, 3) position of individual patient scan inside batch;
  • self.nodules.spacing – ndarray(num_nodules, 3) of spacing attribute of patients which correspond to nodules;
  • self.nodules.origin – ndarray(num_nodules, 3) of origin attribute of patients which correspond to nodules.
binarize_mask(threshold=0.35)[source]

Binarize masks by threshold.

Parameters:threshold (float) – threshold for masks binarization.
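A minimal NumPy sketch of threshold binarization, not the library's implementation. It assumes that values strictly greater than threshold become 1 and all others become 0; the helper name binarize is hypothetical.

```python
import numpy as np

def binarize(masks, threshold=0.35):
    # Assumption: strictly-greater comparison; the real method may differ.
    return (masks > threshold).astype(masks.dtype)

masks = np.array([0.1, 0.35, 0.36, 0.9])
binary = binarize(masks)
```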
central_crop(crop_size, crop_mask=False, **kwargs)[source]

Make crop of crop_size from center of images.

Parameters:
  • crop_size (tuple, list or ndarray of int) – (z,y,x)-shape of the central crop along three axes (z,y,x order is used).
  • crop_mask (bool) – if True, crop the mask in the same way.
Return type:batch
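The idea of a central crop can be sketched on a single 3D array as follows. This is an illustration under the assumption that the crop is centered with integer division of the leftover margin; central_crop_3d is a hypothetical helper, not part of the library.

```python
import numpy as np

def central_crop_3d(scan, crop_size):
    crop_size = np.asarray(crop_size)
    shape = np.asarray(scan.shape)
    if np.any(crop_size > shape):
        raise ValueError("crop_size must not exceed the scan shape")
    # Assumption: leftover voxels are split evenly, extra one goes to the far side.
    start = (shape - crop_size) // 2
    stop = start + crop_size
    return scan[start[0]:stop[0], start[1]:stop[1], start[2]:stop[2]]

scan = np.arange(4 * 6 * 6).reshape(4, 6, 6)
crop = central_crop_3d(scan, (2, 4, 4))
```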

classification_targets(threshold=10, **kwargs)[source]

Unpack data from batch in format suitable for classification task.

Parameters:threshold (int) – minimum number of ‘1’ pixels in mask to consider it cancerous.
Returns:targets for classification task: labels corresponding to cancerous nodules (‘1’) and non-cancerous nodules (‘0’).
Return type:ndarray(batch_size, 1)
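A hedged sketch of how such labels could be derived from binary mask crops. It assumes a crop is labelled cancerous when it has at least threshold pixels equal to 1 (the exact comparison used by the library is not stated here); classification_labels is a hypothetical name.

```python
import numpy as np

def classification_labels(mask_crops, threshold=10):
    # Count '1' pixels in every crop of shape (batch_size, z, y, x).
    counts = mask_crops.reshape(len(mask_crops), -1).sum(axis=1)
    # Assumption: "minimum number" is read as an inclusive bound.
    return (counts >= threshold).astype(np.float32)[:, np.newaxis]

crops = np.zeros((3, 2, 4, 4))
crops[1].flat[:5] = 1    # 5 positive pixels, below the threshold
crops[2].flat[:12] = 1   # 12 positive pixels, above the threshold
labels = classification_labels(crops)
```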
components = ('images', 'masks', 'spacing', 'origin')
create_mask()[source]

Create masks component from nodules component.

Notes

nodules must not be None before calling this method. See fetch_nodules_info() for more details.
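To illustrate how nodule info in world coordinates can be turned into a voxel mask, here is a sketch for a single nodule. It assumes the usual conversion voxel = (world - origin) / spacing and fills an axis-aligned box; the real create_mask may round or shape the region differently, and box_mask is a hypothetical helper.

```python
import numpy as np

def box_mask(shape, center_world, size_world, origin, spacing):
    shape = np.asarray(shape)
    # Convert world coordinates (z, y, x) to voxel coordinates.
    center_vox = (np.asarray(center_world) - origin) / spacing
    size_vox = np.asarray(size_world) / spacing
    # Clip the box to the image bounds.
    start = np.clip(np.rint(center_vox - size_vox / 2), 0, shape).astype(int)
    stop = np.clip(np.rint(center_vox + size_vox / 2), 0, shape).astype(int)
    mask = np.zeros(tuple(shape), dtype=np.uint8)
    mask[start[0]:stop[0], start[1]:stop[1], start[2]:stop[2]] = 1
    return mask

origin = np.array([-10.0, -10.0, -10.0])
spacing = np.array([2.0, 1.0, 1.0])
mask = box_mask((10, 20, 20), (0.0, 0.0, 0.0), (4.0, 6.0, 6.0), origin, spacing)
```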

fetch_mask(shape)[source]

Create masks component of a different size than images, using nodules component.

Parameters:shape (tuple, list or ndarray of int.) – (z_dim,y_dim,x_dim), shape of mask to be created.
Returns:3d array with masks in form of skyscraper.
Return type:ndarray

Notes

TODO: part of the code here repeats create_mask(); it would be better to unify these two functions.

fetch_nodules_from_mask(images_loaded=True)[source]

Fetch nodules info (centers and sizes) from masks.

Runs skimage.measure.label to fetch nodule regions from masks. Extracts nodules info from the segmented regions and puts this information into the self.nodules np.recarray.

Parameters:images_loaded (bool) – if True, the images component is assumed to be loaded, and image_size is used to compute correct nodule locations inside the skyscraper. If False, location info inside the skyscraper is not updated.
Returns:
Return type:batch

Notes

Sizes along [zyx] will be the same.

fetch_nodules_info(nodules=None, nodules_records=None, update=False, images_loaded=True)[source]

Extract nodules’ info from nodules into attribute self.nodules.

Parameters:
  • nodules (pd.DataFrame) –
    contains:
    • ’seriesuid’: index of patient or series.
    • ’coordZ’,’coordY’,’coordX’: coordinates of nodules center.
    • ’diameter_mm’: diameter, in mm.
  • nodules_records (np.recarray) – if not None, should contain the same fields as described in Notes.
  • update (bool) – if False, a warning appears to remind that nodules info will be erased and recomputed.
  • images_loaded (bool) – if True, the images component is assumed to be loaded, and image_size is used to compute correct nodule locations inside the skyscraper. If False, location info inside the skyscraper is not updated.
Return type:batch

Notes

Run this action only after load(). The method fills in record array self.nodules that contains the following information about nodules:

  • self.nodules.nodule_center – ndarray(num_nodules, 3) centers of nodules in world coords;
  • self.nodules.nodule_size – ndarray(num_nodules, 3) sizes of nodules along z, y, x in world coord;
  • self.nodules.img_size – ndarray(num_nodules, 3) sizes of images of patient data corresponding to nodules;
  • self.nodules.offset – ndarray(num_nodules, 3) of offsets (positions inside the batch) of patients which correspond to nodules;
  • self.nodules.spacing – ndarray(num_nodules, 3) of spacing attribute of patients which correspond to nodules;
  • self.nodules.origin – ndarray(num_nodules, 3) of origin attribute of patients which correspond to nodules;
  • self.nodules.patient_pos – ndarray(num_nodules, 1) refers to positions of patients which correspond to stored nodules.
flip()[source]

Invert the order of slices for each patient.

Returns:
Return type:batch

Examples

>>> batch = batch.flip()
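What flipping means for data stored as a skyscraper (all patients stacked along the z-axis) can be sketched as below. The bounds list marking where each patient starts and ends, and the helper flip_patients, are assumptions made for illustration only.

```python
import numpy as np

def flip_patients(skyscraper, bounds):
    flipped = skyscraper.copy()
    # Reverse the slice order inside each patient's own block only.
    for lower, upper in zip(bounds[:-1], bounds[1:]):
        flipped[lower:upper] = skyscraper[lower:upper][::-1]
    return flipped

# Two patients with 2 and 3 slices; every slice is filled with its z-index.
skyscraper = np.arange(5)[:, None, None] * np.ones((1, 2, 2), dtype=int)
bounds = [0, 2, 5]
flipped = flip_patients(skyscraper, bounds)
```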
get_axial_slice(patient_pos, height)[source]

Get tuple of images slice and masks slice by patient and slice position.

Parameters:
  • patient_pos (int) – patient position in the batch
  • height (float) – slice position along the z-axis, scaled to [0, 1]. The slice with index int(height * number_of_slices_for_patient) is taken from the patient’s scan and mask.
Returns:

(images_slice,masks_slice) by patient_pos and number of slice

Return type:

tuple
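The index arithmetic described for height can be sketched for a single patient's arrays as follows. axial_slice is a hypothetical helper; note that in this naive sketch height=1.0 would index past the last slice, which the real method may handle differently.

```python
import numpy as np

def axial_slice(images, masks, height):
    # Map the relative height in [0, 1) to an integer slice index.
    idx = int(height * images.shape[0])
    return images[idx], masks[idx]

images = np.arange(40)[:, None, None] * np.ones((1, 2, 2))
masks = np.zeros_like(images)
img_slice, mask_slice = axial_slice(images, masks, 0.5)
```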

get_pos(data, component, index)[source]

Return the position of an item for a given index in data or in self.`component`.

Fetch the correct position inside the batch for an item; looks for it in data, if provided, or in component in self.

Parameters:
  • data (None or ndarray) – data from which subsetting is done. If None, retrieve position from component of batch, if ndarray, returns index.
  • component (str) – name of a component, e.g. ‘images’. If component is provided, data should be None.
  • index (str or int) – index of an item to be looked for. May be a key from the dataset (str) or an index inside the batch (int).
Returns:

Position of item

Return type:

int

Notes

This is an overload of get_pos from base Batch-class, see corresponding docstring for detailed explanation.

static make_data_keras(batch, model=None, mode='segmentation', is_training=True, **kwargs)[source]

Prepare data in batch for training neural network implemented in keras.

Parameters:
  • mode (str) – mode can be one of the following: ‘classification’, ‘regression’ or ‘segmentation’. Default is ‘segmentation’.
  • data_format (str) – data format of batch data. Can be ‘channels_last’ or ‘channels_first’. Default is ‘channels_last’.
  • is_training (bool) – whether model is in training or prediction mode. Default is True.
  • threshold (int) – threshold value of ‘1’ pixels in masks to consider it cancerous. Default is 10.
Returns:

kwargs for keras model train method: {‘x’: ndarray(…), ‘y’: ndarray(…)} for training neural network.

Return type:

dict or None

static make_data_tf(batch, model=None, mode='segmentation', is_training=True, **kwargs)[source]

Prepare data in batch for training neural network implemented in tensorflow.

Parameters:
  • mode (str) – mode can be one of the following: ‘classification’, ‘regression’ or ‘segmentation’. Default is ‘segmentation’.
  • data_format (str) – data format of batch data. Can be ‘channels_last’ or ‘channels_first’. Default is ‘channels_last’.
  • is_training (bool) – whether model is in training or prediction mode. Default is True.
  • threshold (int) – threshold value of ‘1’ pixels in masks to consider it cancerous. Default is 10.
Returns:

feed dict and fetches for training neural network.

Return type:

dict or None

static make_indices(size)[source]

Generate list of batch indices of given size.

Parameters:size (int) – size of list with indices
Returns:list of random indices
Return type:list

Examples

>>> indices = CTImagesMaskedBatch.make_indices(20)
>>> indices
array(['3c3eb09b', '5b192d1f', 'f28ddbb0', '14460196', '31a92510',
       '3f324e44', '066ccf28', '5570938d', '5d1fb8f6', '539ea09c',
       '68f9f235', '8f7b0c49', 'c7903591', 'dc8e9504', '54e9eebc',
       '778abd5a', '99691fc6', '7da49e85', '0f343345', '876fb9e6'], dtype='<U8')
make_xip(depth, stride=1, mode='max', projection='axial', padding='reflect', **kwargs)[source]

Make intensity projection (maximum, minimum, mean or median).

Notice that axis is chosen according to projection argument.

Parameters:
  • depth (int) – number of slices over which xip operation is performed.
  • stride (int) – stride-step along projection dimension.
  • mode (str) – Possible values are ‘max’, ‘min’, ‘mean’ or ‘median’.
  • projection (str) – Possible values: ‘axial’, ‘coronal’, ‘sagital’. In case of ‘coronal’ and ‘sagital’ projections tensor will be transposed from [z,y,x] to [x,z,y] and [y,z,x].
  • padding (str) – mode of padding that will be passed to the numpy.pad function.
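A simplified sketch of an axial intensity projection over sliding windows of depth slices. It deliberately skips padding and the coronal/sagital transpositions, so the number of output slices here is an assumption of this sketch rather than the behaviour of make_xip; xip_axial is a hypothetical helper.

```python
import numpy as np

def xip_axial(scan, depth, stride=1, mode='max'):
    reducers = {'max': np.max, 'min': np.min,
                'mean': np.mean, 'median': np.median}
    reduce_fn = reducers[mode]
    # Window start positions along z; no padding is applied in this sketch.
    starts = range(0, scan.shape[0] - depth + 1, stride)
    return np.stack([reduce_fn(scan[s:s + depth], axis=0) for s in starts])

# Six slices, each filled with its own z-index.
scan = np.arange(6, dtype=float)[:, None, None] * np.ones((1, 2, 2))
mip = xip_axial(scan, depth=3, stride=1, mode='max')
mean_ip = xip_axial(scan, depth=3, stride=2, mode='mean')
```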
nodules_dtype = dtype([('patient_pos', '<i4'), ('offset', '<i4', (3,)), ('img_size', '<i4', (3,)), ('nodule_center', '<f8', (3,)), ('nodule_size', '<f8', (3,)), ('spacing', '<f8', (3,)), ('origin', '<f8', (3,))])
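For orientation, a record array with this dtype can be built and accessed with plain NumPy as sketched below. The values are made up for illustration; only the dtype is taken from the attribute above.

```python
import numpy as np

nodules_dtype = np.dtype([('patient_pos', '<i4'), ('offset', '<i4', (3,)),
                          ('img_size', '<i4', (3,)), ('nodule_center', '<f8', (3,)),
                          ('nodule_size', '<f8', (3,)), ('spacing', '<f8', (3,)),
                          ('origin', '<f8', (3,))])

# Two empty nodule records, viewed as a recarray for attribute access.
nodules = np.zeros(2, dtype=nodules_dtype).view(np.recarray)
nodules.patient_pos[:] = [0, 1]
nodules.nodule_center[0] = (10.0, 20.0, 30.0)  # illustrative (z, y, x) center
```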
nodules_to_df(nodules)[source]

Convert nodules_info ndarray into pandas dataframe.

The pandas DataFrame will contain the following columns:

  • ‘source_id’ – id of source element of batch;
  • ‘nodule_id’ – generated id for nodules;
  • ‘locZ’, ‘locY’, ‘locX’ – coordinates of nodules’ centers;
  • ‘diamZ’, ‘diamY’, ‘diamX’ – sizes of nodules along zyx axes.

Parameters:nodules (ndarray of type nodules_info) – nodules_info type is defined inside of CTImagesMaskedBatch class.
Returns:centers, ids and sizes of nodules.
Return type:pd.DataFrame
num_nodules

Get number of nodules in CTImagesMaskedBatch.

Returns:number of nodules in CTImagesMaskedBatch. If the fetch_nodules_info method has not been called yet, returns 0.
Return type:int
predict_on_scan(model_name, strides=(16, 32, 32), crop_shape=(32, 64, 64), batch_size=4, targets_mode='segmentation', data_format='channels_last', show_progress=True, model_type='tf')[source]

Get predictions of the model on data contained in batch.

Transforms scan data into patches of shape crop_shape and then feeds these patches sequentially into the model with the name specified by argument ‘model_name’; after that, loads predicted masks or probabilities into the ‘masks’ component of the current batch and returns it.

Parameters:
  • model_name (str) – name of model that will be used for predictions.
  • strides (tuple, list or ndarray of int) – (z,y,x)-strides for patching operation.
  • crop_shape (tuple, list or ndarray of int) – (z,y,x)-shape of crops.
  • batch_size (int) – number of patches to feed in model in one iteration.
  • targets_mode (str) – type of targets ‘segmentation’, ‘regression’ or ‘classification’.
  • data_format (str) – format of neural network input data, can be ‘channels_first’ or ‘channels_last’.
  • model_type (str) – represents type of model that will be used for prediction. Possible values are ‘keras’ or ‘tf’.
Return type:CTImagesMaskedBatch
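How strides and crop_shape interact can be illustrated by enumerating patch start positions on one scan. This sketch assumes no padding, so only patches that fit entirely inside the scan are produced; the library may pad the scan instead. patch_starts is a hypothetical helper.

```python
import itertools
import numpy as np

def patch_starts(scan_shape, crop_shape, strides):
    # Per-axis start positions of patches that fit inside the scan.
    axes = [range(0, dim - crop + 1, stride)
            for dim, crop, stride in zip(scan_shape, crop_shape, strides)]
    return np.array(list(itertools.product(*axes)))

starts = patch_starts((64, 128, 128), (32, 64, 64), (16, 32, 32))
```

With the default strides and crop_shape, a scan of shape (64, 128, 128) yields 3 positions per axis, so 27 patches would be fed to the model in groups of batch_size.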

regression_targets(threshold=10, **kwargs)[source]

Unpack data from batch in format suitable for regression task.

Parameters:threshold (int) – minimum number of ‘1’ pixels in mask to consider it cancerous.
Returns:targets for regression task: cancer center, size and label (1 for cancerous and 0 for non-cancerous). Note that for a non-cancerous crop, the first 6 columns of the output array will be set to zero.
Return type:ndarray(batch_size, 7)
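The layout of the (batch_size, 7) target array can be sketched as below. The column order (three center coordinates, three sizes, then the label) is an assumption based on the description above, and regression_targets_sketch is a hypothetical helper.

```python
import numpy as np

def regression_targets_sketch(centers, sizes, labels):
    labels = np.asarray(labels, dtype=np.float64).reshape(-1, 1)
    # Assumed column order: center (z, y, x), size (z, y, x), label.
    targets = np.hstack([centers, sizes, labels])
    # Non-cancerous crops get zeros in the first six columns.
    targets[labels[:, 0] == 0, :6] = 0.0
    return targets

centers = np.array([[16.0, 32.0, 32.0], [5.0, 6.0, 7.0]])
sizes = np.array([[4.0, 8.0, 8.0], [1.0, 1.0, 1.0]])
targets = regression_targets_sketch(centers, sizes, [1, 0])
```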
sample_dump(dst, n_iters, nodule_size=(32, 64, 64), batch_size=20, share=0.8, **kwargs)[source]

Perform sample_nodules and dump on the same batch n_iters times.

Can be used for fast creation of large datasets of cancerous/non-cancerous crops.

Parameters:
  • dst (str) – folder to dump nodules in.
  • n_iters (int) – number of iterations to be performed.
  • nodule_size (tuple, list or ndarray of int) – (z,y,x)-shape of sampled nodules.
  • batch_size (int or None) – size of generated batches.
  • share (float) – share of cancer nodules. See docstring of sample_nodules for more info about possible combinations of parameters share and batch_size.
  • **kwargs (dict) – additional arguments supplied into sample_nodules. See docstring of sample_nodules for more info.
sample_nodules(batch_size, nodule_size=(32, 64, 64), share=0.8, variance=None, mask_shape=None, histo=None)[source]

Sample random crops of images and masks from batch.

Create random crops, both with and without nodules in it, from input batch.

Parameters:
  • batch_size (int) – number of nodules in the output batch. Required, if share=0.0. If None, resulting batch will include all cancerous nodules.
  • nodule_size (tuple, list or ndarray of int) – crop shape along (z,y,x).
  • share (float) – share of cancer crops in the batch. If the input CTImagesBatch contains fewer cancer nodules than needed, random nodules will be taken.
  • variance (tuple, list or ndarray of float) – variances of normally distributed random shifts of nodules’ start positions.
  • mask_shape (tuple, list or ndarray of int) – size of masks crop in (z,y,x)-order. If not None, crops with masks would be of mask_shape. If None, mask crop shape would be equal to crop_size.
  • histo (tuple) – np.histogram()’s output. Used for sampling non-cancerous crops.
Returns:

batch with cancerous and non-cancerous crops in a proportion defined by share, with batch_size nodules in total. If share == 1.0 and batch_size is None, the resulting batch consists of all cancerous crops stored in the batch.

Return type:

Batch

sample_random_nodules(num_nodules, nodule_size, histo=None)[source]

Sample random nodules positions in CTImagesMaskedBatch.

Samples random nodules positions in ndarray. Each nodule has the shape defined by nodule_size. If the size of patients’ data along the z-axis is not the same for different patients, NotImplementedError will be raised.

Parameters:
  • num_nodules (int) – number of nodules to sample from dataset.
  • nodule_size (ndarray(3, )) – crop shape along (z,y,x).
  • histo (tuple) – np.histogram()’s output. 3d-histogram, represented by tuple (bins, edges).
Returns:

ndarray(num_nodules, 3). 1st array’s dim is an index of sampled nodules, 2nd points out start positions (integers) of nodules in batch skyscraper.

Return type:

ndarray
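A sketch of uniform sampling of crop start positions inside a skyscraper, under the assumption that all patients share the same image shape and that crops must fit inside a single patient. It ignores the histo argument entirely; sample_starts and its extra parameters are hypothetical.

```python
import numpy as np

def sample_starts(num_nodules, nodule_size, img_shape, num_patients, rng):
    nodule_size = np.asarray(nodule_size)
    img_shape = np.asarray(img_shape)
    # Largest valid start index per axis so the crop fits inside one patient.
    max_start = img_shape - nodule_size
    starts = rng.integers(0, max_start + 1, size=(num_nodules, 3))
    # Shift z by the offset of a randomly chosen patient in the skyscraper.
    patients = rng.integers(0, num_patients, size=num_nodules)
    starts[:, 0] += patients * img_shape[0]
    return starts

rng = np.random.default_rng(0)
starts = sample_starts(5, (32, 64, 64), (128, 256, 256), num_patients=3, rng=rng)
```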

segmentation_targets(data_format='channels_last', **kwargs)[source]

Unpack data from batch in format suitable for segmentation task.

Parameters:data_format (str) – data_format shows where to put new axis for channels dimension: can be ‘channels_last’ or ‘channels_first’.
Returns:batch array with masks.
Return type:ndarray(batch_size, ..)
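The effect of data_format on the array shape can be sketched with NumPy alone. The helper add_channels_axis is hypothetical and only shows where the new channels axis lands for a stack of crops of shape (batch_size, z, y, x).

```python
import numpy as np

def add_channels_axis(crops, data_format='channels_last'):
    if data_format == 'channels_last':
        return crops[..., np.newaxis]   # (batch, z, y, x, 1)
    if data_format == 'channels_first':
        return crops[:, np.newaxis]     # (batch, 1, z, y, x)
    raise ValueError("data_format must be 'channels_last' or 'channels_first'")

crops = np.zeros((4, 2, 3, 3))
last = add_channels_axis(crops, 'channels_last')
first = add_channels_axis(crops, 'channels_first')
```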
unpack(component='images', **kwargs)[source]

Basic way for unpacking components from batch.

Parameters:
  • component (str) – component to unpack, can be ‘images’ or ‘masks’.
  • data_format (str) – can be ‘channels_last’ or ‘channels_first’. Reflects where to put channels dimension: right after batch dimension or after all spatial axes.
  • kwargs (dict) – keyword arguments that will be passed to the callable if the component argument refers to a method of the batch class.
Return type:ndarray(batch_size, ..) or None

update_nodules_histo(histo)[source]

Update histogram of nodules’ locations using nodules locations from batch.

Parameters:histo (list) – list(np.histogram()), used for sampling cancerous locations.

Notes

Execute action only after .fetch_nodules_info().
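One plausible way to maintain such a histogram is to accumulate np.histogramdd counts over batches, as sketched below. The additive update, the [counts, edges] layout and the helper update_histo are assumptions of this sketch, not a statement about the method's internals.

```python
import numpy as np

# A 3D histogram stored as [counts, edges] with two bins per axis.
edges = [np.linspace(0.0, 4.0, 3) for _ in range(3)]
histo = [np.zeros((2, 2, 2)), edges]

def update_histo(histo, centers):
    # Bin nodule centers with the existing edges and add to running counts.
    counts, _ = np.histogramdd(centers, bins=histo[1])
    histo[0] += counts

centers = np.array([[1.0, 1.0, 1.0], [3.0, 3.0, 3.0], [3.5, 0.5, 1.0]])
update_histo(histo, centers)  # first batch
update_histo(histo, centers)  # second batch with the same nodules
```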

radio.preprocessing.ct_masked_batch.get_nodules_numba[source]

Fetch nodules from array by starting positions.

Takes an array with data of shape (z, y, x) from the batch, positions, an ndarray(p, 3) with starting indices of nodules, where p is the number of nodules, and size, an ndarray(3, ) which contains sizes of nodules along each axis. The output is a 3d ndarray with nodules put in a CTImagesBatch-compatible skyscraper structure.

Parameters:
  • data (ndarray) – CTImagesBatch skyscraper represented by 3D ndarray.
  • positions (ndarray(p, 3) of int) – Contains nodules’ starting indices along the [zyx] axes in data.
  • size (ndarray(3,) of int) – Contains nodules’ sizes along each axis (z,y,x).

Notes

Dtypes of positions and size arrays must be the same.

Returns:3d ndarray with nodules
Return type:ndarray
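A pure NumPy equivalent of the described behaviour, given as a sketch without numba compilation. get_nodules is a hypothetical name, and the sketch assumes every crop fits entirely inside data.

```python
import numpy as np

def get_nodules(data, positions, size):
    # Output skyscraper: crops are stacked one after another along z.
    out = np.zeros((len(positions) * size[0], size[1], size[2]), dtype=data.dtype)
    for i, (z, y, x) in enumerate(positions):
        out[i * size[0]:(i + 1) * size[0]] = data[z:z + size[0],
                                                  y:y + size[1],
                                                  x:x + size[2]]
    return out

data = np.arange(4 * 5 * 5).reshape(4, 5, 5)
positions = np.array([[0, 0, 0], [2, 1, 1]])
size = np.array([2, 3, 3])
crops = get_nodules(data, positions, size)
```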