CTImagesMaskedBatch¶

Batch class CTImagesMaskedBatch for storing CT-scans with masks.

class radio.preprocessing.ct_masked_batch.CTImagesMaskedBatch(index, *args, **kwargs)[source]¶

Bases: radio.preprocessing.ct_batch.CTImagesBatch

Batch class for storing a batch of CT scans together with masks for nodules.

Loads information about cancer nodules and then creates cancer masks for each patient. The created masks are stored in self.masks.

Parameters:	index (dataset.index) – ids of scans to be put in a batch

components¶: tuple of str – names of the data components of a batch: images, masks, origin and spacing. NOTE: the base class requires this attribute to be implemented.

num_nodules¶: int – number of nodules in batch

images¶: ndarray – contains ct-scans for all patients in batch.

masks¶: ndarray – contains masks for all patients in batch.

nodules¶

np.recarray – contains information on the location of cancer nodules. The record array has the following fields:

self.nodules.nodule_center – ndarray(num_nodules, 3) centers of nodules in world coords;

self.nodules.nodule_size – ndarray(num_nodules, 3) sizes of nodules along z, y, x in world coord;

self.nodules.img_size – ndarray(num_nodules, 3) sizes of images of patient data corresponding to nodules;

self.nodules.offset – ndarray(num_nodules, 3) position of individual patient scan inside batch;

self.nodules.spacing – ndarray(num_nodules, 3) of spacing attribute of patients which correspond to nodules;

self.nodules.origin – ndarray(num_nodules, 3) of origin attribute of patients which correspond to nodules.

binarize_mask(threshold=0.35)[source]¶

Binarize masks by threshold.

Parameters:	threshold (float) – threshold for masks binarization.
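A minimal sketch of threshold binarization on a plain NumPy array, not the method's actual implementation. The helper name binarize_sketch is hypothetical, and the strict comparison (values equal to the threshold map to 0) is an assumption.

```python
import numpy as np


def binarize_sketch(masks, threshold=0.35):
    """Return a 0/1 mask; the strict '>' comparison is an assumption."""
    return (masks > threshold).astype(np.uint8)


# Soft mask with values in [0, 1], e.g. predicted probabilities.
soft_mask = np.array([[0.10, 0.30], [0.50, 0.90]])
binary_mask = binarize_sketch(soft_mask)
```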

central_crop(crop_size, crop_mask=False, **kwargs)[source]¶

Make crop of crop_size from center of images.

Parameters:

crop_size (tuple, list or ndarray of int) – (z,y,x)-shape of the central crop along three axes (z,y,x order is used).
crop_mask (bool) – if True, crop the mask in the same way.

Returns:
Return type:	batch
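The idea of a central crop can be sketched on a single 3D array as follows. The helper central_crop_sketch is hypothetical; it works on one volume rather than on a whole batch, and the floor division used to place the crop is an assumption about rounding.

```python
import numpy as np


def central_crop_sketch(volume, crop_size):
    """Cut a (z, y, x) box of crop_size out of the center of a 3D array."""
    shape = np.asarray(volume.shape)
    crop_size = np.asarray(crop_size)
    if np.any(crop_size > shape):
        raise ValueError("crop_size must not exceed the volume shape")
    # Start index of the crop along each axis (rounding down is an assumption).
    z, y, x = (shape - crop_size) // 2
    dz, dy, dx = crop_size
    return volume[z:z + dz, y:y + dy, x:x + dx]


volume = np.arange(4 * 6 * 8).reshape(4, 6, 8)
crop = central_crop_sketch(volume, (2, 2, 4))
```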

classification_targets(threshold=10, **kwargs)[source]¶

Unpack data from batch in format suitable for classification task.

Parameters:	threshold (int) – minimum number of ‘1’ pixels in mask to consider it cancerous.
Returns:	targets for classification task: labels corresponding to cancerous nodules (‘1’) and non-cancerous nodules (‘0’).
Return type:	ndarray(batch_size, 1)
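As a rough illustration of how such labels can be derived, the sketch below counts the ‘1’ pixels in each mask crop and compares the count with the threshold. The helper name is hypothetical, and whether the real method uses an inclusive or exclusive comparison is not stated here, so the ">=" below is an assumption.

```python
import numpy as np


def classification_targets_sketch(mask_crops, threshold=10):
    """Label each crop 1 if its mask has at least `threshold` ones, else 0."""
    ones_per_crop = mask_crops.reshape(len(mask_crops), -1).sum(axis=1)
    # Inclusive comparison is an assumption of this sketch.
    return (ones_per_crop >= threshold).astype(np.int64).reshape(-1, 1)


# Two crops of shape (3, 3, 3): an empty mask and a fully filled mask.
mask_crops = np.stack([
    np.zeros((3, 3, 3), dtype=np.uint8),
    np.ones((3, 3, 3), dtype=np.uint8),
])
labels = classification_targets_sketch(mask_crops)
```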

components = ('images', 'masks', 'spacing', 'origin')

create_mask()[source]¶

Create masks component from nodules component.

Notes

nodules must not be None before calling this method; see fetch_nodules_info() for more details.
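One way to picture mask creation is filling a box around each nodule center. The sketch below does this for a single nodule given in voxel coordinates; the helper box_mask_sketch is hypothetical, and both the box shape and the rounding of the box borders are assumptions rather than the documented behaviour.

```python
import numpy as np


def box_mask_sketch(shape, center, size):
    """Fill a box of `size` voxels centered at `center` with ones."""
    center = np.asarray(center, dtype=float)
    size = np.asarray(size, dtype=float)
    # Clip the box to the volume so nodules near the border stay valid.
    start = np.clip(np.rint(center - size / 2).astype(int), 0, shape)
    stop = np.clip(np.rint(center + size / 2).astype(int), 0, shape)
    mask = np.zeros(shape, dtype=np.uint8)
    mask[start[0]:stop[0], start[1]:stop[1], start[2]:stop[2]] = 1
    return mask


mask = box_mask_sketch((8, 8, 8), center=(4, 4, 4), size=(2, 4, 4))
```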

fetch_mask(shape)[source]¶

Create a masks component of a different size than images, using the nodules component.

Parameters:	shape (tuple, list or ndarray of int) – (z_dim,y_dim,x_dim), shape of the mask to be created.
Returns:	ndarray – 3d array with masks in the form of a skyscraper. TODO: part of this code repeats create_mask(); it would be better to unify the two functions.

fetch_nodules_from_mask(images_loaded=True)[source]¶

Fetch nodules info (centers and sizes) from masks.

Runs skimage.measure.label to fetch nodule regions from masks. Extracts nodules info from the segmented regions and puts this information into the self.nodules np.recarray.

Parameters:	images_loaded (bool) – if True, the images component is loaded and image_size is used to compute the correct nodule locations inside the skyscraper. If False, the location info inside the skyscraper is not updated.
Returns:
Return type:	batch

Notes

Sizes along [zyx] will be the same.

fetch_nodules_info(nodules=None, nodules_records=None, update=False, images_loaded=True)[source]¶

Extract nodules’ info from nodules into attribute self.nodules.

Parameters:

nodules (pd.DataFrame) – contains: ‘seriesuid’: index of patient or series; ‘coordZ’, ‘coordY’, ‘coordX’: coordinates of the nodule center; ‘diameter_mm’: diameter, in mm.
nodules_records (np.recarray) – if not None, should contain the same fields as described in Notes.
update (bool) – if False, a warning appears as a reminder that nodules info will be erased and recomputed.
images_loaded (bool) – if True, the images component is loaded and image_size is used to compute the correct nodule locations inside the skyscraper. If False, the location info inside the skyscraper is not updated.

Returns:
Return type:	batch

Notes

Run this action only after load(). The method fills in record array self.nodules that contains the following information about nodules:

self.nodules.nodule_center – ndarray(num_nodules, 3) centers of nodules in world coords;

self.nodules.nodule_size – ndarray(num_nodules, 3) sizes of nodules along z, y, x in world coord;

self.nodules.img_size – ndarray(num_nodules, 3) sizes of images of patient data corresponding to nodules;

self.nodules.offset – ndarray(num_nodules, 3) of offsets (positions inside the batch) of the patient scans which correspond to nodules;

self.nodules.spacing – ndarray(num_nodules, 3) of spacing attribute of patients which correspond to nodules;

self.nodules.origin – ndarray(num_nodules, 3) of origin attribute of patients which correspond to nodules.

self.nodules.patient_pos – ndarray(num_nodules, 1) refers to positions of patients which correspond to stored nodules.
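Since nodule centers are stored in world coordinates alongside each scan's origin and spacing, a conversion to voxel indices is often needed. The sketch below uses the usual axis-aligned relation voxel = (world - origin) / spacing; the helper name and the rounding to the nearest voxel are assumptions, not part of this class.

```python
import numpy as np


def world_to_voxel_sketch(center_world, origin, spacing):
    """Convert (z, y, x) world coordinates to integer voxel indices."""
    center_world = np.asarray(center_world, dtype=float)
    origin = np.asarray(origin, dtype=float)
    spacing = np.asarray(spacing, dtype=float)
    # Rounding to the nearest voxel is an assumption of this sketch.
    return np.rint((center_world - origin) / spacing).astype(int)


voxel = world_to_voxel_sketch(
    center_world=(-100.0, 50.0, 20.0),
    origin=(-120.0, 0.0, 0.0),
    spacing=(2.0, 0.5, 0.5),
)
```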

flip()[source]¶

Invert the order of slices for each patient

Returns:
Return type:	batch

Examples

>>> batch = batch.flip()
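The effect of the flip can be pictured on a plain array. The sketch assumes that scans are stacked along the z-axis in one "skyscraper" array and that a list of bounds marks where each patient starts and ends; both the helper and the bounds argument are hypothetical.

```python
import numpy as np


def flip_skyscraper_sketch(skyscraper, bounds):
    """Reverse the slice order of each patient inside a stacked array."""
    flipped = np.empty_like(skyscraper)
    for lower, upper in zip(bounds[:-1], bounds[1:]):
        # Reverse only the slices that belong to the current patient.
        flipped[lower:upper] = skyscraper[lower:upper][::-1]
    return flipped


# Two patients with 2 and 3 slices; each slice holds its own index.
skyscraper = np.arange(5).reshape(5, 1, 1)
flipped = flip_skyscraper_sketch(skyscraper, bounds=[0, 2, 5])
```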

get_axial_slice(patient_pos, height)[source]¶

Get tuple of images slice and masks slice by patient and slice position.

Parameters:

patient_pos (int) – patient position in the batch.
height (float) – slice position along the z-axis, scaled to [0, 1]; the slice with index int(height * number_of_slices_for_patient) is taken from the patient’s scan and mask.

Returns:	(images_slice, masks_slice) for the given patient_pos and slice number
Return type:	tuple
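The index arithmetic described for height can be sketched for one patient as follows. The helper is hypothetical and takes the patient's scan and mask directly; clamping the index so that height=1.0 stays in range is an assumption added for safety.

```python
import numpy as np


def axial_slice_sketch(images, masks, height):
    """Return (image_slice, mask_slice) at a relative z position in [0, 1]."""
    num_slices = images.shape[0]
    # int(height * num_slices) follows the description above; the clamp
    # to the last slice is an assumption of this sketch.
    index = min(int(height * num_slices), num_slices - 1)
    return images[index], masks[index]


images = np.arange(4 * 2 * 2).reshape(4, 2, 2)
masks = (images > 7).astype(np.uint8)
image_slice, mask_slice = axial_slice_sketch(images, masks, height=0.5)
```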

get_pos(data, component, index)[source]¶

Return the position of an item for a given index, either in data or in the batch component named by component.

Fetches the correct position inside the batch for an item: the item is looked up in data, if provided, or otherwise in the given component of the batch.

Parameters:

data (None or ndarray) – data from which subsetting is done. If None, the position is retrieved from the component of the batch; if ndarray, the index is returned.
component (str) – name of a component, e.g. ‘images’. If component is provided, data should be None.
index (str or int) – index of the item to look for. May be a key from the dataset (str) or an index inside the batch (int).

Returns:	Position of item
Return type:	int

Notes

This is an overload of get_pos from base Batch-class, see corresponding docstring for detailed explanation.

static make_data_keras(batch, model=None, mode='segmentation', is_training=True, **kwargs)[source]¶

Prepare data in batch for training neural network implemented in keras.

Parameters:

mode (str) – one of ‘classification’, ‘regression’ or ‘segmentation’. Default is ‘segmentation’.
data_format (str) – data format of the batch data. Can be ‘channels_last’ or ‘channels_first’. Default is ‘channels_last’.
is_training (bool) – whether the model is in training or prediction mode. Default is True.
threshold (int) – threshold on the number of ‘1’ pixels in a mask for it to be considered cancerous. Default is 10.

Returns:	kwargs for the keras model train method: {‘x’: ndarray(…), ‘y’: ndarray(…)} for training a neural network.
Return type:	dict or None

static make_data_tf(batch, model=None, mode='segmentation', is_training=True, **kwargs)[source]¶

Prepare data in batch for training neural network implemented in tensorflow.

Parameters:

mode (str) – one of ‘classification’, ‘regression’ or ‘segmentation’. Default is ‘segmentation’.
data_format (str) – data format of the batch data. Can be ‘channels_last’ or ‘channels_first’. Default is ‘channels_last’.
is_training (bool) – whether the model is in training or prediction mode. Default is True.
threshold (int) – threshold on the number of ‘1’ pixels in a mask for it to be considered cancerous. Default is 10.

Returns:	feed dict and fetches for training a neural network.
Return type:	dict or None

static make_indices(size)[source]¶

Generate list of batch indices of given size.

Parameters:	size (int) – size of list with indices
Returns:	list of random indices
Return type:	list

Examples

>>> indices = CTImagesMaskedBatch.make_indices(20)
>>> indices
array(['3c3eb09b', '5b192d1f', 'f28ddbb0', '14460196', '31a92510',
       '3f324e44', '066ccf28', '5570938d', '5d1fb8f6', '539ea09c',
       '68f9f235', '8f7b0c49', 'c7903591', 'dc8e9504', '54e9eebc',
       '778abd5a', '99691fc6', '7da49e85', '0f343345', '876fb9e6'], dtype='<U8')

make_xip(depth, stride=1, mode='max', projection='axial', padding='reflect', **kwargs)[source]¶

Make intensity projection (maximum, minimum, mean or median).

Notice that axis is chosen according to projection argument.

Parameters:

depth (int) – number of slices over which xip operation is performed.
stride (int) – stride-step along projection dimension.
mode (str) – Possible values are ‘max’, ‘min’, ‘mean’ or ‘median’.
projection (str) – Possible values: ‘axial’, ‘coronal’, ‘sagital’. In case of ‘coronal’ and ‘sagital’ projections tensor will be transposed from [z,y,x] to [x,z,y] and [y,z,x].
padding (str) – padding mode that will be passed to the numpy.pad function.
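A simplified picture of an intensity projection along the z-axis is given below. The helper xip_sketch is hypothetical: it handles only the axial case, applies no padding, and drops trailing slices that do not fill a whole window, all of which are simplifications relative to the parameters listed above.

```python
import numpy as np


def xip_sketch(volume, depth, stride=1, mode='max'):
    """Reduce consecutive slabs of `depth` slices along z into single slices."""
    reducers = {'max': np.max, 'min': np.min,
                'mean': np.mean, 'median': np.median}
    reduce_fn = reducers[mode]
    # Window start positions; no padding is applied in this sketch.
    starts = range(0, volume.shape[0] - depth + 1, stride)
    return np.stack([reduce_fn(volume[s:s + depth], axis=0) for s in starts])


# Values grow with z, so the maximum of each slab is its last slice.
volume = np.arange(4 * 2 * 2, dtype=float).reshape(4, 2, 2)
mip = xip_sketch(volume, depth=2, stride=2, mode='max')
mean_ip = xip_sketch(volume, depth=4, stride=1, mode='mean')
```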

nodules_dtype = dtype([('patient_pos', '<i4'), ('offset', '<i4', (3,)), ('img_size', '<i4', (3,)), ('nodule_center', '<f8', (3,)), ('nodule_size', '<f8', (3,)), ('spacing', '<f8', (3,)), ('origin', '<f8', (3,))])¶
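To show how a record array with this dtype behaves, the sketch below rebuilds the dtype from the listing above and fills two records with made-up values. Only NumPy is used; the numbers are illustrative, not real nodule data.

```python
import numpy as np

# Same field layout as nodules_dtype above.
nodules_dtype = np.dtype([
    ('patient_pos', '<i4'),
    ('offset', '<i4', (3,)),
    ('img_size', '<i4', (3,)),
    ('nodule_center', '<f8', (3,)),
    ('nodule_size', '<f8', (3,)),
    ('spacing', '<f8', (3,)),
    ('origin', '<f8', (3,)),
])

raw = np.zeros(2, dtype=nodules_dtype)
raw['patient_pos'] = [0, 1]
raw['nodule_center'] = [[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]]
raw['nodule_size'] = [[4.0, 4.0, 4.0], [6.0, 6.0, 6.0]]
raw['spacing'] = 1.0  # broadcast to every record and axis

# Viewing as a recarray enables attribute access such as nodules.nodule_center.
nodules = raw.view(np.recarray)
```

With this dtype, fields declared with shape (3,) come back as arrays of shape (num_nodules, 3), while patient_pos is a scalar field and comes back with shape (num_nodules,).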

nodules_to_df(nodules)[source]¶

Convert nodules_info ndarray into pandas dataframe.

The pandas DataFrame will contain the following columns:

‘source_id’ – id of the source element of the batch;
‘nodule_id’ – generated id for nodules;
‘locZ’, ‘locY’, ‘locX’ – coordinates of nodules’ centers;
‘diamZ’, ‘diamY’, ‘diamX’ – sizes of nodules along the zyx axes.

Parameters:	nodules (ndarray of type nodules_info) – nodules_info type is defined inside of CTImagesMaskedBatch class.
Returns:	centers, ids and sizes of nodules.
Return type:	pd.DataFrame

num_nodules

Get number of nodules in CTImagesMaskedBatch.

Returns:	number of nodules in CTImagesMaskedBatch. Returns 0 if the fetch_nodules_info method has not been called yet.
Return type:	int

predict_on_scan(model_name, strides=(16, 32, 32), crop_shape=(32, 64, 64), batch_size=4, targets_mode='segmentation', data_format='channels_last', show_progress=True, model_type='tf')[source]¶

Get predictions of the model on data contained in batch.

Transforms scan data into patches of shape crop_shape and then feeds these patches sequentially into the model with the name specified by the model_name argument; after that, loads the predicted masks or probabilities into the masks component of the current batch and returns it.

Parameters:

model_name (str) – name of the model that will be used for predictions.
strides (tuple, list or ndarray of int) – (z,y,x)-strides for the patching operation.
crop_shape (tuple, list or ndarray of int) – (z,y,x)-shape of crops.
batch_size (int) – number of patches to feed into the model in one iteration.
targets_mode (str) – type of targets: ‘segmentation’, ‘regression’ or ‘classification’.
data_format (str) – format of the neural network input data, can be ‘channels_first’ or ‘channels_last’.
model_type (str) – type of the model that will be used for prediction. Possible values are ‘keras’ or ‘tf’.

Returns:
Return type:	CTImagesMaskedBatch.
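To get a feel for how strides and crop_shape interact, the sketch below enumerates patch start positions for one scan. The helper patch_starts_sketch is hypothetical and assumes no padding, so patches that would run past the scan border are simply not generated; the real method may handle borders differently.

```python
from itertools import product


def patch_starts_sketch(scan_shape, crop_shape, strides):
    """List (z, y, x) start indices of all patches that fit inside the scan."""
    per_axis = [
        range(0, dim - crop + 1, stride)
        for dim, crop, stride in zip(scan_shape, crop_shape, strides)
    ]
    return list(product(*per_axis))


# Default crop_shape and strides from the signature above.
starts = patch_starts_sketch(
    scan_shape=(64, 128, 128),
    crop_shape=(32, 64, 64),
    strides=(16, 32, 32),
)
```

With these values each axis yields three start positions, so the scan is covered by 27 overlapping patches.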

regression_targets(threshold=10, **kwargs)[source]¶

Unpack data from batch in format suitable for regression task.

Parameters:	threshold (int) – minimum number of ‘1’ pixels in mask to consider it cancerous.
Returns:	targets for regression task: cancer center, size and label (1 for cancerous and 0 for non-cancerous). Note that for a non-cancerous crop the first 6 columns of the output array are set to zero.
Return type:	ndarray(batch_size, 7)
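The layout of the 7-column target array can be sketched as follows. The helper is hypothetical and takes centers, sizes and labels as ready-made arrays; the column order (center, size, label) follows the description above.

```python
import numpy as np


def regression_targets_sketch(centers, sizes, labels):
    """Pack center (3), size (3) and label (1) into an (n, 7) array."""
    labels = np.asarray(labels)
    targets = np.zeros((len(labels), 7), dtype=float)
    targets[:, :3] = centers
    targets[:, 3:6] = sizes
    targets[:, 6] = labels
    # Non-cancerous crops carry no location info: zero the first 6 columns.
    targets[labels == 0, :6] = 0.0
    return targets


targets = regression_targets_sketch(
    centers=[[5.0, 6.0, 7.0], [1.0, 2.0, 3.0]],
    sizes=[[2.0, 2.0, 2.0], [4.0, 4.0, 4.0]],
    labels=[1, 0],
)
```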

sample_dump(dst, n_iters, nodule_size=(32, 64, 64), batch_size=20, share=0.8, **kwargs)[source]¶

Perform sample_nodules and dump on the same batch n_iters times.

Can be used for fast creation of large datasets of cancerous/non-cancerous crops.

Parameters:

dst (str) – folder to dump nodules in.
n_iters (int) – number of iterations to be performed.
nodule_size (tuple, list or ndarray of int) – (z,y,x)-shape of sampled nodules.
batch_size (int or None) – size of generated batches.
share (float) – share of cancer nodules. See docstring of sample_nodules for more info about possible combinations of parameters share and batch_size.
**kwargs (dict) – additional arguments supplied into sample_nodules. See docstring of sample_nodules for more info.

sample_nodules(batch_size, nodule_size=(32, 64, 64), share=0.8, variance=None, mask_shape=None, histo=None)[source]¶

Sample random crops of images and masks from batch.

Create random crops, both with and without nodules in them, from the input batch.

Parameters:

batch_size (int) – number of nodules in the output batch. Required if share=0.0. If None, the resulting batch will include all cancerous nodules.
nodule_size (tuple, list or ndarray of int) – crop shape along (z,y,x).
share (float) – share of cancer crops in the batch. If the input CTImagesBatch contains fewer cancer nodules than needed, random nodules will be taken.
variance (tuple, list or ndarray of float) – variances of normally distributed random shifts of nodules’ start positions.
mask_shape (tuple, list or ndarray of int) – size of the mask crop in (z,y,x)-order. If not None, crops with masks will be of mask_shape. If None, the mask crop shape will be equal to the crop shape (nodule_size).
histo (tuple) – np.histogram()’s output. Used for sampling non-cancerous crops.

Returns:	batch with cancerous and non-cancerous crops in a proportion defined by share, with batch_size nodules in total. If share == 1.0 and batch_size is None, the resulting batch consists of all cancerous crops stored in the batch.
Return type:	Batch

sample_random_nodules(num_nodules, nodule_size, histo=None)[source]¶

Sample random nodule positions in CTImagesMaskedBatch.

Samples random nodule positions in an ndarray. Each nodule has the shape defined by nodule_size. If the size of patients’ data along the z-axis is not the same for different patients, NotImplementedError will be raised.

Parameters:

num_nodules (int) – number of nodules to sample from the dataset.
nodule_size (ndarray(3, )) – crop shape along (z,y,x).
histo (tuple) – np.histogram()’s output. 3d-histogram, represented by tuple (bins, edges).

Returns:	ndarray(num_nodules, 3). The first dimension indexes the sampled nodules, the second holds the start positions (integers) of nodules in the batch skyscraper.
Return type:	ndarray
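Uniform sampling of start positions can be sketched for a single volume as below. The helper is hypothetical; it ignores the histo argument and the per-patient layout of the skyscraper, and only guarantees that every sampled crop fits inside the given volume.

```python
import numpy as np


def sample_start_positions_sketch(num_nodules, volume_shape, nodule_size, rng):
    """Sample integer (z, y, x) start positions so that each crop fits."""
    # Largest valid start along each axis is dim - size (upper bound exclusive).
    upper = np.asarray(volume_shape) - np.asarray(nodule_size) + 1
    columns = [rng.integers(0, high, size=num_nodules) for high in upper]
    return np.stack(columns, axis=1)


rng = np.random.default_rng(0)
volume_shape = (16, 32, 32)
nodule_size = (4, 8, 8)
positions = sample_start_positions_sketch(5, volume_shape, nodule_size, rng)
```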

segmentation_targets(data_format='channels_last', **kwargs)[source]¶

Unpack data from batch in format suitable for segmentation task.

Parameters:	data_format (str) – data_format shows where to put new axis for channels dimension: can be ‘channels_last’ or ‘channels_first’.
Returns:	batch array with masks.
Return type:	ndarray(batch_size, ..)
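What data_format changes in practice is only where the new channels axis is inserted. A minimal sketch with a hypothetical helper, assuming crops are already stacked as an array of shape (batch_size, z, y, x):

```python
import numpy as np


def add_channels_axis_sketch(crops, data_format='channels_last'):
    """Insert a channels axis of length 1 into a (batch, z, y, x) array."""
    if data_format == 'channels_last':
        return crops[..., np.newaxis]   # (batch, z, y, x, 1)
    if data_format == 'channels_first':
        return crops[:, np.newaxis]     # (batch, 1, z, y, x)
    raise ValueError("data_format must be 'channels_last' or 'channels_first'")


crops = np.zeros((2, 4, 8, 8))
last = add_channels_axis_sketch(crops, 'channels_last')
first = add_channels_axis_sketch(crops, 'channels_first')
```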

unpack(component='images', **kwargs)[source]¶

Basic way for unpacking components from batch.

Parameters:

component (str) – component to unpack, can be ‘images’ or ‘masks’.
data_format (str) – can be ‘channels_last’ or ‘channels_first’. Determines where to put the channels dimension: ‘channels_first’ puts it right after the batch dimension, ‘channels_last’ puts it after all spatial axes.
kwargs (dict) – keyword arguments that will be passed to the callable if the component argument refers to a method of the batch class.

Returns:
Return type:	ndarray(batch_size, ..) or None

update_nodules_histo(histo)[source]¶

Update histogram of nodules’ locations using nodules locations from batch.

Parameters:	histo (list) – list(np.histogram()), used for sampling cancerous locations.

Notes

Execute action only after .fetch_nodules_info().
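One possible way to accumulate such a histogram is shown below. It is only a sketch: the helper is hypothetical, np.histogramdd is used here as a convenient way to build a 3D histogram, and the [counts, edges] layout of histo is an assumption modelled on the output of NumPy's histogram functions.

```python
import numpy as np


def update_histo_sketch(histo, centers):
    """Add the counts of new nodule centers to an existing 3D histogram."""
    counts, edges = histo
    # Reuse the existing bin edges so the counts stay comparable.
    new_counts, _ = np.histogramdd(centers, bins=edges)
    return [counts + new_counts, edges]


edges = [np.linspace(0.0, 10.0, 6)] * 3          # 5 bins per axis
histo = [np.zeros((5, 5, 5)), edges]
centers = np.array([[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [9.0, 9.0, 9.0]])
histo = update_histo_sketch(histo, centers)
```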

radio.preprocessing.ct_masked_batch.get_nodules_numba[source]¶

Fetch nodules from array by starting positions.

Takes an array with data of shape (z, y, x) from the batch, an ndarray(p, 3) with the starting indices of nodules, where p is the number of nodules, and a size array of type ndarray(3, ) which contains the sizes of nodules along each axis. The output is a 3d ndarray with nodules put in a CTImagesBatch-compatible skyscraper structure.

Parameters:

data (ndarray) – CTImagesBatch skyscraper represented by a 3D ndarray.
positions (ndarray(p, 3) of int) – contains nodules’ starting indices along the [zyx]-axes in data.
size (ndarray(3,) of int) – contains nodules’ sizes along each axis (z,y,x).

Notes

Dtypes of positions and size arrays must be the same.

Returns:	3d ndarray with nodules
Return type:	ndarray
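The cropping described above can be sketched in plain NumPy, without numba. The helper crop_to_skyscraper_sketch is hypothetical; it assumes that every crop fits inside data and that the crops are stacked along the z-axis in the output.

```python
import numpy as np


def crop_to_skyscraper_sketch(data, positions, size):
    """Cut boxes of `size` at `positions` and stack them along the z-axis."""
    dz, dy, dx = size
    out = np.zeros((len(positions) * dz, dy, dx), dtype=data.dtype)
    for i, (z, y, x) in enumerate(positions):
        # The i-th crop occupies slices [i * dz, (i + 1) * dz) of the output.
        out[i * dz:(i + 1) * dz] = data[z:z + dz, y:y + dy, x:x + dx]
    return out


data = np.arange(4 * 4 * 4).reshape(4, 4, 4)
positions = np.array([[0, 0, 0], [2, 2, 2]])
size = np.array([2, 2, 2])
crops = crop_to_skyscraper_sketch(data, positions, size)
```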
