CTImagesMaskedBatch
Batch class CTImagesMaskedBatch for storing CT scans with masks.
-
class radio.preprocessing.ct_masked_batch.CTImagesMaskedBatch(index, *args, **kwargs)
Bases: radio.preprocessing.ct_batch.CTImagesBatch
Batch class for storing a batch of CT scans with masks for nodules.
Allows loading info about cancer nodules and then creating cancer masks for each patient. Created masks are stored in self.masks.
Parameters: index (dataset.index) – ids of scans to be put in a batch
-
components
tuple of str – names of the data components of a batch: images, masks, origin and spacing. NOTE: the base class requires this attribute to be implemented.
-
num_nodules
int – number of nodules in batch
-
images
ndarray – contains CT scans for all patients in batch.
-
masks
ndarray – contains masks for all patients in batch.
-
nodules
np.recarray – contains info on cancer nodule locations. The record array contains the following information about nodules:
- self.nodules.nodule_center – ndarray(num_nodules, 3) centers of nodules in world coordinates;
- self.nodules.nodule_size – ndarray(num_nodules, 3) sizes of nodules along z, y, x in world coordinates;
- self.nodules.img_size – ndarray(num_nodules, 3) sizes of images of patient data corresponding to nodules;
- self.nodules.offset – ndarray(num_nodules, 3) position of individual patient scan inside batch;
- self.nodules.spacing – ndarray(num_nodules, 3) of spacing attribute of patients which correspond to nodules;
- self.nodules.origin – ndarray(num_nodules, 3) of origin attribute of patients which correspond to nodules.
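Several attributes on this page (offset, and the "skyscraper" layout) refer to a batch in which the scans of all patients are stacked along the z-axis into one 3D array. A minimal NumPy sketch of that layout, assuming all scans share their (y, x) shape; `build_skyscraper` is a hypothetical helper, not part of RadIO, and it computes only the z-component of each patient's offset.

```python
import numpy as np


def build_skyscraper(scans):
    """Stack per-patient scans of shape (z_i, y, x) along z (hypothetical helper).

    Returns the stacked array and the z-offset at which each patient's scan starts.
    """
    z_sizes = np.array([scan.shape[0] for scan in scans])
    # Each patient starts where the previous one ends: exclusive cumulative sum.
    offsets = np.concatenate([[0], np.cumsum(z_sizes)[:-1]])
    return np.concatenate(scans, axis=0), offsets


scans = [np.zeros((3, 4, 4)), np.ones((5, 4, 4))]
skyscraper, offsets = build_skyscraper(scans)
print(skyscraper.shape, offsets)  # (8, 4, 4) [0 3]
```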
-
binarize_mask(threshold=0.35)
Binarize masks by threshold.
Parameters: threshold (float) – threshold for masks binarization.
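Binarization maps every mask value to 0 or 1 by comparing it with threshold. A minimal NumPy sketch, assuming a strict "greater than" comparison (the method may treat values equal to the threshold differently); `binarize` is hypothetical.

```python
import numpy as np


def binarize(masks, threshold=0.35):
    """Set mask values above threshold to 1 and the rest to 0 (sketch)."""
    # Assumption: strict '>' comparison at the boundary.
    return (masks > threshold).astype(masks.dtype)


masks = np.array([0.1, 0.35, 0.5, 0.9])
print(binarize(masks))  # [0. 0. 1. 1.]
```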
-
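central_crop(crop_size, crop_mask=False, **kwargs)
Make a crop of size crop_size from the center of images.
Parameters: crop_size (tuple, list or ndarray of int) – (z,y,x)-shape of the crop; crop_mask (bool) – if True, masks are cropped in the same way. Return type: batch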
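The center crop starts at half the difference between the image shape and crop_size along each axis. A minimal NumPy sketch for a single (z, y, x) scan; `central_crop_3d` is a hypothetical helper, and because of integer division an odd margin leaves the extra voxel after the crop.

```python
import numpy as np


def central_crop_3d(arr, crop_size):
    """Return the central crop of shape crop_size from a (z, y, x) array."""
    crop_size = np.asarray(crop_size)
    shape = np.asarray(arr.shape)
    if np.any(crop_size > shape):
        raise ValueError("crop_size must not exceed the array shape")
    start = (shape - crop_size) // 2  # equal margins, extra voxel goes after
    end = start + crop_size
    return arr[start[0]:end[0], start[1]:end[1], start[2]:end[2]]


scan = np.arange(5 * 6 * 7).reshape(5, 6, 7)
crop = central_crop_3d(scan, (3, 2, 3))
print(crop.shape)  # (3, 2, 3)
```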
-
classification_targets(threshold=10, **kwargs)
Unpack data from batch in a format suitable for the classification task.
Parameters: threshold (int) – minimum number of ‘1’ pixels in mask to consider it cancerous. Returns: targets for classification task: labels corresponding to cancerous nodules (‘1’) and non-cancerous nodules (‘0’). Return type: ndarray(batch_size, 1)
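classification_targets reduces each mask crop to one label by counting its ‘1’ pixels. A minimal sketch of that rule, assuming "minimum number" means the count must be greater than or equal to threshold; `classification_labels` is hypothetical.

```python
import numpy as np


def classification_labels(masks, threshold=10):
    """Label each crop 1 if its mask has at least `threshold` ones, else 0."""
    # masks: (batch_size, z, y, x) binary mask crops
    counts = masks.reshape(len(masks), -1).sum(axis=1)
    return (counts >= threshold).astype(np.int64).reshape(-1, 1)


masks = np.zeros((2, 4, 4, 4), dtype=np.uint8)
masks[0, :3, :2, :2] = 1  # 12 ones in the first crop
print(classification_labels(masks).ravel())  # [1 0]
```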
-
components = ('images', 'masks', 'spacing', 'origin')
-
create_mask()
Create the masks component from the nodules component.
Notes
nodules must not be None before calling this method; see fetch_nodules_info() for more details.
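create_mask turns world-coordinate nodule records into voxel masks. A minimal sketch of that conversion for one nodule, assuming a box-shaped mask and the common voxel = (world - origin) / spacing mapping; `box_mask` is hypothetical and may differ from how RadIO rounds or shapes the mask.

```python
import numpy as np


def box_mask(shape, center_world, size_world, origin, spacing):
    """Fill a box-shaped nodule mask in a scan of given (z, y, x) shape (sketch)."""
    origin = np.asarray(origin, dtype=float)
    spacing = np.asarray(spacing, dtype=float)
    # World coordinates (mm) -> voxel coordinates along z, y, x.
    center_vox = (np.asarray(center_world, dtype=float) - origin) / spacing
    half_size_vox = np.asarray(size_world, dtype=float) / spacing / 2
    start = np.clip(np.rint(center_vox - half_size_vox).astype(int), 0, shape)
    end = np.clip(np.rint(center_vox + half_size_vox).astype(int), 0, shape)
    mask = np.zeros(shape, dtype=np.uint8)
    mask[start[0]:end[0], start[1]:end[1], start[2]:end[2]] = 1
    return mask


mask = box_mask((10, 10, 10), center_world=(5, 5, 5), size_world=(2, 2, 2),
                origin=(0, 0, 0), spacing=(1, 1, 1))
print(mask.sum())  # 8
```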
-
fetch_mask(shape)
Create a masks component of a different size than images, using the nodules component.
Parameters: shape (tuple, list or ndarray of int) – (z_dim, y_dim, x_dim), shape of the mask to be created. Returns: ndarray – 3d array with masks in the form of a skyscraper.
- TODO: part of this code repeats create_mask; the two functions should be unified.
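When the mask shape differs from the image shape, nodule positions must be expressed in the mask's voxel grid. A minimal sketch, assuming the mask covers the same physical extent as the image so that its effective spacing is scaled by img_shape / mask_shape; `world_to_mask_voxel` is hypothetical.

```python
import numpy as np


def world_to_mask_voxel(center_world, origin, spacing, img_shape, mask_shape):
    """Map a world (z, y, x) point to voxel coordinates of a mask of another shape."""
    # Assumption: same physical extent, so the mask spacing grows by img_shape / mask_shape.
    mask_spacing = (np.asarray(spacing, dtype=float) * np.asarray(img_shape)
                    / np.asarray(mask_shape))
    return (np.asarray(center_world, dtype=float) - np.asarray(origin, dtype=float)) / mask_spacing


print(world_to_mask_voxel((40, 40, 40), (0, 0, 0), (1, 1, 1),
                          (100, 100, 100), (50, 50, 50)))  # [20. 20. 20.]
```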
-
fetch_nodules_from_mask(images_loaded=True)
Fetch nodules info (centers and sizes) from masks.
Runs skimage.measure.label to fetch nodule regions from masks, extracts nodules info from the segmented regions and puts this information into the self.nodules np.recarray.
Parameters: images_loaded (bool) – if True, the images component is assumed to be loaded, and image sizes are used to compute correct nodule locations inside the skyscraper. If False, location info inside the skyscraper is not updated. Return type: batch
Notes
Nodule sizes along z, y and x will be the same.
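The labelling step can be sketched with scikit-image: skimage.measure.label finds connected regions and skimage.measure.regionprops gives each region's centroid and voxel count. This sketch handles a single scan with known spacing and origin and, to match the note that sizes along z, y and x are equal, uses the diameter of a sphere with the region's volume; `nodules_from_mask` is hypothetical, not the RadIO code.

```python
import numpy as np
from skimage.measure import label, regionprops


def nodules_from_mask(mask, spacing, origin):
    """Return world-coordinate centers and equal-per-axis sizes of mask regions."""
    spacing = np.asarray(spacing, dtype=float)
    origin = np.asarray(origin, dtype=float)
    labeled = label(mask > 0)  # connected components, labels 1..N
    centers, sizes = [], []
    for region in regionprops(labeled):
        # Centroid is in voxel (z, y, x) order; convert to world coordinates.
        centers.append(origin + np.asarray(region.centroid) * spacing)
        # Diameter of a sphere whose volume equals the region volume in mm^3.
        volume = float(region.area) * np.prod(spacing)
        sizes.append(np.full(3, (6.0 * volume / np.pi) ** (1.0 / 3.0)))
    return np.reshape(centers, (-1, 3)), np.reshape(sizes, (-1, 3))


mask = np.zeros((10, 10, 10), dtype=np.uint8)
mask[2:5, 2:5, 2:5] = 1  # 27-voxel cube
mask[6:8, 6:8, 6:8] = 1  # separate 8-voxel cube
centers, sizes = nodules_from_mask(mask, spacing=(1, 1, 1), origin=(0, 0, 0))
```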
-
fetch_nodules_info(nodules=None, nodules_records=None, update=False, images_loaded=True)
Extract nodules’ info from nodules into the attribute self.nodules.
Parameters: - nodules (pd.DataFrame) –
- contains:
- ‘seriesuid’: index of patient or series.
- ‘coordZ’, ‘coordY’, ‘coordX’: coordinates of nodules’ centers.
- ‘diameter_mm’: diameter, in mm.
- nodules_records (np.recarray) – if not None, should contain the same fields as described in Notes.
- update (bool) – if False, a warning is issued to remind that nodules info will be erased and recomputed.
- images_loaded (bool) – if True, the images component is assumed to be loaded, and image sizes are used to compute correct nodule locations inside the skyscraper. If False, location info inside the skyscraper is not updated.
Return type: batch
Notes
Run this action only after load(). The method fills in the record array self.nodules, which contains the following information about nodules:
- self.nodules.nodule_center – ndarray(num_nodules, 3) centers of nodules in world coordinates;
- self.nodules.nodule_size – ndarray(num_nodules, 3) sizes of nodules along z, y, x in world coordinates;
- self.nodules.img_size – ndarray(num_nodules, 3) sizes of images of patient data corresponding to nodules;
- self.nodules.offset – ndarray(num_nodules, 3) of offsets inside the batch of patients which correspond to nodules;
- self.nodules.spacing – ndarray(num_nodules, 3) of spacing attribute of patients which correspond to nodules;
- self.nodules.origin – ndarray(num_nodules, 3) of origin attribute of patients which correspond to nodules;
- self.nodules.patient_pos – ndarray(num_nodules, 1) positions of patients which correspond to stored nodules.
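At its core, fetch_nodules_info joins the annotation table with the patients of the batch. A minimal pandas sketch, assuming one row per nodule with the columns listed above and a single diameter_mm used as the size along all three axes; `nodules_df_to_records` and its per-patient array arguments are hypothetical, while the dtype is copied from nodules_dtype.

```python
import numpy as np
import pandas as pd

# Field layout copied from CTImagesMaskedBatch.nodules_dtype.
NODULES_DTYPE = np.dtype([('patient_pos', '<i4'), ('offset', '<i4', (3,)),
                          ('img_size', '<i4', (3,)), ('nodule_center', '<f8', (3,)),
                          ('nodule_size', '<f8', (3,)), ('spacing', '<f8', (3,)),
                          ('origin', '<f8', (3,))])


def nodules_df_to_records(nodules, batch_ids, offsets, img_sizes, spacings, origins):
    """Keep nodules of patients in batch_ids and pack them into a recarray (hypothetical)."""
    pos_of = {uid: i for i, uid in enumerate(batch_ids)}
    df = nodules[nodules['seriesuid'].isin(list(batch_ids))]
    pos = df['seriesuid'].map(pos_of).to_numpy(dtype=np.int64)
    records = np.zeros(len(df), dtype=NODULES_DTYPE).view(np.recarray)
    records['patient_pos'] = pos
    # Per-patient arrays (one row per patient, in batch order) are gathered by position.
    records['offset'] = np.asarray(offsets)[pos]
    records['img_size'] = np.asarray(img_sizes)[pos]
    records['spacing'] = np.asarray(spacings)[pos]
    records['origin'] = np.asarray(origins)[pos]
    records['nodule_center'] = df[['coordZ', 'coordY', 'coordX']].to_numpy()
    # Assumption: one diameter_mm value is used as the size along z, y and x.
    records['nodule_size'] = np.repeat(df['diameter_mm'].to_numpy()[:, None], 3, axis=1)
    return records


df = pd.DataFrame({'seriesuid': ['a', 'b', 'c'], 'coordZ': [1., 2., 3.],
                   'coordY': [4., 5., 6.], 'coordX': [7., 8., 9.],
                   'diameter_mm': [5., 6., 7.]})
rec = nodules_df_to_records(df, ['c', 'a'], offsets=[[0, 0, 0], [10, 0, 0]],
                            img_sizes=[[10, 32, 32], [12, 32, 32]],
                            spacings=[[1., 1., 1.], [2., 2., 2.]],
                            origins=[[0., 0., 0.], [0., 0., 0.]])
```

Patient ‘b’ is not in the batch, so only the nodules of ‘a’ and ‘c’ are kept.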
-
flip()
Invert the order of slices for each patient.
Return type: batch
Examples
>>> batch = batch.flip()
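Because patients are stacked along z, flipping must reverse each patient's slices separately rather than the whole array. A minimal NumPy sketch, assuming the per-patient z-offsets and z-sizes are known; `flip_skyscraper` is hypothetical.

```python
import numpy as np


def flip_skyscraper(images, offsets, z_sizes):
    """Reverse the slice order of each patient's block inside a z-stacked array."""
    flipped = images.copy()
    for start, size in zip(offsets, z_sizes):
        flipped[start:start + size] = images[start:start + size][::-1]
    return flipped


images = np.arange(5).reshape(5, 1, 1)
print(flip_skyscraper(images, offsets=[0, 2], z_sizes=[2, 3]).ravel())  # [1 0 4 3 2]
```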
-
get_axial_slice(patient_pos, height)
Get a tuple of an images slice and a masks slice by patient and slice position.
Parameters: patient_pos – position of the patient in the batch; height – position of the slice. Returns: (images_slice, masks_slice) for the given patient_pos and slice position.
Return type: tuple
-
get_pos(data, component, index)
Return the position of an item for a given index in data or in self.`component`.
Fetch the correct position of an item inside the batch; looks for it in data, if provided, or in the component of self.
Parameters: - data (None or ndarray) – data from which subsetting is done. If None, the position is retrieved from the batch component; if ndarray, index is returned.
- component (str) – name of a component, e.g. ‘images’. If component is provided, data should be None.
- index (str or int) – index of the item to look for. May be a key from the dataset (str) or an index inside the batch (int).
Returns: position of the item
Notes
This overrides get_pos from the base Batch class; see the corresponding docstring for a detailed explanation.
-
static make_data_keras(batch, model=None, mode='segmentation', is_training=True, **kwargs)
Prepare data in batch for training a neural network implemented in keras.
Parameters: - mode (str) – one of ‘classification’, ‘regression’ or ‘segmentation’. Default is ‘segmentation’.
- data_format (str) – data format of batch data. Can be ‘channels_last’ or ‘channels_first’. Default is ‘channels_last’.
- is_training (bool) – whether the model is in training or prediction mode. Default is True.
- threshold (int) – minimum number of ‘1’ pixels in a mask to consider it cancerous. Default is 10.
Returns: kwargs for the keras model train method: {‘x’: ndarray(…), ‘y’: ndarray(…)} for training the neural network.
Return type: dict
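The data_format parameter only decides where the channel axis is inserted. A minimal sketch of building the {'x': ..., 'y': ...} dict for segmentation, assuming single-channel images and masks of shape (batch_size, z, y, x) and, as a further assumption, that only 'x' is returned in prediction mode; `make_keras_kwargs` is hypothetical.

```python
import numpy as np


def add_channel_axis(arr, data_format='channels_last'):
    """Insert a channel axis after the spatial axes or right after the batch axis."""
    if data_format == 'channels_last':
        return arr[..., np.newaxis]
    if data_format == 'channels_first':
        return arr[:, np.newaxis]
    raise ValueError("data_format must be 'channels_last' or 'channels_first'")


def make_keras_kwargs(images, masks, data_format='channels_last', is_training=True):
    """Build keyword arguments for a keras train/predict call (hypothetical)."""
    x = add_channel_axis(images, data_format)
    if not is_training:
        return {'x': x}
    return {'x': x, 'y': add_channel_axis(masks, data_format)}


images = np.zeros((4, 32, 64, 64))
masks = np.zeros((4, 32, 64, 64))
print(make_keras_kwargs(images, masks)['x'].shape)  # (4, 32, 64, 64, 1)
```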
-
static make_data_tf(batch, model=None, mode='segmentation', is_training=True, **kwargs)
Prepare data in batch for training a neural network implemented in tensorflow.
Parameters: - mode (str) – one of ‘classification’, ‘regression’ or ‘segmentation’. Default is ‘segmentation’.
- data_format (str) – data format of batch data. Can be ‘channels_last’ or ‘channels_first’. Default is ‘channels_last’.
- is_training (bool) – whether the model is in training or prediction mode. Default is True.
- threshold (int) – minimum number of ‘1’ pixels in a mask to consider it cancerous. Default is 10.
Returns: feed dict and fetches for training the neural network.
-
static make_indices(size)
Generate random batch indices of the given size.
Parameters: size (int) – number of indices. Returns: array of random indices. Return type: ndarray
Examples
>>> indices = CTImagesMaskedBatch.make_indices(20)
>>> indices
array(['3c3eb09b', '5b192d1f', 'f28ddbb0', '14460196', '31a92510', '3f324e44', '066ccf28', '5570938d', '5d1fb8f6', '539ea09c', '68f9f235', '8f7b0c49', 'c7903591', 'dc8e9504', '54e9eebc', '778abd5a', '99691fc6', '7da49e85', '0f343345', '876fb9e6'], dtype='<U8')
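The example output consists of 8-character hexadecimal strings. One way to produce similar ids, assuming truncated UUID4 hex digests (the actual generator may differ); `random_hex_indices` is hypothetical, and collisions are possible but unlikely for small sizes.

```python
import uuid

import numpy as np


def random_hex_indices(size):
    """Return `size` random 8-character hexadecimal ids as a NumPy array."""
    return np.array([uuid.uuid4().hex[:8] for _ in range(size)])


indices = random_hex_indices(20)
print(indices.shape, indices.dtype)  # (20,) <U8
```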
-
make_xip(depth, stride=1, mode='max', projection='axial', padding='reflect', **kwargs)
Make an intensity projection (maximum, minimum, mean or median).
The projection axis is chosen according to the projection argument.
Parameters: - depth (int) – number of slices over which the xip operation is performed.
- stride (int) – stride step along the projection dimension.
- mode (str) – possible values are ‘max’, ‘min’, ‘mean’ or ‘median’.
- projection (str) – possible values: ‘axial’, ‘coronal’, ‘sagital’. For the ‘coronal’ and ‘sagital’ projections the tensor is transposed from [z,y,x] to [x,z,y] and [y,z,x], respectively.
- padding (str) – padding mode passed to the numpy.pad function.
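For the axial projection, the operation slides a window of depth slices along z with the given stride and reduces each window. A minimal NumPy sketch for a single scan that omits padding, so only full windows are used; `axial_xip` is hypothetical.

```python
import numpy as np

REDUCERS = {'max': np.max, 'min': np.min, 'mean': np.mean, 'median': np.median}


def axial_xip(scan, depth, stride=1, mode='max'):
    """Reduce windows of `depth` consecutive z-slices of a (z, y, x) scan."""
    reduce = REDUCERS[mode]
    starts = range(0, scan.shape[0] - depth + 1, stride)
    return np.stack([reduce(scan[s:s + depth], axis=0) for s in starts])


scan = np.arange(5, dtype=float).reshape(5, 1, 1) * np.ones((1, 2, 2))
print(axial_xip(scan, depth=2)[:, 0, 0])  # [1. 2. 3. 4.]
```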
-
nodules_dtype = dtype([('patient_pos', '<i4'), ('offset', '<i4', (3,)), ('img_size', '<i4', (3,)), ('nodule_center', '<f8', (3,)), ('nodule_size', '<f8', (3,)), ('spacing', '<f8', (3,)), ('origin', '<f8', (3,))])
-
nodules_to_df(nodules)
Convert a nodules_info ndarray into a pandas DataFrame.
The DataFrame contains the following columns: ‘source_id’ – id of the source element of the batch; ‘nodule_id’ – generated id of the nodule; ‘locZ’, ‘locY’, ‘locX’ – coordinates of nodules’ centers; ‘diamZ’, ‘diamY’, ‘diamX’ – sizes of nodules along the z, y, x axes.
Parameters: nodules (ndarray of type nodules_info) – the nodules_info type is defined inside the CTImagesMaskedBatch class. Returns: centers, ids and sizes of nodules. Return type: pd.DataFrame
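The conversion is a column-wise unpacking of the record array. A minimal pandas sketch, assuming batch_ids lists the batch ids in position order and using a hypothetical "<source_id>_<n>" scheme for nodule_id; `nodules_records_to_df` is hypothetical, and locZ/locY/locX are copied from nodule_center unchanged.

```python
import numpy as np
import pandas as pd


def nodules_records_to_df(records, batch_ids):
    """Unpack nodule records into a DataFrame with the documented columns (hypothetical)."""
    centers = np.asarray(records['nodule_center'])
    sizes = np.asarray(records['nodule_size'])
    source_ids = np.asarray(batch_ids)[np.asarray(records['patient_pos'])]
    return pd.DataFrame({
        'source_id': source_ids,
        # Hypothetical id scheme: source id plus a running nodule number.
        'nodule_id': [f'{sid}_{i}' for i, sid in enumerate(source_ids)],
        'locZ': centers[:, 0], 'locY': centers[:, 1], 'locX': centers[:, 2],
        'diamZ': sizes[:, 0], 'diamY': sizes[:, 1], 'diamX': sizes[:, 2],
    })


records = np.zeros(2, dtype=[('patient_pos', '<i4'), ('nodule_center', '<f8', (3,)),
                             ('nodule_size', '<f8', (3,))])
records['patient_pos'] = [1, 0]
records['nodule_center'] = [[1., 2., 3.], [4., 5., 6.]]
records['nodule_size'] = 5.0
df = nodules_records_to_df(records, ['p0', 'p1'])
```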
-
num_nodules
Get the number of nodules in CTImagesMaskedBatch.
Returns: number of nodules in CTImagesMaskedBatch. If the fetch_nodules_info method has not been called yet, returns 0. Return type: int
-
predict_on_scan(model_name, strides=(16, 32, 32), crop_shape=(32, 64, 64), batch_size=4, targets_mode='segmentation', data_format='channels_last', show_progress=True, model_type='tf')
Get predictions of the model on data contained in the batch.
Transforms scan data into patches of shape crop_shape and then feeds these patches sequentially into the model named by the argument ‘model_name’; after that, loads predicted masks or probabilities into the ‘masks’ component of the current batch and returns it.
Parameters: - model_name (str) – name of the model that will be used for predictions.
- strides (tuple, list or ndarray of int) – (z,y,x)-strides for the patching operation.
- crop_shape (tuple, list or ndarray of int) – (z,y,x)-shape of crops.
- batch_size (int) – number of patches fed into the model in one iteration.
- targets_mode (str) – type of targets: ‘segmentation’, ‘regression’ or ‘classification’.
- data_format (str) – format of neural network input data, can be ‘channels_first’ or ‘channels_last’.
- model_type (str) – type of model that will be used for prediction. Possible values are ‘keras’ or ‘tf’.
Returns: the batch with predictions loaded into the masks component. Return type: CTImagesMaskedBatch
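When strides are smaller than crop_shape, neighbouring patches overlap. A minimal sketch of where patch start positions fall and how they are grouped by batch_size, assuming no padding (only patches lying fully inside the scan are produced); `patch_starts` and `batched` are hypothetical, and how the method handles scan borders and overlapping predictions is not shown.

```python
import itertools

import numpy as np


def patch_starts(scan_shape, crop_shape, strides):
    """List (z, y, x) start indices of patches that lie fully inside the scan."""
    per_axis = [range(0, dim - crop + 1, step)
                for dim, crop, step in zip(scan_shape, crop_shape, strides)]
    return np.array(list(itertools.product(*per_axis)))


def batched(starts, batch_size):
    """Split start positions into groups fed to the model together."""
    return [starts[i:i + batch_size] for i in range(0, len(starts), batch_size)]


starts = patch_starts((64, 128, 128), crop_shape=(32, 64, 64), strides=(16, 32, 32))
print(len(starts), len(batched(starts, 4)))  # 27 7
```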
-
regression_targets(threshold=10, **kwargs)
Unpack data from batch in a format suitable for the regression task.
Parameters: threshold (int) – minimum number of ‘1’ pixels in a mask to consider it cancerous. Returns: targets for the regression task: cancer center, size and label (1 for cancerous and 0 for non-cancerous). Note that for a non-cancerous crop the first 6 columns of the output array are set to zero. Return type: ndarray(batch_size, 7)
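The 7 columns are the 3 center coordinates, the 3 sizes and the label, with the geometry zeroed for non-cancerous crops. A minimal sketch assembling that array from per-crop values the caller has already computed; `regression_targets_sketch` is hypothetical and does not decide the units of centers and sizes.

```python
import numpy as np


def regression_targets_sketch(centers, sizes, labels):
    """Stack (center_z, center_y, center_x, size_z, size_y, size_x, label) per crop."""
    centers = np.asarray(centers, dtype=float).reshape(-1, 3)
    sizes = np.asarray(sizes, dtype=float).reshape(-1, 3)
    labels = np.asarray(labels, dtype=float).reshape(-1, 1)
    targets = np.concatenate([centers, sizes, labels], axis=1)
    # Non-cancerous crops carry no geometry: zero the first 6 columns.
    targets[labels[:, 0] == 0, :6] = 0.0
    return targets


t = regression_targets_sketch([[0.5, 0.5, 0.5], [0.2, 0.3, 0.4]],
                              [[0.1, 0.1, 0.1], [0.2, 0.2, 0.2]], [1, 0])
print(t.shape)  # (2, 7)
```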
-
sample_dump(dst, n_iters, nodule_size=(32, 64, 64), batch_size=20, share=0.8, **kwargs)
Perform sample_nodules and dump on the same batch n_iters times.
Can be used for fast creation of large datasets of cancerous/non-cancerous crops.
Parameters: - dst (str) – folder to dump nodules in.
- n_iters (int) – number of iterations to be performed.
- nodule_size (tuple, list or ndarray of int) – (z,y,x)-shape of sampled nodules.
- batch_size (int or None) – size of generated batches.
- share (float) – share of cancer nodules. See the docstring of sample_nodules for more info about possible combinations of the parameters share and batch_size.
- **kwargs (dict) – additional arguments passed to sample_nodules. See the docstring of sample_nodules for more info.
-
sample_nodules(batch_size, nodule_size=(32, 64, 64), share=0.8, variance=None, mask_shape=None, histo=None)
Sample random crops of images and masks from the batch.
Create random crops, both with and without nodules, from the input batch.
Parameters: - batch_size (int) – number of nodules in the output batch. Required if share=0.0. If None, the resulting batch will include all cancerous nodules.
- nodule_size (tuple, list or ndarray of int) – crop shape along (z,y,x).
- share (float) – share of cancer crops in the batch. If the input batch contains fewer cancer nodules than needed, random nodules are taken instead.
- variance (tuple, list or ndarray of float) – variances of normally distributed random shifts of nodules’ start positions.
- mask_shape (tuple, list or ndarray of int) – size of mask crops in (z,y,x)-order. If not None, mask crops have shape mask_shape. If None, the mask crop shape equals nodule_size.
- histo (tuple) – output of np.histogram(). Used for sampling non-cancerous crops.
Returns: batch with cancerous and non-cancerous crops in the proportion defined by share, with batch_size nodules in total. If share == 1.0 and batch_size is None, the resulting batch consists of all cancerous crops stored in the batch.
Return type: Batch
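The share and batch_size parameters decide how many crops are cancerous, and variance jitters their start positions. A minimal sketch of that bookkeeping, assuming counts are rounded to the nearest integer and variance is per-axis (so the standard deviation is its square root); `split_counts` and `jitter_starts` are hypothetical.

```python
import numpy as np


def split_counts(batch_size, share, n_cancer_available):
    """Return (n_cancerous, n_random) crops for one sampled batch (hypothetical)."""
    if batch_size is None:
        # Documented behaviour for batch_size=None: take all cancerous nodules.
        return n_cancer_available, 0
    wanted = int(round(share * batch_size))
    n_cancer = min(wanted, n_cancer_available)
    # Missing cancerous crops are replaced by random ones.
    return n_cancer, batch_size - n_cancer


def jitter_starts(starts, variance, rng=None):
    """Shift (n, 3) start positions by normal noise with per-axis variance."""
    rng = np.random.default_rng() if rng is None else rng
    std = np.sqrt(np.asarray(variance, dtype=float))
    noise = rng.normal(0.0, std, size=np.shape(starts))
    return np.rint(np.asarray(starts) + noise).astype(int)


print(split_counts(20, 0.8, 100))  # (16, 4)
print(split_counts(20, 0.8, 5))    # (5, 15)
```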
-
sample_random_nodules(num_nodules, nodule_size, histo=None)
Sample random nodule positions in CTImagesMaskedBatch.
Samples random nodule positions in an ndarray. Each nodule has the shape defined by nodule_size. If the size of patients’ data along the z-axis differs between patients, NotImplementedError is raised.
Parameters: num_nodules (int) – number of nodules to sample; nodule_size – (z,y,x)-shape of nodules; histo (tuple) – np.histogram()’s output used for sampling. Returns: ndarray(num_nodules, 3). The first dimension indexes sampled nodules; the second holds the start positions (integers) of nodules in the batch skyscraper.
Return type: ndarray
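Because every patient has the same z-size, a nodule can be placed by first choosing a patient and then a start position inside that scan, so that it never crosses a patient border. A minimal sketch assuming uniform sampling (it ignores the histo option); `sample_random_starts` is hypothetical.

```python
import numpy as np


def sample_random_starts(num_nodules, nodule_size, n_patients, patient_shape, rng=None):
    """Sample (num_nodules, 3) start positions inside a skyscraper (hypothetical).

    Every patient scan has the same (z, y, x) shape, so patient i occupies
    z-rows [i * z, (i + 1) * z) of the skyscraper.
    """
    rng = np.random.default_rng() if rng is None else rng
    size = np.asarray(nodule_size)
    shape = np.asarray(patient_shape)
    if np.any(size > shape):
        raise ValueError("nodule_size must fit inside a single patient scan")
    patients = rng.integers(0, n_patients, size=num_nodules)
    # Uniform start inside one patient so the nodule stays within that scan.
    starts = rng.integers(0, shape - size + 1, size=(num_nodules, 3))
    starts[:, 0] += patients * shape[0]
    return starts


starts = sample_random_starts(50, (8, 16, 16), 3, (32, 64, 64), np.random.default_rng(0))
print(starts.shape)  # (50, 3)
```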
-
segmentation_targets(data_format='channels_last', **kwargs)
Unpack data from batch in a format suitable for the segmentation task.
Parameters: data_format (str) – where to put the new channels axis: ‘channels_last’ or ‘channels_first’. Returns: batch array with masks. Return type: ndarray(batch_size, ..)
-
unpack(component='images', **kwargs)
Basic way of unpacking components from the batch.
Parameters: - component (str) – component to unpack, can be ‘images’ or ‘masks’.
- data_format (str) – can be ‘channels_last’ or ‘channels_first’. Reflects where to put the channels dimension: right after the batch dimension or after all spatial axes.
- kwargs (dict) – keyword arguments passed to the callable if the component argument refers to a method of the batch class.
Return type: ndarray(batch_size, ..) or None
-
-
radio.preprocessing.ct_masked_batch.get_nodules_numba(data, positions, size)
Fetch nodules from an array by their starting positions.
Takes an array with data of shape (z, y, x) from the batch, an ndarray(p, 3) with starting indices of nodules, where p is the number of nodules, and size, an ndarray(3,) with sizes of nodules along each axis. The output is a 3d ndarray with nodules put into the CTImagesBatch-compatible skyscraper structure.
Parameters: - data (ndarray) – CTImagesBatch skyscraper represented by a 3D ndarray.
- positions (ndarray(p, 3) of int) – nodules’ starting indices along the [zyx]-axes in data.
- size (ndarray(3,) of int) – nodules’ sizes along each axis (z,y,x).
Notes
Dtypes of the positions and size arrays must be the same.
Returns: 3d ndarray with nodules
Return type: ndarray
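Without numba, the same extraction can be written as plain NumPy slicing. A minimal sketch, not the numba implementation: each nodule is cut from data at its start position and the crops are stacked along z, giving shape (p * size_z, size_y, size_x); `get_nodules` is hypothetical and assumes every crop lies fully inside data.

```python
import numpy as np


def get_nodules(data, positions, size):
    """Cut nodules of the given (z, y, x) size and stack them along z."""
    size = np.asarray(size)
    crops = [data[z:z + size[0], y:y + size[1], x:x + size[2]]
             for z, y, x in np.asarray(positions)]
    if not crops:
        return np.empty((0, size[1], size[2]), dtype=data.dtype)
    return np.concatenate(crops, axis=0)


data = np.arange(4 * 5 * 6).reshape(4, 5, 6)
nodules = get_nodules(data, positions=np.array([[0, 0, 0], [1, 2, 3]]), size=np.array([2, 2, 2]))
print(nodules.shape)  # (4, 2, 2)
```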