CTImagesBatch

class radio.CTImagesBatch(index, *args, **kwargs)[source]

Batch class for storing batch of CT-scans in 3D.

Contains a component images = 3d-array of stacked scans along number_of_slices (z) axis (aka “skyscraper”), associated information for subsetting individual patient’s 3D scan (_bounds, origin, spacing) and various methods to preprocess the data.
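The "skyscraper" layout can be illustrated with plain NumPy. This is a minimal sketch, not the library's implementation: the helper get_scan is hypothetical, and the layout of the bounds array (first slice of each scan, plus one trailing entry) is assumed from the ndarray loading example in this class.

```python
import numpy as np

# Two hypothetical scans with different numbers of slices but equal (y, x) size.
scan_a = np.zeros((3, 4, 4), dtype=np.float32)
scan_b = np.ones((5, 4, 4), dtype=np.float32)

# Stack the scans along the z-axis to form the skyscraper.
images = np.concatenate([scan_a, scan_b], axis=0)

# Assumed layout: bounds[i] is the first slice of scan i and
# bounds[i + 1] is one past its last slice.
bounds = np.cumsum([0, scan_a.shape[0], scan_b.shape[0]])


def get_scan(images, bounds, pos):
    """Return a view of the scan stored at position pos of the skyscraper."""
    return images[bounds[pos]:bounds[pos + 1]]


second_scan = get_scan(images, bounds, 1)
```

Because basic slicing is used, second_scan is a view into images, so no data is copied when an individual patient's scan is subset.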

Parameters:index (dataset.index) – ids of scans to be put in a batch
components

tuple of strings – names of the data components of a batch, which are images, origin and spacing. NOTE: Implementation of this attribute is required by the Base class.

index

dataset.index – represents indices of scans from a batch

images

ndarray – contains ct-scans for all patients in batch.

spacing

ndarray of floats – represents distances between pixels in world coordinates

origin

ndarray of floats – contains world coordinates of (0, 0, 0)-pixel of scans

calc_lung_mask(patient, out_patient, res, erosion_radius, **kwargs)[source]

Return a mask for lungs

Parameters:erosion_radius (int) – radius of erosion to be performed.
central_crop(crop_size, **kwargs)[source]

Make crop of crop_size from center of images.

Parameters:crop_size (tuple, list or ndarray of int) – (z,y,x)-shape of crop.
Returns:
Return type:batch
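A central crop of a single scan can be sketched as follows. The function name central_crop_scan is hypothetical and the sketch works on one (z, y, x) array rather than on a whole batch; the rounding of the crop offset is an assumption.

```python
import numpy as np


def central_crop_scan(scan, crop_size):
    """Cut a block of crop_size (z, y, x) from the center of a 3d array."""
    crop_size = np.asarray(crop_size)
    shape = np.asarray(scan.shape)
    if np.any(crop_size > shape):
        raise ValueError("crop_size must not exceed the scan shape")
    # Offset of the crop so that it is centered (rounded down).
    start = (shape - crop_size) // 2
    stop = start + crop_size
    return scan[start[0]:stop[0], start[1]:stop[1], start[2]:stop[2]]


scan = np.arange(4 * 6 * 6).reshape(4, 6, 6)
crop = central_crop_scan(scan, (2, 4, 4))
```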
components = ('images', 'spacing', 'origin')
classmethod concat(batches)[source]

Concatenate several batches in one large batch.

Assume that same components are filled in all supplied batches.

Parameters:batches (list or tuple of batches) – sequence of batches to be concatenated
Returns:large batch with length = sum of lengths of concatenated batches
Return type:batch

Notes

Old batches’ indexes are dropped. The new large batch has a new np-arange index. If None-entries or batches of len=0 are included in the list of batches, they are dropped after concat.

dump(ix, dst, components=None, fmt='blosc', index_to_name=None, i8_encoding_mode=None)[source]

Dump chosen components of the scans’ batch into folder dst in the specified format.

When some of the components are None, a warning is printed and nothing is dumped. By default (components is None) dump attempts to dump all components.

Parameters:
  • dst (str) – destination-folder where all patients’ data should be put
  • components (tuple, list, ndarray of strings or str) – component(s) that need to be dumped (any iterable of strings or a single string). If not supplied, dump all components
  • fmt ('blosc') – format of dump. Currently only blosc-format is supported; in this case folder for each patient is created. Tree-structure of created files is demonstrated in the example below.
  • index_to_name (callable or None) – When supplied, should return str; A function that relates each item’s index to a name of item’s folder. That is, each item is dumped into os.path.join(dst, index_to_name(items_index)). If None, no transformation is applied and the method attempts to use indices of batch-items as names of items’ folders.
  • i8_encoding_mode (int, str or dict) – whether (and how) components of skyscraper-type should be cast to int8. If None, no cast is performed. The cast saves space on disk and speeds up batch-loading. However, it comes with a loss of precision, as skyscraper-components are originally stored in float32-format. Can be int: 0, 1, 2 or str/None: ‘linear’, ‘quantization’ or None. 0 or None disables the cast; 1 stands for ‘linear’ and 2 for ‘quantization’. Can also be a component-wise dict of modes, e.g.: {‘images’: ‘linear’, ‘masks’: 0}.

Examples

Initialize batch and load data

>>> ind = ['1ae34g90', '3hf82s76']
>>> batch = CTImagesBatch(ind)
>>> batch.load(...)
>>> batch.dump(dst='./data/blosc_preprocessed')

The command above creates following files:

  • ./data/blosc_preprocessed/1ae34g90/images/data.blk
  • ./data/blosc_preprocessed/1ae34g90/images/data.shape
  • ./data/blosc_preprocessed/1ae34g90/spacing/data.pkl
  • ./data/blosc_preprocessed/1ae34g90/origin/data.pkl
  • ./data/blosc_preprocessed/3hf82s76/images/data.blk
  • ./data/blosc_preprocessed/3hf82s76/images/data.shape
  • ./data/blosc_preprocessed/3hf82s76/spacing/data.pkl
  • ./data/blosc_preprocessed/3hf82s76/origin/data.pkl
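The role of index_to_name can be sketched with the standard library alone. The helpers item_folder and add_prefix are hypothetical; only the rule os.path.join(dst, index_to_name(items_index)) comes from the parameter description.

```python
import os


def item_folder(dst, item_index, index_to_name=None):
    """Build the folder an item is dumped into, following the rule in the docs."""
    # Without index_to_name the item's index itself is used as the folder name.
    name = item_index if index_to_name is None else index_to_name(item_index)
    return os.path.join(dst, str(name))


def add_prefix(item_index):
    """Hypothetical naming function: must return a str."""
    return "patient_" + item_index


dst = './data/blosc_preprocessed'
default_folders = [item_folder(dst, ix) for ix in ['1ae34g90', '3hf82s76']]
renamed_folders = [item_folder(dst, ix, add_prefix) for ix in ['1ae34g90', '3hf82s76']]
```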
flip(patient, out_patient, res)[source]

Invert the order of slices for each patient

Returns:
Return type:batch

Examples

>>> batch = batch.flip()
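What "inverting the order of slices for each patient" means for a skyscraper can be sketched in NumPy. The function flip_skyscraper and the bounds layout are assumptions made for illustration; the slices are reversed inside each patient's range, not across the whole stacked array.

```python
import numpy as np


def flip_skyscraper(images, bounds):
    """Reverse the slice order of every scan inside a stacked array."""
    flipped = np.empty_like(images)
    # bounds[i]:bounds[i + 1] is assumed to be the slice range of scan i.
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        flipped[lo:hi] = images[lo:hi][::-1]
    return flipped


# Five one-pixel slices: scan 0 holds slices 0-1, scan 1 holds slices 2-4.
images = np.arange(5).reshape(5, 1, 1)
bounds = np.array([0, 2, 5])
flipped = flip_skyscraper(images, bounds)
```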
get_axial_slice(person_number, slice_height)[source]

Get axial slice (e.g., for plots)

Parameters:
  • person_number (str or int) – Can be either index (int) of person in the batch or patient_id (str)
  • slice_height (float) – slice number scaled to the range from 0 to 1. E.g. 0.7 means that the slice with number int(0.7 * number of slices for person) is taken
Returns:

Return type:

ndarray (view)

Examples

Here batch.index[5] is usually something like ‘a1de03fz29kf6h2’

>>> patch = batch.get_axial_slice(5, 0.6)
>>> patch = batch.get_axial_slice(batch.index[5], 0.6)
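The mapping from slice_height to a slice number can be sketched as follows. The function axial_slice and the bounds layout are hypothetical, and clamping to the last slice for slice_height = 1 is an assumption added here to keep the sketch safe.

```python
import numpy as np


def axial_slice(images, bounds, pos, slice_height):
    """Pick the slice at relative height slice_height from scan number pos."""
    lo, hi = bounds[pos], bounds[pos + 1]
    n_slices = hi - lo
    # int(slice_height * number of slices), clamped to the last valid slice.
    k = min(int(slice_height * n_slices), n_slices - 1)
    return images[lo + k]


# Ten 2x2 slices whose pixel values equal the slice number in the skyscraper.
images = np.arange(10, dtype=np.float32)[:, None, None] * np.ones((1, 2, 2), dtype=np.float32)
bounds = np.array([0, 4, 10])
patch = axial_slice(images, bounds, 1, 0.7)
```

For the second scan (6 slices), int(0.7 * 6) gives local slice 4, which is slice 8 of the skyscraper.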
get_patches(patch_shape, stride, padding='edge', data_attr='images')[source]

Extract patches of patch_shape with specified stride.

Parameters:
  • patch_shape (tuple, list or ndarray of int) – (z,y,x)-shape of a single patch.
  • stride (tuple, list or ndarray of int) – (z,y,x)-stride to slide over each patient’s data.
  • padding (str) – padding-type (see doc of np.pad for available types).
  • data_attr (str) – component to get data from.
Returns:

4d-ndarray of patches; first dimension enumerates patches

Return type:

ndarray

Notes

All patients’ data must have the same shape at this step, so resize/unify_spacing is required beforehand.
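Patch extraction with a stride can be sketched for a single scan. The function extract_patches is hypothetical; padding at the far end of each axis and the patch order (z outermost, x innermost) are assumptions, not confirmed details of the library.

```python
import numpy as np


def extract_patches(scan, patch_shape, stride, padding='edge'):
    """Slide a window of patch_shape over a 3d scan with the given stride."""
    patch_shape = np.asarray(patch_shape)
    stride = np.asarray(stride)
    shape = np.asarray(scan.shape)
    # Number of window positions per axis (ceil division, at least one).
    n_steps = np.maximum(0, -(-(shape - patch_shape) // stride)) + 1
    # Pad the far end of each axis so that the last window fits.
    padded_shape = (n_steps - 1) * stride + patch_shape
    pad_width = [(0, int(p)) for p in padded_shape - shape]
    padded = np.pad(scan, pad_width, mode=padding)
    patches = []
    for z in range(n_steps[0]):
        for y in range(n_steps[1]):
            for x in range(n_steps[2]):
                z0, y0, x0 = z * stride[0], y * stride[1], x * stride[2]
                patches.append(padded[z0:z0 + patch_shape[0],
                                      y0:y0 + patch_shape[1],
                                      x0:x0 + patch_shape[2]])
    # First dimension enumerates patches.
    return np.stack(patches)


scan = np.arange(4 * 5 * 4, dtype=np.float32).reshape(4, 5, 4)
patches = extract_patches(scan, patch_shape=(2, 2, 2), stride=(2, 2, 2))
```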

get_pos(data, component, index)[source]

Return the position of an item for a given index in data or in self.`component`.

Fetch correct position inside batch for an item, looks for it in data, if provided, or in component in self.

Parameters:
  • data (None or ndarray) – data from which subsetting is done. If None, retrieve position from component of batch, if ndarray, returns index.
  • component (str) – name of a component, e.g. ‘images’. If component is provided, data should be None.
  • index (str or int) – index of an item to be looked for. may be key from dataset (str) or index inside batch (int).
Returns:

Position of item

Return type:

int

Notes

This is an overload of get_pos from base Batch-class, see corresponding docstring for detailed explanation.

images_shape

Get shapes for all 3d scans in CTImagesBatch.

Returns:shapes of data for each patient, ndarray(patient_pos, 3)
Return type:ndarray
load(fmt='dicom', components=None, bounds=None, **kwargs)[source]

Load 3d scans data in batch.

Parameters:
  • fmt (str) – type of data. Can be ‘dicom’, ‘blosc’, ‘raw’ or ‘ndarray’
  • components (tuple, list, ndarray of strings or str) – Contains names of batch component(s) that should be loaded. As of now, works only if fmt=’blosc’. If fmt != ‘blosc’, all available components are loaded. If None and fmt = ‘blosc’, again, all components are loaded.
  • bounds (ndarray(n_patients + 1, dtype=np.int) or None) – Needed iff fmt=’ndarray’. Bound-floors for items from a skyscraper (stacked scans).
  • **kwargs
    images : ndarray(n_patients * z, y, x) or None
    Needed only if fmt=’ndarray’. Input array containing skyscraper (stacked scans).
    origin : ndarray(n_patients, 3) or None
    Needed only if fmt=’ndarray’. origins of scans in world coordinates.
    spacing : ndarray(n_patients, 3) or None
    Needed only if fmt=’ndarray’. ndarray with spacings of patients along z,y,x axes.
Returns:

Return type:

self

Examples

DICOM example. Initialize a batch for storing the scans of 3 patients, with IDs taken from the index:

>>> index = FilesIndex(path="/some/path/*.dcm", no_ext=True)
>>> batch = CTImagesBatch(index)
>>> batch.load(fmt='dicom')

Ndarray example

images_array stores a set of 3d-scans concatenated along the 0-axis, the “skyscraper”. Say, it is an ndarray with shape (400, 256, 256).

bounds stores an ndarray of last floors for each scan. Say, bounds = np.asarray([0, 100, 400]).

>>> batch.load(fmt='ndarray', images=images_array, bounds=bounds)
load_from_patches(patches, stride, scan_shape, data_attr='images')[source]

Get skyscraper from 4d-array of patches, put it to data_attr component in batch.

Allows reconstructing the original skyscraper from patches (if the same arguments are passed).

Parameters:
  • patches (ndarray) – 4d-array of patches, with dims: (num_patches, z, y, x).
  • scan_shape (tuple, list or ndarray of int) – (z,y,x)-shape of individual scan (should be same for all scans).
  • stride (tuple, list or ndarray of int) – (z,y,x)-stride step used for gathering data from patches.
  • data_attr (str) – batch component name to store new data.

Notes

If stride != patch.shape, averaging of overlapped regions is used. scan_shape, patches.shape and stride are used to infer the number of items in the new skyscraper. If patches were padded, padding is removed for the skyscraper.
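Averaging of overlapped regions can be sketched for one scan by accumulating patch values and a per-voxel count. The function assemble_scan is hypothetical; it assumes patches are ordered with z outermost and x innermost, that stride does not exceed the patch shape, and that any padding sits at the far end of each axis.

```python
import numpy as np


def assemble_scan(patches, stride, scan_shape):
    """Rebuild one scan from overlapping patches, averaging the overlaps."""
    patch_shape = np.asarray(patches.shape[1:])
    stride = np.asarray(stride)
    scan_shape = np.asarray(scan_shape)
    # Same window grid that would have produced the patches.
    n_steps = np.maximum(0, -(-(scan_shape - patch_shape) // stride)) + 1
    padded_shape = tuple(int(v) for v in (n_steps - 1) * stride + patch_shape)
    total = np.zeros(padded_shape, dtype=np.float64)
    count = np.zeros(padded_shape, dtype=np.float64)
    i = 0
    for z in range(n_steps[0]):
        for y in range(n_steps[1]):
            for x in range(n_steps[2]):
                z0, y0, x0 = z * stride[0], y * stride[1], x * stride[2]
                window = (slice(z0, z0 + patch_shape[0]),
                          slice(y0, y0 + patch_shape[1]),
                          slice(x0, x0 + patch_shape[2]))
                total[window] += patches[i]
                count[window] += 1
                i += 1
    averaged = total / np.maximum(count, 1)
    # Drop the padded tail so the result has the original scan shape.
    return averaged[:scan_shape[0], :scan_shape[1], :scan_shape[2]]


scan = np.arange(2 * 2 * 4, dtype=np.float64).reshape(2, 2, 4)
# Three patches overlapping along x (stride 1, patch width 2).
patches = np.stack([scan[:, :, x:x + 2] for x in range(3)])
restored = assemble_scan(patches, stride=(2, 2, 1), scan_shape=(2, 2, 4))
```

Since the overlapping patches here carry identical values, averaging restores the original scan exactly.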

lower_bounds

Get lower bounds of patients data in CTImagesBatch.

Returns:ndarray(n_patients,) containing lower bounds of patients data along z-axis.
Return type:ndarray
make_xip(depth, stride=1, mode='max', projection='axial', padding='reflect', **kwargs)[source]

Make intensity projection (maximum, minimum, mean or median).

Notice that axis is chosen according to projection argument.

Parameters:
  • depth (int) – number of slices over which xip operation is performed.
  • stride (int) – stride-step along projection dimension.
  • mode (str) – Possible values are ‘max’, ‘min’, ‘mean’ or ‘median’.
  • projection (str) – Possible values: ‘axial’, ‘coronal’, ‘sagital’. In case of ‘coronal’ and ‘sagital’ projections tensor will be transposed from [z,y,x] to [x,z,y] and [y,z,x].
  • padding (str) – mode of padding that will be passed to the numpy.pad function.
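An intensity projection over a sliding window of slices can be sketched as follows. The function intensity_projection is hypothetical, works along the z-axis only (the 'axial' case), and omits padding, so windows that would run past the last slice are simply dropped.

```python
import numpy as np


def intensity_projection(scan, depth, stride=1, mode='max'):
    """Reduce every window of depth slices along z to a single slice."""
    reducers = {'max': np.max, 'min': np.min, 'mean': np.mean, 'median': np.median}
    reduce_fn = reducers[mode]
    # Window start positions; no padding is applied in this sketch.
    starts = range(0, scan.shape[0] - depth + 1, stride)
    return np.stack([reduce_fn(scan[s:s + depth], axis=0) for s in starts])


# Six one-pixel slices with values 0..5.
scan = np.arange(6, dtype=np.float64).reshape(6, 1, 1)
mip = intensity_projection(scan, depth=3, stride=2, mode='max')
mean_ip = intensity_projection(scan, depth=3, stride=2, mode='mean')
```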
classmethod merge(batches, batch_size=None)[source]

Concatenate list of batches and then split the result in two batches of sizes (batch_size, sum(lens of batches) - batch_size)

Parameters:
  • batches (list of batches) –
  • batch_size (int) – length of first resulting batch
Returns:

(new_batch, rest_batch)

Return type:

tuple of batches

Notes

Merge performs split (of middle-batch) and then two concats because of speed considerations.
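The sizes produced by merge can be illustrated on plain index arrays. The function merge_indices is hypothetical and follows the one-line description (concatenate, then split) rather than the split-then-concat order mentioned in the note; dropping None-entries and empty batches mirrors the behaviour documented for concat.

```python
import numpy as np


def merge_indices(batches, batch_size):
    """Concatenate index arrays and split them into (first, rest)."""
    # None-entries and empty batches are dropped, as documented for concat.
    batches = [b for b in batches if b is not None and len(b) > 0]
    merged = np.concatenate(batches)
    return merged[:batch_size], merged[batch_size:]


batches = [np.arange(3), None, np.arange(3, 7)]
new_batch, rest_batch = merge_indices(batches, batch_size=5)
```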

normalize_hu(min_hu=-1000, max_hu=400)[source]

Normalize HU-densities to interval [0, 255].

Trim HU that are outside range [min_hu, max_hu], then scale to [0, 255].

Parameters:
  • min_hu (int) – minimum value for hu that will be used as trimming threshold.
  • max_hu (int) – maximum value for hu that will be used as trimming threshold.
Returns:

Return type:

batch

Examples

>>> batch = batch.normalize_hu(min_hu=-1300, max_hu=600)
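The trim-then-scale step can be sketched in NumPy. The function normalize_hu_array is a hypothetical stand-in that operates on a plain array; whether the library rounds or casts the result is not stated in this section, so the sketch keeps floats.

```python
import numpy as np


def normalize_hu_array(images, min_hu=-1000, max_hu=400):
    """Clip HU values to [min_hu, max_hu] and scale them linearly to [0, 255]."""
    clipped = np.clip(images, min_hu, max_hu)
    return (clipped - min_hu) / (max_hu - min_hu) * 255.0


hu = np.array([-2000.0, -1000.0, -300.0, 400.0, 1000.0])
normalized = normalize_hu_array(hu)
```

Values below min_hu map to 0, values above max_hu map to 255, and -300 HU (the midpoint of the default range) maps to 127.5.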
rescale(new_shape)[source]

Recomputes spacing values for patients’ data after resize.

Parameters:new_shape (ndarray(dtype=np.int)) – shape of patient 3d array after resize, in format np.array([z_dim, y_dim, x_dim], dtype=np.int).
Returns:ndarray(n_patients, 3) with spacing values for each patient along z, y, x axes.
Return type:ndarray
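One plausible way to recompute spacing after a resize is to keep the physical extent of each scan unchanged. The function rescaled_spacing and this formula are assumptions for illustration, not taken from the library's source.

```python
import numpy as np


def rescaled_spacing(old_spacing, old_shape, new_shape):
    """Spacing after resize, assuming the physical extent stays the same."""
    old_spacing = np.asarray(old_spacing, dtype=float)
    old_shape = np.asarray(old_shape, dtype=float)
    new_shape = np.asarray(new_shape, dtype=float)
    # extent = spacing * shape is preserved, so new = old * old_shape / new_shape.
    return old_spacing * old_shape / new_shape


# Two hypothetical patients, (z, y, x) ordering.
old_spacing = np.array([[2.0, 0.5, 0.5], [1.0, 1.0, 1.0]])
old_shape = np.array([[100, 512, 512], [256, 256, 256]])
new_spacing = rescaled_spacing(old_spacing, old_shape, new_shape=(128, 256, 256))
```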
resize(patient, out_patient, res, shape=(128, 256, 256), method='pil-simd', axes_pairs=None, resample=None, order=3, *args, **kwargs)[source]

Resize (change shape of) each CT-scan in the batch.

When called from a batch, changes this batch.

Parameters:
  • shape (tuple, list or ndarray of int) – (z,y,x)-shape that should be AFTER resize. Note, that ct-scan dim_ordering also should be z,y,x
  • method (str) – interpolation package to be used. Either ‘pil-simd’ or ‘scipy’. Pil-simd gives better quality and speed on configurations with an average number of cores. Scipy, by contrast, scales better and can show better performance on systems with a large number of cores
  • axes_pairs (None or list/tuple of tuples with pairs) – pairs of axes that will be used for performing pil-simd resize, as this resize is made in 2d. Min number of pairs to use is 1, at max there can be 6 pairs. If None, set to ((0, 1), (1, 2)). The more pairs one uses, the more precise is the result. (and computation takes more time).
  • resample – filter of pil-simd resize. By default set to bilinear. Can be any of the filters supported by PIL.Image.
  • order (int) – the order of scipy-interpolation (<= 5). A larger value improves precision but slows down the computation.

Examples

>>> shape = (128, 256, 256)
>>> batch = batch.resize(shape=shape, order=2, method='scipy')
>>> batch = batch.resize(shape=shape, resample=PIL.Image.BILINEAR)
rotate(index, angle, components='images', axes=(1, 2), random=True, **kwargs)[source]

Rotate 3D images in batch on specific angle in plane.

Parameters:
  • angle (float) – degree of rotation.
  • components (tuple, list, ndarray of strings or str) – name(s) of components to rotate each item in it.
  • axes (tuple, list or ndarray of int) – (int, int), plane of rotation specified by two axes (zyx-ordering).
  • random (bool) – if True, then angle specifies the maximum angle of rotation.
Returns:

ndarray of 3D rotated image.

Return type:

ndarray

Notes

Zero padding is added automatically after rotation. Use this action at the end of pipelines for augmentation purposes, e.g. after sample_nodules().

Examples

Rotate images on 90 degrees:

>>> batch = batch.rotate(angle=90, axes=(1, 2), random=False)

Random rotation with maximum angle:

>>> batch = batch.rotate(angle=30, axes=(1, 2))
segment(erosion_radius=2, **kwargs)[source]

Segment lungs’ content from 3D array.

Parameters:erosion_radius (int) – radius of erosion to be performed.
Returns:
Return type:batch

Notes

Sets HU of every pixel outside lungs to DARK_HU = -2000.

Examples

>>> batch = batch.segment(erosion_radius=4, num_threads=20)
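The effect described in the note can be sketched given a ready lung mask. The function apply_lung_mask is hypothetical and does not compute the mask itself; it only shows how pixels outside the lungs are set to DARK_HU.

```python
import numpy as np

DARK_HU = -2000  # value taken from the note above


def apply_lung_mask(scan, lung_mask):
    """Keep HU inside the mask and set everything else to DARK_HU."""
    return np.where(lung_mask.astype(bool), scan, DARK_HU)


scan = np.array([[[-500.0, 30.0], [-700.0, 120.0]]])
lung_mask = np.array([[[1, 0], [1, 0]]])  # hypothetical mask: 1 inside lungs
segmented = apply_lung_mask(scan, lung_mask)
```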
slice_shape

Get shape of slice in yx-plane.

Returns:ndarray([y_dim, x_dim],dtype=np.int) with shape of scan slice.
Return type:ndarray
classmethod split(batch, batch_size)[source]

Split one batch in two batches.

The lengths of the two batches are batch_size and len(batch) - batch_size.

Parameters:
  • batch (Batch class instance) – batch to be split in two
  • batch_size (int) – length of first returned batch. If batch_size >= len(batch), return None instead of a 2nd batch
Returns:

(1st_Batch, 2nd_Batch)

Return type:

tuple of batches

Notes

Method does not change the structure of input Batch.index. Indices of output batches are simply subsets of input Batch.index.

unify_spacing(patient, out_patient, res, factor, shape_resize, spacing=(1, 1, 1), shape=(128, 256, 256), method='pil-simd', order=3, padding='edge', axes_pairs=None, resample=None, *args, **kwargs)[source]

Unify spacing of all patients.

Resize all patients to meet spacing, then crop/pad resized array to meet shape.

Parameters:
  • spacing (tuple, list or ndarray of float) – (z,y,x)-spacing after resize. Should be passed as key-argument.
  • shape (tuple, list or ndarray of int) – (z,y,x)-shape after crop/pad. Should be passed as key-argument.
  • method (str) – interpolation method (‘pil-simd’ or ‘scipy’). Should be passed as key-argument. See CTImagesBatch.resize for more information.
  • order (None or int) – order of scipy-interpolation (<=5), if used. Should be passed as key-argument.
  • padding (str) – mode of padding, any supported by np.pad. Should be passed as key-argument.
  • axes_pairs (tuple, list of tuples with pairs) – pairs of axes that will be used consecutively for performing pil-simd resize. Should be passed as key-argument.
  • resample (None or str) – filter of pil-simd resize. Should be passed as key-argument
  • patient (str) – index of patient, that worker is handling. Note: this argument is passed by inbatch_parallel
  • out_patient (ndarray) – result of individual worker after action. Note: this argument is passed by inbatch_parallel
  • res (ndarray) – New images to replace data inside images component. Note: this argument is passed by inbatch_parallel
  • factor (tuple) – (float), factor to make resize by. Note: this argument is passed by inbatch_parallel
  • shape_resize (tuple) – It is possible to provide the shape_resize argument (shape after resize) instead of spacing. The array of shape shape_resize is then cropped/padded to match the shape argument. Note that this argument is passed by inbatch_parallel

Notes

see CTImagesBatch.resize for more info about methods’ params.

Examples

>>> shape = (128, 256, 256)
>>> batch = batch.unify_spacing(shape=shape, spacing=(1.0, 1.0, 1.0),
                                order=2, method='scipy', padding='reflect')
>>> batch = batch.unify_spacing(shape=shape, spacing=(1.0, 1.0, 1.0),
                                resample=PIL.Image.BILINEAR)
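The two stages (resize to meet spacing, then crop/pad to meet shape) can be sketched without the interpolation itself. The functions resize_shape and crop_or_pad are hypothetical; rounding of the resized shape and the central placement of the crop and the padding are assumptions.

```python
import numpy as np


def resize_shape(old_shape, old_spacing, new_spacing):
    """Shape a scan must be resized to so that it has new_spacing."""
    factor = np.asarray(old_spacing, dtype=float) / np.asarray(new_spacing, dtype=float)
    return np.rint(np.asarray(old_shape) * factor).astype(int), factor


def crop_or_pad(scan, shape, padding='edge'):
    """Centrally crop axes that are too long and pad axes that are too short."""
    shape = np.asarray(shape)
    current = np.asarray(scan.shape)
    # Crop: remove the excess, split between both ends of the axis.
    excess = np.maximum(current - shape, 0)
    start = excess // 2
    stop = current - (excess - start)
    scan = scan[start[0]:stop[0], start[1]:stop[1], start[2]:stop[2]]
    # Pad: add the missing voxels, split between both ends of the axis.
    deficit = np.maximum(shape - np.asarray(scan.shape), 0)
    before = deficit // 2
    pad_width = [(int(b), int(d - b)) for b, d in zip(before, deficit)]
    return np.pad(scan, pad_width, mode=padding)


resized_shape, factor = resize_shape((100, 512, 512), (2.5, 0.7, 0.7), (1.0, 1.0, 1.0))
unified = crop_or_pad(np.zeros((6, 3, 4)), shape=(4, 4, 4))
```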
upper_bounds

Get upper bounds of patients data in CTImagesBatch.

Returns:ndarray(n_patients,) containing upper bounds of patients data along z-axis.
Return type:ndarray