CTImagesBatch
-
class radio.CTImagesBatch(index, *args, **kwargs)
Batch class for storing a batch of CT scans in 3D.
Contains three things:
- the component images, a 3D array of scans stacked along the number_of_slices (z) axis (aka “skyscraper”);
- the information needed to subset an individual patient’s 3D scan (_bounds, origin, spacing);
- various methods to preprocess the data.
Parameters: index (dataset.index) – IDs of the scans to be put in the batch.
-
components
tuple of strings – names of the data components of the batch: images, spacing and origin. NOTE: the base class requires this attribute to be implemented.
-
index
dataset.index – indices of the scans in the batch.
-
images
ndarray – CT scans of all patients in the batch.
-
spacing
ndarray of floats – distances between pixels in world coordinates.
-
origin
ndarray of floats – world coordinates of the (0, 0, 0) pixel of each scan.
-
calc_lung_mask(patient, out_patient, res, erosion_radius, **kwargs)
Return a mask for the lungs.
Parameters: erosion_radius (int) – radius of erosion to be performed.
-
central_crop(crop_size, **kwargs)
Make a crop of size crop_size from the center of each image.
Parameters: crop_size (tuple, list or ndarray of int) – (z,y,x)-shape of the crop.
Return type: batch
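The following is a minimal NumPy sketch of what a central crop of one scan amounts to. It assumes crop_size does not exceed the scan shape along any axis. The helper name center_crop_scan is hypothetical and not part of radio. When the size difference along an axis is odd, this sketch leaves the extra voxel on the far side.

```python
import numpy as np

def center_crop_scan(scan, crop_size):
    """Hypothetical helper: crop a (z, y, x) array around its center."""
    crop_size = np.asarray(crop_size)
    shape = np.asarray(scan.shape)
    if np.any(crop_size > shape):
        raise ValueError("crop_size must not exceed the scan shape")
    # Start index along each axis so that the crop is centered
    # (floor division: an odd leftover voxel stays on the far side).
    start = (shape - crop_size) // 2
    window = tuple(slice(s, s + c) for s, c in zip(start, crop_size))
    return scan[window]

scan = np.arange(4 * 6 * 6).reshape(4, 6, 6)
crop = center_crop_scan(scan, (2, 2, 2))
print(crop.shape)  # (2, 2, 2)
```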
-
components = ('images', 'spacing', 'origin')
-
classmethod concat(batches)
Concatenate several batches into one large batch.
Assumes that the same components are filled in all supplied batches.
Parameters: batches (list or tuple of batches) – sequence of batches to be concatenated.
Returns: large batch whose length is the sum of the lengths of the concatenated batches.
Return type: batch
Notes
The indices of the old batches are dropped; the new large batch gets a new np.arange index. None entries and batches of length 0 in the list are dropped during concatenation.
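The bookkeeping behind concatenating skyscrapers can be sketched in NumPy. The sketch assumes each batch is represented by a hypothetical (images, bounds) pair, where bounds = [0, b1, ..., bn] marks the floors at which consecutive scans start, as in the ndarray example of load. It illustrates the idea only and is not radio's implementation.

```python
import numpy as np

def concat_skyscrapers(parts):
    """Hypothetical helper: merge (images, bounds) pairs into one skyscraper.

    None entries and empty batches (bounds == [0]) are dropped, mirroring
    the behaviour described for concat.
    """
    parts = [p for p in parts if p is not None and len(p[1]) > 1]
    images = np.concatenate([imgs for imgs, _ in parts], axis=0)
    bounds = [0]
    offset = 0
    for imgs, bnds in parts:
        # Shift this batch's bounds by the number of slices stacked so far.
        bounds.extend(int(b) + offset for b in bnds[1:])
        offset += imgs.shape[0]
    return images, np.asarray(bounds)

a = np.zeros((3, 2, 2))   # two scans: 1 and 2 slices
b = np.ones((2, 2, 2))    # one scan of 2 slices
images, bounds = concat_skyscrapers([(a, [0, 1, 3]), None, (b, [0, 2])])
print(images.shape, bounds)  # (5, 2, 2) [0 1 3 5]
```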
-
dump(ix, dst, components=None, fmt='blosc', index_to_name=None, i8_encoding_mode=None)
Dump the chosen components of the scans’ batch into the folder dst in the specified format. When some of the components are None, a warning is printed and nothing is dumped. By default (components is None), dump attempts to dump all components.
Parameters: - dst (str) – destination folder where all patients’ data should be put.
- components (tuple, list, ndarray of strings or str) – component(s) to dump (any iterable of strings, or a single string). If not supplied, all components are dumped.
- fmt ('blosc') – format of the dump. Currently only the blosc format is supported; in this case a folder is created for each patient. The tree structure of the created files is shown in the example below.
- index_to_name (callable or None) – a function that maps each item’s index to the name of the item’s folder and returns a str. Each item is dumped into os.path.join(dst, index_to_name(items_index)). If None, no transformation is applied and the method attempts to use the indices of batch items as folder names.
- i8_encoding_mode (int, str or dict) – whether (and how) skyscraper-type components should be cast to int8. If None, no cast is performed. The cast saves disk space and speeds up batch loading. However, it loses precision, because skyscraper components are originally stored in float32 format. Accepted values:
  - an int: 0, 1 or 2;
  - a str: ‘linear’ or ‘quantization’;
  - None.
  0 or None disables the cast, 1 stands for ‘linear’ and 2 for ‘quantization’. Can also be a component-wise dict of modes, e.g.: {‘images’: ‘linear’, ‘masks’: 0}.
Examples
Initialize a batch, load data and dump it:
>>> ind = ['1ae34g90', '3hf82s76']
>>> batch = CTImagesBatch(ind)
>>> batch.load(...)
>>> batch.dump(dst='./data/blosc_preprocessed')
The commands above create the following files:
- ./data/blosc_preprocessed/1ae34g90/images/data.blk
- ./data/blosc_preprocessed/1ae34g90/images/data.shape
- ./data/blosc_preprocessed/1ae34g90/spacing/data.pkl
- ./data/blosc_preprocessed/1ae34g90/origin/data.pkl
- ./data/blosc_preprocessed/3hf82s76/images/data.blk
- ./data/blosc_preprocessed/3hf82s76/images/data.shape
- ./data/blosc_preprocessed/3hf82s76/spacing/data.pkl
- ./data/blosc_preprocessed/3hf82s76/origin/data.pkl
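The int8 cast trades precision for space: it stores 1 byte per voxel instead of 4 for float32. One possible linear min-max encoding is sketched below. The exact scheme behind radio's ‘linear’ mode is not described in this documentation. The functions encode_linear_int8 and decode_linear_int8 are hypothetical, and storing a (min, max) pair for decoding is an assumption.

```python
import numpy as np

def encode_linear_int8(data):
    """Hypothetical helper: map data linearly onto int8 codes in [-128, 127].

    Returns the codes and the (min, max) pair a decoder needs.
    """
    lo, hi = float(data.min()), float(data.max())
    # Width of one int8 step in the original units (guard constant input).
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    codes = np.round((data - lo) / scale) - 128
    return np.clip(codes, -128, 127).astype(np.int8), (lo, hi)

def decode_linear_int8(codes, lo_hi):
    """Hypothetical inverse of encode_linear_int8 (lossy: error up to scale / 2)."""
    lo, hi = lo_hi
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    return ((codes.astype(np.float32) + 128) * scale + lo).astype(np.float32)

data = np.linspace(-1000, 400, 50, dtype=np.float32)  # float32 HU-like values
codes, lo_hi = encode_linear_int8(data)
restored = decode_linear_int8(codes, lo_hi)
print(codes.dtype, codes.nbytes, data.nbytes)  # int8 50 200
```

With a 1400 HU range, one int8 step is about 5.5 HU, so the round-trip error stays below about 2.75 HU.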
-
flip(patient, out_patient, res)
Invert the order of slices for each patient.
Return type: batch
Examples
>>> batch = batch.flip()
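Flipping happens per patient. In the skyscraper layout, each scan's slices are reversed within its own bounds; the stack as a whole is not reversed. The NumPy sketch below assumes bounds = [0, b1, ..., bn] as in the ndarray example of load. flip_skyscraper is a hypothetical helper.

```python
import numpy as np

def flip_skyscraper(images, bounds):
    """Hypothetical helper: reverse the slice order of every scan in a skyscraper.

    Each scan is flipped independently, so the scans keep their order in the stack.
    """
    out = np.empty_like(images)
    for lower, upper in zip(bounds[:-1], bounds[1:]):
        out[lower:upper] = images[lower:upper][::-1]
    return out

images = np.arange(5).reshape(5, 1, 1)   # slices 0..4: one scan of 2, one of 3
flipped = flip_skyscraper(images, [0, 2, 5])
print(flipped.ravel())  # [1 0 4 3 2]
```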
-
get_axial_slice(person_number, slice_height)
Get an axial slice (e.g., for plots).
Return type: ndarray (view)
Examples
Here self.index[5] is usually something like ‘a1de03fz29kf6h2’:
>>> patch = batch.get_axial_slice(5, 0.6)
>>> patch = batch.get_axial_slice(self.index[5], 0.6)
-
get_patches(patch_shape, stride, padding='edge', data_attr='images')
Extract patches of shape patch_shape with the specified stride.
Returns: 4d-ndarray of patches; the first dimension enumerates patches.
Return type: ndarray
Notes
All patients’ data must have the same shape at this step, so resize or unify_spacing must be applied beforehand.
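Strided patch extraction for a single scan can be sketched in NumPy. The sketch assumes the scan is padded only at the end of each axis, with the np.pad mode given by padding, just enough for the strided grid to cover it. How radio distributes the padding is not specified here, and extract_patches is a hypothetical name.

```python
import numpy as np
from itertools import product

def extract_patches(scan, patch_shape, stride, padding="edge"):
    """Hypothetical helper: cut a 3D scan into patches on a strided grid."""
    patch_shape = np.asarray(patch_shape)
    stride = np.asarray(stride)
    shape = np.asarray(scan.shape)
    # Number of grid positions per axis needed to cover the whole scan.
    num = np.ceil(np.maximum(shape - patch_shape, 0) / stride).astype(int) + 1
    padded_shape = (num - 1) * stride + patch_shape
    pad_width = [(0, int(p - s)) for p, s in zip(padded_shape, shape)]
    padded = np.pad(scan, pad_width, mode=padding)
    patches = []
    # Iterate over the grid in z-y-x (row-major) order.
    for idx in product(*(range(n) for n in num)):
        start = np.asarray(idx) * stride
        window = tuple(slice(s, s + p) for s, p in zip(start, patch_shape))
        patches.append(padded[window])
    return np.stack(patches)  # shape: (num_patches, z, y, x)

scan = np.arange(5 * 6 * 6, dtype=float).reshape(5, 6, 6)
patches = extract_patches(scan, (2, 4, 4), (2, 2, 2))
print(patches.shape)  # (12, 2, 4, 4)
```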
-
get_pos(data, component, index)
Return the position of an item for a given index in data or in self.`component`.
Fetches the correct position of an item inside the batch. It looks for the item in data if provided, or otherwise in the component of self.
Parameters: - data (None or ndarray) – data from which subsetting is done. If None, the position is retrieved from the component of the batch; if ndarray, index is returned.
- component (str) – name of a component, e.g. ‘images’. If component is provided, data should be None.
- index (str or int) – index of the item to look for. May be a key from the dataset (str) or a position inside the batch (int).
Returns: Position of the item
Notes
This overrides get_pos from the base Batch class; see the corresponding docstring for a detailed explanation.
-
images_shape
Get the shapes of all 3D scans in CTImagesBatch.
Returns: shapes of data for each patient, ndarray(patient_pos, 3) Return type: ndarray
-
load(fmt='dicom', components=None, bounds=None, **kwargs)
Load 3D scan data into the batch.
Parameters: - fmt (str) – type of data. Can be ‘dicom’, ‘blosc’, ‘raw’ or ‘ndarray’.
- components (tuple, list, ndarray of strings or str) – names of the batch component(s) to load. Currently this works only if fmt=‘blosc’. If fmt != ‘blosc’, all available components are loaded. If None and fmt=‘blosc’, all components are loaded as well.
- bounds (ndarray(n_patients + 1, dtype=np.int) or None) – needed if and only if fmt=‘ndarray’. Floor bounds of the items in the skyscraper (stacked scans).
- **kwargs –
  - images : ndarray(n_patients * z, y, x) or None – needed only if fmt=‘ndarray’. Input array containing the skyscraper (stacked scans).
  - origin : ndarray(n_patients, 3) or None – needed only if fmt=‘ndarray’. Origins of the scans in world coordinates.
  - spacing : ndarray(n_patients, 3) or None – needed only if fmt=‘ndarray’. Spacings of the patients’ scans along the z, y, x axes.
Return type: self
Examples
DICOM example: initialize a batch for the scans matched by FilesIndex and load them:
>>> index = FilesIndex(path="/some/path/*.dcm", no_ext=True)
>>> batch = CTImagesBatch(index)
>>> batch.load(fmt='dicom')
Ndarray example
images_array stores a set of 3D scans concatenated along axis 0 (a “skyscraper”). Say, it is an ndarray with shape (400, 256, 256).
bounds is an ndarray of the floor bounds of the scans. Say, bounds = np.asarray([0, 100, 400]), so the first scan occupies floors 0 to 99 and the second floors 100 to 399.
>>> batch.load(fmt='ndarray', images=images_array, bounds=bounds)
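With this layout, item i of the skyscraper is assumed to occupy the half-open range bounds[i] to bounds[i + 1] along the z-axis. This is consistent with bounds ending at 400 for a 400-slice array. Bounds can also be rebuilt from per-scan slice counts with a cumulative sum. The small NumPy illustration below uses tiny 4x4 slices instead of 256x256 to keep it light.

```python
import numpy as np

images_array = np.zeros((400, 4, 4), dtype=np.float32)  # skyscraper of 2 scans
bounds = np.asarray([0, 100, 400])

# Assumed half-open layout: scan i is images_array[bounds[i]:bounds[i + 1]].
scans = [images_array[lo:hi] for lo, hi in zip(bounds[:-1], bounds[1:])]
print([scan.shape for scan in scans])  # [(100, 4, 4), (300, 4, 4)]

# Going the other way: stack scans and build bounds from their slice counts.
rebuilt = np.concatenate(scans, axis=0)
rebuilt_bounds = np.concatenate([[0], np.cumsum([len(s) for s in scans])])
print(rebuilt.shape, rebuilt_bounds.tolist())  # (400, 4, 4) [0, 100, 400]
```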
-
load_from_patches(patches, stride, scan_shape, data_attr='images')
Get a skyscraper from a 4D array of patches and put it into the data_attr component of the batch.
This allows reconstructing the original skyscraper from patches, provided the same arguments are passed as were used to extract them.
Parameters: - patches (ndarray) – 4d-array of patches, with dims: (num_patches, z, y, x).
- scan_shape (tuple, list or ndarray of int) – (z,y,x)-shape of individual scan (should be same for all scans).
- stride (tuple, list or ndarray of int) – (z,y,x)-stride step used for gathering data from patches.
- data_attr (str) – batch component name to store new data.
Notes
If stride differs from the patch shape, overlapping regions are averaged. scan_shape, the shape of patches and stride are used to infer the number of items in the new skyscraper. If the patches were padded, the padding is removed from the skyscraper.
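Averaging overlapped regions amounts to accumulating patch values and hit counts on the (padded) grid, then dividing. The NumPy sketch below handles one scan. It assumes the patches were cut in z-y-x grid order from a scan padded only at the end of each axis. patches_to_scan is a hypothetical name, and radio's actual grid conventions may differ.

```python
import numpy as np
from itertools import product

def patches_to_scan(patches, stride, scan_shape):
    """Hypothetical helper: rebuild one (z, y, x) scan from its patches.

    Assumes patches of shape (num_patches, pz, py, px) were cut in z-y-x grid
    order from a scan padded only at the end of each axis.
    """
    patch_shape = np.asarray(patches.shape[1:])
    stride = np.asarray(stride)
    scan_shape = np.asarray(scan_shape)
    num = np.ceil(np.maximum(scan_shape - patch_shape, 0) / stride).astype(int) + 1
    padded_shape = tuple(int(v) for v in (num - 1) * stride + patch_shape)
    total = np.zeros(padded_shape)
    counts = np.zeros(padded_shape)
    for patch, idx in zip(patches, product(*(range(n) for n in num))):
        start = np.asarray(idx) * stride
        window = tuple(slice(s, s + p) for s, p in zip(start, patch_shape))
        total[window] += patch    # accumulate values of overlapping patches
        counts[window] += 1       # and how many patches cover each voxel
    # Average overlaps; voxels never covered (stride > patch size) stay 0.
    result = total / np.maximum(counts, 1)
    # Drop the end padding.
    return result[tuple(slice(0, int(s)) for s in scan_shape)]

rng = np.random.default_rng(0)
scan = rng.random((4, 6, 6))
# Patches of shape (2, 4, 4) with stride (2, 2, 2) overlap along y and x.
patches = np.stack([scan[z:z + 2, y:y + 4, x:x + 4]
                    for z in (0, 2) for y in (0, 2) for x in (0, 2)])
restored = patches_to_scan(patches, (2, 2, 2), scan.shape)
print(np.allclose(restored, scan))  # True
```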
-
lower_bounds
Get the lower bounds of the patients’ data in CTImagesBatch.
Returns: ndarray(n_patients,) containing lower bounds of patients data along z-axis. Return type: ndarray
-
make_xip(depth, stride=1, mode='max', projection='axial', padding='reflect', **kwargs)
Make an intensity projection (maximum, minimum, mean or median).
Note that the projection axis is chosen according to the projection argument.
Parameters: - depth (int) – number of slices over which the xip operation is performed.
- stride (int) – stride step along the projection dimension.
- mode (str) – possible values are ‘max’, ‘min’, ‘mean’ or ‘median’.
- projection (str) – possible values are ‘axial’, ‘coronal’ and ‘sagital’. For the ‘coronal’ and ‘sagital’ projections, the tensor is transposed from [z,y,x] to [x,z,y] and [y,z,x], respectively.
- padding (str) – padding mode passed to the numpy.pad function.
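For the axial projection, the operation reduces windows of depth consecutive slices, stepping by stride along z. The NumPy sketch below handles a single scan. Unlike make_xip, it applies no padding, so any incomplete trailing window is dropped. axial_xip is a hypothetical name.

```python
import numpy as np

def axial_xip(scan, depth, stride=1, mode="max"):
    """Hypothetical helper: project windows of `depth` slices along z (axis 0).

    No padding is applied, so an incomplete trailing window is dropped.
    """
    reducers = {"max": np.max, "min": np.min, "mean": np.mean, "median": np.median}
    reducer = reducers[mode]
    starts = range(0, scan.shape[0] - depth + 1, stride)
    return np.stack([reducer(scan[s:s + depth], axis=0) for s in starts])

scan = np.arange(6, dtype=float).reshape(6, 1, 1)
print(axial_xip(scan, depth=2, stride=2).ravel())  # [1. 3. 5.]
```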
-
classmethod merge(batches, batch_size=None)
Concatenate a list of batches and then split the result into two batches of sizes (batch_size, sum(lens of batches) - batch_size).
Parameters: - batches (list of batches) – batches to merge.
- batch_size (int) – length of the first resulting batch.
Returns: (new_batch, rest_batch)
Return type: tuple of batches
Notes
For speed, merge splits only the middle batch (the one in which the cut falls) and then performs two concats.
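The split-the-middle strategy from the Notes can be illustrated with plain Python lists standing in for batches. This is a simplification: merge_lists is hypothetical and ignores indices and components. Only the batch in which the cut at batch_size falls is split, and the pieces on either side of the cut are concatenated.

```python
def merge_lists(batches, batch_size):
    """Hypothetical helper: split concatenated lists into (first batch_size items, rest)."""
    batches = list(batches)
    cumulative = 0
    for i, batch in enumerate(batches):
        if cumulative + len(batch) >= batch_size:
            # The cut falls inside batch i: split only this "middle" batch.
            cut = batch_size - cumulative
            head = batches[:i] + [batch[:cut]]
            tail = [batch[cut:]] + batches[i + 1:]
            break
        cumulative += len(batch)
    else:
        # batch_size exceeds the total length: everything goes into the first part.
        head, tail = batches, []
    # Two concatenations, one for each side of the cut.
    first = [item for part in head for item in part]
    rest = [item for part in tail for item in part]
    return first, rest

first, rest = merge_lists([[1, 2], [3, 4, 5], [6]], batch_size=3)
print(first, rest)  # [1, 2, 3] [4, 5, 6]
```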
-
normalize_hu(min_hu=-1000, max_hu=400)
Normalize HU densities to the interval [0, 255].
Clip HU values outside the range [min_hu, max_hu], then scale to [0, 255].
Parameters: - min_hu (int) – lower bound of the HU range.
- max_hu (int) – upper bound of the HU range.
Return type: batch
Examples
>>> batch = batch.normalize_hu(min_hu=-1300, max_hu=600)
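The clip-and-scale step can be written in NumPy as below. normalize_hu_array is a hypothetical standalone function. It returns floats; whether radio casts the result to another dtype is not stated here.

```python
import numpy as np

def normalize_hu_array(images, min_hu=-1000, max_hu=400):
    """Hypothetical helper: clip HU to [min_hu, max_hu], then map that range linearly onto [0, 255]."""
    clipped = np.clip(images, min_hu, max_hu)
    return (clipped - min_hu) / (max_hu - min_hu) * 255

hu = np.array([-2000.0, -1000.0, -300.0, 400.0, 3000.0])
print(normalize_hu_array(hu).tolist())  # [0.0, 0.0, 127.5, 255.0, 255.0]
```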
-
rescale(new_shape)
Recompute spacing values for patients’ data after a resize.
Parameters: new_shape (ndarray(dtype=np.int)) – shape of patient 3d array after resize, in format np.array([z_dim, y_dim, x_dim], dtype=np.int). Returns: ndarray(n_patients, 3) with spacing values for each patient along z, y, x axes. Return type: ndarray
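The usual relation, assumed here, is that a resize preserves each scan's physical extent (spacing × shape) along every axis. It follows that new_spacing = old_spacing × old_shape / new_shape. The NumPy illustration below uses made-up values for one patient.

```python
import numpy as np

spacing = np.array([[2.5, 0.7, 0.7]])     # (n_patients, 3): z, y, x spacing before resize
old_shape = np.array([[100, 512, 512]])   # (n_patients, 3): scan shapes before resize
new_shape = np.array([128, 256, 256])     # common shape after resize

# Assumption: the physical extent spacing * shape is unchanged by the resize.
new_spacing = spacing * old_shape / new_shape
print(new_spacing.tolist())
```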
-
resize(patient, out_patient, res, shape=(128, 256, 256), method='pil-simd', axes_pairs=None, resample=None, order=3, *args, **kwargs)
Resize (change the shape of) each CT scan in the batch.
When called from a batch, changes this batch.
Parameters: - shape (tuple, list or ndarray of int) – (z,y,x)-shape that the scans should have AFTER the resize. Note that the CT scan dimension ordering should also be z,y,x.
- method (str) – interpolation package to be used, either ‘pil-simd’ or ‘scipy’. Pil-simd gives better quality and speed on machines with an average number of cores. Scipy scales better and can perform better on systems with a large number of cores.
- axes_pairs (None or list/tuple of tuples with pairs) – pairs of axes used for the pil-simd resize, which is performed in 2D. At least 1 and at most 6 pairs can be used. If None, set to ((0, 1), (1, 2)). The more pairs are used, the more precise the result (and the longer the computation).
- resample – filter of the pil-simd resize. By default set to bilinear. Can be any of the filters supported by PIL.Image.
- order (int) – order of the scipy interpolation (<= 5). A larger value improves precision but slows down the computation.
Examples
>>> shape = (128, 256, 256)
>>> batch = batch.resize(shape=shape, order=2, method='scipy')
>>> batch = batch.resize(shape=shape, resample=PIL.Image.BILINEAR)
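For method='scipy', resizing one scan to a target shape can be sketched with scipy.ndimage.zoom, whose order argument is the spline order (0 to 5). Whether radio calls zoom exactly like this is an assumption, and resize_scan is a hypothetical helper.

```python
import numpy as np
from scipy import ndimage

def resize_scan(scan, shape, order=3):
    """Hypothetical helper: resample a (z, y, x) scan to `shape` via spline interpolation."""
    factors = np.asarray(shape, dtype=float) / np.asarray(scan.shape)
    # zoom rounds input_shape * factor to get the output shape.
    return ndimage.zoom(scan, factors, order=order)

scan = np.random.default_rng(0).random((10, 20, 20))
print(resize_scan(scan, (5, 8, 8), order=2).shape)  # (5, 8, 8)
```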
-
rotate(index, angle, components='images', axes=(1, 2), random=True, **kwargs)
Rotate the 3D images in the batch by a given angle in a plane.
Parameters: - angle (float) – rotation angle in degrees.
- components (tuple, list, ndarray of strings or str) – name(s) of the components whose items are rotated.
- axes (tuple, list or ndarray of int) – (int, int), plane of rotation specified by two axes (zyx-ordering).
- random (bool) – if True, angle specifies the maximum angle of rotation.
Returns: ndarray of 3D rotated image.
Return type: ndarray
Notes
Zero padding is automatically added after rotation. Use this action at the end of pipelines for augmentation purposes, e.g. after sample_nodules().
Examples
Rotate images by 90 degrees:
>>> batch = batch.rotate(angle=90, axes=(1, 2), random=False)
Random rotation with maximum angle:
>>> batch = batch.rotate(angle=30, axes=(1, 2))
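Rotating one scan in the (y, x) plane with zero fill can be sketched with scipy.ndimage.rotate. Here reshape=False keeps the original shape and cval=0 provides the zero padding. The sketch makes two assumptions: the random angle is drawn uniformly from [-angle, angle], and linear interpolation is used. rotate_scan is hypothetical.

```python
import numpy as np
from scipy import ndimage

def rotate_scan(scan, angle, axes=(1, 2), random=True, rng=None):
    """Hypothetical helper: rotate a (z, y, x) scan in the plane of `axes`."""
    if random:
        rng = np.random.default_rng() if rng is None else rng
        # Assumption: `angle` is the maximum magnitude of a uniform random angle.
        angle = rng.uniform(-angle, angle)
    # reshape=False keeps the input shape; regions rotated in from outside get cval=0.
    return ndimage.rotate(scan, angle, axes=axes, reshape=False,
                          order=1, mode="constant", cval=0.0)

scan = np.zeros((2, 5, 5))
scan[:, 1, 2] = 1.0  # one bright voxel per slice, off the rotation center
rotated = rotate_scan(scan, 90, random=False)
print(rotated.shape, round(float(rotated.sum()), 6))  # (2, 5, 5) 2.0
```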
-
segment(erosion_radius=2, **kwargs)
Segment the lungs’ content from the 3D array.
Parameters: erosion_radius (int) – radius of erosion to be performed.
Return type: batch
Notes
Sets HU of every pixel outside lungs to DARK_HU = -2000.
Examples
>>> batch = batch.segment(erosion_radius=4, num_threads=20)
-
slice_shape
Get the shape of a slice in the yx-plane.
Returns: ndarray([y_dim, x_dim],dtype=np.int) with shape of scan slice. Return type: ndarray
-
classmethod split(batch, batch_size)
Split one batch into two batches.
The lengths of the two batches are batch_size and len(batch) - batch_size.
Parameters: - batch (Batch class instance) – batch to be split in two.
- batch_size (int) – length of first returned batch. If batch_size >= len(batch), return None instead of a 2nd batch
Returns: (1st_Batch, 2nd_Batch)
Return type: tuple of batches
Notes
Method does not change the structure of input Batch.index. Indices of output batches are simply subsets of input Batch.index.
-
unify_spacing(patient, out_patient, res, factor, shape_resize, spacing=(1, 1, 1), shape=(128, 256, 256), method='pil-simd', order=3, padding='edge', axes_pairs=None, resample=None, *args, **kwargs)
Unify the spacing of all patients.
Resize all patients to match spacing, then crop/pad the resized arrays to match shape.
Parameters: - spacing (tuple, list or ndarray of float) – (z,y,x)-spacing after resize. Should be passed as key-argument.
- shape (tuple, list or ndarray of int) – (z,y,x)-shape after crop/pad. Should be passed as key-argument.
- method (str) – interpolation method (‘pil-simd’ or ‘scipy’). Should be passed as key-argument. See CTImagesBatch.resize for more information.
- order (None or int) – order of scipy-interpolation (<=5), if used. Should be passed as key-argument.
- padding (str) – mode of padding, any supported by np.pad. Should be passed as key-argument.
- axes_pairs (tuple, list of tuples with pairs) – pairs of axes that will be used consecutively for performing pil-simd resize. Should be passed as key-argument.
- resample (None or str) – filter of pil-simd resize. Should be passed as key-argument
- patient (str) – index of patient, that worker is handling. Note: this argument is passed by inbatch_parallel
- out_patient (ndarray) – result of individual worker after action. Note: this argument is passed by inbatch_parallel
- res (ndarray) – New images to replace data inside images component. Note: this argument is passed by inbatch_parallel
- factor (tuple of float) – factor to resize by. Note: this argument is passed by inbatch_parallel
- shape_resize (tuple) – shape after resize; can be provided instead of spacing. The array of shape shape_resize is then cropped/padded to shape. Note: this argument is passed by inbatch_parallel
Notes
See CTImagesBatch.resize for more information about the methods’ parameters.
Examples
>>> shape = (128, 256, 256)
>>> batch = batch.unify_spacing(shape=shape, spacing=(1.0, 1.0, 1.0), order=2, method='scipy', padding='reflect')
>>> batch = batch.unify_spacing(shape=shape, spacing=(1.0, 1.0, 1.0), resample=PIL.Image.BILINEAR)
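The shape arithmetic behind unify_spacing, plus a crop/pad step, can be sketched as follows. The sketch assumes factor = old_spacing / new_spacing, so the physical extent is preserved, and that cropping and padding are centered. crop_or_pad is a hypothetical helper; the actual resize step would be done as in resize.

```python
import numpy as np

def crop_or_pad(scan, shape, padding="edge"):
    """Hypothetical helper: center-crop or pad (np.pad mode `padding`) each axis to `shape`."""
    out = scan
    for axis, target in enumerate(shape):
        size = out.shape[axis]
        if size > target:
            start = (size - target) // 2
            out = np.take(out, np.arange(start, start + target), axis=axis)
        elif size < target:
            before = (target - size) // 2
            width = [(0, 0)] * out.ndim
            width[axis] = (before, target - size - before)
            out = np.pad(out, width, mode=padding)
    return out

old_spacing = np.array([2.5, 0.7, 0.7])
old_shape = np.array([100, 512, 512])
new_spacing = np.array([1.0, 1.0, 1.0])
# Assumption: resizing by this factor keeps spacing * shape constant.
factor = old_spacing / new_spacing
shape_resize = np.rint(old_shape * factor).astype(int)
print(shape_resize.tolist())  # [250, 358, 358]

scan = np.arange(3 * 2 * 5).reshape(3, 2, 5)
print(crop_or_pad(scan, (1, 4, 5)).shape)  # (1, 4, 5)
```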
-
upper_bounds
Get the upper bounds of the patients’ data in CTImagesBatch.
Returns: ndarray(n_patients,) containing upper bounds of patients data along z-axis. Return type: ndarray
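The bounds-related properties (lower_bounds, upper_bounds, images_shape) can be read off an array of floor bounds. The NumPy sketch below assumes the internal _bounds = [0, b1, ..., bn], as in the ndarray example of load. It also assumes half-open intervals, so each upper bound is one past the last floor of its scan. Whether radio reports upper bounds this way is not confirmed here.

```python
import numpy as np

images = np.zeros((400, 4, 4))      # skyscraper of two scans (tiny 4x4 slices)
bounds = np.asarray([0, 100, 400])  # stands in for the internal _bounds

lower_bounds = bounds[:-1]          # first floor of each scan
upper_bounds = bounds[1:]           # assumed: one past the last floor of each scan
slice_shape = np.asarray(images.shape[1:])  # (y_dim, x_dim), shared by all scans
images_shape = np.column_stack([upper_bounds - lower_bounds,
                                np.tile(slice_shape, (len(lower_bounds), 1))])
print(images_shape.tolist())  # [[100, 4, 4], [300, 4, 4]]
```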
-