Pipelines
Helper functions describing pipelines for creating large samples of nodules.
-
radio.pipelines.pipelines.combine_crops(cancer_set, non_cancer_set, batch_sizes=(10, 10), hu_lims=(-1000, 400))
Pipeline for generating batches of cancerous and non-cancerous crops from CT scans in a chosen proportion.
Parameters: - cancer_set (dataset) – dataset of cancerous crops in blosc format.
- non_cancer_set (dataset) – dataset of non-cancerous crops in blosc format.
- batch_sizes (tuple, list of int) – sequence of length 2: (num_cancer_batches, num_noncancer_batches).
- hu_lims (tuple, list of float) – sequence of length 2 giving the lower and upper limits of HU trimming in the normalize_hu action.
Return type: pipeline
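The effect of batch_sizes and hu_lims can be illustrated with plain NumPy, without the library itself. This is only a sketch under stated assumptions: the helpers trim_hu and combine are hypothetical names, each entry of batch_sizes is read as the number of crops of that kind in the combined batch, and HU trimming is modelled as simple clipping to hu_lims. The actual normalize_hu action may additionally rescale the values.

```python
import numpy as np

def trim_hu(crops, hu_lims=(-1000, 400)):
    """Clip Hounsfield-unit values to the closed range given by hu_lims."""
    low, high = hu_lims
    return np.clip(crops, low, high)

def combine(cancer_crops, non_cancer_crops, batch_sizes=(10, 10),
            hu_lims=(-1000, 400), seed=0):
    """Hypothetical helper: draw batch_sizes[0] cancerous and batch_sizes[1]
    non-cancerous crops, stack them and return the clipped batch with labels."""
    rng = np.random.default_rng(seed)
    n_cancer, n_non_cancer = batch_sizes
    cancer_idx = rng.choice(len(cancer_crops), size=n_cancer, replace=False)
    non_cancer_idx = rng.choice(len(non_cancer_crops), size=n_non_cancer, replace=False)
    crops = np.concatenate([cancer_crops[cancer_idx], non_cancer_crops[non_cancer_idx]])
    # Label 1 marks a cancerous crop, label 0 a non-cancerous one.
    labels = np.concatenate([np.ones(n_cancer, dtype=int),
                             np.zeros(n_non_cancer, dtype=int)])
    return trim_hu(crops, hu_lims), labels

# Toy stand-ins for the two crop datasets: arrays of shape (n_crops, z, y, x).
data_rng = np.random.default_rng(42)
cancer = data_rng.uniform(-2000, 2000, size=(30, 4, 8, 8))
non_cancer = data_rng.uniform(-2000, 2000, size=(50, 4, 8, 8))

batch, labels = combine(cancer, non_cancer, batch_sizes=(10, 10))
```

With the default batch_sizes=(10, 10), the sketch yields a batch of 20 crops, half of them labelled as cancerous, with all values inside the hu_lims range.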
-
radio.pipelines.pipelines.get_crops(nodules, fmt='raw', nodule_shape=(32, 64, 64), batch_size=20, share=0.5, histo=None, variance=(36, 144, 144), hu_lims=(-1000, 400), **kwargs)
Get pipeline that performs preprocessing and crops cancerous/non-cancerous nodules in a chosen proportion.
Parameters: - nodules (pd.DataFrame) – contains:
  - ‘seriesuid’: index of patient or series.
  - ‘z’, ‘y’, ‘x’: coordinates of the nodule's center.
  - ‘diameter’: diameter, in mm.
- fmt (str) – can be either ‘raw’, ‘blosc’ or ‘dicom’.
- nodule_shape (tuple, list or ndarray of int) – crop shape along (z,y,x).
- batch_size (int) – number of nodules in batch generated by pipeline.
- share (float) – share of cancer crops in the batch.
- histo (tuple) – numpy.histogramdd() output. Used for sampling non-cancerous crops.
- variance (tuple, list or ndarray of float) – variances of normally distributed random shifts of nodules’ start positions.
- hu_lims (tuple, list of float) – sequence of length 2 giving the lower and upper limits of HU trimming in the normalize_hu action.
- **kwargs –
  - spacing : tuple – (z,y,x) spacing after resize.
  - shape : tuple – (z,y,x) shape after crop/pad.
  - method : str – interpolation method (‘pil-simd’ or ‘resize’). See resize().
  - order : None or int – order of scipy-interpolation (<=5), if used.
  - padding : str – mode of padding, any supported by numpy.pad().
Return type: pipeline
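The variance parameter can be read as per-axis variances of a zero-mean normal shift applied to the crop's start position. The NumPy sketch below illustrates that reading. It is an assumption-laden example, not the library's implementation: the nodule center is a made-up voxel coordinate, and the unshifted crop is assumed to be centered on the nodule.

```python
import numpy as np

nodule_shape = np.array([32, 64, 64])   # crop shape along (z, y, x)
variance = np.array([36, 144, 144])     # per-axis variance of the random shift
center = np.array([60, 200, 180])       # hypothetical nodule center, in voxels

rng = np.random.default_rng(0)

# numpy's normal sampler takes a standard deviation, so convert the variances.
std = np.sqrt(variance)

# One zero-mean normal shift per axis.
shift = rng.normal(loc=0.0, scale=std)

# Assumed convention: the unshifted crop is centered on the nodule,
# and the shift moves its start corner.
start = np.rint(center - nodule_shape / 2 + shift).astype(int)
stop = start + nodule_shape
```

The default variance=(36, 144, 144) corresponds to standard deviations of (6, 12, 12), so in this reading the crop start typically moves by a few voxels along z and roughly twice as far along y and x.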
-
radio.pipelines.pipelines.split_dump(cancer_path, non_cancer_path, nodules, histo=None, fmt='raw', nodule_shape=(32, 64, 64), variance=(36, 144, 144), **kwargs)
Get pipeline for dumping cancerous crops in one folder and random non-cancerous crops in another.
Parameters: - cancer_path (str) – directory to dump cancerous crops in.
- non_cancer_path (str) – directory to dump non-cancerous crops in.
- nodules (pd.DataFrame) – contains:
  - ‘seriesuid’: index of patient or series.
  - ‘z’, ‘y’, ‘x’: coordinates of the nodule's center.
  - ‘diameter’: diameter, in mm.
- histo (tuple) – numpy.histogramdd() output. Used for sampling non-cancerous crops.
- fmt (str) – can be either ‘raw’, ‘blosc’ or ‘dicom’.
- nodule_shape (tuple, list or ndarray of int) – crop shape along (z,y,x).
- variance (tuple, list or ndarray of float) – variances of normally distributed random shifts of nodules’ start positions.
- **kwargs –
  - spacing : tuple – (z,y,x) spacing after resize.
  - shape : tuple – (z,y,x) shape after crop/pad.
  - method : str – interpolation method (‘pil-simd’ or ‘resize’). See resize() for more information.
  - order : None or int – order of scipy-interpolation (<=5), if used.
  - padding : str – mode of padding, any supported by numpy.pad().
Return type: pipeline
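The nodules argument shared by these functions is a pandas DataFrame with the columns listed above. A minimal sketch of such a table follows. All values, including the series identifiers, are made up for illustration, and the column check is a hypothetical convenience rather than something the library is known to perform.

```python
import pandas as pd

# Made-up annotations: two nodules in one scan, one nodule in another.
nodules = pd.DataFrame({
    'seriesuid': ['scan_001', 'scan_001', 'scan_002'],
    'z': [-120.5, -98.0, -210.3],
    'y': [35.2, -14.7, 60.1],
    'x': [-80.9, 42.3, 12.0],
    'diameter': [5.6, 12.3, 8.1],   # in mm
})

# Hypothetical sanity check that every documented column is present.
required = ['seriesuid', 'z', 'y', 'x', 'diameter']
missing = [col for col in required if col not in nodules.columns]

# Number of annotated nodules per scan.
per_scan = nodules.groupby('seriesuid').size()
```

Checking the columns before building a pipeline is cheap and gives a clearer error than a failure deep inside batch processing.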
-
radio.pipelines.pipelines.update_histo(nodules, histo, fmt='raw', **kwargs)
Pipeline for updating histogram using info in dataset of scans.
Parameters: - nodules (pd.DataFrame) – contains:
  - ‘seriesuid’: index of patient or series.
  - ‘z’, ‘y’, ‘x’: coordinates of the nodule's center.
  - ‘diameter’: diameter, in mm.
- histo (tuple) – numpy.histogramdd() output. Used for sampling non-cancerous crops (compare with the tuple (bins, edges) returned by numpy.histogramdd()).
- fmt (str) – can be either ‘raw’, ‘blosc’ or ‘dicom’.
- **kwargs –
  - spacing : tuple – (z,y,x) spacing after resize.
  - shape : tuple – (z,y,x) shape after crop/pad.
  - method : str – interpolation method (‘pil-simd’ or ‘resize’). See resize() for more information.
  - order : None or int – order of scipy-interpolation (<=5), if used.
  - padding : str – mode of padding, any supported by numpy.pad().
Return type: pipeline
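The histo argument has the same structure as the value returned by numpy.histogramdd(): a tuple of bin counts and per-axis bin edges. The sketch below builds such a tuple from made-up (z,y,x) positions and then draws new positions from it. The sampling scheme (pick a bin with probability proportional to its count, then a uniform point inside it) is an assumption chosen for illustration and may differ from what the library does.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up (z, y, x) positions standing in for locations seen in the scans.
centers = rng.uniform(low=0, high=[300, 400, 400], size=(500, 3))

# numpy.histogramdd returns (counts, edges); edges holds one array per axis.
histo = np.histogramdd(centers, bins=(4, 8, 8),
                       range=[(0, 300), (0, 400), (0, 400)])
counts, edges = histo

# Assumed sampling scheme: choose bins in proportion to their counts...
probs = counts.ravel() / counts.sum()
flat_idx = rng.choice(counts.size, size=5, p=probs)
bin_idx = np.stack(np.unravel_index(flat_idx, counts.shape), axis=1)

# ...then draw a uniform point inside each chosen bin.
lows = np.stack([edges[a][bin_idx[:, a]] for a in range(3)], axis=1)
highs = np.stack([edges[a][bin_idx[:, a] + 1] for a in range(3)], axis=1)
samples = rng.uniform(lows, highs)
```

Here counts has shape (4, 8, 8) and each entry of edges has one more element than the number of bins along its axis, which is the layout a histo tuple passed to these pipelines is expected to follow.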