Pipelines

Helper functions that build pipelines for creating large samples of nodules.

radio.pipelines.pipelines.combine_crops(cancer_set, non_cancer_set, batch_sizes=(10, 10), hu_lims=(-1000, 400))

Get pipeline that generates batches of cancerous and non-cancerous crops from CT scans in a chosen proportion.

Parameters:
  • cancer_set (dataset) – dataset of cancerous crops in blosc format.
  • non_cancer_set (dataset) – dataset of non-cancerous crops in blosc format.
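  • batch_sizes (tuple, list of int) – sequence of length 2, (num_cancer_batches, num_noncancer_batches); the two values set the proportion of cancerous to non-cancerous crops.
  • hu_lims (tuple, list of float) – sequence of length 2 giving the lower and upper limits for HU trimming in the normalize_hu action.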
Return type: pipeline
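The effect of batch_sizes can be pictured with a small stand-alone sketch. It does not use RadIO: mix_batches and the integer stand-ins for crops are hypothetical, and the sketch assumes that each entry of batch_sizes is the number of crops of that class drawn into every combined batch.

```python
import random


def mix_batches(cancer, non_cancer, batch_sizes=(10, 10), seed=0):
    """Yield shuffled batches holding a fixed number of items per class.

    Hypothetical illustration only: `cancer` and `non_cancer` are plain
    sequences standing in for datasets of crops.
    """
    rng = random.Random(seed)
    n_cancer, n_non_cancer = batch_sizes
    cancer, non_cancer = list(cancer), list(non_cancer)
    rng.shuffle(cancer)
    rng.shuffle(non_cancer)
    # Stop as soon as either pool cannot fill its part of a batch.
    n_batches = min(len(cancer) // n_cancer, len(non_cancer) // n_non_cancer)
    for i in range(n_batches):
        batch = [(c, 1) for c in cancer[i * n_cancer:(i + 1) * n_cancer]]
        batch += [(c, 0) for c in non_cancer[i * n_non_cancer:(i + 1) * n_non_cancer]]
        rng.shuffle(batch)  # interleave the two classes inside the batch
        yield batch


# 20 "cancerous" and 60 "non-cancerous" items, mixed 2:6 per batch.
batches = list(mix_batches(range(20), range(100, 160), batch_sizes=(2, 6)))
```

Each yielded batch is a list of (item, label) pairs with label 1 for the first pool and 0 for the second, so the class proportion is fixed by batch_sizes rather than by the sizes of the two datasets.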

radio.pipelines.pipelines.get_crops(nodules, fmt='raw', nodule_shape=(32, 64, 64), batch_size=20, share=0.5, histo=None, variance=(36, 144, 144), hu_lims=(-1000, 400), **kwargs)

Get pipeline that performs preprocessing and crops cancerous/non-cancerous nodules in a chosen proportion.

Parameters:
  • nodules (pd.DataFrame) –
    contains:
    • 'seriesuid': index of patient or series.
    • 'z', 'y', 'x': coordinates of the nodule's center.
    • 'diameter': diameter, in mm.
  • fmt (str) – one of ‘raw’, ‘blosc’ or ‘dicom’.
  • nodule_shape (tuple, list or ndarray of int) – crop shape along (z,y,x).
  • batch_size (int) – number of nodules in batch generated by pipeline.
  • share (float) – share of cancer crops in the batch.
  • histo (tuple) – numpy.histogramdd() output. Used for sampling non-cancerous crops.
  • variance (tuple, list or ndarray of float) – variances of the normally distributed random shifts applied to the nodules’ start positions.
  • hu_lims (tuple, list of float) – sequence of length 2 giving the lower and upper limits for HU trimming in the normalize_hu action.
  • **kwargs –
    • spacing (tuple) – (z,y,x) spacing after resize.
    • shape (tuple) – (z,y,x) shape after crop/pad.
    • method (str) – interpolation method (‘pil-simd’ or ‘resize’). See resize().
    • order (None or int) – order of scipy-interpolation (<=5), if used.
    • padding (str) – mode of padding, any supported by numpy.pad().
Return type: pipeline
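How batch_size, share and variance interact can be sketched without RadIO. The helpers split_batch and jitter_start below are hypothetical. The sketch assumes that share is rounded to a whole number of cancerous crops per batch and that the shift along each axis is drawn independently from a zero-mean normal distribution with the given variance; the library may handle both details differently.

```python
import math
import random


def split_batch(batch_size=20, share=0.5):
    """Return (n_cancer, n_non_cancer) for one batch."""
    n_cancer = int(round(batch_size * share))
    return n_cancer, batch_size - n_cancer


def jitter_start(start, variance=(36, 144, 144), rng=random):
    """Shift a (z, y, x) start position by zero-mean normal noise.

    random.gauss expects a standard deviation, so each variance is
    converted with a square root first.
    """
    return tuple(
        s + int(round(rng.gauss(0.0, math.sqrt(v))))
        for s, v in zip(start, variance)
    )


rng = random.Random(0)
counts = split_batch(batch_size=20, share=0.5)
shifted = jitter_start((100, 200, 200), rng=rng)
```

With the default variance=(36, 144, 144) the standard deviations are 6 voxels along z and 12 voxels along y and x, so crops are shifted less along z than within the slice.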

radio.pipelines.pipelines.split_dump(cancer_path, non_cancer_path, nodules, histo=None, fmt='raw', nodule_shape=(32, 64, 64), variance=(36, 144, 144), **kwargs)

Get pipeline that dumps cancerous crops into one folder and random non-cancerous crops into another.

Parameters:
  • cancer_path (str) – directory to dump cancerous crops in.
  • non_cancer_path (str) – directory to dump non-cancerous crops in.
  • nodules (pd.DataFrame) –
    contains:
    • 'seriesuid': index of patient or series.
    • 'z', 'y', 'x': coordinates of the nodule's center.
    • 'diameter': diameter, in mm.
  • histo (tuple) – numpy.histogramdd() output. Used for sampling non-cancerous crops.
  • fmt (str) – one of ‘raw’, ‘blosc’ or ‘dicom’.
  • nodule_shape (tuple, list or ndarray of int) – crop shape along (z,y,x).
  • variance (tuple, list or ndarray of float) – variances of the normally distributed random shifts applied to the nodules’ start positions.
  • **kwargs –
    • spacing (tuple) – (z,y,x) spacing after resize.
    • shape (tuple) – (z,y,x) shape after crop/pad.
    • method (str) – interpolation method (‘pil-simd’ or ‘resize’). See resize() for more information.
    • order (None or int) – order of scipy-interpolation (<=5), if used.
    • padding (str) – mode of padding, any supported by numpy.pad().
Return type: pipeline
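The idea of routing crops into two folders can be shown with a stdlib-only sketch. It is not the RadIO implementation: dump_split, the (name, data, is_cancer) tuples and the .bin file extension are all hypothetical stand-ins for the pipeline's dump step.

```python
import tempfile
from pathlib import Path


def dump_split(crops, cancer_path, non_cancer_path):
    """Write each (name, data, is_cancer) crop into the matching folder."""
    targets = {True: Path(cancer_path), False: Path(non_cancer_path)}
    for folder in targets.values():
        folder.mkdir(parents=True, exist_ok=True)
    for name, data, is_cancer in crops:
        # Hypothetical on-disk layout: one raw file per crop.
        (targets[bool(is_cancer)] / f"{name}.bin").write_bytes(data)


root = Path(tempfile.mkdtemp())
crops = [("a", b"\x01", True), ("b", b"\x02", False), ("c", b"\x03", False)]
dump_split(crops, root / "cancer", root / "non_cancer")
cancer_files = sorted(p.name for p in (root / "cancer").iterdir())
non_cancer_files = sorted(p.name for p in (root / "non_cancer").iterdir())
```

Keeping the two classes in separate directories lets each be loaded later as its own dataset, which matches the cancer_set and non_cancer_set arguments of combine_crops.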

radio.pipelines.pipelines.update_histo(nodules, histo, fmt='raw', **kwargs)

Get pipeline that updates a histogram using information from a dataset of scans.

Parameters:
  • nodules (pd.DataFrame) –
    contains:
    • 'seriesuid': index of patient or series.
    • 'z', 'y', 'x': coordinates of the nodule's center.
    • 'diameter': diameter, in mm.
  • histo (tuple) – histogram in the format of numpy.histogramdd() output, that is, a tuple (bins, edges). Used for sampling non-cancerous crops.
  • fmt (str) – one of ‘raw’, ‘blosc’ or ‘dicom’.
  • **kwargs –
    • spacing (tuple) – (z,y,x) spacing after resize.
    • shape (tuple) – (z,y,x) shape after crop/pad.
    • method (str) – interpolation method (‘pil-simd’ or ‘resize’). See resize() for more information.
    • order (None or int) – order of scipy-interpolation (<=5), if used.
    • padding (str) – mode of padding, any supported by numpy.pad().
Return type: pipeline
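The role of histo as a (bins, edges) pair can be illustrated with a stdlib-only sketch. It is a simplified stand-in, not the RadIO code: add_points and sample_from_histo are hypothetical, the bin counts are kept in a dict keyed by bin-index tuples instead of the dense array that numpy.histogramdd() returns, and the sketch assumes the histogram accumulates (z, y, x) positions, which the description above does not spell out.

```python
import bisect
import random


def add_points(histo, points):
    """Add (z, y, x) points to a (counts, edges) histogram in place."""
    counts, edges = histo
    for point in points:
        idx = []
        for coord, axis_edges in zip(point, edges):
            i = bisect.bisect_right(axis_edges, coord) - 1
            if coord == axis_edges[-1]:
                i -= 1  # the last edge belongs to the last bin
            if not 0 <= i < len(axis_edges) - 1:
                break  # point lies outside the histogram range: skip it
            idx.append(i)
        else:
            counts[tuple(idx)] = counts.get(tuple(idx), 0) + 1
    return histo


def sample_from_histo(histo, rng):
    """Draw a position: pick a bin by its count, then a point inside it."""
    counts, edges = histo
    bins = list(counts)
    chosen = rng.choices(bins, weights=[counts[b] for b in bins])[0]
    return tuple(rng.uniform(e[i], e[i + 1]) for i, e in zip(chosen, edges))


edges = [[0, 10, 20], [0, 10, 20], [0, 10, 20]]
histo = ({}, edges)
add_points(histo, [(5, 5, 5), (15, 15, 15), (15, 15, 15), (25, 0, 0)])
sample = sample_from_histo(histo, random.Random(0))
```

Bins that received more points are chosen more often, so positions sampled this way follow the empirical distribution stored in the histogram.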