Welcome to RadIO's documentation!
=================================

**RadIO** is a framework for data science research of computed tomography (CT) imaging.

Main features:

- Asynchronously load **DICOM** and **MetaImage** (mhd/raw) files
- Dump files to `blosc `_ to compress datasets and thus accelerate loading
- Transform and augment CT-scans in parallel for faster preprocessing
- Create concise chainable workflows with ``actions`` or use tailored :doc:`pipelines <intro/pipelines>` for preprocessing or model training
- Easily train a zoo of state-of-the-art neural networks for classification or semantic segmentation
- Sample crops of any size from CT-scans for comprehensive training
- Customize :ref:`distribution of crop locations ` for improved training
- Predict :meth:`on the whole scan <.CTImagesMaskedBatch.predict_on_scan>`

`The documentation <#contents>`_ contains a comprehensive review of RadIO's capabilities, while `tutorials `_ provide ready-to-use code blocks and practical demonstrations of the most important RadIO features.

Tutorials
---------

There are four tutorials available:

* The `first `_ one shows how to set up a dataset of CT-scans and define a basic preprocessing workflow.
* The `second tutorial `_ contains an in-depth discussion of preprocessing and augmentation actions.
* The `third tutorial `_ explains how to generate batches to train a neural network.
* The `fourth tutorial `_ helps you configure and train a neural network to detect cancer.

Documentation
-------------

.. toctree::
   :maxdepth: 2

   intro/preprocessing
   intro/pipelines
   intro/models
   api/api

Preprocess scans with chained actions
-------------------------------------

The preprocessing module contains a set of :doc:`actions <intro/preprocessing>` to efficiently prepare a dataset of CT-scans for neural network training.

Say, you have a bunch of **DICOM** scans with varying shapes. First, you create an index and define a dataset::

    from radio import CTImagesBatch
    from radio.dataset import FilesIndex, Dataset

    dicom_ix = FilesIndex(path='path/to/dicom/*', no_ext=True)            # set up the index
    dicom_dataset = Dataset(index=dicom_ix, batch_class=CTImagesBatch)    # init the dataset of dicom files

You may want to resize the scans to the equal shape **[128, 256, 256]**, normalize voxel densities to the range **[0, 255]** and dump the transformed scans. This preprocessing can be easily performed with the following :class:`pipeline `::

    pipeline = (
        dicom_dataset.p
        .load(fmt='dicom')
        .resize(shape=(128, 256, 256))
        .normalize_hu()
        .dump('/path/to/preprocessed/scans/')
    )
    pipeline.run(batch_size=20)

See the :doc:`documentation <intro/preprocessing>` for the description of preprocessing actions implemented in the module.

Preprocess scans using a pre-defined workflow
---------------------------------------------

The pipelines module contains ready-to-use workflows for the most frequent tasks. For instance, if you want to preprocess a dataset of scans named ``dicom_dataset`` and prepare the data for training a neural network, you can simply run the following pipeline creator (without spending much time on choosing actions for a workflow)::

    from radio.pipelines import get_crops

    # nodules is assumed to be a dataframe with nodule annotations
    nodata_pipeline = get_crops(fmt='raw', shape=(128, 256, 256),
                                nodules=nodules, batch_size=20,
                                share=0.6, nodule_shape=(32, 64, 64))

    dicom_pipeline = dicom_dataset >> nodata_pipeline

    for batch in dicom_pipeline.gen_batch(batch_size=12, shuffle=True):
        ...  # train a model here

See the :doc:`documentation <intro/pipelines>` for more information about ready-made workflows.
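If you dumped the preprocessed scans with the pipeline above, you can later index the dumped files and load them back instead of repeating the preprocessing. The snippet below is a minimal sketch rather than part of the original examples: it assumes the dump path matches the one passed to ``dump``, that each scan was written to its own directory (hence ``dirs=True`` in ``FilesIndex``), and that the dumped scans can be reloaded with ``load(fmt='blosc')``::

    from radio import CTImagesBatch
    from radio.dataset import FilesIndex, Dataset

    # index the dumped scans (one directory per scan) and reload them from blosc
    blosc_ix = FilesIndex(path='/path/to/preprocessed/scans/*', dirs=True)
    blosc_dataset = Dataset(index=blosc_ix, batch_class=CTImagesBatch)

    batch = (blosc_dataset.p
             .load(fmt='blosc')
             .next_batch(batch_size=20))

Reloading the compressed dump is typically much faster than parsing the original DICOM files, which is the point of dumping to blosc in the first place.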
Adding a neural network to a workflow
-------------------------------------

``RadIO`` contains proven architectures for classification, segmentation and detection, including neural networks designed specifically for cancer detection (e.g. ``DenseNoduleNet``, inspired by the state-of-the-art DenseNet but well suited for 3D CT scans)::

    from radio.preprocessing import CTImagesMaskedBatch as CTIMB
    from radio.models import DenseNoduleNet
    from radio.dataset import F

    # nodules_df is assumed to be a dataframe with nodule locations and sizes
    training_pipeline = (
        dicom_dataset.p
        .load(fmt='raw')
        .fetch_nodules_info(nodules_df)
        .create_mask()
        .sample_nodules(nodule_size=(32, 64, 64), batch_size=20)
        .init_model('static', DenseNoduleNet, 'net')
        .train_model('net', feed_dict={
            'images': F(CTIMB.unpack, component='images'),
            'labels': F(CTIMB.unpack, component='classification_targets')
        })
    )

    training_pipeline.run(batch_size=10, shuffle=True)

The :doc:`models documentation <intro/models>` contains more information about the implemented architectures and their application to cancer detection.

Installation
------------

With `pipenv `_::

    pipenv install git+https://github.com/analysiscenter/radio.git#egg=radio

With `pip `_::

    pip3 install git+https://github.com/analysiscenter/radio.git

After that, just import `RadIO`::

    import radio

.. note:: The `RadIO` module is in beta stage. Your suggestions and improvements are very welcome.

.. note:: `RadIO` supports python 3.5 or higher.

.. note:: When cloning the repo from GitHub, use the ``--recursive`` flag to make sure that the ``Dataset`` submodule is also cloned:

          ``git clone --recursive https://github.com/analysiscenter/radio.git``

Citing RadIO
------------

Please cite RadIO in your publications if it helps your research.

.. image:: https://zenodo.org/badge/DOI/10.5281/zenodo.1156363.svg
   :target: https://doi.org/10.5281/zenodo.1156363

::

    Khudorozhkov R., Emelyanov K., Koryagin A. RadIO library for data science research of CT images. 2017.

::

    @misc{radio_2017_1156363,
      author = {Khudorozhkov R., Emelyanov K., Koryagin A.},
      title  = {RadIO library for data science research of CT images},
      year   = 2017,
      doi    = {10.5281/zenodo.1156363},
      url    = {https://doi.org/10.5281/zenodo.1156363}
    }