Torch models

Getting started

A model might be used for training or inference. In both cases you need to specify a model config and a pipeline.

A typical minimal config includes inputs and initial_block sections:

model_config = {
    'inputs': dict(images={'shape': (128, 128, 3)},
                   targets={'classes': 10}),
    'initial_block/inputs': 'images'
}

A minimal training pipeline consists of init_model() and train_model():

from batchflow import B, V

# Parentheses let the method chain span several lines
pipeline = (my_dataset.p
    .init_model('dynamic', MyModel, 'my_model', model_config)
    .train_model('my_model', B('images'), labels=B('labels'), fetches='loss', save_to=V('loss'))
    .run(BATCH_SIZE, shuffle=True, n_epochs=5)
)

To create an inference pipeline replace train_model with predict_model().

A pipeline can also load() a pretrained model that was previously saved to disk:

model_config = {
    'build': 'first',
    'load': dict(path='/path/to/model'),
}

pipeline = (my_dataset.p
    .init_model('dynamic', MyModel, 'my_model', model_config)
    .predict_model('my_model', B('images'), fetches='predictions', save_to=V('predictions'))
)

Note that the ‘build’ option indicates whether a model needs to be created or updated by calling build(). More often than not, ‘build’ should be False (the default value), but sometimes it is convenient to create the model before loading or to change it after loading.

Specify build='first' to create the model first and load a pretrained model afterwards. With build=True, the model is loaded first and build() is executed after that, which allows changing the loaded model.
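The ordering described above can be sketched in plain Python. This is only an illustration of the three ‘build’ values, not the library's code: init_order is a hypothetical helper, and the behaviour when nothing is loaded (the model is simply built) is an assumption.

```python
def init_order(build=None, load=None):
    """Return the order of build/load steps for a given config (hypothetical helper)."""
    if build is None:
        # Assumption: with nothing to load, the model has to be built.
        build = load is None
    steps = []
    if build == 'first':
        steps.append('build')   # create the model before loading
        build = False
    if load is not None:
        steps.append('load')
    if build:
        steps.append('build')   # executed after loading, so it can change the model
    return steps

load_config = dict(path='/path/to/model')
print(init_order(load=load_config))                  # default: load only
print(init_order(build='first', load=load_config))   # build, then load
print(init_order(build=True, load=load_config))      # load, then build
```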

Model structure

A typical model comprises

  • initial_block

  • body (which, in turn, might include blocks)

  • head.

This division might seem somewhat arbitrary, though many modern networks follow it.
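As a toy illustration of this three-part division, the sketch below chains three plain Python functions. They are stand-ins invented for this example (no real layers are involved); only the data flow from initial block through body to head mirrors the structure above.

```python
def initial_block(raw):
    # Stand-in for the irregular front layers: scale raw pixel values to [0, 1].
    return [value / 255 for value in raw]

def body(x):
    # Stand-in for the repetitive blocks: compress the input into a small embedding.
    return [sum(x) / len(x), max(x)]

def head(embedding):
    # Stand-in for the task-specific part: a binary decision made from the embedding.
    return int(embedding[0] > 0.5)

def model(raw):
    # The output of each part is the input of the next one.
    return head(body(initial_block(raw)))

prediction = model([255, 255, 0, 255])
print(prediction)
```

Replacing head with another function would reuse the same embedding for a different task, which is the point of keeping the body task-independent.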


Initial block

This block just transforms the raw inputs into more manageable, initially preprocessed tensors.

Some networks (like VGG) do not need this. However, most networks have 1 or 2 convolutional layers and sometimes also a max pooling layer with stride 2. These layers can be put into the body as well, but the initial block takes all the irregular front layers, thus allowing for a regular body structure.


Body

Body contains a repetitive structure of building blocks. Most networks (like VGG, ResNet and the like) have a straight sequence of blocks, while others (e.g. UNet, LinkNet, RefineNet, ResNetAttention) look like graphs with many interconnections.

The initial block’s output goes into the body as its input, and the body’s output is a compressed representation (embedding) of the input tensors. It can later be used for various tasks: classification, regression, detection, etc. In other words, the body produces a task-independent embedding.


Block

The network building block reflects the model’s unique logic and specific technology.

Not surprisingly, many networks comprise different types of blocks, for example:

  • UNet and LinkNet have encoder and decoder blocks

  • Inception includes inception, reduction, and expanded blocks

  • DenseNet has dense and transition blocks

  • SqueezeNet alternates fire blocks with max-pooling.

When creating a custom model you can have as many block types as you need, though aim to make them universal and reusable elsewhere. For instance, LinkNet and GlobalConvolutionNetwork use ResNet blocks.

How to configure a model

Configuration options may vary between models. However, some parameters are available in many (if not all) models.


inputs

The inputs section contains a description of the model input data: its shapes, the transformations needed and the names of the resulting tensors.

Each input might have the following parameters:

dtype : str or tf.DType (by default ‘float32’)
    data type

shape : int or tuple / list
    a tensor shape which includes the number of channels/classes and doesn’t include a batch size.

classes : array-like or int
    an array of class labels if data labels are strings or just a number of classes

Even though all parameters are optional, at least some of them should be specified for each input tensor.

For instance, this config will create placeholders with the names images and targets:

model_config = {
    'inputs': dict(images={'shape': (128, 128, 3)},
                   targets={'classes': 10}),
}

Later, the names images and targets will be used to feed data into the model when training or predicting. Note that one-hot encoding is not required for labels when using cross-entropy loss, as it is applied automatically. However, custom losses might need one-hot encoded labels.

For more information on the configuration of the inputs, see _make_inputs().
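To make the two forms of the classes parameter concrete, here is a small sketch. class_mapping is a hypothetical helper written for this illustration and is not part of the library; it only shows how an int and an array of string labels could both yield a number of classes and, for strings, a label-to-index mapping.

```python
def class_mapping(classes):
    """Return (number of classes, label -> index mapping or None). Hypothetical helper."""
    if isinstance(classes, int):
        # Only the number of classes is given; labels are assumed to be indices already.
        return classes, None
    labels = list(classes)
    return len(labels), {label: index for index, label in enumerate(labels)}

n_classes, mapping = class_mapping(10)
n_named, named_mapping = class_mapping(['cat', 'dog', 'bird'])

# String labels from the data can then be turned into class indices.
indices = [named_mapping[label] for label in ['dog', 'dog', 'cat']]
print(n_classes, n_named, indices)
```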

initial block

The initial block specifies which inputs flow into the model to be turned into predictions:

model_config = {
    'initial_block/inputs': 'images',
}

As the default initial block contains a ConvBlock, all its parameters might also be specified in the config:

model_config = {
    'initial_block': dict(layout='cnap', filters=64, kernel_size=7, strides=2),
    'initial_block/inputs': 'images',
}

So the configured initial block gets the images tensor and applies a convolution with a 7x7 kernel and stride 2, followed by normalization, activation and pooling, as the layout 'cnap' prescribes.

Initial block can also be specified as nn.Module.

For predefined models an initial block has the default configuration according to the original paper. So you almost never need to redefine it.

However, initial_block/inputs should always be specified.
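The configs above mix a nested section ('initial_block') with a slash-separated key ('initial_block/inputs'). One way to read this notation is that a slash addresses a parameter inside a nested section. The sketch below illustrates that reading with a hypothetical expand helper; it is an assumption made for illustration, not the library's own config class.

```python
def expand(config):
    """Turn {'a/b': 1} into {'a': {'b': 1}}, merging with nested sections already present."""
    result = {}
    for key, value in config.items():
        *parents, last = key.split('/')
        node = result
        for name in parents:
            node = node.setdefault(name, {})
        if isinstance(value, dict) and isinstance(node.get(last), dict):
            node[last].update(value)          # merge into an existing section
        else:
            # Copy dicts so the original config is left untouched.
            node[last] = dict(value) if isinstance(value, dict) else value
    return result

model_config = {
    'initial_block': dict(layout='cnap', filters=64, kernel_size=7, strides=2),
    'initial_block/inputs': 'images',
}
nested = expand(model_config)
print(nested['initial_block'])
```

Under this reading, both keys end up in the same 'initial_block' section, so the two spellings can be mixed freely.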


body

Body is the main part of a model, so its configuration depends heavily on the model structure and purpose.

For instance, a ResNet body config includes a block section with specific residual block parameters, while a UNet body contains an upsample section which specifies the technique used to resize tensors in the decoder part of the network.

See the model documentation to find out how to configure its body.

Body can be specified as nn.Module or a dict with parameters for a specific model.


head

For many models the head is just another ConvBlock, so you may configure its layout, the number of filters, dense layer units or other parameters. As usual, this is rarely needed for predefined models.

Head can be specified as nn.Module or a dict with ConvBlock parameters.

Loss, learning rate decay, optimizer

These parameters might be defined in one of three formats:

  • name

  • tuple (name, args)

  • dict {‘name’: name, …other args}

where name might be one of:

  • a short name (‘mse’, ‘ce’, ‘l1’, ‘cos’, ‘hinge’, ‘huber’, ‘logloss’, ‘dice’)

  • a class name from torch (e.g. ‘L1Loss’ or ‘TripletMargin’)

  • a callable.

For example:

{'loss': 'mse'}
{'loss': ('KLDiv', {'reduction': 'none'})}
{'loss': {'name': MyCustomLoss, 'epsilon': 1e-6}}

Available short names for losses: mse, ce, l1, cos, hinge, huber, logloss, dice.

{'decay': 'exp'}
{'decay': ('StepLR', {'step_size': 10000})}
{'decay': {'name': MyCustomDecay, 'decay_rate': .5}}

{'optimizer': 'Adam'}
{'optimizer': ('SparseAdam', {'lr': 0.01})}
{'optimizer': {'name': 'Adagrad', 'initial_accumulator_value': 0.01}}
{'optimizer': {'name': MyCustomOptimizer, 'momentum': 0.95}}

For more details, see the TorchModel documentation.
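The three formats carry the same information: a name plus optional arguments. A minimal sketch of how they could be reduced to one form is given below. unpack is a hypothetical helper for illustration only; how the library itself resolves short names and class names is not shown here.

```python
def unpack(spec):
    """Split a loss/decay/optimizer spec into (name, args). Hypothetical helper."""
    if isinstance(spec, dict):
        args = dict(spec)              # copy so the caller's config is not modified
        return args.pop('name'), args
    if isinstance(spec, tuple):
        name, args = spec
        return name, dict(args)
    return spec, {}                    # a bare short name, class name or callable

print(unpack('mse'))
print(unpack(('KLDiv', {'reduction': 'none'})))
print(unpack({'name': 'Adagrad', 'initial_accumulator_value': 0.01}))
```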

How to write a custom model

To begin with, take a look at ConvBlock to find out how to write complex networks in just one line of code. This block is a convenient building block for concise, yet very expressive neural networks.

The simplest case you should avoid

Just redefine the body() method.

For example, let’s create a small convolutional network with 3 layers of 3x3 convolutions, batch normalization, dropout and a dense layer at the end:

from batchflow.models.torch import TorchModel
from batchflow.models.torch.layers import ConvBlock

class MyModel(TorchModel):
    def body(self, **kwargs):
        # All hyperparameters are hard coded here; input information is expected to arrive through kwargs
        x = ConvBlock(layout='cna cna cna df', filters=[64, 128, 256], units=10, kernel_size=3,
                      dropout_rate=.2, **kwargs)
        return x

Despite its simplicity, this approach is highly discouraged because:

  • the model parameters are hard coded in the body

  • the model cannot be configured within a pipeline

  • the model does not allow model composition, i.e. using this model’s components in other models.

The right way

Here we split network configuration and network definition into separate methods:

from batchflow.models.torch import TorchModel
from batchflow.models.torch.layers import ConvBlock

class MyModel(TorchModel):
    @classmethod
    def default_config(cls):
        # Constants that do not depend on the dataset or the task
        config = TorchModel.default_config()
        config['body'].update(dict(filters=[64, 128, 256], kernel_size=3, layout='cna cna cna'))
        config['head'].update(dict(units=2, layout='df', dropout_rate=.2))
        return config

    def build_config(self, names=None):
        # Values taken from the dataset through the pipeline's configuration
        config = super().build_config(names)
        config['head/units'] = self.num_classes('targets')
        config['head/filters'] = self.num_classes('targets')
        return config

    @classmethod
    def body(cls, **kwargs):
        kwargs = cls.get_defaults('body', kwargs)
        x = ConvBlock(**kwargs)
        return x

Note that default_config and body are @classmethods now, which means that they might be called without instantiating a MyModel object. This is needed for model composition, e.g. when MyModel serves as a base network for an FCN or SSD network.

On the other hand, build_config is an ordinary method, so it is called only when an instance of MyModel is created.

Thus, default_config should contain all the constants which are totally independent of the dataset and the specific task at hand, while build_config is intended to extract values from the dataset through the pipeline’s configuration (for details see Configuring a model).
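The split between the two methods can be illustrated without the library. ToyModel below is a hypothetical stand-in, and data_info with its n_classes key is an invented placeholder for whatever the dataset provides. The sketch only shows the idea: class-level defaults are available without an instance, while dataset-dependent values are filled in when an instance is created.

```python
class ToyModel:
    def __init__(self, config=None, data_info=None):
        self.user_config = config or {}
        self.data_info = data_info or {}
        self.config = self.build_config()

    @classmethod
    def default_config(cls):
        # Constants that do not depend on the dataset or the task.
        return {'body': {'filters': [64, 128, 256], 'kernel_size': 3},
                'head': {'units': 2}}

    def build_config(self):
        config = self.default_config()
        for section, params in self.user_config.items():
            config.setdefault(section, {}).update(params)   # user config overrides defaults
        # A value known only once an instance (and its data) exists.
        config['head']['units'] = self.data_info['n_classes']
        return config

defaults = ToyModel.default_config()          # no instance needed
model = ToyModel(config={'body': {'kernel_size': 5}}, data_info={'n_classes': 10})
print(defaults['head'], model.config['head'], model.config['body'])
```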

Now you can train the model with a simple pipeline:

model_config = {
    'loss': 'ce',
    'decay': 'exp',
    'optimizer': 'Adam',
    # the input is named targets because build_config calls num_classes('targets')
    'inputs': dict(images={'shape': (128, 128, 3)},
                   targets={'classes': 10}),
    'initial_block/inputs': 'images'
}

pipeline = (my_dataset.p
    .init_variable('loss_history', init_on_each_run=list)
    .init_model('dynamic', MyModel, 'my_model', model_config)
    .train_model('my_model', B('images'), B('labels'),
                 fetches='loss', save_to=V('loss_history'), mode='a')
    .run(BATCH_SIZE, shuffle=True, n_epochs=5)
)

To switch to a fully convolutional head with 3x3 convolutions and global average pooling, just add one line to the config:

model_config = {
    'head/layout': 'cV'
}

As a result, the very same model class might be used

  • in numerous scenarios

  • with different configurations

  • for various tasks

  • with heterogeneous data.

Things worth mentioning:

  1. Override initial_block(), body() and head() if needed. In many cases the config alone is enough to build a network without writing additional code.

  2. Input data and its parameters should be defined in the configuration under the inputs key. See TorchModel._make_inputs() for details.

  3. You might want to use the convenient multidimensional ConvBlock, as well as other predefined layers from batchflow.models.torch.layers. Of course, you can also use ordinary Torch layers.

  4. In many cases there is no need to write a loss function, learning rate decay and optimizer as they might be defined through config.

  5. For a configured loss to work, one of the inputs should be named targets, and the model output is considered to be predictions. Both will be passed to the loss function.
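To illustrate item 5, the sketch below computes cross-entropy for a single sample from predictions (raw scores) and targets. It is plain Python written for this illustration, not the loss used by the library. It also shows why class indices are sufficient as targets for cross-entropy: the index form and the one-hot form give the same value.

```python
import math

def log_sum_exp(logits):
    # Numerically stable log(sum(exp(logits))).
    shift = max(logits)
    return shift + math.log(sum(math.exp(value - shift) for value in logits))

def cross_entropy(predictions, target):
    """Cross-entropy for one sample; target is a class index."""
    return log_sum_exp(predictions) - predictions[target]

def cross_entropy_one_hot(predictions, one_hot):
    """The same loss with a one-hot encoded target."""
    norm = log_sum_exp(predictions)
    return -sum(t * (value - norm) for t, value in zip(one_hot, predictions))

predictions = [2.0, 0.5, -1.0]
loss_index = cross_entropy(predictions, 0)
loss_one_hot = cross_entropy_one_hot(predictions, [1, 0, 0])
print(loss_index, loss_one_hot)
```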