=================
Named expressions
=================

As pipelines are declarative, you might need a way to address data which exists only when the pipeline is executed. For instance, model training takes batch data, but you don't have any batches when you declare a pipeline. Batches appear only when the pipeline is run.

This is where **named expressions** come into play. A named expression specifies a substitution rule. When the pipeline is being executed, a named expression is replaced with the value calculated according to that rule.

There are several types of named expressions:

* ``B('name')`` - a batch class attribute or component name
* ``V('name')`` - a pipeline variable name
* ``C('name')`` - a pipeline config option
* ``D('name')`` - a dataset attribute
* ``F(...)`` - a callable
* ``R(...)`` - a random value
* ``W(...)`` - a wrapper for a named expression
* ``P(...)`` - a wrapper for parallel actions that calculates its expression as a batch-sized vector
* ``PP(...)`` - a wrapper for parallel actions that calculates its expression batch-size times in a cycle
* ``I(...)`` - an iteration counter

Named expressions can be defined in two ways:

- through instance creation, e.g. ``B('attr')``, ``V('name')``, ``C('option')``
- through attribution, e.g. ``B.attr``, ``V.name``, ``C.option``.

The only difference is that the former allows for an assignment mode, e.g. ``V('name', mode='append')``, while the latter requires fewer letters to type (so the default mode is implied).


Using in pipelines
==================

Named expressions can be used in pipelines as variables to get data from and to store data into::

    pipeline
        ...
        .train_model(C('model_name'), features=B.features, labels=B.labels,
                     fetches='predictions', save_to=V('predictions'))
        ...

Each named expression is calculated at each iteration, and its current value is then passed into the action. Therefore, actions get usual parameter values, not named expressions.


Using outside of pipelines
==========================

You may also use :doc:`named expressions <../api/batchflow.named_expressions>` in your custom methods. There are two main methods: :meth:`~batchflow.named_expr.NamedExpression.get` and :meth:`~batchflow.named_expr.NamedExpression.set`.
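For instance, a custom function can accept either a plain value or a named expression and evaluate the latter itself. Here is a minimal sketch (the function names are hypothetical, and it assumes ``get`` and ``set`` accept the evaluation context, such as the batch, as keyword arguments)::

    from batchflow.named_expr import NamedExpression

    def read_arg(arg, batch):
        # evaluate a named expression against the given batch,
        # pass a plain value through unchanged
        if isinstance(arg, NamedExpression):
            return arg.get(batch=batch)
        return arg

    def write_arg(expr, value, batch):
        # store the value into whatever the expression points to,
        # e.g. a pipeline variable for V('name')
        expr.set(value, batch=batch)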
Operations with expressions
===========================

Named expressions support basic arithmetic operations like ``+``, ``-``, etc. ::

    pipeline
        ...
        .print('Iterations per epoch:', D.size // B.size)

To convert a named expression value to a string use the :meth:`~batchflow.named_expr.NamedExpression.str` method::

    pipeline
        ...
        .print('Dataset contains ' + D('size').str() + ' items')

Formatting is also possible::

    pipeline
        ...
        .print(V('variable').format('Value of the variable is {:7.7}'))

Slicing is often useful::

    pipeline
        ...
        .print('Current loss:', V('loss_history')[-1])

As well as getting attributes::

    pipeline
        ...
        .print('Size in bytes:', B.images.nbytes)

And calling a function::

    pipeline
        ...
        .print('Accuracy:', C.custom_accuracy(targets=B.labels, predictions=V('predictions')))


B - batch component
===================

::

    pipeline
        ...
        .train_model(model_name, features=B.features, labels=B.labels)
        ...

At each iteration ``B('features')`` and ``B('labels')`` will be replaced with ``current_batch.features`` and ``current_batch.labels``, i.e. batch components or attributes.

.. note:: ``B()`` (i.e. without a component name) returns the batch itself. To avoid unexpected changes of the batch, a copy can be created with ``B(copy=True)``.


V - pipeline variable
=====================

::

    pipeline
        ...
        .train_model(V('model_name'), ...)
        ...

At each iteration ``V('model_name')`` will be replaced with the current value of ``pipeline.get_variable('model_name')``. Thus, you can even change the model being trained (or any other pipeline parameter) during pipeline execution.


C - config option
=================

::

    config = dict(model=ResNet34, model_config=model_config)

    train_pipeline = dataset.train.pipeline(config)
        ...
        .init_model('dynamic', C('model', default=ResNet18), 'my_model', C.model_config)
        ...

At each iteration ``C('model')`` will be replaced with the current value of ``pipeline.config['model']``. If there is no ``model`` key in the pipeline config, the default value will be used. If a default is not set, a ``KeyError`` is raised.

This is an example of a model-independent pipeline which allows changing models, for instance, to assess the performance of various models.


D - dataset attribute
=====================

::

    pipeline
        ...
        .load(src=D.data_path, ...)
        ...

At each iteration ``D('data_path')`` will be replaced with the current value of ``pipeline.dataset.data_path``.

.. note:: ``D()`` (i.e. without an attribute name) returns the dataset itself.


I - iterations counter
======================

::

    pipeline
        ...
        .print('Iteration:', I.current, ' out of ', I.max)
        ...

``I('ratio')`` returns the ratio ``current / max`` and thus allows you to control the iteration progress. For instance, at each iteration dataset items can be rotated at a random angle which increases with time::

    pipeline
        ...
        .rotate(angle=I('ratio')*45)
        ...


F - callable
============

A function which might take arguments. The callable can be a lambda function::

    pipeline
        .init_model('dynamic', MyModel, 'my_model', config={
            'inputs/images/shape': F(lambda image_shape: (-1,) + image_shape)(B.image_shape)
        })

or a batch class method::

    pipeline
        .train_model(model_name, make_data=F(MyBatch.pack_to_feed_dict)(B(), task='segmentation'))

or an arbitrary function::

    def get_boxes(batch, shape):
        x_coords = slice(0, shape[0])
        y_coords = slice(0, shape[1])
        return batch.images[:, y_coords, x_coords]

    pipeline
        ...
        .update_variable(var_name, F(get_boxes)(B(), C('image_shape')))
        ...

or any other Python callable.

As static models are initialized before a pipeline is run (i.e. before any batch is created), ``F``-functions specified in a static ``init_model`` cannot get the ``batch``::

    pipeline
        .init_model('static', MyModel, 'my_model', config={
            'inputs/images/shape': F(get_shape)(C.input_shape)
        })

It can also be an arbitrary function with arbitrary arguments::

    pipeline
        ...
        .init_variable('logfile', F(open)('file.log', 'w'))
        ...


R - random value
================

A sample from a random distribution. All numpy distributions are supported::

    pipeline
        .some_action(R('uniform'))
        .other_action(R('beta', 1, 1, seed=14))
        .yet_other_action(R('poisson', lam=4, size=(2, 5)))
        .one_more_action(R(['opera', 'ballet', 'musical'], p=[.1, .15, .75], size=15, seed=42))


W - a wrapper
=============

To pass a named expression to an action without evaluating it within a pipeline you can wrap it::

    pipeline
        .some_action(arg=W(V('variable')))

As a result, ``some_action`` will get not the current value of the pipeline variable, but the ``V``-expression itself.
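For instance, an action that receives a wrapped expression can evaluate it on its own. Below is a sketch with a hypothetical ``log_value`` action (as above, it assumes ``get`` accepts the batch as a keyword argument)::

    from batchflow import Batch, action
    from batchflow.named_expr import NamedExpression, V, W

    class MyBatch(Batch):
        @action
        def log_value(self, expr):
            # thanks to W, `expr` is the V-expression itself,
            # so the action decides when to evaluate it
            if isinstance(expr, NamedExpression):
                print(expr.get(batch=self))
            return self

    pipeline
        .log_value(W(V('loss_history')))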
P - a parallel wrapper
======================

It comes in handy for parallel actions, so that the ``@inbatch_parallel`` decorator could determine that different values should be passed to parallel invocations of the action.

For instance, each item in the batch will be rotated at its own angle::

    pipeline
        .rotate(angle=P(R('uniform', -30, 30)))

Without ``P`` all images in the batch will be rotated at the same angle, since the angle is randomized across batches only::

    pipeline
        .rotate(angle=R('normal', 0, 1))

Every image in the batch gets a noise of the same intensity (7%), but of a different color::

    pipeline
        .add_color_noise(p_noise=.07, color=P(R('uniform', 0, 255, size=3)))

``P`` can be used not only with ``R``-expressions::

    pipeline
        .some_action(P(V('loss_history')))
        .other_action(P(C('apriori_info')))
        .yet_other_action(P(B('sensor_data')))
        .do_something(n=P([1, 2, 3, 4, 5]))

However, more often ``P`` is applied to ``R``-expressions.


PP - a parallel wrapper
=======================

``PP(expr)`` is essentially ``P([expr for _ in batch.indices])``. It comes in handy for shape-specific operations (e.g. ``@`` - matrix multiplication) or external functions which return single values.

As far as ``R``-expressions are concerned, ``P(R(...))`` is more efficient, as the distribution is sampled only once per batch (as ``R(..., size=batch.size)``), whereas ``PP(R(...))`` evaluates ``R`` multiple times, once for each batch item.
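For instance, when an external function returns a single value per call, ``PP`` makes every parallel invocation receive its own freshly computed value. A sketch with a hypothetical parallel action ``assign_id`` and helper ``make_id``::

    import uuid

    def make_id():
        # returns one value per call, so it has to be evaluated once per batch item
        return uuid.uuid4().hex

    pipeline
        ...
        .assign_id(item_id=PP(F(make_id)()))
        ...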