Introduction to Operator apply, arguments, and estimate_memory

This tutorial describes three fundamental user APIs:

- Operator.apply
- Operator.arguments
- Operator.estimate_memory

We will use a trivial Operator that, at each time step, increments by 1 all points in the physical domain.

from devito import Grid, TimeFunction, Eq, Operator

grid = Grid(shape=(4, 4))
u = TimeFunction(name='u', grid=grid, save=3)
op = Operator(Eq(u.forward, u + 1))

To run op, we have to "apply" it.
#NBVAL_IGNORE_OUTPUT
summary = op.apply()
Operator `Kernel` run in 0.00 s
Under the hood, some code has been generated (use print(op) to display it), JIT-compiled, and executed. Since no additional arguments have been passed, op has used u as input. We can verify that the content of u.data is as expected:
u.dimensions, u.shape
((time, x, y), (3, 4, 4))
u.data
Data([[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]],
[[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]],
[[2., 2., 2., 2.],
[2., 2., 2., 2.],
[2., 2., 2., 2.],
[2., 2., 2., 2.]]], dtype=float32)
In particular, we observe that:

- u has size 3 along the time dimension, since it was built with save=3. Therefore op could only execute two timesteps, namely time=0 and time=1; given Eq(u.forward, u + 1), executing time=2 would cause out-of-bounds access errors. Devito figures this out automatically and sets appropriate minimum and maximum iteration points.
- All 16 points in each timeslice of the 4x4 Grid have been computed.
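The iteration-bounds reasoning above can be sketched in plain NumPy. This is a hypothetical re-implementation of what the generated kernel does (ignoring halos), not Devito's actual code:

```python
import numpy as np

# Mimic a TimeFunction with save=3 on a 4x4 grid.
u = np.zeros((3, 4, 4), dtype=np.float32)

# Eq(u.forward, u + 1) reads timeslice t and writes t+1, so the last
# legal iteration is t = save - 2 = 1; t = 2 would write u[3], out of bounds.
time_m, time_M = 0, u.shape[0] - 2
for t in range(time_m, time_M + 1):  # inclusive interval [time_m, time_M]
    u[t + 1] = u[t] + 1

print(u[2, 0, 0])  # each timeslice t holds the value t, so this prints 2.0
```

Each of the 16 points in a timeslice receives the same increment, reproducing the u.data shown above.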
To access all default arguments used by op without running the Operator, one can run:
#NBVAL_IGNORE_OUTPUT
op.arguments()
{'u': <cparam 'P' (0x7fb0d01d45a8)>,
'time_m': 0,
'time_size': 3,
'time_M': 1,
'x_m': 0,
'x_size': 4,
'x_M': 3,
'y_m': 0,
'y_size': 4,
'y_M': 3,
'timers': <cparam 'P' (0x7fb0d0550918)>}
'u' stores a pointer to the allocated data; 'timers' stores a pointer to a data structure used for C-level performance profiling.
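As a sketch of where the numeric defaults come from (an illustration, not Devito internals): the spatial extremes span the whole grid, while the time extremes are clipped so that u.forward stays in bounds.

```python
# Hypothetical reconstruction of the default iteration extremes for
# u with shape (time, x, y) = (3, 4, 4) and Eq(u.forward, u + 1).
shape = (3, 4, 4)

defaults = {
    'time_m': 0, 'time_M': shape[0] - 2,  # u.forward needs t+1 to exist
    'x_m': 0, 'x_M': shape[1] - 1,
    'y_m': 0, 'y_M': shape[2] - 1,
}
print(defaults)
```

These values match the time_m/time_M, x_m/x_M, y_m/y_M entries in the dictionary above.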
One may want to replace some of these default arguments. For example, we could increase the minimum iteration point along the spatial Dimensions x and y, and execute only the very first timestep:
#NBVAL_IGNORE_OUTPUT
u.data[:] = 0.  # Explicit reset to initial value
summary = op.apply(x_m=2, y_m=2, time_M=0)
Operator `Kernel` run in 0.00 s
We look again at the computed data to convince ourselves that everything went as intended:
u.data
Data([[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]],
[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 1., 1.],
[0., 0., 1., 1.]],
[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]]], dtype=float32)
Given a generic Dimension d, the naming convention is such that:

- d_m is the minimum iteration point
- d_M is the maximum iteration point

Hence, op.apply(..., d_m=4, d_M=7, ...) will run op in the compact interval [4, 7] along d. For historical reasons, d=... aliases to d_M=...; in many Devito examples one sees op.apply(..., time=10, ...) – this is just equivalent to op.apply(..., time_M=10, ...).
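In Python terms, the compact (inclusive) interval [d_m, d_M] corresponds to range(d_m, d_M + 1):

```python
d_m, d_M = 4, 7
iterations = list(range(d_m, d_M + 1))  # both extremes are included
print(iterations)  # → [4, 5, 6, 7]
```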
If we try to specify an invalid iteration extreme, Devito will raise an exception.
from devito.exceptions import InvalidArgument
try:
    op.apply(time_M=2)
except InvalidArgument as e:
    print(e)
OOB detected due to time_M=2
The same Operator can be applied to a different TimeFunction. For example:
#NBVAL_IGNORE_OUTPUT
u2 = TimeFunction(name='u', grid=grid, save=5)
summary = op.apply(u=u2)
Operator `Kernel` run in 0.00 s
u2.data
Data([[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]],
[[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]],
[[2., 2., 2., 2.],
[2., 2., 2., 2.],
[2., 2., 2., 2.],
[2., 2., 2., 2.]],
[[3., 3., 3., 3.],
[3., 3., 3., 3.],
[3., 3., 3., 3.],
[3., 3., 3., 3.]],
[[4., 4., 4., 4.],
[4., 4., 4., 4.],
[4., 4., 4., 4.],
[4., 4., 4., 4.]]], dtype=float32)
Note that this was the third call to op.apply, but code generation and JIT-compilation only occurred upon the very first call.

There is one relevant case in which the maximum iteration point along the time dimension must be specified: whenever save is unset, since in that case the Operator wouldn't know for how many iterations to run.
v = TimeFunction(name='v', grid=grid)
op2 = Operator(Eq(v.forward, v + 1))

try:
    op2.apply()
except ValueError as e:
    print(e)
No value found for parameter time_M
#NBVAL_IGNORE_OUTPUT
summary = op2.apply(time_M=4)
Operator `Kernel` run in 0.00 s
v.data
Data([[[4., 4., 4., 4.],
[4., 4., 4., 4.],
[4., 4., 4., 4.],
[4., 4., 4., 4.]],
[[5., 5., 5., 5.],
[5., 5., 5., 5.],
[5., 5., 5., 5.],
[5., 5., 5., 5.]]], dtype=float32)
The summary variable can be inspected to retrieve performance metrics.
#NBVAL_IGNORE_OUTPUT
summary
PerformanceSummary([('section0',
PerfEntry(time=3e-06, gflopss=0.0, gpointss=0.0, oi=0.0, ops=0, itershapes=[]))])
We observe that all entries except the execution time are fixed at 0. This is because, by default, Devito avoids computing performance metrics to minimize the processing time before returning control to the user (in complex Operators, the processing time to retrieve, for instance, the operation count or the memory footprint could be significant). To compute all performance metrics, a user can either export the environment variable DEVITO_PROFILING as 'advanced' or change the profiling level programmatically before the Operator is constructed:
#NBVAL_IGNORE_OUTPUT
from devito import configuration
configuration['profiling'] = 'advanced'

op = Operator(Eq(u.forward, u*u + 1.))
op.apply()
Operator `Kernel` run in 0.00 s
PerformanceSummary([('section0',
PerfEntry(time=3e-06, gflopss=0.021333333333333333, gpointss=0.010666666666666666, oi=0.16666666666666666, ops=2, itershapes=[(2, 4, 4)]))])
A PerformanceSummary will contain as many entries as "sections" in the generated code. Currently, there is no way to automatically tie a compiler-generated section to the user-provided Eqs (in general, there can be more than one Eq in a section). The only option is to look at the generated code and search for bodies of code wrapped within C comments such as START(section0) ... STOP(section0). For example:

# Uncomment me and search for START(section0) ... STOP(section0)
# print(op)
In the PerformanceSummary, associated to section0 is a PerfEntry, whose entries represent:

- time: The time, in seconds, to execute the section.
- gflopss: Performance of the section in Gigaflops per second.
- gpointss: Performance of the section in Gigapoints per second.
- oi: Operational intensity.
- ops: Floating-point operations at each (innermost loop) iteration.
- itershapes: The shapes of the iteration spaces.
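As a consistency check, the throughput metrics in the entry above can be derived by hand from ops, itershapes, and time. This mirrors how the reported numbers relate to one another; it is not Devito's profiler code:

```python
time = 3e-6              # seconds, as reported in the PerfEntry above
ops = 2                  # flops per innermost-loop iteration
itershape = (2, 4, 4)    # 2 timesteps over a 4x4 grid

points = 1
for n in itershape:
    points *= n          # 32 iteration points in total

gpointss = points / time / 1e9        # points per second, in Gigapoints
gflopss = ops * points / time / 1e9   # flops per second, in Gigaflops
print(round(gflopss, 4), round(gpointss, 4))  # → 0.0213 0.0107
```

These match the gflopss and gpointss values reported in the PerformanceSummary above.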
Memory estimation
If one intends to estimate the memory consumed by an Operator before executing it, the estimate_memory utility should be used. For example, the memory consumption of the previous Operator can be estimated as follows:
op.estimate_memory()
MemoryEstimate(Kernel): {'host': '432 B', 'device': '0 B'}
This method estimates the memory consumption of the Operator from all array-carrying symbols within the Operator (Functions, SparseFunctions, TimeFunctions, etc.). In this case, we only have the tiny TimeFunction u, hence the small memory consumption shown. Additionally, if offloading to GPU via OpenACC or similar, memory consumption on the device will be given in the device entry.
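The 432 B figure can be reproduced by hand: u has save=3 time slices, and with the default space_order=1 each 4x4 slice is padded by a one-point halo on every side, giving 6x6 allocated float32 points per slice. (The halo assumption here matches this example's output; exact padding can vary with configuration.)

```python
save, nx, ny = 3, 4, 4
space_order = 1            # TimeFunction default
halo = space_order         # one halo point per side, per dimension
itemsize = 4               # float32

nbytes = save * (nx + 2 * halo) * (ny + 2 * halo) * itemsize
print(nbytes)  # → 432
```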
Operator.estimate_memory() accepts overrides in the same way as Operator.apply(), and these will be used to adjust the estimate accordingly.
from devito import switchconfig
with switchconfig(log_level='DEBUG'):  # Log memory allocation
    grid_big = Grid(shape=(201, 201))
    # u_big is never touched, so we should not see a memory allocation for it
    u_big = TimeFunction(name='u_big', grid=grid_big, save=100)
    memreport = op.estimate_memory(u=u_big)

memreport
MemoryEstimate(Kernel): {'host': '16 MB', 'device': '0 B'}
We can see that overriding u with a larger TimeFunction has made the projected memory consumption of this Operator substantially larger. Note that when applying overrides, no data touch (and thus no allocation) is performed, enabling estimation of the memory consumption of Operators that may not fit on the current machine (so long as one does not touch the data themselves, for example by accessing or assigning to u.data[:]).
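The same back-of-the-envelope arithmetic explains the 16 MB: 100 time slices of a 201x201 grid, padded by the default one-point halo per side, in float32 (again assuming the default space_order=1 padding, which is consistent with the reported estimate):

```python
save, n = 100, 201
halo = 1                                  # default space_order=1
nbytes = save * (n + 2 * halo) ** 2 * 4   # float32 itemsize
print(nbytes, f"~= {nbytes // 10**6} MB")
```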
The estimate provided also includes any array temporaries constructed by the Devito compiler.
from devito import sin, Function
f = TimeFunction(name='f', grid=grid, space_order=2)
g = TimeFunction(name='g', grid=grid, space_order=2)
a = Function(name='a', grid=grid, space_order=2)

# Reuse an expensive function to encourage generation of an array temp
eq0 = Eq(f.forward, g + sin(a).dx)
eq1 = Eq(g.forward, f + sin(a).dx)

op_fancy = Operator([eq0, eq1], name="FancyKernel")
memreport = op_fancy.estimate_memory()
memreport
MemoryEstimate(FancyKernel): {'host': '1 KB', 'device': '0 B'}
We can see the discrepancy between naively summing the sizes of the Functions used to construct the Operator and the result of the estimate. This is due to the array temporary introduced by the compiler.
import numpy as np
function_size = (f.size_allocated + g.size_allocated + a.size_allocated)*np.dtype(f.dtype).itemsize
print(f"Functions have a total size of {function_size} bytes, but {memreport['host']} bytes are consumed by the `Operator`")
Functions have a total size of 1280 bytes, but 1360 bytes are consumed by the `Operator`
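The 1280-byte baseline itself is just the padded Function sizes, and the 80-byte remainder is the compiler-introduced temporary. This sketch assumes the 4x4 grid with space_order=2, i.e. 8x8 allocated slices, and the default time_order=1 (two time buffers) for the TimeFunctions:

```python
slice_pts = (4 + 2 * 2) ** 2        # 4x4 grid + space_order=2 halo → 8x8
f_pts = 2 * slice_pts               # time_order=1 → two time buffers
g_pts = 2 * slice_pts
a_pts = slice_pts                   # plain Function: one buffer

function_size = (f_pts + g_pts + a_pts) * 4   # float32 itemsize
print(function_size, 1360 - function_size)    # → 1280 80
```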
The MemoryEstimate object is designed to aid both readability and orchestration. For example, one may want to estimate the memory consumption of an Operator on an orchestration node before selecting the ideal hardware to run it on. When accessing the MemoryEstimate using its keys, values are returned as machine-readable byte counts for this purpose. There is also a to_json method for easily writing this mapper to JSON for ingestion by orchestrators.
memreport['host']
1360