{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# \n",
"\n",
"# Frequently Asked Questions\n",
"\n",
"- [How can I see the code generated by\n",
" Devito?](#how-can-i-see-the-code-generated-by-devito)\n",
"- [How can I see the compilation command with which Devito compiles\n",
" the generated\n",
" code](#how-can-i-see-the-compilation-command-with-which-devito-compiles-the-generated-code)\n",
"- [Where does the generated code go and how do I look at\n",
" it](#where-does-the-generated-code-go-and-how-do-i-look-at-it)\n",
"- [Can I change the directory where Devito stashes the generated\n",
" code](#can-i-change-the-directory-where-devito-stashes-the-generated-code)\n",
"- [I create an Operator, look at the generated code, and the equations\n",
" appear in a different order than I\n",
" expected.](#i-create-an-operator-look-at-the-generated-code-and-the-equations-appear-in-a-different-order-than-i-expected)\n",
"- [Do Devito Operators release the GIL when executing C\n",
" code?](#do-devito-operators-release-the-GIL-when-executing-C-code)\n",
"- [What performance optimizations does Devito\n",
" apply](#what-performance-optimizations-does-devito-apply)\n",
"- [Does Devito optimize complex\n",
" expressions](#does-devito-optimize-complex-expressions)\n",
"- [How are abstractions used in the seismic\n",
" examples](#how-are-abstractions-used-in-the-seismic-examples)\n",
"- [What environment variables control how Devito\n",
" works](#what-environment-variables-control-how-devito-works)\n",
"- [What are the accepted combinations of PLATFORM, ARCH, and\n",
" LANGUAGE](#what-are-the-accepted-combinations-of-platform-arch-and-language)\n",
"- [How do you run the unit tests from the command\n",
" line](#how-do-you-run-the-unit-tests-from-the-command-line)\n",
"- [What is the difference between f() and f\\[\\]\n",
" notation](#what-is-the-difference-between-f-and-f-notation)\n",
"- [What is the indexed notation](#what-is-the-indexed-notation)\n",
"- [Is there a flow chart](#is-there-a-flow-chart)\n",
"- [What’s up with object.data](#whats-up-with-objectdata)\n",
"- [How do I create an N-dimensional\n",
" grid](#how-do-i-create-and-n-dimensional-grid)\n",
"- [What are the keys to fast code](#what-are-the-keys-to-fast-code)\n",
"- [As time increases in the finite difference evolution, are wavefield\n",
" arrays “swapped” as you might see in c/c++\n",
" code](#as-time-increases-in-the-finite-difference-evolution-are-wavefield-arrays-swapped-as-you-might-see-in-cc-code)\n",
"- [What units are typically used in Devito\n",
" examples](#what-units-are-typically-used-in-devito-examples)\n",
"- [Can I subclass basic types such as\n",
" TimeFunction](#can-i-subclass-basic-types-such-as-timefunction)\n",
"- [How can I change the compilation flags (for example, I want to\n",
" change the optimization level from -O3 to\n",
" -O0)](#how-can-i-change-the-compilation-flags-for-example-i-want-to-change-the-optimization-level-from--o3-to--o0)\n",
"- [Is the jitted code\n",
" IEEE-compliant](#is-the-jitted-code-ieee-compliant)\n",
"- [Can I control the MPI domain\n",
" decomposition](#can-i-control-the-mpi-domain-decomposition)\n",
"- [How should I use MPI on multi-socket\n",
" machines](#how-should-I-use-MPI-on-multi-socket-machines)\n",
"- [How do I make sure my code is “MPI\n",
" safe”](#how-do-i-make-sure-my-code-is-MPI-safe)\n",
"- [Why does my Operator kernel die\n",
" suddenly](#why-does-my-operator-kernel-die-suddenly)\n",
"- [Can I manually modify the C code generated by Devito and test these\n",
" modifications](#can-i-manually-modify-the-c-code-generated-by-devito-and-test-these-modifications)\n",
"- [How do I find the source of my bug quickly and get\n",
" support](#how-do-i-find-the-source-of-my-bug-quickly-and-get-support)\n",
"- [Is there a way to get the performance of an\n",
" Operator](#is-there-a-way-to-get-the-performance-of-an-operator)\n",
"- [How does devito compute the performance of an\n",
" Operator](#how-does-devito-compute-the-performance-of-an-operator)\n",
"- [How do I fairly compare the performance to that of my in-house\n",
" code](#how-do-i-fairly-compare-the-performance-to-that-of-my-in-house-code)\n",
"- [Is there is list of refereed papers related to the Devito\n",
" project](#is-there-a-list-of-refereed-papers-related-to-the-devito-project)\n",
"- [How do I cite Devito](#how-do-i-cite-devito)\n",
"- [Where did the name Devito come\n",
" from?](#where-did-the-name-devito-come-from)\n",
"\n",
"## How can I see the code generated by Devito?\n",
"\n",
"After you build an `op=Operator(...)` implementing one or more\n",
"equations, you can use `print(op)` to see the generated low level code.\n",
"The example below builds an operator that takes a 1/2 cell forward\n",
"shifted derivative of the `Function` **f** and puts the result in the\n",
"`Function` **g**.\n",
"\n",
"``` python\n",
"import numpy as np\n",
"import devito\n",
"from devito import Grid, Function, Eq, Operator\n",
"grid = Grid(shape=(21,), extent=(1.0,), origin=(0.0,), dtype=np.float32)\n",
"x = grid.dimensions[0]\n",
"f = Function(name='f', grid=grid, space_order=8)\n",
"g = Function(name='g', grid=grid, space_order=8)\n",
"eq_dfdx = Eq(g, getattr(f, 'dx')(x+0.5*x.spacing))\n",
"op = Operator(eq_dfdx)\n",
"print(op)\n",
"```\n",
"\n",
"And the output:\n",
"\n",
"``` c\n",
"#define _POSIX_C_SOURCE 200809L\n",
"#include \"stdlib.h\"\n",
"#include \"math.h\"\n",
"#include \"sys/time.h\"\n",
"#include \"xmmintrin.h\"\n",
"#include \"pmmintrin.h\"\n",
"\n",
"struct dataobj\n",
"{\n",
" void *restrict data;\n",
" int * size;\n",
" int * npsize;\n",
" int * dsize;\n",
" int * hsize;\n",
" int * hofs;\n",
" int * oofs;\n",
"} ;\n",
"\n",
"struct profiler\n",
"{\n",
" double section0;\n",
"} ;\n",
"\n",
"\n",
"int Kernel(struct dataobj *restrict f_vec, struct dataobj *restrict g_vec, const float h_x, struct profiler * timers, const int x_M, const int x_m)\n",
"{\n",
" float (*restrict f) __attribute__ ((aligned (64))) = (float (*)) f_vec->data;\n",
" float (*restrict g) __attribute__ ((aligned (64))) = (float (*)) g_vec->data;\n",
"\n",
" /* Flush denormal numbers to zero in hardware */\n",
" _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);\n",
" _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);\n",
"\n",
" START(section0)\n",
" for (int x = x_m; x <= x_M; x += 1)\n",
" {\n",
" g[x + 8] = (3.57142857e-3F*(f[x + 4] - f[x + 12]) + 3.80952381e-2F*(-f[x + 5] + f[x + 11]) + 2.0e-1F*(f[x + 6] - f[x + 10]) + 8.0e-1F*(-f[x + 7] + f[x + 9]))/h_x;\n",
" }\n",
" STOP(section0,timers)\n",
"\n",
" return 0;\n",
"}\n",
"```\n",
"\n",
"## How can I see the compilation command with which Devito compiles the generated code\n",
"\n",
"Set the environment variable `DEVITO_LOGGING=DEBUG`. When an Operator\n",
"gets compiled, the used compilation command will be emitted to stdout.\n",
"\n",
"If nothing seems to change, it is possible that no compilation is\n",
"happening under-the-hood as all kernels have already been compiled in a\n",
"previous run. You will then have to clear up the Devito kernel cache.\n",
"From the Devito root directory, run:\n",
"\n",
"``` bash\n",
"python scripts/clear_devito_cache.py\n",
"```\n",
"\n",
"## Where does the generated code go and how do I look at it\n",
"\n",
"Devito stores the generated code as well as the jit-compiled libraries\n",
"in a temporary directory. By setting the environment variable\n",
"`DEVITO_LOGGING=DEBUG`, Devito will show, amongst other things, the\n",
"absolute path to the generated code.\n",
"\n",
"## Can I change the directory where Devito stashes the generated code\n",
"\n",
"Yes, just set the environment variable `TMPDIR` to your favorite\n",
"location.\n",
"\n",
"## I create an Operator, look at the generated code, and the equations appear in a different order than I expected.\n",
"\n",
"The Devito compiler computes a topological ordering of the input\n",
"equations based on data dependency analysis. Heuristically, some\n",
"equations might be moved around to improve performance (e.g., data\n",
"locality). Therefore, the order of the equations in the generated code\n",
"might be different than that used as input to the Operator.\n",
"\n",
"## Do Devito Operators release the GIL when executing C code?\n",
"\n",
"Yes. Devito uses\n",
"[ctypes.CDLL](https://docs.python.org/3/library/ctypes.html#ctypes.CDLL)\n",
"to call the JIT C code which releases the GIL.\n",
"\n",
"## What performance optimizations does Devito apply\n",
"\n",
"Take a look\n",
"[here](https://github.com/devitocodes/devito/tree/master/examples/performance)\n",
"and in particular [at this\n",
"notebook](https://github.com/devitocodes/devito/blob/master/examples/performance/00_overview.ipynb).\n",
"\n",
"## Does Devito optimize complex expressions\n",
"\n",
"Devito applies several performance optimizations to improve the number\n",
"of operations (“operation count”) in complex expressions. These\n",
"optimizations are designed to do a really good job but also be\n",
"reasonably fast. One such pass attempts to factorize as many common\n",
"terms as possible in expressions in order to reduce the operation count.\n",
"We will construct a demonstrative example below that has a common term\n",
"that is *not* factored out by the Devito optimization. The difference in\n",
"floating-point operations per output point for the factoring of that\n",
"term is about 10 percent, and the generated C is different, but\n",
"numerical outputs of running the two different operators are\n",
"indistinguishable to machine precision. In terms of actual performance,\n",
"the (few) missed factorization opportunities may not necessarily be a\n",
"relevant issue: as long as the code is not heavily compute-bound, the\n",
"runtimes may only be slightly higher than in the optimally-factorized\n",
"version.\n",
"\n",
"#### Operator 1:\n",
"\n",
"``` python\n",
"ux_update = t.spacing**2 * b * \\\n",
" ((c33 * u_x.dx(x0=x+x.spacing/2)).dx(x0=x-x.spacing/2) +\n",
" (c55 * u_x.dz(x0=z+z.spacing/2)).dz(x0=z-z.spacing/2) +\n",
" (c13 * u_z.dz(x0=z+z.spacing/2)).dx(x0=x-x.spacing/2) +\n",
" (c55 * u_z.dx(x0=x+x.spacing/2)).dz(x0=z-z.spacing/2)) + \\\n",
" (2 - t.spacing * wOverQ) * u_x + (t.spacing * wOverQ - 1) * u_x.backward \n",
"stencil_x = Eq(u_x.forward, ux_update)\n",
"print(\"\\n\", stencil_x)\n",
"op = Operator([stencil_x])\n",
"```\n",
"\n",
"#### Operator 2:\n",
"\n",
"``` python\n",
"ux_update = \\\n",
" t.spacing**2 * b * (c33 * u_x.dx(x0=x+x.spacing/2)).dx(x0=x-x.spacing/2) + \\\n",
" t.spacing**2 * b * (c55 * u_x.dz(x0=z+z.spacing/2)).dz(x0=z-z.spacing/2) + \\\n",
" t.spacing**2 * b * (c13 * u_z.dz(x0=z+z.spacing/2)).dx(x0=x-x.spacing/2) + \\\n",
" t.spacing**2 * b * (c55 * u_z.dx(x0=x+x.spacing/2)).dz(x0=z-z.spacing/2) + \\\n",
" (2 - t.spacing * wOverQ) * u_x + (t.spacing * wOverQ - 1) * u_x.backward \n",
"stencil_x = Eq(u_x.forward, ux_update)\n",
"print(\"\\n\", stencil_x)\n",
"op = Operator([stencil_x])\n",
"```\n",
"\n",
"#### Output 1:\n",
"\n",
"``` bash\n",
"Eq(u_x(t + dt, x, z), dt**2*(Derivative(c13(x, z)*Derivative(u_z(t, x, z), z), x) + Derivative(c33(x, z)*Derivative(u_x(t, x, z), x), x) + Derivative(c55(x, z)*Derivative(u_x(t, x, z), z), z) + Derivative(c55(x, z)*Derivative(u_z(t, x, z), x), z))*b(x, z) + (-dt*wOverQ(x, z) + 2)*u_x(t, x, z) + (dt*wOverQ(x, z) - 1)*u_x(t - dt, x, z))\n",
"Operator `Kernel` generated in 1.26 s\n",
" * lowering.Expressions: 0.61 s (48.7 %)\n",
" * lowering.Clusters: 0.52 s (41.5 %)\n",
" * specializing.Clusters: 0.31 s (24.8 %)\n",
"Flops reduction after symbolic optimization: [1160 --> 136]\n",
"```\n",
"\n",
"#### Output 2:\n",
"\n",
"``` bash\n",
"Eq(u_x(t + dt, x, z), dt**2*b(x, z)*Derivative(c13(x, z)*Derivative(u_z(t, x, z), z), x) + dt**2*b(x, z)*Derivative(c33(x, z)*Derivative(u_x(t, x, z), x), x) + dt**2*b(x, z)*Derivative(c55(x, z)*Derivative(u_x(t, x, z), z), z) + dt**2*b(x, z)*Derivative(c55(x, z)*Derivative(u_z(t, x, z), x), z) + (-dt*wOverQ(x, z) + 2)*u_x(t, x, z) + (dt*wOverQ(x, z) - 1)*u_x(t - dt, x, z))\n",
"Operator `Kernel` generated in 1.12 s\n",
" * lowering.Expressions: 0.59 s (53.0 %)\n",
" * lowering.Clusters: 0.40 s (35.9 %)\n",
" * specializing.Clusters: 0.31 s (27.8 %)\n",
"Flops reduction after symbolic optimization: [1169 --> 149]\n",
"```\n",
"\n",
"Other optimizations include common sub-expressions elimination, hoisting\n",
"of loop-invariant code, and detection of cross-iteration redundancies\n",
"(e.g., due to high order derivatives). Below we show the cumulative\n",
"impact of all these optimizations in a number of seismic operators.\n",
"\n",
"\n",
"\n",
"For more info, take a look [at this\n",
"notebook](https://github.com/devitocodes/devito/blob/master/examples/performance/00_overview.ipynb).\n",
"\n",
"## How are abstractions used in the seismic examples\n",
"\n",
"Many Devito examples are provided that demonstrate application for\n",
"specific problems, including e.g. fluid mechanics and seismic modeling.\n",
"We focus in this question on seismic modeling examples that provide\n",
"convenience wrappers to build differential equations and create Devito\n",
"Operators for various types of modeling physics including isotropic and\n",
"anisotropic, acoustic and elastic.\n",
"\n",
"These examples\n",
"([link](https://github.com/devitocodes/devito/tree/master/examples)) use\n",
"abstractions to remove details from the methods that actually build the\n",
"operators. The idea is that at the time you build a Devito operator, you\n",
"don’t need specific material parameter arrays (e.g. velocity or density\n",
"or Thomsen parameter), and you don’t need specific locations of sources\n",
"and receiver instruments. All you need to build the operator is a\n",
"placeholder that can provide the dimensionality and (for example) the\n",
"spatial order of finite difference approximation you wish to employ. In\n",
"this way you can build and return functioning highly optimized operators\n",
"to which you can provide the specific implementation details at runtime\n",
"via command line arguments.\n",
"\n",
"An example of this abstraction (or placeholder design pattern) in\n",
"operation is the call to the isotropic acoustic\n",
"`AcousticWaveSolver.forward` method that returns a Devito operator via\n",
"the `ForwardOperator` method defined in\n",
"[operators.py](https://github.com/devitocodes/devito/blob/master/examples/seismic/acoustic/operators.py#L65-L105).\n",
"\n",
"You will note that this method uses placeholders for the material\n",
"parameter arrays and the source and receiver locations, and then at\n",
"runtime uses arguments provided to the returned `Operator` to provide\n",
"state to the placeholders. You can see this happen on lines 112-113 in\n",
"[wavesolver.py](https://github.com/devitocodes/devito/blob/master/examples/seismic/acoustic/wavesolver.py#L112-L113).\n",
"\n",
"## What environment variables control how Devito works\n",
"\n",
"### How to get the list of Devito environment variables\n",
"\n",
"You can get the list of environment variables and all their possible\n",
"values with the following python code:\n",
"\n",
"``` python\n",
"from devito import print_defaults\n",
"print_defaults()\n",
"```\n",
"\n",
"### How to set Devito environment variables\n",
"\n",
"These environment variables can either be set from the shell or\n",
"programmatically. Note that when setting these variables\n",
"programmatically, you need to use lower case, and the leading `DEVITO`\n",
"is omitted. Values are case sensitive, meaning `openmp` is accepted and\n",
"`OPENMP` will throw an error. Below are examples of setting these\n",
"variables in the shell (**before** running python) and from python\n",
"(**before** executing devito code).\n",
"\n",
"| Method | Example |\n",
"|:-----------------|:---------------------------------------|\n",
"| bourne shell | DEVITO_LANGUAGE=“openmp” |\n",
"| c shell | setenv DEVITO_LANGUAGE “openmp” |\n",
"| programmatically | configuration\\[‘language’\\] = ‘openmp’ |\n",
"\n",
"### Description of Devito environment variables\n",
"\n",
"#### DEVITO_ARCH\n",
"\n",
"Used to select a specific “backend compiler”. The backend compiler takes\n",
"as input the code generated by Devito and produces a shared object.\n",
"Supported backend compilers are `gcc`, `icc`, `pgcc`, `clang`. For each\n",
"of these compilers, Devito uses some preset compilation flags (e.g.,\n",
"`-O3`, `-march=native`, `-fast-math`). If this environment variable is\n",
"left unset, Devito will attempt auto-detection of the most suitable\n",
"backend compiler available on the underlying system.\n",
"\n",
"#### DEVITO_PLATFORM\n",
"\n",
"This environment variable is mostly needed when running on GPUs, to ask\n",
"Devito to generate code for a particular device (see for example this\n",
"[tutorial](https://github.com/devitocodes/devito/blob/master/examples/gpu/01_diffusion_with_openmp_offloading.ipynb)).\n",
"Can be also used to specify CPU architectures such as Intel’s – Haswell,\n",
"Broadwell, SKL and KNL – ARM, AMD, and Power. Often one can ignore this\n",
"variable because Devito typically does a decent job at auto-detecting\n",
"the underlying platform.\n",
"\n",
"#### DEVITO_LANGUAGE\n",
"\n",
"Specify the generated code language. The default is `C`, which means\n",
"sequential C. Use `openmp` to emit C+OpenMP or `openacc` for C+OpenACC.\n",
"\n",
"#### DEVITO_PROFILING\n",
"\n",
"Choose the performance profiling level. This is also automatically\n",
"increased with `DEVITO_LOGGING=PERF` or `DEVITO_LOGGING=DEBUG`, in which\n",
"case this environment variable can be ignored.\n",
"\n",
"#### DEVITO_DEVELOP\n",
"\n",
"Mostly useful for developers when chasing\n",
"[segfaults](https://github.com/devitocodes/devito/blob/24ede9131473f69c1e44ba3b852f8654d3fd953e/devito/data/allocators.py#L373).\n",
"\n",
"#### DEVITO_OPT\n",
"\n",
"Choose the performance optimization level. By default set to the maximum\n",
"level, `advanced`.\n",
"\n",
"#### DEVITO_MPI\n",
"\n",
"Controls MPI in Devito. Use `1` to enable MPI. The most powerful MPI\n",
"mode is called “full”, and is activated setting `DEVITO_MPI=full`. The\n",
"“full” mode implements a number of optimizations including\n",
"computation/communication overlap.\n",
"\n",
"#### DEVITO_AUTOTUNING\n",
"\n",
"Search across a set of block shapes to maximize the effectiveness of\n",
"loop tiling (aka cache blocking). You can choose between `off`\n",
"(default), `basic`, `aggressive`, `max`. A more aggressive autotuning\n",
"should eventually result in better runtime performance, though the\n",
"search phase will take longer.\n",
"\n",
"#### DEVITO_LOGGING\n",
"\n",
"Run with `DEVITO_LOGGING=DEBUG` to find out the specific performance\n",
"optimizations applied by an Operator, how auto-tuning is getting along,\n",
"to emit the command used to compile the generated code, to emit more\n",
"performance metrics, and much more.\n",
"\n",
"#### DEVITO_FIRST_TOUCH\n",
"\n",
"Use `DEVITO_FIRST_TOUCH=1` in combination with `DEVITO_LANGUAGE=openmp`\n",
"to use an OpenMP parallel Operator for initialization of Function data.\n",
"This should ensure NUMA locality for data access when running “flatten”\n",
"OpenMP across multiple sockets on the same node (as opposed to using MPI\n",
"– one MPI process per socket – plus OpenMP, which is the recommended\n",
"way).\n",
"\n",
"#### DEVITO_JIT_BACKDOOR\n",
"\n",
"You can set `DEVITO_JIT_BACKDOOR=1` to test custom modifications to the\n",
"generated code. For more info, take a look at this\n",
"[FAQ](https://github.com/devitocodes/devito/wiki/FAQ#can-i-manually-modify-the-c-code-generated-by-devito-and-test-these-modifications).\n",
"\n",
"#### DEVITO_IGNORE_UNKNOWN_PARAMS\n",
"\n",
"Set `DEVITO_IGNORE_UNKNOWN_PARAMS=1` to avoid Devito raising an\n",
"exception if one attempts to pass an unknown argument to `op.apply()`.\n",
"\n",
"## What are the accepted combinations of PLATFORM, ARCH, and LANGUAGE\n",
"\n",
"#### LANGUAGE={C,openmp}\n",
"\n",
"These two languages can be used with virtually any PLATFORM and ARCH.\n",
"\n",
"With a device PLATFORM (e.g., `nvidiaX`, `amdgpuX`, or `intelgpuX`), the\n",
"compiler will generate OpenMP code for device offloading.\n",
"\n",
"When using OpenMP offloading, it is recommended to stick to the\n",
"corresponding vendor compiler, so `ARCH=amdclang` for AMD,\n",
"`ARCH={icc,icx,intel}` for Intel, and `ARCH=nvc` for NVidia.\n",
"\n",
"#### LANGUAGE=openacc\n",
"\n",
"Requires: `PLATFORM=nvidiaX` and `ARCH=nvc`.\n",
"\n",
"The legacy PGI compiler is also supported via `ARCH=pgcc`.\n",
"\n",
"#### LANGUAGE=cuda\n",
"\n",
"*DevitoPRO only.*\n",
"\n",
"Requires: `PLATFORM=nvidiaX` and `ARCH=cuda`.\n",
"\n",
"#### LANGUAGE=hip\n",
"\n",
"*DevitoPRO only.*\n",
"\n",
"Requires: `PLATFORM=amdgpuX` and `ARCH=hip`.\n",
"\n",
"## How do you run the unit tests from the command line\n",
"\n",
"In addition to the\n",
"[tutorials](https://www.devitoproject.org/devito/tutorials.html), the\n",
"unit tests provide an excellent way to see how the Devito API works with\n",
"small self-contained examples. You can exercise individual unit tests\n",
"with the following python code:\n",
"\n",
"``` bash\n",
"pytest \n",
"pytest -vs [more detailed log]\n",
"```\n",
"\n",
"## What is the difference between f() and f\\[\\] notation\n",
"\n",
"Devito offers a functional language to express finite difference\n",
"operators. This is introduced\n",
"[here](https://github.com/devitocodes/devito/blob/master/examples/userapi/01_dsl.ipynb)\n",
"and systematically used throughout our examples and tutorials. The\n",
"language relies on what in jargon we call the “f() notation”.\n",
"\n",
"``` python\n",
">>> from devito import Grid, Function\n",
">>> grid = Grid(shape=(5, 6))\n",
">>> f = Function(name='f', grid=grid, space_order=2)\n",
">>> f.dx\n",
"Derivative(f(x, y), x)\n",
">>> f.dx.evaluate\n",
"-f(x, y)/h_x + f(x + h_x, y)/h_x\n",
"```\n",
"\n",
"Sometimes, one wishes to escape the constraints of the language. Instead\n",
"of taking derivatives, other special operations are required. Or\n",
"perhaps, a specific grid point needs to be accessed. In such a case, one\n",
"could use the “f\\[\\] notation” or “indexed notation”. Following on from\n",
"the example above:\n",
"\n",
"``` python\n",
">>> x, y = grid.dimensions\n",
">>> f[x + 1000, y]\n",
"f[x + 1000, y]\n",
"```\n",
"\n",
"The indexed object can be used at will to construct `Eq`s, and they can\n",
"be mixed up with objects stemming from the “f() notation”.\n",
"\n",
"``` python\n",
">>> f.dx + f[x + 1000, y]\n",
"Derivative(f(x, y), x) + f[x + 1000, y]\n",
"```\n",
"\n",
"However, while the “f() notation” is substantially safe – the language\n",
"is designed such that only legal stencil expressions are built – the\n",
"“f\\[\\] notation” is not, and one can easily end up creating operators\n",
"performing out-of-bounds accesses. So use it judiciously!\n",
"\n",
"## What is the indexed notation\n",
"\n",
"The indexed notation, or “f\\[\\] notation”, is discussed\n",
"[here](#What-is-the-difference-between-f()-and-f%5B%5D-notation)\n",
"\n",
"## Is there a flow chart\n",
"\n",
"**needs links** 1. Equation Lowering - Indexification - Substitutions -\n",
"Domain alignment - Eq -\\> LoweredEq 2. Local Analysis 3. Clustering 4.\n",
"Symbolic Optimization via Devito Symbolic Engine (DSE) 5. IET\n",
"(Iteration/Expression Tree) Construction 6. IET Analysis 7. IET\n",
"Optimization (DLE/YLE) - Loop optimization via Devito Loop Engine\n",
"(DLE) - Loop Tiling / Cache Blocking - SIMD - OpenMP - MPI 8. Synthetic\n",
"9. JIT (Just In Time) Compilation\n",
"\n",
"## What’s up with object.data\n",
"\n",
"The `.data` property which is associated with objects such as\n",
"`Constant`, `Function` and `SparseFunction` (along with their\n",
"derivatives) represents the ‘numerical’ value of the ‘data’ associated\n",
"with that particular object. For example, a `Constant` will have a\n",
"single numerical value associated with it as shown in the following\n",
"snippet\n",
"\n",
"``` python\n",
"from devito import Constant\n",
"\n",
"c = Constant(name='c')\n",
"c.data = 2.7\n",
"\n",
"print(c.data)\n",
"```\n",
"\n",
"``` default\n",
"2.7\n",
"```\n",
"\n",
"Then, a `Function` defined on a `Grid` will have a data value associated\n",
"with each of the grid points (as shown in the snippet below) and so\n",
"forth.\n",
"\n",
"``` python\n",
"import numpy as np\n",
"from devito import Grid, Function\n",
"\n",
"grid = Grid(shape=(4, 4), extent=(1, 1))\n",
"f = Function(name='f', grid=grid)\n",
"f.data[:] = np.arange(16).reshape(grid.shape)\n",
"\n",
"print(f.data)\n",
"```\n",
"\n",
"``` default\n",
"[[ 0. 1. 2. 3.]\n",
" [ 4. 5. 6. 7.]\n",
" [ 8. 9. 10. 11.]\n",
" [12. 13. 14. 15.]]\n",
"```\n",
"\n",
"## How do I create and N-dimensional grid\n",
"\n",
"Grids are often created via, e.g.,\n",
"\n",
"``` python\n",
"grid = Grid(shape=(5, 5))\n",
"```\n",
"\n",
"where printing the `grid` object then returns:\n",
"\n",
"``` default\n",
"Grid[extent=(1.0, 1.0), shape=(5, 5), dimensions=(x, y)]\n",
"```\n",
"\n",
"Here we see the `grid` has been created with the ‘default’ dimensions\n",
"`x` and `y`. If a grid is created and passed a shape of `(5, 5, 5)`\n",
"we’ll see that in addition it has a `z` dimension. However, what if we\n",
"want to create a grid with, say, a shape of `(5, 5, 5, 5)`? For this\n",
"case, we’ve now run out of the dimensions defined by default and hence\n",
"need to create our own dimensions to achieve this. This can be done via,\n",
"e.g.,\n",
"\n",
"``` python\n",
"a = SpaceDimension('a')\n",
"b = SpaceDimension('b')\n",
"c = SpaceDimension('c')\n",
"d = SpaceDimension('d')\n",
"grid = Grid(shape=(5, 5, 5, 5), dimensions=(a, b, c, d))\n",
"```\n",
"\n",
"where now, printng `grid` we get\n",
"\n",
"``` default\n",
"Grid[extent=(1.0, 1.0, 1.0, 1.0), shape=(5, 5, 5, 5), dimensions=(a, b, c, d)]\n",
"```\n",
"\n",
"and `grid.shape` returns\n",
"\n",
"``` default\n",
"(5, 5, 5, 5)\n",
"```\n",
"\n",
"## What are the keys to fast code\n",
"\n",
"The code generated by devito is designed to run fast on CPU, GPU and\n",
"clusters thereof. Broadly outlined, some of the mechanics for generating\n",
"fast code are: \\### CPU \\* Loop tiling (or “cache blocking”) \\* Loop\n",
"fusion (to maximize data locality) Loop fission (to maximize\n",
"parallelism) \\* Flop-reducing transformations (CSE, cross-stencil\n",
"redundancies, factorization, hoisting) \\* OpenMP threading \\*\n",
"OpenMP-based SIMD \\* Alignment to promote SIMD vectorization \\### GPU \\*\n",
"Longer pipelines, less travel to host (do more work on the GPU before\n",
"communicating data between host and GPU) \\### Clusters of CPUs/GPUs \\*\n",
"Computation/communication overlap with prodding of the asynchronous\n",
"progress engine \\* Avoidance of unnecessary halo exchanges \\*\n",
"Reshuffling of halo exchanges \\* Threaded data packing/unpacking\n",
"\n",
"## As time increases in the finite difference evolution, are wavefield arrays “swapped” as you might see in c/c++ code\n",
"\n",
"In c/c++ code using two wavefield arrays for second order acoustics, you\n",
"might see code like the following to “swap” the wavefield arrays at each\n",
"time step:\n",
"\n",
"``` c\n",
" float *p_tmp = p_old;\n",
" p_old = p_cur;\n",
" p_cur = p_tmp;\n",
"```\n",
"\n",
"Instead of swapping arrays, devito uses the modulus of a time index to\n",
"map increasing temporal indices \\[0, 1, 2, 3, 4, 5, …\\] into cyclic\n",
"indices \\[0, 1, 2, 0, 1, 2, …\\].\n",
"\n",
"## What units are typically used in Devito examples\n",
"\n",
"- Sampling rates: msec\n",
"- Frequency: KHz\n",
"- Velocity: km/sec\n",
"\n",
"## Can I subclass basic types such as TimeFunction\n",
"\n",
"Yes, just like we did for our seismic examples, for example, the\n",
"[PointSource\n",
"class](https://github.com/devitocodes/devito/blob/master/examples/seismic/source.py).\n",
"A few caveats are necessary, though.\n",
"\n",
"First, classes such as `Function` or `SparseTimeFunction` are inherently\n",
"complex. In fact, `SparseTimeFunction` itself is a subclass of\n",
"`Function`. The whole class hierarchy is modular and well-structured,\n",
"but at the same time, it’s depth and offers several hooks to allow\n",
"specialization by subclasses – for example, all methods starting with\n",
"`__` such as `__init_finalize__` or `__shape_setup__`. It will take some\n",
"time to digest it. Luckily, you will only need to learn a little of\n",
"this, at least for simple subclasses.\n",
"\n",
"Second, you must know that these objects are subjected to so-called\n",
"reconstruction during compilation. Objects are immutable inside Devito;\n",
"therefore, even a straightforward symbolic transformation such as\n",
"`f[x] -> f[y]` boils down to performing a reconstruction, that is,\n",
"creating a whole new object. Since `f` carries around several attributes\n",
"(e.g., shape, grid, dimensions), each time Devito performs a\n",
"reconstruction, we only want to specify which attributes are changing –\n",
"not all of them, as it would make the code ugly and incredibly\n",
"complicated. The solution to this problem is that all the base symbolic\n",
"types inherit from a common base class called `Reconstructable`; a\n",
"`Reconstructable` object has two special class attributes, called\n",
"`__rargs__` and `__rkwargs__`. If a subclass adds a new positional or\n",
"keyword argument to its `__init_finalize__`, it must also be added to\n",
"`__rargs__` or `__rkwargs__`, respectively. This will provide Devito\n",
"with enough information to perform a reconstruction when it’s needed\n",
"during compilation. The following example should clarify:\n",
"\n",
"``` python\n",
"class Foo(Reconstructable):\n",
" __rargs__ = ('a', 'b')\n",
" __rkwargs__ = ('c',)\n",
"\n",
" def __init__(self, a, b, c=4):\n",
" self.a = a\n",
" self.b = b\n",
" self.c = c\n",
"\n",
" def __repr__(self):\n",
" return \"x(%d, %d, %d)\" % (self.a, self.b, self.c)\n",
"\n",
"class Bar(Foo):\n",
" __rkwargs__ = Foo.__rkwargs__ + ('d',)\n",
"\n",
" def __init__(self, *args, d=6, **kwargs)\n",
" super().__init__(*args, **kwargs)\n",
" self.d = d\n",
"```\n",
"\n",
"You are unlikely to care about how reconstruction works in practice, but\n",
"here are a few examples for `a = Foo(3, 5)` to give you more context.\n",
"\n",
"``` python\n",
"a._rebuild() -> \"x(3, 5, 4)\" (i.e., copy of `a`).\n",
"a._rebuild(4) -> \"x(4, 5, 4)\"\n",
"a._rebuild(4, 7) -> \"x(4, 7, 4)\"\n",
"a._rebuild(c=5) -> \"x(3, 5, 5)\"\n",
"a._rebuild(1, c=7) -> \"x(1, 5, 7)\"\n",
"```\n",
"\n",
"## How can I change the compilation flags (for example, I want to change the optimization level from -O3 to -O0)\n",
"\n",
"There is currently no API to achieve this straightforwardly. However,\n",
"there are three workarounds:\n",
"\n",
"- hacky way: change the flags explicitly in the Devito source code. In\n",
" Devito v4.0, you can do that\n",
" [here](https://github.com/opesci/devito/blob/v4.0/devito/compiler.py#L146)\n",
"- via env vars: use a\n",
" [CustomCompiler](https://github.com/opesci/devito/blob/v4.0/devito/compiler.py#L446)\n",
" – just leave the `DEVITO_ARCH` environment variable unset or set it\n",
" to `'custom'`. Then, `export CFLAGS=\"...\"` to tell Devito to use the\n",
" exported flags in place of the default ones.\n",
"- programmatically: subclass one of the compiler classes and set\n",
" `self.cflags` to whatever you need. Do not forget to add the\n",
" subclass to the [compiler\n",
" registry](https://github.com/opesci/devito/blob/v4.0/devito/compiler.py#L472).\n",
" For example, you could do\n",
"\n",
"``` python\n",
"from devito import configuration, compiler_registry\n",
"from devito.compiler import GNUCompiler\n",
"\n",
"class MyOwnCompiler(GNUCompiler):\n",
" def __init__(self, *args, **kwargs):\n",
" super(MyOwnCompiler, self).__init__(*args, **kwargs)\n",
" # \n",
"\n",
"\n",
"### Make sure Devito is aware of this new Compiler class\n",
"compiler_registry['mycompiler'] = MyOwnCompiler\n",
"configuration.add(\"compiler\", \"custom\", list(compiler_registry), callback=lambda i: compiler_registry[i]())\n",
"\n",
"\n",
"### Then, what remains to be done is asking Devito to use MyOwnCompiler\n",
"\n",
"configuration['compiler'] = 'mycompiler'\n",
"```\n",
"\n",
"## Is the jitted code IEEE compliant?\n",
"\n",
"By default, Devito compiles the generated code using flags that maximize\n",
"the runtime performance. Some of these flags, such as GCC’s\n",
"`-ffast-math`, break IEEE compliance by, for example, reordering\n",
"operations based on associativity. This may alter the numerical output,\n",
"including even making some codes produce incorrect results. If you think\n",
"your code is affected by `-ffast-math` or similar flags that break IEEE\n",
"compliance, you can disable them by setting the environment variable\n",
"`DEVITO_SAFE_MATH=1`. You can achieve the same effect in Python by\n",
"setting `configuration['safe-math'] = 1`.\n",
"\n",
"\\\n",
"\n",
"## Can I control the MPI domain decomposition?\n",
"\n",
"By default, domain decomposition starts along the slowest axis. This\n",
"means that for an n-dimensional grid, the outermost dimension is\n",
"decomposed first. For example, in a three-dimensional grid (x, y, z),\n",
"the x dimension is split into chunks first, followed by the y dimension,\n",
"and finally the z dimension. Then the process restarts from the\n",
"outermost dimension again, ensuring an n-dimensional decomposition,\n",
"until as many partitions as MPI ranks are created.\n",
"\n",
"### Customization\n",
"\n",
"The `Grid` class accepts an optional `topology` argument, allowing users\n",
"to specify custom domain decompositions as an n-tuple, where `n` is the\n",
"number of distributed dimensions. For instance, for a two-dimensional\n",
"grid, the topology `(4, 1)` will decompose the slowest axis into four\n",
"partitions (one per MPI rank), while the fastest axis will be replicated\n",
"across all MPI ranks. So, for example, given a\n",
"`Grid(shape=(16, 16), topology=(4, 1))`, the x dimension would be split\n",
"into 4 chunks of 4, resulting in partitions of shape `(4, 16)` for each\n",
"MPI rank.\n",
"\n",
"### Wildcard-based Decomposition\n",
"\n",
"Consider a domain with three distributed dimensions: x, y, and z, and an\n",
"MPI communicator with `N` processes. Here are some examples of\n",
"specifying a custom `topology`:\n",
"\n",
"- **With N known (e.g., N=4)**:\n",
"\n",
" - `(1, 1, 4)`: Decomposes the z dimension into 4 chunks.\n",
" - `(2, 1, 2)`: Decomposes the x dimension into 2 chunks and the z\n",
" dimension into 2 chunks.\n",
"\n",
"- **With N unknown**:\n",
"\n",
" - `(1, '*', 1)`: The wildcard `'*'` indicates that the runtime\n",
" should decompose the y dimension into N chunks.\n",
" - `('*', '*', 1)`: The runtime decomposes both x and y dimensions\n",
" into factors of N, prioritizing the outermost dimension.\n",
"\n",
" Assuming the number of ranks `N` cannot be evenly decomposed into\n",
" the requested stars, decomposition is as even as possible,\n",
" prioritizing the outermost dimension:\n",
"\n",
" - **For N=3**:\n",
" - `('*', '*', 1)` results in (3, 1, 1).\n",
" - `('*', 1, '*')` results in (3, 1, 1).\n",
" - `(1, '*', '*')` results in (1, 3, 1).\n",
" - **For N=6**:\n",
" - `('*', '*', 1)` results in (3, 2, 1).\n",
" - `('*', 1, '*')` results in (3, 1, 2).\n",
" - `(1, '*', '*')` results in (1, 3, 2).\n",
" - **For N=8**:\n",
" - `('*', '*', '*')` results in (2, 2, 2).\n",
" - `('*', '*', 1)` results in (4, 2, 1).\n",
" - `('*', 1, '*')` results in (4, 1, 2).\n",
" - `(1, '*', '*')` results in (1, 4, 2).\n",
"\n",
"### The `DEVITO_TOPOLOGY` Environment Variable\n",
"\n",
"As of Devito v4.8.11, the domain decomposition topology can also be\n",
"specified globally using the environment variable `DEVITO_TOPOLOGY`.\n",
"Accepted values are:\n",
"\n",
"- `x`: Corresponds to the topology `('*', 1, 1)`, decomposing the x\n",
" dimension.\n",
"- `y`: Corresponds to the topology `(1, '*', 1)`, decomposing the y\n",
" dimension.\n",
"- `z`: Corresponds to the topology `(1, 1, '*')`, decomposing the z\n",
" dimension.\n",
"- `xy`: Corresponds to the topology `('*', '*', 1)`, decomposing both\n",
" x and y dimensions.\n",
"\n",
"## How should I use MPI on multi-socket machines\n",
"\n",
"In general you should use one MPI rank per NUMA node on a multi-socket\n",
"machine. You can find the number of numa nodes with the `lscpu` command.\n",
"For example, here is the relevant part of the output from the `lscpu`\n",
"command on an AMD 7502 2 socket machine with 2 NUMA nodes:\n",
"\n",
"``` default\n",
"Architecture: x86_64\n",
"CPU(s): 64\n",
"On-line CPU(s) list: 0-63\n",
"Thread(s) per core: 1\n",
"Core(s) per socket: 32\n",
"Socket(s): 2\n",
"NUMA node(s): 2\n",
"Vendor ID: AuthenticAMD\n",
"Model name: AMD EPYC 7502 32-Core Processor\n",
"NUMA node0 CPU(s): 0-31\n",
"NUMA node1 CPU(s): 32-63\n",
"```\n",
"\n",
"## How do I make sure my code is “MPI safe”\n",
"\n",
"There are a few things you may want to check\n",
"\n",
"- To refer to the actual (“global”) shape of the domain, you should\n",
" always use `grid.shape` (or analogously through a `Function`,\n",
" `f.grid.shape`). And unless you know well what you’re doing, you\n",
" should never use the function shape, namely `f.shape` or\n",
" `f.data.shape`, as that will return the “local” domain shape, that\n",
" is the data shape after domain decomposition, which might differ\n",
" across the various MPI ranks.\n",
"\n",
"## Why does my Operator kernel die suddenly\n",
"\n",
"This is likely due to an out-of-bounds (OOB) array access while running\n",
"the generated code. This shouldn’t really ever happen! The compiler and\n",
"runtime should catch any DSL misuses, ultimately leading to such OOB\n",
"accesses, well before jumping to C-land via `op.apply(...)` to run the\n",
"generated code. However, there are currently two open issues when this\n",
"might unfortunately happen:\n",
"\n",
"- When misusing `SubDimensions` or `SubDomains`. See [this\n",
" issue](https://github.com/devitocodes/devito/issues/1578)\n",
"- When misusing `ConditionalDimensions`. See [this\n",
" issue](https://github.com/devitocodes/devito/issues/1829)\n",
"\n",
"## Can I manually modify the C code generated by Devito and test these modifications\n",
"\n",
"Yes, as of Devito v3.5 it is possible to modify the generated C code and\n",
"run it inside Devito. First you need to get the C file generated for a\n",
"given `Operator`. Run your code in `DEBUG` mode:\n",
"\n",
"``` bash\n",
"DEVITO_LOGGING=DEBUG python your_code.py\n",
"```\n",
"\n",
"The generated code path will be shown as in the excerpt below:\n",
"\n",
"``` default\n",
"CustomCompiler: compiled `/tmp/devito-jitcache-uid1000/ed41e9373af1bc129471b7ae45e1c3740b60a856.c` [0.29 s]\n",
"```\n",
"\n",
"You can now open the C file, do the modifications you like, and save\n",
"them. Finally, rerun the same program but this time with the *Devito JIT\n",
"backdoor* enabled:\n",
"\n",
"``` bash\n",
"DEVITO_JIT_BACKDOOR=1 python your_code.py\n",
"```\n",
"\n",
"This will force Devito to recompile and link the modified C code.\n",
"\n",
"If you have a large codebase with many `Operator`s, here’s a\n",
"[trick](https://github.com/devitocodes/devito/wiki/Efficient-use-of-DEVITO_JIT_BACKDOOR-in-large-codes-with-many-Operators)\n",
"to speed up your hacking with the JIT backdoor.\n",
"\n",
"## How do I find the source of my bug quickly and get support\n",
"\n",
"Assuming this is actually an issue in Devito, we distinguish between\n",
"three types of bugs:\n",
"\n",
"1. a Python-level bug, e.g. some exceptions raised during compilation\n",
"2. a C-level bug causing abrupt termination, e.g. a segmentation fault\n",
" (ouch)\n",
"3. a C-level bug, e.g. the numerical output “looks wrong”\n",
"\n",
"Typically such a bug occurs in a moderately big code, so how should we\n",
"proceed?\n",
"\n",
"If you are in the cases 1 or 2 above, the first thing to do, regardless\n",
"of who fixes it (either you directly if feeling brave, or most likely\n",
"someone from the Devito team), is to create an MFE – a Minimal Failing\n",
"Example. An interesting read about MFEs in general is available\n",
"[here](http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports).\n",
"The typical workflow in Devito is as follows:\n",
"\n",
"- If the failure is from a Jupyter Notebook, first of all, convert it\n",
" into a Python file. It’s trivial, you can do it directly from within\n",
" the Jupyter Notebook itself, under the “File” tab there’s an option\n",
" to convert the notebook into a Python file.\n",
"- Then it’s time to trim down the code. The idea is to incrementally\n",
" remove parts of the code until the bug disappears, i.e., Devito\n",
" compilation and execution reach completion without failures. In\n",
" doing so, you don’t care about things such as breaking the physics,\n",
" making the method unstable, and so on. So …\n",
"- If your Operator has many Eqs, start with removing some of them. Is\n",
" the bug still there? Great, keeping on removing the unnecessary.\n",
" Some Functions may now become unnecessary because they were only\n",
" appearing in the removed Eqs. Then, drop them too from the Python\n",
" file.\n",
"- Here’s another trick. You have many SubDomains – try with removing\n",
" some of them, both their definition and where they’re used.\n",
"- The MFE must have a super quick turnaround time. Avoid `solve` at\n",
" all costs. Run for only a few timesteps, possibly just one; this\n",
" will also help with avoiding NaN or Inf since your MFE is most\n",
" likely unstable.\n",
"- If you have just one Eq in your Operator, but it’s a big one,\n",
" perhaps with lots of operations over many Functions, then you should\n",
" trim it down. Try to mitigate the arithmetic complexity, for example\n",
" by dropping derivatives or entire sub-expressions.\n",
"- There are other tricks for the creation of an MFE. If your code by\n",
" default uses MPI, or OpenMP, or OpenACC, or combinations thereof,\n",
" but the bug appears even when running sequentially, then explicitly\n",
" disable parallelism. Also try with disabling the performance\n",
" optimizations applied by an Operator – `Operator(..., opt='noop')`.\n",
"- There are a few other tricks and things you may wanna play with to\n",
" create an MFE…\n",
"\n",
"At the end of the day, an MFE should really be small, ideally not more\n",
"than 10 lines of code, and definitely not more than 20~30. If you\n",
"managed to create an actual MFE, then we’re in business. File an issue\n",
"on GitHub or talk to us on Slack, attach the MFE, and someone will\n",
"quickly look into it.\n",
"\n",
"Now, stepping back: you may unfortunately be in the case 3 above – the\n",
"numerical output “looks wrong”, despite the code really looks sane. This\n",
"is in our experience an extremely rare circumstance. You may try to look\n",
"at the generated C, to see if you spot anything odd; or you may just\n",
"file an issue on GitHub with instructions to reproduce the problem, or\n",
"talk to us directly or Slack.\n",
"\n",
"## Is there a way to get the performance of an Operator\n",
"\n",
"Yes, any logging level below or equal to `PERF` will do the trick. For\n",
"example, if you run:\n",
"\n",
" DEVITO_LOGGING=PERF python your_code.py\n",
"\n",
"you will see that Devito emits lots of useful information concerning the\n",
"performance of an Operator. The following is reported:\n",
"\n",
"- the code generation, compilation, and execution times;\n",
"- for each section in the generated code, its execution time,\n",
" operational intensity (OI), GFlops/s and GPts/s performance;\n",
"- global GFlops/s and GPts/s performance of the Operator (i.e.,\n",
" cumulative across all sections);\n",
"- in the case of an MPI run, per-rank GFlops/s and GPts/s performance.\n",
"\n",
"About the GPts/s metric, that is number of gigapoints per seconds. The\n",
"“points” we refer to here are the actual grid points – so if the grid is\n",
"an `N**3` cube, the number of timesteps is `T`, and the Operator runs\n",
"for `S` secs, then we have `N**3*T/S GPts/s`. This is the typical metric\n",
"used for finite difference codes.\n",
"\n",
"An excerpt of the performance profile emitted by Devito upon running an\n",
"Operator is provided below. In this case, the Operator has two sections,\n",
"`section0` and `section1`, and `section1` consists of two consecutive 6D\n",
"iteration spaces whose size is given between angle brackets.\n",
"\n",
"``` default\n",
"Global performance: [OI=0.16, 8.00 GFlops/s, 0.04 GPts/s]\n",
"Local performance:\n",
" * section0<136,136,136> run in 0.10 s [OI=0.16, 0.14 GFlops/s]\n",
" * section1<<341,16,16,14,14,136>,<341,16,16,8,8,130>> run in 35.91 s [OI=5.36, 8.01 GFlops/s, 0.05 GPts/s]\n",
"```\n",
"\n",
"Note that `section0` doesn’t show the GPts/s. This is because no\n",
"TimeFunction is written in this section.\n",
"\n",
"## How does Devito compute the performance of an Operator\n",
"\n",
"The section execution time is computed directly in the generated code\n",
"through cheap timers. The cumulative Operator execution time is computed\n",
"through Python-level timers and includes the overhead inherent in the\n",
"processing of the arguments supplied to `op.apply(...)`.\n",
"\n",
"The floating-point operations are counted once all of the symbolic\n",
"flop-reducing transformations have been performed during compilation.\n",
"Devito uses an in-house estimate of cost, rather than SymPy’s estimate,\n",
"to take care of some low-level intricacies. For example, Devito’s\n",
"estimate ignores the cost of integer arithmetic used for offset indexing\n",
"into multi-dimensional arrays. Examples of how the Devito estimator\n",
"works are available\n",
"[here](https://github.com/devitocodes/devito/blob/v4.1/tests/test_dse.py#L265).\n",
"\n",
"To calculate the GFlops/s performance, Devito multiplies the\n",
"floating-point operations calculated at compile time by the size of the\n",
"iteration space, and it does that at the granularity of individual\n",
"expressions. For example, consider the following snippet:\n",
"\n",
"``` default\n",
"\n",
"for x = x_m to x_M\n",
" for y = y_m to y_M\n",
" u[x, y] = 2*v[x, y] + 1\n",
" for z = z_m to z_M\n",
" w[x, y, z] += 4*h[x, y, z] + 5 - f[x, y, z]\n",
"\n",
"```\n",
"\n",
"At compile time, Devito produces the following estimate:\n",
"`2*(x_M-x_m+1)*(y_M-y_m+1) + 4*(x_M-x_m+1)*(y_M-y_m+1)*(z_M-z_m+1) = 2*(x_M-x_m+1)*(y_M-y_m+1)*(1 + 2*(z_M-z_m+1))`.\n",
"Upon `op.apply(...)`, the values of `x_M, x_m, ...` become known and are\n",
"instantied. This gives the total number of operations performed in\n",
"`section0`, which is eventually divided by the section execution time to\n",
"give the GFlops/s performance.\n",
"\n",
"The produced GFlops/s has been checked against that reported by Intel\n",
"Advisor in a number of problems and the results were extremely close,\n",
"which gives confidence about the soundness of the Devito estimate.\n",
"Clearly, when it gets to articles or presentations, we encourage the use\n",
"of profilers such as Intel Advisor to back these numbers up. The metrics\n",
"emitted by Devito are only intended to give an initial yet fairly\n",
"realistic indication of the Operator performance.\n",
"\n",
"Compiler ninjas may wonder about the counting of loop-invariant\n",
"sub-expressions, which might produce an over-estimate of the actual\n",
"performance. Thanks to aggressive code motion, the amount of\n",
"innermost-loop-invariant sub-expressions in a Devito Operator is\n",
"typically negligible, so the Devito estimate doesn’t basically suffer\n",
"from this issue, or at least not in a tangible way to the best of our\n",
"knowledge.\n",
"\n",
"## How do I fairly compare the performance to that of my in-house code\n",
"\n",
"It’s challenging! Here’s a potentially non-exhaustive list of things to\n",
"check:\n",
"\n",
"- Check the physics — e.g., there are many TTI variations out there,\n",
" are the two codes using the same one? You wanna ensure the PDEs are\n",
" the same.\n",
"- Check the discretization — do the two codes use the same spatial\n",
" order? most importantly, is it the same spatial order in all problem\n",
" dimensions? carefully optimized codes often use lower order stencils\n",
" in some dimensions. This significantly impacts performance\n",
"- Did you try tuning the performance of the Devito `Operator`? E.g.,\n",
" on GPUs it is worth giving the `par-tile` option a go.\n",
"- Snapshotting: compression, serialization, asynchronous streaming if\n",
" running on device… Many legacy codes have these features enabled\n",
" out-of-the-box. Are you using them?\n",
"- Time sub-sampling for snapshotting – yes or no?\n",
"- Expanding box – yes or no.\n",
"- Mixed precision for model parameters – yes or no?\n",
"- Other minor potential devito optimizations: use of Buffer(2) to\n",
" reduce working set size; Trigonometric identities to avoid some\n",
" temporaries (e.g., cos2 = 1 - sin2), etc…\n",
"\n",
"## Is there a list of refereed papers related to the Devito project\n",
"\n",
"Please see https://www.devitoproject.org/publications\n",
"\n",
"## How do I cite Devito\n",
"\n",
"Please see https://www.devitoproject.org/citing\n",
"\n",
"## Where did the name Devito come from?\n",
"\n",
"The precursor project that led to Devito was named by\n",
"[@ggorman](https://github.com/ggorman) using a backronym generator. He\n",
"put in some keywords like “open source performant seismic codes”, and\n",
"chose the (weird) name “Opesci”. No one knew how to pronounce this, so a\n",
"common conversation was “How do you pronounce this?” “Opesci, like Joe\n",
"Pesci”. So for the next version we chose to go with a famous Joe Pesci\n",
"character - Tommy Devito from Goodfellas. The name came up in a\n",
"discussion between [@navjotk](https://github.com/navjotk) and\n",
"[@mlange05](https://github.com/mlange05) (mostly the latter) and we\n",
"stuck with it."
],
"id": "56ad72da-0e0c-46b8-856d-6ceb00bd69c0"
}
],
"nbformat": 4,
"nbformat_minor": 5,
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
}
}
}