Distributed Devito data under MPI: initialize, assign, and read

This notebook summarizes how to initialize, assign into, and read back the data of Function, TimeFunction, SparseFunction, and SparseTimeFunction when running with MPI. It covers the three RHS layouts you meet in practice – globally replicated arrays, arrays already distributed like the Devito object, and arrays in some external distribution – plus assigning one distributed Devito object into another, and gathering distributed data back to a single rank.

The examples use the same execution style as examples/mpi/overview.ipynb: start an MPI ipyparallel cluster, connect to it from the notebook, and execute the Devito examples on all engines with %%px.

Prerequisites

The examples below are written for 4 MPI processes. Other process counts work for the APIs shown here, but a few comments and toy distributions assume 4 ranks.

From the root Devito directory, install the MPI requirements and create the ipyparallel profile if needed:

pip install -r requirements-mpi.txt
./scripts/create_ipyparallel_mpi_profile.sh

Then start the cluster in another terminal:

ipcluster start -n 4 --profile=mpi --engines=mpi
import ipyparallel as ipp

c = ipp.Client(profile="mpi")
view = c[:]
view.block = True
%%px
from devito import configuration

configuration["mpi"] = True
configuration["log-level"] = "WARNING"

Terminology

A distributed Devito object has a logical global shape, but each MPI rank owns only one local piece. Here, “array” means an ordinary NumPy array unless the text explicitly refers to a Devito .data or .data_local object. In the examples below:

  • glb_a is a globally replicated array: every rank has the full logical array.
  • loc_a is a local array with the same distribution as the Devito object.
  • loc_ext_a is a local array with a different, external distribution.
  • glb_idx is a 1D array of global indices labeling the local values in loc_ext_a.

Data assignment overview

Pick the LHS form from the distribution of the right-hand side:

Object Global replicated RHS Local RHS, same distribution Local RHS, external / different distribution
Dense f.data[...] = glb_a f.data_local[...] = loc_a f.data[glb_idx] = loc_ext_a (routed by global index), or f.data[region] = g.data[...] (another distributed Devito object)
Sparse src.data[...] = glb_a src.data_local[...] = loc_a src.data[:, glb_idx] = loc_ext_a

Both dense and sparse objects support routed (global-index) assignment: the difference is only which axis carries the global labels. For a dense Function the distributed axes are the spatial grid axes, so the labels are grid coordinates; for a sparse object the distributed axis is the sparse-point axis, so the labels are sparse-point numbers.

Indexing forms

Under MPI, Devito’s data accessor uses the logical global index space, and it has two indexing modes:

  • Basic indexing – slices and scalars, e.g. data[:, :], data[0], data[2:7], data[::-1] – is local: each rank reads or writes only the part of the global array it already owns, with no communication. A slice result keeps its induced distribution; a reversed or strided slice simply relabels which rank owns which result index, it does not move data.
  • Advanced indexing with an integer array (or boolean mask) on a distributed axis, e.g. data[glb_idx] or data[:, glb_idx], is routed by global index: each rank supplies values labeled by their global index and Devito sends them point-to-point to the ranks that own those indices. This is the form used to load data that arrives in an external distribution.

A few specific cases to keep in mind:

  • data[:, [g]] or data[:, np.array([g])]: advanced indexing on that axis, routed by global index on the distributed dimension.
  • data[np.array([...])]: advanced indexing on the first axis.
  • data[[0, 0, 0, 0]] on multi-dimensional Data: Devito legacy basic indexing, equivalent to data[0, 0, 0, 0].

The last form is a known historical inconsistency with NumPy: Devito has long accepted a top-level Python list whose length matches the data rank as basic multi-dimensional indexing. It is kept for compatibility. Prefer explicit tuples for new basic multi-dimensional indexing, for example data[(0, 0, 0, 0)], and use a nested list or array on the selected axis for routed indexing, for example data[:, [g]].

Dense data from global replicated RHS

For dense objects such as Function and TimeFunction, data uses the logical global domain. If every rank has the full array, assign it directly through data.

%%px
import numpy as np
from devito import Function, Grid

grid = Grid(shape=(4, 4), extent=(3.0, 3.0))
f = Function(name="f_global", grid=grid)

# Every rank has the full logical array.
glb_a = np.arange(np.prod(grid.shape), dtype=f.dtype).reshape(grid.shape)
f.data[...] = glb_a
assert np.all(f.data_local == glb_a[f.local_indices])

Dense data from local array with same distribution

Use data_local when the array already has Devito’s local shape on each MPI rank. data_local is a rank-local NumPy view: reads, writes, and in-place NumPy operations affect only the storage owned by that rank and do not perform global indexing or routing.

%%px
import numpy as np
from devito import Function, Grid

grid = Grid(shape=(4, 4), extent=(3.0, 3.0))
f = Function(name="f_local", grid=grid)

# The RHS is already shaped like this rank's owned domain.
rank = grid.distributor.myrank
loc_a = np.full(f.data_local.shape, rank + 1, dtype=f.dtype)
f.data_local[:, :] = loc_a[:, :]
assert np.all(f.data_local == loc_a)

# In-place local NumPy operations are also local.
f.data_local[:, :] *= -1
assert np.all(f.data_local == -loc_a)

Dense data from an external distribution (routed by global index)

When the input values arrive in a layout that does not match Devito’s decomposition, label each value with its global grid index and assign through data. Devito routes every value point-to-point to the rank that owns that grid index – no all-to-all, and no need for the local order to match Devito’s.

This is the dense analog of the sparse external case: the global labels are grid coordinates instead of sparse-point numbers. A single 1D integer index labels one distributed axis; remaining axes use basic indices.

%%px
import numpy as np
from devito import Function, Grid

grid = Grid(shape=(8,), extent=(7.0,))
f = Function(name="f_routed", grid=grid, dtype=np.int32)

rank = grid.distributor.myrank
nprocs = grid.distributor.nprocs

# Toy external layout: each rank holds a subset of global indices, reversed order.
glb_idx = np.array_split(np.arange(8)[::-1], nprocs)[rank]
glb_a = np.arange(8, dtype=f.dtype)        # reference, for self-checking only
loc_ext_a = glb_a[glb_idx]                 # this rank's externally held values

f.data[:] = 0
f.data[glb_idx] = loc_ext_a                # routed point-to-point by global index

# Read back the same global labels; check Devito's local storage too.
assert np.all(f.data[glb_idx] == loc_ext_a)
assert np.all(f.data_local == glb_a[f.local_indices])

Routing also works across several distributed axes at once. Label each value with its full global coordinate tuple; Devito resolves the owning rank from all distributed axes together.

%%px
import numpy as np
from devito import Function, Grid

grid = Grid(shape=(4, 4), extent=(3.0, 3.0))
f = Function(name="f_routed_2d", grid=grid, dtype=np.int32)
glb_a = np.arange(16, dtype=f.dtype).reshape(4, 4)

# Each rank labels its values with global (x, y) coordinates on both
# distributed axes; Devito routes each value to the rank that owns it.
rank = grid.distributor.myrank
points = {0: ([0, 3], [0, 3]), 1: ([1, 2], [3, 0]),
          2: ([3, 0], [1, 2]), 3: ([2, 1], [2, 1])}
xs, ys = (np.array(p) for p in points[rank])

f.data[:] = -1
f.data[xs, ys] = glb_a[xs, ys]
assert np.all(f.data[xs, ys] == glb_a[xs, ys])

Dense data from another distributed Devito object

You can assign one distributed Devito object into another even when their decompositions differ (for example because the grids have different shapes). Devito redistributes the selected block point-to-point. Basic slices on either side – including reversed or strided ones – are resolved with induced (local) ownership first, then the values are exchanged.

%%px
import numpy as np
from devito import Function, Grid

grid_src = Grid(shape=(8, 8), extent=(7.0, 7.0))
grid_dst = Grid(shape=(12, 12), extent=(11.0, 11.0))
src = Function(name="src_vol", grid=grid_src, dtype=np.int32)
dst = Function(name="dst_vol", grid=grid_dst, dtype=np.int32)
src.data[:] = np.arange(64, dtype=src.dtype).reshape(8, 8)

# `src` and `dst` have different shapes, hence different decompositions.
# Devito redistributes the selected block; the reversed RHS slice is resolved
# with induced (local) ownership before the point-to-point exchange.
dst.data[:] = 0
dst.data[2:10, 2:10] = src.data[::-1, ::-1]

ref = np.zeros((12, 12), dtype=dst.dtype)
ref[2:10, 2:10] = np.arange(64, dtype=src.dtype).reshape(8, 8)[::-1, ::-1]
gathered = dst.data_gather(rank=0)
if grid_dst.distributor.myrank == 0:
    assert np.all(gathered == ref)

Reading distributed data: induced slices and gather

Reading mirrors writing. Basic indexing is local: each rank gets exactly its piece of the global result, with no communication, in the induced distribution – a reversed or strided slice just relabels which rank owns which result index. Routed (advanced) reads pull each labeled global index from its owner, as shown above with f.data[glb_idx].

To collect (a slice of) the whole global array onto a single rank, use data_gather. It accepts start, stop, step (per axis or broadcast) and a target rank, and returns the assembled array on that rank and None elsewhere.

%%px
import numpy as np
from devito import Function, Grid

grid = Grid(shape=(10, 10), extent=(9.0, 9.0))
f = Function(name="f_read", grid=grid, dtype=np.int32)
glb_a = np.arange(100, dtype=f.dtype).reshape(10, 10)
f.data[:] = glb_a

myrank = grid.distributor.myrank

# `data_gather` collects (a slice of) the global array onto one rank.
whole = f.data_gather(rank=0)                            # full global array
window = f.data_gather(start=2, stop=8, step=2, rank=0)  # strided sub-block
reverse = f.data_gather(step=-1, rank=0)                 # reversed on every axis
if myrank == 0:
    assert np.all(whole == glb_a)
    assert np.all(window == glb_a[2:8:2, 2:8:2])
    assert np.all(reverse == glb_a[::-1, ::-1])

Sparse data from global replicated RHS

For sparse objects, data can also be initialized from a globally replicated sparse array. Every rank sees the same logical (nt, npoint) values; Devito stores only the sparse points owned by the rank.

%%px
import numpy as np
from devito import Grid, SparseTimeFunction

grid = Grid(shape=(4, 4), extent=(3.0, 3.0))
nt = 3
npoint = 8
src = SparseTimeFunction(name="src_global", grid=grid, nt=nt, npoint=npoint)

# Every rank has the full logical sparse array.
glb_a = np.arange(nt*npoint, dtype=src.dtype).reshape(nt, npoint)
src.data[:, :] = glb_a

assert np.all(src.data_local == glb_a[src.local_indices])

Sparse data from local array with same distribution

Use data_local when the array already follows Devito’s sparse distribution on this rank. The local array columns must be ordered exactly like Devito’s local sparse storage.

%%px
import numpy as np
from devito import Grid, SparseTimeFunction

grid = Grid(shape=(4, 4), extent=(3.0, 3.0))
nt = 3
npoint = 8
src = SparseTimeFunction(name="src_local", grid=grid, nt=nt, npoint=npoint)

glb_a = np.arange(nt*npoint, dtype=src.dtype).reshape(nt, npoint)
# The RHS is already shaped and ordered like this rank's sparse storage.
loc_a = -glb_a[src.local_indices]
src.data_local[:, :] = loc_a
assert np.all(src.data_local == loc_a)

Sparse data from an external distribution

A sparse object has one dimension whose entries are sparse points, numbered globally from 0 to npoint - 1. Devito may store those points on different ranks from the external object that produced the input data.

When the input array has its own distribution, this rank holds only some sparse-point columns. Provide those columns together with glb_idx, where each entry gives the global sparse point number for the matching local column. Devito then routes the values to the ranks that own those sparse points.

The key assignment is:

src.data[:, glb_idx] = loc_ext_a

where glb_idx[j] is the global sparse point number for loc_ext_a[:, j]. The local order does not have to match Devito’s sparse distribution.

This is the routed indexing form described above: one 1D integer index on one MPI-distributed dimension; the other dimensions use basic indices such as slices or scalars.

%%px
import numpy as np
from devito import Grid, SparseTimeFunction

grid = Grid(shape=(4, 4), extent=(3.0, 3.0))
nt = 3
npoint = 8
src = SparseTimeFunction(name="src", grid=grid, nt=nt, npoint=npoint)

rank = grid.distributor.myrank
nprocs = grid.distributor.nprocs

# Toy external layout: each rank holds a subset of sparse points, non-Devito order.
glb_idx = np.array_split(np.arange(npoint)[::-1], nprocs)[rank]

# A globally known reference is used only to make this example self-checking.
# In an application, loc_ext_a would come from an external data object.
glb_a = np.arange(nt*npoint, dtype=src.dtype).reshape(nt, npoint)
loc_ext_a = glb_a[:, glb_idx]

src.data_local[:, :] = 0
src.data[:, glb_idx] = loc_ext_a

# Read back the same externally distributed labels.
assert np.all(src.data[:, glb_idx] == loc_ext_a)

# Check this rank's Devito-local sparse storage too.
assert np.all(src.data_local == glb_a[src.local_indices])

The RHS must have the same logical layout as the selected LHS. For a SparseTimeFunction, loc_ext_a should have shape (nt, len(glb_idx)).

One sparse point at a time

Use this form when each local sparse time series must be transformed before loading:

src.data[:, [g]] = new_a[:, None]

The one-element list is intentional. src.data[:, g] = new_a is basic indexing and preserves the existing owner-local behavior; src.data[:, [g]] = new_a[:, None] is routed by global sparse point number.

%%px
import numpy as np
from devito import Grid, SparseTimeFunction

grid = Grid(shape=(4, 4), extent=(3.0, 3.0))
nt = 4
rank = grid.distributor.myrank
nprocs = grid.distributor.nprocs
npoint = 2*nprocs
src = SparseTimeFunction(name="src_interp", grid=grid, nt=nt, npoint=npoint)

# Equal local counts keep the example focused on the one-point assignment.
glb_idx = np.array_split(np.arange(npoint)[::-1], nprocs)[rank]

glb_a = np.arange(nt*npoint, dtype=src.dtype).reshape(nt, npoint)
loc_ext_a = glb_a[:, glb_idx]

def transform(values):
    # Stand-in for a cubic spline or another application-level operation.
    return 2*values + 1

src.data_local[:, :] = 0

for j, g in enumerate(glb_idx):
    new_a = transform(loc_ext_a[:, j])
    src.data[:, [g]] = new_a[:, np.newaxis]

assert np.all(src.data[:, glb_idx] == transform(loc_ext_a))

assert np.all(src.data_local == transform(glb_a[src.local_indices]))

Each routed assignment is collective. The example above has equal local counts, so all ranks naturally enter the loop the same number of times. With irregular counts, loop over the maximum local count and have ranks without data in a round pass empty arrays:

idx = np.empty(0, dtype=np.int64)
values = np.empty((nt, 0), dtype=src.dtype)
src.data[:, idx] = values

NumPy expressions on the RHS

There is no separate rule for NumPy expressions. Choose the LHS from the distribution of the RHS result:

  • use data when the RHS result is globally replicated;
  • use data_local when the RHS result is already in Devito’s local distribution;
  • for sparse data in an external distribution, use the indexed form shown above: src.data[:, glb_idx] = loc_ext_a.

In the example below, taper is replicated, but rec.data_local*taper[:, None] is local because the result has the shape and order of rec.data_local.

%%px
import numpy as np
from devito import Grid, SparseTimeFunction

grid = Grid(shape=(4, 4), extent=(3.0, 3.0))
nt = 5
npoint = 8
rec = SparseTimeFunction(name="rec", grid=grid, nt=nt, npoint=npoint)

glb_a = np.arange(nt*npoint, dtype=rec.dtype).reshape(nt, npoint)
loc_a = glb_a[rec.local_indices]
rec.data_local[:, :] = loc_a

before = rec.data_local.copy()
taper = np.linspace(1.0, 2.0, nt).astype(rec.dtype)

# Vectorized local expression. np.newaxis and None are equivalent here.
rec.data_local[:, :] *= taper[:, np.newaxis]
assert np.all(rec.data_local == before*taper[:, None])

# A read/write no-op is also local and valid.
rec.data_local[:, :] = rec.data_local[:, :]
assert np.all(rec.data_local == before*taper[:, None])

Layout notes

  • Logical axes and shape must match the selected LHS view. For src.data[:, glb_idx], the RHS shape is (nt, len(glb_idx)); for a dense f.data[xs, ys], the RHS shape is xs.shape.
  • Devito does not transpose axes automatically. If an external object stores values as (npoint_local, nt), transpose or build a (nt, npoint_local) view or buffer first.
  • Memory order is separate from logical layout. C-contiguous, Fortran-contiguous, and ordinary NumPy strided views are valid as long as the shape and axes are right. Flags such as array.flags["F_CONTIGUOUS"] are useful diagnostics, but Devito does not need special handling solely because an array came from a Fortran-oriented object.
  • A raw dense volume with no per-element labels cannot be reshuffled into a different distribution by assignment alone: provide global labels via routed indexing (f.data[glb_idx] = ...), assign from another distributed Devito object (f.data[region] = g.data[...]), or apply an explicit interpolation/exchange step first.
  • A different time sampling or padding scheme likewise needs an explicit step before assignment.
Back to top