MPI and Large Data

Practical install and launch guide for npRmpi across session, attach, and profile workflows.
Keywords

npRmpi, npRmpi.init, MPI, mpiexec, session mode, attach mode, profile mode, R_PROFILE_USER, Windows, Linux, macOS

This page is for users who want the np workflow but need MPI. The short version is simple: do not start with old wrapper-heavy mpi.* patterns unless you truly need manual control. Start with npRmpi.init() and ordinary np-style calls.

If you want the smallest runnable examples first, go to Quickstarts. If you want the broader example catalog, go to Code Catalog.

Short version

  • On macOS and Linux, start with session / spawn mode unless you already know you need mpiexec.
  • On Windows, start with attach mode under mpiexec.
  • On larger or heterogeneous clusters, profile / manual-broadcast mode is the cleanest advanced route.
  • For session and attach, the supported modern workflow is npRmpi.init(...) followed by ordinary np* calls.
  • profile mode is different: it uses an explicit startup profile and explicit broadcast commands.

Mode chooser

  • session / spawn — work interactively and let npRmpi create the workers for you. Launch pattern: plain R session. Key rule: no .Rprofile bootstrap needed. Script: nprmpi_session_quickstart.R
  • attach — run inside an MPI world created by mpiexec. Launch pattern: mpiexec ... Rscript --no-save foo.R. Key rule: clear R_PROFILE_USER and R_PROFILE. Script: nprmpi_attach_quickstart.R
  • profile / manual-broadcast — keep worker startup and broadcast logic explicit. Launch pattern: mpiexec ... R CMD BATCH --no-save foo.R. Key rule: set exactly one R_PROFILE_USER; clear R_PROFILE. Script: nprmpi_profile_quickstart.R

A safe validation ladder

Use this order rather than jumping immediately to a large job.

  1. Make sure library(npRmpi) loads.
  2. Run the smallest mode-appropriate quickstart.
  3. Increase workers only after the tiny run is clean.
  4. Move to the demo harness only when the first small example works.

That order is not glamorous, but it avoids chasing MPI launch issues and estimator issues at the same time.

Platform map

  • macOS — first install path: source install against a working MPI toolchain. First mode to try: session / spawn. Notes: the currently documented local recipe uses MacPorts + MPICH.
  • Linux — first install path: source install against a system MPI toolchain. First mode to try: session / spawn. Notes: include/lib paths depend on the local distribution and MPI stack.
  • Windows — first install path: check the current CRAN package pages first, then install the available package build. First mode to try: attach. Notes: the current package source treats attach as the Windows entry path rather than spawn.
  • heterogeneous cluster — first install path: source install on the target machines. First mode to try: profile. Notes: explicit startup and broadcast control is usually the least surprising route.

Current package pages:

Install from CRAN

If the target machine already has a working MPI toolchain and current package binaries or source prerequisites are in place, this is the shortest route.

install.packages("npRmpi", dependencies = TRUE)

Install the development branch

The development branch lives in R-Package-np under npRmpi.

library(devtools)
install_github("JeffreyRacine/R-Package-np", ref = "npRmpi", build_vignettes = FALSE)

Use build_vignettes = FALSE if TeX is not available.

Step 1 after install: make sure the package loads

Do this before worrying about launch modes.

R -q -e 'library(npRmpi); sessionInfo()'

macOS: current documented local recipe

The clearest documented local recipe in the repo is MacPorts + MPICH.

1) Install MPICH

sudo port install mpich-default
sudo port select --set mpi mpich-mp-fortran

2) Export the build environment

export RMPI_TYPE=MPICH
export RMPI_INCLUDE=/opt/local/include/mpich-mp
export RMPI_LIB_PATH=/opt/local/lib/mpich-mp
export RMPI_LIBS="-L/opt/local/lib/mpich-mp -lmpi"
export CC=mpicc
export CXX=mpicxx

3) Build and install

cd /Users/jracine/Development
R CMD build np-npRmpi
R CMD INSTALL npRmpi_0.70-1.tar.gz

4) First runtime test

Start with the session quickstart, not with mpiexec.

library(npRmpi)
npRmpi.init(mode = "spawn", nslaves = 1)
options(npRmpi.autodispatch = TRUE, np.messages = FALSE)

set.seed(1)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.2)
dat <- data.frame(y, x)

bw <- npregbw(y ~ x, regtype = "ll", bwmethod = "cv.ls", data = dat)
fit <- npreg(bws = bw, data = dat)
summary(fit)

# on.exit() has no lasting effect at the top level of a script,
# so shut the worker pool down explicitly when you are done.
npRmpi.quit()

Linux: current practical guidance

The package docs do not yet ship one distro-specific Linux recipe, so the right mindset is:

  1. install a working MPI implementation and development headers,
  2. make sure mpicc and mpicxx are available,
  3. point RMPI_INCLUDE, RMPI_LIB_PATH, and RMPI_LIBS at the matching headers and libraries if needed,
  4. validate session mode before trying attach or profile.

A generic environment skeleton looks like this:

export RMPI_TYPE=MPICH
export RMPI_INCLUDE=/path/to/mpi/include
export RMPI_LIB_PATH=/path/to/mpi/lib
export RMPI_LIBS="-L/path/to/mpi/lib -lmpi"
export CC=mpicc
export CXX=mpicxx

The exact paths depend on the local MPI installation, so treat this as a pattern rather than as a finished copy-and-paste recipe.
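Because the paths are site-specific, a quick self-check before running R CMD INSTALL can save a confusing build failure. This is an illustrative sketch: the RMPI_* variable names come from the skeleton above, while the placeholder paths and the loop itself are not part of the package.

```shell
# Illustrative self-check: the RMPI_* names come from the skeleton above;
# the paths are placeholders for the local MPI installation.
export RMPI_TYPE=MPICH
export RMPI_INCLUDE=/path/to/mpi/include
export RMPI_LIB_PATH=/path/to/mpi/lib
export RMPI_LIBS="-L/path/to/mpi/lib -lmpi"

# Fail fast if any build variable is unset or empty.
for v in RMPI_TYPE RMPI_INCLUDE RMPI_LIB_PATH RMPI_LIBS; do
  eval "val=\${$v}"
  [ -n "$val" ] || { echo "unset: $v"; exit 1; }
done
echo "RMPI_* build environment looks complete"
```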

Windows: current practical guidance

On Windows, the safest approach is conservative:

  1. check the current CRAN pages for npRmpi and Rmpi,
  2. install the available package build,
  3. confirm that library(npRmpi) loads cleanly,
  4. start with attach mode under mpiexec,
  5. only then move on to profile if you need explicit broadcast control.

The key point is that the current package source does not treat spawn as the Windows entry path. The supported first move is attach.

A minimal script body looks like this:

library(npRmpi)
npRmpi.init(mode = "attach", comm = 1, autodispatch = TRUE)
options(np.messages = FALSE)

if (mpi.comm.rank(0L) == 0L) {
  set.seed(1)
  x <- runif(200)
  y <- sin(2 * pi * x) + rnorm(200, sd = 0.2)
  dat <- data.frame(y, x)

  bw <- npregbw(y ~ x, regtype = "ll", bwmethod = "cv.ls", data = dat)
  fit <- npreg(bws = bw, data = dat)
  summary(fit)

  npRmpi.quit(mode = "attach", comm = 1)
}

Launch with a pre-created MPI world and cleared profile environment:

mpiexec -env R_PROFILE_USER "" -env R_PROFILE "" -n 2 Rscript --no-save foo.R

Launch patterns you actually need

session / spawn

Use this when you are in an ordinary R session and want npRmpi to create workers for you.

library(npRmpi)
npRmpi.init(mode = "spawn", nslaves = 1, autodispatch = TRUE, np.messages = FALSE)

# ordinary np-style code goes here

npRmpi.quit()

Important note:

  • .Rprofile bootstrap files are not required for the supported npRmpi.init() workflow in session mode.

attach

Use this when an MPI world already exists, i.e. the script is launched under mpiexec.

Launch pattern:

mpiexec -env R_PROFILE_USER "" -env R_PROFILE "" -n 2 Rscript --no-save foo.R

Inside foo.R:

library(npRmpi)
npRmpi.init(mode = "attach", autodispatch = TRUE, np.messages = FALSE)

# ordinary np-style code goes here

npRmpi.quit(mode = "attach")

Important notes:

  • clear R_PROFILE_USER,
  • clear R_PROFILE,
  • do not layer a profile bootstrap on top of attach,
  • if you want mpiexec plus explicit profile startup, that is no longer attach; that is profile.

profile / manual-broadcast

Use this when you want explicit startup and explicit broadcast logic, especially on larger or heterogeneous clusters.

Get the package startup profile:

RPROFILE=$(Rscript --no-save -e 'cat(system.file("Rprofile", package="npRmpi"))')

Small quickstart launch:

mpiexec -env R_PROFILE_USER "$RPROFILE" -env R_PROFILE "" -n 2 \
  Rscript --no-save foo.R

Canonical demo-harness launch:

mpiexec -env R_PROFILE_USER "$RPROFILE" -env R_PROFILE "" \
  -env NP_RMPI_PROFILE_RECV_TIMEOUT_SEC 180 -n 2 \
  R CMD BATCH --no-save foo.R

Important rules:

  • provide exactly one profile source,
  • set R_PROFILE_USER to the intended profile file,
  • clear R_PROFILE,
  • do not use --vanilla for profile mode,
  • do not attach package Rmpi inside a profile-mode script body.

Heterogeneous clusters

For heterogeneous clusters, profile is usually the cleanest first choice because:

  • startup is explicit,
  • worker initialization is explicit,
  • broadcast steps are explicit,
  • you are not depending on implicit attachment behavior.

Practical advice:

  1. use an explicit absolute R_PROFILE_USER path,
  2. start with a tiny smoke test,
  3. only add FI_* overrides if the local MPI or network stack truly needs them,
  4. scale workers and sample size only after the tiny run is clean.
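Step 1 is easy to check mechanically before launching. A minimal sketch; the helper name is invented, and the example paths are placeholders:

```shell
# Illustrative helper: an R_PROFILE_USER path for a cluster launch should be
# absolute so every host resolves the same file.
is_absolute() { case "$1" in /*) echo yes ;; *) echo no ;; esac; }

is_absolute "/opt/share/npRmpi/Rprofile"   # yes
is_absolute "Rprofile"                     # no
```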

Tiny demo-harness smoke tests

These are the documented small checks in the package demo tree.

Serial

cd /Users/jracine/Development/np-npRmpi/demo/serial
make -f ../makefile MODE=serial NP_DEMO_N=100

Attach

mkdir -p /Users/jracine/Development/np-npRmpi/demo/n_2_attach
cd /Users/jracine/Development/np-npRmpi/demo/n_2_attach
make -f ../makefile MODE=attach NP=2 NP_DEMO_N=100

Profile

mkdir -p /Users/jracine/Development/np-npRmpi/demo/n_2_profile
cd /Users/jracine/Development/np-npRmpi/demo/n_2_profile
make -f ../makefile MODE=profile NP=2 NP_DEMO_N=100

That is a much safer starting point than running the full demo matrix immediately.

mode = "auto"

The package also supports:

npRmpi.init(mode = "auto")

Current package docs describe mode = "auto" this way:

  • if mpi.comm.size(0) > 1, choose attach,
  • otherwise choose spawn.

That can be convenient in user scripts, but for teaching and troubleshooting it is usually better to be explicit about whether you want session / spawn or attach.
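The documented decision rule is small enough to state as code. This sketch is illustrative (the function name is invented), with the argument standing in for mpi.comm.size(0):

```shell
# Illustrative sketch of the documented mode = "auto" rule.
choose_mode() {
  # $1 stands in for mpi.comm.size(0), the size of any pre-existing MPI world.
  if [ "$1" -gt 1 ]; then echo "attach"; else echo "spawn"; fi
}

choose_mode 1   # spawn: no pre-existing world, so workers must be spawned
choose_mode 4   # attach: a world launched by mpiexec already exists
```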

Common failure signatures

Error: npRmpi auto-dispatch requires an active MPI slave pool; call npRmpi.init(...) first

You are trying to use autodispatch without an active MPI session.

In practice this means:

  • call npRmpi.init(...) before the first estimator call,
  • use session / spawn for interactive work,
  • use attach only when the MPI world already exists.

Error: could not find function "mpi.bcast.cmd"

You are trying to run a profile/manual-broadcast script without the intended startup profile.

Check the launch first:

  1. does R_PROFILE_USER point to the profile you intended?
  2. is R_PROFILE cleared?
  3. are you avoiding --vanilla?
  4. are you sure this is a profile script and not an attach script?
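Items 1 and 2 of that checklist can be verified before launching. A minimal sketch, assuming the profile path was captured with system.file("Rprofile", package = "npRmpi") as shown earlier; the helper name is invented:

```shell
# Illustrative pre-launch check of the profile-mode environment contract:
# exactly one profile source (R_PROFILE_USER set, R_PROFILE cleared).
check_profile_env() {
  # $1 = intended R_PROFILE_USER value, $2 = R_PROFILE value
  if [ -z "$1" ]; then echo "error: R_PROFILE_USER is empty"; return 1; fi
  if [ -n "$2" ]; then echo "error: R_PROFILE must be cleared"; return 1; fi
  echo "profile env ok"
}

check_profile_env "/path/to/npRmpi/Rprofile" ""   # profile env ok
```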

Attach or profile appears hung

Do the simple things first.

  1. reduce the job to a tiny smoke test,
  2. try NP=2 before larger worker counts,
  3. confirm that the launch command matches the right mode contract,
  4. only then start experimenting with host-specific FI_* overrides.

Canonical source material for this page

The package markdown files remain the canonical low-level source material:
