File this one under “ideas that took a while to become mainstream”.
Some background is likely in order. The np and npRmpi packages have been on CRAN for a very long time now, and have lived through the ups and downs experienced by any long-lived research software project: bug fixes, maintenance, refreshes, updates, and so forth. In fact, the CRAN archives indicate that the first archived version was np_0.12-1.tar.gz, released on 2006-11-19 and weighing in at 246K (https://cran.r-project.org/src/contrib/Archive/np/). These packages are used in courses, papers, replications, and by people I have never met who want to estimate something flexible without pretending that every object in sight is Gaussian, linear, or continuous. That remains the point of the packages.
The upcoming np and npRmpi 0.70-2 releases are not simply minor updates dressed up with new version numbers. They are fairly substantial extensions of packages whose core purpose remains unchanged: providing researchers and practitioners with useful open-source nonparametric kernel methods for regression, density, distribution, quantile, partially linear, single-index, smooth-coefficient, and modal models, with mixed data types treated as first-class citizens rather than afterthoughts.
The headline change, among several, is the extension of local polynomial methods across the conditional estimators in the package. Historically, most of these methods were local constant. Regression had the familiar `regtype = "ll"` option for local linear estimation, but density, distribution, semiparametric regression, conditional quantiles, and modes largely lived in the local-constant world. That default was often good enough, but it was also a limitation.
What is new is not merely that higher-order local polynomials are available. The more interesting update is that the polynomial degree and bandwidth vectors can be selected jointly. Better still, the selected polynomial degree need not be the same for every continuous covariate. One variable may call for a local constant fit (degree 0), another for local linear (degree 1), another for a higher order (degree \(p\)). Hall and Racine (2015) proposed infinite-order local polynomial regression; Li, Li, and Racine (2026) extend this logic to bounded-support conditional density estimation with known or unknown support in work currently under revision. Here is the basic idea: let the data help choose the amount of local polynomial structure variable by variable, rather than forcing the practitioner to make an ad hoc global choice before estimation begins.
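To make the degree-selection idea concrete, here is a minimal base-R sketch of letting cross-validation choose a polynomial degree for a single covariate. This is only an illustration of the principle using a global least-squares fit; it is not the np/NOMAD machinery, which searches degrees and bandwidths jointly, per variable.

```r
# Toy illustration of "let the data choose the degree" (not np's
# implementation): pick the polynomial degree for one covariate by
# leave-one-out cross-validation over degrees 0 through 7.
set.seed(42)
n <- 250
x <- sort(runif(n, -2, 2))
y <- x + 0.1 * x^5 + rnorm(n)   # quintic DGP, as in the example below

loocv <- sapply(0:7, function(p) {
  # design matrix: intercept only for degree 0, raw polynomial otherwise
  X <- if (p == 0) matrix(1, n, 1) else cbind(1, poly(x, p, raw = TRUE))
  H <- X %*% solve(crossprod(X), t(X))       # hat matrix
  mean(((y - H %*% y) / (1 - diag(H)))^2)    # LOO shortcut for least squares
})
p.hat <- which.min(loocv) - 1L               # degrees are 0-based
p.hat                                        # data dependent; the quintic
                                             # term makes low degrees costly
```

The same cross-validation logic, extended to integer degrees per covariate alongside continuous bandwidths, is what turns the selection problem into the mixed-integer optimization discussed next.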
That sounds simple when written in one sentence, but computationally it is anything but simple. Jointly selecting continuous bandwidths and integer-valued polynomial degrees is a mixed-integer optimization problem, and these problems are famously difficult. In the past, serious work in this direction often meant reaching for commercial solvers such as IBM ILOG CPLEX Optimizer. That is powerful software, but not the open research-software path I wanted for np.
The bridge is NOMAD, developed by the team at GERAD. They kindly gave me permission to include NOMAD in the crs package, which I authored with Zhenghua Nie, and that has made it possible to bring this machinery into the np and npRmpi workflows using sophisticated open-source optimization tools that sit much more naturally with the R philosophy. The refreshed crs package now supplies the NOMAD solver interface used by np and npRmpi, while the old `crs::npglpreg()` surface has been retired in favour of the more natural `npreg(..., nomad = TRUE)` interface in np and npRmpi. Users should not have to remember which package temporarily housed which optimization route. They should be able to ask the estimator for the model they want, select NOMAD when appropriate, and get on with the analysis.
There is also a broader set of recent extensions around that central change. The Gallery of Code is being updated to highlight several of them more prominently:
- bounded-support and statistically proper local-polynomial extensions of conditional density and distribution methods, exposed through `proper = TRUE`, `ckerbound`, and the conditional `cxkerbound`/`cykerbound` controls (`"range"` for empirical support, `"fixed"` with `*lb`/`*ub` for user-supplied bounds), drawing on boundary-adaptive kernel ideas (Racine et al. 2024);
- more natural and complete first-class `npqreg`, `npconmode`, and `npcopula` workflows, including formula support, automatic internal bandwidth selection when `bws` is omitted, vector `tau` (`npqreg`), `newdata`-friendly high-level usage (`npconmode`), automatic two-dimensional probability-grid construction for the default copula route (`npcopula`), gradients, standard errors, and NOMAD degree-and-bandwidth pass-through where applicable;
- ordered-datatype probability smoothing available through the Racine-Li-Yan ordered kernel (`okertype = "racineliyan"`, and `oxkertype`/`oykertype` in conditional routes) (Racine et al. 2020);
- Bonferroni and simultaneous variability bounds on plots with data overlays by default;
- interactive plotting surfaces through `renderer = "rgl"` when `rgl` is available, with transparent wireframes replacing the previous perspective plots.
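The control names in the list above come straight from this announcement; I have not reproduced full signatures here, so the following sketch of the bounded-support conditional-density and refreshed quantile workflows should be read as illustrative rather than as documentation. The np calls are guarded so the script degrades gracefully where np is not installed.

```r
# Sketch only: proper/cxkerbound/cykerbound/tau are the controls named in
# the release notes; the released 0.70-2 signatures may differ.
set.seed(42)
n <- 500
x <- rbeta(n, 2, 5)           # covariate with bounded support on [0, 1]
y <- rbeta(n, 2 + 3 * x, 4)   # response also supported on [0, 1]

if (requireNamespace("np", quietly = TRUE)) {
  library(np)
  # statistically proper, boundary-adaptive conditional density with
  # empirical ("range") support bounds on both the x and y axes
  fhat <- npcdens(y ~ x, proper = TRUE,
                  cxkerbound = "range", cykerbound = "range")
  summary(fhat)
  # refreshed npqreg workflow: formula interface, automatic internal
  # bandwidth selection, and a vector of quantile levels in one call
  qhat <- npqreg(y ~ x, tau = c(0.25, 0.50, 0.75))
  summary(qhat)
}
```

Bounded-support data like the beta draws above are exactly where ordinary kernels leak probability mass past the boundary, which is what the `proper`/`*kerbound` machinery is designed to prevent.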
These are not simply incremental patches. They are consequences of making the conditional-family machinery more coherent.
For a small but devoted band of users who endured years of working with an outdated version of the npRmpi package, the slightly more noteworthy news may be that npRmpi is now back from the grave. The package had effectively lain dormant after its last 2014 CRAN-era release because my co-author Tristen Hayfield, largely responsible for bringing this package to life, had moved on after completing his Ph.D. in computational astrophysics at ETH Zurich, and I never seemed able to muster the time or energy to keep it current. That changed toward the end of 2025, and the package received the small 0.60-20 update needed to get it listed on CRAN again in early 2026. The goal now is to have npRmpi move in step with np for users whose problems are large enough that parallel bandwidth selection, mixed-degree searches, bootstrap inference, or plot-data construction are worth spreading over MPI workers. In other words, npRmpi is not a different statistical story. It is the same story with more machinery available when the job calls for it: a fully fleshed-out companion implementation of np designed for heavier workloads.
What users of the previous versions may notice most is that the old npRmpi workflow often required users to wrap ordinary code in explicit MPI calls, while the new version does not. That route remains available through the profile/manual-broadcast interface for people who want that level of control and the small but consistent speed advantages it affords, but it is no longer the only way to work. The current package supports profile, attach, and session modes. In session mode, once MPI has been initialized, the analysis code is essentially interchangeable with ordinary np code: fit the model, inspect the object, plot it, and move between serial and MPI execution with far less ceremony. That should make it much easier to exploit the multiple cores already sitting on a desktop or laptop without turning every analysis script into an MPI programming exercise.
The visible usability changes are in the same spirit. Plotting has a cleaner public interface (both packages have received a substantial plot redesign), so common requests now look like ordinary R arguments rather than a tour through historical option names. Bootstrap bands, asymptotic standard errors, plot behaviour, renderer choice, grid controls, and perspective controls are now easier to express, while the older `plot.errors.*` names remain accepted for compatibility. There has also been substantial work on worker setup, OpenMPI runtime discovery, master/worker dispatch, object-fed plot helpers, bootstrap fanout, and keeping npRmpi genuinely independent at runtime. The intended result is simple: serial and MPI packages should agree in statistical behaviour, while the MPI version gives users a practical path for computationally heavier work.
All of this has also prompted a broader cleanup of the public-facing material around the packages. The package gallery is being refreshed in parallel so that examples, documentation, vignettes, and issue links are easier to find once the CRAN-facing APIs are public. My research and CV pages have also been updated so that software, papers, books, and replication material are less scattered across the web. This may sound like housekeeping, but for open research software it is not a minor detail. If people cannot find the work, access working examples, understand what is current, or report a problem when they find one, the software is less useful than it ought to be.
Once the final tarballs have cleared CRAN, the links below should be the first places to look. Before this becomes the R-announce note, the version numbers should be checked one last time against CRAN; the intended release train is np 0.70-2 and npRmpi 0.70-2, building on the current crs 0.15-42 release unless final checks require another crs update.
- `np` on CRAN
- `npRmpi` on CRAN
- `crs` on CRAN
- The package gallery
- Research contributions
- Curriculum vitae
- `np` source
- `npRmpi` source branch
- `np`/`npRmpi` issues
- `crs` source and issues
If you are coming to the packages after a long absence, the main thing to try is probably the simplest:
```r
library(np)
set.seed(42)
n <- 1000
x <- sort(runif(n, -2, 2))
y <- x + 0.1*x^5 + rnorm(n)
fit <- npreg(y ~ x, nomad = TRUE)
plot(fit, errors = "bootstrap", band = "all")
summary(fit)
```

In this simple example NOMAD recovers the fifth-degree polynomial structure without being told the data-generating process. Since the selected polynomial order matches the DGP and cross-validation selects a large bandwidth, the estimator effectively becomes a global polynomial fit rather than a local one, and achieves the parametric \(\sqrt{n}\) rate shown by Hall and Racine (2015). But the point is not limited to finite polynomials: their theory covers a rich class of particularly smooth analytic DGPs, including trigonometric functions and exponentiated polynomials, so the method remains nonparametric while being capable of parametric-rate behaviour when the data support it.
For comparison, it is also worth looking at the corresponding local constant and local linear fits. They are the traditional defaults one might have used before the mixed-degree route was available.
```r
fit.lc <- npreg(y ~ x)
plot(fit.lc, errors = "bootstrap", band = "all")
fit.ll <- npreg(y ~ x, regtype = "ll")
plot(fit.ll, errors = "bootstrap", band = "all")
```

For larger jobs, the corresponding npRmpi workflow is intended to feel familiar and, hopefully, comfortable. Initialize session mode, ask for the same statistical object, and let the package spread the computationally expensive parts where it can.
This note remains provisional because CRAN has the last word on timing, and because announcement prose should never get too far ahead of the actual packages sitting on the CRAN mirrors. But the milestone is close enough that it seems worth saying out loud: local polynomial methods in np are no longer confined to a narrow corner of regression, joint degree-bandwidth selection is available through open tools, and npRmpi is no longer merely a line in the archive or a wrapper-heavy MPI exercise.
Color me happy.