File this one under “ideas that took a while to become mainstream”.
Some background is likely in order. The np and npRmpi packages have been on CRAN for a very long time now, and have lived through the ups and downs experienced by any long-lived research software project: bug fixes, maintenance, refreshes, updates, and so forth. In fact, the CRAN archives indicate that the first archived version was np_0.12-1.tar.gz, released on 2006-11-19 and weighing in at 246K (https://cran.r-project.org/src/contrib/Archive/np/). These packages are used in courses, papers, replications, and by people I have never met who want to estimate something flexible without pretending that every object in sight is Gaussian, linear, or continuous. That remains the point of the packages.
The upcoming np and npRmpi 0.70-2 releases are not simply minor updates dressed up with new version numbers. They are fairly substantial extensions of packages whose core purpose remains unchanged: providing researchers and practitioners with useful open-source nonparametric kernel methods for regression, density, distribution, quantile, partially linear, single-index, smooth-coefficient, and modal models, with mixed data types treated as first-class citizens rather than afterthoughts.
The headline change, among several, is the extension of local polynomial methods across the conditional estimators in the package. Historically, most of these methods were local constant. Regression had the familiar `regtype = "ll"` option for local linear estimation, but density, distribution, semiparametric regression, conditional quantiles, and modes largely lived in the local-constant world. That default was often good enough, but it was also a limitation.
What is new is not merely that higher-order local polynomials are available. The more interesting update is that the polynomial degree and bandwidth vectors can be selected jointly. Better still, the selected polynomial degree need not be the same for every continuous covariate. One variable may call for a local constant fit (degree 0), another for local linear (degree 1), another for a higher order (degree \(p\)). Hall and Racine (2015) proposed infinite-order local polynomial regression; Li, Li, and Racine (2026) extend this logic to bounded-support conditional density estimation with known or unknown support in work currently under revision. Here is the basic idea: let the data help choose the amount of local polynomial structure variable by variable, rather than forcing the practitioner to make an ad hoc global choice before estimation begins.
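To make the degree-selection idea concrete, here is a minimal base-R sketch of letting cross-validation choose a polynomial degree for a single covariate. This is only an illustration of the principle using a global least-squares fit; it is not the np/NOMAD machinery, which searches degrees and bandwidths jointly, per variable.

```r
# Toy illustration of "let the data choose the degree" (not np's
# implementation): pick the polynomial degree for one covariate by
# leave-one-out cross-validation over degrees 0 through 7.
set.seed(42)
n <- 250
x <- sort(runif(n, -2, 2))
y <- x + 0.1 * x^5 + rnorm(n)   # quintic DGP, as in the example below

loocv <- sapply(0:7, function(p) {
  # design matrix: intercept only for degree 0, raw polynomial otherwise
  X <- if (p == 0) matrix(1, n, 1) else cbind(1, poly(x, p, raw = TRUE))
  H <- X %*% solve(crossprod(X), t(X))       # hat matrix
  mean(((y - H %*% y) / (1 - diag(H)))^2)    # LOO shortcut for least squares
})
p.hat <- which.min(loocv) - 1L               # degrees are 0-based
p.hat                                        # data dependent; the quintic
                                             # term makes low degrees costly
```

The same cross-validation logic, extended to integer degrees per covariate alongside continuous bandwidths, is what turns the selection problem into the mixed-integer optimization discussed next.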
That sounds simple when written in one sentence, but computationally it is anything but simple. Jointly selecting continuous bandwidths and integer-valued polynomial degrees is a mixed-integer optimization problem, and these problems are famously difficult. In the past, serious work in this direction often meant reaching for commercial solvers such as IBM ILOG CPLEX Optimizer. That is powerful software, but not the open research-software path I wanted for np.
The bridge is NOMAD, developed by the team at GERAD. They kindly gave me permission to include NOMAD in the crs package, which I authored with Zhenghua Nie, and that has made it possible to bring this machinery into the np and npRmpi workflows using sophisticated open-source optimization tools that sit much more naturally with the R philosophy. The refreshed crs package now supplies the NOMAD solver interface used by np and npRmpi, while the old `crs::npglpreg()` surface has been retired in favour of the more natural `npreg(..., nomad = TRUE)` interface in np and npRmpi. Users should not have to remember which package temporarily housed which optimization route. They should be able to ask the estimator for the model they want, select NOMAD when appropriate, and get on with the analysis.
There is also a broader set of recent extensions around that central change. The Gallery of Code is being updated to highlight several of them more prominently:
- bounded-support and statistically proper local-polynomial extensions of conditional density and distribution methods, exposed through `proper = TRUE`, `ckerbound`, and the conditional `cxkerbound`/`cykerbound` controls (`"range"` for empirical support, `"fixed"` with `*lb`/`*ub` for user-supplied bounds), drawing on boundary-adaptive kernel ideas (Racine et al. 2024);
- more natural and complete first-class `npqreg`, `npconmode`, and `npcopula` workflows, including formula support, automatic internal bandwidth selection when `bws` is omitted, vector `tau` (`npqreg`), `newdata`-friendly high-level usage (`npconmode`), automatic two-dimensional probability-grid construction for the default copula route (`npcopula`), gradients, standard errors, and NOMAD degree-and-bandwidth pass-through where applicable;
- ordered-datatype probability smoothing available through the Racine-Li-Yan ordered kernel (`okertype = "racineliyan"`, and `oxkertype`/`oykertype` in conditional routes) (Racine et al. 2020);
- Bonferroni and simultaneous variability bounds on plots with data overlays by default;
- interactive plotting surfaces through `renderer = "rgl"` when `rgl` is available, with transparent wireframes replacing the previous perspective plots.
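The control names in the list above come straight from this announcement; I have not reproduced full signatures here, so the following sketch of the bounded-support conditional-density and refreshed quantile workflows should be read as illustrative rather than as documentation. The np calls are guarded so the script degrades gracefully where np is not installed.

```r
# Sketch only: proper/cxkerbound/cykerbound/tau are the controls named in
# the release notes; the released 0.70-2 signatures may differ.
set.seed(42)
n <- 500
x <- rbeta(n, 2, 5)           # covariate with bounded support on [0, 1]
y <- rbeta(n, 2 + 3 * x, 4)   # response also supported on [0, 1]

if (requireNamespace("np", quietly = TRUE)) {
  library(np)
  # statistically proper, boundary-adaptive conditional density with
  # empirical ("range") support bounds on both the x and y axes
  fhat <- npcdens(y ~ x, proper = TRUE,
                  cxkerbound = "range", cykerbound = "range")
  summary(fhat)
  # refreshed npqreg workflow: formula interface, automatic internal
  # bandwidth selection, and a vector of quantile levels in one call
  qhat <- npqreg(y ~ x, tau = c(0.25, 0.50, 0.75))
  summary(qhat)
}
```

Bounded-support data like the beta draws above are exactly where ordinary kernels leak probability mass past the boundary, which is what the `proper`/`*kerbound` machinery is designed to prevent.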
These are not simply incremental patches. They are consequences of making the conditional-family machinery more coherent.
For a small but devoted band of users who endured years of working with an outdated version of the npRmpi package, the slightly more noteworthy news may be that npRmpi is now back from the grave. The package had effectively lain dormant after its last 2014 CRAN-era release because my co-author Tristen Hayfield, largely responsible for bringing this package to life, had moved on after completing his Ph.D. in computational astrophysics at ETH Zurich, and I never seemed able to muster the time or energy to keep it current. That changed toward the end of 2025, and the package received the small 0.60-20 update needed to get it listed on CRAN again in early 2026. The goal now is to have npRmpi move in step with np for users whose problems are large enough that parallel bandwidth selection, mixed-degree searches, bootstrap inference, or plot-data construction are worth spreading over MPI workers. In other words, npRmpi is not a different statistical story. It is the same story with more machinery available when the job calls for it: a fully fleshed-out companion implementation of np designed for heavier workloads.
What users of the previous versions may notice most is that the old npRmpi workflow often required users to wrap ordinary code in explicit MPI calls, while the new version does not. That route remains available through the profile/manual-broadcast interface for people who want that level of control and the small but consistent speed advantages it affords, but it is no longer the only way to work. The current package supports profile, attach, and session modes. In session mode, once MPI has been initialized, the analysis code is essentially interchangeable with ordinary np code: fit the model, inspect the object, plot it, and move between serial and MPI execution with far less ceremony. That should make it much easier to exploit the multiple cores already sitting on a desktop or laptop without turning every analysis script into an MPI programming exercise.
The visible usability changes are in the same spirit. Plotting has a cleaner public interface (both packages have received a substantial plot redesign), so common requests now look like ordinary R arguments rather than a tour through historical option names. Bootstrap bands, asymptotic standard errors, plot behaviour, renderer choice, grid controls, and perspective controls are now easier to express, while the older `plot.errors.*` names remain accepted for compatibility. There has also been substantial work on worker setup, OpenMPI runtime discovery, master/worker dispatch, object-fed plot helpers, bootstrap fanout, and keeping npRmpi genuinely independent at runtime. The intended result is simple: serial and MPI packages should agree in statistical behaviour, while the MPI version gives users a practical path for computationally heavier work.
All of this has also prompted a broader cleanup of the public-facing material around the packages. The package gallery is being refreshed in parallel so that examples, documentation, vignettes, and issue links are easier to find once the CRAN-facing APIs are public. My research and CV pages have also been updated so that software, papers, books, and replication material are less scattered across the web. This may sound like housekeeping, but for open research software it is not a minor detail. If people cannot find the work, access working examples, understand what is current, or report a problem when they find one, the software is less useful than it ought to be.
Once the final tarballs have cleared CRAN, the links below should be the first places to look. Before this becomes the R-announce note, the version numbers should be checked one last time against CRAN; the intended release train is np 0.70-2 and npRmpi 0.70-2, building on the current crs 0.15-42 release unless final checks require another crs update.
- `np` on CRAN
- `npRmpi` on CRAN
- `crs` on CRAN
- The package gallery
- Research contributions
- Curriculum vitae
- `np` source
- `npRmpi` source branch
- `np`/`npRmpi` issues
- `crs` source and issues
If you are coming to the packages after a long absence, the main thing to try is probably the simplest:
```r
library(np)
set.seed(42)
n <- 1000
x <- sort(runif(n, -2, 2))
y <- x + 0.1*x^5 + rnorm(n)
fit <- npreg(y ~ x, nomad = TRUE)
plot(fit, errors = "bootstrap", band = "all")
summary(fit)
```

In this simple example NOMAD recovers the fifth-degree polynomial structure without being told the data-generating process. Since the selected polynomial order matches the DGP and cross-validation selects a large bandwidth, the estimator effectively becomes a global polynomial fit rather than a local one, and achieves the parametric \(\sqrt{n}\) rate shown by Hall and Racine (2015). But the point is not limited to finite polynomials: their theory covers a rich class of particularly smooth analytic DGPs, including trigonometric functions and exponentiated polynomials, so the method remains nonparametric while being capable of parametric-rate behaviour when the data support it.
For comparison, it is also worth looking at the corresponding local constant and local linear fits. They are the traditional defaults one might have used before the mixed-degree route was available.
```r
fit.lc <- npreg(y ~ x)
plot(fit.lc, errors = "bootstrap", band = "all")
fit.ll <- npreg(y ~ x, regtype = "ll")
plot(fit.ll, errors = "bootstrap", band = "all")
```

For larger jobs, the corresponding npRmpi workflow is intended to feel familiar and, hopefully, comfortable. Initialize session mode, ask for the same statistical object, and let the package spread the computationally expensive parts where it can.
This note remains provisional because CRAN has the last word on timing, and because announcement prose should never get too far ahead of the actual packages sitting on the CRAN mirrors. But the milestone is close enough that it seems worth saying out loud: local polynomial methods in np are no longer confined to a narrow corner of regression, joint degree-bandwidth selection is available through open tools, and npRmpi is no longer merely a line in the archive or a wrapper-heavy MPI exercise.
Color me happy.