Kernel Methods

Practical entry page for np workflows, with a clear route to npRmpi when jobs become large or slow.
Keywords

np, npreg, npregbw, kernel regression, npRmpi, bandwidth selection

This page is the practical starting point for kernel-based work. In most cases the right place to begin is np; move to npRmpi only when the same workflow becomes too slow or too large to run in serial.

If you want a shorter conceptual introduction before diving into examples, start with Kernel Primer.

If your interest is specifically in entropy-based testing procedures, go to Entropy and Testing.

If your focus is partially linear, single-index, or varying-coefficient methods, go to Semiparametric Models.

If you want density, distribution, or quantile workflows rather than regression, go to Density, Distribution, Quantiles.

If you want nonparametric classification or conditional mode workflows, go to Classification and Modes.

If you want to build custom kernel objects or estimators, go to Build With npksum.

If you want a focused guide to plot(), gradients, intervals, and predictions, go to Plotting and Intervals.

If your focus is testing whether variables matter or whether a parametric model is misspecified, go to Significance and Specification.

Fast routes

| If you want to… | Start here | Small runnable script |
|---|---|---|
| fit a first kernel regression | Kernel Primer | np_regression_quickstart.R |
| work with densities, distributions, or quantiles | Density, Distribution, Quantiles | Quickstarts |
| work with classification or conditional mode | Classification and Modes | np_classification_quickstart.R |
| work with partially linear, single-index, or varying-coefficient models | Semiparametric Models | np_semiparametric_quickstart.R |
| stay serial for now but look at richer scripts | Code Catalog | Worked Examples |
| run the same style of workflow with MPI | MPI and Large Data | nprmpi_session_quickstart.R |

A very small np workflow

The code below is pulled from the current starter script.

Source file: np_regression_quickstart.R

```r
rm(list = ls())

## Minimal np regression example.
##
## The intended workflow is:
## 1. compute a bandwidth object,
## 2. fit the regression estimator,
## 3. inspect the result and a simple fitted curve.

library(np)
options(np.messages = FALSE)

data(cps71, package = "np")
dat <- cps71[, c("logwage", "age")]

bw <- npregbw(logwage ~ age, data = dat, regtype = "ll", bwmethod = "cv.aic")
fit <- npreg(bws = bw, data = dat)

summary(bw)
summary(fit)

plot(dat$age, dat$logwage, cex = 0.25, col = "grey")
o <- order(dat$age)
lines(dat$age[o], fitted(fit)[o], col = 2, lwd = 2)
```
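For readers who want to see roughly what `regtype = "ll"` is estimating, here is a minimal base-R sketch of a local-linear estimator: at each evaluation point it fits a kernel-weighted least-squares line and keeps the intercept. This is not np's implementation. The helper name `local_linear`, the Gaussian kernel, the fixed bandwidth `h = 0.05`, and the simulated data are all assumptions for illustration; in the starter script, `npregbw` chooses the bandwidth from the data instead.

```r
## Minimal sketch of a local-linear kernel estimator (what regtype = "ll" targets).
## Assumptions: hypothetical helper name, Gaussian kernel, fixed bandwidth h,
## simulated data. This is not np's implementation.

local_linear <- function(x0, x, y, h) {
  w <- dnorm((x - x0) / h)           # Gaussian kernel weights centred at x0
  d <- x - x0                        # regressor centred at the evaluation point
  fit <- lm(y ~ d, weights = w)      # weighted least-squares line around x0
  unname(coef(fit)[1])               # intercept = estimated conditional mean at x0
}

set.seed(42)
x <- sort(runif(200))
y <- sin(2 * pi * x) + rnorm(200, sd = 0.2)

grid <- seq(0.05, 0.95, length.out = 50)
fhat <- sapply(grid, local_linear, x = x, y = y, h = 0.05)

plot(x, y, cex = 0.25, col = "grey")
lines(grid, fhat, col = 2, lwd = 2)
```

The slope coefficient from the same weighted fit is a local estimate of the gradient at `x0`, which is why local-linear fits naturally come with derivative information.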

If you want density, distribution, conditional-density, conditional-distribution, or quantile starters instead, use Quickstarts.

A multivariate route

When you move beyond the single-regressor case, the next useful stop is usually Multivariate Regression and Prediction.
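As a rough picture of how more than one continuous regressor enters a kernel estimator, the sketch below builds a product of one-dimensional Gaussian kernels with one bandwidth per regressor and predicts at new points. It is a local-constant (Nadaraya-Watson) sketch under stated assumptions, not np's code: the helper name `nw_product`, the fixed bandwidths, and the simulated data are hypothetical, and np's own estimators also handle categorical regressors and select each bandwidth from the data, neither of which this sketch does.

```r
## Minimal sketch of local-constant regression with a product Gaussian kernel
## and one bandwidth per continuous regressor.
## Assumptions: hypothetical helper name, fixed bandwidths, simulated data.

nw_product <- function(newx, X, y, h) {
  # newx: m x q evaluation points; X: n x q training regressors; h: q bandwidths
  apply(newx, 1, function(x0) {
    u <- sweep(sweep(X, 2, x0), 2, h, "/")  # scaled differences, one column per regressor
    w <- exp(-0.5 * rowSums(u^2))           # product of Gaussian kernels (constants cancel)
    sum(w * y) / sum(w)                     # kernel-weighted mean of y near x0
  })
}

set.seed(7)
n <- 400
X <- cbind(x1 = runif(n), x2 = runif(n))
y <- X[, "x1"]^2 + sin(2 * pi * X[, "x2"]) + rnorm(n, sd = 0.1)

newx <- rbind(c(0.25, 0.25), c(0.5, 0.75))
pred <- nw_product(newx, X, y, h = c(0.1, 0.1))
```

Because each regressor gets its own bandwidth, data-driven selection becomes a joint search over several bandwidths, which is part of why multivariate bandwidth selection gets expensive.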

When to move to npRmpi

Move to npRmpi when the model you want is essentially the same but bandwidth selection, bootstrapping, or repeated estimation becomes expensive enough that MPI is worthwhile. On platforms that support spawning, the current package offers a session mode: once MPI has been initialized, estimator calls keep the ordinary np style.

See MPI and Large Data for the updated route, starting with nprmpi_session_quickstart.R on platforms where spawning is supported.
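To see why bandwidth selection is usually the expensive part, the sketch below evaluates a leave-one-out least-squares cross-validation criterion for a local-constant estimator over a grid of candidate bandwidths. It is a simplified stand-in, not np's selector: the helper name `cv_ls`, the Gaussian kernel, the plain grid search (in place of a numerical optimizer), and the simulated data are assumptions for illustration.

```r
## Minimal sketch of leave-one-out least-squares cross-validation for a
## local-constant (Nadaraya-Watson) estimator, using a plain grid search.
## Assumptions: hypothetical helper name, Gaussian kernel, simulated data.

cv_ls <- function(h, x, y) {
  u <- outer(x, x, "-") / h          # all pairwise scaled differences: the O(n^2) step
  K <- exp(-0.5 * u^2)               # Gaussian kernel up to a constant that cancels below
  diag(K) <- 0                       # drop each observation from its own fit
  yhat_loo <- as.vector(K %*% y) / rowSums(K)
  mean((y - yhat_loo)^2)             # leave-one-out prediction error
}

set.seed(1)
n <- 300
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.2)

h_grid <- seq(0.01, 0.3, length.out = 30)
cv <- sapply(h_grid, cv_ls, x = x, y = y)  # one full n x n pass per candidate
h_cv <- h_grid[which.min(cv)]
```

Each candidate bandwidth costs a pass over all n^2 pairs, so doubling the sample roughly quadruples the work per evaluation. Multiply that by many candidates, several regressors, or bootstrap replications and the serial run time grows quickly, which is the situation npRmpi is meant for.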

Longer examples

The following scripts remain useful and are grouped here by task.

Existing serial versus MPI script pairs

These are useful if you want to compare the older serial and MPI example scripts side by side, especially when you want route parity or a historical point of reference.

Papers

  • Hayfield, T. and J.S. Racine (2008), “Nonparametric Econometrics: The np Package,” Journal of Statistical Software, 27(5), 1-32.
  • Harrison, T.D. (2008), “Review of np Software for R,” Journal of Applied Econometrics, 23, 861-865.
  • Ho, A.T., K.P. Huynh, and D.T. Jacho-Chavez (2011), “npRmpi: A Package for Parallel Distributed Kernel Estimation in R,” Journal of Applied Econometrics, 26, 344-349.