Kernel Methods
np, npreg, npregbw, kernel regression, npRmpi, bandwidth selection
This page is the practical starting point for kernel-based work. In most cases, begin with np and move to npRmpi only when the same workflow becomes too slow or too large to run serially.
If you want a shorter conceptual introduction before diving into examples, start with Kernel Primer.
If your interest is specifically in entropy-based testing procedures, go to Entropy and Testing.
If your focus is partially linear, single-index, or varying-coefficient methods, go to Semiparametric Models.
If you want density, distribution, or quantile workflows rather than regression, go to Density, Distribution, Quantiles.
If you want nonparametric classification or conditional mode workflows, go to Classification and Modes.
If you want to build custom kernel objects or estimators, go to Build With npksum.
If you want a focused guide to plot(), gradients, intervals, and predictions, go to Plotting and Intervals.
If your focus is testing whether variables matter or whether a parametric model is misspecified, go to Significance and Specification.
Fast routes
| If you want to… | Start here | Small runnable script |
|---|---|---|
| fit a first kernel regression | Kernel Primer | np_regression_quickstart.R |
| work with densities, distributions, or quantiles | Density, Distribution, Quantiles | Quickstarts |
| work with classification or conditional mode | Classification and Modes | np_classification_quickstart.R |
| work with partially linear, single-index, or varying-coefficient models | Semiparametric Models | np_semiparametric_quickstart.R |
| stay serial for now but look at richer scripts | Code Catalog | Worked Examples |
| run the same style of workflow with MPI | MPI and Large Data | nprmpi_session_quickstart.R |
A very small np workflow
The code below is pulled from the current starter script.
Source file: np_regression_quickstart.R
rm(list = ls())
## Minimal np regression example.
##
## The intended workflow is:
## 1. compute a bandwidth object,
## 2. fit the regression estimator,
## 3. inspect the result and a simple fitted curve.
library(np)
options(np.messages = FALSE)
data(cps71, package = "np")
dat <- cps71[, c("logwage", "age")]
bw <- npregbw(logwage ~ age, data = dat, regtype = "ll", bwmethod = "cv.aic")
fit <- npreg(bws = bw, data = dat)
summary(bw)
summary(fit)
plot(dat$age, dat$logwage, cex = 0.25, col = "grey")
o <- order(dat$age)
lines(dat$age[o], fitted(fit)[o], col = 2, lwd = 2)
If you want density, distribution, conditional-density, conditional-distribution, or quantile starters instead, use Quickstarts.
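A natural next step after the starter workflow is to evaluate the fitted curve at ages that are not in the sample. The sketch below is not part of np_regression_quickstart.R; it reuses the same data and bandwidth call, and the names age_grid, pred, and band are illustrative. It assumes that predict() on an npreg fit accepts newdata and se.fit = TRUE and returns a list with fit and se.fit components. The 1.96 multiplier gives rough pointwise asymptotic bands only.

```r
library(np)
options(np.messages = FALSE)

data(cps71, package = "np")
dat <- cps71[, c("logwage", "age")]

## Same two steps as the starter script: bandwidth object, then the fit.
bw <- npregbw(logwage ~ age, data = dat, regtype = "ll", bwmethod = "cv.aic")
fit <- npreg(bws = bw, data = dat)

## Evaluate the fit on an evenly spaced grid of ages (illustrative choice).
age_grid <- data.frame(age = seq(min(dat$age), max(dat$age), length.out = 50))
pred <- predict(fit, newdata = age_grid, se.fit = TRUE)

## Rough pointwise bands from the asymptotic standard errors.
band <- data.frame(
  age   = age_grid$age,
  fit   = pred$fit,
  lower = pred$fit - 1.96 * pred$se.fit,
  upper = pred$fit + 1.96 * pred$se.fit
)
head(band)
```

For bootstrap intervals and the plot() interface, Plotting and Intervals is the better reference.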
A multivariate route
When you move beyond the single-regressor case, the next useful stop is usually Multivariate Regression and Prediction.
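As a bridge to that page, here is a minimal sketch of the same three-step workflow with more than one regressor. It assumes the wage1 data shipped with np, with columns lwage, educ, exper, and a factor female; the choice of regressors and the names newdat and pred are illustrative. Setting nmulti = 1 limits the search to one starting point to keep the run short, which trades some robustness of the bandwidth search for speed.

```r
library(np)
options(np.messages = FALSE)

## Assumption: wage1 ships with np and contains these columns.
data(wage1, package = "np")
dat <- wage1[, c("lwage", "female", "educ", "exper")]

## One factor and two numeric regressors; np picks kernels by variable type.
bw <- npregbw(lwage ~ female + educ + exper, data = dat,
              regtype = "ll", bwmethod = "cv.aic", nmulti = 1)
fit <- npreg(bws = bw, data = dat)
summary(fit)

## Predict for a few covariate profiles. Taking rows from the estimation
## data keeps the factor levels consistent with those used in the fit.
newdat <- dat[1:5, c("female", "educ", "exper")]
pred <- predict(fit, newdata = newdat)
cbind(newdat, predicted = pred)
```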
When to move to npRmpi
Move to npRmpi when the model you want is essentially the same, but bandwidth selection, bootstrap work, or repeated estimation becomes expensive enough that MPI is worthwhile. On platforms where spawning is supported, the current package offers a session mode in which the estimator calls retain the ordinary np style once MPI has been initialized.
See MPI and Large Data for the updated route, starting with nprmpi_session_quickstart.R on platforms where spawning is supported.
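One informal way to judge whether serial bandwidth selection is becoming expensive is to time it on growing subsamples before committing to the full data. The sketch below uses only np and base R timing; the subsample sizes and the names sizes, elapsed, and timing are illustrative, and cps71 is small enough that every run is quick. On your own data, a steep rise in elapsed time as n grows suggests that the MPI route may be worth the setup.

```r
library(np)
options(np.messages = FALSE)

data(cps71, package = "np")
dat <- cps71[, c("logwage", "age")]

## Illustrative subsample sizes, ending with the full sample.
sizes <- c(50, 100, nrow(dat))
elapsed <- numeric(length(sizes))

set.seed(42)
for (i in seq_along(sizes)) {
  sub <- dat[sample(nrow(dat), sizes[i]), ]
  t0 <- proc.time()[["elapsed"]]
  bw_sub <- npregbw(logwage ~ age, data = sub,
                    regtype = "ll", bwmethod = "cv.aic")
  elapsed[i] <- proc.time()[["elapsed"]] - t0
}

## Wall-clock seconds for bandwidth selection at each sample size.
timing <- data.frame(n = sizes, seconds = elapsed)
print(timing)
```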
Longer examples
The following scripts remain useful and are grouped here by task.
- Local constant and local linear regression: regression_intro_a.R
- Derivative estimation: regression_intro_b.R
- Plotting fitted objects and intervals: regression_intro_c.R
- Multivariate regression: regression_multivar_a.R
- Infinite-order local polynomial comparison: demo_poly.R
Existing serial versus MPI script pairs
These pairs are useful if you want to compare the older serial and MPI example scripts side by side, for instance to check that the two routes match or to have a historical point of reference.
- Conditional density estimation: npcdensls_serial.R, npcdensls_npRmpi.R
- Model specification testing: npcmstest_serial.R, npcmstest_npRmpi.R
- Conditional mode estimation: npconmode_serial.R, npconmode_npRmpi.R
- Density equality testing: npdeneqtest_serial.R, npdeneqtest_npRmpi.R
- Single index estimation: npindexich_serial.R, npindexich_npRmpi.R
- Partially linear regression: npplreg_serial.R, npplreg_npRmpi.R
- Local linear regression with AIC bandwidth selection: npregllaic_serial.R, npregllaic_npRmpi.R
- Serial dependence testing: npsdeptest_serial.R, npsdeptest_npRmpi.R
- Significance testing: npsigtest_serial.R, npsigtest_npRmpi.R
- Unconditional density estimation: npudensml_serial.R, npudensml_npRmpi.R
Papers
- Hayfield, T. and J.S. Racine (2008), “Nonparametric Econometrics: The np Package,” Journal of Statistical Software, 27(5), 1-32.
- Harrison, T.D. (2008), “Review of np Software for R,” Journal of Applied Econometrics, 23, 861-865.
- Ho, A.T., K.P. Huynh, and D.T. Jacho-Chavez (2011), “npRmpi: A Package for Parallel Distributed Kernel Estimation in R,” Journal of Applied Econometrics, 26, 344-349.