Density, Distribution, Quantiles

Practical route to unconditional and conditional density/distribution workflows, plus nonparametric quantile regression in np.

Keywords

npudens, npudist, npcdens, npcdist, npqreg, density, quantile regression

The np package is useful not only for regression but also for unconditional and conditional density estimation, distribution estimation, and nonparametric quantile regression. This page collects those workflows in one place.

If you want a minimal downloadable script for the simplest density workflow, start with np_density_quickstart.R.

If you want the shortest route to conditional density or quantile workflows, start with:

Unconditional density and distribution: `npudens` and `npudist`

Use these when the object of interest is the joint or marginal distribution itself rather than a regression function.

The Old Faithful data remain a simple and informative illustration because the shape is not well summarized by a simple parametric family.

## Fit unconditional density and distribution objects for the same data
library(np)
data(faithful, package = "datasets")

f_faithful <- npudens(~ eruptions + waiting, data = faithful)
F_faithful <- npudist(~ eruptions + waiting, data = faithful)

summary(f_faithful)
summary(F_faithful)

If you want to visualize the estimated surface, you can then plot the objects.

## Plot the fitted unconditional objects to inspect their shape
plot(f_faithful, view = "fixed", main = "")
plot(F_faithful, view = "fixed", main = "")

This is a good example of why nonparametric methods can matter: a bimodal or otherwise irregular structure is much easier to reveal when you do not force a simple parametric family on the data.
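The point can be checked quickly without np at all. The sketch below is only an illustration: it uses base R's `stats::density()` as a stand-in for a nonparametric fit (an assumption, not the `npudens` estimator above), and `count_modes` is a hypothetical helper that counts interior local maxima of a curve on a grid.

```r
## Base-R sketch: kernel estimate versus a single fitted normal for eruptions.
## stats::density() stands in for a nonparametric fit; it is not npudens().
data(faithful, package = "datasets")
x <- faithful$eruptions

kde <- density(x)                                   # Gaussian kernel, default bandwidth
norm_fit <- dnorm(kde$x, mean = mean(x), sd = sd(x))

## Hypothetical helper: count interior local maxima of a curve on a grid.
count_modes <- function(y) sum(diff(sign(diff(y))) < 0)

modes_kde  <- count_modes(kde$y)     # the kernel estimate keeps both humps
modes_norm <- count_modes(norm_fit)  # a single normal can only show one

c(kernel = modes_kde, normal = modes_norm)
```

A single normal family cannot represent the two eruption regimes, which is the structure the bivariate `npudens` surface makes visible.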

Conditional density and distribution: `npcdens` and `npcdist`

Use these when the full conditional distribution matters, not just the conditional mean.

The Italian GDP panel (the Italy data shipped with np) remains a helpful example because the conditional distribution of GDP changes shape over time.

## Fit conditional density and distribution objects given the covariate
library(np)
data(Italy, package = "np")

fhat <- npcdens(gdp ~ year, data = Italy)
Fhat <- npcdist(gdp ~ year, data = Italy)

summary(fhat)
summary(Fhat)

Plots can then be used to inspect the evolving conditional density and conditional distribution.

## Plot the evolving conditional density and distribution surfaces
plot(fhat, view = "fixed", main = "")
plot(Fhat, view = "fixed", main = "")

This route is especially useful when the question is not simply whether the mean changes, but whether the entire distribution changes.

Practitioner-oriented extras

Some applied tasks are a little different from the standard “fit and inspect” workflow. The Cambridge practitioner material suggests three especially useful ones:

estimate a smooth probability function on discrete support,
draw a smooth resample from a fitted unconditional density,
draw a smooth resample from a fitted conditional density.

The first of these is compact enough to show inline.

Source file: np_probability_quickstart.R

## Minimal np smooth-probability example.
##
## This is a compact route for estimating a smooth probability function
## on unordered discrete support.

library(np)
options(np.messages = FALSE)

## Simulate a small unordered outcome with three support points.
set.seed(42)
X <- factor(sample(0:2, 100, replace = TRUE, prob = c(0.25, 0.45, 0.30)))

## Fit the smooth probability object using the unordered kernel.
fit <- npudens(~ X, ukertype = "aitchisonaitken", bwmethod = "cv.ml")

## Inspect the fitted object, then recover one fitted probability per class.
summary(fit)
tapply(fitted(fit), X, mean)
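To see what "smooth" means on discrete support, it can help to write the estimator out by hand. The sketch below assumes the textbook Aitchison-Aitken kernel (weight 1 - lambda on the matching cell, lambda / (c - 1) on each other cell, with c the number of categories) and fixes lambda by hand. `aa_prob` is a hypothetical helper for illustration, not an np function, and np's internal computation may differ in detail.

```r
## Base-R sketch of a smoothed probability estimate on unordered support.
## Assumes the textbook Aitchison-Aitken kernel; lambda is fixed by hand here,
## whereas npudens() selects it by cross-validation.
set.seed(42)
X <- factor(sample(0:2, 100, replace = TRUE, prob = c(0.25, 0.45, 0.30)))

aa_prob <- function(x, lambda) {
  n_levels <- nlevels(x)
  stopifnot(lambda >= 0, lambda <= (n_levels - 1) / n_levels)
  freq <- as.numeric(table(x)) / length(x)          # raw sample frequencies
  ## Weight 1 - lambda on matching cells, lambda / (c - 1) on the others.
  p <- (1 - lambda) * freq + lambda / (n_levels - 1) * (1 - freq)
  setNames(p, levels(x))
}

p_raw    <- aa_prob(X, lambda = 0)      # reproduces the sample frequencies
p_smooth <- aa_prob(X, lambda = 0.10)   # shrinks toward equal probabilities
rbind(p_raw, p_smooth)
```

At lambda = 0 the estimate is the frequency table; at the upper bound (c - 1) / c every category receives probability 1 / c. Cross-validation picks a value between those extremes.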

Additional practitioner scripts:

Copulas: `npcopula`

If the main object of interest is the dependence structure itself rather than a regression function, npcopula is the natural route.

The formula interface is now the streamlined first stop: when bws is not supplied, npcopula() automatically builds the required npudistbw() bandwidth object for the default copula target, or npudensbw() when target = "density". For the default two-dimensional grid workflow, omitting u also triggers automatic construction of a probability grid.

Source file: np_copula_quickstart.R

## Minimal np copula example.
##
## This uses the direct formula interface and lets npcopula() build
## the corresponding marginal distribution bandwidth object internally.

library(np)
library(MASS)

set.seed(42)
n <- 1000
rho <- 0.95
Sigma <- matrix(c(1, rho, rho, 1), 2, 2)

## Simulate a simple bivariate sample with strong dependence.
dat <- as.data.frame(mvrnorm(n = n, mu = c(0, 0), Sigma = Sigma))
names(dat) <- c("x", "y")

## Let npcopula() create a small default probability grid.
copula_fit <- npcopula(~ x + y, data = dat, neval = 20)

## Inspect the fitted copula object and the retained bandwidth object.
summary(copula_fit)
summary(attr(copula_fit, "bws"))
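For intuition about what a copula measures, the unsmoothed empirical copula is easy to compute directly. The sketch below is a simplification under stated assumptions: it uses rank-based pseudo-observations and a plain proportion, whereas npcopula() works from kernel-smoothed distribution estimates. `emp_copula` is a hypothetical helper, and the data are simulated without MASS to keep the block self-contained.

```r
## Base-R sketch: unsmoothed empirical copula from rank-based pseudo-observations.
## A simplification for intuition; npcopula() uses kernel-smoothed estimates.
set.seed(42)
n <- 1000
rho <- 0.95
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)   # correlation rho with x by construction

## Pseudo-observations: ranks rescaled into (0, 1).
u <- rank(x) / (n + 1)
v <- rank(y) / (n + 1)

## C_n(a, b): share of points with u <= a and v <= b.
emp_copula <- function(a, b) mean(u <= a & v <= b)

c_mid <- emp_copula(0.5, 0.5)   # exceeds 0.25, the value under independence
c_top <- emp_copula(1, 1)       # equals 1 by construction
c(c_mid = c_mid, c_top = c_top)
```

Because the margins are replaced by ranks, the result depends only on the dependence structure, which is why the copula is the natural object when dependence itself is the question.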

For the conditional families that support local-polynomial (LP) fits, a NOMAD shortcut is available:

## Use the NOMAD shortcut for LP conditional density and distribution fits
## (eruptions and waiting are columns of faithful, not of the copula sample dat)
data(faithful, package = "datasets")
fhat_lp <- npcdens(eruptions ~ waiting, data = faithful, nomad = TRUE, nmulti = 1)
Fhat_lp <- npcdist(eruptions ~ waiting, data = faithful, nomad = TRUE, nmulti = 1)

That is the convenience route for automatic local-polynomial degree and bandwidth search in a single call. As with regression, it requires the crs package because NOMAD degree search is provided there.

When the local-polynomial degree is above zero, proper = TRUE is also available in the npc* calls and related plot routes. This enforces proper conditional density/distribution output in the practical sense used here: non-negative and integrating to one conditional on x.
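Written out, the density conditions stated above are the first display below. The second display, for the distribution case, is the standard textbook definition and is offered here as an assumption about what proper = TRUE targets, since the text above spells out only the density conditions.

```latex
% Properness conditions for a conditional density estimate at a given x
\[
  \hat f(y \mid x) \ge 0 \quad \text{for all } y,
  \qquad
  \int \hat f(y \mid x)\, dy = 1 .
\]

% Analogous conditions usually imposed on a conditional distribution estimate
\[
  \hat F(y \mid x) \text{ nondecreasing in } y,
  \qquad
  \lim_{y \to -\infty} \hat F(y \mid x) = 0,
  \qquad
  \lim_{y \to \infty} \hat F(y \mid x) = 1 .
\]
```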

Nonparametric quantile regression: `npqreg`

Use npqreg when your interest lies in conditional quantiles rather than means.

The direct formula interface is now the natural first stop. When bws is not supplied, npqreg() automatically calls npcdistbw() internally to select the underlying conditional-distribution bandwidth object. Bandwidth-selection options supplied to npqreg() are passed through to that internal npcdistbw() call, so you can still control the underlying conditional-distribution fit without breaking the high-level workflow.

## Extract several quantiles from one selected conditional-distribution fit
library(np)
data(Italy, package = "np")

qfit <- npqreg(gdp ~ ordered(year),
               data = Italy,
               tau = c(0.25, 0.50, 0.75))

Then compare the fitted quantiles:

## Overlay the fitted conditional quantiles
plot(qfit)
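To make concrete what it means to extract quantiles from a conditional-distribution fit: the tau-th conditional quantile at x is the smallest y at which the estimated F(y | x) reaches tau. The sketch below illustrates that inversion in base R under simplifying assumptions (Gaussian kernel weights in x, an unsmoothed indicator in y, a hand-picked bandwidth h, simulated data). `cond_quantile` is a hypothetical helper, not the npqreg() implementation.

```r
## Base-R sketch: conditional quantiles by inverting a kernel-weighted CDF.
## Simplified for intuition (Gaussian weights in x, indicator in y, hand-picked
## bandwidth h); npqreg() instead works from a cross-validated npcdist fit.
set.seed(42)
n <- 500
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)

cond_quantile <- function(x0, tau, h = 0.05) {
  w <- dnorm((x - x0) / h)
  w <- w / sum(w)                        # kernel weights sum to one
  ord <- order(y)
  Fhat <- cumsum(w[ord])                 # estimated F(y | x0) at the sorted y values
  ## Smallest observed y whose estimated CDF reaches tau, one value per tau.
  vapply(tau, function(t) y[ord][which(Fhat >= t)[1]], numeric(1))
}

q_at_quarter <- cond_quantile(x0 = 0.25, tau = c(0.25, 0.50, 0.75))
q_at_quarter
```

All three quantiles come from one estimated conditional CDF. That is the reason a single retained bandwidth object can serve several values of tau.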

If you later want to reuse the selected bandwidths for additional quantiles without recomputing cross-validation, you still can. Call npqreg() when you need the fitted values as an object, or pass tau directly to plot() when graphical inspection is all you need:

qmed <- npqreg(gdp ~ ordered(year), data = Italy, tau = 0.50)
qfit <- npqreg(bws = qmed$bws, tau = c(0.25, 0.50, 0.75))
plot(qfit)

## Or, if plotting is the only goal:
plot(qmed, tau = c(0.25, 0.50, 0.75))

If you want derivative information, request it in the fit:

qgrad <- npqreg(gdp ~ ordered(year),
                data = Italy,
                tau = 0.50,
                gradients = TRUE)

plot(qgrad, gradients = TRUE)

The same formula interface can also use the automatic local-polynomial degree-and-bandwidth search when the conditioning data include at least one continuous variable:

Italy$year_numeric <- as.numeric(as.character(Italy$year))

qfit_lp <- npqreg(gdp ~ year_numeric,
                  data = Italy,
                  tau = c(0.25, 0.50, 0.75),
                  nomad = TRUE)

This is a good example of the intended progression: start with the direct high-level route, then reuse the retained bandwidth object only when you specifically want to avoid recomputing it.

When should I use these instead of regression?

If you want…	Start with
A regression function or conditional mean	`npreg`
An unconditional density or distribution	`npudens`, `npudist`
A conditional density or conditional distribution	`npcdens`, `npcdist`
Conditional quantiles	`npqreg`

Practical advice

density and conditional-density workflows can be computationally heavier than users expect,
use explicit bandwidth objects when reusing the same structure across multiple fits,
plots are often the most informative part of the workflow,
if the run becomes expensive and the workflow is stable, that may be a reason to move to npRmpi.
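The advice about explicit bandwidth objects can be sketched as follows for the unconditional density case. It assumes the long-standing np pattern of computing the bandwidth object once with npudensbw(), passing it through bws, and evaluating at new points with predict(..., newdata = ...). The grid limits are arbitrary choices for illustration.

```r
## Sketch: compute the bandwidth object once, then reuse it.
## Assumes np is installed; the evaluation grid is an arbitrary illustration.
library(np)
options(np.messages = FALSE)
data(faithful, package = "datasets")

## The expensive step: data-driven bandwidth selection, done once.
bw <- npudensbw(~ eruptions + waiting, data = faithful)

## Cheap steps: fit on the training data, then evaluate on a coarse grid,
## both reusing bw rather than repeating the bandwidth search.
f_train <- npudens(bws = bw)
grid <- expand.grid(eruptions = seq(2, 5, length.out = 10),
                    waiting   = seq(45, 95, length.out = 10))
dens_grid <- predict(f_train, newdata = grid)

summary(bw)
range(dens_grid)
```

Storing bw (for example with saveRDS()) also lets a later session skip the search entirely, provided the data and model structure have not changed.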
