FAQ and Troubleshooting
This page keeps the useful question-and-answer material and drops the old habit of duplicating changelog text inside the FAQ.
Two support pages now carry the more detailed practical guidance:
Fast routes
| If you need… | Start here |
|---|---|
| the smallest runnable scripts | Quickstarts |
| density, conditional-density, distribution, or quantile examples | Density, Distribution, Quantiles |
| classification or conditional-mode examples | Classification and Modes |
| plotting, predictions, or interval examples | Plotting and Intervals |
| npRmpi install and launch guidance | MPI and Large Data |
| factors, ordered variables, and formula traps | Data Preparation and Variable Types |
| runtime, memory, and scaling advice | Runtime, Memory, and Scaling |
| function-name lookup | Reference and Function Lookup |
Which package should I start with?
Start with np unless you already know that you need MPI or spline methods.
- Use np for core kernel regression, density, distribution, and testing on one machine.
- Use npRmpi when the same workflow becomes large or slow enough that MPI is warranted.
- Use crs when regression splines or spline-specific constraints are the natural tool.
The short chooser lives on Choose a Package.
I am new to R. Where should I begin?
Begin with Install and Get Started. Then run one small example from Quickstarts before worrying about the longer scripts.
Good general resources are:
How do I keep my installed packages current?
This remains a sensible command:
update.packages(checkBuilt = TRUE, ask = FALSE)

Where is the gentlest introduction to the packages?
Use the package vignettes and then the gallery pages.
vignette("np", package = "np")
vignette("entropy_np", package = "np")
vignette("crs", package = "crs")
vignette("spline_primer", package = "crs")

For a shorter web route, start with:
How do I find functions, examples, and demos quickly?
For package help:
library(help = "np")
library(help = "crs")
?npreg
?crs

For runnable examples:
example(npreg)
example(crs)
demo(package = "crs")

For website-first routing:
Where do I find runnable examples?
- For np, start with Quickstarts, then move to Density, Distribution, Quantiles or Worked Examples depending on the task.
- For npRmpi, start with Quickstarts and then MPI and Large Data.
- For crs, use demo(package = "crs") and the scripts on Splines.
- For interactive teaching examples, use Interactive Demos.
How should I prepare mixed data?
The class of each variable matters. Continuous variables can remain numeric, but unordered categorical variables should be factors and ordered categorical variables should be ordered factors when that distinction is meaningful.
mydat <- data.frame(
y = rnorm(200),
x_cont = runif(200),
x_unordered = factor(sample(c("a", "b", "c"), 200, replace = TRUE)),
x_ordered = ordered(sample(1:4, 200, replace = TRUE))
)

This is not just bookkeeping. In np, the variable class affects the weighting rule used by the estimator.
For a more deliberate checklist, see Data Preparation and Variable Types.
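Before fitting anything, it can help to audit the classes directly. The sketch below uses base R only and rebuilds the example data frame so that it runs on its own; the helper name var_classes is purely illustrative.

```r
# Rebuild the example data frame so this block is self-contained.
set.seed(42)
mydat <- data.frame(
  y = rnorm(200),
  x_cont = runif(200),
  x_unordered = factor(sample(c("a", "b", "c"), 200, replace = TRUE)),
  x_ordered = ordered(sample(1:4, 200, replace = TRUE))
)

# First class of each column: "numeric", "factor", or "ordered".
var_classes <- vapply(mydat, function(col) class(col)[1], character(1))
print(var_classes)

# For ordered factors the level order carries meaning, so inspect it too.
print(levels(mydat$x_ordered))
```

If a column that should be categorical shows up as numeric or character here, convert it before calling any estimation function.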
Does formula syntax imply a linear-additive model?
No. In np, a formula such as y ~ x1 + x2 + x3 simply tells the function which variable is the response and which are the covariates. It does not mean that you are fitting an ordinary linear-additive model. The formula interface is a convenient way to specify the variables, not a commitment to parametric structure.
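A small simulation can make the point concrete. The sketch below is an illustration under stated assumptions, not a recommended workflow: the data are simulated from a sine curve, lm() serves as the linear benchmark, and the np part runs only if the package is installed. Both calls use the same formula, yet only lm() imposes a straight line.

```r
set.seed(7)
d <- data.frame(x = runif(200))
truth <- sin(2 * pi * d$x)              # clearly nonlinear conditional mean
d$y <- truth + rnorm(200, sd = 0.2)

# Same formula, parametric linear fit.
lin <- lm(y ~ x, data = d)
mse_lm <- mean((fitted(lin) - truth)^2)

# Same formula, kernel regression (skipped if np is not installed).
has_np <- requireNamespace("np", quietly = TRUE)
if (has_np) {
  suppressPackageStartupMessages(library(np))
  options(np.messages = FALSE)          # silence progress messages
  bw <- npregbw(y ~ x, data = d)        # data-driven bandwidth selection
  fit <- npreg(bws = bw)
  mse_np <- mean((fitted(fit) - truth)^2)
  print(c(linear = mse_lm, kernel = mse_np))
}
```

Comparing the two errors against the known mean function shows that the formula only names the variables; the estimator decides the functional form.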
Why should I create an explicit bandwidth object?
Because in np the bandwidth object is often the key object in the workflow.
bw <- npregbw(y ~ x1 + x2, data = mydat)
summary(bw)
fit <- npreg(bws = bw)

Working explicitly this way makes it easier to inspect the chosen bandwidths, reuse them, and avoid unnecessary recomputation.
For the broader runtime side of this issue, see Runtime, Memory, and Scaling.
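As a hedged sketch of that reuse pattern, the block below selects bandwidths once on simulated data and then builds two fits from the same object. The data-generating process is an assumption made for illustration, and the np part is skipped when the package is not installed.

```r
set.seed(123)
mydat <- data.frame(x1 = runif(150), x2 = runif(150))
mydat$y <- sin(2 * pi * mydat$x1) + mydat$x2^2 + rnorm(150, sd = 0.2)

has_np <- requireNamespace("np", quietly = TRUE)
if (has_np) {
  suppressPackageStartupMessages(library(np))
  options(np.messages = FALSE)                  # silence progress messages

  bw <- npregbw(y ~ x1 + x2, data = mydat)      # the expensive search, done once
  print(bw$bw)                                  # one bandwidth per covariate

  fit <- npreg(bws = bw)                        # uses bw, no new search
  fit_grad <- npreg(bws = bw, gradients = TRUE) # second fit from the same bw
}
```

Only the npregbw() call performs the search; both npreg() calls take the bandwidths as given.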
Where do I find the changelog?
The FAQ should not duplicate changelog text.
Use the package page on CRAN and the package repo NEWS.md / release notes instead. The point is to keep one canonical record of changes rather than two or three drifting copies.
Some older examples use attach(). Should I?
Older scripts and vignettes sometimes use attach() because that was a common style at the time. For modern work, it is usually better to avoid attach() and instead use:
- explicit data = arguments,
- explicit data-frame references such as mydat$x,
- or a short preprocessing step that creates a clean data frame.
That keeps variable scope clearer and reduces accidental name collisions.
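The three alternatives can be sketched in base R. Here lm() is only a stand-in for any modeling function that accepts a formula and a data = argument, and the global variable x is a deliberately planted name collision.

```r
set.seed(1)
mydat <- data.frame(x = runif(50))
mydat$y <- 1 + 2 * mydat$x + rnorm(50, sd = 0.1)

x <- "unrelated global"          # a name collision waiting to happen

# 1. Explicit data = argument: x and y are looked up in mydat first.
fit <- lm(y ~ x, data = mydat)

# 2. Explicit data-frame reference.
mean_x <- mean(mydat$x)

# 3. Short preprocessing step that yields a clean data frame.
clean <- mydat[mydat$x > 0.1, c("y", "x")]

print(coef(fit))
```

None of these lines depend on the search path, so the stray global x never leaks into the model.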
Why can bandwidth selection take a while?
Because many of these methods are doing real search and resampling work rather than evaluating a closed-form formula once.
Practical advice:
- start with a small example,
- confirm the workflow in serial first,
- move to npRmpi only when the serial workflow is the right one but too slow for the job at hand.
The fuller version of that advice now lives on Runtime, Memory, and Scaling.
I have data from Stata, SAS, SPSS, or elsewhere. Can I still use these packages?
Yes. The important point is not the original file format but the data frame you hand to the modeling function once the data are in R. After import, make sure the variable classes are correct before fitting models, especially for factors and ordered factors.
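A minimal base-R sketch of that post-import step follows. The data frame raw and its column names are hypothetical stand-ins for whatever an import function returns, with categories arriving as strings and integer codes.

```r
# Hypothetical imported data: categories arrive as strings and integer codes.
raw <- data.frame(
  income = c(41.5, 38.2, 55.0, 47.3),
  region = c("north", "south", "south", "west"),
  rating = c(2L, 1L, 3L, 2L),
  stringsAsFactors = FALSE
)

clean <- raw
clean$region <- factor(clean$region)               # unordered categorical
clean$rating <- factor(clean$rating,               # ordered categorical
                       levels = 1:3,
                       labels = c("low", "medium", "high"),
                       ordered = TRUE)

str(clean)
```

Setting the levels explicitly, rather than relying on whatever order the import produced, keeps the ordering under your control.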
How do I cite the packages?
Use citation() from inside R and see Books, Papers, and Citation for the package papers and books.
citation("np")
citation("npRmpi")
citation("crs")

I am using npRmpi and see npRmpi auto-dispatch requires an active MPI slave pool
That means there is no active MPI session.
- For interactive work, call npRmpi.init(mode = "spawn", ...) first.
- For mpiexec jobs, initialize with npRmpi.init(mode = "attach", ...) near the top of the script.
With the current package sources, Windows users should follow the second route (attach under mpiexec) rather than the first (spawn).
The detailed route is on MPI and Large Data.
I am using npRmpi and see could not find function "mpi.bcast.cmd"
This usually means a manual-broadcast/profile script started without the required profile startup.
Check the following:
- R_PROFILE_USER points to the intended profile file,
- R_PROFILE is cleared,
- the command is not using R CMD BATCH --vanilla,
- the effective launch command really includes -env R_PROFILE_USER <path>.
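The environment-variable items in that checklist can be verified from inside the launched R process. The sketch below uses base R only and covers just the two variables; the name checks is illustrative, and the launch command itself still has to be inspected by hand.

```r
# Read the two profile-related variables as seen by this R process.
profile_user <- Sys.getenv("R_PROFILE_USER", unset = "")
profile_site <- Sys.getenv("R_PROFILE", unset = "")

checks <- c(
  user_profile_set     = nzchar(profile_user),
  user_profile_exists  = nzchar(profile_user) && file.exists(profile_user),
  site_profile_cleared = !nzchar(profile_site)
)
print(checks)
```

Running this at the top of the script under the real launch command shows what the process actually received, which may differ from what the shell that submitted the job reports.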
I need examples for npRmpi on larger or more complex systems
Use MPI and Large Data. That page now treats:
- session/spawn as the default instructional path,
- attach as the normal mpiexec autodispatch route,
- profile / manual-broadcast as the advanced route, and often the right one for heterogeneous clusters.
Where should I look next?
Can crs recover something as simple as linear regression?
Yes. Spline methods include simpler structures as special cases when the basis is restricted appropriately. A short illustration now appears on Splines.
I am using crs and the search is slow or memory hungry
That usually means the search space is simply large, not that the fit is broken.
The first levers to inspect are:
- cv.df.min
- degree.max
- segments.max
- basis
- complexity
- optimizer display options such as opts = list("DISPLAY_DEGREE" = 3)
The practical route now lives on Spline Search and Tuning, with the broader ecosystem version on Runtime, Memory, and Scaling.
I want a smoother spline, fixed degree, or the selected knots
Those are all legitimate practical questions, and they are better handled directly than by pretending the cross-validated answer should have been different.
See Spline Search and Tuning for:
- imposing a minimum spline degree,
- holding degree fixed and searching over knots,
- retrieving knot locations from the fitted model,
- and the linear-regression special case.
Legacy FAQ PDFs
The old FAQs still contain useful material, but they also contain obsolete changelog duplication and older presentation choices.
- Legacy np FAQ on CRAN: np_faq.pdf
- Legacy crs FAQ on CRAN: crs_faq.pdf