Locally Adaptive Online
Functional Data Analysis

Valentin Patilea\(^\dagger\)

ENSAI & CREST\(^\dagger\), valentin.patilea@ensai.fr

Jeffrey S. Racine\(^\ddagger\)

McMaster University\(^\ddagger\), racinej@mcmaster.ca

Tuesday, June 25, 2024

Slide Pro-Tips

  • Link to slides - jeffreyracine.github.io/Braga (case sensitive, Google Translate)

  • Link to paper - https://ideas.repec.org/p/mcm/deptwp/2024-04.html

  • View full screen by pressing the F key (press the Esc key to revert)

  • Access navigation menu by pressing the M key (click X in navigation menu to close)

  • Advance using arrow keys

  • Zoom in by holding down the Alt key in MS Windows, Opt key in macOS or Ctrl key in Linux, and clicking on any screen element (Alt/Opt/Ctrl click again to zoom out)

  • Export to a PDF by pressing the E key (wait a few seconds, then print [or print using system dialog], enable landscape layout, then save as PDF - press the E key to revert)

  • Enable drawing tools - chalk board by pressing the B key (B to revert), notes canvas by pressing the C key (C to revert), press the Del key to erase, press the D key to download drawings

Abstract

One drawback with classical smoothing methods (kernels, splines, wavelets, etc.) is their reliance on assuming the degree of smoothness (and thereby assuming continuous differentiability up to some order) for the underlying object being estimated. However, the underlying object may in fact be irregular (i.e., non-smooth and even perhaps nowhere differentiable) and, as well, the (ir)regularity of the underlying function may vary across its support. Elaborate adaptive methods for curve estimation have been proposed; however, their intrinsic complexity presents a formidable and perhaps even insurmountable barrier to their widespread adoption by practitioners. We contribute to the functional data literature by providing a pointwise MSE-optimal, data-driven, iterative plug-in estimator of “local regularity” and a computationally attractive, recursive, online updating method. In so doing we are able to separate measurement error “noise” from “irregularity” thanks to “replication”, a hallmark of functional data. Our results open the door for the construction of minimax optimal rates, “honest” confidence intervals, and the like, for various quantities of interest.

Outline of Talk

  • Modern data is often functional in nature (e.g., an electrocardiogram (ECG) and many other measures recorded by wearable devices)

  • The analysis of functional data requires nonparametric methods

  • However, nonparametric methods rely on smoothness assumptions (i.e., they require us to assume something we don’t know)

  • We show how we can learn the degree of (non)smoothness in functional data settings and we separate this from measurement noise (this cannot be done with classical data)

  • This allows us to conduct functional data analysis that is optimal (we do this in a statistical framework)

  • We emphasize online computation (i.e., how to update when new functional data becomes available)

Classical Versus Functional Data

  • A defining feature of classical regression analysis is that

    • sample elements are random pairs, \((y_i,x_i)\)

    • the function of interest \(\mathbb{E}(Y|X=x)\) is non-random

  • A defining feature of functional data analysis is that

    • sample elements are random functions, \(X^{(i)}\)

    • these are also functions of interest

  • The following figure (Figure 1) presents \(N=25\) sample elements (classical left plot, functional right plot)

Classical Versus Functional Data

Figure 1: Classical Versus Functional Sample Elements (\(N=25\))

Functional Data

  • FDA is the statistical analysis of samples of curves (i.e., samples of random variables taking values in spaces of functions)

  • FDA has heterogeneous, longitudinal aspects (“individual trajectories”)

  • Curves are continua, so we never observe the curve values at all points

    • The curves are only available at discrete points (e.g., \((Y^{(i)}_m , T^{(i)}_m) \in\mathbb R \times [0,1]\))

    • The points at which curves are available can differ across curves

    • The curves may be measured with error

  • Consider measurements taken from 1 random curve:

    • Figure 2 is measured without error from an irregular curve

    • Figure 3 is measured with error from a smooth curve

    • Figure 4 displays varying (ir)regularity and measurement noise

FDA Sample Element (1 Random Function)

Figure 2: Irregular Function, Data Measured Without Error

FDA Sample Element (1 Random Function)

Figure 3: Regular Function, Noisy Data

FDA Sample Element (1 Random Function)

Figure 4: Irregular Function, Varying Regularity, Noisy Data

FDA Sample Elements (Random Functions)

Figure 5: Berkeley Growth Study Data

FDA Sample Elements (Random Functions)

Figure 6: Canadian Weather Study Data

Functional Data Setting

Functional Data

  • Functional data carry information along the curves and among the curves

  • Consider a second-order stochastic process with continuous trajectories, \(X = (X_t : t\in [0,1])\)

  • The mean and covariance functions are \[\begin{equation*} \mu(t) = \mathbb{E}(X_t)\text{ and } \Gamma (s,t) = \mathbb{E}\left\{ [X_s - \mu(s)] [X_t-\mu(t)]\right\},\, s,t\in [0,1] \end{equation*}\]

  • The framework we consider is one where independent sample path realizations \(X^{(i)}\), \(i=1,2,\ldots,N\), of \(X\) are measured with error at discrete times

  • The data associated with the \(i\)th sample path \(X^{(i)}\) consists of the pairs \((Y^{(i)}_m , T^{(i)}_m) \in\mathbb R \times [0,1]\) generated as \[\begin{equation*} Y^{(i)}_m = X^{(i)}(T^{(i)}_m) + \varepsilon^{(i)}_m, \qquad 1\leq m \leq M_i \end{equation*}\]

Measurement Errors and Design

  • The \(\varepsilon^{(i)}_m\) are measurement errors, and we allow \[\begin{equation*} \varepsilon^{(i)}_m = \sigma(T^{(i)}_m) e^{(i)}_m, \quad 1\leq m \leq M_i \end{equation*}\]

  • The \(e^{(i)}_m\) are independent copies of a centred variable \(e\) with unit variance, and \(\sigma(T^{(i)}_m)\) is some unknown bounded function which accounts for possibly heteroscedastic measurement errors

  • Our approach applies to both independent design and common design cases

  • Relative to the number of curves, \(N\), the number of points per curve, \(M_i\), may be small (“sparse”) or large (“dense”)

Replication

  • One key distinguishing feature of FDA is that of “replication” (i.e., common structure among curves)

  • Essentially, there is prior information in the \(N-1\) sample curves that can be exploited to learn about the \(N\)th, which is not available in, say, classical regression analysis

  • This common structure can be exploited for a variety of purposes

  • For instance, it will allow us to obtain estimates of the regularity of the curves that may vary across their domain \(t\in[0,1]\) (i.e., local regularity estimates)

  • This would not be possible in the classical nonparametric setting where we are restricted to a single curve only

Local Regularity

Overview

  • “Smoothness […] such as the existence of continuous second derivatives, is often imposed for regularization and is especially useful if nonparametric smoothing techniques are employed, as is prevalent in FDA” (Wang, Chiou, and Müller 2016)

  • This is problematic since imposing an unknown (and global) degree of smoothness may be incompatible with the underlying stochastic process

  • A key feature of our approach is its data-driven locally adaptive nature

  • We consider a meaningful regularity concept for the data generating process based on probability theory

  • We propose simple estimates for local regularity and link process regularity to sample path regularity

Definition

  • A key element of our approach is “local regularity”, by which we mean the largest order of fractional differentiability admitted by the sample paths of \(X\), as measured by \(H_t\), the “local Hölder exponent”, which may vary with \(t\)

  • More precisely, here “local regularity” is the largest value \(H_t\) for which, uniformly with respect to \(u\) and \(v\) in a neighborhood of \(t\), the second order moment of \((X_u-X_v)/|u-v|^{H_t}\) is finite

  • We can then assume \[\begin{equation*} \mathbb{E}\left[(X_u-X_v)^2\right]\approx L_t^2|u-v|^{2H_t} \end{equation*}\] when \(u\) and \(v\) lie in a neighborhood of \(t\)

  • If a function is smooth (i.e., continuously differentiable), then \(H_t=1\), otherwise the function is non-smooth with \(0<H_t<1\)

  • If a function is a constant function then \(L_t=0\), otherwise \(L_t>0\)

Definition

  • Consider the neighborhood \([t-\Delta_*/2, t + \Delta_*/2] \cap [0,1]\) of \(t\), and define \[\begin{align*} \theta(u,v) &= \mathbb{E}\left[ (X_u-X_v)^{2} \right], \quad\text{ hence }\\ \theta(u,v) &\approx L_t^2 |u-v|^{2H_t} \quad \text{if } |u-v| \text{ is small and } u,v \text{ are close to }t \end{align*}\]

  • Letting \(\Delta_* =2^{-1}e^{-\log(\bar M_i)^{1/3}}>0\), \(t_1=t-\Delta_*/2\), \(t_3= t + \Delta_*/2\), and \(t_2=(t_1+t_3)/2\) (the definition of \(t_1\) and \(t_3\) is adjusted in boundary regions), then we show that \[\begin{equation*} H_t \approx \frac{\log(\theta(t_1,t_3)) - \log(\theta(t_1,t_2))}{2\log(2)} \quad \text{if } |t_3-t_1| \text{ is small} \end{equation*}\]

  • Moreover, \[\begin{equation*} L_t \approx \frac{\sqrt{\theta(t_1,t_3)}}{|t_1-t_3|^{H_t} } \quad \text{if } |t_3-t_1| \text{ is small} \end{equation*}\]
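  • As a short sanity check, the expression for \(H_t\) follows directly from the local approximation: since \(|t_3-t_1|=\Delta_*\) and \(|t_2-t_1|=\Delta_*/2\), \[\begin{align*} \log\theta(t_1,t_3) - \log\theta(t_1,t_2) &\approx \log\left(L_t^2\Delta_*^{2H_t}\right) - \log\left(L_t^2(\Delta_*/2)^{2H_t}\right) = 2H_t\log(2), \end{align*}\] and solving \(\theta(t_1,t_3)\approx L_t^2|t_3-t_1|^{2H_t}\) for \(L_t\) gives the second expression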

Estimation

  • The idea is to estimate \(\theta(t_1,t_3)\) and \(\theta(t_1,t_2)\) by averaging vertically over curves

  • Given estimates \(\widehat\theta(t_1,t_3)\) and \(\widehat\theta(t_1,t_2)\), the estimators of \(H_{t}\) and \(L_t\) are given by \[\begin{equation*} \widehat H_t = \frac{\log(\widehat\theta(t_1,t_3)) - \log(\widehat\theta(t_1,t_2))}{2\log(2)},\quad \widehat L_t = \frac{\sqrt{\widehat\theta(t_1,t_3)}}{|t_1-t_3|^{\widehat H_t} } \end{equation*}\]

  • The estimator \(\widehat{\theta}(t_l,t_j)\) is the average of local curve smoothers, i.e., \[\begin{equation*} \widehat{\theta}(t_l,t_j)=\frac{1}{N}\sum_{i = 1}^N\left(\widetilde X^{(i)}(t_l)-\widetilde X^{(i)}(t_j)\right)^2 \end{equation*}\]

  • The smoother \(\widetilde X^{(i)}(t)\) depends on a bandwidth \(h_t\) that, post-iteration, adapts to the local regularity of the underlying process
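  • As a minimal illustration (an R sketch with hypothetical names: `Xs` is an \(N\times 3\) matrix whose columns hold the smoothed values of each curve at \(t_1\), \(t_2\), \(t_3\)), the plug-in estimates at a single \(t\) might be computed as

```r
## Plug-in estimates of H_t and L_t at a single point t, given smoothed curve
## values at t1 < t2 < t3 (matrix columns in that order; names illustrative)
estimate_HL <- function(Xs, t1, t3) {
  theta13 <- mean((Xs[, 1] - Xs[, 3])^2)   # hat-theta(t1, t3), averaged over the N curves
  theta12 <- mean((Xs[, 1] - Xs[, 2])^2)   # hat-theta(t1, t2)
  H <- (log(theta13) - log(theta12)) / (2 * log(2))
  L <- sqrt(theta13) / abs(t1 - t3)^H
  list(H = H, L = L)
}
```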

Assumptions

Assumptions

  • The stochastic process \(X\) is a random function taking values in \(L^2 (\mathcal T)\), with \(\mathbb E (\| X\|^2) <\infty\)

  • The process is not deterministic with all sample paths equal to a common path

  • The increments of the process have finite moments of any order; indeed, the distributions of the increments are assumed to be sub-Gaussian

  • The functions \(X^{(i)}(t)\) may be nowhere differentiable

  • The process \(X\) may be non-stationary with non-stationary increments

  • The measurement errors \(\varepsilon^{(i)}\) may be heteroscedastic

  • The mean function \(\mu(t)\) may be smoother than the \(X^{(i)}(t)\) functions

  • \(0<L_t<\infty\) and \(0<H_t<1\)

Estimation of Local Regularity

Methodology

  • We take the optimal bandwidth expression for \(h_t\) (which depends on \(H_t\) and \(L_t\)) that minimizes pointwise MSE using a general squared bias term (not the usual term one gets assuming twice differentiable curves)

  • Then, given an initial batch of \(N\) curves, we estimate \(H_t\) and \(L_t\) for each \(t\in \mathcal T_0\), which involves an iterative plug-in procedure:

    1. begin with some starting values for the local bandwidths \(h_t\)

    2. construct preliminary estimates of each curve for every \(t\in \mathcal T_0\) using the data pairs \((Y^{(i)}_m , T^{(i)}_m)\) and local bandwidth starting values

    3. use these preliminary curve estimates to get starting values for \(H_t\) and \(L_t\) for every \(t\in \mathcal T_0\), and plug these into the optimal bandwidth expression

    4. repeat steps 2-3 using the updated plug-in bandwidths; continue iterating \(H_t\), \(L_t\), and \(h_t\) for every \(t\in \mathcal T_0\) until the procedure stabilizes (this occurs quite quickly, typically after 10 or so iterations)
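  • In code, the iteration might be organized as in the following R sketch (helper names such as `smooth_curves()`, `estimate_HL_grid()`, and `optimal_bandwidth_grid()` are hypothetical placeholders; a pointwise version of the bandwidth rule is sketched on the next slide)

```r
## Schematic plug-in iteration for the local bandwidths (helper names hypothetical)
## data   : list of N curves, each a list with vectors Y and T
## T0     : evaluation grid on [0, 1]
## h_init : vector of starting bandwidths, one per grid point
plugin_iteration <- function(data, T0, h_init, n_iter = 10) {
  h <- h_init
  for (k in seq_len(n_iter)) {
    Xs <- smooth_curves(data, T0, h)                    # step 2: preliminary curve estimates on T0
    HL <- estimate_HL_grid(Xs, T0)                      # step 3: local H_t and L_t on T0
    h  <- optimal_bandwidth_grid(HL$H, HL$L, data, T0)  # step 4: plug-in bandwidths
  }
  list(h = h, H = HL$H, L = HL$L)
}
```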

Details

  • To estimate the \(i\)th curve at a point \(t\) with local Hölder exponent \(H_t\) and local Hölder constant \(L_t\), the MSE-optimal bandwidth \(h^*_{t,HL}\) is \[\begin{equation*} h^*_{t,HL} = \left[ \frac{\sigma_t^2 \int K^2(u)du }{2H_t L_t^2\times \int |u|^{2H_t}|K(u)|du\times f_T(t)}\times \frac{1}{\bar M_i} \right]^{\frac{1}{2H_t+1}} \end{equation*}\]

  • The kernel function \(K(u)\) is provided by the user, hence \(\int K^2(u)du\) and \(\int |u|^{2H_t}|K(u)|du\) can be computed given \(H_t\)

  • \(\sigma_t^2\) is estimated using one-half the squared differences of the two closest \(Y^{(i)}\) observations at \(t\), averaged across all curves

  • The design density \(f_T(t)\) is straightforward to estimate

  • We estimate \(H_t\) and \(L_t\) for some batch of \(N\) curves as outlined on the previous slide, then we recursively update them as online data arrives
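  • Collecting the ingredients on this slide, the bandwidth rule \(h^*_{t,HL}\) above can be written as a short R sketch (argument names are illustrative; the kernel integrals are computed numerically here for the Epanechnikov kernel)

```r
## Kernel integrals for a user-supplied kernel, illustrated with the Epanechnikov kernel
K <- function(u) 0.75 * (1 - u^2) * (abs(u) <= 1)
intK2 <- integrate(function(u) K(u)^2, -1, 1)$value                               # = 3/5
intUK <- function(H) integrate(function(u) abs(u)^(2 * H) * abs(K(u)), -1, 1)$value

## MSE-optimal curve bandwidth at a point t (H, L, sigma2, fT, Mbar estimated elsewhere)
optimal_bandwidth <- function(H, L, sigma2, fT, Mbar) {
  ((sigma2 * intK2) / (2 * H * L^2 * intUK(H) * fT * Mbar))^(1 / (2 * H + 1))
}
```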

MfBm Example (independent design)

Figure 7: Multifractional Brownian Motion - True Curves

MfBm Example (independent design)

Figure 8: Multifractional Brownian Motion - Data

MfBm Example (independent design)

Figure 9: Multifractional Brownian Motion - Bandwidth Iteration

MfBm Example (independent design)

Figure 10: Multifractional Brownian Motion - Estimated and True Regularity

Berkeley Growth Study Example (common design)

Figure 11: Berkeley Growth Example - Estimated Regularity

Canadian Weather Study Example (common design)

Figure 12: Canadian Weather Example - Estimated Regularity

\(H_t\) Comparison: growth, canWeather, MfBm

Figure 13: Estimated Regularity Comparison

Estimation of \(\mu(t)\) and \(\Gamma(s,t)\)

Estimation of \(\mu(t)\) and \(\Gamma(s,t)\)

  • In order to estimate \(\mu(t)\) at point \(t\) and \(\Gamma(s,t)\) at points \(s\) and \(t\), ideally we would use the (unknown, continuous) curves evaluated at these points, hence the ideal estimators given \(N\) curves would be \[\begin{align*} \widehat \mu(t)&=\frac{1}{N}\sum_{i=1}^N X^{(i)}(t)\\ \widehat \Gamma(s,t)&=\frac{1}{N}\sum_{i=1}^N \left(X^{(i)}(s)-\mu(s)\right)\left(X^{(i)}(t)-\mu(t)\right) \end{align*}\]

  • Of course, these are infeasible since we don’t observe the true curves; rather, we observe \(M_i\) noisy sample pairs for each curve \(X^{(i)}\), measured at random points

  • That is, we observe \(Y^{(i)}_m=X^{(i)}(T_m^{(i)})+\varepsilon^{(i)}_m\) at discrete irregularly spaced \(T_m^{(i)}\), i.e., we observe vectors of pairs \((Y^{(i)}_m,T_m^{(i)})\) of length \(M_i\), \(i=1,\dots,N\)

Estimation of \(\mu(t)\) and \(\Gamma(s,t)\) Cont.

  • Our estimates of \(\mu(t)\) and \(\Gamma(s,t)\), like those of \(X^{(i)}\), are local in nature and adapt to \(H_t\) and \(L_t\)

  • It can be shown that to estimate \(\mu(t)\) and \(\Gamma(s,t)\) at points \(s\) and \(t\) with local Hölder exponent \(H_t\) and local Hölder constant \(L_t\), the MSE-optimal bandwidths are given by \[\begin{align*} h^*_{t,\mu} &= \left[ \frac{\sigma_t^2 \int K^2(u)du }{2H_t L_t^2\times \int |u|^{2H_t}|K(u)|du\times f_T(t)}\times \frac{1}{N\bar M_i} \right]^{\frac{1}{2H_t+1}},\\ h^*_{t,\Gamma} &= \left[ \frac{\sigma_t^2 \int K^2(u)du }{4H_t L_t^2\times \int |u|^{2H_t}|K(u)|du\times f_T(t)}\times \frac{1}{N\bar M^2_i} \right]^{\frac{1}{2H_t+2}} \end{align*}\]

  • Using estimates of \(H_t\), \(L_t\), \(\sigma_t\) and \(f_T(t)\) from the batch of \(N\) curves, we smooth \(X^{(i)}(s)\) and \(X^{(i)}(t)\) using \(\widehat h_{t,\mu}\) and \(\widehat h_{t,\Gamma}\) to construct \(\widehat\mu(t)\) and \(\widehat\Gamma(s,t)\) (here we use, e.g., \(\widehat X^{(i)}(t) = \sum_{m=1}^{M_i} W_{m}^{(i)}(t;\widehat h_{t,\Gamma}) Y^{(i)}_m\))
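  • For instance, once the smoothed curve values on the grid \(\mathcal T_0\) are in hand, the final averaging step can be written compactly in R (a sketch with illustrative names: `Xs_mu` and `Xs_Gamma` are \(N\times|\mathcal T_0|\) matrices of curve values smoothed with \(\widehat h_{t,\mu}\) and \(\widehat h_{t,\Gamma}\), respectively)

```r
## Final averaging step for the mean and covariance estimates on the grid T0
estimate_mu_Gamma <- function(Xs_mu, Xs_Gamma) {
  mu_hat    <- colMeans(Xs_mu)                         # hat-mu(t), t in T0
  Xc        <- sweep(Xs_Gamma, 2, colMeans(Xs_Gamma))  # centre each curve at each grid point
  Gamma_hat <- crossprod(Xc) / nrow(Xs_Gamma)          # hat-Gamma(s, t) on T0 x T0
  list(mu = mu_hat, Gamma = Gamma_hat)
}
```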

Example: Estimation of \(\Gamma(s,t)\)

Figure 14: Berkeley Growth

Example: Estimation of \(\Gamma(s,t)\)

Figure 15: Canadian Weather

Online Recursive Updating

Online Recursive Updating of Local Hölder Exponent

  • Given observations from a new online curve \(X^{(i+1)}\), let the local bandwidth \(\widehat h_{t,HL}\) be that based on the estimates of \(H_t\) and \(L_t\) for recursion \(i\) (call this \(\widehat h_{t,HL}^{(i)}\))

  • Let \(\gamma_{i+1}=(\sum_{j=1}^iM_j)/(\sum_{j=1}^iM_j+M_{i+1})\) (which equals \(i/(i+1)\) if \(M_1=\dots=M_{i+1}\)), and let \(\widehat{\theta}_i(u,v)=\frac{1}{i}\sum_{j = 1}^i\left\{\widehat X^{(j)}(u)-\widehat X^{(j)}(v)\right\}^2\)

  • The recursively updated estimator of \(\theta(u,v)\) using \(\widehat h_{t,HL}^{(i)}\) is given by \[\begin{equation*} \widehat\theta_{i+1}(u,v) = \gamma_{i+1} \widehat\theta_{i}(u,v)+ (1-\gamma_{i+1}) \left\{\widehat{X}^{(i+1)}(u) - \widehat{X}^{(i+1)}(v)\right\}^2 \end{equation*}\]

  • The recursively updated estimators of \(H_t\) and \(L_t\) are then obtained via \[\begin{equation*} \widehat H_{t,i+1} = \frac{\log\big( \widehat\theta_{i+1}(t_1,t_3)\big) - \log \big(\widehat\theta_{i+1}(t_1,t_2)\big)}{2\log (2)},\quad\widehat L_{t,i+1} = \frac{\sqrt{\widehat\theta_{i+1}(t_1,t_3)}}{|t_1-t_3|^{\widehat H_{t,i+1}}} \end{equation*}\]
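  • A minimal R sketch of this update at a single \(t\) (names illustrative; `state` carries the running quantities \(\widehat\theta_i(t_1,t_3)\), \(\widehat\theta_i(t_1,t_2)\), and the total number of observations from the first \(i\) curves, and `Xnew` is the smoothed new curve, evaluable at a point)

```r
## Online update of theta, H_t, and L_t at a single t when curve i+1 arrives
update_HL <- function(state, Xnew, t1, t2, t3, M_new) {
  g    <- state$M_total / (state$M_total + M_new)              # gamma_{i+1}
  th13 <- g * state$theta13 + (1 - g) * (Xnew(t1) - Xnew(t3))^2
  th12 <- g * state$theta12 + (1 - g) * (Xnew(t1) - Xnew(t2))^2
  H    <- (log(th13) - log(th12)) / (2 * log(2))
  L    <- sqrt(th13) / abs(t1 - t3)^H
  list(theta13 = th13, theta12 = th12, H = H, L = L,
       M_total = state$M_total + M_new)
}
```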

Online Recursive Updating of Local Hölder Exponent

  • Updated estimators of \(\sigma(t)\) and \(f_T(t)\) are recursively computable

  • The recursively updated estimators of \(h_{t,HL}\), \(h_{t,\mu}\), \(h_{t,\Gamma}\) can then be computed (recall they depend on \(H_t\), \(L_t\), \(\sigma(t)\), and \(f_T(t)\)), and call these \(\widehat h_{t,HL}^{(i+1)}\), \(\widehat h_{t,\mu}^{(i+1)}\), and \(\widehat h_{t,\Gamma}^{(i+1)}\)

  • Using \(\widehat h_{t,\mu}^{(i+1)}\) or \(\widehat h_{t,\Gamma}^{(i+1)}\) we can estimate \(X^{(i+1)}(t)\) and recursively update the estimators of \(\mu(t)\) or \(\Gamma(s,t)\), again using \(\gamma_{i+1}\) and Robbins and Monro (1951) (\(\widehat h_{t,HL}^{(i+1)}\) will start the next recursion for \(X^{(i+2)}(t)\), etc.)

  • The “memory footprint” is determined by the grid \(\mathcal T_0\subset [0,1]\) (e.g., 100 equidistant points) since we construct estimates of \(\mu(\cdot)\), \(\sigma(\cdot)\), \(f_T(\cdot)\) on \(\mathcal T_0\) and estimates of \(\Gamma(\cdot,\cdot)\) on \(\mathcal T_0 \times \mathcal T_0\)

  • Thus, our procedures require only that we retain and update vectors of length \(|\mathcal T_0|\) and matrices of dimension \(|\mathcal T_0| \times |\mathcal T_0|\)

Online Recursive Updating Clip

Summary and Appendices

Summary

  • Though this project has a lot of moving parts, we demonstrate that the approach can deliver consistent, computationally feasible FDA

  • Most importantly, our method is data-driven and locally adaptive to the regularity of the stochastic process

  • We support both batch estimation and online updating using computationally attractive approaches

  • Though not mentioned explicitly so far, the pointwise asymptotics for the individual curve, mean curve, and covariance curve estimates are established and can be used to construct interval estimates etc.

  • R code exists and we expect to be releasing this publicly in the near future

Appendix A: Resources

Appendix B: MfBm Gaussian Processes

  • A Multifractional Brownian Motion (MfBm, Peltier and Lévy Véhel 1995), say \((W(t))_{t\geq 0}\), with Hurst index function \(H_t \in(0,1)\), is a centred Gaussian process with covariance function \[\begin{equation*} C(s,t) = \mathbb{E}\left[W(s)W(t)\right] = D(H_s,H_t )\left[ s^{H_s+H_t} + t^{H_s+H_t} - |t-s|^{H_s+H_t}\right],\, s, t\geq 0, \end{equation*}\] where \[\begin{equation*} D(x,y)=\frac{\sqrt{\Gamma (2x+1)\Gamma (2y+1)\sin(\pi x)\sin(\pi y)}} {2\Gamma (x+y+1)\sin(\pi(x+y)/2)}, \, D(x,x) = 1/2,\, x,y >0 \end{equation*}\]

  • Define the Hurst index function given \(0 <\underline H \leq \overline H <1\), a change point \(t_c \in (0,1)\), and a slope \(S>0\), by \[\begin{equation*} H_t = \underline H + \frac{\overline H - \underline H}{1+\exp(- S (t-t_c))}, \qquad t\in [0,1] \end{equation*}\]
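  • These two ingredients are straightforward to code; a minimal R sketch follows (the defaults \(\underline H=0.3\), \(\overline H=0.8\), \(t_c=0.5\), \(S=20\) are illustrative only)

```r
## Logistic Hurst index function and MfBm covariance (default values illustrative only)
H_fun <- function(t, H_lo = 0.3, H_hi = 0.8, t_c = 0.5, S = 20) {
  H_lo + (H_hi - H_lo) / (1 + exp(-S * (t - t_c)))
}
D_fun <- function(x, y) {
  sqrt(gamma(2 * x + 1) * gamma(2 * y + 1) * sin(pi * x) * sin(pi * y)) /
    (2 * gamma(x + y + 1) * sin(pi * (x + y) / 2))
}
C_fun <- function(s, t) {
  Hs <- H_fun(s); Ht <- H_fun(t)
  D_fun(Hs, Ht) * (s^(Hs + Ht) + t^(Hs + Ht) - abs(t - s)^(Hs + Ht))
}
```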

Appendix B: MfBm Gaussian Processes Cont.

  • The MfBm data are then generated as follows (Chan and Wood 1998)

    • for each \(i\), an integer \(M_i\) is generated as a realization of some random variable with mean \(\mathfrak m\) (e.g., Poisson), or \(M_i\) could be a constant

    • next, generate \(M_i\) independent draws \(T_1^{(i)},\ldots,T_{M_i}^{(i)}\) from a uniform random variable on \([0,1]\)

    • using the covariance formula above, the \(M_i\times M_i\) matrix \(C^{(i)}\) with the entries \[\begin{equation*} C^{(i)}(T_m^{(i)}, T_{m^\prime}^{(i)}),\quad 1\leq m,m^\prime \leq M_i, \end{equation*}\] is computed

  • The \(M_i\)-dimensional vector with components \(X^{(i)}(T_m^{(i)})\), \(1\leq m \leq M_i\), is a realization of a zero-mean Gaussian vector with covariance matrix \(C^{(i)}\)

  • While the increments for Brownian Motion (i.e., when \(H_t=1/2\)) are stationary and independent, increments for MfBm are neither
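  • Putting the pieces together, one curve’s worth of data could be simulated as in the following R sketch (reusing `H_fun`, `D_fun`, and `C_fun` from the previous slide; the Poisson mean and noise level are illustrative, and the small diagonal jitter is only a numerical safeguard for the Cholesky factorization)

```r
## Simulate the observations for one MfBm sample path (values illustrative)
simulate_mfbm_curve <- function(m_mean = 50, noise_sd = 0.05) {
  M  <- max(2, rpois(1, m_mean))                # number of observation times M_i
  Tm <- sort(runif(M))                          # independent uniform design points on [0, 1]
  C  <- outer(Tm, Tm, C_fun)                    # M x M covariance matrix C^{(i)}
  X  <- drop(crossprod(chol(C + 1e-10 * diag(M)), rnorm(M)))  # zero-mean Gaussian draw
  Y  <- X + noise_sd * rnorm(M)                 # add measurement error, Y = X(T) + eps
  data.frame(T = Tm, Y = Y)
}
```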

Appendix C: (Non-)Smooth, (Ir)Regular

  • We use the terms “smooth” and “regular” and their opposites “non-smooth” and “irregular”, so we provide some background on these terms that might be helpful

  • We use “smoothness” of a function in the standard sense, i.e.,

    • smoothness is measured by the number of continuous derivatives over some domain (called the “differentiability class”)

    • at minimum, a function is considered smooth if it is “differentiable everywhere” (hence continuous)

    • at the other extreme, if it also possesses continuous derivatives of all orders it is said to be “infinitely differentiable” and is often referred to as a “\(C^{{\infty }}\) function”

  • Thus, “non-smooth” functions are not differentiable everywhere, and in the extreme may be “nowhere differentiable”

Appendix C: (Non-)Smooth, (Ir)Regular Cont.

  • We use the term “irregular” to refer to non-smooth functions whose “Hölder continuity exponent” varies over some domain

  • We have in mind Hölder continuity where the “Hölder exponent” \(H\) defines the regularity of the function (also called its “Hurst” index)

  • But, in addition, we have in mind that such regularity might vary over \(t\) hence we write \(H_t\) and call this the “local Hölder exponent”

  • The Hölder continuity condition you may be familiar with is \[|X_u-X_v|\le L|u-v|^H,\quad 0<L<\infty,\quad 0<H< 1\]

  • So taking the square we have \[|X_u-X_v|^2\le L^2|u-v|^{2H}\]

Appendix C: (Non-)Smooth, (Ir)Regular Cont.

  • Now, \(X\) is just one realization of a stochastic process, so the condition is stated as an expectation taken over all realizations; and since we want to conduct statistical analysis we cannot work with the inequality \(\le\), we instead work with \(\approx\)

  • Near-equality is needed in order to identify and estimate \(H\) and \(L\) (an inequality alone will not do), but the good news is that almost all the processes found in probability texts satisfy the approximate-equality condition we use (e.g., derivatives, squares, and log(1+…) of Gaussian processes satisfy it with equality), so there is essentially no loss of generality and the condition is not restrictive

  • Furthermore, we adopt an augmented Hölder continuity condition and allow \(H_t\) and \(L_t\) to vary on \(t\in[0,1]\), thus we work with \[\mathbb{E}\left[(X_u-X_v)^2\right]\approx L_t^2|u-v|^{2H_t},\quad u,v\text{ in some neighborhood of } t\]

Appendix D: Pointwise Local Curve Properties

  • The Hölder class MSE of one kernel smoothed curve at a point \(t\), where \(M_i\) is the number of observations for the \(i\)th curve, is given by \[MSE_i\leq C_1(t)h^{2H_t} + \frac{C_2(t)}{M_i f_T(t) h}\]

  • The constants in the above formula are \(C_1(t)=L_t^2\int |u|^{2H_t}|K(u)|du\) and \(C_2(t)=\sigma^2_t \int K^2(u)du\), where \(K(u)\) is the kernel function

  • Given this, the MSE-optimal bandwidth for curve \(i\) at point \(t\) is \[h^*_t=\left[\frac{C_2(t)}{2H_tC_1(t)M_i f_T(t)}\right]^{\frac{1}{2H_t+1}}\]

  • Using the Epanechnikov kernel given by \(\frac{3}{4}(1-u^2)\) on \([-1,1]\), it can be shown that \(C_1(t)=\frac{3L_t^2}{[2H_t+1][2H_t+3]}\) and \(C_2(t)=\frac{3}{5}\sigma^2_t\)
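  • These constants can be verified by direct integration; for example, \[\begin{equation*} \int |u|^{2H_t}|K(u)|du = \frac{3}{4}\int_{-1}^{1} |u|^{2H_t}(1-u^2)du = \frac{3}{2}\left[\frac{1}{2H_t+1}-\frac{1}{2H_t+3}\right] = \frac{3}{(2H_t+1)(2H_t+3)}, \end{equation*}\] so that \(C_1(t)=\frac{3L_t^2}{[2H_t+1][2H_t+3]}\), while \(\int K^2(u)du = \frac{9}{16}\int_{-1}^{1}(1-u^2)^2du = \frac{3}{5}\) gives \(C_2(t)=\frac{3}{5}\sigma^2_t\)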

Appendix E: Estimation of \(\sigma^2_t\)

  • To construct a consistent estimator of \(\sigma^2_t\) appearing in the MSE-optimal bandwidth formula, we use the following expression: \[\begin{equation*} \widehat\sigma^2_t=\frac{1}{2N}\sum_{i=1}^N\left(Y^{(i)}_{m(t)}-Y^{(i)}_{m(t-1)}\right)^2 \end{equation*}\]

  • To obtain this expression, let \(m(t)\) denote the index of the design point closest to \(t\) (i.e., \(T^{(i)}_{m(t)}\) is the closest sample design point to \(t\), \(T^{(i)}_{m(t-1)}\) the second closest, etc.)

  • Recalling that \(\varepsilon^{(i)}_m = \sigma(T^{(i)}_m) e^{(i)}_m\), we can express \((Y^{(i)}_{m(t)}-Y^{(i)}_{m(t-1)})\) in the expression above as follows: \[\begin{align*} Y^{(i)}_{m(t)}-Y^{(i)}_{m(t-1)}&=X^{(i)}(T^{(i)}_{m(t)})+\varepsilon^{(i)}_{m(t)}-X^{(i)}(T^{(i)}_{m(t-1)})-\varepsilon^{(i)}_{m(t-1)}\\ &=\left(X^{(i)}(T^{(i)}_{m(t)})-X^{(i)}(T^{(i)}_{m(t-1)})\right)+\left(\varepsilon^{(i)}_{m(t)}-\varepsilon^{(i)}_{m(t-1)}\right) \end{align*}\]

Appendix E: Estimation of \(\sigma^2_t\) Cont.

  • Given continuity of \(X^{(i)}\) and the proximity of \(T^{(i)}_{m(t)}\) and \(T^{(i)}_{m(t-1)}\), the first term on the second line is negligible

  • Given this, and the independence of the \(e^{(i)}_{m(t)}\), we have \[\begin{align*} \mathbb{E}\left(Y^{(i)}_{m(t)}-Y^{(i)}_{m(t-1)}\right)^2 &\approx\mathbb{E}\left(\varepsilon^{(i)}_{m(t)}-\varepsilon^{(i)}_{m(t-1)}\right)^2\approx 2\sigma^2_t, \end{align*}\] which leads to the expression \(\widehat\sigma^2_t=\frac{1}{2N}\sum_{i=1}^N\left(Y^{(i)}_{m(t)}-Y^{(i)}_{m(t-1)}\right)^2\)

  • A similar approach has been used in a classical homoscedastic nonparametric setting (Horowitz and Spokoiny 2001, Equation (2.9))

  • We modify this for use in an FDA setting, and by exploiting “replication” (i.e., by averaging vertically over all curves at point \(t\)) we obtain a simple method for computing the measurement noise variance that allows for heteroscedasticity of unknown form
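  • A minimal R sketch of this estimator (names illustrative; `data` is a list of \(N\) curves, each a list holding the vectors `Y` and `T`)

```r
## Heteroscedastic noise-variance estimate at a point t, averaged over all curves
estimate_sigma2 <- function(data, t) {
  d2 <- vapply(data, function(curve) {
    idx <- order(abs(curve$T - t))[1:2]   # the two design points closest to t
    (curve$Y[idx[1]] - curve$Y[idx[2]])^2 # squared difference of their responses
  }, numeric(1))
  mean(d2) / 2                            # one-half the average over the N curves
}
```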

References (scrollable)

Belhakem, Ryad, Franck Picard, Vincent Rivoirard, and Angelina Roche. 2021. “Minimax Estimation of Functional Principal Components from Noisy Discretized Functional Data.” https://doi.org/10.48550/arXiv.2110.12739.
Cai, T. Tony, Mark Low, and Zongming Ma. 2014. “Adaptive Confidence Bands for Nonparametric Regression Functions.” Journal of the American Statistical Association 109 (507): 1054–70.
Cai, T. Tony, and Ming Yuan. 2011. “Optimal Estimation of the Mean Function Based on Discretely Sampled Functional Data: Phase Transition.” The Annals of Statistics 39 (5): 2330–55. https://doi.org/10.1214/11-AOS898.
Chan, Grace, and Andrew T. A. Wood. 1998. “Simulation of Multifractional Brownian Motion.” In COMPSTAT, edited by Roger Payne and Peter Green, 233–38. Heidelberg: Physica-Verlag HD.
Febrero-Bande, Manuel, and Manuel Oviedo de la Fuente. 2012. “Statistical Computing in Functional Data Analysis: The R Package fda.usc.” Journal of Statistical Software 51 (4): 1–28. https://www.jstatsoft.org/v51/i04/.
Hall, Peter, and Joel L. Horowitz. 2007. “Methodology and Convergence Rates for Functional Linear Regression.” The Annals of Statistics 35 (1): 70–91.
Horowitz, Joel L., and Vladimir G. Spokoiny. 2001. “An Adaptive, Rate-Optimal Test of a Parametric Mean-Regression Model Against a Nonparametric Alternative.” Econometrica 69 (3): 599–631.
Horváth, Lajos, and Piotr Kokoszka. 2012. Inference for Functional Data with Applications. Vol. 200. Springer Series in Statistics. https://doi.org/10.1007/978-1-4614-3655-3.
Kokoszka, Piotr, and Matthew Reimherr. 2017. Introduction to Functional Data Analysis. 1st ed. Chapman & Hall/CRC. https://doi.org/10.1201/9781315117416.
Lepski, O. V., E. Mammen, and V. G. Spokoiny. 1997. “Optimal Spatial Adaptation to Inhomogeneous Smoothness: An Approach Based on Kernel Estimates with Variable Bandwidth Selectors.” The Annals of Statistics 25 (3): 929–47. https://doi.org/10.1214/aos/1069362731.
Levitin, Daniel J., Regina L. Nuzzo, Bradley W. Vines, and James O. Ramsay. 2007. “Introduction to Functional Data Analysis.” Canadian Psychology 48: 135–55.
Padilla-Segarra, Adrián, Mabel González-Villacorte, Isidro R. Amaro, and Saba Infante. 2020. “Brief Review of Functional Data Analysis: A Case Study on Regional Demographic and Economic Data.” In Information and Communication Technologies, edited by Germania Rodriguez Morales, Efraín R. Fonseca C., Juan Pablo Salgado, Pablo Pérez-Gosende, Marcos Orellana Cordero, and Santiago Berrezueta, 163–76. Cham: Springer International Publishing.
Peltier, Romain-François, and Jacques Lévy Véhel. 1995. “Multifractional Brownian Motion: Definition and Preliminary Results.” Research Report RR-2645. INRIA. https://hal.inria.fr/inria-00074045.
Ramsay, J. O., Spencer Graves, and Giles Hooker. 2022. fda: Functional Data Analysis. https://CRAN.R-project.org/package=fda.
Ramsay, J. O., and B. W. Silverman. 2005. Functional Data Analysis. 2nd ed. New York: Springer-Verlag.
Reiss, Philip T., Jeff Goldsmith, Han Lin Shang, and R. Todd Ogden. 2017. “Methods for Scalar-on-Function Regression.” International Statistical Review / Revue Internationale de Statistique 85 (2): 228–49. http://www.jstor.org/stable/44840886.
Robbins, Herbert, and Sutton Monro. 1951. “A Stochastic Approximation Method.” The Annals of Mathematical Statistics 22 (3): 400–407. https://doi.org/10.1214/aoms/1177729586.
Tian, Renfang. 2020. “On Functional Data Analysis: Methodologies and Applications.” PhD thesis, UWSpace. http://hdl.handle.net/10012/15811.
Wang, Jane-Ling, Jeng-Min Chiou, and Hans-Georg Müller. 2016. “Functional Data Analysis.” Annual Review of Statistics and Its Application 3 (1): 257–95. https://doi.org/10.1146/annurev-statistics-041715-033624.
Zhou, Yidong, Satarupa Bhattacharjee, Cody Carroll, Yaqing Chen, Xiongtao Dai, Jianing Fan, Alvaro Gajardo, et al. 2022. fdapace: Functional Data Analysis and Empirical Dynamics. https://CRAN.R-project.org/package=fdapace.