ENSAI & CREST\(^\dagger\), valentin.patilea@ensai.fr
McMaster University\(^\ddagger\), racinej@mcmaster.ca
Tuesday, June 25, 2024
Link to slides - jeffreyracine.github.io/Braga (case sensitive, Google Translate)
Link to paper - https://ideas.repec.org/p/mcm/deptwp/2024-04.html
View full screen by pressing the F key (press the Esc key to revert)
Access navigation menu by pressing the M key (click X in navigation menu to close)
Advance using arrow keys
Zoom in by holding down the Alt key in MS Windows, Opt key in macOS or Ctrl key in Linux, and clicking on any screen element (Alt/Opt/Ctrl click again to zoom out)
Export to a PDF by pressing the E key (wait a few seconds, then print [or print using system dialog], enable landscape layout, then save as PDF - press the E key to revert)
Enable drawing tools - chalk board by pressing the B key (B to revert), notes canvas by pressing the C key (C to revert), press the Del key to erase, press the D key to download drawings
One drawback with classical smoothing methods (kernels, splines, wavelets, etc.) is their reliance on assuming the degree of smoothness (and thereby assuming continuous differentiability up to some order) of the underlying object being estimated. However, the underlying object may in fact be irregular (i.e., non-smooth and perhaps even nowhere differentiable) and, moreover, the (ir)regularity of the underlying function may vary across its support. Elaborate adaptive methods for curve estimation have been proposed; however, their intrinsic complexity presents a formidable and perhaps even insurmountable barrier to their widespread adoption by practitioners. We contribute to the functional data literature by providing a pointwise MSE-optimal, data-driven, iterative plug-in estimator of “local regularity” and a computationally attractive, recursive, online updating method. In so doing we are able to separate measurement error “noise” from “irregularity” thanks to “replication”, a hallmark of functional data. Our results open the door to the construction of minimax optimal rates, “honest” confidence intervals, and the like, for various quantities of interest.
Modern data is often functional in nature (e.g., an electrocardiogram (ECG) and many other measures recorded by wearable devices)
The analysis of functional data requires nonparametric methods
However, nonparametric methods rely on smoothness assumptions (i.e., they require one to assume something we don’t know)
We show how we can learn the degree of (non)smoothness in functional data settings and we separate this from measurement noise (this cannot be done with classical data)
This allows us to conduct functional data analysis that is optimal (we do this in a statistical framework)
We emphasize online computation (i.e., how to update when new functional data becomes available)
A defining feature of classical regression analysis is that
sample elements are random pairs, \((y_i,x_i)\)
the function of interest \(\mathbb{E}(Y|X=x)\) is non-random
A defining feature of functional data analysis is that
sample elements are random functions, \(X^{(i)}\)
these are also functions of interest
The following figure (Figure 1) presents \(N=25\) sample elements (classical left plot, functional right plot)
FDA is the statistical analysis of samples of curves (i.e., samples of random variables taking values in spaces of functions)
FDA has heterogeneous, longitudinal aspects (“individual trajectories”)
Curves are continua, so we never observe the curve values at all points
The curves are only available at discrete points (e.g., \((Y^{(i)}_m , T^{(i)}_m) \in\mathbb R \times [0,1]\))
The points at which curves are available can differ across curves
The curves may be measured with error
Consider measurements taken from one random curve:
Functional data carry information along the curves and among the curves
Consider a second-order stochastic process with continuous trajectories, \(X = (X_t : t\in [0,1])\)
The mean and covariance functions are \[\begin{equation*} \mu(t) = \mathbb{E}(X_t)\text{ and } \Gamma (s,t) = \mathbb{E}\left\{ [X_s - \mu(s)] [X_t-\mu(t)]\right\},\, s,t\in [0,1] \end{equation*}\]
The framework we consider is one where independent sample path realizations \(X^{(i)}\), \(i=1,2,\ldots,N\), of \(X\) are measured with error at discrete times
The data associated with the \(i\)th sample path \(X^{(i)}\) consists of the pairs \((Y^{(i)}_m , T^{(i)}_m) \in\mathbb R \times [0,1]\) generated as \[\begin{equation*} Y^{(i)}_m = X^{(i)}(T^{(i)}_m) + \varepsilon^{(i)}_m, \qquad 1\leq m \leq M_i \end{equation*}\]
The \(\varepsilon^{(i)}_m\) are measurement errors, and we allow \[\begin{equation*} \varepsilon^{(i)}_m = \sigma(T^{(i)}_m) e^{(i)}_m, \quad 1\leq m \leq M_i \end{equation*}\]
The \(e^{(i)}_m\) are independent copies of a centred variable \(e\) with unit variance, and \(\sigma(T^{(i)}_m)\) is some unknown bounded function which accounts for possibly heteroscedastic measurement errors
Our approach applies to both independent design and common design cases
Relative to the number of curves, \(N\), the number of points per curve, \(M_i\), may be small (“sparse”) or large (“dense”)
One key distinguishing feature of FDA is that of “replication” (i.e., common structure among curves)
Essentially, there is prior information in the \(N-1\) sample curves that can be exploited to learn about the \(N\)th, which is not available in, say, classical regression analysis
This common structure can be exploited for a variety of purposes
For instance, it will allow us to obtain estimates of the regularity of the curves that may vary across their domain \(t\in[0,1]\) (i.e., local regularity estimates)
This would not be possible in the classical nonparametric setting where we are restricted to a single curve only
“Smoothness […] such as the existence of continuous second derivatives, is often imposed for regularization and is especially useful if nonparametric smoothing techniques are employed, as is prevalent in FDA” (Wang, Chiou, and Müller 2016)
This is problematic since imposing an unknown (and global) degree of smoothness may be incompatible with the underlying stochastic process
A key feature of our approach is its data-driven locally adaptive nature
We consider a meaningful regularity concept for the data generating process based on probability theory
We propose simple estimates for local regularity and link process regularity to sample path regularity
A key element of our approach is “local regularity”, which here is the largest order fractional derivative admitted by the sample paths of \(X\) as measured by the value of \(H_t\), the “local Hölder exponent”, which may vary with \(t\)
More precisely, here “local regularity” is the largest value \(H_t\) for which, uniformly with respect to \(u\) and \(v\) in a neighborhood of \(t\), the second order moment of \((X_u-X_v)/|u-v|^{H_t}\) is finite
We can then assume \[\begin{equation*} \mathbb{E}\left[(X_u-X_v)^2\right]\approx L_t^2|u-v|^{2H_t} \end{equation*}\] when \(u\) and \(v\) lie in a neighborhood of \(t\)
If a function is smooth (i.e., continuously differentiable), then \(H_t=1\), otherwise the function is non-smooth with \(0<H_t<1\)
If a function is a constant function then \(L_t=0\), otherwise \(L_t>0\)
Consider the neighborhood \([t-\Delta_*/2, t + \Delta_*/2] \cap [0,1]\) of \(t\), and define \[\begin{align*} \theta(u,v) &= \mathbb{E}\left[ (X_u-X_v)^{2} \right], \quad\text{ hence }\\ \theta(u,v) &\approx L_t^2 |u-v|^{2H_t} \quad \text{if } |u-v| \text{ is small and } u,v \text{ are close to } t \end{align*}\]
Letting \(\Delta_* =2^{-1}e^{-\log(\bar M_i)^{1/3}}>0\), \(t_1=t-\Delta_*/2\), \(t_3= t + \Delta_*/2\), and \(t_2=(t_1+t_3)/2\) (the definition of \(t_1\) and \(t_3\) is adjusted in boundary regions), then we show that \[\begin{equation*} H_t \approx \frac{\log(\theta(t_1,t_3)) - \log(\theta(t_1,t_2))}{2\log(2)} \quad \text{if } |t_3-t_1| \text{ is small} \end{equation*}\]
Moreover, \[\begin{equation*} L_t \approx \frac{\sqrt{\theta(t_1,t_3)}}{|t_1-t_3|^{H_t} } \quad \text{if } |t_3-t_1| \text{ is small} \end{equation*}\]
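To see where the expression for \(H_t\) comes from, note that \(|t_3-t_1|=\Delta_*\) while \(|t_2-t_1|=\Delta_*/2\), so taking the ratio of the two second-order moments cancels the unknown constant \(L_t^2\): \[\begin{equation*} \frac{\theta(t_1,t_3)}{\theta(t_1,t_2)} \approx \frac{L_t^2\,\Delta_*^{2H_t}}{L_t^2\,(\Delta_*/2)^{2H_t}} = 2^{2H_t} \quad\Longrightarrow\quad 2H_t\log(2) \approx \log(\theta(t_1,t_3)) - \log(\theta(t_1,t_2)) \end{equation*}\]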
The idea is to estimate \(\theta(t_1,t_3)\) and \(\theta(t_1,t_2)\) by averaging vertically over curves
Given estimates \(\widehat\theta(t_1,t_3)\) and \(\widehat\theta(t_1,t_2)\), the estimators of \(H_{t}\) and \(L_t\) are given by \[\begin{equation*} \widehat H_t = \frac{\log(\widehat\theta(t_1,t_3)) - \log(\widehat\theta(t_1,t_2))}{2\log(2)},\quad \widehat L_t = \frac{\sqrt{\widehat\theta(t_1,t_3)}}{|t_1-t_3|^{\widehat H_t} } \end{equation*}\]
The estimator \(\widehat{\theta}(t_l,t_j)\) is the average of local curve smoothers, i.e., \[\begin{equation*} \widehat{\theta}(t_l,t_j)=\frac{1}{N}\sum_{i = 1}^N\left(\widetilde X^{(i)}(t_l)-\widetilde X^{(i)}(t_j)\right)^2 \end{equation*}\]
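A minimal R sketch of these estimators, assuming each curve has already been pre-smoothed at \(t_1\), \(t_2\), \(t_3\) (all names here are illustrative, not taken from the authors’ code):

```r
## Minimal sketch: estimate H_t and L_t from pre-smoothed curves.
## X_tilde is an N x 3 matrix whose columns hold the smoothed curve
## values at t1, t2, t3 (t2 the midpoint of t1 and t3).
estimate_HL <- function(X_tilde, t1, t3) {
  theta_13 <- mean((X_tilde[, 1] - X_tilde[, 3])^2)  # theta-hat(t1, t3)
  theta_12 <- mean((X_tilde[, 1] - X_tilde[, 2])^2)  # theta-hat(t1, t2)
  H <- (log(theta_13) - log(theta_12)) / (2 * log(2))
  L <- sqrt(theta_13) / abs(t1 - t3)^H
  list(H = H, L = L)
}
```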
The smoother \(\widetilde X^{(i)}(t)\) depends on a bandwidth \(h_t\) that, post-iteration, adapts to the local regularity of the underlying process
The stochastic process \(X\) is a random function taking values in \(L^2 (\mathcal T)\), with \(\mathbb E (\| X\|^2) <\infty\)
The process is not deterministic with all sample paths equal to a common path
The increments of the process have finite moments of all orders, and the distributions of the increments are sub-Gaussian
The functions \(X^{(i)}(t)\) may be nowhere differentiable
The process \(X\) may be non-stationary with non-stationary increments
The measurement errors \(\varepsilon^{(i)}\) may be heteroscedastic
The mean function \(\mu(t)\) may be smoother than the \(X^{(i)}(t)\) functions
\(0<L_t<\infty\) and \(0<H_t<1\)
We take the optimal bandwidth expression for \(h_t\) (which depends on \(H_t\) and \(L_t\)) that minimizes pointwise MSE using a general squared bias term (not the usual term one gets assuming twice differentiable curves)
Then, given an initial batch of \(N\) curves, we estimate \(H_t\) and \(L_t\) for each \(t\in \mathcal T_0\), which involves an iterative plug-in procedure:
begin with some starting values for the local bandwidths \(h_t\)
construct preliminary estimates of each curve for every \(t\in \mathcal T_0\) using the data pairs \((Y^{(i)}_m , T^{(i)}_m)\) and local bandwidth starting values
use these preliminary curve estimates to get starting values for \(H_t\) and \(L_t\) for every \(t\in \mathcal T_0\), and plug these into the optimal bandwidth expression
repeat steps 1-3 using the updated plug-in bandwidths; continue iterating \(H_t\), \(L_t\), and \(h_t\) for every \(t\in \mathcal T_0\) until the procedure stabilizes (this occurs quite quickly, typically after 10 or so iterations; see the sketch below)
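Below is a self-contained toy R sketch of this loop at a single point \(t\); purely for illustration it assumes a common design across curves, a known homoscedastic noise variance, a uniform design density (\(f_T=1\)), and the Epanechnikov kernel (whose constants are derived in the appendix), and it is not the authors’ implementation:

```r
## Toy sketch of the iterative plug-in loop at one point t. Y is an
## N x M matrix of noisy observations, T the common design points.
K <- function(u) 0.75 * pmax(1 - u^2, 0)            # Epanechnikov kernel

nw <- function(y, T, t, h) {                        # Nadaraya-Watson smoother
  w <- K((T - t) / h)
  sum(w * y) / sum(w)
}

iterate_HL <- function(Y, T, t, sigma2, n_iter = 10) {
  M <- ncol(Y)
  Delta <- 0.5 * exp(-log(M)^(1 / 3))               # spacing from the slides
  t1 <- t - Delta / 2; t3 <- t + Delta / 2; t2 <- (t1 + t3) / 2
  h <- 0.1                                          # bandwidth starting value
  for (iter in 1:n_iter) {
    Xs <- sapply(c(t1, t2, t3),                     # smooth each curve at t1, t2, t3
                 function(s) apply(Y, 1, nw, T = T, t = s, h = h))
    th13 <- mean((Xs[, 1] - Xs[, 3])^2)             # theta-hat(t1, t3)
    th12 <- mean((Xs[, 1] - Xs[, 2])^2)             # theta-hat(t1, t2)
    H <- (log(th13) - log(th12)) / (2 * log(2))
    L <- sqrt(th13) / Delta^H
    C1 <- 3 * L^2 / ((2 * H + 1) * (2 * H + 3))     # Epanechnikov constants
    C2 <- 0.6 * sigma2
    h <- (C2 / (2 * H * C1 * M))^(1 / (2 * H + 1))  # plug-in optimal bandwidth
  }
  list(H = H, L = L, h = h)
}
```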
To estimate the \(i\)th curve at a point \(t\) with local Hölder exponent \(H_t\) and local Hölder constant \(L_t\), the MSE-optimal bandwidth \(h^*_{t,HL}\) is \[\begin{equation*} h^*_{t,HL} = \left[ \frac{\sigma_t^2 \int K^2(u)du }{2H_t L_t^2\times \int |u|^{2H_t}|K(u)|du\times f_T(t)}\times \frac{1}{\bar M_i} \right]^{\frac{1}{2H_t+1}} \end{equation*}\]
The kernel function \(K(u)\) is provided by the user, hence \(\int K^2(u)du\) and \(\int |u|^{2H_t}|K(u)|du\) can be computed given \(H_t\)
\(\sigma_t^2\) is estimated using one-half the squared differences of the two closest \(Y^{(i)}\) observations at \(t\), averaged across all curves
The design density \(f_T(t)\) is straightforward to estimate (a sketch of the bandwidth computation follows below)
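A short R sketch of this bandwidth, with the kernel integrals computed numerically for a user-supplied kernel supported on \([-1,1]\) (the function and argument names are ours, not the authors’):

```r
## Sketch of the MSE-optimal single-curve bandwidth h*_{t,HL}.
h_opt <- function(H, L, sigma2, f_T, M_bar,
                  K = function(u) 0.75 * (1 - u^2)) {   # Epanechnikov default
  int_K2 <- integrate(function(u) K(u)^2, -1, 1)$value
  int_uK <- integrate(function(u) abs(u)^(2 * H) * abs(K(u)), -1, 1)$value
  ((sigma2 * int_K2) /
     (2 * H * L^2 * int_uK * f_T * M_bar))^(1 / (2 * H + 1))
}
h_opt(H = 0.5, L = 1, sigma2 = 0.25, f_T = 1, M_bar = 100)
```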
We estimate \(H_t\) and \(L_t\) for some batch of \(N\) curves as outlined on the previous slide, then we recursively update them as online data arrives
In order to estimate \(\mu(t)\) at point \(t\) and \(\Gamma(s,t)\) at points \(s\) and \(t\), ideally we would use the (unknown, continuous) curves evaluated at these points, hence the ideal estimators given \(N\) curves would be \[\begin{align*} \widehat \mu(t)&=\frac{1}{N}\sum_{i=1}^N X^{(i)}(t)\\ \widehat \Gamma(s,t)&=\frac{1}{N}\sum_{i=1}^N \left(X^{(i)}(s)-\mu(s)\right)\left(X^{(i)}(t)-\mu(t)\right) \end{align*}\]
Of course, these are infeasible as we don’t observe the true curves; rather, we observe \(M_i\) noisy sample pairs for each curve \(X^{(i)}\), measured at random points
That is, we observe \(Y^{(i)}_m=X^{(i)}(T_m^{(i)})+\varepsilon^{(i)}_m\) at discrete irregularly spaced \(T_m^{(i)}\), i.e., we observe vectors of pairs \((Y^{(i)}_m,T_m^{(i)})\) of length \(M_i\), \(i=1,\dots,N\)
Our estimates of \(\mu(t)\) and \(\Gamma(s,t)\), like those of \(X^{(i)}\), are local in nature and adapt to \(H_t\) and \(L_t\)
It can be shown that to estimate \(\mu(t)\) and \(\Gamma(s,t)\) at points \(s\) and \(t\) with local Hölder exponent \(H_t\) and local Hölder constant \(L_t\), the MSE-optimal bandwidths are given by \[\begin{align*} h^*_{t,\mu} &= \left[ \frac{\sigma_t^2 \int K^2(u)du }{2H_t L_t^2\times \int |u|^{2H_t}|K(u)|du\times f_T(t)}\times \frac{1}{N\bar M_i} \right]^{\frac{1}{2H_t+1}},\\ h^*_{t,\Gamma} &= \left[ \frac{\sigma_t^2 \int K^2(u)du }{4H_t L_t^2\times \int |u|^{2H_t}|K(u)|du\times f_T(t)}\times \frac{1}{N\bar M^2_i} \right]^{\frac{1}{2H_t+2}} \end{align*}\] (the expression shown for \(h^*_{t,\Gamma}\) is the “sparse”-case bandwidth, which applies when \(N\) exceeds \(M\))
Using estimates of \(H_t\), \(L_t\), \(\sigma_t\) and \(f_T(t)\) from the batch of \(N\) curves, we smooth \(X^{(i)}(s)\) and \(X^{(i)}(t)\) using \(\widehat h_{t,\mu}\) and \(\widehat h_{t,\Gamma}\) to construct \(\widehat\mu(t)\) and \(\widehat\Gamma(s,t)\) (here we use, e.g., \(\widehat X^{(i)}(t) = \sum_{m=1}^{M_i} W_{m}^{(i)}(t;\widehat h_{t,\Gamma}) Y^{(i)}_m\))
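For concreteness, a minimal R sketch of the feasible estimators, given a matrix of per-curve smoothed values on the grid (the matrix `X_hat` and its construction with the appropriate plug-in bandwidth are assumed, not shown; names are ours):

```r
## Minimal sketch: feasible mean and covariance estimates on the grid t0,
## given an N x |t0| matrix X_hat of per-curve smoothers built with the
## plug-in bandwidth (h_mu for the mean, h_Gamma for the covariance).
mean_cov_hat <- function(X_hat) {
  mu    <- colMeans(X_hat)               # mu-hat(t) for t in t0
  Xc    <- sweep(X_hat, 2, mu)           # centre each curve
  Gamma <- crossprod(Xc) / nrow(X_hat)   # Gamma-hat(s, t) on t0 x t0
  list(mu = mu, Gamma = Gamma)
}
```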
Given observations from a new online curve \(X^{(i+1)}\), let the local bandwidth \(\widehat h_{t,HL}\) be that based on the estimates of \(H_t\) and \(L_t\) for recursion \(i\) (call this \(\widehat h_{t,HL}^{(i)}\))
Let \(\gamma_{i+1}=(\sum_{j=1}^iM_j)/(\sum_{j=1}^iM_j+M_{i+1})\) (which equals \(i/(i+1)\) if \(M_1=\dots=M_{i+1}\)), and let \(\widehat{\theta}_i(u,v)=\frac{1}{i}\sum_{j = 1}^i\left\{\widehat X^{(j)}(u)-\widehat X^{(j)}(v)\right\}^2\)
The recursively updated estimator of \(\theta(u,v)\) using \(\widehat h_{t,HL}^{(i)}\) is given by \[\begin{equation*} \widehat\theta_{i+1}(u,v) = \gamma_{i+1} \widehat\theta_{i}(u,v)+ (1-\gamma_{i+1}) \left\{\widehat{X}^{(i+1)}(u) - \widehat{X}^{(i+1)}(v)\right\}^2 \end{equation*}\]
The recursively updated estimators of \(H_t\) and \(L_t\) are then obtained via \[\begin{equation*} \widehat H_{t,i+1} = \frac{\log\big( \widehat\theta_{i+1}(t_1,t_3)\big) - \log \big(\widehat\theta_{i+1}(t_1,t_2)\big)}{2\log (2)},\quad\widehat L_{t,i+1} = \frac{\sqrt{\widehat\theta_{i+1}(t_1,t_3)}}{|t_1-t_3|^{\widehat H_{t,i+1}}} \end{equation*}\]
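A minimal R sketch of this online update when curve \(i+1\) arrives; `th13` and `th12` hold the running estimates of \(\theta\) at \((t_1,t_3)\) and \((t_1,t_2)\), and `x1`, `x2`, `x3` are the new curve’s smoothed values at \(t_1\), \(t_2\), \(t_3\) (names are ours):

```r
## Recursive update of theta, H_t, and L_t for one new curve.
update_HL <- function(th13, th12, x1, x2, x3, M_seen, M_new, t1, t3) {
  g    <- M_seen / (M_seen + M_new)            # gamma_{i+1}
  th13 <- g * th13 + (1 - g) * (x1 - x3)^2     # recursive theta updates
  th12 <- g * th12 + (1 - g) * (x1 - x2)^2
  H    <- (log(th13) - log(th12)) / (2 * log(2))
  L    <- sqrt(th13) / abs(t1 - t3)^H
  list(th13 = th13, th12 = th12, H = H, L = L)
}
```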
Updated estimators of \(\sigma(t)\) and \(f_T(t)\) are recursively computable
The recursively updated estimators of \(h_{t,HL}\), \(h_{t,\mu}\), \(h_{t,\Gamma}\) can then be computed (recall they depend on \(H_t\), \(L_t\), \(\sigma(t)\), and \(f_T(t)\)), and call these \(\widehat h_{t,HL}^{(i+1)}\), \(\widehat h_{t,\mu}^{(i+1)}\), and \(\widehat h_{t,\Gamma}^{(i+1)}\)
Using \(\widehat h_{t,\mu}^{(i+1)}\) or \(\widehat h_{t,\Gamma}^{(i+1)}\) we can estimate \(X^{(i+1)}(t)\) and recursively update the estimators of \(\mu(t)\) or \(\Gamma(s,t)\), again using \(\gamma_{i+1}\) and Robbins and Monro (1951) (\(\widehat h_{t,HL}^{(i+1)}\) will start the next recursion for \(X^{(i+2)}(t)\), etc.)
The “memory footprint” is determined by the grid \(\mathcal T_0\subset [0,1]\) (e.g., 100 equidistant points) since we construct estimates of \(\mu(\cdot)\), \(\sigma(\cdot)\), \(f_T(\cdot)\) on \(\mathcal T_0\) and estimates of \(\Gamma(\cdot,\cdot)\) on \(\mathcal T_0 \times \mathcal T_0\)
Thus, our procedures require only that we retain and update vectors of length \(|\mathcal T_0|\) and matrices of dimension \(|\mathcal T_0| \times |\mathcal T_0|\)
Though this project has a lot of moving parts, we demonstrate that the approach can deliver consistent, computationally feasible FDA
Most importantly, our method is data-driven and locally adaptive to the regularity of the stochastic process
We support both batch estimation and online updating using computationally attractive approaches
Though not mentioned explicitly so far, the pointwise asymptotics for the individual curve, mean curve, and covariance curve estimates are established and can be used to construct interval estimates etc.
R code exists, and we expect to release it publicly in the near future
A Multifractional Brownian Motion (MfBm, Peltier and Lévy Véhel 1995), say \((W(t))_{t\geq 0}\), with Hurst index function \(H_t \in(0,1)\), is a centred Gaussian process with covariance function \[\begin{equation*} C(s,t) = \mathbb{E}\left[W(s)W(t)\right] = D(H_s,H_t )\left[ s^{H_s+H_t} + t^{H_s+H_t} - |t-s|^{H_s+H_t}\right],\, s, t\geq 0, \end{equation*}\] where \[\begin{equation*} D(x,y)=\frac{\sqrt{\Gamma (2x+1)\Gamma (2y+1)\sin(\pi x)\sin(\pi y)}} {2\Gamma (x+y+1)\sin(\pi(x+y)/2)}, \, D(x,x) = 1/2,\, x,y >0 \end{equation*}\]
Define the Hurst index function given \(0 <\underline H \leq \overline H <1\), a change point \(t_c \in (0,1)\), and a slope \(S>0\), by \[\begin{equation*} H_t = \underline H + \frac{\overline H - \underline H}{1+\exp(- S (t-t_c))}, \qquad t\in [0,1] \end{equation*}\]
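In R, this Hurst index function is (the parameter defaults below are our choices, not the paper’s):

```r
## Logistic Hurst index function H_t on [0, 1].
H_fun <- function(t, H_lo = 0.2, H_hi = 0.8, t_c = 0.5, S = 20) {
  H_lo + (H_hi - H_lo) / (1 + exp(-S * (t - t_c)))
}
curve(H_fun, 0, 1)   # smooth transition from H_lo to H_hi around t_c
```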
The MfBm data are then generated as follows (Chan and Wood 1998)
for each \(i\), an integer \(M_i\) is generated as a realization of some random variable with mean \(\mathfrak m\) (e.g., Poisson), or \(M_i\) could be a constant
next, generate \(M_i\) independent draws \(T_1^{(i)},\ldots,T_{M_i}^{(i)}\) from a uniform random variable on \([0,1]\)
using the covariance formula above, the \(M_i\times M_i\) matrix \(C^{(i)}\) with the entries \[\begin{equation*} C^{(i)}(T_m^{(i)}, T_{m^\prime}^{(i)}),\quad 1\leq m,m^\prime \leq M_i, \end{equation*}\] is computed
The \(M_i\)-dimensional vector with components \(X^{(i)}(T_m^{(i)})\), \(1\leq m \leq M_i\), is then drawn from a zero-mean Gaussian distribution with covariance matrix \(C^{(i)}\) (see the sketch below)
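An illustrative R simulator following the covariance formula above (the Hurst parameters, noise level, and \(M\) below are our choices):

```r
## Simulate one noisy MfBm sample path on a random design.
D <- function(x, y) {
  sqrt(gamma(2 * x + 1) * gamma(2 * y + 1) * sin(pi * x) * sin(pi * y)) /
    (2 * gamma(x + y + 1) * sin(pi * (x + y) / 2))
}
C_mfbm <- function(s, t, H_fun) {                    # covariance C(s, t)
  Hs <- H_fun(s); Ht <- H_fun(t)
  D(Hs, Ht) * (s^(Hs + Ht) + t^(Hs + Ht) - abs(t - s)^(Hs + Ht))
}
H_fun <- function(t) 0.2 + 0.6 / (1 + exp(-20 * (t - 0.5)))
set.seed(42)
M   <- 51
T_i <- sort(runif(M))                                # random design points
C_i <- outer(T_i, T_i, C_mfbm, H_fun = H_fun)        # M x M covariance matrix
X_i <- drop(crossprod(chol(C_i + 1e-10 * diag(M)),   # one zero-mean Gaussian
                      rnorm(M)))                     # sample path
Y_i <- X_i + 0.1 * rnorm(M)                          # add measurement noise
plot(T_i, Y_i); lines(T_i, X_i)
```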
While the increments for Brownian Motion (i.e., when \(H_t=1/2\)) are stationary and independent, increments for MfBm are neither
We use the terms “smooth” and “regular”, or their opposites “non-smooth” and “irregular”, so we provide some background that might be helpful
We use “smoothness” of a function in the standard sense, i.e.,
smoothness is measured by the number of continuous derivatives over some domain (called the “differentiability class”)
at minimum, a function is considered smooth if it is “differentiable everywhere” (hence continuous)
at the other extreme, if it also possesses continuous derivatives of all orders it is said to be “infinitely differentiable” and is often referred to as a “\(C^{{\infty }}\) function”
Thus, “non-smooth” functions are not differentiable everywhere, and in the extreme may be “nowhere differentiable”
We use the term “irregular” to refer to non-smooth functions whose “Hölder continuity exponent” varies over some domain
We have in mind Hölder continuity where the “Hölder exponent” \(H\) defines the regularity of the function (also called its “Hurst” index)
But, in addition, we have in mind that such regularity might vary over \(t\) hence we write \(H_t\) and call this the “local Hölder exponent”
The Hölder continuity condition you may be familiar with is \[|X_u-X_v|\le L|u-v|^H,\quad 0<L<\infty,\quad 0<H< 1\]
So taking the square we have \[|X_u-X_v|^2\le L^2|u-v|^{2H}\]
Now, any given sample path is just one realization of the stochastic process \(X\), so we impose the condition on the expectation over all realizations; moreover, since we want to conduct statistical analysis we cannot work with the inequality \(\le\), and instead must use \(\approx\)
Near-equality is needed in order to obtain estimates of \(H\) and \(L\) (with a one-sided inequality alone, identification fails), but the good news is that almost all processes found in probability texts satisfy the near-equality condition we use (e.g., derivatives, squares, and log(1+…) transformations of Gaussian processes satisfy it with equality), so there is essentially no loss of generality and the condition is not restrictive
Furthermore, we adopt an augmented Hölder continuity condition and allow \(H_t\) and \(L_t\) to vary on \(t\in[0,1]\), thus we work with \[\mathbb{E}\left[(X_u-X_v)^2\right]\approx L_t^2|u-v|^{2H_t},\quad u,v\text{ in some neighborhood of } t\]
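Standard Brownian motion gives a concrete example where the condition holds with equality: for \(W\) a standard Brownian motion, \[\mathbb{E}\left[(W_u-W_v)^2\right]=|u-v|=1^2\cdot|u-v|^{2\times\frac{1}{2}},\] so \(H_t=1/2\) and \(L_t=1\) for every \(t\)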
The Hölder class MSE of one kernel smoothed curve at a point \(t\), where \(M_i\) is the number of observations for the \(i\)th curve, is given by \[MSE_i\leq C_1(t)h^{2H_t} + \frac{C_2(t)}{M_i f_T(t) h}\]
The constants in the above formula are \(C_1(t)=L_t^2\int |u|^{2H_t}|K(u)|du\) and \(C_2(t)=\sigma^2_t \int K^2(u)du\), where \(K(u)\) is the kernel function
Given this, the MSE-optimal bandwidth for curve \(i\) at point \(t\) is \[h^*_t=\left[\frac{C_2(t)}{2H_tC_1(t)M_i f_T(t)}\right]^{\frac{1}{2H_t+1}}\]
Using the Epanechnikov kernel given by \(\frac{3}{4}(1-u^2)\) on \([-1,1]\), it can be shown that \(C_1(t)=\frac{3L_t^2}{[2H_t+1][2H_t+3]}\) and \(C_2(t)=\frac{3}{5}\sigma^2_t\) (verified numerically below)
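A quick numerical check of these closed-form constants in R (\(L_t\) and \(\sigma_t^2\) set to 1 for the check; purely illustrative):

```r
## Verify the Epanechnikov constants C1(t) and C2(t) by quadrature.
K  <- function(u) 0.75 * (1 - u^2)
Ht <- 0.3
C1_closed <- 3 / ((2 * Ht + 1) * (2 * Ht + 3))
C1_num <- integrate(function(u) abs(u)^(2 * Ht) * abs(K(u)), -1, 1)$value
C2_num <- integrate(function(u) K(u)^2, -1, 1)$value   # = 3/5
c(C1_closed, C1_num, C2_num)
```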
To construct a consistent estimator of \(\sigma^2_t\) appearing in the MSE-optimal bandwidth formula, we use the following expression: \[\begin{equation*} \widehat\sigma^2_t=\frac{1}{2N}\sum_{i=1}^N\left(Y^{(i)}_{m(t)}-Y^{(i)}_{m(t-1)}\right)^2 \end{equation*}\]
To obtain this expression, let \(m(t)\) denote the index of the design point ordered by proximity to \(t\) (i.e., \(T^{(i)}_{m(t)}\) is the closest sample design point to \(t\), \(T^{(i)}_{m(t-1)}\) the second closest, etc.)
Recalling that \(\varepsilon^{(i)}_m = \sigma(T^{(i)}_m) e^{(i)}_m\), we can express \((Y^{(i)}_{m(t)}-Y^{(i)}_{m(t-1)})\) in the expression above as follows: \[\begin{align*} Y^{(i)}_{m(t)}-Y^{(i)}_{m(t-1)}&=X^{(i)}(T^{(i)}_{m(t)})+\varepsilon^{(i)}_{m(t)}-X^{(i)}(T^{(i)}_{m(t-1)})-\varepsilon^{(i)}_{m(t-1)}\\ &=\left(X^{(i)}(T^{(i)}_{m(t)})-X^{(i)}(T^{(i)}_{m(t-1)})\right)+\left(\varepsilon^{(i)}_{m(t)}-\varepsilon^{(i)}_{m(t-1)}\right) \end{align*}\]
Given continuity of \(X^{(i)}\) and the proximity of \(T^{(i)}_{m(t)}\) and \(T^{(i)}_{m(t-1)}\), the first term on the second line is negligible
Given this, and the independence of the \(e^{(i)}_{m(t)}\), we have \[\begin{align*} \mathbb{E}\left(Y^{(i)}_{m(t)}-Y^{(i)}_{m(t-1)}\right)^2 &\approx\mathbb{E}\left(\varepsilon^{(i)}_{m(t)}-\varepsilon^{(i)}_{m(t-1)}\right)^2=2\sigma^2(t), \end{align*}\] which leads to the expression \(\widehat\sigma^2_t=\frac{1}{2N}\sum_{i=1}^N\left(Y^{(i)}_{m(t)}-Y^{(i)}_{m(t-1)}\right)^2\)
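A minimal R sketch of this estimator, where `Y` and `T` are lists holding each curve’s observations and design points (names are ours, not the authors’):

```r
## Noise variance estimate at a point t: for each curve, difference the
## Y values at the two design points closest to t, then average one-half
## the squared differences across curves.
sigma2_hat <- function(Y, T, t) {
  d2 <- vapply(seq_along(Y), function(i) {
    idx <- order(abs(T[[i]] - t))[1:2]   # two closest design points to t
    (Y[[i]][idx[1]] - Y[[i]][idx[2]])^2
  }, numeric(1))
  mean(d2) / 2                           # one-half the averaged squared diffs
}
```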
A similar approach has been used in a classical homoscedastic nonparametric setting (Horowitz and Spokoiny 2001, Equation (2.9))
We modify this for use in an FDA setting, and by exploiting “replication” (i.e., by averaging vertically over all curves at point \(t\)) we obtain a simple method for computing the measurement noise variance that allows for heteroscedasticity of unknown form