Statistics: Bootstrapping

Initialize NeuroAnalyzer

using NeuroAnalyzer
using Statistics
eeg = load("files/eeg.hdf")
e10 = epoch(eeg, ep_len = 10)

Bootstrapping is a resampling method used to estimate the sampling distribution of a statistic by repeatedly sampling with replacement from the observed data. It is widely used in statistical inference, machine learning, and signal processing (e.g., EEG analysis) to:

Estimate confidence intervals,
Assess uncertainty in estimators,
Test hypotheses,
Validate models,
Handle small or non-normal datasets.

Bootstrapping is a non-parametric method that uses resampling to approximate the sampling distribution of a statistic (e.g., mean, median, correlation). It is particularly useful when:

The theoretical distribution of the statistic is unknown,
The sample size is small,
The data is non-normal,
Parametric assumptions (e.g., normality) are violated.

Key Ideas

Instead of relying on theoretical distributions (e.g., normal distribution for means), bootstrapping uses the data itself to estimate uncertainty by:

Resampling the data with replacement,
Computing the statistic for each resample,
Repeating this process many times (e.g., 1000–10,000 iterations),
Using the distribution of bootstrap statistics to estimate confidence intervals, bias, or p-values.

Calculate CI

Getting the first 2 seconds of the channel 1 of epochs 1:10 and calculating its mean and 95% CI:

s2 = t2s(e10, t = 2) # in samples
s = e10.data[1, 1:s2, 1:10]
bs_data = bootstrap_ci(s, cl = 0.95)
s_avg, s_l, s_u = bs_data.gm, bs_data.ll, bs_data.ul

Plotting CI:

t = e10.epoch_time[1:s2] # time points
plot_ci(s_avg,
        s_l,
        s_u,
        t,
        xlabel = "Time [s]",
        ylabel = "Amplitude")

Calculate Statistics

In NeuroAnalyzer, you can calculate any statistical function (e.g., mean, median, correlation) using the bootstrapping method by providing the function as a string with obj as a placeholder for the signal data.

s = e10.data[1, :, 1:10] # get epochs 1 to 10 for channel 1
s_stat = mean(s)         # mean of the data
s_dist = bootstrap_stat(s,
                        f = "mean(obj)",
                        n1 = 3000,
                        n2 = 1000)

f: The statistical function to compute (as a string). Use obj as a placeholder for the data.\ n1: number of bootstrap resamples; must be ≥ 1.\ n2: number of epochs drawn per resample; must be ≥ 1.

The result is bootstrap distribution of the statistic.

Plotting the distribution against calculate mean:

plot_histogram(s_dist,
               s_stat,
               draw_median = false,
               draw_mean = true)