Statistics: Bootstrapping

Initialize NeuroAnalyzer
using NeuroAnalyzer
using Statistics
eeg = load("files/eeg.hdf")
e10 = epoch(eeg, ep_len = 10)

Bootstrapping is a resampling method used to estimate the sampling distribution of a statistic by repeatedly sampling with replacement from the observed data. It is widely used in statistical inference, machine learning, and signal processing (e.g., EEG analysis) to:

  • Estimate confidence intervals,
  • Assess uncertainty in estimators,
  • Test hypotheses,
  • Validate models,
  • Handle small or non-normal datasets.

Bootstrapping is a non-parametric method that uses resampling to approximate the sampling distribution of a statistic (e.g., mean, median, correlation). It is particularly useful when:

  • The theoretical distribution of the statistic is unknown,
  • The sample size is small,
  • The data is non-normal,
  • Parametric assumptions (e.g., normality) are violated.

Key Ideas

Instead of relying on theoretical distributions (e.g., normal distribution for means), bootstrapping uses the data itself to estimate uncertainty by:

  • Resampling the data with replacement,
  • Computing the statistic for each resample,
  • Repeating this process many times (e.g., 1000–10,000 iterations),
  • Using the distribution of bootstrap statistics to estimate confidence intervals, bias, or p-values.

Calculate CI

Getting the first 2 seconds of the channel 1 of epochs 1:10 and calculating its mean and 95% CI:

s2 = t2s(e10, t = 2) # in samples
s = e10.data[1, 1:s2, 1:10]
bs_data = bootstrap_ci(s, cl = 0.95)
s_avg, s_l, s_u = bs_data.gm, bs_data.ll, bs_data.ul

Plotting CI:

t = e10.epoch_time[1:s2] # time points
plot_ci(s_avg,
        s_l,
        s_u,
        t,
        xlabel = "Time [s]",
        ylabel = "Amplitude")

Calculate Statistics

In NeuroAnalyzer, you can calculate any statistical function (e.g., mean, median, correlation) using the bootstrapping method by providing the function as a string with obj as a placeholder for the signal data.

s = e10.data[1, :, 1:10] # get epochs 1 to 10 for channel 1
s_stat = mean(s)         # mean of the data
s_dist = bootstrap_stat(s,
                        f = "mean(obj)",
                        n1 = 3000,
                        n2 = 1000)

f: The statistical function to compute (as a string). Use obj as a placeholder for the data.\ n1: number of bootstrap resamples; must be ≥ 1.\ n2: number of epochs drawn per resample; must be ≥ 1.

The result is bootstrap distribution of the statistic.

Plotting the distribution against calculate mean:

plot_histogram(s_dist,
               s_stat,
               draw_median = false,
               draw_mean = true)