Signal Complexity and Similarity

Initialize NeuroAnalyzer
using NeuroAnalyzer
using Plots
eeg = load("files/eeg.hdf")
e10 = epoch(eeg, ep_len = 10)
eeg1 = NeuroAnalyzer.filter(e10,
                            ch = "eeg",
                            fprototype = :butterworth,
                            ftype = :bp,
                            cutoff = (4, 8),
                            order = 8)
eeg2 = NeuroAnalyzer.filter(e10,
                            ch = "eeg",
                            fprototype = :butterworth,
                            ftype = :bp,
                            cutoff = (8, 12),
                            order = 8)

Signal complexity measures how intricate or unpredictable a signal (e.g., EEG) is over time. High complexity often reflects rich, non-linear dynamics, while low complexity may indicate periodicity or noise.

Key Metrics for EEG Complexity

  1. Sample Entropy (SampEn):
  • Quantifies the unpredictability of EEG time series.
  • Lower values indicate more regularity (e.g., during seizures or deep sleep).
  2. Fractal Dimension:
  • Describes the self-similarity of EEG signals across different time scales.
  • Higher values suggest greater complexity (e.g., during cognitive tasks).
  3. Lempel-Ziv Complexity (LZC):
  • Estimates the rate at which new patterns emerge in the signal.
  • Useful for detecting changes in brain states (e.g., transition from wakefulness to sleep).
  4. Multiscale Entropy (MSE):
  • Measures complexity across multiple time scales.
  • Helps capture both fast and slow EEG dynamics.
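
Of the metrics above, Lempel-Ziv complexity is simple enough to sketch directly. Below is a minimal, language-neutral illustration in Python (not NeuroAnalyzer code); it uses an LZ78-style parse of a median-binarized signal, and the function name `lempel_ziv_complexity` is purely illustrative:

```python
import numpy as np

def lempel_ziv_complexity(bits):
    """Count distinct phrases in an LZ78-style parse of a binary sequence."""
    seen = set()
    phrase = ""
    count = 0
    for b in bits:
        phrase += str(b)
        if phrase not in seen:   # a new phrase ends here
            seen.add(phrase)
            count += 1
            phrase = ""
    return count

def binarize(x):
    """Threshold the signal around its median before parsing."""
    return (x > np.median(x)).astype(int)

rng = np.random.default_rng(1)
periodic = np.sin(np.linspace(0, 20 * np.pi, 1000))
noise = rng.standard_normal(1000)

lzc_periodic = lempel_ziv_complexity(binarize(periodic))
lzc_noise = lempel_ziv_complexity(binarize(noise))
```

A regular (periodic) signal produces far fewer new phrases than white noise of the same length, matching the interpretation of LZC given above.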

Signal similarity assesses how alike two EEG signals or segments are, often used for comparing brain states, subjects, or conditions.

Key Metrics for EEG Similarity

  1. Correlation Coefficient:
  • Measures the linear relationship between two EEG signals.
  • Example: Pearson’s r or Spearman’s ρ.
  2. Dynamic Time Warping (DTW):
  • Aligns EEG signals in time to compare their shapes, even if they are misaligned or vary in speed.
  3. Cross-Approximate Entropy (Cross-ApEn):
  • Quantifies the synchrony or asynchrony between two EEG signals.
  • Useful for studying brain connectivity.
  4. Phase Locking Value (PLV):
  • Measures the consistency of phase differences between two EEG signals.
  • High PLV indicates strong phase synchronization (e.g., during cognitive tasks).
  5. Euclidean Distance:
  • Computes the point-by-point difference between two signals.
  • Simple but sensitive to amplitude and time shifts.
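
The contrast between DTW and Euclidean distance can be made concrete. A minimal Python sketch of the classic dynamic-programming DTW (illustrative, not NeuroAnalyzer code; `dtw_distance` is a hypothetical name):

```python
import numpy as np

def dtw_distance(x, y):
    """Classic O(n*m) dynamic-programming DTW with absolute-difference cost."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # extend the cheapest of the three admissible predecessor cells
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

t = np.linspace(0, 2 * np.pi, 200)
x = np.sin(t)
y = np.sin(t + 0.4)                   # same shape, shifted in time

aligned = dtw_distance(x, x)          # identical signals -> 0
warped = dtw_distance(x, y)
pointwise = np.sum(np.abs(x - y))     # cost of the purely diagonal path
```

Since the diagonal path is one admissible warping path, the DTW distance never exceeds the point-by-point sum of differences: warping can only help.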

Applications in EEG Analysis

  • Complexity:
    • Detecting pathological states (e.g., epilepsy, Alzheimer’s).
    • Monitoring depth of anesthesia or cognitive load.
  • Similarity:
    • Comparing EEG patterns across subjects or conditions.
    • Studying brain connectivity and functional networks.

Entropy and Negentropy

Entropy

Entropy is a measure of randomness, uncertainty, or disorder in a system or signal. In information theory, entropy quantifies the amount of information contained in a signal.

For EEG signals, entropy helps assess the complexity or predictability of brain activity.

Types of Entropy in EEG Analysis

  1. Shannon Entropy:
  • Measures the average unpredictability in a signal.
  • Formula:

\[ H(X) = -\sum_{i} p(x_i) \log p(x_i) \]

where \(p(x_i)\) is the probability of observing \(x_i\).

  2. Sample Entropy (SampEn):
  • Estimates the likelihood that similar patterns in the EEG signal remain similar when extended by one sample.
  • Lower values indicate more regularity (e.g., during seizures or deep sleep).
  3. Approximate Entropy (ApEn):
  • Similar to SampEn but includes self-matches, which can introduce bias.
  • Useful for short or noisy EEG segments.
  4. Multiscale Entropy (MSE):
  • Extends entropy analysis across multiple time scales.
  • Captures both fast and slow dynamics in EEG signals.
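
The SampEn definition above can be sketched in a few lines. The following Python illustration (not NeuroAnalyzer's implementation, which delegates to ComplexityMeasures) counts template matches of length m and m + 1 under a Chebyshev tolerance r; the helper names are hypothetical:

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """SampEn = -ln(A/B), A and B = match counts at lengths m+1 and m."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * np.std(x)          # conventional tolerance
    n = len(x)

    def matches(mm):
        # all overlapping templates of length mm
        tpl = np.array([x[i:i + mm] for i in range(n - mm + 1)])
        total = 0
        for i in range(len(tpl) - 1):
            # Chebyshev distance to all later templates (avoids self-matches)
            d = np.max(np.abs(tpl[i + 1:] - tpl[i]), axis=1)
            total += int(np.sum(d <= r))
        return total

    return -np.log(matches(m + 1) / matches(m))

rng = np.random.default_rng(2)
regular = np.sin(np.linspace(0, 10 * np.pi, 500))
irregular = rng.standard_normal(500)
```

A sine wave (regular) yields a much lower SampEn than white noise. Note the simplification of using n - m + 1 templates at both lengths, a common shortcut in didactic implementations.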

Why Use Entropy in EEG?

  • Complexity Assessment: High entropy indicates complex, irregular brain activity (e.g., during cognitive tasks).
  • Pathology Detection: Low entropy may reflect abnormal regularity (e.g., epilepsy or anesthesia).
  • State Monitoring: Tracks changes in brain states (e.g., sleep stages, cognitive load).

In NeuroAnalyzer the following entropy descriptors are available:

  • ent: histogram-based entropy in bits (Freedman-Diaconis binning)
  • shent: Shannon entropy (Wavelets.coefentropy)
  • leent: log energy entropy (Wavelets.coefentropy)
  • sent: sample entropy (ComplexityMeasures)
  • nsent: normalized sample entropy (ComplexityMeasures)
  • dent: differential entropy

Calculating entropy:

e = entropy(e10;
            ch = "eeg")
plot_bar(e.ent[:, 1];
         glabels = labels(eeg)[1:19],
         xlabel = "Channels",
         title = "Entropy, epoch 1")

Entropy of the signal is calculated from its histogram (the number of bins is determined using the Freedman-Diaconis rule) using the formulas p = n / sum(n) and entropy = -sum(p .* log2(p)), where p is the vector of bin probabilities and n is the vector of bin counts.
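
The computation just described is the discrete Shannon formula applied to histogram bins, and it can be mirrored in a few lines of Python (illustrative, not NeuroAnalyzer code; NumPy's built-in "fd" bin estimator implements the Freedman-Diaconis rule):

```python
import numpy as np

def histogram_entropy(x):
    """Histogram-based entropy in bits with Freedman-Diaconis binning."""
    n, _ = np.histogram(x, bins="fd")   # bin counts (weights)
    p = n / n.sum()                     # bin probabilities
    p = p[p > 0]                        # 0 * log2(0) is taken as 0
    return -np.sum(p * np.log2(p))

# a signal alternating between two values occupies two bins equally:
# p = [0.5, 0.5], so the entropy is exactly 1 bit
two_level = np.tile([0.0, 1.0], 500)
h = histogram_entropy(two_level)
```

For the two-level signal the result is exactly 1 bit, regardless of how many empty bins the rule produces in between.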

Shannon entropy and log energy entropy are calculated using Wavelets.coefentropy().

Sample and normalized sample entropy are calculated using ComplexityMeasures.

Negentropy

Negentropy is a measure of non-Gaussianity or structure in a signal. It quantifies how much a signal deviates from a Gaussian (normal) distribution. In ICA (Independent Component Analysis), negentropy is used as a contrast function to separate independent components.

Mathematical Definition

Negentropy \(J\) of a random variable \(Y\) is defined as:

\[ J(Y) = H(Y_{\text{Gaussian}}) - H(Y) \]

where:
  • \(H(Y)\) is the differential entropy of \(Y\),
  • \(Y_{\text{Gaussian}}\) is a Gaussian random variable with the same covariance as \(Y\).

  • High negentropy: Signal is far from Gaussian (structured, non-random).
  • Low negentropy: Signal is close to Gaussian (random, unstructured).
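
The definition can be illustrated numerically. In the rough Python sketch below (illustrative names, not NeuroAnalyzer code), \(H(Y)\) is estimated from a histogram, while \(H(Y_{\text{Gaussian}}) = \frac{1}{2}\ln(2\pi e \sigma^2)\) is known in closed form:

```python
import numpy as np

def diff_entropy_hist(x):
    """Plug-in differential entropy estimate (nats) from a histogram."""
    n, edges = np.histogram(x, bins="fd")
    p = n / n.sum()
    w = np.diff(edges)                  # bin widths
    mask = p > 0
    # density in bin i is p_i / w_i, so H = -sum p_i * ln(p_i / w_i)
    return -np.sum(p[mask] * np.log(p[mask] / w[mask]))

def negentropy_est(x):
    """J(Y) = H(Y_gauss) - H(Y); zero (in theory) iff Y is Gaussian."""
    h_gauss = 0.5 * np.log(2 * np.pi * np.e * np.var(x))
    return h_gauss - diff_entropy_hist(x)

rng = np.random.default_rng(3)
gaussian = rng.standard_normal(20000)
bimodal = np.concatenate([rng.normal(-3, 0.5, 10000),
                          rng.normal(3, 0.5, 10000)])
```

J is near zero for the Gaussian sample and clearly positive for the bimodal one, which is "far from Gaussian" in the sense used above.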

Role of Negentropy in ICA

  • ICA aims to maximize negentropy to find independent components (ICs) in EEG signals.
  • Non-Gaussian components are more likely to represent meaningful sources (e.g., brain activity or artifacts).

Why Use Negentropy in EEG?

  • Source Separation: Helps isolate neural sources and artifacts in EEG signals.
  • Feature Extraction: Identifies structured, non-random patterns in brain activity.
  • Artifact Removal: Distinguishes between Gaussian noise and non-Gaussian artifacts (e.g., eye blinks).

Calculating negentropy using sample entropy, without normalizing the signal against its total energy:

ne = negentropy(e10;
                ch = "eeg",
                norm = false,
                type = :sample)
plot_bar(ne[:, 1];
         glabels = labels(eeg)[1:19],
         xlabel = "Channels",
         title = "Negentropy, epoch 1")

Entropy vs. Negentropy

| Feature     | Entropy                           | Negentropy                     |
|-------------|-----------------------------------|--------------------------------|
| Definition  | Measure of randomness             | Measure of non-Gaussianity     |
| High value  | High randomness, unpredictability | High structure, non-randomness |
| Use in EEG  | Assesses signal complexity        | Separates independent sources  |
| Application | Detects brain states, pathologies | ICA for artifact removal       |

Higuchi Fractal Dimension

The Higuchi fractal dimension (HFD) is a method for measuring the fractal dimension of a time series, such as an EEG signal. It quantifies the complexity and self-similarity of the signal, providing insight into its underlying dynamics.

It was introduced by Tomoyuki Higuchi in 1988 as a method to estimate the fractal dimension directly from time-series data.

Unlike other fractal dimension methods, HFD does not require phase-space reconstruction, making it computationally efficient and suitable for short and noisy time series.

Interpretation

  • Higher HFD (closer to 2): Indicates a more complex, irregular time series (e.g., chaotic or noisy signals).
  • Lower HFD (closer to 1): Indicates a more regular, smooth time series (e.g., periodic signals).
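
This interpretation can be checked against a direct implementation of Higuchi's 1988 algorithm. A minimal Python sketch (illustrative, not NeuroAnalyzer's implementation):

```python
import numpy as np

def higuchi_fd(x, k_max=8):
    """Higuchi fractal dimension: slope of log L(k) versus log(1/k)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    ks = np.arange(1, k_max + 1)
    lk = []
    for k in ks:
        lengths = []
        for m in range(k):                  # k subsampled curves
            idx = np.arange(m, n, k)
            if len(idx) < 2:
                continue
            raw = np.sum(np.abs(np.diff(x[idx])))
            # Higuchi's normalization factor for curve length
            norm = (n - 1) / ((len(idx) - 1) * k)
            lengths.append(raw * norm / k)
        lk.append(np.mean(lengths))
    slope, _ = np.polyfit(np.log(1.0 / ks), np.log(lk), 1)
    return slope

rng = np.random.default_rng(4)
smooth = np.sin(np.linspace(0, 4 * np.pi, 1000))  # regular -> HFD near 1
rough = rng.standard_normal(1000)                 # white noise -> HFD near 2
```

As expected, the sine wave comes out near 1 and white noise near 2, the two ends of the HFD range.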

Calculating the Higuchi fractal dimension:

h = hfd(e10, ch = ["C3", "C4"])
println("C3, epoch 1: ", h[1, 1])
println("C4, epoch 1: ", h[2, 1])
C3, epoch 1: 1.778734145484373
C4, epoch 1: 1.8424658440680828

Generalised Hurst Exponents

Generalised Hurst exponents (GHEs) are used as a measure of the long-term memory of a time series. They relate to the autocorrelations of the time series and the rate at which these decrease as the lag between pairs of values increases.

The Hurst exponent is referred to as the “index of dependence” or “index of long-range dependence”. It quantifies the relative tendency of a time series either to regress strongly to the mean or to cluster in a direction.

Interpretation

A value in the range 0.5–1 indicates a time series with long-term positive autocorrelation: the decay in autocorrelation is slower than exponential, following a power law. For the series this means that a high value tends to be followed by another high value, and that future excursions to more extreme values are likely to occur.

A value in the range 0–0.5 indicates a time series with long-term switching between high and low values in adjacent pairs: a single high value will probably be followed by a low value, and the value after that will tend to be high again, with this tendency to switch between high and low values persisting long into the future, also following a power law.

A value of 0.5 indicates a short-memory series, with (absolute) autocorrelations decaying exponentially quickly to zero.
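
One common way to estimate GHEs is via q-th order structure functions, \(K_q(\tau) = \langle |x(t+\tau) - x(t)|^q \rangle \sim \tau^{qH(q)}\). A Python sketch of this estimator (illustrative; NeuroAnalyzer delegates the actual computation to Hurst.jl):

```python
import numpy as np

def generalized_hurst(x, q=2.0, taus=np.arange(1, 20)):
    """H(q) from the power-law scaling of the q-th order structure function."""
    x = np.asarray(x, dtype=float)
    logk = [np.log(np.mean(np.abs(x[tau:] - x[:-tau]) ** q)) for tau in taus]
    slope, _ = np.polyfit(np.log(taus), logk, 1)
    return slope / q

rng = np.random.default_rng(5)
increments = rng.standard_normal(20000)
brownian = np.cumsum(increments)   # random walk: H(2) ~ 0.5
```

For the random walk the estimate is close to 0.5; applying the estimator to the uncorrelated increments themselves gives a value near 0, reflecting the absence of long-range structure.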

Calculating GHEs:

ghe = ghexp(e10,
            ch = ["C3", "C4"],
            tau_range = 1:4,
            q_range = 0.1:0.1:0.5)
print("C3, epoch 1 at q=0.5: ",
      ghe[1, 1, 1, 5],
      " ",
      ghe[1, 1, 2, 5])
C3, epoch 1 at q=0.5: 0.3741756192862314 0.0033533945949778226

For details, see Hurst.jl documentation.

Summed Similarity

Summed similarity is a method for aggregating similarity scores across multiple items, features, or dimensions into a single value. It is commonly used in:

  • Signal processing (e.g., combining similarity across EEG channels or time points),
  • Machine learning (e.g., aggregating feature similarities),
  • Cognitive neuroscience (e.g., measuring overall similarity between brain states),
  • Data mining (e.g., combining similarity scores across datasets).

Summed similarity is the sum of similarity scores between items, features, or dimensions. It provides a single scalar value representing the overall similarity across all comparisons.

Key Concepts

| Concept           | Description                                                                                        |
|-------------------|----------------------------------------------------------------------------------------------------|
| Similarity score  | A measure of how similar two items are (e.g., correlation, cosine similarity, Euclidean distance). |
| Summed similarity | The sum of all pairwise similarity scores across items or dimensions.                              |
| Aggregation       | Combining multiple similarity scores into a single value.                                          |
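
As a concrete instance of the aggregation idea (not NeuroAnalyzer's sumsim, which uses an exponential decay model), here is a Python sketch summing pairwise Pearson correlations across channels:

```python
import numpy as np

def summed_similarity(signals):
    """Sum of pairwise Pearson correlations over all channel pairs."""
    c = np.corrcoef(signals)              # channels x channels
    iu = np.triu_indices_from(c, k=1)     # each pair counted once
    return float(c[iu].sum())

rng = np.random.default_rng(6)
t = np.linspace(0, 2 * np.pi, 500)
base = np.sin(t)
channels = np.vstack([
    base,
    base + 0.1 * rng.standard_normal(500),  # nearly identical channel
    rng.standard_normal(500),               # unrelated channel
])
s = summed_similarity(channels)
```

With three identical channels the sum equals the number of pairs (3); replacing one channel with noise pulls the aggregate down toward 1.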

Examples of Use Cases

Use Case Description
EEG Signal Analysis Compute the sum of pairwise correlations between EEG channels to measure overall connectivity.
Feature Similarity Sum the cosine similarity between features to measure overall feature similarity.
Brain State Comparison Sum the Euclidean distance between brain states to measure overall dissimilarity.
Time-Series Similarity Sum the cross-correlation between time-series segments to measure overall similarity.

How to Calculate Summed Similarity

Calculating summed similarity using an exponential decay model between two signals:

sumsim(eeg1,
       eeg2;
       ch1 = "F3",
       ch2 = "F3",
       ep1 = 1,
       ep2 = 1,
       theta = 1)
1×1 Matrix{Float64}:
 9.953285126537227e-25

Tip: Values of summed similarity are in the range [0, 1]; a higher value indicates greater similarity.

Dirichlet Energy

Dirichlet energy is a mathematical concept used to measure the smoothness or variation of a function defined on a graph or manifold. In the context of EEG signal processing, it quantifies how much a signal changes across channels or time points, making it useful for:

  • Smoothing EEG signals,
  • Detecting artifacts or discontinuities,
  • Graph-based signal processing (e.g., EEG connectivity networks),
  • Dimensionality reduction (e.g., Laplacian eigenmaps).
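
For a graph with weighted adjacency matrix \(W\) and Laplacian \(L = D - W\), the Dirichlet energy of a signal \(f\) on the nodes is \(E(f) = \frac{1}{2}\sum_{i,j} w_{ij}(f_i - f_j)^2 = f^\top L f\). A small Python sketch on a hypothetical 4-channel chain graph (illustrative, not NeuroAnalyzer code):

```python
import numpy as np

def dirichlet_energy(f, W):
    """E(f) = f^T L f with graph Laplacian L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    return float(f @ L @ f)

# chain graph over 4 channels: ch0 - ch1 - ch2 - ch3 (unit edge weights)
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

flat = np.ones(4)                        # constant signal -> zero energy
smooth = np.array([0.0, 1.0, 2.0, 3.0])  # gentle ramp
sharp = np.array([0.0, 0.0, 3.0, 3.0])   # one big jump
```

E(flat) = 0, E(smooth) = 1² + 1² + 1² = 3, and E(sharp) = 0 + 3² + 0 = 9: sharp transitions cost more, matching the interpretation above.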

Interpretation

  • If \(f\) is smooth (e.g., a slowly varying EEG signal), its Dirichlet energy is low.
  • If \(f\) has sharp transitions (e.g., artifacts, discontinuities), its Dirichlet energy is high.

Key Properties

| Property          | Description                                                   |
|-------------------|---------------------------------------------------------------|
| Non-negative      | Dirichlet energy is always ≥ 0.                               |
| Zero for constant | Dirichlet energy is 0 if \(f\) is constant (no variation).    |
| Graph-dependent   | Depends on the graph structure (e.g., EEG channel adjacency). |

Smoothing EEG Signals

Dirichlet energy can be used to smooth EEG signals by minimizing \(E(f)\) subject to constraints (e.g., preserving key features).

Example:

  • Compute Dirichlet energy for each EEG channel relative to its neighbors.
  • Apply a smoothing filter (e.g., Laplacian smoothing) to reduce high-frequency noise.

Artifact Detection

High Dirichlet energy indicates sharp transitions (e.g., artifacts, muscle activity).

Example:

  • Compute Dirichlet energy for EEG segments.
  • Flag segments with high energy as potential artifacts.

Graph-Based Connectivity Analysis

Use Dirichlet energy to measure smoothness of connectivity patterns across EEG channels.

Example:

  • Compute the graph Laplacian from EEG connectivity (e.g., coherence, PLV).
  • Use \(E(f)\) to quantify global smoothness of connectivity.

Calculate Dirichlet energy:

dirinrg(e10;
        ch = ["Fp1", "Fp2"])
2×100 Matrix{Float64}:
  6613.52  5503.35  5873.42  6110.03  …  14882.9  15072.5  13700.2  14664.9
 10591.0   6441.44  6571.65  6450.84     33229.8  36356.2  26027.8  12078.2

Zip Ratio

Zip ratio is an indirect marker of signal complexity (lower values indicate lower complexity).

Zip ratio is the ratio of the size of the zip-compressed signal data to the size of the uncompressed data.
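
A minimal Python analogue using zlib (illustrative; NeuroAnalyzer's zipratio operates on the object's data, and the exact compressor may differ):

```python
import zlib
import numpy as np

def zip_ratio(x):
    """Compressed-to-uncompressed size ratio of the raw sample bytes."""
    raw = np.asarray(x, dtype=np.float64).tobytes()
    return len(zlib.compress(raw)) / len(raw)

rng = np.random.default_rng(7)
constant = np.zeros(10000)          # maximally redundant -> tiny ratio
noise = rng.standard_normal(10000)  # incompressible -> ratio near 1
```

A constant signal compresses almost to nothing, while white noise barely compresses at all, so the ratio tracks complexity.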

zipratio(e10)
0.47208788556582115