
Inverse Methods

This section describes the mathematical details of all inverse source estimation methods available in MNE-CPP: minimum-norm estimates (MNE, dSPM, sLORETA, eLORETA), contextual minimum-norm estimates (CMNE), sparse methods (MxNE, Gamma-MAP), beamformers (LCMV, DICS), RAP MUSIC, and dipole fitting.

Minimum-Norm Estimates

In the Bayesian sense, the ensuing current distribution is the maximum a posteriori (MAP) estimate under the following assumptions:

  • The viable locations of the currents are constrained to the cortex. Optionally, the current orientations can be fixed to be normal to the cortical mantle.
  • The amplitudes of the currents have a Gaussian prior distribution with a known source covariance matrix.
  • The measured data contain additive noise with a Gaussian distribution with a known covariance matrix. The noise is not correlated over time.

The Linear Inverse Operator

The measured data in the source estimation procedure consist of MEG and EEG data, recorded on a total of N channels. The task is to estimate a total of Q source strengths at locations on the cortical mantle. If the number of source locations is P, then Q = P for fixed-orientation sources and Q = 3P if the source orientations are unconstrained.

The regularized linear inverse operator following from the regularized maximum likelihood of the above probabilistic model is given by the Q \times N matrix:

M = R' G^\top (G R' G^\top + C)^{-1}

where G is the gain matrix relating the source strengths to the measured MEG/EEG data, C is the data noise-covariance matrix, and R' is the source covariance matrix. The dimensions of these matrices are N \times Q, N \times N, and Q \times Q, respectively.

The expected value of the current amplitudes at time t is then given by \hat{j}(t) = M x(t), where x(t) is a vector containing the measured MEG and EEG data values at time t.
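As a minimal numerical sketch (toy dimensions and random matrices standing in for a real head model), the operator and the expected currents can be computed directly with NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

N, Q = 6, 10                      # channels, sources (toy sizes)
G = rng.standard_normal((N, Q))   # gain matrix, N x Q
C = 0.1 * np.eye(N)               # noise covariance, N x N
R_prime = np.eye(Q)               # source covariance R', Q x Q

# M = R' G^T (G R' G^T + C)^{-1}, a Q x N matrix
M = R_prime @ G.T @ np.linalg.inv(G @ R_prime @ G.T + C)

# Expected current amplitudes at one time point: j_hat(t) = M x(t)
x = rng.standard_normal(N)
j_hat = M @ x
```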

Regularization

The a priori variance of the currents is, in practice, unknown. We can express this by writing R' = R / \lambda^2, which yields the inverse operator:

M = R G^\top (G R G^\top + \lambda^2 C)^{-1}

where the unknown current amplitude is now interpreted in terms of the regularization parameter \lambda^2. Larger \lambda^2 values correspond to spatially smoother and weaker current amplitudes, whereas smaller \lambda^2 values lead to the opposite.

We can also arrive at the regularized linear inverse operator by minimizing a cost function S with respect to the estimated current \hat{j} (given the measurement vector x at any given time t):

\min_{\hat{j}} \left\{ S \right\} = \min_{\hat{j}} \left\{ \tilde{e}^\top \tilde{e} + \lambda^2 \hat{j}^\top R^{-1} \hat{j} \right\} = \min_{\hat{j}} \left\{ (x - G\hat{j})^\top C^{-1} (x - G\hat{j}) + \lambda^2 \hat{j}^\top R^{-1} \hat{j} \right\}

where the first term is the squared difference between the whitened measured data and those predicted by the model, while the second term is a weighted norm of the current estimate. With increasing \lambda^2, the source term receives more weight and a larger discrepancy between the measured and predicted data is tolerated.

Whitening and Scaling

The MNE software employs data whitening so that a "whitened" inverse operator assumes the form:

\tilde{M} = M C^{1/2} = R \tilde{G}^\top (\tilde{G} R \tilde{G}^\top + \lambda^2 I)^{-1}

where

\tilde{G} = C^{-1/2} G

is the spatially whitened gain matrix.

The expected current values are:

\hat{j}(t) = M x(t) = \tilde{M} \tilde{x}(t)

where \tilde{x}(t) = C^{-1/2} x(t) is the whitened measurement vector at time t.

The spatial whitening operator C^{-1/2} is obtained with the help of the eigenvalue decomposition C = U_C \Lambda_C^2 U_C^\top as C^{-1/2} = \Lambda_C^{-1} U_C^\top.
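The whitening step can be sketched as follows (toy covariance; the eigendecomposition route mirrors the formulas above):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 6
B = rng.standard_normal((N, N))
C = B @ B.T + 0.5 * np.eye(N)     # toy, full-rank noise covariance

# C = U_C Lambda_C^2 U_C^T  ->  C^{-1/2} = Lambda_C^{-1} U_C^T
evals, U_C = np.linalg.eigh(C)
C_inv_sqrt = np.diag(1.0 / np.sqrt(evals)) @ U_C.T

# Whitened data and gain matrix
x = rng.standard_normal(N)
x_tilde = C_inv_sqrt @ x
G = rng.standard_normal((N, 10))
G_tilde = C_inv_sqrt @ G

# Whitening maps the noise covariance to the identity
identity_check = C_inv_sqrt @ C @ C_inv_sqrt.T
```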

In the MNE software the noise-covariance matrix is stored as the one applying to raw data. To reflect the decrease of noise due to averaging, this matrix, C_0, is scaled by the number of averages, L, i.e., C = C_0 / L.

note

When EEG data are included, the gain matrix G needs to be average referenced when computing the linear inverse operator M. This is incorporated during creation of the spatial whitening operator C^{-1/2}, which includes any projectors on the data. EEG data average reference (using a projector) is mandatory for source modeling.

A convenient choice for the source-covariance matrix R is such that \text{trace}(\tilde{G} R \tilde{G}^\top) / \text{trace}(I) = 1. With this choice we can approximate \lambda^2 \sim 1/\text{SNR}^2, where SNR is the (amplitude) signal-to-noise ratio of the whitened data.
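The trace normalization and the SNR rule of thumb can be sketched like this (toy whitened gain; the SNR value is an assumption):

```python
import numpy as np

rng = np.random.default_rng(2)
N, Q = 6, 10
G_tilde = rng.standard_normal((N, Q))   # whitened gain matrix (toy values)
R = np.eye(Q)

# Scale R so that trace(G_tilde R G_tilde^T) / trace(I) = 1
R /= np.trace(G_tilde @ R @ G_tilde.T) / N

# With this normalization, lambda^2 ~ 1 / SNR^2
snr = 3.0                               # assumed amplitude SNR of the whitened data
lambda2 = 1.0 / snr**2
```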

note

The definition of the signal-to-noise ratio / \lambda^2 relationship given above works nicely for the whitened forward solution. In the un-whitened case scaling with the trace ratio \text{trace}(G R G^\top) / \text{trace}(C) does not make sense, since the diagonal elements summed have, in general, different units of measure. For example, the MEG data are expressed in T or T/m whereas the unit of EEG is Volts.

Regularization of the Noise-Covariance Matrix

Since a finite amount of data is usually available to compute an estimate of the noise-covariance matrix C, the smallest eigenvalues of its estimate are usually inaccurate and smaller than the true eigenvalues. Depending on the seriousness of this problem, the following quantities can be affected:

  • The model data predicted by the current estimate
  • Estimates of signal-to-noise ratios, which lead to estimates of the required regularization
  • The estimated current values
  • The noise-normalized estimates

Fortunately, the latter two are least likely to be affected due to regularization of the estimates. However, in some cases especially the EEG part of the noise-covariance matrix estimate can be deficient, i.e., it may possess very small eigenvalues and thus regularization of the noise-covariance matrix is advisable.

Historically, the MNE software accomplishes the regularization by replacing a noise-covariance matrix estimate C with:

C' = C + \sum_k \varepsilon_k \bar{\sigma}_k^2 I^{(k)}

where the index k goes across the different channel groups (MEG planar gradiometers, MEG axial gradiometers and magnetometers, and EEG), \varepsilon_k are the corresponding regularization factors, \bar{\sigma}_k are the average variances across the channel groups, and I^{(k)} are diagonal matrices containing ones at the positions corresponding to the channels contained in each channel group.
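A sketch of this diagonal-loading scheme for two hypothetical channel groups (the group layout and \varepsilon values are illustrative, not defaults of any software):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 7
B = rng.standard_normal((N, N))
C = B @ B.T                               # toy noise-covariance estimate

# Hypothetical channel groups and regularization factors epsilon_k
groups = {"grad": np.arange(0, 4), "eeg": np.arange(4, 7)}
eps = {"grad": 0.1, "eeg": 0.1}

# C' = C + sum_k eps_k * sigma_bar_k^2 * I^(k)
C_reg = C.copy()
for name, idx in groups.items():
    sigma_bar2 = np.mean(np.diag(C)[idx])   # average variance within the group
    C_reg[idx, idx] += eps[name] * sigma_bar2
```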

Computation of the Solution

The most straightforward approach to calculate the MNE is to employ the expression of the original or whitened inverse operator directly. However, for computational convenience we prefer to take another route, which employs the singular-value decomposition (SVD) of the matrix:

A = \tilde{G} R^{1/2} = U \Lambda V^\top

where the superscript ^{1/2} indicates a square root of R.

Combining the SVD with the inverse equation, it is easy to show that:

\tilde{M} = R^{1/2} V \Gamma U^\top

where the elements of the diagonal matrix \Gamma are:

\gamma_k = \frac{\lambda_k}{\lambda_k^2 + \lambda^2}

If we define w(t) = U^\top \tilde{x}(t) = U^\top C^{-1/2} x(t), then the expected current is:

\hat{j}(t) = R^{1/2} V \Gamma w(t) = \sum_k \bar{v}_k \gamma_k w_k(t)

where \bar{v}_k = R^{1/2} v_k, with v_k being the k-th column of V. The current estimate is thus a weighted sum of the "weighted" eigenleads \bar{v}_k.
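The SVD route can be checked against the direct formula in a few lines (toy sizes; R = I here so that R^{1/2} = I):

```python
import numpy as np

rng = np.random.default_rng(4)
N, Q = 6, 10
G_tilde = rng.standard_normal((N, Q))   # whitened gain (toy values)
lam2 = 1.0 / 9.0                        # lambda^2 for an assumed SNR of 3

# SVD of A = G_tilde R^{1/2} (R = I)
U, s, Vt = np.linalg.svd(G_tilde, full_matrices=False)
gamma = s / (s**2 + lam2)               # gamma_k = lambda_k / (lambda_k^2 + lambda^2)

# M_tilde = R^{1/2} V Gamma U^T
M_tilde_svd = Vt.T @ np.diag(gamma) @ U.T

# Direct evaluation of M_tilde = R G_tilde^T (G_tilde R G_tilde^T + lambda^2 I)^{-1}
M_tilde_direct = G_tilde.T @ np.linalg.inv(G_tilde @ G_tilde.T + lam2 * np.eye(N))
```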

Noise Normalization

Noise normalization serves three purposes:

  1. It converts the expected current value into a dimensionless statistical test variable. Thus the resulting time and location dependent values are often referred to as dynamic statistical parameter maps (dSPM).

  2. It reduces the location bias of the estimates. In particular, the tendency of the MNE to prefer superficial currents is eliminated.

  3. The width of the point-spread function becomes less dependent on the source location on the cortical mantle.

In practice, noise normalization is implemented as a division by the square root of the estimated variance at each source location. In the following we use the "weighted eigenleads" in matrix form, \bar{V} = R^{1/2} V.

dSPM

Noise-normalized linear estimates introduced by Dale et al. (2000) require division of the expected current amplitude by its variance. The variance computation uses:

M C M^\top = \tilde{M} \tilde{M}^\top = \bar{V} \Gamma^2 \bar{V}^\top

The variances for each source are thus:

\sigma_k^2 = \gamma_k^2

Under the standard conditions, the t-statistic values associated with fixed-orientation sources are proportional to \sqrt{L} while the F-statistic employed with free-orientation sources is proportional to L.

note

The MNE software usually computes the square roots of the F-statistic to be displayed on the inflated cortical surfaces. These are also proportional to \sqrt{L}.

sLORETA

sLORETA (Pascual-Marqui, 2002) estimates the current variances as the diagonal entries of the resolution matrix, which is the product of the inverse and forward operators:

M G = \bar{V} \Gamma \Lambda \bar{V}^\top R^{-1}

Because R is diagonal and we only care about the diagonal entries, the variance estimates are:

\sigma_k^2 = \gamma_k^2 \left(1 + \frac{\lambda_k^2}{\lambda^2}\right)
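In the SVD basis, the dSPM and sLORETA variance factors differ only by the factor in parentheses; a quick numerical comparison with toy singular values:

```python
import numpy as np

s = np.array([3.0, 2.0, 1.0, 0.5])      # toy singular values lambda_k
lam2 = 1.0 / 9.0                        # regularization lambda^2
gamma = s / (s**2 + lam2)

var_dspm = gamma**2                          # dSPM: gamma_k^2
var_sloreta = gamma**2 * (1 + s**2 / lam2)   # sLORETA: gamma_k^2 (1 + lambda_k^2/lambda^2)
```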

eLORETA

While dSPM and sLORETA solve for noise normalization weights \sigma_k^2 that are applied to standard minimum-norm estimates \hat{j}(t), eLORETA (Pascual-Marqui, 2011) instead solves for a source covariance matrix R that achieves zero localization bias. For fixed-orientation solutions the resulting matrix R will be a diagonal matrix, and for free-orientation solutions it will be a block-diagonal matrix with 3 \times 3 blocks.

The following system of equations is used to find the weights, \forall i \in \{1, \ldots, P\}:

r_i = \left[ G_i^\top \left( G R G^\top + \lambda^2 C \right)^{-1} G_i \right]^{-1/2}

An iterative algorithm finds the values of the weights r_i that satisfy these equations:

  1. Initialize identity weights.
  2. Compute N = \left( G R G^\top + \lambda^2 C \right)^{-1}.
  3. Holding N fixed, compute new weights r_i = \left[ G_i^\top N G_i \right]^{-1/2}.
  4. Using the new weights, go to step (2) until convergence.

Using the whitened substitution \tilde{G} = C^{-1/2} G, the computations can be performed entirely in the whitened space, avoiding the need to compute N directly:

r_i = \left[ \tilde{G}_i^\top \tilde{N} \tilde{G}_i \right]^{-1/2}
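A sketch of the fixed-point iteration in the whitened space (toy whitened gain, fixed orientation; the iteration cap and tolerance are assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
N, P = 6, 10
G_t = rng.standard_normal((N, P))   # whitened gain G_tilde (toy values)
lam2 = 0.1

r = np.ones(P)                      # step 1: identity weights
for _ in range(200):
    R = np.diag(r)
    # step 2: N_tilde = (G_tilde R G_tilde^T + lambda^2 I)^{-1}
    N_t = np.linalg.inv(G_t @ R @ G_t.T + lam2 * np.eye(N))
    # step 3: r_i = [G_i^T N G_i]^{-1/2}
    r_new = np.array([(G_t[:, i] @ N_t @ G_t[:, i]) ** -0.5 for i in range(P)])
    converged = np.max(np.abs(r_new - r)) < 1e-9
    r = r_new
    if converged:                   # step 4: iterate until convergence
        break
```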

Predicted Data

Under noiseless conditions the SNR is infinite, leading to \lambda^2 = 0, and the minimum-norm estimate explains the measured data perfectly. Under realistic conditions, however, \lambda^2 > 0 and there is a misfit between the measured data and those predicted by the MNE. Comparison of the predicted data \hat{x}(t) with the measured data can give valuable insight into the correctness of the regularization applied.

In the SVD approach:

\hat{x}(t) = G \hat{j}(t) = C^{1/2} U \Pi w(t)

where the diagonal matrix \Pi has elements \pi_k = \lambda_k \gamma_k. The predicted data is thus expressed as the weighted sum of the "recolored eigenfields" in C^{1/2} U.

Cortical Patch Statistics

If source space distance information was used during source space creation, the source space file will contain Cortical Patch Statistics (CPS) for each vertex of the cortical surface. The CPS provide information about the source space point closest to each vertex as well as the distance from the vertex to this source space point.

Once these data are available, the following cortical patch statistics can be computed for each source location d:

  • The average over the normals at the vertices in a patch, \bar{n}_d
  • The areas of the patches, A_d
  • The average deviation of the vertex normals in a patch from their average, \sigma_d, given in degrees

Orientation Constraints

The principal sources of MEG and EEG signals are generally believed to be postsynaptic currents in the cortical pyramidal neurons. Since the net primary current associated with these microscopic events is oriented normal to the cortical mantle, it is reasonable to use the cortical normal orientation as a constraint in source estimation.

In addition to allowing completely free source orientations, the MNE software implements three orientation constraints based on the surface normal data:

  • Fixed orientation: Source orientation is rigidly fixed to the surface normal direction. If cortical patch statistics are available, the average normal over each patch \bar{n}_d is used. Otherwise, the vertex normal at the source space location is employed.

  • Fixed Loose Orientation Constraint (fLOC): A source coordinate system based on the local surface orientation at the source location is employed. The first two source components lie in the plane normal to the surface normal, and the third component is aligned with it. The variance of the tangential components is reduced by a configurable factor.

  • Variable Loose Orientation Constraint (vLOC): Similar to fLOC except that the loose factor is multiplied by \sigma_d (the angular deviation of normals within the patch).

Depth Weighting

The minimum-norm estimates have a bias towards superficial currents. This tendency can be alleviated by adjusting the source covariance matrix R to favor deeper source locations. In the depth weighting scheme, the elements of R corresponding to the p-th source location are scaled by a factor:

f_p = (g_{1p}^\top g_{1p} + g_{2p}^\top g_{2p} + g_{3p}^\top g_{3p})^{-\gamma}

where g_{1p}, g_{2p}, and g_{3p} are the three columns of G corresponding to source location p and \gamma is the order of the depth weighting.
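A sketch of the depth-weighting factor for a free-orientation source space (toy gain; the weighting order is an assumed value):

```python
import numpy as np

rng = np.random.default_rng(6)
N, P = 6, 4
G = rng.standard_normal((N, 3 * P))   # three consecutive columns per location
order = 0.8                           # depth-weighting order gamma (assumed)

# f_p = (g_1p^T g_1p + g_2p^T g_2p + g_3p^T g_3p)^{-gamma}
f = np.array([np.sum(G[:, 3 * p:3 * p + 3] ** 2) ** (-order) for p in range(P)])

# The three diagonal entries of R at location p are all scaled by f_p
R_scale = np.repeat(f, 3)
```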

Effective Number of Averages

It is often the case that the epoch to be analyzed is a linear combination over conditions rather than one of the original averages computed. The noise-covariance matrix computed is originally one corresponding to raw data. Therefore, it has to be scaled correctly to correspond to the actual or effective number of epochs in the condition to be analyzed. In general:

C = C_0 / L_{\text{eff}}

where L_{\text{eff}} is the effective number of averages. To calculate L_{\text{eff}} for an arbitrary linear combination of conditions y(t) = \sum_{i=1}^{n} w_i x_i(t):

1 / L_{\text{eff}} = \sum_{i=1}^{n} w_i^2 / L_i

For a weighted average, where w_i = L_i / \sum_{i=1}^n L_i:

L_{\text{eff}} = \sum_{i=1}^{n} L_i

For a difference of two categories (w_1 = 1, w_2 = -1):

L_{\text{eff}} = \frac{L_1 L_2}{L_1 + L_2}

Generalizing, for any combination of sums and differences where w_i = \pm 1:

1 / L_{\text{eff}} = \sum_{i=1}^{n} 1/L_i
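The special cases above follow directly from the general formula; a small sketch:

```python
# 1/L_eff = sum_i w_i^2 / L_i  for  y(t) = sum_i w_i x_i(t)
def effective_n_averages(weights, n_averages):
    return 1.0 / sum(w**2 / L for w, L in zip(weights, n_averages))

L1, L2 = 100, 50

# Weighted average, w_i = L_i / sum(L_j): L_eff = L1 + L2
total = L1 + L2
L_eff_avg = effective_n_averages([L1 / total, L2 / total], [L1, L2])

# Difference of two categories (w_1 = 1, w_2 = -1): L_eff = L1 L2 / (L1 + L2)
L_eff_diff = effective_n_averages([1, -1], [L1, L2])
```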

References

  • Hämäläinen, M.S. & Ilmoniemi, R.J. (1994). Interpreting magnetic fields of the brain: minimum norm estimates. Med. Biol. Eng. Comput., 32, 35–42.
  • Dale, A.M. et al. (2000). Dynamic Statistical Parametric Mapping: combining fMRI and MEG for high-resolution imaging of cortical activity. Neuron, 26(1), 55–67. DOI: 10.1016/S0896-6273(00)81138-1
  • Pascual-Marqui, R.D. (2002). Standardized low-resolution brain electromagnetic tomography (sLORETA): technical details. Methods Find. Exp. Clin. Pharmacol., 24D, 5–12.
  • Pascual-Marqui, R.D. (2007). Discrete, 3D distributed, linear imaging methods of electric neuronal activity. Part 1: exact, zero error localization. arXiv:0710.3341.
  • Hämäläinen, M.S. et al. (1993). Magnetoencephalography — theory, instrumentation, and applications to noninvasive studies of the working human brain. Rev. Mod. Phys., 65(2), 413–497.

Contextual Minimum-Norm Estimates (CMNE)

Standard minimum-norm methods such as MNE, dSPM, sLORETA, and eLORETA compute source estimates independently at each time point: the estimate \hat{j}(t) depends only on the measurement x(t) and not on the temporal context in which x(t) occurs. While this makes the methods robust and computationally straightforward, it ignores the rich temporal structure of neural signals. Brain activity unfolds over time with characteristic dynamics: somatosensory responses follow reproducible sequences across cortical areas, auditory processing propagates from primary to associative cortices, and so forth.

Contextual Minimum-Norm Estimates (CMNE) address this limitation by augmenting dSPM with a spatiotemporal LSTM network that learns to exploit temporal context (Dinh et al., 2021). Rather than replacing the physics-based inverse model, CMNE uses data-driven temporal learning as a post-processing correction that improves spatial fidelity while preserving the well-understood regularization properties of dSPM.

Dinh, C.; Samuelsson, J.G.; Hunold, A.; Hämäläinen, M.S.; Khan, S.: Contextual MEG and EEG Source Estimates Using Spatiotemporal LSTM Networks. Frontiers in Neuroscience 15:552666 (2021). DOI: 10.3389/fnins.2021.552666

Overview

The CMNE pipeline proceeds in four stages:

  1. dSPM kernel computation — a standard noise-normalized inverse kernel K_\text{dSPM} is built from the forward model, noise covariance, and regularization parameter (as described above).
  2. dSPM source projection — the kernel is applied to the evoked/epoch data to obtain a noise-normalized source time course at every cortical location.
  3. Z-score rectification — the source amplitudes are rectified (absolute value) and z-scored across time at each vertex, producing a unit-variance representation that is suitable as LSTM input.
  4. LSTM temporal correction — a trained LSTM network predicts the expected source pattern from the preceding k time steps and applies it as a multiplicative correction to the current dSPM estimate, forming a recursive Markov chain.

Mathematical Formulation

Stage 1–3: dSPM and Preprocessing

Let y_t \in \mathbb{R}^N denote the measurement at time t across N channels. The whitened data and whitened gain matrix are:

\tilde{y}_t = C^{-1/2} y_t, \qquad \tilde{G} = C^{-1/2} G

The MNE kernel and dSPM noise normalization yield:

K = R \tilde{G}^\top (\tilde{G} R \tilde{G}^\top + \lambda^2 I)^{-1}

K_\text{dSPM} = W_\text{dSPM} K, \qquad W_\text{dSPM}(i,i) = \frac{1}{\sqrt{[K C K^\top]_{ii}}}

The dSPM source estimate is \hat{x}_t = K_\text{dSPM} \tilde{y}_t. This is then rectified and z-scored:

\hat{q}_t(i) = \frac{|\hat{x}_t(i)| - \mu_i}{\sigma_i}

where \mu_i and \sigma_i are the mean and standard deviation of |\hat{x}_t(i)| over time.

Stage 4: LSTM Temporal Correction

The temporal correction operates as a recursive Markov chain. Let b_t \in \mathbb{R}^{Q} denote the CMNE-corrected source estimate at time t and let k be the look-back window. For the first k time steps, no correction is possible:

b_t = \hat{q}_t, \qquad t < k

For t \geq k, the preceding k corrected estimates form the LSTM input sequence:

\mathbf{B}_t = [b_{t-k}, \; b_{t-k+1}, \; \ldots, \; b_{t-1}] \in \mathbb{R}^{k \times Q}

The LSTM network f_\theta (trained offline) maps this sequence to a prediction vector:

p_t = f_\theta(\mathbf{B}_t) \in \mathbb{R}^{Q}

This prediction is normalized to form a diagonal weighting matrix:

W_t^\text{CMNE}(i,i) = \frac{|p_t(i)|}{\max_j |p_t(j)|}

The CMNE estimate is then the element-wise product of the LSTM-derived weights and the dSPM estimate:

b_t = W_t^\text{CMNE} \hat{q}_t

Crucially, b_t replaces \hat{q}_t in the sliding window for subsequent predictions. This recursive structure allows the network to build an evolving "context" of the source dynamics, progressively refining its predictions as more data becomes available.

LSTM Architecture

The default architecture follows the paper's cross-validated design:

| Parameter | Default | Description |
|---|---|---|
| Look-back k | 80 | Number of past time steps fed to the LSTM |
| Hidden units | 1280 | LSTM hidden state dimension |
| Layers | 1 | Number of stacked LSTM layers |
| Output | Q (= n_sources) | Dense layer mapping hidden state to source space |
| Loss | MSE | Mean squared error between prediction and ground truth |
| Optimizer | Adam | Default learning rate 10^{-3} |

  • Input shape: (B, k, Q) — batch of look-back windows across all sources
  • Output shape: (B, Q) — predicted source pattern for the next time step

The model is trained offline (in Python using PyTorch) and exported to ONNX format for C++ inference via ONNX Runtime.

Training

The LSTM is trained on epoched data — individual trials from the same experimental paradigm. For each epoch:

  1. The dSPM source estimate is computed using the same forward model, noise covariance, and regularization as will be used at inference time.
  2. The source amplitudes are z-score rectified.
  3. Sliding windows of length k are extracted, each producing one training pair: the input window \mathbf{B}_t and the target \hat{q}_t.

When ground-truth source activity is available (e.g., from simulations), the target is the true source pattern. When ground truth is unavailable, the dSPM estimate itself serves as a pseudo ground truth — the LSTM then learns to predict the next dSPM pattern from the preceding ones, effectively performing temporal denoising (referred to as "test mode" in the implementation).

note

Training on the MNE sample dataset (289 epochs, 7498 sources, look-back 40) produces approximately 110,000 training samples. The first run computes dSPM for all epochs, which takes several minutes on CPU. Subsequent runs load cached source estimates from disk automatically.

Fallback: Moving-Average Approximation

When no trained ONNX model is available, MNE-CPP provides a moving-average fallback that replaces the LSTM prediction with a simple temporal average:

W_t^\text{MA}(i,i) = \frac{1}{k} \sum_{s=t-k}^{t-1} b_s(i)

This produces a smoothed version of the CMNE correction without requiring any trained model. While it does not achieve the spatial improvement of the LSTM approach, it serves as a baseline and allows the pipeline to run end-to-end for testing.
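The recursive correction loop with this fallback can be sketched as follows (toy dimensions; random data stand in for the rectified, z-scored dSPM estimates):

```python
import numpy as np

rng = np.random.default_rng(7)
Q, T, k = 5, 30, 4                    # sources, time points, look-back (toy sizes)
q_hat = np.abs(rng.standard_normal((T, Q)))   # stand-in for rectified dSPM estimates

b = q_hat.copy()                      # b_t = q_hat_t for t < k
for t in range(k, T):
    # Moving-average weights: W_t(i,i) = mean of the k preceding corrected b_s(i)
    w = b[t - k:t].mean(axis=0)
    # Element-wise correction; b_t re-enters the window at the next step
    b[t] = w * q_hat[t]
```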

Performance

The paper demonstrates the following improvements over standard dSPM on simulated somatosensory data (Dinh et al., 2021, Table 1):

| Metric | dSPM | CMNE |
|---|---|---|
| Peak localization error (PE) | higher | reduced |
| Spatial dispersion (SD) | broader | reduced |
| Source-space SNR | lower | improved |

The LSTM's contextual predictions effectively "focus" the dSPM estimate by suppressing source locations inconsistent with the learned temporal dynamics, leading to sharper and more focal source images while maintaining the noise normalization properties of dSPM.

CLI Tool

The mne_compute_cmne tool provides three modes of operation:

```shell
# Compute CMNE source estimates
mne_compute_cmne --mode compute \
    --fwd sample_audvis-meg-eeg-oct-6-fwd.fif \
    --cov sample_audvis-cov.fif \
    --evoked sample_audvis-ave.fif \
    --onnx cmne_lstm.onnx \
    --out sample_audvis

# Train the LSTM model
mne_compute_cmne --mode train \
    --fwd sample_audvis-meg-eeg-oct-6-fwd.fif \
    --cov sample_audvis-cov.fif \
    --epochs sample_audvis-epo.fif \
    --onnx-out cmne_lstm.onnx

# Fine-tune an existing model
mne_compute_cmne --mode finetune \
    --fwd sample_audvis-meg-eeg-oct-6-fwd.fif \
    --cov sample_audvis-cov.fif \
    --epochs sample_audvis-epo.fif \
    --finetune cmne_lstm.onnx \
    --onnx-out cmne_lstm_v2.onnx
```

Convenience scripts run_compute.sh and run_train.sh are provided for quick experiments using MNE sample data.


Sparse Methods

Sparse inverse methods estimate source activity under the assumption that only a small number of cortical locations are active at any given time. Unlike distributed methods (MNE, dSPM, sLORETA) that produce estimates at all cortical locations, sparse methods enforce sparsity through appropriate penalty terms, yielding solutions with most source amplitudes exactly zero. This makes them particularly suitable for localizing focal brain activity.

Mixed-Norm Estimates (MxNE)

The Mixed-Norm Estimate (MxNE; Gramfort et al., 2012) promotes spatial sparsity while allowing temporal smoothness within each active source. It minimizes a cost function with an \ell_{2,1}-norm (group lasso) penalty:

\min_{X} \frac{1}{2} \| M - GX \|_F^2 + \alpha \sum_{i=1}^{Q} \| X_i \|_2

where M is the N \times T measurement matrix, G is the N \times Q gain matrix, X is the Q \times T source matrix, and X_i denotes the i-th row of X (the time course of source i). The parameter \alpha > 0 controls the trade-off between data fit and sparsity.

The \ell_{2,1}-norm \sum_i \|X_i\|_2 is the sum of \ell_2-norms of the rows: it penalizes the number of active sources (rows with non-zero \ell_2-norm) rather than individual time-point amplitudes. This encourages entire source time courses to be driven to zero, producing a solution where only a few sources are active across all time points.

IRLS Algorithm

MNE-CPP solves the MxNE problem using Iteratively Reweighted Least Squares (IRLS):

  1. Initialize all source weights w_i = 1.
  2. Construct the diagonal weight matrix W = \operatorname{diag}(1/w_i^2).
  3. Solve the weighted least-squares problem: X^{(k)} = (G^\top G + \alpha W)^{-1} G^\top M
  4. Update the weights: w_i = \max(\|X_i^{(k)}\|_2, \, \epsilon), where \epsilon = 10^{-10}.
  5. Prune sources with w_i < 10^{-8} from the active set.
  6. Repeat from step 2 until convergence (maximum weight change below the tolerance) or a maximum number of iterations is reached.

The active-set strategy progressively removes inactive sources, reducing the problem size at each iteration and improving computational efficiency.
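The IRLS loop (with the active-set pruning of step 5 omitted for brevity) can be sketched on synthetic data with two truly active sources:

```python
import numpy as np

rng = np.random.default_rng(8)
N, Q, T = 8, 20, 10
G = rng.standard_normal((N, Q))
X_true = np.zeros((Q, T))
X_true[3], X_true[11] = 1.0, -1.0     # two active sources (synthetic)
M = G @ X_true + 0.01 * rng.standard_normal((N, T))

alpha, eps = 1.0, 1e-10
w = np.ones(Q)                                          # step 1
for _ in range(50):
    W = np.diag(1.0 / w**2)                             # step 2
    X = np.linalg.solve(G.T @ G + alpha * W, G.T @ M)   # step 3
    w_new = np.maximum(np.linalg.norm(X, axis=1), eps)  # step 4
    if np.max(np.abs(w_new - w)) < 1e-6:                # step 6
        w = w_new
        break
    w = w_new
```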

Parameters

| Parameter | Default | Description |
|---|---|---|
| alpha | (user-specified) | Regularization parameter controlling sparsity |
| nIterations | 50 | Maximum IRLS iterations |
| tolerance | 10^{-6} | Convergence threshold on weight change |

References

  • Gramfort, A.; Kowalski, M.; Hämäläinen, M.S. (2012). Mixed-norm estimates for the M/EEG inverse problem using accelerated gradient methods. Physics in Medicine and Biology, 57(7), 1937–1961. DOI: 10.1088/0031-9155/57/7/1937
  • Gramfort, A.; Strohmeier, D.; Haueisen, J.; Hämäläinen, M.S.; Kowalski, M. (2013). Time-frequency mixed-norm estimates: sparse M/EEG imaging with non-stationary source activations. NeuroImage, 70, 410–422. DOI: 10.1016/j.neuroimage.2012.12.051

Gamma-MAP

Gamma-MAP (Sparse Bayesian Learning; Wipf & Nagarajan, 2009) takes a Bayesian approach to sparse source estimation. Instead of directly penalizing source amplitudes, it places a parameterized prior on each source variance and uses the Expectation-Maximization (EM) algorithm to estimate these hyperparameters from the data. Sources whose estimated variance falls below a threshold are pruned, yielding a sparse solution.

Generative Model

The data model is:

M = G X + E, \qquad E \sim \mathcal{N}(0, C_n)

where C_n is the noise covariance matrix. Each source i is assigned an independent Gaussian prior with unknown variance \gamma_i:

X_i \sim \mathcal{N}(0, \gamma_i I_T)

The model evidence (data covariance) is:

C_M = G \, \Gamma \, G^\top + C_n, \qquad \Gamma = \operatorname{diag}(\gamma_1, \ldots, \gamma_Q)

EM Update Rules

The EM algorithm alternates between computing the posterior source estimates and updating the hyperparameters:

E-step — compute the posterior mean:

\hat{X} = \Gamma \, G^\top \, C_M^{-1} \, M

M-step — update source variances:

\gamma_i^{\text{new}} = \frac{\| \hat{X}_i \|_2^2}{T}

Sources with \gamma_i < \gamma_{\text{threshold}} are pruned from the active set between iterations, reducing the dimensionality of C_M and improving computational efficiency.

Convergence

The algorithm converges when the maximum relative change in \gamma falls below a tolerance:

\max_i \frac{|\gamma_i^{\text{new}} - \gamma_i^{\text{old}}|}{\max_j |\gamma_j^{\text{old}}|} < \text{tolerance}
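An EM sketch on synthetic data with a single active source (pruning omitted for brevity; toy sizes and noise level are assumptions):

```python
import numpy as np

rng = np.random.default_rng(9)
N, Q, T = 8, 15, 20
G = rng.standard_normal((N, Q))
X_true = np.zeros((Q, T))
X_true[5] = rng.standard_normal(T)    # one active source (synthetic)
Cn = 0.05**2 * np.eye(N)
M = G @ X_true + 0.05 * rng.standard_normal((N, T))

gamma = np.ones(Q)
for _ in range(100):
    CM = G @ np.diag(gamma) @ G.T + Cn                      # model data covariance
    X_hat = np.diag(gamma) @ G.T @ np.linalg.solve(CM, M)   # E-step: posterior mean
    gamma_new = np.sum(X_hat**2, axis=1) / T                # M-step: variance update
    if np.max(np.abs(gamma_new - gamma)) / np.max(np.abs(gamma)) < 1e-6:
        gamma = gamma_new
        break
    gamma = gamma_new
```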

Parameters

| Parameter | Default | Description |
|---|---|---|
| nIterations | 100 | Maximum EM iterations |
| tolerance | 10^{-6} | Convergence threshold on relative \gamma change |
| gammaThreshold | 10^{-10} | Sources with \gamma < \text{threshold} are pruned |

References

  • Wipf, D. & Nagarajan, S.S. (2009). A unified Bayesian framework for MEG/EEG source imaging. NeuroImage, 44(3), 947–966. DOI: 10.1016/j.neuroimage.2008.02.059
  • Calvetti, D.; Hakula, H.; Pursiainen, S.; Somersalo, E. (2009). Conditionally Gaussian hypermodels for cerebral source localization. SIAM J. Imaging Sciences, 2(3), 879–909.

Beamformer Methods

Beamformers are a family of adaptive spatial filters that estimate the source activity at a given brain location while suppressing contributions from other locations and noise. Unlike minimum-norm methods, beamformers do not require an explicit regularized inverse operator — instead, they construct a spatial filter from the data covariance and the forward model at each source point.

LCMV Beamformer

The Linearly Constrained Minimum Variance (LCMV) beamformer (Van Veen et al., 1997) operates in the time domain. It finds a spatial filter WW that passes the signal from a target source location with unit gain while minimizing the total output power (i.e., suppressing all other sources and noise).

Spatial Filter

The optimization problem is:

\min_{W} \; \operatorname{tr}(W C_m W^\top) \quad \text{subject to} \quad W G = I

where G is the lead-field matrix (gain matrix) at the source location (N \times n_{\text{orient}}, with n_{\text{orient}} = 1 for fixed orientation or 3 for free orientation) and C_m is the N \times N data covariance matrix.

The closed-form solution is the unit-gain filter:

W_{\text{ug}} = (G^\top C_m^{-1} G)^{-1} \, G^\top C_m^{-1}

Regularization

Since the data covariance matrix may be rank-deficient (e.g., after SSP or MaxFilter), Tikhonov regularization is applied (Gross & Ioannides, 1999):

C_{m,\text{reg}} = C_m + \alpha \cdot \frac{\operatorname{tr}(C_m)}{\operatorname{rank}(C_m)} \cdot I

where \alpha is the regularization parameter (typically 0.05). In practice this is implemented via eigendecomposition:

C_m^{-1} = V \operatorname{diag}\!\left(\frac{1}{\lambda_i + \alpha_{\text{load}}}\right) V^\top
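A sketch combining the diagonal loading with the unit-gain filter (toy covariance and lead field, fixed orientation):

```python
import numpy as np

rng = np.random.default_rng(10)
N = 8
g = rng.standard_normal((N, 1))       # lead field at one location (n_orient = 1)
B = rng.standard_normal((N, 40))
Cm = B @ B.T / 40                     # toy data covariance

# Regularized inverse via eigendecomposition: 1 / (lambda_i + alpha_load)
alpha = 0.05
load = alpha * np.trace(Cm) / np.linalg.matrix_rank(Cm)
evals, V = np.linalg.eigh(Cm)
Cm_inv = V @ np.diag(1.0 / (evals + load)) @ V.T

# Unit-gain filter: W_ug = (G^T Cm^{-1} G)^{-1} G^T Cm^{-1}
W_ug = np.linalg.inv(g.T @ Cm_inv @ g) @ g.T @ Cm_inv
unit_gain = W_ug @ g                  # the constraint W G = I
```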

Weight Normalization

The raw unit-gain filter can produce estimates with non-uniform sensitivity across the source space. Several normalization schemes address this:

Unit-noise-gain (Sekihara & Nagarajan, 2008):

W_{\text{ung}} = \frac{W_{\text{ug}}}{\sqrt{\operatorname{diag}(W_{\text{ug}} W_{\text{ug}}^\top)}}

Unit-noise-gain-invariant (rotation-invariant form):

W = (A A^\top)^{-1/2} A, \quad A = G^\top C_m^{-1}

Neural Activity Index (NAI):

W_{\text{NAI}} = \frac{W_{\text{ung}}}{\sqrt{\sigma_{\text{noise}}^2}}

where \sigma_{\text{noise}}^2 is the estimated noise level.

Optimal Orientation

When the source orientation is free, the orientation that maximizes the beamformer output power can be found by solving the generalized eigenvalue problem (Sekihara & Nagarajan, 2008, eq. 4.47):

$$\hat{e} = \arg\max_{e} \frac{e^\top (G^\top C_m^{-1} G) \, e}{e^\top (G^\top C_m^{-2} G) \, e}$$

The eigenvector with the largest eigenvalue gives the optimal source orientation.

Source Estimates

The estimated source time course at location $i$ is:

$$s_i(t) = W_i \, \tilde{x}(t)$$

and the source power is:

$$P_i = \operatorname{tr}(W_i \, C_m \, W_i^\top)$$

where $\tilde{x}(t) = C_n^{-1/2} P_\perp \, x(t)$ is the whitened, projected data vector.

References

  • Van Veen, B.D. et al. (1997). Localization of brain electrical activity via linearly constrained minimum variance spatial filtering. IEEE Trans. Biomed. Eng., 44(9), 867–880.
  • Sekihara, K. & Nagarajan, S.S. (2008). Adaptive Spatial Filters for Electromagnetic Brain Imaging. Springer.
  • Gross, J. & Ioannides, A.A. (1999). Linear transformations of data space in MEG. Phys. Med. Biol., 44, 2081–2097.

DICS Beamformer

The Dynamic Imaging of Coherent Sources (DICS) beamformer (Gross et al., 2001) is the frequency-domain analogue of LCMV. It replaces the time-domain data covariance $C_m$ with the cross-spectral density (CSD) matrix $C(f)$ at a frequency of interest:

$$W(f) = (G^\top C(f)^{-1} G)^{-1} \, G^\top C(f)^{-1}$$

The CSD matrix $C(f)$ can be estimated using Fourier, multitaper, or Morlet wavelet methods.

Real Filter Option

The CSD matrix is generally complex-valued. When computing spatial filter weights for source power estimation, it is common to use only the real part (Hipp et al., 2011):

$$C_{\text{real}}(f) = \Re[C(f)]$$

This ensures real-valued spatial filter weights and avoids phase-related artifacts in the power estimates.

Source Power

The source power at location $i$ and frequency $f$ is:

$$P_i(f) = \operatorname{tr}(W_i(f) \, C(f) \, W_i(f)^H)$$

All regularization and weight-normalization options from LCMV apply identically.
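The DICS pipeline — CSD estimation, real-part restriction, filter, and power — can be sketched with hypothetical per-epoch Fourier coefficients (illustration only, not the library code):

```python
import numpy as np

rng = np.random.default_rng(4)
n_chan, n_epochs = 16, 50
G = rng.standard_normal((n_chan, 3))

# Hypothetical complex Fourier coefficients at the frequency of interest.
X = rng.standard_normal((n_chan, n_epochs)) \
    + 1j * rng.standard_normal((n_chan, n_epochs))

# Cross-spectral density and its real part (Hipp et al., 2011),
# with a small diagonal loading for numerical stability.
C = (X @ X.conj().T) / n_epochs
C_real = C.real + 1e-6 * np.eye(n_chan)

# Frequency-domain unit-gain filter and source power, as in LCMV.
C_inv = np.linalg.inv(C_real)
W = np.linalg.solve(G.T @ C_inv @ G, G.T @ C_inv)
P = np.trace(W @ C_real @ W.T)
```

Because the filter is built from the real-valued CSD, its weights are real, as the Real Filter Option above requires.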

References

  • Gross, J. et al. (2001). Dynamic imaging of coherent sources: studying neural interactions in the human brain. PNAS, 98(2), 694–699.
  • van Vliet, M. et al. (2018). Analysis of functional connectivity and oscillatory power using DICS. J. Neurosci. Methods, 309, 199–212.

RAP MUSIC

Recursively Applied and Projected MUltiple SIgnal Classification (RAP-MUSIC; Mosher & Leahy, 1999) is a scanning method that localizes multiple correlated or uncorrelated dipolar sources by iteratively identifying them and projecting them out of the data.

Signal Subspace Estimation

Given the measured data matrix $F$ ($N$ channels × $T$ time samples), the signal subspace is estimated from the eigendecomposition of the data covariance:

$$F F^\top = U \Lambda U^\top$$

With the eigenvalues sorted in ascending order, the signal subspace $\Phi_s = [u_{N-n+1}, \ldots, u_N]$ is formed from the $n$ eigenvectors corresponding to the $n$ largest eigenvalues, where $n$ is the number of dipoles to search for.

Subspace Correlation Scan

For each candidate source location $i$ with lead-field $G_i$ ($N \times n_{\text{orient}}$):

  1. Compute the SVD of the lead-field: $G_i = U_G \Sigma_G V_G^\top$
  2. Compute the correlation matrix: $\mathcal{C} = U_G^\top \Phi_s$
  3. Compute the SVD of the correlation: $\mathcal{C} = U_\mathcal{C} \Sigma_\mathcal{C} V_\mathcal{C}^\top$
  4. The subspace correlation metric is the largest singular value: $\rho_i = \sigma_{\mathcal{C},1}$
  5. The optimal orientation is: $\hat{e}_i = V_G \operatorname{diag}(1/\sigma_G) \, u_{\mathcal{C},1}$, normalized to unit length

The source location with maximum $\rho_i$ is selected as the $k$-th identified source.
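Steps 1–4 above can be sketched for fixed-orientation candidates, where each lead field is a single column and the subspace correlation reduces to the largest singular value of $U_G^\top \Phi_s$ (simulated toy data; all sizes hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)
n_chan, n_grid = 32, 40

# Hypothetical fixed-orientation lead fields for the scan grid.
L = rng.standard_normal((n_chan, n_grid))

# One active source at grid index 17; rank-1 signal subspace from
# the eigendecomposition of the noisy data covariance.
true = 17
F = np.outer(L[:, true], rng.standard_normal(200))
F += 0.01 * rng.standard_normal((n_chan, 200))
lam, U = np.linalg.eigh(F @ F.T)
Phi_s = U[:, -1:]  # n = 1 largest eigenvector (eigh sorts ascending)

# Subspace correlation for each candidate location.
subcorr = np.empty(n_grid)
for i in range(n_grid):
    U_G, _, _ = np.linalg.svd(L[:, i:i + 1], full_matrices=False)
    subcorr[i] = np.linalg.svd(U_G.T @ Phi_s, compute_uv=False)[0]

best = int(np.argmax(subcorr))
```

The scan peaks at the simulated source, with a subspace correlation close to 1.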

Recursive Projection

After identifying source $k$ with lead-field column $a_k = G_{i^*} \hat{e}_{i^*}$, the source is projected out of both the lead-field and signal subspace. Let $A_k = [a_1, \ldots, a_k]$ be the matrix of all identified source fields so far, and let $U_A$ come from its SVD. The projector is:

$$P_k = I - U_A U_A^\top$$

The projected quantities for the next iteration are:

$$G^{(k+1)} = P_k G, \qquad \Phi_s^{(k+1)} = P_k \Phi_s$$

The algorithm terminates when the subspace correlation falls below a threshold (typically 0.5) or $n$ sources have been found.
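The out-projection step is a one-liner once the SVD basis of the identified fields is available; a toy check with hypothetical field patterns:

```python
import numpy as np

rng = np.random.default_rng(6)
n_chan = 16

# Field patterns of two already-identified sources (hypothetical).
a1 = rng.standard_normal(n_chan)
a2 = rng.standard_normal(n_chan)
A_k = np.column_stack([a1, a2])

# P_k = I - U_A U_A^T from the SVD of A_k: an orthogonal projector
# onto the complement of the identified topographies.
U_A, _, _ = np.linalg.svd(A_k, full_matrices=False)
P_k = np.eye(n_chan) - U_A @ U_A.T
```

Applying `P_k` annihilates the identified topographies (so they cannot be found again) while leaving orthogonal field components untouched.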

TRAP-MUSIC Variant

In Truncated RAP-MUSIC (Mäkelä et al., 2018), the signal subspace is additionally truncated at each iteration, keeping only $n - k$ columns after projection. This improves robustness when sources are highly correlated.

Dipole-Pair and N-Dipole Scanning

Mosher & Leahy (1999) already proposed that the subspace correlation scan can be performed over $n$-tuples of grid points rather than single dipoles. For a pair of candidate locations $(i,j)$, one forms the combined lead-field $G_{ij} \in \mathbb{R}^{N \times 6}$ and computes the subspace correlation metric for this joint model. This extends naturally to $n$-tuples, but the number of combinations grows as $\binom{N_{\text{grid}}}{n}$, making exhaustive search prohibitive for $n > 2$.

Powell-Accelerated Pair Scanning

To make dipole-pair scanning computationally tractable, MNE-CPP employs a Powell coordinate-descent strategy (Dinh et al., 2012): instead of evaluating all $\binom{N_{\text{grid}}}{2}$ pairs exhaustively, the algorithm alternates between fixing one dipole index and scanning the other along all grid points. This reduces the search from $O(N^2)$ to iterative $O(N)$ sweeps that converge rapidly, though not necessarily to the global maximum. The pair search is parallelized with OpenMP.

The procedure is:

  1. Start with an initial pair of grid indices
  2. Fix dipole 1 at its current index, scan all grid points for the best partner dipole 2
  3. Fix dipole 2 at its newly found index, scan all grid points for the best partner dipole 1
  4. Repeat until the maximum pair index converges (same pair found in consecutive iterations)
  5. Project out the identified dipole pair and repeat for the next pair
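The alternating sweeps can be illustrated on a precomputed, hypothetical table of pair scores standing in for the joint-model subspace correlation; the loop converges to a pair that is maximal along its row and its column, which need not be the global maximum:

```python
import numpy as np

rng = np.random.default_rng(7)
n_grid = 60

# Hypothetical symmetric pair-score table; the diagonal (a dipole
# paired with itself) is excluded from the search.
S = rng.standard_normal((n_grid, n_grid))
S = S + S.T
np.fill_diagonal(S, -np.inf)

def pair_scan(S, i0=0, j0=1, max_iter=10000):
    """Alternating coordinate sweeps over the two pair indices."""
    i, j = i0, j0
    for _ in range(max_iter):
        j_new = int(np.argmax(S[i, :]))      # best partner for fixed i
        i_new = int(np.argmax(S[:, j_new]))  # best partner for fixed j_new
        if (i_new, j_new) == (i, j):
            break                            # same pair twice: converged
        i, j = i_new, j_new
    return i, j

i, j = pair_scan(S)
```

Each sweep can only increase the current pair score, so on a finite grid with distinct scores the iteration must terminate.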

This Powell search was further combined with lead-field clustering (RTC-MUSIC; Dinh et al., 2017) to enable real-time scanning by partitioning the source space into regions with representative lead fields, dramatically reducing the number of subcorrelation evaluations.

References

  • Mosher, J.C. & Leahy, R.M. (1999). Source localization using recursively applied and projected (RAP) MUSIC. IEEE Trans. Signal Process., 47(2), 332–340.
  • Dinh, C. et al. (2012). A GPU-accelerated Performance Optimized RAP-MUSIC Algorithm for Real-Time Source Localization. Biomedizinische Technik, 57. DOI: 10.1515/bmt-2012-4260.
  • Dinh, C. et al. (2017). Real-Time Clustered Multiple Signal Classification (RTC-MUSIC). Brain Topography, 30(5). DOI: 10.1007/s10548-017-0586-7.
  • Mäkelä, N. et al. (2018). Truncated RAP-MUSIC (TRAP-MUSIC) for MEG and EEG source localization. NeuroImage, 167, 73–83.

Dipole Fitting

Sequential equivalent current dipole (ECD) fitting localizes focal brain activity by fitting one or more current dipoles to the measured field pattern at each time point. Unlike distributed methods (MNE, beamformers) that estimate activity at many locations simultaneously, dipole fitting finds, per dipole, the single position, orientation, and amplitude that best explain the data.

Forward Model

For a current dipole at position $r_d$ with moment $Q = (Q_x, Q_y, Q_z)^\top$, the predicted field at sensor $k$ is:

$$b_k = \sum_{j=1}^{3} G_{kj}(r_d) \, Q_j$$

or in matrix form: $b = G(r_d) \, Q$, where $G(r_d)$ is the $N \times 3$ forward matrix computed at the candidate position using the BEM or sphere model.

Cost Function

The fitting proceeds in whitened data space. Let $\tilde{B} = C_n^{-1/2} P_\perp B$ be the whitened, projected measurement vector and let $\tilde{G}(r_d) = C_n^{-1/2} P_\perp G(r_d)$ be the corresponding whitened forward matrix. The SVD of the whitened forward at the candidate position is:

$$\tilde{G}(r_d) = U \Sigma V^\top$$

The cost function to minimize over the dipole position $r_d$ is:

$$f(r_d) = \|\tilde{B}\|^2 - \sum_{c=1}^{n_{\text{comp}}} (u_c^\top \tilde{B})^2$$

where $n_{\text{comp}}$ is the effective number of components (typically 3 for a free dipole, reduced to 2 if the smallest singular value is less than 20% of the largest).
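A toy verification of this cost in NumPy (hypothetical whitened quantities): data generated by the dipole itself incur zero cost, and the same projections give the goodness of fit defined in the next subsection:

```python
import numpy as np

rng = np.random.default_rng(8)
n_chan = 20

# Hypothetical whitened forward matrix at one candidate position.
G = rng.standard_normal((n_chan, 3))
U, s, Vt = np.linalg.svd(G, full_matrices=False)

# Effective number of components: 3, or 2 if nearly rank-deficient.
n_comp = 3 if s[2] >= 0.2 * s[0] else 2

def cost(B):
    """Residual energy of B outside the dipole field space."""
    proj = U[:, :n_comp].T @ B
    return B @ B - proj @ proj

def gof(B):
    """Goodness of fit in percent: explained energy over total energy."""
    return 100.0 * (1.0 - cost(B) / (B @ B))

# Noise-free data from this dipole are explained perfectly.
B_fit = G @ np.array([1.0, -2.0, 0.5])
```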

Goodness of Fit

The relative quality of a dipole fit is measured by the goodness-of-fit (GOF):

$$\text{GOF} = \frac{\sum_{c=1}^{n_{\text{comp}}} (u_c^\top \tilde{B})^2}{\|\tilde{B}\|^2} \times 100\%$$

A GOF of 100% means the dipole model explains the data perfectly; lower values indicate contributions from other sources, distributed activity, or noise.

Dipole Moment Estimation

Once the optimal position $r_d^*$ is found, the dipole moment is computed from the SVD of the whitened forward:

$$Q = \sum_{c=1}^{n_{\text{comp}}} \frac{u_c^\top \tilde{B}}{\sigma_c} \, s_c \, v_c$$

where $s_c$ are column normalization scale factors and $\sigma_c$ are the singular values.
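With unit column scales ($s_c = 1$), this moment formula is the SVD pseudoinverse applied to the whitened data; a toy round trip with a hypothetical forward and noise-free data recovers the simulated moment:

```python
import numpy as np

rng = np.random.default_rng(9)
n_chan = 20

# Hypothetical whitened forward at the fitted position.
G = rng.standard_normal((n_chan, 3))
U, sig, Vt = np.linalg.svd(G, full_matrices=False)

# Noise-free data from a known moment.
Q_true = np.array([2.0, 0.5, -1.0])
B = G @ Q_true

# Q = sum_c (u_c^T B / sigma_c) v_c, with scale factors s_c = 1.
Q = sum((U[:, c] @ B) / sig[c] * Vt[c] for c in range(3))
```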

Optimization Algorithm

To avoid local minima, the optimization begins with a grid search over precomputed guess points — a set of candidate dipole positions distributed within the conductor model (typically on concentric spheres at 1–2 cm spacing inside the inner skull). The forward fields for all guess points are precomputed, and the best-fitting initial positions are identified by evaluating the projection of the data onto each guess point's forward field.
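The guess-point evaluation amounts to projecting the data onto each precomputed forward's column space and keeping the largest explained energy; a toy sketch with a hypothetical guess grid:

```python
import numpy as np

rng = np.random.default_rng(10)
n_chan, n_guess = 20, 100

# Hypothetical precomputed whitened forwards, one per guess point.
forwards = rng.standard_normal((n_guess, n_chan, 3))

# Simulated data generated at guess point 42.
B = forwards[42] @ np.array([1.0, 1.0, 0.0])

def explained(G_g, B):
    """Energy of B explained by the dipole field space at one guess."""
    U, _, _ = np.linalg.svd(G_g, full_matrices=False)
    p = U.T @ B
    return p @ p

scores = np.array([explained(G_g, B) for G_g in forwards])
best_guess = int(np.argmax(scores))
```

The best-scoring guess point then seeds the non-linear refinement described next.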

Non-Linear Optimization

Starting from the best guess point(s), the position is refined by non-linear optimization:

  • MNE-CPP uses a Nelder-Mead simplex algorithm (ported from MNE-C): the initial simplex is a regular tetrahedron of ~1 cm edge length centered on the guess point, with convergence tolerance of 0.2 mm.
  • MNE-Python uses COBYLA (Constrained Optimization BY Linear Approximation) with the constraint that the dipole must remain at least min_dist (default 5 mm) inside the inner skull surface.

Two-Pass Strategy (MNE-CPP)

The MNE-CPP implementation optionally uses a two-pass approach: the first pass uses a sphere model for speed, and the second pass refines the position using the full BEM model starting from the sphere-model result.

Confidence Regions

Confidence limits on the fitted dipole position can be estimated from the Hessian of the cost function at the solution. The Jacobian matrix $J$ contains the partial derivatives of the predicted field with respect to all six dipole parameters ($r_x, r_y, r_z, Q_x, Q_y, Q_z$). The parameter covariance matrix is:

$$\mathcal{C} = (J^\top J)^{-1}$$

The 95% confidence volume is:

$$V_{95\%} = \frac{4\pi}{3} \sqrt{c^3 \, \lambda_1 \lambda_2 \lambda_3}$$

where $\lambda_1, \lambda_2, \lambda_3$ are the eigenvalues of the $3 \times 3$ position submatrix of $\mathcal{C}$ and $c = 7.81$ is the critical value from the $\chi^2$ distribution with 3 degrees of freedom at the 95% confidence level.
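Given a Jacobian (here random and purely hypothetical; a real implementation would obtain $J$ by numerical differentiation of the forward model at the fitted solution), the confidence-volume computation is a few lines:

```python
import numpy as np

rng = np.random.default_rng(11)
n_chan = 30

# Hypothetical Jacobian of the whitened predicted field with respect
# to the six dipole parameters (r_x, r_y, r_z, Q_x, Q_y, Q_z).
J = rng.standard_normal((n_chan, 6))

# Parameter covariance and its 3x3 position block.
C = np.linalg.inv(J.T @ J)
C_pos = C[:3, :3]

# 95% confidence volume from the position eigenvalues (chi^2, 3 dof).
lam = np.linalg.eigvalsh(C_pos)
c = 7.81
V95 = (4.0 * np.pi / 3.0) * np.sqrt(c**3 * np.prod(lam))
```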

References

  • Sarvas, J. (1987). Basic mathematical and electromagnetic concepts of the biomagnetic inverse problem. Phys. Med. Biol., 32(1), 11–22.
  • Hämäläinen, M. et al. (1993). Magnetoencephalography — theory, instrumentation, and applications to noninvasive studies of the working human brain. Rev. Mod. Phys., 65(2), 413–497.

Summary of Inverse Methods

The following table summarizes all inverse methods available in MNE-CPP:

| Method | Type | Domain | Sources | Key Strength |
|---|---|---|---|---|
| MNE | Distributed | Time | All cortical | Mathematically well-defined, full current distribution |
| dSPM | Distributed | Time | All cortical | Noise normalization removes depth bias |
| sLORETA | Distributed | Time | All cortical | Zero localization error for point sources |
| eLORETA | Distributed | Time | All cortical | Exact zero localization bias, iterative weights |
| CMNE | Distributed + LSTM | Time | All cortical | Spatiotemporal context improves spatial fidelity |
| MxNE | Sparse (L21) | Time | Few active | Group-lasso sparsity with temporal smoothness |
| Gamma-MAP | Sparse (Bayesian) | Time | Few active | Automatic relevance determination, data-driven pruning |
| LCMV | Beamformer | Time | Scanning | Adaptive filter, good for focal sources |
| DICS | Beamformer | Frequency | Scanning | Frequency-specific source localization |
| RAP MUSIC | Subspace scanning | Time | Multiple focal | Localizes multiple correlated sources |
| Dipole Fit | Parametric | Time | 1 dipole/timepoint | Precise localization for truly focal activity |

All methods share common preprocessing: data whitening with the noise covariance matrix $C_n$, application of SSP projectors, and (for MEG) software gradient compensation. The choice of method depends on the expected source configuration and the scientific question.