Primer

Introduction: What EEG Measures, What It Can Do, and What It Cannot Do

A realistic guide that refuses to turn EEG into magical mind reading

Mind Uploading Research Project

Public Page Updated: 2026-04-04 · Human-first (updated with the recording-frame contract front-door sync)

How to use this page

Read this first to avoid getting lost

This page is an introduction that sorts out what EEG measures, what it is good at, and what it is bad at. It is designed to avoid the misreading that EEG can simply read the mind, while explaining where EEG fits inside Mind-Upload.

  • EEG is strong at tracking temporal change, but weak at identifying spatial origin with high precision.
  • The questions 'there is information in the signal,' 'the internal state is uniquely fixed,' and 'the result survives long-term use' require different audits.
  • Even when the score is the same, you still have to separate target-neural-variable signal from eye movement, EMG, uninstructed movement, feedback, subject / session fingerprint, and acquisition-distribution shortcuts such as site / device / reference / electrode layout.
  • Even with high-density EEG, source-imaging improvements stay source-class- and benchmark-conditioned: individual MRI/FEM geometry, visibility limits, conductivity calibration, and named validation class all matter.
  • Recent deep-source gains in epilepsy-style settings are conditional improvements, not a general license to read arbitrary deep or basal generators from scalp EEG.
  • Before any inverse solver runs, scalp visibility already depends on source extent, orientation, cortical folding, cancellation, and CSF-aware head modeling.
  • For ESI, simulation / phantom, intracranial stimulation, simultaneous SEEG, and postsurgical outcome are different validation classes, not one generic 'validated' box.
  • For ESI, a single best-looking inverse map is not accepted as stable evidence unless cross-solver / cross-parameter spread or posterior / ensemble width is also shown.
  • Debates about electrode count must not collapse channel count, coverage, local super-Nyquist effects, and WBE sufficiency into one meaning.
  • Preprocessing and QC are not cosmetic cleanup; they are acceptance conditions that determine which claims are allowed.
  • A `harmonized EEG` branch is not one object: common-channel reduction, interpolation to a target montage, and REST-based transformation preserve different measurement objects and claim ceilings.
  • Artifact removal and line-noise suppression do not by themselves solve source leakage or make directed connectivity causal.
  • Reference system, electrode layout, amplifier / filter chain, and recording site are part of the measurement condition, not background noise.
  • A high within-session score is not read here as cross-day robustness or deployability.
  • For cross-day or long-term claims, the Temporal Validity Card audits fixed decoder interval, state annotation, and recalibration burden separately.
  • For phase-targeted EEG loops, low latency or mean phase error is not enough; oscillation estimability, causal-versus-post-hoc targeting benchmark, downstream effect, and phase stability remain separate evidence objects.
  • For results that output probabilities or prediction sets, the Calibration & Abstention Card fixes fit/calibration/test separation, evaluation family, and coverage-risk.
  • Large-scale pretraining is promising, but comparisons that hide the pretraining corpus, overlap audit, channel mapping, benchmark object, benchmark version, or adaptation regime are not accepted here.
  • For foundation-model claims, benchmark object, independent prediction unit, grouped hold-out unit, and inference-stage budget remain separate disclosure fields.
  • A model that accepts arbitrary electrode layouts is still not a shared physiological coordinate system; electrode-coordinate route, reference family, and omitted-channel policy remain part of the claim.
  • For large pretrained EEG models, official benchmark operations and postmortems are treated as evidence-bearing conditions, so leaderboard rank is not read apart from split construction, inference-stage budget, or later organizer corrections.
  • Metric semantics are part of the claim: seizure tasks need sensitivity / false alarms / overlap / latency, and sleep staging needs macro-F1 / Cohen's kappa / per-stage agreement rather than one headline accuracy.
  • Recent sleep-staging benchmark papers also show that out-of-domain generalization and demographic / clinical bias can remain even when global accuracy improves.
  • Multimodal is not one thing; simultaneous acquisition, geometric fusion, invasive calibration, and atlas priors are read separately.
  • Same-session multimodal EEG acquisition is not self-validating; synchronized streams, shared cross-modal components, and externally calibrated biological variables are separate achievements.
  • Wearable OPM-MEG is not a portable naturalistic free pass; shielding class, field nulling, calibration / coregistration, anatomy route, crosstalk, and task regime still shape what 'movement tolerant' means.
  • A shared EEG-fMRI or EEG-PET-MRI factor can still mix neural, autonomic, and vascular contributions rather than identifying one target state.
  • A multimodal gain can still depend on complete-case subsets, missing-modality policy, and cross-centre transfer, so it is not automatic bundle-robust evidence.
  • Mind-Upload treats EEG not as a device that reads everything, but as a modality that provides macroscopic constraints.
Best for
Readers who want to understand EEG's strengths and limits first, and avoid inflated expectations
Reading time
12-18 minutes
Accuracy note
EEG is an important measurement modality, but by itself it does not provide all the information WBE would require. Read it by separating what EEG can do from what it cannot do.

Relatively clear at this stage

What we know now

  • EEG is strong for observing temporal change and forms a foundation for state estimation and event detection.
  • Because it is measured on the scalp, it has clear limits for spatial localization and deep-structure inference.
  • Field formation itself is selective: depth, orientation, cortical folding, source extent, cancellation, and tissue conductivities all change what reaches the scalp with usable SNR.
  • Under strict validation conditions, some studies have reconstructed signals near specific deep sources, but that is not the same as general unique recovery.
  • Recent deep-source gains are conditional: depth-weighted or conductivity-calibrated pipelines improve some epilepsy-style cases, but the gain still depends on source class, depth, SNR, and benchmark family.
  • For ESI, simulation / phantom, stimulation ground truth, simultaneous invasive recording, and postsurgical outcome answer different error questions and should not be collapsed.
  • For ESI, inverse-method, software-package, parameter, and conductivity choices can materially shift the reconstructed source, so solver-disagreement itself is part of the result.
  • Reference choice, electrode positions, device-side filtering, artifact handling, and retention rate can all change conclusions about ERP, connectivity, and decoding.
  • Common-channel intersection, interpolation to a target montage, and REST-based transformation are different recording-frame branches rather than one raw-equivalent preprocessing label.
  • Even the same decoding score means different things under within-session, cross-session, cross-subject, and closed-loop evaluation.
  • For phase-targeted loops, target-band power / SNR gate, no-stim rate, causal-versus-post-hoc benchmark, downstream comparator, and preferred-phase stability are separate audits from mean phase error or loop speed.
  • A high EEG score may reflect EOG / EMG / movement / feedback routes, subject / session fingerprint, or acquisition-distribution shortcuts, not only the target signal.
  • Foundation / self-supervised EEG models improve representation learning, but without a Pretraining Card they still remain qualified transfer evidence rather than portable robustness or source-identifiable recovery.
  • For foundation / self-supervised EEG models, benchmark provenance includes benchmark object / supervision unit, extra-data disclosure, pretrained checkpoint identity, fine-tuning regime, split randomness, hidden grouping, inference-stage restrictions, and later postmortems that can change leaderboard interpretation.
  • Layout-agnostic or heterogeneous-device support is a recording-frame advance, but it still does not prove reference-invariant or anatomy-invariant physiology across montages.
  • For foundation-model results, trial-level, epoch-level, and subject-level outputs are different objects, and the grouped hold-out unit still has to be named separately.
  • Rare-event and class-imbalanced EEG tasks cannot be read safely from accuracy or AUROC alone; the metric family itself changes what the score means.
  • For seizure tasks, event sensitivity, false alarms, overlap logic, latency, and threshold policy are separate performance objects, while sleep staging also needs Cohen's kappa, macro-F1, per-stage performance, and bias / subgroup slices.
  • Source-space connectivity and directed connectivity require even stronger assumptions and validation than source localization itself.
  • wPLI reduces some zero-lag mixing, but source leakage, ghost interactions, and observational-causality limits remain separate audit items.
  • Cross-site or cross-device scores are not safely interpretable unless harmonization logs and hold-out design are explicit.
  • Same-session multimodal acquisition narrows one mismatch, but still does not validate the fusion model, the shared factor, or the target biological variable by itself.
  • For wearable OPM-MEG, movement tolerance is still route-conditioned: shielding class, field control, calibration / coregistration, anatomy route, crosstalk, and task regime remain part of the claim.
  • Low-frequency multimodal agreement can still be driven partly by arousal, autonomic physiology, vascular transfer, or other shared nuisance structure.
  • A multimodal bundle can improve prediction while still remaining complete-case-limited, transfer-sensitive, or disagreement-prone in hard subgroups.

Still unresolved beyond this point

What we still do not know

  • Whether non-invasive EEG alone can reconstruct internal state at a level sufficient for WBE remains unresolved.
  • How far preprocessing differences change conclusions depends on the task and dataset.
  • Which source-imaging setup gives enough deep coverage across broad task families remains unresolved.
  • How far conditional deep-source gains survive realistic noise, basal-source regimes, and non-epilepsy tasks remains unresolved.
  • How far wearable OPM-MEG can move from shielded proof-of-concept and narrow task regimes to broader naturalistic source reconstruction remains unresolved.
  • The best tradeoff between artifact suppression and signal preservation also depends on the task and metric.
  • How far same-day separation ability generalizes to later days or long-term operation remains unresolved.
  • How far heterogeneous-layout support survives reference-family shifts, coordinate-route mismatch, and label-limited clinical adaptation remains unresolved.

Learn the basics

Check the basics in the wiki

TL;DR

EEG does not directly show the inside of the brain. It measures a mixed electrical pattern at the scalp. That gives EEG strong temporal resolution, but weak and blurry spatial localization. For that reason, Mind-Upload prioritizes data quality control (QC) and shared data-organization rules such as BIDS before anything else.

Misreadings worth stopping early

  • EEG is not magical mind reading: the observed signal is mixed, and the inverse problem has hard limits.
  • But EEG is not useless: it is strong for temporal change and state transition tracking, so it is useful as part of a verification stack.
  • Logs matter: without QC, synchronization, and preprocessing records, no one else can check the same result.
  • Multimodal is not an escape hatch: same-session fusion, shared factors, and robust bundle evidence still need separate audits.
When you want the terminology organized as a flow

If words such as EEG, QC, BIDS, inverse problem, ESI, and DCM keep appearing separately and feel disconnected, read Wiki: Guide to terms from measurement to modeling first. It makes it easier to see which stage of the workflow each term belongs to.

When you are unsure where to go next after this EEG introduction

If you understand EEG's limits and now want to decide whether to look at public datasets, go directly to the L0 practice section in Datasets, or read Verification first, see Wiki: Guide to the public pages.

When you want to continue only through practical pages after EEG

If you want to decide only among Verification, Datasets, the L0 practice section in Datasets, and the casework section in Verification, see Wiki: Guide to the practical pages.

When you want a single path from EEG to L0

If you want one straight route from this EEG introduction through dataset selection, required artifacts, and final L0 confirmation, see Wiki: One straight route from EEG to L0.

When the discussion suddenly becomes an estimation problem

Once ESI or DCM appears, the topic has moved from direct observation into inference under assumptions. If you want that transition organized together with the forward problem, inverse problem, and causal equivalence classes, read Wiki: From observation to estimation first.

When you get stuck on uncertainty

Confidence intervals, credible intervals, estimate width, and abstention under low confidence are all devices for avoiding overstatement. If you want a basic reading guide for that, see Wiki: Uncertainty, calibration, and abstention.
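
As a concrete illustration of abstention, here is a minimal selective-prediction sketch: a decoder that outputs class probabilities abstains whenever its top probability falls below a threshold, and is then summarized by coverage and selective risk instead of one accuracy number. The threshold and the synthetic data are illustrative, not a recommended operating point.

```python
import numpy as np

def selective_report(probs, labels, tau=0.8):
    """Abstain when max predicted probability < tau; report the pair
    (coverage, selective risk) instead of a single accuracy."""
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    answered = conf >= tau                      # abstain on the rest
    coverage = answered.mean()
    if not answered.any():
        return coverage, float("nan")           # abstained on everything
    risk = (pred[answered] != labels[answered]).mean()
    return coverage, risk

# Sweeping tau traces a coverage-risk curve; the fit/calibration/test split
# that selects tau must still be disclosed separately, as the card requires.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(3), size=200)     # toy 3-class probabilities
labels = rng.integers(0, 3, size=200)
for tau in (0.5, 0.7, 0.9):
    print(tau, selective_report(probs, labels, tau))
```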

When you want to know what was actually held out in a high-scoring result

The meaning of an EEG decoding number changes depending on whether it is within-session, cross-session, or cross-subject. Same-day separation, cross-day durability, generalization to other people, and long-term operational stability are different questions. For that practical distinction, see Datasets: Generalization families and Wiki: State, trait, and drift.

When the same task label still hides a different measurement condition

This site previously stopped more clearly at subject / session fingerprint than at setup effects. That stance remained too weak for EEG. The official EEG-BIDS specification already separates electrodes, channels, coordinate system, and reference scheme instead of treating them as cosmetic metadata. Hu et al. (2018) showed that reference montage and electrode setup alter the measured scalp potential itself, Melnik et al. (2017) showed that system, subject, and session each influence EEG recordings, Xu et al. (2020) showed that cross-dataset deep-learning results move with environmental variability such as amplifier, cap, sampling rate, and filtering, and Dong et al. (2024) showed that cross-location comparison required an explicit REST-based offline transform rather than a vague statement that the datasets had simply been harmonized. Therefore, on this site, site / device / reference system / electrode layout / coordinate route / protocol are treated as part of the measurement condition. The inference drawn from these sources is that common-channel intersection, interpolation to a target montage, and REST-based transformation preserve different benchmark objects and different claim ceilings, so `harmonized EEG` is not read here as one raw-equivalent object. A score that does not report the original channel map, coordinate route, raw reference plus rereference family, omitted/interpolated-channel policy, and harmonization branch is not read as clean evidence that only the target neural variable changed.
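
As a minimal illustration of why the harmonization branch must be logged, the sketch below records which recording-frame branch produced the comparable datasets instead of collapsing them into one `harmonized EEG` label. Channel names are standard 10-20 labels; everything else is hypothetical.

```python
# Hypothetical two-site comparison: log WHICH recording-frame branch was used.
site_a = {"Fz", "Cz", "Pz", "Oz", "C3", "C4", "F3", "F4"}
site_b = {"Fz", "Cz", "Pz", "C3", "C4", "P3", "P4"}

harmonization_log = {
    # one of: common-channel-intersection / interpolation-to-target-montage / REST-transform
    "branch": "common-channel-intersection",
    "kept_channels": sorted(site_a & site_b),
    "dropped_site_a": sorted(site_a - site_b),
    "dropped_site_b": sorted(site_b - site_a),
    "raw_reference": {"site_a": "Cz", "site_b": "linked mastoids"},
    "rereference_family": "common average after channel reduction",
    "coordinate_route": "template 10-20 positions, no per-subject digitization",
}
print(harmonization_log)
```

Each branch preserves a different measurement object, so a downstream score should cite this log rather than the bare word harmonized.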

When a large pretrained EEG model looks like a general decoder

This site does not read a foundation-model headline through score alone. Kostas et al. (2021) already framed transfer across differing hardware, subjects, and tasks as the core challenge, Jiang et al. (2024) explicitly listed electrode mismatch, unequal sample length, varied task design, and low SNR as still-open EEG barriers, Han et al. (2025), Chen et al. (2025), and El Ouahidi et al. (2025) then pushed toward channel-permutation equivariance, heterogeneous-electrode adaptation, and setup-agnostic pretraining, while Lee et al. (2025) still found only marginal gains over conventional deep baselines under fine-tuning. Ma et al. (2026) further showed that EEG foundation models can generalize poorly when subject-level supervision is limited unless extra adaptation structure is added, Xiong et al. (2025) argued that inconsistent evaluation protocols make cross-model comparison unreliable, Lahiri et al. (2026) showed that narrow-source and diverse-source pretraining can trade places across linear-probe and fine-tuning regimes, and Liu et al. (2026) showed that linear probing is often insufficient and specialist-from-scratch models remain competitive. Therefore, on this site, `supports arbitrary layouts` is read only as recording-frame compatibility under a declared coordinate route, reference family, and omitted-channel policy, not as proof that different montages already share one physiology-preserving representation. Large-model EEG results are read through the Wiki: Pretraining Card and Verification: Pretraining Card. If the downstream claim is about a neural variable rather than just benchmark transfer, they must also pass the Specificity & Shortcut Card, because `works across setups` is still not the same as `stopped reading identity / setup shortcuts`.

Foundation-model benchmarks are not all scoring the same object

The official EEG Challenge 2025 homepage makes the split explicit: Challenge 1 predicts response time from CCD trials, while Challenge 2 predicts externalizing scores from EEG across multiple paradigms. The official rules add that Challenge 1 is scored per trial, submissions are inference-only, and models must run on a single GPU with 20 GB memory. Lee et al. (2025) spans memory tasks and sleep stage classification, Liu et al. (2026) contrasts leave-one-subject-out cross-subject evaluation with within-subject few-shot calibration, and Lahiri et al. (2026) shows that benchmark inconsistencies can reverse rankings on identical datasets by up to 24 percentage points. Therefore, on this site, a foundation-model result must state not only the benchmark name, but also the predicted object, the independent prediction unit, the grouped hold-out unit, the adaptation regime, and the operations budget. Without that distinction, "transfers across tasks" remains too coarse and too easy to overread as one general decoder.

Benchmark operations are part of the result, not administrative detail

The official EEG Challenge 2025 homepage states that the challenge paper is already out of date relative to execution-phase changes. The official submission page fixes the event as an inference-only code competition, while the rules page requires disclosure of extra pretraining datasets, pretrained checkpoints, fine-tuning method, and a single-GPU 20 GB inference-stage budget. The final leaderboard then disclosed that Challenge 2 samples had not been randomized, which allowed some teams to exploit contiguous-trial same-subject structure and led organizers to split the awards. On this site, that means benchmark provenance is part of the evidence: split construction, hidden grouping, checkpoint policy, normalization, inference budget, and later postmortems all change what a leaderboard score is allowed to mean.

Metric semantics are part of the EEG claim, not a post-processing detail

This page had already separated split design, benchmark provenance, and calibration, but it still left one weak shortcut too implicit: what score is allowed to summarize the task. The primary literature does not support that shortcut. Saito & Rehmsmeier (2015) showed that precision-recall views are often more informative than ROC summaries under strong class imbalance. In seizure tasks, Roy et al. (2021), Scheuer et al. (2021), and Segal et al. (2023) show that practical value turns on sensitivity, false alarms, event-overlap rules, latency, and sometimes threshold / calibration control rather than one headline number. In sleep staging, Sun et al. (2017), Vallat & Walker (2021), and Dei Rossi et al. (2026) show that Cohen's kappa, macro-F1, per-stage performance, and bias / out-of-domain slices answer different questions. Therefore, on this site, accuracy or AUROC alone are not accepted as a generic EEG headline when class balance, event cost, or stage structure matters.

Common EEG task family | Minimum metric bundle at this entry level | Overreading stopped here
Cue-locked classification / state decoding | Balanced accuracy or macro-F1, confusion matrix, subject-wise aggregation, and calibration / abstention if probabilities are output. | One accuracy number is enough to claim robust neural decoding.
Seizure detection / forecasting | Event sensitivity or recall, false alarms per hour or day, event-overlap / onset rule, latency, and threshold or calibration policy. | AUROC or accuracy alone means clinically usable alarm behavior.
Sleep staging | Cohen's kappa, macro-F1, per-stage recall / F1 (especially N1), confusion matrix, and subgroup slices if a deployment or clinical claim is made. | Pooled accuracy means staging is equally reliable across sleep stages and patient groups.
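
The sketch below shows, with toy numbers and scikit-learn's standard metric functions, how the sleep-staging and seizure bundles differ from a single headline accuracy. The event-overlap rule here is a deliberately simple any-overlap check; real benchmarks must name their own overlap and latency logic.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, confusion_matrix, f1_score

rng = np.random.default_rng(0)

# --- Sleep-staging bundle: per-epoch labels, 5 stages, one toy night ---
y_true = rng.integers(0, 5, size=960)                # 960 thirty-second epochs
y_pred = np.where(rng.random(960) < 0.8, y_true,     # ~80% raw agreement
                  rng.integers(0, 5, size=960))
print("kappa       :", round(cohen_kappa_score(y_true, y_pred), 2))
print("macro-F1    :", round(f1_score(y_true, y_pred, average="macro"), 2))
print("per-stage F1:", f1_score(y_true, y_pred, average=None).round(2))
print(confusion_matrix(y_true, y_pred))

# --- Seizure bundle: event-level scoring, not sample-level accuracy ---
events = [(3600, 3660), (7200, 7290)]                # true seizure windows (s)
alarms = [3610, 5000, 7250, 9000]                    # detector alarm times (s)
sensitivity = sum(any(s <= t <= e for t in alarms) for s, e in events) / len(events)
false_alarms = sum(not any(s <= t <= e for s, e in events) for t in alarms)
print("event sensitivity:", sensitivity)
print("false alarms / h :", round(false_alarms / 3.0, 2))  # 3 h toy record
```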

What EEG is, very roughly

EEG records voltage differences using electrodes placed on the scalp. What is observed is the result of many neural activities mixed together and filtered through tissue, skull, and scalp. EEG can measure rapidly, at millisecond scale, but the signal is spatially blurred and therefore easy to overinterpret if the limits are ignored.

Where EEG is strong, and where it is weak

What EEG is strong at

  • Tracking temporal change such as state transitions, phase, and short responses
  • Connecting naturally to closed-loop control
  • Relatively inexpensive and easier to scale
  • Suitable for repeated measurement and longitudinal tracking

What EEG is weak at

  • Stating exactly where activity occurred, especially for deep sources
  • Reading “the content of thought” directly, which is the classic overclaim
  • Highly vulnerable to noise and artifacts such as blinks, muscle activity, line noise, and movement
  • Reference choice and preprocessing can materially change the result
Important

When QC logs and metadata are present, EEG can track state change at millisecond scale and is practically valuable. When the recording conditions are missing, even a plausible-looking model loses reproducibility.

Question | What EEG alone can support relatively well | What needs extra modalities or conditions
How brain state is changing now | Temporal change in wakefulness, sleep, anesthesia, or task response is relatively accessible. | Strong claims about which deep structure or cell population caused that pattern need extra structural or multimodal information.
Where in the brain activity occurred | Broad scalp pattern and rough distribution can be observed. | Strong anatomical or deep-source claims need support from MRI, MEG, invasive recording, or similar evidence.
The content of thought itself | Task-constrained classification and state estimation are possible targets. | Claims about reading unconstrained thought mix easily with model completion and need counterfactual testing.
Whether the information is sufficient for WBE | EEG can provide macroscopic constraints and a scaffold for state tracking. | Detailed reconstruction at synapse or local-circuit scale is impossible from EEG alone and presupposes multimodal integration.
Typical ways “the same EEG” yields different results

Changing the reference scheme changes the apparent scalp distribution. Changing electrode positions or channel layout changes which field patterns are even sampled. Changing filter settings or the device-side chain changes which slow or fast components remain. Tightening artifact removal may also remove neural signal you wanted to preserve. That is why results cannot be compared unless the acquisition condition and processing path are recorded together with the result.
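
The reference point can be illustrated in a few lines: re-expressing the same recording against a common average or against one channel changes every apparent "scalp value", so topography claims are only comparable when the reference scheme is logged. The data here are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.standard_normal((8, 1000))          # 8 channels vs. the amplifier reference

v_avg = v - v.mean(axis=0, keepdims=True)   # common average reference
v_ch1 = v - v[1]                            # re-referenced to channel index 1

t = 500                                     # same instant, three "topographies"
print("raw     :", v[:, t].round(2))
print("avg-ref :", v_avg[:, t].round(2))
print("ch1-ref :", v_ch1[:, t].round(2))
```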

Information-theoretic limits of EEG and what they imply for WBE

To place EEG inside the context of WBE, it is useful to distinguish between constraints that technology may improve and limits that follow more fundamentally from physics and mathematics.

The gap in spatial resolution

  • What scalp EEG directly gives you: a mixed voltage distribution on the scalp. Even when source estimation is used, effective resolution depends strongly on electrode density, SNR, head-model quality, and electrode-coordinate error.
  • What high-density EEG improves: constraints on broad activity near the cortical surface become better, but deep sources and fine-grained local circuits still cannot be uniquely reconstructed.
  • The gap to WBE: if WBE required preservation of cellular, synaptic, and local-circuit dynamics, scalp EEG would still be orders of magnitude too coarse. More sensors alone do not close that gap.

Bandwidth constraints

  • Sensor count is not the same as independent information: even at 256 channels, volume conduction and spatial correlation reduce the independent degrees of freedom far below raw channel count.
  • EEG observes a low-dimensional projection: it reflects large-scale brain dynamics, but it does not carry the full detail of microscopic circuit structure.
  • So its role is necessarily limited: EEG is useful for state estimation, transition detection, closed-loop control, and constraining other modalities, but not for reconstructing the entire circuit by itself.
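
The degrees-of-freedom point can be made concrete with a toy mixing model: when a few sources are volume-conducted into many channels, the effective dimensionality of the channel covariance (summarized here by the participation ratio) tracks the source count, not the sensor count. All numbers are illustrative, and the random mixing matrix stands in for a physical lead field.

```python
import numpy as np

rng = np.random.default_rng(0)
n_channels, n_sources, n_samples = 64, 8, 20_000

sources = rng.standard_normal((n_sources, n_samples))
mixing = rng.standard_normal((n_channels, n_sources))    # volume-conduction stand-in
eeg = mixing @ sources + 0.1 * rng.standard_normal((n_channels, n_samples))

lam = np.linalg.eigvalsh(np.cov(eeg))                    # channel-covariance spectrum
participation_ratio = lam.sum() ** 2 / (lam ** 2).sum()  # effective dimensionality
print(f"{n_channels} channels, ~{participation_ratio:.1f} effective dimensions")
```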

Fundamental physical and mathematical constraints

  • Field formation and volume conduction: from intracranial current source to scalp, skull, cerebrospinal fluid, and scalp, the signal is not only low-pass filtered but also reshaped by source extent, orientation, cortical folding, and cancellation. This is a physical limitation, not something removed by better hardware alone.
  • Ill-posed inverse problem: multiple different internal source configurations can generate the same scalp potential pattern. In principle, scalp EEG cannot uniquely determine the source configuration (Hämäläinen & Ilmoniemi, 1994). Prior information such as structural MRI, fMRI, or anatomical constraints is required.
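
The non-uniqueness is easy to demonstrate numerically: any lead field with fewer sensors than sources has a null space, so two very different source vectors can produce identical scalp data. The toy lead field below is random rather than physical, which is enough to show the mathematical point.

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
L = rng.standard_normal((32, 500))        # toy lead field: scalp = L @ sources

s1 = rng.standard_normal(500)             # one source configuration
s2 = s1 + 10.0 * null_space(L)[:, 0]      # add a scalp-invisible direction

print("source-space difference:", round(float(np.linalg.norm(s1 - s2)), 2))  # large
print("scalp-space difference :", float(np.linalg.norm(L @ s1 - L @ s2)))    # ~ 0
```
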
Before the inverse problem, there is already a field-formation visibility wall

Another weak reading to stop early is to think that all important neural events are already present at the scalp and that source imaging only has to recover them algorithmically. The primary literature does not support that shortcut. Ahlfors et al. (2010) quantified source-orientation sensitivity with realistic boundaries and found that the median ratio between the least and most sensitive orientations was 0.63 for EEG but only 0.06 for MEG. Ahlfors et al. (2010) also showed that extended and distributed sources can cancel substantially at the surface. Goldenholz et al. (2009) then showed that source extent and anatomy matter strongly for detectability: in mesial temporal cortex, simulated source patches of about 3 cm² versus 8 cm² differed by roughly 10 dB in SNR, and the modality preference changed across cortex rather than staying fixed. Piastra et al. (2021) further showed with detailed FEM models that ignoring the CSF compartment overestimates EEG SNR and that cortical / subcortical sensitivity depends jointly on depth and orientation. Therefore, what reaches the scalp is already a geometry- and tissue-filtered subset of brain activity. More channels or a more sophisticated inverse solver cannot recreate source detail that field formation never delivered to the sensors.

Conclusion: EEG alone cannot provide enough information for WBE

The role of EEG is to provide macroscopic constraints on brain dynamics. Concretely, that includes:

  • Global-state calibration such as wakefulness, sleep, or anesthesia
  • Consciousness monitoring such as PCI-style measures
  • Longitudinal stability tracking, separating what changes from what remains stable over hours or days

If the target is WBE, EEG belongs as one macroscopic modality that complements invasive recordings, electron microscopy, molecular profiling, and other richer measurements.

Do not underrate it either: under strict conditions, limited deep inference exists

It would also be inaccurate to say “deep sources are completely invisible.” Studies combining high-density EEG (256 channels), individual MRI, realistic head models, and simultaneous intracranial recording have reported significant correspondence between scalp-EEG source images and signals near the thalamus or nucleus accumbens (Seeber et al., 2019). Afnan et al. (2024) further showed that depth-weighted cMEM improved localization of deep generators, especially mesial temporal sources, across 5400 realistic simulations and epilepsy patient data. But in direct validation using intracranial electrical stimulation as ground truth, even 37-electrode scalp EEG still showed mean localization errors of 10.3-26 mm depending on source depth and skull-conductivity optimization (Unnwongse et al., 2023). The right summary is therefore not “impossible,” but rather “limited inference is possible under tightly fixed conditions, and the error depends strongly on source class, source depth, SNR, and conductivity assumptions.”

As of 2026-03, what should count as “improvement”?

Source-imaging improvement is not established merely by increasing channel count. At minimum, one should show a forward model built from individual MRI or equivalent geometry, a visibility argument that the targeted source class is expected to reach the scalp given its depth / orientation / extent / cancellation profile, a posterior distribution or confidence interval, sensitivity analysis of the forward and tissue model, and validation against a named validation class such as simulation / phantom, intracranial stimulation, simultaneous invasive recording, or postsurgical outcome. Improvement means not “we can now see deep structures,” but “a third party can audit under which conditions the error was reduced, by how much, for which source class, and on which benchmark family.”

Validation class | What it actually checks | Safe reading on this site | What you still cannot say
Simulation / phantom | Solver behavior, geometry sensitivity, and pipeline regression under known generative conditions. | Useful for checking whether error grows gracefully when geometry, coverage, conductivity, or noise changes. | It is not living-brain ground truth and does not by itself validate spontaneous or clinical source recovery.
Intracranial stimulation ground truth | Localization error against a known stimulation site and time under fixed geometry; see Mikulan et al. (2020) and Unnwongse et al. (2023). | Useful for asking how error moves with conductivity assumptions, electrode count, and source depth. | It does not by itself validate spontaneous pathological dynamics or universal deep-source recovery.
Simultaneous invasive recording | Concordance with concurrent SEEG/ECoG in a specific cohort and event regime; Hao et al. (2025) is one example. | Useful for auditing where scalp-to-source inference still tracks invasive activity under the same event. | It does not make the inverse problem unique and does not automatically generalize beyond the cohort, montage, or pathology.
Postsurgical outcome / clinical concordance | Whether ESI points into clinically relevant tissue or resection zones; Birot et al. (2014) is an example. | Useful as decision-support evidence in presurgical workflows. | It is not a direct ground-truth measurement of the true source location or full internal state.
What you should not claim from starter public datasets

This kind of validation requires individual MRI, electrode coordinates, and external ground truth. For that reason, starter datasets such as EEG Motor Movement/Imagery, CHB-MIT, Sleep-EDF, and TUH EEG are useful for L0-L1 practice and benchmarking, but not as direct source-imaging benchmarks. For the use-case split, see Datasets: Starter-dataset audit.

2026-03 literature audit: EEG claims are cut by four gates

The key weakness we needed to tighten was the temptation to read EEG progress from a single score or a single source map. When you line up the primary literature, at least observability, specificity, identifiability, and deployability are separate audits. The facts that some information is present in the scalp signal, that the information comes from the target neural variable, that the internal source or network has been narrowed toward a unique solution, and that performance survives across days or in closed loop are not supported by the same evidence.

Gate | Question asked here | Minimum evidence wanted | What this gate still cannot claim
Observability | Is information about the target state or task condition actually present in scalp EEG? | Synchronized event logs, an appropriate hold-out, simultaneous recording or an external reference, and decoding / tracking with artifact audit. | It still cannot claim source uniqueness, general deep-structure recovery, causal network structure, or long-term operational viability.
Specificity | Is the score reading the target neural variable, or is it reading shortcuts such as eye movement, EMG, uninstructed movement, feedback, or subject fingerprint? | EOG / EMG / behavior logs, nuisance-only baselines, countermeasures such as fixed gaze or feedback-off, and slice-wise hold-outs. | You still cannot write "the score is high, therefore it is a neural readout" or "the class appeared, therefore the internal state was understood."
Identifiability | How narrowly and uniquely can internal sources or networks be constrained from the observation? | Individual MRI, measured electrode coordinates, the forward model, conductivity assumptions, uncertainty display, and external validation via phantom, intracranial stimulation, or simultaneous invasive recording. | You still cannot write "the visible source is the true unique solution" or "the network diagram is the circuit diagram."
Deployability | How well does performance survive across days, reattachment, long-term drift, and closed-loop use? | Cross-session / cross-subject evaluation, a fixed decoder interval, recalibration burden, disclosed latency, jitter, and abstention behavior, plus phase-targeting logs when phase-locked control is claimed. | You still cannot rephrase a high same-day score as real-world stability or WBE-relevant sustained constraint.
Site rule from this 4-way split

Results that pass observability are described only as "the signal contains information". Only after specificity evidence is added do we write that "the information is relatively specific to the target neural variable". Only after identifiability evidence is added do we write that "a source / network was estimated". Only after deployability evidence is added do we write that "the method may survive across days, long-term use, or closed loop." This is how the boundaries drawn by Hämäläinen & Ilmoniemi (1994), Mostert et al. (2018), Musall et al. (2019), Ma et al. (2022), and Wilson et al. (2025) are turned into operational reading rules on this site.

2026-03-18 addendum: "something was visible" is not the same as "the target was visible"

Mostert et al. (2018) showed that visual-working-memory decoding can retain an eye-movement confound, Muthukumaraswamy (2013) summarized how high-frequency power overlaps with muscle artifact, McFarland et al. (2005) showed that EMG can boost early BCI-session performance, and Chen et al. (2024) showed that post-onset auditory feedback can inflate offline speech-decoding scores. In addition, Chaibub Neto et al. (2019) showed that diagnostic models can learn identity-confounding signals when repeated measures are not participant-disjoint, while Wang et al. (2020) and Di et al. (2021) showed that resting-state EEG can support time-robust person identification. For that reason, decode / biomarker results on this site are paired with the Verification: Specificity & Shortcut Card so the target path and shortcut paths are audited separately.

The same decoding score can support very different claims

One of the most common EEG misreadings is to take same-day / same-subject separation and silently rephrase it as stable across days, generalized to other people, or deployable long term. The MOABB documentation itself separates `WithinSessionEvaluation`, `CrossSessionEvaluation`, and `CrossSubjectEvaluation` because they are different scientific questions.

Evaluation family | What it mainly tests | What it still cannot say
Within-session | How well task and class separate on the same day and setup. | It does not establish cross-day stability, tolerance to reattachment, or long-term drift robustness.
Cross-session | How well features survive across days for the same person, despite state variation and setup change. | It does not establish subject-independent generalization or the absence of recalibration requirements.
Cross-subject | Whether some structure generalizes across people and can support a cold-start route. | It does not establish the same strength of claim as a decoder adapted to one individual, nor a personal long-term controller.
Longitudinal / closed loop | How many days a fixed decoder survives, plus recalibration burden, latency, silence/abstention behavior, and phase-targeting logs when relevant. | Offline accuracy alone is not enough to call it a deployable loop.
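
In code, the difference between these families is mostly a difference in the grouped hold-out unit. A minimal sketch with scikit-learn follows; the data are synthetic, and MOABB wraps the same logic in its own evaluation classes.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, StratifiedKFold

rng = np.random.default_rng(0)
X = rng.standard_normal((600, 16))               # 600 trials x 16 features
y = rng.integers(0, 2, 600)
subject = rng.integers(0, 10, 600)               # 10 subjects...
session = subject * 2 + rng.integers(0, 2, 600)  # ...with 2 sessions each

splits = [
    ("within-session", StratifiedKFold(5, shuffle=True, random_state=0), None),
    ("cross-session ", GroupKFold(5), session),  # hold out whole sessions
    ("cross-subject ", GroupKFold(5), subject),  # hold out whole subjects
]
for name, cv, groups in splits:
    train, test = next(iter(cv.split(X, y, groups)))
    held = "mixed trials" if groups is None else np.unique(groups[test])
    print(name, "held out:", held)
```
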
What the primary literature actually shows

In the five-day MI dataset from Ma et al. (2022), subject-specific mean accuracy fell from 68.8% within-session to 53.7% cross-session, then rose to 78.9% with adaptation using a small amount of target-session data. Same-day score and cross-day durability are therefore not the same achievement. Musall et al. (2019) further showed that neural dynamics during task can be strongly shaped by uninstructed movement, and Wilson et al. (2025) showed that long-term BCI operation requires frequent recalibration because activity patterns themselves drift over time.

A safe reading rule for this page

When you see a high EEG score, check at least (1) what was held out, (2) how eye movement, muscle activity, and uninstructed movement were audited, (3) whether target-day or target-subject data were used, (4) how many days a fixed decoder was held, and (5) how the recording frame was aligned across layouts or sites, if setup diversity is part of the claim. If these are missing, this site reads the result only as a qualified decode, not as WBE-relevant state reconstruction or a deployable loop.

The analysis flow assumed by Mind-Upload

01

Measurement and QC

Record impedance, noise floor, and synchronization behavior such as delay, jitter, and drift in log form so later analysis can use them.

02

Preprocessing

Apply filters, reference schemes, artifact handling for blinks, muscle activity, and line noise, plus any cross-setup harmonization, while recording the coordinate route, reference family, omitted/interpolated-channel policy, and whether comparison used common-channel reduction, interpolation, REST, or another explicit transform.

03

Features and metric bundle

Compute spectra, phase synchrony, complexity measures, state transitions, and similar descriptors, then choose a task-matched metric bundle instead of one headline score.

04

Source estimation (if needed)

Use head models or MRI to infer cortical activity, but treat uncertainty as part of the result because this is an inverse problem.

05

Modeling and verification

Do not stop at correlation-based decoding; test intervention prediction and counterfactual behavior under changed stimulus or task conditions.

Minimum QC: without it, comparison breaks

Mind-Upload treats the presence of a QC log as the minimum floor for joining a benchmark; a machine-readable sketch of such a log follows the checklist below.

Minimum QC

  • Channel quality: missing channels, saturation, impedance, suspected bridging
  • Noise: line frequency, hardware noise bands, and estimated noise floor
  • Artifacts: blinks, eye movements, muscle activity, movement, cardiac contamination
  • Synchronization: delay, jitter, and drift, ideally measured end-to-end
  • Preprocessing log: filter settings, reference scheme, rejection method, threshold, and excluded intervals
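
A minimal machine-readable shape for that floor might look like the sketch below. The field names mirror this checklist rather than any official schema, and every value is illustrative.

```python
# Minimal QC-log sketch: field names mirror the checklist above, values illustrative.
qc_log = {
    "channels": {"missing": ["T7"], "saturated": [], "max_impedance_kohm": 18,
                 "suspected_bridging": []},
    "noise": {"line_hz": 50, "hardware_bands_hz": [[78, 82]],
              "noise_floor_uv_rms": 0.9},
    "artifacts": {"blinks": "ICA components 0, 3 removed",
                  "emg_contaminated_s": 41.5, "cardiac": "none detected"},
    "sync": {"mean_delay_ms": 12.3, "jitter_sd_ms": 1.1, "drift_ppm": 4.0,
             "method": "end-to-end photodiode check"},
    "preprocessing": {"filter": "0.1-40 Hz zero-phase FIR",
                      "reference": "raw Cz -> common average",
                      "rejection": "epochs with |amplitude| > 150 uV",
                      "excluded_intervals_s": [[301.2, 309.8]]},
}
```
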
Synchronization and events are also part of QC

Even a clean-looking waveform becomes unreliable if stimulus or response timing is ambiguous. If you want the minimum measurement log, including event markers, stimulus logs, and bad-segment records, see Wiki: Basics of event synchronization and measurement logs.

Closed-loop use adds a different kind of difficulty

EEG's temporal resolution makes it attractive for closed loop, but then end-to-end latency, jitter, and safe-stop logic matter. For phase-targeted loops, the site now also separates oscillation estimability, targeting accuracy, downstream effect, and phase stability instead of summarizing the result by one timing number. For the difference between that and offline accuracy, see Wiki: Closed loop, latency, jitter, and safe stops.

Phase-targeted EEG loops need four separate logs at the entry page

This was the remaining front-door weakness in the EEG introduction. The deeper wiki already fixed the rule, but the entry page still let readers compress phase-targeted EEG into generic real-time or low-latency language. The primary literature does not support that shortcut. Mansouri et al. (2018) and Zrenner et al. (2018) established that real-time phase-triggering is feasible, but Zrenner et al. (2020) showed that meaningful phase estimation itself degrades when oscillatory amplitude and signal-to-noise ratio are low. Gordon et al. (2021) then improved prefrontal theta targeting only after adding low-theta and phase-reset exclusions together with a post-hoc benchmark. Kim et al. (2023) showed across 11 public datasets and 484 participants that better prediction is largely a power / SNR problem rather than a generic cognitive-state problem. Vigué-Guix et al. (2022) achieved reliable trial-to-trial alpha phase locking without a consistent behavioral benefit, and Hougland et al. (2025) showed that the optimal sensorimotor mu-phase fluctuates within-session and has low test-retest reliability. Therefore, on this site, a phase-targeted EEG result is not summarized by mean phase error alone.

  • Estimability: target band, channel or spatial filter, power / SNR gate, phase-reset rejection, and no-stim rate.
  • Targeting accuracy: causal estimator family, post-hoc benchmark, circular targeting metric, and off-target or random-phase comparator.
  • Downstream effect: physiological or behavioral comparator rather than phase locking alone.
  • Phase stability: whether the preferred phase was fixed or adaptively updated, plus within-session drift and cross-session reliability.

If those logs are missing, this site keeps the result at exploratory state-dependent timing evidence rather than validated phase-specific control. The longer operational rule is in Wiki: phase-targeting wall and Verification: Mind Uploading Verification Commons. A minimal sketch of the estimability gate follows below.
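
The estimability log in particular is easy to operationalize. The sketch below gates phase-locked stimulation on a band-versus-residual SNR check; note that `filtfilt` is acausal, so this is fine for illustrating the gate but not for a real loop, which is exactly why the causal-versus-post-hoc benchmark is a separate log.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def gated_phase(x, fs, band=(8, 13), snr_gate_db=3.0):
    """Return (phase at last sample, stimulate?) with a band-vs-residual SNR gate.
    filtfilt is acausal: a deployed loop needs a causal / forecasting estimator."""
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    xb = filtfilt(b, a, x)                               # target-band component
    snr_db = 10 * np.log10(np.var(xb) / (np.var(x - xb) + 1e-12))
    return np.angle(hilbert(xb)[-1]), snr_db >= snr_gate_db

fs = 250
t = np.arange(0, 2, 1 / fs)
rng = np.random.default_rng(0)
alpha_rich = np.sin(2 * np.pi * 10 * t) + 0.2 * rng.standard_normal(t.size)
noise_only = rng.standard_normal(t.size)

for name, sig in [("alpha-rich", alpha_rich), ("noise-only", noise_only)]:
    phase, go = gated_phase(sig, fs)
    print(f"{name}: phase={phase:+.2f} rad, stimulate={go}")  # no-stim on weak alpha
```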

2026-03 literature audit: preprocessing is not cosmetic, it is an acceptance gate

EEG preprocessing is not the stage where you merely make the later figures look better. It is the auditing step where you decide which parts of the signal are treated as neural and which claims must still be stopped. COBIDAS-MEEG and EEG-BIDS define a concrete reporting floor. The PREP pipeline shows that bad-channel handling and rereferencing cannot be separated cleanly, Widmann and colleagues show that filter design can distort waveform shape and latency, and recent decoding work shows that artifact correction does not always improve performance. For that reason, this site treats preprocessing as an acceptance gate, not as preference tuning.

Gate | How to read it as of 2026-03 | Claim stopped at this page
Metadata / reporting gate | Reference, ground, filters, bad channels, electrode coordinates, and event timing must be recorded before comparison is meaningful. | Writing only “clean EEG is available, so the result is reproducible.”
Reference gate | Reference choice is not a tiny implementation detail; it changes the meaning of sensor-space patterns and connectivity measures. | Drawing strong conclusions from topography or network measures while hiding the rereference scheme.
Filter gate | Not only cutoff but order, type, and causal versus acausal design can move onset and waveform shape. | Making strong claims about ERP latency or high-frequency increase while omitting filter design.
Artifact gate | ICA, ICLabel, Autoreject, and PREP are strong candidates, but maximizing accuracy is not the same as maximizing neural specificity. | Treating the pipeline with the highest decoding score as automatically the best pipeline.
Retention / high-frequency gate | EMG overlaps with high beta and gamma, while aggressive cleaning can remove neural signal too. Retention rate and raw-clean deltas should be reported numerically. | Reading gamma power as neural gain without an EMG audit.
Site rule derived from this audit

At minimum, record the raw reference, the rereference scheme, filters, bad channels and bad segments, the number of removed components or epochs, the number of retained trials and channels, major raw-clean metric deltas, and, if possible, a sensitivity analysis against one alternative pipeline. The practical details are centralized in Wiki: EEG preprocessing and QC.

Connectivity metrics are not leak-proof or causal just by name

Artifact-cleaning and line-noise tools such as ASR and ZapLine help with reproducible cleanup, but they do not solve volume conduction, source leakage, or directional identifiability. Vinck et al. (2011) made wPLI safer than coherence or PLV against some zero-lag mixing, yet Haufe et al. (2013) showed that sensor-space connectivity remains severely limited by volume conduction, Palva et al. (2018) showed that even leakage-insensitive source-space measures can yield ghost interactions, and Ye et al. (2020) evaluated STE under TMS precisely because causality is difficult to identify from observations alone. Therefore, on this site, wPLI, source-space connectivity, and STE are read as auditable estimators with explicit ceilings, not as leak-proof or causal readouts by naming convention.
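
The reason wPLI tolerates some zero-lag mixing is visible directly in its definition, wPLI = |E[Im Sxy]| / E[|Im Sxy|]: purely instantaneous (real-valued) cross-spectral terms contribute nothing to the numerator. A self-contained toy comparison with synthetic signals and simple segment averaging:

```python
import numpy as np

def wpli(x, y, fs, n_seg=40):
    """wPLI = |E[Im Sxy]| / E[|Im Sxy|], averaged over signal segments."""
    seg = len(x) // n_seg
    win = np.hanning(seg)
    im = np.array([np.imag(np.fft.rfft(x[k*seg:(k+1)*seg] * win)
                           * np.conj(np.fft.rfft(y[k*seg:(k+1)*seg] * win)))
                   for k in range(n_seg)])
    return np.abs(im.mean(0)) / (np.abs(im).mean(0) + 1e-12), np.fft.rfftfreq(seg, 1/fs)

fs, n = 250, 250 * 40
t = np.arange(n) / fs
rng = np.random.default_rng(0)
source = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal(n)
lag = int(0.01 * fs)                                   # true 10 ms lag

pairs = {
    "true 10 ms lag": (source + 0.5 * rng.standard_normal(n),
                       np.roll(source, lag) + 0.5 * rng.standard_normal(n)),
    "zero-lag mixing": (source + 0.5 * rng.standard_normal(n),
                        0.8 * source + 0.5 * rng.standard_normal(n)),
}
for name, (a, b) in pairs.items():
    w, f = wpli(a, b, fs)
    print(name, "-> wPLI @ 10 Hz:", round(float(w[np.argmin(np.abs(f - 10))]), 2))
```

Even so, a high wPLI between source-space estimates still inherits leakage from the inverse step, which is why the audit items above remain separate.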

Where EEG can contribute to WBE in a realistic way

Realistic contributions

  • L0: build a reproducible pipeline on public data, with standardization and auditability
  • L1: estimate states such as wakefulness, sleep, anesthesia, or task condition robustly
  • L2: move toward models that can predict temporal development under changed stimuli or tasks
  • Longitudinal work: separate stable features from changing features across hours and days, as groundwork for later identity-related questions

Common misunderstandings to stop here

A

“EEG can read thoughts directly”

In practice, the output can depend on noise, individual differences, training-data bias, and, in EEG-to-text systems, the prior distribution of the language model used for decoding. You still need counterfactual tests to separate what is genuinely EEG-derived.

B

“Preprocessing is a minor detail, so it can wait”

Results change with preprocessing, reference choice, and recording-frame harmonization branch. That is why Mind-Upload fixes QC, metadata, and reproduction procedures first.

C

“The accuracy is high, so it should work on other days and other people too”

Within-session, cross-session, cross-subject, and closed-loop are different questions. Even with the same score, different hold-out conditions and recalibration burden support different-strength claims.

What must now be stated rigorously about ESI

The key point in this section is not which algorithm name was used, but what chain of evidence was used to audit error. EEG source imaging is easy to overrate if future promise is confused with the minimum evidence required today.

Five floors of evidence you should disclose

  • Visibility floor: before inverse choice, disclose why the targeted generator class should reach the scalp at all, naming source depth, orientation, extent, cancellation risk, and whether CSF-aware head modeling was used. A better solver cannot recover a source that field formation filtered below usable SNR.
  • Reporting floor: under COBIDAS-MEEG and EEG-BIDS, publish the reference scheme, SamplingFrequency, SoftwareFilters, bad segments, preprocessing conditions, electrode coordinates, and coordinate system. BIDS is necessary but is not itself ground truth. A sidecar sketch follows this list.
  • Geometry floor: if making an anatomical source claim, disclose individual MRI, measured electrode positions, the forward model, and assumptions about skull conductivity. Template MRI can substitute, but geometry and conductivity mismatch still widen error (Birot et al., 2014; Aydin et al., 2019).
  • Validation floor: report localization error or concordance against a named validation class such as simulation / phantom, intracranial stimulation, simultaneous SEEG/HD-EEG, or a clinical cohort with postsurgical outcome. A point estimate alone is not enough, and “externally validated” is too vague.
  • Solver-stability floor: if standard inverse families or reasonable parameter windows give materially different source locations, report the spread. A single best-looking map is not enough by itself.
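
As referenced in the reporting floor above, a minimal *_eeg.json sidecar might look like the sketch below. Field names follow the EEG-BIDS convention; the values and the SoftwareFilters layout are illustrative, and the current specification should be consulted for which fields are required versus recommended.

```python
# Minimal *_eeg.json sidecar sketch in the EEG-BIDS spirit; values illustrative.
eeg_sidecar = {
    "TaskName": "rest",
    "SamplingFrequency": 1000,
    "PowerLineFrequency": 50,
    "EEGReference": "Cz",                      # raw reference, not rereference family
    "EEGGround": "AFz",
    "SoftwareFilters": {"HighPass": {"cutoff_Hz": 0.1},
                        "LowPass": {"cutoff_Hz": 320}},
    "EEGChannelCount": 64,
}
# electrodes.tsv and coordsystem.json then carry measured positions and the
# coordinate system that the geometry floor treats as part of the claim.
```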

Separate what is required from what is promising

  • Required: a described forward model, electrode coregistration, preprocessing logs, error metrics, published failure conditions, and either posterior / ensemble width or cross-solver / cross-parameter sensitivity.
  • Promising candidates: Bayesian uncertainty maps, Champagne-family methods, and Bayesian skull-conductivity modeling are all strong options, but as of 2026-03 none can yet be called the single standard answer.
  • What should not be frozen into a recipe: Michel & Brunet (2019) explicitly warn against relying blindly on automatic artifact removal and recommend visual inspection alongside it. Therefore ASR and ZapLine-plus are better treated as candidates to be audited with sensitivity analysis, not as universal mandatory steps.
2026-03-19 addendum: a single best inverse map is not enough

The remaining weak point was to let one clean source map stand in for the whole solution set. Mahjoory et al. (2017) showed that inverse-method and software-package choice can materially change EEG source localization and especially downstream connectivity, and explicitly recommended verifying results with more than one source-imaging procedure. Mikulan et al. (2020) then used known intracranial stimulation sites and showed that, when all tested parameter combinations were considered, localization distances in the benchmark commonly ranged from about 2 mm to 50 mm rather than collapsing to one stable answer. Vorwerk et al. (2024) further showed that tissue-conductivity uncertainty, especially skull conductivity, can shift reconstructed depth and location. On this site, a single best-looking inverse map is therefore not read as stable anatomical evidence unless the paper also shows cross-solver / cross-parameter spread or a posterior / ensemble uncertainty width.

2026-03-27 addendum: conditional deep-source gains are not a general deep-coverage license

The next shortcut to block is to compress every recent deep-source paper into one impression that “deep recovery is now basically solved if the pipeline is careful enough.” The primary literature does not support that reading. Afnan et al. (2024) improved deep-generator localization, especially mesial temporal sources, across 5400 realistic simulations and epilepsy patient data, but that paper is still a conditional deep-epilepsy route rather than a generic proof of whole-brain deep recoverability. Hao et al. (2025) then showed in 29 simultaneous HD-EEG/SEEG cases that ictal ESI remained at 14.07 ± 4.62 mm versus 17.38 ± 4.16 mm for interictal ESI, with source depth and spike power still materially affecting accuracy. Vorwerk et al. (2026) further showed that individualized skull-conductivity estimation can reduce source-localization uncertainty from a few centimeters to less than one centimeter for most simulated sources, while explicitly finding that sources at the base of the brain remain less improved and that point estimates still hide residual uncertainty. On this site, those papers are therefore read as source-class- and benchmark-conditioned error reduction, not as generic proof that scalp EEG now observes arbitrary deep generators.

The upper bound shown by current measured evidence

  • Field formation already removes information: orientation, folding, source extent, and cancellation decide which generators reach the scalp with usable SNR before inversion even starts.
  • Deep detection works only under explicit conditions: Seeber et al. (2019) used 256-channel scalp EEG with simultaneous DBS recording and found correspondence near thalamic and nucleus-accumbens regions. That does not mean deep sources are uniquely recoverable in general.
  • Stimulation-ground-truth error still spreads with assumptions: Unnwongse et al. (2023) used intracranial stimulation as ground truth and showed mean localization errors ranging from 10.3 to 26.0 mm depending on source depth and skull-conductivity setting.
  • Concurrent invasive reference still leaves sizable error: Hao et al. (2025) reported mean localization errors of 14.07 ± 4.62 mm for ictal ESI and 17.38 ± 4.16 mm for interictal ESI in simultaneous HD-EEG/SEEG, with depth and source power strongly affecting performance.
  • Conductivity calibration helps but does not erase the wall: Vorwerk et al. (2026) reduced localization uncertainty for many simulated sources, yet basal sources remained less improved and residual uncertainty still needed explicit treatment.
  • Validation class is part of the claim: stimulation ground truth, simultaneous invasive recording, and postsurgical outcome do not answer the same error question. On this site, their benchmark class must be named before the claim ceiling is raised.
  • Even in clinical use, overconfidence is unsafe: In the E-PILEPSY systematic review by Mouthaan et al. (2019), source imaging had summary sensitivity of 82% and specificity of 53%, but all included studies still carried some form of bias. Even in a relatively structured presurgical setting, source imaging remains decision support, not ground truth.
Conclusion of this page

If you want to claim improvement in ESI, the first question is not “which method did you use?” but on which benchmark, for which error, and by how much was that error reduced? Claims about source imaging should not be closed inside public starter datasets alone. They should be read together with Datasets: The source-imaging validation ladder.

Multimodal integration is not a shortcut; it is another audit problem

It is not enough to say that EEG alone is insufficient and therefore adding other modalities automatically solves the problem. Integration is useful, but it also adds each modality's own error sources.

Roles and constraints by integration type

  • EEG + fMRI: the complementarity across time and space is attractive, but simultaneous recording adds artifacts and safety constraints that depend on field strength. As Jorge et al. (2015) showed, at 7T even cable length and placement can materially change noise, and the hemodynamic side still needs vascular-state / CVR interpretation rather than “just add spatial resolution.”
  • EEG + MEG: the sensitivity profiles are complementary, and Aydin et al. (2014) showed that combined EEG/MEG with realistic head models and conductivity calibration can improve reliability. But that advantage depends on the quality of skull-conductivity calibration and coregistration.
  • EEG + PET + MRI: simultaneous acquisition can compare electrophysiology, hemodynamics, and metabolism under one window, but PET quantification, vascular interpretation, shared-factor specificity, and bundle robustness still remain explicit audit items.
  • OPM-MEG: movement-tolerant MEG is a real advance, but shielding class, field nulling, sensor calibration / coregistration, anatomy route, crosstalk, and task regime still determine what the data mean.
2026-04-01 addendum: wearable OPM-MEG is not one portable route

This page still left one multimodal shortcut too open. The current primary literature does not support reading wearable OPM-MEG as if "movement-tolerant" automatically meant shield-light, calibration-light, MRI-free, and naturalistic source-valid measurement at once. Boto et al. (2018) established wearable feasibility but also exposed sensor saturation under head movement without background-field control. Rea et al. (2021) and Mellor et al. (2022) show that precision field modeling and nulling remain core engineering requirements, Holmes et al. (2025) show that lighter shielding still depends on active compensation plus tSSS rather than ordinary-room portability, Rhodes et al. (2025) show that pseudo-MRI can be useful for group studies while still treating individual MRI as the gold standard, Wu et al. (2025) show that array crosstalk remains a practical design limit, and Spedden et al. (2025) show whole-body stepping feasibility in only three healthy participants under a narrow sensorimotor beta paradigm. Therefore, on this site wearable OPM-MEG is read as movement-tolerant macro electrophysiology under disclosed shielding, field control, calibration / coregistration, anatomy route, crosstalk, and task regime, not as portable unconstrained brain readout or as a generic upgrade in state observability.

Same-session multimodal EEG still needs a Fusion Card

The older short rule on this page was still too weak because it let simultaneous or multimodal sound closer to self-validating than the primary literature supports. Kothe et al. (2025) describe LSL as synchronization infrastructure rather than device-side delay truth, Wei et al. (2020) treat EEG-fMRI fusion as a model-conditioned inference problem, Vafaii et al. (2024) show that simultaneous multimodal recordings retain both common and divergent organization, and Chen et al. (2025) show coupled global dynamics together with modality- and network-specific structure in simultaneous EEG-PET-MRI across wakefulness and NREM sleep. Therefore, even same-session multimodal EEG results still need a Fusion Card before they are read above the strongest unimodal or prior-conditioned ceiling.

A shared multimodal factor can still be mixed physiology

Even when a paper reports a strong common component, that component is not yet automatically the target neural variable. Gold et al. (2024) show that fMRI-autonomic covariance grows as vigilance decreases in simultaneous EEG-fMRI-autonomic recordings, Özbay et al. (2019) show that sympathetic activity contributes to the fMRI signal during EEG-marked arousal events, and Epp et al. (2025) show that significant BOLD changes can oppose oxygen-metabolism changes across about 40% of task-responsive cortical voxels. On this site, a shared EEG-fMRI or EEG-PET-MRI factor therefore has to be labeled as a shared neural candidate, a physiology-linked global factor, or mixed / unresolved rather than being promoted directly to one solved state variable.

More modalities do not make the bundle robust by default

Bundle-level performance gains are real, but they still need their own stop lines. Rohaut et al. (2024) show that multimodal assessment can improve neuroprognosis while reducing uncertain cases in severe brain injury. But Amiri et al. (2023) treated direct same-sample multimodal prediction as a separate analysis inside an acute DoC cohort, and Manasova et al. (2026) report cross-centre evaluation together with missing-value substitution and higher inter-modality disagreement in clinically hard groups. Therefore, this site does not read "more modalities improved performance" as if the bundle were already availability-agnostic, transfer-stable, or coherent in the hardest regimes.

Implementation rule

The value of integration is not that “more modalities were used.” The value lies in recording coordinate-alignment error, delay mismatch, noise-structure differences, shared-vs-specific component logic, availability / complete-case slice, and uncertainty propagation separately, then showing which specific errors were reduced. Even after integration, each modality's raw data and metadata should still remain individually auditable. If the bundle also mixes living-human proxy classes such as PET, MRSI, or support-state routes, this page also routes it to the Human Proxy Composition Card rather than letting the word multimodal carry that burden implicitly.

When you want only the entry-level picture first

If you want a basic explanation of why EEG, MEG, fMRI, ECoG, and MRI are combined at all, read Wiki: Basics of multimodal integration first, then come back to this section.