Bottom line in one sentence
EEG foundation models are an important advance for representation learning and low-label downstream tasks. However, that advance can be read correctly only after separating what data the model was pretrained on, how formats were harmonized, and how far adaptation went downstream. A large model name alone determines neither the strength of generalization nor which claims still have to be stopped.
This page does not cover philosophy or legal institutions. It covers only how to read EEG foundation / self-supervised models from technical and natural-science evidence.
The previous version of this site had already strengthened its treatment of QC, splits, multimodality, and drift, but it was still missing how to read foundation models themselves. Without that layer, recent large-scale pretraining can still be misread too quickly as "dataset shift is solved," "a general decoder exists," or "we are closer to WBE." This page therefore separates what the primary literature actually advances from what it still leaves unresolved.
The sources on this page mix peer-reviewed journal / accepted conference papers, accepted posters / workshops, official challenge websites / rules, arXiv preprints, and under-review manuscripts. These are not evidence of the same strength. For example, the official EEG Foundation Challenge site states in its 2025-11-17 update that the proposal preprint does not reflect changes made during the execution phase and that the current website and starter kit should be used instead. The final leaderboard then disclosed that Challenge 2 had not randomized samples, which allowed teams to exploit the fact that contiguous trials likely came from the same subjects. Accordingly, this page does not place model-capability comparisons, benchmark-governance warnings, and moving-target competition rules into the same single frontier ranking.
This was the next weak point on this page. El Ouahidi et al. (2025) is important because it explicitly targets arbitrary length and electrode arrangement with pretraining on more than 60,000 hours from 92 datasets. But that is still not the same as proving that the learned representation stopped reading subject identity, reference / device / protocol structure, or other recording-distribution cues. Lahiri et al. (2026) then showed that narrow-source versus diverse-source pretraining can trade places depending on whether the downstream regime is linear-probe or fine-tuning, while Liu et al. (2026) showed across 12 open-source foundation models and 13 datasets that linear probing is often insufficient, specialist models trained from scratch remain competitive, and larger models do not automatically generalize better. Those benchmark-side warnings line up with the shortcut literature already used elsewhere on this site: Chaibub Neto et al. (2019), Xu et al. (2020), and Di et al. (2021) show why identity confounding, acquisition variability, and time-robust fingerprints must be audited separately from headline transfer. Therefore, on this site, setup-agnostic pretraining is not read as shortcut-resistant neural representation unless the downstream claim also passes the Specificity & Shortcut Card.
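As a concrete form of that audit, the sketch below trains a simple probe to predict subject identity from frozen embeddings. It is a minimal sketch, assuming `embeddings` and `subject_ids` as illustrative stand-ins rather than data or code from any cited paper; accuracy far above chance means the representation still carries the shortcut this paragraph warns about.

```python
# A minimal identity-leakage probe, assuming `embeddings` is an (n_windows, d)
# array of frozen foundation-model features and `subject_ids` labels each window.
# Both arrays are illustrative stand-ins, not data from any cited paper.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
n_windows, dim, n_subjects = 600, 32, 20
embeddings = rng.normal(size=(n_windows, dim))           # stand-in features
subject_ids = rng.integers(0, n_subjects, size=n_windows)

probe = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
acc = cross_val_score(probe, embeddings, subject_ids, cv=cv).mean()

chance = 1.0 / n_subjects
print(f"subject-ID probe accuracy {acc:.3f} vs chance {chance:.3f}")
# Accuracy far above chance means the representation still encodes subject
# identity, so any downstream transfer score needs the shortcut audit.
```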
This was the next remaining compression on this page. Han et al. (2025) targeted channel-permutation equivariance so arbitrary electrode configurations could be handled more robustly, Chen et al. (2025) introduced a coordinate-based spatial embedding for more than 150 electrode layouts, and El Ouahidi et al. (2025) pushed further toward any-setup pretraining. Those are real advances in recording-frame compatibility. But they still do not prove that different montages, coordinate routes, and reference families have become one shared physiology-preserving coordinate system. Ma et al. (2026) then showed that even strong EEG foundation models can generalize poorly when subject-level supervision is limited unless extra adaptation structure is added, while Lahiri et al. (2026) showed that split construction, checkpoint selection, segment length, and normalization can still dominate comparison. Therefore, on this site, layout support, reference-family robustness, coordinate-route disclosure, and label-limited adaptation burden remain separate fields rather than being collapsed into one word such as generalization.
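One way to keep those fields from collapsing into the single word generalization is to record them separately. The sketch below is a minimal disclosure record in this page's own vocabulary; the field names and example values are illustrative assumptions, not any model's published schema.

```python
# A minimal disclosure record keeping the fields this page refuses to collapse.
# Field names are this page's vocabulary, not an external or published schema.
from dataclasses import dataclass

@dataclass
class SetupCompatibilityCard:
    layouts_supported: int                # number of electrode layouts handled
    coordinate_route: str                 # "template" | "subject-digitized" | "learned"
    reference_family: str                 # e.g. "common-average", "linked-mastoid"
    omitted_channel_policy: str           # "drop" | "interpolate" | "mask-token"
    label_budget_for_adaptation: int      # labeled target examples used downstream
    physiology_equivalence_shown: bool = False  # stays False unless audited separately

card = SetupCompatibilityCard(
    layouts_supported=150,
    coordinate_route="template",
    reference_family="common-average",
    omitted_channel_policy="mask-token",
    label_budget_for_adaptation=0,
)
# `physiology_equivalence_shown` defaults to False: layout support alone
# never flips it on this site.
```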
This was the next remaining weakness on this page. The official EEG Challenge homepage states that Challenge 1 predicts response time from CCD trials, whereas Challenge 2 predicts externalizing scores from EEG across multiple paradigms. The official rules then add that Challenge 1 is scored per trial, submissions are inference-only, and models must run on a single GPU with 20 GB memory. Lee et al. (2025) fine-tuned large brainwave foundation models across memory tasks and sleep stage classification, Liu et al. (2026) explicitly compared leave-one-subject-out cross-subject evaluation with within-subject few-shot calibration, and Lahiri et al. (2026) showed that six benchmark inconsistencies can reverse rankings on identical datasets by up to 24 percentage points. Therefore, on this site, benchmark name, predicted object, independent prediction unit, grouped hold-out unit, and operations budget are separate disclosure fields rather than one merged "benchmark provenance" box.
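A minimal sketch of the grouped hold-out that separation implies is shown below: trials remain the scoring unit while subjects stay disjoint across folds. It uses scikit-learn's GroupKFold; the arrays are synthetic stand-ins, not challenge data.

```python
# Trials are the scoring unit, but subjects stay disjoint across folds.
# `subject_of_trial` is an illustrative array, not challenge data.
import numpy as np
from sklearn.model_selection import GroupKFold

n_trials = 100
X = np.random.randn(n_trials, 16)                # stand-in trial features
y = np.random.randn(n_trials)                    # stand-in response times
subject_of_trial = np.repeat(np.arange(10), 10)  # 10 trials per subject

gkf = GroupKFold(n_splits=5)
for fold, (train_idx, test_idx) in enumerate(gkf.split(X, y, groups=subject_of_trial)):
    train_subj = set(subject_of_trial[train_idx])
    test_subj = set(subject_of_trial[test_idx])
    assert train_subj.isdisjoint(test_subj)  # no subject appears on both sides
    print(f"fold {fold}: {len(test_subj)} held-out subjects, {len(test_idx)} trials")
```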
2026-03-30 correction: benchmark object still needs an explicit matrix
The older wording on this page already required benchmark-object disclosure, but it still left one practical shortcut open: a reader could talk as if benchmark object, independent prediction unit, hold-out unit, and challenge operations budget were all fixed by the benchmark name alone. The current primary and official sources do not support that shortcut. On this site, those fields now have to be read separately.
| Case | What is predicted | What unit must still be named separately | Safe ceiling on this site |
|---|---|---|---|
| EEG Challenge 1 official homepage + rules | Trial-level response-time regression from the CCD task. | The trial is the scoring unit, but grouped subject structure and the inference-only single-GPU 20 GB operations budget still have to be disclosed separately. | A named cross-task-transfer benchmark under a fixed operations budget, not a general decoder verdict. |
| EEG Challenge 2 official homepage + leaderboard | Subject-level externalizing-factor prediction from EEG across multiple paradigms (see the aggregation sketch after this table). | The subject is the natural independent unit, and the leaderboard postmortem shows that hidden contiguous-trial grouping can still change what the benchmark measured. | A subject-invariant benchmark attempt whose interpretation remains contingent on grouping policy, not proof that subject invariance is solved. |
| Lee et al. (2025) ICML proceedings | Fine-tuning results across memory tasks and sleep stage classification. | The task family, label granularity, adaptation regime, and metric family still have to be named because sleep-stage labels and memory-task outputs are not one prediction object. | A fine-tuning / PEFT audit across named tasks, not a universal frontier ranking for EEG foundation models. |
| Liu et al. (2026) benchmarking preprint | Cross-model comparison across 13 EEG datasets and nine paradigms. | The paper explicitly separates leave-one-subject-out evaluation from within-subject few-shot calibration, so the hold-out unit cannot be collapsed into one transfer score. | A benchmark matrix for transfer-regime tradeoffs, not a settled answer to which model generalizes best. |
| Lahiri et al. (2026) PRISM | Clinical differential diagnosis from interictal EEG, including epilepsy versus mimickers. | The clinically interesting object is subject-level diagnosis, but the paper also shows that split construction, checkpoint selection, segment length, and normalization can dominate comparison. | Evidence that protocol differences can dominate rankings, not an accepted law of clinical transfer. |
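For the subject-level rows in this matrix, unit-correct scoring means aggregating window or trial predictions per subject before computing the metric, so each subject contributes one independent prediction. The sketch below shows that aggregation with illustrative arrays; it is not the challenge's official scoring code.

```python
# Window predictions are averaged per subject before the metric is computed,
# so each subject contributes one independent prediction. Arrays are illustrative.
import numpy as np

window_preds = np.array([0.2, 0.4, 0.3, 0.9, 1.1, 0.5, 0.6])
window_subjects = np.array(["s1", "s1", "s1", "s2", "s2", "s3", "s3"])
subject_targets = {"s1": 0.3, "s2": 1.0, "s3": 0.5}

subjects = sorted(subject_targets)
per_subject_pred = np.array(
    [window_preds[window_subjects == s].mean() for s in subjects]
)
per_subject_true = np.array([subject_targets[s] for s in subjects])

# n here is the number of subjects, not the number of windows: error bars and
# significance claims must use this smaller n.
mae = np.abs(per_subject_pred - per_subject_true).mean()
print(f"subject-level MAE over n={len(subjects)} subjects: {mae:.3f}")
```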
2026-04-02 correction: setup compatibility is not physiological equivalence
The older wording on this page already warned against shortcut-resistant overclaims, but one practical shortcut was still left open. A reader could still move from heterogeneous-device support to a shared physiology-preserving representation without naming which part of the cross-setup gap had actually been closed. The current 2025-2026 model literature does not support that shortcut. On this site, recording-frame compatibility and physiology-side equivalence are now kept separate explicitly.
| Case | What the paper directly advances | What still must be disclosed separately on this site |
|---|---|---|
| DIVER-0 (2025) workshop / arXiv | Channel-permutation-equivariant modeling and robust adaptation to arbitrary electrode configurations unseen during pretraining. | Coordinate route, reference family, omitted-channel policy, downstream adaptation regime, and whether the target variable stayed identifiable rather than merely layout-tolerant. |
| HEAR (2025) arXiv | A coordinate-based embedding that supports heterogeneous EEG devices, varying electrode counts, and more than 150 layouts. | Whether the geometry route is subject-specific or template-based, whether reference mismatch was neutralized or only absorbed (a minimal reference-sensitivity probe follows this table), and what claim ceiling remains for cross-montage physiology. |
| REVE (2025) accepted poster / arXiv | Large-scale setup-agnostic pretraining across 92 datasets and more than 60,000 hours. | Overlap audit, covered reference-system distribution, coordinate-route disclosure, and whether a downstream gain survived shortcut slices rather than only mixed-corpus transfer. |
| SCOPE (2026) arXiv | A structured adaptation route for label-limited cross-subject settings where EEG foundation models otherwise generalize poorly. | The label budget, pseudo-label / prototype burden, and whether the result is a property of the pretrained representation or of extra downstream rescue. |
| PRISM (2026) arXiv | Clinical-transfer evidence plus a warning that split construction, checkpoint selection, segment length, and normalization can dominate rankings. | Benchmark provenance, independent hold-out unit, preprocessing path, and the exact comparison regime before any statement about portable clinical generalization. |
Read primary sources by evidence tier
The biggest weakness that needed correction here was that accepted model papers, official challenge documentation, benchmark-warning preprints, and under-review manuscripts were too easy to read as equally strong "latest research." Technically, that matters because accepted model papers support advances in representation learning / transfer under specific settings, official rules support the exposure conditions of the benchmark, and benchmark-audit preprints support warnings about instability in comparison. A table that hides source type therefore becomes a source of misreading by itself.
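To make that tiering mechanical rather than rhetorical, the sketch below encodes source types and the claim ceiling each one supports; the table that follows then applies that tiering. The tier names and ceiling wording are this page's convention, not an external standard.

```python
# The source type caps how strongly a claim may be read.
# Tier names and ceilings are this page's convention, not an external standard.
from enum import Enum

class SourceType(Enum):
    PEER_REVIEWED = "accepted journal / conference paper"
    ACCEPTED_POSTER = "accepted poster / workshop"
    OFFICIAL_RULES = "official challenge rules / leaderboard"
    PREPRINT = "arXiv preprint"
    UNDER_REVIEW = "under-review manuscript"

CLAIM_CEILING = {
    SourceType.PEER_REVIEWED: "supports a capability claim in its tested setting",
    SourceType.ACCEPTED_POSTER: "supports a direction, pending full review",
    SourceType.OFFICIAL_RULES: "fixes benchmark exposure conditions, not capability",
    SourceType.PREPRINT: "supports a warning or hypothesis, not a verdict",
    SourceType.UNDER_REVIEW: "exploratory; do not fix field-level conclusions",
}

print(CLAIM_CEILING[SourceType.PREPRINT])
```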
| Example | Source type / as of 2026-03-25 | What can be said relatively strongly | What barrier the paper itself leaves unresolved |
|---|---|---|---|
| Kostas et al. (2021) BENDR | Peer-reviewed journal paper | It showed that self-supervised pretraining can provide breadth across novel subjects, hardware, and tasks. | Downstream applicability remained unsettled; pretraining alone did not guarantee universal transfer. |
| Wang et al. (2023) BIOT | Accepted conference paper | It provided a concrete strategy for bringing heterogeneous biosignals with different sampling rates, channels, recording durations, and missing values into cross-dataset learning. | Conversely, any result that does not report format harmonization is not meaningfully comparable. |
| Jiang et al. (2024) LaBraM | Accepted conference paper | It performed cross-dataset pretraining on about 20 datasets and roughly 2,500 hours of EEG, and showed strong performance across multiple downstream tasks. | It explicitly leaves electrode mismatch, unequal length, varied task design, and low SNR as central EEG-side challenges. |
| Wang et al. (2024) EEGPT | Accepted conference presentation | It reported strong downstream performance with a pretrained transformer and linear probing under low SNR, inter-subject variability, and channel mismatch. | A high score there does not automatically imply cross-day deployability or source identifiability. |
| Lee et al. (2025) ICML fine-tuning audit | Accepted conference poster | It showed that current large brainwave foundation models only slightly outperform conventional deep baselines, while PEFT methods such as LoRA can greatly reduce the number of trainable parameters. | The reported gain is small (about 0.5% even in the paper's own abstract), so the result does not support the claim that "larger models win by default." |
| EEG Foundation Challenge (2025) NeurIPS competition | Official competition website / rules | It attempts to standardize measurement of cross-task transfer and subject-invariant representation over more than 3,000 HBN-EEG participants. | What it provides directly is current benchmark governance, not a final verdict on model capability. The official site also states that the proposal preprint is outdated, so operational conditions should be read from the current rules and starter kit. |
| EEG Foundation Challenge final leaderboard (2025) Governance postmortem | Official leaderboard / postmortem | It shows that benchmark operations themselves can expose hidden subject-order shortcuts: the organizers reported that Challenge 2 samples had not been randomized, so contiguous trials could reveal same-subject structure and the final prize logic had to be changed. | This is strong evidence about benchmark fragility, not a stable capability ranking of the submitted models. It tells us the measurement changed, not which architecture is universally best. |
| Xiong et al. (2025) EEG-FM-Bench | arXiv benchmark preprint | It states explicitly that the rapid proliferation of foundation models has outpaced standardized evaluation and that fragmented comparison is slowing scientific progress. | Unharmonized comparisons do create scientific inefficiency, but this is safest to read as a benchmark warning rather than as a final frontier ranking. |
| El Ouahidi et al. (2025) REVE | Accepted poster / arXiv manuscript | It introduced a 4D positional encoding that can handle arbitrary length and electrode arrangement, pointing toward better transfer across diverse setups. | What can be read relatively strongly here is a direction for handling heterogeneity, not a stable universal ranking across accepted benchmarks. |
| Han et al. (2025) DIVER-1 | Under-review / arXiv manuscript | It presented what it describes as the largest-scale electrophysiology corpus to date and a systematic scaling-law analysis, arguing that electrophysiology raises a data-constrained scaling question. | The warning that smaller models trained longer can outperform larger models trained briefly under fixed data / compute is important, but an under-review source alone is not enough to fix the field's default scaling-law interpretation. |
| Wang et al. (2025) NeuroTTT | arXiv method preprint | It showed that domain-tuned self-supervision and test-time training can help with pretraining-downstream misalignment and cross-subject shift (a minimal adaptation-loop sketch follows this table). | Conversely, the results do not support the assumption that a foundation model alone is sufficient without downstream adaptation. Results that include TTT are also not read here as evidence of deployment simplicity. |
| Lahiri et al. (2026) PRISM | arXiv clinical-transfer preprint | It reported that pretraining with targeted diversity can become advantageous under fine-tuning and can improve performance on a clinical mimicker task. | The warning that benchmark inconsistency alone can strongly reverse rankings on the same dataset is important, but it still should not be fixed as a shared conclusion of accepted clinical benchmarks. |
| Liu et al. (2026) EEG FM benchmarking | arXiv benchmark / review preprint | It compared 12 open-source foundation models and specialist baselines across 13 EEG datasets, and argued that linear probing is often insufficient, scratch specialists remain competitive, and larger models do not automatically generalize better. | Because it is still a preprint and a benchmark study, it does not by itself prove shortcut resistance, deployment readiness, or a settled ranking across future accepted evaluations. |
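Because adaptation regime changes what a score means, it helps to see how little code separates zero-adaptation transfer from test-time training. The sketch below is a generic entropy-minimization adaptation loop, used here only as a stand-in for TTT-style methods; it is not NeuroTTT's actual objective, and `model` and the batch are synthetic.

```python
# A generic test-time adaptation loop: a stand-in technique, not NeuroTTT's
# published objective. `model` is any torch classifier; data is synthetic.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
x_test = torch.randn(64, 16)             # unlabeled target-domain batch

opt = torch.optim.SGD(model.parameters(), lr=1e-3)
for _ in range(10):                       # a few adaptation steps, no labels used
    logits = model(x_test)
    probs = logits.softmax(dim=1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
    opt.zero_grad()
    entropy.backward()
    opt.step()

# Any score reported after this loop is "with test-time training": on this
# site it is disclosed as such, never as zero-adaptation transfer.
```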
The 10 gates before reading a foundation model
| Gate | Why it is needed | Minimum evidence we want |
|---|---|---|
| G0: source type / maturity | Accepted papers, accepted posters, official rules, arXiv preprints, and under-review manuscripts support claims of different strength. | The source type, whether it is accepted / preprint / under review, and for moving-target rules pages, the last verified date. |
| G1: corpus identity / overlap | A pretraining corpus is also a dataset. If closely related data leak into the downstream side, the split no longer means what it appears to mean, and that leakage can happen through multiple ancestry axes rather than one route. | Corpus name, version / snapshot, total hours, and a multi-axis overlap audit covering raw-recording / window ancestry, subject / session ancestry, site / device / reference / layout ancestry, task / benchmark-object ancestry, and extra-data / checkpoint ancestry. |
| G2: population / setup diversity | The number of datasets or total hours is not enough. If population, device, or electrode layout are biased, pretraining may simply learn recording-distribution artifacts. | The covered population, device types, clinical vs. lab setting, electrode schema, and the distribution of reference systems. |
| G3: harmonization / geometry route | EEG differs greatly in channel count, electrode geometry, reference family, sample rate, and window length, and even a layout-tolerant model does not automatically erase those differences. | Channel map, electrode-coordinate route or template, reference family, resampling, token length, and the policy for missing, omitted, or interpolated channels / segments. |
| G4: adaptation regime | Frozen feature extraction, full fine-tuning, and test-time training do not mean the same thing when one asks what actually transferred. | Whether the regime is frozen, linear-probe, PEFT, full fine-tune, or TTT, plus target-data usage, label budget, and recalibration amount. |
| G5: benchmark object / supervision unit / independent prediction unit | Per-window classification, event detection, sequence labeling, subject-level regression, and retrieval / ranking do not test the same scientific object. Official foundation-model benchmarks already mix these families. | The supervision unit, label provenance, output family, metric bundle, what counts as one independent prediction, and whether that unit inherits raw-recording or subject grouping. |
| G6: benchmark provenance / operations budget | Benchmark papers from 2025-2026 show that rankings can move with split construction, checkpoint selection, segment length, hidden sample ordering, and challenge-stage compute restrictions. The official EEG Challenge postmortem made that point operationally explicit. | Benchmark name, version, split rule, sample-randomization / hidden-grouping policy, checkpoint selection, segment length, normalization, how the external hold-out was built, and any inference-stage compute / training restrictions. |
| G7: shortcut-resistance / specificity bridge | A good transfer score can still come from subject identity, site / device / reference structure, or protocol distribution rather than the intended neural variable. Foundation-model headlines do not remove that risk. | A task-matched nuisance audit, including participant / site / device / reference disjointness, metadata-only or identity baselines where applicable, shortcut slices, and the linked Specificity & Shortcut Card. |
| G8: scale / efficiency | In EEG, "bigger is stronger" does not always hold. It is easy to misread results unless parameter count, data, compute, and trainable fraction are read together. | Total parameter count, trainable parameter count, pretraining epochs / steps, corpus size, training time, and adapter size. |
| G9: claim ceiling | Success for a foundation model is still an advance in macro decoding / representation learning. | An explicit statement of what remains latent, and an explicit stop against source identifiability, direct validation, and WBE state-completeness claims. |
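A minimal machine-readable version of this gate list is sketched below: a reported result is read only after every gate has its minimum evidence, and anything missing blocks a strong reading. The keys mirror G0-G9 informally; this is a site operating aid, not an external standard.

```python
# A result is read only after every gate has its minimum evidence.
# Keys mirror the G0-G9 table above informally.
REQUIRED_GATES = {
    "G0": "source type and maturity stated",
    "G1": "multi-axis corpus overlap audit",
    "G2": "population / setup diversity reported",
    "G3": "harmonization and geometry route disclosed",
    "G4": "adaptation regime and label budget named",
    "G5": "benchmark object and independent unit named",
    "G6": "benchmark provenance and operations budget",
    "G7": "shortcut-resistance / specificity bridge",
    "G8": "scale and efficiency reported together",
    "G9": "explicit claim ceiling / stopped claim",
}

def failed_gates(disclosed: set[str]) -> list[str]:
    """Return the gates a reported result has not yet passed."""
    return [f"{g}: {why}" for g, why in REQUIRED_GATES.items() if g not in disclosed]

print(failed_gates({"G0", "G3", "G4"}))  # everything else still blocks a strong reading
```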
The EEG Challenge submission page defines an inference-only code submission setting, while the final leaderboard discloses that Challenge 2 accidentally preserved same-subject trial contiguity. Those facts are not side notes. They directly change what a reported ranking means, because one result was obtained under a fixed inference budget and another could exploit an unintended grouping cue. On this site, benchmark provenance therefore includes operational constraints and postmortem disclosures, not only the benchmark title.
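As a sketch of what checking such an operations budget can look like on the submitter's side, the function below measures peak GPU memory during a single inference pass with standard PyTorch utilities. The 20 GB figure comes from the official rules; the function itself is illustrative, not the challenge's harness.

```python
# A peak-memory budget check for an inference-only setting. The 20 GB limit
# is the official rules' figure; this function is illustrative, not official.
import torch

BUDGET_BYTES = 20 * 1024**3  # 20 GB single-GPU limit from the official rules

def within_budget(model: torch.nn.Module, example: torch.Tensor) -> bool:
    if not torch.cuda.is_available():
        raise RuntimeError("budget check requires the target GPU")
    torch.cuda.reset_peak_memory_stats()
    with torch.inference_mode():          # inference-only, as the rules require
        model.cuda()(example.cuda())
    peak = torch.cuda.max_memory_allocated()
    print(f"peak GPU memory: {peak / 1024**3:.2f} GB")
    return peak <= BUDGET_BYTES
```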
The older wording on this page still made overlap audit sound too much like one checkbox. Current primary and official sources do not support that compression. Brookshire et al. (2024) show a raw-window ancestry failure mode, Chaibub Neto et al. (2019) show a subject-characteristic failure mode, Melnik et al. (2017) and Xu et al. (2020) show a setup-distribution failure mode, and the official EEG Challenge data, submission, and leaderboard pages show a benchmark-object / benchmark-operations failure mode. On this site, those are now read as separate ancestry axes rather than one generic overlap warning.
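The sketch below makes that multi-axis reading concrete: one set intersection per ancestry axis, rather than a single overlap checkbox. The ID fields and values are illustrative; a real audit needs real lineage metadata on both sides of the split.

```python
# A minimal multi-axis overlap audit, one set intersection per ancestry axis.
# The ID fields are illustrative; real corpora need real lineage metadata.
def overlap_report(pretrain: dict[str, set], downstream: dict[str, set]) -> dict[str, int]:
    axes = ["recording_ids", "subject_ids", "site_device_reference", "task_objects"]
    return {axis: len(pretrain[axis] & downstream[axis]) for axis in axes}

pretrain = {
    "recording_ids": {"rec1", "rec2", "rec3"},
    "subject_ids": {"subjA", "subjB"},
    "site_device_reference": {("site1", "devX", "CAR")},
    "task_objects": {"sleep-staging", "rest"},
}
downstream = {
    "recording_ids": {"rec9"},
    "subject_ids": {"subjB", "subjC"},          # subjB leaks across the split
    "site_device_reference": {("site1", "devX", "CAR")},
    "task_objects": {"sleep-staging"},
}

print(overlap_report(pretrain, downstream))
# Any nonzero axis means "generalization" may be reuse along that ancestry.
```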
The Pretraining Card required on this site
For foundation / self-supervised results, this site requires a Pretraining Card in addition to the standard model card. This is not an external publication standard; it is an operating rule of this site for keeping heterogeneous-corpus pretraining comparable.
| Item | Minimum required content | Dangerous misreading if omitted |
|---|---|---|
| Corpus | Pretraining corpus name, version, total hours, exclusion criteria, and a multi-axis overlap audit covering raw-recording / window, subject / session, site / device / reference / layout, task / benchmark-object, and extra-data / checkpoint ancestry. | You may miss the possibility that what looked like generalization was actually reuse of the same recording family, person, setup, task object, or benchmark lineage. |
| Population / Setup | Population, device, electrode layout, reference system, and whether the setting is clinical or lab-based. | You may misread the number of datasets as recording diversity itself. |
| Harmonization / Geometry Route | Channel schema, electrode-coordinate route or template, reference family, sample rate, tokenization, normalization, and missing / omitted / interpolated-channel policy. | You may misread recording-frame translation as physiology-preserving model capability. |
| Objective | The pretraining objective, such as masked, autoregressive, or contrastive. | You cannot compare which inductive bias actually mattered. |
| Source Type / Maturity | Whether the source is an accepted journal / conference paper, accepted poster / workshop, official rules page, arXiv preprint, or under-review manuscript, and for a rules page, the last verified date. | You may misread under-review warnings or operational documentation as frontier evidence of the same strength as accepted model papers. |
| Adaptation | Frozen / linear-probe / PEFT / full fine-tune / TTT, target-data usage, label budget, and whether recalibration is used. | You may conflate "a general representation transferred well" with "the model was strongly adapted to the target." |
| Benchmark Provenance / Operations | Benchmark name, version, split rule, checkpoint selection, segment length, normalization, and any inference-stage compute or no-training restriction. | You may misread ranking changes caused by benchmark design as differences in the model itself. |
| Benchmark Object / Supervision Unit | Whether the downstream object is window / trial classification, event detection, sequence labeling, subject-level regression / diagnosis, retrieval / ranking, or another family, together with label provenance, output family, metric bundle, the independent prediction unit, and whether grouped ancestry from the same recording or subject remains. | You may collapse heterogeneous wins into one story about portable EEG generalization even though the model solved different objects with different error surfaces. |
| Shortcut-resistance / Specificity Bridge | For any downstream decode / biomarker / clinical claim, report participant / site / device / reference disjointness, metadata-only or subject-ID baselines where relevant, nuisance-route checks, shortcut slices, and the linked Specificity & Shortcut Card. | You may misread a representation that mainly preserves identity or recording-distribution cues as if it had become invariant to those shortcuts. |
| Scale / Efficiency | Total parameter count, trainable parameter count, pretraining steps / epochs, training time, adapter size, and inference cost. | You may read "the foundation model won because it is large" when the real driver was compute allocation or PEFT. |
| Evaluation | Evaluation family, hold-out unit, device hold-out, cross-day evaluation, abstention policy, and failure conditions. | You may mistake a high same-day score for deployability. |
| Stopped claim | A one-line statement of what still cannot be claimed. | You may over-extrapolate foundation-model success to source truth or WBE. |
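A machine-readable form of this card is sketched below in this page's own field vocabulary; it is an operating aid of this site, not an external standard. Any field left unfilled keeps the result at qualified decoding evidence.

```python
# A minimal machine-readable Pretraining Card. Field names follow the table
# above, not any external publication standard.
from dataclasses import dataclass, fields

@dataclass
class PretrainingCard:
    corpus: str | None = None
    population_setup: str | None = None
    harmonization_route: str | None = None
    objective: str | None = None
    source_maturity: str | None = None
    adaptation: str | None = None
    benchmark_provenance: str | None = None
    benchmark_object: str | None = None
    shortcut_bridge: str | None = None
    scale_efficiency: str | None = None
    evaluation: str | None = None
    stopped_claim: str | None = None

    def missing(self) -> list[str]:
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

card = PretrainingCard(corpus="92 datasets, 60k+ hours", objective="masked")
print(card.missing())  # anything listed here blocks promotion above qualified evidence
```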
Operating rules on this site
Rule
- We do not hide source type: accepted papers, official rules, and preprints / under-review manuscripts are not listed as evidence of the same strength.
- Foundation-model results are not exempt from split auditing: independence must be checked including the pretraining corpus.
- We do not hide population / setup diversity: we report not just the number of datasets, but which recording distributions were actually included.
- We do not hide format harmonization: channel / reference / sampling harmonization must always be reported.
- We do not read heterogeneous-device support as physiology equivalence: coordinate route, reference family, and omitted-channel policy stay visible even when a model accepts arbitrary layouts.
- We do not hide the amount of adaptation: linear probing, full fine-tuning, and TTT are not all listed as the same kind of "transfer success."
- We do not hide benchmark object: window classification, event detection, sequence labeling, subject-level regression, and retrieval-like tasks are not compressed into one frontier score.
- We do not hide independent units or grouped hold-outs: trial, epoch, recording, and subject are different prediction objects and need separate disclosure.
- We do not hide benchmark provenance: because rankings move with split / checkpoint / preprocessing differences, benchmark specification is part of the result.
- We do not hide challenge operations budgets: inference-only settings, no-training rules, and memory limits are part of what the leaderboard score means.
- We do not treat "any setup" as shortcut-resistant by title alone: foundation-model transfer claims also need a shortcut-resistance bridge to the Specificity & Shortcut Card.
- Current competition rules are checked on the official site: proposal papers or companion preprints are background material; current rules / submission instructions / starter kits take priority for operations.
- We do not hide benchmark postmortems: if organizers later disclose split flaws, sample-order shortcuts, or scoring changes, that disclosure changes how we read the leaderboard.
- Benchmark-warning preprints are not treated as frontier verdicts: ranking reversals and scaling-law claims remain exploratory until reinforced by accepted papers or independent reruns.
- We do not hide scale / efficiency: we do not write that a foundation model won without reporting parameter count, trainable fraction, and training time.
- Even at high scores, the claim ceiling is kept in place: source identifiability, direct validation, closed-loop deployability, and WBE state-completeness are separate gates.
- Results without a Pretraining Card are treated only as qualified decoding evidence: they are not automatically promoted to L2 or above.
References
- Kostas, D., Aroca-Ouellette, S., & Rudzicz, F. (2021). BENDR: Using Transformers and a Contrastive Self-Supervised Learning Task to Learn From Massive Amounts of EEG Data. Frontiers in Human Neuroscience, 15, 653659. doi:10.3389/fnhum.2021.653659
- Wang, H., Lu, C., Xie, B., et al. (2023). BIOT: Biosignal Transformer for Cross-data Learning in the Wild. NeurIPS 2023. paper
- Jiang, W.-B., Zhao, L., & Lu, B.-L. (2024). Large Brain Model for Learning Generic Representations with Tremendous EEG Data in BCI. ICLR 2024. proceedings
- Wang, G., Liu, W., He, Y., Xu, C., Ma, L., & Li, H. (2024). EEGPT: Pretrained Transformer for Universal and Reliable Representation of EEG Signals. NeurIPS 2024. poster / abstract
- Lee, N., Barmpas, K., Panagakis, Y., Adamos, D., Laskaris, N., & Zafeiriou, S. (2025). Are Large Brainwave Foundation Models Capable Yet? Insights from Fine-Tuning. Proceedings of the 42nd International Conference on Machine Learning, PMLR 267, 32878-32888. PMLR
- EEG Foundation Challenge (2025). From Cross-Task to Cross-Subject EEG Decoding. NeurIPS 2025 competition. official website
- EEG Foundation Challenge (2025). Data. official data page
- EEG Foundation Challenge (2025). Rules. official rules
- EEG Foundation Challenge (2025). Submission. submission page
- EEG Foundation Challenge (2025). Leaderboard. official leaderboard / postmortem
- Xiong, W., Li, J., Li, J., & Zhu, K. (2025). EEG-FM-Bench: A Comprehensive Benchmark for the Systematic Evaluation of EEG Foundation Models. arXiv. arXiv:2508.17742
- El Ouahidi, Y., Lys, J., Thölke, P., Farrugia, N., Pasdeloup, B., Gripon, V., Jerbi, K., & Lioi, G. (2025). REVE: A Foundation Model for EEG -- Adapting to Any Setup with Large-Scale Pretraining on 25,000 Subjects. accepted poster / arXiv manuscript. arXiv:2510.21585
- Han, D. D., Lee, A. L., Lee, T., Gwon, Y., Lee, S., Lee, S., Park, D. K., Yoo, S., Cha, J., & Chung, C. K. (2025). DIVER-0: A Fully Channel Equivariant EEG Foundation Model. ICML 2025 Workshop on GenBio / arXiv manuscript. arXiv:2507.14141
- Chen, Z., Qin, C., You, W., Liu, R., Chu, C., Yang, R., Tan, K. C., & Wu, J. (2025). HEAR: An EEG Foundation Model with Heterogeneous Electrode Adaptive Representation. arXiv preprint. arXiv:2510.12515
- Han, D. D., Gwon, Y., Lee, A. L., et al. (2025). DIVER-1: Deep Integration of Vast Electrophysiological Recordings at Scale. under-review / arXiv manuscript. arXiv:2512.19097
- Wang, S., Deng, Y., Bao, Z., Zhan, X., & Duan, Y. (2025). NeuroTTT: Bridging Pretraining-Downstream Task Misalignment in EEG Foundation Models via Test-Time Training. arXiv preprint. arXiv:2509.26301
- Ma, J., Wu, F., Xing, Y., Lin, Q., Liu, T., Liu, C., Jia, Z., & Feng, M. (2026). Structured Prototype-Guided Adaptation for EEG Foundation Models. arXiv preprint. arXiv:2602.17251
- Lahiri, J. B., Runwal, P., Kulkarni, A., Jain, M., Mishra, A. R., Panwar, S., & Singh, S. (2026). PRISM: Exploring Heterogeneous Pretrained EEG Foundation Model Transfer to Clinical Differential Diagnosis. arXiv preprint. arXiv:2603.02268
- Liu, D., Chen, Y., Chen, Z., Cui, Z., Wen, Y., An, J., Luo, J., & Wu, D. (2026). EEG Foundation Models: Progresses, Benchmarking, and Open Problems. arXiv preprint. arXiv:2601.17883
- Brookshire, G., Kasper, J., Blauch, N. M., Wu, Y. C., Glatt, R., Merrill, D. A., Gerrol, S., Yoder, K. J., Quirk, C., & Lucero, C. (2024). Data leakage in deep learning studies of translational EEG. Frontiers in Neuroscience, 18, 1373515. doi:10.3389/fnins.2024.1373515
- Chaibub Neto, E., Pratap, A., Perumal, T. M., et al. (2019). Detecting the impact of subject characteristics on machine learning-based diagnostic applications. npj Digital Medicine, 2, 99. doi:10.1038/s41746-019-0178-x
- Xu, M., Yao, S., Wei, Z., et al. (2020). Cross-dataset variability problem in EEG decoding with deep learning. Frontiers in Human Neuroscience, 14, 103. doi:10.3389/fnhum.2020.00103
- Di, Y., An, X., Zhong, W., Liu, S., & Ming, D. (2021). The time-robustness analysis of individual identification based on resting-state EEG. Frontiers in Human Neuroscience, 15, 672946. doi:10.3389/fnhum.2021.672946