Wiki: Baseline / Benchmark / Pre-registration / Model Card

From score reporting to the current artifact stack required by the claim

Mind Uploading Research Project

Public Page Updated: 2026-04-04 · Learning guide / current card-field sync

How to use this page

Read this first to avoid getting lost

This page explains the operational differences among baseline, benchmark, pre-registration, model card, and the additional cards that become necessary when a result depends on multimodal fusion, large-scale pretraining, shortcut resistance, language-facing decoding, route-specific measurement / inference claims, living-human proxy bundles, or sequential same-subject bridges. The current revision also shows where card names alone became too coarse and where field-level disclosure is now required.

  • A benchmark is not just a score sheet; on this site it also includes split rules, metric semantics, and benchmark governance.
  • A normal model card is not enough for every claim shape; some results need claim-triggered companion cards, and some now also need route-specific cards or logs.
  • For decode or representation claims, shortcut resistance is a separate audit from score reporting.
  • For language-facing text / speech outputs, the Neural Contribution Card is now treated separately from the generic shortcut audit.
  • For multimodal or atlas-prior claims, a Fusion Card is separate from synchronization middleware or co-registration, and now also has to separate effective-window mismatch, shared-vs-specific factors, quantity bridges, and bundle robustness.
  • For ESI, tractography, effective-connectivity, and thermodynamic claims, the current site rule now asks for route-specific disclosure rather than only a generic model card.
  • For several living-human proxy rows used together, a Human Proxy Composition Card is required before same-subject state-identification language is allowed, and it now also has to disclose role by row, regime compatibility, operational maturity, calibrator role, and disagreement topology.
  • For same-subject or same-brain sequential bridges, a State-Continuity Bridge Card is required before same-state language is allowed, and it now also has to name the carried object, tolerance / failure rule, and rescue route.
Best for
People who find baseline, benchmark, pre-registration, and model-card language hard to tell apart, and people who want to know which extra cards are required before a stronger claim is allowed
Reading time
12-18 minutes
Accuracy note
This page is a learning guide to the artifact stack. The authoritative card fields and stop rules still live on the public Verification page.

Relatively clear at this stage

What we know now

  • Comparable progress requires a baseline, a benchmark object, preregistered stopping rules, a result report, and explicit failure disclosure.
  • Benchmark meaning depends not only on the dataset and score, but also on split randomness, metric bundle, extra-data policy, operational constraints, and postmortems.
  • Observability Budget, Specificity & Shortcut Card, Neural Contribution Card, Fusion Card, Pretraining Card, route-specific cards / logs, Human Proxy Composition Card, Temporal Validity Card, Calibration & Abstention Card, and State-Continuity Bridge Card answer different failure modes.
  • Card names alone are no longer sufficient on this site: Fusion, Human Proxy Composition, and State-Continuity Bridge Cards each now require field-level disclosure to block newer forms of overreading.
  • A higher score can still be scientifically weak if the artifact stack does not match the claim being made.

Still unresolved beyond this point

What we still do not know

  • Which subsets of this artifact stack will become field-wide defaults beyond this site is still unsettled.
  • The exact minimum disclosure expected for negative results and failure examples will continue to evolve.

Learn the basics

Check the basics in the wiki

What the wiki is for

The wiki is a learning aid. For the project's official current synthesis, success criteria, and operating rules, always return to the public pages.

The shortest map

A baseline is the minimum comparison partner. A benchmark fixes not only the task and score but also the split rules, metric bundle, and operational reading of the result. Pre-registration fixes what success, failure, and abstention mean before the run starts. A model card reports what happened. Additional cards are then attached depending on the claim shape. Without that stack, a good score is still not comparable progress.

2026-03 addendum: L1 and above still need an Observability Budget

For L1 and higher results, this site still stacks the Observability Budget on top of the usual model card so the measurement stack, direct observables, residual latent state, claim ceiling, and abstention conditions are visible rather than implied by the score.

2026-03-25 addendum: a benchmark is not just data plus one score

The earlier version of this page still let the word benchmark sound like a static score sheet. That is too weak. The official EEG Challenge (2025) homepage states that the original challenge preprint became outdated during execution and that the website plus starter kit should be treated as current. The official rules require disclosure of additional pretraining data, pretrained models / fine-tuning method, and the single-GPU 20 GB inference-stage constraint, while the official leaderboard later disclosed that Challenge 2 samples had not been randomized, which changed the prize structure and what the ranking meant. Xiong et al. (2025) and Liu et al. (2026) then showed more generally that protocol and evaluation choices materially affect EEG-foundation-model comparisons. Likewise, Saito & Rehmsmeier (2015), Roy et al. (2021), Sun et al. (2017), and Vallat & Walker (2021) show why metric semantics also change what a score means. Therefore, on this site, a benchmark now includes the split / randomization rule, the task-matched metric bundle, the benchmark version, the extra-data / checkpoint policy, inference-stage restrictions, and organizer postmortems, not only a dataset name and one number.
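To make the split-rule point concrete, here is a minimal, self-contained sketch (assuming scikit-learn; the data are synthetic and every number is illustrative) of how a trial-shuffled split can reward an identity shortcut that a subject-wise split correctly refuses to credit:

```python
# Minimal sketch: why the split / randomization rule is part of the benchmark
# object. Synthetic data; every number here is illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

rng = np.random.default_rng(0)
n_subjects, trials_per_subject, n_features = 20, 50, 32

# Diagnosis-style setup: the label is constant within a subject, and each
# subject also carries a fixed feature offset (an identity shortcut).
subject_ids = np.repeat(np.arange(n_subjects), trials_per_subject)
labels = rng.integers(0, 2, size=n_subjects)[subject_ids]
offsets = rng.normal(0.0, 3.0, size=(n_subjects, n_features))
X = (offsets[subject_ids]
     + 0.05 * labels[:, None]                      # weak genuine task signal
     + rng.normal(size=(subject_ids.size, n_features)))

clf = RandomForestClassifier(n_estimators=100, random_state=0)
shuffled = cross_val_score(clf, X, labels,
                           cv=KFold(n_splits=5, shuffle=True, random_state=0))
grouped = cross_val_score(clf, X, labels, groups=subject_ids,
                          cv=GroupKFold(n_splits=5))
print(f"trial-shuffled split: {shuffled.mean():.2f}")  # inflated by identity leakage
print(f"subject-wise split:   {grouped.mean():.2f}")   # close to chance here
```

The two scores come from the same data and the same model; only the split rule differs, which is why this site treats the split rule as part of the benchmark object rather than a training detail.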

2026-03-25 addendum: a model card is not the whole artifact stack

The next weakness was to let a generic model card sound like the final reporting layer for every result. That is also too weak. Chaibub Neto et al. (2019), Xu et al. (2020), and Di et al. (2021) show why decoding and transfer claims can still ride on subject / acquisition shortcuts, so score reporting alone does not establish target-variable specificity. Kothe et al. (2025), Wei et al. (2020), Vafaii et al. (2024), and Chen et al. (2025) show why simultaneous or multimodal acquisition does not replace a fusion audit. Johansen et al. (2024), Li et al. (2025), Baadsvik et al. (2024), Hirschler et al. (2025), and Dagum et al. (2026) each constrain a different living-human quantity type with its own burden, rather than offering one already field-ready whole-brain meter. Lu et al. (2023), Bosch et al. (2022), MICrONS Consortium et al. (2025), and Attardo et al. (2015) show why same-subject or same-brain wording still leaves a sequential bridge burden. Therefore, on this site, the model card is only one layer in a claim-triggered artifact stack.

2026-03-30 addendum: generic companion cards are no longer enough for the current site rule

The next weakness on this page was narrower but important: it still listed only the earlier generic companion cards, even though the current public site now requires route-specific cards or logs for several claim families. The primary literature does not support compressing these routes into one generic reporting layer. Tang et al. (2023), d'Ascoli et al. (2025), and Wairagkar et al. (2025) show why language-facing outputs need a Neural Contribution Card plus temporal and calibration disclosure rather than only a score. Horrillo-Maysonnial et al. (2023), Rong et al. (2025), and Feng et al. (2025) show why ESI claims need validation-class, source-regime, and benchmark-object typing plus solver-disagreement disclosure. Gajwani et al. (2023), He et al. (2024), and Manzano-Patrón et al. (2025) show why tractography needs object typing rather than one graph headline. Smith et al. (2011), Barnett & Seth (2017), Villaverde et al. (2019), and Novelli et al. (2025) show why effective-connectivity graphs remain model-conditioned unless closure, node policy, and sampling sensitivity are disclosed. Lynn et al. (2021) and Ishihara & Shimazaki (2025) show why irreversibility language hides multiple estimator families and closure assumptions. Therefore, this page now separates generic companion cards from route-specific cards / logs instead of treating them as one bucket.

2026-04-04 addendum: three cards now need field-level disclosure, not name-level disclosure

The next weakness on this page became visible only after the route-stack expansion: three cards still sounded satisfied by a short label even though the current primary literature says otherwise. Kothe et al. (2025), Vafaii et al. (2024), Chen et al. (2025), Bolt et al. (2025), Epp et al. (2025), Rohaut et al. (2024), and Manasova et al. (2026) show why Fusion still has to separate synchronization, temporal-kernel relation, shared-vs-specific structure, quantity bridge, and bundle robustness. Li et al. (2025), Bøgh et al. (2024), Morgan et al. (2024), and Manasova et al. (2026) show why Human Proxy Composition still has to separate quantity type, operating-point dependence, method-family non-equivalence, and disagreement topology. Bosch et al. (2022), MICrONS Consortium et al. (2025), Gallego et al. (2020), Karpowicz et al. (2025), Wilson et al. (2025), and Wairagkar et al. (2025) show why State-Continuity Bridge still has to name the carried object, tolerance rule, and rescue route rather than relying on specimen identity or score survival alone. Therefore, this guide no longer treats those three cards as satisfied by a short name plus one sentence.

First separate the roles

| Artifact | Main role | What it fixes that the others do not |
| --- | --- | --- |
| Baseline | The minimum comparison partner. | Prevents a new score from being read without context. |
| Benchmark | The benchmark object: task, split rules, metric bundle, and governance. | Fixes what comparison actually means before any score is interpreted. |
| Pre-registration | The promise made before the run. | Fixes success, failure, stopping, and abstention rules before hindsight pressure appears. |
| Model card | The result report for one trained system. | Records scores, baselines, failure examples, compute usage, and practical weaknesses of the submitted system. |
| Observability Budget | The measurement-side ceiling. | Fixes what was directly observed, what remained latent, and which claim ceiling still applies. |
| Specificity & Shortcut Card | The shortcut audit for decode / biomarker / transfer claims. | Separates the target neural variable from subject, session, site, device, protocol, and other nuisance routes. |
| Neural Contribution Card | The language-facing shortcut audit. | Fixes task constraint, candidate set, prompt or language-model scaffold, no-brain / no-LM / shuffle controls, and subject cooperation for text / speech outputs. |
| Fusion Card | The multimodal / atlas-prior integration audit. | Fixes acquisition relation, lag audit, effective-window / temporal-kernel relation, geometry / co-registration scope, fusion model, shared-vs-specific component disclosure, quantity bridge / physiology grounding, unimodal baselines, complete-case / missing-modality disclosure, transfer or disagreement window, external calibration, and abstention. |
| Pretraining Card | The EEG foundation / self-supervised transfer audit. | Fixes corpus identity / overlap, harmonization, adaptation regime, benchmark provenance, and efficiency constraints. |
| Route-specific cards / logs | The claim-family-specific disclosure layer. | Types ESI, tractography, effective-connectivity, irreversibility, intervention, and boundary claims by their own failure modes instead of compressing them into one generic report. |
| Human Proxy Composition Card | The bundle audit for several living-human proxy rows. | Fixes proxy class, direct observable and evidence role by row, same-subject relation, effective time window / state axis, regime compatibility, operational maturity, calibrator role, model burden, method-family non-equivalence, agreement / disagreement topology plus resolution policy, incremental evidence, and residual latent-state ceiling. |
| State-Continuity Bridge Card | The sequential bridge audit. | Fixes acquisition order, elapsed time, regime continuity, coordinate transfer / deformation, carried object / bridge witness, tolerance / failure rule, rescue route versus raw continuity, bridge-validation rung, and residual drift ceiling before same-state language is allowed. |
| Temporal Validity Card | The time-generalization ceiling. | Fixes fixed-decoder interval, state annotation, recalibration burden, drift handling, and transfer ceiling across hours to days. |
| Calibration & Abstention Card | The uncertainty and fallback audit. | Fixes fit / calibration / test separation, evaluation slice, coverage-risk target, and fallback behavior when outputs include confidence or abstention. |
| Failure examples / negative results | The record of where things broke. | Prevents the field from learning only from accidental successes. |

What a benchmark fixes on this site

| Benchmark field | Why it matters | What goes wrong if it is missing |
| --- | --- | --- |
| Task and target definition | States exactly what is predicted or detected. | A score can be overread as if it applied to a broader task family. |
| Split / randomization rule | Defines whether subject, session, trial order, and hidden grouping were controlled. | Identity or contiguous-trial shortcuts can change the meaning of the leaderboard. |
| Task-matched metric bundle | Fixes which metrics are needed for this task, such as false alarms, latency, macro-F1, or kappa. | One headline score can hide the real failure mode. |
| Extra-data / checkpoint policy | States whether outside data or pretrained models changed the comparison. | Transfer gains can be misread as if they came only from the submitted pipeline. |
| Operational restrictions | Fixes inference-time compute, code-submission conditions, and other deployment-side constraints. | A result can be misread as portable when it depended on a looser operating regime. |
| Version / postmortem status | States whether organizer updates, starter-kit changes, or later error disclosures changed the benchmark object. | An obsolete preprint or early leaderboard can be overread as the final benchmark truth. |
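As a concrete illustration of the task-matched metric bundle row, the following sketch (synthetic labels and predictions, scikit-learn metrics, all numbers illustrative) shows how a single headline accuracy can coexist with a chance-level kappa and a near-total miss rate on a rare-event detection task:

```python
# Minimal sketch: a task-matched metric bundle for a rare-event detection
# task. Labels and predictions are synthetic; all numbers are illustrative.
import numpy as np
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             confusion_matrix, f1_score)

rng = np.random.default_rng(1)
y_true = (rng.random(10_000) < 0.02).astype(int)   # ~2% positive events
y_pred = (rng.random(10_000) < 0.01).astype(int)   # near-random, rare alarms

tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
bundle = {
    "accuracy": accuracy_score(y_true, y_pred),           # looks strong
    "macro_f1": f1_score(y_true, y_pred, average="macro", zero_division=0),
    "kappa": cohen_kappa_score(y_true, y_pred),           # near chance
    "false_alarms_per_1k_neg": 1000 * fp / max(tn + fp, 1),
    "miss_rate": fn / max(tp + fn, 1),                    # near total
}
for name, value in bundle.items():
    print(f"{name:>24}: {value:.3f}")
```

A headline accuracy near 0.97 survives here alongside a kappa near zero and a miss rate near one; only the bundle, not any single number, exposes the real failure mode.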

Which extra artifacts are triggered by the claim

| Claim shape | Base stack | Extra artifact to add | What it blocks |
| --- | --- | --- | --- |
| Any L1+ measurement claim | Baseline + Benchmark + Pre-registration + Model card | Observability Budget | Stops a measurement stack from being overread as if it directly observed more than it did. |
| Decode / biomarker / transfer claim | Base stack + Observability Budget | Specificity & Shortcut Card | Stops subject / session / site / device / protocol shortcuts from being mistaken for target-variable capture. |
| Foundation / self-supervised EEG claim | Base stack + Observability Budget | Pretraining Card plus Specificity & Shortcut Card | Stops a transfer win from being overread as generic portability or shortcut-resistant representation learning. |
| Language-facing text / speech / brain-to-text claim | Base stack + Observability Budget | Neural Contribution Card plus Specificity & Shortcut Card; add Calibration & Abstention Card when confidence, retrieval-set, or prediction-set language is reported. | Stops fluent output or top-k retrieval from being overread as if neural contribution, confidence, and prompt dependence were already separated. |
| EEG source imaging / inverse reconstruction claim | Base stack + Observability Budget | Inverse-Solver Agreement Log plus named validation class, source regime, montage / coverage policy, and benchmark-object disclosure. | Stops one localization headline from being overread as if depth bias, source extent, coverage geometry, and solver disagreement were already resolved. |
| Diffusion-MRI tractography / structural-connectome claim | Base stack + Observability Budget | Tractography route card | Stops one tractography graph from being overread as if acquisition, endpoint assignment, graph construction, uncertainty, and calibration were fixed. |
| Effective-connectivity / DCM / directed-graph claim | Base stack + Observability Budget | Effective-connectivity route card | Stops a directed graph from being overread as discovered causal wiring when candidate-model family, closure, node policy, and sampling sensitivity remain implicit. |
| Thermodynamic / irreversibility claim | Base stack + Observability Budget | Irreversibility / thermodynamic route card | Stops arrow-of-time or entropy-flow language from being overread as if signal route, coarse-graining, estimator family, and quantity type were already fixed. |
| Multimodal or atlas-prior claim | Base stack + Observability Budget | Fusion Card | Stops simultaneity, synchronization middleware, or a prior from standing in for validated fusion. |
| Intervention / closed-loop claim | Base stack + Observability Budget | Intervention Card; for embodied or human-in-the-loop claims, also add the Body / Environment Boundary Card. | Stops low latency or one control trace from standing in for a typed intervention, preserved loop boundary, or safe deployment claim. |
| Several living-human proxy rows used together | Base stack + Observability Budget | Human Proxy Composition Card | Stops proxy-rich bundles from being overread as same-subject state identification when role by row, regime compatibility, maturity, and disagreement remain implicit. |
| Same-subject / same-brain sequential bridge | Base stack + Observability Budget | State-Continuity Bridge Card; add Temporal Validity Card when the bridge crosses hours to days or a fixed-decoder interval is claimed. | Stops specimen identity, score survival, or rescue-dependent stability from being overread as same-state evidence or stable time-generalization. |
| Outputs with probabilities, intervals, prediction sets, or abstention | Base stack + the cards already triggered by the claim | Calibration & Abstention Card | Stops raw confidence, threshold tuning, or selective reporting from being overread as calibrated risk control. |
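One way to read this table operationally is to treat the claim shape as a key and the triggered artifacts as a lookup. The sketch below mirrors the table; the helper function and the claim-shape keys are this guide's invention, not an official site API:

```python
# Minimal sketch: the claim-triggered stack as an explicit lookup, mirroring
# the table above. Card names follow this page; the helper and the claim-shape
# keys are hypothetical.
BASE_STACK = ["Baseline", "Benchmark", "Pre-registration", "Model card",
              "Observability Budget"]

EXTRA_CARDS = {
    "decode_transfer":   ["Specificity & Shortcut Card"],
    "foundation_eeg":    ["Pretraining Card", "Specificity & Shortcut Card"],
    "language_facing":   ["Neural Contribution Card",
                          "Specificity & Shortcut Card"],
    "multimodal_fusion": ["Fusion Card"],
    "proxy_bundle":      ["Human Proxy Composition Card"],
    "sequential_bridge": ["State-Continuity Bridge Card"],
}

def required_artifacts(claim_shapes, reports_confidence=False,
                       crosses_hours_to_days=False):
    """Collect every artifact triggered by the declared claim shapes."""
    stack = list(BASE_STACK)
    for shape in claim_shapes:
        stack += EXTRA_CARDS[shape]
    if reports_confidence:          # probabilities, sets, or abstention
        stack.append("Calibration & Abstention Card")
    if crosses_hours_to_days:       # fixed-decoder interval, drift exposure
        stack.append("Temporal Validity Card")
    return list(dict.fromkeys(stack))   # de-duplicate, preserve order

print(required_artifacts(["language_facing"], reports_confidence=True))
```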

Three cards whose short labels had become too weak

The next problem was not card count but card sufficiency. After the stack expansion, these three cards still sounded satisfied by a short label even though the current site rule had already become stricter. This section brings the learning guide up to that current field-level requirement.

| Card | Why the older short description is now too weak | Minimum fields this guide now expects |
| --- | --- | --- |
| Fusion Card | Kothe et al. (2025), Vafaii et al. (2024), Chen et al. (2025), Bolt et al. (2025), Epp et al. (2025), Rohaut et al. (2024), and Manasova et al. (2026) show why synchronized acquisition, shared low-frequency structure, a quantity bridge, and bundle robustness are different achievements. | Acquisition relation, effective-window / temporal-kernel relation, shared-vs-specific component disclosure, quantity bridge / physiology grounding, unimodal and prior-only baselines, complete-case / missing-modality policy, transfer or disagreement window, external calibration, and abstention. |
| Human Proxy Composition Card | Li et al. (2025), Bøgh et al. (2024), Morgan et al. (2024), Vafaii et al. (2024), Chen et al. (2025), and Manasova et al. (2026) show why quantity type, operating point, common-driver burden, and disagreement topology still matter even when all rows are real human data. | Proxy class, direct observable and evidence role by row, effective time window / state axis, regime compatibility, operational maturity, calibrator role, method-family non-equivalence, cross-row agreement / disagreement plus resolution policy, increment over the strongest single row, and residual latent-state ceiling. |
| State-Continuity Bridge Card | Bosch et al. (2022), MICrONS Consortium et al. (2025), Gallego et al. (2020), Van De Ville et al. (2021), Karpowicz et al. (2025), Wilson et al. (2025), and Wairagkar et al. (2025) show why specimen identity, carried object, rescue strategy, and score survival are different objects. | Bridge class, acquisition order, elapsed time, regime continuity, coordinate transfer / deformation, carried object / witness, tolerance / failure rule, rescue route versus raw continuity, bridge-validation rung, and residual drift ceiling. |
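The sketch below shows what field-level rather than name-level checking means in practice. The field lists abbreviate the table above, and the validator itself is hypothetical:

```python
# Minimal sketch: field-level (not name-level) card checking. The field lists
# abbreviate the table above; the validator itself is hypothetical.
REQUIRED_FIELDS = {
    "Fusion Card": [
        "acquisition_relation", "effective_window_relation",
        "shared_vs_specific", "quantity_bridge", "unimodal_baselines",
        "missing_modality_policy", "external_calibration", "abstention",
    ],
    "Human Proxy Composition Card": [
        "proxy_class_by_row", "evidence_role_by_row", "regime_compatibility",
        "operational_maturity", "calibrator_role", "disagreement_topology",
        "increment_over_best_row", "latent_state_ceiling",
    ],
    "State-Continuity Bridge Card": [
        "acquisition_order", "elapsed_time", "regime_continuity",
        "carried_object", "tolerance_failure_rule", "rescue_route",
        "bridge_validation_rung", "residual_drift_ceiling",
    ],
}

def missing_fields(card_name, disclosure):
    """Return the fields a disclosure leaves empty; absent counts as empty."""
    return [f for f in REQUIRED_FIELDS[card_name] if not disclosure.get(f)]

# A name-level disclosure that would have passed the older rule:
print(missing_fields("Fusion Card", {"acquisition_relation": "simultaneous"}))
```

A disclosure that only names the card fails field by field here, which is exactly the overreading the current rule is meant to block.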

Why route-specific cards had to be added

| Claim family | Why a generic score sheet is too weak | Card or log this site now asks for |
| --- | --- | --- |
| Language-facing decode / speech | High scores can still come from candidate-set restriction, language priors, prompt scaffolds, or session-specific support rather than the target neural route alone. | Neural Contribution Card plus Specificity & Shortcut Card; add Calibration & Abstention Card and Temporal Validity Card when the claim leaves same-session. |
| EEG source imaging / inverse reconstruction | A single localization score is too weak because validation class, source regime, montage / coverage policy, source depth or extent, and solver disagreement can all change what the result means. | Inverse-Solver Agreement Log plus named validation class and benchmark-object disclosure. |
| Tractography / structural connectome | Hub maps and connectome metrics can shift with acquisition / harmonization, cortical endpoint assignment, graph construction, uncertainty routing, and external calibration; the graph is not one fixed object by default. | Tractography route card. |
| Effective connectivity / DCM | The output still depends on candidate-model family, observed-subsystem closure / latent-confound audit, node-definition policy, sampling / transformation sensitivity, validation, and reliability window. | Effective-connectivity route card. |
| Thermodynamic irreversibility | Different papers compute different quantities from different signal routes, coarse-grainings, and estimator families, so one irreversibility headline does not name one measurement object. | Irreversibility / thermodynamic route card. |
| Multimodal / atlas-prior integration | A synchronized or atlas-informed result can still mix incompatible temporal objects, physiology-linked shared factors, missing-modality slices, and modality-specific disagreements instead of one validated biological quantity. | Fusion Card with effective-window, shared-vs-specific, quantity-bridge, complete-case, and disagreement disclosure. |
| Closed loop / embodied controller | Latency alone does not tell you what was perturbed, which sensory / motor / interoceptive channels were preserved or omitted, or how far the result generalizes across time. | Intervention Card plus Body / Environment Boundary Card; add Temporal Validity Card when the claim rises above a same-session demo. |
| Living-human proxy bundle | Proxy-rich human evidence can still mix different quantity types, spatial units, timescales, model burdens, role assignments, and disagreement topologies rather than one same-subject state sample. | Human Proxy Composition Card with role by row, regime compatibility, maturity, calibrator role, and disagreement disclosure. |
| Sequential same-subject / same-brain bridge | Specimen identity does not by itself fix state continuity across fixation, deformation, sleep / wake regime, elapsed time, cross-day reacquisition, or adaptation-assisted score rescue. | State-Continuity Bridge Card, plus Temporal Validity Card when the bridge spans hours to days, with carried object, tolerance rule, and rescue-mode disclosure. |
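For one of these routes, here is what the disclosure layer could look like as data rather than prose: a sketch of an Inverse-Solver Agreement Log entry that records pairwise peak-localization disagreement across solvers instead of a single headline. The solver outputs are stubbed with synthetic source maps, and the solver names and log format are hypothetical:

```python
# Minimal sketch: one Inverse-Solver Agreement Log entry. Real solver outputs
# (different inverse methods run on the same epoch) are stubbed with synthetic
# source maps; the names and the log format are hypothetical.
from itertools import combinations
import numpy as np

rng = np.random.default_rng(3)
positions = rng.uniform(-0.07, 0.07, size=(5000, 3))  # source grid (meters)

def peak_location(source_map):
    """Location of the strongest source in a map over the grid."""
    return positions[np.argmax(np.abs(source_map))]

# Stand-ins for three solvers run on the same data.
solver_maps = {name: rng.normal(size=positions.shape[0])
               for name in ("solver_a", "solver_b", "solver_c")}

agreement_log = {}
for a, b in combinations(solver_maps, 2):
    d_mm = 1000 * np.linalg.norm(peak_location(solver_maps[a])
                                 - peak_location(solver_maps[b]))
    agreement_log[f"{a} vs {b}"] = round(float(d_mm), 1)

print("pairwise peak disagreement (mm):", agreement_log)
```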

Why the stack is cumulative

| If this is missing | The usual failure mode |
| --- | --- |
| No baseline | It becomes unclear whether a gain is meaningful or trivial. |
| No benchmark object | Different runs or papers are scored under different hidden rules and still get compared anyway. |
| No pre-registration | Success and stopping conditions can drift after the result is known. |
| No model card | Only the headline number remains visible while failure modes disappear. |
| No companion card matched to the claim | The result is silently promoted above the evidence it actually supports. |
| No negative-result record | The same failure gets rediscovered and renamed as if it were new. |

View it as a minimal flow

01

Put down a baseline

Start with the minimum comparison partner, even if it is simple.

02

Fix the benchmark object

Align the task, split / randomization rule, task-matched metric bundle, and operational benchmark constraints before comparing systems.

03

Pre-register success, failure, and abstention

Decide in advance what counts as passing, stopping, or refusing a stronger claim.

04

Report the result with the triggered companion cards

Leave the model card, the Observability Budget, and any extra cards required by the claim shape.

05

Keep failure examples visible

Record where the pipeline broke, not only where it happened to work.
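A minimal sketch of step 03 as an artifact: freeze the pre-registration and publish its digest before the first run, so any later drift in success, stopping, or abstention rules is detectable. All field names and thresholds here are hypothetical:

```python
# Minimal sketch of step 03: freeze the pre-registration and publish its
# digest before the first run. Field names and thresholds are hypothetical.
import hashlib
import json

prereg = {
    "benchmark": {"task": "sleep-staging", "version": "v2.1",
                  "split_rule": "subject-wise 5-fold", "seed": 17},
    "metric_bundle": ["macro_f1", "kappa", "false_alarms_per_1k"],
    "success": {"macro_f1": ">= baseline + 0.03 on the held-out subjects"},
    "stop": {"max_runs": 3, "on_leakage_detected": "halt and disclose"},
    "abstain": "no cross-site claim unless the Specificity Card is resolved",
}

# Any later edit to the rules changes this digest, so drift is detectable.
frozen = json.dumps(prereg, sort_keys=True).encode()
print("prereg digest:", hashlib.sha256(frozen).hexdigest()[:16])
```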

Why failure examples stay in the stack

If only successful cases are kept, the field learns a distorted map of where the claim ceiling really is. On this site, a usable failure record states the condition, which metric failed, whether the failure came from leakage, OOD shift, compute limits, bridge failure, or fusion mismatch, and what stronger claim therefore remains blocked.

Minimum failure record

State the data regime, split rule, metric bundle, triggered cards, and the first place the claim lost support. A vague sentence such as "it did not generalize" is weaker than a report that says cross-site transfer collapsed after harmonization changed, false alarms doubled, and the Specificity & Shortcut Card stayed unresolved.
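The same minimum failure record, written as data rather than prose; the field names are hypothetical and the values paraphrase the example above:

```python
# Minimal sketch of the failure record above as data rather than prose.
# Field names are hypothetical; values paraphrase the example in the text.
from dataclasses import dataclass

@dataclass
class FailureRecord:
    data_regime: str
    split_rule: str
    metric_bundle: list
    triggered_cards: list
    first_break: str        # first place the claim lost support
    blocked_claim: str      # stronger claim that therefore stays blocked

report = FailureRecord(
    data_regime="cross-site transfer after a harmonization change",
    split_rule="site-wise hold-out",
    metric_bundle=["macro_f1", "false_alarms_per_1k"],
    triggered_cards=["Specificity & Shortcut Card (unresolved)"],
    first_break="false alarms doubled on the new site",
    blocked_claim="portable cross-site detection",
)
print(report)
```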

Minimum checks when reading public pages

Checklist

  • Is there a baseline? Is it clear what the new result is compared against?
  • Is the benchmark object fixed? Are the task, split / randomization rule, metric bundle, version, and governance visible?
  • Is there a pre-registration? Are success, failure, stopping, and abstention rules stated before the run?
  • Is there a model card plus Observability Budget? Are scores, weaknesses, failure examples, direct observables, latent state, and claim ceiling visible?
  • If it is a decode / biomarker / transfer result, is a Specificity & Shortcut Card visible? Are subject, session, site, device, and protocol shortcuts audited separately?
  • If it emits text or speech, is a Neural Contribution Card visible? Are candidate set, language-model or prompt scaffold, no-brain / no-LM controls, and subject cooperation disclosed?
  • If it is a foundation / self-supervised EEG result, is a Pretraining Card visible? Are corpus overlap, harmonization, adaptation regime, benchmark version, and inference-stage restrictions written?
  • If it is an ESI, tractography, effective-connectivity, or thermodynamic claim, is the route-specific card or log visible? Are validation class / graph object / closure / estimator-family details written rather than hidden in one headline?
  • If it is multimodal or atlas-prior, is a Fusion Card visible? Are acquisition relation, effective-window / temporal-kernel relation, shared-vs-specific disclosure, quantity bridge, complete-case or missing-modality policy, and external calibration written?
  • If it is a closed-loop or intervention result, is an Intervention Card visible, and if embodiment matters is a Body / Environment Boundary Card visible? Are trigger rule, timing audit, preserved loop channels, and slow-boundary omissions disclosed?
  • If several living-human proxy rows are used together, is a Human Proxy Composition Card visible? Are proxy class, direct observable and role by row, regime compatibility, maturity / calibrator role, disagreement topology, and increment over the strongest single row disclosed?
  • If the claim bridges same-subject or same-brain measurements across regimes, is a State-Continuity Bridge Card visible? Are carried object, tolerance / failure rule, rescue route, elapsed time, regime continuity, deformation / registration burden, and bridge-validation rung written?
  • If the claim spans hours to days or reports confidence / abstention, are Temporal Validity and Calibration & Abstention Cards visible? Are recalibration burden, transfer ceiling, fit / calibration / test separation, and fallback behavior written?
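For the last checklist item, a minimal sketch (scikit-learn, synthetic data, illustrative numbers) of the fit / calibration / test separation and coverage-risk reporting the Calibration & Abstention Card asks for: the abstention threshold is chosen on the calibration split only, and coverage plus selective risk are reported on a disjoint test split:

```python
# Minimal sketch of the coverage-risk check behind a Calibration & Abstention
# Card: threshold chosen on the calibration split only, coverage and selective
# risk reported on a disjoint test split. Synthetic data; numbers illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(6000, 8))
y = (X[:, 0] + 0.5 * rng.normal(size=6000) > 0).astype(int)

# Fit / calibration / test separation, as the card requires.
X_fit, X_rest, y_fit, y_rest = train_test_split(X, y, test_size=0.5,
                                                random_state=0)
X_cal, X_test, y_cal, y_test = train_test_split(X_rest, y_rest, test_size=0.5,
                                                random_state=0)
model = LogisticRegression().fit(X_fit, y_fit)

target_risk = 0.05     # pre-registered: <= 5% error on accepted inputs
cal_conf = model.predict_proba(X_cal).max(axis=1)
cal_correct = model.predict(X_cal) == y_cal

# Smallest confidence threshold whose calibration-split selective risk
# meets the target; abstain everywhere (threshold 1.0) if none does.
candidates = np.quantile(cal_conf, np.linspace(0.0, 0.99, 100))
threshold = min((t for t in candidates
                 if 1 - cal_correct[cal_conf >= t].mean() <= target_risk),
                default=1.0)

test_conf = model.predict_proba(X_test).max(axis=1)
accepted = test_conf >= threshold
coverage = accepted.mean()
risk = 1 - (model.predict(X_test)[accepted] == y_test[accepted]).mean()
print(f"threshold={threshold:.3f}  coverage={coverage:.2f}  "
      f"selective risk={risk:.3f}")
```

Reporting the threshold, the coverage, and the selective risk together is what separates calibrated risk control from post-hoc threshold tuning.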

References

  1. Saito, T., & Rehmsmeier, M. (2015). The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets. doi:10.1371/journal.pone.0118432
  2. Roy, Y., Banville, H., Albuquerque, I., et al. (2021). Deep learning-based electroencephalography analysis: a systematic review. doi:10.1016/j.ebiom.2021.103275
  3. Sun, H., Paixao, L., Oliva, J. T., et al. (2017). Brain age from the electroencephalogram of sleep. doi:10.1093/sleep/zsx139
  4. Vallat, R., & Walker, M. P. (2021). An open-source, high-performance tool for automated sleep staging. doi:10.7554/eLife.70092
  5. Chaibub Neto, E., Pratap, A., Perumal, T. M., et al. (2019). Detecting the impact of subject characteristics on machine learning-based diagnostic applications. doi:10.1038/s41746-019-0178-x
  6. Xu, M., Yao, S., Wei, Z., et al. (2020). Cross-dataset variability problem in EEG decoding with deep learning. doi:10.3389/fnhum.2020.00103
  7. Di, Y., An, X., Zhong, W., Liu, S., & Ming, D. (2021). The Time-Robustness Analysis of Individual Identification Based on Resting-State EEG. doi:10.3389/fnhum.2021.672946
  8. EEG Challenge (2025) homepage. official site
  9. EEG Challenge (2025) rules. official rules
  10. EEG Challenge (2025) starter kit. official starter kit page
  11. EEG Challenge (2025) leaderboard and organizer postmortem. official leaderboard
  12. Xiong, W., Li, J., Li, J., Zhu, K., & Jiang, C. (2025/2026). EEG-FM-Bench: A Comprehensive Benchmark for the Systematic Evaluation of EEG Foundation Models. arXiv:2508.17742
  13. Liu, D., Chen, Y., Chen, Z., Cui, Z., Wen, Y., An, J., Luo, J., & Wu, D. (2026). EEG Foundation Models: Progresses, Benchmarking, and Open Problems. arXiv:2601.17883
  14. Kothe, C., Shirazi, S. Y., Stenner, T., et al. (2025). The lab streaming layer for synchronized multimodal recording. doi:10.1162/IMAG.a.136
  15. Wei, H., Jafarian, A., Zeidman, P., et al. (2020). Bayesian fusion and multimodal DCM for EEG and fMRI. doi:10.1016/j.neuroimage.2020.116595
  16. Vafaii, H., Mandino, F., Desrosiers-Gregoire, G., et al. (2024). Multimodal measures of spontaneous brain activity reveal both common and divergent patterns of cortical functional organization. doi:10.1038/s41467-023-44363-z
  17. Chen, J. E., Lewis, L. D., Coursey, S. E., et al. (2025). Simultaneous EEG-PET-MRI identifies temporally coupled and spatially structured brain dynamics across wakefulness and NREM sleep. doi:10.1038/s41467-025-64414-x
  18. Johansen, A., Beliveau, V., Colliander, E., et al. (2024). An In Vivo High-Resolution Human Brain Atlas of Synaptic Density. doi:10.1523/JNEUROSCI.1750-23.2024
  19. Li, X., Zhu, X.-H., Li, Y., et al. (2025). Quantitative mapping of key glucose metabolic rates in the human brain using dynamic deuterium magnetic resonance spectroscopic imaging. doi:10.1093/pnasnexus/pgaf072
  20. Baadsvik, E. L., Weiger, M., Froidevaux, R., et al. (2024). Myelin bilayer mapping in the human brain in vivo. doi:10.1002/mrm.29998
  21. Hirschler, L., et al. (2025). Region-specific drivers of CSF mobility measured with MRI in humans. doi:10.1038/s41593-025-02073-3
  22. Dagum, P., et al. (2026). The glymphatic system clears amyloid beta and tau from brain to plasma in humans. doi:10.1038/s41467-026-68374-8
  23. Lu, X., Han, X., Meirovitch, Y., et al. (2023). Preserving extracellular space for high-quality optical and ultrastructural studies of whole mammalian brains. doi:10.1016/j.crmeth.2023.100520
  24. Bosch, C., Ackels, T., Pacureanu, A., et al. (2022). Functional and multiscale 3D structural investigation of brain tissue through correlative in vivo physiology, synchrotron microtomography and volume electron microscopy. doi:10.1038/s41467-022-30199-6
  25. MICrONS Consortium, Bae, J. A., Lee, W.-C. A., et al. (2025). Functional connectomics spanning multiple areas of mouse visual cortex. doi:10.1038/s41586-025-08790-w
  26. Attardo, A., Fitzgerald, J. E., & Schnitzer, M. J. (2015). Impermanence of dendritic spines in live adult CA1 hippocampus. doi:10.1038/nature14467
  27. Tang, J., LeBel, A., Jain, S., & Huth, A. G. (2023). Semantic reconstruction of continuous language from non-invasive brain recordings. Nature Neuroscience, 26, 858-866. doi:10.1038/s41593-023-01304-9
  28. Défossez, A., Caucheteux, C., Rapin, J., Kabeli, O., & King, J.-R. (2023). Decoding speech perception from non-invasive brain recordings. Nature Machine Intelligence, 5, 1097-1107. doi:10.1038/s42256-023-00714-5
  29. d'Ascoli, S., Bel, C., Rapin, J., Banville, H., Benchetrit, Y., Pallier, C., & King, J.-R. (2025). Towards decoding individual words from non-invasive brain recordings. Nature Communications, 16, 10521. doi:10.1038/s41467-025-65499-0
  30. Wairagkar, M., Card, N. S., Singer-Clark, T., Hou, X., Iacobacci, C., Miller, L. M., Hochberg, L. R., Brandman, D. M., & Stavisky, S. D. (2025). An instantaneous voice-synthesis neuroprosthesis. Nature, 644(8075), 145-152. doi:10.1038/s41586-025-09127-3
  31. Horrillo-Maysonnial, A., Avigdor, T., Abdallah, C., et al. (2023). Targeted density electrode placement achieves high concordance with traditional high-density EEG for electrical source imaging in epilepsy. Clinical Neurophysiology, 156, 262-271. doi:10.1016/j.clinph.2023.08.009
  32. Rong, J., Sun, R., Joseph, B., Worrell, G., & He, B. (2025). Deep learning-based EEG source imaging is robust under varying electrode configurations. Clinical Neurophysiology, 175, 2010730. doi:10.1016/j.clinph.2025.04.009
  33. Unnwongse, K., Van Klink, N., Tousseyn, S., et al. (2023). Validating EEG source imaging using intracranial electrical stimulation. Brain Communications, 5(1), fcad023. doi:10.1093/braincomms/fcad023
  34. Hao, S., Zhao, H., Feng, Z., et al. (2025). HD-EEG source imaging with simultaneous SEEG recording in drug-resistant epilepsy. Epilepsia, 66(11), 4451-4464. doi:10.1111/epi.18552
  35. Pascarella, A., Mikulan, E., Sciacchitano, F., et al. (2023). An in-vivo validation of ESI methods with focal sources. NeuroImage, 277, 120219. doi:10.1016/j.neuroimage.2023.120219
  36. Feng, Z., Guan, C., & Sun, Y. (2025). Block-Champagne: A novel Bayesian framework for imaging extended E/MEG source. IEEE Transactions on Medical Imaging. doi:10.1109/TMI.2025.3642620
  37. Gajwani, M., Oldham, S., Pang, J. C., Arnatkevičiūtė, A., Tiego, J., Bellgrove, M. A., & Fornito, A. (2023). Can hubs of the human connectome be identified consistently with diffusion MRI? Network Neuroscience, 7(4), 1277-1304. doi:10.1162/netn_a_00324
  38. He, Y., Hong, Y., Wu, Y., et al. (2024). Spherical-deconvolution informed filtering of tractograms changes laterality of structural connectome. NeuroImage, 297, 120904. doi:10.1016/j.neuroimage.2024.120904
  39. McMaster, E. M., Newlin, N. R., Rudravaram, G., et al. (2025). Harmonized connectome resampling for variance in voxel sizes. Magnetic Resonance Imaging, 122, 110424. doi:10.1016/j.mri.2025.110424
  40. Bramati, I. B., Szczupak, D., Carneiro Monteiro, M., Meireles, F., Menezes Guimarães, D., Dean, R. J., Paul, L. K., & Tovar-Moll, F. (2026). Diffusion MRI sampling schemes bias diffusion metrics and tractography. Frontiers in Neuroimaging, 5, 1670604. doi:10.3389/fnimg.2026.1670604
  41. Manzano-Patrón, J. P., Deistler, M., Schröder, C., et al. (2025). Uncertainty mapping and probabilistic tractography using Simulation-based Inference in diffusion MRI: A comparison with classical Bayes. Medical Image Analysis, 103, 103580. doi:10.1016/j.media.2025.103580
  42. Zhu, S., Huszar, I. N., Cottaar, M., et al. (2025). Imaging the structural connectome with hybrid MRI-microscopy tractography. Medical Image Analysis, 102, 103498. doi:10.1016/j.media.2025.103498
  43. Penny, W. D., Stephan, K. E., Mechelli, A., & Friston, K. J. (2004). Comparing dynamic causal models. NeuroImage, 22(3), 1157-1172. doi:10.1016/j.neuroimage.2004.03.026
  44. Rosa, M. J., Friston, K., & Penny, W. (2012). Post-hoc selection of dynamic causal models. Journal of Neuroscience Methods, 208(1), 66-78. doi:10.1016/j.jneumeth.2012.04.013
  45. Smith, S. M., Miller, K. L., Salimi-Khorshidi, G., Webster, M., Beckmann, C. F., Nichols, T. E., Ramsey, J. D., & Woolrich, M. W. (2011). Network modelling methods for FMRI. NeuroImage, 54(2), 875-891. doi:10.1016/j.neuroimage.2010.08.063
  46. Barnett, L., & Seth, A. K. (2017). Detectability of Granger causality for subsampled continuous-time neurophysiological processes. Journal of Neuroscience Methods, 275, 93-121. doi:10.1016/j.jneumeth.2016.10.016
  47. Vink, J. J. T., Klooster, D. C. W., Ozdemir, R. A., Westover, M. B., Pascual-Leone, A., & Shafi, M. M. (2020). EEG Functional Connectivity is a Weak Predictor of Causal Brain Interactions. Brain Topography, 33(2), 221-237. doi:10.1007/s10548-020-00757-6
  48. Villaverde, A. F., Tsiantis, N., & Banga, J. R. (2019). Full observability and estimation of unknown inputs, states and parameters of nonlinear biological models. Journal of the Royal Society Interface, 16(156), 20190043. doi:10.1098/rsif.2019.0043
  49. Novelli, L., Barnett, L., Seth, A. K., & Razi, A. (2025). Minimum-Phase Property of the Hemodynamic Response Function, and Implications for Granger Causality in fMRI. Human Brain Mapping, 46(10), e70285. doi:10.1002/hbm.70285
  50. Jafarian, A., Karadag Assem, M., Kocagoncu, E., et al. (2024). Reliability of dynamic causal modelling of resting-state magnetoencephalography. Human Brain Mapping, 45(10), e26782. doi:10.1002/hbm.26782
  51. Yan, J., Zhang, S.-W., Zhang, C., Huang, W., Shi, J., & Chen, L. (2026). Dynamical Causality under Latent Confounders for Biological Network Reconstruction. IEEE Transactions on Pattern Analysis and Machine Intelligence. doi:10.1109/TPAMI.2026.3658839
  52. Lynn, C. W., Cornblath, E. J., Papadopoulos, L., et al. (2021). Broken detailed balance and entropy production in the human brain. PNAS, 118(47), e2109889118. doi:10.1073/pnas.2109889118
  53. Ishihara, K., & Shimazaki, H. (2025). State-space kinetic Ising model reveals task-dependent entropy flow in sparsely active nonequilibrium neuronal dynamics. Nature Communications, 16, 10852. doi:10.1038/s41467-025-66669-w
  54. Egger, A., Bayon, C., d'Almeida, J., et al. (2024). Chrono-EEG dynamics influencing hand gesture decoding: a 10-hour study. Scientific Reports, 14, 20247. doi:10.1038/s41598-024-70609-x
  55. Idziak, A., Inavalli, V. V. G. K., Bancelin, S., Arizono, M., & Nägerl, U. V. (2023). The Impact of Chemical Fixation on the Microanatomy of Mouse Organotypic Hippocampal Slices. eNeuro. doi:10.1523/ENEURO.0104-23.2023
  56. Benisty, H., Barson, D., Moberly, A. H., et al. (2024). Rapid fluctuations in functional connectivity of cortical networks encode spontaneous behavior. Nature Neuroscience. doi:10.1038/s41593-023-01498-y

Where to go next

Return to Verification for the authoritative card fields, to Datasets / L0 Practice for hands-on implementation, or to Verification Casework for cross-domain precedents.