Wiki: Baseline / Benchmark / Pre-registration / Model Card

From score reporting to the current artifact stack required by the claim

Mind Uploading Research Project

Public Page Updated: 2026-04-04 · Learning guide / current card-field sync

How to use this page

Read this first to avoid getting lost

This page explains the operational differences among baseline, benchmark, pre-registration, model card, and the additional cards that become necessary when a result depends on multimodal fusion, large-scale pretraining, shortcut resistance, language-facing decoding, route-specific measurement / inference claims, living-human proxy bundles, or sequential same-subject bridges. The current revision also shows where card names alone became too coarse and where field-level disclosure is now required.

  • A benchmark is not just a score sheet; on this site it also includes split rules, metric semantics, and benchmark governance.
  • A normal model card is not enough for every claim shape; some results need claim-triggered companion cards, and some now also need route-specific cards or logs.
  • For decode or representation claims, shortcut resistance is a separate audit from score reporting.
  • For language-facing text / speech outputs, the Neural Contribution Card is now treated separately from the generic shortcut audit.
  • For multimodal or atlas-prior claims, a Fusion Card is separate from synchronization middleware or co-registration, and now also has to separate effective-window mismatch, shared-vs-specific factors, quantity bridges, and bundle robustness.
  • For ESI, tractography, effective-connectivity, and thermodynamic claims, the current site rule now asks for route-specific disclosure rather than only a generic model card.
  • For several living-human proxy rows used together, a Human Proxy Composition Card is required before same-subject state-identification language is allowed, and it now also has to disclose role by row, regime compatibility, operational maturity, calibrator role, and disagreement topology.
  • For same-subject or same-brain sequential bridges, a State-Continuity Bridge Card is required before same-state language is allowed, and it now also has to name the carried object, tolerance / failure rule, and rescue route.
Best for
People who find baseline, benchmark, pre-registration, and model-card language hard to tell apart, and people who want to know which extra cards are required before a stronger claim is allowed
Reading time
12-18 minutes
Accuracy note
This page is a learning guide to the artifact stack. The authoritative card fields and stop rules still live on the public Verification page.

Relatively clear at this stage

What we know now

  • Comparable progress requires a baseline, a benchmark object, preregistered stopping rules, a result report, and explicit failure disclosure.
  • Benchmark meaning depends not only on the dataset and score, but also on split randomness, metric bundle, extra-data policy, operational constraints, and postmortems.
  • Observability Budget, Specificity & Shortcut Card, Neural Contribution Card, Fusion Card, Pretraining Card, route-specific cards / logs, Human Proxy Composition Card, Temporal Validity Card, Calibration & Abstention Card, and State-Continuity Bridge Card answer different failure modes.
  • Card names alone are no longer sufficient on this site: Fusion, Human Proxy Composition, and State-Continuity Bridge Cards each now require field-level disclosure to block newer forms of overreading.
  • A higher score can still be scientifically weak if the artifact stack does not match the claim being made.

Still unresolved beyond this point

What we still do not know

  • Which subsets of this artifact stack will become field-wide defaults beyond this site is still unsettled.
  • The exact minimum disclosure expected for negative results and failure examples will continue to evolve.

Learn the basics

Check the basics in the wiki

What the wiki is for

The wiki is a learning aid. For the project's official current synthesis, success criteria, and operating rules, always return to the public pages.

The shortest map

A baseline is the minimum comparison partner. A benchmark fixes not only the task and score but also the split rules, metric bundle, and operational reading of the result. Pre-registration fixes what success, failure, and abstention mean before the run starts. A model card reports what happened. Additional cards are then attached depending on the claim shape. Without that stack, a good score is still not comparable progress.

2026-03 addendum: L1 and above still need an Observability Budget

For L1 and higher results, this site still stacks the Observability Budget on top of the usual model card so the measurement stack, direct observables, residual latent state, claim ceiling, and abstention conditions are visible rather than implied by the score.

2026-03-25 addendum: a benchmark is not just data plus one score

The earlier version of this page still let the word benchmark sound like a static score sheet. That is too weak. The official EEG Challenge (2025) homepage states that the original challenge preprint became outdated during execution and that the website plus starter kit should be treated as current. The official rules require disclosure of additional pretraining data, pretrained models / fine-tuning method, and the single-GPU 20 GB inference-stage constraint, while the official leaderboard later disclosed that Challenge 2 samples had not been randomized, which changed the prize structure and what the ranking meant. Xiong et al. (2025) and Liu et al. (2026) then showed more generally that protocol and evaluation choices materially affect EEG-foundation-model comparisons. Likewise, Saito & Rehmsmeier (2015), Roy et al. (2021), Sun et al. (2017), and Vallat & Walker (2021) show why metric semantics also change what a score means. Therefore, on this site, a benchmark now includes the split / randomization rule, the task-matched metric bundle, the benchmark version, the extra-data / checkpoint policy, inference-stage restrictions, and organizer postmortems, not only a dataset name and one number.
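To make the split-rule point concrete, here is a minimal, self-contained sketch (assuming scikit-learn; the data are synthetic and every number is illustrative) of how a trial-shuffled split can reward an identity shortcut that a subject-wise split correctly refuses to credit:

```python
# Minimal sketch: why the split / randomization rule is part of the benchmark
# object. Synthetic data; every number here is illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

rng = np.random.default_rng(0)
n_subjects, trials_per_subject, n_features = 20, 50, 32

# Diagnosis-style setup: the label is constant within a subject, and each
# subject also carries a fixed feature offset (an identity shortcut).
subject_ids = np.repeat(np.arange(n_subjects), trials_per_subject)
labels = rng.integers(0, 2, size=n_subjects)[subject_ids]
offsets = rng.normal(0.0, 3.0, size=(n_subjects, n_features))
X = (offsets[subject_ids]
     + 0.05 * labels[:, None]                      # weak genuine task signal
     + rng.normal(size=(subject_ids.size, n_features)))

clf = RandomForestClassifier(n_estimators=100, random_state=0)
shuffled = cross_val_score(clf, X, labels,
                           cv=KFold(n_splits=5, shuffle=True, random_state=0))
grouped = cross_val_score(clf, X, labels, groups=subject_ids,
                          cv=GroupKFold(n_splits=5))
print(f"trial-shuffled split: {shuffled.mean():.2f}")  # inflated by identity leakage
print(f"subject-wise split:   {grouped.mean():.2f}")   # close to chance here
```

The two scores come from the same data and the same model; only the split rule differs, which is why this site treats the split rule as part of the benchmark object rather than a training detail.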

2026-03-25 addendum: a model card is not the whole artifact stack

The next weakness was to let a generic model card sound like the final reporting layer for every result. That is also too weak. Chaibub Neto et al. (2019), Xu et al. (2020), and Di et al. (2021) show why decoding and transfer claims can still ride on subject / acquisition shortcuts, so score reporting alone does not establish target-variable specificity. Kothe et al. (2025), Wei et al. (2020), Vafaii et al. (2024), and Chen et al. (2025) show why simultaneous or multimodal acquisition does not replace a fusion audit. Johansen et al. (2024), Li et al. (2025), Baadsvik et al. (2024), Hirschler et al. (2025), and Dagum et al. (2026) each constrain a different living-human quantity type with its own burden, rather than offering one already field-ready whole-brain meter. Lu et al. (2023), Bosch et al. (2022), MICrONS Consortium et al. (2025), and Attardo et al. (2015) show why same-subject or same-brain wording still leaves a sequential bridge burden. Therefore, on this site, the model card is only one layer in a claim-triggered artifact stack.

2026-03-30 addendum: generic companion cards are no longer enough for the current site rule

The next weakness on this page was narrower but important: it still listed only the earlier generic companion cards, even though the current public site now requires route-specific cards or logs for several claim families. The primary literature does not support compressing these routes into one generic reporting layer. Tang et al. (2023), d'Ascoli et al. (2025), and Wairagkar et al. (2025) show why language-facing outputs need a Neural Contribution Card plus temporal and calibration disclosure rather than only a score. Horrillo-Maysonnial et al. (2023), Rong et al. (2025), and Feng et al. (2025) show why ESI claims need validation-class, source-regime, and benchmark-object typing plus solver-disagreement disclosure. Gajwani et al. (2023), He et al. (2024), and Manzano-Patrón et al. (2025) show why tractography needs object typing rather than one graph headline. Smith et al. (2011), Barnett & Seth (2017), Villaverde et al. (2019), and Novelli et al. (2025) show why effective-connectivity graphs remain model-conditioned unless closure, node policy, and sampling sensitivity are disclosed. Lynn et al. (2021) and Ishihara & Shimazaki (2025) show why irreversibility language hides multiple estimator families and closure assumptions. Therefore, this page now separates generic companion cards from route-specific cards / logs instead of treating them as one bucket.

2026-04-04 addendum: three cards now need field-level disclosure, not name-level disclosure

The next weakness on this page became visible only after the route-stack expansion: three cards still sounded satisfied by a short label even though the current primary literature says otherwise. Kothe et al. (2025), Vafaii et al. (2024), Chen et al. (2025), Bolt et al. (2025), Epp et al. (2025), Rohaut et al. (2024), and Manasova et al. (2026) show why Fusion still has to separate synchronization, temporal-kernel relation, shared-vs-specific structure, quantity bridge, and bundle robustness. Li et al. (2025), Bøgh et al. (2024), Morgan et al. (2024), and Manasova et al. (2026) show why Human Proxy Composition still has to separate quantity type, operating-point dependence, method-family non-equivalence, and disagreement topology. Bosch et al. (2022), MICrONS Consortium et al. (2025), Gallego et al. (2020), Karpowicz et al. (2025), Wilson et al. (2025), and Wairagkar et al. (2025) show why State-Continuity Bridge still has to name the carried object, tolerance rule, and rescue route rather than relying on specimen identity or score survival alone. Therefore, this guide no longer treats those three cards as satisfied by a short name plus one sentence.

First separate the roles

| Artifact | Main role | What it fixes that the others do not |
| --- | --- | --- |
| Baseline | The minimum comparison partner. | Prevents a new score from being read without context. |
| Benchmark | The benchmark object: task, split rules, metric bundle, and governance. | Fixes what comparison actually means before any score is interpreted. |
| Pre-registration | The promise made before the run. | Fixes success, failure, stopping, and abstention rules before hindsight pressure appears. |
| Model card | The result report for one trained system. | Records scores, baselines, failure examples, compute usage, and practical weaknesses of the submitted system. |
| Observability Budget | The measurement-side ceiling. | Fixes what was directly observed, what remained latent, and which claim ceiling still applies. |
| Specificity & Shortcut Card | The shortcut audit for decode / biomarker / transfer claims. | Separates the target neural variable from subject, session, site, device, protocol, and other nuisance routes. |
| Neural Contribution Card | The language-facing shortcut audit. | Fixes task constraint, candidate set, prompt or language-model scaffold, no-brain / no-LM / shuffle controls, and subject cooperation for text / speech outputs. |
| Fusion Card | The multimodal / atlas-prior integration audit. | Fixes acquisition relation, lag audit, effective-window / temporal-kernel relation, geometry / co-registration scope, fusion model, shared-vs-specific component disclosure, quantity bridge / physiology grounding, unimodal baselines, complete-case / missing-modality disclosure, transfer or disagreement window, external calibration, and abstention. |
| Pretraining Card | The EEG foundation / self-supervised transfer audit. | Fixes corpus identity / overlap, harmonization, adaptation regime, benchmark provenance, and efficiency constraints. |
| Route-specific cards / logs | The claim-family-specific disclosure layer. | Types ESI, tractography, effective-connectivity, irreversibility, intervention, and boundary claims by their own failure modes instead of compressing them into one generic report. |
| Human Proxy Composition Card | The bundle audit for several living-human proxy rows. | Fixes proxy class, direct observable and evidence role by row, same-subject relation, effective time window / state axis, regime compatibility, operational maturity, calibrator role, model burden, method-family non-equivalence, agreement / disagreement topology plus resolution policy, incremental evidence, and residual latent-state ceiling. |
| State-Continuity Bridge Card | The sequential bridge audit. | Fixes acquisition order, elapsed time, regime continuity, coordinate transfer / deformation, carried object / bridge witness, tolerance / failure rule, rescue route versus raw continuity, bridge-validation rung, and residual drift ceiling before same-state language is allowed. |
| Temporal Validity Card | The time-generalization ceiling. | Fixes fixed-decoder interval, state annotation, recalibration burden, drift handling, and transfer ceiling across hours to days. |
| Calibration & Abstention Card | The uncertainty and fallback audit. | Fixes fit / calibration / test separation, evaluation slice, coverage-risk target, and fallback behavior when outputs include confidence or abstention. |
| Failure examples / negative results | The record of where things broke. | Prevents the field from learning only from accidental successes. |

What a benchmark fixes on this site

| Benchmark field | Why it matters | What goes wrong if it is missing |
| --- | --- | --- |
| Task and target definition | States exactly what is predicted or detected. | A score can be overread as if it applied to a broader task family. |
| Split / randomization rule | Defines whether subject, session, trial order, and hidden grouping were controlled. | Identity or contiguous-trial shortcuts can change the meaning of the leaderboard. |
| Task-matched metric bundle | Fixes which metrics are needed for this task, such as false alarms, latency, macro-F1, or kappa. | One headline score can hide the real failure mode. |
| Extra-data / checkpoint policy | States whether outside data or pretrained models changed the comparison. | Transfer gains can be misread as if they came only from the submitted pipeline. |
| Operational restrictions | Fixes inference-time compute, code-submission conditions, and other deployment-side constraints. | A result can be misread as portable when it depended on a looser operating regime. |
| Version / postmortem status | States whether organizer updates, starter-kit changes, or later error disclosures changed the benchmark object. | An obsolete preprint or early leaderboard can be overread as the final benchmark truth. |
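As a concrete illustration of the task-matched metric bundle row, the following sketch (synthetic labels and predictions, scikit-learn metrics, all numbers illustrative) shows how a single headline accuracy can coexist with a chance-level kappa and a near-total miss rate on a rare-event detection task:

```python
# Minimal sketch: a task-matched metric bundle for a rare-event detection
# task. Labels and predictions are synthetic; all numbers are illustrative.
import numpy as np
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             confusion_matrix, f1_score)

rng = np.random.default_rng(1)
y_true = (rng.random(10_000) < 0.02).astype(int)   # ~2% positive events
y_pred = (rng.random(10_000) < 0.01).astype(int)   # near-random, rare alarms

tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
bundle = {
    "accuracy": accuracy_score(y_true, y_pred),           # looks strong
    "macro_f1": f1_score(y_true, y_pred, average="macro", zero_division=0),
    "kappa": cohen_kappa_score(y_true, y_pred),           # near chance
    "false_alarms_per_1k_neg": 1000 * fp / max(tn + fp, 1),
    "miss_rate": fn / max(tp + fn, 1),                    # near total
}
for name, value in bundle.items():
    print(f"{name:>24}: {value:.3f}")
```

A headline accuracy near 0.97 survives here alongside a kappa near zero and a miss rate near one; only the bundle, not any single number, exposes the real failure mode.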

Which extra artifacts are triggered by the claim

| Claim shape | Base stack | Extra artifact to add | What it blocks |
| --- | --- | --- | --- |
| Any L1+ measurement claim | Baseline + Benchmark + Pre-registration + Model card | Observability Budget | Stops a measurement stack from being overread as if it directly observed more than it did. |
| Decode / biomarker / transfer claim | Base stack + Observability Budget | Specificity & Shortcut Card | Stops subject / session / site / device / protocol shortcuts from being mistaken for target-variable capture. |
| Foundation / self-supervised EEG claim | Base stack + Observability Budget | Pretraining Card plus Specificity & Shortcut Card | Stops a transfer win from being overread as generic portability or shortcut-resistant representation learning. |
| Language-facing text / speech / brain-to-text claim | Base stack + Observability Budget | Neural Contribution Card plus Specificity & Shortcut Card; add Calibration & Abstention Card when confidence, retrieval-set, or prediction-set language is reported. | Stops fluent output or top-k retrieval from being overread as if neural contribution, confidence, and prompt dependence were already separated. |
| EEG source imaging / inverse reconstruction claim | Base stack + Observability Budget | Inverse-Solver Agreement Log plus named validation class, source regime, montage / coverage policy, and benchmark-object disclosure. | Stops one localization headline from being overread as if depth bias, source extent, coverage geometry, and solver disagreement were already resolved. |
| Diffusion-MRI tractography / structural-connectome claim | Base stack + Observability Budget | Tractography route card | Stops one tractography graph from being overread as if acquisition, endpoint assignment, graph construction, uncertainty, and calibration were fixed. |
| Effective-connectivity / DCM / directed-graph claim | Base stack + Observability Budget | Effective-connectivity route card | Stops a directed graph from being overread as discovered causal wiring when candidate-model family, closure, node policy, and sampling sensitivity remain implicit. |
| Thermodynamic / irreversibility claim | Base stack + Observability Budget | Irreversibility / thermodynamic route card | Stops arrow-of-time or entropy-flow language from being overread as if signal route, coarse-graining, estimator family, and quantity type were already fixed. |
| Multimodal or atlas-prior claim | Base stack + Observability Budget | Fusion Card | Stops simultaneity, synchronization middleware, or a prior from standing in for validated fusion. |
| Intervention / closed-loop claim | Base stack + Observability Budget | Intervention Card; for embodied or human-in-the-loop claims, also add the Body / Environment Boundary Card. | Stops low latency or one control trace from standing in for a typed intervention, preserved loop boundary, or safe deployment claim. |
| Several living-human proxy rows used together | Base stack + Observability Budget | Human Proxy Composition Card | Stops proxy-rich bundles from being overread as same-subject state identification when role by row, regime compatibility, maturity, and disagreement remain implicit. |
| Same-subject / same-brain sequential bridge | Base stack + Observability Budget | State-Continuity Bridge Card; add Temporal Validity Card when the bridge crosses hours to days or a fixed-decoder interval is claimed. | Stops specimen identity, score survival, or rescue-dependent stability from being overread as same-state evidence or stable time-generalization. |
| Outputs with probabilities, intervals, prediction sets, or abstention | Base stack + the cards already triggered by the claim | Calibration & Abstention Card | Stops raw confidence, threshold tuning, or selective reporting from being overread as calibrated risk control. |
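One way to read this table operationally is to treat the claim shape as a key and the triggered artifacts as a lookup. The sketch below mirrors the table; the helper function and the claim-shape keys are this guide's invention, not an official site API:

```python
# Minimal sketch: the claim-triggered stack as an explicit lookup, mirroring
# the table above. Card names follow this page; the helper and the claim-shape
# keys are hypothetical.
BASE_STACK = ["Baseline", "Benchmark", "Pre-registration", "Model card",
              "Observability Budget"]

EXTRA_CARDS = {
    "decode_transfer":   ["Specificity & Shortcut Card"],
    "foundation_eeg":    ["Pretraining Card", "Specificity & Shortcut Card"],
    "language_facing":   ["Neural Contribution Card",
                          "Specificity & Shortcut Card"],
    "multimodal_fusion": ["Fusion Card"],
    "proxy_bundle":      ["Human Proxy Composition Card"],
    "sequential_bridge": ["State-Continuity Bridge Card"],
}

def required_artifacts(claim_shapes, reports_confidence=False,
                       crosses_hours_to_days=False):
    """Collect every artifact triggered by the declared claim shapes."""
    stack = list(BASE_STACK)
    for shape in claim_shapes:
        stack += EXTRA_CARDS[shape]
    if reports_confidence:          # probabilities, sets, or abstention
        stack.append("Calibration & Abstention Card")
    if crosses_hours_to_days:       # fixed-decoder interval, drift exposure
        stack.append("Temporal Validity Card")
    return list(dict.fromkeys(stack))   # de-duplicate, preserve order

print(required_artifacts(["language_facing"], reports_confidence=True))
```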

Three cards whose short labels had become too weak

The next problem was not card count but card sufficiency. After the stack expansion, these three cards still sounded satisfied by a short label even though the current site rule had already become stricter. This section brings the learning guide up to that current field-level requirement.

| Card | Why the older short description is now too weak | Minimum fields this guide now expects |
| --- | --- | --- |
| Fusion Card | Kothe et al. (2025), Vafaii et al. (2024), Chen et al. (2025), Bolt et al. (2025), Epp et al. (2025), Rohaut et al. (2024), and Manasova et al. (2026) show why synchronized acquisition, shared low-frequency structure, a quantity bridge, and bundle robustness are different achievements. | Acquisition relation, effective-window / temporal-kernel relation, shared-vs-specific component disclosure, quantity bridge / physiology grounding, unimodal and prior-only baselines, complete-case / missing-modality policy, transfer or disagreement window, external calibration, and abstention. |
| Human Proxy Composition Card | Li et al. (2025), Bøgh et al. (2024), Morgan et al. (2024), Vafaii et al. (2024), Chen et al. (2025), and Manasova et al. (2026) show why quantity type, operating point, common-driver burden, and disagreement topology still matter even when all rows are real human data. | Proxy class, direct observable and evidence role by row, effective time window / state axis, regime compatibility, operational maturity, calibrator role, method-family non-equivalence, cross-row agreement / disagreement plus resolution policy, increment over the strongest single row, and residual latent-state ceiling. |
| State-Continuity Bridge Card | Bosch et al. (2022), MICrONS Consortium et al. (2025), Gallego et al. (2020), Van De Ville et al. (2021), Karpowicz et al. (2025), Wilson et al. (2025), and Wairagkar et al. (2025) show why specimen identity, carried object, rescue strategy, and score survival are different objects. | Bridge class, acquisition order, elapsed time, regime continuity, coordinate transfer / deformation, carried object / witness, tolerance / failure rule, rescue route versus raw continuity, bridge-validation rung, and residual drift ceiling. |
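The sketch below shows what field-level rather than name-level checking means in practice. The field lists abbreviate the table above, and the validator itself is hypothetical:

```python
# Minimal sketch: field-level (not name-level) card checking. The field lists
# abbreviate the table above; the validator itself is hypothetical.
REQUIRED_FIELDS = {
    "Fusion Card": [
        "acquisition_relation", "effective_window_relation",
        "shared_vs_specific", "quantity_bridge", "unimodal_baselines",
        "missing_modality_policy", "external_calibration", "abstention",
    ],
    "Human Proxy Composition Card": [
        "proxy_class_by_row", "evidence_role_by_row", "regime_compatibility",
        "operational_maturity", "calibrator_role", "disagreement_topology",
        "increment_over_best_row", "latent_state_ceiling",
    ],
    "State-Continuity Bridge Card": [
        "acquisition_order", "elapsed_time", "regime_continuity",
        "carried_object", "tolerance_failure_rule", "rescue_route",
        "bridge_validation_rung", "residual_drift_ceiling",
    ],
}

def missing_fields(card_name, disclosure):
    """Return the fields a disclosure leaves empty; absent counts as empty."""
    return [f for f in REQUIRED_FIELDS[card_name] if not disclosure.get(f)]

# A name-level disclosure that would have passed the older rule:
print(missing_fields("Fusion Card", {"acquisition_relation": "simultaneous"}))
```

A disclosure that only names the card fails field by field here, which is exactly the overreading the current rule is meant to block.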

Why route-specific cards had to be added

| Claim family | Why a generic score sheet is too weak | Card or log this site now asks for |
| --- | --- | --- |
| Language-facing decode / speech | High scores can still come from candidate-set restriction, language priors, prompt scaffolds, or session-specific support rather than the target neural route alone. | Neural Contribution Card plus Specificity & Shortcut Card; add Calibration & Abstention Card and Temporal Validity Card when the claim leaves same-session. |
| EEG source imaging / inverse reconstruction | A single localization score is too weak because validation class, source regime, montage / coverage policy, source depth or extent, and solver disagreement can all change what the result means. | Inverse-Solver Agreement Log plus named validation class and benchmark-object disclosure. |
| Tractography / structural connectome | Hub maps and connectome metrics can shift with acquisition / harmonization, cortical endpoint assignment, graph construction, uncertainty routing, and external calibration; the graph is not one fixed object by default. | Tractography route card. |
| Effective connectivity / DCM | The output still depends on candidate-model family, observed-subsystem closure / latent-confound audit, node-definition policy, sampling / transformation sensitivity, validation, and reliability window. | Effective-connectivity route card. |
| Thermodynamic irreversibility | Different papers compute different quantities from different signal routes, coarse-grainings, and estimator families, so one irreversibility headline does not name one measurement object. | Irreversibility / thermodynamic route card. |
| Multimodal / atlas-prior integration | A synchronized or atlas-informed result can still mix incompatible temporal objects, physiology-linked shared factors, missing-modality slices, and modality-specific disagreements instead of one validated biological quantity. | Fusion Card with effective-window, shared-vs-specific, quantity-bridge, complete-case, and disagreement disclosure. |
| Closed loop / embodied controller | Latency alone does not tell you what was perturbed, which sensory / motor / interoceptive channels were preserved or omitted, or how far the result generalizes across time. | Intervention Card plus Body / Environment Boundary Card; add Temporal Validity Card when the claim rises above a same-session demo. |
| Living-human proxy bundle | Proxy-rich human evidence can still mix different quantity types, spatial units, timescales, model burdens, role assignments, and disagreement topologies rather than one same-subject state sample. | Human Proxy Composition Card with role by row, regime compatibility, maturity, calibrator role, and disagreement disclosure. |
| Sequential same-subject / same-brain bridge | Specimen identity does not by itself fix state continuity across fixation, deformation, sleep / wake regime, elapsed time, cross-day reacquisition, or adaptation-assisted score rescue. | State-Continuity Bridge Card, plus Temporal Validity Card when the bridge spans hours to days, with carried object, tolerance rule, and rescue-mode disclosure. |
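For one of these routes, here is what the disclosure layer could look like as data rather than prose: a sketch of an Inverse-Solver Agreement Log entry that records pairwise peak-localization disagreement across solvers instead of a single headline. The solver outputs are stubbed with synthetic source maps, and the solver names and log format are hypothetical:

```python
# Minimal sketch: one Inverse-Solver Agreement Log entry. Real solver outputs
# (different inverse methods run on the same epoch) are stubbed with synthetic
# source maps; the names and the log format are hypothetical.
from itertools import combinations
import numpy as np

rng = np.random.default_rng(3)
positions = rng.uniform(-0.07, 0.07, size=(5000, 3))  # source grid (meters)

def peak_location(source_map):
    """Location of the strongest source in a map over the grid."""
    return positions[np.argmax(np.abs(source_map))]

# Stand-ins for three solvers run on the same data.
solver_maps = {name: rng.normal(size=positions.shape[0])
               for name in ("solver_a", "solver_b", "solver_c")}

agreement_log = {}
for a, b in combinations(solver_maps, 2):
    d_mm = 1000 * np.linalg.norm(peak_location(solver_maps[a])
                                 - peak_location(solver_maps[b]))
    agreement_log[f"{a} vs {b}"] = round(float(d_mm), 1)

print("pairwise peak disagreement (mm):", agreement_log)
```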

Why the stack is cumulative

| If this is missing | The usual failure mode |
| --- | --- |
| No baseline | It becomes unclear whether a gain is meaningful or trivial. |
| No benchmark object | Different runs or papers are scored under different hidden rules and still get compared anyway. |
| No pre-registration | Success and stopping conditions can drift after the result is known. |
| No model card | Only the headline number remains visible while failure modes disappear. |
| No companion card matched to the claim | The result is silently promoted above the evidence it actually supports. |
| No negative-result record | The same failure gets rediscovered and renamed as if it were new. |

View it as a minimal flow

01

Put down a baseline

Start with the minimum comparison partner, even if it is simple.

02

Fix the benchmark object

Align the task, split / randomization rule, task-matched metric bundle, and operational benchmark constraints before comparing systems.

03

Pre-register success, failure, and abstention

Decide in advance what counts as passing, stopping, or refusing a stronger claim.

04

Report the result with the triggered companion cards

Leave the model card, the Observability Budget, and any extra cards required by the claim shape.

05

Keep failure examples visible

Record where the pipeline broke, not only where it happened to work.
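A minimal sketch of step 03 as an artifact: freeze the pre-registration and publish its digest before the first run, so any later drift in success, stopping, or abstention rules is detectable. All field names and thresholds here are hypothetical:

```python
# Minimal sketch of step 03: freeze the pre-registration and publish its
# digest before the first run. Field names and thresholds are hypothetical.
import hashlib
import json

prereg = {
    "benchmark": {"task": "sleep-staging", "version": "v2.1",
                  "split_rule": "subject-wise 5-fold", "seed": 17},
    "metric_bundle": ["macro_f1", "kappa", "false_alarms_per_1k"],
    "success": {"macro_f1": ">= baseline + 0.03 on the held-out subjects"},
    "stop": {"max_runs": 3, "on_leakage_detected": "halt and disclose"},
    "abstain": "no cross-site claim unless the Specificity Card is resolved",
}

# Any later edit to the rules changes this digest, so drift is detectable.
frozen = json.dumps(prereg, sort_keys=True).encode()
print("prereg digest:", hashlib.sha256(frozen).hexdigest()[:16])
```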

Why failure examples stay in the stack

If only successful cases are kept, the field learns a distorted map of where the claim ceiling really is. On this site, a usable failure record states the condition, which metric failed, whether the failure came from leakage, OOD shift, compute limits, bridge failure, or fusion mismatch, and what stronger claim therefore remains blocked.

Minimum failure record

State the data regime, split rule, metric bundle, triggered cards, and the first place the claim lost support. A vague sentence such as "it did not generalize" is weaker than a report that says cross-site transfer collapsed after harmonization changed, false alarms doubled, and the Specificity & Shortcut Card stayed unresolved.
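The same minimum failure record, written as data rather than prose; the field names are hypothetical and the values paraphrase the example above:

```python
# Minimal sketch of the failure record above as data rather than prose.
# Field names are hypothetical; values paraphrase the example in the text.
from dataclasses import dataclass

@dataclass
class FailureRecord:
    data_regime: str
    split_rule: str
    metric_bundle: list
    triggered_cards: list
    first_break: str        # first place the claim lost support
    blocked_claim: str      # stronger claim that therefore stays blocked

report = FailureRecord(
    data_regime="cross-site transfer after a harmonization change",
    split_rule="site-wise hold-out",
    metric_bundle=["macro_f1", "false_alarms_per_1k"],
    triggered_cards=["Specificity & Shortcut Card (unresolved)"],
    first_break="false alarms doubled on the new site",
    blocked_claim="portable cross-site detection",
)
print(report)
```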

Minimum checks when reading public pages

Checklist

  • Is there a baseline? Is it clear what the new result is compared against?
  • Is the benchmark object fixed? Are the task, split / randomization rule, metric bundle, version, and governance visible?
  • Is there a pre-registration? Are success, failure, stopping, and abstention rules stated before the run?
  • Is there a model card plus Observability Budget? Are scores, weaknesses, failure examples, direct observables, latent state, and claim ceiling visible?
  • If it is a decode / biomarker / transfer result, is a Specificity & Shortcut Card visible? Are subject, session, site, device, and protocol shortcuts audited separately?
  • If it emits text or speech, is a Neural Contribution Card visible? Are candidate set, language-model or prompt scaffold, no-brain / no-LM controls, and subject cooperation disclosed?
  • If it is a foundation / self-supervised EEG result, is a Pretraining Card visible? Are corpus overlap, harmonization, adaptation regime, benchmark version, and inference-stage restrictions written?
  • If it is an ESI, tractography, effective-connectivity, or thermodynamic claim, is the route-specific card or log visible? Are validation class / graph object / closure / estimator-family details written rather than hidden in one headline?
  • If it is multimodal or atlas-prior, is a Fusion Card visible? Are acquisition relation, effective-window / temporal-kernel relation, shared-vs-specific disclosure, quantity bridge, complete-case or missing-modality policy, and external calibration written?
  • If it is a closed-loop or intervention result, is an Intervention Card visible, and if embodiment matters is a Body / Environment Boundary Card visible? Are trigger rule, timing audit, preserved loop channels, and slow-boundary omissions disclosed?
  • If several living-human proxy rows are used together, is a Human Proxy Composition Card visible? Are proxy class, direct observable and role by row, regime compatibility, maturity / calibrator role, disagreement topology, and increment over the strongest single row disclosed?
  • If the claim bridges same-subject or same-brain measurements across regimes, is a State-Continuity Bridge Card visible? Are carried object, tolerance / failure rule, rescue route, elapsed time, regime continuity, deformation / registration burden, and bridge-validation rung written?
  • If the claim spans hours to days or reports confidence / abstention, are Temporal Validity and Calibration & Abstention Cards visible? Are recalibration burden, transfer ceiling, fit / calibration / test separation, and fallback behavior written?
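For the last checklist item, a minimal sketch (scikit-learn, synthetic data, illustrative numbers) of the fit / calibration / test separation and coverage-risk reporting the Calibration & Abstention Card asks for: the abstention threshold is chosen on the calibration split only, and coverage plus selective risk are reported on a disjoint test split:

```python
# Minimal sketch of the coverage-risk check behind a Calibration & Abstention
# Card: threshold chosen on the calibration split only, coverage and selective
# risk reported on a disjoint test split. Synthetic data; numbers illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(6000, 8))
y = (X[:, 0] + 0.5 * rng.normal(size=6000) > 0).astype(int)

# Fit / calibration / test separation, as the card requires.
X_fit, X_rest, y_fit, y_rest = train_test_split(X, y, test_size=0.5,
                                                random_state=0)
X_cal, X_test, y_cal, y_test = train_test_split(X_rest, y_rest, test_size=0.5,
                                                random_state=0)
model = LogisticRegression().fit(X_fit, y_fit)

target_risk = 0.05     # pre-registered: <= 5% error on accepted inputs
cal_conf = model.predict_proba(X_cal).max(axis=1)
cal_correct = model.predict(X_cal) == y_cal

# Smallest confidence threshold whose calibration-split selective risk
# meets the target; abstain everywhere (threshold 1.0) if none does.
candidates = np.quantile(cal_conf, np.linspace(0.0, 0.99, 100))
threshold = min((t for t in candidates
                 if 1 - cal_correct[cal_conf >= t].mean() <= target_risk),
                default=1.0)

test_conf = model.predict_proba(X_test).max(axis=1)
accepted = test_conf >= threshold
coverage = accepted.mean()
risk = 1 - (model.predict(X_test)[accepted] == y_test[accepted]).mean()
print(f"threshold={threshold:.3f}  coverage={coverage:.2f}  "
      f"selective risk={risk:.3f}")
```

Reporting the threshold, the coverage, and the selective risk together is what separates calibrated risk control from post-hoc threshold tuning.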

References

  1. Saito, T., & Rehmsmeier, M. (2015). The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets. doi:10.1371/journal.pone.0118432
  2. Roy, Y., Banville, H., Albuquerque, I., et al. (2021). Deep learning-based electroencephalography analysis: a systematic review. doi:10.1016/j.ebiom.2021.103275
  3. Sun, H., Paixao, L., Oliva, J. T., et al. (2017). Brain age from the electroencephalogram of sleep. doi:10.1093/sleep/zsx139
  4. Vallat, R., & Walker, M. P. (2021). An open-source, high-performance tool for automated sleep staging. doi:10.7554/eLife.70092
  5. Chaibub Neto, E., Pratap, A., Perumal, T. M., et al. (2019). Detecting the impact of subject characteristics on machine learning-based diagnostic applications. doi:10.1038/s41746-019-0178-x
  6. Xu, M., Yao, S., Wei, Z., et al. (2020). Cross-dataset variability problem in EEG decoding with deep learning. doi:10.3389/fnhum.2020.00103
  7. Di, Y., An, X., Zhong, W., Liu, S., & Ming, D. (2021). The Time-Robustness Analysis of Individual Identification Based on Resting-State EEG. doi:10.3389/fnhum.2021.672946
  8. EEG Challenge (2025) homepage. official site
  9. EEG Challenge (2025) rules. official rules
  10. EEG Challenge (2025) starter kit. official starter kit page
  11. EEG Challenge (2025) leaderboard and organizer postmortem. official leaderboard
  12. Xiong, W., Li, J., Li, J., Zhu, K., & Jiang, C. (2025/2026). EEG-FM-Bench: A Comprehensive Benchmark for the Systematic Evaluation of EEG Foundation Models. arXiv:2508.17742
  13. Liu, D., Chen, Y., Chen, Z., Cui, Z., Wen, Y., An, J., Luo, J., & Wu, D. (2026). EEG Foundation Models: Progresses, Benchmarking, and Open Problems. arXiv:2601.17883
  14. Kothe, C., Shirazi, S. Y., Stenner, T., et al. (2025). The lab streaming layer for synchronized multimodal recording. doi:10.1162/IMAG.a.136
  15. Wei, H., Jafarian, A., Zeidman, P., et al. (2020). Bayesian fusion and multimodal DCM for EEG and fMRI. doi:10.1016/j.neuroimage.2020.116595
  16. Vafaii, H., Mandino, F., Desrosiers-Gregoire, G., et al. (2024). Multimodal measures of spontaneous brain activity reveal both common and divergent patterns of cortical functional organization. doi:10.1038/s41467-023-44363-z
  17. Chen, J. E., Lewis, L. D., Coursey, S. E., et al. (2025). Simultaneous EEG-PET-MRI identifies temporally coupled and spatially structured brain dynamics across wakefulness and NREM sleep. doi:10.1038/s41467-025-64414-x
  18. Johansen, A., Beliveau, V., Colliander, E., et al. (2024). An In Vivo High-Resolution Human Brain Atlas of Synaptic Density. doi:10.1523/JNEUROSCI.1750-23.2024
  19. Li, X., Zhu, X.-H., Li, Y., et al. (2025). Quantitative mapping of key glucose metabolic rates in the human brain using dynamic deuterium magnetic resonance spectroscopic imaging. doi:10.1093/pnasnexus/pgaf072
  20. Baadsvik, E. L., Weiger, M., Froidevaux, R., et al. (2024). Myelin bilayer mapping in the human brain in vivo. doi:10.1002/mrm.29998
  21. Hirschler, L., et al. (2025). Region-specific drivers of CSF mobility measured with MRI in humans. doi:10.1038/s41593-025-02073-3
  22. Dagum, P., et al. (2026). The glymphatic system clears amyloid beta and tau from brain to plasma in humans. doi:10.1038/s41467-026-68374-8
  23. Lu, X., Han, X., Meirovitch, Y., et al. (2023). Preserving extracellular space for high-quality optical and ultrastructural studies of whole mammalian brains. doi:10.1016/j.crmeth.2023.100520
  24. Bosch, C., Ackels, T., Pacureanu, A., et al. (2022). Functional and multiscale 3D structural investigation of brain tissue through correlative in vivo physiology, synchrotron microtomography and volume electron microscopy. doi:10.1038/s41467-022-30199-6
  25. MICrONS Consortium, Bae, J. A., Lee, W.-C. A., et al. (2025). Functional connectomics spanning multiple areas of mouse visual cortex. doi:10.1038/s41586-025-08790-w
  26. Attardo, A., Fitzgerald, J. E., & Schnitzer, M. J. (2015). Impermanence of dendritic spines in live adult CA1 hippocampus. doi:10.1038/nature14467
  27. Tang, J., LeBel, A., Jain, S., & Huth, A. G. (2023). Semantic reconstruction of continuous language from non-invasive brain recordings. Nature Neuroscience, 26, 858-866. doi:10.1038/s41593-023-01304-9
  28. Défossez, A., Caucheteux, C., Rapin, J., Kabeli, O., & King, J.-R. (2023). Decoding speech perception from non-invasive brain recordings. Nature Machine Intelligence, 5, 1097-1107. doi:10.1038/s42256-023-00714-5
  29. d'Ascoli, S., Bel, C., Rapin, J., Banville, H., Benchetrit, Y., Pallier, C., & King, J.-R. (2025). Towards decoding individual words from non-invasive brain recordings. Nature Communications, 16, 10521. doi:10.1038/s41467-025-65499-0
  30. Wairagkar, M., Card, N. S., Singer-Clark, T., Hou, X., Iacobacci, C., Miller, L. M., Hochberg, L. R., Brandman, D. M., & Stavisky, S. D. (2025). An instantaneous voice-synthesis neuroprosthesis. Nature, 644(8075), 145-152. doi:10.1038/s41586-025-09127-3
  31. Horrillo-Maysonnial, A., Avigdor, T., Abdallah, C., et al. (2023). Targeted density electrode placement achieves high concordance with traditional high-density EEG for electrical source imaging in epilepsy. Clinical Neurophysiology, 156, 262-271. doi:10.1016/j.clinph.2023.08.009
  32. Rong, J., Sun, R., Joseph, B., Worrell, G., & He, B. (2025). Deep learning-based EEG source imaging is robust under varying electrode configurations. Clinical Neurophysiology, 175, 2010730. doi:10.1016/j.clinph.2025.04.009
  33. Unnwongse, K., Van Klink, N., Tousseyn, S., et al. (2023). Validating EEG source imaging using intracranial electrical stimulation. Brain Communications, 5(1), fcad023. doi:10.1093/braincomms/fcad023
  34. Hao, S., Zhao, H., Feng, Z., et al. (2025). HD-EEG source imaging with simultaneous SEEG recording in drug-resistant epilepsy. Epilepsia, 66(11), 4451-4464. doi:10.1111/epi.18552
  35. Pascarella, A., Mikulan, E., Sciacchitano, F., et al. (2023). An in-vivo validation of ESI methods with focal sources. NeuroImage, 277, 120219. doi:10.1016/j.neuroimage.2023.120219
  36. Feng, Z., Guan, C., & Sun, Y. (2025). Block-Champagne: A novel Bayesian framework for imaging extended E/MEG source. IEEE Transactions on Medical Imaging. doi:10.1109/TMI.2025.3642620
  37. Gajwani, M., Oldham, S., Pang, J. C., Arnatkevičiūtė, A., Tiego, J., Bellgrove, M. A., & Fornito, A. (2023). Can hubs of the human connectome be identified consistently with diffusion MRI? Network Neuroscience, 7(4), 1277-1304. doi:10.1162/netn_a_00324
  38. He, Y., Hong, Y., Wu, Y., et al. (2024). Spherical-deconvolution informed filtering of tractograms changes laterality of structural connectome. NeuroImage, 297, 120904. doi:10.1016/j.neuroimage.2024.120904
  39. McMaster, E. M., Newlin, N. R., Rudravaram, G., et al. (2025). Harmonized connectome resampling for variance in voxel sizes. Magnetic Resonance Imaging, 122, 110424. doi:10.1016/j.mri.2025.110424
  40. Bramati, I. B., Szczupak, D., Carneiro Monteiro, M., Meireles, F., Menezes Guimarães, D., Dean, R. J., Paul, L. K., & Tovar-Moll, F. (2026). Diffusion MRI sampling schemes bias diffusion metrics and tractography. Frontiers in Neuroimaging, 5, 1670604. doi:10.3389/fnimg.2026.1670604
  41. Manzano-Patrón, J. P., Deistler, M., Schröder, C., et al. (2025). Uncertainty mapping and probabilistic tractography using Simulation-based Inference in diffusion MRI: A comparison with classical Bayes. Medical Image Analysis, 103, 103580. doi:10.1016/j.media.2025.103580
  42. Zhu, S., Huszar, I. N., Cottaar, M., et al. (2025). Imaging the structural connectome with hybrid MRI-microscopy tractography. Medical Image Analysis, 102, 103498. doi:10.1016/j.media.2025.103498
  43. Penny, W. D., Stephan, K. E., Mechelli, A., & Friston, K. J. (2004). Comparing dynamic causal models. NeuroImage, 22(3), 1157-1172. doi:10.1016/j.neuroimage.2004.03.026
  44. Rosa, M. J., Friston, K., & Penny, W. (2012). Post-hoc selection of dynamic causal models. Journal of Neuroscience Methods, 208(1), 66-78. doi:10.1016/j.jneumeth.2012.04.013
  45. Smith, S. M., Miller, K. L., Salimi-Khorshidi, G., Webster, M., Beckmann, C. F., Nichols, T. E., Ramsey, J. D., & Woolrich, M. W. (2011). Network modelling methods for FMRI. NeuroImage, 54(2), 875-891. doi:10.1016/j.neuroimage.2010.08.063
  46. Barnett, L., & Seth, A. K. (2017). Detectability of Granger causality for subsampled continuous-time neurophysiological processes. Journal of Neuroscience Methods, 275, 93-121. doi:10.1016/j.jneumeth.2016.10.016
  47. Vink, J. J. T., Klooster, D. C. W., Ozdemir, R. A., Westover, M. B., Pascual-Leone, A., & Shafi, M. M. (2020). EEG Functional Connectivity is a Weak Predictor of Causal Brain Interactions. Brain Topography, 33(2), 221-237. doi:10.1007/s10548-020-00757-6
  48. Villaverde, A. F., Tsiantis, N., & Banga, J. R. (2019). Full observability and estimation of unknown inputs, states and parameters of nonlinear biological models. Journal of the Royal Society Interface, 16(156), 20190043. doi:10.1098/rsif.2019.0043
  49. Novelli, L., Barnett, L., Seth, A. K., & Razi, A. (2025). Minimum-Phase Property of the Hemodynamic Response Function, and Implications for Granger Causality in fMRI. Human Brain Mapping, 46(10), e70285. doi:10.1002/hbm.70285
  50. Jafarian, A., Karadag Assem, M., Kocagoncu, E., et al. (2024). Reliability of dynamic causal modelling of resting-state magnetoencephalography. Human Brain Mapping, 45(10), e26782. doi:10.1002/hbm.26782
  51. Yan, J., Zhang, S.-W., Zhang, C., Huang, W., Shi, J., & Chen, L. (2026). Dynamical Causality under Latent Confounders for Biological Network Reconstruction. IEEE Transactions on Pattern Analysis and Machine Intelligence. doi:10.1109/TPAMI.2026.3658839
  52. Lynn, C. W., Cornblath, E. J., Papadopoulos, L., et al. (2021). Broken detailed balance and entropy production in the human brain. PNAS, 118(47), e2109889118. doi:10.1073/pnas.2109889118
  53. Ishihara, K., & Shimazaki, H. (2025). State-space kinetic Ising model reveals task-dependent entropy flow in sparsely active nonequilibrium neuronal dynamics. Nature Communications, 16, 10852. doi:10.1038/s41467-025-66669-w
  54. Egger, A., Bayon, C., d'Almeida, J., et al. (2024). Chrono-EEG dynamics influencing hand gesture decoding: a 10-hour study. Scientific Reports, 14, 20247. doi:10.1038/s41598-024-70609-x
  55. Idziak, A., Inavalli, V. V. G. K., Bancelin, S., Arizono, M., & Nägerl, U. V. (2023). The Impact of Chemical Fixation on the Microanatomy of Mouse Organotypic Hippocampal Slices. eNeuro. doi:10.1523/ENEURO.0104-23.2023
  56. Benisty, H., Barson, D., Moberly, A. H., et al. (2024). Rapid fluctuations in functional connectivity of cortical networks encode spontaneous behavior. Nature Neuroscience. doi:10.1038/s41593-023-01498-y

Where to go next

Return to Verification for the authoritative card fields, to Datasets / L0 Practice for hands-on implementation, or to Verification Casework for cross-domain precedents.