Wiki: How to read claims and evidence

Basic rule

When a neuroscience or whole-brain emulation (WBE) headline sounds strong, do not start by asking whether it sounds impressive. Ask what was directly observed, what was only inferred through a model or prior, and what the result still does not identify. This page is the shortest beginner route for doing that.

2026-03-26 deepening: why this beginner page needed another rewrite

The 2026-03-19 rewrite fixed several major overreads, but one important beginner shortcut remained: it was still too easy to read same-subject multimodal or living-human whole-brain measurement as if it already meant near-complete state access. Recent public updates across this site now require a Human Proxy Composition Card and a Fusion Card in addition to route cards for tractography, effective connectivity, thermodynamic irreversibility, neural contribution, and body/environment boundary. This page now exposes that human-proxy distinction at the entrance instead of leaving it to later technical pages.

2026-03-29 deepening: a causal-graph headline still needs closure and sampling audits

This page also had one remaining effective-connectivity shortcut. It was still too easy to read "found causal wiring" as if candidate-model comparison already covered the main failure modes. The current literature does not support that. Smith et al. (2011) showed that lag-based fMRI methods perform poorly and that functionally inaccurate ROIs are especially damaging to network estimation. Barnett & Seth (2017) showed that subsampling can create detectability black spots. Vink et al. (2020) showed that resting-state EEG functional connectivity explains less than 10% of TMS-evoked propagation variance. Villaverde et al. (2019) showed why observability of states, inputs, and parameters is its own problem. Novelli et al. (2025) showed that slow BOLD sampling can still induce spurious Granger-causal inference. Jafarian et al. (2024) showed that reliability itself is conditional on a tight acquisition regime. Yan et al. (2026) showed that latent confounders remain an active reconstruction problem. Therefore, on this site, a causal-graph headline still has to disclose observed-subsystem closure / latent-confound audit, node-definition policy, sampling / transformation sensitivity, validation, reliability window, and abstention before it rises above a model-conditioned causal hypothesis.

Rephrasing L0-L5 in everyday language, without losing the technical floor

Level	Safe everyday reading	Minimum evidence floor	Fastest overread to block
L0	Someone else can rerun the same result.	Public data or artifact pack, code, environment, split rules, and logs are complete enough for third-party rerun.	Do not read "rerunnable" as "generalizable".
L1	A signal can be decoded or classified under stated conditions.	Participant-/session-disjoint evaluation, measurement-condition disclosure, relevant baselines, shortcut audit, and abstention when confidence collapses.	Do not read "high score" as "target-specific neural evidence" or "correct internal mechanism".
L2	The model still predicts or controls something after conditions are changed.	Held-out perturbation or counterfactual evaluation, preregistered success/failure rules, and evidence that the effect survives beyond a fixed dataset regime.	Do not read "fit on observed data" as "causal robustness".
L3	A closed loop runs stably under a disclosed boundary.	Latency/jitter/safe-stop logs, recalibration burden, and a body/environment boundary card naming preserved, substituted, and omitted sensory, motor, and interoceptive loops.	Do not read "real-time demo" as "solved embodiment" or "state-complete closed loop".
L4	Continuity or identity is being tested explicitly.	Preregistered continuity tests, branch handling, memory/value/learning criteria, and explicit alternative explanations.	Do not read "functional similarity" as "identity preserved".
L5	A system is being considered for durable operation in the world.	Operational, safety, and governance conditions must exist in public form.	Do not read "works in a lab" as "ready for deployment".
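
The latency/jitter/safe-stop logs named in the L3 row can be summarized in a few lines. The sketch below is only an illustration under stated assumptions: the function name `loop_timing_summary`, the choice of standard deviation as the jitter measure, the nearest-rank 99th percentile, and the toy numbers are all hypothetical and are not this site's logging format.

```python
import statistics

def loop_timing_summary(latencies_ms, deadline_ms):
    """Summarize per-cycle loop latencies (milliseconds) against a deadline.

    Jitter is taken here as the population standard deviation of latency,
    which is one common convention and an assumption of this sketch.
    """
    ordered = sorted(latencies_ms)
    n = len(ordered)
    # Nearest-rank 99th percentile, computed with integer arithmetic.
    p99_index = (99 * n + 99) // 100 - 1
    return {
        "mean_ms": statistics.fmean(latencies_ms),
        "jitter_ms": statistics.pstdev(latencies_ms),
        "p99_ms": ordered[p99_index],
        "max_ms": ordered[-1],
        "deadline_misses": sum(1 for v in latencies_ms if v > deadline_ms),
    }

# Toy log (hypothetical): 99 fast cycles and one slow cycle.
toy_log = [10.0] * 99 + [50.0]
print(loop_timing_summary(toy_log, deadline_ms=20.0))
```

In this toy log the mean and the 99th percentile both look comfortable while one cycle still misses the deadline, which is why a summary score alone does not replace the raw log.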

Translating common headline phrases into safer first readings

Headline-style phrase	Safest default reading	What you must ask before reading stronger
"Recovered sentences from the brain"	Usually an L1 decode or assistive BCI result under task- and participant-specific conditions.	What were the task constraint, language prior, candidate set, subject cooperation requirement, calibration burden, and no-brain / no-LM / shuffle baselines?
"Found a biomarker with 95% accuracy"	Usually an L1 classifier under a specific acquisition and split regime.	Were subjects and sessions disjoint, could metadata or subject fingerprint explain the score, and was performance checked across sites/devices/datasets?
"Measured the whole-brain state in living humans"	Usually a proxy-rich human or simultaneous multimodal result that constrains several bounded quantities under specific cohort, hardware, and model burdens.	What did each row directly observe, were rows actually same-subject / same-session / same-perturbation, did shared-vs-specific decomposition and common-driver audit survive, and what calibrator role plus residual hidden-state ceiling remained?
"Mapped the connectome in living humans"	Usually a tractography-conditioned macro pathway estimate, not an edge-complete connectome.	What were the direct observables, tractography priors/filtering choices, uncertainty handling, and same-brain or external validation route?
"Found causal wiring / effective connectivity"	Usually a model-conditioned causal hypothesis.	What candidate model space competed, how was the observed subsystem or latent-confound boundary audited, how were nodes defined, what observation and sampling assumptions were imposed, how was model recovery checked, and what validation or reliability window exists?
"Measured entropy production / irreversibility in brain data"	Usually a modality- and estimator-conditioned auxiliary nonequilibrium analysis.	What were the signal route, coarse-graining, estimator family, null control, quantity type, and abstention boundary?
"Ran a stable real-time closed loop"	Usually a local closed-loop success under a specific sensory/motor contract.	Which loops were preserved, which were substituted, what recalibration was needed, and which body/environment channels remained omitted?
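
The shuffle baseline asked for in the first row can be made concrete with a label-permutation test. The sketch below is a minimal illustration, not this site's audit procedure: the function names `accuracy` and `permutation_p_value`, the number of permutations, and the toy data are all assumptions. It asks how often randomly reassigned labels would score at least as well as the reported decoder.

```python
import random

def accuracy(predictions, labels):
    """Fraction of positions where the prediction equals the label."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def permutation_p_value(predictions, labels, n_permutations=1000, seed=0):
    """Return (observed_accuracy, p_value) under a label-shuffle null.

    The +1 terms keep the estimate from ever reporting exactly zero.
    """
    rng = random.Random(seed)
    observed = accuracy(predictions, labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if accuracy(predictions, shuffled) >= observed:
            at_least_as_good += 1
    return observed, (1 + at_least_as_good) / (1 + n_permutations)

# Toy data (hypothetical): a "decoder" that always answers 1 on an 80/20 dataset.
labels = [1] * 80 + [0] * 20
always_one = [1] * 100
print(permutation_p_value(always_one, labels))  # (0.8, 1.0)
```

Here 80% accuracy carries no information, because every shuffle scores the same. A shuffle test only addresses chance and class imbalance; it does not rule out subject-fingerprint or metadata shortcuts, which need the split and cross-site questions from the table.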

Why the beginner rules had to become stricter

1. High accuracy does not become target-specific evidence by default

This is where beginners are most often misled. Chaibub Neto et al. (2019) showed that record-wise splits can inflate performance because the model learns who the participant is, not only the target label. Di et al. (2021) showed that resting-state EEG can support time-robust individual identification. Xu et al. (2020) showed that cross-dataset variability weakens EEG-decoding generalization. Meanwhile, Tang et al. (2023) showed that non-invasive semantic reconstruction requires subject cooperation, and Willett et al. (2023) achieved strong speech-BCI performance under implanted, participant-specific conditions. Therefore, on this site, a decode headline is not read strongly until split unit, measurement condition, task/language prior, and shortcut routes are disclosed.
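
The split-unit question can be checked mechanically. The following is a minimal sketch, assuming each record is a tuple whose first element is a subject identifier; the function names, the stride-based record-wise split, and the toy dataset are hypothetical and chosen only to make the leak visible.

```python
import random

def record_wise_split(records, every=4):
    """Send every `every`-th record to test, ignoring who it came from."""
    test = [r for i, r in enumerate(records) if i % every == 0]
    train = [r for i, r in enumerate(records) if i % every != 0]
    return train, test

def subject_disjoint_split(records, test_fraction=0.25, seed=0):
    """Hold out whole subjects so no subject appears on both sides."""
    subjects = sorted({r[0] for r in records})
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = [r for r in records if r[0] not in test_subjects]
    test = [r for r in records if r[0] in test_subjects]
    return train, test

def shared_subjects(train, test):
    """Subjects present in both train and test (should be empty)."""
    return {r[0] for r in train} & {r[0] for r in test}

# Toy dataset (hypothetical): 8 subjects, 10 records each.
records = [(f"s{s}", k) for s in range(8) for k in range(10)]
print(len(shared_subjects(*record_wise_split(records))))       # 8
print(len(shared_subjects(*subject_disjoint_split(records))))  # 0
```

With the record-wise split every subject contributes to both sides, so a model can score well by recognizing the person. An empty overlap is a necessary check, not a sufficient one: session, site, and device leakage need their own audits.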

2. Proxy-rich human multimodal evidence does not automatically become same-subject state closure

The beginner route also needed tightening because current human proxy papers are easy to list rhetorically as if they were already converging on one state-complete meter. The primary literature does not support that shortcut. Johansen et al. (2024) provided a 33-participant SV2A atlas, Lucchetti et al. (2025) defined a five-metabolite parcel-similarity graph in 51 adolescents with 13-person replication, Li et al. (2025) reported 7 T dynamic DMRSI kinetic maps in five healthy participants, Hirschler et al. (2025) reported a specialized 7 T CSF-mobility route in healthy younger adults whose sequence does not determine net-flow direction, and Dagum et al. (2026) inferred model-based overnight biomarker efflux in a randomized crossover trial with 39 participants using an investigational device and a multicompartment model. These are real advances, but they are not one shared inferential object. In parallel, Vafaii et al. (2024) showed both common and divergent structure across simultaneous modalities, Chen et al. (2025) showed coupled global progression plus two distinct network patterns in simultaneous EEG-PET-MRI, Bolt et al. (2025) showed substantial autonomic coupling of a major global fMRI mode, and Epp et al. (2025) showed that approximately 40% of significant cortical ΔBOLD voxels can oppose oxygen-metabolism changes. Therefore, on this site, same-subject, multimodal, and proxy-rich do not by themselves justify whole-brain state language. They instead trigger proxy-class, calibrator-role, and common-driver audits.
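
One simple form of a common-driver audit is to ask whether two signals still covary once a measured shared driver is held fixed. The sketch below is an illustration only: it uses a first-order linear partial correlation on simulated data, and the names `pearson`, `partial_correlation`, `driver`, `signal_a`, and `signal_b` are hypothetical. It does not address nonlinear coupling or drivers that were never recorded.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_correlation(x, y, z):
    """Linear correlation of x and y with z held fixed (first-order formula)."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Simulated example (hypothetical): two modality signals share one driver
# and have otherwise independent noise.
rng = random.Random(0)
driver = [rng.gauss(0, 1) for _ in range(2000)]
signal_a = [d + rng.gauss(0, 1) for d in driver]
signal_b = [d + rng.gauss(0, 1) for d in driver]

print(round(pearson(signal_a, signal_b), 2))                      # near 0.5 by construction
print(round(partial_correlation(signal_a, signal_b, driver), 2))  # near 0
```

In this simulation the raw agreement between the two signals comes entirely from the shared driver, so agreement across modalities by itself says little about how much independent state each one observes.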

3. "Connectome" still names different evidence classes

The beginner route also needed tightening because the word connectome hides a large spread in evidence class. Thomas et al. (2014) showed inherent limits in anatomical accuracy for diffusion-MRI tractography, Maier-Hein et al. (2017) exposed fundamental ambiguities and many invalid bundles in a community challenge, Schilling et al. (2020) showed that high anatomical accuracy depends on strong start/end/exclusion priors, and Grisot et al. (2021) localized recurring failure modes in the same brain. Therefore, a living-human tractography graph is not read here as connectome-complete by default. It stays at macro pathway prior unless the tractography route card is shown.

4. Model-conditioned graphs are not discovered causal wiring

For effective connectivity, the problem is not that DCM or related models are useless. The problem is overreading them. Penny et al. (2004) made explicit that DCM inferences are contingent on model structure, and Rosa et al. (2012) showed that larger candidate spaces can be searched more efficiently. But that is still not enough to read a directed graph as discovered causal wiring. Smith et al. (2011) showed that lag-based fMRI methods perform poorly and that functionally inaccurate ROIs are especially damaging to network estimation. Barnett & Seth (2017) showed that subsampling can create detectability black spots and sweet spots. Vink et al. (2020) showed that resting-state EEG functional connectivity explains less than 10% of TMS-evoked propagation variance. Villaverde et al. (2019) showed why observability of the full input-state-parameter system is its own question. Novelli et al. (2025) showed that realistic HRF variability alone need not force false positives while slow BOLD sampling can still induce spurious Granger-causal inference. Jafarian et al. (2024) showed that reliability can be high under tightly matched MEG sessions, and Yan et al. (2026) showed that latent confounders remain an active reconstruction problem. Therefore, on this site, a dense effective-connectivity graph without model-space disclosure, observed-subsystem closure / latent-confound audit, node-definition policy, sampling / transformation sensitivity, recovery, validation, reliability window, and abstention remains a model-conditioned causal hypothesis.
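
The disclosure list in the paragraph above can be written as a small record so that missing items are visible at a glance. This is a sketch under stated assumptions: the class name `CausalGraphDisclosure`, its field names, and the wording of the returned readings are hypothetical and are not the site's actual route-card schema.

```python
from dataclasses import dataclass, fields

@dataclass
class CausalGraphDisclosure:
    """One flag per item a causal-graph paper is asked to disclose."""
    model_space: bool = False
    closure_or_latent_confound_audit: bool = False
    node_definition_policy: bool = False
    sampling_transformation_sensitivity: bool = False
    model_recovery: bool = False
    validation: bool = False
    reliability_window: bool = False
    abstention: bool = False

def missing_disclosures(card):
    """Names of the items that were not disclosed."""
    return [f.name for f in fields(card) if not getattr(card, f.name)]

def default_reading(card):
    """Stay at the weakest reading unless every item is disclosed."""
    if missing_disclosures(card):
        return "model-conditioned causal hypothesis"
    # Full disclosure only permits a closer look; it does not grade the evidence.
    return "eligible for a stronger reading, pending the evidence itself"

# Hypothetical paper that reports model comparison and nothing else.
card = CausalGraphDisclosure(model_space=True)
print(default_reading(card))
print(missing_disclosures(card))
```

The flags record only whether something was disclosed, not whether it was done well, so a fully ticked card is a precondition for a stronger reading rather than a verdict.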

5. Thermodynamic keywords still hide different measured objects

The older beginner wording was also too weak for thermodynamic claims. Lynn et al. (2021) estimated entropy-production lower bounds from coarse-grained BOLD state transitions, de la Fuente et al. (2023) used temporal irreversibility decoding on ECoG, and Ishihara & Shimazaki (2025) estimated model-based entropy flow in a nonstationary state-space kinetic Ising model. Those are related, but not identical, objects. Therefore this site no longer allows the beginner reading that "thermodynamic paper" automatically means a common measurement of physical dissipation or WBE-relevant cost.
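
To see why coarse-graining and estimator choice matter, it helps to look at one of these objects in miniature. The sketch below computes a generic plug-in irreversibility score from a discrete state sequence: the Kullback-Leibler divergence between forward and reversed transition statistics. It is not a reproduction of any cited paper's pipeline; the function name, the pseudocount, and the toy sequences are assumptions made for illustration.

```python
import math
from collections import Counter

def irreversibility_estimate(states, pseudocount=0.5):
    """Plug-in estimate of sum_ij P(i->j) * log(P(i->j) / P(j->i)), in nats.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for transitions that were never observed, and it biases the result.
    """
    symbols = sorted(set(states))
    counts = Counter(zip(states[:-1], states[1:]))
    total = sum(counts.values()) + pseudocount * len(symbols) ** 2

    def p(i, j):
        return (counts[(i, j)] + pseudocount) / total

    return sum(
        p(i, j) * math.log(p(i, j) / p(j, i))
        for i in symbols
        for j in symbols
    )

# Toy sequences (hypothetical).
cycle = [0, 1, 2] * 100      # always 0 -> 1 -> 2 -> 0: strongly directed
flip = [0, 1] * 150 + [0]    # 0 -> 1 -> 0: same statistics when reversed

print(irreversibility_estimate(cycle))  # clearly positive
print(irreversibility_estimate(flip))   # zero: forward and reverse counts match
```

The number depends on how states were defined, on the pseudocount, and on sequence length, and it describes the coarse-grained sequence rather than physical heat dissipation, which is the distinction the paragraph above draws.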

6. Real-time loop success still needs a disclosed body/environment boundary

Finally, closed-loop headlines needed a stricter beginner rule. Musall et al. (2019) showed that richly varied movements dominate cortex-wide activity, Saleem et al. (2013) showed that locomotion changes visual-cortex coding, and Flesher et al. (2021) showed that restoring tactile feedback improves robotic-arm control. The safe reading is therefore not "closed loop solved" but "a specific local loop worked under a specific retained/substituted boundary."

Seven questions before you believe the strong version of a headline

Checklist

1. What was directly observed? Separate sensor output from inferred internal state.
2. What model space or prior was imposed? Candidate models, language priors, tractography filters, and neural-mass assumptions all matter.
3. Which shortcut could reproduce the score? Subject/session fingerprint, metadata leakage, device differences, and candidate-set structure must be checked explicitly.
4. If this is a human or multimodal bundle, what does each row directly observe? Proxy class, operational maturity, calibrator role, and possible common-driver routes must be named separately.
5. What external or held-out validation exists? Same-dataset fit is weaker than perturbation, stimulation, same-brain tracing, or external benchmark prediction.
6. What loops or state variables remain outside the measurement? Boundary and hidden-state omissions still set the claim ceiling.
7. Where does the paper abstain? A strong paper says which interpretations it does not support.

Where to go next after this beginner page

If the headline is mostly about...	Read this next	Why
Decode / biomarker / speech / EEG score	Measurement-stack observability and claim ceilings	It fixes what was directly observed and which shortcut routes remain open.
Living-human multimodal / proxy-rich whole-brain claim	Human Proxy Composition and Route Maturity	It separates direct observable by row, proxy class, calibrator role, and common-driver audit before same-subject language is allowed.
Connectome / tractography / structural prior	Why wiring diagrams alone are not enough	It separates scaffold progress from hidden-state completeness.
DCM / effective connectivity / causal graph	Effective-connectivity route card	It shows why candidate-model dependence is only the start, and why closure, node definition, sampling, and validation still have to be disclosed.
Entropy production / irreversibility / time arrow	Irreversibility route card	It separates estimator families and null controls.
Closed loop / BCI / embodiment	Closed loops, latency, jitter, and safety stops	It explains why latency logs and boundary disclosure are separate requirements.

References

Chaibub Neto, E., Pratap, A., Perumal, T. M., et al. (2019). Detecting the impact of subject characteristics on machine learning-based diagnostic applications. npj Digital Medicine, 2, 99. doi:10.1038/s41746-019-0178-x
Xu, M., Yao, S., Wei, Z., et al. (2020). Cross-dataset variability problem in EEG decoding with deep learning. Frontiers in Human Neuroscience, 14, 103. doi:10.3389/fnhum.2020.00103
Di, Y., An, X., Zhong, W., Liu, S., & Ming, D. (2021). The time-robustness analysis of individual identification based on resting-state EEG. Frontiers in Human Neuroscience, 15, 672946. doi:10.3389/fnhum.2021.672946
Tang, J., LeBel, A., Jain, S., & Huth, A. G. (2023). Semantic reconstruction of continuous language from non-invasive brain recordings. Nature Neuroscience, 26, 858-866. doi:10.1038/s41593-023-01304-9
Willett, F. R., Kunz, E. M., Fan, C., et al. (2023). A high-performance speech neuroprosthesis. Nature, 620, 1031-1036. doi:10.1038/s41586-023-06377-x
Johansen, A., Bzdok, D., Veronese, M., et al. (2024). An in vivo high-resolution human brain atlas of synaptic density. Journal of Neuroscience. doi:10.1523/JNEUROSCI.1750-23.2024
Vafaii, H., Mandino, F., Desrosiers-Grégoire, G., et al. (2024). Multimodal measures of spontaneous brain activity reveal both common and divergent patterns of cortical functional organization. Nature Communications. doi:10.1038/s41467-023-44363-z
Lucchetti, F., Céléreau, E., Steullet, P., et al. (2025). Constructing the human brain metabolic connectome with MR spectroscopic imaging reveals cerebral biochemical organization. Nature Communications. doi:10.1038/s41467-025-66124-w
Li, X., Zhu, X.-H., Li, Y., et al. (2025). Quantitative mapping of key glucose metabolic rates in the human brain using dynamic deuterium magnetic resonance spectroscopic imaging. PNAS Nexus. doi:10.1093/pnasnexus/pgaf072
Hirschler, L., Runderkamp, B. A. R., Decker, A., et al. (2025). Region-specific drivers of CSF mobility measured with MRI in humans. Nature Neuroscience. doi:10.1038/s41593-025-02073-3
Chen, J. E., Lewis, L. D., Coursey, S. E., et al. (2025). Simultaneous EEG-PET-MRI identifies temporally coupled and spatially structured brain dynamics across wakefulness and NREM sleep. Nature Communications. doi:10.1038/s41467-025-64414-x
Bolt, T., Wang, S., Nomi, J. S., et al. (2025). Autonomic physiological coupling of the global fMRI signal. Nature Neuroscience. doi:10.1038/s41593-025-01945-y
Epp, S. M., Castrillon, G., Yuan, B., et al. (2025). BOLD signal changes can oppose oxygen metabolism across the human cortex. Nature Neuroscience. doi:10.1038/s41593-025-02132-9
Dagum, P., Elbert, D. L., Giovangrandi, L., et al. (2026). The glymphatic system clears amyloid beta and tau from brain to plasma in humans. Nature Communications. doi:10.1038/s41467-026-68374-8
Thomas, C., Ye, F. Q., Irfanoglu, M. O., et al. (2014). Anatomical accuracy of brain connections derived from diffusion MRI tractography is inherently limited. PNAS, 111(46), 16574-16579. doi:10.1073/pnas.1405672111
Maier-Hein, K. H., Neher, P. F., Houde, J.-C., et al. (2017). The challenge of mapping the human connectome based on diffusion tractography. Nature Communications, 8, 1349. doi:10.1038/s41467-017-01285-x
Schilling, K. G., Petit, L., Rheault, F., et al. (2020). Brain connections derived from diffusion MRI tractography can be highly anatomically accurate if we know where white matter pathways start, where they end, and where they do not go. Brain Structure and Function, 225, 2387-2402. doi:10.1007/s00429-020-02129-z
Grisot, G., Haber, S. N., Hawrylycz, M., Yendiki, A., et al. (2021). Diffusion MRI and anatomic tracing in the same brain reveal common failure modes of tractography. NeuroImage, 239, 118300. doi:10.1016/j.neuroimage.2021.118300
Penny, W. D., Stephan, K. E., Mechelli, A., & Friston, K. J. (2004). Comparing dynamic causal models. NeuroImage, 22(3), 1157-1172. doi:10.1016/j.neuroimage.2004.03.026
Rosa, M. J., Friston, K., & Penny, W. (2012). Post-hoc selection of dynamic causal models. Journal of Neuroscience Methods, 208(1), 66-78. doi:10.1016/j.jneumeth.2012.04.013
Frässle, S., Paulus, F. M., Krach, S., & Jansen, A. (2016). Test-retest reliability of effective connectivity in the face perception network. Human Brain Mapping, 37(2), 730-744. doi:10.1002/hbm.23061
Frässle, S., Manjaly, Z. M., Do, C. T., et al. (2021). Whole-brain estimates of directed connectivity for human connectomics. NeuroImage, 225, 117491. doi:10.1016/j.neuroimage.2020.117491
Smith, S. M., Miller, K. L., Salimi-Khorshidi, G., Webster, M., Beckmann, C. F., Nichols, T. E., Ramsey, J. D., & Woolrich, M. W. (2011). Network modelling methods for FMRI. NeuroImage, 54(2), 875-891. doi:10.1016/j.neuroimage.2010.08.063
Barnett, L., & Seth, A. K. (2017). Detectability of Granger causality for subsampled continuous-time neurophysiological processes. Journal of Neuroscience Methods, 275, 93-121. doi:10.1016/j.jneumeth.2016.10.016
Villaverde, A. F., Tsiantis, N., & Banga, J. R. (2019). Full observability and estimation of unknown inputs, states and parameters of nonlinear biological models. Journal of the Royal Society Interface, 16(156), 20190043. doi:10.1098/rsif.2019.0043
Vink, J. J. T., Klooster, D. C. W., Ozdemir, R. A., Westover, M. B., Pascual-Leone, A., & Shafi, M. M. (2020). EEG Functional Connectivity is a Weak Predictor of Causal Brain Interactions. Brain Topography, 33(2), 221-237. doi:10.1007/s10548-020-00757-6
Jafarian, A., Karadag Assem, M., Kocagoncu, E., et al. (2024). Reliability of dynamic causal modelling of resting-state magnetoencephalography. Human Brain Mapping, 45(10), e26782. doi:10.1002/hbm.26782
Novelli, L., Barnett, L., Seth, A. K., & Razi, A. (2025). Minimum-Phase Property of the Hemodynamic Response Function, and Implications for Granger Causality in fMRI. Human Brain Mapping, 46(10), e70285. doi:10.1002/hbm.70285
Yan, J., Zhang, S.-W., Zhang, C., Huang, W., Shi, J., & Chen, L. (2026). Dynamical Causality under Latent Confounders for Biological Network Reconstruction. IEEE Transactions on Pattern Analysis and Machine Intelligence. doi:10.1109/TPAMI.2026.3658839
Lynn, C. W., Cornblath, E. J., Papadopoulos, L., et al. (2021). Broken detailed balance and entropy production in the human brain. PNAS, 118(47), e2109889118. doi:10.1073/pnas.2109889118
de la Fuente, L. A., Zamberlan, F., Bocaccio, H., et al. (2023). Temporal irreversibility of neural dynamics as a signature of consciousness. Cerebral Cortex, 33(5), 1856-1865. doi:10.1093/cercor/bhac177
Ishihara, K., & Shimazaki, H. (2025). State-space kinetic Ising model reveals task-dependent entropy flow in sparsely active nonequilibrium neuronal dynamics. Nature Communications, 16, 10852. doi:10.1038/s41467-025-66669-w
Musall, S., Kaufman, M. T., Juavinett, A. L., Gluf, S., & Churchland, A. K. (2019). Single-trial neural dynamics are dominated by richly varied movements. Nature Neuroscience, 22, 1677-1686. doi:10.1038/s41593-019-0502-4
Saleem, A. B., Ayaz, A., Jeffery, K. J., Harris, K. D., & Carandini, M. (2013). Integration of visual motion and locomotion in mouse visual cortex. Nature Neuroscience, 16, 1864-1869. doi:10.1038/nn.3567
Flesher, S. N., Downey, J. E., Weiss, J. M., et al. (2021). A brain-computer interface that evokes tactile sensations improves robotic arm control. Science, 372(6544), 831-836. doi:10.1126/science.abd0380
