## What this route now fixes first
The short path from EEG to L0 still exists, but it now starts with a stricter question: what exactly will your score mean? Current primary literature and official standards do not support the shortcut that says "pick a public dataset, run preprocessing, train a model, report accuracy." Before any score can matter, this site now fixes the benchmark object, the temporal regime, event semantics and clock domain, artifact lineage, and the stopped claim.
The old route was too permissive. Pernet et al. (2019), the current BIDS specification, and Pernet et al. (2020) show why raw identity, derivatives, and reporting provenance must be explicit. Hermes et al. (2025) and Kothe et al. (2025) show why event semantics and synchronization still need separate audits. Chaibub Neto et al. (2019), Melnik et al. (2017), Xu et al. (2020), and Di et al. (2021) show why subject and acquisition shortcuts can survive loose evaluation. Egger et al. (2024) show that even within roughly half a day, EEG decoding conditions can shift enough to matter for robustness. The official EEG Challenge (2025) pages then show that benchmark governance itself can change what a leaderboard means. Therefore, this page no longer treats "dataset -> preprocessing -> score" as a sufficient beginner route.
This page stays on the technical and natural science side. It does not argue about philosophy, law, or identity. It only fixes what must be observable, logged, and audited before an EEG result can count as reproducible L0 work on this site.
## Six gates from EEG to L0
| Order | Page to open | What is fixed here | What must exist before moving on |
|---|---|---|---|
| 1 | EEG 101 | Fix the measurement ceiling: what EEG directly observes, what remains latent, and what kind of claim it cannot support on its own. | A one-line stopped claim such as "this route aims at reproducible macro-state analysis, not source-complete or WBE-complete recovery." |
| 2 | Datasets and Baseline / Benchmark / Pre-registration / Model Card | Choose a benchmark object, not just a file bundle: task, target, independent hold-out unit, metric bundle, version, extra-data policy, and benchmark-governance status. | A short benchmark card naming dataset/version, task, target, split unit, main metric bundle, and whether official rules or postmortems changed the benchmark meaning (a card sketch follows this table). |
| 3 | Dataset splits and data leakage and State, trait, and drift | Freeze the temporal regime: subject, session, and time disjointness; same-session versus cross-day scope; fixed versus recalibrated decoder interval. | A split manifest plus a temporal-validity note stating whether the result is same-session, same-day, cross-day, or longer-horizon, and whether the decoder stays fixed. |
| 4 | Event synchronization and observation logs | Freeze the observation contract: event times, event semantics, label provenance, clock domain, delay/jitter/drift notes, and report-usage flags. | An observation log that separates time anchor, semantics, and synchronization layer instead of mixing them into one generic metadata note. |
| 5 | Hands-on and L0 minimum artifact pack | Produce the first reproducible artifact bundle: raw identity, derivative identity, run identity, QC, baseline output, and failure registry. | A rerunnable derivative package with dataset provenance, command or pipeline provenance, environment pin, QC report, baseline output, and at least one named failure mode. |
| 6 | Verification | Convert the artifact bundle into a bounded claim: L0 ceiling, observability ceiling, shortcut ceiling, and temporal-validity ceiling. | A submission-ready stopped claim plus the required companion cards if the result starts to imply target specificity or temporal durability. |
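The gate-2 exit artifact can be made concrete as code. The sketch below records a benchmark card as a small Python dataclass; every field name and example value here is a hypothetical illustration by this page, not a key from BIDS or the EEG Challenge rules. The point is that each benchmark property becomes a named, checkable field instead of a prose sentence.

```python
# A minimal benchmark-card sketch (gate 2); all field names and values are
# hypothetical illustrations, not keys from any external standard.
from dataclasses import dataclass

@dataclass(frozen=True)
class BenchmarkCard:
    dataset: str            # raw dataset identity, e.g. a public accession ID
    version: str            # dataset version or snapshot tag
    task: str               # task as named by the dataset
    target: str             # the variable the model is claimed to predict
    holdout_unit: str       # independent hold-out unit: "subject", "session", ...
    metrics: tuple          # the full metric bundle, not one headline number
    extra_data_policy: str  # whether external data or pretraining is allowed
    governance_note: str    # rule changes or postmortems that alter meaning

card = BenchmarkCard(
    dataset="ds-example",   # hypothetical accession, not a real dataset
    version="1.0.0",
    task="motor-imagery",
    target="left-vs-right hand imagery",
    holdout_unit="subject",
    metrics=("balanced_accuracy", "per-class recall"),
    extra_data_policy="no external EEG data",
    governance_note="none as of card date",
)
print(card)
```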
## Why these gates must stay separate
| What older beginner routes tended to compress | What current sources actually support | How this site now reads the route |
|---|---|---|
| "Dataset choice" as only a file download step | Saito & Rehmsmeier (2015) show why metric choice changes what a binary score means, and the official EEG Challenge (2025) rules plus final leaderboard show that governance changes can alter benchmark meaning after launch. | Choosing data now means choosing the benchmark object: task, target, split/randomization rule, metric bundle, version, and governance status. |
| "Clean split" as the whole leakage solution | Chaibub Neto et al. (2019), Melnik et al. (2017), Xu et al. (2020), and Di et al. (2021) show that subject/session and acquisition-distribution structure can remain highly predictive. | The route now fixes both split hygiene and shortcut resistance. A clean split is necessary, but it is not treated as proof that the target neural variable was isolated. |
| "Same-session score" as temporal generalization | Egger et al. (2024) show that EEG decoding conditions can change materially across a day-night window, and this site's Temporal Validity Card plus state-trait-drift rule now separate fixed-decoder interval, fast labels, slow internal-milieu disclosure, and recalibration burden. | The route now asks the reader to decide same-session, same-day, cross-day, or longer-horizon scope before training, and to log whether the regime changed through movement / arousal alone or through slower circadian / endocrine-metabolic state as well. |
| "Events are in BIDS" as if semantics and timing were solved together | The current BIDS specification and Hermes et al. (2025) support structured events and machine-readable semantics, while Kothe et al. (2025) makes clear that synchronization middleware does not by itself measure device-side delay. | This route now separates time anchor, event semantics, and clock/synchronization audit into distinct observation artifacts. |
| "Pipeline ran" as if provenance were sufficient | Gorgolewski et al. (2016), Pernet et al. (2019), Pernet et al. (2020), and the current BIDS specification separate raw datasets, derivatives, and generated-by provenance. | The route now requires raw identity, derivative identity, and run identity to be visible as different objects before L0 is called complete. |
## Minimum artifact bundle before one score matters
| Artifact | What it must disclose | What goes wrong if it is missing |
|---|---|---|
| Benchmark object | Dataset/version, task, target, independent hold-out unit, metric bundle, extra-data policy, and benchmark-governance status. | A score can be overread as if it applied to a different task, split regime, or official rule set. |
| Split manifest | Which subject/session/time units are disjoint, how folds were frozen, and which grouping variables were respected. | The evaluation can drift silently as folds or grouping assumptions change. |
| Temporal-validity note | Same-session versus cross-day scope, fixed versus recalibrated decoder, fast state labels, and any relevant slow internal-milieu disclosure. | Same-day success can be silently promoted to longitudinal stability or deployability. |
| Observation log | Event times, semantics, scorer or report provenance, clock domain, delay/jitter/drift notes, and bad-segment annotations. | The route cannot distinguish a signal problem from a label or timing problem. |
| Derivative lineage | Source dataset, generated-by pipeline, version or commit, environment pin, command provenance, and output locations. | Reanalysis and audit become impossible even if the main score is reproducible once. |
| Stopped claim | What the result supports and what it still does not support, in one or two sentences. | L0 can be overread as source localization truth, stable biomarker evidence, or WBE-relevant state capture. |
Not every field above is a single mandatory key in one external standard. The stronger requirement on this site is an operational inference from current standards, primary literature, and challenge practice: if a result is to count as comparable L0 progress, the benchmark object, temporal regime, observation contract, and derivative lineage all need their own artifacts rather than one mixed prose paragraph. A minimal audit of that rule is sketched below.
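The file names and the required-artifact list in this sketch are hypothetical operational choices of this page, not BIDS mandates; only the GeneratedBy check mirrors an actual key from the BIDS dataset_description.json convention.

```python
# A minimal bundle audit; file names and the required list are this site's
# hypothetical operational choices, not BIDS requirements. Only GeneratedBy
# mirrors an actual BIDS dataset_description.json key.
import json
from pathlib import Path

REQUIRED_ARTIFACTS = [
    "dataset_description.json",  # derivative identity and pipeline provenance
    "split_manifest.json",       # frozen folds and grouping variables
    "temporal_validity.json",    # same-session vs cross-day scope, decoder policy
    "observation_log.json",      # event anchors, semantics, clock-domain notes
    "qc_report.html",            # quality-control output
    "failure_registry.md",       # at least one named failure mode
]

def audit_bundle(root: str) -> list[str]:
    """Return the missing artifacts for one derivative bundle."""
    root_path = Path(root)
    missing = [name for name in REQUIRED_ARTIFACTS
               if not (root_path / name).exists()]
    desc = root_path / "dataset_description.json"
    if desc.exists() and "GeneratedBy" not in json.loads(desc.read_text()):
        missing.append("dataset_description.json lacks GeneratedBy")
    return missing

print(audit_bundle("derivatives/baseline-run-01"))  # hypothetical path
```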
## Five accidents this route now tries to stop early
| Common accident | Why it is scientifically weak | Where to return |
|---|---|---|
| Choosing a starter dataset without naming the benchmark object | The data may be fine, but the score will still be uninterpretable if task, metric bundle, and governance status are missing. | Baseline / Benchmark / Pre-registration / Model Card |
| Using a subject/session split but not naming temporal scope | The result can still be same-session or same-day only, even if the split sounds clean. | State, trait, and drift |
| Treating `events.tsv` as if it fully solved label meaning | Time anchors, condition semantics, and report-derived labels are different objects and can fail independently. | Event synchronization and observation logs |
| Treating LSL or trigger lines as hardware ground truth | Network synchronization does not automatically measure display, audio, amplifier, or device-internal delays; a timing-audit sketch follows this table. | Event synchronization and observation logs |
| Reporting one score without lineage and failure disclosure | The route becomes impossible to audit, extend, or compare even if the run once looked successful. | L0 minimum artifact pack and Verification |
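As an illustration of why the time anchor, the semantics, and the synchronization audit are three objects: the sketch below keeps BIDS-style onset anchors and trial_type labels in separate variables, then estimates clock drift and residual jitter against a second, simulated clock domain. The events table and the second clock are synthetic assumptions, and the linear-fit audit is illustrative only; it does not measure display, audio, amplifier, or device-internal delays.

```python
# A minimal timing-audit sketch; the events table and the second clock domain
# are simulated, and this fit does not replace measuring device-side delay.
import numpy as np
import pandas as pd

# In practice this would come from a BIDS events file:
# events = pd.read_csv("sub-01_task-x_events.tsv", sep="\t")
events = pd.DataFrame({
    "onset": np.arange(0.0, 60.0, 2.0),    # time anchors on the recording clock
    "trial_type": ["left", "right"] * 15,  # semantics: a separate object
})
anchors = events["onset"].to_numpy()

# The same markers as seen by a second clock domain (e.g. a stimulus PC log),
# simulated here with 50 ppm drift, 12 ms offset, and 1 ms jitter.
rng = np.random.default_rng(0)
stim_clock = anchors * (1 + 50e-6) + 0.012 + rng.normal(0.0, 0.001, anchors.size)

slope, offset = np.polyfit(anchors, stim_clock, 1)
jitter_sd = np.std(stim_clock - (slope * anchors + offset))
print(f"drift: {(slope - 1) * 1e6:.1f} ppm, offset: {offset * 1e3:.1f} ms, "
      f"jitter SD: {jitter_sd * 1e3:.2f} ms")
```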
## Where to go next
If the measurement ceiling of EEG is still not clear, return to EEG 101. If the main uncertainty is split hygiene or benchmark provenance, go to Dataset splits and data leakage. If time anchors, label provenance, or synchronization are the problem, go to Event synchronization and observation logs. Once the first artifact bundle exists, route it through Verification and attach the Temporal Validity Card or Specificity & Shortcut Card whenever the claim starts to reach beyond plain reproducible analysis.
## References
- Gorgolewski KJ, Auer T, Calhoun VD, et al. The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Sci Data. 2016. https://doi.org/10.1038/sdata.2016.44
- Pernet CR, Appelhoff S, Gorgolewski KJ, et al. EEG-BIDS, an extension to the brain imaging data structure for electroencephalography. Sci Data. 2019. https://doi.org/10.1038/s41597-019-0104-8
- Pernet C, Garrido MI, Gramfort A, et al. Issues and recommendations from the OHBM COBIDAS MEEG committee for reproducible EEG and MEG research. Nat Neurosci. 2020. https://doi.org/10.1038/s41593-020-00709-0
- Brain Imaging Data Structure (stable): dataset_description.json, derived dataset and pipeline description. https://bids-specification.readthedocs.io/en/stable/modality-agnostic-files/dataset-description.html
- Hermes D, Bigdely-Shamlo N, Niso G, et al. HED library schema for EEG data annotation. Sci Data. 2025. https://doi.org/10.1038/s41597-025-05791-2
- Kothe C, Grivich M, Stenner T, et al. The lab streaming layer for synchronized multimodal recording. Imaging Neurosci. 2025. https://doi.org/10.1162/IMAG.a.136
- Chaibub Neto E, Pratap A, Perumal TM, et al. Detecting the impact of subject characteristics on machine learning-based diagnostic applications. npj Digit Med. 2019. https://doi.org/10.1038/s41746-019-0178-x
- Melnik A, Legkov P, Izdebski K, et al. Systems, subjects, sessions: to what extent do these factors influence EEG data? Front Hum Neurosci. 2017. https://doi.org/10.3389/fnhum.2017.00150
- Xu M, Han J, Wang Y, et al. Cross-dataset variability problem in EEG decoding with deep learning. Front Hum Neurosci. 2020. https://doi.org/10.3389/fnhum.2020.00103
- Di M, Han J, Wang Y, et al. The time-robustness analysis of individual identification based on resting-state EEG. Front Hum Neurosci. 2021. https://doi.org/10.3389/fnhum.2021.672946
- Egger M, Haden B, Bernarding J, et al. Chrono-EEG dynamics influencing hand gesture decoding: a 10-hour study. Sci Rep. 2024. https://doi.org/10.1038/s41598-024-70609-x
- Saito T, Rehmsmeier M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS One. 2015. https://doi.org/10.1371/journal.pone.0118432
- EEG Challenge (2025) official website. https://eeg2025.github.io/
- EEG Challenge (2025) official rules. https://eeg2025.github.io/rules/
- EEG Challenge (2025) final leaderboard and organizer postmortem. https://eeg2025.github.io/leaderboard/