Movement is a rich source of biomedical information. Every frame of a video, whether of a patient walking or an animal recovering from a stroke, contains quantitative signals about motor control, coordination, and neural recovery. Yet most of these signals remain underutilized. They are recorded in incompatible formats, analyzed with inconsistent metrics, and rarely integrated across research domains.

Computer vision (CV) has made it possible to extract these data efficiently. Informatics defines how to manage, standardize, and connect them so that movement data become reliable, reproducible evidence that can inform both research and clinical decision-making.

From Video to Structured Data

Modern CV methods can estimate body pose and joint trajectories directly from video, converting unstructured footage into spatiotemporal movement features. From these trajectories, secondary descriptors such as symmetry, stability, speed/tempo, smoothness, and coordination capture distinct aspects of motor function.
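
As a minimal sketch of how such secondary descriptors can be derived, the example below computes a step-time symmetry index and cadence from foot-contact events. The event times are hypothetical, and the symmetry formula (absolute left/right difference divided by the left/right mean) is one common definition, assumed here for illustration rather than prescribed by any standard.

```python
from statistics import mean

# Hypothetical foot-contact events as (time in seconds, side),
# e.g. detected from ankle keypoints in a pose-estimation output.
events = [(0.0, "L"), (0.5, "R"), (1.1, "L"), (1.6, "R"), (2.2, "L"), (2.7, "R")]


def step_times(events):
    """Group step durations by the side that lands at the end of each step."""
    times = {"L": [], "R": []}
    for (t_prev, _), (t_next, side) in zip(events, events[1:]):
        times[side].append(t_next - t_prev)
    return times


def symmetry_index(left, right):
    """Percent asymmetry between sides; 0 means perfectly symmetric."""
    return 100.0 * abs(left - right) / (0.5 * (left + right))


def cadence(events):
    """Steps per minute over the recorded interval."""
    duration = events[-1][0] - events[0][0]
    return 60.0 * (len(events) - 1) / duration


steps = step_times(events)
si = symmetry_index(mean(steps["L"]), mean(steps["R"]))
steps_per_min = cadence(events)
print(f"step-time symmetry index: {si:.1f} %")
print(f"cadence: {steps_per_min:.1f} steps/min")
```

Whichever definitions are chosen, recording them alongside the values is what makes results comparable across studies.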

The challenge lies not in feature extraction itself but in standardization. To be interpretable and comparable, each step (data capture, preprocessing, feature generation, and model fitting) must follow defined metadata structures, version control, and quality-control protocols. Without this informatics infrastructure, reproducibility and interoperability remain limited.
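
One way to make each pipeline step traceable is to log a small provenance record per step. The sketch below is illustrative only: the `StepRecord` fields, the tool name, and the parameter names are assumptions, not part of any published standard.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass
class StepRecord:
    step: str          # pipeline stage, e.g. "preprocessing"
    tool: str          # hypothetical tool name
    version: str       # version of the tool or analysis code
    params: dict       # parameters used at this step
    input_sha256: str  # checksum tying the record to its exact input


def checksum(data: bytes) -> str:
    """SHA-256 hex digest of the raw input bytes."""
    return hashlib.sha256(data).hexdigest()


def record_step(step, tool, version, params, input_bytes):
    return StepRecord(step, tool, version, params, checksum(input_bytes))


# Hypothetical keypoint file content standing in for real input data.
raw = b"frame,x,y\n0,0.10,0.52\n"
log = [record_step("preprocessing", "example-filter", "0.1.0", {"cutoff_hz": 6}, raw)]
print(json.dumps([asdict(r) for r in log], indent=2))
```

Storing such records next to the derived features lets a later reader check which code version and settings produced a given number.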

The Case for Open Gait Data Standards

For gait analytics to reach clinical reliability, data must be standardized, interoperable, and shareable.

  • Cross-platform comparability: ensuring measurements remain valid across hospitals, labs, and devices.
  • Clinical integration: mapping gait features to health-informatics vocabularies (UMLS, SNOMED CT) for seamless entry into electronic health records.
  • FAIR compliance: making data Findable, Accessible, Interoperable, and Reusable to enable research reproducibility.
  • Benchmark datasets: fueling transparent model training and validation.

Clinical Integration and Translational Relevance  

Open gait data standards not only enable cross-platform comparability but also support integration with health-informatics systems such as electronic health records (EHRs). By anchoring gait metrics to standardized clinical vocabularies (SNOMED CT, UMLS), each measurement can be unambiguously associated with clinical findings and diagnoses. Standardized metadata further allows gait parameters such as spatiotemporal, kinematic, and derived features to be linked directly to patient outcomes, interventions, or disease progression. This connection ensures that gait analytics are not just reproducible, but also translational, informing both clinical decision-making and preclinical research.

The sections below outline the current landscape of gait-related standards and what an open, cross-species framework should include:

1. Clinical Terminologies & Ontologies

Even when not gait-specific, clinical terminologies provide the semantic backbone for interoperable health data.

  • SNOMED CT and UMLS already include codes such as “Abnormal gait (finding)” and related disorders, allowing quantitative gait outputs to be mapped directly to clinical vocabularies.(1)
  • NCBO BioPortal hosts hundreds of ontologies relevant to movement, anatomy, and neuroscience, supporting consistent terminology discovery.
  • CDISC standards (Clinical Data Interchange Standards Consortium) define metadata for clinical trial data, ensuring gait endpoints align with regulatory frameworks.(5)
  • For signal-type outputs, the General Data Format for Biomedical Signals (GDF) supports standard representation of biosignals, including gait-derived time series, using open conventions.(6)

Anchoring gait metadata to these vocabularies ensures interoperability with EHR systems, clinical trial databases, and translational research repositories.
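
A minimal sketch of what such anchoring could look like in practice: a derived gait measurement carries a coded concept alongside its value and unit. The record layout and the feature name are assumptions for illustration; the SNOMED CT code is the one listed in reference (1). Whether a given measurement warrants the finding remains a clinical judgment and is not decided by this code.

```python
import json

# Concept taken from reference (1); the dict layout itself is hypothetical.
SNOMED_ABNORMAL_GAIT = {
    "system": "SNOMED CT",
    "code": "22325002",
    "display": "Abnormal gait (finding)",
}


def annotate(measurement: dict, concept: dict) -> dict:
    """Return a copy of the measurement with a coded concept attached."""
    annotated = dict(measurement)
    annotated["codes"] = list(measurement.get("codes", [])) + [concept]
    return annotated


# Hypothetical derived feature; the value is illustrative, not real data.
measurement = {"feature": "step_time_symmetry_index", "value": 18.2, "unit": "%"}
record = annotate(measurement, SNOMED_ABNORMAL_GAIT)
print(json.dumps(record, indent=2))
```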

2. Protocol & Measurement Standardization

File formats alone cannot guarantee comparability; measurement protocols must also be standardized.

Initiatives such as GALOP (Gait Advisors Leading Outcomes for Parkinson’s) have proposed minimum parameter sets and metadata requirements for human Parkinson’s gait studies.(2)

Critical protocol elements include:

  • Defined gait cycle reference (heel-strike to heel-strike, or paw contact to next contact).
  • Controlled walking speed, surface type, and footwear.
  • Device and camera setup: sampling rate, resolution, calibration.
  • Trial metadata: number of steps, duration, and environmental context.

Standardized acquisition protocols provide the foundation for interoperability; no data schema can compensate for inconsistent measurement practice.
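
The protocol elements listed above could be captured as a structured record with basic sanity checks. This is a sketch under assumptions: the `TrialProtocol` class, its field names, and the checks are hypothetical and do not reproduce the GALOP recommendations.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrialProtocol:
    gait_cycle_reference: str  # e.g. "heel_strike_to_heel_strike"
    walking_speed: str         # e.g. "self_selected"
    surface: str
    footwear: str
    sampling_rate_hz: float
    resolution_px: tuple       # (width, height)
    calibrated: bool
    n_steps: int
    duration_s: float
    environment: str

    def problems(self):
        """Return a list of human-readable issues; empty if none found."""
        issues = []
        if self.sampling_rate_hz <= 0:
            issues.append("sampling_rate_hz must be positive")
        if self.n_steps <= 0:
            issues.append("n_steps must be positive")
        if self.duration_s <= 0:
            issues.append("duration_s must be positive")
        if not self.calibrated:
            issues.append("camera is not calibrated")
        return issues


# Hypothetical trial description.
trial = TrialProtocol(
    gait_cycle_reference="heel_strike_to_heel_strike",
    walking_speed="self_selected",
    surface="level floor",
    footwear="barefoot",
    sampling_rate_hz=60.0,
    resolution_px=(1920, 1080),
    calibrated=True,
    n_steps=24,
    duration_s=15.0,
    environment="clinic corridor",
)
print(trial.problems())
print(asdict(trial)["gait_cycle_reference"])
```

For animal studies, the same record could use "paw_contact_to_next_contact" as the cycle reference, which keeps human and preclinical trials in one structure.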

3. File Formats, Metadata, and Open Datasets

To make gait data reusable, open formats and transparent metadata are essential:

  • Preferred formats:
    • C3D for motion-capture files
    • HDF5 or Parquet for large multi-array datasets
    • CSV/TSV for tabular derived features. These include spatiotemporal parameters (step length, stride length, cadence), kinematic measurements (joint angles, segment trajectories), kinetic variables (forces, torques), and, when available, physiological signals such as EMG. Stating the type and units of each feature in the metadata keeps them comparable across studies and species.
    • JSON sidecar for full metadata (device, sampling rate, coordinate frames, ontology codes)
  • Public examples:
    • A Nature Scientific Data gait repository comprising 138 able-bodied adults and 50 stroke survivors provides a strong reference for file organization and metadata richness.(3)
    • In the preclinical domain, GAITOR Suite demonstrates transparent sharing and analysis pipelines for rodent gait, including open source code and reproducible file structures.(4)

Studying how such datasets structure subject information, gait parameters, and device metadata can inform a practical standard schema for human and animal data alike.
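
To illustrate the tabular-file-plus-sidecar pattern, the sketch below writes derived features to CSV and their metadata, including units per column, to a JSON file with the same stem. File names, field names, and values are hypothetical; the ontology code is the one from reference (1).

```python
import csv
import json
import tempfile
from pathlib import Path


def write_with_sidecar(stem: Path, rows, columns, metadata):
    """Write rows to <stem>.csv and metadata plus column units to <stem>.json."""
    csv_path = stem.with_suffix(".csv")
    json_path = stem.with_suffix(".json")
    with csv_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(columns))
        writer.writeheader()
        writer.writerows(rows)
    sidecar = dict(metadata, columns=columns)
    json_path.write_text(json.dumps(sidecar, indent=2))
    return csv_path, json_path


# Column name -> unit; all values below are illustrative.
columns = {"step_length": "m", "cadence": "steps/min"}
rows = [{"step_length": 0.62, "cadence": 111.1}]
metadata = {
    "device": "hypothetical-camera",
    "sampling_rate_hz": 60,
    "coordinate_frame": "x right, y down (image plane)",
    "ontology_codes": [{"system": "SNOMED CT", "code": "22325002"}],
}

with tempfile.TemporaryDirectory() as tmp:
    csv_path, json_path = write_with_sidecar(Path(tmp) / "trial01", rows, columns, metadata)
    loaded_meta = json.loads(json_path.read_text())
    with csv_path.open(newline="") as f:
        loaded_rows = list(csv.DictReader(f))

print(loaded_meta["columns"])
print(loaded_rows[0])
```

Keeping units in the sidecar rather than in column names leaves the table easy to parse while still making every value interpretable.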

4. Governance, FAIR Practices, and Community Adoption

Open data standards only succeed with sustained governance and transparent licensing:

  • Governance: community working groups for methods, data curation, and validation.
  • Version control: managed via open repositories with schema documentation (e.g., JSON Schema).
  • FAIR compliance: ensuring each dataset is Findable, Accessible, Interoperable, and Reusable.
  • Licensing: open licenses (e.g., Apache-2.0 for code, CC-BY-4.0 for data) to enable global use and extension.
  • Repositories: deposit in established institutional or open-access repositories with persistent identifiers (DOIs).
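
As a rough sketch of schema-based checking, the example below defines a small JSON Schema style document for a metadata sidecar and validates records against it. The schema contents are hypothetical, and the `check` function deliberately handles only the `required` and `type` keywords; a full JSON Schema validator would cover far more.

```python
# Illustrative schema; field names are assumptions, not a published standard.
SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "Gait sidecar (illustrative)",
    "type": "object",
    "required": ["schema_version", "device", "sampling_rate_hz", "license"],
    "properties": {
        "schema_version": {"type": "string"},
        "device": {"type": "string"},
        "sampling_rate_hz": {"type": "number"},
        "license": {"type": "string"},
    },
}

# Mapping from JSON Schema type names to Python types (subset only).
TYPES = {"string": str, "number": (int, float)}


def check(instance, schema):
    """Return a list of violations of the 'required' and 'type' keywords."""
    errors = [
        f"missing required field: {key}"
        for key in schema.get("required", [])
        if key not in instance
    ]
    for key, rule in schema.get("properties", {}).items():
        if key in instance:
            value = instance[key]
            # bool is a subclass of int in Python, so reject it explicitly.
            ok = isinstance(value, TYPES[rule["type"]]) and not isinstance(value, bool)
            if not ok:
                errors.append(f"{key}: expected {rule['type']}")
    return errors


good = {
    "schema_version": "0.1.0",
    "device": "hypothetical-camera",
    "sampling_rate_hz": 60,
    "license": "CC-BY-4.0",
}
bad = {"schema_version": "0.1.0", "sampling_rate_hz": "60"}

print(check(good, SCHEMA))
print(check(bad, SCHEMA))
```

Carrying a `schema_version` field in every record is what allows the schema itself to evolve under version control without silently breaking older datasets.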

Toward a Shared Movement Data Infrastructure

A sustainable future for digital motor research will require a common data infrastructure: federated repositories of annotated videos, pose trajectories, and derived features that follow consistent ontologies and licensing. Open, well-curated datasets would enable objective benchmarking of models, bias detection, and cross-species validation of behavioral metrics.

Community governance, through working groups focused on methods, data curation, and validation, will be key to maintaining version control, metadata quality, and equitable access.

Challenges and Next Steps

Remaining challenges include:

  • Data privacy and governance for human video recordings, requiring federated and on-device analysis strategies.
  • Bias mitigation, ensuring algorithmic performance across diverse populations and settings.
  • Scalability, so that pipelines remain efficient in low-resource environments.
  • Regulatory alignment, linking digital endpoints with accepted clinical outcome measures.

Addressing these issues will determine how quickly CV-based motor analytics become routine components of translational and clinical research.

Conclusion

Computer vision has made gait measurement ubiquitous; informatics determines whether those measurements are meaningful, interoperable, and clinically actionable.

By anchoring gait data to open ontologies (SNOMED CT, UMLS), standardizing measurement protocols (GALOP and related frameworks), and adopting transparent metadata and file structures, movement science can become both reproducible and translational.

As these practices mature, preclinical and clinical datasets will increasingly reinforce one another, linking neural and behavioral recovery from bench to bedside.

 

References

  1. SNOMED CT Browser. Abnormal gait (finding) [SNOMED CT: 22325002].
  2. GALOP (Gait Advisors Leading Outcomes for Parkinson’s Disease). Recommendations for standardized gait assessment. PubMed ID: 36481259.
  3. Nature Scientific Data. (2021). Comprehensive gait dataset of stroke survivors and controls.
  4. GAITOR Suite. (2020). Open tools for rodent gait analysis. Nature Methods.
  5. CDISC (Clinical Data Interchange Standards Consortium). Clinical Data Standards for Biomedical Research.
  6. GDF 2.0 (General Data Format for Biomedical Signals). Specification for standard representation of biomedical signal data.