From Chaos to Clarity: Organizing and Sharing Your Research Data Like a Pro
- Manousos A. Klados
1. Introduction to RDM and FAIR Principles
What is Research Data Management (RDM)?
Research Data Management involves all the steps you take to ensure the integrity, reproducibility, and accessibility of the data you generate during your research project.
RDM encompasses:
Planning: Anticipating data types, volumes, and management strategies.
Collecting: Acquiring data using standardized and validated procedures.
Storing: Safeguarding data with secure and reliable systems.
Preserving: Ensuring long-term accessibility and reusability.
Sharing: Publishing data in ways that promote discovery and reuse.
Good RDM supports open science, meets compliance requirements, and enhances your reputation as a responsible researcher.
What Are the FAIR Principles?
The FAIR Principles were proposed to improve the transparency, discoverability, and reusability of research data:
Findable: Data and metadata should be indexed and assigned a globally unique and persistent identifier (e.g., DOI).
Accessible: Data should be retrievable using standardized protocols, with clear conditions for access.
Interoperable: Data should use standard formats and vocabularies that allow integration with other datasets.
Reusable: Data should include rich metadata and clear usage licenses, enabling future reuse.
Example: Publishing a dataset with a DOI, in an open format (CSV), annotated with standard vocabularies (like MeSH terms), and accompanied by a CC-BY license ensures it is FAIR.
2. Planning Your Data Management
2.1 Writing a Data Management Plan (DMP)
A Data Management Plan outlines how you will handle data throughout your project’s lifecycle.
A strong DMP addresses:
Types and formats of data.
Metadata and documentation standards.
Storage and backup procedures.
Ethical and legal compliance.
Plans for sharing, preservation, and reuse.
3. Organizing and Documenting Data
3.1 File and Folder Organization
Use clear, descriptive, and consistent naming conventions. Avoid ambiguous names like final_version2.doc.
Include project names, dates, versions, and descriptions (e.g., ProjectX_Interview1_2025-05-28_v1.csv).
Structure folders logically:
/ProjectX
    /RawData
    /ProcessedData
    /Scripts
    /Figures
    /Documentation
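If you like to automate this, a few lines of Python can scaffold the structure above and generate consistent file names. This is only a sketch: the sub-folder list and the date-based naming pattern come from the examples above, not from any formal standard.

```python
from datetime import date
from pathlib import Path

# Sub-folders taken from the example layout above
SUBFOLDERS = ["RawData", "ProcessedData", "Scripts", "Figures", "Documentation"]

def scaffold_project(root: str) -> Path:
    """Create the project root and its standard sub-folders (safe to re-run)."""
    root_path = Path(root)
    for sub in SUBFOLDERS:
        (root_path / sub).mkdir(parents=True, exist_ok=True)
    return root_path

def build_filename(project: str, description: str, version: int, ext: str = "csv") -> str:
    """Compose names like ProjectX_Interview1_2025-05-28_v1.csv."""
    return f"{project}_{description}_{date.today().isoformat()}_v{version}.{ext}"

scaffold_project("ProjectX")
print(build_filename("ProjectX", "Interview1", 1))
```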
3.2 Metadata Creation
Metadata describes your data to make it discoverable and understandable:
Descriptive: Title, authors, keywords, abstract.
Structural: Relationships between datasets.
Administrative: Access rights, licensing.
Use standards like Dublin Core, DataCite, or domain-specific schemas (e.g., MIAME for microarray data).
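As a rough illustration, descriptive metadata can live alongside the data as a small JSON file. The field names below are a simplified subset inspired by DataCite and Dublin Core, not a validated record against either schema, and the values are placeholders.

```python
import json
from pathlib import Path

# Simplified descriptive metadata; field names loosely follow DataCite / Dublin Core
metadata = {
    "title": "ProjectX interview transcripts",
    "creators": [{"name": "Klados, Manousos A."}],
    "publicationYear": 2025,
    "resourceType": "Dataset",
    "subjects": ["emotion recognition", "interviews"],  # rich keywords aid findability
    "rights": "CC-BY 4.0",
    "description": "Anonymized interview transcripts and coding sheets from ProjectX.",
}

out = Path("ProjectX/Documentation/metadata.json")
out.parent.mkdir(parents=True, exist_ok=True)
out.write_text(json.dumps(metadata, indent=2, ensure_ascii=False))
```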
3.3 Documentation and ReadMe Files
Every dataset should include a README:
Purpose of the data.
Methods used for collection and processing.
File descriptions and formats.
Software or tools required to open files.
🔧 Tip: Use electronic lab notebooks (ELNs) to track data provenance (e.g., Jupyter, LabArchives).
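A README can even be generated from a template so no section is forgotten. The sketch below simply writes the four headings listed above; the placeholder text is illustrative and should be replaced for each dataset.

```python
from pathlib import Path

# Template mirroring the README contents listed above; placeholders are examples only
README_TEMPLATE = """\
# Dataset: {title}

## Purpose of the data
{purpose}

## Collection and processing methods
{methods}

## Files and formats
{files}

## Software required
{software}
"""

readme = README_TEMPLATE.format(
    title="ProjectX interviews",
    purpose="Describe why the data were collected.",
    methods="Describe acquisition and preprocessing steps.",
    files="List each file, its format, and its contents.",
    software="List the tools (and versions) needed to open the files.",
)
Path("ProjectX/Documentation/README.md").write_text(readme)
```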
4. Storing and Backing Up Data
4.1 Secure Storage Options
Institutional storage (e.g., university servers, encrypted drives).
Cloud solutions (e.g., Google Drive, OneDrive) with secure access controls.
Consider data volume, sensitivity, and access needs.
4.2 Backup Strategies
Follow the 3-2-1 rule:
3 copies: primary + 2 backups.
2 media: local + external/cloud.
1 offsite: cloud or remote storage.
Automate backups where possible. Version control systems (e.g., Git) help manage changes over time.
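As a minimal sketch of one leg of the 3-2-1 rule, the snippet below mirrors the raw-data folder to an external location and verifies every copy with a SHA-256 checksum. The paths are placeholders; in practice you would schedule this (or rely on your institution's sync and backup tooling) rather than run it by hand.

```python
import hashlib
import shutil
from pathlib import Path

def sha256(path: Path) -> str:
    """Checksum a file so the copy can be verified against the original."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def mirror_with_checks(src_dir: str, dst_dir: str) -> None:
    """Copy every file from src_dir to dst_dir and confirm the checksums match."""
    src, dst = Path(src_dir), Path(dst_dir)
    for f in src.rglob("*"):
        if f.is_file():
            target = dst / f.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target)  # copy2 preserves timestamps
            assert sha256(f) == sha256(target), f"Checksum mismatch: {f}"

# Placeholder paths: adapt to your external drive or mounted cloud folder
mirror_with_checks("ProjectX/RawData", "/mnt/backup/ProjectX/RawData")
```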
5. Making Data FAIR
5.1 Findable
Assign Persistent Identifiers (PIDs) like DOIs via repositories (e.g., Zenodo, Dataverse).
Index datasets in disciplinary repositories or registries.
Use descriptive metadata with rich keywords.
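Repositories like Zenodo also expose a REST API, so DOI minting can be scripted at the end of a project. The outline below follows Zenodo's published deposit workflow (create a deposition, upload to its file bucket, attach metadata, publish), but treat it as a sketch: check the current API documentation before relying on it, and note that the token and file names are placeholders.

```python
import requests

ZENODO_TOKEN = "..."  # personal access token with deposit scopes (placeholder)
BASE = "https://zenodo.org/api/deposit/depositions"
params = {"access_token": ZENODO_TOKEN}

# 1) Create an empty deposition
dep = requests.post(BASE, params=params, json={}).json()

# 2) Upload the data file to the deposition's file bucket
with open("ProjectX_Interview1_2025-05-28_v1.csv", "rb") as fp:
    requests.put(f"{dep['links']['bucket']}/ProjectX_Interview1_2025-05-28_v1.csv",
                 data=fp, params=params)

# 3) Attach minimal metadata, then publish to mint the DOI
meta = {"metadata": {"title": "ProjectX interviews",
                     "upload_type": "dataset",
                     "description": "Anonymized interview data.",
                     "creators": [{"name": "Klados, Manousos A."}]}}
requests.put(f"{BASE}/{dep['id']}", params=params, json=meta)
published = requests.post(f"{BASE}/{dep['id']}/actions/publish", params=params).json()
print(published.get("doi"))
```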
5.2 Accessible
Store data in open repositories that use standard protocols (HTTP, FTP).
Define access levels: public, restricted, or embargoed.
Provide contact information for restricted data.
5.3 Interoperable
Use open, standard formats (e.g., CSV, JSON, NetCDF).
Annotate data with controlled vocabularies (e.g., MeSH, OBO Foundry).
Link related datasets and software with PIDs.
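In practice this often comes down to one conversion step at export time. The sketch below assumes a questionnaire spreadsheet (file names are illustrative) and writes open CSV and JSON copies with pandas; reading .xlsx also requires openpyxl.

```python
import pandas as pd

# Read the proprietary spreadsheet and export open, tool-agnostic copies
df = pd.read_excel("questionnaires.xlsx", sheet_name=0)
df.to_csv("questionnaires.csv", index=False)
df.to_json("questionnaires.json", orient="records", indent=2)
```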
5.4 Reusable
Include detailed documentation and clear licensing (e.g., CC-BY, CC0).
Follow community data standards (e.g., BIDS for neuroimaging).
Version data releases and record provenance.
6. Legal and Ethical Considerations
6.1 Informed Consent and Data Protection
Include data sharing provisions in consent forms.
Anonymize or pseudonymize sensitive data.
Comply with regulations (e.g., GDPR, HIPAA).
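One common pseudonymization pattern is to replace direct identifiers with a keyed hash, so the same participant always receives the same code while the key, stored separately, is needed to reproduce the mapping. The snippet below is an illustrative sketch; note that this is pseudonymization, not anonymization, since whoever holds the key can still re-identify participants.

```python
import hashlib
import hmac

SECRET_KEY = b"store-this-key-separately"  # placeholder; never ship it with the shared dataset

def pseudonymize(identifier: str) -> str:
    """Map a direct identifier to a stable participant code using a keyed hash."""
    digest = hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256)
    return "P" + digest.hexdigest()[:8]

print(pseudonymize("jane.doe@example.org"))  # same input + same key -> same code
```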
6.2 Licensing and Permissions
Apply open licenses (CC-BY, CC0) for data and software.
Use Data Use Agreements where necessary.
7. Sharing and Publishing Data
7.1 Where to Share?
Choose repositories appropriate for your data type and discipline:
Zenodo: General research data.
Dryad: Life sciences data.
OSF: Preprints, data, code.
Domain-specific: NeuroVault (neuroimaging), GenBank (genomic).
7.2 Data Availability Statements
Publish clear data availability statements with your articles:
Specify where data can be accessed.
Include repository links and DOIs.
Clarify any access restrictions.
8. Long-term Preservation and Sustainability
Store data in repositories with long-term funding and maintenance plans.
Use non-proprietary formats (e.g., CSV over XLSX).
Regularly check data integrity and accessibility.
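Integrity checks are easy to automate with a checksum manifest: record file hashes once, then rerun the comparison periodically to catch silent corruption. The paths below are placeholders.

```python
import hashlib
import json
from pathlib import Path

def manifest(data_dir: str) -> dict:
    """Map each file (relative path) to its SHA-256 checksum."""
    root = Path(data_dir)
    return {str(f.relative_to(root)): hashlib.sha256(f.read_bytes()).hexdigest()
            for f in root.rglob("*") if f.is_file()}

baseline_file = Path("ProjectX/Documentation/checksums.json")
current = manifest("ProjectX/RawData")

if baseline_file.exists():
    baseline = json.loads(baseline_file.read_text())
    changed = [name for name, digest in baseline.items() if current.get(name) != digest]
    print("Integrity issues:", changed or "none")
else:
    baseline_file.write_text(json.dumps(current, indent=2))  # first run records the baseline
```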
9. Advanced Tips for Efficient RDM
Automate metadata creation with scripts or ELNs.
Integrate RDM into daily workflows: version-controlled folders, consistent backups.
Collaborate with data stewards or librarians for expertise.
Consider pre-registration and data citation to increase transparency and impact.
10. Additional Resources
🔗 DMP Tools: DMPonline and DMPTool offer funder-aligned templates for drafting your plan.
🔗 Metadata and Standards: Dublin Core, DataCite Metadata Schema, and domain schemas such as MIAME and BIDS.
🔗 Repositories: Zenodo, Dryad, OSF, NeuroVault, GenBank.
🎯Final Action Plan
1️⃣ Draft a DMP before starting any project.
2️⃣ Organize and document your data meticulously.
3️⃣ Secure and back up data regularly.
4️⃣ Apply FAIR principles to make your data discoverable and reusable.
5️⃣ Share data ethically, legally, and with clear licensing.
6️⃣ Revisit and update your DMP throughout the project.
📄Sample Data Management Plan (DMP)
📌Project Information
Project Title: Neural Correlates of Emotion Recognition Using fNIRS
Principal Investigator: Dr. Manousos Klados
Institution: NIRx GmbH / Partner University
Start Date: September 1, 2025
End Date: August 31, 2028
Funder: Marie Skłodowska-Curie Doctoral Network (MSCA-DN)
1. Data Collection
1.1 What data will be collected?
fNIRS recordings (raw and pre-processed data) from 100 participants.
Behavioral data (reaction times, accuracy).
Demographic information (age, gender).
Questionnaire data (e.g., PANAS, STAI).
1.2 How will data be collected?
fNIRS data via NIRScout system.
Behavioral data using E-Prime software.
Paper or online questionnaires, digitized as needed.
1.3 Data formats
Raw fNIRS: .nirs, .snirf
Processed fNIRS: .mat, .csv
Behavioral and questionnaire data: .csv
Metadata: .json, README files
2. Documentation and Metadata
2.1 How will data be documented?
File naming conventions (e.g., ParticipantID_SessionDate_Version).
Metadata describing data collection parameters, preprocessing pipelines, and quality control.
README files in each dataset directory explaining content and structure.
2.2 Metadata standards
Use DataCite Metadata Schema for dataset description.
Include controlled vocabularies (e.g., NeuroLex terms for brain regions).
3. Ethics and Legal Compliance
3.1 Data protection measures
Personal data (demographics) stored separately and anonymized with participant IDs.
Informed consent forms include data sharing and reuse clauses.
3.2 Legal compliance
Adhere to GDPR for EU participants.
Institutional ethics approval obtained (Reference: IRB-2025-045).
4. Storage and Backup
4.1 Where will data be stored?
Active data: Institutional servers with secure access.
Backup: Encrypted external hard drives and secure cloud storage (e.g., OneDrive).
4.2 Backup strategy
Automatic daily backups to network storage.
Weekly manual backup to offsite cloud.
5. Sharing and Access
5.1 Will data be shared?
Yes, anonymized data and code will be shared post-publication.
5.2 Where will data be shared?
Data repository: Zenodo with DOI.
Code repository: GitHub, archived with Zenodo.
Data will be linked to relevant publications and pre-registrations.
5.3 Licensing
Data: CC-BY 4.0 license.
Code: MIT license.
6. Long-term Preservation
Data will be preserved for at least 10 years in Zenodo and institutional repositories.
Metadata and documentation will be maintained to ensure long-term usability.
7. Responsibilities and Resources
Responsible person: Dr. Manousos Klados
Support: Institutional data stewards, IT department
Resources required: Access to secure storage, version control platforms, cloud services.
8. Related Policies
Institutional data management policy.
Funder guidelines (e.g., MSCA Open Science requirements).
Ethical guidelines from the local IRB.
Appendix: Quick Reference Checklist
✅ Write and update the DMP.
✅ Use clear file names and directory structures.
✅ Store data securely and back it up regularly.
✅ Create rich metadata and documentation.
✅ Share data and code in open repositories with DOIs.
✅ Apply appropriate licenses.
✅ Ensure compliance with ethical and legal standards.
Enjoyed this post?
If you found this helpful, subscribe to my email list for more on research workflows, productivity in academia, and life as a scholar.
You can also visit my website or connect with me on LinkedIn to get in touch, explore options for working together, or see what I’m working on.
Dr. Manousos Klados, MSc, PhD, PGCert, FHEA, FIMA
🎓Associate Professor in Psychology
Director of MSc/MA in Cognitive/Clinical Neuropsychology
✍️ Editor in Chief of Brain Organoid and System Neuroscience Journal
🧬 Scientific Consultant @ NIRx
🧑💻 Personal website: https://linktr.ee/thephdmentor