From Chaos to Clarity: Organizing and Sharing Your Research Data Like a Pro
- Manousos A. Klados
1. Introduction to RDM and FAIR Principles
What is Research Data Management (RDM)?
Research Data Management involves all the steps you take to ensure the integrity, reproducibility, and accessibility of the data you generate during your research project.
RDM encompasses:
Planning: Anticipating data types, volumes, and management strategies.
Collecting: Acquiring data using standardized and validated procedures.
Storing: Safeguarding data with secure and reliable systems.
Preserving: Ensuring long-term accessibility and reusability.
Sharing: Publishing data in ways that promote discovery and reuse.
Good RDM supports open science, meets compliance requirements, and enhances your reputation as a responsible researcher.
What Are the FAIR Principles?
The FAIR Principles were proposed to improve the transparency, discoverability, and reusability of research data:
Findable: Data and metadata should be indexed and assigned a globally unique and persistent identifier (e.g., DOI).
Accessible: Data should be retrievable using standardized protocols, with clear conditions for access.
Interoperable: Data should use standard formats and vocabularies that allow integration with other datasets.
Reusable: Data should include rich metadata and clear usage licenses, enabling future reuse.
Example: Publishing a dataset with a DOI, in an open format (CSV), annotated with standard vocabularies (like MeSH terms), and accompanied by a CC-BY license ensures it is FAIR.
2. Planning Your Data Management
2.1 Writing a Data Management Plan (DMP)
A Data Management Plan outlines how you will handle data throughout your project’s lifecycle.
A strong DMP addresses:
Types and formats of data.
Metadata and documentation standards.
Storage and backup procedures.
Ethical and legal compliance.
Plans for sharing, preservation, and reuse.
3. Organizing and Documenting Data
3.1 File and Folder Organization
Use clear, descriptive, and consistent naming conventions. Avoid ambiguous names like final_version2.doc.
Include project names, dates, versions, and descriptions (e.g., ProjectX_Interview1_2025-05-28_v1.csv).
Structure folders logically:
/ProjectX
    /RawData
    /ProcessedData
    /Scripts
    /Figures
    /Documentation
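If you like to automate this, a few lines of Python can scaffold the structure above and generate consistent file names. This is only a sketch: the sub-folder list and the date-based naming pattern come from the examples above, not from any formal standard.

```python
from datetime import date
from pathlib import Path

# Sub-folders taken from the example layout above
SUBFOLDERS = ["RawData", "ProcessedData", "Scripts", "Figures", "Documentation"]

def scaffold_project(root: str) -> Path:
    """Create the project root and its standard sub-folders (safe to re-run)."""
    root_path = Path(root)
    for sub in SUBFOLDERS:
        (root_path / sub).mkdir(parents=True, exist_ok=True)
    return root_path

def build_filename(project: str, description: str, version: int, ext: str = "csv") -> str:
    """Compose names like ProjectX_Interview1_2025-05-28_v1.csv."""
    return f"{project}_{description}_{date.today().isoformat()}_v{version}.{ext}"

scaffold_project("ProjectX")
print(build_filename("ProjectX", "Interview1", 1))
```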
3.2 Metadata Creation
Metadata describes your data to make it discoverable and understandable:
Descriptive: Title, authors, keywords, abstract.
Structural: Relationships between datasets.
Administrative: Access rights, licensing.
Use standards like Dublin Core, DataCite, or domain-specific schemas (e.g., MIAME for microarray data).
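As a rough illustration, descriptive metadata can live alongside the data as a small JSON file. The field names below are a simplified subset inspired by DataCite and Dublin Core, not a validated record against either schema, and the values are placeholders.

```python
import json
from pathlib import Path

# Simplified descriptive metadata; field names loosely follow DataCite / Dublin Core
metadata = {
    "title": "ProjectX interview transcripts",
    "creators": [{"name": "Klados, Manousos A."}],
    "publicationYear": 2025,
    "resourceType": "Dataset",
    "subjects": ["emotion recognition", "interviews"],  # rich keywords aid findability
    "rights": "CC-BY 4.0",
    "description": "Anonymized interview transcripts and coding sheets from ProjectX.",
}

out = Path("ProjectX/Documentation/metadata.json")
out.parent.mkdir(parents=True, exist_ok=True)
out.write_text(json.dumps(metadata, indent=2, ensure_ascii=False))
```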
3.3 Documentation and ReadMe Files
Every dataset should include a README:
Purpose of the data.
Methods used for collection and processing.
File descriptions and formats.
Software or tools required to open files.
🔧 Tip: Use electronic lab notebooks (ELNs) to track data provenance (e.g., Jupyter, LabArchives).
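A README can even be generated from a template so no section is forgotten. The sketch below simply writes the four headings listed above; the placeholder text is illustrative and should be replaced for each dataset.

```python
from pathlib import Path

# Template mirroring the README contents listed above; placeholders are examples only
README_TEMPLATE = """\
# Dataset: {title}

## Purpose of the data
{purpose}

## Collection and processing methods
{methods}

## Files and formats
{files}

## Software required
{software}
"""

readme = README_TEMPLATE.format(
    title="ProjectX interviews",
    purpose="Describe why the data were collected.",
    methods="Describe acquisition and preprocessing steps.",
    files="List each file, its format, and its contents.",
    software="List the tools (and versions) needed to open the files.",
)
Path("ProjectX/Documentation/README.md").write_text(readme)
```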
4. Storing and Backing Up Data
4.1 Secure Storage Options
Institutional storage (e.g., university servers, encrypted drives).
Cloud solutions (e.g., Google Drive, OneDrive) with secure access controls.
Consider data volume, sensitivity, and access needs.
4.2 Backup Strategies
Follow the 3-2-1 rule:
3 copies: primary + 2 backups.
2 media: local + external/cloud.
1 offsite: cloud or remote storage.
Automate backups where possible. Version control systems (e.g., Git) help manage changes over time.
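As a minimal sketch of one leg of the 3-2-1 rule, the snippet below mirrors the raw-data folder to an external location and verifies every copy with a SHA-256 checksum. The paths are placeholders; in practice you would schedule this (or rely on your institution's sync and backup tooling) rather than run it by hand.

```python
import hashlib
import shutil
from pathlib import Path

def sha256(path: Path) -> str:
    """Checksum a file so the copy can be verified against the original."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def mirror_with_checks(src_dir: str, dst_dir: str) -> None:
    """Copy every file from src_dir to dst_dir and confirm the checksums match."""
    src, dst = Path(src_dir), Path(dst_dir)
    for f in src.rglob("*"):
        if f.is_file():
            target = dst / f.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target)  # copy2 preserves timestamps
            assert sha256(f) == sha256(target), f"Checksum mismatch: {f}"

# Placeholder paths: adapt to your external drive or mounted cloud folder
mirror_with_checks("ProjectX/RawData", "/mnt/backup/ProjectX/RawData")
```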
5. Making Data FAIR
5.1 Findable
Assign Persistent Identifiers (PIDs) like DOIs via repositories (e.g., Zenodo, Dataverse).
Index datasets in disciplinary repositories or registries.
Use descriptive metadata with rich keywords.
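Repositories like Zenodo also expose a REST API, so DOI minting can be scripted at the end of a project. The outline below follows Zenodo's published deposit workflow (create a deposition, upload to its file bucket, attach metadata, publish), but treat it as a sketch: check the current API documentation before relying on it, and note that the token and file names are placeholders.

```python
import requests

ZENODO_TOKEN = "..."  # personal access token with deposit scopes (placeholder)
BASE = "https://zenodo.org/api/deposit/depositions"
params = {"access_token": ZENODO_TOKEN}

# 1) Create an empty deposition
dep = requests.post(BASE, params=params, json={}).json()

# 2) Upload the data file to the deposition's file bucket
with open("ProjectX_Interview1_2025-05-28_v1.csv", "rb") as fp:
    requests.put(f"{dep['links']['bucket']}/ProjectX_Interview1_2025-05-28_v1.csv",
                 data=fp, params=params)

# 3) Attach minimal metadata, then publish to mint the DOI
meta = {"metadata": {"title": "ProjectX interviews",
                     "upload_type": "dataset",
                     "description": "Anonymized interview data.",
                     "creators": [{"name": "Klados, Manousos A."}]}}
requests.put(f"{BASE}/{dep['id']}", params=params, json=meta)
published = requests.post(f"{BASE}/{dep['id']}/actions/publish", params=params).json()
print(published.get("doi"))
```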
5.2 Accessible
Store data in open repositories that use standard protocols (HTTP, FTP).
Define access levels: public, restricted, or embargoed.
Provide contact information for restricted data.
5.3 Interoperable
Use open, standard formats (e.g., CSV, JSON, NetCDF).
Annotate data with controlled vocabularies (e.g., MeSH, OBO Foundry).
Link related datasets and software with PIDs.
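In practice this often comes down to one conversion step at export time. The sketch below assumes a questionnaire spreadsheet (file names are illustrative) and writes open CSV and JSON copies with pandas; reading .xlsx also requires openpyxl.

```python
import pandas as pd

# Read the proprietary spreadsheet and export open, tool-agnostic copies
df = pd.read_excel("questionnaires.xlsx", sheet_name=0)
df.to_csv("questionnaires.csv", index=False)
df.to_json("questionnaires.json", orient="records", indent=2)
```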
5.4 Reusable
Include detailed documentation and clear licensing (e.g., CC-BY, CC0).
Follow community data standards (e.g., BIDS for neuroimaging).
Version data releases and record provenance.
6. Legal and Ethical Considerations
6.1 Informed Consent and Data Protection
Include data sharing provisions in consent forms.
Anonymize or pseudonymize sensitive data.
Comply with regulations (e.g., GDPR, HIPAA).
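One common pseudonymization pattern is to replace direct identifiers with a keyed hash, so the same participant always receives the same code while the key, stored separately, is needed to reproduce the mapping. The snippet below is an illustrative sketch; note that this is pseudonymization, not anonymization, since whoever holds the key can still re-identify participants.

```python
import hashlib
import hmac

SECRET_KEY = b"store-this-key-separately"  # placeholder; never ship it with the shared dataset

def pseudonymize(identifier: str) -> str:
    """Map a direct identifier to a stable participant code using a keyed hash."""
    digest = hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256)
    return "P" + digest.hexdigest()[:8]

print(pseudonymize("jane.doe@example.org"))  # same input + same key -> same code
```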
6.2 Licensing and Permissions
Apply open licenses (CC-BY, CC0) for data and software.
Use Data Use Agreements where necessary.
7. Sharing and Publishing Data
7.1 Where to Share?
Choose repositories appropriate for your data type and discipline:
Zenodo: General research data.
Dryad: Life sciences data.
OSF: Preprints, data, code.
Domain-specific: NeuroVault (neuroimaging), GenBank (genomic).
7.2 Data Availability Statements
Publish clear data availability statements with your articles:
Specify where data can be accessed.
Include repository links and DOIs.
Clarify any access restrictions.
8. Long-term Preservation and Sustainability
Store data in repositories with long-term funding and maintenance plans.
Use non-proprietary formats (e.g., CSV over XLSX).
Regularly check data integrity and accessibility.
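Integrity checks are easy to automate with a checksum manifest: record file hashes once, then rerun the comparison periodically to catch silent corruption. The paths below are placeholders.

```python
import hashlib
import json
from pathlib import Path

def manifest(data_dir: str) -> dict:
    """Map each file (relative path) to its SHA-256 checksum."""
    root = Path(data_dir)
    return {str(f.relative_to(root)): hashlib.sha256(f.read_bytes()).hexdigest()
            for f in root.rglob("*") if f.is_file()}

baseline_file = Path("ProjectX/Documentation/checksums.json")
current = manifest("ProjectX/RawData")

if baseline_file.exists():
    baseline = json.loads(baseline_file.read_text())
    changed = [name for name, digest in baseline.items() if current.get(name) != digest]
    print("Integrity issues:", changed or "none")
else:
    baseline_file.write_text(json.dumps(current, indent=2))  # first run records the baseline
```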
9. Advanced Tips for Efficient RDM
Automate metadata creation with scripts or ELNs.
Integrate RDM into daily workflows: version-controlled folders, consistent backups.
Collaborate with data stewards or librarians for expertise.
Consider pre-registration and data citation to increase transparency and impact.
10. Additional Resources
🔗 DMP Tools: DMPonline and DMPTool offer funder-aligned templates for drafting your plan.
🔗 Metadata and Standards: Dublin Core, DataCite Metadata Schema, and domain schemas such as MIAME and BIDS.
🔗 Repositories: Zenodo, Dryad, OSF, NeuroVault, GenBank.
🎯Final Action Plan
1️⃣ Draft a DMP before starting any project.
2️⃣ Organize and document your data meticulously.
3️⃣ Secure and back up data regularly.
4️⃣ Apply FAIR principles to make your data discoverable and reusable.
5️⃣ Share data ethically, legally, and with clear licensing.
6️⃣ Revisit and update your DMP throughout the project.
📄Sample Data Management Plan (DMP)
📌Project Information
Project Title: Neural Correlates of Emotion Recognition Using fNIRS
Principal Investigator: Dr. Manousos Klados
Institution: NIRx GmbH / Partner University
Start Date: September 1, 2025
End Date: August 31, 2028
Funder: Marie Skłodowska-Curie Doctoral Network (MSCA-DN)
1. Data Collection
1.1 What data will be collected?
fNIRS recordings (raw and pre-processed data) from 100 participants.
Behavioral data (reaction times, accuracy).
Demographic information (age, gender).
Questionnaire data (e.g., PANAS, STAI).
1.2 How will data be collected?
fNIRS data via NIRScout system.
Behavioral data using E-Prime software.
Paper or online questionnaires, digitized as needed.
1.3 Data formats
Raw fNIRS: .nirs, .snirf
Processed fNIRS: .mat, .csv
Behavioral and questionnaire data: .csv
Metadata: .json, README files
2. Documentation and Metadata
2.1 How will data be documented?
File naming conventions (e.g., ParticipantID_SessionDate_Version).
Metadata describing data collection parameters, preprocessing pipelines, and quality control.
README files in each dataset directory explaining content and structure.
2.2 Metadata standards
Use DataCite Metadata Schema for dataset description.
Include controlled vocabularies (e.g., NeuroLex terms for brain regions).
3. Ethics and Legal Compliance
3.1 Data protection measures
Personal data (demographics) stored separately and anonymized with participant IDs.
Informed consent forms include data sharing and reuse clauses.
3.2 Legal compliance
Adhere to GDPR for EU participants.
Institutional ethics approval obtained (Reference: IRB-2025-045).
4. Storage and Backup
4.1 Where will data be stored?
Active data: Institutional servers with secure access.
Backup: Encrypted external hard drives and secure cloud storage (e.g., OneDrive).
4.2 Backup strategy
Automatic daily backups to network storage.
Weekly manual backup to offsite cloud.
5. Sharing and Access
5.1 Will data be shared?
Yes, anonymized data and code will be shared post-publication.
5.2 Where will data be shared?
Data repository: Zenodo with DOI.
Code repository: GitHub, archived with Zenodo.
Data will be linked to relevant publications and pre-registrations.
5.3 Licensing
Data: CC-BY 4.0 license.
Code: MIT license.
6. Long-term Preservation
Data will be preserved for at least 10 years in Zenodo and institutional repositories.
Metadata and documentation will be maintained to ensure long-term usability.
7. Responsibilities and Resources
Responsible person: Dr. Manousos Klados
Support: Institutional data stewards, IT department
Resources required: Access to secure storage, version control platforms, cloud services.
8. Related Policies
Institutional data management policy.
Funder guidelines (e.g., MSCA Open Science requirements).
Ethical guidelines from the local IRB.
Appendix: Quick Reference Checklist
✅ Write and update the DMP.
✅ Use clear file names and directory structures.
✅ Store data securely and back it up regularly.
✅ Create rich metadata and documentation.
✅ Share data and code in open repositories with DOIs.
✅ Apply appropriate licenses.
✅ Ensure compliance with ethical and legal standards.
Enjoyed this post?
If you found this helpful, subscribe to my email list for more on research workflows, productivity in academia, and life as a scholar.
You can also visit my website or connect with me on LinkedIn to get in touch, explore options for working together, or see what I’m working on.
Dr. Manousos Klados, MSc, PhD, PGCert, FHEA, FIMA
🎓Associate Professor in Psychology
Director of MSc/MA in Cognitive/Clinical Neuropsychology
✍️ Editor in Chief of Brain Organoid and System Neuroscience Journal
🧬 Scientific Consultant @ NIRx
🧑💻 Personal website: https://linktr.ee/thephdmentor