Data Repository Guidance

Scientific Data mandates that authors submit datasets to an appropriate public data repository. Data should be submitted to a discipline-specific, community-recognised service where available, or to a generalist repository if not.

Deposition is required from the second round of peer review onwards to allow us to check policy compliance and prepare the paper for publication. Deposition is also recommended for the first round of review, but we accept informal methods of sharing at that stage if authors wish to delay formal deposition. This can include sharing URLs to private cloud storage, as long as the method of download is seamless and anonymous.

Repository requirements 

    Repositories need to meet the following requirements for anonymous peer review, data access, preservation, resource stability, licences, and suitability for use by all researchers with the appropriate types of data.

    General requirements for repositories (all data types)

    • Ensure long-term persistence and preservation of datasets in their published form. All Data Descriptors need to be associated with live data to avoid future issues with the integrity of the paper. 
    • Make data available for peer review. Where logins or other barriers are required or temporarily applied, routes for confidential peer review of submitted datasets need to be provided that do not reveal the identity of the reviewer to the data owner/author of the associated article. Please consult with the repository to arrange this, or provide the data in a temporary location. 
    • Provide stable persistent identifiers for submitted datasets. DOIs are the default for most non-omics datasets described in the journal. 
    • Provide direct https download of the submitted dataset, preferably as a single download/unit, from the DOI-registered page.
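
    The identifier and download requirements above lend themselves to a quick self-check before submission. The following is a minimal sketch only, not a tool used by the journal: the record fields (`doi`, `download_url`), the function names and the loose DOI pattern are all assumptions made for illustration, and a non-DOI identifier is flagged rather than rejected because DOIs are the default only for most non-omics datasets.

```python
import re
from urllib.parse import urlparse

# Assumption: a loose DOI shape ("10.", a 4-9 digit registrant code, "/", a suffix).
# This checks syntax only; it does not confirm that the DOI resolves.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def looks_like_doi(identifier):
    """Return True if the string has the general shape of a DOI."""
    return bool(DOI_PATTERN.match(identifier.strip()))


def is_https_url(url):
    """Return True if the URL uses the https scheme and names a host."""
    parsed = urlparse(url.strip())
    return parsed.scheme == "https" and bool(parsed.netloc)


def check_general_requirements(record):
    """Return a list of points to review for a hypothetical dataset record."""
    points = []
    if not looks_like_doi(record.get("doi", "")):
        points.append("identifier is not DOI-shaped; confirm it is persistent")
    if not is_https_url(record.get("download_url", "")):
        points.append("download link is not an https URL")
    return points
```

    An empty list from `check_general_requirements` means only that nothing was flagged by these two syntactic checks. Persistence, preservation and anonymous reviewer access still need to be confirmed with the repository.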

    Requirements for repositories for non-sensitive (non-human) data

    • Allow public access to data without barriers, such as formal application processes. Basic login/registration functionalities, where data are captured for analytics purposes only, are accepted as long as immediate access is granted to the holder of the email address without manual checks; however, we encourage login-free https access without registration in most cases.
    • Use open licences (CC0 and CC-BY, or their equivalents, are required in most cases). We do not typically support the use of more restrictive CC licences (those containing SA, NC or ND clauses), except where they apply to re-used third-party data whose original licence needs to be retained.

    Note: elements of the above policies may be waived for exceptionally large datasets (> 1 TB in size), as we appreciate that meeting all requirements is challenging at this scale.
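
    The licence rule in this section can be expressed as a small triage function. This is an illustrative sketch under stated assumptions, not an official checker: the function names, the string normalisation and the four result labels are hypothetical, and anything that is not a recognised CC licence (including possible "equivalents") is sent for manual review rather than judged.

```python
OPEN_LICENCES = {"CC0", "CC-BY"}          # named in the policy as required in most cases
RESTRICTIVE_CLAUSES = {"SA", "NC", "ND"}  # clauses the policy does not typically support


def normalise(licence):
    """Reduce a label such as 'CC BY-NC 4.0' to 'CC-BY-NC' (version number dropped)."""
    tokens = licence.upper().replace("_", "-").replace(" ", "-").split("-")
    tokens = [t for t in tokens if t and not t[0].isdigit()]
    return "-".join(tokens)


def assess_licence(licence, retained_third_party_licence=False):
    """Triage a licence label into one of four hypothetical outcomes."""
    name = normalise(licence)
    if name in OPEN_LICENCES:
        return "open"
    clauses = set(name.split("-")) & RESTRICTIVE_CLAUSES
    if name.startswith("CC-BY") and clauses:
        # Restrictive CC licences are tolerated only for re-used third-party
        # data whose original licence has to be retained.
        if retained_third_party_licence:
            return "retained third-party licence"
        return "restrictive"
    # Possible equivalents of CC0/CC-BY cannot be decided automatically.
    return "review"
```

    For example, `assess_licence("CC BY-NC-SA 4.0")` is triaged as restrictive, while the same label with `retained_third_party_licence=True` falls under the third-party exception described above.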

    Requirements for controlled access repositories handling sensitive human data

    • Require login/registration of users for the purposes of identity validation.
    • Support Data Usage Agreements (DUAs) that stipulate what users are and are not allowed to do, in order to manage risks associated with the files.
    • Provide controlled access, meaning download is only available to registered, verified users who sign the DUA.

    Please see our Human Data policy for more complete details of our requirements.

    Repository guidance

    Researchers sharing certain data types are mandated to use specific repositories. These include genomics, transcriptomics, protein structures, proteomics, and small molecule crystallography. Please see Springer Nature's mandated data types for a full list. 

    For all other data types, Scientific Data does not mandate, endorse, or recommend any specific resource, and researchers are free to deposit their dataset in any repository that meets the above requirements.

    If you require assistance with where to deposit data, please consider these resources: