Test data
nf-core has a central repository on nf-core/test-datasets containing small data files for testing of nf-core components and pipelines. This guide explains how to find and upload new test data.
Modules and subworkflows
Reusing existing test data
When setting up tests for an nf-core component, test data files should be reused as far as possible.
If a small test data file for your module can be quickly generated by an upstream module, do not add the input file to the test data.
Run the upstream module in the test for your module using the nf-test setup method.
Test data for modules and subworkflows are stored on the modules branch of nf-core/test-datasets.
The branch is approximately structured as follows:
data/
├── delete_me/
│   └── <tool name>/
│       └── <test data file>
├── generic/
│   └── <format>/
│       └── <test data file>
└── <research domain>/
    ├── <technology or analysis>/
    │   └── <format>/
    │       └── <test data file>/
    └── <species>/ ## Bioinformatics only
        └── <technology or analysis>/
            └── <format>/
                └── <test data file>/
Existing nf-core test data can be explored using nf-core/tools:
    nf-core test-datasets search --branch modules
Type keywords, such as file formats or extensions, into the search bar to find potentially relevant files. To get more information about the context of the data, see the nf-core/test-datasets README file.
Select a file to print different variants of URLs to the test data file for use in tests.
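As a rough illustration of what such a URL can look like, the sketch below assembles a raw GitHub URL from a branch name and a file path. The `raw.githubusercontent.com` pattern is GitHub's general scheme for raw files; the file path is an illustrative assumption and should be confirmed against the output of the search command.

```shell
#!/usr/bin/env bash
# Sketch: build a raw GitHub URL for a file on a test-datasets branch.
# The file path is illustrative; confirm it with the search command.
branch="modules"
file_path="data/genomics/sarscov2/genome/genome.fasta"

raw_url="https://raw.githubusercontent.com/nf-core/test-datasets/${branch}/${file_path}"
echo "${raw_url}"
```

Changing `branch` to a pipeline branch name would give the corresponding URL for pipeline test data.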
Uploading new data
If you cannot generate test data on the fly quickly, or reuse existing data, you can upload new compliant test data to the modules branch.
If you need to create a derivative file, use existing test data as a source.
For example, if you need to test a new bioinformatic short-read aligner that requires a genome file index, use an existing genome.fasta file on nf-core/test-datasets to generate the index, rather than using your own reference genome.
Do not clone the full nf-core/test-datasets repository: it is extremely large and will take a long time to download. If you need a local copy, we highly recommend a single-branch clone.
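One way to make a single-branch clone with plain git is sketched below. `--single-branch`, `--branch`, and `--depth` are standard `git clone` options; the `clone_branch` helper name, the shallow depth, and the target directory name are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Sketch: fetch only one branch of nf-core/test-datasets, not the whole repository.
# clone_branch is a hypothetical helper name used for illustration.
repo_url="https://github.com/nf-core/test-datasets.git"

clone_branch() {
    local branch="$1"
    # --single-branch limits the fetch to the named branch;
    # --depth 1 additionally skips older history (optional, assumed here).
    git clone --single-branch --branch "${branch}" --depth 1 \
        "${repo_url}" "test-datasets-${branch}"
}

# Example call (requires network access):
# clone_branch modules
```

When working from a fork, `repo_url` would point at the fork instead of the nf-core repository.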
- Verify your new test-data file complies with the nf-core test-datasets specifications.
- Fork the nf-core/test-datasets repository including all branches.
- Upload to the modules branch on your fork in a suitable location.
  - (Recommended) Use the GitHub website upload function.
  - If you are unsure of the suitable location, ask on the nf-core #test-data Slack channel.
- Edit the README of the modules branch to add a short entry describing the new test data file.
  - Include as much information as possible to support reconstruction of the file (for example, original source accession numbers or URLs, tool name and version used to generate).
  - (Optional) If a description requires more than one or two sentences, additionally add a dedicated README markdown file alongside the test-data file itself.
- Open a pull request and request a review on the nf-core #request-review Slack channel.
Research domain specific guidance
Biology
Avoid adding test data of new organisms as far as possible, and reuse existing files.
SARS-CoV-2 and Homo sapiens (Chr 21) are the best supported organisms and should be used where possible. If a new organism is absolutely required for a tool, first propose the addition on the #test-data Slack channel.
Pipelines
Each official nf-core pipeline has a dedicated branch on the test-datasets repository, for example, nf-core/rnaseq.
Reusability principles are less strict for pipeline test data. To keep pipeline tests stable, do not reuse test data across pipelines. Reusing test data from the modules branch in pipeline tests is possible because those files are more stable.
A pipeline-specific nf-core/test-datasets branch will generally consist of:
- Small raw input files (for example, FASTQ or FASTA files for bioinformatics)
- Samplesheets
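To make this layout concrete, the sketch below writes a minimal samplesheet whose entries point at raw input files on a pipeline branch. The column names, file names, directory name, and the `<pipeline>` branch placeholder are all hypothetical; each pipeline defines its own samplesheet schema.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal samplesheet referencing small FASTQ files on a pipeline branch.
# Branch placeholder, column names, and file names are hypothetical.
base_url="https://raw.githubusercontent.com/nf-core/test-datasets/<pipeline>"

cat > samplesheet.csv <<EOF
sample,fastq_1,fastq_2
test_1,${base_url}/testdata/test_1_R1.fastq.gz,${base_url}/testdata/test_1_R2.fastq.gz
EOF

cat samplesheet.csv
```

Storing the samplesheet on the same branch as the files it references keeps a pipeline's test inputs in one place.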
Adding new test data
- Verify your new test-data file complies with the nf-core test-datasets specifications.
- Fork the nf-core/test-datasets repository.
- Upload to the dedicated pipeline branch in a suitable location.
  - (Recommended) Use the GitHub website upload function.
- Edit the README of the dedicated pipeline branch to describe how the file was generated.
  - Include as much information as possible that could allow reconstruction of the file (for example, original source accession numbers or URLs, tool name and versions used to generate).
- Open a pull request and request a review on the nf-core #request-review Slack channel.