Why MetaWorks?
Free and open-source software
MetaWorks runs at the command line on linux-64. To process demultiplexed Illumina paired-end reads, the pipeline strings together popular free and open-source tools such as SeqPrep (St. John, 2016), CutAdapt (Martin, 2011), VSEARCH (Rognes, Flouri, Nichols, Quince, & Mahé, 2016), and the RDP Classifier (Wang, Garrity, Tiedje, & Cole, 2007).
Versioned workflows to improve reproducibility
MetaWorks is versioned and available from GitHub.
Harmonized Conda processing environment
MetaWorks comes with a conda environment file that should be activated before running the pipeline. Conda is an open-source environment and package manager (Anaconda, 2016). The environment file contains most of the programs and dependencies needed to run MetaWorks. If pseudogene filtering will be used, then the NCBI ORFfinder program will also need to be installed. Additional RDP-trained reference sets may need to be downloaded if the reference set needed is not already built in to the RDP classifier.
Uses Snakemake for scalable processing
Snakemake is a Python-based workflow manager (Köster and Rahmann, 2012) that strings together workflow steps and distributes the resulting jobs across a high-performance computing platform to make efficient use of computational resources. Interrupted runs can be restarted from the last successful step.
Generates exact sequence variants and/or operational taxonomic units
MetaWorks offers workflows that use VSEARCH to generate exact sequence variants (ESVs) and/or operational taxonomic units (OTUs) clustered at a 97% identity cutoff. A workflow for single-read processing is also available for cases where the paired amplicon reads do not overlap.
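To make the 97% identity cutoff concrete, here is a minimal Python sketch of greedy centroid clustering. It is an illustration only and not how VSEARCH is implemented: the `identity` function is a toy position-by-position comparison that assumes equal-length, pre-aligned sequences, and the sequence names and reads are hypothetical.

```python
from __future__ import annotations


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this toy identity measure needs equal-length sequences")
    if not a:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_cluster(seqs: dict[str, str], cutoff: float = 0.97) -> dict[str, list[str]]:
    """Assign each sequence to the first centroid it matches at >= cutoff identity.

    Sequences are visited in input order; a sequence that matches no existing
    centroid becomes the centroid of a new cluster.
    """
    clusters: dict[str, list[str]] = {}
    for name, seq in seqs.items():
        for centroid in clusters:
            if identity(seq, seqs[centroid]) >= cutoff:
                clusters[centroid].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters


# Hypothetical 100-bp reads (invented for illustration).
# "N" is treated as a plain mismatch in this toy example.
esv1 = "ACGT" * 25
esv2 = "NN" + esv1[2:]        # 2 mismatches  -> 98% identity to esv1
esv3 = "N" * 10 + esv1[10:]   # 10 mismatches -> 90% identity to esv1

clusters = greedy_cluster({"esv1": esv1, "esv2": esv2, "esv3": esv3})
print(clusters)
```

With these inputs, `esv2` joins the cluster seeded by `esv1`, while `esv3` falls below the cutoff and seeds its own cluster.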
Supports popular metabarcode markers
MetaWorks was specifically developed to handle different types of metabarcodes, from ribosomal RNA genes and spacers to protein-coding genes. Marker-specific steps, such as the removal of conserved rRNA genes from ITS sequences and of putative pseudogenes from COI, are supported. A detailed description of our pseudogene-filtering approaches for protein-coding metabarcode markers has been published (Porter & Hajibabaei, 2021).
MetaWorks uses a naive Bayesian classifier to make taxonomic assignments with a measure of confidence (Wang et al., 2007). Built-in markers include the popular prokaryote 16S rRNA gene, fungal ITS, and fungal LSU rRNA markers. Custom-trained classifiers developed for MetaWorks include COI, 18S rRNA, rbcL, and 12S.
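One common way to use per-rank confidence values downstream is to trim each assignment at the first rank whose support falls below a chosen cutoff. The Python sketch below illustrates that idea under stated assumptions: the `RankCall` structure, the `trim_lineage` helper, the 0.80 cutoff, and the confidence values are all hypothetical and are not part of MetaWorks or the RDP Classifier output format.

```python
from __future__ import annotations

from typing import NamedTuple


class RankCall(NamedTuple):
    """One rank of a taxonomic assignment (hypothetical structure)."""
    rank: str
    taxon: str
    confidence: float  # bootstrap support on a 0.0-1.0 scale


def trim_lineage(lineage: list[RankCall], cutoff: float) -> list[RankCall]:
    """Keep ranks from the top down until support first drops below cutoff."""
    kept: list[RankCall] = []
    for call in lineage:
        if call.confidence < cutoff:
            break
        kept.append(call)
    return kept


# Hypothetical assignment; the confidence values are invented for illustration.
lineage = [
    RankCall("phylum", "Arthropoda", 1.00),
    RankCall("class", "Insecta", 0.99),
    RankCall("order", "Ephemeroptera", 0.95),
    RankCall("family", "Baetidae", 0.82),
    RankCall("genus", "Baetis", 0.61),
    RankCall("species", "Baetis rhodani", 0.30),
]

trimmed = trim_lineage(lineage, cutoff=0.80)
print([call.taxon for call in trimmed])
```

Here the assignment is reported to family level only, because genus is the first rank with support below the assumed cutoff. Appropriate cutoffs depend on the marker, the reference set, and the query length, so the value used here should not be read as a recommendation.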
Trained classifiers that work with MetaWorks and the RDP Classifier
Marker | Target taxa | Classifier availability |
---|---|---|
COI | Eukaryotes | Eukaryote COI Classifier |
rbcL | Diatoms | Diat.barcode rbcL Classifier |
rbcL | Land plants | Land plant rbcL Classifier |
rbcL | Eukaryotes | Eukaryote rbcL Classifier |
12S | Fish | MitoFish 12S Classifier |
12S | Vertebrates | Vertebrate 12S Classifier |
SSU (18S) | Diatoms | Diat.barcode SSU Classifier |
SSU (16S) | Vertebrates | Vertebrate mitochondrial 16S Classifier |
SSU (18S) | Eukaryotes | SILVA 18S Classifier |
SSU (16S) | Prokaryotes | Built-in to the RDP classifier |
ITS | Fungi (Warcup) | Built-in to the RDP classifier |
ITS | Fungi (UNITE 2014) | Built-in to the RDP classifier |
ITS | Fungi (UNITE 2021) | Fungal UNITE ITS Classifier |
ITS | Plants (PLANiTS) | PLANiTS ITS Classifier |
LSU | Fungi | Built-in to the RDP classifier |
Developed to support projects that cut across taxon lines!
Our pipelines have been around, in one form or another, since before the terms metabarcoding and eDNA were coined. As we know, ‘best practice’ is a moving target in this field. MetaWorks is based on ‘best practices’ from the fields of microbial and fungal molecular ecology and strives to accommodate the needs of the animal metabarcode community. We are driven by the need to make metabarcode bioinformatic processing both scalable and tractable within reasonable timeframes. This pipeline is in active development to keep up with improvements in the underlying programs and reference sequence databases.
MetaWorks has been used as a part of the STREAM and EcoBiomics projects to process multi-marker metabarcode datasets from freshwater benthos, water, and soil.
Papers and Projects That Use MetaWorks
Edge, C., Baker, L., Smenderovac, E., Heartz, S., & Emilson, E. 2022. Tebufenozide has limited direct effects on simulated aquatic communities. Ecotoxicology, 31(8), 1231-1240.
Edge TA, Baird DJ, Bilodeau G, Gagné N, Greer C, Konkin D, et al. 2020. The Ecobiomics project: Advancing metagenomics assessment of soil health and freshwater quality in Canada. Science of The Total Environment, 710: 135906. doi:10.1016/j.scitotenv.2019.135906
Moir, C. 2021. No Stomach, No Problem: an Integrated Morpho-Molecular Approach to Assessing the Diets of the Cunner Wrasse, Tautogolabrus adspersus, among Coastal, Nearshore Regions of Atlantic Canada (Doctoral dissertation, University of Guelph).
Porter, TM, & Hajibabaei, M. 2021. Profile hidden Markov model sequence analysis can help remove putative pseudogenes from DNA barcoding and metabarcoding datasets. BMC Bioinformatics, 22(1): 256. doi:10.1186/s12859-021-04180-x
Porter, T.M., Smenderovac, E., Morris, D., Venier, L. 2023. All boreal forest successional stages needed to maintain the full suite of soil biodiversity, community composition, and function following wildfire. Scientific Reports, 13: 7978. doi:10.1038/s41598-023-30732-7
Robinson, CV, Baird, DJ, Wright, MTG, Porter, TM, Hartwig, K, Hendriks, E, Maclean, L, Mallinson, R, Monk, WA, Paquette, C and Hajibabaei, M. 2021. Combining DNA and people power for healthy rivers: Implementing the STREAM community-based approach for global freshwater monitoring. Perspectives in Ecology and Conservation, 19(3): 279-285. doi:10.1016/j.pecon.2021.03.001
Robinson, CV, Porter, TM, Maitland, VC, Wright, MT and Hajibabaei, M. 2022. Multi-marker metabarcoding resolves subtle variations in freshwater condition: Bioindicators, ecological traits, and trophic interactions. Ecological Indicators, 145, 109603.
Rudar, J, Golding, GB, Kremer, SC and Hajibabaei, M. 2022. Decision Tree Ensembles Utilizing Multivariate Splits Are Effective at Investigating Beta-Diversity in Medically Relevant 16S Amplicon Sequencing Data. bioRxiv, doi:10.1101/2022.03.31.486647
Smenderovac E, Emilson C, Porter T, Morris D, Hazlett P, Diochon A, et al. 2022. Forest soil biotic communities show few responses to wood ash applications at multiple sites across Canada. Sci Rep., 12: 4171. doi:10.1038/s41598-022-07670-x
Smenderovac, E., Hoage, J., Porter, T. M., Emilson, C., Fleming, R., Basiliko, N., ... & Venier, L. (2023). Boreal forest soil biotic communities are affected by harvesting, site preparation with no additional effects of higher biomass removal 5 years post-harvest. Forest Ecology and Management, 528, 120636.
How to Cite
The MetaWorks pipeline and approach:
Porter, T. M., & Hajibabaei, M. (2022). MetaWorks: A flexible, scalable bioinformatic pipeline for high-throughput multi-marker biodiversity assessments. PLOS ONE, 17(9), e0274260. doi:10.1371/journal.pone.0274260
The MetaWorks code:
Teresita M. Porter. (2020, June 25). MetaWorks: A Multi-Marker Metabarcode Pipeline (Version v1.10.0). Zenodo, doi:10.5281/zenodo.4741407
The COI classifier:
Porter, T. M., & Hajibabaei, M. (2018). Automated high throughput animal CO1 metabarcode classification. Scientific Reports, 8, 4226. doi:10.1038/s41598-018-22505-4
MetaWorks pseudogene filtering:
Porter, T.M., & Hajibabaei, M. (2021). Profile hidden Markov model sequence analysis can help remove putative pseudogenes from DNA barcoding and metabarcoding datasets. BMC Bioinformatics, 22: 256. doi:10.1186/s12859-021-04180-x
The RDP classifier:
Wang, Q., Garrity, G. M., Tiedje, J. M., & Cole, J. R. (2007). Naive Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy. Applied and Environmental Microbiology, 73(16), 5261–5267. doi:10.1128/AEM.00062-07