Why MetaWorks?

Free and open-source software

MetaWorks runs at the command line on linux-64. The pipeline strings together popular free, open-source tools, such as SeqPrep (St. John, 2016), CutAdapt (Martin, 2011), VSEARCH (Rognes, Flouri, Nichols, Quince, & Mahé, 2016), and the RDP Classifier (Wang, Garrity, Tiedje, & Cole, 2007), to process demultiplexed Illumina paired-end reads.

Versioned workflows to improve reproducibility

MetaWorks is versioned and available from GitHub.

Harmonized Conda processing environment

MetaWorks comes with a conda environment file. The environment created from this file should be activated before running the pipeline. Conda is an open-source environment and package manager (Anaconda, 2016). The environment file contains most of the programs and dependencies needed to run MetaWorks. If pseudogene filtering will be used, then the NCBI ORFfinder program will also need to be installed. Additional RDP-trained reference sets may need to be downloaded if the reference set needed is not already built into the RDP Classifier.

Uses Snakemake for scalable processing

Snakemake is a Python-based workflow manager (Köster and Rahmann, 2012) that strings together workflow steps and distributes these jobs across a high-performance computing platform to manage computational resources efficiently. Interrupted jobs can be restarted from the last successful step.
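The restart behaviour described above can be illustrated with a small sketch. This is not Snakemake code and not part of MetaWorks; it is a simplified stand-in that assumes each step writes one output file and treats a step as finished when that file exists. The step names and file names are hypothetical.

```python
import tempfile
from pathlib import Path


def run_workflow(steps, workdir):
    """Run (output_name, action) steps in order, skipping finished ones.

    A step counts as finished when its output file already exists. This is
    a simplification of how a workflow manager decides what to re-run.
    """
    executed = []
    for output_name, action in steps:
        target = Path(workdir) / output_name
        if target.exists():
            continue  # output left by an earlier run: nothing to redo
        # Write to a temporary name first, then rename, so a half-written
        # file is never mistaken for a completed step.
        tmp = target.parent / (target.name + ".tmp")
        tmp.write_text(action())
        tmp.replace(target)
        executed.append(output_name)
    return executed


# Hypothetical steps, for illustration only.
steps = [
    ("paired.txt", lambda: "paired reads"),
    ("trimmed.txt", lambda: "trimmed reads"),
    ("denoised.txt", lambda: "denoised reads"),
]

with tempfile.TemporaryDirectory() as workdir:
    # Simulate an interrupted run that completed only the first step.
    first_run = run_workflow(steps[:1], workdir)
    # The second run picks up after the last successful step.
    second_run = run_workflow(steps, workdir)
    print(first_run)
    print(second_run)
```

In this sketch the second call only executes the two steps whose outputs are missing, which is the idea behind resuming an interrupted run.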

Generates exact sequence variants and/or operational taxonomic units

MetaWorks offers workflows for generating exact sequence variants (ESVs) and/or operational taxonomic units (OTUs). OTUs are clustered with VSEARCH at a 97% identity cutoff. A workflow is also available for single-read processing when the amplicon reads do not overlap.
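To make the 97% identity cutoff concrete, here is a toy sketch of greedy centroid clustering. It is not VSEARCH's algorithm: it assumes equal-length sequences compared position by position, whereas VSEARCH computes identity from pairwise alignments and orders sequences before clustering. The sequences and names are made up for illustration.

```python
def percent_identity(a, b):
    """Percent of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this toy example only compares equal-length sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def greedy_cluster(seqs, cutoff=97.0):
    """Assign each sequence to the first centroid it matches at >= cutoff.

    A sequence that matches no existing centroid becomes a new centroid.
    """
    centroids = []
    clusters = {}
    for name, seq in seqs:
        for cname, cseq in centroids:
            if percent_identity(seq, cseq) >= cutoff:
                clusters[cname].append(name)
                break
        else:
            centroids.append((name, seq))
            clusters[name] = [name]
    return clusters


# Made-up 100-base sequences.
esv1 = "ACGT" * 25
esv2 = "T" + esv1[1:]      # 1 mismatch against esv1
esv3 = "TGCA" + esv1[4:]   # 4 mismatches against esv1

clusters = greedy_cluster([("esv1", esv1), ("esv2", esv2), ("esv3", esv3)])
print(clusters)
```

Here esv2 joins the cluster seeded by esv1 because it is at least 97% identical to it, while esv3 falls below the cutoff and seeds its own cluster.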

Supports popular metabarcode markers

MetaWorks was specifically developed to handle different types of metabarcodes, from ribosomal RNA genes and spacers to protein-coding genes. Unique marker considerations, such as the removal of conserved rRNA genes from ITS sequences and of putative pseudogenes from COI, are supported. A detailed description of our pseudogene-filtering approaches for protein-coding metabarcode markers has been published (Porter & Hajibabaei, 2021).

MetaWorks uses a naive Bayesian classifier to make taxonomic assignments with a measure of confidence (Wang et al., 2007). Built-in markers include the popular prokaryote 16S rRNA gene, fungal ITS, and fungal LSU rRNA markers. Custom-trained classifiers developed for MetaWorks include COI, 18S rRNA, rbcL, and 12S.
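The idea of a naive Bayesian assignment with a confidence measure can be sketched in a few lines. This is a toy illustration, not the RDP Classifier itself: the word length, the two taxa, and the sequences are made up, and the real program works with 8-base words on large trained reference sets. Confidence here is the fraction of bootstrap trials, each scoring a random subset of the query's words, that agree with the full assignment.

```python
import math
import random
from collections import Counter


def kmers(seq, k):
    """All overlapping words of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train(references, k):
    """Count, per taxon and overall, how many references contain each word."""
    model = {"k": k, "n": len(references), "overall": Counter(),
             "by_taxon": {}, "size": Counter()}
    for taxon, seq in references:
        words = set(kmers(seq, k))
        model["overall"].update(words)
        model["by_taxon"].setdefault(taxon, Counter()).update(words)
        model["size"][taxon] += 1
    return model


def log_score(words, taxon, model):
    """Sum of log word probabilities for one taxon (words treated as independent)."""
    total = 0.0
    for w in words:
        # The word prior keeps unseen words from producing log(0).
        prior = (model["overall"][w] + 0.5) / (model["n"] + 1)
        seen = model["by_taxon"][taxon][w]
        total += math.log((seen + prior) / (model["size"][taxon] + 1))
    return total


def classify(seq, model, trials=100, seed=0):
    """Return (best taxon, bootstrap confidence between 0 and 1)."""
    rng = random.Random(seed)
    words = kmers(seq, model["k"])
    taxa = sorted(model["size"])
    best = max(taxa, key=lambda t: log_score(words, t, model))
    subset = max(1, len(words) // 8)
    agree = 0
    for _ in range(trials):
        sample = [rng.choice(words) for _ in range(subset)]
        if max(taxa, key=lambda t: log_score(sample, t, model)) == best:
            agree += 1
    return best, agree / trials


# Made-up reference sequences for two hypothetical taxa.
references = [
    ("taxon_A", "ACGTACGTACGTACGTACGTACGTACGTACGT"),
    ("taxon_B", "TTGGCCAATTGGCCAATTGGCCAATTGGCCAA"),
]
model = train(references, k=4)
taxon, confidence = classify("ACGTACGTACGTACGTACGT", model)
print(taxon, confidence)
```

A query whose words are shared by several taxa would win fewer bootstrap trials and so receive a lower confidence, which is what makes the measure useful for filtering assignments.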

Trained classifiers that work with MetaWorks and the RDP Classifier

Marker      Target taxa          Classifier availability
COI         Eukaryotes           Eukaryote COI Classifier
rbcL        Diatoms              Diat.barcode rbcL Classifier
rbcL        Land plants          Land plant rbcL Classifier
rbcL        Eukaryotes           Eukaryote rbcL Classifier
12S         Fish                 MitoFish 12S Classifier
12S         Vertebrates          Vertebrate 12S Classifier
SSU (18S)   Diatoms              Diat.barcode SSU Classifier
SSU (16S)   Vertebrates          Vertebrate mitochondrial 16S Classifier
SSU (18S)   Eukaryotes           SILVA 18S Classifier
SSU (16S)   Prokaryotes          Built-in to the RDP classifier
ITS         Fungi (Warcup)       Built-in to the RDP classifier
ITS         Fungi (UNITE 2014)   Built-in to the RDP classifier
ITS         Fungi (UNITE 2021)   Fungal UNITE ITS Classifier
ITS         Plants (PLANiTS)     PLANiTS ITS Classifier
LSU         Fungi                Built-in to the RDP classifier

Developed to support projects that cut across taxon lines!

Our pipelines have been around, in one form or another, since before the terms metabarcoding and eDNA were coined. As we know, ‘best practice’ is a moving target in this field. MetaWorks is based on ‘best practices’ from the fields of microbial and fungal molecular ecology and strives to accommodate the needs of the animal metabarcode community. We are driven by the need to make metabarcode bioinformatic processing both scalable and tractable within reasonable timeframes. This pipeline is in active development to keep up with improvements in the underlying programs and reference sequence databases.

MetaWorks has been used as a part of the STREAM and EcoBiomics projects to process multi-marker metabarcode datasets from freshwater benthos, water, and soil.

Papers and Projects That Use MetaWorks

Edge, C., Baker, L., Smenderovac, E., Heartz, S., & Emilson, E. 2022. Tebufenozide has limited direct effects on simulated aquatic communities. Ecotoxicology, 31(8), 1231-1240.

Edge TA, Baird DJ, Bilodeau G, Gagné N, Greer C, Konkin D, et al. 2020. The Ecobiomics project: Advancing metagenomics assessment of soil health and freshwater quality in Canada. Science of The Total Environment, 710: 135906. doi:10.1016/j.scitotenv.2019.135906

Moir, C. 2021. No Stomach, No Problem: an Integrated Morpho-Molecular Approach to Assessing the Diets of the Cunner Wrasse, Tautogolabrus adspersus, among Coastal, Nearshore Regions of Atlantic Canada (Doctoral dissertation, University of Guelph).

Porter, TM, & Hajibabaei, M. 2021. Profile hidden Markov model sequence analysis can help remove putative pseudogenes from DNA barcoding and metabarcoding datasets. BMC Bioinformatics, 22(1): 256. doi:10.1186/s12859-021-04180-x

Porter, T.M., Smenderovac, E., Morris, D., Venier, L. 2022. All boreal forest successional stages needed to maintain the full suite of soil biodiversity following natural wildfire in jack pine-dominated forest ecosites. bioRxiv, https://doi.org/10.1101/2022.11.18.517085

Robinson, CV, Baird, DJ, Wright, MTG, Porter, TM, Hartwig, K, Hendriks, E, Maclean, L, Mallinson, R, Monk, WA, Paquette, C and Hajibabaei, M. 2021. Combining DNA and people power for healthy rivers: Implementing the STREAM community-based approach for global freshwater monitoring. Perspectives in Ecology and Conservation, 19(3): 279-285. doi:10.1016/j.pecon.2021.03.001

Robinson, CV, Porter, TM, Maitland, VC, Wright, MT and Hajibabaei, M. 2022. Multi-marker metabarcoding resolves subtle variations in freshwater condition: Bioindicators, ecological traits, and trophic interactions. Ecological Indicators, 145, 109603.

Rudar, J, Golding, GB, Kremer, SC and Hajibabaei, M. 2022. Decision Tree Ensembles Utilizing Multivariate Splits Are Effective at Investigating Beta-Diversity in Medically Relevant 16S Amplicon Sequencing Data. bioRxiv, doi:10.1101/2022.03.31.486647

Smenderovac E, Emilson C, Porter T, Morris D, Hazlett P, Diochon A, et al. 2022. Forest soil biotic communities show few responses to wood ash applications at multiple sites across Canada. Sci Rep., 12: 4171. doi:10.1038/s41598-022-07670-x

Smenderovac, E., Hoage, J., Porter, T. M., Emilson, C., Fleming, R., Basiliko, N., ... & Venier, L. (2023). Boreal forest soil biotic communities are affected by harvesting, site preparation with no additional effects of higher biomass removal 5 years post-harvest. Forest Ecology and Management, 528, 120636.

How to Cite

The MetaWorks pipeline and approach:

Porter, T. M., & Hajibabaei, M. (2022). MetaWorks: A flexible, scalable bioinformatic pipeline for high-throughput multi-marker biodiversity assessments. PLOS ONE, 17(9), e0274260. doi: 10.1371/journal.pone.0274260

The MetaWorks code:

Teresita M. Porter. (2020, June 25). MetaWorks: A Multi-Marker Metabarcode Pipeline (Version v1.10.0). Zenodo, doi:10.5281/zenodo.4741407

The COI classifier:

Porter, T. M., & Hajibabaei, M. (2018). Automated high throughput animal CO1 metabarcode classification. Scientific Reports, 8, 4226. doi:10.1038/s41598-018-22505-4

MetaWorks pseudogene filtering:

Porter, T.M., & Hajibabaei, M. (2021). Profile hidden Markov model sequence analysis can help remove putative pseudogenes from DNA barcoding and metabarcoding datasets. BMC Bioinformatics, 22: 256. doi:10.1186/s12859-021-04180-x

The RDP classifier:

Wang, Q., Garrity, G. M., Tiedje, J. M., & Cole, J. R. (2007). Naive Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy. Applied and Environmental Microbiology. 73(16), 5261–5267. doi:10.1128/AEM.00062-07

This site is maintained by Teresita M. Porter (terrimporter AT gmail DOT com) and Artin Mashayekhi
Documentation License: CC-BY 4.0