Early Stage

14. ViroMatch: A Computational Pipeline for Detection of Viral Reads from Complex Metagenomic Data

Introduction: Next-generation sequencing (NGS) allows comprehensive characterization of the virome and discovery of novel viruses, as it enables culture-independent, agnostic assessment of viral nucleic acid within a sample. We have produced software in the form of an automated pipeline, ViroMatch, that takes raw NGS reads as input and performs read quantification and associated virus taxonomy classification. We are extending this pipeline to include variant analysis.

Methods: ViroMatch is written in Python and is implemented as a directed acyclic graph (DAG) workflow using Snakemake, which supports single or parallel processing modes. ViroMatch incorporates both nucleotide and translated amino acid sequence alignment against a comprehensive database of viral reference genomes, which provides the sensitivity to detect both highly conserved and divergent viral sequences. Additionally, we have pre-compiled viral sequences from NCBI (RefSeq, nt, and nr), along with their associated annotation, for local use. ViroMatch is available through an executable Docker image.
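As a rough illustration of what a DAG workflow means in practice, the following minimal Python sketch orders a set of pipeline steps by their dependencies and groups together the steps that could run in parallel. The step names are hypothetical and chosen only for illustration; they are not ViroMatch's actual Snakemake rules, and the sketch uses the Python standard library (graphlib, Python 3.9+) rather than Snakemake itself.

```python
from graphlib import TopologicalSorter

# Hypothetical step names for illustration only; not ViroMatch's actual rules.
# Each key maps to the set of steps that must finish before it can start.
DEPENDENCIES = {
    "trim_reads": set(),
    "nucleotide_alignment": {"trim_reads"},
    "translated_alignment": {"trim_reads"},
    "classify_taxonomy": {"nucleotide_alignment", "translated_alignment"},
    "count_reads": {"classify_taxonomy"},
}


def execution_batches(dependencies):
    """Group steps into batches; steps in one batch do not depend on each other."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every step whose inputs are complete
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished so dependents become ready
    return batches


batches = execution_batches(DEPENDENCIES)
for number, batch in enumerate(batches, start=1):
    print(number, batch)
```

In this sketch the two alignment steps land in the same batch because neither depends on the other, which is the property a workflow engine can exploit when running in a parallel mode; in a single-process mode the same batches would simply be executed one step at a time.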

Results: The ViroMatch pipeline has been run more than 10,000 times across multiple sample types, with a focus on evaluating vertebrate/human viruses. Studies include the maternal and paternal virome and in vitro fertilization outcomes, the maternal virome and preterm birth, the virome in patients with post-transplant lymphoproliferative disease, and the respiratory tract virome in children at risk for asthma. Thus far, variant-calling tests have focused on papillomavirus genomes from the maternal virome and preterm birth study.

Impact: ViroMatch is a staple in our work involving genomics-based detection and evaluation of viruses, and we anticipate this resource will be of great interest to others in the scientific and medical communities. Our databases and companion tools are extensible, and we anticipate periodic updates based on new virus data. ViroMatch is currently an active, key component of four NIH R01-funded projects.

Organization – Washington University in St. Louis

Wylie TN, Wylie KM