Q&A with Dr. Jared Simpson
As scientists across Canada race to track the spread and evolution of COVID-19 through virus sequencing, software developed by Dr. Jared Simpson’s research team is enabling quality control of the samples being tested.
The Simpson Lab, housed in the Ontario Institute for Cancer Research and the University of Toronto, develops new ways to analyze genomic sequencing data. The lab’s open-source ncov-tools software has been instrumental in Canada’s virus sequencing efforts – detecting quality issues in viral samples, as well as ensuring control samples are functioning properly.
We asked Dr. Simpson about the importance of the tools his lab is developing for national sequencing efforts and his work as a member of the Canadian COVID-19 Genomics Network (CanCOGeN). Dr. Simpson sits on CanCOGeN’s VirusSeq Implementation Committee and leads VirusSeq’s Quality Assurance/Quality Control (QA/QC) initiatives.
“There are over a dozen sites across the country sequencing genomes, and each month, new sites come online. We standardized the methods used to sequence and analyze the genomes, which allows us to consistently produce the high-quality data we need to see how the virus varies across Canada.” — Dr. Jared Simpson
Why is your software playing such a central role in Canada’s SARS-CoV-2 sequencing efforts?
It has to do with standardization and quality control. CanCOGeN is a national project, and we wanted to be able to compare sequencing results from coast to coast. This requires having consistently high-quality data: if a genome is sequenced in British Columbia, is it equivalent in quality to one sequenced in Quebec or Nova Scotia?
Very early on, we decided that we would standardize the sequencing and analysis methods used to generate a genome, and we all adopted the same software and the same analysis pipelines to do that. The Quality Control Workgroup at VirusSeq brought everyone together to decide on exactly what the quality standards for the project would be.
What do you mean by quality?
The starting point of genome sequencing is a clinical sample. A sample may contain a lot of virus or very little virus. When there is little virus in the sample, it is much harder to sequence. So, we needed ways to identify these challenging samples that may not be useful for further analysis, like lineage tracking. One of the main criteria we use is how much of the viral genome we were able to reconstruct after sequencing, and that is typically expressed as a percentage. If the clinical sample doesn’t contain very much virus, we might only reconstruct 50 per cent of the genome. Our criterion for a high-quality genome is that it is at least 90 per cent complete. We also look at things like the number of mutations, the type of mutations and certain patterns that may indicate that samples were mixed together.
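To make the completeness criterion concrete, here is a minimal Python sketch. It assumes completeness is measured as the share of positions in a consensus sequence that carry an unambiguous base (A, C, G or T), with uncalled positions written as N. That definition, the function names and the toy sequences are illustrative assumptions only; this is not the ncov-tools implementation.

```python
def genome_completeness(consensus: str) -> float:
    """Percentage of positions with an unambiguous base (A, C, G or T).

    Assumption: any other character (for example N) marks a position
    that could not be reconstructed from the sequencing data.
    """
    if not consensus:
        return 0.0
    called = sum(1 for base in consensus.upper() if base in "ACGT")
    return 100.0 * called / len(consensus)


def is_high_quality(consensus: str, threshold: float = 90.0) -> bool:
    """Apply the 90 per cent completeness threshold described in the interview."""
    return genome_completeness(consensus) >= threshold


# Toy sequences, 100 positions each (hypothetical, not real viral genomes).
toy_good = "ACGT" * 24 + "N" * 4    # 96 of 100 positions called
toy_poor = "ACGT" * 12 + "N" * 52   # 48 of 100 positions called

for name, seq in [("toy_good", toy_good), ("toy_poor", toy_poor)]:
    print(name, genome_completeness(seq), is_high_quality(seq))
```

A real pipeline would read consensus sequences from files and would combine this check with the other signals mentioned above, such as mutation counts and signs of mixed samples.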
In a country such as Canada with different provincial health systems, why do you think your software has been so universally adopted?
We have a collaborative team at CanCOGeN. Everybody recognized the need for data standardization. If you can standardize the analysis pipelines and standardize all the quality control early in the project, it saves you a lot of difficulty later on when you go to integrate the data to build a national picture.
Collaboration and diverse expertise have been key to developing these tools. For example, my colleague Richard de Borja, a computational biologist, works closely with me on this project. Richard is the lead developer and maintainer of the ncov-tools software package for doing all these quality control checks. He also runs ncov-tools on all genomes that are sequenced at our institute, OICR. ncov-tools is available on GitHub, so anybody can take their sequencing results and run them through this software pipeline to get a quality report for their genomes.
You lead Quality Assurance/Quality Control (QA/QC) initiatives at VirusSeq. What else has this group been working on, and why is it important for Canada’s pandemic response?
Establishing the standard that defines a high-quality genome was a critical step. But another key function of the QA/QC Working Group is identifying areas where the analysis pipelines that turn raw sequencing data into a finished genome can be improved. The working group acts as a discussion forum for identifying any issues with sequencing that some groups might be uncertain about. If they notice a problem, they can take it to the working group for discussion with other experts from around the country. This has led to improvements to the analysis pipelines, new types of quality control criteria, and improvements to the sequencing protocols.
How have your international collaborations influenced your work related to COVID-19?
I’ve worked with Nick Loman and Josh Quick at the University of Birmingham in the United Kingdom for many years. In 2015, while working on Ebola in West Africa, they built a portable genome sequencing system to perform viral surveillance directly in the field. I helped them by writing analysis software called nanopolish to interpret the genome sequencing data. Their project led to the ARTIC Network, which developed one of the main protocols used to sequence coronavirus.
This long-term collaboration with Nick and Josh, and now also John Tyson in British Columbia and Matt Loose at the University of Nottingham, led to my involvement in sequencing in Ontario and subsequently with CanCOGeN.
The Canadian COVID-19 Genomics Network (CanCOGeN) is on a mission to respond to COVID-19 by generating accessible and usable data from viral and host genomes to inform public health and policy decisions, and guide treatment and vaccine development. This pan-Canadian consortium is led by Genome Canada, in partnership with six regional Genome Centres, the National Microbiology Lab and provincial public health labs, genome sequencing centres (through CGEn), hospitals, academia and industry across the country.
Photo credits:
- Photo by J.P. Moczulski – CP Images
- Photo courtesy of S Lawler – OICR