Spotlight on CanCOGeN data sharing progress


Genomics is a data science, generating enormous volumes of complex, rapidly growing data, which have become increasingly valuable as our computing power and analytical tools evolve.

Genomic data has played a pivotal role in Canada’s, and the global, response to COVID-19: from enabling surveillance of SARS-CoV-2 spread and new variants of concern, to measuring the efficacy of vaccines and COVID-19 treatments, to fuelling research into the genetic factors influencing COVID-19 severity. Because data sharing is important to public health decision-making and to the prevention and treatment of COVID-19, it has been a core priority for CanCOGeN.

We asked some members of CanCOGeN’s Data Sharing Committee to weigh in on progress made since the network launched in April 2020. The Data Sharing Committee was launched in January 2021 to address the limited number of publicly available Canadian genome sequences.

CanCOGeN data sharing committee

L-R: Ma’n H. Zawati, Yann Joly, Art Poon, Natalie Knox, Gijs van Rooijen, Will Hsiao

How has the data sharing landscape for VirusSeq changed over the past two years, and what’s next? 

“The speed and level of completeness of viral sequence data deposit has substantially improved. The big challenge ahead is to provide centralized, efficient access to more sensitive host metadata, and to implement a secure, ethical process at CPHLN and other Canadian institutions that will allow for data linkages.”

– Dr. Yann Joly, Research Director of the Centre of Genomics and Policy at McGill University, and Chair of the CanCOGeN Data Sharing Committee as well as VirusSeq’s Ethics and Governance Working Group

EXPLORE on the CanCOGeN blog – Tackling COVID-19 through genomics data sharing: Q&A with Dr. Yann Joly

How has Canada’s performance on data sharing changed over the past two years, and is there anything to learn from other countries? 

“A year ago, Canada was one of the leading contributors of SARS-CoV-2 genomes in the world by volume, but a third of these genomes were released with incomplete dates of sampling, and their release was delayed by nearly five months on average. Today, Canada continues to lead the world in the number of genomes published (over a quarter million), and we are catching up in the other data sharing metrics. Incomplete dates now affect only 1 in 7 genomes, and the average delay is down to two months, putting Canada in the middle of the pack. In the past year, we have moved from the 87th percentile of all countries to the 46th.”

– Dr. Art Poon, Associate Professor in Virus Evolution and Bioinformatics at Western University, Canada, and member of the CanCOGeN Data Sharing Committee

How has data sharing between provinces changed over the past two years, and what impact will these changes have in the future? What potential impacts are there beyond the COVID-19 pandemic?

“Canada recognizes the unique role of the provinces to manage and deliver healthcare solutions to its citizens. Unfortunately, pandemics do not recognize borders, which necessitates sharing pandemic-related health information, such as the sequence of SARS-CoV-2, including its variants, quickly across jurisdictional borders while respecting the privacy of information that could stigmatize groups or individuals. The data sharing policies of CanCOGeN have allowed for an appropriate balance between these two diverging considerations, which will serve as a blueprint for any future pandemic-related health information data sharing.”

– Dr. Gijs van Rooijen, Chief Scientific Officer at Genome Alberta, and member of the CanCOGeN Data Sharing Committee

We have seen significant improvements in the quality of data generated and shared by CanCOGeN over the last two years. In what ways has data quality improved and how has that impacted data sharing?

“We have come a long way in generating high quality data for release, especially metadata. In the early days of CanCOGeN, the desire to ensure data accuracy delayed data release. However, data curation has made significant progress in streamlining data flow in a complex federated healthcare system while enhancing data quality. Instead of being a hindrance for rapid data release, data quality has become a strength of Canadian COVID-19 genomic datasets.”

– Dr. Will Hsiao, Associate Professor, Health Sciences, Simon Fraser University; Chair of the VirusSeq Metadata Working Group; Member, CanCOGeN Data Sharing Committee

How has the National Microbiology Laboratory (NML) supported efforts to share data within Canada, and how will the NML continue to push the data sharing agenda forward as VirusSeq transitions to the NML?

“NML’s strong partnership with the CanCOGeN partners and Canadian public health labs has enabled the establishment of robust COVID-19 data sharing standards and workflows in Canada. These efforts are paving the way for sharing other infectious disease genomic data in a rapid and collaborative manner.”

– Dr. Natalie Knox, NML, Head of the Canadian Public Health Laboratory Network (CPHLN)-CanCOGeN COVID-19 Data Analytics Working Group

How has the Canadian VirusSeq Data Portal impacted data sharing since its launch? How could the portal be a game changer for future pandemic response and other health challenges?

“The Canadian VirusSeq Data Portal has provided a more open repository than GISAID to researchers seeking access to viral genome sequences and accompanying minimal metadata for public health research. The portal expert team also assisted CPHLN [Canadian Public Health Laboratory Network] in their data deposit and helped them overcome key challenges in the process.”

– Dr. Yann Joly, Research Director of the Centre of Genomics and Policy at McGill University, and Chair of VirusSeq’s Ethics and Governance Working Group

What advances have been made with the HostSeq Databank in terms of data sharing? 

“CGEn’s HostSeq Databank has put in place a data access office that receives and reviews requests for access to its national database containing genomic, as well as personal and health data. It has also convened an independent pan-Canadian data access committee to make final decisions in an efficient manner. This one-stop-shop model has streamlined the access process and ensured timely approval of requests.”

– Dr. Ma’n H. Zawati, Assistant Professor, McGill University’s Faculty of Medicine and Health Sciences; Executive Director, Centre of Genomics and Policy in the Department of Human Genetics; Lead, HostSeq Data Access Compliance Office

EXPLORE on the CanCOGeN blog – HostSeq: Enabling data sharing to tackle COVID-19 and future health challenges

The Canadian COVID-19 Genomics Network (CanCOGeN) is on a mission to respond to COVID-19 by generating accessible and usable data from viral and host genomes to inform public health and policy decisions, and guide treatment and vaccine development. This pan-Canadian consortium is led by Genome Canada, in partnership with six regional Genome Centres, the National Microbiology Laboratory and provincial public health labs, genome sequencing centres (through CGEn), hospitals, academia and industry across the country.
