A recent study posted to the medRxiv* pre-print server estimated severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) epidemiological parameters by using various sampling strategies on the sequencing data of viral genomes.
Emerging SARS-CoV-2 variants of concern (VOCs) are extensively studied by analyzing viral genome sequences. This sequencing data is crucial to understanding the genetic variation in different VOCs, the difference in transmissibility of the VOCs, and their impact on disease severity and related hospitalizations. Epidemiological information obtained from genomic data can be utilized to facilitate better handling of the coronavirus disease 2019 (COVID-19) pandemic.
About the study
The present study explored the effects of different sampling strategies on important epidemiological parameters obtained from genomic sequencing data of SARS-CoV-2.
The researchers used data from the Amazonas region and Hong Kong to calculate empirical epidemiological parameters. The basic reproduction number (R0) was estimated for each of the two data sources using confirmed SARS-CoV-2 cases over time: the weekly numbers of confirmed cases were fitted to a susceptible-exposed-infectious-recovered (SEIR) model using maximum likelihood methods.
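To make the R0 fitting step more concrete, the following is a minimal sketch of fitting weekly case counts to an SEIR model by maximum likelihood. It is not the study's implementation: the Poisson likelihood, the grid search, the population size, and the incubation and infectious periods are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def seir_weekly_cases(r0, weeks, population=1_000_000, initial_infectious=10,
                      incubation_days=3.0, infectious_days=5.0, dt=0.1):
    """Euler-integrated SEIR model; returns expected new cases per week.

    All default parameter values are illustrative assumptions.
    """
    sigma = 1.0 / incubation_days   # rate of moving from exposed to infectious
    gamma = 1.0 / infectious_days   # rate of moving from infectious to recovered
    beta = r0 * gamma               # in this model R0 = beta / gamma
    s = float(population - initial_infectious)
    e, i = 0.0, float(initial_infectious)
    steps_per_week = int(round(7 / dt))
    weekly = []
    for _ in range(weeks):
        new_cases = 0.0
        for _ in range(steps_per_week):
            infections = beta * s * i / population * dt
            onsets = sigma * e * dt
            recoveries = gamma * i * dt
            s -= infections
            e += infections - onsets
            i += onsets - recoveries
            new_cases += onsets     # a case is counted when it becomes infectious
        weekly.append(new_cases)
    return weekly

def poisson_log_likelihood(observed, expected):
    """Log-likelihood of observed counts under Poisson means `expected`."""
    return sum(k * math.log(m) - m - math.lgamma(k + 1)
               for k, m in zip(observed, expected))

def fit_r0(observed, candidates):
    """Return the candidate R0 with the highest Poisson log-likelihood."""
    return max(
        candidates,
        key=lambda r0: poisson_log_likelihood(
            observed, seir_weekly_cases(r0, len(observed))
        ),
    )

# Synthetic example: generate counts from a known R0, then recover it.
observed = [round(x) for x in seir_weekly_cases(2.5, weeks=8)]
print(fit_r0(observed, [1.5, 2.0, 2.5, 3.0, 3.5]))
```

A grid search keeps the sketch dependency-free; a numerical optimizer would be the more usual choice in practice.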
The time-varying or effective reproduction number (Rt) was estimated using the EpiFilter model, which in turn employed a renewal transmission model describing how the number of new confirmed cases at time t depends on the Rt at time t.
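The renewal relationship can be illustrated with a short sketch. In its standard form, the expected number of new cases at time t is Rt multiplied by the "total infectiousness", a weighted sum of past incidence with weights given by the generation-time distribution. The code below shows only that expectation, not EpiFilter's filtering and smoothing; the generation-time weights and case counts are made-up values.

```python
def total_infectiousness(incidence, gen_weights):
    """Weighted sum of past incidence for each time point.

    gen_weights[s - 1] is the assumed probability that a transmission
    occurs s time steps after the infector became a case.
    """
    totals = []
    for t in range(len(incidence)):
        total = 0.0
        for s, w in enumerate(gen_weights, start=1):
            if t - s >= 0:
                total += w * incidence[t - s]
        totals.append(total)
    return totals

def expected_cases(rt, incidence, gen_weights):
    """Expected new cases at each t: Rt times total infectiousness at t."""
    lam = total_infectiousness(incidence, gen_weights)
    return [r * l for r, l in zip(rt, lam)]

# Hypothetical inputs for illustration only.
incidence = [10, 12, 15, 20, 26]
gen_weights = [0.2, 0.5, 0.3]
rt = [1.5] * len(incidence)
print(expected_cases(rt, incidence, gen_weights))
```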
EpiFilter utilized information available on past and future incidence to reduce the dependence on assumptions and was used as a reference for parameters derived from genomic data. The relationship between growth rate (rt) and Rt was used to estimate rt.
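The article does not state which relationship between rt and Rt was used. One common choice, shown here purely as an assumption, applies when the generation time follows a gamma distribution with mean `mean_gen_time` and shape `shape`; then R = (1 + r × mean / shape)^shape, which can be inverted to obtain r. The parameter values in the example are hypothetical.

```python
def rt_from_growth_rate(r, mean_gen_time, shape):
    """R implied by growth rate r under a gamma generation time (assumption)."""
    return (1.0 + r * mean_gen_time / shape) ** shape

def growth_rate_from_rt(rt, mean_gen_time, shape):
    """Inverse of rt_from_growth_rate: growth rate implied by a given R."""
    return shape * (rt ** (1.0 / shape) - 1.0) / mean_gen_time

# Hypothetical generation time: mean 5 days, shape 2.
r = growth_rate_from_rt(1.5, mean_gen_time=5.0, shape=2.0)
print(r, rt_from_growth_rate(r, mean_gen_time=5.0, shape=2.0))
```

Under this relationship, R = 1 corresponds to r = 0, so the sign of the growth rate indicates whether the epidemic is growing or shrinking.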
Complete SARS-CoV-2 genomes were collected and screened, and the sequences obtained for the two groups were aligned. In the aligned sequences, the first 130 base pairs (bp) and the last 50 bp were removed to eliminate any sequencing artifacts. Quality control of both datasets was based on the completeness, diversity, and ambiguity of bases in each sequence, and the eligible sequences were selected for the study.
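As a rough illustration of the trimming step, the sketch below removes the first 130 and last 50 positions from each aligned sequence and computes the fraction of ambiguous bases. The helper names are hypothetical, and the article gives no quality-control thresholds, so none are applied here.

```python
def trim_alignment(sequences, head=130, tail=50):
    """Drop the first `head` and last `tail` columns of each aligned sequence."""
    trimmed = {}
    for name, seq in sequences.items():
        if len(seq) <= head + tail:
            raise ValueError(f"{name} is too short to trim")
        trimmed[name] = seq[head:len(seq) - tail]
    return trimmed

def ambiguous_fraction(seq):
    """Fraction of positions that are not an unambiguous A, C, G, or T."""
    if not seq:
        return 0.0
    unambiguous = sum(seq.upper().count(base) for base in "ACGT")
    return 1.0 - unambiguous / len(seq)

# Toy alignment for illustration only.
alignment = {"sample_1": "N" * 130 + "ACGTNACGTA" + "N" * 50}
trimmed = trim_alignment(alignment)
print(trimmed["sample_1"], ambiguous_fraction(trimmed["sample_1"]))
```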
The Amazonas and Hong Kong datasets were each subdivided by week of sample collection, following a temporal sampling scheme based on the number of confirmed SARS-CoV-2 cases in that week. The sampling schemes were uniform, proportional, and reciprocal-proportional, alongside an unsampled scheme in which no sampling strategy was applied.
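One plausible reading of these schemes is sketched below: a fixed budget of sequences is spread across weeks with equal weight (uniform), weight proportional to that week's confirmed cases (proportional), or weight proportional to the inverse of that week's cases (reciprocal-proportional). The exact definitions used in the study may differ, and the function names, budget, and counts are hypothetical.

```python
import random

def weekly_targets(cases, scheme, budget):
    """Number of sequences to draw per week under a given sampling scheme."""
    if scheme == "uniform":
        weights = [1.0 for _ in cases]
    elif scheme == "proportional":
        weights = [float(c) for c in cases]
    elif scheme == "reciprocal-proportional":
        weights = [1.0 / c if c > 0 else 0.0 for c in cases]
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    total = sum(weights)
    if total == 0:
        return [0 for _ in cases]
    return [round(budget * w / total) for w in weights]

def sample_sequences(seqs_by_week, targets, seed=0):
    """Randomly pick up to the target number of sequences from each week."""
    rng = random.Random(seed)
    picked = []
    for week_seqs, n in zip(seqs_by_week, targets):
        k = min(n, len(week_seqs))   # cannot draw more than were sequenced
        picked.extend(rng.sample(week_seqs, k))
    return picked

# Hypothetical weekly case counts and sequence identifiers.
cases = [10, 20, 70]
seqs_by_week = [["w1_a", "w1_b"], ["w2_a", "w2_b", "w2_c"], ["w3_a", "w3_b"]]
targets = weekly_targets(cases, "proportional", budget=10)
print(targets, sample_sequences(seqs_by_week, targets))
```

Because a week can contain fewer sequences than its target, the realized sample may be smaller than the budget.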
The study results showed that, throughout the study period, the sampling intensity was 11.6% of COVID-19 cases in Hong Kong and 2.4% in the Amazonas. The number of confirmed SARS-CoV-2 cases per week in Hong Kong and the Amazonas, respectively, was 117 and 196 in the unsampled scheme, 54 and 168 in the proportional sampling scheme, 79 and 150 in the uniform sampling scheme, and 84 and 67 in the reciprocal-proportional sampling scheme.
The correlation between sampling dates and genetic divergence ranged from 0.36 to 0.52 for the Hong Kong group and from 0.13 to 0.20 for the Amazonas region, indicating a stronger temporal signal in the Hong Kong region. The mean substitution rate in Hong Kong was between 9.16 × 10⁻⁴ and 2.09 × 10⁻³ substitutions per site per year (s/s/y), with overlapping Bayesian credible intervals (BCIs) across the sampling schemes. Also, the gradient of the slope (clock rate) was similar to the root-to-tip regression estimates and to early approximations of the mean substitution rate of SARS-CoV-2.
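A root-to-tip analysis regresses each sequence's genetic divergence from the root of the tree on its sampling date; the slope approximates the clock rate and the correlation coefficient indicates the strength of the temporal signal. The following is a minimal sketch using ordinary least squares, with made-up dates and divergences rather than the study's data.

```python
def root_to_tip(dates, divergences):
    """Least-squares fit of divergence on sampling date.

    Returns (slope, intercept, correlation). The slope is in
    substitutions per site per unit of `dates` (years here).
    """
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(divergences) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in divergences)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, divergences))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    correlation = sxy / (sxx * syy) ** 0.5
    return slope, intercept, correlation

# Hypothetical decimal sampling dates and root-to-tip divergences.
dates = [2020.1, 2020.3, 2020.5, 2020.7, 2020.9]
divergences = [0.0002, 0.0003, 0.0006, 0.0006, 0.0009]
slope, intercept, corr = root_to_tip(dates, divergences)
print(slope, corr)
```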
Overlapping BCIs were also observed across the sampling schemes in the Amazonas region, with a mean substitution rate between 4.00 × 10⁻⁴ and 5.56 × 10⁻⁴ s/s/y, which indicated that the sampling strategy did not affect the clock rate estimation. The value of R0 in the Hong Kong region and the Amazonas region was 2.17 and 3.67, respectively. In both datasets, all of the sampling schemes yielded similar values of R0, indicating that R0 is independent of changes in the sampling method.
The study findings showed that values of Rt and rt depend on the sampling scheme, while values of R0 did not change with the sampling strategy. In the Hong Kong subset, the proportional sampling scheme could predict the reduction in Rt due to the application of public restrictions and the subsequent increase and decrease of Rt during the emergence of new variants of concern in the second wave.
The researchers believe that temporal sampling strategies can help create a generalized sampling framework to obtain essential epidemiologic information from genomic datasets collected from around the globe. This study noted that surveillance and analysis of genomic information are crucial to the appropriate handling of the COVID-19 pandemic.
*medRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.