Variant calling

Clinical interpretation of a genome depends on correctly identifying individual genetic variants among the millions present in each genome, which demands extreme accuracy. Standard variant-calling tools are prone to systematic errors associated with the subtleties of sample preparation, sequencing technology, sequence context, and the sometimes unpredictable influence of biology, such as somatic mosaicism. A mixture of statistical techniques, including hand-crafted features such as strand bias or population-level dependencies, is used to address these issues, resulting in high accuracy but biased errors. AI algorithms can learn these biases from a single genome with a known gold standard of reference variant calls and produce superior variant calls. DeepVariant, a CNN-based variant caller trained directly on read alignments without any specialized knowledge about genomics or sequencing platforms, was recently shown to outperform standard tools on some variant-calling tasks. The improved accuracy is thought to be due to the ability of CNNs to identify complex dependencies in sequencing data. In addition, recent results suggest that deep learning is poised to revolutionize base calling (and, as a result, variant identification) for nanopore-based sequencing technologies, which have historically struggled to compete with established sequencing technologies because of the error-prone nature of earlier base-calling algorithms.
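To make the notion of a hand-crafted feature concrete, the following is a minimal sketch of one common way to quantify strand bias: a two-sided Fisher exact test on the counts of reference and alternate reads observed on the forward and reverse strands. The function names and the example read counts are illustrative assumptions, not taken from any particular variant caller, and production tools may compute or scale the statistic differently.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-7)
    )


def strand_bias_pvalue(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Hypothetical helper: p-value that allele and strand are independent.

    A small p-value means the alternate allele is seen disproportionately
    on one strand, which is a typical signature of a sequencing artifact.
    """
    return fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)


# Illustrative counts only (not real data).
balanced = strand_bias_pvalue(ref_fwd=10, ref_rev=10, alt_fwd=5, alt_rev=5)
biased = strand_bias_pvalue(ref_fwd=20, ref_rev=20, alt_fwd=10, alt_rev=0)
print(f"balanced site p = {balanced:.3f}")
print(f"biased site   p = {biased:.5f}")
```

A feature of this kind encodes a single, pre-specified suspicion about the data. A CNN trained on read alignments is not handed such a statistic and must instead learn whatever strand-related patterns are predictive from the labeled examples.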