A density-watershed algorithm (DWA) method for robust, accurate and automatic classification of dual-fluorescence and four-cluster droplet digital PCR data
Xiurui Zhu, Shisheng Su, Mingzhu Fu, Zhiyong Peng, Dong Wang, Xiao Rui, Fang Wang, Xiaobin Liu, Baoxia Liu, Lingxiang Zhu, Wenjun Yang, Na Gao, Guoliang Huang, Gaoshan Jing and Yong Guo
Droplet digital PCR (ddPCR) is a single-molecule amplification technology with broad applications in precision medicine and clinical diagnosis. The dual-fluorescence, four-cluster ddPCR (two/four-ddPCR) assay is an effective way to quantify copy numbers. Currently, two/four-ddPCR data are usually classified with manual thresholds. For clinical applications, automatic and accurate methods are required to avoid subjectivity in diagnosis. Although some automatic classification algorithms exist, their accuracy and robustness still need to be improved to meet the needs of clinical diagnosis. A new method is therefore needed to classify two/four-ddPCR data automatically, accurately and robustly. Here, a novel density-watershed algorithm (DWA) method was developed for the accurate, automatic and unsupervised classification of two/four-ddPCR data. First, data gridding was applied to a scatter plot of the fluorescence signal intensity to calculate data densities. Based on the data densities, the watershed algorithm was used to divide the gridded scatter plot into isolated regions automatically. Next, an optimal cluster pattern was determined based on these isolated regions, and excess regions were merged. Finally, the two/four-ddPCR data were classified based on the merged regions, and DNA template copy numbers were calculated accordingly. When the DWA method was used to quantify both wild types and mutants of epidermal growth factor receptor (EGFR) L858R and T790M, the classification results were highly consistent with expectations and significantly better than those of commonly used automatic algorithms. The computed template copy numbers scaled proportionally to the relative concentration of input templates (r² > 0.998) over four orders of magnitude with good reproducibility, and the method achieved a limit of detection more than 40 times lower than that of the commonly used automatic algorithms.
Furthermore, the DWA method was validated on 254 clinical DNA samples derived from frozen tissues, formalin-fixed paraffin-embedded tissues and peripheral blood. In most cases, the DWA method derived accurate and valid classification results. This highly effective DWA method may be widely used for automatically classifying two/four-ddPCR data, and it will greatly promote the application of ddPCR in clinical diagnosis.
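The gridding, region-splitting and copy-number steps summarized above can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the `bin_size` parameter and the steepest-ascent labelling are assumptions made for illustration: steepest ascent is used here as a simplified stand-in for a watershed transform, and the merging of excess regions into an optimal cluster pattern is not shown. The copy-number function uses the standard Poisson correction for digital PCR, which the abstract does not spell out.

```python
import math
from collections import Counter


def grid_density(points, bin_size):
    """Count droplets per square grid cell; keys are (column, row) cell indices.

    `points` is an iterable of (channel_1, channel_2) fluorescence intensities.
    `bin_size` is a hypothetical grid resolution chosen by the caller.
    """
    return Counter((int(x // bin_size), int(y // bin_size)) for x, y in points)


def label_regions(density):
    """Assign every occupied cell to the density peak reached by steepest ascent.

    Simplified stand-in for a watershed transform on the inverted density map:
    cells that climb to the same peak belong to the same region.
    """
    def uphill(cell):
        cx, cy = cell
        neighbours = [(cx + dx, cy + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
        # The cell itself is among the candidates. Ties are broken by cell index,
        # so the walk is deterministic and always terminates.
        return max(neighbours, key=lambda c: (density.get(c, 0), c))

    labels = {}
    for cell in density:
        current = cell
        while True:
            nxt = uphill(current)
            if nxt == current:
                break
            current = nxt
        labels[cell] = current  # the peak cell serves as the region label
    return labels


def copies_per_droplet(n_positive, n_total):
    """Poisson-corrected mean number of template copies per droplet.

    For one target in a dual-fluorescence assay, `n_positive` counts every droplet
    positive in that channel (single-positive plus double-positive clusters).
    """
    if not 0 <= n_positive < n_total:
        raise ValueError("need 0 <= n_positive < n_total")
    return -math.log(1.0 - n_positive / n_total)
```

With four well-separated synthetic clusters, `label_regions(grid_density(points, bin_size))` yields one label per density peak. Noisy real data would typically produce more regions than clusters, which is where the merging step described in the abstract comes in.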