Penalized weighted low-rank approximation for robust recovery of recurrent copy number variations

UNCG Author/Contributor (non-UNCG co-authors, if there are any, appear on document)
Xiaoli Gao, Associate Professor (Creator)
The University of North Carolina at Greensboro (UNCG)

Abstract: 2015-2016 UNCG University Libraries Open Access Publishing Fund Grant Winner.

Background: Copy number variation (CNV) analysis has become one of the most important research areas for understanding complex disease. With the increasing resolution of array-based comparative genomic hybridization (aCGH) arrays, more and more raw copy number data are collected for multiple arrays. It is natural to expect recurrent and individual-specific CNVs to co-exist, together with possible data contamination during the data generation process. Therefore, there is a great need for an efficient and robust statistical model for simultaneous recovery of both recurrent and individual-specific CNVs.

Result: We develop a penalized weighted low-rank approximation method (WPLA) for robust recovery of recurrent CNVs. In particular, we formulate multiple aCGH arrays as a realization of a hidden low-rank matrix with random noise and let an additional weight matrix account for the individual-specific effects. Thus, we do not restrict the random noise to be normally distributed, or even homogeneous. We show its performance on three real datasets and twelve synthetic datasets from different types of recurrent CNV regions associated with either normal random errors or heavily contaminated errors.

Conclusion: Our numerical experiments have demonstrated that the WPLA can successfully recover the recurrent CNV patterns from raw data under different scenarios. Compared with two other recent methods, it performs the best regarding its ability to simultaneously detect both recurrent and individual-specific CNVs under normal random errors. More importantly, the WPLA is the only method that can effectively recover the recurrent CNV regions when the data are heavily contaminated.
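The general idea of combining a low-rank fit with a weight matrix can be illustrated with a minimal sketch. This is not the WPLA algorithm from the article: it assumes a nuclear-norm penalty solved by singular value soft-thresholding, Huber-type weights recomputed from the residuals, and it omits the fused-lasso penalty suggested by the keywords. The function names (`svt`, `robust_low_rank`) and the parameters `alpha`, `c`, and `n_iter` are hypothetical choices made for illustration only.

```python
import numpy as np

def svt(Z, alpha):
    """Soft-threshold the singular values of Z by alpha (nuclear-norm proximal step)."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    s = np.maximum(s - alpha, 0.0)
    return (U * s) @ Vt

def robust_low_rank(D, alpha=1.0, c=1.345, n_iter=100):
    """Illustrative weighted low-rank fit of a samples-by-probes matrix D.

    Assumption: weights lie in (0, 1] and are Huber-type, so entries with
    large residuals (possible individual-specific signals or contamination)
    pull less on the low-rank estimate X.
    """
    D = np.asarray(D, dtype=float)
    X = np.zeros_like(D)
    W = np.ones_like(D)
    for _ in range(n_iter):
        # Majorization step: blend data and current fit according to the weights.
        Z = X + W * (D - X)
        X = svt(Z, alpha)
        # Recompute Huber-type weights from residuals, scaled by a robust MAD estimate.
        R = D - X
        scale = 1.4826 * np.median(np.abs(R - np.median(R))) + 1e-12
        W = np.minimum(1.0, c * scale / np.maximum(np.abs(R), 1e-12))
    return X, W
```

In this sketch, `X` plays the role of the shared (recurrent) pattern, while small entries of `W` flag observations that the low-rank pattern does not explain.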

Additional Information

BMC Bioinformatics
Language: English
Date: 2015
Keywords: Copy number variation, Fused lasso, Low-rank approximation, Recurrent copy number variation, Penalized weighted approximation
