Detecting Copy Number Variation on Targeted Exon Sequencing with RCNA

Example Data

The data we use for this vignette can be found in the data/ folder under vignette. This dataset has a number of genes specified in the annotation file, and the sample names, raw coverage files, and chrX normalization are specified in sample.csv. Below you can find how we transform the data from the raw flat files into a special S4 object used for analysis.

library(RCNA)
samples <- read.csv(system.file("vignette", "samples.csv", package = "RCNA"), stringsAsFactors = FALSE)
new.obj <- create_RCNA_object(sample.names = samples$sample.name, 
                        ano.file = system.file("vignette" ,"annotations.csv", package = "RCNA"),
                        out.dir = 'output',
                        file.raw.coverage = file.path("data", samples$raw.cov.file),
                        norm.cov.matrix = file.path("output", "norm-cov-matrix.csv.gz"),
                        nkr = 0.9,
                        x.norm = samples$x.norm,
                        low.score.cutoff = -0.5,
                        high.score.cutoff = 0.5,
                        ncpu = ncpu)

new.obj
#> An object of class "RCNA_object"
#> Slot "sample.names":
#>  [1] "sample-0001" "sample-0002" "sample-0003" "sample-0004" "sample-0005" "sample-0006" "sample-0007" "sample-0008" "sample-0009" "sample-0010"
#> 
#> Slot "ano.file":
#> [1] "/home/mbradley/R/x86_64-pc-linux-gnu-library/4.2/RCNA/vignette/annotations.csv"
#> 
#> Slot "out.dir":
#> [1] "output"
#> 
#> Slot "ncpu":
#> [1] 4
#> 
#> Slot "win.size":
#> [1] 75
#> 
#> Slot "gc.step":
#> [1] 0.01
#> 
#> Slot "estimate_gc":
#> [1] TRUE
#> 
#> Slot "nkr":
#> [1] 0.9
#> 
#> Slot "norm.cov.matrix":
#> [1] "output/norm-cov-matrix.csv.gz"
#> 
#> Slot "low.score.cutoff":
#> [1] -0.5
#> 
#> Slot "high.score.cutoff":
#> [1] 0.5
#> 
#> Slot "gcParams":
#>          file.raw.coverage sample.names                file.corrected.coverage
#> 1  data/sample-0001.txt.gz  sample-0001 output/gc/sample-0001.corrected.txt.gz
#> 2  data/sample-0002.txt.gz  sample-0002 output/gc/sample-0002.corrected.txt.gz
#> 3  data/sample-0003.txt.gz  sample-0003 output/gc/sample-0003.corrected.txt.gz
#> 4  data/sample-0004.txt.gz  sample-0004 output/gc/sample-0004.corrected.txt.gz
#> 5  data/sample-0005.txt.gz  sample-0005 output/gc/sample-0005.corrected.txt.gz
#> 6  data/sample-0006.txt.gz  sample-0006 output/gc/sample-0006.corrected.txt.gz
#> 7  data/sample-0007.txt.gz  sample-0007 output/gc/sample-0007.corrected.txt.gz
#> 8  data/sample-0008.txt.gz  sample-0008 output/gc/sample-0008.corrected.txt.gz
#> 9  data/sample-0009.txt.gz  sample-0009 output/gc/sample-0009.corrected.txt.gz
#> 10 data/sample-0010.txt.gz  sample-0010 output/gc/sample-0010.corrected.txt.gz
#> 
#> Slot "nkrParams":
#>                         file.nkr.coverage x.norm sample.names
#> 1  output/gc/sample-0001.corrected.txt.gz  FALSE  sample-0001
#> 2  output/gc/sample-0002.corrected.txt.gz  FALSE  sample-0002
#> 3  output/gc/sample-0003.corrected.txt.gz  FALSE  sample-0003
#> 4  output/gc/sample-0004.corrected.txt.gz  FALSE  sample-0004
#> 5  output/gc/sample-0005.corrected.txt.gz  FALSE  sample-0005
#> 6  output/gc/sample-0006.corrected.txt.gz  FALSE  sample-0006
#> 7  output/gc/sample-0007.corrected.txt.gz  FALSE  sample-0007
#> 8  output/gc/sample-0008.corrected.txt.gz  FALSE  sample-0008
#> 9  output/gc/sample-0009.corrected.txt.gz  FALSE  sample-0009
#> 10 output/gc/sample-0010.corrected.txt.gz  FALSE  sample-0010
#> 
#> Slot "scoreParams":
#>                       file.score.coverage sample.names
#> 1  output/gc/sample-0001.corrected.txt.gz  sample-0001
#> 2  output/gc/sample-0002.corrected.txt.gz  sample-0002
#> 3  output/gc/sample-0003.corrected.txt.gz  sample-0003
#> 4  output/gc/sample-0004.corrected.txt.gz  sample-0004
#> 5  output/gc/sample-0005.corrected.txt.gz  sample-0005
#> 6  output/gc/sample-0006.corrected.txt.gz  sample-0006
#> 7  output/gc/sample-0007.corrected.txt.gz  sample-0007
#> 8  output/gc/sample-0008.corrected.txt.gz  sample-0008
#> 9  output/gc/sample-0009.corrected.txt.gz  sample-0009
#> 10 output/gc/sample-0010.corrected.txt.gz  sample-0010
#> 
#> Slot "commands":
#> list()

Note: While the samples and annotation files have been included with this package, the raw coverage files are not included in this package as they are larger than what is allowed by CRAN. A release of the raw data may be included in future updates to this vignette.

The form of the coverage file should be a space-separated file with no headers and match the exact format as seen below. The columns should represent (1) chromosome, (2) position, (3) target ID - this is optional but not used, (4) coverage in read depth, and (5) the DNA nucleotide base. This vignette uses 10 samples, each spanning 96 genes and exactly 336,583 base pairs.

head(read.table(file.path("data", "sample-0001.txt.gz"), sep = " "))
#>     V1      V2   V3   V4 V5
#> 1 chr1 1787331 r456  977  T
#> 2 chr1 1787332 r456 1004  T
#> 3 chr1 1787333 r456  991  A
#> 4 chr1 1787334 r456  994  G
#> 5 chr1 1787335 r456 1049  T
#> 6 chr1 1787336 r456 1022  T

The annotation file must be a comma-separated file with the columns seen below, including headers. They are (1) the name of the genetic feature, such as a gene name (2) chromosome, (3) feature start position, and (4) feature end position. This vignette uses two genes as features.

head(read.csv(system.file("vignette" ,"annotations.csv", package = "RCNA")))
#>   feature chromosome    start      end
#> 1   ALAS2          X 55009055 55031064
#> 2   SRSF2         17 76734115 76737374

Estimate and Correct GC Bias

The first step to the RCNA workflow is to estimate the GC-content bias in the raw coverage data. This is done using a sliding window approach, which first estimates the GC-content. The GC-content is then normalized against the mean coverage across all bases in the coverage file. Lastly, a corrective multiplicative factor is calculated for each GC-content “bin”. Each bin represents a certain percentage of GC-content in a given window. This correction is then applied iteratively using a second sliding window. The size of the sliding window (in bp) can be specified using the win.size parameter (default = 75 bp). For this example we will use both the default gc.step and win.size parameters. This step can be accomplished by using the correct_gc_bias function, as shown below.

Note on performance: This function will train and correct the entire coverage file. This is beneficial to the accuracy of the correction, as running the correction on only a few genes can result in biased results. Keeping this in mind, the run-time of the algorithm scales linearly in respect to both number of bases as well as the number of samples causing this step to occupy a vast majority of the entire workflow’s run-time. We recommend using a pre-determined GC factor file or providing pre-corrected coverage files if possible to cut down on the execution time.

gc.res <- correct_gc_bias(new.obj, estimate_gc = TRUE, verbose = TRUE)
#> No output file column labeled `file.gc.factor` - defaulting to sample names. See `?correct_gc_bias()` for more information.
#> Beginning GC estimation
#> Calculating GC bias correction
#> Feature score estimation succeeded!
gc.res[[1]]
#> An object of class "RCNA_analysis"
#> Slot "call":
#> [1] "estimate_gc_bias"
#> 
#> Slot "params":
#> $file.raw.coverage
#>  [1] "data/sample-0001.txt.gz" "data/sample-0002.txt.gz" "data/sample-0003.txt.gz" "data/sample-0004.txt.gz" "data/sample-0005.txt.gz" "data/sample-0006.txt.gz" "data/sample-0007.txt.gz" "data/sample-0008.txt.gz" "data/sample-0009.txt.gz"
#> [10] "data/sample-0010.txt.gz"
#> 
#> $file.gc.factor
#>  [1] "output/gc/sample-0001.gc.factor.txt" "output/gc/sample-0002.gc.factor.txt" "output/gc/sample-0003.gc.factor.txt" "output/gc/sample-0004.gc.factor.txt" "output/gc/sample-0005.gc.factor.txt" "output/gc/sample-0006.gc.factor.txt"
#>  [7] "output/gc/sample-0007.gc.factor.txt" "output/gc/sample-0008.gc.factor.txt" "output/gc/sample-0009.gc.factor.txt" "output/gc/sample-0010.gc.factor.txt"
#> 
#> 
#> Slot "res.files":
#> $file.gc.factor
#>  [1] "output/gc/sample-0001.gc.factor.txt" "output/gc/sample-0002.gc.factor.txt" "output/gc/sample-0003.gc.factor.txt" "output/gc/sample-0004.gc.factor.txt" "output/gc/sample-0005.gc.factor.txt" "output/gc/sample-0006.gc.factor.txt"
#>  [7] "output/gc/sample-0007.gc.factor.txt" "output/gc/sample-0008.gc.factor.txt" "output/gc/sample-0009.gc.factor.txt" "output/gc/sample-0010.gc.factor.txt"
gc.res[[2]]
#> An object of class "RCNA_analysis"
#> Slot "call":
#> [1] "correct_gc_bias"
#> 
#> Slot "params":
#> $win.size
#> [1] 75
#> 
#> $gcParams
#>          file.raw.coverage                file.corrected.coverage                      file.gc.factor sample.names
#> 1  data/sample-0001.txt.gz output/gc/sample-0001.corrected.txt.gz output/gc/sample-0001.gc.factor.txt  sample-0001
#> 2  data/sample-0002.txt.gz output/gc/sample-0002.corrected.txt.gz output/gc/sample-0002.gc.factor.txt  sample-0002
#> 3  data/sample-0003.txt.gz output/gc/sample-0003.corrected.txt.gz output/gc/sample-0003.gc.factor.txt  sample-0003
#> 4  data/sample-0004.txt.gz output/gc/sample-0004.corrected.txt.gz output/gc/sample-0004.gc.factor.txt  sample-0004
#> 5  data/sample-0005.txt.gz output/gc/sample-0005.corrected.txt.gz output/gc/sample-0005.gc.factor.txt  sample-0005
#> 6  data/sample-0006.txt.gz output/gc/sample-0006.corrected.txt.gz output/gc/sample-0006.gc.factor.txt  sample-0006
#> 7  data/sample-0007.txt.gz output/gc/sample-0007.corrected.txt.gz output/gc/sample-0007.gc.factor.txt  sample-0007
#> 8  data/sample-0008.txt.gz output/gc/sample-0008.corrected.txt.gz output/gc/sample-0008.gc.factor.txt  sample-0008
#> 9  data/sample-0009.txt.gz output/gc/sample-0009.corrected.txt.gz output/gc/sample-0009.gc.factor.txt  sample-0009
#> 10 data/sample-0010.txt.gz output/gc/sample-0010.corrected.txt.gz output/gc/sample-0010.gc.factor.txt  sample-0010
#> 
#> $gc.step
#> [1] 0.01
#> 
#> 
#> Slot "res.files":
#> $file.corrected.coverage
#>                         file.gc.factor                file.corrected.coverage
#> 1  output/gc/sample-0001.gc.factor.txt output/gc/sample-0001.corrected.txt.gz
#> 2  output/gc/sample-0002.gc.factor.txt output/gc/sample-0002.corrected.txt.gz
#> 3  output/gc/sample-0003.gc.factor.txt output/gc/sample-0003.corrected.txt.gz
#> 4  output/gc/sample-0004.gc.factor.txt output/gc/sample-0004.corrected.txt.gz
#> 5  output/gc/sample-0005.gc.factor.txt output/gc/sample-0005.corrected.txt.gz
#> 6  output/gc/sample-0006.gc.factor.txt output/gc/sample-0006.corrected.txt.gz
#> 7  output/gc/sample-0007.gc.factor.txt output/gc/sample-0007.corrected.txt.gz
#> 8  output/gc/sample-0008.gc.factor.txt output/gc/sample-0008.corrected.txt.gz
#> 9  output/gc/sample-0009.gc.factor.txt output/gc/sample-0009.corrected.txt.gz
#> 10 output/gc/sample-0010.gc.factor.txt output/gc/sample-0010.corrected.txt.gz
new.obj@commands = gc.res

The return value of correct_gc_bias is an RCNA_analysis object which documents the workflow steps.

Alternatively, you can also choose not to estimate the GC bias and instead use a GC factor file. We can inspect the GC factor file that we generated from the previous step to see what the factor file should look like.

gc.file <- read.table(file.path("output", "gc", "sample-0001.gc.factor.txt"), sep = "\t", stringsAsFactors = FALSE)
colnames(gc.file) = c("Percent.GC", "Coverage.Correction.Factor")
gc.file = gc.file[gc.file$Coverage.Correction.Factor > 0,]
head(gc.file, n = 15)
#>    Percent.GC Coverage.Correction.Factor
#> 14       0.13                    0.14400
#> 15       0.14                    0.13987
#> 16       0.15                    0.15162
#> 17       0.16                    0.16245
#> 18       0.17                    0.19365
#> 19       0.18                    0.24451
#> 20       0.19                    0.32753
#> 21       0.20                    0.36505
#> 22       0.21                    0.50538
#> 23       0.22                    0.60806
#> 24       0.23                    0.63844
#> 25       0.24                    0.68623
#> 26       0.25                    0.70939
#> 27       0.26                    0.76110
#> 28       0.27                    0.75852

In the GC estimation step, a sliding window is applied to calculate the number of G and C bases within the window. Additionally, the window calculates the average coverage within the window normalized against mean coverage across all bases and is constantly adjusted as the sliding window proceeds down the coverage file. This results in a GC factor file that has a coverage ratio associated with each GC-content bin, corresponding to the percent of bases within a given sliding window region that are either G or C. The size of the GC-content bins can be adjusted using the gc.step parameter (default = 0.01), but the value must tile the interval of 0 to 1.

So long as this column scheme is followed, a user can also provide their own GC factor file using the file.gc.factor argument. A smaller bin size may result in more finely-tuned bias correction at the cost of increased time to calculate the factor files.

Estimating Normal Karyotype Range

In the next step, we will estimate the expected “normal” karyotype range (NKR) for each feature in the annotation file. This will be used downstream to determine scores for features that exceed their respective features’ NKR. This is similar to defining a confidence interval for the coverage, however it is centered around the mode-normalized median coverage across all samples in the analysis. Please note that the region analyzed is strictly where the coverage file and annotation file overlap - any regions in the coverage file that are not included within the range of a feature in the annotation file will not be analyzed. The ratio of coverage compared to the median is summarized in the file passed as the norm.cov.matrix argument. The normalized coverage ratio matrix will be generated if the file specified does not exist. Additionally, the nkr parameter is a quantile that defines what is considered as the “normal.” This is considered similar to an alpha value when using bootstrapped confidence intervals. Below is an example of how one would execute this function. For this example, we have already specified the desired path of norm.cov.matrix, as well as the value for nkr (default = 0.9).

This step can be accomplished by using the estimate_nkr function.

nkr.res <- estimate_nkr(new.obj)
#> Importing coverage matrix from previous run
#> NKR estimation succeeded!
nkr.res
#> An object of class "RCNA_analysis"
#> Slot "call":
#> [1] "estimate_nkr"
#> 
#> Slot "params":
#> $nkrParams
#>                         file.nkr.coverage x.norm sample.names
#> 1  output/gc/sample-0001.corrected.txt.gz  FALSE  sample-0001
#> 2  output/gc/sample-0002.corrected.txt.gz  FALSE  sample-0002
#> 3  output/gc/sample-0003.corrected.txt.gz  FALSE  sample-0003
#> 4  output/gc/sample-0004.corrected.txt.gz  FALSE  sample-0004
#> 5  output/gc/sample-0005.corrected.txt.gz  FALSE  sample-0005
#> 6  output/gc/sample-0006.corrected.txt.gz  FALSE  sample-0006
#> 7  output/gc/sample-0007.corrected.txt.gz  FALSE  sample-0007
#> 8  output/gc/sample-0008.corrected.txt.gz  FALSE  sample-0008
#> 9  output/gc/sample-0009.corrected.txt.gz  FALSE  sample-0009
#> 10 output/gc/sample-0010.corrected.txt.gz  FALSE  sample-0010
#> 
#> $nkr
#> [1] 0.9
#> 
#> $cov.ratios
#> [1] "output/norm-cov-matrix.csv.gz"
#> 
#> 
#> Slot "res.files":
#> $Output
#> [1] "ALAS2.RData" "SRSF2.RData"
#> 
#> $`Normalized coverage ratios`
#> [1] "output/norm-cov-matrix.csv.gz"
new.obj@commands[[3]] <- nkr.res

The resulting .RData files (found in output/nkr/) contain R workspace objects which define the normal karyotype ranges on each feature in the annotation file. There should be one file per feature. We can load one and inspect the contents if we so wish. Let’s also take a look at the normalized coverage matrix too.

load(file.path("output", "nkr", nkr.res@res.files$Output[1]))
head(res$raw)
#>                     sample-0001 sample-0002 sample-0003 sample-0004 sample-0005 sample-0006 sample-0007 sample-0008 sample-0009 sample-0010
#> ALAS2_chrX_55009180  -0.4712177   0.3079704  -0.5489125  -0.4938741 -0.07685428  -0.6372187  0.06688500  0.05838189   0.8210937   0.1710256
#> ALAS2_chrX_55009181  -0.5421495   0.3375313  -0.5301404  -0.4759917 -0.04093790  -0.5775500  0.05953157  0.02246552   0.8261610   0.2160292
#> ALAS2_chrX_55009182  -0.5991689   0.2725962  -0.5727809  -0.5267949 -0.06078825  -0.5995822  0.02852741  0.06799289   0.8366352   0.1749022
#> ALAS2_chrX_55009183  -0.5206929   0.3876275  -0.4933843  -0.4266533 -0.07423166  -0.5755403  0.04197082  0.07797758   0.8431989   0.2068355
#> ALAS2_chrX_55009184  -0.5002292   0.3359176  -0.6075949  -0.5254258 -0.10150053  -0.5629351  0.06923969  0.10888241   0.8308551   0.2418197
#> ALAS2_chrX_55009185  -0.5282313   0.2823252  -0.5591150  -0.5149572 -0.05550205  -0.5885885  0.09700744  0.03702967   0.8127750   0.1673071

norm.cov.mat = read.csv(nkr.res@res.files$`Normalized coverage ratios`, stringsAsFactors = FALSE)
head(norm.cov.mat)
#>               sample.0001 sample.0002 sample.0003   sample.0004 sample.0005 sample.0006 sample.0007 sample.0008 sample.0009 sample.0010
#> _chr1_1787331  0.09619511 0.034520150  -0.2518999 -0.0409949001 -0.07862110  0.10284579  0.02632270  -0.1125486  0.10600302  -0.1263846
#> _chr1_1787332  0.07800200 0.014416556  -0.1484041 -0.0233834615 -0.14537101  0.05658341  0.01301504  -0.2154538  0.07562834  -0.1691862
#> _chr1_1787333  0.09349493 0.054155189  -0.1638869  0.0031294940 -0.04512092  0.10563155 -0.01780169  -0.1404977  0.17946723  -0.1549712
#> _chr1_1787334  0.02663941 0.035837190  -0.2692994  0.0210334554 -0.05555740  0.04344290 -0.03570566  -0.2147050  0.07125268  -0.1644438
#> _chr1_1787335  0.12655848 0.039913718  -0.2052132  0.0009800749 -0.10926807  0.06087082 -0.01565227  -0.1498817  0.15122988  -0.1489632
#> _chr1_1787336  0.08408666 0.006827653  -0.2353454 -0.0157945579 -0.06202690  0.09588569  0.01166820  -0.2026655  0.13805799  -0.1346327

We can see that the raw attribute from the feature file lines up to the rows of the normalized coverage matrix that contain that feature. Both of these represent the normalized coverage at that location (found in the rows) for each sample (found in the column names). We can inspect the estimate_nkr results further to find the mode-normalized median coverage and the bounds on the normal karyotype range.

head(res$val.pos) # median coverage across ALL samples
#> [1] -0.009236 -0.009236 -0.016130 -0.016130 -0.016130 -0.009236
head(res$uci.pos) # NKR upper bound
#> [1] 0.590188 0.606278 0.582818 0.638192 0.608133 0.574073
head(res$lci.pos) # NKR lower bound
#> [1] -0.597481 -0.561620 -0.599396 -0.550859 -0.587498 -0.575325

Estimating feature score

In the final step of the workflow we estimate the CNA score of a feature by calculating what percentage of the feature lands above the upper bound calculated in the previous step. This step outputs two files - one with all results and one with only the results that pass a score cutoff filter specified by the score.cutoff parameter(s) (default = 0.5). This parameter is important for removing false positives, and as seen in the original manuscript may require fine tuning to achieve an acceptable Type I error rate.

This step can be accomplished by using the estimate_feature_score function.

score.res <- estimate_feature_score(new.obj)
#> Feature score estimation succeeded!
score.res
#> An object of class "RCNA_analysis"
#> Slot "call":
#> [1] "estimate_feature_score"
#> 
#> Slot "params":
#> $scoreParams
#>                       file.score.coverage sample.names
#> 1  output/gc/sample-0001.corrected.txt.gz  sample-0001
#> 2  output/gc/sample-0002.corrected.txt.gz  sample-0002
#> 3  output/gc/sample-0003.corrected.txt.gz  sample-0003
#> 4  output/gc/sample-0004.corrected.txt.gz  sample-0004
#> 5  output/gc/sample-0005.corrected.txt.gz  sample-0005
#> 6  output/gc/sample-0006.corrected.txt.gz  sample-0006
#> 7  output/gc/sample-0007.corrected.txt.gz  sample-0007
#> 8  output/gc/sample-0008.corrected.txt.gz  sample-0008
#> 9  output/gc/sample-0009.corrected.txt.gz  sample-0009
#> 10 output/gc/sample-0010.corrected.txt.gz  sample-0010
#> 
#> $low.score.cutoff
#> [1] -0.5
#> 
#> $high.score.cutoff
#> [1] 0.5
#> 
#> 
#> Slot "res.files":
#> $Output
#> [1] "RCNA-output-scorepass.csv" "RCNA-output.csv"
new.obj@commands[[4]] <- score.res

Let’s inspect the final results to get a better understanding of what they mean.

out.pass <- read.csv(file.path(new.obj@out.dir, "score", score.res@res.files$Output[[1]]), header = T, stringsAsFactors = F)
out.all <- read.csv(file.path(new.obj@out.dir, "score", score.res@res.files$Output[[2]]), header = T, stringsAsFactors = F)
out.all
#>    Sample.Name Feature Chromosome Feature.Start.Position Feature.End.Position   Score Percent.Above.NKR Percent.Below.NKR Median.Normalized.log2.Ratio Feature.Positions
#> 1  sample-0001   ALAS2       chrX               55009055             55031064 -0.0937              0.00              9.37                      -0.6073              1836
#> 2  sample-0002   ALAS2       chrX               55009055             55031064  0.0005              0.05              0.00                       0.3021              1836
#> 3  sample-0003   ALAS2       chrX               55009055             55031064 -0.7702              0.00             77.02                      -0.6852              1836
#> 4  sample-0004   ALAS2       chrX               55009055             55031064 -0.0223              0.00              2.23                      -0.5378              1836
#> 5  sample-0005   ALAS2       chrX               55009055             55031064  0.0033              0.33              0.00                      -0.0006              1836
#> 6  sample-0006   ALAS2       chrX               55009055             55031064 -0.1138              0.00             11.38                      -0.5437              1836
#> 7  sample-0007   ALAS2       chrX               55009055             55031064  0.0000              0.00              0.00                      -0.0116              1836
#> 8  sample-0008   ALAS2       chrX               55009055             55031064  0.0000              0.00              0.00                       0.0576              1836
#> 9  sample-0009   ALAS2       chrX               55009055             55031064  0.9553             95.53              0.00                       0.6512              1836
#> 10 sample-0010   ALAS2       chrX               55009055             55031064  0.0408              4.08              0.00                       0.2703              1836
#> 11 sample-0001   SRSF2      chr17               76734115             76737374  0.0616              6.16              0.00                       0.0292               666
#> 12 sample-0002   SRSF2      chr17               76734115             76737374  0.4715             50.45              3.30                       0.0979               666
#> 13 sample-0003   SRSF2      chr17               76734115             76737374  0.1291             12.91              0.00                       0.0235               666
#> 14 sample-0004   SRSF2      chr17               76734115             76737374  0.1081             10.81              0.00                       0.0157               666
#> 15 sample-0005   SRSF2      chr17               76734115             76737374 -0.7553              1.20             76.73                      -0.2732               666
#> 16 sample-0006   SRSF2      chr17               76734115             76737374  0.0000              0.30              0.30                      -0.0132               666
#> 17 sample-0007   SRSF2      chr17               76734115             76737374 -0.0841              0.15              8.56                      -0.1064               666
#> 18 sample-0008   SRSF2      chr17               76734115             76737374  0.0060              1.05              0.45                      -0.1035               666
#> 19 sample-0009   SRSF2      chr17               76734115             76737374  0.0165              3.30              1.65                       0.0083               666
#> 20 sample-0010   SRSF2      chr17               76734115             76737374  0.0465             13.66              9.01                       0.0574               666

A score above 0 represents a feature that was detected to have a copy number gain, while a negative score represents a feature detected to have a copy number loss. The Percent.Above.NKR and Percent.Below.NKR fields represent which parts of the feature exhibit coverage that exceed the normal karyotype range. The last column represents the number of base pairs in the feature.

Detecting Copy Number Variation on Targeted Exon Sequencing with RCNA

Matt Bradley

2024-11-20

Introduction

Example Data

Estimate and Correct GC Bias

Estimating Normal Karyotype Range

Estimating feature score

run_RCNA