Azenta Whole Exome Sequencing Analysis Report

1 Project Information

Customer Azenta Life Science
Email
Quote human-WES-somatic
Configuration Illumina

2 Description of Workflow

2.1 Library preparation workflow

Figure 2.1: WES library preparation workflow

2.2 Bioinformatics workflow

Figure 2.2: WES data analysis workflow

3 Analysis

3.1 Sequencing statistics

Raw BCL files generated by the sequencer were converted to FASTQ files for each sample using . The summary statistics for the raw data are shown in Table 3.1.

Table 3.1: Sample sequencing summary statistics

3.2 Alignment to the reference genome

Sequencing adapters and low-quality bases in the raw reads were trimmed using Trimmomatic 0.39. The cleaned reads were then aligned to the GRCh38 reference genome using Sentieon 202112.01. The alignments were sorted, and PCR/optical duplicates were marked. Table 3.2 shows the alignment statistics.

Table 3.2: Summary statistics of alignment

3.3 Somatic SNVs and INDELs

3.3.1 Summary of somatic SNV and INDEL calling

Somatic SNVs and small INDELs were called using Sentieon 202112.01 (TNSeq algorithm). The VCF files generated by the pipeline were then normalized (INDELs were left-aligned and multiallelic sites were split into separate records) using bcftools 1.13. The transcripts overlapping each variant were identified, and the effects of the variants on those transcripts were predicted with Ensembl VEP 104. Table 3.3 shows the summary statistics of somatic small variant calling.

Table 3.3: Summary of variant calling across all tumor samples
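
The normalization step described above can be illustrated with a small sketch. This is not the bcftools implementation: the function name split_and_trim is hypothetical, and the code only splits a multiallelic site into biallelic records and trims bases shared by REF and ALT. True left alignment also shifts INDELs through repeats using the reference sequence, which this sketch does not attempt.

```python
from typing import List, Tuple


def split_and_trim(pos: int, ref: str, alts: List[str]) -> List[Tuple[int, str, str]]:
    """Split one multiallelic site into biallelic (pos, ref, alt) records.

    Simplified illustration only: shared trailing bases are trimmed first,
    then shared leading bases, always keeping at least one base per allele.
    No reference lookup is done, so INDELs are not shifted through repeats.
    """
    records = []
    for alt in alts:
        p, r, a = pos, ref, alt
        # Trim identical trailing bases while both alleles keep >= 1 base.
        while len(r) > 1 and len(a) > 1 and r[-1] == a[-1]:
            r, a = r[:-1], a[:-1]
        # Trim identical leading bases and shift the position to match.
        while len(r) > 1 and len(a) > 1 and r[0] == a[0]:
            r, a = r[1:], a[1:]
            p += 1
        records.append((p, r, a))
    return records


if __name__ == "__main__":
    # One site with three ALT alleles becomes three biallelic records.
    for record in split_and_trim(100, "CTT", ["C", "CT", "CTTT"]):
        print(record)
```

Splitting before annotation means each ALT allele receives its own consequence prediction, which is why the two operations are usually applied together.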

The impact of each variant was classified according to the Mutation Annotation Format (MAF) specification. Figure 3.1 shows the variant classification for the samples in the cohort.

Figure 3.1: Classification of variants
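
As a rough illustration of how a VEP consequence term can be translated into a MAF Variant_Classification value, the sketch below uses a partial lookup table. The mapping follows commonly used conventions and is an assumption; the report does not state which rules the pipeline applied. The function name maf_classification is hypothetical, and consequence terms not listed here simply return None.

```python
from typing import Optional

# Partial, illustrative mapping from VEP consequence terms to MAF classes.
# Assumption: the pipeline's actual rules are not given in this report.
SIMPLE_MAP = {
    "missense_variant": "Missense_Mutation",
    "stop_gained": "Nonsense_Mutation",
    "stop_lost": "Nonstop_Mutation",
    "synonymous_variant": "Silent",
    "splice_acceptor_variant": "Splice_Site",
    "splice_donor_variant": "Splice_Site",
    "inframe_insertion": "In_Frame_Ins",
    "inframe_deletion": "In_Frame_Del",
    "intron_variant": "Intron",
    "3_prime_UTR_variant": "3'UTR",
    "5_prime_UTR_variant": "5'UTR",
}


def maf_classification(consequence: str, ref: str, alt: str) -> Optional[str]:
    """Return a MAF Variant_Classification for one VEP consequence term.

    ref and alt are VCF-style alleles; they are only needed to decide
    whether a frameshift is a deletion or an insertion.
    """
    if consequence == "frameshift_variant":
        return "Frame_Shift_Del" if len(ref) > len(alt) else "Frame_Shift_Ins"
    return SIMPLE_MAP.get(consequence)


if __name__ == "__main__":
    print(maf_classification("missense_variant", "C", "T"))
    print(maf_classification("frameshift_variant", "CT", "C"))
```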

DNA substitution mutations are of two types. Transitions are interchanges between two purines (A and G) or between two pyrimidines (C and T). Transversions are interchanges between a purine and a pyrimidine. Figure 3.2 shows the classification of the base substitutions.

Figure 3.2: Base substitution distribution
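
The transition/transversion rule above is simple enough to express directly. The following is a minimal sketch with hypothetical function names. It also collapses each substitution onto its pyrimidine reference base (C>A, C>G, C>T, T>A, T>C, T>G), a convention often used in plots of this kind; whether Figure 3.2 uses that convention is an assumption.

```python
from typing import Iterable, Tuple

PURINES = {"A", "G"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or ref not in COMPLEMENT or alt not in COMPLEMENT:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # Same chemical class on both sides (purine<->purine or
    # pyrimidine<->pyrimidine) is a transition; otherwise a transversion.
    return "transition" if (ref in PURINES) == (alt in PURINES) else "transversion"


def six_class(ref: str, alt: str) -> str:
    """Collapse a substitution onto its pyrimidine reference, e.g. G>A -> C>T."""
    substitution_type(ref, alt)  # reuse the input validation
    ref, alt = ref.upper(), alt.upper()
    if ref in PURINES:
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def titv_ratio(snvs: Iterable[Tuple[str, str]]) -> float:
    """Ratio of transitions to transversions over (ref, alt) pairs."""
    kinds = [substitution_type(r, a) for r, a in snvs]
    ti, tv = kinds.count("transition"), kinds.count("transversion")
    return ti / tv if tv else float("inf")


if __name__ == "__main__":
    calls = [("A", "G"), ("C", "T"), ("A", "T")]
    print([six_class(r, a) for r, a in calls], titv_ratio(calls))
```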

3.3.2 Analysis of top mutated genes

Figure 3.3 shows the most frequently mutated genes in the cohort.

Figure 3.3: The most mutated genes in the cohort
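
One common way to rank genes for a plot like this is by the number of samples carrying at least one somatic mutation in the gene. The sketch below assumes that definition; the report does not say how the ranking was computed. The function name top_mutated_genes, the input layout and the example calls are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def top_mutated_genes(calls: Iterable[Tuple[str, str]], n: int = 10) -> List[Tuple[str, int]]:
    """Rank genes by the number of distinct samples with a mutation.

    calls is an iterable of (sample_id, gene_symbol) pairs, one per variant.
    Several variants in the same gene and sample count once.
    """
    samples_by_gene = defaultdict(set)
    for sample, gene in calls:
        samples_by_gene[gene].add(sample)
    counts = {gene: len(samples) for gene, samples in samples_by_gene.items()}
    # Sort by descending sample count, then by gene symbol for stable ties.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    # Made-up example calls, not data from this project.
    example = [
        ("S1", "TP53"), ("S1", "TP53"), ("S2", "TP53"),
        ("S1", "KRAS"), ("S3", "EGFR"), ("S2", "EGFR"),
    ]
    print(top_mutated_genes(example, n=2))
```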

The distribution of mutations along each gene is shown as a lollipop plot in Figure 3.4.