Azenta Whole Exome Sequencing Analysis Report

1 Project Information

Customer	Azenta Life Science
Email	ngs@azenta.com
Quote	human-WES-somatic
Configuration	Illumina

2 Description of Workflow

2.1 Library preparation workflow

Figure 2.1: WES library preparation workflow

2.2 Bioinformatics workflow

Figure 2.2: WES data analysis workflow

3 Analysis

3.1 Sequencing statistics

Raw BCL files generated by the sequencer were converted to FASTQ files for each sample using . The summary statistics for the raw data are shown in Table 3.1.

Table 3.1: Sample sequencing summary statistics

3.2 Alignment to the reference genome

Sequencing adapters and low-quality bases in raw reads were trimmed using Trimmomatic 0.39. Cleaned reads were then aligned to the GRCh38 reference genome using Sentieon 202112.01. Alignments were then sorted, and PCR/optical duplicates were marked. Table 3.2 shows the alignment statistics.

Table 3.2: Summary statistics of alignment

3.3 Somatic SNVs and INDELs

3.3.1 Summary of somatic SNV and INDEL calling

Somatic SNVs and small INDELs were called using Sentieon 202112.01 (TNSeq algorithm). The VCF files generated by the pipeline were then normalized (INDELs were left-aligned and multiallelic sites were split into separate records) using bcftools 1.13. For each variant, overlapping transcripts were identified and the effects of the variant on those transcripts were predicted with Ensembl VEP 104. Table 3.3 shows the summary statistics of somatic small variant calling.

Table 3.3: Summary of variant calling across all tumor samples
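
The normalization step described above (splitting multiallelic sites and left-aligning INDELs) can be illustrated with a minimal sketch. This is not the bcftools implementation: it handles only the CHROM/POS/REF/ALT fields, ignores genotype and INFO columns, and assumes the reference sequence is available as a plain string. The function names `split_multiallelic` and `left_align` are hypothetical and chosen for illustration only.

```python
def split_multiallelic(chrom, pos, ref, alts):
    """Split one site with several ALT alleles into one record per ALT allele."""
    return [(chrom, pos, ref, alt) for alt in alts]


def left_align(ref_seq, pos, ref, alt):
    """Left-align and trim one biallelic variant.

    ref_seq is the chromosome sequence as a string and pos is 1-based.
    Assumption for this sketch: the variant never needs to shift past
    the first base of the sequence.
    """
    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base from both alleles.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both by one base to the left.
        if not ref or not alt:
            pos -= 1
            base = ref_seq[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Drop shared leading bases while both alleles keep at least one base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def normalize_site(ref_seq, chrom, pos, ref, alts):
    """Split a site, then left-align each resulting biallelic record."""
    records = []
    for c, p, r, a in split_multiallelic(chrom, pos, ref, alts):
        new_pos, new_ref, new_alt = left_align(ref_seq, p, r, a)
        records.append((c, new_pos, new_ref, new_alt))
    return records
```

For example, with the toy reference `GGGCACACACAGGG`, a two-base deletion written as `CAC` to `C` at position 8 lies inside a CA repeat, and `left_align` moves it to the leftmost equivalent representation, `GCA` to `G` at position 3.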

The impact of each variant was classified according to the MAF specification. Figure 3.1 shows the variant classification of samples in the cohort.

Figure 3.1: Classification of variants

DNA substitution mutations are of two types. Transitions are interchanges between two purines (A and G) or between two pyrimidines (C and T). Transversions are interchanges between a purine and a pyrimidine. Figure 3.2 shows the classification of the base substitutions.

Figure 3.2: Base substitution distribution
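
The transition/transversion distinction described above can be expressed in a few lines of code. The following is a minimal sketch, not the code used to produce Figure 3.2; the function names `classify_substitution` and `ti_tv_ratio` are hypothetical.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # Same chemical class on both sides means a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def ti_tv_ratio(substitutions):
    """Compute the transition/transversion ratio for (ref, alt) pairs."""
    counts = Counter(classify_substitution(r, a) for r, a in substitutions)
    ti, tv = counts["transition"], counts["transversion"]
    return ti / tv if tv else float("inf")
```

Here `classify_substitution("A", "G")` is a transition (purine to purine), while `classify_substitution("C", "A")` is a transversion (pyrimidine to purine).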

3.3.2 Analysis of top mutated genes

Figure 3.3 shows the most frequently mutated genes in the cohort.

Figure 3.3: The most mutated genes in the cohort
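
A ranking of this kind can be reproduced from a MAF file. The sketch below assumes that genes are ranked by the number of distinct tumor samples carrying at least one mutation, which may differ from the criterion used for Figure 3.3. It relies on the standard MAF columns `Hugo_Symbol` and `Tumor_Sample_Barcode`; the function name `top_mutated_genes` is hypothetical.

```python
import csv
from collections import defaultdict


def top_mutated_genes(maf_lines, n=10):
    """Rank genes by the number of distinct samples in which they are mutated.

    maf_lines is any iterable of text lines from a MAF file, for example a
    file handle opened with gzip.open(path, "rt").
    """
    # Skip comment lines such as "#version 2.4" that may precede the header.
    data_lines = (line for line in maf_lines if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")

    samples_by_gene = defaultdict(set)
    for row in reader:
        samples_by_gene[row["Hugo_Symbol"]].add(row["Tumor_Sample_Barcode"])

    # Sort by descending sample count, then alphabetically for stable ties.
    ranked = sorted(samples_by_gene.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(gene, len(samples)) for gene, samples in ranked[:n]]
```

Counting distinct samples rather than raw rows keeps a gene with several mutations in a single tumor from outranking a gene that is mutated across many tumors.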

The distribution of mutations along each gene is shown as a lollipop plot in Figure 3.4.

Figure 3.4: Distribution of mutations along the genes

4 Deliverables

For each sample:
- {sample}_R1/2_001.fastq.gz: Raw FASTQ files.
- {sample}.aln.bam: Sorted and duplicate-marked BAM file.
For each tumor sample:
- {tumor_sample}_somatic.vcf.gz: Raw VCF file.
- {tumor_sample}_somatic_vep_anno.vcf.gz: VCF file annotated using VEP.
- {tumor_sample}_somatic_vep_anno.tsv.gz: VEP-annotated variants in tab-delimited text format.
- {tumor_sample}_somatic_vep_anno.maf.gz: MAF file for human somatic variants.
- {tumor_sample}_somatic_sv.vcf.gz: SV analysis VCF file.
- {tumor_sample}_somatic_cnv.vcf.gz: CNV analysis VCF file.
- {tumor_sample}_somatic_cnv.cns: CNV analysis CNS file (tab-delimited text file).
For the project:
- Azenta Data Analysis Report
- {project}_joint_somatic.maf.gz

Please note that certain deliverables may not be available for some species and projects.
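
The naming patterns above can be expanded into a checklist of expected file names. The sketch below is illustrative only: it assumes that `_R1/2_` stands for a pair of files, `_R1_` and `_R2_`, and that all files sit in a single delivery directory. The function names `expected_deliverables` and `missing_deliverables` are hypothetical. Because some deliverables may not be produced for a given project, a missing file is not necessarily an error.

```python
from pathlib import Path

# Patterns copied from the deliverables list; R1/R2 expansion is an assumption.
PER_SAMPLE = [
    "{sample}_R1_001.fastq.gz",
    "{sample}_R2_001.fastq.gz",
    "{sample}.aln.bam",
]
PER_TUMOR = [
    "{tumor_sample}_somatic.vcf.gz",
    "{tumor_sample}_somatic_vep_anno.vcf.gz",
    "{tumor_sample}_somatic_vep_anno.tsv.gz",
    "{tumor_sample}_somatic_vep_anno.maf.gz",
    "{tumor_sample}_somatic_sv.vcf.gz",
    "{tumor_sample}_somatic_cnv.vcf.gz",
    "{tumor_sample}_somatic_cnv.cns",
]
PER_PROJECT = ["{project}_joint_somatic.maf.gz"]


def expected_deliverables(samples, tumor_samples, project):
    """Expand the naming patterns into a flat list of expected file names."""
    names = [p.format(sample=s) for s in samples for p in PER_SAMPLE]
    names += [p.format(tumor_sample=t) for t in tumor_samples for p in PER_TUMOR]
    names += [p.format(project=project) for p in PER_PROJECT]
    return names


def missing_deliverables(directory, samples, tumor_samples, project):
    """Return the expected file names that are not present in directory."""
    root = Path(directory)
    expected = expected_deliverables(samples, tumor_samples, project)
    return [name for name in expected if not (root / name).exists()]
```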