Azenta Whole Exome Sequencing Analysis Report
1 Project Information
Customer | Azenta Life Science |
ngs@azenta.com | |
Quote | human-WES-somatic |
Configuration | Illumina |
2 Description of Workflow
2.1 Library preparation workflow
Figure 2.1: WES library preparation workflow
2.2 Bioinformatics workflow
Figure 2.2: WES data analysis workflow
3 Analysis
3.1 Sequencing statistics
Raw BCL files generated by the sequencer were converted to FASTQ files for each sample using . The summary statistics for the raw data are shown in Table 3.1.
3.2 Alignment to the reference genome
Sequencing adapters and low-quality bases in the raw reads were trimmed using Trimmomatic 0.39. The cleaned reads were then aligned to the GRCh38 reference genome using Sentieon 202112.01. The alignments were sorted, and PCR/optical duplicates were marked. Table 3.2 shows the alignment statistics.
3.3 Somatic SNVs and INDELs
3.3.1 Summary of somatic SNV and INDEL calling
Somatic SNVs and small INDELs were called using Sentieon 202112.01 (TNSeq algorithm). The VCF files generated by the pipeline were then normalized (INDELs were left-aligned and multiallelic sites were split into separate records) using bcftools 1.13. Overlapping transcripts were identified for each variant, and the effects of the variants on those transcripts were predicted with Ensembl VEP 104. Table 3.3 shows the summary statistics of somatic small variant calling.
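To illustrate what splitting a multiallelic site means, the following is a deliberately simplified Python sketch. It is not the bcftools implementation: the names `Variant` and `split_multiallelic` are hypothetical, and left alignment as well as the per-allele handling of INFO and FORMAT fields are not shown.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    """Minimal stand-in for a VCF record (hypothetical, illustration only)."""
    chrom: str
    pos: int   # 1-based position, as in VCF
    ref: str
    alt: str   # comma-separated ALT alleles, as in the VCF ALT column


def split_multiallelic(variant: Variant) -> List[Variant]:
    """Return one record per ALT allele, keeping CHROM, POS and REF unchanged."""
    return [variant._replace(alt=allele) for allele in variant.alt.split(",")]


# A site with two ALT alleles becomes two single-ALT records.
records = split_multiallelic(Variant("chr1", 100, "A", "C,T"))
for record in records:
    print(record.chrom, record.pos, record.ref, record.alt)
```

A record that already has a single ALT allele passes through unchanged as a one-element list.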
The impact of each variant was classified according to the MAF specification. Figure 3.1 shows the variant classification for the samples in the cohort.
Figure 3.1: Classification of variants
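The per-sample classification counts can be tallied from a MAF file along the following lines. This is a minimal sketch that assumes the delivered MAF uses the standard column names `Tumor_Sample_Barcode` and `Variant_Classification`; the function names and the example rows are made up for illustration.

```python
import csv
import gzip
import io
from collections import Counter
from typing import Dict, TextIO


def count_variant_classes(handle: TextIO) -> Dict[str, Counter]:
    """Count Variant_Classification values per Tumor_Sample_Barcode."""
    # Skip comment lines (for example "#version 2.4") that precede the header row.
    rows = (line for line in handle if not line.startswith("#"))
    counts: Dict[str, Counter] = {}
    for row in csv.DictReader(rows, delimiter="\t"):
        sample = row["Tumor_Sample_Barcode"]
        counts.setdefault(sample, Counter())[row["Variant_Classification"]] += 1
    return counts


def count_variant_classes_in_file(path: str) -> Dict[str, Counter]:
    """Same as above, reading a gzip-compressed MAF such as *.maf.gz."""
    with gzip.open(path, "rt", newline="") as handle:
        return count_variant_classes(handle)


# Made-up rows, only to show the shape of the input and output.
example = io.StringIO(
    "#version 2.4\n"
    "Hugo_Symbol\tVariant_Classification\tTumor_Sample_Barcode\n"
    "GENE1\tMissense_Mutation\tsampleA\n"
    "GENE2\tMissense_Mutation\tsampleA\n"
    "GENE3\tNonsense_Mutation\tsampleA\n"
    "GENE1\tSilent\tsampleB\n"
)
counts = count_variant_classes(example)
for sample, classes in counts.items():
    print(sample, dict(classes))
```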
DNA substitution mutations are of two types. Transitions are interchanges between two purines (A and G) or between two pyrimidines (C and T). Transversions are interchanges between a purine and a pyrimidine. Figure 3.2 shows the classification of the base substitutions.
Figure 3.2: Base substitution distribution
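The transition/transversion rule can be written as a short Python sketch. The function names are hypothetical and the example substitutions are made up; this is not the code used to produce the figure.

```python
from collections import Counter
from typing import Iterable, Tuple

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitution(ref: str, alt: str) -> str:
    """Return "transition" or "transversion" for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    pair = {ref, alt}
    # Both bases in the same chemical class means a transition.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def substitution_counts(substitutions: Iterable[Tuple[str, str]]) -> Counter:
    """Tally transitions and transversions over (ref, alt) pairs."""
    return Counter(classify_substitution(ref, alt) for ref, alt in substitutions)


# Made-up substitutions for illustration.
tally = substitution_counts([("A", "G"), ("C", "T"), ("A", "T"), ("G", "C")])
print(dict(tally))
```

Inputs that are not single-base substitutions (INDELs, identical bases, ambiguous bases such as N) raise a `ValueError` rather than being silently counted.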
4 Deliverables
- For each sample:
  - {sample}_R1/2_001.fastq.gz: Raw FASTQ files.
  - {sample}.aln.bam: Sorted and duplicate marked BAM file.
- For each tumor sample:
  - {tumor_sample}_somatic.vcf.gz: Raw VCF file.
  - {tumor_sample}_somatic_vep_anno.vcf.gz: VCF file annotated using VEP.
  - {tumor_sample}_somatic_vep_anno.tsv.gz: VEP annotated variants in tab delimited text format.
  - {tumor_sample}_somatic_vep_anno.maf.gz: MAF file for human somatic variants.
  - {tumor_sample}_somatic_sv.vcf.gz: SV analysis VCF file.
  - {tumor_sample}_somatic_cnv.vcf.gz: CNV analysis VCF file.
  - {tumor_sample}_somatic_cnv.cns: CNV analysis CNS file (tab delimited text file).
- For the project:
  - Azenta Data Analysis Report
  - {project}_joint_somatic.maf.gz
Please note that certain deliverables may not be available for some species and projects.
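Because some deliverables may be absent, it can help to compare a delivery folder against the naming patterns above. The sketch below is one way to do that, under two assumptions that the report does not confirm: that `R1/2` stands for a pair of files (`R1` and `R2`), and that all files sit in a single flat directory. The function names and the sample and project names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List

# Patterns copied from the deliverables list; R1/2 is assumed to mean R1 and R2.
PER_SAMPLE = [
    "{sample}_R1_001.fastq.gz",
    "{sample}_R2_001.fastq.gz",
    "{sample}.aln.bam",
]
PER_TUMOR = [
    "{tumor_sample}_somatic.vcf.gz",
    "{tumor_sample}_somatic_vep_anno.vcf.gz",
    "{tumor_sample}_somatic_vep_anno.tsv.gz",
    "{tumor_sample}_somatic_vep_anno.maf.gz",
    "{tumor_sample}_somatic_sv.vcf.gz",
    "{tumor_sample}_somatic_cnv.vcf.gz",
    "{tumor_sample}_somatic_cnv.cns",
]
PER_PROJECT = ["{project}_joint_somatic.maf.gz"]


def expected_files(samples: Iterable[str], tumor_samples: Iterable[str],
                   project: str) -> List[str]:
    """Expand the naming patterns into concrete file names."""
    names = [p.format(sample=s) for s in samples for p in PER_SAMPLE]
    names += [p.format(tumor_sample=t) for t in tumor_samples for p in PER_TUMOR]
    names += [p.format(project=project) for p in PER_PROJECT]
    return names


def missing_files(directory: str, names: Iterable[str]) -> List[str]:
    """Return the expected names that are not present in the directory."""
    root = Path(directory)
    return [name for name in names if not (root / name).exists()]


# Hypothetical sample and project names, for illustration only.
names = expected_files(["normal1", "tumor1"], ["tumor1"], "myproject")
print(len(names), "files expected")
```

A name reported by `missing_files` is not necessarily an error; it may simply be one of the deliverables that does not apply to the project.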