HLA Allele Genotyping

EpiToolKit provides an interface to OptiType [1], an approach to HLA genotyping based on NGS data. OptiType uses integer linear programming to solve a customized formulation of the well-known maximum set covering problem. In doing so, it selects all alleles of the most likely HLA allele combination simultaneously.
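
To make the set-covering idea concrete, the following minimal sketch picks the allele pair that explains the most reads at a single locus. It is an illustration only, not OptiType's implementation: it replaces the integer linear program with brute-force enumeration, and the function name `best_allele_pair`, the read IDs, and the read-to-allele assignments are all hypothetical.

```python
from itertools import combinations_with_replacement


def best_allele_pair(read_hits, alleles):
    """Return the allele pair (possibly homozygous) that explains the most reads.

    read_hits maps a read ID to the set of alleles the read aligns to.
    Brute-force stand-in for the ILP; only feasible for toy inputs.
    """
    def explained(pair):
        # A read counts as explained if it aligns to at least one selected allele.
        return sum(1 for hits in read_hits.values() if hits & set(pair))

    candidates = combinations_with_replacement(sorted(alleles), 2)
    return max(candidates, key=explained)


# Hypothetical toy data: five reads and three candidate alleles.
read_hits = {
    "r1": {"A*01:01"},
    "r2": {"A*01:01", "A*02:01"},
    "r3": {"A*02:01"},
    "r4": {"A*02:01"},
    "r5": {"A*01:01", "A*03:01"},
}
pair = best_allele_pair(read_hits, {"A*01:01", "A*02:01", "A*03:01"})
print(pair)
```

In this toy input the pair A*01:01 / A*02:01 explains all five reads, so it is selected. Enumeration grows quickly with the number of alleles and loci, which is one reason an optimization formulation is attractive.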

The configuration steps are explained below.

Step 1: Pre-processing

To avoid uploading large NGS files and a long runtime, please pre-filter your data with the following command:

>razers3 --percent-identity 90 --max-hits 1 --distance-range 0 --output sample_fished.sam reference.fasta sample.fastq

This keeps only the reads that map loosely to the HLA cluster. If you are using a different read mapper, make sure to use similar parameters. Reference files can be found at https://github.com/FRED-2/OptiType/tree/master/data

OptiType requires fastq files as input. You can convert SAM files to fastq with the following command on Unix systems:

>cat sample_fished.sam | grep -v ^@ | awk '{print "@"$1"\n"$10"\n+\n"$11}' > sample_fished.fastq
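
If no Unix shell is available, the same conversion can be sketched in Python. This is a minimal equivalent of the awk command above, not part of EpiToolKit or OptiType; the function name `sam_to_fastq` and the example read are hypothetical. Like the awk command, it copies the sequence and quality columns exactly as they are stored in the SAM file.

```python
def sam_to_fastq(sam_lines):
    """Yield one four-line fastq record per alignment line of SAM text."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines start with "@" and carry no read data.
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            # Not a complete alignment record (SAM has 11 mandatory columns).
            continue
        name, seq, qual = fields[0], fields[9], fields[10]
        yield f"@{name}\n{seq}\n+\n{qual}\n"


# Hypothetical in-memory example: one header line and one alignment line.
example = [
    "@HD\tVN:1.0\n",
    "read1\t0\tHLA:1\t1\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
]
records = list(sam_to_fastq(example))
print("".join(records), end="")
```

To convert a file, pass an open file handle for sample_fished.sam to `sam_to_fastq` and write the yielded records to sample_fished.fastq.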

Step 2: Data Input

In this step you specify the input files by selecting fastq files from the History panel. You have to indicate whether your read data are of DNA or RNA origin by selecting the corresponding entry in the Data Type drop-down menu. If you have paired-end data, choose paired-end in the Technique drop-down menu; this adds an input field for selecting the second paired-end sample file. To upload a fastq file into the History panel, use either the (1.) Upload File tool or the (2.) jQuery Upload tool.

Figure 1. Upload possibilities. 1. Upload tool. 2. jQuery Upload tool.

More advanced users can also fine-tune the algorithm by specifying additional parameters in the Advanced Settings section. Here you can specify β, which determines the penalization of heterozygous loci and reflects the percentage of reads that a heterozygous locus has to explain additionally in order to be chosen over a homozygous locus. The algorithm can also generate additional (sub)optimal solutions. Note that the total runtime increases with the number of additional solutions requested.
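
The effect of β can be illustrated with a small decision rule. The sketch below encodes one possible reading of the description above, namely that a heterozygous call is preferred only if it explains more additional reads than β times the reads already explained by the homozygous call. This exact form is an assumption, as are the function name `prefer_heterozygous`, the read counts, and the β value; the objective actually used by OptiType may differ.

```python
def prefer_heterozygous(reads_homozygous, reads_heterozygous, beta):
    """Return True if the heterozygous call beats the homozygous one.

    Assumed rule (not taken from OptiType): the second allele must explain
    more than beta * reads_homozygous additional reads.
    """
    extra_reads = reads_heterozygous - reads_homozygous
    return extra_reads > beta * reads_homozygous


# Hypothetical counts: the homozygous call explains 1000 reads.
beta = 0.009
few_extra = prefer_heterozygous(1000, 1005, beta)   # only 5 extra reads
many_extra = prefer_heterozygous(1000, 1050, beta)  # 50 extra reads
print(few_extra, many_extra)
```

Under this rule a larger β makes the method more conservative about calling a locus heterozygous, because a handful of extra reads that could stem from sequencing errors no longer justifies a second allele.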

Step 3: Result

OptiType generates two outputs. The first output is an internal representation of the genotyping results and can be used as input for Epitope Prediction, Polymorphic Epitope Prediction or Epitope Selection. The second output is an interactive HTML page. It summarizes the input data and the specified configuration, and presents the genotyping results in a sortable table. Besides the selected HLA alleles, the table reports the number of reads that can potentially originate from the genotype and the achieved genotype score.

Figure 2. Coverage plot for the top genotype solution found by OptiType.

Additionally, a customized coverage plot is generated for the top genotype solution (Figure 2). The blue area in the coverage plot marks the positions of exons 2 and 3. The dark green curve describes unambiguously, perfectly aligned reads and the light green curve describes ambiguously, perfectly aligned reads. The dark red and light red curves describe unambiguously and ambiguously aligned reads with mismatches, respectively.

Reference

  1. Szolek, A*, Schubert, B*, Mohr, C*, Sturm, M, Feldhahn, M, & Kohlbacher, O. (2014). OptiType: precision HLA typing from next-generation sequencing data. Bioinformatics. doi: 10.1093/bioinformatics/btu548
hla_typing.txt · Last modified: 2014/12/19 10:48 by schubert