Polymorphic Epitope Prediction

Polymorphic Epitope Prediction is based on SNEP [1] and extends Epitope Prediction by incorporating variant information. Neoantigens are constructed from these variants, which enables the discovery of neoepitopes that arise from the supplied variants. Neoepitopes play an important role in cancer immunotherapy because they are usually novel peptide sequences found only in tumor cells, which makes them promising targets for personalized cancer vaccines.

Polymorphic Epitope Prediction is primarily designed for human data and variant analysis based on HG19 as reference.

The configuration steps of Polymorphic Epitope Prediction are described in the following sections.

Step 1: Data Input

In the first step you specify the variants from which neoepitopes should be generated. You have two options as input:

  • Protein ID Input. You can directly enter RefSeq [1] or UniProt [2] protein IDs (separated by spaces or commas) after choosing the ID type in the Choose Input Format drop-down menu.
  • VCF File from History. You can select a previously uploaded VCF file from the History panel. Make sure that HG19 was used as the reference for variant calling.
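As an illustration of the "space or comma separated" convention for protein IDs, the following minimal sketch splits such an input string into individual IDs. The function name `parse_protein_ids` and the example IDs are assumptions made for this example; the tool's own parsing is not documented here.

```python
import re


def parse_protein_ids(raw: str) -> list:
    """Split a string of IDs on commas and/or whitespace.

    Empty tokens and repeated IDs are dropped; input order is kept.
    """
    seen = set()
    ids = []
    for token in re.split(r"[,\s]+", raw.strip()):
        if token and token not in seen:
            seen.add(token)
            ids.append(token)
    return ids


# Mixed separators are accepted in the same input string.
example_ids = parse_protein_ids("NP_000537.3, NP_004324.2 P04637")
print(example_ids)
```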

If protein IDs are used, all known variants annotated in dbSNP [4] are extracted and used for neoepitope generation. The variants are then annotated with ANNOVAR [5], and all possible neoepitopes are generated from the extracted variants.
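The idea of generating all possible neoepitopes from a variant can be sketched for the simplest case, a single amino acid substitution: every peptide of the requested length that overlaps the changed position is a candidate. This is only a sketch under that assumption. The helper names, the toy sequence, and the 0-based position are invented for illustration, and the tool's actual procedure (which also covers insertions, deletions, and frame shifts) is not shown.

```python
def apply_substitution(protein: str, pos: int, new_aa: str) -> str:
    """Return the protein with the residue at 0-based index pos replaced."""
    return protein[:pos] + new_aa + protein[pos + 1:]


def peptides_covering(protein: str, pos: int, length: int) -> list:
    """Return all peptides of the given length that include index pos."""
    start_min = max(0, pos - length + 1)
    start_max = min(pos, len(protein) - length)
    return [protein[s:s + length] for s in range(start_min, start_max + 1)]


# Toy wild-type sequence (invented); residue 8 (1-based) is changed from K to E.
wild_type = "MKTAYIAKQRQISFVK"
mutant = apply_substitution(wild_type, 7, "E")
candidates = peptides_covering(mutant, 7, 9)
print(candidates)
```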

You also have to specify the required length of the epitopes [8-16 AA]. Depending on the selected length the available prediction methods are filtered.

Additionally, you can specify an HLA Allele file from History. The Allele file contains HLA alleles in the new nomenclature at a resolution of up to four digits. The alleles specified in this file are used for the predictions if the selected prediction model supports them.
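A quick way to check that allele names follow the new nomenclature at two- or four-digit resolution is a small validator such as the sketch below. The exact layout of the Allele file is not described here, so the one-allele-per-line format, the optional `HLA-` prefix, and the function names are assumptions.

```python
import re

# Locus, "*", allele group, and an optional ":protein" field (four-digit level).
ALLELE_RE = re.compile(r"^(?:HLA-)?([A-Z]+\d*)\*(\d{2,3})(?::(\d{2,3}))?$")


def parse_allele(text: str) -> tuple:
    """Return (locus, group, protein); protein is None at two-digit level."""
    match = ALLELE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a two- or four-digit HLA allele: {text!r}")
    return match.groups()


def read_alleles(lines) -> list:
    """Parse one allele per non-empty line (assumed file layout)."""
    return [parse_allele(line) for line in lines if line.strip()]


print(read_alleles(["HLA-A*02:01", "DRB1*15:01", "", "HLA-B*07"]))
```

Names with more fields than the four-digit level, such as `A*02:01:01`, are rejected by this sketch rather than truncated.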

Advanced Options

Polymorphic Epitope Prediction supports single nucleotide variations, insertions, deletions, and frame shifts. You can specify which types of variation should be considered during the analysis by checking the corresponding checkboxes under Filter Variant Types.
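The four variant types can be told apart roughly from the REF and ALT alleles of a VCF record, as the sketch below shows. It is a simplification made for illustration: it treats any length change that is not a multiple of three as a frame shift, whereas a real classification depends on whether the variant lies in a coding region, which is what the ANNOVAR annotation provides. The function names and type labels are assumptions.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify a variant from its REF/ALT alleles (length-based heuristic)."""
    if len(ref) == 1 and len(alt) == 1:
        return "snv"
    diff = len(alt) - len(ref)
    if diff == 0:
        return "other"  # e.g. multi-nucleotide substitution, not covered here
    if diff % 3 != 0:
        return "frameshift"  # assumes the variant lies in a coding region
    return "insertion" if diff > 0 else "deletion"


def filter_variants(variants, enabled_types):
    """Keep only (ref, alt) pairs whose type is among the enabled types."""
    return [v for v in variants if classify_variant(*v) in enabled_types]


variants = [("A", "G"), ("A", "ATTT"), ("ATTT", "A"), ("A", "AT")]
print(filter_variants(variants, {"snv", "frameshift"}))
```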

Step 2: Prediction Methods

In the second step the prediction methods to be used are selected. Multiple methods can be used at the same time, but at least one prediction method has to be selected. The following prediction methods are available:

  1. SYFPEITHI [3] uses position-specific scoring matrices (PSSMs) that were designed based on expert knowledge and amino acid occurrences in naturally processed HLA ligands.
  2. BIMAS [4] uses PSSMs derived from experimentally determined binding affinities measured as dissociation rates of the peptide:HLA:β2-microglobulin complex relative to a reference peptide.
  3. SVMHC [5] is an SVM-based classification method that was trained on experimentally validated epitopes from the SYFPEITHI database and randomly generated peptides.
  4. NetMHC family [6-9] comprises NetMHC, NetMHCpan, NetMHCII, and NetMHCIIpan, which are all artificial neural network based regression methods. Furthermore, NetMHCpan and NetMHCIIpan incorporate structural information of the HLA-binding pockets to allow prediction for HLA alleles with insufficient data.
  5. UniTope [10] is an SVM-based prediction method that also combines structural information of the HLA binding groove with epitope sequences. In contrast to NetMHC(II)pan, the peptides are encoded using physicochemical properties.
  6. TEPITOPEpan [11] uses PSSMs for epitope prediction and is based on the virtual binding pocket approach of Sturniolo et al. To allow predictions for alleles that were not originally covered by Sturniolo et al., TEPITOPEpan uses a phylogenetic-based weighting approach to reconstruct the allele-specific PSSM from the original matrices.
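Several of the methods above rely on position-specific scoring matrices. The general idea can be sketched as a lookup of one value per peptide position, combined into a single score. The example below sums the values; individual tools may combine per-position values differently, and the tiny three-position matrix is invented purely for illustration and does not come from any of the listed methods.

```python
def pssm_score(peptide: str, pssm: list) -> float:
    """Sum the per-position scores of a peptide under a PSSM.

    pssm holds one dict per position, mapping amino acid to score.
    Amino acids missing from a position's dict contribute 0.0.
    """
    if len(peptide) != len(pssm):
        raise ValueError("peptide length must match the number of PSSM positions")
    return sum(column.get(aa, 0.0) for aa, column in zip(peptide, pssm))


# Toy matrix for peptides of length 3 (values are made up).
toy_pssm = [
    {"L": 2.0, "M": 1.0},
    {"A": 0.5},
    {"V": 3.0, "L": 2.0},
]

print(pssm_score("LAV", toy_pssm))
print(pssm_score("MKL", toy_pssm))
```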

Step 3: HLA Selection

In this step you select the alleles for which predictions should be performed. A tree is generated from the alleles supported by the previously selected prediction methods (Figure 2). If multiple prediction methods were selected, only the HLA alleles shared by all of them are displayed. If an HLA Allele file was specified, the supported alleles are further filtered to those contained in the Allele file.

Figure 2: HLA allele tree for SYFPEITHI
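The rule for which alleles are offered can be read as a set intersection over the selected methods, followed by an optional filter against the Allele file. The sketch below illustrates that reading. The function name and the per-method allele lists are hypothetical placeholders, not the real coverage of any method.

```python
def shared_alleles(supported: dict, selected_methods: list, allele_file=None) -> list:
    """Return the alleles supported by every selected method.

    If allele_file is given, only alleles listed in it are kept.
    """
    sets = [set(supported[method]) for method in selected_methods]
    shared = set.intersection(*sets) if sets else set()
    if allele_file is not None:
        shared &= set(allele_file)
    return sorted(shared)


# Hypothetical coverage table, for illustration only.
supported = {
    "method_a": ["A*01:01", "A*02:01", "B*07:02"],
    "method_b": ["A*02:01", "B*07:02", "B*08:01"],
}

print(shared_alleles(supported, ["method_a", "method_b"]))
print(shared_alleles(supported, ["method_a", "method_b"], allele_file=["A*02:01"]))
```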

Checking a higher level of the tree also selects all HLA alleles on the levels beneath it. If no HLA tree is generated or the allele you need is not listed, please click back and select a different prediction method.
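The selection behavior of the tree can be sketched as follows: alleles are grouped under their parent nodes, and checking any node selects every allele beneath it. The three levels used here (locus, allele group, four-digit allele) are an assumption about how the tree is organized, and the function names are invented for this example.

```python
from collections import defaultdict


def build_tree(alleles: list) -> dict:
    """Group alleles as locus -> allele group -> list of four-digit alleles."""
    tree = defaultdict(lambda: defaultdict(list))
    for allele in alleles:
        locus, rest = allele.split("*")
        group = rest.split(":")[0]
        tree[locus][f"{locus}*{group}"].append(allele)
    return tree


def expand_selection(tree: dict, checked) -> list:
    """Return all leaf alleles that are checked or lie under a checked node."""
    checked = set(checked)
    selected = set()
    for locus, groups in tree.items():
        for group, leaves in groups.items():
            for leaf in leaves:
                if {locus, group, leaf} & checked:
                    selected.add(leaf)
    return sorted(selected)


tree = build_tree(["A*01:01", "A*02:01", "A*02:06", "B*07:02"])
print(expand_selection(tree, {"A*02"}))
print(expand_selection(tree, {"A"}))
```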

Step 4: Results

Two outputs are generated. The first is an internal representation of the predictions that can be used directly as input for Epitope Selection. The second is a detailed, interactive HTML view of the prediction results.

Figure 3: Example result page.

The results are presented in a sortable and searchable table. Each row represents one prediction result for a given epitope and prediction method. The results can be exported in either CSV or Excel format by clicking Save and selecting the desired format. Clicking Print expands the table fully so that it can be printed with the browser's print function. To return to the normal view, press ESC.
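An exported CSV file can be processed further outside the browser. The sketch below shows the general pattern of ranking result rows by score and writing them as CSV with Python's standard `csv` module. The column names (`epitope`, `method`, `allele`, `score`) and all values are invented for illustration; the actual columns of the export may differ.

```python
import csv
import io


def to_csv(rows: list, columns: list) -> str:
    """Serialize a list of dict rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up rows: one row per epitope and prediction method.
rows = [
    {"epitope": "MKTAYIAEQ", "method": "method_a", "allele": "A*02:01", "score": 12.0},
    {"epitope": "EQRQISFVK", "method": "method_a", "allele": "A*02:01", "score": 21.0},
]

# Highest score first, mirroring a descending sort on the score column.
ranked = sorted(rows, key=lambda row: row["score"], reverse=True)
csv_text = to_csv(ranked, ["epitope", "method", "allele", "score"])
print(csv_text)
```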


  1. Pruitt K.D., Tatusova T., Maglott D.R. (2007) NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 35:D61–D65.
  2. The UniProt Consortium (2007) The Universal Protein Resource (UniProt). Nucleic Acids Res 35:D193–D197.
  3. Rammensee H., Bachmann J., Emmerich N.P., Bachor O.A., Stevanovic S. (1999) SYFPEITHI: database for MHC ligands and peptide motifs. Immunogenetics 50:213–219.
  4. Parker K.C., Bednarek M.A., Coligan J.E. (1994) Scheme for ranking potential HLA-A2 binding peptides based on independent binding of individual peptide side-chains. J Immunol 152:163–175.
  5. Dönnes P., Kohlbacher O. (2006) SVMHC: a server for prediction of MHC-binding peptides. Nucleic Acids Res 34:W194–W197.
  6. Lundegaard C., Lamberth K., Harndahl M., Buus S., Lund O., Nielsen M. (2008) NetMHC-3.0: accurate web accessible predictions of human, mouse and monkey MHC class I affinities for peptides of length 8-11. Nucleic Acids Res 36(Web Server issue):W509–W512.
  7. Nielsen M., Lundegaard C., Blicher T., Lamberth K., Harndahl M., et al. (2007) NetMHCpan, a Method for Quantitative Predictions of Peptide Binding to Any HLA-A and -B Locus Protein of Known Sequence. PLoS ONE 2(8):e796. doi: 10.1371/journal.pone.0000796
  8. Nielsen M., Lund O. (2009) NN-align. An artificial neural network-based alignment algorithm for MHC class II peptide binding prediction. BMC Bioinformatics 10:296.
  9. Karosiene E., Rasmussen M., Blicher T., Lund O., Buus S., Nielsen M. (2013) NetMHCIIpan-3.0, a common pan-specific MHC class II prediction method including all three human MHC class II isotypes, HLA-DR, HLA-DP and HLA-DQ. Immunogenetics.
  10. Toussaint N.C., Feldhahn M., Ziehm M., Stevanovic M., Kohlbacher O. (2011) T-cell epitope prediction based on self-tolerance. Proc. ICIW.
  11. Zhang L., Chen Y., Wong H.-S., Zhou S., Mamitsuka H., et al. (2012) TEPITOPEpan: Extending TEPITOPE for Peptide Binding Prediction Covering over 700 HLA-DR Molecules. PLoS ONE 7(2):e30483. doi: 10.1371/journal.pone.0030483
polymorphic_prediction.txt · Last modified: 2014/12/18 14:50 by schubert