E2 Protein Sequence Comparison Across HPV Genotypes

Genotypes Analyzed

Reference Sequence: HPV16
Avg DBD Identity: 51.2%
Avg Hinge Identity: 31.3%

HPV16 E2 Protein Domain Structure

The E2 protein consists of two relatively conserved domains, the N-terminal transactivation domain (TAD) and the C-terminal DNA-binding domain (DBD), separated by a flexible, variable hinge region.

TAD (aa 1-201): Replication & Transcription
Hinge (aa 202-285): Flexible Linker
DBD (aa 286-365): Dimerization & DNA Binding

Genotype Comparison Data

Genotype	Risk Level	Length (aa)	Overall ID vs HPV16 (%)	TAD ID (%)	Hinge ID (%)	DBD ID (%)

Frequently Asked Questions

What is the E2 protein in HPV?

The E2 protein is a master regulator of the HPV life cycle. It controls viral DNA replication, transcription, and genome maintenance, and it represses transcription of the oncogenes E6 and E7. When HPV integrates into the host genome, a step that can lead to cancer, the E2 gene is often disrupted, which removes this repression and allows uncontrolled expression of E6 and E7.

Why compare E2 sequences across genotypes?

Comparing sequences identifies regions that are conserved across multiple HPV types. These regions are potential targets for broad-spectrum antiviral drugs that could treat infections caused by many high-risk and low-risk HPV types rather than a single genotype.

Which E2 domain is the most conserved?

In general, the DNA-binding domain (DBD) and the transactivation domain (TAD) are more conserved than the flexible hinge region. In our data, average identity to HPV16 is 51.2% for the DBD versus 31.3% for the hinge, and the DBD often has the highest sequence identity across divergent genotypes, which makes it an attractive target for therapeutic intervention.

What is the reference sequence for this study?

HPV16 E2 (UniProt ID: P03120) is used as the reference sequence. HPV16 is the most prevalent high-risk genotype in cervical and oropharyngeal cancers, and its E2 protein is among the most thoroughly characterized.

How is sequence identity calculated?

Sequence identity is calculated using the Needleman-Wunsch algorithm for global pairwise alignment against the HPV16 reference. Identities are calculated separately for the full protein and its individual functional domains (TAD, Hinge, and DBD).
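As an illustration of the calculation described above, here is a minimal Python sketch of Needleman-Wunsch global alignment followed by a percent-identity count. The scoring values (match +1, mismatch -1, linear gap -2) and the choice of alignment length as the denominator are assumptions made for this example; the page does not state which scoring scheme or denominator was used, and protein alignments in practice usually rely on a substitution matrix such as BLOSUM62 with affine gap penalties. The function names are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return the two gapped strings.

    Scoring values are illustrative assumptions, not the study's settings.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a, aln_b):
    """Identical residue columns as a percentage of alignment length (assumed denominator)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    if not aln_a:
        return 0.0
    matches = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * matches / len(aln_a)


if __name__ == "__main__":
    # Toy strings, not real E2 sequences.
    ref, query = needleman_wunsch("ACGT", "AGT")
    print(ref, query, percent_identity(ref, query))
```

Other common denominators are the reference length or the number of ungapped columns; each gives a different percentage for the same alignment, so reported identities are only comparable when the convention is fixed.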

Methodology & Sources

Data Collection: Protein sequences for E2 across various HPV genotypes were retrieved from the UniProt Knowledgebase (UniProtKB) via its REST API. Genotypes were classified following Papillomavirus Episteme (PAVE) nomenclature.
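The retrieval step could look roughly like the following Python sketch, which downloads one entry in FASTA format and extracts the sequence. It assumes the public per-entry endpoint at rest.uniprot.org; the page does not say which endpoint, query, or client was actually used, and the helper names (fasta_url, parse_fasta, fetch_sequence) are hypothetical.

```python
from urllib.request import urlopen

# Assumed endpoint pattern for a single UniProtKB entry in FASTA format.
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def fasta_url(accession):
    """Build the download URL for one accession, e.g. P03120 for HPV16 E2."""
    return UNIPROT_FASTA_URL.format(accession=accession)


def parse_fasta(text):
    """Return (header, sequence) for the first record in FASTA-formatted text."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("input does not look like FASTA")
    header = lines[0][1:]
    seq_parts = []
    for ln in lines[1:]:
        if ln.startswith(">"):
            break  # stop at the next record
        seq_parts.append(ln)
    return header, "".join(seq_parts)


def fetch_sequence(accession, timeout=30):
    """Download one entry and return (header, sequence). Requires network access."""
    with urlopen(fasta_url(accession), timeout=timeout) as resp:
        return parse_fasta(resp.read().decode("utf-8"))


if __name__ == "__main__":
    header, seq = fetch_sequence("P03120")
    print(header)
    print(len(seq), "residues")
```

Keeping the parser separate from the network call makes it possible to check the parsing logic offline and to reuse it on locally stored FASTA files.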

Alignment & Analysis:

Reference sequence: HPV16 E2 (UniProt P03120).
Domain boundaries defined based on structural biology consensus for HPV16: TAD (aa 1-201), Hinge (aa 202-285), DBD (aa 286-365).
Pairwise global alignment performed using the Needleman-Wunsch algorithm to calculate percent sequence identity.
Risk level classifications are based on IARC monographs regarding the oncogenic potential of HPV types.
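One way to obtain per-domain identities from the steps listed above is to project the HPV16 domain boundaries through a full-length pairwise alignment. The Python sketch below does this under stated assumptions: the input is an already aligned pair with "-" as the gap character, columns where the reference has a gap are skipped because they carry no HPV16 coordinate, and the denominator is the number of reference residues in each domain. The page does not specify whether domains were handled this way or aligned separately, so this is an illustration rather than the study's procedure; the function name is hypothetical.

```python
# Domain boundaries on the HPV16 reference (1-based, inclusive), as given above.
DOMAINS = {"TAD": (1, 201), "Hinge": (202, 285), "DBD": (286, 365)}


def domain_identities(aln_ref, aln_query, domains=DOMAINS):
    """Percent identity per domain, using reference coordinates.

    aln_ref and aln_query are gapped strings of equal length from a
    pairwise alignment. Returns None for a domain with no reference
    residues in the alignment.
    """
    if len(aln_ref) != len(aln_query):
        raise ValueError("aligned sequences must have equal length")
    matches = {name: 0 for name in domains}
    covered = {name: 0 for name in domains}
    ref_pos = 0  # 1-based position in the ungapped reference
    for r, q in zip(aln_ref, aln_query):
        if r == "-":
            continue  # insertion in the query: no reference coordinate
        ref_pos += 1
        for name, (start, end) in domains.items():
            if start <= ref_pos <= end:
                covered[name] += 1
                if r == q:
                    matches[name] += 1
                break
    return {
        name: (100.0 * matches[name] / covered[name]) if covered[name] else None
        for name in domains
    }


if __name__ == "__main__":
    # Toy alignment and toy boundaries, not real E2 data.
    toy_domains = {"N": (1, 4), "C": (5, 7)}
    print(domain_identities("ACDE-FGH", "ACDEKF-H", toy_domains))
```

Because boundaries are defined only on HPV16, this approach assigns each query residue to a domain by where it aligns, which can blur domain edges when the hinge differs greatly in length between genotypes.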

Primary Data Sources:
1. UniProt Consortium. (2023). UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Research, 51(D1), D523-D531. DOI: 10.1093/nar/gkac1052
2. Van Doorslaer K, et al. (2017). The Papillomavirus Episteme: a major update to the PAVE database. Nucleic Acids Research, 45(D1), D499-D506. PMID: 27899637
3. McBride AA. (2013). The papillomavirus E2 proteins. Virology, 445(1-2), 57-79. PMID: 23931980