Genotypes Analyzed: 7
Reference Sequence: HPV16
Avg DBD Identity: 51.2%
Avg Hinge Identity: 31.3%
HPV16 E2 Protein Domain Structure
The E2 protein consists of two conserved domains, the N-terminal transactivation domain (TAD) and the C-terminal DNA-binding domain (DBD), separated by a flexible, variable hinge region.
- TAD (aa 1-201): Replication & Transcription
- Hinge (aa 202-285): Flexible Linker
- DBD (aa 286-365): Dimerization & DNA Binding
Genotype Comparison Data
| Genotype | Risk Level | Length (aa) | Overall Identity vs HPV16 | TAD Identity | Hinge Identity | DBD Identity |
|---|---|---|---|---|---|---|
Frequently Asked Questions
What is the E2 protein in HPV?
The E2 protein is a master regulator of the HPV lifecycle. It controls viral replication, transcriptional regulation, and genome maintenance. When HPV integrates into the host genome, an event that can lead to cancer, the E2 gene is often disrupted. Losing E2-mediated repression leads to uncontrolled expression of the oncogenes E6 and E7.
Why compare E2 sequences across genotypes?
Comparing sequences helps identify regions that are conserved across multiple HPV types. These regions are potential targets for broad-spectrum antiviral drugs that could treat infections from both high-risk and low-risk HPV types, rather than targeting a single genotype.
Which E2 domain is the most conserved?
Generally, the DNA-binding domain (DBD) and the transactivation domain (TAD) are more conserved than the flexible hinge region. In our data, the DBD often has the highest sequence identity across divergent genotypes (51.2% average identity to HPV16, versus 31.3% for the hinge), making it an attractive target for therapeutic intervention.
What is the reference sequence for this study?
HPV16 E2 (UniProt ID: P03120) is used as the reference sequence. HPV16 is the most prevalent high-risk genotype responsible for cervical and oropharyngeal cancers, and its E2 protein is among the most thoroughly characterized.
How is sequence identity calculated?
Sequence identity is computed from a global pairwise alignment (Needleman-Wunsch algorithm) of each genotype's E2 sequence against the HPV16 reference. Identity is reported separately for the full protein and for each functional domain (TAD, Hinge, and DBD).
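To illustrate the calculation, here is a minimal Python sketch of Needleman-Wunsch global alignment followed by a percent-identity count. It is not the code behind this page. The scoring values (match +1, mismatch -1, linear gap -2) and the choice of alignment length as the denominator are assumptions, because neither is stated here. A production analysis would more likely use a substitution matrix and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return the two gapped strings.

    Scoring parameters are illustrative assumptions, not the page's settings.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Identical, non-gap columns as a percentage of alignment length (assumed denominator)."""
    if not aligned_a:
        return 0.0
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


# Toy example with short strings, not real E2 sequences.
ref_aln, query_aln = needleman_wunsch("ACGT", "AGT")
print(ref_aln, query_aln, percent_identity(ref_aln, query_aln))
```

Other identity conventions divide by the shorter sequence length or by the number of aligned (non-gap) columns, so reported percentages depend on which denominator is used.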
Methodology & Sources
Data Collection: Protein sequences for E2 across various HPV genotypes were retrieved from the UniProt Knowledgebase (UniProtKB) via REST API. Papillomavirus Episteme (PAVE) nomenclature was used for genotype classification.
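The retrieval step could look roughly like the sketch below, which requests a single entry in FASTA format from the public UniProt REST endpoint and parses the first record. The helper names `parse_fasta` and `fetch_e2_sequence` are hypothetical. The actual retrieval code, query parameters, and error handling used for this page are not documented here.

```python
from urllib.request import urlopen

# UniProtKB entries can be downloaded as FASTA by accession.
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def parse_fasta(text):
    """Return (header, sequence) for the first record in FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; keep only the first
            header = line[1:]
        elif header is not None:
            chunks.append(line)
    if header is None:
        raise ValueError("no FASTA header found")
    return header, "".join(chunks)


def fetch_e2_sequence(accession, timeout=30):
    """Download one UniProtKB entry as FASTA (hypothetical helper; needs network access)."""
    url = UNIPROT_FASTA_URL.format(accession=accession)
    with urlopen(url, timeout=timeout) as response:
        return parse_fasta(response.read().decode("utf-8"))


# Example call for the reference sequence:
# header, sequence = fetch_e2_sequence("P03120")
```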
Alignment & Analysis:
- Reference sequence: HPV16 E2 (UniProt P03120).
- Domain boundaries defined based on structural biology consensus for HPV16: TAD (aa 1-201), Hinge (aa 202-285), DBD (aa 286-365).
- Pairwise global alignment performed using the Needleman-Wunsch algorithm to calculate percent sequence identity.
- Risk level classifications are based on IARC monographs regarding the oncogenic potential of HPV types.
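One way to obtain the per-domain values is to project the HPV16 domain boundaries onto an existing pairwise alignment and count matches within each window. This Python sketch takes that approach. It assumes the alignment is supplied as two gapped strings of equal length, and it ignores columns where the reference has a gap. Both are choices of this sketch. The analysis may instead have aligned each domain separately, which is not specified here.

```python
# Domain boundaries for HPV16 E2, 1-based and inclusive, as listed above.
DOMAINS = {"TAD": (1, 201), "Hinge": (202, 285), "DBD": (286, 365)}


def domain_identities(aligned_ref, aligned_query, domains=DOMAINS):
    """Percent identity per domain, indexed by reference residue position.

    Columns where the reference has a gap (insertions in the query) are
    skipped, so each denominator is the number of reference residues seen
    in that domain. This is an assumed convention.
    """
    counts = {name: [0, 0] for name in domains}  # name -> [matches, columns]
    ref_pos = 0
    for r, q in zip(aligned_ref, aligned_query):
        if r == "-":
            continue
        ref_pos += 1
        for name, (start, end) in domains.items():
            if start <= ref_pos <= end:
                counts[name][1] += 1
                if r == q:
                    counts[name][0] += 1
                break
    return {
        name: (100.0 * matches / columns if columns else 0.0)
        for name, (matches, columns) in counts.items()
    }


# Toy example with made-up boundaries and sequences, not real E2 data.
toy_domains = {"A": (1, 3), "B": (4, 6)}
print(domain_identities("MKT-LVA", "MKTGL-A", toy_domains))
```

Because the boundaries are defined on HPV16, the same windows apply to every genotype once it is aligned to the reference, even when the other protein differs in length.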
Primary Data Sources:
1. UniProt Consortium. (2023). UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Research, 51(D1), D523-D531. DOI: 10.1093/nar/gkac1052
2. Van Doorslaer K, et al. (2017). The Papillomavirus Episteme: a major update to the PAVE database. Nucleic Acids Research, 45(D1), D499-D506. PMID: 27899637
3. McBride AA. (2013). The papillomavirus E2 proteins. Virology, 445(1-2), 57-79. PMID: 23931980