
An Introduction to Ebolavirus Biology

November 4, 2025, by logancollins

PDF version: An Introduction to Ebolavirus Biology – Logan Thrasher Collins

I wrote this educational primer as a fun exploration of a topic unrelated to my current research. While such knowledge may be useful in the event of a future ebolavirus epidemic, it is mostly an exercise in curiosity and intellectual enrichment. I hope that you too enjoy learning about this fascinating (but scary!) virus as you browse my writeup. Also, if you’re an ebolavirus expert with concepts, edits, and/or ideas to offer, feel free to reach out with your additional insights! Shoutout: special thanks to Jain et al. (reference 4) and Bodmer et al. (reference 2). I used their papers extensively throughout the creation of this writeup!

Genome

The ebolavirus genome consists of an 18.9 kb negative-sense single-stranded RNA (ssRNA) which encodes seven genes.¹,² Each gene is flanked by 3’ and 5’ untranslated regions (3’UTR and 5’UTR) which contain start and end signals. The start signals have the consensus sequence 3’-CUNCUUCUAAUU-5’ and the end signals have the consensus sequence 3’-UAAUUC(U)n-5’, where n = 5 or 6 (i.e., UAAUUC followed by a run of five or six uridines). Since 3’-UAAUU-5’ is found in both the start and end signals, adjacent signals can overlap, and (for most types of ebolavirus) they do so at the VP35-VP40, GP-VP30, and VP24-L gene junctions. The remaining genes have intergenic regions with non-overlapping start and end signals between them.
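As a rough illustration of how the two consensus signals can share a UAAUU core, the Python sketch below scans a sequence written 3’→5’ (the same orientation as the consensus sequences above) for start and end signals and reports start/end pairs that overlap. It assumes the N in the start signal can be any nucleotide and that the end signal finishes with five or six uridines; the function names and the toy sequence are hypothetical, not real ebolavirus sequence.

```python
import re

# Consensus motifs from the text, written 3'->5' (genome orientation).
START_SIGNAL = "CU[ACGU]CUUCUAAUU"   # N = any nucleotide
END_SIGNAL = "UAAUUCU{5,6}"          # UAAUUC followed by 5-6 uridines
CORE_OFFSET = 7                      # index of UAAUU within the 12-nt start signal


def find_signals(seq: str, pattern: str) -> list[int]:
    """Return 0-based positions of every hit, including overlapping ones."""
    # Wrapping the pattern in a zero-width lookahead lets finditer report
    # hits that overlap each other.
    return [m.start() for m in re.finditer(f"(?={pattern})", seq)]


def overlapping_junctions(seq: str) -> list[tuple[int, int]]:
    """Pair each start signal with an end signal that reuses its UAAUU core."""
    starts = find_signals(seq, START_SIGNAL)
    ends = set(find_signals(seq, END_SIGNAL))
    return [(s, s + CORE_OFFSET) for s in starts if s + CORE_OFFSET in ends]


# Hypothetical toy sequence (3'->5'): a start signal whose last five
# nucleotides double as the first five of an end signal.
toy = "GG" + "CUACUUCUAAUU" + "CUUUUU" + "GG"
print(find_signals(toy, START_SIGNAL))   # [2]
print(find_signals(toy, END_SIGNAL))     # [9]
print(overlapping_junctions(toy))        # [(2, 9)]
```

A real scan would also need to handle the non-overlapping junctions, where start and end signals are separated by an intergenic region rather than sharing a core.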

The 5’ and 3’ ends of the genome contain elements called the 5’ trailer and the 3’ leader. The 5’ trailer contains parts of the antigenomic replication promoter and the 3’ leader contains parts of the genomic replication promoter. There is also a second genomic replication promoter in the NP untranslated region. Genomic replication promoters initiate RNA-dependent RNA polymerase (RdRP) replication of the negative-sense ssRNA genome, while antigenomic replication promoters initiate replication of the antigenome, the positive-sense copy of the genome.

In total, the ebolavirus genome encodes seven proteins.¹,² These are NP (nucleoprotein), VP35 (polymerase cofactor acting as an interferon antagonist), VP40 (matrix protein), GP (glycoprotein), VP30 (transcription activator), VP24 (membrane-associated protein interfering with interferon signaling), and L (the RdRP for transcription and replication). The proteins are discussed in more detail in the next section.

The GP gene’s mRNA undergoes transcriptional editing, so its products can take three different forms.²,³ The unedited GP mRNA (~80% of transcripts) encodes the precursor of soluble glycoprotein (sGP). The edited mRNA (~20% of transcripts)⁴,⁵ arises from viral polymerase stuttering at a slippage sequence of seven consecutive uridines in the genomic template, which adds an adenosine to the mRNA and shifts the reading frame so that GP₀, the precursor of GP₁,₂, is expressed. VP30 may help resolve a stem loop involved in this polymerase stuttering.⁶ Finally, in some transcripts either two adenosines are added or one adenosine is omitted (~5% of transcripts), leading instead to expression of a small soluble GP precursor protein (ssGP).²,³
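The three outcomes follow from reading-frame arithmetic: what matters downstream of the editing site is the net number of adenosines gained or lost, modulo 3. The Python sketch below encodes that mapping as described above; the dictionary and function names are illustrative, and the model tracks only the frame, not the actual sequence change.

```python
# Reading frame downstream of the editing site -> product, as described in the text.
PRODUCT_BY_FRAME = {
    0: "sGP precursor",             # unedited mRNA
    1: "GP0 (precursor of GP1,2)",  # one adenosine added
    2: "ssGP precursor",            # two added, or one omitted (-1 == +2 mod 3)
}


def gp_product(net_adenosines: int) -> str:
    """Map net adenosines added (+) or omitted (-) at the editing site to a product."""
    # Python's % returns 0, 1 or 2 for a positive modulus, even for negative inputs.
    return PRODUCT_BY_FRAME[net_adenosines % 3]


for change in (0, +1, +2, -1):
    print(f"{change:+d} A -> {gp_product(change)}")
```

The +2 and -1 cases land in the same frame, which is why they yield the same ssGP product.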

Structure

At a glance, an ebolavirus particle consists of its ssRNA genome, a nucleocapsid with associated accessory proteins, and an envelope bearing its glycoproteins. NP adopts a helical structure when complexed with the ssRNA genome, forming the nucleocapsid.⁷ VP35 and VP24 associate with the surface of the NP-RNA complex. VP40 forms the matrix between the envelope and the nucleocapsid. VP30 also binds the nucleocapsid and is important for transcription initiation.⁴ GP is a transmembrane protein which plays roles in cellular attachment and membrane fusion.

NP (nucleoprotein)

NP’s main function is to encapsidate the ssRNA genome, forming a helical ssRNA-NP complex (the nucleocapsid).²,⁸ The NPs form a left-handed helix with 24 subunits per turn. Each NP subunit binds six nucleotides of ssRNA via a positively charged cleft on the outside of the NP helix. The NP forms the core of a repeating asymmetric unit consisting of two NPs associated with two oppositely oriented VP24 proteins, one of which in turn associates with a VP35 protein.

Image adapted from reference 8 (Sugita et al.)

In the cryo-EM structure⁸ above, the nucleocapsid helix of NP and ssRNA is displayed. VP24 and VP35 are not shown, though they also associate with the nucleocapsid.
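Taking the NP figures above at face value (six nucleotides per subunit, 24 subunits per turn) gives a quick back-of-the-envelope picture of the nucleocapsid. The sketch below assumes an idealized, continuously coated 18,900-nt genome (the rounded 18.9 kb figure from the genome section), so its outputs are estimates, not measured values.

```python
import math

GENOME_NT = 18_900   # 18.9 kb, rounded as in the genome section
NT_PER_NP = 6        # nucleotides bound per NP subunit
NP_PER_TURN = 24     # NP subunits per helical turn


def nucleocapsid_estimate(genome_nt: int = GENOME_NT,
                          nt_per_np: int = NT_PER_NP,
                          np_per_turn: int = NP_PER_TURN) -> tuple[int, int, float]:
    """Return (NP copies, nucleotides per turn, helical turns) for an idealized helix."""
    np_copies = math.ceil(genome_nt / nt_per_np)   # a partial binding site still needs an NP
    nt_per_turn = nt_per_np * np_per_turn
    turns = genome_nt / nt_per_turn
    return np_copies, nt_per_turn, turns


print(nucleocapsid_estimate())   # (3150, 144, 131.25)
```

So, under these assumptions, roughly 3,150 NP copies would coat one genome in about 131 helical turns of 144 nucleotides each.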

VP24

VP24’s association with NP is required for nucleocapsid formation and helps package nucleocapsids into virions.⁴ VP24 is also involved in the initiation of viral budding. In addition, VP24 inhibits the host cell’s immune responses. It inhibits interferon (IFN) responses by blocking p38 phosphorylation and thereby the p38 MAPK pathway. It can also block NF-κB activation, suppressing multiple downstream IFN gene expression pathways. Finally, VP24 can inhibit nuclear translocation of phosphorylated STAT1 (a transcription factor) by interacting with importins of the NPI-1 subfamily of importin-α.

VP35

VP35 is a tetrameric protein which plays a structural role in ebolavirus by associating with the surface of the nucleocapsid. It furthermore acts as a polymerase cofactor which bridges the NP-RNA complex with L (the polymerase) during replication.⁷ It has helicase and NTPase activities, which indicate that it may unwind RNA helices and hydrolyze NTPs to facilitate transcription and replication.⁴ VP35 also helps facilitate genome packaging and nucleocapsid assembly.

In addition, VP35 inhibits host cell immune responses.⁴ It interferes with the dimerization, phosphorylation, and nuclear localization of interferon regulatory factor 3 (IRF-3) by preventing the kinases TBK-1 and IKKε from interacting with IRF-3. Under normal circumstances, phosphorylation and dimerization of IRF-3 cause it to translocate to the nucleus and induce transcription of IFNα, IFNβ, and other genes. VP35 also suppresses interferon transcription by enhancing SUMOylation of IRF-7 through interaction with PIAS1 (a SUMO E3 ligase). Finally, VP35 blocks PACT, preventing PACT-induced activation of the RIG-I ATPase, and it inactivates PKR.
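One way to see why blocking a single interaction can shut down the IRF-3 branch is to treat the cascade described above as a directed graph and ask what remains reachable. The sketch below is a deliberately simplified, hypothetical model: a linear chain built only from the steps named in this section, with no redundancy, feedback, or parallel pathways. VP35’s effect is represented as removing the TBK-1/IKKε to IRF-3 phosphorylation edge.

```python
from collections import deque

# Simplified, hypothetical cascade assembled from the steps named in the text.
IRF3_CASCADE = {
    "TBK-1/IKKε": ["IRF-3 phosphorylation"],
    "IRF-3 phosphorylation": ["IRF-3 dimerization"],
    "IRF-3 dimerization": ["IRF-3 nuclear translocation"],
    "IRF-3 nuclear translocation": ["IFN-α/β transcription"],
    "IFN-α/β transcription": [],
}

# VP35 prevents TBK-1/IKKε from acting on IRF-3, modeled as one removed edge.
VP35_BLOCKED = frozenset({("TBK-1/IKKε", "IRF-3 phosphorylation")})


def reachable(graph: dict[str, list[str]], source: str,
              blocked: frozenset[tuple[str, str]] = frozenset()) -> set[str]:
    """Breadth-first search from `source` that skips any edge listed in `blocked`."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if (node, nxt) not in blocked and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


target = "IFN-α/β transcription"
print(target in reachable(IRF3_CASCADE, "TBK-1/IKKε"))                # True
print(target in reachable(IRF3_CASCADE, "TBK-1/IKKε", VP35_BLOCKED))  # False
```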

VP30

VP30 forms a hexamer composed of three dimers.⁴ It is required for RNA transcription initiation. It should be noted that VP30 has a disordered arginine-rich region in the middle of its sequence which interacts with the viral RNA. VP30 also interacts with NP, an interaction which must occur at a certain threshold level for optimal transcriptional activity. VP30 binds zinc, an essential capability for viral transcription initiation.

For transcription to occur, VP30 must either be unphosphorylated (on serines and threonines) or only partially phosphorylated while undergoing continuous cycles of phosphorylation and dephosphorylation. Partial phosphorylation is only acceptable at some stages of viral replication. By contrast, phosphorylated VP30 binds NP more robustly, which allows it to associate tightly with the nucleocapsid as new ebolavirus particles are synthesized.

VP40 (matrix protein)

Ebolavirus VP40 is abundantly expressed and associates with the plasma membrane of the host cell, where it facilitates viral assembly and budding.⁴ It contains two late budding domains (L-domains) of four amino acids each: PTAP and PPEY. The PTAP domain interacts with tumor susceptibility gene 101 protein (Tsg101), which recruits VP40 to lipid raft domains on the plasma membrane. PPEY interacts with the E3 ubiquitin ligases Nedd4 and ITCH, leading to ubiquitination of the matrix protein, which is required for budding.

VP40 can form dimers, hexamers, filaments, and octamers. Dimerization of VP40 is essential for binding to Sec24c and trafficking to the plasma membrane. Sec24c is a component of the coat protein complex II (COPII) which facilitates formation of transport vesicles traveling from the endoplasmic reticulum to the Golgi apparatus, enabling eventual transport to the plasma membrane.⁹ Dimers can assemble into filaments via VP40’s C-terminal domain residues, which is crucial for matrix assembly and budding.⁴

VP40 contains a C-terminal domain with a hydrophobic interface which penetrates the plasma membrane to anchor the matrix protein and facilitate assembly and budding.¹⁰ Interestingly, VP40 has been shown to anchor selectively to the plasma membrane through interactions with anionic phospholipids, such as phosphatidylserine, that are enriched there. At the membrane, the dimers assemble into linear hexamers, which are also important for assembly and budding. VP40 can additionally form octameric rings, which are essential for VP40-ssRNA binding. Oligomers of VP40 have also been implicated as inhibitory regulators of viral transcription.¹¹

L protein

The L protein is the RNA-dependent RNA polymerase (RdRP) of ebolavirus.⁴ It is a large protein (2212 amino acids) consisting of five domains: (i) the RdRP domain, which carries out transcription, replication, and polyadenylation, (ii) the capping domain, which has polyribonucleotidyl transferase activity, (iii) a connector domain, (iv) a methyltransferase domain, and (v) a small C-terminal domain. The capping domain joins GDP to the 5’ end of the viral mRNA to form the cap. The methyltransferase then methylates the first transcribed nucleotide at the 2’-O position and the guanosine cap at the N-7 position. The small C-terminal domain plays a role in recruiting RNAs before methylation.

Additionally, the first 450 amino acids contain a homo-oligomerization domain which overlaps with the RdRP domain. The first 380 amino acids furthermore contain a domain for interaction with the VP35 protein, allowing localization of the L protein into viral inclusion bodies during assembly.
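The cap-maturation steps described for the capping and methyltransferase domains can be followed as a sequence of state changes on the mRNA’s 5’ end. The Python sketch below is a toy bookkeeping model, not a model of enzyme chemistry: the class and function names are hypothetical, N stands for the first transcribed nucleotide, and the notation (pppN for an uncapped triphosphate end, m7GpppNm for a cap carrying both methylations) is conventional shorthand.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MRNAFivePrimeEnd:
    """Toy record of the 5' end of a viral mRNA (N = first transcribed nucleotide)."""
    capped: bool = False      # GDP joined to the 5' end by the capping domain
    o2_methyl: bool = False   # 2'-O-methyl on the first transcribed nucleotide
    n7_methyl: bool = False   # N-7 methyl on the cap guanosine

    def notation(self) -> str:
        if not self.capped:
            return "pppN..."                       # uncapped triphosphate end
        cap = "m7G" if self.n7_methyl else "G"
        first = "Nm" if self.o2_methyl else "N"
        return f"{cap}ppp{first}..."


def capping_domain(end: MRNAFivePrimeEnd) -> MRNAFivePrimeEnd:
    """Hypothetical step: join GDP to the 5' end."""
    return replace(end, capped=True)


def methyltransferase_domain(end: MRNAFivePrimeEnd) -> MRNAFivePrimeEnd:
    """Hypothetical step: 2'-O methylation, then N-7 methylation, in the order given in the text."""
    if not end.capped:
        raise ValueError("the cap must be added before it can be methylated")
    end = replace(end, o2_methyl=True)
    return replace(end, n7_methyl=True)


end = MRNAFivePrimeEnd()
print(end.notation())   # pppN...
end = capping_domain(end)
print(end.notation())   # GpppN...
end = methyltransferase_domain(end)
print(end.notation())   # m7GpppNm...
```

Using a frozen dataclass keeps each intermediate state immutable, so every step returns a new record rather than silently modifying the previous one.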

GP (glycoprotein)

GP is a fusogenic transmembrane protein.⁴ It has a cathepsin cleavage site which is cleaved inside the endosome as a step in viral entry. It is also post-translationally cleaved by furin from its precursor GP₀ into GP1 and GP2 subunits, which remain linked by disulfide bonds. Three GP₁,₂ complexes associate to form the trimeric GP that is displayed on the ebolavirus envelope surface.⁵

GP1 mediates attachment to host cell receptors via its receptor binding domain (RBD).⁴ GP1 contains a heavily glycosylated mucin-like domain (MLD), which can stimulate host dendritic cells by activating their MAPK and NF-κB pathways. GP1 also contains another important heavily glycosylated domain called the glycan cap, which (despite its name) lies in the middle of the GP1 sequence.¹²

GP2 facilitates fusion of the viral envelope with host cell membranes. It does this by inserting a hydrophobic loop domain into the endosomal membrane, bringing the envelope into close contact with that membrane.⁴ GP2 also contains a transmembrane anchor domain that tethers GP to the envelope. In addition, GP2 can inhibit cellular antiviral responses. First, it interferes with tetherin activity (tetherins are host cell proteins that aim to prevent viral release by “tethering” the virus). It also interferes with NF-κB signaling pathways. GP2 can additionally trigger lymphocyte apoptosis and cytokine dysregulation via an immunosuppressive C-terminal motif.

GP is heavily glycosylated post-translationally, which protects the protein against host antibodies.⁴ GP1 contains 95 glycosylation sites and GP2 contains an additional 2 known glycosylation sites. In particular, the MLD is highly glycosylated, allowing it to mask cell-surface proteins like MHC-I (thus inhibiting CD8⁺ T cell responses).

GP₁,₂ can be cleaved away from the viral envelope by the host enzyme TACE (TNF-α converting enzyme), leading to shed GP.⁴,¹³ The shed GP can sequester antibodies, acting as an immunological decoy. Shed GP furthermore contributes to triggering various inflammatory cytokines.

GP₁,₂ makes up only about 20% of the total protein expressed from the GP gene.⁴,⁵ The other 80% consists of soluble secreted glycoprotein (sGP), Δ peptide, and small soluble secreted glycoprotein (ssGP). GP₁,₂ and ssGP are expressed only from mRNAs produced when the different polymerase stuttering events described in the genome section occur during transcription.

sGP is a secreted protein that may serve as an immunological decoy: like shed GP, it binds antibodies and thereby reduces the pool of antibodies available to bind the virus itself.⁴,⁵ It has 7 glycosylation sites. In addition, sGP might inhibit inflammatory cytokines and chemokines, further helping the virus evade immune responses.⁴

A small C-terminal region of sGP can be cleaved off to form Δ peptide.⁴,⁵ The Δ peptide also acts as an immunological decoy. Δ peptide can inhibit entry of ebolavirus into certain cells, preventing superinfection. In addition, Δ peptide may act as a viroporin, forming pores in mammalian cells.

ssGP shares its N-terminal 295 amino acids with GP₀ and sGP but has a distinct C-terminal region of 3 amino acids. It is secreted as a disulfide-bonded homodimer which undergoes glycosylation. Its function remains unknown.⁴

Life Cycle

Attachment

Ebolavirus begins its life cycle by leveraging GP₁,₂ to attach to host cell receptors.² There are three known mechanisms for attachment: (i) binding of C-type lectins, (ii) interaction with phosphatidylserine-binding receptors, and (iii) antibody-dependent enhancement.

C-type lectins bind glycans on GP’s MLD as well as on the glycan cap.² Such lectins are mainly expressed on antigen presenting cells (dendritic cells, monocytes, macrophages, etc.), which are primary target cells of ebolaviruses. However, these lectins are neither required nor sufficient for entry, so they act as accessory receptors.

During budding, ebolavirus incorporates the host scramblase XKR8 into its envelope (via interactions with GP₁,₂),² which randomly swaps phospholipids between inner and outer membrane leaflets. It should be noted that other scramblases like TMEM16F might also be used by the virus in the same way. The scramblases expose phosphatidylserine on the envelope’s surface (phosphatidylserine is normally found on the inner leaflet rather than the outer leaflet). As a result, phosphatidylserine receptors (TIM-1, TIM-4, Axl, and Mer) on the host cell membrane can bind the phosphatidylserine on the viral envelope. Since exposure of phosphatidylserine is normally used by the host to induce phagocytosis of apoptotic cell debris, the presence of phosphatidylserine on the ebolavirus envelope targets it for uptake into phagocytes. This is called “apoptotic mimicry”.

Antibody-dependent enhancement is when anti-ebolavirus antibodies bind to the virus and immune cell Fc receptors bind the antibodies.² Complement factor C1q can also bind the ebolavirus-antibody complexes and attach virions to immune cell surfaces. Though these pathways normally facilitate clearing of viruses by endocytic uptake and degradation, ebolavirus may leverage the process for infection instead.

Endosomal trafficking and fusion

After binding the cell surface, the ebolavirus is endocytosed via macropinocytosis, preferentially near lipid rafts in the host cell membrane.² Virions are trafficked from the early endosome to the late endosome. In the late endosome, the GP’s MLD and glycan cap are cleaved off by cathepsin B, cathepsin L, and/or other host cell proteases. This allows GP to bind the intracellular receptor NPC1 (Niemann–Pick C1), which is located in the endosomal membrane. The low endosomal pH also acidifies the virion interior, which triggers dissociation of the VP40 matrix from the envelope and makes the virion more flexible. This flexibility is thought to be an additional prerequisite for fusion. Finally, GP undergoes a conformational change that inserts GP2’s hydrophobic loop domain into the endosomal membrane, driving fusion of the viral envelope with the endosomal membrane (a process dependent on certain pH and Ca²⁺ levels). After fusion, the nucleocapsid is released into the cytosol.

Transcription

Condensed nucleocapsids in the cytosol then begin RNA synthesis.² To do this, they use a ribonucleoprotein complex consisting of L, NP, VP35, and VP30. Primary transcription relies on proteins from the incoming virion, while secondary transcription can also use proteins newly produced inside the host cell. L (along with its cofactor VP35) catalyzes RNA polymerization as well as capping and cap methylation of viral mRNAs, as discussed earlier.

Cytosolic transcription by the polymerase complex is assisted by VP30.² L starts at a site at the 3’ end of the genome and scans for the start signal of the first gene, the NP gene. Polyadenylation of each mRNA is triggered by polymerase slippage at the poly-uridine end signals discussed previously. L then continues scanning the genome for the next start signal. Note that scanning can occur in both directions along the genome. If the polymerase dissociates from the genome during scanning, it must return to the 3’ end to reinitiate. Because of this, genes close to the 3’ end of the genome are transcribed at higher levels than genes toward the 5’ end, a transcriptional gradient which might have functional significance.
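The stop-start logic above implies a simple geometric gradient if one assumes, purely for illustration, that the polymerase gets past every gene junction with the same probability p and otherwise has to return to the 3’ end. The sketch below uses the ebolavirus gene order 3’-NP-VP35-VP40-GP-VP30-VP24-L-5’; the value p = 0.7 is made up, and the model ignores the bidirectional scanning mentioned above.

```python
GENE_ORDER = ["NP", "VP35", "VP40", "GP", "VP30", "VP24", "L"]  # 3' -> 5'


def relative_transcript_levels(p_continue: float) -> dict[str, float]:
    """Expected transcripts per gene, per polymerase that initiates at the 3' end.

    Assumes (hypothetically) that every gene junction is crossed with the same
    probability p_continue; a polymerase that falls off makes no further mRNAs
    until a new initiation at the 3' end, so gene k is reached with p_continue ** k.
    """
    if not 0.0 < p_continue <= 1.0:
        raise ValueError("p_continue must be in (0, 1]")
    return {gene: p_continue ** k for k, gene in enumerate(GENE_ORDER)}


for gene, level in relative_transcript_levels(0.7).items():
    print(f"{gene:>5}: {level:.3f}")
```

With p = 1 every gene would be transcribed equally; any p below 1 produces levels that fall off monotonically from NP to L, which is the qualitative gradient the text describes.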

Primary transcription occurs within 1–2 hours after infection.² By ~10 hours post-infection, ebolavirus induces the formation of cytosolic inclusion bodies that facilitate secondary transcription and genome replication. These inclusion bodies are rich in L, NP, VP35, and VP30 as well as VP24, VP40, and certain host proteins such as CAD, STAU1, SMYD3, RBBP6, PEG10, hnRNP L, and RUVBL1. Ebolavirus inclusion bodies are membraneless phase-separated condensates driven by NP oligomerization interactions.

Viral mRNAs are exported from inclusion bodies by recruiting host NXF1 (nuclear RNA export factor 1).² NXF1 binds mRNAs within inclusion bodies and transports them out into the cytosol, where translation can occur. Hypusinated eIF5A (eukaryotic translation initiation factor 5A) has been shown to be required for viral mRNA translation. Hypusination is a post-translational modification in which a lysine in eIF5A is converted to a non-canonical amino acid called hypusine.¹⁴ In addition, ADAR1 (adenosine deaminase acting on RNA 1) edits 3’ untranslated regions within viral mRNAs, altering some of their negative regulatory elements so that they no longer downregulate translation.²

The transition from transcription to replication is thought to involve VP30 phosphorylation.² Non-phosphorylated VP30 associates more strongly with the L-VP35 complex than its phosphorylated form. Phosphorylation of VP30 (and its lower affinity for L-VP35 in this form) may shift the focus of L-VP35 towards replication. When VP30 is phosphorylated, it also interacts more strongly with NP, which helps VP30 incorporate itself into new virions. That said, this process is not fully understood. Cellular kinases (SRPK1 and SRPK2) and phosphatases (PP2A-B56 and PP1) are sequestered into viral inclusion bodies to facilitate VP30 phosphorylation and dephosphorylation.

Replication

Only L, VP35, and NP are needed for ebolavirus replication (unlike transcription, which also needs VP30).² For replication, the genome is copied into an antigenome. Replication is initiated at the first C in the genome, which is actually position 2 in the sequence. As a result, the copies initially lack the 3’-terminal nucleotide. To fix this, it is believed that the 3’ region of the RNA folds into a hairpin structure which back-primes addition of the missing nucleotide. Both the genome and the antigenome are encapsidated by NP.² During the replication process, VP35 acts as a chaperone for monomeric NP that has not yet bound RNA. VP24 may cause nucleocapsids to transition from a relaxed state to a more condensed state.

Assembly and budding

After release from inclusion bodies, nucleocapsids are transported along actin filaments to the plasma membrane, where budding takes place.² GP₁,₂ reaches the plasma membrane through the secretory pathway since it is a transmembrane protein. VP40 mediates budding by hijacking parts of the host’s ESCRT (endosomal sorting complex required for transport) pathway. VP40 has a motif which recruits Tsg101 (an ESCRT-I component) to lipid rafts in the membrane. VP40 also has a motif which interacts with ubiquitin protein ligases (NEDD4, ITCH, WWP1, and SMURF2). These ligases ubiquitinate VP40, which facilitates its activity in budding. VP40 may also induce membrane curvature via its oligomerization. VP40 has a basic patch in its C-terminal domain which interacts with phosphatidylserine and causes it to cluster. Indeed, proper matrix layer formation has been shown to require phosphatidylserine clustering, so this interaction likely has functional importance.

References

1. Ghosh, S., Saha, A., Samanta, S. & Saha, R. P. Genome structure and genetic diversity in the Ebola virus. Curr. Opin. Pharmacol. 60, 83–90 (2021).

2. Bodmer, B. S., Hoenen, T. & Wendt, L. Molecular insights into the Ebola virus life cycle. Nat. Microbiol. 9, 1417–1426 (2024).

3. Martin, B., Hoenen, T., Canard, B. & Decroly, E. Filovirus proteins for antiviral drug discovery: A structure/function analysis of surface glycoproteins and virus entry. Antiviral Res. 135, 1–14 (2016).

4. Jain, S., Martynova, E., Rizvanov, A., Khaiboullina, S. & Baranwal, M. Structural and Functional Aspects of Ebola Virus Proteins. Pathogens 10, 1330 (2021).

5. Lee, J. E. & Saphire, E. O. Ebolavirus Glycoprotein Structure and Mechanism of Entry. Future Virol. 4, 621–635 (2009).

6. Mehedi, M. et al. Ebola Virus RNA Editing Depends on the Primary Editing Site Sequence and an Upstream Secondary Structure. PLOS Pathog. 9, e1003677 (2013).

7. Fujita-Fujiharu, Y. et al. Structural basis for Ebola virus nucleocapsid assembly and function regulated by VP24. Nat. Commun. 16, 2171 (2025).

8. Sugita, Y., Matsunami, H., Kawaoka, Y., Noda, T. & Wolf, M. Cryo-EM structure of the Ebola virus nucleoprotein–RNA complex at 3.6 Å resolution. Nature 563, 137–140 (2018).

9. Mancias, J. D. & Goldberg, J. Structural basis of cargo membrane protein discrimination by the human COPII coat machinery. EMBO J. 27, 2918–2928 (2008).

10. Adu-Gyamfi, E. et al. The Ebola Virus Matrix Protein Penetrates into the Plasma Membrane. J. Biol. Chem. 288, 5779–5789 (2013).

11. Hoenen, T. et al. Oligomerization of Ebola Virus VP40 Is Essential for Particle Morphogenesis and Regulation of Viral Transcription. J. Virol. 84, 7053–7063 (2010).

12. Peng, W. et al. Glycan shield of the ebolavirus envelope glycoprotein GP. Commun. Biol. 5, 785 (2022).

13. Ning, Y.-J., Deng, F., Hu, Z. & Wang, H. The roles of ebolavirus glycoproteins in viral pathogenesis. Virol. Sin. 32, 3–15 (2017).

14. McKenna, S. The first step of hypusination. Nat. Chem. Biol. 19, 664 (2023).

Key facts about protein vault biology

June 3, 2025 (updated June 5, 2025), by logancollins

PDB 4V60: structure of a protein vault (MVP only)

Structure and organization:

Wild-type vaults consist of multiple copies of major vault protein (MVP), VPARP, and TEP1 proteins as well as small untranslated RNAs called vRNAs (Pupols, 2011).

Vaults are ~13 MDa in mass if VPARP, TEP1, and vRNAs are included along with the 78 MVPs (Galbiati et al., 2018). Each of the 78 MVP copies is ~97 kDa (Champion et al., 2009), so the hollow MVP-only vault is ~7.6 MDa (78 × ~97 kDa).
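The mass figures above can be cross-checked with a line of arithmetic. The sketch below multiplies the MVP copy number by the approximate MVP mass and subtracts the result from the ~13 MDa full-vault figure. Because both inputs are rounded approximations, the leftover share attributed to VPARP, TEP1, and vRNAs is only a rough estimate.

```python
def vault_mass_budget(mvp_copies: int = 78,
                      mvp_kda: float = 97.0,
                      full_vault_mda: float = 13.0) -> tuple[float, float]:
    """Return (MVP shell mass, remaining mass) in MDa from approximate inputs."""
    shell_mda = mvp_copies * mvp_kda / 1000.0    # kDa -> MDa
    remainder_mda = full_vault_mda - shell_mda   # VPARP + TEP1 + vRNAs, roughly
    return shell_mda, remainder_mda


shell, rest = vault_mass_budget()
print(f"MVP shell ~{shell:.2f} MDa; other components ~{rest:.2f} MDa")
```

With the default inputs this gives about 7.57 MDa for the MVP shell, leaving roughly 5.4 MDa for the other components.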

VPARP catalyzes poly-ADP-ribosylation and has been found to ribosylate both itself and MVP, but its physiological function is unknown. It contains the INT domain, which binds to the interior waist region of the vault (Pupols, 2011).

TEP1 (telomerase-associated protein 1) is found both associated with nuclear telomerase complexes and with cytosolic vaults. Its function is unknown (Pupols, 2011).

vRNAs are untranslated RNA polymerase III transcripts ranging from 80-150 nucleotides in length. Their function is unknown (Pupols, 2011).

Physiology and dynamics:

Vaults are found inside every cell in the human body at copy numbers of ~10⁴ vaults per cell in most cells but ~10⁵ in certain cell types (e.g. some immune cells) (Travis, 2024). They are especially abundant in tissues exposed to external stressors such as bronchus, renal proximal tubules, digestive tract, macrophages, and dendritic cells (Pupols, 2011). In embryonic tissues, vaults sometimes can occur at an impressive ~10⁷ copies per cell (Suprenant, 2002).

Vaults do not self-assemble from free MVP. Instead, they are assembled co-translationally on eukaryotic polyribosomes (Mrazek et al., 2014). Two copies of MVP, oriented in opposite directions, are first translated by two ribosomes. The N-terminal regions of these MVPs dimerize. As more ribosome pairs arrive in line, more MVP dimers are made. Lateral interactions between the dimers begin assembling the wall of the vault. In total, 39 MVP dimers are translated on the polyribosome, leading to the formation of the final barrel-shaped vault structure.

Vaults have consistently been found as contaminants in purified extracellular vesicle (EV) preparations. There is evidence that vaults associate with the outside of EVs and are not protected beneath vesicular membranes (Liu et al., 2023). However, vaults have also been found to be released from cells in an EV-independent fashion wherein they are not bound to the outside of the EVs (Jeppesen et al., 2019). As such, they might be co-released alongside EVs.

Vaults frequently exchange halves when in solution, indicating their dynamic structural nature (Yang et al., 2010). Indeed, vaults have been proposed to experience a structural “breathing” motion.

Vaults disassemble at low pH, through mechanisms of half vault separation (Goldsmith et al., 2007) and/or weakening of the lateral associations between MVP copies (Llauró et al., 2016).

Vaults are cytosolic particles, but small amounts of them associate with the nuclear membrane at nuclear pore complexes and in some cell types (e.g. U373 glioblastoma cell line) about 5% of MVP is localized to the nucleus (Slesina et al., 2005).

As MVP is a self-protein, it is usually invisible to the immune system (Champion et al., 2009). Indeed, repeated intranasal administration of vaults carrying non-immunogenic proteins like mCherry-INT does not induce anti-vault antibodies, even when MVP is fused to Z peptide (an Fc-binding peptide often used to conjugate antibodies for targeting vaults to specific cell types). That said, the immune system’s tolerance to vaults can be broken if the vaults carry highly immunogenic proteins such as chlamydial major outer membrane protein fused to INT (MOMP-INT). Repeated administration of vaults carrying MOMP-INT has been shown to induce anti-MVP antibodies.

Hints at function:

MVP knockout mice are viable but have lower survival rates when challenged with Pseudomonas aeruginosa (Frascotti et al., 2021).

Vault overexpression is found in multidrug resistant cancers, but so far this seems more of a correlation than a causation. Experimentally, overexpression of vaults alone does not produce the multidrug resistant phenotype (Frascotti et al., 2021).

In neurons, vaults localize at neurite tips and along axonal and dendritic microtubule networks. Vaults can co-precipitate with cytoplasmic RNAs that are known to be translated in response to synaptic activity (Frascotti et al., 2021).

Vaults are highly conserved (Daly et al., 2013; Slinning et al., 2024). They occur in mammals, amphibians, birds, fish, sea urchins, slime molds, and more. That said, insects, plants, and fungi do not have vaults.

References:

Champion, C. I., Kickhoefer, V. A., Liu, G., Moniz, R. J., Freed, A. S., Bergmann, L. L., Vaccari, D., Raval-Fernandes, S., Chan, A. M., Rome, L. H., & Kelly, K. A. (2009). A Vault Nanoparticle Vaccine Induces Protective Mucosal Immunity. PLOS ONE, 4(4), e5409. https://doi.org/10.1371/journal.pone.0005409

Daly, T. K., Sutherland-Smith, A. J., & Penny, D. (2013). In Silico Resurrection of the Major Vault Protein Suggests It Is Ancestral in Modern Eukaryotes. Genome Biology and Evolution, 5(8), 1567–1583. https://doi.org/10.1093/gbe/evt113

Frascotti, G., Galbiati, E., Mazzucchelli, M., Pozzi, M., Salvioni, L., Vertemara, J., & Tortora, P. (2021). The Vault Nanoparticle: A Gigantic Ribonucleoprotein Assembly Involved in Diverse Physiological and Pathological Phenomena and an Ideal Nanovector for Drug Delivery and Therapy. Cancers, 13(4), 707. https://doi.org/10.3390/cancers13040707

Galbiati, E., Avvakumova, S., La Rocca, A., Pozzi, M., Messali, S., Magnaghi, P., Colombo, M., Prosperi, D., & Tortora, P. (2018). A fast and straightforward procedure for vault nanoparticle purification and the characterization of its endocytic uptake. Biochimica et Biophysica Acta (BBA) – General Subjects, 1862(10), 2254–2260. https://doi.org/10.1016/j.bbagen.2018.07.018

Goldsmith, L. E., Yu, M., Rome, L. H., & Monbouquette, H. G. (2007). Vault Nanocapsule Dissociation into Halves Triggered at Low pH. Biochemistry, 46(10), 2865–2875. https://doi.org/10.1021/bi0606243

Jeppesen, D. K., Fenix, A. M., Franklin, J. L., Higginbotham, J. N., Zhang, Q., Zimmerman, L. J., Liebler, D. C., Ping, J., Liu, Q., Evans, R., Fissell, W. H., Patton, J. G., Rome, L. H., Burnette, D. T., & Coffey, R. J. (2019). Reassessment of Exosome Composition. Cell, 177(2), 428-445.e18. https://doi.org/10.1016/j.cell.2019.02.029

Liu, X., Nizamudeen, Z., Hill, C. J., Parmenter, C., Arkill, K. P., Lambert, D. W., & Hunt, S. (2023). Vault particles are common contaminants of extracellular vesicle preparations. BioRxiv, 2023.11.09.566362. https://doi.org/10.1101/2023.11.09.566362

Llauró, A., Guerra, P., Kant, R., Bothner, B., Verdaguer, N., & de Pablo, P. J. (2016). Decrease in pH destabilizes individual vault nanocages by weakening the inter-protein lateral interaction. Scientific Reports, 6(1), 34143. https://doi.org/10.1038/srep34143

Mrazek, J., Toso, D., Ryazantsev, S., Zhang, X., Zhou, Z. H., Fernandez, B. C., Kickhoefer, V. A., & Rome, L. H. (2014). Polyribosomes Are Molecular 3D Nanoprinters That Orchestrate the Assembly of Vault Particles. ACS Nano, 8(11), 11552–11559. https://doi.org/10.1021/nn504778h

Pupols, M. D. (2011). Packaging RNA into Vault Nanoparticles to Develop a Novel Delivery System for RNA Therapeutics. ProQuest Dissertations and Theses. University of California, Los Angeles.

Slesina, M., Inman, E. M., Rome, L. H., & Volknandt, W. (2005). Nuclear localization of the major vault protein in U373 cells. Cell and Tissue Research, 321(1), 97–104. https://doi.org/10.1007/s00441-005-1086-8

Slinning, M. S., Nthiga, T. M., Eichner, C., Khadija, S., Rome, L. H., Nilsen, F., & Dondrup, M. (2024). Major vault protein is part of an extracellular cement material in the Atlantic salmon louse (Lepeophtheirus salmonis). Scientific Reports, 14(1), 15240. https://doi.org/10.1038/s41598-024-65683-0

Suprenant, K. A. (2002). Vault Ribonucleoprotein Particles: Sarcophagi, Gondolas, or Safety Deposit Boxes? Biochemistry, 41(49), 14447–14454. https://doi.org/10.1021/bi026747e

Travis, J. (2024). The vault guy. Science, 384(6700), 1058–1062.

Yang, J., Kickhoefer, V. A., Ng, B. C., Gopal, A., Bentolila, L. A., John, S., Tolbert, S. H., & Rome, L. H. (2010). Vaults Are Dynamically Unconstrained Cytoplasmic Nanoparticles Capable of Half Vault Exchange. ACS Nano, 4(12), 7229–7240. https://doi.org/10.1021/nn102051r

The Virus Zoo: A Primer on Molecular Virology

July 27, 2022 (updated January 29, 2023), by logancollins

Click here for a PDF version of the virus zoo

Human Immunodeficiency Virus (HIV)

Genome and Structure:

HIV’s genome is a 9.7 kb linear positive-sense ssRNA.¹ There is an m⁷G cap (specifically the standard eukaryotic m⁷GpppG added by the host’s enzymes) at the 5’ end of the genome and a poly-A tail at the 3’ end.² The genome also has a 5’-LTR and a 3’-LTR (long terminal repeats). These aid its integration into the host genome after reverse transcription, facilitate HIV genetic regulation, and play a variety of other important functional roles. In particular, the 5’-LTR of the integrated provirus contains the U3 region, which houses the HIV promoter.^3,4

HIV’s genome encodes three polyproteins (as well as several accessory proteins). The Gag polyprotein contains the HIV structural proteins. The Gag-Pol polyprotein contains (within its Pol component) the enzymes viral protease, reverse transcriptase, and integrase. The Gag-Pol polyprotein is produced via a –1 ribosomal frameshift near the end of Gag translation. Because this frameshift is inefficient, Gag-Pol is synthesized about 20-fold less frequently than Gag.⁵ The frameshift depends upon a slippery heptanucleotide sequence, UUUUUUA, and a downstream RNA secondary structure called the frameshift stimulatory signal (FSS).⁶ The FSS controls the efficiency of the frameshift.
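To make the –1 frameshift concrete, here is a minimal Python sketch. It compares the codons read in the original (Gag) frame with those read by a ribosome that slips back one nucleotide at UUUUUUA. The toy mRNA, the helper names, and the assumption that slippage happens at the first correctly aligned heptamer are all hypothetical simplifications. The sketch ignores the FSS, which tunes how often slippage actually occurs. It also converts the roughly 20:1 Gag to Gag-Pol ratio into an approximate frameshift frequency.

```python
from __future__ import annotations

SLIPPERY = "UUUUUUA"  # HIV-1 slippery heptamer, aligned as X XXY YYZ in the Gag frame


def read_codons(seq: str, start: int) -> list[str]:
    """Split seq into consecutive codons beginning at index start."""
    return [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]


def minus_one_frameshift(mrna: str, orf_start: int = 0) -> list[str] | None:
    """Codons decoded by a ribosome that slips -1 at the slippery heptamer (toy model).

    Decoding follows the 0 frame up to and including the XXY codon. The tRNAs then
    slip back one nucleotide, and decoding continues in the -1 frame (YYY, then Z..).
    Which amino acid ends up at the slip site is model-dependent, so only codon
    positions are tracked here.
    """
    site = mrna.find(SLIPPERY, orf_start)
    if site == -1 or (site + 1 - orf_start) % 3 != 0:
        return None  # no heptamer, or it is not aligned as X XXY YYZ
    zero_frame_part = read_codons(mrna[:site + 4], orf_start)  # ends with XXY
    minus_one_part = read_codons(mrna, site + 3)               # starts with YYY
    return zero_frame_part + minus_one_part


def frameshift_frequency(gag_to_gagpol: float) -> float:
    """Fraction of ribosomes that shift, given the Gag:Gag-Pol product ratio."""
    return 1 / (gag_to_gagpol + 1)


# Hypothetical toy transcript: the 0 frame hits a UGA stop, the -1 frame reads on.
toy = "AUGGCUUUUUUAGGGAAGAUCUGA"
print(read_codons(toy, 0))
print(minus_one_frameshift(toy))
print(round(frameshift_frequency(20), 3))
```

In the 0 frame the toy ORF stops at UGA. The slipped ribosome reads through in the –1 frame, mirroring how Gag terminates while Gag-Pol continues into Pol. A 20:1 product ratio corresponds to roughly 5% of ribosomes shifting.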

The HIV RNA genome undergoes alternative splicing to produce the rest of the viral proteins. One splicing event produces a bicistronic RNA that separately encodes the Vpu protein and the Env protein (also called gp160).^6–8 A mechanism called ribosome shunting is used to transition from Vpu’s open reading frame to Env’s open reading frame. Env is the precursor of the gp120 and gp41 proteins. It is cleaved into these two subunits by a host furin protease in the Golgi apparatus.⁹ Env is also heavily glycosylated, which helps HIV evade the immune system. Several other complex splicing events lead to the production of RNAs encoding Tat, Rev, Nef, Vif, and Vpr.
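Env’s glycans are mostly N-linked. They are attached to asparagines that sit in an N-X-S/T sequon, where X is any amino acid except proline. The Python sketch below lists such potential N-linked glycosylation sites in a protein sequence. The peptide is a made-up example, not an Env sequence. A sequon is necessary but not sufficient for a glycan to actually be attached.

```python
import re

# N-X-S/T with X != P; the zero-width lookahead lets overlapping sequons both be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def potential_glycosites(protein: str) -> list[int]:
    """1-based positions of asparagines that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


print(potential_glycosites("MNGTANPSQNVSNNTT"))
```

The NPS at positions 6 to 8 is skipped because of the proline. The adjacent NNT and NTT sequons are both reported.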

HIV viral protease cleaves the Gag polyprotein to produce structural proteins. These include the capsid protein CA (also called p24), the matrix protein MA (also called p17), the nucleocapsid protein NC (also called p7), and the p6 peptide.¹⁰ The HIV core capsid is shaped like a truncated cone and consists of about 1500 CA monomers. Most of the CA proteins assemble into hexamers, but exactly 12 pentamers are also present. The pentamers give the core capsid its conical morphology by providing extra curvature at its wide and narrow ends. Each core capsid contains two copies of the HIV genomic RNA, complexed with NC protein. Reverse transcriptase, integrase, and viral accessory proteins are also held within the core capsid. HIV’s core capsid is packaged into a lipid envelope that bears trimeric spikes of gp41-gp120 heterodimers. The MA protein forms a layer between the core capsid and the envelope.
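The number of pentamers in a closed shell like the HIV core is not arbitrary. Consider a closed cage of hexagons and pentagons in which three faces meet at every vertex. For such a cage, Euler’s formula V - E + F = 2 forces the number of pentagons to be exactly 12, regardless of how many hexagons there are. The Python sketch below checks this with exact arithmetic and then estimates the CA copy number. The count of 240 hexamers is an illustrative assumption chosen to match the ~1500 figure, since real cores vary in size.

```python
from fractions import Fraction


def euler_characteristic(pentagons: int, hexagons: int) -> Fraction:
    """V - E + F for a closed cage of pentagons and hexagons with 3-valent vertices."""
    edge_ends = 5 * pentagons + 6 * hexagons  # each face contributes its sides
    vertices = Fraction(edge_ends, 3)  # every vertex is shared by 3 faces
    edges = Fraction(edge_ends, 2)     # every edge is shared by 2 faces
    faces = pentagons + hexagons
    return vertices - edges + faces    # simplifies algebraically to pentagons / 6


def ca_monomers(hexamers: int, pentamers: int = 12) -> int:
    """CA copies in a core with the given numbers of hexamers and pentamers."""
    return 6 * hexamers + 5 * pentamers


# Only P = 12 closes the surface (Euler characteristic 2), whatever the hexagon count.
print([p for p in range(20) if euler_characteristic(p, 240) == 2])
print(ca_monomers(240))  # hypothetical 240 hexamers + 12 pentamers
```

Because the expression reduces to P/6, adding hexamers changes the size and shape of the core but never the pentamer count.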

Accessory proteins Vpu, Tat, Rev, Nef, Vif, and Vpr carry out a variety of functions. Vpu induces degradation of CD4 proteins within the endoplasmic reticulum of host CD4+ T cells. It does this by using its cytosolic domain as a molecular adaptor between CD4 and a ubiquitin ligase, which subsequently triggers proteasomal degradation of the CD4.¹¹ This is thought to prevent HIV superinfection, in which another HIV virion infects an already-infected cell and competes with the resident virus. This is an example of competition between viruses.¹² Vpu also enhances release of HIV virions from infected cells by using its cytosolic domain to inhibit a host protein called tetherin (also known as BST-2).¹¹ Without Vpu, tetherin would tether budding virions to the cell surface and to one another, impeding release.

Tat, also called the viral transactivator protein, is necessary for efficient transcriptional elongation of the HIV genome after integration into the host DNA.¹³ Tat binds the viral transactivation response element (TAR), a structured RNA motif present at the beginning of HIV transcripts. It then recruits the host positive transcription elongation factor b (P-TEFb). This allows P-TEFb to phosphorylate serine residues in the C-terminal domain of RNA polymerase II, stimulating transcriptional elongation. Tat also recruits several of the host cell’s histone acetyltransferases to the viral 5’-LTR to open the chromatin around the U3 promoter and related parts of the integrated HIV genome.^3,4 Finally, Tat is secreted from infected cells¹⁴ and acts as an autocrine and paracrine signaling molecule.⁴ It inhibits antigen-specific lymphocyte proliferation, stimulates expression of certain cytokines and cytokine receptors, modulates the activities of various host cell types, causes neurotoxicity in the brain, and more.

Rev facilitates nuclear export of the unspliced and singly spliced HIV RNAs by binding to a sequence located in the Env coding region called the Rev response element (RRE).¹³ The Rev protein forms a dimer upon binding to the RRE and acts as an adaptor, binding a host nuclear export factor called CRM1. Rev is also known to form higher-order oligomers via cooperative multimerization of the RNA-bound dimers.

Nef is a myristoylated protein that downregulates certain host T cell proteins and thereby increases virus production. Nef localizes to the cytosol and the plasma membrane. It inhibits CD4, Lck, CTLA-4, and Bad, among other targets.¹⁵ Downregulating CD4 contributes to the same prevention of superinfection achieved by Vpu’s degradation of CD4. Nef induces endocytosis of plasma membrane Lck and traffics it to recycling endosomes and the trans-Golgi network. At these intracellular compartments, Lck signals for Ras and Erk activation, which triggers IL-2 production. IL-2 causes T cells to grow and proliferate. This produces more T cells that HIV can infect and activates the machinery HIV needs to replicate within infected T cells. Nef also triggers lysosomal degradation of CTLA-4, which can act as an off-switch for T cells; if left active, CTLA-4 would inhibit HIV replication. Finally, Nef promotes phosphorylation of the pro-apoptotic Bad protein, inactivating it and thereby preventing apoptosis of the infected host cell.

Vif forms a complex with the host antiviral proteins APOBEC3F and APOBEC3G and induces their ubiquitination and subsequent degradation by the proteasome.¹⁶ It may also inhibit these proteins through other mechanisms. APOBEC3F and APOBEC3G are cytidine deaminases that convert cytidines to uridines in the negative-sense strand of HIV cDNA during reverse transcription. These edits appear as G-to-A hypermutations in the positive-sense strand, leading to weak or nonviable viruses.¹⁷ These proteins also interfere with reverse transcription by blocking tRNA^Lys3 from binding to the HIV RNA 5’UTR (tRNA^Lys3 normally acts as the primer that initiates reverse transcription of the HIV genome).¹⁸
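The strand logic of APOBEC3-driven hypermutation is easy to trip over, so here is a toy Python sketch. It deaminates cytidines on the minus-strand cDNA and then copies that strand back into plus-strand DNA, so the damage shows up as G-to-A changes. The context rule edits the 3’ C of a 5’-CC-3’ dinucleotide, which is the preference usually attributed to APOBEC3G. Both this rule and the assumption that every such site gets edited are simplifications, because real editing is partial and stochastic. Sequences use DNA letters, so a deaminated C (now U) is written as T.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def apobec3g_hypermutate(plus_strand: str) -> str:
    """Toy APOBEC3G editing of minus-strand cDNA, reported on the plus strand."""
    minus = reverse_complement(plus_strand)  # minus-strand cDNA, written 5'->3'
    edited = list(minus)
    for i in range(1, len(minus)):
        # Edit the 3' C of every 5'-CC-3' on the original (unedited) minus strand.
        if minus[i - 1] == "C" and minus[i] == "C":
            edited[i] = "T"  # C deaminated to U, written as T in DNA
    # Plus-strand synthesis copies the edited minus strand, so each edit reads as G->A.
    return reverse_complement("".join(edited))


print(apobec3g_hypermutate("TGGTGG"))  # -> TAGTAG
```

In the example, two TGG (tryptophan) codons become TAG stop codons. This is one way hypermutation can cripple viral genes.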

Vpr facilitates nuclear import of the HIV pre-integration complex.¹⁹ The pre-integration complex consists of viral cDNA and associated proteins (uncoating and reverse transcription have already occurred at this stage). Vpr binds the pre-integration complex and recruits host importins to enable nuclear import. It may further enhance nuclear import through interactions with some of the nuclear pore proteins. Vpr also has several other functions. It acts as a coactivator (along with other proteins) of the U3 promoter in the HIV 5’-LTR, might influence NF-κB regulation, may modulate apoptotic pathways, and arrests the cell cycle in the G₂ phase.

Life cycle:

CD4+ T cells represent the primary targets of HIV, though the virus can also infect other cell types such as macrophages and dendritic cells.²⁰ HIV infects CD4+ T cells by binding its gp120 glycoprotein to the CD4 receptor and then to either the CCR5 or the CXCR4 coreceptor.¹⁰ This triggers fusion of the viral envelope with the plasma membrane and allows the core capsid to enter the cytosol.

HIV’s core capsid is transported by motor proteins along microtubules to dock at nuclear pores. The nuclear pore complex has flexible cytosolic filaments composed primarily of the Nup358 protein, which interacts with the core capsid.²¹ These interactions guide the narrow end of the core capsid into the nuclear pore’s central channel. Next, the core capsid interacts with the central channel’s unstructured phenylalanine-glycine (FG) repeats, which exist in a hydrogel-like liquid phase. As the core capsid translocates through the central channel, it binds the Nup153 protein, a component of the nuclear pore complex’s basket. Finally, many copies of the nucleoplasmic CPSF6 protein coat the core capsid and escort it towards its genomic site of integration. Reverse transcription is thought to take place inside the core capsid. It begins in the cytoplasm and continues after the capsid reaches the nucleus, producing the viral cDNA.^21,22 Buildup of newly made cDNA within the core capsid likely generates pressure that helps rupture the capsid, facilitating uncoating.

Tetrameric HIV integrase binds both of the viral LTRs and facilitates integration of the cDNA into the host genome.²³ Though integration sites vary widely, they are not entirely random. Host chromatin structure and other factors influence where the viral cDNA integrates.²⁴ Transcription of HIV RNAs can then proceed from the U3 promoter with the aid of the Tat protein and host factors. As described earlier, a series of RNA splicing events produce the various RNAs necessary to synthesize all of the different HIV proteins and polyproteins.

Env protein is trafficked to the cell membrane through the secretory pathway. As it passes through the Golgi apparatus, a host furin protease cleaves it into gp120 and gp41.⁹ Gag and Gag-Pol polyproteins are expressed in the cytosol. Gag carries an amino-terminal myristoyl group, so it anchors to the cell membrane by inserting its myristate tail into the lipid bilayer.²⁵ Gag and a smaller number of Gag-Pol molecules accumulate on the inner membrane surface and incorporate gp41-gp120 complexes. NC domains in the Gag proteins bind and help package the two copies of HIV genomic RNA. The p6 region at the C-terminal end of Gag then recruits host ESCRT-I and ALIX proteins. These in turn recruit host ESCRT-III and VPS4 complexes, which drive budding and membrane scission and release the virus into the extracellular space. After this, the HIV viral protease (from within the Gag-Pol polyprotein) cleaves the Gag and Gag-Pol polyproteins into their constituent proteins, facilitating maturation of the released HIV particles.

SARS-CoV-2

Genome and Structure:

The SARS-CoV-2 genome consists of about 30 kb of linear positive-sense ssRNA. There is an m⁷G cap (specifically m⁷GpppA₁) at the 5’ end of the genome and a 30-60 nucleotide poly-A tail at the 3’ end. These protective structures minimize exonuclease degradation.²⁶ The genome also has a 5’ UTR and a 3’ UTR, which contain sequences that aid in transcriptional regulation and in packaging. The genome is directly translated into two partially overlapping polyproteins, pp1a (from ORF1a) and pp1ab (from ORF1a and ORF1b). ORF1b lies in the –1 reading frame relative to ORF1a, so pp1ab is produced only when ribosomes undergo a –1 ribosomal frameshift near the end of ORF1a. Within the polyproteins, two self-activating proteases perform cleavage events that generate the virus’s 16 non-structural proteins (nsps). These are papain-like protease (PLpro) and 3-chymotrypsin-like protease (3CLpro). 3CLpro is also known as the main protease, or M^pro. The coronavirus also produces 4 structural proteins. These are not translated until the viral replication complex has synthesized the corresponding subgenomic RNAs. To create these subgenomic RNAs, negative-sense RNA must first be made and then converted back to positive-sense RNA for translation. Genes encoding the structural proteins are located downstream of ORF1b.
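As a rough illustration of polyprotein processing, the Python sketch below scans a sequence for simplified consensus cleavage motifs. It uses LXGG↓ for PLpro and Leu-Gln↓(Ser/Ala/Gly) for 3CLpro. These motifs are textbook approximations of the proteases’ preferences rather than complete specificity rules, and the peptide is invented for the example.

```python
import re

# Simplified consensus motifs (assumptions); the cut falls at the end of each match.
PLPRO_SITE = re.compile(r"L.GG")        # LXGG | ...
MPRO_SITE = re.compile(r"LQ(?=[SAG])")  # LQ | S/A/G  (lookahead keeps P1' uncut)


def cleave(polyprotein: str) -> list[str]:
    """Split a polyprotein at every PLpro-like or 3CLpro-like motif."""
    cuts = sorted(
        {m.end() for m in PLPRO_SITE.finditer(polyprotein)}
        | {m.end() for m in MPRO_SITE.finditer(polyprotein)}
    )
    bounds = [0, *cuts, len(polyprotein)]
    return [polyprotein[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


print(cleave("MSAVLQSGFRNLKGGKTAVLQAGNE"))
```

Real site selection also depends on the surrounding residues and on the structural accessibility of each junction, which a motif scan cannot capture.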

SARS-CoV-2’s four structural proteins are the N, E, M, and S proteins. Many copies of the N (nucleocapsid) protein bind the RNA genome and organize it into a helical ribonucleocapsid complex. The complex is packaged into the viral envelope during coronavirus budding. Interactions between the N protein and the other structural proteins may facilitate this packaging process. The N protein also inhibits host immune responses by acting as a viral suppressor of RNA interference (RNAi) and by blocking signaling in interferon production pathways.²⁷

The transmembrane E (envelope) protein forms pentamers and plays a key but poorly understood role in the budding of viral envelopes into the endoplasmic reticulum–Golgi intermediate compartment (ERGIC).^28–30 Despite its importance in budding, mature viral particles incorporate relatively few E proteins into their envelopes. One of the post-translational modifications of the E protein is palmitoylation, which aids subcellular trafficking and interactions with membranes. E protein pentamers also act as ion channels that alter membrane potential.^31,32 This may lead to inflammasome activation, a contributing factor to cytokine storm induction.

The M (membrane) protein is the most abundant protein in the virion and drives global curvature in the ERGIC membrane to facilitate budding.^30,33 It forms transmembrane dimers that likely oligomerize to induce this curvature.³⁴ The M protein also has a cytosolic (and later intravirion) globular domain that likely interacts with the other structural proteins. M protein dimers also induce local curvature through preferential interactions with phosphatidylserine and phosphatidylinositol lipids.^29,30 M proteins help sequester S proteins into the envelopes of budding viruses.³⁵

The S (spike) protein of SARS-CoV-2 has been heavily studied due to its central roles in the infectivity and immunogenicity of the coronavirus. It forms a heavily glycosylated homotrimer that protrudes from the viral envelope. It binds the host’s ACE2 (angiotensin-converting enzyme 2) receptor and undergoes conformational changes to promote viral fusion.³⁶ The host’s furin protease cleaves the S protein into S1 and S2 subunits during viral maturation.^37,38 This enhances SARS-CoV-2 entry into lung cells and may partially explain the virus’s high transmissibility. The S1 fragment contains the receptor binding domain (RBD) and associated machinery, while the S2 fragment facilitates fusion. Prior to cellular infection, most S proteins exist in a closed prefusion conformation in which the RBD of each monomer is hidden most of the time.³⁹ After the S protein binds ACE2 during transient exposure of one of its RBDs, the other two RBDs quickly bind as well. This binding triggers a conformational change in the S protein that loosens the structure, unleashing the S2 fusion component and exposing another proteolytic cleavage site called S2’. Host transmembrane proteases such as TMPRSS2 cut at S2’. This fully activates the S2 fusion subunit and causes the dramatic elongation of the S protein into the postfusion conformation. The viral envelope then fuses with the host membrane, and the coronavirus’s RNA enters the cell.

The 16 nsps of SARS-CoV-2 play a variety of roles. For instance, nsp1 shuts down host cell translation by plugging the mRNA entry channel of the ribosome, inhibiting the host cell’s immune responses and maximizing viral production.^40,41 Viral proteins still undergo translation because a conserved sequence in the coronavirus RNA helps circumvent the blockage through a poorly understood mechanism. The nsp5 protein is the protease 3CLpro.⁴² The nsp3 protein contains several subcomponents, including the protease PLpro. The nsp12, nsp7, and nsp8 proteins come together to form the RNA-dependent RNA polymerase (RdRp) that replicates the viral genome.^42,43 The nsp2 protein is likely a topoisomerase which functions in RNA replication. The nsp4 and nsp6 proteins as well as certain subcomponents of nsp3 restructure intracellular host membranes into double-membrane vesicles (DMVs) which compartmentalize viral replication.⁴⁴

Beyond the 4 structural proteins and 16 nsps of SARS-CoV-2, the coronaviral genome also encodes some poorly understood accessory proteins including ORF3a, ORF3b, ORF6, ORF7a, ORF7b, ORF8 and ORF9b.⁴⁵ These accessory proteins are non-essential for replication in vitro, but they are thought to be required for the virus’s full degree of virulence in vivo.

Life cycle:

As mentioned, SARS-CoV-2 infects cells by first binding an S protein RBD to the ACE2 receptor. This triggers a conformational change that elongates the S protein and unleashes the S2 fusion fragment, facilitating fusion of the virion envelope with the host cell membrane.³⁹ Cleavage of the S2’ site by proteases like TMPRSS2 aids this change from the prefusion to the postfusion conformation. Alternatively, SARS-CoV-2 can enter the cell by binding to ACE2, undergoing endocytosis, and fusing with the endosomal membrane to release its genome. In this route, S2’ cleavage is performed by endosomal cathepsin proteases.⁴⁵ After the SARS-CoV-2 genome is released into the cytosol, the N protein dissociates. This allows translation of ORF1a and ORF1b, producing polyproteins that the PLpro and 3CLpro proteases cleave into mature proteins as discussed earlier.

The RdRp complex synthesizes negative-sense full genomic RNAs as well as negative-sense subgenomic RNAs. The latter are made by discontinuous transcription. While copying the genome from its 3’ end, the RdRp can pause at a transcription-regulating sequence (TRS) upstream of a structural or accessory gene and jump to a matching TRS in the 5’ leader. As a result, each subgenomic RNA joins a 3’ portion of the genome to the shared leader sequence.⁴⁶ The negative-sense RNAs are subsequently converted back into positive-sense full genomic RNAs and positive-sense subgenomic RNAs. The subgenomic RNAs are translated to make structural proteins and some accessory proteins.⁴⁵
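The Python sketch below is a toy model of discontinuous transcription. Each body transcription-regulating sequence (TRS-B) is fused to the leader that ends at the leader TRS (TRS-L). To stay short, it works directly in positive-sense terms and returns the subgenomic mRNAs that the negative-sense intermediates would template. It assumes that every occurrence of the core TRS motif ACGAAC (the core sequence reported for SARS-CoV-2) after the leader triggers a jump. In reality, TRS usage also depends on flanking sequence and is not all-or-nothing. The genome string is a labeled cartoon, not real sequence.

```python
TRS_CORE = "ACGAAC"  # SARS-CoV-2 core TRS motif


def subgenomic_mrnas(genome: str, trs: str = TRS_CORE) -> list[str]:
    """Leader-body fusions produced by discontinuous transcription (toy model)."""
    trs_leader = genome.find(trs)  # first TRS = TRS-L at the end of the 5' leader
    if trs_leader == -1:
        return []
    leader = genome[:trs_leader]
    products = []
    body = genome.find(trs, trs_leader + len(trs))  # first body TRS (TRS-B)
    while body != -1:
        # The RdRp copying from the genome's 3' end pauses at TRS-B and jumps to
        # TRS-L, so each product = leader + TRS + everything 3' of that TRS-B.
        products.append(leader + genome[body:])
        body = genome.find(trs, body + len(trs))
    return products


# Cartoon genome: lowercase labels stand in for real gene sequences (hypothetical).
toy_genome = ("leader" + TRS_CORE + "orf1ab" + TRS_CORE + "spike" + TRS_CORE
              + "envelope" + TRS_CORE + "membrane" + TRS_CORE + "nucleocapsid" + "AAAA")
for mrna in subgenomic_mrnas(toy_genome):
    print(mrna)
```

The output is a nested set of 3’ co-terminal RNAs that all begin with the same leader. ORF1ab never appears in them because it is translated from the genome itself.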

As described earlier, the nsp4, nsp6, and parts of nsp3 proteins remodel host endoplasmic reticulum (ER) to create DMVs.⁴⁵ These DMVs are the site of the coronaviral genomic replication and serve to shield the viral RNA and RdRp complex from cellular innate immune factors. DMVs cluster together and are continuous with the ER mostly through small tubular connections. After replication, the newly synthesized coronavirus RNAs undergo export into the cytosol through molecular pore complexes that span both membranes of the DMVs.⁴⁷ These molecular pore complexes are composed of nsp3 domains and possibly other viral and/or host proteins.

Newly replicated SARS-CoV-2 genomic RNAs complex with N proteins to form helical nucleocapsids. To enable packaging, the nucleocapsids interact with M protein cytosolic domains which protrude at the ERGIC.⁴⁸ M proteins, E proteins, and S proteins are all localized to the ERGIC membrane. The highly abundant M proteins induce curvature of the membrane to facilitate budding. As mentioned, E proteins also play essential roles in budding, but the mechanisms are poorly understood. Once the virions have budded into the ERGIC, they are shuttled through the Golgi via a series of vesicles and eventually secreted out of the cell.

Adeno-associated virus (AAV)

Genome and Structure:

AAV genomes are about 4.7 kb in length and are composed of ssDNA. Inverted terminal repeats (ITRs) form hairpin structures at both ends of the genome. These ITR structures are important for AAV genomic packaging and replication. The rep gene encodes four Rep proteins (Rep78, Rep68, Rep52, and Rep40) from overlapping transcripts generated by two promoters and alternative splicing.⁴⁹ These proteins facilitate replication of the viral genome. As a member of the genus Dependoparvovirus, AAV requires helper functions from adenovirus (or certain other viruses) to replicate.

AAV capsids are about 25 nm in diameter. The cap gene encodes VP1, VP2, and VP3, which are translated in the same reading frame from different start codons and therefore share a common C-terminal sequence.⁵⁰ The VP3 protein is the smallest capsid protein. VP2 is identical to VP3 except for an N-terminal extension containing a nuclear localization sequence. VP1 is identical to VP2 except for a further N-terminal extension encoding a phospholipase A2 (PLA2) domain that facilitates endosomal escape during infection. In the AAV capsid, VP1, VP2, and VP3 are present at a ratio of roughly 1:1:10. This ratio is the average of a distribution, not a fixed number.
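The AAV capsid is a T=1 icosahedron built from 60 VP subunits. A 1:1:10 ratio therefore corresponds to an average of about 5 VP1, 5 VP2, and 50 VP3 per capsid. The Python sketch below computes those expected counts and then draws random capsids. It assumes that each of the 60 positions is filled independently in proportion to the ratio. That independence assumption is a simplification, but it shows why the ratio describes a distribution rather than a fixed composition.

```python
from __future__ import annotations

import random
from collections import Counter

CAPSID_SUBUNITS = 60                        # T=1 icosahedral capsid
VP_RATIO = {"VP1": 1, "VP2": 1, "VP3": 10}  # approximate average stoichiometry


def expected_counts(n: int = CAPSID_SUBUNITS,
                    ratio: dict[str, int] = VP_RATIO) -> dict[str, float]:
    """Mean number of each VP per capsid implied by the ratio."""
    total = sum(ratio.values())
    return {vp: n * weight / total for vp, weight in ratio.items()}


def sample_capsid(rng: random.Random, n: int = CAPSID_SUBUNITS,
                  ratio: dict[str, int] = VP_RATIO) -> Counter:
    """Fill n positions independently, weighting each VP by its ratio (assumption)."""
    return Counter(rng.choices(list(ratio), weights=list(ratio.values()), k=n))


print(expected_counts())
rng = random.Random(0)  # fixed seed so the draws are reproducible
for _ in range(3):
    print(dict(sorted(sample_capsid(rng).items())))
```

Individual sampled capsids scatter around the 5:5:50 mean, which is the behavior the "average of a distribution" wording describes.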

Start codons in alternative reading frames of the cap gene direct translation of AAP (assembly-activating protein) and MAAP (membrane-associated accessory protein). These proteins facilitate capsid assembly and other aspects of the AAV life cycle.

Life cycle:

There are a variety of different AAV serotypes (AAV2, AAV6, AAV9, etc.) that selectively infect certain tissue types. AAVs bind to host cell receptors and are internalized by endocytosis. The particular receptors involved can vary depending on the AAV serotype, though some receptors are consistent across many serotypes. Internalization occurs most often via clathrin-coated pits, but some AAVs are internalized by other routes such as macropinocytosis or the CLIC/GEEC tubulovesicular pathway.⁵¹

After endocytosis, conformational changes in the AAV capsid expose the PLA2 domain of VP1, which facilitates endosomal escape. The AAV is then transported to the nucleus, mainly by motor proteins moving along the cytoskeleton. It enters through nuclear pores and finishes uncoating its genome there.

AAV genomes initiate replication by using the 3’ end of an ITR hairpin as a primer. This leads to a series of complex steps involving strand displacement and nicking.⁴⁹ In the end, new copies of the AAV genome are synthesized. The Rep proteins are key players in this process. It is important to realize that AAVs can only replicate in cells that have also been infected by adenovirus or similar helper viruses (this is why they are called “adeno-associated viruses”). Adenoviruses provide helper genes (e.g., E2A and E4, along with the noncoding VA RNA) whose products are vital for the successful completion of the AAV life cycle. After new AAV capsids have assembled from VP1, VP2, and VP3 and AAV genomes have been replicated, the ssDNA genomes are threaded into the capsids through pores at their five-fold vertices.

AAVs are not known to cause disease in humans, though a large fraction of people possess antibodies against at least some serotypes, indicating that exposure to them is fairly common.

Adenovirus

Genome and Structure:

Adenovirus genomes are about 36 kb in size and are composed of linear dsDNA. They possess inverted terminal repeats (ITRs) which help facilitate replication and other functions. These genomes contain a variety of transcriptional units which are expressed at different times during the virus’s life cycle.⁵² E1A, E1B, E2A, E2B, E3, and E4 transcriptional units are expressed early during cellular infection. Their proteins are involved in DNA replication, transcriptional regulation, and suppression of host immune responses. The L1, L2, L3, L4, and L5 transcriptional units are expressed later in the life cycle. Their products include most of the capsid proteins as well as other proteins involved in packaging and assembly. Each transcriptional unit can produce multiple mRNAs through the host’s alternative splicing machinery.

The capsid of the adenovirus is about 90 nm in diameter and consists of three major proteins (hexon, penton, and fiber proteins) as well as a variety of minor proteins and core proteins. Hexon trimer is the most abundant protein in the capsid, the pentameric pentons occur at the vertices, and trimeric fibers are positioned on top of the pentons.⁵³ The fibers point outwards from the capsid and end in knob domains which bind to cellular receptors. In Ad5, a commonly studied type of adenovirus, the fiber knob primarily binds to the coxsackievirus and adenovirus receptor (CAR). That said, it should be noted that Ad5’s fiber knob can also bind to alternative receptors such as vascular cell adhesion molecule 1 and heparan sulfate proteoglycans.

Minor capsid proteins include pIX, pIIIa, pVI, and pVIII. The pIX protein interlaces between hexons and helps stabilize the capsid. Though pIX is positioned in the crevices between the hexons, it is still exposed to the outside environment. By contrast, the pIIIa, pVI, and pVIII proteins bind to the inside of the capsid and contribute further structural stabilization. When the adenovirus is inside of the acidic endosome during infection, conformational changes in the capsid release the pVI protein, which facilitates endosomal escape through membrane lytic activity.

Adenovirus core proteins include pV, pVII, protein μ (also known as pX), adenovirus proteinase (AVP), pIVa2, and terminal protein (TP).⁵⁴ The pVII protein has many positively charged arginine residues and so functions to condense the viral DNA. The pV protein bridges the core with the capsid through interactions with pVII and with pVI. AVP cleaves various adenoviral proteins (pIIIa, TP, pVI, pVII, pVIII, pX) to convert them to their mature forms.⁵⁵ The pIVa2 and pX proteins interact with the viral DNA and may play roles in packaging or replication. TP is covalently attached to the 5’ ends of the genome and is essential for localizing the viral DNA in the nucleus and for viral replication.

Life Cycle:

Adenovirus infects cells by binding its fiber knob to cellular receptors such as CAR (in the case of Ad5). The penton base then binds certain αv integrins, positioning the viral capsid for endocytosis.⁵⁶ When the endosome acidifies, the adenovirus capsid partially disassembles: fibers and pentons fall away, and pVI is released.⁵⁷ The pVI protein’s membrane lytic activity facilitates endosomal escape. Partially disassembled capsids then undergo dynein-mediated transport along microtubules and dock at the entrance to nuclear pores. The capsids disassemble further and release their DNA through the nuclear pore. This DNA remains complexed with pVII after it enters the nucleus.

Adenoviral transcription begins with the E1A transcriptional unit, whose proteins then activate expression of the other early units.⁵⁸ The E2 proteins mediate viral DNA replication, the E3 proteins help the virus evade host immune responses, and the E4 proteins regulate transcription and other processes. This cascade leads to expression of the L1, L2, L3, L4, and L5 transcriptional units, which mainly synthesize viral structural proteins and facilitate capsid assembly.
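The temporal program can be caricatured as a small dependency graph in which a transcriptional unit switches on once all of its activators are on. The Python sketch below uses hypothetical dependencies loosely based on the cascade described above. E1A comes first and then the other early units. The late units are assumed to wait for E2-mediated replication. This is a cartoon of the ordering, not a quantitative model of adenoviral gene regulation.

```python
from __future__ import annotations

# Hypothetical, simplified activation dependencies for adenoviral transcriptional units.
ACTIVATORS: dict[str, set[str]] = {
    "E1A": set(),       # transcribed first, using host factors
    "E1B": {"E1A"},
    "E2": {"E1A"},
    "E3": {"E1A"},
    "E4": {"E1A"},
    "L1-L5": {"E2"},    # assumed to follow E2-mediated DNA replication
}


def expression_waves(activators: dict[str, set[str]]) -> list[set[str]]:
    """Group units into successive waves; a unit fires once all its activators are on."""
    on: set[str] = set()
    waves: list[set[str]] = []
    while len(on) < len(activators):
        wave = {unit for unit, deps in activators.items()
                if unit not in on and deps <= on}
        if not wave:
            raise ValueError("circular or unsatisfiable dependencies")
        waves.append(wave)
        on |= wave
    return waves


for step, wave in enumerate(expression_waves(ACTIVATORS), start=1):
    print(step, sorted(wave))
```

Grouping by waves rather than producing a single linear order reflects that the early units after E1A are expressed in overlapping fashion rather than one at a time.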

In the nucleus, adenovirus genomes replicate within dense protein complexes that appear as spots under fluorescence microscopy. Replication begins at the ITRs and is primed by the precursor form of TP (pTP).⁵⁹ Several other viral and host proteins also aid the initiation of replication. Nontemplate strands are displaced during replication but may reanneal and later act as templates. Adenovirus DNA binding protein and adenovirus DNA polymerase play important roles in replication. The pTP that remains attached to new genomes is later cleaved by AVP into its mature form during virion maturation.

The adenoviral capsid assembly and maturation process occurs in the nucleus.⁵⁸ Once enough assembled adenoviruses have accumulated, they rupture the nuclear membrane using the adenovirus death protein (ADP) and subsequently lyse the cell, releasing adenoviral particles.

Herpes Simplex Virus 1 (HSV-1)

Genome and Structure:

HSV-1 genomes are about 150 kb in size and are composed of linear dsDNA. These genomes include a unique long (U_L) region and a unique short (U_S) region.⁶⁰ The U_L and U_S regions are each flanked by their own inverted repeats. The terminal inverted repeats are called TR_L and TR_S, while the internal inverted repeats are called IR_L and IR_S. HSV-1 contains approximately 80 genes, though the complexity of its genomic organization makes an exact count difficult to obtain. HSV-1 genes are expressed in a temporal cascade of immediate-early (α), early (β), and late (γ) genes. The immediate-early genes activate and regulate transcription of the early and late genes. Early genes facilitate genome replication, and late genes mostly encode structural proteins.

HSV-1 virions range from about 155 nm to 240 nm in diameter.⁶¹ Each virion includes an inner icosahedral capsid (125 nm in diameter) surrounded by tegument proteins, which are in turn enclosed by a lipid envelope containing glycoproteins.

HSV-1’s icosahedral capsid consists of a variety of proteins. Some of the most important capsid proteins are encoded by the UL19, UL18, UL38, UL6, UL17, and UL25 genes.⁶² The UL19 gene encodes the major capsid protein VP5, which forms pentamers and hexamers for the capsid. These VP5 pentamers and hexamers are glued together by triplexes consisting of two copies of VP23 (encoded by UL18) and one copy of VP19C (encoded by UL38).⁶³ The UL6 gene encodes the protein that makes up the portal complex, a structure used by HSV-1 to release its DNA during infection. Each HSV-1 capsid has a single portal (composed of 12 copies of the portal protein) located at one of the vertices. UL17 and UL25 encode additional structural proteins that stabilize the capsid by binding on top of the other vertices. These two proteins also serve as a bridge between the capsid core and the tegument proteins.
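The copy numbers of these capsid proteins follow from icosahedral geometry. HSV-1’s capsid has triangulation number T = 16, which gives 10(T - 1) = 150 hexons plus 12 five-fold vertices. One vertex is occupied by the portal, leaving 11 pentons, and the triplexes sit at the 20T local three-fold positions. The short Python sketch below does that arithmetic. The T-number is background knowledge added here rather than something stated in the text.

```python
def capsid_inventory(t_number: int, portal_vertices: int = 1) -> dict[str, int]:
    """Copy numbers for an icosahedral herpesvirus-style capsid (assumed layout)."""
    hexons = 10 * (t_number - 1)        # six-fold capsomers
    pentons = 12 - portal_vertices      # five-fold vertices not taken by the portal
    triplexes = 20 * t_number           # one per local three-fold position
    return {
        "hexons": hexons,
        "pentons": pentons,
        "VP5": 6 * hexons + 5 * pentons,  # major capsid protein
        "triplexes": triplexes,
        "VP23": 2 * triplexes,            # two VP23 per triplex
        "VP19C": triplexes,               # one VP19C per triplex
    }


print(capsid_inventory(16))
```

With T = 16 this gives 955 copies of VP5 and 320 triplexes (640 VP23 and 320 VP19C).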

The tegument of HSV-1 contains dozens of distinct proteins. Some examples include pUL36, pUL37, pUL7, and pUL51 proteins. The major tegument proteins are pUL36 and pUL37. The pUL36 protein binds on top of the UL17-UL25 complexes at the capsid’s vertices.⁶⁴ The pUL37 protein subsequently associates with pUL36. The pUL51 protein associates with cytoplasmic membranes in infected cells and recruits the pUL7 protein.⁶⁵ This pUL51-pUL7 interaction is important for HSV-1 assembly. HSV-1 has many more tegument proteins which play various functional roles.

HSV-1’s envelope contains up to 16 unique glycoproteins. Four of these glycoproteins (gB, gD, gH, and gL) are essential for viral entry into cells.⁶⁶ The gD glycoprotein first binds to one of its cellular receptors (nectin-1; herpesvirus entry mediator, HVEM; or 3-O-sulfated heparan sulfate). This binding event triggers a conformational change in gD that allows it to activate the gH/gL heterodimer. Next, gH/gL activates gB, which induces fusion of HSV-1’s envelope with the cell membrane. The remaining 12 envelope glycoproteins are poorly understood, but they are thought to also influence cellular tropism and entry.

Life cycle:

After binding to cellular receptors via its glycoproteins, HSV-1 induces fusion of its envelope with the host cell membrane.⁶⁷ The capsid is trafficked to nuclear pores via microtubules. Since the capsid is too large to pass through a nuclear pore directly, the virus instead ejects its DNA through the pore via the portal complex.⁶⁸

HSV-1 replicates its genome and assembles its capsids in the nucleus. However, the assembled capsids are again too large to exit the nucleus through nuclear pores. To overcome this issue, HSV-1 first buds through the inner nuclear membrane into the perinuclear space (the space between the two nuclear membranes), acquiring a primary envelope.⁶⁷ This process is driven by a pair of proteins (pUL34 and pUL31) that together form the nuclear egress complex. Next, the primary envelope fuses with the outer nuclear membrane, releasing the assembled capsids into the cytosol.

To acquire its final envelope, the HSV-1 capsid likely buds into the trans-Golgi network or into certain tubular vesicular organelles.⁶⁹ These membranes contain the viral envelope glycoproteins, which reach them through the secretory pathway. One player is the pUL51 tegument protein, which is already associated with the membrane into which the virus buds. The interaction between pUL51 and pUL7 helps recruit the capsid to that membrane. (Capsid envelopment is also coupled in many other ways to formation of the outer tegument.) The enveloped virion is then carried through the secretory system inside a transport vesicle that fuses with the plasma membrane, releasing the completed virion into the extracellular environment.

In humans, HSV-1 first infects epithelial cells and produces viral particles.⁷⁰ It subsequently enters the termini of sensory neurons and undergoes retrograde transport to the neuronal cell bodies in sensory ganglia (such as the trigeminal ganglia). It then persists there in a latent state. During periods of stress in the host, the virus can reactivate and undergo anterograde transport to infect epithelial cells once again.

References

1. Wain-Hobson, S., Sonigo, P., Danos, O., Cole, S. & Alizon, M. Nucleotide sequence of the AIDS virus, LAV. Cell 40, 9–17 (1985).

2. Wilusz, J. Putting an ‘End’ to HIV mRNAs: capping and polyadenylation as potential therapeutic targets. AIDS Res. Ther. 10, 31 (2013).

3. Marcello, A., Zoppé, M. & Giacca, M. Multiple Modes of Transcriptional Regulation by the HIV-1 Tat Transactivator. IUBMB Life 51, 175–181 (2001).

4. Brigati, C., Giacca, M., Noonan, D. M. & Albini, A. HIV Tat, its TARgets and the control of viral gene expression. FEMS Microbiol. Lett. 220, 57–65 (2003).

5. Harrison, J. J. E. K. et al. Cryo-EM structure of the HIV-1 Pol polyprotein provides insights into virion maturation. Sci. Adv. 8, eabn9874 (2022).

6. Guerrero, S. et al. HIV-1 Replication and the Cellular Eukaryotic Translation Apparatus. Viruses vol. 7 199–218 at https://doi.org/10.3390/v7010199 (2015).

7. Feinberg, M. B. & Greene, W. C. Molecular insights into human immunodeficiency virus type 1 pathogenesis. Curr. Opin. Immunol. 4, 466–474 (1992).

8. Sertznig, H., Hillebrand, F., Erkelenz, S., Schaal, H. & Widera, M. Behind the scenes of HIV-1 replication: Alternative splicing as the dependency factor on the quiet. Virology 516, 176–188 (2018).

9. Behrens, A.-J. & Crispin, M. Structural principles controlling HIV envelope glycosylation. Curr. Opin. Struct. Biol. 44, 125–133 (2017).

10. Campbell, E. M. & Hope, T. J. HIV-1 capsid: the multifaceted key player in HIV-1 infection. Nat. Rev. Microbiol. 13, 471–483 (2015).

11. Andrew, A. & Strebel, K. HIV-1 Vpu targets cell surface markers CD4 and BST-2 through distinct mechanisms. Mol. Aspects Med. 31, 407–417 (2010).

12. Bour, S., Geleziunas, R. & Wainberg, M. A. The human immunodeficiency virus type 1 (HIV-1) CD4 receptor and its central role in promotion of HIV-1 infection. Microbiol. Rev. 59, 63–93 (1995).

13. Engelman, A. & Cherepanov, P. The structural biology of HIV-1: mechanistic and therapeutic insights. Nat. Rev. Microbiol. 10, 279–290 (2012).

14. Marino, J., Wigdahl, B. & Nonnemacher, M. R. Extracellular HIV-1 Tat Mediates Increased Glutamate in the CNS Leading to Onset of Senescence and Progression of HAND. Frontiers in Aging Neuroscience vol. 12 at https://www.frontiersin.org/articles/10.3389/fnagi.2020.00168 (2020).

15. Abraham, L. & Fackler, O. T. HIV-1 Nef: a multifaceted modulator of T cell receptor signaling. Cell Commun. Signal. 10, 39 (2012).

16. Mehle, A. et al. Vif Overcomes the Innate Antiviral Activity of APOBEC3G by Promoting Its Degradation in the Ubiquitin-Proteasome Pathway. J. Biol. Chem. 279, 7792–7798 (2004).

17. Donahue, J. P., Vetter, M. L., Mukhtar, N. A. & D’Aquila, R. T. The HIV-1 Vif PPLP motif is necessary for human APOBEC3G binding and degradation. Virology 377, 49–53 (2008).

18. Guo, F., Cen, S., Niu, M., Saadatmand, J. & Kleiman, L. Inhibition of tRNALys3-Primed Reverse Transcription by Human APOBEC3G during Human Immunodeficiency Virus Type 1 Replication. J. Virol. 80, 11710–11722 (2006).

19. Kogan, M. & Rappaport, J. HIV-1 Accessory Protein Vpr: Relevance in the pathogenesis of HIV and potential for therapeutic intervention. Retrovirology 8, 25 (2011).

20. Hladik, F. & McElrath, M. J. Setting the stage: host invasion by HIV. Nat. Rev. Immunol. 8, 447–457 (2008).

21. Müller, T. G., Zila, V., Müller, B. & Kräusslich, H.-G. Nuclear Capsid Uncoating and Reverse Transcription of HIV-1. Annu. Rev. Virol. 9, 261–284 (2022).

22. Müller, T. G. et al. HIV-1 uncoating by release of viral cDNA from capsid-like structures in the nucleus of infected cells. Elife 10, e64776 (2021).

23. Marchand, C., Johnson, A. A., Semenova, E. & Pommier, Y. Mechanisms and inhibition of HIV integration. Drug Discov. Today Dis. Mech. 3, 253–260 (2006).

24. Hughes, S. H. & Coffin, J. M. What Integration Sites Tell Us about HIV Persistence. Cell Host Microbe 19, 588–598 (2016).

25. Freed, E. O. HIV-1 assembly, release and maturation. Nat. Rev. Microbiol. 13, 484–496 (2015).

26. Brant, A. C., Tian, W., Majerciak, V., Yang, W. & Zheng, Z.-M. SARS-CoV-2: from its discovery to genome structure, transcription, and replication. Cell Biosci. 11, 136 (2021).

27. Bai, Z., Cao, Y., Liu, W. & Li, J. The SARS-CoV-2 Nucleocapsid Protein and Its Role in Viral Structure, Biological Functions, and a Potential Target for Drug or Vaccine Mitigation. Viruses vol. 13 at https://doi.org/10.3390/v13061115 (2021).

28. Schoeman, D. & Fielding, B. C. Coronavirus envelope protein: current knowledge. Virol. J. 16, 69 (2019).

29. Monje-Galvan, V. & Voth, G. A. Molecular interactions of the M and E integral membrane proteins of SARS-CoV-2. Faraday Discuss. (2021) doi:10.1039/D1FD00031D.

30. Collins, L. T. et al. Elucidation of SARS-CoV-2 budding mechanisms through molecular dynamics simulations of M and E protein complexes. J. Phys. Chem. Lett. 12, 12249–12255 (2021).

31. Arya, R. et al. Structural insights into SARS-CoV-2 proteins. J. Mol. Biol. 433, 166725 (2021).

32. Yang, H. & Rao, Z. Structural biology of SARS-CoV-2 and implications for therapeutic development. Nat. Rev. Microbiol. 19, 685–700 (2021).

33. Alsaadi, E. A. J. & Jones, I. M. Membrane binding proteins of coronaviruses. Future Virol. 14, 275–286 (2019).

34. Neuman, B. W. et al. A structural analysis of M protein in coronavirus assembly and morphology. J. Struct. Biol. 174, 11–22 (2011).

35. Boson, B. et al. The SARS-CoV-2 envelope and membrane proteins modulate maturation and retention of the spike protein, allowing assembly of virus-like particles. J. Biol. Chem. 296, (2021).

36. Zhang, J., Xiao, T., Cai, Y. & Chen, B. Structure of SARS-CoV-2 spike protein. Curr. Opin. Virol. 50, 173–182 (2021).

37. Walls, A. C. et al. Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein. Cell 181, 281-292.e6 (2020).

38. Peacock, T. P. et al. The furin cleavage site in the SARS-CoV-2 spike protein is required for transmission in ferrets. Nat. Microbiol. 6, 899–909 (2021).

39. Fertig, T. E. et al. The atomic portrait of SARS-CoV-2 as captured by cryo-electron microscopy. J. Cell. Mol. Med. 26, 25–34 (2022).

40. Schubert, K. et al. SARS-CoV-2 Nsp1 binds the ribosomal mRNA channel to inhibit translation. Nat. Struct. Mol. Biol. 27, 959–966 (2020).

41. Yuan, S. et al. Nonstructural Protein 1 of SARS-CoV-2 Is a Potent Pathogenicity Factor Redirecting Host Protein Synthesis Machinery toward Viral RNA. Mol. Cell 80, 1055-1066.e6 (2020).

42. Raj, R. Analysis of non-structural proteins, NSPs of SARS-CoV-2 as targets for computational drug designing. Biochem. Biophys. Reports 25, 100847 (2021).

43. Kirchdoerfer, R. N. & Ward, A. B. Structure of the SARS-CoV nsp12 polymerase bound to nsp7 and nsp8 co-factors. Nat. Commun. 10, 2342 (2019).

44. Roingeard, P. et al. The double-membrane vesicle (DMV): a virus-induced organelle dedicated to the replication of SARS-CoV-2 and other positive-sense single-stranded RNA viruses. Cell. Mol. Life Sci. 79, 425 (2022).

45. Baggen, J., Vanstreels, E., Jansen, S. & Daelemans, D. Cellular host factors for SARS-CoV-2 infection. Nat. Microbiol. 6, 1219–1232 (2021).

46. Sashittal, P., Zhang, C., Peng, J. & El-Kebir, M. Jumper enables discontinuous transcript assembly in coronaviruses. Nat. Commun. 12, 6728 (2021).

47. Wolff, G. et al. A molecular pore spans the double membrane of the coronavirus replication organelle. Science 369, 1395–1398 (2020).

48. Bracquemond, D. & Muriaux, D. Betacoronavirus Assembly: Clues and Perspectives for Elucidating SARS-CoV-2 Particle Formation and Egress. MBio 12, e02371-21 (2021).

49. Sha, S. et al. Cellular pathways of recombinant adeno-associated virus production for gene therapy. Biotechnol. Adv. 49, 107764 (2021).

50. Wang, D., Tai, P. W. L. & Gao, G. Adeno-associated virus vector as a platform for gene therapy delivery. Nat. Rev. Drug Discov. 18, 358–378 (2019).

51. Riyad, J. M. & Weber, T. Intracellular trafficking of adeno-associated virus (AAV) vectors: challenges and future directions. Gene Ther. 28, 683–696 (2021).

52. Ahi, Y. S. & Mittal, S. K. Components of Adenovirus Genome Packaging. Frontiers in Microbiology vol. 7 1503 at https://www.frontiersin.org/article/10.3389/fmicb.2016.01503 (2016).

53. Gallardo, J., Pérez-Illana, M., Martín-González, N. & San Martín, C. Adenovirus Structure: What Is New? International Journal of Molecular Sciences vol. 22 at https://doi.org/10.3390/ijms22105240 (2021).

54. Kulanayake, S. & Tikoo, S. K. Adenovirus Core Proteins: Structure and Function. Viruses vol. 13 at https://doi.org/10.3390/v13030388 (2021).

55. Russell, W. C. & Kemp, G. D. Role of Adenovirus Structural Components in the Regulation of Adenovirus Infection. in The Molecular Repertoire of Adenoviruses I: Virion Structure and Infection (eds. Doerfler, W. & Böhm, P.) 81–98 (Springer Berlin Heidelberg, 1995). doi:10.1007/978-3-642-79496-4_6.

56. Nemerow, G. R. & Stewart, P. L. Role of αv Integrins in Adenovirus Cell Entry and Gene Delivery. Microbiol. Mol. Biol. Rev. 63, 725–734 (1999).

57. Pied, N. & Wodrich, H. Imaging the adenovirus infection cycle. FEBS Lett. 593, 3419–3448 (2019).

58. Georgi, F. & Greber, U. F. The Adenovirus Death Protein – a small membrane protein controls cell lysis and disease. FEBS Lett. 594, 1861–1878 (2020).

59. Hoeben, R. C. & Uil, T. G. Adenovirus DNA Replication. Cold Spring Harb. Perspect. Biol. 5, (2013).

60. McGeoch, D. J., Rixon, F. J. & Davison, A. J. Topics in herpesvirus genomics and evolution. Virus Res. 117, 90–104 (2006).

61. Laine, R. F. et al. Structural analysis of herpes simplex virus by optical super-resolution imaging. Nat. Commun. 6, 5980 (2015).

62. Mettenleiter, T. C., Klupp, B. G. & Granzow, H. Herpesvirus assembly: a tale of two membranes. Curr. Opin. Microbiol. 9, 423–429 (2006).

63. Heldwein, E. E. Up close with herpesviruses. Science 360, 34–35 (2018).

64. Fan, W. H. et al. The Large Tegument Protein pUL36 Is Essential for Formation of the Capsid Vertex-Specific Component at the Capsid-Tegument Interface of Herpes Simplex Virus 1. J. Virol. 89, 1502–1511 (2015).

65. J., R. R., Rachel, F. & M., L. R. The Herpes Simplex Virus 1 UL51 Protein Interacts with the UL7 Protein and Plays a Role in Its Recruitment into the Virion. J. Virol. 89, 3112–3122 (2015).

66. T., H. A., E., D. R., E., H. E. & Thomas, S. Contributions of the Four Essential Entry Glycoproteins to HSV-1 Tropism and the Selection of Entry Routes. MBio 12, e00143-21 (2021).

67. Zeev-Ben-Mordehai, T., Hagen, C. & Grünewald, K. A cool hybrid approach to the herpesvirus ‘life’ cycle. Curr. Opin. Virol. 5, 42–49 (2014).

68. Newcomb, W. W., Cockrell, S. K., Homa, F. L. & Brown, J. C. Polarized DNA Ejection from the Herpesvirus Capsid. J. Mol. Biol. 392, 885–894 (2009).

69. Ahmad, I. & Wilson, D. W. HSV-1 Cytoplasmic Envelopment and Egress. International Journal of Molecular Sciences vol. 21 at https://doi.org/10.3390/ijms21175969 (2020).

70. Roizman, B. & Whitley, R. J. An Inquiry into the Molecular Basis of HSV Latency and Reactivation. Annu. Rev. Microbiol. 67, 355–374 (2013).