As a scientist, I am driven by the power of technological breakthroughs to make positive change for humanity. While I also take immense pleasure in the artistic/creative aspects of technology design, my motivation is centered on helping people and on protecting the future of the human species. For this reason, I am interested in a wide array of contemporary challenges as described in this outline. Because I am a synthetic biologist and synthetic biology has many applications, I have the ability to explore solutions to such diverse challenges despite their highly multidisciplinary nature.
That said, one of the tools in any good researcher’s repertoire is collaboration. Since I am just one person, my knowledge can only go so deep in so many areas. Interdisciplinary projects are much more likely to succeed when experts from multiple areas work together. So, I leverage collaboration extensively when carrying out my projects and will continue to do so in the future.
It should be noted that, though I am publicly presenting a number of conceptual explanations of possible solutions to important problems via this list, I have deliberately stated them in somewhat vague language to prevent their public disclosure from precluding outside investment.
I am sharing this list as a way of increasing my likelihood of connecting with potential collaborators over time and as a method of inspiring others to consider how they too might contribute to building a bright future.
I chose the featured image of a Martian-hued rendering of the Anasazi Cliff Palace at Mesa Verde National Park as a symbolic tribute to the creative and culturally rich spirit of humanity, an homage to where we have come from, and a hopeful indication of where we might go in the future.
Infectious Disease Burden in Developing Countries
Especially malaria, tuberculosis, and HIV.
Translational strategies for dissemination are key.
Some possible solutions:
Gene drives to prevent mosquitos from carrying pathogens
Inexpensive home diagnostics
Rapid inexpensive biomanufacturing of treatments
Thermostable treatments and vaccines
Inexpensive immune enhancement gene therapies
Rapid inexpensive biomanufacturing of vaccines
Especially highly concerning pathogens such as carbapenem-resistant Enterobacterales, carbapenem-resistant Acinetobacter, Clostridioides difficile, and drug-resistant Neisseria gonorrhoeae.
Adaptability and rapid generation of new solutions are key.
Some possible solutions:
Adaptable probiotic treatments
Rational design of phage therapies
Diversified phage therapy production platforms
Lysogenic phage therapies for widespread resistance shutdown and/or bactericide
Sentinel bacteria for detection and elimination of resistance in environment (esp. livestock and wastewater)
Bacterial conjugation for elimination of resistance in environment
Immune enhancement gene therapies
Rapid inexpensive biomanufacturing of vaccines
Rapid vaccine discovery platforms
Inexpensive home diagnostics
Affordable Gene Therapy Manufacturing
Many existing gene therapy delivery vehicles are extremely expensive to produce, so treatments are very costly for patients and not enough can be made to reach large populations.
Novel manufacturing solutions are vital to scalably make existing vectors (esp. AAVs).
New vectors which retain the benefits of existing vectors yet can be made inexpensively are also important.
Some possible solutions:
Synthetic biology methods for radically redesigning cellular platforms of virus production
Novel inexpensively manufacturable viral vectors
Hybrid nanoparticle-viral vectors that are easier to produce yet retain benefits of viruses and perhaps bring extra advantages as well
Methods for production of DNA origami nanorobots
Use of DNA origami to support viral capsid assembly and genome packaging
Gene Therapy for Radiation Resistance in Space
Extended periods of time in space, on the moon, on Mars, etc. expose people to large amounts of radiation.
Future space colonization efforts could be severely jeopardized by human radiation exposure, especially since this can cause problems with reproduction.
Enhance human DNA repair pathways by adding new genetic circuits and/or altering gene regulation
Polygenic gene therapy delivery vectors
Develop ways of safely delivering genes to most or all cells in the human body
Gene Therapy for Aging
Aging affects everyone and is marked by a deterioration of health and eventual death.
Treatments for aging would greatly improve the human condition by making people both much healthier and longer lived.
Life extension only linearly affects population growth, whereas reproduction exponentially affects population growth, so concerns about life extension causing overpopulation are often exaggerated.
As human longevity increases, its small contribution to population growth will likely be mitigated by parallel growth of technologies that improve human sustainability (e.g. vertical farms, cultured meat, renewable energy, space habitation, etc.)
Gene therapy has great potential for extending human longevity and simultaneously improving overall global health.
Some possible genetic approaches:
Polygenic gene therapy delivery vectors
Identify multiple genes that synergistically improve healthspan and deliver them together
Engineer regulatory pathways and insert genetic circuits that optimally modulate gene expression for improving healthspan
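The linear-versus-exponential point made in this section can be illustrated with a toy model. The sketch below assumes a deliberately simplified population in which everyone lives exactly `lifespan` years and annual births change by a fixed fraction `birth_growth` each year. The function name, parameters, and numbers are all hypothetical choices for illustration, not demographic data.

```python
def population(years, lifespan, birth_growth, births0=1000.0):
    """Number of people alive after `years`.

    Toy assumptions: everyone lives exactly `lifespan` years, and the number
    of births per year is births0 * (1 + birth_growth) ** year.
    """
    total = 0.0
    # Everyone born during the last `lifespan` years is still alive.
    for birth_year in range(years - lifespan + 1, years + 1):
        total += births0 * (1.0 + birth_growth) ** birth_year
    return total


baseline = population(years=200, lifespan=80, birth_growth=0.0)
longer_life = population(years=200, lifespan=160, birth_growth=0.0)
more_births = population(years=200, lifespan=80, birth_growth=0.01)

print("doubling lifespan multiplies population by", longer_life / baseline)
print("1% yearly birth growth multiplies population by", more_births / baseline)
```

In this model, doubling lifespan at constant births only doubles the population, while a 1% yearly increase in births keeps compounding for as long as it continues.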
Biological Carbon Capture for Climate Change
The climate crisis threatens human and nonhuman life since it is leading to desertification, flooding, extreme weather events, ecosystem collapse, etc. and will cause many millions of deaths if it continues unchecked.
Addressing climate change will necessitate both policy solutions and technological solutions.
Carbon capture is a particularly promising route towards mitigating climate change yet is difficult to scale.
Self-replicating biological carbon capture approaches would come with risks but demonstrate much greater scalability.
Some possible biological carbon capture solutions:
Genetically enhanced trees that more rapidly capture carbon
Genetically enhanced cyanobacteria (or algae) that more rapidly capture carbon
Bacteriophages that propagate genes in ocean cyanobacteria to enhance their carbon uptake
Use bacterial conjugation in ocean cyanobacteria to propagate genes that enhance their resistance to bacteriophages and thus rapidly increase cyanobacterial population size and carbon capture capacity
Develop computational models and experimental model systems to explore possible negative side effects of all of the above and find ways to counterbalance those side effects
Strong nanorobots with capabilities resembling those in science fiction would revolutionize every human activity.
Strong nanorobots should specifically possess (1) inexpensive mass producibility or self-replication capabilities, (2) ways of programming them to alter their functionality and environmental responses, (3) automated locomotion oriented towards accomplishing programmed tasks.
Only relatively simple nanorobots have been created so far.
Some possible routes towards strong nanorobotics:
Develop minimal cells derived from Mycoplasma bacteria and equip them with extensive optogenetic equipment for external programming. To prevent light from scrambling their instructions, keep the “programming mode” turned off except when a special chemical switch is flipped
Synthesize complex dynamic DNA origami nanostructures that include optically programmable logic systems inspired by digital logic circuits as well as automated locomotion modules, also devise scalable manufacturing methods
Horizontal Gene Drives to Repair Pollinator Insect Networks
Bees and other insect pollinators form a crucial part of global ecosystems, yet populations of these insects are declining.
Decline of insect pollinators is negatively affecting many crops, limiting food production across the world.
Some of the most prominent reasons for bee decline are the spread of invasive Varroa mites that carry deformed wing virus (DWV) and the prevalence of toxic pesticides in the environment.
Horizontal gene transfer via seeding donor bacteria into insect gut microbiota may help protect insect pollinators in a scalable fashion.
Some possible approaches involving horizontal gene drives:
Give bees gut bacteria that spread conjugative plasmids carrying anti-Varroa genes.
Give bees gut bacteria that spread conjugative plasmids carrying anti-DWV genes.
Give bees gut bacteria that spread conjugative plasmids carrying genes that facilitate breakdown of toxic pesticides.
Connectomics Towards Whole-Brain Emulation
Mapping the brain and simulating it in a computer would provide an unparalleled holistic understanding of how our minds work.
Even partial connectomes and/or animal connectomes could give remarkable insights into neurobiology.
The convergence of connectomics and computational neuroscience has applications in bioinspired artificial intelligence, bioinspired robotics, neural prostheses, medicine for brain disease, and more.
Some possible routes for connectomics:
Expansion microscopy with genetically encoded synaptic and neuronal barcodes coupled with spatial transcriptomics
Massively parallel electron microscopy approaches
Improved x-ray nanotomography coupled with special sample treatment methods to improve tissue stability, allow multicolor imaging (and possibly barcodes), and enhance resolution
Construction of many compact light source devices to rapidly map tissue in parallel
Construction of extremely bright next-generation synchrotrons coupled with sample treatment methods to greatly improve stability
Nanobiotechnology for Neural Interfaces and Neural Prostheses
Existing neural interfaces and neural prostheses utilize microelectrode-based technologies for communicating with brain tissue.
Nanobiotechnology approaches may enable more precise, less invasive, and more powerful neural interfaces and prostheses.
Some possible approaches:
Polymersome nanocompartments that mimic neurons in their electrical response properties via embedded transmembrane proteins and ion gradients, link to nanowires that transmit electrical potential to external devices or other parts of the brain
Leverage gene therapy for modifying human neurons to express optogenetic channels, design nanomachines that incorporate upconversion nanoparticles for targeted stimulation
The future of humankind depends on our ability to colonize other planets, moons, etc.
Earth is the only planet in the solar system where humans can survive unaided.
Terraforming the moon or Mars would provide humankind with a new habitable world, yet this represents an enormously challenging task.
Seeding the moon or Mars with heavily engineered microorganisms may provide a first step towards transforming these celestial bodies into habitable worlds.
Some possible microorganism-based terraforming methods:
Perform extensive metabolic engineering on extremophiles that metabolize substances similar to those found on the moon or Mars, make them convert regolith to Earthlike atmospheric gases
Borrow genetic pathways from Deinococcus radiodurans to provide radiation resistance
Extensively engineer microorganisms to metabolize regolith and form seaweed-like colonies that bud into edible fruits
Human immunodeficiency virus (HIV)
Genome and Structure:
HIV’s genome is a 9.7 kb linear positive-sense ssRNA.1 There is an m7G-cap (specifically the standard eukaryotic m7GpppG as added by the host’s enzymes) at the 5’ end of the genome and a poly-A tail at the 3’ end of the genome.2 The genome also has a 5’-LTR and 3’-LTR (long terminal repeats) that aid its integration into the host genome after reverse transcription, that facilitate HIV genetic regulation, and that play a variety of other important functional roles. In particular, it should be noted that the integrated 5’-LTR contains the HIV promoter called U3.3,4
HIV’s genome translates three polyproteins (as well as several accessory proteins). The Gag polyprotein contains the HIV structural proteins. The Gag-Pol polyprotein contains (within its Pol component) the enzymes viral protease, reverse transcriptase, and integrase. The Gag-Pol polyprotein is produced via a –1 ribosomal frameshift at the end of Gag translation. Because of the lower efficiency of this frameshift, Gag-Pol is synthesized 20-fold less frequently than Gag.5 The frameshift’s mechanism depends upon a slippery heptanucleotide sequence UUUUUUA and a downstream RNA secondary structure called the frameshift stimulatory signal (FSS).6 This FSS controls the efficiency of the frameshift process.
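The –1 frameshift described above can be made concrete with a small sketch. It uses a hypothetical toy RNA (not HIV sequence) and a simplified model in which the ribosome reads zero-frame codons through the slippery heptanucleotide, then backs up by one nucleotide and keeps reading. The helper names are illustrative assumptions, and the last function simply converts the stated 20:1 Gag to Gag-Pol ratio into the fraction of ribosomes that would need to frameshift.

```python
SLIPPERY_SITE = "UUUUUUA"  # heptanucleotide named in the text

# Hypothetical toy RNA (not HIV sequence). The slippery site is placed so that
# its first U is the last base of a zero-frame codon.
TOY_RNA = "AUGGCUAAUUUUUUAGGGAAGAUCUGG"


def codons(rna, start=0):
    """Split `rna` into complete triplets, beginning at index `start`."""
    return [rna[i:i + 3] for i in range(start, len(rna) - 2, 3)]


def read_with_minus1_frameshift(rna, slip_site=SLIPPERY_SITE):
    """Codons read by a ribosome that slips back one base at the slip site.

    Simplified model: the zero frame is assumed to start at index 0.
    """
    site = rna.find(slip_site)
    if site == -1 or site % 3 != 2:
        raise ValueError("slippery site missing or not in the expected frame")
    slip_end = site + len(slip_site)           # index just past the heptamer
    in_frame = codons(rna[:slip_end])          # zero-frame codons through the site
    shifted = codons(rna, start=slip_end - 1)  # back up one base, keep reading
    return in_frame + shifted


def implied_frameshift_fraction(gag_per_gag_pol=20.0):
    """Fraction of ribosomes that must frameshift to give the stated ratio."""
    return 1.0 / (1.0 + gag_per_gag_pol)


print("zero frame:", codons(TOY_RNA))
print("-1 frame:  ", read_with_minus1_frameshift(TOY_RNA))
print("implied frameshift fraction:", implied_frameshift_fraction())
```

In the toy RNA, the zero frame continues past the slippery site as UUA GGG, whereas the shifted reading continues as UUA AGG because the final A of the heptanucleotide is read twice.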
The HIV RNA genome undergoes alternative splicing to produce the rest of the viral proteins. One splicing event produces an RNA that separately encodes the Vpu protein and the Env protein (also called gp160).6–8 A mechanism called ribosome shunting is used to transition from Vpu’s open reading frame to Env’s open reading frame. The Env protein contains the gp41 and gp120 proteins. Env is post-translationally cleaved into gp41 and gp120 by a host furin enzyme in the endoplasmic reticulum.9 It is important to note that Env is also heavily glycosylated post-translationally to help HIV evade the immune system. Several other complex splicing events lead to the production of RNAs encoding Tat, Rev, Nef, Vif, and Vpr.
HIV viral protease cleaves the Gag polyprotein and thus produces structural proteins including the capsid protein CA (also called p24), the matrix protein MA (also called p17), the nucleocapsid protein NC (also called p7), and the p6 peptide.10 The HIV core capsid is shaped like a truncated cone and consists of about 1500 CA monomers. Most of the CA proteins assemble into hexamers, but a few pentamers are present. The pentamers help give the core capsid its conical morphology by providing extra curvature near the top and bottom. Each core capsid contains two copies of the HIV genomic RNA, complexed with NC protein. Reverse transcriptase, integrase, and viral accessory proteins are also held within the core capsid. HIV’s core capsid is packaged into a lipid envelope that bears gp41-gp120 glycoprotein heterodimers. The MA protein forms a layer between the core capsid and the envelope.
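The hexamer and pentamer counts implied by the paragraph above can be estimated with simple arithmetic. The sketch below assumes that the core capsid is a closed fullerene-like lattice with three faces meeting at every vertex, in which case Euler's polyhedron formula requires exactly 12 pentamers. It also treats the approximate figure of 1500 CA monomers as exact. The function names are illustrative.

```python
def hexamer_count(total_monomers, pentamers=12):
    """Hexamers left after `pentamers` pentamers are set aside."""
    remaining = total_monomers - 5 * pentamers
    if remaining < 0 or remaining % 6 != 0:
        raise ValueError("monomer count does not divide into whole hexamers")
    return remaining // 6


def euler_characteristic(pentamers, hexamers):
    """V - E + F for a closed lattice with three faces meeting at each vertex."""
    corners = 5 * pentamers + 6 * hexamers
    vertices = corners // 3   # every vertex is shared by three faces
    edges = corners // 2      # every edge is shared by two faces
    faces = pentamers + hexamers
    return vertices - edges + faces


hexamers = hexamer_count(1500)
print("hexamers:", hexamers)
print("Euler characteristic:", euler_characteristic(12, hexamers))
```

An Euler characteristic of 2 is what a closed, sphere-like shell requires, so a count of 12 pentamers with the remaining monomers in hexamers is self-consistent under these assumptions.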
Accessory proteins Vpu, Tat, Rev, Nef, Vif, and Vpr facilitate a variety of functions. Vpu induces degradation of CD4 proteins within the endoplasmic reticulum of host CD4+ T cells. It does this by using its cytosolic domain as a molecular adaptor between CD4 and a ubiquitin ligase (which subsequently triggers proteasomal degradation of the CD4).11 The reason that Vpu does this is to prevent HIV superinfection wherein two different types of HIV might infect the same cell and interfere with each other. This is an example of competition between viruses.12 Vpu also enhances release of HIV virions from infected cells by using its cytosolic domain to inhibit a host protein called tetherin (also known as BST-2).11 Without Vpu, tetherin would bind the viral envelope to the cell surface as well as to other HIV virus particles, impeding release.
Tat, also called the viral transactivator protein, is necessary for efficient transcriptional elongation of the HIV genome after integration into the host DNA.13 Tat binds the viral transactivation response element (TAR), a structured RNA motif present at the beginning of the HIV transcripts. It then recruits protein positive transcription elongation factor b (P-TEFb). This allows P-TEFb to phosphorylate certain residues in the C-terminal domain of RNA polymerase II, stimulating transcriptional elongation. Tat also recruits several of the host cell’s histone acetyltransferases to the viral 5’-LTR so as to open the chromatin around the U3 promoter and related parts of the integrated HIV genome.3,4 Finally, Tat is secreted from infected cells14 and acts as an autocrine and paracrine signaling molecule.4 It inhibits antigen-specific lymphocyte proliferation, stimulates expression of certain cytokines and cytokine receptors, modulates the activities of various host cell types, causes neurotoxicity in the brain, and more.
Rev facilitates nuclear export of the unspliced and singly spliced HIV RNAs by binding to a sequence located in the Env coding region called the Rev response element (RRE).13 The Rev protein forms a dimer upon binding to the RRE and acts as an adaptor, binding a host nuclear export factor called CRM1. Rev is also known to form higher-order oligomers via cooperative multimerization of the RNA-bound dimers.
Nef is a myristoylated protein that downregulates certain host T cell proteins and thereby increases production of virus. Nef is localized to the cytosol and the plasma membrane. It specifically inhibits CD4, Lck, CTLA-4, and Bad.15 Downregulating CD4 contributes to the prevention of superinfection that also occurs with Vpu’s inhibition of CD4. Nef induces endocytosis of plasma membrane Lck protein and traffics it to recycling endosomes and the trans-Golgi network. At these intracellular compartments, Lck signals for Ras and Erk activation, which triggers IL-2 production. IL-2 causes T cells to grow and proliferate, leading to more T cells that HIV can infect and leading to activation of the machinery HIV needs to replicate itself within infected T cells. Nef triggers lysosomal degradation of CTLA-4. This is because CTLA-4 can serve as an off-switch for T cells, which would lead to inhibition of HIV replication if active. Nef inactivates the Bad protein via phosphorylation. Bad participates in apoptotic cascades, so Nef prevents apoptosis of the infected host cell in this way.
Vif forms a complex with the host antiviral proteins APOBEC3F and APOBEC3G and induces their ubiquitination and subsequent degradation by the proteasome.16 It also may inhibit these proteins through other mechanisms. APOBEC3F and APOBEC3G are cytidine deaminases that hypermutate the negative-sense strand of HIV cDNA, leading to weak or nonviable viruses.17 These proteins also interfere with reverse transcription by blocking tRNALys3 from binding to the HIV RNA 5’UTR (tRNALys3 usually acts as a primer to initiate reverse transcription of the HIV genome).18
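To see why deamination of the negative-sense cDNA strand is so damaging, it helps to follow a sequence through the process. The sketch below is a toy model with a made-up sequence. It deaminates cytidines (C to U) on the minus strand with a chosen probability and then copies that strand back into a plus strand, where each U pairs with A. The function names and the uniform per-base probability are assumptions for illustration.

```python
import random

# Base pairing used when copying a strand; U in DNA pairs with A.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def reverse_complement(seq):
    """Copy a strand into its complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def deaminate(minus_strand, probability, rng):
    """Convert each C to U with the given probability (toy model)."""
    return "".join(
        "U" if base == "C" and rng.random() < probability else base
        for base in minus_strand
    )


def plus_strand_after_deamination(plus_strand, probability=1.0, seed=0):
    """Plus strand obtained after the minus-strand cDNA has been deaminated."""
    rng = random.Random(seed)
    minus = reverse_complement(plus_strand)   # minus-strand cDNA
    edited = deaminate(minus, probability, rng)
    return reverse_complement(edited)         # plus-strand synthesis


original = "ATGGGCTAG"  # hypothetical sequence
print(original, "->", plus_strand_after_deamination(original))
```

Because every C on the minus strand sits opposite a G on the plus strand, the edits appear as G-to-A changes in the plus strand, while the plus strand's own C bases are untouched.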
Vpr facilitates nuclear import of the HIV pre-integration complex.19 The pre-integration complex consists of viral cDNA and associated proteins (uncoating and reverse transcription have already occurred at this stage). Vpr binds the pre-integration complex and recruits host importins to enable nuclear import. It may further enhance nuclear import through interactions with some of the nuclear pore proteins. In addition to nuclear import, Vpr has several more functions: it acts as a coactivator (along with other proteins) of the HIV 5’-LTR’s U3 promoter, might influence NF-κB regulation, may modulate apoptotic pathways, and arrests the cell cycle at the G2 stage.
CD4+ T cells represent the primary targets of HIV, though the virus is also capable of infecting other cell types such as dendritic cells.20 HIV infects CD4+ T cells through binding its gp120 glycoprotein to the CD4 receptor and the CCR5 coreceptor or the CXCR4 coreceptor.10 This triggers fusion of the viral envelope with the plasma membrane and allows the core capsid to enter the cytosol.
HIV’s core capsid is transported by motor proteins along microtubules to dock at nuclear pores. The nuclear pore complex has flexible cytosolic filaments composed primarily of the Nup358 protein, which interacts with the core capsid.21 These interactions guide the narrow end of the core capsid into the nuclear pore’s central channel. Next, the core capsid interacts with the central channel’s unstructured phenylalanine-glycine (FG) repeats that exist in a hydrogel-like liquid phase. As the core capsid translocates through the central pore, it binds the Nup153 protein, a component of the nuclear pore complex’s basket. Finally, many copies of the nucleoplasmic CPSF6 protein coat the core capsid and escort it towards its genomic site of integration. It is thought that the reverse transcription process begins inside of the core capsid at this point, leading to cDNA synthesis.21,22 Buildup of newly made cDNA within the core capsid likely results in pressure that helps rupture the capsid structure, facilitating uncoating.
Tetrameric HIV integrase binds both of the viral LTRs and facilitates integration of the cDNA into the host genome.23 Though integration sites vary widely, they are not entirely random. Host chromatin structure and other factors influence where the viral cDNA integrates.24 Transcription of HIV RNAs can then proceed from the U3 promoter with the aid of the Tat protein and host factors. As described earlier, a series of RNA splicing events produce the various RNAs necessary to synthesize all of the different HIV proteins and polyproteins.
Env protein is trafficked to the cell membrane through the secretory pathway. It is cleaved by a host furin enzyme into gp41 and gp120 components during its time in the endoplasmic reticulum.9 Gag and Gag-Pol polyproteins are expressed cytosolically. Since Gag is post-translationally modified by amino-terminal myristoylation, it anchors to the cell membrane by inserting its myristate tail into the lipid bilayer.25 Gag and a smaller number of Gag-Pol accumulate on the inner membrane surface and incorporate gp41-gp120 complexes. NC domains in the Gag proteins bind and help package the two copies of HIV genomic RNA. The p6 region of the Gag protein (located at the C-terminal end) then recruits host ESCRT-I and ALIX proteins, which subsequently sequester host ESCRT-III and VPS4 complexes to drive budding and membrane scission, releasing virus into the extracellular space. After this, the HIV viral protease (from within the Gag-Pol polyprotein) cleaves the Gag and Gag-Pol polyproteins into their constituent proteins, facilitating maturation of the released HIV particles.
SARS-CoV-2
Genome and Structure:
The SARS-CoV-2 genome consists of about 30 kb of linear positive-sense ssRNA. There is a m7G-cap (specifically m7GpppA1) at the 5’ end of the genome and a 30-60 nucleotide poly-A tail at the 3’ end of the genome. These protective structures minimize exonuclease degradation.26 The genome also has a 5’ UTR and a 3’ UTR which contain sequences that aid in transcriptional regulation and in packaging. The SARS-CoV-2 genome directly translates two partially overlapping polyproteins, ORF1a and ORF1b. There is a –1 ribosomal frameshift in ORF1b relative to ORF1a. Within the polyproteins, two self-activating proteases (Papain-like protease PLpro and 3-chymotrypsin-like protease 3CLpro) perform cleavage events that lead to the generation of the virus’s 16 non-structural proteins (nsps). It should be noted that the 3CLpro is also known as the main protease or Mpro. The coronavirus also produces 4 structural proteins, but these are not translated until after the synthesis of corresponding subgenomic RNAs via the viral replication complex. To create these subgenomic RNAs, negative-sense RNA must first be made and then undergo conversion back to positive-sense RNA for translation. Genes encoding the structural proteins are located downstream of the ORF1b section.
SARS-CoV-2’s four structural proteins include the N, E, M, and S proteins. Many copies of the N (nucleocapsid) protein bind the RNA genome and organize it into a helical ribonucleocapsid complex. The complex undergoes packaging into the viral envelope during coronavirus budding. Interactions between the N protein and the other structural proteins may facilitate this packaging process. The N protein also inhibits host immune responses by acting as a viral suppressor of RNAi and by blocking the signaling of interferon production pathways.27
The transmembrane E (envelope) protein forms pentamers and plays a key but poorly understood role in the budding of viral envelopes into the endoplasmic reticulum Golgi intermediate compartment (ERGIC).28–30 Despite its importance in budding, mature viral particles do not incorporate very many E proteins into their envelopes. One of the posttranslational modifications of the E protein is palmitoylation. This aids subcellular trafficking and interactions with membranes. E protein pentamers also act as ion channels that alter membrane potential.31,32 This may lead to inflammasome activation, a contributing factor to cytokine storm induction.
The M (membrane) protein is the most abundant protein in the virion and drives global curvature in the ERGIC membrane to facilitate budding.30,33 It forms transmembrane dimers that likely oligomerize to induce this curvature.34 The M protein also has a cytosolic (and later intravirion) globular domain that likely interacts with the other structural proteins. M protein dimers also induce local curvature through preferential interactions with phosphatidylserine and phosphatidylinositol lipids.29,30 M proteins help sequester S proteins into the envelopes of budding viruses.35
The S (spike) protein of SARS-CoV-2 has been heavily studied due to its central roles in the infectivity and immunogenicity of the coronavirus. It forms a homotrimer that protrudes from the viral envelope and is heavily glycosylated. It binds the host’s ACE2 receptor (angiotensin-converting enzyme 2 receptor) and undergoes conformational changes to promote viral fusion.36 The S protein undergoes cleavage into S1 and S2 subunits by the host’s furin protease during viral maturation.37,38 This enhances SARS-CoV-2 entry into lung cells and may partially explain the virus’s high degree of transmissibility. The S1 fragment contains the receptor binding domain (RBD) and associated machinery while the S2 fragment facilitates fusion. Prior to cellular infection, most S proteins exist in a closed prefusion conformation where the RBDs of each monomer are hidden most of the time.39 After the S protein binds ACE2 during transient exposure of one of its RBDs, the other two RBDs quickly bind as well. This binding triggers a conformational change in the S protein that loosens the structure, unleashing the S2 fusion component and exposing another proteolytic cleavage site called S2’. Host transmembrane proteases such as TMPRSS2 cut at S2’, causing the full activation of the S2 fusion subunit and the dramatic elongation of the S protein into the postfusion conformation. This results in the viral envelope fusing with the host membrane and uptake of the coronavirus’s RNA into the cell.
The 16 nsps of SARS-CoV-2 play a variety of roles. For instance, nsp1 shuts down host cell translation by plugging the mRNA entry channel of the ribosome, inhibiting the host cell’s immune responses and maximizing viral production.40,41 Viral proteins still undergo translation because a conserved sequence in the coronavirus RNA helps circumvent the blockage through a poorly understood mechanism. The nsp5 protein is the protease 3CLpro.42 The nsp3 protein contains several subcomponents, including the protease PLpro. The nsp12, nsp7, and nsp8 proteins come together to form the RNA-dependent RNA polymerase (RdRp) that replicates the viral genome.42,43 The nsp2 protein is likely a topoisomerase which functions in RNA replication. The nsp4 and nsp6 proteins as well as certain subcomponents of nsp3 restructure intracellular host membranes into double-membrane vesicles (DMVs) which compartmentalize viral replication.44
Beyond the 4 structural proteins and 16 nsps of SARS-CoV-2, the coronaviral genome also encodes some poorly understood accessory proteins including ORF3a, ORF3b, ORF6, ORF7a, ORF7b, ORF8 and ORF9b.45 These accessory proteins are non-essential for replication in vitro, but they are thought to be required for the virus’s full degree of virulence in vivo.
As mentioned, SARS-CoV-2 infects cells by first binding an S protein RBD to the ACE2 receptor. This triggers a conformational change that elongates the S protein’s structure and reveals the S2 fusion fragment, facilitating fusion of the virion envelope with the host cell membrane.39 Cleavage of the S2’ site by proteases like TMPRSS2 aids this change from the prefusion to the postfusion conformation. Alternatively, SARS-CoV-2 can enter the cell by binding to ACE2, undergoing endocytosis, and fusing with the endosome to release its genome (as induced by endosomal cathepsin proteases).45 After release of the SARS-CoV-2 genome into the cytosol, the N protein disassociates and allows translation of ORF1a and ORF1b, producing polyproteins which are cleaved into mature proteins by the PLpro and 3CLpro proteases as discussed earlier.
The RdRp complex synthesizes negative-sense full genomic RNAs as well as negative-sense subgenomic RNAs. In the latter case, discontinuous transcription is employed, a process by which the RdRp jumps over certain sections of the RNA and initiates transcription separately from the rest of the genome.46 The negative-sense RNAs are subsequently converted back into positive-sense full genomic RNAs and positive-sense subgenomic RNAs. The subgenomic RNAs are translated to make structural proteins and some accessory proteins.45
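The jump made by the RdRp during discontinuous transcription can be sketched with a toy genome. The sketch assumes, beyond what the text above states, that each jump happens at a short repeated motif (a transcription regulatory sequence, or TRS) found after the 5' leader and before each downstream gene. The genome string, the gene contents, and all function names are hypothetical.

```python
TRS = "ACGAAC"  # assumed jump motif; the toy genome below is made up

# Layout: leader, TRS, first ORF, TRS, gene 2, TRS, gene 3 (hypothetical).
TOY_GENOME = ("GGAUUA" + TRS + "AUGCCCUAA"
              + TRS + "AUGAAAUAG" + TRS + "AUGGGGUGA")

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Complementary RNA strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def trs_positions(genome, trs=TRS):
    """Start index of every TRS copy in the genome."""
    positions, i = [], genome.find(trs)
    while i != -1:
        positions.append(i)
        i = genome.find(trs, i + 1)
    return positions


def negative_subgenomic(genome, body_trs, trs=TRS):
    """Negative-sense subgenomic RNA made with one jump at `body_trs`."""
    leader_trs = genome.find(trs)
    # The polymerase copies the template from its 3' end through the body TRS...
    body_copy = reverse_complement(genome[body_trs:])
    # ...then skips the intervening region and finishes by copying the leader.
    leader_copy = reverse_complement(genome[:leader_trs])
    return body_copy + leader_copy


def subgenomic_rnas(genome, trs=TRS):
    """Positive-sense subgenomic RNAs, one per TRS downstream of the leader."""
    body_sites = trs_positions(genome, trs)[1:]
    return [reverse_complement(negative_subgenomic(genome, site, trs))
            for site in body_sites]


for rna in subgenomic_rnas(TOY_GENOME):
    print(rna)
```

Each resulting positive-sense RNA starts with the same leader and ends at the genome's 3' end, but omits everything between the leader and the chosen jump site.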
As described earlier, the nsp4, nsp6, and parts of nsp3 proteins remodel host endoplasmic reticulum (ER) to create DMVs.45 These DMVs are the site of the coronaviral genomic replication and serve to shield the viral RNA and RdRp complex from cellular innate immune factors. DMVs cluster together and are continuous with the ER mostly through small tubular connections. After replication, the newly synthesized coronavirus RNAs undergo export into the cytosol through molecular pore complexes that span both membranes of the DMVs.47 These molecular pore complexes are composed of nsp3 domains and possibly other viral and/or host proteins.
Newly replicated SARS-CoV-2 genomic RNAs complex with N proteins to form helical nucleocapsids. To enable packaging, the nucleocapsids interact with M protein cytosolic domains which protrude at the ERGIC.48 M proteins, E proteins, and S proteins are all localized to the ERGIC membrane. The highly abundant M proteins induce curvature of the membrane to facilitate budding. As mentioned, E proteins also play essential roles in budding, but the mechanisms are poorly understood. Once the virions have budded into the ERGIC, they are shuttled through the Golgi via a series of vesicles and eventually secreted out of the cell.
Adeno-associated virus (AAV)
Genome and Structure:
AAV genomes are about 4.7 kb in length and are composed of ssDNA. Inverted terminal repeats (ITRs) form hairpin structures at the ends of the genome. These ITR structures are important for AAV genomic packaging and replication. Rep genes (encoded via overlapping reading frames) include Rep78, Rep68, Rep52, and Rep40.49 These proteins facilitate replication of the viral genome. Because AAV is a Dependoparvovirus, it needs additional helper functions from adenovirus (or certain other viruses) in order to replicate.
AAV capsids are about 25 nm in diameter. Cap genes include VP1, VP2, VP3 and are transcribed from overlapping reading frames.50 The VP3 protein is the smallest capsid protein. The VP2 protein is the same as VP3 except that it includes an N-terminal extension with a nuclear localization sequence. The VP1 protein is the same as VP2 except that it includes a further N-terminal extension encoding a phospholipase A2 (PLA2) that facilitates endosomal escape during infection. In the AAV capsid, VP1, VP2, and VP3 are present at a ratio of roughly 1:1:10. It should be noted that this ratio is actually the average of a distribution, not a fixed number.
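The remark that the 1:1:10 ratio is the average of a distribution can be illustrated with a short simulation. The sketch assumes that each capsid has 60 subunits and that every position is filled independently at random in proportion to 1:1:10. Independent incorporation is a simplifying assumption, and the function names are illustrative.

```python
import random
from collections import Counter

PROTEINS = ("VP1", "VP2", "VP3")


def sample_capsid(rng, subunits=60, weights=(1, 1, 10)):
    """Composition of one capsid under independent random incorporation."""
    return Counter(rng.choices(PROTEINS, weights=weights, k=subunits))


def simulate(n_capsids=10000, seed=0):
    """Sample many capsids and return them with the mean copy numbers."""
    rng = random.Random(seed)
    capsids = [sample_capsid(rng) for _ in range(n_capsids)]
    means = {vp: sum(c[vp] for c in capsids) / n_capsids for vp in PROTEINS}
    return capsids, means


capsids, means = simulate()
print("mean copies per capsid:", means)
print("VP1 copies in the first five capsids:", [c["VP1"] for c in capsids[:5]])
```

Under these assumptions the mean composition is close to 5 VP1, 5 VP2, and 50 VP3 per capsid, while individual capsids vary around those values.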
Frame-shifted start codons in the Cap gene region encode AAP (assembly activating protein) and MAAP (membrane associated accessory protein). These proteins help facilitate packaging and other aspects of the AAV life cycle.
There are a variety of different AAV serotypes (AAV2, AAV6, AAV9, etc.) that selectively infect certain tissue types. AAVs bind to host cell receptors and are internalized by endocytosis. The particular receptors involved can vary depending on the AAV serotype, though some receptors are consistent across many serotypes. Internalization occurs most often via clathrin-coated pits, but some AAVs are internalized by other routes such as macropinocytosis or the CLIC/GEEC tubulovesicular pathway.51
After endocytosis, conformational changes in the AAV capsid lead to exposure of the PLA2 VP1 domain, which facilitates endosomal escape. The AAV is then transported to the nucleus mainly by motor proteins on cytoskeletal highways. It enters via nuclear pores and finishes uncoating its genome.
AAV genomes initiate replication using the ends of their ITR hairpins as primers. This leads to a series of complex steps involving strand displacement and nicking.49 In the end, new copies of the AAV genome are synthesized. The Rep proteins are key players in this process. It is important to realize that AAVs can only replicate in cells which have also been infected by adenovirus or similar helper viruses (this is why they are called “adeno-associated viruses”). Adenoviruses provide helper genes encoding proteins (e.g. E4, E2a, VA) that are vital for the successful completion of the AAV life cycle. After new AAV capsids have assembled from VP1, VP2, and VP3 and once AAV genomes have been replicated, the ssDNA genomes are threaded into the capsids via pores at their five-fold vertices.
AAVs are nonpathogenic, though a large fraction of people possess antibodies against at least some serotypes, so exposure to them is fairly common.
Adenovirus
Genome and Structure:
Adenovirus genomes are about 36 kb in size and are composed of linear dsDNA. They possess inverted terminal repeats (ITRs) which help facilitate replication and other functions. These genomes contain a variety of transcriptional units which are expressed at different times during the virus’s life cycle.52 E1A, E1B, E2A, E2B, E3, and E4 transcriptional units are expressed early during cellular infection. Their proteins are involved in DNA replication, transcriptional regulation, and suppression of host immune responses. The L1, L2, L3, L4, and L5 transcriptional units are expressed later in the life cycle. Their products include most of the capsid proteins as well as other proteins involved in packaging and assembly. Each transcriptional unit can produce multiple mRNAs through the host’s alternative splicing machinery.
The capsid of the adenovirus is about 90 nm in diameter and consists of three major proteins (hexon, penton, and fiber proteins) as well as a variety of minor proteins and core proteins. Hexon trimers are the most abundant component of the capsid, the pentameric pentons occur at the vertices, and trimeric fibers are positioned on top of the pentons.53 The fibers point outwards from the capsid and end in knob domains which bind to cellular receptors. In Ad5, a commonly studied type of adenovirus, the fiber knob primarily binds to the coxsackievirus and adenovirus receptor (CAR). That said, Ad5’s fiber knob can also bind to alternative receptors such as vascular cell adhesion molecule 1 and heparan sulfate proteoglycans.
Minor capsid proteins include pIX, pIIIa, pVI, and pVIII. The pIX protein interlaces between hexons and helps stabilize the capsid. Though pIX is positioned in the crevices between the hexons, it is still exposed to the outside environment. By contrast, the pIIIa, pVI, and pVIII proteins bind to the inside of the capsid and contribute further structural stabilization. When the adenovirus is inside of the acidic endosome during infection, conformational changes in the capsid release the pVI protein, which facilitates endosomal escape through membrane lytic activity.
Adenovirus core proteins include pV, pVII, protein μ (also known as pX), adenovirus proteinase (AVP), pIVa2, and terminal protein (TP).54 The pVII protein has many positively-charged arginine residues and so functions to condense the viral DNA. The pV protein bridges the core with the capsid through interactions with pVII and with pVI. AVP cleaves various adenoviral proteins (pIIIa, TP, pVI, pVII, pVIII, pX) to convert them to their mature forms.55 The pIVa2 and pX proteins interact with the viral DNA and may play roles in packaging or replication. TP binds to the ends of the genome and is essential for localizing the viral DNA in the nucleus and for viral replication.
Adenovirus infects cells by binding its fiber knob to cellular receptors such as CAR (in the case of Ad5). The penton then binds certain αv integrins, positioning the viral capsid for endocytosis.56 When the endosome acidifies, the adenovirus capsid partially disassembles, fibers and pentons fall away, and pVI is released.57 The pVI protein’s membrane lytic activity facilitates endosomal escape. Partially disassembled capsids then undergo dynein-mediated transport along microtubules and dock at the entrance to nuclear pores. The capsids further disassemble and release their DNA through the nuclear pore. This DNA remains complexed with pVII after it enters the nucleus.
Adenoviral transcription is initiated by the E1A protein, inducing expression of early genes.58 This subsequently leads to expression of the E2, E3, and E4 transcriptional units, which help the virus escape immune responses. This cascade leads to expression of the L1, L2, L3, L4, and L5 transcriptional units, which mainly synthesize viral structural proteins and facilitate capsid assembly.
In the nucleus, adenovirus genomes replicate within dense complexes of protein that can be seen as spots via fluorescence microscopy. Replication begins at the ITRs and is primed by TP.59 Several more viral proteins and host proteins also aid the initiation of replication. Nontemplate strands are displaced during replication but may reanneal and act as template strands later. Adenovirus DNA binding protein and adenovirus DNA polymerase play important roles in replication. Once the genome has been replicated, TP undergoes cleavage into its mature form, signaling for packaging of new genomes.
The adenoviral capsid assembly and maturation process occurs in the nucleus.58 Once enough assembled adenoviruses have accumulated, they rupture the nuclear membrane using adenoviral death protein and subsequently lyse the cell, releasing adenoviral particles.
Herpes Simplex Virus 1 (HSV-1)
Genome and Structure:
HSV-1 genomes are about 150 kb in size and are composed of linear dsDNA. These genomes include a unique long (UL) region and a unique short (US) region.60 The UL and US regions are both flanked by their own inverted repeats. The terminal inverted repeats are called TRL and TRS while the internal inverted repeats are called IRL and IRS. HSV-1 contains approximately 80 genes, though the complexity of its genomic organization makes an exact number of genes difficult to obtain. As with many other viruses, HSV-1 genomes encode early, middle, and late genes. The early genes activate and regulate transcription of the middle and late genes. Middle genes facilitate genome replication and late genes mostly encode structural proteins.
The diameter of HSV-1 ranges from about 155 nm to 240 nm.61 Its virions include an inner icosahedral capsid (with a 125 nm diameter) surrounded by tegument proteins which are in turn enveloped by a lipid membrane containing glycoproteins.
HSV-1’s icosahedral capsid consists of a variety of proteins. Some of the most important capsid proteins are encoded by the UL19, UL18, UL38, UL6, UL17, and UL25 genes.62 The UL19 gene encodes the major capsid protein VP5, which forms pentamers and hexamers for the capsid. These VP5 pentamers and hexamers are glued together by triplexes consisting of two copies of VP23 (encoded by UL18) and one copy of VP19C (encoded by UL38).63 The UL6 gene encodes the protein that makes up the portal complex, a structure used by HSV-1 to release its DNA during infection. Each HSV-1 capsid has a single portal (composed of 12 copies of the portal protein) located at one of the vertices. UL17 and UL25 encode additional structural proteins that stabilize the capsid by binding on top of the other vertices. These two proteins also serve as a bridge between the capsid core and the tegument proteins.
The tegument of HSV-1 contains dozens of distinct proteins. Some examples include pUL36, pUL37, pUL7, and pUL51 proteins. The major tegument proteins are pUL36 and pUL37. The pUL36 protein binds on top of the UL17-UL25 complexes at the capsid’s vertices.64 The pUL37 protein subsequently associates with pUL36. The pUL51 protein associates with cytoplasmic membranes in infected cells and recruits the pUL7 protein.65 This pUL51-pUL7 interaction is important for HSV-1 assembly. HSV-1 has many more tegument proteins which play various functional roles.
HSV-1’s envelope contains up to 16 unique glycoproteins. Four of these glycoproteins (gB, gD, gH, and gL) are essential for viral entry into cells.66 The gD glycoprotein first binds to one of its cellular receptors (nectin-1, herpesvirus entry mediator or HVEM, or 3-O-sulfated heparan sulfate). This binding event triggers a conformational change in gD that allows it to activate the gH/gL heterodimer. Next, gH/gL activate gB which induces fusion of HSV-1’s envelope with the cell membrane. Though the remaining 12 envelope glycoproteins are poorly understood, it is thought that they also play roles that influence cellular tropism and entry.
After binding to cellular receptors via its glycoproteins, HSV-1 induces fusion of its envelope with the host cell membrane.67 The capsid is trafficked to nuclear pores via microtubules. Since the capsid is too large to pass through a nuclear pore directly, the virus instead ejects its DNA through the pore via the portal complex.68
HSV-1 replicates its genome and assembles its capsids in the nucleus. But the assembled capsids are again too large to exit the nucleus through nuclear pores. To overcome this issue, HSV-1 first buds via the inner nuclear membrane into the perinuclear cleft (the space between nuclear membranes), acquiring a primary envelope.67 This process is driven by a pair of proteins (pUL34 and pUL31) which together form the nuclear egress complex. Next, the primary envelope fuses with the outer nuclear membrane, releasing the assembled capsids into the cytosol.
To acquire its final envelope, the HSV-1 capsid likely buds into the trans-Golgi network or into certain tubular vesicular organelles.69 These membranes contain the virus’s envelope proteins, which reach them after synthesis via various secretory pathways. One player is the pUL51 tegument protein, which is already associated with the membrane into which the virus buds. The interaction between pUL51 and pUL7 helps facilitate recruitment of the capsid to the membrane. (Capsid envelopment is also coupled in many other ways to formation of the outer tegument.) The enveloped virion is then trafficked through the secretory system and eventually packaged into exosomes that fuse with the cell membrane, releasing completed virions into the extracellular environment.
In humans, HSV-1 infects the epithelial cells first and produces viral particles.70 It subsequently enters the termini of sensory neurons, undergoes retrograde transport into the brain, and remains in the central nervous system in a dormant state. During periods of stress in the host, the virus is reactivated and undergoes anterograde transport to infect epithelial cells once again.
1. Wain-Hobson, S., Sonigo, P., Danos, O., Cole, S. & Alizon, M. Nucleotide sequence of the AIDS virus, LAV. Cell 40, 9–17 (1985).
2. Wilusz, J. Putting an ‘End’ to HIV mRNAs: capping and polyadenylation as potential therapeutic targets. AIDS Res. Ther. 10, 31 (2013).
3. Marcello, A., Zoppé, M. & Giacca, M. Multiple Modes of Transcriptional Regulation by the HIV-1 Tat Transactivator. IUBMB Life 51, 175–181 (2001).
4. Brigati, C., Giacca, M., Noonan, D. M. & Albini, A. HIV Tat, its TARgets and the control of viral gene expression. FEMS Microbiol. Lett. 220, 57–65 (2003).
5. Harrison, J. J. E. K. et al. Cryo-EM structure of the HIV-1 Pol polyprotein provides insights into virion maturation. Sci. Adv. 8, eabn9874 (2022).
15. Abraham, L. & Fackler, O. T. HIV-1 Nef: a multifaceted modulator of T cell receptor signaling. Cell Commun. Signal. 10, 39 (2012).
16. Mehle, A. et al. Vif Overcomes the Innate Antiviral Activity of APOBEC3G by Promoting Its Degradation in the Ubiquitin-Proteasome Pathway. J. Biol. Chem. 279, 7792–7798 (2004).
17. Donahue, J. P., Vetter, M. L., Mukhtar, N. A. & D’Aquila, R. T. The HIV-1 Vif PPLP motif is necessary for human APOBEC3G binding and degradation. Virology 377, 49–53 (2008).
18. Guo, F., Cen, S., Niu, M., Saadatmand, J. & Kleiman, L. Inhibition of tRNALys3-Primed Reverse Transcription by Human APOBEC3G during Human Immunodeficiency Virus Type 1 Replication. J. Virol. 80, 11710–11722 (2006).
19. Kogan, M. & Rappaport, J. HIV-1 Accessory Protein Vpr: Relevance in the pathogenesis of HIV and potential for therapeutic intervention. Retrovirology 8, 25 (2011).
20. Hladik, F. & McElrath, M. J. Setting the stage: host invasion by HIV. Nat. Rev. Immunol. 8, 447–457 (2008).
21. Müller, T. G., Zila, V., Müller, B. & Kräusslich, H.-G. Nuclear Capsid Uncoating and Reverse Transcription of HIV-1. Annu. Rev. Virol. 9, 261–284 (2022).
22. Müller, T. G. et al. HIV-1 uncoating by release of viral cDNA from capsid-like structures in the nucleus of infected cells. Elife 10, e64776 (2021).
23. Marchand, C., Johnson, A. A., Semenova, E. & Pommier, Y. Mechanisms and inhibition of HIV integration. Drug Discov. Today Dis. Mech. 3, 253–260 (2006).
24. Hughes, S. H. & Coffin, J. M. What Integration Sites Tell Us about HIV Persistence. Cell Host Microbe 19, 588–598 (2016).
25. Freed, E. O. HIV-1 assembly, release and maturation. Nat. Rev. Microbiol. 13, 484–496 (2015).
26. Brant, A. C., Tian, W., Majerciak, V., Yang, W. & Zheng, Z.-M. SARS-CoV-2: from its discovery to genome structure, transcription, and replication. Cell Biosci. 11, 136 (2021).
27. Bai, Z., Cao, Y., Liu, W. & Li, J. The SARS-CoV-2 Nucleocapsid Protein and Its Role in Viral Structure, Biological Functions, and a Potential Target for Drug or Vaccine Mitigation. Viruses 13, 1115 (2021). https://doi.org/10.3390/v13061115
28. Schoeman, D. & Fielding, B. C. Coronavirus envelope protein: current knowledge. Virol. J. 16, 69 (2019).
29. Monje-Galvan, V. & Voth, G. A. Molecular interactions of the M and E integral membrane proteins of SARS-CoV-2. Faraday Discuss. (2021) doi:10.1039/D1FD00031D.
30. Collins, L. T. et al. Elucidation of SARS-CoV-2 budding mechanisms through molecular dynamics simulations of M and E protein complexes. J. Phys. Chem. Lett. 12, 12249–12255 (2021).
31. Arya, R. et al. Structural insights into SARS-CoV-2 proteins. J. Mol. Biol. 433, 166725 (2021).
32. Yang, H. & Rao, Z. Structural biology of SARS-CoV-2 and implications for therapeutic development. Nat. Rev. Microbiol. 19, 685–700 (2021).
33. Alsaadi, E. A. J. & Jones, I. M. Membrane binding proteins of coronaviruses. Future Virol. 14, 275–286 (2019).
34. Neuman, B. W. et al. A structural analysis of M protein in coronavirus assembly and morphology. J. Struct. Biol. 174, 11–22 (2011).
35. Boson, B. et al. The SARS-CoV-2 envelope and membrane proteins modulate maturation and retention of the spike protein, allowing assembly of virus-like particles. J. Biol. Chem. 296 (2021).
36. Zhang, J., Xiao, T., Cai, Y. & Chen, B. Structure of SARS-CoV-2 spike protein. Curr. Opin. Virol. 50, 173–182 (2021).
37. Walls, A. C. et al. Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein. Cell 181, 281–292.e6 (2020).
38. Peacock, T. P. et al. The furin cleavage site in the SARS-CoV-2 spike protein is required for transmission in ferrets. Nat. Microbiol. 6, 899–909 (2021).
39. Fertig, T. E. et al. The atomic portrait of SARS-CoV-2 as captured by cryo-electron microscopy. J. Cell. Mol. Med. 26, 25–34 (2022).
40. Schubert, K. et al. SARS-CoV-2 Nsp1 binds the ribosomal mRNA channel to inhibit translation. Nat. Struct. Mol. Biol. 27, 959–966 (2020).
41. Yuan, S. et al. Nonstructural Protein 1 of SARS-CoV-2 Is a Potent Pathogenicity Factor Redirecting Host Protein Synthesis Machinery toward Viral RNA. Mol. Cell 80, 1055–1066.e6 (2020).
42. Raj, R. Analysis of non-structural proteins, NSPs of SARS-CoV-2 as targets for computational drug designing. Biochem. Biophys. Reports 25, 100847 (2021).
43. Kirchdoerfer, R. N. & Ward, A. B. Structure of the SARS-CoV nsp12 polymerase bound to nsp7 and nsp8 co-factors. Nat. Commun. 10, 2342 (2019).
44. Roingeard, P. et al. The double-membrane vesicle (DMV): a virus-induced organelle dedicated to the replication of SARS-CoV-2 and other positive-sense single-stranded RNA viruses. Cell. Mol. Life Sci. 79, 425 (2022).
45. Baggen, J., Vanstreels, E., Jansen, S. & Daelemans, D. Cellular host factors for SARS-CoV-2 infection. Nat. Microbiol. 6, 1219–1232 (2021).
46. Sashittal, P., Zhang, C., Peng, J. & El-Kebir, M. Jumper enables discontinuous transcript assembly in coronaviruses. Nat. Commun. 12, 6728 (2021).
47. Wolff, G. et al. A molecular pore spans the double membrane of the coronavirus replication organelle. Science 369, 1395–1398 (2020).
48. Bracquemond, D. & Muriaux, D. Betacoronavirus Assembly: Clues and Perspectives for Elucidating SARS-CoV-2 Particle Formation and Egress. MBio 12, e02371-21 (2021).
49. Sha, S. et al. Cellular pathways of recombinant adeno-associated virus production for gene therapy. Biotechnol. Adv. 49, 107764 (2021).
50. Wang, D., Tai, P. W. L. & Gao, G. Adeno-associated virus vector as a platform for gene therapy delivery. Nat. Rev. Drug Discov. 18, 358–378 (2019).
51. Riyad, J. M. & Weber, T. Intracellular trafficking of adeno-associated virus (AAV) vectors: challenges and future directions. Gene Ther. 28, 683–696 (2021).
53. Gallardo, J., Pérez-Illana, M., Martín-González, N. & San Martín, C. Adenovirus Structure: What Is New? International Journal of Molecular Sciences 22, 5240 (2021). https://doi.org/10.3390/ijms22105240
55. Russell, W. C. & Kemp, G. D. Role of Adenovirus Structural Components in the Regulation of Adenovirus Infection. in The Molecular Repertoire of Adenoviruses I: Virion Structure and Infection (eds. Doerfler, W. & Böhm, P.) 81–98 (Springer Berlin Heidelberg, 1995). doi:10.1007/978-3-642-79496-4_6.
56. Nemerow, G. R. & Stewart, P. L. Role of αv Integrins in Adenovirus Cell Entry and Gene Delivery. Microbiol. Mol. Biol. Rev. 63, 725–734 (1999).
57. Pied, N. & Wodrich, H. Imaging the adenovirus infection cycle. FEBS Lett. 593, 3419–3448 (2019).
58. Georgi, F. & Greber, U. F. The Adenovirus Death Protein – a small membrane protein controls cell lysis and disease. FEBS Lett. 594, 1861–1878 (2020).
59. Hoeben, R. C. & Uil, T. G. Adenovirus DNA Replication. Cold Spring Harb. Perspect. Biol. 5 (2013).
60. McGeoch, D. J., Rixon, F. J. & Davison, A. J. Topics in herpesvirus genomics and evolution. Virus Res. 117, 90–104 (2006).
61. Laine, R. F. et al. Structural analysis of herpes simplex virus by optical super-resolution imaging. Nat. Commun. 6, 5980 (2015).
62. Mettenleiter, T. C., Klupp, B. G. & Granzow, H. Herpesvirus assembly: a tale of two membranes. Curr. Opin. Microbiol. 9, 423–429 (2006).
63. Heldwein, E. E. Up close with herpesviruses. Science 360, 34–35 (2018).
64. Fan, W. H. et al. The Large Tegument Protein pUL36 Is Essential for Formation of the Capsid Vertex-Specific Component at the Capsid-Tegument Interface of Herpes Simplex Virus 1. J. Virol. 89, 1502–1511 (2015).
65. Roller, R. J. & Fetters, R. The Herpes Simplex Virus 1 UL51 Protein Interacts with the UL7 Protein and Plays a Role in Its Recruitment into the Virion. J. Virol. 89, 3112–3122 (2015).
66. Hilterbrand, A. T., Daly, R. E. & Heldwein, E. E. Contributions of the Four Essential Entry Glycoproteins to HSV-1 Tropism and the Selection of Entry Routes. MBio 12, e00143-21 (2021).
67. Zeev-Ben-Mordehai, T., Hagen, C. & Grünewald, K. A cool hybrid approach to the herpesvirus ‘life’ cycle. Curr. Opin. Virol. 5, 42–49 (2014).
68. Newcomb, W. W., Cockrell, S. K., Homa, F. L. & Brown, J. C. Polarized DNA Ejection from the Herpesvirus Capsid. J. Mol. Biol. 392, 885–894 (2009).
For this guide, I will explain the fundamental biology of adenovirus capsid proteins with an emphasis on the context of gene therapy. While the guide is meant primarily for readers with an interest in applying adenovirus to gene therapy, it will not include much discussion of the techniques and technologies involved in engineering adenoviruses for such purposes. If you are interested in learning more about adenovirus engineering, you may enjoy my review paper “Synthetic Biology Approaches for Engineering Next-Generation Adenoviral Gene Therapies”. Here, I will focus mostly on the capsid of human adenovirus serotype 5 (Ad5) since it is the type of adenovirus most commonly used in gene therapy research, but I will occasionally describe other types of adenoviruses when necessary. Many of the presented concepts remain the same or similar across other types of adenoviruses.
The adenovirus consists of an icosahedral protein capsid enclosing a double-stranded DNA (dsDNA) genome. It possesses 12 fiber proteins which protrude from the capsid and help to facilitate cellular transduction. Adenoviruses are nonenveloped and approximately 90 nm in diameter (not including the fibers). The Ad5 genome is about 36 kb in size. Major capsid proteins of the adenovirus include the hexon, penton, and fiber. The minor capsid proteins are protein IIIa, protein VI, protein VIII, and protein IX. Inside the capsid, there are core proteins including protein V, protein VII, protein μ (also known as protein X), adenovirus proteinase (AVP), protein IVa2, and terminal protein (TP). There are also many proteins expressed during adenovirus infection which are not incorporated into mature capsids, including the E1A proteins (289R, 243R, 217R, 171R, and 55R), the E1B proteins (52k and 55k), the adenoviral DNA polymerase, and more.
Ad5’s genome contains a variety of transcriptional units which are expressed at different times during the viral life cycle. The E1A, E1B, E2A, E2B, E3, and E4 transcriptional units are expressed early during cellular infection. Their proteins are involved in DNA replication, transcriptional regulation, and suppression of host immune responses. The L1, L2, L3, L4, and L5 transcriptional units are expressed later in the life cycle. Their products include most of the capsid proteins as well as other proteins involved in packaging and assembly. Each transcriptional unit can produce multiple mRNAs through the host’s alternative splicing machinery.
Major capsid proteins
Adenovirus hexon represents the main structural component of the capsid. It is encoded as one of the products of the Ad5 L3 gene. Each capsid contains 240 trimers of the hexon protein (720 monomers) and each facet of the icosahedron consists of 12 trimers. The lower part of each hexon monomer consists of two eight-stranded β-barrels linked by a β-sheet. The eight-stranded β-barrels are known as jellyroll domains. In between the β-strands, long loops are present. These loops contain the seven hypervariable regions (HVRs) of the hexon, which differ in sequence composition between distinct adenovirus types. The loops form the upper portion of each hexon. HVR1 of Ad5 includes a 32-residue acidic loop which might be involved in neutralizing host defensins. The valley between the loop towers of Ad5 has been shown to interact with coagulation factors as well as to bind to the CD46 cellular receptor as an alternative cell entry mechanism.
Here, the structure of the Ad5 hexon trimer is shown from a side view and from a top view (PDB 1P30). All β-sheets are red, α-helices are cyan, and loops are magenta. Jellyroll domains are visible at the base of the side view and the HVR loops can be seen in the upper half of the side view. In the top view, the hexagonal shape of the hexon is clearly visible. The N- and C- termini are both located near the bottom of the hexon (adjacent to the inside of the virion). Some disordered regions are shown as dashed lines.
The 12 pentons serve to fill pentagonal gaps within the icosahedral capsid (which arise due to the geometry of the hexons). Penton is encoded as one of the products of the Ad5 L2 gene. Each penton also acts as a base onto which a fiber protein is anchored. Adenovirus pentons are pentamers, with each monomeric subunit consisting of a single jellyroll domain for the lower part and both a hypervariable loop and a variable loop at the top. In Ad5 and many other human adenoviruses, the penton hypervariable loop includes an RGD amino acid sequence. RGD is both an αv integrin binding motif and a target for adenovirus neutralization by the enteric defensin HD5. Importantly, the penton’s RGD motif is essential for cellular transduction via clathrin-coated pits. RGD may also play some role in endosomal escape. The other penton variable loop (distinct from the hypervariable loop) is poorly understood from a functional standpoint. Both the hypervariable loop and the variable loop might serve as decent sites for sequence modification in the context of gene therapy vectors. The penton N-terminal domain consists of an approximately 50 amino acid sequence which extends into the inside of the adenovirus virion. This sequence is mostly disordered except for the part nearest to the jellyroll domain (residues 37-51 in Ad5), which interacts with two copies of protein IIIa.
Here, the structure of the Ad5 penton is shown from side and top views (PDB 3IZO). Coloration is by subunit. In the side view, the intravirion N-terminal domains are visible at the bottom, the jellyroll domains can be seen as the groups of β-sheets in the middle, and the loops are present at the upper region. The top view clearly illustrates the pentagonal symmetry of the penton. It should be noted that, in this structure, some of the loops are missing due to the difficulty of reconstructing them at high resolution. Of special relevance here is that the loop with the RGD sequence should be located at the top of the penton (in the gap between the uppermost α-helix and a nearby loop which both terminate prematurely).
Ad5’s 12 trimeric fibers are anchored onto the tops of the pentons. They are encoded as a product of the L5 gene. These fibers initiate cellular transduction through binding of the knob domain to cellular receptors. The primary receptor for Ad5 is the coxsackievirus and adenovirus receptor (CAR). That said, Ad5’s fiber knob can also bind to alternative receptors such as vascular cell adhesion molecule 1 and heparan sulfate proteoglycans. For Ad5, the fiber is about 37 nm in length, but other adenoviruses can have shorter or longer fibers. Fibers consist of an N-terminal tail domain, a shaft domain, and a C-terminal knob (also called head) domain. The three N-terminal tails anchor into some of the clefts between penton monomers, likely via a hydrophobic ring region. The shaft consists of a structure known as a trimeric β-spiral. Shaft flexibility plays a role in cellular transduction by facilitating interaction of the penton with its integrin receptor after binding of the knob to CAR. Many adenovirus fibers are known to have hinges at the third β-repeat from the N-terminal tail domain. These hinges arise from an insertion of a few extra amino acids within the third β-repeat which disrupts its structure and allows for it to flex. The C-terminal knob domain consists of an antiparallel β-sandwich and is responsible for trimerization of the fiber. Its C-termini are oriented back towards the capsid of the adenovirus.
Here, part of the structure of an Ad2 fiber is shown from two perspective views (PDB 1QIU). Though there are structures of the Ad5 fiber components available, only the above Ad2 fiber structure has been assembled into a complex and made publicly available. The Ad2 fiber is highly similar to the Ad5 fiber. Both Ad5 and Ad2 fibers have 22 β-repeats. Only a few β-repeats are included in the above structures, but that should be enough to grant an intuitive understanding of the general fiber organization.
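The copy numbers quoted for the major capsid proteins follow from the geometry of the icosahedron, and a short tally makes this easy to check. The sketch below uses the per-facet and per-vertex figures from the text. The facet and vertex counts (20 and 12) are general properties of an icosahedron rather than values stated above, and the variable names are only illustrative.

```python
# Geometry of an icosahedron (general facts, not taken from the text above)
FACETS = 20
VERTICES = 12

# Values quoted in the text
HEXON_TRIMERS_PER_FACET = 12
PENTONS_PER_VERTEX = 1
FIBERS_PER_PENTON = 1

capsid = {
    # name: (number of oligomers per virion, monomers per oligomer)
    "hexon": (FACETS * HEXON_TRIMERS_PER_FACET, 3),
    "penton": (VERTICES * PENTONS_PER_VERTEX, 5),
    "fiber": (VERTICES * PENTONS_PER_VERTEX * FIBERS_PER_PENTON, 3),
}

monomer_counts = {}
for name, (oligomers, subunits) in capsid.items():
    monomer_counts[name] = oligomers * subunits
    print(f"{name}: {oligomers} oligomers x {subunits} = {monomer_counts[name]} monomers")
```

The hexon line reproduces the 240 trimers and 720 monomers given earlier. The penton and fiber lines give 60 and 36 monomers per virion, respectively.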
Minor capsid proteins
Ad5 protein IX (pIX) is a 140 amino acid protein found nestled between hexons which confers greater thermostability to the capsid relative to mutants lacking pIX. There are 240 copies of pIX in the capsid. It has an N-terminal domain, a rope domain, and a C-terminal domain. The N-terminal domains of three pIX monomers interlace to form a triskelion structure in the valleys between some of the hexons. The rope domain (also called linker domain) is often disordered and connects the N- and C-terminal domains. The C-terminal domain is an α-helix which forms a coiled-coil structure with the helices of other pIX monomers. This coiled coil consists of four α-helices (three parallel and one antiparallel), each from a different pIX monomer. Four triskelions and three α-helix bundles are present in each icosahedral facet of the capsid. It should be noted that some of the triskelions take on slightly different structural features depending on which hexons they are associated with within a given facet. Though all of the C-termini of pIX are exposed on the capsid surface, they can still be described as resting within crevices between hexons. Because of this, spacer peptides are usually necessary when engineering Ad5 pIX-fusions such that the added protein is elevated out of the crevices.
Here, four copies of Ad5 pIX are shown interlacing among four hexons (top and side views) (PDB 6B1T). The C-terminal domain α-helical bundle of pIX is clearly visible. The N-terminal domain triskelion structures are not visible in these views. Hexons are portrayed in cool colors and the pIX copies are shown in magenta. Some disordered regions are shown as dashed lines.
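The pIX numbers given above are mutually consistent, which can be verified by counting monomers two ways: once through the N-terminal triskelions and once through the C-terminal helix bundles. This is only a bookkeeping sketch. The facet count of 20 is a property of the icosahedron rather than a value from the text, and it assumes every pIX monomer contributes one N-terminal domain to a triskelion and one helix to a bundle.

```python
FACETS = 20                   # icosahedron geometry (not stated in the text)
TRISKELIONS_PER_FACET = 4     # from the text
MONOMERS_PER_TRISKELION = 3   # three N-terminal domains interlace
BUNDLES_PER_FACET = 3         # from the text
HELICES_PER_BUNDLE = 4        # three parallel helices plus one antiparallel
STATED_PIX_COPIES = 240       # from the text

# Count pIX monomers via their N-terminal domains and via their C-terminal helices
copies_from_n_termini = FACETS * TRISKELIONS_PER_FACET * MONOMERS_PER_TRISKELION
copies_from_c_termini = FACETS * BUNDLES_PER_FACET * HELICES_PER_BUNDLE

print(copies_from_n_termini, copies_from_c_termini, STATED_PIX_COPIES)
```

Both routes give 12 monomers per facet and therefore 240 per capsid, matching the stated copy number.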
The Ad5 protein IIIa (pIIIa) plays a structural role in stabilizing the capsid from the inside. Five copies of pIIIa are found under each vertex of the Ad5 capsid. It is 585 amino acids in length, but only residues 7 to 300 have been structurally traced at high resolution. Its N-terminal domain connects the penton and the five adjacent hexons. (These are known as the peripentonal hexons. The peripentonal hexons plus the penton are collectively named the group-of-six or GOS.) Its C-terminal domain binds protein VIII (another structural protein which will be discussed later). The traced part of the pIIIa structure consists of two globular domains connected by a long α-helix.
Above, traced parts of five pIIIa proteins are shown on the underside of a part of the Ad5 capsid (perspective is from the interior) (PDB 6B1T). Hexons are colored blue, the penton is colored yellow, and pIIIa is colored bright pink. The same structure is shown below from a side perspective.
Ad5 protein VI (pVI) starts out as 250 amino acids long but is cleaved by AVP at two sites, yielding multiple peptides. The first site is after residue 33 and the second is after residue 239. The middle part contains a predicted amphipathic α-helix (residues 34-54) which inserts into the host endosomal membrane. This alters the membrane’s curvature and helps facilitate lysis of the endosome, allowing the adenovirus to escape into the cytosol. The middle part also contains a domain (residues 109-143) which sometimes binds to the inner surface of the capsid in the cavities between certain hexons. The N-terminal peptide pVIN also binds to cavities between hexons. It has been suggested that this affinity hides the first pVI cleavage site in these cavities, preventing release of the membrane lytic peptide. During intracellular trafficking, environmental changes may allow adenovirus protein VII (a core protein) to outcompete pVI for the binding sites between hexons, causing release of the membrane lytic peptide. Finally, the C-terminal peptide pVIC is a cofactor which helps activate AVP. The pVIC peptide binds covalently to AVP and slides along the adenoviral genome, using the DNA as a track to reach all of the substrates in the core and the inner capsid surface. There are approximately 360 copies of protein VI in the Ad5 virion. Unfortunately, high-resolution structural data on pVI are scarce due to its variable position in the adenovirus virion.
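To make the pVI cleavage description concrete, the residue ranges of the resulting peptides can be worked out from the two cut positions. The sketch below is only a bookkeeping aid. The `cleave` helper is a hypothetical function written for this illustration, and it assumes 1-based residue numbering with each cut falling immediately after the stated residue.

```python
def cleave(length, cut_after):
    """Return 1-based inclusive (start, end) ranges of the fragments produced
    by cutting a protein of `length` residues after each listed position."""
    fragments = []
    start = 1
    for site in sorted(cut_after):
        fragments.append((start, site))
        start = site + 1
    fragments.append((start, length))
    return fragments


PVI_LENGTH = 250            # from the text
PVI_CUT_AFTER = [33, 239]   # AVP cleavage sites from the text
NAMES = ["pVIN (N-terminal peptide)", "middle part", "pVIC (C-terminal peptide)"]

pvi_fragments = cleave(PVI_LENGTH, PVI_CUT_AFTER)
for name, (start, end) in zip(NAMES, pvi_fragments):
    print(f"{name}: residues {start}-{end} ({end - start + 1} aa)")
```

With these inputs the middle part spans residues 34-239, so it contains both the amphipathic helix (residues 34-54) and the hexon-binding domain (residues 109-143) described above.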
Ad5’s protein VIII (pVIII) also contributes to structurally stabilizing the adenoviral capsid from the interior. It starts as a 227-residue protein which is cleaved by AVP at three sites, yielding two large peptides and two small peptides. The two large peptides stay together and bind between hexons. Some pVIII copies wedge between pIIIa and the peripentonal hexons, helping to connect the peripentonal hexons to the next set of surrounding hexons. Some pVIII copies are located underneath the nine hexons on the middle face of each icosahedral facet (known as the group-of-nine or GON). An interesting aspect of pVIII-hexon interactions is that pVIII can engage in β-sheet augmentation, where a β-strand from pVIII is incorporated into one of the jellyroll domains of a nearby hexon. Not much is known about the two smaller peptides from pVIII except that these peptides do not appear to bind the capsid in a symmetric fashion.
Here, the traced parts of pVIII (red) are shown interwoven into a piece of the Ad5 capsid from an interior perspective (PDB 6B1T). Hexons are shown in shades of blue, the penton is shown in yellow, and pIIIa is displayed in bright pink.
Core proteins which interact directly with the capsid
Adenovirus protein V (pV) is a positively charged protein which can form heterodimers with the pVII core protein. That said, pV exists in a dimer-monomer equilibrium, so the binding to pVII is often transient. There also are direct associations between pV and the pVI capsid protein. These associations between pVII, pV, and pVI likely act to bridge the adenovirus core with the adenovirus capsid. In addition, pV-pVII heterodimers might interact with core protein μ. Each virion contains about 150 copies of pV. Most of the copies of pV are released during the beginning of uncoating. Interestingly, pV is not essential for adenovirus capsid assembly.
Protein VII (pVII) is a positively charged protein which plays a central role in condensing the adenovirus genome to fit into the capsid. It has many arginine residues which contribute to its positive charge. AVP cleaves pVII at residues 13 and 24. The resulting middle peptide (including amino acids 13 through 24) might compete with pVI for hexon binding sites during adenovirus assembly. As mentioned earlier, environmental changes during intracellular trafficking may allow pVII to outcompete pVI for their hexon binding sites, causing release of the membrane lytic peptide from pVI cleavage. Though pVII acts as a functional analogue of the histone, it does not share much structural similarity with histones and does not replace histones when introduced into the cellular nucleus. During infection, the viral genomic DNA as complexed with pVII is imported through nuclear pores. While pVII is important for condensing the adenoviral genome, it is not strictly required for assembly and packaging. In addition, pVII functions in signaling for the suppression of host innate immune responses. It binds to high mobility group B (HMGB) protein 1, a factor which is normally released from cells exposed to inflammation and which acts as a danger signal for the immune system. The adenoviral pVII prevents release of HMGB protein 1 and thereby dampens innate immune responses. Finally, pVII helps to regulate the progression of various steps during adenovirus genome replication.
This guide has centered on explaining the structures and functions of the Ad5 capsid proteins as well as the core proteins which are involved in key structural interactions with the capsid proteins. But this is only the beginning of learning about adenovirus biology. As mentioned in the introductory section, there are other core proteins including protein μ, the adenovirus proteinase, protein IVa2, and terminal protein which primarily interact with the adenovirus genome. Furthermore, the complex life cycle of the adenovirus requires numerous replication and packaging proteins (as well as interesting interactions with host cells) not covered here. Despite the specific focus of this guide, I hope that it is helpful to the reader for gaining a better idea of how the adenovirus capsid works. Perhaps this text will even provide a valuable bedrock of understanding for interested readers who are working on Ad5 capsid engineering projects.
 L. T. Collins and D. T. Curiel, “Synthetic Biology Approaches for Engineering Next-Generation Adenoviral Gene Therapies,” ACS Nano, Aug. 2021, doi: 10.1021/acsnano.1c04556.
 S. Kulanayake and S. K. Tikoo, “Adenovirus Core Proteins: Structure and Function,” Viruses, vol. 13, no. 3, 2021, doi: 10.3390/v13030388.
 J. Gallardo, M. Pérez-Illana, N. Martín-González, and C. San Martín, “Adenovirus Structure: What Is New?,” International Journal of Molecular Sciences, vol. 22, no. 10, 2021, doi: 10.3390/ijms22105240.
 E. Vigne et al., “Genetic manipulations of adenovirus type 5 fiber resulting in liver tropism attenuation,” Gene Ther., vol. 10, no. 2, pp. 153–162, 2003, doi: 10.1038/sj.gt.3301845.
 S. A. Nicklin, E. Wu, G. R. Nemerow, and A. H. Baker, “The influence of adenovirus fiber structure and function on vector development for gene therapy,” Mol. Ther., vol. 12, no. 3, pp. 384–393, Sep. 2005, doi: 10.1016/j.ymthe.2005.05.008.
 V. S. Reddy and G. R. Nemerow, “Structures and organization of adenovirus cement proteins provide insights into the role of capsid maturation in virus entry and infection,” Proc. Natl. Acad. Sci., vol. 111, no. 32, pp. 11715–11720, Aug. 2014, doi: 10.1073/pnas.1408462111.
 J. Vellinga et al., “Spacers Increase the Accessibility of Peptide Ligands Linked to the Carboxyl Terminus of Adenovirus Minor Capsid Protein IX,” J. Virol., vol. 78, no. 7, pp. 3470–3479, Apr. 2004, doi: 10.1128/JVI.78.7.3470-3479.2004.
Station Eleven by Emily St. John Mandel: 98/100. Much of the essence of art is to reflect what makes us human, helping us better explain to ourselves what makes us tick. Station Eleven is a science fiction novel about a deadly flu pandemic which brings about the end of the world. Notably, it was written several years prior to the emergence of COVID-19. Emily St. John Mandel wields the premise masterfully to touch our souls and help us come to terms with human kindness, cruelty, hope, and vulnerability. Through its deep tragedy and heartfelt characters, the book manages to link questions of the individual and the global. We take a hard look at how the meaning of civilization connects to the meaning of life. Emily St. John Mandel’s prose puts billions to death. Those who survive must find purpose against the backdrop of the visceral viciousness of the apocalypse. Some immerse themselves in art, traveling the postapocalyptic wilderness and performing Shakespeare plays for pockets of survivors. Some join a religious cult led by a violent prophet who resembles history’s most monstrous men. Yet even this figure is skillfully humanized (though not exonerated) as having emerged from a frightened and damaged boy. Richly constructed character histories weave together in the end, creating a gorgeous tapestry which reveals both the inherent goodness and the intrinsic darkness of the human species. Station Eleven is lyrical, haunting, and intense. It immerses the reader in a realm which translates philosophy into the more brutally real language of emotion.
This Is How You Lose the Time War by Amal El-Mohtar and Max Gladstone: 98/100. I have a special fondness for fiction which reads like poetry. This Is How You Lose the Time War by Amal El-Mohtar and Max Gladstone represents a tour de force of far-future poetic science fiction which sparkles with imagination, intensity, and wonder. An epistolary novel, it is told through letters exchanged by a pair of time-traveling cyborg supersoldiers named Red and Blue respectively who start as mortal enemies on opposite sides of a war and gradually fall in love. Each letter is delivered through a distinct medium: powdered cod bone sprinkled over a biscuit, a code of mineral veins in lava, a pattern of a bee’s flight and the venom of its sting, and many more. Red and Blue often spend decades in different pasts and futures, taking on the forms of various people and animals as part of their war. Though this conflict’s degree of convolutedness is far beyond human comprehension, the authors expertly utilize lyrical language to transmit a tantalizing taste of its scope. The central characters are so far beyond human that they should seem alien to the reader, yet their emotions come across as piercing and visceral. Beyond this, the beauty of the language gives the narrative a songlike quality which instills every passage with sensation, crispness, and vivacity. In terms of symbolism and metaphor, the book contains more than enough fractal complexity to fill the Library of Congress with multilayered literary analyses. This Is How You Lose the Time War furthermore incorporates a wealth of fascinating philosophical ideas involving love, war, peace, power, and freedom which are built on top of its spectacular wordsmithing. This book makes me feel like I am sipping liquid beauty during the cool of early morning while watching the stars of an alien sky slip beneath the horizon.
Blindsight by Peter Watts: 98/100. It is difficult to describe Blindsight. I could clumsily slap labels onto the novel and call it literary psychological sci-fi horror with an emphasis on the philosophy of neuroscience. I could vaguely refer to it as a boiling froth of darkness replete with nightmarish poetics. I could say that it manages to incorporate both aliens and vampires in a terrifyingly believable fashion. I could pontificate on how the story oozes with malign hyperintelligence and conveys a sense of hurtling movement too fast to track with human eyes. Yet none of this can truly capture the frightening majesty of the narrative. More directly, Blindsight is a story about contact with aliens. After humanity first encounters the aliens, the governments of Earth send a group of cyborgs, freaks, and savants on a living spaceship to meet the aliens. The captain of this group is a vampire, a technologically resurrected predator with intelligence vastly exceeding that of any human. The protagonist (Siri Keeton) had half his brain surgically removed when he was a child, rendering him incapable of empathy and forcing him to learn how to navigate social interactions through purely algorithmic techniques. Siri’s unusual backstory and motivations are richly explored over the course of the story. The novel explores ideas surrounding radical neurodivergence, transhumanism, the effects of neurotechnology on society, intelligence, consciousness, artificial intelligence, empathy, the blurring of the human-machine divide, emotional abuse, ableism, and evolutionary biology. As the book progresses, numerous psychological and philosophical revelations accrue. The aliens are more truly alien than any other aliens I have encountered in fiction. It is through a certain aspect of these aliens that the book’s most intensely frightening philosophical proposition is unveiled, but I will not spoil that for the reader.
Prepare to be deeply disturbed in the most intellectually stimulating of ways.
The Chronoliths by Robert Charles Wilson: 97/100. Science fiction is the literature of ideas. Quality science fiction links these ideas to our own lives in a meaningful fashion. The Chronoliths by Robert Charles Wilson is a novel which successfully weaves together big ideas with intensely personal trajectories of individual human lives. Through this style of writing, it allows us to see ourselves in the characters and reflect upon our roles in the epic drama of civilization and the universe. The Chronoliths blends several stories into a unified narrative. It tells the story of icy monuments which periodically materialize at various locations across the Earth, causing death and destruction where they appear. These Chronoliths have writing on them, text which proclaims future military victories by a warlord named Kuin. It tells the story of an ordinary man named Scott Warden, his efforts to protect his daughter, and how his destiny is inextricably linked to the Chronoliths by the physical forces of nature. It tells the story of a genius physicist named Sulamith Chopra who finds herself increasingly obsessed with the Chronoliths and how they influence the flow of history. It tells the story of a single mother named Ashlee and her difficult relationship with her sociopathic son Adam Mills. I am struck by the deeply human identities of all of the characters (even many of the minor characters). They feel so vividly real with their struggles, quirks, backstories, and traumas. I tangibly feel their hopes and fears as they search for purpose in the midst of a troubled world. All of this is accentuated by the lovingly detailed global setting which glows with verisimilitude. I should mention that I am a longtime fan of Robert Charles Wilson’s writings. His short piece Utriusque Cosmi is perhaps my favorite story of all time. Yet even with my high expectations going into The Chronoliths, I was nonetheless floored by its haunting beauty.
The Sparrow by Mary Doria Russell: 95/100. It is not easy to incorporate theology into science fiction without proselytizing the reader, yet The Sparrow does an elegant job of examining philosophy of religion through a first contact lens. At a deeper level, this book is about the human search for meaning and belonging in the universe, so even nonreligious readers can viscerally appreciate most of its ideas. Some other important themes include the interplay between love, trauma, guilt, faith, anger, and healing. There are also some interesting (and reasonably balanced) forays into the psychology surrounding sexual abstinence of priests. The Sparrow charts the painful recovery of the sole survivor of a mission to make first contact with aliens through visiting them directly on their home planet. The survivor is Father Emilio Sandoz and he is physically disfigured and psychologically scarred by his experiences. The novel works backwards to explain what happened to him and the rest of the crew of the mission. This book includes some extremely disturbing occurrences. I believe that these occurrences were necessary for the story, but they might be triggering to some readers, so please be aware of this. On a lighter note, Mary Doria Russell’s writing clearly demonstrates her exceptional skills as a historian. Part of what makes this story feel so real is that it contains a wealth of impeccably researched cultural depth. Latin American settings, the history of Turkey, the bureaucracy of the Roman Catholic Church, and more are covered in loving detail. Furthermore, the characters show thoroughly believable backstories, quirky personalities, and complex psychological evolution. I care about these people. The Sparrow represents one of the most philosophically rich and thought-provoking books that I have yet encountered.
The Quantum Thief by Hannu Rajaniemi: 95/100. I would characterize The Quantum Thief as the most imaginative novel I have ever read. From beginning to end, it sparkles with kaleidoscopic strangeness. Though some readers might be put off by the onslaught of unfamiliar terminology, I found the bizarre language exhilarating. It tells the tale of a gentleman thief named Jean le Flambeur who goes through a series of convoluted adventures in a hyper-futuristic postsingularity version of our own solar system. The novel explores the unreliability of memory and mind in a future where advanced neurotechnology is ubiquitous and any dividing line between biology and technology has been completely obliterated. I possess great admiration for the sheer audacity of Rajaniemi’s creativity. The walking city on Mars (called the Oubliette) where much of the story takes place is only the tip of the iceberg. When people die in that city, their minds are transferred into colossal robotic monsters known as the Quiet which toil beneath the city on the surface of Mars. A detective accesses the Oubliette’s exomemory to solve the mystery of a murdered Chocolatier. The living spaceship named Perhonen flirts with the thief protagonist. Every line of the book adds more of these kinds of concepts. As the plot cascades, complex mysteries of missing memories and buried pasts unravel. All this mixes with the thrill of the heist, a cast of believable and emotionally resonant characters, a complex alien political landscape, and a sense that this futuristic society has been oddly suffused with French culture. It is difficult to properly describe the profoundly colorful weirdness of The Quantum Thief. You just have to read it for yourself.
Never Let Me Go by Kazuo Ishiguro: 95/100. For many, growing up is filled with both yearning and conflict. Never Let Me Go successfully captures the emotional intensity associated with the coming-of-age process while simultaneously investigating some dark concepts in bioethics. It is the story of Kathy, Tommy, Ruth, and a few others who grow up at an unusual English boarding school called Hailsham. The book chronicles the unfolding of their lives in a vividly believable and exquisitely detailed fashion as they hurtle towards an inevitable fate. They experience the familiar trials of growing up: navigating tricky social landscapes, falling in love, learning about the world, and forming their own identities. But there is a tragic context which overshadows these experiences. To reveal the specifics of this context would spoil some key aspects of the book, so I will only state that it explores some fascinating ideas in the area of medical science fiction. Despite the bioethics-related speculation which appears later in the novel, the narrative remains centered on the individual experiences of the characters, which fits well with its stylistic approach. Themes of mortality, love, friendship, and meaning are explored throughout. Perhaps most importantly, Never Let Me Go represents a deeply emotional story. By the end, I was weeping for the intricate characters who had decided to quietly accept something very sad indeed.
Exhalation by Ted Chiang: 95/100. As someone who was strongly influenced by Ted Chiang’s first short story collection “Stories of Your Life and Others”, I came into Exhalation with high expectations. I was not disappointed. Chiang possesses a special talent for crafting brilliant short pieces that combine intense clarity, tremendous conceptual ingenuity, and vast emotional depth. For instance, The Merchant and the Alchemist’s Gate follows the lyrical style of the classic One Thousand and One Nights, provides an uplifting narrative of loss and regret and redemption, and accesses themes of acceptance and fate. Another excellent story in the collection, The Truth of Fact, the Truth of Feeling, gives a balanced perspective on how technology influences the way our brains think and communicate while also examining both a complex relationship between a father and daughter and a linguistics-driven historical scenario. The Lifecycle of Software Objects examines the concept of raising artificially intelligent creatures as children in a highly believable fashion. Exhalation (the title story) takes place in an alternate universe populated by a very different sort of life, yet it precisely interrogates ideas of vital importance to both the grand human condition and the deeply personal. Ted Chiang has once again demonstrated himself as one of the greatest short form science fiction authors ever to live.
Childhood’s End by Arthur C. Clarke: 92/100. It is not easy to capture the sheer sense of awe which comes from contemplating that which is beyond human comprehension. Childhood’s End delivers a shockingly provocative glimpse into the sublime while forcing the reader to contemplate the place of humanity in the universe. As humans, many of us enjoy telling ourselves stories about loving gods. Those inclined towards Lovecraftian tales take the opposite approach, conjuring up nightmares of cosmic monsters. Arthur C. Clarke unflinchingly finds a middle ground between these extremes. At the staggering conclusion of Childhood’s End, we experience both the cold realization of our own insignificance and a spiritually satisfying transcendence. Clarke proposes that to truly understand the divine, we may need to transform into something which is no longer even remotely human. Perhaps I am in the minority in that I am not repulsed by this notion, though I certainly do have some reservations about it. This is a spectacularly thought-provoking novel. My only complaint is that the first two sections of the book are significantly less compelling than its Earth-shattering conclusion, though they are necessary to set it up. Because the story was published in 1953, it includes some very outdated sexist assumptions and racist terminology. (As a person who has read some of Clarke’s later novels, I can attest that he improved over time in this regard.) The characters and plot in the initial two-thirds of the book feel too stiff and detached for my taste. Nonetheless, this is more than made up for with the final portion of the story. If you want to think about the big questions and experience both extreme alienness and spiritual wonderment at the same time, you should read this book.
Cover image source: The Prologue and the Promise by Robert McCall
Many different types of CRISPR-Cas nucleases possess biotechnological relevance. For a newcomer, the menagerie of Cas proteins may seem overwhelming. It can be challenging to decide which type of CRISPR system to employ in one’s research. To help address this issue, I compiled these notes. While my guide is certainly not comprehensive, it still covers a wide swath of important Cas proteins and may prove valuable as a starting point for those interested in getting a sense of the field. One should be aware that the field of CRISPR technology is moving rapidly, so some of the nucleases described here might eventually be superseded by newly discovered and/or newly engineered Cas proteins. I would also like to mention that since these notes are specifically focused on types of Cas proteins, I have omitted direct explanations of some important CRISPR technologies such as base editors, prime editors, and dead Cas systems. I also have not directly explained important CRISPR-related concepts such as non-homologous end joining (NHEJ), homology-directed repair (HDR), and adeno-associated virus (AAV) vectors. I encourage the reader to look elsewhere to learn about these subjects since they are vital for having a strong understanding of CRISPR biotechnology. I hope that you enjoy reading my notes and find them useful for your own scientific endeavors!
SpCas9 represents one of the first discovered and most commonly used CRISPR-Cas proteins.1 It comes from Streptococcus pyogenes, a gram-positive bacterial pathogen. SpCas9 employs two nuclease domains to make blunt double-stranded cuts in DNA: the HNH domain for cutting the strand which pairs with the gRNA and the RuvC domain for cutting the other strand. The protospacer adjacent motif (PAM) of SpCas9 has the sequence 5’-NGG-3’, which limits the target sites that the nuclease can find. Though wild-type (WT) SpCas9 possesses a problematic level of off-target activity, several mutant variants of the enzyme have been engineered which give it much more precision.2,3 A few examples (among others) are eSpCas9-HF, eSpCas9(1.1), and HypaCas9. The eSpCas9-HF and eSpCas9(1.1) enzymes maintain robust on-target cleavage while reducing off-target effects.3 The HypaCas9 enzyme has similar properties, but with even fewer off-target effects.2
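To make the 5’-NGG-3’ constraint concrete, here is a minimal Python sketch that scans both strands of a DNA sequence for candidate SpCas9 sites. It is an illustration rather than a guide design tool: the function names are hypothetical, the 20-nucleotide protospacer length is a common convention that I am assuming here (it is not stated above), and the toy sequence is made up.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_spcas9_sites(seq, spacer_len=20):
    """List (strand, protospacer, pam) for each NGG PAM that has a
    full-length protospacer immediately 5' of it.
    spacer_len=20 is an assumed, commonly used length."""
    seq = seq.upper()
    sites = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # A lookahead is used so that overlapping PAMs are all reported.
        for match in re.finditer(r"(?=([ACGT]GG))", s):
            pam_start = match.start()
            if pam_start >= spacer_len:
                protospacer = s[pam_start - spacer_len:pam_start]
                sites.append((strand, protospacer, match.group(1)))
    return sites


# Made-up toy sequence: a 20-nt protospacer followed by an AGG PAM.
toy_sequence = "GATTACAGATTACAGATTAC" + "AGG" + "TTT"
sites = find_spcas9_sites(toy_sequence)
for strand, protospacer, pam in sites:
    print(strand, protospacer, pam)
```

Any stretch of DNA without a nearby GG dinucleotide on either strand yields no candidate sites, which is the sense in which the PAM limits the available targets.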
At 1053 amino acids in length, SaCas9 is significantly smaller than SpCas9 (which is 1368 amino acids long).4 SaCas9 can be used in mammalian cells, employs NNGRRT PAM sites (R is A or G), and uses RuvC and HNH domains for cutting. But without further engineering, SaCas9 has even lower target specificity than SpCas9. Fortunately, mutant versions of SaCas9 which exhibit improved targeting accuracy have been developed. Tan et al. engineered SaCas9-HF, a version of the protein which has much less off-target activity relative to the WT SaCas9 and retains its on-target activity.4 With such improvements, SaCas9-HF can serve as a useful alternative to SpCas9.
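PAM sequences such as NNGRRT are written with IUPAC degenerate base codes, so a small helper that expands those codes can test whether a given stretch of DNA satisfies a PAM. The sketch below is one possible way to do this in Python; the names `IUPAC`, `pam_to_regex`, and `matches_pam` are hypothetical, and only the codes needed for the PAMs in these notes (plus Y) are included.

```python
import re

# Subset of the IUPAC nucleotide codes, mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",    # purine
    "Y": "[CT]",    # pyrimidine
    "B": "[CGT]",   # anything except A
    "N": "[ACGT]",  # any base
}


def pam_to_regex(pam):
    """Compile a degenerate PAM string into a regular expression."""
    return re.compile("".join(IUPAC[code] for code in pam.upper()))


def matches_pam(pam, candidate):
    """Return True if `candidate` (same length as the PAM) satisfies it."""
    return pam_to_regex(pam).fullmatch(candidate.upper()) is not None


print(matches_pam("NNGRRT", "ACGAGT"))  # R positions are A and G
print(matches_pam("NNGRRT", "ACGCGT"))  # C is not allowed at an R position
print(matches_pam("NGG", "TGG"))
```

The first and third calls print True and the second prints False, since C is not a purine.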
The LbCas12a enzyme makes staggered cuts using a single RuvC domain (and no HNH domain), uses T-rich PAM sites, and catalyzes its own crRNA maturation.5 LbCas12a comes from Lachnospiraceae bacterium ND2006. LbCas12a has another remarkable property: the binding and cleavage of target dsDNA activates a separate part of the protein which nonspecifically cleaves any ssDNA in its vicinity. This nonspecific trans-cleavage activity is thought to occur as a result of a conformational change in the LbCas12a protein which exposes its RuvC domain for broader ssDNA attack after binding to target dsDNA.6 It should be noted that other type-V Cas proteins including AsCas12a (see corresponding section), FnCas12a (from the bacterium Francisella novicida), and AaCas12b (from the bacterium Alicyclobacillus acidoterrestris) have been shown to exhibit the same capabilities.5 There furthermore exist many RNA-guided RNA-targeting Cas proteins which possess the same types of abilities.7 There are likely many other type-V Cas proteins with these capabilities as well. The activation of type-V Cas proteins to perform indiscriminate ssDNA cleavage after exposure to target dsDNA has been exploited as a target-induced signal amplification method to develop novel molecular diagnostics.6
The AsCas12a protein (also called Cpf1) is derived from Acidaminococcus sp.,8 which are a group of anaerobic gram-negative bacteria. The protein exhibits several distinctive features compared to Cas9. AsCas12a utilizes a T-rich PAM site, unlike Cas9’s G-rich PAM. This is useful since it expands the possible targets for CRISPR. In particular, the T-rich PAM of AsCas12a can be useful when dealing with organisms that have AT-rich genomes such as Plasmodium falciparum. The naturally occurring form of AsCas12a does not require a tracrRNA. Instead, its CRISPR arrays are processed into just crRNAs, which serve to complete the functional AsCas12a-crRNA complex. Rather than creating blunt ends, AsCas12a makes staggered cuts with 4-5 nucleotide 5’ overhangs. This is useful since it increases the precision of non-homologous end joining (NHEJ) repair and allows insertion of DNA sequences at a chosen cut site with a desired orientation as specified by the base pairing of the insert with the overhang sequences. In addition, the AsCas12a protein employs a single RuvC domain to make its staggered cuts and does not have an HNH domain. AsCas12a has a lower tolerance for gRNA-target mismatches9 compared to SpCas9 and therefore demonstrates greater targeting specificity. As a result, AsCas12a shows fewer off-target effects overall. But it also has a lower editing efficiency compared to Cas9 proteins, which means that fewer cells receive any edits upon introduction of the AsCas12a. As described with LbCas12a, the AsCas12a protein also can carry out nonspecific ssDNA cleavage after it cuts its target dsDNA.
WT AsCas12a possesses high targeting specificity, shows low off-target effects, and makes 5’ overhangs which facilitate correct insert orientation (see the section on AsCas12a). These properties represent desirable qualities for therapeutic gene editing, but the low editing efficiency of AsCas12a limits its therapeutic potential. Because of this, Zhang et al. (in a collaboration between Editas and Integrated DNA Technologies) developed an engineered version of the protein which was dubbed AsCas12a ultra.9 This AsCas12a ultra protein was created using directed evolution in bacteria. It has two point mutations relative to WT AsCas12a, M537R and F870L. These mutations grant the AsCas12a ultra extremely high editing efficiency while maintaining the protein’s low level of off-target effects. For a variety of target sites, Zhang et al. demonstrated nearly 100% editing efficiency in HSPCs, iPSCs, T cells, and NK cells using AsCas12a ultra. They also showed 93% efficiency for simultaneous disruption of three genes in T cells. When performing knock-in edits, Zhang et al. achieved efficiencies of 60% in T cells, 50% in NK cells, and 30% in HSPCs. These impressive numbers illustrate the utility of AsCas12a ultra as a broadly applicable tool for therapeutic gene editing.
The AsCas12f1 protein consists of only 422 amino acids, making it one of the smallest Cas proteins known.10 It comes from a type of gram-positive iron-oxidizing bacteria called Acidibacillus sulfuroxidans. AsCas12f1 makes staggered double-stranded breaks in target DNA and recognizes 5’ T-rich PAMs. Even with minimal engineering (just the construction of gRNA from combining its tracrRNA and mature crRNA), Wu et al. showed that AsCas12f1 exhibits usable levels of activity in mammalian cells.10 When expressed directly in mammalian cells via a plasmid, the protein achieved a maximum indel efficiency of 32.8%. When delivered to mammalian cells by AAV-DJ, the maximum indel efficiency was 11.5%. The AsCas12f1 protein possesses considerable promise as a compact therapeutic gene editing tool.
Kim et al.’s engineered Un1Cas12f
At 529 amino acids in length, the Un1Cas12f nuclease represents one of the smallest Cas proteins yet discovered.11 This is useful since the small size of Un1Cas12f’s gene allows it to easily fit within AAV vectors. It comes from an uncultured archaeon and is classified as a type-V CRISPR nuclease, a class whose members utilize a C-terminal RuvC domain and do not possess an HNH domain. Though the original Un1Cas12f-gRNA complex has very low editing efficiency in eukaryotic cells, Kim et al. were able to intensively engineer the gRNA using a rational design strategy and achieve an 867-fold improvement of indel frequency in mammalian cells.12 They also showed that the Un1Cas12f gene and gRNA gene could be delivered to the cells using AAVs. Because of its small size, Un1Cas12f may serve as an excellent scaffold for creating base editors and prime editors which fit inside of AAVs.
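The size argument can be made concrete with some rough arithmetic. The Python sketch below converts the protein lengths quoted in these notes into coding sequence lengths (three base pairs per amino acid plus a stop codon) and compares them against an AAV packaging capacity of about 4.7 kb. That capacity is a commonly cited approximate figure which I am assuming here, not a number taken from these notes, and the estimate ignores promoters, gRNA cassettes, inverted terminal repeats, and other elements that a real vector also needs.

```python
AAV_CAPACITY_BP = 4700  # assumed approximate packaging limit (~4.7 kb)

# Protein lengths in amino acids, as quoted in these notes.
CAS_PROTEIN_LENGTHS = {
    "SpCas9": 1368,
    "SaCas9": 1053,
    "Cas13bt1": 804,
    "Cas13bt3": 775,
    "Un1Cas12f": 529,
    "AsCas12f1": 422,
}


def coding_length_bp(n_amino_acids):
    """Coding sequence length in bp: 3 bp per residue plus one stop codon."""
    return (n_amino_acids + 1) * 3


def remaining_capacity_bp(n_amino_acids, capacity=AAV_CAPACITY_BP):
    """Base pairs left over for promoters, gRNA, fusions, and so on."""
    return capacity - coding_length_bp(n_amino_acids)


for name, length in CAS_PROTEIN_LENGTHS.items():
    print(f"{name}: {coding_length_bp(length)} bp coding, "
          f"{remaining_capacity_bp(length)} bp remaining")
```

Under these assumptions, the SpCas9 coding sequence alone (4107 bp) leaves under 600 bp of room, while Un1Cas12f (1590 bp) leaves over 3 kb, which is why compact nucleases are attractive scaffolds for larger fusion constructs.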
The CasMINI protein is another engineered CRISPR nuclease derived from Cas12f,13 which comes from an uncultured archaeon. This Cas12f is the same as the Un1Cas12f used by Kim et al.12 Since Cas12f has little to no editing activity in mammalian cells, Xu et al. used rational design to optimize the associated gRNA and employed directed evolution to optimize the protein itself.13 CasMINI, a 529 amino acid protein, was the end result of these approaches. When CasMINI was modified to make dCasMINI-VPR (the VPR is a protein fusion which activates certain genes), it performed with comparable efficiency relative to the commonly used dLbCas12a-VPR. In some cases, dCasMINI-VPR actually outperformed dLbCas12a-VPR. When dCasMINI was modified by fusing adenine base editor (ABE) domains at its N-terminus, the dCasMINI-ABE constructs performed base editing at comparable efficiency relative to dLbCas12a-ABE proteins. Because of their small sizes, the genes encoding the dCasMINI-ABE designs could easily fit into AAV vectors, though Xu et al. did not test this in their paper. Furthermore, even the genes encoding CasMINI prime editors should fit into AAV vectors. It should be noted that the most efficient dCasMINI-ABE base editing occurred in a narrow window precisely 3-4 bp downstream of the PAM site. When CasMINI was tested for its ability to perform gene editing by making indels, it showed significantly improved activity over Cas12f, though the editing efficiencies were still fairly low at around 5-10%.
The Cas12j enzyme, also known as CasΦ, comes from the genomes of huge bacteriophages of the Biggiephage clade.14 This is remarkable since CRISPR systems have usually been found in bacteria and archaea rather than viruses (though the prevalence of such machinery in viruses is perhaps underestimated). It has been hypothesized that Biggiephages use Cas12j to cut the DNA of other competing bacteriophages. There exist subtypes of Cas12j such as Cas12j-1, Cas12j-2, and Cas12j-3. All of the Cas12j nucleases are small at between 700 and 800 amino acids in length. The Cas12j nuclease cuts target dsDNA using a single C-terminal RuvC domain. Cas12j’s RuvC domain has a small amount of homology to the TnpB protein superfamily from which type-V Cas proteins evolved, yet it still shares <7% amino acid identity overall with type-V Cas proteins. Cas12j is most closely related to a type of TnpB group which is distinct from the type-V enzymes. The Cas12j nuclease catalyzes its own crRNA maturation using its RuvC domain (similar to the type-V nucleases). Unlike the type-V Cas proteins, Cas12j uses the same active site for both its RuvC cleavage of target DNA and its RuvC processing of the crRNA. It employs T-rich PAM sites which have fairly minimal target requirements. For example, the PAM of the Cas12j-2 subtype is 5’-TBN-3’ (B = G, T, or C). These minimal requirements give Cas12j expanded target recognition capabilities compared to other Cas proteins. Cas12j is active in vitro as well as within bacterial, human, and plant cells. Cas12j-2 (with a gRNA) has been observed to edit up to 33% of HEK293 cells. Though this may sound somewhat low, it represents an editing efficiency comparable to that initially reported for Cas9.
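One rough way to quantify how permissive a PAM is: if every base were equally likely and independent (a simplifying assumption that real genomes do not satisfy), the chance that a random position matches a PAM is the product of the fraction of bases allowed at each PAM position. The Python sketch below applies this to the PAMs mentioned in these notes; the names `IUPAC_SIZES` and `pam_probability` are hypothetical.

```python
from fractions import Fraction

# Number of bases (out of A, C, G, T) permitted by each IUPAC code.
IUPAC_SIZES = {"A": 1, "C": 1, "G": 1, "T": 1, "R": 2, "Y": 2, "B": 3, "N": 4}


def pam_probability(pam):
    """Probability that a random site matches the PAM, assuming equal and
    independent base frequencies (an idealization)."""
    probability = Fraction(1)
    for code in pam.upper():
        probability *= Fraction(IUPAC_SIZES[code], 4)
    return probability


for name, pam in (("SpCas9", "NGG"), ("SaCas9", "NNGRRT"), ("Cas12j-2", "TBN")):
    print(f"{name} {pam}: {pam_probability(pam)}")
```

Under this idealization, TBN matches 3/16 of positions, compared with 1/16 for NGG and 1/64 for NNGRRT, so the Cas12j-2 PAM is expected to occur about three times as often as the SpCas9 PAM.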
The LwaCas13a protein is a type-VI CRISPR nuclease which cleaves RNA rather than DNA.15 It represents one of the most active types of RNA-guided RNA-targeting Cas proteins. LwaCas13a catalyzes the maturation of its own crRNA. The enzyme comes from Leptotrichia wadei, an anaerobic gram-negative bacterium found in saliva. LwaCas13a has demonstrated around 50%-80% knockdown of target RNAs in mammalian and plant cells. This is similar to the knockdown efficiencies of shRNAs, but LwaCas13a shows much lower off-target effects. When converted into dLwaCas13a, the protein can act as an RNA imaging tool. It has also been reported to have strong potential for therapeutics. One of the most important emerging applications of LwaCas13a (and similar Cas proteins) is diagnostics for infectious diseases.7 To do this, the LwaCas13a gRNA can be designed to target an RNA sequence from a desired pathogen. LwaCas13a can then be mixed with a short reporter RNA oligonucleotide which has a fluorophore at one end and a quencher at the other (the fluorophore is quenched by its close proximity to the quencher). If the target pathogen RNA is introduced, LwaCas13a will cleave the target RNA and will also switch on its nonspecific trans-cleavage activity (see section on LbCas12a), leading to cleavage of the reporter oligonucleotides. When the reporter oligonucleotides are cleaved, the fluorophore is released from the quencher, resulting in observable fluorescence. It should be noted that many CRISPR-based diagnostics require some form of target nucleic acid amplification step to increase signal prior to the usage of a Cas protein like LwaCas13a, though ways to mitigate this limitation are undergoing rapid development.16
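The guide design step described above can be sketched in a few lines of Python. This is a simplified illustration only. The 28-nt default spacer length is an assumption, the function names are hypothetical, and a real crRNA would also need the enzyme's direct repeat sequence (not shown here) as well as screening for secondary structure and off-target matches.

```python
def reverse_complement_rna(rna):
    """Return the reverse complement of an RNA sequence (A<->U, G<->C)."""
    rna = rna.upper().replace("T", "U")  # tolerate DNA-style input
    return rna.translate(str.maketrans("ACGU", "UGCA"))[::-1]

def candidate_spacers(target_rna, spacer_len=28):
    """Yield (start, spacer) pairs tiled across the target RNA.

    Each spacer is the reverse complement of one window of the target, so it
    can base-pair with that window. spacer_len=28 is an assumed default.
    start is the 0-based position of the window within the target.
    """
    target = target_rna.upper().replace("T", "U")
    for start in range(len(target) - spacer_len + 1):
        window = target[start:start + spacer_len]
        yield start, reverse_complement_rna(window)

if __name__ == "__main__":
    # Hypothetical toy target with a short spacer length for readability.
    for start, spacer in candidate_spacers("AUGGCU", spacer_len=4):
        print(start, spacer)
```

Tiling every window is a deliberately naive choice. In practice, one would likely rank the candidates by how well conserved and how unique the corresponding target window is within the pathogen's genome.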
Kannan et al. identified Cas13bt1 and Cas13bt3 as useful RNA-targeting CRISPR nucleases since Cas13bt has some activity in human cells.17 Cas13bt1 and Cas13bt3 are small at just 804 amino acids and 775 amino acids respectively. It should be noted that Cas13bt also exhibits nonspecific trans-cleavage activity (see section on LbCas12a) after cleaving its RNA target, which may allow its usage in diagnostics. Kannan et al. took advantage of the small sizes of Cas13bt1 and Cas13bt3 to develop compact RNA base editors. They fused an ADAR2 hyperactive adenosine deaminase catalytic domain onto dCas13bt1 and dCas13bt3. The resulting constructs were respectively named REPAIR.t1 and REPAIR.t3 and were shown to facilitate adenosine to inosine conversion in target RNAs. They also fused an ADAR2dd cytidine deaminase domain (which was itself created through directed evolution) onto dCas13bt1 and dCas13bt3. The resulting constructs were respectively named RESCUE.t1 and RESCUE.t3 and were shown to facilitate conversion of cytosine to uracil in target RNAs. Due to the small sizes of Cas13bt enzymes, all of these RNA base editors were small enough to fit inside of AAV vectors even alongside gRNA encoding sequences. The authors demonstrated successful AAV-mediated delivery to cells, but the editing efficiencies were low, so further optimization will likely be necessary.
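As a rough illustration of what the two kinds of RNA base edits described above do at the sequence level, the following Python sketch applies a single adenosine to inosine or cytosine to uracil change to an RNA string. The function name and the example transcript are hypothetical. Inosine is written as G in the output because it base-pairs like guanosine during translation, which is a simplification.

```python
def apply_rna_base_edit(rna, position, edit):
    """Return the RNA as it would be read after a single base edit.

    edit = "A-to-I": adenosine -> inosine, written here as "G" because
           inosine base-pairs like guanosine during translation.
    edit = "C-to-U": cytosine -> uracil.
    position is 0-based.
    """
    expected, product = {"A-to-I": ("A", "G"), "C-to-U": ("C", "U")}[edit]
    rna = rna.upper()
    if rna[position] != expected:
        raise ValueError(
            f"position {position} is {rna[position]}, not {expected}"
        )
    return rna[:position] + product + rna[position + 1:]

if __name__ == "__main__":
    # Hypothetical transcript: AUG, then a premature UAG stop codon, then GCC.
    print(apply_rna_base_edit("AUGUAGGCC", 4, "A-to-I"))
```

In this toy transcript, editing the adenosine in the UAG stop codon yields a codon read as UGG (tryptophan), which is one way an adenosine to inosine editor could in principle restore read-through of a premature stop.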
The CasX nuclease represents a distinct type of Cas protein which does not share much sequence similarity with other types of CRISPR enzymes except for a RuvC domain.18 It is an RNA-guided DNA-targeting endonuclease which has minimal nonspecific trans-cleavage activity. Using its single RuvC domain, CasX creates staggered cuts (with about 10 nucleotide overhangs) in dsDNA complementary to its gRNA and adjacent to its TTCN PAM sites. CasX nucleases are <1000 amino acids in length, which is smaller than Cas9 and Cas12a. This could be useful for AAV-mediated delivery of CasX systems. There are different subtypes of CasX which come from different bacteria. Two of the known subtypes are DpbCasX (from Deltaproteobacteria) and PlmCasX (from Planctomycetes). DpbCasX can act in human cells, though it shows limited gene editing efficiency. PlmCasX generally performs more efficiently in human cells and can often achieve targeted disruption of genes in around a third of transfected cells. While this level of disruption is still modest, it is similar to the levels originally found with WT Cas9 enzymes before they were optimized for gene editing.
Un1Cas12f (previously known as Cas14a)
The Cas12f proteins represent a class of small CRISPR nucleases (400-700 amino acids in length) that are capable of RNA-guided cleavage of ssDNA or dsDNA, with dsDNA cleavage depending on whether a PAM is present next to the target site. They employ a RuvC domain for cleavage and do not possess an HNH domain. There are various subtypes of Cas12f, but Un1Cas12f (previously Cas14a1) has been studied in the most detail. Un1Cas12f was first reported to selectively cleave ssDNA and not dsDNA.19 It was also initially reported to not require a PAM site for targeting. Without the constraint of needing a PAM site for targeting, Un1Cas12f has broader possibilities for which ssDNA sequences can be targeted. However, later research revealed that Un1Cas12f can cleave dsDNA when a 5’ T-rich PAM sequence is present adjacent to the target sequence.20 As with many other types of Cas proteins, Un1Cas12f exhibits nonspecific trans-cleavage activity on ssDNA (see section on LbCas12a) after cleaving its target DNA, which grants it utility as a component of diagnostics.
The Cas7-11 protein is an RNA-guided RNA-targeting CRISPR nuclease.21 It is named Cas7-11 because it arose evolutionarily from a fusion of a protein known as Cas7 with a protein known as Cas11. The DiCas7-11 enzyme comes from the gram-negative sulfate-reducing bacterium Desulfonema ishimotonii (similar types of Cas7-11 also exist in other species). An important advantage of DiCas7-11 is that it does not have a toxic effect on host cells (bacterial or mammalian). By comparison, RNA knockdown technologies including shRNA, LwaCas13a, PspCas13b, and RfxCas13d typically cause around 30-50% host cell death. DiCas7-11 shows knockdown efficiencies similar to those of these other RNA knockdown technologies while demonstrating no detectable cellular toxicity. Unfortunately, DiCas7-11 is also fairly large at 1602 amino acids, making it difficult to package into AAV vectors. One more application of Cas7-11 is RNA editing. The creation of a dDiCas7-11 fused to a base editor domain has enabled RNA editing in mammalian cells.
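The size arguments about AAV packaging made throughout this section can be checked with simple arithmetic. The short Python sketch below converts the protein lengths quoted above into coding sequence lengths and compares them with an assumed AAV packaging limit of roughly 4.7 kb. The limit is an approximate figure supplied here as an assumption, the names are hypothetical, and the calculation ignores promoters, gRNA cassettes, fused effector domains, and other elements which also take up space.

```python
AAV_CAPACITY_BP = 4700  # assumed approximate AAV packaging limit (~4.7 kb)

# Protein lengths (amino acids) quoted in the text above.
PROTEIN_LENGTHS_AA = {
    "CasMINI": 529,
    "Cas13bt3": 775,
    "Cas13bt1": 804,
    "DiCas7-11": 1602,
}

def coding_length_bp(n_amino_acids):
    """Coding sequence length: three bases per residue plus a stop codon."""
    return 3 * n_amino_acids + 3

def remaining_capacity_bp(n_amino_acids, capacity_bp=AAV_CAPACITY_BP):
    """Base pairs left for promoters, gRNA cassettes, and other elements."""
    return capacity_bp - coding_length_bp(n_amino_acids)

if __name__ == "__main__":
    for name, length in PROTEIN_LENGTHS_AA.items():
        print(name, coding_length_bp(length), remaining_capacity_bp(length))
```

Under these assumptions, the CasMINI coding sequence (1590 bp) leaves about 3.1 kb of room for additional elements, whereas the DiCas7-11 coding sequence (4809 bp) already exceeds the assumed limit before any promoter or gRNA sequence is added.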
3D structure images were created using PyMOL.
(1) Anders, C.; Niewoehner, O.; Duerst, A.; Jinek, M. Structural Basis of PAM-Dependent Target DNA Recognition by the Cas9 Endonuclease. Nature 2014, 513 (7519), 569–573. https://doi.org/10.1038/nature13579.
(2) Chen, J. S.; Dagdas, Y. S.; Kleinstiver, B. P.; Welch, M. M.; Sousa, A. A.; Harrington, L. B.; Sternberg, S. H.; Joung, J. K.; Yildiz, A.; Doudna, J. A. Enhanced Proofreading Governs CRISPR–Cas9 Targeting Accuracy. Nature 2017, 550 (7676), 407–410. https://doi.org/10.1038/nature24268.
(3) Slaymaker, I. M.; Gao, L.; Zetsche, B.; Scott, D. A.; Yan, W. X.; Zhang, F. Rationally Engineered Cas9 Nucleases with Improved Specificity. Science 2016, 351 (6268), 84–88. https://doi.org/10.1126/science.aad5227.
(4) Tan, Y.; Chu, A. H. Y.; Bao, S.; Hoang, D. A.; Kebede, F. T.; Xiong, W.; Ji, M.; Shi, J.; Zheng, Z. Rationally Engineered Staphylococcus Aureus Cas9 Nucleases with High Genome-Wide Specificity. Proc. Natl. Acad. Sci. 2019, 116 (42), 20969–20976. https://doi.org/10.1073/pnas.1906843116.
(5) Chen, J. S.; Ma, E.; Harrington, L. B.; Da Costa, M.; Tian, X.; Palefsky, J. M.; Doudna, J. A. CRISPR-Cas12a Target Binding Unleashes Indiscriminate Single-Stranded DNase Activity. Science 2018, 360 (6387), 436–439. https://doi.org/10.1126/science.aar6245.
(6) Nalefski, E. A.; Patel, N.; Leung, P. J. Y.; Islam, Z.; Kooistra, R. M.; Parikh, I.; Marion, E.; Knott, G. J.; Doudna, J. A.; Le Ny, A.-L. M.; Madan, D. Kinetic Analysis of Cas12a and Cas13a RNA-Guided Nucleases for Development of Improved CRISPR-Based Diagnostics. iScience 2021, 24 (9), 102996. https://doi.org/10.1016/j.isci.2021.102996.
(7) Kellner, M. J.; Koob, J. G.; Gootenberg, J. S.; Abudayyeh, O. O.; Zhang, F. SHERLOCK: Nucleic Acid Detection with CRISPR Nucleases. Nat. Protoc. 2019, 14 (10), 2986–3012. https://doi.org/10.1038/s41596-019-0210-2.
(8) Zetsche, B.; Gootenberg, J. S.; Abudayyeh, O. O.; Slaymaker, I. M.; Makarova, K. S.; Essletzbichler, P.; Volz, S. E.; Joung, J.; van der Oost, J.; Regev, A.; Koonin, E. V.; Zhang, F. Cpf1 Is a Single RNA-Guided Endonuclease of a Class 2 CRISPR-Cas System. Cell 2015, 163 (3), 759–771. https://doi.org/10.1016/j.cell.2015.09.038.
(9) Zhang, L.; Zuris, J. A.; Viswanathan, R.; Edelstein, J. N.; Turk, R.; Thommandru, B.; Rube, H. T.; Glenn, S. E.; Collingwood, M. A.; Bode, N. M.; Beaudoin, S. F.; Lele, S.; Scott, S. N.; Wasko, K. M.; Sexton, S.; Borges, C. M.; Schubert, M. S.; Kurgan, G. L.; et al. AsCas12a Ultra Nuclease Facilitates the Rapid Generation of Therapeutic Cell Medicines. Nat. Commun. 2021, 12 (1), 3908. https://doi.org/10.1038/s41467-021-24017-8.
(12) Kim, D. Y.; Lee, J. M.; Moon, S. Bin; Chin, H. J.; Park, S.; Lim, Y.; Kim, D.; Koo, T.; Ko, J.-H.; Kim, Y.-S. Efficient CRISPR Editing with a Hypercompact Cas12f1 and Engineered Guide RNAs Delivered by Adeno-Associated Virus. Nat. Biotechnol. 2021. https://doi.org/10.1038/s41587-021-01009-z.
(14) Pausch, P.; Al-Shayeb, B.; Bisom-Rapp, E.; Tsuchida, C. A.; Li, Z.; Cress, B. F.; Knott, G. J.; Jacobsen, S. E.; Banfield, J. F.; Doudna, J. A. CRISPR-CasΦ from Huge Phages Is a Hypercompact Genome Editor. Science 2020, 369 (6501), 333–337. https://doi.org/10.1126/science.abb1400.
(15) Abudayyeh, O. O.; Gootenberg, J. S.; Essletzbichler, P.; Han, S.; Joung, J.; Belanto, J. J.; Verdine, V.; Cox, D. B. T.; Kellner, M. J.; Regev, A.; Lander, E. S.; Voytas, D. F.; Ting, A. Y.; Zhang, F. RNA Targeting with CRISPR–Cas13. Nature 2017, 550 (7675), 280–284. https://doi.org/10.1038/nature24049.
(17) Kannan, S.; Altae-Tran, H.; Jin, X.; Madigan, V. J.; Oshiro, R.; Makarova, K. S.; Koonin, E. V.; Zhang, F. Compact RNA Editors with Small Cas13 Proteins. Nat. Biotechnol. 2021. https://doi.org/10.1038/s41587-021-01030-2.
(18) Liu, J.-J.; Orlova, N.; Oakes, B. L.; Ma, E.; Spinner, H. B.; Baney, K. L. M.; Chuck, J.; Tan, D.; Knott, G. J.; Harrington, L. B.; Al-Shayeb, B.; Wagner, A.; Brötzmann, J.; Staahl, B. T.; Taylor, K. L.; Desmarais, J.; Nogales, E.; Doudna, J. A. CasX Enzymes Comprise a Distinct Family of RNA-Guided Genome Editors. Nature 2019, 566 (7743), 218–223. https://doi.org/10.1038/s41586-019-0908-x.
(19) Harrington, L. B.; Burstein, D.; Chen, J. S.; Paez-Espino, D.; Ma, E.; Witte, I. P.; Cofsky, J. C.; Kyrpides, N. C.; Banfield, J. F.; Doudna, J. A. Programmed DNA Destruction by Miniature CRISPR-Cas14 Enzymes. Science 2018, 362 (6416), 839–842. https://doi.org/10.1126/science.aav4294.