**Introduction to the Physical Chemistry of Protein Folding**

**Native and denatured states**

Under a given set of conditions, a protein will exhibit a stable equilibrium configuration. Stable equilibrium configurations are conformational states which depend only on thermodynamics and not on kinetics. That is, a stable equilibrium configuration can be reached by allowing a protein to achieve thermodynamic equilibrium (under the given set of conditions). The steps that the protein takes to reach this configuration and the rates at which these steps occur are not relevant to stable equilibrium conformations since such factors involve kinetics.

The native structure of a protein and the denatured structure of a protein (under native and denaturing conditions respectively) represent two important types of equilibrium configurations. Because of this, many properties of folded and unfolded proteins can be expressed in terms of equilibrium thermodynamics rather than needing to involve kinetics. Recall that native structures occur under typical biological conditions and denatured structures occur under harsh conditions such as high salt or temperature.

Changes in protein structure are either reversible or irreversible. In the case of a reversible change, a protein will eventually return to its initial state if the initial conditions are reestablished. In the case of an irreversible change, a protein will not return to its initial state even if initial conditions are reestablished. As an example, the formation of covalent bonds can induce irreversible changes in proteins.

**Equilibrium and protein folding**

To measure protein stability experimentally, solutions of a protein are mixed with varying levels of denaturing agents or incubated at varying temperatures. The fraction of folded protein f_{N} and the fraction of unfolded protein f_{D} = 1 – f_{N} are measured via spectroscopy or other techniques. These data can be used to make a denaturation curve.

To gain insight from this experiment, the data are used to find the folding free energy ΔG_{fold}, which can be computed using the equation ΔG_{fold} = –RT·ln(K). Here, K = f_{N}/f_{D} is the folding equilibrium constant, R is the gas constant, and T is the temperature in Kelvin.
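As a rough illustration of this calculation, the sketch below converts a measured folded fraction into a folding free energy. It assumes K = f_{N}/f_{D} with f_{D} = 1 – f_{N}; the function name and the example fractions are made up for illustration.

```python
import math

R = 8.314  # gas constant in J/(mol*K)


def folding_free_energy(f_native, temperature):
    """Return dG_fold = -R*T*ln(K) in J/mol, assuming K = f_N / f_D."""
    if not 0.0 < f_native < 1.0:
        raise ValueError("f_native must lie strictly between 0 and 1")
    f_denatured = 1.0 - f_native
    k_fold = f_native / f_denatured
    return -R * temperature * math.log(k_fold)


# Illustrative inputs (not experimental data)
dg_mostly_folded = folding_free_energy(0.99, 298.15)  # negative: folding favored
dg_midpoint = folding_free_energy(0.5, 298.15)        # zero: equal populations
```

The logarithm diverges as f_{N} approaches 0 or 1, so in practice only points inside the transition region give reliable values of ΔG_{fold}.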

When denaturing proteins via chemical methods, the denaturation curve is given by the function ΔG_{fold}(c) where c is the concentration of the denaturant. ΔG_{fold}(c) is typically a linear function.
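If ΔG_{fold}(c) is linear, a straight-line fit summarizes the whole chemical denaturation curve. The sketch below is one possible way to do this with an ordinary least-squares fit; the data points are invented, and reading the intercept as the stability in the absence of denaturant is an assumption of this example rather than something stated above.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: denaturant concentration (M) and dG_fold (kJ/mol)
concentrations = [2.0, 2.5, 3.0, 3.5, 4.0]
dg_values = [-10.0, -5.0, 0.0, 5.0, 10.0]

# Intercept: extrapolated dG_fold at c = 0; slope: change in dG_fold per molar
dg_in_water, m_value = fit_line(concentrations, dg_values)

# Concentration at which dG_fold = 0 (folded and unfolded equally populated)
midpoint_concentration = -dg_in_water / m_value
```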

When denaturing proteins via heat, the denaturation curve is given by the function ΔG_{fold}(T) where T is the temperature in Kelvin. An instrument called a differential scanning calorimeter can apply varying temperatures to a protein solution and measure the amount of heat taken up or given off by the solution. As temperature increases, the heat absorption rises to a maximum level and then decreases, indicating that energy is invested to unfold the proteins until they are fully denatured. The point at which heat absorption reaches a maximum is called the denaturation temperature or melting temperature T_{m}. The heat capacity change of unfolding ΔC_{p} is the difference between the pre-denaturation and post-denaturation baseline heat capacities.

Since ΔG_{fold} is the free energy change associated with the folding of a protein, it can be related to the enthalpy change ΔH_{fold} and the entropy change ΔS_{fold} using the Gibbs free energy change equation: ΔG_{fold} = ΔH_{fold} – T·ΔS_{fold}.
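One consequence of the Gibbs relation can be sketched as follows. It assumes a two-state transition in which the melting temperature T_{m} coincides with the point where folded and unfolded populations are equal, so that ΔG_{fold} = 0 there. Under that assumption, the folding entropy at T_{m} follows directly from the folding enthalpy at T_{m}.

$$
\Delta G_{\mathrm{fold}}(T_m) = \Delta H_{\mathrm{fold}}(T_m) - T_m\,\Delta S_{\mathrm{fold}}(T_m) = 0
\quad\Longrightarrow\quad
\Delta S_{\mathrm{fold}}(T_m) = \frac{\Delta H_{\mathrm{fold}}(T_m)}{T_m}
$$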

**Protein folding energy landscapes**

Energy landscapes are often used to visualize protein folding. Energy or free energy is plotted along the vertical axis while the configurational space of the protein chain is represented by the horizontal axes. Protein folding energy landscapes are roughly funnel-shaped, but with a variety of further peaks and valleys along the sides of the funnels. This illustrates that the protein tends towards a global energetic minimum (its folded state), but that to get to the folded state, the polypeptide chain must navigate past local energetic minima and energetic maxima.

Although energy landscapes are typically represented as 3-dimensional plots, configurational space often exists in hundreds of dimensions or more. For instance, a peptide consisting of 10 amino acids might have 150 atoms, each with x, y, and z coordinates. If this were used as the basis for the configurational space, the peptide’s configurational space would be 450-dimensional and its energy landscape would be 451-dimensional. While there are other ways of describing configurational space, it is difficult to reduce the number of dimensions to 2. As a result, the visual picture of a 3-dimensional energy landscape is usually more of a helpful conceptual aid than a quantitative portrait.

**Driving forces of protein folding**

Native protein structures typically have hydrophobic cores while exposing more polar moieties to the solvent. Hydrogen bonds are also important for protein structures as they are key in stabilizing α-helices and β-sheets as well as in a variety of other aspects of protein folding. Van der Waals interactions contribute to the tight packing of protein chains. Salt bridges (ionic interactions) can stabilize protein structures via electrostatic attraction.

There are several other important ways of classifying interactions between amino acids in proteins. Local interactions are those between amino acid residues that are close to each other in a polypeptide chain (e.g. within the same helix or turn) while nonlocal interactions occur between amino acid residues that are farther apart in a sequence (e.g. between β-sheet strands). Short-ranged interactions are those whose strength falls off rapidly with distance, such as van der Waals interactions that depend on 1/r^{6} (where r is the distance between the interacting groups). Long-ranged interactions are those whose strength falls off slowly with distance, such as Coulombic interactions that depend on 1/r. For example, doubling r weakens a 1/r^{6} interaction 64-fold but a 1/r interaction only 2-fold.

Beyond these factors, the entropies of proteins make major contributions to protein folding. However, chain entropies are not observable from structures alone. Statistical mechanics models allow one to gain insights into protein entropy.

**Statistical mechanics and protein folding**

According to statistical mechanics, the free energy of a system can be computed using the partition function Q, a description of the microstates of the system. Microstates consist of specific configurations of a system and the associated energy levels of those configurations. Because quantum mechanics dictates that there are discrete energy levels, the microstates of a system can be counted. As an example, consider a protein in a specific conformational state with specific energy levels across all its constituent particles. This protein exists in a single microstate until it moves and changes to a different microstate. To compute the free energy of a system from the partition function, the equation G = –RT·ln(Q) is employed.

The partition function itself is a sum of the relative statistical weights of all of the system’s possible microstates: Q = Σ_{j} ω(ε_{j})·exp(–ε_{j}/RT). Here, ε_{j} is the energy of the j^{th} energy level and ω(ε_{j}) is the number of microstates that have energy ε_{j}. Note that ω(ε_{j}) is also called the degeneracy of the j^{th} energy level.

Making a statistical mechanics model requires first knowing all of the system’s microstates. This includes the configurations, the energies ε_{j} of those configurations, and the number of configurations ω(ε_{j}) that result in a specific energy level ε_{j}. When these are known, the probability p_{j} of finding the system at energy level ε_{j} can be computed as p_{j} = ω(ε_{j})·exp(–ε_{j}/RT)/Q, where Q is the partition function.

After computing these probabilities, it is possible to use them to compute weighted averages (known as ensemble averages) for desired properties of the system such as energy or fraction of folded proteins. For some property A that takes on the value A_{j} when in state j, the ensemble average is given by ⟨A⟩ = Σ_{j} A_{j}·p_{j}.
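The general recipe (partition function, then probabilities, then ensemble averages) can be sketched in a few lines. This is a minimal illustration that assumes energies are given per mole so that RT is the thermal energy scale; the function names and the two-level system are hypothetical.

```python
import math

R = 8.314e-3  # gas constant in kJ/(mol*K)


def partition_function(levels, temperature):
    """levels is a list of (energy in kJ/mol, degeneracy) pairs."""
    return sum(g * math.exp(-e / (R * temperature)) for e, g in levels)


def level_probabilities(levels, temperature):
    """Probability of each energy level: degeneracy * Boltzmann factor / Q."""
    q = partition_function(levels, temperature)
    return [g * math.exp(-e / (R * temperature)) / q for e, g in levels]


def ensemble_average(values, probabilities):
    """Weighted average of a property over the energy levels."""
    return sum(a * p for a, p in zip(values, probabilities))


# Hypothetical system: 10 microstates at 0 kJ/mol and 1 microstate at -5 kJ/mol
levels = [(0.0, 10), (-5.0, 1)]
probs = level_probabilities(levels, 300.0)
mean_energy = ensemble_average([e for e, _ in levels], probs)
free_energy = -R * 300.0 * math.log(partition_function(levels, 300.0))
```

In this toy system the free energy comes out lower than the lowest energy level, because the ten degenerate states at zero energy contribute entropy.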

**The HP model of protein folding and unfolding**

Simple models of protein folding can reveal insights into how folding works. The HP model represents a protein as a chain of beads on a 2D or 3D lattice. The chain of beads can take on a variety of different configurations, but it cannot double back on itself or place two beads in the same location. In the HP model, there are two types of beads: hydrophobic (H) beads and polar (P) beads.

The model also describes contacts between beads which are adjacent to each other on the lattice but not adjacent in the sequence of the chain. When two hydrophobic beads make a contact, there is a favorable interaction energy of ε_{0} < 0. All other contacts have interaction energies of zero.

As an example, consider a chain with six beads on a 2D lattice. The beads at positions 1, 4, and 6 are hydrophobic while the beads at positions 2, 3, and 5 are polar. In this case, there are 36 possible configurations, each with a corresponding energy value. Each of these configurations is a specific microstate of the system.
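The count of 36 can be checked by brute force. The sketch below enumerates self-avoiding walks on a square lattice and tallies hydrophobic contacts for the sequence HPPHPH. It assumes that configurations related by rotation or mirror reflection are counted once, which is enforced here by fixing the first step and requiring the first turn to go upward. All function names are illustrative.

```python
from collections import Counter

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def enumerate_conformations(n_beads):
    """Self-avoiding walks of n_beads (>= 2), one per rotation/reflection class."""
    conformations = []

    def extend(path):
        if len(path) == n_beads:
            conformations.append(tuple(path))
            return
        x, y = path[-1]
        # While the chain is still straight, forbid a downward first turn
        # so that mirror images are not counted twice.
        on_axis = all(py == 0 for _, py in path)
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt in path:
                continue
            if on_axis and dy == -1:
                continue
            extend(path + [nxt])

    # Fixing the first step removes the four rotations.
    extend([(0, 0), (1, 0)])
    return conformations


def count_hh_contacts(path, sequence):
    """Count H-H pairs that are lattice neighbors but not chain neighbors."""
    contacts = 0
    for i in range(len(path)):
        for j in range(i + 2, len(path)):
            if sequence[i] == "H" and sequence[j] == "H":
                dist = abs(path[i][0] - path[j][0]) + abs(path[i][1] - path[j][1])
                if dist == 1:
                    contacts += 1
    return contacts


sequence = "HPPHPH"  # hydrophobic beads at positions 1, 4, and 6
conformations = enumerate_conformations(len(sequence))
contact_histogram = Counter(count_hh_contacts(c, sequence) for c in conformations)
```

Here `contact_histogram` maps the number of hydrophobic contacts to the number of configurations with that many contacts, which gives the degeneracy of each energy level.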

The energy level of each microstate is determined by how many hydrophobic contacts are made. In this system, three energy levels are possible: 0, ε_{0}, and 2ε_{0}. The collections of microstates at each of these energy levels are called macrostates. The macrostate at energy 0 includes 28 microstates (also called a degeneracy of 28), the macrostate at energy ε_{0} has 7 microstates, and the macrostate at energy 2ε_{0} has 1 microstate. Using this information, the partition function for the system is Q = 28 + 7·exp(–ε_{0}/RT) + exp(–2ε_{0}/RT).

By using the general probability equation p_{j} = ω(ε_{j})·exp(–ε_{j}/RT)/Q (which was described earlier) along with the partition function from above, the probabilities of the macrostates are found: p_{1} = 28/Q, p_{2} = 7·exp(–ε_{0}/RT)/Q, and p_{3} = exp(–2ε_{0}/RT)/Q. Here, p_{1} corresponds to the unfolded state, p_{2} is an intermediate state, and p_{3} represents the folded state.

Once the probabilities of the macrostates are found, one can compute the ensemble average energy: ⟨E⟩ = Σ_{j} ε_{j}·p_{j} = 0·p_{1} + ε_{0}·p_{2} + 2ε_{0}·p_{3} = [7ε_{0}·exp(–ε_{0}/RT) + 2ε_{0}·exp(–2ε_{0}/RT)]/Q.
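To see how these quantities behave with temperature, the six-bead model can be evaluated numerically. The sketch below uses the degeneracies 28, 7, and 1 from the text; the contact energy of –10 kJ/mol and the two temperatures are arbitrary values chosen only for illustration.

```python
import math

R = 8.314e-3        # gas constant in kJ/(mol*K)
EPSILON_0 = -10.0   # hypothetical H-H contact energy in kJ/mol

# (energy, degeneracy) for the unfolded, intermediate, and folded macrostates
LEVELS = [(0.0, 28), (EPSILON_0, 7), (2 * EPSILON_0, 1)]


def macrostate_probabilities(temperature):
    """Return [p1, p2, p3] for the six-bead HP chain at the given temperature."""
    weights = [g * math.exp(-e / (R * temperature)) for e, g in LEVELS]
    q = sum(weights)
    return [w / q for w in weights]


def average_energy(temperature):
    """Ensemble average energy in kJ/mol."""
    probs = macrostate_probabilities(temperature)
    return sum(e * p for (e, _), p in zip(LEVELS, probs))


p_cold = macrostate_probabilities(200.0)   # folded state dominates
p_hot = macrostate_probabilities(1000.0)   # unfolded state dominates
```

With these illustrative parameters, the single folded configuration wins at low temperature because of its favorable energy, while the 28 unfolded configurations win at high temperature because of their larger degeneracy.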

Another useful way of applying statistical thermodynamics to protein folding using the HP model is to compute the value of ΔG_{fold}. Recall that ΔG_{fold} = –RT·ln(f_{N}/f_{D}). Since f_{N} is the native fraction and f_{D} is the denatured fraction, f_{N} = p_{3} = p_{N} (folded state) and f_{D} = p_{1} = p_{D} (unfolded state).

By using the above equation for ΔG_{fold}, the midpoint temperature of the chain’s folding transition can be found. The folding transition happens at the temperature where p_{N} = p_{D}. When this is the case, the term inside the natural logarithm is equal to 1, making ΔG_{fold} = 0. As a result, solving for the temperature under these conditions gives the midpoint temperature of the chain’s folding transition.
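For the six-bead chain, this calculation can be written out explicitly. The derivation below takes p_{N} as the probability of the single folded microstate at energy 2ε_{0} and p_{D} as the probability of the 28 unfolded microstates at energy 0, so the partition function cancels in the ratio. The symbol T_{mid} for the midpoint temperature is introduced here for convenience.

$$
\begin{aligned}
\Delta G_{\mathrm{fold}} &= -RT\,\ln\frac{p_N}{p_D}
= -RT\,\ln\frac{e^{-2\varepsilon_0/RT}}{28}
= 2\varepsilon_0 + RT\,\ln 28 \\
0 &= 2\varepsilon_0 + R\,T_{\mathrm{mid}}\,\ln 28
\quad\Longrightarrow\quad
T_{\mathrm{mid}} = -\frac{2\varepsilon_0}{R\,\ln 28}
\end{aligned}
$$

Because ε_{0} < 0, T_{mid} is positive. The energetic term 2ε_{0} favors folding, while the term RT·ln(28) reflects the larger number of unfolded configurations and favors unfolding as temperature rises.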

These statistical thermodynamics methods are applicable to many protein properties and the principle of the HP model can be extended to analogous, but more complicated, models of protein folding. Using such techniques, a broad variety of insights around protein folding can be gleaned.

**Reference:** Bahar, I., Jernigan, R. L., & Dill, K. A. (2017). *Protein Actions: Principles and Modeling*. CRC Press LLC.
