Protein Folding: How Polypeptides Find Their Functional Shape
The Folding Problem
Cyrus Levinthal pointed out in 1969 that a protein exploring all possible conformations at random would take longer than the age of the universe to find its native state. A modest protein of 100 amino acids, with roughly three possible backbone conformations per residue, would have approximately 3^100 (about 5 x 10^47) conformations to sample. Yet proteins fold in milliseconds to seconds. This apparent contradiction, known as Levinthal's paradox, implies that folding cannot be an unbiased random search. The search must instead be biased toward the native state, so that only a tiny fraction of the possible conformations is ever visited.
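The arithmetic behind the paradox can be checked with a short back-of-the-envelope script. The sampling rate below (one conformation every 0.1 picoseconds) and the age of the universe in seconds are assumptions chosen for illustration; they are not figures from Levinthal's argument as presented here.

```python
# Back-of-the-envelope Levinthal estimate.
RESIDUES = 100
STATES_PER_RESIDUE = 3
SAMPLES_PER_SECOND = 1e13      # assumption: one conformation per ~0.1 ps
AGE_OF_UNIVERSE_S = 4.35e17    # assumption: about 13.8 billion years, in seconds

# Python integers are arbitrary precision, so 3**100 is computed exactly.
conformations = STATES_PER_RESIDUE ** RESIDUES
search_time_s = conformations / SAMPLES_PER_SECOND
universe_ages = search_time_s / AGE_OF_UNIVERSE_S

print(f"conformations to sample: {conformations:.2e}")
print(f"exhaustive search time:  {search_time_s:.2e} s")
print(f"in ages of the universe: {universe_ages:.2e}")
```

Even with this generous sampling rate, an exhaustive search would take on the order of 10^17 times the age of the universe, which is why a search biased toward the native state is required.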
Christian Anfinsen's classic experiments in the 1950s and 1960s with the enzyme ribonuclease A showed that a denatured (unfolded) protein could refold spontaneously into its functional conformation when denaturing conditions were removed. This demonstrated that the information needed for correct folding is contained entirely within the amino acid sequence. The idea is now known as the thermodynamic hypothesis: under physiological conditions, the native structure is the conformation of lowest free energy. Anfinsen shared the Nobel Prize in Chemistry in 1972 for this work.
Forces That Drive Folding
Protein folding is driven by several noncovalent forces that collectively favor the native state over the unfolded ensemble.
The hydrophobic effect is widely considered the dominant driving force. Exposing nonpolar amino acid side chains (valine, leucine, isoleucine, phenylalanine, and others) to water is thermodynamically unfavorable because water molecules must adopt ordered arrangements around them, which decreases the entropy of the solvent. Folding buries these hydrophobic residues in the protein's interior, away from water, releasing the ordered water molecules and increasing the overall entropy of the system. The resulting hydrophobic core is a hallmark of globular protein structure.
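One common way to locate stretches of sequence likely to end up in the hydrophobic core is a sliding-window hydropathy profile. The sketch below uses the Kyte-Doolittle hydropathy scale, which is a tool brought in for illustration rather than something this article relies on. The example sequence and the window size of 5 are made up.

```python
# Kyte-Doolittle hydropathy values (positive = more hydrophobic).
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5,
    "M": 1.9, "A": 1.8, "G": -0.4, "T": -0.7, "S": -0.8,
    "W": -0.9, "Y": -1.3, "P": -1.6, "H": -3.2, "E": -3.5,
    "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}

def hydropathy_profile(sequence, window=5):
    """Return the mean hydropathy of every window of `window` residues."""
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]

# Hypothetical sequence: a hydrophobic stretch flanked by charged residues.
example_sequence = "MKDEVLLIVFAKDEK"
profile = hydropathy_profile(example_sequence)

# Index of the most hydrophobic window.
peak_start = max(range(len(profile)), key=profile.__getitem__)
print(f"most hydrophobic window starts at residue {peak_start + 1}: "
      f"{example_sequence[peak_start:peak_start + 5]} "
      f"(mean hydropathy {profile[peak_start]:.2f})")
```

A high-scoring window only marks a candidate for burial. Whether a segment actually ends up in the core depends on the rest of the fold, so a profile like this is a rough guide, not a structure prediction.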
Hydrogen bonds form between backbone amide and carbonyl groups and between polar side chains. In the unfolded protein, these groups hydrogen-bond with water. Upon folding, they form intramolecular hydrogen bonds instead. While the exchange of water hydrogen bonds for intramolecular ones is roughly energetically neutral, the specific pattern of hydrogen bonds determines secondary structure elements (alpha helices, beta sheets) and contributes to the overall specificity of the fold.
Van der Waals interactions arise from close packing of atoms in the protein interior. Individually weak, these contacts are extremely numerous in a tightly packed protein core and collectively contribute significant stabilization energy. The interior of a folded protein is packed nearly as tightly as a crystal of small organic molecules, with very few cavities or gaps.
Electrostatic interactions, including salt bridges between oppositely charged side chains (for example, between lysine and aspartate), can stabilize the folded state. Most salt bridges lie on the protein surface, where the aqueous environment partially shields the charges, so each one typically contributes only modestly to stability. Disulfide bonds between cysteine residues provide covalent cross-links that stabilize the folded structure of many secreted and extracellular proteins.
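The shielding of surface salt bridges by water can be illustrated with Coulomb's law in a uniform dielectric. This is a deliberately crude sketch: the point-charge model, the 4 angstrom separation, and the dielectric constants (80 for bulk water, 4 for a protein interior) are assumptions chosen for illustration.

```python
# Converts (elementary charge)^2 / angstrom into kcal/mol.
COULOMB_CONSTANT = 332.06

def coulomb_energy(q1, q2, distance_angstrom, dielectric):
    """Coulomb energy in kcal/mol for two point charges in a uniform dielectric."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * distance_angstrom)

# Assumed geometry: a lysine (+1) and an aspartate (-1) 4 angstroms apart.
in_water = coulomb_energy(+1, -1, 4.0, dielectric=80.0)     # solvent-exposed pair
in_interior = coulomb_energy(+1, -1, 4.0, dielectric=4.0)   # assumed low-dielectric core

print(f"exposed pair: {in_water:.2f} kcal/mol")
print(f"buried pair:  {in_interior:.2f} kcal/mol")
```

In this model the same pair is 20 times weaker in water than in the low-dielectric environment. The large value for the buried pair should not be read as net stabilization, because burying charged groups also carries a substantial desolvation cost that this model ignores.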
The Energy Landscape
Modern understanding of protein folding uses the concept of a funnel-shaped energy landscape. The unfolded protein occupies the broad rim of the funnel, where many conformations have high free energy. As folding progresses, the protein descends the funnel toward lower free energy states, with the native state at the bottom. The funnel is not smooth but contains local minima (kinetic traps) where partially folded intermediates can become temporarily stuck.
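The idea of kinetic traps on a rugged funnel can be made concrete with a toy one-dimensional walk. Everything here is hypothetical: the energy values are invented, the real landscape is high-dimensional, and the walker only moves toward the native state. The `max_uphill` parameter stands in for the thermal energy available to climb a barrier.

```python
# Toy landscape in arbitrary energy units. Index 0 is the native state and the
# last index is the unfolded rim. The overall trend is downhill toward index 0,
# but indices 5 and 2 are local minima (kinetic traps).
LANDSCAPE = [0.0, 3.0, 2.0, 4.0, 5.0, 4.5, 7.0, 8.0, 9.0, 10.0]

def descend(landscape, start, max_uphill=0.0):
    """Step toward index 0 until the next step climbs more than max_uphill."""
    position = start
    while position > 0:
        barrier = landscape[position - 1] - landscape[position]
        if barrier > max_uphill:
            break  # trapped: the barrier exceeds the available energy
        position -= 1
    return position

rim = len(LANDSCAPE) - 1
for thermal_energy in (0.0, 0.5, 1.0):
    final = descend(LANDSCAPE, rim, max_uphill=thermal_energy)
    status = "native state" if final == 0 else "kinetic trap"
    print(f"max uphill step {thermal_energy}: stops at index {final} ({status})")
```

With no ability to climb, the walker stalls in the first trap. Allowing progressively larger uphill steps lets it escape each trap in turn and reach the bottom of the funnel.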
The overall free energy difference between the unfolded and folded states is surprisingly small, typically only 5 to 15 kcal/mol for a globular protein. This marginal stability means that the native state results from a delicate balance of many stabilizing and destabilizing contributions. It also means that relatively modest perturbations, such as a single amino acid mutation, a change in temperature, or a shift in pH, can tip the balance and cause unfolding or misfolding.
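To see how little margin 5 to 15 kcal/mol provides, it helps to convert stability into the fraction of molecules that are unfolded at equilibrium. The sketch below assumes a simple two-state model (folded or unfolded, with no intermediates) and treats the stability as independent of temperature. Both are simplifications introduced here, and the 2 kcal/mol case is a hypothetical destabilized variant.

```python
import math

GAS_CONSTANT = 1.987e-3  # kcal/(mol*K)

def fraction_unfolded(stability_kcal, temperature_k=298.15):
    """Equilibrium fraction unfolded in a two-state model.

    stability_kcal is the free energy of unfolding: positive values
    mean the folded state is favored.
    """
    k_unfold = math.exp(-stability_kcal / (GAS_CONSTANT * temperature_k))
    return k_unfold / (1.0 + k_unfold)

for stability in (15.0, 5.0, 2.0, 0.0):
    print(f"stability {stability:4.1f} kcal/mol -> "
          f"fraction unfolded {fraction_unfolded(stability):.2e}")
```

Because the relationship is exponential, removing a few kcal/mol of stabilization (comparable to losing a handful of favorable contacts) raises the unfolded population by orders of magnitude. At zero stability, half of the molecules are unfolded.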
Many proteins fold through defined intermediates rather than in a single step. Molten globule intermediates have substantial secondary structure and a compact shape but lack the tight side-chain packing of the native state. These intermediates represent partially descended positions on the energy funnel and are important for understanding both the kinetics and thermodynamics of folding.
Molecular Chaperones
Although the information for folding is encoded in the amino acid sequence, many proteins require assistance from molecular chaperones to fold correctly in the crowded cellular environment. The cytoplasm contains protein concentrations of 200 to 300 mg/mL, and at these concentrations, exposed hydrophobic surfaces on partially folded proteins can interact with neighboring molecules, leading to aggregation rather than productive folding.
The Hsp70 family (heat shock protein 70) chaperones bind to short hydrophobic segments of unfolded or partially folded proteins, preventing premature aggregation. They bind and release substrates in a cycle driven by ATP hydrolysis. Hsp70 chaperones often act co-translationally, binding to hydrophobic segments as they emerge from the ribosome and keeping them soluble until the entire polypeptide chain has been synthesized and can fold productively.
The chaperonins (such as GroEL/GroES in bacteria and TRiC/CCT in eukaryotes) provide a different type of assistance. GroEL is a large barrel-shaped complex that encapsulates a single unfolded protein within its central cavity, capped by the lid-like GroES co-chaperonin. Inside this chamber, the protein can fold in isolation, protected from aggregation. The cycle of binding, encapsulation, and release is driven by ATP hydrolysis. An estimated 10 to 15 percent of newly synthesized proteins in E. coli interact with GroEL/GroES during folding.
Other folding helpers include protein disulfide isomerase (PDI), which catalyzes the formation, breakage, and rearrangement of disulfide bonds in the endoplasmic reticulum, and peptidyl-prolyl isomerase, which accelerates the cis-trans isomerization of peptide bonds preceding proline residues, a step that can otherwise be rate-limiting for folding.
Misfolding and Disease
When protein folding fails, the consequences can be severe. Misfolded proteins may lose their normal function, gain a toxic function, or both. A particularly dangerous outcome is the formation of amyloid fibrils, highly ordered aggregates in which misfolded proteins stack into cross-beta structures with repeating hydrogen-bonded beta strands running perpendicular to the fibril axis. These fibrils are extremely stable and resistant to degradation.
In Alzheimer's disease, amyloid-beta peptides aggregate into plaques in the brain, and the protein tau forms neurofibrillary tangles. In Parkinson's disease, the protein alpha-synuclein forms amyloid inclusions called Lewy bodies in neurons. In type 2 diabetes, the hormone islet amyloid polypeptide (IAPP, also called amylin) forms amyloid deposits in pancreatic islets that contribute to beta-cell death.
Prion diseases represent an extreme case of protein misfolding. The prion protein (PrP) can exist in a normal cellular form (PrPC) or a misfolded, infectious form (PrPSc). The misfolded form acts as a template that converts normal PrPC molecules into the PrPSc conformation, propagating the misfolding in a chain reaction. This mechanism underlies Creutzfeldt-Jakob disease in humans, bovine spongiform encephalopathy (BSE, or mad cow disease) in cattle, and scrapie in sheep.
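The chain-reaction character of templated conversion can be sketched with a minimal autocatalytic model in which the conversion rate is proportional to the product of the PrPC and PrPSc fractions. The rate constant, time step, and starting fractions are hypothetical. The model also ignores synthesis of new PrPC, clearance, and fibril fragmentation, so it shows only the qualitative shape of the process.

```python
def simulate_conversion(prpc=0.999, prpsc=0.001, rate=5.0, dt=0.01, steps=400):
    """Euler integration of templated conversion: PrPC + PrPSc -> 2 PrPSc.

    Quantities are fractions of the total PrP pool; time is in arbitrary units.
    """
    history = [prpsc]
    for _ in range(steps):
        # Conversion needs both a substrate (PrPC) and a template (PrPSc).
        converted = min(prpc, rate * prpc * prpsc * dt)
        prpc -= converted
        prpsc += converted
        history.append(prpsc)
    return history

seeded = simulate_conversion()
unseeded = simulate_conversion(prpc=1.0, prpsc=0.0)

for step in (0, 100, 200, 400):
    print(f"t = {step * 0.01:.1f}: misfolded fraction {seeded[step]:.3f}")
print(f"without a seed, final misfolded fraction: {unseeded[-1]:.3f}")
```

The seeded run shows a long lag followed by rapid takeover, while the unseeded run never converts at all. In this simplified picture, that contrast is why a small amount of PrPSc is enough to propagate the misfolded state.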
Cells possess quality control systems to detect and dispose of misfolded proteins. The unfolded protein response (UPR) in the endoplasmic reticulum senses the accumulation of misfolded proteins and activates genes encoding chaperones and other folding helpers. If the load of misfolded proteins cannot be resolved, the UPR triggers apoptosis (programmed cell death) to protect the organism. Misfolded cytoplasmic proteins are tagged with ubiquitin and degraded by the proteasome, a barrel-shaped protease complex that unfolds and digests damaged proteins.
Computational Approaches to Folding
Predicting a protein's three-dimensional structure from its amino acid sequence has been one of the grand challenges of computational biology. Traditional methods like homology modeling and threading work well when a related structure is already known, but predicting the fold of a protein with no known structural relatives (ab initio prediction) has been far more difficult.
The development of AlphaFold by DeepMind, which demonstrated remarkable accuracy in the CASP14 competition in 2020, transformed the field. AlphaFold uses deep learning to predict protein structures with accuracy approaching that of experimental methods for many proteins. The AlphaFold Protein Structure Database has since made predicted structures available for more than 200 million proteins, greatly accelerating research in biology and medicine. Computational prediction has advanced enormously, but a predicted structure describes only the end point of folding. Understanding the physical process itself, that is, how the polypeptide chain navigates the energy landscape in real time, remains an active area of research.
Summary
Protein folding transforms a linear polypeptide into a precise three-dimensional structure through forces dominated by the hydrophobic effect, hydrogen bonding, and van der Waals interactions. Molecular chaperones assist folding in the crowded cellular environment, and failures in folding underlie amyloid diseases including Alzheimer's, Parkinson's, and prion diseases.