Transcription and Translation: How Cells Read Genes and Build Proteins
Transcription: Copying DNA into RNA
Transcription is the synthesis of an RNA molecule from a DNA template. It begins when transcription factors (specialized proteins) recognize and bind to the promoter, a DNA region just upstream of a gene, and help RNA polymerase bind there. The promoter contains specific DNA sequences, such as the TATA box found in many eukaryotic genes, that position RNA polymerase so that transcription starts at the correct nucleotide.
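Promoter sequences can be illustrated with a toy motif search. This is a minimal sketch, assuming the commonly cited TATA-box consensus 5'-TATAWAWR-3' (W = A or T, R = A or G). The function name find_tata_boxes is hypothetical, and real promoter recognition by transcription factors depends on much more than one short motif.

```python
import re

# Assumed TATA-box consensus 5'-TATAWAWR-3' (IUPAC: W = A or T, R = A or G).
# The lookahead (?=...) lets overlapping matches be reported.
TATA_PATTERN = re.compile(r"(?=TATA[AT]A[AT][AG])")


def find_tata_boxes(upstream: str) -> list[int]:
    """Return 0-based start positions of TATA-box-like motifs.

    `upstream` is a DNA sequence written 5'->3' on the coding strand.
    This is a toy motif scan, not a model of promoter recognition.
    """
    return [m.start() for m in TATA_PATTERN.finditer(upstream.upper())]


print(find_tata_boxes("GGCTATAAAAGGCGC"))  # [3]
```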
Once RNA polymerase is positioned at the promoter, it unwinds a short stretch of the DNA double helix and begins reading the template strand in the 3-prime to 5-prime direction. As it reads, it synthesizes a complementary RNA strand in the 5-prime to 3-prime direction, using ribonucleoside triphosphates (ATP, UTP, GTP, and CTP) as building blocks. The resulting RNA has the same sequence as the non-template (coding) strand of the DNA, except that uracil (U) replaces thymine (T).
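The base-pairing rule in this paragraph can be checked with a short Python sketch. It assumes the template strand is written 3-prime to 5-prime and aligned base by base under the coding strand, so complementing each template base in order yields the RNA in its 5-prime to 3-prime orientation. The names used here are illustrative.

```python
# Base pairing during transcription: template DNA base -> RNA base.
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Build the RNA (5'->3') from a template strand written 3'->5'.

    The template is read 3'->5' while RNA grows 5'->3', so the complement
    of each template base, kept in the same left-to-right order, is the RNA.
    """
    return "".join(DNA_TO_RNA_PAIR[base] for base in template_3_to_5.upper())


coding_5_to_3 = "ATGGCCTTTTAA"
template_3_to_5 = "TACCGGAAAATT"  # complement of the coding strand, base by base
rna = transcribe(template_3_to_5)
print(rna)  # AUGGCCUUUUAA
assert rna == coding_5_to_3.replace("T", "U")  # coding strand with U in place of T
```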
Transcription proceeds through three phases: initiation (binding to the promoter and beginning synthesis), elongation (moving along the gene and extending the RNA strand, at about 40 nucleotides per second in eukaryotes), and termination (recognizing a stop signal and releasing the completed RNA transcript). In bacteria, termination often occurs after the polymerase transcribes a terminator sequence whose RNA copy folds into a hairpin structure, which helps release the transcript. In eukaryotes, termination is more complex and is coupled to cleavage of the RNA followed by polyadenylation.
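The elongation rate gives a rough sense of scale. The sketch below assumes a constant rate of 40 nucleotides per second and a hypothetical 24,000-nucleotide transcribed region. Real polymerases pause and vary in speed, so this is only an order-of-magnitude estimate.

```python
def transcription_time_seconds(gene_length_nt: int, rate_nt_per_s: float = 40.0) -> float:
    """Rough elongation time, assuming a constant rate (about 40 nt/s in eukaryotes)."""
    return gene_length_nt / rate_nt_per_s


# Hypothetical 24,000-nucleotide transcribed region
seconds = transcription_time_seconds(24_000)
print(f"{seconds:.0f} s (~{seconds / 60:.0f} min)")  # 600 s (~10 min)
```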
RNA Processing in Eukaryotes
In eukaryotic cells, the initial RNA transcript (pre-mRNA) must undergo several modifications before it can be translated. These processing steps occur in the nucleus and are essential for producing a stable, functional messenger RNA that ribosomes can read correctly.
The 5-prime cap is added almost immediately after transcription begins. This cap is a modified guanine nucleotide (7-methylguanosine) attached to the first nucleotide of the RNA by an unusual 5-prime to 5-prime triphosphate linkage. It protects the mRNA from degradation by exonucleases, helps ribosomes recognize the mRNA for translation, and facilitates export from the nucleus to the cytoplasm.
The 3-prime poly-A tail is added after the pre-mRNA is cleaved roughly 10 to 30 nucleotides downstream of a polyadenylation signal sequence (usually AAUAAA). An enzyme called poly-A polymerase then adds 100 to 250 adenine nucleotides to the cut end. The poly-A tail stabilizes the mRNA and plays a role in nuclear export and translation initiation. In the cytoplasm the tail is gradually shortened, so an mRNA's lifespan correlates with the length of its poly-A tail.
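Cleavage and polyadenylation can be mimicked with string operations. In this sketch the cleavage point is placed a fixed, assumed 20 nucleotides past the end of the first AAUAAA, and a 200-nucleotide tail is appended. Both numbers are illustrative parameters rather than values from a specific gene, and in cells the choice of cleavage site depends on additional sequence elements. The function name is hypothetical.

```python
PAS = "AAUAAA"  # most common polyadenylation signal


def cleave_and_polyadenylate(pre_mrna: str, offset: int = 20, tail_length: int = 200) -> str:
    """Cut `offset` nucleotides past the end of the first AAUAAA, then add a poly-A tail.

    Both `offset` and `tail_length` are illustrative assumptions: in cells the
    cleavage site usually lies about 10-30 nt downstream of the signal, and
    tails are roughly 100-250 nt long.
    """
    signal = pre_mrna.find(PAS)
    if signal == -1:
        raise ValueError("no AAUAAA polyadenylation signal found")
    cut = signal + len(PAS) + offset
    if cut > len(pre_mrna):
        raise ValueError("transcript ends before the assumed cleavage site")
    return pre_mrna[:cut] + "A" * tail_length  # poly-A polymerase extends the cut end


pre = "AUGGCCUGC" + "AAUAAA" + "G" * 30
mature = cleave_and_polyadenylate(pre)
print(len(mature))  # 235
```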
RNA splicing removes introns (non-coding intervening sequences) and joins exons (the sequences retained in the mature mRNA, including the protein-coding sequence) into a continuous message. The spliceosome, a large molecular machine composed of five small nuclear RNAs (snRNAs) and numerous proteins, carries out this precise cutting and joining reaction. Splice sites are defined by conserved sequences at intron-exon boundaries that the spliceosome recognizes; for example, most introns begin with GU and end with AG.
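The cut-and-join logic of splicing can be sketched as removing intron intervals from a string. This sketch assumes intron coordinates are already known (0-based, end-exclusive) and checks only the GU...AG ends; the real spliceosome relies on longer conserved sequences. The function name is illustrative.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns and join the remaining exons.

    `introns` holds 0-based, end-exclusive (start, end) coordinates, assumed known
    in advance. Only the GU...AG ends are checked, a simplification of how the
    spliceosome recognizes splice sites.
    """
    exons = []
    prev_end = 0
    for start, end in sorted(introns):
        if start < prev_end:
            raise ValueError("introns overlap")
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} does not follow the GU-AG rule")
        exons.append(pre_mrna[prev_end:start])  # exon before this intron
        prev_end = end
    exons.append(pre_mrna[prev_end:])           # final exon
    return "".join(exons)


pre = "AUGGCC" + "GUAAGUUUCAG" + "UUUUAA"   # exon 1 + intron + exon 2
print(splice(pre, [(6, 17)]))               # AUGGCCUUUUAA
```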
The Genetic Code
The genetic code is the set of rules by which nucleotide triplets (codons) in mRNA correspond to specific amino acids in proteins. With four possible bases at each of three positions, there are 64 possible codons. Of these, 61 specify amino acids and 3 are stop codons (UAA, UAG, UGA) that signal the end of translation. The codon AUG serves dual duty as the start codon (initiating translation) and as the code for the amino acid methionine.
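The count of 64 codons follows from 4 × 4 × 4. The sketch below builds the standard codon table in Python from the conventional one-letter layout (bases ordered U, C, A, G at each position, with * marking a stop codon) and confirms the counts given in this paragraph. The variable names are illustrative.

```python
from itertools import product

BASES = "UCAG"  # base order used by the conventional codon-table layout
# Standard genetic code, one letter per codon in U/C/A/G x U/C/A/G x U/C/A/G order;
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stops = sorted(codon for codon, aa in CODON_TABLE.items() if aa == "*")
print(len(CODON_TABLE))               # 64
print(len(CODON_TABLE) - len(stops))  # 61
print(stops)                          # ['UAA', 'UAG', 'UGA']
print(CODON_TABLE["AUG"])             # M (methionine, also the start codon)
```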
The genetic code is degenerate, meaning that most amino acids are specified by more than one codon. Leucine and serine, for example, are each encoded by six different codons. This redundancy provides some protection against mutations: many single-base changes at the third position of a codon, known as the wobble position, do not alter the amino acid produced. The code is also nearly universal across all life forms, from bacteria to humans, reflecting its ancient evolutionary origin.
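Degeneracy can be quantified directly from the standard table. This sketch rebuilds that table, counts codons per amino acid, and, for each codon position, computes the fraction of single-base substitutions in sense codons that leave the amino acid unchanged. Counting a substitution that creates a stop codon as non-synonymous is an assumption of this simple tally.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# How many codons encode each amino acid?
counts = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
six_fold = sorted(aa for aa, n in counts.items() if n == 6)
print(six_fold)  # ['L', 'R', 'S'] (leucine, arginine, serine)


def synonymous_fraction(position: int) -> float:
    """Fraction of single-base changes at `position` (0, 1, or 2) in sense
    codons that keep the same amino acid; changes to a stop count as non-synonymous."""
    synonymous = total = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            synonymous += CODON_TABLE[mutant] == aa
    return synonymous / total


for pos in range(3):
    print(pos + 1, round(synonymous_fraction(pos), 2))
```

For the standard table this gives roughly 0.04, 0.0, and 0.69 for positions 1, 2, and 3, so nearly all synonymous single-base changes fall at the third position.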
Translation: Building Proteins from mRNA
Translation takes place on ribosomes, complex molecular machines composed of ribosomal RNA and proteins. Each ribosome has two subunits (large and small) that come together on the mRNA to form the functional translation complex. The ribosome has three binding sites for transfer RNA: the A site (aminoacyl, where new charged tRNAs enter), the P site (peptidyl, where the growing peptide chain is held), and the E site (exit, where empty tRNAs leave).
In eukaryotes, translation initiation begins when the small ribosomal subunit, carrying a special initiator tRNA charged with methionine and aided by initiation factors, is recruited to the mRNA at the 5-prime cap. It then scans along the mRNA until the initiator tRNA's anticodon pairs with the start codon AUG. The large ribosomal subunit then joins to form the complete ribosome, with the initiator tRNA occupying the P site, and translation is ready to proceed.
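Cap-dependent scanning can be pictured as a walk along the mRNA one nucleotide at a time until the first AUG, which then fixes the reading frame. The sketch below makes that simplifying assumption (the first AUG is always used) and ignores the nucleotides surrounding the AUG, which in real cells also influence which AUG is chosen. The function names are hypothetical.

```python
def scan_for_start(mrna: str) -> int:
    """Return the index of the first AUG, moving one nucleotide at a time
    from the 5' end, or -1 if there is none."""
    for i in range(len(mrna) - 2):
        if mrna[i:i + 3] == "AUG":
            return i
    return -1


def reading_frame(mrna: str) -> list[str]:
    """Split the mRNA into codons starting at the first AUG."""
    start = scan_for_start(mrna)
    if start == -1:
        return []
    body = mrna[start:]
    return [body[i:i + 3] for i in range(0, len(body) - 2, 3)]


print(reading_frame("GGCACUAUGGCUUAA"))  # ['AUG', 'GCU', 'UAA']
```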
During elongation, the ribosome moves along the mRNA one codon at a time. At each step, a transfer RNA carrying the appropriate amino acid enters the A site, its anticodon base-pairing with the mRNA codon. The ribosome then catalyzes formation of a peptide bond that transfers the growing chain from the tRNA in the P site onto the amino acid held by the tRNA in the A site. Next, the ribosome translocates one codon forward: the tRNA now carrying the chain moves from the A site to the P site, and the empty tRNA moves to the E site and is released. This cycle repeats at a rate of about 6 amino acids per second in eukaryotes until a stop codon is reached.
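The site-by-site bookkeeping of elongation can be modeled as a small loop. This sketch labels each tRNA by the codon it reads, assumes every codon after the start codon is a sense codon, and records the occupancy of the E, P, and A sites after each translocation. It also turns the rate of about 6 amino acids per second into a rough time estimate. The names are illustrative.

```python
def elongation_cycles(codons: list[str]) -> list[dict]:
    """Record E/P/A occupancy after each elongation cycle.

    tRNAs are labeled by the codon they read. codons[0] is the start codon,
    whose initiator tRNA begins in the P site; every later codon is assumed
    to be a sense codon.
    """
    p_site = codons[0]
    chain_length = 1  # initiator methionine
    snapshots = []
    for codon in codons[1:]:
        a_site = codon      # charged tRNA pairs with the codon in the A site
        chain_length += 1   # peptide bond: chain moves onto the A-site tRNA's amino acid
        e_site, p_site = p_site, a_site  # translocation: P -> E, A -> P, A left empty
        snapshots.append({"E": e_site, "P": p_site, "A": None, "chain_length": chain_length})
    return snapshots


def elongation_seconds(n_residues: int, rate_aa_per_s: float = 6.0) -> float:
    """Rough time estimate assuming a constant rate (about 6 aa/s in eukaryotes)."""
    return n_residues / rate_aa_per_s


for snap in elongation_cycles(["AUG", "GCU", "UUU"]):
    print(snap)
print(elongation_seconds(300))  # 50.0
```

At that rate, a 300-residue protein would take on the order of 50 seconds to complete.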
Termination occurs when a stop codon (UAA, UAG, or UGA) enters the A site. No standard tRNA has an anticodon that pairs with a stop codon. Instead, proteins called release factors bind the A site and trigger hydrolysis of the bond between the completed polypeptide and the tRNA in the P site. The ribosome then dissociates into its subunits, releasing the finished protein.
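Putting the start codon, elongation, and stop codon together gives a minimal translator. This sketch assumes the standard codon table, starts at the first AUG, and treats reaching a stop codon as the point where a release factor would end synthesis. It does not model what happens when no stop codon is present; the function name is illustrative.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Start at the first AUG, add one residue per codon, and stop at the
    first stop codon (where, in the cell, a release factor would bind)."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon in the A site: no tRNA pairs, release factor acts
            break
        protein.append(aa)
    return "".join(protein)


print(translate("GGCACUAUGGCUUUUGGAUAAGCC"))  # MAFG
```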
After Translation: Protein Folding and Modification
The newly synthesized polypeptide chain must fold into its correct three-dimensional structure to become a functional protein. Folding begins even during translation, as the emerging chain interacts with itself and with molecular chaperones (helper proteins that prevent misfolding and aggregation). The final folded structure is determined primarily by the amino acid sequence, which dictates the pattern of hydrophobic, hydrophilic, and charged interactions within the protein.
Many proteins undergo post-translational modifications that are essential for their function. These include phosphorylation (adding phosphate groups, often to regulate protein activity), glycosylation (adding sugar chains, common in secreted and membrane proteins), and proteolytic cleavage (cutting the protein into its active form, as occurs with insulin). Such modifications expand the functional diversity of proteins far beyond what the genetic code alone could specify.
Gene expression proceeds through transcription (DNA to RNA, which takes place in the nucleus of eukaryotic cells), RNA processing in eukaryotes, and translation (mRNA to protein on ribosomes). The genetic code uses three-base codons to specify amino acids, and the resulting polypeptide chain folds into a functional protein that carries out cellular work.