Gene expression: DNA to protein

Learning objectives

  1. Describe the Central Dogma of molecular biology.
  2. Know the general functions of the three major types of RNA (mRNA, rRNA, tRNA).
  3. Describe the DNA sequence motifs and proteins required to initiate transcription.
  4. Predict the RNA transcribed from a DNA sequence identified as either the template strand or the coding strand.
  5. Use the genetic code to predict the protein amino acid sequence translated from an mRNA sequence.
  6. Describe the process of and key components required for translation.
  7. Predict the likely effects of mutations in DNA on protein amino acid sequence, structure and function.
  8. Compare and contrast prokaryotic and eukaryotic transcription and translation.

The Central Dogma

Francis Crick coined the phrase “the Central Dogma” to describe the flow of information from nucleic acid to protein. Information encoded in DNA is transcribed into RNA, and RNA is translated into a linear sequence of amino acids in a protein. Although information can flow in both directions between DNA and RNA (via transcription and reverse transcription), no mechanism has been found by which a change in a protein's amino acid sequence can produce a corresponding change in the RNA or DNA.

This video gives a highly simplified overview of the central dogma of molecular biology:

And this video provides an animated overview of gene expression in a eukaryotic cell:

Transcription: DNA to RNA

Transcription is the process of using DNA as a template to make an RNA molecule:

  • The enzyme RNA polymerase reads the template strand of DNA and synthesizes an RNA molecule whose bases are complementary to the template strand of DNA.
  • RNA is synthesized 5′ → 3′ (the same direction as DNA synthesis); RNA polymerase reads the template strand of DNA 3′ → 5′.
  • The sequence of bases in RNA is the same as the sequence of bases in the “coding” strand of DNA, except that RNA has uracil (U) instead of thymine (T).
  • To recognize where genes start, RNA polymerases in both prokaryotes and eukaryotes depend on DNA-binding proteins called transcription factors, which bind to special sequence motifs in the DNA called promoters.
  • Transcription factors recruit RNA polymerase to bind to the promoter sequence and begin transcription just “downstream” of the promoter.
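The strand rules above turn directly into a prediction exercise. The Python sketch below is illustrative only: it assumes both DNA strands are written 5′ → 3′ using only A, C, G, and T, and the function name `transcribe` and the example sequence are hypothetical. Many textbook problems instead write the template strand 3′ → 5′ aligned under the coding strand; in that layout you would complement each base without reversing the order.

```python
# Minimal sketch: predict the mRNA transcribed from a DNA strand.
# Assumption: every sequence is written 5' -> 3' and contains only A, C, G, T.

DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(dna: str, strand: str) -> str:
    """Return the RNA (written 5' -> 3') made from `dna` (written 5' -> 3').

    strand="coding":   the RNA has the coding-strand sequence, with U in place of T.
    strand="template": the RNA is complementary and antiparallel to the template,
                       so each base is complemented and the order is reversed.
    """
    dna = dna.upper()
    if strand == "coding":
        return dna.replace("T", "U")
    if strand == "template":
        return "".join(DNA_TO_RNA_PAIR[base] for base in reversed(dna))
    raise ValueError("strand must be 'coding' or 'template'")


coding = "ATGGCCTTTTAA"    # 5'-ATGGCCTTTTAA-3' (hypothetical example)
template = "TTAAAAGGCCAT"  # its partner strand, also written 5'-3'

print(transcribe(coding, "coding"))      # AUGGCCUUUUAA
print(transcribe(template, "template"))  # AUGGCCUUUUAA
```

Both calls give the same mRNA, which is the point of the exercise: the RNA matches the coding strand (with U for T) and is the reverse complement of the template strand.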

This video gives a simplified overview of transcription. Notice that the narrator makes a mistake at 3:45 (that he later catches and corrects!); this mistake serves as a really important reminder of one of the major differences between DNA and RNA, so watch for it.

See a more advanced molecular animation of transcription, with narration, here:

Translation: RNA to Protein

Translation is the process of using an mRNA molecule as a template to make a protein. Translating a sequence of bases in the RNA into a sequence of amino acids in a protein requires 3 major components:

  1. messenger RNA (mRNA): mRNAs are transcribed from protein-coding genes. (There are other types of genes which do not encode proteins, such as genes encoding rRNAs and tRNAs.)
  2. ribosomes: ribosomes are large assemblies of ribosomal RNA molecules (rRNAs) and dozens of proteins. When they are not translating, they dissociate into a small subunit and a large subunit, each consisting of rRNA and numerous proteins. When the structures of prokaryotic ribosomes were determined at high resolution, researchers were astonished to discover that the catalytic site for the peptidyl-transfer reaction (attaching new amino acids to the growing polypeptide chain) consists entirely of rRNA. Thus the ribosome is actually an immense ribozyme (a catalytic RNA molecule) stabilized by numerous proteins, rather than a protein enzyme.
  3. transfer RNAs (tRNAs) that are “charged” with their corresponding amino acids (meaning each tRNA is attached to, and carrying, its corresponding amino acid). tRNAs match each amino acid to its codon in the mRNA: the bases in the tRNA's anticodon loop are complementary to the bases in an mRNA codon, and the 3′ end of the tRNA carries the appropriate amino acid via a high-energy bond. Cells have a family of enzymes, called aminoacyl-tRNA synthetases, that recognize the various tRNAs and “charge” them by attaching the correct amino acid.
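Because the anticodon pairs with the codon in antiparallel orientation, the anticodon (read 5′ → 3′) is the reverse complement of the codon (read 5′ → 3′). A minimal Python sketch follows; the helper name `anticodon_for` is hypothetical, and the sketch assumes strict Watson-Crick pairing, ignoring the wobble pairing and modified bases that real tRNAs use at some anticodon positions.

```python
# Sketch: the tRNA anticodon that base-pairs with a given mRNA codon.
# Pairing is antiparallel, so the anticodon written 5'->3' is the reverse
# complement of the codon written 5'->3'. Wobble pairing is ignored.

RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon: str) -> str:
    codon = codon.upper()
    if len(codon) != 3 or any(base not in RNA_PAIR for base in codon):
        raise ValueError("codon must be three RNA bases (A, C, G, U)")
    return "".join(RNA_PAIR[base] for base in reversed(codon))


print(anticodon_for("AUG"))  # CAU  (methionine codon)
print(anticodon_for("UUC"))  # GAA  (phenylalanine codon)
```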

Secondary structure of phenylalanyl-tRNA from yeast, from Wikipedia

Tertiary structure of tRNA, from Wikipedia. The anticodon loop (in gray) base-pairs with the codon in the mRNA in anti-parallel orientation. The amino acid attachment site (yellow) is the location where the tRNA is covalently bonded to its amino acid.

Translation begins near the 5′ end of the mRNA, with the small ribosomal subunit and a special initiator tRNA carrying the amino acid methionine. In most cases, translation begins at the AUG triplet closest to the 5′ end of the mRNA. In eukaryotes, the small subunit of the ribosome typically “scans” along the mRNA from the 5′ end until it finds the first AUG codon. In prokaryotes, the ribosome typically binds to a specific sequence that “positions” it at the starting AUG. In either case, the large ribosomal subunit then docks and translation begins, always with an AUG codon (methionine) in both prokaryotes and eukaryotes. The ribosome moves along the mRNA 3 bases at a time in the 5′ to 3′ direction, and new tRNAs whose anticodons are complementary to the mRNA codons arrive carrying their corresponding amino acids. A peptide bond joins each new amino acid to the carboxyl end of the growing polypeptide chain. The ribosome then moves another 3 bases, and the empty tRNA is ejected to make room for a new aminoacyl-tRNA.
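The eukaryotic scanning rule described above (find the first AUG from the 5′ end, then read 3 bases at a time toward the 3′ end) can be sketched in Python. This is a simplified model under stated assumptions: the input is an mRNA string written 5′ → 3′, the first AUG is always used, and the prokaryotic positioning sequence is not modeled. The function name `reading_frame` and the example sequence are hypothetical.

```python
# Sketch of the scanning model: start at the 5' end, find the first AUG,
# then step 3 bases at a time toward the 3' end until a stop codon.

STOP_CODONS = {"UAA", "UAG", "UGA"}


def reading_frame(mrna: str) -> list[str]:
    """Return the codons from the first AUG up to (not including) the stop codon."""
    start = mrna.find("AUG")            # AUG closest to the 5' end
    if start == -1:
        return []                       # no start codon: nothing is translated
    codons = []
    for i in range(start, len(mrna) - 2, 3):   # move 3 bases at a time
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return codons


print(reading_frame("GCCACCAUGGCUUUCGGAUAAGC"))
# ['AUG', 'GCU', 'UUC', 'GGA']
```

The reading frame is set entirely by the position of the start codon; every later codon is read in steps of 3 from there.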

Image modified from “Translation: Figure 3,” by OpenStax College, Biology (CC BY 4.0).

The polypeptide chain made by the ribosome also has directionality; one end has a free amino group and the other end of the chain has a free carboxyl group. These are called the N-terminus and the C-terminus, respectively. New amino acids are added only to a free carboxyl end, so polypeptide chains grow from the N-terminus to the C-terminus.
This video gives a solid overview of translation. It is a little longer than the typical videos we use on this site, but it does a really nice job of breaking down step-by-step what happens during translation:

Watch a much shorter molecular animation of translation here:

The Genetic Code

The universal genetic code chart

The universal genetic code. AUG (methionine, highlighted green) is the “Start” codon. The three codons labeled “Stop” in red are “nonsense” codons that signal termination of translation. From

This genetic code is used by nearly all living organisms, whether Archaea, Bacteria or Eukarya, with only minor variations, found mainly in mitochondria and in a few microorganisms. If you do the math, you can see that there are 64 possible codons (4³ = 64, since each of the 3 positions can hold any of 4 bases), but only 20 amino acids. Three of these codons are stop codons, leaving 61 codons to specify the 20 amino acids. Thus the code is “degenerate”: most amino acids can be specified by 2, 3, 4 or 6 different codons. For example, glycine can be specified by codons GGU, GGC, GGA, and GGG. Methionine is unusual in that it is specified by only a single codon, AUG. (Tryptophan, specified only by UGG, is the only other amino acid with a single codon.)
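The arithmetic in this paragraph can be checked by building the standard genetic code in a few lines of Python. The sketch below is illustrative: it encodes the standard code as a 64-letter string of one-letter amino acid abbreviations (G = glycine, M = methionine, W = tryptophan, L = leucine, and so on, with * for stop), with each codon position cycling through U, C, A, G. It does not include any of the variant codes, and the `translate` helper (a hypothetical name) simply starts at the first base of its input.

```python
from collections import Counter
from itertools import product

# Standard genetic code: one-letter amino acid codes for all 64 codons,
# each position in U, C, A, G order ('*' marks the three stop codons).
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)}

counts = Counter(CODE.values())
print(len(CODE))                        # 64 codons (4**3)
print(len(set(CODE.values()) - {"*"}))  # 20 amino acids
print(counts["*"])                      # 3 stop codons
print(counts["G"], counts["L"], counts["M"], counts["W"])  # 4 6 1 1
print(sorted(c for c, aa in CODE.items() if aa == "G"))    # ['GGA', 'GGC', 'GGG', 'GGU']


def translate(mrna: str) -> str:
    """Translate complete codons from the first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("AUGGCUUUCGGAUAA"))     # MAFG
```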

Mutations can have vastly different effects depending on where they occur in a gene or in a codon

If we consider just single-nucleotide changes (substitutions, deletions or insertions of single bases), these can have very different consequences depending on where they occur in the gene. A base substitution that changes the 3rd base of a codon often has no effect, because of the degenerate nature of the genetic code. For example, changing GAG to GAA has no effect on the protein because both codons specify glutamic acid (glutamate). Such “silent” mutations are called “synonymous” mutations.

Other base substitutions, especially in the 1st or 2nd position of a codon, usually cause amino acid changes; these are called “nonsynonymous” or “missense” mutations. Even among nonsynonymous mutations, the exact amino acid change matters. A change from one hydrophobic amino acid to another hydrophobic amino acid will usually be less disruptive to the structure of the protein than a change from a hydrophobic amino acid to a polar or charged one. Finally, some parts of a protein are more important than others, such as the catalytic site of an enzyme, or sites that bind other proteins, DNA, or regulatory molecules.
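Whether a substitution is synonymous, missense (nonsynonymous), or nonsense can be decided by a lookup in the genetic code table. In this Python sketch the function name `classify_substitution` is hypothetical, the input is assumed to be a sense codon written 5′ → 3′ in mRNA bases, and only the identity of the amino acid is compared; the sketch says nothing about how disruptive a missense change would be to protein structure.

```python
from itertools import product

# Standard genetic code (one-letter amino acid codes, '*' = stop), UCAG order.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)}


def classify_substitution(codon: str, position: int, new_base: str) -> str:
    """Classify replacing base number `position` (1, 2 or 3) of a sense codon."""
    if CODE[codon] == "*":
        raise ValueError("expected a sense codon, not a stop codon")
    mutant = codon[:position - 1] + new_base + codon[position:]
    before, after = CODE[codon], CODE[mutant]
    if after == before:
        return "synonymous"
    if after == "*":
        return "nonsense"
    return "missense"


print(classify_substitution("GAG", 3, "A"))  # synonymous (GAA also codes for E, glutamate)
print(classify_substitution("GAG", 2, "U"))  # missense   (GUG codes for V, valine)
print(classify_substitution("UGG", 3, "A"))  # nonsense   (UGA is a stop codon)
```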

Some mutations create a new stop codon (UAA, UAG, or UGA). These are called “nonsense” mutations and cause truncated polypeptides to be made. Insertions or deletions (“indels”) of single nucleotides cause a change in the reading of all downstream codons; they are shifted by one base. Such “frameshift” mutations will alter most or all amino acids downstream (towards the 3′ end of the mRNA, towards the C-terminus of the protein) of the mutation.
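To see why a single-base deletion alters everything downstream, the Python sketch below translates a short, made-up mRNA before and after deleting one base. The sequence and the `translate` helper are hypothetical; translation here simply starts at the first base and stops at the first in-frame stop codon or at the end of the sequence.

```python
from itertools import product

# Standard genetic code (one-letter amino acid codes, '*' = stop), UCAG order.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Translate complete codons from the first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


original = "AUGGCUUUCGGAAAAUGGUAA"     # AUG GCU UUC GGA AAA UGG UAA
deleted = original[:3] + original[4:]  # delete one base (the G at index 3)

print(translate(original))  # MAFGKW
print(translate(deleted))   # MLSENG
```

After the deletion, every codon after AUG is read in a new frame, and the original UAA stop codon is no longer in frame; in a real mRNA, translation would continue until an in-frame stop codon happened to appear further downstream (or the mRNA ended).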

Differences between prokaryotes and eukaryotes

Much of what is discussed above was originally discovered in bacteria, and then found to be true of archaea and eukaryotes as well; many of the core features of molecular biology are evolutionarily conserved. However, there are a few key differences as outlined below.

Prokaryotes: transcription and translation are coupled

In prokaryotic cells, ribosomes begin to translate even while the mRNA is still being transcribed. DNA, RNA polymerase, and ribosomes are all in the same location. This coupled transcription and translation can occur because prokaryotes have no nucleus. (In eukaryotes, the nucleus separates the transcription machinery from the translation machinery.)

Eukaryotes: transcription and translation are separated in space and time, and nuclear pre-mRNA undergoes processing to become mature mRNA

In eukaryotes, transcription occurs in the nucleus, whereas translation occurs outside the nucleus, in the cytoplasm, carried out either by free cytoplasmic ribosomes or by ribosomes docked on the ER. The RNA transcribed from a protein-coding gene in the nucleus is called pre-mRNA. Pre-mRNA has to undergo at least two, and usually 3, processing steps before it can be exported to the cytoplasm as mature mRNA. These are, in order:

  1. The 5′ end of the pre-mRNA is modified by the covalent attachment of a 7-methylguanosine (m7G) nucleotide, called the 5′ cap. The 5′ cap is required for eukaryotic ribosomes to initiate translation.
  2. The majority of eukaryotic genes contain sequences which do not actually code for protein. These sequences are called introns (“intervening” sequences), and they “interrupt” the protein coding sequences, which are called exons (“expressed” sequences) in the gene. These non-protein coding intron sequences are removed by RNA splicing, leaving just the protein-coding exons in the final mRNA.
  3. The 3′ end of the pre-mRNA is modified by the addition of hundreds of adenine nucleotides, called the polyA tail. The polyA tail is important for nuclear export, mRNA stability, and translation.
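The three processing steps can be modeled as simple string operations. This is a toy Python sketch, not a description of how cells recognize these features: the exon coordinates are supplied by hand (real cells identify intron boundaries from the RNA sequence itself), the 5′ cap is represented by the text label `m7G-`, and the tail length is an arbitrary parameter. The function name `process_pre_mrna` and the example sequence are hypothetical.

```python
# Toy model of eukaryotic pre-mRNA processing as string operations.


def process_pre_mrna(pre_mrna: str, exons: list[tuple[int, int]],
                     tail_length: int) -> str:
    """Add a 5' cap label, keep only the exons, and add a polyA tail.

    exons holds (start, end) index pairs into pre_mrna, end exclusive,
    listed 5' -> 3'; everything between them is treated as intron.
    """
    cap = "m7G-"                                          # label for the 5' cap
    spliced = "".join(pre_mrna[s:e] for s, e in exons)    # introns are dropped
    tail = "A" * tail_length                              # polyA tail
    return cap + spliced + tail


# Intron written in lowercase only to make it easy to spot.
exon1, intron, exon2 = "AUGGCU", "guaaguccuag", "UUCUAA"
pre_mrna = exon1 + intron + exon2
exons = [(0, len(exon1)), (len(exon1) + len(intron), len(pre_mrna))]

mature = process_pre_mrna(pre_mrna, exons, tail_length=5)
print(mature)  # m7G-AUGGCUUUCUAAAAAAA
```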

These processing steps largely happen while the RNA is still being transcribed; that is, they occur co-transcriptionally. So in reality, a full-length, unprocessed “pre-mRNA” may never actually exist.

Eukaryotic pre-mRNA processing

Eukaryotic pre-mRNAs are processed in the nucleus by adding a 5′ cap and a 3′ polyA tail and by removing introns via RNA splicing, creating a mature mRNA consisting only of exons, ready for export to the cytoplasm for translation. From

This video gives a nice quick overview of these differences between prokaryotes and eukaryotes:

Dr. Choi’s video lecture on this topic, in one 32-min chunk:

Test your knowledge with these questions & problems: DNA_to_protein_questions

And the slide set: B1510_module4-6_DNA_to_protein

Affordable and Clean Energy

UN Sustainable Development Goal (SDG) 7: Affordable and Clean Energy – Knowledge of the DNA sequence motifs and proteins required to initiate transcription can lead to the development of biotechnology that produces biofuels and other forms of clean energy. Biofuel advances are frequently made by inserting new genes to be expressed, or by increasing the expression of existing genes, in the plants and bacteria used for production, in order to maximize yields.
