Introduction to DNA
From BCCD 3.0
Every cell in your body, with a few rare exceptions that we'll ignore, contains the set of instructions for making you. Starting almost immediately after conception, those instructions were carried out to build your physical body. The same is true for every other living creature on the planet: each of their cells contains the instructions for making them. This set of instructions is called deoxyribonucleic acid, or DNA for short.
DNA itself is a molecule, though a very large one (a macromolecule, to be specific). The backbone of DNA consists of a repeating unit made of phosphate and the sugar deoxyribose. The two alternate along the strand: phosphate, deoxyribose, phosphate, deoxyribose, and so on. From the picture, you'll notice that one strand of DNA has phosphate at the top and deoxyribose at the bottom, while the other strand is the opposite. This is because the two strands run in opposite directions to each other, an arrangement called antiparallel. The end of a DNA strand that finishes in deoxyribose is called the 3' end, and the phosphate end is the 5' end.
Bonded to the deoxyribose residues, as can be seen in the picture, are the nucleotide bases, usually just called nucleotides. These are the instructions that the DNA carries. Similar to how a computer stores a bit of information as either 1 or 0 (on or off), each position on the DNA stores one of four bases: adenine (A), thymine (T), cytosine (C), or guanine (G). Reading from the 5' end to the 3' end of the left strand of the DNA in the picture, we can see that it carries ACTG.
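Because each position holds one of four bases rather than one of two values, a single base carries the same amount of information as two computer bits. The minimal Python sketch below makes the analogy concrete. The helper names `encode` and `decode` are hypothetical, and the mapping A=00, C=01, G=10, T=11 is an arbitrary choice assumed for illustration, not a standard.

```python
# Arbitrary 2-bit code for each base (an assumption for illustration only).
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def encode(seq: str) -> int:
    """Pack a DNA sequence into one integer, two bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | BASE_TO_BITS[base]
    return value


def decode(value: int, length: int) -> str:
    """Unpack `length` bases from an integer produced by encode()."""
    bases = []
    for _ in range(length):
        bases.append(BITS_TO_BASE[value & 0b11])  # lowest two bits = last base
        value >>= 2
    return "".join(reversed(bases))


packed = encode("ACTG")
print(format(packed, "08b"))  # 00011110  (A=00 C=01 T=11 G=10)
print(decode(packed, 4))      # ACTG
```

The sequence length has to be stored separately, since leading A's encode as zero bits and would otherwise be lost.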
Unlike computer bits, however, the nucleotides on one DNA strand must have a fixed relationship with those on the other. DNA is often shown as a double-stranded helix; this helix is formed when two DNA molecules (strands) come together. The two strands pair up at their nucleotides, and each nucleotide has one specific partner: A pairs with T, and C pairs with G. Paired nucleotides are held together by hydrogen bonds; guanine and cytosine line up to form three hydrogen bonds, while adenine and thymine pair differently and form two. The twist of the helix comes mainly from the geometry of the sugar-phosphate backbone and the stacking of the flat base pairs on top of one another, rather than from the hydrogen bonds alone.
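The pairing rules and hydrogen-bond counts above are simple enough to check mechanically. This sketch, with a hypothetical helper `hydrogen_bonds`, assumes the second strand is written base by base opposite the first (aligned, not reversed), totals the hydrogen bonds using 2 per A-T pair and 3 per G-C pair, and rejects any position where the bases cannot pair.

```python
# Allowed pairs and the number of hydrogen bonds each one forms.
H_BONDS = {("A", "T"): 2, ("T", "A"): 2, ("G", "C"): 3, ("C", "G"): 3}


def hydrogen_bonds(strand_a: str, strand_b: str) -> int:
    """Total hydrogen bonds between two aligned strands.

    strand_b is written base by base opposite strand_a.
    Raises ValueError if any position is not a valid pair.
    """
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must be the same length")
    total = 0
    for i, pair in enumerate(zip(strand_a, strand_b)):
        if pair not in H_BONDS:
            raise ValueError(f"position {i}: {pair[0]} cannot pair with {pair[1]}")
        total += H_BONDS[pair]
    return total


print(hydrogen_bonds("ACTG", "TGAC"))  # 2 + 3 + 2 + 3 = 10
```

One consequence is that a stretch rich in G and C is held together by more hydrogen bonds than an equally long stretch rich in A and T.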
Because A always pairs with T and C always pairs with G, the two strands of DNA making up the helix actually contain the same information. In the picture, the left strand carries ACTG, and the right strand, read across base by base, carries TGAC. Given the sequence of nucleotides for one strand, the sequence of the other strand can always be worked out.
For instance, the sequence AGTCACCGGATAAG will always pair with TCAGTGGCCTATTC, and vice versa. They are called complements of each other. To find the complement of a sequence, make sure
- every A pairs with a T
- every T pairs with an A
- every C pairs with a G
- every G pairs with a C
5' --- A G T C A C C G G A T A A G --- 3'
       | | | | | | | | | | | | | |
3' --- T C A G T G G C C T A T T C --- 5'
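The four complement rules translate directly into a lookup table. In this minimal sketch the function names are hypothetical. `complement` returns the partner strand aligned with the input, as in the diagram. `reverse_complement` also reverses it so that the partner strand reads 5' to 3', a common convention for writing the second strand.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(seq: str) -> str:
    """Return the base-by-base complement, aligned with the input."""
    return "".join(COMPLEMENT[base] for base in seq)


def reverse_complement(seq: str) -> str:
    """Return the complementary strand written 5' -> 3'."""
    return complement(seq)[::-1]


top = "AGTCACCGGATAAG"            # written 5' -> 3'
print(complement(top))            # TCAGTGGCCTATTC (reads 3' -> 5')
print(reverse_complement(top))    # CTTATCCGGTGACT (reads 5' -> 3')
```

Taking the complement twice returns the original sequence, which reflects the "and vice versa" in the text.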
Bits on a computer can represent many possible things: a software program, a document, or the instructions for talking to a printer, for example. Similarly, DNA can code for many different things. However, most of them have in common how they are read and interpreted. We've established that DNA holds a set of instructions for making a living organism, written with four characters: A, T, C, and G. These are strung together in long sequences. To be used, the instructions must first be copied, and the copy is then carried out. What are these "instructions"? The nucleotides specify which amino acids should make up a protein and in what order they should be joined, so the instructions from the DNA tell the cell how each protein should be put together.
A gene is a set of these instructions. However, for any given stretch of DNA, there are four possible ways of reading its nucleotides, using the written example above: the top strand from left to right (5' to 3'), the top strand from right to left (3' to 5'), the bottom strand from left to right (3' to 5'), or the bottom strand from right to left (5' to 3'). Two of these possibilities can be eliminated with the knowledge that DNA is read from 3' to 5'. This leaves either the top strand from right to left or the bottom strand from left to right. Which of these carries the actual message depends on context; signals in the DNA indicate which strand has the start signal for the instructions.
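The elimination argument can be laid out in a few lines of code. This sketch uses a hypothetical helper `four_readings`. It assumes the top strand is given left to right as 5' to 3', with the bottom strand as its aligned complement, so the bottom strand runs 3' to 5' from left to right. It lists all four readings and keeps only the two that run 3' to 5'.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def four_readings(top: str) -> list[tuple[str, str, str]]:
    """List (strand, direction, sequence) for every way to read a duplex.

    `top` is the top strand written left to right, 5' -> 3'; the bottom
    strand is its aligned complement, so it runs 3' -> 5' left to right.
    """
    bottom = "".join(COMPLEMENT[b] for b in top)
    return [
        ("top", "5'->3'", top),              # left to right
        ("top", "3'->5'", top[::-1]),        # right to left
        ("bottom", "3'->5'", bottom),        # left to right
        ("bottom", "5'->3'", bottom[::-1]),  # right to left
    ]


# DNA is read 3' -> 5', which rules out two of the four readings.
for strand, direction, seq in four_readings("AGTCACCGGATAAG"):
    if direction == "3'->5'":
        print(strand, seq)
# top GAATAGGCCACTGA
# bottom TCAGTGGCCTATTC
```

Choosing between the two remaining candidates is not something the sequence alone settles here; as the text says, it depends on start signals elsewhere in the DNA.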
Condensed DNA: Chromosomes
DNA strands can consist of millions of nucleotides, and organisms such as humans have multiple strands within the nucleus of each and every cell. Further, not all of the DNA actually codes for genes, so not all of it needs to be open and accessible at all times. To achieve the compression needed to fit 46 molecules of DNA into one human nucleus, several layers of packing are used. First, stretches of DNA wrap about twice around small cylindrical cores made of proteins called histones, creating structures called nucleosomes all along the strand. These nucleosomes then coil into a tightly packed spiral called a solenoid. The solenoid folds on itself at least once more before becoming the highly condensed structure called a chromosome.
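To get a feel for how much compression is needed, one can multiply the number of base pairs by the length each pair adds along the helix. This back-of-the-envelope sketch uses a hypothetical helper `dna_length_m` and assumes two approximate figures that do not come from the text above: about 0.34 nm per base pair, and roughly 3.1 billion base pairs per copy of the human genome.

```python
# Assumed, approximate figures for illustration only.
RISE_NM_PER_BP = 0.34   # length of one base pair along the double helix
HAPLOID_BP = 3.1e9      # rough size of one copy of the human genome


def dna_length_m(base_pairs: float) -> float:
    """Length in metres of a double helix stretched out straight."""
    return base_pairs * RISE_NM_PER_BP * 1e-9


diploid_bp = 2 * HAPLOID_BP   # two copies of each chromosome
print(f"{dna_length_m(diploid_bp):.2f} m of DNA per nucleus")  # 2.11 m ...
```

Roughly two metres of DNA must therefore fit inside a nucleus only a few micrometres across, which is why several layers of packing are needed.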
Humans have 23 different chromosomes and two copies of each type, for a total of 46 chromosomes. Different species have different numbers of chromosomes. For instance, the common fruit fly has only 4 pairs of chromosomes, while a domesticated chicken has 39 pairs. Tomatoes have 12 pairs of chromosomes, and carp have 54 pairs. (Information from here and here; check them for more examples.)
Chromosome number cannot be used as an indicator of the complexity of a species. Organisms with higher chromosome counts cannot be said to be more complex or even to have more genes, because the amount of DNA that does not code for proteins or serve other known functions (sometimes called junk DNA) may vary from species to species. In addition, some organisms (like plants) can tolerate having more than two sets of chromosomes; new species can arise when two sets of chromosomes from one organism and two sets from a different organism combine, or when an individual accidentally inherits more than two sets of chromosomes from its parents.
From DNA to Protein
Going from the nucleotides on the DNA strand to a finished protein is a complicated, multistep process. First, the enzyme responsible for copying the DNA must find the start of the gene to be copied; this may involve uncoiling part of the DNA to make it accessible. Once the start has been found, the following steps occur:
- An RNA copy of the gene, called the transcript, is made.
- In eukaryotes, the transcript must be processed and then leave the nucleus.
- A ribosome attaches to the copy and constructs a protein from the instructions.
- The protein folds.
Copying the DNA: Transcription
Before the instructions on the DNA can be interpreted, they must first be copied into a form that the interpreter can understand: ribonucleic acid (RNA). RNA is similar to DNA in that both are nucleic acids with a phosphate and sugar backbone, but instead of the sugar deoxyribose, RNA uses ribose. RNA also has a 3' and a 5' end. However, RNA is usually found as a single strand rather than as the double strand of DNA. Also, RNA does not contain thymine (T); instead, it has uracil (U), which pairs with adenine (A). Because there are many different types of RNA, the type that makes up the transcript is called mRNA, for messenger RNA.
A special enzyme called RNA polymerase finds the beginning of a gene and starts copying; the strand it copies is called the template strand. It moves along the template strand from 3' to 5', unzipping the two DNA strands from each other as it goes. For each DNA nucleotide, the RNA polymerase adds the matching RNA nucleotide. The pairing rules are the same for RNA as for DNA, with U replacing T: A pairs with U, and C pairs with G. The new mRNA strand (called the transcript) runs antiparallel to the template strand: as the DNA is read from 3' to 5', the RNA transcript is built from 5' to 3'.
DNA coding strand:    5' - ATTCGGGCATAACCCCTGAT - 3'
DNA template strand:  3' - TAAGCCCGTATTGGGGACTA - 5'
mRNA:                 5' - AUUCGGGCAUAACCCCUGAU - 3'
The mRNA is the complement to the template strand, which it was copied from. However, it's identical to the coding strand, except with the thymines being replaced with uracils.
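Transcription, as described here, is a complement operation with U in place of T. This sketch uses a hypothetical helper `transcribe`. It takes the template strand written 3' to 5', adds the pairing RNA nucleotide for each base, and checks the result against the coding strand with T replaced by U.

```python
# RNA nucleotide added opposite each DNA template nucleotide.
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_3_to_5: str) -> str:
    """Build mRNA 5' -> 3' from a template strand written 3' -> 5'.

    Walking the template 3' -> 5' and adding the pairing RNA nucleotide
    at each step produces a transcript that grows 5' -> 3'.
    """
    return "".join(DNA_TO_RNA_PAIR[base] for base in template_3_to_5)


template = "TAAGCCCGTATTGGGGACTA"   # 3' -> 5', as in the example
coding = "ATTCGGGCATAACCCCTGAT"     # 5' -> 3'
mrna = transcribe(template)
print(mrna)                               # AUUCGGGCAUAACCCCUGAU
print(mrna == coding.replace("T", "U"))   # True
```

The second check is the shortcut the text describes: the transcript matches the coding strand, apart from U standing in for T.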
Even as the mRNA is being produced, prokaryotes can begin making proteins from the instructions contained in the transcript. In eukaryotes, however, the mRNA needs to first leave the nucleus before the proteins are made from it. (It's a lot easier to get a strand of RNA through the nuclear membrane than a big protein.) To protect the RNA, a cap is added to the 5' end and a poly(A) tail is added to the 3' end.
In addition, eukaryote mRNA undergoes another step that prokaryote mRNA does not: splicing. Eukaryote DNA contains sections that do not code for protein, called introns; the chunks in between that do code for protein are called exons. During splicing, introns are cut out of the transcript and the exons are joined together. This makes eukaryote DNA very flexible: cutting out different introns, or even cutting out some exons, can result in multiple kinds of proteins being made from a single section of DNA. (For more information on alternative splicing, please see this article from the Howard Hughes Medical Institute bulletin.)
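Splicing can be pictured as keeping chosen slices of the transcript and joining them in order. In this toy sketch, the function `splice`, the pre-mRNA sequence, and the exon coordinates are all hypothetical, with exons in upper case and introns in lower case for readability. It shows how keeping every exon and skipping one exon yield two different mRNAs from the same starting sequence. The 5' cap and poly(A) tail are not modeled.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon slices [start, end) in order, dropping everything else."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical pre-mRNA: three exons (upper case), two introns (lower case).
pre_mrna = "AUGGCC" + "guaaag" + "UUUGGA" + "gucuag" + "UAA"
exon1, exon2, exon3 = (0, 6), (12, 18), (24, 27)

print(splice(pre_mrna, [exon1, exon2, exon3]))  # AUGGCCUUUGGAUAA
print(splice(pre_mrna, [exon1, exon3]))         # AUGGCCUAA (exon 2 skipped)
```

Describing each splicing choice as a list of coordinates, rather than editing the string in place, makes it easy to compare several alternative products of the same transcript.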
After being processed and spliced, the eukaryotic mRNA leaves the nucleus and enters the cytoplasm of the cell, where it can undergo the next step.
Interpreting the mRNA: Translation
The mRNA is finally ready for its message to be turned into a protein. First, a ribosome, a molecular complex made of proteins and RNA, attaches to the transcript and begins scanning along it. The ribosome looks for a specific sequence on the RNA that tells it to begin translating RNA into protein: AUG. Starting at that AUG, it reads the RNA three nucleotides at a time; each group of three nucleotides is called a codon. Each codon codes for a specific amino acid. (For a table of codons and their corresponding amino acids, please see Wikipedia's article on the genetic code.)
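The reading step, which finds AUG and then steps along three nucleotides at a time, can be sketched as follows. The function name and the example transcript are hypothetical. As a simplification, the sketch assumes translation starts at the first AUG in the string, while real start-codon choice depends on more sequence context than this.

```python
def codons_from_start(mrna: str) -> list[str]:
    """Split an mRNA into codons, starting at the first AUG.

    Simplification: the first AUG in the string is taken as the start.
    """
    start = mrna.find("AUG")
    if start == -1:
        return []
    reading = mrna[start:]
    # Step three nucleotides at a time; drop any incomplete final codon.
    return [reading[i:i + 3] for i in range(0, len(reading) - 2, 3)]


print(codons_from_start("GGAUGCCAUUUGGAUAAGC"))
# ['AUG', 'CCA', 'UUU', 'GGA', 'UAA']
```

The two nucleotides before the AUG are skipped, and the two left over at the end do not form a full codon.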
The ribosome's job is to match each codon on the mRNA with another kind of RNA that brings in the corresponding amino acid. This other kind of RNA is called tRNA, for transfer RNA, and each tRNA carries a specific amino acid. For instance, on the mRNA, CCA codes for proline, because the tRNA with the matching sequence GGU carries proline. The ribosome pairs each codon with a tRNA carrying the corresponding amino acid. As it moves on to the next codon, the tRNA releases its amino acid, which joins a growing chain of amino acids coming out of the ribosome. In this way the protein is finally constructed. This is called translation, because the message carried by the DNA and copied into the mRNA is being translated into a protein.
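The codon-to-tRNA matching follows the same pairing rules as before. In this sketch, the tiny `TRNA_POOL` and the helper `matching_trna` are hypothetical. Anticodons are written base by base opposite the codon, as in the CCA/GGU example above, and each maps to the amino acid its tRNA carries. The sketch also ignores "wobble" pairing at the third codon position, which lets one real tRNA recognize more than one codon.

```python
# RNA pairing rules: A-U and C-G.
RNA_PAIR = {"A": "U", "U": "A", "C": "G", "G": "C"}

# Small, hypothetical tRNA pool: anticodon (aligned with the codon)
# -> amino acid carried. A real cell has many more tRNAs.
TRNA_POOL = {"UAC": "Met", "GGU": "Pro", "AAA": "Phe", "CCU": "Gly"}


def matching_trna(codon: str) -> str:
    """Return the amino acid carried by the tRNA that pairs with `codon`."""
    anticodon = "".join(RNA_PAIR[base] for base in codon)
    return TRNA_POOL[anticodon]


print(matching_trna("CCA"))  # Pro
print(matching_trna("AUG"))  # Met
```

A codon whose anticodon is missing from this small pool raises a `KeyError`, the toy equivalent of no matching tRNA being available.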
Amino acids, the building blocks of proteins, share a common structure, as shown in the diagram. The R at the bottom of the diagram represents the "R group", which varies from one amino acid to another. As the ribosome continues down the mRNA, each new amino acid is added to the carboxyl end of the chain. Two amino acids join when the carboxyl end of the chain gives up a hydroxyl group (OH) and the amino end of the new amino acid gives up a hydrogen (H). Together these form a water molecule (H2O), which is released, leaving a peptide bond between the two amino acids.
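Because one water molecule is released per peptide bond, a chain of n amino acids weighs the sum of the free amino acids minus n - 1 waters. The sketch below uses a hypothetical helper `peptide_mass` to apply that bookkeeping. It covers only two amino acids, using approximate average molar masses (glycine about 75.07 g/mol, alanine about 89.09 g/mol, water about 18.015 g/mol), which are supplied here for illustration and are not taken from the text.

```python
# Approximate average molar masses in g/mol (illustrative values).
FREE_AMINO_ACID_MASS = {"Gly": 75.07, "Ala": 89.09}
WATER_MASS = 18.015


def peptide_mass(residues: list[str]) -> float:
    """Mass of a linear peptide: the free amino acids minus one water
    for each peptide bond (n amino acids are joined by n - 1 bonds)."""
    if not residues:
        return 0.0
    total = sum(FREE_AMINO_ACID_MASS[r] for r in residues)
    return total - (len(residues) - 1) * WATER_MASS


print(round(peptide_mass(["Gly", "Ala"]), 3))  # 146.145
```

Extending the table to all twenty amino acids would only require adding entries; the rule of subtracting one water per bond stays the same.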
The ribosome continues down the transcript adding amino acids until it reaches a "stop" codon. (In the standard genetic code used by humans and most other organisms, UAA, UAG, and UGA are stop codons.) Unlike the start codon, a stop codon does not add an amino acid to the chain; it simply signals the end of the protein.
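The whole translation loop fits in a short sketch: start at AUG, look up each codon, and stop when a stop codon appears. The function `translate` and its `CODON_TABLE` are hypothetical. The table is a small, correct subset of the standard genetic code (the full table has 64 codons), so any codon missing from it raises a `KeyError`.

```python
# Small subset of the standard genetic code; None marks a stop codon.
CODON_TABLE = {
    "AUG": "Met", "CCA": "Pro", "UUU": "Phe", "GGA": "Gly",
    "UAA": None, "UAG": None, "UGA": None,
}


def translate(mrna: str) -> list[str]:
    """Translate from the first AUG until a stop codon (or the end)."""
    start = mrna.find("AUG")
    if start == -1:
        return []
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:  # stop codon: no amino acid, chain ends
            break
        protein.append(amino_acid)
    return protein


print(translate("GGAUGCCAUUUGGAUAAGC"))  # ['Met', 'Pro', 'Phe', 'Gly']
```

As in the text, the start codon contributes an amino acid (methionine) while the stop codon contributes nothing and only ends the chain.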