:: wikimiki.org ::
Genetic Information

Genetic information

A DNA sequence (sometimes genetic sequence) is a succession of letters representing the primary structure of a real or hypothetical DNA molecule or strand. The possible letters are A, C, G, and T, representing the four nucleotide subunits of a DNA strand (adenine, cytosine, guanine, and thymine); they are typically printed abutting one another without gaps, as in the sequence AAAGTCTGAC. This coded sequence is sometimes referred to as genetic information. A succession of any number of nucleotides greater than four is liable to be called a sequence. With regard to its biological function, which may depend on context, a sequence may be sense or anti-sense (see DNA), and either coding or noncoding. DNA sequences can also contain "junk DNA".
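The definition above can be sketched as a small check. This is only an illustration: the function name `is_dna_sequence` and the `min_length` parameter are assumptions made for the example, with the default of five letters taken from the "greater than four" remark.

```python
# The four letters allowed in a DNA sequence.
DNA_LETTERS = frozenset("ACGT")


def is_dna_sequence(text: str, min_length: int = 5) -> bool:
    """Return True if text uses only A, C, G, T and has at least min_length letters."""
    return len(text) >= min_length and set(text) <= DNA_LETTERS


print(is_dna_sequence("AAAGTCTGAC"))  # True
print(is_dna_sequence("AAXGT"))       # False: X is not a nucleotide letter
print(is_dna_sequence("ACG"))         # False: too short to be called a sequence
```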

See also


- DNA
- DNA motif

Primary structure

In biochemistry, the primary structure of an unbranched biopolymer, such as a molecule of DNA, RNA or protein, is the specific nucleotide or peptide sequence from the beginning to the end of the molecule. The primary structure, in other words, identifies a biopolymer's exact chemical composition and the sequence of its monomeric subunits. The primary structure of a biological polymer to a large extent determines the three-dimensional shape known as the tertiary structure, but nucleic acid and protein folding are so complex that knowing the primary structure alone often does not help either to deduce the shape or to predict localized secondary structure, such as the formation of loops or helices. However, knowing the structure of a similar homologous sequence (for example a member of the same protein family) can often indicate the likely tertiary structure of the given sequence. Sequence families are often determined by sequence clustering, and structural genomics projects aim to produce a set of representative structures to cover the sequence space of possible non-redundant sequences.

See also


- sequencing
- DNA sequence
- protein sequencing
- secondary structure
- tertiary structure
- quaternary structure
- translation

Molecule

A molecule is the smallest particle of a pure chemical substance that still retains its chemical composition and properties. The science of molecules is called molecular chemistry or molecular physics, depending on the focus. Molecular chemistry deals with the laws governing the interaction between molecules that results in the formation and breakage of chemical bonds, while molecular physics deals with the laws governing their structure and properties. In practice, however, this distinction is vague. According to the strict definition, molecules can consist of one atom (as in noble gases) or more atoms bonded together. The concept of monatomic (single-atom) molecule is used almost exclusively in the kinetic theory of gases. In molecular sciences, a molecule consists of a stable system (bound state) comprising two or more atoms. The term unstable molecule is used for very reactive species, i.e., short-lived assemblies (resonances) of electrons and nuclei, such as radicals, molecular ions, Rydberg molecules, transition states, Van der Waals complexes, or systems of colliding atoms as in Bose-Einstein condensates. A peculiar use of the term molecular is as a synonym to covalent, which arises from the fact that, unlike molecular covalent compounds, ionic compounds do not yield well-defined smallest particles that would be consistent with the definition above. No typical "smallest particle" can be defined for covalent crystals, or network solids, which are composed of repeating unit cells that extend indefinitely either in a plane (such as in graphite) or three-dimensionally (such as in diamond). 
Although the concept of molecules was first introduced in 1811 by Avogadro and was accepted by many chemists as a result of Dalton's laws of definite and multiple proportions (1803-1808), the existence of molecules as anything other than convenient mathematical constructs remained an open debate in the physics community, with notable exceptions (Boltzmann, Maxwell, Gibbs), until the work of Perrin (1911), and was strenuously resisted by early positivists such as Mach. The modern theory of molecules makes great use of the many numerical techniques offered by computational chemistry. Dozens of molecules have now been identified in interstellar space by microwave spectroscopy.
Figure: three representations of the terpenoid atisane. In the 3D model on the left, carbon atoms are represented by gray spheres; white spheres represent the hydrogen atoms and the cylinders represent the bonds. The model is enveloped in a "mesh" representation of the molecular surface, colored by areas of positive (red) and negative (blue) electric charge. In the 3D model in the center, the light-blue spheres represent carbon atoms, the white spheres are hydrogen atoms, and the cylinders in between the atoms correspond to single bonds.

Chemical bond

:See main article chemical bond

In a molecule, the atoms are joined by shared pairs of electrons in a chemical bond. A molecule may consist of atoms of the same chemical element, as with oxygen (O2), or of different elements, as with water (H2O).

Size

Most molecules are much too small to be seen with the naked eye, but there are exceptions. DNA, a macromolecule, can reach macroscopic sizes. The smallest molecule is the hydrogen molecule, whose interatomic distance is about 0.074 nanometres (0.74 Å); the size of its electron cloud, however, is difficult to define precisely. Under standard conditions molecules have a dimension of a few to a few dozen Å.

Empirical formula

:See main article empirical formula

The empirical formula of a molecule is the simplest integer ratio of the chemical elements that constitute the compound. For example, in their pure forms, water is always composed of a 2:1 ratio of hydrogen to oxygen, and ethyl alcohol or ethanol is always composed of carbon, hydrogen, and oxygen in a 2:6:1 ratio. However, this does not determine the kind of molecule uniquely: dimethyl ether has the same ratio as ethanol, for instance. Molecules with the same atoms in different arrangements are called isomers. The empirical formula is often the same as the molecular formula, but not always. For example, the molecule acetylene has molecular formula C2H2, but the simplest integer ratio of elements is CH.
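Reducing a molecular formula to its empirical formula amounts to dividing every atom count by their greatest common divisor. The sketch below illustrates this; the function name `empirical_formula` and the dictionary representation of a formula are assumptions made for the example.

```python
from functools import reduce
from math import gcd


def empirical_formula(counts: dict[str, int]) -> dict[str, int]:
    """Divide all atom counts by their greatest common divisor."""
    divisor = reduce(gcd, counts.values())
    return {element: n // divisor for element, n in counts.items()}


acetylene = {"C": 2, "H": 2}        # molecular formula C2H2
ethanol = {"C": 2, "H": 6, "O": 1}  # molecular formula C2H6O

print(empirical_formula(acetylene))  # {'C': 1, 'H': 1}
print(empirical_formula(ethanol))    # {'C': 2, 'H': 6, 'O': 1}
```

Dimethyl ether has the same atom counts as ethanol, so this representation cannot tell the two isomers apart, which is exactly the limitation described above.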

Chemical formula

:See main article chemical formula

The chemical formula reflects the exact number of atoms that compose a molecule. The molecular mass can be calculated from the chemical formula and is expressed in conventional units equal to 1/12 of the mass of a carbon-12 (12C) atom. For network solids, the term formula unit is used in stoichiometric calculations.
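As a rough illustration of calculating a molecular mass from a chemical formula, the sketch below sums atomic masses weighted by atom counts. The table `ATOMIC_MASS_U` holds rounded standard atomic weights for only three elements, and both it and the function name are assumptions made for the example.

```python
# Rounded standard atomic weights in unified atomic mass units (u).
# Only the elements needed for the examples are listed.
ATOMIC_MASS_U = {"H": 1.008, "C": 12.011, "O": 15.999}


def molecular_mass(counts: dict[str, int]) -> float:
    """Sum the atomic masses of all atoms in the formula."""
    return sum(ATOMIC_MASS_U[element] * n for element, n in counts.items())


water = {"H": 2, "O": 1}
ethanol = {"C": 2, "H": 6, "O": 1}

print(f"{molecular_mass(water):.3f} u")    # 18.015 u
print(f"{molecular_mass(ethanol):.3f} u")  # 46.069 u
```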

Molecular geometry

:See main article molecular geometry

Molecules have fixed equilibrium geometries (bond lengths and angles). A pure substance is composed of molecules with the same geometrical structure. The chemical formula and the structure of a molecule are the two important factors that determine its properties, particularly its reactivity. Isomers share a chemical formula but normally have very different properties because of their different structures. Stereoisomers, a particular type of isomer, may have very similar physico-chemical properties and at the same time very different biochemical activities.

Molecular spectroscopy

:See main article spectroscopy

Molecular spectroscopy is the study of the response (spectrum) of a molecule to a signal of known energy (or frequency, according to Planck's formula). This signal is usually an electromagnetic wave or a beam of electrons, but new molecular spectroscopies, such as positron spectroscopy, are under development. The molecular response can be signal absorption (absorption spectroscopy), emission of another signal (emission spectroscopy), fragmentation, or a change in its chemical nature. Spectroscopy is recognized as the most powerful tool in the investigation of the microscopic properties of molecules, and, in particular, their energy levels. To extract the maximum microscopic information from experimental results, spectroscopic studies are very often coupled with computational chemical investigations. The theoretical background of spectroscopy is scattering theory.

See also


- Covalent bond
- Diatomic molecule
- Molecular geometry
- Molecular orbital
- Nonpolar molecule
- Polar molecule

Related lists


- For a list of molecules see the List of compounds
- List of molecules in interstellar space

Nucleotide

A nucleotide is a monomer or the structural unit of nucleotide chains forming such nucleic acids as RNA and DNA. A nucleotide consists of a heterocyclic nucleobase, a pentose sugar (ribose or deoxyribose), and a phosphate or polyphosphate group. Nucleotides also play important roles in cellular energy transport and transformations (notably ATP and NAD+/NADH), and in enzyme regulation (see, for example, protein kinase). The nucleobase can be a purine or a pyrimidine, the sugar can be deoxyribose in DNA or ribose in RNA, and the phosphate chain can be a monophosphate, diphosphate, or triphosphate. A nucleotide that lacks the phosphate group is called a nucleoside.

Nomenclature

Nucleotide names are abbreviated into standard four-letter codes. The first letter is lower case and indicates whether the nucleotide in question is a ribonucleotide (r) or deoxyribonucleotide (d). The second letter indicates the nucleoside corresponding to the nucleobase:

- G: Guanine
- A: Adenine
- T: Thymine
- C: Cytosine
- U: Uracil (not usually present in DNA, but takes the place of thymine in RNA)

The third and fourth letters indicate the length of the attached phosphate chain (Mono-, Di-, Tri-) and the presence of a phosphate (P). For example, deoxy-cytidine-triphosphate is abbreviated as dCTP.
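The naming scheme described here can be sketched as a small decoder. It handles only the strict four-letter form given above; the function name `parse_nucleotide_code` and the shape of the returned dictionary are assumptions made for the example.

```python
SUGARS = {"r": "ribonucleotide", "d": "deoxyribonucleotide"}
BASES = {"G": "guanine", "A": "adenine", "T": "thymine", "C": "cytosine", "U": "uracil"}
PHOSPHATE_COUNTS = {"M": 1, "D": 2, "T": 3}  # Mono-, Di-, Tri-


def parse_nucleotide_code(code: str) -> dict:
    """Decode a four-letter nucleotide abbreviation such as 'dCTP'."""
    if len(code) != 4 or code[3] != "P":
        raise ValueError(f"expected a four-letter code ending in P, got {code!r}")
    sugar, base, length = code[0], code[1], code[2]
    if sugar not in SUGARS or base not in BASES or length not in PHOSPHATE_COUNTS:
        raise ValueError(f"unrecognized nucleotide code: {code!r}")
    return {
        "kind": SUGARS[sugar],
        "base": BASES[base],
        "phosphates": PHOSPHATE_COUNTS[length],
    }


print(parse_nucleotide_code("dCTP"))
# {'kind': 'deoxyribonucleotide', 'base': 'cytosine', 'phosphates': 3}
```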

Chemical structures

Nucleotides

Deoxynucleotides

Synthesis

Natural

Purine ribonucleotides

By using a variety of isotopically labeled compounds it was demonstrated that N1 of purines arises from the amine group of Asp; C2 and C8 originate from formate; N3 and N9 are contributed by the amide group of Gln; C4, C5 and N7 are derived from Gly; and C6 comes from HCO3- (CO2). The de novo synthesis of purine nucleotides, by which these precursors are incorporated into the purine ring, proceeds by a 10-step pathway to the branch-point intermediate IMP, the nucleotide of the base hypoxanthine. AMP and GMP are subsequently synthesized from this intermediate via separate two-step pathways. Thus purine moieties are initially formed as part of the ribonucleotides rather than as free bases. Six enzymes take part in IMP synthesis. Three of them are multifunctional: GART (reactions 2, 3, and 5), PAICS (reactions 6 and 7) and ATIC (reactions 9 and 10).

Reaction 1. The pathway starts with the formation of PRPP. PRPS1 is the enzyme that activates R5P, which is primarily formed by the pentose phosphate pathway, to PRPP by reacting it with ATP. The reaction is unusual in that a pyrophosphoryl group is directly transferred from ATP to C1 of R5P and that the product has the α configuration about C1. This reaction is also shared with the pathways for the synthesis of the pyrimidine nucleotides, Trp, and His. Because it sits at such a major metabolic crossroads and consumes energy, this reaction is highly regulated.

Reaction 2. In the first reaction unique to purine nucleotide biosynthesis, PPAT catalyzes the displacement of PRPP's pyrophosphate group (PPi) by Gln's amide nitrogen. The reaction occurs with inversion of configuration about ribose C1, thereby forming β-5-phosphoribosylamine (5-PRA) and establishing the anomeric form of the future nucleotide. This reaction, which is driven to completion by the subsequent hydrolysis of the released PPi, is the pathway's flux-generating step and is therefore also regulated.

Reaction 3.

Pyrimidine ribonucleotides


See also


- Gene
- Genetics
- Chromosome

External links


- [http://www.chem.qmul.ac.uk/iupac/misc/naabb.html Abbreviations and Symbols for Nucleic Acids, Polynucleotides and their Constituents] (IUPAC)
- [http://www.iupac.org/reports/provisional/abstract04/BB-prs310305/Chapter10.pdf Provisional Recommendations 2004] (IUPAC)

Cytosine

Cytosine is one of the five main nucleobases used in storing and transporting genetic information within a cell in the nucleic acids DNA and RNA. It is a pyrimidine derivative, with a heterocyclic aromatic ring and two substituents attached (an amine group at position 4 and a keto group at position 2). The nucleoside of cytosine is cytidine. In Watson-Crick base pairing, it forms three hydrogen bonds with guanine. Cytosine was first discovered in 1894 when it was isolated from calf thymus tissues. A structure was proposed in 1903, and the compound was synthesized (and the structure thus confirmed) in the laboratory in the same year. Cytosine has also found use in quantum computation. On August 1, 1998, researchers at Oxford implemented David Deutsch's algorithm on a two-qubit NMRQC (Nuclear Magnetic Resonance Quantum Computer) based on the cytosine molecule, the first time quantum mechanical properties were harnessed to process information. Cytosine can be found as part of DNA, RNA, or as a part of a nucleotide. As cytidine triphosphate (CTP), it can act as a co-factor to enzymes, and can transfer a phosphate to convert adenosine diphosphate (ADP) to adenosine triphosphate (ATP). In DNA and RNA, cytosine is paired with guanine. However, it is inherently unstable, and can change into uracil (spontaneous deamination). This can lead to a point mutation if not repaired by the DNA repair enzymes. Cytosine can also be methylated into 5-methylcytosine by an enzyme called DNA methyltransferase.

External links


- 4-amino-3H-pyrimidin-2-one
- 4-aminopyrimidin-2-ol
- [http://www.compchemwiki.org/index.php?title=Cytosine Computational Chemistry Wiki]
- [http://www.pnas.org/cgi/content/full/96/8/4396 Prebiotic cytosine synthesis: A critical analysis and implications for the origin of life]

Guanine

Guanine is one of the five main nucleobases found in nucleic acids (e.g., DNA and RNA). Guanine is a purine derivative, and in Watson-Crick base pairing forms three hydrogen bonds with cytosine. Guanine "stacks" vertically with the other nucleobases via aromatic interactions. Guanine exists in tautomeric forms (see keto-enol tautomerism). The guanine nucleoside is called guanosine. Guanine is also the name of a white amorphous substance found in the scales of certain fishes, the guano of sea-birds, and the liver and pancreas of mammals. In fact, the name of the nucleobase is derived from the term 'guano', because it was first isolated from bird manure. In the cosmetics industry, crystalline guanine is used as an additive to various products (e.g., shampoos), where it provides a pearly iridescent effect. It also gives a shimmering lustre to eye shadow and nail polish. It may irritate the eyes. Its alternatives are synthetic pearl, and aluminium and bronze particles.

External links


- [http://www.compchemwiki.org/index.php?title=Guanine Computational Chemistry Wiki]

Information

:"Info" redirects here; for other uses, see .info and NFO

Information is a word which has many different meanings in everyday usage and in specialized contexts, but as a rule, the concept is closely related to others such as data, instruction, knowledge, meaning, communication, representation, and mental stimulus. Many people speak of the advent of the information age, the information society, and information technologies, and even though information science and computer science are often in the spotlight, the word "information" is often used without careful consideration of the various meanings it has acquired.

Information as a message

Information is a message, something to be communicated from the sender to the receiver. If information is viewed merely as a message, it does not have to be accurate. It may be a truth or a lie, or just a sound of a kiss. Strangely it may even be a disruptive noise used to inhibit the flow of communication and create misunderstanding. This model assumes a sender and a receiver, and does not attach any significance to the idea that information is something that can be extracted from an environment, e.g., through observation or measurement. Information in this sense is simply any message the sender chooses to create.

Measuring information

The view of information as a message came into prominence with the publication in 1948 of an influential paper by Claude Shannon, "A Mathematical Theory of Communication." This paper provides the foundations of information theory and endows the word information not only with a technical meaning but also a measure. If the sending device is equally likely to send any one of a set of N messages, then the preferred measure of "the information produced when one message is chosen from the set" is the base two logarithm of N (this measure is called self-information). In this paper, Shannon continues:

:The choice of a logarithmic base corresponds to the choice of a unit for measuring information. If the base 2 is used the resulting units may be called binary digits, or more briefly bits, a word suggested by J. W. Tukey. A device with two stable positions, such as a relay or a flip-flop circuit, can store one bit of information. N such devices can store N bits ... [The Bell System Technical Journal, Vol. 27, p. 379, (July 1948).]

A complementary way of measuring information is provided by algorithmic information theory. In brief, this measures the information content of a list of symbols based on how predictable they are, or more specifically how easy it is to generate the list. The sequence below would have a very low algorithmic information measurement since it is a very predictable pattern, and as the pattern continues the measurement would not change. Shannon information would give the same information measurement for each symbol, since they are statistically random, and each new symbol would increase the measurement.

: 123456789101112131415161718192021

Also see: lexicographic information cost
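Shannon's measure for equally likely messages is simply the base-two logarithm of N. A minimal sketch follows; the function name `self_information_bits` is an assumption made for the example.

```python
import math


def self_information_bits(n_messages: int) -> float:
    """Information, in bits, produced by choosing one of n equally likely messages."""
    if n_messages < 1:
        raise ValueError("the set must contain at least one message")
    return math.log2(n_messages)


print(self_information_bits(2))  # 1.0: one two-state device stores one bit
print(self_information_bits(8))  # 3.0

# N two-state devices have 2**N joint states, so together they store N bits.
n_devices = 10
print(self_information_bits(2 ** n_devices))  # 10.0
```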

Information as a pattern

Information is any represented pattern. This view assumes neither accuracy nor directly communicating parties, but instead assumes a separation between an object and its representation, as well as the involvement of someone capable of understanding this relationship. This view seems therefore to require a conscious mind. Consider the following example: economic statistics represent an economy, however inaccurately. What are commonly referred to as data in computing, statistics, and other fields are forms of information in this sense. The electromagnetic patterns in a computer network and connected devices are related to something other than the pattern itself, such as text to be displayed and keyboard input. Signals, signs, and symbols are also in this category. On the other hand, according to semiotics, data are symbols with a certain syntax, and information is data with certain semantics. Paintings and drawings contain information to the extent that they represent something such as an assortment of objects on a table, a profile, or a landscape. In other words, when a pattern of something is transposed to a pattern of something else, the latter is information. This type of information still assumes some involvement of a conscious mind, of either the entity constructing the representation or the entity interpreting it. When one constructs a representation of an object, one can selectively extract from the object (sampling) or use a system of signs to replace it (encoding), or both. Sampling and encoding result in a representation. An example of the former is a "sample" of a product; an example of the latter is a "verbal description" of a product. Both contain information about the product, however inaccurate. When one interprets a representation, one can predict a broader pattern from a limited number of observations (inference) or understand the relation between patterns of two different things (decoding).
One example of the former is to sip a soup to know if it is spoiled; an example of the latter is examining footprints to determine the animal and its condition. In both cases, information sources are not constructed or presented by some "sender" of information. To repeat, information in this sense does not assume direct communication, but it assumes involvement of some conscious mind. Regardless, information is dependent upon, but usually unrelated to and separate from, the medium or media used to express it. In other words, the position of a theoretical series of bits, or even the output once interpreted by a computer or similar device, is unimportant, except when someone or something is present to interpret the information. Therefore, a quantity of information is totally distinct from its medium.

Information as sensory input

Often information is viewed as a type of input to an organism or designed device. Inputs are of two kinds. Some inputs are important to the function of the organism (for example, food) or device (energy) by themselves. In his book Sensory Ecology, Dusenbery called these causal inputs. Other inputs (information) are important only because they are associated with causal inputs and can be used to predict the occurrence of a causal input at a later time (and perhaps another place). Some information is important because of association with other information but eventually there must be a connection to a causal input. In practice, information is usually carried by weak stimuli that must be detected by specialized sensory systems and amplified by energy inputs before they can be functional to the organism or device. For example, light is often a causal input to plants but provides information to animals. The colored light reflected from a flower is too weak to do much photosynthetic work but the visual system of the bee detects it and the bee's nervous system uses the information to guide the bee to the flower, where the bee often finds nectar or pollen, which are causal inputs, serving a nutritional function. Information is any type of sensory input. When an organism with a nervous system receives an input, it transforms the input into an electrical signal. This is regarded as information by some. The idea of representation is still relevant, but in a slightly different manner. That is, while an abstract painting does not represent anything concretely, when the viewer sees the painting, it is nevertheless transformed into electrical signals that create a representation of the painting. Defined this way, information does not have to be related to truth, communication, or representation of an object. Entertainment in general is not intended to be informative.
Music, the performing arts, amusement parks, works of fiction and so on are thus forms of information in this sense, but they are not forms of information according to the previous definitions above. Consider another example: food supplies both nutrition and taste for those who eat it. If information is equated to sensory input, then nutrition is not information but taste is.

Information as an influence which leads to a transformation

Information is any type of pattern that influences the formation or transformation of other patterns. In this sense, there is no need for a conscious mind to perceive, much less appreciate, the pattern. Consider, for example, DNA. The sequence of nucleotides is a pattern that influences the formation and development of an organism without any need for a conscious mind. Systems theory at times seems to refer to information in this sense, assuming information does not necessarily involve any conscious mind, and patterns circulating (due to feedback) in the system can be called information. In other words, it can be said that information in this sense is something potentially perceived as representation, though not created or presented for that purpose. When Marshall McLuhan speaks of media and their effects on human cultures, he refers to the structure of artifacts that in turn shape our behaviors and mindsets. Also, pheromones are often said to be "information" in this sense. In 2003, J. D. Bekenstein claimed there is a growing trend in physics to define the physical world as being made of information itself (and thus information is defined in this way). See the section below on information as a property in physics. (Also see Gregory Bateson.)

Information as a property in physics

Main article: Physical information

Information has a well-defined meaning in physics. Examples of this include the phenomenon of quantum entanglement, where particles can interact without reference to their separation or the speed of light. Information itself cannot travel faster than light even if the information is transmitted indirectly. This could imply that all attempts at physically observing a particle with an "entangled" relationship to another are slowed down, even though the particles are not connected in any way other than by the information they carry. Another link is demonstrated by the Maxwell's demon thought experiment. In this experiment, a direct relationship between information and another physical property, entropy, is demonstrated. A consequence is that it is impossible to destroy information without increasing the entropy of a system; in practical terms this often means generating heat. Thus, in the study of logic gates, the theoretical lower bound of thermal energy released by an AND gate is more than for the NOT gate (because information is destroyed in an AND gate and simply converted in a NOT gate). Physical information is of particular importance in the theory of quantum computers.
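The claim that an AND gate destroys information while a NOT gate merely converts it can be illustrated by counting bits. The sketch below assumes uniformly random, independent input bits and compares the input entropy with the Shannon entropy of the output; the function names are assumptions made for the example, and no physical energy values are computed.

```python
import math
from collections import Counter
from itertools import product


def output_entropy_bits(gate, n_inputs: int) -> float:
    """Shannon entropy of the gate output for uniformly random input bits."""
    outputs = Counter(gate(*bits) for bits in product((0, 1), repeat=n_inputs))
    total = sum(outputs.values())
    return -sum((c / total) * math.log2(c / total) for c in outputs.values())


def information_lost_bits(gate, n_inputs: int) -> float:
    """Input entropy (n_inputs bits) minus the entropy remaining in the output."""
    return n_inputs - output_entropy_bits(gate, n_inputs)


def and_gate(a: int, b: int) -> int:
    return a & b


def not_gate(a: int) -> int:
    return 1 - a


print(f"AND loses {information_lost_bits(and_gate, 2):.3f} bits")  # about 1.189
print(f"NOT loses {information_lost_bits(not_gate, 1):.3f} bits")  # 0.000
```

The NOT gate maps each input to a distinct output, so its input can always be recovered; three of the four AND inputs collapse to the same output, which is where the loss comes from.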

Etymology

According to the Oxford English Dictionary, the earliest historical meaning of the word information in English was the act of informing, or giving form or shape to the mind, as in education, instruction, or training. A quote from 1387: "Five books come down from heaven for information of mankind." It was also used for an item of training, e.g. a particular instruction. "Melibee had heard the great skills and reasons of Dame Prudence, and her wise informations and techniques." (1386) The English word was apparently derived by adding the common "noun of action" ending "-ation" (descended through French from Latin "-tio") to the earlier verb to inform, in the sense of to give form to the mind, to discipline, instruct, teach: "Men so wise should go and inform their kings." (1330) Inform itself comes (via French) from the Latin verb informare, to give form to, to form an idea of. Furthermore, Latin already contained the word informatio, meaning concept or idea, but the extent to which this may have influenced the development of the word information in English is unclear. As a final note, the ancient Greek word for form was eidos, and this word was famously used in a technical philosophical sense by Plato (and later Aristotle) to denote the ideal identity or essence of something (see The Forms).

References


- Bekenstein, Jacob D. (2003, August). Information in the holographic universe. Scientific American. Retrieved from http://www.referencenter.com

See also


- Algorithmic information theory
- Classified information
- Fisher information
- Freedom of information
- Information entropy
- Propaganda model
- Free Information Infrastructure
- Information theory
- Information overload
- Information processing
- Information processor
- Information mapping
- Information technology
- Library and Information Science
- Medium
- Observation
- Physical information
- Prediction
- Receiver operating characteristic
- Systems theory and cybernetics
- Satisficing
- The Information highway - A nickname of the Internet, dubbed the greatest source of information.

External links


- [http://plato.stanford.edu/entries/information-semantic/ Semantic Conceptions of Information] Review by Luciano Floridi for the Stanford Encyclopedia of Philosophy
- [http://pespmc1.vub.ac.be/ASC/NEGENTROPY.html Principia Cybernetica entry on negentropy]
- [http://www.princeton.edu/~pear/IU.pdf Information & Uncertainty in Remote Perception Research]
- [http://www.princeton.edu/~pear/JahnATpages.pdf Information, Consciousness & Health]

DNA

:For other uses, see DNA (disambiguation).

Deoxyribonucleic acid (DNA) is a nucleic acid that contains the genetic instructions specifying the biological development of all cellular forms of life (and most viruses). DNA is a long polymer of nucleotides and encodes the sequence of the amino acid residues in proteins using the genetic code, a triplet code of nucleotides. In complex cells (eukaryotes), such as those from plants, animals, fungi and protists, most of the DNA is located in the cell nucleus. By contrast, in simpler cells called prokaryotes (the eubacteria and archaea), DNA is not separated from the cytoplasm by a nuclear envelope. The cellular organelles known as chloroplasts and mitochondria also carry DNA. DNA is often referred to as the molecule of heredity as it is responsible for the genetic propagation of most inherited traits. These traits can range from hair colour to disease susceptibility. During cell division, DNA is replicated and can be transmitted to offspring during reproduction. Lineage studies can be done based on the facts that the DNA in mitochondria (mitochondrial DNA) comes only from the mother, and the male "Y" chromosome comes only from the father. Every person's DNA, their genome, is inherited from both parents. The mother's mitochondrial DNA together with twenty-three chromosomes from each parent combine to form the genome of a fertilized egg. As a result, with certain exceptions such as red blood cells, most human cells contain 23 pairs of chromosomes, together with mitochondrial DNA inherited from the mother.

DNA Overview

This section presents an introductory and therefore incomplete overview of DNA.
- Genes can be loosely viewed as the organism's "cookbook" or "blueprint";
- A strand of DNA contains genes, areas that regulate genes, and areas that either have no function, or a function we do not (yet) know (also see last bullet point in this section for the difference between DNA and RNA);
- DNA is organized as two complementary strands, head-to-toe, with bonds between them that can be "unzipped" like a zipper, separating the strands;
- DNA is a chain of chemical "building blocks", called "bases", of which there are four types: these can be abbreviated A, T, C, and G. Each base can only "pair up" with one single predetermined other base: A+T, T+A, C+G and G+C are the only possible combinations; that is, an "A" on one strand of double-stranded DNA will "mate" properly only with a "T" on the other, complementary strand;
  - N.B.: U occasionally replaces T, notably in PBS1 phage DNA; you can thus substitute "U" for "T" throughout this section.
- Because each strand of DNA has a directionality, the sequence order does matter: A+T is not the same as T+A, just as C+G is not the same as G+C;
- For each given base, there is just one possible complementary base, so naming the bases on the conventionally chosen side of the strand is enough to describe the entire double-strand sequence;
- The genetic information contained in a strand of DNA is determined by the sequence of bases along its length;
- The cell begins DNA replication by forcibly unzipping the DNA double strand down the middle, and then recreates the "other half" of each new single strand by drowning each half in a "soup" made of the four bases. An enzyme makes a new strand by finding the correct "base" in the soup and pairing it with the original strand. In this way, the base on the old strand dictates which base will be on the new strand, and the cell ends up with an extra copy of its DNA.
- Mutations are simply chemical imperfections in this process: a base is accidentally skipped, inserted, or incorrectly copied, or the chain is trimmed, or added to; many basic mutations can be described as combinations of these accidental "operations". Mutations can also occur through chemical damage (through mutagens), light (UV damage), or through other more complicated gene swapping events.
- DNA (for DeoxyriboNucleic Acid) differs from RNA (for RiboNucleic Acid) by having the sugar 2-deoxyribose instead of ribose in its backbone (ribose contains one extra oxygen atom compared to deoxyribose -- in other words, DNA contains deoxygenated ribose, whereas RNA contains "plain" ribose.) This is the basic chemical distinction between RNA and DNA.
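The pairing rules in the list above (A with T, C with G, and the two strands running in opposite directions) can be illustrated with a short sketch. The function names `complement` and `reverse_complement` are chosen here for illustration and are not taken from any particular library; the example sequence is the one used earlier in this article.

```python
# Watson-Crick pairing as described in the overview: A<->T and C<->G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base that pairs with each base, position by position."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand written in its own reading direction.

    The two strands run in opposite directions, so the partner is the
    complement read backwards.
    """
    return complement(strand)[::-1]


if __name__ == "__main__":
    seq = "AAAGTCTGAC"
    print(complement(seq))          # TTTCAGACTG
    print(reverse_complement(seq))  # GTCAGACTTT
```

Because each base has exactly one partner, applying `reverse_complement` twice returns the original sequence, which is why naming the bases on one strand is enough to describe the whole double strand.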

DNA in practice

DNA in crime

Forensic scientists can use DNA located in blood, semen, skin, saliva, or hair left at the scene of a crime to identify a possible suspect, a process called genetic fingerprinting or DNA profiling. In DNA profiling the relative lengths of sections of repetitive DNA, such as short tandem repeats and minisatellites, are compared. DNA profiling was developed in 1984 by English geneticist Alec Jeffreys, and was first used in 1986 in the Enderby murders case in Leicestershire, England. Many jurisdictions require convicts of certain types of crimes to provide a sample of DNA for inclusion in a computerized database. This has helped investigators solve old cases where the perpetrator was unknown and only a DNA sample was obtained from the scene (particularly in rape cases between strangers). This method is one of the most reliable techniques for identifying a criminal, but is not always perfect, for example if no DNA can be retrieved, or if the scene is contaminated with the DNA of several possible suspects.

DNA in computation

Despite its biological origins, DNA plays an important role in computer science, both as a motivating research problem and as a method of computation in itself, called DNA computing. As a simple example, research on string searching algorithms, which find an occurrence of a sequence of letters inside a larger sequence of letters, was motivated by DNA research, where such algorithms are used to find specific sequences of nucleotides in a large sequence. In other applications, such as text editors, even simple algorithms for this problem usually suffice, but DNA sequences cause these algorithms to exhibit near-worst-case behavior because of their small number of distinct characters. Databases have also been strongly motivated by DNA research, which requires special tools for storing and manipulating DNA sequences. Databases specialized for this purpose are called genomic databases, and they pose a number of distinctive technical challenges associated with approximate matching, sequence comparison, finding repeating patterns, and homology searching.

In 1994, Leonard Adleman of the University of Southern California made headlines when he discovered a way of solving the directed Hamiltonian path problem, an NP-complete problem, using tools from molecular biology, in particular DNA. The new approach, dubbed DNA computing, has potential advantages over traditional computers in power use, space use, and efficiency, due to its ability to highly parallelize the computation (see parallel computing), although considerable laboratory labor is involved in retrieving the answers computed by these techniques. A number of other problems, including simulation of various abstract machines, the boolean satisfiability problem, and the bounded version of the Post correspondence problem, have since been analyzed using DNA computing.
Due to its compactness, DNA also has an important role in cryptography, where in particular it allows unbreakable one-time pads to be efficiently constructed and used.[http://citeseer.ist.psu.edu/gehani99dnabased.html]
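The string searching problem mentioned in this section can be sketched in a few lines. This is only a minimal illustration built on Python's built-in `str.find`; the function name `find_all` and the example sequence are made up for this sketch, and real genomic tools use far more specialized indexes and algorithms.

```python
def find_all(genome: str, motif: str) -> list[int]:
    """Return every 0-based position where motif occurs in genome.

    Overlapping occurrences are reported, which matters for DNA because
    the four-letter alphabet makes repeats and overlaps common.
    """
    positions = []
    start = genome.find(motif)
    while start != -1:
        positions.append(start)
        # Resume one position later so overlapping hits are not skipped.
        start = genome.find(motif, start + 1)
    return positions


if __name__ == "__main__":
    print(find_all("GATATATGCATATACTT", "ATAT"))  # [1, 3, 9]
```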

Overview of molecular structure

Although sometimes called "the molecule of heredity", pieces of DNA as people typically think of them are not single molecules. Rather, they are pairs of molecules, which entwine like vines to form a double helix. Each vine-like molecule is a strand of DNA: a chemically linked chain of nucleotides, each of which consists of a sugar, a phosphate and one of five kinds of nucleobases ("bases"). Because DNA strands are composed of these nucleotide subunits, they are polymers. The diversity of the bases means that there are five kinds of nucleotides, which are commonly referred to by the identity of their bases. These are adenine (A), thymine (T), uracil (U), cytosine (C), and guanine (G). U is rarely found in DNA except as a result of chemical degradation of C, but in some viruses, notably PBS1 phage, U completely replaces the usual T in the DNA. Similarly, RNA usually contains U in place of T, but in certain RNAs such as transfer RNA, T is always found in some positions. Thus, the only true difference between DNA and RNA is the sugar: 2-deoxyribose in DNA and ribose in RNA.

In a DNA double helix, two polynucleotide strands can associate through the hydrophobic effect and pi stacking. Specificity of which strands stay associated is determined by complementary pairing. Each base forms hydrogen bonds readily to only one other -- A to T and C to G -- so that the sequence of bases on one strand dictates the strength of its association with another strand; the more complementary bases exist, the stronger and longer-lasting the association. The cell's machinery is capable of melting or disassociating a DNA double helix, and using each DNA strand as a template for synthesizing a new complementary strand, which is nearly identical to the template's previous partner. Errors that occur in the synthesis are known as mutations. The technique known as PCR (polymerase chain reaction) mimics this process in vitro, in a nonliving system.
Because pairing causes the nucleotide bases to face the helical axis, the sugar and phosphate groups of the nucleotides run along the outside; the two chains they form are sometimes called the "backbones" of the helix. In fact, it is chemical bonds between the phosphates and the sugars that link one nucleotide to the next in the DNA strand.

The role of the sequence

Within a gene, the sequence of nucleotides along a DNA strand defines a messenger RNA sequence, which in turn defines a protein that an organism is liable to manufacture or "express" at one or several points in its life. The relationship between the nucleotide sequence and the amino-acid sequence of the protein is determined by simple cellular rules of translation, known collectively as the genetic code. The genetic code is made up of three-letter 'words' (termed codons), each formed from a sequence of three nucleotides (e.g. ACT, CAG, TTT). The codons are transcribed into messenger RNA and then read by transfer RNA, with each codon corresponding to a particular amino acid. There are 64 possible codons (4 bases in each of 3 positions, 4^3 = 64) that encode 20 amino acids. Most amino acids, therefore, have more than one possible codon. There are also three 'stop' or 'nonsense' codons signifying the end of the coding region, namely the UAA, UGA and UAG codons.

In many species, only a small fraction of the total sequence of the genome appears to encode protein. For example, only about 1.5% of the human genome consists of protein-coding exons. The function of the rest is a matter of speculation. It is known that certain nucleotide sequences specify affinity for DNA binding proteins, which play a wide variety of vital roles, in particular through control of replication and transcription. These sequences are frequently called regulatory sequences, and researchers assume that so far they have identified only a tiny fraction of the total that exist. "Junk DNA" represents sequences that do not yet appear to contain genes or to have a function. The reasons for the presence of so much non-coding DNA in eukaryotic genomes and the extraordinary differences in genome size ("C-value") among species represent a long-standing puzzle in DNA research known as the "C-value enigma".

Some DNA sequences play structural roles in chromosomes. Telomeres and centromeres typically contain few (if any) protein-coding genes, but are important for the function and stability of chromosomes. Some genes, called "RNA genes", code for RNA products rather than proteins (see tRNA and rRNA). Some RNA genes code for transcripts that function as regulatory RNAs (see siRNA) that influence the function of other RNA molecules. The intron-exon structure of some genes (such as immunoglobulin and protocadherin genes) is important for allowing alternative splicing of pre-mRNA, which allows several different proteins to be made from the same gene. Some non-coding DNA represents pseudogenes that can be used as raw material for the creation of new genes with new functions. Some non-coding DNA provides hot-spots for duplication of short DNA regions; such sequence duplication has been the major form of genetic change in the human lineage (see evidence from the Chimpanzee Genome Project). Exons interspersed with introns allow for "exon shuffling" and the creation of modified genes that might have new adaptive functions. Large amounts of non-coding DNA are probably adaptive in that they provide chromosomal regions where recombination between homologous portions of chromosomes can take place without disrupting the function of genes. Some biologists, such as Stuart Kauffman, have speculated that there must be mechanisms by which the rate of evolution of a species can be increased or decreased. Because non-coding DNA provides mechanisms for gene creation, modification and recombination, it is probably important for control of the rate of human evolution.

Sequence also determines a DNA segment's susceptibility to cleavage by restriction enzymes, the quintessential tools of genetic engineering. The position of cleavage sites throughout an individual's genome determines one kind of an individual's "DNA fingerprint".
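The codon arithmetic described earlier in this section (4 bases in each of 3 positions giving 64 codons, three of which are stop codons) can be checked with a small sketch. The helper names `split_codons` and `read_until_stop` are illustrative only, the stop codons are written in their DNA spelling (T in place of U), and no amino-acid table is included here.

```python
from itertools import product

BASES = "ACGT"

# Every three-letter word over the four bases: 4 ** 3 = 64 codons.
CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# DNA spelling of the stop codons UAA, UGA and UAG named in the text.
STOP_CODONS = {"TAA", "TGA", "TAG"}


def split_codons(coding: str) -> list[str]:
    """Cut a coding sequence into non-overlapping triplets.

    Any trailing one or two bases that do not fill a codon are ignored.
    """
    usable = len(coding) - len(coding) % 3
    return [coding[i:i + 3] for i in range(0, usable, 3)]


def read_until_stop(coding: str) -> list[str]:
    """Return the codons that precede the first stop codon."""
    codons = []
    for codon in split_codons(coding):
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return codons


if __name__ == "__main__":
    print(len(CODONS))                          # 64
    print(len(CODONS) - len(STOP_CODONS))       # 61 codons left for amino acids
    print(read_until_stop("ACTCAGTTTTAAGGG"))   # ['ACT', 'CAG', 'TTT']
```

With 61 non-stop codons available for 20 amino acids, most amino acids must be encoded by more than one codon, as the text states.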

DNA replication

Main article: DNA replication.

DNA replication or DNA synthesis is the process of copying the double-stranded DNA prior to cell division. The two resulting double strands are generally almost perfectly identical, although occasional errors in replication can result in a less than perfect copy (see mutation). Each of them consists of one original and one newly synthesized strand; this is called semiconservative replication. The process of replication consists of three steps: initiation, elongation and termination.

Mechanical properties relevant to biology

Main article: Mechanical properties of DNA.

Strands association and dissociation

The hydrogen bonds between the strands of the double helix are weak enough that the strands can be easily separated by enzymes. Enzymes known as helicases unwind the strands to facilitate the advance of sequence-reading enzymes such as DNA polymerase. The unwinding requires that the phosphate backbone of one of the strands be transiently cleaved, a task carried out by topoisomerases rather than by the helicases themselves, so that the strand can swivel around the other. The strands can also be separated by gentle heating, as used in PCR, provided they have fewer than about 10,000 base pairs (10 kilobase pairs, or 10 kbp). The intertwining of the DNA strands makes long segments difficult to separate.

Circular DNA

When the ends of a piece of double-helical DNA are joined so that it forms a circle, as in plasmid DNA, the strands are topologically knotted. This means they cannot be separated by gentle heating or by any process that does not involve breaking a strand. The task of unknotting topologically linked strands of DNA falls to enzymes known as topoisomerases. Some of these enzymes unknot circular DNA by cleaving both strands so that another double-stranded segment can pass through. Unknotting is required for the replication of circular DNA as well as for various types of recombination in linear DNA.

Great length versus tiny breadth

The narrow breadth of the double helix makes it impossible to detect by conventional electron microscopy, except by heavy staining. At the same time, the DNA found in many cells can be macroscopic in length -- approximately 5 centimetres long for strands in a human chromosome. Consequently, cells must compact or "package" DNA to carry it within them. This is one of the functions of the chromosomes, which contain spool-like proteins known as histones, around which DNA winds.

Entropic stretching behavior

When DNA is in solution, it undergoes conformational fluctuations due to the energy available in the thermal bath. For entropic reasons, more floppy states are thermally accessible than stretched out states; for this reason, a single molecule of DNA stretches similarly to a rubber band. Using optical tweezers, the entropic stretching behavior of DNA has been studied and analyzed from a polymer physics perspective, and it has been found that DNA behaves like the Kratky-Porod worm-like chain model with a persistence length of about 53 nm. Furthermore, DNA undergoes a stretching phase transition at a force of 65 pN; above this force, DNA is thought to take the form that Linus Pauling originally hypothesized, with the phosphates in the middle and bases splayed outward. This proposed structure for overstretched DNA has been called "P-form DNA," in honor of Pauling.
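To give a feel for the worm-like chain behavior described above, the sketch below evaluates the widely used Marko-Siggia interpolation formula for the force needed to hold a worm-like chain at a given fractional extension. The choice of this interpolation formula, the room-temperature value of the thermal energy (about 4.11 pN nm) and the function name `wlc_force_pn` are assumptions of this example; only the 53 nm persistence length comes from the text. The formula applies only to the entropic regime, well below the 65 pN overstretching transition.

```python
K_B_T_PN_NM = 4.11      # thermal energy near 298 K, in pN*nm (assumed)
PERSISTENCE_NM = 53.0   # persistence length quoted in the text


def wlc_force_pn(relative_extension: float,
                 persistence_nm: float = PERSISTENCE_NM,
                 k_b_t: float = K_B_T_PN_NM) -> float:
    """Force (pN) to hold a worm-like chain at extension x / L.

    Marko-Siggia interpolation:
        F = (kT / P) * (1 / (4 * (1 - x/L)**2) - 1/4 + x/L)
    where P is the persistence length and L the contour length.
    """
    if not 0.0 <= relative_extension < 1.0:
        raise ValueError("relative extension must be in [0, 1)")
    x = relative_extension
    return (k_b_t / persistence_nm) * (0.25 / (1.0 - x) ** 2 - 0.25 + x)


if __name__ == "__main__":
    for x in (0.1, 0.5, 0.9, 0.98):
        print(f"x/L = {x:4.2f}  ->  F = {wlc_force_pn(x):7.3f} pN")
```

The force stays far below one piconewton until the molecule is stretched to most of its contour length and then rises steeply, which is the rubber-band-like behavior the paragraph describes.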

Different helix geometries

The DNA helix can assume one of three slightly different geometries, of which the "B" form described by James D. Watson and Francis Crick is believed to predominate in cells. It is 2 nanometres wide and extends 3.4 nanometres per 10 bp of sequence. This is also the approximate length of sequence in which the double helix makes one complete turn about its axis. This frequency of twist (known as the helical pitch) depends largely on stacking forces that each base exerts on its neighbors in the chain.
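The B-form dimensions above give a quick way to estimate how long a stretch of DNA is. The sketch below uses only the 3.4 nanometres per 10 bp figure from the text; the function names and the 150 million base pair example (a round, hypothetical chromosome size) are assumptions made for illustration.

```python
RISE_NM_PER_BP = 3.4 / 10    # 3.4 nm per 10 bp, from the text
BP_PER_TURN = 10.0           # one helical turn per ~10 bp, from the text
NM_PER_CM = 1e7


def contour_length_nm(n_bp: float) -> float:
    """End-to-end length of fully extended B-form DNA, in nanometres."""
    return n_bp * RISE_NM_PER_BP


def helical_turns(n_bp: float, bp_per_turn: float = BP_PER_TURN) -> float:
    """Number of complete turns the helix makes over n_bp base pairs."""
    return n_bp / bp_per_turn


if __name__ == "__main__":
    n_bp = 150_000_000  # hypothetical chromosome-sized molecule
    print(contour_length_nm(n_bp) / NM_PER_CM, "cm")
    print(helical_turns(n_bp), "turns")
```

For the assumed 150 million base pairs this comes to roughly 5 cm, in line with the length quoted above for the DNA of a human chromosome.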

Supercoiled DNA

The B form of the DNA helix twists 360° per 10.6 bp in the absence of strain. But many molecular biological processes can induce strain. A DNA segment with excess or insufficient helical twisting is referred to, respectively, as positively or negatively "supercoiled". DNA in vivo is typically negatively supercoiled, which facilitates the unwinding of the double-helix required for RNA transcription.

Sugar pucker

There are four conformations that the ribofuranose rings in nucleotides can adopt:
- C-2' endo
- C-2' exo
- C-3' endo
- C-3' exo

Ribose is usually in the C-3' endo conformation, while deoxyribose is usually in the C-2' endo sugar pucker conformation. The A and B forms differ mainly in their sugar pucker. In the A form, C3' sits above the sugar ring, whilst C2' sits below it; thus, the A form is described as "C3'-endo." Likewise, in the B form, C2' sits above the sugar ring, whilst C3' sits below; this is called "C2'-endo." The altered sugar puckering in A-DNA shortens the distance between adjacent phosphates by around one angstrom. This gives 11 to 12 base pairs per helical turn, instead of 10.5 in B-DNA. This sugar pucker gives the DNA a uniform ribbon shape, a cylindrical open core, and a deep major groove that is narrower and more pronounced than the grooves found in B-DNA.

Conditions for formation of A and Z helices

The two other known double-helical forms of DNA, called A and Z, differ modestly in their geometry and dimensions. The A form appears likely to occur only in dehydrated samples of DNA, such as those used in crystallographic experiments, and possibly in hybrid pairings of DNA and RNA strands. Segments of DNA that cells have methylated for regulatory purposes may adopt the Z geometry, in which the strands turn about the helical axis like a mirror image of the B form.

Table of comparison of the properties of different helical forms

Non-helical forms

Other, including non-helical, forms of DNA have been described, for example a side-by-side (SBS) configuration. Indeed, it is far from certain that the B-form double helix is the dominant form in living cells.

Direction of DNA strands

The asymmetric shape and linkage of nucleotides mean that a DNA strand always has a discernible orientation or directionality. Because of this directionality, close inspection of a double helix reveals that nucleotides are heading one way along one strand (the "ascending strand"), and the other way along the other strand (the "descending strand"). This arrangement of the strands is called antiparallel.

Chemical nomenclature (5' and 3')

For reasons of chemical nomenclature, people who work with DNA refer to the asymmetric ends of a strand as the 5' and 3' ends (pronounced "five prime" and "three prime"). Biologists, and the DNA enzymes they use, predominantly read nucleotide sequences in the "5' to 3' direction". However, because chemically produced DNA is synthesized and manipulated in the opposite direction or in non-directional ways, the orientation should not be assumed. In a vertically oriented double helix, the 3' strand is said to be ascending while the 5' strand is said to be descending.

Sense and antisense

As a result of their antiparallel arrangement and the sequence-reading preferences of enzymes, even if both strands carried identical instead of complementary sequences, cells could properly translate only one of them. The other strand a cell can only read backwards. Molecular biologists call a sequence "sense" if it is translated or translatable, and they call its complement "antisense". It follows then, somewhat paradoxically, that the template for transcription is the antisense strand. The resulting transcript is an RNA replica of the sense strand and is itself sense.
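The relationship between the sense strand, the antisense template and the transcript can be made concrete with a small sketch. All sequences are written 5' to 3'; the function names and the six-base example are hypothetical and chosen only to illustrate the paragraph above.

```python
DNA_PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}
# Base placed in the RNA opposite each DNA template base (U pairs with A).
RNA_PAIR = {"A": "U", "T": "A", "C": "G", "G": "C"}


def antisense_of(sense: str) -> str:
    """Antisense strand (5' to 3') of a sense strand (5' to 3')."""
    return "".join(DNA_PAIR[base] for base in reversed(sense))


def transcribe(template: str) -> str:
    """RNA transcript (5' to 3') made from a template given 5' to 3'.

    The template is read from its 3' end, so it is traversed in reverse
    while the RNA grows in the 5' to 3' direction.
    """
    return "".join(RNA_PAIR[base] for base in reversed(template))


if __name__ == "__main__":
    sense = "ATGGCC"
    template = antisense_of(sense)
    print(template)               # GGCCAT
    print(transcribe(template))   # AUGGCC
```

The transcript is the sense sequence with U in place of T, which is the sense in which the RNA is "a replica of the sense strand" even though the antisense strand served as the template.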

Distinction between sense and antisense strands

A small proportion of genes in prokaryotes, and more in plasmids and viruses, blur the distinction made above between sense and antisense strands. Certain sequences of their genomes do double duty, encoding one protein when read 5' to 3' along one strand, and a second protein when read in the opposite direction (still 5' to 3') along the other strand. As a result, the genomes of these viruses are unusually compact for the number of genes they contain, which biologists view as an adaptation. This merely confirms that there is no biological distinction between the two strands of the double helix. Indeed, typically each strand of a DNA double helix will act as sense and antisense in different regions.

As viewed by topologists

Topologists like to note that the juxtaposition of the 3′ end of one DNA strand beside the 5′ end of the other at both ends of a double-helical segment makes the arrangement a "crab canon".

Single-stranded DNA (ssDNA) and repair of mutations

In some viruses DNA appears in a non-helical, single-stranded form. Because many of the DNA repair mechanisms of cells work only on paired bases, viruses that carry single-stranded DNA genomes mutate more frequently than they would otherwise. As a result, such species may adapt more rapidly to avoid extinction. The result would not be so favorable in more complicated and more slowly replicating organisms, however, which may explain why only viruses carry single-stranded DNA. These viruses presumably also benefit from the lower cost of replicating one strand versus two.

The history of DNA research

The discovery that DNA was the carrier of genetic information was a process that required many earlier discoveries. The existence of DNA was discovered in the mid-19th century. However, it was only in the early 20th century that researchers began suggesting that it might store genetic information. This was only accepted after the structure of DNA was elucidated by Watson and Crick in their 1953 Nature publication. Crick proposed the central dogma of molecular biology in 1957, describing the process whereby proteins are produced from the information in DNA.

First isolation of DNA

Working in the 19th century, biochemists initially isolated DNA and RNA (mixed together) from cell nuclei. They were relatively quick to appreciate the polymeric nature of their "nucleic acid" isolates, but realized only later that nucleotides were of two types--one containing ribose and the other deoxyribose. It was this subsequent discovery that led to the identification and naming of DNA as a substance distinct from RNA. Friedrich Miescher (1844-1895) discovered a substance he called "nuclein" in 1869. Somewhat later, he isolated a pure sample of the material now known as DNA from the sperm of salmon, and in 1889 his pupil, Richard Altmann, named it "nucleic acid". This substance was found to exist only in the chromosomes. In 1929 Phoebus Levene at the Rockefeller Institute identified the components (the four bases, the sugar and the phosphate chain) and showed that the components of DNA were linked in the order phosphate-sugar-base. He called each of these units a nucleotide and suggested that the DNA molecule consisted of a string of nucleotide units linked together through the phosphate groups, which form the 'backbone' of the molecule. However, Levene thought the chain was short and that the bases repeated in the same fixed order. Torbjörn Caspersson and Einar Hammarsten showed that DNA was a polymer.

Establishing a link between heritable traits and chromosomes

Max Delbrück, Nikolai V. Timofeeff-Ressovsky, and Karl G. Zimmer published results in 1935 suggesting that chromosomes are very large molecules whose structure can be changed by treatment with X-rays, and that by so changing their structure it was possible to change the heritable characteristics governed by those chromosomes. In 1937 William Astbury produced the first X-ray diffraction patterns from DNA. He was not able to propose the correct structure, but the patterns showed that DNA had a regular structure and that it might therefore be possible to deduce what this structure was. In 1943, Oswald Theodore Avery discovered that traits proper to the "smooth" form of the Pneumococcus could be transferred to the "rough" form of the same bacteria merely by making the killed "smooth" (S) form available to the live "rough" (R) form. Quite unexpectedly, the living R Pneumococcus bacteria were transformed into a new strain of the S form, and the transferred S characteristics turned out to be heritable. Avery called the medium of transfer of traits the transforming principle; he identified DNA as the transforming principle, and not protein as previously thought. In 1952, Alfred Hershey and Martha Chase performed an experiment (the Hershey-Chase experiment) that showed, in T2 phage, that DNA is the genetic material (Hershey later shared the Nobel Prize with Luria).

In 1944, the renowned physicist Erwin Schrödinger published a brief book entitled What is Life?, in which he maintained that chromosomes contained what he called the "hereditary code-script" of life. He added: "But the term code-script is, of course, too narrow. The chromosome structures are at the same time instrumental in bringing about the development they foreshadow. They are law-code and executive power -- or, to use another simile, they are architect's plan and builder's craft -- in one."
He conceived of these dual functional elements as being woven into the molecular structure of chromosomes. By understanding the exact molecular structure of the chromosomes one could hope to understand both the "architect's plan" and also how that plan was carried out through the "builder's craft." Three groups took up Schrödinger's challenge to work out the structure of the chromosomes and the question of how the segments of the chromosomes that were conceived to relate to specific traits could possibly do their jobs. Just how the presence of specific features in the molecular structure of chromosomes could produce traits and behaviors in living organisms was unimaginable at the time. Because chemical dissection of DNA samples always yielded the same four nucleotides, the chemical composition of DNA appeared simple, perhaps even uniform. Organisms, on the other hand, are fantastically complex individually and widely diverse collectively. Geneticists did not speak of genes as conveyors of "information" in such words, but if they had, they would not have hesitated to quantify the amount of information that genes need to convey as vast. The idea that information might reside in a chemical in the same way that it exists in text--as a finite alphabet of letters arranged in a sequence of unlimited length--had not yet been conceived. It would emerge upon the discovery of DNA's structure, but few researchers imagined that DNA's structure had much to say about genetics.

Discovery of the structure of DNA

In the 1950s, three groups made it their goal to determine the structure of DNA. The first group to start was at King's College London; it was led by Maurice Wilkins and was later joined by Rosalind Franklin. Another group, consisting of Francis Crick and James D. Watson, was at Cambridge. A third group was at Caltech and was led by Linus Pauling. Crick and Watson built physical models using metal rods and balls, in which they incorporated the known chemical structures of the nucleotides, as well as the known position of the linkages joining one nucleotide to the next along the polymer. At King's College, Maurice Wilkins and Rosalind Franklin examined X-ray diffraction patterns of DNA fibers. Of the three groups, only the London group was able to produce good-quality diffraction patterns and thus sufficient quantitative data about the structure.

Discovery that DNA is helical

In 1948 Pauling discovered that many proteins included helical (see alpha helix) shapes. Pauling had deduced this structure from X-ray patterns. (Pauling was also later to suggest an incorrect three chain helical structure based on Astbury's data.) Even in the initial diffraction data from DNA by Maurice Wilkins, it was evident that the structure involved helices. But this insight was only a beginning. There remained the questions of how many strands came together, whether this number was the same for every helix, whether the bases pointed toward the helical axis or away, and ultimately what were the explicit angles and coordinates of all the bonds and atoms. Such questions motivated the modeling efforts of Watson and Crick.

Discovery that complementary nucleotides occur in equal proportions

In their modeling, Watson and Crick restricted themselves to what they saw as chemically and biologically reasonable. Still, the breadth of possibilities was very wide. A breakthrough occurred in 1952, when Erwin Chargaff visited Cambridge and inspired Crick with a description of experiments Chargaff had published in 1947. Chargaff had observed that the proportions of the four nucleotides vary between one DNA sample and the next, but that for particular pairs of nucleotides -- adenine and thymine, guanine and cytosine -- the two nucleotides are always present in equal proportions.
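Chargaff's observation follows directly from complementary pairing, which a short sketch can make explicit: counting bases over both strands of a duplex always gives equal amounts of A and T and of G and C, while the overall proportions still depend on the sequence. The function name `duplex_base_counts` and the test sequences are made up for this illustration and are not Chargaff's data.

```python
from collections import Counter

PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def duplex_base_counts(strand: str) -> Counter:
    """Count bases over a strand and its complementary partner strand."""
    partner = "".join(PAIR[base] for base in strand)
    return Counter(strand + partner)


if __name__ == "__main__":
    for seq in ("AAAGTCTGAC", "GGGCGCCATG"):
        counts = duplex_base_counts(seq)
        total = sum(counts.values())
        gc_fraction = (counts["G"] + counts["C"]) / total
        print(seq, dict(counts), f"G+C fraction = {gc_fraction:.2f}")
```

In both examples A equals T and G equals C, yet the G+C fraction differs between the two sequences, mirroring the variation between DNA samples that Chargaff reported.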

Watson and Crick's model

Watson and Crick had begun to contemplate double helical arrangements, but they lacked information about the amount of twist (pitch) and the distance between the two strands. Rosalind Franklin had to disclose some of her findings to the Medical Research Council, and Crick saw this material through Max Perutz's links to the MRC. Franklin's work confirmed a double helix with its backbone on the outside of the molecule and also gave an insight into its symmetry, in particular that the two helical strands ran in opposite directions.

Watson and Crick were again greatly assisted by more of Franklin's data. This is controversial because Franklin's critical X-ray pattern was shown to Watson and Crick without Franklin's knowledge or permission. Wilkins showed the famous Photo 51 to Watson at his lab immediately after Watson had unsuccessfully asked Franklin to collaborate to beat Pauling in finding the structure. From the data in Photo 51, Watson and Crick were able not only to discern that the distance between the two strands was constant, but also to measure its exact value of 2 nanometres. The same photograph also gave them the 3.4 nanometre-per-10 bp "pitch" of the helix.

The final insight came when Crick and Watson saw that a complementary pairing of the bases could provide an explanation for Chargaff's puzzling finding. However, the structure of the bases had been incorrectly given in the textbooks as the enol tautomer, when they were more likely to be in the keto form. When Jerry Donohue pointed this error out to Watson, Watson quickly realised that the pairs of adenine and thymine, and guanine and cytosine, were almost identical in shape and so would provide equally sized 'rungs' between the two strands. With the base-pairing established, Watson and Crick quickly converged upon a model, which they announced before Franklin herself had published any of her work.

Franklin was two steps away from the solution. She had not guessed the base-pairing and had not appreciated the implications of the symmetry that she had described. However, she had been working almost alone and did not have regular contact with a partner, as Crick and Watson had with each other, or with other experts such as Jerry Donohue. Her notebooks show that she was aware both of Jerry Donohue's work concerning tautomeric forms of bases (she used the keto forms for three of the bases) and of Chargaff's work. The disclosure of Franklin's data to Watson has angered some people who believe Franklin did not receive due credit at the time and that she might have discovered the structure on her own before Crick and Watson. In Crick and Watson's famous paper in Nature in 1953, they said that their work had been stimulated by the work of Wilkins and Franklin, whereas it had been the basis of their work. However, they had agreed with Wilkins and Franklin that they all should publish papers in the same issue of Nature in support of the proposed structure.

Publishing of the "Central Dogma"

Watson and Crick's model attracted great interest immediately upon its presentation. Arriving at their conclusion on February 21, 1953, Watson and Crick made their first announcement on February 28. Their paper [http://www.nature.com/genomics/human/watson-crick/ 'A Structure for Deoxyribose Nucleic Acid'] was published on April 25. In an influential presentation in 1957, Crick laid out the "Central Dogma", which foretold the relationship between DNA, RNA, and proteins, and articulated the "sequence hypothesis." A critical confirmation of the replication mechanism that was implied by the double-helical structure followed in 1958 in the form of the Meselson-Stahl experiment. Work by Crick and coworkers showed that the genetic code was based on non-overlapping triplets of bases, called codons, and Har Gobind Khorana and others deciphered the genetic code not long afterward. These findings represent the birth of molecular biology.

Watson, Crick, and Wilkins were awarded the 1962 Nobel Prize for Physiology or Medicine for discovering the molecular structure of DNA, by which time Franklin had died. Nobel prizes are not awarded posthumously; had she lived, the difficult decision over whom to jointly award the prize would have been complicated, as the prize can only be shared between two or three people. The process of the actual nomination is covered in Graeme Hunter's biography of Sir Lawrence Bragg, "Light is a Messenger" (pub. 2004).

Bibliography


- DNA: The Secret of Life, by James D. Watson. ISBN 0-375-41546-7
- The Double Helix: A Personal Account of the Discovery of the Structure of DNA (Norton Critical Editions), by James D. Watson. ISBN 0393950751

External links


- Extensive online guide to the life and work of Francis Crick, O.M., compiled by Martin Packer, Birmingham (England): http://www.packer34.freeserve.co.uk/rememberingfranciscrickacelebration.htm
- Listen to Francis Crick and James Watson talking on the BBC in 1962, 1972, and 1974: http://www.bbc.co.uk/bbcfour/audiointerviews/profilepages/crickwatson1.shtml
- [http://news.bbc.co.uk/1/hi/sci/tech/2949629.stm 17 April, 2003, BBC News: Most ancient DNA ever?]
- [http://www.whatsnextnetwork.com/health/index.php?cat=61 Latest Advances In Gene Research]
- [http://www.dnai.org DNA Interactive] (requires Macromedia Flash)
- [http://3dscience.com/3d_dna_models.asp Free 3d DNA model Images]
- [http://nist.rcsb.org/pdb/molecules/pdb23_1.html DNA: PDB molecule of the month]
- [http://www.fidelitysystems.com/Unlinked_DNA.html DNA under electron microscope]
- [http://www.myfirstbookaboutdna.com My First Book About DNA] Designed for children to learn more about DNA.
- [http://www.rotten.com/library/medicine/dna/ Rotten Library] articles on DNA
- Watson, James, and Francis Crick, "[http://biocrs.biomed.brown.edu/Books/Chapters/Ch%208/DH-Paper.html Molecular structure of nucleic acids], A structure for Deoxyribose Nucleic Acid". April 2, 1953. (paper on the structure of DNA)

Genetic code

The genetic code is a set of rules that maps DNA sequences to proteins in the living cell, and is employed in the process of protein synthesis. Nearly all living things use the same genetic code, called the standard genetic code, although a few organisms use minor variations of the standard code.

Genome expression

The genetic information carried by an organism, its genome, is inscribed in one or more DNA molecules. Each functional portion of a DNA molecule is referred to as a gene. Each gene is transcribed into a short template molecule of the related polymer RNA, which is better suited for protein synthesis. This in turn is translated, by a machinery consisting of ribosomes, a set of transfer RNAs and associated enzymes, into an amino acid chain (polypeptide), which is then folded into a protein.
The gene sequence inscribed in DNA, and in RNA, is composed of tri-nucleotide units called codons, each coding for a single amino acid. Each nucleotide subunit of DNA consists of a phosphate, a deoxyribose sugar and one of the four nitrogenous bases, which fall into two categories, purines and pyrimidines. The purine bases adenine (A) and guanine (G) are larger and consist of two aromatic rings. The pyrimidine bases cytosine (C) and thymine (T) are smaller and consist of only one aromatic ring. In RNA, however, thymine (T) is replaced by uracil (U), and deoxyribose by ribose.
Overall, there are 4³ = 64 different codon combinations. For example, the RNA sequence UUUAAACCC contains the codons UUU, AAA and CCC, each of which specifies one amino acid. So, this RNA sequence represents a protein sequence three amino acids long. (In the corresponding DNA sequence, thymine takes the place of uracil.)
The standard genetic code is shown in the following tables. Table 1 shows what amino acid each of the 64 codons specifies. Table 2 shows what codons specify each of the 20 standard amino acids involved in translation. These are called forward and reverse codon tables, respectively. For example, the codon AAU represents the amino acid asparagine (Asn), and cysteine (Cys) is represented by UGU and by UGC.
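The codon arithmetic above can be checked with a short sketch. The Python below is illustrative only and is not part of the original article: the names `BASES`, `AMINO_ACIDS`, `CODON_TABLE`, `split_codons` and `translate` are hypothetical, and the 64-letter string is an assumed compact encoding of the standard code (one-letter amino acid symbols, `*` for stop) with each base position running in the order U, C, A, G.

```python
from itertools import product

BASES = "UCAG"
# Assumption: one-letter amino acid symbols for all 64 codons, '*' = stop,
# ordered by 1st, 2nd, then 3rd base, each running U, C, A, G.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

# Forward codon table: codon -> amino acid symbol.
CODON_TABLE = {"".join(triplet): aa
               for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def split_codons(rna):
    """Split an RNA string into complete triplets, dropping a trailing partial codon."""
    usable = len(rna) - len(rna) % 3
    return [rna[i:i + 3] for i in range(0, usable, 3)]


def translate(rna):
    """Translate every complete codon, read from the first position."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(rna))


print(len(CODON_TABLE))           # 64, i.e. 4 ** 3
print(split_codons("UUUAAACCC"))  # ['UUU', 'AAA', 'CCC']
print(translate("UUUAAACCC"))     # FKP (Phe-Lys-Pro)
```

Building the table from a single string keeps the sketch short; a hand-written dictionary of 64 entries would work equally well.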

Table 1: RNA Codon table

This table shows the 64 codons and the amino acid each codon codes for.
Codons are grouped by 1st and 2nd base; within each group the 3rd base runs U, C, A, G.
1st base U, 2nd base U: UUU (Phe/F) Phenylalanine; UUC (Phe/F) Phenylalanine; UUA (Leu/L) Leucine; UUG (Leu/L) Leucine, Start
1st base U, 2nd base C: UCU (Ser/S) Serine; UCC (Ser/S) Serine; UCA (Ser/S) Serine; UCG (Ser/S) Serine
1st base U, 2nd base A: UAU (Tyr/Y) Tyrosine; UAC (Tyr/Y) Tyrosine; UAA Ochre (Stop); UAG Amber (Stop)
1st base U, 2nd base G: UGU (Cys/C) Cysteine; UGC (Cys/C) Cysteine; UGA Opal (Stop); UGG (Trp/W) Tryptophan
1st base C, 2nd base U: CUU (Leu/L) Leucine; CUC (Leu/L) Leucine; CUA (Leu/L) Leucine; CUG (Leu/L) Leucine, Start
1st base C, 2nd base C: CCU (Pro/P) Proline; CCC (Pro/P) Proline; CCA (Pro/P) Proline; CCG (Pro/P) Proline
1st base C, 2nd base A: CAU (His/H) Histidine; CAC (His/H) Histidine; CAA (Gln/Q) Glutamine; CAG (Gln/Q) Glutamine
1st base C, 2nd base G: CGU (Arg/R) Arginine; CGC (Arg/R) Arginine; CGA (Arg/R) Arginine; CGG (Arg/R) Arginine
1st base A, 2nd base U: AUU (Ile/I) Isoleucine, Start [2]; AUC (Ile/I) Isoleucine; AUA (Ile/I) Isoleucine; AUG (Met/M) Methionine, Start [1]
1st base A, 2nd base C: ACU (Thr/T) Threonine; ACC (Thr/T) Threonine; ACA (Thr/T) Threonine; ACG (Thr/T) Threonine
1st base A, 2nd base A: AAU (Asn/N) Asparagine; AAC (Asn/N) Asparagine; AAA (Lys/K) Lysine; AAG (Lys/K) Lysine
1st base A, 2nd base G: AGU (Ser/S) Serine; AGC (Ser/S) Serine; AGA (Arg/R) Arginine; AGG (Arg/R) Arginine
1st base G, 2nd base U: GUU (Val/V) Valine; GUC (Val/V) Valine; GUA (Val/V) Valine; GUG (Val/V) Valine, Start [2]
1st base G, 2nd base C: GCU (Ala/A) Alanine; GCC (Ala/A) Alanine; GCA (Ala/A) Alanine; GCG (Ala/A) Alanine
1st base G, 2nd base A: GAU (Asp/D) Aspartic acid; GAC (Asp/D) Aspartic acid; GAA (Glu/E) Glutamic acid; GAG (Glu/E) Glutamic acid
1st base G, 2nd base G: GGU (Gly/G) Glycine; GGC (Gly/G) Glycine; GGA (Gly/G) Glycine; GGG (Gly/G) Glycine
[1] The codon AUG both codes for methionine and serves as an initiation site: the first AUG in an mRNA's coding region is where translation into protein begins.
[2] This is a start codon for prokaryotes only.

Table 2: Reverse codon table

This table shows the 20 standard amino acids used in proteins, and the codons that code for each amino acid.
Amino acid  Letter  Codons
Ala         A       GCU, GCC, GCA, GCG
Arg         R       CGU, CGC, CGA, CGG, AGA, AGG
Asn         N       AAU, AAC
Asp         D       GAU, GAC
Cys         C       UGU, UGC
Gln         Q       CAA, CAG
Glu         E       GAA, GAG
Gly         G       GGU, GGC, GGA, GGG
His         H       CAU, CAC
Ile         I       AUU, AUC, AUA
Leu         L       UUA, UUG, CUU, CUC, CUA, CUG
Lys         K       AAA, AAG
Met         M       AUG
Phe         F       UUU, UUC
Pro         P       CCU, CCC, CCA, CCG
Ser         S       UCU, UCC, UCA, UCG, AGU, AGC
Thr         T       ACU, ACC, ACA, ACG
Trp         W       UGG
Tyr         Y       UAU, UAC
Val         V       GUU, GUC, GUA, GUG
Start               AUG, GUG
Stop                UAG, UGA, UAA

Marshall W. Nirenberg and Heinrich J. Matthaei at the National Institutes of Health performed the experiments that first elucidated the correspondence between the codons and the amino acids that they code for. Har Gobind Khorana expanded on Nirenberg's work and found the codes for the amino acids that Nirenberg's methods could not find. Khorana and Nirenberg won a share of the 1968 Nobel Prize in Physiology or Medicine for this work.
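The relationship between the forward and reverse codon tables can be sketched in a few lines of Python. This is an illustration rather than anything from the original article: `CODON_TABLE`, `reverse_table` and `REVERSE` are hypothetical names, and the 64-letter string is an assumed compact encoding of the standard code (one-letter symbols, `*` for stop, bases ordered U, C, A, G).

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Assumption: one-letter amino acid symbols for all 64 codons, '*' = stop.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(triplet): aa
               for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_table(forward):
    """Invert a codon -> amino acid mapping into amino acid -> list of codons."""
    reverse = defaultdict(list)
    for codon, amino_acid in forward.items():
        reverse[amino_acid].append(codon)
    return dict(reverse)


REVERSE = reverse_table(CODON_TABLE)

print(sorted(REVERSE["C"]))  # cysteine: ['UGC', 'UGU']
print(REVERSE["M"])          # methionine: ['AUG']
print(sorted(REVERSE["*"]))  # stop: ['UAA', 'UAG', 'UGA']
```

The inverted table has 21 keys: the 20 standard amino acids plus the stop symbol.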

Technical details

Start/Stop Codons

In classical genetics, the stop codons were given names: UAG was amber, UGA was opal, and UAA was ochre. These names were originally the names of the specific genes in which mutation of each of these stop codons was first detected. Translation starts with a chain initiation codon (start codon). Unlike stop codons, the codon alone is not sufficient to begin the process; nearby initiation sequences are also required to induce transcription into mRNA and binding by ribosomes. The most notable start codon is AUG, which also codes for methionine. CUG and UUG, and in prokaryotes GUG and AUU, also function as start codons, but occur much less frequently.
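The roles of start and stop codons can be illustrated with a deliberately simplified sketch. It assumes that translation begins at the first AUG and ends at the first in-frame stop codon; it ignores the nearby initiation sequences that are also required, as well as the rarer alternative start codons. The names `CODON_TABLE` and `translate_orf` and the example sequence are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Assumption: one-letter amino acid symbols for all 64 codons, '*' = stop.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(triplet): aa
               for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_orf(mrna):
    """Translate from the first AUG up to (not including) the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon: nothing is translated in this simplified model
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # UAA, UAG or UGA ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


# GG | AUG UUU AAA UAG | CC  ->  Met-Phe-Lys, then the amber stop codon
print(translate_orf("GGAUGUUUAAAUAGCC"))  # MFK
```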

Degeneracy of the genetic code

Many codons are degenerate or redundant, meaning that two or more codons may code for the same amino acid. Degenerate codons typically differ in their third positions; e.g., both GAA and GAG code for the amino acid glutamic acid. A codon is said to be four-fold degenerate if any nucleotide at its third position specifies the same amino acid; it is said to be two-fold degenerate if only two of the four possible nucleotides at its third position specify the same amino acid. In two-fold degenerate codons, the equivalent third-position nucleotides are always either two purines (A/G) or two pyrimidines (C/U in RNA, C/T in DNA). The degeneracy of the genetic code is what accounts for the existence of silent mutations.
Degeneracy follows from the number of codes required. At least 22 distinct codes are needed: one for each of the 20 amino acids, plus a stop and a start signal. Because there are four different bases, three is the minimum codon length that provides this many: with two bases per codon only 4² = 16 codes would be available, whereas triplets give 4³ = 64 possible codons. Since 64 far exceeds the number required, most amino acids are specified by more than one codon.
These properties of the genetic code make it more fault-tolerant for point mutations. For example, four-fold degenerate codons can tolerate any point mutation at the third position; two-fold degenerate codons can tolerate one out of the three possible point mutations at the third position. Since transition mutations (purine to purine or pyrimidine to pyrimidine mutations) are more likely than transversion mutations (purine to pyrimidine or vice versa), the equivalence of purines or that of pyrimidines at two-fold degenerate sites adds further fault tolerance.
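The degeneracy rules described here can be checked directly against the standard code. The following Python is a minimal sketch under stated assumptions: `CODON_TABLE`, `equivalent_third_bases` and `min_codon_length` are hypothetical names, and the 64-letter string is an assumed compact encoding of the standard code (one-letter symbols, `*` for stop, bases ordered U, C, A, G).

```python
from itertools import product

BASES = "UCAG"
# Assumption: one-letter amino acid symbols for all 64 codons, '*' = stop.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(triplet): aa
               for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
PURINES, PYRIMIDINES = {"A", "G"}, {"C", "U"}


def equivalent_third_bases(codon):
    """Third-position bases that leave the meaning of the codon unchanged."""
    meaning = CODON_TABLE[codon]
    return {base for base in BASES if CODON_TABLE[codon[:2] + base] == meaning}


def min_codon_length(codes_needed, alphabet_size=4):
    """Smallest n such that alphabet_size ** n >= codes_needed."""
    n = 1
    while alphabet_size ** n < codes_needed:
        n += 1
    return n


print(len(equivalent_third_bases("GAA")))  # 2: GAA and GAG both give Glu
print(len(equivalent_third_bases("GGU")))  # 4: any third base gives Gly

# Every two-fold degenerate site pairs two purines or two pyrimidines.
two_fold = [c for c in CODON_TABLE if len(equivalent_third_bases(c)) == 2]
print(all(equivalent_third_bases(c) in (PURINES, PYRIMIDINES) for c in two_fold))  # True

print(min_codon_length(22))  # 3, since 4**2 = 16 < 22 <= 4**3 = 64
```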
A practical consequence of redundancy is that some errors in the genetic code cause only a silent mutation or an error that does not affect the amino acid's hydrophilic/hydrophobic property; e.g., a codon of the form XUX (where X = any nucleotide) tends to code for hydrophobic amino acids. Even so, it is a single point mutation that causes the modified hemoglobin molecule in sickle-cell disease. The hydrophilic glutamate (Glu) is replaced by the hydrophobic valine (Val), which reduces the solubility of β-globin. This causes hemoglobin to form linear polymers linked by the hydrophobic interaction between the valine groups, causing sickle-cell deformation of erythrocytes. Sickle-cell disease is generally not caused by a de novo mutation. Rather, it is selected for in malarial regions (in a similar way to thalassemia), as heterozygous people have some resistance to the malarial Plasmodium parasite (heterozygote advantage). In general, these properties are widely interpreted to form part of the reason for the origin of the standard genetic code (see below).
These variable codes for amino acids are possible because of modified bases at the first position of the anticodon; the base pair formed is called a wobble base pair. The pairings involved include those with the modified base inosine and the U-G base pair.
Only two amino acids are specified by a single codon: one is methionine, specified by the codon AUG, which also specifies the start of translation; the other is tryptophan, specified by the codon UGG.
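The effect of a single-base change on the encoded amino acid can be classified with a small sketch. It is illustrative only: `CODON_TABLE` and `classify_point_mutation` are hypothetical names, the category labels are a simplification, and the codon pair GAG to GUG is used as the well-known example of the glutamate-to-valine change described above.

```python
from itertools import product

BASES = "UCAG"
# Assumption: one-letter amino acid symbols for all 64 codons, '*' = stop.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(triplet): aa
               for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_point_mutation(before, after):
    """Label a codon change by its effect on the encoded amino acid (simplified)."""
    old, new = CODON_TABLE[before], CODON_TABLE[after]
    if old == new:
        return "silent"
    if new == "*":
        return "nonsense"   # an amino acid codon became a stop codon
    if old == "*":
        return "stop-loss"  # a stop codon became an amino acid codon
    return "missense"       # one amino acid replaced by another


print(classify_point_mutation("GAA", "GAG"))  # silent: both code for Glu
print(classify_point_mutation("GAG", "GUG"))  # missense: Glu -> Val

# Amino acids encoded by codons with U in the middle position (XUX).
middle_u = {aa for codon, aa in CODON_TABLE.items() if codon[1] == "U"}
print(sorted(middle_u))  # ['F', 'I', 'L', 'M', 'V'], all hydrophobic
```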

Phase or reading frame of a sequence

Note that a "codon" is defined entirely by the starting position. For example, the string GGGAAACCC, if read from the first position, contains the codons GGG, AAA and CCC. If read from the second position, it contains the codons GGA and AAC (partial codons being ignored). If read starting from the third position, it contains GAA and ACC. Every DNA sequence can thus be read in three reading frames, each of which will produce a radically different amino acid sequence (in our example, Gly-Lys-Pro, Gly-Asn, and Glu-Thr, respectively). The actual frame in which a protein sequence is translated is defined by a start codon, usually the first occurrence of AUG in the RNA sequence. Mutations that disrupt the reading frame (i.e. insertions or deletions of one or two nucleotide bases) severely impair the function of a protein and are thus exceedingly rare in in vivo protein-coding sequences, since they often lead to death before an organism is viable.
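The three reading frames of the example string can be enumerated with a short sketch. The names `CODON_TABLE` and `reading_frames` are hypothetical, and the 64-letter string is an assumed compact encoding of the standard code (one-letter symbols, `*` for stop, bases ordered U, C, A, G).

```python
from itertools import product

BASES = "UCAG"
# Assumption: one-letter amino acid symbols for all 64 codons, '*' = stop.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(triplet): aa
               for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reading_frames(rna):
    """Return the complete codons for each of the three start offsets (0, 1, 2)."""
    return [[rna[i:i + 3] for i in range(offset, len(rna) - 2, 3)]
            for offset in range(3)]


for offset, codons in enumerate(reading_frames("GGGAAACCC")):
    peptide = "".join(CODON_TABLE[codon] for codon in codons)
    print(offset + 1, codons, peptide)
# 1 ['GGG', 'AAA', 'CCC'] GKP   (Gly-Lys-Pro)
# 2 ['GGA', 'AAC'] GN           (Gly-Asn)
# 3 ['GAA', 'ACC'] ET           (Glu-Thr)
```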

Variations

Numerous variations of the standard genetic code are found in mitochondria, which are energy-producing organelles. Ciliate protozoa also have some variation in the genetic code: UAG and often UAA code for glutamine (a variant also found in some green algae), or UGA codes for cysteine. Another variant is found in some species of the yeast Candida, where CUG codes for serine. In certain proteins, non-standard amino acids are inserted at standard stop codons, depending upon associated signal sequences in the messenger RNA: UGA can code for selenocysteine and UAG can code for pyrrolysine (for details, see the articles on these two amino acids). There may be other non-standard interpretations that are not yet known.
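One way to picture these variants is as small sets of overrides applied to the standard table. The sketch below is an assumption-laden illustration, not a catalogue of real translation tables: `STANDARD_TABLE`, `with_overrides`, `CILIATE_TABLE`, `CANDIDA_TABLE` and the example sequence are hypothetical, and only the reassignments named in the text are encoded.

```python
from itertools import product

BASES = "UCAG"
# Assumption: one-letter amino acid symbols for all 64 codons, '*' = stop.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
STANDARD_TABLE = {"".join(triplet): aa
                  for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def with_overrides(overrides):
    """Copy the standard table and reassign the given codons."""
    table = dict(STANDARD_TABLE)
    table.update(overrides)
    return table


# Only the reassignments mentioned in the text; real variant codes may differ further.
CILIATE_TABLE = with_overrides({"UAG": "Q", "UAA": "Q"})  # stop -> glutamine
CANDIDA_TABLE = with_overrides({"CUG": "S"})              # leucine -> serine


def translate(rna, table=STANDARD_TABLE):
    """Translate every complete codon from the first position, without stopping."""
    return "".join(table[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))


sequence = "AUGUAGCUG"
print(translate(sequence))                 # M*L
print(translate(sequence, CILIATE_TABLE))  # MQL
print(translate(sequence, CANDIDA_TABLE))  # M*S
```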

Origin of the genetic code

Despite the variations that exist, the genetic codes used by all known forms of life on Earth are very similar. Since there are many possible genetic codes that are thought to have similar utility to the one used by Earth life, the theory of evolution suggests that the genetic code was established very early in the history of life. One can ask the question: is the genetic code completely random, just one set of codon-amino acid correspondences that happened to establish itself and be "frozen in" early in evolution, although functionally any other of the near-infinite set of possible translation tables would have done just as well? Even a cursory look at the table shows patterns that suggest that this is not the case. There are three themes running through the many theories that seek to explain the evolution of the genetic code (and hence the origin of these patterns). One is illustrated by recent aptamer