Understanding DNA Sequencing, Part 2

Interested in learning more about DNA sequencing technology?  Be sure to read part one of this series before starting this post.

DNA sequencing in today’s laboratory is a little different from what it was when the technique was developed in the late 1970s. Historically, a single sequencing gel contained 48 sample wells. Because each sample needed four separate reactions (one for each ddNTP), we could run only 12 complete sequencing reactions at one time. This was sufficient for sequencing single genes, but it was not practical for massive sequencing efforts. Today, the entire process of collecting and analyzing sequencing data is automated (Fig. 1). Despite this, the basic biochemistry of the chain-termination method of DNA sequencing remains unchanged. Fluorescently labeled ddNTPs are used in the sequencing reactions, both to terminate the growing nucleotide chain and to label the DNA for analysis. Each ddNTP is tagged with a different color fluorophore, allowing all four sequencing reactions to be performed in one tube.
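To make the chain-termination idea concrete, here is a minimal Python sketch. It is a toy model, not a description of any real instrument: it assumes that a labeled ddNTP stops the growing strand exactly once at every position, and the function names (`chain_termination_fragments`, `read_sequence`) are made up for this illustration. Sorting the fragments by length stands in for electrophoresis, and reading the terminal label of each fragment in order recovers the sequence.

```python
import random


def chain_termination_fragments(new_strand):
    """Return one (length, terminal_base) pair per position of the strand.

    Toy assumption: a labeled ddNTP terminates synthesis once at every
    position, so we get one fragment of each possible length.
    """
    return [(i + 1, base) for i, base in enumerate(new_strand)]


def read_sequence(fragments):
    """Rebuild the sequence from fragments in any order.

    Sorting by length mimics size separation: the shortest fragments
    reach the detector first, and each one reports its terminal base.
    """
    ordered = sorted(fragments)
    return "".join(base for _, base in ordered)


if __name__ == "__main__":
    strand = "GATTACA"
    fragments = chain_termination_fragments(strand)
    random.shuffle(fragments)  # fragments are mixed together in the tube
    print(read_sequence(fragments))
```

Because every fragment carries both its length and its label, the order in which the fragments sit in the tube does not matter. That is why all four reactions can share one tube once each ddNTP has its own color.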

Fig. 1:  DNA sequencing in today's laboratories.  CC-BY-SA estevezj

To analyze the fluorescent sequencing reactions, automated machines use a polyacrylamide gel formed in a thin capillary tube rather than a slab. While the DNA fragments are separating through the gel matrix, a laser beam is focused on the capillary. The laser excites the fluorophores on the DNA, and the detector captures the fluorescent emission. This setup allows the instrument to identify the bases in the sequencing reaction in real time as they are being separated through the capillary. These high-throughput machines can provide fast, high-quality sequence data for a fraction of the cost of traditional sequencing strategies. As the demand for high-quality DNA sequencing information continues to grow, “next-generation” strategies are being developed to speed up sequencing efforts while simultaneously reducing costs.
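The detection step described above can be sketched in a few lines of Python. This is a deliberately simplified illustration: it assumes the detector reports one intensity per dye channel for each fragment that passes the laser, and it simply calls the base whose channel is brightest. The channel order, the `call_bases` name, and the sample numbers are all assumptions made for this example; real instruments apply additional signal processing that is not shown here.

```python
# Assumed channel order: one fluorescent dye per base.
CHANNELS = "ACGT"


def call_bases(peaks):
    """Call one base per peak by picking the brightest dye channel.

    `peaks` is a list of 4-tuples holding the intensity measured in the
    A, C, G and T channels, in the order the fragments pass the laser.
    """
    calls = []
    for intensities in peaks:
        strongest = max(range(len(CHANNELS)), key=lambda i: intensities[i])
        calls.append(CHANNELS[strongest])
    return "".join(calls)


if __name__ == "__main__":
    # Hypothetical readings for three fragments, shortest first.
    peaks = [
        (0.9, 0.1, 0.0, 0.1),  # brightest in the A channel
        (0.1, 0.0, 0.8, 0.1),  # brightest in the G channel
        (0.0, 0.1, 0.1, 0.7),  # brightest in the T channel
    ]
    print(call_bases(peaks))  # prints AGT
```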

With the rise of automated sequencing techniques in the early 1990s, scientists began large-scale genome sequencing efforts in order to understand the biological complexity of an organism at the DNA level. This new field of biology, called genomics, analyzes the sequence and structure of an organism’s genome, as well as the interactions between genes themselves. In addition to the human genome, many scientifically and commercially relevant organisms have had their complete genomes sequenced. Notable organisms with sequenced genomes include the bacterium E. coli, the baker’s yeast S. cerevisiae, the nematode C. elegans, corn (Z. mays) and our closest living relative, the chimpanzee (P. troglodytes).

Fig. 2: Organisms that have complete genome sequences. Clockwise from upper left: C. elegans, corn, humans, chimpanzee. ©Edvotek 2014

As a result of these projects, a vast amount of DNA sequence information has been made available for research. However, an organism’s DNA sequence is of limited use unless it can be converted into biologically useful information. In the early stages of genomic studies, researchers recognized the need for data management systems to store and analyze large quantities of sequence information. As computer scientists developed technology to address these needs, they established the interdisciplinary field of science known as bioinformatics. This discipline blends computer science, biology, and information technology to build extensive databases that make biological information usable. These databases are accessible online, making it easier for scientists around the world to share biological data and build on each other’s discoveries.

Bioinformatics has allowed scientists to unlock mysteries encoded in DNA. For example, since the human genome was completed in 2003, the sequence information has been used to map specific genes to their chromosomal locations and to identify novel genes. Since the genome varies about 0.2% between individuals (about one base in every 500 is different), specific variations in the DNA sequence can be used as markers for disease predisposition. Potential protein-coding genes can be identified by searching for start and stop codons. Likewise, special programs have been developed to analyze non-coding sequences of DNA called promoters, which initiate gene expression by regulating the amount of RNA produced within a cell. The annotation of novel DNA sequences will continue as new sequence information is added and more powerful programs mine the information, allowing scientists to understand the function of every base within a given genome. This technology has also allowed us to learn about the genesis of new genes.
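As a rough illustration of how a program might look for potential protein-coding regions, the Python sketch below scans a DNA string for open reading frames: a start codon (ATG) followed, in the same reading frame, by a stop codon (TAA, TAG or TGA). It is a simplified sketch under stated assumptions. It searches only the forward strand, ignores introns, and uses a made-up `min_codons` cutoff, so it should not be read as how any particular annotation tool works.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end) index pairs of open reading frames.

    `end` is exclusive and includes the stop codon. Only the three
    forward reading frames are scanned; `min_codons` counts the codons
    from the start codon up to, but not including, the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == START_CODON:
                # Walk forward codon by codon looking for an in-frame stop.
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3))
                        i = j  # resume scanning after this stop codon
                        break
            i += 3
    return orfs


if __name__ == "__main__":
    dna = "CCATGAAATTTGGGTAACC"
    for start, end in find_orfs(dna):
        print(start, end, dna[start:end])  # prints 2 17 ATGAAATTTGGGTAA
```

In practice a candidate found this way is only a hypothesis. Short reading frames appear by chance, which is why the sketch includes a minimum-length cutoff.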

In addition, since the genomes of many model organisms have also been sequenced, DNA sequence comparison software like the Basic Local Alignment Search Tool (or BLAST) has allowed scientists to identify genes that are similar to those that are important for human health and development. Scientists can learn more about these genes by studying their function in a model organism. For instance, about 75% of the genes that cause disease in humans have homologs in D. melanogaster. In fact, the fly model of Alzheimer’s disease has provided new information on the disease, which has allowed scientists to identify novel targets for treatment.
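To give a feel for what "sequence comparison" means computationally, here is a small Python sketch that scores the best local alignment between two sequences using the Smith-Waterman dynamic programming method. This is not BLAST itself. BLAST uses faster heuristics to search large databases, whereas this sketch does an exhaustive comparison of two short strings. The scoring values (`match`, `mismatch`, `gap`) are arbitrary choices for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best Smith-Waterman local alignment score of a and b.

    Only the score is computed (no traceback), using two rows of the
    dynamic programming table. Scores never drop below zero, which is
    what lets the alignment start and end anywhere in either sequence.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


if __name__ == "__main__":
    # The shared stretch "ACG" gives three matches at 2 points each.
    print(local_alignment_score("TTTACGTTT", "GGACGGG"))  # prints 6
    print(local_alignment_score("AAAA", "TTTT"))  # prints 0
```

A higher score means the two sequences share a longer or cleaner stretch of similarity, which is the kind of signal used to suggest that a gene in a model organism may be related to a human gene.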

DNA sequencing technology continues to advance as scientists and engineers develop next-generation sequencing technologies, including massively parallel signature sequencing (MPSS) and Illumina sequencing. One day, these ultra-high-throughput methods will allow an organism’s entire genome to be sequenced in a few hours. Next week, we will discuss some careers where DNA sequencing technology is important.