The status quo and development of DNA sequencing technology (2)

2.1.3 Software packages for mapping spliced short reads

Mapping cDNA reverse transcribed from RNA back onto the genome requires more complex, specialized algorithms. Mapping a short RNA read that spans several exons joined by splicing is a completely different problem from mapping a short RNA read that comes from a single exon (Figure 14).

Packages used to map cDNA reverse transcribed from RNA, such as ERANGE (http://woldlab.caltech.edu/rnaseq), use the exon and intron positions of known genes as a reference. In this way, ERANGE can build new reference sequences that span multiple exons, and then call Maq or Bowtie to map spliced RNA reads onto those reference sequences. Because this approach cannot find new (previously unknown) splicing patterns, some researchers use machine learning to predict them; the statistical model is trained on existing reference sequence annotation. In contrast, the TopHat package (http://tophat.cbcb.umd.edu) does not require any annotation. It first uses Bowtie to map short reads and so find the exons, and then maps the remaining short reads across the junctions between the exons it has found. There is also a program, G-Mo.R-Se ( http:// ), that uses this strategy, but it works from the RNA sequencing data without going through Bowtie.
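The annotation-based idea described above, joining known exons into new reference sequences so that an ordinary short-read mapper can find spliced reads, can be sketched in a few lines. This is a minimal illustration and not ERANGE's actual code: the function name, the coordinate convention (0-based, half-open) and the toy genome are all assumptions made for the example.

```python
def build_junction_reference(genome, exons, read_len):
    """Build one reference sequence per pair of annotated exons.

    genome   -- chromosome sequence as a string (toy input, hypothetical)
    exons    -- list of (start, end) tuples, 0-based half-open, sorted by start
    read_len -- length of the reads that will later be mapped
    """
    # A read that crosses a junction keeps at least one base on each side,
    # so at most read_len - 1 bases of flank are needed per exon.
    flank = read_len - 1
    junctions = {}
    for i, (start1, end1) in enumerate(exons):
        # Pair each exon with every downstream exon, so exon skipping
        # is covered as well as adjacent-exon splicing.
        for start2, end2 in exons[i + 1:]:
            left = genome[max(start1, end1 - flank):end1]
            right = genome[start2:min(end2, start2 + flank)]
            junctions[f"{end1}-{start2}"] = left + right
    return junctions


# Toy genome: two exons (upper case) separated by an intron (lower case).
toy_genome = "AAAACCCC" + "gtttttag" + "GGGGTTTT"
toy_refs = build_junction_reference(toy_genome, [(0, 8), (16, 24)], read_len=4)
spliced_read = "CCGG"  # two bases from each exon, so absent from the genome itself
hits = [name for name, seq in toy_refs.items() if spliced_read in seq]
```

The exact substring search at the end only stands in for the real mapping step, which in the packages described above is delegated to Maq or Bowtie.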

2.2 Limitations and problems

Existing short-read mapping methods each have their own limitations. For example, Maq and Bowtie perform poorly on reads that contain insertions or deletions.
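Why insertions and deletions are hard can be seen in a toy comparison. The sketch below assumes a mapper that only counts mismatches position by position (no gaps allowed) and contrasts it with an edit distance that does allow gaps. The sequences and function names are invented for the illustration and are not taken from any of the packages named here.

```python
def mismatches_ungapped(read, window):
    """Count position-by-position mismatches; no gaps are allowed."""
    return sum(a != b for a, b in zip(read, window))


def edit_distance(a, b):
    """Classic dynamic-programming edit distance (substitutions and indels)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # base of a left unmatched
                           cur[j - 1] + 1,           # base of b left unmatched
                           prev[j - 1] + (x != y)))  # match or substitution
        prev = cur
    return prev[-1]


reference = "ACGTACGTTAGC"
read = reference[:3] + reference[4:]   # the same sequence with one base deleted

ungapped = mismatches_ungapped(read, reference[:len(read)])
gapped = edit_distance(read, reference)
```

Here a single deleted base shifts everything after it, so the ungapped comparison reports 7 mismatches, while the gapped comparison needs only 1 edit. A mapper that tolerates only a few mismatches would discard this read.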

Some software, such as SHRiMP (http://compbio.cs.toronto.edu/shrimp, Figure 15), supports ABI's "color space" sequencing output, but most software does not. Spliced short-read mapping software has similar problems, as well as problems of its own. For example, annotation-based software can of course only produce results consistent with the annotation, yet for many species the genome-wide annotation consists only of homology-based or computational predictions. If a machine learning method is trained on incorrect annotation, it will not give good results.
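To show why "color space" output needs dedicated support, here is a small sketch of two-base encoding, in which each color describes a pair of adjacent bases rather than a single base. It assumes the commonly described scheme where the bases A, C, G, T are numbered 0 to 3 and a color is the XOR of two neighboring numbers; the function names are made up for this example.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def to_color_space(seq):
    """Encode a base sequence as its first base followed by one color per adjacent pair."""
    colors = [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]
    return seq[0] + "".join(str(c) for c in colors)


def from_color_space(cs):
    """Decode by walking forward from the known first base."""
    bases = [cs[0]]
    for color in cs[1:]:
        bases.append(BASES[CODE[bases[-1]] ^ int(color)])
    return "".join(bases)


encoded = to_color_space("ATGGC")          # "A3103"
decoded = from_color_space(encoded)        # back to "ATGGC"
one_wrong_color = from_color_space("A3113")  # "ATGTA": every base after the error changes
```

Because one misread color corrupts all the bases decoded after it, a mapper cannot simply translate such reads into bases and proceed as usual. That is one reason color space has to be handled explicitly by the mapping software.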

Developers of short-read mapping software therefore still have many problems to solve. All sequencing instrument manufacturers are working hard to produce longer reads. Can existing short-read mapping software handle these "big guys"? Maq, Bowtie, and several other short-read mappers can process reads longer than 100 bp, but only in specific situations; only software designed for long sequences, such as BLAT, handles such reads well. In addition, if the sequenced sample differs greatly from the existing reference sequence, how should the parameters of the mapping software be adjusted? Can the software adjust them automatically? What will the quality of the resulting mapping be? The answers depend on the detection method and the scope of the analysis. However, as the technology advances, I believe that all these problems will soon be overcome.

Original source: Cole Trapnell & Steven L Salzberg. (2009) How to map billions of short reads onto genomes. Nature Biotechnology, 27(5): 455-457.

Glossary 1

"reference" genome

The genome of each species has a relatively constant set of genes and gene order, but mutations in particular genes or gene segments give rise to different varieties within the species. The genome of one representative variety is therefore usually taken as the model genome for the species, which makes it easier to study the others. This representative genome is the "reference" genome.

3. Faster (just 15 minutes) and cheaper (just $100): new human genome sequencing technology coming soon

A new human genome sequencing technology will soon be available. With it, the cost of sequencing a human genome will fall sharply, to an average of only $100 per sample. The technology is also 20,000 times faster than the second-generation sequencing technology now widely used on the market, and it lets us observe the replication of human genomic DNA in real time.

Stephen Turner, chief technology officer at Pacific Biosciences, said that the company's latest single-molecule real-time (SMRT) sequencing technology will be commercially available in 2010.

Ten years ago, both Celera Genomics and the Human Genome Project took years to obtain a complete human genome sequence. But by 2008, with a new generation of sequencers, it took only a few months to obtain James Watson's complete personal genome sequence.

Now, with the SMRT sequencer, Pacific Biosciences hopes to complete the sequencing of a human genome in minutes.

The research strategy we used in our work on the Human Genome Project was to take advantage of the natural mechanisms by which cells replicate DNA.

Replicating DNA strands with DNA polymerase yields billions of DNA fragments of various lengths. A small fluorescent label is then added to the end of each fragment; the label marks only the last base of the strand. The fragments are then sorted by length, and the sequence can be read like a book: the terminal bases are read one by one, from the shortest fragment to the longest.
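The reading step just described, sorting fragments by length and reading off their labeled end bases, can be shown with a toy model. It is a simplification for illustration only: each fragment is reduced to a (length, labeled end base) pair, one fragment is assumed for every possible length, and the function names are invented.

```python
import random


def make_labeled_fragments(strand, seed=0):
    """Simulate the fragment pool: one fragment per prefix of the strand,
    reduced to (length, labeled_end_base), in random order."""
    fragments = [(i + 1, base) for i, base in enumerate(strand)]
    random.Random(seed).shuffle(fragments)
    return fragments


def read_by_length(fragments):
    """Sort fragments from shortest to longest and read the end labels in order."""
    return "".join(base for _, base in sorted(fragments))


pool = make_labeled_fragments("GATTACA")
recovered = read_by_length(pool)   # the original strand, rebuilt from the end labels
```

The order of the fragments in the pool does not matter, because fragment length alone fixes the position of each labeled base.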

The SMRT sequencer does not work this way. Instead of waiting for the DNA polymerase to finish replication and then reading the sequence, it monitors the polymerase at work in real time. Each DNA molecule is held at the bottom of a well, and while the DNA polymerase replicates it, the SMRT sequencer reads each base in real time until the complete sequence is obtained.

Each base used in the SMRT sequencer carries its own distinct fluorescent label. When a base is incorporated into the newly synthesized DNA strand, it emits a specific fluorescent signal, and the real-time detector uses that signal to determine whether the base is A, C, G, or T.

The researchers who invented SMRT technology hope to develop it further into a chip-based, multi-channel parallel sequencer, which would speed up sequencing even more.

"If we can process 1 million fragment molecules at the same time, then we can get a complete human genome sequence map in 15 minutes," Turner said.

The SMRT sequencer can increase sequencing speed while also improving accuracy. Errors in SMRT sequencing occur at random: the probability of error is the same at every site, and no site is more error-prone than any other. Sequencing the same sample several times should therefore improve accuracy.
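The claim that repeated sequencing improves accuracy when errors are random can be checked with a short calculation. The sketch below assumes that each pass makes an error at a given site independently with the same probability, and, pessimistically, that all erroneous passes agree on the same wrong base. The error rate used is a made-up illustrative number, not a measured figure for any instrument.

```python
from math import comb


def consensus_error(per_pass_error, passes):
    """Probability that a majority vote over `passes` independent reads of one
    site is wrong. Use an odd number of passes so that ties cannot occur."""
    return sum(
        comb(passes, k) * per_pass_error**k * (1 - per_pass_error) ** (passes - k)
        for k in range(passes // 2 + 1, passes + 1)
    )


HYPOTHETICAL_ERROR = 0.1   # assumed per-pass error rate, for illustration only
for n in (1, 3, 5, 9):
    print(n, consensus_error(HYPOTHETICAL_ERROR, n))
```

Under these assumptions the consensus error falls quickly as passes are added. If errors clustered at particular sites instead, repeated passes would tend to repeat the same mistake and the vote would help much less.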
