Perspectives

Should they charge us to pass the Genome?
by Arseni Markoff, Ph.D.

Given the recent success in sequencing the human genome, almost every lab doing research in human molecular and cell biology, biochemistry, genetics and pathology is interested in using the assembled and annotated data. Annotation means that coding regions (i.e., the genes) are properly positioned on the chromosomes and their essential features (e.g., transcripts, encoded proteins, polymorphic alleles) are indicated. There exist two versions of the human genome, neither completely annotated: the publicly available one and the one generated at The Institute for Genomic Research (TIGR). TIGR, headed by Craig Venter, first launched its annotated version of the human genome as a proprietary product of Celera Genomics. Later on, the company introduced a powerful analytical tool suite called Discovery System for integrated data mining and interpolation on the processed and catalogued genomic sequence.

Through registered access, the user is connected via the Internet to Celera's unique supercomputing facility, featuring 800 interconnected Compaq Alpha-based, true 64-bit computer systems, each of which is capable of performing more than 250 billion sequence comparisons per hour. This facility, initially used to assemble DNA fragments into complete genomes, is now used for comparative analysis and sequence data mining. It was Celera's sequence assembly algorithms that made possible the completion of the human genome after shotgun sequencing in just nine months. The same algorithms and computer platforms are now available for licensed use on analytical tasks.

Should we pay to use the facilities? The answer to this question depends only on our research task priorities.
If a hot new gene that might play a crucial role in human pathology is identified in a series of biochemical and/or cell biology experiments, one would certainly like to know more about its "relatives" and their potential function. It is becoming widely recognized that molecular events underpinning disease mechanisms involve only specific protein isoforms. Their reflection in the human genome is the wealth of gene polymorphisms (alleles). A standalone PC with a single "bio-app" analytical package is certainly not the proper model for a serious bioinformatic effort on gene prediction and modeling. The post-genomic age requires more computer power and non-trivial algorithmic solutions.

It is somewhat ironic that the British Medical Research Council (MRC) has decided to sign a deal with Celera to use the Discovery System. One has only to recall how Mr. Venter's privately sponsored effort to sequence the human genome was met with raised eyebrows; the approach was mocked and dismissed as "inappropriate" and "inaccurate" by the very advocates of the public initiative. When the results came out and experts were invited to judge the quality of the assembled data, there was a bit of confused silence in the mainstream. It was clear that some serious and precise "number crunching" had produced unexpectedly useful results. The puzzle was solved -- the genome assembled!

Of course, there have been some errors in the private as well as in the public version, such as "contaminations" with sequences from other species (mouse) and inverted orientations of large contigs -- sequence blocks of a hundred thousand base pairs. Such mistakes, however, were corrected quickly, and the two versions of the human genome are now available on the Internet. The only difference is that access to the Celera data requires a fee. One could certainly ask, "Do we really need the private data when we have access to the public?" Perhaps not. But I suspect there is much more behind the data.
It is the way they are being grouped and analyzed. Not that the publicly available tools are bad, but I am quite sure that Mr. Venter's crew has more to offer. I have seen several of TIGR's sequence assembly and quality assessment tools offered as downloadable freeware for the non-profit scientific community. Each of these tools answers specific data mining and evaluation questions better than many of the commercial PC-based packages. A combination of these TIGR tools on an integrated platform, even if only as standalone versions, is already something worth paying for. The only problem is that running this software requires a supercomputing facility. An architecture in which all of the tools are available and integrated to answer complex questions of structure, prediction, and feature identification and analysis is well beyond mere shareware.

In conclusion, I daresay that using the Celera facilities seems to be a much fairer deal than using a single molecular biology desktop package. You use their machines and their data, with their tools. The results are yours. The time you waste cursing a standalone application because it doesn't do what you want, or doesn't do it the way you want, is perhaps worth what you pay for access to Discovery System. Don't forget that in this way you also support the development of newer and better tools for bioinformatics.

Arseni Markoff, Ph.D.
Institute of Medical Biochemistry
Muenster, Germany