Computers to map the human genetic code
FROM PERSONALITY traits to physical appearance, all characteristics are believed to be manifestations of what an individual's genes contain. But what are genes made of? Indian scientists have now devised an ingenious computer programme that will enable researchers to analyse the complex structure of deoxyribonucleic acid (DNA) and thus assist in the international effort to crack the human genetic code.
The DNA in a human cell contains 50,000 to 100,000 genes, each of which is made up of combinations of four compounds known as nucleotides or bases -- adenine (A), guanine (G), cytosine (C) and thymine (T). These nucleotides occur in a certain sequence in each gene, and scientists have so far been able to partially sequence only about 5,000 genes. The total number of nucleotides in the entire human genetic material, or genome, is estimated at three billion, of which 70 million have been determined through chemical analysis of DNA. To analyse the enormous volume of data that a study of the human genome entails, a computer programme -- Genome Mapping -- has been developed by Pradeep Kumar Burma and Samir K Brahmachari of the Indian Institute of Science (IISc), Bangalore, in collaboration with Alok Raj and Jayant K Deb of the Birla Institute of Technology, Ranchi.
Says Brahmachari, faculty member of IISc's molecular biophysics unit and member of the international Human Genome Organisation, "We do not have the trained manpower or the facilities to compete with laboratories abroad where actual gene sequencing is concerned, but we have been able to use our intellectual resources to help order the vast quantities of data that is likely to be produced in the coming years."
The information on the human genome is doubling every two years. Earlier, scientists worked with sequences of 1,000 to 2,000 nucleotides, which were amenable to simpler analytical techniques, points out Brahmachari, but today, the volume of information available is mind-boggling. A part of the genome of yeast, which was described in 1992, is made up of 315,357 nucleotides. It would require 45 pages of a journal like Nature to print all this data, says Brahmachari, and it is clearly a formidable task to analyse such a huge quantity of information.
While developing this programme, the scientists felt it would be easier to analyse the unwieldy nucleotide sequences if they could be translated into a visual pattern on a computer screen. The Genome Mapping software plots the entire DNA sequence on a square whose corners are marked A, T, G and C to represent the four nucleotides.
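The article does not spell out the plotting rule. One well-known way to draw a sequence inside a square with a nucleotide at each corner is the chaos game representation: start at the centre and, for every nucleotide, move halfway towards its corner. The sketch below assumes that rule; the corner layout and the name `cgr_points` are illustrative choices, not details of Genome Mapping itself.

```python
# Assumed corner layout; the article only says the corners are marked A, T, G, C.
CORNERS = {"A": (0.0, 0.0), "T": (1.0, 0.0), "G": (1.0, 1.0), "C": (0.0, 1.0)}


def cgr_points(sequence):
    """Return one (x, y) point per nucleotide using the halfway rule."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in sequence.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # Move halfway from the current point towards the nucleotide's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


points = cgr_points("GATTACA")
print(points[0])  # first point, halfway between the centre and corner G
```

Each point could then be drawn as a single pixel; under this rule, sequences ending in the same few nucleotides land in the same small sub-square, which is what gives the picture its pattern.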
The pattern that appears, the scientists say, will enable researchers not only to classify sequences into various groups on the basis of their similarities, but also to identify sequences that are highly repetitive and those that are rare or absent.
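To illustrate how such a picture can expose repetitive or missing sequences, the sketch below divides the square into a grid and counts how many points fall in each cell. It assumes a chaos-game style plot (each nucleotide moves the point halfway towards its corner), which the article does not confirm; under that assumption every cell of a 2^k by 2^k grid corresponds to one particular run of k nucleotides. The function name `cgr_grid` and the corner layout are hypothetical.

```python
# Assumed corner layout; not taken from the Genome Mapping software.
CORNERS = {"A": (0, 0), "T": (1, 0), "G": (1, 1), "C": (0, 1)}


def cgr_grid(sequence, k=2):
    """Count plotted points per cell of a 2**k x 2**k grid."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    x, y = 0.5, 0.5
    seen = 0
    for base in sequence.upper():
        if base not in CORNERS:
            continue  # ambiguous symbols are simply skipped in this sketch
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        seen += 1
        if seen >= k:  # the first k-1 points still depend on the start position
            grid[int(y * size)][int(x * size)] += 1
    return grid


grid = cgr_grid("ATATATATGC", k=2)
counts = [c for row in grid for c in row]
empty_cells = counts.count(0)  # two-nucleotide runs that never occur
busiest = max(counts)          # occurrences of the most repeated run
print(empty_cells, busiest)
```

Dense cells point to highly repetitive runs, while empty cells mark runs that are absent from the sequence.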
Genomes of several organisms are often quite similarly organised. However, this similarity can vary from 20 to 90 per cent in different regions of the genome. It can also vary a great deal within a particular region. With the help of Genome Mapping, researchers can now see graphically the qualitative similarity between different genomes, and between different regions within a genome, and decide when it would be worthwhile to opt for a quantitative, computer-intensive analysis using other programmes.
Genome Mapping has two versions -- Genfast and Gen -- both of which use the same basic plotting technique. Genfast can plot sequences of a million nucleotides in four to five seconds, whereas Gen takes as much as 90 minutes for the same task. In Gen, however, the position of each nucleotide in the sequence is stored in the computer's memory, making it more appropriate for detailed analysis, explains Brahmachari.
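The trade-off between the two versions can be sketched as follows. This is not the actual Genfast or Gen code, which the article does not describe; it only contrasts a plot that keeps a count per cell with one that also remembers which sequence positions landed in each cell. The halfway plotting rule, the corner layout and the names `walk`, `plot_fast` and `plot_detailed` are all assumptions made for illustration.

```python
from collections import defaultdict

# Assumed corner layout and plotting rule (halfway towards each corner).
CORNERS = {"A": (0, 0), "T": (1, 0), "G": (1, 1), "C": (0, 1)}


def walk(sequence):
    """Yield (position, x, y) for every recognised nucleotide."""
    x, y = 0.5, 0.5
    for pos, base in enumerate(sequence.upper()):
        if base not in CORNERS:
            continue
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        yield pos, x, y


def plot_fast(sequence, size=4):
    """Counts only: memory depends on the grid, not on the sequence length."""
    grid = [[0] * size for _ in range(size)]
    for _, x, y in walk(sequence):
        grid[int(y * size)][int(x * size)] += 1
    return grid


def plot_detailed(sequence, size=4):
    """Also records which sequence positions fall in each cell."""
    cells = defaultdict(list)
    for pos, x, y in walk(sequence):
        cells[(int(x * size), int(y * size))].append(pos)
    return cells


fast = plot_fast("ATATGC")
detailed = plot_detailed("ATATGC")
print(fast[0][2], detailed[(2, 0)])
```

Keeping the positions costs memory proportional to the sequence length, but it lets a researcher go from an interesting region of the picture back to the exact places in the sequence that produced it.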