How many human genes are there? After nearly 20 years of academic controversy, a new estimate

Release date: 2018-06-21

As early as 2000, when the draft human genome sequence was still being produced, geneticists began to estimate the number of human genes. Nearly 20 years later, they still cannot agree on the number, even with real data in hand, and this knowledge gap has hindered efforts to identify disease-related mutations. Recently, scientists released a new estimate: they believe that humans have more than 21,000 protein-coding genes.

The latest results are based on hundreds of human tissue samples and were posted on the bioRxiv preprint server on May 29. The new tally includes nearly 5,000 previously undiscovered genes, of which nearly 1,200 carry instructions for making proteins. This raises the count of protein-coding genes from the previous estimate of approximately 20,000 to more than 21,000.

DOI: https://doi.org/10.1101/332825

However, many geneticists are not convinced that all newly proposed genes will stand up to scrutiny. Their criticism also highlights the difficulty of identifying and defining new genes.

Steven Salzberg, a biologist who led the latest gene count, said: "People have been working on this for 20 years, but we still have no answer."

The final answer?

In 2000, as genomics researchers debated the number of human genes, Ewan Birney (currently director of the European Bioinformatics Institute [EBI] in Hinxton, UK) started a betting contest. He took the first bets in a bar at an annual genetics conference, and the contest eventually attracted more than 1,000 entries and $3,000 in prize money. Bets on the number of genes ranged from more than 312,000 to just under 26,000, with an average of about 40,000. Since then, the range of estimates has narrowed to roughly 19,000 to 22,000, but disagreement remains.

Source: M. Pertea & SL Salzberg

Gene counts can vary depending on the data being analyzed, the tools used, and the criteria for rejecting false positives. The latest count uses a larger data set and different computational methods than previous ones, as well as broader criteria for defining a gene.

Salzberg's team used data from the Genotype-Tissue Expression (GTEx) project, which sequenced RNA from more than 30 different tissues taken from hundreds of deceased donors (RNA is the intermediary between DNA and protein). To identify protein-coding genes, as well as genes that do not code for proteins but still play important roles in cells, they assembled GTEx's 900 billion short RNA fragments and aligned them with the human genome.

However, just because a piece of DNA is expressed as RNA does not necessarily mean that it is a gene, so the team tried to filter out noise using various criteria. For example, they compared their results with the genomes of other species, reasoning that sequences shared with distant relatives have probably been preserved by evolution because they are functional, and are therefore likely to be genes.

In the end, the team was left with 21,306 protein-coding genes and 21,856 non-coding genes, far more than are listed in the two most widely used human gene databases. GENCODE, the gene set maintained by EBI, includes 19,901 protein-coding genes and 15,779 non-coding genes. RefSeq, a database managed by the National Center for Biotechnology Information, lists 20,203 protein-coding genes and 17,871 non-coding genes.
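To make the size of the gap concrete, the counts quoted above can be compared directly. The following is a minimal Python sketch; the dictionary layout and the function name `excess_over` are illustrative choices, not part of any of the databases or of the Salzberg team's pipeline.

```python
# Gene counts as quoted in the article: (protein-coding, non-coding).
COUNTS = {
    "Salzberg team": (21_306, 21_856),
    "GENCODE": (19_901, 15_779),
    "RefSeq": (20_203, 17_871),
}


def excess_over(baseline, new="Salzberg team"):
    """Return how many more (protein-coding, non-coding) genes `new` lists than `baseline`."""
    new_pc, new_nc = COUNTS[new]
    base_pc, base_nc = COUNTS[baseline]
    return new_pc - base_pc, new_nc - base_nc


if __name__ == "__main__":
    for name in ("GENCODE", "RefSeq"):
        extra_pc, extra_nc = excess_over(name)
        print(f"vs {name}: +{extra_pc} protein-coding, +{extra_nc} non-coding")
```

On these figures, the new list exceeds GENCODE by 1,405 protein-coding and 6,077 non-coding genes, and RefSeq by 1,103 protein-coding and 3,985 non-coding genes.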

Kim Pruitt, former head of RefSeq, believes that part of the reason for this difference is the large amount of data analyzed by the Salzberg team. Another major difference is that both GENCODE and RefSeq rely on manual curation: people review the evidence for each gene and make the final decision, whereas the Salzberg team relied entirely on computer programs to filter the data.

"If people like our list of genes, then maybe we will be an arbiter of human genes in a few years," Salzberg said.

What defines a gene?

Many scientists, however, say they need more evidence before they can be sure that the list is accurate. EBI computational biologist Adam Frankish, who coordinates GENCODE's manual annotation, said he and his team have examined about 100 of the protein-coding genes identified by the Salzberg team. According to their assessment, only one of them seems to be a true protein-coding gene.

Members of Pruitt's team studied more than a dozen of the Salzberg team's new protein-coding genes, but did not find any that met the RefSeq criteria. Some overlap with regions of the genome that appear to come from retroviruses that invaded the genomes of our ancestors; others lie in other repetitive stretches, which are rarely translated into proteins.

But Salzberg believes that some repeats can be considered genes. An example is ERV3-1, which appears in RefSeq and encodes a protein that is overexpressed in colorectal cancer. At the same time, Salzberg acknowledged that the new genes on his team's list will need to be validated by his team and by others.

Adding to the confusion is the shifting and imprecise definition of a gene. Biologists used to think of genes as sequences that encode proteins, but later found that some non-coding RNA molecules play important roles in cells. Disagreement over which of these sequences should count as genes also explains some of the differences between the Salzberg count and other counts.

Significance

An accurate count of all human genes is important for revealing the links between genes and disease. Salzberg points out that genes that are not counted are often overlooked, even if they contain pathogenic mutations. But rushing to add genes to the main list can also be risky: a spurious gene can divert geneticists' attention away from the real problem.

Pruitt added: "Biology is complex. The inconsistency in the number of genes between the databases is still a problem for researchers, and people are still seeking a final answer."

references:

New human gene tally reignites debate

Source: Bio-Exploration
