The consortium of investigators known as ENCODE (ENCyclopedia Of DNA Elements) published, with much publicity, a series of about thirty papers last fall purporting to “identify all functional elements in the human genome sequence” (https://www.genome.gov/ENCODE/).  Dan Graur, an evolutionary geneticist at the University of Houston, and his associates have published a paper in Genome Biology and Evolution (2013; online) challenging the assertion by the ENCODE investigators that 80.4% of the human genome can be considered functional (Nature, 2012).  Graur’s critique of the ENCODE claim is grounded in evolutionary principles.

The crux of Graur’s argument is the definition of “function” as this term is applied to elements of the human genome.  He and his colleagues distinguish between what they term “selected-effect” function and “causal-role” function.  Paraphrasing the authors, who cite a paper of Ruth Millikan’s (1989) for this distinction, a selected-effect function can be associated with a particular genetic sequence if selection has maintained that sequence through generations so as to preserve the function under consideration, or a similar one.  A causal-role function, again paraphrasing, can be attributed to a genetic sequence if that sequence mediates or participates in an effect or interaction that can be experimentally observed, such as binding to a transcription factor.  These concepts are similar to two concepts of function (termed “biochemical” and “genetic” function) that I proposed previously (Greenspan, 1998; Greenspan, 2011).

Graur et al. argue forcefully that the ENCODE authors rely on the “causal-role” concept of function to arrive at their high estimate, 80.4%, for the proportion of the genome that is functional.  The Houston group suggests, for example, that the mere binding of a transcription factor to a stretch of genomic DNA does not suffice to call that bit of genetic material functional in a biologically meaningful way unless the binding is accompanied by transcription and additional consequential effects on the cellular or organismal phenotype.  Numerous genomic elements labeled as functional by the ENCODE authors do not meet such a standard.

The University of Houston group also criticizes several other aspects of the case built by the ENCODE authors.  For the full explanation of these alleged errors, the reader should consult Graur et al.; below I provide brief summaries of a sampling of the criticisms offered by the geneticists from Houston.

1. Not only do the ENCODE authors use a flawed concept of function (i.e., the “causal-role” concept), they also apply it incorrectly and inconsistently.

2. ENCODE authors make inconsistent statements in different publications about the precise estimate for the proportion of the human genome that is “functional.”

3. The ENCODE Project Consortium attributes function to intron and transposon sequences at frequencies that are not consistent with the estimates derived from earlier, independent analyses of these types of sequences.

4. The numbers and sizes of transcription factor binding sites are substantially overestimated.

5. Transcription is in part a stochastic process, and some transcripts may be inconsequential, or nearly so; the mere fact of being transcribed therefore does not justify an attribution of biological functionality.

6. Sites of methylation, histone modification, or open chromatin are not necessarily worthy of being labeled functional solely on the basis of those biochemical criteria.

I find the arguments of Graur et al. compelling overall, but it will be of interest to follow the response, if any, from the members of the ENCODE Consortium.  In the meantime, I have to note that the complexities of biology are such that even the rigorous approach of Graur et al. needs to be questioned in certain respects.  To illustrate this point, I describe a few functions of DNA sequences that the geneticists from Houston may not have considered in advocating for the “selected-effect” criterion for function and the associated non-permissiveness to mutation.

The first example is related to immunoglobulin variable (V) region gene diversification for both heavy (H) and light (L) chains.  In humans and in mice, the favorite experimental animal of immunologists, V region diversity arises from several sources: multiple functional V genes and other gene segments (D and J segments for H chains and J segments for L chains); rearrangements of genomic DNA that join these segments and generate diverse nucleotide sequences through variation in the precise locations of the junctions; a process that randomly adds non-templated nucleotides at the junctions; and combinatorial pairing of H and L chains.  However, in rabbits and chickens, there is only one V gene that can be transcribed, and that V gene sequence is diversified in different B cells by gene conversion events involving different V pseudogenes (i.e., V genes that are not transcribed) that may tolerate more mutation than the V genes of mice and men (Kurosawa and Ohta, 2011).  Such pseudogenes thus contribute to antibody diversity as sequence donors even though they are not themselves transcribed.

A second, plausible mechanism, for which I have no documented examples, involves a DNA sequence that binds one or more transcription factors without leading to transcription.  Such a sequence might truly be non-functional, but an alternative is that this stretch of nucleotides has a function by virtue of its ability to compete for transcription factor binding with a second sequence from which transcription does proceed, thereby potentially regulating the extent of transcription from the second sequence.  Of course, a sequence that binds transcription factors without inducing transcription in this way should be subject to selection to some degree.
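
To make the competition idea concrete, here is a minimal sketch of how non-transcribed binding sites could, in principle, lower transcription factor occupancy at a transcribed site.  It is not taken from Graur et al. or from ENCODE.  It assumes simple equilibrium binding, a single transcribed site that does not appreciably deplete the factor, and identical decoy sites; the function names, parameters, and numerical values are all hypothetical and chosen only for illustration.

```python
import math


def free_tf(total_tf, decoy_sites, kd_decoy):
    """Free transcription factor concentration at equilibrium.

    Assumption: the only significant sink for the factor is a pool of
    identical decoy sites, so mass balance gives
        total_tf = f + decoy_sites * f / (kd_decoy + f),
    a quadratic in f whose positive root is returned.
    """
    b = kd_decoy + decoy_sites - total_tf
    return (-b + math.sqrt(b * b + 4.0 * kd_decoy * total_tf)) / 2.0


def promoter_occupancy(total_tf, decoy_sites, kd_decoy, kd_promoter):
    """Fraction of time the transcribed site is bound (simple binding isotherm)."""
    f = free_tf(total_tf, decoy_sites, kd_decoy)
    return f / (kd_promoter + f)


if __name__ == "__main__":
    # Hypothetical values in arbitrary concentration units.
    for decoys in (0, 10, 100, 1000):
        occ = promoter_occupancy(total_tf=10.0, decoy_sites=decoys,
                                 kd_decoy=10.0, kd_promoter=10.0)
        print(f"decoy sites = {decoys:5d}  promoter occupancy = {occ:.3f}")
```

In this toy model, occupancy of the transcribed site falls as the number of decoy sites rises, which is the sense in which a sequence that binds a factor without being transcribed could still influence transcription elsewhere.  Whether any real genomic sequence behaves this way is an open question.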

Finally, a possible role for much DNA devoid of any obvious genetic or molecular biological function has been described in the context of host defense mediated by polymorphonuclear cells, also known as neutrophils.  These leukocytes have been revealed (Brinkmann et al., 2004) to release into the extracellular space a complex material consisting of chromatin and granule proteins (Neutrophil Extracellular Traps or NETs) that immobilizes and kills bacteria and inactivates virulence-associated molecules.  Thus, even otherwise nominally useless genomic sequences may contribute to fitness through this mechanism, once again illustrating the impressively opportunistic ‘imagination’ of the evolutionary process.

References

https://www.genome.gov/ENCODE/. Accessed on 3/16/13.

Graur D, Zheng Y, Price N, Azevedo RB, Zufall RA, Elhaik E. On the immortality of television sets: “function” in the human genome according to the evolution-free gospel of ENCODE. Genome Biol Evol. 2013 Feb 20. [Epub ahead of print] PubMed PMID: 23431001.

The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 2012; 489: 57-74.

Millikan RG. In defense of proper functions. Philos Sci. 1989; 56: 288-302.

Greenspan NS. Genomic logic, allelic inference, and the functional classification of genes. Perspect Biol Med. 1998; 41: 409-416.

Greenspan NS. Attributing functions to genes and gene products. Trends Biochem Sci. 2011; 36(6): 293-297. Epub 2011 Jan 24. doi:10.1016/j.tibs.2010.12.005. PubMed PMID: 21269834.

Kurosawa K, Ohta K. Genetic diversification by somatic gene conversion. Genes. 2011; 2(1): 48-58. doi:10.3390/genes2010048.

Brinkmann V, Reichard U, Goosmann C, Fauler B, Uhlemann Y, Weiss DS, Weinrauch Y, Zychlinsky A. Neutrophil extracellular traps kill bacteria. Science. 2004 Mar 5; 303(5663): 1532-5. PubMed PMID: 15001782.