Professor of Human Genetics and
Adjunct Associate Professor of Biomedical Informatics
B.S. University of Texas, Austin
Ph.D. University of Colorado, Boulder
Mark Yandell's Lab Page
Mark Yandell's PubMed Literature Search
Molecular Biology Program
Bioinformatics and Comparative Genomics
Sequenced genomes contain a treasure trove of information about how genes function and evolve. Getting at this information, however, is challenging and requires novel approaches that combine computer science and experimental molecular biology. My lab works at the intersection of both domains, and research in our group can be summarized as follows: generate hypotheses concerning gene function and evolution by computational means, and then test these hypotheses at the bench. This is easier said than done, as serious barriers still exist to using sequenced genomes and their annotations as starting points for experimental work. Some of these barriers lie in the computational domain, others in the experimental. Though challenging, overcoming these barriers offers exciting training opportunities in both computer science and molecular genetics, especially for those seeking a future at the intersection of both fields. Ongoing projects in the lab are centered on genome annotation and comparative genomics. New areas of inquiry include high-throughput biological image analysis, and exploring the relationships between sequence variation and human disease.
One of the great ironies of the DNA sequencing revolution is that genome annotation, not genome sequencing, has become the bottleneck in genomics today. New genomes are being sequenced at a far faster rate than they are being annotated. As of 2007, there are nearly 700 eukaryotic genomes in the sequencing pipeline. Many of these genomes are associated with relatively small research communities who are finding themselves left in the lurch when it comes to annotating their genomes.
Over the past year my lab has been working on an easy-to-use genome annotation pipeline called MAKER. Our goal is to give research communities without extensive bioinformatics expertise the means to annotate their genomes independently and to distribute the results to the larger biomedical community. For proof of principle, we have collaborated with the S. mediterranea genome project led by Prof. Alejandro Sánchez Alvarado, Dept. of Neurobiology & Anatomy, University of Utah School of Medicine. To date, our successful annotation of this genome has produced three papers: one describing MAKER, one describing the genome database that we constructed from MAKER's outputs, and one describing our analyses of the S. mediterranea genome and its contents. The first two papers are now in press at Genome Research and Nucleic Acids Research, respectively; the third is under review at Science. Going forward, we plan to use the S. mediterranea genome annotations for functional genomics screens. This work will provide many opportunities for research with both computational and experimental components.
High-throughput biological image analysis
The production and analysis of large numbers of digital images is an emerging field of bioinformatics. High-throughput imaging screens typically involve placing living cells or embryos in 96-well plates, and then adding different RNAi constructs or small molecules to each well. An automated microscope is then used to capture the results as digital images. These screens combine computation, genomics and molecular biology in new ways: genome annotations are used to design RNAi constructs; cell lines and embryos expressing various fluorescent markers must be constructed; and software must be written to process the results. My lab is currently engaged in active collaborations with other groups on campus working in this area, as there is a pressing need to develop image-processing pipelines to analyze the data these screens produce.
In 2006, I helped to organize an R21 large-equipment grant to purchase an automated confocal microscope for high-throughput image-based screens. The application was successful, and the university has now acquired a BD Pathway Bioimager. This instrument will provide a basic resource for university researchers carrying out high-throughput image-based screens.
In a continuation of my collaboration with the S. mediterranea genome project, Prof. Sánchez Alvarado and I are using the S. mediterranea genome annotations for a genome-wide, image-based RNAi screen for genes involved in cellular regeneration and wound healing. The Bioimager is essential equipment for this work. Our results to date demonstrate that S. mediterranea is an ideal organism for high-throughput image-based screening, in part because it is literally a flatworm. This fact allows us to circumvent some of the technological problems that limit the scope and power of image-based screens of (not so flat) D. melanogaster and C. elegans.
Sequence Variation and Human Disease
The Utah Population Database (UTPD) and associated phenotype & clinical data collected through the Utah Genetic Reference Project (UGRP) offer unique resources for human genomics research. Tying the clinical and phenotypic data contained within these databases to the genome and genome annotations, however, is a challenging task. My lab is interested in characterizing large-scale trends in the UTPD & UGRP data, both with respect to sequence variation and demographics; developing methods to identify cohorts for clinical studies; and developing diagnostic devices for personalized medicine.
- Eilbeck K, Moore B, Holt C, Yandell M (2009) Quantitative Measures for the Management and Comparison of Annotated Genomes. BMC Bioinformatics 10(67) doi:10.1186/147
- Yandell M, Moore B, Salas F, Mungall C, MacBride A, White C, Reese MG (2008) Genome-Wide Analysis of Human Disease Alleles Reveals That Their Locations Are Correlated in Paralogous Proteins. PLoS Comput Biol 4(11):e1000218
- Cantarel B, Korf I, Robb SMC, Parra G, Ross E, Moore B, Holt C, Sanchez Alvarado A, Yandell M (2008) MAKER: An Easy-to-use Annotation Pipeline Designed for Emerging Model Organism Genomes. Genome Research. Jan;18(1):188-96
- Yandell M, Mungall CJ, Smith C, Prochnik S, Kaminker J, et al (2006) Large-scale trends in the evolution of gene structures within 11 animal genomes. PLoS Comput Biol 2(3):e15
- Yandell M, Bailey AM, Misra S, Shu S, Wiel C, Evans-Holm M, Celniker SE, Rubin GM (2005) A computational and experimental approach to validating annotations and gene predictions in the Drosophila melanogaster genome. PNAS 102(5):1566-71
- Eilbeck K, Lewis SE, Mungall CJ, Yandell M, Stein L, Durbin R, Ashburner M (2005) The Sequence Ontology: a tool for the unification of genome annotations. Genome Biology 6:R44
- Majoros WH, Subramanian GM, Yandell MD (2003) Identification of key concepts in biomedical literature using a modified Markov heuristic. Bioinformatics 19(3):402-7
- Zdobnov EM, et al (2002) Comparative genome and proteome analysis of Anopheles gambiae and Drosophila melanogaster. Science 298(5591):149-59
- Holt RA, Subramanian GM, Halpern A, Sutton GG, Charlab R, Nusskern DR, Wincker P, Clark AG, Ribeiro JMC, Wides R, Salzberg SL, Loftus B, Yandell MD, et al (2002) The genome sequence of the malaria mosquito Anopheles gambiae. Science 298(5591):129-49
- Yandell MD, Majoros WH (2002) Genomics and natural language processing. Nat Rev Genet. (8):601-10. Review
- Mural, et al (2002) A comparison of whole-genome shotgun-derived mouse chromosome 16 and the human genome. Science 296(5573):1661-71
- Kerlavage A, Bonazzi V, di Tommaso M, Lawrence C, Li P, Mayberry F, Mural R, Nodell M, Yandell M, Zhang J, Thomas P (2002) The Celera Discovery System. Nucleic Acids Res. 30(1):129-36
- Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, et al (2001) The sequence of the human genome. Science 291(5507):1304-51
- Jin S, Martinek S, Joo WS, Wortman JR, Mirkovic N, Sali A, Yandell MD, Pavletich NP, Young MW, Levine AJ (2000) Identification and characterization of a p53 homologue in Drosophila melanogaster. PNAS 97(13):7301-6
- Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, Scherer SE, Li PW, Hoskins RA, Galle RF, George RA, Lewis SE, Richards S, Ashburner M, Henderson SN, Sutton GG, Wortman JR, Yandell MD, et al (2000) The genome sequence of Drosophila melanogaster. Science 287(5461):2185-95
- Rubin GM, Yandell MD, et al (2000) Comparative genomics of the eukaryotes. Science 287(5461):2204-15
- Marth GT, Korf I, Yandell MD, Yeh RT, Gu Z, Zakeri H, Stitziel NO, Hillier L, Kwok PY, Gish WR (1999) A general approach to single-nucleotide polymorphism discovery. Nat Genet. (4):452-6