Gabor Marth

Professor of Human Genetics

B.S., M.S. Technical University of Budapest

D.Sc. Washington University



Molecular Biology Program

DNA sequence analysis software


My research focuses on the development of DNA sequence analysis software. Over the past 15 years, my group has developed software to aid genome sequence completion (finishing), single-nucleotide polymorphism discovery, and population genetic analysis of genomic variation data. We have developed software packages for base calling, read mapping, variant discovery, and data visualization for high-throughput, next-generation sequencing data. My current research aims to develop complete, automated pipelines for sequence processing, variant detection, and variant interpretation; to adapt and extend our tools for cancer sequence analysis; and to develop informatics technologies that support population, medical, and personal genome sequencing of very large numbers of samples.

A major goal of genetics research is to characterize how variation in DNA sequence contributes to differences in physical traits or disease susceptibility between individuals. Until recently, the discovery of genetic variants was the rate-limiting step in genetics research because obtaining DNA sequences from large numbers of individuals was prohibitively expensive. Over the past five years, advances in next-generation sequencing (NGS) have substantially lowered the cost of sequencing DNA. NGS has had a profound impact because it makes it possible to sequence large numbers of individuals and to describe the full spectrum of genetic variation in a species.

Genetic variation occurs at different levels within the genome. The simplest and most common type of variation is the single-nucleotide polymorphism (SNP), a single-base change. Short sections of DNA can also be inserted into or deleted from an individual's genome (short INDELs). Longer regions may be deleted from an individual's chromosomes, other regions may be present in multiple copies (chromosomal amplifications), and sometimes long sections of chromosomes are translocated. These large-scale variations are termed structural genetic variations.
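To make these categories concrete, the sketch below assigns a coarse class to a variant written as a reference allele (REF) and an alternate allele (ALT), the way variant files such as VCF represent sequence-resolved alleles. It is an illustrative sketch, not the lab's software: the `Variant` and `classify` names are hypothetical, and the 50 bp cutoff separating short INDELs from structural deletions and insertions is an assumed convention, not a rule stated here. Amplifications and translocations are not sequence-resolved REF/ALT pairs in this sense and are outside the sketch's scope.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """A hypothetical minimal variant record (not a real library type)."""
    chrom: str
    pos: int  # 1-based position of the first REF base, as in VCF
    ref: str
    alt: str


def classify(v: Variant, sv_threshold: int = 50) -> str:
    """Assign a coarse variant class from REF/ALT allele lengths.

    sv_threshold is an assumed cutoff: length changes of at least this
    many bases are labelled structural rather than short INDELs.
    """
    if v.ref == v.alt:
        raise ValueError("REF and ALT are identical; not a variant")
    if len(v.ref) == 1 and len(v.alt) == 1:
        return "SNP"  # single-base change
    delta = len(v.alt) - len(v.ref)
    if delta == 0:
        return "MNP"  # equal-length multi-base substitution
    if abs(delta) >= sv_threshold:
        return "SV_insertion" if delta > 0 else "SV_deletion"
    return "INDEL_insertion" if delta > 0 else "INDEL_deletion"


if __name__ == "__main__":
    examples = [
        Variant("chr1", 1000, "A", "G"),            # SNP
        Variant("chr1", 2000, "AT", "A"),           # 1 bp deletion
        Variant("chr2", 500, "C", "CGGT"),          # 3 bp insertion
        Variant("chr3", 7000, "G" + "T" * 60, "G"), # 60 bp deletion
    ]
    for v in examples:
        print(v.chrom, v.pos, classify(v))
```

Classifying by the length difference between REF and ALT keeps the logic simple; real callers use richer representations (symbolic alleles, breakpoint records, read-depth signals) for copy-number changes and rearrangements.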

Our current focus is to develop computer software to process and analyze the vast amounts of sequence data generated by NGS technologies. We are actively developing software for reference-guided assembly and for haplotype-based variant discovery and genotyping, as well as APIs and command-line toolkits for working with NGS data and tools for building custom NGS analysis pipelines.


  1. Extending reference assembly models. Church DM, Schneider VA, Steinberg KM, Schatz MC, Quinlan AR, Chin CS, Kitts PA, Aken B, Marth GT, Hoffman MM, Herrero J, Mendoza ML, Durbin R, Flicek P. Genome Biol. 2015 Jan 24;16:13.
  2. Toolbox for mobile-element insertion detection on cancer genomes. Lee WP, Wu J, Marth GT. Cancer Inform. 2014 Oct 15;13(Suppl 4):45-52.
  3. bam.iobio: a web-based, real-time, sequence alignment file inspector. Miller CA, Qiao Y, DiSera T, D'Astous B, Marth GT. Nat Methods. 2014 Dec;11(12):1189.
  4. Tangram: a comprehensive toolbox for mobile element insertion detection. Wu J, Lee WP, Ward A, Walker JA, Konkel MK, Batzer MA, Marth GT. BMC Genomics. 2014 Sep 16;15:795.
  5. SubcloneSeeker: a computational framework for reconstructing tumor clone structure for cancer variant interpretation and prioritization. Qiao Y, Quinlan AR, Jazaeri AA, Verhaak R, Wheeler DA, Marth GT. Genome Biol. 2014 Aug 26;15(8):443.
  6. Human genomic regions with exceptionally high levels of population differentiation identified from 911 whole-genome sequences. Colonna V, Ayub Q, Chen Y, Pagani L, Luisi P, Pybus M, Garrison E, Xue Y, Tyler-Smith C; 1000 Genomes Project Consortium, Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT, McVean GA. Genome Biol. 2014 Jun 30;15(6):R88.
  7. Whole genome profiling of spontaneous and chemically induced mutations in Toxoplasma gondii. Farrell A, Coleman BI, Benenati B, Brown KM, Blader IJ, Marth GT, Gubbels MJ. BMC Genomics. 2014 May 10;15:354.
  8. MOSAIK: a hash-based algorithm for accurate next-generation sequencing short-read mapping. Lee WP, Stromberg MP, Ward A, Stewart C, Garrison EP, Marth GT. PLoS One. 2014 Mar 5;9(3):e90581.
  9. SSW library: an SIMD Smith-Waterman C/C++ library for use in genomic applications. Zhao M, Lee WP, Garrison EP, Marth GT. PLoS One. 2013 Dec 4;8(12):e82138.
  10. Integrative annotation of variants from 1092 humans: application to cancer genomics. Khurana E, Fu Y, Colonna V, Mu XJ, Kang HM, Lappalainen T, Sboner A, Lochovsky L, Chen J, Harmanci A, Das J, Abyzov A, Balasubramanian S, Beal K, Chakravarty D, Challis D, Chen Y, Clarke D, Clarke L, Cunningham F, Evani US, Flicek P, Fragoza R, Garrison E, Gibbs R, Gümüs ZH, Herrero J, Kitabayashi N, Kong Y, Lage K, Liluashvili V, Lipkin SM, MacArthur DG, Marth G, Muzny D, Pers TH, Ritchie GR, Rosenfeld JA, Sisu C, Wei X, Wilson M, Xue Y, Yu F; 1000 Genomes Project Consortium, Dermitzakis ET, Yu H, Rubin MA, Tyler-Smith C, Gerstein M. Science. 2013 Oct 4;342(6154):1235587.
  11. Variant discovery in targeted resequencing using whole genome amplified DNA. Indap AR, Cole R, Runge CL, Marth GT, Olivier M. BMC Genomics. 2013 Jul 10;14:468.
  12. Genetic basis for phenotypic differences between different Toxoplasma gondii type I strains. Yang N, Farrell A, Niedelman W, Melo M, Lu D, Julien L, Marth GT, Gubbels MJ, Saeij JP. BMC Genomics. 2013 Jul 10;14:467.
  13. Scotty: a web tool for designing RNA-Seq experiments to measure differential gene expression. Busby MA, Stewart C, Miller CA, Grzeda KR, Marth GT. Bioinformatics. 2013 Mar 1;29(5):656-7.
  14. Copy Number Variation detection from 1000 Genomes Project exon capture sequencing data. Wu J, Grzeda KR, Stewart C, Grubert F, Urban AE, Snyder MP, Marth GT. BMC Bioinformatics. 2012 Nov 17;13:305.
  15. An integrated map of genetic variation from 1,092 human genomes. 1000 Genomes Project Consortium, Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT, McVean GA. Nature. 2012 Nov 1;491(7422):56-65.
  16. Targeted proteomic dissection of Toxoplasma cytoskeleton sub-compartments using MORN1. Lorestani A, Ivey FD, Thirugnanam S, Busby MA, Marth GT, Cheeseman IM, Gubbels MJ. Cytoskeleton (Hoboken). 2012 Dec;69(12):1069-85.
  17. The 1000 Genomes Project: data management and community access. Clarke L, Zheng-Bradley X, Smith R, Kulesha E, Xiao C, Toneva I, Vaughan B, Preuss D, Leinonen R, Shumway M, Sherry S, Flicek P; 1000 Genomes Project Consortium. Nat Methods. 2012 Apr 27;9(5):459-62.
  18. A DOC2 protein identified by mutational profiling is essential for apicomplexan parasite exocytosis. Farrell A, Thirugnanam S, Lorestani A, Dvorin JD, Eidell KP, Ferguson DJ, Anderson-White BR, Duraisingh MT, Marth GT, Gubbels MJ. Science. 2012 Jan 13;335(6065):218-21.
  19. Expression divergence measured by transcriptome sequencing of four yeast species. Busby MA, Gray JM, Costa AM, Stewart C, Stromberg MP, Barnett D, Chuang JH, Springer M, Marth GT. BMC Genomics. 2011 Dec 29;12:635.
  20. ART: a next-generation sequencing read simulator. Huang W, Li L, Myers JR, Marth GT. Bioinformatics. 2012 Feb 15;28(4):593-4.
  21. The functional spectrum of low-frequency coding variation. Marth GT, Yu F, Indap AR, Garimella K, Gravel S, Leong WF, Tyler-Smith C, Bainbridge M, Blackwell T, Zheng-Bradley X, Chen Y, Challis D, Clarke L, Ball EV, Cibulskis K, Cooper DN, Fulton B, Hartl C, Koboldt D, Muzny D, Smith R, Sougnez C, Stewart C, Ward A, Yu J, Xue Y, Altshuler D, Bustamante CD, Clark AG, Daly M, DePristo M, Flicek P, Gabriel S, Mardis E, Palotie A, Gibbs R; 1000 Genomes Project. Genome Biol. 2011 Sep 14;12(9):R84.
  22. A comprehensive map of mobile element insertion polymorphisms in humans. Stewart C, Kural D, Strömberg MP, Walker JA, Konkel MK, Stütz AM, Urban AE, Grubert F, Lam HY, Lee WP, Busby M, Indap AR, Garrison E, Huff C, Xing J, Snyder MP, Jorde LB, Batzer MA, Korbel JO, Marth GT; 1000 Genomes Project. PLoS Genet. 2011 Aug;7(8):e1002236.
  23. Demographic history and rare allele sharing among human populations. Gravel S, Henn BM, Gutenkunst RN, Indap AR, Marth GT, Clark AG, Yu F, Gibbs RA; 1000 Genomes Project, Bustamante CD. Proc Natl Acad Sci U S A. 2011 Jul 19;108(29):11983-8.
  24. Variation in genome-wide mutation rates within and between human families. Conrad DF, Keebler JE, DePristo MA, Lindsay SJ, Zhang Y, Casals F, Idaghdour Y, Hartl CL, Torroja C, Garimella KV, Zilversmit M, Cartwright R, Rouleau GA, Daly M, Stone EA, Hurles ME, Awadalla P; 1000 Genomes Project. Nat Genet. 2011 Jun 12;43(7):712-4.
  25. The variant call format and VCFtools. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R; 1000 Genomes Project Analysis Group. Bioinformatics. 2011 Aug 1;27(15):2156-8.
  26. BamTools: a C++ API and toolkit for analyzing and managing BAM files. Barnett DW, Garrison EK, Quinlan AR, Strömberg MP, Marth GT. Bioinformatics. 2011 Jun 15;27(12):1691-2.
  27. Mapping copy number variation by population-scale genome sequencing. Mills RE, Walter K, Stewart C, Handsaker RE, Chen K, Alkan C, Abyzov A, Yoon SC, Ye K, Cheetham RK, Chinwalla A, Conrad DF, Fu Y, Grubert F, Hajirasouliha I, Hormozdiari F, Iakoucheva LM, Iqbal Z, Kang S, Kidd JM, Konkel MK, Korn J, Khurana E, Kural D, Lam HY, Leng J, Li R, Li Y, Lin CY, Luo R, Mu XJ, Nemesh J, Peckham HE, Rausch T, Scally A, Shi X, Stromberg MP, Stütz AM, Urban AE, Walker JA, Wu J, Zhang Y, Zhang ZD, Batzer MA, Ding L, Marth GT, McVean G, Sebat J, Snyder M, Wang J, Ye K, Eichler EE, Gerstein MB, Hurles ME, Lee C, McCarroll SA, Korbel JO; 1000 Genomes Project. Nature. 2011 Feb 3;470(7332):59-65.
  28. Diversity of human copy number variation and multicopy genes. Sudmant PH, Kitzman JO, Antonacci F, Alkan C, Malig M, Tsalenko A, Sampas N, Bruhn L, Shendure J; 1000 Genomes Project, Eichler EE. Science. 2010 Oct 29;330(6004):641-6.
  29. A map of human genome variation from population-scale sequencing. 1000 Genomes Project Consortium, Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, Hurles ME, McVean GA. Nature. 2010 Oct 28;467(7319):1061-73.
  30. A standard variation file format for human genome sequences. Reese MG, Moore B, Batchelor C, Salas F, Cunningham F, Marth GT, Stein L, Flicek P, Yandell M, Eilbeck K. Genome Biol. 2010;11(8):R88.
  31. Population genomic inferences from sparse high-throughput sequencing of two populations of Drosophila melanogaster. Sackton TB, Kulathinal RJ, Bergman CM, Quinlan AR, Dopman EB, Carneiro M, Marth GT, Hartl DL, Clark AG. Genome Biol Evol. 2009 Nov 18;1:449-65.
  32. Rapid whole-genome mutational profiling using next-generation sequencing technologies. Smith DR, Quinlan AR, Peckham HE, Makowsky K, Tao W, Woolf B, Shen L, Donahue WF, Tusneem N, Stromberg MP, Stewart DA, Zhang L, Ranade SS, Warner JB, Lee CC, Coleman BE, Zhang Z, McLaughlin SF, Malek JA, Sorenson JM, Blanchard AP, Chapman J, Hillman D, Chen F, Rokhsar DS, McKernan KJ, Jeffries TW, Marth GT, Richardson PM. Genome Res. 2008 Oct;18(10):1638-42.
  33. Whole-genome sequencing and variant discovery in C. elegans. Hillier LW, Marth GT, Quinlan AR, Dooling D, Fewell G, Barnett D, Fox P, Glasscock JI, Hickenbotham M, Huang W, Magrini VJ, Richt RJ, Sander SN, Stewart DA, Stromberg M, Tsung EF, Wylie T, Schedl T, Wilson RK, Mardis ER. Nat Methods. 2008 Feb;5(2):183-8.
  34. Pyrobayes: an improved base caller for SNP discovery in pyrosequences. Quinlan AR, Stewart DA, Strömberg MP, Marth GT. Nat Methods. 2008 Feb;5(2):179-81.
  35. Primer-site SNPs mask mutations. Quinlan AR, Marth GT. Nat Methods. 2007 Mar;4(3):192.
  36. Analysis of concordance of different haplotype block partitioning algorithms. Indap AR, Marth GT, Struble CA, Tonellato P, Olivier M. BMC Bioinformatics. 2005 Dec 15;6:303.
  37. The allele frequency spectrum in genome-wide human variation data reveals signals of differential demographic history in three large world populations. Marth GT, Czabarka E, Murvai J, Sherry ST. Genetics. 2004 Jan;166(1):351-72.
  38. Computational SNP discovery in DNA sequence data. Marth GT. Methods Mol Biol. 2003;212:85-110.

Last Updated: 11/2/16