A comprehensive guide to bioinformatics tools and databases
from SNPs to haplotypes
DOI:
https://doi.org/10.59317/5gpdqq40
Keywords:
SNP, Genome-wide association studies, Haplotype, Variations, Genetics
Abstract
Interest in the functional aspects of the genome has grown steadily with the expansion of genomic databases. This review covers the genetic alterations that have been identified and characterized within the genome, such as single nucleotide polymorphisms (SNPs), disease-associated mutations, neutral polymorphisms, and small insertions or deletions (INDELs). These polymorphisms, particularly SNPs, are widely employed as molecular markers in genetic analyses and are applied across diverse genetic studies, including the reconstruction of haplotypes. Alongside experimental methods for investigating specific genetic variants, bioinformatics research aimed at understanding the molecular consequences of these alterations has surged over the past decade. Haplotypes, the combinations of alleles inherited together, can be reconstructed from SNP information not only for entire chromosomes but also for specific genes. Computational approaches to SNP and haplotype discovery have gained prominence because the continuous expansion of sequence data in public databases enables more precise identification of SNPs. This article introduces easily accessible and practical online tools for researchers. In summary, SNP and haplotype identification tools are indispensable in bioinformatics for studying genetic variation, population genetics, disease associations, and personalized medicine. They provide insight into the genetic underpinnings of traits and diseases, advancing our understanding of genetics and its applications in healthcare.
