DBs/Tools

The databases and tools listed below were developed by bioinformatics researchers at the University of Toronto. This list was last updated on 8 November 2017 and is reasonably comprehensive. Please contact nicholas.provart@utoronto.ca to add other tools or databases as they are published.

Name of Tool Reference Description
Analysis
CisRegTest Moses, BMC Evolutionary Biology (2009) Statistical tests for natural selection on regulatory regions based on the strength of transcription factor binding sites.
MONKEY Moses~Eisen, Genome Biology (2004) MONKEY identifies conserved transcription-factor binding sites in multispecies alignments.
MLHKA Wright and Charlesworth, Genetics (2004) MLHKA is a program for testing for positive or balancing selection using polymorphism and divergence data from two species.
WASP Barash~Frey, Nature (2010) A web application that predicts whether or not an exon is alternatively spliced, and if so, how its splicing depends on different cellular conditions, such as tissue type. The application also maps putative regulatory elements in the primary transcript sequence near regulated exons.
Subseqer He and Parkinson, Bioinformatics (2008) Graph-based web tool for uncovering meaningful sequence motifs in low-complexity sequences.
ISOLATE Quon~Morris, Bioinformatics (2009) Separates heterogeneous tumor gene expression profiles into their constituent purified tumor and healthy tissue gene expression profiles.
OrthoNets Hao~Wodak, Bioinformatics (2011) This Cytoscape plugin enables the simultaneous visualization of interaction and domain co-occurrence networks in multiple organisms, using information aggregated in DAnCER and iRefWeb/iRefIndex.
GenePro Vlasblom~Wodak, Bioinformatics (2006) GenePro is a Cytoscape plug-in for the visualization and analysis of protein and gene interaction networks at multiple levels of resolution.
Cytoscape Cline~Bader, Nature Protocols (2007) Cytoscape is a bioinformatics software platform for visualizing molecular interaction networks and integrating these interactions with gene expression profiles and other state data. Additional features are available as plugins.
Cytoscape Web Lopes~Bader, Bioinformatics (2010) Cytoscape Web is a simplified web-based version of Cytoscape.
NetMatch Cytoscape plugin Ferro~Shasha, Bioinformatics (2007) NetMatch is a Cytoscape plugin that finds user-defined network motifs in any Cytoscape network. Node and edge attributes of any type and paths of unknown length can be specified in the search.
MCODE Cytoscape plugin Bader~Hogue, BMC Bioinformatics (2003) MCODE is a Cytoscape plugin that finds clusters (highly interconnected regions) in a network.
Enrichment Map Cytoscape plugin Merico~Bader, PLOS ONE (2010) Enrichment Map is a Cytoscape plugin for functional enrichment visualization. Enrichment results have to be generated outside Enrichment Map, using any of the available methods. 
WordCloud Cytoscape plugin   The WordCloud plugin is a Cytoscape plugin that generates a visual summary of a network. It displays string attributes associated with nodes in the network as a tag cloud, where more frequent words are displayed using a larger font size.
GeneMANIA Cytoscape plugin Montojo~Bader, Bioinformatics (2010) The GeneMANIA Cytoscape plugin brings fast gene function prediction capabilities to the desktop. GeneMANIA identifies the genes most related to a query gene set using a guilt-by-association approach.
NAViGaTOR (Network Analysis, Visualization and Graphing, Toronto) Brown~Jurisica, Bioinformatics (2009) and other refs. Visualization and analysis of large networks.
MoDIL Lee~Brudno, Nature Methods (2009) MoDIL, or Mixture of Distributions Indel Locator, is a novel method for finding medium-sized insertions or deletions from high-throughput sequencing datasets. Our method can take advantage of the high clone coverage of these datasets to identify progressively shorter indel variants, even if the individual clone sizes are unreliable.
VARiD Dalca~Brudno, Bioinformatics (2010) VARiD is a Hidden Markov Model for SNP and indel identification with AB-SOLiD color-space as well as regular letter-space reads. VARiD combines both types of data in a single framework which allows for accurate predictions. VARiD was developed at the University of Toronto Computational Biology Lab.
Savant Genome Browser Fiume~Brudno, Bioinformatics (2010) The Savant Genome Browser is a desktop visualization tool for genomic data. It was primarily developed for visualizing high-throughput (aka next-generation) sequencing data, although it can be used to visualize virtually any genome-based sequence, point, interval, or continuous dataset.
SHRiMP Rumble~Brudno, PLoS Computational Biology (2009) SHRiMP, or SHort Read Mapping Program, is a software package for aligning genomic reads against a target genome. It was primarily developed with the multitudinous short reads of next-generation sequencing machines in mind, as well as Applied Biosystems' colourspace genomic representation.
SHRiMP2 David~Brudno, Bioinformatics (2011) A major update of the original SHort Read Mapping Program (SHRiMP). SHRiMP2 primarily targets mapping sensitivity, and is able to achieve high accuracy at a very reasonable speed.
CNVer Medvedev~Brudno, Genome Research (2010) CNVer is a method for CNV detection that supplements the depth-of-coverage with paired-end mapping information, where mate pairs mapping discordantly to the reference serve to indicate the presence of variation.
SCPSRM Moses~Durbin, Genome Biology (2007) Scripts to identify spatial clustering of phosphorylation site recognition motifs for predicting the targets of cyclin-dependent kinase.
SGRP Blast Server Liti~Louis, Nature (2009) BLAST server for the Saccharomyces Genome Resequencing Project.
Density Estimation Tool for Enzyme Classification (DETECT) Hung~Parkinson, Bioinformatics (2010) DETECT is a probabilistic method and standalone tool for enzyme prediction that accounts for the sequence diversity across enzyme families.
NLStradamus Nguyen~Moses, BMC Bioinformatics (2009) Web server using hidden Markov models (HMMs) to predict novel nuclear localization signals in proteins.
RNAcontext Kazan~Morris, PLoS Computational Biology (2010) RNAcontext is a motif-finding algorithm to infer sequence and structure preferences of RNA binding proteins (RBP) from experimental affinity data. The input to RNAcontext consists of a set of sequences, their associated structure annotation profiles and affinity estimates (binary or continuous) for the given RBP.
Restricted Neighborhood Search Clustering Algorithm (RNSC) King~Jurisica, Bioinformatics (2004) Protein complex prediction via cost-based clustering.
Modular Subnetwork Biomarker Identification Fortney~Jurisica, Genome Biology (2010) A method for biomarker identification that combines networks of genes selected based on phenotype-dependent activity and a graph-theoretic property called modularity.
kmerHMM Wong~Zhang, Nucleic Acids Research (2013) De novo motif discovery method for Protein Binding Microarray (PBM) data. (Similar tools: MEME and Gibbs Sampler).
SNPdryad Wong~Zhang, Bioinformatics (2014) Deleterious non-synonymous SNP predictions for human. (Similar tools: PolyPhen-2 and SIFT).
Segway Hoffman~Noble, Nature Methods (2012) Segway performs semi-automated genome annotation using multiple tracks of genome-wide data such as that from ChIP-seq or DNase-seq experiments. It produces annotations that can be used to visualize complex multivariate data in a simple way and interpret the effects of noncoding variation.
Genomedata Hoffman~Noble, Bioinformatics (2010) Genomedata is a format for efficient storage of multiple tracks of numeric data anchored to a genome. The format allows fast random access to hundreds of gigabytes of data, while retaining a small disk space footprint. We have also developed utilities to load data into this format and a reference implementation with Python and C components.
Data Visualization
ePlant Waese~Provart, The Plant Cell (2017) Web tool for exploring large Arabidopsis data sets across scales from kilometres to nanometres.
NAViGaTOR (Network Analysis, Visualization and Graphing, Toronto) Brown~Jurisica, Bioinformatics (2009) and other refs. Visualization and analysis of large networks.
Cytoscape Cline~Bader, Nature Protocols (2007) Cytoscape is a bioinformatics software platform for visualizing molecular interaction networks and integrating these interactions with gene expression profiles and other state data. Additional features are available as plugins (see Analysis section above).
Segtools Buske, Hoffman~Noble, BMC Bioinformatics (2011) Segtools is a Python package for analyzing genomic segmentations. The software efficiently calculates a variety of summary statistics and produces corresponding publication-quality visualizations. The overall goal of Segtools is to provide a bird's-eye view of complex genomic data sets, allowing researchers to easily generate and confirm hypotheses.
Databases
Yeast KID Sharifpoor & Nguyen~Moses & Andrews, Genome Biology (2011) The Yeast Kinase Interaction Database contains curated data relevant to phosphorylation events in budding yeast.
CDIP, Cancer Data Integration Portal   Database of significantly deregulated genes in lung, ovarian, prostate, and head and neck cancers, and in sarcoma.
DAnCER Turinsky~Wodak, Bioinformatics (2011) DAnCER permits the exploration of chromatin modification (CM)-related genes in the full context of protein complexes, gene-expression regulation and pathways.
PhyloPro Xiong~Parkinson, Bioinformatics (2011) A web-based tool for the generation and visualization of phylogenetic profiles across Eukarya.
PartiGeneDB Peregrin-Alvarez~Parkinson, Nucleic Acids Research (2005) Database of Partial Genomes based on Expressed Sequence Tags.
eNet Hu, Janga, Babu, Díaz-Mejía, Butland~Moreno-Hagelsieb & Emili, PLoS Biology (2009) eNet is a database of gene function predictions in Escherichia coli K12.
GeneCards mirror   Comprehensive gene annotation portal
iRefWeb Turinsky~Wodak, Bioinformatics (2010) iRefWeb is a web interface to protein interaction data consolidated from 10 public databases: BIND, BioGRID, CORUM, DIP, IntAct, HPRD, MINT, MPact, MPPI and OPHID.
Yeast Interactome Database Krogan & Cagney~Emili & Greenblatt, Nature (2006) The Yeast TAP Project aims to identify protein-protein interactions in the yeast Saccharomyces cerevisiae.
Bacteriome.org Su~Parkinson, Nucleic Acids Research (2008) Database of high-quality E. coli interactions.
BIND Translation Isserlin~Bader, Database (2011) Conversion of the BIND molecular interaction database to PSI-MI 2.5.
I2D, Interologous Interaction Database Brown~Jurisica, Bioinformatics (2005) and other refs. Integrated database of physical protein-protein interactions for human, mouse, rat, fly, worm and yeast. Integrates curated, high-throughput (HTP) and predicted interactions.
mirDIP, microRNA:target prediction Data Integration Portal Shirdel~Jurisica, PLoS ONE (2011) Integrated portal for microRNA target predictions from 11 databases.
Bio-Analytic Resource Toufighi~Provart, The Plant Journal (2005) Suite of web-based tools for exploring and analyzing gene expression and other data from several plant species.
eFP Browser Winter~Provart, PLoS ONE (2007) Web tool for exploring gene expression data from several plant species in an intuitive manner.
Mouse Proteome Project Kislinger&Kanna~Rossant, Hughes, Frey & Emili, Cell (2006) A mouse proteome collection of abundance profiles obtained for proteins of special interest, permitting complete access of database results, altered tissue and organelle spectral counts, and high-confidence subcellular assignments.
ElastoDB He~Parkinson, Matrix Biology (2007) Database of elastic-like sequences.
Function Prediction
The GeneMANIA prediction server Warde-Farley, Donaldson, Comes & Zuberi (joint first authors)~Bader & Morris (joint senior authors), Nucleic Acids Research (2010) State-of-the-art, query-customized in silico gene function prediction with multiple data types, live over the web.
Modeling
Cell++ Sanford~Parkinson, Bioinformatics (2006) Stochastic cell simulation environment for modelling dynamic biochemical systems within a spatial context.
Pathways
cPath Cerami~Sander, BMC Bioinformatics (2006) cPath is an open-source database and web application for collecting, storing, browsing and querying biological pathway data.
Pathway Commons Cerami~Sander, Nucleic Acids Research (2011) Pathway Commons is a convenient point of access to biological pathway information collected from public pathway databases, which you can browse or search. Pathways include biochemical reactions, complex assembly, transport and catalysis events, and physical interactions involving proteins, DNA, RNA, small molecules and complexes.
The Cancer Cell Map   The Cancer Cell Map contains selected cancer-related signaling pathways. Biologists can browse and search the pathways and view gene expression data on any pathway. Computational biologists can download all pathways in BioPAX format for global analysis. Software developers can build software on top of the Cancer Cell Map using the web service API.
BioPAX Demir~Rajasimha, Nature Biotechnology (2010) BioPAX (Biological Pathway Exchange) is a collaborative effort to create a data exchange format for biological pathway data. BioPAX covers metabolic pathways, molecular interactions and protein post-translational modifications.
Pathguide Bader~Sander, Nucleic Acids Research (2006) Pathguide, the Pathway Resource List, contains information about hundreds of online biological pathway resources. Databases that are free and those supporting BioPAX, CellML, PSI-MI or SBML standards are highlighted.
Structure
Structural Genomics of Histone Tail Recognition Wang~Schapira, Bioinformatics (2010) Histone tails are subject to various post-translational modifications, which regulate gene expression and differentiation. This website highlights the structural mechanisms underlying recognition of histone tails by the readers, writers and erasers of methyl and acetyl marks.
LigAlign Heifets~Lilien, Journal of Molecular Graphics and Modeling (2010) Automated ligand-based active site alignment.