## PhyLoTA Browser

PhyLoTA Browser (rel. 1.01)

This database provides a snapshot of the current taxonomic distribution of nucleotide sequences in GenBank. Its purpose is to convey information about the potential phylogenetic data sets (clusters, or sets of homologous sequences) that can be constructed from the database for taxa of interest. It mirrors the NCBI taxonomy tree. The number of clusters is estimated by all-against-all BLAST searches and sequence clustering algorithms (for all nodes with fewer than 20,000 sequences, and excluding sequences longer than 25,000 nt). Model organisms are defined as any node (not subtree) having more than 100 clusters (or more than 20,000 sequences). By default, sequence tallies for model organisms propagate upward in the tree along with those for nonmodel organisms, but this information can be excluded so that users can get a sense of the taxonomic breadth of the sequence diversity in the database. Note, however, that the bulk of "genomic" data for model organisms is not entered in the database at all (see below for types of sequences included). Cluster tallies are linked to a view of the data availability matrix for that node in the taxonomy tree, which can provide useful guidance for supermatrix and supertree construction. Sequences for each cluster can be downloaded as an unaligned FASTA file for further analysis. Provisional alignments and phylogenetic trees are under construction.
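The model-organism rule and the upward propagation of sequence tallies described above can be illustrated with a small sketch. This is not the PhyLoTA implementation: the `TaxonNode` class, the `is_model` and `subtree_tally` functions, and the toy tree are all hypothetical, and only the two thresholds (more than 100 clusters, or more than 20,000 sequences) come from the text.

```python
from dataclasses import dataclass, field

# Thresholds taken from the description above; everything else is assumed.
MODEL_MIN_CLUSTERS = 100
MODEL_MIN_SEQUENCES = 20_000


@dataclass
class TaxonNode:
    """Hypothetical stand-in for one node of the taxonomy tree."""
    name: str
    n_seqs: int = 0       # sequences assigned directly to this node
    n_clusters: int = 0   # clusters built for this node alone (not its subtree)
    children: list = field(default_factory=list)


def is_model(node):
    """A node (not a subtree) counts as a model organism if it exceeds either threshold."""
    return node.n_clusters > MODEL_MIN_CLUSTERS or node.n_seqs > MODEL_MIN_SEQUENCES


def subtree_tally(node, include_model=True):
    """Sum sequence counts over a subtree, optionally skipping model-organism nodes."""
    own = node.n_seqs if (include_model or not is_model(node)) else 0
    return own + sum(subtree_tally(child, include_model) for child in node.children)


# Toy tree with made-up counts: one ordinary species and one model organism.
genus = TaxonNode("Genus", children=[
    TaxonNode("species A", n_seqs=50, n_clusters=5),
    TaxonNode("species B", n_seqs=30_000, n_clusters=200),
])

print(subtree_tally(genus))                       # 30050
print(subtree_tally(genus, include_model=False))  # 50
```

With model organisms excluded, the tally for the parent node reflects only the sequences contributed by the remaining taxa, which is the "taxonomic breadth" view mentioned in the text.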

For more information on how the clustering was implemented, click here. For a list of model organisms, click here.

Please note: The next scheduled rebuild of the database is Summer 2008, at which time we hope to have automated bi-monthly GenBank downloads implemented.

Query with a taxon name or id number:
  All search options | BLAST search (New!)
   Examples: Amorpha or Amor* or Amorpha * or 48130
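
One way to read the four example queries is sketched below: an all-digit query is treated as a taxon id, and anything else as a name pattern in which `*` matches any run of characters. This is an assumption about the search box, not documented behaviour; the `search_taxa` function is hypothetical, and the ids in the toy table are placeholders rather than real NCBI taxon ids.

```python
from fnmatch import fnmatchcase

# Toy lookup table; the integer ids are placeholders, not real NCBI taxon ids.
TOY_TAXA = {
    1: "Amorpha",
    2: "Amorpha fruticosa",
    3: "Amorpha canescens",
    4: "Amorphophallus",
}


def search_taxa(query, taxa=TOY_TAXA):
    """Return (id, name) pairs matching a taxon id or a shell-style name pattern."""
    query = query.strip()
    if query.isdigit():
        # Numeric query: exact lookup by id.
        taxon_id = int(query)
        return [(taxon_id, taxa[taxon_id])] if taxon_id in taxa else []
    # Name query: case-sensitive match, with '*' as a wildcard.
    return sorted((tid, name) for tid, name in taxa.items()
                  if fnmatchcase(name, query))


print(search_taxa("Amorpha"))    # exact name: the genus node only
print(search_taxa("Amor*"))      # every name starting with "Amor"
print(search_taxa("Amorpha *"))  # names with "Amorpha " followed by anything
print(search_taxa("3"))          # lookup by (placeholder) id
```

Under this reading, `Amorpha *` selects the species within the genus but not the genus node itself, because the pattern requires a space after the genus name.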


Types of sequences included: Only "core" nucleotide data are included, which excludes ESTs, STSs, and other kinds of bulk or high-throughput sequences.
Taxonomic coverage: At present the database contains sequences from eukaryotes. These represent the PLN, MAM, PRI, ROD, VRT, and INV divisions of GenBank.

GenBank release: 159 (April 15, 2007)
Number of sequences in this database: 2,593,190
Number of nodes in our subtree(s) of the NCBI taxonomy tree: 240,708
Number of terminal nodes: 182,267
Number of nodes clustered (usually terminal taxa): 181,992
Number of subtrees clustered (always internal nodes): 57,631
Number of nodes with sequences that can be clustered: 236,023


Questions or comments? Contact Mike Sanderson (sanderm at email dot arizona dot edu)