PlantRBP

Putative Orthologous Groups
Plant RNA Binding Protein Database


About PlantRBP/POGs

Each gene-model in the predicted proteomes of Oryza sativa (rice) and Arabidopsis thaliana was compared to every gene-model in both species using WU-BLAST (http://blast.wustl.edu/). Up to 20 gene-models that passed our cutoff threshold (e-value < 1e-5) were considered “related” to the query sequence.
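The filtering step above can be sketched as follows. This is an illustration only, not the code used to build the database: the (query, subject, e-value) tuple format and the function name are hypothetical, and ranking hits by e-value before keeping the top 20 is an assumption, since the text does not say how the 20 were chosen.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-5  # threshold stated in the text
MAX_RELATED = 20       # "up to 20" related gene-models per query


def related_gene_models(hits, cutoff=E_VALUE_CUTOFF, max_related=MAX_RELATED):
    """Group BLAST hits by query and keep the best hits that pass the cutoff.

    hits: iterable of (query, subject, e_value) tuples (hypothetical format).
    Returns a dict mapping each query to a list of related subjects.
    """
    by_query = defaultdict(list)
    for query, subject, e_value in hits:
        if e_value < cutoff:
            by_query[query].append((e_value, subject))

    related = {}
    for query, scored in by_query.items():
        scored.sort()  # assumption: smallest e-value ranks first
        related[query] = [subject for _, subject in scored[:max_related]]
    return related
```

A query with no hit below the cutoff simply does not appear in the result.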

Each set of related sequences was subjected to all-by-all pairwise global alignment using an implementation of the Needleman-Wunsch algorithm (Needleman and Wunsch 1970). The scores from these alignments were used to cluster gene-models into Putative Orthologous Groups (POGs) using a mutual best hits strategy. Additionally, all gene-models for a given locus were assigned to the same POG if any gene-model for that locus met the mutual best hits criterion for the POG.
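A minimal sketch of a mutual best hits assignment, followed by the locus-level extension described above, might look like this. It is a simplification under stated assumptions: each POG is seeded by a single reciprocal rice/Arabidopsis pair, and the scores, species_of, and locus_of inputs are hypothetical data structures, not the database's own.

```python
from collections import defaultdict


def mutual_best_hits(scores, species_of):
    """Find cross-species pairs that are each other's best-scoring partner.

    scores[a][b]: global alignment score between gene-models a and b
    (hypothetical nested-dict format). species_of maps gene-model -> species.
    """
    def best(gene):
        others = {g: s for g, s in scores.get(gene, {}).items()
                  if species_of[g] != species_of[gene]}
        return max(others, key=others.get) if others else None

    pairs = set()
    for gene in scores:
        partner = best(gene)
        if partner is not None and best(partner) == gene:
            pairs.add(frozenset((gene, partner)))
    return pairs


def extend_to_loci(pairs, locus_of):
    """Add every gene-model of a locus to the group of any member model."""
    models_of_locus = defaultdict(set)
    for model, locus in locus_of.items():
        models_of_locus[locus].add(model)

    groups = []
    for pair in pairs:
        members = set()
        for gene in pair:
            members |= models_of_locus[locus_of[gene]]
        groups.append(members)
    return groups
```

In this sketch a second gene-model of a locus joins the group even when it is not itself a mutual best hit, which mirrors the rule stated in the text.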

Trees illustrating the relationship between POG members and other closely-related proteins are displayed with each POG. Trees were generated by using MUSCLE (Edgar 2004) to align the proteins that are most closely related to the members of each POG (e-value < 1e-5 and coverage > 50%). The resulting multiple sequence alignments and guide trees are displayed to allow users to evaluate POG assignments. Where the trees support the POG assignments, this is indicated on the POG display page and the POG is considered strongly supported. The trees contain links that allow users to access nearby POGs and/or download the component sequences for more rigorous phylogenetic analysis.

All input gene-models were examined using InterProScan, TargetP, Predotar, NucPred, and PredictNLS to determine domain architecture and predict intracellular targeting. This information is displayed for each gene-model on the POG detail page.

Because the maize genome has not been fully sequenced, we were unable to use a reciprocal best hits strategy to assign maize proteins to POGs. We chose instead to associate maize genomic assemblies and EST contigs with their most closely-related rice gene, via blastn comparison with rice genomic DNA.
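The one-directional assignment described above can be sketched as a simple best-hit lookup. The (maize_id, rice_gene, score) tuple format is hypothetical, and the sketch assumes each blastn hit against rice genomic DNA has already been mapped to the rice gene it overlaps.

```python
def assign_maize_to_rice(blastn_hits):
    """Associate each maize sequence with its best-scoring rice gene.

    blastn_hits: iterable of (maize_id, rice_gene, score) tuples
    (hypothetical format). Unlike the mutual best hits strategy, no
    reciprocal check is made from rice back to maize.
    """
    best = {}
    for maize_id, rice_gene, score in blastn_hits:
        if maize_id not in best or score > best[maize_id][1]:
            best[maize_id] = (rice_gene, score)
    return {maize_id: gene for maize_id, (gene, _) in best.items()}
```

Because there is no reciprocal check, several maize assemblies or EST contigs can map to the same rice gene.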

The database can be queried using any combination of gene name, accession, domain (Pfam, InterPro, etc.) or predicted intracellular targeting, or using BLAST.

Which sequences are we using?

PlantRBP v.2 uses version 6 of Arabidopsis and version 4 of rice.

The protein, mRNA, and unspliced genomic sequences for Arabidopsis were downloaded from TAIR; those for rice were downloaded from TIGR.
Maize sequences were downloaded from several locations: ISU genomic assemblies were downloaded from http://magi.plantgenomics.iastate.edu, version 150a of PlantGDB's EST assemblies from http://plantgdb.org/, and ESTs associated with the Arizona full-length cDNA project from GenBank (http://nlm.nih.gov/entrez/).

Software used:

Pairwise BLAST comparisons were performed using WU BLAST 2.0 (http://blast.wustl.edu/), at either the protein (rice vs Arabidopsis) or nucleotide (maize vs rice) level.

NEEDLE, an implementation of the Needleman-Wunsch algorithm (Needleman and Wunsch 1970) distributed with the EMBOSS package (Rice, Longden et al. 2000), was used to perform the pairwise global alignments upon which POG assignments were based. http://emboss.sourceforge.net
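For readers unfamiliar with global alignment, the following is a minimal sketch of Needleman-Wunsch scoring. It is not the NEEDLE implementation: it assumes a simple match/mismatch score and a linear gap penalty (all three values are illustrative), whereas NEEDLE uses a substitution matrix and separate gap-opening and gap-extension penalties.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b.

    Uses the standard dynamic-programming recurrence, keeping only the
    previous row of the score matrix. Scoring values are illustrative.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against the empty string
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]
```

Because the alignment is global, both sequences are aligned end to end, so unmatched ends are penalized as gaps rather than ignored.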

MUSCLE version 3.6 was used to generate the multiple sequence alignments and guide trees displayed on the website. http://www.drive5.com/muscle/

We used version 3.3 of InterProScan (http://www.ebi.ac.uk/interpro/index.html) with updated Pfam (version 19) and SuperFamily (1.69) models.

TargetP (http://www.cbs.dtu.dk/services/TargetP/), Predotar (http://urgi.infobiogen.fr/predotar/predotar.html), NucPred (http://www.sbc.su.se/~maccallr/nucpred/) and PredictNLS (http://cubic.bioc.columbia.edu/predictNLS/) were used to predict intracellular targeting.

Refs:

Edgar, R. C. (2004). "MUSCLE: a multiple sequence alignment method with reduced time and space complexity." BMC Bioinformatics 5: 113.

Needleman, S. B. and C. D. Wunsch (1970). "A general method applicable to the search for similarities in the amino acid sequence of two proteins." J Mol Biol 48(3): 443-53.

Rice, P., I. Longden, et al. (2000). "EMBOSS: the European Molecular Biology Open Software Suite." Trends Genet 16(6): 276-7.
