About MetazSecKB

Who are we?

The MetazSecKB [Human/Animal] Secretome and Subcellular Proteome KnowledgeBase (MetazSecKB) was created by Dr. Xiangjia (Jack) Min and John Meinken at Youngstown State University (YSU). Dr. Min collected the data and designed the prediction algorithms. John Meinken implemented the database and built the website. The work was supported by a grant from the YSU University Research Council and by a graduate assistantship for John Meinken from the YSU Center for Applied Chemical Biology. This server is supported by YSU.

Our Motivation

Dramatic increases in the number of protein sequences and full proteomes have led to an increased need for computational tools that can automate analysis of proteins based on the protein sequence. One area where automated analysis has shown considerable promise is in the prediction of protein subcellular location. Many publicly available tools have been developed to analyze a protein sequence for information related to its subcellular location.

The core goal of this project is to combine information from multiple tools to produce aggregate predictions that are more accurate than the predictions made by any individual tool alone. Our website offers a single location where researchers can see our predictions as well as all of the data we have collected from the individual tools. In addition to making predictions, the knowledgebase also serves as a testing site where we can compare the prediction accuracies of different tools.

Our Process

The data for this website was collected from the January 2014 release of UniProtKB. It includes 103,088 proteins from the curated UniProtKB/Swiss-Prot database and 3,977,730 proteins from the uncurated UniProtKB/TrEMBL database. For each protein, we perform analysis using SignalP3, SignalP4, TMHMM, Phobius, TargetP, WoLF PSORT, ScanProsite and FragAnchor. The results of all analyses are stored together in the database along with the protein information.

Our predictions are made using all data available. For proteins with an annotation for subcellular location (either from UniProt or curated by us), the annotation is used for prediction. For all other proteins, a combination of tool analysis results is used for prediction. We determine the best algorithms for combining the data using a variety of statistical and data mining techniques. You can see an example of how our secretome prediction algorithm was developed in this paper.
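
The two-step logic described above (use an annotation when one exists, otherwise fall back to combined tool results) can be sketched in Python. This is only an illustration and not the algorithm used by MetazSecKB: the function name, the input format, and the fallback rule (at least two signal peptide tools agree and TMHMM reports no transmembrane helix) are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple


def predict_locations(
    annotation: Optional[str], tool_results: Dict[str, bool]
) -> Tuple[List[str], str]:
    """Return (predicted locations, short description of the logic used).

    `annotation` is a subcellular location from UniProt or local curation,
    or None. `tool_results` maps a tool name to True/False; this flat
    format is a hypothetical simplification of real tool output.
    """
    # Step 1: an existing annotation takes priority over any tool output.
    if annotation:
        return [annotation], "annotation"

    # Step 2 (hypothetical rule, not the published one): call a protein
    # secreted when at least two signal peptide predictors agree and
    # TMHMM does not report a transmembrane helix.
    sp_tools = ("SignalP4", "Phobius", "TargetP")
    sp_votes = sum(1 for tool in sp_tools if tool_results.get(tool, False))
    if sp_votes >= 2 and not tool_results.get("TMHMM", False):
        logic = f"{sp_votes} of {len(sp_tools)} signal peptide tools agree; no TM helix"
        return ["secreted"], logic

    # No rule matched, so no prediction is produced.
    return [], "no rule matched"
```

Returning the logic string alongside the prediction mirrors how the results page shows the reasoning next to each predicted location, and the empty list covers the case where no prediction is made.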

Further Reading

Min XJ. (2010) Evaluation of computational methods for secreted protein prediction in different eukaryotes. J. Proteomics Bioinform. 3:143-147.
Lum G, Min XJ. (2011) FunSecKB: the Fungal Secretome KnowledgeBase. Database - the Journal of Biological Databases and Curation. Vol. 2011. bar001. doi: 10.1093/database/bar001.
Meinken J, Min XJ. (2012) Computational prediction of protein subcellular locations in eukaryotes: an experience report. Computational Molecular Biology. 2(1): 1-7.
Lum G, Meinken J, Orr J, Frazier S, Min XJ. (2014) PlantSecKB: the Plant Secretome and Subcellular Proteome KnowledgeBase. Computational Molecular Biology. 4(1).

Using This Website

The home page has four different search options:

Search By ID - Use this option if you have a protein ID from UniProt or NCBI or you know the gene name of the protein you are interested in.

Search By Subcellular Location - Use this option to get a list of all proteins for a species that are predicted in a specific subcellular location. The species can be selected from a list of common species or entered manually.

Search By Protein Keywords or Function - Use this option to get a list of all proteins for a species that match a protein name, function or keyword. For example, to get a list of all proteins involved in amino acid transport, enter the search text "amino acid transport" (word order does not matter). The species can be selected from a list of common species or entered manually.

BLAST Search - This will take you to our BLAST search page where you can search against this database as well as several other databases we maintain.


Get a FASTA formatted list of search results:
When doing a search by subcellular location or protein keyword/function, use the "FASTA Download" button to get the results in FASTA format. The results can be easily copied and pasted to a text file if needed. For individual proteins, the FASTA formatted protein sequence is included at the bottom of the results page.
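
Once the FASTA text has been pasted into a file, it can be read back with a few lines of Python. The sketch below is a minimal parser that assumes standard FASTA layout (a header line starting with ">" followed by one or more sequence lines); the function name and the example headers are made up for illustration and do not reflect the exact header format this site produces.

```python
from typing import Dict


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each FASTA header (without the leading '>') to its sequence."""
    records: Dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines may be wrapped; join them into one string.
            records[header] += line
    return records


# Hypothetical two-record example
example = ">ID1 example protein\nMKTLLV\nAAGG\n>ID2\nMSSP\n"
sequences = parse_fasta(example)
```

For larger files, an established library such as Biopython may be a better choice than a hand-written parser.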

Download the search results as a text file:
When doing a search by subcellular location or protein keyword/function, use the "Search" button to get a paginated list of results. At the top of the page, you can click the link to "Download result set as a tab delimited text file".
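
A downloaded tab-delimited file can be loaded with Python's standard csv module. The sketch below assumes the first line of the file is a header row; the column names in the example ("UniProt AC", "Predicted Location") are hypothetical placeholders, so check the actual file for the real ones.

```python
import csv
import io
from typing import Dict, List


def read_result_set(text: str) -> List[Dict[str, str]]:
    """Parse tab-delimited text into one dict per row, keyed by header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


# Hypothetical file contents; real column names may differ.
sample = "UniProt AC\tPredicted Location\nID1\tsecreted\nID2\tnucleus\n"
rows = read_result_set(sample)
secreted = [row["UniProt AC"] for row in rows if row["Predicted Location"] == "secreted"]
```

To read from disk instead, pass the contents of the file, for example `read_result_set(open(path, encoding="utf-8").read())`, where `path` is wherever the download was saved.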

Get the count of proteins in a search result set:
When doing a search by subcellular location or protein keyword/function, use the "Search" button to get a paginated list of results. The number of results returned along with a description of the search parameters will be included at the top of the page.


Get our prediction for subcellular location:
The results page contains a summary section at the top and a details section at the bottom. Our prediction can be found in the summary section under "Predicted Subcellular Location(s)". Note that our prediction algorithms can sometimes produce no prediction or more than one prediction. The logic for how the prediction was made will be included next to the prediction.

Get results from individual computational tools:
All of the data we collected from the individual computational tools is included in the details section on the results page.

Get our annotated data:
When available, our curated annotation will be included at the bottom of the details section on the results page. However, most proteins do not have local curated annotations. UniProt annotations for subcellular location are included in the summary table on the results page when available. If you want to see the supporting reference for a UniProt annotation, click the UniProt AC value to view that entry in the UniProtKB.

Submit an Annotation:

This database accepts public annotations for subcellular location based on experimental evidence. Submissions are added to the database after being reviewed by our curator. We have an online form for submitting protein annotations one at a time. If you have a large number of proteins to submit, you can contact us directly.