Frequently Asked Questions about the ASFinder Server

About the author
The ASFinder (AS-Finder) server was implemented by Dr. Xiang Jia (Jack) Min in the Bioinformatics Laboratory, Proteomics/Genomics Research Group, Youngstown State University (YSU). The work was supported by the YSU Research Professorship award (2009 - 2010) and the STEM Dean's Reassigned Time.

Motivations
Generating expressed sequence tags (ESTs) remains a primary method for gene discovery in most organisms. Identifying alternatively spliced transcript isoforms of a gene is an important step in gene functional annotation and in downstream experimental characterization. The server is designed for identifying alternatively spliced transcripts from EST-derived sequences. Note: This server can be used for mapping ESTs to the genome from which the ESTs were derived, but it is not designed for predicting alternatively spliced genes from genomic sequences alone.

How does it work?
If a genomic sequence file (with multiple sequences in FASTA format) is provided by the user, EST-derived sequences (ESTs, including cDNAs) are mapped to the genomic sequences using the SIM4 software. ESTs that map to the same genomic locus, overlap by at least a certain length with high similarity (two parameters chosen by the user), and show exon/intron variation in the overlapping region are treated as "alternatively spliced" transcripts from a single gene. If no genomic sequences are provided, the ESTs are used to perform a self-BLASTN; that is, NCBI-BLASTN is run with the set of ESTs as both the "query" and the "database". ESTs that have high similarity at both ends but contain an unaligned internal fragment are treated as "alternatively spliced" transcripts.
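The genome-based criterion described above can be illustrated with a minimal Python sketch. This is not the server's actual code (the standalone tool is written in Perl); the function names, the coordinate representation, and the default `min_overlap` value of 100 are assumptions made for illustration, and the identity filter is assumed to have been applied to the alignments beforehand.

```python
from typing import List, Tuple

# An exon is (start, end) in 1-based inclusive genomic coordinates.
Exon = Tuple[int, int]


def introns(exons: List[Exon]) -> List[Exon]:
    """Return the introns implied by the gaps between consecutive exons."""
    ordered = sorted(exons)
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]


def overlap_length(a: List[Exon], b: List[Exon]) -> int:
    """Count genomic bases covered by exons of both transcripts."""
    total = 0
    for a_start, a_end in a:
        for b_start, b_end in b:
            total += max(0, min(a_end, b_end) - max(a_start, b_start) + 1)
    return total


def is_alternatively_spliced(a: List[Exon], b: List[Exon],
                             min_overlap: int = 100) -> bool:
    """True if two transcripts at one locus overlap enough and differ in
    intron structure inside the region that both of them span.

    min_overlap is a hypothetical default, not the server's setting.
    """
    if overlap_length(a, b) < min_overlap:
        return False
    # Region spanned by both transcripts.
    lo = max(min(s for s, _ in a), min(s for s, _ in b))
    hi = min(max(e for _, e in a), max(e for _, e in b))
    introns_a = {i for i in introns(a) if i[0] >= lo and i[1] <= hi}
    introns_b = {i for i in introns(b) if i[0] >= lo and i[1] <= hi}
    return introns_a != introns_b
```

For example, a transcript with exons at 100-200, 300-400 and 500-600 and another with exons at 100-200 and 500-600 (the middle exon skipped) would be flagged, whereas a transcript that is simply a shorter fragment with the same introns would not.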

Input
  • 1) A file containing EST-derived sequences (ESTs, cDNAs or contig sequences assembled from ESTs) in FASTA format. We suggest assembling EST-derived sequences to remove redundancy using an EST assembler such as Phrap, CAP3, TIGR Assembler (see Min et al. 2009), or EST2uni. If the EST data are not pre-assembled, i.e., are redundant, the results will contain the "redundant" transcripts, so pre-assembling ESTs is recommended. Note: The number of EST/cDNA sequences in a file or copy/paste submission is limited to a maximum of 100,000. If you have >100,000 ESTs, please request a standalone version of the software.
  • 2) Optional (required for genome mapping): A file containing the genome sequences of the same species. Although this file is optional, if the genome is available (completely sequenced with good quality), the user should provide the genome sequences for alignment, because the genome is required for EST mapping and the output files from the genome alignment are used for further analysis of AS events.
  • Note: If the genome sequence file contains a number of super-contigs, you may split it into several files (each containing one contig sequence); however, you should submit only one data set per day, or wait until you have received your results before making a new submission.
  • 3) Parameters: there are two parameters that can be chosen by a user on the server home page. The minimum aligned fragment length and the minimum identity of the aligned fragments are used to define the "alternatively spliced transcripts" from a gene locus.
  • Note: The total combined data file size (EST file and genome file) is limited to 50 Mb. Use the following EST sequences and genomic sequences for testing.
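Before uploading, the input limits listed above can be checked locally. The following Python sketch is only an illustration and is not part of ASFinder: the function names are hypothetical, and reading "50 Mb" as 50 × 1024 × 1024 bytes is an assumption.

```python
import os
from typing import List, Optional

MAX_SEQUENCES = 100_000
# Assumption: the "50 Mb" limit is interpreted as 50 * 1024 * 1024 bytes.
MAX_TOTAL_BYTES = 50 * 1024 * 1024


def count_fasta_records(path: str) -> int:
    """Count sequences in a FASTA file by counting '>' header lines."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def check_submission(est_path: str,
                     genome_path: Optional[str] = None,
                     max_sequences: int = MAX_SEQUENCES,
                     max_total_bytes: int = MAX_TOTAL_BYTES) -> List[str]:
    """Return a list of problems; an empty list means the limits are met."""
    problems = []
    n_records = count_fasta_records(est_path)
    if n_records > max_sequences:
        problems.append(
            f"{n_records} EST sequences exceed the limit of {max_sequences}")
    total = os.path.getsize(est_path)
    if genome_path is not None:
        total += os.path.getsize(genome_path)
    if total > max_total_bytes:
        problems.append(
            f"combined size of {total} bytes exceeds {max_total_bytes}")
    return problems
```

Calling `check_submission("ests.fasta", "genome.fasta")` (hypothetical file names) returns an empty list when both limits are respected.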
Output
    If only ESTs are provided, the output files include (1) a BLASTN output file, (2) AS clusters (alternatively spliced transcript clusters), and (3) a multiple sequence alignment (MSA) file for AS isoforms generated by MUSCLE. If genomic sequences are provided, the output files include (1) a SIM4 alignment file, (2) a file in a modified GTF (gene transfer format) containing tab-delimited alignment information for all ESTs, (3) AS clusters, and (4) an AS-specific GTF file (AS.gtf), which contains the EST alignment information of AS transcripts. The accuracy of the methods implemented in the server was evaluated using Aspergillus niger EST data and Arabidopsis mRNA sequences, and the results were reported in the paper (Min 2013).

    To further categorize the AS isoforms, the GTF file (est2genome.gtf or AS.gtf) can be used as input to the AStalavista server for analysis of AS events, or loaded into the Generic Genome Browser (GBrowse) or the Integrated Genome Browser to visualize the mapping of ESTs to the genome. The landscape GTF output file generated by AStalavista can be further processed using our own gtf2events.pl to generate a user-friendly AS event output file.
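For users who want to inspect the GTF output themselves, the sketch below groups exon coordinates by transcript. It assumes the standard nine-column, tab-delimited GTF layout with a `transcript_id "..."` attribute in the last column; the server's modified GTF may differ from this, so treat the column positions and the attribute name as assumptions to verify against your own est2genome.gtf or AS.gtf file. The function name is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumption: attributes follow the standard GTF form transcript_id "NAME";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def exons_by_transcript(lines: Iterable[str]) -> Dict[str, List[Tuple[int, int]]]:
    """Collect sorted (start, end) exon intervals for each transcript."""
    exons = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # keep only well-formed exon records
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        exons[match.group(1)].append((int(fields[3]), int(fields[4])))
    return {name: sorted(intervals) for name, intervals in exons.items()}
```

A file can be passed directly, for example `exons_by_transcript(open("AS.gtf"))`, since the function accepts any iterable of lines.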

    Security and confidentiality of user submitted data
    The data submitted to our server will be automatically deleted after the output files are generated. We do not keep data submitted by a user.

    How to obtain user's results
    The results can be downloaded from the server web site. They are kept on the site for only 2 days after data processing and are then deleted.

    How to cite us
    Min, X.J. (2013) ASFinder: a tool for genome-wide identification of alternatively spliced transcripts from EST-derived sequences. International Journal of Bioinformatics Research and Applications. The webserver website http://bioinformatics.ysu.edu/tools/ASFinder.html can also be used as a reference.

    The following papers have used ASFinder:

  • 1. Sablok G, Gupta PK, Baek JM, Vazquez F, Min XJ. (2011) Genome-wide survey of alternative splicing in the grass Brachypodium distachyon: an emerging model biosystem for plant functional genomics. Biotechnology Letters. 33(3):629-636. (doi:10.1007/s10529-010-0475-6).
  • 2. Walters B, Lum G, Sablok G and Min XJ. (2013) Genome-wide landscape of alternative splicing events in Brachypodium distachyon. DNA Research. doi:10.1093/dnares/dss041.
  • 3. VanBuren R, Walters B, Ming R, Min XJ. (2013) Analysis of expressed sequence tags and alternative splicing genes in sacred lotus (Nelumbo nucifera Gaertn.). Plant Omics J. 6:311-317.

    Stand-alone tool for download
    The standalone version of the software is available free for academic use only. It is written in Perl and needs to be run on Linux, as required by the SIM4 software. Please download it from the following site.

    Comments and suggestions
    Please contact Dr. Min in the YSU Bioinformatics Lab.

