
秒大刀 Blog

Study hard and make progress every day

FASTA format description  

2007-03-16 09:55:33 | Category: Technical notes

A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column. It is recommended that all lines of text be shorter than 80 characters. An example sequence in FASTA format is:

 

>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
HFPSNWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPGPCVQRTYVACHIRSVIIWLETISKK
TYAPPREGHLECTSTVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRYKLVEITPIGF
APTEVRRYTGGHERQKRVPFVXXXXXXXXXXXXXXXXXXXXXXVQSQHLLAGILQQQKNL
LAAVEAQQQMLKLTIWGVK
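
The record structure described above is simple enough to read and write by hand: a ">" description line followed by sequence lines. The sketch below is a minimal illustration, not NCBI's own code. `parse_fasta` and `format_fasta` are hypothetical helper names. It assumes blank lines can be ignored, and the writer wraps sequence lines at 60 characters, which is the width used in the example above and stays under the 80-character recommendation.

```python
from typing import List, Optional, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Split FASTA text into (description, sequence) pairs.

    A line starting with '>' opens a new record; every following
    non-blank line up to the next '>' is sequence data.
    """
    records: List[Tuple[str, str]] = []
    description: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # assumption: blank lines carry no meaning
        if line.startswith(">"):
            if description is not None:
                records.append((description, "".join(chunks)))
            description = line[1:]  # drop the '>' marker
            chunks = []
        elif description is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line)
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


def format_fasta(description: str, sequence: str, width: int = 60) -> str:
    """Render one record, wrapping the sequence to `width` characters per line."""
    lines = [">" + description]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"
```

`format_fasta` does not wrap the description line, so a long description can still exceed 80 characters.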

 

Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions: lower-case letters are accepted and are mapped to upper case; a single hyphen or dash can be used to represent a gap of indeterminate length; and in amino acid sequences, U and * are acceptable letters (see below). Before submitting a request, remove any numerical digits from the query sequence or replace them with appropriate letter codes (e.g., N for an unknown nucleic acid residue or X for an unknown amino acid residue).
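
The clean-up rules above can be applied to raw text before it is submitted. This is a minimal sketch built on a hypothetical helper, `clean_query`. It assumes digits are simply removed rather than replaced with N or X. It also drops whitespace, so line breaks and GenBank-style position numbers disappear together.

```python
import re


def clean_query(raw: str) -> str:
    """Upper-case the sequence and strip digits and whitespace.

    Gap hyphens '-' and stop symbols '*' are left untouched.
    """
    return re.sub(r"[0-9\s]+", "", raw).upper()


print(clean_query("1 acgtnnacgt\n11 ggcat"))  # prints ACGTNNACGTGGCAT
```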

 

The nucleic acid codes supported are:

 

        A --> adenosine               M --> A C (amino)
        C --> cytidine                S --> G C (strong)
        G --> guanine                 W --> A T (weak)
        T --> thymidine               B --> G T C
        U --> uridine                 D --> G A T
        R --> G A (purine)            H --> A C T
        Y --> T C (pyrimidine)        V --> G C A
        K --> G T (keto)              N --> A G C T (any)
                                      - --> gap of indeterminate length
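
Each ambiguity code in the table stands for a set of bases, so a dictionary can express the table directly. One typical use is checking a concrete sequence against a degenerate pattern such as a primer. The sketch is illustrative only: `NUCLEIC_CODES`, `matches`, and `invalid_nucleic_symbols` are hypothetical names. It assumes the pattern and the sequence have the same length and contain no gaps, and it keeps U as a separate RNA symbol that the ambiguity codes do not expand to.

```python
# IUB/IUPAC nucleic acid codes from the table above, each mapped to the
# bases it can stand for. Ambiguity codes expand only to A, C, G, T.
NUCLEIC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC",
    "S": "CG", "W": "AT", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(pattern: str, dna: str) -> bool:
    """True if each base of `dna` is allowed by the code at the same
    position of `pattern`. Raises KeyError for symbols not in the table."""
    if len(pattern) != len(dna):
        return False
    return all(base in NUCLEIC_CODES[code]
               for code, base in zip(pattern.upper(), dna.upper()))


def invalid_nucleic_symbols(sequence: str) -> set:
    """Characters that are neither a listed code nor the gap symbol '-'."""
    allowed = set(NUCLEIC_CODES) | {"-"}
    return {ch for ch in sequence.upper() if ch not in allowed}


print(matches("GAYTC", "GATTC"))  # True  (Y = C or T)
print(matches("GAYTC", "GAGTC"))  # False
```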

 

For those programs that use amino acid query sequences (BLASTP and TBLASTN), the accepted amino acid codes are:

 

    A  alanine                             P  proline
    B  aspartate or asparagine             Q  glutamine
    C  cysteine                            R  arginine
    D  aspartate                           S  serine
    E  glutamate                           T  threonine
    F  phenylalanine                       U  selenocysteine
    G  glycine                             V  valine
    H  histidine                           W  tryptophan
    I  isoleucine                          Y  tyrosine
    K  lysine                              Z  glutamate or glutamine
    L  leucine                             X  any
    M  methionine                          *  translation stop
    N  asparagine                          -  gap of indeterminate length
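
Checking a protein query against this alphabet is a set lookup. This is a minimal sketch using a hypothetical helper, `invalid_protein_symbols`, with the allowed set copied from the table above. Letters the table does not list, such as J and O, are therefore reported. It assumes whitespace and digits have already been removed.

```python
# Symbols accepted in amino acid queries, per the table above: the 24
# listed letters (including the ambiguity codes B, Z and X) plus
# '*' (translation stop) and '-' (gap).
AMINO_ACID_CODES = set("ABCDEFGHIKLMNPQRSTUVWYZX*-")


def invalid_protein_symbols(sequence: str) -> set:
    """Return characters that are not accepted amino acid codes.

    Lower-case input is upper-cased first, as the format rules allow.
    """
    return {ch for ch in sequence.upper() if ch not in AMINO_ACID_CODES}


print(invalid_protein_symbols("MKVLAX*"))          # set()
print(sorted(invalid_protein_symbols("mkvJO1")))   # ['1', 'J', 'O']
```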

[End]
