FASTA format description
A sequence in FASTA format begins with a single-line description, followed by one or more lines of sequence data. The description line is distinguished from the sequence data by a greater-than symbol (">") in the first column. It is recommended that all lines of text be shorter than 80 characters. An example sequence in FASTA format is:
>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
HFPSNWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPGPCVQRTYVACHIRSVIIWLETISKK
TYAPPREGHLECTSTVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRYKLVEITPIGF
APTEVRRYTGGHERQKRVPFVXXXXXXXXXXXXXXXXXXXXXXVQSQHLLAGILQQQKNL
LAAVEAQQQMLKLTIWGVK
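The layout described above (a ">" description line followed by sequence lines) can be read with a few lines of Python. This is a minimal sketch, not a reference parser: the function name `parse_fasta` is hypothetical, and skipping blank lines and stripping surrounding whitespace are assumptions the text does not state.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (description, sequence) pairs from FASTA-formatted lines."""
    description: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # A ">" in the first column starts a new record.
            if description is not None:
                yield description, "".join(chunks)
            description = line[1:].strip()
            chunks = []
        elif line.strip():
            if description is None:
                raise ValueError("sequence data found before a '>' description line")
            # Sequence data may span several lines; join them later.
            chunks.append(line.strip())
    if description is not None:
        yield description, "".join(chunks)
```

Because the function accepts any iterable of lines, it can be given an open file object directly, for example `for description, sequence in parse_fasta(open(path)):` where `path` is a placeholder for a FASTA file.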
Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions: lower-case letters are accepted and are mapped to upper case; a single hyphen or dash can be used to represent a gap of indeterminate length; and in amino acid sequences, U and * are acceptable letters (see below). Before submitting a request, remove any numerical digits from the query sequence or replace them with appropriate letter codes (e.g., N for an unknown nucleic acid residue or X for an unknown amino acid residue).
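The clean-up rules above can be sketched as a small helper. The name `normalize_query` and its `digit_replacement` parameter are hypothetical; dropping whitespace is an added assumption, on the basis that pasted sequences often carry spaces alongside position numbers.

```python
def normalize_query(sequence: str, digit_replacement: str = "") -> str:
    """Upper-case a query sequence and remove or replace digits.

    digit_replacement is "" to delete digits, or a letter code such as
    "N" (nucleic acid) or "X" (amino acid) to substitute for each digit.
    """
    cleaned = []
    for ch in sequence:
        if ch.isspace():
            continue  # assumption: whitespace carries no sequence information
        if ch.isdigit():
            cleaned.append(digit_replacement)
        else:
            cleaned.append(ch.upper())  # lower case is mapped to upper case
    return "".join(cleaned)
```

Whether to delete digits or replace them depends on what they stand for: position numbers copied from a formatted listing should be deleted, while a digit standing in for an unknown residue is better replaced with N or X.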
The nucleic acid codes supported are:
A --> adenosine            M --> A C (amino)
C --> cytidine             S --> G C (strong)
G --> guanine              W --> A T (weak)
T --> thymidine            B --> G T C
U --> uridine              D --> G A T
R --> G A (purine)         H --> A C T
Y --> T C (pyrimidine)     V --> G C A
K --> G T (keto)           N --> A G C T (any)
                           -  gap of indeterminate length
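The table above maps naturally onto a lookup dictionary. The following sketch, with hypothetical names `IUPAC_NUCLEOTIDES`, `bases_for`, and `invalid_nucleotide_codes`, expands an ambiguity code into the bases it stands for and reports characters that are not in the table. Treating "-" as always valid follows the gap rule stated earlier.

```python
# Each code maps to the bases it can represent, as listed in the table.
IUPAC_NUCLEOTIDES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "GA", "Y": "TC", "K": "GT", "M": "AC", "S": "GC", "W": "AT",
    "B": "GTC", "D": "GAT", "H": "ACT", "V": "GCA", "N": "AGCT",
}


def bases_for(code: str) -> set:
    """Return the set of bases a single nucleic acid code can stand for."""
    return set(IUPAC_NUCLEOTIDES[code.upper()])


def invalid_nucleotide_codes(sequence: str) -> list:
    """Return the sorted, de-duplicated characters not allowed in a nucleotide query."""
    allowed = set(IUPAC_NUCLEOTIDES) | {"-"}
    return sorted({ch for ch in sequence.upper() if ch not in allowed})
```

An empty list from `invalid_nucleotide_codes` means every character is either a listed code or a gap.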
For those programs that use amino acid query sequences (BLASTP and TBLASTN), the accepted amino acid codes are:
A  alanine                    P  proline
B  aspartate or asparagine    Q  glutamine
C  cysteine                   R  arginine
D  aspartate                  S  serine
E  glutamate                  T  threonine
F  phenylalanine              U  selenocysteine
G  glycine                    V  valine
H  histidine                  W  tryptophan
I  isoleucine                 Y  tyrosine
K  lysine                     Z  glutamate or glutamine
L  leucine                    X  any
M  methionine                 *  translation stop
N  asparagine                 -  gap of indeterminate length
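A protein query can be checked against the accepted codes before submission. The sketch below uses the hypothetical names `AMINO_ACID_CODES` and `find_invalid_residues`, and includes only the letters listed in the table above, so letters such as J and O are reported as invalid. Positions are 1-based, which is a presentation choice rather than anything the format requires.

```python
# Letters accepted for amino acid queries, plus "*" (stop) and "-" (gap).
AMINO_ACID_CODES = set("ABCDEFGHIKLMNPQRSTUVWXYZ*-")


def find_invalid_residues(sequence: str) -> list:
    """Return (position, character) pairs for characters outside the accepted codes."""
    problems = []
    for position, ch in enumerate(sequence.upper(), start=1):
        if ch not in AMINO_ACID_CODES:
            problems.append((position, ch))
    return problems
```

Reporting positions rather than a plain yes/no answer makes it easier to locate stray digits or unsupported letters in a long sequence.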