秒大刀博客

好好学习天天向上

日志

关于我

秒大刀

文章分类

Byte-order mark (BOM)

2008-07-18 14:31:08| 分类：技术积累 | 标签： |举报 |字号大中小订阅

下载LOFTER 我的照片书 |

Byte-order mark

From Wikipedia, the free encyclopedia

Jump to: navigation, search

A byte-order mark (BOM) is the Unicode character at code point U+FEFF ("zero-width no-break space") when that character is used to denote the endianness of a string of UCS/Unicode characters encoded in UTF-16 or UTF-32. It is conventionally used as a marker to indicate that text is encoded in UTF-8, UTF-16 or UTF-32.

Usage

In most character encodings the BOM is a pattern which is unlikely to be seen in other contexts (it would usually look like a sequence of obscure control codes). If a BOM is misinterpreted as an actual character within Unicode text then it will generally be invisible due to the fact it is a zero-width no-break space. Use of the U+FEFF character for non-BOM purposes has been deprecated in Unicode 3.2 (which provides an alternative, U+2060, for those other purposes), allowing U+FEFF to be used solely with the semantic of BOM.

In UTF-16, a BOM (U+FEFF) is placed as the first character of a file or character stream to indicate the endianness (byte order) of all the 16-bit code units of the file or stream.

If the 16-bit units are represented in big-endian byte order, this BOM character will appear in the sequence of bytes as 0xFE followed by 0xFF (where "0x" indicates hexadecimal);
if the 16-bit units use little-endian order, the sequence of bytes will have 0xFF followed by 0xFE.

The Unicode value U+FFFE is guaranteed never to be assigned as a Unicode character; this implies that in a Unicode context the 0xFF, 0xFE byte pattern can only be interpreted as the U+FEFF character expressed in little-endian byte order (since it could not be a U+FFFE character expressed in big-endian byte order).

While UTF-8 does not have byte order issues, a BOM encoded in UTF-8 may be used to mark text as UTF-8. It only identifies a file as UTF-8 and does not state anything about byte order.^[1] Quite a lot of Windows software (including Windows Notepad) adds one to UTF-8 files. However in Unix-like systems (which make heavy use of text files for file formats as well as for inter-process communication) this practice is not recommended, as it will interfere with correct processing of important codes such as the hash-bang at the start of an interpreted script. It may also interfere with source for programming languages that don't recognise it. For example, gcc reports stray characters at the beginning of a source file, and in PHP, if output buffering is disabled, it has the subtle effect of causing the page to start being sent to the browser, preventing custom headers from being specified by the PHP script. The UTF-8 representation of the BOM is the byte sequence EF BB BF, which appears as the ISO-8859-1 characters ï»¿ in most text editors and web browsers not prepared to handle UTF-8.

Although a BOM could be used with UTF-32, this encoding is rarely used for transmission. Otherwise the same rules as for UTF-16 are applicable, for the IANA registered charsets UTF-16BE, UTF-16LE, UTF-32BE, and UTF-32LE a "byte order mark" must not be used, an initial U+FEFF has to be interpreted as a (deprecated) "zero width no-break space", because the names of these charsets already determine the byte order. For the registered charsets UTF-16 and UTF-32 an initial U+FEFF indicates the byte order.

Representations of byte order marks by encoding

Encoding	Representation (hexadecimal)
UTF-8	`EF BB BF`^?
UTF-16 (BE)	`FE FF`
UTF-16 (LE)	`FF FE`
UTF-32 (BE)	`00 00 FE FF`
UTF-32 (LE)	`FF FE 00 00`
UTF-7	`2B 2F 76`, and one of the following bytes: `[ 38 \| 39 \| 2B \| 2F ]`^?
UTF-1	`F7 64 4C`
UTF-EBCDIC	`DD 73 66 73`
SCSU	`0E FE FF`^?
BOCU-1	`FB EE 28` optionally followed by `FF`^?

^ In UTF-8, this is not really a "byte order" mark. It identifies the text as UTF-8 but doesn't say anything about the byte order, because UTF-8 does not have byte order issues.^[1]^[2]
^ In UTF-7, the fourth byte of the BOM, before encoding as base64, is 001111xx in binary, and xx depends on the next character (the first character after the BOM). Hence, technically, the fourth byte is not purely a part of the BOM, but also contains information about the next (non-BOM) character. For xx=00, 01, 10, 11, this byte is, respectively, 38, 39, 2B, or 2F when encoded as base64. If no following character is encoded, 38 is used for the fourth byte and the following byte is 2D.
^ SCSU allows other encodings of U+FEFF, the shown form is the signature recommended in UTR #6.^[3]
^ For BOCU-1 a signature changes the state of the decoder. Octet 0xFF resets the decoder to the initial state.^[4]

References

External links

Retrieved from "http://en.wikipedia.org/wiki/Byte-order_mark"

Categories: Unicode

评论这张

转发至微博

阅读(1840)| 评论(0)

历史上的今天

this.p={  m:2,
              b:2,
              loftPermalink:'',
              id:'fks_084075093081085067083083094095086086084069081081082',
              blogTitle:'Byte-order mark (BOM)',
              blogAbstract:'<P\><FONT size=\"5\" \><STRONG\><A target=\"_blank\" rel=\"nofollow\" href=\"http://en.wikipedia.org/wiki/Byte-order_mark\" \><FONT size=\"5\" \><STRONG\>Byte-order mark</STRONG\></FONT\></A\></STRONG\></FONT\></P\>  <DIV\>  <H3\>From Wikipedia, the free encyclopedia</H3\>  <DIV\></DIV\>  <DIV\>Jump to: <A rel=\"nofollow\" href=\"http://en.wikipedia.org/wiki/Byte-order_mark#column-one\"  \>navigation</A\>, <A rel=\"nofollow\" href=\"http://en.wikipedia.org/wiki/Byte-order_mark#searchInput\"  \>search</A\></DIV\>  <P\>A <B\>byte-order mark</B\> (<B\>BOM</B\>) is the <A title=\"Unicode\" rel=\"nofollow\" href=\"http://en.wikipedia.org/wiki/Unicode\"  \>Unicode</A\> character at code point <CODE\>U+FEFF</CODE\> (\"zero-width no-break space\") when that character is used to denote the <A title=\"Endianness\" rel=\"nofollow\" href=\"http://en.wikipedia.org/wiki/Endianness\"  \>endianness</A\> of a string of <A rel=\"nofollow\"\></A\></P\></DIV\>',
              blogTag:'',
              blogUrl:'blog/static/2056574200861823012892',
              isPublished:1,
              istop:false,
              type:0,
              modifyTime:1317439447738,
              publishTime:1216362668779,
              permalink:'blog/static/2056574200861823012892',
              commentCount:0,
              mainCommentCount:0,
              recommendCount:0,
              bsrk:-100,
              publisherId:0,
              recomBlogHome:false,
              currentRecomBlog:false,
              attachmentsFileIds:[],
              vote:{},
              groupInfo:{},
              friendstatus:'none',
              followstatus:'unFollow',
              pubSucc:'',
              visitorProvince:'',
              visitorCity:'',
              visitorNewUser:false,
              postAddInfo:{},
              mset:'000',
              mcon:'',
              srk:-100,
              remindgoodnightblog:false,
              isBlackVisitor:false,
              isShowYodaoAd:false,
              hostIntro:'',
              hmcon:'1',
              selfRecomBlogCount:'0',
              lofter_single:'<iframe width="140" height="560" style="overflow:hidden;" src="http://www.lofter.com/mailEntry.do?blogad=1&blog" frameBorder="0"></iframe>'
            }

{list a as x}
    {if !!x}
    <div class="iblock nbw-fce nbw-f40">
      <a class="fc03 noul" target="_blank" hidefocus="true" href="http://blog.163.com/${x.visitorName}/">
      {if x.visitorName==visitor.userName}
      <img alt="${x.visitorNickname|escape}" onerror="this.src=location.f40" class="cwd bdwa bdc0" src="${fn1(x.visitorName)}&r=${visitor.imageUpdateTime}"/>
      {else}
      <img alt="${x.visitorNickname|escape}" onerror="this.src=location.f40" class="cwd bdwa bdc0" src="${fn1(x.visitorName)}"/>
      {/if}
      </a>
      <div class="cwd vname thide">
        {if x.moveFrom=='wap'}
          <a class="noul pnt" target="_blank" href="http://blog.163.com/services/wapblog.html?frompersonalbloghome"><span title="来自网易手机博客" class="iblock wapIcon"> </span></a>
        {elseif x.moveFrom=='iphone'}
          <a class="noul pnt" target="_blank"><span title="来自iPhone客户端" class="iblock iphoneIcon"> </span></a>
        {elseif x.moveFrom=='android'}
          <a class="noul pnt" target="_blank"><span title="来自Android客户端" class="iblock androidIcon"> </span></a>
        {elseif x.moveFrom=='mobile'}
          <a class="noul pnt" target="_blank" href="http://blog.163.com/services/emsblog.html?frompersonalbloghome"><span title="来自网易短信写博" class="iblock wapIcon"> </span></a>
        {/if}
        <a class="fc03 m2a"  target="_blank" hidefocus="true" href="http://blog.163.com/${x.visitorName}/">
          ${fn(x.visitorNickname,8)|escape}
        </a>
      </div>
    </div>
    {/if}
    {/list}

<#--最新日志，群博日志--> <#--推荐日志-->

<p class="fc06">推荐过这篇日志的人：</p>
    <div>
      {list a as x}
      {if !!x}
      <div class="iblock nbw-fce nbw-f40">
        <a class="fc03 noul" target="_blank" hidefocus="true" href="http://blog.163.com/${x.recommenderName}/">
        <img alt="${x.recommenderNickname|escape}" onerror="this.src=location.f40" class="cwd bdwa bdc0" src="${fn1(x.recommenderName)}"/>
        </a>
        <div class="cwd thide">
          <a class="fc03 m2a" target="_blank" hidefocus="true" href="http://blog.163.com/${x.recommenderName}/">
            ${fn(x.recommenderNickname,6)|escape}
          </a>
        </div>
      </div>
      {/if}
      {/list}
    </div>
    {if !!b&&b.length>0}
    <p  class="fc06">他们还推荐了：</p>
    <ul>
    {list b as y}
      {if !!y}
        <li class="rrb"><span class="iblock">·</span><a class="fc03 m2a" target="_blank" href="http://blog.163.com/${y.recommendBlogPermalink}/?from=blog/static/2056574200861823012892">${y.recommendBlogTitle|escape}</a></li>
      {/if}
    {/list}
    </ul>
    {/if}

<#--引用记录--> <#--博主推荐--> <#--随机阅读--> <#--首页推荐--> <#--历史上的今天--> <#--被推荐日志--> <#--上一篇，下一篇--> <#-- 热度 -->

{list a as x}
    {if !!x}
    <div class="hotItem iblock nbw-fce nbw-f40">
      <a class="fc03 noul" target="_blank" hidefocus="true" href="http://blog.163.com/${x.publisherUsername}/">
      {if x.publisherUsername==visitor.userName}
      <img alt="${x.publisherNickname|escape}" onerror="this.src=location.f40" class="cwd bdwa bdc0" src="${fn1(x.publisherUsername)}&r=${visitor.imageUpdateTime}"/>
      {else}
      <img alt="${x.publisherNickname|escape}" onerror="this.src=location.f40" class="cwd bdwa bdc0" src="${fn1(x.publisherUsername)}"/>
      {/if}
      </a>
      <div class="cwd vname thide">
        <a class="fc03 m2a"  target="_blank" hidefocus="true" href="http://blog.163.com/${x.publisherUsername}/">
          ${fn(x.publisherNickname,8)|escape}
        </a>
      </div>
      <a class="f-myLikeIcons hottype {if x.type==1} js-liketype{elseif x.type==2} js-reblogtype{elseif x.type==3} js-sharetype{else}{/if}" target="_blank" hidefocus="true" href="http://blog.163.com/${x.publisherUsername}/"> </a>
    </div>
    {/if}
    {/list}

<#-- 网易新闻广告 --> <#--右边模块结构--> <#--评论模块结构--> <#--引用模块结构--> <#--博主发起的投票-->

页脚

我的照片书 - 手机博客 - 下载LOFTER APP - 订阅此博客

秒大刀博客

导航

日志

Byte-order mark (BOM)

From Wikipedia, the free encyclopedia

Contents

Usage

Representations of byte order marks by encoding

References

External links

历史上的今天

最近读者

热度

评论

页脚

秒大刀 博客

导航

日志

Byte-order mark (BOM)

From Wikipedia, the free encyclopedia

Contents

Usage

Representations of byte order marks by encoding

References

External links

历史上的今天

最近读者

热度

评论

页脚

秒大刀博客