注册 登录  
 加关注
   显示下一条  |  关闭
温馨提示!由于新浪微博认证机制调整,您的新浪微博帐号绑定已过期,请重新绑定!立即重新绑定新浪微博》  |  关闭

秒大刀 博客

好好学习 天天向上

 
 
 

日志

 
 
 
 

验证正则表达式的正则表达式  

2011-09-30 10:07:58|  分类: 技术积累 |  标签: |举报 |字号 订阅

  下载LOFTER 我的照片书  |
验证正则表达式的正则表达式 - 秒大刀 - 秒大刀的城堡

     正则表达式(Regular Expression、regex或regexp,缩写为RE),在计算机科学中,是指一个用来描述或者匹配一系列符合某个句法规则的字符串的单个字符串。在很多文本编辑器或其他工具里,正则表达式通常被用来检索和/或替换那些符合某个模式的文本内容。

    正则表达式在文本合法性验证上简介有效。那能不能用正则表达式来验证正则表达式文本呢?

    Regular expression for Regular expressions? : "Mathematically, it is impossible to validate a regular expression using a regular expression. This is so because (formal) regular expressions can only recognize regular languages. A language is any set of strings. For example, the set of all decimal numbers is a language (which by the way can be described using a regular expression); the set of all valid regular expressions is also a language. Regular languages are languages that require only fixed finite memory (not a function of the input size) to be recognized.

    The language that contains all valid regular expressions is not a regular language; hence it is impossible to recognize a regular expression using a regular expression.

    To understand this, notice that regular expressions contain parentheses in them that must match. Hence, if an "(" has occurred, a ")" must occur later on. This is impossible to describe with a machine that has only fixed finite memory. For, if there were a way to do it, and your regular expression had a finite memory of K different states (for some integer K), an expression with K opening parentheses followed by K closing parentheses, though a valid regular expression would have been unable to be recognized by that machine -- a contradiction (notice that in formal languages, our assumption is that text processing occurs one character at a time, from left to right, which is the same for applied regular expressions). We call languages such as the one that describes regular expressions context-free and not regular.

    (It is trivial to prove that regular expressions do not form a regular language using the Pumping Lemma)

    So, there is a fundamental computer science problem in recognizing regular expressions using regular expressions: It is mathametically impossible to do so.

    Regular languages are possible to be recognized by finite-state automata, i.e. machines with finite states but without memory. To overcome your problem, you need to add some memory which is dependant on the input size. Regular expressions, as they are context-free (fortunately they're not some obscure, hard-to-recognize type of language) can be recognized in linear time using a push-down automaton. This is a "for" loop that goes through the expression one token (usually a character) at a time and keeps track of what it's seen on a stack, i.e. it "pushes" data that it laters "pops" in a first-in-last-out fashion. (Example of data pushed to the stack: "I need to remember to find a matching `)' later on!"; you can "push" this as many times as you need; you can "pop" it later, when you need to check if you actually needed to have matched an opening parenthesis previously).

    Of course, writing your own recognition engine for regular expressions would be a bit of an overhead -- but if you want to do it, you should know the above limitations. It would be more wise to employ an already existing mechanism to do it -- I suspect you could give that job to a regular expression library or a language that is more keen on handling regular expressions such as Perl; but the @-method doesn't sound like too bad of an idea after all: It may be slow, but your users may input terribly slow regular expressions anyway; and it may be a bad practice, but in your case it seems the best possible solution available."

 

    结论:世界上不存在能验证正则表达式的“万能”正则表达式!只有执行了,才知道是否可行。

 

参考:

Is it possible to have regexp that matches all valid regular expressions?

Regular expression for finding a regular expression?


推荐阅读:
  评论这张
 
阅读(797)| 评论(0)
推荐 转载

历史上的今天

在LOFTER的更多文章

评论

<#--最新日志,群博日志--> <#--推荐日志--> <#--引用记录--> <#--博主推荐--> <#--随机阅读--> <#--首页推荐--> <#--历史上的今天--> <#--被推荐日志--> <#--上一篇,下一篇--> <#-- 热度 --> <#-- 网易新闻广告 --> <#--右边模块结构--> <#--评论模块结构--> <#--引用模块结构--> <#--博主发起的投票-->
 
 
 
 
 
 
 
 
 
 
 
 
 
 

页脚

网易公司版权所有 ©1997-2017