4.14 Alternative ways of specifying lexers | [Translation] Python Lex-Yacc Manual

In the examples above, the lexer was specified entirely within a single Python module. If you want to put the token rules in a different module, use the `module` keyword argument to `lex()`. For example, you might have a dedicated module that contains just the token rules:

~~~
# module: tokrules.py
# This module just contains the lexing rules

# List of token names. This is always required
tokens = (
    'NUMBER',
    'PLUS',
    'MINUS',
    'TIMES',
    'DIVIDE',
    'LPAREN',
    'RPAREN',
)

# Regular expression rules for simple tokens
t_PLUS    = r'\+'
t_MINUS   = r'-'
t_TIMES   = r'\*'
t_DIVIDE  = r'/'
t_LPAREN  = r'\('
t_RPAREN  = r'\)'

# A regular expression rule with some action code
def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

# Define a rule so we can track line numbers
def t_newline(t):
    r'\n+'
    t.lexer.lineno += len(t.value)

# A string containing ignored characters (spaces and tabs)
t_ignore = ' \t'

# Error handling rule
def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)
~~~

Now, to build the lexer from that separate module, do the following (shown in interactive mode):

~~~
>>> import ply.lex as lex
>>> import tokrules
>>> lexer = lex.lex(module=tokrules)
>>> lexer.input("3 + 4")
>>> lexer.token()
LexToken(NUMBER,3,1,0)
>>> lexer.token()
LexToken(PLUS,'+',1,2)
>>> lexer.token()
LexToken(NUMBER,4,1,4)
>>> print(lexer.token())
None
~~~

The `module` option can also be given an instance of a class. For example:

~~~
import ply.lex as lex

class MyLexer:
    # List of token names. This is always required
    tokens = (
        'NUMBER',
        'PLUS',
        'MINUS',
        'TIMES',
        'DIVIDE',
        'LPAREN',
        'RPAREN',
    )

    # Regular expression rules for simple tokens
    t_PLUS    = r'\+'
    t_MINUS   = r'-'
    t_TIMES   = r'\*'
    t_DIVIDE  = r'/'
    t_LPAREN  = r'\('
    t_RPAREN  = r'\)'

    # A regular expression rule with some action code
    # Note addition of self parameter since we're in a class
    def t_NUMBER(self, t):
        r'\d+'
        t.value = int(t.value)
        return t

    # Define a rule so we can track line numbers
    def t_newline(self, t):
        r'\n+'
        t.lexer.lineno += len(t.value)

    # A string containing ignored characters (spaces and tabs)
    t_ignore = ' \t'

    # Error handling rule
    def t_error(self, t):
        print("Illegal character '%s'" % t.value[0])
        t.lexer.skip(1)

    # Build the lexer
    def build(self, **kwargs):
        self.lexer = lex.lex(module=self, **kwargs)

    # Test its output
    def test(self, data):
        self.lexer.input(data)
        while True:
            tok = self.lexer.token()
            if not tok:
                break
            print(tok)

# Build the lexer and try it out
m = MyLexer()
m.build()           # Build the lexer
m.test("3 + 4")     # Test it
~~~

When defining a lexer in a class, pass an instance of the class to `lex()`, not the class itself. PLY only works properly when the lexer's rule functions are bound methods. When the `module` option is given to `lex()`, PLY collects symbols from the object using `dir()`; it does not access the object's `__dict__` attribute directly. (Translator's note: this may be for compatibility reasons, since the `__dict__` attribute might not exist.)

Finally, if you want good encapsulation but do not want to put everything in a class, a lexer can be defined inside a closure. For example:

~~~
import ply.lex as lex

# List of token names. This is always required
tokens = (
    'NUMBER',
    'PLUS',
    'MINUS',
    'TIMES',
    'DIVIDE',
    'LPAREN',
    'RPAREN',
)

def MyLexer():
    # Regular expression rules for simple tokens
    t_PLUS    = r'\+'
    t_MINUS   = r'-'
    t_TIMES   = r'\*'
    t_DIVIDE  = r'/'
    t_LPAREN  = r'\('
    t_RPAREN  = r'\)'

    # A regular expression rule with some action code
    def t_NUMBER(t):
        r'\d+'
        t.value = int(t.value)
        return t

    # Define a rule so we can track line numbers
    def t_newline(t):
        r'\n+'
        t.lexer.lineno += len(t.value)

    # A string containing ignored characters (spaces and tabs)
    t_ignore = ' \t'

    # Error handling rule
    def t_error(t):
        print("Illegal character '%s'" % t.value[0])
        t.lexer.skip(1)

    # Build the lexer from my environment and return it
    return lex.lex()
~~~
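
The earlier point about passing an instance rather than the class, and about gathering symbols with `dir()`, can be illustrated without PLY at all. The following is a minimal sketch under Python 3 using only the standard library. `collect_rules` is a hypothetical helper written for this illustration and is not PLY's actual code, and the rule here receives a plain string instead of a real token object.

```python
import inspect


class MyLexer:
    t_PLUS = r'\+'

    def t_NUMBER(self, t):
        r'\d+'
        # Simplification: t is a plain string here, not a PLY token object.
        return int(t)


def collect_rules(obj):
    """Hypothetical helper: gather every attribute whose name starts with
    't_' by walking dir(obj) and calling getattr(), not by reading __dict__."""
    return {name: getattr(obj, name)
            for name in dir(obj) if name.startswith('t_')}


# Collected from an instance: t_NUMBER is a bound method, so it can be
# called with the token alone, and its docstring still holds the regex.
instance_rules = collect_rules(MyLexer())
is_bound = inspect.ismethod(instance_rules['t_NUMBER'])
converted = instance_rules['t_NUMBER']('42')
pattern = instance_rules['t_NUMBER'].__doc__

# Collected from the class: t_NUMBER is a plain function that still expects
# self, so calling it with only the token raises TypeError.
class_rules = collect_rules(MyLexer)
try:
    class_rules['t_NUMBER']('42')
    class_call_failed = False
except TypeError:
    class_call_failed = True

# The instance's own __dict__ is empty because the rules live on the class;
# dir() still finds them.
found_in_instance_dict = 't_PLUS' in vars(MyLexer())
```

In this sketch the instance yields callables that take just the token, which matches the calling convention of module-level rule functions, while the class yields functions that cannot be called that way. It also shows one practical difference between `dir()` and `__dict__`: class-level rules such as `t_PLUS` are visible through `dir()` on an instance even though the instance's own `__dict__` does not contain them.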