2008
02.06

During these last months I’ve been developing, just for fun and in my *very* little spare time, my own compiler kit tool.
More than a compiler, it’s a tool for building up compilers (there are others over there, like Lex & Yacc, Flex & Bison in its GNU flavour). Mine is called Bparser.

I’ve put it in my wiki, just in case anybody want to give it a glance.

My tool is better than the LEX & YACC couple in the sense it can parse LR(n) grammars whilst LEX/Yacc only parses LALR ones. It uses a Lookahead and mangle-likes estructures to take some decisions when choosing which grammar rule to use for reduction.

If it eventually cannot decide which rule should be used, it will start to backtrack to find out the rule to apply.

This tool allows ambiguos grammars, so it is very similar to a GLR parser. BISON (GNU’s YACC) can also use the GLR algorithm, but it is less efficient and can take exponential space/time to parse some entries depending on the given grammar.

The calculator example

The calculator example is the “Hello World” of parsers generators. Below there is an calculator definition for mathematical expresions. Simply run the program and enter a mathematic expression like 1+3.2*5^2 and the program will output the result to the screen (with some explanation about rules used). The program will finish when you enter an empty input.

#!/usr/bin/python
# -*- coding: utf-8 -*-

import sys

from bparser import Token
from bparser import Symbol
from bparser import Bparser

class Calc(Bparser):
	def minus_factor(self, symbol, l):
		symbol.value = -l[1].value
		return symbol
	
	def term_mul_term(self, symbol, l):
		symbol.value = l[0].value * l[2].value
		return symbol
	
	def term_div_term(self, symbol, l):
		symbol.value = l[0].value / l[2].value
		return symbol
	
	def term_plus_term(self, symbol, l):
		symbol.value = l[0].value + l[2].value
		return symbol
	
	def term_minus_term(self, symbol, l):
		symbol.value = l[0].value - l[2].value	
		return symbol
	
	def term_pow_term(self, symbol, l):
		symbol.value = l[0].value ** l[2].value
		return symbol
	
	def print_result(self, symbol, l):
		print 'Result is', l[0].value
		return symbol
	
	def lprp_action(self, symbol, l):
		symbol.value = l[1].value
		return symbol

	def default_hook(self, symbol, l):
		symbol.value = l[0].value
	
		print symbol.name, '-->', [x.name for x in l]
		print 'ValueOf(%s) = ' % symbol.name, symbol.value 
	
		return symbol

calc = Calc()

calc.add_token(r'[0-9]+(\.[0-9]+)?([eE][-+]?[0-9]+)?', 'NUMBER')
calc.add_tokens({r'\+': 'PLUS', r'\-': 'MINUS', r'\*': 'MUL', '/': 'DIV', '%': 'MOD', r'\^':'POW'})
calc.add_tokens({r'\(': 'LP', r'\)': 'RP'}) # Escape PARENTHESIS in Regular Expressions
calc.add_token('[ \t\n]+', ('SEPARATOR', Token.skip))

calc.parse_rule('%LEFT (PLUS, MINUS)')
calc.parse_rule('%LEFT (MUL, DIV)')
calc.parse_rule('%RIGHT (POW)')
calc.parse_rule('Program --> expr .print_result')

calc.parse_rule('expr --> term')

calc.parse_rule('term --> term PLUS term .term_plus_term')
calc.parse_rule('term --> term MINUS term .term_minus_term')
calc.parse_rule('term --> term MUL term .term_mul_term')
calc.parse_rule('term --> term DIV term .term_minus_term')
calc.parse_rule('term --> term POW term .term_pow_term')

calc.parse_rule('term --> value')
calc.parse_rule('value --> LP expr RP .lprp_action')
calc.parse_rule('value --> NUMBER')
calc.parse_rule('value --> MINUS NUMBER .minus_factor')


if __name__ == '__main__':
	while True:
		input = sys.stdin.readline().strip()
		if input == '': break
	
		calc.set_buffer(input)
		calc.verbose = False
		calc.start()
		calc.storage.pow_level = []
	
		if calc.parse():
			calc.verbose = True
			calc.mimic()
		else:
			print "Syntax Error."

I’m using this tool in the ZX Spectrum BASIC Compiler, that will serve as a test-language for this tool.

Related posts:

  1. Calling Perl functions from Python
  2. Hamiltonian Path
  3. The ZX Basic Compiler
  4. Python with elegance
  1. Leo (#3):

    Bueno, no es del todo cierto. Esto es una herramienta para hacer compiladores, no un compilador concreto para un lenguaje particular. Y los errores semánticos siempre corren a cargo del programador.

    Respecto a la parte de errores sintácticos, es cierto que está incompleta (al menos con respecto a YACC), pero es mucho más complicado detectar errores sintácticos en gramáticas ambiguas o del tipo LR(n). De hecho es casi imposible.

    Por otra parte, me hubiera gustado que hubieras dejado un enlace a tu MINIJAVA, para echarle un vistazo. :) Si vuelves por aquí, por favor, ponlo.

    Like or Dislike: Thumb up 0 Thumb down 0

  2. Es un compilador muy incompleto.
    No tienes la parte de errores Sintacticos y Semanticos.
    Yo en este momento estoy terminand un MINIJAVA, por asi decirlo donde tengo en cuenta casi todo.

    Like or Dislike: Thumb up 0 Thumb down 0

  3. Paul:

    Yes, that’s the plan. In fact is both an exercise of nostalgia and compilers skills. :)

    I think it will produce rather well optimized code. The syntaxis, on the other hand is much advanced (not like ZX BASIC). It follows the FreeBASIC syntax.

    Like or Dislike: Thumb up 0 Thumb down 0

  4. So, the plan is to make a cross compiler for ZX Basic? Wow! That would be so cool. I wish this existed 25 years ago :)

    I remember fighting with hisoft basic to cram as much as I could into memory. It will be interesting to see if I can find some of my old programs and run them through your example compiler when it’s done.

    One imagines that with faster processing and memory options, such a thing could be remarkably well optimized.

    Paul

    Like or Dislike: Thumb up 0 Thumb down 0