Newsgroups: php.internals
Message-ID: <4F7847CA.2090307@anderiasch.de>
Date: Sun, 01 Apr 2012 14:19:22 +0200
From: ml@anderiasch.de (Florian Anderiasch)
To: internals@lists.php.net
Subject: Question about parser implementation details

Hey there,

due to the widespread acceptance of the binary number format (0b1010101) and the growing demand for backwards compatibility, I've started to work on support for Roman numerals (I, II, III, ...).
As you might know, this format cannot be parsed strictly symbol by symbol, neither from left to right nor from right to left, because several values need a look-ahead before they can be computed (like IV). My naive first implementation therefore splits the string into tokens (for example 1990 = MCMXC => M, CM, XC => 1000, 900, 90), evaluates each of those three tokens on its own, and then adds up the results. I'm not sure whether this would kill performance if it is calculated inside zend_language_scanner.l, so I'd appreciate any hints on how to tackle this serious concern.

Btw, in the spirit of x for hex and b for binary I thought about using this syntax: unless anyone has any better suggestions?

Greetings,
Florian
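
P.S. To make the tokenizing idea above concrete, here is a minimal sketch in plain C. It is not the actual scanner patch. The names roman_symbol_value and roman_to_long are hypothetical helpers, nothing that exists in zend_language_scanner.l. The sketch assumes uppercase input and treats any smaller symbol followed by a larger one as a subtractive pair, so it does not reject non-canonical numerals such as IIII or IM.

```c
#include <assert.h>
#include <stddef.h>

/* Value of a single Roman symbol, or 0 if the character is not one. */
static long roman_symbol_value(char c)
{
    switch (c) {
    case 'I': return 1;
    case 'V': return 5;
    case 'X': return 10;
    case 'L': return 50;
    case 'C': return 100;
    case 'D': return 500;
    case 'M': return 1000;
    default:  return 0;
    }
}

/*
 * Hypothetical helper (an assumption for illustration, not Zend code):
 * convert a NUL-terminated Roman numeral to its value.
 * Returns -1 for an empty string or an unknown character.
 */
long roman_to_long(const char *s)
{
    long total = 0;
    size_t i = 0;

    assert(s != NULL);
    if (s[0] == '\0') {
        return -1;
    }

    while (s[i] != '\0') {
        long cur = roman_symbol_value(s[i]);
        long next;

        if (cur == 0) {
            return -1; /* not a Roman symbol */
        }

        /* One character of look-ahead; 0 at end of string or on an
         * unknown character (which the next iteration rejects). */
        next = (s[i + 1] != '\0') ? roman_symbol_value(s[i + 1]) : 0;

        if (next > cur) {
            /* Subtractive token such as CM or XC: consume both symbols. */
            total += next - cur;
            i += 2;
        } else {
            /* Plain additive token such as M. */
            total += cur;
            i += 1;
        }
    }
    return total;
}
```

With this, roman_to_long("MCMXC") walks the tokens M, CM, XC and adds 1000 + 900 + 90. The loop touches each character at most once and allocates nothing, so at least in this simplified form the cost is linear in the length of the literal.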