Newsgroups: php.internals
Cc: "Nuno Lopes", "Lukas Kahwe Smith", "Dmitry Stogov"
Date: Mon, 4 May 2009 17:28:50 -0500
Subject: Re: [PHP-DEV] [PATCH] Scanner "diet" with fixes, etc.
From: php_lists@realplain.com ("Matt Wilmas")

Hi Brian,

----- Original Message -----
From: "shire"
Sent: Monday, May 04, 2009

> Matt Wilmas wrote:
>> [...]
>> How about this?
>>
>> #define SET_DOUBLE_QUOTES_SCANNED_LENGTH(len) CG(doc_comment_len) = (len)
>> #define GET_DOUBLE_QUOTES_SCANNED_LENGTH() CG(doc_comment_len)
>>
>
> Sure, works for me ;-)

Cool. :-)

>> [...]
>>> Have you considered using the lexer STATES and regexes instead of the
>>> manual C code for scanning the rest? It seems like if we have a
>>> one-char regex match for what the C code is doing, we could handle this
>>> in the lexer without a lot of manual intervention (need to look at it
>>> more, just a thought I had earlier; the expressions are clearer now
>>> with your patch applied) ;-)
>>
>> It seems that matching one-char-at-a-time with re2c would be more
>> complicated than the manual way, not to mention slower than the old
>> (current) way.
>>
>> Do you have any objection (well, you've kinda mentioned some :-)) if I
>> commit the changes in a little while like Dmitry thought could be done?
>
> Well, I'm wondering if something more along these lines (just did this
> on top of your patch as you cleaned up a lot) might be more appealing.
> (I'm not sure how much slower this would be than the current
> implementation; obviously it'll be somewhat slower, I'm basically just
> doing what you did in C but in the scanner instead, of course.)
>
> "#"|"//" {
>     BEGIN(ST_EOL_COMMENT);
>     yymore();
> }
>
> ({NEWLINE}|"%>"|"?>") {
>     char tmp = *(YYCURSOR-1);
>     if ((tmp == '%' && CG(asp_tags)) || tmp == '?') {
>         YYCURSOR -= 2;
>     }
>     CG(zend_lineno)++;
>     BEGIN(ST_IN_SCRIPTING);
>     return T_COMMENT;
> }
>
> {ANY_CHAR} {
>     if (YYCURSOR >= YYLIMIT) {
>         BEGIN(ST_IN_SCRIPTING);
>         return T_COMMENT;
>     }
>     yymore();
> }
>
> Let me know what the thoughts are on the above; if we don't want that,
> then I say yeah, commit away!

Wouldn't it be a little more complicated for strings/heredocs than for comments? Or maybe not; I haven't thought about it much! :-) And you still need the "manual use" of YYCURSOR, etc. In other words, to me, the scanner rules are doing what the case statements in the manual switch () do, but in a slower, "roundabout" way.

Well, I'm gonna be away for a bit now, but I guess I can commit away when I get back.

> -shire

- Matt