Newsgroups: php.internals
Message-ID: <3BCF6C07D9B2434D91A32A8E28C69A41@pc1>
Date: Mon, 4 May 2009 16:32:29 -0500
Cc: "Nuno Lopes", "Lukas Kahwe Smith", "Dmitry Stogov"
Subject: Re: [PHP-DEV] [PATCH] Scanner "diet" with fixes, etc.
From: php_lists@realplain.com ("Matt Wilmas")

Hi Brian,

----- Original Message -----
From: "shire"
Sent: Monday, May 04, 2009

>
> Hey Matt,
>
> Matt Wilmas wrote:
>
>>>>> +/* To save initial string length after scanning to first variable,
>>>>> CG(doc_comment_len) can be reused */
>>>>> +#define double_quotes_scanned_len CG(doc_comment_len)
>>>>> +
>>>
>>> (minor) Maybe we should rename this var if we're going to use it for
>>> other purposes, this doesn't really save any typing. Also if we do
>>> want the define maybe we should upper case it so it's more obvious?
>>
>> Yeah, I tried to think of other ways to do it, but just left it trying
>> to look like another variable (not to save typing). Well, it can easily
>> be changed later if a "cleaner" way is decided...
>>
>
> Yeah I would just prefer if it was more obvious that it is *not* a
> variable ;-)

How about this?

#define SET_DOUBLE_QUOTES_SCANNED_LENGTH(len) CG(doc_comment_len) = (len)
#define GET_DOUBLE_QUOTES_SCANNED_LENGTH() CG(doc_comment_len)

>>>>> + while (YYCURSOR < YYLIMIT) {
>>>>> + switch (*YYCURSOR++) {
>>>
>>> In the example above, which we have a couple examples of here, we don't
>>> obey the YYFILL macro to detect if we have exceeded our EOF. This
>>> *might* be a problem, but only really depends on if we intend to use the
>>> YYFILL as a solution for exceeding our mmap bounds.
>>
>> I don't understand what the problem might be? The YYCURSOR < YYLIMIT
>> check is what the YYFILL has been doing. If you mean after changes
>> later, as long as the whole thing is mmap()'d (which I'm assuming
>> would be the case?), it just "looks" like a standard string, with
>> terminating '\0', right? And there's no reading past YYLIMIT.
>
> Sorry yeah this wouldn't be a problem currently, but only if we try to
> fix the mmap issue by using YYFILL to realloc more space into the buffer.
> Then that macro would change to something more complicated. (per my
> previous replies with Arnaud)

Gotcha. If something changes, YYFILL -- or something to handle what needs
to be done -- could just be added to the manual parts as necessary, right?

> Have you considered using the lexer STATES and regexes instead of the
> manual C code for scanning the rest? It seems like if we have a one-char
> regex match for what the C code is doing we could handle this in the
> lexer without a lot of manual intervention (need to look at it more, just
> a thought I had earlier, the expressions are clearer now with your patch
> applied) ;-)

It seems that matching one char at a time with re2c would be more
complicated than the manual way, not to mention slower than the old
(current) way.

Do you have any objection (well, you've kinda mentioned some :-)) to my
committing the changes in a little while, as Dmitry thought could be done?

> -shire

- Matt
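
For anyone following along without the patch open, here is a rough
standalone sketch of the two ideas discussed above: the upper-case accessor
macros over the reused CG(doc_comment_len) slot, and a manual scan loop
bounded by a cursor < limit check. It is not the code from the patch.
demo_globals, demo_cg, this CG() definition and demo_scan_to_variable() are
hypothetical stand-ins invented for the example, the delimiter handling is
much simpler than what the real scanner does, and the SET macro has an extra
pair of outer parentheses added.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the compiler globals. Only the field name
 * doc_comment_len comes from the thread; the struct itself is made up. */
typedef struct {
    size_t doc_comment_len;
} demo_globals;

static demo_globals demo_cg;
#define CG(field) (demo_cg.field)

/* Accessors as proposed in the thread. The outer parentheses on SET are an
 * addition so the macro is safe inside a larger expression. */
#define SET_DOUBLE_QUOTES_SCANNED_LENGTH(len) (CG(doc_comment_len) = (len))
#define GET_DOUBLE_QUOTES_SCANNED_LENGTH()    CG(doc_comment_len)

/* Scan [cursor, limit) up to the first unescaped '$' or '"' (a simplified
 * notion of "first variable"). The cursor < limit test is the only bounds
 * check, so nothing at or beyond limit is ever read. Returns a pointer to
 * the delimiter, or limit if none was found, and records the length. */
static const char *demo_scan_to_variable(const char *cursor, const char *limit)
{
    const char *start = cursor;

    assert(cursor <= limit);

    while (cursor < limit) {
        switch (*cursor++) {
            case '"':
            case '$':
                cursor--; /* leave the delimiter unread for the caller */
                goto done;
            case '\\':
                if (cursor < limit) {
                    cursor++; /* skip the escaped character */
                }
                break;
            default:
                break;
        }
    }

done:
    SET_DOUBLE_QUOTES_SCANNED_LENGTH((size_t)(cursor - start));
    return cursor;
}
```

If a refill step were ever needed (the YYFILL question), the natural place
in a loop like this would be the point where cursor reaches limit, before
the loop gives up.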