Newsgroups: php.internals
Date: Mon, 6 Apr 2009 20:50:23 -0500
To: "php-dev List"
Cc: "Lukas Kahwe Smith"
Subject: Re: [PHP-DEV] RC2 and integer/float handling in 5.3
From: php_lists@realplain.com ("Matt Wilmas")

Hi again Brian,

----- Original Message -----
From: "shire"
Sent: Monday, April 06, 2009

>
> Hey Matt,
>
> Matt Wilmas wrote:
>> Yep, 5.3's snapshot self-compiled from a couple days ago on Windows (not
>> that that should matter). (I'm not regenerating it with re2c, which also
>> shouldn't matter; using the existing .c file. I haven't touched the
>> scanner stuff in a long time (yet) to regen.) Scanner of course hasn't
>> changed since then.
>
> Here's what I'm currently doing (more or less with some changed paths):
> [...]
>   [1]=>
>   array(3) {
>     [0]=>
>     int(366)
>     [1]=>
> "   string(57) "// this comment and trailing blank contain windows CR+LF
>     [2]=>
>     int(2)
>   }

As a side note, I just noticed that the full Windows newline (\r\n, CR+LF)
isn't getting included in the comment token (the \n ends up in the
T_WHITESPACE token after it), the way a *nix \n is. See the " before
string(57)? I guess that's because the CR returns to the start of the line
without moving to the next one. It's this rule:

    [^\n\r?%>]*{ANY_CHAR} {

that's only matching the \r before returning T_COMMENT. It's simple enough
to fix as well, but I hadn't spotted it until I was trying to see why that
quote was out of place. :-) (This isn't new in 5.3, though...)

>   [2]=>
>   array(3) {
>     [0]=>
>     int(371)
>     [1]=>
>     string(3) "
>
> "
>     [2]=>
>     int(2)
>   }
> }
>
>
> The newlines look like this in the second file:
>
> // this comment and trailing blank contain windows CR+LF^M$
> ^M$
>
> Unfortunately I can't test on a windows build, perhaps you could re-test
> or share your reproduction that fails as this seems to work for me unless
> I'm of course missing some difference.
>
>
>> Test case is the one in the bug report. :-) Last token is not the
>> comment, but whitespace.
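To make the CR+LF point concrete, here is a minimal C sketch of one-line comment scanning. It is hypothetical, not the actual zend_language_scanner code: the function name one_line_comment_len and the crlf_aware switch are made up for illustration, and %> handling for asp_tags is left out. With the switch off it stops right after the CR, which is effectively what the rule above does; with it on it also takes the LF of a CR+LF pair. A comment with no trailing newline simply runs to the end of the input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not PHP's actual scanner code.
 * s points at a "//" one-line comment; len is the number of bytes available.
 * Returns how many bytes the T_COMMENT token should cover.
 * crlf_aware == 0 stops at the first CR or LF (the behavior described above);
 * crlf_aware != 0 also takes the LF of a CR+LF pair. */
static size_t one_line_comment_len(const char *s, size_t len, int crlf_aware)
{
    size_t i;

    assert(s != NULL && len >= 2 && s[0] == '/' && s[1] == '/');

    for (i = 2; i < len; i++) {
        if (s[i] == '\n') {
            return i + 1;                /* a *nix newline ends the comment */
        }
        if (s[i] == '\r') {
            if (crlf_aware && i + 1 < len && s[i + 1] == '\n') {
                return i + 2;            /* take the whole CR+LF */
            }
            return i + 1;                /* CR only: the LF is left behind */
        }
        if (s[i] == '?' && i + 1 < len && s[i + 1] == '>') {
            return i;                    /* leave "?>" for the close-tag rule */
        }
    }
    return len;                          /* no newline: comment runs to EOF */
}
```

Fed the quoted comment followed by two CR+LF pairs, the naive mode returns 57 (matching string(57) above: the CR is in, the LF is not) and the CR+LF-aware mode returns 58.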
>
> There are two reproductions in the bug report ;-)

Oops, I forgot about the second one; I meant the first one in the initial
report. The part I'm talking about is: "It only seems to occur if there
isn't a newline behind the comment." So the easiest way to see it is simply:

var_dump(token_get_all(' array(3) { [0]=> int(368) [1]=> string(6) " int(1) } }

>> Also, the unterminated comment Warning is still missing with "> blah " like
>> it's been since the re2c change (except maybe for the time your fix was
>> applied). My changes would clean this up of course, unless you do
>> something first.
>
> I think fixing this would be great as well as the other highlighter test
> that was changed. I would just prefer that the scanner handle these
> rather than us implementing what is essentially a hand-written scanner
> within the lexer file.

Yeah, I remember you said that last time. :-) But as with the inline HTML
scanner part you mentioned then, if it's pretty simple to implement
manually, it seemed logical to me. (I don't know if that was possible with
how flex worked; it was only after seeing the HTML scanning that I thought,
"Ah.") The regex would have generated more code, and probably wouldn't make
much difference for readability...? (I still wonder if it wasn't used
because it wouldn't have worked with the re2c issues otherwise.)

With the string, etc. scanning, my regular expressions are pretty
complicated in order to match stuff that isn't very complicated, which
generates a LOT of code, and they probably aren't that readable or easy to
understand, even with the comments.

Well anyway, if I do something I'll send it along for analysis!

>
> -shire
>

- Matt
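Along the lines of doing simple scanning by hand, a rough C sketch of a block-comment scan that reports whether it found the closing delimiter (so the caller knows when to raise the unterminated-comment warning) might look like this. The function block_comment_len and its parameters are hypothetical, not the actual patch. It also counts line breaks, treating CR+LF as one, on the assumption that the caller needs them to keep line numbers right.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hand-written scan of a block comment.
 * s points at the opening slash-star; len is the number of bytes available.
 * Returns the token length. *terminated is set to 0 when input ended before
 * the closing star-slash (the case where a scanner would warn), and
 * *newlines receives the number of line breaks inside the token. */
static size_t block_comment_len(const char *s, size_t len,
                                int *terminated, int *newlines)
{
    size_t i = 2;                        /* skip the opening delimiter */

    assert(s != NULL && terminated != NULL && newlines != NULL);
    assert(len >= 2 && s[0] == '/' && s[1] == '*');

    *terminated = 0;
    *newlines = 0;
    while (i < len) {
        if (s[i] == '*' && i + 1 < len && s[i + 1] == '/') {
            *terminated = 1;
            return i + 2;                /* include the closing delimiter */
        }
        if (s[i] == '\n') {
            (*newlines)++;
        } else if (s[i] == '\r') {
            (*newlines)++;
            if (i + 1 < len && s[i + 1] == '\n') {
                i++;                     /* CR+LF counts as one line break */
            }
        }
        i++;
    }
    return len;                          /* unterminated: token runs to EOF */
}
```

The search starts after the opening delimiter, so input such as slash-star-slash is not mistaken for a closed comment.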