Newsgroups: php.internals
Subject: Re: [PHP-DEV] [PATCH] Scanner "diet" with fixes, etc.
From: lbarnaud@php.net (Arnaud Le Blanc)
Reply-To: lbarnaud@php.net
Date: Mon, 4 May 2009 18:43:52 +0200
To: shire@php.net
Cc: Matt Wilmas, internals@lists.php.net, Nuno Lopes, Lukas Kahwe Smith, Dmitry Stogov
In-Reply-To: <49FF0EFF.4020109@php.net>
References: <6604D94D40FD465F992144110B075BB5@pc1> <9D5D4CBF-5CB1-47EC-81F4-59E3C48EEEEF@pooteeweet.org> <49FE9AE7.4000008@php.net> <49FF0EFF.4020109@php.net>

On Mon, May 4, 2009 at 5:51 PM, shire wrote:
> Arnaud Le Blanc wrote:
>>
>> Hi,
>> On Mon, May 4, 2009 at 9:36 AM, shire wrote:
>>>
>>> Regarding the ZEND_MMAP_AHEAD issue and the temp. fix that Dmitry put in
>>> we need to find a solution to that, perhaps I can play with that this week
>>> too as I think I'm seeing some related issues in my testing of 5.3.
>>> Essentially we abuse ZEND_MMAP_AHEAD by adding it to the file size and
>>> passing it to the mmap call which isn't at all valid and only really works
>>> up to PAGESIZE. We could possibly use YYFILL to re-allocate more space as
>>> necessary past the end of file to fix this.
>>
>> I was thinking of doing something like that with YYFILL too. However
>> there is a bunch of pointers to take in to account and to update (e.g.
>> yy_marker, yy_text, etc).
>>
>
> Yeah, I'm pretty sure that's how most of the example re2c code is setup:
>
> #define YYFILL(n) {cursor = fill(s, cursor);}
>
> uchar *fill(Scanner *s, uchar *cursor){
>     if(!s->eof){
>         uint cnt = s->lim - s->tok;
>         uchar *buf = malloc((cnt + 1)*sizeof(uchar));
>         memcpy(buf, s->tok, cnt);
>         cursor = &buf[cursor - s->tok];
>         s->pos = &buf[s->pos - s->tok];
>         s->ptr = &buf[s->ptr - s->tok];
>         s->lim = &buf[cnt];
>         s->eof = s->lim; *(s->eof)++ = '\n';
>         s->tok = buf;
>     }
>     return cursor;
> }
>
>
>
> -shire
>

This is what I have seen too, but it is not always applicable. The
scanner has code that refers to yy_text, yy_start, yy_cursor,
yy_marker, etc. All of those pointers point into the original buffer
and must be updated by fill(). The scanner may roll back to yy_marker,
and a rule may fetch yy_text or yy_start, at any time. So the buffer
must be large enough to contain all data from min(all_of_them) to
max(all_of_them). That makes things a little complicated and
potentially less efficient than a big buffer for the whole file.

Regards,

Arnaud
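For illustration only, here is a rough sketch of the pointer rebasing described above. It is not the Zend scanner's code: the `scanner` struct, its field names, and `scanner_refill()` are hypothetical, and it assumes every pointer (including the marker) is always set and lies inside the current buffer. It keeps everything from the lowest live pointer up to the end of the data, appends new input, and shifts each pointer by the same offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical scanner state. The fields mirror the yy_* pointers
 * named in the thread, but this is not the real Zend layout. */
typedef struct {
    unsigned char *buf;     /* start of the heap-allocated buffer */
    unsigned char *limit;   /* one past the last valid byte */
    unsigned char *start;   /* plays the role of yy_start */
    unsigned char *text;    /* plays the role of yy_text */
    unsigned char *marker;  /* plays the role of yy_marker (rollback point) */
    unsigned char *cursor;  /* plays the role of yy_cursor */
} scanner;

static unsigned char *min_ptr(unsigned char *a, unsigned char *b)
{
    return a < b ? a : b;
}

/* Append n bytes from src, discarding only the data that lies before
 * the lowest live pointer. Returns 0 on success, -1 if malloc fails. */
int scanner_refill(scanner *s, const unsigned char *src, size_t n)
{
    /* Lowest position the scanner may still read or roll back to. */
    unsigned char *low = min_ptr(min_ptr(s->start, s->text),
                                 min_ptr(s->marker, s->cursor));
    size_t keep;
    unsigned char *nbuf;

    assert(low >= s->buf && low <= s->limit);
    keep = (size_t)(s->limit - low);

    nbuf = malloc(keep + n + 1);   /* +1 avoids a zero-sized request */
    if (nbuf == NULL) {
        return -1;
    }
    memcpy(nbuf, low, keep);       /* data still referenced by a pointer */
    memcpy(nbuf + keep, src, n);   /* newly read input */

    /* Rebase every pointer by its offset from the lowest one. */
    s->start  = nbuf + (s->start  - low);
    s->text   = nbuf + (s->text   - low);
    s->marker = nbuf + (s->marker - low);
    s->cursor = nbuf + (s->cursor - low);
    s->limit  = nbuf + keep + n;

    free(s->buf);
    s->buf = nbuf;
    return 0;
}
```

The cost is visible here: each refill has to inspect and move four pointers and copy whatever span is still referenced, and a long token can keep that span large. That is the trade-off against simply holding the whole file in one buffer.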