Newsgroups: php.internals
From: fred@fredemmott.co.uk (Fred Emmott)
Date: Tue, 5 Jan 2016 15:57:55 -0800
To: PHP internals
Subject: token_get_all(): additional location information, and raw tokens

I’m planning on adding this functionality in some form to HHVM. However, if it’s also wanted in PHP, I’d rather not add something HHVM-specific, and will be happy to put up RFCs :)

Location Information
--------------------

token_get_all() returns a line number for some tokens. I propose adding an additional TOKEN_EXTENDED_LOCATION flag that would include:

- starting line and character number within that line
- ending line and character number within that line

T_ENCAPSED_AND_WHITESPACE and T_INLINE_HTML seem to be the most common cases of start line !== end line.
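To make the location proposal concrete, here is a rough userland sketch of the information such a flag might expose. TOKEN_EXTENDED_LOCATION does not exist, so this emulates it on top of the current token_get_all() output. The function name tokens_with_location() and the returned array shape are made up for illustration, columns are counted in bytes, and the end position is assumed to be exclusive (the position just past the token); the proposal itself does not settle any of these.

```php
<?php
// Hypothetical emulation of the proposed flag; not an existing PHP API.
// Returns one entry per token with 1-based [line, column] start and end.
function tokens_with_location(string $source): array
{
    $line = 1; // line of the next token's first byte
    $col = 1;  // column (in bytes) of the next token's first byte
    $out = [];

    foreach (token_get_all($source) as $token) {
        // Single-character tokens are plain strings; others are [id, text, line].
        $id = is_array($token) ? $token[0] : null;
        $text = is_array($token) ? $token[1] : $token;

        $start = [$line, $col];

        // Advance the cursor past this token's text.
        $newlines = substr_count($text, "\n");
        if ($newlines === 0) {
            $col += strlen($text);
        } else {
            $line += $newlines;
            // Bytes after the last newline, plus one for the 1-based column.
            $col = strlen($text) - strrpos($text, "\n");
        }

        $out[] = [
            'id' => $id,
            'text' => $text,
            'start' => $start,
            'end' => [$line, $col], // exclusive: first position after the token
        ];
    }

    return $out;
}
```

With a source such as "<?php\n\$a = 1;\n?>\nhi\nthere\n", the T_INLINE_HTML entry starts and ends on different lines, which is the multi-line case mentioned above.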
Raw Tokens
----------

While token_get_all() is documented as returning whatever the lexer sees, in practice third-party software frequently depends on specific output. This gives you 3 options:

1. limit changes you make to the lexer to preserve BC
2. lie about the tokens to preserve BC
3. break BC

In our experience, #3 is not practical and #1 can lead to much more complicated solutions for problems that would be easily fixable in the lexer, so we went for #2. For example, HHVM converts:

- T_HASHBANG to T_INLINE_HTML
- T_ELSEIF to T_ELSE T_WHITESPACE T_IF

However, this means that there’s not currently a way to get the real lexer tokens. I propose adding a TOKEN_RAW flag, which should explicitly allow implementation-specific tokens and make no guarantees about output stability.

For now, this would be a no-op in PHP; however, it would give you more freedom in modifying the lexer in the future (in combination with #2 if the flag isn’t specified).

With thanks,
- Fred
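As an illustration of option #2 from the Raw Tokens section, the sketch below rewrites a raw token stream into a backwards-compatible one. It is not HHVM's implementation. The function name bc_tokens() and the constant RAW_T_HASHBANG are hypothetical (PHP defines no T_HASHBANG), and it assumes the raw lexer emits "else if" as a single T_ELSEIF token whose text includes the whitespace.

```php
<?php
// Hypothetical id for an implementation-specific raw token. A negative value
// is used so it cannot clash with the ids of real T_* constants.
const RAW_T_HASHBANG = -1;

// Rewrite raw tokens into the stream existing consumers expect (option #2).
// Array tokens use the token_get_all() shape: [id, text, line].
function bc_tokens(array $rawTokens): array
{
    $out = [];
    foreach ($rawTokens as $token) {
        if (!is_array($token)) {
            $out[] = $token; // single-character tokens pass through
            continue;
        }
        [$id, $text, $line] = $token;

        if ($id === RAW_T_HASHBANG) {
            // Report the hashbang line as inline HTML.
            $out[] = [T_INLINE_HTML, $text, $line];
        } elseif ($id === T_ELSEIF && preg_match('/^(else)(\s+)(if)\z/i', $text, $m)) {
            // Split "else if" into three tokens; "elseif" does not match
            // and is left as T_ELSEIF by the final branch.
            $out[] = [T_ELSE, $m[1], $line];
            $out[] = [T_WHITESPACE, $m[2], $line];
            $out[] = [T_IF, $m[3], $line + substr_count($m[2], "\n")];
        } else {
            $out[] = $token;
        }
    }
    return $out;
}
```

Under the proposal, a caller passing TOKEN_RAW would skip a step like this and receive the lexer's own tokens, with no stability guarantee.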