Newsgroups: php.internals
From: rowan.collins@gmail.com (Rowan Tommins)
To: PHP Internals
Date: Sun, 28 Jun 2020 19:53:21 +0100
Subject: Improving output of syntax errors

Hi all,

During the discussion of "T_PAAMAYIM_NEKUDOTAYIM", there was broad agreement that internal token names should not be included in errors shown to users. I now have an implementation of this.

Note that we currently only customise the token descriptions placed into Bison's hard-coded template, so the format remains "syntax error, unexpected %s, expecting %s or %s or %s".

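
To make that constraint concrete, here is a minimal Python sketch of the idea. It is not the C code from the patch, and the function name `format_syntax_error` is hypothetical. It assumes only what is stated above: the sentence structure is fixed, and the only parts we control are the descriptions substituted for each %s. Bison itself decides how many expected tokens are listed; the sketch simply joins whatever it is given.

```python
def format_syntax_error(unexpected, expected=()):
    """Fill a Bison-style message template with token descriptions.

    Only the descriptions passed in are customisable; the surrounding
    wording ("syntax error, unexpected ..., expecting ... or ...") is fixed.
    """
    message = "syntax error, unexpected %s" % unexpected
    if expected:
        # Alternatives are joined with " or ", mirroring "%s or %s or %s".
        message += ", expecting " + " or ".join(expected)
    return message


# Token descriptions taken from the proposed-format example in this message.
print(format_syntax_error(
    'token "<<"',
    ['identifier', '"static"', '"namespace"', '"\\"'],
))
# prints: syntax error, unexpected token "<<", expecting identifier or "static" or "namespace" or "\"
```

Changing the sentence structure itself (for instance, using commas instead of repeated "or") would mean replacing the template rather than the descriptions, which is outside what this change does.
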
I have distinguished between two types of token:

* Tokens which always represent the same text (e.g. keywords and operators) are represented by their standard form, e.g. 'unexpected token "static", expecting "function" ...'
* Tokens which have variable content are given a user-friendly name, shown alongside the actual text when possible, e.g. 'unexpected identifier "foo", expecting quoted string ...'

As a special case, quoted strings show the string's *content* in double quotes, e.g. 'unexpected quoted string "foo" ...' rather than 'unexpected quoted string ""foo"" ...' or 'unexpected quoted string "'foo'" ...'. A "..." is also included if the text is longer than 30 bytes (where previously it would have been silently truncated).

For example, given the following:

    <<<<<<<<<<<<

The current 8.0 alpha will show this:

    Parse error: syntax error, unexpected '<<' (T_SL), expecting identifier (T_STRING) or static (T_STATIC) or namespace (T_NAMESPACE) or \\ (T_NS_SEPARATOR) in filename.php on line N

The proposed patch will instead show this:

    Parse error: syntax error, unexpected token "<<", expecting identifier or "static" or "namespace" or "\" in filename.php on line N

For more examples, see: https://rwec.co.uk/x/php-parse-errors/comparison.html

The patch can be reviewed at: https://github.com/php/php-src/pull/5722

I am happy to post a small RFC if people think this requires a vote. Any other feedback is welcome.

(As an aside, the other commonly requested change was to include column numbers; this appears to be feasible, but definitely more complex, and with potential performance trade-offs. I hope to revisit this later.)

Regards,

-- 
Rowan Tommins (né Collins)
[IMSoP]