Newsgroups: php.internals
From: ajf@ajf.me (Andrea Faulds)
To: internals@lists.php.net
Date: Fri, 25 Mar 2016 14:20:13 +0000
Subject: Why is 0x7F permitted in PHP identifiers?

Hi everyone,

Identifiers in PHP source code (including variable names with $) conform to the regex /[_a-zA-Z\x7F-\xFF][_0-9a-zA-Z\x7F-\xFF]*/. Most of this regex is pretty standard: it allows alphanumeric ASCII characters and underscores, plus any byte with the 8th bit set (presumably to allow any extension of ASCII, such as Latin-1 or UTF-8, to be used).

But there's one part of this I find rather curious: why is \x7F included? It's not a high-byte/8-bit character; it's a 7-bit ASCII character, and a control character (DEL) at that. Unless there's some ASCII extension which reuses that value as a printing character, I assume it must have been a mistake to include this character. As a control character, it is invisible and difficult to type, and it might do weird things in some terminal emulators. I can't see the value in permitting it within an identifier.
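To make the oddity concrete, here is a minimal sketch that runs the regex quoted above against a few byte strings. It uses Python's `re` module on raw bytes purely for illustration and is not PHP's lexer. `STRICT_RULE`, `is_identifier`, and the sample names are hypothetical; the stricter pattern only assumes that the high range would start at 0x80, and is not taken from any patch.

```python
import re

# The identifier pattern quoted above, applied to raw bytes,
# because the rule is written in terms of byte values.
CURRENT_RULE = re.compile(rb"[_a-zA-Z\x7F-\xFF][_0-9a-zA-Z\x7F-\xFF]*")

# Hypothetical stricter variant (an assumption for comparison only):
# the high range starts at 0x80, so DEL (0x7F) is excluded.
STRICT_RULE = re.compile(rb"[_a-zA-Z\x80-\xFF][_0-9a-zA-Z\x80-\xFF]*")


def is_identifier(rule, name: bytes) -> bool:
    """Return True if the whole byte string matches the given rule."""
    return rule.fullmatch(name) is not None


samples = [
    b"foo",                      # plain ASCII
    b"_bar9",                    # underscore and digit
    "caf\u00e9".encode("utf-8"),  # UTF-8 bytes, all >= 0x80 for the accent
    b"9lives",                   # starts with a digit
    b"foo\x7f",                  # trailing DEL control character
    b"\x7f",                     # DEL on its own
]

for name in samples:
    print(name, is_identifier(CURRENT_RULE, name), is_identifier(STRICT_RULE, name))
```

The two rules agree on every sample except the last two: only the pattern whose range starts at \x7F accepts a name containing the DEL byte.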
I've done a little bit of looking around, and I can't find an important ASCII extension which changes what 0x7F does. Given that, I assume it was simply a mistake. But one of you might be able to enlighten me otherwise.

I've filed a bug report, and made a patch to fix this in php-src and php-langspec master:

https://bugs.php.net/bug.php?id=71897

Thanks!
-- 
Andrea Faulds
https://ajf.me/