Hi everyone,
Identifiers in PHP source code (including variable names with $) 
conform to the regex /[_a-zA-Z\x7F-\xFF][_0-9a-zA-Z\x7F-\xFF]*/. Most of 
this regex is pretty standard: it allows alphanumeric ASCII characters 
and underscores, plus any character with the 8th bit set (presumably to 
allow any extension of ASCII, such as Latin-1 or UTF-8, to be used).
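For anyone who wants to poke at the pattern, here is a minimal sketch of what it accepts. It is written in Python on raw bytes purely to make the byte-level behaviour explicit; the helper name is_identifier is made up for illustration, and anchoring the pattern to the whole name is an assumption about how it is meant to be applied.

```python
import re

# The identifier pattern quoted above, applied to raw bytes.
IDENT = re.compile(rb"[_a-zA-Z\x7F-\xFF][_0-9a-zA-Z\x7F-\xFF]*")


def is_identifier(name: bytes) -> bool:
    """Return True if the whole byte string fits the pattern (hypothetical helper)."""
    return IDENT.fullmatch(name) is not None


print(is_identifier(b"foo_bar1"))                    # True: plain ASCII name
print(is_identifier("caf\u00e9".encode("utf-8")))    # True: UTF-8 bytes 0xC3 0xA9 are high bytes
print(is_identifier("caf\u00e9".encode("latin-1")))  # True: single high byte 0xE9
print(is_identifier(b"1abc"))                        # False: leading digit is rejected
```

Because the high range is byte-based, the pattern does not care which ASCII extension the source file uses; any byte at or above the lower bound is accepted.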
But there's one part of this I find rather curious: why is \x7F 
included? It's not a high-byte/8-bit character; it's a 7-bit ASCII 
character, and a control character at that. Unless there's some ASCII 
extension which reuses that value as a printing character, I assume it 
must have been a mistake to include this character. As a control 
character, it is invisible and difficult to type, and it might do weird 
things in some terminal emulators. I can't see the value in permitting 
it within an identifier.
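The point about 0x7F can be checked directly. The sketch below, again in Python on raw bytes, shows that 0x7F has the 8th bit clear and is not printable, and compares the current pattern with a variant whose high range starts at 0x80. That variant is only an assumption about what a tightened pattern might look like; it is not copied from the patch.

```python
import re

DEL = 0x7F

# 0x7F fits in seven bits: the 8th bit (0x80) is clear.
print(DEL & 0x80 == 0)         # True
# DEL is a control character, so it is not printable.
print(chr(DEL).isprintable())  # False

# The pattern as it stands today.
current = re.compile(rb"[_a-zA-Z\x7F-\xFF][_0-9a-zA-Z\x7F-\xFF]*")
# Assumed tightening: start the high range at 0x80 instead of 0x7F.
tightened = re.compile(rb"[_a-zA-Z\x80-\xFF][_0-9a-zA-Z\x80-\xFF]*")

name = b"foo\x7fbar"
print(current.fullmatch(name) is not None)    # True: DEL is accepted today
print(tightened.fullmatch(name) is not None)  # False: DEL is rejected
```

Names made only of ASCII letters, digits, underscores, and bytes from 0x80 upward match both patterns, so only identifiers that actually contain 0x7F would be affected by such a change.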
I've done a little bit of looking around, and I can't find an important 
ASCII extension which changes what 0x7F does. Given that, I assume it 
was simply a mistake. But one of you might be able to enlighten me 
otherwise.
I've filed a bug report, and made a patch to fix this in php-src and 
php-langspec master:
https://bugs.php.net/bug.php?id=71897
Thanks!
Andrea Faulds 
https://ajf.me/
Interestingly, extract() skips keys with \x7F: https://3v4l.org/ZC9ZA
Scott Arciszewski 
Chief Development Officer 
Paragon Initiative Enterprises https://paragonie.com/
On Fri, Mar 25, 2016 at 1:25 PM, Scott Arciszewski <scott@paragonie.com> wrote:
Interestingly, extract() skips keys with \x7F: https://3v4l.org/ZC9ZA
Also, the keys after the \x7F were not present in HHVM or PHP 7; however, in 
5.5-5.6 you get 
[9]=>string(1) "" 
along with the key that came after it. That's very strange indeed!
Scott Arciszewski
Chief Development Officer
Paragon Initiative Enterprises https://paragonie.com/