Newsgroups: php.internals
Date: Sun, 17 Sep 2017 13:37:13 +0100
In-Reply-To: <82cc3de5-6aac-6656-cee1-a83e1e3808b0@gmx.de>
References: <7E527061-26D5-4E0C-BAF7-A6F1A940053B@gmail.com> <82cc3de5-6aac-6656-cee1-a83e1e3808b0@gmx.de>
To: internals@lists.php.net
Subject: Re: [PHP-DEV] Progress or just 'a mess'?
From: rowan.collins@gmail.com (Rowan Collins)

On 17 September 2017 13:18:44 BST, "Christoph M. Becker" wrote:
>On 17.09.2017 at 12:53, Rowan Collins wrote:
>
>> I checked the PHP lang-spec repo expecting to find a set of Unicode
>> classes, but it currently mentions "U+0080-U+00FF":
>> https://github.com/php/php-langspec/blob/master/spec/09-lexical-structure.md#names
>> That seems wrong to me, unless I'm looking at the wrong definition -
>> the first part of that range is control characters, and you can have
>> variables called things like $🐘 (with an emoji as the entire name).
>
>The specification in the PHP manual[1] appears to be more appropriate
>for our current implementation:
>
>| As a regular expression, it would be expressed thus:
>| '[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*'
>
>With regard to control characters: that depends on the chosen character
>encoding; for instance in Windows-1252 the ¢ character is mapped to
>\xA2.
>
>[1]

Ah, so the mistake in the spec is that these aren't actually Unicode code points at all, but allowed *bytes*, which happen to allow for the UTF-8 encoding of pretty much any Unicode code point.

That makes much more sense, but doesn't answer the other question, of whether there's a working definition of what we mean by "case insensitive".

Regards,
-- 
Rowan Collins
[IMSoP]
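
To illustrate the bytes-versus-code-points point above, here is a minimal sketch in Python rather than PHP. It assumes the manual's pattern is applied to the raw bytes of a name, which may not match exactly how the PHP lexer is implemented. The helper name `is_valid_name` is made up for this example.

```python
import re

# The pattern quoted from the PHP manual, compiled as a *bytes* pattern.
# Assumption: names are checked byte by byte, not per decoded character.
NAME_BYTES = re.compile(rb'[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*')


def is_valid_name(name: str, encoding: str = "utf-8") -> bool:
    """Return True if the encoded bytes of `name` fully match the pattern.

    Hypothetical helper for illustration only.
    """
    return NAME_BYTES.fullmatch(name.encode(encoding)) is not None


elephant = "\U0001F418"  # the elephant emoji used as a variable name above

# Every byte of the UTF-8 encoding falls in the \x7f-\xff range.
print(elephant.encode("utf-8"))           # b'\xf0\x9f\x90\x98'
print(is_valid_name(elephant))            # True

# A leading digit is rejected, as for ordinary ASCII names.
print(is_valid_name("1abc"))              # False

# In Windows-1252 the cent sign is the single byte \xa2, which also matches.
print("\u00a2".encode("cp1252"))          # b'\xa2'
print(is_valid_name("\u00a2", "cp1252"))  # True
```

Because every byte of a multi-byte UTF-8 sequence is 0x80 or above, such a byte-level rule accepts nearly any non-ASCII code point without knowing anything about Unicode classes. That is also why it offers no basis for a Unicode-aware notion of case.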