Newsgroups: php.internals
Xref: news.php.net php.internals:106042
From: ben@benramsey.com (Ben Ramsey)
To: Rowan Collins
Cc: internals@lists.php.net
Date: Sun, 23 Jun 2019 10:29:54 -0500
Subject: Re: [PHP-DEV] [RFC] Desire to move RFC add_str_begin_and_end_functions to a vote
Message-ID: <8CFCFE96-E2B7-456B-85A3-8737754C59D6@benramsey.com>
In-Reply-To: <3E2100B1-7BF7-4C9F-AA77-D82924A2D5FC@gmail.com>

> On Jun 23, 2019, at 05:35, Rowan Collins wrote:
>
> On 22 June 2019 20:56:24 BST, Ben Ramsey wrote:
>> Perhaps it would only be an issue with the case-insensitive versions,
>> as Nikita points out? If so, can someone provide some example strings
>> where an mb_starts_with_ci() would return true, while
>> str_starts_with_ci() would return false?
>
>
> That's easy: any character that has a lower- and uppercase form, and is
> not represented as one byte in the target encoding. For that matter, any
> such character in the non-ASCII section of a single-byte encoding, since
> a non-mbstring case insensitive flag would presumably leave everything
> other than ASCII letters untouched.
>
> So, any non-Latin script, like Greek or Cyrillic; any accented
> characters, unless you're lucky and they're represented by ASCII-letter
> plus combining modifier; the Turkish "i", which if I remember rightly
> has three forms not two; and so on.

According to Google, “İyi akşamlar” is the Turkish phrase for “Good evening” (Turkish speakers, please correct me if this is wrong). However, using the existing mb_* functions, I can’t get mb_stripos() to return 0 when trying to see if the string “İYI AKŞAMLAR” begins with “i̇yi.”

I’m just using UTF-8, so maybe there’s an encoding issue here?

$string = 'İyi akşamlar';
$upper = mb_strtoupper($string);
$lowerChars = mb_strtolower(mb_substr($string, 0, 3));

var_dump($string, $upper, $lowerChars);
var_dump(mb_stripos($upper, $lowerChars));
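
The kind of case Rowan describes can be sketched with functions that already exist. This is only an illustration, not part of the RFC: the helper names starts_with_ci_bytes() and starts_with_ci_mb() are hypothetical stand-ins for the proposed str_starts_with_ci() and mb_starts_with_ci(), built here on strncasecmp() and mb_stripos(). It assumes UTF-8 source text and that the mbstring extension is loaded.

```php
<?php
// Hypothetical helpers for illustration only; these names are not from the RFC.

// Byte-wise check: strncasecmp() folds case for ASCII letters only,
// so multibyte characters must match byte for byte.
function starts_with_ci_bytes(string $haystack, string $needle): bool
{
    return strncasecmp($haystack, $needle, strlen($needle)) === 0;
}

// Multibyte-aware check built on the existing mb_stripos().
function starts_with_ci_mb(string $haystack, string $needle): bool
{
    return mb_stripos($haystack, $needle, 0, 'UTF-8') === 0;
}

$haystack = 'ΑΘΗΝΑ'; // Greek capital letters, two bytes each in UTF-8
$needle   = 'αθ';    // the same first two letters in lowercase

var_dump(starts_with_ci_bytes($haystack, $needle)); // bool(false)
var_dump(starts_with_ci_mb($haystack, $needle));    // bool(true)
```

Greek is used here rather than the Turkish İ because its upper- and lowercase letters map one to one. The dotted capital İ can lowercase to two code points (i followed by a combining dot), which is likely one reason the mb_stripos() call in the message does not return 0.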