Newsgroups: php.internals
Date: Fri, 24 May 2013 08:34:31 -0700
To: Ferenc Kovacs
Cc: Nikita Popov, PHP internals
Subject: Re: [PHP-DEV] Proposal for better UTF-8 handling
From: aharvey@php.net (Adam Harvey)

On 24 May 2013 08:26, Ferenc Kovacs wrote:
> On Fri, May 24, 2013 at 3:09 PM, Nikita Popov wrote:
>> We already have a lot of functions for multibyte string handling. Let me
>> list a few:
>>
>> * The str* functions. Most of them are safe for usage with UTF-8.
>>   Exceptions are basically everything where you manually provide an
>>   offset, e.g. writing substr($str, 0, 100) is not safe.
>>   substr($str, 0, strpos($str, 'xyz')) on the other hand is.
>> * The mb* functions. They work with various encodings and usually make
>>   use of character offsets and lengths rather than byte offsets and
>>   lengths. They are not necessary most of the time, but useful for the
>>   aforementioned substr call with hardcoded offsets.
>> * The Intl extension. This gives you *real* Unicode support, as in
>>   collations, locales, transliteration, etc.
>> * The grapheme* functions, which are also part of intl. They work with
>>   grapheme cluster offsets and lengths.
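The byte-offset versus character-offset distinction in the list above can be sketched as follows. This is a minimal illustration, not code from the thread: the sample string is made up, and it assumes the script is saved as UTF-8 and the mbstring extension is loaded.

```php
<?php
// Hypothetical sample text; "ï" and "é" each take two bytes in UTF-8.
$str = "naïve café xyz";

// Hard-coded byte offset: substr() counts bytes, so this cut lands in the
// middle of the two-byte sequence for "ï" and leaves invalid UTF-8 behind.
$byteCut = substr($str, 0, 3);

// Character offset: mb_substr() counts characters in the given encoding.
$charCut = mb_substr($str, 0, 3, 'UTF-8');

// Offset taken from strpos(): it points at the start of a match, which in
// valid UTF-8 is always a character boundary, so the byte-based cut is safe.
$prefix = substr($str, 0, strpos($str, 'xyz'));

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false)
var_dump($charCut);                             // "naï"
var_dump(mb_check_encoding($prefix, 'UTF-8'));  // bool(true)
```

The grapheme* functions from intl go one step further and count grapheme clusters, which matters once combining marks are involved; that case is left out here to keep the sketch short.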
>>
>> Anyway, my point is that just adding *yet another* set of string functions
>> won't solve anything, just make things even more complicated than they
>> already are. I'm not strictly opposed to adding more functions if they are
>> necessary, but one has to be aware of what there already is and how the new
>> functions will integrate.
>>
>> Nikita
>>
>
> did you just forget the pcre functions with the /u modifier?!?!
> :P

And that's without even touching PECL. :)

I agree with Nikita: I'm not against adding more Unicode/charset
handling functions if they make sense (and I haven't looked at the code
for this particular proposal yet), particularly if they'd be part of a
default build, but enough water has hopefully passed under the bridge
since the PHP 6 days that it might be time to canvass ideas on a less
piecemeal approach to character set handling and internationalisation
for PHP 5.5+1 or PHP 5.5+2.

Adam