Newsgroups: php.internals
From: yohgaki@ohgaki.net (Yasuo Ohgaki)
Date: Mon, 17 Feb 2014 13:56:57 +0900
Subject: Re: [PHP-DEV] PHP6 wiki page
To: Stas Malyshev
Cc: Rasmus Lerdorf, Rowan Collins, "internals@lists.php.net"
In-Reply-To: <53017E72.3050807@sugarcrm.com>

Hi Stas,

On Mon, Feb 17, 2014 at 12:13 PM, Stas Malyshev wrote:

> > operation. There are a ton of non-obvious things beyond simple string
> > manipulation. String collation alone is massively complicated, for
> > example.
>
> Oh yes, and if somebody thinks case sensitivity is weird now, wait until
> Unicode gets into play. There for some chars when you change the case
> string length changes, and for some conversion is not roundtrip-safe.
> And you have various long form/short form combining issues which means
> you need to normalize everything on every corner. So letting Unicode
> into things like identifiers opens a huge container of worms.
> Also, if one wants to appreciate what other cans of worms are hiding
> there, I recommend this oldie but goodie:
> http://stackoverflow.com/a/6163129/214196
> It's about Perl, but we'd have many of the same issues.

Nice article. I mostly agree.

"Code that converts unknown characters to ? is broken, stupid, braindead,
and runs contrary to the standard recommendation, which says NOT TO DO
THAT! RTFM for why not."

While I agree with this (it is bad to accept broken text as valid input),
there are situations where a programmer has to handle broken text anyway.
Ruby finally admitted that a scrub method is needed: String#scrub is
available as of Ruby 2.1.0.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net
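
For anyone who wants to see the issues above concretely, here is a minimal sketch in Python. Python is used only for illustration; this is not PHP or Ruby API, and the helper names (case_length_changes, roundtrips, same_after_nfc, scrub) are made up for this example. The scrub() helper only approximates what Ruby's String#scrub does by default, using Python's own "replace" error handler.

```python
import unicodedata


def case_length_changes(s: str) -> bool:
    """Return True if upper-casing changes the number of code points."""
    return len(s.upper()) != len(s)


def roundtrips(s: str) -> bool:
    """Return True if upper-casing and then lower-casing restores s."""
    return s.upper().lower() == s


def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def scrub(raw: bytes) -> str:
    """Decode as UTF-8, replacing invalid bytes with U+FFFD.

    Hypothetical helper: it mirrors the idea of scrubbing broken text,
    substituting the replacement character rather than '?'.
    """
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    sharp_s = "\u00df"       # LATIN SMALL LETTER SHARP S
    decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT
    composed = "\u00e9"      # precomposed 'e with acute'

    print(case_length_changes(sharp_s))          # upper-casing yields "SS"
    print(roundtrips(sharp_s))                   # "SS".lower() is "ss"
    print(decomposed == composed)                # differ as raw code points
    print(same_after_nfc(decomposed, composed))  # equal once normalized
    print(scrub(b"abc\xffdef"))                  # 0xFF is never valid UTF-8
```

The point of scrub() is that the broken byte stays visible as U+FFFD instead of being silently dropped or rejected, which is the situation where a programmer has to handle broken text rather than refuse it.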