Newsgroups: php.internals
To: Sara Golemon, Yasuo Ohgaki
Cc: PHP internals
Message-ID: <5720CDDC.3080904@thefsb.org>
Date: Wed, 27 Apr 2016 10:34:04 -0400
Subject: Re: [PHP-DEV] [RFC] IntlCharsetDetector
From: fsb@thefsb.org
(Tom Worster)

On 4/26/16 12:10 PM, Sara Golemon wrote:
> On Tue, Apr 26, 2016 at 2:06 AM, Yasuo Ohgaki wrote:
>> Things might have been changed, but as you've mentioned encoding
>> detection is unstable and ICU is poor compared to mbstring's detection
>> at least for Japanese encodings.
>>
> For me, the difference is that I expect further work to be done on
> improving ICU,

Why do you expect that?

When I researched this problem some years ago, I had the impression that a number of attempted solutions had been published and abandoned. I took this to mean that there was a learning experience that ended with the understanding that it's insoluble.

That's why I'm curious if you know of ongoing efforts in ICU. I took a look and saw little activity in the last 10 years.

> while I lack that confidence for mbstring. If the API
> is in place early on, the library can improve underneath it to the
> point it becomes more trustworthy later, but still be usable on older
> versions of PHP (linked against newer libicu).

How would it become more trustworthy? A way to make it trustworthy would need to exist. And somebody would have to work on it.

Tom