From: fsb@thefsb.org (Tom Worster)
Newsgroups: php.internals
Date: Thu, 14 Apr 2016 08:43:33 -0400
Subject: Re: [PHP-DEV] IntlCharsetDetector

On 4/11/16 6:11 PM, Sara Golemon wrote:
> On Mon, Apr 11, 2016 at 9:36 AM, Stanislav Malyshev wrote:
>> The point is even imperfect detection may be useful in certain
>> circumstances, and detector being part of ICU hints that people find it
>> useful enough to spend time implementing and supporting it. We should
>> not ignore that.
>>
> Well, Stas, your informal thumbs up to the idea means enough to me to
> at least formalize it into an RFC even though I was previously feeling
> negative on it.
>
> I may yet vote no on my own RFC after the discussion period, but as
> you say it's worth considering the fact that someone thought it
> reasonable enough to actually build into ICU...

The general problem is impossible. If you constrain the question, for example, as Stas says, by knowing the language and choosing between a given set of codes, then you may have success. And I'm sure I'm not alone in sometimes using a simple heuristic to choose between cp1252 and utf8.

But this does not logically imply that ICU CharsetDetector is a suitable solution in such cases, or that it's a good API or a decent implementation. Or that PHP should expose it. An SO chat doesn't necessarily count as a feature request.

I'd rather people engineered real solutions specific to their requirements than resort to any of the failed attempts to solve the general problem.

Tom
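The cp1252-versus-UTF-8 case is an example of the constrained question: the input is assumed to be in one of exactly two encodings. What follows is a sketch of one common heuristic of that kind, not anything from the original message. It treats input that decodes as strict UTF-8 as UTF-8, falls back to cp1252 otherwise, and rejects bytes that neither decodes. The function name `guess_utf8_or_cp1252` is hypothetical. The heuristic can still be fooled: cp1252 text that happens to form valid UTF-8 sequences (for example the characters "Ã©", bytes C3 A9) is reported as UTF-8.

```python
def guess_utf8_or_cp1252(data: bytes) -> str:
    """Hypothetical helper: pick 'utf-8' or 'cp1252', assuming the input
    is known in advance to be one of those two encodings."""
    # Strict UTF-8 decoding fails on most non-ASCII cp1252 text, because
    # isolated high bytes rarely form valid multi-byte UTF-8 sequences.
    # Pure ASCII is valid in both encodings and is reported as utf-8.
    try:
        data.decode("utf-8", errors="strict")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # Fall back to cp1252. Python's cp1252 codec leaves bytes 0x81, 0x8D,
    # 0x8F, 0x90 and 0x9D undefined, so those bytes fail here as well.
    try:
        data.decode("cp1252", errors="strict")
        return "cp1252"
    except UnicodeDecodeError:
        raise ValueError("input is neither valid UTF-8 nor valid cp1252")


if __name__ == "__main__":
    print(guess_utf8_or_cp1252("café".encode("utf-8")))   # utf-8
    print(guess_utf8_or_cp1252("café".encode("cp1252")))  # cp1252
```

The heuristic works only because the set of candidate encodings was fixed beforehand. It says nothing about input that might be in any encoding.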