To: internals@lists.php.net
From: smalyshev@gmail.com (Stanislav Malyshev)
Date: Mon, 11 Apr 2016 11:36:50 -0500
Subject: Re: [PHP-DEV] IntlCharsetDetector
Message-ID: <570BD2A2.4040504@gmail.com>
In-Reply-To: <57050CAB.1040302@php.net>

Hi!

>> As you say, it doesn't work properly. As a matter of fact, guessing
>> charsets, like timezones, is not possible. You need to know which
>> charset something is in. If not, you need to address *that* problem.

It is true that you cannot detect charsets with 100% accuracy. It is, however, also true that many charsets can be distinguished with enough accuracy to be useful, especially if you know the set of charsets you are dealing with. E.g., Russian had about 5 commonly used encodings before everybody started to use UTF-8, plus several exotic ones. Being able to detect at least the major ones while dealing with a heterogeneous library of Russian-language texts is a great help.
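To illustrate why knowing the candidate set helps, here is a rough sketch in Python. It is not how ICU's detector works; it is a much cruder heuristic. The candidate list, the set of "common letters", and the names `score` and `guess_charset` are all assumptions made for this example. The idea is to decode the bytes with each candidate and keep the decoding that looks most like ordinary lowercase Russian text.

```python
# Assumed candidate set: encodings historically common for Russian text.
CANDIDATES = ["utf-8", "cp1251", "koi8_r", "cp866", "iso8859_5"]

# Frequent lowercase Russian letters; a crude stand-in for a real language model.
COMMON_LETTERS = set("оеаинтсрвлкмдпу")


def score(text: str) -> float:
    """Fraction of characters that are frequent lowercase Russian letters."""
    if not text:
        return 0.0
    return sum(ch in COMMON_LETTERS for ch in text) / len(text)


def guess_charset(data: bytes, candidates=CANDIDATES):
    """Return (encoding, score) for the best-scoring candidate, or (None, 0.0)."""
    best, best_score = None, 0.0
    for name in candidates:
        try:
            decoded = data.decode(name)
        except UnicodeDecodeError:
            continue  # the bytes are not valid in this encoding at all
        s = score(decoded)
        if s > best_score:
            best, best_score = name, s
    return best, best_score


if __name__ == "__main__":
    sample = "Привет, мир! Это проверка определения кодировки."
    for enc in CANDIDATES:
        guessed, confidence = guess_charset(sample.encode(enc))
        print(enc, "->", guessed, round(confidence, 2))
```

A wrong single-byte decoding of Russian text tends to produce uppercase letters, box-drawing characters, or rare letters, so its score drops well below the correct one. The heuristic is imperfect: short inputs, mostly uppercase text, or mixed-language text can fool it, which is exactly the "not 100%" caveat above.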
There may be other cases like this. The point is that even imperfect detection may be useful in certain circumstances, and the detector being part of ICU hints that people find it useful enough to spend time implementing and supporting it. We should not ignore that.

-- 
Stas Malyshev
smalyshev@gmail.com