Newsgroups: php.internals
Xref: news.php.net php.internals:92132
Reply-To: bishop@php.net
In-Reply-To: <57050CAB.1040302@php.net>
References: <57050CAB.1040302@php.net>
Date: Thu, 7 Apr 2016 12:36:42 -0400
To: Sebastian Bergmann
Cc: PHP internals
Subject: Re: [PHP-DEV] IntlCharsetDetector
From: bishop@php.net (Bishop Bettini)

On Wed, Apr 6, 2016 at 9:18 AM, Sebastian Bergmann wrote:

> On 05.04.2016 at 11:05, Derick Rethans wrote:
> > I would advice against adding this.
> >
> > As you say, it doesn't work properly. As a matter of fact, guessing
> > charsets, like timezones, is not possible. You need to know which
> > charset something is in. If not, you need to address *that* problem.
>
> Agreed.

The problem is that developers are going to write code to guess character
sets regardless. Ironically, PHPUnit itself attempts to detect UTF-8. There
is also no shortage of Stack Overflow posts explaining other approaches. My
favorite is the preg_match trick.

I'd rather we include the patch, for a few reasons:

1. so that there's a modern "standard" method of doing so, and that
   "standard" method has plenty of documentation that points people to the
   limitations.
2. to completely expose the underlying ICU, rather than arbitrarily
   deciding one part isn't good for developers to use.
3. to provide an alternative to mb_detect_encoding.

While I can't say whether this will cause more user confusion, I do believe
it adds value: ICU provides a confidence metric, which no other built-in or
buildable solution (to my knowledge) provides.
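
For anyone who hasn't seen the approaches mentioned above, here is a minimal sketch of the preg_match trick next to mb_detect_encoding. The helper name looksLikeUtf8 and the sample strings are made up for illustration; this is not PHPUnit's code and not the proposed IntlCharsetDetector API. It assumes PHP 7 or later with the mbstring extension loaded.

```php
<?php
// Hypothetical helper illustrating the "preg_match trick".
// With the /u modifier, PCRE validates the subject as UTF-8, and
// preg_match() returns false (instead of 0 or 1) when the subject
// is not valid UTF-8. The empty pattern matches any valid string.
function looksLikeUtf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

$utf8   = "caf\xC3\xA9"; // "café" encoded as UTF-8
$latin1 = "caf\xE9";     // "café" encoded as ISO-8859-1

var_dump(looksLikeUtf8($utf8));   // bool(true)
var_dump(looksLikeUtf8($latin1)); // bool(false)

// mb_detect_encoding() picks one name from the candidate list (or
// returns false). It reports no confidence value alongside the guess.
$candidates = ['UTF-8', 'ISO-8859-1'];
var_dump(mb_detect_encoding($utf8, $candidates, true));
var_dump(mb_detect_encoding($latin1, $candidates, true));
```

Note that the first check only answers "is this valid UTF-8?", and the second returns a bare name. Neither says how sure it is, which is the gap the ICU confidence metric would fill.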