Newsgroups: php.internals
Xref: news.php.net php.internals:92176
References: <57050CAB.1040302@php.net>
Date: Fri, 8 Apr 2016 11:20:14 -0700
To: Bishop Bettini
Cc: Sebastian Bergmann, PHP internals
Subject: Re: [PHP-DEV] IntlCharsetDetector
From: pollita@php.net (Sara Golemon)

On Thu, Apr 7, 2016 at 9:36 AM, Bishop Bettini wrote:
> The problem is, developers are going to write code to guess character sets.
>
True. But they're going to put more faith in something in the standard
distribution, assuming it's passed muster.

> Ironically, PHPUnit attempts to detect UTF-8
>
Awwwwwwwwkward....

> I'd rather we include the patch for a few reasons:
>
> 1. so that there's a modern "standard" method of doing so, and that
> "standard" method has plenty of documentation that points people to the
> limitations.
>
In that spirit, how about we put in some stub documentation under the
intl extension with a paragraph or two on why UCharsetDetector *isn't*
wrapped, and why it's such a bad idea to try to solve the problem from
this end.

> 2. to completely expose the underlying ICU, rather than arbitrarily
> deciding one part isn't good for developers to use.
>
Is it arbitrary though?
The fact that coming up with test cases which produce
reasonable/expected results is half crap-shoot makes this an evidence
based decision, not a capricious one.

> 3. to provide an alternative to mb_detect_encoding.
>
And again in that spirit, I think this is a good argument for going
E_DEPRECATED on mb_detect_encoding(). The entire conversation which
led to prototyping an IntlCharsetDetector extension came from the fact
that mb_detect_encoding() wasn't doing its job well. Rather than have
two supported, bad solutions, I think it'd be better to have one
deprecated (and thus unsupported) bad solution (which is only kept for
BC).

> While I can't say if this will or won't cause more user confusion, I do
> believe this adds value: ICU provides a confidence metric, which no other
> in-built or buildable solution (to my knowledge) provides.
>
The confidence metric is useful, but my spidey sense tells me that
it'll simply be ignored.

How about a compromise. I'll reorder this patch to be a standalone
extension and we PECLize it. If someone REALLY wants to throw caution
to the wind, they can, but they're on their own when it gives them
fugly results.

-Sara