Newsgroups: php.internals
From: php@fleshgrinder.com (Fleshgrinder)
To: Sara Golemon, Stanislav Malyshev
Cc: PHP internals
Reply-To: internals@lists.php.net
Date: Wed, 13 Apr 2016 20:18:48 +0200
Subject: Re: [PHP-DEV] [RFC] IntlCharsetDetector
Message-ID: <570E8D88.4080100@fleshgrinder.com>
References: <570C2EB8.4080009@gmail.com>

On 4/12/2016 1:25 AM, Sara Golemon wrote:
> On Mon, Apr 11, 2016 at 4:09 PM, Stanislav Malyshev wrote:
>> The API looks a bit strange - new IntlCharsetDetector($text) and then
>> detect(). Can't we just have detect([$text])?
>>
> I went with a direct wrapping of the underlying API because it always
> feels like we regret adding magic eventually. It's trivial for some
> composer installable library to wrap this into something nicer. In
> fact, one probably WOULD use a userspace library in order to provide
> fallback to mb_detect_encoding() on PHP < 7.1
>
> That said, how do you feel about compromising by adding this function
> in addition to the raw API?
>
> function ucsdet_detect_encoding(string $text, string $hint = null,
>                                 bool $filter = false) {
>   $det = new IntlCharsetDetector($text);
>   if ($hint !== null) {
>     $det->setDeclaredEncoding($hint);
>   }
>   $det->enableInputFiltering($filter);
>   return $det->detect()['name'];
> }
>
> That'll give simplicity for the 80% case (I have a string, I want a
> best guess) but still provide the true API for consideration of
> confidence and/or other guesses.
>
> -Sara
>

I agree with Stanislav here. I think that the HHVM implementation is
much better because it enables dependency injection and does not fall
back to arbitrary native arrays. However, the detect method should take
an optional text argument directly, as Stanislav proposed.

I understand the argument that extending native interfaces sometimes led
to problems but simply copying stuff without thinking of better ways is
also not really solving all problems.

In our native language:

class IntlCharsetDetector {

    fn __construct(string $text = null);

    fn setText(string $text): void;

    fn setDeclaredEncoding(string $encoding): void;

    // I incorporated your proposal here because I think it is
    // extremely useful.
    fn detect(
        string $text = null,
        string $hint = null,
        bool $filter = false
    ): IntlEncodingMatch;

    fn detectAll(string $text = null): array;

    fn inputFilterEnabled(): bool;

    fn enableInputFilter(): void;

    fn disableInputFilter(): void;

    fn getDetectableCharsets(): array;

    fn enableDetectableCharset(string $charset): void;

    fn disableDetectableCharset(string $charset): void;

    static fn getAllDetectableCharsets(): array;

}

NOTE how I changed the methods that take bool arguments in the Java
implementation to direct calls. This removes the necessity to validate
the arguments of the caller and results in a very clean API that does
not require extensive documentation to be easily comprehensible.

class IntlEncodingMatch {

    fn isValid(): bool;

    fn getEncoding(): string;

    fn getConfidence(): int;

    fn getLanguage(): string;

    fn getUtf8(): string;

}

NOTE that I also use *getUtf8* as does HHVM and not simply *getString*
as it is used in the Java implementation, because again it reduces the
amount of documentation reading: it is clear what this method is
supposed to do (return). I did not write *getUTF8* though because I
think it violates the /camelCase/ rule, but that is not really clarified
in the PHP Coding Standards and thus hard to argue about.

Having the *IntlEncodingMatch* object as a result has the advantage that
one does not need to look up the documentation every time they want to
handle the result: no creation of the strings to access the offsets at
runtime, no error-prone mistyping issues, plus the added value of type
hinting against it.

That being said, the procedural functions could still return native
arrays. I do not see a reason why the procedural versions have to be a
1:1 mapping to the OO versions.
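
To make the type hinting point a bit more concrete, here is a rough
userland sketch of what such a match object could look like. This is
only an illustration under my own assumptions: the constructor, the
rule that an empty encoding name means "invalid", the array keys
'confidence' and 'language', and the two helper functions are all
hypothetical and are neither part of the RFC nor of HHVM. I left out
getUtf8 because it would need an actual conversion backend.

```php
<?php
declare(strict_types=1);

// Hypothetical userland stand-in for the proposed match object.
final class IntlEncodingMatch
{
    private $encoding;
    private $confidence;
    private $language;

    // Assumption: the constructor signature is invented for this sketch.
    public function __construct(string $encoding, int $confidence, string $language = '')
    {
        $this->encoding   = $encoding;
        $this->confidence = $confidence;
        $this->language   = $language;
    }

    // Assumption: a match without an encoding name counts as invalid.
    public function isValid(): bool
    {
        return $this->encoding !== '';
    }

    public function getEncoding(): string
    {
        return $this->encoding;
    }

    public function getConfidence(): int
    {
        return $this->confidence;
    }

    public function getLanguage(): string
    {
        return $this->language;
    }
}

// Hypothetical procedural counterpart: flattens a match into a native array.
function intl_encoding_match_to_array(IntlEncodingMatch $match): array
{
    return [
        'name'       => $match->getEncoding(),
        'confidence' => $match->getConfidence(),
        'language'   => $match->getLanguage(),
    ];
}

// Callers can type hint against the object instead of guessing array keys.
function describe_match(IntlEncodingMatch $match): string
{
    if (!$match->isValid()) {
        return 'no match';
    }

    return sprintf('%s (%d%%)', $match->getEncoding(), $match->getConfidence());
}

$match = new IntlEncodingMatch('ISO-8859-1', 42, 'de');
echo describe_match($match), "\n";
```

A mistyped getter name fails loudly with an undefined method error,
whereas a mistyped array key only raises a notice or warning and yields
null. The procedural wrapper shows that an array based API can still be
layered on top without the two having to mirror each other.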

-- 
Richard "Fleshgrinder" Fussenegger