Newsgroups: php.internals
Date: Wed, 21 Dec 2022 15:51:18 +0000
To: internals@lists.php.net
Subject: Re: [PHP-DEV] [RFC] Unicode Text Processing
From: rowan.collins@gmail.com (Rowan Tommins)

On Wed, 21 Dec 2022 at 11:48, Derick Rethans wrote:

> I know what a polyfill is, and I still don't want to see this.

I can 100% guarantee that you will *see* it - as soon as this RFC is even close to being accepted, either the Symfony project or someone else will work on such a polyfill. But if you're not interested in *writing* it, that's fair enough.

> I know what it would entail, but I am rejecting it regardless. "Just an
> optional argument on a couple of methods" increases the complexity.

Complexity can cut both ways: from the point of view of the user, this...

    $value = new Text(
        UConverter::transcode($utf16_value, 'UTF-8', 'UTF-16BE')
    );
    // ...
    echo UConverter::transcode((string)$value, 'UTF-16BE', 'UTF-8');

...is significantly more complex than this:

    $value = new Text($utf16_value, 'UTF-16BE');
    // ...
    echo $value->asBytes('UTF-16BE');

An additional problem with the long version is that UConverter is only available if ext/intl is enabled (mb_convert_encoding is an alternative with the same problem, since it depends on ext/mbstring) - something which was discussed at length when I proposed the removal of utf8_encode and utf8_decode.

> That sounds like an argument for having a sort() method where you can
> override the collator. I would however expect that most people would not
> set a default collation other than "standard" on Text objects though.
> And if something more clever needs to be done, this can be overridden in
> all methods.

In this case, I think I'm arguing from the same angle as you are on encodings - if setting a per-object default collator is a rare action, and not generally useful, let's eliminate the complexity of supporting it, and just leave the method arguments.

> Yes, and that is why the RFC includes a ''TextCollator'' object that
> does precisely that.

Indeed; I think mostly what I'm saying is that users should always look at the object first, and the string format only if they really need it.
Specifically, the current proposal lists parameters as "string $collation" implying this:

    // strings can be passed directly
    $a->compareWith($b, 'en-u-ks-level1');
    // object is only a way of creating the string
    $a->compareWith($b, (new TextCollator('en'))->setCaseInsensitive()->getCollationString());

I'm proposing instead that they be "TextCollator $collation", implying this:

    // object is the normal argument
    $a->compareWith($b, (new TextCollator('en'))->setCaseInsensitive());
    // strings are only a way of getting an object
    $a->compareWith($b, TextCollator::fromCollationString('en-u-ks-level1'));

A third option would be to make the parameters a union "TextCollator|string $collation", implying this:

    // object is supported directly
    $a->compareWith($b, (new TextCollator('en'))->setCaseInsensitive());
    // so are strings if you already have one for some reason
    $a->compareWith($b, 'en-u-ks-level1');

Regards,
--
Rowan Tommins
[IMSoP]