Newsgroups: php.internals
To: Côme Chilliet, internals@lists.php.net
Date: Thu, 25 Nov 2021 22:14:01 +1100
Subject: Re: [PHP-DEV] [VOTE] Locale-independent case conversion
From: tstarling@wikimedia.org (Tim Starling)

On 25/11/21 7:57 pm, Côme Chilliet wrote:
> Hello,
>
> The RFC is missing information about alternatives:
> Do all of these functions have an mbstring version?

The following functions have an mbstring version: strtolower, strtoupper, stristr, stripos and strripos. mb_convert_case() provides functionality equivalent to lcfirst, ucfirst and ucwords.
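A minimal sketch of how mbstring can stand in for ucfirst() and ucwords() on UTF-8 input, assuming the mbstring extension is loaded. The helper names utf8_ucfirst() and utf8_ucwords() are hypothetical, not part of any PHP API. One caveat: MB_CASE_TITLE also lower-cases the rest of each word, which ucwords() does not, so the match is close rather than exact.

```php
<?php
// Sketch only: assumes ext/mbstring is available and the input is UTF-8.
// utf8_ucfirst() and utf8_ucwords() are hypothetical helper names.

function utf8_ucfirst(string $s): string
{
    // Upper-case the first code point and leave the rest untouched, like ucfirst().
    $first = mb_substr($s, 0, 1, 'UTF-8');
    $rest  = mb_substr($s, 1, null, 'UTF-8');
    return mb_strtoupper($first, 'UTF-8') . $rest;
}

function utf8_ucwords(string $s): string
{
    // MB_CASE_TITLE upper-cases the first letter of each word but also
    // lower-cases the remaining letters, unlike ucwords().
    return mb_convert_case($s, MB_CASE_TITLE, 'UTF-8');
}

echo utf8_ucfirst('élan vital'), "\n"; // Élan vital
echo utf8_ucwords('élan VITAL'), "\n"; // Élan Vital
```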
There is no mbstring version of str_ireplace; see https://bugs.php.net/bug.php?id=75225

There is no mbstring equivalent for the array sorting functions with SORT_FLAG_CASE, but there is Collator::asort() in intl.

> Are those locale dependent or have an option for it?

The mbstring functions are locale-independent. Unfortunately there do not seem to be PHP wrappers for the family of case conversion functions in ICU's ustring.h. There are IntlChar::tolower() and IntlChar::toupper(), but they provide locale-independent case conversion, equivalent to mbstring. It's not ideal to change the case of a string character by character, since some languages have multi-character mappings. ICU calls this context-sensitive case conversion.

Considering the lack of wide character support or context-sensitive case conversion in the existing strtoupper/strtolower, I would consider this missing functionality rather than functionality which I am removing.

> To reuse the example from the RFC, if I want to convert a UTF string to uppercase using Turkish rules and get dotted capital I, what should I use?

For case-insensitive comparison you can use Collator. But for display you just have to do it yourself. For the Turkish Wikipedia and other Turkic language websites we are currently using str_replace().

-- 
Tim Starling
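On the Turkish question above, a minimal sketch of the "do it yourself" approach, assuming ext/mbstring and UTF-8 input. It pre-maps the two letters whose Turkish case pairs differ from the Unicode defaults (i/İ and ı/I) with str_replace() and then hands the string to mb_strtoupper() or mb_strtolower(). The function names are hypothetical, and this is an illustration of the technique, not the code Wikipedia actually runs.

```php
<?php
// Sketch only: hypothetical helpers, assuming ext/mbstring and UTF-8 input.
// Turkish pairs dotted i with dotted capital İ (U+0130) and dotless ı (U+0131)
// with I, whereas Unicode's default case mapping pairs i with I.

function turkish_strtoupper(string $s): string
{
    // Pre-map the two special letters. str_replace() is byte-based, which is
    // safe here because a valid UTF-8 needle only matches at character boundaries.
    $s = str_replace(['i', 'ı'], ['İ', 'I'], $s);
    return mb_strtoupper($s, 'UTF-8');
}

function turkish_strtolower(string $s): string
{
    // Map I and İ first: with full case mapping (PHP 7.3+), mb_strtolower()
    // would otherwise turn İ into "i" followed by U+0307 COMBINING DOT ABOVE.
    $s = str_replace(['I', 'İ'], ['ı', 'i'], $s);
    return mb_strtolower($s, 'UTF-8');
}

echo turkish_strtoupper('istanbul ılık'), "\n"; // İSTANBUL ILIK
echo turkish_strtolower('İSTANBUL ILIK'), "\n"; // istanbul ılık
```

The order of the replacement pairs does not matter in either function, because none of the replacement strings contains any of the search strings.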