Newsgroups: php.internals
Message-ID: <7e42d1eb-4924-4a57-a4c1-412a665a4496@wikimedia.org>
Date: Wed, 7 Feb 2024 14:56:40 +1100
To: Ayesh Karunaratne
Cc: youkidearitai, php internals
References: <29b0a205-8903-4ae8-b1e4-45db846fee7f@wikimedia.org>
Subject: Re: [PHP-DEV][VOTE][RFC] mb_ucfirst and mb_lcfirst functions
From: tstarling@wikimedia.org (Tim Starling)

On 7/2/24 13:43, Ayesh Karunaratne wrote:
> Hi Tim,
> Now that the RFC is restarted, could you mention some examples in
> Georgian that might be good test cases?
>
> I was thinking there might be some good test cases in Turkish, but
> couldn't find any. The RFC has examples
> (https://github.com/php/php-src/pull/13161) in Vietnamese, but they
> are correct for both "uppercase first character" and titlecase
> conversions.

Any Georgian word would do. Your ASCII test case is "abc". The Georgian equivalent for that would be "აბგ" (ani bani gani, U+10D0 U+10D1 U+10D2) which should remain the same after passing through mb_ucfirst(). Compare mb_strtoupper("აბგ") -> "ᲐᲑᲒ" (U+1C90 U+1C91 U+1C92).

On the task I mentioned that ligatures are also affected. I gave the example mb_ucfirst("ǉ") -> "ǈ", that is, U+01C9 -> U+01C8. You could add a test case for that. Compare mb_strtoupper("ǉ") -> "Ǉ" (U+01C7).

To repeat my rationale -- we can view ucfirst() either through a technical lens (convert the first character of a string to upper case) or through a natural language lens (convert a string to sentence case, with the initial letter capitalised per local conventions).
I am arguing to make mb_ucfirst() a natural language extension of ucfirst(), because applying the technical extension would produce results that look quite jarring in a natural language context.

There are some edge cases which are not quite right. To really do a good job, a new case map will be needed. But if we document it as being for natural language, and set the right expectations, we can fix the edge cases later.

-- 
Tim Starling
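
A rough sketch of how the Georgian and ligature test cases discussed above could be exercised side by side. It is not the RFC's implementation. The helper names ucfirst_titlecase_sketch() and ucfirst_uppercase_sketch() are hypothetical, and the sketch assumes that mb_convert_case() with MB_CASE_TITLE applies Unicode's titlecase mapping to a single character on the PHP version in use (8.0 or later for the null length argument to mb_substr()).

```php
<?php
// Hypothetical helper, "natural language" reading: titlecase only the
// first character and leave the remainder of the string untouched.
function ucfirst_titlecase_sketch(string $s): string
{
    $first = mb_substr($s, 0, 1, 'UTF-8');
    $rest  = mb_substr($s, 1, null, 'UTF-8');
    return mb_convert_case($first, MB_CASE_TITLE, 'UTF-8') . $rest;
}

// Hypothetical helper, "technical" reading: uppercase the first character.
function ucfirst_uppercase_sketch(string $s): string
{
    $first = mb_substr($s, 0, 1, 'UTF-8');
    $rest  = mb_substr($s, 1, null, 'UTF-8');
    return mb_strtoupper($first, 'UTF-8') . $rest;
}

// Render a string as a list of code points, e.g. "U+10D0 U+10D1".
function codepoints(string $s): string
{
    $out = [];
    foreach (mb_str_split($s, 1, 'UTF-8') as $ch) {
        $out[] = sprintf('U+%04X', mb_ord($ch, 'UTF-8'));
    }
    return implode(' ', $out);
}

$cases = [
    "abc",                      // ASCII baseline
    "\u{10D0}\u{10D1}\u{10D2}", // Georgian ani bani gani
    "\u{01C9}",                 // small lj ligature
];

foreach ($cases as $input) {
    printf(
        "%-22s titlecase: %-22s uppercase: %s\n",
        codepoints($input),
        codepoints(ucfirst_titlecase_sketch($input)),
        codepoints(ucfirst_uppercase_sketch($input))
    );
}
```

Printing code points rather than the characters themselves keeps the comparison readable in terminals or fonts that lack the Georgian Mtavruli block or the digraph letters.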