Newsgroups: php.internals
Xref: news.php.net php.internals:120371
Message-ID: <8a223970-45b3-2d9f-cd91-1389965202ba@gmail.com>
Date: Sat, 20 May 2023 17:56:21 +0200
To: internals@lists.php.net
Subject: Re: [PHP-DEV] RFC [Discussion]: mb_str_pad
From: dossche.niels@gmail.com (Niels Dossche)

Hi Rowan

On 5/20/23 17:13, Rowan Tommins wrote:
> On 20 May 2023 13:53:20 BST, Niels Dossche wrote:
>> RFC: https://wiki.php.net/rfc/mb_str_pad
>
> Hi Niels,
>
> This seems like a reasonable addition. My only hesitation is that it will share with other mbstring functions the slightly dubious definition of "character" as "code point", rather than "grapheme", when dealing with Unicode strings.
>
> This is most easily demonstrated using combining diacritics, e.g. "Franc\u{0327}ais" is 9 code points long, but visually identical to the 8 code point "Fran\u{00E7}ais" used in your examples. Unicode defines "graphemes" or "grapheme clusters" to better match the common intuition of what a "character" means.
>

Thanks for your insight. This is a good point.
I've added a clarification in the RFC text to make clear the definition of character is code point in this case, consistent with mbstring.

> Perhaps we should instead, or also, add a "grapheme_strpad" function to ext/intl?
>
> Regards,
>

I've added this suggestion to the future scope section.

Kind regards
Niels
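
For illustration, the code point versus grapheme difference discussed above can be sketched as follows. This is a minimal Python sketch, not PHP and not the RFC's implementation. The helper names (code_point_len, approx_grapheme_len, pad_right) are hypothetical, and the grapheme count is only approximated by skipping combining marks, which is an assumption that holds for this example but is not full Unicode grapheme cluster segmentation.

```python
import unicodedata


def code_point_len(s: str) -> int:
    # Python's len() counts code points, which is the definition of
    # "character" the thread attributes to mbstring.
    return len(s)


def approx_grapheme_len(s: str) -> int:
    # ASSUMPTION: approximate grapheme count by ignoring combining marks.
    # Real grapheme segmentation follows more rules than this.
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)


def pad_right(s: str, width: int, length_fn, pad: str = " ") -> str:
    # Hypothetical helper: pad on the right until length_fn(s) reaches width.
    missing = max(0, width - length_fn(s))
    return s + pad * missing


decomposed = "Franc\u0327ais"   # "c" followed by a combining cedilla
precomposed = "Fran\u00e7ais"   # single precomposed "ç"

print(code_point_len(decomposed), code_point_len(precomposed))
print(approx_grapheme_len(decomposed), approx_grapheme_len(precomposed))

# Padding to width 10: the two visually identical strings get a different
# number of pad characters when length is measured in code points.
print(pad_right(decomposed, 10, code_point_len, "*"))
print(pad_right(precomposed, 10, code_point_len, "*"))
print(pad_right(decomposed, 10, approx_grapheme_len, "*"))
```

With code point counting, the decomposed string receives one pad character fewer than the precomposed one, even though both render the same. Normalizing the input to NFC before padding removes the difference in this particular case, but not for sequences that have no precomposed form.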