Newsgroups: php.internals
Date: Fri, 04 Oct 2024 18:20:52 +0200
From: tim@bastelstu.be (Tim Düsterhus)
To: youkidearitai
Cc: php internals
Subject: Re: [PHP-DEV][DISCUSSION] Multibyte for levenshtein function
Message-ID: <754710ced33bdb2f9840d96ba0c58424@bastelstu.be>

Hi

On 2024-09-25 09:21, youkidearitai wrote:
> I tried implement mb_levenshtein function and create an RFC.
> https://wiki.php.net/rfc/mb_levenshtein
> https://github.com/php/php-src/pull/16043
>
> I would like discussion, feel free to comment.

Thank you for your RFC. I share the concern raised by cmb in the PR
discussion:

https://github.com/php/php-src/pull/16043#issuecomment-2374574538

Generally, working with codepoints is going to be confusing for a user,
but it is sometimes necessary when dealing with external systems that
themselves work with codepoints (MySQL comes to mind). However,
calculating the Levenshtein distance is most certainly something that is
purely "user-facing" and not constrained by external systems.
Calculating the distance in codepoints is going to be extremely
confusing when dealing with things like Emoji, where a single visible
character can consist of several codepoints.

It would probably be best to either offer only a `grapheme_*` function
here or to leave this fully to userland.

Best regards
Tim Düsterhus
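
To illustrate the codepoint versus grapheme concern, here is a minimal
userland sketch. It is not the implementation from the RFC or the PR. The
helper names `unit_levenshtein`, `codepoints` and `graphemes` are
hypothetical, and it assumes that ext/mbstring is loaded and that PCRE's
`\X` is an acceptable stand-in for grapheme cluster segmentation.

```php
<?php
declare(strict_types=1);

/**
 * Levenshtein distance over two lists of string units
 * (hypothetical helper, unit insert/delete/replace cost of 1).
 *
 * @param list<string> $a
 * @param list<string> $b
 */
function unit_levenshtein(array $a, array $b): int
{
    // Row for the empty prefix of $a: j insertions to reach $b[0..j).
    $prev = range(0, count($b));

    foreach ($a as $i => $unitA) {
        $curr = [$i + 1]; // deleting the first $i + 1 units of $a
        foreach ($b as $j => $unitB) {
            $cost = ($unitA === $unitB) ? 0 : 1;
            $curr[] = min(
                $prev[$j + 1] + 1, // deletion
                $curr[$j] + 1,     // insertion
                $prev[$j] + $cost  // substitution or match
            );
        }
        $prev = $curr;
    }

    return $prev[count($b)];
}

/** Split a UTF-8 string into codepoints (assumes ext/mbstring). */
function codepoints(string $s): array
{
    return mb_str_split($s, 1, 'UTF-8');
}

/** Split a UTF-8 string into grapheme clusters via PCRE's \X (an assumption). */
function graphemes(string $s): array
{
    preg_match_all('/\X/u', $s, $m);
    return $m[0];
}

// Two flag Emoji, each built from two regional indicator codepoints.
$de = "\u{1F1E9}\u{1F1EA}"; // flag of Germany
$fr = "\u{1F1EB}\u{1F1F7}"; // flag of France

echo unit_levenshtein(codepoints($de), codepoints($fr)), "\n"; // 2
echo unit_levenshtein(graphemes($de), graphemes($fr)), "\n";   // 1
```

A user sees one flag replaced by another, which matches the grapheme
result of 1, while the codepoint result of 2 reflects an encoding detail
that is invisible to them.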