Newsgroups: php.internals
From: youkidearitai@gmail.com (youkidearitai)
Date: Sun, 6 Oct 2024 14:45:07 +0900
Subject: Re: [PHP-DEV][DISCUSSION] Multibyte for levenshtein function
To: php internals
In-Reply-To: <754710ced33bdb2f9840d96ba0c58424@bastelstu.be>

On Sat, 5 Oct 2024 at 1:20, Tim Düsterhus wrote:
>
> Hi
>
> On 2024-09-25 09:21, youkidearitai wrote:
> > I tried implementing an mb_levenshtein function and created an RFC.
> > https://wiki.php.net/rfc/mb_levenshtein
> > https://github.com/php/php-src/pull/16043
> >
> > I would like to discuss it, so feel free to comment.
>
> Thank you for your RFC. I share the concern raised by cmb in the PR
> discussion:
> https://github.com/php/php-src/pull/16043#issuecomment-2374574538
>
> Generally, working with codepoints is going to be confusing for a user,
> but sometimes it is necessary when dealing with external systems that
> themselves work with codepoints (MySQL comes to mind). However,
> calculating the Levenshtein distance is most certainly something that
> is purely "user-facing" and not constrained by external systems.
> Calculating the distance of codepoints is going to be extremely
> confusing when dealing with things like Emoji. It would probably be best
> to either only offer a `grapheme_*` function here or to leave this fully
> to userland.
>
> Best regards
> Tim Düsterhus

Hi Tim,

Thank you for your response.

I have been thinking about what users actually want from a Levenshtein
distance. I agree that it should be measured in terms of grapheme
clusters. Most userland code is based on UTF-8, so moving this to a
grapheme function seems to make sense.

I will think more about the use cases for levenshtein, but I will
probably go with a grapheme function.

Thanks,
Yuya

-- 
---------------------------
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- https://github.com/youkidearitai
-----------------------------
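
Illustration of the codepoint versus grapheme point discussed in this thread. This is only a rough sketch in Python, not the code from the RFC or the PR. The names `levenshtein` and `rough_clusters` are hypothetical, and `rough_clusters` is a deliberate simplification: it only attaches combining marks to the preceding base character and does not implement full Unicode grapheme segmentation (it would not handle ZWJ emoji sequences, for example).

```python
import unicodedata


def levenshtein(a, b):
    """Edit distance between two sequences; insert, delete, replace cost 1 each."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,              # delete x
                cur[j - 1] + 1,           # insert y
                prev[j - 1] + (x != y),   # replace x with y (free if equal)
            ))
        prev = cur
    return prev[-1]


def rough_clusters(s):
    """Assumption: approximate grapheme clusters by gluing combining marks
    onto the preceding character. Not a full UAX #29 implementation."""
    out = []
    for ch in s:
        if out and unicodedata.combining(ch):
            out[-1] += ch
        else:
            out.append(ch)
    return out


a = "ne\u0301"  # "n", "e", U+0301 COMBINING ACUTE ACCENT
b = "no\u0302"  # "n", "o", U+0302 COMBINING CIRCUMFLEX ACCENT

by_codepoint = levenshtein(list(a), list(b))
by_cluster = levenshtein(rough_clusters(a), rough_clusters(b))
print(by_codepoint, by_cluster)
```

A user sees two characters in each string and one visible difference, which matches the cluster-based result. Counting codepoints reports two edits here, because the base letter and the accent are compared separately.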