Newsgroups: php.internals
Xref: news.php.net php.internals:121199
Date: Sun, 1 Oct 2023 10:43:42 +0200
From: divinity76@gmail.com (Hans Henrik Bergan)
To: youkidearitai
Cc: Saki Takamachi, internals@lists.php.net
Subject: Re: [PHP-DEV] trim support for multibyte spaces

> If have any idea, feel free to comment to me.

I think the C code would look something like this (it compiles as C++, which my test code below uses):

#include <cstdlib>
#include <cstring>
#include <iostream>
#include <string_view>

// Trims every leading and trailing occurrence of the byte sequences in
// trim_chars from *str, in place. *str must be a malloc()ed buffer of at
// least *len + 1 bytes (e.g. from strdup()). On return it is NUL-terminated
// and *len holds the new length.
void mb_trim(size_t *len, char **str, const size_t trim_lengths_num,
             const size_t *trim_lengths, const char **trim_chars)
{
    size_t local_len = *len;
    char *local_str = *str;

    // Strip from the start until no trim sequence matches.
    for (bool matched = true; matched;) {
        matched = false;
        for (size_t i = 0; i < trim_lengths_num; ++i) {
            // Skip empty sequences: they would match forever.
            if (trim_lengths[i] > 0 && local_len >= trim_lengths[i]
                && memcmp(local_str, trim_chars[i], trim_lengths[i]) == 0) {
                local_len -= trim_lengths[i];
                local_str += trim_lengths[i];
                matched = true;
                break;
            }
        }
    }

    // Strip from the end the same way.
    for (bool matched = true; matched;) {
        matched = false;
        for (size_t i = 0; i < trim_lengths_num; ++i) {
            if (trim_lengths[i] > 0 && local_len >= trim_lengths[i]
                && memcmp((local_str + local_len) - trim_lengths[i],
                          trim_chars[i], trim_lengths[i]) == 0) {
                local_len -= trim_lengths[i];
                matched = true;
                break;
            }
        }
    }

    // Move the kept bytes to the front and terminate the string.
    memmove(*str, local_str, local_len);
    (*str)[local_len] = '\0';
    *len = local_len;

    // Shrink the allocation. If realloc() fails, the old block is still
    // valid and already holds the trimmed string, so just keep it.
    char *newstr = (char *)realloc(*str, local_len + 1);
    if (newstr != nullptr) {
        *str = newstr;
    }
}

with my simple test code looking like this:

int main()
{
    const char *trim_chars[] = {
        " ",
        "!",
        // UTF-8 whitespace:
        "\xE2\x80\x80", // EN QUAD
        "\xE2\x80\x81", // EM QUAD
        "\xE2\x80\x82", // EN SPACE
        "\xE2\x80\x83", // EM SPACE
        "\xE2\x80\x84", // THREE-PER-EM SPACE
        "\xE2\x80\x85", // FOUR-PER-EM SPACE
        "\xE2\x80\x86", // SIX-PER-EM SPACE
    };
    const size_t trim_lengths_num = sizeof(trim_chars) / sizeof(trim_chars[0]);
    size_t trim_lengths[sizeof(trim_chars) / sizeof(trim_chars[0])];
    for (size_t i = 0; i < trim_lengths_num; ++i) {
        trim_lengths[i] = strlen(trim_chars[i]);
    }

    char *teststr = strdup(" ! \xE2\x80\x80\xE2\x80\x81\xE2\x80\x82 Hello World ! \xE2\x80\x83\xE2\x80\x84\xE2\x80\x85\xE2\x80\x86 ! ");
    // char *teststr = strdup(" ! Hello World ! ! ");
    if (teststr == nullptr) {
        return 1;
    }
    size_t teststrlen = strlen(teststr);
    std::cout << teststrlen << ": \"" << std::string_view(teststr, teststrlen) << "\"" << std::endl;
    mb_trim(&teststrlen, &teststr, trim_lengths_num, trim_lengths, trim_chars);
    std::cout << teststrlen << ": \"" << std::string_view(teststr, teststrlen) << "\"" << std::endl;
    free(teststr);
    return 0;
}

On Sat, 30 Sept 2023 at 13:16, youkidearitai wrote:
>
> On Sat, 30 Sep 2023 at 17:42, Saki Takamachi wrote:
> >
> > > I also want to trim function of multibyte trim functions.
> >
> > > I think that in addition to mb_trim, mb_ltrim and mb_rtrim are also necessary.
> >
> > Hi.
> >
> > Having a new option besides regex sounds like a good idea for me, as a user of a language that benefits from `mb_trim()`.
> >
> > Perhaps users are more intuitive about the usefulness of those functions who have spent more time cleaning "multibyte spaces".
>
> Hi
>
> Thanks for reply.
> I'm trying implements to trim for multibyte function.
> Please give me some time.
>
> If have any idea, feel free to comment to me.
>
> Regards
> Yuya
>
> --
> ---------------------------
> Yuya Hamada (tekimen)
> - https://tekitoh-memdhoi.info
> - https://github.com/youkidearitai
> -----------------------------
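
On the mb_ltrim / mb_rtrim point in the quoted thread: a rough sketch of how one matching loop could serve all three variants by taking a side flag. This is only an illustration, not taken from any patch. The names trim_view, TrimMode, TRIM_LEFT, TRIM_RIGHT and TRIM_BOTH are hypothetical, and it assumes C++17 and UTF-8 input. It returns a view into the caller's buffer instead of reallocating, so nothing is copied.

```cpp
#include <string_view>
#include <vector>

// Hypothetical side flags: which end(s) of the string to strip.
enum TrimMode : unsigned { TRIM_LEFT = 1u, TRIM_RIGHT = 2u, TRIM_BOTH = 3u };

// Returns the part of `s` that remains after repeatedly removing any of the
// byte sequences in `seqs` from the selected end(s). The result points into
// the storage of `s`; nothing is allocated or modified.
std::string_view trim_view(std::string_view s,
                           const std::vector<std::string_view> &seqs,
                           unsigned mode)
{
    if (mode & TRIM_LEFT) {
        for (bool matched = true; matched;) {
            matched = false;
            for (std::string_view t : seqs) {
                // Empty sequences are skipped: they would match forever.
                if (!t.empty() && s.size() >= t.size()
                    && s.substr(0, t.size()) == t) {
                    s.remove_prefix(t.size());
                    matched = true;
                    break;
                }
            }
        }
    }
    if (mode & TRIM_RIGHT) {
        for (bool matched = true; matched;) {
            matched = false;
            for (std::string_view t : seqs) {
                if (!t.empty() && s.size() >= t.size()
                    && s.substr(s.size() - t.size()) == t) {
                    s.remove_suffix(t.size());
                    matched = true;
                    break;
                }
            }
        }
    }
    return s;
}
```

With seqs = {" ", "!"}, calling trim_view(" ! Hello World ! ", seqs, TRIM_LEFT) leaves "Hello World ! ", TRIM_RIGHT leaves " ! Hello World", and TRIM_BOTH leaves "Hello World". The byte-wise suffix check is fine for UTF-8 because a lead byte such as 0xE2 can never appear in the middle of another character; other multibyte encodings would need their own check.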