Newsgroups: php.internals
Xref: news.php.net php.internals:121208
Date: Tue, 3 Oct 2023 02:45:49 +0900
From: youkidearitai@gmail.com (youkidearitai)
To: internals@lists.php.net
Subject: Re: [PHP-DEV] trim support for multibyte spaces

On Mon, 2 Oct 2023 at 23:24, Hans Henrik Bergan wrote:
>
> add a
> void str_dump(const size_t strlen, const char *str)
> {
>     printf("string(%zu) \"", strlen);
>     fwrite(str, strlen, 1, stdout);
>     printf("\"\n");
> }
>
> then replace the string_view stuff with
>
> size_t teststrlen = strlen(teststr);
> str_dump(teststrlen, teststr);
> mb_trim(&teststrlen, &teststr, trim_lengths_num, trim_lengths, trim_chars);
> str_dump(teststrlen, teststr);
>
> and it should be pure C, i think
>
> On Mon, 2 Oct 2023 at 15:36, youkidearitai wrote:
> >
> > Hi, Hans.
> >
> > Thank you very much for your code, and sorry for the late reply.
> > I confirmed that it works fine.
> > https://gist.github.com/youkidearitai/0018dee27353c00aebaff3bf57c5b8c6
> >
> > However, this code is C++17, and php-src is C code.
> > If you would like to contribute, I would like it to be written in C.
> >
> > Regards
> > Yuya
> >
> > On Sun, 1 Oct 2023 at 19:46, Hans Henrik Bergan wrote:
> > >
> > > .. probably a bunch of stuff that *could* be optimized or done better,
> > > but one i saw just now is that instead of 2x nested loops and goto,
> > > the outer loop and labels could be removed and the `goto
> > > remove_from_start_continue_2;` could be replaced with `i=-1;` eg
> > >
> > > size_t local_strlen = *strlen;
> > > char *local_str = *str;
> > > for (size_t i = 0; i < trim_lengths_num; ++i)
> > > {
> > >     if (local_strlen >= trim_lengths[i] && memcmp(local_str, trim_chars[i], trim_lengths[i]) == 0)
> > >     {
> > >         local_strlen -= trim_lengths[i];
> > >         local_str += trim_lengths[i];
> > >         i = -1; // wraps to SIZE_MAX; the loop's ++i brings it back to 0, restarting the scan
> > >     }
> > > }
> > >
> > > 2x nested loops reduced to 1 loop, and goto removed~
> > >
> > > On Sun, 1 Oct 2023 at 10:43, Hans Henrik Bergan wrote:
> > > >
> > > > > If you have any ideas, feel free to comment.
> > > >
> > > > i think the C code would look something like
> > > >
> > > >
> > > > void mb_trim(size_t *strlen, char **str, const size_t trim_lengths_num, const size_t *trim_lengths, const char **trim_chars)
> > > > {
> > > >     size_t local_strlen = *strlen;
> > > >     char *local_str = *str;
> > > >     // strip matching sequences from the start
> > > >     for (;;)
> > > >     {
> > > >         for (size_t i = 0; i < trim_lengths_num; ++i)
> > > >         {
> > > >             if (local_strlen >= trim_lengths[i] && memcmp(local_str, trim_chars[i], trim_lengths[i]) == 0)
> > > >             {
> > > >                 local_strlen -= trim_lengths[i];
> > > >                 local_str += trim_lengths[i];
> > > >                 goto remove_from_start_continue_2;
> > > >             }
> > > >         }
> > > >         break;
> > > >     remove_from_start_continue_2:;
> > > >     }
> > > >     // strip matching sequences from the end
> > > >     for (;;)
> > > >     {
> > > >         for (size_t i = 0; i < trim_lengths_num; ++i)
> > > >         {
> > > >             if (local_strlen >= trim_lengths[i] && memcmp(((local_str + local_strlen) - trim_lengths[i]), trim_chars[i], trim_lengths[i]) == 0)
> > > >             {
> > > >                 local_strlen -= trim_lengths[i];
> > > >                 goto remove_from_end_continue_2;
> > > >             }
> > > >         }
> > > >         break;
> > > >     remove_from_end_continue_2:;
> > > >     }
> > > >     memmove(*str, local_str, local_strlen);
> > > >     char *newstr = (char *)realloc(*str, local_strlen);
> > > >     if (newstr != nullptr)
> > > >     {
> > > >         *strlen = local_strlen;
> > > >         *str = newstr;
> > > >     }
> > > >     else
> > > >     {
> > > >         // some error handling
> > > >     }
> > > > }
> > > >
> > > > with my simple testcode looking like
> > > >
> > > > int main()
> > > > {
> > > >     const char *trim_chars[] = {
> > > >         " ",
> > > >         "!",
> > > >         // utf8 whitespace:
> > > >         "\xE2\x80\x80", // EN QUAD
> > > >         "\xE2\x80\x81", // EM QUAD
> > > >         "\xE2\x80\x82", // EN SPACE
> > > >         "\xE2\x80\x83", // EM SPACE
> > > >         "\xE2\x80\x84", // THREE-PER-EM SPACE
> > > >         "\xE2\x80\x85", // FOUR-PER-EM SPACE
> > > >         "\xE2\x80\x86", // SIX-PER-EM SPACE
> > > >     };
> > > >     size_t trim_lengths[] = {
> > > >         strlen(trim_chars[0]),
> > > >         strlen(trim_chars[1]),
> > > >         strlen(trim_chars[2]),
> > > >         strlen(trim_chars[3]),
> > > >         strlen(trim_chars[4]),
> > > >         strlen(trim_chars[5]),
> > > >         strlen(trim_chars[6]),
> > > >         strlen(trim_chars[7]),
> > > >         strlen(trim_chars[8]),
> > > >     };
> > > >     size_t trim_lengths_num = sizeof(trim_lengths) / sizeof(trim_lengths[0]);
> > > >     char *teststr = strdup(" ! \xE2\x80\x80\xE2\x80\x81\xE2\x80\x82 Hello World ! \xE2\x80\x83\xE2\x80\x84\xE2\x80\x85\xE2\x80\x86 ! ");
> > > >     // char *teststr = strdup(" ! Hello World ! ! ");
> > > >     size_t teststrlen = strlen(teststr);
> > > >     std::cout << teststrlen << ": \"" << std::string_view(teststr, teststrlen) << "\"" << std::endl;
> > > >     mb_trim(&teststrlen, &teststr, trim_lengths_num, trim_lengths, trim_chars);
> > > >     std::cout << teststrlen << ": \"" << std::string_view(teststr, teststrlen) << "\"" << std::endl;
> > > >     return 0;
> > > > }
> > > >
> > > >
> > > > On Sat, 30 Sept 2023 at 13:16, youkidearitai wrote:
> > > > >
> > > > > On Sat, 30 Sep 2023 at 17:42, Saki Takamachi wrote:
> > > > > >
> > > > > > > I would also like multibyte versions of the trim functions.
> > > > > >
> > > > > > > I think that in addition to mb_trim,
> > > > > > mb_ltrim and mb_rtrim are also necessary.
> > > > > >
> > > > > > Hi.
> > > > > >
> > > > > > Having a new option besides regex sounds like a good idea to me, as a user of a language that benefits from `mb_trim()`.
> > > > > >
> > > > > > Perhaps users who have spent more time cleaning up "multibyte spaces" have a better intuition for the usefulness of those functions.
> > > > >
> > > > > Hi
> > > > >
> > > > > Thanks for the reply.
> > > > > I'm trying to implement a multibyte trim function.
> > > > > Please give me some time.
> > > > >
> > > > > If you have any ideas, feel free to comment.
> > > > >
> > > > > Regards
> > > > > Yuya
> > > > >
> > > > > --
> > > > > ---------------------------
> > > > > Yuya Hamada (tekimen)
> > > > > - https://tekitoh-memdhoi.info
> > > > > - https://github.com/youkidearitai
> > > > > -----------------------------
> > > > >
> > > > > --
> > > > > PHP Internals - PHP Runtime Development Mailing List
> > > > > To unsubscribe, visit: https://www.php.net/unsub.php
> > > > >

Hi, Internals

I'm sorry that the format was wrong.

I'm trying to implement the mb_trim function.
https://github.com/youkidearitai/php-src/tree/mb_trim

The remaining tasks are the mb_ltrim and mb_rtrim functions,
and the set of Unicode characters to trim. I'm working on implementing these.

Regards,
Yuya

-- 
---------------------------
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- https://github.com/youkidearitai
-----------------------------
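
For reference, here is a minimal sketch in plain C of how the remaining mb_ltrim and mb_rtrim pieces might be split out, reusing the "restart the scan after every match" idea from the quoted code. The names sketch_ltrim and sketch_rtrim and their parameter lists are hypothetical and are not taken from the php-src branch. The sketch assumes the caller passes a byte pointer plus length and a list of byte sequences to strip, and it only narrows that view instead of calling memmove or realloc.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from php-src): drop every leading occurrence of
 * any sequence in seqs[] by advancing *str and shrinking *len.
 * The buffer itself is never modified, copied, or freed.
 */
static void sketch_ltrim(const char **str, size_t *len,
                         const char *const *seqs, const size_t *seq_lens,
                         size_t seq_count)
{
    assert(str != NULL && *str != NULL && len != NULL);
    size_t i = 0;
    while (i < seq_count) {
        /* Zero-length sequences are skipped so the loop always terminates. */
        if (seq_lens[i] > 0 && *len >= seq_lens[i] &&
            memcmp(*str, seqs[i], seq_lens[i]) == 0) {
            *str += seq_lens[i];
            *len -= seq_lens[i];
            i = 0; /* a prefix was removed: rescan the list from the top */
        } else {
            ++i;
        }
    }
}

/*
 * Hypothetical helper (not from php-src): drop every trailing occurrence of
 * any sequence in seqs[] by shrinking *len. The start pointer is unchanged.
 */
static void sketch_rtrim(const char *str, size_t *len,
                         const char *const *seqs, const size_t *seq_lens,
                         size_t seq_count)
{
    assert(str != NULL && len != NULL);
    size_t i = 0;
    while (i < seq_count) {
        if (seq_lens[i] > 0 && *len >= seq_lens[i] &&
            memcmp(str + *len - seq_lens[i], seqs[i], seq_lens[i]) == 0) {
            *len -= seq_lens[i];
            i = 0; /* a suffix was removed: rescan the list from the top */
        } else {
            ++i;
        }
    }
}
```

Calling sketch_ltrim and then sketch_rtrim on the same pointer and length would give the combined trim. Because nothing is reallocated, a string consisting only of trim sequences simply ends up with length 0, and a caller that needs an owned, NUL-terminated result can copy exactly that many bytes afterwards.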