Newsgroups: php.internals
Xref: news.php.net php.internals:118309
Date: Fri, 29 Jul 2022 14:00:18 +0200
To: Guilliam Xavier
Cc: mickmackusa, "internals@lists.php.net"
Subject: Re: [PHP-DEV] Re: Character range syntax ".." for character masks
From: divinity76@gmail.com (Hans Henrik Bergan)

> 1. Are there any reasonable objections to consistently implementing
> character range expressions for all character masks?

It would be a minor BC break to silently change the meaning of
memspn($str, "a..b"), which currently has the same meaning as "a.b" (with
wasted cpu cycles); with your suggestion it would get the same meaning as
"ab", and the dot would no longer pass the check. But then again, currently
writing ".."
is just a waste of cpu, and i don't think i've actually ever seen anyone do
that in the wild ¯\_(ツ)_/¯

On Fri, 29 Jul 2022 at 10:58, Guilliam Xavier wrote:

> On Fri, Jul 29, 2022 at 7:15 AM mickmackusa wrote:
>
> > On Monday, July 25, 2022, Guilliam Xavier wrote:
> >
> >> On Sat, Jul 9, 2022 at 1:56 AM mickmackusa wrote:
> >>
> >>> I've discovered that several native string functions offer a character
> >>> mask as a parameter.
> >>>
> >>> I've laid out my observations at
> >>> https://stackoverflow.com/q/72865138/2943403
> >>
> >> Out of curiosity, why do you say that strtr() is "not a good candidate
> >> because character order matters" (although you give a reasonable
> >> example)? Maybe you have some counter-example?
> >>
> >> Regards,
> >>
> >> --
> >> Guilliam Xavier
> >
> > I prefer to keep my scope very tight when posting on Stack Overflow.
> >
> > My focus was purely on enabling character range syntax for native
> > functions with character mask parameters. My understanding of character
> > masks in PHP requires single-byte characters and no meaning to character
> > order.
> >
> > When strtr() is fed two strings, they cannot be considered "character
> > masks" because the character orders matter.
> >
> > If extending character range syntax to parameters which are not character
> > masks, I might support the feature for strtr(), but ensuring that the two
> > strings are balanced will be made more difficult with ranged syntax.
> > strtr() will silently condone imbalanced strings. https://3v4l.org/PY15F
>
> Thanks for the clarifications.
> You're right that the internal `php_charmask` converts a character list
> (possibly containing one or more ranges) into a 256-char *mask*, thus
> "losing" any original order; so strtr() actually couldn't use the same
> implementation (even without ranges), and a counter-example is
> `strtr('adobe', 'abcde', 'ebcda')` (`strtr('adobe', 'a..e', 'e..a')` would
> trigger a Warning "Invalid '..'-range, '..'-range needs to be
> incrementing").
>
> I had seen a parallel with the Unix `tr` command, which *does* support
> [incrementing] ranges (e.g. both `echo adobe | tr abcde ABCDE` and
> `echo adobe | tr a-e A-E` give "ADoBE", while `echo adobe | tr abcde edcba`
> gives "eboda" but `echo adobe | tr a-e e-a` errors "range-endpoints of
> 'e-a' are in reverse collating sequence order"), but its implementation
> doesn't use character masks indeed
> (https://github.com/coreutils/coreutils/blob/master/src/tr.c), and
> `echo abracadabra | tr a-f x` gives "xxrxxxxxxrx" not "xbrxcxdxbrx"; and it
> also supports more things like POSIX character classes...
>
> PS: I find the `strtr(string $string, array $replace_pairs)` form generally
> superior to the `strtr(string $string, string $from, string $to)` one
> anyway ;)
>
> Regards,
>
> --
> Guilliam Xavier
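
For anyone who wants to poke at the two points above (the BC concern and the
order-losing mask), here is a minimal PHP sketch. It assumes the
memspn($str, "a..b") at the top refers to userland strspn().
sketch_charmask() and sketch_strspn_ranged() are hypothetical helpers made up
for illustration: they only approximate what the thread says about the
internal `php_charmask`, and they skip its warnings for malformed ranges.

```php
<?php
// Hypothetical helper, not part of PHP: expand "X..Y" ranges in a character
// list into a 256-entry lookup table, roughly as the thread describes for
// the internal php_charmask. Malformed ranges are simply treated literally.
function sketch_charmask(string $list): array
{
    $mask = array_fill(0, 256, false);
    $len = strlen($list);
    for ($i = 0; $i < $len; $i++) {
        $c = ord($list[$i]);
        // A range is "X..Y" with Y >= X and all four bytes present.
        if ($i + 3 < $len && $list[$i + 1] === '.' && $list[$i + 2] === '.'
            && ord($list[$i + 3]) >= $c) {
            for ($b = $c; $b <= ord($list[$i + 3]); $b++) {
                $mask[$b] = true;
            }
            $i += 3; // skip over "..Y"
            continue;
        }
        $mask[$c] = true;
    }
    return $mask;
}

// Hypothetical ranged variant of strspn(): length of the leading segment of
// $s made up only of bytes allowed by the (range-expanded) mask.
function sketch_strspn_ranged(string $s, string $list): int
{
    $mask = sketch_charmask($list);
    $n = 0;
    $len = strlen($s);
    while ($n < $len && $mask[ord($s[$n])]) {
        $n++;
    }
    return $n;
}

// trim() already understands ".." ranges in its character list.
var_dump(trim('abcxyzcba', 'a..c'));

// strspn() had no range support at the time of this thread, so "a..b" was
// read as the literal characters 'a', '.', '.', 'b' and the dot matched.
var_dump(strspn('a.b!', 'a..b'));

// With range semantics, the dot no longer passes the check.
var_dump(sketch_strspn_ranged('a.b!', 'a..b'));

// A mask keeps no order, which is why it cannot drive strtr().
var_dump(sketch_charmask('abcde') === sketch_charmask('edcba'));
```

The difference between the second and third var_dump() is the silent
behaviour change in question; the last one shows why a mask-based
implementation cannot serve the order-sensitive strtr().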