I've discovered that several native string functions offer a character mask
as a parameter.
I've laid out my observations at
https://stackoverflow.com/q/72865138/2943403
In a nutshell, not all character masks offer ranges via "double dot"
syntax. Or should I refer to ".." as the "string spread operator" to avoid
naming conflict with "..." -- the better known "spread operator" (array
spread operator)?
Rowan/@IMSoP informed me that the current division between the haves and
the have-nots appears to be based on the source language from which PHP
pulled. Essentially, if from C, the double dot does not represent a range.
https://chat.stackoverflow.com/transcript/11?m=54864842#54864842
Character ranges are not yet supported for:
- strcspn()
- strpbrk()
- strspn()
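To make the split concrete, here is a small sketch contrasting a function that expands ranges with one that does not. It assumes the behaviour of the PHP 8.1 series discussed in this thread; the sample strings and variable names are purely illustrative.

```php
<?php
// trim() expands "a..c" to the bytes 'a', 'b' and 'c'.
$trimmed = trim('abcHELLOcba', 'a..c');

// strspn() reads the same mask literally, as the bytes 'a', '.' and 'c',
// so the scan stops at the first 'b'.
$spanLength = strspn('abcHELLO', 'a..c');

var_dump($trimmed);    // string(5) "HELLO"
var_dump($spanLength); // int(1)
```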
Before I fire off an RFC, I would like to know:
- Are there any reasonable objections to consistently implementing
character range expressions for all character masks?
- Are there any native functions that I did not mention in my Stack Overflow
answer?
- Is it true that only single-byte characters can be used in all
scenarios? If so, must it remain that way?
- Is there already an official or widely-used term that I should be using
for the two-dot operator?
I should also mention that I initially considered requesting that all
character mask parameters be named $mask (instead of $separators, $token,
or $characters), but I later resigned to the fact that changing to a name
that describes the texture of the string would remove the more
vital/intuitive purpose of the string. I suppose the best that can be done
to inform developers is to explicitly mention in the documentation when
character range expressions are implemented and demonstrate their usage in
an example (not just as a user comment at the bottom; this isn't In-N-Out
Burger -- put your offerings on the frickin' menu!).
mickmackusa
Note that the "..." operator is unary, so there is no syntax conflict when using two floats:
echo 0...1; // 00.1
However, in the case of the ".." operator, it is assumed to be a binary operator, so problems with grammar ambiguity may arise:
echo 0 ..1; // 00.1
echo 0.. 1; // 01
- Note: The syntax you suggest is widely used in at least Ruby ( https://ruby-doc.org/core-2.5.1/Range.html ) and CoffeeScript.
- Note: There are also the trim, ltrim and rtrim functions.
Saturday, 9 July 2022, 2:56 +03:00, from mickmackusa mickmackusa@gmail.com:
--
Kirill Nesmeyanov
Thanks for your reply, Kirill, but I am in no way trying to introduce a new,
general-use operator for all encountered strings.
I am purely focused on having the operator consistently implemented for all
character masks.
The language construct echo does not have a specified character mask
parameter.
mickmackusa
I think the confusion here comes from your use of the word "operator" - in a technical sense, this is not an operator in the language, which takes two values or expressions and produces a new value. Rather, it's a convention used inside certain functions, to interpret a string argument in a special way. I suppose you could argue that the result is a very simple embedded language, like regular expressions, and then '..' would be an operator in that embedded language; but it's probably not how most people would describe it.
As for proposing to add it in more places, it would be good to have a clear expression of why having this facility in those functions would be useful. Every extra feature adds complexity, and is a potential source of bugs both in its implementation and in code that users write which touches it. A proposal needs to make a clear case of the gains that outweigh those costs.
Regards,
--
Rowan Tommins
[IMSoP]
> Before I fire off an RFC, I would like to know:
> - Are there any reasonable objections to consistently implementing
> character range expressions for all character masks?
In my opinion, this notation is somewhat confusing; trim($str, "a..z")
and trim($str, "a.z") look pretty similar, but have completely different
meaning. I'd rather have some general way to construct such ranges; the
slightly contrived implode(range()) is already available, though.
Besides, adding support for such character ranges to other functions
now, constitutes a (probably minor) BC break.
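The visual similarity mentioned above, and the implode(range()) alternative, can be sketched as follows. The sample strings and variable names are assumptions made for illustration only.

```php
<?php
// Literal mask: strips the bytes 'a', '.' and 'z' from both ends.
$literal = trim('a.b.z', 'a.z');

// Range mask: strips the bytes 'a' through 'z'; '.' lies outside that range,
// so stripping stops at the first dot on each side.
$ranged = trim('a.b.z', 'a..z');

// Explicit alternative that works with any mask parameter, including strspn().
$lowercase = implode('', range('a', 'z'));
$prefixLength = strspn('hello123', $lowercase);

var_dump($literal);      // string(1) "b"
var_dump($ranged);       // string(3) ".b."
var_dump($prefixLength); // int(5)
```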
> - Are there any native functions that I did not mention in my Stack Overflow
> answer?
It is impossible to list all "native" functions, at least if you mean
internal functions, because these may be defined by extensions. And
these extensions would need to explicitly implement support for such
character ranges.
> - Is it true that only single-byte characters can be used in all
> scenarios? If so, must it remain that way?
I think it needs to remain that way, since the functions already
accepting character ranges actually work on byte strings.
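A short sketch of what "work on byte strings" means in practice, assuming UTF-8 input. The code points are chosen only because their encodings share a lead byte; they are not taken from the thread.

```php
<?php
// "\u{E0}" (à) is encoded as the bytes C3 A0; "\u{E1}" (á) as C3 A1.
$subject = "\u{E0}b";
$mask = "\u{E1}";

// trim() treats the mask as the two separate bytes C3 and A1. It strips the
// shared lead byte C3 from the subject and stops at A0, leaving a stray
// continuation byte behind.
$result = trim($subject, $mask);

echo bin2hex($result), "\n"; // a062
```

The mask never "sees" a multibyte character as a unit, which is why a range between two multibyte characters has no obvious meaning in these functions.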
> - Is there already an official or widely-used term that I should be using
> for the two-dot operator?
I'd call them character ranges; the implementation is called php_charmask()
(https://github.com/php/php-src/blob/php-8.1.8/ext/standard/string.c#L689).
> I should also mention that I initially considered requesting that all
> character mask parameters be named $mask (instead of $separators, $token,
> or $characters), but I later resigned to the fact that changing to a name
> that describes the texture of the string would remove the more
> vital/intuitive purpose of the string. I suppose the best that can be done
> to inform developers is to explicitly mention in the documentation when
> character range expressions are implemented and demonstrate their usage in
> an example (not just as a user comment at the bottom; this isn't In-N-Out
> Burger -- put your offerings on the frickin' menu!).
I agree that the documentation needs to be improved. While trim()
mentions the character range support in one sentence, addcslashes()
dedicates several paragraphs of detailed explanation.
--
Christoph M. Becker
If I seek to have a round of voting for an RFC on character ranges in
character mask parameters, should I propose it for PHP 8.3 or a higher
version?
I have only identified 4 native string functions that make reasonable
candidates to join the 7 existing functions with this feature.
I don't think there is any benefit in explaining how these functions work.
The sole purpose for this change (and the reason that other functions have
it already) is to reduce code bloat without needing any extra function
calls. If the feature is good enough for the first 7 functions, then it
should be good enough for these other 4 functions.
Breaking change possibility: if code is silly enough to repeat ANY
characters in the mask AND the repeated character is a dot between two
other characters, then I don't have much sympathy. Honestly though, we
are talking about a super unlikely occurrence.
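For illustration, the extra function call that native support would make unnecessary, and the edge case behind the possible breaking change, might look like the sketch below. expand_char_ranges() is a hypothetical userland helper, not part of PHP, and it is deliberately simpler than php_charmask(): it leaves a decrementing pair untouched instead of raising a warning.

```php
<?php
/**
 * Hypothetical helper (not part of PHP): expands each "x..y" pair in a mask
 * into the literal bytes from x to y. Decrementing pairs are returned as-is.
 */
function expand_char_ranges(string $mask): string
{
    return preg_replace_callback(
        '/(.)\.\.(.)/s',
        static function (array $m): string {
            [$whole, $from, $to] = $m;
            if (ord($from) > ord($to)) {
                return $whole;
            }
            return implode('', array_map('chr', range(ord($from), ord($to))));
        },
        $mask
    );
}

// Today a range has to be spelled out or expanded by hand before strspn().
$spelledOut = strspn('cab42', 'abc');
$expanded = strspn('cab42', expand_char_ranges('a..c'));

// The edge case: a literal mask that repeats the dot between two characters.
$literalToday = strspn('a.b', 'a..b');
$ifTreatedAsRange = strspn('a.b', expand_char_ranges('a..b'));

var_dump($spelledOut);       // int(3)
var_dump($expanded);         // int(3)
var_dump($literalToday);     // int(3)
var_dump($ifTreatedAsRange); // int(1)
```

Only the last pair of results differs, and only because the mask redundantly repeats the dot.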
Some demos: https://3v4l.org/2Y0q4
Mick
> I've discovered that several native string functions offer a character mask
> as a parameter. I've laid out my observations at
> https://stackoverflow.com/q/72865138/2943403
Out of curiosity, why do you say that strtr() is "not a good candidate
because character order matters" (although you give a reasonable example)?
Maybe you have some counter-example?
Regards,
--
Guilliam Xavier
I prefer to keep my scope very tight when posting on Stack Overflow.
My focus was purely on enabling character range syntax for native functions
with character mask parameters. My understanding is that character masks in PHP
require single-byte characters and attach no meaning to character order.
When strtr() is fed two strings, they cannot be considered "character
masks" because the character order matters.
If character range syntax were extended to parameters which are not character
masks, I might support the feature for strtr(), but ensuring that the two
strings are balanced would be made more difficult with ranged syntax.
strtr() will silently condone imbalanced strings. https://3v4l.org/PY15F
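A minimal sketch of the imbalance in question (the strings are illustrative and are not taken from the linked demo): when the two strings differ in length, strtr() ignores the extra characters of the longer one without any notice.

```php
<?php
// Balanced: every byte of $from has a counterpart in $to.
$balanced = strtr('adobe', 'abcde', 'ABCDE');

// Imbalanced: only 'a' and 'b' have counterparts; 'c', 'd' and 'e' are
// silently left unmapped.
$imbalanced = strtr('adobe', 'abcde', 'AB');

var_dump($balanced);   // string(5) "ADoBE"
var_dump($imbalanced); // string(5) "AdoBe"
```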
Thanks for the clarifications. You're right that the internal php_charmask
converts a character list (possibly containing one or more
ranges) into a 256-char mask, thus "losing" any original order; so strtr()
actually couldn't use the same implementation (even without
ranges), and a counter-example is strtr('adobe', 'abcde', 'ebcda')
(strtr('adobe', 'a..e', 'e..a') would trigger a Warning "Invalid
'..'-range, '..'-range needs to be incrementing").
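The loss of order can be sketched in userland. byte_mask() below is a hypothetical stand-in that only mimics the idea of one flag per byte value; it is not how php_charmask is written.

```php
<?php
/**
 * Illustrative stand-in for a byte mask: one boolean per possible byte value.
 */
function byte_mask(string $chars): array
{
    $mask = array_fill(0, 256, false);
    for ($i = 0, $n = strlen($chars); $i < $n; $i++) {
        $mask[ord($chars[$i])] = true;
    }
    return $mask;
}

// Two lists with the same bytes in a different order yield the same mask.
$sameMask = byte_mask('abcde') === byte_mask('ebcda');

// strtr() depends on that order: position i of $from maps to position i of $to.
$identity = strtr('adobe', 'abcde', 'abcde');
$swapped = strtr('adobe', 'abcde', 'ebcda');

var_dump($sameMask); // bool(true)
var_dump($identity); // string(5) "adobe"
var_dump($swapped);  // string(5) "edoba"
```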
I had seen a parallel with the Unix tr command, which does support
[incrementing] ranges (e.g. both echo adobe | tr abcde ABCDE
and echo adobe | tr a-e A-E give "ADoBE", while echo adobe | tr abcde edcba
gives "eboda" but echo adobe | tr a-e e-a errors "range-endpoints of 'e-a' are
in reverse collating sequence order"), but its implementation doesn't use
character masks indeed
(https://github.com/coreutils/coreutils/blob/master/src/tr.c), and
echo abracadabra | tr a-f x gives "xxrxxxxxxrx" not "xbrxcxdxbrx"; and it also
supports more things like POSIX character classes...
PS: I find the strtr(string $string, array $replace_pairs) form generally
superior to the strtr(string $string, string $from, string $to) one
anyway ;)
Regards,
--
Guilliam Xavier
> - Are there any reasonable objections to consistently implementing
> character range expressions for all character masks?
It would be a minor BC break to silently change the meaning of strspn($str,
"a..b"), which currently has the same meaning as "a.b" (with wasted CPU
cycles); with your suggestion it would take on the same meaning as "ab",
and the dot would no longer pass the check.
But then again, currently writing ".." is just a waste of CPU, and I don't
think I've ever actually seen anyone do that in the wild.
¯\_(ツ)_/¯
On Fri, 29 Jul 2022 at 10:58, Guilliam Xavier guilliam.xavier@gmail.com wrote:
> On Monday, July 25, 2022, Guilliam Xavier guilliam.xavier@gmail.com wrote:
>> On Sat, Jul 9, 2022 at 1:56 AM mickmackusa mickmackusa@gmail.com wrote: