Hi internals!
A while ago a question was asked on the php-general mailing list with
regard to digit separators in numeric literals[1].
IMHO it might be a useful enhancement to allow such digit separators for
numeric (integer and float) literals in PHP for better readability;
several other languages already support them (such as Java, Perl, Ruby,
C#, Eiffel and C++14).
Before attempting to draft an RFC, I'd like to get some feedback on
whether this is generally considered useful, which character would be
preferred (most other languages seem to allow the underscore, but an
apostrophe or some other character might be reasonable as well), and
which restrictions should apply (e.g. arbitrary use of the separator,
grouping thousands only, etc.).
I'm looking forward to hearing your opinions. Thanks in advance.
[1] http://marc.info/?l=php-general&m=142143581810951&w=2
--
Christoph M. Becker
I think it will be difficult to find a separator character that doesn't
make a mess of the grammar.
my_func(1,999,999) obviously doesn't work
my_func(1'999'999) as per C++14 clashes with our single-quoted strings
my_func(1_999_999) like in Ada might work
but _999_ would need to work as well and _ is a valid char in a constant
so you can have a constant named _999_.
- nope
nope
@ nope
~ nope
! nope
% nope
^ nope
We went through this for the namespace char, and there simply isn't a
typable single character left to use for something like this. _ is the
closest but it would require some changes and BC breaks which I am not
sure are worth it for what appears to me to be a not-so-critical feature.
Now if we went into Unicode territory, we could do it, e.g.
my_func(1 999 999) U+1680 (although it looks too much like a -)
my_func(1 999 999) U+205F (mathematical space)
my_func(1٬999٬999) U+066C (Arabic thousands separator)
my_func(1·999·999) U+00B7 (middle dot)
The last one looks best to me, but we'd need a team of people working in
shifts to answer the "How do I type this?" question.
-Rasmus
how about:
my_func( '1,000.04' ); //if you want to use separators there.
Rick. Who is firmly in the camp that considers type juggling an
essential feature of PHP.
The problem with that is that the world is split. The other half, or
actually more than half, would write that as '1.000,04'. There is no way
we would want to take sides on that one. And we have support for
locale-based number formatting and parsing via numfmt_format() and
numfmt_parse(). If we were going to add a separator for literals, the
only real low-ascii choice is _ which is also used by Ada, D, Java, Perl
and Ruby.
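The split between the two conventions can be sketched in a few lines. This is a minimal Python illustration, not PHP's behaviour, and `parse_grouped` with its parameters is a hypothetical name. The same text yields different values depending on which convention the reader assumes, which is why a string-based workaround would force a choice:

```python
def parse_grouped(text: str, group: str, decimal: str) -> float:
    """Parse `text` using the given grouping and decimal separators."""
    # Drop the grouping separator first, then normalise the decimal mark.
    return float(text.replace(group, "").replace(decimal, "."))


# Each convention reads its own spelling of the value as expected...
comma_grouped = parse_grouped("1,000.04", group=",", decimal=".")
dot_grouped = parse_grouped("1.000,04", group=".", decimal=",")

# ...but an ambiguous string such as "1.000" means two different numbers.
as_decimal_point = parse_grouped("1.000", group=",", decimal=".")
as_thousands_sep = parse_grouped("1.000", group=".", decimal=",")
```

A separator inside a numeric literal avoids this, because it carries no locale meaning at all.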
I was 90% kidding about using a Unicode character, but if you think
about it a bit, most people are using IDEs or at least smart scriptable
editors, so it wouldn't be that much of a stretch to picture your editor
pretty-printing 1234567890 as 1·234·567·890 or 1˙234˙567˙890 (U+02D9).
It would be easy to make the parser ignore that character in numeric
literals. Much easier than working out the various issues with _ anyway.
Although, personally it would freak me out if my editor started messing
with my numbers on me. But I don't use an IDE. I don't even have
syntax-highlighting turned on in my vim config. We didn't have stuff
like that on the Wyse 50.
-Rasmus
Rasmus Lerdorf wrote:
>> how about:
>> my_func( '1,000.04' ); //if you want to use separators there.
>
> The problem with that is that the world is split. The other half, or
> actually more than half, would write that as '1.000,04'. There is no way
> we would want to take sides on that one. And we have support for
> locale-based number formatting and parsing via numfmt_format() and
> numfmt_parse(). If we were going to add a separator for literals, the
> only real low-ascii choice is _ which is also used by Ada, D, Java, Perl
> and Ruby.
I agree that _ is most reasonable, mainly because it is used by other
languages also.
> I was 90% kidding about using a Unicode character, but if you think
> about it a bit, most people are using IDEs or at least smart scriptable
> editors, it wouldn't be that much of a stretch to picture your editor
> pretty-printing 1234567890 as 1·234·567·890 or 1˙234˙567˙890 (U+02D9).
> It would be easy to make the parser ignore that character in numeric
> literals. Much easier than working out the various issues with _ anyway.
Which issues do you see? IMHO it doesn't make much sense to use a digit
separator in a trailing or leading position, because that wouldn't
improve readability. So PHP could impose the same restrictions as Java
with regard to integer and float literals[1] (basically, the underscore
is allowed only between actual digits), in which case I don't see any
syntactic ambiguity.
Ignoring the _ in the scanner (IMHO there is no need to keep it in the
token) doesn't seem to be harder than ignoring any other character.
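A rough sketch of that rule, in Python rather than in the scanner itself: the pattern below accepts an underscore only between two digits, and the separator is dropped before conversion. `DIGITS`, `LITERAL` and `parse_literal` are hypothetical names, and only decimal integers and simple floats are covered (no hex, octal, binary or exponents), so this illustrates the restriction rather than a proposed implementation.

```python
import re

# One digit, then any number of (optional underscores + digit) groups.
# An underscore can therefore never be leading, trailing, or next to ".".
DIGITS = r"[0-9](?:_*[0-9])*"
LITERAL = re.compile(rf"{DIGITS}(?:\.{DIGITS})?")


def parse_literal(text: str):
    """Return the numeric value of `text`, or None if it is not a valid literal."""
    if LITERAL.fullmatch(text) is None:
        return None
    # The separator carries no meaning, so it is simply discarded.
    cleaned = text.replace("_", "")
    return float(cleaned) if "." in cleaned else int(cleaned)
```

With this pattern `1_999_999` and `1_000.04` are accepted, while `_999`, `999_`, `1_.5` and `1._5` are rejected.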
[1] http://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-3.10
--
Christoph M. Becker
> but _999_ would need to work as well and _ is a valid char in a constant
> so you can have a constant named _999_.
Why would we need to support the underscore in front (and maybe even at
the end) of a number?
--
Regards,
Mike
I guess we could restrict it to not be leading. I was thinking along the
lines of the character being defined to be ignored anywhere in a
literal. The underscore was actually rejected in C++14 because C++ has
user-defined literals
(http://en.cppreference.com/w/cpp/language/user_literal), which would
then make 0x12_ab problematic. We obviously don't have user-defined
literals so we probably could make it work. Like I said, that is the
only viable option as far as I can see.
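To make the leading-underscore clash concrete, here is a small Python sketch. It assumes an ASCII-only identifier rule (PHP's real rule also admits bytes 0x80-0xff) and a number rule that allows underscores only between digits; `classify` is a hypothetical helper, not part of any scanner.

```python
import re

# Assumed token rules, simplified for illustration.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
NUMBER = re.compile(r"[0-9](?:_*[0-9])*")


def classify(token: str) -> str:
    """Say which rule, if any, matches the whole token."""
    if NUMBER.fullmatch(token):
        return "number"
    if IDENTIFIER.fullmatch(token):
        return "identifier"
    return "invalid"
```

Under these rules `1_999` is a number and `_999` is already a valid identifier (for example a constant name), so a leading separator would be ambiguous, whereas `999_` matches neither rule and can simply be rejected.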
-Rasmus
2015-02-19 6:44 GMT+04:00 Rasmus Lerdorf <rasmus@lerdorf.com>:
> I think it will be difficult to find a separator character that doesn't
> make a mess of the grammar.
>
> my_func(1,999,999) obviously doesn't work
Hey,
Why not space? It's certainly possible (I just checked) and it would look
clear I guess:
my_func(1 999 999);
>
> Why not space? It's certainly possible (I just checked) and it would look
> clear I guess:
>
> my_func(1 999 999);
>
Yes, but what if I just missed one or two commas there? ;)
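A quick sketch of that failure mode, assuming a hypothetical scanner that simply ignores spaces inside integer literals (`parse_args` is an illustrative name): dropping a comma no longer produces a syntax error, it silently changes the arguments.

```python
def parse_args(arg_text: str) -> list:
    """Split an argument list on commas, ignoring spaces inside each number."""
    return [int(part.replace(" ", "")) for part in arg_text.split(",")]


intended = parse_args("1, 999, 999")      # three arguments
one_missing = parse_args("1 999, 999")    # two arguments
both_missing = parse_args("1 999 999")    # a single argument
```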
--
Regards,
Mike
Nikita Nefedov wrote:
> 2015-02-19 6:44 GMT+04:00 Rasmus Lerdorf <rasmus@lerdorf.com>:
>> I think it will be difficult to find a separator character that doesn't
>> make a mess of the grammar.
>
> Why not space? It's certainly possible (I just checked) and it would look
> clear I guess:
>
> my_func(1 999 999);
By the same reasoning, spaces could be allowed in identifiers as well, e.g.
my func(1 999 999);
Too confusing and error-prone, IMHO.
--
Christoph M. Becker