Hi, Internals
I have just opened voting on the "Multibyte ucfirst and lcfirst functions" RFC.
https://wiki.php.net/rfc/mb_ucfirst
Voting will be open until February 26th, 2024 at 01:00 UTC.
Cheers
Yuya
--
Yuya Hamada (tekimen)
On Fri, Feb 2, 2024 at 2:00 AM youkidearitai youkidearitai@gmail.com wrote:
> Hi, Internals
> I have just opened the voting "Multibyte ucfirst and lcfirst functions" RFC.
> https://wiki.php.net/rfc/mb_ucfirst
> Voting will be open until February 26th, 2024 at 01:00 UTC.
The proposal section says: "From what I've researched with Unicode,
it may not behave as expected in some languages. In that case, please deal
with it in userland." If my understanding here is wrong, please correct
me. ucfirst and lcfirst uppercase/lowercase the first character of a
word when that character has an upper/lower case variant. Whether or not a
word should begin with an uppercase or lowercase character is not their
concern, and ucfirst and lcfirst do not currently take that into account. To me
this isn't unexpected behavior; that's exactly how I would expect them to
behave.
I think the author refers to the potential edge cases in certain Unicode
mappings. There isn't a ucfirst mapping, but there are uppercase and
titlecase mappings.
Unicode titlecase mapping is different from uppercase mapping. This PR
seems to be using uppercase mapping. This should not matter for a vast
majority of characters, except for ligatures and digraphs.
I'm not at all an expert in these edge cases, but I just wanted to put my
two cents forth that I personally think using titlecase mapping on the
first word would be the more appropriate approach.
Thank you.
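For readers who want to see the difference between the two mappings, here is a minimal sketch using mbstring's existing mb_convert_case(). It is an illustration only, not code from the PR. The digraph U+01C6 is chosen here as an example and does not come from the thread; the Unicode Character Database lists U+01C4 (DŽ) as its uppercase mapping and U+01C5 (Dž) as its titlecase mapping.

```php
<?php
// Illustration only: compare the uppercase and titlecase conversions of
// the digraph U+01C6 (LATIN SMALL LETTER DZ WITH CARON) using mbstring.
$digraph = "\u{01C6}"; // ǆ

$upper = mb_convert_case($digraph, MB_CASE_UPPER, 'UTF-8'); // uppercase conversion
$title = mb_convert_case($digraph, MB_CASE_TITLE, 'UTF-8'); // titlecase conversion

// Print the code point of each result so that the difference is visible
// even when the glyphs look alike in the terminal font.
printf("upper: U+%04X\n", mb_ord($upper, 'UTF-8'));
printf("title: U+%04X\n", mb_ord($title, 'UTF-8'));
```

For most letters the two conversions give the same result, which is why the difference only shows up for characters such as digraphs and ligatures.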
On Fri, Feb 2, 2024 at 18:15 Ayesh Karunaratne ayesh@php.watch wrote:
Hi, thank you for your reply.
> I think the author refers to the potential edge cases in certain Unicode mappings. There isn't a ucfirst mapping, but there are uppercase and titlecase mappings.
Yes, Ayesh is right.
That sentence in the RFC is the result of my investigating edge cases.
> I'm not at all an expert in these edge cases, but I just wanted to put my two cents forth that I personally think using titlecase mapping on the first word would be the more appropriate approach.
I see. I'll change mb_ucfirst to use titlecase.
Thank you.
Regards
Yuya
--
Yuya Hamada (tekimen)
> I see. I'll change mb_ucfirst to use titlecase.
Per my comments a month ago on the GitHub issue, I think it is much
better to use title case for mb_ucfirst() than to use upper case,
since conversion of the first character to upper case has the effect
of corrupting text in the Georgian script, and initial lower-case
ligatures are converted to a form which appears like two upper case
letters. So I'm pleased to see this change to the PR.
I would appreciate it if the RFC could also be updated to include this
detail, since my vote depends on whether title case or upper case will
be used.
-- Tim Starling
On Tue, Feb 6, 2024 at 8:33 Tim Starling tstarling@wikimedia.org wrote:
Hi, Tim
Thank you for your comment.
I have updated the RFC to say "uses unicode case title".
Regards
Yuya
--
Yuya Hamada (tekimen)
Help me out here, but doesn't changing what is actually proposed in an
RFC after the vote has started invalidate the vote?
Shouldn't the vote be closed/withdrawn and restarted in that case?
On Wed, Feb 7, 2024 at 2:56 Juliette Reinders Folmer php-internals_nospam@adviesenzo.nl wrote:
Hi, Internals.
Juliette, thank you for pointing that out.
I made a mistake.
I checked the following, and it is exactly as you said.
https://wiki.php.net/RFC/voting#voting
> A valid voting period must be declared when voting is started and must not be changed during the vote.
I broke this rule. In this case, should I return the RFC to "Under Discussion"?
I apologize to those who have already voted. I admit that the discussion
was insufficient, and I would like to regroup.
Regards
Yuya
--
Yuya Hamada (tekimen)
On Wed, Feb 7, 2024 at 4:49 youkidearitai youkidearitai@gmail.com wrote:
Hi, Internals.
This RFC has been reverted to "Under Discussion".
https://wiki.php.net/rfc/howto
I followed Section 7.3:
> A serious issue with your RFC needs to be addressed: update the status of your RFC page and its section on https://wiki.php.net/RFC to “Under Discussion” and continue again from step 5.
I would like to wait another two weeks before voting again.
Voting is scheduled to restart on February 21st at 00:00 GMT.
Please feel free to comment.
Again, I apologize to everyone who voted.
Regards
Yuya
--
Yuya Hamada (tekimen)
Hi Tim,
Now that the RFC is restarted, could you mention some examples in Georgian
that might be good test cases?
I was thinking there might be some good test cases in Turkish, but couldn't
find any. The RFC has examples (https://github.com/php/php-src/pull/13161)
in Vietnamese, but they are correct for both "uppercase first character"
and titlecase conversions.
Thank you.
Any Georgian word would do. Your ASCII test case is "abc". The
Georgian equivalent for that would be "აბგ" (ani bani gani, U+10D0
U+10D1 U+10D2) which should remain the same after passing through
mb_ucfirst(). Compare mb_strtoupper("აბგ") -> "ᲐᲑᲒ" (U+1C90 U+1C91
U+1C92).
On the task I mentioned that ligatures are also affected. I gave the
example mb_ucfirst("lj") -> "Lj", that is, U+01C9 -> U+01C8. You could
add a test case for that. Compare mb_strtoupper("lj") -> "LJ" (U+01C7).
To repeat my rationale -- we can view ucfirst() either through a
technical lens (convert the first character of a string to upper case)
or through a natural language lens (convert a string to sentence case,
with the initial letter capitalised per local conventions). I am
arguing to make mb_ucfirst() be a natural language extension of
ucfirst(), because applying the technical extension would produce
results that look quite jarring in a natural language context.
There are some edge cases which are not quite right. To really do a
good job, a new case map will be needed. But if we document it as
being for natural language, and set the right expectations, we can fix
the edge cases later.
-- Tim Starling
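The two test cases above could be sketched in userland roughly as follows. The helper name ucfirst_title() is hypothetical and this is not the implementation from the PR; it simply titlecases the first character with mb_convert_case() and leaves the rest untouched. It assumes an mbstring build whose case data is recent enough to include the Georgian Mtavruli letters (U+1C90 onwards).

```php
<?php
// Hypothetical helper, not the code from the PR: titlecase the first
// character of a UTF-8 string and leave the remainder unchanged.
function ucfirst_title(string $s): string
{
    if ($s === '') {
        return '';
    }
    $first = mb_substr($s, 0, 1, 'UTF-8');
    $rest  = mb_substr($s, 1, null, 'UTF-8');
    return mb_convert_case($first, MB_CASE_TITLE, 'UTF-8') . $rest;
}

// Test inputs taken from the message above.
$georgian = "\u{10D0}\u{10D1}\u{10D2}"; // ani bani gani
$ligature = "\u{01C9}";                 // lj ligature

// Each comparison is expected to be true according to the message above.
var_dump(ucfirst_title($georgian) === $georgian);            // Georgian stays unchanged
var_dump(ucfirst_title($ligature) === "\u{01C8}");           // titlecase form U+01C8
var_dump(mb_strtoupper($ligature, 'UTF-8') === "\u{01C7}");  // uppercase form U+01C7
```

Only the first character is passed to mb_convert_case(), so the rest of the string is not lowercased as it would be if the whole string were converted with MB_CASE_TITLE.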
On Wed, Feb 7, 2024 at 12:56 Tim Starling tstarling@wikimedia.org wrote:
Hi, Tim
Thank you for the Georgian test case.
I have added it to the tests.
If there are any other comments, please feel free to share them.
Regards
Yuya
--
Yuya Hamada (tekimen)