PHP6: Unicode in function, class, method, and variable names

20 years ago by Pierre Joye

Hello,

Do we really want to support Unicode in function, class, method,
and variable names?

I would really like to have Unicode in comments (// and /* */) and
inside quoted strings.

But having:
<?php
function unicode_ist_nicht_süß(){}
function Unicode_не_хорош() {}
?>

is not something I like to see. For language constructs, I would
really like to have only ASCII support...

Regards,

--Pierre

20 years ago by ondrej@kmit.sk

Pierre Joye wrote:

is not something I like to see. For language constructs, I would
really like to have only ASCII support...

This stuff works in PHP 4 and 5:

<?php

function zmaž($čozmazať) {
echo "mažem $čozmazať\n";
}

zmaž(3);

IMHO, if someone needs ...

--
Ondrej Ivanič
(ondrej@kmit.sk)

20 years ago by Pierre Joye

Pierre Joye wrote:

is not something I like to see. For language constructs, I would
really like to have only ASCII support...

This stuff works in PHP 4 and 5:

<?php

function zmaž($čozmazať) {
echo "mažem $čozmazať\n";
}

zmaž(3);

?>

IMHO, if someone needs ...

I know, that's why I say: PHP6.

--Pierre

20 years ago by ondrej@kmit.sk

Pierre Joye wrote:

IMHO, if someone needs ...

I know, that's why I say: PHP6.

Another constraint? Why?

It's a BC break (... which impacts 1 or 2 users? :) )
PHP can be a scripting engine in learning programs for children (like
this: http://www.input.sk/slogo/). For children it is better to write
"programs" in their native language.

--
Ondrej Ivanič
(ondrej@kmit.sk)

20 years ago by Derick Rethans

Pierre Joye wrote:

IMHO, if someone needs ...

I know, that's why I say: PHP6.

Another constraint? Why?

It's a BC break (... which impacts 1 or 2 users? :) )

Actually, I saw this in use in a couple of generic apps, where the
French coders thought it was nice to use the ? (in UTF-8) in their
function names.

Derick

--
Derick Rethans
http://derickrethans.nl | http://ez.no | http://xdebug.org

20 years ago by Ilia Alshanetsky

Pierre Joye wrote:

is not something I like to see. For language constructs, I would
really like to have only ASCII support...

+1 IMHO language identifiers should be limited to ASCII. Yes you can now
use language specific chars by changing the locale, so that ž, č, ÿ are
taken, but that hardly makes for portable code.

Ilia

20 years ago by Rasmus Lerdorf

Ilia Alshanetsky wrote:

Pierre Joye wrote:

is not something I like to see. For language constructs, I would
really like to have only ASCII support...

+1 IMHO language identifiers should be limited to ASCII. Yes you can now
use language specific chars by changing the locale, so that ž, č, ÿ are
taken, but that hardly makes for portable code.

What do you mean? Why wouldn't it be portable? Because you can't read
it? It will still run. Limiting identifiers to ASCII is an artificial
limitation as far as I am concerned. I see no reason for it. It's not
as if people are going to suddenly write code for distribution with all
sorts of weird unicode identifiers. We support high-ascii today and you
never see those in public code. Java has had unicode identifiers
forever as well, and it doesn't seem to be a problem for them.

For people writing localized code it is very nice to be able to use
descriptive identifiers in their own character set. It makes it much
easier to understand the code for them.

-Rasmus

20 years ago by Andrei Zmievski

Yep, what Rasmus said.

-Andrei

Ilia Alshanetsky wrote:

Pierre Joye wrote:

is not something I like to see. For language constructs, I would
really like to have only ASCII support...

+1 IMHO language identifiers should be limited to ASCII. Yes you can now
use language specific chars by changing the locale, so that ž, č, ÿ are
taken, but that hardly makes for portable code.

What do you mean? Why wouldn't it be portable? Because you can't read
it? It will still run. Limiting identifiers to ASCII is an artificial
limitation as far as I am concerned. I see no reason for it. It's not
as if people are going to suddenly write code for distribution with all
sorts of weird unicode identifiers. We support high-ascii today and you
never see those in public code. Java has had unicode identifiers
forever as well, and it doesn't seem to be a problem for them.

For people writing localized code it is very nice to be able to use
descriptive identifiers in their own character set. It makes it much
easier to understand the code for them.

-Rasmus

20 years ago by Andi Gutmans

Me too...

At 09:16 AM 9/13/2005, Andrei Zmievski wrote:

Yep, what Rasmus said.

-Andrei

Ilia Alshanetsky wrote:

Pierre Joye wrote:

is not something I like to see. For language constructs, I would
really like to have only ASCII support...

+1 IMHO language identifiers should be limited to ASCII. Yes you can now
use language specific chars by changing the locale, so that ž, č, ÿ are
taken, but that hardly makes for portable code.

What do you mean? Why wouldn't it be portable? Because you can't read
it? It will still run. Limiting identifiers to ASCII is an artificial
limitation as far as I am concerned. I see no reason for it. It's not
as if people are going to suddenly write code for distribution with all
sorts of weird unicode identifiers. We support high-ascii today and you
never see those in public code. Java has had unicode identifiers
forever as well, and it doesn't seem to be a problem for them.

For people writing localized code it is very nice to be able to use
descriptive identifiers in their own character set. It makes it much
easier to understand the code for them.

-Rasmus

20 years ago by Ilia Alshanetsky

Rasmus Lerdorf wrote:

What do you mean? Why wouldn't it be portable?

Well, for one thing, code written to use Unicode identifiers will
immediately be limited to running on PHP 6 installs, while code using
ASCII identifiers with a standard "compat" layer could run just fine.

Another reason to only allow ASCII is that the code can then be read by
anyone rather than just the people who are familiar with the particular
language used. Heck, some editors do not even allow UTF-8 or properly
render some high-ASCII chars, making those scripts difficult if not
impossible to edit.

Ilia

20 years ago by Rasmus Lerdorf

Ilia Alshanetsky wrote:

Rasmus Lerdorf wrote:

What do you mean? Why wouldn't it be portable?

Well, for one thing, code written to use Unicode identifiers will
immediately be limited to running on PHP 6 installs, while code using
ASCII identifiers with a standard "compat" layer could run just fine.

Another reason to only allow ASCII is that the code can then be read by
anyone rather than just the people who are familiar with the particular
language used.

This is a choice that should be up to the developer. If she wants to
share her code with people who don't understand her language and/or
character set, then she should use some common language/character set.
But the language should not force this limitation on her.

-Rasmus

20 years ago by Pierre Joye

This is a choice that should be up to the developer. If she wants to
share her code with people who don't understand her language and/or
character set, then she should use some common language/character set.
But the language should not force this limitation on her.

The language does impose a limitation, as it does not provide a way to
show the identifiers in a neutral way. For example, ACI 4D (for those
who know it) did it the right way: all functions are localized, so I
can choose the language I want and still be able to read the identifiers.
But I doubt we will ever do that in PHP :)

Regards,

--Pierre

20 years ago by Sara Golemon

The language does impose a limitation, as it
does not provide a way to show the
identifiers in a neutral way. For example,
ACI 4D (for those who know it) did it
the right way: all functions are localized,
so I can choose the language I want and still
be able to read the identifiers. But I doubt we
will ever do that in PHP :)

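static function_entry php_espanol_functions[] = {
    /* PHP_FALIAS(alias, original, arg_info) registers a second name for an existing function */
    PHP_FALIAS(secuencia_color, highlight_string, NULL)
    PHP_FALIAS(aabierto, fopen, NULL)
    PHP_FALIAS(mysql_pregunta, mysql_query, NULL)
    /* etc...etc...etc... */
    {NULL, NULL, NULL} /* end-of-table sentinel */
};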

-Sara (trying to interject some humor)

20 years ago by Sebastian Nohn

Rasmus Lerdorf wrote:

This is a choice that should be up to the developer. If she wants to
share her code with people who don't understand her language and/or
character set, then she should use some common language/character set.
But the language should not force this limitation on her.

Why not display T_PAAMAYIM_NEKUDOTAYIM in Hebrew characters if that is so?

Sebastian

20 years ago by Derick Rethans

Rasmus Lerdorf wrote:

What do you mean? Why wouldn't it be portable?

Well, for one thing, code written to use Unicode identifiers will
immediately be limited to running on PHP 6 installs, while code using
ASCII identifiers with a standard "compat" layer could run just fine.

I don't see why this is a problem...

Another reason to only allow ASCII is that the code can then be read by
anyone rather than just the people who are familiar with the particular
language used. Heck, some editors do not even allow UTF-8 or properly
render some high-ASCII chars, making those scripts difficult if not
impossible to edit.

Again, I don't see why this support would be a problem for you... if
other people want to use it, let them. It doesn't hurt anybody who
really doesn't use it.

Derick

--
Derick Rethans
http://derickrethans.nl | http://ez.no | http://xdebug.org