Turkish/Azeri locale support

15 years ago by Adam Harvey

As at least some of you would already be aware, there's a
long-standing issue with using PHP in a Turkish or Azeri locale,
namely that case-insensitive lookups within the Zend engine (method
names, for example) fail on lookups involving upper-case I characters,
since lower-case I in those languages is ı instead of i (note the lack
of a dot).

The long term plan for this, per bug #35050 and any number of
duplicates, was to deal with it in PHP 6. Since PHP 6 isn't going to
happen in its original form, I think we're going to need to revisit
how we want to deal with this. There's a patch linked in the bug from
Tomas Kuliavas and Marcus that fixes the problem by simply redefining
zend_tolower() to a simple locale-insensitive ASCII tolower()
function, which does fix the Turkish and Azeri locales.
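A locale-insensitive lowercase helper of the kind described above might look roughly like the sketch below. This is not the code from the linked diff: the names ascii_tolower(), ascii_str_tolower() and ascii_tolower_demo() are hypothetical and only illustrate the idea of never consulting the current locale.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not from the patch): lowercases only ASCII 'A'-'Z'
 * and never consults the current locale, so 'I' always becomes 'i'. */
static unsigned char ascii_tolower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Lowercase a NUL-terminated string in place, byte by byte. */
static void ascii_str_tolower(char *s)
{
    for (; *s != '\0'; s++) {
        *s = (char)ascii_tolower((unsigned char)*s);
    }
}

/* Small self-check: a name containing upper-case I folds to plain i. */
static void ascii_tolower_demo(void)
{
    char name[] = "GetInfo";
    ascii_str_tolower(name);
    assert(strcmp(name, "getinfo") == 0);
}
```

Because the mapping is a fixed ASCII range check, the result is the same whether the process runs under a Turkish, Azeri or any other locale.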

The potential breakage from this is that single-byte locales will no
longer get case-insensitive lookups of non-ASCII characters: for
example, somebody using fr_FR.ISO-8859-1 as a locale could no longer
call a method É() as é(). Since it doesn't break anything when using
multi-byte locales (which have never had case-insensitive lookups
anyway since the Zend Engine uses the single-byte tolower()
internally), my inclination would be to apply the patch on trunk and
document it as a BC issue.
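To make the breakage concrete, here is a minimal sketch of an ASCII-only case-insensitive name comparison. The helper names are hypothetical, and the byte values assume ISO-8859-1, where É is 0xC9 and é is 0xE9.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fold only ASCII upper-case letters. */
static unsigned char fold_ascii(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + 32) : c;
}

/* Compare two names of the same length, ignoring ASCII case only.
 * Returns 1 when they match, 0 otherwise. */
static int names_equal_ascii_ci(const char *a, const char *b, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (fold_ascii((unsigned char)a[i]) != fold_ascii((unsigned char)b[i])) {
            return 0;
        }
    }
    return 1;
}

static void bc_break_demo(void)
{
    /* ISO-8859-1 bytes: 0xC9 is upper-case E acute, 0xE9 is lower-case. */
    const char upper_e_acute[] = "\xC9";
    const char lower_e_acute[] = "\xE9";

    /* ASCII names still match regardless of case. */
    assert(names_equal_ascii_ci("FOO", "foo", 3) == 1);
    /* High-bit names no longer match across case. */
    assert(names_equal_ascii_ci(upper_e_acute, lower_e_acute, 1) == 0);
}
```

ASCII names keep working as before; only names that differ in the case of a high-bit byte stop matching.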

I've uploaded an updated version of Tomas's patch that applies cleanly
to trunk to http://www.adamharvey.name/patches/35050/zend_operators.c.diff
and a phpt file to test the fix to
http://www.adamharvey.name/patches/35050/bug35050.phpt. It's likely
that the test would require massaging before being committed to work
on Windows, but since I don't have a Windows development box readily
available and don't know a thing about how Windows implements locale
support, this would require help from someone familiar with the
platform.

So: thoughts; concerns; alternate approaches? It would be nice to have
this sorted for PHP.next.

Thanks,

Adam

15 years ago by Stan Vassilev

As at least some of you would already be aware, there's a
long-standing issue with using PHP in a Turkish or Azeri locale,
namely that case-insensitive lookups within the Zend engine (method
names, for example) fail on lookups involving upper-case I characters,
since lower-case I in those languages is ı instead of i (note the lack
of a dot).

The long term plan for this, per bug #35050 and any number of
duplicates, was to deal with it in PHP 6. Since PHP 6 isn't going to
happen in its original form, I think we're going to need to revisit
how we want to deal with this. There's a patch linked in the bug from
Tomas Kuliavas and Marcus that fixes the problem by simply redefining
zend_tolower() to a simple locale-insensitive ASCII tolower()
function, which does fix the Turkish and Azeri locales.

As you illustrated in your post, fixing it for locales becomes...
complicated.

If you ask me, there's only one way to fix this, which is how most other
languages fixed it: make the next major version of PHP case-sensitive for
all identifiers. For fewer bugs, fewer locale problems and more performance.

It was somewhat-the-plan, even before the Turkish locale issue was brought
up.

Regards,
Stan Vassilev

15 years ago by Adam Harvey

If you ask me, there's only one way to fix this, which is how most other
languages fixed it: make the next major version of PHP case-sensitive for
all identifiers. For fewer bugs, fewer locale problems and more performance.

Definitely another option — and one I personally like — although I
suspect the BC implications of that are considerably greater than
breaking high-bit characters in single-byte locales.

Adam

15 years ago by Tomas Kuliavas

2010.04.19 07:59 Stan Vassilev wrote:

As at least some of you would already be aware, there's a
long-standing issue with using PHP in a Turkish or Azeri locale,
namely that case-insensitive lookups within the Zend engine (method
names, for example) fail on lookups involving upper-case I characters,
since lower-case I in those languages is ı instead of i (note the lack
of a dot).

The long term plan for this, per bug #35050 and any number of
duplicates, was to deal with it in PHP 6. Since PHP 6 isn't going to
happen in its original form, I think we're going to need to revisit
how we want to deal with this. There's a patch linked in the bug from
Tomas Kuliavas and Marcus that fixes the problem by simply redefining
zend_tolower() to a simple locale-insensitive ASCII tolower()
function, which does fix the Turkish and Azeri locales.

As you illustrated in your post, fixing it for locales becomes...
complicated.

If you ask me, there's only one way to fix this, which is how most other
languages fixed it: make the next major version of PHP case-sensitive for
all identifiers. For fewer bugs, fewer locale problems and more performance.

It was somewhat-the-plan, even before the Turkish locale issue was brought
up.

Fixing the issue is not complicated. I could do that without any C coding
background. Your (@php.net) developers only have to learn that they should
not use locale-sensitive functions while assuming that English case
rules apply. This is the main lesson Turkey teaches any coder.
If you continue to ignore it, you will continue to trigger PHP bugs in
Turkey. For n years PHP has used only locale-sensitive case-insensitive
functions. You never bothered to fix it. Deferring it to some distant
PHP6 feature does not help Turks.

The patch for 35050 is not something that should break things. You reviewed
the patch and commented on it, I updated the patch based on your style
comments, and you continued to ignore the problem. The excuse that the patch
breaks something is funny, because Win32 builds are set to use internal (not
for public use) Microsoft C library calls that are locale-insensitive. If
some PHP code breaks when string functions are locale-insensitive, please
show that code. I would like to see what i18n-unportable PHP5 programming
looks like.

If users want to use a Turkish locale here and now, they must set LC_CTYPE
to C. This workaround disables all locale-specific quirks, and only gettext
must be set to use the correct charset for all translations. The other fix is
more complex: PHP scripts must replace all locale-sensitive native functions
with their own locale-insensitive replacements and pray that supported PHP
versions don't trigger bugs when LC_CTYPE is not C.
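The LC_CTYPE workaround can be sketched at the C library level as follows; in a PHP script the equivalent would be a call to PHP's own setlocale() with LC_CTYPE and "C". The helper names and the locale string "tr_TR.ISO-8859-9" are assumptions for illustration, and that locale may not be installed on a given system.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>

/* Pin only the character-classification category to the "C" locale, leaving
 * the other categories (LC_TIME, LC_MONETARY, ...) as the program set them.
 * Returns 1 on success, 0 on failure. */
static int pin_ctype_to_c(void)
{
    return setlocale(LC_CTYPE, "C") != NULL;
}

static void lc_ctype_demo(void)
{
    /* Assumed locale name; this call may fail if no Turkish locale is
     * installed, and the result is deliberately ignored. */
    setlocale(LC_ALL, "tr_TR.ISO-8859-9");

    /* After pinning LC_CTYPE, tolower() follows plain ASCII rules again. */
    assert(pin_ctype_to_c());
    assert(tolower('I') == 'i');
}
```

The trade-off is the one described above: character classification stops being locale-aware everywhere, not only for identifier lookups.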

--
Tomas

15 years ago by Adam Harvey

As at least some of you would already be aware, there's a
long-standing issue with using PHP in a Turkish or Azeri locale,
namely that case-insensitive lookups within the Zend engine (method
names, for example) fail on lookups involving upper-case I characters,
since lower-case I in those languages is ı instead of i (note the lack
of a dot).

Well, I'm going to assume that people have had whatever say they were
going to. It seems that we have three options, so let's put it to a
vote.

(To be completely clear, this is purely for trunk. This certainly
isn't a candidate for backporting to 5.3.)

The options are:

1. Apply Tomas's patch to make case-insensitive lookups
locale-ignorant. Pros: fixes immediate problem. Cons: breaks BC for
case-insensitive function/method name lookups for high-bit characters
in single-byte encodings. (Not that we've ever advertised or
documented that.)

2. Make function/method names case-sensitive, per Stan's e-mail. Pros:
fixes problem; brings PHP into line with most other languages; extra
consistency with variables; possible performance improvement. Cons: BC
break from current documented behaviour.

3. Do nothing. Pros: no BC breaks of any kind. Cons: continues to
annoy Turkish and Azeri developers and those developing for those
locales.

If you'd care to reply with a vote for option 1, 2 or 3, I'll tally up
the votes in a week or so. And yes, I am volunteering to deal with
this should option 1 or 2 be picked.

Adam

15 years ago by Adam Harvey

If you'd care to reply with a vote for option 1, 2 or 3, I'll tally up
the votes in a week or so. And yes, I am volunteering to deal with
this should option 1 or 2 be picked.

Separate e-mail for housekeeping purposes: my vote is for option 1.

Adam

15 years ago by Steven Van Poeck

Adam Harvey wrote:

As at least some of you would already be aware, there's a
long-standing issue with using PHP in a Turkish or Azeri locale,
namely that case-insensitive lookups within the Zend engine (method
names, for example) fail on lookups involving upper-case I characters,
since lower-case I in those languages is ı instead of i (note the lack
of a dot).

Well, I'm going to assume that people have had whatever say they were
going to. It seems that we have three options, so let's put it to a
vote.

(To be completely clear, this is purely for trunk. This certainly
isn't a candidate for backporting to 5.3.)

The options are:

Apply Tomas's patch to make case-insensitive lookups
locale-ignorant. Pros: fixes immediate problem. Cons: breaks BC for
case-insensitive function/method name lookups for high-bit characters
in single-byte encodings. (Not that we've ever advertised or
documented that.)

Make function/method names case-sensitive, per Stan's e-mail. Pros:
fixes problem; brings PHP into line with most other languages; extra
consistency with variables; possible performance improvement. Cons: BC
break from current documented behaviour.

Do nothing. Pros: no BC breaks of any kind. Cons: continues to
annoy Turkish and Azeri developers and those developing for those
locales.

If you'd care to reply with a vote for option 1, 2 or 3, I'll tally up
the votes in a week or so. And yes, I am volunteering to deal with
this should option 1 or 2 be picked.

Adam

No idea if I have the right to vote, but here goes:
+1 for option 2

BR,
Steven

15 years ago by Mark Skilbeck

The options are:

Apply Tomas's patch to make case-insensitive lookups
locale-ignorant. Pros: fixes immediate problem. Cons: breaks BC for
case-insensitive function/method name lookups for high-bit characters
in single-byte encodings. (Not that we've ever advertised or
documented that.)

Make function/method names case-sensitive, per Stan's e-mail. Pros:
fixes problem; brings PHP into line with most other languages; extra
consistency with variables; possible performance improvement. Cons: BC
break from current documented behaviour.

Do nothing. Pros: no BC breaks of any kind. Cons: continues to
annoy Turkish and Azeri developers and those developing for those
locales.

If you'd care to reply with a vote for option 1, 2 or 3, I'll tally up
the votes in a week or so. And yes, I am volunteering to deal with
this should option 1 or 2 be picked.

Adam

Not that my input matters, but I vote for #2.

Mark Skilbeck.

15 years ago by Alexey Zakhlestin

Adam Harvey wrote:

Well, I'm going to assume that people have had whatever say they were
going to. It seems that we have three options, so let's put it to a
vote.

(To be completely clear, this is purely for trunk. This certainly
isn't a candidate for backporting to 5.3.)

The options are:

Apply Tomas's patch to make case-insensitive lookups
locale-ignorant. Pros: fixes immediate problem. Cons: breaks BC for
case-insensitive function/method name lookups for high-bit characters
in single-byte encodings. (Not that we've ever advertised or
documented that.)

Make function/method names case-sensitive, per Stan's e-mail. Pros:
fixes problem; brings PHP into line with most other languages; extra
consistency with variables; possible performance improvement. Cons: BC
break from current documented behaviour.

Do nothing. Pros: no BC breaks of any kind. Cons: continues to
annoy Turkish and Azeri developers and those developing for those
locales.

If you'd care to reply with a vote for option 1, 2 or 3, I'll tally up
the votes in a week or so. And yes, I am volunteering to deal with
this should option 1 or 2 be picked.

my vote: option 2

that's the most consistent solution

15 years ago by Richard Quadling

The options are:

Apply Tomas's patch to make case-insensitive lookups
locale-ignorant. Pros: fixes immediate problem. Cons: breaks BC for
case-insensitive function/method name lookups for high-bit characters
in single-byte encodings. (Not that we've ever advertised or
documented that.)

Make function/method names case-sensitive, per Stan's e-mail. Pros:
fixes problem; brings PHP into line with most other languages; extra
consistency with variables; possible performance improvement. Cons: BC
break from current documented behaviour.

Do nothing. Pros: no BC breaks of any kind. Cons: continues to
annoy Turkish and Azeri developers and those developing for those
locales.

+1 for Option 2. Coming from languages that are case sensitive, it has
never occurred to me that PHP wasn't case sensitive. So no BC break for at
least 1 user!

--

Richard Quadling
"Standing on the shoulders of some very clever giants!"
EE : http://www.experts-exchange.com/M_248814.html
EE4Free : http://www.experts-exchange.com/becomeAnExpert.jsp
Zend Certified Engineer : http://zend.com/zce.php?c=ZEND002498&r=213474731
ZOPA : http://uk.zopa.com/member/RQuadling

15 years ago by Ferenc Kovacs

As at least some of you would already be aware, there's a
long-standing issue with using PHP in a Turkish or Azeri locale,
namely that case-insensitive lookups within the Zend engine (method
names, for example) fail on lookups involving upper-case I characters,
since lower-case I in those languages is ı instead of i (note the lack
of a dot).

Well, I'm going to assume that people have had whatever say they were
going to. It seems that we have three options, so let's put it to a
vote.

(To be completely clear, this is purely for trunk. This certainly
isn't a candidate for backporting to 5.3.)

The options are:

Apply Tomas's patch to make case-insensitive lookups
locale-ignorant. Pros: fixes immediate problem. Cons: breaks BC for
case-insensitive function/method name lookups for high-bit characters
in single-byte encodings. (Not that we've ever advertised or
documented that.)

Make function/method names case-sensitive, per Stan's e-mail. Pros:
fixes problem; brings PHP into line with most other languages; extra
consistency with variables; possible performance improvement. Cons: BC
break from current documented behaviour.

Do nothing. Pros: no BC breaks of any kind. Cons: continues to
annoy Turkish and Azeri developers and those developing for those
locales.

If you'd care to reply with a vote for option 1, 2 or 3, I'll tally up
the votes in a week or so. And yes, I am volunteering to deal with
this should option 1 or 2 be picked.

Adam

--

+1 for the 2nd option.

Tyrael

15 years ago by Derick Rethans

The options are:

Apply Tomas's patch to make case-insensitive lookups
locale-ignorant. Pros: fixes immediate problem. Cons: breaks BC for
case-insensitive function/method name lookups for high-bit characters
in single-byte encodings. (Not that we've ever advertised or
documented that.)

People do do this though.

I'm for option 2.

Derick

--
http://derickrethans.nl | http://xdebug.org
Like Xdebug? Consider a donation: http://xdebug.org/donate.php
twitter: @derickr and @xdebug

15 years ago by Tomas Kuliavas

2010.05.04 17:56 Derick Rethans wrote:

The options are:

Apply Tomas's patch to make case-insensitive lookups
locale-ignorant. Pros: fixes immediate problem. Cons: breaks BC for
case-insensitive function/method name lookups for high-bit characters
in single-byte encodings. (Not that we've ever advertised or
documented that.)

People do do this though.

I'm for option 2.

Change to 100% case-insensitive function names has bigger probability of
BC break. I think I've seen code which used functions in a way that
depended on case insensitive lookups and same code had problems with
Turkish, because case insensitive dependency was only on latin I.

Option 1 maintains BC for ASCII names. High-bit characters only keep working
in some locales anyway. You are lucky until you hit something in the
0xC0-0xDF range that does not have a direct match in the 0xE0-0xFF range; you
enter a minefield if you use 0x80-0xBF; and the code is hosed when the
locale does not support the usual ISO-8859-1 high-bit character matching.
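The range argument can be illustrated with a hand-written ISO-8859-1 lowercase mapping. This is a hypothetical sketch rather than any library's table: it only shows that adding 0x20 works for most of 0xC0-0xDE but not for 0xD7 (the multiplication sign) or 0xDF (sharp s).

```c
#include <assert.h>

/* Hypothetical ISO-8859-1 lowercase mapping, written out by hand so that it
 * does not depend on which locales are installed. */
static unsigned char latin1_tolower(unsigned char c)
{
    if (c >= 'A' && c <= 'Z') {
        return (unsigned char)(c + 0x20);
    }
    /* 0xC0-0xDE are upper-case letters, except 0xD7 (multiplication sign). */
    if (c >= 0xC0 && c <= 0xDE && c != 0xD7) {
        return (unsigned char)(c + 0x20);
    }
    /* Everything else, including 0xDF (sharp s) and 0x80-0xBF, is unchanged. */
    return c;
}

static void latin1_demo(void)
{
    assert(latin1_tolower(0xC9) == 0xE9); /* E acute: upper -> lower */
    assert(latin1_tolower(0xD7) == 0xD7); /* multiplication sign, not a letter */
    assert(latin1_tolower(0xDF) == 0xDF); /* sharp s has no partner at +0x20 */
}
```

Other single-byte encodings place different characters in these ranges, so a name that folds correctly under one locale can fail under another.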

--
Tomas

15 years ago by Tomas Kuliavas

2010.05.04 20:20 Tomas Kuliavas wrote:

2010.05.04 17:56 Derick Rethans wrote:

The options are:

Apply Tomas's patch to make case-insensitive lookups
locale-ignorant. Pros: fixes immediate problem. Cons: breaks BC for
case-insensitive function/method name lookups for high-bit characters
in single-byte encodings. (Not that we've ever advertised or
documented that.)

People do do this though.

I'm for option 2.

Change to 100% case-insensitive function names has bigger probability of
BC break.

typo. it is "100% case sensitive function names".

I think code in question is related to strcasecmp() or some other standard
PHP string function. Make sure that you don't break other case insensitive
functions, if you go with option two.

--
Tomas

15 years ago by Stanislav Malyshev

Hi!

Change to 100% case-insensitive function names has bigger probability of
BC break. I think I've seen code which used functions in a way that

It's not a probability, it's a certainty. There's tons of code out there
(I'm sure including 99% of all popular apps and frameworks) that uses
different cases somehow, somewhere.
If there was no BC issue, I'd definitely be for case-sensitive names,
but with BC it may become kind of tricky...

Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com

15 years ago by Patrick ALLAERT

2010/5/4 Adam Harvey aharvey@php.net:

As at least some of you would already be aware, there's a
long-standing issue with using PHP in a Turkish or Azeri locale,
namely that case-insensitive lookups within the Zend engine (method
names, for example) fail on lookups involving upper-case I characters,
since lower-case I in those languages is ı instead of i (note the lack
of a dot).

Well, I'm going to assume that people have had whatever say they were
going to. It seems that we have three options, so let's put it to a
vote.

(To be completely clear, this is purely for trunk. This certainly
isn't a candidate for backporting to 5.3.)

The options are:

Apply Tomas's patch to make case-insensitive lookups
locale-ignorant. Pros: fixes immediate problem. Cons: breaks BC for
case-insensitive function/method name lookups for high-bit characters
in single-byte encodings. (Not that we've ever advertised or
documented that.)

Make function/method names case-sensitive, per Stan's e-mail. Pros:
fixes problem; brings PHP into line with most other languages; extra
consistency with variables; possible performance improvement. Cons: BC
break from current documented behaviour.

Once and for all: +1 for #2 (BTW that kind of BC will not be that hard to fix!)

RMs: should this really be part of PHP 5.4 if it gets approved?

Do nothing. Pros: no BC breaks of any kind. Cons: continues to
annoy Turkish and Azeri developers and those developing for those
locales.

If you'd care to reply with a vote for option 1, 2 or 3, I'll tally up
the votes in a week or so. And yes, I am volunteering to deal with
this should option 1 or 2 be picked.

Adam

--

Patrick

15 years ago by Ferenc Kovacs

On Wed, May 5, 2010 at 8:44 AM, Patrick ALLAERT patrickallaert@php.net wrote:

2010/5/4 Adam Harvey aharvey@php.net:

As at least some of you would already be aware, there's a
long-standing issue with using PHP in a Turkish or Azeri locale,
namely that case-insensitive lookups within the Zend engine (method
names, for example) fail on lookups involving upper-case I characters,
since lower-case I in those languages is ı instead of i (note the lack
of a dot).

Well, I'm going to assume that people have had whatever say they were
going to. It seems that we have three options, so let's put it to a
vote.

(To be completely clear, this is purely for trunk. This certainly
isn't a candidate for backporting to 5.3.)

The options are:

Apply Tomas's patch to make case-insensitive lookups
locale-ignorant. Pros: fixes immediate problem. Cons: breaks BC for
case-insensitive function/method name lookups for high-bit characters
in single-byte encodings. (Not that we've ever advertised or
documented that.)

Make function/method names case-sensitive, per Stan's e-mail. Pros:
fixes problem; brings PHP into line with most other languages; extra
consistency with variables; possible performance improvement. Cons: BC
break from current documented behaviour.

Once and for all: +1 for #2 (BTW that kind of BC will not be that hard to
fix!)

RMs: should this really be part of PHP 5.4 if it gets approved?

AFAIK the new stable branch version number hasn't been decided yet.

Tyrael

15 years ago by Lukas Kahwe Smith

2010/5/4 Adam Harvey aharvey@php.net:

As at least some of you would already be aware, there's a
long-standing issue with using PHP in a Turkish or Azeri locale,
namely that case-insensitive lookups within the Zend engine (method
names, for example) fail on lookups involving upper-case I characters,
since lower-case I in those languages is ı instead of i (note the lack
of a dot).

Well, I'm going to assume that people have had whatever say they were
going to. It seems that we have three options, so let's put it to a
vote.

(To be completely clear, this is purely for trunk. This certainly
isn't a candidate for backporting to 5.3.)

The options are:

Apply Tomas's patch to make case-insensitive lookups
locale-ignorant. Pros: fixes immediate problem. Cons: breaks BC for
case-insensitive function/method name lookups for high-bit characters
in single-byte encodings. (Not that we've ever advertised or
documented that.)

Make function/method names case-sensitive, per Stan's e-mail. Pros:
fixes problem; brings PHP into line with most other languages; extra
consistency with variables; possible performance improvement. Cons: BC
break from current documented behaviour.

Once and for all: +1 for #2 (BTW that kind of BC will not be that hard to fix!)

we have had the topic of making PHP case sensitive before. I do not find the above reason all that compelling to make this change. However there are several other reasons that imho are more relevant for making this change.

RMs: should this really be part of PHP 5.4 if it gets approved?

no way.

regards,
Lukas Kahwe Smith
mls@pooteeweet.org

15 years ago by Ford

-----Original Message-----
From: adam@adamharvey.name [mailto:adam@adamharvey.name] On Behalf
Of Adam Harvey
Sent: 04 May 2010 13:15

The options are:

Apply Tomas's patch to make case-insensitive lookups
locale-ignorant. Pros: fixes immediate problem. Cons: breaks BC for
case-insensitive function/method name lookups for high-bit characters
in single-byte encodings. (Not that we've ever advertised or
documented that.)

Make function/method names case-sensitive, per Stan's e-mail. Pros:
fixes problem; brings PHP into line with most other languages; extra
consistency with variables; possible performance improvement. Cons: BC
break from current documented behaviour.

Do nothing. Pros: no BC breaks of any kind. Cons: continues to
annoy Turkish and Azeri developers and those developing for those
locales.

Low karma here, too, but for what it's worth I'm for option 2.

(I can never remember what's case sensitive and what's not, so I always write code that would work either way anyway!)

Cheers!

Mike

Mike Ford,
Electronic Information Developer, Libraries and Learning Innovation,
Leeds Metropolitan University, C507, Civic Quarter Campus,
Woodhouse Lane, LEEDS, LS1 3HE, United Kingdom
Email: m.ford@leedsmet.ac.uk
Tel: +44 113 812 4730

15 years ago by Hannes Magnusson

As at least some of you would already be aware, there's a
long-standing issue with using PHP in a Turkish or Azeri locale,
namely that case-insensitive lookups within the Zend engine (method
names, for example) fail on lookups involving upper-case I characters,
since lower-case I in those languages is ı instead of i (note the lack
of a dot).

Well, I'm going to assume that people have had whatever say they were
going to. It seems that we have three options, so let's put it to a
vote.

(To be completely clear, this is purely for trunk. This certainly
isn't a candidate for backporting to 5.3.)

The options are:

Apply Tomas's patch to make case-insensitive lookups
locale-ignorant. Pros: fixes immediate problem. Cons: breaks BC for
case-insensitive function/method name lookups for high-bit characters
in single-byte encodings. (Not that we've ever advertised or
documented that.)

Make function/method names case-sensitive, per Stan's e-mail. Pros:
fixes problem; brings PHP into line with most other languages; extra
consistency with variables; possible performance improvement. Cons: BC
break from current documented behaviour.

There is no way this can happen. It will break massive amount of code
and will cause major headaches for people using __call().

-Hannes

15 years ago by Steven Van Poeck

Hannes Magnusson wrote:

Make function/method names case-sensitive, per Stan's e-mail. Pros:
fixes problem; brings PHP into line with most other languages; extra
consistency with variables; possible performance improvement. Cons: BC
break from current documented behaviour.

There is no way this can happen. It will break massive amount of code
and will cause major headaches for people using __call().

-Hannes

Can you give an example of consistent code where this evolution would
cause __call() not to function properly ? I'm afraid I can't think of any...

Steven

15 years ago by Hannes Magnusson

Hannes Magnusson wrote:

Make function/method names case-sensitive, per Stan's e-mail. Pros:
fixes problem; brings PHP into line with most other languages; extra
consistency with variables; possible performance improvement. Cons: BC
break from current documented behaviour.

There is no way this can happen. It will break massive amount of code
and will cause major headaches for people using __call().

-Hannes

Can you give an example of consistent code where this evolution would
cause __call() not to function properly ? I'm afraid I can't think of any...

Can you give an example of consistent code? Just any. Any at all.
Doesn't have to be long.

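class Logs {
    function getSQLLogger() {
        return $this->logs["sql"];
    }
    // Catches calls to undefined methods, which would include miscased
    // names if method names became case-sensitive.
    function __call($name, $arguments) {
        return $this->logs["default"];
    }
}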

$logs->getSqlLogger()->logSql("...");
..call to undefined method default::logSql()...

-Hannes

15 years ago by Steven Van Poeck

Hannes Magnusson wrote:

Hannes Magnusson wrote:

Make function/method names case-sensitive, per Stan's e-mail. Pros:
fixes problem; brings PHP into line with most other languages; extra
consistency with variables; possible performance improvement. Cons: BC
break from current documented behaviour.

There is no way this can happen. It will break massive amount of code
and will cause major headaches for people using __call().

-Hannes

Can you give an example of consistent code where this evolution would
cause __call() not to function properly ? I'm afraid I can't think of any...

Can you give an example of consistent code? Just any. Any at all.
Doesn't have to be long.

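class Logs {
    function getSQLLogger() {
        return $this->logs["sql"];
    }
    // Catches calls to undefined methods, which would include miscased
    // names if method names became case-sensitive.
    function __call($name, $arguments) {
        return $this->logs["default"];
    }
}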

$logs->getSqlLogger()->logSql("...");
..call to undefined method default::logSql()...

-Hannes

Right. That's what I meant by inconsistent code. The call
$logs->getSQLLogger()->logSql("...") would function I presume.
The reason your example code does not is because it is inconsistent.
You're calling $logs->getSqlLogger() instead of $logs->getSQLLogger()

Or am I missing your point ?

Steven

15 years ago by Adam Harvey

Right. That's what I meant by inconsistent code. The call
$logs->getSQLLogger()->logSql("...") would function I presume.
The reason your example code does not is because it is inconsistent. You're
calling $logs->getSqlLogger() instead of $logs->getSQLLogger()

Agreed, but not everyone's going to spot that first time, particularly
the sorts of newer programmers we get quite a lot of writing PHP.
Also, the error message you get out the other end is suboptimal, to
say the least.

I've been holding out on stating my reasoning much beyond my initial
post and vote to try avoiding influencing anyone, but I'm opinionated,
so here goes. :)

Here's an example of my problem with enforcing case-sensitivity in
general that builds on something Johannes said yesterday on IRC: how
many people use different cases for the gd functions? Taking, say,
imagestring as a simple example and having a quick look at Google Code
Search, the variants people are using there are:

ImageString: about 3000 results [0]
imagestring: about 4000 results [1]
imageString: 453 results [2]
IMAGESTRING: 1 result [3]

There are probably other variations in use too, and that's one of the
more simply named functions in gd.

The above example pretty much sums up why I went for option 1 instead
of 2 in my vote: it's still a BC break, but I honestly think it's
going to be a much more minor break (remember, it only affects people
in certain Western European locales such as French, Spanish and German
who are also using single-byte encodings — most Linux distros I know
of are shipping with UTF-8 as a default, so they're already
effectively case-sensitive for non-ASCII characters) than changing the
entire language and probably causing issues in a wide variety of
applications.

If I was designing the language anew, sure, I'd go for
case-sensitivity the whole way, but as things stand, I'm pretty
dubious it would work out.

Adam

[0] http://www.google.com.au/codesearch?as_q=%22ImageString%22&btnG=Search+Code&hl=en&as_lang=php&as_license_restrict=i&as_license=&as_package=&as_filename=&as_case=y
[1] http://www.google.com.au/codesearch?as_q=imagestring&btnG=Search+Code&hl=en&as_lang=php&as_license_restrict=i&as_license=&as_package=&as_filename=&as_case=y
[2] http://www.google.com.au/codesearch?as_q=imageString&btnG=Search+Code&hl=en&as_lang=php&as_license_restrict=i&as_license=&as_package=&as_filename=&as_case=y
[3] http://www.google.com.au/codesearch?as_q=IMAGESTRING&btnG=Search+Code&hl=en&as_lang=php&as_license_restrict=i&as_license=&as_package=&as_filename=&as_case=y

15 years ago by Richard Quadling

Right. That's what I meant by inconsistent code. The call
$logs->getSQLLogger()->logSql("...") would function I presume.
The reason your example code does not is because it is inconsistent. You're
calling $logs->getSqlLogger() instead of $logs->getSQLLogger()

Agreed, but not everyone's going to spot that first time, particularly
the sorts of newer programmers we get quite a lot of writing PHP.
Also, the error message you get out the other end is suboptimal, to
say the least.

I've been holding out on stating my reasoning much beyond my initial
post and vote to try avoiding influencing anyone, but I'm opinionated,
so here goes. :)

Here's an example of my problem with enforcing case-sensitivity in
general that builds on something Johannes said yesterday on IRC: how
many people use different cases for the gd functions? Taking, say,
imagestring as a simple example and having a quick look at Google Code
Search, the variants people are using there are:

ImageString: about 3000 results [0]
imagestring: about 4000 results [1]
imageString: 453 results [2]
IMAGESTRING: 1 result [3]

There are probably other variations in use too, and that's one of the
more simply named functions in gd.

The above example pretty much sums up why I went for option 1 instead
of 2 in my vote: it's still a BC break, but I honestly think it's
going to be a much more minor break (remember, it only affects people
in certain Western European locales such as French, Spanish and German
who are also using single-byte encodings — most Linux distros I know
of are shipping with UTF-8 as a default, so they're already
effectively case-sensitive for non-ASCII characters) than changing the
entire language and probably causing issues in a wide variety of
applications.

If I was designing the language anew, sure, I'd go for
case-sensitivity the whole way, but as things stand, I'm pretty
dubious it would work out.

Adam

[0] http://www.google.com.au/codesearch?as_q=%22ImageString%22&btnG=Search+Code&hl=en&as_lang=php&as_license_restrict=i&as_license=&as_package=&as_filename=&as_case=y
[1] http://www.google.com.au/codesearch?as_q=imagestring&btnG=Search+Code&hl=en&as_lang=php&as_license_restrict=i&as_license=&as_package=&as_filename=&as_case=y
[2] http://www.google.com.au/codesearch?as_q=imageString&btnG=Search+Code&hl=en&as_lang=php&as_license_restrict=i&as_license=&as_package=&as_filename=&as_case=y
[3] http://www.google.com.au/codesearch?as_q=IMAGESTRING&btnG=Search+Code&hl=en&as_lang=php&as_license_restrict=i&as_license=&as_package=&as_filename=&as_case=y

--

If case sensitivity is going to be incorporated and the BC break is too
great, is the following an option?

1 - Introduce case sensitivity with an option to allow a fallback to
the existing mechanism.
2 - Generate an E_NOTICE or an E_STRICT to inform developers/users of the issue.
3 - Stick with the fallback for a while until it is removed
(introduce it in 5.x, then drop it in 7.x).

Alternatively, only activate case sensitivity if E_STRICT is set. If a
developer is creating E_STRICT code, I doubt that they are sloppy with
their case.

E_STRICT is documented as "Enable to have PHP suggest changes to your
code which will ensure the best interoperability and forward
compatibility of your code", which seems a perfect fit.
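
A rough C sketch of the fallback idea above, under stated assumptions: the function table, lookup_function, allow_fallback and notices_emitted are all hypothetical names, a linear scan stands in for the engine's hash table, and a counter plus a line on stderr stands in for E_NOTICE/E_STRICT. The lookup tries an exact, case-sensitive match first and only then falls back to an ASCII case-insensitive match, reporting the mismatch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical function table; a real engine would use a hash table. */
static const char *const known_functions[] = { "imagestring", "getSQLLogger" };
enum { KNOWN_COUNT = sizeof known_functions / sizeof known_functions[0] };

/* Counts notices raised so far (stand-in for E_NOTICE / E_STRICT). */
int notices_emitted = 0;

/* ASCII-only case-insensitive equality, independent of the locale. */
static int ascii_equal_nocase(const char *a, const char *b)
{
    for (; *a && *b; a++, b++) {
        unsigned char x = (unsigned char)*a, y = (unsigned char)*b;
        if (x >= 'A' && x <= 'Z') x = (unsigned char)(x + ('a' - 'A'));
        if (y >= 'A' && y <= 'Z') y = (unsigned char)(y + ('a' - 'A'));
        if (x != y) return 0;
    }
    return *a == *b; /* equal only if both strings ended together */
}

/* Returns the declared name, or NULL if nothing matches.
 * allow_fallback models the proposed compatibility switch. */
const char *lookup_function(const char *name, int allow_fallback)
{
    assert(name != NULL);

    /* Step 1: exact, case-sensitive match. */
    for (int i = 0; i < KNOWN_COUNT; i++) {
        if (strcmp(known_functions[i], name) == 0) return known_functions[i];
    }
    if (!allow_fallback) return NULL;

    /* Step 2: legacy case-insensitive match, reported to the developer. */
    for (int i = 0; i < KNOWN_COUNT; i++) {
        if (ascii_equal_nocase(known_functions[i], name)) {
            notices_emitted++;
            fprintf(stderr, "Notice: %s() called as %s()\n",
                    known_functions[i], name);
            return known_functions[i];
        }
    }
    return NULL;
}
```

Calling lookup_function("ImageString", 1) would still resolve to imagestring but raise a notice, while lookup_function("ImageString", 0) models the end state after the fallback is dropped.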

--

15 years ago by Pierre Joye

1. Apply Tomas's patch to make case-insensitive lookups
   locale-ignorant. Pros: fixes immediate problem. Cons: breaks BC for
   case-insensitive function/method name lookups for high-bit characters
   in single-byte encodings. (Not that we've ever advertised or
   documented that.)

2. Make function/method names case-sensitive, per Stan's e-mail. Pros:
   fixes problem; brings PHP into line with most other languages; extra
   consistency with variables; possible performance improvement. Cons: BC
   break from current documented behaviour.

3. Do nothing. Pros: no BC breaks of any kind. Cons: continues to
   annoy Turkish and Azeri developers and those developing for those
   locales.

I don't think that option 2 and 3 can be done in 5.x. However I'm +1
for option 2 in PHP 6 (whenever it comes).

Cheers,

Pierre

@pierrejoye | http://blog.thepimp.net | http://www.libgd.org

15 years ago by Pierre Joye

I don't think that option 2 and 3 can be done in 5.x. However I'm +1
for option 2 in PHP 6 (whenever it comes).

I meant option 1 and 2.

--
Pierre

@pierrejoye | http://blog.thepimp.net | http://www.libgd.org

15 years ago by Steven Van Poeck

Pierre Joye wrote:

1. Apply Tomas's patch to make case-insensitive lookups
   locale-ignorant. Pros: fixes immediate problem. Cons: breaks BC for
   case-insensitive function/method name lookups for high-bit characters
   in single-byte encodings. (Not that we've ever advertised or
   documented that.)

2. Make function/method names case-sensitive, per Stan's e-mail. Pros:
   fixes problem; brings PHP into line with most other languages; extra
   consistency with variables; possible performance improvement. Cons: BC
   break from current documented behaviour.

3. Do nothing. Pros: no BC breaks of any kind. Cons: continues to
   annoy Turkish and Azeri developers and those developing for those
   locales.

I don't think that option 2 and 3 can be done in 5.x. However I'm +1
for option 2 in PHP 6 (whenever it comes).

Cheers,

As Lukas clearly stated, case sensitivity is not an option for PHP 5.4, so
I was voting for option 2 in PHP 6 anyway :)

Steven

15 years ago by Jan Schneider

Quoting Adam Harvey aharvey@php.net:

Well, I'm going to assume that people have had whatever say they were
going to. It seems that we have three options, so let's put it to a
vote.

+1 for option 1.

Unless we can have some aliases to fix the problem with some PHP
functions being documented in mixed case, like the GD functions
mentioned earlier in the thread. In that case: +1 for 2.

Jan.

--
Do you need professional PHP or Horde consulting?
http://horde.org/consulting/

15 years ago by Joel Perras

+1 for option #2.

Joël.

--
I do know everything, just not all at once. It's a virtual memory problem.

15 years ago by Etienne Kneuss

Hi,

A definite -1 for #2: it's a massive BC break with no justification
so far, IMHO.

The optimization point is quite moot, tolower could be restricted to
compilation + dynamic accesses, which would remove most of them
already.

OTOH option #1 seems like the most sensible approach, breaking only in
very limited cases, so +1 from me.

Best,

--
Etienne Kneuss
http://www.colder.ch

--

It's not a probability, it's a certainty. There's tons of code out there
(I'm sure including 99% of all popular apps and frameworks) that uses
different cases somehow, somewhere.
If there was no BC issue, I'd definitely be for case-sensitive names,
but with BC it may become kind of tricky...

Mike

Or am I missing your point?
