Newsgroups: php.internals
Xref: news.php.net php.internals:47252
Message-ID: <4B9C91D7.2050402@rowe-clan.net>
Date: Sun, 14 Mar 2010 01:35:51 -0600
To: internals@lists.php.net
References: <4B9C9007.1080802@lsces.co.uk>
In-Reply-To: <4B9C9007.1080802@lsces.co.uk>
Subject: Re: [PHP-DEV] Where are we ACTUALLY on Unicode?
From: wrowe@rowe-clan.net ("William A. Rowe Jr.")

If Unicode were the solution, the PHP project was on the right page with 6.0.
Sure, there was work remaining, but...

How long did it take to realize UTF-16 wasn't the end of the story?  UCS-4 is
the minimum needed to solve this, and we all agree that spending 32 bits to store
a single character is unacceptable in the western world, no way, no how.
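To put rough numbers on the storage argument, here is a minimal Python 3 sketch
using only the built-in codecs; the sample strings are arbitrary illustrations,
and the "-le" codec variants are chosen only so that no byte-order mark is counted.

```python
def encoded_sizes(text: str) -> dict:
    """Return how many bytes `text` occupies in UTF-8, UTF-16 and UCS-4/UTF-32."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),      # no BOM counted
        "ucs-4/utf-32": len(text.encode("utf-32-le")),  # no BOM counted
    }

# Plain ASCII, Latin-1 accents, CJK, and one character outside the 16-bit range.
samples = ["Hello, world", "na\u00efve caf\u00e9", "\u65e5\u672c\u8a9e", "\U0001F600"]
for sample in samples:
    print(repr(sample), encoded_sizes(sample))
```

Western text costs four times as much in UCS-4 as in UTF-8, and the last sample
shows UTF-16 needing two 16-bit units (a surrogate pair) for a single character.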

The UTF-8 solution is probably the right answer: you keep 95% of the existing
char * behavior, and you gain international character representation.  The only
Unicode OS I can think of offhand is NT, and of course they ran into the limits of
16-bit characters (the problem UCS-4 exists to solve) early.  They found this out
15+ years ago.
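A small Python sketch of what "keeping char * behavior" means in practice; the
helper c_strlen is hypothetical and only mimics C's strlen() on a bytes buffer.
UTF-8 never places a 0x00 byte (or any other ASCII byte) inside a multibyte
character, while UTF-16 puts NUL bytes inside plain ASCII text.

```python
def c_strlen(buf: bytes) -> int:
    """Hypothetical stand-in for C strlen(): bytes before the first NUL."""
    nul = buf.find(b"\x00")
    return len(buf) if nul == -1 else nul

text = "caf\u00e9/men\u00fc"
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")

print(c_strlen(utf8), len(utf8))    # strlen() sees the whole UTF-8 string
print(c_strlen(utf16), len(utf16))  # strlen() stops at the first 0x00 byte
print(utf8.split(b"/"))             # byte-wise search for '/' still works
```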

Sure, it doesn't appear atomic (one Xword per char), but the existing library
frameworks already contain most of the string processing that is required.  There
is no 16-bit network transmission API that I can think of; you still end up
converting to UTF-8 for client results.

Moving forward by accepting, and preferring, UTF-8 as the representation of
characters throughout PHP, recognizing UTF-8 when computing character lengths,
and so forth would do wonders.  Raw 8-bit octet data can still be kept in the
same data structures.  It is the straightforward answer, which is probably why
Linux did not repeat Windows NT's decision and adopted UTF-8 instead.
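Recognizing UTF-8 for character lengths does not require converting to a wider
form first.  A minimal sketch, assuming valid UTF-8 input (no validation is done);
utf8_char_length is a hypothetical name, and for valid input it should agree with
what PHP's mb_strlen($s, 'UTF-8') reports.

```python
def utf8_char_length(buf: bytes) -> int:
    """Count code points in UTF-8 bytes without decoding them.

    Every code point has exactly one lead byte; continuation bytes all match
    the bit pattern 10xxxxxx, so they are skipped.  Assumes valid UTF-8.
    """
    return sum(1 for b in buf if (b & 0xC0) != 0x80)

sample = "na\u00efve \U0001F600".encode("utf-8")
print(len(sample), utf8_char_length(sample))  # byte length vs. character length
```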