Greetings,
I have noticed a lot of recent comments, posts, and even Nikita's recent
PHP Russia video discussing scalar objects, a potential future feature
that I believe already has widespread support, and would have widespread
usage once it arrived.
I think most scalars would be self-explanatory, but spurred on by
discussion on here and other places about string functions, I would like
to debate the string object in particular, and specifically the use of
encoding in combination with such a scalar object.
I see two options available:
Option 1
Every String() scalar object would expose separate methods for byte-safe,
ASCII, and multibyte operations. This may result in something like:
"Hello".substr(1)
"Hello".mbSubstr(1)
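
For comparison, the same split already exists today in the function
pairs substr() / mb_substr(). A minimal sketch of how the two differ on
a UTF-8 string, assuming the mbstring extension is loaded (the variable
names are only for illustration):

```php
<?php
// "h", "é" (two bytes in UTF-8), "l", "l", "o"
$s = "h\u{00E9}llo";

// Byte-oriented: offsets and lengths count bytes.
$byteLen = strlen($s);
$bytes   = substr($s, 1, 2);

// Character-oriented: offsets and lengths count code points.
$charLen = mb_strlen($s, 'UTF-8');
$chars   = mb_substr($s, 1, 2, 'UTF-8');

var_dump($byteLen, $charLen);             // 6 bytes, 5 characters
var_dump($bytes === "\u{00E9}");          // the two bytes of "é"
var_dump($chars === "\u{00E9}l");         // the characters "é" and "l"
```

Under Option 1 the caller would still have to pick the right variant at
every call site, just as they do now.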
Option 2
Allow the string to be bound to a specific encoding, which would require
_zend_string to be extended with a pointer to a structure containing
encoding helpers. All of the php-src macros would need updating to take
these into account.
The scalar object methods would then use that to detect which
implementation to use.
"Hello".substr(1) // would work as expected regardless of encoding
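
To make the idea concrete, here is a rough userland approximation of
Option 2. It is only a sketch: EncodedString is a hypothetical class,
not a proposed API, the real thing would live inside zend_string, and
the code assumes PHP 8.0+ with mbstring loaded.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a string that carries its own encoding.
// A null encoding means "plain byte string".
final class EncodedString
{
    public function __construct(
        private string $bytes,
        private ?string $encoding = null
    ) {}

    // One method name; the implementation is chosen from the encoding.
    public function substr(int $start, ?int $length = null): self
    {
        $result = $this->encoding === null
            ? substr($this->bytes, $start, $length)
            : mb_substr($this->bytes, $start, $length, $this->encoding);

        return new self($result, $this->encoding);
    }

    public function length(): int
    {
        return $this->encoding === null
            ? strlen($this->bytes)
            : mb_strlen($this->bytes, $this->encoding);
    }

    public function bytes(): string
    {
        return $this->bytes;
    }
}

$raw  = new EncodedString("h\u{00E9}llo");          // byte semantics
$utf8 = new EncodedString("h\u{00E9}llo", 'UTF-8'); // character semantics

var_dump($raw->length(), $utf8->length());
var_dump($utf8->substr(2)->bytes());
```

The calling code is identical in both cases; only the way the string
was created decides which implementation runs.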
My question to everyone is, what mechanism would be used to mark a
string as being of a specific encoding? Naturally a .toUTF8() would be
possible, but I'm not sure that would be as tidy as it could be.
"Hello".toUTF8()
$_GET['example'].toUTF8()
In certain languages, such as C and C++, a string literal can be prefixed
with L to make it a wide-character string (16-bit characters on Windows),
which is particularly useful for Windows API calls. Perhaps that could be
the way to go for literal (interned) strings in the code itself?
L"Hello"
L$_GET['example']
But most strings we use will be coming from an external source, such as
user input or a database. What would be the cleanest way to mark them as
having a specific encoding?
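
For reference, the closest thing available today is to validate and
convert at the boundary with mbstring. A minimal sketch, assuming
mbstring is loaded; toUtf8() is a hypothetical helper name, and the
source encoding has to be known by the caller since it cannot be
detected reliably:

```php
<?php
declare(strict_types=1);

// Hypothetical boundary helper: check that $input really is valid in
// the claimed encoding, then normalise it to UTF-8.
function toUtf8(string $input, string $from = 'UTF-8'): string
{
    if (!mb_check_encoding($input, $from)) {
        throw new InvalidArgumentException("Input is not valid $from");
    }

    return $from === 'UTF-8'
        ? $input
        : mb_convert_encoding($input, 'UTF-8', $from);
}

// "café" with "é" as a single ISO-8859-1 byte, e.g. from a legacy database.
$fromLatin1 = toUtf8("caf\xE9", 'ISO-8859-1');

// The same bytes claimed to be UTF-8 are rejected.
$rejected = false;
try {
    toUtf8("caf\xE9");
} catch (InvalidArgumentException $e) {
    $rejected = true;
}

var_dump($fromLatin1 === "caf\u{00E9}", $rejected);
```

The result is still an ordinary string, so nothing downstream knows it
has been checked, which is exactly the gap I am asking about.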
Going a bit more out there, would this bring about the need to add an
encoding-specific native type, at least for the de facto encoding of
the web?
function x(utf8_string $x): utf8_string { ... }
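
In userland this can only be approximated with a wrapper class. The
sketch below assumes PHP 8.0+ with mbstring; Utf8String is a
hypothetical name, and the body of x() is just a placeholder operation:

```php
<?php
declare(strict_types=1);

// Hypothetical value object: an instance can only exist if the bytes
// were valid UTF-8 when it was created.
final class Utf8String
{
    private function __construct(private string $bytes) {}

    public static function fromBytes(string $bytes): self
    {
        if (!mb_check_encoding($bytes, 'UTF-8')) {
            throw new InvalidArgumentException('Not valid UTF-8');
        }
        return new self($bytes);
    }

    public function __toString(): string
    {
        return $this->bytes;
    }
}

// The signature now carries the encoding guarantee.
function x(Utf8String $x): Utf8String
{
    return Utf8String::fromBytes(mb_strtoupper((string) $x, 'UTF-8'));
}

$result = x(Utf8String::fromBytes("h\u{00E9}llo"));
var_dump((string) $result);
```

A native type would avoid the wrapping and unwrapping, which is the
main cost of doing it this way.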
These are all just questions I have no answer to or firm opinion on, but
I would be interested to know people's general ideas as to solutions.
--
Mark Randall