I spent some time reviewing an old idea of Andrea's about packed strings.
The idea is simple: in every place where we use a zend_string*, we may store the characters directly instead of a pointer.
We use the low byte to encode the packed-string marker and the string length, and we also need one byte for the trailing zero. So we can keep up to 2 characters on a 32-bit system and up to 6 characters on a 64-bit system without allocating any additional memory.
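To make the layout concrete, here is a minimal C sketch of one way to pack a short string into a pointer-sized word. The names (packed_str, pack_str, packed_len, packed_char, PACKED_MAX_LEN) and the exact bit layout are hypothetical and need not match the PoC. It assumes real zend_string* pointers are at least 2-byte aligned, so bit 0 = 1 can serve as the packed-string marker, with the length stored in the remaining bits of the low byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout (not necessarily the PoC's):
 *   bit 0 of the low byte  = 1 -> packed string (aligned pointers have bit 0 == 0)
 *   bits 1..7 of low byte  = length
 *   bytes 1..len           = characters
 *   remaining high bytes   = 0; the topmost one is the trailing zero
 */
typedef uintptr_t packed_str;

/* One byte for marker + length, one byte for the trailing zero:
 * 2 characters on 32-bit, 6 characters on 64-bit. */
#define PACKED_MAX_LEN (sizeof(uintptr_t) - 2)

static int is_packed(packed_str w) { return (int)(w & 1u); }

static size_t packed_len(packed_str w) { return (size_t)((w & 0xFFu) >> 1); }

/* Returns 0 (not a packed string) if s does not fit; the caller would then
 * fall back to a regular heap-allocated zend_string. */
static packed_str pack_str(const char *s, size_t len)
{
    packed_str w;
    size_t i;

    if (len > PACKED_MAX_LEN) {
        return 0;
    }
    w = 1u | ((packed_str)len << 1);
    for (i = 0; i < len; i++) {
        /* character i goes into byte i + 1 of the word */
        w |= (packed_str)(unsigned char)s[i] << (8 * (i + 1));
    }
    return w;
}

/* Reads character i back out of the word (portable: uses shifts only). */
static char packed_char(packed_str w, size_t i)
{
    assert(is_packed(w) && i < packed_len(w));
    return (char)((w >> (8 * (i + 1))) & 0xFFu);
}
```

For example, pack_str("ab", 2) fits on both 32-bit and 64-bit systems, while a 7-character string never fits and yields 0, so every consumer must be ready to handle both representations.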
The refreshed (and dirty) PoC implementation: https://github.com/php/php-src/compare/master...dstogov:packedStrings2?expand=1
You may want to look only at the zend_string.h changes (the rest is mostly monkey work).
I was able to run bench.php, but I probably won't go any further.
Unfortunately, I ran into two serious problems:
- The original implementation used the packed strings themselves as their hash values. This led to a huge slowdown because of hash collisions (e.g. in hash1() from bench.php). I switched to recalculating the hash on each use, but this negates the benefit of eliminating the allocation. Perhaps we could use a cheaper hash function for packed strings...
- PHP still uses char* in many places. The characters of a packed string live inside the variable itself, so when we take ZSTR_VAL() of a packed string stored in a local variable (or a function argument), we can very easily end up with a dangling pointer once that variable goes away (e.g. INI directives processed by OnUpdateString, internal function parameters received as char*, ...). Changing all these char* into zend_string* would help, but that looks unrealistic for PHP 7.3.
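One plausible explanation for the collisions in the first problem: PHP's HashTable selects a bucket from the low bits of the hash, and if the low byte of a packed word holds only the marker and the length, then every packed string of the same length lands in the same bucket. The C sketch below counts the buckets used by the short hex keys "0".."ff" (similar in spirit to the short keys hash1() creates). The names pack64, naive_hash, packed_hash and used_buckets are hypothetical, and the mixer is the splitmix64 finalizer, which is our choice for illustration, not something the PoC uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical 64-bit packed layout: low byte = 1 | len << 1, then characters. */
static uint64_t pack64(const char *s, size_t len)
{
    uint64_t w = 1u | ((uint64_t)len << 1);
    size_t i;

    assert(len <= 6);
    for (i = 0; i < len; i++) {
        w |= (uint64_t)(unsigned char)s[i] << (8 * (i + 1));
    }
    return w;
}

/* Naive: the packed word is its own hash, so its low bits are just marker + length. */
static uint64_t naive_hash(uint64_t w) { return w; }

/* Cheap alternative: splitmix64 finalizer, a fixed handful of shifts, xors and two
 * multiplies with no per-character loop. The high bit is forced on to keep the
 * result non-zero, since 0 is often used to mean "hash not computed yet". */
static uint64_t packed_hash(uint64_t w)
{
    w ^= w >> 30;
    w *= 0xbf58476d1ce4e5b9ULL;
    w ^= w >> 27;
    w *= 0x94d049bb133111ebULL;
    w ^= w >> 31;
    return w | 0x8000000000000000ULL;
}

/* Counts how many of nbuckets (a power of two <= 256) buckets receive at least one
 * of the keys "0".."ff" when the bucket index is taken from the low bits of the hash. */
static unsigned used_buckets(uint64_t (*hash_fn)(uint64_t), unsigned nbuckets)
{
    unsigned char used[256] = {0};
    unsigned i, count = 0;
    char key[8];

    assert(nbuckets > 0 && nbuckets <= 256 && (nbuckets & (nbuckets - 1)) == 0);
    for (i = 0; i < 256; i++) {
        int len = snprintf(key, sizeof key, "%x", i);
        uint64_t h = hash_fn(pack64(key, (size_t)len));
        unsigned b = (unsigned)(h & (nbuckets - 1));

        if (!used[b]) {
            used[b] = 1;
            count++;
        }
    }
    return count;
}
```

With 256 buckets, the naive hash puts all 256 keys into exactly 2 buckets (one per key length), while the mixed hash spreads them over many more. This only illustrates the collision mechanism; whether such a mixer is cheap enough to preserve the benefit of avoiding the allocation would still need measuring.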
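The second problem can be illustrated with a small C sketch, assuming a tagged-word layout like the one described above (low byte = marker + length, then the characters, then zero bytes) and a little-endian machine. packed_val() is a hypothetical ZSTR_VAL() analogue that returns a pointer into the variable holding the packed string; on_update_copy() and hypothetical_setting merely stand in for an OnUpdateString-style handler and its char* global, and are not the real PHP API. Keeping that pointer after the variable dies would dangle, so the safe variant copies the bytes out while the variable is still alive.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical packed layout: low byte = 1 | len << 1, then characters, then zeros. */
static uintptr_t pack_str(const char *s, size_t len)
{
    uintptr_t w = 1u | ((uintptr_t)len << 1);
    size_t i;

    assert(len <= sizeof(uintptr_t) - 2);
    for (i = 0; i < len; i++) {
        w |= (uintptr_t)(unsigned char)s[i] << (8 * (i + 1));
    }
    return w;
}

static size_t packed_len(uintptr_t w) { return (size_t)((w & 0xFFu) >> 1); }

/* ZSTR_VAL() analogue: the characters live inside *slot itself, so the returned
 * pointer is only valid while *slot is alive. Assumes little-endian byte order. */
static const char *packed_val(const uintptr_t *slot)
{
    static const uint16_t probe = 1;

    assert(*(const unsigned char *)&probe == 1); /* little-endian check */
    return (const char *)slot + 1;
}

static char *hypothetical_setting; /* stands in for a char* INI global */

/* UNSAFE pattern (shown only as a comment):
 *   void on_update_unsafe(uintptr_t value) {
 *       hypothetical_setting = (char *)packed_val(&value); // dangles after return
 *   }
 * Safe pattern: copy the bytes out while the slot is still alive. */
static int on_update_copy(const uintptr_t *slot)
{
    size_t len = packed_len(*slot);
    char *copy = malloc(len + 1);

    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, packed_val(slot), len + 1); /* includes the trailing zero byte */
    free(hypothetical_setting);
    hypothetical_setting = copy;
    return 0;
}
```

Copying at every such boundary avoids the dangling pointer, but it also brings back the allocation that packed strings were meant to eliminate, which is why converting those char* uses to zend_string* is the cleaner (if much larger) fix.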
So I gave up for now.
I decided to share these results. Maybe someone will come up with related ideas.