Hi guys,
I am working on and off on my own programming language, but knowing myself
well, I know that it is just a hobby project that will never be finished.
I have an idea for PHP7 that may or may not be relevant to you. I do code
in C, but I don't have the time to get deeply involved in PHP, so I will
simply present the idea here. I hope the reply will be "Oh, PHP(x) already
does this!" or "Great, that sounds useful to us!" :-)
The idea is especially relevant to large web hosting companies that run
tens of thousands of PHP-powered websites; it has little relevance to a
small VPS running a few pages.
The basic idea is as follows:
Reduce the global memory load on the PHP opcode cache and the server by
sharing cached modules' opcodes across totally independent sites on the
server.
I know this has to be done very carefully, otherwise all hell will break
loose :-)
My take on this idea is this:
- Compute an SHA-512 checksum for every source file that is entered into
  the opcode cache.
- If the checksum is identical to an existing checksum, the files are
  considered identical and can share a single entry in the opcode cache
  (which needs a reference count or something similar).
- To reduce the overhead of computing the SHA-512 checksum, the checksums
  themselves are cached in another cache.
- This makes for two caches: the checksum cache and the opcode cache
  (called "the module cache" below).
The first cache, the checksum cache, is updated whenever a new PHP source
file needs to be cached.
The second cache, the opcode cache, is ONLY updated if no entry for that
checksum exists in it yet.
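The sharing scheme could be sketched like this (Python used purely as an
illustration; the cache layout, the reference counting, and the stubbed
"compile" step are my own inventions, not PHP internals):

```python
import hashlib

class OpcodeCache:
    """Toy opcode cache: identical sources share one entry via refcounting."""

    def __init__(self):
        self.entries = {}   # sha512 digest -> [compiled, refcount]

    def acquire(self, source: bytes):
        digest = hashlib.sha512(source).hexdigest()
        if digest in self.entries:
            self.entries[digest][1] += 1                    # share the existing entry
        else:
            compiled = f"<opcodes for {len(source)} bytes>"  # stand-in for compilation
            self.entries[digest] = [compiled, 1]
        return self.entries[digest][0]

    def release(self, source: bytes):
        digest = hashlib.sha512(source).hexdigest()
        entry = self.entries[digest]
        entry[1] -= 1
        if entry[1] == 0:
            del self.entries[digest]                        # last user gone, evict

cache = OpcodeCache()
a = cache.acquire(b"<?php echo 'hello'; ?>")   # site 1
b = cache.acquire(b"<?php echo 'hello'; ?>")   # site 2, byte-identical file
assert a is b and len(cache.entries) == 1      # many sites, one cache entry
```

The point of the refcount is that an entry can only be evicted once the
last site referencing it is gone.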
The idea here is that PHPx can cache the opcodes from, say, 200 different
WordPress installations (of the exact same version; the grouping happens
automatically via the checksums) as ONE opcode cache entry. This should
really be a nice gift to all the great web hosting companies out there
that offer PHPx as part of their service: suddenly their machines may have
200-1000 times "as much" opcode memory as before.
I hope this makes you happier, not unhappier! If my idea is crazy or
cannot be done for reasons I don't see, please let me know. I just think
it is a cool idea, and it is a shame that it lies unused on my hard disk
when you might as well make use of it. I use PHP daily, and I very much
look forward to my web hosting company upgrading to PHP7 because of the
default exception error reporting and the enhanced opcode cache.
P.S. The sample code is written in my own language, but should be readable
to virtually anybody.
Cheers,
Archfrog
My private notes (in case they help somebody understand what I mean) are:
A source file contains a single module, which may contain definitions
for any number of symbols. The source file is cached using
its SHA-512 value so that multiple identical source files in the
system are treated as one. This saves a lot of space in shared
hosting environments where the same source files (WordPress, etc.) are
located in thousands of locations on the disk. Instead of
keeping thousands of redundant copies in memory, this system keeps
only a single copy. The location of the source file has no influence
on how it is cached, unlike in traditional systems, and the date-time
value of the file is also ignored, because many hosting environments
only allow FTP access, which does not generally guarantee that file
time stamps are preserved during upload.
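In other words, the module cache is content-addressed: the hash depends
only on the bytes of the file, so the same file at two different paths
(or with two different timestamps) collapses into one key. A quick
Python illustration (the file names and contents are made up):

```python
import hashlib

wp_core = b"<?php /* identical WordPress core file */ ?>"

# The same file uploaded into two customers' directories:
copies = {
    "/home/alice/public_html/wp-settings.php": wp_core,
    "/home/bob/www/wp-settings.php": wp_core,
}

# Hash only the contents -- path and mtime never enter the digest.
digests = {hashlib.sha512(data).hexdigest() for data in copies.values()}
assert len(digests) == 1   # both paths map to the same cache key
```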
To simplify things, the system mandates that every source file fits into
available memory. Very large source files, gigabytes in size, are thus
not supported, which really shouldn't be a problem in real life.
This design makes use of a two-level cache:
1. The first cache is the SHA-512 cache, which maps a (path, time, data)
   triplet to an SHA-512 value in such a way that the data block is not
   read from disk unless path+time is missing from the cache, in which
   case the cache item is recomputed.
2. The second cache is the module cache, which maps an SHA-512 value to
   a ready-to-use module (a sequence of opcodes).
Together, these two caches are expected to eliminate 90+ percent of all
checksum calculations and source file compilations.
When a new source file needs to be processed, the driver performs the
following steps:
    # Create a new 'File' object containing the data (lazily loaded),
    # path, size, and time of the file.
    create file := new Braceless0.Platform.Disk.File(path)

    # Try to retrieve the checksum value from the checksum cache,
    # otherwise compute it and update the checksum cache.
    create checksum is SHA512
    if not checksums.Lookup(file.Path, file.Time, out checksum):
        let checksum := new Checksum(file.Data)
        call checksums.Insert(file.Path, file.Time, checksum)

    # Try to retrieve the compiled wordcode from the scripts cache,
    # otherwise compile it and update the scripts cache.
    create script is Script
    if not scripts.Lookup(checksum, out script):
        let script := new Script(file)
        call scripts.Insert(checksum, script)
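For readers who don't want to parse Braceless, here is roughly the same
driver logic in Python (the dictionaries stand in for my cache types, and
the compile step is a stub; none of this is real PHP engine code):

```python
import hashlib
import os

checksums = {}   # (path, mtime) -> sha512 digest     (first-level cache)
scripts = {}     # sha512 digest -> compiled script   (second-level cache)

def load(path):
    # Step 1: gather the path and time of the file; read data lazily.
    mtime = os.stat(path).st_mtime

    # Step 2: look up the checksum by (path, time); only read and
    # hash the file contents on a miss.
    key = (path, mtime)
    if key not in checksums:
        with open(path, "rb") as f:
            checksums[key] = hashlib.sha512(f.read()).hexdigest()
    digest = checksums[key]

    # Step 3: look up the compiled script by checksum; only compile
    # on a miss, so byte-identical files share one entry.
    if digest not in scripts:
        scripts[digest] = f"<compiled module {digest[:12]}>"  # stub compile
    return scripts[digest]
```

Loading two identical files at different paths produces two checksum-cache
entries but only one module-cache entry, which is exactly the memory
saving described above.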
Morning Mikael,
The memory that opcache uses is anonymously mapped shared memory; it
cannot be shared across distinct processes because it is not backed by
any file. Even if you were to back the memory with a file, the most
efficient synchronization primitives that exist cannot be shared across
distinct processes either.
This is why opcache, apc(u), etc. only work as expected in a prefork
multiprocessing model.
Cheers
Joe
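[Joe's point can be demonstrated directly: an anonymous mapping has no
file behind it, so there is no name an unrelated process could use to
attach to it; only children that inherit it via fork() (the prefork
model) can see it. A minimal, POSIX-only Python sketch of that behavior:]

```python
import mmap
import os

# Anonymous shared mapping: fileno=-1 gives MAP_ANONYMOUS | MAP_SHARED,
# i.e. shared memory with no backing file.
shared = mmap.mmap(-1, 16)
shared[:5] = b"hello"

pid = os.fork()
if pid == 0:
    # The forked child inherited the mapping and sees the parent's write.
    ok = bytes(shared[:5]) == b"hello"
    os._exit(0 if ok else 1)

_, status = os.waitpid(pid, 0)
child_saw_data = (os.WEXITSTATUS(status) == 0)
```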
Ok, thanks for the clarification :-)
I guess an opcode cache server of some sort would be the only way to do
this, listening on either a Unix socket or a TCP/IP socket. Probably not
worth it anyway.
I'm going to leave the list again as I just wanted to share my "brilliant"
idea with you guys :-)
Cheers,
Mikael