Continuous Integration Atomic Deploys and PHP 5.5

12 years ago by Rasmus Lerdorf

One of the things I have been helping companies with for the past couple
of years is sorting through the complexities of deploying PHP code with
the least possible interruption to the running site.

With APC you can achieve atomic deploys without a server restart and
without clearing the opcode cache through careful use of the
realpath/stat cache and a clearstatcache() call in the front-controller.
The logic behind it is a little complicated, but it goes something like
this:

1. Request 1 starts before the deploy and loads scripts A and B.
2. The deploy goes to a separate directory, and the docroot symlink is
   switched to point at it.
3. Request 2 starts and loads A, B and C.
4. Request 1 was a bit slow and only now gets to load C.

This is the scenario that trips up most deploy systems: request 1 would
load a version of C that doesn't match the A and B it has already
loaded, so the deploy is not atomic even though all the files were
deployed atomically.

With the realpath/stat cache and APC's use of inodes as cache keys,
request 1 will get the inode from the previous version of C, so it will
not be out of sync with the previously loaded A and B. In request 2 we
put a clearstatcache() call in the front-controller, usually triggered
by comparing the version baked into the front-controller with a version
number written to shared memory. By detecting at the start of a request
that a more recent version of the code is available, we can make sure
that all new requests see the new code, while requests that were
executing when the deploy happened continue to use the previous version
until they are done.
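
The front-controller check described above might look something like
the following. This is only a minimal sketch under stated assumptions,
not code from any real deployment: BUILD_VERSION,
refresh_stat_cache_if_stale() and the callable that returns the
deployed version are hypothetical names, and how the version number
gets into shared memory is left to the caller. It assumes PHP 7 or
later for the type declarations.

```php
<?php
// Hypothetical: the deploy tool bakes this number into the front-controller.
const BUILD_VERSION = 41;

/**
 * Clear the stat and realpath caches when a newer release has been deployed.
 *
 * $fetchDeployedVersion is any callable returning the latest deployed
 * version number (or null if unknown), e.g. a read from shared memory.
 * Returns true if the caches were cleared.
 */
function refresh_stat_cache_if_stale(int $bakedVersion, callable $fetchDeployedVersion): bool
{
    $deployed = $fetchDeployedVersion();
    if ($deployed !== null && $deployed > $bakedVersion) {
        // Passing true also clears the realpath cache, so the docroot
        // symlink is resolved again from here on.
        clearstatcache(true);
        return true;
    }
    return false;
}

// Stand-in for the shared-memory read (an assumption for this demo).
$fetch = function () {
    return 42;
};
$cleared = refresh_stat_cache_if_stale(BUILD_VERSION, $fetch);
```

Requests that started before the deploy never reach this check again,
so they keep resolving paths through the entries they already cached.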

Now, with PHP 5.5 and the new OPcache things are a bit different.
OPcache is not inode-based so we can't use the same trick. Since we are
focusing on a single cache implementation I think we should document a
preferred approach to this common scenario. I see a few approaches:

1. Turn off validate_timestamps and always do a graceful server restart
   on a deploy.

   Pro: effective.

   Con: slow and annoying when you deploy a lot, especially for
   companies that do a lot of A/B testing and feature-based development
   with potentially hundreds of small code and config deploys to ramp
   features up/down throughout the day. Being able to invalidate a
   single cache entry might mean you could avoid doing the full restart
   on a simple config-file deploy, but currently opcache can't do
   that(*).

2. Do something interesting with revalidate_freq. If we always knew that
   the file stat happened at :00 of the minute and we deployed at :01,
   then perhaps we could get away with not doing anything else.

   Pro: no server restarts and no cache clears.

   Con: scripts that take longer than 59 seconds to complete would be a
   problem, and the code currently can't guarantee timestamp checks at
   regular intervals like this.

3. Add some magic to OPcache that gives it the concept of a server
   request, almost like a DB transaction. Currently on a cache reset,
   OPcache lets currently executing entries complete, but this is on a
   per-entry basis. A web request is made up of many of these entries,
   so unless they are somehow bracketed it doesn't help us. Something
   like opcache_request_begin()/opcache_request_done() might work.

   Pro: no server restarts and no cache clears.

   Con: this might get way too complex, especially since userspace may
   never call opcache_request_done(), which means we would need some
   sort of timeout mechanism as well.
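
To make the timing constraint in the revalidate_freq idea concrete,
here is a small sketch. It assumes an idealized OPcache whose timestamp
checks land exactly on fixed window boundaries, which, as noted above,
the current code does not guarantee. The function name and the
epoch-aligned windows are assumptions made purely for illustration.

```php
<?php
/**
 * True if a request crosses a revalidation boundary, assuming
 * (hypothetically) that timestamp checks happen exactly every
 * $windowSec seconds, aligned to the Unix epoch.
 */
function crosses_revalidation_boundary(int $startTs, int $durationSec, int $windowSec = 60): bool
{
    return intdiv($startTs, $windowSec) !== intdiv($startTs + $durationSec, $windowSec);
}

$short = crosses_revalidation_boundary(121, 30); // starts at :01, ends at :31
$late  = crosses_revalidation_boundary(170, 15); // starts at :50, ends after :00
$long  = crosses_revalidation_boundary(120, 61); // runs longer than one window
```

Any request that crosses a boundary could see files from both releases,
so even under this idealized model a short request that starts late in
the minute is at risk, not just the long-running ones.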

(*) for single-file deploys, such as a config-change to ramp a feature
up or down you could blacklist the config file and use apcu/yac or some
other user cache mechanism to speed things up.
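
The workaround in the footnote could be sketched roughly as below. It
assumes the config file is a small PHP file that returns an array and
has been put on the OPcache blacklist, and it uses APCu only if the
extension is loaded and enabled. load_feature_flags(), the cache key
and the 5-second TTL are hypothetical choices, not anything prescribed
here.

```php
<?php
/**
 * Load feature flags from a PHP config file that returns an array.
 * Hypothetical helper: the file is assumed to be blacklisted in OPcache,
 * so a user cache (APCu, when available) absorbs the repeated reads.
 */
function load_feature_flags(string $configFile, int $ttlSeconds = 5): array
{
    $key = 'feature_flags:' . $configFile;
    $haveApcu = function_exists('apcu_enabled') && apcu_enabled();

    if ($haveApcu) {
        $cached = apcu_fetch($key, $hit);
        if ($hit && is_array($cached)) {
            return $cached;
        }
    }

    $flags = include $configFile; // re-read from disk on a cache miss
    if (!is_array($flags)) {
        throw new RuntimeException("$configFile did not return an array");
    }
    if ($haveApcu) {
        apcu_store($key, $flags, $ttlSeconds);
    }
    return $flags;
}

// Demo with a throwaway config file.
$file = sys_get_temp_dir() . '/flags_' . uniqid() . '.php';
file_put_contents($file, "<?php return ['new_checkout' => 25];\n");
$flags = load_feature_flags($file);
unlink($file);
```

With a short TTL, a config-only deploy becomes visible within a few
seconds without touching the opcode cache at all.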

None of these approaches sounds ideal to me, and that includes the
existing inode-caching APC approach. They are all too brittle and
complicated. Any other ideas?

-Rasmus

12 years ago by Ferenc Kovacs

Realpath the document root (which is a symlink to the actual release
directory) from your index.php/bootstrap file and use that as a base path
for making absolute paths everywhere?
That way the requests started before the symlink switch will continue with
the old version, but requests started after the switch will use the files
from the new revision.
Of course, you can still have issues like an ajax request from the old
version getting served by the new version, and if you have more than one
server, sooner or later you will have to sacrifice something from the CAP
trio.
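
A minimal sketch of that idea, for illustration only: the directory
layout, the pin_release_root() helper and the temp-dir demo are all
made up, and the clearstatcache(true) call is an added assumption so
that a long-lived PHP process does not reuse a cached resolution of the
symlink. It assumes a POSIX filesystem and PHP 7 or later.

```php
<?php
/**
 * Resolve the docroot symlink once, at the start of a request.
 * Hypothetical helper; every path the request builds afterwards should be
 * based on the returned directory rather than on the symlink itself.
 */
function pin_release_root(string $docrootLink): string
{
    clearstatcache(true); // forget any cached resolution of the symlink
    $root = realpath($docrootLink);
    if ($root === false) {
        throw new RuntimeException("cannot resolve $docrootLink");
    }
    return $root;
}

// Demo: two releases and a "current" symlink in a temp directory.
$tmp = sys_get_temp_dir() . '/atomic_' . uniqid();
mkdir("$tmp/releases/v1", 0777, true);
mkdir("$tmp/releases/v2", 0777, true);
file_put_contents("$tmp/releases/v1/c.txt", 'C from v1');
file_put_contents("$tmp/releases/v2/c.txt", 'C from v2');
symlink("$tmp/releases/v1", "$tmp/current");

$request1Root = pin_release_root("$tmp/current"); // request 1 starts

// Deploy: create the new symlink under a temp name, then rename it over
// the old one so the switch is a single step.
symlink("$tmp/releases/v2", "$tmp/current.new");
rename("$tmp/current.new", "$tmp/current");

$request2Root = pin_release_root("$tmp/current"); // request 2 starts

// Request 1 loads C late, but still from the release it started on.
$seenByRequest1 = file_get_contents("$request1Root/c.txt");
$seenByRequest2 = file_get_contents("$request2Root/c.txt");
```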

--
Ferenc Kovács
@Tyr43l - http://tyrael.hu

12 years ago by Rasmus Lerdorf

Well, solving the multi-request/multi-server ajax scenario is a bit of a
different problem. You'd need to version those requests to handle that.
The scope I am concerned with here is per-server deploy atomicity.

But yes, some way to have a 2-docroot scenario, where all requests
started on one via the docroot symlink stay on that one, would be a good
approach, but it would take a lot of discipline at the userspace level
to enforce that across a large and diverse codebase with autoloaders and
actual realpath calls all over the place.
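
For the autoloader part, the kind of discipline meant here might look
like the sketch below: the autoloader is bound to a release directory
resolved once at request start, instead of going through the symlink on
every load. register_pinned_autoloader(), the src/ layout and the
DemoGreeter class are hypothetical, and the demo assumes a POSIX
filesystem and PHP 7.1 or later.

```php
<?php
/**
 * Register an autoloader bound to one resolved release directory.
 * Hypothetical helper; the src/ layout and the class-to-file mapping
 * are assumptions made for this demo.
 */
function register_pinned_autoloader(string $releaseRoot): void
{
    spl_autoload_register(function (string $class) use ($releaseRoot): void {
        $file = $releaseRoot . '/src/' . str_replace('\\', '/', $class) . '.php';
        if (is_file($file)) {
            require $file;
        }
    });
}

// Demo: two releases, each shipping its own version of the same class.
$tmp = sys_get_temp_dir() . '/pinned_' . uniqid();
foreach (['v1', 'v2'] as $v) {
    mkdir("$tmp/releases/$v/src", 0777, true);
    file_put_contents(
        "$tmp/releases/$v/src/DemoGreeter.php",
        "<?php class DemoGreeter { const RELEASE = '$v'; }\n"
    );
}
symlink("$tmp/releases/v1", "$tmp/current");

// Request start: resolve the symlink once and pin the autoloader to it.
register_pinned_autoloader(realpath("$tmp/current"));

// A deploy switches the symlink while the request is still running.
symlink("$tmp/releases/v2", "$tmp/current.new");
rename("$tmp/current.new", "$tmp/current");

// The class is autoloaded only now, yet it comes from the pinned release.
$loadedFrom = DemoGreeter::RELEASE;
```

Every other include, require and realpath call in the codebase would
need the same treatment, which is where the discipline problem comes in.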

-Rasmus

12 years ago by David Muir

Are you saying that to allow atomic deploys with O+ you need to make
sure that all files are either autoloaded with a full realpath, or
manually included/required by the realpath?

You mentioned that O+ does not use inodes as cache keys like APC, but
what does it use instead? Just the file path?

Cheers,
David

12 years ago by Terry Ellison

Rasmus,

The issues that you raise about introducing atomic versioning in the
script namespace do need to be addressed to avoid material service
disruption during an application version upgrade. However, surely
another facet of the O+ architecture also frustrates this deployment
model.

My reading is that O+ processes each new (cache-miss) compile request
by first sizing the memory requirements for the compiled source and
then allocating a single brick from (one of) the SMA at its high water
mark. Stale cache entries are marked as corrupt and their storage is
then allocated to wasted_shared_memory with no attempt to reuse it.
SMA exhaustion or the % wastage exceeding a threshold ultimately
triggers a process shutdown cascade. This strategy is lean and fast,
but as far as I understand it, it ultimately uses a process death
cascade and population rebirth to implement garbage collection.

Wouldn't your non-stop models require a more stable reuse architecture
which recycles wasted memory stably without the death cascade? Perhaps
one of the Zend team could correct my inference if I've got it wrong
again :-(
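
As a rough illustration of the "% wastage" figure, the sketch below
computes it from the memory_usage element that opcache_get_status()
returns. The sample numbers and the 5% alert level are made up, and
wasted_percentage() is a hypothetical helper, not part of OPcache.

```php
<?php
/**
 * Percentage of the OPcache shared memory segment that is marked wasted.
 * $memoryUsage is the 'memory_usage' element of opcache_get_status().
 */
function wasted_percentage(array $memoryUsage): float
{
    $total = $memoryUsage['used_memory']
           + $memoryUsage['free_memory']
           + $memoryUsage['wasted_memory'];
    return $total > 0 ? 100.0 * $memoryUsage['wasted_memory'] / $total : 0.0;
}

// Synthetic numbers for illustration; a live server would pass
// opcache_get_status(false)['memory_usage'] instead.
$sample = [
    'used_memory'   => 60 << 20,
    'free_memory'   => 30 << 20,
    'wasted_memory' => 10 << 20,
];
$wasted = wasted_percentage($sample);
$threshold = 5.0; // hypothetical alert level
$restartLikely = $wasted > $threshold;
```

Watching this number across frequent deploys would show how quickly
stale entries push a server towards the restart described above.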

Regards
Terry