Hi,
There has been a bit of inevitable FUD about phar. Although the manual
(http://php.net/phar) describes a fair amount of how phar works, the
design decisions behind it are not documented.
Originally, the phar stream wrapper was a userspace thing. Davey Shafik
designed it to take advantage of a neat loophole in the design of the
tar file format so that a valid tar could be run by PHP without needing
to have the phar stream wrapper loaded. This was great until I started
using it to run the PEAR Installer. The performance hit was tremendous,
as every newly included file required scanning the archive, header by
header, until we found the needed file. In the worst case, that meant
reading megabytes of data just to locate one file. The zip file format
has the same limitation - the entire archive needs to be scanned.
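
As a rough illustration of the cost described above, here is a minimal
sketch of a header-by-header tar lookup. The helper names tarEntry() and
tarFind() are hypothetical, and the archive is built in memory with only
the name and size header fields filled in (no checksum), so this is not
the original userspace implementation, only the same access pattern.

```php
<?php
// Hypothetical helper: one 512-byte tar header block plus the file data,
// padded to a multiple of 512 bytes. Only name and size are filled in.
function tarEntry(string $name, string $data): string
{
    $header = str_pad($name, 100, "\0")                 // bytes 0-99: file name
            . str_repeat("\0", 24)                      // bytes 100-123: mode, uid, gid
            . sprintf('%011o', strlen($data)) . "\0";   // bytes 124-135: size in octal
    $header = str_pad($header, 512, "\0");
    $padded = (int) (ceil(strlen($data) / 512) * 512);
    return $header . str_pad($data, $padded, "\0");
}

// Hypothetical lookup: walk the headers in order until the name matches.
// $headersRead counts how many headers had to be inspected.
function tarFind(string $archive, string $wanted, int &$headersRead = 0): ?string
{
    $pos = 0;
    $len = strlen($archive);
    while ($pos + 512 <= $len) {
        $name = rtrim(substr($archive, $pos, 100), "\0");
        if ($name === '') {
            break;                                      // zero block: end of archive
        }
        $headersRead++;
        $size = (int) octdec(rtrim(substr($archive, $pos + 124, 12), "\0 "));
        $pos += 512;                                    // skip the header block
        if ($name === $wanted) {
            return substr($archive, $pos, $size);
        }
        $pos += (int) (ceil($size / 512) * 512);        // skip the padded data
    }
    return null;
}

$archive = tarEntry('a.php', 'first')
         . tarEntry('b.php', 'second')
         . tarEntry('c.php', 'third')
         . str_repeat("\0", 1024);                      // end-of-archive marker

$headersRead = 0;
$contents = tarFind($archive, 'c.php', $headersRead);
```

Locating the last file inspects every header before it, and that work is
repeated for each include.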
Neither of these formats was designed for random access in the way a
traditional filesystem is. In fact, I could not find an example of an
archive format that is designed for this.
As such, borrowing from the design of disk filesystems, I created a new
format that is very small and processes very quickly. It is fast enough
that I can't detect a difference in performance between running the PEAR
installer off the disk and running it out of a phar. I am sure there is
a difference that ApacheBench would detect because of the extra time
needed to load the file manifest.
The way phar works now is that a file manifest sits at the start of the
phar archive (similar to a directory file in traditional filesystems).
Each file has a manifest entry containing the file name, the size of the
file, and its offset into the archive, plus some flags and optional
metadata. The manifest is currently limited to 1 MB in size, so some
applications (those with a very large number of files) probably could
not be packaged as a phar under the current design.
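
The manifest idea can be sketched with a toy format. This is NOT the
real phar binary layout: the 4-byte length prefix, the JSON encoding,
and the helper names toyPack(), toyOpen() and toyRead() are all
assumptions made for illustration. The point is only that the manifest
is read once, after which any file is reached by offset and size.

```php
<?php
// Toy layout (an assumption, not the phar format):
//   [4-byte little-endian manifest length][JSON manifest][file data...]
function toyPack(array $files): string
{
    $manifest = [];
    $data = '';
    foreach ($files as $name => $contents) {
        $manifest[$name] = [
            'offset' => strlen($data),      // position within the data area
            'size'   => strlen($contents),
            'flags'  => 0,                  // placeholder for per-file flags
        ];
        $data .= $contents;
    }
    $json = json_encode($manifest);
    return pack('V', strlen($json)) . $json . $data;
}

// Read the manifest once; everything after it is the data area.
function toyOpen(string $archive): array
{
    $length = unpack('V', substr($archive, 0, 4))[1];
    return [
        'manifest'  => json_decode(substr($archive, 4, $length), true),
        'dataStart' => 4 + $length,
    ];
}

// Random access: one manifest lookup, then a direct read by offset and size.
function toyRead(string $archive, array $handle, string $name): ?string
{
    if (!isset($handle['manifest'][$name])) {
        return null;
    }
    $entry = $handle['manifest'][$name];
    return substr($archive, $handle['dataStart'] + $entry['offset'], $entry['size']);
}

$archive  = toyPack(['a.php' => 'first', 'b.php' => 'second']);
$handle   = toyOpen($archive);
$contents = toyRead($archive, $handle, 'b.php');
```

Unlike a header-by-header scan, the cost of a lookup here does not depend
on where the file sits in the archive.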
Each phar has a loader stub, which can be any PHP code, but must contain
the __HALT_COMPILER(); token. This allows phars that also bundle
PHP_Archive to work even where the phar extension is disabled. It is the
loader stub that makes it possible to run a phar with plain vanilla PHP.
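
A small sketch of why the __HALT_COMPILER(); token matters: PHP stops
compiling at that point, so arbitrary bytes can follow it in the same
file, and the code before it can find them through the
__COMPILER_HALT_OFFSET__ constant. The stub below is hypothetical and
far simpler than a real loader; it just returns whatever follows the
token. The temporary file and the payload string are assumptions for the
demo.

```php
<?php
// Hypothetical stub, kept in a nowdoc so this sketch can write it to a file.
$stub = <<<'STUB'
<?php
// Any PHP code may go here; a real stub would set up archive loading.
$fp = fopen(__FILE__, 'rb');
fseek($fp, __COMPILER_HALT_OFFSET__);   // position just past the halt token
$rest = stream_get_contents($fp);
fclose($fp);
return $rest;
__HALT_COMPILER();
STUB;

// Append bytes that are not valid PHP; the compiler never looks at them.
$path = tempnam(sys_get_temp_dir(), 'stub');
file_put_contents($path, $stub . "\x00\x01 raw archive bytes, never parsed as PHP");

$rest = include $path;                  // runs the stub, which returns the trailing bytes
unlink($path);
```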
I see two possible solutions to the concerns raised by others:
1. don't worry, be happy
2. re-design the phar file format such that it is a tar again, and put
   the manifest for quick loading in one of the first files of the tar
   archive.
If I had thought #2 was a good idea, I would have already done it, so
there is my opinion.
One basic assumption I would like to raise here is that nobody is going
to download a .phar archive who does not already have PHP. Does this
assumption sound sane?
If so, I would like to provide some simple scripts for unpharring and
repharring a .phar archive. This is not hard to do with a 5-line PHP
script.
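
For a sense of scale, such scripts might look like the sketch below. It
assumes the Phar class as bundled with later PHP versions
(Phar::extractTo() and Phar::buildFromDirectory()); whether that is the
API the scripts mentioned here would have used is an assumption, and the
function names unphar() and rephar() are hypothetical.

```php
<?php
// Hypothetical helper: unpack every file in a phar into a directory.
function unphar(string $pharFile, string $targetDir): void
{
    $phar = new Phar($pharFile);
    $phar->extractTo($targetDir, null, true);   // null: all files, true: overwrite
}

// Hypothetical helper: pack a directory tree into a phar.
// Creating or modifying a phar requires phar.readonly=0 in php.ini.
function rephar(string $sourceDir, string $pharFile): void
{
    $phar = new Phar($pharFile);
    $phar->buildFromDirectory($sourceDir);
}
```

Usage would be along the lines of unphar('my.phar', 'my-src') followed,
after editing, by rephar('my-src', 'my.phar').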
One of my big questions, though, is for xdebug (hi Derick) and designers
of IDEs: it would be good to ensure that it is possible to step through
a phar, or even to dump the source line with an error message. These, to
me, seem to be the most pressing disadvantages of phar currently - it
becomes much harder to debug a problem in a PHP script when it is
stuffed into a phar.
Thanks,
Greg
The stub could also easily include code to allow for an extraction flag
to work. So you could run php my.phar --extract and have the code dumped
to the FS as it originally was.
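
A sketch of how a stub might detect such a flag. The helper name
extractTarget() is hypothetical, and the optional directory argument
after --extract is an assumption that goes beyond what is proposed
above. The commented lines show where a stub could hand off to the Phar
class, assuming its later extractTo() method.

```php
<?php
// Hypothetical helper: returns the directory to extract into when
// --extract is present on the command line, or null when it is absent.
function extractTarget(array $argv, string $default = '.'): ?string
{
    $i = array_search('--extract', $argv, true);
    if ($i === false) {
        return null;
    }
    $next = $argv[$i + 1] ?? null;
    if ($next === null || $next === '' || $next[0] === '-') {
        return $default;                // no directory given: use the default
    }
    return $next;
}

// Inside a stub, before normal start-up (sketch only):
// $dir = extractTarget($_SERVER['argv'] ?? []);
// if ($dir !== null) {
//     (new Phar(__FILE__))->extractTo($dir);
//     exit(0);
// }
```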
The choice to add these things (the stub and the extract flag) is just
that, a choice, the same as choosing short tags or relying on
magic_quotes*. Of course, sane defaults when creating phars are
something we can decide on as things move on. I would vote for having
both PHP_Archive and the --extract flag code inserted by default; this
would solve the issues, IMO.
- Davey
Gregory Beaver wrote:
[snip]
megabytes of information just to locate a file. The zip file format has
the same limitation - the entire archive needs to be scanned.
Both of these formats were not designed for random access in the way a
traditional filesystem is designed. In fact, I could not find an
example of a archive format that is designed for this.
Josh Eichorn challenged this assertion on IRC, so I took another look at
the zip file format. It turns out I was wrong: the format has something
called a "central directory" at the end of the file, which is the
equivalent of the manifest that phar uses. I apologize for the
misrepresentation.
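
To make the central directory concrete, here is a minimal sketch that
finds the end-of-central-directory record and lists each entry with the
offset of its local header, without scanning the file data. The helper
names zipBuild() and zipDirectory() are hypothetical; zipBuild() only
exists so the example runs without the zip extension, and it writes
stored (uncompressed) entries with zeroed timestamps.

```php
<?php
// Hypothetical helper: build a minimal zip (stored, no compression) in memory.
function zipBuild(array $files): string
{
    $local = '';
    $central = '';
    $count = 0;
    foreach ($files as $name => $data) {
        $name = (string) $name;
        // 26 bytes shared by the local and central headers: version, flags,
        // method, time, date, crc32, compressed size, size, name len, extra len.
        $common = pack('v5V3v2', 20, 0, 0, 0, 0, crc32($data),
                       strlen($data), strlen($data), strlen($name), 0);
        $offset = strlen($local);
        $local   .= "PK\x03\x04" . $common . $name . $data;
        $central .= "PK\x01\x02" . pack('v', 20) . $common
                  . pack('v3V2', 0, 0, 0, 0, $offset) . $name;
        $count++;
    }
    $end = "PK\x05\x06" . pack('v4V2v', 0, 0, $count, $count,
                               strlen($central), strlen($local), 0);
    return $local . $central . $end;
}

// Map each file name to the offset of its local header, using only the
// end record and the central directory.
function zipDirectory(string $zip): array
{
    // Simplification: a real reader must cope with this signature
    // also appearing inside the trailing archive comment.
    $end = strrpos($zip, "PK\x05\x06");
    if ($end === false) {
        throw new RuntimeException('end of central directory record not found');
    }
    $eocd = unpack('vdisk/vcdDisk/vhere/vtotal/VcdSize/VcdOffset',
                   substr($zip, $end + 4, 16));
    $pos = $eocd['cdOffset'];
    $entries = [];
    for ($i = 0; $i < $eocd['total']; $i++) {
        $lens   = unpack('vname/vextra/vcomment', substr($zip, $pos + 28, 6));
        $offset = unpack('V', substr($zip, $pos + 42, 4))[1];
        $entries[substr($zip, $pos + 46, $lens['name'])] = $offset;
        $pos += 46 + $lens['name'] + $lens['extra'] + $lens['comment'];
    }
    return $entries;
}

$zip = zipBuild(['a.txt' => 'alpha', 'b.txt' => 'beta']);
$directory = zipDirectory($zip);
```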
However, a .zip file cannot be parsed directly by PHP in the way that a
phar archive can (in other words, "php somefile.zip" yields gibberish,
whereas "php somefile.phar" executes the PHP loader), so using zip as
the phar file format would require some changes to PHP itself.
In other words, PHP would do better to simply port over Java's jar stuff
rather than phar if this is what people want (to be perfectly clear, I'm
not volunteering to do this).
Greg