Newsgroups: php.internals
Message-ID: <463F7301.9040104@chiaraquartet.net>
Date: Mon, 07 May 2007 13:42:09 -0500
To: internals Mailing List
Subject: how does Phar actually work?
From: greg@chiaraquartet.net (Gregory Beaver)

Hi,

There has been a bit of inevitable FUD about phar. Although the manual (http://php.net/phar) describes a fair amount of how phar works, the design decisions are not documented.

Originally, the phar stream wrapper was a userspace thing. Davey Shafik designed it to take advantage of a neat loophole in the design of the tar file format, so that a valid tar could be run by PHP without needing to have the phar stream wrapper loaded. This was great until I started using it to run the PEAR Installer. The performance hit was tremendous, as every newly included file required scanning the entire file, header by header, until we found the needed file. Worst case, it meant loading megabytes of information just to locate a file.

The zip file format has the same limitation: the entire archive needs to be scanned. Neither format was designed for random access in the way a traditional filesystem is. In fact, I could not find an example of an archive format that is designed for this.

As such, borrowing from the design of disk filesystems, I created a new format that is very small and processes very quickly. It is so much faster that I can't detect a difference in performance between running the PEAR installer off the disk and running it out of a phar. I am sure there is a difference that Apache Benchmark would detect, because of the extra load-in time of the file manifest.

The way phar works now, a file manifest sits at the start of the phar archive (similar to a directory file in traditional filesystems). Each file has a manifest entry containing the file name, the size of the file, and its offset into the archive, plus some flags and optional metadata. The manifest is currently limited in size to 1 MB, so some applications probably could not be packaged as a phar under the current design.

Each phar has a loader stub, which can be any PHP code but must contain the __HALT_COMPILER(); token.
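
To make the manifest idea above concrete, here is a minimal sketch in Python. It is a toy layout (stub, manifest length, manifest entries, file data), not the real phar binary format, which carries additional fields. The function names build_archive, read_manifest, and read_file are hypothetical, and the sketch assumes the stub ends with the __HALT_COMPILER(); token. The point it illustrates is that the manifest is parsed once, after which any file is found by a direct offset lookup rather than a header-by-header scan.

```python
import struct

MARKER = b"__HALT_COMPILER();"


def build_archive(stub: bytes, files: dict[str, bytes]) -> bytes:
    """Pack files as: stub, 4-byte manifest length, manifest entries, file data.

    Toy layout for illustration only; it is NOT the real phar format.
    """
    if stub.count(MARKER) != 1 or not stub.endswith(MARKER):
        raise ValueError("stub must end with a single __HALT_COMPILER(); token")
    manifest = b""
    offset = 0
    for name, data in files.items():
        encoded = name.encode("utf-8")
        # Entry: name length, name, file size, offset into the data area.
        manifest += struct.pack("<I", len(encoded)) + encoded
        manifest += struct.pack("<II", len(data), offset)
        offset += len(data)
    body = b"".join(files.values())
    return stub + struct.pack("<I", len(manifest)) + manifest + body


def read_manifest(archive: bytes) -> tuple[dict[str, tuple[int, int]], int]:
    """Parse the manifest once; return {name: (size, offset)} and the data start."""
    pos = archive.index(MARKER) + len(MARKER)
    (manifest_len,) = struct.unpack_from("<I", archive, pos)
    pos += 4
    end = pos + manifest_len
    entries: dict[str, tuple[int, int]] = {}
    while pos < end:
        (name_len,) = struct.unpack_from("<I", archive, pos)
        pos += 4
        name = archive[pos:pos + name_len].decode("utf-8")
        pos += name_len
        size, offset = struct.unpack_from("<II", archive, pos)
        pos += 8
        entries[name] = (size, offset)
    return entries, end


def read_file(archive: bytes, entries: dict[str, tuple[int, int]],
              data_start: int, name: str) -> bytes:
    """Fetch one file by jumping straight to its recorded offset."""
    size, offset = entries[name]
    start = data_start + offset
    return archive[start:start + size]
```

A caller would run read_manifest once when the archive is opened and then call read_file for each include. The cost of locating a file then depends on the size of the manifest, not on the size of the archive, which is the trade-off described above: some extra load-in time up front in exchange for cheap lookups afterwards.
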
The loader stub will allow phars that also contain PHP_Archive to work under conditions where the phar extension is disabled. It is the loader stub that makes it possible to run a phar with plain vanilla PHP.

I see two possible solutions to the concerns raised by others:

1) don't worry, be happy
2) re-design the phar file format so that it is a tar again, and put the manifest for quick loading in one of the first files of the tar archive.

If I had thought #2 was a good idea, I would have already done it, so there is my opinion.

One basic assumption I would like to raise here is that nobody who downloads a .phar archive will be without PHP. Does this assumption sound sane? If so, I would like to provide some simple scripts for unpharring and repharring a .phar archive. This is not hard to do with a 5-line PHP script.

One of the big questions I would have, though, is for xdebug (hi Derick) and designers of IDEs: it would be good to ensure that it is possible to step through a phar, or even to dump the source line with an error message. These, to me, seem to be the most pressing disadvantages of phar currently: it becomes much harder to debug a problem in a PHP script when it is stuffed into a phar.

Thanks,
Greg