Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:34636 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 61252 invoked by uid 1010); 10 Jan 2008 12:05:43 -0000 Delivered-To: ezmlm-scan-internals@lists.php.net Delivered-To: ezmlm-internals@lists.php.net Received: (qmail 61236 invoked from network); 10 Jan 2008 12:05:43 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 10 Jan 2008 12:05:43 -0000 Authentication-Results: pb1.pair.com header.from=jorton@redhat.com; sender-id=pass Authentication-Results: pb1.pair.com smtp.mail=jorton@redhat.com; spf=pass; sender-id=pass Received-SPF: pass (pb1.pair.com: domain redhat.com designates 66.187.233.31 as permitted sender) X-PHP-List-Original-Sender: jorton@redhat.com X-Host-Fingerprint: 66.187.233.31 mx1.redhat.com Linux 2.6 Received: from [66.187.233.31] ([66.187.233.31:59567] helo=mx1.redhat.com) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id 8B/10-59671-41A06874 for ; Thu, 10 Jan 2008 07:05:42 -0500 Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254]) by mx1.redhat.com (8.13.8/8.13.8) with ESMTP id m0AC5cGZ028259; Thu, 10 Jan 2008 07:05:38 -0500 Received: from turnip.manyfish.co.uk (IDENT:U2FsdGVkX18l+xELcNmk1fBqf6qxQ5LuL+nparpRyrI@vpn-14-93.rdu.redhat.com [10.11.14.93]) by int-mx1.corp.redhat.com (8.13.1/8.13.1) with ESMTP id m0AC5amt020152; Thu, 10 Jan 2008 07:05:37 -0500 Received: from jorton by turnip.manyfish.co.uk with local (Exim 4.68) (envelope-from ) id 1JCwAB-0001vO-PH; Thu, 10 Jan 2008 12:05:35 +0000 Date: Thu, 10 Jan 2008 12:05:35 +0000 To: Derick Rethans Cc: PHP Developers Mailing List Message-ID: <20080110120535.GA27518@redhat.com> Mail-Followup-To: Derick Rethans , PHP Developers Mailing List References: <20080109141541.GA14571@redhat.com> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="gKMricLos+KVdGMg" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.17 (2007-11-01) Organization: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SL4 1TE, United Kingdom. Registered in UK and Wales under Company Registration No. 03798903 Directors: Michael Cunningham (USA), Brendan Lane (Ireland), Matt Parson (USA), Charlie Peters (USA) X-Scanned-By: MIMEDefang 2.58 on 172.16.52.254 Subject: Re: [PHP-DEV] [PATCH] date/timelib: use system timezone database From: jorton@redhat.com (Joe Orton) --gKMricLos+KVdGMg Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Thanks for all the feedback. On Wed, Jan 09, 2008 at 09:03:02PM +0100, Derick Rethans wrote: > On Wed, 9 Jan 2008, Joe Orton wrote: > > > It's a bit of a maintenance headache for distributions when > > packages include their own copy of the timezone database, since this > > needs to be updated frequently. > > There is a PECL extension to provide those updates: > http://pecl.php.net/package/timezonedb > It is a drop-in replacement to update the rules. My aim is to eliminate the additional copies of the timezone database, rather than ship and maintain yet more. Just as I don't want to statically link all system libraries into PHP, nor have to build special copies of any of them for PHP. I'm not sure I find the logic of the "but the system-provided data will become stale" arguments convincing; systems which are left unmaintained by the administrators will have old versions of software on; that's a given. I can't see why adding *more* packages (PECL timezonedb) which those admins have to keep up-to-date will make any difference to that. The point of the patch is to make it *easier* for admins and packagers to keep systems up to date; fewer package updates to download, test, deploy, etc. If every package on your system followed the same philosophy of duplicating the timezone database, the system would be unmaintainable. > > 1) I've not implemented timelib_timezone_builtin_identifiers_list() here > > since it doesn't seem to be used, is it there for third-party > > extensions? It could be implemented by iterating through the directory. > > It's not used (anymore). However, there is a PHP function > timezone_identifiers_list that uses the index information to enumerate > all entries. Your patch prevents this function from working as both > struct elements index_size and index are set to 0/null. OK, fixed in attached patch. > > 2) there's no general way that I can find to obtain the database > > version, so I just invented a string here. > > Not having the version creates another problem, as the PECL timezonedb > extension will then *always* override the builtin database. The > extension uses the version number to see if the extension provides a > later rules set. With your patch, this extra check will not work > correctly anymore if PHP's built-in version is newer (which would happen > if PHP got upgraded). This seems fine, even desirable. Anybody who configures PHP --with-system-tzdata *and* installs the PECL timezonedb always gets the data provided by the timezonedb; the risk of stale timezone data is little different to the current risk. > - This patchs allows accessing any file on the filesystem (because the > path is not sanitized). Fixed in attached patch. > - Opening a file is less quick than having the data in memory. Sure, this is a trade-off; it's configurable exactly so that users get to make the choice. > - The system's format of tzfiles does not necessarily have to be the > same as the one that PHP reads. On my (Debian) system the tz provided > data is already 64bit read. Once I update the extension to use this > extra data, reading the system tz files won't work anymore. I'm not sure exactly what the concern is here; the 64-bit TZif2 format is backwards-compatible with the TZif format and the existing PHP code reads it without problems. (I've been testing against it) > - It would be possible to use a "wrong set" of tzdata, resulting in bugs > like http://bugs.php.net/bug.php?id=35958 It's always going to be possible for people to shoot themselves in the feet. Someone trying to "fix" the PHP-supplied database could similarly screw it up. Again, eliminating extra copies of the timezone database is intended to make it easier for people to keep up-to-date. Regards, joe --gKMricLos+KVdGMg Content-Type: text/plain; charset=utf-8 Content-Disposition: attachment; filename="php_systz-2.diff" Index: ext/date/lib/parse_tz.c =================================================================== RCS file: /repository/php-src/ext/date/lib/parse_tz.c,v retrieving revision 1.35 diff -u -r1.35 parse_tz.c --- ext/date/lib/parse_tz.c 31 Dec 2007 07:12:08 -0000 1.35 +++ ext/date/lib/parse_tz.c 10 Jan 2008 11:42:15 -0000 @@ -20,6 +20,16 @@ #include "timelib.h" +#ifdef HAVE_SYSTEM_TZDATA +#include +#include +#include +#include +#include + +#include "php_scandir.h" +#endif + #include #ifdef HAVE_STRING_H @@ -27,7 +37,10 @@ #else #include #endif + +#ifndef HAVE_SYSTEM_TZDATA #include "timezonedb.h" +#endif #if (defined(__APPLE__) || defined(__APPLE_CC__)) && (defined(__BIG_ENDIAN__) || defined(__LITTLE_ENDIAN__)) # if defined(__LITTLE_ENDIAN__) @@ -202,6 +215,195 @@ } } +#ifdef HAVE_SYSTEM_TZDATA + +#ifdef HAVE_SYSTEM_TZDATA_PREFIX +#define ZONEINFO_PREFIX HAVE_SYSTEM_TZDATA_PREFIX +#else +#define ZONEINFO_PREFIX "/usr/share/zoneinfo" +#endif + +static const timelib_tzdb *timezonedb_system = NULL; + +/* Filter out some non-tzdata files and the posix/right databases, if + * present. */ +static int index_filter(const struct dirent *ent) +{ + return strcmp(ent->d_name, ".") != 0 + && strcmp(ent->d_name, "..") != 0 + && strcmp(ent->d_name, "posix") != 0 + && strcmp(ent->d_name, "posixrules") != 0 + && strcmp(ent->d_name, "right") != 0 + && strstr(ent->d_name, ".tab") == NULL; +} + +/* Create the zone identifier index by trawling the filesystem. */ +static void create_zone_index(timelib_tzdb *db) +{ + size_t dirstack_size, dirstack_top; + size_t index_size, index_next; + timelib_tzdb_index_entry *db_index; + char **dirstack; + + /* LIFO stack to hold directory entres to scan; each slot is a + * directory name relative to the zoneinfo prefix. */ + dirstack_size = 32; + dirstack = malloc(dirstack_size * sizeof *dirstack); + dirstack_top = 1; + dirstack[0] = strdup(""); + + /* Index array. */ + index_size = 64; + db_index = malloc(index_size * sizeof *db_index); + index_next = 0; + + do { + struct dirent **ents; + char name[PATH_MAX], *top; + int count; + + /* Pop the top stack entry, and iterate through its contents. */ + top = dirstack[--dirstack_top]; + snprintf(name, sizeof name, ZONEINFO_PREFIX "/%s", top); + + count = php_scandir(name, &ents, index_filter, php_alphasort); + + while (count > 0) { + struct stat st; + const char *leaf = ents[count - 1]->d_name; + + snprintf(name, sizeof name, ZONEINFO_PREFIX "/%s/%s", + top, leaf); + + if (strlen(name) && stat(name, &st) == 0) { + /* Name, relative to the zoneinfo prefix. */ + const char *root = top; + + if (root[0] == '/') root++; + + snprintf(name, sizeof name, "%s%s%s", root, + *root ? "/": "", leaf); + + if (S_ISDIR(st.st_mode)) { + if (dirstack_top == dirstack_size) { + dirstack_size *= 2; + dirstack = realloc(dirstack, + dirstack_size * sizeof *dirstack); + } + dirstack[dirstack_top++] = strdup(name); + } + else { + if (index_next == index_size) { + index_size *= 2; + db_index = realloc(db_index, + index_size * sizeof *db_index); + } + + db_index[index_next].id = strdup(name); + db_index[index_next++].pos = 0; + } + } + + free(ents[--count]); + } + + free(ents); + free(top); + } while (dirstack_top); + + db->index = db_index; + db->index_size = index_next; + + free(dirstack); +} + +/* Return the mmap()ed tzfile if found, else NULL. On success, the + * length of the mapped data is placed in *length. */ +static char *map_tzfile(const char *timezone, size_t *length) +{ + char fname[PATH_MAX]; + struct stat st; + char *p; + int fd; + + if (strstr(timezone, "..") != NULL) { + return NULL; + } + + snprintf(fname, sizeof fname, ZONEINFO_PREFIX "/%s", timezone); + + fd = open(fname, O_RDONLY); + if (fd == -1) { + return NULL; + } else if (fstat(fd, &st) != 0 || st.st_size < 21) { + close(fd); + return NULL; + } + + *length = st.st_size; + p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0); + close(fd); + + return p != MAP_FAILED ? p : NULL; +} + +const timelib_tzdb *timelib_builtin_db(void) +{ + if (timezonedb_system == NULL) { + timelib_tzdb *tmp = malloc(sizeof *tmp); + + tmp->version = "0.system"; + tmp->data = NULL; + create_zone_index(tmp); + timezonedb_system = tmp; + } + + return timezonedb_system; +} + +const timelib_tzdb_index_entry *timelib_timezone_builtin_identifiers_list(int *count) +{ + *count = timezonedb_system->index_size; + return timezonedb_system->index; +} + +int timelib_timezone_id_is_valid(char *timezone, const timelib_tzdb *tzdb) +{ + char fname[PATH_MAX]; + + if (strstr(timezone, "..") != NULL) { + return 0; + } + + snprintf(fname, sizeof fname, ZONEINFO_PREFIX "/%s", timezone); + + return access(fname, R_OK) == 0 ? 1 : 0; +} + +timelib_tzinfo *timelib_parse_tzfile(char *timezone, const timelib_tzdb *tzdb) +{ + char *tzf, *orig; + timelib_tzinfo *tmp; + size_t len; + + orig = map_tzfile(timezone, &len); + if (orig == NULL) { + return NULL; + } + + tmp = timelib_tzinfo_ctor(timezone); + + tzf = orig + 20; + read_header(&tzf, tmp); + read_transistions(&tzf, tmp); + read_types(&tzf, tmp); + + munmap(orig, len); + + return tmp; +} +#else /* !HAVE_SYSTEM_TZDATA */ + static int seek_to_tz_position(const unsigned char **tzf, char *timezone, const timelib_tzdb *tzdb) { int left = 0, right = tzdb->index_size - 1; @@ -258,6 +460,7 @@ return tmp; } +#endif static ttinfo* fetch_timezone_offset(timelib_tzinfo *tz, timelib_sll ts, timelib_sll *transition_time) { Index: ext/date/lib/timelib.m4 =================================================================== RCS file: /repository/php-src/ext/date/lib/timelib.m4,v retrieving revision 1.4 diff -u -r1.4 timelib.m4 --- ext/date/lib/timelib.m4 3 Jul 2005 23:30:52 -0000 1.4 +++ ext/date/lib/timelib.m4 10 Jan 2008 11:42:16 -0000 @@ -78,3 +78,17 @@ dnl Check for strtoll, atoll AC_CHECK_FUNCS(strtoll atoll strftime) + +PHP_ARG_WITH(system-tzdata, for use of system timezone data, +[ --with-system-tzdata[=DIR] to specify use of system timezone data], +no, no) + +if test "$PHP_SYSTEM_TZDATA" != "no"; then + AC_DEFINE(HAVE_SYSTEM_TZDATA, 1, [Define if system timezone data is used]) + + if test "$PHP_SYSTEM_TZDATA" != "yes"; then + AC_DEFINE_UNQUOTED(HAVE_SYSTEM_TZDATA_PREFIX, "$PHP_SYSTEM_TZDATA", + [Define for location of system timezone data]) + fi +fi + --gKMricLos+KVdGMg--