Newsgroups: php.internals
From: bill@devtemple.com ("Bill Salak")
To: "'Rowan Collins'"
Date: Sun, 23 Nov 2014 15:36:30 -0800
Message-ID: <018601d00776$50681c40$f13854c0$@devtemple.com>
In-Reply-To: <7E8EF92A-6071-4BDA-ADD5-2C2B54AA2DCB@gmail.com>
Subject: RE: [PHP-DEV] enhance fget to accept a callback

>Hi list,
>
>I'm considering writing an RFC to add a 3rd parameter to fgets which
>accepts a user defined function.
>If we had this today we wouldn't need
>fgetcsv, with the added benefit of fgetcsv-style support for data
>packaging formats we would otherwise create more one-off functions for.
>For example, if we decided to support reading JSON from files in the
>same manner as our current fgetcsv functionality, we would create
>an fgetjson function.
>
>This change unifies the way in which we support native transliteration
>of data packaging formats from files into PHP data structures through a
>single interface. The other major design benefit, from my point of
>view, is the unification of userland transliteration
>functions/libraries with the same modality as our native support for
>these types of use cases. I believe this will ultimately result in more
>intuitive userland code around this type of functionality.

It's an interesting idea, but I can't immediately picture how it would work - what would the callback be given, and what would it return? Would it somehow be able to manipulate the number of characters read from the stream?

For any variant of CSV, reading a line at a time is what you want anyway, and you can easily build an Iterator which post-processes each line as it is read, giving the memory efficiency of fgetcsv() but much more flexibility.

For JSON, newlines aren't the delimiter you want, but with nested structures, I'm not sure how you'd parse a partial structure anyway. Are there JSON equivalents of SAX (event-based) parsers?

The callback would be given the string as returned by fgets today. The functional equivalent to fgetjson today is handled by something like

$handle = fopen(~some file~, 'r');
while (($data = fgets($handle)) !== FALSE) {
    $data = json_decode($data, true);
    ...other stuff...
}

and would change to

$handle = fopen(~some file~, 'r');
$decode = function ($line) { return json_decode($line, true); };
while (($data = fgets($handle,0,$decode)) !== FALSE) {
    ...other stuff...
}

The fgetcsv equivalent would be

$handle = fopen(~some file~, 'r');
$decode = function ($line) { return str_getcsv($line, ...options...); };
while (($data = fgets($handle,0,$decode)) !== FALSE) {
    ...other stuff...
}

Userland benefits from having an API that promotes consistency through a flexible interface:

$handle = fopen(~some file~, 'r');
$decode = function ($foo) { ...do stuff...; return $bar; };
while (($data = fgets($handle,0,$decode)) !== FALSE) {
    ...other stuff...
}

On a side note, small JSON data packages, delimited with newlines and stored on cheap disk, are an increasingly popular (in my circles) way to handle storing raw data that could be subject to later scrutiny or processing. In these cases, parsing JSON just like a CSV file and converting it into a native format is necessary. I've found JSON packages in files to have huge advantages over storing the equivalent data + relationship in CSV format. With that said, JSON isn't the end-all and be-all. We need to anticipate this model evolving into any existing or future data format. Providing a clean and consistent way to handle all of the existing and yet-to-be-determined use cases around fgets and different data packaging formats is the primary purpose of my proposal.
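
For comparison, roughly the same effect can be had in userland today with a generator, along the lines of the Iterator approach raised in the thread. This is only a sketch under stated assumptions: read_lines_with() is a hypothetical helper name (not a PHP function and not the proposed fgets signature), and the in-memory stream and sample records are made up so the example runs without a real file.

```php
<?php
// Hypothetical userland helper (an assumption for illustration, not part of PHP).
// Reads a stream one line at a time and yields each line after passing it
// through the $decode callback, so only one record is held in memory at once.
function read_lines_with($handle, callable $decode)
{
    while (($line = fgets($handle)) !== false) {
        // Strip the trailing line ending so the callback sees only the payload.
        yield $decode(rtrim($line, "\r\n"));
    }
}

// Made-up sample data: newline-delimited JSON in an in-memory stream.
$handle = fopen('php://memory', 'w+');
fwrite($handle, '{"id":1,"tags":["a","b"]}' . "\n");
fwrite($handle, '{"id":2,"tags":[]}' . "\n");
rewind($handle);

// The callback plays the role of $decode in the examples above.
$decodeJson = function ($line) {
    return json_decode($line, true);
};

// Collect the decoded records; in real use you would foreach over the generator.
$records = iterator_to_array(read_lines_with($handle, $decodeJson), false);
fclose($handle);
```

Swapping $decodeJson for a closure around str_getcsv() would give fgetcsv-like behaviour with the same helper. The difference from the proposal is where the callback lives: here it sits in a wrapper around fgets rather than in fgets itself.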