Hi list,
I'm considering writing an RFC to add a third parameter to fgets which accepts
a user-defined function. If we had this today, we wouldn't need fgetcsv, and
we would get fgetcsv-style support for data packaging formats that we would
otherwise have to create more one-off functions for. For example, if we decided
to support reading JSON from files in the same manner as our current fgetcsv
functionality, we would have to create an fgetjson function.
This change unifies the way in which we support native transliteration of
data packaging formats from files into PHP data structures through a single
interface. The other major design benefit, from my point of view, is the
unification of userland transliteration functions/libraries with the same
modality as our native support for these types of use cases. I believe this
will ultimately result in more intuitive userland code around this type of
functionality.
Before I go any further in formalizing my proposal, I'd like to get feedback
from list members.
Thanks for your time.
Bill Salak
It's an interesting idea, but I can't immediately picture how it would work - what would the callback be given, and what would it return? Would it somehow be able to manipulate the number of characters read from the stream?
For any variant of CSV, reading a line at a time is what you want anyway, and you can easily build an Iterator which post-processes each line as it is read, giving the memory efficiency of fgetcsv() but much more flexibility.
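As a rough illustration of the iterator idea, here is a minimal sketch built on a generator. The helper name parsed_lines() and the semicolon-separated sample data are assumptions made for this example; they are not part of PHP or of the proposal.

```php
<?php
// Hypothetical helper (not part of PHP): lazily reads a stream one line at a
// time and yields whatever $parse returns for that line.
function parsed_lines($fh, callable $parse): Generator
{
    while (($line = fgets($fh)) !== false) {
        // Strip the trailing line ending before handing the line to the parser.
        yield $parse(rtrim($line, "\r\n"));
    }
}

// An in-memory stream stands in for a real file.
$fh = fopen('php://memory', 'w+');
fwrite($fh, "id;name\n1;Ada\n2;Grace\n");
rewind($fh);

// Post-process each line as a semicolon-separated CSV variant.
$parse = fn(string $line) => str_getcsv($line, ';', '"', '\\');

$rows = [];
foreach (parsed_lines($fh, $parse) as $row) {
    $rows[] = $row;
}
fclose($fh);
```

Only one line is held in memory at a time, and swapping $parse for another callable changes the format without touching the reading loop.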
For JSON, newlines aren't the delimiter you want, but with nested structures, I'm not sure how you'd parse a partial structure anyway. Are there JSON equivalents of SAX (event-based) parsers?
Regards,
Rowan Collins
[IMSoP]
The callback would be given the string as returned by fgets today. The functional equivalent of fgetjson is handled today by something like
    $handle = fopen(~some file~, 'r');
    while (($data = fgets($handle)) !== FALSE) {
        $data = json_decode($data, true);
        ...other stuff...
    }
and would change to
    $handle = fopen(~some file~, 'r');
    $decode = json_decode($data, true);
    while (($data = fgets($handle,0,$decode)) !== FALSE) {
        ...other stuff...
    }
The fgetcsv equivalent would be
    $handle = fopen(~some file~, 'r');
    $decode = str_getcsv(...options...);
    while (($data = fgets($handle,0,$decode)) !== FALSE) {
        ...other stuff...
    }
Userland benefits from having an API that promotes consistency through a flexible interface:
    $handle = fopen(~some file~, 'r');
    $decode = function($foo) { ...do stuff...; return $bar; };
    while (($data = fgets($handle,0,$decode)) !== FALSE) {
        ...other stuff...
    }
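The third parameter described above does not exist in PHP, so the snippets using it cannot run as written. The following is only a sketch of how the same loop shape could be approximated in userland today; the function name fgets_with() is hypothetical, and the callback is written as a closure so that it is a callable value.

```php
<?php
// Hypothetical userland stand-in for the proposed fgets($handle, $length, $callback).
function fgets_with($handle, callable $decode)
{
    $line = fgets($handle);
    // Preserve fgets() semantics: false means end of file or a read error.
    if ($line === false) {
        return false;
    }
    return $decode($line);
}

// An in-memory stream with one JSON document per line stands in for a real file.
$handle = fopen('php://memory', 'w+');
fwrite($handle, "{\"id\":1}\n{\"id\":2}\n");
rewind($handle);

$decode = function ($data) { return json_decode($data, true); };

$records = [];
while (($data = fgets_with($handle, $decode)) !== false) {
    $records[] = $data;
}
fclose($handle);
```

One consequence of this shape is that a callback which legitimately returns false (for example, a line containing the JSON literal false) cannot be told apart from end of file.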
On a side note, small JSON data packages, delimited with newlines and stored on cheap disk, are an increasingly popular (in my circles) way to store raw data that could be subject to later scrutiny or processing. In these cases, parsing JSON just like a CSV file and converting it into a native format is necessary. I've found JSON packages in files to have huge advantages over storing the equivalent data and relationships in CSV format. That said, JSON isn't the be-all and end-all. We need to anticipate this model evolving to cover any existing or future data format. Providing a clean and consistent way to handle all of the existing and yet-to-be-determined use cases around fgets and different data packaging formats is the primary purpose of my proposal.
Since you need a function reference for the callback, you'd actually need a closure to capture the options:
$decode = function($data) { return json_decode($data, true); };
This is actually more effort and code than the existing version, so I'm not sure what is gained.
Either way, the likelihood is you'd want to wrap this into a user function. As I mentioned earlier, making it into an Iterator is often useful, and potentially as simple as a generator function a bit like this:
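function fjsoniterator($fh) {
    // Loop until fgets() returns false at end of file; a single "if" would yield only the first line.
    while ( ($line = fgets($fh)) !== false ) {
        yield json_decode($line, true);
    }
}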
$fh = fopen(...);
foreach ( fjsoniterator($fh) as $data ) { ... }
Regards,
Rowan Collins
[IMSoP]
For JSON, newlines aren't the delimiter you want, but with nested structures, I'm not sure how you'd parse a partial structure anyway. Are there JSON equivalents of SAX (event-based) parsers?
If JSON is encoded into another format, newlines can be a valid
delimiter. For example, JSON-Base64 uses newlines:
JSON-Base64 is more for cross-application support where PHP isn't the
only language in the mix. If I'm moving data between two PHP hosts in a
migration scenario, I'll tend to use serialize() and Base64 encoding,
which preserves PHP objects across the network and requires less effort.
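A minimal sketch of the serialize() plus Base64 approach, assuming one encoded record per line so that fgets() can act as the reader. The sample records and the in-memory stream are illustrative only.

```php
<?php
// Sample data for this sketch; the embedded newline is safe once Base64-encoded.
$records = [
    ['id' => 1, 'note' => "first\nrecord"],
    ['id' => 2, 'note' => 'second record'],
];

// Write one serialized, Base64-encoded record per line.
$fh = fopen('php://memory', 'w+');
foreach ($records as $record) {
    fwrite($fh, base64_encode(serialize($record)) . "\n");
}
rewind($fh);

// Read the records back line by line.
$restored = [];
while (($line = fgets($fh)) !== false) {
    // allowed_classes => false refuses to instantiate objects;
    // list the expected class names instead when objects must survive the trip.
    $restored[] = unserialize(
        base64_decode(rtrim($line, "\n")),
        ['allowed_classes' => false]
    );
}
fclose($fh);
```

Because the Base64 alphabet contains no newline characters, a newline is a safe record delimiter here. unserialize() should only be given data from a trusted source.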
--
Thomas Hruska
CubicleSoft President
I've got great, time saving software that you will find useful.
Hi!
The fgets() function has a few behavioural properties that are hard to change:
- it reads until it reaches a newline;
- it’s expected to return a string.
The above two properties prevent it from being a good candidate for reading other formats, such as CSV or JSON which may:
- span across multiple lines and
- almost always return something other than a string.
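The multi-line point can be seen with a CSV record whose quoted field contains a newline. This is a small sketch using an in-memory stream; the helper stream_from_string() is hypothetical and exists only to keep the example self-contained.

```php
<?php
// Two CSV records; the second has a quoted field that contains a newline.
$csv = "id,comment\n1,\"spans\ntwo lines\"\n";

// Hypothetical helper for this sketch: an in-memory stream preloaded with $data.
function stream_from_string(string $data)
{
    $fh = fopen('php://memory', 'w+');
    fwrite($fh, $data);
    rewind($fh);
    return $fh;
}

// Line-based reading: fgets() stops at every newline, including the quoted one.
$fh = stream_from_string($csv);
$lineCount = 0;
while (fgets($fh) !== false) {
    $lineCount++;
}
fclose($fh);

// Record-based reading: fgetcsv() keeps the quoted newline inside one field.
$fh = stream_from_string($csv);
$records = [];
while (($row = fgetcsv($fh, 0, ',', '"', '\\')) !== false) {
    $records[] = $row;
}
fclose($fh);
```

Here fgets() sees three lines while fgetcsv() sees two records, so a callback that receives one fgets() line at a time would be handed an incomplete record.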