Hello everyone!
I'm trying to gain some speed by moving a function from legacy PHP
code to C as an extension. The function calls lynx from the command
line, because lynx's C API isn't nice enough to use like any other
libXX.
Currently, in PHP, I make a system call with proc_open, send the
input, and read the response. Moving the code to C, I found that I
need a "bidirectional" popen to replicate the PHP behavior, and since
popen is not bidirectional by default on POSIX environments, I'm
forced to fork myself to get this working.
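
A minimal sketch of what such a "bidirectional" popen could look like in
plain C, using two pipes plus fork and exec. The helper name
filter_through is hypothetical, and the code assumes the child reads all
of its input before it produces much output (as a -dump -stdin style
filter would); a child that streams output while still reading could
deadlock this simple write-then-read loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with its stdin and stdout connected to
 * this process, send `in`, and collect everything the child prints into a
 * malloc'd, NUL-terminated buffer stored in *out (caller frees).
 * Returns the child's exit status, or -1 on error. */
int filter_through(char *const argv[], const char *in, size_t in_len, char **out)
{
    int to_child[2], from_child[2];

    assert(argv != NULL && argv[0] != NULL && out != NULL);
    *out = NULL;

    if (pipe(to_child) == -1)
        return -1;
    if (pipe(from_child) == -1) {
        close(to_child[0]); close(to_child[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: wire the pipe ends to stdin/stdout, then exec. */
        dup2(to_child[0], STDIN_FILENO);
        dup2(from_child[1], STDOUT_FILENO);
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        execvp(argv[0], argv);
        _exit(127); /* exec failed */
    }

    /* Parent: keep only the write end to the child and the read end from it. */
    close(to_child[0]);
    close(from_child[1]);

    /* Simplification: write everything first, then read. */
    size_t off = 0;
    while (off < in_len) {
        ssize_t n = write(to_child[1], in + off, in_len - off);
        if (n <= 0)
            break;
        off += (size_t)n;
    }
    close(to_child[1]); /* the child sees EOF on its stdin */

    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    ssize_t n;
    while (buf != NULL && (n = read(from_child[0], buf + len, cap - len - 1)) > 0) {
        len += (size_t)n;
        if (cap - len < 2) { /* grow, keeping room for the trailing NUL */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) { free(buf); buf = NULL; }
            else { buf = tmp; cap *= 2; }
        }
    }
    close(from_child[0]);

    int status = 0;
    waitpid(pid, &status, 0); /* always reap the child */
    if (buf == NULL)
        return -1;
    buf[len] = '\0';
    *out = buf;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

To use it, the caller builds an argv array ending in NULL (for example
the lynx command line with each option as its own element), passes the
HTML as `in`, and frees *out afterwards. Note that every call still pays
for one fork and one exec.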
So back to my question: how feasible is it to make fork calls in a PHP
extension? Can I just do it, or should I use the php_stream_* functions
to get a stream and make the system call? To head off some questions:
the extension is only meant to be used on POSIX environments, so
basically I don't care if it is not compatible with Windows.
Thanks for your time.
--
Gabriel Sosa
If you are looking for different results, don't always do the same thing. - Einstein
If it were me I would take a step back and look at what lynx was
actually needed for and whether there are other libraries out there that
can do similar things. Like libcurl + the html parsing available in
libxml2, for example. I'm assuming you are calling lynx in order to use
its html parser?
Moving something from PHP to C for performance reasons, but leaving in
the call to an external program isn't going to buy you much, if
anything, since most of your time is spent forking and launching that
external program on each request.
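
One rough way to check this on a given machine is to time how long it
takes just to fork, exec, and wait for a trivial program. The sketch
below is only an illustration: avg_spawn_seconds is a made-up helper
name, and the number it returns depends heavily on the system and on the
program being launched.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: average wall-clock seconds for one
 * fork + exec + wait cycle of argv, measured over `runs` launches.
 * Returns -1.0 on error. */
double avg_spawn_seconds(char *const argv[], int runs)
{
    struct timespec t0, t1;

    assert(argv != NULL && argv[0] != NULL);
    if (runs <= 0)
        return -1.0;
    if (clock_gettime(CLOCK_MONOTONIC, &t0) == -1)
        return -1.0;

    for (int i = 0; i < runs; i++) {
        pid_t pid = fork();
        if (pid == -1)
            return -1.0;
        if (pid == 0) {
            execvp(argv[0], argv);
            _exit(127); /* exec failed */
        }
        int status;
        if (waitpid(pid, &status, 0) == -1)
            return -1.0;
    }

    if (clock_gettime(CLOCK_MONOTONIC, &t1) == -1)
        return -1.0;
    double elapsed = (double)(t1.tv_sec - t0.tv_sec)
                   + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return elapsed / runs;
}
```

Calling it with the real command line and comparing the result against
the total time per request gives an idea of how much of that time is
process startup, which moving the caller from PHP to C does not change.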
-Rasmus
I'm using lynx to convert some HTML into plain text, basically
replicating the following command:
lynx -pseudo_inlines=off -hiddenlinks=merge -reload -cache=0 -notitle
-force_html -dump -nocolor -stdin
I've been looking, but I haven't found any other library capable of
doing the same with "almost" the same quality.
Thanks
You may be right that it does it better than other mechanisms and it
may be the way to go. But it sounds like you need it to be faster. You
are still not going to gain much simply by calling lynx from C. The
only way to speed this up is to not have to fork and exec a new
process on every request. One way to do that would be to figure out
how to talk to an already running instance of lynx. Then write a
little Gearman wrapper for it and launch a bunch of Gearman workers.
Another benefit of this approach is that you will be able to call lynx
asynchronously.
-Rasmus
Rasmus is spot on, but another thought is that if your content is often
the same, caching it somehow (either with PHP code or with a PHP
extension--I would just try PHP code for starters) could yield large
speed-ups, too.
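
If the caching did end up inside an extension, a minimal in-memory
sketch in C could look like the following. Everything here is an
assumption for illustration: the names cached_convert and demo_convert
are made up, demo_convert merely stands in for the real lynx call, and
the cache is a small fixed-size table that lives only inside one
process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_SLOTS 256

/* One slot holds a copy of the input HTML and its converted text. */
struct cache_entry { char *html; char *text; };
static struct cache_entry cache[CACHE_SLOTS];

/* FNV-1a hash, used only to pick a slot. */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

/* Hypothetical converter type: returns malloc'd plain text, or NULL. */
typedef char *(*convert_fn)(const char *html);

/* Stand-in for the expensive conversion (the lynx call in this thread).
 * It just copies its input and counts how often it is invoked. */
int convert_calls = 0;
char *demo_convert(const char *html)
{
    convert_calls++;
    return strdup(html);
}

/* Returns the converted text for `html`, running `convert` only on a miss.
 * The result is owned by the cache and stays valid until its slot is
 * reused by a different input; the caller must not free it. */
const char *cached_convert(const char *html, convert_fn convert)
{
    assert(html != NULL && convert != NULL);
    struct cache_entry *e = &cache[fnv1a(html) % CACHE_SLOTS];

    /* Compare the full key so a hash collision is never a false hit. */
    if (e->html != NULL && strcmp(e->html, html) == 0)
        return e->text;

    char *text = convert(html);
    char *key = strdup(html);
    if (text == NULL || key == NULL) {
        free(text);
        free(key);
        return NULL;
    }
    free(e->html); /* evict whatever was in the slot */
    free(e->text);
    e->html = key;
    e->text = text;
    return e->text;
}
```

With this shape, repeated requests for the same HTML skip the external
program entirely. Trying the same idea in PHP code first, as suggested
above, is less work and lets you measure the hit rate before writing any
C.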
Ben.