Hey Mark
Thanks for the huge amount of work you invested there.
But to be honest: Personally I don't think it helps in moving the
documentation to git for several reasons.
On 22.09.19 at 23:05, Mark Randall wrote:
Hello all,
After participating in various conversations, and listening to the most
recent PHP internals podcast about moving the documentation to GIT, and
the problems with editing, I spent most of the weekend knocking up a
prototype that I want to present as a possible solution to these problems.
The prototype is available at https://php-doceditor.markrandall.uk/ and
has some basic functionality. I must stress it is far from complete.
In this PoC I attempted to solve the following issues:
** 1. Difficulty in editing XML **
Something I hear a lot is that the XML format is difficult to work with.
The format clearly has its benefits, after all almost all the web is
written in it, but I think for something like the documentation, a UI
based editor would be a superior mechanism.
To that end I created a front-end for editing the files, or rather, a
converted format of the files which takes the XML and ports it into
JSON, which is programmatically MUCH easier to work with.
It in effect still deals with nodes, but they are abstracted out to
nested UI elements.
XML is much easier to edit than JSON. There is a reason why the
documentation is written in DocBook: it is a widely used format for
writing documentation. At least it was a widely used format. And there
have already been different ideas to move the documentation to a
different format (Markdown, AsciiDoc, reST, to name a few) but JSON
has never popped up so far. JSON is intended for
machine-to-machine transfer of information. It is intended to be easily
readable for machines but not for human beings. XML, on the contrary, is
easy to understand for machines as well as human beings. So
programmatically it is as easy to work with as JSON, but from a human POV
it is much easier to work with than JSON. And as long as human beings
are creating the documentation we need to consider them as well.
Also, using JSON means you will need a technical aid for writing
the documentation. With XML all you need is a text editor. Add to that
the possibility of putting comments into XML, which JSON does not offer,
and I doubt that the people doing the documentation will be happy with
that step.
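
To illustrate the point about comments, here is a minimal Python sketch. The DocBook-like fragment and the `element_to_dict` helper are made up for this example and are not taken from the prototype. It shows that a straightforward XML-to-JSON conversion silently loses an editor's comment, and that JSON itself has no place to put one.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical DocBook-like fragment; the comment is a note between editors.
XML_SOURCE = """<para>
  <!-- TODO: mention the new default value -->
  Returns the length of the string.
</para>"""


def element_to_dict(element):
    """Convert an element into a plain dict that json.dumps can serialise."""
    return {
        "tag": element.tag,
        "text": (element.text or "").strip(),
        "children": [element_to_dict(child) for child in element],
    }


def json_accepts_comments():
    """Check whether the standard JSON parser tolerates a comment."""
    try:
        json.loads('{"text": "Returns the length." /* TODO */}')
    except json.JSONDecodeError:
        return False
    return True


# The default ElementTree parser discards comments while parsing.
root = ET.fromstring(XML_SOURCE)
as_json = json.dumps(element_to_dict(root), indent=2)
```

The editor's note survives in the XML source but is gone from `as_json`, and `json_accepts_comments()` reports that the JSON parser rejects a comment outright.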
A point that is completely missing here is that changing from one
doc format to another does not help in the transition from SVN to
git. On the contrary: a) it ties up additional resources and b) if
the change is made a prerequisite for the transition to git, it would
delay the transition considerably.
** 2. Immediate Visual Feedback **
The process for re-creating the documents once they have been uploaded
to SVN is not an instant one. I wanted a way for the editors to be able
to see what they would get, and so I provided a very simplistic
Javascript based renderer.
Naturally it will need a lot of additional work to include all of the
features we would require of it.
There already is a tool to modify the translations and get immediate
feedback. It's available at https://edit.php.net but the visual feedback
generator has been broken for years. So it looks like it is not that easy,
as so far no one has repaired it. Or immediate feedback is simply not
necessary.
That is not to say that it wouldn't be a nice add-on to the process, but
to get the docs from SVN to git as fast as possible (and I've been at
this project for 3 years already) it's not a necessity. And again: it
ties up a lot of manpower for a tool that doesn't seem to be necessary
at the moment.
** 3. Translation **
This is perhaps the biggest and most controversial of suggestions. At
present, various translations are kept in completely independent files
where the structure is re-created in each one.
My proposal and demo turns this on its head. Rather than using multiple
files, all languages will coexist in a single document that will
simultaneously act as both the template, the English source and its
translations.
To achieve this I have introduced the idea of text sections. These are
effectively containers for a set of paragraphs or examples. Crucially,
each text section will contain effectively the same text, but in
multiple languages.
This means that text that effectively says the same thing, will be
stored next to each other. As you can see on the example, it is easy to
open both the French and English editing boxes at once.
This will mean breaking with everything that is currently available.
The current project tries to replace the Version Control System
underneath the documentation while trying to leave as much of the
currently existing workflows, tools and processes the way they are.
That does not mean that they are the best processes on earth but those
are the processes the people creating the documentation are used to and
changing one bit at a time is easier to grok than changing everything at
once. Especially when the people doing the work are doing it in their
free time.
One of the reasons there are different files, even folders, even
workspaces per language is the fact that not everyone wants to download
the complete documentation. Additionally the single files will get even
bigger than they are. And if you have the English version as well as all
the translations in one file, does that mean that I have to modify every
language when I modify the English version? Otherwise the translated
versions would not be in sync with the English version. Exactly this is
currently solved by the revision number that is written into the
translated file. It is the revision number of the English file this
translation is based on. When the English file changes it gets a new
revision and I can see that the current translated file is not in sync
anymore and that it needs to be revisited and at least checked. How do
you allow that when all translations are in one file?
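
The revision check described above can be sketched roughly like this in Python. The `EN-Revision` header layout, the revision numbers and the function names are assumptions for illustration only; the real tooling may differ in detail.

```python
import re

# Assumed header format of a translated file; the layout is illustrative.
EN_REVISION_PATTERN = re.compile(r"<!--\s*EN-Revision:\s*(\d+)")


def translated_revision(translated_xml):
    """Return the English revision a translation is based on, or None if untagged."""
    match = EN_REVISION_PATTERN.search(translated_xml)
    return int(match.group(1)) if match else None


def needs_review(translated_xml, current_english_revision):
    """A translation needs review when it is untagged or based on an older revision."""
    based_on = translated_revision(translated_xml)
    return based_on is None or based_on < current_english_revision


# Hypothetical German file that was translated from English revision 340120.
german_file = (
    "<!-- EN-Revision: 340120 Maintainer: example Status: ready -->\n"
    "<para>Gibt die Laenge der Zeichenkette zurueck.</para>"
)
```

With the English file still at revision 340120, `needs_review(german_file, 340120)` is false. As soon as the English file moves to a newer revision the same call flags the translation for a check, without anyone having touched the German file.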
In addition to that the current setup with different files, even
different workspaces, allows the PHP-project to have different people in
charge of the different translations. And not only in terms of being
able to modify a part of a file but actually having the right to merge
or reject changes to a file for a certain translation. That should be
spread onto different shoulders as not everyone is as fluent in a
certain language as someone else. How would you achieve that if you have
all translations in one file?
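
As a rough sketch of why separate files make this easy: with per-language directories, responsibility can be derived from the path alone. The directory names and maintainer names below are purely hypothetical.

```python
# Hypothetical maintainer assignments, keyed by per-language directory.
MAINTAINERS = {
    "en/": ["english-maintainer"],
    "de/": ["german-maintainer"],
    "fr/": ["french-maintainer"],
}


def reviewers_for(path):
    """Return the people allowed to merge or reject a change to this path."""
    for prefix, people in MAINTAINERS.items():
        if path.startswith(prefix):
            return people
    # No language directory in the path: a path-based rule has nothing to go on.
    return []
```

A change to `de/reference/strings/functions/strlen.xml` maps straight to the German maintainers. For a single combined file the path carries no language information, so the lookup comes back empty and the decision would need to inspect the content of the change instead.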
** 4. Translation Synchronisation **
Following on from the UI element of translation, as I understand it,
one of the biggest problems with moving to GIT is needing to store
hashes so that the system knows what information is out of date and by
how many revisions.
The ability for the UI to add metadata to the text sections eliminates
this problem completely. I propose that each translated section would
have a button displayed that would indicate that a major change had been
made, which would update a modified timestamp in the respective
language’s JSON object for that section.
This would allow easy comparison of when translations were outdated. The
renderer, having access to all known translations at once, could
potentially give warnings that a given section was wildly out of date,
and offer to display the English or other more recent version inline.
The "need" to store the hash is not the biggest problem. It is
currently the problem that is challenging me personally the most.
And that is because I want to change as little as possible in the
current workflow. Currently people are accustomed to writing the revision
number of the English file into the translated file. So instead of using
the revision number they would use the hash. That seems not to work for
some technical reasons (SVN is a bitch here) so we might have to find a
different solution that changes the current workflow as little as
possible. Some ideas are already brewing.
Putting everything into one file doesn't solve the issue. Also having a
timestamp doesn't solve the issue as it merely states when the last
change was made, not how many changes were made since then. Knowing
the latest change was 3 days ago is one thing, knowing there were 15
changes in between says something completely different. Must have been a
major thing...
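
A small Python sketch of that difference, with a made-up history for one English file (the revision numbers and dates are invented for the example):

```python
from datetime import date

# Hypothetical history of the English file: (revision number, commit date).
ENGLISH_HISTORY = [
    (100, date(2019, 9, 1)),
    (101, date(2019, 9, 10)),
    (102, date(2019, 9, 12)),
    (103, date(2019, 9, 19)),
]


def last_change(history):
    """The only thing a 'modified' timestamp can tell us: when the latest change was."""
    return max(committed for _, committed in history)


def revisions_behind(based_on_revision, history):
    """Count the English revisions made after the one a translation is based on."""
    return sum(1 for revision, _ in history if revision > based_on_revision)
```

A translation based on revision 100 and one based on revision 102 both see the same `last_change` date, but `revisions_behind` reports 3 for the first and 1 for the second. That count is the information a plain timestamp cannot provide.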
** 5. Improved Type Information **
With the almost certain addition of unions in 8.0, the types for
parameters and return types are becoming much more formalised.
Rather than just expecting one type, I have designed the prototype to
allow specifying multiple parameter types, and multiple return types,
and providing distinct information for each, with the UI adjusting
accordingly when union types are present.
This is awesome, but I hope to have finished the transition before we
get to the point where we need that. And I'm sure DocBook has a way to
handle that as well and I'm equally sure the people doing the
documentation have their ideas on that topic already. I know for example
that Nikita has started an initiative to add all the types to the
documentation.
One thing we haven't spoken about yet is the complete toolchain in the
background that needs to be adapted to create all the files we know as
the PHP documentation, but also all the files most of us do not know
(yet) that are hidden at http://doc.php.net/
So while I believe that the current way of writing and translating the
documentation can be improved in many ways such an improvement always
needs to take the current situation and the people working with the
current situation and their willingness to change their beloved
workflows into account. Especially on OSS projects where people usually
don't get paid to work on that.
This attempt is in my eyes a great idea that should be discussed on the
documentation mailing list as an approach for a future
modification of the files, processes and workflows.
But in the meantime I sadly don't see it helping in the current project
of moving the currently existing documentation from SVN to git.
My 0.02 €
Cheers
Andreas
PS:
Pre-Emptive Q&A
- Why do certain blocks have the ability to add multiple text sections?
This was a design choice to help with translation. While it would have
been possible, and frankly easier to make each major part only have one
translation for each, these could become quite large, and I think it
makes sense to split them up.
Therefore, sections like notes and examples have the ability to extend
themselves with multiple text sections, where each one is tracked and
translated independently.
- But I love XML
At present, I have only made a one-way conversion process that takes XML
and turns it into the necessary JSON for rendering, and it’s my
intention to improve it some to be able to pull in existing translations
from multiple languages (using common identifiers such as a parameter
name as a point of reference to join the data).
It would be possible to write something that did this in reverse, and
took the JSON and turned it back into valid Docbook XML. If this makes
any sense in the long run I am not convinced as I think writing the
renderers is a lot easier in JSON, and it can be committed to GIT all
the same if it's pretty-printed so it's not all mushed up on one line.
We currently have renderers in place. They are working quite well.
Moving to JSON means we have to rewrite them, which ties up manpower.
- Validation?
Definitely needs to have JSON Schema applied to it before it's put into
use.
We have validation via XML schemas. Moving to JSON means we have to
redo it all. Which, again, ties up manpower.
- Source?
Sauce? Tomato Sauce? https://github.com/marandall/phpdoc-editor
Avoid the XML parsing code. It's pure cancer.
A personal note here: An XSLT file would be able to do the transition
without the need for PHP ;-)
,,,
(o o)
+---------------------------------------------------------ooO-(_)-Ooo-+
| Andreas Heigl |
| mailto:andreas@heigl.org N 50°22'59.5" E 08°23'58" |
| http://andreas.heigl.org http://hei.gl/wiFKy7 |
+---------------------------------------------------------------------+
| http://hei.gl/root-ca |
+---------------------------------------------------------------------+