At the risk of being criticised again, I will lay out a couple of problems that
need to be addressed as part of making any progress on DVCS support.
The first question is 'Do we need DVCS?'
The answer is simple - yes - but what and how is not so clear cut.
(Bear with me ...)
This leads on to 'What is stopping using DVCS currently?'
And the answer is in the rfc - "We will not convert the whole SVN repository at
once."
A mirror of the existing SVN repository falls over because of the size. So would
it be possible to break down the base into more manageable chunks without moving
it to DVCS? Modularize the existing code so that DVCS mirrors are more practical?
It is this area that is probably more important to get agreement on than
selecting a particular DVCS system. The submodule/subrepo handling process has
been relegated to subsequent RFCs, when in reality handling this area is key to
getting a fully DVCS-based system working at all, so how the underlying system
handles it needs to be part of the decision process.
The other question I think is a no-brainer. 'Where is a DVCS solution hosted?'
Choosing a solution simply because github is more popular than bitbucket did
annoy me when other projects made it, but in the case of PHP I don't think
anybody would suggest that the whole repo system would be moved to one of these?
So the master repos would be at dvcs.php.net?
That being the case, with the correct modular structure, then people should be
able to simply clone to their preferred DVCS system, and commit changes back via
karma authentication? I have no doubt that work in progress would then be
appearing on bitbucket, github and private hosts, but everybody knows where the
master codebase can be found.
In hindsight, the SVN move probably took too long to agree on, and developments
in other areas were already at that time pressing on that process. But at that
time DVCS systems were definitely not ready for handling modular code bases like
PHP, and I would argue at the moment that this is still an area that is not
totally developed? Simply mirroring CVS to DVCS would probably have been a lot
easier process to manage, but we are now 'lumbered' with SVN, so can the SVN
experts offer any ideas to creating a path forward?
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php
I'd be happy to put people in touch with members of the team that
handled the Drupal git migration earlier this year. They successfully
migrated > 1 million lines of code with a 10 year history across a few
thousand repositories from CVS to Git without a hitch (at least no hitch
that the outside world saw). That sort of practical experience should
be helpful in determining some of the finer points of how you'd actually
DO that for a project like PHP.
--Larry Garfield
Another great example of a successful migration is the freedesktop project.
They have hundreds of subprojects, and each of them has modules.
However, I fear that Lester is mixing topics in all possible ways and
that's slightly confusing for outsiders.
Cheers,
Pierre
@pierrejoye | http://blog.thepimp.net | http://www.libgd.org
Pierre Joye wrote:
However, I fear that Lester is mixing topics in all possible ways and
that's slightly confusing for outsiders.
Explain .....
--
Lester Caine - G8HFL
Pierre Joye wrote:
Another great example of a successful migration is the freedesktop project.
They have hundreds of subprojects, and each of them has modules.
Can I suggest you have a closer look at this conversion Pierre and work out what
is currently missing from it ... hint - as a new developer how do I 'check out
(clone)' a copy of libraoffice.
--
Lester Caine - G8HFL
Lester Caine wrote:
Can I suggest you have a closer look at this conversion Pierre and work out
what is currently missing from it ... hint - as a new developer how do I
'check out (clone)' a copy of libraoffice.
By RTFM: http://www.libreoffice.org/get-involved/developers/
--
Pierre
Pierre Joye wrote:
By RTFM: http://www.libreoffice.org/get-involved/developers/
Exactly ...
While there were separate submodules for each section, as a result of converting
the modular structure of the original CVS code base, those are now marked as
obsolete ( http://cgit.freedesktop.org/libreoffice ) because submodules do not
work well; they have simply rolled the modules back into core ... a single
monolithic repo :(
--
Lester Caine - G8HFL
On Thu, Aug 18, 2011 at 9:32 AM, Lester Caine lester@lsces.co.uk wrote:
While there were separate submodules for each section, as a result of
converting the modular structure of the original CVS code base, those are
now marked as obsolete ( http://cgit.freedesktop.org/libreoffice ) because
submodules do not work well; they have simply rolled the modules back into
core ... a single monolithic repo :(
That's not the reason; it is down to the changes in how it was developed. See
gstreamer or xorg for other examples with multiple modules.
--
Pierre
Pierre Joye wrote:
That's not the reason; it is down to the changes in how it was developed.
Well I started a clone over an hour ago, and it's not 5% through yet ... many of
the sections it's currently downloading I do not need to bother about.
See gstreamer or xorg for other examples with multiple modules.
Again neither of those seem to be using 'superprojects', just the odd library
included as a submodule.
This was the problem I hit last year. A nice CVS repo got translated into some
200 git repos, but there is still no easy way of building a 'project' that
pulls, say, a core build with all the essential modules. You end up writing
scripts to pull each one and manage them individually. I would anticipate that
php is probably larger than the libraoffice codebase, so being able to properly
manage modular builds is going to be important?
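
As an illustration of the kind of per-module script described above, here is a
minimal sketch in Python. The module names and URLs are placeholders invented
for the example (they are not real PHP repositories), and the helper names
plan_commands and run_all are hypothetical. It only shows the shape of the
bookkeeping that falls to the developer when no 'superproject' support is
used: clone a module if it is missing, otherwise update it.

```python
import subprocess
from pathlib import Path
from typing import Dict, List

# Placeholder module list: names and URLs are assumptions for illustration only.
MODULES: Dict[str, str] = {
    "core": "https://dvcs.example.org/core.git",
    "ext-standard": "https://dvcs.example.org/ext-standard.git",
    "ext-interbase": "https://dvcs.example.org/ext-interbase.git",
}


def plan_commands(modules: Dict[str, str], workdir: Path,
                  wanted: List[str]) -> List[List[str]]:
    """Build the git command lines needed for the wanted modules.

    A module that already has a working copy under workdir is updated;
    a module that does not is cloned. Nothing is executed here.
    """
    commands: List[List[str]] = []
    for name in wanted:
        if name not in modules:
            raise KeyError(f"unknown module: {name}")
        target = workdir / name
        if (target / ".git").is_dir():
            # Existing clone: fast-forward it to the latest upstream state.
            commands.append(["git", "-C", str(target), "pull", "--ff-only"])
        else:
            # No clone yet: fetch this module only.
            commands.append(["git", "clone", modules[name], str(target)])
    return commands


def run_all(commands: List[List[str]]) -> None:
    """Execute the planned commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling plan_commands(MODULES, Path("work"), ["core", "ext-interbase"]) returns
the command list for just those two modules, and run_all() would then execute
it. Keeping the planning step separate from execution makes it possible to
inspect what would be pulled before anything touches the network.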
--
Lester Caine - G8HFL
Lester Caine wrote:
This was the problem I hit last year. A nice CVS repo got translated into
some 200 git repos, but there is still no easy way of building a 'project'
that pulls, say, a core build with all the essential modules. You end up
writing scripts to pull each one and manage them individually. I would
anticipate that php is probably larger than the libraoffice codebase, so
being able to properly manage modular builds is going to be important?
Again, that has nothing to do with git or hg; it comes down to the choices
of the developers. The discussion about having extensions (or whatever else)
in external modules or in core (no matter which system is used, svn,
git or hg) is not related to this topic.
ps: it's libreoffice.
Cheers,
Pierre
Again neither of those seem to be using 'superprojects', just the odd library
included as a submodule.
It seems that the statements in the RFC were not clear enough, so I'll add some explanation.
We will very probably not use submodules in php-src anyway. Other modules will define
what's best for them.
Please talk about the RFC and its content. If you have a certain problem with a certain
section in the RFC, I'm happy to discuss it. At the moment I cannot see what this topic
adds to the RFC discussion.
I will clarify some things anyway.
David Soria Parra wrote:
It seems that the statements in the RFC were not clear enough, so I'll add some explanation.
We will very probably not use submodules in php-src anyway. Other modules will define
what's best for them.
Just for the record ... the libreoffice clone I started 9 hours ago is still
going strong, and I'm estimating that it will finish some time tomorrow
afternoon, another 20 hours or so. A single huge repo is going to take time to
handle? Or am I doing something wrong? All I've done at the moment is followed
the instructions Pierre directed me to ... they were not on the original notes I
looked at.
Please talk about the RFC and its content. If you have a certain problem with a certain
section in the RFC, I'm happy to discuss it. At the moment I cannot see what this topic
adds to the RFC discussion.
I simply don't see the point of the rfc?
Starting with the drawbacks of SVN ... they were ignored when used as reasons
not to move from CVS. People had already decided that a move should take place,
and when DVCS was brought up it was not seen as a reason to hold fire, since
people wanted to plough on. Let's not plough on with another change without
fully understanding the problem?
Many of us are already using DVCS, and so the question is one of how we link
from our preferred DVCS system into whatever PHP does. git simply does not work
for me and so I am already committed to hg. Having broken things down into
subrepos in hg, I can now build projects just using the modules I need, and I
don't need to carry around lots of unused history. Trying to follow the trees of
some single repo projects on github and elsewhere can be almost impossible.
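
For readers unfamiliar with hg subrepos, the 'pick the modules I need' setup
mentioned above is driven by a small .hgsub file in the parent repository, with
one 'path = source' line per subrepo (a '[git]' prefix on the source marks a
git-hosted subrepo). The following is only a sketch: the paths and URLs are
made-up placeholders, and render_hgsub is a hypothetical helper, not part of
Mercurial.

```python
from typing import Dict


def render_hgsub(subrepos: Dict[str, str]) -> str:
    """Render .hgsub contents: one 'path = source' line per subrepo, sorted by path."""
    lines = [f"{path} = {source}" for path, source in sorted(subrepos.items())]
    return "\n".join(lines) + "\n"


# Placeholder selection of modules; paths and URLs are assumptions for illustration.
WANTED = {
    "ext/interbase": "https://dvcs.example.org/hg/ext-interbase",
    # The [git] prefix tells Mercurial that this subrepo is a git repository.
    "ext/pdo": "[git]https://dvcs.example.org/git/ext-pdo.git",
}

if __name__ == "__main__":
    print(render_hgsub(WANTED), end="")
```

Only the modules listed in the file are pulled when the parent repository is
cloned, which is what keeps unused code and history out of the working set.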
I will clarify some things anyway.
In 'Moving extension from/to core to/from pecl'
'Commits across multiple subrepositories will lead to separate commits.' When
making a change which affects several modules, the reason for the change
needs to be properly documented. This is why I say that we need proper
management of these changes. A good example in the past was the bugs introduced
into several of the database drivers by global changes. php_interbase and others
developed a problem with blob IDs being corrupted. Trying to track why and
where changes had been made was a problem, and just being able to isolate the
one driver and fix that was fun. Had the commits to each driver been separated,
repairing the damage would have been a lot easier? Changes that create patches
to hundreds of files in the one commit would seem to be a lot more of a problem
than distributing those commits, and logging the details of each module affected
in the base bug/feature report justifying a global change?
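
To make the idea of distributing a global change concrete, here is a minimal
sketch that groups the files touched by one change into per-module buckets, so
that each bucket could be committed and logged separately. It assumes an
'ext/<name>/...' layout for extensions and treats any other top-level directory
as a module of its own; the function name group_by_module and the sample file
names are invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def module_of(path: str) -> str:
    """Map a changed file to the module that owns it (assumed layout)."""
    parts = path.split("/")
    if len(parts) == 1:
        return "."                      # file in the repository root
    if parts[0] == "ext" and len(parts) >= 3:
        return f"ext/{parts[1]}"        # one module per extension
    return parts[0]                     # any other top-level directory


def group_by_module(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group changed paths by owning module, preserving input order per module."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        groups[module_of(path)].append(path)
    return dict(groups)
```

Each key in the result would then correspond to one commit whose message
refers back to the shared bug or feature report, so a regression in a single
driver can be traced and reverted without touching the others.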
Moving on from that ... probably 50% of the 'core modules' are essentially
optional, so being able to 'pick and choose' the modules you want makes perfect
sense, much as we all do when selecting modules in the build process. Why do you
need a combined history across an array of modules? Although I don't see
anything preventing one being created since all the information is available? I
flag changes that have a global effect, but the bulk of commits are always
within a single module. If they affect a second module it may well be that
something is wrong with the segregation of the activity and that needs to be
looked at?
--
Lester Caine - G8HFL
Lester Caine wrote:
Just for the record ... the libreoffice clone I started 9 hours ago is
still going strong, and I'm estimating that it will finish some time
tomorrow afternoon, another 20 hours or so. A single huge repo is going
to take time to handle? Or am I doing something wrong? All I've done at
the moment is followed the instructions Pierre directed me to ... they
were not on the original notes I looked at.
I've been away from home over the weekend stuck in an exhibition hall, so I left
the libreoffice clone via hggit running ... It's still running this morning 4
days later. I'll leave it to finish now. The straight git clone did run in under
an hour, but obviously the cross DVCS processes need a lot more work :( The
other git hosted projects I am working with work with a modular setup using
submodule, and this does not present the same problem, creating hg clones
reasonably quickly.
So is the current question one of better making the case for one over the other?
On the existing RFC there is little detail which covers the reasons that people
who are working cross platform DO have a problem with git, and why currently hg
is providing a clean transparent platform. Both will do the job that it seems is
being targeted, even if I disagree that it's the right target, but neither are
clear front runners? It would seem however that a hybrid system supporting both
is probably still some way off :(
--
Lester Caine - G8HFL
Lester Caine wrote:
I've been away from home over the weekend stuck in an exhibition hall,
so I left the libreoffice clone via hggit running ... It's still
running this morning 4 days later. I'll leave it to finish now. The
straight git clone did run in under an hour, but obviously the cross
DVCS processes need a lot more work :( The other git hosted projects I
am working with work with a modular setup using submodule, and this
does not present the same problem, creating hg clones reasonably quickly.
So is the current question one of better making the case for one over
the other? On the existing RFC there is little detail which covers the
reasons that people who are working cross platform DO have a problem
with git, and why currently hg is providing a clean transparent
platform. Both will do the job that it seems is being targeted, even
if I disagree that it's the right target, but neither are clear front
runners? It would seem however that a hybrid system supporting both is
probably still some way off :(
There's something weird in HG's network stack. I tried grabbing an
svn repository with hg, and it took over an hour to get part of the way
through. Bazaar was able to grab the whole repo in 2-3 minutes. I haven't
tried with git though.
David
David Muir wrote:
It would seem however that a hybrid system supporting both is
probably still some way off :(
There's something weird in HG's network stack. I tried grabbing an
svn repository with hg, and it took over an hour to get part of the way
through. Bazaar was able to grab the whole repo in 2-3 minutes. I haven't
tried with git though.
I'm not sure that it is necessarily the 'hg' side, but rather the extension that
is the bottleneck. All the core teams seem to concentrate on just what they want
to work well and the other bits, such as the GUI interfaces and cross platform
work gets left in the cold :(
My hg clone of just the PHP5.3 tree seems to have stopped working again, and yes
it took a while to pull just the subset of SVN I selected.
--
Lester Caine - G8HFL