docs/obs-concept.rst
author Pierre-Yves David <pierre-yves.david@logilab.fr>
Wed, 28 Mar 2012 11:07:53 +0200

-----------------------------------------------------------
Why Do We Need a New Concept
-----------------------------------------------------------

Current DVCS are great tools to forge a series of flawless changesets on your own.
But they perform poorly when it comes to **sharing** some work in progress and
**collaborating** on such work in progress.

When people forge a new version of a changeset they actually create a
new changeset and get rid of the original changeset. Difficulties in
collaborating mostly come from the way old content is *removed* from
a repository.

Mercurial Approach: Strip
-----------------------------------------------------

With the current version of Mercurial, every changeset that exists in
your repository is *visible* and *meaningful*. To delete old
(rewritten) changesets, Mercurial removes them from the repository
storage with an operation called *strip*. After the *stripping*, the
repository looks as if the changeset never existed.

This approach is simple and effective except there is a very big
drawback: you can remove changesets from **your repository only**. If
a stripped changeset exists in another repository that yours exchanges
with, it will show up again. This is because a shared changeset becomes
part of a shared global history. Stripping a changeset from all
repositories is at best impractical and in most cases impossible!

As a consequence, **you cannot rewrite something once you exchange it with
others**. The old version will still exist alongside the new one [#]_.

However, backups are created while stripping a changeset in most
cases. This allows restoration of an old changeset, but the process is
painful.

Finally, as the repository format is not optimized for deletion,
stripping a changeset may be slow in some situations.

To sum up, the strip approach is very simple but does not handle
interaction with the outer world, which is very unfortunate for a
*Distributed* VCS.

.. [#] Various workarounds exist but they are workarounds with their own flaws.

Git Approach: Overwrite Reference
-----------------------------------------------------

The Git approach to repository structure is a bit more complex: there
can be any number of unrelated changesets in a repository, and **only
changesets referenced by a git branch** are *visible* and
*meaningful*.


.. warning:: add a schema::

        C
        | B---<foo>
        |/
        |
        A

    Only B and A are visible.

This eases the process of getting rid of old changesets. You can just
leave them in place and move the reference to the new one. You can
then propagate those changes by moving the git-branch on remote hosts,
the newer versions overwriting the older ones.

This approach goes a bit further but still has a major drawback:

Because you **overwrite** a git-branch you have no conflict resolution. The last
to speak wins. This makes collaboration on multiple changesets difficult because
you can't merge concurrent updates on a changeset.

Every overwrite is a forced operation where the operator says "Yes I
want this to replace that". In a highly distributed environment, a user may
end up with conflicting references and with no proper way to choose.

Because of this way to visualize a repository, git-branches are a very
core part of git. This makes the user interface more complicated and
moving through history more constrained.

Finally, even if all older changesets still exist in the repository, accessing them
is still painful.


-----------------------------------------------------
The Obsolete Marker Concept
-----------------------------------------------------


As none of these concepts was powerful enough to meet the need to
safely share rewritten history, we needed another one.

Basic concept
-----------------------------------------------------


Every history rewriting operation stores the information that an old rewritten
changeset has a newer version available in a set of changesets.

All basic history rewriting operations can create an appropriate obsolete marker.


.. figure:: ./figures/example-1-update.*

    *Updating* a changeset

    Create one obsolete marker: ``([A'] obsolete A)``



.. figure:: ./figures/example-2-split.*

    *Splitting* a changeset into multiple ones

    Create one obsolete marker ``([B1, B2] obsolete B)``


.. figure:: ./figures/example-3-merge.*

    *Merging* multiple changesets into a single one

    Create two obsolete markers ``([C] obsolete A), ([C] obsolete B)``

.. figure:: ./figures/example-4-reorder.*

    *Moving* changesets around

    Reordering those two changesets needs two obsolete markers:
    ``([A'] obsolete A), ([B'] obsolete B)``



.. figure:: ./figures/example-5-delete.*

    *Removing* a changeset:

    One obsolete marker ``([] obsolete B)``


To conclude, a single obsolete marker expresses a relation from **0..n** new
changesets to **1** old changeset.
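
The five cases above can be written down as plain data. The following is a
minimal sketch in Python, not Mercurial's actual storage format: the ``Marker``
type, its field names and the ``obsolete_set`` helper are assumptions made for
illustration only.

.. code-block:: python

    from typing import NamedTuple, Tuple


    class Marker(NamedTuple):
        """Hypothetical record: ``new`` changesets obsolete one ``old`` one."""
        new: Tuple[str, ...]  # 0..n newer versions
        old: str              # exactly one old changeset


    # One list of markers per rewriting operation shown in the figures.
    update = [Marker(("A'",), "A")]
    split = [Marker(("B1", "B2"), "B")]
    merge = [Marker(("C",), "A"), Marker(("C",), "B")]
    reorder = [Marker(("A'",), "A"), Marker(("B'",), "B")]
    delete = [Marker((), "B")]


    def obsolete_set(markers):
        """Return every changeset that appears as the old side of a marker."""
        return {marker.old for marker in markers}

Note that a split needs a single marker with several new changesets, while a
merge needs one marker per old changeset.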

Basic Usage
-----------------------------------------------------

Obsolete markers create a perpendicular history: **a versioned version of the
changeset graph**. This means that we can have the same features we have for
versioned files but applied to changesets:

First, we can display a **coherent view** of the history graph with only a
single version of your changeset displayed by the UI.

Second, because obsolete changeset contents are still **available**,
you can

    * **browse** the contents of your obsolete commits,

    * **compare** newer and older versions of a changeset,

    * **restore** contents of previously obsolete changesets.

Finally, obsolete markers can be **exchanged between
repositories**. You are able to share the result of your history
rewriting operations with other people and **collaborate on the
mutable part of the history**.

Conflicting history rewriting operations can be detected and
**resolved** as easily as conflicting changes on a file.


Detecting and solving tricky situations
-----------------------------------------------------

History rewriting can lead to complex situations. The obsolete marker
introduces a simple representation for this complex reality. But
people using complex workflows will one day or another have to face
the intrinsic complexity of some real-world situations.

This section describes possible situations, defines precise sets of
changesets involved in such situations and explains how the error
cases can be automatically resolved using available information.


Obsolete changesets
````````````````````

Old changesets left behind by history rewriting operations are said to be
**obsolete**.

With the current version of Mercurial, this *obsolete* part is
stripped from the repository before the end of every rewriting
operation.

.. figure:: ./figures/error-obsolete.*

    Rebasing `B` and `C` on `A` (as `B'`, `C'`)

    This rebase operation added two obsolete markers from new changesets to old
    changesets. These two old changesets are now part of the *obsolete* part of the
    history.

In most cases the obsolete set will be fully hidden from both the UI and
discovery, hence users do not have to care about it unless they want to
audit history rewriting operations.

Unstable changesets
```````````````````

While exploring the obsolete marker possibility a bit further you may
end up with an *obsolete* changeset with *non-obsolete* children. There
are two common ways to achieve this:

* Pull a changeset based on an old version of a changeset [#]_.

* Use a partial rewriting operation. For example amend on a changeset with
  children.

*Non-obsolete* changesets based on *obsolete* ones are said to be **unstable**.

.. figure:: ./figures/error-unstable.*

    Amend `A` into `A'` leaving `B` behind.

    In this situation we cannot consider `B` as *obsolete*.  But we have all
    necessary data to detect `B` as an *unstable* branch of the history because
    its parent `A` is *obsolete*. In addition, we have enough data to
    automatically resolve this instability: we know that the new version of `B`
    parent (`A`) is `A'`. We can deduce that we should rebase `B` on `A'` to get
    a stable history again.

Proper warnings should be issued when part of the history becomes
unstable. The UI will be able to use the obsolete markers to
automatically suggest a resolution to the user or even carry it out
for them.
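
The detection and the suggested resolution described above can be sketched in a
few lines of Python. This is only an illustration under stated assumptions: the
graph is a plain ``parents`` dictionary, markers are ``(new, old)`` tuples,
"based on" is read as "has an obsolete ancestor", and the function names
``unstable_set`` and ``suggest_rebase`` are hypothetical, not part of Mercurial.

.. code-block:: python

    def unstable_set(parents, obsolete):
        """Return non-obsolete changesets that have an obsolete ancestor."""
        def has_obsolete_ancestor(node):
            seen, stack = set(), list(parents.get(node, ()))
            while stack:
                parent = stack.pop()
                if parent in obsolete:
                    return True
                if parent not in seen:
                    seen.add(parent)
                    stack.extend(parents.get(parent, ()))
            return False

        return {node for node in parents
                if node not in obsolete and has_obsolete_ancestor(node)}


    def suggest_rebase(node, parents, markers):
        """Return a rebase destination for ``node``, or None if unclear."""
        for parent in parents.get(node, ()):
            newer = [new for new, old in markers if old == parent]
            # Only the simple case: one marker with exactly one new version.
            if len(newer) == 1 and len(newer[0]) == 1:
                return newer[0][0]
        return None


    # Amend A into A', leaving B behind (same shape as the figure above).
    parents = {"root": [], "A": ["root"], "A'": ["root"], "B": ["A"]}
    markers = [(("A'",), "A")]
    obsolete = {old for _new, old in markers}

Here ``unstable_set(parents, obsolete)`` contains only ``B``, and
``suggest_rebase("B", parents, markers)`` proposes ``A'``. Splits, deletions
and conflicting rewrites are deliberately left unresolved by this sketch.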


XXX details automatic resolution for

* movement

* handling deletion

* handling split on multiple head


.. [#] For this to happen one needs to explicitly enable exchange of draft
       changesets. See phase help for details.

The two parts of the obsolete set
``````````````````````````````````````

The previous section shows that there can be two kinds of *obsolete*
changesets:

* *obsolete* changesets with no descendants or only *obsolete* descendants,
  said to be **extinct**.

* *obsolete* changesets with *unstable* descendants, said to be **suspended**.


.. figure:: ./figures/error-extinct.*

    Amend `A` and `C` leaving `B` behind.

    In this example we have two *obsolete* changesets: `C`, with no *unstable*
    children, is *extinct*. `A`, with an *unstable* descendant (`B`), is
    *suspended*. `B` is *unstable* as before.


Because nothing outside the obsolete set depends on *extinct* changesets, they
can be safely hidden in the UI and even garbage collected. *Suspended* changesets
have to stay visible and available until their unstable descendants are rewritten
into stable versions.
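
The split of the obsolete set into its two parts can be sketched as follows.
This is a minimal Python illustration, assuming the graph is given as a plain
``parents`` dictionary; the function name ``classify_obsolete`` and the sample
graph are hypothetical and only loosely modeled on the figure above.

.. code-block:: python

    def classify_obsolete(parents, obsolete):
        """Split the obsolete set into (extinct, suspended) changesets."""
        # Invert the parent relation so that descendants can be walked.
        children = {}
        for node, node_parents in parents.items():
            for parent in node_parents:
                children.setdefault(parent, []).append(node)

        def descendants(node):
            seen, stack = set(), list(children.get(node, ()))
            while stack:
                child = stack.pop()
                if child not in seen:
                    seen.add(child)
                    stack.extend(children.get(child, ()))
            return seen

        extinct, suspended = set(), set()
        for node in obsolete:
            # Extinct: no descendants at all, or only obsolete ones.
            if descendants(node) <= obsolete:
                extinct.add(node)
            else:
                suspended.add(node)
        return extinct, suspended


    # A and C were amended (into A' and C'); B was left behind on top of A.
    parents = {"root": [], "A": ["root"], "A'": ["root"],
               "B": ["A"], "C": ["B"], "C'": ["B"]}
    extinct, suspended = classify_obsolete(parents, {"A", "C"})

With this graph, ``extinct`` is ``{"C"}`` and ``suspended`` is ``{"A"}``.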


Conflicting rewriting
``````````````````````

If people start to concurrently edit the same part of the history they will
likely meet conflicting situations where a changeset has been rewritten into two
different versions.


.. figure:: ./figures/error-conflicting.*

    Conflicting rewriting of `A` into `A'` and `A''`

This kind of conflict is easy to detect with obsolete markers
because an obsolete changeset can have more than one new version. It
may be seen as the multiple heads case. Mercurial warns you about this
on pull. It is resolved the same way, by a merge of `A'` and `A''` that
keeps the same parent as `A'` and `A''`, with two obsolete
markers pointing to both `A'` and `A''`.

.. warning::  TODO: Add a schema of the resolution. (merge A' and A'' with A as
              ancestor and graft the result of A^)

Allowing multiple new changesets to obsolete a single one makes it possible to
distinguish a split changeset from a history rewriting conflict.
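
A minimal Python sketch of this distinction, assuming markers are stored as
``(new, old)`` tuples (the representation and the function name
``conflicting_rewrites`` are hypothetical): a split is one marker with several
new changesets, while a conflict is several distinct markers for the same old
changeset.

.. code-block:: python

    from collections import defaultdict


    def conflicting_rewrites(markers):
        """Map each old changeset to its competing sets of new versions."""
        versions_by_old = defaultdict(set)
        for new, old in markers:
            # A set ignores the same marker received twice (e.g. after a pull).
            versions_by_old[old].add(tuple(new))
        return {old: versions
                for old, versions in versions_by_old.items()
                if len(versions) > 1}


    markers = [
        (("A'",), "A"),        # first rewriting of A
        (("A''",), "A"),       # concurrent rewriting of A: a conflict
        (("B1", "B2"), "B"),   # split of B: a single marker, no conflict
    ]
    conflicts = conflicting_rewrites(markers)

Only ``A`` is reported here; the split of ``B`` is not mistaken for a conflict.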

Reliable history
``````````````````````

Obsolete markers really help to smooth the rewriting process. However they
do not change the fact that **you should only rewrite the mutable part of the
history**. The phase concept enforces this rule by explicitly defining a
public immutable set of changesets. Rewriting operations refuse to work on
public changesets, but there are still some corner cases where changesets
rewritten in the past are made public.

Special rules apply to obsolete markers pointing to public changesets:

* Public changesets are excluded from the obsolete set (public
  changesets are never hidden or candidates for garbage collection)

* *newer* versions of a public changeset are said to be **latecomers** and
  highlighted as an error case.

Solving such an error is easy. Because we know what changeset a
*latecomer* tries to rewrite, we can easily compute a smaller
changeset containing only the change from the old *public* to the new
*latecomer*.
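
The two special rules for public changesets can be sketched as follows. This is
a minimal Python illustration under assumptions: markers are ``(new, old)``
tuples, ``public`` is a plain set of changesets, and the function name
``obsolete_and_latecomers`` is hypothetical. The text does not say whether a
latecomer must itself be non-public, so the sketch does not check it.

.. code-block:: python

    def obsolete_and_latecomers(markers, public):
        """Return (obsolete set, latecomers) honoring public changesets."""
        obsolete, latecomers = set(), set()
        for new, old in markers:
            if old in public:
                # A public changeset is never obsolete; its newer
                # versions are flagged as latecomers instead.
                latecomers.update(new)
            else:
                obsolete.add(old)
        return obsolete, latecomers


    markers = [(("A'",), "A"), (("B'",), "B")]
    obsolete, latecomers = obsolete_and_latecomers(markers, public={"A"})

Here ``A`` stays out of the obsolete set because it is public, ``A'`` is
reported as a latecomer, and ``B`` is obsolete as usual.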

.. warning:: add a schema


Conclusion
----------------

The obsolete marker is a powerful concept that allows Mercurial to safely handle
history rewriting operations. It is a new type of relation between Mercurial
changesets that tracks the result of history rewriting operations.

This concept is simple to define and provides a very solid base for:


- very fast history rewriting operations,

- an auditable and reversible history rewriting process,

- a clean final history,

- sharing and collaboration on the mutable part of the history,

- graceful handling of history rewriting conflicts,

- collaboration between various history rewriting UIs through an underlying common API.

.. list-table:: Comparison of solutions [#]_
   :header-rows: 1

   * - Solution
     - Remove changeset locally
     - Works on any point of your history
     - Propagation
     - Collaboration
     - Speed
     - Access to older version

   * - Strip
     - `+`
     - `+`
     - \
     - \ 
     - \ 
     - `- -`

   * - Reference
     - `+`
     - \ 
     - `+`
     - \ 
     - `+`
     - `-`

   * - Obsolete
     - `+`
     - `+`
     - `++`
     - `++`
     - `+`
     - `+`



.. [#] To preserve good tradition in comparison table, an overwhelming advantage
       goes to the defended solution.