docs/obs-concept.rst
author Arne Babenhauserheide <bab@draketo.de>
Wed, 28 Mar 2012 11:35:53 +0200

==============================
 Why Do We Need a New Concept
==============================

Current DVCSs are great tools for forging a series of flawless changesets on your
own. But they perform poorly when it comes to **sharing** work in progress and
**collaborating** on such work in progress.

When people forge a new version of a changeset, they create a new changeset and
get rid of the original one. Difficulties in collaborating mostly come from the
way old content is *removed* from the repository.

Mercurial Approach: Strip
=========================

With the current version of Mercurial, every changeset that exists in your
repository is *visible* and *meaningful*. To get rid of the old changesets you
have rewritten, Mercurial removes them from the repository storage with an
operation called *strip*. After the *strip*, the repository looks as if the
changeset never existed.

This approach is simple and effective but has a very big drawback: you can
remove changesets from **your repository only**. If a stripped changeset exists
in other repositories, it will show up again and again. The only cure for this
is to strip the offending changeset from all repositories, an operation at best
impractical and in most cases impossible!


As a consequence, **you cannot rewrite something once you have exchanged it with
others**. The old version will still exist alongside the new one [#]_.

Moreover stripping changesets creates backup bundles. This allows
restoration of the deleted changesets, but the process is painful.

Finally, as the repository format is not optimized for deletion, stripping a
changeset may be slow in some situations.

To sum up, the strip approach is very simple but does not handle interaction
with the outer world, which is unfortunate for a *Distributed* VCS.

.. [#] Various workarounds exist, but they require their own workflows, which are distinct from the very elegant basic workflow of Mercurial.

Git Approach: Overwrite Reference
=================================

Git's approach to repositories is a bit more complex: any number of
changesets can exist in a repository, but **only changesets referenced by a git
branch** are *visible* and *meaningful*.


.. warning:: add a schema::

        C
        | B---<foo>
        |/
        |
        A

    Only B and A are visible.

This simplifies the process of getting rid of old changesets. You can
just leave them in place and move the reference to the new one. You
can then propagate that change by moving the git-branch on the remote host,
with the newer version of the marker overwriting the older one.

This approach goes a bit further but still has a major drawback:


Because you **overwrite** the git-branch, you have no conflict resolution. The last
to act wins. This makes collaboration on multiple changesets difficult because
you can't merge concurrent updates on a changeset.

Every overwrite is a forced operation where the operator says "Yes, I want this
to replace that." In highly distributed environments, a user may end up with
conflicting references and no proper way to choose.

Because visibility is defined through references, git-branches are a core
part of Git, which makes the user interface more complicated and
constrains the ways to move through history.

Finally, even if all older changesets still exist in the repository, access to
them is still painful.


=============================
 The Obsolete Marker Concept
=============================





As none of these concepts was powerful enough to fulfill the need of safely
rewriting history, including easy sharing and collaborating on mutable history,
we needed another one.



Basic concept
=============


Every history rewriting operation stores the information that the old rewritten
changeset is replaced by a newer version made of a given set of changesets.

All basic history rewriting operations can create an appropriate obsolete marker.


.. figure:: ./figures/example-1-update.*

    *Updating* a changeset

    Create one obsolete marker: ``([A'] obsolete A)``



.. figure:: ./figures/example-2-split.*

    *Splitting* a changeset into multiple ones

    Create one obsolete marker: ``([B1, B2] obsolete B)``


.. figure:: ./figures/example-3-merge.*

    *Merging* multiple changesets into a single one

    Create two obsolete markers: ``([C] obsolete A), ([C] obsolete B)``

.. figure:: ./figures/example-4-reorder.*

    *Moving* changesets around

    Reordering those two changesets needs two obsolete markers:
    ``([A'] obsolete A), ([B'] obsolete B)``



.. figure:: ./figures/example-5-delete.*

    *Removing* a changeset

    One obsolete marker: ``([] obsolete B)``


To conclude, a single obsolete marker expresses a relation from **0..n** new
changesets to **1** old changeset.
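
The relation can be sketched as a small data structure. This is only an
illustration: the ``ObsoleteMarker`` type and the ``obsoleted`` helper are
hypothetical names chosen for this example, not Mercurial's actual storage
format or API.

```python
from typing import NamedTuple, Tuple


class ObsoleteMarker(NamedTuple):
    """One marker: 0..n new changesets replace exactly 1 old changeset."""

    new: Tuple[str, ...]  # replacement changesets (empty for a removal)
    old: str              # the changeset being made obsolete


# The five operations from the figures above, written as lists of markers.
update = [ObsoleteMarker(("A'",), "A")]
split = [ObsoleteMarker(("B1", "B2"), "B")]
merge = [ObsoleteMarker(("C",), "A"), ObsoleteMarker(("C",), "B")]
reorder = [ObsoleteMarker(("A'",), "A"), ObsoleteMarker(("B'",), "B")]
delete = [ObsoleteMarker((), "B")]


def obsoleted(markers):
    """Return the set of changesets made obsolete by the given markers."""
    return {marker.old for marker in markers}
```

Note that a split needs a single marker with two new changesets, while a merge
needs two markers that share the same new changeset.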

Basic Usage
===========

Obsolete markers create a perpendicular history: **a versioned changeset graph**.
This means it offers the same features we have for versioned files, but applied
to changesets:

First, we can display a **coherent view** of the history graph in which only a
single version of each changeset is displayed by the UI.

Second, because obsolete changeset content is still **available**, you can:

    * **browse** the content of your obsolete commit,

    * **compare** newer and older versions of a changeset,

    * **restore** the content of a previously obsolete changeset.

Finally, obsolete markers can be **exchanged between repositories**. You are
able to share the result of your history rewriting operations with others and
**collaborate on the mutable part of the history**.

Conflicting history rewriting operations can be detected and **resolved** as
easily as conflicting changes to a file.


Detecting and solving tricky situations
=======================================

History rewriting can lead to complex situations. Obsolete markers introduce a
simple representation of this complex reality. But people using complex workflows
will one day or another have to face the intrinsic complexity of some
situations.

This section describes possible situations, defines precise sets of changesets
involved in such situations and explains how error cases can automatically be
resolved using available information.


obsolete changesets
-------------------

Old changesets left behind by history rewriting operations are called
**obsolete**.

With the current version of Mercurial, this *obsolete* part is stripped from the
repository before the end of every rewriting operation.

.. figure:: ./figures/error-obsolete.*

    Rebasing `B` and `C` on `A` (as `B'`, `C'`)

    This rebase operation added two obsolete markers from new changesets to old
    changesets. These two old changesets are now part of the *obsolete* part of the
    history.

In most cases, the obsolete set will be fully hidden from both the UI and
discovery, so users do not have to care about it unless they want to audit the
history rewriting operation.

Unstable changesets
-------------------

While exploring the possibilities of obsolete markers a bit further, you may end
up with *obsolete* changesets which have *non-obsolete* children. There are two
common ways to achieve this:

* Pull a changeset based on an old version of a changeset [#]_.

* Use a partial rewriting operation. For example, amend a changeset that has
  children.

*Non-obsolete* changesets based on *obsolete* ones are called **unstable**.

.. figure:: ./figures/error-unstable.*

    Amend `A` into `A'` leaving `B` behind.

    In this situation we cannot consider `B` as *obsolete*. But we have all
    necessary data to detect `B` as an *unstable* branch of the history because
    its parent `A` is *obsolete*. In addition, we have enough data to
    automatically resolve this instability: we know that the new version of the
    parent of `B` (`A`) is `A'`, so we can deduce that we should rebase `B` on
    `A'` to get a stable history again.

Proper warnings should be issued when part of the history becomes unstable. The
UI will be able to use the obsolete markers to automatically suggest resolutions
to users or even carry them out for them.
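
The detection and the suggested rebase can be sketched as follows. This is a
minimal illustration under stated assumptions, not Mercurial's implementation:
the graph is a plain dictionary mapping each changeset to its parents, markers
are ``(new_changesets, old_changeset)`` pairs, "based on" is taken to mean "has
an obsolete ancestor", and the function names are hypothetical. Only the simple
case where the obsolete parent has exactly one new version is handled.

```python
def find_unstable(parents, markers):
    """Return the non-obsolete changesets that have an obsolete ancestor."""
    obsolete = {old for _new, old in markers}

    def has_obsolete_ancestor(node):
        stack = list(parents.get(node, []))
        seen = set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            if current in obsolete:
                return True
            stack.extend(parents.get(current, []))
        return False

    return {
        node
        for node in parents
        if node not in obsolete and has_obsolete_ancestor(node)
    }


def suggest_rebase(node, parents, markers):
    """Suggest (changeset, destination) pairs for each obsolete parent.

    A parent is only considered when it has exactly one new version.
    """
    successors = {old: new for new, old in markers}
    suggestions = []
    for parent in parents.get(node, []):
        new = successors.get(parent)
        if new is not None and len(new) == 1:
            suggestions.append((node, new[0]))
    return suggestions


# The situation from the figure: A is amended into A', B is left behind.
parents = {"root": [], "A": ["root"], "A'": ["root"], "B": ["A"]}
markers = [(("A'",), "A")]
print(find_unstable(parents, markers))        # {'B'}
print(suggest_rebase("B", parents, markers))  # [('B', "A'")]
```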


XXX details on automatic resolution for

* movement

* handling deletion

* handling split on multiple head


.. [#] For this to happen one needs to explicitly enable exchange of draft
       changesets. See phase help for details.

The two parts of the obsolete set
---------------------------------

The previous section shows that there are two kinds of *obsolete* changesets:


* an *obsolete* changeset with no descendants, or with only *obsolete*
  descendants, is called **extinct**.

* an *obsolete* changeset with *unstable* descendants is called **suspended**.


.. figure:: ./figures/error-extinct.*

    Amend `A` and `C` leaving `B` behind.

    In this example we have two *obsolete* changesets: `C`, with no *unstable*
    children, is *extinct*. `A`, with an *unstable* descendant (`B`), is
    *suspended*. `B` is *unstable* as before.


Because nothing outside the obsolete set depends on *extinct* changesets, they
can be safely hidden in the UI and even garbage collected. *Suspended* changesets
have to stay visible and available until their unstable descendants are
rewritten into stable versions.
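
Splitting the obsolete set into its two parts can be sketched like this. It is
an illustration only: the graph representation (a dictionary of parents), the
``(new_changesets, old_changeset)`` marker pairs, and the ``classify_obsolete``
name are assumptions made for this example. The graph layout is also
hypothetical, since the figure only states which changesets were amended.

```python
def classify_obsolete(parents, markers):
    """Split the obsolete set into (extinct, suspended) changesets."""
    obsolete = {old for _new, old in markers}

    # Invert the parent relation so we can walk towards descendants.
    children = {}
    for node, node_parents in parents.items():
        for parent in node_parents:
            children.setdefault(parent, []).append(node)

    def has_non_obsolete_descendant(node):
        stack = list(children.get(node, []))
        seen = set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            if current not in obsolete:
                return True
            stack.extend(children.get(current, []))
        return False

    suspended = {n for n in obsolete if has_non_obsolete_descendant(n)}
    extinct = obsolete - suspended
    return extinct, suspended


# Hypothetical layout: B and C are both children of A; A and C get amended.
parents = {
    "root": [],
    "A": ["root"],
    "B": ["A"],
    "C": ["A"],
    "A'": ["root"],
    "C'": ["A'"],
}
markers = [(("A'",), "A"), (("C'",), "C")]
extinct, suspended = classify_obsolete(parents, markers)
print(extinct)    # {'C'}
print(suspended)  # {'A'}
```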


Conflicting rewrites
---------------------

If people start to concurrently edit the same part of the history, they will
likely run into conflicts where a changeset has been rewritten in two
different ways.


.. figure:: ./figures/error-conflicting.*

    Conflicting rewrite of `A` into `A'` and `A''`

This kind of conflict is easy to detect with obsolete markers, because an
obsolete changeset then has more than one new version. It may be seen as the
multiple heads case which Mercurial warns you about on pull. It is resolved the
same way: by a merge of `A'` and `A''` that keeps the same parent as `A'` and
`A''`, with two obsolete markers pointing to both `A'` and `A''`.

.. warning::  TODO: Add a schema of the resolution. (merge A' and A'' with A as
              ancestor and graft the result of A^)

Allowing a single marker to list multiple new changesets makes it possible to
differentiate split changesets (one marker with several new changesets) from
history rewriting conflicts (several markers for the same old changeset).
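
A minimal sketch of that distinction, assuming markers are stored as
``(new_changesets, old_changeset)`` pairs. The ``find_conflicts`` helper is a
hypothetical name for this illustration, not an existing Mercurial function.

```python
from collections import defaultdict


def find_conflicts(markers):
    """Map each old changeset to its competing rewrites, if it has several.

    A split is a single marker with several new changesets, so it is not
    reported. A conflict is several distinct markers for the same old
    changeset.
    """
    rewrites = defaultdict(set)
    for new, old in markers:
        rewrites[old].add(tuple(new))
    return {
        old: sorted(versions)
        for old, versions in rewrites.items()
        if len(versions) > 1
    }


markers = [
    (("A'",), "A"),       # first rewrite of A
    (("A''",), "A"),      # concurrent rewrite of A: a conflict
    (("B1", "B2"), "B"),  # split of B: a single marker, no conflict
]
print(find_conflicts(markers))  # {'A': [("A'",), ("A''",)]}
```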

Reliable history
----------------

Obsolete markers help to smooth the rewriting process. However, they
do not change the fact that **you should only rewrite the mutable part of the
history**. The phase concept enforces this rule by explicitly defining a
public immutable set of changesets. Rewriting operations refuse to work on
public changesets, but there are still some corner cases where previously
rewritten changesets are made public.

Special rules apply to obsolete markers pointing to public changesets:

* Public changesets are excluded from the obsolete set (public changesets are
  never hidden or candidates for garbage collection).

* *Newer* versions of public changesets are called **latecomers** and
  highlighted as an error case.


Solving such an error is easy. Because we know which changeset a *latecomer*
tries to rewrite, we can easily compute a smaller changeset containing only the
change from the old *public* changeset to the new *latecomer*.
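
The idea can be illustrated with a toy model in which a changeset is reduced to
a full snapshot of its files (a dictionary mapping paths to content). This
model and the ``remaining_change`` name are assumptions made for the example;
real changesets carry more than file content.

```python
def remaining_change(public_files, latecomer_files):
    """Return the file changes that turn the public snapshot into the latecomer's.

    The result maps each differing path to its new content. A value of
    ``None`` means the file is removed.
    """
    change = {}
    for path in set(public_files) | set(latecomer_files):
        old = public_files.get(path)
        new = latecomer_files.get(path)
        if old != new:
            change[path] = new
    return change


public = {"a.txt": "1", "b.txt": "x"}
latecomer = {"a.txt": "2", "b.txt": "x", "c.txt": "new"}
print(remaining_change(public, latecomer))
```

Here the result contains only ``a.txt`` (modified) and ``c.txt`` (added):
applying it on top of the public changeset gives the latecomer's content without
rewriting the public changeset itself.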


.. warning:: add a schema


Conclusion
==========

The obsolete marker is a powerful concept that allows Mercurial to safely handle
history rewriting operations. It is a new type of relation between Mercurial
changesets which tracks the result of history rewriting operations.

This concept is simple to define and provides a very solid base for:


- very fast history rewriting operations,

- auditable and reversible history rewriting process,

- clean final history,

- sharing and collaborating on the mutable part of the history,

- gracefully handling history rewriting conflicts,

- various history rewriting UIs collaborating with an underlying common API.

.. list-table:: Comparison of solutions [#]_
   :header-rows: 1

   * - Solution
     - Remove changeset locally
     - Works on any point of your history
     - Propagation
     - Collaboration
     - Speed
     - Access to older version

   * - Strip
     - `+`
     - `+`
     - \
     - \ 
     - \ 
     - `- -`

   * - Reference
     - `+`
     - \ 
     - `+`
     - \ 
     - `+`
     - `-`

   * - Obsolete
     - `+`
     - `+`
     - `++`
     - `++`
     - `+`
     - `+`



.. [#] To preserve the good tradition of comparison tables, an overwhelming
       advantage goes to the defended solution.