docs/obs-implementation.rst
author Pierre-Yves.David@ens-lyon.org
Sat, 12 May 2012 00:12:18 +0200
changeset 246 1e8e32d3871c
parent 244 9d37254031fc
child 247 1a713fa2d3ba
doc: idea about OnDisk Storage


-----------------------------------------------------
Implementation of Obsolete Marker
-----------------------------------------------------
.. warning:: This document is still a heavy work in progress.

Main questions about Obsolete Marker Implementation
-----------------------------------------------------


What data should be contained in a marker?
````````````````````````````````````````````````````

Two critical pieces of data **must** be stored in an obsolete marker:

:object:
    the old obsoleted changeset

:replacements:
    list of new changesets. The list can have any size, including 0 (0..N).

Everybody agreed on this point.

 ---

It is probably a good idea to have a unique identifier, for UI, transfer, and
access.

    :id: same as a changeset node id, but for the marker.

The relevance of this field depends on the way we exchange obsolete markers
between repositories.

 ---

Having audit data will be very useful: when things get messy, you need all the
information you can get to understand the situation.

I have the feeling that we are versioning history. Therefore, we will probably
need the same kind of information as when versioning files.

:date: date of the marker creation

:user: ui.username

To go further:

:description: optional reason for the rewrite (generated or added by the user)

:tool: the automated tool that created the marker

:operation: kind of rewriting operation that created the marker (delete,
            update, split, fold, reordering), to help conflict resolution.

Matt sees this as "too complicated". I'll wait for him to run into a very hairy
situation before he agrees that they are needed.

Leaving the door open to any additional data is an option too.
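To make the field list above concrete, here is a minimal Python sketch of an in-memory marker. Everything in it is an assumption for illustration: the `ObsMarker` class, its field names (`obj` stands for `object`, which shadows a Python builtin), the `(timestamp, tz offset)` shape of the date, and the `extra` dictionary used to leave the door open to additional data. It is not an existing Mercurial API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ObsMarker:
    """Hypothetical in-memory view of one obsolete marker."""

    # Critical data: both fields are mandatory.
    obj: bytes                       # node of the obsoleted changeset
    replacements: Tuple[bytes, ...]  # 0..N successor nodes

    # Audit data, optional in this sketch.
    date: Optional[Tuple[float, int]] = None  # assumed (unix time, tz offset)
    user: Optional[str] = None                # e.g. what ui.username() returns
    description: Optional[str] = None
    tool: Optional[str] = None
    operation: Optional[str] = None           # delete, update, split, fold, ...

    # Hypothetical catch-all for any additional data.
    extra: Dict[str, str] = field(default_factory=dict)

    def is_prune(self) -> bool:
        """A marker without replacement records a plain deletion."""
        return len(self.replacements) == 0
```

Only `obj` and `replacements` are required to build a marker; every audit field can be filled in later or never.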

How shall we store markers on disk?
`````````````````````````````````````````````````````````

Requirements
.............

We need to quickly load the 'object' field of every marker to compute the
"obsolete" set. We need quick access by object and by replacement to travel
along the graph.
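Both requirements boil down to building a few indexes once the markers are loaded. A minimal sketch, assuming markers are plain `(object, replacements)` tuples; `build_indexes` is a hypothetical helper:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Marker = Tuple[bytes, Tuple[bytes, ...]]  # (object, replacements)


def build_indexes(markers: Iterable[Marker]):
    """Return (obsolete set, successors index, predecessors index)."""
    obsolete: Set[bytes] = set()
    successors: Dict[bytes, List[Marker]] = defaultdict(list)
    predecessors: Dict[bytes, List[Marker]] = defaultdict(list)
    for marker in markers:
        obj, replacements = marker
        obsolete.add(obj)               # only 'object' is needed for this set
        successors[obj].append(marker)  # object -> markers: walk forward
        for new in replacements:
            predecessors[new].append(marker)  # replacement -> markers: walk back
    return obsolete, dict(successors), dict(predecessors)
```

Keeping one index per direction is what makes walking the graph cheap both ways; the obsolete set itself only needs the `object` field.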

Common Part
.............

The file is stored in `.hg/store/obsmarkers`. It is a binary file.

We probably need a format version number somewhere.

I'll put it in `.hg/store/obsmarkers-version`.

Minimalistic proposal
.........................

The core of a marker will be stored as:

* number of replacements (8-byte integer)
* node id of the obsolete changeset (20-byte hash)
* node ids of the replacement changesets (20-byte hash x number of replacements)
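A minimal sketch of this layout with Python's `struct` module. The byte order and signedness are not specified above; this sketch assumes a big-endian (network order) unsigned count. `encode_marker` and `decode_markers` are hypothetical names.

```python
import struct
from typing import List, Sequence, Tuple

NODE_SIZE = 20
COUNT = struct.Struct(">Q")  # assumption: big-endian unsigned 8-byte integer


def encode_marker(obj: bytes, replacements: Sequence[bytes]) -> bytes:
    """Serialize one marker: count, obsolete node, replacement nodes."""
    for node in (obj, *replacements):
        if len(node) != NODE_SIZE:
            raise ValueError("node ids must be 20 bytes")
    return COUNT.pack(len(replacements)) + obj + b"".join(replacements)


def decode_markers(data: bytes) -> List[Tuple[bytes, Tuple[bytes, ...]]]:
    """Parse a whole obsmarkers file made of concatenated markers."""
    markers = []
    offset = 0
    while offset < len(data):
        if offset + COUNT.size + NODE_SIZE > len(data):
            raise ValueError("truncated marker header")
        (count,) = COUNT.unpack_from(data, offset)
        offset += COUNT.size
        obj = data[offset:offset + NODE_SIZE]
        offset += NODE_SIZE
        end = offset + count * NODE_SIZE
        if end > len(data):
            raise ValueError("truncated replacement list")
        replacements = tuple(
            data[i:i + NODE_SIZE] for i in range(offset, end, NODE_SIZE)
        )
        markers.append((obj, replacements))
        offset = end
    return markers
```

Since every record starts with its replacement count, a reader can skip from one marker to the next without a separate index.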

Version with ID
.........................

This version adds a node id computed from the marker content. It is stored
*before* the other data:

* node id of the marker (20-byte hash)
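A sketch of the id computation, assuming the id is the SHA-1 of the core data. SHA-1 is only an assumption here, picked because it yields 20 bytes like changeset node ids; `encode_core` and `encode_marker_with_id` are hypothetical helpers.

```python
import hashlib
import struct
from typing import Sequence


def encode_core(obj: bytes, replacements: Sequence[bytes]) -> bytes:
    # Minimalistic layout: 8-byte count, obsolete node, replacement nodes.
    return struct.pack(">Q", len(replacements)) + obj + b"".join(replacements)


def encode_marker_with_id(obj: bytes, replacements: Sequence[bytes]) -> bytes:
    """Prefix the core data with a 20-byte id derived from that data."""
    core = encode_core(obj, replacements)
    marker_id = hashlib.sha1(core).digest()  # assumed hash function
    return marker_id + core
```

Because the id depends only on the content, two repositories that create the same marker independently would get the same id, which should help deduplication during exchange. Hashing metadata such as the date would lose that property.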


Version with Metadata proposal
...............................

An extra file, `.hg/store/obs-extra`, is used to hold metadata (date, user,
etc.). The format of the metadata is not defined yet. This adds the following
field at the end of a marker:

* offset of the metadata in obs-extra (8-byte integer)
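Since the metadata format is not defined, the following sketch simply assumes one: each entry in `obs-extra` is a 4-byte big-endian length followed by a JSON object, and the marker stores the entry's starting offset as an 8-byte integer. The JSON encoding, the length prefix, and the helper names are all hypothetical.

```python
import json
import struct
from typing import Dict

LENGTH = struct.Struct(">I")  # assumption: 4-byte length prefix per entry


def append_metadata(extra: bytearray, meta: Dict[str, str]) -> int:
    """Append one entry to the obs-extra content and return its offset."""
    offset = len(extra)
    payload = json.dumps(meta, sort_keys=True).encode("utf-8")
    extra += LENGTH.pack(len(payload)) + payload  # in-place on the bytearray
    return offset


def read_metadata(extra: bytes, offset: int) -> Dict[str, str]:
    """Read back the entry a marker points to through its offset field."""
    (size,) = LENGTH.unpack_from(extra, offset)
    start = offset + LENGTH.size
    return json.loads(extra[start:start + size].decode("utf-8"))


def pack_offset(offset: int) -> bytes:
    # The 8-byte field appended at the end of each marker record.
    return struct.pack(">Q", offset)
```

Keeping metadata in a separate append-only file keeps the marker records fixed-size apart from the replacement list, so loading the obsolete set never has to read the metadata.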


How shall we exchange markers over the wire?
`````````````````````````````````````````````````````````

We can have a lot of markers, and we do not want to exchange data for the ones
the other side already knows. `listkeys` is not very appropriate here, as you
get everything.

Moreover, we might want to hear only about markers that affect the changesets
we are pulling.

`pushkey` is not batchable yet (this could be fixed).

A dedicated discovery and exchange protocol seems mandatory here.
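To illustrate what such a protocol would have to compute, here is a sketch of the selection on the sending side. It assumes discovery has already told us which marker ids the peer knows (`remote_known`) and which changesets are being pulled (`pulled`). The id scheme (SHA-1 of the minimalistic encoding) and every function name are assumptions; the discovery exchange itself is not modeled.

```python
import hashlib
import struct
from typing import Iterable, List, Set, Tuple

Marker = Tuple[bytes, Tuple[bytes, ...]]  # (object, replacements)


def marker_id(marker: Marker) -> bytes:
    # Hypothetical id: SHA-1 of the minimalistic on-disk encoding.
    obj, replacements = marker
    core = struct.pack(">Q", len(replacements)) + obj + b"".join(replacements)
    return hashlib.sha1(core).digest()


def markers_to_send(local: Iterable[Marker], remote_known: Set[bytes],
                    pulled: Set[bytes]) -> List[Marker]:
    """Pick markers the peer lacks that touch the changesets being pulled."""
    selected = []
    for marker in local:
        if marker_id(marker) in remote_known:
            continue  # the peer already has it: do not resend
        obj, replacements = marker
        if obj in pulled or pulled.intersection(replacements):
            selected.append(marker)
    return selected
```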


Various technical details
-----------------------------------------------------

Some things worth noting. Some may deserve their own section later.

Storing old changesets
``````````````````````````

The new generaldelta format allows very efficient storage of two very similar
changesets. Storing obsolete children using generaldelta takes no more space
than storing the obsolete diff. Reverted files will even be reused. The whole
operation will take much less space than a strip backup.


Abstraction from history rewriting UI
```````````````````````````````````````````

How Mercurial handles obsolete markers is independent of who decides to create
them and of which operation actually resolves error cases. Any of the existing
history rewriting UIs (rebase, mq, histedit) can lay obsolete markers and
resolve situations created by the others. Going further, a hook system for
obsolete marker creation would allow each mechanism to collaborate with the
others through a standard, central mechanism.
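A minimal sketch of such a central mechanism, assuming a simple in-process registry where tools create markers and listeners subscribe to their creation. `MarkerRegistry`, `add_hook`, and the callback signature are hypothetical; this is not Mercurial's hook API.

```python
from typing import Callable, List, Sequence

# Hypothetical callback signature: (object node, replacement nodes, tool name).
MarkerHook = Callable[[bytes, Sequence[bytes], str], None]


class MarkerRegistry:
    """Central place where every rewriting tool records obsolete markers."""

    def __init__(self) -> None:
        self.markers: List[tuple] = []
        self._hooks: List[MarkerHook] = []

    def add_hook(self, hook: MarkerHook) -> None:
        self._hooks.append(hook)

    def create(self, obj: bytes, replacements: Sequence[bytes], tool: str) -> None:
        # Whichever tool calls this (rebase, mq, histedit, ...), the marker
        # lands in the same store and every listener is notified.
        self.markers.append((obj, tuple(replacements), tool))
        for hook in self._hooks:
            hook(obj, replacements, tool)
```

With this shape, a tool that resolves troubles does not need to know which tool created the markers it reacts to.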


Obsolete marker storage
```````````````````````````

Obsolete markers will most likely be stored outside standard history. There are
multiple reasons for that:


First, obsolete markers are really orthogonal to standard history; there is no
strong reason to include them in it other than convenience.

Second, storing obsolete markers inside standard history means:


* A changeset must be created every time an obsolete relation is added. This
  is very inconvenient for delete operations.

* Obsolete markers must be forged at the creation of the new changeset. This
  is very inconvenient for split operations, and in general it becomes
  complicated to fix history afterward, in particular when working with older
  clients.

Storing obsolete markers outside history has several advantages:

* It eases the exchange of obsolete markers without sending unnecessary
  obsolete changeset content.

* It allows tuning the actual storage and exchange protocol while maintaining
  compatibility with older clients over the wire (as we do for the repository
  format).

* It eases the exchange of obsolescence-related information during discovery,
  in order to exchange the obsolete changesets relevant to conflict
  resolution. Exchanging such information deserves a dedicated protocol.

Persistence
```````````````````````

*Extinct* changesets and obsolete markers will most likely be garbage collected
at some point. However, archive servers may decide to keep them forever in
order to keep a fully auditable history in its finest conception.


Current status
-----------------------------------------------------

An experimental implementation exists. What has been done so far:


* 1-1 obsolete markers stored outside history,

* compute obsolete-tip

* obsolete marker exchange through pushkey,

* compute the obsolete, unstable, extinct and suspended sets,

* hide extinct changesets in the UI,

* use the secret phase to remove obsolete and unstable changesets from
  discovery (to be improved soon),

* alter rebase to use obsolete markers instead of stripping (XXX: breaks
  --keep for now),

* an experimental mq-like extension to rewrite history (more on that later),

* an extension to update an mq repository according to the evolution of the
  standard history (more on that later).
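For reference, here is a sketch of how the four sets could be derived from the obsolete set and the changeset graph. The definitions are assumptions on my part, matching how these terms are used in this work: *unstable* changesets are non-obsolete descendants of obsolete ones, *suspended* changesets are obsolete ones with an unstable descendant, and *extinct* changesets are obsolete ones without any non-obsolete descendant. The graph is a hypothetical `{node: [parents]}` mapping, and the helper names are illustrative.

```python
from typing import Dict, Iterable, List, Set


def descendants(children: Dict[str, List[str]], roots: Iterable[str]) -> Set[str]:
    """All strict descendants of `roots` (iterative depth-first walk)."""
    seen: Set[str] = set()
    stack = [c for r in roots for c in children.get(r, [])]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, []))
    return seen


def ancestors(parents: Dict[str, List[str]], heads: Iterable[str]) -> Set[str]:
    """All strict ancestors: the same walk, following parent edges."""
    return descendants(parents, heads)


def compute_sets(parents: Dict[str, List[str]], obsolete: Set[str]):
    children: Dict[str, List[str]] = {}
    for node, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(node)
    # unstable: not obsolete, but built on top of something obsolete
    unstable = descendants(children, obsolete) - obsolete
    # suspended: obsolete, yet still an ancestor of an unstable changeset
    suspended = ancestors(parents, unstable) & obsolete
    # extinct: obsolete with no non-obsolete descendant, safe to hide
    extinct = obsolete - suspended
    return unstable, suspended, extinct
```

Under these definitions, hiding extinct changesets never hides an ancestor of a visible changeset, which is why only the extinct set is hidden from the UI.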