stablesort: add some documentation for stablesort
This should help people to understand what is going on here.
--- a/hgext3rd/evolve/stablesort.py Tue Sep 03 12:48:47 2019 +0200
+++ b/hgext3rd/evolve/stablesort.py Sun Sep 08 11:56:11 2019 +0200
@@ -7,6 +7,78 @@
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.
+"""Stable sorting for the mercurial graph
+
+The goal is to provided an efficient, revnum independant way, to sort revisions
+in a topologicaly. Having it independant from revnum is important to make it
+stable from one repository to another, unlocking various capabilities. For
+example it can be used for discovery purposes.
+
+This docstring describe the currently preferred solution:
+
+Basic principle:
+----------------
+
+We are always talking about set of revision defined by a single heads
+(eg: `stablesort(::r)`)
+
+For non merge revisions, the definition is simple::
+
+ stablesort(::r) == stablesort(p1(r)) + r
+
+For merge revision, we reuse as much as possible of the parents order:
+
+ pl = stablemin(parents(m))
+ ph = stablemax(parents(m))
+ stablesort(::m) == stablesort(pl)
+ + [i for in in stablesort(ph) if in ph % pl]
+ + m
+
+The `ph % pl` set of revision is called the "exclusive part". In this area we
+try to reuse as much as the stable-sorted order for `ph`. In simple case, the
+`[i for i in stablesort(ph) if i in ph % pl]` is just the contiguous final range of
+`stablesort(ph)`. However in more advance case, this will not be contiguous and
+we'll need to skip over multiple parts of `stablesort(ph)` to cover `ph % pl`.
+
+Another important details is that, in practice, the sorted revision are always
+walked backward, from the head of the set of revisions.
+
+preexisting cached data
+-----------------------
+
+The stable sort assume we already have 2 important property cached for each
+changesets:
+
+1) changeset depth == len(::r)
+2) first merge == max(merge() and ::r)
+
+Caching strategy
+----------------
+
+Since we always walk from the head, the iteration mostly have to follow the
+unique parent of non merge revision. For merge revision, we need to iterate
+through one of the parent before coming back to the other parent eventually.
+
+To effiently cache the path we need to walk, we records "jumps". A jump is a
+point where the next revision will not be a parent of the current revision, but
+another point in the graph. This correspond to point were we need to "come back
+to the other parent".
+
+Jumps are recorded using the following formats:
+
+ (jump-point, jump-destination, section-size)
+
+* jump-point is the last revision number we should iterate over before jumping,
+* jump-destination is the next revision we should iterate over after the jump point,
+* section-size is the number of revision to be iterated before reaching jump-point.
+
+the section-size is not directly used when doing a stable-sorted walk. However
+it is useful for higher level piece of code to take decision without having to
+actually walk the graph, (see stable range documentation).
+
+For each merge, we store the set of jumps that cover the exclusive side.
+"""
+
import array
import collections
import struct