server/hook.py
changeset 6147 95c604ec89bf
parent 6142 8bc6eac1fac1
child 6279 42079f752a9c
--- a/server/hook.py	Wed Aug 25 18:13:05 2010 +0200
+++ b/server/hook.py	Wed Aug 25 18:29:55 2010 +0200
@@ -15,37 +15,228 @@
 #
 # You should have received a copy of the GNU Lesser General Public License along
 # with CubicWeb.  If not, see <http://www.gnu.org/licenses/>.
-"""Hooks management
+"""
+Generalities
+------------
+
+Paraphrasing the `emacs`_ documentation, let us say that hooks are an important
+mechanism for customizing an application. A hook is basically a list of
+functions to be called on some well-defined occasion (this is called `running
+the hook`).
+
+.. _`emacs`: http://www.gnu.org/software/emacs/manual/html_node/emacs/Hooks.html
+
+Hooks
+~~~~~
+
+In |cubicweb|, hooks are subclasses of the :class:`~cubicweb.server.hook.Hook`
+class. They are selected over a set of pre-defined `events` (and possibly more
+conditions, hooks being selectable appobjects like views and components).  They
+should implement a :meth:`~cubicweb.server.hook.Hook.__call__` method that will
+be called when the hook is triggered.
 
-This module defined the `Hook` class and registry and a set of abstract classes
-for operations.
+There are two families of events: data events (before / after any individual
+update of an entity / or a relation in the repository) and server events (such
+as server startup or shutdown).  In a typical application, most of the hooks are
+defined over data events.
+
+Also, some :class:`~cubicweb.server.hook.Operation` may be registered by hooks,
+which will be fired when the transaction is commited or rollbacked.
+
+The purpose of data event hooks is usually to complement the data model as
+defined in the schema, which is static by nature and only provide a restricted
+builtin set of dynamic constraints, with dynamic or value driven behaviours.
+For instance they can serve the following purposes:
+
+* enforcing constraints that the static schema cannot express (spanning several
+  entities/relations, exotic value ranges and cardinalities, etc.)
+
+* implement computed attributes
+
+It is functionally equivalent to a `database trigger`_, except that database
+triggers definition languages are not standardized, hence not portable (for
+instance, PL/SQL works with Oracle and PostgreSQL but not SqlServer nor Sqlite).
+
+.. _`database trigger`: http://en.wikipedia.org/wiki/Database_trigger
 
 
-Hooks are called before / after any individual update of entities / relations
-in the repository and on special events such as server startup or shutdown.
+Operations
+~~~~~~~~~~
+
+Operations are subclasses of the :class:`~cubicweb.server.hook.Operation` class
+that may be created by hooks and scheduled to happen just before (or after) the
+`precommit`, `postcommit` or `rollback` event. Hooks are being fired immediately
+on data operations, and it is sometime necessary to delay the actual work down
+to a time where all other hooks have run. Also while the order of execution of
+hooks is data dependant (and thus hard to predict), it is possible to force an
+order on operations.
+
+Operations may be used to:
+
+* implements a validation check which needs that all relations be already set on
+  an entity
+
+* process various side effects associated with a transaction such as filesystem
+  udpates, mail notifications, etc.
 
 
-Operations may be registered by hooks during a transaction, which will  be
-fired when the pool is commited or rollbacked.
+Events
+------
+
+Hooks are mostly defined and used to handle `dataflow`_ operations. It
+means as data gets in (entities added, updated, relations set or
+unset), specific events are issued and the Hooks matching these events
+are called.
+
+You can get the event that triggered a hook by accessing its :attr:event
+attribute.
+
+.. _`dataflow`: http://en.wikipedia.org/wiki/Dataflow
 
 
-Entity hooks (eg before_add_entity, after_add_entity, before_update_entity,
-after_update_entity, before_delete_entity, after_delete_entity) all have an
-`entity` attribute
+Entity modification related events
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When called for one of these events, hook will have an `entity` attribute
+containing the entity instance.
+
+* 'before_add_entity', 'before_update_entity':
+
+  - on those events, you can check what attributes of the entity are modified in
+    `entity.cw_edited` (by definition the database is not yet updated in a before
+    event)
+
+  - you are allowed to further modify the entity before database operations,
+    using the dictionary notation. By doing this, you'll avoid the need for a
+    whole new rql query processing, the only difference is that the underlying
+    backend query (eg usually sql) will contains the additional data. For
+    example:
+
+    .. sourcecode:: python
+
+       self.entity.set_attributes(age=42)
+
+    will set the `age` attribute of the entity to 42. But to do so, it will
+    generate a rql query that will have to be processed, then trigger some
+    hooks, and so one (potentially leading to infinite hook loops or such
+    awkward situations..) You can avoid this by doing the modification that way:
+
+    .. sourcecode:: python
+
+       self.entity.cw_edited['age'] = 42
+
+    Here the attribute will simply be edited in the same query that the
+    one that triggered the hook.
 
-Relation (eg before_add_relation, after_add_relation, before_delete_relation,
-after_delete_relation) all have `eidfrom`, `rtype`, `eidto` attributes.
+    Similarly, removing an attribute from `cw_edited` will cancel its
+    modification.
+
+  - on 'before_update_entity' event, you can access to old and new values in
+    this hook, by using `entity.cw_edited.oldnewvalue(attr)`
+
+
+* 'after_add_entity', 'after_update_entity'
+
+  - on those events, you can still check what attributes of the entity are
+    modified in `entity.cw_edited` but you can't get anymore the old value, nor
+    modify it.
+
+* 'before_delete_entity', 'after_delete_entity'
+
+  - on those events, the entity has no `cw_edited` set.
+
+
+Relation modification related events
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When called for one of these events, hook will have `eidfrom`, `rtype`, `eidto`
+attributes containing respectivly the eid of the subject entity, the relation
+type and the eid of the object entity.
+
+* 'before_add_relation', 'before_delete_relation'
+
+  - on those events, you can still get original relation by issuing a rql query
+
+* 'after_add_relation', 'after_delete_relation'
+
+This is an occasion to remind us that relations support the add / delete
+operation, but no update.
+
+
+Non data events
+~~~~~~~~~~~~~~~
 
-Server start/maintenance/stop hooks (eg server_startup, server_maintenance,
-server_shutdown) have a `repo` attribute, but *their `_cw` attribute is None*.
-The `server_startup` is called on regular startup, while `server_maintenance`
-is called on cubicweb-ctl upgrade or shell commands. `server_shutdown` is
-called anyway.
+Hooks called on server start/maintenance/stop event (eg 'server_startup',
+'server_maintenance', 'server_shutdown') have a `repo` attribute, but *their
+`_cw` attribute is None*.  The `server_startup` is called on regular startup,
+while `server_maintenance` is called on cubicweb-ctl upgrade or shell
+commands. `server_shutdown` is called anyway.
+
+Hooks called on backup/restore event (eg 'server_backup', 'server_restore') have
+a `repo` and a `timestamp` attributes, but *their `_cw` attribute is None*.
+
+Hooks called on session event (eg 'session_open', 'session_close') have no
+special attribute.
+
+
+API
+---
+
+Hooks control
+~~~~~~~~~~~~~
+
+It is sometimes convenient to explicitly enable or disable some hooks. For
+instance if you want to disable some integrity checking hook.  This can be
+controlled more finely through the `category` class attribute, which is a string
+giving a category name.  One can then uses the
+:class:`~cubicweb.server.session.hooks_control` context manager to explicitly
+enable or disable some categories.
+
+.. autoclass:: cubicweb.server.session.hooks_control
+
+
+The existing categories are:
+
+* ``security``, security checking hooks
+
+* ``worfklow``, workflow handling hooks
 
-Backup/restore hooks (eg server_backup, server_restore) have a `repo` and a
-`timestamp` attributes, but *their `_cw` attribute is None*.
+* ``metadata``, hooks setting meta-data on newly created entities
+
+* ``notification``, email notification hooks
+
+* ``integrity``, data integrity checking hooks
+
+* ``activeintegrity``, data integrity consistency hooks, that you should *never*
+  want to disable
+
+* ``syncsession``, hooks synchronizing existing sessions
+
+* ``syncschema``, hooks synchronizing instance schema (including the physical database)
+
+* ``email``, email address handling hooks
+
+* ``bookmark``, bookmark entities handling hooks
+
 
-Session hooks (eg session_open, session_close) have no special attribute.
+Nothing precludes one to invent new categories and use the
+:class:`~cubicweb.server.session.hooks_control` context manager to filter them
+in or out.
+
+
+Hooks specific selector
+~~~~~~~~~~~~~~~~~~~~~~~
+.. autoclass:: cubicweb.server.hook.match_rtype
+.. autoclass:: cubicweb.server.hook.match_rtype_sets
+
+
+Hooks and operations classes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+.. autoclass:: cubicweb.server.hook.Hook
+.. autoclass:: cubicweb.server.hook.Operation
+.. autoclass:: cubicweb.server.hook.LateOperation
+.. autofunction:: cubicweb.server.hook.set_operation
+
 """
 
 from __future__ import with_statement
@@ -192,6 +383,29 @@
 # base class for hook ##########################################################
 
 class Hook(AppObject):
+    """Base class for hook.
+
+    Hooks being appobjects like views, they have a `__regid__` and a `__select__`
+    class attribute. Like all appobjects, hooks have the `self._cw` attribute which
+    represents the current session. In entity hooks, a `self.entity` attribute is
+    also present.
+
+    The `events` tuple is used by the base class selector to dispatch the hook
+    on the right events. It is possible to dispatch on multiple events at once
+    if needed (though take care as hook attribute may vary as described above).
+
+    .. Note::
+
+      Do not forget to extend the base class selectors as in ::
+
+      .. sourcecode:: python
+
+          class MyHook(Hook):
+            __regid__ = 'whatever'
+            __select__ = Hook.__select__ & implements('Person')
+
+      else your hooks will be called madly, whatever the event.
+    """
     __select__ = enabled_category()
     # set this in derivated classes
     events = None
@@ -353,40 +567,53 @@
 # abstract classes for operation ###############################################
 
 class Operation(object):
-    """an operation is triggered on connections pool events related to
+    """Base class for operations.
+
+    Operation may be instantiated in the hooks' `__call__` method. It always
+    takes a session object as first argument (accessible as `.session` from the
+    operation instance), and optionally all keyword arguments needed by the
+    operation. These keyword arguments will be accessible as attributes from the
+    operation instance.
+
+    An operation is triggered on connections pool events related to
     commit / rollback transations. Possible events are:
 
-    precommit:
-      the pool is preparing to commit. You shouldn't do anything which
-      has to be reverted if the commit fails at this point, but you can freely
-      do any heavy computation or raise an exception if the commit can't go.
-      You can add some new operations during this phase but their precommit
-      event won't be triggered
+    * 'precommit':
 
-    commit:
-      the pool is preparing to commit. You should avoid to do to expensive
-      stuff or something that may cause an exception in this event
+      the transaction is being prepared for commit. You can freely do any heavy
+      computation, raise an exception if the commit can't go. or even add some
+      new operations during this phase. If you do anything which has to be
+      reverted if the commit fails afterwards (eg altering the file system for
+      instance), you'll have to support the 'revertprecommit' event to revert
+      things by yourself
 
-    revertcommit:
-      if an operation failed while commited, this event is triggered for
-      all operations which had their commit event already to let them
-      revert things (including the operation which made fail the commit)
+    * 'revertprecommit':
+
+      if an operation failed while being pre-commited, this event is triggered
+      for all operations which had their 'precommit' event already fired to let
+      them revert things (including the operation which made the commit fail)
+
+    * 'rollback':
 
-    rollback:
       the transaction has been either rollbacked either:
+
        * intentionaly
-       * a precommit event failed, all operations are rollbacked
-       * a commit event failed, all operations which are not been triggered for
-         commit are rollbacked
+       * a 'precommit' event failed, in which case all operations are rollbacked
+         once 'revertprecommit'' has been called
+
+    * 'postcommit':
 
-    postcommit:
-      The transaction is over. All the ORM entities are
-      invalid. If you need to work on the database, you need to stard
-      a new transaction, for instance using a new internal_session,
-      which you will need to commit (and close!).
+      the transaction is over. All the ORM entities accessed by the earlier
+      transaction are invalid. If you need to work on the database, you need to
+      start a new transaction, for instance using a new internal session, which
+      you will need to commit (and close!).
 
-    order of operations may be important, and is controlled according to
-    the insert_index's method output
+    For an operation to support an event, one has to implement the `<event
+    name>_event` method with no arguments.
+
+    Notice order of operations may be important, and is controlled according to
+    the insert_index's method output (whose implementation vary according to the
+    base hook class used).
     """
 
     def __init__(self, session, **kwargs):
@@ -468,20 +695,55 @@
     {set: set.add, list: list.append}[container.__class__](container, value)
 
 def set_operation(session, datakey, value, opcls, containercls=set, **opkwargs):
-    """Search for session.transaction_data[`datakey`] (expected to be a set):
+    """Function to ease applying a single operation on a set of data, avoiding
+    to create as many as operation as they are individual modification. You
+    should try to use this instead of creating on operation for each `value`,
+    since handling operations becomes coslty on massive data import.
+
+    Arguments are:
+
+    * the `session` object
 
-    * if found, simply append `value`
+    * `datakey`, a specially forged key that will be used as key in
+      session.transaction_data
+
+    * `value` that is the actual payload of an individual operation
+
+    * `opcls`, the class of the operation. An instance is created on the first
+      call for the given key, and then subsequent calls will simply add the
+      payload to the container (hence `opkwargs` is only used on that first
+      call)
 
-    * else, initialize it to containercls([`value`]) and instantiate the given
-      `opcls` operation class with additional keyword arguments. `containercls`
-      is a set by default. Give `list` if you want to keep arrival ordering.
+    * `containercls`, the container class that should be instantiated to hold
+      payloads.  An instance is created on the first call for the given key, and
+      then subsequent calls will add the data to the existing container. Default
+      to a set. Give `list` if you want to keep arrival ordering.
+
+    * more optional parameters to give to the operation (here the rtype which do not
+      vary accross operations).
+
+    The body of the operation must then iterate over the values that have been mapped
+    in the transaction_data dictionary to the forged key, e.g.:
+
+    .. sourcecode:: python
 
-    You should use this instead of creating on operation for each `value`,
-    since handling operations becomes coslty on massive data import.
+           for value in self._cw.transaction_data.pop(datakey):
+               ...
+
+    .. Note::
+       **poping** the key from `transaction_data` is not an option, else you may
+       get unexpected data loss in some case of nested hooks.
     """
+
+
+
     try:
+        # Search for session.transaction_data[`datakey`] (expected to be a set):
+        # if found, simply append `value`
         _container_add(session.transaction_data[datakey], value)
     except KeyError:
+        # else, initialize it to containercls([`value`]) and instantiate the given
+        # `opcls` operation class with additional keyword arguments
         opcls(session, **opkwargs)
         session.transaction_data[datakey] = containercls()
         _container_add(session.transaction_data[datakey], value)