# copyright 2003-2013 LOGILAB S.A. (Paris, FRANCE), all rights reserved.
# contact http://www.logilab.fr/ -- mailto:contact@logilab.fr
#
# This file is part of CubicWeb.
#
# CubicWeb is free software: you can redistribute it and/or modify it under the
# terms of the GNU Lesser General Public License as published by the Free
# Software Foundation, either version 2.1 of the License, or (at your option)
# any later version.
#
# CubicWeb is distributed in the hope that it will be useful, but WITHOUT
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
# FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
# details.
#
# You should have received a copy of the GNU Lesser General Public License along
# with CubicWeb. If not, see <http://www.gnu.org/licenses/>.
"""
Generalities
------------
Paraphrasing the `emacs`_ documentation, let us say that hooks are an important
mechanism for customizing an application. A hook is basically a list of
functions to be called on some well-defined occasion (this is called `running
the hook`).
.. _`emacs`: http://www.gnu.org/software/emacs/manual/html_node/emacs/Hooks.html
Hooks
~~~~~
In |cubicweb|, hooks are subclasses of the :class:`~cubicweb.server.hook.Hook`
class. They are selected over a set of pre-defined `events` (and possibly more
conditions, hooks being selectable appobjects like views and components). They
should implement a :meth:`~cubicweb.server.hook.Hook.__call__` method that will
be called when the hook is triggered.
There are two families of events: data events (before / after any individual
update of an entity / or a relation in the repository) and server events (such
as server startup or shutdown). In a typical application, most of the hooks are
defined over data events.
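For instance, here is a minimal sketch of a data event hook (the `Person` entity
type and the logged message are illustrative, not something shipped with
|cubicweb|):
.. sourcecode:: python
    from cubicweb.predicates import is_instance
    from cubicweb.server.hook import Hook
    class PersonAddedHook(Hook):
        __regid__ = 'myapp.person_added'
        __select__ = Hook.__select__ & is_instance('Person')
        events = ('after_add_entity',)
        def __call__(self):
            # self.entity is the entity that has just been added
            self.info('person #%s added', self.entity.eid)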
Also, some :class:`~cubicweb.server.hook.Operation` may be registered by hooks,
which will be fired when the transaction is committed or rolled back.
The purpose of data event hooks is usually to complement the data model as
defined in the schema, which is static by nature and only provides a restricted
built-in set of dynamic constraints, with dynamic or value-driven behaviours.
For instance they can serve the following purposes:
* enforcing constraints that the static schema cannot express (spanning several
entities/relations, exotic value ranges and cardinalities, etc.)
* implementing computed attributes
This is functionally equivalent to a `database trigger`_, except that database
trigger definition languages are not standardized, hence not portable (for
instance, PL/SQL works with Oracle and PostgreSQL, but not with SQL Server or SQLite).
.. _`database trigger`: http://en.wikipedia.org/wiki/Database_trigger
.. hint::
It is a good practice to write unit tests for each hook. See an example in
:ref:`hook_test`
Operations
~~~~~~~~~~
Operations are subclasses of the :class:`~cubicweb.server.hook.Operation` class
that may be created by hooks and scheduled to happen on `precommit`,
`postcommit` or `rollback` event (i.e. respectively before/after a commit or
before a rollback of a transaction).
Hooks are fired immediately on data operations, and it is sometimes necessary
to delay the actual work down to a time when we can expect all the needed
information to be there, or when all other hooks have run (though take care,
since operations may themselves trigger hooks). Also, while the order of
execution of hooks is data dependent (and thus hard to predict), it is possible
to force an order on operations.
So, in such cases where you may miss some information that will only be set
later in the transaction, you should instantiate an operation in the hook, as
sketched after the list below.
Operations may be used to:
* implement a validation check which requires that all relations be already set
on an entity
* process various side effects associated with a transaction such as filesystem
updates, mail notifications, etc.
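Here is a minimal sketch of the pattern (the `Person` entity type, its `name`
attribute and the check itself are illustrative):
.. sourcecode:: python
    from cubicweb import ValidationError
    from cubicweb.predicates import is_instance
    from cubicweb.server.hook import Hook, Operation
    class CheckPersonOp(Operation):
        # instantiated with an `entity` keyword argument, available as self.entity
        def precommit_event(self):
            if not self.entity.name:
                raise ValidationError(self.entity.eid,
                                      {'name': 'a person must have a name'})
    class PersonNameCheckHook(Hook):
        __regid__ = 'myapp.check_person'
        __select__ = Hook.__select__ & is_instance('Person')
        events = ('after_add_entity',)
        def __call__(self):
            # delay the check until commit time, when all attributes and
            # relations set in the transaction are known
            CheckPersonOp(self._cw, entity=self.entity)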
Events
------
Hooks are mostly defined and used to handle `dataflow`_ operations. This means
that as data gets in (entities added, updated, relations set or
unset), specific events are issued and the Hooks matching these events
are called.
You can get the event that triggered a hook by accessing its `event`
attribute.
.. _`dataflow`: http://en.wikipedia.org/wiki/Dataflow
Entity modification related events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When called for one of these events, the hook will have an `entity` attribute
containing the entity instance.
- `before_add_entity`, `before_update_entity`:
On those events, you can access the modified attributes of the entity using
the `entity.cw_edited` dictionary. The values can be modified and the old
values can be retrieved.
If you modify the `entity.cw_edited` dictionary in the hook, that is, before
the database operations take place, you avoid the need to process a whole new
rql query and the underlying backend query (usually sql) will contain
the modified data. For example:
.. sourcecode:: python
self.entity.cw_edited['age'] = 42
will modify the age before it is written to the backend storage.
Similarly, removing an attribute from `cw_edited` will cancel its
modification:
.. sourcecode:: python
del self.entity.cw_edited['age']
On a `before_update_entity` event, you can access the old and new values:
.. sourcecode:: python
old, new = entity.cw_edited.oldnewvalue('age')
- `after_add_entity`, `after_update_entity`
On those events, you can get the list of attributes that were modified using
the `entity.cw_edited` dictionary, but you cannot modify it or get the old
value of an attribute.
- `before_delete_entity`, `after_delete_entity`
On those events, the entity has no `cw_edited` dictionary.
.. note:: `self.entity.cw_set(age=42)` will set the `age` attribute to
42. But to do so, it will generate a rql query that will have to be processed,
hence may trigger some hooks, etc. This could lead to infinitely looping hooks.
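Putting this together, here is a sketch of a hook filling in a default value at
creation time (the `Person` entity type and its `nickname` / `name` attributes
are illustrative):
.. sourcecode:: python
    class SetNicknameHook(Hook):
        __regid__ = 'myapp.set_nickname'
        __select__ = Hook.__select__ & is_instance('Person')
        events = ('before_add_entity',)
        def __call__(self):
            edited = self.entity.cw_edited
            if not edited.get('nickname'):
                # changing cw_edited here avoids an extra rql query: the value
                # goes into the insert query itself
                edited['nickname'] = edited.get('name', u'anonymous')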
Relation modification related events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When called for one of these events, the hook will have `eidfrom`, `rtype`, `eidto`
attributes containing respectively the eid of the subject entity, the relation
type and the eid of the object entity.
* `before_add_relation`, `before_delete_relation`
On those events, you can still get the original relation by issuing a rql query.
* `after_add_relation`, `after_delete_relation`
Specific selectors are shipped for these kinds of events, see in particular
:class:`~cubicweb.server.hook.match_rtype`.
Also note that relations can be added or deleted, but not updated.
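For instance, a sketch of a hook touching the subject entity when a relation is
added (the `done_in` relation is illustrative):
.. sourcecode:: python
    from cubicweb.server.hook import Hook, match_rtype
    class DoneInAddedHook(Hook):
        __regid__ = 'myapp.done_in_added'
        __select__ = Hook.__select__ & match_rtype('done_in')
        events = ('after_add_relation',)
        def __call__(self):
            # self.eidfrom / self.eidto hold the subject / object eids
            self._cw.execute('SET X modification_date NOW WHERE X eid %(x)s',
                             {'x': self.eidfrom})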
Non data events
~~~~~~~~~~~~~~~
Hooks called on server start/maintenance/stop events (e.g.
`server_startup`, `server_maintenance`, `before_server_shutdown`,
`server_shutdown`) have a `repo` attribute, but *their `_cw` attribute
is None*. The `server_startup` hook is called on regular startup, while
`server_maintenance` is called on cubicweb-ctl upgrade or shell
commands. `server_shutdown` is called anyway, but connections to the
native source are impossible; `before_server_shutdown` handles that.
Hooks called on backup/restore events (e.g. `server_backup`,
`server_restore`) have `repo` and `timestamp` attributes, but
*their `_cw` attribute is None*.
Hooks called on session events (e.g. `session_open`, `session_close`) have no
special attribute.
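For instance, a sketch of a startup hook registering a periodic task (the
`my_cleanup_function` callable and the one-hour period are illustrative):
.. sourcecode:: python
    class StartupHook(Hook):
        __regid__ = 'myapp.startup'
        events = ('server_startup',)
        def __call__(self):
            # no session here: only self.repo is available
            self.repo.looping_task(60 * 60, my_cleanup_function, self.repo)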
API
---
Hooks control
~~~~~~~~~~~~~
It is sometimes convenient to explicitly enable or disable some hooks, for
instance to disable some integrity checking hooks. This can be controlled
finely through the `category` class attribute, which is a string giving a
category name. One can then use the
:meth:`~cubicweb.server.session.Session.deny_all_hooks_but` and
:meth:`~cubicweb.server.session.Session.allow_all_hooks_but` context managers to
explicitly enable or disable some categories (see the example after the
category list below).
The existing categories are:
* ``security``, security checking hooks
* ``worfklow``, workflow handling hooks
* ``metadata``, hooks setting meta-data on newly created entities
* ``notification``, email notification hooks
* ``integrity``, data integrity checking hooks
* ``activeintegrity``, data integrity consistency hooks, that you should **never**
want to disable
* ``syncsession``, hooks synchronizing existing sessions
* ``syncschema``, hooks synchronizing instance schema (including the physical database)
* ``email``, email address handling hooks
* ``bookmark``, bookmark entities handling hooks
Nothing precludes one from inventing new categories and using the existing
mechanisms to filter them in or out.
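For instance, a minimal sketch of a bulk import with most hooks disabled
(assuming `session` is an open repository session; the `Person` entity type and
the `names_to_import` iterable are illustrative):
.. sourcecode:: python
    with session.deny_all_hooks_but('metadata', 'activeintegrity'):
        for name in names_to_import:
            session.create_entity('Person', name=name)
    session.commit()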
Hooks specific predicates
~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: cubicweb.server.hook.match_rtype
.. autoclass:: cubicweb.server.hook.match_rtype_sets
Hooks and operations classes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: cubicweb.server.hook.Hook
.. autoclass:: cubicweb.server.hook.Operation
.. autoclass:: cubicweb.server.hook.LateOperation
.. autoclass:: cubicweb.server.hook.DataOperationMixIn
"""
__docformat__ = "restructuredtext en"
from warnings import warn
from logging import getLogger
from itertools import chain
from logilab.common.decorators import classproperty, cached
from logilab.common.deprecation import deprecated, class_renamed
from logilab.common.logging_ext import set_log_methods
from logilab.common.registry import (NotPredicate, OrPredicate,
objectify_predicate)
from cubicweb import RegistryNotFound, server
from cubicweb.cwvreg import CWRegistry, CWRegistryStore
from cubicweb.predicates import ExpectedValuePredicate, is_instance
from cubicweb.appobject import AppObject
ENTITIES_HOOKS = set(('before_add_entity', 'after_add_entity',
'before_update_entity', 'after_update_entity',
'before_delete_entity', 'after_delete_entity'))
RELATIONS_HOOKS = set(('before_add_relation', 'after_add_relation' ,
'before_delete_relation','after_delete_relation'))
SYSTEM_HOOKS = set(('server_backup', 'server_restore',
'server_startup', 'server_maintenance',
'server_shutdown', 'before_server_shutdown',
'session_open', 'session_close'))
ALL_HOOKS = ENTITIES_HOOKS | RELATIONS_HOOKS | SYSTEM_HOOKS
def _iter_kwargs(entities, eids_from_to, kwargs):
if not entities and not eids_from_to:
yield kwargs
elif entities:
for entity in entities:
kwargs['entity'] = entity
yield kwargs
else:
for subject, object in eids_from_to:
kwargs.update({'eidfrom': subject, 'eidto': object})
yield kwargs
class HooksRegistry(CWRegistry):
def register(self, obj, **kwargs):
obj.check_events()
super(HooksRegistry, self).register(obj, **kwargs)
def call_hooks(self, event, session=None, **kwargs):
"""call `event` hooks for an entity or a list of entities (passed
respectively as the `entity` or ``entities`` keyword argument).
"""
kwargs['event'] = event
if session is None: # True for events such as server_startup
for hook in sorted(self.possible_objects(session, **kwargs),
key=lambda x: x.order):
hook()
else:
if 'entities' in kwargs:
assert 'entity' not in kwargs, \
'can\'t pass "entities" and "entity" arguments simultaneously'
assert 'eids_from_to' not in kwargs, \
'can\'t pass "entities" and "eids_from_to" arguments simultaneously'
entities = kwargs.pop('entities')
eids_from_to = []
elif 'eids_from_to' in kwargs:
entities = []
eids_from_to = kwargs.pop('eids_from_to')
else:
entities = []
eids_from_to = []
pruned = self.get_pruned_hooks(session, event,
entities, eids_from_to, kwargs)
# by default, hooks are executed with security turned off
with session.security_enabled(read=False):
for _kwargs in _iter_kwargs(entities, eids_from_to, kwargs):
hooks = sorted(self.filtered_possible_objects(pruned, session, **_kwargs),
key=lambda x: x.order)
debug = server.DEBUG & server.DBG_HOOKS
with session.security_enabled(write=False):
for hook in hooks:
if debug:
print event, _kwargs, hook
hook()
def get_pruned_hooks(self, session, event, entities, eids_from_to, kwargs):
"""return a set of hooks that should not be considered by filtered_possible objects
the idea is to make a first pass over all the hooks in the
registry and to mark put some of them in a pruned list. The
pruned hooks are the one which:
* are disabled at the session level
* have a selector containing a :class:`match_rtype` or an
:class:`is_instance` predicate which does not match the rtype / etype
of the relations / entities for which we are calling the hooks. This
works because the repository calls the hooks grouped by rtype or by
etype when using the entities or eids_from_to keyword arguments
Only hooks with a simple predicate or an AndPredicate of simple
predicates are considered for disabling.
"""
if 'entity' in kwargs:
entities = [kwargs['entity']]
if len(entities):
look_for_selector = is_instance
etype = entities[0].__regid__
elif 'rtype' in kwargs:
look_for_selector = match_rtype
etype = None
else: # nothing to prune, how did we get there ???
return set()
cache_key = (event, kwargs.get('rtype'), etype)
pruned = session.pruned_hooks_cache.get(cache_key)
if pruned is not None:
return pruned
pruned = set()
session.pruned_hooks_cache[cache_key] = pruned
if look_for_selector is not None:
for id, hooks in self.iteritems():
for hook in hooks:
enabled_cat, main_filter = hook.filterable_selectors()
if enabled_cat is not None:
if not enabled_cat(hook, session):
pruned.add(hook)
continue
if main_filter is not None:
if isinstance(main_filter, match_rtype) and \
(main_filter.frometypes is not None or \
main_filter.toetypes is not None):
continue
first_kwargs = _iter_kwargs(entities, eids_from_to, kwargs).next()
if not main_filter(hook, session, **first_kwargs):
pruned.add(hook)
return pruned
def filtered_possible_objects(self, pruned, *args, **kwargs):
for appobjects in self.itervalues():
if pruned:
filtered_objects = [obj for obj in appobjects if obj not in pruned]
if not filtered_objects:
continue
else:
filtered_objects = appobjects
obj = self._select_best(filtered_objects,
*args, **kwargs)
if obj is None:
continue
yield obj
class HooksManager(object):
def __init__(self, vreg):
self.vreg = vreg
def call_hooks(self, event, session=None, **kwargs):
try:
registry = self.vreg['%s_hooks' % event]
except RegistryNotFound:
return # no hooks for this event
registry.call_hooks(event, session, **kwargs)
for event in ALL_HOOKS:
CWRegistryStore.REGISTRY_FACTORY['%s_hooks' % event] = HooksRegistry
@deprecated('[3.10] use entity.cw_edited.oldnewvalue(attr)')
def entity_oldnewvalue(entity, attr):
return entity.cw_edited.oldnewvalue(attr)
# some hook specific predicates #################################################
@objectify_predicate
def enabled_category(cls, req, **kwargs):
if req is None:
return True # XXX how to deactivate server startup / shutdown event
return req.is_hook_activated(cls)
@objectify_predicate
def from_dbapi_query(cls, req, **kwargs):
if req.running_dbapi_query:
return 1
return 0
class rechain(object):
def __init__(self, *iterators):
self.iterators = iterators
def __iter__(self):
return iter(chain(*self.iterators))
class match_rtype(ExpectedValuePredicate):
"""accept if the relation type is found in expected ones. Optional
named parameters `frometypes` and `toetypes` can be used to restrict
target subject and/or object entity types of the relation.
:param \*expected: possible relation types
:param frometypes: candidate entity types as subject of relation
:param toetypes: candidate entity types as object of relation
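For instance (the relation and entity type names are illustrative):
.. sourcecode:: python
    class VersionOfCardHook(Hook):
        __regid__ = 'myapp.version_of_card'
        __select__ = Hook.__select__ & match_rtype('version_of',
                                                   toetypes=('Card',))
        events = ('after_add_relation',)
        def __call__(self):
            # only called when a version_of relation targeting a Card is added
            pass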
"""
def __init__(self, *expected, **more):
self.expected = expected
self.frometypes = more.pop('frometypes', None)
self.toetypes = more.pop('toetypes', None)
assert not more, "unexpected kwargs in match_rtype: %s" % more
def __call__(self, cls, req, *args, **kwargs):
if kwargs.get('rtype') not in self.expected:
return 0
if self.frometypes is not None and \
req.entity_metas(kwargs['eidfrom'])['type'] not in self.frometypes:
return 0
if self.toetypes is not None and \
req.entity_metas(kwargs['eidto'])['type'] not in self.toetypes:
return 0
return 1
class match_rtype_sets(ExpectedValuePredicate):
"""accept if the relation type is in one of the sets given as initializer
argument. The goal of this predicate is that it keeps reference to original sets,
so modification to thoses sets are considered by the predicate. For instance
.. sourcecode:: python
MYSET = set()
class Hook1(Hook):
__regid__ = 'hook1'
__select__ = Hook.__select__ & match_rtype_sets(MYSET)
...
class Hook2(Hook):
__regid__ = 'hook2'
__select__ = Hook.__select__ & match_rtype_sets(MYSET)
Client code can now change `MYSET`; this will change the selection criteria
of :class:`Hook1` and :class:`Hook2`.
"""
def __init__(self, *expected):
self.expected = expected
def __call__(self, cls, req, *args, **kwargs):
for rel_set in self.expected:
if kwargs.get('rtype') in rel_set:
return 1
return 0
# base class for hook ##########################################################
class Hook(AppObject):
"""Base class for hook.
Hooks being appobjects like views, they have a `__regid__` and a `__select__`
class attribute. Like all appobjects, hooks have the `self._cw` attribute which
represents the current session. In entity hooks, a `self.entity` attribute is
also present.
The `events` tuple is used by the base class selector to dispatch the hook
on the right events. It is possible to dispatch on multiple events at once
if needed (though take care, as hook attributes may vary as described above).
.. Note::
Do not forget to extend the base class selectors as in:
.. sourcecode:: python
class MyHook(Hook):
__regid__ = 'whatever'
__select__ = Hook.__select__ & is_instance('Person')
else your hooks will be called wildly, whatever the event.
"""
__select__ = enabled_category()
# set this in derived classes
events = None
category = None
order = 0
# stop pylint from complaining about missing attributes in Hooks classes
eidfrom = eidto = entity = rtype = repo = None
@classmethod
@cached
def filterable_selectors(cls):
search = cls.__select__.search_selector
if search((NotPredicate, OrPredicate)):
return None, None
enabled_cat = search(enabled_category)
main_filter = search((is_instance, match_rtype))
return enabled_cat, main_filter
@classmethod
def check_events(cls):
try:
for event in cls.events:
if event not in ALL_HOOKS:
raise Exception('bad event %s on %s.%s' % (
event, cls.__module__, cls.__name__))
except AttributeError:
raise
except TypeError:
raise Exception('bad .events attribute %s on %s.%s' % (
cls.events, cls.__module__, cls.__name__))
@classmethod
def __registered__(cls, reg):
cls.check_events()
@classproperty
def __registries__(cls):
if cls.events is None:
return []
return ['%s_hooks' % ev for ev in cls.events]
known_args = set(('entity', 'rtype', 'eidfrom', 'eidto', 'repo', 'timestamp'))
def __init__(self, req, event, **kwargs):
for arg in self.known_args:
if arg in kwargs:
setattr(self, arg, kwargs.pop(arg))
super(Hook, self).__init__(req, **kwargs)
self.event = event
set_log_methods(Hook, getLogger('cubicweb.hook'))
# abstract hooks for relation propagation ######################################
# See example usage in hooks of the nosylist cube
class PropagateRelationHook(Hook):
"""propagate some `main_rtype` relation on entities linked as object of
`subject_relations` or as subject of `object_relations` (the watched
relations).
This hook ensures that when one of the watched relations is added, the
`main_rtype` relation is added to the target entity of the relation.
Notice there is no default behaviour defined when a watched relation is
deleted; you'll have to handle this yourself.
You usually want to use the :class:`match_rtype_sets` predicate on concrete
classes.
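A sketch of a concrete class (the relation names and the shared set are
illustrative; see the nosylist cube for a real-world example):
.. sourcecode:: python
    WATCHED_RELATIONS = set(('concerns', 'done_in'))
    class PropagateInterestedInHook(PropagateRelationHook):
        __regid__ = 'myapp.propagate_interested_in'
        __select__ = (PropagateRelationHook.__select__
                      & match_rtype_sets(WATCHED_RELATIONS))
        main_rtype = 'interested_in'
        subject_relations = ('concerns',)
        object_relations = ('done_in',)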
"""
events = ('after_add_relation',)
# to set in concrete class
main_rtype = None
subject_relations = None
object_relations = None
def __call__(self):
assert self.main_rtype
for eid in (self.eidfrom, self.eidto):
etype = self._cw.entity_metas(eid)['type']
if self.main_rtype not in self._cw.vreg.schema.eschema(etype).subjrels:
return
if self.rtype in self.subject_relations:
meid, seid = self.eidfrom, self.eidto
else:
assert self.rtype in self.object_relations
meid, seid = self.eidto, self.eidfrom
self._cw.execute(
'SET E %s P WHERE X %s P, X eid %%(x)s, E eid %%(e)s, NOT E %s P'
% (self.main_rtype, self.main_rtype, self.main_rtype),
{'x': meid, 'e': seid})
class PropagateRelationAddHook(Hook):
"""Propagate to entities at the end of watched relations when a `main_rtype`
relation is added.
`subject_relations` and `object_relations` attributes should be specified on
subclasses and are usually shared references with attributes of the same
name on :class:`PropagateRelationHook`.
Because of those shared references, you can use `skip_subject_relations` and
`skip_object_relations` attributes when you don't want to propagate to
entities linked through some particular relations.
"""
events = ('after_add_relation',)
# to set in concrete class (mandatory)
subject_relations = None
object_relations = None
# to set in concrete class (optional)
skip_subject_relations = ()
skip_object_relations = ()
def __call__(self):
eschema = self._cw.vreg.schema.eschema(self._cw.entity_metas(self.eidfrom)['type'])
execute = self._cw.execute
for rel in self.subject_relations:
if rel in eschema.subjrels and not rel in self.skip_subject_relations:
execute('SET R %s P WHERE X eid %%(x)s, P eid %%(p)s, '
'X %s R, NOT R %s P' % (self.rtype, rel, self.rtype),
{'x': self.eidfrom, 'p': self.eidto})
for rel in self.object_relations:
if rel in eschema.objrels and not rel in self.skip_object_relations:
execute('SET R %s P WHERE X eid %%(x)s, P eid %%(p)s, '
'R %s X, NOT R %s P' % (self.rtype, rel, self.rtype),
{'x': self.eidfrom, 'p': self.eidto})
class PropagateRelationDelHook(PropagateRelationAddHook):
"""Propagate to entities at the end of watched relations when a `main_rtype`
relation is deleted.
This is the opposite of the :class:`PropagateRelationAddHook`, see its
documentation for how to use this class.
"""
events = ('after_delete_relation',)
def __call__(self):
eschema = self._cw.vreg.schema.eschema(self._cw.entity_metas(self.eidfrom)['type'])
execute = self._cw.execute
for rel in self.subject_relations:
if rel in eschema.subjrels and not rel in self.skip_subject_relations:
execute('DELETE R %s P WHERE X eid %%(x)s, P eid %%(p)s, '
'X %s R' % (self.rtype, rel),
{'x': self.eidfrom, 'p': self.eidto})
for rel in self.object_relations:
if rel in eschema.objrels and not rel in self.skip_object_relations:
execute('DELETE R %s P WHERE X eid %%(x)s, P eid %%(p)s, '
'R %s X' % (self.rtype, rel),
{'x': self.eidfrom, 'p': self.eidto})
# abstract classes for operation ###############################################
class Operation(object):
"""Base class for operations.
Operations may be instantiated in a hook's `__call__` method. An operation always
takes a session object as its first argument (accessible as `.session` from the
operation instance), and optionally all keyword arguments needed by the
operation. These keyword arguments will be accessible as attributes from the
operation instance.
An operation is triggered on connections set events related to commit /
rollback of transactions. Possible events are:
* `precommit`:
the transaction is being prepared for commit. You can freely do any heavy
computation, raise an exception if the commit can't go on, or even add some
new operations during this phase. If you do anything which has to be
reverted if the commit fails afterwards (e.g. altering the file system),
you'll have to support the 'revertprecommit' event to revert
things by yourself
* `revertprecommit`:
if an operation failed while being pre-committed, this event is triggered
for all operations which had their 'precommit' event already fired, to let
them revert things (including the operation which made the commit fail)
* `rollback`:
the transaction has been rolled back, either:
* intentionally
* because a 'precommit' event failed, in which case all operations are rolled
back once 'revertprecommit' has been called
* `postcommit`:
the transaction is over. All the ORM entities accessed by the earlier
transaction are invalid. If you need to work on the database, you need to
start a new transaction, for instance using a new internal session, which
you will need to commit (and close!).
For an operation to support an event, one has to implement the `<event
name>_event` method with no arguments.
The order of operations may be important, and is controlled according to
the `insert_index` method's output (whose implementation varies according to the
base operation class used).
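A minimal sketch of an operation supporting several events (the file paths and
the `source` / `destination` keyword arguments are illustrative):
.. sourcecode:: python
    import os
    class MoveFileOp(Operation):
        # instantiated as MoveFileOp(session, source=..., destination=...)
        def precommit_event(self):
            os.rename(self.source, self.destination)
        def revertprecommit_event(self):
            # a later operation's precommit failed: undo our own change
            os.rename(self.destination, self.source)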
"""
def __init__(self, session, **kwargs):
self.session = session
self.__dict__.update(kwargs)
self.register(session)
# execution information
self.processed = None # 'precommit', 'commit'
self.failed = False
def register(self, session):
session.add_operation(self, self.insert_index())
def insert_index(self):
"""return the index of the lastest instance which is not a
LateOperation instance
"""
# faster by inspecting operation in reverse order for heavy transactions
i = None
for i, op in enumerate(reversed(self.session.pending_operations)):
if isinstance(op, (LateOperation, SingleLastOperation)):
continue
return -i or None
if i is None:
return None
return -(i + 1)
def handle_event(self, event):
"""delegate event handling to the opertaion"""
if event == 'postcommit_event' and hasattr(self, 'commit_event'):
warn('[3.10] %s: commit_event method has been replaced by postcommit_event'
% self.__class__, DeprecationWarning)
self.commit_event() # pylint: disable=E1101
getattr(self, event)()
def precommit_event(self):
"""the observed connections set is preparing a commit"""
def revertprecommit_event(self):
"""an error went when pre-commiting this operation or a later one
should revert pre-commit's changes but take care, they may have not
been all considered if it's this operation which failed
"""
def rollback_event(self):
"""the observed connections set has been rolled back
do nothing by default
"""
def postcommit_event(self):
"""the observed connections set has committed"""
# these are overridden by set_log_methods below
# only defining here to prevent pylint from complaining
info = warning = error = critical = exception = debug = lambda msg,*a,**kw: None
set_log_methods(Operation, getLogger('cubicweb.session'))
def _container_add(container, value):
{set: set.add, list: list.append}[container.__class__](container, value)
class DataOperationMixIn(object):
"""Mix-in class to ease applying a single operation on a set of data,
avoiding to create as many as operation as they are individual modification.
The body of the operation must then iterate over the values that have been
stored in a single operation instance.
You should try to use this instead of creating one operation for each
`value`, since handling operations becomes costly on massive data imports.
Usage looks like:
.. sourcecode:: python
class MyEntityHook(Hook):
__regid__ = 'my.entity.hook'
__select__ = Hook.__select__ & is_instance('MyEntity')
events = ('after_add_entity',)
def __call__(self):
MyOperation.get_instance(self._cw).add_data(self.entity)
class MyOperation(DataOperationMixIn, Operation):
def precommit_event(self):
for bucket in self.get_data():
process(bucket)
You can modify the `containercls` class attribute, which defines the
container class that should be instantiated to hold payloads. An instance is
created on instantiation, and then the :meth:`add_data` method will add the
given data to the existing container. Defaults to `set`. Use `list` if you
want to keep arrival ordering. You can also use another kind of container
by redefining :meth:`_build_container` and :meth:`add_data`.
More optional parameters can be given to `get_instance`; they
will be passed to the operation constructor (for obvious reasons those
parameters should not vary across different calls to this method for a
given operation).
.. Note::
For sanity reasons `get_data` will reset the operation, so that once
the operation has started its treatment, if some hook wants to push
additional data to this same operation, a new instance will be created
(else that data would most likely never be processed). This implies:
* you should **always** call `get_data` when starting treatment
* you should **never** call `get_data` for any other reason.
"""
containercls = set
@classproperty
def data_key(cls):
return ('cw.dataops', cls.__name__)
@classmethod
def get_instance(cls, session, **kwargs):
# no need to lock: transaction_data already comes from thread's local storage
try:
return session.transaction_data[cls.data_key]
except KeyError:
op = session.transaction_data[cls.data_key] = cls(session, **kwargs)
return op
def __init__(self, *args, **kwargs):
super(DataOperationMixIn, self).__init__(*args, **kwargs)
self._container = self._build_container()
self._processed = False
def __contains__(self, value):
return value in self._container
def _build_container(self):
return self.containercls()
def union(self, data):
"""only when container is a set"""
assert not self._processed, """Trying to add data to a closed operation.
Iterating over operation data closed it and should be reserved to precommit /
postcommit method of the operation."""
self._container |= data
def add_data(self, data):
assert not self._processed, """Trying to add data to a closed operation.
Iterating over operation data closed it and should be reserved to precommit /
postcommit method of the operation."""
_container_add(self._container, data)
def remove_data(self, data):
assert not self._processed, """Trying to remove data from a closed operation.
Iterating over operation data closed it and should be reserved to precommit /
postcommit method of the operation."""
self._container.remove(data)
def get_data(self):
assert not self._processed, """Trying to get data from a closed operation.
Iterating over operation data closed it and should be reserved to precommit /
postcommit method of the operation."""
self._processed = True
op = self.session.transaction_data.pop(self.data_key)
assert op is self, "Bad handling of operation data, found %s instead of %s for key %s" % (
op, self, self.data_key)
return self._container
@deprecated('[3.10] use opcls.get_instance(session, **opkwargs).add_data(value)')
def set_operation(session, datakey, value, opcls, containercls=set, **opkwargs):
"""Function to ease applying a single operation on a set of data, avoiding
to create as many as operation as they are individual modification. You
should try to use this instead of creating on operation for each `value`,
since handling operations becomes coslty on massive data import.
Arguments are:
* the `session` object
* `datakey`, a specially forged key that will be used as key in
session.transaction_data
* `value` that is the actual payload of an individual operation
* `opcls`, the class of the operation. An instance is created on the first
call for the given key, and then subsequent calls will simply add the
payload to the container (hence `opkwargs` is only used on that first
call)
* `containercls`, the container class that should be instantiated to hold
payloads. An instance is created on the first call for the given key, and
then subsequent calls will add the data to the existing container. Default
to a set. Give `list` if you want to keep arrival ordering.
* more optional parameters to give to the operation (here the rtype, which does not
vary across operations).
The body of the operation must then iterate over the values that have been mapped
in the transaction_data dictionary to the forged key, e.g.:
.. sourcecode:: python
for value in self._cw.transaction_data.pop(datakey):
...
.. Note::
**popping** the key from `transaction_data` is not an option, else you may
get unexpected data loss in some cases of nested hooks.
"""
try:
# Search for session.transaction_data[`datakey`] (expected to be a set):
# if found, simply append `value`
_container_add(session.transaction_data[datakey], value)
except KeyError:
# else, initialize it to containercls([`value`]) and instantiate the given
# `opcls` operation class with additional keyword arguments
opcls(session, **opkwargs)
session.transaction_data[datakey] = containercls()
_container_add(session.transaction_data[datakey], value)
class LateOperation(Operation):
"""special operation which should be called after all possible (ie non late)
operations
"""
def insert_index(self):
"""return the index of the lastest instance which is not a
SingleLastOperation instance
"""
# faster by inspecting operation in reverse order for heavy transactions
i = None
for i, op in enumerate(reversed(self.session.pending_operations)):
if isinstance(op, SingleLastOperation):
continue
return -i or None
if i is None:
return None
return -(i + 1)
class SingleLastOperation(Operation):
"""special operation which should be called once and after all other
operations
"""
def register(self, session):
"""override register to handle cases where this operation has already
been added
"""
operations = session.pending_operations
index = self.equivalent_index(operations)
if index is not None:
equivalent = operations.pop(index)
else:
equivalent = None
session.add_operation(self, self.insert_index())
return equivalent
def equivalent_index(self, operations):
"""return the index of the equivalent operation if any"""
for i, op in enumerate(reversed(operations)):
if op.__class__ is self.__class__:
return -(i+1)
return None
def insert_index(self):
return None
class SendMailOp(SingleLastOperation):
def __init__(self, session, msg=None, recipients=None, **kwargs):
# msg may not be specified yet, as in
# `cubicweb.sobjects.supervision.SupervisionMailOp`
if msg is not None:
assert recipients
self.to_send = [(msg, recipients)]
else:
assert recipients is None
self.to_send = []
super(SendMailOp, self).__init__(session, **kwargs)
def register(self, session):
previous = super(SendMailOp, self).register(session)
if previous:
self.to_send = previous.to_send + self.to_send
def postcommit_event(self):
self.session.repo.threaded_task(self.sendmails)
def sendmails(self):
self.session.vreg.config.sendmails(self.to_send)
class RQLPrecommitOperation(Operation):
# to be defined in concrete classes
rqls = None
def precommit_event(self):
execute = self.session.execute
for rql in self.rqls:
execute(*rql)
class CleanupNewEidsCacheOp(DataOperationMixIn, SingleLastOperation):
"""on rollback of a insert query we have to remove from repository's
type/source cache eids of entities added in that transaction.
NOTE: querier's rqlst/solutions cache may have been polluted too with
queries such as Any X WHERE X eid 32 if 32 has been rolled back however
generated queries are unpredictable and analysing all the cache probably
too expensive. Notice that there is no pb when using args to specify eids
instead of giving them into the rql string.
"""
data_key = 'neweids'
def rollback_event(self):
"""the observed connections set has been rolled back,
remove inserted eid from repository type/source cache
"""
try:
self.session.repo.clear_caches(self.get_data())
except KeyError:
pass
class CleanupDeletedEidsCacheOp(DataOperationMixIn, SingleLastOperation):
"""on commit of delete query, we have to remove from repository's
type/source cache eids of entities deleted in that transaction.
"""
data_key = 'pendingeids'
def postcommit_event(self):
"""the observed connections set has been rolled back,
remove inserted eid from repository type/source cache
"""
try:
eids = self.get_data()
self.session.repo.clear_caches(eids)
self.session.repo.app_instances_bus.publish(['delete'] + list(str(eid) for eid in eids))
except KeyError:
pass