doc/tutorials/dataimport/index.rst
changeset 12792 e2cdb1be6bd9
parent 12556 d1c659d70368
--- a/doc/tutorials/dataimport/index.rst	Fri Dec 06 13:20:05 2019 +0100
+++ b/doc/tutorials/dataimport/index.rst	Thu Apr 04 17:11:14 2019 +0200
@@ -4,18 +4,18 @@
 Introduction
 ~~~~~~~~~~~~
 
-This tutorial explains how to import data from an external source (e.g. a collection of files) 
+This tutorial explains how to import data from an external source (e.g. a collection of files)
 into a CubicWeb cube instance.
 
-First, once we know the format of the data we wish to import, we devise a 
+First, once we know the format of the data we wish to import, we devise a
 *data model*, that is, a CubicWeb (Yams) schema which reflects the way the data
 is structured. This schema is implemented in the ``schema.py`` file.
-In this tutorial, we will describe such a schema for a particular data set, 
+In this tutorial, we will describe such a schema for a particular data set,
 the Diseasome data (see below).
 
-Once the schema is defined, we create a cube and an instance. 
-The cube is a specification of an application, whereas an instance 
-is the application per se. 
+Once the schema is defined, we create a cube and an instance.
+The cube is a specification of an application, whereas an instance
+is the application per se.
 
 Once the schema is defined and the instance is created, the import can be performed, via
 the following steps:
@@ -41,18 +41,18 @@
 
 The first thing to do when using CubicWeb for creating an application from scratch
 is to devise a *data model*, that is, a relational representation of the problem to be
-modeled or of the structure of the data to be imported. 
+modeled or of the structure of the data to be imported.
 
 In such a schema, we define
 an entity type (``EntityType`` objects) for each type of entity to import. Each such type
 has several attributes. If the attributes are of known CubicWeb (Yams) types, viz. numbers,
 strings or characters, then they are defined as attributes, as e.g. ``attribute = Int()``
-for an attribute named ``attribute`` which is an integer. 
+for an integer attribute named ``attribute``.
 
 Each such type also has a set of
 relations, which are defined like the attributes, except that they represent, in fact,
 relations between the entities of the type under discussion and the objects of a type which
-is specified in the relation definition. 
+is specified in the relation definition.
 
 For example, for the Diseasome data, we have two types of entities, genes and diseases.
 Thus, we create two classes which inherit from ``EntityType``::
@@ -73,27 +73,27 @@
     class Gene(EntityType):
         ...
 
-In this schema, there are attributes whose values are numbers or strings. Thus, they are 
-defined by using the CubicWeb / Yams primitive types, e.g., ``label = String(maxsize=12)``. 
-These types can have several constraints or attributes, such as ``maxsize``. 
+In this schema, there are attributes whose values are numbers or strings. Thus, they are
+defined by using the CubicWeb / Yams primitive types, e.g., ``label = String(maxsize=12)``.
+These types can have several constraints or attributes, such as ``maxsize``.
 There are also relations, either between the entity types themselves, or between them
-and a CubicWeb type, ``ExternalUri``. The latter defines a class of URI objects in 
-CubicWeb. For instance, the ``chromosomal_location`` attribute is a relation between 
+and a CubicWeb type, ``ExternalUri``. The latter defines a class of URI objects in
+CubicWeb. For instance, the ``chromosomal_location`` attribute is a relation between
 a ``Disease`` entity and an ``ExternalUri`` entity. The relation is declared with the CubicWeb /
 Yams ``SubjectRelation`` class. The latter accepts several optional keyword arguments, such as
-``cardinality`` which specifies the number of subjects and objects related by the relation type 
+``cardinality`` which specifies the number of subjects and objects related by the relation type
 specified. For example, the ``'?*'`` cardinality in the ``chromosomal_location`` relation type says
 that zero or more ``Disease`` entities are related to zero or one ``ExternalUri`` entities.
 In other words, a ``Disease`` entity is related to at most one ``ExternalUri`` entity via the
 ``chromosomal_location`` relation type, and there can be zero or more ``Disease`` entities in the
-data base. 
+database.
 As for relations between the entity types themselves, an ``associated_genes`` relation is defined
 between ``Disease`` and ``Gene`` entities, so that any number of ``Gene`` entities can be associated
 with a ``Disease``, and any number of ``Disease`` entities can be associated with a ``Gene``.
 
 Of course, before being able to use the CubicWeb / Yams built-in objects, we need to import them::
 
-    
+
     from yams.buildobjs import EntityType, SubjectRelation, String, Int
     from cubicweb.schemas.base import ExternalUri
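+
+With these imports in place, a minimal version of the two entity types could look roughly
+as follows (a sketch only: apart from the attributes and relations discussed above, the
+names, sizes and cardinalities are illustrative and the tutorial's actual ``schema.py``
+may differ)::
+
+    class Disease(EntityType):
+        # plain string attribute, reusing the maxsize example given above
+        label = String(maxsize=12)
+        # '?*': a Disease points to at most one ExternalUri, while an ExternalUri
+        # may be pointed to by any number of Disease entities
+        chromosomal_location = SubjectRelation('ExternalUri', cardinality='?*')
+        # '**': any number of genes per disease, any number of diseases per gene
+        associated_genes = SubjectRelation('Gene', cardinality='**')
+
+    class Gene(EntityType):
+        label = String(maxsize=12)
+        # purely illustrative integer attribute
+        class_degree = Int()
+        # relation to the ExternalUri holding the gene identifier (cardinality assumed)
+        gene_id = SubjectRelation('ExternalUri', cardinality='?*')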
 
@@ -101,48 +101,48 @@
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 The data we wish to import is structured in the RDF format,
-as a text file containing a set of lines. 
-On each line, there are three fields. 
-The first two fields are URIs ("Universal Resource Identifiers"). 
+as a text file containing a set of lines.
+On each line, there are three fields.
+The first two fields are URIs ("Uniform Resource Identifiers").
 The third field is either a URI or a string. Each field bears a particular meaning:
 
-- the leftmost field is an URI that holds the entity to be imported. 
-  Note that the entities defined in the data model (i.e., in ``schema.py``) should 
+- the leftmost field is a URI that identifies the entity to be imported.
+  Note that the entities defined in the data model (i.e., in ``schema.py``) should
   correspond to the entities whose URIs are specified in the import file.
 
-- the middle field is an URI that holds a relation whose subject is the  entity 
+- the middle field is a URI that identifies a relation whose subject is the entity
   defined by the leftmost field. Note that this should also correspond
   to the definitions in the data model.
 
-- the rightmost field is either an URI or a string. When this field is an URI, 
+- the rightmost field is either a URI or a string. When this field is a URI,
   it gives the object of the relation defined by the middle field.
   When the rightmost field is a string, the middle field is interpreted as an attribute
   of the subject (introduced by the leftmost field) and the rightmost field is
   interpreted as the value of the attribute.
 
-Note however that some attributes (i.e. relations whose objects are strings) 
+Note however that some attributes (i.e. relations whose objects are strings)
 have their objects defined as strings followed by ``^^`` and by another URI;
 we ignore this part.
 
 Let us show some examples:
 
 - of a line holding an attribute definition:
-  ``<http://www4.wiwiss.fu-berlin.de/diseasome/resource/genes/CYP17A1> 
+  ``<http://www4.wiwiss.fu-berlin.de/diseasome/resource/genes/CYP17A1>
   <http://www.w3.org/2000/01/rdf-schema#label> "CYP17A1" .``
   The line contains the definition of the ``label`` attribute of an
   entity of type ``gene``. The value of ``label`` is '``CYP17A1``'.
 
 - of a line holding a relation definition:
-  ``<http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseases/1> 
-  <http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseasome/associatedGene> 
+  ``<http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseases/1>
+  <http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseasome/associatedGene>
   <http://www4.wiwiss.fu-berlin.de/diseasome/resource/genes/HADH2> .``
   The line contains the definition of the ``associatedGene`` relation between
-  a ``disease`` subject entity identified by ``1`` and a ``gene`` object 
+  a ``disease`` subject entity identified by ``1`` and a ``gene`` object
   entity defined by ``HADH2``.
 
 Thus, for parsing the data (see the ``diseasome_parser`` module), we can:
 
-1. define a couple of regular expressions for parsing the two kinds of lines, 
+1. define a couple of regular expressions for parsing the two kinds of lines,
    ``RE_ATTS`` for parsing the attribute definitions, and ``RE_RELS`` for parsing
    the relation definitions.
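+
+   A rough Python equivalent of these two expressions might be (a sketch; the patterns
+   actually used in ``diseasome_parser`` may differ)::
+
+       import re
+
+       # relation lines: three URIs, e.g. <subject> <relation> <object> .
+       RE_RELS = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*$')
+       # attribute lines: two URIs and a quoted string, optionally typed with ^^<...>
+       RE_ATTS = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+"([^"]*)"(?:\^\^<[^>]+>)?\s*\.\s*$')
+
+       line = ('<http://www4.wiwiss.fu-berlin.de/diseasome/resource/genes/CYP17A1> '
+               '<http://www.w3.org/2000/01/rdf-schema#label> "CYP17A1" .')
+       subject, attribute, value = RE_ATTS.match(line).groups()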
 
@@ -166,33 +166,33 @@
 
 In the case of the Diseasome data, we can just define two dictionaries for mapping
 the names of the relations as extracted by the parser, to the names of the relations
-as defined in the ``schema.py`` data model. In the ``diseasome_parser`` module 
-they are called ``MAPPING_ATTS`` and ``MAPPING_RELS``. 
+as defined in the ``schema.py`` data model. In the ``diseasome_parser`` module
+they are called ``MAPPING_ATTS`` and ``MAPPING_RELS``.
 Given that the relation and attribute names are given in CamelCase in the original data,
 mappings are necessary if we follow PEP 8 when naming the attributes in the data model.
-For example, the RDF relation ``chromosomalLocation`` is mapped into the schema relation 
+For example, the RDF relation ``chromosomalLocation`` is mapped into the schema relation
 ``chromosomal_location``.
 
 Once these mappings have been defined, we just iterate over the (subject, relation, object)
 tuples provided by the parser and we extract the entities, with their attributes and relations.
 For each entity, we thus have a dictionary with two keys, ``attributes`` and ``relations``.
-The value associated to the ``attributes`` key is a dictionary containing (attribute: value) 
-pairs, where "value" is a string, plus the ``cwuri`` key / attribute holding the URI of 
+The value associated with the ``attributes`` key is a dictionary containing (attribute: value)
+pairs, where "value" is a string, plus the ``cwuri`` key / attribute holding the URI of
 the entity itself.
 The value associated with the ``relations`` key is a dictionary containing (relation: value)
 pairs, where "value" is a URI.
-This is implemented in the ``entities_from_rdf`` interface function of the module 
+This is implemented in the ``entities_from_rdf`` interface function of the module
 ``diseasome_parser``. This function provides an iterator on the dictionaries containing
 the ``attributes`` and ``relations`` keys for all entities.
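+
+For illustration, excerpts of these two dictionaries could look as follows (hypothetical
+excerpts; the real dictionaries in ``diseasome_parser`` contain one entry per attribute
+or relation present in the Diseasome data)::
+
+    # CamelCase names extracted from the RDF data -> names used in schema.py
+    MAPPING_ATTS = {'label': 'label'}
+    MAPPING_RELS = {'chromosomalLocation': 'chromosomal_location',
+                    'associatedGene': 'associated_genes'}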
 
-However, this is a simple case. In real life, things can get much more complicated, and the 
-mapping can be far from trivial, especially when several data sources (which can follow 
+However, this is a simple case. In real life, things can get much more complicated, and the
+mapping can be far from trivial, especially when several data sources (which can follow
 different formatting and even structuring conventions) must be mapped into the same data model.
 
 Importing the data
 ~~~~~~~~~~~~~~~~~~
 
-The data import code should be placed in a Python module. Let us call it 
+The data import code should be placed in a Python module. Let us call it
 ``diseasome_import.py``. Then, this module should be called via
 ``cubicweb-ctl``, as follows::
 
@@ -216,31 +216,31 @@
    between them.
 
 2. ``RQLObjectStore``: store which uses the RQL language for performing
-   database insertions and updates. It relies on all the CubicWeb hooks 
+   database insertions and updates. It relies on all the CubicWeb hooks
    machinery, especially for dealing with security issues (database access
    permissions).
 
 3. ``NoHookRQLObjectStore``: store which uses the RQL language for
-   performing database insertions and updates, but for which 
-   all hooks are deactivated. This implies that 
-   certain checks with respect to the CubicWeb / Yams schema 
-   (data model) are not performed. However, all SQL queries 
+   performing database insertions and updates, but for which
+   all hooks are deactivated. This implies that
+   certain checks with respect to the CubicWeb / Yams schema
+   (data model) are not performed. However, all SQL queries
    obtained from the RQL ones are executed in a sequential
    manner, one query per inserted entity.
 
-4. ``SQLGenObjectStore``: store which uses the SQL language directly. 
-   It inserts entities either sequentially, by executing an SQL query 
-   for each entity, or directly by using one PostGRES ``COPY FROM`` 
-   query for a set of similarly structured entities. 
+4. ``SQLGenObjectStore``: store which uses the SQL language directly.
+   It inserts entities either sequentially, by executing an SQL query
+   for each entity, or in bulk, by using a single PostgreSQL ``COPY FROM``
+   query for a set of similarly structured entities.
 
 For really massive imports (millions or billions of entities), there
-is a cube ``dataio`` which contains another store, called 
+is a cube ``dataio`` which contains another store, called
 ``MassiveObjectStore``. This store is similar to ``SQLGenObjectStore``,
 except that anything related to CubicWeb is bypassed. That is, even the
 CubicWeb EID entity identifiers are not handled. This store is the fastest,
 but has a slightly different API from the other four stores mentioned above.
 Moreover, it has an important limitation, in that it doesn't insert inlined [#]_
-relations in the database. 
+relations in the database.
 
 .. [#] An inlined relation is a relation defined in the schema
        with the keyword argument ``inlined=True``. Such a relation
@@ -260,9 +260,9 @@
 
 All three stores ``RQLObjectStore``, ``NoHookRQLObjectStore`` and
 ``SQLGenObjectStore`` provide exactly the same API for importing data, that is
-entities and relations, in an SQL database. 
+entities and relations, into an SQL database.
 
-Before using a store, one must import the ``dataimport`` module and then initialize 
+Before using a store, one must import the ``dataimport`` module and then initialize
 the store, with the current ``session`` as a parameter::
 
     import cubicweb.dataimport as cwdi
@@ -274,7 +274,7 @@
 
 #. ``create_entity(Etype, **attributes)``, which allows us to add
    an entity of the Yams type ``Etype`` to the database. This entity's attributes
-   are specified in the ``attributes`` dictionary. The method returns the entity 
+   are specified in the ``attributes`` dictionary. The method returns the entity
    created in the database. For example, we add two entities,
    a person, of ``Person`` type, and a location, of ``Location`` type::
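+
+        # hypothetical Person / Location entities; the same attribute values are
+        # reused in the dataio example further down
+        person = store.create_entity('Person', name='Toto', age='18', height='190')
+        location = store.create_entity('Location', town='Paris', arrondissement='13')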
 
@@ -284,21 +284,21 @@
 
 #. ``relate(subject_eid, r_type, object_eid)``, which allows us to add a relation
    of the Yams type ``r_type`` to the database. The relation's subject is an entity
-   whose EID is ``subject_eid``; its object is another entity, whose EID is 
+   whose EID is ``subject_eid``; its object is another entity, whose EID is
    ``object_eid``.  For example [#]_::
 
         store.relate(person.eid(), 'lives_in', location.eid(), **kwargs)
 
    ``kwargs`` is only used by the ``SQLGenObjectStore``'s ``relate`` method and is here
    to allow us to specify the type of the subject of the relation, when the relation is
-   defined as inlined in the schema. 
+   defined as inlined in the schema.
 
 .. [#] The ``eid`` method of an entity defined via ``create_entity`` returns
        the entity identifier as assigned by CubicWeb when creating the entity.
        This only works for entities defined via the stores in the CubicWeb's
        ``dataimport`` module.
 
-   The keyword argument that is understood by ``SQLGenObjectStore`` is called 
+   The keyword argument that is understood by ``SQLGenObjectStore`` is called
    ``subjtype`` and holds the type of the subject entity. For the example considered here,
    this comes to having [#]_::
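+
+        # presumably something like this (see the footnote below for cw_etype):
+        store.relate(person.eid(), 'lives_in', location.eid(), subjtype=person.cw_etype)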
 
@@ -306,26 +306,26 @@
 
    If ``subjtype`` is not specified, then the store tries to infer the type of the subject.
    However, this doesn't always work, e.g. when there are several possible subject types
-   for a given relation type. 
+   for a given relation type.
 
 .. [#] The ``cw_etype`` attribute of an entity defined via ``create_entity`` holds
        the type of the entity just created. This only works for entities defined via
        the stores in the CubicWeb's ``dataimport`` module. In the example considered
        here, ``person.cw_etype`` holds ``'Person'``.
-    
+
    All stores other than ``SQLGenObjectStore`` ignore the ``kwargs`` parameters.
 
 #. ``flush()``, which allows us to perform the actual commit into the database, along
-   with some cleanup operations. Ideally, this method should be called as often as 
+   with some cleanup operations. Ideally, this method should be called as often as
    possible, that is after each insertion in the database, so that database sessions
-   are kept as atomic as possible. In practice, we usually call this method twice: 
+   are kept as atomic as possible. In practice, we usually call this method twice:
    first after all the entities have been created, and second after all the relations have
-   been created. 
+   been created.
 
    Note however that before each commit the database insertions
    have to be consistent with the schema. Thus, if, for instance,
    an entity has an attribute defined through a relation (viz.
-   a ``SubjectRelation``) with a ``"1"`` or ``"+"`` object 
+   a ``SubjectRelation``) with a ``"1"`` or ``"+"`` object
    cardinality, we have to create the entity under discussion,
    the object entity of the relation under discussion, and the
    relation itself, before committing the additions to the database.
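+
+   Put together, a minimal import with one of these three stores could thus look as
+   follows (a sketch reusing the hypothetical ``Person`` / ``Location`` example above)::
+
+        person = store.create_entity('Person', name='Toto', age='18', height='190')
+        location = store.create_entity('Location', town='Paris', arrondissement='13')
+        store.flush()    # commit the entities
+        store.relate(person.eid(), 'lives_in', location.eid())
+        store.flush()    # commit the relations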
@@ -343,14 +343,14 @@
 to interact directly with the database server, via SQL queries.
 
 Moreover, these queries rely on PostgreSQL's ``COPY FROM`` instruction
-to create several entities in a single query. This brings tremendous 
+to create several entities in a single query. This brings tremendous
 performance improvements with respect to the RQL-based data insertion
 procedures.
 
 However, the API of this store is slightly different from the API of
 the stores in CubicWeb's ``dataimport`` module.
 
-Before using the store, one has to import the ``dataio`` cube's 
+Before using the store, one has to import the ``dataio`` cube's
 ``dataimport`` module, then initialize the store by giving it the
 ``session`` parameter::
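+
+    # assuming the dataio cube exposes its dataimport module as
+    # cubicweb_dataio.dataimport (see the application section below); the exact
+    # import path depends on how the cube is packaged
+    import cubicweb_dataio.dataimport as mcwdi
+
+    store = mcwdi.MassiveObjectStore(session)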
 
@@ -375,23 +375,23 @@
    they only specify their types, not their unique identifiers.
 
 #. ``create_entity(Etype, **attributes)``, which allows us to add new entities,
-   whose attributes are given in the ``attributes`` dictionary. 
-   Please note however that, by default, this method does *not* return 
+   whose attributes are given in the ``attributes`` dictionary.
+   Please note however that, by default, this method does *not* return
    the created entity. The method is called, for example, as in::
 
-        store.create_entity('Person', name='Toto', age='18', height='190', 
+        store.create_entity('Person', name='Toto', age='18', height='190',
                             uri='http://link/to/person/toto_18_190')
         store.create_entity('Location', town='Paris', arrondissement='13',
                             uri='http://link/to/location/paris_13')
-   
+
    In order to be able to link these entities via the relations when needed,
    we must ourselves provide a means of uniquely identifying the entities.
    In general, this is done via URIs, stored in attributes like ``uri`` or
    ``cwuri``. The name of the attribute is irrelevant as long as its value is
    unique for each entity.
 
-#. ``relate_by_iid(subject_iid, r_type, object_iid)`` allows us to actually 
-   relate the entities uniquely identified by ``subject_iid`` and 
+#. ``relate_by_iid(subject_iid, r_type, object_iid)`` allows us to actually
+   relate the entities uniquely identified by ``subject_iid`` and
    ``object_iid`` via a relation of type ``r_type``. For example::
 
         store.relate_by_iid('http://link/to/person/toto_18_190',
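+                            # relation type, then the unique identifier of the
+                            # object entity (the Location URI defined above)
+                            'lives_in',
+                            'http://link/to/location/paris_13')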
@@ -411,11 +411,11 @@
    their values are unique. For example, for inserting all relations
    of type ``lives_in`` between ``Person`` and ``Location`` entities,
    we write::
-        
+
         store.convert_relations('Person', 'lives_in', 'Location', 'uri', 'uri')
 
-#. ``flush()`` performs the actual commit in the database. It only needs 
-   to be called after ``create_entity`` and ``relate_by_iid`` calls. 
+#. ``flush()`` performs the actual commit in the database. It only needs
+   to be called after ``create_entity`` and ``relate_by_iid`` calls.
    Please note that ``relate_by_iid`` does *not* perform insertions into
    the database, hence calling ``flush()`` for it would have no effect.
 
@@ -437,106 +437,106 @@
 We define an import function, ``diseasome_import``, which does basically five things:
 
 #. creates and initializes the store to be used, via a line such as::
-    
+
         store = cwdi.SQLGenObjectStore(session)
-   
-   where ``cwdi`` is the imported ``cubicweb.dataimport`` or 
+
+   where ``cwdi`` is the imported ``cubicweb.dataimport`` or
    ``cubicweb_dataio.dataimport``.
 
-#. calls the diseasome parser, that is, the ``entities_from_rdf`` function in the 
+#. calls the diseasome parser, that is, the ``entities_from_rdf`` function in the
    ``diseasome_parser`` module and iterates on its result, in a line such as::
-        
+
         for entity, relations in parser.entities_from_rdf(filename, ('gene', 'disease')):
-        
-   where ``parser`` is the imported ``diseasome_parser`` module, and ``filename`` is the 
+
+   where ``parser`` is the imported ``diseasome_parser`` module, and ``filename`` is the
    name of the file containing the data (with its path), e.g. ``../data/diseasome_dump.nt``.
 
-#. creates the entities to be inserted in the database; for Diseasome, there are two 
+#. creates the entities to be inserted in the database; for Diseasome, there are two
    kinds of entities:
-   
+
    #. entities defined in the data model, viz. ``Gene`` and ``Disease`` in our case.
    #. entities built into CubicWeb / Yams, viz. ``ExternalUri``, which represent
       URIs.
-   
+
    As we are working with RDF data, each entity is defined through a series of URIs. Hence,
    each "relational attribute" [#]_ of an entity is defined via an URI, that is, in CubicWeb
    terms, via an ``ExternalUri`` entity. The entities are created, in the loop presented above,
    as such::
-        
+
         ent = store.create_entity(etype, **entity)
-        
+
    where ``etype`` is the appropriate entity type, either ``Gene`` or ``Disease``.
 
 .. [#] By "relational attribute" we denote an attribute (of an entity) which
        is defined through a relation, e.g. the ``chromosomal_location`` attribute
        of ``Disease`` entities, which is defined through a relation between a
        ``Disease`` and an ``ExternalUri``.
-   
+
    There are as many ``ExternalUri`` entities as there are URIs in the data file. For them, we define a unique
    attribute, ``uri``, which holds the URI under discussion::
-        
+
         extu = store.create_entity('ExternalUri', uri="http://path/of/the/uri")
 
 #. creates the relations between the entities. We have relations between:
-   
+
    #. entities defined in the schema, e.g. between ``Disease`` and ``Gene``
-      entities, such as the ``associated_genes`` relation defined for 
+      entities, such as the ``associated_genes`` relation defined for
       ``Disease`` entities.
    #. entities defined in the schema and ``ExternalUri`` entities, such as ``gene_id``.
-   
-   The way relations are added to the database depends on the store: 
-   
-   - for the stores in the CubicWeb ``dataimport`` module, we only use 
-     ``store.relate``, in 
-     another loop, on the relations (that is, a 
+
+   The way relations are added to the database depends on the store:
+
+   - for the stores in the CubicWeb ``dataimport`` module, we simply call
+     ``store.relate`` on each relation, in another loop nested inside the
+     preceding one (mentioned at step 2)::
-        
+
         for rtype, rels in relations.iteritems():
             ...
-            
+
             store.relate(ent.eid(), rtype, extu.eid(), **kwargs)
-        
+
      where ``kwargs`` is a dictionary designed to accommodate the need for specifying
      the type of the subject entity of the relation, when the relation is inlined and
      ``SQLGenObjectStore`` is used. For example::
-            
+
             ...
             store.relate(ent.eid(), 'chromosomal_location', extu.eid(), subjtype='Disease')
-   
-   - for the ``MassiveObjectStore`` in the ``dataio`` cube's ``dataimport`` module, 
+
+   - for the ``MassiveObjectStore`` in the ``dataio`` cube's ``dataimport`` module,
      the relations are created in three steps:
-     
+
      #. first, a table is created for each relation type, as in::
-            
+
             ...
             store.init_rtype_table(ent.cw_etype, rtype, extu.cw_etype)
-            
+
         which comes down to lines such as::
-            
+
             store.init_rtype_table('Disease', 'associated_genes', 'Gene')
             store.init_rtype_table('Gene', 'gene_id', 'ExternalUri')
-            
-     #. second, the URI of each entity will be used as its identifier, in the 
+
+     #. second, the URI of each entity will be used as its identifier, in the
         ``relate_by_iid`` method, such as::
-            
+
             disease_uri = 'http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseases/3'
             gene_uri = 'http://www4.wiwiss.fu-berlin.de/diseasome/resource/genes/HSD3B2'
             store.relate_by_iid(disease_uri, 'associated_genes', gene_uri)
-            
-     #. third, the relations for each relation type will be added to the database, 
+
+     #. third, the relations for each relation type will be added to the database,
         via the ``convert_relations`` method, such as in::
-            
+
             store.convert_relations('Disease', 'associated_genes', 'Gene', 'cwuri', 'cwuri')
-            
+
         and::
-            
+
             store.convert_relations('Gene', 'hgnc_id', 'ExternalUri', 'cwuri', 'uri')
-            
+
         where ``cwuri`` and ``uri`` are the attributes which store the URIs of the entities
         defined in the data model, and of the ``ExternalUri`` entities, respectively.
 
 #. flushes all relations and entities::
-    
+
     store.flush()
 
    which performs the actual commit of the inserted entities and relations in the database.
@@ -551,10 +551,10 @@
 
 In order to time the import script, we just decorate the import function with the ``timed``
 decorator::
-    
+
     from logilab.common.decorators import timed
     ...
-    
+
     @timed
     def diseasome_import(session, filename):
         ...
@@ -565,7 +565,7 @@
 
 Here, the meanings of these measurements are [#]_:
 
-- ``clock`` is the time spent by CubicWeb, on the server side (i.e. hooks and data pre- / post-processing on SQL 
+- ``clock`` is the time spent by CubicWeb, on the server side (i.e. hooks and data pre- / post-processing on SQL
   queries),
 
 - ``time`` is the sum of ``clock`` and the time spent in PostgreSQL.
@@ -586,7 +586,7 @@
 - the data file, introduced by ``-df [--datafile]``, and
 - the store, introduced by ``-st [--store]``.
 
-The timings (in seconds) for different stores are given in the following table, for 
+The timings (in seconds) for different stores are given in the following table, for
 importing 4213 ``Disease`` entities and 3919 ``Gene`` entities with the import module
 just described:
 
@@ -613,23 +613,23 @@
 Three of those are integrated into CubicWeb, namely the ``RQLObjectStore``, ``NoHookRQLObjectStore`` and
 ``SQLGenObjectStore`` stores, which have a common API:
 
-- ``RQLObjectStore`` is by far the slowest, especially its time spent on the 
-  CubicWeb side, and so it should be used only for small amounts of 
+- ``RQLObjectStore`` is by far the slowest, especially in the time spent on the
+  CubicWeb side, and so it should be used only for small amounts of
   "sensitive" data (i.e. where security is a concern).
 
-- ``NoHookRQLObjectStore`` slashes by almost four the time spent on the CubicWeb side, 
-  but is also quite slow; on the PostGres side it is as slow as the previous store. 
+- ``NoHookRQLObjectStore`` cuts the time spent on the CubicWeb side by a factor of almost four,
+  but is still quite slow; on the PostgreSQL side it is as slow as the previous store.
   It should be used for data where security is not a concern,
   but consistency (with the data model) is.
 
-- ``SQLGenObjectStore`` slashes by three the time spent on the CubicWeb side and by five the time 
+- ``SQLGenObjectStore`` divides the time spent on the CubicWeb side by three and the time
  spent on the PostgreSQL side by five. It should be used for relatively large amounts of data, where
   security and data consistency are not a concern. Compared to the previous store, it has the
   disadvantage that, for inlined relations, we must specify their subjects' types.
 
 For really huge amounts of data there is a fourth store, ``MassiveObjectStore``, available
 from the ``dataio`` cube. It provides blazing performance compared to all the other stores:
-it is almost 25 times faster than ``RQLObjectStore`` and almost three times faster than 
+it is almost 25 times faster than ``RQLObjectStore`` and almost three times faster than
 ``SQLGenObjectStore``. However, it has a few usage caveats that should be taken into account:
 
 #. it cannot insert relations defined as inlined in the schema,