[doc] more fixes of warnings/errors in doc build
- move the 3.21.rst changelog to its proper place
- include changelogs in the main index
- fix a few typos
- add :noindex: for autoxxx directives not in api/
Related to #4832808
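
The recurring fix below is Sphinx's `:noindex:` option: when the same object is documented both under `api/` and again in a narrative chapter via an autodoc directive, Sphinx emits duplicate-description warnings unless all but one occurrence are kept out of the index. The pattern, as applied throughout this patch, is simply:

.. sourcecode:: rst

   .. automethod:: cubicweb.cwvreg.CWRegistryStore.register
      :noindex:

The copy under `api/` stays indexed; every other occurrence gets `:noindex:`.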
--- a/appobject.py Tue Jun 23 17:04:40 2015 +0200
+++ b/appobject.py Thu Jul 02 19:54:25 2015 +0200
@@ -16,7 +16,6 @@
# You should have received a copy of the GNU Lesser General Public License along
# with CubicWeb. If not, see <http://www.gnu.org/licenses/>.
"""
-.. _appobject:
The `AppObject` class
---------------------
@@ -27,7 +26,6 @@
We can find a certain number of attributes and methods defined in this class and
common to all the application objects.
-.. autoclass:: AppObject
"""
__docformat__ = "restructuredtext en"
--- a/cwvreg.py Tue Jun 23 17:04:40 2015 +0200
+++ b/cwvreg.py Thu Jul 02 19:54:25 2015 +0200
@@ -41,7 +41,6 @@
- handling the registration process at startup time, and during automatic
reloading in debug mode.
-.. _AppObjectRecording:
Details of the recording process
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -80,9 +79,13 @@
named `vreg`):
.. automethod:: cubicweb.cwvreg.CWRegistryStore.register_all
+ :noindex:
.. automethod:: cubicweb.cwvreg.CWRegistryStore.register_and_replace
+ :noindex:
.. automethod:: cubicweb.cwvreg.CWRegistryStore.register
+ :noindex:
.. automethod:: cubicweb.cwvreg.CWRegistryStore.unregister
+ :noindex:
Examples:
@@ -122,7 +125,6 @@
to the `register_all` method.
-.. _Selection:
Runtime objects selection
~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -171,7 +173,6 @@
case. Among selectable objects, the `is_instance('Card')` selector will return a higher
score since it's more specific, so the correct view will be selected as expected.
-.. _SelectionAPI:
API for objects selections
~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -182,12 +183,16 @@
selectors that will inspect their content and return a score accordingly.
.. automethod:: cubicweb.vregistry.Registry.select
+ :noindex:
.. automethod:: cubicweb.vregistry.Registry.select_or_none
+ :noindex:
.. automethod:: cubicweb.vregistry.Registry.possible_objects
+ :noindex:
.. automethod:: cubicweb.vregistry.Registry.object_by_id
+ :noindex:
"""
__docformat__ = "restructuredtext en"
@@ -269,7 +274,7 @@
def selected(self, winner, args, kwargs):
"""overriden to avoid the default 'instanciation' behaviour, ie
- winner(*args, **kwargs)
+ `winner(*args, **kwargs)`
"""
return winner
--- a/doc/3.21.rst Tue Jun 23 17:04:40 2015 +0200
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,74 +0,0 @@
-What's new in CubicWeb 3.21?
-============================
-
-New features
-------------
-
-* the datadir-url configuration option lets one choose where static data files
- are served (instead of the default ${base-url}/data/)
-
-* some integrity checking that was previously implemented in Python was
- moved to the SQL backend. This includes some constraints, and
- referential integrity. Some consequences are that:
-
- - disabling integrity hooks no longer disables those checks
- - upgrades that modify constraints will fail when running on sqlite
- (but upgrades aren't supported on sqlite anyway)
-
-* for easier instance monitoring, cubicweb can regularly dump some statistics
- (basically those exposed by the 'info' and 'gc' views) in json format to a file
-
-User-visible changes
---------------------
-
-* the use of fckeditor for text form fields is disabled by default
-
-* the 'https-deny-anonymous' configuration setting no longer exists
-
-Code movement
--------------
-
-The cubicweb.web.views.timeline module (providing the timeline-json, timeline
-and static-timeline views) has moved to a standalone cube_
-
-.. _cube: https://www.cubicweb.org/project/cubicweb-timeline
-
-API changes
------------
-
-* req.set_cookie's "expires" argument, if not None, is expected to be a
- date or a datetime in UTC. It was previously interpreted as localtime
- with the UTC offset the server started in, which was inconsistent (we
- are not aware of any users of that API).
-
-* the way to run tests on a postgresql backend has changed slightly, use
- cubicweb.devtools.{start,stop}pgcluster in setUpModule and tearDownModule
-
-* the Connection and ClientConnection objects introduced in CubicWeb 3.19 have
- been unified. To connect to a repository, use:
-
- session = repo.new_session(login, password=...)
- with session.new_cnx() as cnx:
- cnx.execute(...)
-
- In tests, the 'repo_cnx' and 'client_cnx' methods of RepoAccess are now
- aliases to 'cnx'.
-
-Deprecated code drops
----------------------
-
-* the user_callback api has been removed; people should use plain
- ajax functions instead
-
-* the `Pyro` and `Zmq-pickle` remote repository access methods have
- been entirely removed (emerging alternatives such as rqlcontroller
- and cwclientlib should be used instead). Note that as a side effect,
- "repository-only" instances (i.e. without a http component) are no
- longer possible. If you have any such instances, you will need to
- rename the configuration file from repository.conf to all-in-one.conf
- and run ``cubicweb-ctl upgrade`` to update it. Likewise, remote cubicweb-ctl
- shell is no longer available.
-
-* the old (deprecated since 3.19) `DBAPI` api is completely removed
-
-* cubicweb.toolsutils.config_connect() has been removed
--- a/doc/api/appobject.rst Tue Jun 23 17:04:40 2015 +0200
+++ b/doc/api/appobject.rst Thu Jul 02 19:54:25 2015 +0200
@@ -5,6 +5,7 @@
.. automodule:: cubicweb.appobject
- .. autoclass:: AppObject
- :show-inheritance:
- :members:
+.. _appobject:
+ .. autoclass:: AppObject
+ :show-inheritance:
+ :members:
--- a/doc/api/cwvreg.rst Tue Jun 23 17:04:40 2015 +0200
+++ b/doc/api/cwvreg.rst Thu Jul 02 19:54:25 2015 +0200
@@ -33,6 +33,6 @@
:show-inheritance:
:members:
- .. autoclass:: CWRegistryStore:
+ .. autoclass:: CWRegistryStore
:show-inheritance:
:members:
--- a/doc/api/rset.rst Tue Jun 23 17:04:40 2015 +0200
+++ b/doc/api/rset.rst Thu Jul 02 19:54:25 2015 +0200
@@ -1,7 +1,7 @@
.. _rset_module:
:mod:`cubicweb.rset`
-===================
+====================
.. automodule:: cubicweb.rset
--- a/doc/book/admin/pyro.rst Tue Jun 23 17:04:40 2015 +0200
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,62 +0,0 @@
-.. _UsingPyro:
-
-Working with a distributed client (using Pyro)
-==============================================
-
-In some circumstances, it is practical to split the repository and
-web-client parts of the application for load-balancing reasons. Or
-one wants to access the repository from independant scripts to consult
-or update the database.
-
-Prerequisites
--------------
-
-For this to work, several steps have to be taken in order.
-
-You must first ensure that the appropriate software is installed and
-running (see :ref:`ConfigEnv`)::
-
- pyro-nsd -x -p 6969
-
-Then you have to set appropriate options in your configuration. For
-instance::
-
- pyro-server=yes
- pyro-ns-host=localhost:6969
-
- pyro-instance-id=myinstancename
-
-Connect to the CubicWeb repository from a python script
--------------------------------------------------------
-
-Assuming pyro-nsd is running and your instance is configured with ``pyro-server=yes``,
-you will be able to use :mod:`cubicweb.dbapi` api to initiate the connection.
-
-.. note::
- Regardless of whether your instance is pyro activated or not, you can still
- achieve this by using cubicweb-ctl shell scripts in a simpler way, as by default
- it creates a repository 'in-memory' instead of connecting through pyro. That
- also means you've to be on the host where the instance is running.
-
-Finally, the client (for instance a python script) must connect specifically
-as in the following example code:
-
-.. sourcecode:: python
-
- from cubicweb import dbapi
-
- cnx = dbapi.connect(database='instance-id', user='admin', password='admin')
- cnx.load_appobjects()
- cur = cnx.cursor()
- for name in (u'Personal', u'Professional', u'Computers'):
- cur.execute('INSERT Tag T: T name %(n)s', {'n': name})
- cnx.commit()
-
-Calling :meth:`cubicweb.dbapi.load_appobjects`, will populate the
-cubicweb registries (see :ref:`VRegistryIntro`) with the application
-objects installed on the host where the script runs. You'll then be
-allowed to use the ORM goodies and custom entity methods and views. Of
-course this is optional, without it you can still get the repository
-data through the connection but in a roughly way: only RQL cursors
-will be available, e.g. you can't even build entity objects from the
-result set.
--- a/doc/book/annexes/faq.rst Tue Jun 23 17:04:40 2015 +0200
+++ b/doc/book/annexes/faq.rst Thu Jul 02 19:54:25 2015 +0200
@@ -137,10 +137,6 @@
to anonymous user, which will automatically execute what is
decribed above.
-How to load data from a python script ?
----------------------------------------
-Please, refer to :ref:`UsingPyro`.
-
How to format an entity date attribute ?
----------------------------------------
--- a/doc/book/devrepo/datamodel/definition.rst Tue Jun 23 17:04:40 2015 +0200
+++ b/doc/book/devrepo/datamodel/definition.rst Thu Jul 02 19:54:25 2015 +0200
@@ -90,7 +90,7 @@
There is also a `RichString` kindof type:
- .. autoclass:: yams.buildobjs.RichString
+.. autofunction:: yams.buildobjs.RichString
The ``__unique_together__`` class attribute is a list of tuples of names of
attributes or inlined relations. For each tuple, CubicWeb ensures the unicity
--- a/doc/book/devrepo/devcore/dbapi.rst Tue Jun 23 17:04:40 2015 +0200
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,133 +0,0 @@
-.. _dbapi:
-
-Python/RQL API
-~~~~~~~~~~~~~~
-
-The Python API developped to interface with RQL is inspired from the standard db-api,
-with a Connection object having the methods cursor, rollback and commit essentially.
-The most important method is the `execute` method of a cursor.
-
-.. sourcecode:: python
-
- execute(rqlstring, args=None, build_descr=True)
-
-:rqlstring: the RQL query to execute (unicode)
-:args: if the query contains substitutions, a dictionary containing the values to use
-
-The `Connection` object owns the methods `commit` and `rollback`. You
-*should never need to use them* during the development of the web
-interface based on the *CubicWeb* framework as it determines the end
-of the transaction depending on the query execution success. They are
-however useful in other contexts such as tests or custom controllers.
-
-.. note::
-
- If a query generates an error related to security (:exc:`Unauthorized`) or to
- integrity (:exc:`ValidationError`), the transaction can still continue but you
- won't be able to commit it, a rollback will be necessary to start a new
- transaction.
-
- Also, a rollback is automatically done if an error occurs during commit.
-
-.. note::
-
- A :exc:`ValidationError` has a `entity` attribute. In CubicWeb,
- this atttribute is set to the entity's eid (not a reference to the
- entity itself).
-
-Executing RQL queries from a view or a hook
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-When you're within code of the web interface, the db-api like connexion is
-handled by the request object. You should not have to access it directly, but
-use the `execute` method directly available on the request, eg:
-
-.. sourcecode:: python
-
- rset = self._cw.execute(rqlstring, kwargs)
-
-Similarly, on the server side (eg in hooks), there is no db-api connexion (since
-you're directly inside the data-server), so you'll have to use the execute method
-of the session object.
-
-
-Proper usage of `.execute`
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Let's say you want to get T which is in configuration C, this translates to:
-
-.. sourcecode:: python
-
- self._cw.execute('Any T WHERE T in_conf C, C eid %s' % entity.eid)
-
-But it must be written in a syntax that will benefit from the use
-of a cache on the RQL server side:
-
-.. sourcecode:: python
-
- self._cw.execute('Any T WHERE T in_conf C, C eid %(x)s', {'x': entity.eid})
-
-The syntax tree is built once for the "generic" RQL and can be re-used
-with a number of different eids. There rql IN operator is an exception
-to this rule.
-
-.. sourcecode:: python
-
- self._cw.execute('Any T WHERE T in_conf C, C name IN (%s)'
- % ','.join(['foo', 'bar']))
-
-Alternativelly, some of the common data related to an entity can be
-obtained from the `entity.related()` method (which is used under the
-hood by the orm when you use attribute access notation on an entity to
-get a relation. The initial request would then be translated to:
-
-.. sourcecode:: python
-
- entity.related('in_conf', 'object')
-
-Additionnaly this benefits from the fetch_attrs policy (see
-:ref:`FetchAttrs`) eventually defined on the class element, which says
-which attributes must be also loaded when the entity is loaded through
-the orm.
-
-
-.. _resultset:
-
-The `ResultSet` API
-~~~~~~~~~~~~~~~~~~~
-
-ResultSet instances are a very commonly manipulated object. They have
-a rich API as seen below, but we would like to highlight a bunch of
-methods that are quite useful in day-to-day practice:
-
-* `__str__()` (applied by `print`) gives a very useful overview of both
- the underlying RQL expression and the data inside; unavoidable for
- debugging purposes
-
-* `printable_rql()` produces back a well formed RQL expression as a
- string; it is very useful to build views
-
-* `entities()` returns a generator on all entities of the result set
-
-* `get_entity(row, col)` gets the entity at row, col coordinates; one
- of the most used result set method
-
-.. autoclass:: cubicweb.rset.ResultSet
- :members:
-
-
-The `Cursor` and `Connection` API
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The whole cursor API is developped below.
-
-.. note::
-
- In practice you'll usually use the `.execute` method on the _cw object of
- appobjects. Usage of other methods is quite rare.
-
-.. autoclass:: cubicweb.dbapi.Cursor
- :members:
-
-.. autoclass:: cubicweb.dbapi.Connection
- :members:
--- a/doc/book/devrepo/repo/sessions.rst Tue Jun 23 17:04:40 2015 +0200
+++ b/doc/book/devrepo/repo/sessions.rst Thu Jul 02 19:54:25 2015 +0200
@@ -150,6 +150,7 @@
.. autoclass:: cubicweb.rset.ResultSet
:members:
+ :noindex:
Authentication and management of sessions
--- a/doc/book/devrepo/vreg.rst Tue Jun 23 17:04:40 2015 +0200
+++ b/doc/book/devrepo/vreg.rst Thu Jul 02 19:54:25 2015 +0200
@@ -12,9 +12,12 @@
An overview of AppObjects, the VRegistry and Selectors is given in the
:ref:`VRegistryIntro` chapter.
-.. autodocstring:: cubicweb.cwvreg
+.. autodocstring:: cubicweb.cwvreg
+ :noindex:
.. autodocstring:: cubicweb.predicates
+ :noindex:
.. automodule:: cubicweb.appobject
+ :noindex:
Base predicates
---------------
@@ -33,87 +36,137 @@
Bare predicates
~~~~~~~~~~~~~~~
+
Those predicates are somewhat dumb, which doesn't mean they're not (very) useful.
.. autoclass:: cubicweb.appobject.yes
+ :noindex:
.. autoclass:: cubicweb.predicates.match_kwargs
+ :noindex:
.. autoclass:: cubicweb.predicates.appobject_selectable
+ :noindex:
.. autoclass:: cubicweb.predicates.adaptable
+ :noindex:
.. autoclass:: cubicweb.predicates.configuration_values
+ :noindex:
Result set predicates
~~~~~~~~~~~~~~~~~~~~~
+
Those predicates are looking for a result set in the context ('rset' argument or
the input context) and match or not according to its shape. Some of these
predicates have different behaviour if a particular cell of the result set is
specified using 'row' and 'col' arguments of the input context or not.
.. autoclass:: cubicweb.predicates.none_rset
+ :noindex:
.. autoclass:: cubicweb.predicates.any_rset
+ :noindex:
.. autoclass:: cubicweb.predicates.nonempty_rset
+ :noindex:
.. autoclass:: cubicweb.predicates.empty_rset
+ :noindex:
.. autoclass:: cubicweb.predicates.one_line_rset
+ :noindex:
.. autoclass:: cubicweb.predicates.multi_lines_rset
+ :noindex:
.. autoclass:: cubicweb.predicates.multi_columns_rset
+ :noindex:
.. autoclass:: cubicweb.predicates.paginated_rset
+ :noindex:
.. autoclass:: cubicweb.predicates.sorted_rset
+ :noindex:
.. autoclass:: cubicweb.predicates.one_etype_rset
+ :noindex:
.. autoclass:: cubicweb.predicates.multi_etypes_rset
+ :noindex:
Entity predicates
~~~~~~~~~~~~~~~~~
+
Those predicates are looking for either an `entity` argument in the input context,
or entity found in the result set ('rset' argument or the input context) and
match or not according to entity's (instance or class) properties.
.. autoclass:: cubicweb.predicates.non_final_entity
+ :noindex:
.. autoclass:: cubicweb.predicates.is_instance
+ :noindex:
.. autoclass:: cubicweb.predicates.score_entity
+ :noindex:
.. autoclass:: cubicweb.predicates.rql_condition
+ :noindex:
.. autoclass:: cubicweb.predicates.relation_possible
+ :noindex:
.. autoclass:: cubicweb.predicates.partial_relation_possible
+ :noindex:
.. autoclass:: cubicweb.predicates.has_related_entities
+ :noindex:
.. autoclass:: cubicweb.predicates.partial_has_related_entities
+ :noindex:
.. autoclass:: cubicweb.predicates.has_permission
+ :noindex:
.. autoclass:: cubicweb.predicates.has_add_permission
+ :noindex:
.. autoclass:: cubicweb.predicates.has_mimetype
+ :noindex:
.. autoclass:: cubicweb.predicates.is_in_state
+ :noindex:
.. autofunction:: cubicweb.predicates.on_fire_transition
+ :noindex:
Logged user predicates
~~~~~~~~~~~~~~~~~~~~~~
+
Those predicates are looking for properties of the user issuing the request.
.. autoclass:: cubicweb.predicates.match_user_groups
+ :noindex:
Web request predicates
~~~~~~~~~~~~~~~~~~~~~~
+
Those predicates are looking for properties of *web* request, they can not be
used on the data repository side.
.. autoclass:: cubicweb.predicates.no_cnx
+ :noindex:
.. autoclass:: cubicweb.predicates.anonymous_user
+ :noindex:
.. autoclass:: cubicweb.predicates.authenticated_user
+ :noindex:
.. autoclass:: cubicweb.predicates.match_form_params
+ :noindex:
.. autoclass:: cubicweb.predicates.match_search_state
+ :noindex:
.. autoclass:: cubicweb.predicates.match_context_prop
+ :noindex:
.. autoclass:: cubicweb.predicates.match_context
+ :noindex:
.. autoclass:: cubicweb.predicates.match_view
+ :noindex:
.. autoclass:: cubicweb.predicates.primary_view
+ :noindex:
.. autoclass:: cubicweb.predicates.contextual
+ :noindex:
.. autoclass:: cubicweb.predicates.specified_etype_implements
+ :noindex:
.. autoclass:: cubicweb.predicates.attribute_edited
+ :noindex:
.. autoclass:: cubicweb.predicates.match_transition
+ :noindex:
Other predicates
~~~~~~~~~~~~~~~~
.. autoclass:: cubicweb.predicates.match_exception
+ :noindex:
.. autoclass:: cubicweb.predicates.debug_mode
+ :noindex:
You'll also find some other (very) specific predicates hidden in other modules
than :mod:`cubicweb.predicates`.
--- a/doc/changes/3.14.rst Tue Jun 23 17:04:40 2015 +0200
+++ b/doc/changes/3.14.rst Thu Jul 02 19:54:25 2015 +0200
@@ -158,7 +158,7 @@
Configuration
-------------
+-------------
* Added option 'resources-concat' to make javascript/css files concatenation
optional.
--- a/doc/changes/3.18.rst Tue Jun 23 17:04:40 2015 +0200
+++ b/doc/changes/3.18.rst Thu Jul 02 19:54:25 2015 +0200
@@ -20,10 +20,10 @@
`set_fields_order` method similar to the one available for forms
* new method `ResultSet.one(col=0)` to retrive a single entity and enforce the
- result has only one row (see `#3352314 https://www.cubicweb.org/ticket/3352314`_)
+ result has only one row (see `#3352314 <https://www.cubicweb.org/ticket/3352314>`_)
* new method `RequestSessionBase.find` to look for entities
- (see `#3361290 https://www.cubicweb.org/ticket/3361290`_)
+ (see `#3361290 <https://www.cubicweb.org/ticket/3361290>`_)
* the embedded jQuery copy has been updated to version 1.10.2, and jQuery UI to
version 1.10.3.
@@ -79,10 +79,10 @@
* the old multi-source system
* `find_one_entity` and `find_entities` in favor of `find`
- (see `#3361290 https://www.cubicweb.org/ticket/3361290`_)
+ (see `#3361290 <https://www.cubicweb.org/ticket/3361290>`_)
-* the `TmpFileViewMixin` and `TmpPngView` classes (see `#3400448
- https://www.cubicweb.org/ticket/3400448`_)
+* the `TmpFileViewMixin` and `TmpPngView` classes (see
+ `#3400448 <https://www.cubicweb.org/ticket/3400448>`_)
Deprecated Code Drops
----------------------
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/changes/3.21.rst Thu Jul 02 19:54:25 2015 +0200
@@ -0,0 +1,74 @@
+What's new in CubicWeb 3.21?
+============================
+
+New features
+------------
+
+* the datadir-url configuration option lets one choose where static data files
+ are served (instead of the default ${base-url}/data/)
+
+* some integrity checking that was previously implemented in Python was
+ moved to the SQL backend. This includes some constraints, and
+ referential integrity. Some consequences are that:
+
+ - disabling integrity hooks no longer disables those checks
+ - upgrades that modify constraints will fail when running on sqlite
+ (but upgrades aren't supported on sqlite anyway)
+
+* for easier instance monitoring, cubicweb can regularly dump some statistics
+ (basically those exposed by the 'info' and 'gc' views) in json format to a file
+
+User-visible changes
+--------------------
+
+* the use of fckeditor for text form fields is disabled by default
+
+* the 'https-deny-anonymous' configuration setting no longer exists
+
+Code movement
+-------------
+
+The cubicweb.web.views.timeline module (providing the timeline-json, timeline
+and static-timeline views) has moved to a standalone cube_
+
+.. _cube: https://www.cubicweb.org/project/cubicweb-timeline
+
+API changes
+-----------
+
+* req.set_cookie's "expires" argument, if not None, is expected to be a
+ date or a datetime in UTC. It was previously interpreted as localtime
+ with the UTC offset the server started in, which was inconsistent (we
+ are not aware of any users of that API).
+
+* the way to run tests on a postgresql backend has changed slightly, use
+ cubicweb.devtools.{start,stop}pgcluster in setUpModule and tearDownModule
+
+* the Connection and ClientConnection objects introduced in CubicWeb 3.19 have
+ been unified. To connect to a repository, use::
+
+ session = repo.new_session(login, password=...)
+ with session.new_cnx() as cnx:
+ cnx.execute(...)
+
+ In tests, the 'repo_cnx' and 'client_cnx' methods of RepoAccess are now
+ aliases to 'cnx'.
+
+Deprecated code drops
+---------------------
+
+* the user_callback api has been removed; people should use plain
+ ajax functions instead
+
+* the `Pyro` and `Zmq-pickle` remote repository access methods have
+ been entirely removed (emerging alternatives such as rqlcontroller
+ and cwclientlib should be used instead). Note that as a side effect,
+ "repository-only" instances (i.e. without a http component) are no
+ longer possible. If you have any such instances, you will need to
+ rename the configuration file from repository.conf to all-in-one.conf
+ and run ``cubicweb-ctl upgrade`` to update it. Likewise, remote cubicweb-ctl
+ shell is no longer available.
+
+* the old (deprecated since 3.19) `DBAPI` api is completely removed
+
+* cubicweb.toolsutils.config_connect() has been removed
--- a/doc/changes/index.rst Tue Jun 23 17:04:40 2015 +0200
+++ b/doc/changes/index.rst Thu Jul 02 19:54:25 2015 +0200
@@ -4,10 +4,11 @@
.. toctree::
:maxdepth: 1
- 3.14
- 3.15
- 3.16
+ 3.21
+ 3.20
+ 3.19
+ 3.18
3.17
- 3.18
- 3.19
- 3.20
+ 3.16
+ 3.15
+ 3.14
--- a/doc/conf.py Tue Jun 23 17:04:40 2015 +0200
+++ b/doc/conf.py Thu Jul 02 19:54:25 2015 +0200
@@ -1,5 +1,5 @@
# -*- coding: utf-8 -*-
-# copyright 2003-2014 LOGILAB S.A. (Paris, FRANCE), all rights reserved.
+# copyright 2003-2015 LOGILAB S.A. (Paris, FRANCE), all rights reserved.
# contact http://www.logilab.fr/ -- mailto:contact@logilab.fr
#
# This file is part of CubicWeb.
@@ -69,7 +69,7 @@
# General substitutions.
project = 'CubicWeb'
-copyright = '2001-2014, Logilab'
+copyright = '2001-2015, Logilab'
# The default replacements for |version| and |release|, also used in various
# other places throughout the built documents.
@@ -88,9 +88,11 @@
# List of documents that shouldn't be included in the build.
unused_docs = []
-# List of directories, relative to source directories, that shouldn't be searched
-# for source files.
-#exclude_dirs = []
+# A list of glob-style patterns that should be excluded when looking
+# for source files. They are matched against the source file names
+# relative to the source directory, using slashes as directory
+# separators on all platforms.
+exclude_patterns = ['book/_maybe_to_integrate']
# The reST default role (used for this markup: `text`) to use for all documents.
#default_role = None
--- a/doc/dev/features_list.rst Tue Jun 23 17:04:40 2015 +0200
+++ b/doc/dev/features_list.rst Thu Jul 02 19:54:25 2015 +0200
@@ -87,7 +87,7 @@
| vregistry - debugging selection | 2 | 1 |
+--------------------------------------------------------------------+----+----+
| entities - interfaces | 2 | ? |
-| entities - customization (dc_,...) | 2 | ? |
+| entities - customization (`dc_`, ...) | 2 | ? |
| entities - app logic | 2 | 2 |
| entities - orm configuration | 2 | 1 |
| entities - pluggable mixins | 1 | 0 |
--- a/doc/index.rst Tue Jun 23 17:04:40 2015 +0200
+++ b/doc/index.rst Thu Jul 02 19:54:25 2015 +0200
@@ -76,6 +76,7 @@
book/additionnal_services/index
book/annexes/index
+
Tutorial
~~~~~~~~
@@ -94,6 +95,16 @@
tutorials/advanced/index
tutorials/tools/windmill.rst
tutorials/textreports/index
+ tutorials/dataimport/index
+
+
+Changes
+~~~~~~~
+
+.. toctree::
+ :maxdepth: 2
+
+ changes/index
Reference documentation
--- a/doc/tutorials/dataimport/data_import_tutorial.rst Tue Jun 23 17:04:40 2015 +0200
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,646 +0,0 @@
-Importing relational data into a CubicWeb instance
-==================================================
-
-Introduction
-~~~~~~~~~~~~
-
-This tutorial explains how to import data from an external source (e.g. a collection of files)
-into a CubicWeb cube instance.
-
-First, once we know the format of the data we wish to import, we devise a
-*data model*, that is, a CubicWeb (Yams) schema which reflects the way the data
-is structured. This schema is implemented in the ``schema.py`` file.
-In this tutorial, we will describe such a schema for a particular data set,
-the Diseasome data (see below).
-
-Once the schema is defined, we create a cube and an instance.
-The cube is a specification of an application, whereas an instance
-is the application per se.
-
-Once the schema is defined and the instance is created, the import can be performed, via
-the following steps:
-
-1. Build a custom parser for the data to be imported. Thus, one obtains a Python
- memory representation of the data.
-
-2. Map the parsed data to the data model defined in ``schema.py``.
-
-3. Perform the actual import of the data. This comes down to "populating"
- the data model with the memory representation obtained at 1, according to
- the mapping defined at 2.
-
-This tutorial illustrates all the above steps in the context of relational data
-stored in the RDF format.
-
-More specifically, we describe the import of Diseasome_ RDF/OWL data.
-
-.. _Diseasome: http://datahub.io/dataset/fu-berlin-diseasome
-
-Building a data model
-~~~~~~~~~~~~~~~~~~~~~
-
-The first thing to do when using CubicWeb for creating an application from scratch
-is to devise a *data model*, that is, a relational representation of the problem to be
-modeled or of the structure of the data to be imported.
-
-In such a schema, we define
-an entity type (``EntityType`` objects) for each type of entity to import. Each such type
-has several attributes. If the attributes are of known CubicWeb (Yams) types, viz. numbers,
-strings or characters, then they are defined as attributes, as e.g. ``attribute = Int()``
-for an attribute named ``attribute`` which is an integer.
-
-Each such type also has a set of
-relations, which are defined like the attributes, except that they represent, in fact,
-relations between the entities of the type under discussion and the objects of a type which
-is specified in the relation definition.
-
-For example, for the Diseasome data, we have two types of entities, genes and diseases.
-Thus, we create two classes which inherit from ``EntityType``::
-
- class Disease(EntityType):
- # Corresponds to http://www.w3.org/2000/01/rdf-schema#label
- label = String(maxsize=512, fulltextindexed=True)
- ...
-
- #Corresponds to http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseasome/associatedGene
- associated_genes = SubjectRelation('Gene', cardinality='**')
- ...
-
- #Corresponds to 'http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseasome/chromosomalLocation'
- chromosomal_location = SubjectRelation('ExternalUri', cardinality='?*', inlined=True)
-
-
- class Gene(EntityType):
- ...
-
-In this schema, there are attributes whose values are numbers or strings. Thus, they are
-defined by using the CubicWeb / Yams primitive types, e.g., ``label = String(maxsize=12)``.
-These types can have several constraints or attributes, such as ``maxsize``.
-There are also relations, either between the entity types themselves, or between them
-and a CubicWeb type, ``ExternalUri``. The latter defines a class of URI objects in
-CubicWeb. For instance, the ``chromosomal_location`` attribute is a relation between
-a ``Disease`` entity and an ``ExternalUri`` entity. The relation is marked by the CubicWeb /
-Yams ``SubjectRelation`` method. The latter can have several optional keyword arguments, such as
-``cardinality`` which specifies the number of subjects and objects related by the relation type
-specified. For example, the ``'?*'`` cardinality in the ``chromosomal_relation`` relation type says
-that zero or more ``Disease`` entities are related to zero or one ``ExternalUri`` entities.
-In other words, a ``Disease`` entity is related to at most one ``ExternalUri`` entity via the
-``chromosomal_location`` relation type, and that we can have zero or more ``Disease`` entities in the
-data base.
-For a relation between the entity types themselves, the ``associated_genes`` between a ``Disease``
-entity and a ``Gene`` entity is defined, so that any number of ``Gene`` entities can be associated
-to a ``Disease``, and there can be any number of ``Disease`` s if a ``Gene`` exists.
-
-Of course, before being able to use the CubicWeb / Yams built-in objects, we need to import them::
-
-
- from yams.buildobjs import EntityType, SubjectRelation, String, Int
- from cubicweb.schemas.base import ExternalUri
-
-Building a custom data parser
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The data we wish to import is structured in the RDF format,
-as a text file containing a set of lines.
-On each line, there are three fields.
-The first two fields are URIs ("Universal Resource Identifiers").
-The third field is either an URI or a string. Each field bares a particular meaning:
-
-- the leftmost field is an URI that holds the entity to be imported.
- Note that the entities defined in the data model (i.e., in ``schema.py``) should
- correspond to the entities whose URIs are specified in the import file.
-
-- the middle field is an URI that holds a relation whose subject is the entity
- defined by the leftmost field. Note that this should also correspond
- to the definitions in the data model.
-
-- the rightmost field is either an URI or a string. When this field is an URI,
- it gives the object of the relation defined by the middle field.
- When the rightmost field is a string, the middle field is interpreted as an attribute
- of the subject (introduced by the leftmost field) and the rightmost field is
- interpreted as the value of the attribute.
-
-Note however that some attributes (i.e. relations whose objects are strings)
-have their objects defined as strings followed by ``^^`` and by another URI;
-we ignore this part.
-
-Let us show some examples:
-
-- of line holding an attribute definition:
- ``<http://www4.wiwiss.fu-berlin.de/diseasome/resource/genes/CYP17A1>
- <http://www.w3.org/2000/01/rdf-schema#label> "CYP17A1" .``
- The line contains the definition of the ``label`` attribute of an
- entity of type ``gene``. The value of ``label`` is '``CYP17A1``'.
-
-- a line holding a relation definition:
- ``<http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseases/1>
- <http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseasome/associatedGene>
- <http://www4.wiwiss.fu-berlin.de/diseasome/resource/genes/HADH2> .``
- The line contains the definition of the ``associatedGene`` relation between
- a ``disease`` subject entity identified by ``1`` and a ``gene`` object
- entity defined by ``HADH2``.
-
-Thus, for parsing the data, we can (see the ``diseasome_parser`` module):
-
-1. define a couple of regular expressions for parsing the two kinds of lines,
- ``RE_ATTS`` for parsing the attribute definitions, and ``RE_RELS`` for parsing
- the relation definitions.
-
-2. define a function that iterates through the lines of the file and yields
-   a (subject, relation, object) tuple for each line.
- We called it ``_retrieve_structure`` in the ``diseasome_parser`` module.
- The function needs the file name and the types for which information
- should be retrieved.
-
-Alternatively, instead of hand-making the parser, one could use the RDF parser provided
-in the ``dataio`` cube.
-
-.. XXX To further study and detail the ``dataio`` cube usage.
-
-Once we get to have the (subject, relation, object) triples, we need to map them into
-the data model.
-
-
-Mapping the data to the schema
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-In the case of diseasome data, we can just define two dictionaries for mapping
-the names of the relations as extracted by the parser, to the names of the relations
-as defined in the ``schema.py`` data model. In the ``diseasome_parser`` module
-they are called ``MAPPING_ATTS`` and ``MAPPING_RELS``.
-Given that the relation and attribute names are given in CamelCase in the original data,
-mappings are necessary if we follow PEP 8 when naming the attributes in the data model.
-For example, the RDF relation ``chromosomalLocation`` is mapped into the schema relation
-``chromosomal_location``.
-
-Once these mappings have been defined, we just iterate over the (subject, relation, object)
-tuples provided by the parser and we extract the entities, with their attributes and relations.
-For each entity, we thus have a dictionary with two keys, ``attributes`` and ``relations``.
-The value associated to the ``attributes`` key is a dictionary containing (attribute: value)
-pairs, where "value" is a string, plus the ``cwuri`` key / attribute holding the URI of
-the entity itself.
-The value associated to the ``relations`` key is a dictionary containing (relation: value)
-pairs, where "value" is a URI.
-This is implemented in the ``entities_from_rdf`` interface function of the module
-``diseasome_parser``. This function provides an iterator on the dictionaries containing
-the ``attributes`` and ``relations`` keys for all entities.
-
-However, this is a simple case. In real life, things can get much more complicated, and the
-mapping can be far from trivial, especially when several data sources (which can follow
-different formatting and even structuring conventions) must be mapped into the same data model.
-
-Importing the data
-~~~~~~~~~~~~~~~~~~
-
-The data import code should be placed in a Python module. Let us call it
-``diseasome_import.py``. Then, this module should be called via
-``cubicweb-ctl``, as follows::
-
- cubicweb-ctl shell diseasome_import.py -- <other arguments e.g. data file>
-
-In the import module, we should use a *store* for doing the import.
-A store is an object which provides three kinds of methods for
-importing data:
-
-- a method for importing the entities, along with the values
- of their attributes.
-- a method for importing the relations between the entities.
-- a method for committing the imports to the database.
-
-In CubicWeb, we have four stores:
-
-1. ``ObjectStore``: the base class for the stores in CubicWeb.
- It only provides a skeleton for all other stores and
- provides the means for creating the memory structures
- (dictionaries) that hold the entities and the relations
- between them.
-
-2. ``RQLObjectStore``: store which uses the RQL language for performing
- database insertions and updates. It relies on all the CubicWeb hooks
- machinery, especially for dealing with security issues (database access
- permissions).
-
-3. ``NoHookRQLObjectStore``: store which uses the RQL language for
- performing database insertions and updates, but for which
- all hooks are deactivated. This implies that
- certain checks with respect to the CubicWeb / Yams schema
- (data model) are not performed. However, all SQL queries
- obtained from the RQL ones are executed in a sequential
- manner, one query per inserted entity.
-
-4. ``SQLGenObjectStore``: store which uses the SQL language directly.
- It inserts entities either sequentially, by executing an SQL query
-   for each entity, or directly by using one PostgreSQL ``COPY FROM``
- query for a set of similarly structured entities.
-
-For really massive imports (millions or billions of entities), there
-is a cube ``dataio`` which contains another store, called
-``MassiveObjectStore``. This store is similar to ``SQLGenObjectStore``,
-except that anything related to CubicWeb is bypassed. That is, even the
-CubicWeb EID entity identifiers are not handled. This store is the fastest,
-but has a slightly different API from the four stores mentioned above.
-Moreover, it has an important limitation, in that it doesn't insert inlined [#]_
-relations in the database.
-
-.. [#] An inlined relation is a relation defined in the schema
- with the keyword argument ``inlined=True``. Such a relation
- is inserted in the database as an attribute of the entity
- whose subject it is.
-
-In the following section we will see how to import data by using the stores
-in CubicWeb's ``dataimport`` module.
-
-Using the stores in ``dataimport``
-++++++++++++++++++++++++++++++++++
-
-``ObjectStore`` is seldom used in real life for importing data, since it is
-only the base store for the other stores and it doesn't perform an actual
-import of the data. Nevertheless, the other three stores, which import data,
-are based on ``ObjectStore`` and provide the same API.
-
-All three stores ``RQLObjectStore``, ``NoHookRQLObjectStore`` and
-``SQLGenObjectStore`` provide exactly the same API for importing data, that is
-entities and relations, in an SQL database.
-
-Before using a store, one must import the ``dataimport`` module and then initialize
-the store, with the current ``session`` as a parameter::
-
- import cubicweb.dataimport as cwdi
- ...
-
- store = cwdi.RQLObjectStore(session)
-
-Each such store provides three methods for data import:
-
-#. ``create_entity(Etype, **attributes)``, which allows us to add
- an entity of the Yams type ``Etype`` to the database. This entity's attributes
- are specified in the ``attributes`` dictionary. The method returns the entity
- created in the database. For example, we add two entities,
- a person, of ``Person`` type, and a location, of ``Location`` type::
-
- person = store.create_entity('Person', name='Toto', age='18', height='190')
-
- location = store.create_entity('Location', town='Paris', arrondissement='13')
-
-#. ``relate(subject_eid, r_type, object_eid)``, which allows us to add a relation
- of the Yams type ``r_type`` to the database. The relation's subject is an entity
- whose EID is ``subject_eid``; its object is another entity, whose EID is
- ``object_eid``. For example [#]_::
-
- store.relate(person.eid(), 'lives_in', location.eid(), **kwargs)
-
-   ``kwargs`` is only used by ``SQLGenObjectStore``'s ``relate`` method and is here
- to allow us to specify the type of the subject of the relation, when the relation is
- defined as inlined in the schema.
-
-.. [#] The ``eid`` method of an entity defined via ``create_entity`` returns
- the entity identifier as assigned by CubicWeb when creating the entity.
- This only works for entities defined via the stores in the CubicWeb's
- ``dataimport`` module.
-
- The keyword argument that is understood by ``SQLGenObjectStore`` is called
- ``subjtype`` and holds the type of the subject entity. For the example considered here,
- this comes to having [#]_::
-
- store.relate(person.eid(), 'lives_in', location.eid(), subjtype=person.cw_etype)
-
- If ``subjtype`` is not specified, then the store tries to infer the type of the subject.
- However, this doesn't always work, e.g. when there are several possible subject types
- for a given relation type.
-
-.. [#] The ``cw_etype`` attribute of an entity defined via ``create_entity`` holds
- the type of the entity just created. This only works for entities defined via
- the stores in the CubicWeb's ``dataimport`` module. In the example considered
- here, ``person.cw_etype`` holds ``'Person'``.
-
- All the other stores but ``SQLGenObjectStore`` ignore the ``kwargs`` parameters.
-
-#. ``flush()``, which allows us to perform the actual commit into the database, along
- with some cleanup operations. Ideally, this method should be called as often as
- possible, that is after each insertion in the database, so that database sessions
- are kept as atomic as possible. In practice, we usually call this method twice:
- first, after all the entities have been created, second, after all relations have
- been created.
-
- Note however that before each commit the database insertions
- have to be consistent with the schema. Thus, if, for instance,
- an entity has an attribute defined through a relation (viz.
- a ``SubjectRelation``) with a ``"1"`` or ``"+"`` object
- cardinality, we have to create the entity under discussion,
- the object entity of the relation under discussion, and the
- relation itself, before committing the additions to the database.
-
- The ``flush`` method is simply called as::
-
-    store.flush()
-
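-To make the call sequence concrete without a running CubicWeb instance, here
-is a toy in-memory stand-in that mimics the three-method API described above.
-It is *not* one of the real stores: entities are plain dictionaries, EIDs are
-a simple counter, and ``flush()`` merely sets a flag.

```python
import itertools

class ToyStore:
    """In-memory stand-in mimicking the create_entity/relate/flush API."""
    def __init__(self):
        self._eids = itertools.count(1)
        self.entities, self.relations, self.committed = {}, [], False

    def create_entity(self, etype, **attributes):
        # A real store inserts a row and returns an entity object.
        eid = next(self._eids)
        entity = {'eid': eid, 'etype': etype, 'attributes': attributes}
        self.entities[eid] = entity
        return entity

    def relate(self, subject_eid, r_type, object_eid, **kwargs):
        # kwargs (e.g. subjtype) is only meaningful for SQLGenObjectStore.
        self.relations.append((subject_eid, r_type, object_eid))

    def flush(self):
        self.committed = True  # a real store commits to the database here

store = ToyStore()
person = store.create_entity('Person', name='Toto', age='18')
location = store.create_entity('Location', town='Paris')
store.relate(person['eid'], 'lives_in', location['eid'])
store.flush()
```

-Note that entities returned by the real stores expose an ``eid()`` method
-rather than a dictionary key; the sketch simplifies this.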
-
-Using the ``MassiveObjectStore`` in the ``dataio`` cube
-+++++++++++++++++++++++++++++++++++++++++++++++++++++++
-
-This store, available in the ``dataio`` cube, allows us to
-fully dispense with the CubicWeb import mechanisms and hence
-to interact directly with the database server, via SQL queries.
-
-Moreover, these queries rely on PostgreSQL's ``COPY FROM`` instruction
-to create several entities in a single query. This brings tremendous
-performance improvements with respect to the RQL-based data insertion
-procedures.
-
-However, the API of this store is slightly different from the API of
-the stores in CubicWeb's ``dataimport`` module.
-
-Before using the store, one has to import the ``dataio`` cube's
-``dataimport`` module, then initialize the store by giving it the
-``session`` parameter::
-
- from cubes.dataio import dataimport as mcwdi
- ...
-
- store = mcwdi.MassiveObjectStore(session)
-
-The ``MassiveObjectStore`` provides six methods for inserting data
-into the database:
-
-#. ``init_rtype_table(SubjEtype, r_type, ObjEtype)``, which specifies the
- creation of the tables associated to the relation types in the database.
-   Each such table has three columns: the type of the subject entity, the
- type of the relation (that is, the name of the attribute in the subject
- entity which is defined via the relation), and the type of the object
- entity. For example::
-
- store.init_rtype_table('Person', 'lives_in', 'Location')
-
- Please note that these tables can be created before the entities, since
- they only specify their types, not their unique identifiers.
-
-#. ``create_entity(Etype, **attributes)``, which allows us to add new entities,
- whose attributes are given in the ``attributes`` dictionary.
- Please note however that, by default, this method does *not* return
- the created entity. The method is called, for example, as in::
-
- store.create_entity('Person', name='Toto', age='18', height='190',
- uri='http://link/to/person/toto_18_190')
- store.create_entity('Location', town='Paris', arrondissement='13',
- uri='http://link/to/location/paris_13')
-
- In order to be able to link these entities via the relations when needed,
- we must provide ourselves a means for uniquely identifying the entities.
- In general, this is done via URIs, stored in attributes like ``uri`` or
- ``cwuri``. The name of the attribute is irrelevant as long as its value is
- unique for each entity.
-
-#. ``relate_by_iid(subject_iid, r_type, object_iid)`` allows us to actually
- relate the entities uniquely identified by ``subject_iid`` and
- ``object_iid`` via a relation of type ``r_type``. For example::
-
- store.relate_by_iid('http://link/to/person/toto_18_190',
- 'lives_in',
- 'http://link/to/location/paris_13')
-
- Please note that this method does *not* work for inlined relations!
-
-#. ``convert_relations(SubjEtype, r_type, ObjEtype, subj_iid_attribute,
- obj_iid_attribute)``
- allows us to actually insert
- the relations in the database. At one call of this method, one inserts
-   all the relations of type ``r_type`` between entities of given types.
-   ``subj_iid_attribute`` and ``obj_iid_attribute`` are the names
- of the attributes which store the unique identifiers of the entities,
- as assigned by the user. These names can be identical, as long as
- their values are unique. For example, for inserting all relations
-   of type ``lives_in`` between ``Person`` and ``Location`` entities,
- we write::
-
- store.convert_relations('Person', 'lives_in', 'Location', 'uri', 'uri')
-
-#. ``flush()`` performs the actual commit in the database. It only needs
- to be called after ``create_entity`` and ``relate_by_iid`` calls.
- Please note that ``relate_by_iid`` does *not* perform insertions into
- the database, hence calling ``flush()`` for it would have no effect.
-
-#. ``cleanup()`` performs database cleanups, by removing temporary tables.
- It should only be called at the end of the import.
-
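-To illustrate the iid-based workflow without the ``dataio`` cube, here is a
-toy in-memory sketch: ``relate_by_iid`` merely records pending links, and
-``convert_relations`` later resolves them against the stored entities. The
-real ``MassiveObjectStore`` does all of this at the SQL level and differs in
-many details.

```python
class ToyMassiveStore:
    """In-memory sketch of the iid-based relation workflow."""
    def __init__(self):
        self.entities = []      # (etype, attributes) pairs
        self.pending = []       # (subject_iid, r_type, object_iid) triples
        self.relations = []     # resolved (subject_attrs, r_type, object_attrs)

    def create_entity(self, etype, **attributes):
        self.entities.append((etype, attributes))

    def relate_by_iid(self, subject_iid, r_type, object_iid):
        # Only recorded for now; resolved later by convert_relations().
        self.pending.append((subject_iid, r_type, object_iid))

    def convert_relations(self, subj_etype, r_type, obj_etype,
                          subj_iid_attribute, obj_iid_attribute):
        subjects = {attrs[subj_iid_attribute]: attrs
                    for etype, attrs in self.entities if etype == subj_etype}
        objects = {attrs[obj_iid_attribute]: attrs
                   for etype, attrs in self.entities if etype == obj_etype}
        for siid, rtype, oiid in self.pending:
            if rtype == r_type and siid in subjects and oiid in objects:
                self.relations.append((subjects[siid], rtype, objects[oiid]))

store = ToyMassiveStore()
store.create_entity('Person', name='Toto', uri='http://example.org/toto')
store.create_entity('Location', town='Paris', uri='http://example.org/paris')
store.relate_by_iid('http://example.org/toto', 'lives_in',
                    'http://example.org/paris')
store.convert_relations('Person', 'lives_in', 'Location', 'uri', 'uri')
```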
-
-
-.. XXX to add smth on the store's parameter initialization.
-
-
-
-Application to the Diseasome data
-+++++++++++++++++++++++++++++++++
-
-Import setup
-############
-
-We define an import function, ``diseasome_import``, which does basically four things:
-
-#. creates and initializes the store to be used, via a line such as::
-
- store = cwdi.SQLGenObjectStore(session)
-
- where ``cwdi`` is the imported ``cubicweb.dataimport`` or
- ``cubes.dataio.dataimport``.
-
-#. calls the diseasome parser, that is, the ``entities_from_rdf`` function in the
- ``diseasome_parser`` module and iterates on its result, in a line such as::
-
- for entity, relations in parser.entities_from_rdf(filename, ('gene', 'disease')):
-
- where ``parser`` is the imported ``diseasome_parser`` module, and ``filename`` is the
- name of the file containing the data (with its path), e.g. ``../data/diseasome_dump.nt``.
-
-#. creates the entities to be inserted in the database; for Diseasome, there are two
- kinds of entities:
-
- #. entities defined in the data model, viz. ``Gene`` and ``Disease`` in our case.
- #. entities which are built in CubicWeb / Yams, viz. ``ExternalUri`` which define
- URIs.
-
- As we are working with RDF data, each entity is defined through a series of URIs. Hence,
- each "relational attribute" [#]_ of an entity is defined via an URI, that is, in CubicWeb
- terms, via an ``ExternalUri`` entity. The entities are created, in the loop presented above,
- as such::
-
- ent = store.create_entity(etype, **entity)
-
- where ``etype`` is the appropriate entity type, either ``Gene`` or ``Disease``.
-
-.. [#] By "relational attribute" we denote an attribute (of an entity) which
- is defined through a relation, e.g. the ``chromosomal_location`` attribute
- of ``Disease`` entities, which is defined through a relation between a
- ``Disease`` and an ``ExternalUri``.
-
-   There are as many ``ExternalUri`` entities as there are URIs in the data file. For them, we define a unique
- attribute, ``uri``, which holds the URI under discussion::
-
- extu = store.create_entity('ExternalUri', uri="http://path/of/the/uri")
-
-#. creates the relations between the entities. We have relations between:
-
- #. entities defined in the schema, e.g. between ``Disease`` and ``Gene``
- entities, such as the ``associated_genes`` relation defined for
- ``Disease`` entities.
- #. entities defined in the schema and ``ExternalUri`` entities, such as ``gene_id``.
-
- The way relations are added to the database depends on the store:
-
- - for the stores in the CubicWeb ``dataimport`` module, we only use
- ``store.relate``, in
- another loop, on the relations (that is, a
- loop inside the preceding one, mentioned at step 2)::
-
- for rtype, rels in relations.iteritems():
- ...
-
- store.relate(ent.eid(), rtype, extu.eid(), **kwargs)
-
- where ``kwargs`` is a dictionary designed to accommodate the need for specifying
- the type of the subject entity of the relation, when the relation is inlined and
- ``SQLGenObjectStore`` is used. For example::
-
- ...
- store.relate(ent.eid(), 'chromosomal_location', extu.eid(), subjtype='Disease')
-
- - for the ``MassiveObjectStore`` in the ``dataio`` cube's ``dataimport`` module,
- the relations are created in three steps:
-
- #. first, a table is created for each relation type, as in::
-
- ...
- store.init_rtype_table(ent.cw_etype, rtype, extu.cw_etype)
-
- which comes down to lines such as::
-
- store.init_rtype_table('Disease', 'associated_genes', 'Gene')
- store.init_rtype_table('Gene', 'gene_id', 'ExternalUri')
-
- #. second, the URI of each entity will be used as its identifier, in the
- ``relate_by_iid`` method, such as::
-
-        disease_uri = 'http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseases/3'
-        gene_uri = 'http://www4.wiwiss.fu-berlin.de/diseasome/resource/genes/HSD3B2'
-        store.relate_by_iid(disease_uri, 'associated_genes', gene_uri)
-
- #. third, the relations for each relation type will be added to the database,
- via the ``convert_relations`` method, such as in::
-
- store.convert_relations('Disease', 'associated_genes', 'Gene', 'cwuri', 'cwuri')
-
- and::
-
- store.convert_relations('Gene', 'hgnc_id', 'ExternalUri', 'cwuri', 'uri')
-
- where ``cwuri`` and ``uri`` are the attributes which store the URIs of the entities
- defined in the data model, and of the ``ExternalUri`` entities, respectively.
-
-#. flushes all relations and entities::
-
- store.flush()
-
- which performs the actual commit of the inserted entities and relations in the database.
-
-If the ``MassiveObjectStore`` is used, then a cleanup of temporary SQL tables should be performed
-at the end of the import::
-
- store.cleanup()
-
-Timing benchmarks
-#################
-
-In order to time the import script, we just decorate the import function with the ``timed``
-decorator::
-
- from logilab.common.decorators import timed
- ...
-
- @timed
- def diseasome_import(session, filename):
- ...
-
-After running the import function as shown in the "Importing the data" section, we obtain two time measurements::
-
- diseasome_import clock: ... / time: ...
-
-Here, the meanings of these measurements are [#]_:
-
-- ``clock`` is the time spent by CubicWeb, on the server side (i.e. hooks and data pre- / post-processing on SQL
- queries),
-
-- ``time`` is the sum of ``clock`` and the time spent in PostgreSQL.
-
-.. [#] The meanings of the ``clock`` and ``time`` measurements, when using the ``@timed``
- decorators, were taken from `a blog post on massive data import in CubicWeb`_.
-
-.. _a blog post on massive data import in CubicWeb: http://www.cubicweb.org/blogentry/2116712
-
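-For readers without ``logilab.common`` at hand, a minimal stand-alone
-decorator in the same spirit as ``@timed`` can be sketched as follows. The
-printed line mimics the ``clock`` / ``time`` output described above (CPU time
-versus total elapsed time); the real implementation differs in its details.

```python
import functools
import time

def timed(func):
    """Print CPU and wall-clock time spent in the decorated function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        cpu_start, wall_start = time.process_time(), time.time()
        result = func(*args, **kwargs)
        print('%s clock: %.6f / time: %.6f'
              % (func.__name__, time.process_time() - cpu_start,
                 time.time() - wall_start))
        return result
    return wrapper

@timed
def busy_loop(n):
    # Some CPU-bound work so the timings are non-trivial.
    return sum(i * i for i in range(n))

total = busy_loop(100000)
```
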
-The import function is put in an import module, named ``diseasome_import`` here. The module is called
-directly from the CubicWeb shell, as follows::
-
- cubicweb-ctl shell diseasome_instance diseasome_import.py \
- -- -df diseasome_import_file.nt -st StoreName
-
-The module accepts two arguments:
-
-- the data file, introduced by ``-df [--datafile]``, and
-- the store, introduced by ``-st [--store]``.
-
-The timings (in seconds) for different stores are given in the following table, for
-importing 4213 ``Disease`` entities and 3919 ``Gene`` entities with the import module
-just described:
-
-+--------------------------+------------------------+--------------------------------+------------+
-| Store | CubicWeb time (clock) | PostGreSQL time (time - clock) | Total time |
-+==========================+========================+================================+============+
-| ``RQLObjectStore`` | 225.98 | 62.05 | 288.03 |
-+--------------------------+------------------------+--------------------------------+------------+
-| ``NoHookRQLObjectStore`` | 62.73 | 51.38 | 114.11 |
-+--------------------------+------------------------+--------------------------------+------------+
-| ``SQLGenObjectStore`` | 20.41 | 11.03 | 31.44 |
-+--------------------------+------------------------+--------------------------------+------------+
-| ``MassiveObjectStore`` | 4.84 | 6.93 | 11.77 |
-+--------------------------+------------------------+--------------------------------+------------+
-
-
-Conclusions
-~~~~~~~~~~~
-
-In this tutorial we have seen how to import data in a CubicWeb application instance. We have first seen how to
-create a schema, then how to create a parser of the data and a mapping of the data to the schema.
-Finally, we have seen four ways of importing data into CubicWeb.
-
-Three of those are integrated into CubicWeb, namely the ``RQLObjectStore``, ``NoHookRQLObjectStore`` and
-``SQLGenObjectStore`` stores, which have a common API:
-
-- ``RQLObjectStore`` is by far the slowest, especially its time spent on the
- CubicWeb side, and so it should be used only for small amounts of
- "sensitive" data (i.e. where security is a concern).
-
-- ``NoHookRQLObjectStore`` cuts the time spent on the CubicWeb side by almost a factor of four,
-  but remains quite slow; on the PostgreSQL side it is as slow as the previous store.
- It should be used for data where security is not a concern,
- but consistency (with the data model) is.
-
-- ``SQLGenObjectStore`` cuts the time spent on the CubicWeb side by a factor of three and the time
-  spent on the PostgreSQL side by a factor of five. It should be used for relatively large amounts of data, where
- security and data consistency are not a concern. Compared to the previous store, it has the
- disadvantage that, for inlined relations, we must specify their subjects' types.
-
-For really huge amounts of data there is a fourth store, ``MassiveObjectStore``, available
-from the ``dataio`` cube. It provides blazing performance compared to all the other stores:
-it is almost 25 times faster than ``RQLObjectStore`` and almost three times faster than
-``SQLGenObjectStore``. However, it has a few usage caveats that should be taken into account:
-
-#. it cannot insert relations defined as inlined in the schema,
-#. no security or consistency check is performed on the data,
-#. its API is slightly different from that of the other stores.
-
-Hence, this store should be used when security and data consistency are not a concern,
-and there are no inlined relations in the schema.
-
-
-
-
-
-
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/tutorials/dataimport/index.rst Thu Jul 02 19:54:25 2015 +0200
@@ -0,0 +1,646 @@
+Importing relational data into a CubicWeb instance
+==================================================
+
+Introduction
+~~~~~~~~~~~~
+
+This tutorial explains how to import data from an external source (e.g. a collection of files)
+into a CubicWeb cube instance.
+
+First, once we know the format of the data we wish to import, we devise a
+*data model*, that is, a CubicWeb (Yams) schema which reflects the way the data
+is structured. This schema is implemented in the ``schema.py`` file.
+In this tutorial, we will describe such a schema for a particular data set,
+the Diseasome data (see below).
+
+Once the schema is defined, we create a cube and an instance.
+The cube is a specification of an application, whereas an instance
+is the application per se.
+
+Once the schema is defined and the instance is created, the import can be performed, via
+the following steps:
+
+1. Build a custom parser for the data to be imported. Thus, one obtains a Python
+ memory representation of the data.
+
+2. Map the parsed data to the data model defined in ``schema.py``.
+
+3. Perform the actual import of the data. This comes down to "populating"
+ the data model with the memory representation obtained at 1, according to
+ the mapping defined at 2.
+
+This tutorial illustrates all the above steps in the context of relational data
+stored in the RDF format.
+
+More specifically, we describe the import of Diseasome_ RDF/OWL data.
+
+.. _Diseasome: http://datahub.io/dataset/fu-berlin-diseasome
+
+Building a data model
+~~~~~~~~~~~~~~~~~~~~~
+
+The first thing to do when using CubicWeb for creating an application from scratch
+is to devise a *data model*, that is, a relational representation of the problem to be
+modeled or of the structure of the data to be imported.
+
+In such a schema, we define
+an entity type (``EntityType`` objects) for each type of entity to import. Each such type
+has several attributes. If the attributes are of known CubicWeb (Yams) types, viz. numbers,
+strings or characters, then they are defined as attributes, as e.g. ``attribute = Int()``
+for an attribute named ``attribute`` which is an integer.
+
+Each such type also has a set of
+relations, which are defined like the attributes, except that they represent, in fact,
+relations between the entities of the type under discussion and the objects of a type which
+is specified in the relation definition.
+
+For example, for the Diseasome data, we have two types of entities, genes and diseases.
+Thus, we create two classes which inherit from ``EntityType``::
+
+ class Disease(EntityType):
+ # Corresponds to http://www.w3.org/2000/01/rdf-schema#label
+ label = String(maxsize=512, fulltextindexed=True)
+ ...
+
+        # Corresponds to http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseasome/associatedGene
+ associated_genes = SubjectRelation('Gene', cardinality='**')
+ ...
+
+        # Corresponds to http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseasome/chromosomalLocation
+ chromosomal_location = SubjectRelation('ExternalUri', cardinality='?*', inlined=True)
+
+
+ class Gene(EntityType):
+ ...
+
+In this schema, there are attributes whose values are numbers or strings. Thus, they are
+defined by using the CubicWeb / Yams primitive types, e.g., ``label = String(maxsize=512)``.
+These types can have several constraints or attributes, such as ``maxsize``.
+There are also relations, either between the entity types themselves, or between them
+and a CubicWeb type, ``ExternalUri``. The latter defines a class of URI objects in
+CubicWeb. For instance, the ``chromosomal_location`` attribute is a relation between
+a ``Disease`` entity and an ``ExternalUri`` entity. The relation is marked by the CubicWeb /
+Yams ``SubjectRelation`` method. The latter can have several optional keyword arguments, such as
+``cardinality`` which specifies the number of subjects and objects related by the relation type
+specified. For example, the ``'?*'`` cardinality in the ``chromosomal_location`` relation type says
+that zero or more ``Disease`` entities are related to zero or one ``ExternalUri`` entities.
+In other words, a ``Disease`` entity is related to at most one ``ExternalUri`` entity via the
+``chromosomal_location`` relation type, and we can have zero or more ``Disease`` entities in the
+database.
+For a relation between the entity types themselves, the ``associated_genes`` between a ``Disease``
+entity and a ``Gene`` entity is defined, so that any number of ``Gene`` entities can be associated
+to a ``Disease``, and there can be any number of ``Disease`` entities for a given ``Gene``.
+
+Of course, before being able to use the CubicWeb / Yams built-in objects, we need to import them::
+
+
+ from yams.buildobjs import EntityType, SubjectRelation, String, Int
+ from cubicweb.schemas.base import ExternalUri
+
+Building a custom data parser
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The data we wish to import is structured in the RDF format,
+as a text file containing a set of lines.
+On each line, there are three fields.
+The first two fields are URIs ("Uniform Resource Identifiers").
+The third field is either a URI or a string. Each field bears a particular meaning:
+
+- the leftmost field is a URI that holds the entity to be imported.
+ Note that the entities defined in the data model (i.e., in ``schema.py``) should
+ correspond to the entities whose URIs are specified in the import file.
+
+- the middle field is a URI that holds a relation whose subject is the entity
+ defined by the leftmost field. Note that this should also correspond
+ to the definitions in the data model.
+
+- the rightmost field is either a URI or a string. When this field is a URI,
+ it gives the object of the relation defined by the middle field.
+ When the rightmost field is a string, the middle field is interpreted as an attribute
+ of the subject (introduced by the leftmost field) and the rightmost field is
+ interpreted as the value of the attribute.
+
+Note however that some attributes (i.e. relations whose objects are strings)
+have their objects defined as strings followed by ``^^`` and by another URI;
+we ignore this part.
+
+Let us show some examples:
+
+- a line holding an attribute definition:
+ ``<http://www4.wiwiss.fu-berlin.de/diseasome/resource/genes/CYP17A1>
+ <http://www.w3.org/2000/01/rdf-schema#label> "CYP17A1" .``
+ The line contains the definition of the ``label`` attribute of an
+ entity of type ``gene``. The value of ``label`` is '``CYP17A1``'.
+
+- a line holding a relation definition:
+ ``<http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseases/1>
+ <http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseasome/associatedGene>
+ <http://www4.wiwiss.fu-berlin.de/diseasome/resource/genes/HADH2> .``
+ The line contains the definition of the ``associatedGene`` relation between
+ a ``disease`` subject entity identified by ``1`` and a ``gene`` object
+ entity defined by ``HADH2``.
+
+Thus, for parsing the data, we can (see the ``diseasome_parser`` module):
+
+1. define a couple of regular expressions for parsing the two kinds of lines,
+ ``RE_ATTS`` for parsing the attribute definitions, and ``RE_RELS`` for parsing
+ the relation definitions.
+
+2. define a function that iterates through the lines of the file and yields
+   a (subject, relation, object) tuple for each line.
+ We called it ``_retrieve_structure`` in the ``diseasome_parser`` module.
+ The function needs the file name and the types for which information
+ should be retrieved.
+
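+The two steps above can be sketched as a small self-contained parser. This is
+purely illustrative: the ``RE_ATTS`` / ``RE_RELS`` names come from the
+tutorial, but the exact patterns and the example URIs are assumptions, not
+the actual ``diseasome_parser`` code.

```python
import re

# Hypothetical patterns: attribute lines end with a quoted string (possibly
# followed by a ^^<datatype> marker, which we ignore), relation lines end
# with a third URI.
RE_ATTS = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+"([^"]*)"(?:\^\^<[^>]+>)?\s*\.$')
RE_RELS = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.$')

def retrieve_structure(lines):
    """Yield a (subject, relation, object) tuple for each parsable line."""
    for line in lines:
        match = RE_RELS.match(line.strip()) or RE_ATTS.match(line.strip())
        if match:
            yield match.groups()

lines = [
    '<http://example.org/genes/CYP17A1> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "CYP17A1" .',
    '<http://example.org/diseases/1> '
    '<http://example.org/diseasome/associatedGene> '
    '<http://example.org/genes/HADH2> .',
]
triples = list(retrieve_structure(lines))
```

+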
+Alternatively, instead of hand-making the parser, one could use the RDF parser provided
+in the ``dataio`` cube.
+
+.. XXX To further study and detail the ``dataio`` cube usage.
+
+Once we get to have the (subject, relation, object) triples, we need to map them into
+the data model.
+
+
+Mapping the data to the schema
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In the case of diseasome data, we can just define two dictionaries for mapping
+the names of the relations as extracted by the parser, to the names of the relations
+as defined in the ``schema.py`` data model. In the ``diseasome_parser`` module
+they are called ``MAPPING_ATTS`` and ``MAPPING_RELS``.
+Given that the relation and attribute names are given in CamelCase in the original data,
+mappings are necessary if we follow PEP 8 when naming the attributes in the data model.
+For example, the RDF relation ``chromosomalLocation`` is mapped into the schema relation
+``chromosomal_location``.
+
+Once these mappings have been defined, we just iterate over the (subject, relation, object)
+tuples provided by the parser and we extract the entities, with their attributes and relations.
+For each entity, we thus have a dictionary with two keys, ``attributes`` and ``relations``.
+The value associated with the ``attributes`` key is a dictionary containing (attribute: value)
+pairs, where "value" is a string, plus the ``cwuri`` key / attribute holding the URI of
+the entity itself.
+The value associated with the ``relations`` key is a dictionary containing (relation: value)
+pairs, where "value" is a URI.
+This is implemented in the ``entities_from_rdf`` interface function of the module
+``diseasome_parser``. This function provides an iterator on the dictionaries containing
+the ``attributes`` and ``relations`` keys for all entities.
+
+However, this is a simple case. In real life, things can get much more complicated, and the
+mapping can be far from trivial, especially when several data sources (which can follow
+different formatting and even structuring conventions) must be mapped into the same data model.
+
+Importing the data
+~~~~~~~~~~~~~~~~~~
+
+The data import code should be placed in a Python module. Let us call it
+``diseasome_import.py``. Then, this module should be called via
+``cubicweb-ctl``, as follows::
+
+    cubicweb-ctl shell <instance> diseasome_import.py -- <other arguments, e.g. data file>
+
+In the import module, we should use a *store* for doing the import.
+A store is an object which provides three kinds of methods for
+importing data:
+
+- a method for importing the entities, along with the values
+ of their attributes.
+- a method for importing the relations between the entities.
+- a method for committing the imports to the database.
+
+In CubicWeb, we have four stores:
+
+1. ``ObjectStore``: the base class for the stores in CubicWeb.
+ It only provides a skeleton for all other stores and
+ provides the means for creating the memory structures
+ (dictionaries) that hold the entities and the relations
+ between them.
+
+2. ``RQLObjectStore``: store which uses the RQL language for performing
+ database insertions and updates. It relies on all the CubicWeb hooks
+ machinery, especially for dealing with security issues (database access
+ permissions).
+
+3. ``NoHookRQLObjectStore``: store which uses the RQL language for
+ performing database insertions and updates, but for which
+ all hooks are deactivated. This implies that
+ certain checks with respect to the CubicWeb / Yams schema
+ (data model) are not performed. However, all SQL queries
+ obtained from the RQL ones are executed in a sequential
+ manner, one query per inserted entity.
+
+4. ``SQLGenObjectStore``: store which uses the SQL language directly.
+ It inserts entities either sequentially, by executing an SQL query
+   for each entity, or directly by using one PostgreSQL ``COPY FROM``
+ query for a set of similarly structured entities.
+
+For really massive imports (millions or billions of entities), there
+is a cube ``dataio`` which contains another store, called
+``MassiveObjectStore``. This store is similar to ``SQLGenObjectStore``,
+except that anything related to CubicWeb is bypassed. That is, even the
+CubicWeb EID entity identifiers are not handled. This store is the fastest,
+but has a slightly different API from the other four stores mentioned above.
+Moreover, it has an important limitation, in that it doesn't insert inlined [#]_
+relations in the database.
+
+.. [#] An inlined relation is a relation defined in the schema
+ with the keyword argument ``inlined=True``. Such a relation
+ is inserted in the database as an attribute of the entity
+ whose subject it is.
+
+In the following section we will see how to import data by using the stores
+in CubicWeb's ``dataimport`` module.
+
+Using the stores in ``dataimport``
+++++++++++++++++++++++++++++++++++
+
+``ObjectStore`` is seldom used in real life for importing data, since it is
+only the base store for the other stores and it doesn't perform an actual
+import of the data. Nevertheless, the other three stores, which import data,
+are based on ``ObjectStore`` and provide the same API.
+
+All three stores ``RQLObjectStore``, ``NoHookRQLObjectStore`` and
+``SQLGenObjectStore`` provide exactly the same API for importing data, that is
+entities and relations, in an SQL database.
+
+Before using a store, one must import the ``dataimport`` module and then initialize
+the store, with the current ``session`` as a parameter::
+
+ import cubicweb.dataimport as cwdi
+ ...
+
+ store = cwdi.RQLObjectStore(session)
+
+Each such store provides three methods for data import:
+
+#. ``create_entity(Etype, **attributes)``, which allows us to add
+ an entity of the Yams type ``Etype`` to the database. This entity's attributes
+ are specified in the ``attributes`` dictionary. The method returns the entity
+ created in the database. For example, we add two entities,
+ a person, of ``Person`` type, and a location, of ``Location`` type::
+
+ person = store.create_entity('Person', name='Toto', age='18', height='190')
+
+ location = store.create_entity('Location', town='Paris', arrondissement='13')
+
+#. ``relate(subject_eid, r_type, object_eid)``, which allows us to add a relation
+ of the Yams type ``r_type`` to the database. The relation's subject is an entity
+ whose EID is ``subject_eid``; its object is another entity, whose EID is
+ ``object_eid``. For example [#]_::
+
+ store.relate(person.eid(), 'lives_in', location.eid(), **kwargs)
+
+ ``kwargs`` is only used by the ``SQLGenObjectStore``'s ``relate`` method and is here
+ to allow us to specify the type of the subject of the relation, when the relation is
+ defined as inlined in the schema.
+
+.. [#] The ``eid`` method of an entity defined via ``create_entity`` returns
+ the entity identifier as assigned by CubicWeb when creating the entity.
+ This only works for entities defined via the stores in the CubicWeb's
+ ``dataimport`` module.
+
+ The keyword argument that is understood by ``SQLGenObjectStore`` is called
+ ``subjtype`` and holds the type of the subject entity. For the example considered here,
+    this amounts to [#]_::
+
+ store.relate(person.eid(), 'lives_in', location.eid(), subjtype=person.cw_etype)
+
+ If ``subjtype`` is not specified, then the store tries to infer the type of the subject.
+ However, this doesn't always work, e.g. when there are several possible subject types
+ for a given relation type.
+
+.. [#] The ``cw_etype`` attribute of an entity defined via ``create_entity`` holds
+ the type of the entity just created. This only works for entities defined via
+ the stores in the CubicWeb's ``dataimport`` module. In the example considered
+ here, ``person.cw_etype`` holds ``'Person'``.
+
+ All the other stores but ``SQLGenObjectStore`` ignore the ``kwargs`` parameters.
+
+#. ``flush()``, which allows us to perform the actual commit into the database, along
+ with some cleanup operations. Ideally, this method should be called as often as
+ possible, that is after each insertion in the database, so that database sessions
+   are kept as atomic as possible. In practice, we usually call this method twice:
+   first after all the entities have been created, then again after all the relations
+   have been created.
+
+ Note however that before each commit the database insertions
+ have to be consistent with the schema. Thus, if, for instance,
+ an entity has an attribute defined through a relation (viz.
+ a ``SubjectRelation``) with a ``"1"`` or ``"+"`` object
+ cardinality, we have to create the entity under discussion,
+ the object entity of the relation under discussion, and the
+ relation itself, before committing the additions to the database.
+
+    The ``flush`` method is simply called as::
+
+        store.flush()
+
+
+Using the ``MassiveObjectStore`` in the ``dataio`` cube
++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+
+This store, available in the ``dataio`` cube, allows us to
+fully dispense with the CubicWeb import mechanisms and hence
+to interact directly with the database server, via SQL queries.
+
+Moreover, these queries rely on PostgreSQL's ``COPY FROM`` instruction
+to create several entities in a single query. This brings tremendous
+performance improvements with respect to the RQL-based data insertion
+procedures.
+
+However, the API of this store is slightly different from the API of
+the stores in CubicWeb's ``dataimport`` module.
+
+Before using the store, one has to import the ``dataio`` cube's
+``dataimport`` module, then initialize the store by giving it the
+``session`` parameter::
+
+ from cubes.dataio import dataimport as mcwdi
+ ...
+
+ store = mcwdi.MassiveObjectStore(session)
+
+The ``MassiveObjectStore`` provides six methods for inserting data
+into the database:
+
+#. ``init_rtype_table(SubjEtype, r_type, ObjEtype)``, which specifies the
+   creation of the tables associated with the relation types in the database.
+   Each such table has three columns: the type of the subject entity, the
+ type of the relation (that is, the name of the attribute in the subject
+ entity which is defined via the relation), and the type of the object
+ entity. For example::
+
+ store.init_rtype_table('Person', 'lives_in', 'Location')
+
+ Please note that these tables can be created before the entities, since
+ they only specify their types, not their unique identifiers.
+
+#. ``create_entity(Etype, **attributes)``, which allows us to add new entities,
+ whose attributes are given in the ``attributes`` dictionary.
+ Please note however that, by default, this method does *not* return
+ the created entity. The method is called, for example, as in::
+
+ store.create_entity('Person', name='Toto', age='18', height='190',
+ uri='http://link/to/person/toto_18_190')
+ store.create_entity('Location', town='Paris', arrondissement='13',
+ uri='http://link/to/location/paris_13')
+
+   In order to be able to link these entities via relations when needed,
+   we must ourselves provide a means of uniquely identifying the entities.
+ In general, this is done via URIs, stored in attributes like ``uri`` or
+ ``cwuri``. The name of the attribute is irrelevant as long as its value is
+ unique for each entity.
+
+#. ``relate_by_iid(subject_iid, r_type, object_iid)`` allows us to actually
+ relate the entities uniquely identified by ``subject_iid`` and
+ ``object_iid`` via a relation of type ``r_type``. For example::
+
+ store.relate_by_iid('http://link/to/person/toto_18_190',
+ 'lives_in',
+ 'http://link/to/location/paris_13')
+
+ Please note that this method does *not* work for inlined relations!
+
+#. ``convert_relations(SubjEtype, r_type, ObjEtype, subj_iid_attribute,
+ obj_iid_attribute)``
+ allows us to actually insert
+   the relations in the database. A single call of this method inserts
+   all the relations of type ``r_type`` between entities of the given types.
+   ``subj_iid_attribute`` and ``obj_iid_attribute`` are the names
+ of the attributes which store the unique identifiers of the entities,
+ as assigned by the user. These names can be identical, as long as
+ their values are unique. For example, for inserting all relations
+   of type ``lives_in`` between ``Person`` and ``Location`` entities,
+ we write::
+
+ store.convert_relations('Person', 'lives_in', 'Location', 'uri', 'uri')
+
+#. ``flush()`` performs the actual commit in the database. It only needs
+ to be called after ``create_entity`` and ``relate_by_iid`` calls.
+ Please note that ``relate_by_iid`` does *not* perform insertions into
+ the database, hence calling ``flush()`` for it would have no effect.
+
+#. ``cleanup()`` performs database cleanups, by removing temporary tables.
+ It should only be called at the end of the import.
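+
+Putting these methods together, a complete ``MassiveObjectStore`` import session
+follows this shape (a sketch reusing the ``Person`` / ``Location`` example above,
+and assuming ``lives_in`` is not an inlined relation)::
+
+    store.init_rtype_table('Person', 'lives_in', 'Location')
+    store.create_entity('Person', name='Toto', age='18', height='190',
+                        uri='http://link/to/person/toto_18_190')
+    store.create_entity('Location', town='Paris', arrondissement='13',
+                        uri='http://link/to/location/paris_13')
+    store.relate_by_iid('http://link/to/person/toto_18_190', 'lives_in',
+                        'http://link/to/location/paris_13')
+    store.flush()
+    store.convert_relations('Person', 'lives_in', 'Location', 'uri', 'uri')
+    store.cleanup()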
+
+
+
+.. XXX to add smth on the store's parameter initialization.
+
+
+
+Application to the Diseasome data
++++++++++++++++++++++++++++++++++
+
+Import setup
+############
+
+We define an import function, ``diseasome_import``, which basically does five things:
+
+#. creates and initializes the store to be used, via a line such as::
+
+ store = cwdi.SQLGenObjectStore(session)
+
+ where ``cwdi`` is the imported ``cubicweb.dataimport`` or
+ ``cubes.dataio.dataimport``.
+
+#. calls the diseasome parser, that is, the ``entities_from_rdf`` function in the
+ ``diseasome_parser`` module and iterates on its result, in a line such as::
+
+ for entity, relations in parser.entities_from_rdf(filename, ('gene', 'disease')):
+
+ where ``parser`` is the imported ``diseasome_parser`` module, and ``filename`` is the
+ name of the file containing the data (with its path), e.g. ``../data/diseasome_dump.nt``.
+
+#. creates the entities to be inserted in the database; for Diseasome, there are two
+ kinds of entities:
+
+ #. entities defined in the data model, viz. ``Gene`` and ``Disease`` in our case.
+ #. entities which are built in CubicWeb / Yams, viz. ``ExternalUri`` which define
+ URIs.
+
+   As we are working with RDF data, each entity is defined through a series of URIs. Hence,
+   each "relational attribute" [#]_ of an entity is defined via a URI, that is, in CubicWeb
+   terms, via an ``ExternalUri`` entity. The entities are created, in the loop presented above,
+ as such::
+
+ ent = store.create_entity(etype, **entity)
+
+ where ``etype`` is the appropriate entity type, either ``Gene`` or ``Disease``.
+
+.. [#] By "relational attribute" we denote an attribute (of an entity) which
+ is defined through a relation, e.g. the ``chromosomal_location`` attribute
+ of ``Disease`` entities, which is defined through a relation between a
+ ``Disease`` and an ``ExternalUri``.
+
+    There are as many ``ExternalUri`` entities as there are URIs in the data file. For each
+    of them, we define a unique attribute, ``uri``, which holds the URI under discussion::
+
+ extu = store.create_entity('ExternalUri', uri="http://path/of/the/uri")
+
+#. creates the relations between the entities. We have relations between:
+
+ #. entities defined in the schema, e.g. between ``Disease`` and ``Gene``
+ entities, such as the ``associated_genes`` relation defined for
+ ``Disease`` entities.
+ #. entities defined in the schema and ``ExternalUri`` entities, such as ``gene_id``.
+
+ The way relations are added to the database depends on the store:
+
+   - for the stores in the CubicWeb ``dataimport`` module, we simply call
+     ``store.relate`` in a second loop over the relations (that is, a loop
+     nested inside the one mentioned at step 2)::
+
+ for rtype, rels in relations.iteritems():
+ ...
+
+ store.relate(ent.eid(), rtype, extu.eid(), **kwargs)
+
+ where ``kwargs`` is a dictionary designed to accommodate the need for specifying
+ the type of the subject entity of the relation, when the relation is inlined and
+ ``SQLGenObjectStore`` is used. For example::
+
+ ...
+ store.relate(ent.eid(), 'chromosomal_location', extu.eid(), subjtype='Disease')
+
+ - for the ``MassiveObjectStore`` in the ``dataio`` cube's ``dataimport`` module,
+ the relations are created in three steps:
+
+ #. first, a table is created for each relation type, as in::
+
+ ...
+ store.init_rtype_table(ent.cw_etype, rtype, extu.cw_etype)
+
+ which comes down to lines such as::
+
+ store.init_rtype_table('Disease', 'associated_genes', 'Gene')
+ store.init_rtype_table('Gene', 'gene_id', 'ExternalUri')
+
+ #. second, the URI of each entity will be used as its identifier, in the
+ ``relate_by_iid`` method, such as::
+
+ disease_uri = 'http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseases/3'
+         gene_uri = 'http://www4.wiwiss.fu-berlin.de/diseasome/resource/genes/HSD3B2'
+ store.relate_by_iid(disease_uri, 'associated_genes', gene_uri)
+
+ #. third, the relations for each relation type will be added to the database,
+ via the ``convert_relations`` method, such as in::
+
+ store.convert_relations('Disease', 'associated_genes', 'Gene', 'cwuri', 'cwuri')
+
+ and::
+
+        store.convert_relations('Gene', 'gene_id', 'ExternalUri', 'cwuri', 'uri')
+
+ where ``cwuri`` and ``uri`` are the attributes which store the URIs of the entities
+ defined in the data model, and of the ``ExternalUri`` entities, respectively.
+
+#. flushes all relations and entities::
+
+ store.flush()
+
+ which performs the actual commit of the inserted entities and relations in the database.
+
+If the ``MassiveObjectStore`` is used, then a cleanup of temporary SQL tables should be performed
+at the end of the import::
+
+ store.cleanup()
+
+Timing benchmarks
+#################
+
+In order to time the import script, we just decorate the import function with the ``timed``
+decorator::
+
+ from logilab.common.decorators import timed
+ ...
+
+ @timed
+ def diseasome_import(session, filename):
+ ...
+
+After running the import function as shown in the "Importing the data" section, we obtain two time measurements::
+
+ diseasome_import clock: ... / time: ...
+
+Here, the meanings of these measurements are [#]_:
+
+- ``clock`` is the time spent by CubicWeb, on the server side (i.e. in hooks and in data
+  pre- / post-processing around the SQL queries),
+
+- ``time`` is the sum of ``clock`` and the time spent in PostgreSQL.
+
+.. [#] The meanings of the ``clock`` and ``time`` measurements, when using the ``@timed``
+ decorators, were taken from `a blog post on massive data import in CubicWeb`_.
+
+.. _a blog post on massive data import in CubicWeb: http://www.cubicweb.org/blogentry/2116712
+
+The import function is put in an import module, named ``diseasome_import`` here. The module is called
+directly from the CubicWeb shell, as follows::
+
+ cubicweb-ctl shell diseasome_instance diseasome_import.py \
+ -- -df diseasome_import_file.nt -st StoreName
+
+The module accepts two arguments:
+
+- the data file, introduced by ``-df [--datafile]``, and
+- the store, introduced by ``-st [--store]``.
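+
+Inside the script, these trailing arguments can be parsed, for instance, with
+``argparse`` (a sketch; in a ``cubicweb-ctl shell`` script the arguments given
+after ``--`` are available in the ``__args__`` variable, and the exact signature
+of ``diseasome_import`` is the one you defined above)::
+
+    import argparse
+
+    arg_parser = argparse.ArgumentParser()
+    arg_parser.add_argument('-df', '--datafile', required=True)
+    arg_parser.add_argument('-st', '--store', default='RQLObjectStore')
+    args = arg_parser.parse_args(__args__)
+
+    # `session` is provided by the cubicweb-ctl shell environment
+    diseasome_import(session, args.datafile, args.store)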
+
+The timings (in seconds) for different stores are given in the following table, for
+importing 4213 ``Disease`` entities and 3919 ``Gene`` entities with the import module
+just described:
+
++--------------------------+------------------------+--------------------------------+------------+
+| Store                    | CubicWeb time (clock)  | PostgreSQL time (time - clock) | Total time |
++==========================+========================+================================+============+
+| ``RQLObjectStore`` | 225.98 | 62.05 | 288.03 |
++--------------------------+------------------------+--------------------------------+------------+
+| ``NoHookRQLObjectStore`` | 62.73 | 51.38 | 114.11 |
++--------------------------+------------------------+--------------------------------+------------+
+| ``SQLGenObjectStore`` | 20.41 | 11.03 | 31.44 |
++--------------------------+------------------------+--------------------------------+------------+
+| ``MassiveObjectStore`` | 4.84 | 6.93 | 11.77 |
++--------------------------+------------------------+--------------------------------+------------+
+
+
+Conclusions
+~~~~~~~~~~~
+
+In this tutorial we have seen how to import data in a CubicWeb application instance. We have first seen how to
+create a schema, then how to create a parser of the data and a mapping of the data to the schema.
+Finally, we have seen four ways of importing data into CubicWeb.
+
+Three of those are integrated into CubicWeb, namely the ``RQLObjectStore``, ``NoHookRQLObjectStore`` and
+``SQLGenObjectStore`` stores, which have a common API:
+
+- ``RQLObjectStore`` is by far the slowest, especially its time spent on the
+ CubicWeb side, and so it should be used only for small amounts of
+ "sensitive" data (i.e. where security is a concern).
+
+- ``NoHookRQLObjectStore`` cuts the time spent on the CubicWeb side by almost a factor
+  of four, but is still quite slow; on the PostgreSQL side it is as slow as the previous store.
+ It should be used for data where security is not a concern,
+ but consistency (with the data model) is.
+
+- ``SQLGenObjectStore`` cuts the time spent on the CubicWeb side by a factor of three and the
+  time spent on the PostgreSQL side by almost a factor of five. It should be used for relatively
+  large amounts of data, where security and data consistency are not a concern. Compared to the
+  previous store, it has the disadvantage that, for inlined relations, we must specify their
+  subjects' types.
+
+For really huge amounts of data there is a fourth store, ``MassiveObjectStore``, available
+from the ``dataio`` cube. It offers blazing performance compared to all the other stores:
+it is almost 25 times faster than ``RQLObjectStore`` and almost three times faster than
+``SQLGenObjectStore``. However, it has a few usage caveats that should be taken into account:
+
+#. it cannot insert relations defined as inlined in the schema,
+#. no security or consistency check is performed on the data,
+#. its API is slightly different from that of the other stores.
+
+Hence, this store should be used when security and data consistency are not a concern,
+and there are no inlined relations in the schema.
+
+
+
+
+
+
--- a/web/views/tableview.py Tue Jun 23 17:04:40 2015 +0200
+++ b/web/views/tableview.py Thu Jul 02 19:54:25 2015 +0200
@@ -992,8 +992,10 @@
@cachedproperty
def initial_load(self):
- """We detect a bit heuristically if we are built for the first time of
- from subsequent calls by the form filter or by the pagination hooks
+ """We detect a bit heuristically if we are built for the first time or
+ from subsequent calls by the form filter or by the pagination
+ hooks.
+
"""
form = self._cw.form
return 'fromformfilter' not in form and '__start' not in form