doc/book/en/devrepo/dataimport.rst
changeset 10457 1f5026e7d848
parent 8625 7ee0752178e5
child 10460 d260722f2453
--- a/doc/book/en/devrepo/dataimport.rst	Tue Jun 23 13:08:48 2015 +0200
+++ b/doc/book/en/devrepo/dataimport.rst	Wed Jun 24 23:23:57 2015 +0200
@@ -5,29 +5,54 @@
 Dataimport
 ==========
 
-*CubicWeb* is designed to manipulate huge of amount of data, and provides helper functions to do so.
-These functions insert data within different levels of the *CubicWeb* API,
-allowing different speed/security tradeoffs. Those keeping all the *CubicWeb* hooks
-and security will be slower but the possible errors in insertion
-(bad data types, integrity error, ...) will be raised.
+*CubicWeb* is designed to manipulate huge amounts of data, and provides utilities to do so. They
+allow inserting data at different levels of the *CubicWeb* API, offering different speed/security
+tradeoffs. Utilities that keep all the *CubicWeb* hooks and security checks will be slower, but
+errors in the inserted data (bad data types, integrity errors, ...) will be raised.
 
-These dataimport function are provided in the file `dataimport.py`.
+These data import utilities are provided in the package `cubicweb.dataimport`.
 
 All the stores have the following API::
 
-    >>> store = ObjectStore()
-    >>> user = store.create_entity('CWUser', login=u'johndoe')
-    >>> group = store.create_entity('CWUser', name=u'unknown')
-    >>> store.relate(user.eid, 'in_group', group.eid)
+    >>> user_eid = store.prepare_insert_entity('CWUser', login=u'johndoe')
+    >>> group_eid = store.prepare_insert_entity('CWGroup', name=u'unknown')
+    >>> store.prepare_insert_relation(user_eid, 'in_group', group_eid)
+    >>> store.flush()
+    >>> store.commit()
+    >>> store.finish()
+
+Some stores **require a flush** to copy data into the database, so if you want store-independent
+code you should call it explicitly (there may be several flushes during the process, or a single
+one at the end if memory is not an issue). This is different from the commit, which validates the
+database transaction. Finally, the ``finish()`` method should be called in case the store requires
+additional work once everything is done (see the sketch after the method list below).
 
+* ``prepare_insert_entity(<entity type>, **kwargs) -> eid``: given an entity
+  type, attributes and inlined relations, return the eid of the entity to be
+  inserted, *with no guarantee that anything has been inserted in the database*.
+
+* ``prepare_update_entity(<entity type>, eid, **kwargs) -> None``: given an
+  entity type and eid, promise to update the given attributes and inlined
+  relations, *with no guarantee that anything has been updated in the database*.
+
+* ``prepare_insert_relation(eid_from, rtype, eid_to) -> None``: indicate that a
+  relation ``rtype`` should be added between entities with eids ``eid_from``
+  and ``eid_to``. Similar to ``prepare_insert_entity()``, *there is no
+  guarantee that the relation has been inserted in the database*.
+
+* ``flush() -> None``: flush any temporary data to the database. May be called
+  several times during an import.
+
+* ``commit() -> None``: commit the database transaction.
+
+* ``finish() -> None``: perform any remaining work once the import is finished.
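+
+For example, a store-independent import loop could look like the following sketch (``store`` is
+any of the stores described below; the ``Person`` entity type and the ``people`` iterable are
+hypothetical, used for illustration only)::
+
+    people = [{'name': u'alice'}, {'name': u'bob'}]   # illustrative data
+
+    for i, person in enumerate(people):
+        store.prepare_insert_entity('Person', name=person['name'])
+        if i % 1000 == 999:   # flush periodically to keep memory usage low
+            store.flush()
+    store.flush()    # push any remaining data to the database
+    store.commit()   # validate the database transaction
+    store.finish()   # let the store perform any final work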
 
 ObjectStore
 -----------
 
-This store keeps objects in memory for *faster* validation. It may be useful
-in development mode. However, as it will not enforce the constraints of the schema,
-it may miss some problems.
-
+This store keeps objects in memory for *faster* validation. It may be useful in development
+mode. However, as it does not enforce the schema constraints nor insert anything in the
+database, it may miss some problems.
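+
+For instance, import code can be exercised against this in-memory store before being run against a
+real database. The sketch below assumes ``ObjectStore`` is importable from `cubicweb.dataimport`
+(import path assumed)::
+
+    from cubicweb.dataimport import ObjectStore   # import path assumed
+
+    store = ObjectStore()
+    eid = store.prepare_insert_entity('CWUser', login=u'johndoe')
+    store.flush()    # called for store-independent code; may be a no-op for this store
+    store.commit()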
 
 
 RQLObjectStore
@@ -48,11 +73,3 @@
 This store relies on *COPY FROM* / ``executemany`` SQL commands to push data directly with SQL
 rather than going through the whole *CubicWeb* API. For now, **it only works with PostgreSQL** as
 it requires the *COPY FROM* command.
-
-The API is similar to the other stores, but **it requires a flush** after some imports to copy data
-in the database (these flushes may be multiples through the processes, or be done only once at the
-end if there is no memory issue)::
-
-    >>> store = SQLGenObjectStore(session)
-    >>> store.create_entity('Person', ...)
-    >>> store.flush()
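+
+The API is the same as for the other stores; a minimal sketch, assuming the import path and that
+the store is instantiated with a repository connection (``cnx``), could be::
+
+    from cubicweb.dataimport import SQLGenObjectStore   # import path assumed
+
+    store = SQLGenObjectStore(cnx)   # cnx: repository connection/session, assumed
+    eid = store.prepare_insert_entity('Person', name=u'Alice')
+    store.flush()    # required for this store: copies pending data into the database
+    store.commit()
+    store.finish()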