diff -r e7ee508a8b2f -r 1f5026e7d848 doc/book/en/devrepo/dataimport.rst
--- a/doc/book/en/devrepo/dataimport.rst	Tue Jun 23 13:08:48 2015 +0200
+++ b/doc/book/en/devrepo/dataimport.rst	Wed Jun 24 23:23:57 2015 +0200
@@ -5,29 +5,54 @@
 Dataimport
 ==========
 
-*CubicWeb* is designed to manipulate huge of amount of data, and provides helper functions to do so.
-These functions insert data within different levels of the *CubicWeb* API,
-allowing different speed/security tradeoffs. Those keeping all the *CubicWeb* hooks
-and security will be slower but the possible errors in insertion
-(bad data types, integrity error, ...) will be raised.
+*CubicWeb* is designed to manipulate huge amounts of data, and provides utilities to do so. They
+insert data at different levels of the *CubicWeb* API, allowing different
+speed/security tradeoffs. Those keeping all the *CubicWeb* hooks and security will be slower, but
+possible errors in insertion (bad data types, integrity errors, ...) will be raised.
 
-These dataimport function are provided in the file `dataimport.py`.
+These data import utilities are provided in the package `cubicweb.dataimport`.
 
 All the stores have the following API::
 
-    >>> store = ObjectStore()
-    >>> user = store.create_entity('CWUser', login=u'johndoe')
-    >>> group = store.create_entity('CWUser', name=u'unknown')
-    >>> store.relate(user.eid, 'in_group', group.eid)
+    >>> user_eid = store.prepare_insert_entity('CWUser', login=u'johndoe')
+    >>> group_eid = store.prepare_insert_entity('CWGroup', name=u'unknown')
+    >>> store.relate(user_eid, 'in_group', group_eid)
+    >>> store.flush()
+    >>> store.commit()
+    >>> store.finish()
+
+Some stores **require a flush** to copy data into the database, so if you want store-independent
+code you should call it explicitly. (There may be multiple flushes during the
+process, or only one at the end if there is no memory issue.) This is different from the
+commit, which validates the database transaction. Finally, the `finish()` method should be called
+in case the store requires additional work once everything is done.
 
+* ``prepare_insert_entity(<entity type>, **kwargs) -> eid``: given an entity
+  type, attributes and inlined relations, return the eid of the entity to be
+  inserted, *with no guarantee that anything has been inserted in database*.
+
+* ``prepare_update_entity(<entity type>, eid, **kwargs) -> None``: given an
+  entity type and eid, schedule the update of the given attributes and inlined
+  relations, *with no guarantee that anything has been updated in database*.
+
+* ``prepare_insert_relation(eid_from, rtype, eid_to) -> None``: indicate that a
+  relation ``rtype`` should be added between entities with eids ``eid_from``
+  and ``eid_to``. Similar to ``prepare_insert_entity()``, *there is no
+  guarantee that the relation has been inserted in database*.
+
+* ``flush() -> None``: flush any temporary data to database. May be called
+  several times during an import.
+
+* ``commit() -> None``: commit the database transaction.
+
+* ``finish() -> None``: perform any additional work once the import is done.
 
 ObjectStore
 -----------
 
-This store keeps objects in memory for *faster* validation. It may be useful
-in development mode. However, as it will not enforce the constraints of the schema,
-it may miss some problems.
-
+This store keeps objects in memory for *faster* validation. It may be useful in development
+mode. However, as it neither enforces the constraints of the schema nor inserts anything in the
+database, it may miss some problems.
 
 
 RQLObjectStore
@@ -48,11 +73,3 @@
 This store relies on *COPY FROM*/execute many sql commands to directly push data using SQL commands
 rather than using the whole *CubicWeb* API. For now, **it only works with PostgresSQL** as it requires
 the *COPY FROM* command.
-
-The API is similar to the other stores, but **it requires a flush** after some imports to copy data
-in the database (these flushes may be multiples through the processes, or be done only once at the
-end if there is no memory issue)::
-
-    >>> store = SQLGenObjectStore(session)
-    >>> store.create_entity('Person', ...)
-    >>> store.flush()
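
Note on the store contract documented by this patch: the sketch below is a minimal, self-contained illustration of the prepare / flush / commit / finish sequence, not code from `cubicweb.dataimport`. `BufferedStore` and `import_users` are hypothetical names invented for the example, and the "database" is just a Python list. It is only meant to show why store-independent code should call `flush()` explicitly: prepared entities and relations may sit in a buffer until then.

```python
import itertools


class BufferedStore:
    """Hypothetical in-memory store mimicking the API listed in the patch.

    This is NOT a cubicweb class; it only illustrates the documented contract.
    """

    def __init__(self):
        self._eids = itertools.count(1)  # fake eid allocator (assumption)
        self._pending = []               # prepared, but not yet flushed
        self.flushed = []                # stand-in for rows written to the database
        self.committed = False
        self.finished = False

    def prepare_insert_entity(self, etype, **kwargs):
        # Return an eid right away; nothing is "written" yet.
        eid = next(self._eids)
        self._pending.append(('entity', etype, eid, kwargs))
        return eid

    def prepare_insert_relation(self, eid_from, rtype, eid_to):
        # Relations are buffered the same way as entities.
        self._pending.append(('relation', rtype, eid_from, eid_to))

    def flush(self):
        # Move buffered data to the (fake) database; may be called repeatedly.
        self.flushed.extend(self._pending)
        self._pending = []

    def commit(self):
        self.committed = True

    def finish(self):
        self.finished = True


def import_users(store, logins, group_name):
    """Store-independent import: always flush, commit, then finish."""
    group_eid = store.prepare_insert_entity('CWGroup', name=group_name)
    user_eids = []
    for login in logins:
        user_eid = store.prepare_insert_entity('CWUser', login=login)
        store.prepare_insert_relation(user_eid, 'in_group', group_eid)
        user_eids.append(user_eid)
    store.flush()
    store.commit()
    store.finish()
    return user_eids
```

Because `import_users` relies only on the method names from the documented list, the same function could in principle be pointed at any store exposing that interface; whether a given real store buffers at all is up to that store.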