doc/book/en/devrepo/dataimport.rst
changeset 10457 1f5026e7d848
parent 8625 7ee0752178e5
child 10460 d260722f2453
10456:e7ee508a8b2f 10457:1f5026e7d848
     3 .. _dataimport:
     3 .. _dataimport:
     4 
     4 
     5 Dataimport
     5 Dataimport
     6 ==========
     6 ==========
     7 
     7 
     8 *CubicWeb* is designed to manipulate huge of amount of data, and provides helper functions to do so.
     8 *CubicWeb* is designed to manipulate huge amounts of data, and provides utilities to do so.  They
     9 These functions insert data within different levels of the *CubicWeb* API,
     9 let you insert data at different levels of the *CubicWeb* API, offering different
    10 allowing different speed/security tradeoffs. Those keeping all the *CubicWeb* hooks
    10 speed/security tradeoffs. Those keeping all the *CubicWeb* hooks and security will be slower but the
    11 and security will be slower but the possible errors in insertion
    11 possible errors in insertion (bad data types, integrity error, ...) will be raised.
    12 (bad data types, integrity error, ...) will be raised.
       
    13 
    12 
    14 These dataimport function are provided in the file `dataimport.py`.
    13 These data import utilities are provided in the package `cubicweb.dataimport`.
    15 
    14 
    16 All the stores have the following API::
    15 All the stores have the following API::
    17 
    16 
    18     >>> store = ObjectStore()
    17     >>> user_eid = store.prepare_insert_entity('CWUser', login=u'johndoe')
    19     >>> user = store.create_entity('CWUser', login=u'johndoe')
    18     >>> group_eid = store.prepare_insert_entity('CWGroup', name=u'unknown')
    20     >>> group = store.create_entity('CWUser', name=u'unknown')
    19     >>> store.prepare_insert_relation(user_eid, 'in_group', group_eid)
    21     >>> store.relate(user.eid, 'in_group', group.eid)
    20     >>> store.flush()
       
    21     >>> store.commit()
       
    22     >>> store.finish()
    22 
    23 
       
    24 Some stores **require a flush** to copy data into the database, so if you want store-independent
       
    25 code you should call it explicitly. (There may be multiple flushes during the
       
    26 process, or only one at the end if there is no memory issue). This is different from the
       
    27 commit, which validates the database transaction. Finally, the `finish()` method should be called in
       
    28 case the store requires additional work once everything is done.
       
    29 
       
    30 * ``prepare_insert_entity(<entity type>, **kwargs) -> eid``: given an entity
       
    31   type, attributes and inlined relations, return the eid of the entity to be
       
    32   inserted, *with no guarantee that anything has been inserted in database*.
       
    33 
       
    34 * ``prepare_update_entity(<entity type>, eid, **kwargs) -> None``: given an
       
    35   entity type and eid, promise to update the given attributes and inlined
       
    36   relations *with no guarantee that anything has been updated in database*.
       
    37 
       
    38 * ``prepare_insert_relation(eid_from, rtype, eid_to) -> None``: indicate that a
       
    39   relation ``rtype`` should be added between entities with eids ``eid_from``
       
    40   and ``eid_to``. Similar to ``prepare_insert_entity()``, *there is no
       
    41   guarantee that the relation has been inserted in database*.
       
    42 
       
    43 * ``flush() -> None``: flush any temporary data to database. May be called
       
    44   several times during an import.
       
    45 
       
    46 * ``commit() -> None``: commit the database transaction.
       
    47 
       
    48 * ``finish() -> None``: perform any additional work required once the import is done.
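
The call sequence described by the list above can be sketched with a small stand-in store. This is only an illustration: ``InMemoryStore`` and ``import_users`` are hypothetical names, not part of ``cubicweb.dataimport``, and the sketch assumes that ``in_group`` links a ``CWUser`` to a ``CWGroup`` entity carrying a ``name`` attribute. It shows that eids are available before anything is written, that ``flush()`` may be called several times, and that ``commit()`` and ``finish()`` come last.

```python
import itertools


class InMemoryStore:
    """Hypothetical stand-in, NOT a CubicWeb class: it only mimics the store API."""

    def __init__(self):
        self._eids = itertools.count(1)  # fake eid generator
        self._pending = []               # operations buffered until flush()
        self.flushed = []                # operations "copied to the database"
        self.committed = False
        self.finished = False

    def prepare_insert_entity(self, etype, **kwargs):
        eid = next(self._eids)
        self._pending.append(('insert', etype, eid, kwargs))
        return eid  # the eid is known before anything is written

    def prepare_update_entity(self, etype, eid, **kwargs):
        self._pending.append(('update', etype, eid, kwargs))

    def prepare_insert_relation(self, eid_from, rtype, eid_to):
        self._pending.append(('relate', rtype, eid_from, eid_to))

    def flush(self):
        # Move buffered operations to the "database"; safe to call repeatedly.
        self.flushed.extend(self._pending)
        self._pending.clear()

    def commit(self):
        self.committed = True

    def finish(self):
        self.finished = True


def import_users(store, logins, group_name, batch_size=2):
    """Store-independent import: flush in batches, then commit and finish."""
    group_eid = store.prepare_insert_entity('CWGroup', name=group_name)
    user_eids = []
    for index, login in enumerate(logins, start=1):
        user_eid = store.prepare_insert_entity('CWUser', login=login)
        store.prepare_insert_relation(user_eid, 'in_group', group_eid)
        user_eids.append(user_eid)
        if index % batch_size == 0:
            store.flush()  # intermediate flush keeps memory usage bounded
    store.flush()   # final flush for the last partial batch
    store.commit()  # validate the transaction
    store.finish()  # let the store do any closing work
    return group_eid, user_eids


store = InMemoryStore()
group_eid, user_eids = import_users(store, ['johndoe', 'janedoe', 'alice'], 'users')
```

Because ``import_users`` only relies on the methods listed above, the same function should work with any store that follows this API, whether or not that store actually needs the intermediate flushes.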
    23 
    49 
    24 ObjectStore
    50 ObjectStore
    25 -----------
    51 -----------
    26 
    52 
    27 This store keeps objects in memory for *faster* validation. It may be useful
    53 This store keeps objects in memory for *faster* validation. It may be useful in development
    28 in development mode. However, as it will not enforce the constraints of the schema,
    54 mode. However, as it will not enforce the constraints of the schema nor insert anything in the
    29 it may miss some problems.
    55 database, it may miss some problems.
    30 
       
    31 
    56 
    32 
    57 
    33 RQLObjectStore
    58 RQLObjectStore
    34 --------------
    59 --------------
    35 
    60 
    46 -----------------
    71 -----------------
    47 
    72 
    48 This store relies on *COPY FROM*/execute many sql commands to directly push data using SQL commands
    73 This store relies on *COPY FROM*/execute many sql commands to directly push data using SQL commands
    49 rather than using the whole *CubicWeb* API. For now, **it only works with PostgreSQL** as it requires
    74 rather than using the whole *CubicWeb* API. For now, **it only works with PostgreSQL** as it requires
    50 the *COPY FROM* command.
    75 the *COPY FROM* command.
    51 
       
    52 The API is similar to the other stores, but **it requires a flush** after some imports to copy data
       
    53 in the database (these flushes may be multiples through the processes, or be done only once at the
       
    54 end if there is no memory issue)::
       
    55 
       
    56     >>> store = SQLGenObjectStore(session)
       
    57     >>> store.create_entity('Person', ...)
       
    58     >>> store.flush()