3 .. _dataimport: |
3 .. _dataimport: |
4 |
4 |
5 Dataimport |
5 Dataimport |
6 ========== |
6 ========== |
7 |
7 |
8 *CubicWeb* is designed to manipulate huge amounts of data, and provides helper functions to do so. |
8 *CubicWeb* is designed to manipulate huge amounts of data, and provides utilities to do so. They |
9 These functions insert data at different levels of the *CubicWeb* API, |
9 allow data to be inserted at different levels of the *CubicWeb* API, allowing different |
10 allowing different speed/security tradeoffs. Those keeping all the *CubicWeb* hooks |
10 speed/security tradeoffs. Those keeping all the *CubicWeb* hooks and security will be slower but the |
11 and security will be slower but the possible errors in insertion |
11 possible errors in insertion (bad data types, integrity error, ...) will be raised. |
12 (bad data types, integrity error, ...) will be raised. |
|
13 |
12 |
14 These dataimport functions are provided in the file `dataimport.py`. |
13 These data import utilities are provided in the package `cubicweb.dataimport`. |
15 |
14 |
16 All the stores have the following API:: |
15 All the stores have the following API:: |
17 |
16 |
18 >>> store = ObjectStore() |
17 >>> user_eid = store.prepare_insert_entity('CWUser', login=u'johndoe') |
19 >>> user = store.create_entity('CWUser', login=u'johndoe') |
18 >>> group_eid = store.prepare_insert_entity('CWGroup', name=u'unknown') |
20 >>> group = store.create_entity('CWGroup', name=u'unknown') |
19 >>> store.prepare_insert_relation(user_eid, 'in_group', group_eid) |
21 >>> store.relate(user.eid, 'in_group', group.eid) |
20 >>> store.flush() |
|
21 >>> store.commit() |
|
22 >>> store.finish() |
22 |
23 |
|
24 Some stores **require a flush** to copy data into the database, so if you want to write |
|
25 store-independent code you should call it explicitly. (There may be multiple flushes during the |
|
26 process, or only one at the end if there is no memory issue.) This is different from the |
|
27 commit, which validates the database transaction. Finally, the ``finish()`` method should be called in |
|
28 case the store requires additional work once everything is done. |
|
29 |
|
30 * ``prepare_insert_entity(<entity type>, **kwargs) -> eid``: given an entity |
|
31 type, attributes and inlined relations, return the eid of the entity to be |
|
32 inserted, *with no guarantee that anything has been inserted in database*. |
|
33 |
|
34 * ``prepare_update_entity(<entity type>, eid, **kwargs) -> None``: given an |
|
35 entity type and eid, promise to update the given attributes and inlined |
|
36 relations, *with no guarantee that anything has been updated in database*. |
|
37 |
|
38 * ``prepare_insert_relation(eid_from, rtype, eid_to) -> None``: indicate that a |
|
39 relation ``rtype`` should be added between entities with eids ``eid_from`` |
|
40 and ``eid_to``. Similar to ``prepare_insert_entity()``, *there is no |
|
41 guarantee that the relation has been inserted in database*. |
|
42 |
|
43 * ``flush() -> None``: flush any temporary data to database. May be called |
|
44 several times during an import. |
|
45 |
|
46 * ``commit() -> None``: commit the database transaction. |
|
47 |
|
48 * ``finish() -> None``: perform any additional work required once the import is finished. |
23 |
49 |
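The call sequence described above can be sketched with a small stand-in store. This is only an illustration: ``InMemoryStore`` and ``import_users`` are hypothetical names, not part of ``cubicweb.dataimport``, and the class merely records operations in lists instead of talking to a database. It assumes the six methods listed above and shows periodic flushes followed by a single commit and a final ``finish()``.

```python
import itertools


class InMemoryStore:
    """Hypothetical stand-in that mimics the store API listed above.

    It is NOT a CubicWeb store: operations are only recorded in lists.
    """

    def __init__(self):
        self._eids = itertools.count(1)
        self._pending = []    # prepared, not yet flushed
        self.flushed = []     # flushed, not yet committed
        self.committed = []   # validated by commit()
        self.finished = False

    def prepare_insert_entity(self, etype, **kwargs):
        # An eid is handed out immediately; nothing is "inserted" yet.
        eid = next(self._eids)
        self._pending.append(("insert", etype, eid, kwargs))
        return eid

    def prepare_update_entity(self, etype, eid, **kwargs):
        self._pending.append(("update", etype, eid, kwargs))

    def prepare_insert_relation(self, eid_from, rtype, eid_to):
        self._pending.append(("relation", eid_from, rtype, eid_to))

    def flush(self):
        # Push temporary data; may be called several times per import.
        self.flushed.extend(self._pending)
        self._pending.clear()

    def commit(self):
        # Validate everything that has been flushed so far.
        self.committed.extend(self.flushed)
        self.flushed.clear()

    def finish(self):
        self.finished = True


def import_users(store, logins, group_name, batch_size=2):
    """Store-independent import: flush per batch, then commit and finish."""
    group_eid = store.prepare_insert_entity("CWGroup", name=group_name)
    user_eids = []
    for index, login in enumerate(logins, start=1):
        user_eid = store.prepare_insert_entity("CWUser", login=login)
        store.prepare_insert_relation(user_eid, "in_group", group_eid)
        user_eids.append(user_eid)
        if index % batch_size == 0:
            store.flush()  # bound memory use during long imports
    store.flush()   # push whatever is left from the last partial batch
    store.commit()
    store.finish()
    return group_eid, user_eids


store = InMemoryStore()
group_eid, user_eids = import_users(
    store, ["johndoe", "janedoe", "alice"], "unknown")
```

Because ``import_users`` only relies on the documented method names, the same function should work with any store that follows this API; the batch size is an arbitrary value chosen for the illustration.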
24 ObjectStore |
50 ObjectStore |
25 ----------- |
51 ----------- |
26 |
52 |
27 This store keeps objects in memory for *faster* validation. It may be useful |
53 This store keeps objects in memory for *faster* validation. It may be useful in development |
28 in development mode. However, as it will not enforce the constraints of the schema, |
54 mode. However, as it neither enforces the constraints of the schema nor inserts anything in the |
29 it may miss some problems. |
55 database, it may miss some problems. |
30 |
|
31 |
56 |
32 |
57 |
33 RQLObjectStore |
58 RQLObjectStore |
34 -------------- |
59 -------------- |
35 |
60 |
46 ----------------- |
71 ----------------- |
47 |
72 |
48 This store relies on *COPY FROM*/executemany SQL commands to push data directly into the database |
73 This store relies on *COPY FROM*/executemany SQL commands to push data directly into the database |
49 rather than using the whole *CubicWeb* API. For now, **it only works with PostgreSQL** as it requires |
74 rather than using the whole *CubicWeb* API. For now, **it only works with PostgreSQL** as it requires |
50 the *COPY FROM* command. |
75 the *COPY FROM* command. |
51 |
|
52 The API is similar to the other stores, but **it requires a flush** after some imports to copy data |
|
53 into the database (there may be several flushes during the process, or only one at the |
|
54 end if there is no memory issue):: |
|
55 |
|
56 >>> store = SQLGenObjectStore(session) |
|
57 >>> store.create_entity('Person', ...) |
|
58 >>> store.flush() |
|
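To give an idea of what a *COPY FROM* based store has to prepare before a flush, here is a minimal sketch of serializing rows into PostgreSQL's default COPY text format (tab-separated columns, ``\N`` for NULL, backslash escapes). The helper names ``escape_copy_value`` and ``build_copy_buffer`` are hypothetical and this is not the code used by *CubicWeb*; the resulting buffer is the kind of file-like object that a driver call such as psycopg2's ``cursor.copy_from()`` accepts.

```python
import io


def escape_copy_value(value):
    """Render one value in PostgreSQL's default COPY text format."""
    if value is None:
        return r"\N"  # default NULL marker of COPY
    text = str(value)
    # Escape backslashes first so later escapes are not escaped again.
    text = text.replace("\\", "\\\\")
    text = text.replace("\t", "\\t")
    text = text.replace("\n", "\\n")
    text = text.replace("\r", "\\r")
    return text


def build_copy_buffer(rows):
    """Serialize rows (iterables of values) into a rewound text buffer."""
    buffer = io.StringIO()
    for row in rows:
        buffer.write("\t".join(escape_copy_value(v) for v in row))
        buffer.write("\n")
    buffer.seek(0)  # ready to be read from the start by the consumer
    return buffer


# Illustrative rows only: (eid, login, optional first name).
rows = [(1, "john\tdoe", None), (2, "jane", "x")]
copy_text = build_copy_buffer(rows).getvalue()
```

Accumulating rows in such a buffer and sending it in one command is what makes this approach faster than issuing one ``INSERT`` per entity, at the cost of bypassing the hooks and security checks of the full API.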