author | Yann Voté <yann.vote@logilab.fr> |
Wed, 24 Jun 2015 23:23:57 +0200 | |
changeset 10457 | 1f5026e7d848 |
parent 8625 | 7ee0752178e5 |
child 10460 | d260722f2453 |
permissions | -rw-r--r-- |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
.. -*- coding: utf-8 -*-

.. _dataimport:

Dataimport
==========

*CubicWeb* is designed to manipulate huge amounts of data, and provides utilities to do so. These
utilities can insert data at different levels of the *CubicWeb* API, offering different
speed/security tradeoffs. Stores that keep all the *CubicWeb* hooks and security checks are slower,
but they raise any insertion errors (bad data types, integrity errors, ...).

These data import utilities are provided in the ``cubicweb.dataimport`` package.

All the stores have the following API::

    >>> user_eid = store.prepare_insert_entity('CWUser', login=u'johndoe')
    >>> group_eid = store.prepare_insert_entity('CWGroup', name=u'unknown')
    >>> store.prepare_insert_relation(user_eid, 'in_group', group_eid)
    >>> store.flush()
    >>> store.commit()
    >>> store.finish()

Some stores **require a flush** to copy data into the database, so if you want your code to be
store-independent, you should call ``flush()`` explicitly. There may be multiple flushes during the
process, or only one at the end if memory is not an issue. Flushing is different from committing,
which validates the database transaction. Finally, call ``finish()`` in case the store requires
additional work once everything is done.

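A minimal sketch of store-independent import code, relying only on the methods listed below, might
look as follows. ``import_users`` and ``RecordingStore`` are hypothetical names used for
illustration; the store merely records which methods are called, so the sketch runs without
*CubicWeb*::

    class RecordingStore(object):
        """Hypothetical stand-in for a real store: it only records method calls."""

        def __init__(self):
            self.calls = []
            self._next_eid = 0

        def prepare_insert_entity(self, etype, **kwargs):
            self._next_eid += 1
            self.calls.append(('insert', etype))
            return self._next_eid

        def prepare_insert_relation(self, eid_from, rtype, eid_to):
            self.calls.append(('relate', rtype))

        def flush(self):
            self.calls.append(('flush',))

        def commit(self):
            self.calls.append(('commit',))

        def finish(self):
            self.calls.append(('finish',))


    def import_users(store, logins, group_eid, batch_size=1000):
        """Insert one CWUser per login and put it in the given group.

        Works with any store implementing the common API: flush() is called
        every ``batch_size`` users to bound memory use, then once more at the end.
        """
        for count, login in enumerate(logins, 1):
            user_eid = store.prepare_insert_entity('CWUser', login=login)
            store.prepare_insert_relation(user_eid, 'in_group', group_eid)
            if count % batch_size == 0:
                store.flush()
        store.flush()   # push whatever is left (possibly nothing)
        store.commit()  # validate the database transaction
        store.finish()  # let the store do any post-import work

With three logins and ``batch_size=2``, ``flush()`` is called twice (once after the second user and
once at the end), followed by a single ``commit()`` and ``finish()``. Flushing periodically bounds
memory use, while committing once keeps the whole import in a single transaction.
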
* ``prepare_insert_entity(<entity type>, **kwargs) -> eid``: given an entity
  type, attributes and inlined relations, return the eid of the entity to be
  inserted, *with no guarantee that anything has been inserted in the database*.

* ``prepare_update_entity(<entity type>, eid, **kwargs) -> None``: given an
  entity type and eid, schedule an update of the given attributes and inlined
  relations, *with no guarantee that anything has been updated in the database*.

* ``prepare_insert_relation(eid_from, rtype, eid_to) -> None``: indicate that a
  relation ``rtype`` should be added between entities with eids ``eid_from``
  and ``eid_to``. As with ``prepare_insert_entity()``, *there is no
  guarantee that the relation has been inserted in the database*.

* ``flush() -> None``: flush any temporary data to the database. May be called
  several times during an import.

* ``commit() -> None``: commit the database transaction.

* ``finish() -> None``: perform any additional work required once the import
  is complete.

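To make the *no guarantee* wording concrete, here is a toy in-memory store implementing the six
methods listed above. It is a hypothetical sketch, not the code of any *CubicWeb* store: eids come
from a local counter, pending operations stay in lists until ``flush()``, and plain Python
containers play the role of the database, with no schema checking at all::

    import itertools


    class InMemoryStore(object):
        """Hypothetical toy store: pending operations are kept in lists until
        flush(), and plain Python containers stand in for the database."""

        def __init__(self):
            self._eids = itertools.count(1)  # local eid generator
            self._pending_entities = []      # (eid, etype, attrs) not yet flushed
            self._pending_updates = []       # (eid, attrs) not yet flushed
            self._pending_relations = []     # (eid_from, rtype, eid_to) not yet flushed
            self.db_entities = {}            # eid -> (etype, attrs): the "database"
            self.db_relations = set()
            self.committed = False

        def prepare_insert_entity(self, etype, **kwargs):
            eid = next(self._eids)
            self._pending_entities.append((eid, etype, dict(kwargs)))
            return eid  # known right away, although nothing is stored yet

        def prepare_update_entity(self, etype, eid, **kwargs):
            self._pending_updates.append((eid, dict(kwargs)))

        def prepare_insert_relation(self, eid_from, rtype, eid_to):
            self._pending_relations.append((eid_from, rtype, eid_to))

        def flush(self):
            # entities first, so that pending updates can target entities
            # inserted in the same batch
            for eid, etype, attrs in self._pending_entities:
                self.db_entities[eid] = (etype, attrs)
            for eid, attrs in self._pending_updates:
                self.db_entities[eid][1].update(attrs)
            self.db_relations.update(self._pending_relations)
            del self._pending_entities[:]
            del self._pending_updates[:]
            del self._pending_relations[:]

        def commit(self):
            self.committed = True  # no real transaction in this toy

        def finish(self):
            pass  # nothing left to do for this toy store

Before ``flush()`` is called, ``db_entities`` is still empty even though both eids of the example
above are already known; this is exactly the latitude the API leaves to real stores.
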
ObjectStore
-----------

This store keeps objects in memory for *faster* validation. It may be useful in development
mode. However, since it neither enforces the schema constraints nor inserts anything into the
database, it may miss some problems.


RQLObjectStore
--------------

This store works with an actual RQL repository, and it may be used in production mode.


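The following sketch shows how calls of the common API could be expressed as RQL statements
executed against a repository. The helper names are hypothetical, only plain attributes are
handled, and this is not necessarily how ``RQLObjectStore`` is implemented; it only illustrates the
kind of RQL involved. Note that with a real repository the eid is only known once the ``INSERT``
query has been executed::

    def insert_entity_rql(etype, **kwargs):
        """Build an RQL INSERT query and its arguments (hypothetical helper).

        Attribute names are sorted so that the generated query is deterministic.
        """
        assignments = ', '.join('X %s %%(%s)s' % (attr, attr) for attr in sorted(kwargs))
        rql = 'INSERT %s X' % etype
        if assignments:
            rql += ': ' + assignments
        return rql, kwargs


    def insert_relation_rql(eid_from, rtype, eid_to):
        """Build an RQL SET query linking two existing entities (hypothetical helper)."""
        rql = 'SET X %s Y WHERE X eid %%(x)s, Y eid %%(y)s' % rtype
        return rql, {'x': eid_from, 'y': eid_to}

For instance, ``insert_entity_rql('CWUser', login=u'johndoe')`` yields
``'INSERT CWUser X: X login %(login)s'`` together with its arguments. Each pair could then be
passed to a connection's ``execute()`` method; passing values as arguments rather than formatting
them into the query string avoids quoting issues.
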
NoHookRQLObjectStore
--------------------

This store works similarly to the *RQLObjectStore* but bypasses some *CubicWeb* hooks to be faster.


SQLGenObjectStore
-----------------

This store relies on *COPY FROM*/*executemany* SQL commands to push data directly to the database,
rather than going through the whole *CubicWeb* API. For now, **it only works with PostgreSQL**, as
it requires the *COPY FROM* command.
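
To illustrate the *COPY FROM* approach, here is a minimal sketch that serializes rows into
PostgreSQL's text ``COPY`` format and hands them to a database cursor. It assumes a psycopg2
cursor, whose ``copy_from(file, table, sep, null, columns=...)`` method issues a
``COPY ... FROM STDIN`` command; ``copy_rows`` is a hypothetical helper, and the table and column
names it receives do not reflect the actual *CubicWeb* SQL schema::

    import io


    def _copy_value(value):
        """Encode one value in PostgreSQL's text COPY format."""
        if value is None:
            return u'\\N'  # default NULL marker
        text = u'%s' % (value,)
        # escape the backslash first, then the characters COPY treats specially
        return (text.replace(u'\\', u'\\\\').replace(u'\t', u'\\t')
                    .replace(u'\n', u'\\n').replace(u'\r', u'\\r'))


    def copy_rows(cursor, table, columns, rows):
        """Bulk-insert ``rows`` (sequences of values) into ``table`` with COPY FROM."""
        buf = io.StringIO()
        for row in rows:
            buf.write(u'\t'.join(_copy_value(v) for v in row) + u'\n')
        buf.seek(0)
        cursor.copy_from(buf, table, sep='\t', null='\\N', columns=columns)

Sending all rows in a single ``COPY`` avoids the per-row round trips of individual ``INSERT``
statements, which is where the speed-up comes from. A real store must also take care of eid
allocation and of any bookkeeping the repository expects, which this sketch does not attempt.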