doc/book/en/devrepo/dataimport.rst
author Yann Voté <yann.vote@logilab.fr>
Wed, 24 Jun 2015 23:23:57 +0200
changeset 10457 1f5026e7d848
parent 8625 7ee0752178e5
child 10460 d260722f2453
permissions -rw-r--r--
[dataimport] Move stores to new API. Here is the final store API: * ``prepare_insert_entity(<entity type>, **kwargs) -> eid``: given an entity type, attributes and inlined relations, return the eid of the entity to be inserted, *with no guarantee that anything has been inserted in database*, * ``prepare_update_entity(<entity type>, eid, **kwargs) -> None``: given an entity type and eid, promise for update given attributes and inlined relations *with no guarantee that anything has been inserted in database*, * ``prepare_insert_relation(eid_from, rtype, eid_to) -> None``: indicate that a relation ``rtype`` should be added between entities with eids ``eid_from`` and ``eid_to``. Similarly to ``prepare_insert_entity()``, *there is no guarantee that the relation will be inserted in database*, * ``flush() -> None``: flush any temporary data to database. May be called several times during an import, * ``finish() -> None``: additional stuff to do after import is terminated. **Warning:** ``prepare_update_entity()`` still needs to be implemented for NoHookRQLObjectStore. Related to #5040344
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
     1
. -*- coding: utf-8 -*-
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
     2
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
     3
.. _dataimport:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
     4
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
     5
Dataimport
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
     6
==========
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
     7
10457
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
     8
*CubicWeb* is designed to manipulate huge of amount of data, and provides utilities to do so.  They
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
     9
allow to insert data within different levels of the *CubicWeb* API, allowing different
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    10
speed/security tradeoffs. Those keeping all the *CubicWeb* hooks and security will be slower but the
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    11
possible errors in insertion (bad data types, integrity error, ...) will be raised.
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    12
10457
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    13
These data import utilities are provided in the package `cubicweb.dataimport`.
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    14
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    15
All the stores have the following API::
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    16
10457
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    17
    >>> user_eid = store.prepare_insert_entity('CWUser', login=u'johndoe')
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    18
    >>> group_eid = store.prepare_insert_entity('CWUser', name=u'unknown')
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    19
    >>> store.relate(user_eid, 'in_group', group_eid)
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    20
    >>> store.flush()
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    21
    >>> store.commit()
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    22
    >>> store.finish()
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    23
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    24
Some stores **require a flush** to copy data in the database, so if you want to have store
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    25
independent code you should explicitly call it. (There may be multiple flushes during the
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    26
process, or only one at the end if there is no memory issue). This is different from the
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    27
commit which validates the database transaction. At last, the `finish()` method should be called in
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    28
case the store requires additional work once everything is done.
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    29
10457
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    30
* ``prepare_insert_entity(<entity type>, **kwargs) -> eid``: given an entity
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    31
  type, attributes and inlined relations, return the eid of the entity to be
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    32
  inserted, *with no guarantee that anything has been inserted in database*.
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    33
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    34
* ``prepare_update_entity(<entity type>, eid, **kwargs) -> None``: given an
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    35
  entity type and eid, promise for update given attributes and inlined
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    36
  relations *with no guarantee that anything has been inserted in database*.
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    37
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    38
* ``prepare_insert_relation(eid_from, rtype, eid_to) -> None``: indicate that a
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    39
  relation ``rtype`` should be added between entities with eids ``eid_from``
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    40
  and ``eid_to``. Similar to ``prepare_insert_entity()``, *there is no
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    41
  guarantee that the relation has been inserted in database*.
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    42
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    43
* ``flush() -> None``: flush any temporary data to database. May be called
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    44
  several times during an import.
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    45
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    46
* ``commit() -> None``: commit the database transaction.
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    47
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    48
* ``finish() -> None``: additional stuff to do after import is terminated.
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    49
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    50
ObjectStore
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    51
-----------
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    52
10457
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    53
This store keeps objects in memory for *faster* validation. It may be useful in development
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    54
mode. However, as it will not enforce the constraints of the schema nor insert anything in the
1f5026e7d848 [dataimport] Move stores to new API.
Yann Voté <yann.vote@logilab.fr>
parents: 8625
diff changeset
    55
database, so it may miss some problems.
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    56
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    57
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    58
RQLObjectStore
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    59
--------------
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    60
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    61
This store works with an actual RQL repository, and it may be used in production mode.
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    62
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    63
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    64
NoHookRQLObjectStore
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    65
--------------------
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    66
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    67
This store works similarly to the *RQLObjectStore* but bypasses some *CubicWeb* hooks to be faster.
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    68
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    69
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    70
SQLGenObjectStore
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    71
-----------------
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    72
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    73
This store relies on *COPY FROM*/execute many sql commands to directly push data using SQL commands
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    74
rather than using the whole *CubicWeb* API. For now, **it only works with PostgresSQL** as it requires
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    75
the *COPY FROM* command.