doc/book/en/devrepo/dataimport.rst
author Vincent Michel <vincent.michel@logilab.fr>
Fri, 14 Dec 2012 14:08:14 +0100
changeset 8625 7ee0752178e5
child 10457 1f5026e7d848
permissions -rw-r--r--
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822 This store will use: - copy from for massive insertions. - execute from for update. The API of this store is similar to the other stores.
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
     1
. -*- coding: utf-8 -*-
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
     2
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
     3
.. _dataimport:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
     4
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
     5
Dataimport
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
     6
==========
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
     7
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
     8
*CubicWeb* is designed to manipulate huge of amount of data, and provides helper functions to do so.
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
     9
These functions insert data within different levels of the *CubicWeb* API,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    10
allowing different speed/security tradeoffs. Those keeping all the *CubicWeb* hooks
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    11
and security will be slower but the possible errors in insertion
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    12
(bad data types, integrity error, ...) will be raised.
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    13
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    14
These dataimport function are provided in the file `dataimport.py`.
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    15
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    16
All the stores have the following API::
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    17
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    18
    >>> store = ObjectStore()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    19
    >>> user = store.create_entity('CWUser', login=u'johndoe')
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    20
    >>> group = store.create_entity('CWUser', name=u'unknown')
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    21
    >>> store.relate(user.eid, 'in_group', group.eid)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    22
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    23
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    24
ObjectStore
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    25
-----------
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    26
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    27
This store keeps objects in memory for *faster* validation. It may be useful
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    28
in development mode. However, as it will not enforce the constraints of the schema,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    29
it may miss some problems.
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    30
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    31
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    32
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    33
RQLObjectStore
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    34
--------------
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    35
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    36
This store works with an actual RQL repository, and it may be used in production mode.
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    37
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    38
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    39
NoHookRQLObjectStore
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    40
--------------------
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    41
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    42
This store works similarly to the *RQLObjectStore* but bypasses some *CubicWeb* hooks to be faster.
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    43
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    44
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    45
SQLGenObjectStore
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    46
-----------------
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    47
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    48
This store relies on *COPY FROM*/execute many sql commands to directly push data using SQL commands
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    49
rather than using the whole *CubicWeb* API. For now, **it only works with PostgresSQL** as it requires
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    50
the *COPY FROM* command.
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    51
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    52
The API is similar to the other stores, but **it requires a flush** after some imports to copy data
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    53
in the database (these flushes may be multiples through the processes, or be done only once at the
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    54
end if there is no memory issue)::
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    55
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    56
    >>> store = SQLGenObjectStore(session)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    57
    >>> store.create_entity('Person', ...)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
diff changeset
    58
    >>> store.flush()