.. -*- coding: utf-8 -*-

.. Changeset 10969 (b4de8b1cc135) by Rémi Cardona <remi.cardona@logilab.fr>,
   Tue, 08 Dec 2015 16:28:20 +0100; parent 10513 (7bec01a59f92),
   child 11238 (bb5fdf1eb8fb).

.. _dataimport:

Dataimport
==========

*CubicWeb* is designed to manipulate huge amounts of data and provides utilities to do so.

The main entry point is :mod:`cubicweb.dataimport.importer`, which defines an
:class:`ExtEntitiesImporter` class responsible for importing data from an external source in the
form of :class:`ExtEntity` objects. An :class:`ExtEntity` is a transitional representation of an
entity to be imported into the CubicWeb instance. Building this representation is usually
domain-specific (e.g. it depends on the kind of data source: RDF, CSV, etc.) and is thus the
responsibility of the end-user.
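
As a rough illustration of this transitional representation, the following self-contained sketch
uses a hypothetical ``SketchExtEntity`` named tuple (not part of CubicWeb) carrying the same three
pieces of information that the :class:`ExtEntity` constructor receives in the example below: an
entity type, an external identifier and a dictionary mapping attribute or relation names to sets of
values. The ``row_to_extentity`` helper and the keys of its input dictionary are assumptions made
for this example only.

.. code-block:: python

    from collections import namedtuple

    # Hypothetical stand-in for cubicweb.dataimport.importer.ExtEntity, used here
    # only to show the shape of the data: an entity type, an external identifier
    # and a dict mapping attribute/relation names to *sets* of values.
    SketchExtEntity = namedtuple('SketchExtEntity', ['etype', 'extid', 'values'])


    def row_to_extentity(row):
        """Build a SketchExtEntity from a hypothetical {'uri', 'name', 'knows'} dict."""
        values = {'name': {row['name']}}
        if row.get('knows'):
            # Relations refer to other entities by their external id (a URI here),
            # not by a CubicWeb eid, which does not exist yet at this stage.
            values['knows'] = {row['knows']}
        return SketchExtEntity('Person', row['uri'], values)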

Along with the importer, a *store* must be selected; the store is responsible for inserting data
into the database. There exist different kinds of stores_, which insert data at different levels
of the *CubicWeb* API and with different speed/security tradeoffs. Stores that keep all the
*CubicWeb* hooks and security checks are slower, but they handle possible insertion errors (bad
data types, integrity errors, ...).
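
To make the speed/security tradeoff concrete, here is a toy, in-memory store. It is hypothetical:
``ToyStore`` and its ``checked`` flag are inventions for this example, and the
``prepare_insert_entity``, ``prepare_insert_relation`` and ``commit`` method names are only modeled
on the store interface described in the stores_ section, so check that reference for the real
signatures. With ``checked=True`` it rejects values of the wrong type before queuing them; with
``checked=False`` it skips that work and silently accepts bad data.

.. code-block:: python

    class ToyStore(object):
        """Hypothetical in-memory store contrasting checked and unchecked insertion.

        Method names are modeled on CubicWeb's store interface; this is an
        illustration, not the real API.
        """

        def __init__(self, schema, checked=True):
            self.schema = schema      # {etype: {attribute: python type}}
            self.checked = checked    # stands in for hooks and security checks
            self._next_eid = 1
            self._pending = []        # operations not yet committed
            self.entities = {}        # committed entities: eid -> (etype, attrs)
            self.relations = []       # committed (eid_from, rtype, eid_to) triples

        def prepare_insert_entity(self, etype, **attrs):
            """Queue an entity for insertion and return its new eid."""
            if self.checked:
                for name, value in attrs.items():
                    expected = self.schema[etype][name]
                    if not isinstance(value, expected):
                        raise TypeError('%s.%s expects %s, got %r'
                                        % (etype, name, expected.__name__, value))
            eid = self._next_eid
            self._next_eid += 1
            self._pending.append(('entity', eid, etype, attrs))
            return eid

        def prepare_insert_relation(self, eid_from, rtype, eid_to):
            """Queue a relation between two eids."""
            self._pending.append(('relation', eid_from, rtype, eid_to))

        def commit(self):
            """Apply every queued operation, then clear the queue."""
            for op in self._pending:
                if op[0] == 'entity':
                    _, eid, etype, attrs = op
                    self.entities[eid] = (etype, attrs)
                else:
                    _, eid_from, rtype, eid_to = op
                    self.relations.append((eid_from, rtype, eid_to))
            self._pending = []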


Example
-------

Consider the following schema snippet

.. code-block:: python

    from yams.buildobjs import EntityType, RelationDefinition, String

    class Person(EntityType):
        name = String(required=True)

    class knows(RelationDefinition):
        subject = 'Person'
        object = 'Person'

along with some data in a ``people.csv`` file::

    # uri,name,knows
    http://www.example.org/alice,Alice,
    http://www.example.org/bob,Bob,http://www.example.org/alice

The following code, meant to be run in a shell context (e.g. ``cubicweb-ctl shell``) where
``cnx``, ``schema`` and ``commit`` are available, defines a function ``extentities_from_csv`` that
reads ``Person`` external entities from a CSV file, and then uses an :class:`ExtEntitiesImporter`
to insert the corresponding entities and relations into the CubicWeb instance.

.. code-block:: python

    from cubicweb.dataimport import ucsvreader, RQLObjectStore
    from cubicweb.dataimport.importer import ExtEntity, ExtEntitiesImporter

    def extentities_from_csv(fpath):
        """Yield Person ExtEntities read from `fpath` CSV file."""
        with open(fpath) as f:
            for uri, name, knows in ucsvreader(f, skipfirst=True, skip_empty=False):
                # Values are sets, keyed by the attribute/relation names of the schema.
                values = {'name': set([name])}
                if knows:  # Alice's 'knows' column is empty: do not reference ''
                    values['knows'] = set([knows])
                yield ExtEntity('Person', uri, values)

    extentities = extentities_from_csv('people.csv')
    store = RQLObjectStore(cnx)
    importer = ExtEntitiesImporter(schema, store)
    importer.import_entities(extentities)
    commit()
    rset = cnx.execute('Any N WHERE X name N, X knows Y, Y name "Alice"')
    assert rset[0][0] == u'Bob', rset
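
Because a relation such as ``knows`` may point to an entity that appears later in the source, the
importer has to translate external identifiers (here, URIs) into the eids of the entities it
creates. The following self-contained sketch illustrates that idea with a simple two-pass approach
using only the standard library; it does not call CubicWeb, the ``import_people`` function is
hypothetical, and it is a simplification rather than how :class:`ExtEntitiesImporter` is actually
implemented.

.. code-block:: python

    import csv
    import io

    PEOPLE_CSV = """# uri,name,knows
    http://www.example.org/alice,Alice,
    http://www.example.org/bob,Bob,http://www.example.org/alice
    """

    def import_people(text):
        """Two-pass import: create entities, then relations resolved by external id.

        Returns (entities, relations) where entities maps eid -> name and
        relations is a list of (eid_from, 'knows', eid_to) triples.
        """
        reader = csv.reader(io.StringIO(text))
        next(reader)  # skip the header line, like skipfirst=True above
        rows = [row for row in reader if row]
        extid2eid = {}
        entities = {}
        # First pass: give every external entity an eid, so that forward
        # references (an entity "knowing" one defined later) can be resolved.
        for eid, (uri, name, _knows) in enumerate(rows, start=1):
            extid2eid[uri] = eid
            entities[eid] = name
        # Second pass: translate the external ids of related entities into eids.
        relations = []
        for uri, _name, knows in rows:
            if knows:
                relations.append((extid2eid[uri], 'knows', extid2eid[knows]))
        return entities, relations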

Importer API
------------

.. automodule:: cubicweb.dataimport.importer


Stores
~~~~~~

.. automodule:: cubicweb.dataimport.stores


SQLGenObjectStore
-----------------

This store relies on *COPY FROM* and *executemany* SQL commands to push data directly with SQL
rather than through the whole *CubicWeb* API. For now, **it only works with PostgreSQL**, as it
requires the *COPY FROM* command.
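
*COPY FROM* reads rows in a simple text format: tab-separated columns, one row per line, ``\N``
for NULL, and backslash escapes for special characters. The following sketch is not the
:class:`SQLGenObjectStore` implementation; it only shows how rows could be serialized for that
format and bulk-loaded with psycopg2's ``cursor.copy_from``. The ``to_copy_buffer`` and
``copy_rows`` helpers are hypothetical, and the table and column names you pass to ``copy_rows``
are up to you, not CubicWeb's actual SQL layout.

.. code-block:: python

    import io

    def to_copy_buffer(rows):
        """Serialize rows into PostgreSQL's COPY text format.

        Columns are tab-separated, rows end with a newline, None becomes \\N,
        and backslash, tab, newline and carriage return are escaped.
        """
        buf = io.StringIO()
        for row in rows:
            fields = []
            for value in row:
                if value is None:
                    fields.append('\\N')
                    continue
                text = str(value)
                # Escape the backslash first so later escapes are not doubled.
                text = text.replace('\\', '\\\\')
                text = text.replace('\t', '\\t').replace('\n', '\\n').replace('\r', '\\r')
                fields.append(text)
            buf.write('\t'.join(fields) + '\n')
        buf.seek(0)
        return buf


    def copy_rows(connection, table, columns, rows):
        """Bulk-load rows with psycopg2's cursor.copy_from (needs a live PostgreSQL)."""
        cursor = connection.cursor()
        cursor.copy_from(to_copy_buffer(rows), table, columns=columns)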