author | Philippe Pepiot <philippe.pepiot@logilab.fr> |
Thu, 28 Mar 2019 11:13:12 +0100 | |
changeset 12556 | d1c659d70368 |
parent 10496 | e95b559a06a2 |
child 12792 | e2cdb1be6bd9 |
permissions | -rw-r--r-- |
8836
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
Importing relational data into a CubicWeb instance
==================================================

Introduction
~~~~~~~~~~~~
 
This tutorial explains how to import data from an external source (e.g. a collection of files)
into a CubicWeb cube instance.

First, once we know the format of the data we wish to import, we devise a
*data model*, that is, a CubicWeb (Yams) schema which reflects the way the data
is structured. This schema is implemented in the ``schema.py`` file.
In this tutorial, we will describe such a schema for a particular data set,
the Diseasome data (see below).

Once the schema is defined, we create a cube and an instance.
The cube is a specification of an application, whereas an instance
is the application per se.
 
Once the schema is defined and the instance is created, the import can be performed
in three steps:

1. Build a custom parser for the data to be imported. This yields an in-memory Python
   representation of the data.

2. Map the parsed data to the data model defined in ``schema.py``.

3. Perform the actual import of the data. This comes down to "populating"
   the data model with the in-memory representation obtained in step 1, according to
   the mapping defined in step 2.
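The three steps can be pictured as a small pipeline. The sketch below is only an
illustration under simplifying assumptions: the function names (``parse``,
``apply_mapping``, ``populate``) are hypothetical, the input is assumed to be three
whitespace-separated fields per line, and a plain dictionary stands in for the
CubicWeb instance.

```python
def parse(lines):
    """Step 1: turn raw lines into (subject, relation, object) tuples."""
    for line in lines:
        line = line.strip()
        if line:
            subject, relation, obj = line.split(None, 2)
            yield subject, relation, obj


def apply_mapping(triple, mapping):
    """Step 2: rename the relation according to the data model.

    Returns None when the relation is not part of the model.
    """
    subject, relation, obj = triple
    if relation not in mapping:
        return None
    return subject, mapping[relation], obj


def populate(triples, mapping):
    """Step 3: fill a stand-in store (a plain dict, not a CubicWeb instance)."""
    store = {}
    for triple in triples:
        mapped = apply_mapping(triple, mapping)
        if mapped is not None:
            subject, name, obj = mapped
            store.setdefault(subject, {}).setdefault(name, []).append(obj)
    return store


# Toy input; every name here is made up for the example.
toy_lines = [
    "disease1 associatedGene geneA",
    "disease1 label Fever",
    "disease1 unknown x",
]
toy_mapping = {"associatedGene": "associated_genes", "label": "label"}
toy_store = populate(parse(toy_lines), toy_mapping)
```

Because ``parse`` is a generator, the data is consumed line by line and only the
stand-in store is held in memory as a whole.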
 
This tutorial illustrates all the above steps in the context of relational data
stored in the RDF format.

More specifically, we describe the import of Diseasome_ RDF/OWL data.

.. _Diseasome: http://datahub.io/dataset/fu-berlin-diseasome
 
Building a data model
~~~~~~~~~~~~~~~~~~~~~

The first thing to do when using CubicWeb to create an application from scratch
is to devise a *data model*, that is, a relational representation of the problem to be
modeled or of the structure of the data to be imported.

In such a schema, we define an entity type (a subclass of ``EntityType``) for each type
of entity to import. Each such type has several attributes. Attributes whose values are of
a built-in CubicWeb (Yams) type, such as numbers or strings, are declared directly
with that type, e.g. ``attribute = Int()`` for an integer attribute named ``attribute``.

Each such type also has a set of relations, which are declared like attributes, except
that they link entities of the type under discussion to entities of the type
named in the relation definition.
 
For example, for the Diseasome data, we have two types of entities, genes and diseases.
Thus, we create two classes which inherit from ``EntityType``::

    class Disease(EntityType):
        # Corresponds to http://www.w3.org/2000/01/rdf-schema#label
        label = String(maxsize=512, fulltextindexed=True)
        ...

        # Corresponds to http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseasome/associatedGene
        associated_genes = SubjectRelation('Gene', cardinality='**')
        ...

        # Corresponds to http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseasome/chromosomalLocation
        chromosomal_location = SubjectRelation('ExternalUri', cardinality='?*', inlined=True)


    class Gene(EntityType):
        ...
 
In this schema, there are attributes whose values are numbers or strings. Thus, they are
defined by using the CubicWeb / Yams primitive types, e.g., ``label = String(maxsize=512)``.
These types accept several constraints or options, such as ``maxsize``.
There are also relations, either between the entity types themselves, or between them
and a CubicWeb type, ``ExternalUri``. The latter defines a class of URI objects in
CubicWeb. For instance, the ``chromosomal_location`` attribute is a relation between
a ``Disease`` entity and an ``ExternalUri`` entity. The relation is declared with the CubicWeb /
Yams ``SubjectRelation`` class, which accepts several optional keyword arguments, such as
``cardinality``, which specifies how many objects a subject may have and how many subjects
an object may have for this relation type. For example, the ``'?*'`` cardinality of the
``chromosomal_location`` relation type says that a ``Disease`` entity is related to at most
one ``ExternalUri`` entity, and that an ``ExternalUri`` entity can be the chromosomal location
of any number of ``Disease`` entities (including none).
As an example of a relation between the entity types themselves, ``associated_genes`` links
a ``Disease`` entity to ``Gene`` entities with cardinality ``'**'``: any number of ``Gene``
entities can be associated with a ``Disease``, and a ``Gene`` can be associated with any
number of ``Disease`` entities.
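A cardinality string has two characters: the first constrains the subject side (how
many objects one subject may have), the second the object side (how many subjects one
object may have). The symbols are ``1`` (exactly one), ``?`` (zero or one), ``+`` (one
or more) and ``*`` (zero or more). As a reading aid, the following sketch spells such a
string out in plain English; ``describe_cardinality`` is a hypothetical helper written
for this tutorial, not part of Yams or CubicWeb.

```python
# Meaning of each cardinality symbol.
_SYMBOLS = {
    "1": "exactly one",
    "?": "zero or one",
    "+": "one or more",
    "*": "zero or more",
}


def describe_cardinality(subject_type, relation, object_type, cardinality):
    """Spell out a two-character cardinality string in plain English.

    Hypothetical helper for illustration; not provided by Yams.
    """
    if len(cardinality) != 2 or any(c not in _SYMBOLS for c in cardinality):
        raise ValueError("invalid cardinality: %r" % (cardinality,))
    subject_side, object_side = cardinality
    return (
        "each %s has %s %s via %s; each %s is the object of %s %s"
        % (subject_type, _SYMBOLS[subject_side], object_type, relation,
           object_type, _SYMBOLS[object_side], subject_type)
    )


location = describe_cardinality("Disease", "chromosomal_location", "ExternalUri", "?*")
genes = describe_cardinality("Disease", "associated_genes", "Gene", "**")
```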
 
Of course, before being able to use the CubicWeb / Yams built-in objects, we need to import them::

    from yams.buildobjs import EntityType, SubjectRelation, String, Int
    from cubicweb.schemas.base import ExternalUri
 
Building a custom data parser
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The data we wish to import is structured in the RDF format,
as a text file containing a set of lines.
On each line, there are three fields.
The first two fields are URIs ("Uniform Resource Identifiers").
The third field is either a URI or a string. Each field bears a particular meaning:

- the leftmost field is a URI that identifies the entity to be imported.
  Note that the entities defined in the data model (i.e., in ``schema.py``) should
  correspond to the entities whose URIs are specified in the import file.

- the middle field is a URI that identifies a relation whose subject is the entity
  defined by the leftmost field. Note that this should also correspond
  to the definitions in the data model.

- the rightmost field is either a URI or a string. When this field is a URI,
  it gives the object of the relation defined by the middle field.
  When the rightmost field is a string, the middle field is interpreted as an attribute
  of the subject (introduced by the leftmost field) and the rightmost field is
  interpreted as the value of the attribute.

Note however that some attributes (i.e. relations whose objects are strings)
have their objects defined as strings followed by ``^^`` and by another URI;
we ignore this part.
 
Let us show some examples:

- a line holding an attribute definition:
  ``<http://www4.wiwiss.fu-berlin.de/diseasome/resource/genes/CYP17A1>
  <http://www.w3.org/2000/01/rdf-schema#label> "CYP17A1" .``
  The line contains the definition of the ``label`` attribute of an
  entity of type ``gene``. The value of ``label`` is ``"CYP17A1"``.

- a line holding a relation definition:
  ``<http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseases/1>
  <http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseasome/associatedGene>
  <http://www4.wiwiss.fu-berlin.de/diseasome/resource/genes/HADH2> .``
  The line contains the definition of the ``associatedGene`` relation between
  a ``disease`` subject entity identified by ``1`` and a ``gene`` object
  entity identified by ``HADH2``.
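The two kinds of lines can be told apart with two regular expressions. The patterns
below are an assumption made for illustration (the actual expressions of the
``diseasome_parser`` module are not shown in this tutorial); only the names
``RE_ATTS`` and ``RE_RELS`` come from that module. The attribute pattern also accepts,
and drops, an optional ``^^<datatype>`` suffix.

```python
import re

# Attribute line: <subject URI> <predicate URI> "literal", optionally followed
# by ^^<datatype URI> (matched but not captured), then the terminating dot.
RE_ATTS = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+"(.*)"(?:\^\^<[^>]*>)?\s*\.\s*$')
# Relation line: three URIs and the terminating dot.
RE_RELS = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*$')

att_line = ('<http://www4.wiwiss.fu-berlin.de/diseasome/resource/genes/CYP17A1> '
            '<http://www.w3.org/2000/01/rdf-schema#label> "CYP17A1" .')
rel_line = ('<http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseases/1> '
            '<http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseasome/associatedGene> '
            '<http://www4.wiwiss.fu-berlin.de/diseasome/resource/genes/HADH2> .')

# Each match yields the three fields: subject, relation (or attribute), object.
att_fields = RE_ATTS.match(att_line).groups()
rel_fields = RE_RELS.match(rel_line).groups()
```

Since a relation line has no quoted string and an attribute line has no third URI,
each line matches at most one of the two patterns.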
 
Thus, to parse the data, we can proceed as follows (see the ``diseasome_parser`` module):

1. define a couple of regular expressions for parsing the two kinds of lines,
   ``RE_ATTS`` for parsing the attribute definitions, and ``RE_RELS`` for parsing
   the relation definitions.

2. define a function that iterates through the lines of the file and
   ``yield``\ s a (subject, relation, object) tuple for each line.
   We called it ``_retrieve_structure`` in the ``diseasome_parser`` module.
   The function needs the file name and the types for which information
   should be retrieved.
 
Alternatively, instead of writing the parser by hand, one could use the RDF parser provided
in the ``dataio`` cube.

.. XXX To further study and detail the ``dataio`` cube usage.

Once we have the (subject, relation, object) triples, we need to map them into
the data model.
 
Mapping the data to the schema
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In the case of Diseasome data, we can just define two dictionaries for mapping
the names of the relations as extracted by the parser to the names of the relations
as defined in the ``schema.py`` data model. In the ``diseasome_parser`` module
they are called ``MAPPING_ATTS`` and ``MAPPING_RELS``.
Given that the relation and attribute names are given in camelCase in the original data
(e.g. ``associatedGene``), mappings are necessary if we follow PEP 8 when naming the
attributes in the data model.
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
173 |
For example, the RDF relation ``chromosomalLocation`` is mapped into the schema relation |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
174 |
``chromosomal_location``. |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
175 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
176 |
Once these mappings have been defined, we just iterate over the (subject, relation, object) |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
177 |
tuples provided by the parser and we extract the entities, with their attributes and relations. |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
178 |
For each entity, we thus have a dictionary with two keys, ``attributes`` and ``relations``. |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
179 |
The value associated to the ``attributes`` key is a dictionary containing (attribute: value) |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
180 |
pairs, where "value" is a string, plus the ``cwuri`` key / attribute holding the URI of |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
181 |
the entity itself. |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
182 |
The value associated to the ``relations`` key is a dictionary containing (relation: value) |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
183 |
pairs, where "value" is an URI. |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
184 |
This is implemented in the ``entities_from_rdf`` interface function of the module |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
185 |
``diseasome_parser``. This function provides an iterator on the dictionaries containing |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
186 |
the ``attributes`` and ``relations`` keys for all entities. |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
187 |
|
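As a rough illustration of the mapping step described above, the sketch below
builds the per-entity dictionaries from a list of triples. Apart from the
``chromosomalLocation`` entry, the mapping contents are assumptions, as are the
sample URIs and the helper name ``entities_from_triples``; the actual
``diseasome_parser`` module may be organised differently.

```python
from collections import OrderedDict

# Hypothetical excerpts of the two mapping dictionaries; only the
# chromosomalLocation entry is taken from the tutorial text.
MAPPING_ATTS = {'chromosomalLocation': 'chromosomal_location',
                'label': 'label'}
MAPPING_RELS = {'associatedGene': 'associated_genes'}


def entities_from_triples(triples):
    """Group (subject, relation, object) triples by subject URI.

    Yields one dictionary per entity, with an ``attributes`` key
    (attribute name -> string value, plus ``cwuri``) and a ``relations``
    key (relation name -> URI of the related entity).
    """
    entities = OrderedDict()
    for subj, rel, obj in triples:
        entity = entities.setdefault(
            subj, {'attributes': {'cwuri': subj}, 'relations': {}})
        if rel in MAPPING_ATTS:
            entity['attributes'][MAPPING_ATTS[rel]] = obj
        elif rel in MAPPING_RELS:
            # Simplification: a repeated relation overwrites the
            # previous value instead of accumulating several objects.
            entity['relations'][MAPPING_RELS[rel]] = obj
        # Names absent from both mappings are skipped.
    for entity in entities.values():
        yield entity


# Made-up sample data, used only to exercise the function.
sample_triples = [
    ('http://example.org/genes/1', 'label', 'BRCA1'),
    ('http://example.org/genes/1', 'chromosomalLocation', '17q21'),
    ('http://example.org/diseases/1', 'associatedGene',
     'http://example.org/genes/1'),
    ('http://example.org/diseases/1', 'unmappedRelation', 'ignored'),
]
sample_entities = list(entities_from_triples(sample_triples))
```

Keeping the two mappings as plain dictionaries means that supporting a new
attribute or relation only requires a new entry, not a change to the loop.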
However, this is a simple case. In real life, things can get much more complicated, and the
mapping can be far from trivial, especially when several data sources (which can follow
different formatting and even structuring conventions) must be mapped into the same data model.

Importing the data
~~~~~~~~~~~~~~~~~~

The data import code should be placed in a Python module. Let us call it
``diseasome_import.py``. This module should then be run via
``cubicweb-ctl``, as follows::

    cubicweb-ctl shell <instance name> diseasome_import.py -- <other arguments e.g. data file>

In the import module, we should use a *store* for doing the import.
A store is an object which provides three kinds of methods for
importing data:

- a method for importing the entities, along with the values
  of their attributes.
- a method for importing the relations between the entities.
- a method for committing the imports to the database.

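To make the three kinds of methods concrete, here is a toy in-memory store.
It is not one of CubicWeb's stores: the class name ``InMemoryStore`` and its
internals are invented for illustration, and only the method names
``create_entity``, ``relate`` and ``flush`` follow the ``dataimport`` API
described later in this tutorial.

```python
class InMemoryStore(object):
    """Toy store: buffers entities and relations, 'commits' on flush."""

    def __init__(self):
        self._next_eid = 1
        self.pending = []      # operations registered but not committed
        self.committed = []    # operations made permanent by flush()

    def create_entity(self, etype, **attributes):
        """Register an entity with its attribute values.

        For brevity this toy returns the identifier directly, whereas
        the stores described in this tutorial return an entity object.
        """
        eid = self._next_eid
        self._next_eid += 1
        self.pending.append(('entity', eid, etype, attributes))
        return eid

    def relate(self, subject_eid, rtype, object_eid):
        """Register a relation between two already created entities."""
        self.pending.append(('relation', subject_eid, rtype, object_eid))

    def flush(self):
        """Commit everything registered since the previous flush."""
        self.committed.extend(self.pending)
        self.pending = []


store = InMemoryStore()
toto = store.create_entity('Person', name='Toto')
paris = store.create_entity('Location', town='Paris')
store.relate(toto, 'lives_in', paris)
store.flush()
```

Nothing reaches ``committed`` until ``flush`` is called, which mirrors the
split between registering data and committing it to the database.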
In CubicWeb, we have four stores:

1. ``ObjectStore``: base class for the stores in CubicWeb.
   It only provides a skeleton for the other stores, along with
   the means for creating the in-memory structures
   (dictionaries) that hold the entities and the relations
   between them.

2. ``RQLObjectStore``: store which uses the RQL language for performing
   database insertions and updates. It relies on all the CubicWeb hooks
   machinery, especially for dealing with security issues (database access
   permissions).

3. ``NoHookRQLObjectStore``: store which uses the RQL language for
   performing database insertions and updates, but for which
   all hooks are deactivated. This implies that
   certain checks with respect to the CubicWeb / Yams schema
   (data model) are not performed. Even so, the SQL queries
   obtained from the RQL ones are still executed
   sequentially, one query per inserted entity.

4. ``SQLGenObjectStore``: store which uses the SQL language directly.
   It inserts entities either sequentially, by executing an SQL query
   for each entity, or in bulk, by using one PostgreSQL ``COPY FROM``
   query for a set of similarly structured entities.

For really massive imports (millions or billions of entities), the
``dataio`` cube contains another store, called
``MassiveObjectStore``. This store is similar to ``SQLGenObjectStore``,
except that anything related to CubicWeb is bypassed. That is, even the
CubicWeb EID entity identifiers are not handled. This store is the fastest,
but has a slightly different API from the other four stores mentioned above.
Moreover, it has an important limitation, in that it doesn't insert inlined [#]_
relations in the database.

.. [#] An inlined relation is a relation defined in the schema
       with the keyword argument ``inlined=True``. Such a relation
       is inserted in the database as an attribute of the entity
       whose subject it is.

In the following section we will see how to import data by using the stores
in CubicWeb's ``dataimport`` module.

Using the stores in ``dataimport``
++++++++++++++++++++++++++++++++++

``ObjectStore`` is seldom used in real life for importing data, since it is
only the base store for the other stores and it doesn't perform an actual
import of the data. Nevertheless, the other three stores, which import data,
are based on ``ObjectStore`` and provide the same API.

All three stores ``RQLObjectStore``, ``NoHookRQLObjectStore`` and
``SQLGenObjectStore`` provide exactly the same API for importing data, that is
entities and relations, into an SQL database.

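Because these stores share one API, the import logic can be written once and
handed whichever store suits the data volume. The following is only a sketch
under several assumptions: ``import_entities``, ``etype_of``, ``eid_of`` and
``RecordingStore`` are names invented for this example, the input dictionaries
are shaped like those produced by ``entities_from_rdf``, and the store methods
used are the ones described in the rest of this section.

```python
def eid_of(entity):
    """Return the identifier of an entity returned by ``create_entity``.

    This tutorial calls ``eid`` as a method; some entity objects expose
    it as a plain attribute instead, so both forms are accepted here.
    """
    eid = entity.eid
    return eid() if callable(eid) else eid


def import_entities(store, entities, etype_of):
    """Two-pass import: entities first, then the relations between them."""
    entities = list(entities)
    uri_to_eid = {}
    for data in entities:
        attributes = data['attributes']
        entity = store.create_entity(etype_of(data), **attributes)
        uri_to_eid[attributes['cwuri']] = eid_of(entity)
    store.flush()    # first commit: all entities exist from here on
    for data in entities:
        subject_eid = uri_to_eid[data['attributes']['cwuri']]
        for rtype, object_uri in data['relations'].items():
            if object_uri in uri_to_eid:    # skip targets not imported
                store.relate(subject_eid, rtype, uri_to_eid[object_uri])
    store.flush()    # second commit: the relations
    return uri_to_eid


class _Entity(object):
    """Minimal stand-in for the object returned by ``create_entity``."""

    def __init__(self, eid):
        self.eid = eid


class RecordingStore(object):
    """Stand-in for a real store; it only records the calls it receives."""

    def __init__(self):
        self.calls = []
        self._eid = 0

    def create_entity(self, etype, **attributes):
        self._eid += 1
        self.calls.append(('create_entity', etype, attributes['cwuri']))
        return _Entity(self._eid)

    def relate(self, subject_eid, rtype, object_eid, **kwargs):
        self.calls.append(('relate', subject_eid, rtype, object_eid))

    def flush(self):
        self.calls.append(('flush',))


# Made-up sample data in the shape described earlier in this tutorial.
sample = [
    {'attributes': {'cwuri': 'http://example.org/genes/1', 'label': 'G1'},
     'relations': {}},
    {'attributes': {'cwuri': 'http://example.org/diseases/1', 'label': 'D1'},
     'relations': {'associated_genes': 'http://example.org/genes/1'}},
]
demo_store = RecordingStore()
uri_to_eid = import_entities(
    demo_store, sample,
    etype_of=lambda data: ('Gene' if '/genes/' in data['attributes']['cwuri']
                           else 'Disease'))
```

Passing the store in as a parameter keeps the import routine independent of
the store class, so switching stores does not require touching the loop.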
Before using a store, one must import the ``dataimport`` module and then initialize
the store, with the current ``session`` as a parameter::

    import cubicweb.dataimport as cwdi
    ...

    store = cwdi.RQLObjectStore(session)

Each such store provides three methods for data import:

#. ``create_entity(Etype, **attributes)``, which allows us to add
   an entity of the Yams type ``Etype`` to the database. This entity's attributes
   are specified in the ``attributes`` dictionary. The method returns the entity
   created in the database. For example, we add two entities,
   a person, of ``Person`` type, and a location, of ``Location`` type::

        person = store.create_entity('Person', name='Toto', age='18', height='190')

        location = store.create_entity('Location', town='Paris', arrondissement='13')

#. ``relate(subject_eid, r_type, object_eid)``, which allows us to add a relation
   of the Yams type ``r_type`` to the database. The relation's subject is an entity
   whose EID is ``subject_eid``; its object is another entity, whose EID is
   ``object_eid``. For example [#]_::

        store.relate(person.eid(), 'lives_in', location.eid(), **kwargs)

   ``kwargs`` is only used by the ``relate`` method of ``SQLGenObjectStore``; it
   allows us to specify the type of the subject of the relation when the relation is
   defined as inlined in the schema.

   .. [#] The ``eid`` method of an entity defined via ``create_entity`` returns
          the entity identifier as assigned by CubicWeb when creating the entity.
          This only works for entities defined via the stores in CubicWeb's
          ``dataimport`` module.

   The keyword argument that is understood by ``SQLGenObjectStore`` is called
   ``subjtype`` and holds the type of the subject entity. For the example considered here,
   this amounts to [#]_::

        store.relate(person.eid(), 'lives_in', location.eid(), subjtype=person.cw_etype)

   If ``subjtype`` is not specified, then the store tries to infer the type of the subject.
   However, this doesn't always work, e.g. when there are several possible subject types
   for a given relation type.

   .. [#] The ``cw_etype`` attribute of an entity defined via ``create_entity`` holds
          the type of the entity just created. This only works for entities defined via
          the stores in CubicWeb's ``dataimport`` module. In the example considered
          here, ``person.cw_etype`` holds ``'Person'``.

   All stores other than ``SQLGenObjectStore`` ignore the ``kwargs`` parameters.

#. ``flush()``, which allows us to perform the actual commit into the database, along
   with some cleanup operations. Ideally, this method should be called as often as
   possible, that is after each insertion in the database, so that each database
   transaction stays as small as possible. In practice, we usually call this method twice:
   first, after all the entities have been created, second, after all relations have
   been created.

   Note however that before each commit the database insertions
   have to be consistent with the schema. Thus, if, for instance,
   an entity has an attribute defined through a relation (viz.
   a ``SubjectRelation``) with a ``"1"`` or ``"+"`` object
   cardinality, we have to create that entity,
   the object entity of the relation, and the
   relation itself before committing the additions to the database.

   The ``flush`` method is simply called as::

parents:
diff
changeset
|
335 |
store.flush() |
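The two-flush pattern described above can be sketched as follows. ``RecordingStore`` is a hypothetical in-memory stand-in written for this illustration, not a CubicWeb class: it only mimics the ``create_entity`` / ``relate`` / ``flush`` call shapes, and its ``create_entity`` returns a plain integer identifier rather than an entity object.

```python
class RecordingStore:
    """Hypothetical in-memory stand-in for a dataimport store (not CubicWeb code)."""

    def __init__(self):
        self.pending = []     # operations waiting for the next flush
        self.committed = []   # operations already "committed" by flush()
        self._next_eid = 1

    def create_entity(self, etype, **attributes):
        # Assumption for this sketch: return a bare integer identifier.
        eid = self._next_eid
        self._next_eid += 1
        self.pending.append(('entity', eid, etype, attributes))
        return eid

    def relate(self, subj_eid, rtype, obj_eid, **kwargs):
        self.pending.append(('relation', subj_eid, rtype, obj_eid))

    def flush(self):
        # Move everything pending into the committed list, in order.
        self.committed.extend(self.pending)
        self.pending = []


store = RecordingStore()

# Phase 1: create all the entities, then flush once.
toto = store.create_entity('Person', name='Toto')
paris = store.create_entity('Location', town='Paris')
store.flush()

# Phase 2: create all the relations, then flush again.
store.relate(toto, 'lives_in', paris)
store.flush()
```

If ``lives_in`` had a ``"1"`` or ``"+"`` object cardinality, the first ``flush()`` would have to be dropped, so that both entities and the relation reach the database in the same commit.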
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
336 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
337 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
338 |
Using the ``MassiveObjectStore`` in the ``dataio`` cube |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
339 |
+++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
340 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
341 |
This store, available in the ``dataio`` cube, allows us to |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
342 |
fully dispense with the CubicWeb import mechanisms and hence |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
343 |
to interact directly with the database server, via SQL queries. |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
344 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
345 |
Moreover, these queries rely on PostgreSQL's ``COPY FROM`` instruction |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
346 |
to create several entities in a single query. This brings tremendous |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
347 |
performance improvements with respect to the RQL-based data insertion |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
348 |
procedures. |
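To give an idea of why ``COPY FROM`` helps, here is a minimal sketch of how many rows can be packed into a single text buffer before being sent to the server in one go. The helper ``rows_to_copy_buffer`` and the sample data are assumptions made for this illustration; they are not part of the ``dataio`` cube, and the sketch ignores escaping of tabs, newlines and NULL values.

```python
import io


def rows_to_copy_buffer(rows, columns):
    """Serialize dict rows as tab-separated text, one line per row."""
    buf = io.StringIO()
    for row in rows:
        buf.write('\t'.join(str(row[col]) for col in columns) + '\n')
    buf.seek(0)  # rewind so a database driver can read from the start
    return buf


people = [
    {'name': 'Toto', 'age': 18},
    {'name': 'Titi', 'age': 21},
]
buf = rows_to_copy_buffer(people, ('name', 'age'))

# With a PostgreSQL driver such as psycopg2, a buffer like this one could
# then be loaded with a single call, for instance (hypothetical table name):
#     cursor.copy_from(buf, 'person', columns=('name', 'age'))
```

All the rows travel in one statement, instead of one insertion query per entity.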
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
349 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
350 |
However, the API of this store is slightly different from the API of |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
351 |
the stores in CubicWeb's ``dataimport`` module. |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
352 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
353 |
Before using the store, one has to import the ``dataio`` cube's |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
354 |
``dataimport`` module, then initialize the store by giving it the |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
355 |
``session`` parameter:: |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
356 |
|
12556
d1c659d70368
[doc] replace legacy import to new style cube import in various places
Philippe Pepiot <philippe.pepiot@logilab.fr>
parents:
10496
diff
changeset
|
357 |
from cubicweb_dataio import dataimport as mcwdi |
8836
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
358 |
... |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
359 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
360 |
store = mcwdi.MassiveObjectStore(session) |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
361 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
362 |
The ``MassiveObjectStore`` provides six methods for inserting data |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
363 |
into the database: |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
364 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
365 |
#. ``init_rtype_table(SubjEtype, r_type, ObjEtype)``, which specifies the |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
366 |
creation of the tables associated to the relation types in the database. |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
367 |
Each such table has three columns: the type of the subject entity, the |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
368 |
type of the relation (that is, the name of the attribute in the subject |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
369 |
entity which is defined via the relation), and the type of the object |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
370 |
entity. For example:: |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
371 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
372 |
store.init_rtype_table('Person', 'lives_in', 'Location') |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
373 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
374 |
Please note that these tables can be created before the entities, since |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
375 |
they only specify their types, not their unique identifiers. |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
376 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
377 |
#. ``create_entity(Etype, **attributes)``, which allows us to add new entities, |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
378 |
whose attributes are given in the ``attributes`` dictionary. |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
379 |
Please note however that, by default, this method does *not* return |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
380 |
the created entity. The method is called, for example, as in:: |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
381 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
382 |
store.create_entity('Person', name='Toto', age='18', height='190', |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
383 |
uri='http://link/to/person/toto_18_190') |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
384 |
store.create_entity('Location', town='Paris', arrondissement='13', |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
385 |
uri='http://link/to/location/paris_13') |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
386 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
387 |
In order to be able to link these entities via the relations when needed, |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
388 |
we must ourselves provide a means of uniquely identifying the entities. |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
389 |
In general, this is done via URIs, stored in attributes like ``uri`` or |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
390 |
``cwuri``. The name of the attribute is irrelevant as long as its value is |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
391 |
unique for each entity. |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
392 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
393 |
#. ``relate_by_iid(subject_iid, r_type, object_iid)`` allows us to actually |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
394 |
relate the entities uniquely identified by ``subject_iid`` and |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
395 |
``object_iid`` via a relation of type ``r_type``. For example:: |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
396 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
397 |
store.relate_by_iid('http://link/to/person/toto_18_190', |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
398 |
'lives_in', |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
399 |
'http://link/to/location/paris_13') |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
400 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
401 |
Please note that this method does *not* work for inlined relations! |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
402 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
403 |
#. ``convert_relations(SubjEtype, r_type, ObjEtype, subj_iid_attribute, |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
404 |
obj_iid_attribute)`` |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
405 |
allows us to actually insert |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
406 |
the relations in the database. At one call of this method, one inserts |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
407 |
all the relations of type ``r_type`` between entities of the given types. |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
408 |
``subj_iid_attribute`` and ``obj_iid_attribute`` are the names |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
409 |
of the attributes which store the unique identifiers of the entities, |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
410 |
as assigned by the user. These names can be identical, as long as |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
411 |
their values are unique. For example, for inserting all relations |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
412 |
of type ``lives_in`` between ``Person`` and ``Location`` entities, |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
413 |
we write:: |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
414 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
415 |
store.convert_relations('Person', 'lives_in', 'Location', 'uri', 'uri') |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
416 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
417 |
#. ``flush()`` performs the actual commit in the database. It only needs |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
418 |
to be called after ``create_entity`` and ``relate_by_iid`` calls. |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
419 |
Please note that ``relate_by_iid`` does *not* perform insertions into |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
420 |
the database, hence calling ``flush()`` for it would have no effect. |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
421 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
422 |
#. ``cleanup()`` performs database cleanups, by removing temporary tables. |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
423 |
It should only be called at the end of the import. |
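The order in which the six methods above are typically combined can be sketched as follows. ``CallLog`` is a hypothetical stub written for this illustration: it accepts any method call and merely records its name, so the sequence can be read (and checked) without a database or the ``dataio`` cube. The argument values are the ones used in the examples above.

```python
class CallLog:
    """Hypothetical stub: records method names instead of touching a database."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Only reached for attributes that do not exist, i.e. the store methods.
        def method(*args, **kwargs):
            self.calls.append(name)
        return method


store = CallLog()

# 1. Declare the relation table (possible before any entity exists).
store.init_rtype_table('Person', 'lives_in', 'Location')

# 2. Create the entities, each carrying a user-chosen unique identifier.
store.create_entity('Person', name='Toto',
                    uri='http://link/to/person/toto_18_190')
store.create_entity('Location', town='Paris',
                    uri='http://link/to/location/paris_13')

# 3. Link the entities through their identifiers.
store.relate_by_iid('http://link/to/person/toto_18_190',
                    'lives_in',
                    'http://link/to/location/paris_13')

# 4. Commit what has been buffered so far.
store.flush()

# 5. Insert all ``lives_in`` relations between the two entity types.
store.convert_relations('Person', 'lives_in', 'Location', 'uri', 'uri')

# 6. Remove the temporary tables, once, at the very end.
store.cleanup()
```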
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
424 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
425 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
426 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
427 |
.. XXX to add smth on the store's parameter initialization. |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
428 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
429 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
430 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
431 |
Application to the Diseasome data |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
432 |
+++++++++++++++++++++++++++++++++ |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
433 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
434 |
Import setup |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
435 |
############ |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
436 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
437 |
We define an import function, ``diseasome_import``, which does four things: |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
438 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
439 |
#. creates and initializes the store to be used, via a line such as:: |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
440 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
441 |
store = cwdi.SQLGenObjectStore(session) |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
442 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
443 |
where ``cwdi`` is the imported ``cubicweb.dataimport`` or |
12556
d1c659d70368
[doc] replace legacy import to new style cube import in various places
Philippe Pepiot <philippe.pepiot@logilab.fr>
parents:
10496
diff
changeset
|
444 |
``cubicweb_dataio.dataimport``. |
8836
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
445 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
446 |
#. calls the diseasome parser, that is, the ``entities_from_rdf`` function in the |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
447 |
``diseasome_parser`` module and iterates on its result, in a line such as:: |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
448 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
449 |
for entity, relations in parser.entities_from_rdf(filename, ('gene', 'disease')): |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
450 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
451 |
where ``parser`` is the imported ``diseasome_parser`` module, and ``filename`` is the |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
452 |
name of the file containing the data (with its path), e.g. ``../data/diseasome_dump.nt``. |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
453 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
454 |
#. creates the entities to be inserted in the database; for Diseasome, there are two |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
455 |
kinds of entities: |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
456 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
457 |
#. entities defined in the data model, viz. ``Gene`` and ``Disease`` in our case. |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
458 |
#. entities which are built into CubicWeb / Yams, viz. ``ExternalUri``, which defines |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
459 |
URIs. |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
460 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
461 |
As we are working with RDF data, each entity is defined through a series of URIs. Hence, |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
462 |
each "relational attribute" [#]_ of an entity is defined via a URI, that is, in CubicWeb |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
463 |
terms, via an ``ExternalUri`` entity. The entities are created, in the loop presented above, |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
464 |
as such:: |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
465 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
466 |
ent = store.create_entity(etype, **entity) |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
467 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
468 |
where ``etype`` is the appropriate entity type, either ``Gene`` or ``Disease``. |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
469 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
470 |
.. [#] By "relational attribute" we denote an attribute (of an entity) which |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
471 |
is defined through a relation, e.g. the ``chromosomal_location`` attribute |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
472 |
of ``Disease`` entities, which is defined through a relation between a |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
473 |
``Disease`` and an ``ExternalUri``. |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
474 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
475 |
There are as many ``ExternalUri`` entities as URIs in the data file. For them, we define a single |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
476 |
attribute, ``uri``, which holds the URI under discussion:: |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
477 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
478 |
extu = store.create_entity('ExternalUri', uri="http://path/of/the/uri") |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
479 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
480 |
#. creates the relations between the entities. We have relations between: |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
481 |
|
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
482 |
#. entities defined in the schema, e.g. between ``Disease`` and ``Gene`` |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
483 |
entities, such as the ``associated_genes`` relation defined for |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
484 |
``Disease`` entities. |
8a57802d40d3
[cubicweb/doc] Add tutorial on data import in CubicWeb.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
diff
changeset
|
485 |
#. entities defined in the schema and ``ExternalUri`` entities, such as ``gene_id``. |

   The way relations are added to the database depends on the store:

   - for the stores in the CubicWeb ``dataimport`` module, we only use
     ``store.relate``, in another loop over the relations (that is, a
     loop nested inside the preceding one, mentioned at step 2)::

        for rtype, rels in relations.items():
            ...

            store.relate(ent.eid(), rtype, extu.eid(), **kwargs)

     where ``kwargs`` holds optional keyword arguments: the type of the subject
     entity of the relation must be specified when the relation is inlined and
     ``SQLGenObjectStore`` is used. For example::

        ...
        store.relate(ent.eid(), 'chromosomal_location', extu.eid(), subjtype='Disease')

   - for the ``MassiveObjectStore`` in the ``dataio`` cube's ``dataimport`` module,
     the relations are created in three steps:

     #. first, a table is created for each relation type, as in::

           ...
           store.init_rtype_table(ent.cw_etype, rtype, extu.cw_etype)

        which comes down to lines such as::

           store.init_rtype_table('Disease', 'associated_genes', 'Gene')
           store.init_rtype_table('Gene', 'gene_id', 'ExternalUri')

     #. second, the URI of each entity will be used as its identifier, in the
        ``relate_by_iid`` method, such as::

           disease_uri = 'http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseases/3'
           gene_uri = 'http://www4.wiwiss.fu-berlin.de/diseasome/resource/genes/HSD3B2'
           store.relate_by_iid(disease_uri, 'associated_genes', gene_uri)

     #. third, the relations for each relation type will be added to the database,
        via the ``convert_relations`` method, such as in::

           store.convert_relations('Disease', 'associated_genes', 'Gene', 'cwuri', 'cwuri')

        and::

           store.convert_relations('Gene', 'hgnc_id', 'ExternalUri', 'cwuri', 'uri')

        where ``cwuri`` and ``uri`` are the attributes which store the URIs of the entities
        defined in the data model, and of the ``ExternalUri`` entities, respectively.

#. flushes all relations and entities::

      store.flush()

   which performs the actual commit of the inserted entities and relations in the database.
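
The ``store.relate`` loop described above can be tried out without a database. The following is a minimal sketch, not the tutorial's import script: ``RecordingStore`` is a hypothetical stand-in that only records calls, ``add_relations`` and its parameters are illustrative names, and the eids and shortened URIs are made up. Only the ``relate(subject_eid, rtype, object_eid, **kwargs)`` call shape and the ``subjtype`` keyword come from the text.

```python
class RecordingStore:
    """Hypothetical stand-in for a dataimport store; it only records calls."""

    def __init__(self):
        self.relations = []
        self.flushed = False

    def relate(self, subj_eid, rtype, obj_eid, **kwargs):
        # A real store would write to the database; here we remember the call.
        self.relations.append((subj_eid, rtype, obj_eid, kwargs))

    def flush(self):
        self.flushed = True


def add_relations(store, subj_eid, relations, eid_by_uri, inlined=None):
    """Relate one subject entity to the objects listed in `relations`.

    `relations` maps a relation type to a list of object URIs,
    `eid_by_uri` maps each URI to the eid of an already created entity,
    `inlined` maps an inlined relation type to its subject entity type.
    """
    inlined = inlined or {}
    for rtype, uris in relations.items():
        kwargs = {}
        if rtype in inlined:
            # Needed for inlined relations when SQLGenObjectStore is used.
            kwargs['subjtype'] = inlined[rtype]
        for uri in uris:
            store.relate(subj_eid, rtype, eid_by_uri[uri], **kwargs)


store = RecordingStore()
eid_by_uri = {'genes/HSD3B2': 20, 'locations/1p13': 30}  # made-up values
add_relations(
    store,
    10,  # made-up eid of a Disease entity
    {'associated_genes': ['genes/HSD3B2'],
     'chromosomal_location': ['locations/1p13']},
    eid_by_uri,
    inlined={'chromosomal_location': 'Disease'},
)
store.flush()
```

Passing the inlined relation types as a mapping keeps the loop identical for all stores; stores that do not need ``subjtype`` would simply be called with an empty mapping.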

If the ``MassiveObjectStore`` is used, then a cleanup of temporary SQL tables should be performed
at the end of the import::

   store.cleanup()
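
To make the order of the ``MassiveObjectStore`` calls explicit, here is a small sketch of the whole sequence. ``FakeMassiveStore`` is a hypothetical stub that logs method calls and is not the ``dataio`` implementation; ``import_relations`` and its argument layout are assumptions for illustration. The method names and argument order mirror the calls shown above.

```python
class FakeMassiveStore:
    """Hypothetical stub that logs calls instead of touching a database."""

    def __init__(self):
        self.calls = []

    def init_rtype_table(self, subj_etype, rtype, obj_etype):
        self.calls.append(('init_rtype_table', subj_etype, rtype, obj_etype))

    def relate_by_iid(self, subj_uri, rtype, obj_uri):
        self.calls.append(('relate_by_iid', subj_uri, rtype, obj_uri))

    def convert_relations(self, subj_etype, rtype, obj_etype, subj_attr, obj_attr):
        self.calls.append(
            ('convert_relations', subj_etype, rtype, obj_etype, subj_attr, obj_attr))

    def flush(self):
        self.calls.append(('flush',))

    def cleanup(self):
        self.calls.append(('cleanup',))


def import_relations(store, rtype_defs, links):
    """Run the three relation steps, then flush and clean up.

    `rtype_defs` holds (subject type, rtype, object type, subject URI
    attribute, object URI attribute) tuples; `links` holds
    (subject URI, rtype, object URI) tuples.
    """
    # Step 1: one temporary table per relation type.
    for subj_etype, rtype, obj_etype, _, _ in rtype_defs:
        store.init_rtype_table(subj_etype, rtype, obj_etype)
    # Step 2: relations are recorded by URI, not by eid.
    for subj_uri, rtype, obj_uri in links:
        store.relate_by_iid(subj_uri, rtype, obj_uri)
    # Step 3: URIs are resolved and the relations are added to the database.
    for subj_etype, rtype, obj_etype, subj_attr, obj_attr in rtype_defs:
        store.convert_relations(subj_etype, rtype, obj_etype, subj_attr, obj_attr)
    store.flush()
    store.cleanup()


store = FakeMassiveStore()
import_relations(
    store,
    [('Disease', 'associated_genes', 'Gene', 'cwuri', 'cwuri')],
    [('http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseases/3',
      'associated_genes',
      'http://www4.wiwiss.fu-berlin.de/diseasome/resource/genes/HSD3B2')],
)
call_order = [call[0] for call in store.calls]
```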

Timing benchmarks
#################

In order to time the import script, we just decorate the import function with the ``timed``
decorator::

   from logilab.common.decorators import timed
   ...

   @timed
   def diseasome_import(session, filename):
       ...

After running the import function as shown in the "Importing the data" section, we obtain two time measurements::

   diseasome_import clock: ... / time: ...

Here, the meanings of these measurements are [#]_:

- ``clock`` is the time spent by CubicWeb, on the server side (i.e. hooks and data pre- / post-processing on SQL
  queries),

- ``time`` is the sum of ``clock`` and the time spent in PostgreSQL.
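
To get a feel for the two measurements, the sketch below defines a stand-in decorator. It is not the source of ``logilab.common.decorators.timed``; it assumes that ``clock`` corresponds to the CPU time of the Python process and ``time`` to the elapsed wall-clock time, which matches the description above but remains an assumption. ``timed_sketch`` and ``fake_import`` are illustrative names.

```python
import functools
import time


def timed_sketch(func):
    """Print CPU time and wall-clock time for each call (illustrative only)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        cpu_start = time.process_time()
        wall_start = time.time()
        result = func(*args, **kwargs)
        cpu = time.process_time() - cpu_start
        wall = time.time() - wall_start
        print('%s clock: %.9f / time: %.9f' % (func.__name__, cpu, wall))
        return result
    return wrapper


@timed_sketch
def fake_import():
    time.sleep(0.05)  # stands in for time spent waiting on the database
    return sum(i * i for i in range(10000))  # stands in for Python-side work


total = fake_import()
```

Under this reading, the sleep only shows up in ``time``, which is why ``time - clock`` can be used as an estimate of the time spent outside the Python process.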

.. [#] The meanings of the ``clock`` and ``time`` measurements, when using the ``@timed``
   decorator, were taken from `a blog post on massive data import in CubicWeb`_.

.. _a blog post on massive data import in CubicWeb: http://www.cubicweb.org/blogentry/2116712

The import function is put in an import module, named ``diseasome_import`` here. The module is called
directly from the CubicWeb shell, as follows::

   cubicweb-ctl shell diseasome_instance diseasome_import.py \
   -- -df diseasome_import_file.nt -st StoreName

The module accepts two arguments:

- the data file, introduced by ``-df [--datafile]``, and
- the store, introduced by ``-st [--store]``.
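
One way to handle these two options inside the import module is sketched below with ``argparse``. The function name ``parse_import_args``, the help texts, the default store and the list of accepted store names are assumptions made for this example; only the option names come from the text.

```python
import argparse

# Store names taken from this tutorial; the real module may accept others.
STORE_NAMES = [
    'RQLObjectStore',
    'NoHookRQLObjectStore',
    'SQLGenObjectStore',
    'MassiveObjectStore',
]


def parse_import_args(argv):
    """Parse the arguments that follow ``--`` on the command line."""
    parser = argparse.ArgumentParser(description='Diseasome import (sketch).')
    parser.add_argument('-df', '--datafile', required=True,
                        help='path of the data file to import')
    parser.add_argument('-st', '--store', default='RQLObjectStore',
                        choices=STORE_NAMES,
                        help='name of the store used for the import')
    return parser.parse_args(argv)


args = parse_import_args(['-df', 'diseasome_import_file.nt',
                          '-st', 'SQLGenObjectStore'])
```

Restricting ``--store`` to a fixed list makes a mistyped store name fail early, before any data is read.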

The timings (in seconds) for different stores are given in the following table, for
importing 4213 ``Disease`` entities and 3919 ``Gene`` entities with the import module
just described:

+--------------------------+------------------------+--------------------------------+------------+
| Store                    | CubicWeb time (clock)  | PostgreSQL time (time - clock) | Total time |
+==========================+========================+================================+============+
| ``RQLObjectStore``       | 225.98                 | 62.05                          | 288.03     |
+--------------------------+------------------------+--------------------------------+------------+
| ``NoHookRQLObjectStore`` | 62.73                  | 51.38                          | 114.11     |
+--------------------------+------------------------+--------------------------------+------------+
| ``SQLGenObjectStore``    | 20.41                  | 11.03                          | 31.44      |
+--------------------------+------------------------+--------------------------------+------------+
| ``MassiveObjectStore``   | 4.84                   | 6.93                           | 11.77      |
+--------------------------+------------------------+--------------------------------+------------+
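
The speed-up factors implied by the table can be recomputed directly from its figures. The snippet below uses only the numbers given above; the variable names are illustrative.

```python
# (CubicWeb time, PostgreSQL time) in seconds, copied from the table above.
timings = {
    'RQLObjectStore': (225.98, 62.05),
    'NoHookRQLObjectStore': (62.73, 51.38),
    'SQLGenObjectStore': (20.41, 11.03),
    'MassiveObjectStore': (4.84, 6.93),
}

# Total time is the sum of the two columns.
totals = {name: round(cw + pg, 2) for name, (cw, pg) in timings.items()}

# Speed-up of each store relative to the slowest one, RQLObjectStore.
baseline = totals['RQLObjectStore']
speedups = {name: round(baseline / total, 1) for name, total in totals.items()}

for name, total in totals.items():
    print('%-22s total %7.2f s, %4.1fx faster than RQLObjectStore'
          % (name, total, speedups[name]))
```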


Conclusions
~~~~~~~~~~~

In this tutorial we have seen how to import data into a CubicWeb application instance. We have first seen how to
create a schema, then how to create a parser of the data and a mapping of the data to the schema.
Finally, we have seen four ways of importing data into CubicWeb.

Three of those are integrated into CubicWeb, namely the ``RQLObjectStore``, ``NoHookRQLObjectStore`` and
``SQLGenObjectStore`` stores, which have a common API:

- ``RQLObjectStore`` is by far the slowest, especially in the time spent on the
  CubicWeb side, and so it should be used only for small amounts of
  "sensitive" data (i.e. where security is a concern).

- ``NoHookRQLObjectStore`` cuts the time spent on the CubicWeb side by a factor of almost four,
  but is still quite slow; on the PostgreSQL side it is almost as slow as the previous store.
  It should be used for data where security is not a concern,
  but consistency (with the data model) is.

- ``SQLGenObjectStore`` cuts the time spent on the CubicWeb side by a further factor of three, and the time
  spent on the PostgreSQL side by a factor of almost five, relative to ``NoHookRQLObjectStore``.
  It should be used for relatively large amounts of data, where
  security and data consistency are not a concern. Compared to the previous store, it has the
  disadvantage that, for inlined relations, we must specify their subjects' types.

For really huge amounts of data there is a fourth store, ``MassiveObjectStore``, available
from the ``dataio`` cube. It is much faster than all the other stores:
it is almost 25 times faster than ``RQLObjectStore`` and almost three times faster than
``SQLGenObjectStore``. However, it has a few usage caveats that should be taken into account:

#. it cannot insert relations defined as inlined in the schema,
#. no security or consistency check is performed on the data,
#. its API is slightly different from the other stores.

Hence, this store should be used when security and data consistency are not a concern,
and there are no inlined relations in the schema.
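
The recommendations above can be summarised as a small decision helper. This is only one possible reading of the conclusions, not part of CubicWeb: the function name, its parameters and the order in which the criteria are checked are assumptions.

```python
def recommend_store(needs_security, needs_consistency,
                    has_inlined_relations, huge_volume):
    """Return the name of the store suggested by the conclusions above."""
    if needs_security:
        # Slowest store, but the only one recommended for sensitive data.
        return 'RQLObjectStore'
    if needs_consistency:
        # Security is not a concern, consistency with the data model is.
        return 'NoHookRQLObjectStore'
    if huge_volume and not has_inlined_relations:
        # Fastest store; it cannot insert inlined relations.
        return 'MassiveObjectStore'
    # Large imports where neither security nor consistency is a concern.
    return 'SQLGenObjectStore'


choice = recommend_store(needs_security=False, needs_consistency=False,
                         has_inlined_relations=True, huge_volume=True)
```

With inlined relations in the schema, the helper falls back to ``SQLGenObjectStore`` even for huge imports, since ``MassiveObjectStore`` cannot insert such relations.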