author | Sylvain Thénault <sylvain.thenault@logilab.fr> |
Mon, 25 Oct 2010 15:30:50 +0200 | |
changeset 6621 | 11c09415078b |
parent 6492 | 47a284c0d012 |
child 6989 | 4a999a647f52 |
permissions | -rw-r--r-- |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
1 |
# -*- coding: utf-8 -*- |
5421
8167de96c523
proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5097
diff
changeset
|
2 |
# copyright 2003-2010 LOGILAB S.A. (Paris, FRANCE), all rights reserved. |
8167de96c523
proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5097
diff
changeset
|
3 |
# contact http://www.logilab.fr/ -- mailto:contact@logilab.fr |
8167de96c523
proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5097
diff
changeset
|
4 |
# |
8167de96c523
proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5097
diff
changeset
|
5 |
# This file is part of CubicWeb. |
8167de96c523
proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5097
diff
changeset
|
6 |
# |
8167de96c523
proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5097
diff
changeset
|
7 |
# CubicWeb is free software: you can redistribute it and/or modify it under the |
8167de96c523
proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5097
diff
changeset
|
8 |
# terms of the GNU Lesser General Public License as published by the Free |
8167de96c523
proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5097
diff
changeset
|
9 |
# Software Foundation, either version 2.1 of the License, or (at your option) |
8167de96c523
proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5097
diff
changeset
|
10 |
# any later version. |
8167de96c523
proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5097
diff
changeset
|
11 |
# |
5424
8ecbcbff9777
replace logilab-common by CubicWeb in disclaimer
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5421
diff
changeset
|
12 |
# CubicWeb is distributed in the hope that it will be useful, but WITHOUT |
5421
8167de96c523
proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5097
diff
changeset
|
13 |
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS |
8167de96c523
proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5097
diff
changeset
|
14 |
# FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more |
8167de96c523
proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5097
diff
changeset
|
15 |
# details. |
8167de96c523
proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5097
diff
changeset
|
16 |
# |
8167de96c523
proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5097
diff
changeset
|
17 |
# You should have received a copy of the GNU Lesser General Public License along |
8167de96c523
proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5097
diff
changeset
|
18 |
# with CubicWeb. If not, see <http://www.gnu.org/licenses/>. |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
19 |
"""This module provides tools to import tabular data. |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
20 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
21 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
22 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
23 |
Example of use (run this with `cubicweb-ctl shell instance import-script.py`): |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
24 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
25 |
.. sourcecode:: python |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
26 |
|
3318
5b47b9f09bca
documentation : fixed docstring
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
3029
diff
changeset
|
27 |
from cubicweb.devtools.dataimport import * |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
28 |
# define data generators |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
29 |
GENERATORS = [] |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
30 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
31 |
USERS = [('Prenom', 'firstname', ()), |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
32 |
('Nom', 'surname', ()), |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
33 |
('Identifiant', 'login', ()), |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
34 |
] |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
35 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
36 |
def gen_users(ctl): |
6133
6f3eabbbdf2e
use iter_and_commit in example
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
6122
diff
changeset
|
37 |
for row in ctl.iter_and_commit('utilisateurs'): |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
38 |
entity = mk_entity(row, USERS) |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
39 |
entity['upassword'] = u'motdepasse' |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
40 |
ctl.check('login', entity['login'], None) |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
41 |
ctl.store.add('CWUser', entity) |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
42 |
email = {'address': row['email']} |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
43 |
ctl.store.add('EmailAddress', email) |
3003
2944ee420dca
R [dataimport] rename uid to eid
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
2974
diff
changeset
|
44 |
ctl.store.relate(entity['eid'], 'use_email', email['eid']) |
2944ee420dca
R [dataimport] rename uid to eid
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
2974
diff
changeset
|
45 |
ctl.store.rql('SET U in_group G WHERE G name "users", U eid %(x)s', {'x':entity['eid']}) |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
46 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
47 |
CHK = [('login', check_doubles, 'Utilisateurs Login', |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
48 |
'Deux utilisateurs ne devraient pas avoir le même login.'), |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
49 |
] |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
50 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
51 |
GENERATORS.append( (gen_users, CHK) ) |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
52 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
53 |
# create controller |
6122
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
54 |
if 'cnx' in globals(): |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
55 |
ctl = CWImportController(RQLObjectStore(cnx)) |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
56 |
else: |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
57 |
print 'debug mode (not connected)' |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
58 |
print 'run through cubicweb-ctl shell to access an instance' |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
59 |
ctl = CWImportController(ObjectStore()) |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
60 |
ctl.askerror = 1 |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
61 |
ctl.generators = GENERATORS |
6122
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
62 |
ctl.data['utilisateurs'] = lazytable(ucsvreader(open('users.csv'))) |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
63 |
# run |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
64 |
ctl.run() |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
65 |
|
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
66 |
.. BUG files with only one column are not parsable |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
67 |
.. TODO rollback() invocation is not possible yet |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
68 |
""" |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
69 |
__docformat__ = "restructuredtext en" |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
70 |
|
4186
ca7e526b07b6
import cleanup, check data file exists
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4173
diff
changeset
|
71 |
import sys |
ca7e526b07b6
import cleanup, check data file exists
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4173
diff
changeset
|
72 |
import csv |
ca7e526b07b6
import cleanup, check data file exists
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4173
diff
changeset
|
73 |
import traceback |
ca7e526b07b6
import cleanup, check data file exists
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4173
diff
changeset
|
74 |
import os.path as osp |
ca7e526b07b6
import cleanup, check data file exists
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4173
diff
changeset
|
75 |
from StringIO import StringIO |
4818
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
76 |
from copy import copy |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
77 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
78 |
from logilab.common import shellutils |
4818
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
79 |
from logilab.common.date import strptime |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
80 |
from logilab.common.decorators import cached |
4136
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
81 |
from logilab.common.deprecation import deprecated |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
82 |
|
5066
bf5cbc351e99
[repo] move eschema_eid function from hooks.metadata to server.utils
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5063
diff
changeset
|
83 |
from cubicweb.server.utils import eschema_eid |
6142
8bc6eac1fac1
[session] cleanup hook / operation / entity edition api
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
6122
diff
changeset
|
84 |
from cubicweb.server.ssplanner import EditedEntity |
4818
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
85 |
|
6122
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
86 |
def count_lines(stream_or_filename): |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
87 |
if isinstance(stream_or_filename, basestring): |
6492
47a284c0d012
fix some pylint detected errors
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
6427
diff
changeset
|
88 |
f = open(stream_or_filename) |
6122
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
89 |
else: |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
90 |
f = stream_or_filename |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
91 |
f.seek(0) |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
92 |
for i, line in enumerate(f): |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
93 |
pass |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
94 |
f.seek(0) |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
95 |
return i+1 |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
96 |
|
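Note that `count_lines` above leaves `i` unbound when the stream is empty, and it does not close a file it opened itself. A minimal Python 3 sketch of the same idea, assuming a seekable stream; `count_lines_safe` is a hypothetical name, not part of this module:

```python
import io


def count_lines_safe(stream_or_filename):
    """Count lines in a seekable stream or a file path (hypothetical helper).

    Returns 0 for empty input instead of failing on an unbound loop variable.
    """
    if isinstance(stream_or_filename, str):
        # We opened the file ourselves, so we are responsible for closing it.
        with open(stream_or_filename) as f:
            return sum(1 for _ in f)
    f = stream_or_filename
    f.seek(0)
    count = sum(1 for _ in f)
    f.seek(0)  # rewind so the caller can read the stream again
    return count
```

For instance, `count_lines_safe(io.StringIO(""))` gives 0, and the stream is left rewound for the next reader.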
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
97 |
def ucsvreader_pb(stream_or_path, encoding='utf-8', separator=',', quote='"', |
4136
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
98 |
skipfirst=False, withpb=True): |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
99 |
"""same as ucsvreader but a progress bar is displayed as we iter on rows""" |
6122
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
100 |
if isinstance(stream_or_path, basestring): |
6492
47a284c0d012
fix some pylint detected errors
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
6427
diff
changeset
|
101 |
if not osp.exists(stream_or_path): |
47a284c0d012
fix some pylint detected errors
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
6427
diff
changeset
|
102 |
raise Exception("file doesn't exist: %s" % stream_or_path) |
6122
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
103 |
stream = open(stream_or_path) |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
104 |
else: |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
105 |
stream = stream_or_path |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
106 |
rowcount = count_lines(stream) |
4136
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
107 |
if skipfirst: |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
108 |
rowcount -= 1 |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
109 |
if withpb: |
4140
46ddd27a4ca4
tweaks output
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4136
diff
changeset
|
110 |
pb = shellutils.ProgressBar(rowcount, 50) |
6122
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
111 |
for urow in ucsvreader(stream, encoding, separator, quote, skipfirst): |
4136
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
112 |
yield urow |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
113 |
if withpb: |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
114 |
pb.update() |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
115 |
print ' %s rows imported' % rowcount |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
116 |
|
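`ucsvreader_pb` updates its progress bar after the `yield`, so the bar advances only once the caller has finished with a row. A minimal Python 3 sketch of that pattern, with a plain callback standing in for `shellutils.ProgressBar`; `iter_with_progress` and `report` are hypothetical names, not part of this module:

```python
def iter_with_progress(rows, total, report):
    """Yield each row, then report progress once the consumer asks for the next one.

    `report` is a hypothetical callable taking (done, total).
    """
    for done, row in enumerate(rows, 1):
        yield row
        # Runs when the consumer resumes the generator, i.e. after it handled `row`.
        report(done, total)
```

Passing the reporting function in as an argument keeps the generator independent of any particular progress-bar implementation, which also makes it easy to exercise in a test.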
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
117 |
def ucsvreader(stream, encoding='utf-8', separator=',', quote='"', |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
118 |
skipfirst=False): |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
119 |
"""A csv reader that accepts files with any encoding and outputs unicode |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
120 |
strings |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
121 |
""" |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
122 |
it = iter(csv.reader(stream, delimiter=separator, quotechar=quote)) |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
123 |
if skipfirst: |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
124 |
it.next() |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
125 |
for row in it: |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
126 |
yield [item.decode(encoding) for item in row] |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
127 |
|
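Under Python 3 the `csv` module works on text rather than bytes, so decoding has to happen before parsing instead of per item as in `ucsvreader` above. A hedged sketch of an equivalent reader, assuming a binary input stream; `ucsvreader_py3` is a hypothetical name, not part of this module:

```python
import csv
import io


def ucsvreader_py3(stream, encoding='utf-8', separator=',', quote='"',
                   skipfirst=False):
    """Yield rows of text from a binary stream (hypothetical Python 3 variant)."""
    # newline='' lets the csv module handle line endings inside quoted fields.
    text = io.TextIOWrapper(stream, encoding=encoding, newline='')
    reader = csv.reader(text, delimiter=separator, quotechar=quote)
    if skipfirst:
        next(reader, None)  # drop the header row, tolerate empty input
    for row in reader:
        yield row
```

The wrapper takes ownership of `stream`, so this sketch suits one-shot reads; a caller that needs the underlying stream afterwards would have to manage the wrapper explicitly.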
6136
79da6f969b15
[dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
6133
diff
changeset
|
128 |
def callfunc_every(func, number, iterable): |
79da6f969b15
[dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
6133
diff
changeset
|
129 |
"""yield items of `iterable` one by one and call function `func` |
79da6f969b15
[dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
6133
diff
changeset
|
130 |
every `number` iterations. Always call function `func` at the end. |
79da6f969b15
[dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
6133
diff
changeset
|
131 |
""" |
79da6f969b15
[dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
6133
diff
changeset
|
132 |
for idx, item in enumerate(iterable): |
79da6f969b15
[dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
6133
diff
changeset
|
133 |
yield item |
79da6f969b15
[dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
6133
diff
changeset
|
134 |
if not idx % number: |
79da6f969b15
[dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
6133
diff
changeset
|
135 |
func() |
79da6f969b15
[dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
6133
diff
changeset
|
136 |
func() |
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
137 |
|
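The contract stated in the `callfunc_every` docstring (call `func` every `number` items, plus once at the end) can be sketched as follows. This is a Python 3 illustration under a hypothetical name, `call_every`, and it uses a 1-based counter so that `func` fires after each complete batch; it is a variant for illustration, not the module's code:

```python
def call_every(func, number, iterable):
    """Yield items one by one, calling `func` after every `number` items
    and once more when the iterable is exhausted."""
    for count, item in enumerate(iterable, 1):
        yield item
        if count % number == 0:
            func()
    func()  # final call flushes whatever is left in the last partial batch
```

With `func` bound to a commit operation, this gives batched commits during an import: seven rows with `number=3` trigger two batch commits and one final commit.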
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
138 |
def lazytable(reader): |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
139 |
"""The first row is taken to be the header of the table and |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
140 |
used to output a dict for each row of data. |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
141 |
|
6122
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
142 |
>>> data = lazytable(ucsvreader(open(filename))) |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
143 |
""" |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
144 |
header = reader.next() |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
145 |
for row in reader: |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
146 |
yield dict(zip(header, row)) |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
147 |
|
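A short Python 3 sketch of the header-to-dict technique used by `lazytable`, assuming any iterator of rows as input; `lazy_dict_rows` is a hypothetical name, not part of this module:

```python
import csv
import io


def lazy_dict_rows(reader):
    """Yield one dict per data row, keyed by the first (header) row."""
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to yield
    for row in reader:
        # zip() stops at the shorter sequence, so extra cells are dropped
        # and missing cells simply produce no key.
        yield dict(zip(header, row))
```

Because it is a generator, rows are converted only as the caller consumes them, so a large file is never held in memory as a whole.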
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
148 |
def mk_entity(row, map): |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
149 |
"""Return a dict made from sanitized mapped values. |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
150 |
|
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
151 |
ValueError can be raised on unexpected values found in checkers |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
152 |
|
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
153 |
>>> row = {'myname': u'dupont'} |
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
154 |
>>> map = [('myname', u'name', (call_transform_method('title'),))] |
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
155 |
>>> mk_entity(row, map) |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
156 |
{'name': u'Dupont'} |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
157 |
>>> row = {'myname': u'dupont', 'optname': u''} |
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
158 |
>>> map = [('myname', u'name', (call_transform_method('title'),)), |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
159 |
... ('optname', u'MARKER', (optional,))] |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
160 |
>>> mk_entity(row, map) |
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
161 |
{'name': u'Dupont', 'MARKER': None} |
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
162 |
""" |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
163 |
res = {} |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
164 |
assert isinstance(row, dict) |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
165 |
assert isinstance(map, list) |
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
166 |
for src, dest, funcs in map: |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
167 |
res[dest] = row[src] |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
168 |
try: |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
169 |
for func in funcs: |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
170 |
res[dest] = func(res[dest]) |
4818
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
171 |
if res[dest] is None: |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
172 |
break |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
173 |
except ValueError, err: |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
174 |
raise ValueError('error with %r field: %s' % (src, err)) |
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
175 |
return res |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
176 |
|
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
177 |
|
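The mapping logic of `mk_entity` can be sketched in Python 3 as follows. `make_entity`, `title_case` and `optional` are hypothetical stand-ins written for this illustration; the module's own transform helpers are not shown in this part of the file, so their exact behaviour is an assumption here:

```python
def title_case(value):
    """Hypothetical transform: capitalise each word."""
    return value.title()


def optional(value):
    """Hypothetical transform: map an empty string to None."""
    return value or None


def make_entity(row, mapping):
    """Build a dict from `row` using (source, destination, transforms) triples."""
    entity = {}
    for src, dest, funcs in mapping:
        value = row[src]
        try:
            for func in funcs:
                value = func(value)
                if value is None:
                    break  # later transforms cannot handle a missing value
        except ValueError as err:
            # Re-raise with the source column name so the bad field is easy to find.
            raise ValueError('error with %r field: %s' % (src, err)) from err
        entity[dest] = value
    return entity
```

Note that the keys of the result are the destination names from the mapping, not the source column names.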
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
178 |
# user interactions ############################################################ |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
179 |
|
3029
bc573d5fb5b7
F [devtools] by default dataimport prints message on stdout
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
3003
diff
changeset
|
180 |
def tell(msg): |
bc573d5fb5b7
F [devtools] by default dataimport prints message on stdout
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
3003
diff
changeset
|
181 |
print msg |
bc573d5fb5b7
F [devtools] by default dataimport prints message on stdout
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
3003
diff
changeset
|
182 |
|
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
183 |
def confirm(question): |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
184 |
"""A confirm function that asks for yes/no/abort and exits on abort.""" |
4721
8f63691ccb7f
pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4613
diff
changeset
|
185 |
answer = shellutils.ASK.ask(question, ('Y', 'n', 'abort'), 'Y') |
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
186 |
if answer == 'abort': |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
187 |
sys.exit(1) |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
188 |
return answer == 'Y' |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
189 |
|
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
190 |
|
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
191 |
class catch_error(object): |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
192 |
"""Helper for @contextmanager decorator.""" |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
193 |
|
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
194 |
def __init__(self, ctl, key='unexpected error', msg=None): |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
195 |
self.ctl = ctl |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
196 |
self.key = key |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
197 |
self.msg = msg |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
198 |
|
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
199 |
def __enter__(self): |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
200 |
return self |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
201 |
|
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
202 |
def __exit__(self, type, value, traceback): |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
203 |
if type is not None: |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
204 |
if issubclass(type, (KeyboardInterrupt, SystemExit)): |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
205 |
return # re-raise |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
206 |
if self.ctl.catcherrors: |
4173
cfd5d3270f99
msg isn't defined there, but we've to give traceback information to record error
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4152
diff
changeset
|
207 |
self.ctl.record_error(self.key, None, type, value, traceback) |
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
208 |
return True # silent |
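The context manager above can be tried outside CubicWeb with a stand-in controller. What follows is a minimal sketch in Python 3 syntax, not the module's own test code: `FakeController` is hypothetical and only mimics the two members `catch_error` relies on (`catcherrors` and `record_error`), and the class body restates the listing rather than importing it.

```python
class FakeController(object):
    """Hypothetical stand-in exposing only what catch_error uses."""

    def __init__(self, catcherrors=True):
        self.catcherrors = catcherrors
        self.errors = []

    def record_error(self, key, msg=None, type=None, value=None, tb=None):
        # Keep the key and the exception class name for later inspection.
        self.errors.append((key, type.__name__ if type else None))


class catch_error(object):
    """Restatement of the context manager from the listing."""

    def __init__(self, ctl, key='unexpected error', msg=None):
        self.ctl = ctl
        self.key = key
        self.msg = msg

    def __enter__(self):
        return self

    def __exit__(self, type, value, traceback):
        if type is not None:
            if issubclass(type, (KeyboardInterrupt, SystemExit)):
                return  # falsy return value: the exception propagates
            if self.ctl.catcherrors:
                self.ctl.record_error(self.key, None, type, value, traceback)
                return True  # truthy return value: the exception is swallowed


ctl = FakeController(catcherrors=True)
with catch_error(ctl, key='row 12'):
    int('not a number')  # ValueError is recorded on ctl, not raised
```

With `catcherrors` set to a false value, `__exit__` falls through and returns None, so the original exception reaches the caller unchanged.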
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
209 |
|
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
210 |
|
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
211 |
# base sanitizing/coercing functions ########################################### |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
212 |
|
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
213 |
def optional(value): |
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
214 |
"""checker to filter optional field |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
215 |
|
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
216 |
If value is undefined (e.g. an empty string), return None, which will |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
217 |
break the checkers validation chain |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
218 |
|
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
219 |
Typically, 'optional' is placed first in the checker chain to avoid a |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
220 |
ValueError raised by later checkers |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
221 |
|
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
222 |
>>> MAPPER = [(u'value', 'value', (optional, int))] |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
223 |
>>> row = {'value': u''} |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
224 |
>>> mk_entity(row, MAPPER) |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
225 |
{'value': None} |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
226 |
>>> row = {'value': u'100'} |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
227 |
>>> mk_entity(row, MAPPER) |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
228 |
{'value': 100} |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
229 |
""" |
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
230 |
if value: |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
231 |
return value |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
232 |
return None |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
233 |
|
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
234 |
def required(value): |
6136
79da6f969b15
[dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
6133
diff
changeset
|
235 |
"""raise ValueError if value is empty |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
236 |
|
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
237 |
This check is usually found in last position in the chain. |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
238 |
""" |
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
239 |
if value: |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
240 |
return value |
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
241 |
raise ValueError("required") |
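To see how `optional` and `required` behave inside a checker chain, here is a simplified sketch in Python 3 syntax. `apply_checkers` is a hypothetical helper written for illustration only (the module's own `mk_entity` is not shown in this part of the listing); it stops the chain as soon as a checker returns None, which is the behaviour the `optional` docstring describes.

```python
def optional(value):
    """Return the value if it is non-empty, else None (stops the chain)."""
    if value:
        return value
    return None


def required(value):
    """Return the value if it is non-empty, else raise ValueError."""
    if value:
        return value
    raise ValueError("required")


def apply_checkers(value, checkers):
    """Hypothetical helper: run checkers in order, stop on None."""
    for checker in checkers:
        value = checker(value)
        if value is None:
            break
    return value


empty_ok = apply_checkers(u'', (optional, int))    # int() is never called
parsed = apply_checkers(u'100', (optional, int))   # optional passes, int converts

try:
    apply_checkers(u'', (required,))
except ValueError as exc:
    error = str(exc)
```

Placing `optional` first means an empty field short-circuits before `int` can fail on it, whereas `required` turns the same empty field into an explicit error.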
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
242 |
|
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
243 |
def todatetime(format='%d/%m/%Y'): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
244 |
"""return a transformation function to turn string input value into a |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
245 |
`datetime.datetime` instance, using given format. |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
246 |
|
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
247 |
Follow it by `todate` or `totime` functions from `logilab.common.date` if |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
248 |
you want a `date`/`time` instance instead of `datetime`. |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
249 |
""" |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
250 |
def coerce(value): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
251 |
return strptime(value, format) |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
252 |
return coerce |
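A standalone sketch of the factory pattern used by `todatetime`. The listing calls a `strptime` function whose import lies outside this part of the file; the version below assumes the standard library's `datetime.datetime.strptime` instead, which may differ from what the module actually imports.

```python
from datetime import datetime


def todatetime(format='%d/%m/%Y'):
    """Return a function converting a string to a datetime using `format`."""
    def coerce(value):
        # Assumption: standard-library parsing stands in for the module's strptime.
        return datetime.strptime(value, format)
    return coerce


to_dt = todatetime()                 # default day/month/year format
stamp = to_dt('25/10/2010')

# A different format, followed by .date() to drop the time part.
iso_day = todatetime('%Y-%m-%d')('2010-10-25').date()
```

Because the format is captured in a closure, the returned `coerce` takes a single argument and can sit in a checker tuple next to `optional` or `required`.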
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
253 |
|
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
254 |
def call_transform_method(methodname, *args, **kwargs): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
255 |
"""return value returned by calling the given method on input""" |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
256 |
def coerce(value): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
257 |
return getattr(value, methodname)(*args, **kwargs) |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
258 |
return coerce |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
259 |
|
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
260 |
def call_check_method(methodname, *args, **kwargs): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
261 |
"""check value returned by calling the given method on input is true, |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
262 |
else raise ValueError |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
263 |
""" |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
264 |
def check(value): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
265 |
if getattr(value, methodname)(*args, **kwargs): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
266 |
return value |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
267 |
raise ValueError('%s not verified on %r' % (methodname, value)) |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
268 |
return check |
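The two factories above can be combined into small pipelines. This is an illustrative sketch in Python 3 syntax that restates both functions from the listing; the method names `'strip'` and `'isdigit'` are ordinary `str` methods chosen for the example, not something the module prescribes.

```python
def call_transform_method(methodname, *args, **kwargs):
    """Return a function calling `methodname` on its input and returning the result."""
    def coerce(value):
        return getattr(value, methodname)(*args, **kwargs)
    return coerce


def call_check_method(methodname, *args, **kwargs):
    """Return a function that validates its input by calling `methodname` on it."""
    def check(value):
        if getattr(value, methodname)(*args, **kwargs):
            return value
        raise ValueError('%s not verified on %r' % (methodname, value))
    return check


strip = call_transform_method('strip')
is_digits = call_check_method('isdigit')

cleaned = is_digits(strip('  42 '))   # strip first, then validate

try:
    is_digits('4x2')
except ValueError as exc:
    message = str(exc)
```

The transform variant replaces the value with the method's result, while the check variant keeps the original value and only uses the result as a pass/fail test.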
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
269 |
|
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
270 |
# base integrity checking functions ############################################ |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
271 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
272 |
def check_doubles(buckets): |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
273 |
"""Extract the keys that have more than one item in their bucket.""" |
4721
8f63691ccb7f
pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4613
diff
changeset
|
274 |
return [(k, len(v)) for k, v in buckets.items() if len(v) > 1] |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
275 |
|
4136
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
276 |
def check_doubles_not_none(buckets): |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
277 |
"""Extract the keys that have more than one item in their bucket.""" |
4721
8f63691ccb7f
pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4613
diff
changeset
|
278 |
return [(k, len(v)) for k, v in buckets.items() |
8f63691ccb7f
pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4613
diff
changeset
|
279 |
if k is not None and len(v) > 1] |
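A small worked example of the two integrity checks, in Python 3 syntax. The `buckets` dictionary is made-up data for illustration: it maps a key (here a login) to the list of items sharing that key, with None standing for rows that have no key at all.

```python
def check_doubles(buckets):
    """Extract the keys that have more than one item in their bucket."""
    return [(k, len(v)) for k, v in buckets.items() if len(v) > 1]


def check_doubles_not_none(buckets):
    """Same as check_doubles, but the None key is ignored."""
    return [(k, len(v)) for k, v in buckets.items()
            if k is not None and len(v) > 1]


# Hypothetical index: login -> eids of the rows carrying that login.
buckets = {
    'alice': [1],
    'bob': [2, 5],
    None: [3, 4, 6],   # rows with no login at all
}

doubles = check_doubles(buckets)
doubles_not_none = check_doubles_not_none(buckets)
```

Here `check_doubles` reports both `'bob'` and the None key, while `check_doubles_not_none` reports only `'bob'`, which is useful when a missing key should not count as a duplicate.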
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
280 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
281 |
|
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
282 |
# object stores ################################################################# |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
283 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
284 |
class ObjectStore(object): |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
285 |
"""Store objects in memory for *faster* validation (development mode) |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
286 |
|
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
287 |
However, it does not enforce the constraints of the schema and will therefore miss some problems. |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
288 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
289 |
>>> store = ObjectStore() |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
290 |
>>> user = {'login': 'johndoe'} |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
291 |
>>> store.add('CWUser', user) |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
292 |
>>> group = {'name': 'unknown'} |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
293 |
>>> store.add('CWGroup', group) |
3003
2944ee420dca
R [dataimport] rename uid to eid
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
2974
diff
changeset
|
294 |
>>> store.relate(user['eid'], 'in_group', group['eid']) |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
295 |
""" |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
296 |
def __init__(self): |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
297 |
self.items = [] |
3003
2944ee420dca
R [dataimport] rename uid to eid
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
2974
diff
changeset
|
298 |
self.eids = {} |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
299 |
self.types = {} |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
300 |
self.relations = set() |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
301 |
self.indexes = {} |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
302 |
self._rql = None |
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
303 |
self._commit = None |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
304 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
305 |
def _put(self, type, item): |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
306 |
self.items.append(item) |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
307 |
return len(self.items) - 1 |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
308 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
309 |
def add(self, type, item): |
3486
ea6bf6f9ba0c
[cwctl] improve dialog messages
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
3318
diff
changeset
|
310 |
assert isinstance(item, dict), 'item is not a dict but a %s' % item.__class__.__name__ |
3003
2944ee420dca
R [dataimport] rename uid to eid
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
2974
diff
changeset
|
311 |
eid = item['eid'] = self._put(type, item) |
2944ee420dca
R [dataimport] rename uid to eid
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
2974
diff
changeset
|
312 |
self.eids[eid] = item |
2944ee420dca
R [dataimport] rename uid to eid
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
2974
diff
changeset
|
313 |
self.types.setdefault(type, []).append(eid) |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
314 |
|
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
315 |
def relate(self, eid_from, rtype, eid_to, inlined=False): |
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
316 |
"""Add new relation""" |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
317 |
relation = eid_from, rtype, eid_to |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
318 |
self.relations.add(relation) |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
319 |
return relation |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
320 |
|
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
321 |
def commit(self): |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
322 |
"""this commit method do nothing by default |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
323 |
|
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
324 |
This is intentional, so that the frequent autocommit feature of CubicWeb can be used |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
325 |
when you are using hooks or another |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
326 |
|
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
327 |
If you want to override the commit method, please set it through the |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
328 |
constructor |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
329 |
""" |
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
330 |
pass |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
331 |
|
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
332 |
def rql(self, *args): |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
333 |
if self._rql is not None: |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
334 |
return self._rql(*args) |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
335 |
|
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
336 |
@property |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
337 |
def nb_inserted_entities(self): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
338 |
return len(self.eids) |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
339 |
@property |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
340 |
def nb_inserted_types(self): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
341 |
return len(self.types) |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
342 |
@property |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
343 |
def nb_inserted_relations(self): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
344 |
return len(self.relations) |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
345 |
|
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
346 |
@deprecated("[3.7] index support will disappear") |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
347 |
def build_index(self, name, type, func=None, can_be_empty=False): |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
348 |
"""build internal index for further search""" |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
349 |
index = {} |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
350 |
if func is None or not callable(func): |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
351 |
func = lambda x: x['eid'] |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
352 |
for eid in self.types[type]: |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
353 |
index.setdefault(func(self.eids[eid]), []).append(eid) |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
354 |
if not can_be_empty: |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
355 |
assert index, "new index '%s' cannot be empty" % name |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
356 |
self.indexes[name] = index |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
357 |
|
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
358 |
@deprecated("[3.7] index support will disappear") |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
359 |
def build_rqlindex(self, name, type, key, rql, rql_params=False, |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
360 |
func=None, can_be_empty=False): |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
361 |
"""build an index by rql query |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
362 |
|
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
363 |
the rql query should return the eid in its first column, e.g.: |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
364 |
ctl.store.build_rqlindex('index_name', 'users', 'login', 'Any U WHERE U is CWUser') |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
365 |
""" |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
366 |
self.types[type] = [] |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
367 |
rset = self.rql(rql, rql_params or {}) |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
368 |
if not can_be_empty: |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
369 |
assert rset, "new index type '%s' cannot be empty (0 record found)" % type |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
370 |
for entity in rset.entities(): |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
371 |
getattr(entity, key) # autopopulate entity with key attribute |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
372 |
self.eids[entity.eid] = dict(entity) |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
373 |
if entity.eid not in self.types[type]: |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
374 |
self.types[type].append(entity.eid) |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
375 |
|
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
376 |
# Build index with specified key |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
377 |
func = lambda x: x[key] |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
378 |
self.build_index(name, type, func, can_be_empty=can_be_empty) |
4818
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
379 |
|
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
380 |
@deprecated("[3.7] index support will disappear") |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
381 |
def fetch(self, name, key, unique=False, decorator=None): |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
382 |
"""index fetcher method |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
383 |
|
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
384 |
decorator is a callable or an iterable of callables (usually lambda functions) |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
385 |
decorator=lambda x: x[:1] (first value is returned) |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
386 |
decorator=lambda x: x.lower() (lowercased value is returned) |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
387 |
|
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
388 |
decorator is handy when you want to improve index keys without |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
389 |
changing the original field |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
390 |
|
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
391 |
Same check functions can be reused here. |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
392 |
""" |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
393 |
eids = self.indexes[name].get(key, []) |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
394 |
if decorator is not None: |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
395 |
if not hasattr(decorator, '__iter__'): |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
396 |
decorator = (decorator,) |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
397 |
for f in decorator: |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
398 |
eids = f(eids) |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
399 |
if unique: |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
400 |
assert len(eids) == 1, u'expected a single value for key "%s" in index "%s". Got %i' % (key, name, len(eids)) |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
401 |
eids = eids[0] |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
402 |
return eids |
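The decorator handling in `fetch` can be tried in isolation. What follows is a minimal sketch, assuming the index is a plain dict of dicts; `fetch_from_index` and the sample data are hypothetical and are not part of the module.

```python
def fetch_from_index(indexes, name, key, unique=False, decorator=None):
    """Look up `key` in index `name`, then post-process the matches."""
    values = indexes[name].get(key, [])
    if decorator is not None:
        # accept either a single callable or an iterable of callables
        if not hasattr(decorator, '__iter__'):
            decorator = (decorator,)
        for func in decorator:
            values = func(values)
    if unique:
        assert len(values) == 1, 'expected a single value, got %i' % len(values)
        values = values[0]
    return values


# hypothetical index content: login -> list of eids
indexes = {'users': {'admin': [12, 57], 'guest': [31]}}
first_admin = fetch_from_index(indexes, 'users', 'admin', unique=True,
                               decorator=lambda eids: eids[:1])
```

Each decorator receives the whole list of matches, so `lambda eids: eids[:1]` trims the list before the `unique` check runs.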
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
403 |
|
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
404 |
@deprecated("[3.7] index support will disappear") |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
405 |
def find(self, type, key, value): |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
406 |
for idx in self.types[type]: |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
407 |
item = self.items[idx] |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
408 |
if item[key] == value: |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
409 |
yield item |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
410 |
|
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
411 |
@deprecated("[3.7] checkpoint() deprecated. use commit() instead") |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
412 |
def checkpoint(self): |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
413 |
self.commit() |
4818
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
414 |
|
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
415 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
416 |
class RQLObjectStore(ObjectStore): |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
417 |
"""ObjectStore that works with an actual RQL repository (production mode)""" |
4136
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
418 |
_rql = None # bw compat |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
419 |
|
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
420 |
def __init__(self, session=None, commit=None): |
4136
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
421 |
ObjectStore.__init__(self) |
6122
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
422 |
if session is None: |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
423 |
sys.exit('please provide a session or run this script with cubicweb-ctl shell and pass cnx as session') |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
424 |
if not hasattr(session, 'set_pool'): |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
425 |
# connection |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
426 |
cnx = session |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
427 |
session = session.request() |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
428 |
session.set_pool = lambda : None |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
429 |
commit = commit or cnx.commit |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
430 |
else: |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
431 |
session.set_pool() |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
432 |
self.session = session |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
433 |
self._commit = commit or session.commit |
4818
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
434 |
|
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
435 |
@deprecated("[3.7] checkpoint() deprecated. use commit() instead") |
4818
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
436 |
def checkpoint(self): |
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
437 |
self.commit() |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
438 |
|
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
439 |
def commit(self): |
5063
2a94b61837e1
[dataimport] stop disabling undo ; commit return transaction id
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5054
diff
changeset
|
440 |
txuuid = self._commit() |
4818
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
441 |
self.session.set_pool() |
5063
2a94b61837e1
[dataimport] stop disabling undo ; commit return transaction id
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5054
diff
changeset
|
442 |
return txuuid |
4136
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
443 |
|
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
444 |
def rql(self, *args): |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
445 |
if self._rql is not None: |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
446 |
return self._rql(*args) |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
447 |
return self.session.execute(*args) |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
448 |
|
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
449 |
def create_entity(self, *args, **kwargs): |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
450 |
entity = self.session.create_entity(*args, **kwargs) |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
451 |
self.eids[entity.eid] = entity |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
452 |
self.types.setdefault(args[0], []).append(entity.eid) |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
453 |
return entity |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
454 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
455 |
def _put(self, type, item): |
4721
8f63691ccb7f
pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4613
diff
changeset
|
456 |
query = ('INSERT %s X: ' % type) + ', '.join('X %s %%(%s)s' % (k, k) |
4734 | 457 |
for k in item) |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
458 |
return self.rql(query, item)[0][0] |
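To see what `_put` sends to the repository, the query construction can be reproduced without a session. This is a sketch only: `build_insert_query` is a hypothetical helper, and sorting the attribute names is an addition made here to get a deterministic string (the original iterates over the dict directly).

```python
def build_insert_query(etype, item):
    """Return the RQL INSERT query for an entity type and an attribute dict."""
    # one 'X attr %(attr)s' restriction per attribute; values stay in `item`
    # and are passed separately as query parameters
    restrictions = ', '.join('X %s %%(%s)s' % (attr, attr)
                             for attr in sorted(item))
    return 'INSERT %s X: %s' % (etype, restrictions)


query = build_insert_query('CWUser', {'login': u'alice', 'upassword': u'secret'})
```

The values themselves never appear in the query text; only `%(name)s` placeholders do, which is why `_put` hands `item` to `self.rql` as the parameter dict.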
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
459 |
|
4818
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
460 |
def relate(self, eid_from, rtype, eid_to, inlined=False): |
4721
8f63691ccb7f
pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4613
diff
changeset
|
461 |
eid_from, rtype, eid_to = super(RQLObjectStore, self).relate( |
8f63691ccb7f
pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4613
diff
changeset
|
462 |
eid_from, rtype, eid_to) |
4136
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
463 |
self.rql('SET X %s Y WHERE X eid %%(x)s, Y eid %%(y)s' % rtype, |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
464 |
{'x': int(eid_from), 'y': int(eid_to)}, ('x', 'y')) |
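The relation query built by `relate` follows the same pattern: only the relation type is interpolated into the RQL text, while both eids travel as integer parameters. A small sketch, with `build_relate_query` and the sample eids as hypothetical names and values:

```python
def build_relate_query(rtype, eid_from, eid_to):
    """Return (query, params) for relating two existing entities."""
    query = 'SET X %s Y WHERE X eid %%(x)s, Y eid %%(y)s' % rtype
    # eids are coerced to int, as in the original call
    params = {'x': int(eid_from), 'y': int(eid_to)}
    return query, params


query, params = build_relate_query('in_group', '12', 34)
```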
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
465 |
|
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
466 |
|
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
467 |
# the import controller ######################################################## |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
468 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
469 |
class CWImportController(object): |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
470 |
"""Controller of the data import process. |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
471 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
472 |
>>> ctl = CWImportController(store) |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
473 |
>>> ctl.generators = list_of_data_generators |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
474 |
>>> ctl.data = dict_of_data_tables |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
475 |
>>> ctl.run() |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
476 |
""" |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
477 |
|
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
478 |
def __init__(self, store, askerror=0, catcherrors=None, tell=tell, |
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
479 |
commitevery=50): |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
480 |
self.store = store |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
481 |
self.generators = None |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
482 |
self.data = {} |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
483 |
self.errors = None |
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
484 |
self.askerror = askerror |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
485 |
if catcherrors is None: |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
486 |
catcherrors = askerror |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
487 |
self.catcherrors = catcherrors |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
488 |
self.commitevery = commitevery # set to None to do a single commit |
3029
bc573d5fb5b7
F [devtools] by default dataimport prints message on stdout
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
3003
diff
changeset
|
489 |
self._tell = tell |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
490 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
491 |
def check(self, type, key, value): |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
492 |
self._checks.setdefault(type, {}).setdefault(key, []).append(value) |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
493 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
494 |
def check_map(self, entity, key, map, default): |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
495 |
try: |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
496 |
entity[key] = map[entity[key]] |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
497 |
except KeyError: |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
498 |
self.check(key, entity[key], None) |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
499 |
entity[key] = default |
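The behaviour of `check_map` can be illustrated with plain dicts. This sketch simplifies the bookkeeping: instead of calling `self.check`, unmapped values are appended to an `unknown` list, which is an assumption made for the example.

```python
def check_map(entity, key, mapping, default, unknown):
    """Translate entity[key] through `mapping`, falling back to `default`."""
    try:
        entity[key] = mapping[entity[key]]
    except KeyError:
        # remember the unmapped value so it can be reported later
        unknown.append(entity[key])
        entity[key] = default


countries = {'FR': 'France'}
unknown = []
known_row = {'country': 'FR'}
other_row = {'country': 'XX'}
check_map(known_row, 'country', countries, 'Unknown', unknown)
check_map(other_row, 'country', countries, 'Unknown', unknown)
```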
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
500 |
|
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
501 |
def record_error(self, key, msg=None, type=None, value=None, tb=None): |
4186
ca7e526b07b6
import cleanup, check data file exists
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4173
diff
changeset
|
502 |
tmp = StringIO() |
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
503 |
if type is None: |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
504 |
traceback.print_exc(file=tmp) |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
505 |
else: |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
506 |
traceback.print_exception(type, value, tb, file=tmp) |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
507 |
print tmp.getvalue() |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
508 |
# use a list so that a multi-line traceback counts as one error, not <nb lines> errors |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
509 |
errorlog = self.errors.setdefault(key, []) |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
510 |
if msg is None: |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
511 |
errorlog.append(tmp.getvalue().splitlines()) |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
512 |
else: |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
513 |
errorlog.append( (msg, tmp.getvalue().splitlines()) ) |
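The way `record_error` stores a traceback as a single log entry can be sketched outside the controller. The version below is written for Python 3 (`io.StringIO`), takes the `errors` dict as an argument, and only covers the "current exception" branch; all of that is an adaptation for the example.

```python
import io
import traceback


def record_error(errors, key, msg=None):
    """Store the current exception's traceback as ONE entry under `key`."""
    buf = io.StringIO()
    traceback.print_exc(file=buf)
    errorlog = errors.setdefault(key, [])
    lines = buf.getvalue().splitlines()
    # the traceback lines are kept together in one list element, so that
    # len(errorlog) is the number of errors, not the number of lines
    if msg is None:
        errorlog.append(lines)
    else:
        errorlog.append((msg, lines))


errors = {}
try:
    int('not a number')
except ValueError:
    record_error(errors, 'parse', 'While parsing a row')
```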
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
514 |
|
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
515 |
def run(self): |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
516 |
self.errors = {} |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
517 |
for func, checks in self.generators: |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
518 |
self._checks = {} |
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
519 |
func_name = func.__name__ |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
520 |
self.tell("Run import function '%s'..." % func_name) |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
521 |
try: |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
522 |
func(self) |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
523 |
except: |
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
524 |
if self.catcherrors: |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
525 |
self.record_error(func_name, 'While calling %s' % func.__name__) |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
526 |
else: |
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
527 |
self._print_stats() |
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
528 |
raise |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
529 |
for key, func, title, help in checks: |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
530 |
buckets = self._checks.get(key) |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
531 |
if buckets: |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
532 |
err = func(buckets) |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
533 |
if err: |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
534 |
self.errors[title] = (help, err) |
5097
60a237638f57
[dataimport] print transaction id when we get one
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5066
diff
changeset
|
535 |
txuuid = self.store.commit() |
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
536 |
self._print_stats() |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
537 |
if self.errors: |
4721
8f63691ccb7f
pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4613
diff
changeset
|
538 |
if self.askerror == 2 or (self.askerror and confirm('Display errors ?')): |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
539 |
from pprint import pformat |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
540 |
for errkey, error in self.errors.items(): |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
541 |
self.tell("\n%s (%s): %d\n" % (error[0], errkey, len(error[1]))) |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
542 |
self.tell(pformat(sorted(error[1]))) |
5097
60a237638f57
[dataimport] print transaction id when we get one
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5066
diff
changeset
|
543 |
if txuuid is not None: |
60a237638f57
[dataimport] print transaction id when we get one
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5066
diff
changeset
|
544 |
print 'transaction id:', txuuid |
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
545 |
def _print_stats(self): |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
546 |
nberrors = sum(len(err[1]) for err in self.errors.values()) |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
547 |
self.tell('\nImport statistics: %i entities, %i types, %i relations and %i errors' |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
548 |
% (self.store.nb_inserted_entities, |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
549 |
self.store.nb_inserted_types, |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
550 |
self.store.nb_inserted_relations, |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
551 |
nberrors)) |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
552 |
|
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
553 |
def get_data(self, key): |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
554 |
return self.data.get(key) |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
555 |
|
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
556 |
def index(self, name, key, value, unique=False): |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
557 |
"""create a new index |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
558 |
|
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
559 |
If unique is True, a value already indexed under this key is not added again |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
560 |
""" |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
561 |
if unique: |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
562 |
try: |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
563 |
if value in self.store.indexes[name][key]: |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
564 |
return |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
565 |
except KeyError: |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
566 |
# this is the first occurrence, so fall through and index it |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
567 |
pass |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
568 |
self.store.indexes.setdefault(name, {}).setdefault(key, []).append(value) |
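A minimal standalone sketch of the indexing logic above, assuming a plain dict stands in for `self.store.indexes`. The free function `index` and the sample names are illustrative only and are not part of the module.

```python
def index(indexes, name, key, value, unique=False):
    """Append value under indexes[name][key].

    With unique=True, a value already recorded under that key is skipped.
    """
    values = indexes.setdefault(name, {}).setdefault(key, [])
    if unique and value in values:
        return
    values.append(value)


indexes = {}
index(indexes, 'person', 'smith', 1, unique=True)
index(indexes, 'person', 'smith', 1, unique=True)  # duplicate, ignored
index(indexes, 'person', 'smith', 2)
```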
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
569 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
570 |
def tell(self, msg): |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
571 |
self._tell(msg) |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
572 |
|
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
573 |
def iter_and_commit(self, datakey): |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
574 |
"""iterate over rows, triggering a commit every self.commitevery iterations""" |
6136
79da6f969b15
[dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
6133
diff
changeset
|
575 |
if self.commitevery is None: |
79da6f969b15
[dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
6133
diff
changeset
|
576 |
return self.get_data(datakey) |
79da6f969b15
[dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
6133
diff
changeset
|
577 |
else: |
6169
55378e1bab1b
fix order of parameters in call to callfunc_every
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
6136
diff
changeset
|
578 |
return callfunc_every(self.store.commit, |
55378e1bab1b
fix order of parameters in call to callfunc_every
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
6136
diff
changeset
|
579 |
self.commitevery, |
55378e1bab1b
fix order of parameters in call to callfunc_every
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
6136
diff
changeset
|
580 |
self.get_data(datakey)) |
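`callfunc_every` is defined elsewhere in this module and is not shown in this part of the file. The sketch below is an assumed stand-in, named `call_every` to avoid confusion, that illustrates the idea of yielding rows while calling a function every `number` items and once more at the end.

```python
def call_every(func, number, iterable):
    """Yield items one by one, calling func() after every `number` items.

    func() is called once more when the iterable is exhausted, so a final
    partial batch is not left uncommitted.
    """
    for count, item in enumerate(iterable, 1):
        yield item
        if count % number == 0:
            func()
    func()


commits = []
rows = list(call_every(lambda: commits.append('commit'), 2, range(5)))
```

Because this is a generator, the function only runs as the caller consumes rows; nothing is committed until iteration actually happens.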
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
581 |
|
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
582 |
|
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
583 |
|
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
584 |
from datetime import datetime |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
585 |
from cubicweb.schema import META_RTYPES, VIRTUAL_RTYPES |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
586 |
|
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
587 |
|
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
588 |
class NoHookRQLObjectStore(RQLObjectStore): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
589 |
"""ObjectStore that writes directly to the system source of an actual RQL repository, with security checks deactivated (production mode)""" |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
590 |
_rql = None # kept for backward compatibility |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
591 |
|
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
592 |
def __init__(self, session, metagen=None, baseurl=None): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
593 |
super(NoHookRQLObjectStore, self).__init__(session) |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
594 |
self.source = session.repo.system_source |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
595 |
self.rschema = session.repo.schema.rschema |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
596 |
self.add_relation = self.source.add_relation |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
597 |
if metagen is None: |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
598 |
metagen = MetaGenerator(session, baseurl) |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
599 |
self.metagen = metagen |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
600 |
self._nb_inserted_entities = 0 |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
601 |
self._nb_inserted_types = 0 |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
602 |
self._nb_inserted_relations = 0 |
5054
cb066d29166a
fix dataimport for 3.7.2
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4913
diff
changeset
|
603 |
self.rql = session.execute |
cb066d29166a
fix dataimport for 3.7.2
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4913
diff
changeset
|
604 |
# deactivate security |
cb066d29166a
fix dataimport for 3.7.2
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4913
diff
changeset
|
605 |
session.set_read_security(False) |
cb066d29166a
fix dataimport for 3.7.2
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4913
diff
changeset
|
606 |
session.set_write_security(False) |
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
607 |
|
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
608 |
def create_entity(self, etype, **kwargs): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
609 |
for k, v in kwargs.iteritems(): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
610 |
kwargs[k] = getattr(v, 'eid', v) |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
611 |
entity, rels = self.metagen.base_etype_dicts(etype) |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
612 |
entity = copy(entity) |
5557
1a534c596bff
[entity] continue cleanup of Entity/AnyEntity namespace
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5424
diff
changeset
|
613 |
entity.cw_clear_relation_cache() |
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
614 |
self.metagen.init_entity(entity) |
6142
8bc6eac1fac1
[session] cleanup hook / operation / entity edition api
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
6122
diff
changeset
|
615 |
entity.cw_edited.update(kwargs, skipsec=False) |
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
616 |
session = self.session |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
617 |
self.source.add_entity(session, entity) |
5054
cb066d29166a
fix dataimport for 3.7.2
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4913
diff
changeset
|
618 |
self.source.add_info(session, entity, self.source, None, complete=False) |
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
619 |
for rtype, targeteids in rels.iteritems(): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
620 |
# targeteids may be a single eid or a list of eids |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
621 |
inlined = self.rschema(rtype).inlined |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
622 |
try: |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
623 |
for targeteid in targeteids: |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
624 |
self.add_relation(session, entity.eid, rtype, targeteid, |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
625 |
inlined) |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
626 |
except TypeError: |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
627 |
self.add_relation(session, entity.eid, rtype, targeteids, |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
628 |
inlined) |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
629 |
self._nb_inserted_entities += 1 |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
630 |
return entity |
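The relation loop in `create_entity` accepts either a single eid or a list of eids by catching `TypeError`. A minimal sketch of the same idea, using a hypothetical helper `as_eid_list` that is not part of the module:

```python
def as_eid_list(targeteids):
    """Return targeteids as a list, wrapping a single eid if needed."""
    try:
        iterator = iter(targeteids)
    except TypeError:
        # not iterable: treat it as a single eid
        return [targeteids]
    return list(iterator)


single = as_eid_list(12)
several = as_eid_list((12, 13))
```

One difference from wrapping the whole loop in try/except: only the `iter()` call is guarded here, so a `TypeError` raised while processing a target is not mistaken for "not iterable".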
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
631 |
|
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
632 |
def relate(self, eid_from, rtype, eid_to): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
633 |
assert not rtype.startswith('reverse_') |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
634 |
self.add_relation(self.session, eid_from, rtype, eid_to, |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
635 |
self.rschema(rtype).inlined) |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
636 |
self._nb_inserted_relations += 1 |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
637 |
|
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
638 |
@property |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
639 |
def nb_inserted_entities(self): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
640 |
return self._nb_inserted_entities |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
641 |
@property |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
642 |
def nb_inserted_types(self): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
643 |
return self._nb_inserted_types |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
644 |
@property |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
645 |
def nb_inserted_relations(self): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
646 |
return self._nb_inserted_relations |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
647 |
|
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
648 |
def _put(self, type, item): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
649 |
raise RuntimeError('use create_entity instead') |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
650 |
|
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
651 |
|
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
652 |
class MetaGenerator(object): |
6427
c8a5ac2d1eaa
[schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
6173
diff
changeset
|
653 |
META_RELATIONS = (META_RTYPES |
c8a5ac2d1eaa
[schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
6173
diff
changeset
|
654 |
- VIRTUAL_RTYPES |
c8a5ac2d1eaa
[schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
6173
diff
changeset
|
655 |
- set(('eid', 'cwuri', |
c8a5ac2d1eaa
[schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
6173
diff
changeset
|
656 |
'is', 'is_instance_of', 'cw_source'))) |
c8a5ac2d1eaa
[schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
6173
diff
changeset
|
657 |
|
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
658 |
def __init__(self, session, baseurl=None): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
659 |
self.session = session |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
660 |
self.source = session.repo.system_source |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
661 |
self.time = datetime.now() |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
662 |
if baseurl is None: |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
663 |
config = session.vreg.config |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
664 |
baseurl = config['base-url'] or config.default_base_url() |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
665 |
if not baseurl.endswith('/'): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
666 |
baseurl += '/' |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
667 |
self.baseurl = baseurl |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
668 |
# attributes/relations shared by all entities of the same type |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
669 |
self.etype_attrs = [] |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
670 |
self.etype_rels = [] |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
671 |
# attributes/relations specific to each entity |
5054
cb066d29166a
fix dataimport for 3.7.2
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4913
diff
changeset
|
672 |
self.entity_attrs = ['cwuri'] |
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
673 |
#self.entity_rels = [] XXX not handled (YAGNI?) |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
674 |
schema = session.vreg.schema |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
675 |
rschema = schema.rschema |
6427
c8a5ac2d1eaa
[schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
6173
diff
changeset
|
676 |
for rtype in self.META_RELATIONS: |
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
677 |
if rschema(rtype).final: |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
678 |
self.etype_attrs.append(rtype) |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
679 |
else: |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
680 |
self.etype_rels.append(rtype) |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
681 |
|
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
682 |
@cached |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
683 |
def base_etype_dicts(self, etype): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
684 |
entity = self.session.vreg['etypes'].etype_class(etype)(self.session) |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
685 |
# entities are shallow-copied; avoid sharing this dict between copies |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
686 |
del entity.cw_extra_kwargs |
6142
8bc6eac1fac1
[session] cleanup hook / operation / entity edition api
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
6122
diff
changeset
|
687 |
entity.cw_edited = EditedEntity(entity) |
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
688 |
for attr in self.etype_attrs: |
6142
8bc6eac1fac1
[session] cleanup hook / operation / entity edition api
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
6122
diff
changeset
|
689 |
entity.cw_edited.attribute_edited(attr, self.generate(entity, attr)) |
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
690 |
rels = {} |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
691 |
for rel in self.etype_rels: |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
692 |
rels[rel] = self.generate(entity, rel) |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
693 |
return entity, rels |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
694 |
|
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
695 |
def init_entity(self, entity): |
5054
cb066d29166a
fix dataimport for 3.7.2
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4913
diff
changeset
|
696 |
entity.eid = self.source.create_eid(self.session) |
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
697 |
for attr in self.entity_attrs: |
6142
8bc6eac1fac1
[session] cleanup hook / operation / entity edition api
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
6122
diff
changeset
|
698 |
entity.cw_edited.attribute_edited(attr, self.generate(entity, attr)) |
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
699 |
|
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
700 |
def generate(self, entity, rtype): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
701 |
return getattr(self, 'gen_%s' % rtype)(entity) |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
702 |
|
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
703 |
def gen_cwuri(self, entity): |
5054
cb066d29166a
fix dataimport for 3.7.2
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4913
diff
changeset
|
704 |
return u'%seid/%s' % (self.baseurl, entity.eid) |
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
705 |
|
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
706 |
def gen_creation_date(self, entity): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
707 |
return self.time |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
708 |
def gen_modification_date(self, entity): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
709 |
return self.time |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
710 |
|
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
711 |
def gen_created_by(self, entity): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
712 |
return self.session.user.eid |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
713 |
def gen_owned_by(self, entity): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
714 |
return self.session.user.eid |
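`generate` dispatches on the relation type name through `getattr(self, 'gen_%s' % rtype)`, so supporting a new metadata attribute means adding a `gen_<rtype>` method. The sketch below shows that pattern in isolation. `TinyGenerator` and the dict-based entity are hypothetical stand-ins that need no session or repository, and they do not reproduce the real `MetaGenerator` API.

```python
from datetime import datetime


class TinyGenerator:
    """Hypothetical, session-free illustration of gen_<rtype> dispatch."""

    def __init__(self, baseurl):
        if not baseurl.endswith('/'):
            baseurl += '/'
        self.baseurl = baseurl
        # a single timestamp is shared by everything generated in this run
        self.time = datetime.now()

    def generate(self, entity, rtype):
        # look up the generator method from the relation type name
        return getattr(self, 'gen_%s' % rtype)(entity)

    def gen_cwuri(self, entity):
        return '%seid/%s' % (self.baseurl, entity['eid'])

    def gen_creation_date(self, entity):
        return self.time


gen = TinyGenerator('http://example.org')
uri = gen.generate({'eid': 42}, 'cwuri')
```

Asking for a relation type with no matching `gen_` method raises `AttributeError`, which is also what the `getattr` call in `generate` above would do.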