author | Sylvain Thénault <sylvain.thenault@logilab.fr> |
Wed, 02 Sep 2015 15:31:18 +0200 | |
changeset 10634 | 06a43f727601 |
parent 10295 | 080ac14df6fa |
child 10349 | efbbf1e93a04 |
child 10971 | de59a60a9e40 |
permissions | -rw-r--r-- |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
1 |
# -*- coding: utf-8 -*- |
10007
727bbb361ed1
remove 3.11 bw compat
Aurelien Campeas <aurelien.campeas@logilab.fr>
parents:
9911
diff
changeset
|
2 |
# copyright 2003-2014 LOGILAB S.A. (Paris, FRANCE), all rights reserved. |
5421
8167de96c523
proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5097
diff
changeset
|
3 |
# contact http://www.logilab.fr/ -- mailto:contact@logilab.fr |
8167de96c523
proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5097
diff
changeset
|
4 |
# |
8167de96c523
proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5097
diff
changeset
|
5 |
# This file is part of CubicWeb. |
8167de96c523
proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5097
diff
changeset
|
6 |
# |
8167de96c523
proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5097
diff
changeset
|
7 |
# CubicWeb is free software: you can redistribute it and/or modify it under the |
8167de96c523
proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5097
diff
changeset
|
8 |
# terms of the GNU Lesser General Public License as published by the Free |
8167de96c523
proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5097
diff
changeset
|
9 |
# Software Foundation, either version 2.1 of the License, or (at your option) |
8167de96c523
proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5097
diff
changeset
|
10 |
# any later version. |
8167de96c523
proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5097
diff
changeset
|
11 |
# |
5424
8ecbcbff9777
replace logilab-common by CubicWeb in disclaimer
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5421
diff
changeset
|
12 |
# CubicWeb is distributed in the hope that it will be useful, but WITHOUT |
5421
8167de96c523
proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5097
diff
changeset
|
13 |
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS |
8167de96c523
proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5097
diff
changeset
|
14 |
# FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more |
8167de96c523
proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5097
diff
changeset
|
15 |
# details. |
8167de96c523
proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5097
diff
changeset
|
16 |
# |
8167de96c523
proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5097
diff
changeset
|
17 |
# You should have received a copy of the GNU Lesser General Public License along |
8167de96c523
proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5097
diff
changeset
|
18 |
# with CubicWeb. If not, see <http://www.gnu.org/licenses/>. |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
19 |
"""This module provides tools to import tabular data. |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
20 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
21 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
22 |
Example of use (run this with `cubicweb-ctl shell instance import-script.py`): |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
23 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
24 |
.. sourcecode:: python |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
25 |
|
7158
0f31a50b144e
[dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7118
diff
changeset
|
26 |
from cubicweb.dataimport import * |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
27 |
# define data generators |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
28 |
GENERATORS = [] |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
29 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
30 |
USERS = [('Prenom', 'firstname', ()), |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
31 |
('Nom', 'surname', ()), |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
32 |
('Identifiant', 'login', ()), |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
33 |
] |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
34 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
35 |
def gen_users(ctl): |
6133
6f3eabbbdf2e
use iter_and_commit in example
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
6122
diff
changeset
|
36 |
for row in ctl.iter_and_commit('utilisateurs'): |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
37 |
entity = mk_entity(row, USERS) |
7158
0f31a50b144e
[dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7118
diff
changeset
|
38 |
entity['upassword'] = 'motdepasse' |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
39 |
ctl.check('login', entity['login'], None) |
7158
0f31a50b144e
[dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7118
diff
changeset
|
40 |
entity = ctl.store.create_entity('CWUser', **entity) |
0f31a50b144e
[dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7118
diff
changeset
|
41 |
email = ctl.store.create_entity('EmailAddress', address=row['email']) |
0f31a50b144e
[dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7118
diff
changeset
|
42 |
ctl.store.relate(entity.eid, 'use_email', email.eid) |
3003
2944ee420dca
R [dataimport] rename uid to eid
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
2974
diff
changeset
|
43 |
ctl.store.rql('SET U in_group G WHERE G name "users", U eid %(x)s', {'x': entity.eid}) |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
44 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
45 |
CHK = [('login', check_doubles, 'Utilisateurs Login', |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
46 |
'Deux utilisateurs ne devraient pas avoir le même login.'), |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
47 |
] |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
48 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
49 |
GENERATORS.append( (gen_users, CHK) ) |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
50 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
51 |
# create controller |
9906
b2919eca7514
[dataimport] remove _rql heresy
Julien Cristau <julien.cristau@logilab.fr>
parents:
9905
diff
changeset
|
52 |
ctl = CWImportController(RQLObjectStore(cnx)) |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
53 |
ctl.askerror = 1 |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
54 |
ctl.generators = GENERATORS |
6122
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
55 |
ctl.data['utilisateurs'] = lazytable(ucsvreader(open('users.csv'))) |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
56 |
# run |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
57 |
ctl.run() |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
58 |
|
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
59 |
.. BUG files with a single column are not parsable |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
60 |
.. TODO rollback() invocation is not possible yet |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
61 |
""" |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
62 |
__docformat__ = "restructuredtext en" |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
63 |
|
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
64 |
import csv |
4186
ca7e526b07b6
import cleanup, check data file exists
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4173
diff
changeset
|
65 |
import sys |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
66 |
import threading |
4186
ca7e526b07b6
import cleanup, check data file exists
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4173
diff
changeset
|
67 |
import traceback |
8926
336e4971dc50
[dataimport] backout 6947201033be (related to #2788402)
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
8900
diff
changeset
|
68 |
import warnings |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
69 |
import cPickle |
4186
ca7e526b07b6
import cleanup, check data file exists
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4173
diff
changeset
|
70 |
import os.path as osp |
8832
26cdfc6dd6f8
[dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
8807
diff
changeset
|
71 |
import inspect |
10269
d5e298df98d1
[dataimport] add missing import (closes #4985916)
Julien Cristau <julien.cristau@logilab.fr>
parents:
9827
diff
changeset
|
72 |
from base64 import b64encode |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
73 |
from collections import defaultdict |
4818
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
74 |
from copy import copy |
9901
161ec913aeec
[dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9900
diff
changeset
|
75 |
from datetime import date, datetime, time |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
76 |
from time import asctime |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
77 |
from StringIO import StringIO |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
78 |
|
7159
3bcccd3ab6b6
[dataimport] ObjectStore.create_entity should return something that looks like an entity (eg no more using dict protocol to access to attributes)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7158
diff
changeset
|
79 |
from logilab.common import shellutils, attrdict |
4818
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
80 |
from logilab.common.date import strptime |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
81 |
from logilab.common.decorators import cached |
4136
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
82 |
from logilab.common.deprecation import deprecated |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
83 |
|
7171
4297be67bbe4
[dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7170
diff
changeset
|
84 |
from cubicweb import QueryError |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
85 |
from cubicweb.utils import make_uid |
7158
0f31a50b144e
[dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7118
diff
changeset
|
86 |
from cubicweb.schema import META_RTYPES, VIRTUAL_RTYPES |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
87 |
from cubicweb.server.edition import EditedEntity |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
88 |
from cubicweb.server.sqlutils import SQL_PREFIX |
5066
bf5cbc351e99
[repo] move eschema_eid function from hooks.metadata to server.utils
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5063
diff
changeset
|
89 |
from cubicweb.server.utils import eschema_eid |
4818
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
90 |
|
7158
0f31a50b144e
[dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7118
diff
changeset
|
91 |
|
6122
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
92 |
def count_lines(stream_or_filename): |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
93 |
if isinstance(stream_or_filename, basestring): |
6492
47a284c0d012
fix some pylint detected errors
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
6427
diff
changeset
|
94 |
f = open(stream_or_filename) |
6122
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
95 |
else: |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
96 |
f = stream_or_filename |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
97 |
f.seek(0) |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
98 |
for i, line in enumerate(f): |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
99 |
pass |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
100 |
f.seek(0) |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
101 |
return i+1 |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
102 |
|
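As written, `count_lines` never binds `i` when the stream is empty, so `return i+1` would raise instead of returning 0. The block below is a minimal sketch of an empty-safe variant, not the module's actual code: the name `count_lines_safe` is hypothetical, and it assumes a seekable stream (it does not accept a filename).

```python
import io


def count_lines_safe(stream):
    """Count the lines of a seekable stream, then rewind it.

    Hypothetical helper for illustration; not part of cubicweb.dataimport.
    """
    stream.seek(0)
    # sum() over an empty stream yields 0, so no loop variable is left unbound
    count = sum(1 for _ in stream)
    stream.seek(0)
    return count


sample = io.StringIO(u"login,email\nalice,alice@example.org\n")
print(count_lines_safe(sample))
print(count_lines_safe(io.StringIO(u"")))
```

Rewinding after the count keeps the stream usable by a reader that is created afterwards, which is what `ucsvreader_pb` relies on.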
10091
09878c2f8621
[dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents:
10078
diff
changeset
|
103 |
def ucsvreader_pb(stream_or_path, encoding='utf-8', delimiter=',', quotechar='"', |
09878c2f8621
[dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents:
10078
diff
changeset
|
104 |
skipfirst=False, withpb=True, skip_empty=True, separator=None, |
09878c2f8621
[dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents:
10078
diff
changeset
|
105 |
quote=None): |
9181
2eac0aa1d3f6
[dataimport] ucsvreader should skip empty lines unless specified otherwise. Closes #3035944
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
8970
diff
changeset
|
106 |
"""same as :func:`ucsvreader` but a progress bar is displayed as we iter on rows""" |
10091
09878c2f8621
[dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents:
10078
diff
changeset
|
107 |
if separator is not None: |
09878c2f8621
[dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents:
10078
diff
changeset
|
108 |
delimiter = separator |
09878c2f8621
[dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents:
10078
diff
changeset
|
109 |
warnings.warn("[3.20] 'separator' kwarg is deprecated, use 'delimiter' instead") |
09878c2f8621
[dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents:
10078
diff
changeset
|
110 |
if quote is not None: |
09878c2f8621
[dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents:
10078
diff
changeset
|
111 |
quotechar = quote |
09878c2f8621
[dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents:
10078
diff
changeset
|
112 |
warnings.warn("[3.20] 'quote' kwarg is deprecated, use 'quotechar' instead") |
6122
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
113 |
if isinstance(stream_or_path, basestring): |
6492
47a284c0d012
fix some pylint detected errors
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
6427
diff
changeset
|
114 |
if not osp.exists(stream_or_path): |
47a284c0d012
fix some pylint detected errors
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
6427
diff
changeset
|
115 |
raise Exception("file doesn't exist: %s" % stream_or_path) |
6122
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
116 |
stream = open(stream_or_path) |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
117 |
else: |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
118 |
stream = stream_or_path |
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
119 |
rowcount = count_lines(stream) |
4136
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
120 |
if skipfirst: |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
121 |
rowcount -= 1 |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
122 |
if withpb: |
4140
46ddd27a4ca4
tweaks output
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4136
diff
changeset
|
123 |
pb = shellutils.ProgressBar(rowcount, 50) |
10091
09878c2f8621
[dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents:
10078
diff
changeset
|
124 |
for urow in ucsvreader(stream, encoding, delimiter, quotechar, |
9181
2eac0aa1d3f6
[dataimport] ucsvreader should skip empty lines unless specified otherwise. Closes #3035944
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
8970
diff
changeset
|
125 |
skipfirst=skipfirst, skip_empty=skip_empty): |
4136
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
126 |
yield urow |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
127 |
if withpb: |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
128 |
pb.update() |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
129 |
print ' %s rows imported' % rowcount |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
130 |
|
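The `separator`/`quote` handling above follows a common pattern for renaming keyword arguments: keep accepting the old name, warn, and map it onto the new one. A minimal standalone sketch of that pattern follows. The function name `resolve_delimiter` is hypothetical, and passing `DeprecationWarning` and `stacklevel=2` is an assumption of this sketch rather than what the module does.

```python
import warnings


def resolve_delimiter(delimiter=',', separator=None):
    """Return the effective delimiter, honouring the deprecated `separator`.

    Hypothetical helper for illustration; not part of cubicweb.dataimport.
    """
    if separator is not None:
        # stacklevel=2 attributes the warning to the caller, not to this helper
        warnings.warn("'separator' kwarg is deprecated, use 'delimiter' instead",
                      DeprecationWarning, stacklevel=2)
        delimiter = separator
    return delimiter


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(resolve_delimiter(separator=';'))
    print(len(caught))
```

Giving the warning an explicit category lets callers filter it or turn it into an error during a test run.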
10091
09878c2f8621
[dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents:
10078
diff
changeset
|
131 |
def ucsvreader(stream, encoding='utf-8', delimiter=',', quotechar='"', |
09878c2f8621
[dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents:
10078
diff
changeset
|
132 |
skipfirst=False, ignore_errors=False, skip_empty=True, |
09878c2f8621
[dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents:
10078
diff
changeset
|
133 |
separator=None, quote=None): |
4136
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
134 |
"""A csv reader that accepts files with any encoding and outputs unicode |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
135 |
strings |
9181
2eac0aa1d3f6
[dataimport] ucsvreader should skip empty lines unless specified otherwise. Closes #3035944
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
8970
diff
changeset
|
136 |
|
2eac0aa1d3f6
[dataimport] ucsvreader should skip empty lines unless specified otherwise. Closes #3035944
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
8970
diff
changeset
|
137 |
If `skip_empty` is true (the default), lines without any values specified (only |
2eac0aa1d3f6
[dataimport] ucsvreader should skip empty lines unless specified otherwise. Closes #3035944
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
8970
diff
changeset
|
138 |
separators) will be skipped. This is useful for Excel exports which may be |
2eac0aa1d3f6
[dataimport] ucsvreader should skip empty lines unless specified otherwise. Closes #3035944
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
8970
diff
changeset
|
139 |
full of such lines. |
4136
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
140 |
""" |
10091
09878c2f8621
[dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents:
10078
diff
changeset
|
141 |
if separator is not None: |
09878c2f8621
[dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents:
10078
diff
changeset
|
142 |
delimiter = separator |
09878c2f8621
[dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents:
10078
diff
changeset
|
143 |
warnings.warn("[3.20] 'separator' kwarg is deprecated, use 'delimiter' instead") |
09878c2f8621
[dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents:
10078
diff
changeset
|
144 |
if quote is not None: |
09878c2f8621
[dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents:
10078
diff
changeset
|
145 |
quotechar = quote |
09878c2f8621
[dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents:
10078
diff
changeset
|
146 |
warnings.warn("[3.20] 'quote' kwarg is deprecated, use 'quotechar' instead") |
09878c2f8621
[dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents:
10078
diff
changeset
|
147 |
it = iter(csv.reader(stream, delimiter=delimiter, quotechar=quotechar)) |
8637
e16561083d84
[dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
8631
diff
changeset
|
148 |
if not ignore_errors: |
e16561083d84
[dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
8631
diff
changeset
|
149 |
if skipfirst: |
e16561083d84
[dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
8631
diff
changeset
|
150 |
it.next() |
e16561083d84
[dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
8631
diff
changeset
|
151 |
for row in it: |
9181
2eac0aa1d3f6
[dataimport] ucsvreader should skip empty lines unless specified otherwise. Closes #3035944
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
8970
diff
changeset
|
152 |
decoded = [item.decode(encoding) for item in row] |
2eac0aa1d3f6
[dataimport] ucsvreader should skip empty lines unless specified otherwise. Closes #3035944
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
8970
diff
changeset
|
153 |
if not skip_empty or any(decoded): |
9694
c90107199dea
[dataimport] Avoid double unicode decoding in ucsvreader (closes #3705752)
Rémi Cardona <remi.cardona@logilab.fr>
parents:
9597
diff
changeset
|
154 |
yield decoded |
8637
e16561083d84
[dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
8631
diff
changeset
|
155 |
else: |
9695
aa982b7c3f2a
[dataimport] Prevent ucsvreader from skipping the first line when ignore_errors is True (closes #3705791)
Rémi Cardona <remi.cardona@logilab.fr>
parents:
9694
diff
changeset
|
156 |
if skipfirst: |
aa982b7c3f2a
[dataimport] Prevent ucsvreader from skipping the first line when ignore_errors is True (closes #3705791)
Rémi Cardona <remi.cardona@logilab.fr>
parents:
9694
diff
changeset
|
157 |
try: |
aa982b7c3f2a
[dataimport] Prevent ucsvreader from skipping the first line when ignore_errors is True (closes #3705791)
Rémi Cardona <remi.cardona@logilab.fr>
parents:
9694
diff
changeset
|
158 |
row = it.next() |
aa982b7c3f2a
[dataimport] Prevent ucsvreader from skipping the first line when ignore_errors is True (closes #3705791)
Rémi Cardona <remi.cardona@logilab.fr>
parents:
9694
diff
changeset
|
159 |
except csv.Error: |
aa982b7c3f2a
[dataimport] Prevent ucsvreader from skipping the first line when ignore_errors is True (closes #3705791)
Rémi Cardona <remi.cardona@logilab.fr>
parents:
9694
diff
changeset
|
160 |
pass |
8637
e16561083d84
[dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
8631
diff
changeset
|
161 |
# Safe version that can cope with errors in the CSV file |
e16561083d84
[dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
8631
diff
changeset
|
162 |
while True: |
e16561083d84
[dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
8631
diff
changeset
|
163 |
try: |
e16561083d84
[dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
8631
diff
changeset
|
164 |
row = it.next() |
e16561083d84
[dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
8631
diff
changeset
|
165 |
# End of CSV, break |
e16561083d84
[dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
8631
diff
changeset
|
166 |
except StopIteration: |
e16561083d84
[dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
8631
diff
changeset
|
167 |
break |
e16561083d84
[dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
8631
diff
changeset
|
168 |
# Error in CSV, ignore line and continue |
e16561083d84
[dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
8631
diff
changeset
|
169 |
except csv.Error: |
e16561083d84
[dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
8631
diff
changeset
|
170 |
continue |
9181
2eac0aa1d3f6
[dataimport] ucsvreader should skip empty lines unless specified otherwise. Closes #3035944
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
8970
diff
changeset
|
171 |
decoded = [item.decode(encoding) for item in row] |
2eac0aa1d3f6
[dataimport] ucsvreader should skip empty lines unless specified otherwise. Closes #3035944
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
8970
diff
changeset
|
172 |
if not skip_empty or any(decoded): |
2eac0aa1d3f6
[dataimport] ucsvreader should skip empty lines unless specified otherwise. Closes #3035944
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
8970
diff
changeset
|
173 |
yield decoded |
2eac0aa1d3f6
[dataimport] ucsvreader should skip empty lines unless specified otherwise. Closes #3035944
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
8970
diff
changeset
|
174 |
|
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
175 |
|
6136
79da6f969b15
[dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
6133
diff
changeset
|
176 |
def callfunc_every(func, number, iterable): |
79da6f969b15
[dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
6133
diff
changeset
|
177 |
"""yield items of `iterable` one by one and call function `func` |
79da6f969b15
[dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
6133
diff
changeset
|
178 |
every `number` iterations. Always call function `func` at the end. |
79da6f969b15
[dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
6133
diff
changeset
|
179 |
""" |
79da6f969b15
[dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
6133
diff
changeset
|
180 |
for idx, item in enumerate(iterable): |
79da6f969b15
[dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
6133
diff
changeset
|
181 |
yield item |
7227
23d9c1f89c96
[dataimport] actually commit every desired number...
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7214
diff
changeset
|
182 |
if not idx % number: |
6136
79da6f969b15
[dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
6133
diff
changeset
|
183 |
func() |
79da6f969b15
[dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
6133
diff
changeset
|
184 |
func() |
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
185 |
|
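A minimal sketch of how `callfunc_every` behaves, written as a Python 3 port of the listing above (the listing itself is Python 2). The lambda and the `calls` list are illustrative only and not part of the module.

```python
def callfunc_every(func, number, iterable):
    """Port of the function listed above, behaviour unchanged."""
    for idx, item in enumerate(iterable):
        yield item
        if not idx % number:
            func()
    func()


calls = []
# Consume seven items, asking for a call every three iterations.
items = list(callfunc_every(lambda: calls.append("commit"), 3, range(7)))
print(items, len(calls))
```

Since `enumerate` starts at 0 and 0 is a multiple of any `number`, the first call happens right after the first item; here `func` runs after items 0, 3 and 6, then once more at the end.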
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
186 |
def lazytable(reader): |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
187 |
"""The first row is taken to be the header of the table and |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
188 |
used to output a dict for each row of data. |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
189 |
|
6122
4d2b04b32cdc
improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
5557
diff
changeset
|
190 |
>>> data = lazytable(ucsvreader(open(filename))) |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
191 |
""" |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
192 |
header = reader.next() |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
193 |
for row in reader: |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
194 |
yield dict(zip(header, row)) |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
195 |
|
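A short sketch of `lazytable` in use, assuming Python 3 (hence `next(reader)` rather than `reader.next()`). The standard library `csv.reader` stands in for `ucsvreader`, and the sample data is made up.

```python
import csv
import io


def lazytable(reader):
    header = next(reader)  # first row gives the dict keys
    for row in reader:
        yield dict(zip(header, row))


sample = io.StringIO("id,name\n1,dupont\n2,durand\n")  # made-up data
rows = list(lazytable(csv.reader(sample)))
print(rows)
```

Each yielded dict maps a header cell to the cell in the same column of the current row, so downstream code can address columns by name.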
7201
52f5831400b2
[dataimport] allow to specify columns on which result should be sorted in lazydbtable
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7171
diff
changeset
|
196 |
def lazydbtable(cu, table, headers, orderby=None): |
7160
923013173031
[dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7159
diff
changeset
|
197 |
"""return an iterator over the rows of an SQL table. For each row, fetch the columns |
923013173031
[dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7159
diff
changeset
|
198 |
defined in headers and return values as a dictionary. |
923013173031
[dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7159
diff
changeset
|
199 |
|
923013173031
[dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7159
diff
changeset
|
200 |
>>> data = lazydbtable(cu, 'experimentation', ('id', 'nickname', 'gps')) |
923013173031
[dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7159
diff
changeset
|
201 |
""" |
7201
52f5831400b2
[dataimport] allow to specify columns on which result should be sorted in lazydbtable
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7171
diff
changeset
|
202 |
sql = 'SELECT %s FROM %s' % (','.join(headers), table,) |
52f5831400b2
[dataimport] allow to specify columns on which result should be sorted in lazydbtable
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7171
diff
changeset
|
203 |
if orderby: |
52f5831400b2
[dataimport] allow to specify columns on which result should be sorted in lazydbtable
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7171
diff
changeset
|
204 |
sql += ' ORDER BY %s' % ','.join(orderby) |
52f5831400b2
[dataimport] allow to specify columns on which result should be sorted in lazydbtable
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7171
diff
changeset
|
205 |
cu.execute(sql) |
7160
923013173031
[dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7159
diff
changeset
|
206 |
while True: |
923013173031
[dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7159
diff
changeset
|
207 |
row = cu.fetchone() |
923013173031
[dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7159
diff
changeset
|
208 |
if row is None: |
923013173031
[dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7159
diff
changeset
|
209 |
break |
923013173031
[dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7159
diff
changeset
|
210 |
yield dict(zip(headers, row)) |
923013173031
[dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7159
diff
changeset
|
211 |
|
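The function above can be tried against an in-memory SQLite database. This is only a sketch: the table contents are invented, and `sqlite3` is used purely because it ships with Python; any DB-API cursor should behave the same way.

```python
import sqlite3


def lazydbtable(cu, table, headers, orderby=None):
    sql = 'SELECT %s FROM %s' % (','.join(headers), table)
    if orderby:
        sql += ' ORDER BY %s' % ','.join(orderby)
    cu.execute(sql)
    while True:
        row = cu.fetchone()
        if row is None:
            break
        yield dict(zip(headers, row))


cnx = sqlite3.connect(':memory:')
cu = cnx.cursor()
cu.execute('CREATE TABLE experimentation (id INTEGER, nickname TEXT)')
cu.executemany('INSERT INTO experimentation VALUES (?, ?)',
               [(2, 'b'), (1, 'a')])
rows = list(lazydbtable(cu, 'experimentation', ('id', 'nickname'),
                        orderby=('id',)))
print(rows)
```

Table and column names are interpolated directly into the SQL string, so they should come from trusted code rather than user input.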
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
212 |
def mk_entity(row, map): |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
213 |
"""Return a dict made from sanitized mapped values. |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
214 |
|
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
215 |
ValueError can be raised on unexpected values found in checkers |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
216 |
|
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
217 |
>>> row = {'myname': u'dupont'} |
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
218 |
>>> map = [('myname', u'name', (call_transform_method('title'),))] |
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
219 |
>>> mk_entity(row, map) |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
220 |
{'name': u'Dupont'} |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
221 |
>>> row = {'myname': u'dupont', 'optname': u''} |
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
222 |
>>> map = [('myname', u'name', (call_transform_method('title'),)), |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
223 |
... ('optname', u'MARKER', (optional,))] |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
224 |
>>> mk_entity(row, map) |
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
225 |
{'name': u'Dupont', 'MARKER': None} |
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
226 |
""" |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
227 |
res = {} |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
228 |
assert isinstance(row, dict) |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
229 |
assert isinstance(map, list) |
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
230 |
for src, dest, funcs in map: |
8406
f3bc8ca0b715
[data import] don't crash if value isn't in the file, simply no key/value in the output dict. Closes #2356328
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
8403
diff
changeset
|
231 |
try: |
f3bc8ca0b715
[data import] don't crash if value isn't in the file, simply no key/value in the output dict. Closes #2356328
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
8403
diff
changeset
|
232 |
res[dest] = row[src] |
f3bc8ca0b715
[data import] don't crash if value isn't in the file, simply no key/value in the output dict. Closes #2356328
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
8403
diff
changeset
|
233 |
except KeyError: |
f3bc8ca0b715
[data import] don't crash if value isn't in the file, simply no key/value in the output dict. Closes #2356328
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
8403
diff
changeset
|
234 |
continue |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
235 |
try: |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
236 |
for func in funcs: |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
237 |
res[dest] = func(res[dest]) |
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
238 |
if res[dest] is None: |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
239 |
break |
8695
358d8bed9626
[toward-py3k] rewrite to "except AnException as exc:" (part of #2711624)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
8637
diff
changeset
|
240 |
except ValueError as err: |
7170
32b5d9d43a7e
[dataimport] propagate stack
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7160
diff
changeset
|
241 |
raise ValueError('error with %r field: %s' % (src, err)), None, sys.exc_info()[-1] |
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
242 |
return res |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
243 |
|
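The mapping mechanism above can be sketched as follows in Python 3. `optional` and `call_transform_method` are copied from further down the listing so the example is self-contained; the parameter is renamed `mapping` to avoid shadowing the `map` builtin, and `raise ... from err` replaces the Python 2 three-argument `raise`. The row contents and destination names are made up.

```python
def optional(value):
    return value if value else None


def call_transform_method(methodname, *args, **kwargs):
    def coerce(value):
        return getattr(value, methodname)(*args, **kwargs)
    return coerce


def mk_entity(row, mapping):
    res = {}
    for src, dest, funcs in mapping:
        try:
            res[dest] = row[src]
        except KeyError:
            continue  # column absent from the row: no key in the output
        try:
            for func in funcs:
                res[dest] = func(res[dest])
                if res[dest] is None:
                    break  # a checker returned None: stop the chain
        except ValueError as err:
            raise ValueError('error with %r field: %s' % (src, err)) from err
    return res


mapping = [('myname', 'name', (call_transform_method('title'),)),
           ('optname', 'nickname', (optional, int)),
           ('missing', 'unused', ())]
entity = mk_entity({'myname': 'dupont', 'optname': ''}, mapping)
print(entity)
```

The output keys are the destination names (the second element of each tuple), and a source column missing from the row simply produces no key.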
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
244 |
# user interactions ############################################################ |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
245 |
|
3029
bc573d5fb5b7
F [devtools] by default dataimport prints message on stdout
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
3003
diff
changeset
|
246 |
def tell(msg): |
bc573d5fb5b7
F [devtools] by default dataimport prints message on stdout
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
3003
diff
changeset
|
247 |
print msg |
bc573d5fb5b7
F [devtools] by default dataimport prints message on stdout
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
3003
diff
changeset
|
248 |
|
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
249 |
def confirm(question): |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
250 |
"""A confirm function that asks for yes/no/abort and exits on abort.""" |
4721
8f63691ccb7f
pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4613
diff
changeset
|
251 |
answer = shellutils.ASK.ask(question, ('Y', 'n', 'abort'), 'Y') |
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
252 |
if answer == 'abort': |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
253 |
sys.exit(1) |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
254 |
return answer == 'Y' |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
255 |
|
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
256 |
|
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
257 |
class catch_error(object): |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
258 |
"""Context manager that records errors on the controller instead of raising them when `ctl.catcherrors` is set.""" |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
259 |
|
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
260 |
def __init__(self, ctl, key='unexpected error', msg=None): |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
261 |
self.ctl = ctl |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
262 |
self.key = key |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
263 |
self.msg = msg |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
264 |
|
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
265 |
def __enter__(self): |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
266 |
return self |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
267 |
|
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
268 |
def __exit__(self, type, value, traceback): |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
269 |
if type is not None: |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
270 |
if issubclass(type, (KeyboardInterrupt, SystemExit)): |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
271 |
return # re-raise |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
272 |
if self.ctl.catcherrors: |
4173
cfd5d3270f99
msg isn't defined there, but we've to give traceback information to record error
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4152
diff
changeset
|
273 |
self.ctl.record_error(self.key, None, type, value, traceback) |
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
274 |
return True # silent |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
275 |
|
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
276 |
|
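A hedged usage sketch for `catch_error`. `FakeController` is a hypothetical stand-in that exposes only the `catcherrors` attribute and `record_error` method used by the class above; the real controller is defined elsewhere in the module and is not shown in this excerpt.

```python
class catch_error(object):
    """Copy of the class listed above."""

    def __init__(self, ctl, key='unexpected error', msg=None):
        self.ctl = ctl
        self.key = key
        self.msg = msg

    def __enter__(self):
        return self

    def __exit__(self, type, value, traceback):
        if type is not None:
            if issubclass(type, (KeyboardInterrupt, SystemExit)):
                return  # falsy return value: the exception propagates
            if self.ctl.catcherrors:
                self.ctl.record_error(self.key, None, type, value, traceback)
                return True  # truthy return value: the exception is swallowed


class FakeController(object):
    """Hypothetical controller exposing only what catch_error needs."""
    catcherrors = True

    def __init__(self):
        self.errors = []

    def record_error(self, key, msg=None, type=None, value=None, tb=None):
        self.errors.append((key, type))


ctl = FakeController()
with catch_error(ctl, 'row 12'):
    int('not a number')  # raises ValueError, recorded instead of raised
print(ctl.errors)
```

When `catcherrors` is false, `__exit__` returns `None` and the exception propagates as usual.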
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
277 |
# base sanitizing/coercing functions ########################################### |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
278 |
|
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
279 |
def optional(value): |
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
280 |
"""checker to filter optional field |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
281 |
|
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
282 |
If value is undefined (e.g. an empty string), return None, which will |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
283 |
break the checkers validation chain |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
284 |
|
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
285 |
The usual pattern is to put the 'optional' check first in the chain to avoid |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
286 |
ValueError by further checkers |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
287 |
|
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
288 |
>>> MAPPER = [(u'value', 'value', (optional, int))] |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
289 |
>>> row = {'value': u''} |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
290 |
>>> mk_entity(row, MAPPER) |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
291 |
{'value': None} |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
292 |
>>> row = {'value': u'100'} |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
293 |
>>> mk_entity(row, MAPPER) |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
294 |
{'value': 100} |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
295 |
""" |
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
296 |
if value: |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
297 |
return value |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
298 |
return None |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
299 |
|
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
300 |
def required(value): |
6136
79da6f969b15
[dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
6133
diff
changeset
|
301 |
"""raise ValueError if value is empty |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
302 |
|
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
303 |
This check should usually come last in the chain. |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
304 |
""" |
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
305 |
if value: |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
306 |
return value |
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
307 |
raise ValueError("required") |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
308 |
|
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
309 |
def todatetime(format='%d/%m/%Y'): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
310 |
"""return a transformation function to turn string input value into a |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
311 |
`datetime.datetime` instance, using given format. |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
312 |
|
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
313 |
Follow it by `todate` or `totime` functions from `logilab.common.date` if |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
314 |
you want a `date`/`time` instance instead of `datetime`. |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
315 |
""" |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
316 |
def coerce(value): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
317 |
return strptime(value, format) |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
318 |
return coerce |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
319 |
|
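A small sketch of the closure returned by `todatetime`. The listing calls a bare `strptime` whose import lies outside this excerpt; the example assumes `datetime.datetime.strptime` as an equivalent.

```python
from datetime import datetime


def todatetime(format='%d/%m/%Y'):
    def coerce(value):
        # Assumption: datetime.strptime stands in for the module's strptime.
        return datetime.strptime(value, format)
    return coerce


parse = todatetime()          # default day/month/year format
parsed = parse('02/09/2015')
print(parsed)
```

Because `strptime` raises `ValueError` on malformed input, the returned function fits directly into a checker chain consumed by `mk_entity`.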
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
320 |
def call_transform_method(methodname, *args, **kwargs): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
321 |
"""return a transformation function that calls the given method on its input and returns the result""" |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
322 |
def coerce(value): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
323 |
return getattr(value, methodname)(*args, **kwargs) |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
324 |
return coerce |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
325 |
|
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
326 |
def call_check_method(methodname, *args, **kwargs): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
327 |
"""return a check function that returns its input if calling the given method on it yields a true value, |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
328 |
else raise ValueError |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
329 |
""" |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
330 |
def check(value): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
331 |
if getattr(value, methodname)(*args, **kwargs): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
332 |
return value |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
333 |
raise ValueError('%s not verified on %r' % (methodname, value)) |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
334 |
return check |
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
335 |
|
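For illustration, `call_check_method` can wrap any predicate method of the input value. The sketch below uses `str.isdigit`; the name `is_digit` is made up for the example.

```python
def call_check_method(methodname, *args, **kwargs):
    def check(value):
        if getattr(value, methodname)(*args, **kwargs):
            return value
        raise ValueError('%s not verified on %r' % (methodname, value))
    return check


is_digit = call_check_method('isdigit')
accepted = is_digit('42')      # predicate holds: value passes through
try:
    is_digit('4x')
    rejected = False
except ValueError:
    rejected = True            # predicate fails: ValueError is raised
print(accepted, rejected)
```

Unlike `call_transform_method`, the checker returns the original value unchanged and only uses the method's result as a pass/fail signal.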
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
336 |
# base integrity checking functions ############################################ |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
337 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
338 |
def check_doubles(buckets): |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
339 |
"""Extract the keys that have more than one item in their bucket.""" |
4721
8f63691ccb7f
pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4613
diff
changeset
|
340 |
return [(k, len(v)) for k, v in buckets.items() if len(v) > 1] |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
341 |
|
4136
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
342 |
def check_doubles_not_none(buckets): |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
343 |
"""Extract the keys, except None, that have more than one item in their bucket.""" |
4721
8f63691ccb7f
pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4613
diff
changeset
|
344 |
return [(k, len(v)) for k, v in buckets.items() |
8f63691ccb7f
pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4613
diff
changeset
|
345 |
if k is not None and len(v) > 1] |
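A minimal usage sketch for the two checks above, restated so that it runs on its own under Python 3. The `buckets` sample data is invented for illustration and does not come from the module.

```python
def check_doubles(buckets):
    """Return (key, count) pairs for buckets holding more than one item."""
    return [(k, len(v)) for k, v in buckets.items() if len(v) > 1]


def check_doubles_not_none(buckets):
    """Same as check_doubles, but skip the None key."""
    return [(k, len(v)) for k, v in buckets.items()
            if k is not None and len(v) > 1]


# Hypothetical sample: row ids grouped by a key extracted from imported rows.
buckets = {'alice': [1, 2], 'bob': [3], None: [4, 5, 6]}

doubles = check_doubles(buckets)                    # includes the None key
doubles_not_none = check_doubles_not_none(buckets)  # ignores the None key
```

The second variant is useful when None stands for "no key found" and such rows should not be reported as duplicates of each other.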
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
346 |
|
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
347 |
# sql generator utility functions ############################################# |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
348 |
|
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
349 |
|
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
350 |
def _import_statements(sql_connect, statements, nb_threads=3, |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
351 |
dump_output_dir=None, |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
352 |
support_copy_from=True, encoding='utf-8'): |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
353 |
""" |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
354 |
Import a batch of SQL statements, spreading them over several threads. |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
355 |
""" |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
356 |
try: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
357 |
chunksize = (len(statements) / nb_threads) + 1 |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
358 |
threads = [] |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
359 |
for i in xrange(nb_threads): |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
360 |
chunks = statements[i*chunksize:(i+1)*chunksize] |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
361 |
thread = threading.Thread(target=_execmany_thread, |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
362 |
args=(sql_connect, chunks, |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
363 |
dump_output_dir, |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
364 |
support_copy_from, |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
365 |
encoding)) |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
366 |
thread.start() |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
367 |
threads.append(thread) |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
368 |
for t in threads: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
369 |
t.join() |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
370 |
except Exception: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
371 |
print 'Error in import statements' |
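The slicing arithmetic used by `_import_statements` can be sketched in isolation as below. `split_chunks` is a hypothetical helper name, not part of the module, and the sketch uses `//` so that it behaves under Python 3 the way `/` behaves on integers in the Python 2 original.

```python
def split_chunks(statements, nb_threads=3):
    """Split statements into nb_threads consecutive slices."""
    # Same formula as the original: integer quotient plus one.
    chunksize = (len(statements) // nb_threads) + 1
    return [statements[i * chunksize:(i + 1) * chunksize]
            for i in range(nb_threads)]


# Seven statements over three threads: slices of size 3, 3 and 1.
chunks = split_chunks(list(range(7)), nb_threads=3)

# Six statements over three threads: the chunk size is still 3,
# so the last slice comes out empty.
even_chunks = split_chunks(list(range(6)), nb_threads=3)
```

Because of the `+ 1`, the last slice can be empty when the number of statements is a multiple of `nb_threads`; the corresponding thread then has nothing to execute.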
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
372 |
|
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
373 |
def _execmany_thread_not_copy_from(cu, statement, data, table=None, |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
374 |
columns=None, encoding='utf-8'): |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
375 |
""" Execute thread without copy from |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
376 |
""" |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
377 |
cu.executemany(statement, data) |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
378 |
|
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
379 |
def _execmany_thread_copy_from(cu, statement, data, table, |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
380 |
columns, encoding='utf-8'): |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
381 |
""" Execute thread with copy from |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
382 |
""" |
10078
5eeffcfde1ba
[dataimport] Fix use of _create_copyfrom_buffer() (related to #3845572)
Rémi Cardona <remi.cardona@logilab.fr>
parents:
10007
diff
changeset
|
383 |
buf = _create_copyfrom_buffer(data, columns, encoding=encoding) |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
384 |
if buf is None: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
385 |
_execmany_thread_not_copy_from(cu, statement, data) |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
386 |
else: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
387 |
if columns is None: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
388 |
cu.copy_from(buf, table, null='NULL') |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
389 |
else: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
390 |
cu.copy_from(buf, table, null='NULL', columns=columns) |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
391 |
|
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
392 |
def _execmany_thread(sql_connect, statements, dump_output_dir=None, |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
393 |
support_copy_from=True, encoding='utf-8'): |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
394 |
""" |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
395 |
Execute sql statement. If 'INSERT INTO', try to use 'COPY FROM' command, |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
396 |
or fall back to executemany. |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
397 |
""" |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
398 |
if support_copy_from: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
399 |
execmany_func = _execmany_thread_copy_from |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
400 |
else: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
401 |
execmany_func = _execmany_thread_not_copy_from |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
402 |
cnx = sql_connect() |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
403 |
cu = cnx.cursor() |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
404 |
try: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
405 |
for statement, data in statements: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
406 |
table = None |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
407 |
columns = None |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
408 |
try: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
409 |
if not statement.startswith('INSERT INTO'): |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
410 |
cu.executemany(statement, data) |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
411 |
continue |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
412 |
table = statement.split()[2] |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
413 |
if isinstance(data[0], (tuple, list)): |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
414 |
columns = None |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
415 |
else: |
8696
0bb18407c053
[toward py3k] rewrite dict.keys() and dict.values() (part of #2711624)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
8695
diff
changeset
|
416 |
columns = list(data[0]) |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
417 |
execmany_func(cu, statement, data, table, columns, encoding) |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
418 |
except Exception: |
8970
0a1bd0c590e2
[dataimport] minor typo in error handling
Dimitri Papadopoulos <dimitri.papadopoulos@cea.fr>
parents:
8930
diff
changeset
|
419 |
print 'unable to copy data into table %s' % table |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
420 |
# Error in import statement, save data in dump_output_dir |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
421 |
if dump_output_dir is not None: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
422 |
pdata = {'data': data, 'statement': statement, |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
423 |
'time': asctime(), 'columns': columns} |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
424 |
filename = make_uid() |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
425 |
try: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
426 |
with open(osp.join(dump_output_dir, |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
427 |
'%s.pickle' % filename), 'w') as fobj: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
428 |
fobj.write(cPickle.dumps(pdata)) |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
429 |
except IOError: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
430 |
print 'ERROR while pickling in', dump_output_dir, filename+'.pickle' |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
431 |
pass |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
432 |
cnx.rollback() |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
433 |
raise |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
434 |
finally: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
435 |
cnx.commit() |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
436 |
cu.close() |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
437 |
|
9898
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
438 |
|
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
439 |
def _copyfrom_buffer_convert_None(value, **opts): |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
440 |
'''Convert None value to "NULL"''' |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
441 |
return 'NULL' |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
442 |
|
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
443 |
def _copyfrom_buffer_convert_number(value, **opts): |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
444 |
'''Convert a number into its string representation''' |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
445 |
return str(value) |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
446 |
|
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
447 |
def _copyfrom_buffer_convert_string(value, **opts): |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
448 |
'''Convert string value. |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
449 |
|
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
450 |
Recognized keywords: |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
451 |
:encoding: resulting string encoding (default: utf-8) |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
452 |
:replace_sep: string substituted for input characters |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
453 |
that conflict with the column separator. |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
454 |
''' |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
455 |
encoding = opts.get('encoding','utf-8') |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
456 |
replace_sep = opts.get('replace_sep', None) |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
457 |
# Remove separators used in string formatting |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
458 |
for _char in (u'\t', u'\r', u'\n'): |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
459 |
if _char in value: |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
460 |
# If a replace_sep is given, replace |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
461 |
# the separator |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
462 |
# (and thus avoid empty buffer) |
9900
9c7de09a6648
[dataimport] _create_copyfrom_buffer: raise ValueError if conversion cannot be performed
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9899
diff
changeset
|
463 |
if replace_sep is None: |
9c7de09a6648
[dataimport] _create_copyfrom_buffer: raise ValueError if conversion cannot be performed
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9899
diff
changeset
|
464 |
raise ValueError('conflicting separator: ' |
9c7de09a6648
[dataimport] _create_copyfrom_buffer: raise ValueError if conversion cannot be performed
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9899
diff
changeset
|
465 |
'you must provide the replace_sep option') |
9c7de09a6648
[dataimport] _create_copyfrom_buffer: raise ValueError if conversion cannot be performed
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9899
diff
changeset
|
466 |
value = value.replace(_char, replace_sep) |
9898
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
467 |
value = value.replace('\\', r'\\') |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
468 |
if isinstance(value, unicode): |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
469 |
value = value.encode(encoding) |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
470 |
return value |
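The behaviour of the string converter above can be illustrated with the following Python 3 sketch. `convert_string` is a hypothetical stand-in that takes explicit keyword arguments instead of `**opts` and always encodes, since Python 3 `str` plays the role of Python 2 `unicode`.

```python
def convert_string(value, encoding='utf-8', replace_sep=None):
    """Escape a text value for a tab-separated COPY FROM buffer."""
    # Tabs and line breaks would be read as column or row separators.
    for char in ('\t', '\r', '\n'):
        if char in value:
            if replace_sep is None:
                raise ValueError('conflicting separator: '
                                 'you must provide the replace_sep option')
            value = value.replace(char, replace_sep)
    # Double the backslashes so they are not read as escape sequences.
    value = value.replace('\\', '\\\\')
    return value.encode(encoding)


cleaned = convert_string('a\tb', replace_sep=' ')
escaped = convert_string('a\\b')
```

Without `replace_sep`, a value containing a tab or a line break raises `ValueError` rather than silently producing a malformed row.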
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
471 |
|
9901
161ec913aeec
[dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9900
diff
changeset
|
472 |
def _copyfrom_buffer_convert_date(value, **opts): |
9898
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
473 |
'''Convert date into "YYYY-MM-DD"''' |
9901
161ec913aeec
[dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9900
diff
changeset
|
474 |
# Do not use strftime, as it fails on dates before 1900 |
161ec913aeec
[dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9900
diff
changeset
|
475 |
# (http://bugs.python.org/issue1777412) |
161ec913aeec
[dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9900
diff
changeset
|
476 |
return '%04d-%02d-%02d' % (value.year, value.month, value.day) |
161ec913aeec
[dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9900
diff
changeset
|
477 |
|
161ec913aeec
[dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9900
diff
changeset
|
478 |
def _copyfrom_buffer_convert_datetime(value, **opts): |
161ec913aeec
[dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9900
diff
changeset
|
479 |
'''Convert datetime into "YYYY-MM-DD HH:MM:SS.UUUUUU"''' |
161ec913aeec
[dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9900
diff
changeset
|
480 |
# Do not use strftime, as it fails for dates before 1900 |
161ec913aeec
[dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9900
diff
changeset
|
481 |
# (http://bugs.python.org/issue1777412) |
161ec913aeec
[dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9900
diff
changeset
|
482 |
return '%s %s' % (_copyfrom_buffer_convert_date(value, **opts), |
161ec913aeec
[dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9900
diff
changeset
|
483 |
_copyfrom_buffer_convert_time(value, **opts)) |
161ec913aeec
[dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9900
diff
changeset
|
484 |
|
161ec913aeec
[dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9900
diff
changeset
|
485 |
def _copyfrom_buffer_convert_time(value, **opts): |
161ec913aeec
[dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9900
diff
changeset
|
486 |
'''Convert time into "HH:MM:SS.UUUUUU"''' |
161ec913aeec
[dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9900
diff
changeset
|
487 |
return '%02d:%02d:%02d.%06d' % (value.hour, value.minute, |
161ec913aeec
[dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9900
diff
changeset
|
488 |
value.second, value.microsecond) |
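The three converters above format dates and times by hand instead of calling strftime. A minimal Python 3 sketch of the same idea follows; the names `format_date`, `format_time` and `format_datetime` are illustrative stand-ins, not the module's own functions, and the `**opts` keyword arguments are left out for brevity.

```python
from datetime import date, datetime, time


def format_date(value):
    # Zero-padded "YYYY-MM-DD", built from the fields so strftime is never called.
    return '%04d-%02d-%02d' % (value.year, value.month, value.day)


def format_time(value):
    # "HH:MM:SS.UUUUUU" with six digits of microseconds.
    return '%02d:%02d:%02d.%06d' % (value.hour, value.minute,
                                    value.second, value.microsecond)


def format_datetime(value):
    # A datetime exposes both the date and the time fields, so the two
    # helpers can be reused and joined with a single space.
    return '%s %s' % (format_date(value), format_time(value))


old_date = format_date(date(1850, 3, 7))
stamp = format_datetime(datetime(1850, 3, 7, 9, 5, 3, 42))
```

Because only integer fields are formatted, the result does not depend on the platform's strftime implementation.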
9898
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
489 |
|
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
490 |
# (types, converter) list. |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
491 |
_COPYFROM_BUFFER_CONVERTERS = [ |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
492 |
(type(None), _copyfrom_buffer_convert_None), |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
493 |
((long, int, float), _copyfrom_buffer_convert_number), |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
494 |
(basestring, _copyfrom_buffer_convert_string), |
9901
161ec913aeec
[dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9900
diff
changeset
|
495 |
(datetime, _copyfrom_buffer_convert_datetime), |
161ec913aeec
[dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9900
diff
changeset
|
496 |
(date, _copyfrom_buffer_convert_date), |
161ec913aeec
[dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9900
diff
changeset
|
497 |
(time, _copyfrom_buffer_convert_time), |
9898
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
498 |
] |
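The order of the entries in this list matters: `datetime` is a subclass of `date`, so the `datetime` entry has to come first or the date converter would match first and drop the time part. A small Python 3 sketch of first-match dispatch illustrates this; `pick` and the placeholder lambdas are hypothetical and only show which entry wins.

```python
from datetime import date, datetime

# (types, converter) pairs, most specific type first.
CONVERTERS = [
    (datetime, lambda value: 'datetime converter'),
    (date, lambda value: 'date converter'),
]


def pick(value, converters):
    # The first entry whose types match the value wins.
    for types, converter in converters:
        if isinstance(value, types):
            return converter(value)
    raise ValueError('Unsupported value type %s' % type(value))


right = pick(datetime(2015, 9, 2, 15, 31), CONVERTERS)
# With the order reversed, the date entry shadows the datetime one.
wrong = pick(datetime(2015, 9, 2, 15, 31), list(reversed(CONVERTERS)))
```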
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
499 |
|
9902
62c586f32f93
[dataimport] _create_copyfrom_buffer: do not ignore columns if data is a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9901
diff
changeset
|
500 |
def _create_copyfrom_buffer(data, columns=None, **convert_opts): |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
501 |
""" |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
502 |
Create a StringIO buffer for 'COPY FROM' command. |
9898
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
503 |
Handles unicode, int, float, date... values (see ``_COPYFROM_BUFFER_CONVERTERS``) |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
504 |
|
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
505 |
:data: a sequence of tuples, lists or dicts |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
506 |
:columns: list of columns to consider (default to all columns) |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
507 |
:convert_opts: keyword arguments given to converters |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
508 |
""" |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
509 |
# Create a list rather than directly create a StringIO |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
510 |
# to correctly write lines separated by '\n' in a single step |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
511 |
rows = [] |
9902
62c586f32f93
[dataimport] _create_copyfrom_buffer: do not ignore columns if data is a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9901
diff
changeset
|
512 |
if columns is None: |
62c586f32f93
[dataimport] _create_copyfrom_buffer: do not ignore columns if data is a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9901
diff
changeset
|
513 |
if isinstance(data[0], (tuple, list)): |
62c586f32f93
[dataimport] _create_copyfrom_buffer: do not ignore columns if data is a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9901
diff
changeset
|
514 |
columns = range(len(data[0])) |
62c586f32f93
[dataimport] _create_copyfrom_buffer: do not ignore columns if data is a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9901
diff
changeset
|
515 |
elif isinstance(data[0], dict): |
62c586f32f93
[dataimport] _create_copyfrom_buffer: do not ignore columns if data is a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9901
diff
changeset
|
516 |
columns = data[0].keys() |
62c586f32f93
[dataimport] _create_copyfrom_buffer: do not ignore columns if data is a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9901
diff
changeset
|
517 |
else: |
62c586f32f93
[dataimport] _create_copyfrom_buffer: do not ignore columns if data is a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9901
diff
changeset
|
518 |
raise ValueError('Could not get columns: you must provide columns.') |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
519 |
for row in data: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
520 |
# Iterate over the different columns and the different values |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
521 |
# and try to convert them to a correct datatype. |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
522 |
# If an error is raised, do not continue. |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
523 |
formatted_row = [] |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
524 |
for col in columns: |
8926
336e4971dc50
[dataimport] backout 6947201033be (related to #2788402)
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
8900
diff
changeset
|
525 |
try: |
8834
6947201033be
[dataimport] Handle various data formats when creating buffers from data.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
8833
diff
changeset
|
526 |
value = row[col] |
8926
336e4971dc50
[dataimport] backout 6947201033be (related to #2788402)
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
8900
diff
changeset
|
527 |
except KeyError: |
9898
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
528 |
warnings.warn(u"Column %s is not accessible in row %s" |
8926
336e4971dc50
[dataimport] backout 6947201033be (related to #2788402)
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
8900
diff
changeset
|
529 |
% (col, row), RuntimeWarning) |
9898
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
530 |
# XXX 'value' set to None so that the import does not end in |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
531 |
# error. |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
532 |
# Instead, the extra keys are set to NULL from the |
8926
336e4971dc50
[dataimport] backout 6947201033be (related to #2788402)
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
8900
diff
changeset
|
533 |
# database point of view. |
336e4971dc50
[dataimport] backout 6947201033be (related to #2788402)
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
8900
diff
changeset
|
534 |
value = None |
9898
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
535 |
for types, converter in _COPYFROM_BUFFER_CONVERTERS: |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
536 |
if isinstance(value, types): |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
537 |
value = converter(value, **convert_opts) |
70056633085c
[dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9827
diff
changeset
|
538 |
break |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
539 |
else: |
9900
9c7de09a6648
[dataimport] _create_copyfrom_buffer: raise ValueError if conversion cannot be performed
Alain Leufroy <alain.leufroy@logilab.fr>
parents:
9899
diff
changeset
|
540 |
raise ValueError("Unsupported value type %s" % type(value)) |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
541 |
# We push the value to the new formatted row |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
542 |
# once it has been converted to a string. |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
543 |
formatted_row.append(value) |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
544 |
rows.append('\t'.join(formatted_row)) |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
545 |
return StringIO('\n'.join(rows)) |
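The buffer-building logic above can be sketched in Python 3 as follows. This is a simplified illustration under stated assumptions, not the module's implementation: `build_copy_buffer` is a hypothetical name, `None` is written as the literal `NULL` (an assumption, since the None converter is not shown in this part of the listing), and string escaping and date handling are omitted.

```python
import warnings
from io import StringIO


def build_copy_buffer(data, columns=None):
    # Guess the columns from the first row when none are given.
    if columns is None:
        first = data[0]
        if isinstance(first, (tuple, list)):
            columns = list(range(len(first)))
        elif isinstance(first, dict):
            columns = list(first.keys())
        else:
            raise ValueError('Could not get columns: you must provide columns.')
    rows = []
    for row in data:
        cells = []
        for col in columns:
            try:
                value = row[col]
            except KeyError:
                # A missing key is reported, then treated as NULL.
                warnings.warn('Column %s is not accessible in row %s'
                              % (col, row), RuntimeWarning)
                value = None
            if value is None:
                cells.append('NULL')  # assumed NULL marker
            elif isinstance(value, (int, float, str)):
                cells.append(str(value))
            else:
                raise ValueError('Unsupported value type %s' % type(value))
        # One tab-separated line per row.
        rows.append('\t'.join(cells))
    # Join once so that no trailing newline is written.
    return StringIO('\n'.join(rows))


buf = build_copy_buffer([(1, 'a'), (2, None)])
```

Collecting the rows in a list and joining them once gives newline-separated lines without a trailing newline.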
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
546 |
|
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
547 |
|
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
548 |
# object stores ################################################################# |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
549 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
550 |
class ObjectStore(object): |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
551 |
"""Store objects in memory for *faster* validation (development mode) |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
552 |
|
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
553 |
It does not enforce the constraints of the schema, however, and will therefore miss some problems. |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
554 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
555 |
>>> store = ObjectStore() |
7158
0f31a50b144e
[dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7118
diff
changeset
|
556 |
>>> user = store.create_entity('CWUser', login=u'johndoe') |
0f31a50b144e
[dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7118
diff
changeset
|
557 |
>>> group = store.create_entity('CWGroup', name=u'unknown') |
0f31a50b144e
[dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7118
diff
changeset
|
558 |
>>> store.relate(user.eid, 'in_group', group.eid) |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
559 |
""" |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
560 |
def __init__(self): |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
561 |
self.items = [] |
3003
2944ee420dca
R [dataimport] rename uid to eid
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
2974
diff
changeset
|
562 |
self.eids = {} |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
563 |
self.types = {} |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
564 |
self.relations = set() |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
565 |
self.indexes = {} |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
566 |
|
6990
353ad06867a8
[dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
6989
diff
changeset
|
567 |
def create_entity(self, etype, **data): |
7159
3bcccd3ab6b6
[dataimport] ObjectStore.create_entity should return something that looks like an entity (eg no more using dict protocol to access to attributes)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7158
diff
changeset
|
568 |
data = attrdict(data) |
9905
1fa35cc06c69
[dataimport] remove dead code
Julien Cristau <julien.cristau@logilab.fr>
parents:
9904
diff
changeset
|
569 |
data['eid'] = eid = len(self.items) |
1fa35cc06c69
[dataimport] remove dead code
Julien Cristau <julien.cristau@logilab.fr>
parents:
9904
diff
changeset
|
570 |
self.items.append(data) |
6990
353ad06867a8
[dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
6989
diff
changeset
|
571 |
self.eids[eid] = data |
353ad06867a8
[dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
6989
diff
changeset
|
572 |
self.types.setdefault(etype, []).append(eid) |
353ad06867a8
[dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
6989
diff
changeset
|
573 |
return data |
353ad06867a8
[dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
6989
diff
changeset
|
574 |
|
8832
26cdfc6dd6f8
[dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
8807
diff
changeset
|
575 |
def relate(self, eid_from, rtype, eid_to, **kwargs): |
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
576 |
"""Add new relation""" |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
577 |
relation = eid_from, rtype, eid_to |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
578 |
self.relations.add(relation) |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
579 |
return relation |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
580 |
|
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
581 |
def commit(self): |
9908
88bbb3abf30f
[dataimport] Stop swallowing errors from commit/flush
Julien Cristau <julien.cristau@logilab.fr>
parents:
9907
diff
changeset
|
582 |
"""this commit method does nothing by default""" |
88bbb3abf30f
[dataimport] Stop swallowing errors from commit/flush
Julien Cristau <julien.cristau@logilab.fr>
parents:
9907
diff
changeset
|
583 |
return |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
584 |
|
8833
39f81e2db2fc
[dataimport] Add a ``flush`` method for all stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
8832
diff
changeset
|
585 |
def flush(self): |
9910
55d9d483e7c3
[dataimport] don't commit on flush
Julien Cristau <julien.cristau@logilab.fr>
parents:
9908
diff
changeset
|
586 |
"""The method is provided so that all stores share a common API""" |
55d9d483e7c3
[dataimport] don't commit on flush
Julien Cristau <julien.cristau@logilab.fr>
parents:
9908
diff
changeset
|
587 |
pass |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
588 |
|
4818
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
589 |
@property |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
590 |
def nb_inserted_entities(self): |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
591 |
return len(self.eids) |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
592 |
@property |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
593 |
def nb_inserted_types(self): |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
594 |
return len(self.types) |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
595 |
@property |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
596 |
def nb_inserted_relations(self): |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
597 |
return len(self.relations) |
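A self-contained Python 3 sketch of how such an in-memory store behaves is given below. `AttrDict` and `MemoryStore` are hypothetical stand-ins written for this illustration (the module's own `attrdict` helper is defined elsewhere and may differ); the sketch only mirrors the bookkeeping visible in the listing above.

```python
class AttrDict(dict):
    """Dict whose keys can also be read as attributes (illustrative helper)."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)


class MemoryStore(object):
    def __init__(self):
        self.items = []         # entities in creation order
        self.eids = {}          # eid -> entity
        self.types = {}         # entity type -> list of eids
        self.relations = set()  # (eid_from, rtype, eid_to) triples

    def create_entity(self, etype, **data):
        data = AttrDict(data)
        # The eid is simply the position of the entity in self.items.
        data['eid'] = eid = len(self.items)
        self.items.append(data)
        self.eids[eid] = data
        self.types.setdefault(etype, []).append(eid)
        return data

    def relate(self, eid_from, rtype, eid_to):
        relation = (eid_from, rtype, eid_to)
        # A set is used, so adding the same relation twice has no effect.
        self.relations.add(relation)
        return relation


store = MemoryStore()
user = store.create_entity('CWUser', login='johndoe')
group = store.create_entity('CWGroup', name='unknown')
store.relate(user.eid, 'in_group', group.eid)
store.relate(user.eid, 'in_group', group.eid)
```

Nothing here checks that `in_group` is a valid relation between the two entity types, which is the kind of schema problem an in-memory store cannot detect.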
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
598 |
|
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
599 |
class RQLObjectStore(ObjectStore): |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
600 |
"""ObjectStore that works with an actual RQL repository (production mode)""" |
4136
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
601 |
|
9907
696b81eba218
[dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents:
9906
diff
changeset
|
602 |
def __init__(self, cnx, commit=None): |
696b81eba218
[dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents:
9906
diff
changeset
|
603 |
if commit is not None: |
696b81eba218
[dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents:
9906
diff
changeset
|
604 |
warnings.warn('[3.19] commit argument should not be specified ' |
696b81eba218
[dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents:
9906
diff
changeset
|
605 |
'as the cnx object already provides it.', |
696b81eba218
[dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents:
9906
diff
changeset
|
606 |
DeprecationWarning, stacklevel=2) |
696b81eba218
[dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents:
9906
diff
changeset
|
607 |
super(RQLObjectStore, self).__init__() |
696b81eba218
[dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents:
9906
diff
changeset
|
608 |
self._cnx = cnx |
696b81eba218
[dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents:
9906
diff
changeset
|
609 |
self._commit = commit or cnx.commit |
4818
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
610 |
|
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
611 |
def commit(self): |
9907
696b81eba218
[dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents:
9906
diff
changeset
|
612 |
return self._commit() |
4136
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
613 |
|
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
614 |
def rql(self, *args): |
9907
696b81eba218
[dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents:
9906
diff
changeset
|
615 |
return self._cnx.execute(*args) |
696b81eba218
[dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents:
9906
diff
changeset
|
616 |
|
696b81eba218
[dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents:
9906
diff
changeset
|
617 |
@property |
696b81eba218
[dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents:
9906
diff
changeset
|
618 |
def session(self): |
696b81eba218
[dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents:
9906
diff
changeset
|
619 |
warnings.warn('[3.19] deprecated property.', DeprecationWarning, |
696b81eba218
[dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents:
9906
diff
changeset
|
620 |
stacklevel=2) |
696b81eba218
[dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents:
9906
diff
changeset
|
621 |
return self._cnx.repo._get_session(self._cnx.sessionid) |
4136
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
622 |
|
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
623 |
def create_entity(self, *args, **kwargs): |
9907
696b81eba218
[dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents:
9906
diff
changeset
|
624 |
entity = self._cnx.create_entity(*args, **kwargs) |
4136
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
625 |
self.eids[entity.eid] = entity |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
626 |
self.types.setdefault(args[0], []).append(entity.eid) |
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
627 |
return entity |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
628 |
|
8832
26cdfc6dd6f8
[dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
8807
diff
changeset
|
629 |
def relate(self, eid_from, rtype, eid_to, **kwargs): |
4721
8f63691ccb7f
pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4613
diff
changeset
|
630 |
eid_from, rtype, eid_to = super(RQLObjectStore, self).relate( |
8832
26cdfc6dd6f8
[dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
8807
diff
changeset
|
631 |
eid_from, rtype, eid_to, **kwargs) |
4136
47060a66c97f
dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
3486
diff
changeset
|
632 |
self.rql('SET X %s Y WHERE X eid %%(x)s, Y eid %%(y)s' % rtype, |
7033
ddc1b4d80dbd
[dataimport] remove eid_key deprecation warning and prints
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
6990
diff
changeset
|
633 |
{'x': int(eid_from), 'y': int(eid_to)}) |
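The query string in `relate` is built in two stages: the relation type is interpolated immediately, while the doubled `%%` leaves the `%(x)s` and `%(y)s` placeholders in place for the arguments passed alongside the query. A short sketch of that first stage, with `relation_query` as a hypothetical helper name:

```python
def relation_query(rtype):
    # '%s' is replaced now; each '%%' collapses to a single '%', leaving
    # named placeholders to be filled in from the arguments dict later.
    return 'SET X %s Y WHERE X eid %%(x)s, Y eid %%(y)s' % rtype


query = relation_query('in_group')
# The eids are coerced to int before being passed as query arguments.
args = {'x': int('12'), 'y': int(34)}
```

Passing the eids as separate arguments rather than formatting them into the string keeps the values out of the query text itself.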
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
634 |
|
9907
696b81eba218
[dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents:
9906
diff
changeset
|
635 |
@deprecated("[3.19] use cnx.find(*args, **kwargs).entities() instead") |
7116
dfd4680a23f0
[session] add find_entities and find_one_entity to session/request API (closes #1550045)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
7033
diff
changeset
|
636 |
def find_entities(self, *args, **kwargs): |
9907
696b81eba218
[dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents:
9906
diff
changeset
|
637 |
return self._cnx.find(*args, **kwargs).entities() |
7116
dfd4680a23f0
[session] add find_entities and find_one_entity to session/request API (closes #1550045)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
7033
diff
changeset
|
638 |
|
9907
696b81eba218
[dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents:
9906
diff
changeset
|
639 |
@deprecated("[3.19] use cnx.find(*args, **kwargs).one() instead") |
7116
dfd4680a23f0
[session] add find_entities and find_one_entity to session/request API (closes #1550045)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
7033
diff
changeset
|
640 |
def find_one_entity(self, *args, **kwargs): |
9907
696b81eba218
[dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents:
9906
diff
changeset
|
641 |
return self._cnx.find(*args, **kwargs).one() |
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
642 |
|
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
643 |
# the import controller ######################################################## |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
644 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
645 |
class CWImportController(object): |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
646 |
"""Controller of the data import process. |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
647 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
648 |
>>> ctl = CWImportController(store) |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
649 |
>>> ctl.generators = list_of_data_generators |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
650 |
>>> ctl.data = dict_of_data_tables |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
651 |
>>> ctl.run() |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
652 |
""" |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
653 |
|
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
654 |
def __init__(self, store, askerror=0, catcherrors=None, tell=tell, |
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
655 |
commitevery=50): |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
656 |
self.store = store |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
657 |
self.generators = None |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
658 |
self.data = {} |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
659 |
self.errors = None |
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
660 |
self.askerror = askerror |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
661 |
if catcherrors is None: |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
662 |
catcherrors = askerror |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
663 |
self.catcherrors = catcherrors |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
664 |
self.commitevery = commitevery # set to None to do a single commit |
3029
bc573d5fb5b7
F [devtools] by default dataimport prints message on stdout
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
3003
diff
changeset
|
665 |
self._tell = tell |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
666 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
667 |
def check(self, type, key, value): |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
668 |
self._checks.setdefault(type, {}).setdefault(key, []).append(value) |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
669 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
670 |
def check_map(self, entity, key, map, default): |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
671 |
try: |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
672 |
entity[key] = map[entity[key]] |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
673 |
except KeyError: |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
674 |
self.check(key, entity[key], None) |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
675 |
entity[key] = default |
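The bookkeeping done by `check` and `check_map` above can be sketched on its own. This is a minimal illustration, not the module's code: `CheckRecorder`, `recorder`, and the sample row are hypothetical names, and the `map` parameter is renamed `mapping` only to avoid shadowing the builtin.

```python
# Hypothetical stand-in for the controller's check bookkeeping.
class CheckRecorder(object):
    def __init__(self):
        self._checks = {}

    def check(self, type, key, value):
        # Bucket values first by check type, then by key.
        self._checks.setdefault(type, {}).setdefault(key, []).append(value)

    def check_map(self, entity, key, mapping, default):
        try:
            entity[key] = mapping[entity[key]]
        except KeyError:
            # Remember the value that could not be mapped, then fall back
            # to the default so the import can continue.
            self.check(key, entity[key], None)
            entity[key] = default


recorder = CheckRecorder()
row = {'country': 'Atlantis'}
recorder.check_map(row, 'country', {'France': 'FR'}, 'unknown')
```

After the call, the row carries the default value and the unmapped value is recorded under the `'country'` check type, where a check function could later inspect it.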
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
676 |
|
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
677 |
def record_error(self, key, msg=None, type=None, value=None, tb=None): |
4186
ca7e526b07b6
import cleanup, check data file exists
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4173
diff
changeset
|
678 |
tmp = StringIO() |
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
679 |
if type is None: |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
680 |
traceback.print_exc(file=tmp) |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
681 |
else: |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
682 |
traceback.print_exception(type, value, tb, file=tmp) |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
683 |
# use a list to avoid counting <nb lines> errors instead of one |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
684 |
errorlog = self.errors.setdefault(key, []) |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
685 |
if msg is None: |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
686 |
errorlog.append(tmp.getvalue().splitlines()) |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
687 |
else: |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
688 |
errorlog.append( (msg, tmp.getvalue().splitlines()) ) |
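The way `record_error` keeps one log entry per exception can be sketched as a standalone function. This is an assumption-laden illustration rather than the method itself: it takes the `errors` dict as an argument, handles only the "current exception" branch, and uses Python 3's `io.StringIO` (the original file targets Python 2).

```python
import traceback
from io import StringIO


def record_error(errors, key, msg=None):
    """Store the current exception's traceback under ``key`` as ONE entry."""
    tmp = StringIO()
    traceback.print_exc(file=tmp)
    lines = tmp.getvalue().splitlines()
    # Each entry is a list of lines (or a (msg, lines) tuple), so that
    # counting entries counts errors, not traceback lines.
    errorlog = errors.setdefault(key, [])
    errorlog.append(lines if msg is None else (msg, lines))


errors = {}
try:
    int('not a number')
except ValueError:
    record_error(errors, 'parse', 'While parsing a row')
```

Because the traceback lines are wrapped in a single list, `len(errors['parse'])` is the number of recorded errors.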
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
689 |
|
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
690 |
def run(self): |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
691 |
self.errors = {} |
7171
4297be67bbe4
[dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7170
diff
changeset
|
692 |
if self.commitevery is None: |
4297be67bbe4
[dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7170
diff
changeset
|
693 |
self.tell('Will commit all or nothing.') |
4297be67bbe4
[dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7170
diff
changeset
|
694 |
else: |
4297be67bbe4
[dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7170
diff
changeset
|
695 |
self.tell('Will commit every %s iterations' % self.commitevery) |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
696 |
for func, checks in self.generators: |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
697 |
self._checks = {} |
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
698 |
func_name = func.__name__ |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
699 |
self.tell("Run import function '%s'..." % func_name) |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
700 |
try: |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
701 |
func(self) |
7815
2a164a9cf81c
[exceptions] stop catching any exception in various places (closes #1942716)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7471
diff
changeset
|
702 |
except Exception: |
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
703 |
if self.catcherrors: |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
704 |
self.record_error(func_name, 'While calling %s' % func.__name__) |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
705 |
else: |
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
706 |
self._print_stats() |
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
707 |
raise |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
708 |
for key, func, title, help in checks: |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
709 |
buckets = self._checks.get(key) |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
710 |
if buckets: |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
711 |
err = func(buckets) |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
712 |
if err: |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
713 |
self.errors[title] = (help, err) |
7171
4297be67bbe4
[dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7170
diff
changeset
|
714 |
try: |
4297be67bbe4
[dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7170
diff
changeset
|
715 |
txuuid = self.store.commit() |
4297be67bbe4
[dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7170
diff
changeset
|
716 |
if txuuid is not None: |
4297be67bbe4
[dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7170
diff
changeset
|
717 |
self.tell('Transaction committed (txuuid: %s)' % txuuid) |
8695
358d8bed9626
[toward-py3k] rewrite to "except AnException as exc:" (part of #2711624)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
8637
diff
changeset
|
718 |
except QueryError as ex: |
7171
4297be67bbe4
[dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7170
diff
changeset
|
719 |
self.tell('Transaction aborted: %s' % ex) |
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
720 |
self._print_stats() |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
721 |
if self.errors: |
4721
8f63691ccb7f
pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4613
diff
changeset
|
722 |
if self.askerror == 2 or (self.askerror and confirm('Display errors?')): |
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
723 |
from pprint import pformat |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
724 |
for errkey, error in self.errors.items(): |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
725 |
self.tell("\n%s (%s): %d\n" % (error[0], errkey, len(error[1]))) |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
726 |
self.tell(pformat(sorted(error[1]))) |
7171
4297be67bbe4
[dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
7170
diff
changeset
|
727 |
|
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
728 |
def _print_stats(self): |
8696
0bb18407c053
[toward py3k] rewrite dict.keys() and dict.values() (part of #2711624)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
8695
diff
changeset
|
729 |
nberrors = sum(len(err) for err in self.errors.itervalues()) |
4912
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
730 |
self.tell('\nImport statistics: %i entities, %i types, %i relations and %i errors' |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
731 |
% (self.store.nb_inserted_entities, |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
732 |
self.store.nb_inserted_types, |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
733 |
self.store.nb_inserted_relations, |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
734 |
nberrors)) |
9767cc516b4f
[R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4847
diff
changeset
|
735 |
|
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
736 |
def get_data(self, key): |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
737 |
return self.data.get(key) |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
738 |
|
4527
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
739 |
def index(self, name, key, value, unique=False): |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
740 |
"""create a new index |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
741 |
|
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
742 |
If unique is set to True, only the first occurrence of a value is kept; later duplicates are ignored |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
743 |
""" |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
744 |
if unique: |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
745 |
try: |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
746 |
if value in self.store.indexes[name][key]: |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
747 |
return |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
748 |
except KeyError: |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
749 |
# we're sure this one is the first occurrence, so continue... |
67ab70e98488
[R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents:
4252
diff
changeset
|
750 |
pass |
2974
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
751 |
self.store.indexes.setdefault(name, {}).setdefault(key, []).append(value) |
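The effect of the `unique` flag is easiest to see on a plain dictionary. The sketch below mirrors the logic of `index` as a free function; passing the `indexes` dict explicitly and the sample names are assumptions made for illustration.

```python
def index(indexes, name, key, value, unique=False):
    """Append ``value`` to ``indexes[name][key]``, skipping duplicates if unique."""
    if unique:
        try:
            if value in indexes[name][key]:
                return
        except KeyError:
            # No such index or key yet: this is the first occurrence.
            pass
    indexes.setdefault(name, {}).setdefault(key, []).append(value)


indexes = {}
index(indexes, 'person', 'smith', 1, unique=True)
index(indexes, 'person', 'smith', 1, unique=True)  # duplicate, ignored
index(indexes, 'person', 'smith', 2, unique=True)
index(indexes, 'person', 'smith', 2)               # unique not requested
```

Note that uniqueness is only enforced by calls that pass `unique=True`; the last call appends `2` a second time.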
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
752 |
|
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
753 |
def tell(self, msg): |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
754 |
self._tell(msg) |
3dfe497e5afa
F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff
changeset
|
755 |
|
4152
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
756 |
def iter_and_commit(self, datakey): |
30fd1229137d
new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4140
diff
changeset
|
757 |
"""iter rows, triggering commit every self.commitevery iterations""" |
6136
79da6f969b15
[dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
6133
diff
changeset
|
758 |
if self.commitevery is None: |
79da6f969b15
[dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
6133
diff
changeset
|
759 |
return self.get_data(datakey) |
79da6f969b15
[dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
6133
diff
changeset
|
760 |
else: |
6169
55378e1bab1b
fix order of parameters in call to callfunc_every
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
6136
diff
changeset
|
761 |
return callfunc_every(self.store.commit, |
55378e1bab1b
fix order of parameters in call to callfunc_every
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
6136
diff
changeset
|
762 |
self.commitevery, |
55378e1bab1b
fix order of parameters in call to callfunc_every
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents:
6136
diff
changeset
|
763 |
self.get_data(datakey)) |
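`iter_and_commit` relies on a helper that calls a function at regular intervals while iterating. The real `callfunc_every` is defined elsewhere in the module and is not shown here; the following `call_every` is a hypothetical sketch of the idea that only borrows the argument order visible at the call site (function, interval, iterable).

```python
def call_every(func, number, iterable):
    """Yield items from ``iterable``, calling ``func`` every ``number`` items."""
    count = 0
    for count, item in enumerate(iterable, 1):
        yield item
        if count % number == 0:
            func()
    if count % number:
        # Flush whatever is left after the last full batch.
        func()


commits = []
rows = list(call_every(lambda: commits.append('commit'), 2, range(5)))
```

With five rows and an interval of two, the callback runs after rows two and four, then once more for the trailing row. Whether the real helper performs such a final call is not visible in this excerpt.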
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
764 |
|
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
765 |
|
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
766 |
class NoHookRQLObjectStore(RQLObjectStore): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
767 |
"""ObjectStore that writes directly to the system source of an actual RQL repository, without hooks or security checks (production mode)""" |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
768 |
|
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
769 |
def __init__(self, cnx, metagen=None, baseurl=None): |
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
770 |
super(NoHookRQLObjectStore, self).__init__(cnx) |
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
771 |
self.source = cnx.repo.system_source |
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
772 |
self.rschema = cnx.repo.schema.rschema |
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
773 |
self.add_relation = self.source.add_relation |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
774 |
if metagen is None: |
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
775 |
metagen = MetaGenerator(cnx, baseurl) |
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
776 |
self.metagen = metagen |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
777 |
self._nb_inserted_entities = 0 |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
778 |
self._nb_inserted_types = 0 |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
779 |
self._nb_inserted_relations = 0 |
5054
cb066d29166a
fix dataimport for 3.7.2
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4913
diff
changeset
|
780 |
# deactivate security |
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
781 |
cnx.read_security = False |
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
782 |
cnx.write_security = False |
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
783 |
|
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
784 |
def create_entity(self, etype, **kwargs): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
785 |
for k, v in kwargs.iteritems(): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
786 |
kwargs[k] = getattr(v, 'eid', v) |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
787 |
entity, rels = self.metagen.base_etype_dicts(etype) |
7471
bf9443f8725f
[dataimport] fix #1732685: cached entity and shared cw_edited data with NoHookRQLObjectStore
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
7398
diff
changeset
|
788 |
# make a copy to keep cached entity pristine |
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
789 |
entity = copy(entity) |
7471
bf9443f8725f
[dataimport] fix #1732685: cached entity and shared cw_edited data with NoHookRQLObjectStore
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
7398
diff
changeset
|
790 |
entity.cw_edited = copy(entity.cw_edited) |
5557
1a534c596bff
[entity] continue cleanup of Entity/AnyEntity namespace
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
5424
diff
changeset
|
791 |
entity.cw_clear_relation_cache() |
6142
8bc6eac1fac1
[session] cleanup hook / operation / entity edition api
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
6122
diff
changeset
|
792 |
entity.cw_edited.update(kwargs, skipsec=False) |
10295
080ac14df6fa
[dataimport] make MetaDataGenerator / NoHookObjectStore usable from a datafeed parser
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10294
diff
changeset
|
793 |
entity_source, extid = self.metagen.init_entity(entity) |
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
794 |
cnx = self._cnx |
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
795 |
self.source.add_entity(cnx, entity) |
10295
080ac14df6fa
[dataimport] make MetaDataGenerator / NoHookObjectStore usable from a datafeed parser
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10294
diff
changeset
|
796 |
self.source.add_info(cnx, entity, entity_source, extid) |
8832
26cdfc6dd6f8
[dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
8807
diff
changeset
|
797 |
kwargs = dict() |
26cdfc6dd6f8
[dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
8807
diff
changeset
|
798 |
if inspect.getargspec(self.add_relation).keywords: |
8900
010a59e12d89
use cw_etype instead of __regid__
Pierre-Yves David <pierre-yves.david@logilab.fr>
parents:
8835
diff
changeset
|
799 |
kwargs['subjtype'] = entity.cw_etype |
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
800 |
for rtype, targeteids in rels.iteritems(): |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
801 |
# targeteids may be a single eid or a list of eids |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
802 |
inlined = self.rschema(rtype).inlined |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
803 |
try: |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
804 |
for targeteid in targeteids: |
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
805 |
self.add_relation(cnx, entity.eid, rtype, targeteid, |
8832
26cdfc6dd6f8
[dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
8807
diff
changeset
|
806 |
inlined, **kwargs) |
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
807 |
except TypeError: |
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
808 |
self.add_relation(cnx, entity.eid, rtype, targeteids, |
8832
26cdfc6dd6f8
[dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
8807
diff
changeset
|
809 |
inlined, **kwargs) |
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
810 |
self._nb_inserted_entities += 1 |
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
811 |
return entity |
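The `try` / `except TypeError` block in `create_entity` accepts either a single eid or a list of eids. A minimal sketch of that pattern follows, with a hypothetical `fake_add_relation` callback standing in for the source's `add_relation`.

```python
def add_relations(add_relation, subject_eid, rtype, targeteids):
    """Relate ``subject_eid`` to one target eid or to each eid in a list."""
    try:
        for targeteid in targeteids:
            add_relation(subject_eid, rtype, targeteid)
    except TypeError:
        # An int is not iterable: treat it as a single target eid.
        add_relation(subject_eid, rtype, targeteids)


added = []


def fake_add_relation(subj, rtype, obj):
    # Hypothetical callback that only records what it was asked to add.
    added.append((subj, rtype, obj))


add_relations(fake_add_relation, 1, 'created_by', 5)
add_relations(fake_add_relation, 1, 'in_group', [7, 8])
```

One caveat of this style is that a `TypeError` raised inside the callback itself is caught too and triggers the single-target branch; an explicit type check would avoid that at the cost of a little more code.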
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
812 |
|
8832
26cdfc6dd6f8
[dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
8807
diff
changeset
|
813 |
def relate(self, eid_from, rtype, eid_to, **kwargs): |
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
814 |
assert not rtype.startswith('reverse_') |
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
815 |
self.add_relation(self._cnx, eid_from, rtype, eid_to, |
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
816 |
self.rschema(rtype).inlined) |
9597
8e9db17ce129
[dataimport] Correctly call rschema(rtype) in SqlGenObjectStore, closes #3694139
Vincent Michel <vincent.michel@logilab.fr>
parents:
9536
diff
changeset
|
817 |
if self.rschema(rtype).symmetric: |
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
818 |
self.add_relation(self._cnx, eid_to, rtype, eid_from, |
9361
0542a85fe667
symmetric relations: replace bogus rql2sql translation by a hook
Aurelien Campeas <aurelien.campeas@logilab.fr>
parents:
9181
diff
changeset
|
819 |
self.rschema(rtype).inlined) |
4818
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
820 |
self._nb_inserted_relations += 1 |
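Since this store writes relations directly, `relate` has to insert the reverse row of a symmetric relation itself. The sketch below shows that behaviour with a plain list as storage; the `symmetric` flag replaces the schema lookup and all names are illustrative assumptions.

```python
def relate(add_relation, eid_from, rtype, eid_to, symmetric=False):
    """Add a relation, plus its mirror image when the relation is symmetric."""
    assert not rtype.startswith('reverse_')
    add_relation(eid_from, rtype, eid_to)
    if symmetric:
        # Nothing else will add the reverse row, so do it explicitly.
        add_relation(eid_to, rtype, eid_from)


stored = []
relate(lambda s, r, o: stored.append((s, r, o)), 1, 'friend_of', 2,
       symmetric=True)
```

As in the method above, a symmetric relation produces two stored rows for a single `relate` call.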
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
821 |
|
9f9bfbcdecfd
le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
822 |
@property |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
823 |
def nb_inserted_entities(self): |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
824 |
return self._nb_inserted_entities |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
825 |
@property |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
826 |
def nb_inserted_types(self): |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
827 |
return self._nb_inserted_types |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
828 |
@property |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
829 |
def nb_inserted_relations(self): |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
830 |
return self._nb_inserted_relations |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
831 |
|
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
832 |
|
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
833 |
class MetaGenerator(object): |
6427
c8a5ac2d1eaa
[schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
6173
diff
changeset
|
834 |
META_RELATIONS = (META_RTYPES |
c8a5ac2d1eaa
[schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
6173
diff
changeset
|
835 |
- VIRTUAL_RTYPES |
c8a5ac2d1eaa
[schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
6173
diff
changeset
|
836 |
- set(('eid', 'cwuri', |
c8a5ac2d1eaa
[schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
6173
diff
changeset
|
837 |
'is', 'is_instance_of', 'cw_source'))) |
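The `META_RELATIONS` expression above is plain set arithmetic: start from all metadata relation types, drop the virtual ones, then drop those the store handles itself. A minimal sketch follows; the contents of `META_RTYPES` and `VIRTUAL_RTYPES` below are illustrative assumptions, since the real sets are defined elsewhere in CubicWeb and are not shown in this listing.

```python
# Illustrative values only; the real sets are imported from CubicWeb.
META_RTYPES = {'owned_by', 'created_by', 'is', 'is_instance_of', 'identity',
               'eid', 'creation_date', 'modification_date', 'cw_source',
               'has_text', 'cwuri'}
VIRTUAL_RTYPES = {'eid', 'identity', 'has_text'}

# Same shape as the class attribute: two successive set differences.
META_RELATIONS = (META_RTYPES
                  - VIRTUAL_RTYPES
                  - {'eid', 'cwuri', 'is', 'is_instance_of', 'cw_source'})
```

Under these assumed inputs, only the relation types for which the generator has to compute a value per entity type remain.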
c8a5ac2d1eaa
[schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
6173
diff
changeset
|
838 |
|
10295
080ac14df6fa
[dataimport] make MetaDataGenerator / NoHookObjectStore usable from a datafeed parser
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10294
diff
changeset
|
839 |
def __init__(self, cnx, baseurl=None, source=None): |
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
840 |
self._cnx = cnx |
4818
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
841 |
if baseurl is None: |
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
842 |
config = cnx.vreg.config |
4818
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
843 |
baseurl = config['base-url'] or config.default_base_url() |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
844 |
if not baseurl.endswith('/'): |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
845 |
baseurl += '/' |
10295
080ac14df6fa
[dataimport] make MetaDataGenerator / NoHookObjectStore usable from a datafeed parser
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10294
diff
changeset
|
846 |
self.baseurl = baseurl |
080ac14df6fa
[dataimport] make MetaDataGenerator / NoHookObjectStore usable from a datafeed parser
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10294
diff
changeset
|
847 |
if source is None: |
080ac14df6fa
[dataimport] make MetaDataGenerator / NoHookObjectStore usable from a datafeed parser
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10294
diff
changeset
|
848 |
source = cnx.repo.system_source |
080ac14df6fa
[dataimport] make MetaDataGenerator / NoHookObjectStore usable from a datafeed parser
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10294
diff
changeset
|
849 |
self.source = source |
080ac14df6fa
[dataimport] make MetaDataGenerator / NoHookObjectStore usable from a datafeed parser
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10294
diff
changeset
|
850 |
self.create_eid = cnx.repo.system_source.create_eid |
080ac14df6fa
[dataimport] make MetaDataGenerator / NoHookObjectStore usable from a datafeed parser
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10294
diff
changeset
|
851 |
self.time = datetime.now() |
4818
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
852 |
# attributes/relations shared by all entities of the same type |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
853 |
self.etype_attrs = [] |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
854 |
self.etype_rels = [] |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
855 |
# attributes/relations specific to each entity |
5054
cb066d29166a
fix dataimport for 3.7.2
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4913
diff
changeset
|
856 |
self.entity_attrs = ['cwuri'] |
4818
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
857 |
#self.entity_rels = [] XXX not handled (YAGNI?) |
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
858 |
schema = cnx.vreg.schema |
4818
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
859 |
rschema = schema.rschema |
6427
c8a5ac2d1eaa
[schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
6173
diff
changeset
|
860 |
for rtype in self.META_RELATIONS: |
10286
0f8c3ac88f1e
[dataimport] don't insert created_by / owned_by relations when user is the internal manager (eid = -1).
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10272
diff
changeset
|
861 |
# skip owned_by / created_by if user is the internal manager |
0f8c3ac88f1e
[dataimport] don't insert created_by / owned_by relations when user is the internal manager (eid = -1).
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10272
diff
changeset
|
862 |
if cnx.user.eid == -1 and rtype in ('owned_by', 'created_by'): |
0f8c3ac88f1e
[dataimport] don't insert created_by / owned_by relations when user is the internal manager (eid = -1).
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10272
diff
changeset
|
863 |
continue |
4818
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
864 |
if rschema(rtype).final: |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
865 |
self.etype_attrs.append(rtype) |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
866 |
else: |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
867 |
self.etype_rels.append(rtype) |
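The constructor above does two small things that are easy to test on their own: it makes sure the base URL ends with a slash, and it partitions the metadata relation types into attributes (final relations) and relations, skipping `owned_by` / `created_by` for the internal manager (eid `-1`). The sketch below restates that logic with hypothetical helper names (`normalize_baseurl`, `split_meta_relations`) and a plain set, `final_rtypes`, standing in for `rschema(rtype).final`.

```python
def normalize_baseurl(baseurl):
    """Append a trailing slash when it is missing."""
    return baseurl if baseurl.endswith('/') else baseurl + '/'


def split_meta_relations(meta_relations, final_rtypes, user_eid):
    """Partition relation types into (attributes, relations)."""
    attrs, rels = [], []
    for rtype in sorted(meta_relations):
        # the internal manager (eid == -1) is not stored as owner/creator
        if user_eid == -1 and rtype in ('owned_by', 'created_by'):
            continue
        if rtype in final_rtypes:
            attrs.append(rtype)
        else:
            rels.append(rtype)
    return attrs, rels


meta = {'owned_by', 'created_by', 'creation_date', 'modification_date'}
final = {'creation_date', 'modification_date'}
as_manager = split_meta_relations(meta, final, user_eid=-1)
as_user = split_meta_relations(meta, final, user_eid=5)
```

Sorting is only there to make the illustration deterministic; the original iterates over the set directly.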
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
868 |
|
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
869 |
@cached |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
870 |
def base_etype_dicts(self, etype): |
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
871 |
entity = self._cnx.vreg['etypes'].etype_class(etype)(self._cnx) |
4818
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
872 |
# entities are shallow-copied; avoid sharing a dict between copies |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
873 |
del entity.cw_extra_kwargs |
6142
8bc6eac1fac1
[session] cleanup hook / operation / entity edition api
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
6122
diff
changeset
|
874 |
entity.cw_edited = EditedEntity(entity) |
4818
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
875 |
for attr in self.etype_attrs: |
9697
d96b5e72717c
[dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents:
9696
diff
changeset
|
876 |
genfunc = self.generate(attr) |
d96b5e72717c
[dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents:
9696
diff
changeset
|
877 |
if genfunc: |
d96b5e72717c
[dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents:
9696
diff
changeset
|
878 |
entity.cw_edited.edited_attribute(attr, genfunc(entity)) |
4818
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
879 |
rels = {} |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
880 |
for rel in self.etype_rels: |
9697
d96b5e72717c
[dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents:
9696
diff
changeset
|
881 |
genfunc = self.generate(rel) |
d96b5e72717c
[dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents:
9696
diff
changeset
|
882 |
if genfunc: |
d96b5e72717c
[dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents:
9696
diff
changeset
|
883 |
rels[rel] = genfunc(entity) |
4818
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
884 |
return entity, rels |
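`base_etype_dicts()` is cached per entity type, and its comment warns that the cached entity is later copied shallowly. The hazard behind that warning can be shown with the standard `copy` module; `Template` is a hypothetical stand-in for the pre-filled entity and is not a CubicWeb class.

```python
import copy


class Template:
    """Hypothetical stand-in for a cached, pre-filled entity."""

    def __init__(self):
        self.edited = {'creation_date': 'T0'}


base = Template()

# A shallow copy shares the inner dict with the cached template...
leaky = copy.copy(base)
leaky.edited['cwuri'] = 'http://example.org/1'
leaked = 'cwuri' in base.edited  # the template was modified through the copy

# ...so give each copy its own dict before filling per-entity values.
base.edited.pop('cwuri')
safe = copy.copy(base)
safe.edited = dict(base.edited)
safe.edited['cwuri'] = 'http://example.org/2'
```

This only illustrates why shared mutable state on a cached template is dangerous; how the original avoids it (deleting `cw_extra_kwargs`, assigning a fresh `EditedEntity`) is specific to CubicWeb.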
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
885 |
|
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
886 |
def init_entity(self, entity): |
10295
080ac14df6fa
[dataimport] make MetaDataGenerator / NoHookObjectStore usable from a datafeed parser
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10294
diff
changeset
|
887 |
entity.eid = self.create_eid(self._cnx) |
080ac14df6fa
[dataimport] make MetaDataGenerator / NoHookObjectStore usable from a datafeed parser
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10294
diff
changeset
|
888 |
extid = entity.cw_edited.get('cwuri') |
4818
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
889 |
for attr in self.entity_attrs: |
10294
277074659cad
[dataimport] don't generate metadata which are explicitly specified
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10286
diff
changeset
|
890 |
if attr in entity.cw_edited: |
277074659cad
[dataimport] don't generate metadata which are explicitly specified
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10286
diff
changeset
|
891 |
# already set, skip this attribute |
277074659cad
[dataimport] don't generate metadata which are explicitly specified
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10286
diff
changeset
|
892 |
continue |
9697
d96b5e72717c
[dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents:
9696
diff
changeset
|
893 |
genfunc = self.generate(attr) |
d96b5e72717c
[dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents:
9696
diff
changeset
|
894 |
if genfunc: |
d96b5e72717c
[dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents:
9696
diff
changeset
|
895 |
entity.cw_edited.edited_attribute(attr, genfunc(entity)) |
10295
080ac14df6fa
[dataimport] make MetaDataGenerator / NoHookObjectStore usable from a datafeed parser
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10294
diff
changeset
|
896 |
if isinstance(extid, unicode): |
080ac14df6fa
[dataimport] make MetaDataGenerator / NoHookObjectStore usable from a datafeed parser
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10294
diff
changeset
|
897 |
extid = extid.encode('utf-8') |
080ac14df6fa
[dataimport] make MetaDataGenerator / NoHookObjectStore usable from a datafeed parser
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10294
diff
changeset
|
898 |
return self.source, extid |
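The loop in `init_entity()` follows a "fill only what is missing" rule: an attribute that the caller already set is left alone, and a value is generated only when a generator exists for it. A minimal sketch of that rule, using a plain dict in place of `entity.cw_edited` and a hypothetical `generators` mapping in place of the `gen_*` methods:

```python
def fill_missing(edited, entity_attrs, generators):
    """Generate values only for attributes the caller did not set."""
    for attr in entity_attrs:
        if attr in edited:
            continue  # explicitly specified, keep the caller's value
        genfunc = generators.get(attr)
        if genfunc:
            edited[attr] = genfunc(edited)
    return edited


# Hypothetical generator: builds a URI from a base URL and the eid.
generators = {'cwuri': lambda edited: 'http://example.org/%s' % edited['eid']}

generated = fill_missing({'eid': 42}, ['cwuri'], generators)
explicit = fill_missing({'eid': 42, 'cwuri': 'urn:x:42'}, ['cwuri'], generators)
```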
4818
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
899 |
|
9697
d96b5e72717c
[dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents:
9696
diff
changeset
|
900 |
def generate(self, rtype): |
d96b5e72717c
[dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents:
9696
diff
changeset
|
901 |
return getattr(self, 'gen_%s' % rtype, None) |
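`generate()` is a name-based dispatch: it looks up a `gen_<rtype>` method and returns `None` when there is none, which is why every caller checks `if genfunc:` before calling it. The sketch below reproduces the pattern on a hypothetical `TinyGenerator` class with a fixed timestamp string instead of `datetime.now()`.

```python
class TinyGenerator:
    """Hypothetical generator using the same gen_<rtype> naming convention."""

    def __init__(self, timestamp):
        self.time = timestamp

    def generate(self, rtype):
        # returns the bound method, or None when no generator is defined
        return getattr(self, 'gen_%s' % rtype, None)

    def gen_creation_date(self, entity):
        return self.time

    def gen_modification_date(self, entity):
        return self.time


gen = TinyGenerator('2015-09-02T15:31:18')
values = {}
for rtype in ('creation_date', 'modification_date', 'no_such_rtype'):
    genfunc = gen.generate(rtype)
    if genfunc:  # unknown relation types are silently skipped
        values[rtype] = genfunc(None)
```

A subclass can then support an extra metadata relation simply by defining another `gen_<rtype>` method, without touching the dispatch code.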
4818
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
902 |
|
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
903 |
def gen_cwuri(self, entity): |
10294
277074659cad
[dataimport] don't generate metadata which are explicitly specified
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10286
diff
changeset
|
904 |
assert self.baseurl, 'baseurl is None while generating cwuri' |
9515
b0dd5b57d2d8
[dataimport, migration] more fixes in the spirit of a6c32edabc8d:
Dimitri Papadopoulos <dimitri.papadopoulos@cea.fr>
parents:
9440
diff
changeset
|
905 |
return u'%s%s' % (self.baseurl, entity.eid) |
4818
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
906 |
|
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
907 |
def gen_creation_date(self, entity): |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
908 |
return self.time |
9697
d96b5e72717c
[dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents:
9696
diff
changeset
|
909 |
|
4818
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
910 |
def gen_modification_date(self, entity): |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
911 |
return self.time |
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
912 |
|
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
913 |
def gen_created_by(self, entity): |
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
914 |
return self._cnx.user.eid |
9697
d96b5e72717c
[dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents:
9696
diff
changeset
|
915 |
|
4818
9f9bfbcdecfd
the massiveimport patch has been imported
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
4734
diff
changeset
|
916 |
def gen_owned_by(self, entity): |
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
917 |
return self._cnx.user.eid |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
918 |
|
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
919 |
|
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
920 |
########################################################################### |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
921 |
## SQL object store ####################################################### |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
922 |
########################################################################### |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
923 |
class SQLGenObjectStore(NoHookRQLObjectStore): |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
924 |
"""Controller of the data import process. This version is based |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
925 |
on direct insertions through SQL commands (COPY FROM or executemany). |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
926 |
|
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
927 |
>>> store = SQLGenObjectStore(cnx) |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
928 |
>>> store.create_entity('Person', ...) |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
929 |
>>> store.flush() |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
930 |
""" |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
931 |
|
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
932 |
def __init__(self, cnx, dump_output_dir=None, nb_threads_statement=3): |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
933 |
""" |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
934 |
Initialize a SQLGenObjectStore. |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
935 |
|
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
936 |
Parameters: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
937 |
|
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
938 |
- cnx: connection to the CubicWeb instance |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
939 |
- dump_output_dir: a directory to dump failed statements |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
940 |
for easier recovery. Default is None (no dump). |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
941 |
- nb_threads_statement: number of threads used |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
942 |
for SQL insertion (default is 3). |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
943 |
""" |
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
944 |
super(SQLGenObjectStore, self).__init__(cnx) |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
945 |
### hijack default source |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
946 |
self.source = SQLGenSourceWrapper( |
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
947 |
self.source, cnx.vreg.schema, |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
948 |
dump_output_dir=dump_output_dir, |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
949 |
nb_threads_statement=nb_threads_statement) |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
950 |
### XXX This is done in super().__init__(), but should be |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
951 |
### redone here to link to the correct source |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
952 |
self.add_relation = self.source.add_relation |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
953 |
self.indexes_etypes = {} |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
954 |
|
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
955 |
def flush(self): |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
956 |
"""Flush data to the database""" |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
957 |
self.source.flush() |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
958 |
|
8832
26cdfc6dd6f8
[dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
8807
diff
changeset
|
959 |
def relate(self, subj_eid, rtype, obj_eid, **kwargs): |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
960 |
if subj_eid is None or obj_eid is None: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
961 |
return |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
962 |
# XXX Could subjtype be inferred ? |
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
963 |
self.source.add_relation(self._cnx, subj_eid, rtype, obj_eid, |
8832
26cdfc6dd6f8
[dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
8807
diff
changeset
|
964 |
self.rschema(rtype).inlined, **kwargs) |
9597
8e9db17ce129
[dataimport] Correctly call rschema(rtype) in SqlGenObjectStore, closes #3694139
Vincent Michel <vincent.michel@logilab.fr>
parents:
9536
diff
changeset
|
965 |
if self.rschema(rtype).symmetric: |
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
966 |
self.source.add_relation(self._cnx, obj_eid, rtype, subj_eid, |
9361
0542a85fe667
symmetric relations: replace bogus rql2sql translation by a hook
Aurelien Campeas <aurelien.campeas@logilab.fr>
parents:
9181
diff
changeset
|
967 |
self.rschema(rtype).inlined, **kwargs) |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
968 |
|
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
969 |
def drop_indexes(self, etype): |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
970 |
"""Drop indexes for a given entity type""" |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
971 |
if etype not in self.indexes_etypes: |
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
972 |
cu = self._cnx.cnxset.cu |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
973 |
def index_to_attr(index): |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
974 |
"""turn an index name into a (database) attribute name""" |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
975 |
return index.replace(etype.lower(), '').replace('idx', '').strip('_') |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
976 |
indices = [(index, index_to_attr(index)) |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
977 |
for index in self.source.dbhelper.list_indices(cu, etype) |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
978 |
# Do not consider 'cw_etype_pkey' index |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
979 |
if not index.endswith('key')] |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
980 |
self.indexes_etypes[etype] = indices |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
981 |
for index, attr in self.indexes_etypes[etype]: |
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
982 |
self._cnx.system_sql('DROP INDEX %s' % index) |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
983 |
|
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
984 |
def create_indexes(self, etype): |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
985 |
"""Recreate indexes for a given entity type""" |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
986 |
for index, attr in self.indexes_etypes.get(etype, []): |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
987 |
sql = 'CREATE INDEX %s ON cw_%s(%s)' % (index, etype, attr) |
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
988 |
self._cnx.system_sql(sql) |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
989 |
|
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
990 |
|
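The pair `drop_indexes` / `create_indexes` above can be illustrated without a database. The following is only a sketch under stated assumptions: `plan_index_statements` is a hypothetical helper (not part of the module), `index_to_attr` is lifted out of its enclosing method and given an explicit `etype` argument, and the index names are invented for the example. It builds the same `DROP INDEX` / `CREATE INDEX` strings as the methods above instead of executing them.

```python
def index_to_attr(index, etype):
    """Turn an index name into a (database) attribute name."""
    return index.replace(etype.lower(), '').replace('idx', '').strip('_')


def plan_index_statements(etype, index_names):
    """Hypothetical helper: return (drops, creates) SQL lists for etype.

    Indexes whose name ends with 'key' (e.g. the primary key) are skipped,
    as in drop_indexes above.
    """
    indices = [(index, index_to_attr(index, etype))
               for index in index_names
               if not index.endswith('key')]
    drops = ['DROP INDEX %s' % index for index, _ in indices]
    creates = ['CREATE INDEX %s ON cw_%s(%s)' % (index, etype, attr)
               for index, attr in indices]
    return drops, creates


# Invented index names, for illustration only.
drops, creates = plan_index_statements(
    'Person', ['person_name_idx', 'cw_person_pkey'])
```

Note that the name-to-attribute conversion is plain substring replacement, so an attribute whose name itself contains the lowercased entity type or the string `idx` would be mangled.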
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
991 |
########################################################################### |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
992 |
## SQL Source ############################################################# |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
993 |
########################################################################### |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
994 |
|
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
995 |
class SQLGenSourceWrapper(object): |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
996 |
|
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
997 |
def __init__(self, system_source, schema, |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
998 |
dump_output_dir=None, nb_threads_statement=3): |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
999 |
self.system_source = system_source |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1000 |
self._sql = threading.local() |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1001 |
# Explicitly backport attributes from system source |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1002 |
self._storage_handler = self.system_source._storage_handler |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1003 |
self.preprocess_entity = self.system_source.preprocess_entity |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1004 |
self.sqlgen = self.system_source.sqlgen |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1005 |
self.uri = self.system_source.uri |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1006 |
self.eid = self.system_source.eid |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1007 |
# Directory to write temporary files |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1008 |
self.dump_output_dir = dump_output_dir |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1009 |
# Allow executing code with an SQLite backend that does |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1010 |
# not support (yet...) copy_from |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1011 |
# XXX Should be dealt with in logilab.database |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1012 |
spcfrom = system_source.dbhelper.dbapi_module.support_copy_from |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1013 |
self.support_copy_from = spcfrom |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1014 |
self.dbencoding = system_source.dbhelper.dbencoding |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1015 |
self.nb_threads_statement = nb_threads_statement |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1016 |
# initialize thread-local data for main thread |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1017 |
self.init_thread_locals() |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1018 |
self._inlined_rtypes_cache = {} |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1019 |
self._fill_inlined_rtypes_cache(schema) |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1020 |
self.schema = schema |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1021 |
self.do_fti = False |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1022 |
|
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1023 |
def _fill_inlined_rtypes_cache(self, schema): |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1024 |
cache = self._inlined_rtypes_cache |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1025 |
for eschema in schema.entities(): |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1026 |
for rschema in eschema.ordered_relations(): |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1027 |
if rschema.inlined: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1028 |
cache[eschema.type] = SQL_PREFIX + rschema.type |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1029 |
|
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1030 |
def init_thread_locals(self): |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1031 |
"""initializes thread-local data""" |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1032 |
self._sql.entities = defaultdict(list) |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1033 |
self._sql.relations = {} |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1034 |
self._sql.inlined_relations = {} |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1035 |
# keep track, for each eid, of the corresponding data dict |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1036 |
self._sql.eid_insertdicts = {} |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1037 |
|
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1038 |
def flush(self): |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1039 |
print 'starting flush' |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1040 |
_entities_sql = self._sql.entities |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1041 |
_relations_sql = self._sql.relations |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1042 |
_inlined_relations_sql = self._sql.inlined_relations |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1043 |
_insertdicts = self._sql.eid_insertdicts |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1044 |
try: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1045 |
# try, for each inlined_relation, to find if we're also creating |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1046 |
# the host entity (i.e. the subject of the relation). |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1047 |
# In that case, simply update the insert dict and remove |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1048 |
# the need to make the |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1049 |
# UPDATE statement |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1050 |
for statement, datalist in _inlined_relations_sql.iteritems(): |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1051 |
new_datalist = [] |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1052 |
# for a given inlined relation, |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1053 |
# browse each couple to be inserted |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1054 |
for data in datalist: |
8696
0bb18407c053
[toward py3k] rewrite dict.keys() and dict.values() (part of #2711624)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
8695
diff
changeset
|
1055 |
keys = list(data) |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1056 |
# For inlined relations, there are only two cases: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1057 |
# (rtype, cw_eid) or (cw_eid, rtype) |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1058 |
if keys[0] == 'cw_eid': |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1059 |
rtype = keys[1] |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1060 |
else: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1061 |
rtype = keys[0] |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1062 |
updated_eid = data['cw_eid'] |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1063 |
if updated_eid in _insertdicts: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1064 |
_insertdicts[updated_eid][rtype] = data[rtype] |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1065 |
else: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1066 |
# could not find corresponding insert dict, keep the |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1067 |
# UPDATE query |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1068 |
new_datalist.append(data) |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1069 |
_inlined_relations_sql[statement] = new_datalist |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1070 |
_import_statements(self.system_source.get_connection, |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1071 |
_entities_sql.items() |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1072 |
+ _relations_sql.items() |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1073 |
+ _inlined_relations_sql.items(), |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1074 |
dump_output_dir=self.dump_output_dir, |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1075 |
nb_threads=self.nb_threads_statement, |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1076 |
support_copy_from=self.support_copy_from, |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1077 |
encoding=self.dbencoding) |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1078 |
finally: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1079 |
_entities_sql.clear() |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1080 |
_relations_sql.clear() |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1081 |
_insertdicts.clear() |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1082 |
_inlined_relations_sql.clear() |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1083 |
|
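The merging step at the start of `flush` can be sketched in isolation. This is an illustrative rewrite under assumptions, not the module's code: `merge_inlined_updates` is a hypothetical function name, the statement string and column names are invented, and the relation column is found by excluding `'cw_eid'` rather than by inspecting key order. The idea is the same: when the subject entity is itself pending insertion, the inlined relation value is folded into its insert dict and no `UPDATE` is needed.

```python
def merge_inlined_updates(inlined_relations, insertdicts):
    """Fold pending inlined-relation updates into pending inserts.

    inlined_relations maps an UPDATE statement to a list of data dicts,
    each holding 'cw_eid' and exactly one relation column.
    insertdicts maps an eid to the data dict of its pending INSERT.
    Returns a new mapping with only the updates that are still required.
    """
    remaining = {}
    for statement, datalist in inlined_relations.items():
        kept = []
        for data in datalist:
            # The relation column is whichever key is not 'cw_eid'.
            rtype = next(key for key in data if key != 'cw_eid')
            eid = data['cw_eid']
            if eid in insertdicts:
                # Host entity is being created: extend its insert dict.
                insertdicts[eid][rtype] = data[rtype]
            else:
                # No pending insert for this eid: keep the UPDATE.
                kept.append(data)
        remaining[statement] = kept
    return remaining


# Invented data, for illustration only.
inserts = {1: {'cw_eid': 1, 'cw_name': 'a'}}
updates = {'UPDATE cw_Person ...': [{'cw_eid': 1, 'cw_works_for': 10},
                                    {'cw_eid': 2, 'cw_works_for': 11}]}
remaining = merge_inlined_updates(updates, inserts)
```

Here eid 1 has a pending insert, so its relation is merged there, while eid 2 still needs its `UPDATE`.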
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
1084 |
def add_relation(self, cnx, subject, rtype, object, |
8832
26cdfc6dd6f8
[dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
8807
diff
changeset
|
1085 |
inlined=False, **kwargs): |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1086 |
if inlined: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1087 |
_sql = self._sql.inlined_relations |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1088 |
data = {'cw_eid': subject, SQL_PREFIX + rtype: object} |
8832
26cdfc6dd6f8
[dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
8807
diff
changeset
|
1089 |
subjtype = kwargs.get('subjtype') |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1090 |
if subjtype is None: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1091 |
# Try to infer it |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1092 |
targets = [t.type for t in |
9425
d7e8293fa4de
[dataimport] The subjtype should be the subject of a relation, not the object, closes #3365113
Vincent Michel <vincent.michel@logilab.fr>
parents:
9181
diff
changeset
|
1093 |
self.schema.rschema(rtype).subjects()] |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1094 |
if len(targets) == 1: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1095 |
subjtype = targets[0] |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1096 |
else: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1097 |
raise ValueError('You should give the subject etype for ' |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1098 |
'inlined relation %s' |
8835
3612b760488b
[dataimport] Slight message modification in exception handling code.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
8834
diff
changeset
|
1099 |
', as it cannot be inferred: ' |
3612b760488b
[dataimport] Slight message modification in exception handling code.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
8834
diff
changeset
|
1100 |
'pass this type as the keyword argument ' |
3612b760488b
[dataimport] Slight message modification in exception handling code.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents:
8834
diff
changeset
|
1101 |
'``subjtype``' % rtype) |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1102 |
statement = self.sqlgen.update(SQL_PREFIX + subjtype, |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1103 |
data, ['cw_eid']) |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1104 |
else: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1105 |
_sql = self._sql.relations |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1106 |
data = {'eid_from': subject, 'eid_to': object} |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1107 |
statement = self.sqlgen.insert('%s_relation' % rtype, data) |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1108 |
if statement in _sql: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1109 |
_sql[statement].append(data) |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1110 |
else: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1111 |
_sql[statement] = [data] |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1112 |
|
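The relation-handling code above accumulates parameter dicts under the SQL statement text that will consume them, so that each distinct statement can later be run once over many rows. A minimal sketch of that grouping idea, outside CubicWeb: the function name `group_by_statement`, the table names, and the sample eids are all hypothetical, and `defaultdict` is swapped in for the explicit `if statement in _sql` test.

```python
from collections import defaultdict


def group_by_statement(pairs):
    """Collect parameter dicts under the SQL statement text they belong to."""
    batches = defaultdict(list)
    for statement, params in pairs:
        # Same effect as: append if the key exists, else start a new list.
        batches[statement].append(params)
    return dict(batches)


# Hypothetical statements, for illustration only.
INSERT_KNOWS = ("INSERT INTO knows_relation (eid_from, eid_to) "
                "VALUES (%(eid_from)s, %(eid_to)s)")
INSERT_WORKS = ("INSERT INTO works_for_relation (eid_from, eid_to) "
                "VALUES (%(eid_from)s, %(eid_to)s)")

pairs = [
    (INSERT_KNOWS, {"eid_from": 1, "eid_to": 2}),
    (INSERT_WORKS, {"eid_from": 1, "eid_to": 9}),
    (INSERT_KNOWS, {"eid_from": 2, "eid_to": 3}),
]
batches = group_by_statement(pairs)
```

Keying on the statement text means rows with the same column set end up in one batch, which is what makes a later bulk execution possible.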
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
1113 |
def add_entity(self, cnx, entity): |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1114 |
with self._storage_handler(entity, 'added'): |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1115 |
attrs = self.preprocess_entity(entity) |
8900
010a59e12d89
use cw_etype instead of __regid__
Pierre-Yves David <pierre-yves.david@logilab.fr>
parents:
8835
diff
changeset
|
1116 |
rtypes = self._inlined_rtypes_cache.get(entity.cw_etype, ()) |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1117 |
if isinstance(rtypes, str): |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1118 |
rtypes = (rtypes,) |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1119 |
for rtype in rtypes: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1120 |
if rtype not in attrs: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1121 |
attrs[rtype] = None |
8900
010a59e12d89
use cw_etype instead of __regid__
Pierre-Yves David <pierre-yves.david@logilab.fr>
parents:
8835
diff
changeset
|
1122 |
sql = self.sqlgen.insert(SQL_PREFIX + entity.cw_etype, attrs) |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1123 |
self._sql.eid_insertdicts[entity.eid] = attrs |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1124 |
self._append_to_entities(sql, attrs) |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1125 |
|
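`add_entity` above fills every inlined relation missing from the entity's attributes with `None` before generating the INSERT statement. One plausible reading is that this keeps the column set identical for all entities of a type, so they share a single statement. A small sketch of that padding step, assuming nothing about CubicWeb itself: `pad_inlined` is a hypothetical helper, and unlike the original it returns a copy instead of mutating `attrs`.

```python
def pad_inlined(attrs, inlined_rtypes):
    """Return a copy of attrs with every inlined relation present (None if unset)."""
    # A single relation name may be given as a bare string, as in the original.
    if isinstance(inlined_rtypes, str):
        inlined_rtypes = (inlined_rtypes,)
    padded = dict(attrs)
    for rtype in inlined_rtypes:
        padded.setdefault(rtype, None)
    return padded


# Hypothetical attribute dicts for two entities of the same type.
first = pad_inlined({"cw_name": "a"}, ("cw_in_state", "cw_owner"))
second = pad_inlined({"cw_name": "b", "cw_owner": 7}, ("cw_in_state", "cw_owner"))
```

Both results expose the same keys, so a statement generated from either one has the same text.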
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1126 |
def _append_to_entities(self, sql, attrs): |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1127 |
self._sql.entities[sql].append(attrs) |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1128 |
|
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
1129 |
def _handle_insert_entity_sql(self, cnx, sql, attrs): |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1130 |
# We have to overwrite the source given in parameters |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1131 |
# as here, we directly use the system source |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1132 |
attrs['asource'] = self.system_source.uri |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1133 |
self._append_to_entities(sql, attrs) |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1134 |
|
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
1135 |
def _handle_is_relation_sql(self, cnx, sql, attrs): |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1136 |
self._append_to_entities(sql, attrs) |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1137 |
|
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
1138 |
def _handle_is_instance_of_sql(self, cnx, sql, attrs): |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1139 |
self._append_to_entities(sql, attrs) |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1140 |
|
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
1141 |
def _handle_source_relation_sql(self, cnx, sql, attrs): |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1142 |
self._append_to_entities(sql, attrs) |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1143 |
|
9522
8154a5748194
[dataimport] fix comment
Aurelien Campeas <aurelien.campeas@logilab.fr>
parents:
9425
diff
changeset
|
1144 |
# add_info is _copypasted_ from the one in NativeSQLSource. We want it |
8154a5748194
[dataimport] fix comment
Aurelien Campeas <aurelien.campeas@logilab.fr>
parents:
9425
diff
changeset
|
1145 |
# here because it will use the _handlers of the SQLGenSourceWrapper, which |
8154a5748194
[dataimport] fix comment
Aurelien Campeas <aurelien.campeas@logilab.fr>
parents:
9425
diff
changeset
|
1146 |
# are not like the ones in the native source. |
10190
252e8f7ff9ea
[dataimport] source.add_info doesn't take anymore a 'complete' argument
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10189
diff
changeset
|
1147 |
def add_info(self, cnx, entity, source, extid): |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1148 |
"""add type and source info for an eid into the system table""" |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1149 |
# begin by inserting eid/type/asource/extid into the entities table |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1150 |
if extid is not None: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1151 |
assert isinstance(extid, str) |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1152 |
extid = b64encode(extid) |
8900
010a59e12d89
use cw_etype instead of __regid__
Pierre-Yves David <pierre-yves.david@logilab.fr>
parents:
8835
diff
changeset
|
1153 |
attrs = {'type': entity.cw_etype, 'eid': entity.eid, 'extid': extid, |
9827
c7ce035aede8
[dataimport] Drop reference to the 'source' column (closes #4067694).
Damien Garaud <damien.garaud@logilab.fr>
parents:
9770
diff
changeset
|
1154 |
'asource': source.uri} |
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
1155 |
self._handle_insert_entity_sql(cnx, self.sqlgen.insert('entities', attrs), attrs) |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1156 |
# insert core relations: is, is_instance_of and cw_source |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1157 |
try: |
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
1158 |
self._handle_is_relation_sql(cnx, 'INSERT INTO is_relation(eid_from,eid_to) VALUES (%s,%s)', |
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
1159 |
(entity.eid, eschema_eid(cnx, entity.e_schema))) |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1160 |
except IndexError: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1161 |
# during schema serialization, skip |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1162 |
pass |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1163 |
else: |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1164 |
for eschema in entity.e_schema.ancestors() + [entity.e_schema]: |
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
1165 |
self._handle_is_relation_sql(cnx, |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1166 |
'INSERT INTO is_instance_of_relation(eid_from,eid_to) VALUES (%s,%s)', |
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
1167 |
(entity.eid, eschema_eid(cnx, eschema))) |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1168 |
if 'CWSource' in self.schema and source.eid is not None: # else, cw < 3.10 |
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
1169 |
self._handle_is_relation_sql(cnx, 'INSERT INTO cw_source_relation(eid_from,eid_to) VALUES (%s,%s)', |
8625
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1170 |
(entity.eid, source.eid)) |
7ee0752178e5
[dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents:
8406
diff
changeset
|
1171 |
# now we can update the full text index |
8900
010a59e12d89
use cw_etype instead of __regid__
Pierre-Yves David <pierre-yves.david@logilab.fr>
parents:
8835
diff
changeset
|
1172 |
if self.do_fti and self.need_fti_indexation(entity.cw_etype): |
10189
0b141ffcdd74
[dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
10091
diff
changeset
|
1173 |
self.index_entity(cnx, entity=entity) |
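The core-relation part of `add_info` above can be summarised as: one `is` row for the entity's own type, one `is_instance_of` row per ancestor type plus the type itself, and one `cw_source` row when the source has an eid. A toy sketch of that row generation follows. The dictionaries `TOY_ANCESTORS` and `TOY_ETYPE_EIDS` and the function `core_relation_rows` are hypothetical stand-ins for the schema and `eschema_eid`; they are not CubicWeb APIs.

```python
# Toy schema: each entity type lists its ancestors and has a schema eid.
TOY_ANCESTORS = {"Person": [], "Employee": ["Person"]}
TOY_ETYPE_EIDS = {"Person": 10, "Employee": 11}


def core_relation_rows(eid, etype, source_eid=None):
    """Return (table, (eid_from, eid_to)) rows for is, is_instance_of, cw_source."""
    rows = [("is_relation", (eid, TOY_ETYPE_EIDS[etype]))]
    # Ancestors first, then the type itself, mirroring the loop in add_info.
    for name in TOY_ANCESTORS[etype] + [etype]:
        rows.append(("is_instance_of_relation", (eid, TOY_ETYPE_EIDS[name])))
    if source_eid is not None:
        rows.append(("cw_source_relation", (eid, source_eid)))
    return rows


rows = core_relation_rows(42, "Employee", source_eid=1)
```

In the store itself these rows are not executed immediately; they are appended to the per-statement batches.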