dataimport.py
author Julien Cristau <julien.cristau@logilab.fr>
Mon, 02 Feb 2015 12:07:10 +0100
changeset 10174 7e1c8fb9c407
parent 10091 09878c2f8621
child 10189 0b141ffcdd74
permissions -rw-r--r--
[devtools] restore i18n of string removed in 4001cfe2f44d Removing that string from the po files means that after running i18ncube against newer cubicweb, we lose the translation when running on older cubicweb's, which is probably not a good idea.
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
     1
# -*- coding: utf-8 -*-
10007
727bbb361ed1 remove 3.11 bw compat
Aurelien Campeas <aurelien.campeas@logilab.fr>
parents: 9911
diff changeset
     2
# copyright 2003-2014 LOGILAB S.A. (Paris, FRANCE), all rights reserved.
5421
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5097
diff changeset
     3
# contact http://www.logilab.fr/ -- mailto:contact@logilab.fr
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5097
diff changeset
     4
#
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5097
diff changeset
     5
# This file is part of CubicWeb.
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5097
diff changeset
     6
#
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5097
diff changeset
     7
# CubicWeb is free software: you can redistribute it and/or modify it under the
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5097
diff changeset
     8
# terms of the GNU Lesser General Public License as published by the Free
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5097
diff changeset
     9
# Software Foundation, either version 2.1 of the License, or (at your option)
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5097
diff changeset
    10
# any later version.
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5097
diff changeset
    11
#
5424
8ecbcbff9777 replace logilab-common by CubicWeb in disclaimer
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5421
diff changeset
    12
# CubicWeb is distributed in the hope that it will be useful, but WITHOUT
5421
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5097
diff changeset
    13
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5097
diff changeset
    14
# FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5097
diff changeset
    15
# details.
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5097
diff changeset
    16
#
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5097
diff changeset
    17
# You should have received a copy of the GNU Lesser General Public License along
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5097
diff changeset
    18
# with CubicWeb.  If not, see <http://www.gnu.org/licenses/>.
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    19
"""This module provides tools to import tabular data.
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    20
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    21
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    22
Example of use (run this with `cubicweb-ctl shell instance import-script.py`):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    23
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    24
.. sourcecode:: python
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    25
7158
0f31a50b144e [dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7118
diff changeset
    26
  from cubicweb.dataimport import *
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    27
  # define data generators
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    28
  GENERATORS = []
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    29
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    30
  USERS = [('Prenom', 'firstname', ()),
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    31
           ('Nom', 'surname', ()),
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    32
           ('Identifiant', 'login', ()),
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    33
           ]
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    34
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    35
  def gen_users(ctl):
6133
6f3eabbbdf2e use iter_and_commit in example
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 6122
diff changeset
    36
      for row in ctl.iter_and_commit('utilisateurs'):
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    37
          entity = mk_entity(row, USERS)
7158
0f31a50b144e [dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7118
diff changeset
    38
          entity['upassword'] = 'motdepasse'
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    39
          ctl.check('login', entity['login'], None)
7158
0f31a50b144e [dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7118
diff changeset
    40
          entity = ctl.store.create_entity('CWUser', **entity)
0f31a50b144e [dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7118
diff changeset
    41
          email = ctl.store.create_entity('EmailAddress', address=row['email'])
0f31a50b144e [dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7118
diff changeset
    42
          ctl.store.relate(entity.eid, 'use_email', email.eid)
3003
2944ee420dca R [dataimport] rename uid to eid
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 2974
diff changeset
    43
          ctl.store.rql('SET U in_group G WHERE G name "users", U eid %(x)s', {'x':entity['eid']})
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    44
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    45
  CHK = [('login', check_doubles, 'Utilisateurs Login',
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    46
          'Deux utilisateurs ne devraient pas avoir le même login.'),
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    47
         ]
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    48
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    49
  GENERATORS.append( (gen_users, CHK) )
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    50
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    51
  # create controller
9906
b2919eca7514 [dataimport] remove _rql heresy
Julien Cristau <julien.cristau@logilab.fr>
parents: 9905
diff changeset
    52
  ctl = CWImportController(RQLObjectStore(cnx))
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
    53
  ctl.askerror = 1
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    54
  ctl.generators = GENERATORS
6122
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
    55
  ctl.data['utilisateurs'] = lazytable(ucsvreader(open('users.csv')))
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    56
  # run
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    57
  ctl.run()
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    58
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
    59
.. BUG file with one column are not parsable
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
    60
.. TODO rollback() invocation is not possible yet
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    61
"""
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    62
__docformat__ = "restructuredtext en"
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    63
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
    64
import csv
4186
ca7e526b07b6 import cleanup, check data file exists
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4173
diff changeset
    65
import sys
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
    66
import threading
4186
ca7e526b07b6 import cleanup, check data file exists
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4173
diff changeset
    67
import traceback
8926
336e4971dc50 [dataimport] backout 6947201033be (related to #2788402)
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8900
diff changeset
    68
import warnings
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
    69
import cPickle
4186
ca7e526b07b6 import cleanup, check data file exists
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4173
diff changeset
    70
import os.path as osp
8832
26cdfc6dd6f8 [dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8807
diff changeset
    71
import inspect
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
    72
from collections import defaultdict
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
    73
from copy import copy
9901
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
    74
from datetime import date, datetime, time
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
    75
from time import asctime
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
    76
from StringIO import StringIO
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    77
7159
3bcccd3ab6b6 [dataimport] ObjectStore.create_entity should return something that looks like an entity (eg no more using dict protocol to access to attributes)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7158
diff changeset
    78
from logilab.common import shellutils, attrdict
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
    79
from logilab.common.date import strptime
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
    80
from logilab.common.decorators import cached
4136
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
    81
from logilab.common.deprecation import deprecated
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    82
7171
4297be67bbe4 [dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7170
diff changeset
    83
from cubicweb import QueryError
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
    84
from cubicweb.utils import make_uid
7158
0f31a50b144e [dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7118
diff changeset
    85
from cubicweb.schema import META_RTYPES, VIRTUAL_RTYPES
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
    86
from cubicweb.server.edition import EditedEntity
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
    87
from cubicweb.server.sqlutils import SQL_PREFIX
5066
bf5cbc351e99 [repo] move eschema_eid function from hooks.metadata to server.utils
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5063
diff changeset
    88
from cubicweb.server.utils import eschema_eid
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
    89
7158
0f31a50b144e [dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7118
diff changeset
    90
6122
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
    91
def count_lines(stream_or_filename):
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
    92
    if isinstance(stream_or_filename, basestring):
6492
47a284c0d012 fix some pylint detected errors
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6427
diff changeset
    93
        f = open(stream_or_filename)
6122
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
    94
    else:
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
    95
        f = stream_or_filename
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
    96
        f.seek(0)
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
    97
    for i, line in enumerate(f):
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
    98
        pass
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
    99
    f.seek(0)
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
   100
    return i+1
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
   101
10091
09878c2f8621 [dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents: 10078
diff changeset
   102
def ucsvreader_pb(stream_or_path, encoding='utf-8', delimiter=',', quotechar='"',
09878c2f8621 [dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents: 10078
diff changeset
   103
                  skipfirst=False, withpb=True, skip_empty=True, separator=None,
09878c2f8621 [dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents: 10078
diff changeset
   104
                  quote=None):
9181
2eac0aa1d3f6 [dataimport] ucsvreader should skip empty lines unless specified otherwise. Closes #3035944
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8970
diff changeset
   105
    """same as :func:`ucsvreader` but a progress bar is displayed as we iter on rows"""
10091
09878c2f8621 [dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents: 10078
diff changeset
   106
    if separator is not None:
09878c2f8621 [dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents: 10078
diff changeset
   107
        delimiter = separator
09878c2f8621 [dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents: 10078
diff changeset
   108
        warnings.warn("[3.20] 'separator' kwarg is deprecated, use 'delimiter' instead")
09878c2f8621 [dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents: 10078
diff changeset
   109
    if quote is not None:
09878c2f8621 [dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents: 10078
diff changeset
   110
        quotechar = quote
09878c2f8621 [dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents: 10078
diff changeset
   111
        warnings.warn("[3.20] 'quote' kwarg is deprecated, use 'quotechar' instead")
6122
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
   112
    if isinstance(stream_or_path, basestring):
6492
47a284c0d012 fix some pylint detected errors
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6427
diff changeset
   113
        if not osp.exists(stream_or_path):
47a284c0d012 fix some pylint detected errors
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6427
diff changeset
   114
            raise Exception("file doesn't exists: %s" % stream_or_path)
6122
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
   115
        stream = open(stream_or_path)
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
   116
    else:
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
   117
        stream = stream_or_path
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
   118
    rowcount = count_lines(stream)
4136
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   119
    if skipfirst:
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   120
        rowcount -= 1
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   121
    if withpb:
4140
46ddd27a4ca4 tweaks output
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4136
diff changeset
   122
        pb = shellutils.ProgressBar(rowcount, 50)
10091
09878c2f8621 [dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents: 10078
diff changeset
   123
    for urow in ucsvreader(stream, encoding, delimiter, quotechar,
9181
2eac0aa1d3f6 [dataimport] ucsvreader should skip empty lines unless specified otherwise. Closes #3035944
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8970
diff changeset
   124
                           skipfirst=skipfirst, skip_empty=skip_empty):
4136
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   125
        yield urow
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   126
        if withpb:
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   127
            pb.update()
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   128
    print ' %s rows imported' % rowcount
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   129
10091
09878c2f8621 [dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents: 10078
diff changeset
   130
def ucsvreader(stream, encoding='utf-8', delimiter=',', quotechar='"',
09878c2f8621 [dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents: 10078
diff changeset
   131
               skipfirst=False, ignore_errors=False, skip_empty=True,
09878c2f8621 [dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents: 10078
diff changeset
   132
               separator=None, quote=None):
4136
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   133
    """A csv reader that accepts files with any encoding and outputs unicode
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   134
    strings
9181
2eac0aa1d3f6 [dataimport] ucsvreader should skip empty lines unless specified otherwise. Closes #3035944
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8970
diff changeset
   135
2eac0aa1d3f6 [dataimport] ucsvreader should skip empty lines unless specified otherwise. Closes #3035944
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8970
diff changeset
   136
    if skip_empty (the default), lines without any values specified (only
2eac0aa1d3f6 [dataimport] ucsvreader should skip empty lines unless specified otherwise. Closes #3035944
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8970
diff changeset
   137
    separators) will be skipped. This is useful for Excel exports which may be
2eac0aa1d3f6 [dataimport] ucsvreader should skip empty lines unless specified otherwise. Closes #3035944
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8970
diff changeset
   138
    full of such lines.
4136
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   139
    """
10091
09878c2f8621 [dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents: 10078
diff changeset
   140
    if separator is not None:
09878c2f8621 [dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents: 10078
diff changeset
   141
        delimiter = separator
09878c2f8621 [dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents: 10078
diff changeset
   142
        warnings.warn("[3.20] 'separator' kwarg is deprecated, use 'delimiter' instead")
09878c2f8621 [dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents: 10078
diff changeset
   143
    if quote is not None:
09878c2f8621 [dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents: 10078
diff changeset
   144
        quotechar = quote
09878c2f8621 [dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents: 10078
diff changeset
   145
        warnings.warn("[3.20] 'quote' kwarg is deprecated, use 'quotechar' instead")
09878c2f8621 [dataimport] Have ucsvreader's API match that of csv.reader (closes #3705701)
Rémi Cardona <remi.cardona@logilab.fr>
parents: 10078
diff changeset
   146
    it = iter(csv.reader(stream, delimiter=delimiter, quotechar=quotechar))
8637
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   147
    if not ignore_errors:
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   148
        if skipfirst:
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   149
            it.next()
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   150
        for row in it:
9181
2eac0aa1d3f6 [dataimport] ucsvreader should skip empty lines unless specified otherwise. Closes #3035944
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8970
diff changeset
   151
            decoded = [item.decode(encoding) for item in row]
2eac0aa1d3f6 [dataimport] ucsvreader should skip empty lines unless specified otherwise. Closes #3035944
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8970
diff changeset
   152
            if not skip_empty or any(decoded):
9694
c90107199dea [dataimport] Avoid double unicode decoding in ucsvreader (closes #3705752)
Rémi Cardona <remi.cardona@logilab.fr>
parents: 9597
diff changeset
   153
                yield decoded
8637
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   154
    else:
9695
aa982b7c3f2a [dataimport] Prevent ucsvreader from skipping the first line when ignore_errors is True (closes #3705791)
Rémi Cardona <remi.cardona@logilab.fr>
parents: 9694
diff changeset
   155
        if skipfirst:
aa982b7c3f2a [dataimport] Prevent ucsvreader from skipping the first line when ignore_errors is True (closes #3705791)
Rémi Cardona <remi.cardona@logilab.fr>
parents: 9694
diff changeset
   156
            try:
aa982b7c3f2a [dataimport] Prevent ucsvreader from skipping the first line when ignore_errors is True (closes #3705791)
Rémi Cardona <remi.cardona@logilab.fr>
parents: 9694
diff changeset
   157
                row = it.next()
aa982b7c3f2a [dataimport] Prevent ucsvreader from skipping the first line when ignore_errors is True (closes #3705791)
Rémi Cardona <remi.cardona@logilab.fr>
parents: 9694
diff changeset
   158
            except csv.Error:
aa982b7c3f2a [dataimport] Prevent ucsvreader from skipping the first line when ignore_errors is True (closes #3705791)
Rémi Cardona <remi.cardona@logilab.fr>
parents: 9694
diff changeset
   159
                pass
8637
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   160
        # Safe version, that can cope with error in CSV file
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   161
        while True:
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   162
            try:
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   163
                row = it.next()
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   164
            # End of CSV, break
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   165
            except StopIteration:
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   166
                break
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   167
            # Error in CSV, ignore line and continue
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   168
            except csv.Error:
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   169
                continue
9181
2eac0aa1d3f6 [dataimport] ucsvreader should skip empty lines unless specified otherwise. Closes #3035944
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8970
diff changeset
   170
            decoded = [item.decode(encoding) for item in row]
2eac0aa1d3f6 [dataimport] ucsvreader should skip empty lines unless specified otherwise. Closes #3035944
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8970
diff changeset
   171
            if not skip_empty or any(decoded):
2eac0aa1d3f6 [dataimport] ucsvreader should skip empty lines unless specified otherwise. Closes #3035944
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8970
diff changeset
   172
                yield decoded
2eac0aa1d3f6 [dataimport] ucsvreader should skip empty lines unless specified otherwise. Closes #3035944
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8970
diff changeset
   173
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   174
6136
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   175
def callfunc_every(func, number, iterable):
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   176
    """yield items of `iterable` one by one and call function `func`
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   177
    every `number` iterations. Always call function `func` at the end.
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   178
    """
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   179
    for idx, item in enumerate(iterable):
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   180
        yield item
7227
23d9c1f89c96 [dataimport] actually commit every desired number...
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7214
diff changeset
   181
        if not idx % number:
6136
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   182
            func()
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   183
    func()
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   184
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   185
def lazytable(reader):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   186
    """The first row is taken to be the header of the table and
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   187
    used to output a dict for each row of data.
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   188
6122
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
   189
    >>> data = lazytable(ucsvreader(open(filename)))
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   190
    """
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   191
    header = reader.next()
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   192
    for row in reader:
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   193
        yield dict(zip(header, row))
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   194
7201
52f5831400b2 [dataimport] allow to specify columns on which result should be sorted in lazydbtable
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7171
diff changeset
   195
def lazydbtable(cu, table, headers, orderby=None):
7160
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   196
    """return an iterator on rows of a sql table. On each row, fetch columns
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   197
    defined in headers and return values as a dictionary.
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   198
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   199
    >>> data = lazydbtable(cu, 'experimentation', ('id', 'nickname', 'gps'))
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   200
    """
7201
52f5831400b2 [dataimport] allow to specify columns on which result should be sorted in lazydbtable
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7171
diff changeset
   201
    sql = 'SELECT %s FROM %s' % (','.join(headers), table,)
52f5831400b2 [dataimport] allow to specify columns on which result should be sorted in lazydbtable
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7171
diff changeset
   202
    if orderby:
52f5831400b2 [dataimport] allow to specify columns on which result should be sorted in lazydbtable
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7171
diff changeset
   203
        sql += ' ORDER BY %s' % ','.join(orderby)
52f5831400b2 [dataimport] allow to specify columns on which result should be sorted in lazydbtable
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7171
diff changeset
   204
    cu.execute(sql)
7160
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   205
    while True:
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   206
        row = cu.fetchone()
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   207
        if row is None:
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   208
            break
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   209
        yield dict(zip(headers, row))
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   210
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   211
def mk_entity(row, map):
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   212
    """Return a dict made from sanitized mapped values.
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   213
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   214
    ValueError can be raised on unexpected values found in checkers
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   215
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   216
    >>> row = {'myname': u'dupont'}
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   217
    >>> map = [('myname', u'name', (call_transform_method('title'),))]
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   218
    >>> mk_entity(row, map)
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   219
    {'name': u'Dupont'}
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   220
    >>> row = {'myname': u'dupont', 'optname': u''}
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   221
    >>> map = [('myname', u'name', (call_transform_method('title'),)),
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   222
    ...        ('optname', u'MARKER', (optional,))]
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   223
    >>> mk_entity(row, map)
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   224
    {'name': u'Dupont', 'optname': None}
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   225
    """
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   226
    res = {}
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   227
    assert isinstance(row, dict)
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   228
    assert isinstance(map, list)
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   229
    for src, dest, funcs in map:
8406
f3bc8ca0b715 [data import] don't crash if value isn't in the file, simply no key/value in the output dict. Closes #2356328
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8403
diff changeset
   230
        try:
f3bc8ca0b715 [data import] don't crash if value isn't in the file, simply no key/value in the output dict. Closes #2356328
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8403
diff changeset
   231
            res[dest] = row[src]
f3bc8ca0b715 [data import] don't crash if value isn't in the file, simply no key/value in the output dict. Closes #2356328
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8403
diff changeset
   232
        except KeyError:
f3bc8ca0b715 [data import] don't crash if value isn't in the file, simply no key/value in the output dict. Closes #2356328
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8403
diff changeset
   233
            continue
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   234
        try:
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   235
            for func in funcs:
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   236
                res[dest] = func(res[dest])
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   237
                if res[dest] is None:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   238
                    break
8695
358d8bed9626 [toward-py3k] rewrite to "except AnException as exc:" (part of #2711624)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 8637
diff changeset
   239
        except ValueError as err:
7170
32b5d9d43a7e [dataimport] propagate stack
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7160
diff changeset
   240
            raise ValueError('error with %r field: %s' % (src, err)), None, sys.exc_info()[-1]
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   241
    return res
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   242
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   243
# user interactions ############################################################
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   244
3029
bc573d5fb5b7 F [devtools] by default dataimport prints message on stdout
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 3003
diff changeset
   245
def tell(msg):
bc573d5fb5b7 F [devtools] by default dataimport prints message on stdout
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 3003
diff changeset
   246
    print msg
bc573d5fb5b7 F [devtools] by default dataimport prints message on stdout
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 3003
diff changeset
   247
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   248
def confirm(question):
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   249
    """A confirm function that asks for yes/no/abort and exits on abort."""
4721
8f63691ccb7f pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4613
diff changeset
   250
    answer = shellutils.ASK.ask(question, ('Y', 'n', 'abort'), 'Y')
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   251
    if answer == 'abort':
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   252
        sys.exit(1)
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   253
    return answer == 'Y'
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   254
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   255
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   256
class catch_error(object):
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   257
    """Helper for @contextmanager decorator."""
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   258
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   259
    def __init__(self, ctl, key='unexpected error', msg=None):
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   260
        self.ctl = ctl
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   261
        self.key = key
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   262
        self.msg = msg
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   263
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   264
    def __enter__(self):
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   265
        return self
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   266
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   267
    def __exit__(self, type, value, traceback):
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   268
        if type is not None:
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   269
            if issubclass(type, (KeyboardInterrupt, SystemExit)):
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   270
                return # re-raise
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   271
            if self.ctl.catcherrors:
4173
cfd5d3270f99 msg isn't defined there, but we've to give traceback information to record error
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4152
diff changeset
   272
                self.ctl.record_error(self.key, None, type, value, traceback)
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   273
                return True # silent
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   274
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   275
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   276
# base sanitizing/coercing functions ###########################################
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   277
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   278
def optional(value):
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   279
    """checker to filter optional field
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   280
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   281
    If value is undefined (ex: empty string), return None that will
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   282
    break the checkers validation chain
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   283
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   284
    General use is to add 'optional' check in first condition to avoid
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   285
    ValueError by further checkers
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   286
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   287
    >>> MAPPER = [(u'value', 'value', (optional, int))]
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   288
    >>> row = {'value': u'XXX'}
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   289
    >>> mk_entity(row, MAPPER)
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   290
    {'value': None}
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   291
    >>> row = {'value': u'100'}
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   292
    >>> mk_entity(row, MAPPER)
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   293
    {'value': 100}
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   294
    """
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   295
    if value:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   296
        return value
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   297
    return None
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   298
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   299
def required(value):
6136
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   300
    """raise ValueError if value is empty
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   301
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   302
    This check should be often found in last position in the chain.
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   303
    """
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   304
    if value:
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   305
        return value
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   306
    raise ValueError("required")
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   307
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   308
def todatetime(format='%d/%m/%Y'):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   309
    """return a transformation function to turn string input value into a
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   310
    `datetime.datetime` instance, using given format.
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   311
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   312
    Follow it by `todate` or `totime` functions from `logilab.common.date` if
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   313
    you want a `date`/`time` instance instead of `datetime`.
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   314
    """
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   315
    def coerce(value):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   316
        return strptime(value, format)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   317
    return coerce
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   318
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   319
def call_transform_method(methodname, *args, **kwargs):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   320
    """return value returned by calling the given method on input"""
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   321
    def coerce(value):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   322
        return getattr(value, methodname)(*args, **kwargs)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   323
    return coerce
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   324
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   325
def call_check_method(methodname, *args, **kwargs):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   326
    """check value returned by calling the given method on input is true,
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   327
    else raise ValueError
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   328
    """
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   329
    def check(value):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   330
        if getattr(value, methodname)(*args, **kwargs):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   331
            return value
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   332
        raise ValueError('%s not verified on %r' % (methodname, value))
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   333
    return check
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   334
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   335
# base integrity checking functions ############################################
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   336
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   337
def check_doubles(buckets):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   338
    """Extract the keys that have more than one item in their bucket."""
4721
8f63691ccb7f pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4613
diff changeset
   339
    return [(k, len(v)) for k, v in buckets.items() if len(v) > 1]
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   340
4136
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   341
def check_doubles_not_none(buckets):
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   342
    """Extract the keys that have more than one item in their bucket."""
4721
8f63691ccb7f pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4613
diff changeset
   343
    return [(k, len(v)) for k, v in buckets.items()
8f63691ccb7f pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4613
diff changeset
   344
            if k is not None and len(v) > 1]
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   345
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   346
# sql generator utility functions #############################################
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   347
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   348
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   349
def _import_statements(sql_connect, statements, nb_threads=3,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   350
                       dump_output_dir=None,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   351
                       support_copy_from=True, encoding='utf-8'):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   352
    """
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   353
    Import a bunch of sql statements, using different threads.
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   354
    """
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   355
    try:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   356
        chunksize = (len(statements) / nb_threads) + 1
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   357
        threads = []
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   358
        for i in xrange(nb_threads):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   359
            chunks = statements[i*chunksize:(i+1)*chunksize]
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   360
            thread = threading.Thread(target=_execmany_thread,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   361
                                      args=(sql_connect, chunks,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   362
                                            dump_output_dir,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   363
                                            support_copy_from,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   364
                                            encoding))
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   365
            thread.start()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   366
            threads.append(thread)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   367
        for t in threads:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   368
            t.join()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   369
    except Exception:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   370
        print 'Error in import statements'
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   371
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   372
def _execmany_thread_not_copy_from(cu, statement, data, table=None,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   373
                                   columns=None, encoding='utf-8'):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   374
    """ Execute thread without copy from
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   375
    """
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   376
    cu.executemany(statement, data)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   377
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   378
def _execmany_thread_copy_from(cu, statement, data, table,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   379
                               columns, encoding='utf-8'):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   380
    """ Execute thread with copy from
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   381
    """
10078
5eeffcfde1ba [dataimport] Fix use of _create_copyfrom_buffer() (related to #3845572)
Rémi Cardona <remi.cardona@logilab.fr>
parents: 10007
diff changeset
   382
    buf = _create_copyfrom_buffer(data, columns, encoding=encoding)
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   383
    if buf is None:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   384
        _execmany_thread_not_copy_from(cu, statement, data)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   385
    else:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   386
        if columns is None:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   387
            cu.copy_from(buf, table, null='NULL')
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   388
        else:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   389
            cu.copy_from(buf, table, null='NULL', columns=columns)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   390
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   391
def _execmany_thread(sql_connect, statements, dump_output_dir=None,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   392
                     support_copy_from=True, encoding='utf-8'):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   393
    """
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   394
    Execute sql statement. If 'INSERT INTO', try to use 'COPY FROM' command,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   395
    or fallback to execute_many.
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   396
    """
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   397
    if support_copy_from:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   398
        execmany_func = _execmany_thread_copy_from
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   399
    else:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   400
        execmany_func = _execmany_thread_not_copy_from
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   401
    cnx = sql_connect()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   402
    cu = cnx.cursor()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   403
    try:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   404
        for statement, data in statements:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   405
            table = None
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   406
            columns = None
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   407
            try:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   408
                if not statement.startswith('INSERT INTO'):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   409
                    cu.executemany(statement, data)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   410
                    continue
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   411
                table = statement.split()[2]
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   412
                if isinstance(data[0], (tuple, list)):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   413
                    columns = None
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   414
                else:
8696
0bb18407c053 [toward py3k] rewrite dict.keys() and dict.values() (part of #2711624)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 8695
diff changeset
   415
                    columns = list(data[0])
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   416
                execmany_func(cu, statement, data, table, columns, encoding)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   417
            except Exception:
8970
0a1bd0c590e2 [dataimport] minor typo in error handling
Dimitri Papadopoulos <dimitri.papadopoulos@cea.fr>
parents: 8930
diff changeset
   418
                print 'unable to copy data into table %s' % table
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   419
                # Error in import statement, save data in dump_output_dir
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   420
                if dump_output_dir is not None:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   421
                    pdata = {'data': data, 'statement': statement,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   422
                             'time': asctime(), 'columns': columns}
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   423
                    filename = make_uid()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   424
                    try:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   425
                        with open(osp.join(dump_output_dir,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   426
                                           '%s.pickle' % filename), 'w') as fobj:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   427
                            fobj.write(cPickle.dumps(pdata))
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   428
                    except IOError:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   429
                        print 'ERROR while pickling in', dump_output_dir, filename+'.pickle'
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   430
                        pass
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   431
                cnx.rollback()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   432
                raise
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   433
    finally:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   434
        cnx.commit()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   435
        cu.close()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   436
9898
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   437
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   438
def _copyfrom_buffer_convert_None(value, **opts):
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   439
    '''Convert None value to "NULL"'''
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   440
    return 'NULL'
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   441
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   442
def _copyfrom_buffer_convert_number(value, **opts):
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   443
    '''Convert a number into its string representation'''
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   444
    return str(value)
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   445
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   446
def _copyfrom_buffer_convert_string(value, **opts):
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   447
    '''Convert string value.
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   448
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   449
    Recognized keywords:
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   450
    :encoding: resulting string encoding (default: utf-8)
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   451
    :replace_sep: character used when input contains characters
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   452
                  that conflict with the column separator.
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   453
    '''
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   454
    encoding = opts.get('encoding','utf-8')
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   455
    replace_sep = opts.get('replace_sep', None)
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   456
    # Remove separators used in string formatting
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   457
    for _char in (u'\t', u'\r', u'\n'):
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   458
        if _char in value:
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   459
            # If a replace_sep is given, replace
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   460
            # the separator
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   461
            # (and thus avoid empty buffer)
9900
9c7de09a6648 [dataimport] _create_copyfrom_buffer: raise ValueError if conversion cannot be performed
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9899
diff changeset
   462
            if replace_sep is None:
9c7de09a6648 [dataimport] _create_copyfrom_buffer: raise ValueError if conversion cannot be performed
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9899
diff changeset
   463
                raise ValueError('conflicting separator: '
9c7de09a6648 [dataimport] _create_copyfrom_buffer: raise ValueError if conversion cannot be performed
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9899
diff changeset
   464
                                 'you must provide the replace_sep option')
9c7de09a6648 [dataimport] _create_copyfrom_buffer: raise ValueError if conversion cannot be performed
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9899
diff changeset
   465
            value = value.replace(_char, replace_sep)
9898
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   466
        value = value.replace('\\', r'\\')
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   467
    if isinstance(value, unicode):
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   468
        value = value.encode(encoding)
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   469
    return value
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   470
9901
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   471
def _copyfrom_buffer_convert_date(value, **opts):
9898
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   472
    '''Convert date into "YYYY-MM-DD"'''
9901
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   473
    # Do not use strftime, as it yields issue with date < 1900
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   474
    # (http://bugs.python.org/issue1777412)
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   475
    return '%04d-%02d-%02d' % (value.year, value.month, value.day)
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   476
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   477
def _copyfrom_buffer_convert_datetime(value, **opts):
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   478
    '''Convert date into "YYYY-MM-DD HH:MM:SS.UUUUUU"'''
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   479
    # Do not use strftime, as it yields issue with date < 1900
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   480
    # (http://bugs.python.org/issue1777412)
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   481
    return '%s %s' % (_copyfrom_buffer_convert_date(value, **opts),
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   482
                      _copyfrom_buffer_convert_time(value, **opts))
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   483
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   484
def _copyfrom_buffer_convert_time(value, **opts):
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   485
    '''Convert time into "HH:MM:SS.UUUUUU"'''
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   486
    return '%02d:%02d:%02d.%06d' % (value.hour, value.minute,
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   487
                                    value.second, value.microsecond)
9898
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   488
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   489
# (types, converter) list.
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   490
_COPYFROM_BUFFER_CONVERTERS = [
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   491
    (type(None), _copyfrom_buffer_convert_None),
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   492
    ((long, int, float), _copyfrom_buffer_convert_number),
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   493
    (basestring, _copyfrom_buffer_convert_string),
9901
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   494
    (datetime, _copyfrom_buffer_convert_datetime),
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   495
    (date, _copyfrom_buffer_convert_date),
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   496
    (time, _copyfrom_buffer_convert_time),
9898
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   497
]
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   498
9902
62c586f32f93 [dataimport] _create_copyfrom_buffer: do not ignore columns if data is a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9901
diff changeset
   499
def _create_copyfrom_buffer(data, columns=None, **convert_opts):
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   500
    """
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   501
    Create a StringIO buffer for 'COPY FROM' command.
9898
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   502
    Deals with Unicode, Int, Float, Date... (see ``converters``)
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   503
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   504
    :data: a sequence/dict of tuples
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   505
    :columns: list of columns to consider (default to all columns)
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   506
    :converter_opts: keyword arguements given to converters
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   507
    """
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   508
    # Create a list rather than directly create a StringIO
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   509
    # to correctly write lines separated by '\n' in a single step
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   510
    rows = []
9902
62c586f32f93 [dataimport] _create_copyfrom_buffer: do not ignore columns if data is a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9901
diff changeset
   511
    if columns is None:
62c586f32f93 [dataimport] _create_copyfrom_buffer: do not ignore columns if data is a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9901
diff changeset
   512
        if isinstance(data[0], (tuple, list)):
62c586f32f93 [dataimport] _create_copyfrom_buffer: do not ignore columns if data is a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9901
diff changeset
   513
            columns = range(len(data[0]))
62c586f32f93 [dataimport] _create_copyfrom_buffer: do not ignore columns if data is a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9901
diff changeset
   514
        elif isinstance(data[0], dict):
62c586f32f93 [dataimport] _create_copyfrom_buffer: do not ignore columns if data is a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9901
diff changeset
   515
            columns = data[0].keys()
62c586f32f93 [dataimport] _create_copyfrom_buffer: do not ignore columns if data is a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9901
diff changeset
   516
        else:
62c586f32f93 [dataimport] _create_copyfrom_buffer: do not ignore columns if data is a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9901
diff changeset
   517
            raise ValueError('Could not get columns: you must provide columns.')
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   518
    for row in data:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   519
        # Iterate over the different columns and the different values
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   520
        # and try to convert them to a correct datatype.
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   521
        # If an error is raised, do not continue.
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   522
        formatted_row = []
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   523
        for col in columns:
8926
336e4971dc50 [dataimport] backout 6947201033be (related to #2788402)
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8900
diff changeset
   524
            try:
8834
6947201033be [dataimport] Handle various data formats when creating buffers from data.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8833
diff changeset
   525
                value = row[col]
8926
336e4971dc50 [dataimport] backout 6947201033be (related to #2788402)
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8900
diff changeset
   526
            except KeyError:
9898
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   527
                warnings.warn(u"Column %s is not accessible in row %s"
8926
336e4971dc50 [dataimport] backout 6947201033be (related to #2788402)
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8900
diff changeset
   528
                              % (col, row), RuntimeWarning)
9898
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   529
                # XXX 'value' set to None so that the import does not end in
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   530
                # error.
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   531
                # Instead, the extra keys are set to NULL from the
8926
336e4971dc50 [dataimport] backout 6947201033be (related to #2788402)
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8900
diff changeset
   532
                # database point of view.
336e4971dc50 [dataimport] backout 6947201033be (related to #2788402)
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8900
diff changeset
   533
                value = None
9898
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   534
            for types, converter in _COPYFROM_BUFFER_CONVERTERS:
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   535
                if isinstance(value, types):
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   536
                    value = converter(value, **convert_opts)
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   537
                    break
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   538
            else:
9900
9c7de09a6648 [dataimport] _create_copyfrom_buffer: raise ValueError if conversion cannot be performed
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9899
diff changeset
   539
                raise ValueError("Unsupported value type %s" % type(value))
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   540
            # We push the value to the new formatted row
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   541
            # if the value is not None and could be converted to a string.
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   542
            formatted_row.append(value)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   543
        rows.append('\t'.join(formatted_row))
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   544
    return StringIO('\n'.join(rows))
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   545
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   546
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   547
# object stores #################################################################
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   548
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   549
class ObjectStore(object):
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   550
    """Store objects in memory for *faster* validation (development mode)
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   551
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   552
    But it will not enforce the constraints of the schema and hence will miss some problems
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   553
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   554
    >>> store = ObjectStore()
7158
0f31a50b144e [dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7118
diff changeset
   555
    >>> user = store.create_entity('CWUser', login=u'johndoe')
0f31a50b144e [dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7118
diff changeset
   556
    >>> group = store.create_entity('CWUser', name=u'unknown')
0f31a50b144e [dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7118
diff changeset
   557
    >>> store.relate(user.eid, 'in_group', group.eid)
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   558
    """
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   559
    def __init__(self):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   560
        self.items = []
3003
2944ee420dca R [dataimport] rename uid to eid
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 2974
diff changeset
   561
        self.eids = {}
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   562
        self.types = {}
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   563
        self.relations = set()
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   564
        self.indexes = {}
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   565
6990
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   566
    def create_entity(self, etype, **data):
7159
3bcccd3ab6b6 [dataimport] ObjectStore.create_entity should return something that looks like an entity (eg no more using dict protocol to access to attributes)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7158
diff changeset
   567
        data = attrdict(data)
9905
1fa35cc06c69 [dataimport] remove dead code
Julien Cristau <julien.cristau@logilab.fr>
parents: 9904
diff changeset
   568
        data['eid'] = eid = len(self.items)
1fa35cc06c69 [dataimport] remove dead code
Julien Cristau <julien.cristau@logilab.fr>
parents: 9904
diff changeset
   569
        self.items.append(data)
6990
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   570
        self.eids[eid] = data
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   571
        self.types.setdefault(etype, []).append(eid)
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   572
        return data
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   573
8832
26cdfc6dd6f8 [dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8807
diff changeset
   574
    def relate(self, eid_from, rtype, eid_to, **kwargs):
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   575
        """Add new relation"""
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   576
        relation = eid_from, rtype, eid_to
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   577
        self.relations.add(relation)
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   578
        return relation
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   579
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   580
    def commit(self):
9908
88bbb3abf30f [dataimport] Stop swallowing errors from commit/flush
Julien Cristau <julien.cristau@logilab.fr>
parents: 9907
diff changeset
   581
        """this commit method does nothing by default"""
88bbb3abf30f [dataimport] Stop swallowing errors from commit/flush
Julien Cristau <julien.cristau@logilab.fr>
parents: 9907
diff changeset
   582
        return
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   583
8833
39f81e2db2fc [dataimport] Add a ``flush`` method for all stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8832
diff changeset
   584
    def flush(self):
9910
55d9d483e7c3 [dataimport] don't commit on flush
Julien Cristau <julien.cristau@logilab.fr>
parents: 9908
diff changeset
   585
        """The method is provided so that all stores share a common API"""
55d9d483e7c3 [dataimport] don't commit on flush
Julien Cristau <julien.cristau@logilab.fr>
parents: 9908
diff changeset
   586
        pass
8833
39f81e2db2fc [dataimport] Add a ``flush`` method for all stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8832
diff changeset
   587
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   588
    @property
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   589
    def nb_inserted_entities(self):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   590
        return len(self.eids)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   591
    @property
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   592
    def nb_inserted_types(self):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   593
        return len(self.types)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   594
    @property
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   595
    def nb_inserted_relations(self):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   596
        return len(self.relations)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   597
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   598
class RQLObjectStore(ObjectStore):
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   599
    """ObjectStore that works with an actual RQL repository (production mode)"""
4136
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   600
9907
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   601
    def __init__(self, cnx, commit=None):
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   602
        if commit is not None:
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   603
            warnings.warn('[3.19] commit argument should not be specified '
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   604
                          'as the cnx object already provides it.',
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   605
                          DeprecationWarning, stacklevel=2)
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   606
        super(RQLObjectStore, self).__init__()
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   607
        self._cnx = cnx
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   608
        self._commit = commit or cnx.commit
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   609
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   610
    def commit(self):
9907
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   611
        return self._commit()
4136
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   612
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   613
    def rql(self, *args):
9907
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   614
        return self._cnx.execute(*args)
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   615
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   616
    @property
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   617
    def session(self):
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   618
        warnings.warn('[3.19] deprecated property.', DeprecationWarning,
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   619
                      stacklevel=2)
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   620
        return self._cnx.repo._get_session(self._cnx.sessionid)
4136
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   621
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   622
    def create_entity(self, *args, **kwargs):
9907
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   623
        entity = self._cnx.create_entity(*args, **kwargs)
4136
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   624
        self.eids[entity.eid] = entity
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   625
        self.types.setdefault(args[0], []).append(entity.eid)
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   626
        return entity
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   627
8832
26cdfc6dd6f8 [dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8807
diff changeset
   628
    def relate(self, eid_from, rtype, eid_to, **kwargs):
4721
8f63691ccb7f pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4613
diff changeset
   629
        eid_from, rtype, eid_to = super(RQLObjectStore, self).relate(
8832
26cdfc6dd6f8 [dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8807
diff changeset
   630
            eid_from, rtype, eid_to, **kwargs)
4136
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   631
        self.rql('SET X %s Y WHERE X eid %%(x)s, Y eid %%(y)s' % rtype,
7033
ddc1b4d80dbd [dataimport] remove eid_key deprecation warning and prints
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6990
diff changeset
   632
                 {'x': int(eid_from), 'y': int(eid_to)})
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   633
9907
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   634
    @deprecated("[3.19] use cnx.find(*args, **kwargs).entities() instead")
7116
dfd4680a23f0 [session] add find_entities and find_one_entity to session/request API (closes #1550045)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 7033
diff changeset
   635
    def find_entities(self, *args, **kwargs):
9907
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   636
        return self._cnx.find(*args, **kwargs).entities()
7116
dfd4680a23f0 [session] add find_entities and find_one_entity to session/request API (closes #1550045)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 7033
diff changeset
   637
9907
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   638
    @deprecated("[3.19] use cnx.find(*args, **kwargs).one() instead")
7116
dfd4680a23f0 [session] add find_entities and find_one_entity to session/request API (closes #1550045)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 7033
diff changeset
   639
    def find_one_entity(self, *args, **kwargs):
9907
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   640
        return self._cnx.find(*args, **kwargs).one()
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   641
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   642
# the import controller ########################################################
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   643
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   644
class CWImportController(object):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   645
    """Controller of the data import process.
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   646
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   647
    >>> ctl = CWImportController(store)
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   648
    >>> ctl.generators = list_of_data_generators
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   649
    >>> ctl.data = dict_of_data_tables
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   650
    >>> ctl.run()
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   651
    """
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   652
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   653
    def __init__(self, store, askerror=0, catcherrors=None, tell=tell,
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   654
                 commitevery=50):
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   655
        self.store = store
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   656
        self.generators = None
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   657
        self.data = {}
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   658
        self.errors = None
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   659
        self.askerror = askerror
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   660
        if  catcherrors is None:
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   661
            catcherrors = askerror
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   662
        self.catcherrors = catcherrors
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   663
        self.commitevery = commitevery # set to None to do a single commit
3029
bc573d5fb5b7 F [devtools] by default dataimport prints message on stdout
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 3003
diff changeset
   664
        self._tell = tell
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   665
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   666
    def check(self, type, key, value):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   667
        self._checks.setdefault(type, {}).setdefault(key, []).append(value)
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   668
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   669
    def check_map(self, entity, key, map, default):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   670
        try:
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   671
            entity[key] = map[entity[key]]
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   672
        except KeyError:
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   673
            self.check(key, entity[key], None)
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   674
            entity[key] = default
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   675
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   676
    def record_error(self, key, msg=None, type=None, value=None, tb=None):
4186
ca7e526b07b6 import cleanup, check data file exists
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4173
diff changeset
   677
        tmp = StringIO()
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   678
        if type is None:
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   679
            traceback.print_exc(file=tmp)
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   680
        else:
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   681
            traceback.print_exception(type, value, tb, file=tmp)
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   682
        # use a list to avoid counting a <nb lines> errors instead of one
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   683
        errorlog = self.errors.setdefault(key, [])
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   684
        if msg is None:
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   685
            errorlog.append(tmp.getvalue().splitlines())
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   686
        else:
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   687
            errorlog.append( (msg, tmp.getvalue().splitlines()) )
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   688
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   689
    def run(self):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   690
        self.errors = {}
7171
4297be67bbe4 [dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7170
diff changeset
   691
        if self.commitevery is None:
4297be67bbe4 [dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7170
diff changeset
   692
            self.tell('Will commit all or nothing.')
4297be67bbe4 [dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7170
diff changeset
   693
        else:
4297be67bbe4 [dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7170
diff changeset
   694
            self.tell('Will commit every %s iterations' % self.commitevery)
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   695
        for func, checks in self.generators:
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   696
            self._checks = {}
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   697
            func_name = func.__name__
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   698
            self.tell("Run import function '%s'..." % func_name)
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   699
            try:
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   700
                func(self)
7815
2a164a9cf81c [exceptions] stop catching any exception in various places (closes #1942716)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7471
diff changeset
   701
            except Exception:
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   702
                if self.catcherrors:
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   703
                    self.record_error(func_name, 'While calling %s' % func.__name__)
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   704
                else:
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   705
                    self._print_stats()
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   706
                    raise
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   707
            for key, func, title, help in checks:
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   708
                buckets = self._checks.get(key)
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   709
                if buckets:
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   710
                    err = func(buckets)
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   711
                    if err:
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   712
                        self.errors[title] = (help, err)
7171
4297be67bbe4 [dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7170
diff changeset
   713
        try:
4297be67bbe4 [dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7170
diff changeset
   714
            txuuid = self.store.commit()
4297be67bbe4 [dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7170
diff changeset
   715
            if txuuid is not None:
4297be67bbe4 [dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7170
diff changeset
   716
                self.tell('Transaction commited (txuuid: %s)' % txuuid)
8695
358d8bed9626 [toward-py3k] rewrite to "except AnException as exc:" (part of #2711624)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 8637
diff changeset
   717
        except QueryError as ex:
7171
4297be67bbe4 [dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7170
diff changeset
   718
            self.tell('Transaction aborted: %s' % ex)
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   719
        self._print_stats()
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   720
        if self.errors:
4721
8f63691ccb7f pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4613
diff changeset
   721
            if self.askerror == 2 or (self.askerror and confirm('Display errors ?')):
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   722
                from pprint import pformat
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   723
                for errkey, error in self.errors.items():
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   724
                    self.tell("\n%s (%s): %d\n" % (error[0], errkey, len(error[1])))
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   725
                    self.tell(pformat(sorted(error[1])))
7171
4297be67bbe4 [dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7170
diff changeset
   726
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   727
    def _print_stats(self):
8696
0bb18407c053 [toward py3k] rewrite dict.keys() and dict.values() (part of #2711624)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 8695
diff changeset
   728
        nberrors = sum(len(err) for err in self.errors.itervalues())
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   729
        self.tell('\nImport statistics: %i entities, %i types, %i relations and %i errors'
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   730
                  % (self.store.nb_inserted_entities,
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   731
                     self.store.nb_inserted_types,
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   732
                     self.store.nb_inserted_relations,
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   733
                     nberrors))
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   734
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   735
    def get_data(self, key):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   736
        return self.data.get(key)
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   737
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   738
    def index(self, name, key, value, unique=False):
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   739
        """create a new index
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   740
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   741
        If unique is set to True, only first occurence will be kept not the following ones
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   742
        """
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   743
        if unique:
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   744
            try:
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   745
                if value in self.store.indexes[name][key]:
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   746
                    return
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   747
            except KeyError:
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   748
                # we're sure that one is the first occurence; so continue...
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   749
                pass
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   750
        self.store.indexes.setdefault(name, {}).setdefault(key, []).append(value)
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   751
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   752
    def tell(self, msg):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   753
        self._tell(msg)
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   754
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   755
    def iter_and_commit(self, datakey):
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   756
        """iter rows, triggering commit every self.commitevery iterations"""
6136
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   757
        if self.commitevery is None:
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   758
            return self.get_data(datakey)
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   759
        else:
6169
55378e1bab1b fix order of parameters in call to callfunc_every
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 6136
diff changeset
   760
            return callfunc_every(self.store.commit,
55378e1bab1b fix order of parameters in call to callfunc_every
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 6136
diff changeset
   761
                                  self.commitevery,
55378e1bab1b fix order of parameters in call to callfunc_every
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 6136
diff changeset
   762
                                  self.get_data(datakey))
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   763
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   764
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   765
class NoHookRQLObjectStore(RQLObjectStore):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   766
    """ObjectStore that works with an actual RQL repository (production mode)"""
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   767
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   768
    def __init__(self, session, metagen=None, baseurl=None):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   769
        super(NoHookRQLObjectStore, self).__init__(session)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   770
        self.source = session.repo.system_source
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   771
        self.rschema = session.repo.schema.rschema
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   772
        self.add_relation = self.source.add_relation
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   773
        if metagen is None:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   774
            metagen = MetaGenerator(session, baseurl)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   775
        self.metagen = metagen
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   776
        self._nb_inserted_entities = 0
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   777
        self._nb_inserted_types = 0
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   778
        self._nb_inserted_relations = 0
5054
cb066d29166a fix dataimport for 3.7.2
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4913
diff changeset
   779
        # deactivate security
8807
d9aaad2c52e9 [session] drop useless getter and setter for security
Pierre-Yves David <pierre-yves.david@logilab.fr>
parents: 8797
diff changeset
   780
        session.read_security = False
d9aaad2c52e9 [session] drop useless getter and setter for security
Pierre-Yves David <pierre-yves.david@logilab.fr>
parents: 8797
diff changeset
   781
        session.write_security = False
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   782
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   783
    def create_entity(self, etype, **kwargs):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   784
        for k, v in kwargs.iteritems():
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   785
            kwargs[k] = getattr(v, 'eid', v)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   786
        entity, rels = self.metagen.base_etype_dicts(etype)
7471
bf9443f8725f [dataimport] fix #1732685: cached entity and shared cw_edited data with NoHookRQLObjectStore
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 7398
diff changeset
   787
        # make a copy to keep cached entity pristine
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   788
        entity = copy(entity)
7471
bf9443f8725f [dataimport] fix #1732685: cached entity and shared cw_edited data with NoHookRQLObjectStore
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 7398
diff changeset
   789
        entity.cw_edited = copy(entity.cw_edited)
5557
1a534c596bff [entity] continue cleanup of Entity/AnyEntity namespace
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5424
diff changeset
   790
        entity.cw_clear_relation_cache()
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   791
        self.metagen.init_entity(entity)
6142
8bc6eac1fac1 [session] cleanup hook / operation / entity edition api
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6122
diff changeset
   792
        entity.cw_edited.update(kwargs, skipsec=False)
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   793
        session = self.session
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   794
        self.source.add_entity(session, entity)
5054
cb066d29166a fix dataimport for 3.7.2
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4913
diff changeset
   795
        self.source.add_info(session, entity, self.source, None, complete=False)
8832
26cdfc6dd6f8 [dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8807
diff changeset
   796
        kwargs = dict()
26cdfc6dd6f8 [dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8807
diff changeset
   797
        if inspect.getargspec(self.add_relation).keywords:
8900
010a59e12d89 use cw_etype instead of __regid__
Pierre-Yves David <pierre-yves.david@logilab.fr>
parents: 8835
diff changeset
   798
            kwargs['subjtype'] = entity.cw_etype
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   799
        for rtype, targeteids in rels.iteritems():
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   800
            # targeteids may be a single eid or a list of eids
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   801
            inlined = self.rschema(rtype).inlined
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   802
            try:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   803
                for targeteid in targeteids:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   804
                    self.add_relation(session, entity.eid, rtype, targeteid,
8832
26cdfc6dd6f8 [dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8807
diff changeset
   805
                                      inlined, **kwargs)
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   806
            except TypeError:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   807
                self.add_relation(session, entity.eid, rtype, targeteids,
8832
26cdfc6dd6f8 [dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8807
diff changeset
   808
                                  inlined, **kwargs)
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   809
        self._nb_inserted_entities += 1
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   810
        return entity
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   811
8832
26cdfc6dd6f8 [dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8807
diff changeset
   812
    def relate(self, eid_from, rtype, eid_to, **kwargs):
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   813
        assert not rtype.startswith('reverse_')
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   814
        self.add_relation(self.session, eid_from, rtype, eid_to,
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   815
                          self.rschema(rtype).inlined)
9597
8e9db17ce129 [dataimport] Correctly call rschema(rtype) in SqlGenObjectStore, closes #3694139
Vincent Michel <vincent.michel@logilab.fr>
parents: 9536
diff changeset
   816
        if self.rschema(rtype).symmetric:
9361
0542a85fe667 symmetric relations: replace bogus rql2sql translation by a hook
Aurelien Campeas <aurelien.campeas@logilab.fr>
parents: 9181
diff changeset
   817
            self.add_relation(self.session, eid_to, rtype, eid_from,
0542a85fe667 symmetric relations: replace bogus rql2sql translation by a hook
Aurelien Campeas <aurelien.campeas@logilab.fr>
parents: 9181
diff changeset
   818
                              self.rschema(rtype).inlined)
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   819
        self._nb_inserted_relations += 1
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   820
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   821
    @property
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   822
    def nb_inserted_entities(self):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   823
        return self._nb_inserted_entities
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   824
    @property
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   825
    def nb_inserted_types(self):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   826
        return self._nb_inserted_types
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   827
    @property
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   828
    def nb_inserted_relations(self):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   829
        return self._nb_inserted_relations
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   830
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   831
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   832
class MetaGenerator(object):
6427
c8a5ac2d1eaa [schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6173
diff changeset
   833
    META_RELATIONS = (META_RTYPES
c8a5ac2d1eaa [schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6173
diff changeset
   834
                      - VIRTUAL_RTYPES
c8a5ac2d1eaa [schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6173
diff changeset
   835
                      - set(('eid', 'cwuri',
c8a5ac2d1eaa [schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6173
diff changeset
   836
                             'is', 'is_instance_of', 'cw_source')))
c8a5ac2d1eaa [schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6173
diff changeset
   837
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   838
    def __init__(self, session, baseurl=None):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   839
        self.session = session
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   840
        self.source = session.repo.system_source
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   841
        self.time = datetime.now()
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   842
        if baseurl is None:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   843
            config = session.vreg.config
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   844
            baseurl = config['base-url'] or config.default_base_url()
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   845
        if not baseurl[-1] == '/':
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   846
            baseurl += '/'
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   847
        self.baseurl =  baseurl
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   848
        # attributes/relations shared by all entities of the same type
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   849
        self.etype_attrs = []
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   850
        self.etype_rels = []
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   851
        # attributes/relations specific to each entity
5054
cb066d29166a fix dataimport for 3.7.2
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4913
diff changeset
   852
        self.entity_attrs = ['cwuri']
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   853
        #self.entity_rels = [] XXX not handled (YAGNI?)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   854
        schema = session.vreg.schema
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   855
        rschema = schema.rschema
6427
c8a5ac2d1eaa [schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6173
diff changeset
   856
        for rtype in self.META_RELATIONS:
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   857
            if rschema(rtype).final:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   858
                self.etype_attrs.append(rtype)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   859
            else:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   860
                self.etype_rels.append(rtype)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   861
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   862
    @cached
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   863
    def base_etype_dicts(self, etype):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   864
        entity = self.session.vreg['etypes'].etype_class(etype)(self.session)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   865
        # entity are "surface" copied, avoid shared dict between copies
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   866
        del entity.cw_extra_kwargs
6142
8bc6eac1fac1 [session] cleanup hook / operation / entity edition api
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6122
diff changeset
   867
        entity.cw_edited = EditedEntity(entity)
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   868
        for attr in self.etype_attrs:
9697
d96b5e72717c [dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents: 9696
diff changeset
   869
            genfunc = self.generate(attr)
d96b5e72717c [dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents: 9696
diff changeset
   870
            if genfunc:
d96b5e72717c [dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents: 9696
diff changeset
   871
                entity.cw_edited.edited_attribute(attr, genfunc(entity))
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   872
        rels = {}
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   873
        for rel in self.etype_rels:
9697
d96b5e72717c [dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents: 9696
diff changeset
   874
            genfunc = self.generate(rel)
d96b5e72717c [dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents: 9696
diff changeset
   875
            if genfunc:
d96b5e72717c [dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents: 9696
diff changeset
   876
                rels[rel] = genfunc(entity)
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   877
        return entity, rels
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   878
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   879
    def init_entity(self, entity):
5054
cb066d29166a fix dataimport for 3.7.2
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4913
diff changeset
   880
        entity.eid = self.source.create_eid(self.session)
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   881
        for attr in self.entity_attrs:
9697
d96b5e72717c [dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents: 9696
diff changeset
   882
            genfunc = self.generate(attr)
d96b5e72717c [dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents: 9696
diff changeset
   883
            if genfunc:
d96b5e72717c [dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents: 9696
diff changeset
   884
                entity.cw_edited.edited_attribute(attr, genfunc(entity))
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   885
9697
d96b5e72717c [dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents: 9696
diff changeset
   886
    def generate(self, rtype):
d96b5e72717c [dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents: 9696
diff changeset
   887
        return getattr(self, 'gen_%s' % rtype, None)
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   888
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   889
    def gen_cwuri(self, entity):
9515
b0dd5b57d2d8 [dataimport, migration] more fixes in the spirit of a6c32edabc8d:
Dimitri Papadopoulos <dimitri.papadopoulos@cea.fr>
parents: 9440
diff changeset
   890
        return u'%s%s' % (self.baseurl, entity.eid)
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   891
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   892
    def gen_creation_date(self, entity):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   893
        return self.time
9697
d96b5e72717c [dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents: 9696
diff changeset
   894
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   895
    def gen_modification_date(self, entity):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   896
        return self.time
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   897
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   898
    def gen_created_by(self, entity):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   899
        return self.session.user.eid
9697
d96b5e72717c [dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents: 9696
diff changeset
   900
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   901
    def gen_owned_by(self, entity):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   902
        return self.session.user.eid
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   903
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   904
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   905
###########################################################################
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   906
## SQL object store #######################################################
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   907
###########################################################################
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   908
class SQLGenObjectStore(NoHookRQLObjectStore):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   909
    """Controller of the data import process. This version is based
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   910
    on direct insertions throught SQL command (COPY FROM or execute many).
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   911
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   912
    >>> store = SQLGenObjectStore(session)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   913
    >>> store.create_entity('Person', ...)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   914
    >>> store.flush()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   915
    """
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   916
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   917
    def __init__(self, session, dump_output_dir=None, nb_threads_statement=3):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   918
        """
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   919
        Initialize a SQLGenObjectStore.
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   920
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   921
        Parameters:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   922
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   923
          - session: session on the cubicweb instance
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   924
          - dump_output_dir: a directory to dump failed statements
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   925
            for easier recovery. Default is None (no dump).
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   926
          - nb_threads_statement: number of threads used
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   927
            for SQL insertion (default is 3).
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   928
        """
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   929
        super(SQLGenObjectStore, self).__init__(session)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   930
        ### hijack default source
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   931
        self.source = SQLGenSourceWrapper(
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   932
            self.source, session.vreg.schema,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   933
            dump_output_dir=dump_output_dir,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   934
            nb_threads_statement=nb_threads_statement)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   935
        ### XXX This is done in super().__init__(), but should be
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   936
        ### redone here to link to the correct source
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   937
        self.add_relation = self.source.add_relation
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   938
        self.indexes_etypes = {}
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   939
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   940
    def flush(self):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   941
        """Flush data to the database"""
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   942
        self.source.flush()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   943
8832
26cdfc6dd6f8 [dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8807
diff changeset
   944
    def relate(self, subj_eid, rtype, obj_eid, **kwargs):
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   945
        if subj_eid is None or obj_eid is None:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   946
            return
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   947
        # XXX Could subjtype be inferred ?
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   948
        self.source.add_relation(self.session, subj_eid, rtype, obj_eid,
8832
26cdfc6dd6f8 [dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8807
diff changeset
   949
                                 self.rschema(rtype).inlined, **kwargs)
9597
8e9db17ce129 [dataimport] Correctly call rschema(rtype) in SqlGenObjectStore, closes #3694139
Vincent Michel <vincent.michel@logilab.fr>
parents: 9536
diff changeset
   950
        if self.rschema(rtype).symmetric:
9361
0542a85fe667 symmetric relations: replace bogus rql2sql translation by a hook
Aurelien Campeas <aurelien.campeas@logilab.fr>
parents: 9181
diff changeset
   951
            self.source.add_relation(self.session, obj_eid, rtype, subj_eid,
0542a85fe667 symmetric relations: replace bogus rql2sql translation by a hook
Aurelien Campeas <aurelien.campeas@logilab.fr>
parents: 9181
diff changeset
   952
                                     self.rschema(rtype).inlined, **kwargs)
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   953
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   954
    def drop_indexes(self, etype):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   955
        """Drop indexes for a given entity type"""
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   956
        if etype not in self.indexes_etypes:
9463
d62e13eba033 [multi-sources-removal] Simplify ConnectionsSet internal structures and public methods
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 9450
diff changeset
   957
            cu = self.session.cnxset.cu
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   958
            def index_to_attr(index):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   959
                """turn an index name to (database) attribute name"""
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   960
                return index.replace(etype.lower(), '').replace('idx', '').strip('_')
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   961
            indices = [(index, index_to_attr(index))
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   962
                       for index in self.source.dbhelper.list_indices(cu, etype)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   963
                       # Do not consider 'cw_etype_pkey' index
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   964
                       if not index.endswith('key')]
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   965
            self.indexes_etypes[etype] = indices
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   966
        for index, attr in self.indexes_etypes[etype]:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   967
            self.session.system_sql('DROP INDEX %s' % index)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   968
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   969
    def create_indexes(self, etype):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   970
        """Recreate indexes for a given entity type"""
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   971
        for index, attr in self.indexes_etypes.get(etype, []):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   972
            sql = 'CREATE INDEX %s ON cw_%s(%s)' % (index, etype, attr)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   973
            self.session.system_sql(sql)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   974
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   975
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   976
###########################################################################
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   977
## SQL Source #############################################################
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   978
###########################################################################
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   979
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   980
class SQLGenSourceWrapper(object):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   981
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   982
    def __init__(self, system_source, schema,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   983
                 dump_output_dir=None, nb_threads_statement=3):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   984
        self.system_source = system_source
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   985
        self._sql = threading.local()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   986
        # Explicitely backport attributes from system source
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   987
        self._storage_handler = self.system_source._storage_handler
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   988
        self.preprocess_entity = self.system_source.preprocess_entity
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   989
        self.sqlgen = self.system_source.sqlgen
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   990
        self.uri = self.system_source.uri
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   991
        self.eid = self.system_source.eid
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   992
        # Directory to write temporary files
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   993
        self.dump_output_dir = dump_output_dir
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   994
        # Allow to execute code with SQLite backend that does
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   995
        # not support (yet...) copy_from
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   996
        # XXX Should be dealt with in logilab.database
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   997
        spcfrom = system_source.dbhelper.dbapi_module.support_copy_from
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   998
        self.support_copy_from = spcfrom
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   999
        self.dbencoding = system_source.dbhelper.dbencoding
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1000
        self.nb_threads_statement = nb_threads_statement
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1001
        # initialize thread-local data for main thread
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1002
        self.init_thread_locals()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1003
        self._inlined_rtypes_cache = {}
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1004
        self._fill_inlined_rtypes_cache(schema)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1005
        self.schema = schema
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1006
        self.do_fti = False
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1007
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1008
    def _fill_inlined_rtypes_cache(self, schema):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1009
        cache = self._inlined_rtypes_cache
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1010
        for eschema in schema.entities():
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1011
            for rschema in eschema.ordered_relations():
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1012
                if rschema.inlined:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1013
                    cache[eschema.type] = SQL_PREFIX + rschema.type
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1014
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1015
    def init_thread_locals(self):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1016
        """initializes thread-local data"""
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1017
        self._sql.entities = defaultdict(list)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1018
        self._sql.relations = {}
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1019
        self._sql.inlined_relations = {}
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1020
        # keep track, for each eid of the corresponding data dict
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1021
        self._sql.eid_insertdicts = {}
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1022
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1023
    def flush(self):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1024
        print 'starting flush'
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1025
        _entities_sql = self._sql.entities
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1026
        _relations_sql = self._sql.relations
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1027
        _inlined_relations_sql = self._sql.inlined_relations
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1028
        _insertdicts = self._sql.eid_insertdicts
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1029
        try:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1030
            # try, for each inlined_relation, to find if we're also creating
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1031
            # the host entity (i.e. the subject of the relation).
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1032
            # In that case, simply update the insert dict and remove
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1033
            # the need to make the
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1034
            # UPDATE statement
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1035
            for statement, datalist in _inlined_relations_sql.iteritems():
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1036
                new_datalist = []
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1037
                # for a given inlined relation,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1038
                # browse each couple to be inserted
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1039
                for data in datalist:
8696
0bb18407c053 [toward py3k] rewrite dict.keys() and dict.values() (part of #2711624)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 8695
diff changeset
  1040
                    keys = list(data)
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1041
                    # For inlined relations, it exists only two case:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1042
                    # (rtype, cw_eid) or (cw_eid, rtype)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1043
                    if keys[0] == 'cw_eid':
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1044
                        rtype = keys[1]
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1045
                    else:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1046
                        rtype = keys[0]
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1047
                    updated_eid = data['cw_eid']
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1048
                    if updated_eid in _insertdicts:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1049
                        _insertdicts[updated_eid][rtype] = data[rtype]
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1050
                    else:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1051
                        # could not find corresponding insert dict, keep the
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1052
                        # UPDATE query
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1053
                        new_datalist.append(data)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1054
                _inlined_relations_sql[statement] = new_datalist
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1055
            _import_statements(self.system_source.get_connection,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1056
                               _entities_sql.items()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1057
                               + _relations_sql.items()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1058
                               + _inlined_relations_sql.items(),
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1059
                               dump_output_dir=self.dump_output_dir,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1060
                               nb_threads=self.nb_threads_statement,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1061
                               support_copy_from=self.support_copy_from,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1062
                               encoding=self.dbencoding)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1063
        finally:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1064
            _entities_sql.clear()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1065
            _relations_sql.clear()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1066
            _insertdicts.clear()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1067
            _inlined_relations_sql.clear()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1068
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1069
    def add_relation(self, session, subject, rtype, object,
8832
26cdfc6dd6f8 [dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8807
diff changeset
  1070
                     inlined=False, **kwargs):
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1071
        if inlined:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1072
            _sql = self._sql.inlined_relations
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1073
            data = {'cw_eid': subject, SQL_PREFIX + rtype: object}
8832
26cdfc6dd6f8 [dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8807
diff changeset
  1074
            subjtype = kwargs.get('subjtype')
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1075
            if subjtype is None:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1076
                # Try to infer it
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1077
                targets = [t.type for t in
9425
d7e8293fa4de [dataimport] The subjtype should be the subject of a relation, not the object, closes #3365113
Vincent Michel <vincent.michel@logilab.fr>
parents: 9181
diff changeset
  1078
                           self.schema.rschema(rtype).subjects()]
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1079
                if len(targets) == 1:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1080
                    subjtype = targets[0]
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1081
                else:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1082
                    raise ValueError('You should give the subject etype for '
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1083
                                     'inlined relation %s'
8835
3612b760488b [dataimport] Slight message modification in exception handling code.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8834
diff changeset
  1084
                                     ', as it cannot be inferred: '
3612b760488b [dataimport] Slight message modification in exception handling code.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8834
diff changeset
  1085
                                     'this type is given as keyword argument '
3612b760488b [dataimport] Slight message modification in exception handling code.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8834
diff changeset
  1086
                                     '``subjtype``'% rtype)
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1087
            statement = self.sqlgen.update(SQL_PREFIX + subjtype,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1088
                                           data, ['cw_eid'])
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1089
        else:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1090
            _sql = self._sql.relations
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1091
            data = {'eid_from': subject, 'eid_to': object}
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1092
            statement = self.sqlgen.insert('%s_relation' % rtype, data)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1093
        if statement in _sql:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1094
            _sql[statement].append(data)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1095
        else:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1096
            _sql[statement] = [data]
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1097
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1098
    def add_entity(self, session, entity):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1099
        with self._storage_handler(entity, 'added'):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1100
            attrs = self.preprocess_entity(entity)
8900
010a59e12d89 use cw_etype instead of __regid__
Pierre-Yves David <pierre-yves.david@logilab.fr>
parents: 8835
diff changeset
  1101
            rtypes = self._inlined_rtypes_cache.get(entity.cw_etype, ())
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1102
            if isinstance(rtypes, str):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1103
                rtypes = (rtypes,)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1104
            for rtype in rtypes:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1105
                if rtype not in attrs:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1106
                    attrs[rtype] = None
8900
010a59e12d89 use cw_etype instead of __regid__
Pierre-Yves David <pierre-yves.david@logilab.fr>
parents: 8835
diff changeset
  1107
            sql = self.sqlgen.insert(SQL_PREFIX + entity.cw_etype, attrs)
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1108
            self._sql.eid_insertdicts[entity.eid] = attrs
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1109
            self._append_to_entities(sql, attrs)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1110
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1111
    def _append_to_entities(self, sql, attrs):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1112
        self._sql.entities[sql].append(attrs)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1113
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1114
    def _handle_insert_entity_sql(self, session, sql, attrs):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1115
        # We have to overwrite the source given in parameters
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1116
        # as here, we directly use the system source
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1117
        attrs['asource'] = self.system_source.uri
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1118
        self._append_to_entities(sql, attrs)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1119
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1120
    def _handle_is_relation_sql(self, session, sql, attrs):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1121
        self._append_to_entities(sql, attrs)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1122
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1123
    def _handle_is_instance_of_sql(self, session, sql, attrs):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1124
        self._append_to_entities(sql, attrs)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1125
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1126
    def _handle_source_relation_sql(self, session, sql, attrs):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1127
        self._append_to_entities(sql, attrs)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1128
9522
8154a5748194 [dataimport] fix comment
Aurelien Campeas <aurelien.campeas@logilab.fr>
parents: 9425
diff changeset
  1129
    # add_info is _copypasted_ from the one in NativeSQLSource. We want it
8154a5748194 [dataimport] fix comment
Aurelien Campeas <aurelien.campeas@logilab.fr>
parents: 9425
diff changeset
  1130
    # there because it will use the _handlers of the SQLGenSourceWrapper, which
8154a5748194 [dataimport] fix comment
Aurelien Campeas <aurelien.campeas@logilab.fr>
parents: 9425
diff changeset
  1131
    # are not like the ones in the native source.
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1132
    def add_info(self, session, entity, source, extid, complete):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1133
        """add type and source info for an eid into the system table"""
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1134
        # begin by inserting eid/type/source/extid into the entities table
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1135
        if extid is not None:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1136
            assert isinstance(extid, str)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1137
            extid = b64encode(extid)
8900
010a59e12d89 use cw_etype instead of __regid__
Pierre-Yves David <pierre-yves.david@logilab.fr>
parents: 8835
diff changeset
  1138
        attrs = {'type': entity.cw_etype, 'eid': entity.eid, 'extid': extid,
9827
c7ce035aede8 [dataimport] Drop reference to the 'source' column (closes #4067694).
Damien Garaud <damien.garaud@logilab.fr>
parents: 9770
diff changeset
  1139
                 'asource': source.uri}
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1140
        self._handle_insert_entity_sql(session, self.sqlgen.insert('entities', attrs), attrs)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1141
        # insert core relations: is, is_instance_of and cw_source
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1142
        try:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1143
            self._handle_is_relation_sql(session, 'INSERT INTO is_relation(eid_from,eid_to) VALUES (%s,%s)',
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1144
                                         (entity.eid, eschema_eid(session, entity.e_schema)))
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1145
        except IndexError:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1146
            # during schema serialization, skip
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1147
            pass
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1148
        else:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1149
            for eschema in entity.e_schema.ancestors() + [entity.e_schema]:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1150
                self._handle_is_relation_sql(session,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1151
                                             'INSERT INTO is_instance_of_relation(eid_from,eid_to) VALUES (%s,%s)',
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1152
                                             (entity.eid, eschema_eid(session, eschema)))
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1153
        if 'CWSource' in self.schema and source.eid is not None: # else, cw < 3.10
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1154
            self._handle_is_relation_sql(session, 'INSERT INTO cw_source_relation(eid_from,eid_to) VALUES (%s,%s)',
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1155
                                         (entity.eid, source.eid))
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1156
        # now we can update the full text index
8900
010a59e12d89 use cw_etype instead of __regid__
Pierre-Yves David <pierre-yves.david@logilab.fr>
parents: 8835
diff changeset
  1157
        if self.do_fti and self.need_fti_indexation(entity.cw_etype):
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1158
            if complete:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1159
                entity.complete(entity.e_schema.indexable_attributes())
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1160
            self.index_entity(session, entity=entity)