dataimport.py
author Aurelien Campeas <aurelien.campeas@logilab.fr>
Tue, 03 Mar 2015 14:57:34 +0100
changeset 10235 684215aca046
parent 10198 534efa7bfaeb
child 10272 3231fd2fa7a5
permissions -rw-r--r--
Remove remote repository-access-through-pyro support
Modern methods such as the rqlcontroller cube + the cwclientlib library are the way forward. Closes #2919309.
# -*- coding: utf-8 -*-
# copyright 2003-2014 LOGILAB S.A. (Paris, FRANCE), all rights reserved.
# contact http://www.logilab.fr/ -- mailto:contact@logilab.fr
#
# This file is part of CubicWeb.
#
# CubicWeb is free software: you can redistribute it and/or modify it under the
# terms of the GNU Lesser General Public License as published by the Free
# Software Foundation, either version 2.1 of the License, or (at your option)
# any later version.
#
# CubicWeb is distributed in the hope that it will be useful, but WITHOUT
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
# FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
# details.
#
# You should have received a copy of the GNU Lesser General Public License along
# with CubicWeb.  If not, see <http://www.gnu.org/licenses/>.
"""This module provides tools to import tabular data.


Example of use (run this with `cubicweb-ctl shell instance import-script.py`):

.. sourcecode:: python

  from cubicweb.dataimport import *
  # define data generators
  GENERATORS = []

  USERS = [('Prenom', 'firstname', ()),
           ('Nom', 'surname', ()),
           ('Identifiant', 'login', ()),
           ]

  def gen_users(ctl):
      for row in ctl.iter_and_commit('utilisateurs'):
          entity = mk_entity(row, USERS)
          entity['upassword'] = 'motdepasse'
          ctl.check('login', entity['login'], None)
          entity = ctl.store.create_entity('CWUser', **entity)
          email = ctl.store.create_entity('EmailAddress', address=row['email'])
          ctl.store.relate(entity.eid, 'use_email', email.eid)
          ctl.store.rql('SET U in_group G WHERE G name "users", U eid %(x)s', {'x': entity.eid})

  CHK = [('login', check_doubles, 'Utilisateurs Login',
          'Deux utilisateurs ne devraient pas avoir le même login.'),
         ]

  GENERATORS.append( (gen_users, CHK) )

  # create controller
  ctl = CWImportController(RQLObjectStore(cnx))
  ctl.askerror = 1
  ctl.generators = GENERATORS
  ctl.data['utilisateurs'] = lazytable(ucsvreader(open('users.csv')))
  # run
  ctl.run()

.. BUG files with only one column are not parsable
.. TODO rollback() invocation is not possible yet
"""
__docformat__ = "restructuredtext en"

import csv
import sys
import threading
import traceback
import warnings
import cPickle
import os.path as osp
import inspect
from collections import defaultdict
from copy import copy
from datetime import date, datetime, time
from time import asctime
from StringIO import StringIO

from logilab.common import shellutils, attrdict
from logilab.common.date import strptime
from logilab.common.decorators import cached
from logilab.common.deprecation import deprecated

from cubicweb import QueryError
from cubicweb.utils import make_uid
from cubicweb.schema import META_RTYPES, VIRTUAL_RTYPES
from cubicweb.server.edition import EditedEntity
from cubicweb.server.sqlutils import SQL_PREFIX
from cubicweb.server.utils import eschema_eid


def count_lines(stream_or_filename):
    """Return the number of lines of a stream or file, leaving it rewound."""
    if isinstance(stream_or_filename, basestring):
        f = open(stream_or_filename)
    else:
        f = stream_or_filename
        f.seek(0)
    # start at -1 so that an empty stream yields 0 instead of raising
    # UnboundLocalError on the return statement
    i = -1
    for i, line in enumerate(f):
        pass
    f.seek(0)
    return i+1
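
# Illustrative aside (not part of the original module): a minimal Python 3
# sketch of the counting step in count_lines above, assuming a seekable text
# stream. The name count_lines_sketch is hypothetical. Summing over a
# generator returns 0 for an empty stream, and the stream is rewound
# afterwards so that a CSV reader can still consume it from the start.

```python
import io


def count_lines_sketch(stream):
    """Count the lines of a seekable text stream, then rewind it.

    Hypothetical helper written for Python 3; it is not part of
    cubicweb.dataimport.
    """
    stream.seek(0)
    # sum() over an empty generator is 0, so no loop variable is left unbound
    total = sum(1 for _ in stream)
    stream.seek(0)
    return total


# made-up sample data standing in for a small CSV export
sample = io.StringIO(u"login,email\nalice,alice@example.org\n")
line_count = count_lines_sketch(sample)
```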

def ucsvreader_pb(stream_or_path, encoding='utf-8', delimiter=',', quotechar='"',
                  skipfirst=False, withpb=True, skip_empty=True, separator=None,
                  quote=None):
    """same as :func:`ucsvreader` but a progress bar is displayed as we iterate over rows"""
    if separator is not None:
        delimiter = separator
        warnings.warn("[3.20] 'separator' kwarg is deprecated, use 'delimiter' instead")
    if quote is not None:
        quotechar = quote
        warnings.warn("[3.20] 'quote' kwarg is deprecated, use 'quotechar' instead")
    if isinstance(stream_or_path, basestring):
        if not osp.exists(stream_or_path):
            raise Exception("file doesn't exist: %s" % stream_or_path)
        stream = open(stream_or_path)
    else:
        stream = stream_or_path
    rowcount = count_lines(stream)
    if skipfirst:
        rowcount -= 1
    if withpb:
        pb = shellutils.ProgressBar(rowcount, 50)
    for urow in ucsvreader(stream, encoding, delimiter, quotechar,
                           skipfirst=skipfirst, skip_empty=skip_empty):
        yield urow
        if withpb:
            pb.update()
    print ' %s rows imported' % rowcount
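
# Illustrative aside (not part of the original module): the deprecated
# 'separator' and 'quote' keywords handled above follow a common aliasing
# pattern, sketched here for Python 3. The helper name resolve_alias_sketch is
# hypothetical, and passing DeprecationWarning is an assumption of this
# sketch: the module itself calls warnings.warn without a category.

```python
import warnings


def resolve_alias_sketch(new_value, old_value, old_name, new_name):
    """Return the effective value of a renamed keyword argument.

    Hypothetical helper: when the deprecated keyword was given (not None),
    its value wins and a warning is emitted, as in the code above.
    """
    if old_value is None:
        return new_value
    warnings.warn("'%s' kwarg is deprecated, use '%s' instead"
                  % (old_name, new_name),
                  DeprecationWarning, stacklevel=2)
    return old_value


# record warnings instead of printing them, to show that exactly one is raised
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    from_old = resolve_alias_sketch(",", ";", "separator", "delimiter")
    from_new = resolve_alias_sketch(",", None, "separator", "delimiter")
```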

def ucsvreader(stream, encoding='utf-8', delimiter=',', quotechar='"',
               skipfirst=False, ignore_errors=False, skip_empty=True,
               separator=None, quote=None):
    """A csv reader that accepts files with any encoding and outputs unicode
    strings

    if skip_empty (the default), lines without any values specified (only
    separators) will be skipped. This is useful for Excel exports which may be
    full of such lines.
    """
    if separator is not None:
        delimiter = separator
        warnings.warn("[3.20] 'separator' kwarg is deprecated, use 'delimiter' instead")
    if quote is not None:
        quotechar = quote
        warnings.warn("[3.20] 'quote' kwarg is deprecated, use 'quotechar' instead")
    it = iter(csv.reader(stream, delimiter=delimiter, quotechar=quotechar))
    if not ignore_errors:
        if skipfirst:
            it.next()
        for row in it:
            decoded = [item.decode(encoding) for item in row]
            if not skip_empty or any(decoded):
                yield decoded
    else:
        if skipfirst:
            try:
                row = it.next()
            except csv.Error:
                pass
        # Safe version, that can cope with error in CSV file
        while True:
            try:
                row = it.next()
            # End of CSV, break
            except StopIteration:
                break
            # Error in CSV, ignore line and continue
            except csv.Error:
                continue
            decoded = [item.decode(encoding) for item in row]
            if not skip_empty or any(decoded):
                yield decoded
2eac0aa1d3f6 [dataimport] ucsvreader should skip empty lines unless specified otherwise. Closes #3035944
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8970
diff changeset
   173
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   174
6136
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   175
def callfunc_every(func, number, iterable):
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   176
    """yield items of `iterable` one by one and call function `func`
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   177
    every `number` iterations. Always call function `func` at the end.
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   178
    """
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   179
    for idx, item in enumerate(iterable):
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   180
        yield item
7227
23d9c1f89c96 [dataimport] actually commit every desired number...
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7214
diff changeset
   181
        if not idx % number:
6136
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   182
            func()
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   183
    func()
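
The call pattern of `callfunc_every` is easy to misread, so here is a minimal standalone sketch of it. It assumes Python 3 (the listing itself is Python 2) and restates the function body from the listing; `fake_commit`, `calls` and `items` are hypothetical names used only for illustration.

```python
def callfunc_every(func, number, iterable):
    """Restatement of the function from the listing, for illustration only."""
    for idx, item in enumerate(iterable):
        yield item
        if not idx % number:
            func()
    func()


items = []   # what the consumer has processed so far
calls = []   # number of processed items at each call of fake_commit


def fake_commit():
    # Hypothetical stand-in for a commit callback.
    calls.append(len(items))


for item in callfunc_every(fake_commit, 3, "abcdefg"):
    items.append(item)

print(calls)
```

Because `idx` starts at 0, the callback fires right after the first item, then after every `number` further items, and once more when the iterable is exhausted.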
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   184
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   185
def lazytable(reader):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   186
    """The first row is taken to be the header of the table and
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   187
    used to output a dict for each row of data.
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   188
6122
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
   189
    >>> data = lazytable(ucsvreader(open(filename)))
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   190
    """
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   191
    header = reader.next()
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   192
    for row in reader:
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   193
        yield dict(zip(header, row))
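
As an illustration of the header-to-dict behaviour, the following sketch feeds `lazytable` from the standard `csv.reader` rather than from `ucsvreader`. It assumes Python 3, so `next(reader)` replaces `reader.next()`; the sample data is made up.

```python
import csv
import io


def lazytable(reader):
    """Python 3 restatement of the generator from the listing."""
    header = next(reader)
    for row in reader:
        yield dict(zip(header, row))


# Hypothetical in-memory CSV standing in for a real file.
sample = io.StringIO("id,name\n1,dupont\n2,durand\n")
rows = list(lazytable(csv.reader(sample)))
print(rows)
```

Each data row becomes a dict keyed by the values of the first row; if a row is shorter than the header, `zip` silently drops the missing keys.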
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   194
7201
52f5831400b2 [dataimport] allow to specify columns on which result should be sorted in lazydbtable
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7171
diff changeset
   195
def lazydbtable(cu, table, headers, orderby=None):
7160
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   196
    """return an iterator on rows of a sql table. On each row, fetch columns
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   197
    defined in headers and return values as a dictionary.
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   198
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   199
    >>> data = lazydbtable(cu, 'experimentation', ('id', 'nickname', 'gps'))
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   200
    """
7201
52f5831400b2 [dataimport] allow to specify columns on which result should be sorted in lazydbtable
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7171
diff changeset
   201
    sql = 'SELECT %s FROM %s' % (','.join(headers), table,)
52f5831400b2 [dataimport] allow to specify columns on which result should be sorted in lazydbtable
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7171
diff changeset
   202
    if orderby:
52f5831400b2 [dataimport] allow to specify columns on which result should be sorted in lazydbtable
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7171
diff changeset
   203
        sql += ' ORDER BY %s' % ','.join(orderby)
52f5831400b2 [dataimport] allow to specify columns on which result should be sorted in lazydbtable
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7171
diff changeset
   204
    cu.execute(sql)
7160
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   205
    while True:
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   206
        row = cu.fetchone()
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   207
        if row is None:
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   208
            break
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   209
        yield dict(zip(headers, row))
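
A possible way to try `lazydbtable` without a database server is an in-memory SQLite cursor, as sketched below. This is an assumption made for the example (the listing does not say which DB-API driver is used), and the table contents are invented. Note that table and column names are interpolated directly into the SQL string, so they must come from trusted code.

```python
import sqlite3


def lazydbtable(cu, table, headers, orderby=None):
    """Restatement of the generator from the listing."""
    sql = 'SELECT %s FROM %s' % (','.join(headers), table)
    if orderby:
        sql += ' ORDER BY %s' % ','.join(orderby)
    cu.execute(sql)
    while True:
        row = cu.fetchone()
        if row is None:
            break
        yield dict(zip(headers, row))


# Hypothetical sample table, created only for this example.
cnx = sqlite3.connect(':memory:')
cu = cnx.cursor()
cu.execute('CREATE TABLE experimentation (id INTEGER, nickname TEXT)')
cu.executemany('INSERT INTO experimentation VALUES (?, ?)',
               [(2, 'beta'), (1, 'alpha')])

rows = list(lazydbtable(cu, 'experimentation', ('id', 'nickname'),
                        orderby=('id',)))
print(rows)
```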
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   210
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   211
def mk_entity(row, map):
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   212
    """Return a dict made from sanitized mapped values.
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   213
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   214
    A ValueError is raised when a checker rejects an unexpected value.
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   215
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   216
    >>> row = {'myname': u'dupont'}
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   217
    >>> map = [('myname', u'name', (call_transform_method('title'),))]
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   218
    >>> mk_entity(row, map)
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   219
    {'name': u'Dupont'}
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   220
    >>> row = {'myname': u'dupont', 'optname': u''}
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   221
    >>> map = [('myname', u'name', (call_transform_method('title'),)),
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   222
    ...        ('optname', u'MARKER', (optional,))]
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   223
    >>> mk_entity(row, map)
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   224
    {'name': u'Dupont', 'MARKER': None}
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   225
    """
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   226
    res = {}
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   227
    assert isinstance(row, dict)
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   228
    assert isinstance(map, list)
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   229
    for src, dest, funcs in map:
8406
f3bc8ca0b715 [data import] don't crash if value isn't in the file, simply no key/value in the output dict. Closes #2356328
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8403
diff changeset
   230
        try:
f3bc8ca0b715 [data import] don't crash if value isn't in the file, simply no key/value in the output dict. Closes #2356328
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8403
diff changeset
   231
            res[dest] = row[src]
f3bc8ca0b715 [data import] don't crash if value isn't in the file, simply no key/value in the output dict. Closes #2356328
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8403
diff changeset
   232
        except KeyError:
f3bc8ca0b715 [data import] don't crash if value isn't in the file, simply no key/value in the output dict. Closes #2356328
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8403
diff changeset
   233
            continue
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   234
        try:
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   235
            for func in funcs:
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   236
                res[dest] = func(res[dest])
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   237
                if res[dest] is None:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   238
                    break
8695
358d8bed9626 [toward-py3k] rewrite to "except AnException as exc:" (part of #2711624)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 8637
diff changeset
   239
        except ValueError as err:
7170
32b5d9d43a7e [dataimport] propagate stack
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7160
diff changeset
   240
            raise ValueError('error with %r field: %s' % (src, err)), None, sys.exc_info()[-1]
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   241
    return res
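
The three behaviours of `mk_entity` (missing source keys are skipped, a `None` result stops the chain, a `ValueError` is re-raised with the field name) can be sketched as follows. This is a Python 3 restatement under assumptions: the Python 2 three-argument `raise` is replaced by `raise ... from err`, `str.title` stands in for `call_transform_method('title')`, and the mapping and rows are invented.

```python
def optional(value):
    """Return the value if it is truthy, else None (which stops the chain)."""
    return value if value else None


def mk_entity(row, mapping):
    """Python 3 restatement of the function from the listing."""
    res = {}
    for src, dest, funcs in mapping:
        try:
            res[dest] = row[src]
        except KeyError:
            continue  # source column absent: no key in the output
        try:
            for func in funcs:
                res[dest] = func(res[dest])
                if res[dest] is None:
                    break  # a checker returned None: skip remaining ones
        except ValueError as err:
            raise ValueError('error with %r field: %s' % (src, err)) from err
    return res


# Hypothetical mapping: (source column, destination key, checkers).
mapping = [('myname', 'name', (str.title,)),
           ('age', 'age', (optional, int)),
           ('missing', 'other', ())]

entity = mk_entity({'myname': 'dupont', 'age': ''}, mapping)
print(entity)
```

Placing `optional` first keeps `int` from ever seeing the empty string; with a non-numeric, non-empty value such as `'abc'`, `int` still raises and the error is reported against the `'age'` field.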
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   242
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   243
# user interactions ############################################################
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   244
3029
bc573d5fb5b7 F [devtools] by default dataimport prints message on stdout
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 3003
diff changeset
   245
def tell(msg):
bc573d5fb5b7 F [devtools] by default dataimport prints message on stdout
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 3003
diff changeset
   246
    print msg
bc573d5fb5b7 F [devtools] by default dataimport prints message on stdout
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 3003
diff changeset
   247
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   248
def confirm(question):
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   249
    """A confirm function that asks for yes/no/abort and exits on abort."""
4721
8f63691ccb7f pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4613
diff changeset
   250
    answer = shellutils.ASK.ask(question, ('Y', 'n', 'abort'), 'Y')
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   251
    if answer == 'abort':
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   252
        sys.exit(1)
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   253
    return answer == 'Y'
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   254
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   255
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   256
class catch_error(object):
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   257
    """Context manager that records errors on the controller instead of propagating them when `ctl.catcherrors` is set."""
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   258
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   259
    def __init__(self, ctl, key='unexpected error', msg=None):
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   260
        self.ctl = ctl
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   261
        self.key = key
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   262
        self.msg = msg
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   263
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   264
    def __enter__(self):
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   265
        return self
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   266
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   267
    def __exit__(self, type, value, traceback):
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   268
        if type is not None:
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   269
            if issubclass(type, (KeyboardInterrupt, SystemExit)):
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   270
                return # re-raise
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   271
            if self.ctl.catcherrors:
4173
cfd5d3270f99 msg isn't defined there, but we've to give traceback information to record error
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4152
diff changeset
   272
                self.ctl.record_error(self.key, None, type, value, traceback)
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   273
                return True # silent
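
To show how `catch_error` is meant to wrap a block, here is a small sketch using a stand-in controller. `FakeController` is hypothetical: it only provides the `catcherrors` attribute and the `record_error` call that the listing relies on, and is not the real import controller.

```python
class FakeController(object):
    """Hypothetical stand-in exposing only what catch_error uses."""

    def __init__(self, catcherrors):
        self.catcherrors = catcherrors
        self.errors = []

    def record_error(self, key, msg=None, type=None, value=None, tb=None):
        self.errors.append((key, type))


class catch_error(object):
    """Restatement of the context manager from the listing."""

    def __init__(self, ctl, key='unexpected error', msg=None):
        self.ctl = ctl
        self.key = key
        self.msg = msg

    def __enter__(self):
        return self

    def __exit__(self, type, value, traceback):
        if type is not None:
            if issubclass(type, (KeyboardInterrupt, SystemExit)):
                return  # let interrupts and exits propagate
            if self.ctl.catcherrors:
                self.ctl.record_error(self.key, None, type, value, traceback)
                return True  # swallow the exception


ctl = FakeController(catcherrors=True)
with catch_error(ctl, 'row 1'):
    int('not a number')  # raises ValueError, which gets recorded

print(ctl.errors)
```

When `catcherrors` is false, `__exit__` returns `None` and the exception propagates as usual.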
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   274
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   275
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   276
# base sanitizing/coercing functions ###########################################
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   277
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   278
def optional(value):
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   279
    """checker to filter optional field
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   280
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   281
    If value is undefined (e.g. an empty string), return None that will
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   282
    break the checkers validation chain
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   283
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   284
    General use is to add 'optional' check in first condition to avoid
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   285
    ValueError by further checkers
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   286
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   287
    >>> MAPPER = [(u'value', 'value', (optional, int))]
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   288
    >>> row = {'value': u''}
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   289
    >>> mk_entity(row, MAPPER)
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   290
    {'value': None}
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   291
    >>> row = {'value': u'100'}
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   292
    >>> mk_entity(row, MAPPER)
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   293
    {'value': 100}
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   294
    """
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   295
    if value:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   296
        return value
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   297
    return None
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   298
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   299
def required(value):
6136
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   300
    """raise ValueError if value is empty
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   301
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   302
    This check is usually placed last in the chain.
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   303
    """
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   304
    if value:
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   305
        return value
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   306
    raise ValueError("required")
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   307
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   308
def todatetime(format='%d/%m/%Y'):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   309
    """return a transformation function to turn string input value into a
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   310
    `datetime.datetime` instance, using given format.
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   311
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   312
    Follow it by `todate` or `totime` functions from `logilab.common.date` if
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   313
    you want a `date`/`time` instance instead of `datetime`.
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   314
    """
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   315
    def coerce(value):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   316
        return strptime(value, format)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   317
    return coerce
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   318
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   319
def call_transform_method(methodname, *args, **kwargs):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   320
    """return a transformation function that calls the given method on its input and returns the result"""
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   321
    def coerce(value):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   322
        return getattr(value, methodname)(*args, **kwargs)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   323
    return coerce
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   324
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   325
def call_check_method(methodname, *args, **kwargs):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   326
    """check value returned by calling the given method on input is true,
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   327
    else raise ValueError
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   328
    """
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   329
    def check(value):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   330
        if getattr(value, methodname)(*args, **kwargs):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   331
            return value
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   332
        raise ValueError('%s not verified on %r' % (methodname, value))
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   333
    return check
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   334
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   335
# base integrity checking functions ############################################
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   336
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   337
def check_doubles(buckets):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   338
    """Extract the keys that have more than one item in their bucket."""
4721
8f63691ccb7f pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4613
diff changeset
   339
    return [(k, len(v)) for k, v in buckets.items() if len(v) > 1]
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   340
4136
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   341
def check_doubles_not_none(buckets):
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   342
    """Extract the non-None keys that have more than one item in their bucket."""
4721
8f63691ccb7f pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4613
diff changeset
   343
    return [(k, len(v)) for k, v in buckets.items()
8f63691ccb7f pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4613
diff changeset
   344
            if k is not None and len(v) > 1]
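
A short sketch of how the two integrity checks differ, with both functions copied from above so it runs on its own. The `buckets` mapping is made-up sample data.

```python
def check_doubles(buckets):
    """Keys whose bucket holds more than one item."""
    return [(k, len(v)) for k, v in buckets.items() if len(v) > 1]

def check_doubles_not_none(buckets):
    """Same as check_doubles, but the None key is ignored."""
    return [(k, len(v)) for k, v in buckets.items()
            if k is not None and len(v) > 1]

# Hypothetical buckets: key -> list of items sharing that key.
buckets = {'alice': [1, 2], 'bob': [3], None: [4, 5, 6]}

with_none = check_doubles(buckets)              # reports 'alice' and None
without_none = check_doubles_not_none(buckets)  # reports 'alice' only
```

The second variant is useful when a missing key (None) legitimately groups many unrelated items and should not be reported as a duplicate.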
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   345
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   346
# sql generator utility functions #############################################
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   347
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   348
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   349
def _import_statements(sql_connect, statements, nb_threads=3,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   350
                       dump_output_dir=None,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   351
                       support_copy_from=True, encoding='utf-8'):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   352
    """
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   353
    Import a batch of SQL statements, split across nb_threads threads.
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   354
    """
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   355
    try:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   356
        chunksize = (len(statements) // nb_threads) + 1
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   357
        threads = []
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   358
        for i in xrange(nb_threads):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   359
            chunks = statements[i*chunksize:(i+1)*chunksize]
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   360
            thread = threading.Thread(target=_execmany_thread,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   361
                                      args=(sql_connect, chunks,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   362
                                            dump_output_dir,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   363
                                            support_copy_from,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   364
                                            encoding))
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   365
            thread.start()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   366
            threads.append(thread)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   367
        for t in threads:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   368
            t.join()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   369
    except Exception:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   370
        print 'Error in import statements'
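
The slicing done by `_import_statements` can be illustrated in isolation. `split_statements` is a hypothetical helper introduced only for this sketch. It uses floor division, which matches what `/` does on integers in Python 2.

```python
def split_statements(statements, nb_threads=3):
    """Hypothetical helper: slice `statements` into one chunk per thread,
    using the same chunk size formula as _import_statements."""
    chunksize = (len(statements) // nb_threads) + 1
    return [statements[i * chunksize:(i + 1) * chunksize]
            for i in range(nb_threads)]

# Ten dummy statements over three threads: chunk size is 10 // 3 + 1 = 4.
chunks = split_statements(list(range(10)), nb_threads=3)
sizes = [len(chunk) for chunk in chunks]

# With fewer statements than threads, trailing chunks are empty.
small = split_statements(['a', 'b'], nb_threads=3)
```

Because of the `+ 1`, the last chunk is usually shorter than the others and can be empty, so a thread may be started with nothing to do.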
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   371
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   372
def _execmany_thread_not_copy_from(cu, statement, data, table=None,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   373
                                   columns=None, encoding='utf-8'):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   374
    """ Execute thread without copy from
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   375
    """
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   376
    cu.executemany(statement, data)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   377
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   378
def _execmany_thread_copy_from(cu, statement, data, table,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   379
                               columns, encoding='utf-8'):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   380
    """ Execute thread with copy from
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   381
    """
10078
5eeffcfde1ba [dataimport] Fix use of _create_copyfrom_buffer() (related to #3845572)
Rémi Cardona <remi.cardona@logilab.fr>
parents: 10007
diff changeset
   382
    buf = _create_copyfrom_buffer(data, columns, encoding=encoding)
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   383
    if buf is None:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   384
        _execmany_thread_not_copy_from(cu, statement, data)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   385
    else:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   386
        if columns is None:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   387
            cu.copy_from(buf, table, null='NULL')
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   388
        else:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   389
            cu.copy_from(buf, table, null='NULL', columns=columns)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   390
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   391
def _execmany_thread(sql_connect, statements, dump_output_dir=None,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   392
                     support_copy_from=True, encoding='utf-8'):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   393
    """
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   394
    Execute sql statement. If 'INSERT INTO', try to use 'COPY FROM' command,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   395
    or fall back to executemany.
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   396
    """
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   397
    if support_copy_from:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   398
        execmany_func = _execmany_thread_copy_from
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   399
    else:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   400
        execmany_func = _execmany_thread_not_copy_from
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   401
    cnx = sql_connect()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   402
    cu = cnx.cursor()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   403
    try:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   404
        for statement, data in statements:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   405
            table = None
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   406
            columns = None
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   407
            try:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   408
                if not statement.startswith('INSERT INTO'):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   409
                    cu.executemany(statement, data)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   410
                    continue
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   411
                table = statement.split()[2]
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   412
                if isinstance(data[0], (tuple, list)):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   413
                    columns = None
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   414
                else:
8696
0bb18407c053 [toward py3k] rewrite dict.keys() and dict.values() (part of #2711624)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 8695
diff changeset
   415
                    columns = list(data[0])
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   416
                execmany_func(cu, statement, data, table, columns, encoding)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   417
            except Exception:
8970
0a1bd0c590e2 [dataimport] minor typo in error handling
Dimitri Papadopoulos <dimitri.papadopoulos@cea.fr>
parents: 8930
diff changeset
   418
                print 'unable to copy data into table %s' % table
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   419
                # Error in import statement, save data in dump_output_dir
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   420
                if dump_output_dir is not None:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   421
                    pdata = {'data': data, 'statement': statement,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   422
                             'time': asctime(), 'columns': columns}
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   423
                    filename = make_uid()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   424
                    try:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   425
                        with open(osp.join(dump_output_dir,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   426
                                           '%s.pickle' % filename), 'wb') as fobj:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   427
                            fobj.write(cPickle.dumps(pdata))
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   428
                    except IOError:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   429
                        print 'ERROR while pickling in', dump_output_dir, filename+'.pickle'
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   430
                        pass
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   431
                cnx.rollback()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   432
                raise
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   433
    finally:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   434
        cnx.commit()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   435
        cu.close()
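
A sketch of how `_execmany_thread` derives the COPY FROM target from an `INSERT INTO` statement and its data. `copy_target`, the table name `cw_person` and the sample row are hypothetical and only serve the illustration.

```python
def copy_target(statement, data):
    """Hypothetical helper: return (table, columns) the way
    _execmany_thread computes them before calling the copy function."""
    # Third whitespace-separated token of 'INSERT INTO <table> ...'.
    table = statement.split()[2]
    if isinstance(data[0], (tuple, list)):
        # Positional rows: no column names are available.
        columns = None
    else:
        # Mapping rows: the keys of the first row name the columns.
        columns = list(data[0])
    return table, columns

statement = ('INSERT INTO cw_person ( cw_name, cw_age ) '
             'VALUES ( %(cw_name)s, %(cw_age)s )')
table, columns = copy_target(statement, [{'cw_name': 'Alice', 'cw_age': 30}])
table_pos, columns_pos = copy_target(statement, [('Alice', 30)])
```

Note that `data[0]` assumes at least one row, so an empty data set would raise `IndexError` in this sketch.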
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   436
9898
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   437
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   438
def _copyfrom_buffer_convert_None(value, **opts):
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   439
    '''Convert None value to "NULL"'''
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   440
    return 'NULL'
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   441
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   442
def _copyfrom_buffer_convert_number(value, **opts):
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   443
    '''Convert a number into its string representation'''
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   444
    return str(value)
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   445
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   446
def _copyfrom_buffer_convert_string(value, **opts):
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   447
    '''Convert string value.
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   448
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   449
    Recognized keywords:
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   450
    :encoding: resulting string encoding (default: utf-8)
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   451
    :replace_sep: character used when input contains characters
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   452
                  that conflict with the column separator.
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   453
    '''
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   454
    encoding = opts.get('encoding', 'utf-8')
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   455
    replace_sep = opts.get('replace_sep', None)
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   456
    # Remove separators used in string formatting
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   457
    for _char in (u'\t', u'\r', u'\n'):
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   458
        if _char in value:
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   459
            # If a replace_sep is given, replace
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   460
            # the separator
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   461
            # (and thus avoid empty buffer)
9900
9c7de09a6648 [dataimport] _create_copyfrom_buffer: raise ValueError if conversion cannot be performed
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9899
diff changeset
   462
            if replace_sep is None:
9c7de09a6648 [dataimport] _create_copyfrom_buffer: raise ValueError if conversion cannot be performed
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9899
diff changeset
   463
                raise ValueError('conflicting separator: '
9c7de09a6648 [dataimport] _create_copyfrom_buffer: raise ValueError if conversion cannot be performed
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9899
diff changeset
   464
                                 'you must provide the replace_sep option')
9c7de09a6648 [dataimport] _create_copyfrom_buffer: raise ValueError if conversion cannot be performed
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9899
diff changeset
   465
            value = value.replace(_char, replace_sep)
9898
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   466
    value = value.replace('\\', r'\\')
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   467
    if isinstance(value, unicode):
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   468
        value = value.encode(encoding)
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   469
    return value
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   470
9901
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   471
def _copyfrom_buffer_convert_date(value, **opts):
9898
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   472
    '''Convert date into "YYYY-MM-DD"'''
9901
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   473
    # Do not use strftime, as it fails for dates before 1900
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   474
    # (http://bugs.python.org/issue1777412)
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   475
    return '%04d-%02d-%02d' % (value.year, value.month, value.day)
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   476
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   477
def _copyfrom_buffer_convert_datetime(value, **opts):
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   478
    '''Convert datetime into "YYYY-MM-DD HH:MM:SS.UUUUUU"'''
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   479
    # Do not use strftime, as it fails for dates before 1900
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   480
    # (http://bugs.python.org/issue1777412)
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   481
    return '%s %s' % (_copyfrom_buffer_convert_date(value, **opts),
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   482
                      _copyfrom_buffer_convert_time(value, **opts))
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   483
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   484
def _copyfrom_buffer_convert_time(value, **opts):
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   485
    '''Convert time into "HH:MM:SS.UUUUUU"'''
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   486
    return '%02d:%02d:%02d.%06d' % (value.hour, value.minute,
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   487
                                    value.second, value.microsecond)
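A brief sketch of the strftime-free formatting used by the three converters above. It is written for Python 3 (an assumption; the module itself targets Python 2), and the `format_*` names are hypothetical stand-ins rather than functions from this module.

```python
from datetime import date, datetime, time


def format_date(value):
    # Plain %-formatting of the fields, so years before 1900 are not a problem.
    return '%04d-%02d-%02d' % (value.year, value.month, value.day)


def format_time(value):
    # Microseconds are always written with six digits.
    return '%02d:%02d:%02d.%06d' % (value.hour, value.minute,
                                    value.second, value.microsecond)


def format_datetime(value):
    # A datetime exposes both the date and the time fields,
    # so the two helpers can simply be combined.
    return '%s %s' % (format_date(value), format_time(value))


print(format_date(date(1789, 7, 14)))
print(format_time(time(23, 59, 1)))
print(format_datetime(datetime(1850, 1, 2, 3, 4, 5, 6)))
```

Composing the datetime form from the two smaller helpers keeps the output layout defined in one place for each part.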
9898
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   488
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   489
# (types, converter) list.
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   490
_COPYFROM_BUFFER_CONVERTERS = [
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   491
    (type(None), _copyfrom_buffer_convert_None),
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   492
    ((long, int, float), _copyfrom_buffer_convert_number),
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   493
    (basestring, _copyfrom_buffer_convert_string),
9901
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   494
    (datetime, _copyfrom_buffer_convert_datetime),
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   495
    (date, _copyfrom_buffer_convert_date),
161ec913aeec [dataimport] _create_copyfrom_buffer: fix datetime converter + add time converter
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9900
diff changeset
   496
    (time, _copyfrom_buffer_convert_time),
9898
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   497
]
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   498
9902
62c586f32f93 [dataimport] _create_copyfrom_buffer: do not ignore columns if data is a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9901
diff changeset
   499
def _create_copyfrom_buffer(data, columns=None, **convert_opts):
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   500
    """
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   501
    Create a StringIO buffer for 'COPY FROM' command.
9898
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   502
    Deals with Unicode, Int, Float, Date... (see ``_COPYFROM_BUFFER_CONVERTERS``)
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   503
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   504
    :data: a sequence of tuples, lists or dicts
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   505
    :columns: list of columns to consider (default to all columns)
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   506
    :convert_opts: keyword arguments given to converters
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   507
    """
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   508
    # Create a list rather than directly create a StringIO
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   509
    # to correctly write lines separated by '\n' in a single step
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   510
    rows = []
9902
62c586f32f93 [dataimport] _create_copyfrom_buffer: do not ignore columns if data is a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9901
diff changeset
   511
    if columns is None:
62c586f32f93 [dataimport] _create_copyfrom_buffer: do not ignore columns if data is a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9901
diff changeset
   512
        if isinstance(data[0], (tuple, list)):
62c586f32f93 [dataimport] _create_copyfrom_buffer: do not ignore columns if data is a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9901
diff changeset
   513
            columns = range(len(data[0]))
62c586f32f93 [dataimport] _create_copyfrom_buffer: do not ignore columns if data is a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9901
diff changeset
   514
        elif isinstance(data[0], dict):
62c586f32f93 [dataimport] _create_copyfrom_buffer: do not ignore columns if data is a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9901
diff changeset
   515
            columns = data[0].keys()
62c586f32f93 [dataimport] _create_copyfrom_buffer: do not ignore columns if data is a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9901
diff changeset
   516
        else:
62c586f32f93 [dataimport] _create_copyfrom_buffer: do not ignore columns if data is a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9901
diff changeset
   517
            raise ValueError('Could not get columns: you must provide columns.')
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   518
    for row in data:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   519
        # Iterate over the different columns and the different values
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   520
        # and try to convert them to a correct datatype.
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   521
        # If an error is raised, do not continue.
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   522
        formatted_row = []
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   523
        for col in columns:
8926
336e4971dc50 [dataimport] backout 6947201033be (related to #2788402)
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8900
diff changeset
   524
            try:
8834
6947201033be [dataimport] Handle various data formats when creating buffers from data.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8833
diff changeset
   525
                value = row[col]
8926
336e4971dc50 [dataimport] backout 6947201033be (related to #2788402)
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8900
diff changeset
   526
            except KeyError:
9898
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   527
                warnings.warn(u"Column %s is not accessible in row %s"
8926
336e4971dc50 [dataimport] backout 6947201033be (related to #2788402)
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8900
diff changeset
   528
                              % (col, row), RuntimeWarning)
9898
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   529
                # XXX 'value' set to None so that the import does not end in
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   530
                # error.
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   531
                # Instead, the extra keys are set to NULL from the
8926
336e4971dc50 [dataimport] backout 6947201033be (related to #2788402)
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8900
diff changeset
   532
                # database point of view.
336e4971dc50 [dataimport] backout 6947201033be (related to #2788402)
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8900
diff changeset
   533
                value = None
9898
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   534
            for types, converter in _COPYFROM_BUFFER_CONVERTERS:
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   535
                if isinstance(value, types):
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   536
                    value = converter(value, **convert_opts)
70056633085c [dataimport] _create_copyfrom_buffer: put converters into a list
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9827
diff changeset
   537
                    break
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   538
            else:
9900
9c7de09a6648 [dataimport] _create_copyfrom_buffer: raise ValueError if conversion cannot be performed
Alain Leufroy <alain.leufroy@logilab.fr>
parents: 9899
diff changeset
   539
                raise ValueError("Unsupported value type %s" % type(value))
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   540
            # We push the value to the new formatted row
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   541
            # if the value is not None and could be converted to a string.
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   542
            formatted_row.append(value)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   543
        rows.append('\t'.join(formatted_row))
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   544
    return StringIO('\n'.join(rows))
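The dispatch pattern above (a `(types, converter)` list walked with `for`/`else`) can be sketched in isolation as follows. This is a simplified Python 3 illustration, not the module's code: `CONVERTERS` and `make_copy_buffer` are hypothetical names, the `'NULL'` marker for `None` is an assumption (the real `_copyfrom_buffer_convert_None` is not shown in this section), and separator replacement inside strings is left out.

```python
import io
import warnings
from datetime import date, datetime


def _fmt_date(value):
    return '%04d-%02d-%02d' % (value.year, value.month, value.day)


def _fmt_datetime(value):
    return '%s %02d:%02d:%02d.%06d' % (
        _fmt_date(value), value.hour, value.minute,
        value.second, value.microsecond)


# (types, converter) pairs. datetime must come before date: datetime is a
# subclass of date, so isinstance(value, date) would match it as well.
CONVERTERS = [
    (type(None), lambda value: 'NULL'),  # assumption: NULL marker text
    ((int, float), str),
    (str, lambda value: value.replace('\\', r'\\')),
    (datetime, _fmt_datetime),
    (date, _fmt_date),
]


def make_copy_buffer(data, columns):
    """Return a text buffer with one tab-separated line per row."""
    lines = []
    for row in data:
        cells = []
        for col in columns:
            try:
                value = row[col]
            except KeyError:
                warnings.warn('Column %s is not accessible in row %s'
                              % (col, row), RuntimeWarning)
                value = None  # a missing key becomes NULL
            for types, converter in CONVERTERS:
                if isinstance(value, types):
                    cells.append(converter(value))
                    break
            else:
                # No converter matched: fail instead of writing bad data.
                raise ValueError('Unsupported value type %s' % type(value))
        lines.append('\t'.join(cells))
    return io.StringIO('\n'.join(lines))
```

Because `row[col]` works for both dicts (with key names) and tuples (with integer positions), the same loop serves both input shapes as long as `columns` matches the row type.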
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   545
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   546
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   547
# object stores #################################################################
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   548
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   549
class ObjectStore(object):
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   550
    """Store objects in memory for *faster* validation (development mode)
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   551
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   552
    But it will not enforce the constraints of the schema and hence will miss some problems
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   553
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   554
    >>> store = ObjectStore()
7158
0f31a50b144e [dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7118
diff changeset
   555
    >>> user = store.create_entity('CWUser', login=u'johndoe')
0f31a50b144e [dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7118
diff changeset
   556
    >>> group = store.create_entity('CWGroup', name=u'unknown')
0f31a50b144e [dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7118
diff changeset
   557
    >>> store.relate(user.eid, 'in_group', group.eid)
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   558
    """
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   559
    def __init__(self):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   560
        self.items = []
3003
2944ee420dca R [dataimport] rename uid to eid
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 2974
diff changeset
   561
        self.eids = {}
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   562
        self.types = {}
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   563
        self.relations = set()
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   564
        self.indexes = {}
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   565
6990
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   566
    def create_entity(self, etype, **data):
7159
3bcccd3ab6b6 [dataimport] ObjectStore.create_entity should return something that looks like an entity (eg no more using dict protocol to access to attributes)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7158
diff changeset
   567
        data = attrdict(data)
9905
1fa35cc06c69 [dataimport] remove dead code
Julien Cristau <julien.cristau@logilab.fr>
parents: 9904
diff changeset
   568
        data['eid'] = eid = len(self.items)
1fa35cc06c69 [dataimport] remove dead code
Julien Cristau <julien.cristau@logilab.fr>
parents: 9904
diff changeset
   569
        self.items.append(data)
6990
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   570
        self.eids[eid] = data
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   571
        self.types.setdefault(etype, []).append(eid)
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   572
        return data
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   573
8832
26cdfc6dd6f8 [dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8807
diff changeset
   574
    def relate(self, eid_from, rtype, eid_to, **kwargs):
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   575
        """Add new relation"""
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   576
        relation = eid_from, rtype, eid_to
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   577
        self.relations.add(relation)
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   578
        return relation
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   579
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   580
    def commit(self):
9908
88bbb3abf30f [dataimport] Stop swallowing errors from commit/flush
Julien Cristau <julien.cristau@logilab.fr>
parents: 9907
diff changeset
   581
        """this commit method does nothing by default"""
88bbb3abf30f [dataimport] Stop swallowing errors from commit/flush
Julien Cristau <julien.cristau@logilab.fr>
parents: 9907
diff changeset
   582
        return
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   583
8833
39f81e2db2fc [dataimport] Add a ``flush`` method for all stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8832
diff changeset
   584
    def flush(self):
9910
55d9d483e7c3 [dataimport] don't commit on flush
Julien Cristau <julien.cristau@logilab.fr>
parents: 9908
diff changeset
   585
        """The method is provided so that all stores share a common API"""
55d9d483e7c3 [dataimport] don't commit on flush
Julien Cristau <julien.cristau@logilab.fr>
parents: 9908
diff changeset
   586
        pass
8833
39f81e2db2fc [dataimport] Add a ``flush`` method for all stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8832
diff changeset
   587
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   588
    @property
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   589
    def nb_inserted_entities(self):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   590
        return len(self.eids)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   591
    @property
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   592
    def nb_inserted_types(self):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   593
        return len(self.types)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   594
    @property
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   595
    def nb_inserted_relations(self):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   596
        return len(self.relations)
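To show how the in-memory bookkeeping behaves, here is a minimal stand-alone sketch. `MiniStore` and `AttrDict` are hypothetical stand-ins that mirror `ObjectStore` and `attrdict` closely enough to run without CubicWeb (Python 3 assumed); they are not the library's classes.

```python
class AttrDict(dict):
    """Dict whose keys can also be read as attributes (stand-in for attrdict)."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)


class MiniStore(object):
    """In-memory store: no schema checks, only bookkeeping."""

    def __init__(self):
        self.items = []
        self.eids = {}
        self.types = {}
        self.relations = set()

    def create_entity(self, etype, **data):
        data = AttrDict(data)
        # The eid is simply the position of the entity in self.items.
        data['eid'] = eid = len(self.items)
        self.items.append(data)
        self.eids[eid] = data
        self.types.setdefault(etype, []).append(eid)
        return data

    def relate(self, eid_from, rtype, eid_to):
        relation = eid_from, rtype, eid_to
        # A set is used, so adding the same relation twice has no effect.
        self.relations.add(relation)
        return relation


store = MiniStore()
user = store.create_entity('CWUser', login='johndoe')
group = store.create_entity('CWGroup', name='unknown')
store.relate(user.eid, 'in_group', group.eid)
store.relate(user.eid, 'in_group', group.eid)  # duplicate, ignored by the set
```

Nothing here validates entity types or relation names, which is the trade-off the class docstring describes: quick feedback during development, at the cost of missing schema violations.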
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   597
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   598
class RQLObjectStore(ObjectStore):
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   599
    """ObjectStore that works with an actual RQL repository (production mode)"""
4136
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   600
9907
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   601
    def __init__(self, cnx, commit=None):
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   602
        if commit is not None:
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   603
            warnings.warn('[3.19] commit argument should not be specified '
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   604
                          'as the cnx object already provides it.',
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   605
                          DeprecationWarning, stacklevel=2)
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   606
        super(RQLObjectStore, self).__init__()
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   607
        self._cnx = cnx
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   608
        self._commit = commit or cnx.commit
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   609
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   610
    def commit(self):
9907
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   611
        return self._commit()
4136
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   612
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   613
    def rql(self, *args):
9907
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   614
        return self._cnx.execute(*args)
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   615
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   616
    @property
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   617
    def session(self):
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   618
        warnings.warn('[3.19] deprecated property.', DeprecationWarning,
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   619
                      stacklevel=2)
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   620
        return self._cnx.repo._get_session(self._cnx.sessionid)
4136
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   621
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   622
    def create_entity(self, *args, **kwargs):
9907
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   623
        entity = self._cnx.create_entity(*args, **kwargs)
4136
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   624
        self.eids[entity.eid] = entity
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   625
        self.types.setdefault(args[0], []).append(entity.eid)
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   626
        return entity
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   627
8832
26cdfc6dd6f8 [dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8807
diff changeset
   628
    def relate(self, eid_from, rtype, eid_to, **kwargs):
4721
8f63691ccb7f pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4613
diff changeset
   629
        eid_from, rtype, eid_to = super(RQLObjectStore, self).relate(
8832
26cdfc6dd6f8 [dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8807
diff changeset
   630
            eid_from, rtype, eid_to, **kwargs)
4136
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   631
        self.rql('SET X %s Y WHERE X eid %%(x)s, Y eid %%(y)s' % rtype,
7033
ddc1b4d80dbd [dataimport] remove eid_key deprecation warning and prints
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6990
diff changeset
   632
                 {'x': int(eid_from), 'y': int(eid_to)})
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   633
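The query string built in `relate` above relies on `%`-formatting: the doubled `%%` comes out of the first substitution as a literal `%`, so the `%(x)s` and `%(y)s` placeholders stay in place for the arguments dict passed alongside the query. A minimal sketch in plain Python, runnable without CubicWeb (`works_for` is a hypothetical relation type, not taken from this file):

```python
rtype = 'works_for'  # hypothetical relation type, for illustration only

# First substitution: only the bare %s is replaced; each %% becomes a single %.
query = 'SET X %s Y WHERE X eid %%(x)s, Y eid %%(y)s' % rtype

# The eids are coerced to int before being handed over as query arguments.
args = {'x': int('12'), 'y': int(34)}

print(query)
print(args)
```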
9907
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   634
    @deprecated("[3.19] use cnx.find(*args, **kwargs).entities() instead")
7116
dfd4680a23f0 [session] add find_entities and find_one_entity to session/request API (closes #1550045)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 7033
diff changeset
   635
    def find_entities(self, *args, **kwargs):
9907
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   636
        return self._cnx.find(*args, **kwargs).entities()
7116
dfd4680a23f0 [session] add find_entities and find_one_entity to session/request API (closes #1550045)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 7033
diff changeset
   637
9907
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   638
    @deprecated("[3.19] use cnx.find(*args, **kwargs).one() instead")
7116
dfd4680a23f0 [session] add find_entities and find_one_entity to session/request API (closes #1550045)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 7033
diff changeset
   639
    def find_one_entity(self, *args, **kwargs):
9907
696b81eba218 [dataimport] Update RQLObjectStore to Connection API
Julien Cristau <julien.cristau@logilab.fr>
parents: 9906
diff changeset
   640
        return self._cnx.find(*args, **kwargs).one()
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   641
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   642
# the import controller ########################################################
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   643
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   644
class CWImportController(object):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   645
    """Controller of the data import process.
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   646
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   647
    >>> ctl = CWImportController(store)
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   648
    >>> ctl.generators = list_of_data_generators
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   649
    >>> ctl.data = dict_of_data_tables
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   650
    >>> ctl.run()
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   651
    """
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   652
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   653
    def __init__(self, store, askerror=0, catcherrors=None, tell=tell,
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   654
                 commitevery=50):
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   655
        self.store = store
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   656
        self.generators = None
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   657
        self.data = {}
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   658
        self.errors = None
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   659
        self.askerror = askerror
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   660
        if catcherrors is None:
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   661
            catcherrors = askerror
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   662
        self.catcherrors = catcherrors
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   663
        self.commitevery = commitevery # set to None to do a single commit
3029
bc573d5fb5b7 F [devtools] by default dataimport prints message on stdout
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 3003
diff changeset
   664
        self._tell = tell
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   665
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   666
    def check(self, type, key, value):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   667
        self._checks.setdefault(type, {}).setdefault(key, []).append(value)
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   668
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   669
    def check_map(self, entity, key, map, default):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   670
        try:
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   671
            entity[key] = map[entity[key]]
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   672
        except KeyError:
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   673
            self.check(key, entity[key], None)
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   674
            entity[key] = default
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   675
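How `check` and `check_map` cooperate can be sketched in isolation. The class below is a hypothetical stand-in that copies only those two methods and initialises `_checks` itself (in the controller this is done inside `run`); the country data is invented for the example.

```python
class CheckRecorder(object):
    """Hypothetical stand-in holding only check() and check_map()."""

    def __init__(self):
        self._checks = {}

    def check(self, type, key, value):
        # nested buckets: {type: {key: [values...]}}
        self._checks.setdefault(type, {}).setdefault(key, []).append(value)

    def check_map(self, entity, key, map, default):
        try:
            entity[key] = map[entity[key]]
        except KeyError:
            # unmapped value: remember it, then fall back to the default
            self.check(key, entity[key], None)
            entity[key] = default


recorder = CheckRecorder()
countries = {'FR': 'France'}           # invented mapping
known = {'country': 'FR'}
unknown = {'country': 'XX'}
recorder.check_map(known, 'country', countries, 'unknown')
recorder.check_map(unknown, 'country', countries, 'unknown')
```

After these calls the mapped row carries the translated value, the unmapped row carries the default, and the offending source value is kept in `_checks` for later reporting.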
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   676
    def record_error(self, key, msg=None, type=None, value=None, tb=None):
4186
ca7e526b07b6 import cleanup, check data file exists
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4173
diff changeset
   677
        tmp = StringIO()
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   678
        if type is None:
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   679
            traceback.print_exc(file=tmp)
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   680
        else:
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   681
            traceback.print_exception(type, value, tb, file=tmp)
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   682
        # append the traceback as one list so it counts as one error, not <nb lines>
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   683
        errorlog = self.errors.setdefault(key, [])
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   684
        if msg is None:
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   685
            errorlog.append(tmp.getvalue().splitlines())
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   686
        else:
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   687
            errorlog.append( (msg, tmp.getvalue().splitlines()) )
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   688
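The error-recording pattern above, capturing the current traceback into an in-memory buffer and storing it as a single list entry, can be tried on its own. This is a reduced sketch under stated assumptions: it uses a module-level `errors` dict instead of the controller attribute, handles only the `type is None` branch, and imports `StringIO` from `io` so that it runs on Python 3 (the module shown here targets Python 2).

```python
import traceback
from io import StringIO  # Python 3 location; an assumption of this sketch

errors = {}


def record_error(key, msg=None):
    tmp = StringIO()
    # format the exception currently being handled into the buffer
    traceback.print_exc(file=tmp)
    errorlog = errors.setdefault(key, [])
    lines = tmp.getvalue().splitlines()
    # one entry per error, whatever the number of traceback lines
    errorlog.append(lines if msg is None else (msg, lines))


try:
    int('not a number')
except ValueError:
    record_error('parse', 'While parsing')
```

Each call adds exactly one entry under the key, which is what lets a later count of entries stand for a count of errors.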
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   689
    def run(self):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   690
        self.errors = {}
7171
4297be67bbe4 [dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7170
diff changeset
   691
        if self.commitevery is None:
4297be67bbe4 [dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7170
diff changeset
   692
            self.tell('Will commit all or nothing.')
4297be67bbe4 [dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7170
diff changeset
   693
        else:
4297be67bbe4 [dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7170
diff changeset
   694
            self.tell('Will commit every %s iterations' % self.commitevery)
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   695
        for func, checks in self.generators:
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   696
            self._checks = {}
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   697
            func_name = func.__name__
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   698
            self.tell("Run import function '%s'..." % func_name)
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   699
            try:
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   700
                func(self)
7815
2a164a9cf81c [exceptions] stop catching any exception in various places (closes #1942716)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7471
diff changeset
   701
            except Exception:
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   702
                if self.catcherrors:
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   703
                    self.record_error(func_name, 'While calling %s' % func.__name__)
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   704
                else:
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   705
                    self._print_stats()
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   706
                    raise
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   707
            for key, func, title, help in checks:
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   708
                buckets = self._checks.get(key)
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   709
                if buckets:
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   710
                    err = func(buckets)
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   711
                    if err:
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   712
                        self.errors[title] = (help, err)
7171
4297be67bbe4 [dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7170
diff changeset
   713
        try:
4297be67bbe4 [dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7170
diff changeset
   714
            txuuid = self.store.commit()
4297be67bbe4 [dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7170
diff changeset
   715
            if txuuid is not None:
4297be67bbe4 [dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7170
diff changeset
   716
                self.tell('Transaction committed (txuuid: %s)' % txuuid)
8695
358d8bed9626 [toward-py3k] rewrite to "except AnException as exc:" (part of #2711624)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 8637
diff changeset
   717
        except QueryError as ex:
7171
4297be67bbe4 [dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7170
diff changeset
   718
            self.tell('Transaction aborted: %s' % ex)
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   719
        self._print_stats()
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   720
        if self.errors:
4721
8f63691ccb7f pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4613
diff changeset
   721
            if self.askerror == 2 or (self.askerror and confirm('Display errors ?')):
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   722
                from pprint import pformat
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   723
                for errkey, error in self.errors.items():
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   724
                    self.tell("\n%s (%s): %d\n" % (error[0], errkey, len(error[1])))
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   725
                    self.tell(pformat(sorted(error[1])))
7171
4297be67bbe4 [dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7170
diff changeset
   726
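From the loops in `run`, each entry of `generators` unpacks as `(func, checks)` and each check as `(key, func, title, help)`. The sketch below replays only the check-evaluation loop on invented data; `find_unmapped`, the titles and the recorded buckets are hypothetical and only illustrate the shapes involved.

```python
def find_unmapped(buckets):
    # buckets maps each offending value to the list recorded for it
    return sorted(buckets)


# what check() would have accumulated while an import function ran
recorded = {'country': {'XX': [None], 'ZZ': [None]}}

checks = [
    ('country', find_unmapped, 'Unknown countries',
     'Values missing from the mapping'),
]

errors = {}
for key, func, title, help in checks:
    buckets = recorded.get(key)
    if buckets:
        err = func(buckets)
        if err:
            errors[title] = (help, err)
```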
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   727
    def _print_stats(self):
8696
0bb18407c053 [toward py3k] rewrite dict.keys() and dict.values() (part of #2711624)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 8695
diff changeset
   728
        nberrors = sum(len(err) for err in self.errors.itervalues())
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   729
        self.tell('\nImport statistics: %i entities, %i types, %i relations and %i errors'
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   730
                  % (self.store.nb_inserted_entities,
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   731
                     self.store.nb_inserted_types,
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   732
                     self.store.nb_inserted_relations,
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   733
                     nberrors))
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   734
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   735
    def get_data(self, key):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   736
        return self.data.get(key)
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   737
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   738
    def index(self, name, key, value, unique=False):
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   739
        """create a new index
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   740
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   741
        If unique is True, only the first occurrence of a value is kept; later ones are ignored.
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   742
        """
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   743
        if unique:
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   744
            try:
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   745
                if value in self.store.indexes[name][key]:
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   746
                    return
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   747
            except KeyError:
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   748
                # this is necessarily the first occurrence, so carry on
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   749
                pass
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   750
        self.store.indexes.setdefault(name, {}).setdefault(key, []).append(value)
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   751
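The effect of the `unique` flag in `index` is easiest to see on a small example. The sketch below copies the method body into a free function working on a module-level `indexes` dict (an assumption of the sketch; in the file the dict lives on the store), with invented names and values.

```python
indexes = {}


def index(name, key, value, unique=False):
    if unique:
        try:
            if value in indexes[name][key]:
                return  # this value is already indexed under that key
        except KeyError:
            pass  # nothing indexed under name/key yet
    indexes.setdefault(name, {}).setdefault(key, []).append(value)


index('person', 'smith', 1)
index('person', 'smith', 1)               # duplicate kept: unique is False
index('person', 'smith', 1, unique=True)  # skipped: value already present
index('person', 'jones', 2, unique=True)  # first occurrence, so it is added
```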
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   752
    def tell(self, msg):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   753
        self._tell(msg)
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   754
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   755
    def iter_and_commit(self, datakey):
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   756
        """iter rows, triggering commit every self.commitevery iterations"""
6136
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   757
        if self.commitevery is None:
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   758
            return self.get_data(datakey)
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   759
        else:
6169
55378e1bab1b fix order of parameters in call to callfunc_every
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 6136
diff changeset
   760
            return callfunc_every(self.store.commit,
55378e1bab1b fix order of parameters in call to callfunc_every
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 6136
diff changeset
   761
                                  self.commitevery,
55378e1bab1b fix order of parameters in call to callfunc_every
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 6136
diff changeset
   762
                                  self.get_data(datakey))
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   763
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   764
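`iter_and_commit` delegates to `callfunc_every`, which is defined elsewhere in this module and is not shown in this part of the listing. The generator below is therefore a hypothetical stand-in, not that function: it only follows the argument order used in the call above (`func`, `number`, `iterable`) and assumes that the callback fires after every `number` items and once more at the end. The real helper may time its calls differently.

```python
def call_every(func, number, iterable):
    """Hypothetical helper: yield items one by one, calling `func` after
    every `number` items and once more when the iterable is exhausted."""
    for idx, item in enumerate(iterable, 1):
        yield item
        if idx % number == 0:
            func()
    func()


commits = []
rows = list(call_every(lambda: commits.append('commit'), 2, range(5)))
```

Because the commit is triggered from inside the generator, the consuming loop stays a plain `for row in ...` with no counter of its own.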
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   765
class NoHookRQLObjectStore(RQLObjectStore):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   766
    """ObjectStore that works with an actual RQL repository (production mode)"""
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   767
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
   768
    def __init__(self, cnx, metagen=None, baseurl=None):
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
   769
        super(NoHookRQLObjectStore, self).__init__(cnx)
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
   770
        self.source = cnx.repo.system_source
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
   771
        self.rschema = cnx.repo.schema.rschema
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   772
        self.add_relation = self.source.add_relation
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   773
        if metagen is None:
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
   774
            metagen = MetaGenerator(cnx, baseurl)
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   775
        self.metagen = metagen
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   776
        self._nb_inserted_entities = 0
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   777
        self._nb_inserted_types = 0
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   778
        self._nb_inserted_relations = 0
5054
cb066d29166a fix dataimport for 3.7.2
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4913
diff changeset
   779
        # deactivate security
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
   780
        cnx.read_security = False
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
   781
        cnx.write_security = False
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   782
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   783
    def create_entity(self, etype, **kwargs):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   784
        for k, v in kwargs.iteritems():
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   785
            kwargs[k] = getattr(v, 'eid', v)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   786
        entity, rels = self.metagen.base_etype_dicts(etype)
7471
bf9443f8725f [dataimport] fix #1732685: cached entity and shared cw_edited data with NoHookRQLObjectStore
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 7398
diff changeset
   787
        # make a copy to keep cached entity pristine
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   788
        entity = copy(entity)
7471
bf9443f8725f [dataimport] fix #1732685: cached entity and shared cw_edited data with NoHookRQLObjectStore
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 7398
diff changeset
   789
        entity.cw_edited = copy(entity.cw_edited)
5557
1a534c596bff [entity] continue cleanup of Entity/AnyEntity namespace
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5424
diff changeset
   790
        entity.cw_clear_relation_cache()
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   791
        self.metagen.init_entity(entity)
6142
8bc6eac1fac1 [session] cleanup hook / operation / entity edition api
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6122
diff changeset
   792
        entity.cw_edited.update(kwargs, skipsec=False)
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
   793
        cnx = self._cnx
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
   794
        self.source.add_entity(cnx, entity)
10190
252e8f7ff9ea [dataimport] source.add_info doesn't take anymore a 'complete' argument
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10189
diff changeset
   795
        self.source.add_info(cnx, entity, self.source, None)
8832
26cdfc6dd6f8 [dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8807
diff changeset
   796
        kwargs = dict()
26cdfc6dd6f8 [dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8807
diff changeset
   797
        if inspect.getargspec(self.add_relation).keywords:
8900
010a59e12d89 use cw_etype instead of __regid__
Pierre-Yves David <pierre-yves.david@logilab.fr>
parents: 8835
diff changeset
   798
            kwargs['subjtype'] = entity.cw_etype
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   799
        for rtype, targeteids in rels.iteritems():
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   800
            # targeteids may be a single eid or a list of eids
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   801
            inlined = self.rschema(rtype).inlined
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   802
            try:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   803
                for targeteid in targeteids:
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
   804
                    self.add_relation(cnx, entity.eid, rtype, targeteid,
8832
26cdfc6dd6f8 [dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8807
diff changeset
   805
                                      inlined, **kwargs)
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   806
            except TypeError:
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
   807
                self.add_relation(cnx, entity.eid, rtype, targeteids,
8832
26cdfc6dd6f8 [dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8807
diff changeset
   808
                                  inlined, **kwargs)
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   809
        self._nb_inserted_entities += 1
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   810
        return entity
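
The loop above accepts either a single eid or a list of eids per relation type and falls back on `TypeError` when the value is not iterable. Because the `try` block also wraps the `add_relation` call, a `TypeError` raised inside `add_relation` would trigger the same fallback. A minimal sketch that normalises the targets first is shown below; `as_eid_list` and `add_relations` are hypothetical helper names, not part of CubicWeb, and the simplified `add_relation` signature is an assumption.

```python
def as_eid_list(targeteids):
    """Return a list of eids, whether given a single eid or an iterable."""
    try:
        iterator = iter(targeteids)
    except TypeError:
        # not iterable: treat the value as a single eid
        return [targeteids]
    return list(iterator)


def add_relations(add_relation, subj_eid, rels):
    """Call add_relation once per (rtype, target eid) pair.

    `add_relation` is any callable taking (subj_eid, rtype, obj_eid);
    this simplified signature is an assumption made for the sketch.
    """
    for rtype, targeteids in sorted(rels.items()):
        for targeteid in as_eid_list(targeteids):
            add_relation(subj_eid, rtype, targeteid)
```

With this split, only the iterability check is guarded, so errors raised by the relation insertion itself propagate unchanged.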
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   811
8832
26cdfc6dd6f8 [dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8807
diff changeset
   812
    def relate(self, eid_from, rtype, eid_to, **kwargs):
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   813
        assert not rtype.startswith('reverse_')
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
   814
        self.add_relation(self._cnx, eid_from, rtype, eid_to,
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   815
                          self.rschema(rtype).inlined)
9597
8e9db17ce129 [dataimport] Correctly call rschema(rtype) in SqlGenObjectStore, closes #3694139
Vincent Michel <vincent.michel@logilab.fr>
parents: 9536
diff changeset
   816
        if self.rschema(rtype).symmetric:
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
   817
            self.add_relation(self._cnx, eid_to, rtype, eid_from,
9361
0542a85fe667 symmetric relations: replace bogus rql2sql translation by a hook
Aurelien Campeas <aurelien.campeas@logilab.fr>
parents: 9181
diff changeset
   818
                              self.rschema(rtype).inlined)
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   819
        self._nb_inserted_relations += 1
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   820
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   821
    @property
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   822
    def nb_inserted_entities(self):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   823
        return self._nb_inserted_entities
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   824
    @property
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   825
    def nb_inserted_types(self):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   826
        return self._nb_inserted_types
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   827
    @property
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   828
    def nb_inserted_relations(self):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   829
        return self._nb_inserted_relations
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   830
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   831
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   832
class MetaGenerator(object):
6427
c8a5ac2d1eaa [schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6173
diff changeset
   833
    META_RELATIONS = (META_RTYPES
c8a5ac2d1eaa [schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6173
diff changeset
   834
                      - VIRTUAL_RTYPES
c8a5ac2d1eaa [schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6173
diff changeset
   835
                      - set(('eid', 'cwuri',
c8a5ac2d1eaa [schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6173
diff changeset
   836
                             'is', 'is_instance_of', 'cw_source')))
c8a5ac2d1eaa [schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6173
diff changeset
   837
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
   838
    def __init__(self, cnx, baseurl=None):
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
   839
        self._cnx = cnx
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
   840
        self.source = cnx.repo.system_source
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   841
        self.time = datetime.now()
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   842
        if baseurl is None:
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
   843
            config = cnx.vreg.config
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   844
            baseurl = config['base-url'] or config.default_base_url()
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   845
        if not baseurl.endswith('/'):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   846
            baseurl += '/'
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   847
        self.baseurl = baseurl
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   848
        # attributes/relations shared by all entities of the same type
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   849
        self.etype_attrs = []
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   850
        self.etype_rels = []
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   851
        # attributes/relations specific to each entity
5054
cb066d29166a fix dataimport for 3.7.2
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4913
diff changeset
   852
        self.entity_attrs = ['cwuri']
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   853
        #self.entity_rels = [] XXX not handled (YAGNI?)
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
   854
        schema = cnx.vreg.schema
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   855
        rschema = schema.rschema
6427
c8a5ac2d1eaa [schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6173
diff changeset
   856
        for rtype in self.META_RELATIONS:
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   857
            if rschema(rtype).final:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   858
                self.etype_attrs.append(rtype)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   859
            else:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   860
                self.etype_rels.append(rtype)
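
The constructor above derives `META_RELATIONS` by set difference and then splits it into attributes and relations according to `rschema(rtype).final`. The following sketch reproduces that partition without CubicWeb. The contents of `META_RTYPES`, `VIRTUAL_RTYPES` and the `FINAL` map are illustrative stand-ins chosen for the example, and `split_meta_relations` is a hypothetical helper.

```python
# Illustrative stand-ins; the real sets are imported from CubicWeb.
META_RTYPES = set(['owned_by', 'created_by', 'creation_date',
                   'modification_date', 'eid', 'cwuri', 'is',
                   'is_instance_of', 'cw_source', 'identity', 'has_text'])
VIRTUAL_RTYPES = set(['identity', 'has_text'])
# Hypothetical map: True when the relation type is final (an attribute).
FINAL = {'creation_date': True, 'modification_date': True,
         'owned_by': False, 'created_by': False}

META_RELATIONS = (META_RTYPES
                  - VIRTUAL_RTYPES
                  - set(('eid', 'cwuri',
                         'is', 'is_instance_of', 'cw_source')))


def split_meta_relations(meta_relations, is_final):
    """Partition relation types into (attributes, relations)."""
    attrs, rels = [], []
    # sorted() only makes the result deterministic for the example
    for rtype in sorted(meta_relations):
        if is_final(rtype):
            attrs.append(rtype)
        else:
            rels.append(rtype)
    return attrs, rels


etype_attrs, etype_rels = split_meta_relations(META_RELATIONS,
                                               FINAL.__getitem__)
```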
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   861
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   862
    @cached
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   863
    def base_etype_dicts(self, etype):
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
   864
        entity = self._cnx.vreg['etypes'].etype_class(etype)(self._cnx)
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   865
        # entities are shallow-copied; avoid sharing a dict between copies
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   866
        del entity.cw_extra_kwargs
6142
8bc6eac1fac1 [session] cleanup hook / operation / entity edition api
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6122
diff changeset
   867
        entity.cw_edited = EditedEntity(entity)
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   868
        for attr in self.etype_attrs:
9697
d96b5e72717c [dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents: 9696
diff changeset
   869
            genfunc = self.generate(attr)
d96b5e72717c [dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents: 9696
diff changeset
   870
            if genfunc:
d96b5e72717c [dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents: 9696
diff changeset
   871
                entity.cw_edited.edited_attribute(attr, genfunc(entity))
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   872
        rels = {}
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   873
        for rel in self.etype_rels:
9697
d96b5e72717c [dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents: 9696
diff changeset
   874
            genfunc = self.generate(rel)
d96b5e72717c [dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents: 9696
diff changeset
   875
            if genfunc:
d96b5e72717c [dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents: 9696
diff changeset
   876
                rels[rel] = genfunc(entity)
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   877
        return entity, rels
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   878
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   879
    def init_entity(self, entity):
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
   880
        entity.eid = self.source.create_eid(self._cnx)
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   881
        for attr in self.entity_attrs:
9697
d96b5e72717c [dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents: 9696
diff changeset
   882
            genfunc = self.generate(attr)
d96b5e72717c [dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents: 9696
diff changeset
   883
            if genfunc:
d96b5e72717c [dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents: 9696
diff changeset
   884
                entity.cw_edited.edited_attribute(attr, genfunc(entity))
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   885
9697
d96b5e72717c [dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents: 9696
diff changeset
   886
    def generate(self, rtype):
d96b5e72717c [dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents: 9696
diff changeset
   887
        return getattr(self, 'gen_%s' % rtype, None)
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   888
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   889
    def gen_cwuri(self, entity):
9515
b0dd5b57d2d8 [dataimport, migration] more fixes in the spirit of a6c32edabc8d:
Dimitri Papadopoulos <dimitri.papadopoulos@cea.fr>
parents: 9440
diff changeset
   890
        return u'%s%s' % (self.baseurl, entity.eid)
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   891
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   892
    def gen_creation_date(self, entity):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   893
        return self.time
9697
d96b5e72717c [dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents: 9696
diff changeset
   894
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   895
    def gen_modification_date(self, entity):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   896
        return self.time
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   897
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   898
    def gen_created_by(self, entity):
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
   899
        return self._cnx.user.eid
9697
d96b5e72717c [dataimport] Add safety belt on "gen_rtype" in MetaGenerator, closes #3712892
Vincent Michel <vincent.michel@logilab.fr>
parents: 9696
diff changeset
   900
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   901
    def gen_owned_by(self, entity):
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
   902
        return self._cnx.user.eid
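
`MetaGenerator.generate` resolves a `gen_<rtype>` method by name and returns `None` when no such method exists, which lets callers skip relation types that have no generator. A minimal sketch of that dispatch, assuming a hypothetical `MiniMetaGenerator` that takes a plain eid instead of a CubicWeb entity:

```python
from datetime import datetime


class MiniMetaGenerator(object):
    """Hypothetical stand-in for MetaGenerator, with no CubicWeb dependency."""

    def __init__(self, baseurl):
        # normalise the base URL so that it always ends with a slash
        if not baseurl.endswith('/'):
            baseurl += '/'
        self.baseurl = baseurl
        self.time = datetime.now()

    def generate(self, rtype):
        # look up 'gen_<rtype>'; None means "no generator for this rtype"
        return getattr(self, 'gen_%s' % rtype, None)

    def gen_cwuri(self, eid):
        return u'%s%s' % (self.baseurl, eid)

    def gen_creation_date(self, eid):
        return self.time


def generated_values(gen, rtypes, eid):
    """Return {rtype: value} for every rtype that has a generator."""
    values = {}
    for rtype in rtypes:
        genfunc = gen.generate(rtype)
        if genfunc:
            values[rtype] = genfunc(eid)
    return values
```

Adding support for another metadata attribute then only requires defining a new `gen_<rtype>` method on a subclass.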
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   903
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   904
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   905
###########################################################################
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   906
## SQL object store #######################################################
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   907
###########################################################################
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   908
class SQLGenObjectStore(NoHookRQLObjectStore):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   909
    """Controller of the data import process. This version is based
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   910
    on direct insertions through SQL commands (COPY FROM or executemany).
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   911
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
   912
    >>> store = SQLGenObjectStore(cnx)
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   913
    >>> store.create_entity('Person', ...)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   914
    >>> store.flush()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   915
    """
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   916
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
   917
    def __init__(self, cnx, dump_output_dir=None, nb_threads_statement=3):
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   918
        """
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   919
        Initialize a SQLGenObjectStore.
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   920
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   921
        Parameters:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   922
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
   923
          - cnx: connection to the CubicWeb instance
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   924
          - dump_output_dir: a directory to dump failed statements
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   925
            for easier recovery. Default is None (no dump).
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   926
          - nb_threads_statement: number of threads used
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   927
            for SQL insertion (default is 3).
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   928
        """
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
   929
        super(SQLGenObjectStore, self).__init__(cnx)
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   930
        ### hijack default source
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   931
        self.source = SQLGenSourceWrapper(
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
   932
            self.source, cnx.vreg.schema,
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   933
            dump_output_dir=dump_output_dir,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   934
            nb_threads_statement=nb_threads_statement)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   935
        ### XXX This is done in super().__init__(), but should be
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   936
        ### redone here to link to the correct source
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   937
        self.add_relation = self.source.add_relation
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   938
        self.indexes_etypes = {}
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   939
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   940
    def flush(self):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   941
        """Flush data to the database"""
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   942
        self.source.flush()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   943
8832
26cdfc6dd6f8 [dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8807
diff changeset
   944
    def relate(self, subj_eid, rtype, obj_eid, **kwargs):
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   945
        if subj_eid is None or obj_eid is None:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   946
            return
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   947
        # XXX Could subjtype be inferred?
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
   948
        self.source.add_relation(self._cnx, subj_eid, rtype, obj_eid,
8832
26cdfc6dd6f8 [dataimport] Uniformize the API across the different stores.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8807
diff changeset
   949
                                 self.rschema(rtype).inlined, **kwargs)
9597
8e9db17ce129 [dataimport] Correctly call rschema(rtype) in SqlGenObjectStore, closes #3694139
Vincent Michel <vincent.michel@logilab.fr>
parents: 9536
diff changeset
   950
        if self.rschema(rtype).symmetric:
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
   951
            self.source.add_relation(self._cnx, obj_eid, rtype, subj_eid,
9361
0542a85fe667 symmetric relations: replace bogus rql2sql translation by a hook
Aurelien Campeas <aurelien.campeas@logilab.fr>
parents: 9181
diff changeset
   952
                                     self.rschema(rtype).inlined, **kwargs)
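
Both `relate` implementations shown here insert the reverse pair as well when the relation type is symmetric, and the SQL store additionally ignores calls where either eid is `None`. The sketch below models that behaviour with an in-memory list; the `rows` list and the `symmetric_rtypes` set are assumptions standing in for the SQL source and the schema lookup.

```python
def relate(rows, symmetric_rtypes, subj_eid, rtype, obj_eid):
    """Record a relation, plus its mirror when rtype is symmetric.

    `rows` is a plain list standing in for the SQL source, and
    `symmetric_rtypes` replaces the `rschema(rtype).symmetric` lookup.
    """
    if subj_eid is None or obj_eid is None:
        # nothing to insert when one end of the relation is missing
        return
    rows.append((subj_eid, rtype, obj_eid))
    if rtype in symmetric_rtypes:
        rows.append((obj_eid, rtype, subj_eid))


rows = []
relate(rows, set(['knows']), 1, 'knows', 2)     # symmetric: two rows
relate(rows, set(['knows']), 1, 'works_for', 3)  # not symmetric: one row
relate(rows, set(['knows']), None, 'knows', 4)   # skipped
```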
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   953
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   954
    def drop_indexes(self, etype):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   955
        """Drop indexes for a given entity type"""
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   956
        if etype not in self.indexes_etypes:
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
   957
            cu = self._cnx.cnxset.cu
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   958
            def index_to_attr(index):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   959
                """turn an index name to (database) attribute name"""
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   960
                return index.replace(etype.lower(), '').replace('idx', '').strip('_')
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   961
            indices = [(index, index_to_attr(index))
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   962
                       for index in self.source.dbhelper.list_indices(cu, etype)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   963
                       # Do not consider 'cw_etype_pkey' index
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   964
                       if not index.endswith('key')]
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   965
            self.indexes_etypes[etype] = indices
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   966
        for index, attr in self.indexes_etypes[etype]:
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
   967
            self._cnx.system_sql('DROP INDEX %s' % index)
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   968
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   969
    def create_indexes(self, etype):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   970
        """Recreate indexes for a given entity type"""
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   971
        for index, attr in self.indexes_etypes.get(etype, []):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   972
            sql = 'CREATE INDEX %s ON cw_%s(%s)' % (index, etype, attr)
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
   973
            self._cnx.system_sql(sql)
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   974
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   975
###########################################################################
## SQL Source #############################################################
###########################################################################

class SQLGenSourceWrapper(object):

    def __init__(self, system_source, schema,
                 dump_output_dir=None, nb_threads_statement=3):
        self.system_source = system_source
        self._sql = threading.local()
        # Explicitly backport attributes from system source
        self._storage_handler = self.system_source._storage_handler
        self.preprocess_entity = self.system_source.preprocess_entity
        self.sqlgen = self.system_source.sqlgen
        self.uri = self.system_source.uri
        self.eid = self.system_source.eid
        # Directory to write temporary files
        self.dump_output_dir = dump_output_dir
        # Allow code to run with the SQLite backend, which does
        # not support (yet...) copy_from
        # XXX Should be dealt with in logilab.database
        spcfrom = system_source.dbhelper.dbapi_module.support_copy_from
        self.support_copy_from = spcfrom
        self.dbencoding = system_source.dbhelper.dbencoding
        self.nb_threads_statement = nb_threads_statement
        # initialize thread-local data for main thread
        self.init_thread_locals()
        self._inlined_rtypes_cache = {}
        self._fill_inlined_rtypes_cache(schema)
        self.schema = schema
        self.do_fti = False

    def _fill_inlined_rtypes_cache(self, schema):
        cache = self._inlined_rtypes_cache
        for eschema in schema.entities():
            for rschema in eschema.ordered_relations():
                if rschema.inlined:
                    cache[eschema.type] = SQL_PREFIX + rschema.type

    def init_thread_locals(self):
        """initializes thread-local data"""
        self._sql.entities = defaultdict(list)
        self._sql.relations = {}
        self._sql.inlined_relations = {}
        # keep track, for each eid, of the corresponding data dict
        self._sql.eid_insertdicts = {}

    def flush(self):
        print 'starting flush'
        _entities_sql = self._sql.entities
        _relations_sql = self._sql.relations
        _inlined_relations_sql = self._sql.inlined_relations
        _insertdicts = self._sql.eid_insertdicts
        try:
            # for each inlined relation, check whether we are also creating
            # the host entity (i.e. the subject of the relation).
            # In that case, simply update the insert dict, which removes
            # the need for a separate UPDATE statement
            for statement, datalist in _inlined_relations_sql.iteritems():
                new_datalist = []
                # for a given inlined relation,
                # browse each pair to be inserted
                for data in datalist:
                    keys = list(data)
                    # For inlined relations, there are only two cases:
                    # (rtype, cw_eid) or (cw_eid, rtype)
                    if keys[0] == 'cw_eid':
                        rtype = keys[1]
                    else:
                        rtype = keys[0]
                    updated_eid = data['cw_eid']
                    if updated_eid in _insertdicts:
                        _insertdicts[updated_eid][rtype] = data[rtype]
                    else:
                        # could not find corresponding insert dict, keep the
                        # UPDATE query
                        new_datalist.append(data)
                _inlined_relations_sql[statement] = new_datalist
            _import_statements(self.system_source.get_connection,
                               _entities_sql.items()
                               + _relations_sql.items()
                               + _inlined_relations_sql.items(),
                               dump_output_dir=self.dump_output_dir,
                               nb_threads=self.nb_threads_statement,
                               support_copy_from=self.support_copy_from,
                               encoding=self.dbencoding)
        finally:
            _entities_sql.clear()
            _relations_sql.clear()
            _insertdicts.clear()
            _inlined_relations_sql.clear()

    def add_relation(self, cnx, subject, rtype, object,
                     inlined=False, **kwargs):
        if inlined:
            _sql = self._sql.inlined_relations
            data = {'cw_eid': subject, SQL_PREFIX + rtype: object}
            subjtype = kwargs.get('subjtype')
            if subjtype is None:
                # Try to infer it
                targets = [t.type for t in
                           self.schema.rschema(rtype).subjects()]
                if len(targets) == 1:
                    subjtype = targets[0]
                else:
                    raise ValueError('You should give the subject etype for '
                                     'inlined relation %s'
                                     ', as it cannot be inferred: '
                                     'this type is given as keyword argument '
3612b760488b [dataimport] Slight message modification in exception handling code.
Vladimir Popescu <vladimir.popescu@logilab.fr>
parents: 8834
diff changeset
  1086
                                     '``subjtype``' % rtype)
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1087
            statement = self.sqlgen.update(SQL_PREFIX + subjtype,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1088
                                           data, ['cw_eid'])
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1089
        else:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1090
            _sql = self._sql.relations
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1091
            data = {'eid_from': subject, 'eid_to': object}
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1092
            statement = self.sqlgen.insert('%s_relation' % rtype, data)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1093
        if statement in _sql:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1094
            _sql[statement].append(data)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1095
        else:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1096
            _sql[statement] = [data]
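
The subject-type fallback used for inlined relations above can be sketched in isolation. This is a minimal illustration, not the store's own code: `infer_subject_type` and its `subject_types` argument are hypothetical names, with `subject_types` standing in for the type names that `self.schema.rschema(rtype).subjects()` yields in the listing.

```python
def infer_subject_type(subject_types, rtype, subjtype=None):
    """Return the entity type whose table holds the inlined relation.

    ``subject_types`` is a hypothetical stand-in for the list of subject
    type names read from the schema in the listing.
    """
    if subjtype is not None:
        # An explicit keyword argument always wins.
        return subjtype
    if len(subject_types) == 1:
        # Exactly one possible subject: the choice is unambiguous.
        return subject_types[0]
    raise ValueError('cannot infer the subject etype for inlined relation %s; '
                     'pass it as the ``subjtype`` keyword argument' % rtype)


print(infer_subject_type(['Person'], 'works_for'))
print(infer_subject_type(['Person', 'Company'], 'located_in',
                         subjtype='Company'))
```

Raising when inference is ambiguous, rather than picking the first candidate, avoids silently updating the wrong table.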
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1097
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
  1098
    def add_entity(self, cnx, entity):
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1099
        with self._storage_handler(entity, 'added'):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1100
            attrs = self.preprocess_entity(entity)
8900
010a59e12d89 use cw_etype instead of __regid__
Pierre-Yves David <pierre-yves.david@logilab.fr>
parents: 8835
diff changeset
  1101
            rtypes = self._inlined_rtypes_cache.get(entity.cw_etype, ())
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1102
            if isinstance(rtypes, str):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1103
                rtypes = (rtypes,)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1104
            for rtype in rtypes:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1105
                if rtype not in attrs:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1106
                    attrs[rtype] = None
8900
010a59e12d89 use cw_etype instead of __regid__
Pierre-Yves David <pierre-yves.david@logilab.fr>
parents: 8835
diff changeset
  1107
            sql = self.sqlgen.insert(SQL_PREFIX + entity.cw_etype, attrs)
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1108
            self._sql.eid_insertdicts[entity.eid] = attrs
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1109
            self._append_to_entities(sql, attrs)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1110
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1111
    def _append_to_entities(self, sql, attrs):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1112
        self._sql.entities[sql].append(attrs)
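
The buffering idea behind `_append_to_entities` is to group parameter sets under the SQL statement they belong to, so each distinct statement can later be executed once for many rows. The sketch below assumes the buffer behaves like a `defaultdict(list)` (the listing appends without checking for the key); the statement text and the `append_to_entities` helper are illustrative only.

```python
from collections import defaultdict

# Assumption: the buffer creates an empty list the first time a statement
# is seen, as a defaultdict(list) does.
entities = defaultdict(list)


def append_to_entities(sql, attrs):
    """Queue one parameter set under the SQL statement it belongs to."""
    entities[sql].append(attrs)


# Illustrative statement text, not the exact output of the SQL generator.
insert_person = ('INSERT INTO cw_Person ( cw_eid, cw_name ) '
                 'VALUES ( %(cw_eid)s, %(cw_name)s )')
append_to_entities(insert_person, {'cw_eid': 1, 'cw_name': 'Ada'})
append_to_entities(insert_person, {'cw_eid': 2, 'cw_name': 'Grace'})

# One statement with two queued rows, ready for a single bulk execution.
for sql, rows in entities.items():
    print(sql, len(rows))
```

With a plain dict, the same effect needs an explicit membership test or `setdefault`, which is what the relation-handling code earlier in the listing does by hand.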
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1113
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
  1114
    def _handle_insert_entity_sql(self, cnx, sql, attrs):
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1115
        # We have to overwrite the source given in parameters
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1116
        # as here, we directly use the system source
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1117
        attrs['asource'] = self.system_source.uri
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1118
        self._append_to_entities(sql, attrs)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1119
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
  1120
    def _handle_is_relation_sql(self, cnx, sql, attrs):
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1121
        self._append_to_entities(sql, attrs)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1122
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
  1123
    def _handle_is_instance_of_sql(self, cnx, sql, attrs):
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1124
        self._append_to_entities(sql, attrs)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1125
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
  1126
    def _handle_source_relation_sql(self, cnx, sql, attrs):
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1127
        self._append_to_entities(sql, attrs)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1128
9522
8154a5748194 [dataimport] fix comment
Aurelien Campeas <aurelien.campeas@logilab.fr>
parents: 9425
diff changeset
  1129
    # add_info is _copypasted_ from the one in NativeSQLSource. We want it
8154a5748194 [dataimport] fix comment
Aurelien Campeas <aurelien.campeas@logilab.fr>
parents: 9425
diff changeset
  1130
    # here because it will use the _handlers of the SQLGenSourceWrapper, which
8154a5748194 [dataimport] fix comment
Aurelien Campeas <aurelien.campeas@logilab.fr>
parents: 9425
diff changeset
  1131
    # are not like the ones in the native source.
10190
252e8f7ff9ea [dataimport] source.add_info doesn't take anymore a 'complete' argument
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10189
diff changeset
  1132
    def add_info(self, cnx, entity, source, extid):
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1133
        """add type and source info for an eid into the system table"""
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1134
        # begin by inserting eid/type/source/extid into the entities table
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1135
        if extid is not None:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1136
            assert isinstance(extid, str)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1137
            extid = b64encode(extid)
8900
010a59e12d89 use cw_etype instead of __regid__
Pierre-Yves David <pierre-yves.david@logilab.fr>
parents: 8835
diff changeset
  1138
        attrs = {'type': entity.cw_etype, 'eid': entity.eid, 'extid': extid,
9827
c7ce035aede8 [dataimport] Drop reference to the 'source' column (closes #4067694).
Damien Garaud <damien.garaud@logilab.fr>
parents: 9770
diff changeset
  1139
                 'asource': source.uri}
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
  1140
        self._handle_insert_entity_sql(cnx, self.sqlgen.insert('entities', attrs), attrs)
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1141
        # insert core relations: is, is_instance_of and cw_source
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1142
        try:
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
  1143
            self._handle_is_relation_sql(cnx, 'INSERT INTO is_relation(eid_from,eid_to) VALUES (%s,%s)',
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
  1144
                                         (entity.eid, eschema_eid(cnx, entity.e_schema)))
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1145
        except IndexError:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1146
            # during schema serialization, skip
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1147
            pass
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1148
        else:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1149
            for eschema in entity.e_schema.ancestors() + [entity.e_schema]:
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
  1150
                self._handle_is_relation_sql(cnx,
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1151
                                             'INSERT INTO is_instance_of_relation(eid_from,eid_to) VALUES (%s,%s)',
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
  1152
                                             (entity.eid, eschema_eid(cnx, eschema)))
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1153
        if 'CWSource' in self.schema and source.eid is not None:  # else, cw < 3.10
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
  1154
            self._handle_is_relation_sql(cnx, 'INSERT INTO cw_source_relation(eid_from,eid_to) VALUES (%s,%s)',
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1155
                                         (entity.eid, source.eid))
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1156
        # now we can update the full text index
8900
010a59e12d89 use cw_etype instead of __regid__
Pierre-Yves David <pierre-yves.david@logilab.fr>
parents: 8835
diff changeset
  1157
        if self.do_fti and self.need_fti_indexation(entity.cw_etype):
10189
0b141ffcdd74 [dataimport] massive renaming of session to cnx
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10091
diff changeset
  1158
            self.index_entity(cnx, entity=entity)
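
The rows that `add_info` queues can be illustrated without a running repository. The following is a rough sketch under stated assumptions: `entity_row` and `is_instance_of_rows` are hypothetical helpers, schema eids are passed in as plain integers instead of being looked up with `eschema_eid`, and the external id is encoded to bytes first because Python 3's `b64encode` requires bytes (the listing targets Python 2 `str`).

```python
from base64 import b64encode


def entity_row(etype, eid, source_uri, extid=None):
    """Build the parameter dict for one row of the ``entities`` table."""
    if extid is not None:
        # Python 3 adaptation: encode to bytes, then back to text.
        extid = b64encode(extid.encode('utf-8')).decode('ascii')
    return {'type': etype, 'eid': eid, 'extid': extid, 'asource': source_uri}


def is_instance_of_rows(eid, etype_eid, ancestor_eids):
    """One (eid_from, eid_to) pair per ancestor type, then the type itself."""
    return [(eid, target) for target in list(ancestor_eids) + [etype_eid]]


row = entity_row('Person', 42, 'system', extid='ext-1')
pairs = is_instance_of_rows(42, etype_eid=7, ancestor_eids=[3, 5])
print(row)
print(pairs)
```

Listing ancestors before the concrete type mirrors the `ancestors() + [entity.e_schema]` loop in the listing, so an entity is recorded as an instance of every type it inherits from.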