dataimport.py
author Pierre-Yves David <pierre-yves.david@logilab.fr>
Fri, 22 Mar 2013 18:51:03 +0100
changeset 8763 0144b26e958d
parent 8724 1beab80aed23
child 8796 2884368b263c
permissions -rw-r--r--
[transaction] handle ``mode`` default value in Transaction The transaction mode is now explicitly passed at creation time and always read from the Transaction object. Note that there is a slight behavior change. The transaction mode is now set at the creation of the transaction. Changes made to the default value have no longer any effect on existing transaction.
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
     1
# -*- coding: utf-8 -*-
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
     2
# copyright 2003-2012 LOGILAB S.A. (Paris, FRANCE), all rights reserved.
5421
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5097
diff changeset
     3
# contact http://www.logilab.fr/ -- mailto:contact@logilab.fr
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5097
diff changeset
     4
#
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5097
diff changeset
     5
# This file is part of CubicWeb.
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5097
diff changeset
     6
#
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5097
diff changeset
     7
# CubicWeb is free software: you can redistribute it and/or modify it under the
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5097
diff changeset
     8
# terms of the GNU Lesser General Public License as published by the Free
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5097
diff changeset
     9
# Software Foundation, either version 2.1 of the License, or (at your option)
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5097
diff changeset
    10
# any later version.
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5097
diff changeset
    11
#
5424
8ecbcbff9777 replace logilab-common by CubicWeb in disclaimer
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5421
diff changeset
    12
# CubicWeb is distributed in the hope that it will be useful, but WITHOUT
5421
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5097
diff changeset
    13
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5097
diff changeset
    14
# FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5097
diff changeset
    15
# details.
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5097
diff changeset
    16
#
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5097
diff changeset
    17
# You should have received a copy of the GNU Lesser General Public License along
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5097
diff changeset
    18
# with CubicWeb.  If not, see <http://www.gnu.org/licenses/>.
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    19
"""This module provides tools to import tabular data.
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    20
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    21
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    22
Example of use (run this with `cubicweb-ctl shell instance import-script.py`):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    23
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    24
.. sourcecode:: python
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    25
7158
0f31a50b144e [dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7118
diff changeset
    26
  from cubicweb.dataimport import *
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    27
  # define data generators
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    28
  GENERATORS = []
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    29
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    30
  USERS = [('Prenom', 'firstname', ()),
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    31
           ('Nom', 'surname', ()),
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    32
           ('Identifiant', 'login', ()),
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    33
           ]
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    34
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    35
  def gen_users(ctl):
6133
6f3eabbbdf2e use iter_and_commit in example
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 6122
diff changeset
    36
      for row in ctl.iter_and_commit('utilisateurs'):
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    37
          entity = mk_entity(row, USERS)
7158
0f31a50b144e [dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7118
diff changeset
    38
          entity['upassword'] = 'motdepasse'
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    39
          ctl.check('login', entity['login'], None)
7158
0f31a50b144e [dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7118
diff changeset
    40
          entity = ctl.store.create_entity('CWUser', **entity)
0f31a50b144e [dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7118
diff changeset
    41
          email = ctl.store.create_entity('EmailAddress', address=row['email'])
0f31a50b144e [dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7118
diff changeset
    42
          ctl.store.relate(entity.eid, 'use_email', email.eid)
3003
2944ee420dca R [dataimport] rename uid to eid
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 2974
diff changeset
    43
          ctl.store.rql('SET U in_group G WHERE G name "users", U eid %(x)s', {'x':entity['eid']})
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    44
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    45
  CHK = [('login', check_doubles, 'Utilisateurs Login',
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    46
          'Deux utilisateurs ne devraient pas avoir le même login.'),
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    47
         ]
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    48
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    49
  GENERATORS.append( (gen_users, CHK) )
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    50
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    51
  # create controller
6122
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
    52
  if 'cnx' in globals():
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
    53
      ctl = CWImportController(RQLObjectStore(cnx))
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
    54
  else:
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
    55
      print 'debug mode (not connected)'
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
    56
      print 'run through cubicweb-ctl shell to access an instance'
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
    57
      ctl = CWImportController(ObjectStore())
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
    58
  ctl.askerror = 1
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    59
  ctl.generators = GENERATORS
6122
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
    60
  ctl.data['utilisateurs'] = lazytable(ucsvreader(open('users.csv')))
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    61
  # run
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    62
  ctl.run()
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    63
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
    64
.. BUG file with one column are not parsable
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
    65
.. TODO rollback() invocation is not possible yet
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    66
"""
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    67
__docformat__ = "restructuredtext en"
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    68
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
    69
import csv
4186
ca7e526b07b6 import cleanup, check data file exists
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4173
diff changeset
    70
import sys
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
    71
import threading
4186
ca7e526b07b6 import cleanup, check data file exists
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4173
diff changeset
    72
import traceback
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
    73
import cPickle
4186
ca7e526b07b6 import cleanup, check data file exists
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4173
diff changeset
    74
import os.path as osp
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
    75
from collections import defaultdict
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
    76
from contextlib import contextmanager
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
    77
from copy import copy
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
    78
from datetime import date, datetime
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
    79
from time import asctime
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
    80
from StringIO import StringIO
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    81
7159
3bcccd3ab6b6 [dataimport] ObjectStore.create_entity should return something that looks like an entity (eg no more using dict protocol to access to attributes)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7158
diff changeset
    82
from logilab.common import shellutils, attrdict
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
    83
from logilab.common.date import strptime
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
    84
from logilab.common.decorators import cached
4136
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
    85
from logilab.common.deprecation import deprecated
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
    86
7171
4297be67bbe4 [dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7170
diff changeset
    87
from cubicweb import QueryError
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
    88
from cubicweb.utils import make_uid
7158
0f31a50b144e [dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7118
diff changeset
    89
from cubicweb.schema import META_RTYPES, VIRTUAL_RTYPES
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
    90
from cubicweb.server.edition import EditedEntity
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
    91
from cubicweb.server.sqlutils import SQL_PREFIX
5066
bf5cbc351e99 [repo] move eschema_eid function from hooks.metadata to server.utils
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5063
diff changeset
    92
from cubicweb.server.utils import eschema_eid
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
    93
7158
0f31a50b144e [dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7118
diff changeset
    94
6122
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
    95
def count_lines(stream_or_filename):
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
    96
    if isinstance(stream_or_filename, basestring):
6492
47a284c0d012 fix some pylint detected errors
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6427
diff changeset
    97
        f = open(stream_or_filename)
6122
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
    98
    else:
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
    99
        f = stream_or_filename
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
   100
        f.seek(0)
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
   101
    for i, line in enumerate(f):
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
   102
        pass
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
   103
    f.seek(0)
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
   104
    return i+1
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
   105
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
   106
def ucsvreader_pb(stream_or_path, encoding='utf-8', separator=',', quote='"',
4136
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   107
                  skipfirst=False, withpb=True):
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   108
    """same as ucsvreader but a progress bar is displayed as we iter on rows"""
6122
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
   109
    if isinstance(stream_or_path, basestring):
6492
47a284c0d012 fix some pylint detected errors
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6427
diff changeset
   110
        if not osp.exists(stream_or_path):
47a284c0d012 fix some pylint detected errors
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6427
diff changeset
   111
            raise Exception("file doesn't exists: %s" % stream_or_path)
6122
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
   112
        stream = open(stream_or_path)
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
   113
    else:
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
   114
        stream = stream_or_path
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
   115
    rowcount = count_lines(stream)
4136
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   116
    if skipfirst:
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   117
        rowcount -= 1
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   118
    if withpb:
4140
46ddd27a4ca4 tweaks output
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4136
diff changeset
   119
        pb = shellutils.ProgressBar(rowcount, 50)
6122
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
   120
    for urow in ucsvreader(stream, encoding, separator, quote, skipfirst):
4136
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   121
        yield urow
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   122
        if withpb:
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   123
            pb.update()
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   124
    print ' %s rows imported' % rowcount
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   125
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   126
def ucsvreader(stream, encoding='utf-8', separator=',', quote='"',
8637
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   127
               skipfirst=False, ignore_errors=False):
4136
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   128
    """A csv reader that accepts files with any encoding and outputs unicode
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   129
    strings
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   130
    """
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   131
    it = iter(csv.reader(stream, delimiter=separator, quotechar=quote))
8637
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   132
    if not ignore_errors:
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   133
        if skipfirst:
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   134
            it.next()
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   135
        for row in it:
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   136
            yield [item.decode(encoding) for item in row]
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   137
    else:
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   138
        # Skip first line
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   139
        try:
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   140
            row = it.next()
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   141
        except csv.Error:
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   142
            pass
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   143
        # Safe version, that can cope with error in CSV file
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   144
        while True:
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   145
            try:
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   146
                row = it.next()
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   147
            # End of CSV, break
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   148
            except StopIteration:
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   149
                break
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   150
            # Error in CSV, ignore line and continue
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   151
            except csv.Error:
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   152
                continue
e16561083d84 [dataimport] new ignore_errors argument to ucsvreader, default to False. Closes #2547200
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8631
diff changeset
   153
            yield [item.decode(encoding) for item in row]
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   154
6136
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   155
def callfunc_every(func, number, iterable):
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   156
    """yield items of `iterable` one by one and call function `func`
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   157
    every `number` iterations. Always call function `func` at the end.
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   158
    """
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   159
    for idx, item in enumerate(iterable):
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   160
        yield item
7227
23d9c1f89c96 [dataimport] actually commit every desired number...
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7214
diff changeset
   161
        if not idx % number:
6136
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   162
            func()
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   163
    func()
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   164
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   165
def lazytable(reader):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   166
    """The first row is taken to be the header of the table and
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   167
    used to output a dict for each row of data.
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   168
6122
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
   169
    >>> data = lazytable(ucsvreader(open(filename)))
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   170
    """
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   171
    header = reader.next()
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   172
    for row in reader:
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   173
        yield dict(zip(header, row))
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   174
7201
52f5831400b2 [dataimport] allow to specify columns on which result should be sorted in lazydbtable
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7171
diff changeset
   175
def lazydbtable(cu, table, headers, orderby=None):
7160
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   176
    """return an iterator on rows of a sql table. On each row, fetch columns
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   177
    defined in headers and return values as a dictionary.
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   178
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   179
    >>> data = lazydbtable(cu, 'experimentation', ('id', 'nickname', 'gps'))
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   180
    """
7201
52f5831400b2 [dataimport] allow to specify columns on which result should be sorted in lazydbtable
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7171
diff changeset
   181
    sql = 'SELECT %s FROM %s' % (','.join(headers), table,)
52f5831400b2 [dataimport] allow to specify columns on which result should be sorted in lazydbtable
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7171
diff changeset
   182
    if orderby:
52f5831400b2 [dataimport] allow to specify columns on which result should be sorted in lazydbtable
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7171
diff changeset
   183
        sql += ' ORDER BY %s' % ','.join(orderby)
52f5831400b2 [dataimport] allow to specify columns on which result should be sorted in lazydbtable
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7171
diff changeset
   184
    cu.execute(sql)
7160
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   185
    while True:
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   186
        row = cu.fetchone()
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   187
        if row is None:
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   188
            break
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   189
        yield dict(zip(headers, row))
923013173031 [dataimport] new 'lazydbtable' generator function to feed data from a database table
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7159
diff changeset
   190
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   191
def mk_entity(row, map):
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   192
    """Return a dict made from sanitized mapped values.
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   193
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   194
    ValueError can be raised on unexpected values found in checkers
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   195
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   196
    >>> row = {'myname': u'dupont'}
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   197
    >>> map = [('myname', u'name', (call_transform_method('title'),))]
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   198
    >>> mk_entity(row, map)
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   199
    {'name': u'Dupont'}
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   200
    >>> row = {'myname': u'dupont', 'optname': u''}
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   201
    >>> map = [('myname', u'name', (call_transform_method('title'),)),
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   202
    ...        ('optname', u'MARKER', (optional,))]
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   203
    >>> mk_entity(row, map)
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   204
    {'name': u'Dupont', 'optname': None}
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   205
    """
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   206
    res = {}
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   207
    assert isinstance(row, dict)
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   208
    assert isinstance(map, list)
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   209
    for src, dest, funcs in map:
8406
f3bc8ca0b715 [data import] don't crash if value isn't in the file, simply no key/value in the output dict. Closes #2356328
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8403
diff changeset
   210
        try:
f3bc8ca0b715 [data import] don't crash if value isn't in the file, simply no key/value in the output dict. Closes #2356328
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8403
diff changeset
   211
            res[dest] = row[src]
f3bc8ca0b715 [data import] don't crash if value isn't in the file, simply no key/value in the output dict. Closes #2356328
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8403
diff changeset
   212
        except KeyError:
f3bc8ca0b715 [data import] don't crash if value isn't in the file, simply no key/value in the output dict. Closes #2356328
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 8403
diff changeset
   213
            continue
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   214
        try:
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   215
            for func in funcs:
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   216
                res[dest] = func(res[dest])
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   217
                if res[dest] is None:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   218
                    break
8695
358d8bed9626 [toward-py3k] rewrite to "except AnException as exc:" (part of #2711624)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 8637
diff changeset
   219
        except ValueError as err:
7170
32b5d9d43a7e [dataimport] propagate stack
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7160
diff changeset
   220
            raise ValueError('error with %r field: %s' % (src, err)), None, sys.exc_info()[-1]
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   221
    return res
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   222
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   223
# user interactions ############################################################
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   224
3029
bc573d5fb5b7 F [devtools] by default dataimport prints message on stdout
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 3003
diff changeset
   225
def tell(msg):
bc573d5fb5b7 F [devtools] by default dataimport prints message on stdout
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 3003
diff changeset
   226
    print msg
bc573d5fb5b7 F [devtools] by default dataimport prints message on stdout
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 3003
diff changeset
   227
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   228
def confirm(question):
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   229
    """A confirm function that asks for yes/no/abort and exits on abort."""
4721
8f63691ccb7f pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4613
diff changeset
   230
    answer = shellutils.ASK.ask(question, ('Y', 'n', 'abort'), 'Y')
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   231
    if answer == 'abort':
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   232
        sys.exit(1)
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   233
    return answer == 'Y'
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   234
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   235
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   236
class catch_error(object):
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   237
    """Helper for @contextmanager decorator."""
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   238
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   239
    def __init__(self, ctl, key='unexpected error', msg=None):
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   240
        self.ctl = ctl
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   241
        self.key = key
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   242
        self.msg = msg
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   243
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   244
    def __enter__(self):
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   245
        return self
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   246
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   247
    def __exit__(self, type, value, traceback):
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   248
        if type is not None:
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   249
            if issubclass(type, (KeyboardInterrupt, SystemExit)):
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   250
                return # re-raise
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   251
            if self.ctl.catcherrors:
4173
cfd5d3270f99 msg isn't defined there, but we've to give traceback information to record error
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4152
diff changeset
   252
                self.ctl.record_error(self.key, None, type, value, traceback)
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   253
                return True # silent
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   254
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   255
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   256
# base sanitizing/coercing functions ###########################################
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   257
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   258
def optional(value):
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   259
    """checker to filter optional field
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   260
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   261
    If value is undefined (ex: empty string), return None that will
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   262
    break the checkers validation chain
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   263
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   264
    General use is to add 'optional' check in first condition to avoid
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   265
    ValueError by further checkers
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   266
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   267
    >>> MAPPER = [(u'value', 'value', (optional, int))]
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   268
    >>> row = {'value': u'XXX'}
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   269
    >>> mk_entity(row, MAPPER)
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   270
    {'value': None}
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   271
    >>> row = {'value': u'100'}
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   272
    >>> mk_entity(row, MAPPER)
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   273
    {'value': 100}
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   274
    """
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   275
    if value:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   276
        return value
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   277
    return None
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   278
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   279
def required(value):
6136
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   280
    """raise ValueError if value is empty
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   281
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   282
    This check should be often found in last position in the chain.
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   283
    """
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   284
    if value:
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   285
        return value
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   286
    raise ValueError("required")
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   287
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   288
def todatetime(format='%d/%m/%Y'):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   289
    """return a transformation function to turn string input value into a
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   290
    `datetime.datetime` instance, using given format.
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   291
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   292
    Follow it by `todate` or `totime` functions from `logilab.common.date` if
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   293
    you want a `date`/`time` instance instead of `datetime`.
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   294
    """
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   295
    def coerce(value):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   296
        return strptime(value, format)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   297
    return coerce
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   298
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   299
def call_transform_method(methodname, *args, **kwargs):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   300
    """return value returned by calling the given method on input"""
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   301
    def coerce(value):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   302
        return getattr(value, methodname)(*args, **kwargs)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   303
    return coerce
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   304
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   305
def call_check_method(methodname, *args, **kwargs):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   306
    """check value returned by calling the given method on input is true,
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   307
    else raise ValueError
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   308
    """
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   309
    def check(value):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   310
        if getattr(value, methodname)(*args, **kwargs):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   311
            return value
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   312
        raise ValueError('%s not verified on %r' % (methodname, value))
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   313
    return check
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   314
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   315
# base integrity checking functions ############################################
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   316
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   317
def check_doubles(buckets):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   318
    """Extract the keys that have more than one item in their bucket."""
4721
8f63691ccb7f pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4613
diff changeset
   319
    return [(k, len(v)) for k, v in buckets.items() if len(v) > 1]
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   320
4136
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   321
def check_doubles_not_none(buckets):
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   322
    """Extract the keys that have more than one item in their bucket."""
4721
8f63691ccb7f pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4613
diff changeset
   323
    return [(k, len(v)) for k, v in buckets.items()
8f63691ccb7f pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4613
diff changeset
   324
            if k is not None and len(v) > 1]
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   325
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   326
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   327
# sql generator utility functions #############################################
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   328
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   329
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   330
def _import_statements(sql_connect, statements, nb_threads=3,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   331
                       dump_output_dir=None,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   332
                       support_copy_from=True, encoding='utf-8'):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   333
    """
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   334
    Import a bunch of sql statements, using different threads.
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   335
    """
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   336
    try:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   337
        chunksize = (len(statements) / nb_threads) + 1
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   338
        threads = []
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   339
        for i in xrange(nb_threads):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   340
            chunks = statements[i*chunksize:(i+1)*chunksize]
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   341
            thread = threading.Thread(target=_execmany_thread,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   342
                                      args=(sql_connect, chunks,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   343
                                            dump_output_dir,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   344
                                            support_copy_from,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   345
                                            encoding))
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   346
            thread.start()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   347
            threads.append(thread)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   348
        for t in threads:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   349
            t.join()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   350
    except Exception:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   351
        print 'Error in import statements'
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   352
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   353
def _execmany_thread_not_copy_from(cu, statement, data, table=None,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   354
                                   columns=None, encoding='utf-8'):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   355
    """ Execute thread without copy from
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   356
    """
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   357
    cu.executemany(statement, data)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   358
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   359
def _execmany_thread_copy_from(cu, statement, data, table,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   360
                               columns, encoding='utf-8'):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   361
    """ Execute thread with copy from
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   362
    """
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   363
    buf = _create_copyfrom_buffer(data, columns, encoding)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   364
    if buf is None:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   365
        _execmany_thread_not_copy_from(cu, statement, data)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   366
    else:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   367
        if columns is None:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   368
            cu.copy_from(buf, table, null='NULL')
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   369
        else:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   370
            cu.copy_from(buf, table, null='NULL', columns=columns)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   371
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   372
def _execmany_thread(sql_connect, statements, dump_output_dir=None,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   373
                     support_copy_from=True, encoding='utf-8'):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   374
    """
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   375
    Execute sql statement. If 'INSERT INTO', try to use 'COPY FROM' command,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   376
    or fallback to execute_many.
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   377
    """
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   378
    if support_copy_from:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   379
        execmany_func = _execmany_thread_copy_from
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   380
    else:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   381
        execmany_func = _execmany_thread_not_copy_from
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   382
    cnx = sql_connect()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   383
    cu = cnx.cursor()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   384
    try:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   385
        for statement, data in statements:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   386
            table = None
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   387
            columns = None
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   388
            try:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   389
                if not statement.startswith('INSERT INTO'):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   390
                    cu.executemany(statement, data)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   391
                    continue
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   392
                table = statement.split()[2]
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   393
                if isinstance(data[0], (tuple, list)):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   394
                    columns = None
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   395
                else:
8696
0bb18407c053 [toward py3k] rewrite dict.keys() and dict.values() (part of #2711624)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 8695
diff changeset
   396
                    columns = list(data[0])
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   397
                execmany_func(cu, statement, data, table, columns, encoding)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   398
            except Exception:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   399
                print 'unable to copy data into table %s', table
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   400
                # Error in import statement, save data in dump_output_dir
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   401
                if dump_output_dir is not None:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   402
                    pdata = {'data': data, 'statement': statement,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   403
                             'time': asctime(), 'columns': columns}
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   404
                    filename = make_uid()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   405
                    try:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   406
                        with open(osp.join(dump_output_dir,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   407
                                           '%s.pickle' % filename), 'w') as fobj:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   408
                            fobj.write(cPickle.dumps(pdata))
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   409
                    except IOError:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   410
                        print 'ERROR while pickling in', dump_output_dir, filename+'.pickle'
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   411
                        pass
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   412
                cnx.rollback()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   413
                raise
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   414
    finally:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   415
        cnx.commit()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   416
        cu.close()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   417
8631
1053b9d0fdf7 [dataimport] Allow to replace escape char in the create_copyfrom_buffer
Vincent Michel <vincent.michel@logilab.fr>
parents: 8625
diff changeset
   418
def _create_copyfrom_buffer(data, columns, encoding='utf-8', replace_sep=None):
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   419
    """
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   420
    Create a StringIO buffer for 'COPY FROM' command.
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   421
    Deals with Unicode, Int, Float, Date...
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   422
    """
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   423
    # Create a list rather than directly create a StringIO
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   424
    # to correctly write lines separated by '\n' in a single step
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   425
    rows = []
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   426
    if isinstance(data[0], (tuple, list)):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   427
        columns = range(len(data[0]))
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   428
    for row in data:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   429
        # Iterate over the different columns and the different values
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   430
        # and try to convert them to a correct datatype.
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   431
        # If an error is raised, do not continue.
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   432
        formatted_row = []
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   433
        for col in columns:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   434
            value = row[col]
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   435
            if value is None:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   436
                value = 'NULL'
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   437
            elif isinstance(value, (long, int, float)):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   438
                value = str(value)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   439
            elif isinstance(value, (str, unicode)):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   440
                # Remove separators used in string formatting
8631
1053b9d0fdf7 [dataimport] Allow to replace escape char in the create_copyfrom_buffer
Vincent Michel <vincent.michel@logilab.fr>
parents: 8625
diff changeset
   441
                for _char in (u'\t', u'\r', u'\n'):
1053b9d0fdf7 [dataimport] Allow to replace escape char in the create_copyfrom_buffer
Vincent Michel <vincent.michel@logilab.fr>
parents: 8625
diff changeset
   442
                    if _char in value:
1053b9d0fdf7 [dataimport] Allow to replace escape char in the create_copyfrom_buffer
Vincent Michel <vincent.michel@logilab.fr>
parents: 8625
diff changeset
   443
                        # If a replace_sep is given, replace
1053b9d0fdf7 [dataimport] Allow to replace escape char in the create_copyfrom_buffer
Vincent Michel <vincent.michel@logilab.fr>
parents: 8625
diff changeset
   444
                        # the separator instead of returning None
1053b9d0fdf7 [dataimport] Allow to replace escape char in the create_copyfrom_buffer
Vincent Michel <vincent.michel@logilab.fr>
parents: 8625
diff changeset
   445
                        # (and thus avoid empty buffer)
1053b9d0fdf7 [dataimport] Allow to replace escape char in the create_copyfrom_buffer
Vincent Michel <vincent.michel@logilab.fr>
parents: 8625
diff changeset
   446
                        if replace_sep:
1053b9d0fdf7 [dataimport] Allow to replace escape char in the create_copyfrom_buffer
Vincent Michel <vincent.michel@logilab.fr>
parents: 8625
diff changeset
   447
                            value = value.replace(_char, replace_sep)
1053b9d0fdf7 [dataimport] Allow to replace escape char in the create_copyfrom_buffer
Vincent Michel <vincent.michel@logilab.fr>
parents: 8625
diff changeset
   448
                        else:
1053b9d0fdf7 [dataimport] Allow to replace escape char in the create_copyfrom_buffer
Vincent Michel <vincent.michel@logilab.fr>
parents: 8625
diff changeset
   449
                            return
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   450
                value = value.replace('\\', r'\\')
8631
1053b9d0fdf7 [dataimport] Allow to replace escape char in the create_copyfrom_buffer
Vincent Michel <vincent.michel@logilab.fr>
parents: 8625
diff changeset
   451
                if value is None:
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   452
                    return
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   453
                if isinstance(value, unicode):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   454
                    value = value.encode(encoding)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   455
            elif isinstance(value, (date, datetime)):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   456
                # Do not use strftime, as it yields issue
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   457
                # with date < 1900
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   458
                value = '%04d-%02d-%02d' % (value.year,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   459
                                            value.month,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   460
                                            value.day)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   461
            else:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   462
                return None
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   463
            # We push the value to the new formatted row
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   464
            # if the value is not None and could be converted to a string.
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   465
            formatted_row.append(value)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   466
        rows.append('\t'.join(formatted_row))
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   467
    return StringIO('\n'.join(rows))
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   468
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   469
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   470
# object stores #################################################################
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   471
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   472
class ObjectStore(object):
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   473
    """Store objects in memory for *faster* validation (development mode)
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   474
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   475
    But it will not enforce the constraints of the schema and hence will miss some problems
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   476
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   477
    >>> store = ObjectStore()
7158
0f31a50b144e [dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7118
diff changeset
   478
    >>> user = store.create_entity('CWUser', login=u'johndoe')
0f31a50b144e [dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7118
diff changeset
   479
    >>> group = store.create_entity('CWUser', name=u'unknown')
0f31a50b144e [dataimport] cleanups, update docstring to up-to-date usage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7118
diff changeset
   480
    >>> store.relate(user.eid, 'in_group', group.eid)
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   481
    """
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   482
    def __init__(self):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   483
        self.items = []
3003
2944ee420dca R [dataimport] rename uid to eid
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 2974
diff changeset
   484
        self.eids = {}
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   485
        self.types = {}
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   486
        self.relations = set()
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   487
        self.indexes = {}
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   488
        self._rql = None
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   489
        self._commit = None
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   490
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   491
    def _put(self, type, item):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   492
        self.items.append(item)
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   493
        return len(self.items) - 1
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   494
6990
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   495
    def create_entity(self, etype, **data):
7159
3bcccd3ab6b6 [dataimport] ObjectStore.create_entity should return something that looks like an entity (eg no more using dict protocol to access to attributes)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7158
diff changeset
   496
        data = attrdict(data)
3bcccd3ab6b6 [dataimport] ObjectStore.create_entity should return something that looks like an entity (eg no more using dict protocol to access to attributes)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7158
diff changeset
   497
        data['eid'] = eid = self._put(etype, data)
6990
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   498
        self.eids[eid] = data
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   499
        self.types.setdefault(etype, []).append(eid)
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   500
        return data
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   501
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   502
    @deprecated("[3.11] add is deprecated, use create_entity instead")
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   503
    def add(self, etype, item):
3486
ea6bf6f9ba0c [cwctl] improve dialog messages
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 3318
diff changeset
   504
        assert isinstance(item, dict), 'item is not a dict but a %s' % type(item)
6990
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   505
        data = self.create_entity(etype, **item)
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   506
        item['eid'] = data['eid']
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   507
        return item
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   508
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   509
    def relate(self, eid_from, rtype, eid_to, inlined=False):
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   510
        """Add new relation"""
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   511
        relation = eid_from, rtype, eid_to
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   512
        self.relations.add(relation)
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   513
        return relation
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   514
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   515
    def commit(self):
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   516
        """this commit method do nothing by default
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   517
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   518
        This is voluntary to use the frequent autocommit feature in CubicWeb
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   519
        when you are using hooks or another
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   520
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   521
        If you want override commit method, please set it by the
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   522
        constructor
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   523
        """
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   524
        pass
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   525
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   526
    def rql(self, *args):
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   527
        if self._rql is not None:
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   528
            return self._rql(*args)
7033
ddc1b4d80dbd [dataimport] remove eid_key deprecation warning and prints
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6990
diff changeset
   529
        return []
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   530
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   531
    @property
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   532
    def nb_inserted_entities(self):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   533
        return len(self.eids)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   534
    @property
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   535
    def nb_inserted_types(self):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   536
        return len(self.types)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   537
    @property
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   538
    def nb_inserted_relations(self):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   539
        return len(self.relations)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   540
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   541
    @deprecated("[3.7] index support will disappear")
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   542
    def build_index(self, name, type, func=None, can_be_empty=False):
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   543
        """build internal index for further search"""
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   544
        index = {}
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   545
        if func is None or not callable(func):
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   546
            func = lambda x: x['eid']
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   547
        for eid in self.types[type]:
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   548
            index.setdefault(func(self.eids[eid]), []).append(eid)
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   549
        if not can_be_empty:
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   550
            assert index, "new index '%s' cannot be empty" % name
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   551
        self.indexes[name] = index
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   552
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   553
    @deprecated("[3.7] index support will disappear")
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   554
    def build_rqlindex(self, name, type, key, rql, rql_params=False,
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   555
                       func=None, can_be_empty=False):
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   556
        """build an index by rql query
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   557
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   558
        rql should return eid in first column
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   559
        ctl.store.build_index('index_name', 'users', 'login', 'Any U WHERE U is CWUser')
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   560
        """
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   561
        self.types[type] = []
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   562
        rset = self.rql(rql, rql_params or {})
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   563
        if not can_be_empty:
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   564
            assert rset, "new index type '%s' cannot be empty (0 record found)" % type
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   565
        for entity in rset.entities():
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   566
            getattr(entity, key) # autopopulate entity with key attribute
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   567
            self.eids[entity.eid] = dict(entity)
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   568
            if entity.eid not in self.types[type]:
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   569
                self.types[type].append(entity.eid)
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   570
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   571
        # Build index with specified key
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   572
        func = lambda x: x[key]
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   573
        self.build_index(name, type, func, can_be_empty=can_be_empty)
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   574
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   575
    @deprecated("[3.7] index support will disappear")
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   576
    def fetch(self, name, key, unique=False, decorator=None):
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   577
        """index fetcher method
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   578
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   579
        decorator is a callable method or an iterator of callable methods (usually a lambda function)
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   580
        decorator=lambda x: x[:1] (first value is returned)
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   581
        decorator=lambda x: x.lower (lowercased value is returned)
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   582
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   583
        decorator is handy when you want to improve index keys but without
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   584
        changing the original field
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   585
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   586
        Same check functions can be reused here.
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   587
        """
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   588
        eids = self.indexes[name].get(key, [])
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   589
        if decorator is not None:
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   590
            if not hasattr(decorator, '__iter__'):
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   591
                decorator = (decorator,)
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   592
            for f in decorator:
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   593
                eids = f(eids)
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   594
        if unique:
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   595
            assert len(eids) == 1, u'expected a single one value for key "%s" in index "%s". Got %i' % (key, name, len(eids))
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   596
            eids = eids[0]
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   597
        return eids
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   598
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   599
    @deprecated("[3.7] index support will disappear")
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   600
    def find(self, type, key, value):
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   601
        for idx in self.types[type]:
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   602
            item = self.items[idx]
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   603
            if item[key] == value:
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   604
                yield item
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   605
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   606
    @deprecated("[3.7] checkpoint() deprecated. use commit() instead")
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   607
    def checkpoint(self):
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   608
        self.commit()
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   609
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   610
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   611
class RQLObjectStore(ObjectStore):
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   612
    """ObjectStore that works with an actual RQL repository (production mode)"""
4136
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   613
    _rql = None # bw compat
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   614
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   615
    def __init__(self, session=None, commit=None):
4136
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   616
        ObjectStore.__init__(self)
6122
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
   617
        if session is None:
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
   618
            sys.exit('please provide a session of run this script with cubicweb-ctl shell and pass cnx as session')
7398
26695dd703d8 [repository api] definitly kill usage of word 'pool' to refer to connections set used by a session
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7227
diff changeset
   619
        if not hasattr(session, 'set_cnxset'):
8403
a6ee3cd783e1 [data import] allow a request to be given as argument, ease use from web ui
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7815
diff changeset
   620
            if hasattr(session, 'request'):
a6ee3cd783e1 [data import] allow a request to be given as argument, ease use from web ui
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7815
diff changeset
   621
                # connection object
a6ee3cd783e1 [data import] allow a request to be given as argument, ease use from web ui
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7815
diff changeset
   622
                cnx = session
a6ee3cd783e1 [data import] allow a request to be given as argument, ease use from web ui
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7815
diff changeset
   623
                session = session.request()
a6ee3cd783e1 [data import] allow a request to be given as argument, ease use from web ui
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7815
diff changeset
   624
            else: # object is already a request
a6ee3cd783e1 [data import] allow a request to be given as argument, ease use from web ui
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7815
diff changeset
   625
                cnx = session.cnx
7398
26695dd703d8 [repository api] definitly kill usage of word 'pool' to refer to connections set used by a session
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7227
diff changeset
   626
            session.set_cnxset = lambda : None
6122
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
   627
            commit = commit or cnx.commit
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
   628
        else:
7398
26695dd703d8 [repository api] definitly kill usage of word 'pool' to refer to connections set used by a session
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7227
diff changeset
   629
            session.set_cnxset()
6122
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
   630
        self.session = session
4d2b04b32cdc improvements in dataimport.py
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 5557
diff changeset
   631
        self._commit = commit or session.commit
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   632
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   633
    @deprecated("[3.7] checkpoint() deprecated. use commit() instead")
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   634
    def checkpoint(self):
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   635
        self.commit()
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   636
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   637
    def commit(self):
5063
2a94b61837e1 [dataimport] stop disabling undo ; commit return transaction id
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5054
diff changeset
   638
        txuuid = self._commit()
7398
26695dd703d8 [repository api] definitly kill usage of word 'pool' to refer to connections set used by a session
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7227
diff changeset
   639
        self.session.set_cnxset()
5063
2a94b61837e1 [dataimport] stop disabling undo ; commit return transaction id
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5054
diff changeset
   640
        return txuuid
4136
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   641
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   642
    def rql(self, *args):
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   643
        if self._rql is not None:
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   644
            return self._rql(*args)
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   645
        return self.session.execute(*args)
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   646
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   647
    def create_entity(self, *args, **kwargs):
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   648
        entity = self.session.create_entity(*args, **kwargs)
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   649
        self.eids[entity.eid] = entity
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   650
        self.types.setdefault(args[0], []).append(entity.eid)
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   651
        return entity
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   652
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   653
    def _put(self, type, item):
6989
4a999a647f52 [dataimport] make it possible to insert 'empty' (i.e. no-attrs) entities
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6492
diff changeset
   654
        query = 'INSERT %s X' % type
4a999a647f52 [dataimport] make it possible to insert 'empty' (i.e. no-attrs) entities
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6492
diff changeset
   655
        if item:
4a999a647f52 [dataimport] make it possible to insert 'empty' (i.e. no-attrs) entities
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6492
diff changeset
   656
            query += ': ' + ', '.join('X %s %%(%s)s' % (k, k)
4a999a647f52 [dataimport] make it possible to insert 'empty' (i.e. no-attrs) entities
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6492
diff changeset
   657
                                      for k in item)
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   658
        return self.rql(query, item)[0][0]
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   659
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   660
    def relate(self, eid_from, rtype, eid_to, inlined=False):
4721
8f63691ccb7f pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4613
diff changeset
   661
        eid_from, rtype, eid_to = super(RQLObjectStore, self).relate(
8f63691ccb7f pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4613
diff changeset
   662
            eid_from, rtype, eid_to)
4136
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   663
        self.rql('SET X %s Y WHERE X eid %%(x)s, Y eid %%(y)s' % rtype,
7033
ddc1b4d80dbd [dataimport] remove eid_key deprecation warning and prints
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6990
diff changeset
   664
                 {'x': int(eid_from), 'y': int(eid_to)})
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   665
7116
dfd4680a23f0 [session] add find_entities and find_one_entity to session/request API (closes #1550045)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 7033
diff changeset
   666
    def find_entities(self, *args, **kwargs):
dfd4680a23f0 [session] add find_entities and find_one_entity to session/request API (closes #1550045)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 7033
diff changeset
   667
        return self.session.find_entities(*args, **kwargs)
dfd4680a23f0 [session] add find_entities and find_one_entity to session/request API (closes #1550045)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 7033
diff changeset
   668
dfd4680a23f0 [session] add find_entities and find_one_entity to session/request API (closes #1550045)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 7033
diff changeset
   669
    def find_one_entity(self, *args, **kwargs):
dfd4680a23f0 [session] add find_entities and find_one_entity to session/request API (closes #1550045)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 7033
diff changeset
   670
        return self.session.find_one_entity(*args, **kwargs)
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   671
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   672
# the import controller ########################################################
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   673
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   674
class CWImportController(object):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   675
    """Controller of the data import process.
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   676
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   677
    >>> ctl = CWImportController(store)
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   678
    >>> ctl.generators = list_of_data_generators
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   679
    >>> ctl.data = dict_of_data_tables
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   680
    >>> ctl.run()
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   681
    """
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   682
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   683
    def __init__(self, store, askerror=0, catcherrors=None, tell=tell,
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   684
                 commitevery=50):
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   685
        self.store = store
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   686
        self.generators = None
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   687
        self.data = {}
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   688
        self.errors = None
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   689
        self.askerror = askerror
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   690
        if  catcherrors is None:
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   691
            catcherrors = askerror
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   692
        self.catcherrors = catcherrors
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   693
        self.commitevery = commitevery # set to None to do a single commit
3029
bc573d5fb5b7 F [devtools] by default dataimport prints message on stdout
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 3003
diff changeset
   694
        self._tell = tell
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   695
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   696
    def check(self, type, key, value):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   697
        self._checks.setdefault(type, {}).setdefault(key, []).append(value)
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   698
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   699
    def check_map(self, entity, key, map, default):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   700
        try:
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   701
            entity[key] = map[entity[key]]
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   702
        except KeyError:
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   703
            self.check(key, entity[key], None)
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   704
            entity[key] = default
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   705
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   706
    def record_error(self, key, msg=None, type=None, value=None, tb=None):
4186
ca7e526b07b6 import cleanup, check data file exists
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4173
diff changeset
   707
        tmp = StringIO()
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   708
        if type is None:
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   709
            traceback.print_exc(file=tmp)
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   710
        else:
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   711
            traceback.print_exception(type, value, tb, file=tmp)
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   712
        # use a list to avoid counting a <nb lines> errors instead of one
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   713
        errorlog = self.errors.setdefault(key, [])
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   714
        if msg is None:
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   715
            errorlog.append(tmp.getvalue().splitlines())
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   716
        else:
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   717
            errorlog.append( (msg, tmp.getvalue().splitlines()) )
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   718
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   719
    def run(self):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   720
        self.errors = {}
7171
4297be67bbe4 [dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7170
diff changeset
   721
        if self.commitevery is None:
4297be67bbe4 [dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7170
diff changeset
   722
            self.tell('Will commit all or nothing.')
4297be67bbe4 [dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7170
diff changeset
   723
        else:
4297be67bbe4 [dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7170
diff changeset
   724
            self.tell('Will commit every %s iterations' % self.commitevery)
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   725
        for func, checks in self.generators:
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   726
            self._checks = {}
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   727
            func_name = func.__name__
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   728
            self.tell("Run import function '%s'..." % func_name)
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   729
            try:
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   730
                func(self)
7815
2a164a9cf81c [exceptions] stop catching any exception in various places (closes #1942716)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7471
diff changeset
   731
            except Exception:
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   732
                if self.catcherrors:
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   733
                    self.record_error(func_name, 'While calling %s' % func.__name__)
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   734
                else:
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   735
                    self._print_stats()
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   736
                    raise
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   737
            for key, func, title, help in checks:
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   738
                buckets = self._checks.get(key)
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   739
                if buckets:
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   740
                    err = func(buckets)
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   741
                    if err:
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   742
                        self.errors[title] = (help, err)
7171
4297be67bbe4 [dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7170
diff changeset
   743
        try:
4297be67bbe4 [dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7170
diff changeset
   744
            txuuid = self.store.commit()
4297be67bbe4 [dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7170
diff changeset
   745
            if txuuid is not None:
4297be67bbe4 [dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7170
diff changeset
   746
                self.tell('Transaction commited (txuuid: %s)' % txuuid)
8695
358d8bed9626 [toward-py3k] rewrite to "except AnException as exc:" (part of #2711624)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 8637
diff changeset
   747
        except QueryError as ex:
7171
4297be67bbe4 [dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7170
diff changeset
   748
            self.tell('Transaction aborted: %s' % ex)
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   749
        self._print_stats()
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   750
        if self.errors:
4721
8f63691ccb7f pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4613
diff changeset
   751
            if self.askerror == 2 or (self.askerror and confirm('Display errors ?')):
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   752
                from pprint import pformat
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   753
                for errkey, error in self.errors.items():
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   754
                    self.tell("\n%s (%s): %d\n" % (error[0], errkey, len(error[1])))
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   755
                    self.tell(pformat(sorted(error[1])))
7171
4297be67bbe4 [dataimport] tell more and nicely about transaction status
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 7170
diff changeset
   756
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   757
    def _print_stats(self):
8696
0bb18407c053 [toward py3k] rewrite dict.keys() and dict.values() (part of #2711624)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 8695
diff changeset
   758
        nberrors = sum(len(err) for err in self.errors.itervalues())
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   759
        self.tell('\nImport statistics: %i entities, %i types, %i relations and %i errors'
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   760
                  % (self.store.nb_inserted_entities,
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   761
                     self.store.nb_inserted_types,
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   762
                     self.store.nb_inserted_relations,
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   763
                     nberrors))
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   764
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   765
    def get_data(self, key):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   766
        return self.data.get(key)
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   767
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   768
    def index(self, name, key, value, unique=False):
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   769
        """create a new index
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   770
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   771
        If unique is set to True, only first occurence will be kept not the following ones
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   772
        """
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   773
        if unique:
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   774
            try:
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   775
                if value in self.store.indexes[name][key]:
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   776
                    return
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   777
            except KeyError:
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   778
                # we're sure that one is the first occurence; so continue...
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   779
                pass
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   780
        self.store.indexes.setdefault(name, {}).setdefault(key, []).append(value)
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   781
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   782
    def tell(self, msg):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   783
        self._tell(msg)
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   784
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   785
    def iter_and_commit(self, datakey):
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   786
        """iter rows, triggering commit every self.commitevery iterations"""
6136
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   787
        if self.commitevery is None:
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   788
            return self.get_data(datakey)
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   789
        else:
6169
55378e1bab1b fix order of parameters in call to callfunc_every
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 6136
diff changeset
   790
            return callfunc_every(self.store.commit,
55378e1bab1b fix order of parameters in call to callfunc_every
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 6136
diff changeset
   791
                                  self.commitevery,
55378e1bab1b fix order of parameters in call to callfunc_every
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 6136
diff changeset
   792
                                  self.get_data(datakey))
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   793
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   794
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   795
class NoHookRQLObjectStore(RQLObjectStore):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   796
    """ObjectStore that works with an actual RQL repository (production mode)"""
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   797
    _rql = None # bw compat
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   798
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   799
    def __init__(self, session, metagen=None, baseurl=None):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   800
        super(NoHookRQLObjectStore, self).__init__(session)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   801
        self.source = session.repo.system_source
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   802
        self.rschema = session.repo.schema.rschema
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   803
        self.add_relation = self.source.add_relation
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   804
        if metagen is None:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   805
            metagen = MetaGenerator(session, baseurl)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   806
        self.metagen = metagen
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   807
        self._nb_inserted_entities = 0
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   808
        self._nb_inserted_types = 0
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   809
        self._nb_inserted_relations = 0
5054
cb066d29166a fix dataimport for 3.7.2
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4913
diff changeset
   810
        self.rql = session.execute
cb066d29166a fix dataimport for 3.7.2
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4913
diff changeset
   811
        # deactivate security
cb066d29166a fix dataimport for 3.7.2
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4913
diff changeset
   812
        session.set_read_security(False)
cb066d29166a fix dataimport for 3.7.2
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4913
diff changeset
   813
        session.set_write_security(False)
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   814
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   815
    def create_entity(self, etype, **kwargs):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   816
        for k, v in kwargs.iteritems():
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   817
            kwargs[k] = getattr(v, 'eid', v)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   818
        entity, rels = self.metagen.base_etype_dicts(etype)
7471
bf9443f8725f [dataimport] fix #1732685: cached entity and shared cw_edited data with NoHookRQLObjectStore
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 7398
diff changeset
   819
        # make a copy to keep cached entity pristine
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   820
        entity = copy(entity)
7471
bf9443f8725f [dataimport] fix #1732685: cached entity and shared cw_edited data with NoHookRQLObjectStore
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 7398
diff changeset
   821
        entity.cw_edited = copy(entity.cw_edited)
5557
1a534c596bff [entity] continue cleanup of Entity/AnyEntity namespace
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5424
diff changeset
   822
        entity.cw_clear_relation_cache()
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   823
        self.metagen.init_entity(entity)
6142
8bc6eac1fac1 [session] cleanup hook / operation / entity edition api
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6122
diff changeset
   824
        entity.cw_edited.update(kwargs, skipsec=False)
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   825
        session = self.session
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   826
        self.source.add_entity(session, entity)
5054
cb066d29166a fix dataimport for 3.7.2
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4913
diff changeset
   827
        self.source.add_info(session, entity, self.source, None, complete=False)
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   828
        for rtype, targeteids in rels.iteritems():
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   829
            # targeteids may be a single eid or a list of eids
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   830
            inlined = self.rschema(rtype).inlined
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   831
            try:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   832
                for targeteid in targeteids:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   833
                    self.add_relation(session, entity.eid, rtype, targeteid,
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   834
                                      inlined)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   835
            except TypeError:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   836
                self.add_relation(session, entity.eid, rtype, targeteids,
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   837
                                  inlined)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   838
        self._nb_inserted_entities += 1
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   839
        return entity
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   840
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   841
    def relate(self, eid_from, rtype, eid_to):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   842
        assert not rtype.startswith('reverse_')
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   843
        self.add_relation(self.session, eid_from, rtype, eid_to,
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   844
                          self.rschema(rtype).inlined)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   845
        self._nb_inserted_relations += 1
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   846
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   847
    @property
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   848
    def nb_inserted_entities(self):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   849
        return self._nb_inserted_entities
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   850
    @property
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   851
    def nb_inserted_types(self):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   852
        return self._nb_inserted_types
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   853
    @property
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   854
    def nb_inserted_relations(self):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   855
        return self._nb_inserted_relations
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   856
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   857
    def _put(self, type, item):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   858
        raise RuntimeError('use create entity')
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   859
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   860
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   861
class MetaGenerator(object):
6427
c8a5ac2d1eaa [schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6173
diff changeset
   862
    META_RELATIONS = (META_RTYPES
c8a5ac2d1eaa [schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6173
diff changeset
   863
                      - VIRTUAL_RTYPES
c8a5ac2d1eaa [schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6173
diff changeset
   864
                      - set(('eid', 'cwuri',
c8a5ac2d1eaa [schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6173
diff changeset
   865
                             'is', 'is_instance_of', 'cw_source')))
c8a5ac2d1eaa [schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6173
diff changeset
   866
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   867
    def __init__(self, session, baseurl=None):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   868
        self.session = session
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   869
        self.source = session.repo.system_source
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   870
        self.time = datetime.now()
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   871
        if baseurl is None:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   872
            config = session.vreg.config
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   873
            baseurl = config['base-url'] or config.default_base_url()
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   874
        if not baseurl[-1] == '/':
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   875
            baseurl += '/'
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   876
        self.baseurl =  baseurl
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   877
        # attributes/relations shared by all entities of the same type
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   878
        self.etype_attrs = []
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   879
        self.etype_rels = []
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   880
        # attributes/relations specific to each entity
5054
cb066d29166a fix dataimport for 3.7.2
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4913
diff changeset
   881
        self.entity_attrs = ['cwuri']
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   882
        #self.entity_rels = [] XXX not handled (YAGNI?)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   883
        schema = session.vreg.schema
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   884
        rschema = schema.rschema
6427
c8a5ac2d1eaa [schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6173
diff changeset
   885
        for rtype in self.META_RELATIONS:
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   886
            if rschema(rtype).final:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   887
                self.etype_attrs.append(rtype)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   888
            else:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   889
                self.etype_rels.append(rtype)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   890
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   891
    @cached
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   892
    def base_etype_dicts(self, etype):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   893
        entity = self.session.vreg['etypes'].etype_class(etype)(self.session)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   894
        # entity are "surface" copied, avoid shared dict between copies
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   895
        del entity.cw_extra_kwargs
6142
8bc6eac1fac1 [session] cleanup hook / operation / entity edition api
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6122
diff changeset
   896
        entity.cw_edited = EditedEntity(entity)
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   897
        for attr in self.etype_attrs:
6990
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   898
            entity.cw_edited.edited_attribute(attr, self.generate(entity, attr))
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   899
        rels = {}
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   900
        for rel in self.etype_rels:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   901
            rels[rel] = self.generate(entity, rel)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   902
        return entity, rels
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   903
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   904
    def init_entity(self, entity):
5054
cb066d29166a fix dataimport for 3.7.2
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4913
diff changeset
   905
        entity.eid = self.source.create_eid(self.session)
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   906
        for attr in self.entity_attrs:
6990
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   907
            entity.cw_edited.edited_attribute(attr, self.generate(entity, attr))
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   908
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   909
    def generate(self, entity, rtype):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   910
        return getattr(self, 'gen_%s' % rtype)(entity)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   911
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   912
    def gen_cwuri(self, entity):
5054
cb066d29166a fix dataimport for 3.7.2
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4913
diff changeset
   913
        return u'%seid/%s' % (self.baseurl, entity.eid)
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   914
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   915
    def gen_creation_date(self, entity):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   916
        return self.time
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   917
    def gen_modification_date(self, entity):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   918
        return self.time
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   919
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   920
    def gen_created_by(self, entity):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   921
        return self.session.user.eid
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   922
    def gen_owned_by(self, entity):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   923
        return self.session.user.eid
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   924
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   925
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   926
###########################################################################
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   927
## SQL object store #######################################################
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   928
###########################################################################
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   929
class SQLGenObjectStore(NoHookRQLObjectStore):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   930
    """Controller of the data import process. This version is based
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   931
    on direct insertions throught SQL command (COPY FROM or execute many).
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   932
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   933
    >>> store = SQLGenObjectStore(session)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   934
    >>> store.create_entity('Person', ...)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   935
    >>> store.flush()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   936
    """
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   937
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   938
    def __init__(self, session, dump_output_dir=None, nb_threads_statement=3):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   939
        """
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   940
        Initialize a SQLGenObjectStore.
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   941
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   942
        Parameters:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   943
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   944
          - session: session on the cubicweb instance
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   945
          - dump_output_dir: a directory to dump failed statements
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   946
            for easier recovery. Default is None (no dump).
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   947
          - nb_threads_statement: number of threads used
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   948
            for SQL insertion (default is 3).
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   949
        """
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   950
        super(SQLGenObjectStore, self).__init__(session)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   951
        ### hijack default source
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   952
        self.source = SQLGenSourceWrapper(
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   953
            self.source, session.vreg.schema,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   954
            dump_output_dir=dump_output_dir,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   955
            nb_threads_statement=nb_threads_statement)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   956
        ### XXX This is done in super().__init__(), but should be
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   957
        ### redone here to link to the correct source
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   958
        self.add_relation = self.source.add_relation
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   959
        self.indexes_etypes = {}
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   960
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   961
    def flush(self):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   962
        """Flush data to the database"""
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   963
        self.source.flush()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   964
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   965
    def relate(self, subj_eid, rtype, obj_eid, subjtype=None):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   966
        if subj_eid is None or obj_eid is None:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   967
            return
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   968
        # XXX Could subjtype be inferred ?
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   969
        self.source.add_relation(self.session, subj_eid, rtype, obj_eid,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   970
                                 self.rschema(rtype).inlined, subjtype)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   971
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   972
    def drop_indexes(self, etype):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   973
        """Drop indexes for a given entity type"""
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   974
        if etype not in self.indexes_etypes:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   975
            cu = self.session.cnxset['system']
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   976
            def index_to_attr(index):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   977
                """turn an index name to (database) attribute name"""
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   978
                return index.replace(etype.lower(), '').replace('idx', '').strip('_')
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   979
            indices = [(index, index_to_attr(index))
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   980
                       for index in self.source.dbhelper.list_indices(cu, etype)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   981
                       # Do not consider 'cw_etype_pkey' index
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   982
                       if not index.endswith('key')]
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   983
            self.indexes_etypes[etype] = indices
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   984
        for index, attr in self.indexes_etypes[etype]:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   985
            self.session.system_sql('DROP INDEX %s' % index)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   986
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   987
    def create_indexes(self, etype):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   988
        """Recreate indexes for a given entity type"""
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   989
        for index, attr in self.indexes_etypes.get(etype, []):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   990
            sql = 'CREATE INDEX %s ON cw_%s(%s)' % (index, etype, attr)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   991
            self.session.system_sql(sql)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   992
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   993
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   994
###########################################################################
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   995
## SQL Source #############################################################
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   996
###########################################################################
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   997
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   998
class SQLGenSourceWrapper(object):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
   999
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1000
    def __init__(self, system_source, schema,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1001
                 dump_output_dir=None, nb_threads_statement=3):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1002
        self.system_source = system_source
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1003
        self._sql = threading.local()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1004
        # Explicitely backport attributes from system source
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1005
        self._storage_handler = self.system_source._storage_handler
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1006
        self.preprocess_entity = self.system_source.preprocess_entity
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1007
        self.sqlgen = self.system_source.sqlgen
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1008
        self.copy_based_source = self.system_source.copy_based_source
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1009
        self.uri = self.system_source.uri
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1010
        self.eid = self.system_source.eid
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1011
        # Directory to write temporary files
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1012
        self.dump_output_dir = dump_output_dir
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1013
        # Allow to execute code with SQLite backend that does
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1014
        # not support (yet...) copy_from
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1015
        # XXX Should be dealt with in logilab.database
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1016
        spcfrom = system_source.dbhelper.dbapi_module.support_copy_from
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1017
        self.support_copy_from = spcfrom
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1018
        self.dbencoding = system_source.dbhelper.dbencoding
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1019
        self.nb_threads_statement = nb_threads_statement
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1020
        # initialize thread-local data for main thread
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1021
        self.init_thread_locals()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1022
        self._inlined_rtypes_cache = {}
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1023
        self._fill_inlined_rtypes_cache(schema)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1024
        self.schema = schema
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1025
        self.do_fti = False
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1026
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1027
    def _fill_inlined_rtypes_cache(self, schema):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1028
        cache = self._inlined_rtypes_cache
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1029
        for eschema in schema.entities():
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1030
            for rschema in eschema.ordered_relations():
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1031
                if rschema.inlined:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1032
                    cache[eschema.type] = SQL_PREFIX + rschema.type
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1033
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1034
    def init_thread_locals(self):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1035
        """initializes thread-local data"""
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1036
        self._sql.entities = defaultdict(list)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1037
        self._sql.relations = {}
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1038
        self._sql.inlined_relations = {}
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1039
        # keep track, for each eid of the corresponding data dict
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1040
        self._sql.eid_insertdicts = {}
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1041
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1042
    def flush(self):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1043
        print 'starting flush'
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1044
        _entities_sql = self._sql.entities
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1045
        _relations_sql = self._sql.relations
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1046
        _inlined_relations_sql = self._sql.inlined_relations
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1047
        _insertdicts = self._sql.eid_insertdicts
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1048
        try:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1049
            # try, for each inlined_relation, to find if we're also creating
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1050
            # the host entity (i.e. the subject of the relation).
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1051
            # In that case, simply update the insert dict and remove
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1052
            # the need to make the
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1053
            # UPDATE statement
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1054
            for statement, datalist in _inlined_relations_sql.iteritems():
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1055
                new_datalist = []
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1056
                # for a given inlined relation,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1057
                # browse each couple to be inserted
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1058
                for data in datalist:
8696
0bb18407c053 [toward py3k] rewrite dict.keys() and dict.values() (part of #2711624)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 8695
diff changeset
  1059
                    keys = list(data)
8625
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1060
                    # For inlined relations, it exists only two case:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1061
                    # (rtype, cw_eid) or (cw_eid, rtype)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1062
                    if keys[0] == 'cw_eid':
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1063
                        rtype = keys[1]
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1064
                    else:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1065
                        rtype = keys[0]
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1066
                    updated_eid = data['cw_eid']
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1067
                    if updated_eid in _insertdicts:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1068
                        _insertdicts[updated_eid][rtype] = data[rtype]
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1069
                    else:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1070
                        # could not find corresponding insert dict, keep the
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1071
                        # UPDATE query
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1072
                        new_datalist.append(data)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1073
                _inlined_relations_sql[statement] = new_datalist
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1074
            _import_statements(self.system_source.get_connection,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1075
                               _entities_sql.items()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1076
                               + _relations_sql.items()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1077
                               + _inlined_relations_sql.items(),
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1078
                               dump_output_dir=self.dump_output_dir,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1079
                               nb_threads=self.nb_threads_statement,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1080
                               support_copy_from=self.support_copy_from,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1081
                               encoding=self.dbencoding)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1082
        except:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1083
            print 'failed to flush'
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1084
        finally:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1085
            _entities_sql.clear()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1086
            _relations_sql.clear()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1087
            _insertdicts.clear()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1088
            _inlined_relations_sql.clear()
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1089
            print 'flush done'
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1090
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1091
    def add_relation(self, session, subject, rtype, object,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1092
                     inlined=False, subjtype=None):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1093
        if inlined:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1094
            _sql = self._sql.inlined_relations
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1095
            data = {'cw_eid': subject, SQL_PREFIX + rtype: object}
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1096
            if subjtype is None:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1097
                # Try to infer it
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1098
                targets = [t.type for t in
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1099
                           self.schema.rschema(rtype).targets()]
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1100
                if len(targets) == 1:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1101
                    subjtype = targets[0]
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1102
                else:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1103
                    raise ValueError('You should give the subject etype for '
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1104
                                     'inlined relation %s'
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1105
                                     ', as it cannot be inferred' % rtype)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1106
            statement = self.sqlgen.update(SQL_PREFIX + subjtype,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1107
                                           data, ['cw_eid'])
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1108
        else:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1109
            _sql = self._sql.relations
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1110
            data = {'eid_from': subject, 'eid_to': object}
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1111
            statement = self.sqlgen.insert('%s_relation' % rtype, data)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1112
        if statement in _sql:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1113
            _sql[statement].append(data)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1114
        else:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1115
            _sql[statement] = [data]
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1116
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1117
    def add_entity(self, session, entity):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1118
        with self._storage_handler(entity, 'added'):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1119
            attrs = self.preprocess_entity(entity)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1120
            rtypes = self._inlined_rtypes_cache.get(entity.__regid__, ())
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1121
            if isinstance(rtypes, str):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1122
                rtypes = (rtypes,)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1123
            for rtype in rtypes:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1124
                if rtype not in attrs:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1125
                    attrs[rtype] = None
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1126
            sql = self.sqlgen.insert(SQL_PREFIX + entity.__regid__, attrs)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1127
            self._sql.eid_insertdicts[entity.eid] = attrs
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1128
            self._append_to_entities(sql, attrs)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1129
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1130
    def _append_to_entities(self, sql, attrs):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1131
        self._sql.entities[sql].append(attrs)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1132
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1133
    def _handle_insert_entity_sql(self, session, sql, attrs):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1134
        # We have to overwrite the source given in parameters
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1135
        # as here, we directly use the system source
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1136
        attrs['source'] = 'system'
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1137
        attrs['asource'] = self.system_source.uri
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1138
        self._append_to_entities(sql, attrs)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1139
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1140
    def _handle_is_relation_sql(self, session, sql, attrs):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1141
        self._append_to_entities(sql, attrs)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1142
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1143
    def _handle_is_instance_of_sql(self, session, sql, attrs):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1144
        self._append_to_entities(sql, attrs)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1145
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1146
    def _handle_source_relation_sql(self, session, sql, attrs):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1147
        self._append_to_entities(sql, attrs)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1148
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1149
    # XXX add_info is similar to the one in NativeSQLSource. It is rewritten
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1150
    # here to correctly used the _handle_xxx of the SQLGenSourceWrapper. This
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1151
    # part should be rewritten in a more clearly way.
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1152
    def add_info(self, session, entity, source, extid, complete):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1153
        """add type and source info for an eid into the system table"""
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1154
        # begin by inserting eid/type/source/extid into the entities table
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1155
        if extid is not None:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1156
            assert isinstance(extid, str)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1157
            extid = b64encode(extid)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1158
        uri = 'system' if source.copy_based_source else source.uri
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1159
        attrs = {'type': entity.__regid__, 'eid': entity.eid, 'extid': extid,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1160
                 'source': uri, 'asource': source.uri, 'mtime': datetime.utcnow()}
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1161
        self._handle_insert_entity_sql(session, self.sqlgen.insert('entities', attrs), attrs)
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1162
        # insert core relations: is, is_instance_of and cw_source
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1163
        try:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1164
            self._handle_is_relation_sql(session, 'INSERT INTO is_relation(eid_from,eid_to) VALUES (%s,%s)',
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1165
                                         (entity.eid, eschema_eid(session, entity.e_schema)))
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1166
        except IndexError:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1167
            # during schema serialization, skip
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1168
            pass
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1169
        else:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1170
            for eschema in entity.e_schema.ancestors() + [entity.e_schema]:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1171
                self._handle_is_relation_sql(session,
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1172
                                             'INSERT INTO is_instance_of_relation(eid_from,eid_to) VALUES (%s,%s)',
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1173
                                             (entity.eid, eschema_eid(session, eschema)))
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1174
        if 'CWSource' in self.schema and source.eid is not None: # else, cw < 3.10
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1175
            self._handle_is_relation_sql(session, 'INSERT INTO cw_source_relation(eid_from,eid_to) VALUES (%s,%s)',
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1176
                                         (entity.eid, source.eid))
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1177
        # now we can update the full text index
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1178
        if self.do_fti and self.need_fti_indexation(entity.__regid__):
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1179
            if complete:
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1180
                entity.complete(entity.e_schema.indexable_attributes())
7ee0752178e5 [dataimport] Add SQL Store for faster import - works ONLY with Postgres for now, as it requires "copy from" command - closes #2410822
Vincent Michel <vincent.michel@logilab.fr>
parents: 8406
diff changeset
  1181
            self.index_entity(session, entity=entity)