dataimport.py
author Pierre-Yves David <pierre-yves.david@logilab.fr>
Thu, 17 Mar 2011 12:21:36 +0100
branch stable
changeset 7094 4f9f13a50484
parent 7033 ddc1b4d80dbd
child 7116 dfd4680a23f0
permissions -rw-r--r--
merge oldstable into default
# -*- coding: utf-8 -*-
# copyright 2003-2010 LOGILAB S.A. (Paris, FRANCE), all rights reserved.
# contact http://www.logilab.fr/ -- mailto:contact@logilab.fr
#
# This file is part of CubicWeb.
#
# CubicWeb is free software: you can redistribute it and/or modify it under the
# terms of the GNU Lesser General Public License as published by the Free
# Software Foundation, either version 2.1 of the License, or (at your option)
# any later version.
#
# CubicWeb is distributed in the hope that it will be useful, but WITHOUT
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
# FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
# details.
#
# You should have received a copy of the GNU Lesser General Public License along
# with CubicWeb.  If not, see <http://www.gnu.org/licenses/>.
"""This module provides tools to import tabular data.

Example of use (run this with `cubicweb-ctl shell instance import-script.py`):

.. sourcecode:: python

  from cubicweb.devtools.dataimport import *
  # define data generators
  GENERATORS = []

  USERS = [('Prenom', 'firstname', ()),
           ('Nom', 'surname', ()),
           ('Identifiant', 'login', ()),
           ]

  def gen_users(ctl):
      for row in ctl.iter_and_commit('utilisateurs'):
          entity = mk_entity(row, USERS)
          entity['upassword'] = u'motdepasse'
          ctl.check('login', entity['login'], None)
          ctl.store.add('CWUser', entity)
          email = {'address': row['email']}
          ctl.store.add('EmailAddress', email)
          ctl.store.relate(entity['eid'], 'use_email', email['eid'])
          ctl.store.rql('SET U in_group G WHERE G name "users", U eid %(x)s', {'x':entity['eid']})

  CHK = [('login', check_doubles, 'Utilisateurs Login',
          'Deux utilisateurs ne devraient pas avoir le même login.'),
         ]

  GENERATORS.append( (gen_users, CHK) )

  # create controller
  if 'cnx' in globals():
      ctl = CWImportController(RQLObjectStore(cnx))
  else:
      print 'debug mode (not connected)'
      print 'run through cubicweb-ctl shell to access an instance'
      ctl = CWImportController(ObjectStore())
  ctl.askerror = 1
  ctl.generators = GENERATORS
  ctl.data['utilisateurs'] = lazytable(ucsvreader(open('users.csv')))
  # run
  ctl.run()

.. BUG files with only one column are not parsable
.. TODO rollback() invocation is not possible yet
"""
__docformat__ = "restructuredtext en"

import sys
import csv
import traceback
import os.path as osp
from StringIO import StringIO
from copy import copy

from logilab.common import shellutils
from logilab.common.date import strptime
from logilab.common.decorators import cached
from logilab.common.deprecation import deprecated

from cubicweb.server.utils import eschema_eid
from cubicweb.server.ssplanner import EditedEntity

def count_lines(stream_or_filename):
    """Return the number of lines of a file given as a path or a seekable stream."""
    if isinstance(stream_or_filename, basestring):
        f = open(stream_or_filename)
    else:
        f = stream_or_filename
        f.seek(0)
    # start at -1 so that an empty file yields 0 instead of an UnboundLocalError
    i = -1
    for i, line in enumerate(f):
        pass
    f.seek(0)
    return i + 1
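
# Usage sketch (an illustration, not part of the original module): the helper
# below mirrors the counting logic of count_lines for seekable streams only and
# shows that the stream is rewound afterwards. The name `count_lines_sketch`
# and the sample data are assumptions; the code targets Python 3 so that it can
# be run on its own.

```python
import io


def count_lines_sketch(stream):
    """Count the lines of a seekable stream and leave it rewound to the start."""
    stream.seek(0)
    total = sum(1 for _ in stream)
    stream.seek(0)
    return total


# hypothetical sample data standing in for a CSV file
sample = io.StringIO(u"login,email\nalice,alice@example.org\n")
line_total = count_lines_sketch(sample)
# the stream can be read again from the beginning after counting
first_line = sample.readline()
```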

def ucsvreader_pb(stream_or_path, encoding='utf-8', separator=',', quote='"',
                  skipfirst=False, withpb=True):
    """Same as ucsvreader, but displays a progress bar while iterating over rows."""
    if isinstance(stream_or_path, basestring):
        if not osp.exists(stream_or_path):
            raise Exception("file doesn't exist: %s" % stream_or_path)
        stream = open(stream_or_path)
    else:
        stream = stream_or_path
    rowcount = count_lines(stream)
    if skipfirst:
        rowcount -= 1
    if withpb:
        pb = shellutils.ProgressBar(rowcount, 50)
    for urow in ucsvreader(stream, encoding, separator, quote, skipfirst):
        yield urow
        if withpb:
            pb.update()
    print ' %s rows imported' % rowcount

def ucsvreader(stream, encoding='utf-8', separator=',', quote='"',
               skipfirst=False):
    """A csv reader that accepts files with any encoding and outputs unicode
    strings
    """
    it = iter(csv.reader(stream, delimiter=separator, quotechar=quote))
    if skipfirst:
        it.next()
    for row in it:
        yield [item.decode(encoding) for item in row]
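
# Python 3 sketch (an assumption, not the module's code): in Python 3 the csv
# module works on text, so the decoding step of ucsvreader can be moved into an
# io.TextIOWrapper around the byte stream. `ucsvreader_sketch` and the sample
# bytes are hypothetical names and data used only for illustration.

```python
import csv
import io


def ucsvreader_sketch(stream, encoding='utf-8', separator=',', quote='"',
                      skipfirst=False):
    """Yield rows of text from a binary CSV stream in the given encoding."""
    # decode bytes lazily; newline='' lets the csv module handle line endings
    text = io.TextIOWrapper(stream, encoding=encoding, newline='')
    it = csv.reader(text, delimiter=separator, quotechar=quote)
    if skipfirst:
        next(it, None)
    for row in it:
        yield row


# a latin-1 encoded file with a header row and ';' as separator
raw = io.BytesIO(u'Prenom;Nom\nRen\xe9;Dupont\n'.encode('latin-1'))
rows = list(ucsvreader_sketch(raw, encoding='latin-1', separator=';',
                              skipfirst=True))
```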

def callfunc_every(func, number, iterable):
    """yield items of `iterable` one by one and call function `func`
    every `number` iterations. Always call function `func` at the end.
    """
    for idx, item in enumerate(iterable):
        yield item
        # idx is 0-based: call `func` each time `number` more items were yielded
        if (idx + 1) % number == 0:
            func()
    func()
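
# Usage sketch (an illustration under stated assumptions): the batching pattern
# described in the docstring above, restated as a standalone generator so that
# it can be run on its own. `call_every` and the `commits` list are hypothetical
# names; a real import would pass something like a store's commit method as
# `func`.

```python
def call_every(func, number, iterable):
    """Yield items one by one, calling `func` every `number` items and at the end."""
    for count, item in enumerate(iterable, 1):
        yield item
        if count % number == 0:
            func()
    func()


commits = []
# five rows with a batch size of two: the callback runs after rows 2 and 4,
# then once more when the iterable is exhausted
rows = list(call_every(lambda: commits.append('commit'), 2, range(5)))
```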

def lazytable(reader):
    """The first row is taken to be the header of the table and
    used to output a dict for each row of data.

    >>> data = lazytable(ucsvreader(open(filename)))
    """
    header = reader.next()
    for row in reader:
        yield dict(zip(header, row))
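
# Usage sketch (an illustration, not part of the original module): the
# header-to-dict conversion described above, applied to an in-memory list of
# rows. `lazytable_sketch` and the sample rows are assumptions; the code uses
# the builtin next() so that it runs under Python 3.

```python
def lazytable_sketch(reader):
    """Use the first row as header and yield one dict per following row."""
    reader = iter(reader)
    header = next(reader)
    for row in reader:
        yield dict(zip(header, row))


# hypothetical rows, as a CSV reader would produce them
table = [[u'login', u'email'],
         [u'alice', u'alice@example.org'],
         [u'bob', u'bob@example.org']]
records = list(lazytable_sketch(table))
```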

def mk_entity(row, map):
    """Return a dict made from sanitized mapped values.

    ValueError can be raised on unexpected values found in checkers

    >>> row = {'myname': u'dupont'}
    >>> map = [('myname', u'name', (call_transform_method('title'),))]
    >>> mk_entity(row, map)
    {'name': u'Dupont'}
    >>> row = {'myname': u'dupont', 'optname': u''}
    >>> map = [('myname', u'name', (call_transform_method('title'),)),
    ...        ('optname', u'MARKER', (optional,))]
    >>> mk_entity(row, map)
    {'name': u'Dupont', 'MARKER': None}
    """
    res = {}
    assert isinstance(row, dict)
    assert isinstance(map, list)
    for src, dest, funcs in map:
        res[dest] = row[src]
        try:
            for func in funcs:
                res[dest] = func(res[dest])
                # a None value stops the chain of transformations / checkers
                if res[dest] is None:
                    break
        except ValueError, err:
            raise ValueError('error with %r field: %s' % (src, err))
    return res
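
# Usage sketch (an illustration under stated assumptions): how a mapping of
# (source column, destination attribute, functions) triples is applied to a
# row, including the early exit on None and the ValueError re-raised with the
# source column name. `mk_entity_sketch`, `optional_sketch` and the sample
# mapping are hypothetical stand-ins written for Python 3; they are not the
# module's own helpers.

```python
def optional_sketch(value):
    """Return None for empty values, which stops the transformation chain."""
    return value if value else None


def mk_entity_sketch(row, mapping):
    """Build a dict by applying each chain of functions to the mapped column."""
    res = {}
    for src, dest, funcs in mapping:
        res[dest] = row[src]
        try:
            for func in funcs:
                res[dest] = func(res[dest])
                if res[dest] is None:
                    break
        except ValueError as err:
            raise ValueError('error with %r field: %s' % (src, err))
    return res


mapping = [('myname', 'name', (lambda value: value.title(),)),
           ('age', 'age', (optional_sketch, int))]
# the empty 'age' value becomes None and int() is never called on it
entity = mk_entity_sketch({'myname': u'dupont', 'age': u''}, mapping)
```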


# user interactions ############################################################

def tell(msg):
    print msg

def confirm(question):
    """A confirm function that asks for yes/no/abort and exits on abort."""
    answer = shellutils.ASK.ask(question, ('Y', 'n', 'abort'), 'Y')
    if answer == 'abort':
        sys.exit(1)
    return answer == 'Y'


class catch_error(object):
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   192
    """Context manager recording errors on the controller instead of raising them."""
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   193
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   194
    def __init__(self, ctl, key='unexpected error', msg=None):
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   195
        self.ctl = ctl
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   196
        self.key = key
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   197
        self.msg = msg
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   198
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   199
    def __enter__(self):
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   200
        return self
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   201
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   202
    def __exit__(self, type, value, traceback):
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   203
        if type is not None:
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   204
            if issubclass(type, (KeyboardInterrupt, SystemExit)):
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   205
                return # re-raise
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   206
            if self.ctl.catcherrors:
4173
cfd5d3270f99 msg isn't defined there, but we've to give traceback information to record error
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4152
diff changeset
   207
                self.ctl.record_error(self.key, None, type, value, traceback)
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   208
                return True # silent
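
A minimal sketch of how `catch_error` might be used. `FakeController` is a hypothetical stand-in for the real controller, and its `record_error` signature is only assumed from the call made in `__exit__`. The class is copied here so the example runs on its own.

```python
class FakeController(object):
    """Hypothetical stand-in for the import controller (not the real class)."""

    def __init__(self, catcherrors=True):
        self.catcherrors = catcherrors
        self.errors = []

    def record_error(self, key, msg=None, type=None, value=None, tb=None):
        # Signature assumed from the call made in catch_error.__exit__.
        self.errors.append((key, type, value))


class catch_error(object):
    """Copy of the context manager above, so the sketch is self-contained."""

    def __init__(self, ctl, key='unexpected error', msg=None):
        self.ctl = ctl
        self.key = key
        self.msg = msg

    def __enter__(self):
        return self

    def __exit__(self, type, value, traceback):
        if type is not None:
            if issubclass(type, (KeyboardInterrupt, SystemExit)):
                return  # falsy return value: the exception propagates
            if self.ctl.catcherrors:
                self.ctl.record_error(self.key, None, type, value, traceback)
                return True  # truthy return value: the exception is swallowed


# With catcherrors set, the ValueError is recorded and the script goes on.
ctl = FakeController(catcherrors=True)
with catch_error(ctl, key='row 12'):
    int('not a number')

# Without it, the exception leaves the with block as usual.
strict = FakeController(catcherrors=False)
try:
    with catch_error(strict, key='row 13'):
        int('not a number')
    propagated = False
except ValueError:
    propagated = True
```

Returning a true value from `__exit__` is what silences the exception; returning `None` lets it propagate.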
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   209
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   210
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   211
# base sanitizing/coercing functions ###########################################
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   212
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   213
def optional(value):
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   214
    """checker to filter optional field
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   215
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   216
    If value is undefined (e.g. an empty string), return None, which will
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   217
    break the checkers validation chain
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   218
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   219
    Typically, 'optional' is placed first in the chain of checkers to avoid
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   220
    a ValueError raised by later checkers
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   221
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   222
    >>> MAPPER = [(u'value', 'value', (optional, int))]
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   223
    >>> row = {'value': u''}
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   224
    >>> mk_entity(row, MAPPER)
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   225
    {'value': None}
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   226
    >>> row = {'value': u'100'}
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   227
    >>> mk_entity(row, MAPPER)
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   228
    {'value': 100}
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   229
    """
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   230
    if value:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   231
        return value
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   232
    return None
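
A minimal sketch of the chain behaviour described in the docstring. `run_checkers` is a hypothetical helper, not part of this module; it assumes the chain stops as soon as a checker returns None.

```python
def optional(value):
    # Same behaviour as the checker above.
    if value:
        return value
    return None


def run_checkers(value, checkers):
    """Hypothetical helper: apply checkers in order, stop on the first None."""
    for checker in checkers:
        value = checker(value)
        if value is None:
            break
    return value


# An empty field short-circuits before int() can complain.
empty = run_checkers(u'', (optional, int))
# A filled field goes through the whole chain.
number = run_checkers(u'100', (optional, int))

# Without 'optional' in front, the empty string reaches int() and fails.
try:
    run_checkers(u'', (int,))
    failed = False
except ValueError:
    failed = True
```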
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   233
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   234
def required(value):
6136
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   235
    """raise ValueError if value is empty
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   236
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   237
    This check is often found in the last position of the chain.
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   238
    """
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   239
    if value:
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   240
        return value
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   241
    raise ValueError("required")
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   242
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   243
def todatetime(format='%d/%m/%Y'):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   244
    """return a transformation function to turn string input value into a
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   245
    `datetime.datetime` instance, using given format.
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   246
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   247
    Follow it by `todate` or `totime` functions from `logilab.common.date` if
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   248
    you want a `date`/`time` instance instead of `datetime`.
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   249
    """
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   250
    def coerce(value):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   251
        return strptime(value, format)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   252
    return coerce
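
A minimal sketch of the factory pattern used by `todatetime`. It assumes `datetime.datetime.strptime` from the standard library in place of the `strptime` name imported by this module, and uses `.date()` where the docstring suggests `todate`.

```python
from datetime import datetime


def todatetime(format='%d/%m/%Y'):
    def coerce(value):
        # Assumption: stdlib strptime stands in for the module's own import.
        return datetime.strptime(value, format)
    return coerce


# Each call to the factory yields a coercion function bound to one format.
parse_default = todatetime()
parse_iso = todatetime('%Y-%m-%d')

d1 = parse_default('17/03/2011')
d2 = parse_iso('2011-03-17')

# Standard-library way to keep only the date part.
as_date = d1.date()
```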
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   253
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   254
def call_transform_method(methodname, *args, **kwargs):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   255
    """return a transformation function that calls the given method on input and returns its result"""
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   256
    def coerce(value):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   257
        return getattr(value, methodname)(*args, **kwargs)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   258
    return coerce
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   259
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   260
def call_check_method(methodname, *args, **kwargs):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   261
    """return a function checking that the value returned by calling the given method on input is true,
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   262
    else raise ValueError
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   263
    """
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   264
    def check(value):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   265
        if getattr(value, methodname)(*args, **kwargs):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   266
            return value
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   267
        raise ValueError('%s not verified on %r' % (methodname, value))
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   268
    return check
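
A minimal sketch showing both factories on plain strings. The two functions are copied from above so the example runs on its own; the method names (`strip`, `isdigit`, `split`) are only illustrative choices.

```python
def call_transform_method(methodname, *args, **kwargs):
    def coerce(value):
        return getattr(value, methodname)(*args, **kwargs)
    return coerce


def call_check_method(methodname, *args, **kwargs):
    def check(value):
        if getattr(value, methodname)(*args, **kwargs):
            return value
        raise ValueError('%s not verified on %r' % (methodname, value))
    return check


strip = call_transform_method('strip')            # value.strip()
split_on_comma = call_transform_method('split', ',')  # value.split(',')
is_digits = call_check_method('isdigit')          # value.isdigit() must be true

# A transformation replaces the value; a check returns it unchanged.
cleaned = is_digits(strip('  42 '))
parts = split_on_comma('a,b,c')

try:
    is_digits('4x2')
    message = None
except ValueError as err:
    message = str(err)
```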
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   269
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   270
# base integrity checking functions ############################################
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   271
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   272
def check_doubles(buckets):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   273
    """Extract the keys that have more than one item in their bucket."""
4721
8f63691ccb7f pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4613
diff changeset
   274
    return [(k, len(v)) for k, v in buckets.items() if len(v) > 1]
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   275
4136
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   276
def check_doubles_not_none(buckets):
47060a66c97f dataimport refactoring / improvments, keeping bw compat (for now)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3486
diff changeset
   277
    """Extract the keys, ignoring None, that have more than one item in their bucket."""
4721
8f63691ccb7f pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4613
diff changeset
   278
    return [(k, len(v)) for k, v in buckets.items()
8f63691ccb7f pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4613
diff changeset
   279
            if k is not None and len(v) > 1]
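
A minimal sketch of the difference between the two checks. The `buckets` mapping is made-up sample data (key to list of eids sharing that key).

```python
def check_doubles(buckets):
    return [(k, len(v)) for k, v in buckets.items() if len(v) > 1]


def check_doubles_not_none(buckets):
    return [(k, len(v)) for k, v in buckets.items()
            if k is not None and len(v) > 1]


# Hypothetical index: two rows share 'johndoe', three rows have no key at all.
buckets = {'johndoe': [1, 7], 'alice': [2], None: [3, 4, 5]}

doubles = check_doubles(buckets)                    # includes the None bucket
doubles_not_none = check_doubles_not_none(buckets)  # skips the None bucket
```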
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   280
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   281
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   282
# object stores #################################################################
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   283
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   284
class ObjectStore(object):
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   285
    """Store objects in memory for *faster* validation (development mode)
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   286
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   287
    It does not enforce the constraints of the schema, however, and hence will miss some problems
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   288
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   289
    >>> store = ObjectStore()
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   290
    >>> user = {'login': 'johndoe'}
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   291
    >>> store.add('CWUser', user)
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   292
    >>> group = {'name': 'unknown'}
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   293
    >>> store.add('CWGroup', group)
3003
2944ee420dca R [dataimport] rename uid to eid
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 2974
diff changeset
   294
    >>> store.relate(user['eid'], 'in_group', group['eid'])
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   295
    """
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   296
    def __init__(self):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   297
        self.items = []
3003
2944ee420dca R [dataimport] rename uid to eid
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 2974
diff changeset
   298
        self.eids = {}
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   299
        self.types = {}
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   300
        self.relations = set()
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   301
        self.indexes = {}
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   302
        self._rql = None
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   303
        self._commit = None
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   304
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   305
    def _put(self, type, item):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   306
        self.items.append(item)
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   307
        return len(self.items) - 1
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   308
6990
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   309
    def create_entity(self, etype, **data):
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   310
        data['eid'] = eid = self._put(etype, data)
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   311
        self.eids[eid] = data
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   312
        self.types.setdefault(etype, []).append(eid)
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   313
        return data
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   314
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   315
    @deprecated("[3.11] add is deprecated, use create_entity instead")
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   316
    def add(self, etype, item):
3486
ea6bf6f9ba0c [cwctl] improve dialog messages
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 3318
diff changeset
   317
        assert isinstance(item, dict), 'item is not a dict but a %s' % type(item)
6990
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   318
        data = self.create_entity(etype, **item)
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   319
        item['eid'] = data['eid']
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   320
        return item
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   321
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   322
    def relate(self, eid_from, rtype, eid_to, inlined=False):
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   323
        """Add new relation"""
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   324
        relation = eid_from, rtype, eid_to
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   325
        self.relations.add(relation)
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   326
        return relation
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   327
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   328
    def commit(self):
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   329
        """this commit method does nothing by default
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   330
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   331
        This is intentional, to use the frequent autocommit feature in CubicWeb
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   332
        when you are using hooks or another
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   333
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   334
        If you want to override the commit method, please set it through the
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   335
        constructor
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   336
        """
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   337
        pass
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   338
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   339
    def rql(self, *args):
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   340
        if self._rql is not None:
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   341
            return self._rql(*args)
7033
ddc1b4d80dbd [dataimport] remove eid_key deprecation warning and prints
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6990
diff changeset
   342
        return []
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   343
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   344
    @property
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   345
    def nb_inserted_entities(self):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   346
        return len(self.eids)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   347
    @property
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   348
    def nb_inserted_types(self):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   349
        return len(self.types)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   350
    @property
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   351
    def nb_inserted_relations(self):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   352
        return len(self.relations)
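
A minimal sketch of the in-memory bookkeeping done by the store. `MiniStore` is a trimmed, hypothetical copy of `ObjectStore` that keeps only entity and relation handling, so it runs without CubicWeb.

```python
class MiniStore(object):
    """Trimmed, hypothetical copy of ObjectStore: entities and relations only."""

    def __init__(self):
        self.items = []
        self.eids = {}
        self.types = {}
        self.relations = set()

    def create_entity(self, etype, **data):
        # The eid is simply the position of the item in self.items.
        self.items.append(data)
        data['eid'] = eid = len(self.items) - 1
        self.eids[eid] = data
        self.types.setdefault(etype, []).append(eid)
        return data

    def relate(self, eid_from, rtype, eid_to):
        relation = eid_from, rtype, eid_to
        self.relations.add(relation)
        return relation


store = MiniStore()
user = store.create_entity('CWUser', login=u'johndoe')
group = store.create_entity('CWGroup', name=u'unknown')
store.relate(user['eid'], 'in_group', group['eid'])
# Relations live in a set, so adding the same triple twice changes nothing.
store.relate(user['eid'], 'in_group', group['eid'])

# Same figures as nb_inserted_entities / _types / _relations would report.
counts = (len(store.eids), len(store.types), len(store.relations))
```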
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   353
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   354
    @deprecated("[3.7] index support will disappear")
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   355
    def build_index(self, name, type, func=None, can_be_empty=False):
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   356
        """build internal index for further search"""
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   357
        index = {}
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   358
        if func is None or not callable(func):
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   359
            func = lambda x: x['eid']
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   360
        for eid in self.types[type]:
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   361
            index.setdefault(func(self.eids[eid]), []).append(eid)
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   362
        if not can_be_empty:
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   363
            assert index, "new index '%s' cannot be empty" % name
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   364
        self.indexes[name] = index

    @deprecated("[3.7] index support will disappear")
    def build_rqlindex(self, name, type, key, rql, rql_params=False,
                       func=None, can_be_empty=False):
        """Build an index from an RQL query.

        The query should return the eid in its first column, e.g.:
        ctl.store.build_rqlindex('index_name', 'users', 'login', 'Any U WHERE U is CWUser')
        """
        self.types[type] = []
        rset = self.rql(rql, rql_params or {})
        if not can_be_empty:
            assert rset, "new index type '%s' cannot be empty (0 record found)" % type
        for entity in rset.entities():
            getattr(entity, key) # autopopulate entity with key attribute
            self.eids[entity.eid] = dict(entity)
            if entity.eid not in self.types[type]:
                self.types[type].append(entity.eid)

        # Build index with specified key
        func = lambda x: x[key]
        self.build_index(name, type, func, can_be_empty=can_be_empty)

    @deprecated("[3.7] index support will disappear")
    def fetch(self, name, key, unique=False, decorator=None):
        """Fetch eids from the index `name` for the given key.

        decorator is a callable or an iterable of callables (usually lambda
        functions), each applied in turn to the fetched value:
          decorator=lambda x: x[:1] (first value is returned)
          decorator=lambda x: x.lower() (lowercased value is returned)

        decorator is handy when you want to improve index keys without
        changing the original field.

        The same check functions can be reused here.
        """
        eids = self.indexes[name].get(key, [])
        if decorator is not None:
            if not hasattr(decorator, '__iter__'):
                decorator = (decorator,)
            for f in decorator:
                eids = f(eids)
        if unique:
            assert len(eids) == 1, u'expected a single value for key "%s" in index "%s". Got %i' % (key, name, len(eids))
            eids = eids[0]
        return eids

    @deprecated("[3.7] index support will disappear")
    def find(self, type, key, value):
        for idx in self.types[type]:
            item = self.items[idx]
            if item[key] == value:
                yield item

    @deprecated("[3.7] checkpoint() deprecated. use commit() instead")
    def checkpoint(self):
        self.commit()


class RQLObjectStore(ObjectStore):
    """ObjectStore that works with an actual RQL repository (production mode)"""
    _rql = None # bw compat

    def __init__(self, session=None, commit=None):
        ObjectStore.__init__(self)
        if session is None:
            sys.exit('please provide a session or run this script with cubicweb-ctl shell and pass cnx as session')
        if not hasattr(session, 'set_pool'):
            # got a connection rather than a session
            cnx = session
            session = session.request()
            session.set_pool = lambda: None
            commit = commit or cnx.commit
        else:
            session.set_pool()
        self.session = session
        self._commit = commit or session.commit

    @deprecated("[3.7] checkpoint() deprecated. use commit() instead")
    def checkpoint(self):
        self.commit()

    def commit(self):
        txuuid = self._commit()
        self.session.set_pool()
        return txuuid

    def rql(self, *args):
        if self._rql is not None:
            return self._rql(*args)
        return self.session.execute(*args)

    def create_entity(self, *args, **kwargs):
        entity = self.session.create_entity(*args, **kwargs)
        self.eids[entity.eid] = entity
        self.types.setdefault(args[0], []).append(entity.eid)
        return entity

    def _put(self, type, item):
        query = 'INSERT %s X' % type
        if item:
            query += ': ' + ', '.join('X %s %%(%s)s' % (k, k)
                                      for k in item)
        return self.rql(query, item)[0][0]

    def relate(self, eid_from, rtype, eid_to, inlined=False):
        eid_from, rtype, eid_to = super(RQLObjectStore, self).relate(
            eid_from, rtype, eid_to)
        self.rql('SET X %s Y WHERE X eid %%(x)s, Y eid %%(y)s' % rtype,
                 {'x': int(eid_from), 'y': int(eid_to)})


# the import controller ########################################################

class CWImportController(object):
    """Controller of the data import process.

    >>> ctl = CWImportController(store)
    >>> ctl.generators = list_of_data_generators
    >>> ctl.data = dict_of_data_tables
    >>> ctl.run()
    """

    def __init__(self, store, askerror=0, catcherrors=None, tell=tell,
                 commitevery=50):
        self.store = store
        self.generators = None
        self.data = {}
        self.errors = None
        self.askerror = askerror
        if catcherrors is None:
            catcherrors = askerror
        self.catcherrors = catcherrors
        self.commitevery = commitevery # set to None to do a single commit
        self._tell = tell

    def check(self, type, key, value):
        self._checks.setdefault(type, {}).setdefault(key, []).append(value)

    def check_map(self, entity, key, map, default):
        try:
            entity[key] = map[entity[key]]
        except KeyError:
            self.check(key, entity[key], None)
            entity[key] = default

    def record_error(self, key, msg=None, type=None, value=None, tb=None):
        tmp = StringIO()
        if type is None:
            traceback.print_exc(file=tmp)
        else:
            traceback.print_exception(type, value, tb, file=tmp)
        # store the traceback lines as a single list entry so that one error
        # is counted once, not once per traceback line
        errorlog = self.errors.setdefault(key, [])
        if msg is None:
            errorlog.append(tmp.getvalue().splitlines())
        else:
            errorlog.append((msg, tmp.getvalue().splitlines()))
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   523
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   524
    def run(self):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   525
        self.errors = {}
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   526
        for func, checks in self.generators:
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   527
            self._checks = {}
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   528
            func_name = func.__name__
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   529
            self.tell("Run import function '%s'..." % func_name)
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   530
            try:
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   531
                func(self)
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   532
            except:
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   533
                if self.catcherrors:
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   534
                    self.record_error(func_name, 'While calling %s' % func.__name__)
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   535
                else:
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   536
                    self._print_stats()
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   537
                    raise
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   538
            for key, func, title, help in checks:
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   539
                buckets = self._checks.get(key)
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   540
                if buckets:
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   541
                    err = func(buckets)
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   542
                    if err:
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   543
                        self.errors[title] = (help, err)
5097
60a237638f57 [dataimport] print transaction id when we get one
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5066
diff changeset
   544
        txuuid = self.store.commit()
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   545
        self._print_stats()
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   546
        if self.errors:
4721
8f63691ccb7f pylint style fixes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4613
diff changeset
   547
            if self.askerror == 2 or (self.askerror and confirm('Display errors ?')):
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   548
                from pprint import pformat
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   549
                for errkey, error in self.errors.items():
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   550
                    self.tell("\n%s (%s): %d\n" % (error[0], errkey, len(error[1])))
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   551
                    self.tell(pformat(sorted(error[1])))
5097
60a237638f57 [dataimport] print transaction id when we get one
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5066
diff changeset
   552
        if txuuid is not None:
60a237638f57 [dataimport] print transaction id when we get one
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5066
diff changeset
   553
            self.tell('transaction id: %s' % txuuid)
4912
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   554
    def _print_stats(self):
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   555
        nberrors = sum(len(err[1]) for err in self.errors.values())
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   556
        self.tell('\nImport statistics: %i entities, %i types, %i relations and %i errors'
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   557
                  % (self.store.nb_inserted_entities,
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   558
                     self.store.nb_inserted_types,
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   559
                     self.store.nb_inserted_relations,
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   560
                     nberrors))
9767cc516b4f [R] dataimport: changes
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4847
diff changeset
   561
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   562
    def get_data(self, key):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   563
        return self.data.get(key)
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   564
4527
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   565
    def index(self, name, key, value, unique=False):
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   566
        """create a new index
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   567
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   568
        If unique is set to True, only the first occurrence of a value is kept; later ones are ignored
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   569
        """
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   570
        if unique:
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   571
            try:
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   572
                if value in self.store.indexes[name][key]:
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   573
                    return
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   574
            except KeyError:
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   575
                # no index exists yet for this name/key, so this is the first occurrence; continue
67ab70e98488 [R] devtools: improve default data import mechanism
Julien Jehannet <julien.jehannet@logilab.fr>
parents: 4252
diff changeset
   576
                pass
2974
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   577
        self.store.indexes.setdefault(name, {}).setdefault(key, []).append(value)
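
The indexing behaviour above can be tried in isolation. What follows is a minimal standalone sketch, not the controller's code: `build_index` is a hypothetical free function, and the plain dict `indexes` stands in for `self.store.indexes`.

```python
def build_index(indexes, name, key, value, unique=False):
    """Append `value` under indexes[name][key].

    With unique=True, a value already recorded for that key is skipped,
    so only its first occurrence is kept.
    """
    if unique and value in indexes.get(name, {}).get(key, ()):
        return
    indexes.setdefault(name, {}).setdefault(key, []).append(value)


indexes = {}
build_index(indexes, 'person', 'smith', 12, unique=True)
build_index(indexes, 'person', 'smith', 12, unique=True)  # duplicate, ignored
build_index(indexes, 'person', 'smith', 34, unique=True)
```

Using `dict.get` with empty defaults replaces the `try`/`except KeyError` of the method above; both forms treat a missing name or key as "nothing indexed yet".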
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   578
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   579
    def tell(self, msg):
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   580
        self._tell(msg)
3dfe497e5afa F tools to import data
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents:
diff changeset
   581
4152
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   582
    def iter_and_commit(self, datakey):
30fd1229137d new catch_error context manager, nicer controller __init__ and new iter_and_commit(datakey) method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4140
diff changeset
   583
        """iter rows, triggering commit every self.commitevery iterations"""
6136
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   584
        if self.commitevery is None:
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   585
            return self.get_data(datakey)
79da6f969b15 [dataimport] refactor commitevery to gain readability
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 6133
diff changeset
   586
        else:
6169
55378e1bab1b fix order of parameters in call to callfunc_every
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 6136
diff changeset
   587
            return callfunc_every(self.store.commit,
55378e1bab1b fix order of parameters in call to callfunc_every
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 6136
diff changeset
   588
                                  self.commitevery,
55378e1bab1b fix order of parameters in call to callfunc_every
Alexandre Fayolle <alexandre.fayolle@logilab.fr>
parents: 6136
diff changeset
   589
                                  self.get_data(datakey))
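
The periodic commit can be pictured with a small generator. This is a sketch under assumptions: `call_every` is a hypothetical stand-in written for illustration, and the real `callfunc_every` helper (defined elsewhere in this module) may differ in detail, for example in exactly when the first call happens.

```python
def call_every(func, number, iterable):
    """Yield items from `iterable`, calling `func` after every `number`
    items and once more when the iterable is exhausted."""
    for count, item in enumerate(iterable, 1):
        yield item
        if count % number == 0:
            func()
    # final call so that a trailing partial batch is not left uncommitted
    func()


commits = []
rows = list(call_every(lambda: commits.append('commit'), 2, range(5)))
```

Because this is a generator, `func` only runs as the caller consumes rows, so a loop that stops early also skips the final call.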
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   590
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   591
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   592
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   593
from datetime import datetime
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   594
from cubicweb.schema import META_RTYPES, VIRTUAL_RTYPES
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   595
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   596
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   597
class NoHookRQLObjectStore(RQLObjectStore):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   598
    """ObjectStore that writes directly to the system source, bypassing hooks (production mode)"""
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   599
    _rql = None # bw compat
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   600
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   601
    def __init__(self, session, metagen=None, baseurl=None):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   602
        super(NoHookRQLObjectStore, self).__init__(session)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   603
        self.source = session.repo.system_source
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   604
        self.rschema = session.repo.schema.rschema
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   605
        self.add_relation = self.source.add_relation
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   606
        if metagen is None:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   607
            metagen = MetaGenerator(session, baseurl)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   608
        self.metagen = metagen
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   609
        self._nb_inserted_entities = 0
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   610
        self._nb_inserted_types = 0
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   611
        self._nb_inserted_relations = 0
5054
cb066d29166a fix dataimport for 3.7.2
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4913
diff changeset
   612
        self.rql = session.execute
cb066d29166a fix dataimport for 3.7.2
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4913
diff changeset
   613
        # deactivate security
cb066d29166a fix dataimport for 3.7.2
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4913
diff changeset
   614
        session.set_read_security(False)
cb066d29166a fix dataimport for 3.7.2
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4913
diff changeset
   615
        session.set_write_security(False)
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   616
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   617
    def create_entity(self, etype, **kwargs):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   618
        for k, v in kwargs.iteritems():
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   619
            kwargs[k] = getattr(v, 'eid', v)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   620
        entity, rels = self.metagen.base_etype_dicts(etype)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   621
        entity = copy(entity)
5557
1a534c596bff [entity] continue cleanup of Entity/AnyEntity namespace
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5424
diff changeset
   622
        entity.cw_clear_relation_cache()
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   623
        self.metagen.init_entity(entity)
6142
8bc6eac1fac1 [session] cleanup hook / operation / entity edition api
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6122
diff changeset
   624
        entity.cw_edited.update(kwargs, skipsec=False)
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   625
        session = self.session
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   626
        self.source.add_entity(session, entity)
5054
cb066d29166a fix dataimport for 3.7.2
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4913
diff changeset
   627
        self.source.add_info(session, entity, self.source, None, complete=False)
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   628
        for rtype, targeteids in rels.iteritems():
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   629
            # targeteids may be a single eid or a list of eids
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   630
            inlined = self.rschema(rtype).inlined
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   631
            try:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   632
                for targeteid in targeteids:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   633
                    self.add_relation(session, entity.eid, rtype, targeteid,
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   634
                                      inlined)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   635
            except TypeError:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   636
                self.add_relation(session, entity.eid, rtype, targeteids,
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   637
                                  inlined)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   638
        self._nb_inserted_entities += 1
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   639
        return entity
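
The "single eid or list of eids" handling in `create_entity` relies on iteration raising `TypeError` for a bare eid. A minimal sketch of the same idea as a separate normalisation step, where `as_eid_list` is a hypothetical helper and not part of the module:

```python
def as_eid_list(targeteids):
    """Return a list of eids, given either one eid or an iterable of eids."""
    try:
        return list(targeteids)
    except TypeError:
        # a bare integer eid is not iterable
        return [targeteids]


single = as_eid_list(5)
several = as_eid_list([1, 2])
```

Normalising first keeps the `try` block narrow: a `TypeError` raised inside the relation-insertion call itself would then propagate instead of being mistaken for the single-eid case.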
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   640
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   641
    def relate(self, eid_from, rtype, eid_to):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   642
        assert not rtype.startswith('reverse_')
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   643
        self.add_relation(self.session, eid_from, rtype, eid_to,
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   644
                          self.rschema(rtype).inlined)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   645
        self._nb_inserted_relations += 1
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   646
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   647
    @property
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   648
    def nb_inserted_entities(self):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   649
        return self._nb_inserted_entities
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   650
    @property
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   651
    def nb_inserted_types(self):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   652
        return self._nb_inserted_types
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   653
    @property
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   654
    def nb_inserted_relations(self):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   655
        return self._nb_inserted_relations
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   656
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   657
    def _put(self, type, item):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   658
        raise RuntimeError('use create entity')
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   659
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   660
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   661
class MetaGenerator(object):
6427
c8a5ac2d1eaa [schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6173
diff changeset
   662
    META_RELATIONS = (META_RTYPES
c8a5ac2d1eaa [schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6173
diff changeset
   663
                      - VIRTUAL_RTYPES
c8a5ac2d1eaa [schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6173
diff changeset
   664
                      - set(('eid', 'cwuri',
c8a5ac2d1eaa [schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6173
diff changeset
   665
                             'is', 'is_instance_of', 'cw_source')))
c8a5ac2d1eaa [schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6173
diff changeset
   666
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   667
    def __init__(self, session, baseurl=None):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   668
        self.session = session
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   669
        self.source = session.repo.system_source
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   670
        self.time = datetime.now()
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   671
        if baseurl is None:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   672
            config = session.vreg.config
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   673
            baseurl = config['base-url'] or config.default_base_url()
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   674
        if not baseurl.endswith('/'):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   675
            baseurl += '/'
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   676
        self.baseurl = baseurl
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   677
        # attributes/relations shared by all entities of the same type
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   678
        self.etype_attrs = []
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   679
        self.etype_rels = []
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   680
        # attributes/relations specific to each entity
5054
cb066d29166a fix dataimport for 3.7.2
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4913
diff changeset
   681
        self.entity_attrs = ['cwuri']
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   682
        #self.entity_rels = [] XXX not handled (YAGNI?)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   683
        schema = session.vreg.schema
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   684
        rschema = schema.rschema
6427
c8a5ac2d1eaa [schema / sources] store data sources as cubicweb entities
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6173
diff changeset
   685
        for rtype in self.META_RELATIONS:
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   686
            if rschema(rtype).final:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   687
                self.etype_attrs.append(rtype)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   688
            else:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   689
                self.etype_rels.append(rtype)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   690
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   691
    @cached
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   692
    def base_etype_dicts(self, etype):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   693
        entity = self.session.vreg['etypes'].etype_class(etype)(self.session)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   694
        # entities are shallow-copied; avoid sharing dicts between copies
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   695
        del entity.cw_extra_kwargs
6142
8bc6eac1fac1 [session] cleanup hook / operation / entity edition api
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6122
diff changeset
   696
        entity.cw_edited = EditedEntity(entity)
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   697
        for attr in self.etype_attrs:
6990
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   698
            entity.cw_edited.edited_attribute(attr, self.generate(entity, attr))
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   699
        rels = {}
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   700
        for rel in self.etype_rels:
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   701
            rels[rel] = self.generate(entity, rel)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   702
        return entity, rels
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   703
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   704
    def init_entity(self, entity):
5054
cb066d29166a fix dataimport for 3.7.2
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4913
diff changeset
   705
        entity.eid = self.source.create_eid(self.session)
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   706
        for attr in self.entity_attrs:
6990
353ad06867a8 [dataimport] implement create_entity() on ObjectStore to provide a consistent interface
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 6989
diff changeset
   707
            entity.cw_edited.edited_attribute(attr, self.generate(entity, attr))
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   708
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   709
    def generate(self, entity, rtype):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   710
        return getattr(self, 'gen_%s' % rtype)(entity)
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   711
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   712
    def gen_cwuri(self, entity):
5054
cb066d29166a fix dataimport for 3.7.2
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4913
diff changeset
   713
        return u'%seid/%s' % (self.baseurl, entity.eid)
4818
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   714
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   715
    def gen_creation_date(self, entity):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   716
        return self.time
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   717
    def gen_modification_date(self, entity):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   718
        return self.time
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   719
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   720
    def gen_created_by(self, entity):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   721
        return self.session.user.eid
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   722
    def gen_owned_by(self, entity):
9f9bfbcdecfd le patch massiveimport a été importé
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4734
diff changeset
   723
        return self.session.user.eid
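
The name-based dispatch used by `generate` can be shown without a CubicWeb session. The class below is an illustrative sketch only: `TinyMetaGenerator` is hypothetical, takes a plain eid instead of an entity, and `http://example.org` is a placeholder base URL.

```python
from datetime import datetime


class TinyMetaGenerator(object):
    """Compute metadata values by looking up a gen_<rtype> method."""

    def __init__(self, baseurl):
        # make sure the base URL ends with exactly one trailing slash added
        if not baseurl.endswith('/'):
            baseurl += '/'
        self.baseurl = baseurl
        self.time = datetime.now()

    def generate(self, eid, rtype):
        # raises AttributeError when no gen_<rtype> method exists
        return getattr(self, 'gen_%s' % rtype)(eid)

    def gen_cwuri(self, eid):
        return u'%seid/%s' % (self.baseurl, eid)

    def gen_creation_date(self, eid):
        return self.time


gen = TinyMetaGenerator('http://example.org')
uri = gen.generate(42, 'cwuri')
```

Supporting a new metadata attribute then only means adding another `gen_<rtype>` method in a subclass; an unknown relation type fails loudly with `AttributeError`.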