cubicweb/dataimport/massive_store.py
author      Sylvain Thénault <sylvain.thenault@logilab.fr>
date        Thu, 28 Jan 2016 15:13:39 +0100
changeset   11319:fe90d07f3afa
parent      11318:09731bd52887
child       11320:78da04c853dc
permissions -rw-r--r--

[dataimport] testing whether a value is in a set and inserting into that set should live together

Also inline some methods along the way, in the hope of making the public API clearer at some point, and only put valuable values in those sets so we don't have to endlessly test for consistency.

# coding: utf-8
# copyright 2015-2016 LOGILAB S.A. (Paris, FRANCE), all rights reserved.
# contact http://www.logilab.fr/ -- mailto:contact@logilab.fr
#
# This file is part of CubicWeb.
#
# CubicWeb is free software: you can redistribute it and/or modify it under the
# terms of the GNU Lesser General Public License as published by the Free
# Software Foundation, either version 2.1 of the License, or (at your option)
# any later version.
#
# CubicWeb is distributed in the hope that it will be useful, but WITHOUT ANY
# WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
# A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
# details.
#
# You should have received a copy of the GNU Lesser General Public License along
# with CubicWeb.  If not, see <http://www.gnu.org/licenses/>.

import logging
from datetime import datetime
from collections import defaultdict
from io import StringIO
from itertools import chain

from six.moves import range

import pytz

from yams.constraints import SizeConstraint

from cubicweb.schema import PURE_VIRTUAL_RTYPES
from cubicweb.server.schema2sql import rschema_has_table
from cubicweb.server.sqlutils import SQL_PREFIX
from cubicweb.dataimport import stores, pgstore
from cubicweb.utils import make_uid


class MassiveObjectStore(stores.RQLObjectStore):
    """
    Store for massive import of data, with delayed insertion of meta data.

    WARNINGS:

    - This store may only be used with PostgreSQL for now, as it relies
      on the COPY FROM method, and on specific PostgreSQL tables to get all
      the indexes.

    - This store can only insert relations that are not inlined (i.e.,
      which do *not* have inlined=True in their definition in the schema), unless they are
      specified as entity attributes.

    It should be used as follows:

       store = MassiveObjectStore(cnx)
       store.init_rtype_table('Person', 'lives_in', 'Location')
       ...

       store.prepare_insert_entity('Person', subj_iid_attribute=person_iid, ...)
       store.prepare_insert_entity('Location', obj_iid_attribute=location_iid, ...)
       ...

       # subj_iid_attribute and obj_iid_attribute are argument names
       # chosen by the user (e.g. "cwuri"). These names can be identical.
       # person_iid and location_iid are unique IDs and depend on the data
       # (e.g. URI).
       store.flush()
       store.relate_by_iid(person_iid, 'lives_in', location_iid)
       # For example:
       store.prepare_insert_entity('Person',
                                   cwuri='http://dbpedia.org/toto',
                                   name='Toto')
       store.prepare_insert_entity('Location',
                                   uri='http://geonames.org/11111',
                                   name='Somewhere')
       store.flush()
       store.relate_by_iid('http://dbpedia.org/toto',
                           'lives_in',
                           'http://geonames.org/11111')
       # Finally
       store.convert_relations('Person', 'lives_in', 'Location',
                               'subj_iid_attribute', 'obj_iid_attribute')
       # For the previous example:
       store.convert_relations('Person', 'lives_in', 'Location', 'cwuri', 'uri')
       ...
       store.commit()
       store.finish()
    """
    # max size of the iid, used to create the iid_eid conversion table
    iid_maxsize = 1024

    def __init__(self, cnx,
                 on_commit_callback=None, on_rollback_callback=None,
                 slave_mode=False,
                 source=None,
                 eids_seq_range=10000):
        """ Create a MassiveObject store, with the following attributes:

        - cnx: CubicWeb cnx
        - eids_seq_range: size of eid range reserved by the store for each batch
        """
        super(MassiveObjectStore, self).__init__(cnx)
        self.on_commit_callback = on_commit_callback
        self.on_rollback_callback = on_rollback_callback
        self.slave_mode = slave_mode
        self.eids_seq_range = eids_seq_range

        self.logger = logging.getLogger('dataimport.massive_store')
        self.sql = cnx.system_sql
        self.schema = self._cnx.vreg.schema
        self.default_values = get_default_values(self.schema)
        self.get_next_eid = lambda g=self._get_eid_gen(): next(g)
        self._dbh = PGHelper(cnx)

        cnx.read_security = False
        cnx.write_security = False

        self._data_uri_relations = defaultdict(list)
        self._data_entities = defaultdict(list)
        self._data_relations = defaultdict(list)
        # etypes for which we have a uri_eid_%(etype)s table
        self._init_uri_eid = set()
        # etypes for which we have a uri_eid_%(e)s_idx index
        self._uri_eid_inserted = set()
        # set of rtypes for which we have a %(rtype)s_relation_iid_tmp table
        self._uri_rtypes = set()
        # set of etypes/rtypes whose tables are created
        self._initialized = set()

        self._now = datetime.now(pytz.utc)
        self._default_cwuri = make_uid('_auto_generated')

        if not self.slave_mode:
            # drop constraint and metadata table, they will be recreated when self.finish() is
            # called
            self._drop_all_constraints()
            self._drop_metatables_constraints()
        if source is None:
            source = cnx.repo.system_source
        self.source = source

    # INIT FUNCTIONS ########################################################

    def _drop_all_constraints(self):
        etypes_tables = ('cw_%s' % eschema.type.lower() for eschema in self.schema.entities()
                         if not eschema.final)
        rtypes_tables = ('%s_relation' % rschema.type.lower() for rschema in self.schema.relations()
                         if rschema_has_table(rschema, skip_relations=PURE_VIRTUAL_RTYPES))
        for tablename in chain(etypes_tables, rtypes_tables, ('entities',)):
            self._store_and_drop_constraints(tablename)
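
    # Example: with a hypothetical schema defining a 'Person' entity type and a
    # non-inlined 'lives_in' relation type, the generators above yield table
    # names such as 'cw_person' and 'lives_in_relation', so the loop boils down
    # to calls like:
    #
    #     self._store_and_drop_constraints('cw_person')
    #     self._store_and_drop_constraints('lives_in_relation')
    #     self._store_and_drop_constraints('entities')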

    def _store_and_drop_constraints(self, tablename):
        # Create a table to save the constraints, it allows reloading even after crash
        self.sql('CREATE TABLE IF NOT EXISTS cwmassive_constraints'
                 '(origtable text, query text, type varchar(256))')
        constraints = self._dbh.table_constraints(tablename)
        for name, query in constraints.items():
            self.sql('INSERT INTO cwmassive_constraints VALUES (%(e)s, %(c)s, %(t)s)',
                     {'e': tablename, 'c': query, 't': 'constraint'})
            self.sql('ALTER TABLE %s DROP CONSTRAINT %s' % (tablename, name))
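
    # Illustrative sketch, with a hypothetical constraint name: for a table
    # 'cw_person' carrying a constraint 'cw_person_cwuri_key', the loop above
    # records a row roughly like
    #     ('cw_person', '<query recreating cw_person_cwuri_key>', 'constraint')
    # in cwmassive_constraints, then issues
    #     ALTER TABLE cw_person DROP CONSTRAINT cw_person_cwuri_key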

    def reapply_all_constraints(self):
        if not self._dbh.table_exists('cwmassive_constraints'):
            self.logger.info('The table cwmassive_constraints does not exist')
            return
        cu = self.sql("SELECT query FROM cwmassive_constraints WHERE type='constraint'")
        for query, in cu.fetchall():
            self.sql(query)
            self.sql("DELETE FROM cwmassive_constraints WHERE type='constraint' AND query=%(q)s",
                     {'q': query})
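
    # Lifecycle sketch (non-slave store): constraints are saved and dropped in
    # __init__ via _drop_all_constraints(), and each saved 'constraint' row is
    # replayed and removed again by reapply_all_constraints(), typically once
    # the import is finalized (see the comment about self.finish() in __init__).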

    def init_rtype_table(self, etype_from, rtype, etype_to):
        """ Build temporary table for standard rtype """
        # Create a uri_eid table for each etype for a better
        # control of which etype is concerned by a particular
        # possibly multivalued relation.
        for etype in (etype_from, etype_to):
            if etype and etype not in self._init_uri_eid:
                self._init_uri_eid.add(etype)
                self.sql('CREATE TABLE IF NOT EXISTS uri_eid_%(e)s'
                         '(uri character varying(%(size)s), eid integer)'
                         % {'e': etype.lower(), 'size': self.iid_maxsize})
        if rtype not in self._uri_rtypes:
            # Create the temporary table
            if not self.schema.rschema(rtype).inlined:
                self.sql('CREATE TABLE IF NOT EXISTS %(r)s_relation_iid_tmp'
                         '(uri_from character varying(%(s)s), uri_to character varying(%(s)s))'
                         % {'r': rtype, 's': self.iid_maxsize})
                self._uri_rtypes.add(rtype)
            else:
                self.logger.warning("inlined relation %s: cannot insert it", rtype)
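
    # Example, following the scenario of the class docstring:
    #
    #     store.init_rtype_table('Person', 'lives_in', 'Location')
    #
    # creates (if missing) the lookup tables uri_eid_person and uri_eid_location
    # plus the temporary relation table lives_in_relation_iid_tmp, provided
    # 'lives_in' is not an inlined relation type.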

    # RELATE FUNCTION #######################################################

    def relate_by_iid(self, iid_from, rtype, iid_to):
        """Add new relation based on the internal id (iid)
        of the entities (not the eid)"""
        # Push data
        if isinstance(iid_from, unicode):
            iid_from = iid_from.encode('utf-8')
        if isinstance(iid_to, unicode):
            iid_to = iid_to.encode('utf-8')
        self._data_uri_relations[rtype].append({'uri_from': iid_from, 'uri_to': iid_to})
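
    # Example (from the class docstring): the call
    #
    #     store.relate_by_iid('http://dbpedia.org/toto', 'lives_in',
    #                         'http://geonames.org/11111')
    #
    # only appends {'uri_from': ..., 'uri_to': ...} to the in-memory cache
    # self._data_uri_relations['lives_in']; nothing is written to the database
    # until flush_relations() is called.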

    # FLUSH FUNCTIONS #######################################################

    def flush_relations(self):
        """ Flush the relations data
        """
        for rtype, data in self._data_uri_relations.items():
            if not data:
                self.logger.info('No data for rtype %s', rtype)
            buf = StringIO('\n'.join(['%(uri_from)s\t%(uri_to)s' % d for d in data]))
            if not buf:
                self.logger.info('Empty Buffer for rtype %s', rtype)
                continue
            cursor = self._cnx.cnxset.cu
            if not self.schema.rschema(rtype).inlined:
                cursor.copy_from(buf, '%s_relation_iid_tmp' % rtype.lower(),
                                 null='NULL', columns=('uri_from', 'uri_to'))
            else:
                self.logger.warning("inlined relation %s: cannot insert it", rtype)
            buf.close()
            # Clear data cache
            self._data_uri_relations[rtype] = []
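
    # Sketch of what flush_relations() sends for the 'lives_in' example above:
    # the buffer holds one tab-separated line per pending pair, e.g.
    #
    #     http://dbpedia.org/toto\thttp://geonames.org/11111
    #
    # and is bulk-loaded via the DB-API cursor's copy_from (PostgreSQL COPY FROM)
    # into the lives_in_relation_iid_tmp table created by init_rtype_table().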

    def fill_uri_eid_table(self, etype, uri_label):
        """ Fill the uri_eid table
        """
        if etype not in self._uri_eid_inserted:
            self._uri_eid_inserted.add(etype)
            self.logger.info('Fill uri_eid for etype %s', etype)
            self.sql('INSERT INTO uri_eid_%(e)s SELECT cw_%(l)s, cw_eid FROM cw_%(e)s'
                     % {'l': uri_label, 'e': etype.lower()})
            self.sql('CREATE INDEX uri_eid_%(e)s_idx ON uri_eid_%(e)s(uri)'
                     % {'e': etype.lower()})
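
    # Example: fill_uri_eid_table('Person', 'cwuri') expands the statements
    # above to:
    #
    #     INSERT INTO uri_eid_person SELECT cw_cwuri, cw_eid FROM cw_person
    #     CREATE INDEX uri_eid_person_idx ON uri_eid_person(uri)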

    def convert_relations(self, etype_from, rtype, etype_to,
                          uri_label_from='cwuri', uri_label_to='cwuri'):
        """ Flush the converted relations
        """
        # Always flush relations to be sure
        self.logger.info('Convert relations %s %s %s', etype_from, rtype, etype_to)
        self.flush_relations()
        if uri_label_from:
            self.fill_uri_eid_table(etype_from, uri_label_from)
        if uri_label_to:
            self.fill_uri_eid_table(etype_to, uri_label_to)
        if self.schema.rschema(rtype).inlined:
            self.logger.warning("Can't insert inlined relation %s", rtype)
            return
        if uri_label_from and uri_label_to:
            sql = '''INSERT INTO %(r)s_relation (eid_from, eid_to) SELECT DISTINCT O1.eid, O2.eid
            FROM %(r)s_relation_iid_tmp AS T, uri_eid_%(ef)s as O1, uri_eid_%(et)s as O2
            WHERE O1.uri=T.uri_from AND O2.uri=T.uri_to AND NOT EXISTS (
            SELECT 1 FROM %(r)s_relation AS TT WHERE TT.eid_from=O1.eid AND TT.eid_to=O2.eid);
            '''
        elif uri_label_to:
            sql = '''INSERT INTO %(r)s_relation (eid_from, eid_to) SELECT DISTINCT
            CAST(T.uri_from AS INTEGER), O1.eid
            FROM %(r)s_relation_iid_tmp AS T, uri_eid_%(et)s as O1
            WHERE O1.uri=T.uri_to AND NOT EXISTS (
            SELECT 1 FROM %(r)s_relation AS TT WHERE
            TT.eid_from=CAST(T.uri_from AS INTEGER) AND TT.eid_to=O1.eid);
            '''
        elif uri_label_from:
            sql = '''INSERT INTO %(r)s_relation (eid_from, eid_to) SELECT DISTINCT
            O1.eid, CAST(T.uri_to AS INTEGER)
            FROM %(r)s_relation_iid_tmp AS T, uri_eid_%(ef)s as O1
            WHERE O1.uri=T.uri_from AND NOT EXISTS (
            SELECT 1 FROM %(r)s_relation AS TT WHERE
            TT.eid_from=O1.eid AND TT.eid_to=CAST(T.uri_to AS INTEGER));
            '''
        try:
            self.sql(sql % {'r': rtype.lower(),
                            'et': etype_to.lower() if etype_to else u'',
                            'ef': etype_from.lower() if etype_from else u''})
        except Exception as ex:
            self.logger.error("Can't insert relation %s: %s", rtype, ex)
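
    # Example: convert_relations('Person', 'lives_in', 'Location', 'cwuri', 'uri')
    # (both uri labels given) fills uri_eid_person and uri_eid_location, then
    # runs the first INSERT above with %(r)s='lives_in', %(ef)s='person' and
    # %(et)s='location', joining lives_in_relation_iid_tmp against the two
    # lookup tables to resolve iids into eids while skipping (eid_from, eid_to)
    # pairs already present in lives_in_relation.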
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   280
11313
682b15eb2dd2 [dataimport] flake8
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11312
diff changeset
   281
    # SQL UTILITIES #########################################################
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   282
11022
1032d7956b11 [dataimport] massive_store: do not drop constraints multiple times
Samuel Trégouët <samuel.tregouet@logilab.fr>
parents: 11021
diff changeset
   283
    def drop_and_store_indexes(self, tablename):
11311
fd45fc498c1b [dataimport] use IF EXISTS when possible
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11310
diff changeset
   284
        """Drop indexes and constraints"""
fd45fc498c1b [dataimport] use IF EXISTS when possible
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11310
diff changeset
   285
        # Create a table to save the constraints, it allows reloading even after crash
fd45fc498c1b [dataimport] use IF EXISTS when possible
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11310
diff changeset
   286
        self.sql('CREATE TABLE IF NOT EXISTS cwmassive_constraints'
fd45fc498c1b [dataimport] use IF EXISTS when possible
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11310
diff changeset
   287
                 '(origtable text, query text, type varchar(256))')
11022
1032d7956b11 [dataimport] massive_store: do not drop constraints multiple times
Samuel Trégouët <samuel.tregouet@logilab.fr>
parents: 11021
diff changeset
   288
        self._drop_table_indexes(tablename)

    def _drop_table_indexes(self, tablename):
        """ Drop and store table constraints and indexes """
        indexes = self._dbh.table_indexes(tablename)
        for name, query in indexes.items():
            self.sql('INSERT INTO cwmassive_constraints VALUES (%(e)s, %(c)s, %(t)s)',
                     {'e': tablename, 'c': query, 't': 'index'})
            self.sql('DROP INDEX %s' % name)

    def reapply_constraint_index(self, tablename):
        if not self._dbh.table_exists('cwmassive_constraints'):
            self.logger.info('The table cwmassive_constraints does not exist')
            return
        cu = self.sql('SELECT query FROM cwmassive_constraints WHERE origtable = %(e)s',
                      {'e': tablename})
        for query, in cu.fetchall():
            self.sql(query)
            self.sql('DELETE FROM cwmassive_constraints WHERE origtable = %(e)s '
                     'AND query = %(q)s', {'e': tablename, 'q': query})
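
    # Illustrative sketch, not part of the original module: the two helpers above
    # are meant to be paired around a bulk load.  The table name 'cw_person' is
    # hypothetical; real names come from the instance schema.
    #
    #   store.drop_and_store_indexes('cw_person')    # indexes saved in cwmassive_constraints
    #   ...massive COPY into cw_person...
    #   store.reapply_constraint_index('cw_person')  # replay and forget the saved queries
    #
    # Keeping the CREATE INDEX statements in the cwmassive_constraints table is what
    # allows an interrupted import to restore them on a later run.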

    def _drop_metatables_constraints(self):
        """ Drop all the constraints for the meta data"""
        for tablename in ('created_by_relation', 'owned_by_relation',
                          'is_instance_of_relation', 'is_relation',
                          'entities'):
            self.drop_and_store_indexes(tablename)

    def _create_metatables_constraints(self):
        """ Create all the constraints for the meta data"""
        for tablename in ('entities',
                          'created_by_relation', 'owned_by_relation',
                          'is_instance_of_relation', 'is_relation'):
            # Indexes and constraints
            self.reapply_constraint_index(tablename)

    def init_relation_table(self, rtype):
        """ Create the temporary table for the relation and remove all indexes
        for performance's sake """
        if not self.slave_mode and rtype not in self._initialized:
            self._initialized.add(rtype)
            self.sql('CREATE TABLE %s_relation_tmp (eid_from integer, eid_to integer)'
                     % rtype.lower())
            # Drop indexes and constraints
            self.drop_and_store_indexes('%s_relation' % rtype.lower())
            # Push the rtype in the initialized table for easier restart
            self.sql('CREATE TABLE IF NOT EXISTS cwmassive_initialized'
                     '(retype text, type varchar(128))')
            self.sql("INSERT INTO cwmassive_initialized VALUES (%(e)s, 'rtype')", {'e': rtype})

    def init_etype_table(self, etype):
        """ Add eid sequence to a particular etype table and
        remove all indexes for performance's sake """
        if not self.slave_mode and etype not in self._initialized:
            self._initialized.add(etype)
            # Drop indexes and constraints
            self.drop_and_store_indexes('cw_%s' % etype.lower())
            # Push the etype in the initialized table for easier restart
            self.sql('CREATE TABLE IF NOT EXISTS cwmassive_initialized'
                     '(retype text, type varchar(128))')
            self.sql("INSERT INTO cwmassive_initialized VALUES (%(e)s, 'etype')", {'e': etype})

    def restart_eid_sequence(self, start_eid):
        self._cnx.system_sql(self._cnx.repo.system_source.dbhelper.sql_restart_numrange(
            'entities_id_seq', initial_value=start_eid))
        self._cnx.commit()

    # ENTITIES CREATION #####################################################

    def _get_eid_gen(self):
        """ Generator yielding eids: a range of eids is preallocated from the
        'entities_id_seq' sequence, then yielded one by one """
        while True:
            last_eid = self._cnx.repo.system_source.create_eid(self._cnx, self.eids_seq_range)
            for eid in range(last_eid - self.eids_seq_range + 1, last_eid + 1):
                yield eid
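
    # Illustrative sketch, not part of the original module: with eids_seq_range
    # (a class attribute of the store) set to 1000 for the sake of the example,
    # each create_eid() call reserves a block of 1000 eids from entities_id_seq;
    # if it returns last_eid = 5000, the generator yields 4001, 4002, ..., 5000
    # before reserving the next block.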

    def _apply_default_values(self, etype, kwargs):
        """Apply the default values of the given etype to the attributes missing from kwargs."""
        default_values = self.default_values[etype]
        missing_keys = set(default_values) - set(kwargs)
        kwargs.update((key, default_values[key]) for key in missing_keys)

    # store api ################################################################

    def prepare_insert_entity(self, etype, **kwargs):
        """Given an entity type, attributes and inlined relations, returns the inserted entity's
        eid.
        """
        # Init the table if necessary
        self.init_etype_table(etype)
        # Add meta data if not given
        if 'modification_date' not in kwargs:
            kwargs['modification_date'] = self._now
        if 'creation_date' not in kwargs:
            kwargs['creation_date'] = self._now
        if 'cwuri' not in kwargs:
            kwargs['cwuri'] = self._default_cwuri + str(self._count_cwuri)
            self._count_cwuri += 1
        if 'eid' not in kwargs:
            # If eid is not given and the eids sequence is set,
            # use the value from the sequence
            kwargs['eid'] = self.get_next_eid()
        self._apply_default_values(etype, kwargs)
        self._data_entities[etype].append(kwargs)
        return kwargs.get('eid')
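
    # Illustrative usage sketch, not part of the original module; the entity type
    # 'Person' and its 'name' attribute are hypothetical:
    #
    #   eid = store.prepare_insert_entity('Person', name=u'Alice')
    #
    # The returned eid can be used immediately in prepare_insert_relation(), even
    # though the row itself is only written to the database at the next flush().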

    def prepare_insert_relation(self, eid_from, rtype, eid_to, **kwargs):
        """Insert into the database a relation ``rtype`` between entities with eids ``eid_from``
        and ``eid_to``.
        """
        # Init the table if necessary
        self.init_relation_table(rtype)
        self._data_relations[rtype].append({'eid_from': eid_from, 'eid_to': eid_to})

    def flush(self):
        """Flush the data"""
        self.flush_entities()
        self.flush_internal_relations()
        self.flush_relations()

    def commit(self):
        """Commit the database transaction."""
        self.on_commit()
        super(MassiveObjectStore, self).commit()

    def finish(self):
        """Remove temporary tables and columns."""
        if self.slave_mode:
            raise RuntimeError('Store cleanup is not allowed in slave mode')
        self.logger.info("Start cleaning")
        # Cleanup the temporary uri/eid mapping tables
        for etype in self._init_uri_eid:
            self.sql('DROP TABLE uri_eid_%s' % etype.lower())
        # Remove temporary relations tables
        for rtype in self._uri_rtypes:
            self.sql('DROP TABLE %(r)s_relation_iid_tmp' % {'r': rtype})
        # Create meta constraints (entities, is_instance_of, ...)
        self._create_metatables_constraints()
        # Get all the initialized etypes/rtypes
        if self._dbh.table_exists('cwmassive_initialized'):
            cu = self.sql('SELECT retype, type FROM cwmassive_initialized')
            for retype, _type in cu.fetchall():
                self.logger.info('Cleanup for %s' % retype)
                if _type == 'etype':
                    # Cleanup entities tables - Recreate indexes
                    self.reapply_constraint_index('cw_%s' % retype.lower())
                elif _type == 'rtype':
                    # Cleanup relations tables
                    self._cleanup_relations(retype)
                self.sql('DELETE FROM cwmassive_initialized WHERE retype = %(e)s',
                         {'e': retype})
        self.reapply_all_constraints()
        # Delete the meta data tables
        for table_name in ('cwmassive_initialized', 'cwmassive_constraints', 'cwmassive_metadata'):
            self.sql('DROP TABLE IF EXISTS %s' % table_name)
        self.commit()
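
    # Illustrative end-to-end sketch, not part of the original module, of the store
    # API defined above; ``cnx`` is assumed to be an open CubicWeb connection and
    # the entity/relation type names are hypothetical:
    #
    #   store = MassiveObjectStore(cnx)
    #   alice = store.prepare_insert_entity('Person', name=u'Alice')
    #   bob = store.prepare_insert_entity('Person', name=u'Bob')
    #   store.prepare_insert_relation(alice, 'knows', bob)
    #   store.flush()
    #   store.commit()
    #   store.finish()   # restore indexes/constraints and drop temporary tables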

    # FLUSH #################################################################

    def on_commit(self):
        if self.on_commit_callback:
            self.on_commit_callback()

    def on_rollback(self, exc, etype, data):
        if self.on_rollback_callback:
            self.on_rollback_callback(exc, etype, data)
            self._cnx.rollback()
        else:
            raise exc

    def flush_internal_relations(self):
        """ Flush the relations data
        """
        for rtype, data in self._data_relations.items():
            if not data:
                # There is no data for this rtype in this flush round.
                continue
            buf = pgstore._create_copyfrom_buffer(data, ('eid_from', 'eid_to'))
            if not buf:
                # The buffer is empty. This is probably due to an error in _create_copyfrom_buffer
                raise ValueError('Error in buffer creation for rtype %s' % rtype)
            cursor = self._cnx.cnxset.cu
            # Push into the tmp table
            cursor.copy_from(buf, '%s_relation_tmp' % rtype.lower(),
                             null='NULL', columns=('eid_from', 'eid_to'))
            # Clear data cache
            self._data_relations[rtype] = []
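
    # Illustrative sketch, not part of the original module: _create_copyfrom_buffer
    # (a pgstore helper) serializes the pending rows into a text buffer suitable for
    # PostgreSQL COPY.  For two pending 'knows' relations it would roughly contain
    # tab-separated lines such as:
    #
    #   123<TAB>456
    #   123<TAB>789
    #
    # which cursor.copy_from() then streams into knows_relation_tmp in a single
    # round trip instead of issuing one INSERT per row.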

    def flush_entities(self):
        """ Flush the entities data
        """
        for etype, data in self._data_entities.items():
            if not data:
                # There is no data for this etype in this flush round.
                continue
            # XXX It may be interesting to directly infer the columns' names from the schema
            # XXX For now, the _create_copyfrom_buffer does a "row[column]"
            # which can lead to a key error.
            # Thus we create a dictionary with all the keys.
            columns = set()
            for d in data:
                columns.update(d.keys())
            _data = []
            _base_data = dict.fromkeys(columns)
            for d in data:
                _d = _base_data.copy()
                _d.update(d)
                _data.append(_d)
            buf = pgstore._create_copyfrom_buffer(_data, columns)
            if not buf:
                # The buffer is empty. This is probably due to an error in _create_copyfrom_buffer
                raise ValueError('Error in buffer creation for etype %s' % etype)
            columns = ['cw_%s' % attr for attr in columns]
            cursor = self._cnx.cnxset.cu
            try:
                cursor.copy_from(buf, 'cw_%s' % etype.lower(), null='NULL', columns=columns)
            except Exception as exc:
                self.on_rollback(exc, etype, data)
            # Clear data cache
            self._data_entities[etype] = []
        if not self.slave_mode:
            self.flush_meta_data()

    def flush_meta_data(self):
        """ Flush the meta data (entities table, is_instance table, ...)
        """
        if self.slave_mode:
            raise RuntimeError('Flushing meta data is not allowed in slave mode')
        if not self._dbh.table_exists('cwmassive_initialized'):
            self.logger.info('No information available for initialized etypes/rtypes')
            return
        # Keep track of the correctly flushed meta data in the database
        self.sql('CREATE TABLE IF NOT EXISTS cwmassive_metadata (etype text)')
        cu = self.sql('SELECT etype FROM cwmassive_metadata')
        already_flushed = set(e for e, in cu.fetchall())
        cu = self.sql('SELECT retype FROM cwmassive_initialized WHERE type = %(t)s',
                      {'t': 'etype'})
        all_etypes = set(e for e, in cu.fetchall())
        for etype in all_etypes:
            if etype not in already_flushed:
                # Deal with meta data
                self.logger.info('Flushing meta data for %s' % etype)
                self.insert_massive_meta_data(etype)
                self.sql('INSERT INTO cwmassive_metadata VALUES (%(e)s)', {'e': etype})

    def _cleanup_relations(self, rtype):
        """ Cleanup rtype table """
        # Push into the relation table while removing duplicates
        self.sql('INSERT INTO %(r)s_relation (eid_from, eid_to) SELECT DISTINCT '
                 'T.eid_from, T.eid_to FROM %(r)s_relation_tmp AS T '
                 'WHERE NOT EXISTS (SELECT 1 FROM %(r)s_relation AS TT WHERE '
                 'TT.eid_from=T.eid_from AND TT.eid_to=T.eid_to);' % {'r': rtype})
        # Drop temporary relation table
        self.sql('DROP TABLE %(r)s_relation_tmp' % {'r': rtype.lower()})
        # Create indexes and constraints
        tablename = '%s_relation' % rtype.lower()
        self.reapply_constraint_index(tablename)

    def insert_massive_meta_data(self, etype):
        """ Massive insertion of meta data for a given etype, based on SQL statements.
        """
        # Push data - Use coalesce to avoid NULL (and get 0) if there are no
        # entities of this type in the entities table.
        # Meta data relations
        self.metagen_push_relation(etype, self._cnx.user.eid, 'created_by_relation')
        self.metagen_push_relation(etype, self._cnx.user.eid, 'owned_by_relation')
        self.metagen_push_relation(etype, self.source.eid, 'cw_source_relation')
        eschema = self.schema[etype]
        self.metagen_push_relation(etype, eschema.eid, 'is_relation')
        for parent_eschema in eschema.ancestors() + [eschema]:
            self.metagen_push_relation(etype, parent_eschema.eid, 'is_instance_of_relation')
        self.sql("INSERT INTO entities (eid, type, asource, extid) "
                 "SELECT cw_eid, '%s', 'system', NULL FROM cw_%s "
                 "WHERE NOT EXISTS (SELECT 1 FROM entities WHERE eid=cw_eid)"
                 % (etype, etype.lower()))

    def metagen_push_relation(self, etype, eid_to, rtype):
        self.sql("INSERT INTO %s (eid_from, eid_to) SELECT cw_eid, %s FROM cw_%s "
                 "WHERE NOT EXISTS (SELECT 1 FROM entities WHERE eid=cw_eid)"
                 % (rtype, eid_to, etype.lower()))
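
    # Illustrative sketch, not part of the original module: for a hypothetical etype
    # 'Person' and eid_to 42, the statement built above expands to:
    #
    #   INSERT INTO created_by_relation (eid_from, eid_to)
    #   SELECT cw_eid, 42 FROM cw_person
    #   WHERE NOT EXISTS (SELECT 1 FROM entities WHERE eid=cw_eid)
    #
    # i.e. meta data relations are only generated for eids not yet registered in the
    # entities table, that is for the freshly imported rows.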


def get_size_constraints(schema):
    """analyzes yams ``schema`` and returns its size constraints.

    The returned value is a dictionary mapping entity types to
    sub-dictionaries mapping attribute names -> max size.
    """
    size_constraints = {}
    # iterates on all entity types
    for eschema in schema.entities():
        # for each entity type, iterates on attribute definitions
        size_constraints[eschema.type] = eschema_constraints = {}
        for rschema, aschema in eschema.attribute_definitions():
            # for each attribute, if a size constraint is found,
            # append it to the size constraint list
            maxsize = None
            rdef = rschema.rdef(eschema, aschema)
            for constraint in rdef.constraints:
                if isinstance(constraint, SizeConstraint):
                    maxsize = constraint.max
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   591
                    eschema_constraints[rschema.type] = maxsize
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   592
    return size_constraints
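
# Illustrative sketch (hypothetical schema): for a yams schema declaring a
# 'Person' entity type whose 'name' attribute carries SizeConstraint(max=64),
# get_size_constraints(schema) would return something like
#
#     {'Person': {'name': 64}}
#
# which the caller may use, for instance, to validate or truncate values before
# bulk insertion.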


def get_default_values(schema):
    """analyzes yams ``schema`` and returns its default values.

    The returned value is a dictionary mapping entity types to
    sub-dictionaries mapping attribute names -> default values.
    """
    default_values = {}
    # iterate on all entity types
    for eschema in schema.entities():
        # for each entity type, iterate on attribute definitions
        default_values[eschema.type] = eschema_defaults = {}
        for rschema, _ in eschema.attribute_definitions():
            # for each attribute, record its default value if there is one
            if eschema.default(rschema.type) is not None:
                eschema_defaults[rschema.type] = eschema.default(rschema.type)
    return default_values
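
# Illustrative sketch (hypothetical schema and row): the defaults are handy to
# complete import rows that omit some attributes, e.g.
#
#     defaults = get_default_values(schema)
#     row = {'name': u'Alice'}
#     for attr, value in defaults.get('Person', {}).items():
#         row.setdefault(attr, value)
#
# so attributes left out of the row fall back to their schema default before
# the bulk insert.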


class PGHelper(object):
    """This class provides helper methods to manipulate a PostgreSQL database's
    metadata (indexes and constraints).
    """

    def __init__(self, cnx):
        self.sql = cnx.system_sql
        # Deals with pg schema, see #3216686
        self.pg_schema = cnx.repo.config.system_source_config.get('db-namespace') or 'public'

    def table_exists(self, tablename):
        """Return True if the given table already exists in the database."""
        cu = self.sql('SELECT 1 from information_schema.tables '
                      'WHERE table_name=%(t)s AND table_schema=%(s)s',
                      {'t': tablename, 's': self.pg_schema})
        return bool(cu.fetchone())
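
    # Illustrative sketch (assumption: `cnx` is an open CubicWeb connection on a
    # PostgreSQL system source):
    #
    #     helper = PGHelper(cnx)
    #     if helper.table_exists('entities'):
    #         indexes, constraints = helper.table_indexes_constraints('entities')
    #
    # `indexes` and `constraints` then map names to the SQL statements needed to
    # recreate them later.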

    def table_indexes_constraints(self, tablename):
        """Return one dictionary with all indexes by name, another with all constraints by name,
        for the given table.
        """
        indexes = self.table_indexes(tablename)
        constraints = self.table_constraints(tablename)
        _indexes = {}
        for name, query in indexes.items():
            # Skip indexes that merely back a constraint of the same name
            # (typically primary keys, created automatically by postgres),
            # see #3224079
            if name not in constraints:
                _indexes[name] = query
        return _indexes, constraints
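
    # Illustrative sketch (continuing the previous sketch; a typical bulk-load
    # pattern, names hypothetical): save the recreation statements, drop the
    # objects for the duration of the import, then replay the saved SQL.
    #
    #     indexes, constraints = helper.table_indexes_constraints('cw_person')
    #     for name in constraints:
    #         helper.sql('ALTER TABLE cw_person DROP CONSTRAINT %s' % name)
    #     for name in indexes:
    #         helper.sql('DROP INDEX %s' % name)
    #     ...  # massive INSERT / COPY
    #     for query in list(constraints.values()) + list(indexes.values()):
    #         helper.sql(query)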

    def table_indexes(self, tablename):
        """Return a dictionary of indexes {index name: index sql}, constraints included."""
        indexes = {}
        for name in self._index_names(tablename):
            indexes[name] = self._index_sql(name)
        return indexes

    def table_constraints(self, tablename):
        """Return a dictionary of constraints {constraint name: constraint sql}."""
        constraints = {}
        for name in self._constraint_names(tablename):
            query = self._constraint_sql(name)
            constraints[name] = 'ALTER TABLE %s ADD CONSTRAINT %s %s' % (tablename, name, query)
        return constraints
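
    # Illustrative sketch (hypothetical table): for a cw_person table with a
    # primary key on cw_eid, table_constraints('cw_person') could return
    #
    #     {'cw_person_pkey':
    #      'ALTER TABLE cw_person ADD CONSTRAINT cw_person_pkey PRIMARY KEY (cw_eid)'}
    #
    # i.e. each value is a statement ready to be replayed to restore the
    # constraint after a bulk load.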

    def _index_names(self, tablename):
        """Return the names of all indexes in the given table (including constraints)."""
        cu = self.sql("SELECT c.relname FROM pg_catalog.pg_class c "
                      "JOIN pg_catalog.pg_index i ON i.indexrelid = c.oid "
                      "JOIN pg_catalog.pg_class c2 ON i.indrelid = c2.oid "
                      "LEFT JOIN pg_catalog.pg_user u ON u.usesysid = c.relowner "
                      "LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace "
                      "WHERE c.relkind IN ('i','') "
                      " AND c2.relname = %(t)s "
                      " AND i.indisprimary = FALSE "
                      " AND n.nspname NOT IN ('pg_catalog', 'pg_toast') "
                      " AND pg_catalog.pg_table_is_visible(c.oid);", {'t': tablename})
        return [name for name, in cu.fetchall()]

    def _constraint_names(self, tablename):
        """Return the names of all constraints in the given table."""
        cu = self.sql("SELECT i.conname FROM pg_catalog.pg_class c "
                      "JOIN pg_catalog.pg_constraint i ON i.conrelid = c.oid "
                      "JOIN pg_catalog.pg_class c2 ON i.conrelid = c2.oid "
                      "LEFT JOIN pg_catalog.pg_user u ON u.usesysid = c.relowner "
                      "LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace "
                      "WHERE c2.relname = %(t)s "
                      "AND n.nspname NOT IN ('pg_catalog', 'pg_toast') "
                      "AND pg_catalog.pg_table_is_visible(c.oid)", {'t': tablename})
        return [name for name, in cu.fetchall()]

    def _index_sql(self, name):
        """Return the SQL to be used to recreate the index of the given name."""
        return self.sql('SELECT pg_get_indexdef(c.oid) FROM pg_catalog.pg_class c '
                        'LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace '
                        'WHERE c.relname = %(r)s AND n.nspname=%(n)s',
                        {'r': name, 'n': self.pg_schema}).fetchone()[0]

    def _constraint_sql(self, name):
        """Return the SQL to be used to recreate the constraint."""
        return self.sql('SELECT pg_get_constraintdef(c.oid) FROM pg_catalog.pg_constraint c '
                        'LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.connamespace '
                        'WHERE c.conname = %(r)s AND n.nspname=%(n)s',
                        {'r': name, 'n': self.pg_schema}).fetchone()[0]
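
    # Illustrative sketch (hypothetical index): _index_sql relies on postgres'
    # pg_get_indexdef(), so for an index named person_name_idx it may return
    # something like
    #
    #     CREATE INDEX person_name_idx ON cw_person USING btree (cw_name)
    #
    # which can be executed verbatim to rebuild the index once a massive import
    # is over.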