cubicweb/dataimport/massive_store.py
author Sylvain Thénault <sylvain.thenault@logilab.fr>
Wed, 28 Sep 2016 08:57:48 +0200
changeset 11778 9847a097266e
parent 11777 5b535fe2f364
child 11780 307d96c0ab5a
permissions -rw-r--r--
[massive store] Rework constraint/index handling The basic idea is to keep the primary constraint on entities.eid since it's heavily used in metadata insertions. Other option would have been to drop / recreate but its very costly on big database, and the index is used for insertion into the entities table itself, so it's not worth droping it at a first glance. Also, keeping it avoids to systematically drop all constraints which depends on it. We may thus now lazily drop constraints, only on insertion of some etype/rtype for the related table. Related to #15538359
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
     1
# coding: utf-8
11305
118d83e65ca8 [dataimport] remove useless assignment in massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11057
diff changeset
     2
# copyright 2015-2016 LOGILAB S.A. (Paris, FRANCE), all rights reserved.
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
     3
# contact http://www.logilab.fr/ -- mailto:contact@logilab.fr
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
     4
#
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
     5
# This file is part of CubicWeb.
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
     6
#
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
     7
# CubicWeb is free software: you can redistribute it and/or modify it under the
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
     8
# terms of the GNU Lesser General Public License as published by the Free
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
     9
# Software Foundation, either version 2.1 of the License, or (at your option)
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    10
# any later version.
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    11
#
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    12
# CubicWeb is distributed in the hope that it will be useful, but WITHOUT ANY
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    13
# WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    14
# A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    15
# details.
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    16
#
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    17
# You should have received a copy of the GNU Lesser General Public License along
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    18
# with CubicWeb.  If not, see <http://www.gnu.org/licenses/>.
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    19
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    20
import logging
11326
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
    21
from copy import copy
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    22
from collections import defaultdict
11316
36c7cd362fc7 [dataimport] add a .schema shortcut attribute on the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11315
diff changeset
    23
from itertools import chain
11328
9f2d7da47526 [dataimport] test and fix external source support for the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11326
diff changeset
    24
from base64 import b64encode
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    25
10859
375a8232e61c [dataimport] import range from six.moves
Julien Cristau <julien.cristau@logilab.fr>
parents: 10856
diff changeset
    26
from six.moves import range
375a8232e61c [dataimport] import range from six.moves
Julien Cristau <julien.cristau@logilab.fr>
parents: 10856
diff changeset
    27
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    28
from yams.constraints import SizeConstraint
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    29
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    30
from cubicweb.dataimport import stores, pgstore
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    31
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    32
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    33
class MassiveObjectStore(stores.RQLObjectStore):
11331
f2ff82dfcd5c [dataimport] add a bit of extra-documentation on the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11330
diff changeset
    34
    """Store for massive import of data, with delayed insertion of meta data.
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    35
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    36
    WARNINGS:
11315
ad826d81e88e [dataimport] rework massive store's __init__
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11314
diff changeset
    37
ad826d81e88e [dataimport] rework massive store's __init__
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11314
diff changeset
    38
    - This store may only be used with PostgreSQL for now, as it relies
ad826d81e88e [dataimport] rework massive store's __init__
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11314
diff changeset
    39
      on the COPY FROM method, and on specific PostgreSQL tables to get all
ad826d81e88e [dataimport] rework massive store's __init__
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11314
diff changeset
    40
      the indexes.
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    41
11315
ad826d81e88e [dataimport] rework massive store's __init__
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11314
diff changeset
    42
    - This store can only insert relations that are not inlined (i.e.,
11707
2c4518fea26f [massive store] Drop deprecated code
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11705
diff changeset
    43
      which do *not* have inlined=True in their definition in the schema),
2c4518fea26f [massive store] Drop deprecated code
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11705
diff changeset
    44
      unless they are specified as entity attributes.
11315
ad826d81e88e [dataimport] rework massive store's __init__
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11314
diff changeset
    45
ad826d81e88e [dataimport] rework massive store's __init__
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11314
diff changeset
    46
    It should be used as follows:
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    47
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    48
       store = MassiveObjectStore(cnx)
11707
2c4518fea26f [massive store] Drop deprecated code
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11705
diff changeset
    49
       eid_p = store.prepare_insert_entity('Person',
2c4518fea26f [massive store] Drop deprecated code
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11705
diff changeset
    50
                                           cwuri='http://dbpedia.org/toto',
2c4518fea26f [massive store] Drop deprecated code
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11705
diff changeset
    51
                                           name='Toto')
2c4518fea26f [massive store] Drop deprecated code
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11705
diff changeset
    52
       eid_loc = store.prepare_insert_entity('Location',
2c4518fea26f [massive store] Drop deprecated code
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11705
diff changeset
    53
                                             cwuri='http://geonames.org/11111',
2c4518fea26f [massive store] Drop deprecated code
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11705
diff changeset
    54
                                             name='Somewhere')
2c4518fea26f [massive store] Drop deprecated code
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11705
diff changeset
    55
       store.prepare_insert_relation(eid_p, 'lives_in', eid_loc)
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    56
       store.flush()
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    57
       ...
10882
634cc266e48f [dataimport] fix doc string to stop referring to the old API
Julien Cristau <julien.cristau@logilab.fr>
parents: 10881
diff changeset
    58
       store.commit()
634cc266e48f [dataimport] fix doc string to stop referring to the old API
Julien Cristau <julien.cristau@logilab.fr>
parents: 10881
diff changeset
    59
       store.finish()
11331
f2ff82dfcd5c [dataimport] add a bit of extra-documentation on the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11330
diff changeset
    60
f2ff82dfcd5c [dataimport] add a bit of extra-documentation on the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11330
diff changeset
    61
    Full-text indexation is not handled, you'll have to reindex the proper entity types by yourself
f2ff82dfcd5c [dataimport] add a bit of extra-documentation on the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11330
diff changeset
    62
    if desired.
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    63
    """
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    64
10875
75d1b2d66f18 [dataimport] remove autoflush_metadata from MassiveObjectStore parameters
Julien Cristau <julien.cristau@logilab.fr>
parents: 10874
diff changeset
    65
    def __init__(self, cnx,
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    66
                 on_commit_callback=None, on_rollback_callback=None,
10865
2537df9fdd27 [dataimport] drop no more used parameter on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10864
diff changeset
    67
                 slave_mode=False,
11326
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
    68
                 eids_seq_range=10000,
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
    69
                 metagen=None):
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    70
        """ Create a MassiveObject store, with the following attributes:
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    71
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    72
        - cnx: CubicWeb cnx
11028
66f94d7f9ca7 [dataimport] make eids_seq_range as massive store instance attribute again
Julien Cristau <julien.cristau@logilab.fr>
parents: 11027
diff changeset
    73
        - eids_seq_range: size of eid range reserved by the store for each batch
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    74
        """
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    75
        super(MassiveObjectStore, self).__init__(cnx)
11315
ad826d81e88e [dataimport] rework massive store's __init__
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11314
diff changeset
    76
        self.on_commit_callback = on_commit_callback
ad826d81e88e [dataimport] rework massive store's __init__
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11314
diff changeset
    77
        self.on_rollback_callback = on_rollback_callback
ad826d81e88e [dataimport] rework massive store's __init__
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11314
diff changeset
    78
        self.slave_mode = slave_mode
11028
66f94d7f9ca7 [dataimport] make eids_seq_range as massive store instance attribute again
Julien Cristau <julien.cristau@logilab.fr>
parents: 11027
diff changeset
    79
        self.eids_seq_range = eids_seq_range
11326
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
    80
        if metagen is None:
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
    81
            metagen = stores.MetadataGenerator(cnx)
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
    82
        self.metagen = metagen
10879
3193d9ede8dd [dataimport] drop extra indirection through MassiveObjectStore._initialized dict
Julien Cristau <julien.cristau@logilab.fr>
parents: 10878
diff changeset
    83
11315
ad826d81e88e [dataimport] rework massive store's __init__
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11314
diff changeset
    84
        self.logger = logging.getLogger('dataimport.massive_store')
ad826d81e88e [dataimport] rework massive store's __init__
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11314
diff changeset
    85
        self.sql = cnx.system_sql
11316
36c7cd362fc7 [dataimport] add a .schema shortcut attribute on the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11315
diff changeset
    86
        self.schema = self._cnx.vreg.schema
36c7cd362fc7 [dataimport] add a .schema shortcut attribute on the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11315
diff changeset
    87
        self.default_values = get_default_values(self.schema)
11315
ad826d81e88e [dataimport] rework massive store's __init__
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11314
diff changeset
    88
        self.get_next_eid = lambda g=self._get_eid_gen(): next(g)
ad826d81e88e [dataimport] rework massive store's __init__
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11314
diff changeset
    89
        self._dbh = PGHelper(cnx)
ad826d81e88e [dataimport] rework massive store's __init__
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11314
diff changeset
    90
ad826d81e88e [dataimport] rework massive store's __init__
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11314
diff changeset
    91
        cnx.read_security = False
ad826d81e88e [dataimport] rework massive store's __init__
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11314
diff changeset
    92
        cnx.write_security = False
ad826d81e88e [dataimport] rework massive store's __init__
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11314
diff changeset
    93
ad826d81e88e [dataimport] rework massive store's __init__
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11314
diff changeset
    94
        self._data_entities = defaultdict(list)
ad826d81e88e [dataimport] rework massive store's __init__
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11314
diff changeset
    95
        self._data_relations = defaultdict(list)
11320
78da04c853dc [dataimport] move everything related to the "URI handling" feature of the massive store together
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11319
diff changeset
    96
        self._initialized = set()
10879
3193d9ede8dd [dataimport] drop extra indirection through MassiveObjectStore._initialized dict
Julien Cristau <julien.cristau@logilab.fr>
parents: 10878
diff changeset
    97
11777
5b535fe2f364 [massive store] Lazy removal of constraints and metadata indexes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11774
diff changeset
    98
        self._constraints_dropped = self.slave_mode
11326
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
    99
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
   100
    def _get_eid_gen(self):
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
   101
        """ Function getting the next eid. This is done by preselecting
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
   102
        a given number of eids from the 'entities_id_seq', and then
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
   103
        storing them"""
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
   104
        while True:
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
   105
            last_eid = self._cnx.repo.system_source.create_eid(self._cnx, self.eids_seq_range)
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
   106
            for eid in range(last_eid - self.eids_seq_range + 1, last_eid + 1):
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
   107
                yield eid
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   108
11322
21316020eae3 [dataimport] move cwmassive_constraint temporary table handling to the PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11321
diff changeset
   109
    # SQL utilities #########################################################
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   110
11778
9847a097266e [massive store] Rework constraint/index handling
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11777
diff changeset
   111
    def _drop_metadata_constraints_if_necessary(self):
9847a097266e [massive store] Rework constraint/index handling
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11777
diff changeset
   112
        """Drop constraints and indexes for the metadata tables if necessary."""
11777
5b535fe2f364 [massive store] Lazy removal of constraints and metadata indexes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11774
diff changeset
   113
        if not self._constraints_dropped:
11778
9847a097266e [massive store] Rework constraint/index handling
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11777
diff changeset
   114
            self._drop_metadata_constraints()
11777
5b535fe2f364 [massive store] Lazy removal of constraints and metadata indexes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11774
diff changeset
   115
            self._constraints_dropped = True
5b535fe2f364 [massive store] Lazy removal of constraints and metadata indexes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11774
diff changeset
   116
11778
9847a097266e [massive store] Rework constraint/index handling
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11777
diff changeset
   117
    def _drop_metadata_constraints(self):
9847a097266e [massive store] Rework constraint/index handling
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11777
diff changeset
   118
        """Drop constraints and indexes for the metadata tables.
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   119
11778
9847a097266e [massive store] Rework constraint/index handling
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11777
diff changeset
   120
        They will be recreated by the `finish` method.
9847a097266e [massive store] Rework constraint/index handling
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11777
diff changeset
   121
        """
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   122
        for tablename in ('created_by_relation', 'owned_by_relation',
11778
9847a097266e [massive store] Rework constraint/index handling
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11777
diff changeset
   123
                          'is_instance_of_relation', 'is_relation'):
9847a097266e [massive store] Rework constraint/index handling
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11777
diff changeset
   124
             self._dbh.drop_constraints(tablename)
11322
21316020eae3 [dataimport] move cwmassive_constraint temporary table handling to the PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11321
diff changeset
   125
            self._dbh.drop_indexes(tablename)
11778
9847a097266e [massive store] Rework constraint/index handling
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11777
diff changeset
   126
        # don't drop constraints for the entities table, the only one is the primary key's index on
9847a097266e [massive store] Rework constraint/index handling
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11777
diff changeset
   127
        # eid and we want to keep it
9847a097266e [massive store] Rework constraint/index handling
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11777
diff changeset
   128
        self._dbh.drop_indexes('entities')
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   129
11026
ce9b3886955d [dataimport] remove eids_seq_start attribute from massive store
Julien Cristau <julien.cristau@logilab.fr>
parents: 11025
diff changeset
   130
    def restart_eid_sequence(self, start_eid):
11323
e9120da559f5 [dataimport] use sql shortcut
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11322
diff changeset
   131
        self.sql(self._cnx.repo.system_source.dbhelper.sql_restart_numrange(
11026
ce9b3886955d [dataimport] remove eids_seq_start attribute from massive store
Julien Cristau <julien.cristau@logilab.fr>
parents: 11025
diff changeset
   132
            'entities_id_seq', initial_value=start_eid))
ce9b3886955d [dataimport] remove eids_seq_start attribute from massive store
Julien Cristau <julien.cristau@logilab.fr>
parents: 11025
diff changeset
   133
        self._cnx.commit()
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   134
10863
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   135
    # store api ################################################################
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   136
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   137
    def prepare_insert_entity(self, etype, **kwargs):
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   138
        """Given an entity type, attributes and inlined relations, returns the inserted entity's
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   139
        eid.
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   140
        """
11321
fab543f542ac [dataimport] inline some methods of the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11320
diff changeset
   141
        if not self.slave_mode and etype not in self._initialized:
fab543f542ac [dataimport] inline some methods of the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11320
diff changeset
   142
            self._initialized.add(etype)
11778
9847a097266e [massive store] Rework constraint/index handling
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11777
diff changeset
   143
            self._drop_metadata_constraints_if_necessary()
9847a097266e [massive store] Rework constraint/index handling
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11777
diff changeset
   144
            tablename = 'cw_%s' % etype.lower()
9847a097266e [massive store] Rework constraint/index handling
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11777
diff changeset
   145
            self._dbh.drop_constraints(tablename)
9847a097266e [massive store] Rework constraint/index handling
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11777
diff changeset
   146
            self._dbh.drop_indexes(tablename)
11321
fab543f542ac [dataimport] inline some methods of the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11320
diff changeset
   147
            self.sql('CREATE TABLE IF NOT EXISTS cwmassive_initialized'
fab543f542ac [dataimport] inline some methods of the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11320
diff changeset
   148
                     '(retype text, type varchar(128))')
fab543f542ac [dataimport] inline some methods of the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11320
diff changeset
   149
            self.sql("INSERT INTO cwmassive_initialized VALUES (%(e)s, 'etype')", {'e': etype})
11326
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
   150
        attrs = self.metagen.base_etype_attrs(etype)
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
   151
        data = copy(attrs)  # base_etype_attrs is @cached, a copy is necessary
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
   152
        data.update(kwargs)
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
   153
        if 'eid' not in data:
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
   154
            # If eid is not given and the eids sequence is set, use the value from the sequence
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
   155
            eid = self.get_next_eid()
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
   156
            data['eid'] = eid
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
   157
        # XXX default values could be set once for all in base entity
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
   158
        default_values = self.default_values[etype]
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
   159
        missing_keys = set(default_values) - set(data)
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
   160
        data.update((key, default_values[key]) for key in missing_keys)
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
   161
        self.metagen.init_entity_attrs(etype, data['eid'], data)
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
   162
        self._data_entities[etype].append(data)
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
   163
        return data['eid']
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   164
10863
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   165
    def prepare_insert_relation(self, eid_from, rtype, eid_to, **kwargs):
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   166
        """Insert into the database a  relation ``rtype`` between entities with eids ``eid_from``
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   167
        and ``eid_to``.
11331
f2ff82dfcd5c [dataimport] add a bit of extra-documentation on the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11330
diff changeset
   168
f2ff82dfcd5c [dataimport] add a bit of extra-documentation on the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11330
diff changeset
   169
        Relation must not be inlined.
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   170
        """
11321
fab543f542ac [dataimport] inline some methods of the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11320
diff changeset
   171
        if not self.slave_mode and rtype not in self._initialized:
11331
f2ff82dfcd5c [dataimport] add a bit of extra-documentation on the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11330
diff changeset
   172
            assert not self._cnx.vreg.schema.rschema(rtype).inlined
11321
fab543f542ac [dataimport] inline some methods of the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11320
diff changeset
   173
            self._initialized.add(rtype)
11778
9847a097266e [massive store] Rework constraint/index handling
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11777
diff changeset
   174
            self._drop_metadata_constraints_if_necessary()
9847a097266e [massive store] Rework constraint/index handling
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11777
diff changeset
   175
            tablename = '%s_relation' % rtype.lower()
9847a097266e [massive store] Rework constraint/index handling
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11777
diff changeset
   176
            self._dbh.drop_constraints(tablename)
9847a097266e [massive store] Rework constraint/index handling
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11777
diff changeset
   177
            self._dbh.drop_indexes(tablename)
9847a097266e [massive store] Rework constraint/index handling
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11777
diff changeset
   178
            self.sql('CREATE TABLE %s_tmp (eid_from integer, eid_to integer)'
9847a097266e [massive store] Rework constraint/index handling
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11777
diff changeset
   179
                     % tablename)
11321
fab543f542ac [dataimport] inline some methods of the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11320
diff changeset
   180
            self.sql('CREATE TABLE IF NOT EXISTS cwmassive_initialized'
fab543f542ac [dataimport] inline some methods of the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11320
diff changeset
   181
                     '(retype text, type varchar(128))')
fab543f542ac [dataimport] inline some methods of the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11320
diff changeset
   182
            self.sql("INSERT INTO cwmassive_initialized VALUES (%(e)s, 'rtype')", {'e': rtype})
10863
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   183
        self._data_relations[rtype].append({'eid_from': eid_from, 'eid_to': eid_to})
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   184
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   185
    def flush(self):
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   186
        """Flush the data"""
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   187
        self.flush_entities()
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   188
        self.flush_relations()
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   189
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   190
    def commit(self):
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   191
        """Commit the database transaction."""
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   192
        self.on_commit()
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   193
        super(MassiveObjectStore, self).commit()
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   194
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   195
    def finish(self):
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   196
        """Remove temporary tables and columns."""
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   197
        if self.slave_mode:
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   198
            raise RuntimeError('Store cleanup is not allowed in slave mode')
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   199
        self.logger.info("Start cleaning")
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   200
        # Get all the initialized etypes/rtypes
10878
fda5e42037a9 [dataimport] remove remaining references to dataio from MassiveObjectStore
Julien Cristau <julien.cristau@logilab.fr>
parents: 10877
diff changeset
   201
        if self._dbh.table_exists('cwmassive_initialized'):
11312
3a83759854ee [dataimport] enhance a bit sql queries readability
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11311
diff changeset
   202
            cu = self.sql('SELECT retype, type FROM cwmassive_initialized')
3a83759854ee [dataimport] enhance a bit sql queries readability
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11311
diff changeset
   203
            for retype, _type in cu.fetchall():
10863
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   204
                self.logger.info('Cleanup for %s' % retype)
11774
51c160677afe [repository] Drop the entities.extid column and associated cache
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11773
diff changeset
   205
                if _type == 'rtype':
10863
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   206
                    # Cleanup relations tables
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   207
                    self._cleanup_relations(retype)
10878
fda5e42037a9 [dataimport] remove remaining references to dataio from MassiveObjectStore
Julien Cristau <julien.cristau@logilab.fr>
parents: 10877
diff changeset
   208
                self.sql('DELETE FROM cwmassive_initialized WHERE retype = %(e)s',
10863
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   209
                         {'e': retype})
11322
21316020eae3 [dataimport] move cwmassive_constraint temporary table handling to the PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11321
diff changeset
   210
        self._dbh.restore_indexes_and_constraints()
10863
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   211
        # Delete the meta data table
11329
a8cab8fb54ba [dataimport] drop massive store's flush_metadata method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11328
diff changeset
   212
        self.sql('DROP TABLE IF EXISTS cwmassive_initialized')
10863
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   213
        self.commit()
8e1f6de61300 [dataimport] implement new store API on massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 10861
diff changeset
   214
11313
682b15eb2dd2 [dataimport] flake8
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11312
diff changeset
   215
    # FLUSH #################################################################
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   216
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   217
    def on_commit(self):
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   218
        if self.on_commit_callback:
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   219
            self.on_commit_callback()
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   220
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   221
    def on_rollback(self, exc, etype, data):
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   222
        if self.on_rollback_callback:
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   223
            self.on_rollback_callback(exc, etype, data)
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   224
            self._cnx.rollback()
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   225
        else:
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   226
            raise exc
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   227
11707
2c4518fea26f [massive store] Drop deprecated code
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11705
diff changeset
   228
    def flush_relations(self):
2c4518fea26f [massive store] Drop deprecated code
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11705
diff changeset
   229
        """Flush the relations data."""
10856
b839167d99a4 [dataimport] dict.iteritems() → dict.items()
Julien Cristau <julien.cristau@logilab.fr>
parents: 10855
diff changeset
   230
        for rtype, data in self._data_relations.items():
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   231
            if not data:
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   232
                # There is no data for these etype for this flush round.
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   233
                continue
10871
1d4a94d04ec6 [dataimport] remove replace_sep parameter from massive store
Julien Cristau <julien.cristau@logilab.fr>
parents: 10870
diff changeset
   234
            buf = pgstore._create_copyfrom_buffer(data, ('eid_from', 'eid_to'))
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   235
            if not buf:
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   236
                # The buffer is empty. This is probably due to error in _create_copyfrom_buffer
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   237
                raise ValueError
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   238
            cursor = self._cnx.cnxset.cu
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   239
            # Push into the tmp table
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   240
            cursor.copy_from(buf, '%s_relation_tmp' % rtype.lower(),
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   241
                             null='NULL', columns=('eid_from', 'eid_to'))
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   242
            # Clear data cache
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   243
            self._data_relations[rtype] = []
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   244
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   245
    def flush_entities(self):
11707
2c4518fea26f [massive store] Drop deprecated code
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11705
diff changeset
   246
        """Flush the entities data."""
10856
b839167d99a4 [dataimport] dict.iteritems() → dict.items()
Julien Cristau <julien.cristau@logilab.fr>
parents: 10855
diff changeset
   247
        for etype, data in self._data_entities.items():
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   248
            if not data:
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   249
                # There is no data for these etype for this flush round.
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   250
                continue
11305
118d83e65ca8 [dataimport] remove useless assignment in massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11057
diff changeset
   251
            # XXX It may be interresting to directly infer the columns' names from the schema
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   252
            # XXX For now, the _create_copyfrom_buffer does a "row[column]"
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   253
            # which can lead to a key error.
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   254
            # Thus we should create dictionary with all the keys.
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   255
            columns = set()
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   256
            for d in data:
11330
6adfa1e75179 [dataimport] no need to call .keys()
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11329
diff changeset
   257
                columns.update(d)
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   258
            _data = []
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   259
            _base_data = dict.fromkeys(columns)
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   260
            for d in data:
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   261
                _d = _base_data.copy()
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   262
                _d.update(d)
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   263
                _data.append(_d)
10871
1d4a94d04ec6 [dataimport] remove replace_sep parameter from massive store
Julien Cristau <julien.cristau@logilab.fr>
parents: 10870
diff changeset
   264
            buf = pgstore._create_copyfrom_buffer(_data, columns)
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   265
            if not buf:
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   266
                # The buffer is empty. This is probably due to error in _create_copyfrom_buffer
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   267
                raise ValueError('Error in buffer creation for etype %s' % etype)
11774
51c160677afe [repository] Drop the entities.extid column and associated cache
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11773
diff changeset
   268
            columns = ['cw_%s' % attr for attr in columns]
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   269
            cursor = self._cnx.cnxset.cu
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   270
            try:
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   271
                cursor.copy_from(buf, 'cw_%s' % etype.lower(), null='NULL', columns=columns)
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   272
            except Exception as exc:
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   273
                self.on_rollback(exc, etype, data)
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   274
            # Clear data cache
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   275
            self._data_entities[etype] = []
11329
a8cab8fb54ba [dataimport] drop massive store's flush_metadata method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11328
diff changeset
   276
            if not self.slave_mode:
a8cab8fb54ba [dataimport] drop massive store's flush_metadata method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11328
diff changeset
   277
                self._insert_etype_metadata(etype)
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   278
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   279
    def _cleanup_relations(self, rtype):
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   280
        """ Cleanup rtype table """
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   281
        # Push into relation table while removing duplicate
11312
3a83759854ee [dataimport] enhance a bit sql queries readability
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11311
diff changeset
   282
        self.sql('INSERT INTO %(r)s_relation (eid_from, eid_to) SELECT DISTINCT '
3a83759854ee [dataimport] enhance a bit sql queries readability
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11311
diff changeset
   283
                 'T.eid_from, T.eid_to FROM %(r)s_relation_tmp AS T '
3a83759854ee [dataimport] enhance a bit sql queries readability
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11311
diff changeset
   284
                 'WHERE NOT EXISTS (SELECT 1 FROM %(r)s_relation AS TT WHERE '
3a83759854ee [dataimport] enhance a bit sql queries readability
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11311
diff changeset
   285
                 'TT.eid_from=T.eid_from AND TT.eid_to=T.eid_to);' % {'r': rtype})
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   286
        # Drop temporary relation table
11312
3a83759854ee [dataimport] enhance a bit sql queries readability
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11311
diff changeset
   287
        self.sql('DROP TABLE %(r)s_relation_tmp' % {'r': rtype.lower()})
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   288
11329
a8cab8fb54ba [dataimport] drop massive store's flush_metadata method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11328
diff changeset
   289
    def _insert_etype_metadata(self, etype):
a8cab8fb54ba [dataimport] drop massive store's flush_metadata method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11328
diff changeset
   290
        """Massive insertion of meta data for a given etype, based on SQL statements.
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   291
        """
11326
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
   292
        # insert standard metadata relations
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
   293
        for rtype, eid in self.metagen.base_etype_rels(etype).items():
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
   294
            self._insert_meta_relation(etype, eid, '%s_relation' % rtype)
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
   295
        # insert cw_source, is and is_instance_of relations (normally handled by the system source)
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
   296
        self._insert_meta_relation(etype, self.metagen.source.eid, 'cw_source_relation')
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
   297
        eschema = self.schema[etype]
11325
a29443fbd1f2 [dataimport] rename massive store's metagen_push_relation method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11324
diff changeset
   298
        self._insert_meta_relation(etype, eschema.eid, 'is_relation')
11326
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
   299
        for parent_eschema in chain(eschema.ancestors(), [eschema]):
11325
a29443fbd1f2 [dataimport] rename massive store's metagen_push_relation method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11324
diff changeset
   300
            self._insert_meta_relation(etype, parent_eschema.eid, 'is_instance_of_relation')
11326
06eeac9389a3 [dataimport] introduce usage of MetadataGenerator into the massive store
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11325
diff changeset
   301
        # finally insert records into the entities table
11774
51c160677afe [repository] Drop the entities.extid column and associated cache
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11773
diff changeset
   302
        self.sql("INSERT INTO entities (eid, type) "
51c160677afe [repository] Drop the entities.extid column and associated cache
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11773
diff changeset
   303
                 "SELECT cw_eid, '%s' FROM cw_%s "
11312
3a83759854ee [dataimport] enhance a bit sql queries readability
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11311
diff changeset
   304
                 "WHERE NOT EXISTS (SELECT 1 FROM entities WHERE eid=cw_eid)"
11773
054a947b5415 [repository] Drop the entities.asource column
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11707
diff changeset
   305
                 % (etype, etype.lower()))
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   306
11325
a29443fbd1f2 [dataimport] rename massive store's metagen_push_relation method
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11324
diff changeset
   307
    def _insert_meta_relation(self, etype, eid_to, rtype):
11312
3a83759854ee [dataimport] enhance a bit sql queries readability
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11311
diff changeset
   308
        self.sql("INSERT INTO %s (eid_from, eid_to) SELECT cw_eid, %s FROM cw_%s "
3a83759854ee [dataimport] enhance a bit sql queries readability
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11311
diff changeset
   309
                 "WHERE NOT EXISTS (SELECT 1 FROM entities WHERE eid=cw_eid)"
3a83759854ee [dataimport] enhance a bit sql queries readability
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11311
diff changeset
   310
                 % (rtype, eid_to, etype.lower()))
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   311
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   312
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   313
def get_size_constraints(schema):
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   314
    """analyzes yams ``schema`` and returns the list of size constraints.
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   315
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   316
    The returned value is a dictionary mapping entity types to a
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   317
    sub-dictionnaries mapping attribute names -> max size.
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   318
    """
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   319
    size_constraints = {}
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   320
    # iterates on all entity types
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   321
    for eschema in schema.entities():
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   322
        # for each entity type, iterates on attribute definitions
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   323
        size_constraints[eschema.type] = eschema_constraints = {}
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   324
        for rschema, aschema in eschema.attribute_definitions():
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   325
            # for each attribute, if a size constraint is found,
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   326
            # append it to the size constraint list
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   327
            maxsize = None
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   328
            rdef = rschema.rdef(eschema, aschema)
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   329
            for constraint in rdef.constraints:
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   330
                if isinstance(constraint, SizeConstraint):
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   331
                    maxsize = constraint.max
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   332
                    eschema_constraints[rschema.type] = maxsize
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   333
    return size_constraints
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   334
11313
682b15eb2dd2 [dataimport] flake8
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11312
diff changeset
   335
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   336
def get_default_values(schema):
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   337
    """analyzes yams ``schema`` and returns the list of default values.
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   338
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   339
    The returned value is a dictionary mapping entity types to a
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   340
    sub-dictionnaries mapping attribute names -> default values.
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   341
    """
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   342
    default_values = {}
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   343
    # iterates on all entity types
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   344
    for eschema in schema.entities():
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   345
        # for each entity type, iterates on attribute definitions
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   346
        default_values[eschema.type] = eschema_constraints = {}
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   347
        for rschema, _ in eschema.attribute_definitions():
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   348
            # for each attribute, if a size constraint is found,
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   349
            # append it to the size constraint list
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   350
            if eschema.default(rschema.type) is not None:
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   351
                eschema_constraints[rschema.type] = eschema.default(rschema.type)
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   352
    return default_values
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   353
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   354
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   355
class PGHelper(object):
11314
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   356
    """This class provides some helper methods to manipulate a postgres database metadata (index and
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   357
    constraints).
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   358
    """
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   359
11310
e0b7277e5394 [dataimport] PGHelper should be responsible to retrieve the database schema
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11308
diff changeset
   360
    def __init__(self, cnx):
11314
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   361
        self.sql = cnx.system_sql
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   362
        # Deals with pg schema, see #3216686
11310
e0b7277e5394 [dataimport] PGHelper should be responsible to retrieve the database schema
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11308
diff changeset
   363
        pg_schema = cnx.repo.config.system_source_config.get('db-namespace') or 'public'
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   364
        self.pg_schema = pg_schema
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   365
11322
21316020eae3 [dataimport] move cwmassive_constraint temporary table handling to the PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11321
diff changeset
   366
    def drop_indexes(self, tablename):
21316020eae3 [dataimport] move cwmassive_constraint temporary table handling to the PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11321
diff changeset
   367
        """Drop indexes and constraints, storing them in a table for later restore."""
21316020eae3 [dataimport] move cwmassive_constraint temporary table handling to the PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11321
diff changeset
   368
        # Create a table to save the constraints, it allows reloading even after crash
21316020eae3 [dataimport] move cwmassive_constraint temporary table handling to the PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11321
diff changeset
   369
        self.sql('CREATE TABLE IF NOT EXISTS cwmassive_constraints(sql TEXT, insert_order SERIAL)')
21316020eae3 [dataimport] move cwmassive_constraint temporary table handling to the PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11321
diff changeset
   370
        indexes = self.table_indexes(tablename)
21316020eae3 [dataimport] move cwmassive_constraint temporary table handling to the PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11321
diff changeset
   371
        for name, query in indexes.items():
21316020eae3 [dataimport] move cwmassive_constraint temporary table handling to the PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11321
diff changeset
   372
            self.sql('INSERT INTO cwmassive_constraints(sql) VALUES (%(sql)s)', {'sql': query})
21316020eae3 [dataimport] move cwmassive_constraint temporary table handling to the PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11321
diff changeset
   373
            self.sql('DROP INDEX %s' % name)
21316020eae3 [dataimport] move cwmassive_constraint temporary table handling to the PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11321
diff changeset
   374
21316020eae3 [dataimport] move cwmassive_constraint temporary table handling to the PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11321
diff changeset
   375
    def drop_constraints(self, tablename):
21316020eae3 [dataimport] move cwmassive_constraint temporary table handling to the PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11321
diff changeset
   376
        self.sql('CREATE TABLE IF NOT EXISTS cwmassive_constraints(sql TEXT, insert_order SERIAL)')
21316020eae3 [dataimport] move cwmassive_constraint temporary table handling to the PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11321
diff changeset
   377
        constraints = self.table_constraints(tablename)
21316020eae3 [dataimport] move cwmassive_constraint temporary table handling to the PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11321
diff changeset
   378
        for name, query in constraints.items():
21316020eae3 [dataimport] move cwmassive_constraint temporary table handling to the PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11321
diff changeset
   379
            self.sql('INSERT INTO cwmassive_constraints(sql) VALUES (%(sql)s)', {'sql': query})
21316020eae3 [dataimport] move cwmassive_constraint temporary table handling to the PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11321
diff changeset
   380
            self.sql('ALTER TABLE %s DROP CONSTRAINT %s' % (tablename, name))
21316020eae3 [dataimport] move cwmassive_constraint temporary table handling to the PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11321
diff changeset
   381
21316020eae3 [dataimport] move cwmassive_constraint temporary table handling to the PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11321
diff changeset
   382
    def restore_indexes_and_constraints(self):
21316020eae3 [dataimport] move cwmassive_constraint temporary table handling to the PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11321
diff changeset
   383
        """Restore indexes and constraints."""
21316020eae3 [dataimport] move cwmassive_constraint temporary table handling to the PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11321
diff changeset
   384
        if not self.table_exists('cwmassive_constraints'):
21316020eae3 [dataimport] move cwmassive_constraint temporary table handling to the PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11321
diff changeset
   385
            return
21316020eae3 [dataimport] move cwmassive_constraint temporary table handling to the PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11321
diff changeset
   386
        cu = self.sql('SELECT sql, insert_order FROM cwmassive_constraints '
21316020eae3 [dataimport] move cwmassive_constraint temporary table handling to the PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11321
diff changeset
   387
                      'ORDER BY insert_order DESC')
21316020eae3 [dataimport] move cwmassive_constraint temporary table handling to the PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11321
diff changeset
   388
        for query, order in cu.fetchall():
21316020eae3 [dataimport] move cwmassive_constraint temporary table handling to the PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11321
diff changeset
   389
            self.sql(query)
21316020eae3 [dataimport] move cwmassive_constraint temporary table handling to the PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11321
diff changeset
   390
            self.sql('DELETE FROM cwmassive_constraints WHERE insert_order=%(order)s',
21316020eae3 [dataimport] move cwmassive_constraint temporary table handling to the PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11321
diff changeset
   391
                     {'order': order})
21316020eae3 [dataimport] move cwmassive_constraint temporary table handling to the PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11321
diff changeset
   392
        self.sql('DROP TABLE cwmassive_constraints')
21316020eae3 [dataimport] move cwmassive_constraint temporary table handling to the PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11321
diff changeset
   393
11314
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   394
    def table_exists(self, tablename):
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   395
        """Return True if the given table already exists in the database."""
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   396
        cu = self.sql('SELECT 1 from information_schema.tables '
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   397
                      'WHERE table_name=%(t)s AND table_schema=%(s)s',
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   398
                      {'t': tablename, 's': self.pg_schema})
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   399
        return bool(cu.fetchone())
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   400
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   401
    def table_indexes_constraints(self, tablename):
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   402
        """Return one dictionary with all indexes by name, another with all constraints by name,
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   403
        for the given table.
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   404
        """
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   405
        indexes = self.table_indexes(tablename)
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   406
        constraints = self.table_constraints(tablename)
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   407
        _indexes = {}
10856
b839167d99a4 [dataimport] dict.iteritems() → dict.items()
Julien Cristau <julien.cristau@logilab.fr>
parents: 10855
diff changeset
   408
        for name, query in indexes.items():
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   409
            # Remove pkey indexes (automatically created by constraints)
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   410
            # Specific cases of primary key, see #3224079
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   411
            if name not in constraints:
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   412
                _indexes[name] = query
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   413
        return _indexes, constraints
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   414
11314
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   415
    def table_indexes(self, tablename):
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   416
        """Return a dictionary of indexes {index name: index sql}, constraints included."""
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   417
        indexes = {}
11314
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   418
        for name in self._index_names(tablename):
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   419
            indexes[name] = self._index_sql(name)
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   420
        return indexes
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   421
11314
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   422
    def table_constraints(self, tablename):
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   423
        """Return a dictionary of constraints {constraint name: constraint sql}."""
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   424
        constraints = {}
11314
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   425
        for name in self._constraint_names(tablename):
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   426
            query = self._constraint_sql(name)
10853
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   427
            constraints[name] = 'ALTER TABLE %s ADD CONSTRAINT %s %s' % (tablename, name, query)
de741492538d [dataimport] backport massive store from dataio cube
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
   428
        return constraints
11314
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   429
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   430
    def _index_names(self, tablename):
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   431
        """Return the names of all indexes in the given table (including constraints.)"""
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   432
        cu = self.sql("SELECT c.relname FROM pg_catalog.pg_class c "
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   433
                      "JOIN pg_catalog.pg_index i ON i.indexrelid = c.oid "
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   434
                      "JOIN pg_catalog.pg_class c2 ON i.indrelid = c2.oid "
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   435
                      "LEFT JOIN pg_catalog.pg_user u ON u.usesysid = c.relowner "
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   436
                      "LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace "
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   437
                      "WHERE c.relkind IN ('i','') "
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   438
                      " AND c2.relname = %(t)s "
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   439
                      " AND i.indisprimary = FALSE "
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   440
                      " AND n.nspname NOT IN ('pg_catalog', 'pg_toast') "
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   441
                      " AND pg_catalog.pg_table_is_visible(c.oid);", {'t': tablename})
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   442
        return [name for name, in cu.fetchall()]
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   443
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   444
    def _constraint_names(self, tablename):
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   445
        """Return the names of all constraints in the given table."""
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   446
        cu = self.sql("SELECT i.conname FROM pg_catalog.pg_class c "
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   447
                      "JOIN pg_catalog.pg_constraint i ON i.conrelid = c.oid "
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   448
                      "JOIN pg_catalog.pg_class c2 ON i.conrelid=c2.oid "
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   449
                      "LEFT JOIN pg_catalog.pg_user u ON u.usesysid = c.relowner "
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   450
                      "LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace "
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   451
                      "WHERE c2.relname = %(t)s "
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   452
                      "AND n.nspname NOT IN ('pg_catalog', 'pg_toast') "
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   453
                      "AND pg_catalog.pg_table_is_visible(c.oid)", {'t': tablename})
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   454
        return [name for name, in cu.fetchall()]
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   455
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   456
    def _index_sql(self, name):
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   457
        """Return the SQL to be used to recreate the index of the given name."""
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   458
        return self.sql('SELECT pg_get_indexdef(c.oid) FROM pg_catalog.pg_class c '
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   459
                        'LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace '
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   460
                        'WHERE c.relname = %(r)s AND n.nspname=%(n)s',
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   461
                        {'r': name, 'n': self.pg_schema}).fetchone()[0]
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   462
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   463
    def _constraint_sql(self, name):
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   464
        """Return the SQL to be used to recreate the constraint."""
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   465
        return self.sql('SELECT pg_get_constraintdef(c.oid) FROM pg_catalog.pg_constraint c '
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   466
                        'LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.connamespace '
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   467
                        'WHERE c.conname = %(r)s AND n.nspname=%(n)s',
c258bd6b20d8 [dataimport] rework PGHelper class
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 11313
diff changeset
   468
                        {'r': name, 'n': self.pg_schema}).fetchone()[0]