dataimport/importer.py
author Denis Laxalde <denis.laxalde@logilab.fr>
Fri, 26 Jun 2015 16:10:33 +0200
changeset 10461 37644c518705
parent 10460 d260722f2453
child 10514 b29d9904482e
permissions -rw-r--r--
[doc] Add a tutorial and extend documentation for ExtEntityImporter Related to #5414753.
# copyright 2015 LOGILAB S.A. (Paris, FRANCE), all rights reserved.
# contact http://www.logilab.fr -- mailto:contact@logilab.fr
#
# This program is free software: you can redistribute it and/or modify it under
# the terms of the GNU Lesser General Public License as published by the Free
# Software Foundation, either version 2.1 of the License, or (at your option)
# any later version.
#
# This program is distributed in the hope that it will be useful, but WITHOUT
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
# FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
# details.
#
# You should have received a copy of the GNU Lesser General Public License along
# with this program. If not, see <http://www.gnu.org/licenses/>.
"""Data import of external entities.

Main entry points:

.. autoclass:: ExtEntitiesImporter
.. autoclass:: ExtEntity

Utilities:

.. autofunction:: cwuri2eid
.. autoclass:: RelationMapping
"""

from collections import defaultdict
import logging

from logilab.mtconverter import xml_escape


def cwuri2eid(cnx, etypes, source_eid=None):
    """Return a dictionary mapping cwuri to eid for entities of the given entity types and / or
    source.
    """
    assert source_eid or etypes, 'neither entity types nor source specified'
    rql = 'Any U, X WHERE X cwuri U'
    args = {}
    # `etypes` may be None or empty when only a source is given
    if etypes and len(etypes) == 1:
        rql += ', X is %s' % etypes[0]
    elif etypes:
        rql += ', X is IN (%s)' % ','.join(etypes)
    if source_eid is not None:
        rql += ', X cw_source S, S eid %(s)s'
        args['s'] = source_eid
    return dict(cnx.execute(rql, args))
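
As a rough illustration of the query that ``cwuri2eid`` sends, the sketch below rebuilds only the RQL string and its arguments, without any repository connection. The helper name ``build_cwuri2eid_rql`` is hypothetical and is not part of CubicWeb.

```python
def build_cwuri2eid_rql(etypes, source_eid=None):
    """Hypothetical helper: assemble the query the way cwuri2eid does."""
    rql = 'Any U, X WHERE X cwuri U'
    args = {}
    if len(etypes) == 1:
        # a single entity type is matched with a plain `is`
        rql += ', X is %s' % etypes[0]
    elif etypes:
        # several entity types are matched with `is IN (...)`
        rql += ', X is IN (%s)' % ','.join(etypes)
    if source_eid is not None:
        # the source eid is passed as a query argument, not interpolated
        rql += ', X cw_source S, S eid %(s)s'
        args['s'] = source_eid
    return rql, args


single = build_cwuri2eid_rql(['Person'])
several = build_cwuri2eid_rql(['Person', 'Animal'], source_eid=12)
print(single[0])
print(several[0], several[1])
```

Entity type names are interpolated directly into the query text, so they are expected to come from the schema rather than from untrusted input.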


class RelationMapping(object):
    """Read-only mapping from relation type to set of related (subject, object) eids.

    If `source` is specified, only return relations whose subject and object both come
    from this source.
    """

    def __init__(self, cnx, source=None):
        self.cnx = cnx
        self._rql_template = 'Any S,O WHERE S {} O'
        self._kwargs = {}
        if source is not None:
            self._rql_template += ', S cw_source SO, O cw_source SO, SO eid %(s)s'
            self._kwargs['s'] = source.eid

    def __getitem__(self, rtype):
        """Return a set of (subject, object) eids already related by `rtype`"""
        rql = self._rql_template.format(rtype)
        return set(tuple(x) for x in self.cnx.execute(rql, self._kwargs))
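
To see what ``RelationMapping`` does without a repository, the following sketch reproduces its lookup logic against a stub connection. ``StubCnx``, ``RelationMappingSketch`` and the canned rows are assumptions made for illustration; a real connection would return a result set read from the database.

```python
class StubCnx(object):
    """Hypothetical stand-in for a connection: records queries, returns canned rows."""

    def __init__(self, rows):
        self.rows = rows
        self.queries = []

    def execute(self, rql, args=None):
        self.queries.append((rql, args))
        return self.rows


class RelationMappingSketch(object):
    """Same lookup logic as RelationMapping, taking a source eid instead of a source."""

    def __init__(self, cnx, source_eid=None):
        self.cnx = cnx
        self._rql_template = 'Any S,O WHERE S {} O'
        self._kwargs = {}
        if source_eid is not None:
            self._rql_template += ', S cw_source SO, O cw_source SO, SO eid %(s)s'
            self._kwargs['s'] = source_eid

    def __getitem__(self, rtype):
        # str.format only fills the `{}` placeholder; `%(s)s` is left for execute()
        rql = self._rql_template.format(rtype)
        return set(tuple(row) for row in self.cnx.execute(rql, self._kwargs))


cnx = StubCnx([[1, 2], [1, 3], [1, 2]])
existing = RelationMappingSketch(cnx, source_eid=42)['friend']
print(sorted(existing))
print(cnx.queries[0][0])
```

Each item access runs one query, and duplicate rows collapse because the result is a set of tuples.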


class ExtEntity(object):
    """Transitional representation of an entity for use in data importer.

    An external entity has the following properties:

    * ``extid`` (external id), an identifier for the ext entity,

    * ``etype`` (entity type), a string which must be the name of one entity type in the schema
      (e.g. ``'Person'``, ``'Animal'``, ...),

    * ``values``, a dictionary whose keys are attribute or relation names from the schema (e.g.
      ``'first_name'``, ``'friend'``), and whose values are *sets*.

    For instance:

    .. code-block:: python

        ext_entity.extid = 'http://example.org/person/debby'
        ext_entity.etype = 'Person'
        ext_entity.values = {'first_name': set([u"Deborah", u"Debby"]),
                            'friend': set(['http://example.org/person/john'])}

    """

    def __init__(self, etype, extid, values=None):
        self.etype = etype
        self.extid = extid
        if values is None:
            values = {}
        self.values = values
        self._schema = None

    def __repr__(self):
        return '<%s %s %s>' % (self.etype, self.extid, self.values)

    def iter_rdefs(self):
        """Yield (key, rtype, role) defined in `.values` dict, with:

        * `key` is the original key in `.values` (i.e. the relation type or a 2-tuple (relation
          type, role))

        * `rtype` is a yams relation type, expected to be found in the schema (attribute or
          relation)

        * `role` is the role of the entity in the relation, 'subject' or 'object'

        Iteration is done on a copy of the keys so values may be inserted/deleted during it.
        """
        for key in list(self.values):
            if isinstance(key, tuple):
                rtype, role = key
                assert role in ('subject', 'object'), key
                yield key, rtype, role
            else:
                yield key, key, 'subject'

    def prepare(self, schema):
        """Prepare an external entity for later insertion:

        * ensure attributes and inlined relations have a single value
        * turn set([value]) into value and remove keys associated to an empty set
        * remove non-inlined relations and return them as a [(e1key, relation, e2key)] list

        Return a list of non-inlined relations that may be inserted later, each relation being
        defined by a 3-tuple (subject extid, relation type, object extid).

        Beware that the importer may call this method several times.
        """
        assert self._schema is None, 'prepare() has already been called for %s' % self
        self._schema = schema
        eschema = schema.eschema(self.etype)
        deferred = []
        entity_dict = self.values
        for key, rtype, role in self.iter_rdefs():
            rschema = schema.rschema(rtype)
            if rschema.final or (rschema.inlined and role == 'subject'):
                assert len(entity_dict[key]) <= 1, \
                    "more than one value for %s: %s (%s)" % (rtype, entity_dict[key], self.extid)
                if entity_dict[key]:
                    entity_dict[rtype] = entity_dict[key].pop()
                    if key != rtype:
                        del entity_dict[key]
                    if (rschema.final and eschema.has_metadata(rtype, 'format')
                            and rtype + '_format' not in entity_dict):
                        entity_dict[rtype + '_format'] = u'text/plain'
                else:
                    del entity_dict[key]
            else:
                for target_extid in entity_dict.pop(key):
                    if role == 'subject':
                        deferred.append((self.extid, rtype, target_extid))
                    else:
                        deferred.append((target_extid, rtype, self.extid))
        return deferred

    def is_ready(self, extid2eid):
        """Return True if the ext entity is ready, i.e. all the URIs used in its inlined
        relations already exist. When ready, these extids are replaced by the matching eids.
        """
        assert self._schema, 'prepare() method should be called first on %s' % self
        # as .prepare has been called, we know that .values only contains subject relation *type* as
        # key (no more (rtype, role) tuple)
        schema = self._schema
        entity_dict = self.values
        for rtype in entity_dict:
            rschema = schema.rschema(rtype)
            if not rschema.final:
                # .prepare() should drop other cases from the entity dict
                assert rschema.inlined
                if entity_dict[rtype] not in extid2eid:
                    return False
        # entity is ready, replace all relation's extid by eids
        for rtype in entity_dict:
            rschema = schema.rschema(rtype)
            if rschema.inlined:
                entity_dict[rtype] = extid2eid[entity_dict[rtype]]
        return True
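
The readiness check above can be illustrated without a CubicWeb schema. This is a minimal sketch, not the module's API: `is_ready_sketch` and its `inlined_rtypes` argument are hypothetical stand-ins for the `rschema.inlined` lookups, and `values` plays the role of `self.values`.

```python
def is_ready_sketch(values, inlined_rtypes, extid2eid):
    """Return True once every inlined relation target has a known eid.

    Hypothetical stand-in: `inlined_rtypes` replaces the schema lookup.
    """
    # First pass: bail out without touching `values` if any target is unknown.
    for rtype in inlined_rtypes:
        if rtype in values and values[rtype] not in extid2eid:
            return False
    # Second pass: the entity is ready, swap external ids for eids in place.
    for rtype in inlined_rtypes:
        if rtype in values:
            values[rtype] = extid2eid[values[rtype]]
    return True
```

Because the second pass rewrites `values` in place, a check of this shape is meant to succeed only once per entity: after success, the stored values are eids rather than external ids.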


class ExtEntitiesImporter(object):
    """This class is responsible for importing external entities, that is, instances of
    :class:`ExtEntity`, into CubicWeb entities.

    :param schema: the CubicWeb instance's schema
    :param store: a CubicWeb `Store`
    :param extid2eid: optional {extid: eid} dictionary giving information on existing entities. It
        will be completed during import. You may want to use :func:`cwuri2eid` to build it.
    :param existing_relations: optional {rtype: set((subj eid, obj eid))} mapping giving information on
        existing relations of a given type. You may want to use :class:`RelationMapping` to build it.
    :param etypes_order_hint: optional ordered iterable of entity types, giving a hint on the order in
        which the importer should attempt to import them
    :param import_log: optional object implementing the :class:`SimpleImportLog` interface to record
        events occurring during the import
    :param raise_on_error: optional boolean flag (defaults to False) indicating whether errors should
        be raised or logged. You usually want them raised during tests but logged in
        production.

    Instances of this class are meant to import external entities through :meth:`import_entities`
    which handles a stream of :class:`ExtEntity`. One may then plug arbitrary filters into the
    external entities stream.

    .. automethod:: import_entities

    """

    def __init__(self, schema, store, extid2eid=None, existing_relations=None,
                 etypes_order_hint=(), import_log=None, raise_on_error=False):
        self.schema = schema
        self.store = store
        self.extid2eid = extid2eid if extid2eid is not None else {}
        self.existing_relations = (existing_relations if existing_relations is not None
                                   else defaultdict(set))
        self.etypes_order_hint = etypes_order_hint
        if import_log is None:
            import_log = SimpleImportLog('<unspecified>')
        self.import_log = import_log
        self.raise_on_error = raise_on_error
        # sets of created and updated eids
        self.created = set()
        self.updated = set()

    def import_entities(self, ext_entities):
        """Import given external entities (:class:`ExtEntity`) stream (usually a generator)."""
        # {etype: [ext entity]} of entities that are in the import queue
        queue = {}
        # order external entities, then create/update them
        deferred = self._import_entities(ext_entities, queue)
        # create deferred relations that don't exist already
        missing_relations = self.prepare_insert_deferred_relations(deferred)
        self._warn_about_missing_work(queue, missing_relations)

    def _import_entities(self, ext_entities, queue):
        extid2eid = self.extid2eid
        deferred = {}  # non inlined relations that may be deferred
        self.import_log.record_debug('importing entities')
        for ext_entity in self.iter_ext_entities(ext_entities, deferred, queue):
            try:
                eid = extid2eid[ext_entity.extid]
            except KeyError:
                self.prepare_insert_entity(ext_entity)
            else:
                if ext_entity.values:
                    self.prepare_update_entity(ext_entity, eid)
        return deferred

    def iter_ext_entities(self, ext_entities, deferred, queue):
        """Yield external entities in an order which attempts to satisfy
        schema constraints (inlined / cardinality) and to optimize the import.
        """
        schema = self.schema
        extid2eid = self.extid2eid
        for ext_entity in ext_entities:
            # check data in the transitional representation and prepare it for
            # later insertion in the database
            for subject_uri, rtype, object_uri in ext_entity.prepare(schema):
                deferred.setdefault(rtype, set()).add((subject_uri, object_uri))
            if not ext_entity.is_ready(extid2eid):
                queue.setdefault(ext_entity.etype, []).append(ext_entity)
                continue
            yield ext_entity
            # check for some entities in the queue that may now be ready. We'll have to restart
            # the search for ready entities until a pass yields none
            new = True
            while new:
                new = False
                for etype in self.etypes_order_hint:
                    if etype in queue:
                        new_queue = []
                        for ext_entity in queue[etype]:
                            if ext_entity.is_ready(extid2eid):
                                yield ext_entity
                                # may unlock entities already examined in this pass
                                new = True
                            else:
                                new_queue.append(ext_entity)
                        if new_queue:
                            queue[etype][:] = new_queue
                        else:
                            del queue[etype]
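
The queue-and-rescan ordering above can be sketched on plain data. This is an illustration under assumptions, not the importer itself: `iter_ready` and `import_order` are hypothetical names, `deps` maps an external id to the set of external ids it needs, and `known` stands in for `extid2eid`. As in the loop above, the consumer must register each yielded item before the generator resumes, and only entity types listed in the order hint are rescanned.

```python
def iter_ready(stream, deps, known, order_hint):
    """Yield (extid, etype) pairs whose dependencies are all in `known`."""
    queue = {}  # {etype: [(extid, etype)]} of items still waiting
    for extid, etype in stream:
        if not deps.get(extid, set()) <= known:
            queue.setdefault(etype, []).append((extid, etype))
            continue
        yield extid, etype
        # Rescan the queue until a full pass unlocks nothing new.
        new = True
        while new:
            new = False
            for queued_etype in order_hint:
                if queued_etype not in queue:
                    continue
                still_waiting = []
                for item in queue[queued_etype]:
                    if deps.get(item[0], set()) <= known:
                        yield item
                        new = True
                    else:
                        still_waiting.append(item)
                if still_waiting:
                    queue[queued_etype] = still_waiting
                else:
                    del queue[queued_etype]


def import_order(stream, deps, order_hint):
    """Consume iter_ready, registering each item as soon as it is yielded."""
    known = set()
    order = []
    for extid, _etype in iter_ready(stream, deps, known, order_hint):
        known.add(extid)
        order.append(extid)
    return order
```

With the stream `[('p1', 'Person'), ('g1', 'Group')]` where `p1` depends on `g1`, passing `('Person',)` as the hint lets `p1` be emitted right after `g1`; with an empty hint, `p1` stays queued in this sketch.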

    def prepare_insert_entity(self, ext_entity):
        """Call the store to prepare insertion of the given external entity"""
        eid = self.store.prepare_insert_entity(ext_entity.etype, **ext_entity.values)
        self.extid2eid[ext_entity.extid] = eid
        self.created.add(eid)
        return eid

    def prepare_update_entity(self, ext_entity, eid):
        """Call the store to prepare update of the given external entity"""
        self.store.prepare_update_entity(ext_entity.etype, eid, **ext_entity.values)
        self.updated.add(eid)

    def prepare_insert_deferred_relations(self, deferred):
        """Call the store to insert deferred relations (not handled during insertion/update for
        entities). Return a list of relations `[(subj ext id, rtype, obj ext id)]` that could not be
        inserted because the target entities don't exist yet.
        """
        prepare_insert_relation = self.store.prepare_insert_relation
        rschema = self.schema.rschema
        extid2eid = self.extid2eid
        missing_relations = []
        for rtype, relations in deferred.items():
            self.import_log.record_debug('importing %s %s relations' % (len(relations), rtype))
            symmetric = rschema(rtype).symmetric
            existing = self.existing_relations[rtype]
            for subject_uri, object_uri in relations:
                try:
                    subject_eid = extid2eid[subject_uri]
                    object_eid = extid2eid[object_uri]
                except KeyError:
                    missing_relations.append((subject_uri, rtype, object_uri))
                    continue
                if (subject_eid, object_eid) not in existing:
                    prepare_insert_relation(subject_eid, rtype, object_eid)
                    existing.add((subject_eid, object_eid))
                    if symmetric:
                        existing.add((object_eid, subject_eid))
        return missing_relations
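
A rough, self-contained sketch of the deferred-relation step follows. The names are assumptions made for illustration: `insert_deferred` is hypothetical, `symmetric_rtypes` replaces the schema's `symmetric` flag, and `insert` replaces the store's `prepare_insert_relation`. It shows the two outcomes described above: relations whose ends are unknown are reported as `(subject, rtype, object)` triples, and a symmetric relation is recorded in both directions so its mirror is not inserted again.

```python
from collections import defaultdict


def insert_deferred(deferred, extid2eid, existing, symmetric_rtypes, insert):
    """Insert resolvable relations; return the unresolvable ones as triples."""
    missing = []
    for rtype, relations in deferred.items():
        known = existing[rtype]
        # sorted() only makes this sketch deterministic; the original iterates a set.
        for subj, obj in sorted(relations):
            try:
                subj_eid, obj_eid = extid2eid[subj], extid2eid[obj]
            except KeyError:
                missing.append((subj, rtype, obj))
                continue
            if (subj_eid, obj_eid) not in known:
                insert(subj_eid, rtype, obj_eid)
                known.add((subj_eid, obj_eid))
                if rtype in symmetric_rtypes:
                    known.add((obj_eid, subj_eid))
    return missing


def demo():
    """Run the sketch on a tiny symmetric relation with one unknown target."""
    inserted = []
    deferred = {'knows': {('a', 'b'), ('b', 'a'), ('a', 'zz')}}
    missing = insert_deferred(
        deferred, {'a': 1, 'b': 2}, defaultdict(set), {'knows'},
        lambda s, r, o: inserted.append((s, r, o)))
    return inserted, missing
```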

    def _warn_about_missing_work(self, queue, missing_relations):
        error = self.import_log.record_error
        if queue:
            msgs = ["can't create some entities, is there some cycle or "
                    "missing data?"]
            for ext_entities in queue.values():
                for ext_entity in ext_entities:
                    msgs.append(str(ext_entity))
            # explicit loop: a bare map() is lazy on Python 3 and would record nothing
            for msg in msgs:
                error(msg)
            if self.raise_on_error:
                raise Exception('\n'.join(msgs))
        if missing_relations:
            msgs = ["can't create some relations, is there missing data?"]
            for subject_uri, rtype, object_uri in missing_relations:
                msgs.append("%s %s %s" % (subject_uri, rtype, object_uri))
            for msg in msgs:
                error(msg)
            if self.raise_on_error:
                raise Exception('\n'.join(msgs))
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   350
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   351
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   352
class SimpleImportLog(object):
    """Fake CWDataImport log using a simple text format.

    Useful to display logs in the UI instead of storing them in the
    database. Each entry appended to ``logs`` is a tab-separated string:
    numeric severity, file name, line number (empty when not given) and
    message.
    """

    def __init__(self, filename):
        self.logs = []
        self.filename = filename

    def record_debug(self, msg, path=None, line=None):
        self._log(logging.DEBUG, msg, path, line)

    def record_info(self, msg, path=None, line=None):
        self._log(logging.INFO, msg, path, line)

    def record_warning(self, msg, path=None, line=None):
        self._log(logging.WARNING, msg, path, line)

    def record_error(self, msg, path=None, line=None):
        self._log(logging.ERROR, msg, path, line)

    def record_fatal(self, msg, path=None, line=None):
        self._log(logging.FATAL, msg, path, line)

    def _log(self, severity, msg, path, line):
        # ``path`` is accepted but not used in the formatted entry.
        encodedmsg = u'%s\t%s\t%s\t%s' % (severity, self.filename,
                                          line or u'', msg)
        self.logs.append(encodedmsg)


class HTMLImportLog(SimpleImportLog):
    """Fake CWDataImport log using a simple HTML format."""

    def __init__(self, filename):
        super(HTMLImportLog, self).__init__(xml_escape(filename))

    def _log(self, severity, msg, path, line):
        encodedmsg = u'%s\t%s\t%s\t%s<br/>' % (severity, self.filename,
                                               line or u'', xml_escape(msg))
        self.logs.append(encodedmsg)