cubicweb/dataimport/importer.py
author Philippe Pepiot <ph@itsalwaysdns.eu>
Tue, 31 Mar 2020 19:15:03 +0200
changeset 12957 0c973204033a
parent 12626 32ee89340e59
permissions -rw-r--r--
[server] prevent returning closed cursor to the database pool. Since c8c6ad8, init_repository uses repo.internal_cnx() instead of repo.system_source.get_connection(), so it uses the pool and we should not close cursors from the pool before returning them. Otherwise we may get a "connection already closed" error. This bug only triggers when connection-pool-size = 1. Since we are moving to a dynamic pooler, we need to get this fixed. This does not occur with sqlite, since the connection wrapper instantiates a new cursor every time, but it does occur with other databases.
# copyright 2015-2016 LOGILAB S.A. (Paris, FRANCE), all rights reserved.
# contact http://www.logilab.fr -- mailto:contact@logilab.fr
#
# This program is free software: you can redistribute it and/or modify it under
# the terms of the GNU Lesser General Public License as published by the Free
# Software Foundation, either version 2.1 of the License, or (at your option)
# any later version.
#
# This program is distributed in the hope that it will be useful, but WITHOUT
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
# FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
# details.
#
# You should have received a copy of the GNU Lesser General Public License along
# with this program. If not, see <http://www.gnu.org/licenses/>.
"""Data import of external entities.

Main entry points:

.. autoclass:: ExtEntitiesImporter
.. autoclass:: ExtEntity

Utilities:

.. autofunction:: cwuri2eid
.. autoclass:: RelationMapping
.. autofunction:: cubicweb.dataimport.importer.use_extid_as_cwuri
"""

from collections import defaultdict
import logging

from logilab.mtconverter import xml_escape

from cubicweb import Binary


def cwuri2eid(cnx, etypes, source_eid=None):
    """Return a dictionary mapping cwuri to eid for entities of the given entity types and/or
    source.
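
    Each result row is a ``(cwuri, eid)`` pair, which is why ``dict()`` can be
    applied directly to the result set. A minimal sketch of how the RQL query
    is assembled in this function; ``build_cwuri2eid_rql`` is a hypothetical
    helper (not part of this module) and the result rows are made up:

    .. code-block:: python

        def build_cwuri2eid_rql(etypes, source_eid=None):
            # hypothetical helper mirroring the query construction of cwuri2eid
            rql = 'Any U, X WHERE X cwuri U'
            args = {}
            if len(etypes) == 1:
                rql += ', X is %s' % etypes[0]
            elif etypes:
                rql += ', X is IN (%s)' % ','.join(etypes)
            if source_eid is not None:
                # kept as a named placeholder, filled by cnx.execute(rql, args)
                rql += ', X cw_source S, S eid %(s)s'
                args['s'] = source_eid
            return rql, args

        rql, args = build_cwuri2eid_rql(('Person', 'Animal'), source_eid=42)
        # made-up rows standing in for cnx.execute(rql, args)
        rows = [['http://example.org/person/debby', 1234]]
        cwuri_to_eid = dict(rows)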
    """
    assert source_eid or etypes, 'no entity types nor source specified'
    rql = 'Any U, X WHERE X cwuri U'
    args = {}
    if len(etypes) == 1:
        rql += ', X is %s' % etypes[0]
    elif etypes:
        rql += ', X is IN (%s)' % ','.join(etypes)
    if source_eid is not None:
        rql += ', X cw_source S, S eid %(s)s'
        args['s'] = source_eid
    return dict(cnx.execute(rql, args))


def use_extid_as_cwuri(extid2eid):
    """Return a filter function that takes an iterable of :class:`ExtEntity` objects and yields
    them, setting `cwuri` from the entity's extid when the entity does not exist yet and has no
    `cwuri` defined.

    `extid2eid` is an extid to eid dictionary coming from an
    :class:`ExtEntitiesImporter` instance.

    Example usage:

    .. code-block:: python

        importer = ExtEntitiesImporter(cnx, store, import_log)
        set_cwuri = use_extid_as_cwuri(importer.extid2eid)
        importer.import_entities(set_cwuri(extentities))
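
    The returned filter skips entities whose extid is already in `extid2eid`,
    decodes byte-string extids as UTF-8, and never overrides a `cwuri` that is
    already set, since it relies on ``dict.setdefault``. A minimal standalone
    sketch of that behaviour, where ``FakeExtEntity`` is a hypothetical
    stand-in for :class:`ExtEntity` and ``set_cwuri_sketch`` restates the
    inner filter:

    .. code-block:: python

        from collections import namedtuple

        # hypothetical stand-in exposing only the attributes the filter reads
        FakeExtEntity = namedtuple('FakeExtEntity', ['extid', 'values'])

        def set_cwuri_sketch(extentities, extid2eid):
            # restatement of use_extid_as_cwuri_filter
            for extentity in extentities:
                if extentity.extid not in extid2eid:
                    cwuri = extentity.extid
                    if isinstance(cwuri, bytes):
                        cwuri = cwuri.decode('utf-8')
                    extentity.values.setdefault('cwuri', set([cwuri]))
                yield extentity

        new = FakeExtEntity(b'http://example.org/a', {})
        known = FakeExtEntity('http://example.org/b', {})
        explicit = FakeExtEntity('http://example.org/c', {'cwuri': set(['urn:c'])})
        extid2eid = {'http://example.org/b': 1}
        processed = list(set_cwuri_sketch([new, known, explicit], extid2eid))
        # new gets a decoded cwuri, known is untouched, explicit keeps 'urn:c'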
    """
    def use_extid_as_cwuri_filter(extentities):
        for extentity in extentities:
            if extentity.extid not in extid2eid:
                cwuri = extentity.extid
                if isinstance(cwuri, bytes):
                    cwuri = cwuri.decode('utf-8')
                extentity.values.setdefault('cwuri', set([cwuri]))
            yield extentity
    return use_extid_as_cwuri_filter


def drop_extra_values(extentities, schema, import_log):
    """Return a generator yielding the given :class:`ExtEntity` objects after ensuring that their
    attributes and subject-side inlined relations have a single value. When this is not the case,
    a warning is recorded in the import log and only one of the values is kept (chosen
    arbitrarily).

    `schema` is the instance's schema, `import_log` is an instance of a class implementing the
    :class:`SimpleImportLog` interface.

    Example usage:

    .. code-block:: python

        importer = ExtEntitiesImporter(schema, store, import_log)
        importer.import_entities(drop_extra_values(extentities, schema, import_log))
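
    The value kept is whatever ``set.pop()`` returns, i.e. an arbitrary element
    of the set. A minimal standalone sketch of the collapsing step on a plain
    dict, where ``SINGLE_VALUED`` stands in for the schema test (final
    attribute, or inlined relation seen from the subject) and ``ListLog`` for
    the import log; both names are hypothetical:

    .. code-block:: python

        # hypothetical stand-in for the check done through schema.rschema()
        SINGLE_VALUED = set(['first_name'])

        class ListLog(object):
            # hypothetical import log storing (path, message) warnings in a list
            def __init__(self):
                self.warnings = []

            def record_warning(self, msg, path=None):
                self.warnings.append((path, msg))

        def drop_extra_values_sketch(values, extid, import_log):
            for key in list(values):
                if key in SINGLE_VALUED and len(values[key]) > 1:
                    shown = ', '.join(repr(v) for v in values[key])
                    import_log.record_warning(
                        'more than one value for attribute %r, only one will be kept: %s'
                        % (key, shown), path=extid)
                    # set.pop() removes and returns an arbitrary element
                    values[key] = set([values[key].pop()])
            return values

        log = ListLog()
        # example data: 'friend' is multi-valued, so it is left alone
        values = {'first_name': set(['Deborah', 'Debby']),
                  'friend': set(['http://example.org/person/john',
                                 'http://example.org/person/jane'])}
        drop_extra_values_sketch(values, 'http://example.org/person/debby', log)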

    """
    _get_rschema = schema.rschema
    for extentity in extentities:
        entity_dict = extentity.values
        for key, rtype, role in extentity.iter_rdefs():
            rschema = _get_rschema(rtype)
            if (rschema.final or (rschema.inlined and role == 'subject')) \
               and len(entity_dict[key]) > 1:
                values = ', '.join(repr(v) for v in entity_dict[key])
                import_log.record_warning(
                    "more than one value for attribute %r, only one will be kept: %s"
                    % (rtype, values), path=extentity.extid)
                entity_dict[key] = set([entity_dict[key].pop()])
        yield extentity


class RelationMapping(object):
    """Read-only mapping from relation type to set of related (subject, object) eids.

    If `source` is specified, only return relations whose subject and object both belong to
    this source.
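
    The query template is %-formatted with the relation type in
    :meth:`__getitem__`, which is why the source restriction escapes its named
    placeholder as ``%%(s)s``: after that first formatting pass it becomes
    ``%(s)s`` and is then filled from ``self._kwargs`` by ``cnx.execute``.
    Result rows are sequences such as lists, hence the conversion to tuples
    before building a set. A minimal standalone sketch of both steps (the
    relation type ``'friend'`` and the rows are made-up examples):

    .. code-block:: python

        template = 'Any S,O WHERE S %s O'
        template += ', S cw_source SO, O cw_source SO, SO eid %%(s)s'
        # first pass, as in __getitem__: only the relation type is substituted
        rql = template % 'friend'
        # the named placeholder survives for cnx.execute(rql, {'s': ...})
        assert rql.endswith('SO eid %(s)s')
        rows = [[1, 2], [1, 3], [1, 2]]  # made-up rows as returned by execute
        # lists are unhashable, tuples are not; duplicates collapse in the set
        pairs = set(tuple(x) for x in rows)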
    """

    def __init__(self, cnx, source=None):
        self.cnx = cnx
        self._rql_template = 'Any S,O WHERE S %s O'
        self._kwargs = {}
        if source is not None:
            self._rql_template += ', S cw_source SO, O cw_source SO, SO eid %%(s)s'
            self._kwargs['s'] = source.eid

    def __getitem__(self, rtype):
        """Return a set of (subject, object) eids already related by `rtype`"""
        rql = self._rql_template % rtype
        return set(tuple(x) for x in self.cnx.execute(rql, self._kwargs))


class ExtEntity(object):
    """Transitional representation of an entity for use in data importer.
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   136
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   137
    An external entity has the following properties:
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   138
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   139
    * ``extid`` (external id), an identifier for the ext entity,
10461
37644c518705 [doc] Add a tutorial and extend documentation for ExtEntityImporter
Denis Laxalde <denis.laxalde@logilab.fr>
parents: 10460
diff changeset
   140
10460
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   141
    * ``etype`` (entity type), a string which must be the name of one entity type in the schema
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   142
      (eg. ``'Person'``, ``'Animal'``, ...),
10461
37644c518705 [doc] Add a tutorial and extend documentation for ExtEntityImporter
Denis Laxalde <denis.laxalde@logilab.fr>
parents: 10460
diff changeset
   143
10460
    * ``values``, a dictionary whose keys are attribute or relation names from the schema (e.g.
      ``'first_name'``, ``'friend'``), and whose values are *sets*. For
      attributes of type Bytes, byte strings should be put in ``values`` (:meth:`prepare` wraps
      them into ``Binary`` objects).

    For instance:

    .. code-block:: python

        ext_entity.extid = 'http://example.org/person/debby'
        ext_entity.etype = 'Person'
        ext_entity.values = {'first_name': set([u"Deborah", u"Debby"]),
                            'friend': set(['http://example.org/person/john'])}
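
    A minimal sketch of the expected shape of ``values`` that runs without CubicWeb. The
    ``ExtEntity`` class below is a hypothetical stand-in that only mirrors this class's
    constructor, and the ``photo`` and ``knows`` names are made up for illustration. Every value
    is a set, even for single-valued attributes. A key may also be a ``(relation type, role)``
    tuple (see :meth:`iter_rdefs`), for instance to give relations where the entity is the object.

```python
# Hypothetical stand-in mirroring ExtEntity's constructor, so this runs without CubicWeb.
class ExtEntity(object):
    def __init__(self, etype, extid, values=None):
        self.etype = etype
        self.extid = extid
        self.values = {} if values is None else values


debby = ExtEntity('Person', 'http://example.org/person/debby', {
    # single-valued attribute: still given as a one-element set
    'first_name': {u'Deborah'},
    # Bytes attribute ('photo' is a made-up name): the raw byte string, in a set
    'photo': {b'raw image data'},
    # relation where the entity is the subject: plain relation type as key
    'friend': {'http://example.org/person/john'},
    # relation where the entity is the object ('knows' is a made-up name)
    ('knows', 'object'): {'http://example.org/person/alice'},
})
```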
    """

    def __init__(self, etype, extid, values=None):
        self.etype = etype
        self.extid = extid
        if values is None:
            values = {}
        self.values = values
        self._schema = None

    def __repr__(self):
        return '<%s %s %s>' % (self.etype, self.extid, self.values)

    def iter_rdefs(self):
        """Yield a (key, rtype, role) triple for each entry of the `.values` dict, where:

        * `key` is the original key in `.values` (i.e. either the relation type or a 2-tuple
          (relation type, role))

        * `rtype` is a yams relation type, expected to be found in the schema (attribute or
          relation)

        * `role` is the role of the entity in the relation, 'subject' or 'object' (keys that are
          plain relation types always yield 'subject')

        Iteration is done on a copy of the keys, so entries may be inserted into or deleted from
        `.values` during it.
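
        The key normalization above can be sketched as a free function over a plain dict. The
        ``iter_rdefs`` helper below is hypothetical and not part of this module, and the relation
        names are made up. The usage part also shows that deleting entries while iterating is
        safe, because iteration runs over a copy of the keys.

```python
def iter_rdefs(values):
    # Same logic as ExtEntity.iter_rdefs, over a plain dict (hypothetical helper).
    for key in list(values):  # copy of the keys: `values` may change during iteration
        if isinstance(key, tuple):
            rtype, role = key
            assert role in ('subject', 'object'), key
            yield key, rtype, role
        else:
            # a plain relation type means the entity is the subject
            yield key, key, 'subject'


values = {
    'first_name': {u'Deborah'},
    ('friend', 'object'): {'http://example.org/person/alice'},
}
triples = sorted(iter_rdefs(values), key=lambda triple: triple[1])
# drop object relations while iterating: safe thanks to the copy of the keys
for key, rtype, role in iter_rdefs(values):
    if role == 'object':
        del values[key]
```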
        """
        for key in list(self.values):
            if isinstance(key, tuple):
                rtype, role = key
                assert role in ('subject', 'object'), key
                yield key, rtype, role
            else:
                yield key, key, 'subject'

    def prepare(self, schema):
        """Prepare an external entity for later insertion:

        * ensure attributes and inlined relations have a single value
        * turn set([value]) into value and remove keys associated with an empty set
        * remove non inlined relations and return them as a [(e1key, relation, e2key)] list

        Return the list of non inlined relations that may be inserted later, each relation being
        defined by a 3-tuple (subject extid, relation type, object extid).

        The instance's schema is given as argument.

        Take care the importer may call this method several times.
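
        A simplified sketch of the normalization done by this method that runs without a yams
        schema. The hypothetical ``prepare_values`` helper takes a ``single_valued`` set standing
        for the relation types that are attributes or inlined relations (``rschema.final`` or
        ``rschema.inlined``). It leaves out the ``_format`` and ``Binary`` handling. Relation type
        names in the usage part are made up.

```python
def prepare_values(extid, values, single_valued):
    # Hypothetical simplification of ExtEntity.prepare: `single_valued` replaces the
    # schema lookup and lists attributes and inlined relation types.
    deferred = []
    for key in list(values):
        rtype, role = key if isinstance(key, tuple) else (key, 'subject')
        if rtype in single_valued and role == 'subject':
            assert len(values[key]) <= 1, 'more than one value for %s' % rtype
            if values[key]:
                # turn set([value]) into value, keyed by the bare relation type
                values[rtype] = values[key].pop()
                if key != rtype:
                    del values[key]
            else:
                # remove keys associated with an empty set
                del values[key]
        else:
            # non inlined relation: remove it and return it as (subject, rtype, object)
            for target_extid in values.pop(key):
                if role == 'subject':
                    deferred.append((extid, rtype, target_extid))
                else:
                    deferred.append((target_extid, rtype, extid))
    return deferred


debby = 'http://example.org/person/debby'
values = {
    'first_name': {u'Deborah'},
    'nickname': set(),
    ('lives_in', 'subject'): {'http://example.org/city/paris'},
    'friend': {'http://example.org/person/john'},
    ('friend', 'object'): {'http://example.org/person/alice'},
}
deferred = prepare_values(debby, values, single_valued={'first_name', 'nickname', 'lives_in'})
```

        Inlined relations such as ``lives_in`` keep their target extid as value at this stage.
        Only non inlined relations are returned for later insertion.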
        """
        assert self._schema is None, 'prepare() has already been called for %s' % self
        self._schema = schema
        eschema = schema.eschema(self.etype)
        deferred = []
        entity_dict = self.values
        for key, rtype, role in self.iter_rdefs():
            rschema = schema.rschema(rtype)
            if rschema.final or (rschema.inlined and role == 'subject'):
                assert len(entity_dict[key]) <= 1, \
                    "more than one value for %s: %s (%s)" % (rtype, entity_dict[key], self.extid)
                if entity_dict[key]:
                    entity_dict[rtype] = entity_dict[key].pop()
                    if key != rtype:
                        del entity_dict[key]
                    if (rschema.final and eschema.has_metadata(rtype, 'format')
                            and not rtype + '_format' in entity_dict):
                        entity_dict[rtype + '_format'] = u'text/plain'
                    if (rschema.final
                            and eschema.rdef(rtype).object.type == 'Bytes'
                            and not isinstance(entity_dict[rtype], Binary)):
                        entity_dict[rtype] = Binary(entity_dict[rtype])
                else:
                    del entity_dict[key]
            else:
                for target_extid in entity_dict.pop(key):
                    if role == 'subject':
                        deferred.append((self.extid, rtype, target_extid))
                    else:
                        deferred.append((target_extid, rtype, self.extid))
        return deferred

    def is_ready(self, extid2eid):
        """Return True if the ext entity is ready, i.e. if all the extids used in its inlined
        relations are already known in `extid2eid`. In that case, the extids of inlined relations
        are replaced by the corresponding eids in `.values`.
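
        A sketch of the same two-pass check as a free function over already prepared values. The
        ``is_ready`` helper and its ``inlined`` argument (standing for the schema lookup) are
        hypothetical, as are the relation name and the eid used below.

```python
def is_ready(values, inlined, extid2eid):
    # Hypothetical simplification of ExtEntity.is_ready on already prepared values:
    # `inlined` replaces the schema lookup and lists the inlined relation types.
    for rtype in values:
        if rtype in inlined and values[rtype] not in extid2eid:
            return False  # a target entity has not been imported yet
    # every target is known: substitute extids by eids, in a second pass so that
    # an entity which is not ready is left untouched
    for rtype in values:
        if rtype in inlined:
            values[rtype] = extid2eid[values[rtype]]
    return True


values = {'first_name': u'Deborah', 'lives_in': 'http://example.org/city/paris'}
extid2eid = {}
ready_before = is_ready(values, {'lives_in'}, extid2eid)
extid2eid['http://example.org/city/paris'] = 1234  # made-up eid, as if the city was just created
ready_after = is_ready(values, {'lives_in'}, extid2eid)
```

        Checking all inlined targets before substituting any of them keeps an entity that is not
        ready unchanged, so it can be checked again later.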
        """
        assert self._schema, 'prepare() method should be called first on %s' % self
        # as .prepare has been called, we know that .values only contains subject relation *type* as
        # key (no more (rtype, role) tuple)
        schema = self._schema
        entity_dict = self.values
        for rtype in entity_dict:
            rschema = schema.rschema(rtype)
            if not rschema.final:
                # .prepare() should drop other cases from the entity dict
                assert rschema.inlined
                if entity_dict[rtype] not in extid2eid:
                    return False
        # entity is ready: replace the extids of inlined relations by their eids
        for rtype in entity_dict:
            rschema = schema.rschema(rtype)
            if rschema.inlined:
                entity_dict[rtype] = extid2eid[entity_dict[rtype]]
        return True

    def why_not_ready(self, extid2eid):
        """Return some text explaining why this ext entity is not ready.

        The first inlined relation whose target extid is missing from `extid2eid` is reported. An
        AssertionError is raised if the entity turns out to be ready.
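
        A sketch of the same check as a free function over already prepared values. The
        ``why_not_ready`` helper and its ``inlined`` argument (standing for the schema lookup) are
        hypothetical, and the relation name is made up. The usage part shows the message returned
        for a missing inlined target.

```python
def why_not_ready(values, inlined, extid2eid):
    # Hypothetical simplification of ExtEntity.why_not_ready on prepared values:
    # report the first inlined relation whose target extid has no eid yet.
    for rtype in values:
        if rtype in inlined and values[rtype] not in extid2eid:
            return u'inlined relation %s is not present (%s)' % (rtype, values[rtype])
    raise AssertionError('this external entity seems actually ready for insertion')


reason = why_not_ready({'lives_in': 'http://example.org/city/paris'}, {'lives_in'}, {})
```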
        """
        assert self._schema, 'prepare() method should be called first on %s' % self
        # as .prepare has been called, we know that .values only contains subject relation *type* as
        # key (no more (rtype, role) tuple)
        schema = self._schema
        entity_dict = self.values
        for rtype in entity_dict:
            rschema = schema.rschema(rtype)
            if not rschema.final:
                if entity_dict[rtype] not in extid2eid:
                    return u'inlined relation %s is not present (%s)' % (rtype, entity_dict[rtype])
        raise AssertionError('this external entity seems actually ready for insertion')


class ExtEntitiesImporter(object):
    """This class is responsible for importing external entities, that is instances of
    :class:`ExtEntity`, into CubicWeb entities.

    :param schema: the CubicWeb instance's schema

    :param store: a CubicWeb `Store`

    :param extid2eid: optional {extid: eid} dictionary giving information on existing entities. It
        will be completed during import. You may want to use :func:`cwuri2eid` to build it.

    :param existing_relations: optional {rtype: set((subj eid, obj eid))} mapping giving information
        on existing relations of a given type. You may want to use :class:`RelationMapping` to build
        it.

    :param etypes_order_hint: optional ordered iterable on entity types, giving a hint on the
        order in which their import should be attempted

    :param import_log: optional object implementing the :class:`SimpleImportLog` interface to
        record events occurring during the import

    :param raise_on_error: optional boolean flag, defaulting to False, indicating whether errors
        should be raised or logged. You usually want them to be raised during tests but to be
        logged in production.

    Instances of this class are meant to import external entities through :meth:`import_entities`,
    which handles a stream of :class:`ExtEntity` instances. Arbitrary filters may thus be plugged
    into the external entities stream.

    .. automethod:: import_entities
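
    Filters can be written as generators wrapping the stream. In this minimal sketch, the two
    filters and the ``FakeExtEntity`` stand-in are hypothetical and not part of CubicWeb. A real
    stream would yield :class:`ExtEntity` instances and be passed to :meth:`import_entities`.

```python
from collections import namedtuple

# Hypothetical lightweight stand-in for ExtEntity, only for this sketch.
FakeExtEntity = namedtuple('FakeExtEntity', 'etype extid values')


def drop_etypes(ext_entities, unwanted_etypes):
    # Filter: lazily skip external entities of unwanted entity types.
    for ext_entity in ext_entities:
        if ext_entity.etype not in unwanted_etypes:
            yield ext_entity


def drop_duplicate_extids(ext_entities):
    # Filter: only keep the first external entity seen for a given extid.
    seen = set()
    for ext_entity in ext_entities:
        if ext_entity.extid not in seen:
            seen.add(ext_entity.extid)
            yield ext_entity


stream = iter([
    FakeExtEntity('Person', 'http://example.org/person/debby', {}),
    FakeExtEntity('Draft', 'http://example.org/draft/1', {}),
    FakeExtEntity('Person', 'http://example.org/person/debby', {}),
])
filtered = drop_duplicate_extids(drop_etypes(stream, {'Draft'}))
# with a real importer, the filtered generator would be passed as-is:
#     importer.import_entities(filtered)
kept = [ext_entity.extid for ext_entity in filtered]
```

    Each filter is a lazy generator, so the chain consumes the source stream one entity at a time.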
10460
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   305
    """
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   306
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   307
    def __init__(self, schema, store, extid2eid=None, existing_relations=None,
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   308
                 etypes_order_hint=(), import_log=None, raise_on_error=False):
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   309
        self.schema = schema
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   310
        self.store = store
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   311
        self.extid2eid = extid2eid if extid2eid is not None else {}
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   312
        self.existing_relations = (existing_relations if existing_relations is not None
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   313
                                   else defaultdict(set))
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   314
        self.etypes_order_hint = etypes_order_hint
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   315
        if import_log is None:
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   316
            import_log = SimpleImportLog('<unspecified>')
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   317
        self.import_log = import_log
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   318
        self.raise_on_error = raise_on_error
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   319
        # set of created/updated eids
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   320
        self.created = set()
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   321
        self.updated = set()

    def import_entities(self, ext_entities):
        """Import the given stream (usually a generator) of external entities (:class:`ExtEntity`)."""
        # {etype: [ExtEntity]} of entities waiting in the import queue
        queue = {}
        # order external entities, then create/update them
        deferred = self._import_entities(ext_entities, queue)
        # create deferred relations that don't exist already
        missing_relations = self.prepare_insert_deferred_relations(deferred)
        self._warn_about_missing_work(queue, missing_relations)

    def _import_entities(self, ext_entities, queue):
        extid2eid = self.extid2eid
        deferred = {}  # non inlined relations that may be deferred
        self.import_log.record_debug('importing entities')
        for ext_entity in self.iter_ext_entities(ext_entities, deferred, queue):
            try:
                eid = extid2eid[ext_entity.extid]
            except KeyError:
                self.prepare_insert_entity(ext_entity)
            else:
                if ext_entity.values:
                    self.prepare_update_entity(ext_entity, eid)
        return deferred
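The insert-or-update decision above hinges on a single `extid2eid` lookup: an unknown external id means "create", a known one with values means "update", and a known one without values is left alone. The sketch below reproduces that dispatch outside CubicWeb. `ExtEntity` (as a namedtuple with a plain `values` dict), `RecordingStore` and `dispatch` are hypothetical stand-ins, not the real classes or store API.

```python
from collections import namedtuple

# Hypothetical stand-in for ExtEntity: only the attributes used below;
# ``values`` is assumed to be an already-prepared {attribute: value} dict.
ExtEntity = namedtuple('ExtEntity', 'etype extid values')


class RecordingStore(object):
    """Hypothetical store that records calls and hands out increasing eids."""

    def __init__(self):
        self.calls = []
        self._next_eid = 1

    def prepare_insert_entity(self, etype, **values):
        eid = self._next_eid
        self._next_eid += 1
        self.calls.append(('insert', etype, eid, values))
        return eid

    def prepare_update_entity(self, etype, eid, **values):
        self.calls.append(('update', etype, eid, values))


def dispatch(ext_entities, extid2eid, store):
    """Insert entities with an unknown extid, update known ones that carry values."""
    created, updated = set(), set()
    for ext_entity in ext_entities:
        try:
            eid = extid2eid[ext_entity.extid]
        except KeyError:
            eid = store.prepare_insert_entity(ext_entity.etype, **ext_entity.values)
            extid2eid[ext_entity.extid] = eid  # later lookups now see this entity
            created.add(eid)
        else:
            if ext_entity.values:  # known entity with nothing to change: skip it
                store.prepare_update_entity(ext_entity.etype, eid, **ext_entity.values)
                updated.add(eid)
    return created, updated


store = RecordingStore()
extid2eid = {'known': 42}  # e.g. an entity imported by a previous run
created, updated = dispatch([
    ExtEntity('Person', 'new', {'name': u'Alice'}),
    ExtEntity('Person', 'known', {'name': u'Bob'}),
    ExtEntity('Person', 'known', {}),
], extid2eid, store)
```

Registering the new eid in `extid2eid` immediately matters: it is what lets later entities in the same stream become "ready" when they reference this one.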

    def iter_ext_entities(self, ext_entities, deferred, queue):
        """Yield external entities in an order which attempts to satisfy
        schema constraints (inlined / cardinality) and to optimize the import.
        """
        schema = self.schema
        extid2eid = self.extid2eid
        order_hint = list(self.etypes_order_hint)
        for ext_entity in ext_entities:
            # check data in the transitional representation and prepare it for
            # later insertion in the database
            for subject_uri, rtype, object_uri in ext_entity.prepare(schema):
                deferred.setdefault(rtype, set()).add((subject_uri, object_uri))
            if not ext_entity.is_ready(extid2eid):
                queue.setdefault(ext_entity.etype, []).append(ext_entity)
                continue
            yield ext_entity
            if not queue:
                continue
            # check whether some queued entities are now ready. Yielding one may
            # unlock others, so restart the search until a full pass yields nothing
            for etype in queue:
                if etype not in order_hint:
                    order_hint.append(etype)
            new = True
            while new:
                new = False
                for etype in order_hint:
                    if etype in queue:
                        new_queue = []
                        for ext_entity in queue[etype]:
                            if ext_entity.is_ready(extid2eid):
                                yield ext_entity
                                # may unlock entity previously handled within this loop
                                new = True
                            else:
                                new_queue.append(ext_entity)
                        if new_queue:
                            queue[etype][:] = new_queue
                        else:
                            del queue[etype]
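The generator above relies on its consumer registering each yielded entity's eid before resuming it; that is what can turn a queued entity "ready". The following is a minimal, dependency-free sketch of the same queue-and-rescan strategy. `Ent`, its `needs` field and `iter_ready` are hypothetical simplifications: here readiness just means "every extid in `needs` already has an eid", whereas the real `ExtEntity.is_ready` inspects inlined relations.

```python
from collections import namedtuple

# Hypothetical simplified entity: ``needs`` holds the extids that must already
# have an eid (e.g. targets of inlined relations) before it can be created.
Ent = namedtuple('Ent', 'etype extid needs')


def iter_ready(entities, extid2eid, queue, etypes_order_hint=()):
    """Yield entities once their dependencies have eids, queueing the others.

    The caller must register each yielded entity in ``extid2eid`` before
    resuming the generator; entities left in ``queue`` afterwards could not
    be created (cycle or missing data).
    """
    order_hint = list(etypes_order_hint)

    def is_ready(ent):
        return all(extid in extid2eid for extid in ent.needs)

    for ent in entities:
        if not is_ready(ent):
            queue.setdefault(ent.etype, []).append(ent)
            continue
        yield ent
        if not queue:
            continue
        for etype in queue:
            if etype not in order_hint:
                order_hint.append(etype)
        new = True
        while new:  # rescan until a full pass over the queue yields nothing
            new = False
            for etype in order_hint:
                if etype in queue:
                    still_waiting = []
                    for queued in queue[etype]:
                        if is_ready(queued):
                            yield queued
                            new = True  # may unlock entities checked earlier
                        else:
                            still_waiting.append(queued)
                    if still_waiting:
                        queue[etype][:] = still_waiting
                    else:
                        del queue[etype]


extid2eid, queue, order = {}, {}, []
entities = [
    Ent('Book', 'b1', {'a1'}),    # arrives before its author
    Ent('Author', 'a1', set()),
    Ent('Book', 'b2', {'a2'}),    # author 'a2' never arrives
]
for eid, ent in enumerate(iter_ready(entities, extid2eid, queue), 1):
    extid2eid[ent.extid] = eid    # simulate the insertion done by the caller
    order.append(ent.extid)
```

Here `b1` waits in the queue until `a1` is yielded and registered, while `b2` is still queued at the end, which is the situation `_warn_about_missing_work` reports as a possible cycle or missing data.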

    def prepare_insert_entity(self, ext_entity):
        """Call the store to prepare insertion of the given external entity"""
        eid = self.store.prepare_insert_entity(ext_entity.etype, **ext_entity.values)
        self.extid2eid[ext_entity.extid] = eid
        self.created.add(eid)
        return eid

    def prepare_update_entity(self, ext_entity, eid):
        """Call the store to prepare update of the given external entity"""
        self.store.prepare_update_entity(ext_entity.etype, eid, **ext_entity.values)
        self.updated.add(eid)

    def prepare_insert_deferred_relations(self, deferred):
        """Call the store to prepare insertion of deferred relations (those not handled while
        inserting/updating entities). Return a list of `(subject extid, rtype, object extid,
        KeyError)` tuples for relations that could not be inserted because an endpoint entity
        doesn't exist yet.
        """
        prepare_insert_relation = self.store.prepare_insert_relation
        rschema = self.schema.rschema
        extid2eid = self.extid2eid
        missing_relations = []
        for rtype, relations in deferred.items():
            self.import_log.record_debug('importing %s %s relations' % (len(relations), rtype))
            symmetric = rschema(rtype).symmetric
            existing = self.existing_relations[rtype]
            for subject_uri, object_uri in relations:
                try:
                    subject_eid = extid2eid[subject_uri]
                    object_eid = extid2eid[object_uri]
                except KeyError as exc:
                    missing_relations.append((subject_uri, rtype, object_uri, exc))
                    continue
                if (subject_eid, object_eid) not in existing:
                    prepare_insert_relation(subject_eid, rtype, object_eid)
                    existing.add((subject_eid, object_eid))
                    if symmetric:
                        existing.add((object_eid, subject_eid))
        return missing_relations
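The relation step translates external ids to eids, skips pairs already present, and, for symmetric relation types, records both directions so that `(b, a)` is not inserted after `(a, b)`. A standalone sketch of that bookkeeping follows; `plan_relations` and its `symmetric_rtypes` argument are hypothetical (the real code asks the schema via `rschema(rtype).symmetric` and calls the store instead of returning a list). Lists are used for `deferred` so the example's order is deterministic; the importer itself collects sets.

```python
def plan_relations(deferred, extid2eid, existing, symmetric_rtypes=()):
    """Return (relations to insert, missing relations) for deferred relations.

    ``deferred`` maps rtype -> iterable of (subject extid, object extid);
    ``existing`` maps rtype -> set of (subject eid, object eid) already known.
    """
    to_insert, missing = [], []
    for rtype, relations in deferred.items():
        seen = existing.setdefault(rtype, set())
        symmetric = rtype in symmetric_rtypes
        for subject_uri, object_uri in relations:
            try:
                subject_eid = extid2eid[subject_uri]
                object_eid = extid2eid[object_uri]
            except KeyError as exc:  # exc.args[0] is the unknown extid
                missing.append((subject_uri, rtype, object_uri, exc))
                continue
            if (subject_eid, object_eid) not in seen:
                to_insert.append((subject_eid, rtype, object_eid))
                seen.add((subject_eid, object_eid))
                if symmetric:  # (a, b) also covers (b, a)
                    seen.add((object_eid, subject_eid))
    return to_insert, missing


to_insert, missing = plan_relations(
    {'knows': [('alice', 'bob'), ('bob', 'alice'), ('alice', 'carol')]},
    {'alice': 1, 'bob': 2},
    existing={},
    symmetric_rtypes={'knows'},
)
```

Only one `knows` relation is planned for the Alice/Bob pair, and the relation to the never-imported `carol` ends up in `missing` with the `KeyError` naming the unknown id.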

    def _warn_about_missing_work(self, queue, missing_relations):
        error = self.import_log.record_error
        if queue:
            msgs = ["can't create some entities, is there some cycle or "
                    "missing data?"]
            for ext_entities in queue.values():
                for ext_entity in ext_entities:
                    msg = '{}: {}'.format(ext_entity, ext_entity.why_not_ready(self.extid2eid))
                    msgs.append(msg)
            # explicit loop: map() is lazy on Python 3 and would record nothing
            for msg in msgs:
                error(msg)
            if self.raise_on_error:
                raise Exception('\n'.join(msgs))
        if missing_relations:
            msgs = ["can't create some relations, is there missing data?"]
            for subject_uri, rtype, object_uri, exc in missing_relations:
                msgs.append("Could not find %s when trying to insert (%s, %s, %s)"
                            % (exc, subject_uri, rtype, object_uri))
            for msg in msgs:
                error(msg)
            if self.raise_on_error:
                raise Exception('\n'.join(msgs))
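Recording each message needs an explicit loop rather than `map(error, msgs)`: on Python 3, `map()` returns a lazy iterator, so a call made only for its side effects never invokes the function unless the iterator is consumed (on Python 2 it ran eagerly). The snippet below demonstrates the difference; `error` is a hypothetical stand-in for an import log's `record_error`.

```python
recorded = []


def error(msg):
    """Stand-in for an import log's record_error: remember the message."""
    recorded.append(msg)


msgs = ["can't create some relations, is there missing data?",
        "Could not find 'carol' when trying to insert (alice, knows, carol)"]

map(error, msgs)              # Python 3: builds a lazy iterator; error() never runs
recorded_after_map = list(recorded)

for msg in msgs:              # explicit loop: error() runs once per message
    error(msg)
```

`recorded_after_map` stays empty while the loop records both messages, which is why the loop form is the safe one for side-effecting calls.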


class SimpleImportLog(object):
    """Fake CWDataImport log using a simple text format.

    Useful to display logs in the UI instead of storing them to the
    database.
    """

    def __init__(self, filename):
        self.logs = []
        self.filename = filename

    def record_debug(self, msg, path=None, line=None):
        self._log(logging.DEBUG, msg, path, line)

    def record_info(self, msg, path=None, line=None):
        self._log(logging.INFO, msg, path, line)

    def record_warning(self, msg, path=None, line=None):
        self._log(logging.WARNING, msg, path, line)

    def record_error(self, msg, path=None, line=None):
        self._log(logging.ERROR, msg, path, line)

    def record_fatal(self, msg, path=None, line=None):
        self._log(logging.FATAL, msg, path, line)

    def _log(self, severity, msg, path, line):
        encodedmsg = u'%s\t%s\t%s\t%s' % (severity, self.filename,
                                          line or u'', msg)
        self.logs.append(encodedmsg)
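Each entry of a text log built this way is a tab-separated line of severity (the numeric `logging` level), filename, line and message; `path` is accepted but not written out, and a falsy `line` becomes an empty field. The sketch below copies that formatting into a hypothetical `TabLog` class so it runs without CubicWeb, then splits an entry back into its fields.

```python
import logging


class TabLog(object):
    """Hypothetical standalone copy of the SimpleImportLog formatting logic."""

    def __init__(self, filename):
        self.logs = []
        self.filename = filename

    def record_error(self, msg, path=None, line=None):
        self._log(logging.ERROR, msg, path, line)

    def _log(self, severity, msg, path, line):
        # ``path`` is accepted but not written out; a falsy line becomes ''
        self.logs.append(u'%s\t%s\t%s\t%s' % (severity, self.filename, line or u'', msg))


log = TabLog('people.csv')
log.record_error('missing name', line=12)
log.record_error('bad date')

# Each entry splits back into four tab-separated fields.
severity, filename, line, msg = log.logs[0].split('\t')
level_name = logging.getLevelName(int(severity))
```

Splitting on tabs only round-trips if messages themselves contain no tab characters.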


class HTMLImportLog(SimpleImportLog):
    """Fake CWDataImport log using a simple HTML format."""
    def __init__(self, filename):
        super(HTMLImportLog, self).__init__(xml_escape(filename))

    def _log(self, severity, msg, path, line):
        encodedmsg = u'%s\t%s\t%s\t%s<br/>' % (severity, self.filename,
                                               line or u'', xml_escape(msg))
        self.logs.append(encodedmsg)
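The HTML variant escapes the filename once at construction time and each message as it is logged, then terminates every entry with `<br/>`. The sketch below mirrors that split; it uses the standard library's `html.escape` as a stand-in for the module's `xml_escape` helper (their exact handling of quotes may differ), and `HTMLTabLog` is a hypothetical class, not part of CubicWeb.

```python
import logging
from html import escape  # stand-in for the module's xml_escape helper


class HTMLTabLog(object):
    """Hypothetical standalone sketch of HTMLImportLog's formatting."""

    def __init__(self, filename):
        self.logs = []
        self.filename = escape(filename)  # escaped once, as in HTMLImportLog.__init__

    def record_error(self, msg, path=None, line=None):
        self._log(logging.ERROR, msg, path, line)

    def _log(self, severity, msg, path, line):
        # only the message is escaped here; the filename already was
        self.logs.append(u'%s\t%s\t%s\t%s<br/>' % (severity, self.filename,
                                                    line or u'', escape(msg)))


log = HTMLTabLog('a&b.csv')
log.record_error('<b>bad</b> value', line=3)
```

Escaping the filename in the constructor avoids re-escaping it on every entry, while user-supplied message text is always escaped before it reaches the page.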