cubicweb/dataimport/importer.py
author Sylvain Thénault <sylvain.thenault@logilab.fr>
Wed, 19 Apr 2017 09:05:10 +0200
changeset 12173 d13fc09301bd
parent 12167 1ca864397424
child 12625 ba5231e1aa45
permissions -rw-r--r--
[dataimport] Add explanation about why external entities can't be inserted

By default, once the import has been processed, the importer indicates which external entities can't be inserted because they are missing dependency data (other entities used in inlined or mandatory relations). In that case it usually helps to know which extids / relations are missing, so add this information to the log.
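
The sketch below roughly illustrates the kind of explanation this changeset adds to the log. It is a hypothetical `explain_missing` helper, not the importer's implementation. For each external entity left unimported, it lists the dependency relations whose target extids are still unknown. The caller passes in which relation types count as dependencies (inlined or mandatory ones). `PendingEntity` and the relation names `works_for` and `born_in` are made up for the example.

```python
from collections import namedtuple

# Made-up minimal stand-in for ExtEntity: only the fields this sketch reads.
PendingEntity = namedtuple('PendingEntity', ['etype', 'extid', 'values'])


def explain_missing(pending, extid2eid, dependency_rtypes):
    """Return {extid: [(rtype, [missing extids])]} for entities left unimported.

    `pending` holds the ext entities that could not be inserted, `extid2eid`
    maps already imported extids to eids, and `dependency_rtypes` names the
    relations (e.g. inlined or mandatory ones) that must point to existing
    entities. Relation values are assumed to be extids of related entities.
    """
    report = {}
    for extentity in pending:
        missing = []
        for rtype in sorted(dependency_rtypes):
            # Keep only the targets that were never imported.
            unknown = sorted(value for value in extentity.values.get(rtype, ())
                             if value not in extid2eid)
            if unknown:
                missing.append((rtype, unknown))
        if missing:
            report[extentity.extid] = missing
    return report


extid2eid = {b'city/paris': 1}
pending = [
    PendingEntity('Person', b'person/debby',
                  {'works_for': {b'org/unknown'}, 'born_in': {b'city/paris'}}),
]
report = explain_missing(pending, extid2eid, {'works_for', 'born_in'})
print(report)  # {b'person/debby': [('works_for', [b'org/unknown'])]}
```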
# copyright 2015-2016 LOGILAB S.A. (Paris, FRANCE), all rights reserved.
# contact http://www.logilab.fr -- mailto:contact@logilab.fr
#
# This program is free software: you can redistribute it and/or modify it under
# the terms of the GNU Lesser General Public License as published by the Free
# Software Foundation, either version 2.1 of the License, or (at your option)
# any later version.
#
# This program is distributed in the hope that it will be useful, but WITHOUT
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
# FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
# details.
#
# You should have received a copy of the GNU Lesser General Public License along
# with this program. If not, see <http://www.gnu.org/licenses/>.
"""Data import of external entities.

Main entry points:

.. autoclass:: ExtEntitiesImporter
.. autoclass:: ExtEntity

Utilities:

.. autofunction:: cwuri2eid
.. autoclass:: RelationMapping
.. autofunction:: cubicweb.dataimport.importer.use_extid_as_cwuri
.. autofunction:: cubicweb.dataimport.importer.drop_extra_values
"""

from collections import defaultdict
import logging

from logilab.mtconverter import xml_escape

from cubicweb import Binary


def cwuri2eid(cnx, etypes, source_eid=None):
    """Return a dictionary mapping cwuri to eid for entities of the given entity types and/or
    source.

    `etypes` is a sequence of entity type names (it may be empty or None when `source_eid` is
    given) and `source_eid` is the eid of a source; at least one of them must be specified.
    """
    assert source_eid or etypes, 'neither entity types nor source specified'
    rql = 'Any U, X WHERE X cwuri U'
    args = {}
    if etypes and len(etypes) == 1:
        rql += ', X is %s' % etypes[0]
    elif etypes:
        rql += ', X is IN (%s)' % ','.join(etypes)
    if source_eid is not None:
        rql += ', X cw_source S, S eid %(s)s'
        args['s'] = source_eid
    return dict(cnx.execute(rql, args))
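
To see what `cwuri2eid` sends to the database, you can run the query construction on its own. The sketch below uses a hypothetical helper, `build_cwuri2eid_rql`, that mirrors the body above, including the guard for `etypes=None`. It also uses a made-up `FakeConnection` whose `execute` returns canned `(cwuri, eid)` rows. Neither name is part of CubicWeb.

```python
def build_cwuri2eid_rql(etypes, source_eid=None):
    """Hypothetical helper mirroring the query construction of cwuri2eid."""
    # Same precondition as cwuri2eid: at least one filter must be given.
    if not (source_eid or etypes):
        raise ValueError('neither entity types nor source specified')
    rql = 'Any U, X WHERE X cwuri U'
    args = {}
    if etypes and len(etypes) == 1:
        rql += ', X is %s' % etypes[0]
    elif etypes:
        rql += ', X is IN (%s)' % ','.join(etypes)
    if source_eid is not None:
        # The source eid is passed as a query argument, not interpolated.
        rql += ', X cw_source S, S eid %(s)s'
        args['s'] = source_eid
    return rql, args


class FakeConnection(object):
    """Made-up stand-in for a connection: returns canned (cwuri, eid) rows."""

    def __init__(self, rows):
        self.rows = rows

    def execute(self, rql, args=None):
        return self.rows


rql, args = build_cwuri2eid_rql(['Person', 'Animal'], source_eid=42)
print(rql)   # Any U, X WHERE X cwuri U, X is IN (Person,Animal), X cw_source S, S eid %(s)s
print(args)  # {'s': 42}

cnx = FakeConnection([('http://example.org/person/debby', 1234)])
print(dict(cnx.execute(*build_cwuri2eid_rql(['Person']))))
# {'http://example.org/person/debby': 1234}
```

Each row is a `(cwuri, eid)` pair, so `dict()` turns the result set directly into the mapping. If several entities shared a cwuri, the last row would win.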


def use_extid_as_cwuri(extid2eid):
    """Return a filter function that takes an iterable of :class:`ExtEntity` objects and yields
    them, setting `cwuri` from the entity's extid (decoded as UTF-8) when the entity does not
    exist yet and has no `cwuri` defined.

    `extid2eid` is an extid to eid dictionary coming from an
    :class:`ExtEntitiesImporter` instance.

    Example usage:

    .. code-block:: python

        importer = ExtEntitiesImporter(schema, store, import_log=import_log)
        set_cwuri = use_extid_as_cwuri(importer.extid2eid)
        importer.import_entities(set_cwuri(extentities))
    """
    def use_extid_as_cwuri_filter(extentities):
        for extentity in extentities:
            if extentity.extid not in extid2eid:
                extentity.values.setdefault('cwuri', set([extentity.extid.decode('utf-8')]))
            yield extentity
    return use_extid_as_cwuri_filter
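
You can try the filter without an instance. The sketch below copies the logic of `use_extid_as_cwuri` and runs it on `StubExtEntity`. This is a made-up minimal stand-in for `ExtEntity` that holds only `etype`, `extid` and `values`. Extids are byte strings, as the `.decode('utf-8')` call requires.

```python
class StubExtEntity(object):
    """Made-up minimal stand-in for ExtEntity."""

    def __init__(self, etype, extid, values=None):
        self.etype = etype
        self.extid = extid  # a byte string
        self.values = values if values is not None else {}


def use_extid_as_cwuri(extid2eid):
    # Same logic as the function above.
    def use_extid_as_cwuri_filter(extentities):
        for extentity in extentities:
            if extentity.extid not in extid2eid:
                # Only new entities get a cwuri, and only if none was given.
                extentity.values.setdefault('cwuri', set([extentity.extid.decode('utf-8')]))
            yield extentity
    return use_extid_as_cwuri_filter


# john was imported earlier, so he already has an eid.
extid2eid = {b'http://example.org/person/john': 1234}
entities = [
    StubExtEntity('Person', b'http://example.org/person/debby'),
    StubExtEntity('Person', b'http://example.org/person/john'),
    StubExtEntity('Person', b'urn:example:alice',
                  {'cwuri': set([u'http://example.org/person/alice'])}),
]
set_cwuri = use_extid_as_cwuri(extid2eid)
result = list(set_cwuri(entities))
for extentity in result:
    print(extentity.extid, extentity.values)
```

The filter is a generator, so `cwuri` is only set as the importer pulls entities from the stream. Entities whose extid is already in `extid2eid` pass through untouched.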


def drop_extra_values(extentities, schema, import_log):
    """Return a generator of :class:`ExtEntity` objects ensuring that their attributes and
    inlined relations (where the entity is the subject) have a single value. When that's not the
    case, a warning is recorded in the import log and only one of the values, chosen
    arbitrarily, is kept.

    `schema` is the instance's schema, `import_log` is an instance of a class implementing the
    :class:`SimpleImportLog` interface.

    Example usage:

    .. code-block:: python

        importer = ExtEntitiesImporter(schema, store, import_log=import_log)
        importer.import_entities(drop_extra_values(extentities, schema, import_log))

    """
    _get_rschema = schema.rschema
    for extentity in extentities:
        entity_dict = extentity.values
        for key, rtype, role in extentity.iter_rdefs():
            rschema = _get_rschema(rtype)
            if (rschema.final or (rschema.inlined and role == 'subject')) \
               and len(entity_dict[key]) > 1:
                values = ', '.join(repr(v) for v in entity_dict[key])
                import_log.record_warning(
                    "more than one value for attribute %r, only one will be kept: %s"
                    % (rtype, values), path=extentity.extid)
                entity_dict[key] = set([entity_dict[key].pop()])
        yield extentity
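
The sketch below runs the same logic against made-up stand-ins, so you can see which values get trimmed:

- `StubSchema` returns `RSchema` tuples carrying only the `final` and `inlined` flags.
- `StubImportLog` collects warnings.
- `StubExtEntity.iter_rdefs` assumes that plain keys are subject relations, while `(rtype, role)` tuple keys carry their role explicitly. This is an assumption for the sketch; the real CubicWeb classes do more.

```python
from collections import namedtuple

# Made-up stand-ins: only what drop_extra_values touches.
RSchema = namedtuple('RSchema', ['final', 'inlined'])


class StubSchema(object):
    def __init__(self, rschemas):
        self._rschemas = rschemas

    def rschema(self, rtype):
        return self._rschemas[rtype]


class StubImportLog(object):
    def __init__(self):
        self.warnings = []

    def record_warning(self, msg, path=None):
        self.warnings.append((msg, path))


class StubExtEntity(object):
    def __init__(self, etype, extid, values):
        self.etype = etype
        self.extid = extid
        self.values = values

    def iter_rdefs(self):
        # Assumption: plain keys are subject relations, (rtype, role) tuples
        # carry their role explicitly. Iterate on a copy of the keys.
        for key in list(self.values):
            if isinstance(key, tuple):
                rtype, role = key
            else:
                rtype, role = key, 'subject'
            yield key, rtype, role


def drop_extra_values(extentities, schema, import_log):
    # Same logic as the function above.
    _get_rschema = schema.rschema
    for extentity in extentities:
        entity_dict = extentity.values
        for key, rtype, role in extentity.iter_rdefs():
            rschema = _get_rschema(rtype)
            if (rschema.final or (rschema.inlined and role == 'subject')) \
               and len(entity_dict[key]) > 1:
                values = ', '.join(repr(v) for v in entity_dict[key])
                import_log.record_warning(
                    "more than one value for attribute %r, only one will be kept: %s"
                    % (rtype, values), path=extentity.extid)
                entity_dict[key] = set([entity_dict[key].pop()])
        yield extentity


schema = StubSchema({
    'first_name': RSchema(final=True, inlined=False),
    'works_for': RSchema(final=False, inlined=True),
    'manager': RSchema(final=False, inlined=True),
    'friend': RSchema(final=False, inlined=False),
})
log = StubImportLog()
debby = StubExtEntity('Person', b'http://example.org/person/debby', {
    'first_name': {u'Deborah', u'Debby'},                    # attribute: trimmed
    'works_for': {b'org/logilab', b'org/other'},             # inlined, subject: trimmed
    ('manager', 'object'): {b'person/john', b'person/jane'},  # inlined, object: kept
    'friend': {b'person/john', b'person/jane'},              # plain relation: kept
})
result = list(drop_extra_values([debby], schema, log))
for msg, path in log.warnings:
    print(path, msg)
```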


class RelationMapping(object):
    """Read-only mapping from relation type to set of related (subject, object) eids.

    If `source` is specified, only return relations whose subject and object both come from
    this source.
    """

    def __init__(self, cnx, source=None):
        self.cnx = cnx
        self._rql_template = 'Any S,O WHERE S %s O'
        self._kwargs = {}
        if source is not None:
            self._rql_template += ', S cw_source SO, O cw_source SO, SO eid %%(s)s'
            self._kwargs['s'] = source.eid

    def __getitem__(self, rtype):
        """Return a set of (subject, object) eids already related by `rtype`"""
        rql = self._rql_template % rtype
        return set(tuple(x) for x in self.cnx.execute(rql, self._kwargs))
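
The sketch below checks the query template with a made-up `RecordingConnection`, which records each query and returns canned rows, and a `StubSource` that only carries an `eid`. In particular, it shows that the doubled `%%` survives the later `% rtype` formatting as a `%(s)s` placeholder. The class body is copied from above.

```python
class RecordingConnection(object):
    """Made-up stand-in for a connection: records queries, returns canned rows."""

    def __init__(self, rows):
        self.rows = rows
        self.executed = []

    def execute(self, rql, kwargs=None):
        self.executed.append((rql, kwargs))
        return self.rows


class StubSource(object):
    """Made-up stand-in for a source entity: only its eid is used."""

    def __init__(self, eid):
        self.eid = eid


class RelationMapping(object):
    # Same logic as the class above.

    def __init__(self, cnx, source=None):
        self.cnx = cnx
        self._rql_template = 'Any S,O WHERE S %s O'
        self._kwargs = {}
        if source is not None:
            # '%%' becomes a literal '%(s)s' placeholder after '% rtype'.
            self._rql_template += ', S cw_source SO, O cw_source SO, SO eid %%(s)s'
            self._kwargs['s'] = source.eid

    def __getitem__(self, rtype):
        rql = self._rql_template % rtype
        # Rows come back as lists; make them hashable (subject, object) tuples.
        return set(tuple(x) for x in self.cnx.execute(rql, self._kwargs))


cnx = RecordingConnection([[1, 2], [3, 4]])
friends = RelationMapping(cnx, StubSource(42))['friend']
print(sorted(friends))     # [(1, 2), (3, 4)]
print(cnx.executed[0][0])  # Any S,O WHERE S friend O, S cw_source SO, O cw_source SO, SO eid %(s)s
```

Every lookup runs a new query, so a caller that needs the same relation type repeatedly may want to keep the returned set.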


class ExtEntity(object):
    """Transitional representation of an entity for use in data importers.

    An external entity has the following properties:

    * ``extid`` (external id), an identifier for the ext entity (a byte string),

    * ``etype`` (entity type), a string which must be the name of one entity type in the schema
      (e.g. ``'Person'``, ``'Animal'``, ...),

    * ``values``, a dictionary whose keys are attribute or relation names from the schema (e.g.
      ``'first_name'``, ``'friend'``), and whose values are *sets*; relation values are the
      extids of the related entities. For attributes of type Bytes, byte strings should be
      inserted in `values`.

    For instance:

    .. code-block:: python

        ext_entity.extid = b'http://example.org/person/debby'
        ext_entity.etype = 'Person'
        ext_entity.values = {'first_name': set([u"Deborah", u"Debby"]),
                             'friend': set([b'http://example.org/person/john'])}

    """

    def __init__(self, etype, extid, values=None):
        self.etype = etype
        self.extid = extid
        if values is None:
            values = {}
        self.values = values
        self._schema = None

    def __repr__(self):
        return '<%s %s %s>' % (self.etype, self.extid, self.values)

    def iter_rdefs(self):
   168
        """Yield (key, rtype, role) defined in `.values` dict, with:
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   169
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   170
        * `key` is the original key in `.values` (i.e. the relation type or a 2-uple (relation type,
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   171
          role))
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   172
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   173
        * `rtype` is a yams relation type, expected to be found in the schema (attribute or
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   174
          relation)
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   175
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   176
        * `role` is the role of the entity in the relation, 'subject' or 'object'
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   177
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   178
        Iteration is done on a copy of the keys so values may be inserted/deleted during it.
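
        For instance, given values that mix a plain relation type key and a (relation type, role)
        key, the following triples are yielded in the iteration order of the dictionary. ``name``
        and ``knows`` are hypothetical relation types used for illustration only:

        .. code-block:: python

            from cubicweb.dataimport.importer import ExtEntity

            ext_entity = ExtEntity('Person', 'http://example.org/person/debby',
                                   {'name': set([u'Debby']),
                                    ('knows', 'object'): set(['http://example.org/person/john'])})
            list(ext_entity.iter_rdefs())
            # [('name', 'name', 'subject'),
            #  (('knows', 'object'), 'knows', 'object')]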
        """
        for key in list(self.values):
            if isinstance(key, tuple):
                rtype, role = key
                assert role in ('subject', 'object'), key
                yield key, rtype, role
            else:
                yield key, key, 'subject'

    def prepare(self, schema):
        """Prepare an external entity for later insertion:

        * ensure attributes and inlined relations have a single value
        * turn set([value]) into value and remove keys associated with an empty set
        * remove non-inlined relations and return them as a [(e1key, relation, e2key)] list

        Return a list of non-inlined relations that may be inserted later, each relation being
        defined by a 3-tuple (subject extid, relation type, object extid).

        The instance's schema is given as argument.

        Take care the importer may call this method several times.
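
        For instance, assume a hypothetical schema where ``name`` is a ``String`` attribute of
        ``Person`` with no ``format`` metadata and ``knows`` is a non-inlined relation between
        persons. With ``schema`` being the instance's schema:

        .. code-block:: python

            ext_entity = ExtEntity('Person', 'http://example.org/person/debby',
                                   {'name': set([u'Debby']),
                                    ('knows', 'object'): set(['http://example.org/person/john'])})
            deferred = ext_entity.prepare(schema)
            # ext_entity.values is now {'name': u'Debby'}
            # deferred is [('http://example.org/person/john', 'knows',
            #               'http://example.org/person/debby')]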
        """
        assert self._schema is None, 'prepare() has already been called for %s' % self
        self._schema = schema
        eschema = schema.eschema(self.etype)
        deferred = []
        entity_dict = self.values
        for key, rtype, role in self.iter_rdefs():
            rschema = schema.rschema(rtype)
            if rschema.final or (rschema.inlined and role == 'subject'):
                assert len(entity_dict[key]) <= 1, \
                    "more than one value for %s: %s (%s)" % (rtype, entity_dict[key], self.extid)
                if entity_dict[key]:
                    entity_dict[rtype] = entity_dict[key].pop()
                    if key != rtype:
                        del entity_dict[key]
                    if (rschema.final and eschema.has_metadata(rtype, 'format')
                            and rtype + '_format' not in entity_dict):
                        entity_dict[rtype + '_format'] = u'text/plain'
                    if (rschema.final
                            and eschema.rdef(rtype).object.type == 'Bytes'
                            and not isinstance(entity_dict[rtype], Binary)):
                        entity_dict[rtype] = Binary(entity_dict[rtype])
                else:
                    del entity_dict[key]
            else:
                for target_extid in entity_dict.pop(key):
                    if role == 'subject':
                        deferred.append((self.extid, rtype, target_extid))
                    else:
                        deferred.append((target_extid, rtype, self.extid))
        return deferred

    def is_ready(self, extid2eid):
        """Return True if the ext entity is ready, i.e. if every extid used in its inlined
        relations is already mapped to an eid in `extid2eid`.
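
        When the entity is ready, each inlined relation's extid is replaced by its eid in
        `.values`. For instance, assume ``works_for`` is a hypothetical inlined relation and
        ``ext_entity`` has already been prepared, with
        ``ext_entity.values == {'works_for': 'http://example.org/org/acme'}``:

        .. code-block:: python

            ext_entity.is_ready({})  # False: the target extid has no known eid
            ext_entity.is_ready({'http://example.org/org/acme': 1234})  # True
            ext_entity.values  # now {'works_for': 1234}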
        """
        assert self._schema, 'prepare() method should be called first on %s' % self
        # as .prepare has been called, we know that .values only contains subject relation *type* as
        # key (no more (rtype, role) tuple)
        schema = self._schema
        entity_dict = self.values
        for rtype in entity_dict:
            rschema = schema.rschema(rtype)
            if not rschema.final:
                # .prepare() should drop other cases from the entity dict
                assert rschema.inlined
                if entity_dict[rtype] not in extid2eid:
                    return False
        # entity is ready, replace each inlined relation's extid by the matching eid
        for rtype in entity_dict:
            rschema = schema.rschema(rtype)
            if rschema.inlined:
                entity_dict[rtype] = extid2eid[entity_dict[rtype]]
        return True

    def why_not_ready(self, extid2eid):
        """Return some text explaining why this ext entity is not ready.
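
        An :exc:`AssertionError` is raised if the entity is actually ready. For instance, assume
        ``works_for`` is a hypothetical inlined relation and ``ext_entity`` has already been
        prepared, with ``ext_entity.values == {'works_for': 'http://example.org/org/acme'}``:

        .. code-block:: python

            ext_entity.why_not_ready({})
            # 'inlined relation works_for is not present (http://example.org/org/acme)'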
        """
        assert self._schema, 'prepare() method should be called first on %s' % self
        # as .prepare has been called, we know that .values only contains subject relation *type* as
        # key (no more (rtype, role) tuple)
        schema = self._schema
        entity_dict = self.values
        for rtype in entity_dict:
            rschema = schema.rschema(rtype)
            if not rschema.final:
                if entity_dict[rtype] not in extid2eid:
                    return u'inlined relation %s is not present (%s)' % (rtype, entity_dict[rtype])
        raise AssertionError('this external entity seems actually ready for insertion')


class ExtEntitiesImporter(object):
    """This class is responsible for importing external entities, that is instances of
    :class:`ExtEntity`, into CubicWeb entities.

    :param schema: the CubicWeb instance's schema

    :param store: a CubicWeb `Store`

    :param extid2eid: optional {extid: eid} dictionary giving information on existing entities. It
        will be completed during import. You may want to use :func:`cwuri2eid` to build it.

    :param existing_relations: optional {rtype: set((subj eid, obj eid))} mapping giving information
        on existing relations of a given type. You may want to use :class:`RelationMapping` to build
        it.

    :param etypes_order_hint: optional ordered iterable of entity types, giving a hint on the
        order in which their import should be attempted

    :param import_log: optional object implementing the :class:`SimpleImportLog` interface to
        record events occurring during the import

    :param raise_on_error: optional boolean flag (defaults to False) indicating whether errors
        should be raised or logged. You usually want them to be raised during tests and logged
        in production.

    Instances of this class are meant to import external entities through :meth:`import_entities`
    which handles a stream of :class:`ExtEntity`. One may then plug arbitrary filters into the
    external entities stream.
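
    A minimal usage sketch follows. It assumes ``cnx`` is an open connection to the instance and
    ``ext_entities`` is an iterable of :class:`ExtEntity` produced by your own parsing code. The
    choice of ``RQLObjectStore`` and its constructor call are assumptions. Other stores from
    :mod:`cubicweb.dataimport.stores` may be used instead:

    .. code-block:: python

        from cubicweb.dataimport.stores import RQLObjectStore

        store = RQLObjectStore(cnx)
        importer = ExtEntitiesImporter(cnx.vreg.schema, store, raise_on_error=True)
        importer.import_entities(ext_entities)
        cnx.commit()  # other stores may need their own finalization step
        # importer.created and importer.updated now hold the eids of created/updated entities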
10460
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   300
10461
37644c518705 [doc] Add a tutorial and extend documentation for ExtEntityImporter
Denis Laxalde <denis.laxalde@logilab.fr>
parents: 10460
diff changeset
   301
    .. automethod:: import_entities
10460
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   302
    """
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   303
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   304
    def __init__(self, schema, store, extid2eid=None, existing_relations=None,
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   305
                 etypes_order_hint=(), import_log=None, raise_on_error=False):
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   306
        self.schema = schema
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   307
        self.store = store
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   308
        self.extid2eid = extid2eid if extid2eid is not None else {}
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   309
        self.existing_relations = (existing_relations if existing_relations is not None
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   310
                                   else defaultdict(set))
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   311
        self.etypes_order_hint = etypes_order_hint
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   312
        if import_log is None:
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   313
            import_log = SimpleImportLog('<unspecified>')
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   314
        self.import_log = import_log
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   315
        self.raise_on_error = raise_on_error
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   316
        # set of created/updated eids
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   317
        self.created = set()
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   318
        self.updated = set()
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   319
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   320
    def import_entities(self, ext_entities):
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   321
        """Import given external entities (:class:`ExtEntity`) stream (usually a generator)."""
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   322
        # {etype: [etype dict]} of entities that are in the import queue
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   323
        queue = {}
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   324
        # order entity dictionaries then create/update them
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   325
        deferred = self._import_entities(ext_entities, queue)
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   326
        # create deferred relations that don't exist already
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   327
        missing_relations = self.prepare_insert_deferred_relations(deferred)
d260722f2453 [dataimport] introduce the importer and extentity classes
Yann Voté <yann.vote@logilab.fr>
parents:
diff changeset
   328
        self._warn_about_missing_work(queue, missing_relations)

    def _import_entities(self, ext_entities, queue):
        extid2eid = self.extid2eid
        deferred = {}  # non inlined relations that may be deferred
        self.import_log.record_debug('importing entities')
        for ext_entity in self.iter_ext_entities(ext_entities, deferred, queue):
            try:
                eid = extid2eid[ext_entity.extid]
            except KeyError:
                self.prepare_insert_entity(ext_entity)
            else:
                if ext_entity.values:
                    self.prepare_update_entity(ext_entity, eid)
        return deferred

    def iter_ext_entities(self, ext_entities, deferred, queue):
        """Yield external entities in an order which attempts to satisfy
        schema constraints (inlined / cardinality) and to optimize the import.
        """
        schema = self.schema
        extid2eid = self.extid2eid
        order_hint = list(self.etypes_order_hint)
        for ext_entity in ext_entities:
            # check data in the transitional representation and prepare it for
            # later insertion in the database
            for subject_uri, rtype, object_uri in ext_entity.prepare(schema):
                deferred.setdefault(rtype, set()).add((subject_uri, object_uri))
            if not ext_entity.is_ready(extid2eid):
                queue.setdefault(ext_entity.etype, []).append(ext_entity)
                continue
            yield ext_entity
            if not queue:
                continue
            # some queued entities may now be ready: keep scanning the queue
            # until a full pass yields no new entity
            for etype in queue:
                if etype not in order_hint:
                    order_hint.append(etype)
            new = True
            while new:
                new = False
                for etype in order_hint:
                    if etype in queue:
                        new_queue = []
                        for ext_entity in queue[etype]:
                            if ext_entity.is_ready(extid2eid):
                                yield ext_entity
                                # may unlock an entity already examined earlier in this loop
                                new = True
                            else:
                                new_queue.append(ext_entity)
                        if new_queue:
                            queue[etype][:] = new_queue
                        else:
                            del queue[etype]

    def prepare_insert_entity(self, ext_entity):
        """Call the store to prepare insertion of the given external entity"""
        eid = self.store.prepare_insert_entity(ext_entity.etype, **ext_entity.values)
        self.extid2eid[ext_entity.extid] = eid
        self.created.add(eid)
        return eid

    def prepare_update_entity(self, ext_entity, eid):
        """Call the store to prepare update of the given external entity"""
        self.store.prepare_update_entity(ext_entity.etype, eid, **ext_entity.values)
        self.updated.add(eid)

    def prepare_insert_deferred_relations(self, deferred):
        """Call the store to prepare insertion of deferred relations (those not handled while
        inserting or updating entities). Return a list of
        `(subject extid, rtype, object extid, KeyError)` tuples for relations that could not be
        inserted because one of their target entities doesn't exist yet.
        """
        prepare_insert_relation = self.store.prepare_insert_relation
        rschema = self.schema.rschema
        extid2eid = self.extid2eid
        missing_relations = []
        for rtype, relations in deferred.items():
            self.import_log.record_debug('importing %s %s relations' % (len(relations), rtype))
            symmetric = rschema(rtype).symmetric
            existing = self.existing_relations[rtype]
            for subject_uri, object_uri in relations:
                try:
                    subject_eid = extid2eid[subject_uri]
                    object_eid = extid2eid[object_uri]
                except KeyError as exc:
                    missing_relations.append((subject_uri, rtype, object_uri, exc))
                    continue
                if (subject_eid, object_eid) not in existing:
                    prepare_insert_relation(subject_eid, rtype, object_eid)
                    existing.add((subject_eid, object_eid))
                    if symmetric:
                        existing.add((object_eid, subject_eid))
        return missing_relations

    def _warn_about_missing_work(self, queue, missing_relations):
        error = self.import_log.record_error
        if queue:
            msgs = ["can't create some entities, is there some cycle or "
                    "missing data?"]
            for ext_entities in queue.values():
                for ext_entity in ext_entities:
                    msg = '{}: {}'.format(ext_entity, ext_entity.why_not_ready(self.extid2eid))
                    msgs.append(msg)
            # explicit loop: map() is lazy on Python 3 and would log nothing
            for msg in msgs:
                error(msg)
            if self.raise_on_error:
                raise Exception('\n'.join(msgs))
        if missing_relations:
            msgs = ["can't create some relations, is there missing data?"]
            for subject_uri, rtype, object_uri, exc in missing_relations:
                msgs.append("Could not find %s when trying to insert (%s, %s, %s)"
                            % (exc, subject_uri, rtype, object_uri))
            for msg in msgs:
                error(msg)
            if self.raise_on_error:
                raise Exception('\n'.join(msgs))
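

# Illustrative sketch (hypothetical helpers, not part of the cubicweb API): a
# simplified restatement of the queue-based ordering performed by
# ExtEntitiesImporter.iter_ext_entities. Assumptions made for the example: an
# external entity is a plain ``(extid, etype, deps)`` tuple where ``deps`` is
# the set of extids it needs (standing in for inlined / mandatory relations),
# and it is "ready" once every dependency has an eid in ``extid2eid``. As with
# the real method, the consumer must register each yielded entity in
# ``extid2eid`` before asking for the next one; entities that never become
# ready are left in ``queue``.
def _sketch_iter_ready(items, extid2eid, queue):
    def is_ready(deps):
        return all(dep in extid2eid for dep in deps)

    order_hint = []
    for item in items:
        extid, etype, deps = item
        if not is_ready(deps):
            queue.setdefault(etype, []).append(item)
            continue
        yield item
        if not queue:
            continue
        # an insertion may unlock queued items: rescan until a pass yields nothing
        for etype in queue:
            if etype not in order_hint:
                order_hint.append(etype)
        new = True
        while new:
            new = False
            for etype in order_hint:
                if etype in queue:
                    still_waiting = []
                    for queued in queue[etype]:
                        if is_ready(queued[2]):
                            yield queued
                            new = True
                        else:
                            still_waiting.append(queued)
                    if still_waiting:
                        queue[etype][:] = still_waiting
                    else:
                        del queue[etype]


def _sketch_demo_iter_ready():
    # a Book depending on an Author that comes later in the stream, plus a
    # Book whose dependency is never provided; fake eids are allocated in order
    extid2eid, queue, order = {}, {}, []
    items = [('b1', 'Book', {'a1'}),
             ('a1', 'Author', set()),
             ('x', 'Book', {'missing'})]
    for extid, etype, deps in _sketch_iter_ready(items, extid2eid, queue):
        extid2eid[extid] = len(extid2eid) + 1
        order.append(extid)
    return order, queue

# _sketch_demo_iter_ready() returns (['a1', 'b1'], {'Book': [('x', 'Book', {'missing'})]}):
# 'b1' is released as soon as 'a1' is inserted, while 'x' stays queued, which is
# the kind of leftover _warn_about_missing_work reports.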


class SimpleImportLog(object):
    """Fake CWDataImport log using a simple text format.

    Useful to display logs in the UI instead of storing them to the
    database.
    """

    def __init__(self, filename):
        self.logs = []
        self.filename = filename

    def record_debug(self, msg, path=None, line=None):
        self._log(logging.DEBUG, msg, path, line)

    def record_info(self, msg, path=None, line=None):
        self._log(logging.INFO, msg, path, line)

    def record_warning(self, msg, path=None, line=None):
        self._log(logging.WARNING, msg, path, line)

    def record_error(self, msg, path=None, line=None):
        self._log(logging.ERROR, msg, path, line)

    def record_fatal(self, msg, path=None, line=None):
        self._log(logging.FATAL, msg, path, line)

    def _log(self, severity, msg, path, line):
        encodedmsg = u'%s\t%s\t%s\t%s' % (severity, self.filename,
                                          line or u'', msg)
        self.logs.append(encodedmsg)
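

# Illustrative sketch (hypothetical helper, not part of the cubicweb API):
# SimpleImportLog stores each record as one tab-separated string
# ``severity<TAB>filename<TAB>line<TAB>message``, where severity is the numeric
# level from the ``logging`` module and the ``path`` argument is not recorded.
# Reading records back could look like this, assuming the filename contains no
# tab (the message may, since only the first three tabs are split on).
def _sketch_parse_simple_log_record(record):
    import logging
    severity, filename, line, msg = record.split(u'\t', 3)
    # logging.getLevelName maps 40 to 'ERROR'; the logging.FATAL level used by
    # record_fatal (50) comes back as 'CRITICAL'
    return logging.getLevelName(int(severity)), filename, line or None, msg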


class HTMLImportLog(SimpleImportLog):
    """Fake CWDataImport log using a simple HTML format."""
    def __init__(self, filename):
        super(HTMLImportLog, self).__init__(xml_escape(filename))

    def _log(self, severity, msg, path, line):
        encodedmsg = u'%s\t%s\t%s\t%s<br/>' % (severity, self.filename,
                                               line or u'', xml_escape(msg))
        self.logs.append(encodedmsg)