sobjects/textparsers.py
author Julien Cristau <julien.cristau@logilab.fr>
Mon, 09 Nov 2015 16:21:29 +0100
changeset 10879 3193d9ede8dd
parent 8748 f5027f8d2478
permissions -rw-r--r--
[dataimport] drop extra indirection through MassiveObjectStore._initialized dict We already have self.__dict__, no need to add another one for a static set of keys.
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
7815
2a164a9cf81c [exceptions] stop catching any exception in various places (closes #1942716)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6911
diff changeset
     1
# copyright 2003-2011 LOGILAB S.A. (Paris, FRANCE), all rights reserved.
5421
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4252
diff changeset
     2
# contact http://www.logilab.fr/ -- mailto:contact@logilab.fr
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4252
diff changeset
     3
#
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4252
diff changeset
     4
# This file is part of CubicWeb.
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4252
diff changeset
     5
#
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4252
diff changeset
     6
# CubicWeb is free software: you can redistribute it and/or modify it under the
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4252
diff changeset
     7
# terms of the GNU Lesser General Public License as published by the Free
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4252
diff changeset
     8
# Software Foundation, either version 2.1 of the License, or (at your option)
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4252
diff changeset
     9
# any later version.
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4252
diff changeset
    10
#
5424
8ecbcbff9777 replace logilab-common by CubicWeb in disclaimer
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5421
diff changeset
    11
# CubicWeb is distributed in the hope that it will be useful, but WITHOUT
5421
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4252
diff changeset
    12
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4252
diff changeset
    13
# FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4252
diff changeset
    14
# details.
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4252
diff changeset
    15
#
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4252
diff changeset
    16
# You should have received a copy of the GNU Lesser General Public License along
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4252
diff changeset
    17
# with CubicWeb.  If not, see <http://www.gnu.org/licenses/>.
6911
75849076fd6c cleanup, docstring update
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5556
diff changeset
    18
"""Some parsers to detect action to do from text
3860
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    19
6911
75849076fd6c cleanup, docstring update
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5556
diff changeset
    20
Currently only a parser to look for state change instruction is provided.
75849076fd6c cleanup, docstring update
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5556
diff changeset
    21
Take care to security when you're using it, think about the user that
75849076fd6c cleanup, docstring update
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5556
diff changeset
    22
will provide the text to analyze...
75849076fd6c cleanup, docstring update
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5556
diff changeset
    23
"""
3860
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    24
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    25
__docformat__ = "restructuredtext en"
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    26
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    27
import re
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    28
8748
f5027f8d2478 drop typed_eid() in favour of int() (closes #2742462)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 7815
diff changeset
    29
from cubicweb import UnknownEid
3860
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    30
from cubicweb.view import Component
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    31
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    32
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    33
class TextAnalyzer(Component):
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    34
    """analyze and extract information from plain text by calling registered
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    35
    text parsers
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    36
    """
4004
c52619c738a5 api renaming update
Sandrine Ribeau <sandrine.ribeau@logilab.fr>
parents: 3860
diff changeset
    37
    __regid__ = 'textanalyzer'
3860
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    38
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    39
    def parse(self, caller, text):
4056
f4634710e20c api update
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4004
diff changeset
    40
        for parsercls in self._cw.vreg['components'].get('textparser', ()):
f4634710e20c api update
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4004
diff changeset
    41
            parsercls(self._cw).parse(caller, text)
3860
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    42
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    43
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    44
class TextParser(Component):
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    45
    """base class for text parser, responsible to extract some information
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    46
    from plain text. When something is done, it usually call the
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    47
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    48
      .fire_event(something, {event args})
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    49
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    50
    method on the caller.
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    51
    """
4004
c52619c738a5 api renaming update
Sandrine Ribeau <sandrine.ribeau@logilab.fr>
parents: 3860
diff changeset
    52
    __regid__ = 'textparser'
3860
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    53
    __abstract__ = True
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    54
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    55
    def parse(self, caller, text):
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    56
        raise NotImplementedError
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    57
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    58
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    59
class ChangeStateTextParser(TextParser):
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    60
    """search some text for change state instruction in the form
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    61
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    62
         :<transition name>: #?<eid>
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    63
    """
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    64
    instr_rgx = re.compile(':(\w+):\s*#?(\d+)', re.U)
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    65
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    66
    def parse(self, caller, text):
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    67
        for trname, eid in self.instr_rgx.findall(text):
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    68
            try:
8748
f5027f8d2478 drop typed_eid() in favour of int() (closes #2742462)
Nicolas Chauvat <nicolas.chauvat@logilab.fr>
parents: 7815
diff changeset
    69
                entity = self._cw.entity_from_eid(int(eid))
3860
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    70
            except UnknownEid:
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    71
                self.error("can't get entity with eid %s", eid)
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    72
                continue
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    73
            if not hasattr(entity, 'in_state'):
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    74
                self.error('bad change state instruction for eid %s', eid)
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    75
                continue
5556
9ab2b4c74baf [entity] introduce a new 'adapters' registry
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5424
diff changeset
    76
            iworkflowable = entity.cw_adapt_to('IWorkflowable')
9ab2b4c74baf [entity] introduce a new 'adapters' registry
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5424
diff changeset
    77
            if iworkflowable.current_workflow:
9ab2b4c74baf [entity] introduce a new 'adapters' registry
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5424
diff changeset
    78
                tr = iworkflowable.current_workflow.transition_by_name(trname)
9ab2b4c74baf [entity] introduce a new 'adapters' registry
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5424
diff changeset
    79
            else:
9ab2b4c74baf [entity] introduce a new 'adapters' registry
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5424
diff changeset
    80
                tr = None
3860
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    81
            if tr and tr.may_be_fired(entity.eid):
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    82
                try:
5556
9ab2b4c74baf [entity] introduce a new 'adapters' registry
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5424
diff changeset
    83
                    trinfo = iworkflowable.fire_transition(tr)
3860
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    84
                    caller.fire_event('state-changed', {'trinfo': trinfo,
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    85
                                                        'entity': entity})
7815
2a164a9cf81c [exceptions] stop catching any exception in various places (closes #1942716)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 6911
diff changeset
    86
                except Exception:
3860
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    87
                    self.exception('while changing state of %s', entity)
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    88
            else:
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    89
                self.error("can't pass transition %s on entity %s",
2e7d399ee075 add textparser object, designed to trigger some actions from textual content such as email or checkin-message (unused in the library itself yet, see email cube)
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents:
diff changeset
    90
                           trname, entity)