sobjects/textparsers.py
author Sylvain Thénault <sylvain.thenault@logilab.fr>
Wed, 19 Jan 2011 08:39:09 +0100
branch stable
changeset 6841 f04df13fc851
parent 5556 9ab2b4c74baf
child 6911 75849076fd6c
permissions -rw-r--r--
[sparql] fix an url generation bug breaking sparqlxml results view + remove a deprecation warning
# copyright 2003-2010 LOGILAB S.A. (Paris, FRANCE), all rights reserved.
# contact http://www.logilab.fr/ -- mailto:contact@logilab.fr
#
# This file is part of CubicWeb.
#
# CubicWeb is free software: you can redistribute it and/or modify it under the
# terms of the GNU Lesser General Public License as published by the Free
# Software Foundation, either version 2.1 of the License, or (at your option)
# any later version.
#
# CubicWeb is distributed in the hope that it will be useful, but WITHOUT
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
# FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
# details.
#
# You should have received a copy of the GNU Lesser General Public License along
# with CubicWeb.  If not, see <http://www.gnu.org/licenses/>.
"""hooks triggered on email entity creation:

* look for state change instructions (XXX security)
* set email content as a comment on an entity when comments are supported and
  linking information is found

"""
__docformat__ = "restructuredtext en"

import re

from cubicweb import UnknownEid, typed_eid
from cubicweb.view import Component

        # XXX use user session if gpg signature validated

class TextAnalyzer(Component):
    """analyze and extract information from plain text by calling registered
    text parsers
    """
    __regid__ = 'textanalyzer'

    def parse(self, caller, text):
        for parsercls in self._cw.vreg['components'].get('textparser', ()):
            parsercls(self._cw).parse(caller, text)


class TextParser(Component):
    """base class for text parsers, responsible for extracting some
    information from plain text. When something is done, it usually calls the

      .fire_event(something, {event args})

    method on the caller.
    """
    __regid__ = 'textparser'
    __abstract__ = True

    def parse(self, caller, text):
        raise NotImplementedError


class ChangeStateTextParser(TextParser):
    """search some text for change state instructions of the form

         :<transition name>: #?<eid>
    """
    # raw string so that \w, \s and \d reach the regex engine unchanged
    instr_rgx = re.compile(r':(\w+):\s*#?(\d+)', re.U)

    def parse(self, caller, text):
        for trname, eid in self.instr_rgx.findall(text):
            try:
                entity = self._cw.entity_from_eid(typed_eid(eid))
            except UnknownEid:
                self.error("can't get entity with eid %s", eid)
                continue
            # entities without an in_state attribute cannot change state
            if not hasattr(entity, 'in_state'):
                self.error('bad change state instruction for eid %s', eid)
                continue
            iworkflowable = entity.cw_adapt_to('IWorkflowable')
            if iworkflowable.current_workflow:
                tr = iworkflowable.current_workflow.transition_by_name(trname)
            else:
                tr = None
            if tr and tr.may_be_fired(entity.eid):
                try:
                    trinfo = iworkflowable.fire_transition(tr)
                    caller.fire_event('state-changed', {'trinfo': trinfo,
                                                        'entity': entity})
                except Exception:
                    # log and go on with the next instruction
                    self.exception('while changing state of %s', entity)
            else:
                self.error("can't pass transition %s on entity %s",
                           trname, entity)
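
The instruction format handled by ChangeStateTextParser can be tried outside of CubicWeb. The sketch below is only an illustration: it reuses the same regular expression, but `RecordingCaller`, `find_instructions` and the `'instruction-found'` event name are hypothetical stand-ins that do not exist in the module, and no workflow transition is fired.

```python
import re

# Same pattern as ChangeStateTextParser.instr_rgx, written as a raw string.
INSTR_RGX = re.compile(r':(\w+):\s*#?(\d+)', re.U)


class RecordingCaller(object):
    """Hypothetical stand-in for the caller; it only records fired events."""

    def __init__(self):
        self.events = []

    def fire_event(self, event, args):
        self.events.append((event, args))


def find_instructions(text):
    """Return (transition name, eid) pairs, with the eid converted to int."""
    return [(trname, int(eid)) for trname, eid in INSTR_RGX.findall(text)]


# The leading '#' before the eid is optional, as in the class docstring.
text = "fixed by the last commit\n:resolve: #1234\n:close: 42"
instructions = find_instructions(text)

caller = RecordingCaller()
for trname, eid in instructions:
    # The real parser would look up the entity and fire the transition here;
    # this sketch only reports what was found (hypothetical event name).
    caller.fire_event('instruction-found', {'transition': trname, 'eid': eid})
```

Because `\s*` also matches newlines, an eid placed on the line following `:<transition name>:` would be picked up as well.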