web/views/embedding.py
author Sylvain Thénault <sylvain.thenault@logilab.fr>
Wed, 23 Jun 2010 13:54:02 +0200
branchstable
changeset 5857 1a24c62aefc5
parent 5424 8ecbcbff9777
child 5556 9ab2b4c74baf
child 5886 00a78298d30d
permissions -rw-r--r--
[bfss] fix file update to ensure file's content is available on the fs asap... and not only at commit time. So it's consistent with entity creation behaviour. The new file is created at assignement time and removed if the commit is rollbacked.
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
5421
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4252
diff changeset
     1
# copyright 2003-2010 LOGILAB S.A. (Paris, FRANCE), all rights reserved.
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4252
diff changeset
     2
# contact http://www.logilab.fr/ -- mailto:contact@logilab.fr
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4252
diff changeset
     3
#
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4252
diff changeset
     4
# This file is part of CubicWeb.
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4252
diff changeset
     5
#
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4252
diff changeset
     6
# CubicWeb is free software: you can redistribute it and/or modify it under the
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4252
diff changeset
     7
# terms of the GNU Lesser General Public License as published by the Free
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4252
diff changeset
     8
# Software Foundation, either version 2.1 of the License, or (at your option)
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4252
diff changeset
     9
# any later version.
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4252
diff changeset
    10
#
5424
8ecbcbff9777 replace logilab-common by CubicWeb in disclaimer
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 5421
diff changeset
    11
# CubicWeb is distributed in the hope that it will be useful, but WITHOUT
5421
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4252
diff changeset
    12
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4252
diff changeset
    13
# FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4252
diff changeset
    14
# details.
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4252
diff changeset
    15
#
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4252
diff changeset
    16
# You should have received a copy of the GNU Lesser General Public License along
8167de96c523 proper licensing information (LGPL-2.1). Hope I get it right this time.
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4252
diff changeset
    17
# with CubicWeb.  If not, see <http://www.gnu.org/licenses/>.
0
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    18
"""Objects interacting together to provides the external page embeding
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    19
functionality.
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    20
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    21
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    22
"""
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    23
__docformat__ = "restructuredtext en"
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    24
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    25
import re
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    26
from urlparse import urljoin
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    27
from urllib2 import urlopen, Request, HTTPError
2808
497424219fb0 fix urlquote imports
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 2798
diff changeset
    28
from urllib import quote as urlquote # XXX should use view.url_quote method
0
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    29
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    30
from logilab.mtconverter import guess_encoding
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    31
692
800592b8d39b replace deprecated cubicweb.common.selectors by its new module path (cubicweb.selectors)
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 631
diff changeset
    32
from cubicweb.selectors import (one_line_rset, score_entity,
800592b8d39b replace deprecated cubicweb.common.selectors by its new module path (cubicweb.selectors)
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 631
diff changeset
    33
                                match_search_state, implements)
0
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    34
from cubicweb.interfaces import IEmbedable
800
860451b72ab7 update import
sylvain.thenault@logilab.fr
parents: 742
diff changeset
    35
from cubicweb.view import NOINDEX, NOFOLLOW
4023
eae23c40627a drop common subpackage
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 3460
diff changeset
    36
from cubicweb.uilib import soup2xhtml
0
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    37
from cubicweb.web.controller import Controller
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    38
from cubicweb.web.action import Action
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    39
from cubicweb.web.views import basetemplates
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    40
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    41
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    42
class ExternalTemplate(basetemplates.TheMainTemplate):
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    43
    """template embeding an external web pages into CubicWeb web interface
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    44
    """
3377
dd9d292b6a6d use __regid__ instead of id on appobject classes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 2808
diff changeset
    45
    __regid__ = 'external'
1802
d628defebc17 delete-trailing-whitespace + some copyright update
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 1132
diff changeset
    46
0
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    47
    def call(self, body):
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    48
        # XXX fallback to HTML 4 mode when embeding ?
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    49
        self.set_request_content_type()
3451
6b46d73823f5 [api] work in progress, use __regid__, cw_*, etc.
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 3377
diff changeset
    50
        self._cw.search_state = ('normal',)
6b46d73823f5 [api] work in progress, use __regid__, cw_*, etc.
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 3377
diff changeset
    51
        self.template_header(self.content_type, None, self._cw._('external page'),
0
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    52
                             [NOINDEX, NOFOLLOW])
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    53
        self.content_header()
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    54
        self.w(body)
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    55
        self.content_footer()
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    56
        self.template_footer()
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    57
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    58
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    59
class EmbedController(Controller):
3377
dd9d292b6a6d use __regid__ instead of id on appobject classes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 2808
diff changeset
    60
    __regid__ = 'embed'
0
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    61
    template = 'external'
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    62
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    63
    def publish(self, rset=None):
3451
6b46d73823f5 [api] work in progress, use __regid__, cw_*, etc.
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 3377
diff changeset
    64
        req = self._cw
0
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    65
        if 'custom_css' in req.form:
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    66
            req.add_css(req.form['custom_css'])
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    67
        embedded_url = req.form['url']
4083
3b285889b8e9 3.6 api update
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4023
diff changeset
    68
        allowed = self._cw.vreg.config['embed-allowed']
0
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    69
        _ = req._
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    70
        if allowed is None or not allowed.match(embedded_url):
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    71
            body = '<h2>%s</h2><h3>%s</h3>' % (
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    72
                _('error while embedding page'),
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    73
                _('embedding this url is forbidden'))
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    74
        else:
3451
6b46d73823f5 [api] work in progress, use __regid__, cw_*, etc.
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 3377
diff changeset
    75
            prefix = req.build_url(self.__regid__, url='')
0
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    76
            authorization = req.get_header('Authorization')
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    77
            if authorization:
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    78
                headers = {'Authorization' : authorization}
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    79
            else:
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    80
                headers = {}
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    81
            try:
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    82
                body = embed_external_page(embedded_url, prefix,
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    83
                                           headers, req.form.get('custom_css'))
3451
6b46d73823f5 [api] work in progress, use __regid__, cw_*, etc.
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 3377
diff changeset
    84
                body = soup2xhtml(body, self._cw.encoding)
0
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    85
            except HTTPError, err:
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    86
                body = '<h2>%s</h2><h3>%s</h3>' % (
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    87
                    _('error while embedding page'), err)
1092
b8fbb95dc0eb process_rql now done in the controller
sylvain.thenault@logilab.fr
parents: 800
diff changeset
    88
        self.process_rql(req.form.get('rql'))
3451
6b46d73823f5 [api] work in progress, use __regid__, cw_*, etc.
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 3377
diff changeset
    89
        return self._cw.vreg['views'].main_template(req, self.template,
6b46d73823f5 [api] work in progress, use __regid__, cw_*, etc.
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 3377
diff changeset
    90
                                                rset=self.cw_rset, body=body)
0
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    91
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
    92
631
99f5852f8604 major selector refactoring (mostly to avoid looking for select parameters on the target class), start accept / interface unification)
sylvain.thenault@logilab.fr
parents: 431
diff changeset
    93
def entity_has_embedable_url(entity):
99f5852f8604 major selector refactoring (mostly to avoid looking for select parameters on the target class), start accept / interface unification)
sylvain.thenault@logilab.fr
parents: 431
diff changeset
    94
    """return 1 if the entity provides an allowed embedable url"""
99f5852f8604 major selector refactoring (mostly to avoid looking for select parameters on the target class), start accept / interface unification)
sylvain.thenault@logilab.fr
parents: 431
diff changeset
    95
    url = entity.embeded_url()
99f5852f8604 major selector refactoring (mostly to avoid looking for select parameters on the target class), start accept / interface unification)
sylvain.thenault@logilab.fr
parents: 431
diff changeset
    96
    if not url or not url.strip():
99f5852f8604 major selector refactoring (mostly to avoid looking for select parameters on the target class), start accept / interface unification)
sylvain.thenault@logilab.fr
parents: 431
diff changeset
    97
        return 0
4083
3b285889b8e9 3.6 api update
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 4023
diff changeset
    98
    allowed = entity._cw.vreg.config['embed-allowed']
631
99f5852f8604 major selector refactoring (mostly to avoid looking for select parameters on the target class), start accept / interface unification)
sylvain.thenault@logilab.fr
parents: 431
diff changeset
    99
    if allowed is None or not allowed.match(url):
99f5852f8604 major selector refactoring (mostly to avoid looking for select parameters on the target class), start accept / interface unification)
sylvain.thenault@logilab.fr
parents: 431
diff changeset
   100
        return 0
99f5852f8604 major selector refactoring (mostly to avoid looking for select parameters on the target class), start accept / interface unification)
sylvain.thenault@logilab.fr
parents: 431
diff changeset
   101
    return 1
99f5852f8604 major selector refactoring (mostly to avoid looking for select parameters on the target class), start accept / interface unification)
sylvain.thenault@logilab.fr
parents: 431
diff changeset
   102
99f5852f8604 major selector refactoring (mostly to avoid looking for select parameters on the target class), start accept / interface unification)
sylvain.thenault@logilab.fr
parents: 431
diff changeset
   103
0
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   104
class EmbedAction(Action):
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   105
    """display an 'embed' link on entity implementing `embeded_url` method
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   106
    if the returned url match embeding configuration
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   107
    """
3377
dd9d292b6a6d use __regid__ instead of id on appobject classes
Sylvain Thénault <sylvain.thenault@logilab.fr>
parents: 2808
diff changeset
   108
    __regid__ = 'embed'
742
99115e029dca replaced most of __selectors__ assignments with __select__
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 692
diff changeset
   109
    __select__ = (one_line_rset() & match_search_state('normal')
1802
d628defebc17 delete-trailing-whitespace + some copyright update
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 1132
diff changeset
   110
                  & implements(IEmbedable)
742
99115e029dca replaced most of __selectors__ assignments with __select__
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 692
diff changeset
   111
                  & score_entity(entity_has_embedable_url))
1802
d628defebc17 delete-trailing-whitespace + some copyright update
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 1132
diff changeset
   112
0
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   113
    title = _('embed')
1802
d628defebc17 delete-trailing-whitespace + some copyright update
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 1132
diff changeset
   114
0
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   115
    def url(self, row=0):
3451
6b46d73823f5 [api] work in progress, use __regid__, cw_*, etc.
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 3377
diff changeset
   116
        entity = self.cw_rset.get_entity(row, 0)
6b46d73823f5 [api] work in progress, use __regid__, cw_*, etc.
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 3377
diff changeset
   117
        url = urljoin(self._cw.base_url(), entity.embeded_url())
6b46d73823f5 [api] work in progress, use __regid__, cw_*, etc.
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 3377
diff changeset
   118
        if self._cw.form.has_key('rql'):
3460
e4843535db25 [api] some more _cw / __regid__, automatic tests now pass again
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 3451
diff changeset
   119
            return self._cw.build_url('embed', url=url, rql=self._cw.form['rql'])
e4843535db25 [api] some more _cw / __regid__, automatic tests now pass again
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 3451
diff changeset
   120
        return self._cw.build_url('embed', url=url)
0
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   121
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   122
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   123
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   124
# functions doing necessary substitutions to embed an external html page ######
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   125
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   126
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   127
BODY_RGX = re.compile('<body.*?>(.*?)</body>', re.I | re.S | re.U)
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   128
HREF_RGX = re.compile('<a\s+href="([^"]*)"', re.I | re.S | re.U)
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   129
SRC_RGX = re.compile('<img\s+src="([^"]*)"', re.I | re.S | re.U)
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   130
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   131
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   132
class replace_href:
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   133
    def __init__(self, prefix, custom_css=None):
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   134
        self.prefix = prefix
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   135
        self.custom_css = custom_css
1802
d628defebc17 delete-trailing-whitespace + some copyright update
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 1132
diff changeset
   136
0
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   137
    def __call__(self, match):
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   138
        original_url = match.group(1)
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   139
        url = self.prefix + urlquote(original_url, safe='')
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   140
        if self.custom_css is not None:
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   141
            if '?' in url:
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   142
                url = '%s&amp;custom_css=%s' % (url, self.custom_css)
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   143
            else:
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   144
                url = '%s?custom_css=%s' % (url, self.custom_css)
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   145
        return '<a href="%s"' % url
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   146
631
99f5852f8604 major selector refactoring (mostly to avoid looking for select parameters on the target class), start accept / interface unification)
sylvain.thenault@logilab.fr
parents: 431
diff changeset
   147
0
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   148
class absolutize_links:
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   149
    def __init__(self, embedded_url, tag, custom_css=None):
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   150
        self.embedded_url = embedded_url
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   151
        self.tag = tag
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   152
        self.custom_css = custom_css
1802
d628defebc17 delete-trailing-whitespace + some copyright update
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents: 1132
diff changeset
   153
0
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   154
    def __call__(self, match):
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   155
        original_url = match.group(1)
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   156
        if '://' in original_url:
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   157
            return match.group(0) # leave it unchanged
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   158
        return '%s="%s"' % (self.tag, urljoin(self.embedded_url, original_url))
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   159
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   160
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   161
def prefix_links(body, prefix, embedded_url, custom_css=None):
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   162
    filters = ((HREF_RGX, absolutize_links(embedded_url, '<a href', custom_css)),
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   163
               (SRC_RGX, absolutize_links(embedded_url, '<img src')),
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   164
               (HREF_RGX, replace_href(prefix, custom_css)))
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   165
    for rgx, repl in filters:
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   166
        body = rgx.sub(repl, body)
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   167
    return body
631
99f5852f8604 major selector refactoring (mostly to avoid looking for select parameters on the target class), start accept / interface unification)
sylvain.thenault@logilab.fr
parents: 431
diff changeset
   168
99f5852f8604 major selector refactoring (mostly to avoid looking for select parameters on the target class), start accept / interface unification)
sylvain.thenault@logilab.fr
parents: 431
diff changeset
   169
0
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   170
def embed_external_page(url, prefix, headers=None, custom_css=None):
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   171
    req = Request(url, headers=(headers or {}))
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   172
    content = urlopen(req).read()
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   173
    page_source = unicode(content, guess_encoding(content), 'replace')
1132
96752791c2b6 pylint cleanup
sylvain.thenault@logilab.fr
parents: 1092
diff changeset
   174
    page_source = page_source
0
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   175
    match = BODY_RGX.search(page_source)
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   176
    if match is None:
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   177
        return page_source
b97547f5f1fa Showtime !
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>
parents:
diff changeset
   178
    return prefix_links(match.group(1), prefix, url, custom_css)