.. _fti:

Full Text Indexing in CubicWeb
------------------------------

When an attribute is tagged as *fulltext-indexable* in the datamodel,
CubicWeb will automatically trigger hooks to update the internal
fulltext index (i.e. the ``appears`` SQL table) each time this attribute
is modified.

CubicWeb also provides a ``db-rebuild-fti`` command to rebuild the whole
fulltext index on demand:

.. sourcecode:: bash

   cubicweb@esope~$ cubicweb-ctl db-rebuild-fti my_tracker_instance

You can also rebuild the fulltext index for a given set of entity types:

.. sourcecode:: bash

   cubicweb@esope~$ cubicweb-ctl db-rebuild-fti my_tracker_instance Ticket Version

In the above example, only the fulltext index of entity types ``Ticket`` and
``Version`` will be rebuilt.


Standard FTI process
~~~~~~~~~~~~~~~~~~~~

Considering an entity type ``ET``, the default *fti* process is to:

1. fetch all entities of type ``ET``

2. for each entity, adapt it to ``IFTIndexable`` (see
   :class:`~cubicweb.entities.adapters.IFTIndexableAdapter`)

3. call
   :meth:`~cubicweb.entities.adapters.IFTIndexableAdapter.get_words` on
   the adapter, which is expected to return a dictionary mapping each
   *weight* to a *list of words*, as expected by
   :meth:`~logilab.database.fti.FTIndexerMixIn.index_object`. The
   tokenization of each attribute value is done by
   :func:`~logilab.database.fti.tokenize`.


See :class:`~cubicweb.entities.adapters.IFTIndexableAdapter` for more documentation.
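
As an illustration, the shape of the data produced by step 3 can be sketched
in plain Python. This is only a rough model, not CubicWeb's implementation:
``simple_tokenize`` is a hypothetical stand-in for
:func:`~logilab.database.fti.tokenize` (whose real behaviour may differ),
``words_by_weight`` is a made-up helper, and using the single weight ``'C'``
for every attribute is an assumption made for the example.

.. sourcecode:: python

    import re
    from collections import defaultdict


    def simple_tokenize(text):
        """Hypothetical tokenizer: lower-case the text and yield each
        run of word characters."""
        for match in re.finditer(r'\w+', text.lower()):
            yield match.group(0)


    def words_by_weight(values, weight='C'):
        """Build a ``{weight: [words]}`` dictionary from attribute values.

        Every value is tokenized and filed under the same (assumed) weight.
        """
        words = defaultdict(list)
        for value in values:
            if value:  # skip empty or missing attribute values
                words[weight].extend(simple_tokenize(value))
        return dict(words)


    # values of the fulltext-indexed attributes of one (imaginary) entity
    values = ['Crash on login', 'The login page crashes.']
    index_entry = words_by_weight(values)

The resulting ``index_entry`` has the same *weight* to *list of words* shape
as the value returned by ``get_words()``.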


Yams and ``fulltext_container``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is possible in the datamodel to indicate that fulltext-indexed
attributes defined for an entity type will be used to index not the
entity itself but a related entity. This is especially useful for
composite entities. Let's take a look at (a simplified version of)
the base schema defined in CubicWeb (see :mod:`cubicweb.schemas.base`):

.. sourcecode:: python

  from yams.buildobjs import EntityType, RelationDefinition, String, Password
  from cubicweb.schema import WorkflowableEntityType

  class CWUser(WorkflowableEntityType):
      login     = String(required=True, unique=True, maxsize=64)
      upassword = Password(required=True)

  class EmailAddress(EntityType):
      address = String(required=True,  fulltextindexed=True,
                       indexed=True, unique=True, maxsize=128)


  class use_email_relation(RelationDefinition):
      name = 'use_email'
      subject = 'CWUser'
      object = 'EmailAddress'
      cardinality = '*?'
      composite = 'subject'


The schema above states that there is a relation between ``CWUser`` and ``EmailAddress``
and that the ``address`` field of ``EmailAddress`` is fulltext indexed. Therefore,
in your application, if you use fulltext search to look for an email address, CubicWeb
will return the ``EmailAddress`` itself. But the entity we would like the search to
return is more likely the associated ``CWUser`` than the ``EmailAddress`` itself.

The simplest way to achieve that is to tag the ``use_email`` relation in
the datamodel:

.. sourcecode:: python

  class use_email(RelationType):
      fulltext_container = 'subject'


Customizing how entities are fetched during ``db-rebuild-fti``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``db-rebuild-fti`` will call the
:meth:`~cubicweb.entities.AnyEntity.cw_fti_index_rql_queries` class
method on your entity type.

.. automethod:: cubicweb.entities.AnyEntity.cw_fti_index_rql_queries

Now, suppose you have a *huge* table to index: you probably don't want to
fetch all entities at once. Here is a simple customized example that
processes entities in blocks of 10000:

.. sourcecode:: python

    from cubicweb.entities import AnyEntity

    class MyEntityClass(AnyEntity):
        __regid__ = 'MyEntityClass'

        @classmethod
        def cw_fti_index_rql_queries(cls, req):
            # get the default RQL query and insert LIMIT / OFFSET instructions
            base_rql = super(MyEntityClass, cls).cw_fti_index_rql_queries(req)[0]
            selected, restrictions = base_rql.split(' WHERE ', 1)
            rql_template = '%s ORDERBY X LIMIT %%(limit)s OFFSET %%(offset)s WHERE %s' % (
                selected, restrictions)
            # count how many entities you'll have to index
            count = req.execute('Any COUNT(X) WHERE X is MyEntityClass')[0][0]
            # iterate by blocks of 10000 entities
            chunksize = 10000
            for offset in range(0, count, chunksize):
                rql = rql_template % {'limit': chunksize, 'offset': offset}
                print('SENDING %s' % rql)
                yield rql

Since you have access to ``req``, you can more or less fetch whatever you want.
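
To see which queries such a method generates without running a CubicWeb
instance, the string manipulation can be isolated in a small standalone
function. This is a sketch only: ``paginate_rql`` is a hypothetical helper,
and the base query and entity count are made-up values standing in for what
the default implementation and ``req.execute`` would return.

.. sourcecode:: python

    def paginate_rql(base_rql, count, chunksize=10000):
        """Yield one LIMIT / OFFSET variant of ``base_rql`` per block
        of ``chunksize`` entities."""
        # split only on the first ' WHERE ' to keep the restrictions intact
        selected, restrictions = base_rql.split(' WHERE ', 1)
        for offset in range(0, count, chunksize):
            yield '%s ORDERBY X LIMIT %d OFFSET %d WHERE %s' % (
                selected, chunksize, offset, restrictions)


    # assumed base query and entity count, for illustration only
    queries = list(paginate_rql('Any X, T WHERE X is Ticket, X title T',
                                count=25000))

With 25000 entities and blocks of 10000, three queries are produced, the last
one covering the remaining 5000 entities.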


Customizing :meth:`~cubicweb.entities.adapters.IFTIndexableAdapter.get_words`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can also customize the FTI process by providing your own ``get_words()``
implementation:

.. sourcecode:: python

    from cubicweb.predicates import is_instance
    from cubicweb.entities.adapters import IFTIndexableAdapter

    class SearchIndexAdapter(IFTIndexableAdapter):
        __regid__ = 'IFTIndexable'
        __select__ = is_instance('MyEntityClass')

        def fti_containers(self, _done=None):
            """this should yield any entity that must be considered to
            fulltext-index self.entity

            CubicWeb's default implementation will look for yams'
            ``fulltext_container`` property.
            """
            yield self.entity
            yield self.entity.some_related_entity


        def get_words(self):
            # implement any logic here
            # see http://www.postgresql.org/docs/9.1/static/textsearch-controls.html
            # for the actual meaning of 'C'
            return {'C': ['any', 'word', 'I', 'want']}
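
A ``get_words()`` implementation may return several weights at once.
PostgreSQL labels weights ``'A'`` to ``'D'``, so one possible policy is to
give words from a title a heavier weight than words from a body. The sketch
below shows that policy in plain Python; ``weighted_words`` and its ``title``
and ``body`` arguments are hypothetical names chosen for the example, not part
of the CubicWeb API.

.. sourcecode:: python

    import re


    def weighted_words(title, body):
        """Return a ``{weight: [words]}`` mapping where title words get
        weight 'A' and body words get weight 'C'."""
        words = {}
        for weight, text in (('A', title), ('C', body)):
            # naive tokenization, for illustration only
            tokens = re.findall(r'\w+', text.lower())
            if tokens:  # leave out weights that have no word at all
                words[weight] = tokens
        return words


    words = weighted_words('Release notes', 'Fixes the login crash')

In an adapter, the body of ``get_words()`` could build such a dictionary from
the attributes of ``self.entity``.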