doc/book/devrepo/fti.rst
author Julien Cristau <julien.cristau@logilab.fr>
Tue, 23 Jun 2015 17:04:40 +0200
changeset 10495 5bd914ebf3ae
parent 10491 c67bcee93248
child 10847 ce5403611cbe
permissions -rw-r--r--
[doc] fix warnings/errors in doc build - fix links to images - fix a couple of typos - re-add IDownloadableOneLineView doc - rename documenting.rst back to .txt, it's intended as a doc of how to write rst, not part of the rst doc Related to #4832808
153a7c9cdca9 [fti] add some documentation
Adrien Di Mascio <Adrien.DiMascio@logilab.fr>

.. _fti:

Full Text Indexing in CubicWeb
------------------------------

When an attribute is tagged as *fulltext-indexable* in the datamodel,
CubicWeb will automatically trigger hooks to update the internal
fulltext index (i.e. the ``appears`` SQL table) each time this attribute
is modified.

CubicWeb also provides a ``db-rebuild-fti`` command to rebuild the whole
fulltext index on demand:

.. sourcecode:: bash

   cubicweb@esope~$ cubicweb-ctl db-rebuild-fti my_tracker_instance

You can also rebuild the fulltext index for a given set of entity types:

.. sourcecode:: bash

   cubicweb@esope~$ cubicweb-ctl db-rebuild-fti my_tracker_instance Ticket Version

In the above example, only the fulltext index of entity types ``Ticket`` and
``Version`` will be rebuilt.


Standard FTI process
~~~~~~~~~~~~~~~~~~~~

Considering an entity type ``ET``, the default *fti* process is to:

1. fetch all entities of type ``ET``

2. for each entity, adapt it to ``IFTIndexable`` (see
   :class:`~cubicweb.entities.adapters.IFTIndexableAdapter`)

3. call
   :meth:`~cubicweb.entities.adapters.IFTIndexableAdapter.get_words` on
   the adapter, which is supposed to return a dictionary *weight* ->
   *list of words* as expected by
   :meth:`~logilab.database.fti.FTIndexerMixIn.index_object`. The
   tokenization of each attribute value is done by
   :func:`~logilab.database.fti.tokenize`.


See :class:`~cubicweb.entities.adapters.IFTIndexableAdapter` for more documentation.

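The shape of the *weight* -> *list of words* dictionary built in step 3 can be sketched in plain Python. This is only an illustration, not CubicWeb's implementation: ``simple_tokenize`` and ``build_words`` are hypothetical helpers, the regular expression is a stand-in for the real tokenizer, and the weight ``'C'`` is simply the one used in the ``get_words()`` example of this document.

```python
import re


def simple_tokenize(text):
    """Hypothetical stand-in for a tokenizer: lowercased runs of word characters."""
    return re.findall(r"\w+", text.lower())


def build_words(attributes, weight="C"):
    """Return a {weight: [words]} dictionary from a mapping of attribute values.

    `attributes` maps attribute names to their (string) values; every
    non-empty value is tokenized and its words are filed under `weight`.
    """
    words = {}
    for value in attributes.values():
        if value:  # skip empty or missing values
            words.setdefault(weight, []).extend(simple_tokenize(value))
    return words


# Attribute values of an imaginary ticket entity (assumed data).
ticket = {"title": "Crash on login", "description": "Login fails with HTTP 500"}
ticket_words = build_words(ticket)
```

Here ``ticket_words`` holds a single key, ``'C'``, whose value is the flat list of words taken from both attributes.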

Yams and ``fulltext_container``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is possible in the datamodel to indicate that fulltext-indexed
attributes defined for an entity type will be used to index not the
entity itself but a related entity. This is especially useful for
composite entities. Let's take a look at (a simplified version of)
the base schema defined in CubicWeb (see :mod:`cubicweb.schemas.base`):

.. sourcecode:: python

  from yams.buildobjs import EntityType, RelationDefinition, String, Password
  from cubicweb.schema import WorkflowableEntityType

  class CWUser(WorkflowableEntityType):
      login     = String(required=True, unique=True, maxsize=64)
      upassword = Password(required=True)

  class EmailAddress(EntityType):
      address = String(required=True, fulltextindexed=True,
                       indexed=True, unique=True, maxsize=128)


  class use_email_relation(RelationDefinition):
      name = 'use_email'
      subject = 'CWUser'
      object = 'EmailAddress'
      cardinality = '*?'
      composite = 'subject'


The schema above states that there is a relation between ``CWUser`` and ``EmailAddress``
and that the ``address`` field of ``EmailAddress`` is fulltext indexed. Therefore,
in your application, if you use fulltext search to look for an email address, CubicWeb
will return the ``EmailAddress`` itself. But the object we would like to get back
is more likely to be the associated ``CWUser`` than the ``EmailAddress`` itself.

The simplest way to achieve that is to tag the ``use_email`` relation in
the datamodel:

.. sourcecode:: python

  from yams.buildobjs import RelationType

  class use_email(RelationType):
      fulltext_container = 'subject'

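The effect of such a container can be pictured with a toy model in plain Python. This is a sketch under assumptions, not how CubicWeb stores its index: the ``Entity`` dataclass and the ``indexed_words`` helper are hypothetical, and the model only assumes that the words of a contained entity end up indexed under its container.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    etype: str
    words: list  # words coming from the entity's own indexed attributes
    contained: list = field(default_factory=list)  # entities it is the container of


def indexed_words(entity):
    """Collect the entity's own words plus those of the entities it contains."""
    collected = list(entity.words)
    for child in entity.contained:
        collected.extend(indexed_words(child))
    return collected


# The CWUser is the container (the subject of ``use_email``) of its address.
email = Entity("EmailAddress", ["jdoe@example.com"])
user = Entity("CWUser", [], contained=[email])
user_words = indexed_words(user)
```

In this model, searching for the address matches the words collected for the ``CWUser``, which is the behaviour the ``fulltext_container`` tag is meant to provide.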

Customizing how entities are fetched during ``db-rebuild-fti``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``db-rebuild-fti`` will call the
:meth:`~cubicweb.entities.AnyEntity.cw_fti_index_rql_queries` class
method on your entity type.

.. automethod:: cubicweb.entities.AnyEntity.cw_fti_index_rql_queries

Now, suppose you've got a *huge* table to index: you probably don't want to
get all entities at once. Here is a simple customized example that
processes entities in blocks of 10000:

.. sourcecode:: python

    from cubicweb.entities import AnyEntity

    class MyEntityClass(AnyEntity):
        __regid__ = 'MyEntityClass'

        @classmethod
        def cw_fti_index_rql_queries(cls, req):
            # take the default RQL query and insert LIMIT / OFFSET instructions
            base_rql = super(MyEntityClass, cls).cw_fti_index_rql_queries(req)[0]
            selected, restrictions = base_rql.split(' WHERE ')
            rql_template = '%s ORDERBY X LIMIT %%(limit)s OFFSET %%(offset)s WHERE %s' % (
                selected, restrictions)
            # count how many entities you'll have to index
            count = req.execute('Any COUNT(X) WHERE X is MyEntityClass')[0][0]
            # iterate by blocks of 10000 entities
            chunksize = 10000
            for offset in range(0, count, chunksize):
                rql = rql_template % {'limit': chunksize, 'offset': offset}
                print('SENDING %s' % rql)
                yield rql

Since you have access to ``req``, you can more or less fetch whatever you want.

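To see what the chunked queries look like without a running instance, the string manipulation of the example above can be isolated in a standalone function. This is a minimal sketch: ``chunked_queries`` is a hypothetical helper, and the base query ``Any X WHERE X is MyEntityClass`` is an assumed value standing in for whatever the default ``cw_fti_index_rql_queries`` returns.

```python
def chunked_queries(base_rql, count, chunksize=10000):
    """Yield one RQL query per block of `chunksize` entities.

    `base_rql` must contain a single ' WHERE ' separator; ORDERBY, LIMIT
    and OFFSET are inserted between the selection and the restrictions.
    """
    selected, restrictions = base_rql.split(" WHERE ", 1)
    template = "%s ORDERBY X LIMIT %%(limit)s OFFSET %%(offset)s WHERE %s" % (
        selected, restrictions)
    for offset in range(0, count, chunksize):
        yield template % {"limit": chunksize, "offset": offset}


# 25000 entities (assumed count) are split into three blocks.
queries = list(chunked_queries("Any X WHERE X is MyEntityClass", count=25000))
```

The ``ORDERBY X`` clause matters here: without a stable ordering, successive ``LIMIT`` / ``OFFSET`` windows are not guaranteed to partition the entities.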

Customizing :meth:`~cubicweb.entities.adapters.IFTIndexableAdapter.get_words`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can also customize the FTI process by providing your own ``get_words()``
implementation:

.. sourcecode:: python

    from cubicweb.predicates import is_instance
    from cubicweb.entities.adapters import IFTIndexableAdapter

    class SearchIndexAdapter(IFTIndexableAdapter):
        __regid__ = 'IFTIndexable'
        __select__ = is_instance('MyEntityClass')

        def fti_containers(self, _done=None):
            """this should yield any entity that must be considered to
            fulltext-index self.entity

            CubicWeb's default implementation will look for yams'
            ``fulltext_container`` property.
            """
            yield self.entity
            yield self.entity.some_related_entity

        def get_words(self):
            # implement any logic here
            # see http://www.postgresql.org/docs/9.1/static/textsearch-controls.html
            # for the actual meaning of 'C'
            return {'C': ['any', 'word', 'I', 'want']}