[repository] set .eid on eschema when schema is loaded from the filesystem
enforcing the contract that a repository's schema should have .eid attribute of
entity schema set to the eid of the entity used to serialize them in the db.
Before this cset, this was not true during tests or for some c-c commands where
'quick_start' is set (eg db-restore, i18ncube...).
The change in server __init__ makes this assumption true during instance
creation: the serializing code was actually setting eid on schema object, but a
reference to a previously built schema was given instead of the one for the
latest created repository.
Closes #10450092
.. _fti:
Full Text Indexing in CubicWeb
------------------------------
When an attribute is tagged as *fulltext-indexable* in the datamodel,
CubicWeb will automatically trigger hooks to update the internal
fulltext index (i.e the ``appears`` SQL table) each time this attribute
is modified.
CubicWeb also provides a ``db-rebuild-fti`` command to rebuild the whole
fulltext on demand:
.. sourcecode:: bash
cubicweb@esope~$ cubicweb db-rebuild-fti my_tracker_instance
You can also rebuild the fulltext index for a given set of entity types:
.. sourcecode:: bash
cubicweb@esope~$ cubicweb db-rebuild-fti my_tracker_instance Ticket Version
In the above example, only fulltext index of entity types ``Ticket`` and ``Version``
will be rebuilt.
Standard FTI process
~~~~~~~~~~~~~~~~~~~~
Considering an entity type ``ET``, the default *fti* process is to :
1. fetch all entities of type ``ET``
2. for each entity, adapt it to ``IFTIndexable`` (see
:class:`~cubicweb.entities.adapters.IFTIndexableAdapter`)
3. call
:meth:`~cubicweb.entities.adapters.IFTIndexableAdapter.get_words` on
the adapter which is supposed to return a dictionary *weight* ->
*list of words* as expected by
:meth:`~logilab.database.fti.FTIndexerMixIn.index_object`. The
tokenization of each attribute value is done by
:meth:`~logilab.database.fti.tokenize`.
See :class:`~cubicweb.entities.adapters.IFTIndexableAdapter` for more documentation.
Yams and ``fulltext_container``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It is possible in the datamodel to indicate that fulltext-indexed
attributes defined for an entity type will be used to index not the
entity itself but a related entity. This is especially useful for
composite entities. Let's take a look at (a simplified version of)
the base schema defined in CubicWeb (see :mod:`cubicweb.schemas.base`):
.. sourcecode:: python
class CWUser(WorkflowableEntityType):
login = String(required=True, unique=True, maxsize=64)
upassword = Password(required=True)
class EmailAddress(EntityType):
address = String(required=True, fulltextindexed=True,
indexed=True, unique=True, maxsize=128)
class use_email_relation(RelationDefinition):
name = 'use_email'
subject = 'CWUser'
object = 'EmailAddress'
cardinality = '*?'
composite = 'subject'
The schema above states that there is a relation between ``CWUser`` and ``EmailAddress``
and that the ``address`` field of ``EmailAddress`` is fulltext indexed. Therefore,
in your application, if you use fulltext search to look for an email address, CubicWeb
will return the ``EmailAddress`` itself. But the objects we'd like to index
are more likely to be the associated ``CWUser`` than the ``EmailAddress`` itself.
The simplest way to achieve that is to tag the ``use_email`` relation in
the datamodel:
.. sourcecode:: python
class use_email(RelationType):
fulltext_container = 'subject'
Customizing how entities are fetched during ``db-rebuild-fti``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``db-rebuild-fti`` will call the
:meth:`~cubicweb.entities.AnyEntity.cw_fti_index_rql_limit` class
method on your entity type.
.. automethod:: cubicweb.entities.AnyEntity.cw_fti_index_rql_limit
Customizing :meth:`~cubicweb.entities.adapters.IFTIndexableAdapter.get_words`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can also customize the FTI process by providing your own ``get_words()``
implementation:
.. sourcecode:: python
from cubicweb.entities.adapters import IFTIndexableAdapter
class SearchIndexAdapter(IFTIndexableAdapter):
__regid__ = 'IFTIndexable'
__select__ = is_instance('MyEntityClass')
def fti_containers(self, _done=None):
"""this should yield any entity that must be considered to
fulltext-index self.entity
CubicWeb's default implementation will look for yams'
``fulltex_container`` property.
"""
yield self.entity
yield self.entity.some_related_entity
def get_words(self):
# implement any logic here
# see http://www.postgresql.org/docs/9.1/static/textsearch-controls.html
# for the actual signification of 'C'
return {'C': ['any', 'word', 'I', 'want']}