doc/book/en/development/datamodel/definition.rst
author Alexandre Fayolle <alexandre.fayolle@logilab.fr>
Mon, 05 Apr 2010 09:06:16 +0200
added information about the naming conventions in schema.py also: * some typo fix * note about ObjectRelation soon being deprecated * clarified a few hazy points

 .. -*- coding: utf-8 -*-

Yams *schema*
-------------

The **schema** is the core piece of a *CubicWeb* instance as it defines
the handled data model. It is based on entity types that are either already
defined in the *CubicWeb* standard library or more specific types defined
in cubes. The schema for a cube is defined in a :file:`schema.py` file or in
one or more Python files under the :file:`schema` directory (Python package).

At this point, it is important to make clear the difference between
*relation type* and *relation definition*: a *relation type* is only a relation
name with potentially other additional properties (see below), whereas a
*relation definition* is a complete triplet
"<subject entity type> <relation type> <object entity type>".

Also, it should be clear that to properly handle data migration, an
instance's schema is stored in the database, so the Python schema file
used to define it is only read when the instance is created or upgraded.

The following built-in types are available: `String`, `Int`, `Float`,
`Decimal`, `Boolean`, `Date`, `Datetime`, `Time`, `Interval`, `Byte`
and `Password`.

You'll also have access to :ref:`base CubicWeb entity types <CWBaseEntityTypes>`.

The instance schema is accessible through the .schema attribute of the
`vregistry`.  It's an instance of :class:`cubicweb.schema.Schema`, which
extends :class:`yams.schema.Schema`.

:note:
  In previous yams versions, almost all classes were available without
  any import, but they should now be explicitly imported.


Entity type
~~~~~~~~~~~
An entity type is an instance of :class:`yams.schema.EntitySchema`. Each entity type has
a set of attributes and relations, and some permissions which define who can add, read,
update or delete entities of this type.

XXX yams inheritance

Relation type
~~~~~~~~~~~~~
A relation type is an instance of :class:`yams.schema.RelationSchema`. A relation type is simply
a semantic definition of a kind of relationship that may occur in an application.

It is important to choose a good name, at least to avoid conflicts with a semantically
different relation defined in another cube (there is no namespace mechanism yet).

A relation type holds the following properties (which are hence shared between all
relation definitions of that type):

* `inlined`: boolean enabling a physical optimization that stores
  the relation in the subject entity's table, instead of creating a specific
  table for the relation. This applies to relations where the cardinality
  of subject->relation->object is 0..1 (`?`) or 1..1 (`1`) for *all* its relation
  definitions.

* `symmetric`: boolean indicating that the relation is symmetrical, which
  means that `X relation Y` implies `Y relation X`.
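
The eligibility rule for `inlined` can be read as a simple check over the
cardinality strings of every relation definition of the type. The following is
only a sketch: ``can_be_inlined`` is a hypothetical helper written for this
illustration, not a yams function.

```python
def can_be_inlined(cardinalities):
    """Tell whether a relation type may be inlined.

    `cardinalities` holds the two-character cardinality string of each
    relation definition of the type.  The first character is the one
    that applies to the subject, and it must be '?' or '1' for *all*
    definitions.
    """
    return all(card[0] in "?1" for card in cardinalities)


# Every definition has a 0..1 or 1..1 subject cardinality.
print(can_be_inlined(["?*", "1*"]))
# One definition allows several objects per subject.
print(can_be_inlined(["?*", "**"]))
```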


Relation definition
~~~~~~~~~~~~~~~~~~~
A relation definition is an instance of :class:`yams.schema.RelationDefinition`. It is a complete triplet
"<subject entity type> <relation type> <object entity type>".

When creating a new instance of that class, the corresponding
:class:`RelationType` instance is created on the fly if necessary.
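
To make the difference between a relation type and a relation definition
concrete, here is a small sketch in plain Python. It does not use the yams
classes: ``RelationDef`` is a hypothetical stand-in, and the ``Association``,
``City`` and ``located_in`` names are made up for the illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a relation definition: a complete triplet.
RelationDef = namedtuple("RelationDef", "subject rtype object")

rdefs = [
    RelationDef("Person", "works_for", "Company"),
    RelationDef("Person", "works_for", "Association"),
    RelationDef("Company", "located_in", "City"),
]

# A relation type is only the name shared by one or more definitions.
relation_types = sorted({rdef.rtype for rdef in rdefs})


def definitions_of(rtype):
    """Return every relation definition that uses the given relation type."""
    return [rdef for rdef in rdefs if rdef.rtype == rtype]


print(relation_types)
print(definitions_of("works_for"))
```

Three definitions are declared here, but only two relation types exist, since
both ``works_for`` definitions share the same type.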


Properties
``````````

* Optional properties for attributes and relations:

  - `description`: a string describing an attribute or a relation. By default
    this string will be used in the editing form of the entity, which means
    that it is supposed to help the end-user and should be flagged by the
    function `_` to be properly internationalized.

  - `constraints`: a list of conditions/constraints that the relation has to
    satisfy (c.f. `Constraints`_)

  - `cardinality`: a two character string specifying the cardinality of the
    relation. The first character defines the cardinality of the relation on
    the subject, and the second on the object. When a relation can have
    multiple subjects or objects, the cardinality applies to all,
    not on a one-to-one basis (so it must be consistent...). The possible
    values are inspired from regular expression syntax:

    * `1`: 1..1
    * `?`: 0..1
    * `+`: 1..n
    * `*`: 0..n

* optional properties for attributes:

  - `unique`: boolean indicating if the value of the attribute has to be unique
    or not within all entities of the same type (false by default)

  - `indexed`: boolean indicating if an index needs to be created for this
    attribute in the database (false by default). This is useful only if
    you know that you will have to run numerous searches on the value of this
    attribute.

  - `default`: default value of the attribute. In case of date types, the values
    which could be used correspond to the RQL keywords `TODAY` and `NOW`.

* optional properties for type `String` attributes:

  - `fulltextindexed`: boolean indicating if the attribute is part of
    the full text index (false by default) (*applicable on the type `Byte`
    as well*)

  - `internationalizable`: boolean indicating if the value of the attribute
    is internationalizable (false by default)

* optional properties for relations:

  - `composite`: string indicating that the subject (composite == 'subject')
    is composed of the objects of the relation. For the opposite case (when
    the object is composed of the subjects of the relation), set the value
    to 'object'. Composition implies that when the relation is deleted
    (which happens at least when the composite entity is deleted), the
    composed entities are deleted as well.

  - `fulltext_container`: string indicating whether the full text
    indexation of the entity on one end of the relation should be used
    to find the entity on the other end. The possible values are
    'subject' or 'object'. For instance the `use_email` relation has
    that property set to 'subject', since when performing a full text
    search people want to find the entity using an email address, and not
    the entity representing the email address.
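
The cardinality notation described above can be sketched as a pair of bounds
per character. This is an illustration only: ``CARDINALITY_BOUNDS``,
``count_allowed`` and ``check_cardinality`` are hypothetical names, not part
of yams.

```python
# Bounds implied by each cardinality character (None means no upper bound).
CARDINALITY_BOUNDS = {
    "1": (1, 1),
    "?": (0, 1),
    "+": (1, None),
    "*": (0, None),
}


def count_allowed(card_char, count):
    """Tell whether `count` related entities satisfies one cardinality character."""
    low, high = CARDINALITY_BOUNDS[card_char]
    return count >= low and (high is None or count <= high)


def check_cardinality(cardinality, objects_per_subject, subjects_per_object):
    """Check both characters of a two-character cardinality string.

    The first character bounds how many objects one subject is related
    to, the second how many subjects one object is related to.
    """
    subject_card, object_card = cardinality
    return (count_allowed(subject_card, objects_per_subject)
            and count_allowed(object_card, subjects_per_object))


# With '?*', a subject may have at most one object.
print(check_cardinality("?*", 1, 12))
print(check_cardinality("?*", 2, 0))
```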

Constraints
```````````

By default, the available constraint types are:

General Constraints
......................

* `SizeConstraint`: specifies a minimum and/or maximum size for a
  string (generic case of `maxsize`)

* `BoundConstraint`: specifies a minimum and/or maximum value for
  numeric types

* `UniqueConstraint`: identical to "unique=True"

* `StaticVocabularyConstraint`: identical to "vocabulary=(...)"

XXX Attribute, TODAY, NOW

RQL Based Constraints
......................

RQL based constraints may take three arguments. The first one is the ``WHERE``
clause of the RQL query used by the constraint. The second argument, ``mainvars``,
is the ``Any`` clause of the query. By default this includes `S`, reserved for the
subject of the relation, and `O` for the object. Additional variables can be
specified using ``mainvars``. The argument expects a single string with all
variable names separated by spaces. The last one, ``msg``, is the error message
displayed when the constraint fails. As `RQLVocabularyConstraint` never fails, the
third argument is not available for it.

* `RQLConstraint`: specifies an RQL query that has to be satisfied
  by the subject and/or the object of the relation. In this query the variables
  `S` and `O` are reserved for the relation subject and object entities.

* `RQLVocabularyConstraint`: similar to the previous type of constraint except
  that it does not express a "strong" constraint, which means it is only used to
  restrict the values listed in the drop-down menu of the editing form, but it does
  not prevent another entity from being selected.

* `RQLUniqueConstraint`: specifies an RQL query that ensures that an
  attribute is unique in a specific context. The query must **never** return more
  than a single result to be satisfied. In this query the variable `S` is
  reserved for the relation subject entity. The other variables should be
  specified with the second constructor argument (``mainvars``). This constraint
  should be used when `UniqueConstraint` doesn't fit. Here is a simple example ::

    # Check that in the same Workflow each state's name is unique.  Using
    # UniqueConstraint (or unique=True) here would prevent states in different
    # workflows from having the same name.

    # With: State S, Workflow WF, String N ; S state_of WF, S name N

    RQLUniqueConstraint('S name N, S state_of WF, Y state_of WF, Y name N',
                        mainvars='Y',
                        msg=_('workflow already have a state of that name'))
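
The rule enforced by this constraint can be restated in plain Python, without
any RQL. The sketch below only illustrates the intended semantics (a name may
be reused across workflows, but not inside one); ``duplicated_state_names`` is
a hypothetical helper and the sample data is made up.

```python
from collections import Counter


def duplicated_state_names(states):
    """Return the (workflow, name) pairs used by more than one state.

    `states` is an iterable of (state_name, workflow_name) pairs.  An
    empty result means the uniqueness rule holds.
    """
    counts = Counter((workflow, name) for name, workflow in states)
    return sorted(key for key, n in counts.items() if n > 1)


states = [("open", "bug"), ("closed", "bug"), ("open", "task")]
# 'open' appears in two different workflows, which is allowed.
print(duplicated_state_names(states))
# A second 'open' state in the 'bug' workflow breaks the rule.
print(duplicated_state_names(states + [("open", "bug")]))
```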



XXX note about how to add new constraint

.. _securitymodel:


The security model
~~~~~~~~~~~~~~~~~~

The security model of `CubicWeb` is based on `Access Control Lists`.
The main principles are:

* users and groups of users
* a user belongs to at least one group of users
* permissions (read, update, create, delete)
* permissions are assigned to groups (and not to users)

For *CubicWeb* in particular:

* we associate rights at the entities/relations schema level
* for each entity, we distinguish four kinds of permissions: `read`,
  `add`, `update` and `delete`
* for each relation, we distinguish three kinds of permissions: `read`,
  `add` and `delete` (it is not possible to `modify` a relation)
* the default groups are: `managers`, `users` and `guests`
* by default, users belong to the `users` group
* there is a virtual group called `owners` to which we
  can associate only `delete` and `update` permissions

  * we cannot add users to the `owners` group, they are
    implicitly added to it according to the context of the objects
    they own
  * the permissions of this group are only checked on `update`/`delete`
    actions if all the other groups the user belongs to do not provide
    those permissions

Setting permissions is done with the attribute `__permissions__` of entity and
relation types. The value of this attribute is a dictionary where the keys are the access types
(actions), and the values are the authorized groups or expressions.

For an entity type, the possible actions are `read`, `add`, `update` and
`delete`.

For a relation type, the possible actions are `read`, `add`, and `delete`.

For each access type, a tuple indicates the names of the authorized groups and/or
one or multiple RQL expressions to satisfy to grant access. Access is
granted if the user is in one of the listed groups or if one of the RQL conditions
is satisfied.
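
The granting rule can be sketched as follows. This is not the CubicWeb
implementation: ``is_granted`` is a hypothetical helper, plain callables stand
in for RQL expressions, and the ``project_member`` key is made up for the
illustration.

```python
def is_granted(action, permissions, user_groups, context=None):
    """Grant access if the user is in a listed group or one expression holds.

    `permissions` maps an action to a tuple mixing group names (strings)
    and callables standing in for RQL expressions.  Each callable
    receives `context` and returns a boolean.
    """
    context = context or {}
    for rule in permissions.get(action, ()):
        if isinstance(rule, str):
            if rule in user_groups:
                return True
        elif rule(context):
            return True
    return False


permissions = {
    "read": ("managers", "users", "guests"),
    "update": ("managers", "owners"),
    "delete": ("managers",),
    # Stand-in for an RQL expression: true for members of the project.
    "add": ("managers", lambda ctx: ctx.get("project_member", False)),
}

print(is_granted("read", permissions, {"guests"}))
print(is_granted("add", permissions, {"users"}, {"project_member": True}))
```

In this sketch the virtual `owners` group would simply be added to
``user_groups`` by the caller when the user owns the entity being checked.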

The standard user groups
````````````````````````

* `guests`

* `users`

* `managers`

* `owners`: virtual group corresponding to the entity's owner.
  This can only be used for the actions `update` and `delete` of an entity
  type.

It is also possible to use specific groups if they are defined in the
precreate script of the cube (``migration/precreate.py``). Groups defined in the
postcreate script or later are not available for security
purposes (in this case, a `sync_schema_props_perms` command has to
be issued in a CubicWeb shell).


Use of RQL expression for write permissions
```````````````````````````````````````````
It is possible to define RQL expressions to provide write permissions
(`add`, `delete` and `update`) on relation and entity types.

RQL expression for entity type permission:

* you have to use the class `ERQLExpression`

* the used expression corresponds to the WHERE statement of an RQL query

* in this expression, the variables `X` and `U` are pre-defined references
  respectively to the current entity (on which the action is verified) and
  to the user who sent the request

* it is possible to use, in this expression, a special relation
  "has_<ACTION>_permission" where the subject is the user and the
  object is any variable, meaning that the user needs to have
  permission to execute the action <ACTION> on the entities related
  to this variable

For RQL expressions on a relation type, the principles are the same except
for the following:

* you have to use the class `RRQLExpression` in the case of a non-final relation

* in the expression, the variables `S`, `O` and `U` are pre-defined references
  to respectively the subject and the object of the current relation (on
  which the action is being verified) and the user who executed the query

* we can also define rights over attributes of an entity (final relation),
  knowing that:

  - to define an RQL expression, we have to use the class `ERQLExpression`
    in which `X` represents the entity the attribute belongs to

  - the permissions `add` and `delete` are equivalent. Only `add`/`read`
    are actually taken into consideration.

:Note on the use of RQL expression for `add` permission:

  Using an RQL expression to control the addition of an entity or a
  relation can cause problems for the user interface: if the
  expression refers to the entity or the relation being created, the
  permissions cannot be verified before the entity is actually added
  (this is not a problem for the RQL server at all,
  because the permission checks are done after the creation). In such a
  case, the permission check methods (CubicWebEntitySchema.check_perm
  and has_perm) can indicate that the user is not allowed to create
  this entity even though the user could obtain the permission.
  To work around this problem, it is usually necessary in such a case
  to use an action that reflects the schema permissions but checks
  them properly, so that it shows up when appropriate.


Use of RQL expression for reading rights
````````````````````````````````````````

The principles are the same but with the following restrictions:

* we cannot use `RRQLExpression` on relation types for reading

* special relations "has_<ACTION>_permission" cannot be used




Defining your schema using yams
-------------------------------

Entity type definition
~~~~~~~~~~~~~~~~~~~~~~

An entity type is defined by a Python class which inherits from
:class:`yams.buildobjs.EntityType`.  The class definition contains the
description of attributes and relations for the defined entity type.
The class name corresponds to the entity type name. It is expected to
be defined in the module ``mycube.schema``.

:Note on schema definition:

 The code in ``mycube.schema`` is not meant to be executed. The class
 EntityType mentioned above is different from the EntitySchema class
 described in the previous chapter. EntityType is a helper class to
 make entity definition easier. Yams will process EntityType classes
 and create EntitySchema instances from these class definitions. Similar
 processing happens for relations.

When defining a schema using python files, you may use the following shortcuts:

- `required`: boolean indicating if the attribute is required, i.e. the subject cardinality is '1'

- `vocabulary`: specify static possible values of an attribute

- `maxsize`: integer providing the maximum size of a string (no limit by default)

For example:

.. sourcecode:: python

  # yams classes are no longer available without an explicit import
  from yams.buildobjs import EntityType, String, Date, SubjectRelation

  class Person(EntityType):
    """A person with the properties and the relations necessary for my
    application"""

    last_name = String(required=True, fulltextindexed=True)
    first_name = String(required=True, fulltextindexed=True)
    title = String(vocabulary=('Mr', 'Mrs', 'Miss'))
    date_of_birth = Date()
    works_for = SubjectRelation('Company', cardinality='?*')


The entity described above defines three attributes of type String,
last_name, first_name and title, an attribute of type Date for the date of
birth and a relation that connects a `Person` to another entity of type
`Company` through the semantic `works_for`.
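
The `required`, `vocabulary` and `maxsize` shortcuts listed earlier can be read
as abbreviations for a cardinality and for constraints. The sketch below shows
that reading with plain data structures. ``expand_shortcuts`` is a hypothetical
helper, and the constraints are represented as ``(name, arguments)`` tuples
rather than real yams constraint objects.

```python
def expand_shortcuts(required=False, vocabulary=None, maxsize=None):
    """Return the properties implied by the schema definition shortcuts."""
    props = {"constraints": []}
    if required:
        # `required=True` means the subject cardinality is '1'.
        props["subject_cardinality"] = "1"
    if vocabulary is not None:
        # `vocabulary=(...)` is identical to a StaticVocabularyConstraint.
        props["constraints"].append(
            ("StaticVocabularyConstraint", tuple(vocabulary)))
    if maxsize is not None:
        # `maxsize` is the common case of a SizeConstraint.
        props["constraints"].append(("SizeConstraint", {"max": maxsize}))
    return props


print(expand_shortcuts(required=True))
print(expand_shortcuts(vocabulary=("Mr", "Mrs", "Miss"), maxsize=4))
```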

:Naming convention:

 Entity class names must start with an uppercase letter. The common
 usage is to use ``CamelCase`` names.

 Attribute and relation names must start with a lowercase letter. The
 common usage is to use ``underscore_separated_words``. Attribute and
 relation names starting with a single underscore are permitted, to
 denote a somewhat "protected" or "private" attribute.

 In any case, identifiers starting with "CW" or "cw" are reserved for
 internal use by the framework.
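
The naming rules above can be checked mechanically. The following sketch is
one possible reading of them, not a yams feature: ``check_name`` is a
hypothetical helper, and the regular expressions are stricter than the rules
themselves (they assume ASCII letters, digits and underscores only).

```python
import re

# Entity type names start with an uppercase letter (CamelCase by usage).
ENTITY_NAME = re.compile(r"[A-Z][A-Za-z0-9]*")
# Attribute and relation names start with a lowercase letter, optionally
# preceded by a single underscore.
RELATION_NAME = re.compile(r"_?[a-z][a-z0-9_]*")
# Prefixes reserved for internal use by the framework.
RESERVED_PREFIXES = ("CW", "cw")


def check_name(name, kind):
    """Return the list of convention problems for `name`.

    `kind` is 'entity' for entity types, 'relation' for attributes and
    relations.
    """
    problems = []
    pattern = ENTITY_NAME if kind == "entity" else RELATION_NAME
    if not pattern.fullmatch(name):
        problems.append("bad %s name" % kind)
    if name.startswith(RESERVED_PREFIXES):
        problems.append("reserved prefix")
    return problems


print(check_name("Person", "entity"))
print(check_name("works_for", "relation"))
print(check_name("CWThing", "entity"))
```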


The name of the Python attribute corresponds to the name of the attribute
or the relation in *CubicWeb* application.

An attribute is defined in the schema as follows::

    attr_name = attr_type(properties)

where `attr_type` is one of the types listed above and `properties` is
a list of the properties the attribute needs to satisfy (see `Properties`_
for more details).

* it is possible to use the attribute `meta` to flag an entity type as a `meta`
  type (e.g. used to describe/categorize other entities)

.. XXX the paragraph below needs clarification and / or moving out in
.. another place

*Note*: if you end up with an `if` in the definition of your entity, this probably
means that you need two separate entities that implement the `ITree` interface and
get the result from `.children()` whichever entity is concerned.

Inheritance
```````````
XXX feed me


Definition of relations
~~~~~~~~~~~~~~~~~~~~~~~

XXX add note about defining relation type / definition

A relation is defined by a Python class inheriting from `RelationType`. The name
of the class corresponds to the name of the type. The class then contains
a description of the properties of this type of relation, and may also
contain a string for the subject and a string for the object. This makes it
possible to create the associated relation definitions at the same time (so that
the class can also carry the properties of the relation definition), for example ::

  class locked_by(RelationType):
    """relation on all entities indicating that they are locked"""
    inlined = True
    cardinality = '?*'
    subject = '*'
    object = 'CWUser'

If provided, the `subject` and `object` attributes denote the subject
and object of the various relation definitions related to the relation
type. Allowed values for these attributes are:

* a string corresponding to an entity type
* a tuple of strings corresponding to multiple entity types
* special strings such as the following:

  - "**": all types of entities
  - "*": all types of non-meta entities
  - "@": all types of meta entities but not system entities (e.g. used for
    the basic schema description)
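
The allowed values for `subject` and `object` can be illustrated by a small
expansion function. This is only a sketch of the behaviour described above:
``expand_types`` is a hypothetical helper, the ``meta`` and ``system`` flags
are an assumed way of modelling the two categories, and the ``Tag`` and
``SchemaInfo`` types are made up.

```python
def expand_types(spec, entity_types):
    """Expand a `subject`/`object` value into a sorted list of type names.

    `entity_types` maps a type name to a dict holding two hypothetical
    boolean flags, 'meta' and 'system'.
    """
    if spec == "**":
        # all types of entities
        names = entity_types
    elif spec == "*":
        # all types of non-meta entities
        names = [n for n, f in entity_types.items() if not f["meta"]]
    elif spec == "@":
        # all types of meta entities, system entities excluded
        names = [n for n, f in entity_types.items()
                 if f["meta"] and not f["system"]]
    elif isinstance(spec, str):
        # a single entity type name
        names = [spec]
    else:
        # a tuple of entity type names
        names = spec
    return sorted(names)


entity_types = {
    "Person": {"meta": False, "system": False},
    "Company": {"meta": False, "system": False},
    "Tag": {"meta": True, "system": False},
    "SchemaInfo": {"meta": True, "system": True},
}

print(expand_types("*", entity_types))
print(expand_types("@", entity_types))
```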

When a relation is not inlined and not symmetrical, and it does not require
specific permissions, it can be defined using a `SubjectRelation`
attribute in the EntityType class. The first argument of `SubjectRelation` gives
the entity type for the object of the relation.

:Naming convention:

 Although this way of defining relations uses a Python class, the
 naming convention defined earlier prevails over the PEP8 conventions
 used in the framework: relation type class names use
 ``underscore_separated_words``. 

:Historical note:

   It has been historically possible to use `ObjectRelation` which
   defines a relation in the opposite direction. This feature is soon to be
   deprecated and therefore should not be used in newly written code.

:Future deprecation note:

  In an even more remote future, it is quite possible that the
  SubjectRelation shortcut will become deprecated, in favor of the
  RelationType declaration which offers some advantages in the context
  of reusable cubes.

Definition of permissions
~~~~~~~~~~~~~~~~~~~~~~~~~~
The entity type `CWPermission` from the standard library
makes it possible to build very complex and dynamic security architectures. The schema of
this entity type is as follows:

.. sourcecode:: python

    class CWPermission(EntityType):
        """entity type that may be used to construct some advanced security configuration
        """
        name = String(required=True, indexed=True, internationalizable=True, maxsize=100)
        require_group = SubjectRelation('CWGroup', cardinality='+*',
                                        description=_('groups to which the permission is granted'))
        require_state = SubjectRelation('State',
                                        description=_("entity's state in which the permission is applicable"))
        # can be used on any entity
        require_permission = ObjectRelation('**', cardinality='*1', composite='subject',
                                            description=_("link a permission to the entity. This "
                                                          "permission should be used in the security "
                                                          "definition of the entity's type to be useful."))


Example of configuration:

.. sourcecode:: python

    class Version(EntityType):
        """a version is defining the content of a particular project's release"""

        __permissions__ = {'read':   ('managers', 'users', 'guests',),
                           'update': ('managers', 'logilab', 'owners',),
                           'delete': ('managers', ),
                           'add':    ('managers', 'logilab',
                                       ERQLExpression('X version_of PROJ, U in_group G,'
                                                 'PROJ require_permission P, P name "add_version",'
                                                 'P require_group G'),)}


    class version_of(RelationType):
        """link a version to its project. A version is necessarily linked to one and only one project.
        """
        __permissions__ = {'read':   ('managers', 'users', 'guests',),
                           'delete': ('managers', ),
                           'add':    ('managers', 'logilab',
                                  RRQLExpression('O require_permission P, P name "add_version",'
                                                 'U in_group G, P require_group G'),)
                       }
        inlined = True


This configuration indicates that an entity `CWPermission` named
"add_version" can be associated to a project and provides rights to create
new versions on this project to specific groups. It is important to notice that:

* in such case, we have to protect both the entity type "Version" and the relation
  associating a version to a project ("version_of")

* because of the genericity of the entity type `CWPermission`, we have to execute
  a unification with the groups and/or the states if necessary in the expression
  ("U in_group G, P require_group G" in the above example)
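
The unification performed by the expression on ``version_of`` can be restated
in plain Python. The sketch below only mirrors the logic of the RQL expression
and does not query anything: ``can_add_version`` is a hypothetical helper and
the ``developers`` group is made up for the illustration.

```python
def can_add_version(user_groups, project_permissions):
    """Mirror the logic of the RQL expression used for `add` on version_of.

    `project_permissions` lists the permissions attached to the project
    as (permission_name, required_groups) pairs.  Access is granted when
    a permission named "add_version" requires a group the user is in,
    which is the unification on G in the RQL expression.
    """
    groups_of_user = set(user_groups)
    return any(
        name == "add_version" and bool(groups_of_user & set(required))
        for name, required in project_permissions
    )


project_permissions = [("add_version", ["developers"])]
print(can_add_version({"users", "developers"}, project_permissions))
print(can_add_version({"guests"}, project_permissions))
```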