cubicweb: doc/book/en/02-foundation.en.txt@d3005cdc968f


.. -*- coding: utf-8 -*-

`CubicWeb` concepts
===================

A little history...
-------------------

`CubicWeb` is a web application framework developed by Logilab_ since 2001.

Entirely written in Python, `CubicWeb` publishes data from all sorts
of sources such as SQL databases, LDAP directories and versioning systems such
as Subversion.

The `CubicWeb` user interface was designed to give the end user great flexibility
in how to select and how to display content. It allows browsing the knowledge
base and displaying the results with the rendering best suited to
the context.
This flexibility gives back to the user control over the
rendering parameters that are usually reserved for developers.


We can list a couple of web applications developed with `CubicWeb`: an online
public phone directory (see http://www.118000.fr/), a system for managing
digital studies and simulations for a research lab, a tool for shared child
care (see http://garde-partagee.atoukontact.fr/), a tool to manage
software development (see http://www.logilab.org), etc.

In 2008, `CubicWeb` was ported to a new type of source: the datastore
from GoogleAppEngine_.

Global architecture
-------------------
.. image:: images/archi_globale.png

.. note::
  In practice, the client and server sides are integrated in the same
  process and interact directly, without the need for remote
  calls using Pyro. It is important to note, however, that the two
  sides, client and server, are decoupled, and that it is possible to run
  them in distinct processes to balance the load of
  your web site across one or more machines.


Terms and vocabulary
--------------------

*schema*
  the schema defines the data model of an application based on entities
  and relations, thanks to the `yams`_ library. This is the core piece
  of an application. It is initially defined in the file system and is
  stored in the database at the time an instance is created. `CubicWeb`
  provides a certain number of system entities, included automatically
  since they are necessary for the core of `CubicWeb`, and a library of
  cubes that can be explicitly included if necessary.


*entity type*
  an entity is a set of attributes; the essential attribute of
  an entity is its key, named eid

*relation type*
  entities are linked to each other by relations. In `CubicWeb`
  relations are binary: by convention we name the first item of
  a relation the `subject` and the second the `object`.

*final entity type*
  final types correspond to the basic types such as character strings,
  integers... Their main property is that they can
  only be used as the `object` of a relation. The attributes of a
  (non-final) entity are (final) entities.

*final relation type*
  a relation is said to be final if its `object` is a final type. This is
  equivalent to an entity attribute.

*relation definition*
  a relation definition is a 3-tuple (subject entity type, relation type, object entity type),
  with an associated set of properties such as cardinality, constraints...
  
*repository*
  this is the RQL server side of `CubicWeb`. Be careful not to confuse it
  with a Mercurial repository or a Debian repository.

*source*
  a data source is a container of data (RDBMS, LDAP directory...) integrated in the
  `CubicWeb` repository. This repository has at least one source, `system`, which
  contains the schema of the application, the full-text index and other
  vital information for the system.

*configuration*
  it is possible to create different configurations for an instance:

  - ``repository`` : repository only, accessible for clients using Pyro
  - ``twisted`` : web interface only, access the repository using Pyro
  - ``all-in-one`` : web interface and repository in a single process.
    The repository may or may not be accessible using Pyro.

*cube*
  a cube is a model grouping one or more data types and/or views
  to provide a specific functionality, or a complete `CubicWeb` application
  potentially using other cubes. The available cubes are located in the file
  system at `/path/to/forest/cubicweb/cubes`

*instance*
  an instance is a specific installation of a cube. All the
  configuration files required for your web application to run properly
  are grouped in an instance, which refers to the cube(s) your application
  is based on.
  For example, logilab.org and our intranet are two instances of a single
  cube, jpl, developed internally.
  The instances are defined in the directory `~/etc/cubicweb.d`.

*application*
  the term application is sometimes used to talk about an instance
  and sometimes to talk about a cube, depending on the context.
  So we would like to avoid using this term and try to use *cube* and
  *instance* instead.

*result set*
  this object contains the results of an RQL query and information on the query.

*Pyro*
  `Python Remote Object`_, distributed objects system similar to Java's RMI
  (Remote Method Invocation), which can be used for the dialog between the web
  side of the framework and the RQL repository.

.. _`Python Remote Object`: http://pyro.sourceforge.net/
.. _`yams`: http://www.logilab.org/project/yams/
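
To make the vocabulary above more concrete, here is a minimal sketch in plain
Python. It is not the `yams` API: `RelationDefinition`, `FINAL_TYPES`,
`is_final` and the sample entity types are hypothetical names chosen for
illustration only.

```python
from collections import namedtuple

# Hypothetical stand-in, not the yams API: final (basic) types can only
# appear as the object of a relation.
FINAL_TYPES = {"String", "Int", "Date"}

# A relation definition is a 3-tuple (subject entity type, relation type,
# object entity type) plus properties; only cardinality is modelled here,
# and its value is an arbitrary illustrative string.
RelationDefinition = namedtuple(
    "RelationDefinition", "subject rtype object cardinality")


def is_final(rdef):
    """A relation is final (i.e. an attribute) if its object is a final type."""
    return rdef.object in FINAL_TYPES


# `title` behaves like an attribute of Blog, `entry_of` links two entities.
title = RelationDefinition("Blog", "title", "String", "one")
entry_of = RelationDefinition("BlogEntry", "entry_of", "Blog", "many")
```

With these definitions, `is_final(title)` is true while `is_final(entry_of)`
is false, which matches the distinction between attributes and relations
between non-final entities.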


`CubicWeb` engine
-----------------

The engine in `CubicWeb` is a set of classes managing a set of objects loaded
dynamically at the startup of `CubicWeb` (*appobjects*). Those dynamic objects, based on the schema
or the library, build the final application. The different dynamic components are,
for example:

* client and server side

  - entity definitions, containing the logic which enables manipulation of application data

* client side

  - *views*, or more specifically

    - boxes
    - header and footer
    - forms
    - page templates

  - *actions*
  - *controllers*

* server side

  - notification hooks
  - notification views

The components of the engine are:

* a web front end (only twisted is available so far), transparent to dynamic objects
* an object that encapsulates the configuration
* a `registry` (`cubicweb.cwvreg`) containing the dynamic objects loaded automatically

Every *appobject* may access the instance configuration using its *config* attribute
and the registry using its *vreg* attribute.

Details of the registration process
-----------------------------------

At startup, the `registry`, or base of registers, inspects a number of directories
looking for compatible class definitions. After a registration process, the objects
are assigned to registers so that they can be selected dynamically while the
application is running.

The base class of those objects is `AppRsetObject` (module `cubicweb.common.appobject`).
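
The idea can be sketched as follows. This is a simplified illustration written
for Python 3, not the actual `cubicweb.cwvreg` code: the `register_all`
function, the fake module and the sample classes are all hypothetical, and a
namespace object stands in for the inspected directories.

```python
import inspect
import types


def register_all(namespace, registry=None):
    """Store every class defining `__registry__` and `id` in a two-level dict."""
    registry = {} if registry is None else registry
    for obj in vars(namespace).values():
        if not inspect.isclass(obj):
            continue
        regname = getattr(obj, "__registry__", None)
        oid = getattr(obj, "id", None)
        if regname is None or oid is None:
            continue  # not a registrable class
        registry.setdefault(regname, {}).setdefault(oid, []).append(obj)
    return registry


# Hypothetical classes standing in for the content of an inspected module.
class PrimaryView(object):
    __registry__ = "view"
    id = "primary"


class CardPrimaryView(PrimaryView):
    pass  # inherits the registry name and the id from its parent


class Helper(object):
    """Defines neither `__registry__` nor `id`: ignored by the sketch."""


fake_module = types.SimpleNamespace(
    PrimaryView=PrimaryView, CardPrimaryView=CardPrimaryView, Helper=Helper)
registry = register_all(fake_module)
```

After this call, `registry["view"]["primary"]` holds both view classes, which
is the situation where a selection step becomes necessary at runtime.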

XXX registers example
XXX actual details of the recording process!

Runtime objects selection
-------------------------
XXX tell why it's a cw foundation!

Application objects are stored in the registry using a two-level hierarchy:

  object's `__registry__` : object's `id` : [list of app objects]

The following rules are applied to select an object given a register, an id and
an input context:

* each object has a selector

  - its selector may be derived from a set of basic (or not :)
    selectors using the `chainall` or `chainfirst` combinators

* a selector returns a score >= 0
* a score of 0 means the object can't be applied to the input context
* the object with the greatest score is selected. If multiple objects have an
  identical score, one of them is selected randomly (this is usually a bug)
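
These rules can be sketched in a few lines. The `select` function, the two
view classes and the scores below are hypothetical, the context is reduced to
a plain list of rows, and ties are resolved by keeping the first candidate
(an arbitrary choice made for this sketch).

```python
def select(registry, regname, oid, context):
    """Return the registered class with the highest non-zero score."""
    best, best_score = None, 0
    for cls in registry[regname][oid]:
        score = cls.__selector__(context)
        # a score of 0 never beats the initial best_score, so such
        # classes are never selected
        if score > best_score:
            best, best_score = cls, score
    if best is None:
        raise LookupError("no selectable object for %s/%s" % (regname, oid))
    return best


class AnyView(object):
    @classmethod
    def __selector__(cls, context):
        return 1  # always applicable, with a low score


class NonEmptyView(object):
    @classmethod
    def __selector__(cls, context):
        # hypothetical rule: only applicable when the context has rows
        return 2 if context else 0


registry = {"view": {"list": [AnyView, NonEmptyView]}}
```

Here `select(registry, "view", "list", ["row"])` picks `NonEmptyView`, while an
empty context falls back to `AnyView`.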

The object's selector is the `__selector__` class method on the object's class.

The score is used to choose the most pertinent object when more than
one object is selectable. For instance, if you're selecting the primary
(e.g. `id = 'primary'`) view (e.g. `__registry__ = 'view'`) for a result set containing
a `Card` entity, 2 objects will probably be selectable:

* the default primary view (`accepts = 'Any'`)
* the specific `Card` primary view (`accepts = 'Card'`)

This is because primary views use the `accept_selector`, which considers the
`accepts` class attribute of the object's class. Other primary views specific to other
entity types won't be selectable in this case. And among selectable objects, the
accept selector will return a higher score to the second view since it's more
specific, so it will be selected as expected.

Usually, you won't define `__selector__` directly but rather define the
`__selectors__` tuple on the class, with ::

  __selectors__ = (sel1, sel2)

which is equivalent to ::

  __selector__ = chainall(sel1, sel2)

The former is preferred since it's shorter and it eases overriding in
subclasses (you have access to sub-selectors instead of the wrapping function).

:chainall(selectors...): if one selector returns 0, returns 0, else returns the sum of scores

:chainfirst(selectors...): returns the score of the first selector which has a non zero score
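
A possible reading of these two combinators is sketched below. The selector
signature `(cls, context)` is a simplification assumed for this example, and
`nonempty` and `one_row` are hypothetical basic selectors, so this should not
be taken as the framework's actual implementation.

```python
def chainall(*selectors):
    """Sum the scores, or return 0 as soon as one selector returns 0."""
    def selector(cls, context):
        total = 0
        for sel in selectors:
            score = sel(cls, context)
            if not score:
                return 0
            total += score
        return total
    return selector


def chainfirst(*selectors):
    """Return the first non-zero score, or 0 if every selector returns 0."""
    def selector(cls, context):
        for sel in selectors:
            score = sel(cls, context)
            if score:
                return score
        return 0
    return selector


# Hypothetical basic selectors working on a list of rows.
def nonempty(cls, rows):
    return 1 if rows else 0


def one_row(cls, rows):
    return 1 if len(rows) == 1 else 0


both = chainall(nonempty, one_row)
either = chainfirst(one_row, nonempty)
```

With a single row, `both` sums the two scores; with several rows, `one_row`
returns 0 and so does `both`, whereas `either` still returns the score of
`nonempty`.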

XXX describe standard selector (link to generated api doc!)

Example
~~~~~~~
The final goal: when viewing a Blog, we want its RSS link to point
to the entries of this blog, not to the blog entity itself.

The general idea to solve this: we define a method on entity classes
which returns the URL of the RSS feed for the entity in question, with a default
implementation on AnyEntity and a specific implementation on Blog which does
what we want.

The limitation: we are stuck when, for example, the result set contains
several Blog entities (or something else), since we don't know which entity to
call the above method on. In that case, we keep the current behaviour (i.e. a call
to limited_rql).

So we want two cases here: one for a rset containing one and only one entity,
the other for a rset containing several entities.

The class RSSIconBox already exists in web/views/boxes.py and works. Its
selector ::

  class RSSIconBox(ExtResourcesBoxTemplate):
    """just display the RSS icon on uniform result set"""
    __selectors__ = ExtResourcesBoxTemplate.__selectors__ + (nfentity_selector,)


indicates that it takes into account:

* the conditions under which the box appears (you have to go up through the parent
  classes to see the details)
* nfentity_selector, which filters on rsets containing a list of non-final entities

This corresponds to our second case. We still have to provide a more specific
component for the first case ::

  class EntityRSSIconBox(RSSIconBox):
    """just display the RSS icon on uniform result set for a single entity"""
    __selectors__ = RSSIconBox.__selectors__ + (onelinerset_selector,)


Here we add onelinerset_selector, which filters on rsets of size 1. Note
that when selectors are chained, the final score is the sum of the scores
returned by each selector (unless one of them returns zero, in which case the object
is not selectable). So here, on a rset with several entities, onelinerset_selector
makes the EntityRSSIconBox class non-selectable, and we get the
RSSIconBox class as expected. For a rset with one entity, the EntityRSSIconBox class
has a higher score than RSSIconBox, so it is the one selected.

To wrap up, what remains to be done is:

* define the content of the call method of EntityRSSIconBox
* provide the default implementation of the method returning the RSS feed URL on
  AnyEntity
* override this method in blog.Blog
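
The last two steps could look like the sketch below, written for Python 3.
Everything in it is an assumption made for illustration: the method name
`rss_feed_url`, the `/view` URL layout, the `vid=rss` parameter, the
`entry_of` relation and the minimal `AnyEntity` stand-in are not taken from
the framework.

```python
from urllib.parse import urlencode


def _rss_url(rql):
    """Build a hypothetical URL asking for the RSS view of an RQL query."""
    return "/view?" + urlencode({"rql": rql, "vid": "rss"})


class AnyEntity(object):
    """Minimal stand-in for the base entity class."""

    def __init__(self, eid):
        self.eid = eid

    def rss_feed_url(self):
        # default implementation: the feed describes the entity itself
        return _rss_url("Any X WHERE X eid %d" % self.eid)


class Blog(AnyEntity):
    def rss_feed_url(self):
        # specific implementation: the feed lists the entries of this blog
        return _rss_url("Any E WHERE E entry_of B, B eid %d" % self.eid)
```

The box for a single entity can then call `rss_feed_url()` on that entity
without testing its type, since each entity class provides its own answer.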

When to use selectors?
~~~~~~~~~~~~~~~~~~~~~~
Selectors should be used to do different things depending on
the input. Whenever an object contains an "if" which tests the
nature of `self.rset`, you should very seriously ask yourself
whether it would be better to have two different objects
with appropriate selectors.

If this is so fundamental, why don't I see them more often?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Because you're usually using base classes which hide the plumbing
of `__registry__` (almost always), `id` (often, when using "standard" objects),
registration and selectors.

API Python/RQL
--------------

The API is inspired by the standard db-api, with a `Connection` object providing
essentially the `cursor`, `rollback` and `commit` methods. The most important method is
the `execute` method of a cursor:

`execute(rqlstring, args=None, eid_key=None, build_descr=True)`

:rqlstring: the RQL query to execute (unicode)
:args: if the query contains substitutions, a dictionary containing the values to use
:eid_key:
   an implementation detail of the RQL query cache implies that if a substitution
   is used to introduce an eid *likely to raise ambiguities in the query
   type resolution*, then we have to specify the corresponding key in the dictionary
   through this argument
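
The call shape can be illustrated with a stand-in cursor that only records
what it receives. `RecordingCursor` is hypothetical and contacts no
repository; the `%(x)s` substitution syntax, the sample query and the check
on `eid_key` are assumptions of this sketch.

```python
class RecordingCursor(object):
    """Stand-in cursor: records calls instead of contacting a repository."""

    def __init__(self):
        self.calls = []

    def execute(self, rqlstring, args=None, eid_key=None, build_descr=True):
        # sketch-only sanity check: eid_key must name a key of `args`
        if eid_key is not None and (args is None or eid_key not in args):
            raise KeyError("eid_key %r must be a key of args" % eid_key)
        self.calls.append((rqlstring, args, eid_key))
        return []  # a real cursor would return a result set


cursor = RecordingCursor()
# `x` is filled in from the dictionary; since it carries an eid, its key
# is also given as `eid_key`.
cursor.execute(u"Any T WHERE X eid %(x)s, X title T", {"x": 42}, "x")
```

Passing values through `args` rather than formatting them into the query
string keeps the query text identical from one call to the next.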


The `Connection` object has the methods `commit` and `rollback`. You *should
never need to use them* when developing a web interface based on
the `CubicWeb` framework, as the framework determines the end of the transaction
depending on the success of the query execution.

.. note::
  While executing update queries (SET, INSERT, DELETE), if a query raises
  a security-related error, the current transaction is automatically
  rolled back.
  

The `Request` class (`cubicweb.web`)
------------------------------------

A request instance is created when an HTTP request is sent to the web server.
It contains information such as form parameters, the authenticated user, etc.

**Globally, a request represents a user query, either through HTTP or not
(we also talk about RQL queries on the server side, for example)**

An instance of `Request` has the following attributes:

* `user`, instance of `cubicweb.common.utils.User` corresponding to the authenticated
  user
* `form`, dictionary containing the values of a web form
* `encoding`, characters encoding to use in the response

But also:

:Session data handling:
  * `session_data()`, returns a dictionary containing all the session data
  * `get_session_data(key, default=None)`, returns the value associated to the given
    key or the value `default` if the key is not defined
  * `set_session_data(key, value)`, assigns a value to a key
  * `del_session_data(key)`, deletes the value associated to a key
    

:Cookies handling:
  * `get_cookie()`, returns a dictionary containing the value of the HTTP
    header 'Cookie'
  * `set_cookie(cookie, key, maxage=300)`, adds an HTTP `Set-Cookie` header,
    with a lifetime of 5 minutes by default (`maxage` = None
    yields a *session* cookie, which expires when the user closes the browser
    window)
  * `remove_cookie(cookie, key)`, forces a value to expire

:URL handling:
  * `url()`, returns the full URL of the HTTP request
  * `base_url()`, returns the root URL of the web application
  * `relative_path()`, returns the relative path of the request

:And more...:
  * `set_content_type(content_type, filename=None)`, adds the HTTP header
    'Content-Type'
  * `get_header(header)`, returns the value associated to an arbitrary header
    of the HTTP request
  * `set_header(header, value)`, adds an arbitrary header in the response
  * `cursor()` returns a RQL cursor on the session
  * `execute(*args, **kwargs)`, shortcut to ``.cursor().execute()``
  * `property_value(key)`, properties management (`EProperty`)
  * dictionary `data`, used to share information between components
    *while a request is executed*

Please note that this class is abstract and that a concrete implementation
is provided by the web *frontend* used (in particular *twisted* as of
today). For views or other objects executed on the server side,
most of the interface of `Request` is defined in the session associated
to the client.
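
To illustrate the session data methods listed above, here is a stand-in class
that mimics their documented behaviour. `SketchRequest` is hypothetical and is
not the `Request` class; in particular, silently ignoring a missing key in
`del_session_data` is an assumption of this sketch.

```python
class SketchRequest(object):
    """Stand-in exposing only the session data methods and `data`."""

    def __init__(self):
        self._session = {}
        self.data = {}  # per-request storage shared between components

    def session_data(self):
        return self._session

    def get_session_data(self, key, default=None):
        return self._session.get(key, default)

    def set_session_data(self, key, value):
        self._session[key] = value

    def del_session_data(self, key):
        # assumption: deleting an unknown key is not an error
        self._session.pop(key, None)


req = SketchRequest()
req.set_session_data("lang", "en")
```

A component can then read `req.get_session_data("lang")`, or supply a fallback
with `req.get_session_data("theme", "default")` when the key was never set.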

The `AppObject` class
---------------------

In general:

* we do not inherit directly from this class but from a more specific
  class such as `AnyEntity`, `EntityView`, `AnyRsetView`,
  `Action`...

* to be registrable, a subclass has to define its own registry (attribute
  `__registry__`) and its identifier (attribute `id`). Usually we do not have
  to take care of the registry, only the identifier `id`.

A certain number of attributes and methods are defined in this class
and are therefore common to all application objects:

At registration time, the following attributes are dynamically added to
the *subclasses*:

* `vreg`, the `vregistry` of the application
* `schema`, the application schema
* `config`, the application configuration

We also find on instances, the following attributes:

* `req`, `Request` instance
* `rset`, the *result set* associated to the object if necessary
* `cursor`, rql cursor on the session


:URL handling:
  * `build_url(method=None, **kwargs)`, returns an absolute URL based on
    the given arguments. The *controller* supposed to handle the response
    can be specified through the special parameter `method` (the connection
    is theoretically done automatically :).

  * `datadir_url()`, returns the directory of the application data
    (contains static files such as images, css, js...)

  * `base_url()`, shortcut to `req.base_url()`

  * `url_quote(value)`, *unicode-safe* version of the function `urllib.quote`

:Data manipulation:

  * `etype_rset(etype, size=1)`, shortcut to `vreg.etype_rset()`

  * `eid_rset(eid, rql=None, descr=True)`, returns a *result set* object for
    the given eid
  * `entity(row, col=0)`, returns the entity corresponding to the data position
    in the *result set* associated to the object

  * `complete_entity(row, col=0, skip_bytes=True)`, is equivalent to `entity` but
    also calls the method `complete()` on the entity before returning it

:Data formatting:
  * `format_date(date, date_format=None, time=False)`
  * `format_time(time)`

:And more...:

  * `external_resource(rid, default=_MARKER)`, access to a value defined in the
    configuration file `external_resource`
    
  * `tal_render(template, variables)`, 


.. note::
  When inheriting from `AppObject` (even indirectly), you *always* have to use
  **super()** to get the methods and attributes of the superclasses, and not
  use the class identifier.
  For example, instead of writing: ::

      class Truc(PrimaryView):
          def f(self, arg1):
              PrimaryView.f(self, arg1)

  you should write: ::
  
      class Truc(PrimaryView):
          def f(self, arg1):
              super(Truc, self).f(arg1)
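
One general Python reason for this advice (not necessarily the only one the
authors had in mind) is that a hard-coded parent class bypasses the method
resolution order. The classes below are hypothetical and only serve to show
the difference.

```python
class Base(object):
    def f(self, log):
        log.append("Base")


class Mixin(Base):
    def f(self, log):
        log.append("Mixin")
        super(Mixin, self).f(log)


class ExplicitView(Base):
    def f(self, log):
        log.append("ExplicitView")
        Base.f(self, log)  # hard-coded parent: ignores the MRO


class CooperativeView(Base):
    def f(self, log):
        log.append("CooperativeView")
        super(CooperativeView, self).f(log)  # follows the MRO


class A(ExplicitView, Mixin):
    pass


class B(CooperativeView, Mixin):
    pass


log_a, log_b = [], []
A().f(log_a)  # Mixin.f is silently skipped
B().f(log_b)  # every class in the MRO gets its turn
```

In `A`, the call jumps straight from `ExplicitView` to `Base`, so the
behaviour added by `Mixin` is lost; in `B`, `super()` passes through `Mixin`
before reaching `Base`.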


Standard structure for a cube
-----------------------------

A complex cube is structured as follows:

::
  
  mycube/
  |
  |-- schema.py
  |
  |-- entities/
  |
  |-- sobjects/
  |
  |-- views/
  |
  |-- test/
  |
  |-- i18n/
  |
  |-- data/
  |
  |-- migration/
  | |- postcreate.py
  | \- depends.map
  |
  |-- debian/
  |
  \-- __pkginfo__.py
    
We can use simple Python modules instead of packages, for example:

::
  
  mycube/
  |
  |-- entities.py
  |-- hooks.py
  \-- views.py
    

where:

* ``schema`` contains the schema definition (server side only)
* ``entities`` contains the entity definitions (server side and web interface)
* ``sobjects`` contains hooks and/or notification views (server side only)
* ``views`` contains the different components of the web interface (web interface only)
* ``test`` contains tests specific to the application (not installed)
* ``i18n`` contains the message catalogs for supported languages (server side and
  web interface)
* ``data`` contains arbitrary data files served statically
  (images, css, javascript files...) (web interface only)
* ``migration`` contains the initialization file for new instances
  (``postcreate.py``) and in general a file containing the `CubicWeb` dependencies
  of the cube depending on its version (``depends.map``)
* ``debian`` contains all the files that manage the Debian packaging
  (you will find there the classical structure with ``control``, ``rules``,
  ``changelog``...) (not installed)
* the file ``__pkginfo__.py`` provides meta-data on the cube, especially the
  distribution name and the current version (server side and web interface), as
  well as the sub-cubes used by this cube

The only required files are:

* the file ``__pkginfo__.py``
* the schema definition
  XXX false, we may want to have cubes which are only adding a service, no persistent data (eg embedding for instance)
author	Sylvain Thenault <sylvain.thenault@logilab.fr>
	Thu, 20 Nov 2008 16:42:22 +0100
changeset 110	d3005cdc968f