doc/book/en/C040-rql.en.txt
author Nicolas Chauvat <nicolas.chauvat@logilab.fr>
Wed, 03 Jun 2009 19:38:14 +0200
branchtreeview-tabs
changeset 2026 74edd1b34c52
parent 405 295f3cd70915
child 1165 9d325c3b0014
permissions -rw-r--r--
closing treeview-tabs branch merged into trunk.

.. -*- coding: utf-8 -*-

.. _RQL:

======================================
RQL language (Relation Query Language)
======================================

Introduction
============

Goals RQL
---------

The goal is to have a language emphasizing the way of browsing
relations. As such, attributes will be regarded as cases of
special relations (in terms of implementation, the user
language not to see virtually no difference between an attribute and a
relation).

RQL is inspired by SQL but is the highest level. A knowledge of the 
`CubicWeb` schema defining the application is necessary.

Comparison with existing languages
----------------------------------

SQL
```
RQL builds on the features of SQL but is at a higher level
(the current implementation of RQL generates SQL). For that it is limited
to the way of browsing relations and introduces variables. 
The user does not need to know the model underlying SQL, but the `CubicWeb` 
schema defining the application.

Versa
`````
Should I look in more detail, but here is already some ideas for
the moment ... Versa_ is the language most similar to what we wanted
to do, but the model underlying data being RDF, there is some
number of things such as namespaces or handling of the RDF types which 
does not interest us. On the functionality level, Versa_ is very comprehensive
including through many functions of conversion and basic types manipulation,
which may need to be guided at one time or another. 
Finally, the syntax is a little esoteric.

Sparql
``````

The query language most similar to RQL is SPARQL_, defined by the W3C to serve
for the semantic web. 


The different types of queries
------------------------------

Search (`Any`)
   This type of query can extract entities and attributes of entities.

Inserting entities (`INSERT`)
   This type of query is used to insert new entities in the database. It
   will also create direct relationships entities newly created.

Update entities, relations creation (`SET`)
   This type of query updates existing entities in the database,
   or create relations between existing entities.

Deletion of entities or relationship (`DELETE`)
   This type of query allows for the removal of entities and relations existing
   in the database.

Search Query
------------

   [ `DISTINCT`] <entity type> V1 (V2) \ *
   [ `GROUPBY` V1 (V2) \*] [ `ORDERBY` <orderterms>]
   [ `WHERE` <restriction>]
   [ `LIMIT` <value>] [ `OFFSET` <value>]

:entity type:
   Type of selected variables.
   The special type `Any` is equivalent to not specify a type.
:restriction:
   list of relations to go through whic follow the pattern
     `V1 relation V2 | <static value>`
:orderterms:
   Definition of the selection order: variable or column number followed by
   sorting method ( `ASC`, `DESC`), ASC is the default.
:note for grouped queries:
   For grouped queries (e.g., a clause `GROUPBY`), all
   selected variables must be aggregated or grouped.



- *Search for the object of identifier 53*
   ::

        Any WHERE X
        X eid 53

- *Search material such as comics, owned by syt and available*
   ::

        WHERE X Document
        X occurence_of F, F class C, C name 'Comics'
        X owned_by U, U login 'syt'
        X available true

- *Looking for people working for eurocopter interested in training*
   ::

        Any P WHERE
        P is Person, P work_for P, S name 'Eurocopter'
        P interested_by T, T name 'training'

- *Search note less than 10 days old written by jphc or ocy*
   ::

        Any N WHERE
        N is Note, N written_on D, D day> (today -10),
        N written_by P, P name 'jphc' or P name 'ocy'

- *Looking for people interested in training or living in Paris*
   ::

        Any P WHERE
        P is Person, (P interested_by T, T name 'training') OR
        (P city 'Paris')

- *The name and surname of all people*
   ::

        Any N, P WHERE
        X is Person, X name N, X first_name P

   Note that the selection of several entities generally force
   the use of "Any" because the type specification applies otherwise
   to all the selected variables. We could write here
   ::

        String N, P WHERE
        X is Person, X name N, X first_name P


Insertion query
---------------

    `INSERT` <entity type> V1 (, <entity type> V2) \ * `:` <assignments>
    [ `WHERE` <restriction>]

: assignments:
   list of relations to assign in the form `V1 relationship V2 | <static value>`

The restriction can define variables used in assignments.

Caution, if a restriction is specified, the insertion is done for 
*each line results returned by the restriction*.

- *Insert a new person named 'foo'*
   ::

        INSERT Person X: X name 'foo'

- *Insert a new person named 'foo', another called 'nice' and a 'friend' relation
  between them*
  ::

        INSERT Person X, Person Y: X name 'foo', Y name 'nice', X friend Y

- *Insert a new person named 'foo' and a 'friend' relation with an existing 
  person called 'nice'*
  ::

        INSERT Person X: X name 'foo', X friend  Y WHERE name 'nice'

Update and relation creation queries
------------------------------------
    `SET` <assignements>
    [ `WHERE` <restriction>]

Caution, if a restriction is specified, the update is done *for
each line results returned by the restriction*.

- *Renaming of the person named 'foo' to 'bar' with the first name changed*
  ::

        SET X name 'bar', X first_name 'original' where X is Person X name 'foo'

- *Insert a relation of type 'know' between objects linked by 
  the relation of type 'friend'*
  ::

        SET X know Y  WHERE X friend Y


Deletion query
--------------
    `DELETE` (<entity type> V) | (V1 relation v2 ),...
    [ `WHERE` <restriction>]

Caution, if a restriction is specified, the deletion is made *for
each line results returned by the restriction*.

- *Deletion of the person named 'foo'*
  ::

        DELETE Person X WHERE X name 'foo'

- *Removal of all relations of type 'friend' from the person named 'foo'*
  ::

        DELETE X friend Y WHERE X is Person, X name 'foo'


Undocumented (yet) type of queries
----------------------------------

**Limit / offset**
::
    
    Any P ORDERBY N LIMIT 5 OFFSET 10 WHERE P is Person, P firstname N

**Function calls**
::
    
    Any UPPER(N) WHERE P firstname N

**Exists**
::
    
    Any X ORDERBY PN,N
    WHERE X num N, X version_of P, P name PN, 
          EXISTS(X in_state S, S name IN ("dev", "ready"))
          OR EXISTS(T tags X, T name "priority")

**Left outer join**
::

    Any T,P,V WHERE T is Ticket, T concerns P, T done_in V?
    
    
**Having**
::
    
    Any X GROUPBY X WHERE X knows Y HAVING COUNT(Y) > 10

**Simple union**
::

    (Any X WHERE X is Person) UNION (Any X WHERE X is Company)
    
**Complex union**
::

     DISTINCT Any W, REF
        WITH W, REF BEING 
            (
	      (Any W, REF WHERE W is Workcase, W ref REF, 
                                 W concerned_by D, D name "Logilab")
               UNION 
              (Any W, REF WHERE W is Workcase, W ref REF, '
                                W split_into WP, WP name "WP1")
            )


Language definition
===================

Reserved keywords
-----------------
The keywords are not case sensitive.

::

     DISTINCT, INSERT, SET, DELETE,
     WHERE, AND, OR, NOT, EXISTS,
     IN, LIKE, UNION, WITH, BEING,
     TRUE, FALSE, NULL, TODAY, NOW,
     LIMIT, OFFSET,
     HAVING, GROUPBY, ORDERBY, ASC, DESC


Variables and Typing
--------------------

With RQL, we do not distinguish between entities and attributes. The
value of an attribute is considered an entity of a particular type (see
below), linked to one (real) entity by a relation called the name of
the attribute.

Entities and values to browse and/or select are represented in
the query by *variables* that must be written in capital letters.

There is a special type **Any**, referring to a non specific type.

We can restrict the possible types for a variable using the
special relation **is**.
The possible type(s) for each variable is derived from the schema
according to the constraints expressed above and thanks to the relations between
each variable.

Built-in types
``````````````

The base types supported are string (between double or single quotes),
integers or floats (the separator is the'.'), dates and
boolean. We expect to receive a schema in which types String,
Int, Float, Date and Boolean are defined.

* `String` (literal: between double or single quotes).
* `Int`, `Float` (separator being'.').
* `Date`, `Datetime`, `Time` (literal: string YYYY/MM/DD [hh:mm] or keywords
   `TODAY` and `NOW`).
* `Boolean` (keywords `TRUE` and `FALSE`).
* `Keyword` NULL.


Operators
---------

Logical Operators
```````````````````
::

     AND, OR, ','

',' is equivalent to 'AND' but with the smallest among the priority
of logical operators (see :ref:`PriorityOperators`).

Mathematical Operators
``````````````````````
::

     +, -, *, /

Comparison operators
````````````````````
::

     =, <, <=, >=, > = ~, IN, LIKE

* The operator `=` is the default operator.

* The operator `LIKE` equivalent to `~=` can be used with the
  special character `%` in a string to indicate that the chain 
  must start or finish by a prefix/suffix:
  ::

     Any X WHERE X name =~ 'Th%'
     Any X WHERE X name LIKE '%lt'

* The operator `IN` provides a list of possible values:
  ::
  
    Any X WHERE X name IN ( 'chauvat', 'fayolle', 'di mascio', 'thenault')


XXX nico: A trick <> 'bar' would not it be more convenient than NOT A
trick 'bar'?

.. _PriorityOperators:

Operators priority
``````````````````

1. '*', '/'

2. '+', '-'

3. 'and'

4. 'or'

5. ','


Advanced Features
-----------------

Functions aggregates
````````````````````
::

     COUNT, MIN, MAX, AVG, SUM

Functions on string
```````````````````
::

     UPPER, LOWER

Optional relations
``````````````````

* They allow you to select entities related or not to another.

* You must use the `?` behind the variable to specify that the relation
  toward it is optional:

   - Anomalies of a project attached or not to a version ::

       Any X, V WHERE X concerns P, P eid 42, X corrected_in V?

   - All cards and the project they document if necessary ::

       Any C, P WHERE C is Card, P? documented_by C



BNF grammar
-----------

The terminal elements are in capital letters, non-terminal in lowercase.
The value of the terminal elements (between quotes) is a Python regular
expression.
::

     statement:: = (select | delete | insert | update) ';'


     # select specific rules
     select      ::= 'DISTINCT'? E_TYPE selected_terms restriction? group? sort?

     selected_terms ::= expression ( ',' expression)*

     group       ::= 'GROUPBY' VARIABLE ( ',' VARIABLE)*

     sort        ::= 'ORDERBY' sort_term ( ',' sort_term)*

     sort_term   ::=  VARIABLE sort_method =?

     sort_method ::= 'ASC' | 'DESC'


     # delete specific rules
     delete ::= 'DELETE' (variables_declaration | relations_declaration) restriction?


     # insert specific rules
     insert ::= 'INSERT' variables_declaration ( ':' relations_declaration)? restriction?


     # update specific rules
     update ::= 'SET' relations_declaration restriction


     # common rules
     variables_declaration ::= E_TYPE VARIABLE (',' E_TYPE VARIABLE)*

     relations_declaration ::= simple_relation (',' simple_relation)*

     simple_relation ::= VARIABLE R_TYPE expression

     restriction ::= 'WHERE' relations

     relations   ::= relation (LOGIC_OP relation)*
                   | '(' relations')'

     relation    ::= 'NOT'? VARIABLE R_TYPE COMP_OP? expression
                   | 'NOT'? R_TYPE VARIABLE 'IN' '(' expression (',' expression)* ')'
                   
     expression  ::= var_or_func_or_const (MATH_OP var_or_func_or_const) *
                   | '(' expression ')'

     var_or_func_or_const ::= VARIABLE | function | constant

     function    ::= FUNCTION '(' expression ( ',' expression) * ')'

     constant    ::= KEYWORD | STRING | FLOAT | INT

     # tokens
     LOGIC_OP ::= ',' | 'GOLD' | 'AND'
     MATH_OP  ::= '+' | '-' | '/' | '*'
     COMP_OP  ::= '>' | '>=' | '=' | '<=' | '<' | '~=' | 'LIKE'

     FUNCTION ::= 'MIN' | 'MAX' | 'SUM' | 'AVG' | 'COUNT' | 'upper' | 'LOWER'

     VARIABLE ::= '[A-Z][A-Z0-9]*'
     E_TYPE   ::= '[A-Z]\w*'
     R_TYPE   ::= '[a-z_]+'

     KEYWORD  ::= 'TRUE' | 'FALSE' | 'NULL' | 'TODAY' | 'NOW'
     STRING   ::= "'([^'\]|\\.)*'" |'"([^\"]|\\.)*\"'
     FLOAT    ::= '\d+\.\d*'
     INT      ::= '\d+'


Remarks
-------

Sorting and groups
``````````````````

- For grouped queries (e.g. with a GROUPBY clause), all
  selected variables should be grouped.

- To group and/or sort by attributes, we can do: "X,L user U, U
  login L GROUPBY L, X ORDERBY L"

- If the sorting method (SORT_METHOD) is not specified, then the sorting is
  ascendant.

Negation
````````

* A query such as `Document X WHERE NOT X owned_by U` means "the
  documents have no relation `owned_by`".
* But the query `Document X WHERE NOT X owned_by U, U login "syt"`
  means "the documents have no relation `owned_by` with the user
  syt". They may have a relation "owned_by" with another user.

Identity
````````

You can use the special relation `identity` in a query to 
add an identity constraint between two variables. This is equivalent
to ``is`` in python::

   Any A WHERE A comments B, A identity B

return all objects that comment themselves. The relation
`identity` is especially useful when defining the rules for securities
with `RQLExpressions`.

Implementation
==============

Internal representation (syntactic tree)
----------------------------------------

The tree research does not contain the selected variables 
(e.g. there is only what follows "WHERE").

The insertion tree does not contain the variables inserted or relations
defined on these variables (e.g. there is only what follows "WHERE").

The removal tree does not contain the deleted variables and relations
(e.g. there is only what follows the "WHERE").

The update tree does not contain the variables and relations updated
(e.g. there is only what follows the "WHERE").

::

     Select         ((Relationship | And | Gold)?, Group?, Sort?)
     Insert         (Relations | And | Gold)?
     Delete         (Relationship | And | Gold)?
     Update         (Relations | And | Gold)?

     And            ((Relationship | And | Gold), (Relationship | And | Gold))
     Or             ((Relationship | And | Gold), (Relationship | And | Gold))

     Relationship   ((VariableRef, Comparison))

     Comparison     ((Function | MathExpression | Keyword | Constant | VariableRef) +)

     Function       (())
     MathExpression ((MathExpression | Keyword | Constant | VariableRef), (MathExpression | Keyword | Constant | VariableRef))

     Group          (VariableRef +)
     Sort           (SortTerm +)
     SortTerm       (VariableRef +)

     VariableRef    ()
     Variable       ()
     Keyword        ()
     Constant       ()


Remarks
-------

- The current implementation does not support linking two relations of type
  'is' with a OR. I do not think that the negation is  supported on this type 
  of relation (XXX FIXME to be confirmed).

- Relations defining the variables must be left to those using them. 
  For example::

     Point P where P abs X, P ord Y, P value X+Y

   is valid, but::

     Point P where P abs X, P value X+Y, P ord Y

   is not.

RQL logs
--------

You can configure the `CubicWeb` application so that you keep a log
of the queries executed against your database. To do so, you want to
edit the configuration file of your application 
``.../etc/cubicweb.d/myapp/all-in-one.conf`` and uncomment the
variable ``query-log-file``: ::

  # web application query log file
  query-log-file=/tmp/rql-myapp.log


Conclusion
==========

Limitations
-----------

It lacks at the moment:

- COALESCE

- restrictions on groups (HAVING)

and certainly other things ...

A disadvantage is that to use this language we must know the
format used (with real relation names and entities, not those viewing
in the user interface). On the other hand, we can not really bypass
that, and it is the job of a user interface to hide the RQL.


Topics
------

It would be convenient to express the schema matching
relations (non-recursive rules)::

     Document class Type <-> Document occurence_of Fiche class Type
     Sheet class Type    <-> Form collection Collection class Type
    
Therefore 1. becomes::

     Document X where
     X class C, C name 'Cartoon'
     X owned_by U, U login 'syt'
     X available true

I'm not sure that we should handle this at RQL level ...

There should also be a special relation 'anonymous'.



.. _Versa: http://uche.ogbuji.net/tech/rdf/versa/
.. _SPARQL: http://www.w3.org/TR/rdf-sparql-query/


[FIXME] see also RQL documentation in source rql/doc.