diff -r 76ab3c71aff2 -r c67bcee93248 doc/book/annexes/rql/language.rst --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/doc/book/annexes/rql/language.rst Thu Jan 08 22:11:06 2015 +0100 @@ -0,0 +1,804 @@ +.. -*- coding: utf-8 -*- + +.. _RQL: + +RQL syntax +---------- + +.. _RQLKeywords: + +Reserved keywords +~~~~~~~~~~~~~~~~~ + +:: + + AND, ASC, BEING, DELETE, DESC, DISTINCT, EXISTS, FALSE, GROUPBY, + HAVING, ILIKE, INSERT, LIKE, LIMIT, NOT, NOW, NULL, OFFSET, + OR, ORDERBY, SET, TODAY, TRUE, UNION, WHERE, WITH + +The keywords are not case sensitive. You should not use them when defining your +schema, or as RQL variable names. + + +.. _RQLCase: + +Case +~~~~ + +* Variables should be all upper-cased. + +* Relation should be all lower-cased and match exactly names of relations defined + in the schema. + +* Entity types should start with an upper cased letter and be followed by at least + a lower cased latter. + + +.. _RQLVariables: + +Variables and typing +~~~~~~~~~~~~~~~~~~~~ + +Entities and values to browse and/or select are represented in the query by +*variables* that must be written in capital letters. + +With RQL, we do not distinguish between entities and attributes. The value of an +attribute is considered as an entity of a particular type (see below), linked to +one (real) entity by a relation called the name of the attribute, where the +entity is the subject and the attribute the object. + +The possible type(s) for each variable is derived from the schema according to +the constraints expressed above and thanks to the relations between each +variable. + +We can restrict the possible types for a variable using the special relation +**is** in the restrictions. + + +.. _VirtualRelations: + +Virtual relations +~~~~~~~~~~~~~~~~~ + +Those relations may only be used in RQL query but are not actual attributes of +your entities. + +* `has_text`: relation to use to query the full text index (only for entities + having fulltextindexed attributes). + +* `identity`: relation to use to tell that a RQL variable is the same as another + when you've to use two different variables for querying purpose. On the + opposite it's also useful together with the ``NOT`` operator to tell that two + variables should not identify the same entity + + +.. _RQLLiterals: + +Literal expressions +~~~~~~~~~~~~~~~~~~~ + +Bases types supported by RQL are those supported by yams schema. Literal values +are expressed as explained below: + +* string should be between double or single quotes. If the value contains a + quote, it should be preceded by a backslash '\\' + +* floats separator is dot '.' + +* boolean values are ``TRUE`` and ``FALSE`` keywords + +* date and time should be expressed as a string with ISO notation : YYYY/MM/DD + [hh:mm], or using keywords ``TODAY`` and ``NOW`` + +You may also use the ``NULL`` keyword, meaning 'unspecified'. + + +.. _RQLOperators: + +Operators +~~~~~~~~~ + +.. _RQLLogicalOperators: + +Logical operators +````````````````` +:: + + AND, OR, NOT, ',' + +',' is equivalent to 'AND' but with the smallest among the priority of logical +operators (see :ref:`RQLOperatorsPriority`). + +.. _RQLMathematicalOperators: + +Mathematical operators +`````````````````````` + ++----------+---------------------+-----------+--------+ +| Operator | Description | Example | Result | ++==========+=====================+===========+========+ +| `+` | addition | 2 + 3 | 5 | ++----------+---------------------+-----------+--------+ +| `-` | subtraction | 2 - 3 | -1 | ++----------+---------------------+-----------+--------+ +| `*` | multiplication | 2 * 3 | 6 | ++----------+---------------------+-----------+--------+ +| / | division | 4 / 2 | 2 | ++----------+---------------------+-----------+--------+ +| % | modulo (remainder) | 5 % 4 | 1 | ++----------+---------------------+-----------+--------+ +| ^ | exponentiation | 2.0 ^ 3.0 | 8 | ++----------+---------------------+-----------+--------+ +| & | bitwise AND | 91 & 15 | 11 | ++----------+---------------------+-----------+--------+ +| `|` | bitwise OR | 32 | 3 | 35 | ++----------+---------------------+-----------+--------+ +| # | bitwise XOR | 17 # 5 | 20 | ++----------+---------------------+-----------+--------+ +| ~ | bitwise NOT | ~1 | -2 | ++----------+---------------------+-----------+--------+ +| << | bitwise shift left | 1 << 4 | 16 | ++----------+---------------------+-----------+--------+ +| >> | bitwise shift right | 8 >> 2 | 2 | ++----------+---------------------+-----------+--------+ + + +Notice integer division truncates results depending on the backend behaviour. For +instance, postgresql does. + + +.. _RQLComparisonOperators: + +Comparison operators +```````````````````` + :: + + =, !=, <, <=, >=, >, IN + + +The syntax to use comparison operators is: + + `VARIABLE attribute VALUE` + +The `=` operator is the default operator and can be omitted, i.e. : + + `VARIABLE attribute = VALUE` + +is equivalent to + + `VARIABLE attribute VALUE` + + +The operator `IN` provides a list of possible values: + +.. sourcecode:: sql + + Any X WHERE X name IN ('chauvat', 'fayolle', 'di mascio', 'thenault') + + +.. _RQLStringOperators: + +String operators +```````````````` +:: + + LIKE, ILIKE, ~=, REGEXP + +The ``LIKE`` string operator can be used with the special character `%` in +a string as wild-card: + +.. sourcecode:: sql + + -- match every entity whose name starts with 'Th' + Any X WHERE X name ~= 'Th%' + -- match every entity whose name endswith 'lt' + Any X WHERE X name LIKE '%lt' + -- match every entity whose name contains a 'l' and a 't' + Any X WHERE X name LIKE '%l%t%' + +``ILIKE`` is the case insensitive version of ``LIKE``. It's not +available on all backend (e.g. sqlite doesn't support it). If not available for +your backend, ``ILIKE`` will behave like ``LIKE``. + +`~=` is a shortcut version of ``ILIKE``, or of ``LIKE`` when the +former is not available on the back-end. + + +The ``REGEXP`` is an alternative to ``LIKE`` that supports POSIX +regular expressions: + +.. sourcecode:: sql + + -- match entities whose title starts with a digit + Any X WHERE X title REGEXP "^[0-9].*" + + +The underlying SQL operator used is back-end-dependent : + +- the ``~`` operator is used for postgresql, +- the ``REGEXP`` operator for mysql and sqlite. + +Other back-ends are not supported yet. + + +.. _RQLOperatorsPriority: + +Operators priority +`````````````````` + +#. `(`, `)` +#. `^`, `<<`, `>>` +#. `*`, `/`, `%`, `&` +#. `+`, `-`, `|`, `#` +#. `NOT` +#. `AND` +#. `OR` +#. `,` + + +.. _RQLSearchQuery: + +Search Query +~~~~~~~~~~~~ + +Simplified grammar of search query: :: + + [ `DISTINCT`] `Any` V1 (, V2) \* + [ `GROUPBY` V1 (, V2) \*] [ `ORDERBY` ] + [ `LIMIT` ] [ `OFFSET` ] + [ `WHERE` ] + [ `WITH` V1 (, V2)\* BEING ()] + [ `HAVING` ] + [ `UNION` ] + +Selection +````````` + +The fist occuring clause is the selection of terms that should be in the result +set. Terms may be variable, literals, function calls, arithmetic, etc. and each +term is separated by a comma. + +There will be as much column in the result set as term in this clause, respecting +order. + +Syntax for function call is somewhat intuitive, for instance: + +.. sourcecode:: sql + + Any UPPER(N) WHERE P firstname N + + +Grouping and aggregating +```````````````````````` + +The ``GROUPBY`` keyword is followed by a list of terms on which results +should be grouped. They are usually used with aggregate functions, responsible to +aggregate values for each group (see :ref:`RQLAggregateFunctions`). + +For grouped queries, all selected variables must be either aggregated (i.e. used +by an aggregate function) or grouped (i.e. listed in the ``GROUPBY`` +clause). + + +Sorting +``````` + +The ``ORDERBY`` keyword if followed by the definition of the selection +order: variable or column number followed by sorting method (``ASC``, +``DESC``), ``ASC`` being the default. If the sorting method is not +specified, then the sorting is ascendant (`ASC`). + + +Pagination +`````````` + +The ``LIMIT`` and ``OFFSET`` keywords may be respectively used to +limit the number of results and to tell from which result line to start (for +instance, use `LIMIT 20` to get the first 20 results, then `LIMIT 20 OFFSET 20` +to get the next 20. + + +Restrictions +```````````` + +The ``WHERE`` keyword introduce one of the "main" part of the query, where +you "define" variables and add some restrictions telling what you're interested +in. + +It's a list of triplets "subject relation object", e.g. `V1 relation +(V2 | )`. Triplets are separated using :ref:`RQLLogicalOperators`. + +.. note:: + + About the negation operator (``NOT``): + + * ``NOT X relation Y`` is equivalent to ``NOT EXISTS(X relation Y)`` + + * ``Any X WHERE NOT X owned_by U`` means "entities that have no relation + ``owned_by``". + + * ``Any X WHERE NOT X owned_by U, U login "syt"`` means "the entity have no + relation ``owned_by`` with the user syt". They may have a relation "owned_by" + with another user. + +In this clause, you can also use ``EXISTS`` when you want to know if some +expression is true and do not need the complete set of elements that make it +true. Testing for existence is much faster than fetching the complete set of +results, especially when you think about using ``OR`` against several expressions. For instance +if you want to retrieve versions which are in state "ready" or tagged by +"priority", you should write : + +.. sourcecode:: sql + + Any X ORDERBY PN,N + WHERE X num N, X version_of P, P name PN, + EXISTS(X in_state S, S name "ready") + OR EXISTS(T tags X, T name "priority") + +not + +.. sourcecode:: sql + + Any X ORDERBY PN,N + WHERE X num N, X version_of P, P name PN, + (X in_state S, S name "ready") + OR (T tags X, T name "priority") + +Both queries aren't at all equivalent : + +* the former will retrieve all versions, then check for each one which are in the + matching state of or tagged by the expected tag, + +* the later will retrieve all versions, state and tags (cartesian product!), + compute join and then exclude each row which are in the matching state or + tagged by the expected tag. This implies that you won't get any result if the + in_state or tag tables are empty (ie there is no such relation in the + application). This is usually NOT what you want. + +Another common case where you may want to use ``EXISTS`` is when you +find yourself using ``DISTINCT`` at the beginning of your query to +remove duplicate results. The typical case is when you have a +multivalued relation such as Version version_of Project and you want +to retrieve projects which have a version: + +.. sourcecode:: sql + + Any P WHERE V version_of P + +will return each project number of versions times. So you may be +tempted to use: + +.. sourcecode:: sql + + DISTINCT ANY P WHERE V version_of P + +This will work, but is not efficient, as it will use the ``SELECT +DISTINCT`` SQL predicate, which needs to retrieve all projects, then +sort them and discard duplicates, which can have a very high cost for +large result sets. So the best way to write this is: + +.. sourcecode:: sql + + ANY P WHERE EXISTS V version_of P + + +You can also use the question mark (`?`) to mark optional relations. This allows +you to select entities related **or not** to another. It is a similar concept +to `Left outer join`_: + + the result of a left outer join (or simply left join) for table A and B + always contains all records of the "left" table (A), even if the + join-condition does not find any matching record in the "right" table (B). + +You must use the `?` behind a variable to specify that the relation to +that variable is optional. For instance: + +- Bugs of a project attached or not to a version + + .. sourcecode:: sql + + Any X, V WHERE X concerns P, P eid 42, X corrected_in V? + + You will get a result set containing all the project's tickets, with either the + version in which it's fixed or None for tickets not related to a version. + + +- All cards and the project they document if any + + .. sourcecode:: sql + + Any C, P WHERE C is Card, P? documented_by C + +Notice you may also use outer join: + +- on the RHS of attribute relation, e.g. + + .. sourcecode:: sql + + Any X WHERE X ref XR, Y name XR? + + so that Y is outer joined on X by ref/name attributes comparison + + +- on any side of an ``HAVING`` expression, e.g. + + .. sourcecode:: sql + + Any X WHERE X creation_date XC, Y creation_date YC + HAVING YEAR(XC)=YEAR(YC)? + + so that Y is outer joined on X by comparison of the year extracted from their + creation date. + + .. sourcecode:: sql + + Any X WHERE X creation_date XC, Y creation_date YC + HAVING YEAR(XC)?=YEAR(YC) + + would outer join X on Y instead. + + +Having restrictions +``````````````````` + +The ``HAVING`` clause, as in SQL, may be used to restrict a query +according to value returned by an aggregate function, e.g. + +.. sourcecode:: sql + + Any X GROUPBY X WHERE X relation Y HAVING COUNT(Y) > 10 + +It may however be used for something else: In the ``WHERE`` clause, we are +limited to triplet expressions, so some things may not be expressed there. Let's +take an example : if you want to get people whose upper-cased first name equals to +another person upper-cased first name. There is no proper way to express this +using triplet, so you should use something like: + +.. sourcecode:: sql + + Any X WHERE X firstname XFN, Y firstname YFN, NOT X identity Y HAVING UPPER(XFN) = UPPER(YFN) + +Another example: imagine you want person born in 2000: + +.. sourcecode:: sql + + Any X WHERE X birthday XB HAVING YEAR(XB) = 2000 + +Notice that while we would like this to work without the HAVING clause, this +can't be currently be done because it introduces an ambiguity in RQL's grammar +that can't be handled by Yapps_, the parser's generator we're using. + + +Sub-queries +``````````` + +The ``WITH`` keyword introduce sub-queries clause. Each sub-query has the +form: + + V1(,V2) BEING (rql query) + +Variables at the left of the ``BEING`` keyword defines into which +variables results from the sub-query will be mapped to into the outer query. +Sub-queries are separated from each other using a comma. + +Let's say we want to retrieve for each project its number of versions and its +number of tickets. Due to the nature of relational algebra behind the scene, this +can't be achieved using a single query. You have to write something along the +line of: + +.. sourcecode:: sql + + Any X, VC, TC WHERE X identity XX + WITH X, VC BEING (Any X, COUNT(V) GROUPBY X WHERE V version_of X), + XX, TC BEING (Any X, COUNT(T) GROUPBY X WHERE T ticket_of X) + +Notice that we can't reuse a same variable name as alias for two different +sub-queries, hence the usage of 'X' and 'XX' in this example, which are then +unified using the special `identity` relation (see :ref:`VirtualRelations`). + +.. warning:: + + Sub-queries define a new variable scope, so even if a variable has the same name + in the outer query and in the sub-query, they technically **aren't** the same + variable. So: + + .. sourcecode:: sql + + Any W, REF WITH W, REF BEING + (Any W, REF WHERE W is Workcase, W ref REF, + W concerned_by D, D name "Logilab") + + could be written: + + .. sourcecode:: sql + + Any W, REF WITH W, REF BEING + (Any W1, REF1 WHERE W1 is Workcase, W1 ref REF1, + W1 concerned_by D, D name "Logilab") + + Also, when a variable is coming from a sub-query, you currently can't reference + its attribute or inlined relations in the outer query, you've to fetch them in + the sub-query. For instance, let's say we want to sort by project name in our + first example, we would have to write: + + .. sourcecode:: sql + + + Any X, VC, TC ORDERBY XN WHERE X identity XX + WITH X, XN, VC BEING (Any X, COUNT(V) GROUPBY X,XN WHERE V version_of X, X name XN), + XX, TC BEING (Any X, COUNT(T) GROUPBY X WHERE T ticket_of X) + + instead of: + + .. sourcecode:: sql + + Any X, VC, TC ORDERBY XN WHERE X identity XX, X name XN, + WITH X, XN, VC BEING (Any X, COUNT(V) GROUPBY X WHERE V version_of X), + XX, TC BEING (Any X, COUNT(T) GROUPBY X WHERE T ticket_of X) + + which would result in a SQL execution error. + + +Union +````` + +You may get a result set containing the concatenation of several queries using +the ``UNION``. The selection of each query should have the same number of +columns. + +.. sourcecode:: sql + + (Any X, XN WHERE X is Person, X surname XN) UNION (Any X,XN WHERE X is Company, X name XN) + + +.. _RQLFunctions: + +Available functions +~~~~~~~~~~~~~~~~~~~ + +Below is the list of aggregate and transformation functions that are supported +nativly by the framework. Notice that cubes may define additional functions. + +.. _RQLAggregateFunctions: + +Aggregate functions +``````````````````` ++------------------------+----------------------------------------------------------+ +| ``COUNT(Any)`` | return the number of rows | ++------------------------+----------------------------------------------------------+ +| ``MIN(Any)`` | return the minimum value | ++------------------------+----------------------------------------------------------+ +| ``MAX(Any)`` | return the maximum value | ++------------------------+----------------------------------------------------------+ +| ``AVG(Any)`` | return the average value | ++------------------------+----------------------------------------------------------+ +| ``SUM(Any)`` | return the sum of values | ++------------------------+----------------------------------------------------------+ +| ``COMMA_JOIN(String)`` | return each value separated by a comma (for string only) | ++------------------------+----------------------------------------------------------+ + +All aggregate functions above take a single argument. Take care some aggregate +functions (e.g. ``MAX``, ``MIN``) may return `None` if there is no +result row. + +.. _RQLStringFunctions: + +String transformation functions +``````````````````````````````` + ++-----------------------------------------------+-----------------------------------------------------------------+ +| ``UPPER(String)`` | upper case the string | ++-----------------------------------------------+-----------------------------------------------------------------+ +| ``LOWER(String)`` | lower case the string | ++-----------------------------------------------+-----------------------------------------------------------------+ +| ``LENGTH(String)`` | return the length of the string | ++-----------------------------------------------+-----------------------------------------------------------------+ +| ``SUBSTRING(String, start, length)`` | extract from the string a string starting at given index and of | +| | given length | ++-----------------------------------------------+-----------------------------------------------------------------+ +| ``LIMIT_SIZE(String, max size)`` | if the length of the string is greater than given max size, | +| | strip it and add ellipsis ("..."). The resulting string will | +| | hence have max size + 3 characters | ++-----------------------------------------------+-----------------------------------------------------------------+ +| ``TEXT_LIMIT_SIZE(String, format, max size)`` | similar to the above, but allow to specify the MIME type of the | +| | text contained by the string. Supported formats are text/html, | +| | text/xhtml and text/xml. All others will be considered as plain | +| | text. For non plain text format, sgml tags will be first removed| +| | before limiting the string. | ++-----------------------------------------------+-----------------------------------------------------------------+ + +.. _RQLDateFunctions: + +Date extraction functions +````````````````````````` + ++----------------------+----------------------------------------+ +| ``YEAR(Date)`` | return the year of a date or datetime | ++----------------------+----------------------------------------+ +| ``MONTH(Date)`` | return the month of a date or datetime | ++----------------------+----------------------------------------+ +| ``DAY(Date)`` | return the day of a date or datetime | ++----------------------+----------------------------------------+ +| ``HOUR(Datetime)`` | return the hours of a datetime | ++----------------------+----------------------------------------+ +| ``MINUTE(Datetime)`` | return the minutes of a datetime | ++----------------------+----------------------------------------+ +| ``SECOND(Datetime)`` | return the seconds of a datetime | ++----------------------+----------------------------------------+ +| ``WEEKDAY(Date)`` | return the day of week of a date or | +| | datetime. Sunday == 1, Saturday == 7. | ++----------------------+----------------------------------------+ + +.. _RQLOtherFunctions: + +Other functions +``````````````` ++-------------------+--------------------------------------------------------------------+ +| ``ABS(num)`` | return the absolute value of a number | ++-------------------+--------------------------------------------------------------------+ +| ``RANDOM()`` | return a pseudo-random value from 0.0 to 1.0 | ++-------------------+--------------------------------------------------------------------+ +| ``FSPATH(X)`` | expect X to be an attribute whose value is stored in a | +| | :class:`BFSStorage` and return its path on the file system | ++-------------------+--------------------------------------------------------------------+ +| ``FTIRANK(X)`` | expect X to be an entity used in a has_text relation, and return a | +| | number corresponding to the rank order of each resulting entity | ++-------------------+--------------------------------------------------------------------+ +| ``CAST(Type, X)`` | expect X to be an attribute and return it casted into the given | +| | final type | ++-------------------+--------------------------------------------------------------------+ + + +.. _RQLExamples: + +Examples +~~~~~~~~ + +- *Search for the object of identifier 53* + + .. sourcecode:: sql + + Any X WHERE X eid 53 + +- *Search material such as comics, owned by syt and available* + + .. sourcecode:: sql + + Any X WHERE X is Document, + X occurence_of F, F class C, C name 'Comics', + X owned_by U, U login 'syt', + X available TRUE + +- *Looking for people working for eurocopter interested in training* + + .. sourcecode:: sql + + Any P WHERE P is Person, P work_for S, S name 'Eurocopter', + P interested_by T, T name 'training' + +- *Search note less than 10 days old written by jphc or ocy* + + .. sourcecode:: sql + + Any N WHERE N is Note, N written_on D, D day> (today -10), + N written_by P, P name 'jphc' or P name 'ocy' + +- *Looking for people interested in training or living in Paris* + + .. sourcecode:: sql + + Any P WHERE P is Person, EXISTS(P interested_by T, T name 'training') OR + (P city 'Paris') + +- *The surname and firstname of all people* + + .. sourcecode:: sql + + Any N, P WHERE X is Person, X name N, X firstname P + + Note that the selection of several entities generally force + the use of "Any" because the type specification applies otherwise + to all the selected variables. We could write here + + .. sourcecode:: sql + + String N, P WHERE X is Person, X name N, X first_name P + + + Note: You can not specify several types with * ... where X is FirstType or X is SecondType*. + To specify several types explicitly, you have to do + + + .. sourcecode:: sql + + Any X WHERE X is IN (FirstType, SecondType) + + +.. _RQLInsertQuery: + +Insertion query +~~~~~~~~~~~~~~~ + + `INSERT` V1 (, V2) \ * `:` + [ `WHERE` ] + +:assignments: + list of relations to assign in the form `V1 relationship V2 | ` + +The restriction can define variables used in assignments. + +Caution, if a restriction is specified, the insertion is done for +*each line result returned by the restriction*. + +- *Insert a new person named 'foo'* + + .. sourcecode:: sql + + INSERT Person X: X name 'foo' + +- *Insert a new person named 'foo', another called 'nice' and a 'friend' relation + between them* + + .. sourcecode:: sql + + INSERT Person X, Person Y: X name 'foo', Y name 'nice', X friend Y + +- *Insert a new person named 'foo' and a 'friend' relation with an existing + person called 'nice'* + + .. sourcecode:: sql + + INSERT Person X: X name 'foo', X friend Y WHERE Y name 'nice' + +.. _RQLSetQuery: + +Update and relation creation queries +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + `SET` + [ `WHERE` ] + +Caution, if a restriction is specified, the update is done *for +each result line returned by the restriction*. + +- *Renaming of the person named 'foo' to 'bar' with the first name changed* + + .. sourcecode:: sql + + SET X name 'bar', X firstname 'original' WHERE X is Person, X name 'foo' + +- *Insert a relation of type 'know' between objects linked by + the relation of type 'friend'* + + .. sourcecode:: sql + + SET X know Y WHERE X friend Y + + +.. _RQLDeleteQuery: + +Deletion query +~~~~~~~~~~~~~~ + + `DELETE` ( V) | (V1 relation v2 ),... + [ `WHERE` ] + +Caution, if a restriction is specified, the deletion is made *for +each line result returned by the restriction*. + +- *Deletion of the person named 'foo'* + + .. sourcecode:: sql + + DELETE Person X WHERE X name 'foo' + +- *Removal of all relations of type 'friend' from the person named 'foo'* + + .. sourcecode:: sql + + DELETE X friend Y WHERE X is Person, X name 'foo' + + +.. _Yapps: http://theory.stanford.edu/~amitp/yapps/ +.. _Left outer join: http://en.wikipedia.org/wiki/Join_(SQL)#Left_outer_join +