[testlib] gather all repository access logic in one place
A refactoring of the repository access API in the tests is imminent. We plan to
move from the "old" dbapi to the new repoapi.
Gathering all impacted methods in one place helps to understand how those
methods interact and improves readability of both the patch and the resulting code.
No code change is made in this changeset. The refactoring will come
later.
# -*- coding: utf-8 -*-

"""
Diseasome data import module.
Its interface is the ``entities_from_rdf`` function.
"""

import re

RE_RELS = re.compile(r'^<(.*?)>\s<(.*?)>\s<(.*?)>\s*\.')
RE_ATTS = re.compile(r'^<(.*?)>\s<(.*?)>\s"(.*)"(\^\^<(.*?)>|)\s*\.')

MAPPING_ATTS = {'bio2rdfSymbol': 'bio2rdf_symbol',
                'label': 'label',
                'name': 'name',
                'classDegree': 'class_degree',
                'degree': 'degree',
                'size': 'size'}

MAPPING_RELS = {'geneId': 'gene_id',
                'hgncId': 'hgnc_id',
                'hgncIdPage': 'hgnc_page',
                'sameAs': 'same_as',
                'class': 'classes',
                'diseaseSubtypeOf': 'subtype_of',
                'associatedGene': 'associated_genes',
                'possibleDrug': 'possible_drugs',
                'type': 'types',
                'omim': 'omim',
                'omimPage': 'omim_page',
                'chromosomalLocation': 'chromosomal_location'}


def _retrieve_reltype(uri):
    """
    Retrieve a relation type from a URI.

    Internal function which takes a URI containing a relation type as input
    and returns the name of the relation.
    If no URI string is given, then the function returns None.
    """
    if uri:
        return uri.rsplit('/', 1)[-1].rsplit('#', 1)[-1]


def _retrieve_etype(tri_uri):
    """
    Retrieve entity type from a triple of URIs.

    Internal function which takes a tuple of three URIs as input
    and returns the type of the entity, as obtained from the
    first member of the tuple.
    """
    if tri_uri:
        return tri_uri.split('> <')[0].rsplit('/', 2)[-2].rstrip('s')


def _retrieve_structure(filename, etypes):
    """
    Retrieve a (subject, relation, object) tuples iterator from a file.

    Internal function which takes as input a file name and a tuple of
    entity types, and returns an iterator of (subject, relation, object)
    tuples.
    """
    with open(filename) as fil:
        for line in fil:
            if _retrieve_etype(line) not in etypes:
                continue
            match = RE_RELS.match(line)
            if not match:
                match = RE_ATTS.match(line)
            subj = match.group(1)
            relation = _retrieve_reltype(match.group(2))
            obj = match.group(3)
            yield subj, relation, obj


def entities_from_rdf(filename, etypes):
    """
    Return entities from an RDF file.

    Module interface function which takes as input a file name and
    a tuple of entity types, and returns an iterator on the
    attributes and relations of each entity. The attributes
    and relations are retrieved as dictionaries.

    >>> for entities, relations in entities_from_rdf('data_file', ('type_1', 'type_2')):
        ...
    """
    entities = {}
    for subj, rel, obj in _retrieve_structure(filename, etypes):
        entities.setdefault(subj, {})
        entities[subj].setdefault('attributes', {})
        entities[subj].setdefault('relations', {})
        entities[subj]['attributes'].setdefault('cwuri', unicode(subj))
        if rel in MAPPING_ATTS:
            entities[subj]['attributes'].setdefault(MAPPING_ATTS[rel],
                                                    unicode(obj))
        if rel in MAPPING_RELS:
            entities[subj]['relations'].setdefault(MAPPING_RELS[rel], set())
            entities[subj]['relations'][MAPPING_RELS[rel]].add(unicode(obj))
    return ((ent.get('attributes'), ent.get('relations'))
            for ent in entities.itervalues())
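
To see how the module's helpers split one triple into a subject, a relation name and an entity type, the following minimal sketch may help. It copies the `RE_RELS` pattern and the two string expressions from the module. The sample line and its `example.org` URIs are made up for illustration and are not taken from the Diseasome data. The names `reltype` and `etype` are hypothetical stand-ins for `_retrieve_reltype` and `_retrieve_etype`.

```python
import re

# Same pattern as RE_RELS in the module.
RE_RELS = re.compile(r'^<(.*?)>\s<(.*?)>\s<(.*?)>\s*\.')

# Hypothetical N-Triples line; the URIs are invented for this example.
LINE = ('<http://example.org/diseasome/diseases/1> '
        '<http://example.org/diseasome/associatedGene> '
        '<http://example.org/diseasome/genes/BRCA1> .')


def reltype(uri):
    # Keep what follows the last '/' and then the last '#'.
    return uri.rsplit('/', 1)[-1].rsplit('#', 1)[-1]


def etype(line):
    # Take the subject URI, then its second-to-last path segment,
    # with any trailing 's' characters stripped.
    return line.split('> <')[0].rsplit('/', 2)[-2].rstrip('s')


match = RE_RELS.match(LINE)
subj = match.group(1)
rel = reltype(match.group(2))
obj = match.group(3)

print(etype(LINE), subj, rel, obj)
```

Under these assumptions the subject's entity type comes out as `disease` and the relation as `associatedGene`, which is a key of `MAPPING_RELS`.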