Hi Maark,
Thank you for your interest! To be honest, I decided against conforming to the standard for now. It seemed much easier, both for the (currently single) consumer and for myself, to go for something more specific.
However, since you expressed interest, below is the specification I have now (this is still a draft; I am currently doing all the programming to see whether I overlooked something). But let me first try to answer your question (and possibly those from the other threads).
Background: statistics and maps are simply (certain) functions in the mathematical sense of the word, and collections are certain sets on which these functions are defined. Best example: a collection might be the set of permutations of the set {1,…,n}, a statistic might be “give me the image of 1” and a map might be “give me the inverse permutation”. The database stores the first few thousand images of each function. I call these “Values” below.
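To make the terminology concrete, here is a minimal Python sketch of the permutation example. The function names (`collection`, `image_of_one`, `inverse`) are made up for illustration and are not part of the database; permutations are written as tuples in one-line notation.

```python
from itertools import permutations


def collection(n):
    """All permutations of {1, ..., n}, as tuples in one-line notation."""
    return list(permutations(range(1, n + 1)))


def image_of_one(perm):
    """A statistic: the image of 1 under the permutation."""
    return perm[0]


def inverse(perm):
    """A map: the inverse permutation, again in one-line notation."""
    inv = [0] * len(perm)
    for position, value in enumerate(perm, start=1):
        # perm sends position to value, so the inverse sends value to position.
        inv[value - 1] = position
    return tuple(inv)


# The "Values" of a function are its images on the first few objects.
statistic_values = {p: image_of_one(p) for p in collection(3)}
map_values = {p: inverse(p) for p in collection(3)}
```

A statistic returns a number, whereas a map returns an object of a (possibly different) collection, which is why maps can be chained in front of a statistic.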
First of all, I decided to get rid of the slashes. Instead I will use “Mp00020oMp00101oSt000005”. The (mathematical) meaning is: first apply map 20, then map 101 and finally statistic 5. The slashes would have been good for backward compatibility, but I agree that they are not a very good idea otherwise.
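A compound identifier of this form could be taken apart roughly as follows. This is only a sketch: it assumes that map identifiers look like `Mp` followed by digits, statistic identifiers like `St` followed by digits, and that the separator is a lowercase `o`. The helper name `parse_compound` is hypothetical.

```python
import re

# Assumed shapes: "Mp" or "St" followed by digits, joined by a lowercase "o".
_MAP = re.compile(r"Mp\d+")
_STATISTIC = re.compile(r"St\d+")


def parse_compound(identifier):
    """Split a compound identifier into (maps, statistic).

    The maps are returned in application order: the first one is applied first.
    """
    *maps, statistic = identifier.split("o")
    if not all(_MAP.fullmatch(m) for m in maps):
        raise ValueError(f"not a chain of maps: {identifier!r}")
    if not _STATISTIC.fullmatch(statistic):
        raise ValueError(f"does not end in a statistic: {identifier!r}")
    return maps, statistic


maps, statistic = parse_compound("Mp00020oMp00101oSt000005")
```

Because neither `Mp…` nor `St…` contains a lowercase `o`, splitting on it is unambiguous, which was not the case for slashes inside a URL path.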
The main problem the layout should solve is that I can get all necessary data (and no more) with a single request. Some examples:
- a single statistic, containing its name and values (and some other data), together with a list of “exactly matching” statistics and their names (but not their values)
- a list of all statistics, containing the name of the statistic and the name of the collection it is defined on.
- a list of all compound statistics composed of at most 3 maps and a statistic, such that the values match a given list of values; the list should contain the values of the compound statistics (but not the values of the intermediate maps and the statistic it is composed of), along with the name of the maps and the statistic.
One thing I’d really like to have is that the fields to return are very easy to specify. In my use case, it seems to make sense that they depend only on the type of the object, although I have actually encountered an exception to this rule (you may spot it in example 1 above).
I always want to include all objects that are referred to in a field that is specified.
Here is my current working draft:
- At the top level, I will have essentially
{ data: [<id>, ...],
included: {<type>: {<id>: <object>, ... }, ...},
}
- every `<object>` has a `<type>`, which determines the available fields
- `(<type>, <id>)` is globally unique
- the `<id>`'s in the value list of `data` all have the same type, which is (essentially) the required resource
- `<object>` is the dictionary of fields of an object
- any field of an object may have as value an `<id>`, with an implicit type (determined by the name of the field and the type of the object)
- all these objects are included in the `included` field
- there is one special query parameter `fields[<type>]` which determines the fields actually returned for each type
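The rules above could be implemented on the server side along the following lines. This is a sketch under several assumptions that the draft does not fix: the table `REFERENCE_TYPES` and the field names in it are hypothetical, a field holds at most a single `<id>` (not a list of them), and a type without a `fields[<type>]` entry gets no fields at all.

```python
# Hypothetical table: the implicit type of an <id> stored in a given field,
# determined by (type of the object, name of the field).
REFERENCE_TYPES = {
    ("CompoundStatistics", "Statistic"): "Statistics",
    ("Statistics", "Collection"): "Collections",
}


def build_response(store, resource_type, ids, fields):
    """Assemble {"data": ..., "included": ...} for the requested ids.

    store  -- {type: {id: full object}}, standing in for the database
    fields -- {type: [field names]}, i.e. the fields[<type>] parameters
    """
    included = {}
    pending = [(resource_type, id_) for id_ in ids]
    while pending:
        type_, id_ = pending.pop()
        if id_ in included.get(type_, {}):
            continue  # (type, id) is globally unique, so include it only once
        full = store[type_][id_]
        # Assumption: a type without a fields[...] entry contributes no fields.
        wanted = fields.get(type_, [])
        obj = {name: full[name] for name in wanted if name in full}
        included.setdefault(type_, {})[id_] = obj
        # Follow references only through fields that are actually returned.
        for name, value in obj.items():
            ref_type = REFERENCE_TYPES.get((type_, name))
            if ref_type is not None:
                pending.append((ref_type, value))
    return {"data": list(ids), "included": included}
```

Following references only through the returned fields is what keeps the response to "all necessary data (and no more)": an object is included exactly when some specified field points to it.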
Using this scheme, the example in the original post, corresponding to the request `GET findstat.org/Statistics/Mp00020oMp00101oSt000005?fields["CompoundStatistics"]=Values,Distribution`, would read as follows:
{
  "data":[
    "Mp00020oMp00101oSt000005"
  ],
  "included":{
    "CompoundStatistics":{
      "Mp00020oMp00101oSt000005":{
        "Values":"[.,.] => [1,0] => [1,0] => 0\n[.,[.,.]] => [1,1,0,0] => [1,0,1,0] => 1\n[[.,.],.] => [1,0,1,0] => [1,1,0,0] => 0\n...",
        "Distribution":{
          "1":{
            "0":1
          },
          "2":{
            "0":1,
            "1":1
          }
        }
      }
    }
  }
}
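On the consumer side, reading such a response might look like the sketch below. The helper names `resolve` and `parse_values` are hypothetical, the caller is assumed to know the type of the ids in `data`, and the reading of `Values` as one ` => `-separated row per line is only inferred from the example.

```python
def resolve(response, resource_type):
    """Return the objects listed in "data", looked up in "included"."""
    objects = response["included"][resource_type]
    return [objects[id_] for id_ in response["data"]]


def parse_values(text):
    """Split a "Values" string into rows of ' => '-separated entries.

    Lines without ' => ' (such as a trailing "...") are skipped.
    """
    return [line.split(" => ") for line in text.splitlines() if " => " in line]


# The example response from above, as it would look after JSON decoding.
example = {
    "data": ["Mp00020oMp00101oSt000005"],
    "included": {
        "CompoundStatistics": {
            "Mp00020oMp00101oSt000005": {
                "Values": "[.,.] => [1,0] => [1,0] => 0\n"
                          "[.,[.,.]] => [1,1,0,0] => [1,0,1,0] => 1\n"
                          "[[.,.],.] => [1,0,1,0] => [1,1,0,0] => 0\n...",
                "Distribution": {"1": {"0": 1}, "2": {"0": 1, "1": 1}},
            }
        }
    },
}

(compound,) = resolve(example, "CompoundStatistics")
rows = parse_values(compound["Values"])
```

Each row then holds the original object, the intermediate images under the two maps, and finally the value of the statistic, all still as strings.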