Clarification on allowed characters in id

I wonder whether it is an oversight that all characters are allowed in id’s. Imagine, for example, a second article with “id”: “1/author”. How would this work together with fetching the author corresponding to the article with “id”: “1”?
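To make the ambiguity concrete, here is a minimal Python sketch. The helper `article_url` is hypothetical and not part of any JSON:API library; it only shows that an unescaped id containing a slash yields the same path as the related-resource URL, and that percent-encoding the id keeps the two apart.

```python
from urllib.parse import quote, unquote


def article_url(article_id, related=None, escape=True):
    """Build a path for an article, optionally for one of its related resources.

    Hypothetical helper for illustration only.
    """
    # safe="" makes quote() escape "/" as well, which it leaves alone by default.
    segment = quote(article_id, safe="") if escape else article_id
    path = "/articles/" + segment
    if related is not None:
        path += "/" + quote(related, safe="")
    return path


# Without escaping, two different requests collapse into one path.
ambiguous_a = article_url("1", related="author", escape=False)
ambiguous_b = article_url("1/author", escape=False)

# With escaping, the id "1/author" stays a single path segment.
escaped = article_url("1/author")
```

Escaping is only a partial answer: some servers and proxies decode `%2F` before routing, in which case the two paths become indistinguishable again.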

My motivation for asking is that I do not really understand the reason for supporting a GET request /articles/1/author - isn’t that the same as /articles/1?fields[articles]=author?

Maybe because there are plenty of cases where the id is not an int?
/notebook/{id}/note/{id}
/notebook/garden/note/1

“id”: “1/author” is not valid; member names must not contain ‘/’ or ‘\’, etc.

I may be wrong,
/articles/1/author - isn’t that the same as /articles/1?fields[articles]=author ?
The second version allows only the inclusion of the author field, where the first may include more data on author and article

But the question is about the value of “id”, not the member name. Indeed, the spec says:

The values of `type` members **MUST** adhere to the same constraints as [member names](https://jsonapi.org/format/#document-member-names).
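The distinction can be sketched as a small check in Python. This is only an illustration: `check_resource_identifier` is a hypothetical function, and the set of reserved characters is deliberately incomplete (it contains just the slash and the backslash; the full list is in the linked section of the spec).

```python
# Deliberately incomplete: see the member-names section of the spec for the full list.
RESERVED_IN_MEMBER_NAMES = {"/", "\\"}


def check_resource_identifier(resource):
    """Return a list of problems found in a {"type": ..., "id": ...} object."""
    problems = []
    # The member-name constraints apply to the value of "type" ...
    bad = sorted(RESERVED_IN_MEMBER_NAMES & set(resource["type"]))
    if bad:
        problems.append(f"type contains reserved characters: {bad}")
    # ... whereas "id" only has to be a string.
    if not isinstance(resource["id"], str):
        problems.append("id must be a string")
    return problems
```

Under this reading, `{"type": "articles", "id": "1/author"}` passes, while a slash in `type` is reported.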

In fact, for me it would probably be much better if an id is allowed to contain slashes.

Why would it be much better? You’re definitely introducing confusion. What makes it worth it?

I admit that I am actually not sure whether I am on the right track. I can only give some background here, you can find the project at https://findstat.org. It currently has no structured API.

I have the following “basic” resources:

  • “Collections”, with identifiers of the form “Cc1234”
  • “Maps”, with identifiers of the form “Mp12345”
  • “Statistics”, with identifiers of the form “St123456”

All of these have some attributes, for example “Description”. “Statistics” and “Maps” also have relationships to “Collections”, for example “Domain” and “Codomain”.

I absolutely do not want to change the above identifiers. In particular, it makes sense to GET such a resource using “Statistics/St123456”.

A user will always have to specify the fields she wants to get, otherwise she only gets “type” and “id”.

Then I have a number of “compound” or “virtual” resources.

  • “CompoundMaps” with identifiers of the form “Mp00020/Mp00101”
  • “CompoundStatistics”, with identifiers of the form “Mp00020/Mp00101/St000005”.

One should be able to GET these using “Statistics/Mp00020/Mp00101/St000005” as a compound document (i.e., not using the endpoint “CompoundStatistics”.)

I realize that this introduces ambiguity: the type of “Statistics/St000005” is “Statistics”, although one might think it is “CompoundStatistics”. (Maybe I should rename “Statistics” to “BasicStatistics” and “CompoundStatistics” to “Statistics”…)

I thought I would represent “CompoundStatistics” as follows:

{
  "type":"CompoundStatistics",
  "id":"Mp00020/Mp00101/St000005",
  "attributes":{
    "Values":"[.,.] => [1,0] => [1,0] => 0\n[.,[.,.]] => [1,1,0,0] => [1,0,1,0] => 1\n[[.,.],.] => [1,0,1,0] => [1,1,0,0] => 0\n...",
    "Distribution":{
      "1":{
        "0":1
      },
      "2":{
        "0":1,
        "1":1
      }
    }
  },
  "relationships":{
    "Maps":{
      "data":[
        {
          "type":"Maps",
          "id":"Mp00020"
        },
        {
          "type":"Maps",
          "id":"Mp00101"
        }
      ]
    },
    "Statistic":{
      "data":{
        "type":"Statistics",
        "id":"St000005"
      }
    }
  }
}

I dislike the “relationships” member quite a bit; it is, and should be, completely redundant.
It exists only because of the full-linkage requirement.
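The redundancy can be illustrated with a short Python sketch that derives the relationships from the identifier alone. The function name is hypothetical, and the sketch assumes that both “Maps” and “Statistic” belong under “relationships” and that the last component of the identifier is always the statistic.

```python
def relationships_from_id(compound_id):
    """Derive the relationships of a CompoundStatistics resource from its id.

    Assumes an id of the form "Mp.../Mp.../St...", with the statistic last.
    """
    *map_ids, statistic_id = compound_id.split("/")
    return {
        "Maps": {
            "data": [{"type": "Maps", "id": map_id} for map_id in map_ids]
        },
        "Statistic": {
            "data": {"type": "Statistics", "id": statistic_id}
        },
    }


derived = relationships_from_id("Mp00020/Mp00101/St000005")
```

Everything in `derived` is already encoded in the id, which is why the member carries no extra information.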

  • “MatchingStatistics”, whose identifier will possibly be an identifier of a “CompoundStatistic” together with a running index, not sure about that yet.

I’m not sure what you mean by compound. Right now, your layout seems overcomplicated to me. Slashes in IDs are definitely a bad idea.

I’d like to know more about what maps, collections, stats, and compound resources are, just for fun. I want to see if I can come up with something simpler.

Hi Maark,

thank you for your interest! To be honest, I decided against conforming to the standard for the moment. It appeared much easier, both for the (currently single) consumer and for myself, to go for something more specific.

However, since you expressed interest, below is the specification I have now. (This is still a draft; I am currently doing all the programming to see whether I have overlooked something.) But let me first try to answer your question (and possibly those from the other threads).

Background: statistics and maps are simply (certain) functions in the mathematical sense of the word, and collections are certain sets on which these functions are defined. Best example: a collection might be the set of permutations of the set {1,…,n}, a statistic might be “give me the image of 1” and a map might be “give me the inverse permutation”. The database stores the first few thousand images of each function. I call these “Values” below.
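For readers who prefer code, the example can be sketched in Python. The function names are made up for illustration, and permutations are assumed to be written in one-line notation as tuples, so that `perm[i - 1]` is the image of `i`.

```python
from itertools import permutations


def inverse(perm):
    """Map: return the inverse of a permutation given in one-line notation."""
    inv = [0] * len(perm)
    for position, image in enumerate(perm, start=1):
        inv[image - 1] = position
    return tuple(inv)


def image_of_one(perm):
    """Statistic: return the image of 1."""
    return perm[0]


def values(function, n):
    """Tabulate a function on the collection of all permutations of {1, ..., n}."""
    return {perm: function(perm) for perm in permutations(range(1, n + 1))}
```

Here `values(image_of_one, 3)` plays the role of the stored “Values” of a statistic, restricted to n = 3.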

First of all, I decided to get rid of the slashes. Instead I will use “Mp00020oMp00101oSt000005”. The (mathematical) meaning is: first apply map 20, then map 101 and finally statistic 5. The slashes would have been good for backward compatibility, but I agree that they are not a very good idea otherwise.
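A minimal Python sketch of how such an identifier could be taken apart and applied. The digit counts in the pattern are inferred from the examples in this thread, and `registry` is a hypothetical mapping from identifiers to functions.

```python
import re

# Assumption: maps look like "Mp" + 5 digits, statistics like "St" + 6 digits.
COMPOUND_RE = re.compile(r"(?:Mp\d{5}o)*St\d{6}")


def parse_compound_statistic(identifier):
    """Split "Mp00020oMp00101oSt000005" into (["Mp00020", "Mp00101"], "St000005")."""
    if not COMPOUND_RE.fullmatch(identifier):
        raise ValueError(f"not a compound statistic identifier: {identifier!r}")
    # "o" cannot occur inside a basic identifier, so splitting is unambiguous.
    *maps, statistic = identifier.split("o")
    return maps, statistic


def apply_compound(identifier, registry, x):
    """Apply the maps from left to right, then the statistic."""
    maps, statistic = parse_compound_statistic(identifier)
    for name in maps + [statistic]:
        x = registry[name](x)
    return x
```

Because the separator never occurs in a basic identifier, no escaping is needed, which is the practical advantage over slashes.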

The main problem the layout should solve is that I can get all necessary data (and no more) with a single request. Some examples:

  1. a single statistic, containing its name and values (and some other data), together with a list of “exactly matching” statistics and their names (but not their values)
  2. a list of all statistics, containing the name of the statistic and the name of the collection it is defined on.
  3. a list of all compound statistics composed of at most 3 maps and a statistic, such that the values match a given list of values; the list should contain the values of the compound statistics (but not the values of the intermediate maps and the statistic it is composed of), along with the name of the maps and the statistic.

One thing I’d really like to have is that the fields to return are very easy to specify. In my use case, it seems to make sense that they depend on the type of the object only, although I have actually encountered an exception to this rule. (You may spot it in example 1 above.)

I always want to include all objects that are referred to in a field that is specified.

Here is my current working draft:

  • At the top level, I will have essentially
{ data: [<id>, ...],
  included: {<type>: {<id>: <object>, ... }, ...},
}
  • every <object> has a <type>, which determines the available fields
  • (<type>, <id>) is globally unique
  • the <id>'s in the value list of data all have the same type, which is (essentially) the required resource
  • <object> is the dictionary of fields of an object
  • any field of an object may have as value an <id>, with an implicit type (determined by the name of the field and the type of the object)
  • all these objects are included in the included field
  • there is one special query parameter fields[<type>] which determines the fields actually returned for each type
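The rules above can be sketched in Python as follows. This is only one possible reading of the draft: `REFERENCE_TYPES`, `build_response`, the in-memory `store` and all sample values are invented for illustration.

```python
# Hypothetical: which (type, field) pairs hold an id, and the implied target type.
REFERENCE_TYPES = {("Statistics", "Domain"): "Collections"}


def build_response(resource_type, ids, store, fields):
    """Assemble {"data": [...], "included": {...}} for the requested ids."""
    included = {}
    pending = [(resource_type, rid) for rid in ids]
    while pending:
        rtype, rid = pending.pop()
        if rid in included.get(rtype, {}):
            continue  # (type, id) is globally unique, so include it only once
        obj = store[rtype][rid]
        # Only fields named in fields[<type>] are returned.
        selected = {name: obj[name] for name in fields.get(rtype, []) if name in obj}
        included.setdefault(rtype, {})[rid] = selected
        # Every object referred to by a returned field is included as well.
        for name, value in selected.items():
            target = REFERENCE_TYPES.get((rtype, name))
            if target is not None:
                pending.append((target, value))
    return {"data": list(ids), "included": included}


# Made-up sample data.
store = {
    "Statistics": {"St000005": {"Name": "example statistic", "Domain": "Cc0001", "Values": "..."}},
    "Collections": {"Cc0001": {"Name": "example collection"}},
}
response = build_response(
    "Statistics", ["St000005"], store,
    fields={"Statistics": ["Name", "Domain"], "Collections": ["Name"]},
)
```

Since `Values` is not requested, it is omitted, while the collection referred to by `Domain` is pulled into `included`.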

Using this scheme, the example in the original post, corresponding to the request `GET findstat.org/Statistics/Mp00020oMp00101oSt000005?fields["CompoundStatistics"]=Values,Distribution`, would read as follows:

{
  "data":[
    "Mp00020oMp00101oSt000005"
  ],
  "included":{
    "CompoundStatistics":{
      "Mp00020oMp00101oSt000005":{
        "Values":"[.,.] => [1,0] => [1,0] => 0\n[.,[.,.]] => [1,1,0,0] => [1,0,1,0] => 1\n[[.,.],.] => [1,0,1,0] => [1,1,0,0] => 0\n...",
        "Distribution":{
          "1":{
            "0":1
          },
          "2":{
            "0":1,
            "1":1
          }
        }
      }
    }
  }
}
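On the server side, the `fields[<type>]` parameter of such a request could be read with the Python standard library, roughly as below. `parse_fields` is a hypothetical helper; it accepts the type name with or without straight double quotes, since both spellings appear in this thread.

```python
import re
from urllib.parse import parse_qs, urlsplit

FIELDS_KEY = re.compile(r"fields\[(.+)\]")


def parse_fields(url):
    """Return {type: [field, ...]} for every fields[<type>] query parameter."""
    result = {}
    for key, raw_values in parse_qs(urlsplit(url).query).items():
        match = FIELDS_KEY.fullmatch(key)
        if match is None:
            continue  # some other query parameter
        type_name = match.group(1).strip('"')
        result[type_name] = [
            field for value in raw_values for field in value.split(",") if field
        ]
    return result


requested = parse_fields(
    "https://findstat.org/Statistics/Mp00020oMp00101oSt000005"
    "?fields[CompoundStatistics]=Values,Distribution"
)
```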

I can’t help thinking that the topic of “composite” keys is of some potential relevance here.
To identify a single resource, the spec wants you to use a unique ID to do so. Building that ID up as a composite of other IDs is certainly possible but can run into namespace issues or allowable character issues.

Have a look at some of these for potential inspiration: