Composite ID inside the resource object


#1

Hi!

We have a use case where clients of the API have to create a new resource using their own IDs. More technically, clients do a POST on /a/{idOfA}/b/{idOfB}, where a and b are in a many-to-many relation. Therefore we would like to have something like:

{ "data": { 
     "id": ["idOfA", "idOfB"],
     ...
}}

Or even better to have JSON object as ID instead of array like:

{ "data": { 
    "id": {
      "a": "idOfA",
      "b": "idOfB"
    },
    ...
}}

Are there any plans to support that? Or how do you guys solve such issues? I know we could put something like "id": "idOfA;idOfB" and then parse it, but it is ugly! :smiley:

Thanks!


#2

The specification supports two ways of doing this.

Option 1
Create both resources then update the relationship.

POST Location: http://example.com/a/{idOfa} Content-Type: application/vnd.api+json

POST Location: http://example.com/b/{idOfb} Content-Type: application/vnd.api+json

Then update the relationship using a PATCH to one of the resources: http://jsonapi.org/format/#crud-updating-resource-relationships

PATCH http://www.example.com/a/{idOfa} Content-Type: application/vnd.api+json Accept: application/vnd.api+json

{
      "data": {
              "type": "a",
               "id": "{idOfa}",
              "relationships": {
                     "b": {
                         "data": { "type": "b", "id": "{idOfb}" }
   // Snip for brevity

If there is back linking from b to a then I recommend that the server maintains that relationship.

Option 2
The other way is to POST article A with the relationships included.

POST http://www.example.com/a/{idOfa} Content-Type: application/vnd.api+json Accept: application/vnd.api+json

{
  "data": {
          "type": "a",
           "id": "{idOfa}",
          "relationships": {
                 "b": {
                     "data": { "type": "b", "id": "{idOfb}" }
// Snip for brevity
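
For reference, the request body that Option 2 describes can be sketched in Python as below. This is only an illustration: the helper name `resource_with_relationship` and the literal IDs are made up for the example, and the to-one linkage mirrors the snippet above (a to-many relationship would use an array under "data" instead).

```python
import json


def resource_with_relationship(type_a, id_a, rel_name, type_b, id_b):
    """Build a document for resource A with a to-one relationship to B.

    Hypothetical helper, not part of any JSON:API library.
    """
    return {
        "data": {
            "type": type_a,
            "id": id_a,
            "relationships": {
                rel_name: {
                    # Resource linkage: a resource identifier object (type + id).
                    "data": {"type": type_b, "id": id_b}
                }
            },
        }
    }


body = resource_with_relationship("a", "idOfA", "b", "b", "idOfB")
print(json.dumps(body, indent=2))
```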

#3

@brainwipe first of all thank you for the suggestion to model the problem with relations.
However, this is not an answer to the original problem (the question wasn’t properly defined, sorry).

Let me clarify it:

  • Some entities have a natural unique key (https://en.wikipedia.org/wiki/Natural_key) coming from business
    Example: consider Product that is defined by EAN (European Article Number). In this case EAN is used as “id” attribute as it uniquely identifies a product (at least in Europe).

  • Other entities don’t have a unique key that comes from the business, but they do have a surrogate unique key (https://en.wikipedia.org/wiki/Surrogate_key) that is generated by the database (using the SERIAL type in Postgres, for example). In this case it can also be used in the “id” attribute.

  • And there are entities that have compound keys and no surrogate keys. The problem is that they don’t fit into a string-typed “id” attribute.

In your proposal above there is an assumption that both entities “a” and “b” have scalar keys (either natural or surrogate).

Please consider this somewhat contrived example:
We are modeling an inventory microservice that is capable of saving quantities of products per merchant. Some other microservices are responsible for the Merchant and Product entities, so there is a Quantity entity that our Inventory microservice is responsible for.

And it’s not clear how to expose Quantity through a JSON-API-compliant API, because it’s not clear what to put in its “id” attribute.


#4

Why not simply use <merchantID>_<ean> (or any other separator character that works well for you) as the ID for the Quantity resource?


#5

Let’s compare:

id as an underscore-separated string

"id": "1234_5678"

with

id as a JSON object

"id": {
  "merchant": "1234",
  "ean": "5678"
}

  • Separator-char question: why underscore and not anything else (dash, slash, pipe…)? Different APIs will use different separator characters, which is bad for obvious reasons. Now let’s say the spec prescribes always using ‘_’:
    How can we guarantee that parts of the composite key never start or end with an underscore, which would make parsing ambiguous? Consider a merchant-id or ean such as _0AB345TYX_, which after concatenation looks like _0AB345TYX__5678.

  • Ordering question: just by looking at the response (1234_5678) it’s hard to say which value belongs to merchant-id and which to ean. The client (program or human) must hardcode or remember the convention and always follow it. That makes the data structure less self-descriptive.

  • Unmarshalling question: a typical backend models a composite key with some data structure.
    Example in Scala:

case class QuantityId(merchantId: MerchantId, ean: EAN13)

Every JSON library makes it easy to translate JSON <--> Class if JSON fields are 1:1 with class attributes, otherwise a custom marshaller/unmarshaller must be written to support _-separated values.

So, while separated strings may work in certain cases, it’s a bad design choice in the general case because it lacks structure.
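
The separator and unmarshalling points above can be illustrated with a small Python sketch. The names `join_id`, `to_json` and `from_json` are hypothetical, and the dataclass is only a rough Python analogue of the Scala case class, not a translation of any real code.

```python
import json
from dataclasses import asdict, dataclass


def join_id(merchant: str, ean: str) -> str:
    """Naive underscore-separated composite id."""
    return f"{merchant}_{ean}"


# The example from the post: the boundary between the parts is no longer obvious.
joined = join_id("_0AB345TYX_", "5678")

# Two different key pairs collapse into the same string id.
collision = join_id("a_b", "c") == join_id("a", "b_c")


@dataclass(frozen=True)
class QuantityId:
    """Structured composite key, fields map 1:1 onto a JSON object."""
    merchant: str
    ean: str


def to_json(qid: QuantityId) -> str:
    return json.dumps(asdict(qid))


def from_json(text: str) -> QuantityId:
    return QuantityId(**json.loads(text))


qid = QuantityId(merchant="_0AB345TYX_", ean="5678")
round_tripped = from_json(to_json(qid))
print(joined, collision, round_tripped == qid)
```

The structured form round-trips without any custom parsing, whereas the joined string needs an escaping rule before it can be split reliably.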


#6

Different APIs will use different separator characters, which is bad for obvious reasons.

I don’t agree that this is necessarily bad, but if it bothers you then you could always create a unique surrogate key for Quantity then just use the merchant ID or EAN as search criteria when looking for a Quantity.


#7

you could always create a unique surrogate key for Quantity

I challenge this assumption. Creating a surrogate unique key means it has to be persisted. For a stateless-facade application that is quite a heavy requirement.


#8

If you’re never going to use the ID for a Quantity resource (you’re always going to search via merchant and EAN) then you could just assign a transient UUID - the spec doesn’t require an id to be persistent, just that it (when combined with a type) must map to at most one resource: http://jsonapi.org/format/#document-resource-object-identification


#9

It’s not clear from the spec whether id should be stable (the same value returned if requested multiple times, sequentially or in parallel). Transient ids generated with something like UUID.randomUUID() don’t possess this property.
Since id represents resource object’s identity, it makes full sense not to change it from request to request.


#10

Since id represents resource object’s identity, it makes full sense not to change it from request to request.

Sure, but I didn’t necessarily mean transient as varying per request, just that if your API crashed and needed to be restarted the same Quantity could have a different ID.
As I understand it, none of your clients ever use this ID for anything; they always use merchant + EAN, and you just need to put something in the ID to be compliant with the JSON API spec, not because you actually want to use it. Or have I misunderstood?


#11

Yes, none of our applications needs a surrogate id key, as we’re using a tuple of (merchant-id, ean) as a key.
Regarding the transient id: please also consider multiple-stateless-nodes setup where every subsequent request can be load balanced to another node that will serve another id.
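
A rough Python sketch of the concern: the `Node` class below is a hypothetical stand-in for one stateless server process that invents a random UUID the first time it sees a (merchant-id, ean) pair and keeps it only in memory. Nothing here comes from a real deployment.

```python
import uuid


class Node:
    """Hypothetical stateless node holding transient ids in memory only."""

    def __init__(self):
        self._ids = {}

    def id_for(self, merchant_id: str, ean: str) -> str:
        # Generate a random id on first sight, then reuse it for this process.
        key = (merchant_id, ean)
        if key not in self._ids:
            self._ids[key] = str(uuid.uuid4())
        return self._ids[key]


node_a = Node()
node_b = Node()

# Stable as long as the same process answers.
same_node = node_a.id_for("1234", "5678") == node_a.id_for("1234", "5678")

# A second node (or a restarted one) hands out an unrelated random id.
across_nodes = node_a.id_for("1234", "5678") == node_b.id_for("1234", "5678")

print(same_node, across_nodes)
```

Behind a load balancer, the same Quantity would therefore show up under different ids depending on which node served the request.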


#12

please also consider multiple-stateless-nodes setup where every subsequent request can be load balanced to another node that will serve another id.

I agree a transient id probably goes against the spirit of the JSON API spec, but it does seem to be allowed (at least syntactically) and would solve your problem (that you need the id to contain something). Essentially in this case you have id as an irrelevant “write-only” value.

Your only other options would seem to be to

  1. Model every EAN as a related object for a Merchant, then have Quantity as an attribute of EAN (or have Merchants relate to EANs, and have the Merchant hold the quantity).

or

  2. Persist the Quantity ID - in your contrived example you probably have this as a key in your relational database anyway (or you have an event store and you can model the quantity as a total of event data / transactions).

#13

As the application I am currently working on is a stateless facade and there is no way to persist a surrogate key, I am going to use id as an irrelevant “write-only” value in order to be compliant with the current spec.
However, I consider it a workaround for the fact that the spec doesn’t support aggregate entities with composite keys, which are a perfectly normal way to model things.


#14

What about, instead of storing the quantity as a resource available under a separate URL, having it as meta information on the relationship between merchant and product? That way, when you load either the merchant or the product, any relationship to the other that you return includes this information.
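
One way this could look, sketched in Python: the quantity travels in the "meta" member of each resource identifier object in the merchant's relationship to products. The type names "merchants" and "products", the relationship name, and the "quantity" meta key are all assumptions made for the example.

```python
import json


def merchant_with_quantities(merchant_id, quantities):
    """Build a merchant document whose product linkage carries quantities.

    `quantities` maps EAN -> quantity. Hypothetical helper for illustration.
    """
    return {
        "data": {
            "type": "merchants",
            "id": merchant_id,
            "relationships": {
                "products": {
                    "data": [
                        # meta on the identifier holds the per-merchant quantity
                        {"type": "products", "id": ean, "meta": {"quantity": qty}}
                        for ean, qty in sorted(quantities.items())
                    ]
                }
            },
        }
    }


doc = merchant_with_quantities("1234", {"5678": 10, "9012": 0})
print(json.dumps(doc, indent=2))
```

With this shape no Quantity resource exists at all, so the question of its id does not arise; the trade-off is that a quantity can only be read or changed through one of the two related resources.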


#15

I have another idea. What if you calculate the ID on the fly from your “foreign keys” (merchant-id, ean) with some sort of hashing algorithm (MD5, or even something simpler)? The ID would then always be the same for the same (merchant-id, ean) combination.
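
A minimal Python sketch of this idea follows. It uses SHA-256 from the standard library rather than MD5 (a substitution made for this example), and it length-prefixes each part so that pairs like ("a_b", "c") and ("a", "b_c") cannot hash to the same input. The function name `quantity_id` is hypothetical.

```python
import hashlib


def quantity_id(merchant_id: str, ean: str) -> str:
    """Derive a deterministic id from the composite key (illustrative only)."""
    digest = hashlib.sha256()
    for part in (merchant_id, ean):
        data = part.encode("utf-8")
        # The length prefix keeps the boundary between parts unambiguous.
        digest.update(len(data).to_bytes(4, "big"))
        digest.update(data)
    return digest.hexdigest()


id_one = quantity_id("1234", "5678")
id_two = quantity_id("1234", "5678")
print(id_one == id_two)
```

Every stateless node computes the same id without persisting anything. Note that a hash is one-way: the server cannot recover merchant-id and ean from the id alone, so lookups by id would still need some index or the original pair.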


#16

I am currently thinking about the same problem. Please allow me to brain dump my thoughts:

We have a data structure, where an object has many children and the children have attributes that are only valid within the context of this relationship. As an example, let’s say we have a photo album, which contains many photos, and each photo can contain many likes. A like is a relationship between a photo and a user. So far so simple. But: the same photo can have different likes in different albums.

From an api perspective it would be nice to add a photo to an album by its unique photo id

POST /albums/345/relationships/photos/

{
    "type": "photos",
    "id": "4"
}

So we can load the photo by

GET /albums/345/photos/4

Then we can add a like by

POST /albums/345/photos/4/relationships/likes

{ "type": "users", "id": "478" }

Then photo with id: 4 has a like by user 478 in album 345.

However in a different album, the same photo might have a different set of likes. So now we have an entity uniquely identified by an ID, but with different likes depending on the album.

From an API perspective this is not even a problem I guess. But a client would need to know that it can store the likes of a photo only in the context of an album and not on the photo object itself. Does this make sense?

To model this more explicitly, we could define an album_photo, that is, a photo contained in an album:

POST /albums/345/photos/

{
    "type": "album_photo",
    "data" : {
        "relationships" : {
            "photo": { "type": "photos", "id": "4"},
            "likes": [ { "type": "users", "id": "478" } ]
        }
    }
}

Then the server could respond with an opaque ID object which identifies this resource

CREATED

{
    "type": "album_photo",
    "id": "eyJwaG90b0lkIjoiNCIsImFsYnVtSWQiOiIzNDUifQo%3D",
    "data" : {
        "relationships" : {
            "photo": { "type": "photos", "id": "4"},
            "likes": [ { "type": "users", "id": "478" } ]
        }
    }
}

If the album already contains an album_photo with a link to the same photos entity, the server would need to either upsert the like, or better revoke the POST. Then the client could load the list of album_photos, find the one with the right photo id and then POST the like to its like relationship.

So, if you don’t want to have the album_photo id persisted on the server, it can in fact be a compound value that is generated on the server, and that the server knows how to convert back. E.g. {"photoId":"4","albumId":"345"}, base64 encoded.
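
The encoding described here could be sketched in Python as follows. The function names `encode_id` and `decode_id` are hypothetical; the sketch uses the URL-safe base64 alphabet so the id can sit in a path segment, and `decode_id` also accepts a percent-encoded "=" padding like the one in the example response above.

```python
import base64
import json
from urllib.parse import unquote


def encode_id(photo_id: str, album_id: str) -> str:
    """Pack the compound key into an opaque, URL-safe string id."""
    payload = json.dumps(
        {"photoId": photo_id, "albumId": album_id}, separators=(",", ":")
    )
    return base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")


def decode_id(opaque: str) -> dict:
    """Recover the compound key from the opaque id."""
    raw = base64.urlsafe_b64decode(unquote(opaque))
    return json.loads(raw)


opaque = encode_id("4", "345")
print(opaque, decode_id(opaque))

# The id from the example response decodes to the same compound key.
from_thread = decode_id("eyJwaG90b0lkIjoiNCIsImFsYnVtSWQiOiIzNDUifQo%3D")
print(from_thread)
```

Unlike a hash, this id is reversible, so a stateless server can route a request using nothing but the id. It is also readable by any client that bothers to decode it, so it is opaque by convention only.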

Do you guys have any thoughts on this? Somehow I don’t like either approach very much…


#17

Hi Tobias,

What is wrong with:

GET /albums/345/photos/4/likes

[{ "type": "user", "id": "478" }, { "type": "user", "id": "479" }]

And you could add a new like with:

POST /albums/345/photos/4/likes

{ "type": "user", "id": "480" }

Does this fit your use case?

BR


#18

Thanks for your reply, but I think you are missing the point: for the client it is not obvious that a photo inside one album can have a set of likes A and, in another album, another set B, despite it still being the same photo with the same ID.


#19

But yeah… then you would have a different set of likes for:

GET /albums/346/photos/4/likes

Or?


#20

You are right, but the photo data would look like this in both albums

{"type": "photos", "id": "4"}

If I persisted the photos object on disk on the client, I could not link the likes to this object, because the likes relation is only valid within an album.
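
To make the client-side consequence concrete, here is a small Python sketch of a cache that stores likes per (album, photo) pair instead of on the photo object. The `LikeStore` class is purely hypothetical and not tied to any JSON API client library.

```python
from collections import defaultdict


class LikeStore:
    """Hypothetical client cache: likes are keyed by album AND photo."""

    def __init__(self):
        self._likes = defaultdict(set)

    def add_like(self, album_id: str, photo_id: str, user_id: str) -> None:
        self._likes[(album_id, photo_id)].add(user_id)

    def likes(self, album_id: str, photo_id: str) -> set:
        # Return a copy so callers cannot mutate the cache by accident.
        return set(self._likes.get((album_id, photo_id), set()))


store = LikeStore()
store.add_like("345", "4", "478")
store.add_like("346", "4", "479")

print(store.likes("345", "4"), store.likes("346", "4"))
```

The photo {"type": "photos", "id": "4"} is stored once, but its likes cannot hang off that object; the client has to know that the album id is part of the key.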