How to handle an excessive number of included resources?


#1

Hi,

My team and I have run into an issue with included resources that we would appreciate some advice on.

We have some resources with “to-many” relationships that can run into the hundreds, and for performance reasons we think it best to implement a strategy that restricts the number of included resources that can be returned in a response.

Here are the solutions we have come up with:

  1. Set a hard limit (x) in the API. The rationale is that if more than x resources are needed, the client can follow the related link, which supports pagination.
     For example, if we set the limit to 25, there could be at most 25 entries in the relationships.testpoints.data array in the response below, regardless of the actual number of testpoints for that test. To indicate that there are more than 25 testpoints, we discussed adding a meta object containing the count.
  2. Add pagination to the relationship object. However, we are having trouble figuring out the request structure.
     In this case there would be a testpoints.meta object containing the pagination information, but we aren't sure how to go about it.

{"data": [
    {"type": "test",
    "id": "test1",
    "attributes": {
        "date": "2015-11-17 11:25:00",
        "unit_tested": "unit1",
        "tester_id": "1234",
        "results": "Pass",
        …
    },
    "relationships": {
        "testpoints": {
            "links": {"related": "http://url.com/v1/test/1/testpoints"},
            "data": [
                {"type": "testpoint", "id": "tp1"},
                {"type": "testpoint", "id": "tp2"},
                {"type": "testpoint", "id": "tp3"},
                …
                {"type": "testpoint", "id": "tpx"}
            ]
        },
        "uut": {
            "links": {"related": "http://url.com/v1/uuts/unit1"},
            "data": []
        }
    },
    "links": {"self": "http://url.com/v1/tests/1"}
    },
    …
],
"included": [
    {"type": "testpoint",
    "id": "tp1",
    "attributes": {
        "number": 1,
        "value": 0.1,
        …
    }},
    {"type": "testpoint",
    "id": "tp2",
    "attributes": {
        "number": 2,
        "value": 0.2,
        …
    }},
    {"type": "testpoint",
    "id": "tp3",
    "attributes": {
        "number": 3,
        "value": 0.3,
        …
    }},
    …
    {"type": "testpoint",
    "id": "tpx",
    "attributes": {
        "number": x,
        "value": x.x,
        …
    }}
],
"meta": {
    "pagination": {
        "page_number": 1,
        "page_size": 10,
        "page_count": 37,
        "total_count": 364,
        "previous_page": null,
        "next_page": 2,
        "previous_href": null,
        "next_href": "http://url.com/v1/tests?include=testpoint&pagenumber=2&pagesize=10"
    },
    "links": {"self": "http://url.com/v1/tests?include=testpoint"}
}
}
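Option 1 (a hard limit plus a count in meta) could be sketched roughly as below. This is only an illustration under stated assumptions: the helper name `limited_relationship`, the `MAX_LINKAGE` constant, and the `"count"` key inside the relationship's `meta` are all hypothetical, not anything the spec defines.

```python
from typing import Any

# Hypothetical limit; the post uses 25 as an example value for x.
MAX_LINKAGE = 25


def limited_relationship(
    type_: str, ids: list[str], related_url: str, limit: int = MAX_LINKAGE
) -> dict[str, Any]:
    """Build a to-many relationship object with at most `limit` linkage items.

    The full count goes in the relationship's `meta` (the key name "count" is
    an assumption) so clients know to follow the `related` link for the rest.
    """
    return {
        "links": {"related": related_url},
        "data": [{"type": type_, "id": i} for i in ids[:limit]],
        "meta": {"count": len(ids)},
    }


rel = limited_relationship(
    "testpoint",
    [f"tp{n}" for n in range(1, 365)],  # 364 hypothetical testpoint ids
    "http://url.com/v1/test/1/testpoints",
)
print(len(rel["data"]), rel["meta"]["count"])  # 25 364
```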


#2

Adding pagination to the relationship object is definitely the way to go. That should look like this:

{
  "type": "test",
  "id": "test1",
  "attributes": { /* ... */ },
  "relationships": {
    "testpoints": {
      "data": [ /* first x linkage items here */ ],
      "links": {
        "next": "http://url.com/v1/test/1/relationships/testpoints?pagenumber=2&pagesize=x",
        "related": "http://url.com/v1/test/1/testpoints"
      }
    }
  }
}

Note that the next link in the relationship object must point to a URI that returns the next set of resource identifier objects as its primary data, not the next set of full testpoint resources. (That is, the link is like a partial view of the "self" link’s content, not the "related" link’s.)
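Here is a rough server-side sketch of what such a relationship endpoint might return for one page. It assumes the `pagenumber`/`pagesize` parameter names from the examples in this thread, and the helper name `linkage_page` is hypothetical. It also assumes the related URL can be derived by dropping `/relationships` from the relationship URL, which only holds for URL schemes shaped like the ones used here.

```python
from typing import Any, Optional
from urllib.parse import urlencode


def linkage_page(
    base: str, type_: str, ids: list[str], page_number: int, page_size: int
) -> dict[str, Any]:
    """Return one page of a to-many relationship as resource identifier objects.

    `base` is the relationship ("self") URL, e.g.
    http://url.com/v1/test/1/relationships/testpoints
    """
    start = (page_number - 1) * page_size
    chunk = ids[start:start + page_size]
    last_page = max(1, -(-len(ids) // page_size))  # ceiling division

    def page_url(n: int) -> str:
        return f"{base}?{urlencode({'pagenumber': n, 'pagesize': page_size})}"

    prev: Optional[str] = page_url(page_number - 1) if page_number > 1 else None
    nxt: Optional[str] = page_url(page_number + 1) if page_number < last_page else None
    return {
        # Primary data is linkage only, never full testpoint resources.
        "data": [{"type": type_, "id": i} for i in chunk],
        "links": {
            "self": page_url(page_number),
            "prev": prev,
            "next": nxt,
            # Assumption: related URL = relationship URL without "/relationships".
            "related": base.replace("/relationships/", "/"),
        },
    }


page = linkage_page(
    "http://url.com/v1/test/1/relationships/testpoints",
    "testpoint",
    [f"tp{n}" for n in range(1, 26)],  # 25 hypothetical testpoint ids
    page_number=2,
    page_size=10,
)
print(page["links"]["next"])
# http://url.com/v1/test/1/relationships/testpoints?pagenumber=3&pagesize=10
```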

Also, as an aside, some of the pagination links in your example response should probably live in the top-level "links" object, not in "meta", as described in the spec’s section on pagination. Like this:

{
   "meta": {
     "pagination": {
       "page_number": 1,
       "page_size": 10,
       "page_count": 37,
       "total_count": 364,
       "previous_page": null, // are previous_page and
       "next_page": 2         // next_page necessary here?
     }
   },
   "links": {
     /* `next` and `prev` in links instead of `next_href`/`previous_href` */
     /*  in meta. also, you can use `last` if it helps */
     "next": "http://url.com/v1/tests?include=testpoint&pagenumber=2&pagesize=10",
     "prev": null,
     "last": "http://url.com/v1/tests?include=testpoint&pagenumber=37&pagesize=10",
     "self": "http://url.com/v1/tests?include=testpoint"
   }
}
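A minimal sketch of splitting the pagination information this way, with URLs in top-level `links` and counts in `meta`. The helper name is hypothetical, and it assumes `base` already carries a query string (as `?include=testpoint` does here) so page parameters are appended with `&`.

```python
import math
from typing import Any, Optional


def collection_pagination(
    base: str, page_number: int, page_size: int, total_count: int
) -> dict[str, Any]:
    """Put navigation URLs in top-level `links` and counts in `meta`."""
    page_count = max(1, math.ceil(total_count / page_size))

    def url(n: int) -> str:
        # Assumes `base` already has a "?..." query string.
        return f"{base}&pagenumber={n}&pagesize={page_size}"

    prev: Optional[str] = url(page_number - 1) if page_number > 1 else None
    nxt: Optional[str] = url(page_number + 1) if page_number < page_count else None
    return {
        "meta": {"pagination": {
            "page_number": page_number,
            "page_size": page_size,
            "page_count": page_count,
            "total_count": total_count,
        }},
        "links": {"self": base, "prev": prev, "next": nxt, "last": url(page_count)},
    }


p = collection_pagination("http://url.com/v1/tests?include=testpoint", 1, 10, 364)
print(p["links"]["last"])
# http://url.com/v1/tests?include=testpoint&pagenumber=37&pagesize=10
```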

#3

Thank you for your assistance.

To make sure we understand this correctly: when following the next link, will the response still be the “test1” resource, but with page two of the “testpoints” resource identifiers?

{
  "type": "test",
  "id": "test1",
  "attributes": { /* ... */ },
  "relationships": {
    "testpoints": {
      "data": [ /* second x linkage items here */ ],
      "links": {
        "prev": "http://url.com/v1/test/1/relationships/testpoints?pagenumber=1&pagesize=x",
        "next": "http://url.com/v1/test/1/relationships/testpoints?pagenumber=3&pagesize=x",
        "related": "http://url.com/v1/test/1/testpoints"
      }
    }
  }
}

Would you then add a links object under relationships to support paging more than one “to-many” relationship?

{
  "type": "test",
  "id": "test1",
  "attributes": { /* ... */ },
  "relationships": {
    "testpoints": {
      "data": [ /* first x linkage items here */ ],
      "links": {
        "next": "http://url.com/v1/test/1/relationships/testpoints?pagenumber=2&pagesize=x",
        "related": "http://url.com/v1/test/1/testpoints"
      }
    },
    "measerrors": {
      "data": [ /* first x linkage items here */ ],
      "links": {
        "next": "http://url.com/v1/test/1/relationships/measerrors?pagenumber=2&pagesize=x",
        "related": "http://url.com/v1/test/1/measerrors"
      }
    },
    "links": {
      "next": "http://url.com/v1/test/1/relationships?pagenumber=3&pagesize=x"
    }
  }
}

Can you use include on a collection or just on a resource?
Would pagination for fields be handled in the same way?


#4

Not quite. When following the next link the response’s primary data will be only the next set of resource identifier objects, like this:

{
  "data": [
    /* second x resource identifier objects */
  ],
  // optionally, you can add links too
  "links": {
    "prev": "http://url.com/v1/test/1/relationships/testpoints?pagenumber=1&pagesize=x",
    "next": "http://url.com/v1/test/1/relationships/testpoints?pagenumber=3&pagesize=x",
    "related": "http://url.com/v1/test/1/testpoints"
  }
}

The idea here is that the original testpoints relationship is a collection of resource identifier objects, so its pagination links should all return portions of that collection as their primary data.
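Because every page is just more resource identifier objects, a client can concatenate pages by following `next` until it is absent or null. In this sketch, `collect_linkage` is a hypothetical helper, `fetch` stands in for whatever HTTP GET plus JSON parsing the client uses, and the two-page "server" is a fake in-memory dictionary for illustration only.

```python
from typing import Any, Callable, Optional

Document = dict[str, Any]


def collect_linkage(
    relationship: Document, fetch: Callable[[str], Document]
) -> list[dict[str, str]]:
    """Start from an embedded relationship object and follow its `next` links."""
    identifiers: list[dict[str, str]] = list(relationship.get("data", []))
    url: Optional[str] = relationship.get("links", {}).get("next")
    while url:
        page = fetch(url)
        identifiers.extend(page["data"])  # each page is linkage only
        url = page.get("links", {}).get("next")
    return identifiers


BASE = "http://url.com/v1/test/1/relationships/testpoints"
# Relationship object as embedded in the "test1" resource (page 1).
embedded = {
    "data": [{"type": "testpoint", "id": "tp1"}, {"type": "testpoint", "id": "tp2"}],
    "links": {"next": f"{BASE}?pagenumber=2&pagesize=2"},
}
# Fake relationship endpoint responses for pages 2 and 3.
PAGES: dict[str, Document] = {
    f"{BASE}?pagenumber=2&pagesize=2": {
        "data": [{"type": "testpoint", "id": "tp3"}, {"type": "testpoint", "id": "tp4"}],
        "links": {"next": f"{BASE}?pagenumber=3&pagesize=2"},
    },
    f"{BASE}?pagenumber=3&pagesize=2": {
        "data": [{"type": "testpoint", "id": "tp5"}],
        "links": {"next": None},
    },
}

ids = collect_linkage(embedded, lambda url: PAGES[url])
print([i["id"] for i in ids])  # ['tp1', 'tp2', 'tp3', 'tp4', 'tp5']
```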

Yes, but those links should go under each relationship in the original response for the "test1" resource. So, the full response for the tests collection would look like this:

// GET /v1/tests?include=testpoints

{
  "data": [{
    "type": "test",
    "id": "test1",
    "links": {
      "self": "http://url.com/v1/tests/1"
    },
    "attributes": { /* data for test1 */ },
    "relationships": {
      "testpoints": {
        "data": [ /* first x linkage items here */ ],
        "links": {
          "next": "http://url.com/v1/test/1/relationships/testpoints?pagenumber=2&pagesize=x",
          "related": "http://url.com/v1/test/1/testpoints"
        }
      },
      "uut": { 
        // exactly the same structure (with limited data + next) 
        // as the testpoints relationship
      }
    }
  }, 
    /* 9 other test resources here */
  ],
  "included": [
    // full resources for all the resource identifier objects
    // given in the data above for the requested relationships.
    // i.e. `?include=testpoints` will lead to 10*x resources being
    // stored here (x for each of the 10 tests); `?include=testpoints,uut`
    // would have 2*10*x.
  ],
  "links": {
    "next": "http://url.com/v1/tests?include=testpoint&pagenumber=2&pagesize=10",
    "prev": null,
    "last": "http://url.com/v1/tests?include=testpoint&pagenumber=37&pagesize=10",
    "self": "http://url.com/v1/tests?include=testpoint"
  },
  "meta": { /* custom pagination info for the tests collection */ }
}
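A rough sketch of assembling such a compound document: each test gets at most `limit` (the x above) linkage items plus a `next` link when truncated, and `included` holds exactly the resources that appear in that linkage. The input shapes are hypothetical (the `_testpoint_ids` key and the id-to-resource map are assumptions). The `seen` set reflects the spec rule that a compound document must not include more than one resource object for the same type and id.

```python
from typing import Any

Resource = dict[str, Any]


def compound_document(
    tests: list[Resource], testpoints: dict[str, Resource], limit: int
) -> dict[str, Any]:
    """Build `data` + `included` for GET /v1/tests?include=testpoints."""
    data: list[Resource] = []
    included: list[Resource] = []
    seen: set[tuple[str, str]] = set()
    for test in tests:
        ids = test["_testpoint_ids"]  # hypothetical: full list of testpoint ids
        linkage = [{"type": "testpoint", "id": i} for i in ids[:limit]]
        links = {"related": f"http://url.com/v1/test/{test['id']}/testpoints"}
        if len(ids) > limit:  # only advertise `next` when linkage was truncated
            links["next"] = (
                f"http://url.com/v1/test/{test['id']}/relationships/testpoints"
                f"?pagenumber=2&pagesize={limit}"
            )
        data.append({
            "type": "test",
            "id": test["id"],
            "attributes": test["attributes"],
            "relationships": {"testpoints": {"data": linkage, "links": links}},
        })
        for ident in linkage:
            key = (ident["type"], ident["id"])
            if key not in seen:  # each type/id pair appears once in `included`
                seen.add(key)
                included.append(testpoints[ident["id"]])
    return {"data": data, "included": included}


tps = {
    f"tp{n}": {"type": "testpoint", "id": f"tp{n}", "attributes": {"number": n}}
    for n in range(1, 7)
}
tests = [
    {"id": "1", "attributes": {"results": "Pass"}, "_testpoint_ids": ["tp1", "tp2", "tp3", "tp4"]},
    {"id": "2", "attributes": {"results": "Fail"}, "_testpoint_ids": ["tp5", "tp6"]},
]
doc = compound_document(tests, tps, limit=3)
print([r["id"] for r in doc["included"]])  # ['tp1', 'tp2', 'tp3', 'tp5', 'tp6']
```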

Yup, on collections too!

I’m not sure I understand this. In JSON API terms, “fields” just means a resource’s attributes and relationships. Attributes can’t be paginated (since their values are just single blobs of data from the spec’s POV, not lists) and relationships are paginated per above.


#5

Again, thank you. We really appreciate you helping us out.

This is where we were confused: we were thinking that …/tests/1?include=testpoints was basically the same request as …/tests/1/relationships/testpoints.

My bad, by “fields” I meant the query parameter for sparse fieldsets.
We thought that if an attribute of a related resource was requested, the response would contain resource identifier objects in the relationship, with the requested fields of the related resources in “included”. Since relationships are collections of resource identifiers, pagination would be handled the same way whether the whole resource or only a specific fieldset is returned.

http://url.com/v1/units/unit1?fields[unit]=mfg,model,sernum&fields[test]=date,results
{"data":[
    {"type": "unit",
    "id": "unit1",
    "attributes": {
        "mfg": "Acme",
        "model": "A",
        "sernum": "1"
    },
    "relationships": {
        "tests": {
            "data": [ /* resource identifying objects */ ],
            "links": { /* ... */}
        }        
    },
    "links":{
        "self": "http://url.com/v1/units/unit1"
    }}
],
"included":[
    {"type":"test",
    "id":"1",
    "attributes": {
        "date": "2015-11-18",
        "results": "pass"
        /* only the requested attributes */
    }},
    {"type":"test",
    "id":"9",
    "attributes": {
        "date": "2014-11-15",
        "results": "pass"
        /* only the requested attributes */
    }},
    ...
],
"links":{
    "self": "http://url.com/v1/units/unit1?fields[test]=date,results"
},
"meta":{ /* ... */}
}
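A small sketch of parsing `fields[TYPE]=a,b` parameters and trimming a resource to its sparse fieldset. The helper names are hypothetical. Since "fields" covers both attributes and relationships, this sketch trims both; under a strict reading of the spec, that means a unit's `tests` relationship would only be kept if `tests` were also listed in `fields[unit]`.

```python
from typing import Any
from urllib.parse import parse_qs, urlsplit


def parse_fieldsets(url: str) -> dict[str, set[str]]:
    """Map each TYPE in fields[TYPE]=a,b to the set of requested field names."""
    query = parse_qs(urlsplit(url).query)
    fieldsets: dict[str, set[str]] = {}
    for key, values in query.items():
        if key.startswith("fields[") and key.endswith("]"):
            fieldsets[key[len("fields["):-1]] = set(",".join(values).split(","))
    return fieldsets


def apply_fieldset(resource: dict[str, Any], fieldsets: dict[str, set[str]]) -> dict[str, Any]:
    """Keep only requested attributes/relationships; other types are untouched."""
    wanted = fieldsets.get(resource["type"])
    if wanted is None:
        return resource
    out = {k: v for k, v in resource.items() if k not in ("attributes", "relationships")}
    for member in ("attributes", "relationships"):
        if member in resource:
            out[member] = {k: v for k, v in resource[member].items() if k in wanted}
    return out


fs = parse_fieldsets(
    "http://url.com/v1/units/unit1?fields[unit]=mfg,model,sernum&fields[test]=date,results"
)
test = {"type": "test", "id": "1",
        "attributes": {"date": "2015-11-18", "results": "pass", "tester_id": "1234"}}
print(apply_fieldset(test, fs)["attributes"])  # {'date': '2015-11-18', 'results': 'pass'}
```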

#6

Got it! And it sounds like the confusion is cleared up now. I’d love to fix whatever it was on the site that initially gave you this other impression. Is there some spec text/example you can point me to that caused the problem? Or a change you think would save others from this confusion?

Yup, you got it! The example JSON you posted is perfect.


#7

Here is where the relationships self and related links are defined.

I think that expanding the self link explanation like this would help:

self: a link to the resource identifier object or to the collection of resource identifier objects that define this relationship. This link allows the client to directly manipulate the relationship. For example, it would allow a client to remove an author from an article without deleting the people resource itself.

An example wouldn’t hurt either.

GET /articles/1/relationships/comments HTTP/1.1
Accept: application/vnd.api+json

{
  "data": [
    { "type": "comments", "id": "5" },
    { "type": "comments", "id": "12" }
  ],
  "included": [{
    "type": "comments",
    "id": "5",
    "attributes": {
      "body": "First!"
    }
  }, {
    "type": "comments",
    "id": "12",
    "attributes": {
      "body": "I like XML better"
    }
  }],
  "links": {
    "self": "http://example.com/articles/1/relationships/comments"
  }
}
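To illustrate the "directly manipulate the relationship" part of the proposed text: for a to-many relationship like `comments`, the spec lets a client send DELETE to the relationship's self link with a body of resource identifier objects, which unlinks those comments without deleting the comment resources. The sketch below only builds the request with Python's standard `urllib.request.Request` and does not send it; the helper name is hypothetical.

```python
import json
from urllib.request import Request

JSONAPI = "application/vnd.api+json"


def remove_from_relationship(self_url: str, type_: str, ids: list[str]) -> Request:
    """Build (but do not send) a DELETE against a to-many relationship's self link.

    The body lists resource identifier objects to unlink; the related
    resources themselves are not deleted.
    """
    body = {"data": [{"type": type_, "id": i} for i in ids]}
    return Request(
        self_url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": JSONAPI, "Accept": JSONAPI},
        method="DELETE",
    )


req = remove_from_relationship(
    "http://example.com/articles/1/relationships/comments", "comments", ["12"]
)
print(req.get_method(), req.full_url)
# DELETE http://example.com/articles/1/relationships/comments
```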

#8

@JFiddie Perfect, that helps a lot!

Proof that the spec needs better organization: there actually is already a section that explains the self link in more detail and gives an example response. So it seems like the real problem is that that section is very far away from the place where the link is defined, and so it’s very hard to find.

I worked up a PR from this conversation that adds a brief description of the "self" link’s response, and links to more details, at the point where the "self" link is defined. Check it out and let me know if you think it’s sufficient or if something is still missing.

Thanks again! Hopefully we’ll have an Overview/Getting Started section soon, so that people don’t need to dig through all the spec jargon. But until then, feedback like this that makes the spec more readable is really invaluable!