Representing non-resourceful aggregated data

I’m working on an app for which we’re studying the possibility of using jsonapi to describe the data in all API responses. It works pretty well for resource-like entities, but we’re having some trouble coming up with a way to describe reports data in a jsonapi-like way.

By reports data I refer to data aggregated and computed on the fly from the base resource-like data we store in our database. For instance, imagine we store real estate information, and we have information about houses, apartments and office space, each associated to location, size of the property in square feet, type of property (wether it’s a house, apartment or office space), and any other relevant information about a property. Now imagine we need a report that receives ?group_by[]=location&group_by[]=type and we want the response to convey aggregated information about the intersection of those two group_by parameters. So we would receive for instance, an object containing the average square feet area of all properties in a given location, grouped also by type of property.

Average Property Sizes (in square feet)
                Houses    Apartments   Offices
Manhattan      1234.56        234.56    123.45
Cape Coral      456.78        654.32    876.54
Portland       4321.00        987.65   2345.67

The most resource-like thing we can think of from this data is each cell, but since they are the result of a computed aggregation of more basic data, they do not have a natural ID. We’ve been thinking about delivering them with a computed ID as well (perhaps combining the IDs of the dimensions by which their data is grouped, i.e. "house,34" where house represents a type of property, and 34 is the ID of the location “Manhattan”). Then each cell would have the relationship with the corresponding location record, which would be included in the included section of the payload. Here’s a sample json file of how would this look like:

{
  "data": [
    {
      "id": "house,123",
      "type": "report_items",
      "attributes": {
        "property_type": "house",
        "value": 108.75
      },
      "relationships": {
        "location": {
          "data": {
            "type": "locations",
            "id": 123
          }
        }
      }
    },
    {
      "id": "house,124",
      "type": "report_items",
      "attributes": {
        "property_type": "house",
        "value": 36.0
      },
      "relationships": {
        "location": {
          "data": {
            "type": "locations",
            "id": 124
          }
        }
      }
    },
    {
      "id": "house,125",
      "type": "report_items",
      "attributes": {
        "property_type": "house",
        "value": 1.0
      },
      "relationships": {
        "location": {
          "data": {
            "type": "locations",
            "id": 125
          }
        }
      }
    },
    {
      "id": "office,123",
      "type": "report_items",
      "attributes": {
        "property_type": "office",
        "value": 4.0
      },
      "relationships": {
        "location": {
          "data": {
            "type": "locations",
            "id": 123
          }
        }
      }
    },
    {
      "id": "office,124",
      "type": "report_items",
      "attributes": {
        "property_type": "office",
        "value": 2.0
      },
      "relationships": {
        "location": {
          "data": {
            "type": "locations",
            "id": 124
          }
        }
      }
    },
    {
      "id": "apartment,123",
      "type": "report_items",
      "attributes": {
        "property_type": "apartment",
        "value": 2.0
      },
      "relationships": {
        "location": {
          "data": {
            "type": "locations",
            "id": 123
          }
        }
      }
    },
    {
      "id": "apartment,125",
      "type": "report_items",
      "attributes": {
        "property_type": "apartment",
        "value": 4.5
      },
      "relationships": {
        "location": {
          "data": {
            "type": "locations",
            "id": 125
          }
        }
      }
    },
    {
      "id": "apartment,124",
      "type": "report_items",
      "attributes": {
        "property_type": "apartment",
        "value": 2.0
      },
      "relationships": {
        "location": {
          "data": {
            "type": "locations",
            "id": 124
          }
        }
      }
    }
  ],
  "included": [
    {
      "type": "locations",
      "id": "123",
      "attributes": {
        "name": "Manhattan"
      }
    },
    {
      "type": "locations",
      "id": "124",
      "attributes": {
        "name": "Cape Coral"
      }
    },
    {
      "type": "locations",
      "id": "125",
      "attributes": {
        "name": "Portland"
      }
    }
  ]
}

My question is: is this a proper way of representing this kind of data in jsonapi? Is jsonapi suitable and/or recommended for data that does not directly map to resources? Would I be better off representing this data in custom json? I know non of these questions probably have a definite answer, but perhaps there’s already some experience out there on how to approach similar scenarios, pros and cons of trying to make this kind of data fit onto jsonapi, etc. Any comments and help is very much appreciated. Thanks.

PS: I posted this even after some digging in the forum and on the internet, and these are the only two links that I found that talk about something that resembles what I’m trying to find out, and I include them here for references as well:
1.- Composite ID inside the resource object
2.- Extension for Chart / Graph data?

1 Like

Your API resources don’t have to be a 1:1 map to your database schema. There’s nothing stopping you from exposing a Manhattan resource via your API with attributes such as houseSqFt, or a Houses resource with attributes such as manhattanSqFt, depending on which which way makes most sense to most clients. (You could even expose both of these API resources if you wanted to.)

I’m not sure that would be an ideal solution if the items in the locations and/or property types are dynamic.

In the Manhattan resource you would end up with a list of attributes that changes dynamically (houseSqFt, OfficeSqFt, CondoSqFt, …). Additionally you would have a dynamic list of resources Manhattan, Boston, Mallorca, …

2 Likes

Indeed. And also, I did not mention that I simplified my example to two dimensions. But I would also want to be able to query three dimensions, and obtain a data structure that would give me information per property type, location, and type of listing (wether the property it is for rental or for sale).

My point is, I’m not asking for a specific solution to this particular case, but for some information regarding the possibility to use JSON API for this kind of non-resourceful data. Is it recommended? Should I make the extra effort in trying to adapt my data to JSON API? Or perhaps JSON API is designed to be used specifically with resource-like data?

If you want clients to be able to create ad hoc reports that mashup data in different interesting ways, then GraphQL might be a better choice than JSON API. If the clients always want to see the data structured in the same way then I would expect JSON API could do a reasonable job for you (as probably could other formats).

What do you mean by this? Fielding’s REST dissertation states: The key abstraction of information in REST is a resource. Any information that can be named can be a resource: a document or image, a temporal service (e.g. “today’s weather in Los Angeles”), a collection of other resources, a non-virtual object (e.g. a person), and so on. In other words, any concept that might be the target of an author’s hypertext reference must fit within the definition of a resource. A resource is a conceptual mapping to a set of entities, not the entity that corresponds to the mapping at any particular point in time.

By non-resourceful data, I mean aggregated data that does not have an id that naturally identifies it, wether it is a DB id, or some computed property that would work as an id. I know I can build an artificial id, and I must do so because json-api requires it. But it just seems artificial to me to create an id just for the sake of it, for instance in the case above by concatenating the ids of the objects that conform the intersection of each cell value. My point is, not all data have or need an id, and I think Fielding’s REST dissertation does not require it, but json-api does. It makes sense for things that can also be updated, deleted, created, etc. But it does not make much sense for data that is merely there to be queried and fetched, but that does not work in any other way as a resource. I know I can “force” this data to fit json-api’s design, but I’m wondering if I should, given that it does not naturally fit in it.

For instance, given the table in my original post, I know I can make a virtual model called “TableCell”, and create a RESTful api endpoint /table_cells?group_by[]=location&group_by[]=type&aggregate_field=property_area, and then receive a json-api response that would list each cell as an individual “resource” with a fake built id that it’s just there to comply with json-api, and then each cell have a relationship with its corresponding location and property type. And these in turn can be listed in the included section.

The question is: should I? Wouldn’t I be better off not using json-api for these kind of api endpoints, and just expose a json structure that more naturally describe this?

I suppose that depends on whether you’re imagining that your API is providing data to a client that will use the information to generate a report (in which case “TableCell” doesn’t seem an appropriate resource to me), or whether your API is providing the actual (unstyled) report (which seems a rather restricted API).

“Raw” JSON doesn’t provide any means for you indicate links / relationships between resources. If all your data is essentially a set of unrelated facts then JSON API might not be giving you much value. But if the data does have interesting cross-references then you’re going to need to invent your own mechanism for specifying these (or use another format, such as JSON-LD / HAL / UBER / Siren).

1 Like

Hey all,

I know this is pretty old, but I just wanted to give this discussion a humble plus-one and add my few cents. I’ve been mulling this over for a few months now, and have a few thoughts that may be useful for those getting here via google. The short of it is that I think @gnapse’s original solution (creating a virtual ID from query parameters) is the way to go. Here’s my defense of this position:

In the case of virtual data, it seems helpful to think of it somewhat like a cached query. If you were storing this in a Redis cache to improve load times, for example, you’d likely compose an ID just as you have: ${property type} + ${propertyId}. In this respect, the data actually is indexed by an ID, it’s just a sort of content-addressed ID, rather than some sort of intrinsic resource ID. Thus, I would argue that it’s actually correct to return a resource with this sort of ID.

In the case mentioned in the original post, I would actually be a little more inclined to be more rigid with the parameters. Rather than opening up arbitrary query parameters (like being able to group by a number of different fields), perhaps you create named reports, like /stats/property-sizes, which returns objects of type property-sizes with IDs as you’ve mentioned. This way, if you really wanted to, it would also make sense to do something like GET /stats/property-sizes/house,34 to get the dimensions of a specific property, rather than using the more general syntax, GET /stats/property-sizes?type=house&id=34.

Anyway, just wanted to put in my vote :slight_smile:. Good luck!

Hi, @gnapse: what was your conclusion, or how did you solve your problem?

As I see it now, and because you are calculating an average and not manipulating resources directly, you could use an RPC API (check jsonrpc.org) instead of a REST API.

Or perhaps it would be a good idea to take advantage of the object “meta” and build your custom representation of that non-standard data. The specification says the JSON document must contain at least one of these top-level members: data, meta, errors. So, it’s perfectly valid to have a document containing only a meta object, right?