Strategies for dealing with infinite data possibilities?


#1

I have been at it for a week now trying to implement JSON API to spec. I am going to try to describe my frustrations with it, and hopefully they resonate with someone else.

The problem is that the spec is brutal about how data relationships are supposed to be handled. I feel as if it’s a never-ending rabbit hole where an unending number of scenarios are possible. Any data can be filtered, sorted, paginated, sparsely queried, had nested data joined in, etc…

As I get further and further into this, I feel as if I will never be able to find all the issues. I’ve been trying to have our data models and relationships be uncovered dynamically. This has worked to an extent, but the number of edge cases grows continually.

Here is an example:

  • A person HasOne profile photo
    • type: person, id: 1
  • A profile photo BelongsToOne person
    • type: photo, id: 10, (foreign key: personId: 1)

Each of these has a data model, and a set of routes.

  • /person/1?include=photo
  • /photo/10?include=person
  • /person/1/photo
  • /photo/10/person

Querying for an items requires you to go back and generate the appropriate route. I don’t know what strategies other people are using for this. It’s a strange situation where the model depends on the router depends on the model (which model goes with which routes? how do I generate them again?)

In one case, we are querying for:
Photo WHERE PersonId = 1

In the other case we are querying for:
Person JOIN Photo WHERE Photo.id = 10

This doesn’t even include more cases where I could have more complex query relationships.

I know these must sound like madman ramblings, but I feel like I’m climbing a hill I can never get to the top of. If I try to handle everything dynamically, I can never catch all the edge cases. If I try to do everything manually, I can never implement the infinite number of query combinations and paths that are possible.

Does anyone have strategies for dealing with this?


#2

Writing an ORM is a daunting task and it sounds to me like this is what you’re trying to tackle. I suspect you are writing a JSON API implementation that translates all the requests into raw SQL queries; is this correct?

The approach we have taken in our framework is to fallback on existing ORMs (i.e. hibernate, in-memory stores, etc.) and translate JSON API into higher-level models that the ORM itself can then go and format the query effectively. If you can insulate yourself from actually constructing the query yourself, I am sure you will find that you save yourself a lot of headache in the long run. If you’re set on constructing the queries from scratch, then I agree it will be a more difficult task.

That being said, I am not convinced this is an issue of JSON API. Any API which supports this kind of querying through data would pose the same issues (short of the input being a raw query). What JSON API provides is a sane and specified method for exposing your datastores through a series of RESTful endpoints.

tl;dr You should look into using an existing ORM and translate your queries to higher-level models rather than raw SQL.


#3

I’m using an ORM currently, which is helping to an extent. The problem is that data relationships can become quite complex and translating them into something the ORM understands has been an ongoing challenge. The query disparity I posted above is one place this pops up.

It could be that the ORM we are using here is too weak: http://docs.sequelizejs.com/en/latest/

Querying by a joined table has been very difficult.


#4

What query disparity?

  • /person/1/photo — the request is for a photo with the constraint person=1
  • /photo/10/person — the request is for a person with the constraint photo=10.

The queries appear to match the requests. A “disparity” is to be expected since you’re requesting two disparate entities.

To get started with a general algorithm, given the path /root/{id}/rrel1/rrel2/.../primary?include=prel1,prel2,..., look up the model that corresponds to root, then walk the relation tree based on the fields rrel1 up to primary, which is the actual target of the request and will be returned as primary data.

Query for the entity corresponding to primary, add the join constraints between root and rrelN, and there’s your query for the primary data. Query for the includes and populate the primary result set. Voilà.


#5

The problem I have been running into is that the constraints are not as simple as you have laid out. Perhaps some ORMs treat them this way, but others do not.

One is simply this - you are correct.

SELECT person.*
FROM person
WHERE id = 1

However, the other suddenly involves a join:

SELECT person.*
FROM person
INNER JOIN photo on photo.personId = person.id
WHERE photo.id = 10

This is, of course, a simplified example. Some data relationships may transform into much more complex queries. I appreciate that ORMs handle a lot of the magic here, but explaining these relationships to them (depending on your ORM) can be difficult.

Sequelize, for example, doesn’t support subqueries. You can filter on model inclusions, but at that point you are really querying the primary table and bringing in or filtering on extra data. There’s no good way to query the related table and walk the where clause up to find data related to the primary. http://docs.sequelizejs.com/en/latest/docs/querying/#relations-associations

I appreciated your comment about walking up the relationship tree, however. I am going to try using data inclusions and then pulling out the data I want. It means more load on the database, but if it saves me sanity I’m going to do it.


#6

The two queries you just posted can be produced by the same general algorithm: SELECT from the primary resource and add a JOIN constraint for each additional segment in the requested path.

If your ORM doesn’t support chaining JOIN constraints manually, or you can’t build such functionality on top of it, then the ORM may not be up to the task.