Share/Propose a Filtering Strategy


#1

Introduction

I wanted to share/propose the filtering strategy I designed when it came time for adding filtering of resource collections. After much debate, the filtering strategy I decided upon was to adopt a “subset” or “best of” from the OData 4.0 query standard for the following reasons:

  • Well designed and maintained query options standard, why reinvent the wheel.
    • Same reason why I adopted the json-api standard, standards are good!
  • Already an adopted industry standard and well documented.
  • After much review I found it expressive, URL friendly, and would meet almost 100% of my needs for filtering.

Here is the OData 4.0 documentation for it’s URL conventions: http://docs.oasis-open.org/odata/odata/v4.0/odata-v4.0-part2-url-conventions.html

Again I am not proposing the entire OData query syntax, only the best parts with minor tweaking if necessary. With that said here is what I have currently adopted and have working for filtering of our json:api based resource collections:

Logical Operators

  • eq operator (Equals)
  • ne operator (Not Equals)
  • gt operator (Greater Than)
  • ge operator (Greater Than or Equal)
  • lt operator (Less Than)
  • le operator (Less Than or Equal)
  • and operator (And)
  • or operator (Or)
  • not operator (Not)

Grouping

Grouping with the open ( and close ) parenthesis operators.

String Functions

  • contains function (filters if a string contains another substring)
  • startswith function (filters if string startswith a substring)
  • endswith function (filters if string endswith a substring)

Geography Functions

  • geo.distance function (filters on distance between 2 geographical points (latitude/longitude))

In Operator

The OData standard does not currently support the in operator but has an open requirement to add it. It was so fundamental that I added the in operator based on the open pull request that will be adding it to the OData standard.

Filter Examples:

The following are examples of filtering based on this OData “subset” and “best-of” that I have currently implemented and have working:

Assume we have a Movie resource collection at: /movies

/movies?filter=movie-name eq 'Braveheart'

/movies?filter=movie-rating eq 'G' or movie-rating eq 'PG-13'

/movies?filter=contains(movie-name, 'Star Wars') and price lt 10

/movies?filter=runtime-in-minutes ge 90 and runtime-in-minutes le 120 and rating in ('R', 'PG-13')

/movies?filter=schedule-date eq '12/25/2015' and geo.distance(theater-location, POINT(-80.2008 26.3681)) le 10000

/movies?filter=not(startswith(movie-name, 'The') and movie-rating in ('G', 'PG-13')) or schedule-date eq '1/1/2016'

Conclusion

As you can see with just the “subset” and “best-of” from the OData query standard we have a powerful filter framework to filter our resource collections currently. It was an interesting engineering challenge to create the lexical scanner, Pratt parser, and create the EntityFramework WHERE clause from the parsed syntax tree but now that it is working, filtering is easy and actually fun…

Maybe the json:api team should consider something like this as another possible “recommendation” or “proposal” for filtering, or not…


Why not an agnostic sort strategy like filter?
Handling aggregation & comparison operators
#2

I like this!

I’d have to think more about the details, but my feeling has been for a while that we should begin to standardize filter. We just haven’t had the bandwidth to do it.

I’m curious what others think of the specifics you’ve proposed.

Also, @scottmcdonald, did you look at GraphQL at all? It might have some ideas that could inform this.


#3

That looks great! But I thought it’s complicated too.
To clarify, this prosal is for an extension or recommendation right?
The speicification says “JSON API is agnostic about the strategies supported by a server. The filter query parameter can be used as the basis for any number of filtering strategies.”.
http://jsonapi.org/format/#fetching-filtering


#4

I’m really curious about why using string instead of hash? I’ve implemented a quite similar filtering strategy as well using a hash. For example:

/movies?filter=movie-name eq 'Braveheart'

could be done using a hash:

{ filter: { name: { eq: 'Braveheart' } } }

which translate to

/movies?filter[name][eq]=Braveheart

I think that working and parsing a hash is easier than a string.


#5

@gottfrois The hash format looks nice until you start trying to use OR conditionals.

How would you represent the following:

/movies?filter=movie-rating eq ‘G’ or movie-rating eq ‘PG-13’
/movies?filter=runtime-in-minutes ge 90 and runtime-in-minutes le 120 and rating in (‘R’, ‘PG-13’)
/movies?filter=schedule-date eq ‘12/25/2015’ and geo.distance(theater-location, POINT(-80.2008 26.3681)) le 10000
/movies?filter=not(startswith(movie-name, ‘The’) and movie-rating in (‘G’, ‘PG-13’)) or schedule-date eq ‘1/1/2016’

…in your format?


#6

This looks very nice @scottmcdonald! I agree with @ethanresnick that we should consider standardizing how filter works at some point. Unfortunately, I feel we have a long way to go before our specification can be that prescriptive. We’re having trouble from a bandwidth perspective landing a single new member (revision) in resource objects, let alone a holistic filtering strategy.

At the moment I think the most appropriate thing for us to do would be to say kudos on a job well done and consider this for an extension (when we have extensions ironed out).


#7

@tkellen it’s easy to handle with in operator and with a or operator using an array:

/movies?filter=movie-rating eq 'G' or movie-rating eq 'PG-13'
/movies?filter=runtime-in-minutes ge 90 and runtime-in-minutes le 120 and rating in ('R', 'PG-13')
/movies?filter=schedule-date eq '12/25/2015' and geo.distance(theater-location, POINT(-80.2008 26.3681)) le 10000
/movies?filter=not(startswith(movie-name, 'The') and movie-rating in ('G', 'PG-13')) or schedule-date eq '1/1/2016'
{filter: { rating: { in: ['G', 'PG-13'] }}}
{filter: { runtime: { gt: 90, lt: 120 }, rating: { in: ['R', 'PG-13'] } }}
{filter: { schedule: { eq: '12/25/2015' }, location: { near: [-80.2008, 26.3681], maxdistance: 10000 }}}
{filter: { or: [{ name: { regex: '^[^The].*$' }, rating: { in: ['G', 'PG-13'] } }, { schedule: { eq: '1/1/2016' }}] }} 

BTW in my current implementation, I’m prefixing operators with $, so eq becomes $eq. This is to avoid messing with fields and operators.

Thoughts would be appreciated.


#8

Taking a step back here, what I really liked about JsonAPI Format was that is has well thought conventions about how to structure the json payload. Object resource, relationships, compound documents, this is all great.

But when I saw the paragraph about how my server should behave for the sort strategy I have to say I was a bit disapointed because it forces me to implement a potentially not standard way to handle that thing.

I feel like, sort and filter strategy, should really be shared ideas about how one could handle that feature in his API. But we should not force them to do it in one specific way. Extensions would be great since one could choose to add it or not. Also because 3rd party clients could state to implement the “filter strategy extension” but they would not have to.


#9

@gottfrois You asked for feedback so I will say that I find your hash-based approach to grouping conditionals very difficult to follow compared to what @scottmcdonald has implemented.


#10

@tkellen

tl;dr
Anyone with “large enough” data (this is a function of compute resources, DB tuning, data size, etc.) will run into performance issues and need filtering. Having a non-standard filter strategy will likely tightly-couple client and server implementations in practice.

I understand the bandwidth issue, but I think it would be very constructive to the standard to iron out some of these details. In particular, we (Elide) have realized a slightly different (and, albeit, less powerful) implementation than proposed here in our framework. Likewise, this likely means any other JSON API services would find a similar issue.

Filtering is such a powerful thing for writing efficient queries. In fact, we found that our implementation was unusable on our large data without using filters (full table scans generally aren’t an option for us). Consequently, I find that many services will probably end up running into this issue sooner or later. As a result, we would end up having to pair particular clients to particular backends which feels (as I understand it) against the spirit of JSON API.

Just my 2 cents.

Anyway, thanks for all of your guys’ hard work on JSON API :smile: We’ve been silently following along since around May and have already seen a lot of maturity from the standard since then.


#11

Hi,

It would be so easy to bring a server to its knees:

I would rather see standardized way of providing link to filtered subsets that should be used by clients in most cases. May be with templates. Clients can stick with provided link names and may be stop worrying about building URLs.


#12

Well, JSONAPI extensions sound like perfect solution for this kind of filtering strategies.


#13

@tkellen The purpose of this post was to share for edification purposes anyone who has adopted the json:api standard and the json:api team one solution architect’s approach to filtering json:api resource collections since filtering is an important topic. I wanted the json:api team to be aware of this successful approach so in the future when the team starts tackling the “filter” feature either as a first class part of the specification or as an extension your team was informed basically.

What I really like about adopting/tweaking the best of the OData query syntax is it’s expressiveness and human readability as we use our browser as a poor man’s client tool for development purposes and it is really easy to construct a filter in the URL with this type of syntax.

Total understand about bandwidth and do not expect to see a standard json:api filtering specification or extension in the near future. And yes I am still waiting on the revision member pull request to be “pulled” so I can move my ubiquitous version attribute in all my resources to be a top-level resource member.


#14

@ruz Getting a resource collection filtered or unfiltered may or may not bring a “server to its’ knees” - that is why you would implement/force server-side paging for example.

We do have the concept of a “view” of a resource collection which is essentially a packaged named query for common use-cases but you can not anticipate ever way a client will want to filter your resource collections, nor would you want to, IMHO.


#15

@ethanresnick I have never heard of GraphQL before so I quickly perused it. My first impressions were it was not immediately intuitive like OData query syntax is and it looks like you construct a json-based payload of the query and supply it in the GET call and not in filter query parameter in the URL? But it is developed/supported by Facebook which means it is definitely a viable query protocol for your team to consider as a standard or extension to json:api.


#16

I think the proposed syntax is pretty, but I’m sure I’m not alone it not wanting to write my own mini-parser on dealing with all the language semantics. If this gets standardized, will a sample implementation be provided at least? In JavaScript at a minimum.


#17

@jocull In theory (assuming not too many tweaks to the syntax), you could use an existing odata parser even though the backend wouldn’t necessarily support the entirety of odata (independent of being able to parse the entirety of the spec).

In any case, I am sure if this was standardized that authors of JSON API libraries would certainly implement this functionality. Consequently, it would likely be easily to isolate the parser into its own library if need be.


#18

I agree totally with @gottfrois, that seems to me a more standard way to pass structured data to the server for filtering.

BTW, that is the approach I currently use and work like a charm.

Also, is a common format for filtering that some ORM have, see http://sequelize.readthedocs.org/en/latest/docs/querying/ as an example.


#19

Might also want to check out RQL (https://github.com/persvr/rql) as an alternative to OData.


#20

Another option of a query language is FIQL:
https://tools.ietf.org/html/draft-nottingham-atompub-fiql-00