Share/Propose a Filtering Strategy

scottmcdonald · December 11, 2015, 10:12pm

Introduction

I wanted to share/propose the filtering strategy I designed when it came time for adding filtering of resource collections. After much debate, the filtering strategy I decided upon was to adopt a “subset” or “best of” from the OData 4.0 query standard for the following reasons:

Well designed and maintained query options standard, why reinvent the wheel.
- Same reason why I adopted the json-api standard, standards are good!
Already an adopted industry standard and well documented.
After much review I found it expressive, URL friendly, and would meet almost 100% of my needs for filtering.

Here is the OData 4.0 documentation for it’s URL conventions: http://docs.oasis-open.org/odata/odata/v4.0/odata-v4.0-part2-url-conventions.html

Again I am not proposing the entire OData query syntax, only the best parts with minor tweaking if necessary. With that said here is what I have currently adopted and have working for filtering of our json:api based resource collections:

Logical Operators

eq operator (Equals)
ne operator (Not Equals)
gt operator (Greater Than)
ge operator (Greater Than or Equal)
lt operator (Less Than)
le operator (Less Than or Equal)
and operator (And)
or operator (Or)
not operator (Not)

Grouping

Grouping with the open ( and close ) parenthesis operators.

String Functions

contains function (filters if a string contains another substring)
startswith function (filters if string startswith a substring)
endswith function (filters if string endswith a substring)

Geography Functions

geo.distance function (filters on distance between 2 geographical points (latitude/longitude))

In Operator

The OData standard does not currently support the in operator but has an open requirement to add it. It was so fundamental that I added the in operator based on the open pull request that will be adding it to the OData standard.

Filter Examples:

The following are examples of filtering based on this OData “subset” and “best-of” that I have currently implemented and have working:

Assume we have a Movie resource collection at: /movies

/movies?filter=movie-name eq 'Braveheart'

/movies?filter=movie-rating eq 'G' or movie-rating eq 'PG-13'

/movies?filter=contains(movie-name, 'Star Wars') and price lt 10

/movies?filter=runtime-in-minutes ge 90 and runtime-in-minutes le 120 and rating in ('R', 'PG-13')

/movies?filter=schedule-date eq '12/25/2015' and geo.distance(theater-location, POINT(-80.2008 26.3681)) le 10000

/movies?filter=not(startswith(movie-name, 'The') and movie-rating in ('G', 'PG-13')) or schedule-date eq '1/1/2016'

Conclusion

As you can see with just the “subset” and “best-of” from the OData query standard we have a powerful filter framework to filter our resource collections currently. It was an interesting engineering challenge to create the lexical scanner, Pratt parser, and create the EntityFramework WHERE clause from the parsed syntax tree but now that it is working, filtering is easy and actually fun…

Maybe the json:api team should consider something like this as another possible “recommendation” or “proposal” for filtering, or not…

ethanresnick · December 15, 2015, 12:20am

I like this!

I’d have to think more about the details, but my feeling has been for a while that we should begin to standardize filter. We just haven’t had the bandwidth to do it.

I’m curious what others think of the specifics you’ve proposed.

Also, @scottmcdonald, did you look at GraphQL at all? It might have some ideas that could inform this.

dopin · December 15, 2015, 9:07am

That looks great! But I thought it’s complicated too.
To clarify, this prosal is for an extension or recommendation right?
The speicification says “JSON API is agnostic about the strategies supported by a server. The filter query parameter can be used as the basis for any number of filtering strategies.”.
http://jsonapi.org/format/#fetching-filtering

gottfrois · December 15, 2015, 9:32am

I’m really curious about why using string instead of hash? I’ve implemented a quite similar filtering strategy as well using a hash. For example:

/movies?filter=movie-name eq 'Braveheart'

could be done using a hash:

{ filter: { name: { eq: 'Braveheart' } } }

which translate to

/movies?filter[name][eq]=Braveheart

I think that working and parsing a hash is easier than a string.

tkellen · December 15, 2015, 12:12pm

@gottfrois The hash format looks nice until you start trying to use OR conditionals.

How would you represent the following:

/movies?filter=movie-rating eq ‘G’ or movie-rating eq ‘PG-13’
/movies?filter=runtime-in-minutes ge 90 and runtime-in-minutes le 120 and rating in (‘R’, ‘PG-13’)
/movies?filter=schedule-date eq ‘12/25/2015’ and geo.distance(theater-location, POINT(-80.2008 26.3681)) le 10000
/movies?filter=not(startswith(movie-name, ‘The’) and movie-rating in (‘G’, ‘PG-13’)) or schedule-date eq ‘1/1/2016’

…in your format?

tkellen · December 15, 2015, 12:16pm

This looks very nice @scottmcdonald! I agree with @ethanresnick that we should consider standardizing how filter works at some point. Unfortunately, I feel we have a long way to go before our specification can be that prescriptive. We’re having trouble from a bandwidth perspective landing a single new member (revision) in resource objects, let alone a holistic filtering strategy.

At the moment I think the most appropriate thing for us to do would be to say kudos on a job well done and consider this for an extension (when we have extensions ironed out).

gottfrois · December 15, 2015, 1:00pm

@tkellen it’s easy to handle with in operator and with a or operator using an array:

/movies?filter=movie-rating eq 'G' or movie-rating eq 'PG-13'
/movies?filter=runtime-in-minutes ge 90 and runtime-in-minutes le 120 and rating in ('R', 'PG-13')
/movies?filter=schedule-date eq '12/25/2015' and geo.distance(theater-location, POINT(-80.2008 26.3681)) le 10000
/movies?filter=not(startswith(movie-name, 'The') and movie-rating in ('G', 'PG-13')) or schedule-date eq '1/1/2016'

{filter: { rating: { in: ['G', 'PG-13'] }}}
{filter: { runtime: { gt: 90, lt: 120 }, rating: { in: ['R', 'PG-13'] } }}
{filter: { schedule: { eq: '12/25/2015' }, location: { near: [-80.2008, 26.3681], maxdistance: 10000 }}}
{filter: { or: [{ name: { regex: '^[^The].*$' }, rating: { in: ['G', 'PG-13'] } }, { schedule: { eq: '1/1/2016' }}] }}

BTW in my current implementation, I’m prefixing operators with $, so eq becomes $eq. This is to avoid messing with fields and operators.

Thoughts would be appreciated.

gottfrois · December 15, 2015, 1:13pm

Taking a step back here, what I really liked about JsonAPI Format was that is has well thought conventions about how to structure the json payload. Object resource, relationships, compound documents, this is all great.

But when I saw the paragraph about how my server should behave for the sort strategy I have to say I was a bit disapointed because it forces me to implement a potentially not standard way to handle that thing.

I feel like, sort and filter strategy, should really be shared ideas about how one could handle that feature in his API. But we should not force them to do it in one specific way. Extensions would be great since one could choose to add it or not. Also because 3rd party clients could state to implement the “filter strategy extension” but they would not have to.

tkellen · December 15, 2015, 1:23pm

@gottfrois You asked for feedback so I will say that I find your hash-based approach to grouping conditionals very difficult to follow compared to what @scottmcdonald has implemented.

dmcwherter · December 15, 2015, 4:25pm

@tkellen

tl;dr
Anyone with “large enough” data (this is a function of compute resources, DB tuning, data size, etc.) will run into performance issues and need filtering. Having a non-standard filter strategy will likely tightly-couple client and server implementations in practice.

I understand the bandwidth issue, but I think it would be very constructive to the standard to iron out some of these details. In particular, we (Elide) have realized a slightly different (and, albeit, less powerful) implementation than proposed here in our framework. Likewise, this likely means any other JSON API services would find a similar issue.

Filtering is such a powerful thing for writing efficient queries. In fact, we found that our implementation was unusable on our large data without using filters (full table scans generally aren’t an option for us). Consequently, I find that many services will probably end up running into this issue sooner or later. As a result, we would end up having to pair particular clients to particular backends which feels (as I understand it) against the spirit of JSON API.

Just my 2 cents.

Anyway, thanks for all of your guys’ hard work on JSON API We’ve been silently following along since around May and have already seen a lot of maturity from the standard since then.

ruz · December 16, 2015, 12:44pm

Hi,

It would be so easy to bring a server to its knees:

I would rather see standardized way of providing link to filtered subsets that should be used by clients in most cases. May be with templates. Clients can stick with provided link names and may be stop worrying about building URLs.

ruz · December 16, 2015, 12:46pm

Well, JSONAPI extensions sound like perfect solution for this kind of filtering strategies.

scottmcdonald · December 16, 2015, 6:28pm

@tkellen The purpose of this post was to share for edification purposes anyone who has adopted the json:api standard and the json:api team one solution architect’s approach to filtering json:api resource collections since filtering is an important topic. I wanted the json:api team to be aware of this successful approach so in the future when the team starts tackling the “filter” feature either as a first class part of the specification or as an extension your team was informed basically.

What I really like about adopting/tweaking the best of the OData query syntax is it’s expressiveness and human readability as we use our browser as a poor man’s client tool for development purposes and it is really easy to construct a filter in the URL with this type of syntax.

Total understand about bandwidth and do not expect to see a standard json:api filtering specification or extension in the near future. And yes I am still waiting on the revision member pull request to be “pulled” so I can move my ubiquitous version attribute in all my resources to be a top-level resource member.

scottmcdonald · December 16, 2015, 6:38pm

@ruz Getting a resource collection filtered or unfiltered may or may not bring a “server to its’ knees” - that is why you would implement/force server-side paging for example.

We do have the concept of a “view” of a resource collection which is essentially a packaged named query for common use-cases but you can not anticipate ever way a client will want to filter your resource collections, nor would you want to, IMHO.

scottmcdonald · December 16, 2015, 6:49pm

@ethanresnick I have never heard of GraphQL before so I quickly perused it. My first impressions were it was not immediately intuitive like OData query syntax is and it looks like you construct a json-based payload of the query and supply it in the GET call and not in filter query parameter in the URL? But it is developed/supported by Facebook which means it is definitely a viable query protocol for your team to consider as a standard or extension to json:api.

jocull · January 7, 2016, 4:30pm

I think the proposed syntax is pretty, but I’m sure I’m not alone it not wanting to write my own mini-parser on dealing with all the language semantics. If this gets standardized, will a sample implementation be provided at least? In JavaScript at a minimum.

dmcwherter · January 11, 2016, 8:59pm

@jocull In theory (assuming not too many tweaks to the syntax), you could use an existing odata parser even though the backend wouldn’t necessarily support the entirety of odata (independent of being able to parse the entirety of the spec).

In any case, I am sure if this was standardized that authors of JSON API libraries would certainly implement this functionality. Consequently, it would likely be easily to isolate the parser into its own library if need be.

diosney · February 5, 2016, 7:42pm

I agree totally with @gottfrois, that seems to me a more standard way to pass structured data to the server for filtering.

BTW, that is the approach I currently use and work like a charm.

Also, is a common format for filtering that some ORM have, see http://sequelize.readthedocs.org/en/latest/docs/querying/ as an example.

jgornick · June 7, 2016, 3:05pm

Might also want to check out RQL (https://github.com/persvr/rql) as an alternative to OData.

pashkin · September 14, 2016, 9:34am

Another option of a query language is FIQL:
https://tools.ietf.org/html/draft-nottingham-atompub-fiql-00

Topic		Replies	Views
Filter strategy and strict compliance	3	6963	September 26, 2017
Recommendations on filtering operation (and/or)	1	1388	October 21, 2015
Why not an agnostic sort strategy like filter?	7	4901	January 7, 2016
Filtering/querying (deep) relationships - a proposal for a syntax	8	8090	March 17, 2021
Best way to handle nested filtering?	1	33	May 9, 2025