Best way to represent simple catalogs (key, description)

shg · July 15, 2020, 4:03am

Hello, everybody!

We are building APIs for our university. While writing the JSON documents according to the JSON:API specification, we realized that we have a considerable number of “catalogs” of the form {"key": "abc", "description": "XYZ"}. To represent a resource that has many of these key-description fields as attributes, we first modeled the catalogs as resources and used relationships and included objects, but then noticed that the resulting JSON document grows considerably:

{
    "data": {
        "type": "student",
        "id": "studentID",
        "attributes": {
            "firstName": "First Name",
            "lastName": "Last Name"
        },
        "relationships": {
            "has-gender": { "data": { "type": "genders", "id": "M" } },
            "has-nationality": { "data": [ { "type": "nationalities", "id": "01" },
                                           { "type": "nationalities", "id": "02" } ] },
            "has-status": { "data": { "type": "student-status", "id": "R" } }
        }
    },
    "included": [
        {
            "type": "genders",
            "id": "M",
            "attributes": {
                "description": "Male"
            }
        },
        {
            "type": "nationalities",
            "id": "01",
            "attributes": {
                "description": "Mexican"
            }
        },
        {
            "type": "nationalities",
            "id": "02",
            "attributes": {
                "description": "US American"
            }
        },
        {
            "type": "student-status",
            "id": "R",
            "attributes": {
                "description": "Regular"
            }
        }
    ]
}

This is for a single student representation. We then thought about including the foreign keys as attributes of the student, together with their corresponding descriptions in an object, as follows:

{
    "data": {
        "type": "student",
        "id": "studentID",
        "attributes": {
            "firstName": "First Name",
            "lastName": "Last Name",
            "gender": { "genderKey": "M", "genderDescription": "Male" },
            "nationalities": [ { "nationalityKey": "01", "nationalityDescription": "Mexican" },
                               { "nationalityKey": "02", "nationalityDescription": "US American" } ],
            "student-status": { "statusKey": "R", "statusDescription": "Regular" }
        }
    }
}

This is just an example, but we have resources that contain more than 10 attributes of this type and become very verbose when using relationships.

Is the last approach “correct”? I read in the spec that foreign keys SHOULD NOT be part of a resource’s attributes, but precisely because it says “SHOULD NOT” rather than “MUST NOT”, we are considering this approach.

We could choose to expose only the descriptions as attributes, but in some cases the consumer may need the keys to perform other operations.

Any comments are welcome. Thank you!

cmeeren · July 22, 2020, 2:45pm

If you use a relationship like this:

"has-gender": { "data": { "type": "genders", "id": "M" } }

then you say that there exists a resource of type genders with ID "M". Is that correct? Alternatively, a gender could just be an enum attribute:

"gender": "M"

where you then have to document the known set of values for the gender attribute.
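A minimal TypeScript sketch of what “document the known set of values” can look like on the client side, assuming a hypothetical value set: only "M" appears in this thread, and "F" is an illustrative assumption. The names `GENDER_VALUES`, `isGender` and `readGender` are hypothetical helpers, not part of JSON:API.

```typescript
// Hypothetical documented value set for the `gender` attribute.
// Only "M" appears in the thread; "F" is an assumed example value.
const GENDER_VALUES = ["M", "F"] as const;
type Gender = (typeof GENDER_VALUES)[number];

// Narrow an arbitrary attribute value to one of the documented codes.
function isGender(value: unknown): value is Gender {
  return typeof value === "string" && (GENDER_VALUES as readonly string[]).indexOf(value) !== -1;
}

// Read the enum attribute from a resource object's attributes,
// rejecting values that the documentation does not list.
function readGender(attributes: Record<string, unknown>): Gender {
  const value = attributes["gender"];
  if (!isGender(value)) {
    throw new Error(`Undocumented gender value: ${String(value)}`);
  }
  return value;
}

const student = {
  type: "student",
  id: "studentID",
  attributes: { firstName: "First Name", lastName: "Last Name", gender: "M" },
};

console.log(readGender(student.attributes)); // M
```

Throwing on unknown values is a strict choice; a client that should survive new enum values being added later might instead fall back to a generic “other” handling.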

Also, is there a reason you call stuff has-xx? Seems verbose. I’d just call it gender, nationalities, and studentStatus.

I don’t know much about your domain, but I would probably represent both gender and status as enum attributes as described above. If some of these are dynamic and you actually need to have them as resources (e.g. you have a /nationalities endpoint where you can manage the known nationalities), then I’d use relationships for those. I would also primarily use relationships for anything that’s to-many (like nationalities). That makes it possible for clients to add/remove members and not just replace the complete set.
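To illustrate the add/remove point: JSON:API lets a client POST resource identifiers to a to-many relationship URL to add members, DELETE them to remove members, and PATCH to replace the whole set. The TypeScript sketch below only builds the request descriptors (it sends nothing). The URL layout `https://apihost/students/studentID/relationships/nationalities` is an assumption; the thread does not define it.

```typescript
interface ResourceIdentifier {
  type: string;
  id: string;
}

interface RelationshipRequest {
  method: "POST" | "DELETE" | "PATCH";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Build a request against a to-many relationship endpoint.
// POST adds members, DELETE removes members, PATCH replaces the full set.
function relationshipRequest(
  method: RelationshipRequest["method"],
  relationshipUrl: string,
  members: ResourceIdentifier[],
): RelationshipRequest {
  return {
    method,
    url: relationshipUrl,
    headers: { "Content-Type": "application/vnd.api+json" },
    body: JSON.stringify({ data: members }),
  };
}

// Hypothetical relationship URL for one student's nationalities.
const relUrl = "https://apihost/students/studentID/relationships/nationalities";

// Add nationality "02" without touching the others.
const addReq = relationshipRequest("POST", relUrl, [{ type: "nationalities", id: "02" }]);
// Remove nationality "01" only.
const removeReq = relationshipRequest("DELETE", relUrl, [{ type: "nationalities", id: "01" }]);
```

With an enum-array attribute instead, the client could only PATCH the whole `attributes` value, which is the difference being pointed out here.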

Yes, JSON:API is a bit verbose with many relationships. No way around that. The benefit is the flexibility. And keep in mind that the client code doesn’t care about the verbosity. What seems verbose to humans may actually be very small amounts of data in the grand scheme of things.
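As a sketch of why the verbosity costs client code little: joining relationship linkage to the top-level `included` array is a small, generic routine. The TypeScript below assumes a relationship named `nationalities` (the name suggested above) and uses the thread's sample data; `resolveRelationship` is a hypothetical helper, not a library API.

```typescript
interface ResourceIdentifier {
  type: string;
  id: string;
}

interface ResourceObject extends ResourceIdentifier {
  attributes?: Record<string, unknown>;
  relationships?: Record<string, { data: ResourceIdentifier | ResourceIdentifier[] | null }>;
}

interface JsonApiDocument {
  data: ResourceObject;
  included?: ResourceObject[];
}

// Return the included resources that a named relationship of `data` points to.
function resolveRelationship(doc: JsonApiDocument, name: string): ResourceObject[] {
  // Index the compound document's included resources by "type:id".
  const index = new Map<string, ResourceObject>();
  for (const resource of doc.included ?? []) {
    index.set(`${resource.type}:${resource.id}`, resource);
  }
  const linkage = doc.data.relationships?.[name]?.data;
  if (linkage == null) return [];
  // To-one linkage is a single identifier, to-many is an array: normalize to an array.
  const identifiers = Array.isArray(linkage) ? linkage : [linkage];
  const resolved: ResourceObject[] = [];
  for (const rid of identifiers) {
    const resource = index.get(`${rid.type}:${rid.id}`);
    if (resource !== undefined) resolved.push(resource);
  }
  return resolved;
}

const doc: JsonApiDocument = {
  data: {
    type: "student",
    id: "studentID",
    attributes: { firstName: "First Name", lastName: "Last Name" },
    relationships: {
      nationalities: {
        data: [
          { type: "nationalities", id: "01" },
          { type: "nationalities", id: "02" },
        ],
      },
    },
  },
  included: [
    { type: "nationalities", id: "01", attributes: { description: "Mexican" } },
    { type: "nationalities", id: "02", attributes: { description: "US American" } },
  ],
};

const descriptions = resolveRelationship(doc, "nationalities").map(
  (r) => r.attributes?.["description"],
);
console.log(descriptions);
```

The same function works for every relationship and every document, which is the flexibility being traded against size.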

shg · August 5, 2020, 11:54pm

Thank you for your comments.

Sorry for taking so long to answer. I started this message a couple of days after you replied, but got interrupted until now…

Yes, we were thinking about our “catalogs” as resources at first. But as they only have an “id” and a “description” that rarely change, we were trying to find out a way to simplify the JSON document to represent this. I agree on considering them as enum attributes.

Now, since some of the “catalogs” I mention here are kept in different database tables and fetching them is rather expensive, we are thinking of including only the “ids” in the default Response. If a consumer also needs the descriptions, they would set an input parameter in the Request, for example “includeEnumDescriptions=true”. What do you think about this?

As for the attribute names, you’re right: there’s no need for the “has-” prefix. I got confused, because names of that kind are meant for boolean parameters, not enum parameters.

Following my earlier example, we would now have the following as the default Response:

{
    "links": { "self": "https://apihost/student/studentID" },
    "data": {
        "type": "student",
        "id": "studentID",
        "attributes": {
            "firstName": "First Name",
            "lastName": "Last Name",
            "genderID": "M",
            "statusID": "R"
        },
        "relationships": {
            "nationality": { "data": [ { "type": "nationalities", "id": "01" },
                                       { "type": "nationalities", "id": "02" } ] }
        }
    },
    "included": [
        {
            "type": "nationalities",
            "id": "01",
            "attributes": {
                "description": "Mexican"
            }
        },
        {
            "type": "nationalities",
            "id": "02",
            "attributes": {
                "description": "US American"
            }
        }
    ]
}

And this for the Response when asked to include descriptions of enum attributes:

{
    "links": { "self": "https://apihost/student/studentID?includeEnumDesc=true" },
    "data": {
        "type": "student",
        "id": "studentID",
        "attributes": {
            "firstName": "First Name",
            "lastName": "Last Name",
            "genderID": "M",
            "genderDesc": "Male",
            "statusID": "R",
            "statusDesc": "Regular"
        },
        "relationships": {
            ...
        }
    }
}

Last, I agree with your comments about the flexibility JSON:API provides versus its verbosity. I had a vague sense of this, but not as clearly: APIs are consumed by machines before humans, so it’s OK to be a little verbose.

Regards!

cmeeren · August 6, 2020, 5:41am

With your latest suggestion you seem to be implementing custom stuff that JSON:API already handles well.

`includeEnumDesc` parameter

This functionality is already covered by sparse fieldsets. Use fields[student]=genderDesc,statusDesc,.... AFAIK it’s fine to only include a subset of all the fields by default, and include other fields only if the client specifically requests them.
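For comparison with a custom `includeEnumDesc` flag, this TypeScript sketch builds a sparse fieldset request with the standard `fields[TYPE]` parameter. The base URL comes from the thread’s example links; the field names follow the naming suggested in this reply and are otherwise assumptions. Note that once `fields[student]` is present, the server must return only the listed fields, so the client also lists the default fields it still wants.

```typescript
// Build a request URL that asks for an explicit sparse fieldset of one resource type.
function withSparseFieldset(baseUrl: string, type: string, fields: string[]): string {
  const url = new URL(baseUrl);
  // URLSearchParams percent-encodes "[", "]" and ","; the server decodes them as usual.
  url.searchParams.set(`fields[${type}]`, fields.join(","));
  return url.toString();
}

// Ask for the usual fields plus the (assumed) opt-in description fields.
const requestUrl = withSparseFieldset("https://apihost/student/studentID", "student", [
  "firstName",
  "lastName",
  "gender",
  "genderDesc",
  "status",
  "statusDesc",
]);
console.log(requestUrl);
```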

`genderID`/`statusID`

The names smack of foreign key attributes. JSON:API says this:

Although has-one foreign keys (e.g. author_id ) are often stored internally alongside other information to be represented in a resource object, these keys SHOULD NOT appear as attributes.

If you do not model them as separate resources but only as enum properties, I suggest calling them gender and status. The client does not need to know that the values are used as IDs in a database. You should abstract that from the clients, and the ...ID names make them a leaky abstraction.
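A minimal server-side sketch of that abstraction in TypeScript, assuming a hypothetical database row whose column names (`gender_id`, `status_id`, and so on) are invented for illustration: the serializer maps storage keys to plain enum attributes named after the concept, so no `...ID` name reaches the client.

```typescript
// Hypothetical shape of a database row; the column names are assumptions.
interface StudentRow {
  student_id: string;
  first_name: string;
  last_name: string;
  gender_id: string;
  status_id: string;
}

interface StudentResource {
  type: "student";
  id: string;
  attributes: {
    firstName: string;
    lastName: string;
    gender: string; // documented enum value, e.g. "M"
    status: string; // documented enum value, e.g. "R"
  };
}

// Map a storage row to a JSON:API resource object. The key columns become
// enum attributes; their database role stays hidden behind the serializer.
function toResource(row: StudentRow): StudentResource {
  return {
    type: "student",
    id: row.student_id,
    attributes: {
      firstName: row.first_name,
      lastName: row.last_name,
      gender: row.gender_id,
      status: row.status_id,
    },
  };
}
```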

If you model them as separate resources, then of course you should use a relationship, and not an attribute.

`genderDesc` and `statusDesc`

Do you really need these? My proposal for using a statically documented enum instead of separate resources in a relationship is predicated on 1) the values rarely changing, and 2) not needing to pass other information about the enum value.

Regarding 2), genderDesc seems superfluous; it seems like your API is then also taking on the role of supplying strings to the front-end. If you simply document all known enum values for gender, then the front-end can decide on its own how to display the values to the user.
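A small TypeScript sketch of the front-end owning the display strings instead of receiving `genderDesc`: the codes come from the API documentation, while the locales and wording here are assumptions chosen for illustration ("F" is likewise an assumed code).

```typescript
// Documented gender codes (hypothetical list; only "M" appears in the thread).
type Gender = "M" | "F";
type Locale = "en" | "es";

// Display strings maintained by the front-end, per supported locale.
const GENDER_LABELS: Record<Locale, Record<Gender, string>> = {
  en: { M: "Male", F: "Female" },
  es: { M: "Masculino", F: "Femenino" },
};

// Pick a label for the user's locale, falling back to English.
function genderLabel(code: Gender, locale: string): string {
  const key = (locale in GENDER_LABELS ? locale : "en") as Locale;
  return GENDER_LABELS[key][code];
}

console.log(genderLabel("M", "es")); // Masculino
```

Because the API returns only `"M"`, adding a language or rewording a label is a front-end change and needs no API release.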

The same applies to statusDesc, though regarding 1), if you find that you need to change the values so frequently that static documentation doesn’t suffice, and you actually need to supply strings to the front-end, you should consider making it a relationship instead.

Nationality description

Do you need this? There are standard ways of representing nationalities. For example, you can use ISO 3166 two- or three-letter country codes. Then the front-end can use tools like this to choose suitable ways of presenting the country names to the user.
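One such tool is the built-in `Intl.DisplayNames` API (ES2021), which turns an ISO 3166-1 alpha-2 code into a localized region name. This TypeScript sketch assumes a runtime with ICU data (Node ships it by default) and a TypeScript lib setting that includes the `Intl.DisplayNames` types. Note that it yields country names ("Mexico"), not demonyms ("Mexican"), so a front-end that must show nationality wording would still need its own strings.

```typescript
// Convert an ISO 3166-1 alpha-2 code to a localized country name.
function countryName(code: string, locale: string): string {
  // Reject values that are not two uppercase letters before asking Intl.
  if (!/^[A-Z]{2}$/.test(code)) {
    throw new RangeError(`Not an ISO 3166-1 alpha-2 code: ${code}`);
  }
  const names = new Intl.DisplayNames([locale], { type: "region" });
  // `of` may return undefined for unknown codes; fall back to the code itself.
  return names.of(code) ?? code;
}

console.log(countryName("MX", "en"));
console.log(countryName("US", "es"));
```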

fdrake · August 6, 2020, 1:57pm

I’ve seen comments like this in this forum, but I wish that were explicit in the specification.

-Fred

shg · August 6, 2020, 5:00pm

I agree that JSON:API already provides “fields” so that consumers get only the fields they need. But I think this mechanism was designed for scenarios where the consumer needs fewer fields than the default response provides, since all the needed attributes must be listed explicitly. That would be cumbersome when you need more data than the default provides. Of course, the default should be designed to cover 80% of the scenarios (or more), which may be enough to justify the extra effort in the remaining 20%.

So I’ve been thinking it might be useful to define something like fields[student]=all() or fields[student]=exclude(a, b, c), so that you don’t need to list every attribute in the query string when the default contains less data than needed. (Maybe I’m again thinking of developers more than of client systems. Maybe the JSON:API group has already considered this and found it unnecessary or rarely needed.)
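A standard-compliant alternative to a server-side `exclude(...)` is to compute the explicit list on the client, from the documented field list, and send it with the normal `fields[TYPE]` parameter. In this TypeScript sketch the field list for `student` is hypothetical; relationship names such as `nationalities` belong in it too, because JSON:API counts both attributes and relationships as fields.

```typescript
// Hypothetical documented fields (attributes and relationships) of `student`.
const STUDENT_FIELDS = [
  "firstName",
  "lastName",
  "gender",
  "genderDesc",
  "status",
  "statusDesc",
  "nationalities",
] as const;
type StudentField = (typeof STUDENT_FIELDS)[number];

// Client-side equivalent of the proposed exclude(...): every documented
// field except the ones listed, in documentation order.
function allExcept(excluded: StudentField[]): StudentField[] {
  return STUDENT_FIELDS.filter((f) => excluded.indexOf(f) === -1);
}

// Everything except the expensive status description.
const fieldsParam = `fields[student]=${allExcept(["statusDesc"]).join(",")}`;
console.log(fieldsParam);
```

The trade-off is that the client must be updated when new fields are documented, which is the “slightly longer hard-coded query string” cost discussed below.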

Currently, we provide the descriptions in the APIs. But we’re redesigning them, so I’m trying to define guidelines that everyone can follow. Anyway, I don’t like the idea of implementing custom stuff when a standard way is already defined. So I’ll take a closer look at our scenarios to decide which attributes are OK to treat as enum properties and which, as you said, would be better treated as resources of their own (the ones that change frequently or have so many values that it would be difficult for consumers to maintain the list of descriptions locally).

Last: nationality. This topic, together with others in our institution regarding identity and master data, has long been an issue for several reasons, including “many IT groups” that worked independently for years and were only centralized a few years ago. We still carry considerable technical debt. I saw the ISO codes you mention and the repository of countries with translations, but I need to talk to our data governance team about them. Thanks for the suggestion.

cmeeren · August 6, 2020, 8:38pm

That may be the most common use case, but sparse fieldsets are certainly a good solution for expensive fields that must be explicitly requested. Nothing in the spec requires all resource fields to be returned by default, so returning a subset of fields by default is implicitly allowed, just as you may choose to include certain relationships by default, or include relationships’ data (without including the related resources) by default. I do agree with @fdrake’s sentiment, though:

Perhaps one of you could suggest that on the spec’s GitHub repo?
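On the server side, “expensive fields only on request” can be as simple as deciding from `fields[student]` whether to run the catalog lookups at all. In this TypeScript sketch, the default field set and the names `requestedFields` and `needsDescriptions` are assumptions for illustration.

```typescript
// Fields returned when the client sends no fields[student] parameter.
// Which fields are default and which are opt-in is an assumption here.
const DEFAULT_STUDENT_FIELDS = ["firstName", "lastName", "gender", "status", "nationalities"];

// Work out which `student` fields the response must contain.
function requestedFields(query: URLSearchParams): Set<string> {
  const raw = query.get("fields[student]");
  if (raw === null) return new Set(DEFAULT_STUDENT_FIELDS);
  // An empty value yields no fields: splitting "" gives [""], which is filtered out.
  return new Set(raw.split(",").filter((f) => f.length > 0));
}

// Only run the expensive catalog lookups when a description was asked for.
function needsDescriptions(fields: Set<string>): boolean {
  return fields.has("genderDesc") || fields.has("statusDesc");
}

const query = new URLSearchParams("fields[student]=firstName,genderDesc");
console.log(needsDescriptions(requestedFields(query))); // true
```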

While I share your desire for an official “remove” syntax for sparse fieldsets, I would strongly advise against implementing non-standard syntaxes for features that are fully covered by JSON:API. As you say yourself:

Sparse fieldsets solve your problem, with the only cost being a slightly longer hard-coded query string in client apps. Again, as you say yourself:

(Emphasis mine.) I don’t necessarily agree that many values = difficult to maintain. I would say that enums that change frequently = difficult to maintain, regardless of the number of values. Enums that never change = easy to maintain, regardless of number, since they never have to be updated.

fdrake · August 6, 2020, 9:07pm

Perhaps I could. I’ve opened a couple of issues there over the past week as a test, to see whether the decision-makers are more responsive there. Since I’m on a small team, I don’t have much time to write up well-reasoned suggestions only to get no response.

But just in case:

-Fred

fdrake · August 7, 2020, 8:17pm

The allowed behavior for unspecified fields of a type has been clarified in both 1.0 and draft 1.1:
