REST API: Future thoughts...


Whilst we are grateful to HighQ for delivering a rich set of APIs to interact with the Collaborate product, using them in anger when developing applications uncovers various drawbacks that are inherent in REST APIs in general.

> TL;DR

REST usage leads to extremely “chatty” communications between application and back-end system. What I'd like to propose is that HighQ develop a complementary API based on GraphQL, to facilitate smarter usage that is better suited to today's progressive web and mobile applications. Developing such an API can be done very quickly and cheaply, reusing many of the back-end components utilised by the REST API.

I've created an example GraphQL server that consumes the REST APIs of the Integration Beta instance as a proof-of-concept:

GraphiQL (playground query IDE)

https://zcvjxvq8cd.execute-api.ap-southeast-2.amazonaws.com/integrationbeta/graphql

GraphQL endpoint (this is where you send queries)

https://zcvjxvq8cd.execute-api.ap-southeast-2.amazonaws.com/integrationbeta/graphql

GraphQL Schema Diagram

https://s3-ap-southeast-2.amazonaws.com/highq-integrationbeta-public/img/graphql-schema-collaborate.png

It's important to remember that, because I am using the REST API behind the GraphQL server to resolve data, the performance of this proof-of-concept does not reflect that of a GraphQL server created by HighQ using native application layer components to resolve the data.

The problem

REST APIs expose operations (verbs) on system resources (endpoints), e.g. a User or a document. When building applications that consume these endpoints, you soon end up making a profusion of calls just to assemble a consolidated picture of these resources and their relationships to each other. For example, say we want to find out simple details of a particular site, the iSheets in it and their views, and let's add top-level folders to the mix. To obtain all this information, the following API endpoints would need to be called:

GET /sites/{siteid}
    GET /folders/{folderid}/items
    GET /isheets/admin?siteid={siteid}
        [iSheet #1]
            GET /isheets/admin/{isheetid}/sections
            GET /isheets/admin/{isheetid}/columns
            GET /isheets/admin/{isheetid}/views
        [iSheet #2]
            GET /isheets/admin/{isheetid}/sections
            GET /isheets/admin/{isheetid}/columns
            GET /isheets/admin/{isheetid}/views
        ...
        ...
        [iSheet N]
            GET /isheets/admin/{isheetid}/sections
            GET /isheets/admin/{isheetid}/columns
            GET /isheets/admin/{isheetid}/views

A site that has two iSheets will result in 9 separate calls, and for three iSheets that's 12 separate calls, because the calls to sections, columns, and views need to be made for each iSheet in that site (3 fixed calls plus 3 per iSheet). The number of calls increases further when we wish to obtain permissions on iSheets, views, and columns, or any other resource that requires separate calls. This drawback is commonly called “under-fetching”: the response data doesn't contain enough information for the caller, leading to more requests.
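The arithmetic above can be checked with a small sketch that simply lists the GET requests from the call tree for a given number of iSheets. The function name and the placeholder IDs are illustrative only, and no request is actually sent.

```python
def rest_calls_for_site(isheet_ids, site_id="{siteid}", folder_id="{folderid}"):
    """List the GET requests from the call tree above for one site."""
    # Three calls are always needed: the site, its top-level folder, its iSheet list.
    calls = [
        f"GET /sites/{site_id}",
        f"GET /folders/{folder_id}/items",
        f"GET /isheets/admin?siteid={site_id}",
    ]
    # Each iSheet then needs three more calls of its own.
    for isheet_id in isheet_ids:
        for child in ("sections", "columns", "views"):
            calls.append(f"GET /isheets/admin/{isheet_id}/{child}")
    return calls


if __name__ == "__main__":
    for n in (2, 3, 10):
        ids = range(1, n + 1)
        print(n, "iSheets ->", len(rest_calls_for_site(ids)), "calls")
```

The count grows linearly with the number of iSheets, before any permission lookups are added on top.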

The indentation above also shows the order in which these calls need to be made: the iSheet and Folder endpoints need to be called after the first call to the Site endpoint, and sections, columns, and views have to be called after the call to that iSheet endpoint has completed. This needs to be accounted for when developing in asynchronous (event-driven) environments.
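As a rough illustration of that ordering, here is a sketch using Python's asyncio. The `fake_get` helper is a stand-in that only records the path (it does not call Collaborate), and the folder ID and second iSheet ID are made up. In a real client the iSheet IDs would come out of the stage 2 response rather than being passed in.

```python
import asyncio


async def fake_get(path, log):
    """Stand-in for an HTTP GET: records the path instead of calling Collaborate."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    log.append(path)
    return path


async def load_site(site_id, folder_id, isheet_ids):
    log = []
    # Stage 1: the site itself must come first.
    await fake_get(f"/sites/{site_id}", log)
    # Stage 2: folder items and the iSheet list can run side by side.
    await asyncio.gather(
        fake_get(f"/folders/{folder_id}/items", log),
        fake_get(f"/isheets/admin?siteid={site_id}", log),
    )
    # Stage 3: per-iSheet details, only once the iSheet list is known.
    await asyncio.gather(*[
        fake_get(f"/isheets/admin/{isheet_id}/{child}", log)
        for isheet_id in isheet_ids
        for child in ("sections", "columns", "views")
    ])
    return log


if __name__ == "__main__":
    for path in asyncio.run(load_site(20, 7, [201, 202])):
        print("GET", path)
```

Calls within a stage can overlap, but each stage has to wait for the previous one, so the client pays at least three sequential round trips.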

The other major drawback with REST APIs is commonly called “over-fetching”. In essence it's the reverse: some endpoint responses contain far more information than is necessary in the current context, which wastes network bandwidth and server resources.

The answer...or is it?

On some of the REST API endpoints, filters or 'switches' are employed to help control what level of information is returned. For example:

GET /sites?filterby=all&status=active&includefolderpermission=true&name=Test

In this example we restrict the list of sites returned to active sites that contain the word 'Test' in the site name, and we ask for folder permissions to be included in the endpoint response.
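From the client side, each switch is just another query-string parameter. A minimal sketch of building that request URL with the standard library follows. The base URL is hypothetical, and the parameter names are the ones from the example above.

```python
from urllib.parse import urlencode


def sites_url(base_url, **filters):
    """Append filter switches to the /sites endpoint as a query string."""
    return f"{base_url}/sites?{urlencode(filters)}"


if __name__ == "__main__":
    print(sites_url(
        "https://collaborate.example.com/api",  # hypothetical base URL
        filterby="all",
        status="active",
        includefolderpermission="true",
        name="Test",
    ))
```

Every new requirement means another keyword here and, more importantly, another switch the server has to support.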

Although this goes some way to solve some of the issues raised above, it's not a sustainable solution. How many 'switches' would it take to satisfy the requirements for each and every developer and the type of applications they are creating? Do we raise enhancement requests with HighQ to add additional filtering to endpoints?

The answer...a better fit?

Three years ago Facebook publicly released an alternative approach to REST, called GraphQL, which they had been using for their own platforms for some time.

GraphQL is a structured (strongly-typed) data query language that allows clients to define the structure of the data they require, and the response returns exactly that structure. This ability to return only what was requested helps avoid the problems of over-fetching and under-fetching described above for REST. The structure of the data is defined by the GraphQL Schema, which details not just the data types but also the relationships between them.

Let's see an example:

type Site {
    siteId: Int!
    name: String
    description: String
    createdDate: Date
    status: SiteStatus
    siteUsers: [User]
    folder: Folder
    sheets: [Sheet]
    groups: [SiteGroup]
}

In the above example we have an object (Site) with its properties defined, including relationships to other objects: Folder, User, Sheet (square brackets simply denote 'a collection of'). With the GraphQL API, we define queries that allow us to retrieve these objects and their properties. Using the example above, we can define a simple query (in the Schema) as follows:

type Query {
    getSite(siteId: Int): Site
}

And call the GraphQL endpoint with the following request (via HTTP POST):

query {
    getSite(siteId: 20) {
        siteId
        name
        sheets {
            sheetId
            title
        }
    }
}

This will result in details about the Site (with ID #20) being returned to the caller. What's important to note here is that in the above query we specify only the information we require. The Site object also has properties for folders, users, and groups, but all we requested was the ID, the name, and any iSheets. The GraphQL server retrieves the information and returns only what the caller specified. From a network bandwidth point of view, we are performing optimally by only sending and receiving necessary information. The server, too, only retrieves the information needed, i.e. no back-end queries were made for the Users or Folders associated with this Site, because the caller never asked for them.
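For anyone who wants to try this outside the IDE, the sketch below wraps the query in an HTTP POST using only the Python standard library. It assumes the server accepts the conventional JSON body with a `query` key (which Apollo Server does), and it assumes no extra authentication headers are needed, which may not hold for the proof-of-concept. The request is built but not sent.

```python
import json
import urllib.request

GRAPHQL_URL = (
    "https://zcvjxvq8cd.execute-api.ap-southeast-2.amazonaws.com"
    "/integrationbeta/graphql"
)

SITE_QUERY = """
query {
    getSite(siteId: 20) {
        siteId
        name
        sheets {
            sheetId
            title
        }
    }
}
"""


def build_request(query, url=GRAPHQL_URL):
    """Wrap a GraphQL query in the usual JSON body for an HTTP POST."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_request(SITE_QUERY)
    print(request.get_method(), request.full_url)
    # To actually send it (the endpoint may no longer be live):
    # with urllib.request.urlopen(request) as response:
    #     print(json.load(response))
```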

Let's extend the query above to return information we originally defined in the “The problem” section:

query {
    getSite(siteId: 20) {
        siteId
        status
        folder {
            name
            description
            createdDate
            modifiedDate
            location
        }
        sheets {
            sheetId
            title
            description
            sections {
                sectionId
                name
                description
            }
            columns {
                columnId
                name
                description
                type {
                    name
                    alias
                }
            }
        }
    }
}

Here we make just one call to the GraphQL server, which is then responsible for retrieving the information requested. Whether that 'resolves' to further back-end queries (for other objects like Folders or Sheets) is of no concern to the caller, and it allows the server implementation to use smart practices to enhance performance, e.g. caching.
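To make the idea of resolving only what was asked for a little more concrete, here is a toy sketch in plain Python. It is not how Apollo Server or the proof-of-concept is implemented: the selection is modelled as a nested dict (None marks a leaf field), the in-memory tables are made-up sample data, and `calls` just records which back-end lookups were triggered.

```python
# Made-up sample data standing in for back-end lookups.
SITES = {20: {"siteId": 20, "name": "Acme - Public", "status": "ACTIVE"}}
SHEETS = {20: [{"sheetId": 201, "title": "Planets", "description": "Demo"}]}
USERS = {20: [{"userId": 1, "name": "Example User"}]}


def pick(record, selection):
    """Keep only the selected leaf fields of a record."""
    return {field: record[field] for field, sub in selection.items() if sub is None}


def get_site(site_id, selection, calls):
    """Resolve a site, touching a related table only if it was selected."""
    calls.append("site")
    result = pick(SITES[site_id], selection)
    relations = {"sheets": SHEETS, "siteUsers": USERS}
    for field, table in relations.items():
        if field in selection:  # fetch a relation only when asked for
            calls.append(field)
            result[field] = [pick(row, selection[field]) for row in table[site_id]]
    return result


if __name__ == "__main__":
    calls = []
    selection = {"siteId": None, "name": None, "sheets": {"sheetId": None, "title": None}}
    print(get_site(20, selection, calls))
    print("back-end lookups:", calls)
```

Because `siteUsers` was never selected, the users table is never read, which is the behaviour described for the first query.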

Another feature of the query language is that we can combine several queries into one call, for example:

query {
    getSite(siteId: 20) {
        siteId
        name
        sheets {
            .........the rest omitted for brevity.......
        }
    }
    getSheetItems(sheetId:201, viewId:229) {
        recordCount
        totalCount
        results {
              .........the rest omitted for brevity.......
        }
    }
}

This will not only retrieve the requested details of site #20, but will also return all the iSheet Items in Sheet #201, using view #229.

Proof-of-concept

To help further cement the usefulness of this type of API, I've created a proof-of-concept that you can all play around with to familiarise yourselves with the power behind the API.

This proof-of-concept was created using the following technology stack (links provided), and totalled about 3 man-days of effort (excluding any additional effort required to overcome Collaborate API idiosyncrasies):

Technology stack used:

AWS Lambda - using Node.js
AWS API Gateway
Apollo GraphQL Server - using the Apollo Server with Lambda
Serverless Framework - for developing, testing, and deploying AWS infrastructure resources

Note: GraphQL Server implementations are also available for most popular languages, e.g. Java, Python, C#, PHP, Ruby, and many more.

You can create and submit your own queries by going to an IDE that the GraphQL server provides (GraphiQL) at the following address:

    https://zcvjxvq8cd.execute-api.ap-southeast-2.amazonaws.com/integrationbeta/graphql

The GraphQL server is consuming the REST API of the Integration Beta instance, and is limited to just one site, called "Acme - Public". If you wish to gain access to view this site on Collaborate, you can contact either myself or HighQ Support.

GraphQL Schema

You can navigate the GraphQL Schema using the [ < Docs ] button, at the top-right of the screen, to open up the "Documentation Explorer" where you can view all the query objects that are available, as well as all the other objects and their properties, that are defined in the Schema.

    I also have a Schema diagram image you can look at.

You will also notice that in the Documentation Explorer there are descriptions below some of the objects, properties, and parameters. These can all be added when constructing the schema - this can then become your online API documentation.

Below are some queries to get you started. Click on the link to load the query into the IDE, then press the play button to run the query. You can then add to, or amend the query as you see fit:

Example Queries

Query #1

Query all active sites in the instance, and return site properties, as well as all site users, site groups (plus module permissions, and users), and some details about the root folder for each site.

Query #2

An advanced iSheet Search for Planets that are both hot and have mountainous terrains (across two columns, both choice values).

Query #3

Retrieve all tasks for the site, and include any attachments.

Query #4

Get all Task, Blog, and Comment activities along with the user details (actors), and attachments on any comments; AND get the Planet iSheet columns (with section details), the default view with its columns, permissions, and sort order, and any email view defined. Finally, return all site groups.

Query #5

Get basic details about a site and a particular iSheet, retrieving details about the Default View and its columns, and the sort ordering. Note that here we are retrieving column details beyond just the Id: we are asking for details such as choice options, whether the column is mandatory, or whether any column conditions exist for it. We can do this because of the relationships: Sheet Views have a collection of Columns, so it doesn't matter if we ask for column details from the Sheet level or the View level.

Query #6

This example uses variables you pass into "canned" queries. Here we query a site, and include details about iSheets and optionally views and columns (only mandatory, or all).
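For readers unfamiliar with variables, the sketch below shows the general shape of a canned query and the JSON payload that carries its variables. The `getSite`, `sheets`, `sheetId` and `title` names come from the earlier examples. The `views` and `viewId` fields, the operation name and the `$withViews` flag are assumptions for illustration and may not match the proof-of-concept schema. `@include` is a standard GraphQL directive.

```python
import json

# A fixed ("canned") query: only the variable values change between calls.
CANNED_QUERY = """
query SiteWithSheets($siteId: Int, $withViews: Boolean!) {
    getSite(siteId: $siteId) {
        siteId
        name
        sheets {
            sheetId
            title
            views @include(if: $withViews) {
                viewId
            }
        }
    }
}
"""


def build_payload(site_id, with_views=False):
    """Build the JSON body: the query text plus a separate variables object."""
    return json.dumps({
        "query": CANNED_QUERY,
        "variables": {"siteId": site_id, "withViews": with_views},
    })


if __name__ == "__main__":
    print(build_payload(20, with_views=True))
```

Keeping the query text constant and sending the values separately means the client never has to build query strings by hand.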

Conclusion

GraphQL is nearly three years old now and, since the second half of last year, activity in this space has gained impressive pace and uptake, not just from the community but from blue-chip companies too. With the focus on application performance, be it web or native mobile, GraphQL fits this purpose better than REST by providing a framework (via the query language) that can help alleviate the over-fetching and under-fetching inherent in REST APIs. Though REST can incorporate 'switches' to influence responses, this pales in comparison to what is possible with GraphQL.

I'd like to propose that HighQ look at developing a GraphQL-based API that complements (rather than replaces) the existing REST API, initially for query-based functionality. Although GraphQL is capable of implementing full CRUD functionality via mutations, to expedite things it would be good to concentrate first on delivering query-based functionality (with any modifications performed via REST), with subsequent releases providing mutations and possibly even subscriptions.

Further considerations

As this is a proof-of-concept, and not production-ready, certain implementation aspects will still need to be considered. Here are four such considerations:

Security - though there are some security walls in the proof-of-concept, these certainly wouldn't be enough for a production service. Passing OAuth tokens in request headers would be one such solution.
Caching - again, the proof-of-concept employs some 'smart' caching (on a per-request basis only), but more could be achieved by using native application layers to resolve data.
Pagination - the proof-of-concept utilises some pagination where the REST API provides limit/offset functionality, though best practice has been moving towards cursor-based solutions.
Query Cost - with GraphQL it's possible to construct a query that ends up interrogating a lot of resources in Collaborate. To combat this, when implementing the GraphQL server, we can attribute 'costs' to certain properties of a query. These costs can then be totalled to produce an overall weighting for the expense of running a query, and appropriate action taken if it is too high.
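The query cost idea can be sketched in a few lines. This is a simplified illustration, not a description of any particular library: the selection is modelled as a nested dict (None marks a leaf field), and the per-field weights and the budget are invented numbers. Real implementations often also multiply by expected list sizes, which is left out here.

```python
# Hypothetical per-field weights; any field not listed costs 1.
FIELD_COSTS = {"sheets": 5, "columns": 3, "sections": 3, "siteUsers": 10}
MAX_COST = 25  # hypothetical budget per query


def query_cost(selection, costs=FIELD_COSTS):
    """Sum the cost of every selected field, including nested ones."""
    total = 0
    for field, sub in selection.items():
        total += costs.get(field, 1)
        if sub is not None:
            total += query_cost(sub, costs)
    return total


def check(selection, limit=MAX_COST):
    """Return the cost, or refuse the query if it is over budget."""
    cost = query_cost(selection)
    if cost > limit:
        raise ValueError(f"query cost {cost} exceeds limit {limit}")
    return cost


if __name__ == "__main__":
    selection = {
        "siteId": None,
        "name": None,
        "sheets": {
            "sheetId": None,
            "title": None,
            "columns": {"columnId": None, "name": None},
        },
    }
    print("cost:", check(selection))
```

Rejecting the query is only one possible action; a server could equally throttle the caller or log the query for review.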

And Finally

Over the coming weeks, I will be putting together a Home Page dashboard on the "Acme - Public" site to demonstrate the possibilities of using the API, so stay tuned.

I would like to hear thoughts from HighQ and other devs on this site. If you have any comments, or anything you would like to add, please leave them down below.

Updates...

Friday, 23rd March 2018 - The proof-of-concept GraphQL server was developed against version 4.3.3.3 of Collaborate. Now that Integration Beta has been upgraded to 4.3.4.1, there may be some things that are breaking or not working as expected due to fixes to the Collaborate API. If you come across any issues, please shout out and provide details in a comment. Thank you, Andrew

Comments

Hi Imran Aziz, was there any interest from the dev team regarding this?

Great suggestion and insight Andrew Quinn. I do agree that there is value in using GraphQL, so let me discuss this with our engineering team, so I have an idea of the effort required to do this.

I will need some time to discuss and understand the impact of this change as it will be significant.

As you have pointed out it will be good to understand if other partners and clients will be interested in this change and are willing to use GraphQL.

Last Updated: Aug 18, 2023
