Discussion on: GRANDstack Access Control - Basics and Concepts


Hello Ian.
Implementing a granular permission system for GRANDstack is a great initiative!

While I was reading through the concepts, I was thinking that, in my opinion, there are two major approaches to implementing this: the first at the instance level, the second at the schema level. If I understand it correctly, your implementation combines both.
The permissions are first declared in the schema, but in the end they are persisted in the graph at the instance level, as relationships to User nodes. This can lead to the Neo4j supernode problem (many incoming relationships from millions of nodes pointing to the User nodes). This approach also creates a mass of extra relationships just to persist the permission system; it's like a permission graph layer on top of the data graph layer. It also leads to graph equivalents of ALTER TABLE whenever the permissions change: all the affected permission relationships have to be deleted or recreated.

Wouldn't it make more sense to implement everything at the schema level? That would mean each schema field on a node declares its own permissions using RBAC. For example, Task.name has these permissions:
{
  read: ['everyone'],
  create: ['devops', 'developers', 'managers'],
  update: ['owner', 'admins'],
  delete: ['owner', 'admins']
}
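As a rough illustration, the per-field map above could be typed and checked before a query runs. This is a minimal sketch, not part of GRANDstack or neo4j-graphql-js: the names `FieldPermissions`, `CrudOperation`, and `isAllowed` are hypothetical, and treating `everyone` and `owner` as pseudo-roles (with ownership passed in as a boolean) is an assumption.

```typescript
// Hypothetical types mirroring the Task.name example above.
type CrudOperation = "read" | "create" | "update" | "delete";
type FieldPermissions = Record<CrudOperation, string[]>;

const taskNamePermissions: FieldPermissions = {
  read: ["everyone"],
  create: ["devops", "developers", "managers"],
  update: ["owner", "admins"],
  delete: ["owner", "admins"],
};

// Returns true if the operation is open to everyone, if the caller owns the
// node and "owner" is listed, or if any of the caller's roles is listed.
// "everyone" and "owner" are assumed pseudo-roles, not real role names.
function isAllowed(
  perms: FieldPermissions,
  op: CrudOperation,
  userRoles: string[],
  isOwner: boolean,
): boolean {
  const allowed = perms[op];
  if (allowed.includes("everyone")) return true;
  if (isOwner && allowed.includes("owner")) return true;
  return userRoles.some((role) => allowed.includes(role));
}
```

For example, `isAllowed(taskNamePermissions, "update", ["developers"], false)` returns `false`, while the same call with `isOwner` set to `true` returns `true`. The owner check still needs per-node information, which this function takes as an input rather than looking up.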
These permissions would then be validated before the graph is queried. They would generate or modify the GraphQL/Cypher query so that the query to be executed already contains the constraints and filters that enforce the permissions declared in the schema. That way, no extra permission layer has to be created in the graph, no full-graph-scan ALTER TABLE operations are necessary, and there is no risk of supernodes. Permissions stay flexible at the schema level, and query performance will be better, since fewer nodes and relationships are involved.
The gist of schema-level permissions is that the query itself is generated (templated) with the schema permissions embedded and enforced. That "permissions-aware" query just hits the "open" data graph, and returns only the nodes/relations that the user can access.
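One way to picture a query "templated with the permissions" is to turn each permission list into one of three outcomes: no filter, a WHERE condition, or an outright refusal. The sketch below is hypothetical: it assumes ownership is stored as an `ownerId` property on the node and that `$userId` is passed as a Cypher parameter; `permissionPredicate` and `buildTaskQuery` are illustrative names.

```typescript
// Outcome of resolving a permission list for one user.
type Predicate =
  | { kind: "allow" }
  | { kind: "deny" }
  | { kind: "where"; cypher: string };

function permissionPredicate(
  allowedRoles: string[],
  userRoles: string[],
  nodeVar: string,
): Predicate {
  // A role match (or the assumed "everyone" pseudo-role) needs no per-node filter.
  if (allowedRoles.includes("everyone") || userRoles.some((r) => allowedRoles.includes(r))) {
    return { kind: "allow" };
  }
  // "owner" can only be decided per node, so it becomes a WHERE condition.
  // Assumes an `ownerId` property on the node and a `$userId` parameter.
  if (allowedRoles.includes("owner")) {
    return { kind: "where", cypher: `${nodeVar}.ownerId = $userId` };
  }
  // The user cannot qualify at all.
  return { kind: "deny" };
}

// Build a read query for Task nodes; returns null when access is denied.
function buildTaskQuery(allowedRoles: string[], userRoles: string[]): string | null {
  const p = permissionPredicate(allowedRoles, userRoles, "t");
  if (p.kind === "deny") return null;
  const where = p.kind === "where" ? ` WHERE ${p.cypher}` : "";
  return `MATCH (t:Task)${where} RETURN t`;
}
```

Role matches are resolved before anything is sent, so only the `owner` case reaches the database as a per-node condition, and a user with no qualifying role never issues a query at all.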

Any thoughts on this approach?

Ian Kleats • Mar 14 '20

Thanks for the thoughts! It's a lot to respond to, so please forgive me if I skip anything. There are also four subsequent articles that might address some of your points.

To one of your last comments: "That 'permissions-aware' query just hits the 'open' data graph, and returns only the nodes/relations that the user can access". Exactly the point!

I've spent a lot of time over the past year digging through the neo4j-graphql-js source code, and I've found it's kind of challenging to extend in the way you've described. This proof of concept just uses the filter argument to modify WHERE clauses instead of doing a broader rewrite of the neo4j-graphql-js internals.
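Extending the filter argument, rather than rewriting generated Cypher, can be pictured as a small pure function over resolver arguments. This is a hypothetical helper, not the proof of concept's actual code: the `{ AND: [...] }` shape is modeled on the filter input types neo4j-graphql-js generates, and the field names `visibility`, `owner`, and `name_contains` are assumptions about a particular schema.

```typescript
// Loose shape for a generated filter input object (assumption).
type FilterInput = { [key: string]: unknown };

// Hypothetical helper: require both the client's filter and the access filter.
function withAccessFilter(userFilter: FilterInput | undefined, accessFilter: FilterInput): FilterInput {
  // No client filter: the access filter alone restricts the results.
  if (!userFilter || Object.keys(userFilter).length === 0) {
    return accessFilter;
  }
  // Otherwise combine them without modifying the client's filter.
  return { AND: [userFilter, accessFilter] };
}

// Illustrative access filter: public Tasks or Tasks owned by the caller.
const access: FilterInput = { OR: [{ visibility: "PUBLIC" }, { owner: { id: "user-123" } }] };
const merged = withAccessFilter({ name_contains: "deploy" }, access);
```

In a custom resolver, the merged object would presumably replace `args.filter` before delegating to neo4j-graphql-js, which then renders it into the WHERE clause.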

I think what you bring up are more issues with specific access control structures, not with the implementation of the schema directives. I apologize if my example gave the impression I was advocating one specific structure over another; it was merely for illustration.

Under the simplest RBAC structures, you could definitely do something like what you've laid out. It could even be done easily with the TranslationRule approach I'm putting forward, by referencing the JWT claims from the request context. However, aside from the over-fetching aspect, this is already mostly solved by the existing graphql-auth-directives support, so it wasn't what was motivating me.
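For the simple RBAC case, a rule that reads JWT claims and returns an extra filter might look like the following. It is not the article's TranslationRule API; the `Claims` shape (`sub`, `roles`), the `AccessRule` type, and the `owner` relationship filter are assumptions made for illustration.

```typescript
// Hypothetical shape of the decoded JWT placed on the request context.
interface Claims {
  sub: string;
  roles: string[];
}

// Hypothetical rule type: return an extra filter, or null for no restriction.
type AccessRule = (claims: Claims) => Record<string, unknown> | null;

const taskReadRule: AccessRule = (claims) => {
  // Privileged roles see every Task.
  if (claims.roles.includes("admins") || claims.roles.includes("managers")) {
    return null;
  }
  // Everyone else sees only Tasks they own
  // (assumes an `owner` relationship exposed in the generated filter input).
  return { owner: { id: claims.sub } };
};
```

Because the rule sees only the token, it cannot by itself express different rights on different Tasks for the same user; the `owner` filter works only because ownership is stored in the graph.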

With anything more complicated, I might be missing something, but I'm still trying to wrap my head around how your suggestion would apply when a single user has heterogeneous roles within a single node type (e.g., a User has Owner rights on some Tasks, Editor rights on others, Public rights on many more, and Forbidden/Undefined rights on the rest).

If such a user were to query the GraphQL endpoint with query { Task { somePublicField, someProtectedField } }, we would have to do one of the following:

a) Store a list of all user IDs qualifying for each permission on each field of the node, with the user ID still showing up in the WHERE clause (i.e., via the filter argument, so it can be done with the current implementation without further tweaking of Cypher strings).
b) Store object references and user-specific claims for all relevant nodes in the JWT and perform some kind of UNION by claim level (but where is this information persisted in the first place? If in the same Neo4j instance, then it's likely also accessible through a filter argument).
c) Store this information as additional labels, nodes, and relationships in the graph, which can be referenced by pattern-matching WHERE clauses (i.e., again possible with the filter argument).
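The three options can be compared by the WHERE fragment each one would contribute to `query { Task { ... } }`. This is a hypothetical sketch: the `readerIds` property, the `$readableTaskIds` parameter, and the `CAN_READ` relationship type are invented names, and the Cypher assumes the Task node is bound to `t`.

```typescript
type StorageOption = "a" | "b" | "c";

interface WhereFragment {
  where: string;
  params: Record<string, unknown>;
}

function readConstraint(
  option: StorageOption,
  userId: string,
  readableTaskIds: string[],
): WhereFragment {
  const fragments: Record<StorageOption, WhereFragment> = {
    // (a) each Task stores the list of user IDs allowed to read it (assumed property).
    a: { where: "$userId IN t.readerIds", params: { userId } },
    // (b) the JWT carries the IDs of readable Tasks; nothing is stored per node.
    b: { where: "t.id IN $readableTaskIds", params: { readableTaskIds } },
    // (c) a relationship in the graph grants access, matched as a pattern predicate.
    c: { where: "(t)<-[:CAN_READ]-(:User {id: $userId})", params: { userId } },
  };
  return fragments[option];
}

// Example: the full query string for option (c).
const { where, params } = readConstraint("c", "user-123", []);
const cypher = `MATCH (t:Task) WHERE ${where} RETURN t`;
```

Options (a) and (c) keep `$userId` in the WHERE clause, while option (b) moves the per-node data into the token. Mixed rights on a single node type would presumably be expressed by OR-ing several such fragments, one per access level.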

The level of complexity of whatever access control structure is implemented is at the discretion (haha, so pun-ish me) of the implementer. I used a very simple example because it's easy to follow, not because it's one I'd necessarily recommend or use myself. The point was that, whatever directive support I tried to create, I wanted it to be very unopinionated, so that it could support just about any implementation someone else might dream up.

The supernode issue is a consideration for any graph data model, and it's really up to the implementer to be mindful of it and figure out how to overcome it. (And frankly, by the time you're at the scale where supernodes become a problem, you're probably not going to be running a simple GRANDstack anyway.) But if Neo4j weren't good for modeling permissions, it probably wouldn't be one of the database's top highlighted use cases.

Hit me back if you have more questions/thoughts or if I've completely missed the mark on your points.