AWS Neptune for analysing event ticket sales between users - Part 2

#networks #neptune #visualisation #cypher

This blog follows on from the setup of Neptune DB with Worldwide Events data in the first part. Here we will run some queries and investigate node relationships in the Neptune Notebook.
We can use some of the magic commands available in graph notebook open sourced by AWS and available to use in the Neptune Notebook instance configured in the first part of this blog. We can configure the visualisation options available in the notebook when we execute the query, using the%graph_notebook_vis_options command. This will output a json containing the default configuration options for rendering the graphs.

To modify the executing notebook's vis.js options, we can use %%graph_notebook_vis_options with the modified JSON payload provided in the cell body. For example, in the screenshot below I have switched the physics solver from barnesHut to forceAtlas2Based.

We will also create a mapping between node label and property used to label the node in the visualisation. Run the following in the notebook cell.

display_var = '{"user":"name","event":"name"}'

We will reference $display_var, the value of the variable when running the first query. The %%oc magic command indicates we want to execute an openCypher query. The -d hint is used to enable the mappings defined above, so we pass $display_var. The -l hint sets the maximum length of the text to 20, that can be displayed in a node. The cypher query will return all the connected nodes (along with the edges) in the database.

%%oc -d $display_var -l20
MATCH ((n)-[r]-(p))
RETURN n,r,p

By default, we see the results in text form in the console tab

If we switch to the graph tab, we can see the rendered visualisation. We can zoom in and out using then +/- icons or move the nodes by clicking and dragging. Zooming into a cluster of nodes and relationships should also show the associated labels. Hovering over any nodes should also show the label.

Users from New York

Now lets find out all users (buyers and sellers) who are from New York. We will use the user when to only match the user nodes and then use the WHERE clause to filter only nodes whose property city is New York.

%%oc 
MATCH (t:user)
WHERE t.city = "New York"
RETURN t

Users buying and selling for events in Toronto

To find paths containing users who listed tickets and bought tickets for events in Toronto. We can use the following query to match the path (n:user)-[]-(e:event)-[]-(u:user) and then filter the city property of the event node to Toronto

%%oc  -d $display_var -l20
MATCH p=(n:user)-[]-(e:event)-[]-(u:user)
WHERE e.city = "Toronto"
RETURN p

Match event property directly in path

Let us now match all sellers and buyers for tickets to The Police event. Instead of using the WHERE clause after MATCH we can filter the required paths directly in the first MATCH clause by specifying the property name as The Police in the event node in MATCH statement.

%%oc -d $display_var
MATCH p=(seller:user)-->(event {name: 'The Police'})-->(buyer:user)
RETURN p

Looks like we have two events, for which tickets listed by a user were purchased by another user. However, maybe we need some more granular information regarding the transactions and whether all the tickets listed by the seller were bought.

We can then return the associated properties of the relationships as separate columns in a table. The previous query can be modified to return the properties and relationship type instead of the path and alias the names (which will be the table column names). We also do not need to match the full path (explicitly the directions between users and events) as in the previous query as we are interested in all relationship types connected to the The Police events node(s). This output will not give the option of displaying a graph in the widget as we have not returned a path.

%%oc -d $display_var
MATCH p=()-[e]-(event {name: 'The Police'})
RETURN e.date as event_date,
type(e) AS event_type,
e.quantity as number_of_tickets,
e.price as price

Path length and hops

Here we will try and find a user has listed a ticket for the event Mary Poppins and is at least 11 hops away from any other node. In the cypher query below, we have matched a user node with a directed relationship to event node with property name Mary Poppins. Since this already accounts for the first hop, we need to match the remaining minimum 10 hops from the event node to any other user node. We can achieve this by using variable length pattern matching in cypher which allows users to specify a range of lengths in the relationship description of a pattern. Here we use a lower bound for the range followed by ellipsis to signify no upper bound. Finally we return the username.

MATCH p=(u:user)-->(event {name: 'Mary Poppins'})-[e*10..]-(:user) 
RETURN DISTINCT u.name AS name

This returns user QRG30DIY. Now let's return the path so we can visualise who this person is connected to.

We can modify the query to match the user QRG30DIY who listed the ticket for Mary Poppins event and then return all subsequent relationships and nodes connected any number of hops away from the Mary Poppins event node (using the * notation).

MATCH p=(u:user {name:'QRG30DIY'})-->(event {name: 'Mary Poppins'})-[*]-()
RETURN p

The user node QRG30DIY has been highlighted in the visual below. If we count the number of connections from this node, there are two paths which have at least 11, ending at user nodes have a single connection to the Macbeth event node.

Deleting Resources

Once you have finished with the queries and analysis, you will need to delete the Neptune DB instance and Redshift Serverless namespace (and associated workgroup). The Neptune DB instance can be deleted with or without final snapshot by following the instructions in the docs. Then delete the Redshift Serverless workgroup by following the steps here followed by the namespace.