DEV Community

B.


Intro to Neo4r, the movie

Have you ever used Neo4j?

If you haven't, here's a little introduction:

Neo4j is the world’s leading graph database. Its architecture is designed for optimal management, storage and traversal of nodes and relationships. The database takes a property graph approach which is beneficial for both traversal performance and operations runtime. Neo4j offers dedicated memory management as well as memory efficient operations. (...) Cypher is a declarative query language for graphs.
Introduction to Neo4j

So what is this post about? I'm not going to cover the basics of Neo4j or Cypher. Instead, I'll introduce a driver to connect to Neo4j and analyse its data from R: Neo4r. I am going to contribute to the package, so I thought a few posts introducing the subject would be a good idea.

If you already know Neo4j, you may know the movies example, which we are going to extend in this post.

Since RNeo4j, there hasn't been any other package to connect to Neo4j from R. I loved RNeo4j: it gave you a great starting point for working with the database through a high-level interface that required minimal Cypher (not really an API, but, for example, you could create nodes with a function call instead of writing a query), as well as the classical way of working with Neo4j.

Neo4r provides a driver in the most classical sense: you read and write just as you would with any other client. The advantage is that you work with your data directly in R.

So far, the package has mainly been written by Colin Fay and Sébastien Rochette.

If you already have the movie data loaded, let's get to it.

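library(neo4r)
library(magrittr)  # provides the %>% pipe used throughout the post
# params$user / params$pass are R Markdown parameters; use your own credentials
con <- neo4j_api$new(url = "http://localhost:7474",
                     user = params$user, password = params$pass)
con$ping()  # HTTP 200 means the server answered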

[1] 200

con$get_labels()

# A tibble: 2 x 1
  labels
  <chr> 
1 Movie 
2 Person

con$get_relationships()

# A tibble: 6 x 1
  labels  
  <chr>   
1 ACTED_IN
2 DIRECTED
3 PRODUCED
4 WROTE   
5 FOLLOWS 
6 REVIEWED

get_labels() and get_relationships() are especially useful; I can never remember them all.

From time to time I get obsessed with The Godfather movies, and I went full obsession while writing this tutorial. Let's see what information about The Godfather is already in the database:

"MATCH (m:Movie {title: 'The Godfather'}) RETURN m" %>%
  call_neo4j(con)

list()
attr(,"class")
[1] "neo"  "neo"  "list"

None 🙀. We need to remedy that:

send_cypher('../data/thegodfather.cypher', con)

[[1]]
# A tibble: 12 x 2
   type                  value
   <chr>                 <dbl>
 1 contains_updates          1
 2 nodes_created            15
 3 nodes_deleted             0
 4 properties_set           48
 5 relationships_created    28
 6 relationship_deleted      0
 7 labels_added             15
 8 labels_removed            0
 9 indexes_added             0
10 indexes_removed           0
11 constraints_added         0
12 constraints_removed       0

What is in the file I passed to send_cypher? About 50 lines of Cypher that add not only the movies but also the main actors, crew and relationships. You can see all of it here.
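The empty result printed earlier is just a zero-length list with class `neo`, so a length check is enough to tell whether The Godfather is already in the database before running the file. Here is a minimal sketch: `has_results()` is a hypothetical helper, not part of Neo4r, and the live lines are commented out because they need the `con` connection from above. It also assumes that a non-empty `call_neo4j()` result is a list with at least one element.

```r
# Hypothetical helper: TRUE when a call_neo4j() result contains anything.
# An empty match comes back as a zero-length list (see the output above).
has_results <- function(res) length(res) > 0

# Offline check against an object shaped like the empty result printed above
empty_res <- structure(list(), class = c("neo", "neo", "list"))
has_results(empty_res)  # FALSE

# With a live connection, the check could guard the data load:
# res <- call_neo4j("MATCH (m:Movie {title: 'The Godfather'}) RETURN m", con)
# if (!has_results(res)) send_cypher('../data/thegodfather.cypher', con)
```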

Now that we have some data, we can work with it. A cool thing to look at is the relationships between the characters and the movies. That would be something like:

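library(magrittr)   # %>% and the %<>% assignment pipe
library(dplyr)
library(visNetwork)

# Every ACTED_IN path whose roles list contains a role matching '.*Corleone.*',
# returned in graph format (nodes + relationships)
"WITH '.*Corleone.*' as param MATCH a=((p:Person)-[r:ACTED_IN]->(:Movie)) WHERE ANY(item IN r.roles WHERE item =~ param) RETURN a" %>%
  call_neo4j(con, type = 'graph') -> corleones

# One row per node, with its properties spread into columns
nodes <- corleones$nodes %>%
  unnest_nodes(what = 'properties')

# Edge list in the from/to/label shape visNetwork expects
rels <- corleones$relationships %>%
  unnest_relationships() %>%
  select(from = startNode, to = endNode, label = type)

# Attach each actor's role (a relationship property) to their node, label
# Movie nodes with their title and Person nodes with the role, and keep
# one row per node
nodes %<>%
  left_join(corleones$relationships %>%
              select(startNode, properties) %>%
              tidyr::unnest(properties) %>%
              tidyr::unnest(properties) %>%
              tidyr::unnest(properties), by = c('id' = 'startNode')) %>%
  mutate(label = if_else(label == 'Movie',
                         title, properties)) %>%
  distinct(id, .keep_all = TRUE)

# Interactive graph with directed edges and a node selector
visNetwork(nodes, rels) %>%
  visEdges(arrows = 'to') %>%
  visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE)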

We used visNetwork to visualize the graph, and the result is simple but effective:

Corleone graph
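The `WHERE ANY(...)` clause is what narrows the actors down to the Corleones. The sketch below emulates that predicate in base R for a single relationship's `roles` list, to show why the pattern is written as `'.*Corleone.*'`: Cypher's `=~` only succeeds when the regular expression matches the whole string, whereas `grepl()` matches substrings, so the helper anchors the pattern. `any_role_matches()` and the role names are illustrative assumptions, not data read from the database.

```r
# Hypothetical helper emulating ANY(item IN roles WHERE item =~ param).
# Cypher's =~ needs a full-string match, so the pattern is wrapped in ^(...)$
# before being passed to grepl(), which would otherwise match substrings.
any_role_matches <- function(roles, param) {
  any(grepl(paste0("^(", param, ")$"), roles))
}

any_role_matches(c("Don Vito Corleone"), ".*Corleone.*")  # TRUE
any_role_matches(c("Tom Hagen"), ".*Corleone.*")          # FALSE
any_role_matches(c("Don Vito Corleone"), "Corleone")      # FALSE: not a full match
any_role_matches(character(0), ".*Corleone.*")            # FALSE: ANY over an empty list
```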

Why all the unnesting? Neo4j returns JSON, which arrives in R as nested lists, so graph results have to be flattened before they can be plotted. But there are many more possibilities! You get a data frame directly if you ask for other kinds of information, like relationships or counts.
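To make the unnesting concrete, here is a base-R sketch of what flattening nested graph results involves. The `nodes_json` structure is a hypothetical, simplified stand-in for parsed JSON (it is not Neo4r's exact internal format), and `flatten_node()` and `bind_fill()` are illustrative helpers; in practice `unnest_nodes()` and `unnest_relationships()` do this work for you.

```r
# Hypothetical, simplified stand-in for graph JSON parsed into R lists:
# each node has an id, a list of labels and a nested list of properties.
nodes_json <- list(
  list(id = "1", labels = list("Person"),
       properties = list(name = "Marlon Brando", born = 1924)),
  list(id = "2", labels = list("Movie"),
       properties = list(title = "The Godfather", released = 1972))
)

# Flatten one node into a one-row data frame: id, first label, and one
# column per property.
flatten_node <- function(node) {
  data.frame(id = node$id,
             label = node$labels[[1]],
             node$properties,
             stringsAsFactors = FALSE)
}

# Row-bind data frames whose columns differ, filling absent columns with NA.
bind_fill <- function(dfs) {
  cols <- unique(unlist(lapply(dfs, names)))
  filled <- lapply(dfs, function(df) {
    for (m in setdiff(cols, names(df))) df[[m]] <- NA
    df[cols]
  })
  do.call(rbind, filled)
}

flat <- bind_fill(lapply(nodes_json, flatten_node))
flat
```

Filling missing columns with `NA` is what lets Person and Movie nodes share one table even though they carry different properties.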

Of course, this was just an appetizer; let's keep improving the package! If you would like me to explain something specific about Neo4j or Neo4r, please reach out!

You can see the full example and the interactive graph in the GitHub repository neo4j-examples/movies-rstats-neo4r (Example Project for R using neo4r).
