Sandor Dargo

Posted on Nov 11, 2017

Cypher tutorial: the MATCH keyword

#database #graphs #cypher #neo4j

This article has been originally published on my blog.

After having discussed what one can do using CREATE and slightly mentioning MATCH, let's explore what that MATCH can do for you in Cypher

Luckily on a basic level both CREATE and MATCH are quite self-explanatory. So with MATCH you'll be able to look for certain elements in your database. In short, you can match patterns. Often you will you use MATCH together with WHERE, but not necessarily as I will show you.

Let's see quickly what are the corresponding keywords in SQL. As usual, there is no exact match, but you can think about FROM and WHERE if you think SQL. But while in SQL you heavily rely on WHERE, in Cypher only MATCH can do the work for you in a lot of cases.

When we want to explore the usage of MATCH, we also have to understand another two keywords: WHERE and RETURN. Later I will dedicate an own post for each of them, now let's just quickly skim them through.

`WHERE`

This keyword cannot stand on its own. It goes with any of the following clauses: MATCH, OPTIONAL MATCH, START or WITH. This also means that it can be used in different ways. Now I'm just going to focus on its basic usage with MATCH and then in another article, we'll cover the rest.

WHERE is to add constraints to your patterns defined in MATCH. It these terms it works pretty similar to SQL. You can use use the usual boolean operators (AND, OR, XOR and NOT), you can use it for string matching, quantified comparisons, existence checks, etc. Here is a short (and really non-exhaustive) list of examples:

WHERE region.name = 'Pannon' AND (grape.name = 'Merlot' OR grape.name = 'Cabernet franc')

WHERE subregion.size > 500

WHERE grape.name STARTS WITH 'Cabernet'

WHERE grape.name IN ['Muscat ottonel', 'Cabernet sauvignon']

These are only the simplest expressions, more advanced examples will be part of the post dedicated to the WHERE clause.

`RETURN`

It is like SELECT in SQL. But while in SQL you start with SELECT, in Cypher you end with RETURN. You define what you want to return from your query. In its simplest form, you can return everything: RETURN *. Most of the times, you will return nodes and/or relationships:

MATCH (wineRegion:WineRegion)
RETURN wineRegion

MATCH ()-[relationship:GROWS_AT]->()
RETURN relationship

MATCH (grape:Grape)-[relationship:GROWS_AT]->(subregion:SubRegion)
RETURN grape, relationship, subregion

As you can see if you want to return multiple elements, it's really easy, you can just list the elements separated by commas.

If you don't want to return the whole node, just a single property, it's also really easy, you specify the name of the property after the node/relationship name separated by a dot:

MATCH (wineRegion:WineRegion)
RETURN wineRegion.name

That's enough for now. Let's talk about the MATCH keyword.

Get all nodes

Let's start it easy. Let's return all nodes.

MATCH (n)
RETURN n

It is that simple. Just match every node. You might expect to get a bunch of lonely nodes on your console, but in that case, you're mistaken. All the relationships are retrieved too. I have to be more precise with this latter statement. When you retrieve nodes all the direct relationships will be retrieved which are between those nodes.

You'll see quickly what I mean. But first, have a look at a sub-graph that we will heavily use today. You can see all the wine subregions contained in a region called Eger and all the grapes which grow in these subregions.

Get nodes by name

Let's assume that there are differently labeled nodes with the same name. Still, for some reason, it wouldn't make sense the tag the same node with two different labels. There is such a case in our example graph related to wine regions.

Eger which is a Hungarian city is also a name of a wine region and a wine subregion. If you want to get all the nodes with the same name without declaring the labels you are interested in, you must use the WHERE clause as you can see below. Putting it differently, you cannot write something like this: MATCH (n:{name: 'Eger'}). If you specify an attribute in the MATCH clause you must specify the label too.

MATCH (n)
 WHERE n.name = "Eger"
 RETURN n

Get nodes by label and name

You can can get the same result by executing this query:

MATCH (region:WineRegion{name:'Eger'}), (subregion:WineSubRegion{name:'Eger'})
RETURN region, subregion

In both cases, you could see that when you return these nodes, which are in a direct relationship with each other, the relationships are also shown on the graph returned on the Neo4j console. However, if you check other views, you can see that they are not returned, just shown as a sign of courtesy.

Now if you query two nodes which are in an indirect relationship with each other, like a grape and the region (there is a subregion in between), even on the graph view you won't see any relationship between them, you'll only see two lonely nodes.

MATCH (grape:Grape {name:"Cabernet franc"}), (region:WineRegion {name:"Eger"})
RETURN grape, region

Relationships with undefined directions

If you want to retrieve any direct relationship between some nodes, you can do it this way. You can see that there is no arrow pointing in any direction. Not even the relationship type is defined. We just named it as rel and returned it.

MATCH (region:WineRegion{name:'Eger'})-[rel]-(subregion:WineSubRegion)
RETURN region, subregion, rel

Relationships with defined directions

Now we write our query with the relationship pointing at the wine region. Executing the query one can see that there are no nodes or relationships retrieved.

MATCH (region:WineRegion{name:'Eger'})<-[rel]-(subregion:WineSubRegion)
RETURN region, subregion, rel

Let's turn the direction of that relationship! Now you can see that we're getting the same results as before with the undirected relationships. In fact, the relationships in your graph are always directed, but the Cypher engine will look for both directions.

MATCH (region:WineRegion{name:'Eger'})-[rel]->(subregion:WineSubRegion)
RETURN region, subregion, rel

You can even specify the type of the relationship if you want to be more specific:

MATCH (region:WineRegion{name:'Eger'})-[rel:CONTAINS]->(subregion:WineSubRegion)
RETURN region, subregion, rel

Or you can even specify multiple relationship types to match:

MATCH (region:WineRegion{name:'Eger'})-[rel:CONTAINS|GROWS_AT]->(subregion:WineSubRegion)
RETURN region, subregion, rel

Getting the type of the relationship

Let's say you don't know exactly how some nodes are connected. As we saw in your query you don't have to indicate the relationship type. You can just get every relationship irrespective of their type or direction. You can retrieve their type instead of the relationships themselves. Or both! Just use the type() function.

MATCH (region:WineRegion{name:'Eger'})-[rel:CONTAINS]->(subregion:WineSubRegion)
RETURN region, subregion, rel, type(rel)

You can only use type() with relationships.

Get nodes or relationships by id

If you are building and executing your queries from an application, it's quite probable that you would identify your nodes and/or relationships with their technical IDs. No problem, you can easily retrieve them as such, using their id by using the id() function. You can easily match one single or a list of ids. It's very intuitive as you can see.

MATCH (n)
WHERE id(n) IN [0, 3, 5]
RETURN n

MATCH ()-[r]->()
WHERE id(r)= 0
RETURN r

We are far from finishing exploring the complete list of possibilities that MATCH provides, but that's it for today. I keep some topics about paths to an advanced article. Stay tuned!

DEV Community