Recently, we worked on a project that uses Neo4j to store and process large graph data for our client. The client asked for a solution to launch, install, and configure a single Neo4j node (for the development environment) and a High Availability Neo4j cluster (for the production environment). Our team selected Ansible to implement this requirement. If you want to know why we selected Ansible, check out this article for more details.
Before getting started, note that there are a couple of other options for deploying Neo4j on AWS, so you may want to review them and select the one that works best for you:
- Neo4j Community on AWS Marketplace, which makes it easy to launch instances and configure networking (VPCs) and storage.
- Neo4j Enterprise Causal Cluster, which deploys multi-node causal clusters.
This article provides a step-by-step guide on how to launch, install, and configure a high availability Neo4j cluster (aka HA cluster) using Ansible on AWS. We use AWS for demonstration, but you can customize the playbooks and configurations for other cloud vendors such as Google Cloud or Azure.
In order to use this Ansible playbook on AWS, the following is needed:
- An AWS account with a user's access key and secret key.
- An IAM policy attached to the above user that allows launching new EC2 instances and authorizing ports in security groups.
- An EC2 Key-Pair to allow SSH to EC2 instances.
- git installed on your machine
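A quick way to confirm the local prerequisites is to check that the needed tools are on your PATH. This is a minimal sketch: `missing_tools` is a hypothetical helper name, and checking for `ssh` is an assumption (the article only names git explicitly, but the EC2 key pair implies an SSH client).

```shell
#!/bin/sh
# Print every tool from the argument list that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# git is named in the prerequisites; ssh is an assumption (see lead-in).
missing="$(missing_tools git ssh)"
if [ -n "$missing" ]; then
  # Unquoted on purpose so several names are joined on one line.
  echo "Missing tools:" $missing
else
  echo "All required tools found"
fi
```

Ansible itself is not checked here because the setup script in Step 2 installs it.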
To deploy Neo4j, we are going to build the following deployment flow:
- Set up security groups and authorize port communication
- Launch EC2 instance(s) - optional
- Update OS
- Install Neo4j Enterprise on EC2 instance(s)
- Install HAProxy and configure HA cluster on EC2 instance(s) - only required for HA cluster
Before deploying, you need to create a security group that the Neo4j cluster/instance will use. You can create multiple security groups for different purposes, such as allowing SSH to an instance or allowing Neo4j nodes to communicate with each other. To simplify the process, we will use one security group that allows the following ports:
- 22 (SSH)
- Neo4j Ports listed on this page
- 8000 - HA admin port for HA cluster deployment
Log in to the AWS Management Console, create a security group, and open the ports listed above.
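The same ports can also be opened from the command line with the AWS CLI. The sketch below is not part of the original playbooks: `SG_ID` and `CIDR` are placeholders, and the Neo4j ports (7474, 7473, 7687) are the defaults for HTTP, HTTPS, and Bolt, which is an assumption about your Neo4j version. Add the cluster ports from the Neo4j ports page for an HA deployment. By default the script only prints the commands, and it runs them when `DRY_RUN=0`.

```shell
#!/bin/sh
# Placeholders: replace with your own security group id and source network.
SG_ID="${SG_ID:-sg-REPLACE_ME}"
CIDR="${CIDR:-203.0.113.0/24}"
# 22 (SSH) and 8000 (HA admin) come from the list above.
# 7474, 7473 and 7687 are assumed Neo4j defaults (HTTP, HTTPS, Bolt).
PORTS="${PORTS:-22 7474 7473 7687 8000}"

# Echo the command in dry-run mode (the default); execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Emit one ingress rule per port for the configured group and network.
open_ports() {
  for port in $PORTS; do
    run aws ec2 authorize-security-group-ingress \
      --group-id "$SG_ID" --protocol tcp --port "$port" --cidr "$CIDR"
  done
}

open_ports
```

Review the printed commands first, then rerun with `DRY_RUN=0` and real values for `SG_ID` and `CIDR` once they look right.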
A well-defined project structure makes each part of the solution easier to understand, reuse, and customize. If you have experience with Ansible, you probably already know how to organize an Ansible project. I followed the alternative approach mentioned in this article, but feel free to select your own.
The project structure contains the following:
- extension/setup: contains scripts to install Ansible and the required Python packages
- inventories/[env]: defines all variables for the playbooks, so we can customize them for each environment
- roles: predefined and reusable roles for our playbooks. In this solution, we use the following roles:
- common: the common role to install common package or update the latest OS version.
- haproxy: the role to install and configure HAProxy.
- launch-ec2: the role to launch EC2 instances in multiple AZ.
- neo4j: install and configure Neo4j on a single instance.
- templates: template files for configuring Neo4j instances as well as HAProxy config files
- There are two main playbooks:
- neo4j.single.yml: the playbook to launch and install a single Neo4j node.
- neo4j.cluster.yml: the playbook to launch and install an HA Neo4j cluster.
Step 1 - Clone/download the source code from GitHub using git
Then change into the newly created directory.
Step 2 - Install Ansible and the required Python packages
chmod +x extension/setup/setup.sh
./extension/setup/setup.sh
Step 3 - Decrypt Ansible Vault file
A vault file contains sensitive information that we should not commit to source control in plaintext, so we need to encrypt it before committing. Ansible Vault solves this problem. In this repo, we committed the password file for demo purposes; please note that you should not commit the password file to source control.
Run the command below to decrypt the vault.yml file in the inventory directory:
ansible-vault decrypt inventories/dev/group_vars/vault.yml --vault-password-file ansible-vault.pass
Step 4 - Update vault.yml file
---
# Sensitive variables here are applicable to deploy application
aws_access_key: <<your access key>>
aws_secret_key: <<your secret access key>>
# The security group id to be attached to new instance
security_group: <<your security group id>>
# An Amazon Linux image
image: <<AMI id i.e. ami-048a01c78f7bae4aa>>
# The first subnet to launch instance, it should be public subnet if you allowed public access
vpc_az1_subnet_id: <<your subnet 1 id>>
# The second subnet to launch instance, it should be public subnet if you allowed public access
vpc_az2_subnet_id: <<your subnet 2 id>>
# Set initial password for Neo instances
initial_password: <<your password>>
# HAProxy configuration (required for cluster mode with HAProxy)
stats_user: <<your HAProxy username>>
stats_pass: <<your HAProxy user password>>
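Once vault.yml holds real credentials in plaintext, it is worth checking that it gets re-encrypted before any commit. The sketch below assumes the file path and password file from Step 3. `is_vault_encrypted` is a hypothetical helper, and it relies on the fact that files encrypted by Ansible Vault begin with a `$ANSIBLE_VAULT;` header line.

```shell
#!/bin/sh
# Return 0 when the first line of the file is the Ansible Vault header.
is_vault_encrypted() {
  head -n 1 "$1" | grep -q '^\$ANSIBLE_VAULT;'
}

# Path taken from Step 3; override VAULT_FILE for other environments.
VAULT_FILE="${VAULT_FILE:-inventories/dev/group_vars/vault.yml}"

# Warn only when the file exists and is still in plaintext.
if [ -f "$VAULT_FILE" ] && ! is_vault_encrypted "$VAULT_FILE"; then
  echo "WARNING: $VAULT_FILE is in plaintext; re-encrypt it before committing:"
  echo "  ansible-vault encrypt $VAULT_FILE --vault-password-file ansible-vault.pass"
fi
```

A check like this could run from a pre-commit hook, so a decrypted vault file is caught before it reaches source control.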
Deploy a single Neo4j Node
Step 1 - Update groups variables
Review and update the variables in the inventories/dev/group_vars/neo4j-single.yml file. Below are some important variables:
- region: an AWS region to launch and deploy Neo4j
- keypair: an existing key-pair on the above region
- other variables: feel free to update according to your requirement
Step 2 - Run Ansible playbook
Run the command below to deploy a single Neo4j instance for the dev environment. Replace dev with any existing inventory in the inventories directory (e.g. staging, prod).
ansible-playbook neo4j.single.yml -e env=dev --vault-password-file ansible-vault.pass
# or we can use -b -K to enter SUDO password (sudo su)
ansible-playbook neo4j.single.yml -e env=dev --vault-password-file ansible-vault.pass -b -K
Wait until the command finishes, then open the Neo4j browser at http://public-ip:7474
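Neo4j can take a little while to start after the playbook returns, so polling the HTTP port is a convenient way to know when the browser is ready. This is a minimal sketch, not part of the playbooks: `neo4j_url` and `wait_for_neo4j` are hypothetical helpers, the retry count and delay are arbitrary defaults, and it assumes curl is installed and port 7474 is reachable from your machine.

```shell
#!/bin/sh
# Build the Neo4j browser URL for a host (port defaults to 7474).
neo4j_url() {
  printf 'http://%s:%s\n' "$1" "${2:-7474}"
}

# Poll the URL until it answers or the attempts run out.
# Arguments: host, attempts (default 30), delay in seconds (default 10).
wait_for_neo4j() {
  url="$(neo4j_url "$1")"
  attempts="${2:-30}"
  delay="${3:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fsS -o /dev/null --max-time 5 "$url"; then
      echo "Neo4j is up at $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "Neo4j did not answer at $url" >&2
  return 1
}

# Only poll when a host is supplied, e.g. NEO4J_HOST=<public-ip> sh wait.sh
if [ -n "${NEO4J_HOST:-}" ]; then
  wait_for_neo4j "$NEO4J_HOST"
fi
```

Set `NEO4J_HOST` to the public IP of the launched instance to use it.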
Deploy an HA Neo4j cluster
Execute the same steps as above, but with:
- neo4j-cluster.yml as the group variables file
- neo4j.cluster.yml as the Ansible playbook
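For the cluster case, the command mirrors the single-node one with the playbook name swapped for neo4j.cluster.yml. The sketch below uses a hypothetical helper, `deploy_cmd`, that builds the command for either mode so the two cannot drift apart. It assumes the same `env` variable and password file as the single-node run.

```shell
#!/bin/sh
# Print the ansible-playbook command for a mode (single|cluster) and environment.
deploy_cmd() {
  mode="$1"
  env_name="${2:-dev}"
  case "$mode" in
    single|cluster) ;;
    *)
      echo "usage: deploy_cmd single|cluster [env]" >&2
      return 1
      ;;
  esac
  echo "ansible-playbook neo4j.$mode.yml -e env=$env_name --vault-password-file ansible-vault.pass"
}

# Show the command for an HA cluster in the dev environment.
deploy_cmd cluster dev
```

Copy the printed command into your terminal, or pipe the output to `sh` to run it directly.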
Check out the result of each case by watching the videos on our YouTube channel.
If you run into any issues while following these instructions, feel free to let us know by leaving a comment.
My name is Hoang, I am the Co-founder and CTO of InnomizeTech. My title is CTO but I am a full-stack developer and software architect, passionate about Cloud Computing, Serverless, DevOps, Machine Learning, and IoT.
If you are looking for developers, offshore team, or need consulting about the AWS cloud, Serverless architecture, and so on, then hire us, we can help you!
Thank you for reading my article.