Recently, I started trying to learn Elasticsearch once again. I say "once again" because I have wanted to learn it since late 2016, and in the time since, I tried several times and failed every time. And just like every other time, I am motivated this time as well 😉
Motive
To learn Elasticsearch, you need a lot of data to run queries against. I searched a few places for a usable dump but couldn't find one I could go with, and I wasn't familiar with the kinds of data in the dumps I did find online. So, I decided to make a generator of my own. I had already used Artisan Console and fzaninotto/Faker, so I built a generator that anyone can run from their terminal to produce a dump the way they wish.
The repository
This is the repository that you can use to generate the dump.
ssi-anik/elasticsearch-sample-data-generator
Sample data generator and writes in file to upload to Elasticsearch for bulk upload
elasticsearch-sample-data-generator
The purpose of the project is to generate a dump for Elasticsearch Bulk API.
Requirements
- Either your local machine should have composer or docker installed to get it working. And the local PHP version should be >=7.3 and <8.0
Installation
- Clone the repository.
- If you have composer installed on your local machine and satisfies the requirement, then run composer install to install the project dependencies.
- If you don't know php or the local php requirement is not satisfied on your machine, then uncomment the COPY . /app and RUN composer install lines in Dockerfile. So, they'll look like the following.
# It'll copy the project in the PHP container.
COPY . /app
# It'll install the project dependencies.
RUN composer install
- Run cp docker-compose.yml.example docker-compose.yml.
- Make changes in your docker-compose.yml file. If you don't need the elasticsearch & kibana, remove those services.
- If you made the…
Installation [without docker]
- Clone the repository.
- If your machine has PHP version >=7.3 and <8.0 and composer installed, then just run composer install from the root of the repository. It'll install the project dependencies.
That's all.
Installation [with docker]
- Clone the repository.
- Uncomment the line COPY . /app in the Dockerfile.
- Uncomment the line RUN composer install in the Dockerfile.
- Copy the docker-compose.yml.example to docker-compose.yml.
- Comment out the line .:/app in your docker-compose.yml's services.php.volumes.
- Uncomment the line ./dumps:/app/dumps in your docker-compose.yml's services.php.volumes.
- If you don't need elasticsearch and kibana services, then just delete them.
- Run docker-compose up -d --build to run your containers.
- To exec into the PHP service, run docker-compose exec php bash.
That's all for the docker-based installation. If you're comfortable with Docker, you can tweak the setup further by going through the Dockerfile and the docker-compose.yml.
Usage
The repository contains one executable, elasticsearch-dump, in its root. We'll use it to run commands and generate dumps.
./elasticsearch-dump generate is the base command. Let's have a look at the available arguments and options.
./elasticsearch-dump generate --help
Description:
Generate dump for elasticsearch bulk API upload
Usage:
generate [options] [--] <fields>
Arguments:
fields Enter the fields definition (required)
Options:
--file[=FILE] Enter the file name [default: "dumps/dump.json"]
--entries[=ENTRIES] Enter the number of entries [default: "1"]
--action[=ACTION] Enter the action name [index or create] [default: "index"]
--index[=INDEX] Enter the index name [default: "my-index"]
--id[=ID] Enter the sequence start value [default: "1"]
--append Append to existing file
--force Does not ask for confirmation
--uuid UUID based ID generation
Options
Before we check the required argument, let's explore the options first. A few options expect values and a few are boolean flags. All of them are optional; pass them to override the default values.
- --file - Default is dumps/dump.json. The file where you want to save the dump. You can pass a relative or absolute path. If the path starts with /, it is used as an absolute path. Otherwise, only the file name is considered and the dump is always written in the dumps directory.
- --entries - Default is 1. The number of entries you want to generate.
- --action - Default is index. The type of the action. It can be either index or create.
- --index - Default is my-index. The name of the index where you'll put these values.
- --id - Default is 1. The start position of the sequence. It can only generate a numeric sequence.
- --append - A boolean flag. If present, the output is appended to the existing file. If the file doesn't exist, it is created and the contents are written to it.
- --force - A boolean flag. By default, the command will ask you for confirmation. By providing this flag, you can bypass the confirmation.
- --uuid - A boolean flag. If passed, --id is ignored and UUID-based IDs are generated instead.
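The --file rule is the one most likely to surprise, so here is a small shell sketch of it. The function name resolve_dump_path is hypothetical, and the logic only mirrors the description above; it is not taken from the tool's source.

```shell
# Hypothetical helper mirroring the --file rule described above
# (assumption: the tool itself may implement this differently).
resolve_dump_path() {
  case "$1" in
    /*) printf '%s\n' "$1" ;;                      # absolute path: used as given
    *)  printf 'dumps/%s\n' "$(basename "$1")" ;;  # otherwise: file name only, inside dumps/
  esac
}

resolve_dump_path "/tmp/users.json"     # prints /tmp/users.json
resolve_dump_path "nested/users.json"   # prints dumps/users.json
```

In other words, a relative directory such as nested/ is dropped, so pass an absolute path if the dump has to live outside the dumps directory.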
Arguments
The command generates data using PHP's Faker library. We have to pass the fields that we want to fill with fake data.
Suppose we want to generate name and address fields. When you pass more than one field, use the pipe | to separate them. So, the command looks like the following.
Example:
./elasticsearch-dump generate --entries 10 "name|address"
Here, both the name and address fields are resolved to Faker's name and address properties. If the key in the generated object should differ from the Faker property, use a colon : to separate the two. So, if we want firstName in our name field and streetAddress in our address field, we can simply use the following.
Example:
./elasticsearch-dump generate --entries 10 \
"name:firstName|address:streetAddress"
# Generates
# {"name":"Roosevelt","address":"45647 Judy Isle"}
Here, the object's name key holds a value from Faker's firstName property, and the address key holds a value from streetAddress.
If the Faker formatter you need takes arguments, you can write it as a method call.
Example:
./elasticsearch-dump generate --entries 10 \
"name:firstName|id:numerify('ID-####')|amount:numberBetween(1000, 9000)"
# Generates
# {"name":"Lourdes","id":"ID-4912","amount":1004}
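To make the field syntax concrete, the following shell sketch splits a definition on | and then on the first :. It is only an illustration of the notation; parse_fields is a hypothetical name, and the real command does its parsing in PHP, possibly in a different way.

```shell
# Illustrative only: splits a fields definition the way the notation reads.
# parse_fields is a hypothetical helper, not part of the tool.
parse_fields() {
  old_ifs=$IFS
  IFS='|'   # fields are separated by the pipe
  set -f    # do not expand glob characters inside the definition
  for field in $1; do
    key=${field%%:*}        # text before the first colon
    formatter=${field#*:}   # text after the first colon (whole field if no colon)
    printf '%s -> %s\n' "$key" "$formatter"
  done
  set +f
  IFS=$old_ifs
}

parse_fields "name:firstName|id:numerify('ID-####')|amount:numberBetween(1000, 9000)"
# name -> firstName
# id -> numerify('ID-####')
# amount -> numberBetween(1000, 9000)
```

A field without a colon, such as name, yields the same text for both the key and the Faker property, which matches the first example.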
Object nesting
When passing your fields to the command's argument, you can nest objects using the dot notation.
Example:
./elasticsearch-dump generate --entries 10 \
"student.name:firstName|student.age:numberBetween(20, 27)|id:numerify('ID-####')"
# Generates
# {"student":{"name":"Chandler","age":20},"id":"ID-4386"}
Check the JSON. The student object contains the name and age within it. The id field is outside the student object.
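The way a dotted key expands can be sketched in shell as well. The nest_key helper below is hypothetical and handles a single key only; the real tool also merges sibling keys such as student.name and student.age into one object, which this sketch does not attempt.

```shell
# Hypothetical helper: wraps a JSON value in one object per dot-separated segment.
nest_key() {
  key=$1
  json=$2   # the value must already be valid JSON, e.g. '"Chandler"' or 20
  while :; do
    case $key in
      *.*)
        last=${key##*.}          # innermost remaining segment
        key=${key%.*}            # everything before the last dot
        json="{\"$last\":$json}"
        ;;
      *)
        json="{\"$key\":$json}"
        break
        ;;
    esac
  done
  printf '%s\n' "$json"
}

nest_key "student.name" '"Chandler"'   # prints {"student":{"name":"Chandler"}}
nest_key "id" '"ID-4386"'              # prints {"id":"ID-4386"}
```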
Extending the faker functionality
If Faker doesn't provide the type of data you want, you can extend it by providing an array of values in the project's config/source.php file. The file already contains designation as an example. You can call the custom provider using the custom('key') format.
Example:
./elasticsearch-dump generate "name|designation:custom('designation')"
# Generates
# {"name":"Annabelle Balistreri","designation":"HR Managers"}
So, in custom('designation'), designation is the key in the config/source.php file.
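Once a dump exists, it can be sent to Elasticsearch's _bulk endpoint. The sketch below uses a hand-written sample in the Bulk API's newline-delimited shape, an action line followed by a document line, on the assumption that the generated file follows the same layout; the exact output of the tool may differ. The host, port, and lack of authentication in the curl command are assumptions too.

```shell
# Hand-written sample in the Bulk API's NDJSON shape: an action line, then the
# document line. Assumption: the generated dump follows the same layout.
dump="$(mktemp)"
cat > "$dump" <<'EOF'
{"index":{"_index":"my-index","_id":"1"}}
{"name":"Roosevelt","address":"45647 Judy Isle"}
{"index":{"_index":"my-index","_id":"2"}}
{"name":"Lourdes","id":"ID-4912","amount":1004}
EOF

# Every entry takes two lines, so the entry count is half the line count.
lines=$(wc -l < "$dump" | tr -d ' ')
entries=$((lines / 2))
echo "$entries entries in $dump"

# Upload to a local node. Assumes Elasticsearch listens on localhost:9200
# with security disabled; uncomment to run. For a real dump, point
# --data-binary at the generated file, e.g. @dumps/dump.json.
# curl -s -H 'Content-Type: application/x-ndjson' \
#   -X POST 'http://localhost:9200/_bulk' --data-binary "@$dump"

rm -f "$dump"
```

The --data-binary flag matters here: it sends the file untouched, whereas curl's -d would strip the newlines the Bulk API depends on.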
Hope this helps you to generate lots of data.
Happy coding. ❤️