8P Inc.

Posted on Jul 16, 2019

Performance Testing Elasticsearch with "Rally"

#elasticsearch #elk #rally #performancetesting

Better late than never..

If you've read my earlier posts, you may remember that I've been doing performance testing. Today I'll share which tool I used!!!

That tool is... "Rally". Some of you might ask, "Why not JMeter?"

The answer is simple... "I don't know how to use JMeter."

Enough of an intro. Let's start with why I chose Rally.

Why I chose Rally

It's made by Elastic itself
No need to hunt down a dataset and load it yourself
Easy to install
Results are stored in Elasticsearch, so you can plot graphs in Kibana right away

This is how Rally works:

Source: Daniel Mitterdorfer's Presentation

Provisions Elasticsearch and creates the indices
Downloads the track (the dataset used for performance testing)
Fires load at the cluster under test
When the load run finishes, collects the results and statistics and reports them back to us
If we choose to store results, Rally writes those results and statistics to an Elasticsearch cluster of our choosing

Now that we understand how it works, let's get our hands dirty.

Environment

OS: CentOS7 x86_64
Requirements: Python 3.5+ including pip3, git 1.9+
Applications:
 - Elasticsearch 7.2.0
 - Rally 1.2.1
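Before starting, a small helper like the one below can confirm the requirements above on the target machine. This is a minimal sketch. It assumes GNU `sort -V` for version comparison, which CentOS 7's coreutils provides. The names `version_ge` and `check_prereqs` are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: check the prerequisites listed above (Python 3.5+, pip3, git 1.9+).
# Hypothetical helpers; version_ge relies on GNU "sort -V".

# Succeeds if version $1 is greater than or equal to version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

check_prereqs() {
  local rc=0
  if command -v python3 >/dev/null 2>&1 &&
     python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 5) else 1)'; then
    echo "python3: OK"
  else
    echo "python3: missing or older than 3.5"; rc=1
  fi
  if command -v pip3 >/dev/null 2>&1; then
    echo "pip3: OK"
  else
    echo "pip3: missing"; rc=1
  fi
  if command -v git >/dev/null 2>&1; then
    local v
    v="$(git --version | awk '{print $3}')"   # "git version 2.22.0" -> "2.22.0"
    if version_ge "$v" 1.9; then echo "git $v: OK"; else echo "git $v: older than 1.9"; rc=1; fi
  else
    echo "git: missing"; rc=1
  fi
  return "$rc"
}
```

Source the file, then run `check_prereqs`. A non-zero return means at least one prerequisite is missing. On a stock CentOS 7 box, expect git to be reported as too old.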

Setup

1. Rally needs git 1.9 or later, but the CentOS repo only ships 1.8, so we have to get git from another source. Here we'll use the WANdisco Git repo.

# cat > /etc/yum.repos.d/wandisco-git.repo <<EOF
[wandisco-git]
name=Wandisco GIT Repository
baseurl=http://opensource.wandisco.com/centos/7/git/\$basearch/
enabled=1
gpgcheck=1
gpgkey=http://opensource.wandisco.com/RPM-GPG-KEY-WANdisco
EOF
# rpm --import http://opensource.wandisco.com/RPM-GPG-KEY-WANdisco
# yum install gcc python36-pip python36-devel git
# git --version
git version 2.22.0

2. Upgrade pip to the latest version

# pip3 install --upgrade pip

3. Exit the root user, update PATH, then install esrally with pip

# exit
$ echo 'export PATH=/usr/local/bin:$PATH' >> ~/.bashrc
$ source ~/.bashrc
$ pip3 install esrally
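The `echo ... >> ~/.bashrc` in step 3 appends the line every time it is run. Repeating the setup therefore stacks duplicate `export` lines. A minimal idempotent variant could look like this. `append_once` is a hypothetical helper name for this example.

```shell
#!/usr/bin/env bash
# Sketch: append a line to a file only if that exact line is not already there.
append_once() {
  local file="$1" line="$2"
  touch "$file"                               # avoid a grep error if the file is missing
  # -q quiet, -x match the whole line, -F treat the line as a fixed string
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

Usage, with the same line as step 3 (single quotes keep `$PATH` unexpanded until the shell reads `.bashrc`): `append_once "$HOME/.bashrc" 'export PATH=/usr/local/bin:$PATH'`, then `source ~/.bashrc`.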

4. Initialize Rally's configuration by running the esrally command

$ esrally
    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/

Running simple configuration. Run the advanced configuration with:

  esrally configure --advanced-config

* Setting up benchmark root directory in /home/sawitmee/.rally/benchmarks
* Setting up benchmark source directory in /home/sawitmee/.rally/benchmarks/src/elasticsearch

Configuration successfully written to /home/sawitmee/.rally/rally.ini. Happy benchmarking!

More info about Rally:

* Type esrally --help
* Read the documentation at https://esrally.readthedocs.io/en/1.2.1/
* Ask a question on the forum at https://discuss.elastic.co/c/elasticsearch/rally

5. Delete the "nested" track, because it caused errors in my setup

$ rm -rf ${HOME}/.rally/benchmarks/tracks/default/nested
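Judging from the warning Rally prints in the next step, `~/.rally/benchmarks/tracks/default` is a git checkout. That means the deleted track can be brought back with plain git if you need it later, for example to let Rally update its tracks again. Here is a sketch with a hypothetical helper, `restore_track`.

```shell
#!/usr/bin/env bash
# Sketch: restore a track directory deleted from Rally's local git checkout
# of the default tracks (hypothetical helper name).
restore_track() {
  local repo="$1"   # e.g. "$HOME/.rally/benchmarks/tracks/default"
  local track="$2"  # e.g. "nested"
  # Discard the unstaged deletion by checking the directory out again.
  git -C "$repo" checkout -- "$track"
}
```

Usage: `restore_track "$HOME/.rally/benchmarks/tracks/default" nested`.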

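6. List the tracks to see which datasets are available. I picked http_logs because I intend to use the cluster to store access logs from web servers. The "cannot rebase: You have unstaged changes" warning at the top of the output is a side effect of step 5. The default tracks directory is a git checkout, and deleting nested leaves an uncommitted change, so Rally cannot update the tracks from the remote.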

$ esrally list tracks
    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/

error: cannot rebase: You have unstaged changes.
error: Please commit or stash them.
[WARNING] Local changes in [/root/.rally/benchmarks/tracks/default] prevent tracks update from remote. Please commit your changes.
Available tracks:

Name           Description                                                                                                                                                                        Documents    Compressed Size    Uncompress
-------------  ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  -----------  -----------------  ----------
eventdata      This benchmark indexes HTTP access logs generated based sample logs from the elastic.co website using the generator available in https://github.com/elastic/rally-eventdata-track  20,000,000   755.1 MB           15.3 GB   
geonames       POIs from Geonames                                                                                                                                                                 11,396,505   252.4 MB           3.3 GB    
geopoint       Point coordinates from PlanetOSM                                                                                                                                                   60,844,404   481.9 MB           2.3 GB    
geopointshape  Point coordinates from PlanetOSM indexed as geoshapes                                                                                                                              60,844,404   470.5 MB           2.6 GB    
geoshape       Shapes from PlanetOSM                                                                                                                                                              60,523,283   13.4 GB            45.4 GB   
http_logs      HTTP server log data                                                                                                                                                               247,249,096  1.2 GB             31.1 GB   
metricbeat     Metricbeat data                                                                                                                                                                    1,079,600    87.6 MB            1.2 GB    
noaa           Global daily weather measurements from NOAA                                                                                                                                        33,659,481   947.3 MB           9.0 GB    
nyc_taxis      Taxi rides in New York in 2015                                                                                                                                                     165,346,692  4.5 GB             74.3 GB   
percolator     Percolator benchmark based on AOL queries                                                                                                                                          2,000,000    102.7 kB           104.9 MB  
pmc            Full text benchmark with academic papers from PMC                                                                                                                                  574,199      5.5 GB             21.7 GB   
so             Indexing benchmark using up to questions and answers from StackOverflow                                                                                                            36,062,278   8.9 GB             33.1 GB   

-------------------------------
[INFO] SUCCESS (took 2 seconds)
-------------------------------

Let's Run the Performance Test

1. Run the performance test

$ esrally --track=http_logs --target-hosts=192.168.1.100:9200 --pipeline=benchmark-only

Note:

To keep things simple, I use the --pipeline=benchmark-only option. With it, Rally does not provision or manage Elasticsearch itself; it only generates load against a cluster that is already running.
Set --target-hosts=<IP>:<Port> to the address of your cluster's coordinating node (or whichever node receives client traffic).
The first run takes a very long time, because Rally must download the dataset and keep a local copy before it starts the load test.
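For repeat runs, it can help to keep the options in one place. The sketch below does three things:

- collects the flags used above;
- adds an optional quick `--test-mode` run, which runs the same track on a much smaller slice of the data and is handy for catching mistakes before the multi-hour benchmark;
- saves the final report as CSV with `--report-format` and `--report-file`.

`TARGET_HOST`, `TRACK`, `RUN_RALLY` and the report path are placeholders for this example. Please double-check the flags against `esrally --help` for your Rally version.

```shell
#!/usr/bin/env bash
# Sketch: keep the esrally options from this post in one place.
# TARGET_HOST, TRACK and RUN_RALLY are placeholder environment variables.
TARGET_HOST="${TARGET_HOST:-192.168.1.100:9200}"   # coordinating node of the cluster under test
TRACK="${TRACK:-http_logs}"

rally_args=(
  --track="$TRACK"
  --target-hosts="$TARGET_HOST"
  --pipeline=benchmark-only    # benchmark an existing cluster; Rally does not provision one
)

# Smoke run: same options plus test mode (a much smaller slice of the data).
smoke_args=("${rally_args[@]}" --test-mode)

# Only call Rally when it is installed and RUN_RALLY=1 is set explicitly.
if [ "${RUN_RALLY:-0}" = "1" ] && command -v esrally >/dev/null 2>&1; then
  esrally "${smoke_args[@]}" &&
    esrally "${rally_args[@]}" --report-format=csv --report-file="$HOME/rally-${TRACK}.csv"
fi
```

The `&&` means the full benchmark only starts if the smoke run succeeded.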

Now, wait.......

Finally!!!! The results are in. Let's see what they look like.

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------

|   Lap |                                                         Metric |                Task |       Value |    Unit |
|------:|---------------------------------------------------------------:|--------------------:|------------:|--------:|
|   All |                     Cumulative indexing time of primary shards |                     |      208.92 |     min |
|   All |             Min cumulative indexing time across primary shards |                     |  0.00228333 |     min |
|   All |          Median cumulative indexing time across primary shards |                     |     1.74452 |     min |
|   All |             Max cumulative indexing time across primary shards |                     |     33.5324 |     min |
|   All |            Cumulative indexing throttle time of primary shards |                     |           0 |     min |
|   All |    Min cumulative indexing throttle time across primary shards |                     |           0 |     min |
|   All | Median cumulative indexing throttle time across primary shards |                     |           0 |     min |
|   All |    Max cumulative indexing throttle time across primary shards |                     |           0 |     min |
|   All |                        Cumulative merge time of primary shards |                     |     22.1641 |     min |
|   All |                       Cumulative merge count of primary shards |                     |          58 |         |
|   All |                Min cumulative merge time across primary shards |                     |           0 |     min |
|   All |             Median cumulative merge time across primary shards |                     |           0 |     min |
|   All |                Max cumulative merge time across primary shards |                     |     5.03668 |     min |
|   All |               Cumulative merge throttle time of primary shards |                     |       7.913 |     min |
|   All |       Min cumulative merge throttle time across primary shards |                     |           0 |     min |
|   All |    Median cumulative merge throttle time across primary shards |                     |           0 |     min |
|   All |       Max cumulative merge throttle time across primary shards |                     |       1.901 |     min |
|   All |                      Cumulative refresh time of primary shards |                     |     5.78282 |     min |
|   All |                     Cumulative refresh count of primary shards |                     |         548 |         |
|   All |              Min cumulative refresh time across primary shards |                     | 8.33333e-05 |     min |
|   All |           Median cumulative refresh time across primary shards |                     |    0.167417 |     min |
|   All |              Max cumulative refresh time across primary shards |                     |    0.353017 |     min |
|   All |                        Cumulative flush time of primary shards |                     |     13.9207 |     min |
|   All |                       Cumulative flush count of primary shards |                     |         131 |         |
|   All |                Min cumulative flush time across primary shards |                     | 0.000983333 |     min |
|   All |             Median cumulative flush time across primary shards |                     |   0.0626167 |     min |
|   All |                Max cumulative flush time across primary shards |                     |     2.21502 |     min |
|   All |                                             Total Young Gen GC |                     |     394.982 |       s |
|   All |                                               Total Old Gen GC |                     |       0.813 |       s |
|   All |                                                     Store size |                     |     20.7492 |      GB |
|   All |                                                  Translog size |                     |     14.6162 |      GB |
|   All |                                         Heap used for segments |                     |     88.7109 |      MB |
|   All |                                       Heap used for doc values |                     |   0.0732574 |      MB |
|   All |                                            Heap used for terms |                     |      75.681 |      MB |
|   All |                                            Heap used for norms |                     |   0.0206299 |      MB |
|   All |                                           Heap used for points |                     |     5.78467 |      MB |
|   All |                                    Heap used for stored fields |                     |     7.15138 |      MB |
|   All |                                                  Segment count |                     |         334 |         |
|   All |                                                 Min Throughput |        index-append |     90276.5 |  docs/s |
|   All |                                              Median Throughput |        index-append |      101049 |  docs/s |
|   All |                                                 Max Throughput |        index-append |      108923 |  docs/s |
|   All |                                        50th percentile latency |        index-append |     278.056 |      ms |
|   All |                                        90th percentile latency |        index-append |     747.217 |      ms |
|   All |                                        99th percentile latency |        index-append |     2807.96 |      ms |
|   All |                                      99.9th percentile latency |        index-append |     7787.64 |      ms |
|   All |                                     99.99th percentile latency |        index-append |     11270.2 |      ms |
|   All |                                       100th percentile latency |        index-append |     14497.8 |      ms |
|   All |                                   50th percentile service time |        index-append |     278.056 |      ms |
|   All |                                   90th percentile service time |        index-append |     747.217 |      ms |
|   All |                                   99th percentile service time |        index-append |     2807.96 |      ms |
|   All |                                 99.9th percentile service time |        index-append |     7787.64 |      ms |
|   All |                                99.99th percentile service time |        index-append |     11270.2 |      ms |
|   All |                                  100th percentile service time |        index-append |     14497.8 |      ms |
|   All |                                                     error rate |        index-append |           0 |       % |
|   All |                                                 Min Throughput |             default |        8.01 |   ops/s |
|   All |                                              Median Throughput |             default |        8.01 |   ops/s |
|   All |                                                 Max Throughput |             default |        8.02 |   ops/s |
|   All |                                        50th percentile latency |             default |     5.36076 |      ms |
|   All |                                        90th percentile latency |             default |      6.3142 |      ms |
|   All |                                        99th percentile latency |             default |      10.279 |      ms |
|   All |                                       100th percentile latency |             default |     36.7274 |      ms |
|   All |                                   50th percentile service time |             default |     5.15145 |      ms |
|   All |                                   90th percentile service time |             default |     6.13836 |      ms |
|   All |                                   99th percentile service time |             default |     10.0726 |      ms |
|   All |                                  100th percentile service time |             default |      36.509 |      ms |
|   All |                                                     error rate |             default |           0 |       % |
|   All |                                                 Min Throughput |                term |       50.06 |   ops/s |
|   All |                                              Median Throughput |                term |       50.07 |   ops/s |
|   All |                                                 Max Throughput |                term |       50.07 |   ops/s |
|   All |                                        50th percentile latency |                term |     5.81547 |      ms |
|   All |                                        90th percentile latency |                term |     6.81667 |      ms |
|   All |                                        99th percentile latency |                term |     14.4916 |      ms |
|   All |                                       100th percentile latency |                term |     28.0648 |      ms |
|   All |                                   50th percentile service time |                term |     5.68062 |      ms |
|   All |                                   90th percentile service time |                term |     6.65691 |      ms |
|   All |                                   99th percentile service time |                term |     11.7318 |      ms |
|   All |                                  100th percentile service time |                term |     27.8995 |      ms |
|   All |                                                     error rate |                term |           0 |       % |
|   All |                                                 Min Throughput |               range |           1 |   ops/s |
|   All |                                              Median Throughput |               range |           1 |   ops/s |
|   All |                                                 Max Throughput |               range |           1 |   ops/s |
|   All |                                        50th percentile latency |               range |     805.266 |      ms |
|   All |                                        90th percentile latency |               range |     866.193 |      ms |
|   All |                                        99th percentile latency |               range |     1131.61 |      ms |
|   All |                                       100th percentile latency |               range |     1313.74 |      ms |
|   All |                                   50th percentile service time |               range |     804.106 |      ms |
|   All |                                   90th percentile service time |               range |     856.583 |      ms |
|   All |                                   99th percentile service time |               range |      931.64 |      ms |
|   All |                                  100th percentile service time |               range |     1313.47 |      ms |
|   All |                                                     error rate |               range |           0 |       % |
|   All |                                                 Min Throughput |          hourly_agg |         0.2 |   ops/s |
|   All |                                              Median Throughput |          hourly_agg |         0.2 |   ops/s |
|   All |                                                 Max Throughput |          hourly_agg |         0.2 |   ops/s |
|   All |                                        50th percentile latency |          hourly_agg |     2494.07 |      ms |
|   All |                                        90th percentile latency |          hourly_agg |      2624.2 |      ms |
|   All |                                        99th percentile latency |          hourly_agg |     2819.63 |      ms |
|   All |                                       100th percentile latency |          hourly_agg |     2948.67 |      ms |
|   All |                                   50th percentile service time |          hourly_agg |     2491.42 |      ms |
|   All |                                   90th percentile service time |          hourly_agg |     2621.63 |      ms |
|   All |                                   99th percentile service time |          hourly_agg |      2817.1 |      ms |
|   All |                                  100th percentile service time |          hourly_agg |     2945.98 |      ms |
|   All |                                                     error rate |          hourly_agg |           0 |       % |
|   All |                                                 Min Throughput |              scroll |       25.02 | pages/s |
|   All |                                              Median Throughput |              scroll |       25.05 | pages/s |
|   All |                                                 Max Throughput |              scroll |       25.11 | pages/s |
|   All |                                        50th percentile latency |              scroll |     542.954 |      ms |
|   All |                                        90th percentile latency |              scroll |     650.621 |      ms |
|   All |                                        99th percentile latency |              scroll |     762.191 |      ms |
|   All |                                       100th percentile latency |              scroll |     795.128 |      ms |
|   All |                                   50th percentile service time |              scroll |     542.432 |      ms |
|   All |                                   90th percentile service time |              scroll |     650.222 |      ms |
|   All |                                   99th percentile service time |              scroll |     761.621 |      ms |
|   All |                                  100th percentile service time |              scroll |     794.628 |      ms |
|   All |                                                     error rate |              scroll |           0 |       % |
|   All |                                                 Min Throughput | desc_sort_timestamp |         0.8 |   ops/s |
|   All |                                              Median Throughput | desc_sort_timestamp |         0.8 |   ops/s |
|   All |                                                 Max Throughput | desc_sort_timestamp |         0.8 |   ops/s |
|   All |                                        50th percentile latency | desc_sort_timestamp |     1172.69 |      ms |
|   All |                                        90th percentile latency | desc_sort_timestamp |     1397.25 |      ms |
|   All |                                        99th percentile latency | desc_sort_timestamp |     1671.16 |      ms |
|   All |                                       100th percentile latency | desc_sort_timestamp |     1736.43 |      ms |
|   All |                                   50th percentile service time | desc_sort_timestamp |      1163.1 |      ms |
|   All |                                   90th percentile service time | desc_sort_timestamp |     1260.04 |      ms |
|   All |                                   99th percentile service time | desc_sort_timestamp |     1367.37 |      ms |
|   All |                                  100th percentile service time | desc_sort_timestamp |     1581.69 |      ms |
|   All |                                                     error rate | desc_sort_timestamp |           0 |       % |
|   All |                                                 Min Throughput |  asc_sort_timestamp |         0.8 |   ops/s |
|   All |                                              Median Throughput |  asc_sort_timestamp |         0.8 |   ops/s |
|   All |                                                 Max Throughput |  asc_sort_timestamp |         0.8 |   ops/s |
|   All |                                        50th percentile latency |  asc_sort_timestamp |     1122.85 |      ms |
|   All |                                        90th percentile latency |  asc_sort_timestamp |     1217.99 |      ms |
|   All |                                        99th percentile latency |  asc_sort_timestamp |     1339.78 |      ms |
|   All |                                       100th percentile latency |  asc_sort_timestamp |     1344.91 |      ms |
|   All |                                   50th percentile service time |  asc_sort_timestamp |     1118.84 |      ms |
|   All |                                   90th percentile service time |  asc_sort_timestamp |     1217.83 |      ms |
|   All |                                   99th percentile service time |  asc_sort_timestamp |     1335.92 |      ms |
|   All |                                  100th percentile service time |  asc_sort_timestamp |     1339.58 |      ms |
|   All |                                                     error rate |  asc_sort_timestamp |           0 |       % |

----------------------------------
[INFO] SUCCESS (took 6118 seconds)
----------------------------------

Whoa... that's a lot. I'll pick out the main ones to look at; for the rest, check the docs yourself 😋

Understanding the Race Results

The track we chose contains 247,249,096 documents in total.

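Indexing time is the time spent indexing all 247,249,096 documents into the Elasticsearch cluster. Rally sums it across all primary shards, so it is not wall-clock time: shards that index in parallel each add their own time.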

Lap	Metric	Task	Value	Unit
All	Cumulative indexing time of primary shards		208.92	min
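A rough sanity check shows why this number should not be read as wall-clock time. Suppose indexing ran at about the median index-append throughput from the full results table (101049 docs/s). Loading the whole corpus would then take roughly 41 minutes, about 5 times less than the cumulative figure. That gap is consistent with several primary shards indexing in parallel. This is back-of-the-envelope arithmetic, not something Rally reports.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope arithmetic with numbers copied from the results table.
estimate_indexing() {
  local docs=247249096            # documents in the http_logs track
  local median_docs_per_s=101049  # Median Throughput, index-append
  local cumulative_min=208.92     # Cumulative indexing time of primary shards
  awk -v d="$docs" -v t="$median_docs_per_s" -v c="$cumulative_min" 'BEGIN {
    wall_s = d / t   # rough elapsed time if every doc went in at the median rate
    printf "estimated elapsed indexing: %.0f s (~%.1f min)\n", wall_s, wall_s / 60
    printf "cumulative / elapsed: ~%.1fx\n", c / (wall_s / 60)
  }'
}

estimate_indexing
# estimated elapsed indexing: 2447 s (~40.8 min)
# cumulative / elapsed: ~5.1x
```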

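Indexing throttle time is the total time Elasticsearch spent throttling indexing (lower is better). Elasticsearch typically throttles indexing when segment merging cannot keep up with incoming writes.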

Lap	Metric	Task	Value	Unit
All	Cumulative indexing throttle time of primary shards		0	min
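The Rally documentation describes both indexing time and throttle time as values reported by Elasticsearch's index stats API, so you can look at the raw counters yourself. Below is a sketch where `ES_HOST` and `RUN_QUERY` are placeholders for this example. The request is only sent when `RUN_QUERY=1`. The values come back in milliseconds and cover all indices on the cluster, not only the ones Rally created.

```shell
#!/usr/bin/env bash
# Sketch: query the index stats API for the counters behind these metrics.
# ES_HOST and RUN_QUERY are placeholders.
ES_HOST="${ES_HOST:-192.168.1.100:9200}"
fields="_all.primaries.indexing.index_time_in_millis,_all.primaries.indexing.throttle_time_in_millis"
stats_url="http://${ES_HOST}/_stats/indexing?filter_path=${fields}"

if [ "${RUN_QUERY:-0}" = "1" ]; then
  # Both values are milliseconds summed over primary shards.
  curl -s "$stats_url"
fi
```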

Throughput is the number of events per second (EPS) the cluster can handle. For the indexing task (index-append) it is measured in docs/s.

Lap	Metric	Task	Value	Unit
All	Min Throughput	index-append	90276.5	docs/s
All	Median Throughput	index-append	101049	docs/s
All	Max Throughput	index-append	108923	docs/s
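Since the goal here is storing web server access logs, it helps to turn EPS into a daily volume. The arithmetic below takes the median index-append rate from the full results table (101049 docs/s) at face value. Treat the result as an optimistic ceiling: it assumes the benchmark rate could be sustained for 24 hours with documents similar to the http_logs corpus.

```shell
#!/usr/bin/env bash
# Sketch: rough daily ceiling from the benchmark's median indexing rate.
median_eps=101049                      # Median Throughput, index-append (docs/s)
seconds_per_day=$((24 * 60 * 60))      # 86400
daily_docs=$((median_eps * seconds_per_day))
echo "~${daily_docs} docs/day at the median rate"
# ~8730633600 docs/day at the median rate
```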

Error rate is the share of requests that got an error response code or an exception from Elasticsearch (it should be 0%).

Lap	Metric	Task	Value	Unit
All	error rate	index-append	0	%
All	error rate	default	0	%
All	error rate	term	0	%
All	error rate	range	0	%
All	error rate	hourly_agg	0	%
All	error rate	scroll	0	%
All	error rate	desc_sort_timestamp	0	%
All	error rate	asc_sort_timestamp	0	%
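With this many tasks, eyeballing every error rate row gets tedious. Below is a sketch of a hypothetical `check_error_rates` helper. It reads a results table from stdin and fails if any task reports a non-zero error rate. It assumes the pipe-delimited layout from the full results table above, with columns in the order Lap, Metric, Task, Value, Unit.

```shell
#!/usr/bin/env bash
# Sketch: fail if any "error rate" row in a Rally results table is non-zero.
# Assumes the pipe-delimited layout: | Lap | Metric | Task | Value | Unit |
check_error_rates() {
  awk -F'|' '
    function trim(s) { gsub(/^ +| +$/, "", s); return s }
    trim($3) == "error rate" {
      seen++
      task = trim($4); value = trim($5)
      if (value + 0 != 0) {
        printf "non-zero error rate for %s: %s%%\n", task, value
        bad++
      }
    }
    END {
      if (seen == 0) { print "no error rate rows found"; exit 2 }
      exit (bad > 0)   # 0 when every error rate is 0, 1 otherwise
    }'
}
```

For example, save Rally's console output during the run with `tee rally-output.txt`, then run `check_error_rates < rally-output.txt`.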

Yay!!! That's it. How was it? Easy, right? Chill work, easy work, trust me 😆

DEV Community