Holiday preparedness is a yearly practice in which the Bloomreach engineering team scales its already highly available production systems beyond their previous limits, without compromising performance or latency for customers.
The Bloomreach Serving Infrastructure
In last year's blog, the team gave a glimpse of the dependency graph between the different services at Bloomreach. A lot has changed since then, and this article focuses on those changes.
Bloomreach has Dockerized the autosuggest service and started running it inside Kubernetes. The Search API server, which runs the Django Python app, now auto-scales based on traffic demand.
Below is a bird's-eye view of the Bloomreach Search & Merchandising (brSM) serving infrastructure. During the holidays, we scale up our API servers along with the hot backup that we maintain in the west coast datacenter.
Load Testing
Capacity provisioning is useless if we don’t load test our systems to validate our assumptions and find any failure points.
Until last year, we ran load tests using Vegeta alone. This year, we spiced that up with a touch of Kubernetes.
We created a prebaked Docker image with Vegeta for running load tests. It takes the log files to replay from S3, along with the endpoint we want to run the load test against. From there, it was simply a matter of writing a k8s job, and we were good to go. Load tests run for a couple of days. You can find the sample Dockerfile for reference below:
###
# s3cmd has known issues on Python 3 (2.0.2 is currently the latest release),
# so the image is based on Python 2.7.
# https://github.com/s3tools/s3cmd/issues/930
###
FROM python:2.7.17-slim-buster
# Vegeta release version and platform to download
ARG VERSION="12.7.0"
ARG PLATFORM="linux-amd64"
# Add new user to run the whole thing as non-root
RUN addgroup mobile \
    && useradd -g mobile -d /loadtest mobile
# Install basic tooling, then clean the apt cache to keep the image small
RUN apt-get update \
    && apt-get install --no-install-recommends -y vim-tiny wget lzop procps \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
# s3cmd is used to fetch the log files from S3
RUN pip install s3cmd=="2.0.2"
# Install vegeta
RUN wget -O "./vegeta.tar.gz" "https://github.com/tsenart/vegeta/releases/download/v${VERSION}/vegeta-${VERSION}-${PLATFORM}.tar.gz" \
    && tar -xvf "./vegeta.tar.gz" \
    && mv "./vegeta" "/usr/local/bin/" \
    && rm "./vegeta.tar.gz"
# Copy the load test scripts and hand them over to the non-root user
COPY . /loadtest
RUN chown -R mobile:mobile /loadtest
WORKDIR /loadtest
USER mobile
ENTRYPOINT ["/loadtest/docker/entrypoint.sh"]
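For readers who want to try a similar setup, here is a minimal sketch of what the k8s job around such an image might look like, written as a small Python script that emits the manifest as JSON. This is not the job definition we used. The image name, job name, S3 path, endpoint URL, and the `LOG_S3_URI` and `TARGET_ENDPOINT` environment variables are all hypothetical placeholders, and the sketch assumes the entrypoint script reads its inputs from environment variables.

```python
import json


def build_loadtest_job(name, image, log_s3_uri, endpoint, parallelism=1):
    """Return a Kubernetes batch/v1 Job manifest as a plain dict.

    The environment variable names below are hypothetical; they stand in
    for however the image's entrypoint script receives its inputs.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Run `parallelism` pods at once, each to completion exactly once.
            "parallelism": parallelism,
            "completions": parallelism,
            # Do not retry a failed load test pod automatically.
            "backoffLimit": 0,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "vegeta",
                            "image": image,
                            "env": [
                                {"name": "LOG_S3_URI", "value": log_s3_uri},
                                {"name": "TARGET_ENDPOINT", "value": endpoint},
                            ],
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # All values here are placeholders, not real Bloomreach resources.
    job = build_loadtest_job(
        name="holiday-loadtest",
        image="registry.example.com/loadtest-vegeta:latest",
        log_s3_uri="s3://example-bucket/logs/sample.log.lzo",
        endpoint="https://search.example.com",
        parallelism=4,
    )
    print(json.dumps(job, indent=2))
```

Since `kubectl apply -f -` accepts JSON as well as YAML, the output of a script like this can be piped straight into it. Raising `parallelism` is one way to generate more load than a single pod can produce.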
Delivering a great experience
The stakes, and consequently the expectations, are higher during the holiday period, as any downtime has a revenue impact for the customer. By maintaining 100% uptime for Search, Autosuggest, and Dashboard, we upheld customer confidence in Bloomreach.
Search
Search observed a peak of ~1600 queries per second (QPS) on Cyber Monday.
Autosuggest
The Autosuggest service observed a peak QPS of ~3000, again on Cyber Monday.
Organic (Related Searches, Related Products)
Organic had a peak QPS of ~2400.
Global Latencies
The graph below shows the latency during this year's holiday period. Average latency is the mean latency of all API calls in a given time window. It was under 150 ms for the brSM API, under 30 ms for Suggest, and under 2 ms for Organic RS/RP.
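To make the definition of average latency concrete, here is a minimal sketch of a per-window average. It is only an illustration, not our monitoring pipeline: the input format (pairs of Unix timestamp and latency in milliseconds) and the 60-second window are assumptions.

```python
from collections import defaultdict


def average_latency_by_window(samples, window_seconds=60):
    """Average latency per fixed time window.

    `samples` is an iterable of (unix_timestamp_seconds, latency_ms) pairs,
    one per API call. Returns a dict mapping each window's start timestamp
    to the mean latency of the calls that fall inside that window.
    """
    totals = defaultdict(lambda: [0.0, 0])  # window start -> [latency sum, call count]
    for timestamp, latency_ms in samples:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        bucket = totals[window_start]
        bucket[0] += latency_ms
        bucket[1] += 1
    return {
        start: total / count
        for start, (total, count) in sorted(totals.items())
    }


if __name__ == "__main__":
    # Three made-up calls: two in the first minute, one in the second.
    calls = [(0, 100), (30, 200), (60, 50)]
    print(average_latency_by_window(calls))  # {0: 150.0, 60: 50.0}
```

Every call counts equally within its window, so a single slow call in a busy window moves the average far less than it would in a quiet one.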
2019 Holiday Trends
The Bloomreach BA team also analyzed shopping trends for this year's holiday season. The major observations are as follows:
- Mobile traffic exceeded desktop traffic in both the US and the EU.
- Mobile users are spending more than desktop users.
- Search trends generally stayed the same, that is, consumers are still buying the same things.
Acknowledgements
- Purshotam for leading the effort together with Raunak, Naveen, Jyoti, Abhishek & Mayank.
- All other internal teams - Connect, Search Quality, Metal, Dashboard & Analytics for making the holiday preparedness a success.
- Special thanks to the Bloomreach support team for helping us have an incident-free holiday.
Blog written by: Abishek & Purshotam from Bloomreach, 2020