Improving developer experience with internal tooling at AirAsia

I enjoy building software so much 😁, but various challenges can sometimes make the process less enjoyable. At work, I prioritize both efficiency and the satisfaction of my engineers, because smooth workflows and effective tools are crucial to a positive working environment. Happy engineers, great work (I hope so 😅). Gotta keep things from being boring!

In this article, I highlight some of the key challenges that hinder engineers' workflows, such as manual processes, slow responses to scaling needs, and secret management, and present the solutions I've implemented. From Jira automation and Cloudflare cache management to Kubernetes connection abstraction, secret management pipelines, and pod pre-warming, I've built tools and processes that streamline these areas and improve both productivity and job satisfaction.

Jira Automation

I focused on enhancing the developer experience by creating automation tools. By collaborating with the QA team, I identified key areas where workflows could be streamlined and repetitive tasks reduced. This effort led to a QA verification pipeline that automatically creates test verification tickets once merge requests are merged. The same effort also automated the creation of retrospective notes in Confluence at the end of each sprint.
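As an illustration only, a QA verification ticket could be created by POSTing a payload like the one below to Jira's REST API (`/rest/api/2/issue`, where `description` is a plain string). The project key, issue type, label, and summary format are hypothetical; the source doesn't specify them.

```python
def build_verification_issue(project_key: str, source_ticket: str,
                             mr_title: str, mr_url: str) -> dict:
    """Build a Jira create-issue payload for a QA verification ticket.

    Field values (issue type, label, summary format) are hypothetical
    and would need to match your own Jira project setup.
    """
    return {
        "fields": {
            "project": {"key": project_key},
            "issuetype": {"name": "Task"},
            "summary": f"[QA] Verify {source_ticket}: {mr_title}",
            "description": (
                f"Merged MR: {mr_url}\n"
                f"Please verify {source_ticket} after deployment."
            ),
            "labels": ["qa-verification"],
        }
    }
```

A merge webhook handler would call this with the merged MR's details and send the result to Jira with the pipeline's credentials.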

Additionally, I analyzed engineers' workflows in Jira and identified several manual steps that could be automated. The system now updates ticket statuses on its own: it moves a ticket to "In Progress" when a branch is created, to "In Review" when a Merge Request (MR) is opened, and to "QA" when the MR is merged. It also mentions the Engineering Manager on P0 tickets. 😄
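The status rules above boil down to a small event-to-status table plus a way to find the ticket key. Here's a minimal sketch, assuming the ticket key (e.g. `WEB-123`) appears in the branch name or MR title; the event names are hypothetical, and the P0 mention rule is left out for brevity.

```python
import re
from typing import Optional, Tuple

# Hypothetical event names; status names follow the rules described above
# but must match the statuses in your own Jira workflow.
EVENT_TO_STATUS = {
    "branch_created": "In Progress",
    "mr_opened": "In Review",
    "mr_merged": "QA",
}

# Jira issue keys look like "WEB-123": uppercase project key, hyphen, number.
ISSUE_KEY = re.compile(r"\b([A-Z][A-Z0-9]+-\d+)\b")


def extract_issue_key(text: str) -> Optional[str]:
    """Return the first Jira issue key in a branch name or MR title, if any."""
    match = ISSUE_KEY.search(text)
    return match.group(1) if match else None


def plan_transition(event: str, ref: str) -> Optional[Tuple[str, str]]:
    """Map a Git event on a branch/MR to (issue_key, target_status), or None."""
    key = extract_issue_key(ref)
    status = EVENT_TO_STATUS.get(event)
    if key is None or status is None:
        return None
    return key, status
```

Applying the result is a separate step: Jira's `POST /rest/api/2/issue/{issueIdOrKey}/transitions` endpoint takes a transition ID rather than a status name, so a real job would first look up which available transition leads to the target status.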

These automation tools save time and reduce the cognitive load on engineers, allowing them to focus more on coding and less on administrative tasks, thus enhancing their overall work experience.

Cloudflare Automation

Previously, clearing the Cloudflare cache required manual intervention, which often delayed releases and could leave users seeing content that didn't match what had just been deployed. To remove that manual step and its risk of errors, I integrated automatic cache clearing into the deployment pipeline for all frontend deployments, so the cache is now cleared on every deployment without anyone having to act. This shortens the time it takes for changes to go live and makes updates more consistent and reliable. As a result, we've seen a noticeable improvement in release efficiency, freeing the team to focus on other critical tasks and deliver a smoother experience for our users.
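The post doesn't show how the pipeline talks to Cloudflare. One option is Cloudflare's v4 `purge_cache` endpoint; the sketch below uses only the Python standard library and assumes the zone ID and an API token with cache-purge permission are supplied by the pipeline (for example, from CI secrets).

```python
import json
import urllib.request

CLOUDFLARE_API = "https://api.cloudflare.com/client/v4"


def build_purge_request(zone_id: str, api_token: str, files=None) -> urllib.request.Request:
    """Build a Cloudflare purge-cache request.

    With no file list, purge everything in the zone; otherwise purge only
    the given URLs.
    """
    body = {"files": list(files)} if files else {"purge_everything": True}
    return urllib.request.Request(
        url=f"{CLOUDFLARE_API}/zones/{zone_id}/purge_cache",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def purge_cache(zone_id: str, api_token: str, files=None) -> bool:
    """Send the purge request; Cloudflare's response JSON has a `success` flag."""
    request = build_purge_request(zone_id, api_token, files)
    with urllib.request.urlopen(request, timeout=30) as response:
        result = json.load(response)
    return bool(result.get("success"))
```

Purging everything is the simplest choice for a post-deploy step; passing a list of changed asset URLs is gentler on cache hit rates if the pipeline knows which files changed.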

Abstraction of Kubernetes Connection

In working with Kubernetes, I noticed that engineers often struggled to manage cluster connections efficiently, especially when dealing with multiple microservices. Manually fetching the cluster endpoint and authentication data, generating kubeconfig entries, and setting the appropriate context was time-consuming and error-prone. This slowed down the workflow and got in the way of quick debugging, since engineers had to repeat these steps every time they needed to access essential resources like pods, logs, and secrets.

To address these issues, I implemented an abstraction layer for Kubernetes connections. This solution automates the key processes: fetching the necessary cluster endpoint and authentication data, generating kubeconfig entries, and setting the appropriate context. By integrating these functions into a common library, I made it easy for engineers to source the abstraction directly from the terminal while working on any microservice. This setup streamlines access to essential resources, enabling engineers to quickly and easily interact with Kubernetes environments.
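The real abstraction is sourced from the terminal, so it is likely shell-based; this Python sketch only illustrates the core step of turning fetched cluster data into a ready-to-use kubeconfig. The function names, naming scheme, and token-based authentication are assumptions; how the endpoint, CA certificate, and token are fetched depends on the cloud provider and isn't covered here.

```python
import base64
import json


def build_kubeconfig(cluster: str, endpoint: str, ca_pem: str, token: str,
                     namespace: str = "default") -> dict:
    """Build a single-cluster kubeconfig with its context already selected."""
    user = f"{cluster}-user"
    context = f"{cluster}-{namespace}"
    return {
        "apiVersion": "v1",
        "kind": "Config",
        "clusters": [{
            "name": cluster,
            "cluster": {
                "server": endpoint,
                # kubeconfig expects the CA certificate as Base64-encoded PEM.
                "certificate-authority-data": base64.b64encode(ca_pem.encode()).decode(),
            },
        }],
        "users": [{"name": user, "user": {"token": token}}],
        "contexts": [{
            "name": context,
            "context": {"cluster": cluster, "user": user, "namespace": namespace},
        }],
        "current-context": context,
    }


def write_kubeconfig(path: str, config: dict) -> None:
    """Write the config as JSON, which is valid YAML and readable by kubectl."""
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
```

After writing the file, something like `KUBECONFIG=/tmp/kubeconfig-stg.json kubectl logs <pod>` works immediately, because the context (including the service's namespace) is already selected.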

The abstraction layer not only saves time but also reduces the complexity involved in managing Kubernetes connections. By automating repetitive tasks, it minimizes the potential for human error and ensures consistent configurations across different services. This has proven to be a valuable tool for debugging, as engineers can now access critical information swiftly, leading to faster issue resolution. Overall, this abstraction layer has enhanced our team's productivity and efficiency, allowing us to focus more on developing and improving our applications rather than getting bogged down in infrastructure management tasks.

Pipeline Job to Update Secrets

In managing Kubernetes clusters, I recognized a significant challenge in securely handling secrets, such as API keys and credentials. The manual process of updating these sensitive pieces of information was not only time-consuming but also prone to human error, which could lead to security vulnerabilities. Ensuring that these secrets were consistently and correctly updated was crucial to maintaining the security and integrity of our deployments.

To address this, I set up a pipeline job dedicated to managing secrets within Kubernetes clusters. The job accesses the cluster, retrieves the relevant secret files, and creates or updates the secrets according to the specifications defined in a JSON file. It also handles the Base64 encoding that Kubernetes requires for Secret values. Base64 is an encoding rather than encryption, so it does not by itself keep the data confidential; that depends on controlling access to the secret files and to the cluster.
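The post doesn't show the JSON spec format, so the shape used here (`name`, `namespace`, `values`) is hypothetical. The sketch converts such a spec into a Kubernetes Secret manifest, Base64-encoding each value as the Secret's `data` field requires.

```python
import base64


def build_secret_manifest(spec: dict) -> dict:
    """Convert a hypothetical JSON spec into a Kubernetes Secret manifest.

    Values under `data` must be Base64-encoded. Base64 is an encoding, not
    encryption: anyone who can read the manifest can decode the values.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "Opaque",
        "metadata": {"name": spec["name"], "namespace": spec["namespace"]},
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in spec["values"].items()
        },
    }
```

Serialized to JSON and piped into `kubectl apply -f -`, the manifest creates the Secret if it doesn't exist and updates it if it does, which matches the create-or-update behavior described above. Kubernetes also accepts plain strings under `stringData` and stores them Base64-encoded, which avoids encoding client-side.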

By automating the management of sensitive information, this pipeline job significantly reduces the risk of manual errors. It ensures that every secret update is applied consistently and encoded correctly, which makes our deployment process more secure. The automation also frees up valuable time for engineers, who no longer need to manage and update secrets by hand, allowing them to focus on more critical tasks. This solution strengthens our security posture and streamlines the workflow, keeping our systems both secure and efficient.

Pre-warming service

To handle increased incoming traffic, especially during peak periods like sales events, it's crucial to scale pods efficiently. However, I noticed that our autoscaler often struggled to adjust pod capacity quickly enough, resulting in errors and service unavailability. This lag not only led to user dissatisfaction but also triggered complaints and unnecessary debugging efforts by engineers, consuming valuable time and resources.

To solve this, I introduced a service called sso-fire-flower that automates adjusting the minimum and maximum replica counts of specific services' autoscalers, so capacity can be raised ahead of expected traffic. Scaling can be triggered from anywhere, and the replicas are automatically reset to their original counts after one week. The desired replica counts are specified in a Scale Configuration file, which keeps the process straightforward and user-friendly.
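To make the scale-then-revert behavior concrete, here is a minimal sketch (not the actual sso-fire-flower code) that, given a Horizontal Pod Autoscaler's current replica bounds and the desired ones, produces the patch to apply now, the patch that restores the original counts, and when to revert. The function names are hypothetical; the one-week window comes from the description above.

```python
from datetime import datetime, timedelta, timezone

# The post describes an automatic reset after one week.
REVERT_AFTER = timedelta(weeks=1)


def hpa_patch(min_replicas: int, max_replicas: int) -> dict:
    """Merge-patch body for a HorizontalPodAutoscaler's replica bounds."""
    if not 1 <= min_replicas <= max_replicas:
        raise ValueError("expected 1 <= min_replicas <= max_replicas")
    return {"spec": {"minReplicas": min_replicas, "maxReplicas": max_replicas}}


def plan_scale(current_spec: dict, target_min: int, target_max: int,
               now: datetime) -> dict:
    """Return the patch to apply now, the patch that restores the original
    bounds, and the time at which to revert."""
    return {
        "apply": hpa_patch(target_min, target_max),
        # minReplicas defaults to 1 when an HPA spec omits it.
        "revert": hpa_patch(current_spec.get("minReplicas", 1),
                            current_spec["maxReplicas"]),
        "revert_at": now + REVERT_AFTER,
    }


if __name__ == "__main__":
    plan = plan_scale({"minReplicas": 3, "maxReplicas": 10}, 20, 60,
                      datetime.now(timezone.utc))
    print(plan["apply"], plan["revert"], plan["revert_at"].isoformat())
```

Each patch could be applied with `kubectl patch hpa <name> -n <namespace> --type merge -p '<json>'`. Capturing the revert patch before scaling is what makes the automatic reset possible without anyone remembering the original numbers.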

The implementation of sso-fire-flower includes two primary jobs: change_replica and revert_hpa_file. These jobs are driven by a runtime configuration file generated from the Base and Scale Configuration files, which detail crucial deployment information. By integrating this tool into our CI/CD pipeline, I enabled easy scaling of replicas, providing mechanisms for both automatic reversion and manual control if needed.
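The exact formats of the Base and Scale Configuration files aren't shown, so the shapes below are hypothetical. This sketch merges the two into one runtime entry per service, the kind of input the `change_replica` and `revert_hpa_file` jobs could iterate over, and rejects services the Base Configuration doesn't describe.

```python
# Hypothetical file shapes (not the real sso-fire-flower format):
#   Base Configuration:  {"services": {name: {"cluster", "namespace", "hpa"}}}
#   Scale Configuration: {"services": {name: {"min": int, "max": int}}}


def build_runtime_config(base: dict, scale: dict) -> list:
    """Merge Base and Scale Configuration into one runtime entry per service."""
    unknown = set(scale["services"]) - set(base["services"])
    if unknown:
        # Refuse to scale anything the Base Configuration doesn't describe.
        raise ValueError(f"services missing from Base Configuration: {sorted(unknown)}")

    runtime = []
    for name, replicas in sorted(scale["services"].items()):
        deploy = base["services"][name]
        runtime.append({
            "service": name,
            "cluster": deploy["cluster"],
            "namespace": deploy["namespace"],
            "hpa": deploy["hpa"],
            "minReplicas": replicas["min"],
            "maxReplicas": replicas["max"],
        })
    return runtime
```

Validating everything up front means a typo in the Scale Configuration fails the pipeline before any autoscaler is touched, rather than leaving some services scaled and others not.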

Moreover, the tool's Base Configuration file extends its use beyond the IAM (formerly SSO) team, making it available to other teams within the organization. Each team can tailor the tool to its own services, which improves overall operational efficiency. By automating the scaling process and providing a robust, flexible solution, sso-fire-flower has significantly improved our ability to handle high-traffic periods, reducing service interruptions and streamlining engineering workflows.

Conclusion

Enhancing the developer experience at work has been a multi-faceted effort that addresses pain points across different stages of the software development lifecycle. By automating manual tasks, I've significantly reduced repetitive work, streamlined workflows, and increased efficiency. The Jira automation has saved time and reduced the cognitive load on engineers, letting them focus on more critical development tasks. Similarly, automating Cloudflare cache clearing has sped up the deployment pipeline, so updates reach users faster.

The abstraction of Kubernetes connections has simplified the management of cluster interactions, providing engineers with quick access to essential resources and improving debugging capabilities. The introduction of a pipeline job for secret management has bolstered security measures, ensuring sensitive information is handled consistently and securely. Finally, the sso-fire-flower service has addressed the challenges of scaling during high-traffic periods, providing a robust solution for managing pod replicas with minimal manual intervention.

These initiatives collectively enhance not only the efficiency and productivity of my engineering teams but also their overall satisfaction. By automating routine tasks and providing intuitive tools, I've created an environment that fosters innovation and allows our engineers to focus on building high-quality software. As we continue to refine and expand these tools, I remain committed to supporting my engineers and maintaining a cutting-edge development infrastructure at work. See you in the next one!