
Gyeongjun Paik


One Year of DevOps at Idus: Reflections and Learnings

global.idus.com

From Work Experience to Development Community Activities

One Year Already…

Although my previous company was a fast-growing firm with much to learn, I needed to change my environment for my personal growth. Among several companies, I chose Idus because it made me reflect on my career direction.

After presenting at the Django Lightning Talk in June 2020, I met a senior developer from Idus (now my leader) during the networking session. I was introduced to DevOps and became interested in the DevOps culture.

SRE (Site Reliability Engineering)

Curious about DevOps culture, I began studying SRE. Although many of its practices are not suited to an early-stage startup, the philosophy behind DevOps was clear.

One of the phrases that impressed me most while studying SRE is:

“Culture eats strategy for breakfast.”

Simply liking coding does not create a good development culture. The principles I believe are essential for creating a good development culture are:

  1. Acknowledge failures without deceiving oneself or others.
  2. Be tolerant of others' failures and encourage challenges.
  3. Improve the environment based on failures.

This DevOps philosophy captivated me, and I wanted to put the inspiration it gave me into practice in my own work.

Just in time, a DevOps position opened up at Idus, the company that had sparked my curiosity about DevOps, and in May 2021, I joined Idus.

Work Experience at Idus

Idus DevOps Team

My work at Idus falls into three main areas:

  1. Incident Response: The highest priority task to ensure stable service delivery.

    • Process:
      • Incident occurs → Incident acknowledgment (reporter or alert) → Incident communication (relevant stakeholders) → Incident resolution → Post-mortem review
    • Post-mortem:
      • A post-mortem analyzes the cause of an incident and systematizes measures to prevent the same mistake from recurring. The practice originated in the medical and aviation industries, where an accident can cause casualties and the responsible party is held accountable. To avoid blame, that party might hide the cause, and once the cause is concealed, the same problem can recur and cause further harm. This is the worst scenario: if accountability is enforced through blame, people would rather flee than report what happened.
      • To prevent this, post-mortems do not blame individuals. Instead, companies often reward people who identify structural issues well and improve them (Google is a notable example).

    Idus’s incident response embraces the post-mortem culture:

    • Short term: Share warnings about the issue with team members.
    • Long term: Prepare tasks to systematize measures preventing recurrence.
  2. Technical Research: Introducing new technologies & repaying technical debt.

    • To enhance team productivity, I explore tools suitable for the team and review the benefits.
    • Projects I participated in include:
      • Improving DEV/QA separation
      • Refactoring EKS Helm & deployment pipeline
      • Adopting Terraform Cloud
      • Adopting Datadog
      • Preparing for global services
    • Detailed project information will be covered in future posts.
  3. Operational Tasks: Facilitating the smooth growth of Idus.

    • Infrastructure setup for Idus
    • Legacy system performance improvement
    • Cost optimization
    • ISMS response

    The advent of IaC (Infrastructure as Code) contributed significantly to the popularity of DevOps, because it made it possible to manage infrastructure as code and automate it. I wrote basic modules for infrastructure setup and built Terraform code that injects variables into those modules to create infrastructure. I also provided infrastructure setup guides to developers, and resources are now created after review.

  • Global Resource Creation Guide:
    • Positives:
      • Contribution to infrastructure structure unification:
        • Previously, the infrastructure was complex, with the PROD and STAGE environments used together for development, testing, and QA. Over time this led to discrepancies between environments and made QA unreliable.
        • We removed the STAGE environment and restructured the QA and DEV environments, focusing on unifying the infrastructure structure across environments while specializing each environment for its purpose.
      • Resolving bottlenecks in the infrastructure creation process:
        • Before I joined, every step of infrastructure setup, from A to Z, had to go through DevOps.
        • Now, developers can refer to the guides and take part in infrastructure creation directly, which reduces the time spent waiting for DevOps to pick up a task.
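
As a rough illustration of the "basic module plus injected variables" approach described above, the following Python sketch merges shared defaults with per-environment overrides and renders them as JSON variable files, a format Terraform accepts through `.tfvars.json` files. This is only a sketch under assumptions: the variable names, values, and environment overrides are hypothetical and are not taken from the Idus codebase, and the team's actual setup (for example, variables managed in Terraform Cloud) may look quite different.

```python
import json

# Hypothetical defaults shared by every environment (illustrative names only).
BASE_VARIABLES = {
    "service_name": "example-service",
    "instance_type": "t3.small",
    "min_size": 1,
}

# Hypothetical per-environment overrides; DEV uses the defaults unchanged.
ENVIRONMENT_OVERRIDES = {
    "dev": {},
    "qa": {"min_size": 2},
    "prod": {"instance_type": "m5.large", "min_size": 3},
}


def build_tfvars(environment: str) -> dict:
    """Merge the shared defaults with one environment's overrides."""
    if environment not in ENVIRONMENT_OVERRIDES:
        raise ValueError(f"unknown environment: {environment}")
    variables = dict(BASE_VARIABLES)
    variables.update(ENVIRONMENT_OVERRIDES[environment])
    variables["environment"] = environment
    return variables


def render_tfvars_json(environment: str) -> str:
    """Serialize the merged variables as the content of a .tfvars.json file."""
    return json.dumps(build_tfvars(environment), indent=2, sort_keys=True)


if __name__ == "__main__":
    for env in ENVIRONMENT_OVERRIDES:
        print(f"# {env}.tfvars.json")
        print(render_tfvars_json(env))
```

Keeping the module fixed and varying only the injected values is one way to keep the structure identical across DEV, QA, and PROD while still letting each environment differ where it needs to.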

Areas for Improvement

  1. Data-Driven Tasks:

    • One advantage of working at an IT company is how easy it is to work from data. Describing a task with data clearly communicates what the work contributed.
    • For example:
      • AS-IS: Improved API latency of the XX endpoint in OO service.
      • TO-BE: Improved API latency (p99) of the XX endpoint in OO service by 50% (from 500ms to 250ms) in 5 hours of work.
    • Though it takes effort to make tasks data-driven, doing so significantly reduces communication costs and is a more effective way of working. At first I described the benefits of my work only in general terms, but since joining Idus I have communicated with data much more often.
  2. Sharing Small Tasks:

    • When I recently added the Datadog Integration to the DEV environment, I mistakenly removed the EC2 host filter, which significantly increased costs. Although the team is notified when a ticket is completed, it is hard to keep track of every notification. If I had shared the change briefly on the messenger before proceeding, the issue could have been caught sooner.
  3. Developing a Habit of Double-Checking:

    • Most incidents result from human error. A system with built-in guarantees would prevent such issues, but early-stage tasks often still require manual work. In those cases, making a habit of reviewing the change two or three times and double-checking with team members before proceeding can reduce the frequency of incidents.
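
For the data-driven reporting described in item 1, the TO-BE sentence can be produced from raw measurements rather than written by hand. The sketch below is a minimal example, assuming latency samples in milliseconds are already collected before and after a change; the function names and the nearest-rank percentile method are my own choices for illustration, and a monitoring tool such as Datadog may compute p99 differently.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Return the nearest-rank percentile (0 < pct <= 100) of the samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # 1-based nearest rank; multiply before dividing to limit rounding error.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def improvement_percent(before: float, after: float) -> float:
    """Relative reduction from before to after, as a percentage."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


def summarize(before_ms: list[float], after_ms: list[float], hours_spent: int) -> str:
    """Build a TO-BE style sentence from measured latencies."""
    p99_before = percentile(before_ms, 99)
    p99_after = percentile(after_ms, 99)
    gain = improvement_percent(p99_before, p99_after)
    return (
        f"Improved p99 latency by {gain:.0f}% "
        f"(from {p99_before:.0f}ms to {p99_after:.0f}ms) "
        f"in {hours_spent} hours of work"
    )
```

With 100 samples per run whose 99th-ranked values are 500 ms and 250 ms, `summarize` reports a 50% improvement, matching the TO-BE example above.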

Additionally: (People + Development) = Good Development Community Activities

CTO Sungil always shares good news. Participating in the development community provides a great deal of inspiration. It also helps me build good relationships with fellow developers who share similar concerns, and it relieves stress. Idus supports and encourages this kind of growth and happiness.

In Conclusion

I write retrospectives each year to make the current year better than the last. This year's retrospective was meaningful. Writing this made me think of each team member at Idus, and I want to express my gratitude.
