<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ravi More</title>
    <description>The latest articles on DEV Community by Ravi More (@thekaizen).</description>
    <link>https://dev.to/thekaizen</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3571593%2F20775704-3817-4ad8-81de-9410687bff65.png</url>
      <title>DEV Community: Ravi More</title>
      <link>https://dev.to/thekaizen</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/thekaizen"/>
    <language>en</language>
    <item>
      <title>What Does a Production Support Engineer Actually Do?</title>
      <dc:creator>Ravi More</dc:creator>
      <pubDate>Fri, 17 Oct 2025 18:49:28 +0000</pubDate>
      <link>https://dev.to/thekaizen/what-does-a-production-support-engineer-actually-do--20n8</link>
      <guid>https://dev.to/thekaizen/what-does-a-production-support-engineer-actually-do--20n8</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd1x9a6ao2vtgn84j2x4q.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd1x9a6ao2vtgn84j2x4q.jpeg" alt=" " width="735" height="490"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A Production Support Engineer ensures that live applications and systems run smoothly without interruptions. They act as the first line of defense when something goes wrong in production. This role involves monitoring, troubleshooting, automation, and communication, all aimed at keeping systems stable and users happy.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Let’s explore the key responsibilities with real-world examples:&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;1. Monitoring Alerts Using ITRS and Splunk&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Production environments generate alerts when something unusual happens, such as high CPU usage or failed transactions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Using ITRS Geneos, you might receive an alert that a database query is taking too long. You log into the system, check the query logs, and inform the database team.&lt;/p&gt;

&lt;p&gt;With Splunk, you can search logs using keywords to find errors like:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ERROR: PaymentService failed to connect to DB&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;You then investigate the root cause and resolve it.&lt;/p&gt;
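
&lt;p&gt;When only a raw log file is at hand, the same keyword search can be approximated from the shell. This is a minimal sketch, assuming a plain-text log whose lines look like the example above; the function name &lt;code&gt;count_errors&lt;/code&gt; is hypothetical and not part of Splunk.&lt;/p&gt;

```shell
#!/bin/bash
# count_errors SERVICE: read log lines on stdin and print how many
# ERROR lines mention SERVICE. The "ERROR: Service ..." line format
# is an assumption taken from the example above.
count_errors() {
  local service="$1"
  # -F treats the pattern as a fixed string, -c prints the match count.
  # grep exits non-zero when nothing matches, so "|| true" keeps the
  # function from failing while still printing 0.
  grep -cF "ERROR: ${service} " || true
}
```

&lt;p&gt;It could be used as &lt;code&gt;cat app.log | count_errors PaymentService&lt;/code&gt;, where &lt;code&gt;app.log&lt;/code&gt; stands in for whatever log file your application writes.&lt;/p&gt;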




&lt;p&gt;&lt;strong&gt;2. Writing Shell Scripts to Automate Tasks&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Manual tasks can be time-consuming. Shell scripting helps automate repetitive actions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You write a script to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Archive logs older than 7 days&lt;/li&gt;
&lt;li&gt;Restart a service if it crashes&lt;/li&gt;
&lt;li&gt;Send email alerts when disk usage crosses 80%&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#!/bin/bash
# Send an alert when usage of the root filesystem exceeds 80%
if [ "$(df / | grep -v Filesystem | awk '{print $5}' | sed 's/%//')" -gt 80 ]; then
  echo "Disk usage high" | mail -s "Alert" admin@example.com
fi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Breakdown:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;df /&lt;/strong&gt;: Shows disk usage for the filesystem mounted at the root directory.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;grep -v Filesystem&lt;/strong&gt;: Removes the header line from the output.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;awk '{print $5}'&lt;/strong&gt;: Extracts the percentage of disk used (e.g., 85%).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;sed 's/%//'&lt;/strong&gt;: Removes the % symbol to leave a plain number.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;$(...)&lt;/strong&gt;: Runs the commands inside it and substitutes their output into the test.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;-gt 80&lt;/strong&gt;: Compares the result to 80. If it is greater, the condition is true.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;echo "Disk usage high"&lt;/strong&gt;: Creates the message body.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;mail -s "Alert"&lt;/strong&gt;: Sends an email with the subject “Alert”.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="mailto:admin@example.com"&gt;admin@example.com&lt;/a&gt;&lt;/strong&gt;: Recipient of the alert.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;fi&lt;/strong&gt;: Ends the if block.&lt;/li&gt;
&lt;/ul&gt;
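
&lt;p&gt;The log-archiving task from the list at the start of this section could be sketched in a similar way. The function name &lt;code&gt;archive_old_logs&lt;/code&gt; and the &lt;code&gt;*.log&lt;/code&gt; naming pattern are assumptions for illustration. Note that &lt;code&gt;find -mtime +7&lt;/code&gt; counts whole 24-hour periods, so it matches files that are at least eight full days old.&lt;/p&gt;

```shell
#!/bin/bash
# archive_old_logs DIR [DAYS]: gzip every *.log file directly inside DIR
# whose last modification is more than DAYS days old (default 7).
# The function name and the *.log pattern are illustrative assumptions.
archive_old_logs() {
  local dir="$1"
  local days="${2:-7}"
  # -maxdepth 1 keeps the search out of subdirectories;
  # "-exec gzip {} +" compresses all matches in as few gzip runs as possible.
  find "$dir" -maxdepth 1 -type f -name '*.log' -mtime +"$days" -exec gzip {} +
}
```

&lt;p&gt;Calling &lt;code&gt;archive_old_logs /var/log/myapp 7&lt;/code&gt; from a daily cron job would be one way to run it; the directory shown here is a placeholder.&lt;/p&gt;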




&lt;p&gt;&lt;strong&gt;3. Monitoring Jobs Using AutoSys&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AutoSys is used to schedule and monitor batch jobs like report generation or data sync.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You check if the EOD job for generating daily sales reports has failed. If it has, you rerun it and notify the business team.&lt;/p&gt;

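&lt;p&gt;You might use commands like:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;autorep -J job_name&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This reports the job's last start and end times and its current status. Adding &lt;code&gt;-q&lt;/code&gt; prints the job's JIL definition instead of its status.&lt;/p&gt;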
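
&lt;p&gt;Once you have a job's status, the rerun decision can be captured in a small helper. This sketch assumes you have already extracted the two-letter status that AutoSys reports (for example &lt;code&gt;SU&lt;/code&gt; for success, &lt;code&gt;FA&lt;/code&gt; for failure, &lt;code&gt;TE&lt;/code&gt; for terminated); the function name &lt;code&gt;needs_attention&lt;/code&gt; is hypothetical, and which statuses justify a rerun depends on your team's runbook.&lt;/p&gt;

```shell
#!/bin/bash
# needs_attention STATUS: succeed (exit 0) when the two-letter job status
# means the job failed (FA) or was terminated (TE); fail otherwise.
# Treating only FA and TE as rerun candidates is an assumption.
needs_attention() {
  case "$1" in
    FA|TE) return 0 ;;
    *)     return 1 ;;
  esac
}
```

&lt;p&gt;A wrapper script might then call &lt;code&gt;sendevent -E FORCE_STARTJOB -J job_name&lt;/code&gt; when &lt;code&gt;needs_attention&lt;/code&gt; succeeds, after confirming that a rerun is safe for that job.&lt;/p&gt;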




&lt;p&gt;&lt;strong&gt;4. Checking Start-of-Day (SOD) and End-of-Day (EOD) Activities&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;These checks ensure systems are ready for business operations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the morning (SOD), you verify:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All services are running&lt;/li&gt;
&lt;li&gt;No critical alerts are pending&lt;/li&gt;
&lt;li&gt;Jobs scheduled overnight completed successfully&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At night (EOD), you ensure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reports are generated&lt;/li&gt;
&lt;li&gt;Backups are triggered&lt;/li&gt;
&lt;li&gt;No transactions are pending&lt;/li&gt;
&lt;/ul&gt;
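
&lt;p&gt;The "all services are running" part of the SOD check lends itself to a short script. This is a rough sketch that assumes each service shows up as a process with a known name and that &lt;code&gt;pgrep&lt;/code&gt; is installed; the function name &lt;code&gt;check_services&lt;/code&gt; is made up for this example.&lt;/p&gt;

```shell
#!/bin/bash
# check_services NAME...: print "OK" or "DOWN" for each process name and
# return non-zero if at least one of them has no running process.
check_services() {
  local failed=0 svc pids
  for svc in "$@"; do
    # pgrep -x matches the process name exactly; "|| true" keeps a
    # missing process from aborting scripts that run with "set -e".
    pids=$(pgrep -x "$svc" || true)
    if [ -n "$pids" ]; then
      echo "OK   $svc"
    else
      echo "DOWN $svc"
      failed=1
    fi
  done
  return "$failed"
}
```

&lt;p&gt;Run it with the process names your environment uses, and treat a non-zero exit code as a signal to investigate before business hours begin.&lt;/p&gt;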




&lt;p&gt;&lt;strong&gt;5. Handling User Tickets via ServiceNow&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Users raise issues through ticketing tools like ServiceNow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A user reports they can't access a dashboard. You check their permissions, fix the issue, and update the ticket with resolution steps.&lt;/p&gt;

&lt;p&gt;You also categorize tickets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Access issues&lt;/li&gt;
&lt;li&gt;Data mismatches&lt;/li&gt;
&lt;li&gt;Application errors&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;6. Troubleshooting Production Issues and Finding Root Cause&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When something breaks, you investigate logs, metrics, and configurations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;An API is returning 500 errors. You:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Check logs in Splunk&lt;/li&gt;
&lt;li&gt;Restart the service&lt;/li&gt;
&lt;li&gt;Identify a missing config file&lt;/li&gt;
&lt;li&gt;Fix it and document the RCA (Root Cause Analysis)&lt;/li&gt;
&lt;/ul&gt;
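
&lt;p&gt;Before restarting anything, it often helps to see which endpoints are actually failing. The sketch below assumes a web server access log in the common or combined log format, where the request path is field 7 and the status code is field 9; the function name &lt;code&gt;count_5xx&lt;/code&gt; is hypothetical.&lt;/p&gt;

```shell
#!/bin/bash
# count_5xx: read access-log lines on stdin and print "COUNT PATH" for
# every request path that returned a 5xx status, busiest path first.
# Field positions (path = $7, status = $9) assume the common log format.
count_5xx() {
  awk '$9 ~ /^5[0-9][0-9]$/ { hits[$7]++ }
       END { for (p in hits) print hits[p], p }' | sort -rn
}
```

&lt;p&gt;For example, &lt;code&gt;tail -n 10000 access.log | count_5xx&lt;/code&gt; summarizes recent failures, with &lt;code&gt;access.log&lt;/code&gt; standing in for your server's log file.&lt;/p&gt;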




&lt;p&gt;&lt;strong&gt;7. Using Linux Commands for System Tasks&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Linux is widely used in production. You use commands to check system health and perform actions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Common Commands:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;tail -f logfile.log&lt;/code&gt; → View live logs&lt;/li&gt;
&lt;li&gt;&lt;code&gt;df -h&lt;/code&gt; → Check disk space&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ps -ef | grep service&lt;/code&gt; → Check if a service is running&lt;/li&gt;
&lt;li&gt;&lt;code&gt;top&lt;/code&gt; → Monitor CPU and memory usage&lt;/li&gt;
&lt;/ul&gt;
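
&lt;p&gt;These commands combine well in small helpers. As one illustration, the sketch below filters &lt;code&gt;df -P&lt;/code&gt; output (the POSIX format, which keeps each filesystem on one line) down to the mount points above a usage limit. The function name &lt;code&gt;over_threshold&lt;/code&gt; is an assumption for this example.&lt;/p&gt;

```shell
#!/bin/bash
# over_threshold LIMIT: read "df -P" output on stdin and print the mount
# point and usage of every filesystem whose Use% is above LIMIT.
over_threshold() {
  local limit="$1" fs blocks used avail pct mount
  while read -r fs blocks used avail pct mount; do
    pct="${pct%\%}"                 # strip the trailing % sign
    case "$pct" in
      ''|*[!0-9]*) continue ;;      # skip the header and malformed lines
    esac
    if [ "$pct" -gt "$limit" ]; then
      echo "$mount ${pct}%"
    fi
  done
}
```

&lt;p&gt;Typical use would be &lt;code&gt;df -P | over_threshold 80&lt;/code&gt;, which prints nothing when every filesystem is at or below the limit.&lt;/p&gt;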




&lt;p&gt;&lt;strong&gt;8. Maintaining KT Documents in Confluence&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Knowledge Transfer (KT) documents help share information across the team.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You create a Confluence page titled “How to Restart Payment Gateway Service” with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Step-by-step instructions&lt;/li&gt;
&lt;li&gt;Screenshots&lt;/li&gt;
&lt;li&gt;Common errors and fixes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This helps new team members learn quickly and ensures consistency.&lt;/p&gt;




&lt;p&gt;While Production Support Engineers and DevOps Engineers share some overlapping skills — like automation, monitoring, and troubleshooting — their roles are different in scope.&lt;/p&gt;

&lt;p&gt;You can think of a Production Support Engineer as someone who handles real-time operational issues, whereas a DevOps Engineer focuses more on building and maintaining CI/CD pipelines, infrastructure as code, and deployment automation.&lt;/p&gt;

&lt;p&gt;The responsibilities of a Production Support Engineer can vary from company to company. The exact tasks often depend on the client’s requirements, the technology stack, and the business domain. While some engineers may focus more on automation and scripting, others might handle more incident management or user support.&lt;/p&gt;

</description>
      <category>career</category>
      <category>devops</category>
      <category>monitoring</category>
    </item>
  </channel>
</rss>
