<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Arosh Wijepala</title>
    <description>The latest articles on DEV Community by Arosh Wijepala (@aroshw).</description>
    <link>https://dev.to/aroshw</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F732912%2F80829831-a7a7-4e79-b3af-02c29ae7573a.png</url>
      <title>DEV Community: Arosh Wijepala</title>
      <link>https://dev.to/aroshw</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aroshw"/>
    <language>en</language>
    <item>
      <title>Automating VMware Tools Upgrades with PowerShell and N-Central</title>
      <dc:creator>Arosh Wijepala</dc:creator>
      <pubDate>Mon, 14 Apr 2025 11:59:18 +0000</pubDate>
      <link>https://dev.to/aroshw/automating-vmware-tools-upgrades-with-powershell-and-n-central-5e7l</link>
      <guid>https://dev.to/aroshw/automating-vmware-tools-upgrades-with-powershell-and-n-central-5e7l</guid>
      <description>&lt;p&gt;As a systems engineer supporting a wide range of clients with all kinds of VMware environments—from vSphere to standalone ESXi hosts—I kept running into the same challenge: VMware Tools versions were often out of date. Manual upgrades were inconsistent, time-consuming, and prone to being forgotten.&lt;/p&gt;

&lt;p&gt;We already had a well-established patch management process using N-Central RMM, and I figured: Why not piggyback off the same maintenance window to upgrade VMware Tools, too? That would eliminate the manual effort and ensure every machine stayed updated and supported, automatically.&lt;/p&gt;

&lt;p&gt;This post is the story of how a simple PowerShell script streamlined that process, and how you can use it too.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;VMware Tools is essential for good VM performance, better guest OS integration, and clean VM shutdowns. But in environments with dozens or hundreds of virtual machines, keeping it updated across different versions of ESXi is messy.&lt;/p&gt;

&lt;p&gt;N-Central, our Remote Monitoring and Management (RMM) tool, allowed us to schedule tasks during patching windows. So I decided to build a PowerShell script that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Checks the latest VMware Tools version online&lt;/li&gt;
&lt;li&gt;Compares it with the version installed locally&lt;/li&gt;
&lt;li&gt;Downloads and silently installs the update—only if needed&lt;/li&gt;
&lt;li&gt;Logs everything along the way&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The PowerShell Script
&lt;/h2&gt;

&lt;p&gt;Here’s the full script. I’ll explain it part by part in a moment. &lt;a href="https://github.com/ph4n7om2000/vmware_tools_upgrade/" rel="noopener noreferrer"&gt;You can download it from my GitHub here&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$URL = "https://packages.vmware.com/tools/esx/latest/windows/x64/"
$LogFilePath = "C:\temp\VMwareToolsUpdateScript.log"

(Get-Date).ToString() + " :  Script Initiated" &amp;gt;&amp;gt; $logfilepath

$PSversion = Get-Host | Select-Object Version
(Get-Date).ToString() + " : PowerShell version = " + $PSversion &amp;gt;&amp;gt; $logfilepath

$QueryVMWareToolsVersion = Invoke-WebRequest $URL -UseBasicParsing
$VMWareToolsSetupName = $QueryVMWareToolsVersion.Links.HREF | Select -Skip 1

[string]$VMWareToolsNewestVersion = $VMWareToolsSetupName -replace ".*VMware-tools-" -replace "-.*"
$VMWareToolsInstalledVersion = Get-WmiObject Win32_Product -Filter "Name like 'VMware Tools'" | Select-Object -ExpandProperty Version
[string]$VMWareToolsInstalledVersion = $VMWareToolsInstalledVersion.Substring(0,$VMWareToolsInstalledVersion.LastIndexOf('.'))

If ([version]$VMWareToolsInstalledVersion -lt [version]$VMWareToolsNewestVersion) {
    $DownloadURL = $URL + $VMWareToolsSetupName
    try {
        Invoke-WebRequest -Uri $DownloadURL -OutFile "C:\temp\$VMWareToolsSetupName"
        (Get-Date).ToString() + " : Download Finished!" &amp;gt;&amp;gt; $logfilepath
    } catch {
        (Get-Date).ToString() + " : Download Failed" &amp;gt;&amp;gt; $logfilepath
    }

    $ArgumentList = "/S /v " + '"/qn REBOOT=R ADDLOCAL=ALL"' + " /l C:\temp\VMwareToolsSetup.log"
    $FilePath = "C:\temp\" + $VMWareToolsSetupName

    try {
        Start-Process -FilePath $FilePath -ArgumentList $ArgumentList
        (Get-Date).ToString() + " : Installation Finished!" &amp;gt;&amp;gt; $logfilepath
    } catch {
        (Get-Date).ToString() + " : Installation Failed" &amp;gt;&amp;gt; $logfilepath
    }
} else {
    (Get-Date).ToString() + " : VMware latest version is already installed!" &amp;gt;&amp;gt; $logfilepath
}

(Get-Date).ToString() + " : Script executed successfully" &amp;gt;&amp;gt; $logfilepath

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;Let’s break this down step by step:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Logging and PowerShell Version Detection&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(Get-Date).ToString() + " : Script Initiated" &amp;gt;&amp;gt; $logfilepath
$PSversion = Get-Host | Select-Object Version

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We start with timestamped logging and note which PowerShell version is running. This helps in troubleshooting if the script fails due to version incompatibility (some older versions have quirks with &lt;code&gt;Invoke-WebRequest&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Check for the Latest VMware Tools Version&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$QueryVMWareToolsVersion = Invoke-WebRequest $URL -UseBasicParsing
$VMWareToolsSetupName = $QueryVMWareToolsVersion.Links.HREF | Select -Skip 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



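&lt;p&gt;We hit VMware’s official tools repository, skip the first link on the page (usually &lt;code&gt;../&lt;/code&gt;), and take what follows, which should be the &lt;code&gt;.exe&lt;/code&gt; installer for the latest version. Note that &lt;code&gt;Select -Skip 1&lt;/code&gt; returns every remaining link, so if the listing ever holds more than one file, adding &lt;code&gt;-First 1&lt;/code&gt; keeps the result to a single filename.&lt;br&gt;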
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[string]$VMWareToolsNewestVersion = $VMWareToolsSetupName -replace ".*VMware-tools-" -replace "-.*"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using a little regex magic, we extract the version string from the filename.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Detect the Currently Installed Version&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
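&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$VMWareToolsInstalledVersion = Get-WmiObject Win32_Product -Filter "Name like 'VMware Tools'" | Select-Object -ExpandProperty Version
[string]$VMWareToolsInstalledVersion = $VMWareToolsInstalledVersion.Substring(0,$VMWareToolsInstalledVersion.LastIndexOf('.'))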
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We use WMI to get the currently installed version of VMware Tools. Since VMware’s versioning can include build numbers (e.g. &lt;code&gt;12.2.5.45654&lt;/code&gt;), we trim that last segment for a cleaner version comparison.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Version Comparison and Conditional Upgrade&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;If ([version]$VMWareToolsInstalledVersion -lt [version]$VMWareToolsNewestVersion)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the installed version is older, we build the download URL and pull the installer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Silent Install and Logging&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ArgumentList = "/S /v " + '"/qn REBOOT=R ADDLOCAL=ALL"' + " /l C:\temp\VMwareToolsSetup.log"
Start-Process -FilePath $FilePath -ArgumentList $ArgumentList
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



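&lt;p&gt;The installer runs silently (&lt;code&gt;/qn&lt;/code&gt;), suppresses any automatic reboot (&lt;code&gt;REBOOT=R&lt;/code&gt;, short for ReallySuppress), and installs every feature (&lt;code&gt;ADDLOCAL=ALL&lt;/code&gt;). The setup log helps diagnose install issues. One caveat: &lt;code&gt;Start-Process&lt;/code&gt; returns as soon as the installer launches, so the "Installation Finished!" entry really means "installer started"; adding &lt;code&gt;-Wait&lt;/code&gt; makes the script block until setup exits.&lt;/p&gt;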

&lt;h2&gt;
  
  
  Why This Approach Worked
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No reliance on vCenter or ESXi APIs&lt;/strong&gt; — just plain HTTP and PowerShell&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Works on any 64-bit Windows guest&lt;/strong&gt; that VMware Tools supports&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No unnecessary upgrades&lt;/strong&gt; — it installs only when there’s a newer version&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built-in logging&lt;/strong&gt; makes it easy to audit and troubleshoot&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We scheduled the script to run during our regular patch maintenance window, so every machine was updated without manual intervention. It’s a small thing, but it shaved hours off our monthly maintenance efforts and ensured better consistency across customer environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Download the Script
&lt;/h2&gt;

&lt;p&gt;👉&lt;a href="https://github.com/ph4n7om2000/vmware_tools_upgrade/" rel="noopener noreferrer"&gt;Download the full script on GitHub&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Automation doesn't always need to be fancy. Sometimes, it's just about using the tools you already have in a smarter way. If you're running VMware environments and want to keep things simple, give this approach a try—and let me know how it works for you!&lt;/p&gt;

&lt;p&gt;Feel free to fork, improve, or suggest changes to the script. I’d love to hear how others are handling VMware Tools upgrades in complex environments.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>My Cloud Resume Challenge Journey: Embracing Serverless</title>
      <dc:creator>Arosh Wijepala</dc:creator>
      <pubDate>Mon, 07 Apr 2025 11:44:21 +0000</pubDate>
      <link>https://dev.to/aroshw/my-cloud-resume-challenge-journey-embracing-serverless-3ag9</link>
      <guid>https://dev.to/aroshw/my-cloud-resume-challenge-journey-embracing-serverless-3ag9</guid>
      <description>&lt;p&gt;I’ve spent nearly 14 years in the tech industry, working my way up from providing end-user support to managing full-scale corporate infrastructure. For a good portion of my career, I worked with physical servers running virtualization—systems I could walk up to, plug into, and manage directly, with on-prem infrastructure that I understood inside and out.&lt;/p&gt;

&lt;p&gt;As cloud computing grew in popularity, I naturally found myself migrating infrastructure to the cloud: deploying virtual servers, moving data from various platforms to Microsoft 365, and earning certifications like AZ-104 and AZ-305 along the way. I understood Infrastructure-as-a-Service (IaaS) and had real-world experience using it. But there was still something I hadn’t really touched: serverless. I’d read about it and played with it here and there on my own time, but never had a proper project to truly dive in.&lt;/p&gt;

&lt;p&gt;Still, the curiosity was always there. I found myself drawn to the idea of serverless, of building things without thinking about the underlying machines. I’d read &lt;em&gt;The Phoenix Project&lt;/em&gt; and &lt;em&gt;The Unicorn Project&lt;/em&gt;, and those books stayed with me. They planted seeds: seeds of understanding how DevOps, automation, and cloud-native technologies were transforming how we deliver technology.&lt;/p&gt;

&lt;p&gt;Then one day, I stumbled upon the &lt;a href="https://cloudresumechallenge.dev/" rel="noopener noreferrer"&gt;&lt;strong&gt;Cloud Resume Challenge&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Start of Something New
&lt;/h2&gt;

&lt;p&gt;The challenge wasn’t just a list of tasks. It was an experience designed to stretch you. It gave you a direction, but no step-by-step instructions. And that’s exactly what I needed. I already had the certifications (AZ-900, AZ-104, AZ-305) and some real-world Azure experience under my belt, but this was a chance to apply all that knowledge, and then some.&lt;/p&gt;

&lt;p&gt;It started off easy. I had some basic experience with HTML and CSS (though I wasn’t exactly passionate about front-end development). So I leaned on AI to generate my resume web page, uploaded it to Azure Storage as a static site, and pointed my Cloudflare DNS to it. That part was smooth.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Uncomfortable
&lt;/h2&gt;

&lt;p&gt;Things got harder when I tried to set up &lt;strong&gt;Azure Front Door&lt;/strong&gt;. I knew what it was—a global load balancer—from my certification studies. But putting theory into practice is another thing entirely. Understanding how frontends, backends, routes, and origins all fit together took me down a rabbit hole of documentation. I spent evenings reading, testing, and debugging. Eventually, it clicked. It was a small win, but a satisfying one.&lt;/p&gt;

&lt;p&gt;The real turning point came when I got to &lt;strong&gt;creating the API with Azure Functions&lt;/strong&gt;. I had never worked with them before. I knew Python well enough, so writing a basic HTTP-triggered function wasn’t too intimidating. But integrating it with Cosmos DB? That was tough. I wanted to develop it locally in VS Code, and setting everything up took longer than expected. The biggest hurdle was understanding how &lt;strong&gt;bindings&lt;/strong&gt; worked to connect the function to the database.&lt;/p&gt;

&lt;p&gt;It was one of those moments where frustration starts to creep in. But I reminded myself—this is why I took on the challenge in the first place. With enough reading, trial and error, reviewing Azure Application Insights and a little help from ChatGPT, I started to get the hang of it. The sense of progress was incredibly rewarding.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stepping into Infrastructure as Code
&lt;/h2&gt;

&lt;p&gt;Next, I took on &lt;strong&gt;Terraform&lt;/strong&gt;. I’d used it before, but only briefly. This time, I committed to really learning it. I installed it, started writing out my .tf files, explored tfvars, and learned about state files and variable assignments.&lt;/p&gt;

&lt;p&gt;What really tripped me up was how detailed and nested Azure resources could be. Configuring something like Front Door in Terraform meant understanding all of its sub-resources and dependencies. But I stuck with it, and eventually managed to define the full infrastructure in code—including monitoring and alerting with Azure Monitor.&lt;/p&gt;

&lt;p&gt;I also automated the DNS records in Cloudflare using its Terraform provider. Seeing everything come together—the infrastructure, the code, the monitoring—was incredible. It started to feel like I was really building something robust and production-ready.&lt;/p&gt;

&lt;h2&gt;
  
  
  Adding the Real-World Touches
&lt;/h2&gt;

&lt;p&gt;Since I had experience using PagerDuty in previous roles, I decided to bring it into this project as well. I set up a team, schedules, and escalation policies. Then I wired Azure Monitor to send alerts via webhook, so I’d get a ping on my phone if something broke. It gave the project a sense of realism—it wasn’t just a lab experiment anymore.&lt;/p&gt;

&lt;h2&gt;
  
  
  Learning CI/CD from Scratch
&lt;/h2&gt;

&lt;p&gt;The final mountain to climb was &lt;strong&gt;CI/CD&lt;/strong&gt; using GitHub Actions. Until this point, I hadn’t worked with pipelines, and it felt like a completely different world. I started reading up on YAML workflows, learning how to trigger deployments with code pushes, how to use Git in VS Code, and how to structure commits and branches.&lt;/p&gt;

&lt;p&gt;One thing that really stuck with me during this phase was the importance of security. As I prepared to make the project public, I had to learn how to manage secrets safely. I used GitHub’s encrypted secrets for things like my Azure credentials and Cloudflare API tokens. I also configured a backend in Azure Storage for Terraform’s state file, ensuring I could work on the project from multiple machines without losing consistency.&lt;/p&gt;

&lt;p&gt;All of this took time. There were late nights. There were moments of confusion. But each hurdle taught me something meaningful—and that’s what made the experience so valuable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Completing the Cloud Resume Challenge wasn’t just about putting a resume online. It was about stepping out of my comfort zone and immersing myself in the technologies I’d always been curious about but hadn’t truly explored.&lt;/p&gt;

&lt;p&gt;This challenge taught me more than just how to use serverless resources or write Terraform code. It taught me how to break down problems, how to keep learning even when things feel difficult, and how to build something real from scratch.&lt;/p&gt;

&lt;p&gt;Even with years of experience in IT, this project gave me a new perspective on what the cloud can do—and how far I’ve come in my own journey. It reminded me why I got into tech in the first place: to keep building, to keep learning, and to keep pushing forward.&lt;/p&gt;

&lt;p&gt;If you're someone who's cloud-curious, or even if you've been in tech a long time but want to expand your skills, I wholeheartedly recommend taking on the Cloud Resume Challenge. It might just change the way you see your career.&lt;/p&gt;

&lt;p&gt;Want to check out my project? &lt;a href="https://github.com/ph4n7om2000/cloudresumechallenge" rel="noopener noreferrer"&gt;https://github.com/ph4n7om2000/cloudresumechallenge&lt;/a&gt;&lt;/p&gt;

</description>
      <category>terraform</category>
      <category>azurefunctions</category>
      <category>cosmosdb</category>
      <category>html</category>
    </item>
    <item>
      <title>Proactive Web Application Monitoring and Automated Recovery with Selenium and Python</title>
      <dc:creator>Arosh Wijepala</dc:creator>
      <pubDate>Wed, 02 Apr 2025 09:53:30 +0000</pubDate>
      <link>https://dev.to/aroshw/proactive-web-application-monitoring-and-automated-recovery-with-selenium-and-python-2clj</link>
      <guid>https://dev.to/aroshw/proactive-web-application-monitoring-and-automated-recovery-with-selenium-and-python-2clj</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Ensuring the availability and reliability of critical web applications is a challenge for any organization. During my tenure at one of the largest education providers in the world, I encountered a recurring issue with a secure file transfer platform that frequently became unavailable due to database deadlocks. As part of my role, I researched, developed, and implemented an automated monitoring and remediation solution using Selenium and Python to address this challenge.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;The secure file transfer platform supported hundreds of concurrent users, making its uptime crucial. However, we repeatedly faced an issue where the application became unresponsive due to database deadlocks, causing the database connection to become unavailable. The only known workaround was restarting the SQL Server services, followed by restarting the application services—or vice versa, depending on the situation.&lt;/p&gt;

&lt;p&gt;A major issue was that we had no proactive way of detecting downtime. We only became aware of failures when users reported them. While working with vendor support for a long-term fix, we needed an interim solution that could monitor the application, detect downtime, and apply the necessary workarounds automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  Choosing a Solution
&lt;/h2&gt;

&lt;p&gt;Due to budget constraints, commercial monitoring solutions like New Relic were not an option. After thorough research, I determined that Selenium, a web automation framework, could be used to automate periodic login attempts and verify application availability. Selenium allowed us to interact with the web application just as a user would, making it an ideal choice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tools Used
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Python&lt;/strong&gt;: Scripting language for automation&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chrome Headless&lt;/strong&gt;: Chrome running without a graphical interface, driven from the command line&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Selenium&lt;/strong&gt;: Web automation framework&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PsTools&lt;/strong&gt;: &lt;code&gt;pskill.exe&lt;/code&gt; terminates service processes and &lt;code&gt;psService.exe&lt;/code&gt; starts or restarts services remotely, since both the database and the application services run on Windows Server.&lt;/p&gt;

&lt;h2&gt;
  
  
  Download the Script
&lt;/h2&gt;

&lt;p&gt;You can download the full monitoring script along with the required files from GitHub: &lt;a href="https://github.com/ph4n7om2000/sft_synthetic_monitor_fully_automated" rel="noopener noreferrer"&gt;🔗 Download from GitHub&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementing the Monitoring Function
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Importing modules&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;selenium&lt;/code&gt; module navigates web pages and interacts with web elements. &lt;code&gt;smtplib&lt;/code&gt; sends the notification emails, and &lt;code&gt;email.message&lt;/code&gt; builds them. &lt;code&gt;datetime&lt;/code&gt; timestamps each entry in the log file. &lt;code&gt;time&lt;/code&gt; pauses the script for a fixed interval before the next task. &lt;code&gt;os&lt;/code&gt; creates and deletes the &lt;code&gt;reboot_flag&lt;/code&gt; file, which controls the order of the service restarts. &lt;code&gt;subprocess&lt;/code&gt; runs the PsTools executables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import smtplib
from email import message
from datetime import datetime
import time
import os
import subprocess
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Setting up the log file&lt;/strong&gt;: The script opens a log file named &lt;code&gt;uat-runlog&lt;/code&gt; in append mode to record its execution logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;log_object = open('uat-runlog', 'a')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
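
&lt;p&gt;The logging calls later in the script all repeat the same timestamp formatting. A minimal sketch of how that could be factored into one helper, assuming a hypothetical function name &lt;code&gt;write_log&lt;/code&gt; that is not part of the original script and an in-memory buffer standing in for the real log file:&lt;/p&gt;

```python
from datetime import datetime
import io


def write_log(log_file, text, now=None):
    """Write one timestamped line to an already-open log file.

    write_log is a hypothetical helper, not part of the original script.
    The timestamp format matches the one used in monitor().
    """
    if now is None:
        now = datetime.now()
    line = text + " at: " + now.strftime("%m/%d/%Y, %H:%M:%S") + "\n"
    log_file.write(line)
    return line


if __name__ == "__main__":
    # An in-memory buffer stands in for open('uat-runlog', 'a').
    buffer = io.StringIO()
    write_log(buffer, "Login page loaded", datetime(2025, 4, 2, 9, 53, 30))
    print(buffer.getvalue(), end="")
```

&lt;p&gt;Passing the file object in, instead of relying on a global, also makes the helper easy to exercise without touching the disk.&lt;/p&gt;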



&lt;p&gt;&lt;strong&gt;Email notification&lt;/strong&gt;: The following code defines a function, &lt;code&gt;send_failure_email()&lt;/code&gt;, that sends an email notification when a login check fails, which means the web application is down. The function sets the email parameters, creates a message object with &lt;code&gt;message.Message()&lt;/code&gt;, and sets its headers and payload.&lt;/p&gt;

&lt;p&gt;The function then connects to the SMTP server with &lt;code&gt;smtplib.SMTP()&lt;/code&gt;, passing the server address and port number. It calls &lt;code&gt;ehlo()&lt;/code&gt; twice: once to identify the client, and again after &lt;code&gt;starttls()&lt;/code&gt; has switched the connection to TLS. It then logs in with &lt;code&gt;login()&lt;/code&gt;, passing &lt;code&gt;from_addr&lt;/code&gt; and &lt;code&gt;yoursmtppassword&lt;/code&gt; as arguments.&lt;/p&gt;

&lt;p&gt;Finally, the function sends the message to &lt;code&gt;to_addr&lt;/code&gt; and &lt;code&gt;to_addr2&lt;/code&gt; with one &lt;code&gt;server.send_message()&lt;/code&gt; call per recipient, passing the message object, &lt;code&gt;from_addr&lt;/code&gt;, and &lt;code&gt;to_addrs&lt;/code&gt; as arguments.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def send_failure_email():
    # Placeholder addresses; replace with your own.
    from_addr = 'sftalert@yourdomain.com'
    to_addr = 'team1@yourdomain.com'
    to_addr2 = 'team2@yourdomain.com'
    subject = 'SFT Alert!'
    body = 'Login check has failed. Check application availability ASAP!'
    # Build the message headers and body.
    msg = message.Message()
    msg.add_header('from', from_addr)
    msg.add_header('to', to_addr)
    msg.add_header('subject', subject)
    msg.set_payload(body)
    # Connect, upgrade the connection to TLS, then authenticate.
    server = smtplib.SMTP('smtp.yoursmtpserver.com', 587)
    server.ehlo()
    server.starttls()
    server.ehlo()
    server.login(from_addr, 'yoursmtppassword')
    # Send one copy to each team.
    server.send_message(msg, from_addr=from_addr, to_addrs=[to_addr])
    server.send_message(msg, from_addr=from_addr, to_addrs=[to_addr2])
    # Close the SMTP session cleanly.
    server.quit()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
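
&lt;p&gt;If sending two separate copies is not a requirement, the alert can also be built once and addressed to both teams. A minimal sketch, assuming a hypothetical helper name &lt;code&gt;build_failure_message&lt;/code&gt; and using the standard library's &lt;code&gt;EmailMessage&lt;/code&gt; class in place of &lt;code&gt;message.Message()&lt;/code&gt;; nothing is actually sent here:&lt;/p&gt;

```python
from email.message import EmailMessage


def build_failure_message(from_addr, recipients):
    """Build the alert once, addressed to every recipient.

    build_failure_message is a hypothetical helper; the addresses passed
    in below are the placeholder addresses from the article.
    """
    msg = EmailMessage()
    msg["From"] = from_addr
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = "SFT Alert!"
    msg.set_content("Login check has failed. Check application availability ASAP!")
    return msg


if __name__ == "__main__":
    alert = build_failure_message(
        "sftalert@yourdomain.com",
        ["team1@yourdomain.com", "team2@yourdomain.com"],
    )
    # Nothing is sent here; the message is only printed for inspection.
    print(alert)
```

&lt;p&gt;With the headers filled in this way, a single &lt;code&gt;server.send_message(msg)&lt;/code&gt; call should be enough, because &lt;code&gt;smtplib&lt;/code&gt; falls back to the addresses in the message headers when &lt;code&gt;from_addr&lt;/code&gt; and &lt;code&gt;to_addrs&lt;/code&gt; are omitted.&lt;/p&gt;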



&lt;p&gt;&lt;strong&gt;Monitoring&lt;/strong&gt;: The code snippet below outlines the fundamental purpose of the script - to log in to the web application, click and verify the loading of specific elements.&lt;/p&gt;

&lt;p&gt;First, the code sets up the options and configuration for ChromeDriver, using the Selenium WebDriver library to control the Chrome browser.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;options.add_argument("--headless")&lt;/code&gt;: This line sets the "headless" option to run the Chrome browser in headless mode, meaning the browser will run without a graphical user interface. &lt;/p&gt;

&lt;p&gt;&lt;code&gt;options.add_argument("--no-sandbox")&lt;/code&gt;: This line disables the Chrome sandbox, a security feature that isolates browser processes from each other and from the rest of the system. Headless Chrome sometimes needs this flag to start under a service account, but it lowers security, so it is best kept to trusted internal sites like this one.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;s = Service("chromedriver")&lt;/code&gt;: This line creates an instance of the Service class which specifies the path to the ChromeDriver executable.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;url = "https://uat-sft.yourdomain.com/"&lt;/code&gt; line sets the URL of the web application that the script will be interacting with. &lt;code&gt;driver.get(url)&lt;/code&gt; line instructs the Chrome browser to navigate to the specified URL.&lt;/p&gt;

&lt;p&gt;The next part of the code is a try-except block that attempts to log in to the web application and perform certain checks. If the login and checks succeed, the code logs its activity and returns &lt;code&gt;up&lt;/code&gt;. If they fail, the script catches the exception, logs the failure, sends a failure email using the &lt;code&gt;send_failure_email()&lt;/code&gt; function, and returns &lt;code&gt;down&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Here is a breakdown of the code:&lt;/p&gt;

&lt;p&gt;The script first waits up to 10 seconds for the HTML username element to appear, using &lt;code&gt;EC.presence_of_element_located((By.ID, "username"))&lt;/code&gt;, and writes a log entry if the login page loaded successfully.&lt;/p&gt;

&lt;p&gt;The script then finds the HTML elements &lt;code&gt;username&lt;/code&gt; and &lt;code&gt;password&lt;/code&gt;, enters the login credentials, and clicks the sign-in button with the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;driver.find_element(By.ID, "username").send_keys("platformuser@yourdomain.com")
driver.find_element(By.ID, "password").send_keys("platformuserpassword")
driver.find_element(By.ID, "signinButton").click()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After a successful login, the first page contains an HTML element called &lt;code&gt;compose-delivery-link&lt;/code&gt;, the compose button for a secure delivery. The script waits up to 10 seconds for it to appear, using &lt;code&gt;EC.presence_of_element_located((By.ID, "compose-delivery-link"))&lt;/code&gt;, and writes a log entry if the login succeeded.&lt;/p&gt;

&lt;p&gt;The script then clicks the compose button and waits up to 10 seconds for the &lt;code&gt;divSecureMessage&lt;/code&gt; element, which is the body of the secure message window. If the checks pass, it writes a log entry, logs out by calling &lt;code&gt;driver.get(logouturl)&lt;/code&gt;, and logs the successful logout.&lt;/p&gt;

&lt;p&gt;If the checks fail, the script catches the exception, logs the failure, sends a failure email using &lt;code&gt;send_failure_email()&lt;/code&gt;, and returns &lt;code&gt;down&lt;/code&gt;. If the checks succeed, the script creates a &lt;code&gt;reboot_flag&lt;/code&gt; file if one does not exist and returns &lt;code&gt;up&lt;/code&gt;. Finally, the script closes the web driver and the log file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def monitor():
    options = Options()
    options.add_argument("--headless")
    options.add_argument("--no-sandbox")
    s = Service("chromedriver")
    url = "https://uat-sft.yourdomain.com/"
    driver = webdriver.Chrome(options=options, service=s)
    driver.get(url)
    try:
        usernameelement = WebDriverWait(driver, 10).until(
                EC.presence_of_element_located((By.ID, "username"))
            )

        time.sleep(10)
        if usernameelement.is_displayed() == True:
                print ("Login page loaded!")
                now = datetime.now()
                log_object.write("Login page loaded at: " + now.strftime("%m/%d/%Y, %H:%M:%S") + "\n")

        driver.find_element(By.ID, "username").send_keys("platformuser@yourdomain.com")
        driver.find_element(By.ID, "password").send_keys("platformuserpassword")
        now = datetime.now()
        log_object.write("Attempting login at: " + now.strftime("%m/%d/%Y, %H:%M:%S") + "\n")
        driver.find_element(By.ID, "signinButton").click()

        time.sleep(10)
        composebuttonelement = WebDriverWait(driver, 10).until(
                EC.presence_of_element_located((By.ID, "compose-delivery-link"))
            )
        if composebuttonelement.is_displayed() == True:
                now = datetime.now()
                log_object.write("Successfully logged in at: " + now.strftime("%m/%d/%Y, %H:%M:%S") + "\n")

        now = datetime.now()
        log_object.write("Opening compose delivery page at: " + now.strftime("%m/%d/%Y, %H:%M:%S") + "\n")
        driver.find_element(By.ID, "compose-delivery-link").click()

        time.sleep(10)
        divSecureMessageelement = WebDriverWait(driver, 10).until(
                EC.presence_of_element_located((By.ID, "divSecureMessage"))
            )
        if divSecureMessageelement.is_displayed() == True:
                now = datetime.now()
                log_object.write("Compose delivery page loaded at: " + now.strftime("%m/%d/%Y, %H:%M:%S") + "\n")
                now = datetime.now()
                log_object.write("All checks passed at: " + now.strftime("%m/%d/%Y, %H:%M:%S") + "\n")

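        logouturl = "https://uat-sft.yourdomain.com/bds/Logout.do"
        # Log out first, then record it, so the log reflects what happened.
        driver.get(logouturl)
        now = datetime.now()
        log_object.write("Successfully logged out at: " + now.strftime("%m/%d/%Y, %H:%M:%S") + "\n")
        log_object.write("------------------------------------------------\n")
        log_object.close()
        # Recreate the flag so the next failure restarts SQL services first.
        if not os.path.exists('reboot_flag'):
            open('reboot_flag', 'x').close()
        # quit() ends the whole browser session, not just the current window.
        driver.quit()
        return "up"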

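    except Exception:
        now = datetime.now()
        log_object.write("SFT health check failed at: " + now.strftime("%m/%d/%Y, %H:%M:%S") + "\n")
        # Flush so the failure entry reaches disk while the file stays open.
        log_object.flush()
        # Release the browser on the failure path as well.
        driver.quit()
        send_failure_email()
        return "down"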
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
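
&lt;p&gt;The health check above repeats the same two lines for every log entry. As a minimal sketch, the pattern could be wrapped in a small helper. &lt;code&gt;log_event&lt;/code&gt;, its parameters, and &lt;code&gt;TIMESTAMP_FORMAT&lt;/code&gt; are hypothetical names that are not part of the original script, and an in-memory file stands in for &lt;code&gt;log_object&lt;/code&gt;.&lt;/p&gt;

```python
from datetime import datetime
import io

TIMESTAMP_FORMAT = "%m/%d/%Y, %H:%M:%S"  # same format as the log lines above


def log_event(log_file, message, when=None):
    """Write 'message at: timestamp' to log_file and return the line written.

    log_event and its parameters are hypothetical names for this sketch.
    """
    if when is None:
        when = datetime.now()  # take the timestamp at write time
    line = message + " at: " + when.strftime(TIMESTAMP_FORMAT) + "\n"
    log_file.write(line)
    return line


# Usage with an in-memory file standing in for log_object
demo_log = io.StringIO()
log_event(demo_log, "Attempting login", when=datetime(2025, 4, 14, 11, 59, 18))
print(demo_log.getvalue(), end="")  # Attempting login at: 04/14/2025, 11:59:18
```

&lt;p&gt;Because the timestamp is taken inside the helper, each entry records the moment it was written rather than the moment a shared &lt;code&gt;now&lt;/code&gt; variable was last set.&lt;/p&gt;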



&lt;p&gt;&lt;strong&gt;Corrective Actions&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;In the following code, &lt;code&gt;appstatus = monitor()&lt;/code&gt; assigns the return value of the &lt;code&gt;monitor()&lt;/code&gt; function to &lt;code&gt;appstatus&lt;/code&gt;. &lt;code&gt;monitor()&lt;/code&gt; checks the status of the application and returns &lt;code&gt;up&lt;/code&gt; or &lt;code&gt;down&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If &lt;code&gt;appstatus&lt;/code&gt; is &lt;code&gt;down&lt;/code&gt;, the code checks whether a file called &lt;code&gt;reboot_flag&lt;/code&gt; exists. If it does, the code restarts the SQL services using &lt;code&gt;subprocess.call(['psService.exe', mssql_arguments])&lt;/code&gt;, then restarts the Tomcat server, first killing its process with &lt;code&gt;subprocess.call(['pskill.exe', tomcat_kill_arguments])&lt;/code&gt;. We use &lt;code&gt;pskill.exe&lt;/code&gt; instead of stopping the service with &lt;code&gt;psService.exe&lt;/code&gt; because the Tomcat service takes a significant amount of time to shut down gracefully. To avoid that wait, we forcefully kill the process and then start the service again with &lt;code&gt;subprocess.call(['psService.exe', tomcat_start_arguments])&lt;/code&gt;. Finally, the code deletes the &lt;code&gt;reboot_flag&lt;/code&gt; file using &lt;code&gt;os.remove("reboot_flag")&lt;/code&gt;.&lt;/p&gt;
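
&lt;p&gt;The calls described above pass the host, command, and service name to each tool as one string. A possible variation, shown here only as a sketch, builds each command as a list with one token per element. It relies on two Python facts: a list element that contains spaces is handed to the program as a single quoted argument, and &lt;code&gt;'\\'&lt;/code&gt; in a regular string literal is one backslash. Whether your PsTools setup needs this form is an assumption to verify; &lt;code&gt;psservice_command&lt;/code&gt; and &lt;code&gt;pskill_command&lt;/code&gt; are hypothetical helper names, and nothing is executed here.&lt;/p&gt;

```python
def psservice_command(host, action, service):
    """Build the argument list for a PsService call against a remote host."""
    # Each token is its own list element, so subprocess passes them as separate arguments.
    # "\\\\" is two literal backslashes, the prefix PsTools uses for a remote computer name.
    return ["psService.exe", "\\\\" + host, action, service]


def pskill_command(host, process_name):
    """Build the argument list for a PsKill call against a remote host."""
    return ["pskill.exe", "\\\\" + host, process_name]


restart_sql = psservice_command("sqlserver.yourdomain.com", "restart", "mssqlserver")
kill_tomcat = pskill_command("applicationserver.yourdomain.com", "Tomcat9")
start_tomcat = psservice_command("applicationserver.yourdomain.com", "start", "Tomcat9")

for command in (restart_sql, kill_tomcat, start_tomcat):
    # On the monitoring host, each list could be passed to subprocess.call(command)
    print(command)
```

&lt;p&gt;If the single-string form already works in your environment, there is no need to change it; the list form simply makes each argument explicit.&lt;/p&gt;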

&lt;p&gt;Once the &lt;code&gt;reboot_flag&lt;/code&gt; file has been deleted, the code starts again from the beginning and checks whether the application is up and running. If the check still fails, the code returns to the &lt;code&gt;os.path.exists('reboot_flag') == True&lt;/code&gt; test. Because the file no longer exists, the &lt;code&gt;else&lt;/code&gt; branch runs: it restarts the application services first and then the SQL services, and then creates the &lt;code&gt;reboot_flag&lt;/code&gt; file again. This is how the reboot flag is used to alternate the service restart sequence.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;appstatus = monitor()

if appstatus == "down":
    # Take a fresh timestamp for every log line so entries written after a sleep show the actual time
    if os.path.exists('reboot_flag') == True:
        # Flag present: restart SQL first, then the application
        log_object.write("Initiated SQL services restart at: " + datetime.now().strftime("%m/%d/%Y, %H:%M:%S") + "\n")
        mssql_arguments = '\\sqlserver.yourdomain.com restart mssqlserver'
        subprocess.call(['psService.exe', mssql_arguments])
        time.sleep(60)
        log_object.write("SQL services have been restarted at: " + datetime.now().strftime("%m/%d/%Y, %H:%M:%S") + "\n")

        log_object.write("Initiated application services restart at: " + datetime.now().strftime("%m/%d/%Y, %H:%M:%S") + "\n")
        tomcat_kill_arguments = '\\applicationserver.yourdomain.com Tomcat9'
        subprocess.call(['pskill.exe', tomcat_kill_arguments])
        time.sleep(60)

        tomcat_start_arguments = '\\applicationserver.yourdomain.com start Tomcat9'
        subprocess.call(['psService.exe', tomcat_start_arguments])
        log_object.write("Application services have been restarted at: " + datetime.now().strftime("%m/%d/%Y, %H:%M:%S") + "\n")
        os.remove("reboot_flag")
        log_object.close()

    else:
        # Flag absent: restart the application first, then SQL
        log_object.write("Initiated application services restart at: " + datetime.now().strftime("%m/%d/%Y, %H:%M:%S") + "\n")
        tomcat_kill_arguments = '\\applicationserver.yourdomain.com Tomcat9'
        subprocess.call(['pskill.exe', tomcat_kill_arguments])
        time.sleep(60)

        tomcat_start_arguments = '\\applicationserver.yourdomain.com start Tomcat9'
        subprocess.call(['psService.exe', tomcat_start_arguments])
        time.sleep(60)
        log_object.write("Application services have been restarted at: " + datetime.now().strftime("%m/%d/%Y, %H:%M:%S") + "\n")

        log_object.write("Initiated SQL services restart at: " + datetime.now().strftime("%m/%d/%Y, %H:%M:%S") + "\n")
        mssql_arguments = '\\sqlserver.yourdomain.com restart mssqlserver'
        subprocess.call(['psService.exe', mssql_arguments])
        time.sleep(60)
        log_object.write("SQL services have been restarted at: " + datetime.now().strftime("%m/%d/%Y, %H:%M:%S") + "\n")
        # Create the flag and close the handle immediately
        open('reboot_flag', 'x').close()
        log_object.close()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
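
&lt;p&gt;The alternation driven by &lt;code&gt;reboot_flag&lt;/code&gt; can also be reduced to a small, testable sketch. The version below is an illustration under stated assumptions, not the original script: &lt;code&gt;recover&lt;/code&gt;, &lt;code&gt;restart_sql&lt;/code&gt;, and &lt;code&gt;restart_app&lt;/code&gt; are hypothetical names, and the two callables stand in for the &lt;code&gt;psService.exe&lt;/code&gt; and &lt;code&gt;pskill.exe&lt;/code&gt; calls so that the ordering logic can be exercised without touching any service.&lt;/p&gt;

```python
import os
import tempfile


def recover(flag_path, restart_sql, restart_app):
    """Run the two restarts in an order chosen by the flag file, then flip the flag.

    recover, restart_sql and restart_app are hypothetical names; the callables
    stand in for the psService.exe and pskill.exe calls.
    """
    if os.path.exists(flag_path):
        for step in (restart_sql, restart_app):  # flag present: SQL first
            step()
        os.remove(flag_path)                     # the next failure tries the other order
        return "sql-first"
    for step in (restart_app, restart_sql):      # flag absent: application first
        step()
    open(flag_path, "x").close()                 # restore the flag for the next cycle
    return "app-first"


# Demo with recording stubs instead of real service restarts
calls = []
with tempfile.TemporaryDirectory() as workdir:
    flag = os.path.join(workdir, "reboot_flag")
    open(flag, "x").close()                      # a healthy check leaves the flag in place
    first = recover(flag, lambda: calls.append("sql"), lambda: calls.append("app"))
    second = recover(flag, lambda: calls.append("sql"), lambda: calls.append("app"))

print(first, second, calls)  # sql-first app-first ['sql', 'app', 'app', 'sql']
```

&lt;p&gt;Two consecutive failures therefore exercise both restart orders, which matches the behaviour described above.&lt;/p&gt;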



&lt;p&gt;&lt;strong&gt;Flowchart&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnqm2v1n6quwb170oll97.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnqm2v1n6quwb170oll97.jpg" alt="Flowchart" width="800" height="553"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Through research and development, I was able to build a proactive monitoring solution using Selenium and Python that significantly reduced downtime. This automation eliminated the need for manual intervention, allowing engineers to focus on higher-priority tasks. Eventually, after months of investigation, the vendor provided a script to clean unnecessary data, permanently resolving the deadlock issue. However, during that time, our automation saved countless hours and prevented service disruptions.&lt;/p&gt;

&lt;p&gt;By leveraging Selenium, Python, and system administration tools, we successfully implemented an automated recovery system that ensured seamless application availability with minimal human intervention.&lt;/p&gt;

</description>
      <category>python</category>
      <category>selenium</category>
      <category>automation</category>
      <category>windows</category>
    </item>
  </channel>
</rss>
