Understanding the issue is always the first step in troubleshooting, and it helps to follow a step-by-step process rather than guessing. It doesn't matter whether you are a Software Developer, a DevOps Engineer, or an Architect: Unix/Linux is widely used, so you should know the common issues and the right approach to resolving them.
Let’s discuss a few of them:
Issue 1: Server is not reachable or unable to connect
Approach / Solution:
├── Ping the server by hostname and by IP address
│ ├── Hostname and IP address are both pingable
│ │ ├── The issue might be on the client side, as the server is reachable
│ ├── Hostname is not pingable but the IP address is
│ │ ├── Could be a DNS issue
│ │ │ ├── Check /etc/hosts
│ │ │ ├── Check /etc/resolv.conf
│ │ │ ├── Check /etc/nsswitch.conf
│ │ │ ├── (Optional) DNS can also be defined in /etc/sysconfig/network-scripts/ifcfg-<interface>
│ ├── Neither the hostname nor the IP address is pingable
│ │ ├── Ping another server on the same network to see whether the problem is network-wide
│ │ │ ├── False: The issue is not network-wide but specific to that host/server
│ │ │ ├── True: Might be a network-wide issue
│ │ ├── Log in to the server through the virtual console, if the server is powered on, and check the uptime
│ │ ├── Check that the server has an IP address and that the network interface is UP
│ │ │ ├── (Optional) Also check the IP-related settings in /etc/sysconfig/network-scripts/ifcfg-<interface>
│ │ ├── Ping the gateway and check the routes
│ │ ├── Check SELinux and the firewall rules
│ │ ├── Check the physical cable connection
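
The first branch of the diagram above can be sketched as a small script. This is a minimal illustration, not a complete tool: the function names (`can_ping`, `classify_reachability`, `diagnose`) are hypothetical, the `ping -c 1 -W 2` options assume the Linux iputils version of ping, and the "mismatch" case is an addition that the diagram does not cover.

```shell
#!/usr/bin/env bash
# Sketch: ping by hostname and by IP, then map the two results
# to the likely cause from the diagram.

# can_ping TARGET: succeeds if one ICMP echo is answered within 2 seconds.
can_ping() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# classify_reachability HOST_RC IP_RC: pure decision logic (0 = pingable).
classify_reachability() {
    local host_rc=$1 ip_rc=$2
    if [ "$host_rc" -eq 0 ] && [ "$ip_rc" -eq 0 ]; then
        echo "reachable: look at the client side"
    elif [ "$host_rc" -ne 0 ] && [ "$ip_rc" -eq 0 ]; then
        echo "dns: check /etc/hosts, /etc/resolv.conf, /etc/nsswitch.conf"
    elif [ "$host_rc" -eq 0 ]; then
        # Not in the diagram: the name answers but the given IP does not.
        echo "mismatch: hostname may resolve to a different IP"
    else
        echo "unreachable: check interface, gateway, routes, firewall, cable"
    fi
}

# diagnose HOSTNAME IP: run both pings and print the diagnosis.
diagnose() {
    local host_rc=0 ip_rc=0
    can_ping "$1" || host_rc=1
    can_ping "$2" || ip_rc=1
    classify_reachability "$host_rc" "$ip_rc"
}
```

A call such as `diagnose web01.example.com 192.0.2.10` (both values are placeholders) prints one line naming the branch to follow. For the later branches, `uptime`, `ip -br addr`, and `ip route` on the server console show the uptime, interface state, and routes.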
Issue 2: Unable to connect to a website or an application
Approach / Solution:
├── Ping the server by hostname and by IP address
│ ├── False: Follow the troubleshooting diagram above, "Server is not reachable or unable to connect"
│ ├── True: Check the service availability by using the telnet command with the service port
│ │ ├── True: Service is running
│ │ ├── False: Service is not reachable or not running
│ │ │ ├── Check the service status using systemctl or other commands
│ │ │ ├── Check the firewall/SELinux
│ │ │ ├── Check the service logs
│ │ │ ├── Check the service configuration
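
The port check and its follow-up steps could look roughly like the sketch below. Note the swap: instead of telnet, it uses bash's `/dev/tcp` redirection wrapped in `timeout`, because that gives an exit status a script can act on. The function names are hypothetical, and the `firewall-cmd` suggestion assumes the host runs firewalld.

```shell
#!/usr/bin/env bash
# port_open HOST PORT: succeeds if a TCP connection opens within 3 seconds.
# Uses bash's /dev/tcp redirection in place of telnet.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# service_next_steps UNIT: print the checks from the "False" branch.
service_next_steps() {
    echo "systemctl status $1"
    echo "journalctl -u $1 -n 50 --no-pager"
    echo "firewall-cmd --list-all"
    echo "getenforce"
}

# check_service HOST PORT UNIT: combine the port test with the next steps.
check_service() {
    if port_open "$1" "$2"; then
        echo "port $2 on $1 accepts connections: service is running"
    else
        echo "port $2 on $1 is not reachable; on the server, try:"
        service_next_steps "$3"
    fi
}
```

For example, `check_service 192.0.2.10 443 nginx` (placeholder address and unit name) either confirms the port is open or lists the commands to run on the server.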
Issue 3: Unable to ssh as root or any other user.
Approach / Solution:
├── Ping the server by hostname and by IP address
│ ├── False: Follow the troubleshooting diagram above, "Server is not reachable or unable to connect"
│ ├── True: Check the service availability by using the telnet command with the SSH port
│ │ ├── True: Service is running
│ │ │ ├── The issue might be on the client side
│ │ │ ├── The user might be disabled or have a no-login shell, root login might be disabled, or other configuration might block the login
│ │ ├── False: Service is not reachable or not running
│ │ │ ├── Check the service status using systemctl or other commands
│ │ │ ├── Check the firewall/SELinux
│ │ │ ├── Check the service logs
│ │ │ ├── Check the service configuration
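
When the SSH port answers but the login still fails, the account checks in the "True" branch can be scripted along these lines. This is a sketch under assumptions: the function names are hypothetical, and only the common `nologin` and `false` shells are treated as blocking.

```shell
#!/usr/bin/env bash
# user_shell USER: print the login shell from the passwd database.
user_shell() {
    getent passwd "$1" | cut -d: -f7
}

# shell_allows_login SHELL: fails for the usual "no login" shells.
shell_allows_login() {
    case "$1" in
        */nologin|*/false) return 1 ;;
        *) return 0 ;;
    esac
}

# check_ssh_user USER: report whether the account exists and can log in.
check_ssh_user() {
    local sh
    if ! getent passwd "$1" >/dev/null; then
        echo "$1: no such user"
        return 1
    fi
    sh=$(user_shell "$1")
    if shell_allows_login "$sh"; then
        echo "$1: shell $sh allows login"
    else
        echo "$1: shell $sh blocks login"
    fi
}

# Further checks, run as root on the server:
#   sshd -T | grep -i -E 'permitrootlogin|allowusers|denyusers'
#   passwd -S USER      # lock status; output format varies by distribution
```

On the client side, `ssh -v user@host` prints the negotiation step by step, which usually shows whether the failure is in key exchange, authentication, or the session itself.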
Issue 4: Disk space is full, or disk space needs to be added/extended
Approach / Solution:
├── Detect system performance degradation
│ ├── Applications become slow/unresponsive
│ ├── Commands fail to run (for example, because the / filesystem is full)
│ ├── Logging and other writes fail
├── Analyse the issue
│ ├── Use the df command to find the filesystem that is out of space
├── Action
│ ├── After finding the specific filesystem, use the du command in that filesystem to find which files/directories are large
│ ├── Compress/remove big files
│ ├── Move the items to another partition/server
│ ├── Check the health of the disks using the badblocks command (for example, # badblocks -v /dev/sda)
│ ├── Check which device is I/O bound using iostat (iotop or pidstat -d shows per-process I/O)
│ ├── Create a link to the file/directory
├── New disk addition
│ ├── Simple partition
│ │ ├── Add the disk to the VM
│ │ ├── Check that the new disk is visible with lsblk (df lists only mounted filesystems)
│ │ ├── Use fdisk to create the partition (an LVM partition is preferable)
│ │ ├── Create the filesystem and mount it
│ │ ├── Add an fstab entry so the mount persists across reboots
│ ├── LVM partition
│ │ ├── Add the disk to the VM
│ │ ├── Check that the new disk is visible with lsblk (df lists only mounted filesystems)
│ │ ├── Use fdisk to create an LVM partition
│ │ ├── Create the PV, VG, and LV
│ │ ├── Create the filesystem and mount it
│ │ ├── Add an fstab entry so the mount persists across reboots
│ ├── Extend LVM partition
│ │ ├── Add the disk and create an LVM partition
│ │ ├── Add the LVM partition (PV) to the existing VG
│ │ ├── Extend the LV and resize the filesystem
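
The df and du steps above can be sketched as two small helpers. The function names and the threshold are hypothetical. The LVM commands at the end are left as comments because they change disks and the device, VG, and LV names are placeholders.

```shell
#!/usr/bin/env bash
# over_threshold PCT: read `df -P` output on stdin and print
# "mountpoint use%" for filesystems at or above PCT percent.
# Mount points containing spaces are not handled in this sketch.
over_threshold() {
    awk -v pct="$1" 'NR > 1 {
        use = $5; sub(/%/, "", use)
        if (use + 0 >= pct + 0) print $6, $5
    }'
}

# largest_entries DIR N: the N largest lines of du output one level
# below DIR, in KiB, staying on DIR's filesystem (-x).
# DIR's own total sorts first.
largest_entries() {
    du -x -k -d 1 "$1" 2>/dev/null | sort -rn | head -n "$2"
}

# Extending an LVM volume after adding a disk (run as root; the names
# /dev/sdb1, vg_data, and lv_data are hypothetical):
#   pvcreate /dev/sdb1
#   vgextend vg_data /dev/sdb1
#   lvextend -r -l +100%FREE /dev/vg_data/lv_data   # -r also resizes the filesystem
```

Typical use would be `df -P | over_threshold 90` to list filesystems at 90% or more, followed by `largest_entries /var 10` on the mount point that shows up.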
Issue 5: Filesystem corrupted
Approach / Solution:
├── A corrupted filesystem is one of the errors that can leave the system unable to boot
├── Check /var/log/messages, dmesg, and other log files
├── If the logs report bad sectors, run fsck
│ ├── True:
│ │ ├── Reboot the system into rescue mode by booting it from the installation ISO (CD-ROM)
│ │ ├── Proceed with option 1, which mounts the original root filesystem under /mnt/sysimage
│ │ ├── Edit the fstab entries, or create a new fstab with the help of blkid, and reboot
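
Two parts of this procedure lend themselves to a short sketch: filtering the logs for suspicious messages and building an fstab line from blkid output. The function names are hypothetical, and the grep patterns are illustrative examples of kernel messages rather than an exhaustive list.

```shell
#!/usr/bin/env bash
# suspect_lines: filter log text on stdin for messages that often
# accompany filesystem or disk errors (illustrative patterns only).
suspect_lines() {
    grep -i -E 'ext4-fs error|xfs.*corrupt|i/o error|bad sector'
}

# fstab_entry UUID MOUNTPOINT FSTYPE [PASS]: print one fstab line.
# PASS is the fsck order field and defaults to 0 (no boot-time check).
fstab_entry() {
    printf 'UUID=%s %s %s defaults 0 %s\n' "$1" "$2" "$3" "${4:-0}"
}

# In rescue mode, after option 1 has mounted the system:
#   blkid                          # list UUIDs and filesystem types
#   vi /mnt/sysimage/etc/fstab     # fix or recreate the entries
```

For example, `dmesg | suspect_lines` narrows the kernel log, and `fstab_entry "$(blkid -s UUID -o value /dev/sda1)" / ext4 1` prints a candidate root entry, where `/dev/sda1` and `ext4` are placeholders. fsck itself should only be run against an unmounted filesystem.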