Day 1 - Fundamentals of Observability

GitHub - https://github.com/SUBHAM-NANDI/Observability-Zero-to-Hero/blob/main/Day01/Readme.md

What is Observability?

Observability is the ability to assess the internal state of a system based on the data it generates—primarily logs, metrics, and traces. This allows engineers to gain insights into system behavior and understand why things are happening, beyond just identifying that something is wrong.

Logs: Provide detailed records of events, helping explain why issues occur.
Metrics: Measure key performance indicators like CPU usage and memory to track what is happening.
Traces: Follow a request's path through different services, showing how issues develop.

What is Monitoring?

Monitoring focuses on keeping tabs on the performance of systems by tracking metrics and sending alerts when predefined thresholds are crossed. It answers what is happening in real time and ensures system health by proactively identifying potential problems.

Why Monitoring Matters?

Monitoring is essential for ensuring the availability, performance, and security of IT systems. By setting up monitoring, we can detect issues early and address them before they lead to downtime or significant performance degradation. The key purposes of monitoring include:

Early Problem Detection: Identify issues before they impact end users.
Measuring Performance: Keep track of key metrics to ensure optimal performance.
Ensuring Availability: Make sure systems are running smoothly.

Why Observability is Important?

While monitoring alerts us to potential issues, observability helps us diagnose the root cause and understand system behavior. It provides deeper insights, allowing us to answer the "why" behind issues. The goals of observability are:

Diagnosing Issues: Pinpoint the cause of performance bottlenecks or failures.
Understanding Behavior: Gain deeper insights into system processes.
Improving Systems: Use data to optimize performance and prevent future issues.

Monitoring vs. Observability: What's the Difference?

Although closely related, monitoring and observability focus on different aspects of system management.

Category	Monitoring	Observability
Focus	Checking if everything works	Understanding why things happen
Data	Collects metrics like CPU, memory, error rates	Collects logs, metrics, and traces for full visibility
Alerts	Sends notifications when things go wrong	Correlates events to identify root causes
Example	Alerts if CPU usage exceeds 90%	Helps trace slow performance across microservices

Does Observability Cover Monitoring?

Yes, monitoring is a subset of observability. Monitoring tracks specific metrics and generates alerts, while observability provides a more comprehensive understanding by collecting a broader range of data, including logs, metrics, and traces. Observability helps correlate data across the system, allowing for more effective root cause analysis.

What Can Be Monitored?

Common areas where monitoring is applied include:

Infrastructure: Track CPU usage, memory, disk I/O, and network traffic.
Applications: Monitor response times, error rates, and throughput.
Databases: Watch for query performance and transaction rates.
Network: Analyze latency, bandwidth, and packet loss.
Security: Track unauthorized access attempts and firewall logs.

What Can Be Observed?

Observability looks deeper into system behavior:

Logs: Capture detailed event records within the system.
Metrics: Quantitative data points like CPU load and request counts.
Traces: Track the flow of requests across multiple services.

Monitoring Bare-Metal vs. Kubernetes

Monitoring can vary based on the environment.

Bare-Metal Servers: Easier access to hardware metrics and logs with a simpler infrastructure.
Kubernetes: More challenging due to its dynamic nature, requiring advanced tools to track ephemeral containers and distributed systems.

Observing Bare-Metal vs. Kubernetes

Bare-Metal Servers: Fewer components make it easier to collect and correlate data.
Kubernetes: Requires sophisticated tools to track the flow of requests through containers and microservices. Multiple observability tools are needed to gain complete insights.

Tools for Monitoring and Observability

Here are some common tools used in each domain:

Monitoring Tools: Prometheus, Grafana, Nagios, Zabbix, PRTG.
Observability Tools: ELK Stack (Elasticsearch, Logstash, Kibana), EFK Stack (Elasticsearch, FluentBit, Kibana), Splunk, Jaeger, Zipkin, New Relic, Dynatrace, Datadog.