Episode Followup – Observability 2.0 – More Than Just Logs, Metrics & Traces

Join us as Neel explores how observability is evolving beyond traditional logs, metrics, and traces into a predictive, AI-powered discipline.

Neel walks through the evolution of observability, demonstrating how OpenTelemetry, machine learning, and LLMs are transforming how we monitor and maintain modern applications. You’ll learn about dynamic sampling techniques that reduce costs while maintaining visibility, how ML algorithms detect anomalies before they cause outages, and practical implementations using tools like the OpenTelemetry Collector. This episode covers real-world scenarios, from reducing massive log volumes to predicting system failures before they impact customers.

Timestamps

  • 0:00 Welcome & Introduction
  • 4:29 Neel’s Background & Community Work
  • 5:03 The Evolution of Observability
  • 6:29 The 2 AM Production Incident Scenario
  • 8:13 OpenTelemetry’s Role in Modern Observability
  • 12:45 Dynamic Sampling Techniques
  • 18:22 ML & AI in Anomaly Detection
  • 24:16 LLM Observability Explained
  • 28:32 Cost Optimization Strategies
  • 30:04 Context Windows & Token Management
  • 32:00 Self-Healing Systems Discussion
  • 34:15 Edge Cases: When Dynamic Sampling Doesn’t Work
  • 36:27 Wrap-up & Resources

How to find Neel:

https://www.linkedin.com/in/neelcshah/

https://bento.me/neelshah

Links from the show:

https://neelshah.dev/blogs/observability-2

https://opentelemetry.io/

https://middleware.io/blog/observability-2-0/