Splunk : The Five Tenets of Observability

January 25, 2022 at 12:17 pm EST

By Greg Leffler January 25, 2022

A new year is a chance for a fresh start, and it's a great opportunity to think about the monitoring and observability platform you're using for your applications. If you've been using a legacy monitoring system, you've probably heard about observability all over the 'net and want to figure out whether it's really something you need to care about.

In this post, I'll briefly explain what observability is, what a system needs to actually provide you with true observability, and how you can start the observability journey yourself.

Observability is a mindset that lets you answer questions about your business - from the user's experience, through the application itself, and beyond to the business metrics and processes that the application enables. It's an evolution of monitoring that greatly expands the volume of ingested data and radically expands the number and type of questions you can answer. It's not just "metrics, traces, and logs" - observability is really about instrumenting everything and using this data to make better decisions. I wrote more about this in a different post, Observability: It's Not What You Think, that I'd encourage you to check out for an observability deep-dive.

Before I came to work at Splunk, I was an SRE (well, a systems admin at one of my jobs, but I'm old). I know first-hand how important enterprise-grade observability is, because there are plenty of problems I solved in the past that I wish I could have dug into with an observability system like the one we sell at Splunk. In the rest of this post, I'm going to discuss five things that an observability system must do to be worth your investment, and I'm also going to give some examples from my experience in operations of why these are critical.

What Differentiates One Observability Product from Another?

Every vendor will tell you that by buying their product and installing it you instantly 'get' observability, and in every case, including buying the product from us, this isn't true. What you get out of the box varies a lot, however. When you're thinking about what an observability solution will get you, you need to think of a few things that aren't necessarily going to be published on the website or discussed in reviews. In the next section, I'll discuss what I've found to be the five key tenets for an observability system. These apply to any system - commercial or homegrown - and make a real difference in how you can get value from an observability migration.

The Five Key Tenets of Observability

When evaluating an observability system, look for these five key tenets: full-stack, end-to-end visibility; real-time answers; analytics-powered insight; enterprise-grade scale and features; and open standards. Let's dive into each of these in more detail:

Full Stack and End-to-End

Adopting an observability platform that can't give you 100% visibility into all your transactions, from the user's browser, through your application, to the underlying business platform, is setting yourself up to miss something critical. This includes support for things like RUM (real user monitoring) to determine user browser behavior, but it also includes avoiding sampling - read this post to learn why sampling is an antipattern in observability. In addition to the user's experience, you'll also need insight into backend performance, including things like database query performance or code profiling.

I can't count the number of issues I had to troubleshoot at LinkedIn brought on by someone important firing off a bug report to the sre@ email list - at that point, you simply have to figure out what happened and fix it. If our tools at LinkedIn hadn't been able to see the end-to-end history for all our users, I may not have been able to fix those issues at all, or it would have taken much longer than necessary.
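To put a rough number on the sampling point, here's a back-of-the-envelope sketch. It assumes uniform, independent head sampling, which is my simplification (real samplers vary), and the function name is purely illustrative. It estimates how likely it is that a sampled system keeps no trace at all of a problem that only touched a handful of requests.

```python
def probability_all_missed(sample_rate: float, affected_requests: int) -> float:
    """Chance that none of the affected requests are kept.

    Assumes each request is kept independently with probability
    `sample_rate` (an illustrative model, not any vendor's sampler).
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    if affected_requests < 0:
        raise ValueError("affected_requests must not be negative")
    # Each affected request is dropped with probability (1 - sample_rate).
    return (1.0 - sample_rate) ** affected_requests


if __name__ == "__main__":
    # One important user hits a bug on five requests and files a report.
    for rate in (0.01, 0.10, 1.00):
        missed = probability_all_missed(rate, affected_requests=5)
        print(f"sample rate {rate:.0%}: {missed:.1%} chance of keeping no trace")
```

The smaller the blast radius of a problem, the more likely a sampled system is to have thrown away exactly the transactions you need when that bug report arrives.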

Real Time

A good observability platform must give you insights and data in real time. If you have to wait for a periodic alert rollup to find out about a problem, you're likely to hear about it first from an angry tweet or an unhappy customer. Additionally, in a serverless world, the lifetime of a function can be in the hundreds of milliseconds (or less), so it's critical that your platform is able to show you issues as quickly as possible.

In one of my early tech jobs, we found out about a problem via a phone call from the CTO before any of our alerting told us there was a problem. While he was explaining the issue, alerting started to fire, but by that point, the issue had already been happening for close to 15 minutes. The timing of the problem was unlucky for us, but this could easily happen to anyone.

Analytics-Powered

The volume of data generated by an observability system is astronomical. There's no way around it - you need something to help you make sense of this data and to suggest things that matter. An observability platform has to make problems easier to solve, not more difficult. Just instrumenting and adding tons of data into a system with no way for it to surface important things is going to make your problems worse.

Adding more information to an observability system can backfire on you without a way to analyze it. In one of my past jobs, nearly every service ran in a JVM, so of course, it made sense to collect JVM memory statistics and to then alert on excessive memory usage, GC pause time, and things like that. What we didn't anticipate when adding these metrics was how many events would be generated by small problems in one application. The alerting tool had no deduplication, and there were thousands of events to manually clear every time the workload changed enough to alter memory allocation patterns in one app. These patterns didn't have any user impact; the app was just behaving differently to us. A good analytics tool would have at least deduplicated these, and at best would have indicated that these aren't impacting any customer-facing metrics so aren't worth a real-time investigation.
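For a sense of what even the most basic deduplication buys you, here's a minimal sketch. Everything in it is hypothetical: the `AlertEvent` and `AlertGroup` types, the `(service, check)` grouping key, and the five-minute window are assumptions I picked for illustration, not how any particular alerting tool works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertEvent:
    service: str
    check: str
    timestamp: float  # seconds since some epoch


@dataclass
class AlertGroup:
    service: str
    check: str
    first_seen: float
    last_seen: float
    count: int = 1


def deduplicate(events, window_seconds=300.0):
    """Collapse repeats of the same (service, check) into one group.

    An event joins the most recent group for its key if it arrives within
    `window_seconds` of that group's last event; otherwise it starts a new one.
    """
    groups = []
    latest = {}  # (service, check) -> most recent AlertGroup for that key
    for event in sorted(events, key=lambda e: e.timestamp):
        key = (event.service, event.check)
        group = latest.get(key)
        if group is not None and event.timestamp - group.last_seen <= window_seconds:
            group.last_seen = event.timestamp
            group.count += 1
        else:
            group = AlertGroup(event.service, event.check,
                               event.timestamp, event.timestamp)
            groups.append(group)
            latest[key] = group
    return groups


if __name__ == "__main__":
    # A burst of GC-pause alerts from one app, one per second, plus one other alert.
    burst = [AlertEvent("checkout", "jvm_gc_pause", float(t)) for t in range(1000)]
    burst.append(AlertEvent("search", "jvm_heap_used", 10.0))
    for g in deduplicate(burst):
        print(f"{g.service}/{g.check}: {g.count} event(s)")
```

A thousand events to clear by hand become one line to look at. Deciding whether that one line matters to customers is the harder, analytics-powered part, and this sketch doesn't attempt it.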

Enterprise-Grade

Yes, I know that we're dealing with buzzword city whenever anyone says "enterprise", but a robust observability system has to do many things that go beyond simple monitoring. Your system will probably need to operate across multiple clouds eventually (and probably a few on-premises systems). You'll start to rely on it, so it needs to keep running no matter how much you grow and no matter how many services you have. Eventually, as you get even larger, true 'enterprise' features like RBAC, access tokens, and accounting will be needed. The worst outcome would be needing these features and finding they aren't available, forcing an unnecessary and time-consuming switch of observability tools.

Open Standards

OpenTelemetry is the future of observability. This is primarily because instrumentation is challenging work. To get the benefits of observability, you have to instrument all of your applications, but ideally, you would want to instrument only once and then observe from anywhere. OpenTelemetry enables this. Without an open standard, time spent instrumenting your environment is time and effort spent on work that you'll almost certainly have to do again at some point in the future. With OpenTelemetry, you can easily change observability platforms if the need arises. You also have full control over what data is sent where, for enhanced customer privacy and possibly enhanced performance of your observability system.
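The "instrument once, observe from anywhere" idea can be sketched with a toy example. To be clear, this is not the OpenTelemetry API: `Telemetry`, `drop_attributes`, and `handle_checkout` are hypothetical names using only the Python standard library, meant to show the shape of the decoupling rather than a real SDK.

```python
from typing import Any, Callable, Dict, Iterable, List

Span = Dict[str, Any]
Exporter = Callable[[Span], None]


class Telemetry:
    """Toy vendor-neutral instrumentation layer (NOT the OpenTelemetry API)."""

    def __init__(self, exporters: List[Exporter]):
        self._exporters = list(exporters)

    def record_span(self, name: str, duration_ms: float, **attributes: Any) -> None:
        span = {"name": name, "duration_ms": duration_ms, "attributes": attributes}
        for export in self._exporters:
            export(span)


def drop_attributes(exporter: Exporter, blocked: Iterable[str]) -> Exporter:
    """Wrap an exporter so the listed attributes never reach that backend."""
    blocked_set = set(blocked)

    def filtered(span: Span) -> None:
        kept = {k: v for k, v in span["attributes"].items() if k not in blocked_set}
        exporter({**span, "attributes": kept})

    return filtered


def handle_checkout(telemetry: Telemetry) -> None:
    # Application code is instrumented once and knows nothing about backends.
    telemetry.record_span("checkout", 42.0,
                          user_email="a@example.com", cart_items=3)


if __name__ == "__main__":
    in_house: List[Span] = []      # stand-in for a backend you run yourself
    third_party: List[Span] = []   # stand-in for an external platform
    telemetry = Telemetry([
        in_house.append,
        drop_attributes(third_party.append, ["user_email"]),
    ])
    handle_checkout(telemetry)
    print(in_house[0]["attributes"])
    print(third_party[0]["attributes"])
```

Swapping or adding a backend only changes the list passed to `Telemetry`; `handle_checkout` never has to be touched. The same seam is where you decide which attributes leave your environment.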

What This Means for You

To start your observability journey, you want to make sure that whatever platform you're choosing can deliver on these five key tenets. Splunk Observability Cloud is built to deliver on these, in addition to providing a single place to view your entire operation, from an on-premise monolith to a globally distributed Kubernetes world, observability-as-code support through Terraform, and more.

You can start a free trial with no credit card required and experience it for yourself, or check out a demo on the product overview page.

Disclaimer

Splunk Inc. published this content on 25 January 2022 and is solely responsible for the information contained therein. Distributed by Public, unedited and unaltered, on 25 January 2022 17:16:05 UTC.
