DEV Community

Mandy Moore for New Relic


Cats and Clouds – There Are No Pillars in Observability with Yoshi Yamaguchi

Jonan Scheffler talks to Developer Advocate at Google Cloud, Yoshi Yamaguchi, about the way OpenTelemetry has brought together metrics and traces, profiling data, logging, and the importance of always enjoying the technology that you work on.

Should you find a burning need to share your thoughts or rants about the show, please spray them at devrel@newrelic.com. While you're going to all the trouble of shipping us some bytes, please consider taking a moment to let us know what you'd like to hear on the show in the future. Despite the all-caps flaming you will receive in response, please know that we are sincerely interested in your feedback; we aim to appease. Follow us on the Twitters: @ObservyMcObserv.


Jonan Scheffler: Hello and welcome back to Observy McObservface, proudly brought to you by New Relic's Developer Relations team, The Relicans. Observy is about observability in something a bit more than the traditional sense. It's often about technology and tools that we use to gain visibility into our systems. But it is also about people because, fundamentally, software is about people. You can think of Observy as something of an observability variety show where we will apply systems thinking and think critically about challenges across our entire industry, and we very much look forward to having you join us. You can find the show notes for this episode along with all of The Relicans podcasts on developer.newrelic.com/podcasts. We're so pleased to have you here this week. Enjoy the show.

Hello and welcome back to Observy McObservface. I'm Jonan. And I'm joined today by my friend, Yoshi. How are you, Yoshi?

Yoshi Yamaguchi: Hi, how are you, Jonan? I'm really excited to talk about observability stuff with you.

Jonan: I'm pretty excited to hear about observability stuff. You have a lot to say about this given your role. Well, speaking of, what is it that you do, Yoshi?

Yoshi: I'm a Developer Advocate for Google Cloud on the Developer Relations team there, focusing especially on observability and SRE practices. I think that's the reason why I was invited to your show.

Jonan: I think so.

Yoshi: Or maybe it's because I'm a Japanese developer, [laughs] and you'd like to talk about some Japanese stuff.

Jonan: [laughs] I like to talk to Japanese developers very much.

Yoshi: [laughs]

Jonan: I am excited about all of those things. I've heard of Google. They're quite large. And I've been playing with GCP for a long time. I got into using Google Cloud stuff because I was playing with a device, the Google Home device. I was trying to put together a Google Action to hook things up. I had one that would tell me at any moment how popular cats were on the internet by checking Twitter. I could ask it for the cat popularity score, and it would tell me how many of the last 10,000 tweets talked about cats. It was pretty important software.

Yoshi: [laughs] Interesting. I was on the Google Assistant Developer Relations team before joining the Google Cloud Developer Relations team. So I'm glad that you enjoyed Google Assistant.

Jonan: Oh, cool. So you worked on Google Assistant for how long? How long have you been at Google?

Yoshi: I've been at Google for more than ten years.

Jonan: Wow.

Yoshi: This year is my eleventh year. I've been on the Google Cloud Developer Relations team for almost three years. Before that, I was on the Partner Developer Relations team, which is kind of difficult to explain. That team handles all the client-side technologies, such as Android, Chrome, Chromecast, Google Assistant, and blah, blah, blah. We focused on creating new use cases for the latest technologies and features released by Google.

Before the public release, we worked closely with big companies or startups that were willing to use Early Access Program (EAP) technologies. Then, when the feature or technology launched publicly, we showcased those big integrations with those companies.

Jonan: So you would work with...like, if you were working on Google Assistant there, then you would work with Jonan's Cat Company, Jonan Cat Corporate to build this important integration, and then we would launch.

Yoshi: Yes.

Jonan: I see. Okay. So in that sense, Partner Developer Relations is the name of the team.

Yoshi: Yes.

Jonan: Okay. And now you're not doing so much work with partners, like the Early Access Program stuff. What sort of stuff do you do now?

Yoshi: For example, I give talks at developer conferences about our latest technologies, or technologies relevant to Google or to cloud in general. I also give talks about specific Google Cloud technologies to our customers in private sessions, and we run workshops as well as talks. Internally, we try out new features or products before release and write friction logs for the product team. I also collect feedback from customers in the EAP, aggregate it, and share the improvement points with the product team. And I comment on design documents before a feature or product is actually implemented.

Jonan: Wow.

Yoshi: So in DevRel we act as users of a specific product in order to make the developer experience with it better. That's my day-to-day work. I also often work closely with open-source communities, for example, OpenTelemetry and others.

Jonan: Let's talk about that. So you've been working with OpenTelemetry for a while, well, ever since OpenTelemetry has existed. Because before that, you were working with OpenCensus, the Google half that became...so it was OpenCensus and Open…

Yoshi: OpenTracing.

Jonan: OpenTracing. The two projects combined to become OpenTelemetry, and that was about what, two years ago, three years ago?

Yoshi: Two years ago.

Jonan: Two years ago, okay. And so what sort of stuff have you been talking about with OpenTelemetry lately?

Yoshi: Actually, I was not a core member of OpenCensus or OpenTelemetry. But as DevRel, I introduced OpenCensus to external developers because it reduces the effort to instrument an application, and it also gives wider options in choosing which backends to use. So I did a lot of talks externally. For that, I had to communicate with the internal OpenCensus team, such as Bogdan, who is now on the OpenTelemetry core team, and Jaana, who I think is now working on observability at AWS. That's how my relationship with OpenCensus started.

Also, Morgan McLean, who is now a PM at Splunk, was the main person working on OpenCensus, and he tried to make OpenCensus and OpenTelemetry the first choice for instrumentation for Google Cloud Trace and Cloud Monitoring. That's the reason why I had to work with those libraries and the project. Morgan was really keen on communicating with the OpenTracing team because the objectives of trace instrumentation were the same in the OpenTracing and OpenCensus projects. And he was really good at communicating with other people.

So we had a really good competition with the OpenTracing team, and then we decided to merge into one; that's the origin of OpenTelemetry. OpenCensus also had instrumentation for metrics, which was a real benefit for the OpenTracing team because they focused only on distributed tracing. The OpenCensus metrics part gave them wider instrumentation: tracing and metrics. So we merged. And now the tracing part of OpenTelemetry has been released as stable, GA 1.0. So it's good.

Jonan: How long ago was that they released that?

Yoshi: You mean OpenTelemetry tracing part GA?

Jonan: Yeah, the tracing spec 1.0.

Yoshi: The stable spec was released six months ago, in March this year. And recently, the stable Go library was released. In terms of implementation, Java is leading; Java's stable release was maybe in June, I guess. So Java and Python already have stable releases, and now Go counts as stable as well.

Jonan: Java is cheating, though. They've been around a long time. Java always has an advantage in these things. They have so many hooks in the JVM already. When I spoke to you earlier, you mentioned the profiling stuff that you've been working on with pprof. It's just continuous profiling, generally. GCP does this, but is there a feature name for it?

Yoshi: So, as you mentioned, pprof is just for one-shot collection of profiles. It's confusing. pprof is included in Go's standard library, so Go users can use pprof out of the box, but what ships with Go is actually an extension of pprof. pprof itself is just a visualizer, so you need to have the profile data before you use it. Go's standard tools also include profile collection, which is why Go's pprof is considered both a profile collection tool and a visualization tool. But the original pprof itself is just a visualizer. So that's the context.

What we do in Google Cloud is that we still use pprof, but a kind of forked version, because the visualization part is totally different. We offer that tool as Cloud Profiler; that's the product name. The difference between pprof and Cloud Profiler is that Cloud Profiler provides an agent library for each popular programming language, such as Java, Python, Node.js, and Go. All you need to do is write 5 to 10 lines of code in your application. That launches the profiling agent in your application as a separate thread alongside the main thread. That thread periodically collects profile data from the application and sends it to Cloud Profiler on Google Cloud.

Jonan: But it's not a separate process. It has to run in the main process.

Yoshi: Yes. In the case of Java, you can attach the agent externally to the main application's JAR file and do the same thing.

Jonan: Which is why Java is cheating, because you can just attach to the running process. It's like strace tracing a process. Okay, so we've talked about a lot of things here. We talked a little bit about the way OpenTelemetry brought together metrics and traces, and we've talked a little bit about profiling data. But we left an important piece out of this conversation, which I'm hearing more and more about in OpenTelemetry recently: logs. I think it has become unpopular to talk about observability being composed of MELT (metrics, events, logs, and traces) or the pillars of observability. There are no pillars; observability is just all of it together. But logs are certainly important here, and it's a place where we could use a lot of standardization. I think a lot of people don't even treat their logs as data right now. They don't really use structured logging. Do you know anything about the progress of the logging piece?

Yoshi: Logging has a long history, so I don't know all of it. But I know some of the parts that relate to OpenTelemetry logs, because OpenTelemetry is now focusing on the OpenTelemetry Collector, which collects telemetry such as traces and metrics, aggregates them in one binary, one daemon, and sends them on to each backend, for traces and for metrics. But the OpenTelemetry team thought: we now run a daemon for traces and metrics, yet a log agent is still running separately on the same instance. It would be great if the OpenTelemetry Collector could receive log information as well; then everyone could unify all of their telemetry collection agents into one. So that's the start of the log part of the OpenTelemetry project.

Then they contacted some well-known log collection projects, such as Fluent Bit and others like Stanza, as well as Syslog, I guess. I didn't read the whole thread of the conversation around log collection, but now they have chosen Stanza as the first implementation of OpenTelemetry logs, and observIQ's Stanza has been merged into the OpenTelemetry log repository. That's the status. They are also trying to standardize the format of logs based on the Stanza format, such as what kind of information should be included in a log.
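The motivation Yoshi describes, one process instead of a separate daemon per signal, can be shown with a toy router. This is not the Collector's API: the real Collector is configured in YAML, with receivers, processors, and exporters wired into `traces`, `metrics`, and `logs` pipelines. `Agent`, `Record`, and `Exporter` here are hypothetical names for a minimal sketch.

```go
package main

import "fmt"

// Signal names the kind of telemetry; the three values mirror the
// Collector's traces, metrics, and logs pipelines.
type Signal string

const (
	Traces  Signal = "traces"
	Metrics Signal = "metrics"
	Logs    Signal = "logs"
)

// Record is a hypothetical, simplified telemetry item.
type Record struct {
	Signal Signal
	Body   string
}

// Exporter is a hypothetical backend hook (not the Collector's Go API).
type Exporter func(Record) error

// Agent is one process that accepts every signal type and routes each
// record to the exporters configured for that signal.
type Agent struct {
	pipelines map[Signal][]Exporter
}

// NewAgent builds an Agent from a signal-to-exporters mapping.
func NewAgent(pipelines map[Signal][]Exporter) *Agent {
	return &Agent{pipelines: pipelines}
}

// Handle sends r to every exporter in its signal's pipeline.
func (a *Agent) Handle(r Record) error {
	exporters, ok := a.pipelines[r.Signal]
	if !ok {
		return fmt.Errorf("no pipeline configured for %q", r.Signal)
	}
	for _, export := range exporters {
		if err := export(r); err != nil {
			return fmt.Errorf("export %s: %w", r.Signal, err)
		}
	}
	return nil
}
```

In this shape, teaching an agent that already handles traces and metrics to handle logs is one more map entry, not one more daemon on the instance.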

Jonan: Yeah, this part has always been kind of confusing to me because I think of logs as being very rich data. I can report all kinds of things in my logs. I could report the number of cat tweets found in the last hour, for example. But it's not like we're going to have a bucket for that. There's not going to be a key that is part of the standard that is like, cat count. They're pretty free-form by their very nature. You think that that's a thing that is achievable? Certainly. But is it valuable? You think people are going to end up going off the rails anyway, or is that all just built into it? Much like we have labels where you just have a bucket of key-value whatever you want in there.

Yoshi: I think the number of cat tweets could be a label, because that totally depends on the application. A log is basically a record of events in the application, so there should be some mandatory fields, such as a timestamp, because that is the most important information about the event, and also a severity level, which labels how important the event itself is. This is easy information to have, easy information to come up with. But they'd like to standardize which kind of information should be included as mandatory fields.

For example, in the era of microservices, information about the instance is probably important, especially in the case of Kubernetes: the pod, node, and namespace information. That information should be included in a log. So what they are trying to do is standardize the attributes of logs in general and set out what information is mandatory and what is not. I'm just observing how it goes, but it's a very interesting conversation.

Jonan: It's very interesting. I think Kubernetes is a good example of this because it's a standardization. We're trying to come to these standards, not because we want to force people to behave in any certain way, but because we're making everyone's lives easier. In the near term, I think it's hard to say that Kubernetes has made anyone's life easier. It's new, and so there are a lot of things that change very rapidly. And there's a lot to learn, and the resources to learn those things are not there.

For context, I've spent the last week fighting with Kubernetes in a robot that I built where I have Kubernetes running on a bunch of little Raspberry Pi things. But when you start to understand the pieces of it, it's incredibly powerful, and I really quite like it. I am shocked by how little effort it takes to deploy a large, resilient fleet of web applications onto a bunch of Raspberry Pis and blow them away with ephemeral file systems. It's fantastic from that perspective. I have my own little cloud running on my desk, but it's very hard to get up to speed quickly.

I feel like the OpenTelemetry ecosystem is still in its very early days, and a lot of that still applies. How do you think people are going to get up to speed quickly on those? Kubernetes has things like KubeCon running. And a lot of companies are running Kubernetes now, so they're all very motivated. But what do you think is going to motivate people to start using these sorts of things in the OpenTelemetry project? How is it going to reach people?

Yoshi: That's an interesting question. I think OpenTelemetry has had good features, a good nature, from its beginning, because the motivation for starting the OpenTelemetry project came from the challenges that many instrumentation libraries and methods had. OpenTelemetry started because a number of APM vendors and monitoring SaaS vendors each provided their own way of instrumenting, and developers struggled to apply those instrumentations to their systems one by one, which was hard.

So from the developer's point of view, having one single way to instrument an application is good. If they would like multiple outputs, for example standard output as well as a managed service, all they need to do is write a single configuration in a configuration file, and that's it. I think that's a good benefit for developers, and that alone gives developers enough motivation.
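A minimal sketch of that benefit, with hypothetical names throughout: the instrumentation call site (`recordSpan`) never changes, and a single configuration value such as `"stdout,memory"` decides where spans go. It mimics the idea only and is not any OpenTelemetry SDK API; `memoryExporter` stands in for a managed backend.

```go
package main

import (
	"fmt"
	"strings"
)

// exporter is a hypothetical output for finished spans.
type exporter interface {
	Export(span string)
}

// stdoutExporter prints spans to standard output.
type stdoutExporter struct{}

func (stdoutExporter) Export(span string) { fmt.Println("span:", span) }

// memoryExporter keeps spans in memory, standing in for a managed backend.
type memoryExporter struct{ spans []string }

func (m *memoryExporter) Export(span string) { m.spans = append(m.spans, span) }

// buildExporters turns a configuration value such as "stdout,memory" into
// exporters using a name-to-constructor registry. Only this value needs to
// change between environments.
func buildExporters(config string, registry map[string]func() exporter) ([]exporter, error) {
	var out []exporter
	for _, name := range strings.Split(config, ",") {
		name = strings.TrimSpace(name)
		if name == "" {
			continue
		}
		newExporter, ok := registry[name]
		if !ok {
			return nil, fmt.Errorf("unknown exporter %q", name)
		}
		out = append(out, newExporter())
	}
	return out, nil
}

// recordSpan is the single instrumentation call site; it does not know
// which outputs are configured.
func recordSpan(exporters []exporter, span string) {
	for _, e := range exporters {
		e.Export(span)
	}
}
```

Switching from local debugging to a managed service then means editing the configuration string, not the code that records spans.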

Jonan: I agree with you.

Yoshi: Another good thing about OpenTelemetry is that they start everything from a spec, a common spec across many languages. So developers can understand how those libraries are implemented and how to use them in their applications, because the way to instrument is always the same across libraries and languages, which makes the OpenTelemetry libraries themselves easier to understand. I think that's another good point of the OpenTelemetry project.

I think that, compared to new technologies such as Kubernetes or other CNCF projects, OpenTelemetry has a long history behind its beginning. OpenTelemetry is now a joint project among many SaaS vendors and observability-related open-source projects such as Prometheus, Jaeger, and Zipkin. In terms of APM vendors, companies including yours, New Relic, as well as Google, AWS, Splunk, Microsoft, Datadog, Dynatrace, and Lightstep, are all getting together to make the best solution to instrument applications. Each of those companies has had its own struggles, and they share those struggles and efforts with the community. So the spec brings all of those histories together into one.

So the OpenTelemetry project itself is new, but the outcome is a summary of that long history. From the developer's point of view, though OpenTelemetry seems relatively new, the idea itself is well-considered. It feels like the really common, long-established tools, such as Linux, the command line, and so on. So yeah, I think it's easy for developers to start out with OpenTelemetry once they read the documentation.

Jonan: It's a more mature project than it might appear.

Yoshi: Yeah, maturity. In terms of the idea itself, it's mature, though the library implementations are still young and not mature enough to use in production, except for traces. But the idea itself is mature, so it's easy to get along with.

Jonan: Yeah, I'm a big fan. I've only really dipped my toe into learning about the OpenTelemetry project, but it's an impressive collaboration. I have a hard time thinking of a time when companies have all come together with so much consensus around what this thing should be. Even with things like the browser wars and stuff, we've never really had so many people on the same team. And at some point it gets to be a big enough snowball that it becomes unstoppable, where you have a choice to embrace OpenTelemetry or be left behind. It's a very exciting project.

I have a couple of questions that I like to ask people on this show. And the first one is what you think is coming in the near future. I want you to make a prediction so that we can have this podcast again in a year, and we can tell you that you were wrong. So it has to be a risky one.

Yoshi: [laughs]

Jonan: So you’re pretty sure that it's going to be a bad one, and then we can...No. But what do you think is coming for our industry or even just software generally over the next couple of years? A year maybe is too short. Five years is too long. Do you have any predictions about where things are headed, maybe in the OpenTelemetry space? Do you think that pprof is going to be incorporated here, and everyone's going to have continuous profiling as part of some common spec?

Yoshi: My focus is on observability, and I'm not really looking at other areas like security and so forth, so I can't make predictions about those areas. But in terms of observability, I think that in the next couple of years, especially in the OpenTelemetry area, more and more plugins are going to come into the project. For example, Google recently announced that Sqlcommenter will be merged into OpenTelemetry. Sqlcommenter is an open-source project that adds SQL comments to the actual SQL issued by ORMs in applications or by the query libraries of each programming language. This is a good example. It's just an extension of SQL, but once it is integrated with OpenTelemetry, tracing information can be attached to queries like SQL.

As an example, Cloud SQL sends tracing information to Google Cloud Trace, which surfaces information about slow queries in Cloud Trace. But now that Sqlcommenter is being merged into OpenTelemetry, other tracing libraries can work with our RDBMS solutions. So if the RDBMS backend can provide the tracing information for a specific query to a trace visualizer such as Jaeger or Zipkin, developers can find the relevant SQL query really easily.

And I believe that in the next couple of years, OpenTelemetry will expand in that way. For example, the security team could provide information, log information, about the latency of encryption, or, I don't know, the number of suspicious attacks from attackers. Every single piece of observability-related information can be an extension of OpenTelemetry. So I think in the coming couple of years, OpenTelemetry will have more and more SIGs in the project, and they can be established and expanded into a variety of areas. So that's my expectation for OpenTelemetry.

Jonan: This SQL feature is very interesting to me. So what Sqlcommenter adds is, concretely, something like a request ID. We could include a lot of other things, but the point is: this query happened on this request. It suddenly allows me to see the request behind every query, instead of what often happens now, where you're kind of bundling up the queries because that's a lot of information. You run EXPLAIN plans on all of these queries, and then you dump all of that into a payload that goes with your telemetry data, rather than just being able to pull it out of the database because it's in the comments on any given query. Is that the thing that's merged right now? Sqlcommenter was just recently merged?

Yoshi: It's been announced that it will be merged in the near future. Sqlcommenter is currently located under Google's organization on GitHub, but OpenTelemetry is now preparing a place for the migration. The OpenTelemetry team is also asking for maintainers and approvers for Sqlcommenter after it is merged.

Jonan: Cool.

Yoshi: So it's pretty new and ready to be merged.

Jonan: It's an exciting project.

Yoshi: Yeah.
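Jonan's point, that the database side can pull the request context straight out of the query text, is the reverse operation. A hypothetical `parseSQLComment` helper, assuming Sqlcommenter-style trailing comments made of URL-encoded, single-quoted `key='value'` pairs, might look like this minimal sketch:

```go
package main

import (
	"net/url"
	"strings"
)

// parseSQLComment extracts key='value' tags from a trailing /*...*/ comment,
// for example from a statement captured in a slow-query log, so the query can
// be tied back to the request (trace) that issued it. It assumes values
// contain no raw commas or quotes, which holds when they were URL-encoded.
func parseSQLComment(query string) map[string]string {
	q := strings.TrimSuffix(strings.TrimSpace(query), ";")
	start := strings.LastIndex(q, "/*")
	end := strings.LastIndex(q, "*/")
	if start < 0 || end < start+2 {
		return nil // no well-formed trailing comment
	}
	tags := make(map[string]string)
	for _, pair := range strings.Split(q[start+2:end], ",") {
		k, v, ok := strings.Cut(pair, "=")
		if !ok {
			continue // not a key='value' pair
		}
		key, errKey := url.QueryUnescape(strings.TrimSpace(k))
		val, errVal := url.QueryUnescape(strings.Trim(strings.TrimSpace(v), "'"))
		if errKey != nil || errVal != nil {
			continue // skip malformed escapes rather than rejecting the whole comment
		}
		tags[key] = val
	}
	return tags
}
```

In practice a database monitoring tool would do this parsing; the sketch only shows why a plain comment is enough to carry the link between a query and its trace.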

Jonan: Okay, I have another question for you, and then we're going to call it a show. But this one's maybe a little bit more personal. This is about what you might tell yourself starting out in your career. I imagine that there are a lot of people out there listening who aspire to be in your shoes today. You've worked at Google for more than a decade. You have a long and successful career. What advice do you have for them getting started? Or what advice do you wish you had for yourself? What would you tell yourself just starting out?

Yoshi: I do not think of myself as a successful engineer, but I managed to survive until today.

Jonan: [laughs]

Yoshi: And the reason I could survive is that I always tried to enjoy the technology I work on, whether or not I chose it. For example, when I joined the developer relations team for Google Cloud, the area I focused on, observability, was not my choice. I was assigned to work on observability because not many people were working on it at the time. Back then, I didn't pay much attention to observability itself. But I thought, why don't I give it a shot? So I started researching what observability is, what kinds of products are in that area, and what kinds of companies and people are there. And now I'm really enjoying observability itself, and I'm so excited to talk about it, like now.

And before joining the Google Cloud team, I was always like that. I had worked on Google Assistant for a while, and before that, I had never worked on voice command-related devices or applications. So I started learning how to design applications based on voice, only voice. That was a good experience for me because it gave me a lot of ideas around accessibility, for example. And that perspective helped me think about what information we should provide when we give specific information to other people. So I think the key is to enjoy the area you work on and make everything fun for yourself.

Jonan: And not necessarily only work on the things that you think are fun already because that's how you stretch yourself. You find the new things that are fun.

Yoshi: Yes.

Jonan: That's very smart advice. I'll take it. I'm going to go and learn how to do some Google Assistant programming now because I feel like the cat tweet analyzer needs to exist. It's an important project. The world needs more cat tweet analysis.

Yoshi: Yeah. I'll personally watch how it goes. So yeah, looking forward to more features to come.

Jonan: Please follow the repo. It's going to be a great one. I really appreciate you coming on the show, Yoshi. Do you have any parting thoughts for our listeners, or should we just call it a day?

Yoshi: So if you come up with any ideas or any opinions, then just feel free to reach out to me on Twitter. I think Jonan will put a link to my account somewhere here.

Jonan: I will, absolutely. We will have all of the links in the descriptions. And it's nice of you to invite people to reach out if they have questions about any of the things you talked about. They can reach out to you on...Twitter is the best way?

Yoshi: Yeah, Twitter is the best way.

Jonan: All right. Well, thank you very much, Yoshi. I hope you have a wonderful day.

Yoshi: I'm glad. I had a good talk with you. Thank you for the invitation. I'm looking forward to talking with you again.

Jonan: Thank you so much for joining us. We really appreciate it. You can find the show notes for this episode along with all of the rest of The Relicans podcasts on therelicans.com. In fact, most anything The Relicans get up to online will be on that site. We'll see you next week. Take care.
