Jonan Scheffler interviews Developer Advocate at Observable, Anjana Vakil, about the rise of data visualization, Computational Linguistics, and the importance of finding an incredible community to guide you through your learning journey.
Should you find a burning need to share your thoughts or rants about the show, please spray them at email@example.com. While you're going to all the trouble of shipping us some bytes, please consider taking a moment to let us know what you'd like to hear on the show in the future. Despite the all-caps flaming you will receive in response, please know that we are sincerely interested in your feedback; we aim to appease. Follow us on the Twitters: @ObservyMcObserv.
Jonan Scheffler: Hello and welcome back to Observy McObservface, proudly brought to you by New Relic's Developer Relations team, The Relicans. Observy is about observability in something a bit more than the traditional sense. It's often about technology and tools that we use to gain visibility into our systems. But it is also about people because, fundamentally, software is about people. You can think of Observy as something of an observability variety show where we will apply systems thinking and think critically about challenges across our entire industry, and we very much look forward to having you join us. You can find the show notes for this episode along with all of The Relicans podcasts on developer.newrelic.com/podcasts. We're so pleased to have you here this week. Enjoy the show.
Anjana Vakil: Hi, lovely to be here. Thanks for having me on.
Jonan: Thank you for coming. So we are sitting here remotely from across the planet, you in some mysterious location. I'm admiring your backdrop here.
Jonan: You have like a wooden...what do they call this here?
Anjana: It's like a folding screen wood-carved screen.
Jonan: Gold screen. It's beautiful.
Anjana: Thank you.
Jonan: I'm really jealous. Where are you right now?
Anjana: I am coming at you from San Francisco in the Bay Area in the tiny window between a smoke crisis moment. [chuckles] Luckily, a blissfully clear day here.
Jonan: Yeah, the wildfires are out of control down there right now.
Anjana: They are not great in California and just generally the Western continent.
Jonan: I'm up in Oregon, and it's not great up here. We haven't quite had the same as we did last year, but we're hanging in there. With this wooden screen in the background, I was going to guess Bali. Bali was my second guess.
Anjana: [laughs] That would be amazing as well.
Jonan: Yeah. So tell us a little bit about yourself. What is it that brought you here today on Observy?
Anjana: And I think that data visualization is a really fascinating field. I am just starting to learn it, having been in this space for the last year or so or a little over a year. And as I've learned more about DataViz and as I've learned more about observability, I feel like the two go hand in hand. And so I thought it would be real fun to come on and nerd out about data and visualizing it, and whatever else floats our fancy here. [laughs]
Jonan: Well, I'm so glad that you did. I think the Observe club...we could call it Observer Guild.
Anjana: Observer Guild.
Jonan: Observer Guild…
Anjana: Observer Guild Buddies. [chuckles]
Jonan: [chuckles] Yes, Observer Buddies, The Observer Buddies Club where we all hang out.
Jonan: If you have data without the visualization piece, your life is going to be much more difficult. I'm imagining myself watching a terminal of floating points just fly past my eyes. That would be less successful for me, measuring all of the things that we measure.
Anjana: Exactly. And maybe you're Neo in The Matrix, and all you see when you look at the numbers flying by you on the black and green terminal is women in red dresses or what have you. But wouldn't it be a lot easier if you could actually just visualize all of the people in the city and see them instead of having to be Neo with superhuman powers?
Jonan: I appreciate that you picked up on my Neo vibes.
Jonan: I am very Neo-esque. But you're right; a little bit of visualization helps. So I am very interested in the Observable name as a company and matched up with this observability space. But it was not necessarily an accident because your company probably predates the move that we're seeing from marketing people. As with most naming of tech things, it's usually forced by marketing upon us where they determine that this should be called observability, and then it becomes named observability. How long has Observable been around?
Anjana: So Observable...I've worked at Observable since early 2020. It had been in development as a notebook environment for a few years before that. And I think it had been an idea in its creator's mind. So Mike Bostock, who also created D3, I think had had the interactive notebook in-browser prototyping environment in his mind for much longer than that. And it was initially a project called D3 Express. And so, over the years, it morphed into a small three-person team who is building out this product in sort of a beta mode for a couple of years. And then it became a startup officially on the scene with some seed funding, and then Series A in 2019, I believe. And so yeah, it's been around for a few years, and we've had some people making amazing stuff on the platform for the last few years.
So it's an in-browser notebook environment, but it's also a community and a group of people who are really passionate about code and about DataViz and about bringing insight to light. So it's a really powerful environment for visualizing not just, let's say, a dataset, a CSV file, some data you're pulling from an API, a database you've got hooked up to it, but also things like folks who have made amazing explainers of concepts in computer science, and mathematics, and other sciences like chemistry.
Because you can use all the power of the web - interactivity, animation, visualization of whatever size and shape to convey and help people gain an intuitive understanding of not just a dataset but also really difficult to understand concepts like how an algorithm works, how a search algorithm works for example. Being able to break that down visually and show people with animations and transitions how things progress from one step to another really helps people get a lot of insight into what they're trying to understand.
So I think the name Observable to me what it means...I can't claim to speak to the minds of the founders in the room where they came up with it. But to me, what it means is really being able to take a deeper look and get a deeper understanding of what is happening in our data, in our code, in our systems, as the case may be, and to really understand at a deeper level the world around us.
Jonan: I'm sold on the dream, actually. This is, I think, a parallel definition of Observable. It occurs to me also because I love D3, and I love data, and I've ended up at New Relic now twice; it’s obvious that you and I are both knee-deep in this all the time. And we are suddenly at risk of being those two DevRel people who get in a room and just nerd out about their thing forever and ever because I could go on about this. The name notebook, especially I think, is common with Jupyter Notebooks and other things in the data science community. Do you have a large Python base, probably?
Jonan: I had no idea that that was true. So I could go over to Jupyter right now and get a Ruby notebook running on Jupyter just to troll all of the Python people over there.
Anjana: [laughs] That is entirely my understanding. But you don't even just have to troll them. You can actually use it to very nicely take advantage of whatever Ruby ecosystem things you depend on for whatever the task is at hand.
Jonan: I don't know what internet you've been using, but that's not what the internet is for.
Jonan: The internet is for trolling and memes. That's what we're here to do.
Anjana: So there's a trade-off. With Jupyter, you have this server-client conversation that's happening. And also, the execution of the program is very linear. It's very much you run the cell at the top, and then you run the cell below it. And if you change things, what you end up having to do a lot, or at least this was my experience when I was working with Jupyter Notebooks in grad school and things like that, is that you end up having to rerun a lot of your code. You have to start from scratch and go back and replay everything because it's working in this very imperative style of first you execute this cell and then you execute this cell.
What we trade that server-client correspondence and that linear way of running code for in Observable, the other side of the coin, is that everything runs in your browser. So the advantage of that is that you can get feedback very quickly. So as you're iterating on your visualizations or you're exploring a dataset, everything is happening right in the client, in front of you, and you're getting instant feedback.
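[Illustrative sketch, not part of the spoken conversation: the contrast above, between re-running cells by hand from top to bottom and getting feedback as soon as something changes, can be modeled with a toy notebook in which each cell declares the cells it reads. The `Notebook` class and its `define` and `value` methods are hypothetical names made up for this example; this is not Observable's runtime or Jupyter's API, only a minimal model of dependency-driven re-execution.]

```python
# Toy model of a dependency-driven notebook. All names here are hypothetical
# and exist only for illustration; no real notebook API is being shown.
from typing import Callable, Dict, List, Tuple


class Notebook:
    def __init__(self) -> None:
        # cell name -> (names of the cells it reads, function computing its value)
        self._cells: Dict[str, Tuple[List[str], Callable[..., object]]] = {}
        self._values: Dict[str, object] = {}

    def define(self, name: str, inputs: List[str], fn: Callable[..., object]) -> None:
        """Create or redefine a cell, then refresh it and everything downstream."""
        self._cells[name] = (inputs, fn)
        self._refresh(name)

    def value(self, name: str) -> object:
        return self._values[name]

    def _refresh(self, name: str) -> None:
        inputs, fn = self._cells[name]
        # Skip cells whose inputs have no value yet; they run once inputs appear.
        if not all(i in self._values for i in inputs):
            return
        self._values[name] = fn(*(self._values[i] for i in inputs))
        # Re-run every cell that reads this one. This assumes there are no
        # circular references, and it may recompute a cell more than once.
        for other, (other_inputs, _) in self._cells.items():
            if name in other_inputs:
                self._refresh(other)


nb = Notebook()
nb.define("data", [], lambda: [3, 1, 2])
nb.define("total", ["data"], lambda data: sum(data))
nb.define("label", ["total"], lambda total: f"total = {total}")

# Edit one upstream cell; the dependent cells update without a manual re-run.
nb.define("data", [], lambda: [10, 20])
print(nb.value("label"))  # prints "total = 30"
```

[In a strictly top-to-bottom model, the last `define` would leave `total` and `label` stale until each was run again by hand.]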
Jonan: And with that, we're going to end the Observable conversation and the observability conversation.
Jonan: This thing does a lot. I'm impressed by the product. You have impressed me. I'm sure that our viewers who are interested are also impressed and know all of those things about it. So also, in that discussion of the product, you dropped a note that I want to revisit here that you were in grad school and doing this. Did you go to school to study data science?
Anjana: I did not. I did not. My first time through college, I studied philosophy. And it was not until much later in life that I got interested in computers and sort of doing a little bit of programming here and there. And simultaneously, I was always fascinated by foreign languages. And I ended up teaching English as a foreign language for several years.
Ultimately, I went back to grad school for computational linguistics, which was a field I did not know existed for a long period of my life. That is kind of a combination of computer science, linguistics, software development, and language technology, and a bunch of other adjacent fields. And so, through doing that work and working in the field of computational linguistics, I learned a little bit of the data science, and the natural language processing, and the things that you need to be able to work with language data.
Jonan: So like NLP kind of stuff. You've played around with these engines that exist. When I talk to my Alexa in the living room or my children add blood to the grocery list as they're so fond of doing, and then I go to check my grocery list, and it's just blood, blood, blood.
Jonan: "Alexa, what's on the grocery list?" "Blood, blood, blood."
Anjana: Oh wow. [laughs]
Jonan: "Murder," they like to stick that one in there, totally healthy young kids. But they collect all of this voice data, and then they use it to inform products like Polly...well, the Alexa voice recognition one. Polly being the one that does, I think the opposite where you can have a realistic voice reading from text. But computational linguistics applies in both cases?
Anjana: Yeah. So speech technology, which applies to everything that you just said, is definitely a part of computational linguistics. Computational linguistics is this huge umbrella term that people use to mean a lot of different things. But within that umbrella or under that umbrella, certainly, speech technology and computational methods of processing human speech are definitely part of that.
So speech recognition which is the thing where Alexa understands what you said, and natural language understanding which is where Alexa understands what you said and then understands what you meant by that. So that is certainly one side of the coin. And then the other side that you mentioned, the text to speech generation, where you're taking some text and creating a realistic voice-over. Those are definitely two applications of computational linguistics research and work in what one might call human language technology.
Jonan: Playing around with these tools, I've been a big fan of building silly robots over the years.
Anjana: [chuckles] Nice.
Jonan: And then making those silly robots respond to my silly voice commands. I've done things like deploy my applications using my voice, which can go horribly wrong. Please don't ever build that thing.
Jonan: Ship to production using your voice.
Jonan: So you studied this computational linguistics field. This is a deep and complex field. I studied a bit of linguistics myself. And I remember reading about how it's like linguists break down into phonemes the sounds that we have in our words, so not necessarily in a one-to-one correspondence to a letter like the letter A as an example. In your name, the ah sound can be a phoneme, right? Ah.
Jonan: But then, in natural language processing, they actually break it down into a subcomponent. And I forget what they call this thing. But I feel like we're going closer to the center of the atom when it comes to language. This stuff is fascinating to me. How did you decide to come into the place where you are from language? Because if I were you, I would be out there building Schmolexa, the next version of this.
Jonan: I hope that someone at home has an Alexa device, and they're listening to this. Alexa, order milk. Confirm. And then they just ordered milk, right?
Jonan: How is it that you transitioned over to this place? Did you just stumble upon the computer science piece or fall in love with that? What motivated you to come this way?
Anjana: It's a good question. How did I end up here being a software developer and developer advocate after studying computational linguistics? I guess you could say that the TLDR, the short version of the story, is that I needed to write a lot of code to do the linguistics research that I was doing. And as I came to have to do that more and more, I got more and more interested in the software development side of the equation, as in: wait, how is it that this codebase is so difficult to maintain? Wait, how is it that we have to rebuild this solution from scratch? Wait, how is it that I can't just take somebody else's algorithm that they've implemented and extend that to shape it to my needs? And so I became just more and more interested in learning more about software development best practices and learning more about how people built code in the real world, if you will. I'm doing air quotes that you can't see. [laughs]
Jonan: Like maybe use two files instead of one file for your million lines of code. Some scientists have written abhorrent code for being so brilliant with all of these complex algorithms. I'm like, you could just break this out into two files to make it easier to read. I've seen like a 100,000-line file of just one long function. Yep, that's one way you could do it.
Anjana: Absolutely. And so I wanted to learn more about that. And so that's where I really started digging in. And I think also what really struck me about that was not just the quality of the code and how it was architected, and whether it was maintainable, and whether it was fragile or brittle. But also this notion that folks in the scientific community are doing so much work with so much code that ultimately determines what those numbers are that they put into some paper that they publish or chart that goes into that paper. But we often don't get to see that code.
And I think now there's this awesome movement for more openness around the code used in academic studies and scientific research. And there are amazing projects to host that code and make it reproducible. I mean, GitHub itself is really useful. But I think that's still a muscle that the academic and scientific community is still collectively building. So this is another thing that's so awesome about having everything be on the web and whether it's in a Jupyter Notebook that's hosted somewhere on GitHub or on somebody's site or whether it's in an Observable notebook that's hosted on observablehq.com.
Being able to see the code that actually went into a data analysis or a chart that you're taking conclusions away from and maybe making decisions based on, being able to see really, how did you get to those numbers? How did you get to that presentation of the data? I think is just so important. And so that was another thing that drove me hard towards the open-source community. And so, from there, it was kind of a slippery slope into this wide, weird, wonderful world as it were. [laughs]
Jonan: It makes a lot of sense to me then. So I poked a little bit of fun at my scientist friends there, but I hope I didn't discourage anyone. If you're feeling shy about your code, take a look at my GitHub profile sometime, I promise you'll feel so much better. But please keep putting it out there because the world needs it. And it's actually a really valuable opportunity that we have now. It's so much easier to share these kinds of projects. The point is not that your code was awesome. The point is that your algorithm was awesome. Show us how the science is done because I love playing with those kinds of things and learning from them.
Anjana: Absolutely. And I would just add that there's this hesitancy that people have around sharing their work. I don't think it's just code. I think it's anything.
Anjana: Putting anything out there is so hard. And I think it's just when we share things openly if we give them the benefit of the doubt, folks can come back at us and say, "Hey, I really loved what you did, but I have a helpful suggestion of an addition or a small fix or a typo I noticed," or whatever it is. And if we keep it to ourselves and we're scared of sharing that, then we never get that opportunity to learn from each other. So I just think it's amazing to have more and more ways that we can share what we're doing with each other, and learn from each other's work, and learn from each other's mistakes, and just help each other all move forward and upward together.
Jonan: We are all in this together. All of my science friends out there typing your very long scripts, I believe in you. I would love to see the code. Please ship it up to GitHub. So the root of this question is actually a question that I ask everyone who comes on this show. And I'm excited to hear your answer, which is...well, we do two of these, two of the repeated segments on Observy.
The first one I'm going to start with is the easy one which is to make some sort of prediction about the state of the world a year from now. It's not like around the pandemic or the wildfires or any of the other reasons that things are broken but maybe specifically in your sphere of technology. I think it's safe to say that a year from now, all that will be well behind us, and the whole world will be better and healed.
Anjana: Everything will be fixed, yes. Obviously. [laughs]
Jonan: Pretty much guaranteed, yeah. But a year from now in the tech space or in DataViz or wherever it is, what do you think is different enough that we can bring you back on the show and accuse you of being wrong if you are, in fact?
Anjana: [laughs] Well, I hope wrong or right to be back on the show. So one thing that I've been thinking about and talking with my colleagues at Observable about is just the field of DataViz and DataViz as a skill. So data visualization as a tool in the toolbox of, I think, more and more people with different types of roles and goals and things that they're working on. And so I think in the tech space, in the wider sense that we have just so much more data, so much more information to understand, whether that's data about our users, about our systems, about our code changes, and open-source projects on GitHub and what have you. There is just so much data, and it seems to just be exploding, and exploding, and exploding, the volume that we have of it.
I think data visualization and being able to understand what it means to make decisions that go into how you're visually presenting a dataset is something that's going to be increasingly more important. And I think we've seen that through the COVID pandemic of how all of these charts, a lot of folks are looking at these spikes and these hospitalization rates and case rates and things like that every day with a lot more frequency than we might have been looking at dashboards like that before.
And you see so many charts going around that are so misleading and make so many assumptions about what you're showing or have such horrific Y-axis crimes committed that they can give people an entirely wrong idea about the world around them and help them or hinder them from making the decision. So I think that learning how DataViz works, learning what choices go into, whether it's a pie chart or a bar chart or a line chart or something that you think you've seen a million times, but there are some real important choices behind that. I think becoming more fluent with that, developing skills to be able to create visualizations of the massive amounts of data we have is going to be something that we're all going to need to do a little bit more of.
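[Illustrative sketch, not part of the spoken conversation: one common example of the kind of Y-axis choice mentioned above is starting a bar chart's axis somewhere other than zero. The arithmetic below shows how that choice changes how much taller one bar looks than another. The values and the `apparent_ratio` helper are made up for this example.]

```python
# How the y-axis baseline changes the apparent ratio between two bars.
# The numbers are hypothetical and chosen only to make the effect visible.

def apparent_ratio(a: float, b: float, baseline: float) -> float:
    """How many times taller bar `b` is drawn than bar `a`.

    A bar's drawn height is proportional to (value - baseline), so the
    ratio of drawn heights depends on where the axis starts.
    """
    return (b - baseline) / (a - baseline)


before, after = 100.0, 110.0  # hypothetical values: a 10% increase

honest = apparent_ratio(before, after, baseline=0.0)
truncated = apparent_ratio(before, after, baseline=95.0)

print(f"axis starts at 0:  second bar looks {honest:.1f}x as tall")     # 1.1x
print(f"axis starts at 95: second bar looks {truncated:.1f}x as tall")  # 3.0x
```

[The underlying data is identical in both cases; only the baseline differs, yet a 10% increase is drawn as a bar three times as tall.]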
Jonan: I think this is a reasonable assumption that you are making. So the other question that we have away from the future here let's talk about the past a little bit and maybe what advice you would have given to yourself just starting out in this industry, something that you wish someone had told you. There are many people out listening today who aspire to be in your shoes today. What advice do you have for those folks?
Anjana: I think that my number one advice for anyone trying to learn anything or break into any kind of new field is to find people to do it with, so find community. And find people that you can talk about your frustrations with, talk about your goals with, talk about the things that you've just learned, people who are in a similar position to you, or maybe one or two steps further along on the journey.
So I started learning to code as a hobby that I was doing really in isolation. I found a book at the library, and that's already a form of community, right? Like, humans wrote that book and are communicating to me through its pages. But for probably a year or two, it was just kind of tinkering and maybe peeking at Stack Overflow here and there or a few other kind of more forum-y sites and not really talking to anybody face-to-face or virtually, as the case may be, about what I was learning, or what I was trying to learn, or what I was frustrated with, or what I was not understanding.
And I later had the immense good fortune of finding an incredible community called The Recurse Center in New York City, which really just blew my world wide open. Being able to meet so many different folks who are working on similar things that I was interested in, totally different things I'd never heard of that now I could be curious about and find more information about. And just having a community of folks who understood how hard it is to do this work and who understood how hard it is to learn a complex new topic or concept. That was really the single thing I think that made it possible for me to be where I am today. And that led me to a lot of other amazing communities that also are instrumental in me being where I am today.
And so I think that just finding people, talking to them, putting yourself out there. Like we said before, it is so scary, but I think it's really essential, especially if you're trying to learn a totally new field.
Jonan: Yeah, I absolutely agree; both are wonderful bits of advice and prediction. I look forward to having you back in a year and seeing how much progress. Maybe we'll have you on with one of our guests today who goes off to Recurse Center and jumps into a DataViz career.
Anjana: Yeah, I hope so. [laughs]
Jonan: It was really nice chatting with you. I hope you have a wonderful day.
Anjana: Likewise. Thanks so much for having me on.
Jonan: Thank you so much for joining us. We really appreciate it. You can find the show notes for this episode along with all of the rest of The Relicans podcasts on therelicans.com. In fact, most anything The Relicans get up to online will be on that site. We'll see you next week. Take care.