<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jayesh Bapu Ahire</title>
    <description>The latest articles on DEV Community by Jayesh Bapu Ahire (@jbahire).</description>
    <link>https://dev.to/jbahire</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F79521%2F70c8162a-0a54-47c0-aeba-aa8bac1e73fa.jpg</url>
      <title>DEV Community: Jayesh Bapu Ahire</title>
      <link>https://dev.to/jbahire</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jbahire"/>
    <language>en</language>
    <item>
      <title>Adopt AI, But Responsibly!</title>
      <dc:creator>Jayesh Bapu Ahire</dc:creator>
      <pubDate>Sat, 13 Apr 2024 14:24:27 +0000</pubDate>
      <link>https://dev.to/genai/adopt-ai-but-responsibly-21e3</link>
      <guid>https://dev.to/genai/adopt-ai-but-responsibly-21e3</guid>
      <description>&lt;p&gt;In this riveting episode of the AI Guardrails Podcast, we dive deep into the transformative impact of artificial intelligence on the field of cybersecurity. Join us as we sit down with ⁠Andy Martin⁠, founder and CEO of ⁠Control Plane⁠, who shares his expert insights on the integration of AI technologies in protecting digital infrastructures. From the origins of AI in gaming to its pivotal role in modern cybersecurity strategies, Andy unpacks the complexities and potential of AI to reshape security paradigms. &lt;/p&gt;

&lt;p&gt;We explore cutting-edge topics such as AI's ability to automate vulnerability assessments, the ethical considerations of AI deployment, and the future of AI-enhanced security measures. Whether you're a cybersecurity professional, AI enthusiast, or just keen to understand the next big thing in tech, this episode will arm you with the knowledge you need to navigate the AI-powered security landscape. Tune in to discover if AI is truly the cybersecurity hero we've been waiting for or if it's a Pandora's box that might unleash new challenges.&lt;/p&gt;

&lt;h3&gt;
  
  
  Guest Bio:
&lt;/h3&gt;

&lt;p&gt;Andrew Martin is founder and CEO of Control Plane, specializing in securely unlocking next-generation cloud technologies for highly regulated clients. He holds pro bono positions as CISO at OpenUK and co-chair of the CNCF’s Technical Advisory Group for Security. His clients include Google, the UK Government, Citibank, JPMC, PWC, BP, Visa, British Gas, The Economist, and News UK.&lt;br&gt;
He is co-author of Hacking Kubernetes (O’Reilly Media, 2022), has published training material and numerous whitepapers for clients including the Linux Foundation and SANS, and regularly speaks and delivers training at international security conferences and events.&lt;/p&gt;

&lt;h2&gt;
  
  
  Transcript
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Jayesh Ahire:&lt;/strong&gt; Hello, everyone. Welcome to the AI Guardrails Podcast, where we discuss AI, the adoption challenges, and the security opportunities around building new AI and ML applications. Today, we have with us Andy from Control Plane.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Andy Martin:&lt;/strong&gt; Hello, thank you very much for having me on. I'm Andy, founder and CEO at Control Plane. I have a background in classic computer science, did a lot of development and infrastructure engineering, DevOps, and all that good stuff, and worked through a number of different regulated organizations and started Control Plane to bring consulting and assurance around cloud-native and next-generation technologies to regulated industries. We're based out of London with a lot of colleagues across Europe, Asia Pacific, and North America.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Jayesh Ahire:&lt;/strong&gt; Interesting. Andy and I met back in 2019 on the streets in Sydney where we shared some good kebabs and good talks, and that's where we actually got to know each other. Control Plane has been doing fantastic work in different domains of security and different strategic relationships they have with various organizations in the UK and across the world.&lt;/p&gt;

&lt;p&gt;Today, we'll be discussing different use cases Andy has come across, specifically in the cybersecurity domain, when it comes to AI and the new gen AI, the LLM wave that has been catching on since last year. So to begin with, how did you get interested in this specific domain? And what are some of the interesting use cases you have seen while working on this?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Andy Martin:&lt;/strong&gt; Well, one of the first things that sparked the interest was watching some of the things OpenAI did many years ago, when they released a playground where you could build your own agents to play classic games: you could teach an agent to play Pong and some side-scrolling games, that kind of thing. A very dear friend of mine, who actually trained as a lawyer, turned to me and said there's so much going on here. DeepMind were just starting the AlphaGo project as well, and one of the DeepMind founders was, or still is, I think, a lecturer at University College London, and his reinforcement learning course was live on YouTube. So my friend and I sat down, watched a lecture module on reinforcement learning, and discovered that it was reasonably complicated. But this very simple process of having an agent with a feedback loop, observation of the environment, and a reward function is a very, very simple way of modeling something that's actually exponentially complex across multiple dimensions. We did this course, and it was fantastically interesting. DeepMind then got acquired by Google and integrated internally there; we've now got Demis Hassabis, the CEO, deeply integrated into Google's plans and all of their latest models. And over the last two years, since the release of OpenAI's ChatGPT (it was in the OpenAI playground before the big launch), it has been clear that there was something there: the token-prediction capabilities of LLMs got to a point where they were really useful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Jayesh Ahire:&lt;/strong&gt; As you rightly mentioned, specifically around things like vulnerability assessment, and at least reducing the repetitive work that has been happening in these workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Andy Martin:&lt;/strong&gt; Absolutely, and it's becoming one of the interesting use cases. We have been seeing this across things like SAST, and even DAST tooling, even SCA for that matter. Traditionally, SAST produces a lot of false positives. But how can we minimize them using all of this context we are feeding into these models?&lt;/p&gt;
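&lt;p&gt;To make the triage idea concrete (a hypothetical sketch, not something discussed verbatim in the episode), here is a toy rule-based pass over SAST findings. The hard-coded rules stand in for the richer context, such as reachability from user input, that one might feed to a model:&lt;/p&gt;

```python
# Toy triage over SAST findings. The rules below are a deterministic
# stand-in for context an LLM might weigh (data flow, file role, etc.).
findings = [
    {"rule": "sql-injection", "file": "app/views.py", "tainted_input": True},
    {"rule": "sql-injection", "file": "tests/test_db.py", "tainted_input": False},
    {"rule": "weak-hash", "file": "app/auth.py", "tainted_input": False},
]

def triage(finding: dict) -> str:
    """Down-rank findings in test code or without a tainted data path."""
    if finding["file"].startswith("tests/"):
        return "likely-false-positive"
    if not finding["tainted_input"]:
        return "needs-review"
    return "high-priority"

for f in findings:
    print(f["rule"], f["file"], "->", triage(f))
```

&lt;p&gt;A real pipeline would of course derive these signals from the scanner's own output rather than hard-coding them.&lt;/p&gt;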

&lt;p&gt;&lt;strong&gt;Jayesh Ahire:&lt;/strong&gt; And I know you have been running some initiatives, including the FINOS AI Readiness working group, among others. We want to touch on those and go through them. What are the different things you have been doing and exploring in this specific domain?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Andy Martin:&lt;/strong&gt; Yes. So FINOS, the financial services open source organization that sits under the Linux Foundation, has a soon-to-be-announced AI Readiness working group. I'm on the steering committee, and the goal of the group is to bring together stakeholders from all the major banks and some other financial services organizations and figure out the simplest way to get AI into these organizations. The roadblocks there include: who owns this data, and is a leak of this data an existential event for the bank or for a regulator? And there are existing security controls and mechanisms in place in the banks that we want to continue to respect, because we spent many years building them and they're compliant with regulation. So the AI Readiness working group is a collection of individuals operating under Chatham House rules to share experiences and to build out a set of common guidelines and frameworks for secure adoption of AI. That's very much based on the legal implications, privacy, and how we deal with the governance, risk, and compliance of the entire end-to-end AI/ML lifecycle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Jayesh Ahire:&lt;/strong&gt; You made an interesting point at the end there. As you're adopting these technologies, it's very important to be responsible, and at the same time to make sure we are aware of all the caveats that come with them, to protect ourselves from any of these damages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Andy Martin:&lt;/strong&gt; It's extremely difficult. If we model what we've done with AI as an extension of human patterns of behavior, capturing our biases from an ethical perspective, along with our performance at keeping things private as a species, then we can see that we have just made a far more flamboyant rod for our own backs. For data protection, it's a huge open goal. It's back to this question of where we delineate those security boundaries. Do we put financial information in the same place as legal information, in the same place as proprietary information? The answer, of course, is no. But the initial dream, in this space of uncertainty as we begin to adopt this as a commercial enterprise across the world, has been: yes, we can do this with a single model. And really, this is where I see the greatest issue. There are inherent risks when the model is inherently opaque as to its training data composition; add on top of that the mixing of different data classifications and sensitivities, and we're in quite a risky space.&lt;/p&gt;

&lt;p&gt;One of the things I'm also involved with is OpenUK, which is a nonprofit focused on open technology: hardware, data, and software. We try to stop governments from making legislative decisions that aren't technically grounded, and it is wonderful to work with the UK Government, the European Union, and the White House on those things. Across the three: the European Union has had an AI Act in place and moving along for a number of years. That's near completion and will be enforced within, I think, the next couple of years. From a UK perspective, we got a consortium of countries from around the world together and agreed on safe and trusted AI. But the UK Government is also really rushing into this. The UK is one of the biggest service economies in the world, and it's clear that AI is very well placed to eat some of that breakfast, if you like, and to make services more optimal, so suddenly there are a lot of challenges. The UK Government has really pushed; they courted OpenAI, which will open an office in London. The US, meanwhile, is kind of just observing; it doesn't have legislation moving in the same way. Of course, it is the birthplace of the technology, and it's now looking more at embargoes around the hardware needed for AI, the chip war with China, for example, with TSMC being protected in Taiwan. In terms of where privacy and data protection sit, the European Union is doing the best job by a long way of actually codifying and gathering those ethereal threats.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Jayesh Ahire:&lt;/strong&gt; Yeah. It will definitely be very interesting to see this evolve to the extent where everything is in context, and we can just go and ask a question and get the answer. It will be just one more teammate in the whole process, I guess. The one thing that comes along with that is: as these models are getting perfected, there are also humans trying to perfect AI-driven cybersecurity, or to learn more about the new advancements in the LLM space. What would be your advice for them?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Andy Martin:&lt;/strong&gt; My favourite piece of advice in this space is: consider the model a frenemy. It's not your enemy, but it's not your friend, and I'd say that it is an untrusted, non-deterministic, potentially hostile threat actor behind your firewall. We have to consider that we've dropped in something completely new in the history of computing; we haven't given an agent this level of autonomy before. We have to some extent, but those agents were testable and deterministic. In this case, we don't understand how models make their decisions. We understand how they're trained, but actually getting explainability back out of the model, which is also part of the EU AI Act, is not uniform across everything, and it's not necessarily accurate all the time. So we have to consider that we've installed something new, something we haven't dealt with before at this scale, that is evolving extremely quickly. Just consider what that model actually is: as I say, half friend, half enemy, with the capacity to do incredible good, to reduce toil for humanity, and to reduce the burden of repetitive tasks. But we must use this to embellish ourselves and not replace ourselves, because we can't trust it, and society has evolved to give us constraints, legal requirements, and ethical concerns that are baked into our souls, our personas, as people, and that are not replicated by the machines. So firewall everything, and trust but verify.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Jayesh Ahire:&lt;/strong&gt; Thank you, absolutely. So it's, I guess, more about drawing that boundary somewhere and making sure that everything you do remains in that specific constrained environment: making sure all the things you want protected are protected, and all the things you want insights on are better served by the models you are using. Cool. I guess those are all the questions I had. On the sidelines, since I can see you read a lot: any interesting book recommendations you have come across recently?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Andy Martin:&lt;/strong&gt; Yes. My friend and colleague Mr. Vicente Herrera, who has been doing a lot of work at Control Plane on red-teaming models, and on how we consider them as these insecure things with security potential, just recommended a book to me called Generative AI Security. It's published by Springer; I'll share a link to that. It's quite new, and it's very comprehensive. I've also been reading things like the OWASP LLM Top 10 and the NCSC secure AI adoption guidelines. A lot of people have put out a lot of different literature, and it tends to come in at different levels. On the one hand, you'll have security of data and privacy kept very high level; at the other end, with something like OWASP, you have depth on what these attacks look like and how they might actually happen. So I find those very interesting as well. The standout recommendation, though, is the Generative AI Security book.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Jayesh Ahire:&lt;/strong&gt; Yeah, thank you. That must be interesting to go through as well. Thanks, Andy, for joining in. It was a pretty long, detailed, and very interesting conversation, one of the longest episodes for sure, but I really liked talking to you and hearing all the insights you have from the different things you do day to day. Again, thanks for joining in. I hope you enjoyed it as much as I did, and see you again.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Andy Martin:&lt;/strong&gt; It was tremendous. I enjoyed it thoroughly, and thank you very much for having me.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Jayesh Ahire:&lt;/strong&gt; Thank you. Have a good time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Thanks for listening to/reading the second episode of the AI Guardrails podcast. You can find the latest episode here: &lt;a href="https://podcasters.spotify.com/pod/show/ai-guardrails"&gt;https://podcasters.spotify.com/pod/show/ai-guardrails&lt;/a&gt;. We are available on Spotify, Apple Podcasts, or any of your favorite podcast apps. What topics do you want to hear about next? Let us know in the comments!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>cybersecurity</category>
      <category>llm</category>
    </item>
    <item>
      <title>Don't Keep Up with AI!</title>
      <dc:creator>Jayesh Bapu Ahire</dc:creator>
      <pubDate>Fri, 12 Apr 2024 13:34:09 +0000</pubDate>
      <link>https://dev.to/genai/dont-keep-up-with-ai-1boi</link>
      <guid>https://dev.to/genai/dont-keep-up-with-ai-1boi</guid>
      <description>&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
      &lt;div class="c-embed__cover"&gt;
        &lt;a href="https://podcasters.spotify.com/pod/show/ai-guardrails/episodes/Dont-try-to-keep-up-with-AI-e2hlss8/a-a58em4" class="c-link s:max-w-50 align-middle" rel="noopener noreferrer"&gt;
          &lt;img alt="" src="https://res.cloudinary.com/practicaldev/image/fetch/s--fS6XTX-c--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://d3t3ozftmdmh3i.cloudfront.net/staging/podcast_uploaded_episode400/40847836/40847836-1711622868282-35531150a3428.jpg" height="400" class="m-0" width="400"&gt;
        &lt;/a&gt;
      &lt;/div&gt;
    &lt;div class="c-embed__body"&gt;
      &lt;h2 class="fs-xl lh-tight"&gt;
        &lt;a href="https://podcasters.spotify.com/pod/show/ai-guardrails/episodes/Dont-try-to-keep-up-with-AI-e2hlss8/a-a58em4" rel="noopener noreferrer" class="c-link"&gt;
          Don't try to keep up with AI! by AI Guardrails
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;p class="truncate-at-3"&gt;
          In this thought-provoking episode hosted by Jayesh Ahire, Noah Gift, founder and CEO of Pragmatic AI Labs, delves into the complex interplay between generative AI, cybersecurity, and ethics. Gift challenges the current hype surrounding AI, emphasizing its role as an enhancer of existing best practices rather than a disruptive force. With a keen focus on the importance of ethical considerations and the potential risks of commercial AI models, he offers insightful perspectives on the future of technology. Join us as we explore the ethical minefield of AI in cybersecurity and the critical importance of adhering to solid principles over chasing the latest trends, all through the insightful dialogue between Ahire and Gift.

Guest Bio:
Noah Gift is the founder of Pragmatic AI Labs. He lectures in the MSDS program at Northwestern, the Duke MIDS Graduate Data Science Program, the Graduate Data Science program at UC Berkeley, the UC Davis Graduate School of Management MSBA program, the UNC Charlotte Data Science Initiative, and the University of Tennessee (as part of the Tennessee Digital Jobs Factory). 
He teaches and designs graduate machine learning, MLOps, AI, and data science courses, and consults on machine learning and cloud architecture for students and faculty. These responsibilities include leading a multi-cloud certification initiative for students. 

Host Bio:
Jayesh Ahire is the Founding Product Manager at Traceable AI, where he runs the company's API security initiative. He is a practitioner at heart and has worked with numerous organizations to design and implement secure API architectures and integrate security practices into their development processes. 
He is a maintainer of open-source projects like OWASP crAPI, Hypertrace, and many others. He has presented on API security and secure development practices at industry conferences including DEF CON, BSides, and Black Hat, and also runs the API Security Global community.

        &lt;/p&gt;
      &lt;div class="color-secondary fs-s flex items-center"&gt;
          &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://res.cloudinary.com/practicaldev/image/fetch/s---0v5DPA3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://d12xoj7p9moygp.cloudfront.net/favicon/favicon-s4p-196x196.png" width="196" height="196"&gt;
        podcasters.spotify.com
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;In this episode hosted by Jayesh Ahire, &lt;a href="https://www.linkedin.com/in/noahgift/"&gt;Noah Gift&lt;/a&gt;, founder and CEO of Pragmatic AI Labs, delves into the complex interplay between generative AI, cybersecurity, and ethics. Gift challenges the current hype surrounding AI, emphasizing its role as an enhancer of existing best practices rather than a disruptive force. With a keen focus on the importance of ethical considerations and the potential risks of commercial AI models, he offers insightful perspectives on the future of technology. &lt;/p&gt;

&lt;p&gt;Join us as we explore the ethical minefield of AI in cybersecurity and the critical importance of adhering to solid principles over chasing the latest trends, all through the insightful dialogue between Ahire and Gift.&lt;/p&gt;

&lt;h3&gt;
  
  
  Guest Bio
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.linkedin.com/in/noahgift/"&gt;Noah Gift&lt;/a&gt; is the founder of Pragmatic AI Labs. He lectures in the MSDS program at Northwestern, the Duke MIDS Graduate Data Science Program, the Graduate Data Science program at UC Berkeley, the UC Davis Graduate School of Management MSBA program, the UNC Charlotte Data Science Initiative, and the University of Tennessee (as part of the Tennessee Digital Jobs Factory). &lt;br&gt;
He teaches and designs graduate machine learning, MLOps, AI, and data science courses, and consults on machine learning and cloud architecture for students and faculty. These responsibilities include leading a multi-cloud certification initiative for students. &lt;/p&gt;

&lt;h2&gt;
  
  
  Transcript
&lt;/h2&gt;

&lt;p&gt;Jayesh Ahire: Hello, everyone! Welcome to AI Guardrails. Today we have with us Mr. Noah Gift.&lt;/p&gt;

&lt;p&gt;Jayesh Ahire: As you know, in this podcast, we talk about generative AI and security and how to incorporate generative AI into your security strategies. And how to put specific guardrails in your workflows.&lt;/p&gt;

&lt;p&gt;Jayesh Ahire: So Noah is the founder and CEO of Pragmatic AI Labs, and as we were chatting before this, he mentioned that he has done a bunch of interesting gigs in the past, one of those being a bouncer.&lt;/p&gt;

&lt;p&gt;Jayesh Ahire: So I will let him introduce himself and go through his journey through tech and AI, specifically.&lt;/p&gt;

&lt;p&gt;Noah Gift: Yeah, hi, happy to be here. So my background is that I've had a lot of different jobs. Early in my career, I worked in live TV when I was a teenager, which was a pretty useful thing to know how to do because I learned how to edit.&lt;/p&gt;

&lt;p&gt;Noah Gift: And then, when I was in college, one of the jobs that I had just for maybe like 6 months was a bouncer at a really large bar, and it wasn't necessarily that I was looking to be a bouncer. It just happened to be the job that I could get so that I could pay the rent while I was in school. And it was pretty fun because I got to work with actually one of the UFC champions, Chuck Liddell. He was one of the bouncers there as well. So it was an interesting kind of accident to work with somebody like that. And then, later in my career, I've worked in TV and film quite a bit and then later startups in the Bay Area. So currently, though, I teach part-time at a couple of different universities, including Duke. And I'm focused on creating content around, I'd say, cloud computing, data engineering, and AI.&lt;/p&gt;

&lt;p&gt;Jayesh Ahire: Interesting. Yeah, that must have been very interesting to throw people out of the bar.&lt;/p&gt;

&lt;p&gt;Jayesh Ahire: Yeah. So, as you mentioned, you have been in this industry for a while now, focusing on cloud computing, AI, and a bunch of different things. Specific to the topic we are discussing: what are some of the interesting use cases or impactful ways you have seen gen AI or LLMs being used, in the industry in general as well as in cybersecurity? If you can cover both of those, it would be great.&lt;/p&gt;

&lt;p&gt;Noah Gift: Yeah, I think what I see right now with generative AI is that there's a lot of hype around it. But some of the hype doesn't really play out, I think, in terms of usefulness where the organizations that are going to have the best return on investment are going to be organizations that already are well organized. So they are using agile, they use DevOps. They have, you know, security best practices like the principle of least privilege, auditing, you know, multiple layers of security. And so where I see generative AI coming into play is just enhancing what you're already doing. So if you're already doing, you know, exploit analysis, then you can use generative AI to help you with exploit analysis. If you're already doing analysis for outliers, looking at strange behaviors, right? You can also use generative AI to help you with that. So I see it less as like some revolutionary change and more as an accelerant to best practices. So if you are not doing best practices, you're gonna have a very poor time getting results from generative AI.&lt;/p&gt;

&lt;p&gt;Jayesh Ahire: And that's an excellent point you made there: enhancing the current workflows you already have in place. That's where we have been working with a bunch of customers, where people are trying to incorporate this on top of some existing workflow, some existing automation, some existing tech in place: how can we increase the efficacy? How can we make that workflow more efficient?&lt;/p&gt;

&lt;p&gt;Jayesh Ahire: So when it comes to the overall security strategy of an organization, how do you think this fits, and what are some of the interesting use cases or workflows you have seen where gen AI/LLM can have a significant impact?&lt;/p&gt;

&lt;p&gt;Noah Gift: Yeah, I think Google recently did a survey where they asked, I think, 100 C-level people what they think the main use cases are for generative AI, and to summarize, I think it was customer service, developer productivity, content development, and then potentially automation. So a big takeaway, in terms of developer productivity, would be that you could use generative AI tools to help you look for security holes in your code. You could have an assistant, maybe with a specific prompt, and the prompt would look for specific patterns that you're trying to identify: are you declaring variables that you're not using, or are you not freeing up memory, etc.? So I think that would potentially be an easy one: enhance what you're already doing with a chatbot. And then, in terms of automation, as I mentioned earlier, you could take your already good practices, like looking for outlier behavior or auditing logs, and have generative AI help you look for patterns: do you see something that looks like a specific behavior, etc.? So that's where I really see it fitting in: those four areas of customer service, content development, developer productivity, and automation.&lt;/p&gt;
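&lt;p&gt;The "declared but unused variables" check mentioned above can even be done deterministically before involving an LLM at all. A minimal, illustrative sketch using Python's standard ast module (the sample code and names are hypothetical):&lt;/p&gt;

```python
import ast

def find_unused_variables(source: str) -> list[str]:
    """Flag names that are assigned but never read: a deterministic
    stand-in for the 'declared but unused' check discussed above."""
    assigned, used = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                assigned.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                used.add(node.id)
    return sorted(assigned - used)

sample = """
password = "hunter2"  # assigned but never read
count = 1
print(count)
"""
print(find_unused_variables(sample))  # ['password']
```

&lt;p&gt;An LLM assistant adds value on the fuzzier cases this kind of static rule misses, but the deterministic check is cheaper and should run first.&lt;/p&gt;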

&lt;p&gt;Jayesh Ahire: Yeah, absolutely. In the same context, we are also seeing people do things like SCA using gen AI: analyzing the results that come out of it, using predictions to make sure they prioritize the right things, or building in context so that personalization and prioritization can be efficient. And that happens as we start adopting, as you mentioned, those four use cases in our day-to-day workflows.&lt;/p&gt;

&lt;p&gt;Jayesh Ahire: What do you think the challenges are from a CxO's or CISO's point of view? We're trying to incorporate this into security operations, but there are still challenges, including data protection, privacy, and a bunch of different things. What is your view on those?&lt;/p&gt;

&lt;p&gt;Noah Gift: Yeah, I think there's a real problem with relying on commercial large language model technology, and it's probably much better to think about using open-source large language models as a trial. So what you could do is use technologies like llamafile, for example, which is Mozilla's runner, essentially, for large language models. And you could start to take a look at using things like Mixtral or some of these other open-source models as maybe the start of what you're building in terms of automation. The real issue with commercial large language models is that we really don't know yet what will happen when you send all this data to these commercial companies. In theory, the data will never get leaked; in practice, there are a lot of data leaks. So if you're really thinking about security, and then you immediately start using commercial third-party systems and sending them your data, it doesn't really pass the smell test for security best practices to send a bunch of data to a company that, in many cases, is already being sued for pirating data or for not caring about data. So I think that's a very big risk for organizations.&lt;/p&gt;

&lt;p&gt;Jayesh Ahire: Yeah. And as you talk about the solution being the use of some of these open-source technologies to make the process efficient, and even reliable to some extent: have you seen any specific scenarios around you where people have tried this and gotten good results? What did that workflow look like, and what considerations are still in place when using these open-source technologies and building an end-to-end pipeline on them?&lt;/p&gt;

&lt;p&gt;Noah Gift: I think an easy one potentially is transcribing video to text. I mean, I think that could be an interesting one. So let's say that you were looking for. Let's say you're working for a government, and the government organization wanted you to prevent classified information from being transmitted into the public. Well, an easy technique could be to transcribe all of the video content into text, and then take that text and then analyze it for leaks. So I think that could be a good example of an open-source tool like Whisper, for example, could use that technology.&lt;/p&gt;
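&lt;p&gt;A minimal sketch of the second half of that pipeline, assuming the audio has already been transcribed (for example, with an open-source tool like Whisper): scanning the resulting text against a watchlist. The terms and helper names here are hypothetical:&lt;/p&gt;

```python
# Hypothetical watchlist; a real deployment would use curated term
# lists and classifier models, not a hard-coded set of strings.
CLASSIFIED_TERMS = ["project aurora", "launch codes"]

def scan_transcript(transcript: str) -> list[str]:
    """Return any watchlist terms found in transcribed text."""
    lowered = transcript.lower()
    return [term for term in CLASSIFIED_TERMS if term in lowered]

transcript = "...and the budget for Project Aurora doubles next year."
print(scan_transcript(transcript))  # ['project aurora']
```

&lt;p&gt;The point of the design is that both transcription and scanning can run entirely on infrastructure you control, so sensitive content never leaves the boundary.&lt;/p&gt;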

&lt;p&gt;Jayesh Ahire: Got it. And since you mentioned government: one of the continuously debated aspects of AI is ethics. When it comes to ethical considerations, there are definitely challenges in using AI at a larger scale, especially when you're dealing with sensitive information and with things that can impact a lot of lives. Regulations are coming up: the EU already has some things in place, and laws are coming in the US, Asia, and elsewhere. But at the same time, when we adopt gen AI and LLMs into our workflows, especially in security, most security tools deal with a bunch of sensitive data, in theory as well as in practice. How do you navigate the ethical considerations there, especially around privacy and protection?&lt;/p&gt;

&lt;p&gt;Noah Gift: Well, I think a good starting point would be, if you're using commercial models, to look at what the company is already doing. If we look at different organizations that are doing large language models, some of them have already had problems with ethical issues. So I would say avoiding those companies, and choosing the companies that have the least amount of litigation or the least concerns about them, would probably be a good choice. You could rank the different commercial models: pick 3 or 4 different models, just like you would do with a bank or with any kind of vendor you deal with, rank them, and try to find the companies that seem like they're doing the best with data protection. So I think that's maybe the starting point. The second point would be to really think heavily about ways you can keep your data isolated when it uses large language model technology. RAG, for example, is a good example of this, where you could have your data protected somewhere, talking to a vector database, and there's a large language model that maybe is commercial, if you've implemented it correctly (maybe Amazon Bedrock is a good example of this), or maybe it's a local large language model. So it's about isolating your data from being exposed to the third-party system. And then there are also things like bias, where if you're using models that accentuate something historically bad that was already happening, like discrimination, then you want to be very careful about putting that into production, because you're going to accelerate a problem that society has tried to solve.&lt;/p&gt;
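&lt;p&gt;A toy illustration of that isolation pattern in Python. The word-overlap retriever and the stubbed model below stand in for a real vector database and a real LLM call; the point is that only the retrieved snippet ever leaves the private store.&lt;/p&gt;

```python
from collections import Counter

# Toy retriever: a real system would use learned embeddings and a vector DB.
def similarity(a, b):
    wa, wb = Counter(a.lower().split()), Counter(b.lower().split())
    return sum((wa & wb).values())

def retrieve(corpus, question):
    return max(corpus, key=lambda doc: similarity(doc, question))

def answer(question, corpus, llm):
    context = retrieve(corpus, question)  # only this snippet leaves the store
    return llm(f"Context: {context}\nQuestion: {question}")

private_docs = [
    "Invoices are stored for seven years.",
    "Deploys happen every Friday.",
]
stub_llm = lambda prompt: prompt.splitlines()[0]  # stand-in for a model call

print(answer("How long are invoices stored", private_docs, stub_llm))
# Context: Invoices are stored for seven years.
```

&lt;p&gt;Swapping the stub for a hosted model changes nothing about the flow: the bulk of the private corpus never appears in any prompt.&lt;/p&gt;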

&lt;p&gt;Jayesh Ahire: Very good points, irrespective of the industry in place. What is your personal take on the whole ethical aspect? Anything controversial that can give me a quote?&lt;/p&gt;

&lt;p&gt;Noah Gift: In terms of the ethical components of large language models, right now I think one of the biggest issues is probably piracy. When a large for-profit company is intentionally training on pirated data, that raises a lot of questions about their true intentions: are they helping the world, or making a profit? So one of the bigger issues is whether companies are respecting intellectual property and asking for consent. Really, the keyword is consent, and we see many examples of a lack of consent when training on data. Even in the case of open-source code, there are different licenses. For example, there are the MIT and Apache licenses, under which you can do anything you want. But there are also Creative Commons licenses, and some people have released their code as non-commercial Creative Commons. So if you've intentionally trained your model on code that's been licensed as Creative Commons non-commercial, then you're basically intentionally breaking the law. There's no other way to put it; the license specifically says that. So I think those kinds of questions really should be thought about when you're dealing with a company.&lt;/p&gt;

&lt;p&gt;Jayesh Ahire: Absolutely. Just one question, slightly deviating from this one: when you mentioned self-hosted models and building some of these things in-house, one of the things we keep hearing about is cost. The cost of running these models internally, versus the cost of using third-party services, which can be lower in some cases. Any thoughts on this cost of ownership, and how we can manage it to some extent?&lt;/p&gt;

&lt;p&gt;Noah Gift: Well, one way to think about it is, again, to think about your software engineering best practices. If you have very poor software engineering best practices, you're not doing continuous integration and continuous delivery, you have poor DevOps or project management skills, well, you're going to waste a lot of money on anything you do. So that might be the first place to start. Then, in terms of hosting a model, I think it really depends on what you're doing. If you're taking an open-source model and you already have an extremely efficient software system, it may actually be very efficient to host your own model compared with calling an API. One of the problems with calling an API is that you have unbounded cost. If we look at it in Big O terms, you're basically making O(n) calls: as your company gets larger and larger, and you keep making calls to the API over and over, every call is charged. On the flip side, if the model lives locally, the marginal cost is O(1). If you already have a server and you're calling some kind of generative AI workflow on it, it's essentially a fixed cost that you never have to pay for again. So I think it's really a combination of what you're already doing in terms of software engineering best practices, as well as how you reason about the number of API calls you're going to make.&lt;/p&gt;

&lt;p&gt;Jayesh Ahire: That definitely helps, and I guess that will be helpful for many of the others as well.&lt;/p&gt;

&lt;p&gt;Jayesh Ahire: So as we go through all of this: we talked about some problems, we talked about some solutions. But to end, if we want to talk about the future specifically, what do you think will be some of the interesting use cases which can become prominent in the next 2 to 3 years?&lt;/p&gt;

&lt;p&gt;Noah Gift: I think software engineering will probably become more of a collaborative workflow with lots of different agents. It's possible that you'll have 2 or 3 different chatbots watching what you're writing: maybe one of them is looking at security, another at architecture, and another at code quality. So it's almost like you could have lots of different people pair programming with you, but not slowing you down. I think we'll probably see more tooling around code. I don't think coding will be automated anytime soon, but I think there will be different workflows.&lt;/p&gt;

&lt;p&gt;Jayesh Ahire: Interesting and anything specific to security?&lt;/p&gt;

&lt;p&gt;Noah Gift: I think it's the same with security. For anything you're doing already, you could just think of it as adding additional personnel, right? So when you're looking at security incidents, you could have a chatbot helping you: looking at different outliers, giving you ideas. But it's up to you, as the domain expert, to filter all those different ideas and make sure they make sense.&lt;/p&gt;

&lt;p&gt;Jayesh Ahire: And you mentioned one thing, right? You don't think programmers will be replaced by AI at all; it will be more of a collaborative effort going forward. Do you think this applies to most of the jobs that exist right now, that most jobs will turn into that kind of fashion? Or do you feel some of those things can get replaced at some point? I know we are diverging from the topic at this point, but yeah, I'd like your view.&lt;/p&gt;

&lt;p&gt;Noah Gift: Yeah, I think for anything that requires human judgment, we're a long way off, right? If it requires an expert to make a decision, I don't think that's going to happen anytime soon. So in the case of security, sure, maybe some things could be automated, but at some point somebody's still going to have to make a decision based on that data. And that's where I think there are real gaps in what we currently have. Same with coding, same with self-driving cars: we're nowhere close to self-driving cars. And so I think the same with automation. Now, on the flip side, if what you've been doing is cutting and pasting text, yeah, I think you're going to get automated.&lt;/p&gt;

&lt;p&gt;Jayesh Ahire: Cool, absolutely. And on this specific point: there are a lot of folks who are interested in learning more about LLMs and gen AI, since this is going to be more collaborative in the future and we'll need to know these things anyway. There's a lot of interest in learning about this, right? So what would be your advice for folks who want to learn more about gen AI and LLM workflows? Any courses you have in mind? I know you have been working on one for some time.&lt;/p&gt;

&lt;p&gt;Noah Gift: Yeah, so I do a lot of work with Duke on Coursera. I also have some stuff coming up on edX that's going to get announced in Q2 2024. But basically, if you just search Coursera for Noah Gift: I have roughly 40 courses on Coursera, which is one of the largest numbers of courses by an individual. A lot of the topics are around real-world large language model usage, real-world security, real-world cloud computing. So stuff that would really apply to you immediately at work, or maybe would help you get a job. That's probably where I would point people: look at the content I've created on Coursera. There are, I think, 7 courses live right now around large language models, in the realm of LLMOps on Coursera, and that would probably be a good spot. I also have some stuff coming up in the next few months on Agile with AI, and also security with AI.&lt;/p&gt;

&lt;p&gt;Jayesh Ahire: Interesting. Yeah, that will definitely be helpful. But one of the things which keeps coming up, and which I have been discussing with a bunch of people: there are a lot of things happening every single day. There are new models coming in, new papers being written. So how do we keep up with the space, and keep ourselves updated every single day? What do you follow, or what would you recommend for people to follow?&lt;/p&gt;

&lt;p&gt;Noah Gift: Yeah, I think that's an interesting question, because it's true that there's so much stuff happening that it feels like, how could you possibly keep up? And what I would say is: don't keep up. In the 20 or 30 years that I've been in the tech industry, there's never been anything that urgent. There's never been a case where, if you didn't know something that one week, you'd lose your job. That's not how the tech industry works. What's most important are the principles. As I mentioned before, agile, DevOps, all these core principles of software engineering best practices, that's what's important. If you don't have that, it really doesn't matter what new advance is happening. If you're thinking that you can basically cut the line and get in front of people who have deep expert experience by cutting and pasting things from ChatGPT, you're going to be in for a deep surprise, right? Because that's not how the world works, cutting and pasting code. It works with automation, continuous improvement, these best practices. So I would say it's not really that important to be up to date on a week-to-week basis. It's much more important to have a deep portfolio of work that shows software engineering best practices, and then wait a little bit. Let people filter out what's important, right? Because there are so many people trying to be on top of everything. Just wait a week or two, and then you'll figure out, based on what the community has decided, what is important. And then just do that.&lt;/p&gt;

&lt;p&gt;Jayesh Ahire: That's definitely a very interesting take on the whole thing.&lt;/p&gt;

&lt;p&gt;Jayesh Ahire: Yeah, absolutely, I agree with part of this. I remember, a couple of months back, having a discussion where one thing that came up was that this feels so important because it's one of the most interesting things that has happened in the last decade. Also because everybody was doing exactly the same thing: everything was running on AWS, Azure, Google Cloud, and all of those things were becoming so standard that there was nothing new to look forward to. And then suddenly the hype started, and now everybody has something to discuss every single day. That's why I guess people keep falling for keeping up with things. But, as you rightly pointed out: don't keep up, wait and see.&lt;/p&gt;

&lt;p&gt;Noah Gift: Yeah, I mean, I'm not saying that you shouldn't try to be up to date on technology. It's just the time interval. What I mean is that it's okay to be a month behind. There's absolutely nothing of real value that's going to happen within one month; the core principles are what's important. And I would say there's actually a huge strategic advantage to letting other people figure out what's important first, and waiting a few weeks. And then if somebody figures out a new technology, great, use it. Be lazy. Stop working so hard. Be much lazier.&lt;/p&gt;

&lt;p&gt;Jayesh Ahire: Yeah, absolutely.&lt;/p&gt;

&lt;p&gt;Jayesh Ahire: Cool. So that was mostly all I had around this specific topic. As we head towards the end: I like reading myself. Do you read often, and do you have any fiction or nonfiction recommendations you've gone through recently?&lt;/p&gt;

&lt;p&gt;Noah Gift: Yeah, I read a lot of books. In terms of reading, a lot of books from around the turn of the century, the 1910s, 1920s, 1930s, have all been kind of interesting to me lately. Hemingway is an interesting author because he talks about a lot of the things that happened in World War I and World War II, the different changes. So if someone hasn't read Hemingway, I think that could be a good choice. For example, For Whom the Bell Tolls is actually a pretty good book, and maybe even a timely one: it deals with the rise of fascism in the 1930s, how a fascist dictator, Franco, came to power in Spain, and the civil war between socialism and fascism. It could be a very interesting book, especially in the world that we're in right now, with a lot of political unrest. He could be a good author to read, because he covers a lot of topics that I think we're re-addressing a hundred years later.&lt;/p&gt;

&lt;p&gt;Jayesh Ahire: Interesting. Yeah, absolutely, I'll go through that personally. And anybody who is interested in that specific era and topic, as Noah recommends, go through it. Cool, so that's mostly it. Thanks for joining us and for the insights. I think a lot of this will be very helpful for listeners, and I personally learned a few important things from the conversation as well. So yeah, thank you, and have a great evening.&lt;/p&gt;

&lt;p&gt;Noah Gift: Alright! Talk to you later. Bye.&lt;/p&gt;

&lt;p&gt;Jayesh Ahire: Bye.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Thanks for listening to (or reading) the first episode of the AI Guardrails podcast. You can find the latest episode here: &lt;a href="https://podcasters.spotify.com/pod/show/ai-guardrails"&gt;https://podcasters.spotify.com/pod/show/ai-guardrails&lt;/a&gt;. We are available on Spotify, Apple Podcasts, and any of your favorite podcast apps. What topics do you want to hear about next? Let us know in the comments!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>cloud</category>
      <category>llm</category>
      <category>security</category>
    </item>
    <item>
      <title>Getting started with Observability</title>
      <dc:creator>Jayesh Bapu Ahire</dc:creator>
      <pubDate>Mon, 09 Aug 2021 13:12:06 +0000</pubDate>
      <link>https://dev.to/hypertrace/getting-started-with-observability-3lee</link>
      <guid>https://dev.to/hypertrace/getting-started-with-observability-3lee</guid>
      <description>&lt;p&gt;Most of the tech giants including companies like Amazon, Netflix, started to build their systems using a monolithic architecture because back in the time it was much faster to set up a monolith and get the business moving. But over time as the product matures or fat growth happens, with growing systems the code gets more and more complicated. They all faced this problem and looked at microservices as a solution. One of the biggest benefits of microservices is that each microservice can be developed, scaled, and deployed independently. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--hWFr4Dhr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://thumbs.gfycat.com/SpicySpecificArrowcrab-small.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--hWFr4Dhr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://thumbs.gfycat.com/SpicySpecificArrowcrab-small.gif" alt="space-1.jpg" width="544" height="249"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;With great power comes great responsibility&lt;/em&gt;, and that’s what happened when organizations switched from monolithic application architectures to microservices: they got significant benefits in delivery speed and scalability, but on the flip side they now have to deal with the operational complexity of managing, monitoring, and securing the new distributed architecture. &lt;/p&gt;

&lt;p&gt;One of the benefits of working with older technologies was the limited set of defined failure modes. Yes, things broke, but you would pretty much know what broke at any given time, or you could find out quickly because a lot of older systems failed in pretty much the same three ways over and over again.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--O4EE5h1k--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://res.cloudinary.com/practicaldev/image/fetch/s--ql9tvPuO--/c_limit%252Cf_auto%252Cfl_progressive%252Cq_auto%252Cw_880/https://divante.com/blog/wp-content/uploads/2019/07/Frame-25.png" alt="space-1.jpg" width="800" height="419"&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;Microservice interactions at Amazon and Netflix (Image by divante.com)&lt;/em&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Adopting a single deployment platform can address some of the concerns about operational complexity, but it goes against the philosophy that makes microservice architectures effective. Using APIs to expose core business functionality and facilitate service-to-service communication gives us several control points and makes it easier to deal with complex modern applications. API-driven applications come with their own issues, like design complexity, visibility, communication, and security, which we discussed in detail in &lt;a href="https://dev.to/hypertrace/challenges-with-microservice-and-api-ecosystem-435l"&gt;this blog post&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;In a nutshell, operating distributed systems is hard, not only because of the inherent complexity of the number of components and their distribution, but also because of the unpredictability of their failure modes: there are plenty of unknown unknowns. We are left with an imperative to build systems that can be debugged, armed with evidence instead of conjecture.&lt;/p&gt;

&lt;p&gt;With the growing complexity of systems and the fast-moving software delivery trains of modern cloud-native architectures, the possible failure modes became more abundant. Monitoring tools helped us for a while in keeping track of application and infrastructure performance, but they aren't very suitable for modern distributed applications. As we discussed above, developers these days don’t know what their software's failure modes are, and more unknown unknowns means we won’t put any effort into fixing something, because we don’t know the problem exists in the first place. Standard monitoring can only help you track known unknowns. And it's relative: your monitoring is only as useful as your system is monitorable. &lt;/p&gt;

&lt;p&gt;This monitor-ableness of your modern applications is what we call "&lt;strong&gt;Observability&lt;/strong&gt;". &lt;/p&gt;

&lt;p&gt;In control theory, &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Observability is defined as a measure of how well internal states of a system can be inferred from knowledge of that system’s external outputs. Simply put, observability is how well you can understand your complex system.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Metrics, events, logs, and traces—or MELT—are at the core of Observability. But, Observability is about a whole lot more than just data.&lt;/p&gt;

&lt;p&gt;Observability is all about the ability to ask abstract questions of your system and find the answer without needing to open a black box. For example, suppose the process of placing an order on Amazon failed due to a query timeout: what characteristics did the queries that timed out at 500ms share in common? Service versions? Browser plugins? Here, instrumentation produces data, which is what we call telemetry, and querying that data answers our questions.&lt;/p&gt;
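&lt;p&gt;That kind of question can be answered directly from telemetry once requests carry attributes. Here is a toy sketch over span-like records in Python (the field names are illustrative; real spans would come from your tracing backend):&lt;/p&gt;

```python
from collections import Counter

# Illustrative span records; real ones would come from a tracing backend.
spans = [
    {"duration_ms": 520, "service_version": "v2.3", "status": "timeout"},
    {"duration_ms": 610, "service_version": "v2.3", "status": "timeout"},
    {"duration_ms": 90,  "service_version": "v2.2", "status": "ok"},
    {"duration_ms": 540, "service_version": "v2.2", "status": "timeout"},
]

def shared_traits(spans, threshold_ms=500, key="service_version"):
    """What do the slow requests have in common along one dimension?"""
    slow = [s for s in spans if s["duration_ms"] >= threshold_ms]
    return Counter(s[key] for s in slow)

print(shared_traits(spans))  # Counter({'v2.3': 2, 'v2.2': 1})
```

&lt;p&gt;Changing &lt;code&gt;key&lt;/code&gt; lets you slice the same slow requests by any other attribute (browser, region, plugin), which is exactly the ad hoc questioning that distinguishes Observability from fixed dashboards.&lt;/p&gt;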

&lt;p&gt;Whenever we talk about Observability, we also talk about metrics, logs, and traces, which are the three pillars of Observability. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Metrics&lt;/strong&gt;: Aggregated summary statistics.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logs&lt;/strong&gt;: Detailed debugging information emitted by processes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Distributed Tracing&lt;/strong&gt;: Provides insights into the full lifecycles, aka traces of requests to a system, allowing you to pinpoint failures and performance issues.&lt;/li&gt;
&lt;/ul&gt;
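&lt;p&gt;A stdlib-only sketch of the three pillars in one request path (real systems would emit these through an instrumentation library such as OpenTelemetry; the names here are illustrative):&lt;/p&gt;

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("checkout")

request_count = {"checkout": 0}  # a metric: an aggregated counter

@contextmanager
def span(name):
    """A toy trace span; real spans also carry IDs and propagate context."""
    start = time.perf_counter()
    try:
        yield
    finally:
        log.info("%s took %.1f ms", name, (time.perf_counter() - start) * 1000)

def place_order(order_id):
    request_count["checkout"] += 1  # metric: count every request
    with span("place_order"):       # trace: time this unit of work
        log.info("processing order %s", order_id)  # log: debugging detail

place_order("o-42")
```

&lt;p&gt;Each pillar answers a different question: the counter tells you how often, the log tells you what happened, and the span tells you where the time went.&lt;/p&gt;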

&lt;p&gt;We will discuss all three of these in detail in the upcoming articles in this series. &lt;/p&gt;

&lt;h3&gt;
  
  
  In a nutshell,
&lt;/h3&gt;

&lt;p&gt;This blog post gave you a brief overview of Observability and why you need it. In the upcoming blog posts, we will talk about metrics, logs, and traces, and also look at different applications of modern Observability for modern distributed applications. &lt;/p&gt;

&lt;p&gt;Until then, if your organization is using a microservice architecture and exploring Observability solutions, feel free to check out &lt;a href="https://github.com/hypertrace/hypertrace"&gt;Hypertrace&lt;/a&gt;, a modern API Observability platform. You can also &lt;a href="https://bit.ly/hypertrace-community-slack"&gt;join our Slack community&lt;/a&gt; to interact with folks who are on the same microservice transition journey and are exploring Observability. &lt;/p&gt;

&lt;h4&gt;
  
  
  References
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.bmc.com/blogs/observability-vs-monitoring/"&gt;https://www.bmc.com/blogs/observability-vs-monitoring/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.infoworld.com/article/3607980/what-is-observability-software-monitoring-on-steroids.html"&gt;https://www.infoworld.com/article/3607980/what-is-observability-software-monitoring-on-steroids.html&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://en.wikipedia.org/wiki/Observability"&gt;https://en.wikipedia.org/wiki/Observability&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.brighttalk.com/webcast/18765/499011"&gt;https://www.brighttalk.com/webcast/18765/499011&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://anchor.fm/talkin-observability"&gt;https://anchor.fm/talkin-observability&lt;/a&gt; &lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>monitoring</category>
      <category>discuss</category>
      <category>opensource</category>
      <category>devops</category>
    </item>
    <item>
      <title>Challenges with Microservice and API ecosystem</title>
      <dc:creator>Jayesh Bapu Ahire</dc:creator>
      <pubDate>Tue, 01 Jun 2021 10:59:34 +0000</pubDate>
      <link>https://dev.to/hypertrace/challenges-with-microservice-and-api-ecosystem-435l</link>
      <guid>https://dev.to/hypertrace/challenges-with-microservice-and-api-ecosystem-435l</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/hypertrace/evolution-of-api-and-microservice-ecosystem-part-1-399a"&gt;last post of this series&lt;/a&gt;, we discussed the evolution of microservices and API ecosystems and what are the different benefits microservices and APIs offer when it comes to building large, scalable, and efficient systems. As I said in the last part (actually as uncle Ben said this), with great power comes great responsibility and the same thing applies to microservices and API architectures. These complex systems bring a lot of challenges with them and we will discuss those challenges in this part. &lt;/p&gt;

&lt;p&gt;There are many challenges when it comes to adopting microservices or dealing with microservice architectures. We will be discussing a few of the important ones in this blog post. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Complexity comes in different forms: it can be design complexity or operational complexity. With microservice architectures, you have to deal with both. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Design:&lt;/strong&gt; When it comes to distributed systems, complexity is a given. One way to mitigate it is to create optimal abstractions. Effective modularization of a complex monolith requires proper, accurate service boundary definitions, but creating those definitions is hard as well; that’s where design thinking comes into the picture. Defining clear boundaries and responsibilities for each service is complicated, and developers have to use a data-centric approach to arrive at a proper conclusion here. Compared to microservice design, API design is mature and has defined standard practices. API design thinking helps identify the right service boundaries and establish loose coupling between services, so that implementation details don’t leak through.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operational:&lt;/strong&gt; The main benefit of microservices is that you can develop, deploy, and upgrade every service independently. This same benefit becomes the biggest pain point for small teams adopting microservice architectures: now there may be 60 services to be managed by 10 people, and that operational load is very high. Maintaining and continuously monitoring these complex architectures becomes very hard after a point. Some of the major pain points in operations management are:

&lt;ul&gt;
&lt;li&gt;Monitoring&lt;/li&gt;
&lt;li&gt;Optimizations and scaling&lt;/li&gt;
&lt;li&gt;Fault tolerance&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Communication:&lt;/strong&gt; Microservices have to communicate with each other to get things done. This requires some infrastructure-layer configuration to enable resource sharing across services, which ultimately lets these microservices talk to each other. These configurations, if not optimized properly, can result in high latency and error rates. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Security:&lt;/strong&gt; As discussed in the point above, microservices use infrastructure-layer configuration to talk to each other. This, along with multi-environment (multi-cloud as well as on-prem) deployments, results in even less visibility and creates many vulnerable points, collectively increasing the risk of a security attack. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
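&lt;p&gt;The communication concerns above are commonly mitigated on the client side with explicit timeouts and bounded retries. A minimal Python sketch (the flaky service below is a stand-in for a real HTTP call to another service):&lt;/p&gt;

```python
import time

def call_with_retries(fn, retries=3, timeout_s=0.5, backoff_s=0.01):
    """Bound both latency (timeout) and error amplification (retry budget)."""
    last_exc = None
    for attempt in range(retries):
        try:
            return fn(timeout=timeout_s)
        except OSError as exc:  # network-style failure
            last_exc = exc
            time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
    raise last_exc

# Stand-in for an HTTP client call that fails twice, then succeeds.
attempts = {"n": 0}
def flaky_service(timeout):
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise OSError("connection reset")
    return "200 OK"

print(call_with_retries(flaky_service))  # 200 OK
```

&lt;p&gt;In practice this logic usually lives in infrastructure configuration (an HTTP client, library, or service mesh) rather than application code, which is exactly why those configurations matter so much for latency and error rates.&lt;/p&gt;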

&lt;p&gt;Microservice frameworks in general come with many security challenges, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data is distributed, so maintaining privacy, confidentiality, and integrity is hard.&lt;/li&gt;
&lt;li&gt;Setting up access control and service-level authentication is tricky, and sometimes makes services more prone to attack.&lt;/li&gt;
&lt;li&gt;Finding the origin of an attack and the affected services can be tricky, depending on the size of the architecture.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most of the challenges we discussed above have common origins, the most common being a lack of visibility into the system due to complex architectures and infrastructure. Observability into these microservice architectures and APIs can help us understand these complex interactions better and solve design, operational, communication, and security issues. &lt;/p&gt;

&lt;p&gt;How? Going forward in this series, we will discuss API Observability and how we can use it to solve these complex but very important issues in the API and microservice ecosystem. &lt;/p&gt;

&lt;p&gt;Until then, if your organization is using a microservice architecture and exploring Observability solutions, feel free to check out &lt;a href="https://github.com/hypertrace/hypertrace"&gt;Hypertrace&lt;/a&gt;, a modern API Observability platform. If you are in transition and want to learn more about Observability and instrumentation, &lt;a href="https://bit.ly/hypertrace-community-slack"&gt;join our Slack community&lt;/a&gt; to interact with folks who have been through, or are going through, this transition. &lt;/p&gt;

&lt;h3&gt;
  
  
  References:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.mertech.com/blog/7-api-management-challenges-and-how-to-solve-them"&gt;https://www.mertech.com/blog/7-api-management-challenges-and-how-to-solve-them&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.bmc.com/blogs/microservices-challenges-when-to-avoid/"&gt;https://www.bmc.com/blogs/microservices-challenges-when-to-avoid/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.gartner.com/en/documents/302868"&gt;https://www.gartner.com/en/documents/302868&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.broadcom.com/doc/microsurfaces-the-role-of-apis-in-a-microservice-architecture"&gt;https://docs.broadcom.com/doc/microsurfaces-the-role-of-apis-in-a-microservice-architecture&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.thoughtworks.com/insights/blog/microservices-nutshell"&gt;https://www.thoughtworks.com/insights/blog/microservices-nutshell&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dzone.com/articles/challenges-in-implementing-microservices"&gt;https://dzone.com/articles/challenges-in-implementing-microservices&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.infoworld.com/article/3075880/microservice-architecture-is-agile-software-architecture.html"&gt;https://www.infoworld.com/article/3075880/microservice-architecture-is-agile-software-architecture.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>opensource</category>
      <category>cloud</category>
      <category>discuss</category>
      <category>architecture</category>
    </item>
    <item>
      <title>4 Easy Ways to Contribute to an Open Source Project</title>
      <dc:creator>Jayesh Bapu Ahire</dc:creator>
      <pubDate>Thu, 20 May 2021 10:35:30 +0000</pubDate>
      <link>https://dev.to/hypertrace/4-easy-ways-to-contribute-to-an-open-source-project-3f0d</link>
      <guid>https://dev.to/hypertrace/4-easy-ways-to-contribute-to-an-open-source-project-3f0d</guid>
      <description>&lt;p&gt;Want to contribute to an Open Source project but don't know where to start? You're in luck! This article will explain 4 easy ways you can contribute to an Open Source project as well as a few Hypertrace contributions you can make as examples.&lt;/p&gt;

&lt;p&gt;Let's get started, shall we?&lt;/p&gt;

&lt;h2&gt;
  
  
  Documentation
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fmedia2.giphy.com%2Fmedia%2FxT4uQwLt2AyurOGWFW%2Fgiphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fmedia2.giphy.com%2Fmedia%2FxT4uQwLt2AyurOGWFW%2Fgiphy.gif" alt="documenting for OSS project"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Do you like to write? Do you like sharing with people? As the face of an open source project, documentation is one of the most important contributions, and every open source project welcomes it. If you are just getting started with Distributed Tracing or Hypertrace and felt we were missing documentation details, especially on our Getting Started page, go ahead and raise a pull request in &lt;a href="https://github.com/hypertrace/hypertrace-docs-website" rel="noopener noreferrer"&gt;Hypertrace-docs&lt;/a&gt;. We will be more than happy to review and add your suggestions.&lt;/p&gt;

&lt;p&gt;Here are a few important documentation categories you can contribute to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Getting Started and Installation&lt;/li&gt;
&lt;li&gt;FAQs&lt;/li&gt;
&lt;li&gt;UI &amp;amp; Platform Overview&lt;/li&gt;
&lt;li&gt;Anything else users can see&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Code / Features / Bug Fixes
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsteamcdn-a.akamaihd.net%2Fsteamcommunity%2Fpublic%2Fimages%2Fclans%2F32755091%2F4756e5ee81cd570346e2cc9ff9d08af36e51518f.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsteamcdn-a.akamaihd.net%2Fsteamcommunity%2Fpublic%2Fimages%2Fclans%2F32755091%2F4756e5ee81cd570346e2cc9ff9d08af36e51518f.gif"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Do you like to code and want to contribute code to an open source project? Here are a few ways to find an issue you want to fix or a feature you want to add.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to the GitHub &lt;code&gt;Issues&lt;/code&gt; tab of the open source project repository you would like to contribute to. For Hypertrace, you can find open issues &lt;a href="https://github.com/hypertrace/hypertrace/issues" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;If you are contributing for the first time, look for issues with the &lt;code&gt;good first issue&lt;/code&gt; label. Otherwise, just find something that interests you and start working on it.&lt;/li&gt;
&lt;li&gt;Do you have a feature or enhancement idea you want to work on? Open an issue or start a &lt;a href="https://github.com/hypertrace/hypertrace/discussions/categories/ideas" rel="noopener noreferrer"&gt;discussion&lt;/a&gt; thread. Once you get some feedback, start working!&lt;/li&gt;
&lt;li&gt;If you can't find anything, you can also write missing unit tests. No maintainer will say no to more or better tests in the suite. Here's an article to help you get started: &lt;a href="https://www.toptal.com/qa/how-to-write-testable-code-and-why-it-matters" rel="noopener noreferrer"&gt;https://www.toptal.com/qa/how-to-write-testable-code-and-why-it-matters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;If you find a bug while trying the project, report it right away; that itself is a valuable contribution.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you are a first-time contributor, here are a few articles that might help you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://towardsdatascience.com/five-tips-for-contributing-to-open-source-software-8fd3d6f5606f" rel="noopener noreferrer"&gt;Five Tips for Contributing to Open Source&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://medium.com/@kevinjin/contributing-to-open-source-walkthrough-part-0-b3dc43e6b720" rel="noopener noreferrer"&gt;Contributing to Open Source&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;&lt;a href="https://css-tricks.com/how-to-contribute-to-an-open-source-project" rel="noopener noreferrer"&gt;How to Contribute to an Open Source Project&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.freecodecamp.org/news/the-definitive-guide-to-contributing-to-open-source-900d5f9f2282" rel="noopener noreferrer"&gt;The Definitive Guide to Contributing to Open Source&lt;/a&gt; &lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Evangelism
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fmax%2F720%2F1%2A4oAYsSrlX9nXALxkKG9ZNA.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fmax%2F720%2F1%2A4oAYsSrlX9nXALxkKG9ZNA.gif" alt="Evangelising OSS projects"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Would you like to share your experience using an open source project? Enterprise products have marketing teams to get the word out, but open source projects depend on their users. You can join the Hypertrace user community &lt;a href="https://hypertrace.org/get-started" rel="noopener noreferrer"&gt;here&lt;/a&gt;. Once you join, you can talk with other users about their experience and perhaps even chat with the creators of the project. Once you are comfortable, you can write a blog post explaining how the project helped you and what problems it solved.&lt;/p&gt;

&lt;p&gt;Similarly, you can also share your experience in a meetup or online event or create a tutorial video which will help others learn about the project.&lt;/p&gt;

&lt;p&gt;If you want to write a blog post about Hypertrace, take a look at blog.hypertrace.org and let us know what you are interested in writing about. You can reach us in the Welcome channel on Slack.&lt;/p&gt;

&lt;h2&gt;
  
  
  Helping others
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia0.giphy.com%2Fmedia%2F6yxIP39EMwP7IlIA28%2F200.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia0.giphy.com%2Fmedia%2F6yxIP39EMwP7IlIA28%2F200.gif" alt="Always help each other"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Are you experienced but short on time? Be part of the Slack and GitHub communities and answer questions there, or share your thoughts on open issues and feature ideas.&lt;/p&gt;

&lt;p&gt;You can also help in reviewing PRs, moderating discussions or reporting bugs.&lt;/p&gt;

&lt;p&gt;Videos are very popular these days. A short tutorial or sample app can help others build a deeper understanding of the project. Even the smallest contribution can have a big impact on an open source project.&lt;/p&gt;

&lt;p&gt;So, what are you waiting for? Pick any of these ideas and dive into the world of Open Source. If you ask me, start with documentation: it will give you a deeper understanding of the project and help you build confidence too.&lt;/p&gt;

&lt;h3&gt;
  
  
  On that note,
&lt;/h3&gt;

&lt;p&gt;We are looking for contributors to help us build &lt;a href="https://github.com/hypertrace/hypertrace" rel="noopener noreferrer"&gt;Hypertrace&lt;/a&gt;, an Open Source distributed tracing and Observability platform. Any contributions you make are greatly appreciated. Feel free to reach out to us on &lt;a href="https://github.com/hypertrace/hypertrace/discussions" rel="noopener noreferrer"&gt;GitHub Discussions&lt;/a&gt; or &lt;a href="https://join.slack.com/t/hypertrace/shared_invite/zt-oln0psj9-lm1CSkXE1vsWdcw6YKWGDg" rel="noopener noreferrer"&gt;join us on Slack&lt;/a&gt; to learn more. &lt;/p&gt;

</description>
      <category>opensource</category>
      <category>contributorswanted</category>
      <category>codenewbie</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Evolution of API and Microservice ecosystem</title>
      <dc:creator>Jayesh Bapu Ahire</dc:creator>
      <pubDate>Tue, 18 May 2021 14:30:04 +0000</pubDate>
      <link>https://dev.to/hypertrace/evolution-of-api-and-microservice-ecosystem-part-1-399a</link>
      <guid>https://dev.to/hypertrace/evolution-of-api-and-microservice-ecosystem-part-1-399a</guid>
      <description>&lt;p&gt;In this first installment in a series on the evolution of API and microservices ecosystem, we will learn about how microservices and APIs became industry standards and what are different benefits microservices offer. We will explore different challenges with this ecosystem and how to solve them in the second part of this series. &lt;/p&gt;

&lt;p&gt;Most tech giants, including Amazon and Netflix, started building their systems with a monolithic architecture because, at the time, it was much faster to set up a monolith and get the business moving. But as a product matures or rapid growth happens, the code gets more and more complicated. They all faced this problem and turned to microservices as a solution. One of the biggest benefits of microservices is that each one can be developed, scaled, and deployed independently: you can replace or upgrade any part of the system without affecting the whole. &lt;/p&gt;

&lt;p&gt;But, &lt;em&gt;what is microservice architecture?&lt;/em&gt; &lt;a href="https://www.martinfowler.com/articles/microservices.html"&gt;As called out by James Lewis and Martin Fowler&lt;/a&gt;, The term "Microservice Architecture" has sprung up over the last few years to describe a particular way of designing software applications as suites of independently deployable services. While there is no precise definition of this architectural style, there are certain common characteristics around organization around business capability, automated deployment, intelligence in the endpoints, and decentralized control of languages and data. &lt;br&gt;
As you might guess, these microservices talk to each other via APIs; two of the most commonly used protocols are HTTP request-response with resource APIs and lightweight messaging. Although, as mentioned above, companies like Netflix and Amazon have been using microservices for quite a long time, many smaller organizations have also started adopting API-first or microservice-driven architectures recently, because &lt;a href="https://www.infoq.com/articles/web-apis-business-perspective/"&gt;APIs have become the heart of the global tech industry in the past decade.&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Rise of Microservices
&lt;/h2&gt;

&lt;p&gt;Let’s dive more into the origins of APIs and Microservices as they share a common origin story. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rise of service-oriented architectures&lt;/li&gt;
&lt;li&gt;Rise of cloud computing and managed services&lt;/li&gt;
&lt;li&gt;Rise of decentralization movement&lt;/li&gt;
&lt;li&gt;Rise of Agile movement&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Rise of service-oriented architectures
&lt;/h3&gt;

&lt;p&gt;As we all know, building distributed systems is hard, and managing them is even harder. The rise of the web opened the door to innovation in how we build distributed systems, and that's where service-oriented architecture (SOA) came into the picture. &lt;a href="https://www.gartner.com/en/documents/302868"&gt;Gartner&lt;/a&gt; defined SOA as a style of multi-tier computing that helps organizations share logic and data among multiple applications and usage modes.&lt;/p&gt;

&lt;p&gt;Though ultimately a failed movement, SOA did a lot to set the microservices movement in motion; organizations like Netflix and Amazon were calling their architectures SOAs before the term microservices caught on. But due to the centralized nature of the ESB topology, among other reasons, SOAs increased complexity and introduced bottlenecks, and the costs of implementing SOA infrastructure (based on the ESB, registry, and service platform template) were excessive.  &lt;/p&gt;

&lt;p&gt;Due to these problems, people started looking for better alternatives.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rise of cloud computing and managed services
&lt;/h3&gt;

&lt;p&gt;RESTful web APIs arose as a lighter-weight alternative to SOAP services -- a style of interconnecting applications that had evolved organically on the web. The distributed nature of cloud infrastructure challenged the placement of the centralized ESB topology.&lt;br&gt;
Cloud computing removed barriers to deployment and provided a variety of new use cases for APIs, serving as a platform for deploying more granular, API-fronted application components. Cloud services gave organizations another reason to move toward more service-oriented, modular deployment architectures, and here we are today, looking at complex modern architectures. &lt;/p&gt;

&lt;h3&gt;
  
  
  Rise of decentralization movement
&lt;/h3&gt;

&lt;p&gt;As we discussed earlier, at one point in time, service-oriented computing became an increasingly popular paradigm for modeling and building distributed systems in open and heterogeneous environments. However, proposed service-oriented architectures are typically based on centralized components, such as service registries or service brokers, that introduce reliability, management, and performance issues. &lt;/p&gt;

&lt;p&gt;During this whole time, the capabilities and scale of distributed systems kept increasing. The trend toward decentralization, in both systems themselves and the organizations supporting them, started to catch on, and the decentralization movement began. &lt;/p&gt;

&lt;h3&gt;
  
  
  Rise of Agile movement
&lt;/h3&gt;

&lt;p&gt;In his blog post titled “&lt;a href="http://www.codingthearchitecture.com/2013/09/03/what_is_agile_software_architecture.html"&gt;Coding the Architecture&lt;/a&gt;”, &lt;a href="https://twitter.com/simonbrown"&gt;Simon Brown&lt;/a&gt; pointed out that agile architecture does not naturally emerge from agile development practices. Rather, it must be consciously sought. Note that his description of agile software architecture is a perfect match for microservice architecture. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If we look at the characteristics of agile software architecture, we tend to think of something that is built using a collection of small, loosely coupled components/services that collaborate together to satisfy an end-goal. This style of architecture provides agility in a number of ways. Small, loosely coupled components/services can be built, modified, and tested in isolation, or even ripped out and replaced depending on how requirements change. This style of architecture also lends itself well to a very flexible and adaptable deployment model, since new components/services can be added and scaled if needed.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The agile software movement arose as a reaction to the same centralized approach to enterprise IT that hampered the SOA movement. Agile’s popularity and success in software development led to the CI/CD approach to software deployment, followed by the cultural philosophy of the DevOps movement. Between CI/CD, DevOps, agile development, and progressive delivery, the software delivery train also started speeding up.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benefits of Microservices
&lt;/h2&gt;

&lt;p&gt;Now that we have gone through the reasons behind the rise of microservices, let's try to understand what business value they provide. &lt;/p&gt;

&lt;p&gt;The main driver behind any organization's move to microservices is speed and agility in software delivery at scale. Reduced cross-team coordination, the freedom to build services in different languages, flexible deployments, and easier manageability are some of the additional perks that organizations with microservice architectures enjoy. &lt;/p&gt;

&lt;p&gt;Many of the benefits that come with microservice architectures stem from the API-first nature of microservices. Here are a few of them: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Composability:&lt;/strong&gt; When services are published through an API, it is easier to use them in multiple business contexts to assist in various business processes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Testability:&lt;/strong&gt; When services are accessible over a network boundary, it is easier to isolate tests and exercise individual components of the system&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability:&lt;/strong&gt; Each microservice can be scaled autonomously without disrupting the other microservices that comprise the application. When demand increases, you only need to upgrade or divert more resources to the microservice affected by the increasing demands. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evolvability:&lt;/strong&gt; When services are exposed through an API, implementation details can be hidden from the consumer, making it easier to change components without impacting dependent parts of the system &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Comprehensibility:&lt;/strong&gt; When a complex system is broken down into modular APIs, it is easier to understand the overall business functionality of the system, which helps in both designing and maintaining the system&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatability:&lt;/strong&gt; Along with the data plane API benefits above, control plane APIs allow automation in the deployment and management of microservices, thus increasing the velocity of software delivery&lt;/li&gt;
&lt;/ul&gt;
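&lt;p&gt;The testability point above can be made concrete with a small sketch: when a component sits behind an HTTP API, a consumer can exercise it in isolation, over the same network boundary other services would use. The tiny price service below is hypothetical (not part of any real system) and uses only the Python standard library.&lt;/p&gt;

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class PriceService(BaseHTTPRequestHandler):
    """A stand-in microservice exposing one endpoint: GET /price."""
    def do_GET(self):
        if self.path == "/price":
            body = json.dumps({"price": 42}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # keep test output quiet
        pass

# Bind to port 0 so the OS picks a free port; run the service in the background.
server = HTTPServer(("127.0.0.1", 0), PriceService)
threading.Thread(target=server.serve_forever, daemon=True).start()

# A test can now exercise the service exactly as other services would call it.
url = f"http://127.0.0.1:{server.server_address[1]}/price"
with urllib.request.urlopen(url) as resp:
    payload = json.load(resp)

server.shutdown()
print(payload)  # {'price': 42}
```

The same pattern works in reverse: a consumer under test can be pointed at a stub service like this one instead of a real downstream dependency, which is what makes components behind network boundaries easy to isolate.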

&lt;p&gt;But as we know &lt;em&gt;with great power comes great responsibility&lt;/em&gt; and the same thing applies to microservices and API ecosystems as well. These complex systems bring a lot of challenges with them and we will discuss those challenges in the next part of this blog post. &lt;/p&gt;

&lt;p&gt;Until then, if your organization uses a microservice architecture and is exploring Observability solutions, feel free to check out our Open Source Observability platform, &lt;a href="https://github.com/hypertrace/hypertrace"&gt;Hypertrace&lt;/a&gt;. If you are in transition and want to learn more about Observability, or want to contribute to Hypertrace, &lt;a href="https://bit.ly/hypertrace-community-slack"&gt;join our Slack community&lt;/a&gt; to interact with folks who have been through, or are going through, this transition. &lt;/p&gt;

&lt;h3&gt;
  
  
  References:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.martinfowler.com/articles/microservices.html"&gt;https://www.martinfowler.com/articles/microservices.html&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.infoq.com/articles/web-apis-business-perspective/"&gt;https://www.infoq.com/articles/web-apis-business-perspective/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.gartner.com/en/documents/302868"&gt;https://www.gartner.com/en/documents/302868&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.infoworld.com/article/3080611/learning-from-soa-5-lessons-for-the-microservices-era.html"&gt;https://www.infoworld.com/article/3080611/learning-from-soa-5-lessons-for-the-microservices-era.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.broadcom.com/doc/microsurfaces-the-role-of-apis-in-a-microservice-architecture"&gt;https://docs.broadcom.com/doc/microsurfaces-the-role-of-apis-in-a-microservice-architecture&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.thoughtworks.com/insights/blog/microservices-nutshell"&gt;https://www.thoughtworks.com/insights/blog/microservices-nutshell&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://link.springer.com/article/10.1007/s12083-009-0062-6"&gt;https://link.springer.com/article/10.1007/s12083-009-0062-6&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.infoworld.com/article/3075880/microservice-architecture-is-agile-software-architecture.html"&gt;https://www.infoworld.com/article/3075880/microservice-architecture-is-agile-software-architecture.html&lt;/a&gt; &lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>cloud</category>
      <category>architecture</category>
      <category>kubernetes</category>
      <category>agile</category>
    </item>
    <item>
      <title>The ethics of AI</title>
      <dc:creator>Jayesh Bapu Ahire</dc:creator>
      <pubDate>Wed, 06 May 2020 11:12:32 +0000</pubDate>
      <link>https://dev.to/twilio/the-ethics-of-ai-3ikf</link>
      <guid>https://dev.to/twilio/the-ethics-of-ai-3ikf</guid>
      <description>&lt;p&gt;Whether in daily mobility, in industrial applications or in the form of assistance solutions at home: Artificial Intelligence permeates an ever wider range of our lives. It is associated with great hopes, but it also raises fears. Therefore, the call for ethical guidelines regarding the new technologies is becoming increasingly louder.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TJ0Sk5g6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://media1.giphy.com/media/l4pTsNgkamxfk2ZLq/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TJ0Sk5g6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://media1.giphy.com/media/l4pTsNgkamxfk2ZLq/giphy.gif" alt="" width="480" height="264"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;There are various questions we need to find answers to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How can the “right” values be integrated into technical action? &lt;/li&gt;
&lt;li&gt;Can and may AI determine what is right? &lt;/li&gt;
&lt;li&gt;How to ensure that AI is developed, used and provided in the service of mankind? &lt;/li&gt;
&lt;li&gt;And how can the decision-making processes of AI, including wrong decisions, be made transparent, given the huge, complex amounts of data?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We organized a panel discussion on the importance of implementing ethical practices within your predictive models, data workflows, products, and AI research. I was part of the panel along with &lt;a href="https://dev.to/newfront"&gt;Scott Haines&lt;/a&gt;, &lt;a href="https://dev.to/lizziepika"&gt;Lizzie Siegle&lt;/a&gt;, and &lt;a href="https://dev.to/thenickwalsh"&gt;Nick Walsh&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;In this article, we will go through some of the points the panel discussed and the panelists' views on various topics, along with my own view on each.&lt;/p&gt;

&lt;p&gt;For the sake of simplicity, we will discuss further details into 3 broad categories:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Fairness &amp;amp; Bias&lt;/li&gt;
&lt;li&gt;Interpretability &amp;amp; Explainability&lt;/li&gt;
&lt;li&gt;Privacy&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let's jump into it!&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Fairness &amp;amp; Bias:
&lt;/h2&gt;

&lt;p&gt;Bias is often identified as one of the major risks associated with artificial intelligence (AI) systems. As &lt;em&gt;A Survey on Bias and Fairness in Machine Learning&lt;/em&gt; suggests, there are clear benefits to algorithmic decision-making; unlike people, machines do not become tired or bored, and can take into account orders of magnitude more factors than people can. However, like people, algorithms are vulnerable to biases that render their decisions “unfair”. In the context of decision-making, fairness is the absence of any prejudice or favoritism toward an individual or a group based on their inherent or acquired characteristics. Thus, an unfair algorithm is one whose decisions are skewed toward a particular group of people.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--SgsrAi4h--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://languagelog.ldc.upenn.edu/myl/SMBC_AI_Firing.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SgsrAi4h--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://languagelog.ldc.upenn.edu/myl/SMBC_AI_Firing.png" width="684" height="813"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The public discussion about bias in such scenarios often assigns blame to the algorithm itself. The algorithm, it is said, has made the wrong decision, to the detriment of a particular group. But this claim fails to take into account the human component: People perceive bias through the subjective lens of fairness.&lt;/p&gt;

&lt;p&gt;So, bias occurs when we discriminate against (or promote) a defined group, consciously or unconsciously; it can creep into an AI system as a result of skewed data or an algorithm that does not account for skewed data. Fairness, meanwhile, is a social construct. In fact, when people judge an algorithm to be “biased,” they are often conflating bias and fairness: they are using a specific definition of fairness to pass judgment on the algorithm. &lt;/p&gt;

&lt;p&gt;No AI system can be universally fair or unbiased as there are more than 20 definitions of fairness and it is impossible for every decision to be fair to all parties. But we can design these systems to meet specific fairness goals, thus mitigating some of the perceived unfairness and creating a more responsible system overall.&lt;/p&gt;
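&lt;p&gt;To make "specific fairness goals" concrete, here is a minimal sketch, in plain Python with made-up decision data, of checking one of those many fairness definitions: demographic parity, which asks whether two groups receive favorable decisions at the same rate.&lt;/p&gt;

```python
def selection_rate(decisions):
    """Fraction of favorable (1) decisions in a list of 0/1 outcomes."""
    return sum(decisions) / len(decisions)

def demographic_parity_gap(decisions_a, decisions_b):
    """Absolute difference in selection rates between two groups.
    0.0 means the decisions satisfy demographic parity exactly."""
    return abs(selection_rate(decisions_a) - selection_rate(decisions_b))

# Hypothetical loan decisions (1 = approved) for two demographic groups.
group_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]   # 70% approved
group_b = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]   # 30% approved

gap = demographic_parity_gap(group_a, group_b)
print(f"demographic parity gap: {gap:.2f}")  # 0.40
```

A system tuned to this goal would aim to drive the gap toward zero; a different fairness definition (say, equal error rates across groups) would need a different metric, which is exactly why no single system can satisfy all of them at once.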

&lt;p&gt;This discussion also brings us to a question: can we relate the trolley problem here? &lt;/p&gt;

&lt;p&gt;Let's first understand what the trolley problem is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A runaway trolley is heading down the tracks toward five workers who will all be killed if the trolley proceeds on its present course. Adam is standing next to a large switch that can divert the trolley onto a different track. The only way to save the lives of the five workers is to divert the trolley onto another track that only has one worker on it. If Adam diverts the trolley onto the other track, this one worker will die, but the other five workers will be saved.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--54SKUErX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://thumbs.gfycat.com/SneakyUnrealisticFlee-size_restricted.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--54SKUErX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://thumbs.gfycat.com/SneakyUnrealisticFlee-size_restricted.gif" alt="" width="500" height="281"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Should Adam flip the switch, killing the one worker but saving the other five?&lt;/em&gt;&lt;br&gt;
The original trolley problem came from a paper about the ethics of abortion, written by English philosopher Philippa Foot. It’s the third of three increasingly complicated thought experiments she offers to help readers assess whether intentionally harming someone (e.g. choosing to hit them) is morally equivalent to merely allowing harm to occur (choosing not to intervene to stop them getting hit). &lt;/p&gt;

&lt;p&gt;Apart from the version above, there are various modern-day variations of this problem, which have been used in social psychology studies conducted by different universities. You can try the &lt;a href="http://moralmachine.mit.edu/"&gt;Moral Machine&lt;/a&gt; by MIT. &lt;/p&gt;

&lt;p&gt;Who else is ready to accept this as a solution?&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--OS31KDj1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://thumbs.gfycat.com/HomelyIckyBlackfly-size_restricted.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--OS31KDj1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://thumbs.gfycat.com/HomelyIckyBlackfly-size_restricted.gif" alt="" width="444" height="250"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Jokes apart, is this applicable in the case of AI?&lt;br&gt;
&lt;a href="https://qz.com/author/simon-beard/"&gt;Simon Beard&lt;/a&gt; explains in his article that this philosophical issue is irrelevant to self-driving cars because they don’t have intentions. Those who think intentions are morally significant tend to hold strong views about our freedom of will. But machines don’t have free will (at least not yet). Thus, as far back as Isaac Asimov’s three laws of robotics, we’ve recognized that for machines, harming someone and, “through inaction, allow[ing] a human being to come to harm,” are morally equivalent.&lt;/p&gt;

&lt;p&gt;This seems like a logical explanation: the trolley problem is very situational, while AI operates in real-world scenarios that are extremely uncertain, so you can't simply apply universal laws of ideal behavior to machines. AI tends to learn continuously from its mistakes, reaching a state with better decision-making capabilities with each iteration.&lt;/p&gt;

&lt;p&gt;This brings us to the second point of our discussion: how important are the explainability and interpretability of decisions made by AI, considering that the ever increasing lack of transparency has only become more visible amid the techlash?&lt;/p&gt;
&lt;h2&gt;
  
  
  2. Interpretability &amp;amp; Explainability
&lt;/h2&gt;

&lt;p&gt;When it comes to machine learning and artificial intelligence, explainability and interpretability are often used interchangeably.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GSxNoCtQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://i.imgflip.com/2yc1nf.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GSxNoCtQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://i.imgflip.com/2yc1nf.jpg" alt="" width="800" height="405"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;According to &lt;a href="https://arxiv.org/abs/1706.07269"&gt;Miller&lt;/a&gt;, interpretability is the degree to which a human can understand the cause of a decision. Interpretable predictions lead to better trust and provide insight into how the model may be improved. We can also define interpretability as the extent to which we are able to predict what is going to happen, given a change in input or algorithmic parameters. &lt;/p&gt;
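&lt;p&gt;A minimal sketch of interpretability in the "predict what happens when the input changes" sense: a linear model is the textbook interpretable case, because each coefficient tells us exactly how the prediction moves. The features and weights below are made up for illustration.&lt;/p&gt;

```python
def linear_model(features, weights, bias):
    """A fully transparent model: a weighted sum of the inputs."""
    return bias + sum(w * x for w, x in zip(weights, features))

weights = [0.5, -2.0]   # hypothetical: [income_in_10k, num_defaults]
bias = 1.0

x = [6.0, 1.0]
before = linear_model(x, weights, bias)

# Because the model is linear, we can predict the effect of a change
# without re-running it: one extra default shifts the score by exactly -2.0.
x_changed = [6.0, 2.0]
after = linear_model(x_changed, weights, bias)

print(before, after)   # 2.0 0.0
print(after - before)  # -2.0, i.e. weights[1]
```

A deep network offers no such shortcut: to know how a changed input moves the output, you must run the model again, which is precisely the gap the interpretability methods discussed later try to bridge.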

&lt;p&gt;Explainability, meanwhile, is the extent to which the internal mechanics of a machine or deep learning system can be explained in human terms. It’s easy to miss the subtle difference with interpretability, but consider it like this: interpretability is about being able to discern the mechanics without necessarily knowing why. Explainability is being able to quite literally explain what is happening.&lt;/p&gt;

&lt;p&gt;Let's understand this with an example from chemistry: knowing that when you mix 1 g of a blue chemical into 100 ml of liquid you get a pink solution is interpretability; understanding the exact chemical reaction behind that result is explainability.&lt;/p&gt;

&lt;p&gt;Interpretability and explainability are not required if the model has no significant impact or the problem is well studied, and exposing a model's workings can even enable people or programs to manipulate the system. But if we're unable to properly deliver improved interpretability, and ultimately explainability, in our algorithms, we'll seriously limit the potential impact of artificial intelligence. Which would be a shame.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--BArGyyD0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://media.wired.com/photos/5c871a4592cce301f3dfae3a/191:100/w_955%2Ch_500%2Cc_limit/TopArt_Final_04.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--BArGyyD0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://media.wired.com/photos/5c871a4592cce301f3dfae3a/191:100/w_955%2Ch_500%2Cc_limit/TopArt_Final_04.gif" alt="" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's discuss a few methods for increasing the interpretability of complex ML models.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;LIME or Local Interpretable Model-Agnostic Explanations&lt;/strong&gt;, is a method developed in the paper &lt;a href="https://arxiv.org/abs/1602.04938"&gt;Why should I trust you?&lt;/a&gt; for interpreting individual model predictions based on locally approximating the model around a given prediction. The researchers explain that LIME can explain “the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction.”&lt;br&gt;
What this means in practice is that the LIME model develops an approximation of the model by testing it out to see what happens when certain aspects within the model are changed. Essentially it’s about trying to recreate the output from the same input through a process of experimentation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;DeepLIFT (Deep Learning Important FeaTures)&lt;/strong&gt; is another method which serves as a recursive prediction explanation method for deep learning.  This method decomposes the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons in the network to every feature of the input. DeepLIFT assigns contribution scores based on the difference between activation of each neuron and its ‘reference activation’. DeepLIFT can also reveal dependencies which are missed by other approaches by optionally giving separate consideration to positive and negative contributions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Layerwise Relevance Propagation (LRP)&lt;/strong&gt; is a technique for determining which features in a particular input vector contribute most strongly to a neural network’s output. The technique was originally described in &lt;a href="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0130140"&gt;this paper&lt;/a&gt;. It defines a set of constraints to derive a number of different relevance propagation functions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Algorithmic generalization:&lt;/strong&gt; Improving generalization is not as easy as it sounds. Models can feel secondary once we realise that most ML engineering amounts to applying algorithms in a very specific way to uncover a certain desired outcome. However, by shifting this attitude to consider the overall health of the algorithm, and the data on which it runs, you can begin to set a solid foundation for improved interpretability.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pay attention to feature importance:&lt;/strong&gt; Feature importance tries to answer the question: which features have the biggest impact on predictions? Looking closely at the way the various features of your algorithm have been set is a practical way to engage with a diverse range of questions, from business alignment to ethics. Debate and discussion over how each feature should be set might be a little time-consuming, but having that tacit awareness that different features have been set in a certain way is nevertheless an important step towards interpretability and explainability.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
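&lt;p&gt;The local-surrogate idea behind LIME can be sketched in a few lines of NumPy: query the black box around a single instance, weight the perturbed samples by their proximity to it, and fit a weighted linear model whose coefficients approximate the black box's local behaviour. The toy model, kernel width, and sampling scale below are illustrative assumptions, not the actual LIME implementation:&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

# A stand-in black-box model: we may only query its predictions.
def black_box(X):
    return np.sin(3 * X[:, 0]) + X[:, 1] ** 2

x0 = np.array([0.2, 0.5])  # the instance whose prediction we want to explain

# 1. Sample perturbations in a neighbourhood of x0 and query the model.
Z = x0 + rng.normal(scale=0.1, size=(500, 2))
y = black_box(Z)

# 2. Weight each sample by proximity to x0 (RBF kernel, width 0.1).
w = np.exp(-np.sum((Z - x0) ** 2, axis=1) / (2 * 0.1 ** 2))

# 3. Fit a weighted linear surrogate via least squares.
A = np.column_stack([Z, np.ones(len(Z))])
sw = np.sqrt(w)
beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)

# beta[:2] approximates the local gradient of the black box at x0,
# i.e. roughly (3*cos(0.6), 2*0.5).
print(beta[:2])
```

&lt;p&gt;The surrogate is only faithful near x0; a different instance generally yields different coefficients, which is exactly what "local" means in LIME.&lt;/p&gt;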

&lt;p&gt;So far, we have explored the views around whether to open the black box. Now we have to think about privacy, one of the major concerns in today's world.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/lea5DJxF4Lz4k/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/lea5DJxF4Lz4k/giphy.gif" alt="" width="480" height="270"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  3. Privacy
&lt;/h2&gt;

&lt;p&gt;When we look at the application areas of AI, we find that much of the most privacy-sensitive data analysis today, such as search algorithms, recommendation engines, and adtech networks, is driven by machine learning and algorithmic decisions. As artificial intelligence evolves, it magnifies the ability to use personal information in ways that can intrude on privacy interests by raising the analysis of personal information to new levels of power and speed.&lt;/p&gt;

&lt;p&gt;The discussion of AI in the context of the privacy debate often brings up the limitations and failures of AI systems. Bias in data and unfair systems have raised significant issues, but privacy legislation is complicated enough even without packing in all the social and political issues that can arise from uses of information. As Cameron F. Kerry suggested in one of his surveys, to evaluate the effect of AI on privacy, it is necessary to distinguish between data issues that are endemic to all AI, like the incidence of false positives and negatives or overfitting to patterns, and those that are specific to the use of personal information.&lt;/p&gt;

&lt;p&gt;The privacy legislative proposals that involve these issues refer to decisions made by AI as “automated decisions” (borrowed from EU data protection law) or “algorithmic decisions”. In general I believe that this helps to shift people’s focus from the use of AI as a tool to the use of personal data in AI and to the impact this use may have on individuals. This debate centers in particular on algorithmic bias and the potential for algorithms to produce unlawful or undesired discrimination in the decisions to which the algorithms relate. These are major concerns for civil rights and consumer organizations that represent populations that suffer undue discrimination.&lt;/p&gt;

&lt;p&gt;There is a lot to think about and discuss regarding the scope of privacy legislation, since some proposals also address algorithmic discrimination.&lt;br&gt;
Let's split our views into two separate points:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;To what extent can or should legislation address issues of algorithmic bias? Legally, discrimination is not self-evidently a privacy issue: it presents broad social problems that persist even without the collection and use of personal information, and it falls under various civil rights laws.&lt;/li&gt;
&lt;li&gt;When we want to ensure privacy protection in AI systems, regulate the use of consumer data, and stop unfair and deceptive practices, the consumer-choice-based notice-and-choice model for privacy policies becomes meaningless. Protecting such privacy interests in the context of AI will require a change in the privacy paradigm.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We need an approach that addresses risk more obliquely, with accountability measures designed to identify discrimination in the processing of personal data and to regulate its use. A number of organizations and companies, as well as several legislators, have proposed such accountability measures. Their proposals take various forms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;Transparency:&lt;/em&gt; Accountability is a major concern, and being able to examine how a company handles data makes it much easier to hold that company accountable. Here, transparency mostly refers to disclosures relating to uses of algorithmic decision-making. Current lengthy, detailed privacy policies are not helpful to most consumers; replacing them with “privacy disclosures” that fully describe what data is collected and how it is used and protected would support the benchmarking being done by regulators. In turn, requiring that these disclosures identify significant uses of personal information for algorithmic decisions would help watchdogs and consumers know where to look for untoward outcomes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;Explainability:&lt;/em&gt; As we discussed above, this helps us understand how the data has been used to reach a particular decision and which features played a significant role in the conclusion. The European Union’s General Data Protection Regulation (GDPR) also follows this approach to settle disputes over automated decisions. The GDPR requires that, for any automated decision with “legal effects or similarly significant effects”, such as employment, credit, or insurance coverage, the person affected has recourse to a human who can review the decision and explain its logic. This incorporates a “human-in-the-loop” component and an element of due process that provide a check on anomalous or unfair outcomes. The central problem with explainability is that you’re adding an additional step to the development process, along with a significant regulatory burden. Indeed, you’re probably adding multiple steps. From one perspective, this looks like trying to tackle complexity with even greater complexity.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;Risk assessment:&lt;/em&gt; Risk assessment in this context mostly means assessing the impact of algorithmic decisions on individuals and potential bias in the design of a system. As Cameron F. Kerry rightly points out, for the regulatory burden to be proportionate, the level of risk assessment should be appropriate to the significance of the decision-making in question, which depends on the consequences of the decisions, the number of people and volume of data potentially affected, and the novelty and complexity of the algorithmic processing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;Audits:&lt;/em&gt; There are general accountability requirements, as well as mechanisms such as self-audits and third-party audits, that help authorities ensure companies comply with their privacy programs. Coupling audits of AI decision outcomes with proactive risk assessments can help match foresight with hindsight; although, like explainability, auditing machine-learning routines is difficult and still developing.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because of the difficulties of foreseeing machine learning outcomes as well as reverse-engineering algorithmic decisions, no single measure can be completely effective in avoiding perverse effects. Thus, where algorithmic decisions are consequential, it makes sense to combine measures to work together. That's why it’s important for algorithm operators and developers to ensure that they won't leave some groups of people worse off as a result of the algorithm’s design or its unintended consequences. [&lt;a href="https://www.brookings.edu/research/algorithmic-bias-detection-and-mitigation-best-practices-and-policies-to-reduce-consumer-harms/"&gt;AI debate&lt;/a&gt;, Nicol Turner Lee with Paul Resnick and Genie Barton]&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--N7uYHB_I--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://www.washingtonpost.com/sf/national/wp-content/uploads/sites/11/2017/12/o-jibo1210.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--N7uYHB_I--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://www.washingtonpost.com/sf/national/wp-content/uploads/sites/11/2017/12/o-jibo1210.gif" alt="" width="800" height="466"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  On a positive note,
&lt;/h3&gt;

&lt;p&gt;Amid the dissonance of concern over unfair decisions made by artificial intelligence (AI) and the use of personal data, the potential for AI to do good cannot be overlooked. Technology leaders such as Microsoft, IBM, Google, and many others have entire sections of their business focused on the topic and dedicate resources to building AI solutions for good and to supporting developers who do. In the fight against extraordinarily difficult challenges such as accessibility, climate change, conservation and the environment, world hunger, human rights, fake news, and so many others, we can surely use AI's help!&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Note&lt;/code&gt;: If you're interested in watching this discussion, you can check out the video here:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/pPwJqIAZ4QE"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  References:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://christophm.github.io/interpretable-ml-book/interpretability-importance.html"&gt;Importance of Interpretability&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.technologyreview.com/2018/10/24/139313/a-global-ethics-study-aims-to-help-ai-solve-the-self-driving-trolley-problem/"&gt;MIT's self-driving cars survey for Moral Machines&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.scientificamerican.com/article/can-we-open-the-black-box-of-ai/"&gt;can we open the Black Box?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hbr.org/2019/10/what-do-we-do-about-the-biases-in-ai"&gt;What do we do about the biases in AI?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.strategy-business.com/article/What-is-fair-when-it-comes-to-AI-bias?gko=827c0"&gt;what is fair?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/1908.09635.pdf"&gt;A Survey on Bias and Fairness in Machine Learning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kdnuggets.com/2018/12/machine-learning-explainability-interpretability-ai.html"&gt;Explainability &amp;amp; interpretability in AI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.brookings.edu/research/protecting-privacy-in-an-ai-driven-world/"&gt;protecting privacy in an AI driven world&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://strongbytes.ai/the-trolley-dilemma-in-ai/"&gt;The trolley dilemma in AI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://qz.com/1716107/the-problem-with-the-trolley-problem/"&gt;The problem with the trolley problem&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://danshiebler.com/2017-04-16-deep-taylor-lrp/"&gt;Layerwise Relevance Propagation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://privacyinternational.org/learn/artificial-intelligence"&gt;AI &amp;amp; Privacy&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>techtalks</category>
      <category>discuss</category>
    </item>
    <item>
      <title>MLOps with AML and Azure Devops</title>
      <dc:creator>Jayesh Bapu Ahire</dc:creator>
      <pubDate>Mon, 04 May 2020 10:43:55 +0000</pubDate>
      <link>https://dev.to/jbahire/mlops-with-aml-and-azure-devops-glj</link>
      <guid>https://dev.to/jbahire/mlops-with-aml-and-azure-devops-glj</guid>
      <description>&lt;p&gt;Few years ago, when someone used to say machine learning or AI the focus used to be on data prep and cleaning, EDA, modeling etc. Inferencing/ prediction used to be final steps for most people. But the situation has changed today, people are now talking about more elusive and less tangible final state — often termed “deployment”, “delivery”.&lt;/p&gt;

&lt;p&gt;When it comes to deploying your machine learning solution to production, there are various ways to do it. Some of them are as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python code behind, e.g. Flask: This is the simplest way to go to production. Write a Python or Flask wrapper around your machine learning model's inference code and you're ready. But this approach is very hard to scale.&lt;/li&gt;
&lt;li&gt;Execution service from a cloud provider: All major cloud service providers offer easy and scalable out-of-the-box solutions.&lt;/li&gt;
&lt;li&gt;Runtime

&lt;ul&gt;
&lt;li&gt;TensorFlow serving&lt;/li&gt;
&lt;li&gt;Clipper&lt;/li&gt;
&lt;li&gt;NVIDIA TensorRT inference server&lt;/li&gt;
&lt;li&gt;MXNet Model Server&lt;/li&gt;
&lt;li&gt;…&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Bespoke solutions (C++, …): Write your own runtime&lt;/li&gt;
&lt;li&gt;Kubernetes-ize everything above: Any of the solutions above can be scaled using Kubernetes.&lt;/li&gt;
&lt;/ul&gt;
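&lt;p&gt;As a minimal sketch of the first option above, here is what a Flask wrapper around a model's &lt;code&gt;predict&lt;/code&gt; method might look like. The route, payload shape, and stand-in model are illustrative assumptions; in a real service you would load a trained model (for example, unpickle it) at startup:&lt;/p&gt;

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Stand-in for a trained model; in practice, load one from disk at startup.
class SumModel:
    def predict(self, rows):
        return [sum(r) for r in rows]

model = SumModel()

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body like {"features": [[5.1, 3.5], [1.4, 0.2]]}
    rows = request.get_json()["features"]
    return jsonify({"predictions": model.predict(rows)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

&lt;p&gt;Every new model version means redeploying this process, and scaling means running and load-balancing many copies by hand, which is why this approach is hard to scale.&lt;/p&gt;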

&lt;p&gt;There are some things to consider before we deploy our model to production. Whatever deployment solution you choose must fulfill the following criteria:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Must fit the technology stack&lt;/li&gt;
&lt;li&gt;Not just about languages, but about semantics, scalability, guarantees&lt;/li&gt;
&lt;li&gt;Run anywhere, any size&lt;/li&gt;
&lt;li&gt;Composable building blocks&lt;/li&gt;
&lt;li&gt;Must try to limit the amount of moving parts and the amount of moving data&lt;/li&gt;
&lt;li&gt;Must make best use of resources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now, if you have built a machine learning solution and would like to set up a minimal, robust, repeatable set of pipelines for productionising machine learning models in the cloud using Microsoft Azure, this article will walk you through the steps as well as the resources you should refer to.&lt;/p&gt;

&lt;p&gt;First let's go through some terms:&lt;/p&gt;

&lt;h3&gt;
  
  
  What is MLOps?
&lt;/h3&gt;

&lt;p&gt;According to &lt;a href="https://www.aitrends.com/machine-learning/mlops-not-just-ml-business-new-competitive-frontier/"&gt;Nisha Talagala&lt;/a&gt;, MLOps (a compound of Machine Learning and “information technology OPerationS”) is new discipline/focus/practice for collaboration and communication between data scientists and information technology (IT) professionals while automating and productizing machine learning algorithms. Via practice and tools, MLOps aims to establish a culture and environment where ML technologies can generate business benefits by rapidly, frequently and reliably building, testing, and releasing ML technology into production.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Y-TaAJrn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://www.c-sharpcorner.com/UploadFile/BlogImages/11122019223851PM/image1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Y-TaAJrn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://www.c-sharpcorner.com/UploadFile/BlogImages/11122019223851PM/image1.png" alt="" width="800" height="392"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Azure Machine Learning:
&lt;/h3&gt;

&lt;p&gt;Azure Machine Learning service provides a cloud-based environment you can use to develop, train, test, deploy, manage, and track machine learning models.&lt;/p&gt;

&lt;h3&gt;
  
  
  Azure DevOps:
&lt;/h3&gt;

&lt;p&gt;We will be using the Azure DevOps project for build and release pipelines along with Azure ML services for ML/AI model management and operationalization. &lt;/p&gt;

&lt;h3&gt;
  
  
  Azure Pipelines:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GKRDHMID--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://miro.medium.com/max/4000/1%2AvoXX_vKb7jJjH15VWSI-lg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GKRDHMID--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://miro.medium.com/max/4000/1%2AvoXX_vKb7jJjH15VWSI-lg.png" alt="" width="800" height="240"&gt;&lt;/a&gt;&lt;br&gt;
Azure Pipelines are cloud-hosted pipelines that are fully integrated with Azure DevOps. You can either use a yaml file or a UI-based tool in Azure DevOps to set up your pipelines. It allows us to frequently update models, test new models, and continuously roll out new ML models alongside your other applications and services.&lt;/p&gt;
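&lt;p&gt;For reference, the YAML route might look like the sketch below; the trigger branch and script names are placeholders, and the lab that follows uses the UI-based editor instead:&lt;/p&gt;

```yaml
# Illustrative azure-pipelines.yml; the build steps are placeholders.
trigger:
- master

pool:
  vmImage: 'ubuntu-latest'

steps:
- script: pip install -r requirements.txt
  displayName: 'Install requirements'
- script: python train.py
  displayName: 'Train and evaluate model'
```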

&lt;p&gt;The end-to-end machine learning pipeline includes data prep, training, packaging and validating the model, deploying the model, and continuous testing. It looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--NMzE-gO---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.microsoft.com/en-us/azure/machine-learning/media/concept-ml-pipelines/pipeline-flow.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NMzE-gO---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.microsoft.com/en-us/azure/machine-learning/media/concept-ml-pipelines/pipeline-flow.png" alt="" width="800" height="166"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Our end-to-end CI/CD pipeline will look like below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--OG9GEvOa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://www.azuredevopslabs.com/labs/vstsextend/aml/images/aml.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--OG9GEvOa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://www.azuredevopslabs.com/labs/vstsextend/aml/images/aml.png" alt="" width="800" height="282"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Let's get started:
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Microsoft Azure Account: You will need a valid and active Azure account for the Azure labs. If you do not have one, you can sign up for a &lt;a href="https://azure.microsoft.com/en-us/free/"&gt;free trial&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;You will need an Azure DevOps account. If you do not have one, you can sign up for free &lt;a href="https://azure.microsoft.com/en-us/services/devops/"&gt;here.&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;

&lt;p&gt;We will be using the Azure DevOps Demo Generator to initiate the setup. The Demo Generator helps you create team projects in your Azure DevOps organization with sample content that includes source code, work items, iterations, service endpoints, and build and release definitions, based on the template you choose during configuration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Q6Gy_LIj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/azure/setup-adg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Q6Gy_LIj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/azure/setup-adg.png" alt="" width="800" height="346"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Use the &lt;a href="https://azuredevopsdemogenerator.azurewebsites.net/?name=machinelearning"&gt;Azure DevOps Demo Generator&lt;/a&gt; to provision the project on your Azure DevOps organization. This URL will automatically select Azure Machine Learning template in the demo generator. This template contains code and pipeline definition for a machine learning project demonstrating how to automate the end to end ML/AI project.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--MqVkvD2A--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/azure/setup-done.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--MqVkvD2A--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/azure/setup-done.png" alt="" width="800" height="346"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Configure CI pipeline
&lt;/h3&gt;

&lt;p&gt;In this step, we will configure the CI pipeline for your ML/AI project. This pipeline will include DevOps tasks for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;data sanity test, &lt;/li&gt;
&lt;li&gt;model training on different compute targets,&lt;/li&gt;
&lt;li&gt;model version management, &lt;/li&gt;
&lt;li&gt;model evaluation/model selection etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;a. Let's go to the Azure DevOps dashboard:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--R02aLMa4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/azure/AD-Dash.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--R02aLMa4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/azure/AD-Dash.png" alt="" width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;b. Select your project and navigate to &lt;code&gt;Pipeline &amp;gt; Builds&lt;/code&gt;. Select the pipeline &lt;code&gt;DevOps-for-AI-CI&lt;/code&gt; and click on &lt;code&gt;edit&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--KUSJtNJ5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/azure/pipe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KUSJtNJ5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/azure/pipe.png" alt="" width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Your current pipeline configs will look like below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XICivyjn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/azure/pipe-config.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XICivyjn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/azure/pipe-config.png" alt="" width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;c. Here, the Python environment and Install Requirements tasks set up and prepare the Python environment for subsequent builds.&lt;/p&gt;

&lt;p&gt;Select the next task, &lt;code&gt;Create or get Workspace&lt;/code&gt; (currently it shows &lt;code&gt;some setting need attention&lt;/code&gt;). In the task configuration, select the Azure subscription from the drop-down list and click Authorize to configure the Azure service connection. Make sure your subscription allows creating an AML workspace, as this task will create the workspace for the Azure Machine Learning service.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4stMaokf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/azure/pipe-mid-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4stMaokf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/azure/pipe-mid-1.png" alt="" width="800" height="251"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;d. Follow a similar procedure for all other tasks, selecting the same subscription. Your final configs will look like below. &lt;code&gt;save&lt;/code&gt; these configs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--CYCLZA8k--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/azure/pipe-final.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--CYCLZA8k--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/azure/pipe-final.png" alt="" width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;e. Also, click on the Triggers tab and make sure CI is enabled.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kS5ytE0i--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/azure/trig.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kS5ytE0i--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/azure/trig.png" alt="" width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;f. Now we are done with our CI pipeline, which performs the following tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prepare the python environment for our upcoming builds&lt;/li&gt;
&lt;li&gt;Get or Create the workspace for AML service&lt;/li&gt;
&lt;li&gt;Submit Training job on the remote DSVM / Local Python Env&lt;/li&gt;
&lt;li&gt;Compare performance of different models and select the best&lt;/li&gt;
&lt;li&gt;Register model to the workspace&lt;/li&gt;
&lt;li&gt;Create Docker Image for Scoring Web service&lt;/li&gt;
&lt;li&gt;Copy and Publish the Artifacts to Release Pipeline&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 2: Configure CD pipeline
&lt;/h3&gt;

&lt;p&gt;Now that we are done with the CI pipeline, let's configure the release pipeline, which will deploy the image created by the build pipeline to Azure Container Instances and Azure Kubernetes Service.&lt;/p&gt;

&lt;p&gt;a. Now, navigate to Pipeline &amp;gt; Releases, select Deploy Web service, and click Edit pipeline.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--VfyGST51--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/azure/release-2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--VfyGST51--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/azure/release-2.png" alt="" width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;b. As we can see, the pipeline has two stages, QA and Prod. We have to modify the configs for QA first, so click on QA.&lt;/p&gt;

&lt;p&gt;Similar to &lt;code&gt;sub-step c&lt;/code&gt; in Step 1,&lt;br&gt;
the Python environment and Install Requirements tasks set up and prepare the Python environment for subsequent builds.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---S2Ohul_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/azure/release-3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---S2Ohul_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/azure/release-3.png" alt="" width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;c. Select the next task, &lt;code&gt;Deploy webservice of ACI&lt;/code&gt; (currently it shows &lt;code&gt;some setting need attention&lt;/code&gt;). In the task configuration, select the Azure subscription from the drop-down list and click Authorize to configure the Azure service connection. This task creates an ACI (Azure Container Instance) and deploys the web service image created in the build pipeline to it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Avcn4fMi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/azure/release-5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Avcn4fMi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/azure/release-5.png" alt="" width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;d. Follow similar steps for &lt;code&gt;Prod-Deploy on AKS&lt;/code&gt;, which you can find in the drop-down in the top bar.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jR9fKPP5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/azure/release-6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jR9fKPP5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/azure/release-6.png" alt="" width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This task configures AKS (Azure Kubernetes Service) so we can deploy our web-service on AKS in production.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--fIoC6INN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/azure/release-4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fIoC6INN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/azure/release-4.png" alt="" width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Update config file in the source code to trigger CI and CD
&lt;/h3&gt;

&lt;p&gt;a. Navigate to &lt;code&gt;Repos&lt;/code&gt; and choose Files. Go to the &lt;code&gt;aml_config/&lt;/code&gt; directory and open the &lt;code&gt;config.json&lt;/code&gt; file. &lt;/p&gt;

&lt;p&gt;b. Update your Azure subscription ID in place of &amp;lt;&amp;gt;. If required, change the resource group name, AML workspace name, and the location where you want to deploy your Azure ML service workspace. Click Commit to commit the changes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--nFzexulD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/azure/repo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--nFzexulD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/azure/repo.png" alt="" width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;
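&lt;p&gt;For reference, the &lt;code&gt;config.json&lt;/code&gt; in this lab looks roughly like the sketch below; the field names and values here are illustrative placeholders, so check the actual file in your repo:&lt;/p&gt;

```json
{
    "subscription_id": "YOUR-SUBSCRIPTION-ID",
    "resource_group": "mlops-rg",
    "workspace_name": "mlops-aml-ws",
    "location": "eastus"
}
```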

&lt;p&gt;&lt;code&gt;Note&lt;/code&gt;: Navigate to &lt;code&gt;environment_setup/install_requirements.sh&lt;/code&gt; and change the Azure CLI version from the existing &lt;code&gt;2.0.X&lt;/code&gt; to &lt;code&gt;2.5.0&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;c. Since we have enabled the CI trigger, a build will be queued automatically.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LFgFkiX5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/azure/CI-trig.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LFgFkiX5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/azure/CI-trig.png" alt="" width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;d. To check the build, navigate to Pipelines &amp;gt; Builds and you will see a build queued. Open the build to see its progress. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7ozK3-sk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/azure/job.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7ozK3-sk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/azure/job.png" alt="" width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;e. Once the build succeeds, a release will be queued automatically. Navigate to Pipelines &amp;gt; Releases to see the release in progress.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--uokKj_Fa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://www.azuredevopslabs.com/labs/vstsextend/aml/images/build-release-progress.gif" alt="space-1.jpg" width="" height=""&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;Deployment process in action (courtesy: Azure DevOps Labs)&lt;/em&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;f. If you want to view the resources provisioned and deployed by the CI/CD pipelines, navigate to your Azure portal as shown below!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Z_x93dD1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://www.azuredevopslabs.com/labs/vstsextend/aml/images/azureportal1.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Z_x93dD1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://www.azuredevopslabs.com/labs/vstsextend/aml/images/azureportal1.gif" alt="" width="800" height="364"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This was a very short, high-level overview of using Azure DevOps and Azure Pipelines to create and manage machine learning deployments. We used an Azure DevOps project for the build and release pipelines, along with Azure ML services for ML/AI model management and operationalization. &lt;/p&gt;

&lt;h3&gt;
  
  
  What's next?
&lt;/h3&gt;

&lt;p&gt;If you want a detailed step-by-step explanation, you can check out the blog by our friend &lt;a href="https://github.com/benalexkeen"&gt;Ben Alex Keen&lt;/a&gt;, available &lt;a href="https://benalexkeen.com/creating-end-to-end-mlops-pipelines-using-azure-ml-and-azure-pipelines-part-1/"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For more information about the various services we used here, you can go through the list of references and learn more.&lt;/p&gt;

&lt;h3&gt;
  
  
  References:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/learn/paths/automate-deployments-azure-devops/?wt.mc_id=AID3011243_QSG_EML_426806"&gt;Automate your deployments with Azure DevOps&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.codemag.com/Article/1909021/Azure-Machine-Learning-Workspace-and-MLOps"&gt;AML &amp;amp; MLOps&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/machine-learning/concept-ml-pipelines"&gt;ML Pipeline Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.azuredevopslabs.com/labs/vstsextend/aml/#exercise-3-update-config-file-in-the-source-code-to-trigger-ci-and-cd"&gt;CI/CD Pipeline&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.c-sharpcorner.com/blogs/mlops"&gt;MLOps Explained&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/@nirespire/what-is-cicd-concepts-in-continuous-integration-and-deployment-4fe3f6625007"&gt;What is CI/CD?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/praneet22/DevOpsForAI"&gt;Sample Application&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/machine-learning/samples-notebooks"&gt;AML Samples&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/devops/?view=azure-devops&amp;amp;viewFallbackFrom=vsts"&gt;Azure DevOps&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/machine-learning/overview-what-is-azure-ml"&gt;AML Overview&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>machinelearning</category>
      <category>azure</category>
      <category>devops</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Data extraction from documents made easy with Amazon Textract</title>
      <dc:creator>Jayesh Bapu Ahire</dc:creator>
      <pubDate>Sun, 05 Apr 2020 11:30:42 +0000</pubDate>
      <link>https://dev.to/jbahire/text-extraction-made-easy-with-amazon-textract-12p</link>
      <guid>https://dev.to/jbahire/text-extraction-made-easy-with-amazon-textract-12p</guid>
      <description>&lt;p&gt;&lt;strong&gt;Artificial Intelligence&lt;/strong&gt; as we know found use cases in every possible industry! Many complicated problems we used to face during our day to day are now being solved using AI. Some of them might not give results upto human standards but with improvements in underlying algorithms and optimizations we are progressing towards achieving this standards. In this article we will see one such important problem, Text Extraction from documents. For many years, companies are working on this problem using manual techniques, rule-based methods or customized OCR which are both time consuming and complicated.&lt;/p&gt;

&lt;p&gt;One important point here is &lt;strong&gt;documents are important!&lt;/strong&gt; How? Let's see!&lt;/p&gt;

&lt;p&gt;Documents are the primary tools for keeping records. Large amounts of data are stored in structured or unstructured documents. They are also important when it comes to communicating, collaborating, or transacting data across industries like medicine, law, business management, finance, education, tax management, and many more.&lt;/p&gt;

&lt;h4&gt;
  
  
  What are the types of documents we are looking at?
&lt;/h4&gt;

&lt;p&gt;We are looking at scanned documents, digital documents, forms, tables, contracts, and many others. &lt;/p&gt;

&lt;p&gt;I mentioned above some of the classical techniques still in use. What is the problem with those? The major problems with these manual techniques are that they are &lt;code&gt;too expensive&lt;/code&gt;, &lt;code&gt;error prone&lt;/code&gt; and &lt;code&gt;time consuming&lt;/code&gt;, as they involve human intervention.&lt;/p&gt;

&lt;p&gt;Let's see the problems with each technique:&lt;/p&gt;

&lt;h4&gt;
  
  
  1. Manual processing (humans):
&lt;/h4&gt;

&lt;p&gt;When we depend on humans to process the docs, there can be issues like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Variable output&lt;/li&gt;
&lt;li&gt;Inconsistent results&lt;/li&gt;
&lt;li&gt;Reviews for consensus&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the example below, humans can process and interpret these blocks differently, depending on a variety of factors.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--CN3jYV6j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://textract-console-us-east-1-6375b349-b720-4339-99ab-c4a951ea5a70.s3.amazonaws.com/blog/test.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--CN3jYV6j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://textract-console-us-east-1-6375b349-b720-4339-99ab-c4a951ea5a70.s3.amazonaws.com/blog/test.png" alt="" width="800" height="358"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  2. Customized OCR is a better solution than manual extraction, but it has its own problems:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Paragraph detection (you can code this, but manual intervention comes in again: you can annotate a sample set and train an ML model on it to give you separated paragraphs, and there are also some unsupervised methods, but either way ML comes into play here)&lt;/li&gt;
&lt;li&gt;No rotated text and stylized text detection&lt;/li&gt;
&lt;li&gt;No multi-column detection&lt;/li&gt;
&lt;li&gt;Table Extraction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can obviously add these features, but if you want to do it without ML you have to maintain a separate code template (and templates are brittle) for each document, which is time-consuming. If we consider the tax forms of any country, there will be different variations for different job categories, and you would have to maintain different templates and rule-sets for all of them, which is a nightmare.&lt;/p&gt;

&lt;p&gt;So how can we avoid complicating our lives further and still build a robust text extraction solution? Amazon Textract comes in handy and solves many of the problems we have seen! Its tagline says: extract text and data from virtually any document! &lt;/p&gt;

&lt;p&gt;Let's jump into details!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--iDTuQEU3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://media1.tenor.com/images/be762c04caee026253f66cf89d7ac7a8/tenor.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--iDTuQEU3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://media1.tenor.com/images/be762c04caee026253f66cf89d7ac7a8/tenor.gif" alt="" width="498" height="277"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What can Amazon Textract do?
&lt;/h3&gt;

&lt;p&gt;Let's first list some of the things you can achieve using Amazon Textract, and then see the core features in detail:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Text detection from documents&lt;/li&gt;
&lt;li&gt;Multi-column detection and reading order&lt;/li&gt;
&lt;li&gt;Natural language processing and document classification&lt;/li&gt;
&lt;li&gt;Natural language processing for medical documents&lt;/li&gt;
&lt;li&gt;Document translation&lt;/li&gt;
&lt;li&gt;Search and discovery&lt;/li&gt;
&lt;li&gt;Form extraction and processing&lt;/li&gt;
&lt;li&gt;Compliance control with document redaction&lt;/li&gt;
&lt;li&gt;Table extraction and processing&lt;/li&gt;
&lt;li&gt;PDF document processing&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How does Textract work?
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4F-xIiTU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://textract-console-us-east-1-6375b349-b720-4339-99ab-c4a951ea5a70.s3.amazonaws.com/blog/pipeline.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4F-xIiTU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://textract-console-us-east-1-6375b349-b720-4339-99ab-c4a951ea5a70.s3.amazonaws.com/blog/pipeline.png" alt="" width="800" height="241"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Amazon Textract API accepts a document stored in S3 and uses built-in ML models to extract text, tables, or any fields of interest from it. We then have the option to either store this extracted data in some other format or stack other services on top to further process the output. We can use services like &lt;a href="https://www.elastic.co/what-is/elasticsearch"&gt;Elasticsearch&lt;/a&gt; to index the data and build a search application around it, or we can use &lt;a href="https://aws.amazon.com/comprehend/"&gt;Amazon Comprehend&lt;/a&gt; to apply Natural Language Processing to our data. &lt;/p&gt;

&lt;p&gt;We can use services like &lt;a href="https://aws.amazon.com/comprehend/medical/"&gt;Amazon Comprehend Medical&lt;/a&gt;, which uses advanced machine learning models to accurately and quickly identify medical information, such as medical conditions and medications, and determine their relationships to each other, for instance, medicine dosage and strength. Amazon Comprehend Medical can also link the detected information to medical ontologies such as ICD-10-CM or RxNorm. And if you are not interested in all this fancy stuff, you can just store your data in a database with a pre-defined schema and use it in your application! The self-explanatory diagram above, taken from the documentation, should make things a little easier to understand!&lt;/p&gt;

&lt;h3&gt;
  
  
  Before going ahead, let's look at the request and response formats of the Textract API.
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Request Syntax:
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Document&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bytes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;blob&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;S3Object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bucket&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Version&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
   &lt;span class="p"&gt;},&lt;/span&gt;
   &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FeatureTypes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;],&lt;/span&gt;
   &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HumanLoopConfig&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DataAttributes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ContentClassifiers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FlowDefinitionArn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HumanLoopName&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
   &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, &lt;code&gt;Document&lt;/code&gt; is the input document, which can be base64-encoded bytes or an Amazon S3 object; it is required. &lt;code&gt;FeatureTypes&lt;/code&gt; is the list of features you want to extract, such as tables or forms; it is also required. &lt;code&gt;HumanLoopConfig&lt;/code&gt; allows you to set up a human reviewer; it is optional.&lt;/p&gt;
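&lt;p&gt;As a minimal sketch, here is how you might build that request in Python and send it with boto3; the bucket name and object key below are made up for illustration:&lt;/p&gt;

```python
def build_analyze_request(bucket, name, feature_types):
    """Assemble the parameter dict for Textract's AnalyzeDocument API."""
    return {
        "Document": {"S3Object": {"Bucket": bucket, "Name": name}},
        "FeatureTypes": feature_types,
    }

# Hypothetical bucket and key, for illustration only
request = build_analyze_request("my-docs-bucket", "forms/application.png",
                                ["TABLES", "FORMS"])

# With AWS credentials configured, you would send it like this:
#   import boto3
#   textract = boto3.client("textract")
#   response = textract.analyze_document(**request)
```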

&lt;h4&gt;
  
  
  2. Response Syntax:
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AnalyzeDocumentModelVersion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Blocks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; 
      &lt;span class="p"&gt;{&lt;/span&gt; 
         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BlockType&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ColumnIndex&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ColumnSpan&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EntityTypes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;],&lt;/span&gt;
         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Geometry&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BoundingBox&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
               &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Height&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Left&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Top&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Width&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Polygon&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; 
               &lt;span class="p"&gt;{&lt;/span&gt; 
                  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;
               &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
         &lt;span class="p"&gt;},&lt;/span&gt;
         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Page&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Relationships&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; 
            &lt;span class="p"&gt;{&lt;/span&gt; 
               &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ids&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;],&lt;/span&gt;
               &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
         &lt;span class="p"&gt;],&lt;/span&gt;
         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RowIndex&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RowSpan&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SelectionStatus&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
   &lt;span class="p"&gt;],&lt;/span&gt;
   &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DocumentMetadata&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Pages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;
   &lt;span class="p"&gt;},&lt;/span&gt;
   &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HumanLoopActivationOutput&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HumanLoopActivationConditionsEvaluationResults&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HumanLoopActivationReasons&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HumanLoopArn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
   &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, &lt;code&gt;AnalyzeDocumentModelVersion&lt;/code&gt; tells you the version of the model used, and &lt;code&gt;Blocks&lt;/code&gt; contains all the detected items. &lt;code&gt;DocumentMetadata&lt;/code&gt; gives additional information about the document, and &lt;code&gt;HumanLoopActivationOutput&lt;/code&gt; gives the results of the evaluation by a human reviewer.&lt;/p&gt;
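&lt;p&gt;To give a feel for working with this response, here is a small sketch that pulls the detected text lines out of the Blocks list; the response dict is a trimmed-down mock, not real Textract output:&lt;/p&gt;

```python
# Trimmed-down mock of an AnalyzeDocument response, for illustration only
response = {
    "DocumentMetadata": {"Pages": 1},
    "Blocks": [
        {"BlockType": "PAGE", "Id": "p1"},
        {"BlockType": "LINE", "Id": "l1", "Text": "Application Form", "Confidence": 99.1},
        {"BlockType": "LINE", "Id": "l2", "Text": "Name: Jane Doe", "Confidence": 98.4},
        {"BlockType": "WORD", "Id": "w1", "Text": "Application", "Confidence": 99.2},
    ],
}

def extract_lines(response, min_confidence=90.0):
    """Return the text of every LINE block above a confidence threshold."""
    return [
        block["Text"]
        for block in response["Blocks"]
        if block["BlockType"] == "LINE"
        and block.get("Confidence", 0) >= min_confidence
    ]

lines = extract_lines(response)
# lines == ["Application Form", "Name: Jane Doe"]
```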

&lt;p&gt;Now that we know what Textract can do and how it works, let's look at the core features and capabilities Textract provides in detail:&lt;/p&gt;

&lt;h3&gt;
  
  
  Core Features:
&lt;/h3&gt;

&lt;p&gt;You can try all of this directly from the &lt;a href="https://console.aws.amazon.com/textract/home?region=us-east-1#/"&gt;Amazon Textract Console&lt;/a&gt;!&lt;/p&gt;

&lt;h4&gt;
  
  
  1. Table Extraction:
&lt;/h4&gt;

&lt;p&gt;Amazon Textract can extract tables from a given document and provide them in any format we want, including CSV or spreadsheet, and we can even automatically load the extracted data into a database using a pre-defined schema. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--KUcVee9Y--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://textract-console-us-east-1-6375b349-b720-4339-99ab-c4a951ea5a70.s3.amazonaws.com/blog/table.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KUcVee9Y--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://textract-console-us-east-1-6375b349-b720-4339-99ab-c4a951ea5a70.s3.amazonaws.com/blog/table.png" alt="" width="800" height="251"&gt;&lt;/a&gt;&lt;/p&gt;
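&lt;p&gt;In the response, a table arrives as a TABLE block whose CELL children carry RowIndex and ColumnIndex values. A simplified sketch of turning those cells into CSV rows might look like this (the cell blocks below are a hand-written mock with the text already resolved; real cells reference their WORD children through Relationships instead):&lt;/p&gt;

```python
import csv
import io

# Hand-written mock of CELL blocks, for illustration only
cells = [
    {"RowIndex": 1, "ColumnIndex": 1, "Text": "Item"},
    {"RowIndex": 1, "ColumnIndex": 2, "Text": "Qty"},
    {"RowIndex": 2, "ColumnIndex": 1, "Text": "Pencils"},
    {"RowIndex": 2, "ColumnIndex": 2, "Text": "12"},
]

def cells_to_csv(cells):
    """Group cells by row, order them by column, and emit CSV text."""
    rows = {}
    for cell in cells:
        rows.setdefault(cell["RowIndex"], {})[cell["ColumnIndex"]] = cell["Text"]
    out = io.StringIO()
    writer = csv.writer(out)
    for row_index in sorted(rows):
        row = rows[row_index]
        writer.writerow(row.get(col, "") for col in sorted(row))
    return out.getvalue()

print(cells_to_csv(cells))
```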

&lt;p&gt;Let's consider one document and see how Textract works for that!&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--b2n-_HNg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://www.gstzen.in/articles/gst/forms/registrations/reg-16/reg-16-table-1e.29ca20f3-hs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--b2n-_HNg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://www.gstzen.in/articles/gst/forms/registrations/reg-16/reg-16-table-1e.29ca20f3-hs.png" alt="" width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here are the results which are really promising!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--m9xegU0I--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://textract-console-us-east-1-6375b349-b720-4339-99ab-c4a951ea5a70.s3.amazonaws.com/blog/table1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--m9xegU0I--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://textract-console-us-east-1-6375b349-b720-4339-99ab-c4a951ea5a70.s3.amazonaws.com/blog/table1.png" alt="" width="800" height="262"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xzU7kwjG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://textract-console-us-east-1-6375b349-b720-4339-99ab-c4a951ea5a70.s3.amazonaws.com/blog/table2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xzU7kwjG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://textract-console-us-east-1-6375b349-b720-4339-99ab-c4a951ea5a70.s3.amazonaws.com/blog/table2.png" alt="" width="800" height="422"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  2. Form Extraction:
&lt;/h4&gt;

&lt;p&gt;Amazon Textract can extract data from forms as key-value pairs, which we can use for various applications. For example, if you want to set up an automated process that accepts a scanned bank account opening application, fills the required data into the system, and creates the account, you can do that using Amazon Textract form extraction. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--lHuh4qmp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://textract-console-us-east-1-6375b349-b720-4339-99ab-c4a951ea5a70.s3.amazonaws.com/blog/form.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--lHuh4qmp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://textract-console-us-east-1-6375b349-b720-4339-99ab-c4a951ea5a70.s3.amazonaws.com/blog/form.png" alt="" width="800" height="251"&gt;&lt;/a&gt;&lt;/p&gt;
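&lt;p&gt;Form data comes back as KEY_VALUE_SET blocks: a KEY block points at its VALUE block through a Relationship of type VALUE. A simplified sketch of pairing them up could look like this (in the mock below, each block carries its text directly for brevity; real blocks reference WORD children via CHILD relationships):&lt;/p&gt;

```python
# Simplified mock of KEY_VALUE_SET blocks, for illustration only
blocks = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Text": "Name", "Relationships": [{"Type": "VALUE", "Ids": ["v1"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Text": "Jane Doe"},
]

def pair_key_values(blocks):
    """Match each KEY block with its VALUE block and return a dict."""
    by_id = {b["Id"]: b for b in blocks}
    pairs = {}
    for block in blocks:
        if block["BlockType"] == "KEY_VALUE_SET" and "KEY" in block.get("EntityTypes", []):
            for rel in block.get("Relationships", []):
                if rel["Type"] == "VALUE":
                    for value_id in rel["Ids"]:
                        pairs[block["Text"]] = by_id[value_id]["Text"]
    return pairs

# pair_key_values(blocks) == {"Name": "Jane Doe"}
```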

&lt;p&gt;Let's try this on the document below:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bCvcfw1F--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://www.gstzen.in/articles/gst/forms/registrations/reg-16/reg-16-table-1a.85e6d32a-hs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bCvcfw1F--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://www.gstzen.in/articles/gst/forms/registrations/reg-16/reg-16-table-1a.85e6d32a-hs.png" alt="" width="800" height="431"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here are the results:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kKjIc1xr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://textract-console-us-east-1-6375b349-b720-4339-99ab-c4a951ea5a70.s3.amazonaws.com/blog/form1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kKjIc1xr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://textract-console-us-east-1-6375b349-b720-4339-99ab-c4a951ea5a70.s3.amazonaws.com/blog/form1.png" alt="" width="800" height="736"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's try a harder problem with a document like this:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Cse1XcVn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://guillaumejaume.github.io/FUNSD/img/form_example.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Cse1XcVn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://guillaumejaume.github.io/FUNSD/img/form_example.png" alt="" width="777" height="1000"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's what we got:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--G5z6Do2O--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://textract-console-us-east-1-6375b349-b720-4339-99ab-c4a951ea5a70.s3.amazonaws.com/blog/form2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--G5z6Do2O--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://textract-console-us-east-1-6375b349-b720-4339-99ab-c4a951ea5a70.s3.amazonaws.com/blog/form2.png" alt="" width="800" height="393"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--IAfejbcd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://textract-console-us-east-1-6375b349-b720-4339-99ab-c4a951ea5a70.s3.amazonaws.com/blog/form3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--IAfejbcd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://textract-console-us-east-1-6375b349-b720-4339-99ab-c4a951ea5a70.s3.amazonaws.com/blog/form3.png" alt="" width="800" height="350"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  3. Text Extraction:
&lt;/h4&gt;

&lt;p&gt;Amazon Textract uses an improved take on OCR that pairs ML with traditional OCR (some people like to call it OCR++) to detect printed text and numbers in a scan or rendering of a document. This can be used for medical reports and financial reports, or for applications like clause extraction in legal documents when paired with Amazon Comprehend.&lt;/p&gt;
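&lt;p&gt;For plain text detection, the DetectDocumentText API returns the same kind of block list, with the detected text available on LINE and WORD blocks. A hedged sketch of pulling out the lines (&lt;code&gt;lines_from&lt;/code&gt; is our own helper; in practice &lt;code&gt;blocks&lt;/code&gt; would be &lt;code&gt;response["Blocks"]&lt;/code&gt; from a &lt;code&gt;boto3&lt;/code&gt; &lt;code&gt;detect_document_text&lt;/code&gt; call):&lt;/p&gt;

```python
def lines_from(blocks):
    """Collect detected text lines, in the order Textract returns them,
    from a DetectDocumentText Blocks list."""
    return [b["Text"] for b in blocks if b["BlockType"] == "LINE"]
```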

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--nJ89iZNX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://textract-console-us-east-1-6375b349-b720-4339-99ab-c4a951ea5a70.s3.amazonaws.com/blog/txt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--nJ89iZNX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://textract-console-us-east-1-6375b349-b720-4339-99ab-c4a951ea5a70.s3.amazonaws.com/blog/txt.png" alt="" width="800" height="251"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's try to extract text from this document:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--cAA5_77I--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://weagree.com/wp-content/uploads/2017/02/EU_style_final_page_15-05-2011.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--cAA5_77I--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://weagree.com/wp-content/uploads/2017/02/EU_style_final_page_15-05-2011.png" alt="" width="701" height="570"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here are the results:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--lKZho53i--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://textract-console-us-east-1-6375b349-b720-4339-99ab-c4a951ea5a70.s3.amazonaws.com/blog/text1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--lKZho53i--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://textract-console-us-east-1-6375b349-b720-4339-99ab-c4a951ea5a70.s3.amazonaws.com/blog/text1.png" alt="" width="800" height="363"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Along with these three core features, Textract also provides capabilities like &lt;strong&gt;Bounding Boxes&lt;/strong&gt;, &lt;strong&gt;Adjustable Confidence Thresholds&lt;/strong&gt;, and a &lt;strong&gt;Built-in Human Review Workflow&lt;/strong&gt;. &lt;/p&gt;
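&lt;p&gt;Each block carries a &lt;code&gt;Confidence&lt;/code&gt; score and a normalized &lt;code&gt;Geometry.BoundingBox&lt;/code&gt;, so applying your own confidence threshold is a simple filter. A sketch (the helper name and the 90.0 threshold are our own choices; the field names follow the Textract Block structure):&lt;/p&gt;

```python
def confident_boxes(blocks, threshold=90.0):
    """Keep WORD blocks at or above the confidence threshold, returning
    (text, bounding box) pairs. BoundingBox coordinates are ratios of
    the page width/height, not pixels."""
    return [
        (b["Text"], b["Geometry"]["BoundingBox"])
        for b in blocks
        if b["BlockType"] == "WORD" and b["Confidence"] >= threshold
    ]
```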

&lt;p&gt;So, how can we use the Textract API with Python?&lt;/p&gt;

&lt;p&gt;Let's build a very simplified upload-and-analyze pipeline based on &lt;a href="https://github.com/aws-samples/amazon-textract-textractor"&gt;amazon textractor&lt;/a&gt;.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;Pipeline:&lt;/code&gt;
First, we will upload the document to S3 and then use Amazon Textractor to extract the fields we want from the document.
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;sp&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;s3_upload&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;upload&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bucket_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;object_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;upload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bucket_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;object_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;bucket_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;object_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;command_analysis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python textractor.py --documents &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;system&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;command_analysis&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;argparse&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;argparse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ArgumentParser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;source_file&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;help&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;The path and name of the source file to upload.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bucket_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;help&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;The name of the destination bucket.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;object_key&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;help&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;The key of the destination object.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;flags&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;help&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Only one of the flags (--text, --forms and --tables) is required at the minimum. You can use combination of all three.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse_args&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="nf"&gt;run_pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bucket_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;object_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Here, we provide the local file path, the S3 bucket we want to upload the file to, and the object name, along with flags for what we want to extract.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;
&lt;code&gt;Upload file to s3:&lt;/code&gt;
Uploading a file to S3 is straightforward:
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;upload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bucket_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;object_key&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;s3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Bucket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bucket_name&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;upload_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;object_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol start="3"&gt;
&lt;li&gt;
&lt;code&gt;Textractor:&lt;/code&gt;
Textractor is a ready-to-use solution from AWS that helps speed up PoCs. It can convert the output into different formats, including raw JSON, JSON for each page in the document, text, text in reading order, key/values exported as CSV, and tables exported as CSV. It can also generate insights from or translate the detected text by using Amazon Comprehend, Amazon Comprehend Medical, and Amazon Translate. &lt;/li&gt;
&lt;/ol&gt;
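&lt;p&gt;One of those formats, key/value pairs exported as CSV, can be sketched in a few lines with the standard library (a simplified stand-in to show the idea, not Textractor's actual code):&lt;/p&gt;

```python
import csv
import io

def pairs_to_csv(pairs):
    """Write a {key: value} dict to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Key", "Value"])
    for key, value in pairs.items():
        writer.writerow([key, value])
    return buf.getvalue()
```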

&lt;p&gt;This is how Textractor uses the response parser library, which helps process the JSON returned from Amazon Textract. See the repo and documentation for more details.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Call Amazon Textract and get JSON response
&lt;/span&gt;&lt;span class="n"&gt;docproc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DocumentProcessor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bucketName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;awsRegion&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;detectText&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;detectForms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tables&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;docproc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Get DOM
&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Iterate over elements in the document
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Print lines and words
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Line: {}--{}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Word: {}--{}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# Print tables
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;table&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tables&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cell&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cells&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Table[{}][{}] = {}-{}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cell&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cell&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# Print fields
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;form&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fields&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Field: Key: {}, Value: {}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;field&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# Get field by key
&lt;/span&gt;    &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Phone Number:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;field&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;form&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getFieldByKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;field&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Field: Key: {}, Value: {}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;field&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# Search fields by key
&lt;/span&gt;    &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;address&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;fields&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;form&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;searchFieldsByKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fields&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Field: Key: {}, Value: {}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;field&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is what the output looks like!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_6_thGC3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://textract-console-us-east-1-6375b349-b720-4339-99ab-c4a951ea5a70.s3.amazonaws.com/blog/op1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_6_thGC3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://textract-console-us-east-1-6375b349-b720-4339-99ab-c4a951ea5a70.s3.amazonaws.com/blog/op1.png" alt="" width="744" height="708"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--E65xCOSw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://textract-console-us-east-1-6375b349-b720-4339-99ab-c4a951ea5a70.s3.amazonaws.com/blog/op2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--E65xCOSw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://textract-console-us-east-1-6375b349-b720-4339-99ab-c4a951ea5a70.s3.amazonaws.com/blog/op2.png" alt="" width="800" height="640"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What's next
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--esi9li5j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://media.workandmoney.com/f6/04/f604a9a462ff4fc5afce3bf6eea0ae9c.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--esi9li5j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://media.workandmoney.com/f6/04/f604a9a462ff4fc5afce3bf6eea0ae9c.gif" alt="" width="600" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We went through the various features and capabilities Textract provides! It is a ready-to-use solution that can simplify some very complicated problems we face while building business applications around documents. It is not 100% accurate or directly usable for every case, but a few small tweaks here and there should make it work for most use cases. In the next article, we will see how to use it in some business applications, and we will also try to build an end-to-end pipeline using various AWS services. &lt;/p&gt;

&lt;p&gt;Until then, let me know in the comments if you have use cases where you are already using Amazon Textract, or are planning to. If you have any questions or want to discuss a use case, ping me on &lt;a href="https://twitter.com/jayesh_ahire1"&gt;twitter&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Stay safe!&lt;/p&gt;

&lt;h3&gt;
  
  
  References:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Amazon Textract : &lt;a href="https://aws.amazon.com/textract/"&gt;https://aws.amazon.com/textract/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Amazon Textract Console: &lt;a href="https://console.aws.amazon.com/textract/home?region=us-east-1#/"&gt;https://console.aws.amazon.com/textract/home?region=us-east-1#/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Amazon Blogs: &lt;a href="https://aws.amazon.com/blogs/machine-learning/automatically-extract-text-and-structured-data-from-documents-with-amazon-textract/"&gt;https://aws.amazon.com/blogs/machine-learning/automatically-extract-text-and-structured-data-from-documents-with-amazon-textract/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Amazon Textract Documentation: &lt;a href="https://docs.aws.amazon.com/textract/latest/dg/what-is.html"&gt;https://docs.aws.amazon.com/textract/latest/dg/what-is.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/aws-samples/amazon-textract-textractor"&gt;Amazon textract textractor&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>machinelearning</category>
      <category>tutorial</category>
      <category>aws</category>
      <category>ai</category>
    </item>
    <item>
      <title>Demystifying the XOR problem</title>
      <dc:creator>Jayesh Bapu Ahire</dc:creator>
      <pubDate>Fri, 03 Apr 2020 14:21:32 +0000</pubDate>
      <link>https://dev.to/jbahire/demystifying-the-xor-problem-1blk</link>
      <guid>https://dev.to/jbahire/demystifying-the-xor-problem-1blk</guid>
      <description>&lt;p&gt;In my previous post on &lt;a href="https://dev.to/jbahire/demystifying-extreme-learning-machines-part-1-22h2"&gt;Extreme learning machines&lt;/a&gt; I told that the famous pioneers in AI Marvin Minsky and Seymour Papert claimed in their book &lt;code&gt;Perceptron [1969]&lt;/code&gt;,  that the simple XOR cannot be resolved by two-layer of feedforward neural networks, which "drove research away from neural networks in the 1970s, and contributed to the so-called AI winter".[Wikipedia 2013]&lt;/p&gt;

&lt;p&gt;Let's explore what this XOR problem is...&lt;/p&gt;

&lt;h3&gt;
  
  
  The XOR Problem
&lt;/h3&gt;

&lt;p&gt;The XOR, or “exclusive or”, problem is a classic problem in ANN research. It is the problem of using a neural network to predict the outputs of XOR logic gates given two binary inputs. An XOR function should return a true value if the two inputs are not equal and a false value if they are equal. All possible inputs and predicted outputs are shown in figure 1.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F0%2ALYlt6CZJHOJkNRHJ." class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F0%2ALYlt6CZJHOJkNRHJ."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;XOR is a &lt;a href="https://en.wikipedia.org/wiki/Statistical_classification" rel="noopener noreferrer"&gt;classification problem&lt;/a&gt; and one for which the expected outputs are known in advance. It is therefore appropriate to use a supervised learning approach.&lt;/p&gt;

&lt;p&gt;On the surface, XOR appears to be a very simple problem. However, Minsky and Papert (1969) showed that it was a big problem for the neural network architectures of the 1960s, known as perceptrons.&lt;/p&gt;

&lt;h3&gt;
  
  
  Perceptrons
&lt;/h3&gt;

&lt;p&gt;Like all ANNs, the perceptron is composed of a network of &lt;a href="https://en.wikipedia.org/wiki/Artificial_neuron" rel="noopener noreferrer"&gt;&lt;em&gt;units&lt;/em&gt;&lt;/a&gt;, which are analogous to biological neurons. A unit can receive input from other units. On doing so, it takes the sum of all values received and decides whether it is going to forward a signal on to the other units to which it is connected. This is called activation. The &lt;a href="https://en.wikipedia.org/wiki/Activation_function" rel="noopener noreferrer"&gt;activation function&lt;/a&gt; uses some means or other to reduce the sum of input values to a 1 or a 0 (or a value very close to a 1 or 0) in order to represent activation or lack thereof. Another form of unit, known as a bias unit, always activates, typically sending a hard-coded 1 to all units to which it is connected.&lt;/p&gt;

&lt;p&gt;Perceptrons include a single layer of input units — including one bias unit — and a single output unit (see figure 2). Here a bias unit is depicted by a dashed circle, while other units are shown as blue circles. There are two non-bias input units representing the two binary input values for XOR. Any number of input units can be included.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F0%2AwOYoifz24Wz_I152." class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F0%2AwOYoifz24Wz_I152."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The perceptron is a type of &lt;a href="http://cs.stanford.edu/people/eroberts/courses/soco/projects/neural-networks/Architecture/feedforward.html" rel="noopener noreferrer"&gt;feed-forward&lt;/a&gt; network, which means the process of generating an output — known as forward propagation — flows in one direction from the input layer to the output layer. There are no connections between units in the input layer. Instead, all units in the input layer are connected directly to the output unit.&lt;/p&gt;

&lt;p&gt;A simplified explanation of the forward propagation process is that the input values X1 and X2, along with the bias value of 1, are multiplied by their respective weights W0..W2, and passed to the output unit. The output unit takes the sum of those values and employs an activation function — typically the &lt;a href="https://en.wikipedia.org/wiki/Heaviside_step_function" rel="noopener noreferrer"&gt;Heaviside step function&lt;/a&gt; — to convert the resulting value to a 0 or 1, thus classifying the input values as 0 or 1.&lt;/p&gt;
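&lt;p&gt;As a minimal sketch (not code from the original article), the forward pass just described can be written in a few lines of Python. The weight values below are arbitrary examples, chosen so the perceptron computes logical OR:&lt;/p&gt;

```python
def heaviside(z):
    # Heaviside step activation: 1 if the summed input is non-negative, else 0
    return 1 if z >= 0 else 0

def perceptron(x1, x2, w0, w1, w2):
    # The bias unit always emits 1, which is scaled by its weight w0
    weighted_sum = w0 * 1 + w1 * x1 + w2 * x2
    return heaviside(weighted_sum)

# With these example weights the perceptron computes logical OR
print(perceptron(0, 0, -0.5, 1, 1))  # 0
print(perceptron(1, 0, -0.5, 1, 1))  # 1
print(perceptron(1, 1, -0.5, 1, 1))  # 1
```

&lt;p&gt;Changing W0..W2 moves the classification line, which is exactly the control the weights provide.&lt;/p&gt;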

&lt;p&gt;It is the setting of the weight variables that gives the network’s author control over the process of converting input values to an output value. It is the weights that determine where the classification line, the line that separates data points into classification groups, is drawn. If all data points on one side of a classification line are assigned the class of 0, all others are classified as 1.&lt;/p&gt;

&lt;p&gt;A limitation of this architecture is that it is only capable of separating data points with a single line. This is unfortunate because the XOR inputs are not &lt;a href="https://en.wikipedia.org/wiki/Linear_separability" rel="noopener noreferrer"&gt;linearly separable&lt;/a&gt;. This is particularly visible if you plot the XOR input values to a graph. As shown in figure 3, there is no way to separate the 1 and 0 predictions with a single classification line.&lt;/p&gt;
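&lt;p&gt;This is easy to verify empirically. The following sketch (illustrative only; the weight grid and its resolution are arbitrary choices) brute-forces a range of weight settings for a single-output perceptron and confirms that none of them reproduces XOR:&lt;/p&gt;

```python
import itertools

def heaviside(z):
    # Step activation used by the perceptron's output unit
    return 1 if z >= 0 else 0

# The full XOR truth table
xor_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# Coarse grid of candidate weights: -4.0 to 4.0 in steps of 0.5
grid = [w / 2 for w in range(-8, 9)]

# Keep every (w0, w1, w2) that classifies all four inputs correctly
solutions = [
    (w0, w1, w2)
    for w0, w1, w2 in itertools.product(grid, repeat=3)
    if all(heaviside(w0 + w1 * x1 + w2 * x2) == y
           for (x1, x2), y in xor_table.items())
]
print(solutions)  # [] -- no single classification line separates XOR
```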

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F0%2AqdRb80zUpJPtrbRD." class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F0%2AqdRb80zUpJPtrbRD."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Multilayer Perceptrons
&lt;/h3&gt;

&lt;p&gt;The solution to this problem is to expand beyond the single-layer architecture by adding an additional layer of units without any direct access to the outside world, known as a hidden layer. This kind of architecture — shown in Figure 4 — is another feed-forward network known as a multilayer perceptron (MLP).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F0%2A158hcRQzzw_wpEZW." class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F0%2A158hcRQzzw_wpEZW."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is worth noting that an MLP can have any number of units in its input, hidden and output layers. There can also be any number of hidden layers. The architecture used here is designed specifically for the XOR problem.&lt;/p&gt;

&lt;p&gt;Similar to the classic perceptron, forward propagation begins with the input values and bias unit from the input layer being multiplied by their respective weights, however, in this case there is a weight for each combination of input (including the input layer’s bias unit) and hidden unit (excluding the hidden layer’s bias unit). The products of the input layer values and their respective weights are passed as input to the non-bias units in the hidden layer. Each non-bias hidden unit invokes an activation function — usually the classic &lt;a href="https://en.wikipedia.org/wiki/Sigmoid_function" rel="noopener noreferrer"&gt;sigmoid function&lt;/a&gt; in the case of the XOR problem — to squash the sum of its input values down to a value that falls between 0 and 1 (usually a value very close to either 0 or 1). The outputs of each hidden layer unit, including the bias unit, are then multiplied by another set of respective weights and passed to an output unit. The output unit also passes the sum of its input values through an activation function — again, the sigmoid function is appropriate here — to return an output value falling between 0 and 1. This is the predicted output.&lt;/p&gt;

&lt;p&gt;This architecture, while more complex than that of the classic perceptron network, is capable of achieving non-linear separation. Thus, with the right set of weight values, it can provide the necessary separation to accurately classify the XOR inputs.&lt;/p&gt;
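&lt;p&gt;To make this concrete, here is a minimal forward-propagation sketch for a 2-2-1 MLP. The weights below are one hand-picked set (by no means the only one) that solves XOR: the first hidden unit approximates OR, the second approximates AND, and the output unit activates for OR-but-not-AND:&lt;/p&gt;

```python
import math

def sigmoid(z):
    # Squashes the summed input to a value between 0 and 1
    return 1 / (1 + math.exp(-z))

def mlp_forward(x1, x2):
    # Hidden layer: each unit has a bias weight plus one weight per input
    h1 = sigmoid(-10 + 20 * x1 + 20 * x2)  # approximately x1 OR x2
    h2 = sigmoid(-30 + 20 * x1 + 20 * x2)  # approximately x1 AND x2
    # Output layer: fires for OR but not AND, i.e. XOR
    return sigmoid(-10 + 20 * h1 - 20 * h2)

# Reproduces the XOR truth table (outputs round to 0, 1, 1, 0)
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(mlp_forward(x1, x2)))
```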

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F0%2Afd1nXdp1WpwbagTs." class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F0%2Afd1nXdp1WpwbagTs."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Backpropagation
&lt;/h3&gt;

&lt;p&gt;The elephant in the room, of course, is how one might come up with a set of weight values that ensure the network produces the expected output. In practice, trying to find an acceptable set of weights for an MLP network manually would be an incredibly laborious task. In fact, it is &lt;a href="https://en.wikipedia.org/wiki/NP-completeness" rel="noopener noreferrer"&gt;NP-complete&lt;/a&gt; (Blum and Rivest, 1992). However, it is fortunately possible to learn a good set of weight values automatically through a process known as backpropagation. This was first demonstrated to work well for the XOR problem by Rumelhart et al. (1985).&lt;/p&gt;

&lt;p&gt;The backpropagation algorithm begins by comparing the actual value output by the forward propagation process to the expected value and then moves backward through the network, slightly adjusting each of the weights in a direction that reduces the size of the error by a small degree. Both forward and back propagation are re-run thousands of times on each input combination until the network can accurately predict the expected output of the possible inputs using forward propagation.&lt;/p&gt;
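&lt;p&gt;As a compact preview of that process (a rough sketch, not a full implementation; the learning rate, epoch count and random initialization are arbitrary choices), the following trains a 2-2-1 network on the XOR examples by repeatedly running forward propagation and nudging each weight against the error gradient:&lt;/p&gt;

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# All four XOR examples: 100% of the possible inputs
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

# Random initial weights: two hidden units and one output unit,
# each with [bias weight, weight from first input, weight from second]
wh = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
wo = [random.uniform(-1, 1) for _ in range(3)]

def forward(x):
    h = [sigmoid(w[0] + w[1] * x[0] + w[2] * x[1]) for w in wh]
    y = sigmoid(wo[0] + wo[1] * h[0] + wo[2] * h[1])
    return h, y

def total_error():
    return sum((forward(x)[1] - t) ** 2 for x, t in data)

error_before = total_error()
lr = 0.5
for _ in range(10000):
    for x, t in data:
        h, y = forward(x)
        # Error term at the output (sigmoid derivative is y * (1 - y))
        dy = (y - t) * y * (1 - y)
        # Error terms at the hidden units, propagated back through wo
        dh = [dy * wo[i + 1] * h[i] * (1 - h[i]) for i in range(2)]
        # Adjust each weight slightly in the direction that reduces the error
        wo[0] -= lr * dy
        for i in range(2):
            wo[i + 1] -= lr * dy * h[i]
            wh[i][0] -= lr * dh[i]
            wh[i][1] -= lr * dh[i] * x[0]
            wh[i][2] -= lr * dh[i] * x[1]

print(error_before, total_error())  # the summed squared error shrinks
```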

&lt;p&gt;For the XOR problem, 100% of possible data examples are available to use in the training process. We can therefore expect the trained network to be 100% accurate in its predictions and there is no need to be concerned with issues such as &lt;a href="https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff" rel="noopener noreferrer"&gt;bias and variance&lt;/a&gt; in the resulting model.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;In this post, we explored the classic ANN XOR problem. The problem itself was described in detail, along with the fact that the inputs for XOR are not linearly separable into their correct classification categories. A non-linear solution — involving an MLP architecture — was explored at a high level, along with the forward propagation algorithm used to generate an output value from the network and the backpropagation algorithm, which is used to train the network.&lt;/p&gt;

&lt;p&gt;The next post in this series will feature an implementation of the MLP architecture described here, including all of the components necessary to train the network to act as an XOR logic gate.&lt;/p&gt;

&lt;h3&gt;
  
  
  References
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Blum, A., Rivest, R. L. (1992). Training a 3-node neural network is NP-complete. Neural Networks, 5(1), 117–127.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Minsky, M., Papert, S. (1969). Perceptrons: An Introduction to Computational Geometry. The MIT Press, Cambridge, MA.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Rumelhart, D., Hinton, G., Williams, R. (1985). Learning internal representations by error propagation (No. ICS-8506). California University San Diego, La Jolla, Inst. for Cognitive Science.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>datascience</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Demystifying Extreme Learning Machines: Part 1</title>
      <dc:creator>Jayesh Bapu Ahire</dc:creator>
      <pubDate>Wed, 01 Apr 2020 13:52:55 +0000</pubDate>
      <link>https://dev.to/jbahire/demystifying-extreme-learning-machines-part-1-22h2</link>
      <guid>https://dev.to/jbahire/demystifying-extreme-learning-machines-part-1-22h2</guid>
      <description>&lt;p&gt;&lt;strong&gt;Artificial Intelligence&lt;/strong&gt; is hot research area since past few years and there are many major breakthroughs happening in this area. The traditional problems (or goals) of AI research include reasoning, knowledge representation, planning, learning, natural language processing, perception and the ability to move and manipulate objects. Extreme Learning Machines (ELMs)—has become one of the hot area of research over  the  past years, many researchers around the world are contributing to the research in this topic.&lt;/p&gt;

&lt;p&gt;In this series of articles, we will explore &lt;strong&gt;Extreme Learning Machines [ELMs]&lt;/strong&gt;. In this part, we will see introductions to the various concepts that will help us understand ELMs in detail.&lt;/p&gt;

&lt;p&gt;According to &lt;a href="http://www.eee.ntu.edu.sg/NewsnEvents/PeopleProfile/Pages/HuangGuangbin.aspx"&gt;Professor Huang Guangbin&lt;/a&gt;, Extreme Learning Machines (ELM) are filling the Gap between Frank Rosenblatt's Dream and John von Neumann's Puzzle. &lt;/p&gt;

&lt;p&gt;Let's go back 60 years and see what he meant by that:&lt;/p&gt;

&lt;p&gt;Rosenblatt made statements about the perceptron that caused a heated controversy among the fledgling AI community. Based on those statements, The New York Times reported the perceptron to be "the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence". That, to be fair, is essentially a definition of cognition. &lt;code&gt;[http://en.wikipedia.org/wiki/Perceptron]&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;In their book &lt;code&gt;Perceptrons [1969]&lt;/code&gt;, the famous AI pioneers Marvin Minsky and Seymour Papert claimed that the simple &lt;a href="https://medium.com/@jayeshbahire/the-xor-problem-in-neural-networks-50006411840b"&gt;XOR problem cannot be resolved by two-layer feedforward neural networks&lt;/a&gt;, which "drove research away from neural networks in the 1970s, and contributed to the so-called AI winter".&lt;code&gt;[Wikipedia 2013]&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Around the same time, John von Neumann was puzzled as to why "an imperfect (biological) neural network, containing many random connections, can be made to perform reliably those functions which might be represented by idealized wiring diagrams."&lt;/p&gt;

&lt;p&gt;If you want to read more about what Artificial Neural Networks are, you can read my series on them.&lt;/p&gt;

&lt;p&gt;For now, let's see what a feedforward neural network is: &lt;/p&gt;

&lt;h3&gt;
  
  
  Feedforward Neural Networks:
&lt;/h3&gt;

&lt;p&gt;The feedforward neural network was the first and simplest type of artificial neural network devised. It contains multiple neurons (nodes) arranged in layers. Nodes in adjacent layers are joined by connections, or edges, and every connection has a weight associated with it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9Y3NQK30--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://miro.medium.com/max/1300/0%2AhBHGyyDt5EtV9Uq2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9Y3NQK30--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://miro.medium.com/max/1300/0%2AhBHGyyDt5EtV9Uq2.jpg" alt="" width="650" height="388"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A feedforward neural network can take two forms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;Single-layer perceptron:&lt;/code&gt; This is the simplest feedforward neural network and does not contain any hidden layer. It can only learn linear functions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;Multilayer perceptron:&lt;/code&gt; A multilayer perceptron (MLP) includes at least one hidden layer (in addition to one input layer and one output layer) and can also learn non-linear functions. We will only discuss multilayer perceptrons below, as they are more useful than single-layer perceptrons for today's real-world problems.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Broadly speaking, there are three types of layers:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;Input Layer:&lt;/code&gt; The input layer has three nodes. The bias node's value is 1, and the other two nodes take the external inputs X1 and X2 (numerical values from the input data set). As discussed above, no calculation is performed at the input layer, so its nodes simply pass the three values 1, X1 and X2 on to the hidden layer.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;Hidden Layer:&lt;/code&gt; The hidden layer also has three nodes, with the bias node outputting 1. The outputs of the other two hidden nodes depend on the outputs of the input layer (1, X1, X2) and the weights attached to the connections (edges). Figure 4 shows the calculation of one hidden-layer output (highlighted); the other hidden node's output is calculated in the same way. Note that f refers to the activation function. These outputs are then passed to the nodes in the output layer.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;Output Layer:&lt;/code&gt; The output layer has two nodes that receive input from the hidden layer and perform calculations similar to those of the highlighted hidden node. The calculated values Y1 and Y2 are the outputs of the multilayer perceptron.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
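&lt;p&gt;The layer-by-layer computation described above can be sketched as follows (an illustrative example only: the weight values are made up, and the sigmoid stands in for the activation function f):&lt;/p&gt;

```python
import math

def f(z):
    # Activation function; sigmoid used here as a stand-in
    return 1 / (1 + math.exp(-z))

def forward(x1, x2, w_hidden, w_output):
    # Input layer performs no computation: it emits (1, x1, x2),
    # where the leading 1 is the bias node's output
    inputs = (1, x1, x2)
    # Hidden layer: the bias node outputs 1; the other nodes apply f
    # to the weighted sum of the input-layer outputs
    hidden = (1,) + tuple(
        f(sum(w * v for w, v in zip(ws, inputs))) for ws in w_hidden
    )
    # Output layer: Y1 and Y2 are computed the same way from the hidden outputs
    return tuple(
        f(sum(w * v for w, v in zip(ws, hidden))) for ws in w_output
    )

# Example weights: one row per non-bias node, one weight per incoming edge
w_hidden = [(0.1, 0.4, -0.6), (-0.3, 0.2, 0.5)]
w_output = [(0.2, -0.4, 0.1), (0.3, 0.6, -0.2)]
y1, y2 = forward(1, 0, w_hidden, w_output)
print(y1, y2)  # both fall between 0 and 1
```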

&lt;p&gt;For our convenience let's classify feedforward neural networks into two categories: &lt;strong&gt;Single-Hidden-Layer Feedforward Networks (SLFNs)&lt;/strong&gt; and &lt;strong&gt;Multi-Hidden-Layer Feedforward Networks&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mathematical Model for SLFNs
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;Approximation capability [Leshno 1993, Park and Sandberg 1991]:&lt;/code&gt; Any continuous target function f(x) can be approximated by SLFNs with adjustable hidden nodes. In other words, given any small positive value ε, for an SLFN with a large enough number of hidden nodes (L) we have ||f_a(x) − f(x)|| &amp;lt; ε.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;Classification capability [Huang, et al 2000]:&lt;/code&gt; As long as SLFNs can approximate any continuous target function f(x), such SLFNs can differentiate any disjoint regions.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--asnkqx1S--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/Blog_images/elm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--asnkqx1S--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/Blog_images/elm.png" alt="" width="800" height="829"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;A. Learning Issues:&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Conventional theories only resolve the existence issue; they do not tackle the learning issue at all.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In real applications, the target function f is usually unknown. One wishes that the unknown f could be appropriately approximated by an SLFN f_l.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;B. Learning Methods:&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Many learning methods mainly based on gradient-descent / iterative approaches have been developed over the past three decades.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Back-Propagation (BP) [Rumelhart 1986] and its variants are the most popular.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Least-squares (LS) solution for RBF networks, with a single impact factor for all hidden nodes &lt;code&gt;[Broomhead and Lowe 1988]&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;QuickNet &lt;code&gt;[White 1988]&lt;/code&gt; and the random vector functional-link network (RVFL) &lt;code&gt;[Igelnik and Pao 1995]&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Support vector machines and their variants &lt;code&gt;[Cortes and Vapnik 1995]&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deep learning: dating back to the 1960s, with a resurgence in the mid-2000s &lt;code&gt;[wiki 2015]&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Let's understand the Aim of ELM:
&lt;/h4&gt;

&lt;p&gt;According to Professor Huang:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Unlike conventional learning theories and tenets, our doubts are "Do we really need so many different types of learning algorithms (SVM, BP, etc) for so many different types of networks (different types of SLFNs (RBF networks, polynomial networks, complex networks, Fourier series, wavelet networks, etc) and multi-layer of architectures, different types of neurons, etc)? "&lt;/p&gt;

&lt;p&gt;Is there a general learning scheme for wide type of different networks (SLFNs and multi-layer networks)?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Neural networks (NN) and support vector machines (SVM) play key roles in machine learning and data analysis. Feedforward neural networks and support vector machines are usually considered different learning techniques in the computational intelligence community. Both popular learning techniques face some challenging issues, such as: &lt;strong&gt;intensive human intervention, slow learning speed, and poor learning scalability.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It is clear that the learning speed of feedforward neural networks, including deep learning, is in general far slower than required, and this has been a major bottleneck in their applications for the past decades. Two key reasons may be: &lt;code&gt;1) the slow gradient-based learning algorithms are extensively used to train neural networks, and 2) all the parameters of the networks are tuned iteratively by using such learning algorithms.&lt;/code&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  Why ELM?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;ELM theories address the open problem that has puzzled the neural networks, machine learning and neuroscience communities for 60 years: &lt;code&gt;whether hidden nodes/neurons need to be tuned in learning&lt;/code&gt;. They prove that, in contrast to common knowledge and conventional neural network learning tenets, hidden nodes/neurons do not need to be iteratively tuned in wide classes of neural networks and learning models (Fourier series, biological learning, etc.). Unlike ELM theories, none of the earlier works provides theoretical foundations for feedforward neural networks with random hidden nodes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;ELM is proposed for both generalized single-hidden-layer feedforward networks and multi-hidden-layer feedforward networks (including biological neural networks).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A &lt;code&gt;homogeneous architecture-based ELM&lt;/code&gt; is proposed for feature learning, clustering, regression and (binary/multi-class) classification.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Compared to ELM, SVM and LS-SVM tend to provide suboptimal solutions, and neither considers the feature representations in the hidden layers of multi-hidden-layer feedforward networks.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What's next
&lt;/h3&gt;

&lt;p&gt;We now know all the basics required to understand ELMs, and we are ready to dig into the mathematical side of ELM along with the details of how it works.&lt;br&gt;
In the next part, we will &lt;strong&gt;define ELM and see how ELMs learn&lt;/strong&gt;. If you want to learn more about &lt;a href="https://medium.com/coinmonks/the-artificial-neural-networks-handbook-part-1-f9ceb0e376b4"&gt;Neural Networks&lt;/a&gt; or the &lt;a href="https://medium.com/@jayeshbahire/the-xor-problem-in-neural-networks-50006411840b"&gt;XOR Problem&lt;/a&gt;, you can visit the articles linked here.&lt;/p&gt;

&lt;h4&gt;
  
  
  References:
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://www.sciencedirect.com/science/article/abs/pii/S0925231206000385"&gt;Extreme learning machines: theory and applications&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.sciencedirect.com/science/article/abs/pii/S0925231219302073"&gt;Convolutional neural network based on an extreme learning machine for image classification&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.ntu.edu.sg/home/egbhuang/"&gt;Extreme Learning Machines&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.ntu.edu.sg/home/egbhuang/pdf/ELM-Rosenblatt-Neumann.pdf"&gt;What are Extreme Learning Machines? Filling the Gap BetweenFrank Rosenblatt’s Dream and John von Neumann’s Puzzle&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/@jayeshbahire/perceptron-and-backpropagation-970d752f4e44"&gt;Perceptron and backpropagation&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>datascience</category>
      <category>computerscience</category>
    </item>
    <item>
      <title>Building BookWorm: A book info &amp; recommendation bot using Twilio!</title>
      <dc:creator>Jayesh Bapu Ahire</dc:creator>
      <pubDate>Mon, 30 Mar 2020 17:59:30 +0000</pubDate>
      <link>https://dev.to/twilio/building-bookworm-a-book-info-recommendation-bot-using-twilio-i63</link>
      <guid>https://dev.to/twilio/building-bookworm-a-book-info-recommendation-bot-using-twilio-i63</guid>
      <description>&lt;p&gt;In previous article, we built the WhatsApp bot to fight fake news! If you missed it you can check it out &lt;a href="https://dev.to/twilio/fake-news-foe-machine-learning-and-twilio-5fln"&gt;here&lt;/a&gt;. In this detailed tutorial we will see how we can build a bot which will give us some book recommendations and tell us information about a book we want.&lt;/p&gt;

&lt;p&gt;Let's just jump into this!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/3bzBs5iaHaOPKx8NmM/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/3bzBs5iaHaOPKx8NmM/giphy.gif" alt="Let's just jump into this" width="480" height="267"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Aim:
&lt;/h3&gt;

&lt;p&gt;We will build a WhatsApp bot that gives us information about a book whose name we provide as input, and that also recommends similar books!&lt;/p&gt;

&lt;h3&gt;
  
  
  What will we need?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A Twilio account ---&lt;a href="//www.twilio.com/referral/YmyL1H"&gt;  sign up for a free one here&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A Twilio whatsapp sandbox ---&lt;a href="https://www.twilio.com/console/sms/whatsapp/sandbox"&gt;  configure one here&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5DLykB50--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/Blog_images/console.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5DLykB50--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/Blog_images/console.png" alt="" width="800" height="804"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.twilio.com/docs/usage/tutorials/how-to-set-up-your-python-and-flask-development-environment"&gt;Set up your Python and Flask developer environment&lt;/a&gt; --- Make sure you have Python 3.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://ngrok.com/"&gt;ngrok&lt;/a&gt; so we can &lt;a href="https://www.twilio.com/blog/2015/09/6-awesome-reasons-to-use-ngrok-when-testing-webhooks.html"&gt;expose our local endpoints so that we can receive incoming webhooks&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Dataset:
&lt;/h3&gt;

&lt;p&gt;We will be using &lt;a href="https://github.com/zygmuntz/goodbooks-10k"&gt;goodbooks-10k&lt;/a&gt; dataset.&lt;/p&gt;

&lt;p&gt;This dataset contains six million ratings for the ten thousand most popular (most-rated) books. There are also:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;books marked to read by the users&lt;/li&gt;
&lt;li&gt;book metadata (author, year, etc.)&lt;/li&gt;
&lt;li&gt;tags/shelves/genres&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can download zipped data from here: &lt;a href="https://github.com/zygmuntz/goodbooks-10k/releases"&gt;https://github.com/zygmuntz/goodbooks-10k/releases&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Pre-Processing:
&lt;/h3&gt;

&lt;p&gt;We will do some preprocessing as an initial stage. This is a ready-to-use dataset, though we will drop some columns we won't be using and fill in some blank cells.&lt;br&gt;
Initially the dataset has 23 columns, of which we drop 4 (title, work_ratings_count, image_url and small_image_url) as we won't be using them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# importing pandas module 
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt; 

&lt;span class="c1"&gt;# making data frame from csv file 
&lt;/span&gt;&lt;span class="n"&gt;books&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;books.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index_col&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt; 

&lt;span class="c1"&gt;# dropping passed columns 
&lt;/span&gt;&lt;span class="n"&gt;books&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;work_ratings_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;small_image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inplace&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;#filling blank values with "Not Available" 
&lt;/span&gt;&lt;span class="n"&gt;books&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;books&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Not Available&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
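&lt;p&gt;As a quick sanity check of that preprocessing, here is a minimal sketch on a toy frame standing in for books.csv (the column names here are illustrative, not the full goodbooks schema):&lt;/p&gt;

```python
import pandas as pd

# Toy stand-in for books.csv; "image_url" plays the role of the
# presentation columns dropped above.
books = pd.DataFrame({
    "Name": ["Book A", "Book B"],
    "authors": ["Author X", None],
    "image_url": ["u1", "u2"],
}).set_index("Name")

# dropping passed columns
books.drop(["image_url"], axis=1, inplace=True)

# filling blank values with "Not Available"
books = books.fillna("Not Available")

print(books.loc["Book B", "authors"])  # → Not Available
```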



&lt;p&gt;We will apply some additional formatting wherever it is needed.&lt;/p&gt;

&lt;h4&gt;
  
  
  Let's split the remaining task into two modules:
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Fetch Book information &lt;/li&gt;
&lt;li&gt;Book recommendation system&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let's look at the first part: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--KADTSLc2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://i.gifer.com/S3sm.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KADTSLc2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://i.gifer.com/S3sm.gif" alt="" width="480" height="270"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  1. Fetch Book information:
&lt;/h4&gt;

&lt;p&gt;This part is very simple; you don't need anything beyond basic Python.&lt;br&gt;
Here, we go through the clean CSV file produced by preprocessing and search for the book title received from the user in the title field of the CSV (the renamed original_title field).&lt;br&gt;
Whenever we find a match, we record the index of that row in a list of matched books.&lt;br&gt;
Now we have a list of indices of books matching the user's query, and we can fetch whatever information we want, so let's keep this list aside for a second.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_matches&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;book_title&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;matching_books_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;clean_books.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;file_reader&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;flines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;file_reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;readline&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;flines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rstrip&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="n"&gt;search&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;file_reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;readlines&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sline&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;book_title&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upper&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upper&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
                &lt;span class="n"&gt;matching_books_list&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;matching_books_list&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
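&lt;p&gt;A condensed, self-contained sketch of the same matching step (the three-row CSV here is made up for illustration):&lt;/p&gt;

```python
# Case-insensitive substring search over data lines, returning row
# indices -- the same idea as get_matches, minus the file handling.
sample_csv = (
    "original_title,authors,average_rating\n"
    "The Hobbit,J.R.R. Tolkien,4.25\n"
    "Harry Potter and the Philosopher's Stone,J.K. Rowling,4.44\n"
    "The Fellowship of the Ring,J.R.R. Tolkien,4.34\n"
)

def get_matches_from_lines(lines, book_title):
    matching_books_list = []
    for i, sline in enumerate(lines):
        if book_title.upper() in sline.upper():
            matching_books_list.append(i)
    return matching_books_list

data_lines = sample_csv.splitlines()[1:]  # skip the header, like readline()
print(get_matches_from_lines(data_lines, "hobbit"))  # → [0]
```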



&lt;p&gt;Now let's see how to run inference and serve the results on WhatsApp!&lt;br&gt;
We'll create a Flask server to expose our book-information API.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/sms&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sms&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MessagingResponse&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;inbMsg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;book_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;book_ratings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_matches&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inbMsg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;clean_books.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;book_list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;The book with title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;original_title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;written by &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;authors&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;has average user rating of &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;average_rating&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; and this book is reviewed by &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span 
class="o"&gt;+&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;work_text_reviews_count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;people.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; ---------------------------------&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now let's go ahead and see how we can build the recommendation engine.&lt;/p&gt;

&lt;h4&gt;
  
  
  2. Book recommendation engine:
&lt;/h4&gt;

&lt;p&gt;We will generate recommendations using three different criteria. For each of them, we vectorize the input, compute the cosine similarity against particular columns of our dataset, and return the books with the highest similarity.&lt;br&gt;
Let's see how we can do this.&lt;/p&gt;
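&lt;p&gt;To see that vectorize-and-compare idea in isolation before applying it to the dataset, here is a minimal sketch on made-up author strings (on L2-normalised TF-IDF vectors, the dot product computed by linear_kernel is exactly cosine similarity):&lt;/p&gt;

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Three made-up "author" documents; documents 0 and 2 share an author.
authors = ["J.K. Rowling", "J.R.R. Tolkien", "J.K. Rowling, Mary GrandPre"]

tf = TfidfVectorizer(analyzer="word", ngram_range=(1, 2), stop_words="english")
tfidf_matrix = tf.fit_transform(authors)

# Dot products of unit-length rows == cosine similarities.
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)

print(cosine_sim[0, 1])       # no shared tokens → 0.0
print(cosine_sim[0, 2] > 0)   # shared "rowling" token → True
```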

&lt;ul&gt;
&lt;li&gt;Author Based recommendations:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TfidfVectorizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;analyzer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;word&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;ngram_range&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="n"&gt;min_df&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stop_words&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;english&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tfidf_matrix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;books&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;authors&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;cosine_sim&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;linear_kernel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tfidf_matrix&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tfidf_matrix&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Build a 1-dimensional array with book titles
&lt;/span&gt;&lt;span class="n"&gt;titles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;books&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;indices&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;books&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;books&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Function that get book recommendations based on the cosine similarity score of book authors
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;authors_recommendations&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;indices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;sim_scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cosine_sim&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
    &lt;span class="n"&gt;sim_scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sim_scores&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;reverse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;sim_scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sim_scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;21&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;book_indices&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sim_scores&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;titles&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;book_indices&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;Tags based recommendations:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tf1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TfidfVectorizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;analyzer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;word&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;ngram_range&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="n"&gt;min_df&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stop_words&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;english&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tfidf_matrix1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;books_with_tags&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tag_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;cosine_sim1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;linear_kernel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tfidf_matrix1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tfidf_matrix1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Build a 1-dimensional array with book titles
&lt;/span&gt;&lt;span class="n"&gt;titles1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;books&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;indices1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;books&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;books&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Function that get book recommendations based on the cosine similarity score of books tags
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tags_recommendations&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;indices1&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;sim_scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cosine_sim1&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
    &lt;span class="n"&gt;sim_scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sim_scores&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;reverse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;sim_scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sim_scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;21&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;book_indices&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sim_scores&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;titles&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;book_indices&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;Corpus based recommendations:
Here we build book recommendations using both the authors and tags attributes. We create a corpus from these features and compute TF-IDF on that corpus to get better recommendations.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;books&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;corpus&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;books&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;authors&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tag_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;tf_corpus&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TfidfVectorizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;analyzer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;word&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;ngram_range&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="n"&gt;min_df&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stop_words&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;english&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tfidf_matrix_corpus&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf_corpus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;books&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;corpus&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;cosine_sim_corpus&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;linear_kernel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tfidf_matrix_corpus&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tfidf_matrix_corpus&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Build a 1-dimensional array with book titles
&lt;/span&gt;&lt;span class="n"&gt;titles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;books&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;indices&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;books&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;books&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Function that get book recommendations based on the cosine similarity score of books tags
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;corpus_recommendations&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;indices1&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;sim_scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cosine_sim_corpus&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
    &lt;span class="n"&gt;sim_scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sim_scores&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;reverse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;sim_scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sim_scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;book_indices&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sim_scores&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;titles&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;book_indices&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The functions above return a list of recommended book titles, but for inference we will instead return the list of matching indices, using &lt;code&gt;return book_indices&lt;/code&gt; in place of &lt;code&gt;return titles.iloc[book_indices]&lt;/code&gt;.&lt;/p&gt;
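&lt;p&gt;The index bookkeeping inside those functions (enumerate the similarity row, sort by score, drop the self-match at position 0, keep the top N) can be checked in isolation on a hypothetical similarity row:&lt;/p&gt;

```python
# Hypothetical similarity row for one book against all books; entry 2
# is the book itself (similarity 1.0), which the [1:] slice discards.
sim_row = [0.1, 0.7, 1.0, 0.4]

sim_scores = sorted(enumerate(sim_row), key=lambda x: x[1], reverse=True)
sim_scores = sim_scores[1:3]              # skip the self-match, keep top 2
book_indices = [i[0] for i in sim_scores]
print(book_indices)  # → [1, 3]
```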

&lt;p&gt;Now let's see how inferencing will work in case of book recommendation system.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/sms&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sms&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MessagingResponse&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;inbMsg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;rec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;recommendations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;corpus_recommendations&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inbMsg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;clean_books.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Recommendations based on your input:&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;rec&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;message &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;original_title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can use any of the recommendation functions, though I have used the corpus recommendation here because it considers both authors and tags.&lt;/p&gt;
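The handler replies with `str(resp)`, which serializes the `MessagingResponse` to TwiML. A hand-rolled, stdlib-only sketch of the equivalent XML (the real twilio helper library builds this for you):

```python
# Stdlib sketch of the TwiML a MessagingResponse serializes to.
# Each resp.message(...) call becomes one <Message> element
# inside a single <Response> root.
from xml.etree import ElementTree as ET

def twiml(bodies):
    root = ET.Element("Response")
    for body in bodies:
        ET.SubElement(root, "Message").text = body
    return ET.tostring(root, encoding="unicode")

xml = twiml(["Recommendations based on your input:", "Dune"])
```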

&lt;h3&gt;
  
  
  Final steps
&lt;/h3&gt;

&lt;p&gt;Once this is done, we will run our Flask server with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;FLASK_APP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="n"&gt;FLASK_ENV&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;development&lt;/span&gt; &lt;span class="n"&gt;flask&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To test this, we'll need to open a tunnel to the server running on our machine. We will use ngrok for this. Once you have installed ngrok, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ngrok http 5000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will open a tunnel to port 5000 and give us a public ngrok URL that points to our local application. Now open the WhatsApp Sandbox in the Twilio console and enter that URL plus the path &lt;code&gt;/sms&lt;/code&gt; in the field labelled &lt;code&gt;When a message comes in&lt;/code&gt;.&lt;/p&gt;
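The webhook value ends up looking something like this (the subdomain is randomly assigned by ngrok, so yours will differ):

```text
https://1a2b3c4d.ngrok.io/sms
```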

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XrFL2B7b--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/Blog_images/twilio-console.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XrFL2B7b--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/Blog_images/twilio-console.png" alt="" width="800" height="296"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's send our sandbox number a message with a book name and see the results:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Book Information&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--eaj69P_y--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/Blog_images/info.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--eaj69P_y--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/Blog_images/info.jpg" alt="" width="800" height="1676"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Book Recommendation &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--cfbS393h--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/Blog_images/recom.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--cfbS393h--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://s3.amazonaws.com/fininity.tech/Blog_images/recom.jpg" alt="" width="800" height="891"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We have successfully generated book information and recommendations! Isn't this cool?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Hti_tHMV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh3.googleusercontent.com/proxy/bflKruxU76AKnR7Zc_9ZInZZk5qc5zF0cUZL6EXSVkPgsUeT9KWVmwP1Odz0RfNy2XFYbdd0GX-zEFd6J3LUWPY-e5v1rSX0n-SEOj4876vljTMKCA5JG4hl8imgndBfs19B2snF7pDI5WDF69QyI4gK719qDa0EOxE" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Hti_tHMV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh3.googleusercontent.com/proxy/bflKruxU76AKnR7Zc_9ZInZZk5qc5zF0cUZL6EXSVkPgsUeT9KWVmwP1Odz0RfNy2XFYbdd0GX-zEFd6J3LUWPY-e5v1rSX0n-SEOj4876vljTMKCA5JG4hl8imgndBfs19B2snF7pDI5WDF69QyI4gK719qDa0EOxE" alt="" width="400" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can find the complete code &lt;a href="https://github.com/JBAhire/BookWorm"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's next?
&lt;/h3&gt;

&lt;p&gt;This was a basic intro to creating a recommendation system with the Twilio WhatsApp API or Messaging API. You can use a similar approach to enhance the customer experience in your own business.&lt;br&gt;
What are you planning to build with this? Let me know in the comments below or hit me up on &lt;a href="https://twitter.com/Jayesh_Ahire1"&gt;twitter&lt;/a&gt;! &lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>ai</category>
      <category>twiliohackathon</category>
    </item>
  </channel>
</rss>
