Kazuya

Posted on Dec 6, 2025 • Edited on Dec 8, 2025

AWS re:Invent 2025 - Building an AI platform: Adaptive teaching to institutional auditing (WPS316)

🦄 Making great presentations more accessible.
This project enhances multilingual accessibility and discoverability while preserving the original content. Detailed transcriptions and keyframes capture the nuances and technical insights that convey the full value of each session.

Note: A comprehensive list of re:Invent 2025 transcribed articles is available in this Spreadsheet!

Overview

📖 AWS re:Invent 2025 - Building an AI platform: Adaptive teaching to institutional auditing (WPS316)

In this video, Cornell University's Marty Sullivan and Fuhrman Romero explain how they built an AI sandbox platform using Amazon Bedrock, LiteLLM, N8N, and LibreChat to democratize AI access across campus. They showcase three production use cases: VERA, a values exploration chatbot serving 900 engineering freshmen with 40,000 interactions; a Socratic chatbot using Bloom's taxonomy for 300+ students in climate courses; and an expense automation system processing 10,000 reimbursement requests per semester, saving 30 minutes per request. The platform enables rapid model deployment—new Claude versions go live within days—and supports 24 internal projects with 70+ graduate students. Their API-first architecture with ECS Fargate, CodePipeline, and Bedrock Guardrails allows citizen developers to build AI solutions while maintaining security for medium-risk university data, eliminating shadow IT and project backlogs.

; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

The AI Innovation Challenge: From Confusion to Campus-Wide Solutions

All right, thanks. First of all, thanks for waking up early with us today, guys. Didn't stay out too late, I guess. So let me first tell you about a problem that probably exists at your organization today. Someone, probably a staff member, a student, a faculty member, maybe you or someone on your own team, is working on a process or a workflow, and you've been using it for years, and it's either broken or inefficient. You think to yourself, or they think to themselves, I wonder if I could use AI to improve this process or fix this process.

And there's just so much noise and buzz going on around AI services. What services are available, what models are available to use, what they're allowed to use, they just don't really know where to start, right? And they just get kind of stuck before they can even start experimenting with things. They don't know who to ask for approval internally in order for them to actually start testing these services, start using these services. They start spinning up things like shadow IT, or they start taking their sensitive data and sending it up to public AI services just using their own personal individual logins. Or even worse, they basically just give up because they're stuck, and no innovation happens. They go back to their old legacy, inefficient process that they've been using forever.

Now, Cornell had that same problem until they built a platform that everyone could centrally work towards and give everyone in the campus a starting point to start innovating and experimenting using AI solutions. And today you'll meet two people who helped drive this change, my two wonderful, awesome customers that I personally get to support. And they took their campus, Cornell University, from "I wonder if AI could help solve these problems" to "AI actually is helping solve these problems." They built an AI sandbox platform using Amazon Bedrock and some of our serverless architecture that you'll see in a little bit. And this allowed professors, students, faculty members, central IT members all to have a central starting point to start experimenting with their own workflows and processes and seeing how they can innovate using AI services.

Cornell's Strategic Shift: Empowering End Users Instead of Centralizing AI

My name is Mike Bergnoni. I'm a senior solutions architect here at AWS, and joining me today are Marty Sullivan and Fuhrman Romero of Cornell University, and you're going to be hearing from them in a moment. They're the real brains of the operations up here today, how they implemented the strategy at Cornell, as well as the key AWS services that they used along the way.

Now Cornell really had the same challenges every other campus, or a lot of organizations did. ChatGPT came out around 2022, and everyone suddenly wanted AI, right? But really at that time there wasn't a safe place to experiment with it. There was no real way to integrate it into existing workflows and processes, and there was no controls at the time around cost or data protection and privacy. And they could have done what many organizations did or are still doing today, create this centralized AI team that focuses just on AI projects, take in requirements, and they're the only ones that are allowed to decide how AI is used and where it's used. And that runs into your typical project backlogs, and by the time they release an AI solution to their end users, the use case probably already changed because AI is changing so rapidly as we've seen.

But instead, Cornell asked themselves a different question. They asked, what if we stopped trying to build the solutions for our end users and what if we let them, gave them the tools to build these solutions themselves? What if the professors in the classrooms could actually see the problems that they're having and actually be the ones to solve it using AI? And you'll hear one of the use cases actually is just that. And their belief was that AI shouldn't be this specialized project that no one else has access to. It should be something like a daily tool like we use email or Slack, and everyone should be allowed to use it.

Three Use Cases and the AWS Technology Stack Behind Cornell's AI Sandbox

That meant that the barrier had to be near zero so that everyone could easily adopt it, right? And that philosophy that the individual contributor is really what matters most when finding these problems to work on in your organizations is really what drove Cornell's AI sandbox platform to fruition. Now Marty and Fuhrman are going to walk you through three specific use cases today. A chatbot that aligns student values to their educational goals. A Socratic chat application that can show whether a student actually understands the course material that a professor is teaching them versus just them memorizing answers to test questions.

And then an AI-powered expense application that actually helped clear out a constant backlog of expense reports. Now, before I turn it over to them, I just want to show these are some of the key AWS services that all work together to help streamline Cornell's application deployment process and allowed for this rapid innovation and experimentation at Cornell.

CloudFormation manages their infrastructure deployments, and they use CodeBuild and CodePipeline to orchestrate their CI/CD workflows. ECS Fargate was what they used to deploy their containerized applications, and it allowed them to handle scaling automatically without having to have anyone manage individual servers or worry about OS maintenance, things like that. And throughout this whole process, Bedrock was the service that they used to provide access to foundation models which helped power their AI applications. And as I mentioned before, technology is rapidly changing, and when Anthropic was releasing their new models of Claude, the new versions of Claude this year, Cornell had immediate access to those through Bedrock, which was huge. There were no additional vendor negotiations. There were no API rewrites or anything like that. All they had to do was update the config in their code, one line, change the model name, and redeploy the application. Now they're on the new version of Claude.

Now the combination of all these services are what created this automated, efficient system for Cornell to experiment and deploy these AI-powered applications into production. All right, that's enough from me, so let me turn it over to Marty and Firman to tell you how they executed the strategy, but also listen in on how you can take away some of this and apply this to your own organization.

Building a Central AI Platform: The Three-Component Architecture

Thanks, Mikey. Good morning everyone. My name is Firman Romero. I'm an application development manager at Cornell University, and I just kind of want to recap a little bit what we're going to talk about. So we're going to talk about how we built this platform. One of the things that I've found out as I've gone around and talked to people is by learning how we sort of rolled this out to bring it to our entire university, it made it so we can have so much less friction to be able to deploy new products. Mike mentioned new models are able to come out as soon as we can load them up, as soon as they show up on Bedrock. We're going to talk about the three use cases. I'll walk through one of them. Marty's got a couple of the others, and we're going to take a technical dive into how those use cases work and how we're actually solving real problems with AI right now at our university level. And we'll talk about the technical stuff on how we got there, and our hope is that you're going to understand that if you sort of follow this sort of model, you'll also be able to deploy on a much broader scale for your organization. It doesn't just necessarily have to be higher ed.

So, as Mike had mentioned, our journey started out with a question, right? When ChatGPT came out, the entire university, all of the colleges across the university and their individual departments, they started reaching out to OpenAI and reaching out to Anthropic and trying to figure out how they could get accounts for each one of their individual schools. And right away we knew that this was going to be a long-term headache. There's no management, there's no observability, there's no anything, and also there's no way to control costs in that scenario. So we started to think, what if we could build some sort of a central AI platform? Like what does that look like? How do we even bring a central AI platform or what can we build? Like how does that even work? And how can we make it accessible to all of our developers on campus?

One of the things that was really important to us is that we can put this technology in the hands of as many people as possible, because when you confine it to just a small group of people, that group of people may be making decisions or not understand other people's use cases, and so we want everyone to be experimenting with this. And we learned about keeping humans in the loop, and we learned about how our data works with all of these things.

So I'm going to talk a little bit about the high level and then I'm going to turn it over to Marty here, but basically our platform's made up of three components, and the first component is LiteLLM, right? And that is what we use as our AI gateway that lives on top of Bedrock. Then we have another automation platform called N8N. That's what we call our agent studio, and that's where we actually are able to build all of our automations in a low-code environment. This allows us to have what we're calling citizen developers. And then last we have LibreChat, right? And that goes back to that ChatGPT thing where everyone wanted to have that, but if we can implement this on top of our own gateway, we can provide them a lot more model access.

AI Commons and Agent Studio: Reducing Technical Debt Through Low-Code Solutions

So I'm going to hand it over to Marty. He's going to give you a little bit more technical information. Sure. Thanks Firman. So I'm Marty Sullivan, so I'm kind of the one who architected all of this stuff and implemented it kind of as fast as possible, and put it in the hands of as many people as I could.

The first point I want to draw attention to is the diagram. What we're calling our AI Commons is the first thing that end users on campus are seeing. This is for your day-to-day people who are using AI, and they might be fine with just using a chat application like ChatGPT or Claude Desktop. But we're also trying to build things in a way that reduces our technical debt.

What we're trying to focus on is building more conversational interfaces, so building things into applications like Slack and Microsoft Teams, or just email back and forth. We want to treat AI agents as a coworker rather than just another application that we're building and managing. So we're trying to veer away from building things like web interfaces and web applications where each person is building something different and we have to figure out how to deploy it and then maintain it for years. We might still have to maintain the workflow, but we really want to not have to maintain applications.

One of the ways that I'm enabling that is with our Agent Studio. We're powering our Agent Studio right now with an application called N8N. So N8N is a visual and low-code workflow building service. Really what we're trying to do is when we're building agents, we want to consolidate all of the technical debt we're building into a no-code, low-code platform like that so that we can have many experts across campus on that platform. And then they're able to quickly understand what other people have built in there rather than trying to have to go out and learn an entire tech stack of what some developer built, or a grad student sometimes even, in their tech stack.

And then all of this stuff is brought together with our AI Gateway. As Furman mentioned, we're using the LiteLLM software, which is open source. And the great thing about that is we can gate access to not just AI models but things like MCP servers that hook up to chat clients and give people access to the agents we're building. And it makes it so the IT organizations themselves can govern access to not just the data sources, because we have many data stewards who need to build special views of data for our agents, but then it gives the IT organizations the power to control who can access what. If we're building a chatbot around something like the registrar that has student information in it, we need to make sure that only the right people can access that particular agent.

Platform Scalability and Rapid Model Deployment with Day-Zero Support

So this all matters because we wanted to build something that gave everyone access to whatever they needed. We're not locked into a single AI model provider. Anybody can switch to different models based on what task they're working on. They can have fallback models to make sure that their application is resilient and highly available, or maybe they just want to use the cheapest model and their problem is solved by really anything.

Obviously, since we're building a platform that critical applications might depend on, we needed something, we knew we wanted to build this in AWS because we needed to be able to scale out and accept thousands to maybe someday even millions of requests. And so we're testing that out today. Already we have 24 of our internal projects using the gateway every day. And just think about the scale of these types of projects. We have brought on 70 master's level students to help us with tackling all of these problems that people have submitted to us.

And people like Furman have joined us as tech leads who come from all of the different departments on campus who are really interested in being involved in AI and want to lead these types of projects and work with different customers that they're not used to working with. And then obviously for the IT organizations themselves, we're really interested in also just monitoring, just knowing what are people doing with AI and how are they using it and how much are they spending on it.

And it just gives us insight into if we do need to produce an audit report someday for what Cornell University is using AI for, we could pretty easily put that together. We provide centrally a shared responsibility model, trying to take a similar approach to AWS. We're building this model because we're putting these tools in the hands of not just IT people but also tech-savvy power users on campus who might be able to automate some of their own daily tasks.

We have to have some sort of agreement with our customers about the ways they should use AI and the type of data they're allowed to use. If they have really sensitive data, we might need to have a conversation about how we can work with them and make it so they can use our gateway while also protecting the sensitive parts of their data, whether that's due to compliance or just university policy. Some ways we can do that is by creating guardrails and monitoring and our own automations that look at everybody's agents that they're building and give us alerts if people are maybe doing something they shouldn't with certain credentials or things like that.

At the end of the day, my background is as a DevOps engineer, so I also wanted to make sure that we had a rapid release cycle for these things because we all know how fast these AI models and availability are changing. I built our stack around AWS's CodePipeline. We have everything in source control in GitHub and store our container images in ECR, and we build those containers using CodeBuild. Then we deploy containers out to ECS using CloudFormation. We also have everything built into our single sign-on that everybody uses.

We use the Application Load Balancer to create high availability and serve our services across availability zones, and we use the Fargate service to simplify infrastructure so we don't have to provision EC2 instances or anything. We're just sending out Fargate containers. Really, the greatest thing about this release cycle is that the software we're using, like LiteLLM and N8N, they have day zero support of new models, and then generally they'll have a stable release within one to two weeks of new API features for the different model providers.

This all leads to us not forcing people to use some platform where we're two months behind on what the latest and greatest is. We're trying to keep up as fast as we can with new releases, and pretty much for the last six months, every time there's been a new model, we've had it in there within ideally the day it's released, but usually within a few days. People have been really happy with that.

VERA: Values Exploration Chatbot for 900 Engineering Freshmen

Now I'm going to start talking about some actual use cases. We think we picked some really interesting ones to talk about today. The first one I'm going to talk about is our Sibley Center's Values Exploration and Reflection Assistant, or VERA. Just to describe the problem a bit, every year we have about 900 new incoming engineering freshmen, and an activity that the associate dean of the Sibley Center likes to have happen is she wants her students to come in knowing what their values are and what goals they want to reach, basically why they're here at Cornell over somebody else and how they can embody and reach those goals.

Some students might come in thinking they know what their values and goals are, but maybe they have an abstract sense of what that is but can't really articulate to another person what those are, and that means they can't really articulate it to themselves.

So what they do is they bring in professional coaches. I think they hire around 70 coaches, and every student that comes in gets a one-on-one session with a coach. So what we needed to do was find a way for the students, before they met with the coach, to generate basically what their values and goals were and give a summary to the coach so that they could go in knowing sort of where to start talking to the student.

So how we implemented this was we built an N8N workflow and we worked together with the assistant dean to create this. So she spent time using Claude for Desktop to actually write out a very, very complex system prompt. She worked really hard on this for, I would say, probably several weeks, and it was very detailed and it had PDFs with research information and everything about how she wanted the model to behave, and she was able to do that all on her own. Then she came to us with that, and we were able to basically plug in her data set and her system prompt into an agent in N8N, and it was like ready to go. I mean, we were able to use the same models that she was using in Claude for Desktop to make sure that it was behaving exactly the same as it would when she was testing with it, and we were able to configure some automatic failover.

Some things that she was worried about was she didn't want students to be able to do some of their own prompt injection and change the way the model was behaving or have inappropriate conversations with the chatbot. So we were able to use Bedrock Guardrails also through our gateway to sort of keep students on track and make them not be able to alter what the purpose of the chatbot was, and all of this came together pretty quickly. So here's sort of what the workflow looks like, and you can see it's not even really that complicated. N8N has a chatbot built into it. You can see there's a shared conversation history between, in the top right is where the actual agent is configured, and you can see that the conversation history is shared between what the user sees in the chatbot and what the agent sees when it's interacting with what the student is sending back and forth.

And we have a specific step where we're applying those guardrails I was talking about, and then we can handle errors gracefully and things like that. So it's really a great platform. And so just to go over some success metrics, we pretty much started talking about this with the Kellner Center in I would say June of 2025, and by July we had launched it and exposed it to over 900 students. Once we configured this, we got zero support requests, and that made me think, well, are people actually using this? But I found by the end we had had over 40,000 back-and-forth interactions with students.

And the greatest thing about this was, like I said, the most work that went into this was the person who cared about this project the most. She was able to make the chatbot essentially, and we just sort of on the back end threw together an agent that was able to do exactly what she wanted it to do without any IT knowledge for her, and really for us it probably took a few days to put together at the end of it all. And so here's just a few quotes to show, and I will say these are really positive quotes and we didn't actually cherry pick these at all. I mean, really we had a lot of feedback that Erica Dawson, who is the assistant dean, collected from the students, coaches, and her staff, and really it was overwhelmingly positive.

Socratic Chatbot: Assessing Student Understanding Through Bloom's Taxonomy

So that was Vera. So that was kind of like a, you know, here's a once-a-year type of thing that we need a chatbot for. The next thing I'm going to talk about is our Socratic chatbot. So this is more of an everyday application that instructors can use in their classrooms. So again, to start with the problem itself, the challenge we're trying to solve here is specific to large classes. So I've worked with Professor Toby Alt for a long time.

He's actually my advisor because I'm also a PhD student on the side. He teaches a course called Climate and Energy, and it's a very large class with over 300 students in it. Typically, it is a science course, but it's not really for science majors. It's usually people from outside science majors who are coming in to take it to fulfill their science requirement for their degree.

With these types of students, one of the challenges is how do you really assess that these students are understanding what's being said in lectures? If you read Toby's quote here, the only way he can know is based on what their faces look like when he says something. A lot of times students might just be tired or something, and there's not really a great way to really know if the material's hitting.

I worked together with Toby. I sat down with him and took the product manager approach and came up with some user stories and acceptance criteria with him for what we wanted. I also worked with our Center for Teaching Innovation, which manages all of our teaching technology at the university that's used in the classroom. The things that I came up with were that instructors need to be able to use the tools they're already using. We didn't want to make a separate application and make professors go somewhere else, so we knew we wanted to build it right into Canvas, which is our learning management system that we use at Cornell.

It really needed to be easy for the instructor. I had one specific professor who I talked to, and they said, "Look, if this is going to take me longer than 10 or 15 minutes, I'm just not going to use it." So I had to meet that threshold in engineering an application like this.

For the students, the feedback we got from the Center for Teaching Innovation was that we really need to be using a structured learning framework that's pedagogically proven. So we chose Bloom's taxonomy because that's actually a common taxonomy at Cornell. I'm not going to go too deep into it, but basically there are different levels. You can picture it like a pyramid. It starts at the most basic level where if you can recall basic facts about a topic, that shows the most basic level of your ability to understand this topic, all the way up to a PhD level where you should be able to create new solutions around that topic.

Also for the students, in a technical sense, if they're given an activity in a course, they need to know when they're going to be done with this. So you can't just give them a chatbot and have them have to go back and forth with it. One of the things that we built in was a progress bar for them that's actually transparent and guides them through Bloom's taxonomy. Basically, at this level of course, we're really only interested in them getting to level three. We're not expecting them to be PhD students, so we just want to be able to have them be sure that if they get a question around climate change, they should just be able to understand how to apply that knowledge in the real world.

The value here is that students are able to, on their own time, come back and forth to a topic and spend time thinking critically about it rather than just going in and answering multiple choice questions. It really creates the same type of dialogue that students would have with a TA or a professor during office hours. They get the immediate feedback, and then for instructors, what we also do is we're able to analyze the conversations afterwards and gain insights into things like, okay, well if 20 students got topic A and everybody got to the applied level of Bloom's taxonomy, then we can assume that the class has a general understanding of that. But if five students out of 20 struggled on topic B, that's a clue that maybe I need to revisit this topic in the next lecture.

And then I'm going to hand it over to Furman to talk about our next use case. Before I go, please visit the public sector booth in the expo because you can actually view a demo of the Socratic chat application there.

Automating Expense Reimbursements: Solving 10,000 Requests Per Semester

So for our third use case, we're going to talk a little bit about automations, because as we know, these AI tools can do more than creating chatbots or things like that. And so the N8N platform allows us to do some real automated work and save some real time. So I want to talk a little bit about this project that we've got. I've got some graduate students that I'm working with this semester who are also working on this. Shout out to my grad team on this.

So when I was in college, I would raise money for my club by going across the street and buying a sheet pizza. Then I would sit in the hallway and I would sell the pizza for two bucks a slice. And after all the pizza was sold, I'd go up to this lady named Judy, and I would take her the receipt and I would hand her the receipt, and she would write me a check to refund me out of our club funds for that pizza. And I did this all the way through college. Fast forward to 2025 and that same person who is doing the job that Judy used to do is suddenly looking at contract negotiations and gigantic catering bills and AV stuff.

So for example, the chess club on campus recently put on a tournament, and that tournament had an AV company and a catering company and all of these things that suddenly this person is spending all this time on to refund these students out of their campus funds. And we have 1,500 campus groups on campus, and they get about 10,000 requests per semester. And when she first told me 10,000, I actually thought she was exaggerating, right, because that's the number we all use. How many things do you have? I got 10,000 of them. But no, she legitimately gets 10,000. So we had to figure out what are we going to do, how are we going to kind of solve this.

So with 1,600 groups on campus and 10,000 reimbursements, it sometimes can take up to 30 minutes per request. There's all these rules that they have to follow. You know, you can't buy alcohol, you can't do this, you can't do that, and these are just things that, you know, it's not easy to automate. It's not a thing that we can make this job easier for her. She's looking inside our campus financial system, she's looking inside the campus groups system, and all of this stuff with just multiple screens that have to come up that integrations would be complicated to build. So this is just a time-consuming process that somebody said, hey, I think that AI can solve this problem.

So if the student goes to, I don't know about you folks, but like when I go out to dinner, I don't know where that itemized receipt is like the next morning, let alone like an 18-year-old student being asked a month later where that itemized receipt is, right? Because again, she's got to look for things like, what did they buy, alcohol or did they buy gift cards or anything like that. So the way that we put this together through automation on the N8N process is we do what we're called contextual prompting, right? And so instead of like prompt engineering, what we can do is we can go and gather a bunch of this data.

So through this, we can program through API access to our various systems where it can actually go and programmatically begin gathering this data. So this is just code part of it and it goes and looks for that budget. It goes and looks for various items, right? And then it takes that receipt and it sends the receipt up to the LLM and begins asking questions like, is this an itemized receipt? Does this have any of this or any of that? And that goes through multiple processes and the whole workflow sort of takes it through getting to a final prompt that gets sent to the LLM.

So the question is, right away, one of the things that we figured out was, what is your biggest pain point? And honestly that was the first thing that we were told was the itemized receipt thing. Like they just don't realize like the difference between those two receipts they hand you at the restaurant. So we send that up to an LLM and we just ask a simple question, is this an itemized receipt? That way we can get some hard information and the LLM is super good at that, as you probably know.

Next we can ask, is there any alcohol on this, right? And there's a couple of gotchas on there, and I'll talk about that in a minute. But we can ask all of these basic questions and say is this this and we ask it to put out a JSON format in a very definitive yes or no because I know on questions like that LLM isn't required to hallucinate, it's just required to make an analysis on a particular image and receipts have text on them, so it's pretty easy.

So this is kind of what our workflow looks like. So you can see there's that data gathering process that I was talking about where we go out and I actually go and download the guidelines and I actually go and download the audit instructions and we save these in an S3 bucket so that way we can modify them. It's not hard coded into the workflow at all. It's just an MD file that sits out there. And then we start putting that all together, right?

So I make a call out to the campus groups information to see what kind of transactions they've got because when the student submits for this reimbursement, they go through a survey process and answer various questions. So I can pull all that information in and I start building this prompt that effectively is you are this kind of auditor, you are looking for this, you are looking for this, here's the information that you have. Amongst this information, answer questions about these particular items. And it starts to develop this overall what we call like an answer card, and that answer card just ultimately comes to a very simple yes or no, like can this be approved, right?

So if it's a pizza receipt, like I had talked about back when I was in college, the thing that you're going to find is very easily the AI can say, yeah, that's good, right? But if something a little more complicated like that tournament that I had talked about, you might get a thumbs down, but that doesn't mean that it's bad, it just means that there's something to review. As part of that JSON export that we require from the LLM, that JSON export has all of the answers in that card, and I force it as part of the process to talk through why it made certain decisions.

So when the human looks at it and it's got a thumbs down, it will say thumbs down for no itemized receipt, or thumbs down for it looks like they bought a gift card, or in the case of a couple of students who went to Olive Garden and ordered vodka pasta, no matter what I try, I can't get the LLM to not tell them they're buying alcohol. I've put it in there to use your common sense about alcohol, but the LLM just continues to flag that one. But these are some of the weird things that we have to overcome, and it's been an interesting project to work because, like we had said, we've got students who are helping us with this and all of this is able to be rapidly iterated because this platform is a low code platform with nodes.

So the one thing that we know about is these models, they can hallucinate pretty easily, but the main thing that we're finding out is when we go in and we say to the AI, all right, give your initial draft of this audit, like tell us what's going on, it's so often that we get a green thumbs up that we can simply go in there and the LLM will have talked through like here's why it's green, these two things match, they've got enough money in their funds, it's going to come out of this account. And so in time, even though we've got a human in the loop right now, in time, I think the auditors who are that human are going to be a little bit more trusting and they're going to start to understand, but right now, every single one of them stops and every single one of them gets looked at.

So I'm working with a guy named Jonathan Hart over there at the finance, and he was the first person who came to me about this, and this is another kind of lesson that we've learned along the way here is it's the individual contributors who are really the folks who are going to be able to tell you exactly how AI is going to solve their problem. So where you have a manager who might say, I think AI can solve that, right, oftentimes they don't fully understand what that means or what that process is, and what we're finding is that the individual contributor is your partner in this, they're the people who you can look right at them and say, what's the hardest part of this thing? What do you get stuck on? In this case, we found out it was those receipts.

So we're saving over 30 minutes per request on 10,000 requests per semester. I know that math doesn't work for how much a single person has in a semester, but the reality is some of these things do take a very long time. Generally, coming into the fall semester, I went in and looked at the queue, and I think there was still a couple hundred in there, and we were getting ready to come into the fall semester, still finishing out from the previous semester. And you know it's at a certain point you just pick the ones that are under a certain threshold and you make sure that you've got basically everything there, but this is going to save all of that time because going into it, the auditor can look at that, they'll have that green thumbs up on it and they'll just be able to quickly go in and look, yeah, it looks like Furrman bought a pizza and he wants 25 bucks back for the pizza.

So the other thing is, because all of the same rules get applied and because it's looking at those guidelines and we've got those guidelines saved in an S3 bucket, they can go in and change them. So one of the things that we also learned about was every semester the rules are slightly tweaked because they learned some student exploited something in a certain way or one of the clubs didn't spend their money exactly the way they should. And so this allows us to quickly change that system prompt and be able to just continue iterating and moving forward. So and then lastly we built a dashboard for this and that allows us to get a lot of insight into what's being submitted and things that we didn't have before.

Technical Implementation: Self-Hosted Software and API-First Design

So moving past that, this is all made possible like everything about this is sort of made possible by the fact that we've built all of this on top of Bedrock and built this with N8N because this allows people to be able to see how our AI works on campus, right? It's not a nebulous thing where they come to a central department and say, hey, build me this AI solution, because they understand what those workflows look like. We were able to talk to them up front and say, hey, here's how this workflow works, and that's what we've been doing is going around campus and talking to them. What does workflow automation look like? What does it mean? How does it work?

And that requires us to really start at ground zero, and I'm going in there and I'm spending time with another colleague telling people about the definition of GPT, telling them about how neural networks work, and giving them that baseline information that a lot of us kind of picked up over the past couple of years, but these are folks who've never been exposed to that, you know, when they're playing with ChatGPT they think that ChatGPT is that chat interface. And so once we were able to roll out our sandbox which was built on LibreChat, people start to understand sort of how these platforms sit on top of a gateway, that gateway being Bedrock. It also makes it incredibly easy to deploy new models too.

As Marty mentioned, a new model will come out, and I make it a fun thing. Every time a new model comes out, I'll email Marty directly and I'll make some image of people holding up protest signs like "We want Claude 4.5," and I'll email it to him in the morning and say, "Hey, we've got to get the new model up." Usually I see it that day. This means immediately my dev teams on launch day for a new model are able to use these new models. Because we know that our gateway is already negotiated for medium-risk data, there's no friction there either. So if a new plugin comes out and that plugin can run through our gateway, that's it. I can immediately put that into production because I already know it's the gateway that is approved for our medium-risk university data. And everyone on campus knows what medium-risk data is too, so when we label it with that, people get it.

I'm going to turn it back over to Marty to talk about the rest of the technical stuff. Thanks, Furman. A lot of what we started with when we started thinking about this was we really decided that we wanted to go with self-hosted software. So that's both the N8N software and LiteLLM, which have self-hosted options even with their enterprise licensing. That was one of the things that we wanted, not just because we want to make sure data stays within our environment, but also we have to access a lot of sensitive data sources at the university. Having to configure some software as a service cloud service out there to be able to get into our private networks is really difficult, whereas when we're in AWS, we have a direct connect set up back to our campus. So we can even access systems on our private network on campus, so that was really important to us. And we really like open source or source-available software in the case of N8N because then we can make contributions to the source code to meet our needs. So all of those things came together for us. It's things that we wanted and why we chose the particular pieces of software that we did.

In terms of integration patterns, we really want an API-first design because it really allows anybody to do stuff. So we can have developers who are building applications and they're just like, "Look, I don't care about your no-code platform. I just want to build a really cool application on my own." We can enable that for them. Or there may be a developer who wants to use an AI coding assistant in their IDE like VS Code or something like that, or Cursor in the case of, you know, if you've seen that around on AWS. I want to configure this IDE to use my gateway and be sure that my private code is going to our approved models. That was another really cool thing that we can enable through our gateway.

Some of the challenges though are we still need to work with our data stewards and global Microsoft admins and things like that when we want access to things like Microsoft Teams to build a conversational user interface or accessing certain email. Like Furman was talking about a project where we need to access an email inbox and create drafts for people, so we needed to work with multiple teams on campus to give us that permission. And sometimes that can create friction, so there still are all of these organizational and political challenges that we have to work through. But one thing we're surprised by is people really are into AI at Cornell. I mean, you saw the titles that we're working with. It's an assistant dean coming to us. So when you have somebody with that kind of title asking for these things, it's kind of easy when somebody says, "Oh well, I don't want to give you that access." It's like, "We'll go talk to the dean about it because she's the one asking for it." So some of these things have actually been pretty straightforward just because we have the leadership-level people who really want this to work. So that's an important thing to have at your organization: make sure you have your leadership's buy-in.

And in terms of monitoring, I mean, obviously like I said at the beginning, we want to be able to see what everybody's doing, monitor what everybody's doing, and make sure people are following the guardrails and rules that we've set.

Creating a Builder Culture: Student Projects and Open Office Hours

The bigger picture that I really want anybody who's attending here or watching this later to understand is that we're trying to create builders at the university. This is what Cornell is all about. We want to have not just the students going off and building things. I think Cornell is one of the top five or at least top ten universities in building new startups right now. We don't want just our students to have that entrepreneurship mindset. We want our staff to be able to have the same mindset when they're building things.

We don't want them to just drone on doing the things that their boss tells them to do every day. We want them to innovate in their jobs and make it so they're not bogged down with ten thousand requests, and they can just have things go at least a little bit smoother every day. We also want to set the standard in higher education. We meet with and talk to other Ivy League universities and R1 institutions across the country and share what we're doing, and they share with us. Some of the things that we've talked about have come from advice from other universities, so we think that having that mindset of collaborating across the country and across the world really is what we're all about.

I'm going to pass it back to Thurman again. I had talked a little bit about our student projects. When we first started this journey, we reached out wide open to the campus and said, "Hey, what do you guys want to do with AI?" We got about two hundred responses because it's one of those things where, as I had mentioned, a manager says, "I know AI can fix that thing, right?" As we started looking through them, one of the things that we found out was that some of them are just not really an AI job. That's kind of just a code job, and I'm sure you folks are familiar with that kind of thing.

Being a university, it's nice that we've got the ability to bring on grad students. The first semester we reached out and said, "Hey, who wants to participate in this?" I think we set it up like an independent study, and we had fifty students who came forward. We started putting together these projects and trying to understand how we could use student work to automate processes. This semester, the program's a little bit more mature, and now we've got technical leads who are participating in this.

We reached out again and said, "Hey, here's your chance to work with a real Cornell University professional who understands about enterprise grade deployment." Because the first time through, students are going to build what they know how to build. It's going to be pretty hacky and it's not going to be deployable. On our second time through, we had seventy-five students come forward, and that allowed us to start putting together some teams of students working with technical people. There's a couple of people on my team who are a tech lead. I'm a tech lead on four separate projects, and we meet weekly with our students. We do sprint meetings with them and they show us what they've worked on that week.

It allows us to really get in there and show them this is professional grade, how we're going to do an automation at an enterprise grade. It's been really cool because every time they kind of go outside the lines, I messaged Marty a few weeks ago and I was like, "Hey, I think they're using some AWS services here," and he was like, "Well, we've got to rein them back in because we want them to build in our platform using our standards so that way we can actually deploy these things and put them into production."

We've got all these tech leads around, and we meet with the students, and administration has really supported us a lot because they understand what we're doing. They understand what the goals here are, which as Marty had said and sort of our whole thing, is we want people to understand how this technology works so that way we can all learn from this thing. I often say when you're in higher ed, you work in a house of teaching and learning. That's your mission: teaching and learning. We have to bring this technology at scale to the whole university so that way if one individual contributor says, "I think AI can fix this," we don't want them to be buried under a pile of bureaucracy that there's no possible way that they could ever try to solve that.

By us bringing this API gateway access, by us bringing the platform forward, by us bringing these tools forward, allowing them to chat inside a safe area that we know is good for medium risk data, we know they can go in there and experiment. We've allowed code execution inside our LibreChat so that way they can even do some Python experimentation. Overall, this has just been a really rewarding process because as we get more and more mature with this, we're going to start having more and more projects come to fruition. We're going to have more people who see another project and are inspired by that.

That's something that I often tell people when I'm going around talking on campus. I'll say I'm not telling you how to solve your problems. What I'm doing is teaching you how our platform works and how we're going to bring AI to you, and my hope is that you'll be inspired by something you see here to be able to say, "I wonder if this works." As another part of this, we actually host what we call our open office hours every other Thursday. Another colleague and I hold a one-hour session that's wide open to the entire campus to come in, see cool new stuff, chat with us, and tell us things like, "Hey, I've got a 500-page PDF document that I can't figure out how to make it load into the LLM."

Usually there's somebody in the room who understands and can say, "Oh, I know exactly how to help you with that. It's because it's a 500-page PDF." This allows us to engage with the community because someone tends to come to those meetings and ask a question that completely stuns us, and we think, "Oh, I didn't know somebody wanted to try that." In one of the meetings a few weeks ago, when somebody came forward and started asking some pretty complicated questions, Marty jumped in and said, "Why don't we schedule a separate meeting with you so that way we can talk through what your needs are and understand." That's been really excellent because it's gotten people excited and it keeps us able to provide services at scale so that our individual contributors can go out there and experiment and try new things.

Scaling AI Innovation and Next Steps for Cornell's Platform

I've talked a little bit through this, but if you are getting ready and are thinking about bringing this to your organization, we really would suggest trying this out and trying to build a platform that will scale. One of the things I talked about yesterday in a hallway conversation here was that Claude Code had recently come out, and we already had our gateway completely set up. Claude Code allows you to use a custom gateway, so a tool like Claude Code normally would require all kinds of permission to be able to use it, and you'd need to know where the data is going and how the data is being used. But because Claude Code plugs right into our gateway, which had already been approved through security on campus, I was able to put that tool into the hands of my developers on the first day.

That's really cool because it makes it so as a new AI tool comes out, or as somebody on campus says, "Hey, I want to try this AI tool," we can show them how to hook it up to the gateway. They can take our documentation, load that into the sandbox, and ask the AI how they can hook it up to our gateway, and it creates this thing of citizen developers. It creates, as Marty said, some savvy person on campus who doesn't necessarily work as a developer saying, "Hey, I understand how this AI works. Let me try to hack together a thing and experiment." And then that could become a project. That's been really cool to watch the projects come forward.

Right now, Marty and I are having an active conversation with a person on campus who I think works in desktop support about doing an audio-to-text transcription service. We just enabled the new diarization models which will identify the individual speakers, and so we slammed together a quick little demo, a standalone HTML thing, and threw it up on the Teams channel. It requires an API key, but you can load in a file and get that transcription to work. This just inspires people to see what's possible. We say, "Hey, give this demo a try. I don't want to tell you this is going to meet your needs, but this is going to be something that you can try out and test." It's worked pretty well. We're having a lot of really excellent conversations with folks who are asking questions of, "Is this AI? Is this AI? Is this AI?" And as they ask those questions and we answer, "Yes, that is," or "No, that isn't," we are learning in public, and I think Marty had that on one of his slides.

Because of that, this allows what, I hate to say grassroots because we've been using that forever, but a bottom-up ability to say, if a manager is saying to individual contributors, "Can't AI solve some of this stuff that you do?" those individual contributors might not otherwise be able to answer that question. But because we have this platform, because we've been going around and doing talks on how our platform works, I've spent the past six to eight weeks going around the university to the various colleges and various staff and faculty groups and talking to them about what we're building, how we've built it, and what it's for. The response and the feedback has been really, really cool. We've been impressed by the number of projects that have come forward. In that first time we sent out an email saying, "Hey, do you guys want to do any AI stuff?" the initial response that we got was, "There's no way that we're going to be able to handle this." And so that's why we put that governance model into effect. So I want to go back here. Oh no, can I turn it over to you, Mike?

All right, so I'm going to turn it back over to Mike now to close us out, and thank you very much. All right, so where do you guys go from here? Hopefully you learned a lot, but we actually just published a blog. I think it was about a week ago now or so. Grab the link right there from the QR code. This actually features an interview with Dr. Professor Ault, who worked on the Socratic chat use case in his pilot class and then expanded out from there. He talks a little bit about what incentivized them and how they got it into that pilot class. So check that out if you're interested.

We also have an interactive demo if you have not been to our EDU booth in the Industry Pavilion. It's booth 111, so you can actually click through and see a live demo of that Socratic chat application along with some quiz generation capability too that's built right into their learning management system. So yeah, we will be around for maybe a few minutes after this session. We can't take questions here, but we'll be in the hallway there if you want to ask any follow-ups from Marty or Furman. And remember to also complete the survey in the mobile app once you have a chance. Thank you.

; This article is entirely auto-generated using Amazon Bedrock.

DEV Community