<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: M.Shahmir Khan Afridi</title>
    <description>The latest articles on DEV Community by M.Shahmir Khan Afridi (@mshahmir_khanafridi_b91).</description>
    <link>https://dev.to/mshahmir_khanafridi_b91</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3806851%2F33949cb5-fef9-4079-83a1-688469899f1a.png</url>
      <title>DEV Community: M.Shahmir Khan Afridi</title>
      <link>https://dev.to/mshahmir_khanafridi_b91</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mshahmir_khanafridi_b91"/>
    <language>en</language>
    <item>
      <title>Agentic AI and Search Agents</title>
      <dc:creator>M.Shahmir Khan Afridi</dc:creator>
      <pubDate>Tue, 10 Mar 2026 16:36:39 +0000</pubDate>
      <link>https://dev.to/mshahmir_khanafridi_b91/agentic-ai-and-search-agents-2da4</link>
      <guid>https://dev.to/mshahmir_khanafridi_b91/agentic-ai-and-search-agents-2da4</guid>
      <description>&lt;p&gt;— okay this was actually kind of  intriguing &lt;/p&gt;

&lt;p&gt;so I have been putting this off for like three days. eventually sat down with both papers last night and I am going to be honest the first one (the MDPI one) took me forever to get through because the intro is just dense. like they define the same thing four different ways before actually moving on. but once I got past that it was fine. &lt;/p&gt;

&lt;p&gt;I used NotebookLM to generate a summary before reading, which I actually do for most heavy papers now. the summary was decent but I will talk about that more at the end because I actually have thoughts on it. &lt;/p&gt;

&lt;p&gt;anyway. agentic AI. &lt;/p&gt;




&lt;p&gt;the basic idea and why it's different from normal AI &lt;/p&gt;

&lt;p&gt;okay so in our AI class we have talked a lot about agents — how they perceive things, make decisions, take actions. the whole sense-think-act thing. what these papers are describing is basically that but taken way further than the textbook version. &lt;/p&gt;

&lt;p&gt;normal LLMs (like the ones we have discussed in class) are reactive. you put something in, something comes out, done. agentic AI is different because the system can actually pursue a goal over multiple steps without you holding its hand the whole time. it sets sub-goals on its own, picks tools to use, executes actions, checks the results, adjusts. the whole thing runs with minimal human involvement. &lt;/p&gt;

&lt;p&gt;which sounds simple when I write it like that but the actual implementation is way more complicated and that is kind of what both papers are about. &lt;/p&gt;

&lt;p&gt;the MDPI paper is more of a big picture review — definitions, frameworks, architectures, applications, challenges. the arXiv one (2508.05668) is more focused specifically on search agents and gets into training methodology and benchmarks. they overlap a lot but they are doing different things and I think my initial LLM summary kind of missed that distinction, which I will get to. &lt;/p&gt;




&lt;p&gt;architectures — the part I actually found interesting &lt;/p&gt;

&lt;p&gt;the MDPI paper goes through several architectural models and some of them connected really well to stuff we have covered. &lt;/p&gt;

&lt;p&gt;the ReAct model is basically a loop. reason, act, observe what happened, reason again. it's fast and works well for simpler tasks. reminded me of the basic agent cycle from class actually, like it's the most direct implementation of sense-think-act you can get. &lt;/p&gt;
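&lt;p&gt;just to pin the idea down for myself — here's a toy version of that loop in Python. the calculator "tool" and the decision rule are completely made up by me, not from either paper: &lt;/p&gt;

```python
# toy ReAct-style loop: reason about what to do, act, observe, repeat.
# the calculator "tool" and the stopping rule are my own stand-ins.

def calculator(expr):
    # stand-in tool: evaluate a small arithmetic expression
    return eval(expr)

def react_loop(goal, max_steps=5):
    observations = []
    for _ in range(max_steps):
        # "reason": with no observations yet, try the tool; otherwise stop
        if observations:
            return observations[-1]
        # "act" + "observe"
        observations.append(calculator(goal))
    return observations[-1]

print(react_loop("2 + 3 * 4"))  # prints 14
```

&lt;p&gt;obviously a real agent would have an LLM doing the "reason" step and a whole toolbox instead of one function, but the loop shape is the same. &lt;/p&gt;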

&lt;p&gt;then there is the Hierarchical / Supervisor model which is more interesting to me. you have a main agent at the top that breaks a problem into pieces and hands them off to specialized sub-agents below. so like if the task is "write a market research report" the supervisor does not do all of it — it delegates the web searching to one agent, the data analysis to another, the writing to another. this maps onto multi-agent systems which we touched on in class and makes a lot more sense to me now as an actual practical thing rather than just a concept. &lt;/p&gt;
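&lt;p&gt;here's the shape of that delegation as I understand it — the sub-agents are just functions in this sketch, and all the names are mine: &lt;/p&gt;

```python
# toy supervisor pattern: the top agent routes subtasks to specialized
# sub-agents and composes their outputs. all names are my invention.

def search_agent(topic):
    return "findings about " + topic

def analysis_agent(findings):
    return "analysis of " + findings

def writer_agent(analysis):
    return "report based on " + analysis

def supervisor(task):
    # the supervisor never does the work itself, it only delegates
    findings = search_agent(task)
    analysis = analysis_agent(findings)
    return writer_agent(analysis)

print(supervisor("market research"))
# prints: report based on analysis of findings about market research
```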

&lt;p&gt;the BDI architecture was the one I kept coming back to. Belief Desire Intention. we mentioned this in class briefly and I actually did not completely get it at the time but reading it in context helped. the point is that you can look at the agent's beliefs (what it thinks is true about the world), its desires (what it wants to achieve), and its intentions (what it's committed to doing right now) and actually trace why it made a decision. which is a big deal for accountability and transparency — two things we have talked about a lot in the AI ethics portions of the course. like yes the agent is autonomous but at least you can audit it. &lt;/p&gt;
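&lt;p&gt;to make the traceability point concrete, here's a minimal toy BDI state — my own sketch, with a made-up deliberation rule: &lt;/p&gt;

```python
# toy BDI agent: the decision is auditable because you can read off the
# beliefs, desires, and intention that produced it. my own toy example.
from dataclasses import dataclass, field

@dataclass
class BDIAgent:
    beliefs: dict = field(default_factory=dict)     # what it thinks is true
    desires: list = field(default_factory=list)     # what it wants to achieve
    intentions: list = field(default_factory=list)  # what it's committed to

    def deliberate(self):
        # commit to the first desire whose precondition is believed true
        for goal, precondition in self.desires:
            if self.beliefs.get(precondition):
                self.intentions.append(goal)
                return goal
        return None

agent = BDIAgent(beliefs={"library_open": True},
                 desires=[("go_to_gym", "gym_open"),
                          ("study_at_library", "library_open")])
print(agent.deliberate())  # study_at_library — traceable to a belief
```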

&lt;p&gt;there's also the Neuro-Symbolic design but I will be honest that section was harder for me to completely grasp. the basic idea is combining neural networks (good at perception and pattern matching, bad at being explainable) with symbolic logic (good at structured reasoning, traceable). the paper argues you need both. which connects to something we discussed earlier in the semester about whether deep learning alone is sufficient or whether you need symbolic components on top — and the answer these papers seem to give is basically "you need both and here is how to combine them." &lt;/p&gt;




&lt;p&gt;search agents specifically — this is where the second paper comes in &lt;/p&gt;

&lt;p&gt;okay so the arXiv paper is specifically about what they call "deep search agents" which are a specialized type of agentic system focused on information retrieval. and not just like, googling something — these agents control the entire retrieval process. web search, private databases, internal memory, all of it. they decide what to search for, read the results, decide what to search for next based on what they found, and keep going until they have actually answered the question properly. &lt;/p&gt;

&lt;p&gt;the paper breaks down search into three structures and this part connected really directly to information retrieval concepts from class: &lt;/p&gt;

&lt;p&gt;parallel search — you decompose the query into multiple sub-queries and run them all at the same time. good for breadth and efficiency. &lt;/p&gt;

&lt;p&gt;sequential / iterative search — you run a loop. search, read what you got, reflect, decide what to search next. the next search depends on what the previous one returned. this is actually how I research things when I actually care about the answer. &lt;/p&gt;

&lt;p&gt;hybrid tree or graph-based — this one is the most complex. the agent can explore multiple search paths, backtrack if something is not working, and revise its whole strategy mid-task. backtracking is something we have seen in AI search algorithms (like in the search and problem solving unit) and seeing it applied to information retrieval made it click for me as a natural extension of the same idea. &lt;/p&gt;
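&lt;p&gt;here's my own toy version of the sequential-plus-backtracking shape — the fake link "corpus" is invented, this is just the control flow: &lt;/p&gt;

```python
# toy search with backtracking: follow a lead, and if it dead-ends,
# back up and try the next one. the fake corpus is my invention.

CORPUS = {
    "agentic AI": ["robotics", "search agents"],  # robotics is a dead end
    "robotics": [],
    "search agents": ["RLVR"],
    "RLVR": ["ANSWER"],
}

def deep_search(query, path=None):
    path = list(path or []) + [query]
    for lead in CORPUS.get(query, []):
        if lead == "ANSWER":
            return path + ["ANSWER"]
        found = deep_search(lead, path)  # follow the lead deeper
        if found:
            return found
        # lead dead-ended: backtrack and try the next one
    return None

print(deep_search("agentic AI"))
# ['agentic AI', 'search agents', 'RLVR', 'ANSWER']  (robotics abandoned)
```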




&lt;p&gt;training and optimization — the technical stuff &lt;/p&gt;

&lt;p&gt;this is the section I had to read twice. the arXiv paper goes into methodology properly which I appreciated even though it was dense. &lt;/p&gt;

&lt;p&gt;the basic training pipeline is Supervised Fine-Tuning first — you show the model examples of good reasoning and search loops so it learns what "doing it right" looks like. then Reinforcement Learning on top of that, specifically something called RLVR (reinforcement learning with verifiable rewards) where the agent gets reward signals based on whether its outputs are actually correct and verifiable rather than just plausible-sounding. &lt;/p&gt;

&lt;p&gt;the reward functions are multi-objective which means they are balancing multiple things at once — answer correctness, how efficient the retrieval was, quality of evidence, and penalties for redundant searches or outputs that are longer than they need to be. that last one is interesting to me because it means the system is being trained to be concise which isn't something I expected. &lt;/p&gt;
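&lt;p&gt;the shape of that reward as I understand it — the weights and numbers here are mine, not the paper's: &lt;/p&gt;

```python
# toy multi-objective reward mirroring the shape described: reward
# correctness, penalize extra searches and padded answers. the weights
# are numbers I made up purely for illustration.

def reward(correct, num_searches, answer_tokens,
           w_search=0.05, w_length=0.001):
    r = 1.0 if correct else 0.0        # answer correctness dominates
    r -= w_search * num_searches       # penalty for redundant searches
    r -= w_length * answer_tokens      # penalty for unnecessary length
    return round(r, 4)

# a correct but rambling run scores below a correct, concise one
print(reward(True, num_searches=3, answer_tokens=400))  # 0.45
print(reward(True, num_searches=2, answer_tokens=80))   # 0.82
```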

&lt;p&gt;there's also this concept called scaling up test-time search — giving the model more computational resources at inference time (when it's actually being used) rather than just at training time. the argument is that more thinking time at the moment of use can improve reasoning quality. this feels counterintuitive coming from the standard "bigger training = better model" assumption but the paper makes a reasonable case for it. &lt;/p&gt;
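&lt;p&gt;a cheap way to see why it can work is best-of-n sampling with a verifier — everything in this sketch (the fake candidates, the scoring rule) is my own stand-in: &lt;/p&gt;

```python
# toy test-time scaling: sample n candidate "reasoning paths" and keep
# the best one under a verifier score. all stand-ins, my own example.
import random

def verifier(x):
    return -abs(x - 1.0)    # stand-in score: closer to 1.0 is better

def best_of_n(n, seed=0):
    rng = random.Random(seed)
    candidates = [rng.gauss(0, 1) for _ in range(n)]  # fake sampled answers
    return max(candidates, key=verifier)

# with a fixed seed, the 32-sample pool contains the 1-sample pool,
# so spending more inference-time compute can only match or beat less
print(verifier(best_of_n(32)) >= verifier(best_of_n(1)))  # True
```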

&lt;p&gt;benchmarks used include FRAMES, GAIA, and HotpotQA. accuracy in specialized settings reportedly goes above 94% which sounds great but the paper is pretty honest that this does not always transfer to messier real world environments. &lt;/p&gt;




&lt;p&gt;applications &lt;/p&gt;

&lt;p&gt;healthcare, finance, legal research, automated reporting — the applications section is broad and honestly I skimmed parts of it because it started to feel like a list. the one that stood out to me was Deep Research where an agent runs an extended autonomous research session across multiple sources and produces a full report at the end. I have actually used this feature in a couple of AI tools before and understanding the architecture underneath it now (the iterative search loops, the tool selection, the reward shaping during training) makes it feel less like a black box. &lt;/p&gt;

&lt;p&gt;there's also this thing where agents use search to improve their own internal capabilities — navigating memory, selecting the right tools, retrieving past experiences to reason better in the future. it's kind of recursive in a way that I am not sure I completely understand yet but the concept is interesting. &lt;/p&gt;




&lt;p&gt;challenges and limitations &lt;/p&gt;

&lt;p&gt;both papers devote decent space to what is still broken and I think this section is actually important to not skip. &lt;/p&gt;

&lt;p&gt;the brittleness problem — these agents perform well in controlled settings and then degrade when something unexpected happens. this is a real deployment problem. like you can not put a system in a hospital or a bank that works 94% of the time in the lab and then falls apart when the data looks slightly different from what it was trained on. &lt;/p&gt;

&lt;p&gt;accountability is still genuinely unresolved. if an autonomous agent makes a bad decision — wrong medical recommendation, fraudulent financial action, whatever — who's responsible? the developer? the company deploying it? the user who set the goal? the paper raises this and does not completely answer it because the field hasn't. this connects to the AI ethics stuff we have covered and is honestly one of those questions that makes me think the technical progress is ahead of the governance progress by a significant margin. &lt;/p&gt;

&lt;p&gt;security — adversarial attacks, data poisoning, unverifiable tool use. these are real vulnerabilities that get more serious as agents get more autonomous. not theoretical. &lt;/p&gt;




&lt;p&gt;NotebookLM experience — honest reflection &lt;/p&gt;

&lt;p&gt;okay so I loaded both papers into NotebookLM and had it generate a summary before I read them properly. and the summary was fine. accurate at a surface level, got the main concepts right, named the architectures. &lt;/p&gt;

&lt;p&gt;but here is the thing I actually noticed. the summary treated both papers as basically the same thing — like one unified document about agentic AI. and they are not. the MDPI paper is doing broad taxonomic review work, the arXiv paper is doing focused technical survey work on a specific subtype of system. that distinction matters for understanding what each paper is actually contributing. the LLM summary flattened it. &lt;/p&gt;

&lt;p&gt;the other thing is that summaries give everything roughly equal weight. the training methodology section in the arXiv paper — the RLVR approach, the test-time compute scaling, the multi-objective reward functions — got summarized in like two sentences. but that section is actually one of the more technically significant parts of the paper for understanding why these systems perform the way they do. you'd never know that from the summary. &lt;/p&gt;

&lt;p&gt;so my honest take is that using the LLM summary as a starting point was useful for orientation — I knew what topics were coming before I hit them in the actual text. but it was not a replacement for reading because it could not tell me which parts actually mattered versus which parts were background context. that judgment still required going back to the source. the map isn't the territory and all that. &lt;/p&gt;




&lt;p&gt;final thoughts &lt;/p&gt;

&lt;p&gt;I think the thing that stuck with me from both papers is that the gap between "AI that answers questions" and "AI that pursues goals" is bigger than it sounds. technically, conceptually, and in terms of what it means for accountability and safety. the course material gave me the vocabulary to actually engage with what these papers are describing — agent architectures, search structures, multi-agent collaboration, the ethics of autonomous systems — which made reading them feel less like decoding and more like connecting dots. &lt;/p&gt;

&lt;p&gt;still think the accountability question is going to be the hardest one to solve. the technical problems feel solvable with more research. the governance problems feel like they require something more than research. &lt;/p&gt;

&lt;p&gt;anyway that is probably enough. it's late. &lt;/p&gt;




&lt;p&gt;Mention: &lt;a class="mentioned-user" href="https://dev.to/raqeeb_26"&gt;@raqeeb_26&lt;/a&gt;  &lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>learning</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
