DEV Community

Self-Distilled Agentic RL on AWS Series' Articles

Back to Shoaibali Mir's Series

May 31

Your RL Agent Failed a 12-Step Task. Which Step Was Wrong? (The Supervision Problem in Agentic RL)

#machinelearning #reinforcementlearning #llm #aws

5 min read

Jun 6

Four Models in One Training Loop: Architecting SDAR on AWS (Before Renting a Single GPU)

#aws #machinelearning #reinforcementlearning #mlops

5 min read

Jun 14

The Whole Paper Fits in One Sigmoid: Implementing the SDAR Gate

#machinelearning #reinforcementlearning #python #aws

5 min read

Jul 18

The ~+9.4% You Can't Afford to Verify: Evaluating SDAR (and the FinOps of Trying)

#aws #machinelearning #mlops #reinforcementlearning

6 min read