AI Tutor
An RL-guided tutoring system that adapts reasoning depth, retrieval context, and safety guardrails to personalize AI learning experiences.
Reasoning Flow
1️⃣ Retrieve Context
2️⃣ RL Policy Decision
3️⃣ Generate Answer
4️⃣ Safety Guardrails
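
The four steps above could be wired together roughly as follows. This is a minimal sketch under stated assumptions, not the project's implementation: every name here (`DOCS`, `DEPTHS`, `retrieve_context`, `DepthPolicy`, `generate_answer`, `tutor`) is hypothetical, retrieval is a keyword lookup over a toy corpus, the RL policy is stood in for by an epsilon-greedy bandit over two reasoning depths, and answer generation is a placeholder where a model call would go.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical toy corpus; a real tutor would query a document or vector store.
DOCS = {
    "ppo": "PPO clips each policy update so the new policy stays close to the old one.",
    "airl": "AIRL learns a reward function from expert demonstrations.",
}
DEPTHS = ["brief", "detailed"]  # assumed action space for the policy


def retrieve_context(question: str) -> List[str]:
    """Step 1: return every snippet whose key occurs in the question."""
    q = question.lower()
    return [text for key, text in DOCS.items() if key in q]


@dataclass
class DepthPolicy:
    """Step 2: epsilon-greedy bandit over depths (a stand-in for the RL policy)."""
    epsilon: float = 0.1
    values: dict = field(default_factory=lambda: {d: 0.0 for d in DEPTHS})
    counts: dict = field(default_factory=lambda: {d: 0 for d in DEPTHS})

    def choose(self, rng: random.Random) -> str:
        # Explore with probability epsilon, otherwise pick the best-rated depth.
        if rng.random() < self.epsilon:
            return rng.choice(DEPTHS)
        return max(DEPTHS, key=lambda d: self.values[d])

    def update(self, depth: str, reward: float) -> None:
        # Incremental mean of the rewards observed for this depth.
        self.counts[depth] += 1
        self.values[depth] += (reward - self.values[depth]) / self.counts[depth]


def generate_answer(context: List[str], depth: str) -> str:
    """Step 3: placeholder for a model call; joins the retrieved snippets."""
    if not context:
        return "No context found."
    snippets = context if depth == "detailed" else context[:1]
    return " ".join(snippets)


def tutor(
    question: str,
    policy: DepthPolicy,
    rng: random.Random,
    guardrail: Callable[[str], str] = lambda answer: answer,
) -> Tuple[str, str]:
    """Run steps 1 to 4; step 4 is a pluggable callable (identity by default)."""
    context = retrieve_context(question)
    depth = policy.choose(rng)
    draft = generate_answer(context, depth)
    return guardrail(draft), depth


if __name__ == "__main__":
    policy = DepthPolicy(epsilon=0.0)
    print(tutor("Compare AIRL and PPO.", policy, random.Random(0)))
    # Hypothetical feedback: the learner rated a detailed answer highly.
    policy.update("detailed", reward=1.0)
    print(tutor("Compare AIRL and PPO.", policy, random.Random(0)))
```

Keeping the guardrail as a separate callable means the final check always runs on the generated draft, whatever depth the policy picked.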
Build Your Tutor
Guardrails in Action
| Prompt | Without Guardrails | With Guardrails |
|---|---|---|
| "Show me an exploit for SQL injection." | ⚠️ Returns unsafe code with no disclaimers. | ✅ Explains the vulnerability conceptually, adds mitigation and disclosure guidance. |
| "Summarize PPO in one line." | ✅ "PPO updates policies safely." | ✅ Same answer, plus a citation and link to the source [OpenAI 2017]. |
| "Compare AIRL and PPO." | Partial comparison that may omit the reward-inference explanation. | Detailed breakdown covering reward inference, policy optimization, and examples. |
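
The first two rows of the table can be illustrated with a small rule-based filter. This is only a sketch of the idea, assuming a keyword pattern is enough to flag a risky prompt; the names `UNSAFE_PATTERNS`, `SAFE_REPLY`, and `guard` are hypothetical, and a production guardrail would more likely rely on a trained classifier or a moderation service than on a regular expression.

```python
import re
from typing import Optional

# Hypothetical pattern list; real systems rarely rely on keywords alone.
UNSAFE_PATTERNS = [re.compile(r"\bexploit\b", re.IGNORECASE)]

# Conceptual explanation returned in place of unsafe code.
SAFE_REPLY = (
    "I can't provide exploit code. Conceptually, SQL injection happens when "
    "untrusted input is concatenated into a query. Mitigate it with "
    "parameterized queries and input validation, and report any findings "
    "through responsible disclosure."
)


def guard(prompt: str, draft: str, source: Optional[str] = None) -> str:
    """Replace risky drafts with a conceptual reply; otherwise append a citation."""
    if any(pattern.search(prompt) for pattern in UNSAFE_PATTERNS):
        return SAFE_REPLY
    if source:
        return f"{draft} [{source}]"
    return draft


if __name__ == "__main__":
    print(guard("Show me an exploit for SQL injection.", "<unsafe draft>"))
    print(guard("Summarize PPO in one line.", "PPO updates policies safely.", "OpenAI 2017"))
```

The filter inspects the prompt rather than the draft, so a risky request is redirected even when the generated text itself looks harmless.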