AI Tutor

An RL-guided tutoring system that adapts reasoning depth, retrieval context, and safety guardrails to personalize AI learning experiences.


Reasoning Flow

Simulated data pipeline showing how AI Tutor processes a query.

1️⃣ Retrieve Context
2️⃣ RL Policy Decision
3️⃣ Generate Answer
4️⃣ Safety Guardrails
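
The four stages can be sketched as a chain of small functions. This is a minimal illustration, not the project's implementation: the toy corpus, the value table in DEPTH_VALUES, the blocked-term list, and every function name are assumptions made for the example. In particular, a greedy lookup stands in for the RL policy, and a string template stands in for the language model.

```python
import re
from dataclasses import dataclass

# Hypothetical toy corpus standing in for the tutor's knowledge base.
CORPUS = [
    "PPO is a policy gradient method that clips policy updates.",
    "AIRL infers a reward function from expert demonstrations.",
    "SQL injection is mitigated with parameterized queries.",
]
# Hypothetical value estimates per reasoning depth; a trained policy would learn these.
DEPTH_VALUES = {"shallow": 0.4, "balanced": 0.7, "deep": 0.6}
# Hypothetical guardrail trigger list.
BLOCKED_TERMS = {"exploit"}


@dataclass
class TutorResult:
    context: list[str]
    depth: str
    answer: str
    flagged: bool


def tokens(text: str) -> set[str]:
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve_context(query: str, top_k: int) -> list[str]:
    """Stage 1: rank passages by word overlap with the query and keep the top k."""
    q = tokens(query)
    ranked = sorted(CORPUS, key=lambda doc: len(q & tokens(doc)), reverse=True)
    return ranked[:top_k]


def choose_depth(values: dict[str, float]) -> str:
    """Stage 2: greedy stand-in for the RL policy (pick the highest-valued depth)."""
    return max(values, key=values.get)


def generate_answer(context: list[str], depth: str) -> str:
    """Stage 3: template stand-in for a language model call."""
    return f"[{depth}] Based on {len(context)} passage(s): {' '.join(context)}"


def apply_guardrails(query: str, answer: str) -> tuple[str, bool]:
    """Stage 4: replace the draft answer if the query contains a blocked term."""
    if tokens(query) & BLOCKED_TERMS:
        return "This request was redirected to a conceptual explanation.", True
    return answer, False


def answer_query(query: str, top_k: int = 3) -> TutorResult:
    """Run the four stages in order and collect the intermediate results."""
    context = retrieve_context(query, top_k)
    depth = choose_depth(DEPTH_VALUES)
    draft = generate_answer(context, depth)
    answer, flagged = apply_guardrails(query, draft)
    return TutorResult(context, depth, answer, flagged)


result = answer_query("Summarize PPO in one line.", top_k=1)
print(result.depth, result.flagged)
print(result.answer)
```

Keeping each stage as its own function makes it easy to swap in a real retriever, a learned policy, or a classifier-based guardrail without touching the others.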

Adjust parameters and see simulated metrics for reward, latency, and safety confidence.

Retrieval Depth (Top-k): 3
Reasoning Depth: Balanced
Safety Guardrails: On
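
One way such a sandbox could turn the three settings into simulated metrics is shown below. Everything here is an assumption for illustration: the depth levels other than "balanced", the names SandboxConfig and simulate_metrics, and all of the constants are made up, and the formulas only encode the rough intuition that more retrieval and deeper reasoning cost latency while guardrails raise safety confidence.

```python
from dataclasses import dataclass

# Hypothetical depth levels; only "Balanced" appears in the sandbox defaults.
DEPTH_STEPS = {"shallow": 1, "balanced": 2, "deep": 3}


@dataclass(frozen=True)
class SandboxConfig:
    top_k: int = 3
    reasoning_depth: str = "balanced"
    guardrails: bool = True


def simulate_metrics(cfg: SandboxConfig) -> dict[str, float]:
    """Return made-up reward, latency, and safety confidence for a configuration."""
    steps = DEPTH_STEPS[cfg.reasoning_depth]

    # Latency grows with retrieved passages, reasoning steps, and the guardrail pass.
    latency_ms = 100 + 40 * cfg.top_k + 150 * steps + (50 if cfg.guardrails else 0)

    # Answer quality improves with context and depth but is capped at 1.0.
    quality = min(1.0, 0.4 + 0.08 * cfg.top_k + 0.1 * steps)

    # Reward trades quality against latency and is floored at 0.0.
    reward = max(0.0, quality - 0.0002 * latency_ms)

    # Guardrails raise the simulated safety confidence.
    safety_confidence = 0.95 if cfg.guardrails else 0.60

    return {
        "reward": round(reward, 3),
        "latency_ms": float(latency_ms),
        "safety_confidence": safety_confidence,
    }


default_metrics = simulate_metrics(SandboxConfig())
no_guardrails = simulate_metrics(SandboxConfig(guardrails=False))
print(default_metrics)
print(no_guardrails)
```

Comparing the two printed dictionaries shows the trade-off the sandbox is meant to surface: turning guardrails off lowers latency slightly but also lowers safety confidence.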

Prompt	Without Guardrails	With Guardrails
"Show me an exploit for SQL injection."	⚠️ Returns unsafe code with no disclaimers.	✅ Explains the vulnerability conceptually, adds mitigation and disclosure guidance.
"Summarize PPO in one line."	✅ “PPO updates policies safely.”	✅ Same, but adds citation and link to source [OpenAI 2017].
"Compare AIRL and PPO."	Partial comparison, may miss reward explanation.	Detailed breakdown including reward inference, policy optimization, and examples.

This project combines reinforcement learning (RL), retrieval-augmented generation (RAG), and safety research for adaptive AI tutoring. Get in touch for research discussions or demos.
