feat: Implement Intent Routing for Smart RAG Bypassing #65
Sudhanshu-NITR wants to merge 3 commits into sugarlabs:main
Conversation
Prior to this fix, the application skipped quota incrementation for brand-new API key requests. This change initializes the usage dictionary entry for a new key and lets the request fall through to the normal count-increment logic.
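The quota fix described above can be sketched as follows. This is a hypothetical reconstruction: the dictionary name, limit, and function shape are assumptions, since the actual code in app/routes/api.py is not shown here.

```python
# Hypothetical sketch of the check_quota() fix: instead of returning early
# for a brand-new API key, initialize its entry and fall through to the
# shared increment logic so the first request is counted too.

DAILY_LIMIT = 100  # assumed daily limit
usage = {}         # assumed store: api_key -> {"count": int}

def check_quota(api_key):
    if api_key not in usage:
        # Before the fix: an early return here skipped the increment below,
        # so the first request for a new key was never counted.
        usage[api_key] = {"count": 0}
    entry = usage[api_key]
    if entry["count"] >= DAILY_LIMIT:
        return False
    entry["count"] += 1
    return True
```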
Implement an intent router in prompts.py that intercepts queries and replies with exactly 'TECHNICAL' for coding questions. Update RAGAgent.run in ai.py to check this intent prompt first, allowing non-technical questions (such as greetings) to bypass the RAG search pipeline.
Pull request overview
Adds intent-based routing so Sugar-AI can respond conversationally to greetings/off-topic inputs while preserving the existing RAG pipeline for technical questions, and fixes daily quota counting for first-time API keys.
Changes:
- Fix `check_quota()` so the first request for a new API key is counted toward the daily quota.
- Introduce `INTENT_ROUTER_PROMPT`, used to classify inputs as technical vs. friendly/off-topic.
- Update `RAGAgent.run()` to perform an intent check via chat completions before running the RAG chain.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| app/routes/api.py | Fixes quota accounting by ensuring new API keys still go through the normal count increment logic. |
| app/prompts.py | Adds a new prompt used to route between “technical RAG” vs “friendly reply” behavior. |
| app/ai.py | Adds an intent-check step to run() using run_chat_completion() before executing the RAG pipeline. |
```python
# Check intent first using proper chat formatting
messages = [
    {"role": "system", "content": prompts.INTENT_ROUTER_PROMPT},
    {"role": "user", "content": question}
]
```
Intent routing calls run_chat_completion with the default generation settings (temperature=0.7, max_length=1024). For a classifier-style prompt that must output exactly "TECHNICAL" or a short reply, sampling + large max_length can make the output non-deterministic and unnecessarily slow/costly. Consider forcing deterministic settings (e.g., temperature=0 / do_sample=False) and a very small token/length budget for this intent check.
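The reviewer's suggestion could be applied roughly as follows. This is a hypothetical sketch: it assumes `run_chat_completion()` can accept generation-setting overrides as keyword arguments, which the real signature in app/ai.py may not support.

```python
# Hypothetical: deterministic, low-budget generation settings for the
# intent check. A classifier that must emit exactly "TECHNICAL" (or a
# short reply) should not use sampling or a large length budget.

INTENT_GEN_KWARGS = {
    "do_sample": False,   # greedy decoding -> deterministic output
    "temperature": 0.0,   # explicit, even though do_sample=False already disables sampling
    "max_new_tokens": 16, # the intent label is only a few tokens
}

def classify_intent(run_chat_completion, messages):
    """Run the intent router with deterministic, cheap settings (assumed API)."""
    raw = run_chat_completion(messages, **INTENT_GEN_KWARGS)
    return str(raw).strip().upper()
```

Normalizing with `.strip().upper()` also makes the `"TECHNICAL"` comparison robust to whitespace or casing drift in the model output.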
```python
intent_response = self.run_chat_completion(messages)
intent_text = str(intent_response).strip()
```
By making RAGAgent.run() always call run_chat_completion(), the /ask path now depends on the loaded model's tokenizer having a valid chat template. If the model is changed (e.g., via /change-model) to one without a chat template, apply_chat_template will raise and break all /ask requests. Consider adding a fallback path for intent routing when the tokenizer lacks a chat template (or skipping intent routing / using a plain prompt in that case) so the endpoint degrades gracefully.
```diff
-intent_response = self.run_chat_completion(messages)
-intent_text = str(intent_response).strip()
+try:
+    intent_response = self.run_chat_completion(messages)
+    intent_text = str(intent_response).strip()
+except Exception:
+    # Fallback: if chat-style intent routing fails (e.g., no chat template),
+    # treat the question as technical so we still run the RAG pipeline.
+    intent_text = "TECHNICAL_FALLBACK"
```
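Beyond catching the exception, the reviewer's point about degrading gracefully could also be addressed by checking for a chat template up front. The sketch below is hypothetical: the `chat_template` attribute follows the Hugging Face `transformers` tokenizer convention, and the plain-prompt fallback assumes `run_chat_completion` (or a sibling helper) can accept a raw string, which may not match the real app/ai.py API.

```python
# Hypothetical: route intent via chat messages when the tokenizer supports
# them, otherwise fall back to a plain concatenated prompt instead of letting
# apply_chat_template raise and break every /ask request.

def route_intent(tokenizer, run_chat_completion, system_prompt, question):
    if getattr(tokenizer, "chat_template", None):
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ]
        return str(run_chat_completion(messages)).strip()
    # No chat template: build a single plain-text prompt (assumed supported).
    plain_prompt = f"{system_prompt}\n\nUser: {question}"
    return str(run_chat_completion(plain_prompt)).strip()
```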
Description
This PR introduces an intent routing system designed to differentiate between technical coding questions and conversational/off-topic interactions (such as greetings).
Previously, every query was sent through the RAG pipeline and evaluated against the internal context. Now, we perform an intent check before triggering RAG, allowing Sugar-AI to respond appropriately to greetings in a kid-friendly manner without initiating unnecessary document retrieval.
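The overall flow described above can be sketched as below. The prompt wording and the `run_chat_completion`/`run_rag_chain` callables are stand-ins for the real implementations in app/prompts.py and app/ai.py.

```python
# Minimal sketch of the intent-routing flow (names are illustrative).

INTENT_ROUTER_PROMPT = (
    "If the user's message is a technical coding question, reply with exactly "
    "'TECHNICAL'. Otherwise, reply with a short, kid-friendly message."
)

def answer(question, run_chat_completion, run_rag_chain):
    messages = [
        {"role": "system", "content": INTENT_ROUTER_PROMPT},
        {"role": "user", "content": question},
    ]
    intent_text = str(run_chat_completion(messages)).strip()
    if intent_text == "TECHNICAL":
        # Technical question: retrieve documents and answer via the RAG chain.
        return run_rag_chain(question)
    # Greeting / off-topic: return the friendly reply directly, skipping RAG.
    return intent_text
```

Note that in the non-technical branch the router's own reply doubles as the final answer, so no second model call is needed.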
Why does this matter?
Screenshots / Testing