feat: Implement Intent Routing for Smart RAG Bypassing #65

Open
Sudhanshu-NITR wants to merge 3 commits into sugarlabs:main from Sudhanshu-NITR:feat/intent-routing

Conversation

@Sudhanshu-NITR

Description

This PR introduces an intent routing system designed to differentiate between technical coding questions and conversational/off-topic interactions (such as greetings).

Previously, every query was sent through the RAG pipeline and evaluated against the internal context. Now, an intent check runs before RAG is triggered, so Sugar-AI can respond to greetings in a kid-friendly manner without initiating unnecessary document retrieval.
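
The flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `route`, `fake_classify`, and `fake_rag` are hypothetical names, the classifier is a stub standing in for the LLM call, and the exact-match check on `TECHNICAL` is an assumption based on the description.

```python
from typing import Callable

TECHNICAL = "TECHNICAL"  # sentinel the router prompt is described as returning


def route(question: str,
          classify: Callable[[str], str],
          run_rag: Callable[[str], str]) -> str:
    """Run RAG only when the classifier marks the question as technical."""
    reply = classify(question).strip()
    if reply == TECHNICAL:
        # Technical question: fall through to the (expensive) RAG pipeline.
        return run_rag(question)
    # Anything else is treated as the friendly reply and returned directly.
    return reply


def fake_classify(question: str) -> str:
    """Hypothetical stand-in for the LLM intent check."""
    greetings = {"hi", "hello", "hey"}
    if question.strip().lower().rstrip("!.") in greetings:
        return "Hello! What would you like to build today?"
    return TECHNICAL


def fake_rag(question: str) -> str:
    """Hypothetical stand-in for the retrieval + generation chain."""
    return f"[RAG answer for: {question}]"
```

With these stubs, `route("Hi", fake_classify, fake_rag)` returns the greeting without touching `fake_rag`, while a coding question is forwarded to it.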

Why does this matter?

  • Better User Experience: Provides natural and engaging responses when users say "Hi" or ask simple, non-technical questions.
  • Efficiency & Speed: Reduces overhead by avoiding expensive vector search and context-matching for greetings or off-topic questions.
  • Accuracy: Focuses the RAG capabilities purely on technical problems, reducing confusing outputs caused by retrieving irrelevant context for basic greetings.

Screenshots / Testing

(two screenshots)

Prior to this fix, the application bypassed quota incrementation
for brand new API key requests. This assigns the dictionary and
allows it to fall through to the proper count increment logic.

Implement an intent router in prompts.py that intercepts queries and
replies exactly with 'TECHNICAL' for coding questions.
Update RAGAgent.run in ai.py to check this intent prompt first,
allowing non-technical questions (like greetings) to bypass the
RAG search pipeline.
Copilot AI review requested due to automatic review settings March 10, 2026 21:17

Copilot AI left a comment

Pull request overview

Adds intent-based routing so Sugar-AI can respond conversationally to greetings/off-topic inputs while preserving the existing RAG pipeline for technical questions, and fixes daily quota counting for first-time API keys.

Changes:

  • Fix check_quota() so the first request for a new API key is counted toward the daily quota.
  • Introduce INTENT_ROUTER_PROMPT used to classify inputs as technical vs. friendly/off-topic.
  • Update RAGAgent.run() to perform an intent check via chat-completions before running the RAG chain.
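
The quota fix in the first bullet might look something like the sketch below. The PR's actual `check_quota()` is not shown here, so the in-memory store, its record layout, and `DAILY_LIMIT` are assumptions made for illustration; the point is only that a new key gets its record created and then falls through to the normal increment instead of returning early.

```python
from datetime import date
from typing import Dict, Optional

DAILY_LIMIT = 3  # hypothetical limit, chosen only for the example

# Hypothetical in-memory store: api_key -> {"date": date, "count": int}
quota_store: Dict[str, dict] = {}


def check_quota(api_key: str, today: Optional[date] = None) -> bool:
    """Return True and count the request if the key is under its daily limit."""
    today = today or date.today()
    entry = quota_store.get(api_key)
    if entry is None or entry["date"] != today:
        # New key (or a new day): create the record, but do NOT return early,
        # so this request is still counted below.
        entry = {"date": today, "count": 0}
        quota_store[api_key] = entry
    if entry["count"] >= DAILY_LIMIT:
        return False
    entry["count"] += 1
    return True
```

After the first call for an unseen key, the stored count is 1 rather than 0, which is the behavior the fix is described as restoring.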

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File | Description
app/routes/api.py | Fixes quota accounting by ensuring new API keys still go through the normal count increment logic.
app/prompts.py | Adds a new prompt used to route between “technical RAG” vs “friendly reply” behavior.
app/ai.py | Adds an intent-check step to run() using run_chat_completion() before executing the RAG pipeline.

Comment on lines +180 to +184
# Check intent first using proper chat formatting
messages = [
    {"role": "system", "content": prompts.INTENT_ROUTER_PROMPT},
    {"role": "user", "content": question}
]

Copilot AI Mar 10, 2026

Intent routing calls run_chat_completion with the default generation settings (temperature=0.7, max_length=1024). For a classifier-style prompt that must output exactly "TECHNICAL" or a short reply, sampling + large max_length can make the output non-deterministic and unnecessarily slow/costly. Consider forcing deterministic settings (e.g., temperature=0 / do_sample=False) and a very small token/length budget for this intent check.
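
One way to express the deterministic settings suggested above is sketched here. It only builds a dictionary of generation options; whether `run_chat_completion` accepts overrides like these is not shown in this PR, so the helper name, the `DEFAULT_GENERATION` layout, and the 64-token budget are all assumptions. The budget is kept larger than a single word because the same call is described as also producing the short friendly reply.

```python
# Defaults as described in the review comment; the dict layout is hypothetical.
DEFAULT_GENERATION = {"do_sample": True, "temperature": 0.7, "max_length": 1024}


def intent_generation_settings(defaults: dict, max_new_tokens: int = 64) -> dict:
    """Derive greedy, short-output settings for the intent check."""
    settings = dict(defaults)          # copy so the shared defaults stay untouched
    settings.pop("temperature", None)  # temperature only affects sampling
    settings.pop("max_length", None)   # replaced by an explicit new-token budget
    settings["do_sample"] = False      # greedy decoding: same input, same output
    settings["max_new_tokens"] = max_new_tokens
    return settings
```

The option names `do_sample` and `max_new_tokens` follow the Hugging Face generation parameters the comment alludes to; a different backend would need its own equivalents.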

Comment on lines +185 to +188

intent_response = self.run_chat_completion(messages)
intent_text = str(intent_response).strip()


Copilot AI Mar 10, 2026

By making RAGAgent.run() always call run_chat_completion(), the /ask path now depends on the loaded model's tokenizer having a valid chat template. If the model is changed (e.g., via /change-model) to one without a chat template, apply_chat_template will raise and break all /ask requests. Consider adding a fallback path for intent routing when the tokenizer lacks a chat template (or skipping intent routing / using a plain prompt in that case) so the endpoint degrades gracefully.

Suggested change
- intent_response = self.run_chat_completion(messages)
- intent_text = str(intent_response).strip()
+ try:
+     intent_response = self.run_chat_completion(messages)
+     intent_text = str(intent_response).strip()
+ except Exception:
+     # Fallback: if chat-style intent routing fails (e.g., no chat template),
+     # treat the question as technical so we still run the RAG pipeline.
+     intent_text = "TECHNICAL_FALLBACK"
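
The comment above also mentions using a plain prompt when the tokenizer lacks a chat template. A rough sketch of that alternative follows, assuming a Hugging Face-style tokenizer that exposes a `chat_template` attribute; `has_chat_template`, `build_intent_input`, and the plain-prompt layout are hypothetical and not part of this PR.

```python
from typing import List, Union


def has_chat_template(tokenizer) -> bool:
    """True if the tokenizer carries a non-empty chat template."""
    return bool(getattr(tokenizer, "chat_template", None))


def build_intent_input(tokenizer, system_prompt: str,
                       question: str) -> Union[List[dict], str]:
    """Return chat messages when a template exists, else a plain prompt string."""
    if has_chat_template(tokenizer):
        return [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ]
    # No chat template: flatten the same content into a single prompt so the
    # intent check can still run through a plain text-generation path.
    return f"{system_prompt}\n\nUser: {question}\nAssistant:"
```

The caller would then pick the chat or plain generation path based on the return type, so a model swap via /change-model degrades to a plain prompt rather than raising.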

2 participants