Bug Description
Every API request (`/ask`, `/ask-llm`, `/ask-llm-prompted`) produces this warning in the server logs:

```
Both `max_new_tokens` (=1024) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information.
```
Environment
- transformers: 5.3.0
- torch: 2.10.0
- Python: 3.13
- OS: Windows 11
Root Cause Analysis
The current `requirements.txt` specifies:

```
transformers>=4.45.2
```
This allows transformers 5.x to be installed, which introduced
stricter generation config handling.
When transformers 5.x is installed, it detects two conflicting sources of generation config.

Source 1: the model's downloaded `generation_config.json`

```json
{
  "top_k": 20,
  "temperature": 0.7,
  "top_p": 0.8,
  "repetition_penalty": 1.1
}
```

Source 2: the pipeline parameters in `app/ai.py`

```python
self.model = pipeline(
    "text-generation",
    model=model,
    max_new_tokens=1024,
    truncation=True,
    ...
)
```

transformers 4.x silently merged these two configs without any warning. transformers 5.x is strict about the conflict and emits the warning on every single request.
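Untangling the conflict likely means picking a single source of truth for generation settings. A minimal sketch, assuming the app builds one explicit `GenerationConfig` (values mirrored from the JSON above; `do_sample=True` is an added assumption so the sampling parameters actually take effect) and clears the legacy `max_length` default that the warning complains about:

```python
from transformers import GenerationConfig

# One explicit config instead of mixing pipeline kwargs with the
# model's downloaded generation_config.json (sketch, not the repo's code).
gen_config = GenerationConfig(
    max_new_tokens=1024,      # replaces the pipeline kwarg
    do_sample=True,           # assumption: enable sampling so the params below apply
    temperature=0.7,          # mirrored from generation_config.json
    top_p=0.8,
    top_k=20,
    repetition_penalty=1.1,
)
# A fresh GenerationConfig carries a legacy max_length default of 20,
# which is exactly the second value in the warning; clear it so only
# max_new_tokens controls output length.
gen_config.max_length = None
```

The resulting object could then be passed as `generation_config=` to `model.generate()` (or forwarded through the pipeline's generate kwargs); how it wires into `app/ai.py` is left open here.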
Steps to Reproduce
- Clone the repo
- Install dependencies (transformers 5.3.0 gets installed)
- Run `uvicorn main:app --host 0.0.0.0 --port 8000 --reload`
- Call `POST /ask` with any question
- The warning appears twice in the server logs for every request
Proposed Fix
Pin transformers below 5.0.0 in `requirements.txt` until the codebase is updated to handle transformers 5.x config behavior:

```
# Before
transformers>=4.45.2

# After
transformers>=4.45.2,<5.0.0
```
This is a one-line change that prevents other contributors from
hitting the same issue while a proper transformers 5.x
compatibility update is planned.
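Alongside the pin, a startup check could fail fast if an unsupported major version slips into an environment anyway. A sketch under the proposed constraint; the helper name and its placement are hypothetical, and the parser assumes plain `X.Y.Z` version strings:

```python
def transformers_version_supported(ver: str) -> bool:
    """True if `ver` satisfies the proposed constraint >=4.45.2,<5.0.0.

    Simplified parser: expects at least three dot-separated numeric parts.
    """
    major, minor, patch = (int(p) for p in ver.split(".")[:3])
    return (major, minor, patch) >= (4, 45, 2) and major < 5
```

At startup, `importlib.metadata.version("transformers")` could supply the installed version string; the environment reported in this issue (5.3.0) would be rejected, while 4.45.2 passes.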
References
- HuggingFace transformers 5.x migration guide
- `app/ai.py` pipeline definitions (lines 59-80)
- `requirements.txt` line 1