This project implements a Tool-Integrated Reasoning agent that can solve complex problems through multi-turn tool calls during mathematical reasoning. The agent uses tools such as Python code execution and mathematical calculations, and is trained end-to-end through reinforcement learning.
- Function: Core workflow that manages the multi-turn reasoning process
- Key Features:
  - Inherits from AReaL's `RolloutWorkflow` base class
  - Supports multi-turn tool-calling reasoning
  - Implements streaming generation and tool call detection
  - Integrates reward function calculation
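Tool call detection during streaming can be sketched as follows. This is a minimal illustration: the `<python>`/`<calculator>` markers are assumptions here; the actual markers are defined by the prompt template in `examples/tir/prompts.py`.

```python
import re

# Assumed tool-call markers; the real ones come from the prompt template.
TOOL_CALL_PATTERN = re.compile(r"<(python|calculator)>(.*?)</\1>", re.DOTALL)

def detect_tool_call(generated_text: str):
    """Return (tool_name, tool_input) for the first complete tool call,
    or None if generation has not yet produced a complete call."""
    match = TOOL_CALL_PATTERN.search(generated_text)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()

# A partial generation that ends with a complete calculator call:
text = "The sum is <calculator>1 + 2 * 3</calculator>"
print(detect_tool_call(text))  # ('calculator', '1 + 2 * 3')
```

In the workflow, generation pauses as soon as such a match appears; an incomplete marker (e.g. an unclosed tag) yields `None` and generation continues.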
- Function: Tool manager responsible for coordinating tool calls
- Supported Tools:
  - Python Executor: Executes Python code for mathematical calculations
  - Calculator: Basic mathematical operations
- Key Features:
  - Tool registration and routing mechanism
  - Secure code execution environment
  - Unified tool-calling interface
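The registration-and-routing mechanism can be sketched as follows. This is a minimal illustration; method names like `register` and `call` are assumptions, not the actual `tool_manager.py` API.

```python
class ToolManager:
    """Minimal sketch of a tool registry with name-based routing."""

    def __init__(self):
        self._tools = {}

    def register(self, name, tool):
        """Register a tool instance under a routing name."""
        self._tools[name] = tool

    def call(self, name, tool_input):
        """Route a parsed tool call to the registered tool."""
        if name not in self._tools:
            return f"Error: unknown tool '{name}'"
        return self._tools[name].execute(tool_input)


class EchoTool:
    """Stand-in tool used only for this illustration."""

    def execute(self, tool_input):
        return tool_input.upper()


manager = ToolManager()
manager.register("echo", EchoTool())
print(manager.call("echo", "hello"))   # HELLO
print(manager.call("missing", "1+1"))  # Error: unknown tool 'missing'
```

Returning an error string for an unknown tool (rather than raising) lets the model see the failure in context and recover on the next turn.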
- BaseTool (`tools/base.py`): Tool base class that defines tool interfaces
- PythonTool (`tools/python_tool.py`): Python code execution tool
- CalculatorTool (`tools/calculator_tool.py`): Mathematical calculation tool
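A sketch of how these classes might fit together, assuming an `execute(tool_input) -> str` interface (the real signatures live in `tools/base.py`). Running generated code in a subprocess with a timeout is one common way to get the bounded, isolated execution described above:

```python
import subprocess
import sys
from abc import ABC, abstractmethod

class BaseTool(ABC):
    """Sketch of the tool interface; the actual definition is in tools/base.py."""
    name: str = "base"

    @abstractmethod
    def execute(self, tool_input: str) -> str:
        ...

class PythonTool(BaseTool):
    """Runs code in a subprocess so a hang or crash cannot stall the rollout."""
    name = "python"

    def __init__(self, timeout: float = 30.0):
        self.timeout = timeout

    def execute(self, tool_input: str) -> str:
        try:
            result = subprocess.run(
                [sys.executable, "-c", tool_input],
                capture_output=True, text=True, timeout=self.timeout,
            )
        except subprocess.TimeoutExpired:
            return "Error: execution timed out"
        # Feed stderr back on failure so the model can see and fix the error.
        return result.stdout if result.returncode == 0 else result.stderr

print(PythonTool(timeout=10).execute("print(1 + 1)"))  # 2
```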
- Function: Complete training pipeline implementation
- Features:
  - Integrates AReaL's GRPO training framework
  - Supports distributed training
Execute Python code:

```python
# Initialize the count of concave numbers
count = 0
# Iterate over all possible values for A (hundreds place)
for A in range(2, 10):
    # For each A, iterate over all possible values for B (tens place)
    for B in range(0, A):
        # For each B, iterate over all possible values for C (ones place)
        for C in range(B + 1, A):
            # Increment the count for each valid concave number
            count += 1
# The final count of distinct three-digit concave numbers
print(count)
```

output: 120

Mathematical calculation:

```
<calculator>1 + 2 * 3</calculator>
```

output: 6

Main process:
- The model pauses when generation reaches a tool call marker
- Detect and parse tool call content
- Execute tools in a secure environment
- Integrate tool results into the conversation
- Continue generating subsequent content
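The steps above can be sketched as a single loop. This is a simplified illustration, not the actual `tir_workflow.py` implementation: `generate` and `call_tool` stand in for the model's streaming generation and the tool manager, and the tool-call markers are assumed.

```python
import re

# Assumed tool-call markers; the real ones come from the prompt template.
TOOL_RE = re.compile(r"<(python|calculator)>(.*?)</\1>", re.DOTALL)

def tir_rollout(generate, call_tool, prompt, max_turns=2):
    conversation = prompt
    for _ in range(max_turns):
        completion = generate(conversation)      # 1. generation pauses at a marker (or EOS)
        conversation += completion
        m = TOOL_RE.search(completion)           # 2. detect and parse the tool call
        if m is None:
            break                                # no tool call: the answer is complete
        result = call_tool(m.group(1), m.group(2).strip())  # 3. execute securely
        conversation += f"\noutput: {result}\n"  # 4. integrate the result
    return conversation                          # 5. model continues from the new context
```

A quick usage example with stub functions:

```python
completions = iter(["<calculator>2 + 2</calculator>", " So the answer is 4."])
out = tir_rollout(lambda conv: next(completions),
                  lambda name, expr: str(eval(expr)),
                  "Q: What is 2 + 2? ")
# out contains the tool call, "output: 4", and the final continuation.
```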
Core logic can be found in `examples/tir/tir_workflow.py`.
- Follow `docs/tutorial/installation.md` for basic environment installation
- `qwen_agent` is included in the main AReaL dependencies
The project uses mathematical reasoning datasets. Using the ToRL dataset as an example, the data format is as follows:
```jsonl
{"messages": [{"role": "user", "content": "What is 15 + 27?"}], "answer": "42"}
{"messages": [{"role": "user", "content": "Calculate 3 * 4 + 2 * 5"}], "answer": "22"}
```

Edit the `tir_config.yaml` configuration file:
```yaml
# Model configuration
actor:
  path: /path/to/your/model
  dtype: bfloat16

# Dataset configuration
train_dataset:
  path: /path/to/train/data.parquet
  batch_size: 64

valid_dataset:
  path: /path/to/valid/data.parquet
  batch_size: 64

# TIR related configuration
tir:
  max_turns: 2
  max_length: 3000
  tool_timeout: 30
  enable_tools: python;calculator
```

Single-machine Multi-GPU Training
```bash
python3 examples/tir/train_tir.py \
  --config examples/tir/tir_config.yaml \
  scheduler.type=local
```

Multi-machine Multi-GPU Training
TODO
- Training used Qwen2.5-Math-1.5B as the base model.
- The reward is based solely on whether the final answer is correct.
- Training prompts follow ToRL, only hinting that the model can use programming tools. See `examples/tir/prompts.py` for details.
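The correctness-only reward can be sketched as follows, using the dataset's `answer` field shown earlier. The answer-extraction heuristic here (prefer `\boxed{...}`, else take the last number) is an assumption for illustration, not the pipeline's actual parser.

```python
import re

def compute_reward(response: str, answer: str) -> float:
    """Binary reward: 1.0 iff the extracted final answer matches `answer`.
    Extraction heuristic is a simplification for illustration only."""
    boxed = re.findall(r"\\boxed\{([^}]*)\}", response)
    if boxed:
        prediction = boxed[-1].strip()
    else:
        numbers = re.findall(r"-?\d+(?:\.\d+)?", response)
        prediction = numbers[-1] if numbers else ""
    return 1.0 if prediction == answer.strip() else 0.0

print(compute_reward("The result is \\boxed{42}", "42"))  # 1.0
```

Since no partial credit is given for tool usage itself, any increase in tool calls during training comes purely from their effect on final-answer accuracy.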
Key metric changes during training:
- Reward Curve (`grpo_actor/task_reward`): The yellow line shows TIR's reward, demonstrating an approximately 15% accuracy advantage over pure GRPO training.
- Tool Usage Frequency: As training progresses, the number of tool calls per answer increases from 0.9 to 1.2, while the tool call success rate shows no significant change.
examples/tir/
├── README.md # Project documentation
├── tir_workflow.py # Core workflow implementation
├── tool_manager.py # Tool manager
├── tir_config.yaml # Configuration file
├── train_tir.py # Training script
├── test_tir.py # Test script
├── tools/ # Tool implementations
│ ├── __init__.py
│ ├── base.py # Tool base class
│ ├── python_tool.py # Python executor
│ └── calculator_tool.py # Calculator
├── data/ # Data files
│ └── sample_math.jsonl # Sample data
└── utils/ # Utility functions
└── __init__.py
- Evaluation script
- Support for asynchronous tool calls
- Support for multi-machine training
- Fine-tuning: provide prompt templates for instruction-tuned models