How HFO-Trainer Was Written
HFO trainer
Overview of Half Field Offense environment
The Half Field Offense (HFO) environment is a branch of the 2D simulation that focuses on offense-defense situations within half of the field. It is built around rcssserver (version 15.2) plus an HFO referee, together with Python scripts and other tools. Users can connect their own agents to it and train them. Its process is as follows:
1. Start rcssserver from a Python script, print the connection parameters for agents, and wait until enough agents have connected.
2. Once enough agents are connected, start the training automatically.
3. Place the ball and the players in the half field randomly (but reasonably).
4. Change the play mode to play_on, and let the offense player(s) try to score.
5. When one of the following conditions is satisfied, the HFO system judges it and outputs the result of the episode:
   - the ball is out of bounds
   - the ball is out of the HFO area (the half field)
   - a goal is scored
   - the ball is caught by the defense player(s)
6. Go to step 3.
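The end-of-episode check in the process above can be sketched as a small shell function. This is only an illustration, not the HFO referee's actual code: it assumes the standard pitch (105 x 68, goal width 14.02), an offense attacking toward positive x, and a caller that already knows whether a defender has caught the ball. The function name and the status labels are hypothetical.

```shell
# episode_status BALL_X BALL_Y CAUGHT
# Hypothetical helper: classifies the ball state at the end of a cycle.
# Assumes the offense attacks toward +x and the HFO area is the half x >= 0.
# CAUGHT is 1 if a defense player holds the ball, 0 otherwise.
episode_status() {
    awk -v x="$1" -v y="$2" -v caught="$3" 'BEGIN {
        half_len  = 52.5   # half of the assumed pitch length (105)
        half_wid  = 34.0   # half of the assumed pitch width (68)
        goal_half = 7.01   # half of the assumed goal width (14.02)
        if (caught == 1) {
            print "captured_by_defense"
        } else if (x > half_len && y > -goal_half && y < goal_half) {
            print "goal"
        } else if (x > half_len || y < -half_wid || y > half_wid) {
            print "out_of_bounds"
        } else if (x < 0) {
            print "out_of_hfo_area"
        } else {
            print "in_play"
        }
    }'
}

# Usage: ball just behind the goal line, inside the goal mouth.
episode_status 53 0 0
```

Any status other than `in_play` would end the episode and send the loop back to step 3.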
Although it is a complete and highly usable system, we noticed some shortcomings.
- It is tightly coupled to a particular version of rcssserver, especially its core module, the HFO referee. This makes the project hard to move forward: now that rcssserver has been upgraded to version 16, many new features cannot be used in the old HFO environment.
- It conflicts with the official rcssserver. Its rcssserver has been modified and adds new variables to the config files, but it still reads its config from the default directory, which holds the default config files generated by the official rcssserver. We later learned to solve this with the parameter ‘include=file’, but the problem trapped us for days.
- Following on from the last point, we noticed that the last commit was made two years ago. Many updates are missing, and we cannot take care of all of them by ourselves.
While reading the official rcsoccersim website, we noticed that a trainer is highly recommended for training your own agents. After reading the source code of the HFO referee, we realized that a referee and a trainer can easily be converted into each other. This is how we started the HFO trainer.
Migration to an RCSS Server Trainer
We started from the HELIOS BASE example. The table below shows how we handled the differences between the referee and the trainer.
Referee | Trainer | Notes |
---|---|---|
HFO referee | HFO trainer | The referee is a module of rcssserver; the trainer is a separate but complete executable binary |
command line parameters | HFO parameter | The referee's HFO parameters are spread across the whole rcssserver; the trainer's are kept in one place |
In the HFO system, the HFO referee is similar to the time referee and the other referees: it works as a module of rcssserver. In a trainer, however, the HFO logic has to be called directly from the main function. So we first extracted the class ‘HFOReferee’, merged it with the ‘sample trainer’, and renamed the result ‘HFO trainer’. The inheritance from the base class ‘Referee’ was cut, but the necessary members were carried over into the new class ‘HFO trainer’. After that, we moved the HFO parameters into a new class named ‘HFOParam’. Finally, we added the necessary parameter-parsing code to the main function and made sure it worked well.
The steps above covered approximately 80% of the work in the HFO system. Some problems remain, however, because a trainer has more privileges than a player but not all the privileges that rcssserver has, such as starting a player with a particular uniform number. This can cause problems for teams whose agents make decisions based on their uniform numbers. So we used bash scripts to finish the rest of the work. The script below shows how we solve the problem of starting agents with particular uniform numbers (7 and 11).
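```bash
# Start rcssserver with trainer (offline coach) mode enabled.
rcssserver server::coach=on &>/dev/null &
# Start the HFO trainer and give it a moment to connect.
./Apollo2D_hfo_trainer &>/dev/null &
sleep 1

# Launch 11 players one by one and keep only the 7th and 11th.
# This relies on uniform numbers being handed out in connection order.
i=0
while ((i < 11)); do
    ./Apollo2D_player --config_dir=./formations-dt &>/dev/null &
    pid=$!
    i=$((i + 1))
    sleep 1
    # Players 7 and 11 stay connected; every other player is killed.
    if [[ $i == 7 || $i == 11 ]]; then
        continue
    fi
    kill "$pid"
done
```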
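The uniform numbers 7 and 11 are hard-coded in the script above. A minimal sketch of how the keep-or-kill decision could be parameterized is shown here, assuming the same launch order; the helper names `keep_player` and `plan_launch` are hypothetical and not part of our scripts. The sketch only prints the plan and starts no processes, so it can be tried without rcssserver.

```shell
# keep_player INDEX "LIST"
# Succeeds (exit status 0) if INDEX appears in the space-separated LIST.
keep_player() {
    idx="$1"
    for n in $2; do
        [ "$n" -eq "$idx" ] && return 0
    done
    return 1
}

# plan_launch "LIST"
# Dry run: for launch slots 1..11, print whether the player started in
# that slot would be kept or killed.
plan_launch() {
    wanted="$1"
    i=1
    while [ "$i" -le 11 ]; do
        if keep_player "$i" "$wanted"; then
            echo "$i keep"
        else
            echo "$i kill"
        fi
        i=$((i + 1))
    done
}

# Usage: the same selection as the script above.
plan_launch "7 11"
```

In a real launcher, the `keep` and `kill` branches would wrap the player start and the `kill` call from the original loop.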
Application and future work
More efficient HFO actions
A big difference between the HFO system and the official rcssserver is that the HFO system provides some basic actions, so users can concentrate on decision making instead of getting caught up in coding actions against abstract models.
We also provide some atomic actions similar to those of the HFO system, but ours are optimized and verified to be more efficient. The diagram below compares the cycles each spends on catching the ball.
HFO with more conditions
The original HFO system defined many constants, such as the HFO area, inside the modified rcssserver. After we moved them into a trainer, they became much easier to modify, and we also added more variables to make the trainer more flexible. For example, we can now let the offense players start a game from the backfield to play a counter-attack.
HFO for beginners
Beginners who are starting out in 2D simulation need time to understand how rcssserver works, so we use the HFO trainer to teach them. This year's (2022) Apollo2D rookies were taught to write a player that gets to a moving ball as quickly as possible. We applied the HFO trainer to this task and added a condition for getting the ball. Their recent results are shown below.
About the future
After finishing the HFO trainer, we reviewed the HFO system. We realized that it might become another project like Google Football. It tells us to focus on decisions in multi-agent problems, but we believe that actions are as important as decisions. As for the offense-defense situations of the HFO game, the HFO trainer will help us take a closer look at their nature.