The RoboFish project investigates swarm interactions by introducing a robotic fish into groups of real fish. This work explores the use of Deep Reinforcement Learning to train a robot policy capable of leading real agents. To this end, a simulation with handcrafted swarm agents is set up, together with an appropriate observation and action space for the robot. Neural networks with convolutional layers and Long Short-Term Memory cells are then trained via Proximal Policy Optimization to maximize a reward function that measures the robot's ability to make agents follow it to different places in the environment. Domain Randomization is applied to the simulation in order to obtain a policy that is robust to varying agent dynamics. The method is evaluated on its ability to lead a single agent in a randomized simulation, where, interestingly, the robot appears to display a form of meta-learning: it dynamically adapts its behavior by inferring the agent dynamics on the fly. Its success in randomized simulations gives reason to be optimistic about a transfer from the simulated to the real environment in future work.
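The Domain Randomization idea mentioned above can be illustrated with a minimal sketch: at each episode reset, the parameters governing the simulated agents' dynamics are resampled from broad ranges, so the policy cannot overfit to one fixed parameterization. The parameter names and ranges below are illustrative assumptions, not the values used in the project.

```python
import random
from dataclasses import dataclass


@dataclass
class AgentDynamics:
    """Hypothetical parameters of a simulated swarm agent (illustrative only)."""
    max_speed: float        # maximum speed, e.g. body lengths per second
    turn_rate: float        # maximum turning rate in radians per second
    attraction_gain: float  # strength of attraction toward the robot


def sample_dynamics(rng: random.Random) -> AgentDynamics:
    """Draw agent dynamics uniformly from plausible ranges.

    Calling this at every episode reset is the core of Domain
    Randomization: the policy only ever sees randomized agents,
    so it must learn behavior that works across the whole range.
    """
    return AgentDynamics(
        max_speed=rng.uniform(0.5, 2.0),
        turn_rate=rng.uniform(0.5, 3.0),
        attraction_gain=rng.uniform(0.1, 1.0),
    )


if __name__ == "__main__":
    rng = random.Random(0)
    # Each episode starts with freshly randomized agent dynamics.
    for episode in range(3):
        dynamics = sample_dynamics(rng)
        print(f"episode {episode}: {dynamics}")
```

In a full training setup, the sampled parameters would be fed into the swarm simulation before each rollout; the randomized evaluation described above corresponds to testing the trained policy on dynamics drawn from the same (or a held-out) range.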