Over the past few years, machine learning has revolutionized fields such as computer vision, natural language processing, and speech recognition. Much of this success rests on collecting vast amounts of data, often in privacy-invasive ways. Federated Learning is a young subfield of machine learning that allows models to be trained without collecting the data itself: instead of sharing their data, users collaboratively train a model by sending only weight updates to a server. While this approach better respects privacy and is more flexible in some situations, it comes at a cost: naive implementations scale poorly to models with millions of parameters. To make Federated Learning feasible, this thesis proposes changes to the optimization process and shows how dedicated compression methods can be employed. Using Differential Privacy techniques, it can be ensured that the transmitted weight updates do not leak significant information about individuals. Furthermore, strategies for additionally personalizing models locally are proposed. To evaluate Federated Learning empirically, a large-scale system was implemented for Mozilla Firefox. 360,000 users helped to train and evaluate a model that aims to improve search results in the Firefox URL bar.