Uber has been one of the most active companies trying to accelerate the implementation of real-world machine learning solutions. Just this year, Uber has introduced technologies like Michelangelo, Pyro.ai and Horovod that focus on key building blocks of machine learning solutions in the real world. This week, Uber introduced another piece of its machine learning stack, this time aiming to shorten the cycle from experimentation to production. PyML is a library that enables the rapid development of Python models in a way that is compatible with Uber's production runtime.
The problem PyML attempts to address is one of the omnipresent challenges in large-scale machine learning applications. Typically, there is a tangible mismatch between the tools and frameworks used by data scientists to prototype models and the corresponding production runtimes. For instance, it's very common for data scientists to use Python-based frameworks such as PyTorch or Keras to produce experimental models that then need to be adapted to a runtime such as Apache Spark ML Pipelines, which imposes very specific constraints. Machine learning technologists refer to this issue as a tradeoff between flexibility and resource-efficiency. In the case of Uber, data scientists were building models in Python machine learning frameworks that then needed to be refactored by the Michelangelo team to match the constraints of Apache Spark pipelines.
Overcoming this limitation meant extending the capabilities of Michelangelo to support models authored in mainstream machine learning frameworks while keeping a consistent model for training and optimization.
The goal of Uber’s PyML is to streamline the development of machine learning applications and bridge the gap between experimentation and production runtimes. To accomplish that, PyML focuses on three main aspects:
1) Provide a standard contract for machine learning prediction models.
2) Enable a consistent model for packaging and deploying machine learning models using Docker containers.
3) Enable Michelangelo-integrated runtimes for online and offline prediction models.
The following figure illustrates the basic architecture principles of PyML.
A Standard Machine Learning Contract
PyML models can be authored in different machine learning frameworks such as TensorFlow, PyTorch or Scikit-Learn. The models can use two main types of datasets: DataFrames, which store tabular structured data, and Tensors, which store named multidimensional arrays. After a model is created, it is adapted to a standard PyML contract definition, which is essentially a class that inherits from the DataFrameModel or TensorModel abstract class, respectively. In both cases, users only need to implement two methods: a constructor that loads the model parameters and a predict() method that accepts and returns either DataFrames or Tensors.
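Since PyML is internal to Uber, the exact class definitions are not public; the following is a minimal sketch of what the two-method contract could look like, where the DataFrameModel base class and the toy fare model are illustrative assumptions (plain dicts of columns stand in for real DataFrames to keep the sketch dependency-free):

```python
# Minimal stand-in for the DataFrameModel abstract class described above.
# Users implement only a constructor and a predict() method.
class DataFrameModel:
    def predict(self, df):
        raise NotImplementedError

# A toy model implementing the contract: the constructor loads the model
# parameters (hard-coded here; a real model would deserialize them from
# disk), and predict() maps an input "DataFrame" to an output "DataFrame".
class LinearFareModel(DataFrameModel):
    def __init__(self):
        self.base, self.per_km = 2.0, 1.5

    def predict(self, df):
        # df maps column names to lists of values; predictions are
        # returned in the same column-oriented shape.
        return {"fare": [self.base + self.per_km * d for d in df["distance_km"]]}

model = LinearFareModel()
preds = model.predict({"distance_km": [0.0, 10.0]})
```

Because the contract is just "construct, then predict", the serving layer can treat every model uniformly regardless of the framework used to train it.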
Packaging and Deployment
After PyML models are created, they can be packaged into Docker containers using a consistent structure. PyML introduces a standard deployment format based on four fundamental artifacts:
Using that structure, a developer can package and deploy a PyML model using the following code. The PyML Docker image will contain the model and all the corresponding dependencies. The models will be immediately available for execution in the Michelangelo console.
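The packaging code itself is not reproduced here, but the idea can be sketched as follows. Everything in this snippet is a hypothetical illustration, not PyML's actual API: it generates a Dockerfile that bundles a serialized model directory with its Python dependencies, so that the same image can back both offline and online predictions.

```python
def build_dockerfile(model_dir, entrypoint_module):
    """Hypothetical sketch: render a Dockerfile that packages a model
    directory and its dependencies into a self-contained image. The base
    image, paths, and entrypoint are illustrative assumptions."""
    lines = [
        "FROM python:3.7-slim",
        # Copy the serialized model and its declared dependencies.
        f"COPY {model_dir} /model",
        "RUN pip install -r /model/requirements.txt",
        # Launch the model server when the container starts.
        f'ENTRYPOINT ["python", "-m", "{entrypoint_module}"]',
    ]
    return "\n".join(lines)

dockerfile = build_dockerfile("fare_model", "model_server")
```

In a real pipeline the rendered Dockerfile would then be built and pushed to a registry, making the model immediately visible in the Michelangelo console as the article describes.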
Offline and Online Predictions
PyML supports both batch (offline) and online execution models for predictions. Offline predictions are modeled as an abstraction over PySpark. In that context, PyML users simply provide a SQL query with column names and types matching the inputs expected by their model, and the name of a destination Hive table in which to store the output predictions. Behind the scenes, PyML starts a containerized PySpark job using the same image and Python environment as for serving the model online, ensuring that there are no differences between the offline and online predictions. Executing offline predictions is relatively straightforward as illustrated in the following code:
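The actual invocation is not shown here, but the shape of an offline prediction request can be sketched as below. The function and field names are illustrative assumptions; the point is that the user contributes only a SQL query and a destination Hive table, while the job spec reuses the same Docker image as online serving.

```python
# Hypothetical sketch of an offline (batch) prediction job spec.
def build_offline_job(model_image, sql_query, output_table):
    return {
        "image": model_image,          # same Docker image as online serving
        "engine": "pyspark",           # batch predictions run as a PySpark job
        "input_query": sql_query,      # columns must match the model's inputs
        "output_table": output_table,  # Hive table that receives predictions
    }

job = build_offline_job(
    model_image="registry/fare_model:1",
    sql_query="SELECT trip_id, distance_km FROM trips WHERE ds = '2018-09-01'",
    output_table="predictions.fare_model_v1",
)
```

Reusing the serving image for the batch job is what guarantees the offline and online paths produce identical predictions.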
The standard two-operation (init, predict) contract of PyML models simplifies the implementation of online predictions. PyML enables online predictions by exposing lightweight gRPC interfaces on the Docker containers, which are used by a common online prediction service as shown in the following figure. Upon request, the online prediction service will launch the corresponding PyML model-specific Docker image as a nested Docker container via Mesos' API. When the container is launched, it starts the PyML RPC server and begins listening for prediction requests on a Unix domain socket from the online prediction service.
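The serving pattern, a model server answering prediction requests over a Unix domain socket, can be illustrated with a small stdlib-only sketch. PyML uses gRPC; this assumption-laden stand-in uses newline-delimited JSON over a raw socket instead, purely to show the request/response flow.

```python
import json
import socket
import tempfile
import threading
from pathlib import Path

# Socket path inside a throwaway directory (illustrative).
SOCK_PATH = str(Path(tempfile.mkdtemp()) / "pyml.sock")

def serve_one_request():
    """Toy model server: accept one connection on the Unix domain socket,
    read a JSON request, apply a toy linear model, and reply."""
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(SOCK_PATH)
    srv.listen(1)
    conn, _ = srv.accept()
    with conn:
        request = json.loads(conn.makefile().readline())
        preds = [2.0 + 1.5 * d for d in request["distance_km"]]
        conn.sendall((json.dumps({"fare": preds}) + "\n").encode())
    srv.close()

server = threading.Thread(target=serve_one_request)
server.start()

# Client side (playing the role of the online prediction service):
cli = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
while True:  # retry until the server thread has bound the socket
    try:
        cli.connect(SOCK_PATH)
        break
    except (FileNotFoundError, ConnectionRefusedError):
        pass
cli.sendall((json.dumps({"distance_km": [10.0]}) + "\n").encode())
response = json.loads(cli.makefile().readline())
cli.close()
server.join()
```

In the real system the per-model container plays the server role and the shared online prediction service plays the client, multiplexing requests across many model containers.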
PyML addresses one of the most important challenges in large-scale machine learning applications by bridging the gap between experimentation and runtime environments. Beyond its specific technological contributions, the architecture of PyML can be adapted to different technology stacks and should serve as an important reference for organizations starting their machine learning journey.