Microsoft and deep learning!

Sadat
« on: April 28, 2017, 12:48:14 PM »
Have you ever wondered what it would be like to combine the power of deep learning with the scalability of distributed computing? Say no more! We present a solution that uses leading-edge technologies to score images using a pre-trained deep learning model in a distributed and scalable fashion.

This post is a follow-up to the recent “Embarrassingly Parallel Image Classification, Using Cognitive Toolkit and TensorFlow on Azure HDInsight Spark” post. It describes the methods and best practices used to configure the Microsoft deep learning toolkit on an HDInsight cluster. Our approach is described in detail in our full tutorial and Jupyter notebook.

Recipe
1. Begin with an Azure HDInsight Hadoop cluster pre-provisioned with an Apache Spark 2.1.0 distribution. Spark is a distributed-computing framework widely used for big data processing, streaming, and machine learning. Spark HDInsight clusters come with pre-configured Python environments where the Spark Python API (PySpark) can be used.
2. Install the Microsoft Cognitive Toolkit (CNTK) on all cluster nodes using a Script Action. Cognitive Toolkit is an open-source deep-learning library from Microsoft. As of version 2.0, Cognitive Toolkit has high-level Python interfaces both for training models on CPUs/GPUs and for loading and scoring pre-trained models. Cognitive Toolkit can be installed as a Python package in the existing Python environments on all nodes of the HDInsight cluster. A script action performs this installation in one step with a single script, which removes the need to configure each cluster node separately. Besides installing the necessary Python packages and their dependencies, script actions can also restart any affected applications and services to keep the cluster state consistent.
3. Pre-process and score thousands of images in parallel using Cognitive Toolkit and PySpark in a Jupyter notebook. Python can be run interactively through Jupyter notebooks, which also provide a visualization environment. This solution uses the Cognitive Toolkit Python APIs and PySpark inside a Jupyter notebook on an HDInsight cluster.
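
To give a rough feel for the parallel scoring step, here is a minimal sketch in Python. It is not the code from the tutorial or notebook. The names `load_model`, `preprocess`, `score_partition` and `score_with_spark` are hypothetical, and the "model" is a toy stand-in so the example runs without a cluster or Cognitive Toolkit installed. The pattern it illustrates is to load the model once per Spark partition with `mapPartitions`, rather than once per image.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Image = List[float]                      # stand-in for a decoded image array
Model = Callable[[Image], List[float]]   # maps an image to per-class scores


def load_model() -> Model:
    """Hypothetical stand-in for loading a pre-trained model on a worker.

    On the cluster this is where the real Cognitive Toolkit model file
    would be loaded. Here it returns a toy two-class scorer instead.
    """
    def toy_model(pixels: Image) -> List[float]:
        mean = sum(pixels) / len(pixels)
        return [1.0 - mean, mean]        # scores for class 0 and class 1
    return toy_model


def preprocess(pixels: Image) -> Image:
    """Scale raw 0-255 pixel values into the range [0, 1]."""
    return [p / 255.0 for p in pixels]


def score_partition(records: Iterable[Tuple[str, Image]]) -> Iterator[Tuple[str, int]]:
    """Score one partition, loading the model once for the whole partition."""
    model = load_model()
    for name, pixels in records:
        scores = model(preprocess(pixels))
        # Index of the highest score is the predicted class.
        yield name, max(range(len(scores)), key=scores.__getitem__)


def score_with_spark(sc, images, num_partitions=4):
    """Distribute (name, pixels) pairs and score the partitions in parallel.

    `sc` is assumed to be an existing SparkContext, such as the one a
    PySpark Jupyter notebook provides.
    """
    rdd = sc.parallelize(images, num_partitions)
    return rdd.mapPartitions(score_partition).collect()


# Local check without a cluster: treat the whole list as a single partition.
sample = [("dark.png", [0, 10, 20]), ("bright.png", [250, 240, 255])]
local_predictions = list(score_partition(sample))
```

In a notebook on the cluster, `score_with_spark(sc, sample)` would run the same function on the worker nodes. Loading the model inside `score_partition` keeps the model object out of the driver and avoids reloading it for every image.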