Artificial intelligence and machine learning (AI and ML) are key technologies that help organizations develop new ways to increase sales, reduce costs, streamline business processes, and better understand customers. AWS helps customers accelerate their AI/ML adoption by providing powerful computing, high-speed networks, and scalable high-performance storage options on demand for any machine learning project. This lowers the barriers to entry for organizations that want to use the cloud to scale their ML applications.
Developers and data scientists are pushing technological boundaries and increasingly adopting deep learning, a type of machine learning based on neural network algorithms. Deep learning models are growing larger and more complex, which raises the cost of running the underlying infrastructure needed to train and deploy them.
To enable customers to accelerate their AI/ML transformation, AWS is building high-performance, low-cost machine learning chips. AWS Inferentia is the first machine learning chip built by AWS from the ground up to deliver the lowest-cost machine learning inference in the cloud. Compared with the current generation of GPU-based EC2 instances, Amazon EC2 Inf1 instances powered by Inferentia deliver 2.3 times higher machine learning inference performance and 70% lower costs. AWS Trainium is AWS’s second machine learning chip, designed for training deep learning models, and will be available at the end of 2021.
Customers across industries have deployed their ML applications in production on Inferentia and have seen significant performance improvements and cost savings. For example, Airbnb’s customer support platform provides an intelligent, scalable, and exceptional service experience for its global community of millions of hosts and guests. It uses Inferentia-based EC2 Inf1 instances to deploy the natural language processing (NLP) models that support its chatbots. This resulted in a 2x improvement in out-of-the-box performance compared with GPU-based instances.
Through these chip innovations, AWS enables customers to easily train and run their deep learning models in production, with high performance and throughput, at a significantly lower cost.
Machine learning challenges accelerate the transition to cloud-based infrastructure
Machine learning is an iterative process that requires teams to quickly build, train, and deploy applications, and to frequently train, retrain, and experiment to improve the prediction accuracy of their models. When deploying trained models into their business applications, organizations also need to scale those applications to serve new users around the world. They need to be able to process many incoming requests at the same time with near real-time latency to ensure an excellent user experience.
Emerging use cases such as object detection, natural language processing (NLP), image classification, conversational AI, and time series data all rely on deep learning techniques. The scale and complexity of deep learning models have grown exponentially, from millions of parameters to billions in just a few years.
Training and deploying these complex and sophisticated models translates into substantial infrastructure costs. As organizations scale their applications to provide near real-time experiences to their users and customers, those costs can quickly snowball.
This is where cloud-based machine learning infrastructure services can help. The cloud provides on-demand access to computing, high-performance networks, and big data storage, seamlessly integrated with ML operations and higher-level AI services, enabling organizations to get started immediately and scale their AI/ML initiatives.
How AWS helps customers accelerate their AI/ML transformation
AWS Inferentia and AWS Trainium aim to democratize machine learning and allow developers to use it regardless of experience and organization size. Inferentia’s design is optimized for high performance, throughput, and low latency, making it ideal for large-scale deployment of ML inference.
Each AWS Inferentia chip contains four NeuronCores, which implement a high-performance systolic array matrix multiplication engine that can greatly accelerate typical deep learning operations such as convolutions and transformers. NeuronCores are also equipped with a large on-chip cache that helps reduce external memory accesses, lowering latency and increasing throughput.
AWS Neuron is the software development kit for Inferentia, and it natively supports leading ML frameworks such as TensorFlow and PyTorch. Developers can continue to use the same frameworks and lifecycle development tools that they already know and love. For many trained models, they only need to change one line of code to compile and deploy them on Inferentia, without changing any other application code.
The result is a high-performance inference deployment that can be easily scaled while keeping costs under control.
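To make the idea of a systolic array concrete, here is a minimal software sketch of an output-stationary array. It is an illustration only and does not describe how a NeuronCore is actually built: the function name `systolic_matmul`, the output-stationary layout, and the skewed input timing are all assumptions chosen for clarity. Each cell (i, j) keeps one running sum, and on every cycle it multiplies the pair of operands that has just reached it and adds the product to that sum.

```python
def systolic_matmul(a, b):
    """Simulate an output-stationary systolic array computing a @ b.

    Hypothetical teaching model, not Inferentia's real design.
    Cell (i, j) owns the accumulator for output element (i, j).
    Rows of `a` enter from the left and columns of `b` from the top,
    each delayed by one cycle per row/column, so operand index s
    reaches cell (i, j) at cycle t = s + i + j.
    """
    n, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions must match")
    m = len(b[0])

    acc = [[0] * m for _ in range(n)]
    # The last cell (n-1, m-1) sees its last operand at cycle n+m+k-3.
    cycles = n + m + k - 2

    for t in range(cycles):
        for i in range(n):
            for j in range(m):
                s = t - i - j  # which operand pair arrives at this cell now
                if 0 <= s < k:
                    acc[i][j] += a[i][s] * b[s][j]
    return acc, cycles


if __name__ == "__main__":
    result, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(result, "in", cycles, "cycles")
```

In real hardware, all cells work in parallel within a cycle, so the whole product takes on the order of n + m + k cycles rather than n × m × k sequential multiply-adds, and operands are passed between neighboring cells instead of being re-read from memory.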
Sprinklr is a software-as-a-service company with an AI-driven unified customer experience management platform that enables companies to collect real-time customer feedback across multiple channels and turn it into actionable insights. This leads to proactive problem solving, enhanced product development, improved content marketing, and better customer service. Sprinklr used Inferentia to deploy its NLP models and some of its computer vision models and saw significant performance improvements.
Some Amazon services have also deployed their machine learning models on Inferentia.
Amazon Prime Video uses computer vision ML models to analyze the video quality of live events to ensure the best viewing experience for Prime Video members. It has deployed its image classification ML model on EC2 Inf1 instances, achieving a 4x improvement in performance and cost savings of up to 40% compared with GPU-based instances.
Another example is Amazon Alexa, whose AI- and ML-based intelligence is powered by Amazon Web Services and is currently available on more than 100 million devices. Alexa’s promise to customers is that it is always becoming smarter, more conversational, more proactive, and even more enjoyable. Delivering on this promise requires continuous improvement in response times and machine learning infrastructure costs. By deploying Alexa’s text-to-speech ML model on Inf1 instances, the service reduced inference latency by 25% and the cost of each inference by 30%, improving the experience for the tens of millions of customers who use Alexa every month.
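As a rough illustration of what percentage reductions like these mean at scale, the sketch below applies a 25% latency reduction and a 30% cost-per-inference reduction to a set of baseline figures. The baseline latency, baseline cost, and monthly volume are hypothetical placeholders, not numbers reported by Alexa or AWS, and the helper names are introduced only for this example.

```python
def apply_reduction(baseline: float, percent: float) -> float:
    """Return what remains of `baseline` after cutting it by `percent` percent."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return baseline * (1 - percent / 100)


def monthly_savings(cost_per_inference: float, percent: float, inferences: int) -> float:
    """Money saved per month when each inference becomes `percent` percent cheaper."""
    reduced = apply_reduction(cost_per_inference, percent)
    return (cost_per_inference - reduced) * inferences


# Hypothetical baselines, chosen only to make the arithmetic visible.
BASELINE_LATENCY_MS = 200.0
BASELINE_COST_PER_INFERENCE = 0.0001  # USD
MONTHLY_INFERENCES = 1_000_000_000

new_latency_ms = apply_reduction(BASELINE_LATENCY_MS, 25)
new_cost = apply_reduction(BASELINE_COST_PER_INFERENCE, 30)
savings = monthly_savings(BASELINE_COST_PER_INFERENCE, 30, MONTHLY_INFERENCES)

if __name__ == "__main__":
    print(f"latency per request: {BASELINE_LATENCY_MS:.0f} ms -> {new_latency_ms:.0f} ms")
    print(f"cost per inference:  ${BASELINE_COST_PER_INFERENCE:.6f} -> ${new_cost:.6f}")
    print(f"monthly savings:     ${savings:,.2f}")
```

Because the savings scale linearly with request volume, even a small per-inference reduction becomes significant for a service that handles a very large number of requests each month.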
Unleash new machine learning capabilities in the cloud
As companies race to future-proof their businesses by providing the best digital products and services, no organization can afford to fall behind in deploying sophisticated machine learning models to help innovate the customer experience. In the past few years, the applicability of machine learning has expanded greatly across a variety of use cases, from personalization and churn prediction to fraud detection and supply chain forecasting.
Fortunately, machine learning infrastructure in the cloud is unlocking capabilities that were not possible before and making them more accessible to non-expert practitioners. This is why AWS customers are already using Inferentia-based Amazon EC2 Inf1 instances to provide the intelligence behind their recommendation engines and chatbots, and to gain actionable insights from customer feedback.
With AWS cloud-based machine learning infrastructure options for all skill levels, it is clear that any organization can accelerate innovation and embrace the entire machine learning lifecycle at scale. As machine learning becomes more common, organizations are now able to fundamentally change the customer experience and the way they do business through a cost-effective, high-performance cloud-based machine learning infrastructure.
Learn more about how AWS’s machine learning platform can help your company innovate.
This content is produced by AWS. It was not written by the editors of MIT Technology Review.