Hello world. This is the September 2022 edition of the AWS Natural Language Processing (NLP) newsletter, covering everything related to NLP at AWS. Feel free to leave comments and share it on your social networks.
AWS NLP Summit
Only one week until the first-ever AWS NLP Summit 2022 in London! We are so excited about this, and the team has put in a huge amount of effort to make sure the agenda is packed with interesting speakers, sessions, and workshops.
The AWS NLP Summit 2022 in London, UK, features inspiring keynotes by:
Craig Saunders, Director of Machine Learning at Alexa AI
Vasi Philomin, Vice President of AWS AI
Satish Lakshmanan, Director of AWS AI/ML Worldwide Specialists
Not only do you get to interact with and hear from some of the most innovative minds in the industry, you can also see the technology in action in hands-on workshops and inspiring breakouts, and learn from expert panels of inventors, scientists, and pioneering startups that are shaping key NLP trends at AWS today. The summit will host 25 sessions covering cutting-edge innovation, the hottest research, and building enterprise NLP applications at scale for popular use cases across industries.
When: October 5 and 6, 2022
Where: 1 Principal Place, Worship Street, London, EC2A 2FA
Format: In-person only
Use this link to register and save your spot.
NLP@AWS Customer Success Stories
Savana - Builds the World’s Only Natural Language Processing Clinical Research Network on AWS
Madrid-based Savana helps healthcare providers to unlock the value of their electronic medical records (EMRs) for research purposes. It combines research-grade methodology with natural language processing (NLP) and predictive analytics to obtain relevant results for healthcare and life science providers investigating disease prediction and treatment.
"AWS and Savana work together as trusted cloud service providers and our customers benefit from that trust as well. Hospital IT departments are realizing that the cloud is safer than the traditional on-premises approach." – said Jorge Tello, Chief Executive Officer at Savana
HM Land Registry - Cuts Document Review Time in Half Using Amazon Textract
The UK government agency’s caseworkers had to review complex legal documents manually, which was time-consuming, so HMLR and Kainos built a solution powered by Amazon Textract and Amazon Comprehend that automatically compares documents and flags discrepancies for review. Now, caseworkers no longer need to review thousands of documents per week, and the agency can approve property transfers faster.
"This project has successfully shown how we can use artificial intelligence and machine learning and, more importantly, how we can strengthen that capability as we progress through our digital transformation. Using AWS, we can improve our processes and enhance how our employees work" – said Nick Davies, Senior Product Manager for Internal Digital Services at HM Land Registry
State Auto - Improves Processes across the Life Cycle Using AWS Machine Learning, Computer Vision, and Serverless Architecture
State Automobile Mutual Insurance Company (State Auto) began using technology to help its customer service representatives (CSRs) meet quality and customer satisfaction score goals when it built its SA360 solution on Amazon Web Services (AWS). By using data-fueled insights and making these insights available to customers and CSRs, State Auto was able to build a better service experience, redirecting typical customer calls to self-service channels so that CSRs were able to focus on those customers with more complex needs.
"Because AWS services do their job so well out of the box, we have the flexibility to be creative and build things on top of them." – said Uthra Ramanujam, Vice President of Strategic Technology Research at State Automobile Mutual Insurance Company
AI Language Services
Amazon Lex introduces the Visual Conversation Builder, a drag-and-drop interface to visualize and build conversation flows in a no-code environment. The Visual Conversation Builder greatly simplifies bot design: in addition to the existing menu-based editor and the Lex APIs, it provides a complete view of the entire conversation flow in one place, empowering any user to build engaging conversational experiences more quickly. Launch blog post.
Amazon Lex introduces the composite slot type. A slot is used to capture user input and provide the bot the necessary information to fulfil a task. In some cases, the information contains multiple values, each requiring its own slot. With the composite slot type, Amazon Lex can capture the full user response at once and associate each piece of information with the appropriate slot.
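As a hedged sketch of what this might look like programmatically, the Lex V2 models API (lexv2-models in boto3) exposes a composite slot type setting when creating a slot type. All IDs and names below are placeholders, and the exact required fields should be checked against the API reference.

```python
import boto3

# Hypothetical example: a composite slot that captures origin, destination, and
# travel date in a single user response. Bot ID and sub-slot type IDs are placeholders.
lex_models = boto3.client("lexv2-models")

response = lex_models.create_slot_type(
    slotTypeName="BookingDetails",
    description="Captures origin, destination, and travel date in one utterance",
    botId="BOT_ID_PLACEHOLDER",
    botVersion="DRAFT",
    localeId="en_US",
    compositeSlotTypeSetting={
        "subSlots": [
            {"name": "Origin", "slotTypeId": "AMAZON.City"},       # assumed built-in slot type reference
            {"name": "Destination", "slotTypeId": "AMAZON.City"},  # assumed built-in slot type reference
            {"name": "TravelDate", "slotTypeId": "AMAZON.Date"},   # assumed built-in slot type reference
        ]
    },
)
print(response["slotTypeId"])
```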
Amazon Comprehend now supports synchronous processing for targeted sentiment. Amazon Comprehend customers can now extract the sentiments associated with entities from text documents in real time using the newly released synchronous API. The Targeted Sentiment synchronous API enables customers to derive granular sentiments associated with specific entities of interest, such as brands or products, without waiting for batch processing. Launch blog post.
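As a quick illustration, the synchronous API can be called directly from the AWS SDK; here is a minimal boto3 sketch with a made-up review sentence.

```python
import boto3

comprehend = boto3.client("comprehend")

# Extract entity-level (targeted) sentiment from a single document in real time.
response = comprehend.detect_targeted_sentiment(
    Text="The camera on this phone is fantastic, but the battery life is disappointing.",
    LanguageCode="en",
)

# Each detected entity groups one or more mentions, each with its own sentiment.
for entity in response["Entities"]:
    for mention in entity["Mentions"]:
        print(mention["Text"], mention["MentionSentiment"]["Sentiment"])
```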
Amazon Textract announces updates to the text extraction feature. Amazon Textract is a machine learning service that automatically extracts text, handwriting, and data from any document or image. We are pleased to announce quality enhancements to the text extraction feature available via the DetectDocumentText API. The latest text detection models behind the DetectDocumentText API improve word- and line-extraction accuracy, specifically for E13B fonts commonly found in checks/cheques, International Bank Account Numbers (IBANs) found in banking documents, and long words (e.g., email addresses).
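No code changes are needed to pick up the improved models; any existing call to the API works, as in this minimal boto3 sketch with a placeholder local image.

```python
import boto3

textract = boto3.client("textract")

# Run synchronous text detection on a local document image ("cheque.png" is a placeholder).
with open("cheque.png", "rb") as f:
    response = textract.detect_document_text(Document={"Bytes": f.read()})

# Print the detected lines of text with their confidence scores.
for block in response["Blocks"]:
    if block["BlockType"] == "LINE":
        print(f'{block["Confidence"]:.1f}%  {block["Text"]}')
```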
NLP on SageMaker
Amazon SageMaker now supports deploying large models through configurable volume size and timeout quotas. Amazon SageMaker’s Real-time and Asynchronous Inference options can now deploy large models (up to 500GB) for inference by configuring the maximum EBS volume size and timeout quotas. This launch enables customers to leverage SageMaker's fully managed Real-time and Asynchronous inference capabilities to deploy and manage large ML models such as variants of GPT and OPT. Launch blog post: Deploy large models on Amazon SageMaker using DJLServing and DeepSpeed model parallel inference.
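The launch blog post covers DJLServing and DeepSpeed in detail; as a minimal, hedged sketch of just the new quotas, an endpoint configuration might set them like this (the model name, instance type, and values are placeholders).

```python
import boto3

sagemaker = boto3.client("sagemaker")

# Placeholder names; the model must already be registered with CreateModel.
sagemaker.create_endpoint_config(
    EndpointConfigName="large-model-endpoint-config",
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": "my-large-opt-model",
            # Placeholder instance type; the configurable EBS volume applies to
            # instance types without local NVMe storage.
            "InstanceType": "ml.p3.16xlarge",
            "InitialInstanceCount": 1,
            # Newly configurable quotas for hosting large models:
            "VolumeSizeInGB": 256,                                # EBS volume attached to the instance
            "ModelDataDownloadTimeoutInSeconds": 1800,            # time allowed to download model artifacts
            "ContainerStartupHealthCheckTimeoutInSeconds": 1800,  # time allowed for the container to become healthy
        }
    ],
)
```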
Amazon SageMaker Model Monitor is a fully managed service that continuously monitors the quality of machine learning models hosted on Amazon SageMaker, giving you real-time insight into deployed models. It automatically detects data drift, concept drift, bias drift, and feature attribution drift in real time and raises alerts so that model owners can take corrective action and maintain high-quality models.
NLP encoders like word2vec, BERT, and RoBERTa are used in a wide array of applications, ranging from chatbots and virtual assistants to machine translation and text summarization. These encoders convert input words or sequences of words into (contextual or non-contextual) word-level embeddings, which are then consumed by downstream task-specific models. A change in the distribution of the input text can clearly impact the performance of the downstream model.
However, monitoring text data is challenging because of its structure: unlike tabular data, which is typically fixed-dimensional and bounded, text is free form. To overcome this, SageMaker Model Monitor can be applied to embeddings of the text rather than to the raw text itself, with a custom monitoring schedule configured to detect drift in the embedded data.
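To make the idea concrete, here is an illustrative sketch of embedding-based drift detection (not SageMaker Model Monitor itself), using the sentence-transformers library as an assumed encoder and a simple per-dimension statistical test on toy data.

```python
# Illustrative sketch only: embedding-based drift detection for text.
import numpy as np
from scipy.stats import ks_2samp
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

baseline_texts = ["I love this product", "Fast shipping, great value"]      # toy baseline, captured at training time
current_texts = ["Terrible support experience", "The app keeps crashing"]   # toy window of live traffic

baseline_emb = encoder.encode(baseline_texts)   # shape: (n_baseline, dim)
current_emb = encoder.encode(current_texts)     # shape: (n_current, dim)

# Per-dimension two-sample KS test; a high share of drifting dimensions suggests data drift.
p_values = [ks_2samp(baseline_emb[:, d], current_emb[:, d]).pvalue for d in range(baseline_emb.shape[1])]
drift_share = np.mean(np.array(p_values) < 0.05)
print(f"Fraction of embedding dimensions showing drift: {drift_share:.2%}")
```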
NLP @ Community
Whisper
Last week OpenAI released an open-source neural network called Whisper that approaches human-level robustness and accuracy on English speech recognition. This automatic speech recognition (ASR) system was trained on 680,000 hours of multilingual and multitask supervised data collected from the web. The use of such a large and diverse dataset leads to improved robustness to accents, background noise, and technical language. Moreover, it enables transcription in multiple languages, as well as translation from those languages into English.
Launch content: [Blog] [Paper] [Code] [Model card]
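If you want to try it out, the open-source package exposes a simple Python API; the audio file name below is just a placeholder.

```python
# pip install -U openai-whisper  (also requires ffmpeg on the system)
import whisper

# Load one of the pretrained checkpoints; larger models are more accurate but slower.
model = whisper.load_model("base")

# Transcribe an audio file ("meeting.mp3" is a placeholder path).
result = model.transcribe("meeting.mp3")
print(result["text"])
```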
NLP Posts from AWS Machine Learning Blog
Improve transcription accuracy of customer-agent calls with custom vocabulary in Amazon Transcribe
In many countries, such as India, English is not the primary language of communication. Customer conversations in India often mix regional languages like Hindi with English words and phrases interspersed throughout the calls. The source media files can contain proper nouns, domain-specific acronyms, words, or phrases that the default Amazon Transcribe model isn’t aware of, and transcriptions for such media files can have inaccurate spellings for those words.
This post demonstrates how you can provide more information to Amazon Transcribe with custom vocabularies to update the way Amazon Transcribe handles transcription of your audio files with business-specific terminology.
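At a high level, the mechanism boils down to two calls, sketched below with placeholder vocabulary entries, names, and S3 locations; see the post for the full walkthrough.

```python
import boto3

transcribe = boto3.client("transcribe")

# 1. Create a custom vocabulary with business-specific terms (placeholder phrases;
#    multi-word phrases are written with hyphens instead of spaces).
transcribe.create_vocabulary(
    VocabularyName="contact-center-terms",
    LanguageCode="en-IN",
    Phrases=["Mahindra", "Paytm", "net-banking"],
)

# 2. Once the vocabulary is READY, reference it when starting a transcription job.
transcribe.start_transcription_job(
    TranscriptionJobName="agent-call-0001",
    Media={"MediaFileUri": "s3://my-bucket/calls/agent-call-0001.wav"},
    MediaFormat="wav",
    LanguageCode="en-IN",
    Settings={"VocabularyName": "contact-center-terms"},
)
```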
Get better insight from reviews using Amazon Comprehend
Statistics show that the majority of shoppers use reviews to determine what products to buy and which services to use. As per Spiegel Research Centre, the purchase likelihood for a product with five reviews is 270% greater than the purchase likelihood of a product with no reviews. Reviews have the power to influence consumer decisions and strengthen brand value.
This post shows how to use Amazon Comprehend to extract meaningful information from product reviews, analyze it to understand how users of different demographics are reacting to products, and discover aggregated information on user affinity towards a product.
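The post builds a full analytics pipeline; as a minimal standalone illustration, Amazon Comprehend’s sentiment and key-phrase APIs can be called on a single review (the sample text here is made up).

```python
import boto3

comprehend = boto3.client("comprehend")

review = "Great sound quality for the price, although the ear cushions wear out quickly."

# Overall sentiment of the review.
sentiment = comprehend.detect_sentiment(Text=review, LanguageCode="en")
print(sentiment["Sentiment"], sentiment["SentimentScore"])

# Key phrases that can be aggregated across many reviews.
key_phrases = comprehend.detect_key_phrases(Text=review, LanguageCode="en")
print([kp["Text"] for kp in key_phrases["KeyPhrases"]])
```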
Read webpages and highlight content using Amazon Polly
This post demonstrates how to use Amazon Polly—a leading cloud service that converts text into lifelike speech—to read the content of a webpage and highlight the content as it’s being read. Adding audio playback to a webpage improves the accessibility and visitor experience of the page. Audio-enhanced content is more impactful and memorable, draws more traffic to the page, and taps into the spending power of visitors. It also improves the brand of the company or organization that publishes the page.
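Word-level highlighting typically relies on Polly’s speech marks, which report when each word is spoken in the synthesized audio. Here is a minimal sketch of the two calls involved, separate from the post’s full architecture.

```python
import json
import boto3

polly = boto3.client("polly")
text = "Amazon Polly turns text into lifelike speech."

# 1. Request speech marks: per-word timing metadata as newline-delimited JSON.
marks = polly.synthesize_speech(
    Text=text,
    OutputFormat="json",
    SpeechMarkTypes=["word"],
    VoiceId="Joanna",
)
for line in marks["AudioStream"].read().decode("utf-8").splitlines():
    mark = json.loads(line)
    print(f'{mark["time"]} ms -> {mark["value"]}')  # timings drive client-side highlighting

# 2. Request the actual audio to play back alongside the marks.
audio = polly.synthesize_speech(Text=text, OutputFormat="mp3", VoiceId="Joanna")
with open("speech.mp3", "wb") as f:
    f.write(audio["AudioStream"].read())
```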
Discover insights from Zendesk with Amazon Kendra intelligent search
Now you can use the Amazon Kendra Zendesk connector to index your Zendesk service tickets, help guides, and community posts, and perform intelligent search powered by machine learning (ML). Amazon Kendra smartly and efficiently answers natural language-based queries using advanced natural language processing (NLP) techniques. It can learn effectively from your Zendesk data, extracting meaning and context.
This post shows how to configure the Amazon Kendra Zendesk connector to index your Zendesk domain and take advantage of Amazon Kendra intelligent search.
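Once the connector has synced your Zendesk content, the index can be queried with natural language; here is a minimal boto3 sketch with a placeholder index ID.

```python
import boto3

kendra = boto3.client("kendra")

# Placeholder index ID; the Zendesk connector is configured as a data source on this index.
response = kendra.query(
    IndexId="INDEX_ID_PLACEHOLDER",
    QueryText="How do I reset my password?",
)

for item in response["ResultItems"]:
    print(item["Type"], "-", item["DocumentTitle"]["Text"])
    print(item["DocumentExcerpt"]["Text"])
```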
Choose the k-NN algorithm for your billion-scale use case with OpenSearch
When organizations set out to build machine learning (ML) applications such as natural language processing (NLP) systems, recommendation engines, or search-based systems, k-Nearest Neighbor (k-NN) search is often used at some point in the workflow. As the number of data points reaches the hundreds of millions or even billions, scaling a k-NN search system becomes a major challenge, and Approximate Nearest Neighbor (ANN) search is a great way to overcome it.
The OpenSearch k-NN plugin provides the ability to use some of these algorithms within an OpenSearch cluster. This post presents the different algorithms that are supported and shows experiments to see some of the trade-offs between them.
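To make the setup concrete, here is a minimal sketch using the opensearch-py client: create an index with a knn_vector field backed by HNSW, index a few vectors, and run an approximate k-NN query. The host, credentials, and 3-dimensional toy vectors are placeholders.

```python
from opensearchpy import OpenSearch

# Placeholder connection; supply the real host and auth for your cluster.
client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Index with a knn_vector field using the HNSW ANN algorithm from the k-NN plugin.
client.indices.create(
    index="embeddings",
    body={
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "my_vector": {
                    "type": "knn_vector",
                    "dimension": 3,
                    "method": {"name": "hnsw", "space_type": "l2", "engine": "nmslib"},
                }
            }
        },
    },
)

# Index a couple of toy documents.
client.index(index="embeddings", body={"my_vector": [0.1, 0.2, 0.3], "doc_id": "a"}, refresh=True)
client.index(index="embeddings", body={"my_vector": [0.2, 0.1, 0.4], "doc_id": "b"}, refresh=True)

# Approximate k-NN query: top 2 nearest neighbours of a query vector.
results = client.search(
    index="embeddings",
    body={"size": 2, "query": {"knn": {"my_vector": {"vector": [0.1, 0.2, 0.25], "k": 2}}}},
)
for hit in results["hits"]["hits"]:
    print(hit["_source"]["doc_id"], hit["_score"])
```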
Stay in touch with NLP on AWS
Our contact: aws-nlp@amazon.com
Email us to (1) tell us about your awesome NLP-on-AWS project, (2) let us know which post in the newsletter helped your NLP journey, or (3) suggest other topics you would like us to cover in the newsletter. Talk to you soon.