<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Edward Ma</title>
    <description>The latest articles on DEV Community by Edward Ma (@makcedward).</description>
    <link>https://dev.to/makcedward</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F177788%2F69ccdee4-848e-48fc-a84c-1a27aac410c0.jpeg</url>
      <title>DEV Community: Edward Ma</title>
      <link>https://dev.to/makcedward</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/makcedward"/>
    <language>en</language>
    <item>
      <title>Unsupervised Data Augmentation</title>
      <dc:creator>Edward Ma</dc:creator>
      <pubDate>Mon, 05 Aug 2019 00:26:58 +0000</pubDate>
      <link>https://dev.to/makcedward/unsupervised-data-augmentation-3ico</link>
      <guid>https://dev.to/makcedward/unsupervised-data-augmentation-3ico</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Xr2DjAHb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/0%2A1NvpsIcE3_oTbHQ_" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Xr2DjAHb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/0%2A1NvpsIcE3_oTbHQ_" alt=""&gt;&lt;/a&gt;Photo by &lt;a href="https://unsplash.com/@makcedward?utm_source=medium&amp;amp;utm_medium=referral"&gt;Edward Ma&lt;/a&gt; on &lt;a href="https://unsplash.com/?utm_source=medium&amp;amp;utm_medium=referral"&gt;Unsplash&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  A Look at Data Augmentation | &lt;a href="https://towardsai.net"&gt;Towards AI&lt;/a&gt;
&lt;/h4&gt;

&lt;p&gt;The more data we have, the better the performance we can achieve. However, annotating a large amount of training data is expensive, so proper data augmentation is a useful way to boost model performance. The authors of &lt;a href="https://arxiv.org/pdf/1904.12848.pdf"&gt;Unsupervised Data Augmentation&lt;/a&gt; (Xie et al., 2019) proposed Unsupervised Data Augmentation (UDA), which helps us build a better model by leveraging several data augmentation methods.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In the natural language processing (NLP) field, it is hard to augment text due to the high complexity of language. Not every word can be replaced by another (such as a, an, the), and not every word has a synonym. Even changing a single word can make the context totally different. On the other hand, generating augmented images in computer vision is relatively easy: even after introducing noise or cropping out a portion of the image, a model can still classify it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Xie et al. conducted several data augmentation experiments on image classification (AutoAugment) and text classification (back translation and TF-IDF based word replacement). After generating a large enough data set for model training, the authors noticed that the model could easily over-fit, so they introduced Training Signal Annealing (TSA) to overcome it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Augmentation Strategies
&lt;/h3&gt;

&lt;p&gt;This section introduces three data augmentation methods from the computer vision (CV) and natural language processing (NLP) fields.&lt;/p&gt;

&lt;h4&gt;
  
  
  AutoAugment for Image Classification
&lt;/h4&gt;

&lt;p&gt;AutoAugment was introduced by Google in 2018 as a way to augment images automatically. Unlike a traditional image augmentation library, AutoAugment is designed to search for the best policy for manipulating data automatically.&lt;/p&gt;

&lt;p&gt;You may visit &lt;a href="https://github.com/tensorflow/models/tree/master/research/autoaugment"&gt;here&lt;/a&gt; for the model and implementation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--oBnjiQHh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/630/1%2AnwnEJoSn5-sxNpG6DOlFVA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--oBnjiQHh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/630/1%2AnwnEJoSn5-sxNpG6DOlFVA.png" alt=""&gt;&lt;/a&gt;Generated result by AutoAugment (Cubuk et al., 2018)&lt;/p&gt;

&lt;h4&gt;
  
  
  Back translation for Text Classification
&lt;/h4&gt;

&lt;p&gt;Back translation is a method that leverages a translation system to generate data. Suppose we have models for translating English to Cantonese and vice versa. Augmented data can be obtained by translating the original data from English to Cantonese and then translating it back to English.&lt;/p&gt;
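&lt;p&gt;As a minimal sketch of the idea (the toy dictionary "translators" below are hypothetical stand-ins for real English/Cantonese translation models):&lt;/p&gt;

```python
# Minimal back-translation sketch. The two toy dictionary "translators"
# stand in for real English-to-Cantonese and Cantonese-to-English models;
# in practice you would call trained translation models here.

EN_TO_YUE = {"good": "好", "morning": "早晨"}
YUE_TO_EN = {"好": "great", "早晨": "morning"}  # round trip need not be exact

def to_cantonese(text):
    return " ".join(EN_TO_YUE.get(w, w) for w in text.split())

def to_english(text):
    return " ".join(YUE_TO_EN.get(w, w) for w in text.split())

def back_translate(text):
    # English to Cantonese and back to English: the round trip paraphrases.
    return to_english(to_cantonese(text))

augmented = back_translate("good morning")  # "great morning"
```

&lt;p&gt;Because the round trip rarely reproduces the input verbatim, each pass yields a paraphrase that can be added to the training set.&lt;/p&gt;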

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://arxiv.org/pdf/1511.06709.pdf"&gt;Sennrich et al. (2015)&lt;/a&gt; used back-translation method to generate more training data to improve translation model performance.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--tyNPDcdr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/926/1%2APkj0hnD43MJuMUUAgHpwKA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tyNPDcdr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/926/1%2APkj0hnD43MJuMUUAgHpwKA.png" alt=""&gt;&lt;/a&gt;Examples of back translation (Xie et al., 2019)&lt;/p&gt;

&lt;h4&gt;
  
  
  TF-IDF based word replacing for Text Classification
&lt;/h4&gt;

&lt;p&gt;Although back translation helps to generate a lot of data, there is no guarantee that keywords will be kept after translation. Some keywords carry more information than others, and they may be lost in translation.&lt;/p&gt;

&lt;p&gt;Therefore, Xie et al. use &lt;a href="https://towardsdatascience.com/3-basic-approaches-in-bag-of-words-which-are-better-than-word-embeddings-c2cbc7398016"&gt;TF-IDF&lt;/a&gt; to tackle this limitation. The idea behind TF-IDF is that high-frequency words may not provide much information gain; in other words, rare words contribute more weight to the model. A word's importance increases with its number of occurrences within the same document (i.e. training record) and decreases with how often it occurs across the corpus (i.e. other training records).&lt;/p&gt;

&lt;p&gt;The IDF score is calculated from the DBPedia corpus. A TF-IDF score is computed for each token, and tokens are replaced according to that score: tokens with a low TF-IDF score have a high probability of being replaced.&lt;/p&gt;
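&lt;p&gt;A minimal sketch of this replacement scheme (not the authors' exact implementation; the toy corpus, scoring and probability mapping below are illustrative):&lt;/p&gt;

```python
import math
import random

# Toy corpus: each document is one training record.
corpus = [
    ["the", "movie", "was", "fantastic"],
    ["the", "plot", "was", "boring"],
    ["the", "fantastic", "soundtrack"],
]

def idf(word):
    # Words occurring in many records get a low IDF.
    df = sum(word in doc for doc in corpus)
    return math.log(len(corpus) / df)

def tfidf_scores(doc):
    return {w: doc.count(w) / len(doc) * idf(w) for w in doc}

def augment(doc, replacement_vocab, max_prob=0.9):
    scores = tfidf_scores(doc)
    top = max(scores.values()) or 1.0
    out = []
    for w in doc:
        # Low TF-IDF means high replacement probability; keywords are kept.
        p = max_prob * (1 - scores[w] / top)
        out.append(random.choice(replacement_vocab) if p > random.random() else w)
    return out
```

&lt;p&gt;Running augment on the first record always keeps "movie" (its highest-scoring keyword) while frequently replacing uninformative words such as "the".&lt;/p&gt;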

&lt;p&gt;If you are interested in using TF-IDF based word replacement for data augmentation, you may visit &lt;a href="https://github.com/makcedward/nlpaug"&gt;nlpaug&lt;/a&gt; for a Python implementation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Training Signal Annealing (TSA)
&lt;/h3&gt;

&lt;p&gt;After generating a large amount of data using the aforementioned techniques, Xie et al. noticed that the model would over-fit easily. Therefore, they introduced TSA: during model training, examples with high confidence are removed from the loss function to prevent over-training.&lt;/p&gt;

&lt;p&gt;The following figure shows the value range of ηt, where K is the number of categories. If an example's predicted probability is higher than ηt, it is removed from the loss function.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--WU6aAdCd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/346/1%2AITyrjAHpn21ua7bDkSBaNQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--WU6aAdCd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/346/1%2AITyrjAHpn21ua7bDkSBaNQ.png" alt=""&gt;&lt;/a&gt;The threshold of removing high probability examples (Xie et al., 2019)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--PWEZxt-A--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/546/1%2Awts7wcoL1hsED_5eaNGyuA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--PWEZxt-A--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/546/1%2Awts7wcoL1hsED_5eaNGyuA.png" alt=""&gt;&lt;/a&gt;TSA’s objective function (Xie et al., 2019)&lt;/p&gt;

&lt;p&gt;Three schedules for computing ηt are considered, for different scenarios:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Linear-schedule: grows at a constant rate.&lt;/li&gt;
&lt;li&gt;Log-schedule: grows faster in the early stage of training.&lt;/li&gt;
&lt;li&gt;Exp-schedule: grows faster at the end of training.&lt;/li&gt;
&lt;/ul&gt;
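&lt;p&gt;The three schedules can be sketched directly from the paper's formulation, ηt = αt(1 − 1/K) + 1/K, with αt depending on the schedule (the scaling constant 5 in the log and exp schedules follows the paper):&lt;/p&gt;

```python
import math

def tsa_threshold(t, T, K, schedule="linear"):
    """Training Signal Annealing threshold eta_t.

    t: current training step, T: total steps, K: number of categories.
    Examples whose predicted probability exceeds eta_t are masked
    out of the loss at step t.
    """
    if schedule == "linear":
        alpha = t / T
    elif schedule == "log":
        alpha = 1 - math.exp(-(t / T) * 5)
    elif schedule == "exp":
        alpha = math.exp((t / T - 1) * 5)
    else:
        raise ValueError(schedule)
    return alpha * (1 - 1 / K) + 1 / K
```

&lt;p&gt;At t = 0 the linear schedule starts at 1/K (the random-guess probability), and every schedule approaches 1 by the end of training, so the loss gradually admits more confident examples.&lt;/p&gt;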

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--U6h9atXe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/491/1%2AwJrF9YPqo5VmxKc1yk5Tmw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--U6h9atXe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/491/1%2AwJrF9YPqo5VmxKc1yk5Tmw.png" alt=""&gt;&lt;/a&gt;Training process among three schedules (Xie et al., 2019)&lt;/p&gt;

&lt;h3&gt;
  
  
  Recommendation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The above approaches are designed to solve the problems the authors were facing. If you understand your data, you should tailor the augmentation approach to it. Remember the golden rule in data science: garbage in, garbage out.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Like to learn?
&lt;/h3&gt;

&lt;p&gt;I am a Data Scientist in the Bay Area, focusing on the state of the art in Data Science and Artificial Intelligence, especially NLP and platform-related work. Feel free to connect with &lt;a href="https://makcedward.github.io/"&gt;me&lt;/a&gt; on &lt;a href="https://www.linkedin.com/in/edwardma1026"&gt;LinkedIn&lt;/a&gt; or &lt;a href="https://github.com/makcedward"&gt;Github&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Extension Reading
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://towardsdatascience.com/data-augmentation-in-nlp-2801a34dfc28"&gt;Data Augmentation in NLP&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://towardsdatascience.com/data-augmentation-library-for-text-9661736b13ff"&gt;Data Augmentation for Text&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/makcedward/data-augmentation-for-audio-5fii"&gt;Data Augmentation for Audio&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/makcedward/data-augmentation-for-speech-recognition-bfc"&gt;Data Augmentation for Spectrogram&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/makcedward/does-your-nlp-model-able-to-prevent-adversarial-attack-1me7-temp-slug-1330031"&gt;Does your NLP model able to prevent an adversarial attack?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/DeepVoltaire/AutoAugment"&gt;Unofficial AutoAugment implementation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Reference
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;R. Sennrich, B. Haddow and A. Birch. &lt;a href="https://arxiv.org/pdf/1511.06709.pdf"&gt;Improving Neural Machine Translation Models with Monolingual Data&lt;/a&gt;. 2015&lt;/li&gt;
&lt;li&gt;E. D. Cubuk, B. Zoph, D. Mane, V. Vasudevan and Q. V. Le. &lt;a href="https://arxiv.org/pdf/1805.09501.pdf"&gt;AutoAugment: Learning Augmentation Strategies from Data&lt;/a&gt;. 2018&lt;/li&gt;
&lt;li&gt;Q. Xie, Z. Dai, E Hovy, M. T. Luong and Q. V. Le. &lt;a href="https://arxiv.org/pdf/1904.12848.pdf"&gt;Unsupervised Data Augmentation&lt;/a&gt;. 2019&lt;/li&gt;
&lt;/ul&gt;




</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>ai</category>
      <category>nlp</category>
    </item>
    <item>
      <title>How does your assistant device work based on Text-to-Speech technology?</title>
      <dc:creator>Edward Ma</dc:creator>
      <pubDate>Mon, 29 Jul 2019 13:54:16 +0000</pubDate>
      <link>https://dev.to/makcedward/how-does-your-assistant-device-work-based-on-text-to-speech-technology-2bdg</link>
      <guid>https://dev.to/makcedward/how-does-your-assistant-device-work-based-on-text-to-speech-technology-2bdg</guid>
      <description>&lt;h4&gt;
  
  
  Speech synthesis
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--lp7VES9b--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/0%2AUsIXmmxFHg5reGcE" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--lp7VES9b--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/0%2AUsIXmmxFHg5reGcE" alt=""&gt;&lt;/a&gt;Photo by &lt;a href="https://unsplash.com/@howardlawrenceb?utm_source=medium&amp;amp;utm_medium=referral"&gt;Howard Lawrence B&lt;/a&gt; on &lt;a href="https://unsplash.com?utm_source=medium&amp;amp;utm_medium=referral"&gt;Unsplash&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Speech synthesis is the artificial production of human speech. Text-to-Speech (TTS) converts text to a human-like voice. The goal of TTS is to render natural-sounding speech signals for downstream applications such as assistant devices (Google Assistant, Amazon Echo, Apple Siri). This story covers how we can generate a human-like voice. Concatenative TTS and parametric TTS are the traditional ways to generate audio, but they have limitations. Google released a generative model, WaveNet, which was a breakthrough in TTS: it can generate very good audio and overcomes the traditional methods' limitations.&lt;/p&gt;

&lt;p&gt;This story discusses &lt;a href="https://arxiv.org/pdf/1609.03499.pdf"&gt;WaveNet: A Generative Model for Raw Audio&lt;/a&gt; (van den Oord et al., 2016) and covers the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Text-to-Speech&lt;/li&gt;
&lt;li&gt;Technique of Classical Speech Synthesis&lt;/li&gt;
&lt;li&gt;WaveNet&lt;/li&gt;
&lt;li&gt;Experiment&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  Text-to-Speech (TTS)
&lt;/h3&gt;

&lt;p&gt;Technically, we can treat TTS as a sequence-to-sequence problem. It includes two major stages: text analysis and speech synthesis. Text analysis is quite similar to generic natural language processing (NLP) preprocessing (although we may not need heavy preprocessing when using a deep neural network), for example sentence segmentation, word segmentation and part-of-speech (POS) tagging. The output of the first stage is a grapheme-to-phoneme (G2P) conversion, which becomes the input of the second stage. In speech synthesis, the output of the first stage is used to generate a waveform.&lt;/p&gt;

&lt;h3&gt;
  
  
  Technique of Classical Speech Synthesis
&lt;/h3&gt;

&lt;p&gt;Concatenative TTS and parametric TTS are the traditional ways to generate audio from text. As the name suggests, concatenative TTS concatenates short clips to form speech. Because the short clips are recorded by humans, the quality is good and the voice is clear. However, the limitations are the huge human effort for recording, and the need to re-record whenever the transcript changes. Parametric TTS can generate voices easily, as it stores all the base information, such as fundamental frequency and magnitude spectrum. Because the voice is generated, it sounds more unnatural than concatenative TTS.&lt;/p&gt;

&lt;h3&gt;
  
  
  WaveNet
&lt;/h3&gt;

&lt;p&gt;WaveNet was introduced by van den Oord et al. It can generate audio from text and achieves very good results; you may not be able to distinguish the generated audio from a human voice. A dilated causal convolution architecture is leveraged to deal with long-range temporal dependencies, and a single model can generate multiple voices.&lt;/p&gt;

&lt;p&gt;It is based on the PixelCNN (van den Oord et al., 2016) architecture. Leveraging dilated causal convolutions increases the receptive field without greatly increasing the computational cost. A dilated convolution is similar to a normal convolution, but the filter is applied over an area larger than its length, so some input values are skipped. The effect is similar to a larger filter, but at a lower computational cost.&lt;/p&gt;

&lt;p&gt;From the following figure, you can see that the second layer (Hidden Layer, Dilation=2) combines the current input with the input two steps back, and the next layer (Hidden Layer, Dilation=4) reaches four steps back. During the experiment, van den Oord et al. doubled the dilation for every layer up to a limit and then repeated the pattern, so the dilation sequence is&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;1, 2, 4, …, 512, 1, 2, 4, …, 512, …&lt;/p&gt;
&lt;/blockquote&gt;
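&lt;p&gt;With kernel size 2 (an assumption matching the paper's figures), each causal layer widens the receptive field by its dilation, so the doubling pattern gives exponential receptive-field growth at a linear cost in layers:&lt;/p&gt;

```python
def receptive_field(dilations, kernel_size=2):
    # Each layer adds (kernel_size - 1) * dilation past samples.
    return 1 + sum((kernel_size - 1) * d for d in dilations)

one_block = [2 ** i for i in range(10)]  # dilations 1, 2, 4, ..., 512
print(receptive_field(one_block))        # 1024 samples per block
print(receptive_field(one_block * 3))    # 3070: repeated blocks stack linearly
```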

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--H65W-aVt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/675/1%2A8UnrsD-QFaKp7FO0fuKf_w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--H65W-aVt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/675/1%2A8UnrsD-QFaKp7FO0fuKf_w.png" alt=""&gt;&lt;/a&gt;Dilated Causal Convolutions (from van den Oord et al., 2016)&lt;/p&gt;

&lt;p&gt;The following animation shows dilated causal convolutions in operation: each layer's outputs become the next layer's inputs and are combined with earlier inputs to generate new outputs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--KFSV95vR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://cdn-images-1.medium.com/max/570/0%2ALgGxxpenJyzo4t4W.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KFSV95vR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://cdn-images-1.medium.com/max/570/0%2ALgGxxpenJyzo4t4W.gif" alt=""&gt;&lt;/a&gt;Dilated Causal Convolutions (from &lt;a href="https://deepmind.com/blog/wavenet-generative-model-raw-audio/"&gt;DeepMind&lt;/a&gt;)&lt;/p&gt;

&lt;h3&gt;
  
  
  Experiment
&lt;/h3&gt;

&lt;p&gt;van den Oord et al. conducted four experiments to validate the model. The first experiment is multi-speaker speech generation. Leveraging the CSTR Voice Cloning Toolkit dataset, the model can generate up to 109 speaker voices. More speaker training data leads to a better result, as WaveNet's internal representation is shared among speaker voices.&lt;/p&gt;

&lt;p&gt;The second experiment is TTS. van den Oord et al. used Google's North American English and Mandarin Chinese TTS systems as training data to compare different models. To make the comparison fair, the researchers used a hidden Markov model (HMM) and an LSTM-RNN-based statistical parametric model as baselines. The Mean Opinion Score (MOS), a five-point scale (1: Bad, 2: Poor, 3: Fair, 4: Good, 5: Excellent), is used to measure performance. As the following shows, although WaveNet's score is still lower than a natural human voice, it beats the baseline models by a large margin.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0qQTOdd---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/828/1%2Ahpc01ZLJPP1mzbeZSgAQng.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0qQTOdd---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/828/1%2Ahpc01ZLJPP1mzbeZSgAQng.png" alt=""&gt;&lt;/a&gt;Model performance comparison (van den Oord et al., 2016)&lt;/p&gt;

&lt;p&gt;The third and fourth experiments are music generation and speech recognition.&lt;/p&gt;

&lt;p&gt;The following video and figure show the latest Google DeepMind WaveNet performance.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/JjK8apEishQ"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--SVyMGvDw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/0%2APnJUIpTXlKV2hsk1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SVyMGvDw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/0%2APnJUIpTXlKV2hsk1.png" alt=""&gt;&lt;/a&gt;Comparison result among different model in US English and Mandarin Chinese (from &lt;a href="https://deepmind.com/blog/wavenet-generative-model-raw-audio/"&gt;DeepMind&lt;/a&gt;)&lt;/p&gt;

&lt;h3&gt;
  
  
  Take Away
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Google applied WaveNet to Google Assistant so that it can respond to voice commands without storing all of the audio, generating it in real time instead.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Like to learn?
&lt;/h3&gt;

&lt;p&gt;I am a Data Scientist in the Bay Area, focusing on the state of the art in Data Science and &lt;a href="https://becominghuman.ai/"&gt;Artificial Intelligence&lt;/a&gt;, especially NLP and platform-related work. Feel free to connect with &lt;a href="https://makcedward.github.io/"&gt;me&lt;/a&gt; on &lt;a href="https://www.linkedin.com/in/edwardma1026"&gt;LinkedIn&lt;/a&gt; or follow me on &lt;a href="http://medium.com/@makcedward/"&gt;Medium&lt;/a&gt; or &lt;a href="https://github.com/makcedward"&gt;Github&lt;/a&gt;. I offer brief advice on &lt;a href="https://becominghuman.ai/"&gt;machine learning&lt;/a&gt; problems or data science platforms for a small fee.&lt;/p&gt;

&lt;h3&gt;
  
  
  Extension Reading
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://deepmind.com/blog/wavenet-generative-model-raw-audio/"&gt;DeepMind’s WaveNet&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Reference
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, Koray Kavukcuoglu. &lt;a href="https://arxiv.org/pdf/1609.03499.pdf"&gt;WaveNet: A Generative Model for Raw Audio&lt;/a&gt;. 2016&lt;/li&gt;
&lt;li&gt;Aaron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, Koray Kavukcuoglu. &lt;a href="https://arxiv.org/pdf/1606.05328.pdf"&gt;Conditional Image Generation with PixelCNN Decoders&lt;/a&gt;. 2016&lt;/li&gt;
&lt;/ul&gt;





</description>
      <category>ai</category>
      <category>datascience</category>
      <category>texttospeech</category>
    </item>
    <item>
      <title>Multi-Task Learning for Sentence Embeddings</title>
      <dc:creator>Edward Ma</dc:creator>
      <pubDate>Mon, 10 Jun 2019 03:54:40 +0000</pubDate>
      <link>https://dev.to/makcedward/multi-task-learning-for-sentence-embeddings-3f4p</link>
      <guid>https://dev.to/makcedward/multi-task-learning-for-sentence-embeddings-3f4p</guid>
      <description>&lt;p&gt;Universal Sentence Encoder&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--WnUurJC---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/0%2AeaSQmfXUK5j4XWwN" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--WnUurJC---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/0%2AeaSQmfXUK5j4XWwN" alt=""&gt;&lt;/a&gt;“ Mount Fuji“ by &lt;a href="https://unsplash.com/@makcedward?utm_source=medium&amp;amp;utm_medium=referral"&gt;Edward Ma&lt;/a&gt; on &lt;a href="https://unsplash.com/photos/xFmcfCR9h20?utm_source=medium&amp;amp;utm_medium=referral"&gt;Unsplash&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Cer et al. demonstrated that transfer learning with sentence embeddings outperforms &lt;a href="https://towardsdatascience.com/3-silver-bullets-of-word-embedding-in-nlp-10fa8f50cc5a"&gt;word embeddings&lt;/a&gt;. The traditional way of building sentence embeddings is to average, sum or concatenate a set of word vectors. This loses a lot of information but is easy to compute. Cer et al. evaluated two well-known network architectures: a transformer-based model and a deep averaging network (DAN) based model.&lt;/p&gt;
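&lt;p&gt;The traditional baselines can be sketched in a few lines of NumPy; the random vectors below are stand-ins for real pre-trained word embeddings:&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in word embeddings (in practice: word2vec, GloVe, etc.)
vocab = {w: rng.standard_normal(4) for w in ["the", "cat", "sat"]}

tokens = ["the", "cat", "sat"]
vectors = np.stack([vocab[t] for t in tokens])

avg_embedding = vectors.mean(axis=0)    # fixed size, ignores word order
sum_embedding = vectors.sum(axis=0)     # fixed size, magnitude grows with length
concat_embedding = vectors.reshape(-1)  # keeps order, size varies with length
```

&lt;p&gt;Each variant discards information (order, length or both), which is why a learned encoder can do better.&lt;/p&gt;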

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--PTCh1cm9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/507/1%2AwgGU92OyaWrz6TBwBKclpg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--PTCh1cm9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/507/1%2AwgGU92OyaWrz6TBwBKclpg.png" alt=""&gt;&lt;/a&gt;Sentence similarity score (Cera et al., 2018)&lt;/p&gt;

&lt;p&gt;This story discusses &lt;a href="https://arxiv.org/pdf/1803.11175.pdf"&gt;Universal Sentence Encoder&lt;/a&gt; (Cer et al., 2018) and covers the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data&lt;/li&gt;
&lt;li&gt;Architecture&lt;/li&gt;
&lt;li&gt;Implementation&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Data
&lt;/h3&gt;

&lt;p&gt;As it is designed to support multiple downstream tasks, multi-task learning is adopted. Therefore, Cer et al. use multiple data sources to train the model, including movie reviews, customer reviews, sentiment classification, question classification, semantic textual similarity and Word Embedding Association Test (WEAT) data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Architecture
&lt;/h3&gt;

&lt;p&gt;Text is tokenized using the Penn Treebank (PTB) method and passed to either the transformer architecture or the deep averaging network. As both models are designed to be general purpose, a multi-task learning approach is adopted. The training objectives include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;As in &lt;a href="https://towardsdatascience.com/transforming-text-to-sentence-embeddings-layer-via-some-thoughts-b77bed60822c"&gt;Skip-thought&lt;/a&gt;, predicting the previous and next sentences given the current sentence.&lt;/li&gt;
&lt;li&gt;Conversational response suggestion for the inclusion of parsed conversational data.&lt;/li&gt;
&lt;li&gt;Classification tasks on supervised data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--lQDmyuBj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/proxy/1%2AjKZr6M_UOUizE3ej74ohwg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--lQDmyuBj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/proxy/1%2AjKZr6M_UOUizE3ej74ohwg.png" alt=""&gt;&lt;/a&gt;Predicting previous sentence and next sentence (Kiros et al., 2015)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/pdf/1706.03762.pdf"&gt;Transformer&lt;/a&gt; architecture is developed by Google in 2017. It leverages self attention with multi blocks to learn the context aware word representation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RG-P7Lb_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/438/1%2A42QeWPLtG-kwhblDbtGP4Q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RG-P7Lb_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/438/1%2A42QeWPLtG-kwhblDbtGP4Q.png" alt=""&gt;&lt;/a&gt;Transformer architecture (Vaswani et al,, 2017)&lt;/p&gt;

&lt;p&gt;The deep averaging network (DAN) averages embeddings (word and bi-gram) and feeds the result to a feedforward neural network.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Y8uv0Jg9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/539/1%2Av07lrQnceNCWXxyVx2yixg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Y8uv0Jg9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/539/1%2Av07lrQnceNCWXxyVx2yixg.png" alt=""&gt;&lt;/a&gt;DAN architecture (Ivver et al., 2015)&lt;/p&gt;

&lt;p&gt;The reason for introducing two models is that they address different concerns. The transformer architecture achieves better performance but needs more resources to train. Although the DAN does not perform as well as the transformer architecture, its advantage is that it is a simple model requiring fewer training resources.&lt;/p&gt;

&lt;h3&gt;
  
  
  Implementation
&lt;/h3&gt;

&lt;p&gt;To explore the Universal Sentence Encoder, you can simply follow the instructions on &lt;a href="https://tfhub.dev/google/universal-sentence-encoder/2"&gt;Tensorflow Hub&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Take Away
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Multi-task learning is important for learning text representations. Many modern NLP model architectures use multi-task learning rather than a single standalone data set.&lt;/li&gt;
&lt;li&gt;Rather than aggregating multiple word vectors to represent sentence embeddings, learning the sentence embedding from the word vectors achieves better results.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  About Me
&lt;/h3&gt;

&lt;p&gt;I am a Data Scientist in the Bay Area, focusing on the state of the art in Data Science and Artificial Intelligence, especially NLP and platform-related work. Feel free to connect with me on &lt;a href="https://www.linkedin.com/in/edwardma1026"&gt;LinkedIn&lt;/a&gt; or follow me on &lt;a href="http://medium.com/@makcedward/"&gt;Medium&lt;/a&gt; or &lt;a href="https://github.com/makcedward"&gt;Github&lt;/a&gt;. I offer brief advice on machine learning problems or data science platforms for a small fee.&lt;/p&gt;

&lt;h3&gt;
  
  
  Extension Reading
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://tfhub.dev/google/universal-sentence-encoder-large/3"&gt;Universal Sentence Encoder&lt;/a&gt; implementation&lt;/li&gt;
&lt;li&gt;&lt;a href="https://towardsdatascience.com/3-silver-bullets-of-word-embedding-in-nlp-10fa8f50cc5a"&gt;Word Embeddings&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://towardsdatascience.com/transforming-text-to-sentence-embeddings-layer-via-some-thoughts-b77bed60822c"&gt;Skip-though (Sentence Embeddings)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Reference
&lt;/h3&gt;

&lt;p&gt;D. Cer, Y. Yang, S. Y. Kong, N. Hua, N. Limtiaco, R. S. John, N. Constant, M. Guajardo-Cespedes, S. Yuan, C. Tar, Y. H. Sung, B. Strope and R. Kurzweil. &lt;a href="https://arxiv.org/pdf/1803.11175.pdf"&gt;Universal Sentence Encoder&lt;/a&gt;. 2018&lt;/p&gt;

&lt;p&gt;A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser and I. Polosukhin. &lt;a href="https://arxiv.org/pdf/1706.03762.pdf"&gt;Attention Is All You Need&lt;/a&gt;. 2017&lt;/p&gt;

&lt;p&gt;M. Iyyer, V. Manjunatha, J. Boyd-Graber and H. Daume III. &lt;a href="http://www.aclweb.org/anthology/P15-1162"&gt;Deep Unordered Composition Rivals Syntactic Methods for Text Classification&lt;/a&gt;. 2015&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>machinelearning</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Data Augmentation for Audio</title>
      <dc:creator>Edward Ma</dc:creator>
      <pubDate>Sat, 01 Jun 2019 12:58:39 +0000</pubDate>
      <link>https://dev.to/makcedward/data-augmentation-for-audio-5fii</link>
      <guid>https://dev.to/makcedward/data-augmentation-for-audio-5fii</guid>
      <description>&lt;h4&gt;
  
  
  Data Augmentation
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Y1Cawti---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/0%2AQsYCC8milwRRx3Tf" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Y1Cawti---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/0%2AQsYCC8milwRRx3Tf" alt=""&gt;&lt;/a&gt;Photo by &lt;a href="https://unsplash.com/@makcedward?utm_source=medium&amp;amp;utm_medium=referral"&gt;Edward Ma&lt;/a&gt; on &lt;a href="https://unsplash.com?utm_source=medium&amp;amp;utm_medium=referral"&gt;Unsplash&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Although tuning the model architecture and hyperparameters are successful factors in building a wonderful model, data scientists should also focus on the data. No matter how amazing a model you build: &lt;strong&gt;garbage in, garbage out (GIGO)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Intuitively, lack of data is one of the most common issues in actual data science problems. Data augmentation helps to generate synthetic data from an existing data set so that the generalisation capability of the model can be improved.&lt;/p&gt;

&lt;p&gt;In the previous &lt;a href="https://dev.to/makcedward/data-augmentation-for-speech-recognition-3279-temp-slug-8408894"&gt;story&lt;/a&gt;, we explained how to play with spectrograms. In this story, we will talk about basic augmentation methods for audio. This story and implementation are inspired by &lt;a href="https://www.kaggle.com/CVxTz/audio-data-augmentation"&gt;Kaggle’s Audio Data Augmentation Notebook&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Augmentation for Audio
&lt;/h3&gt;

&lt;p&gt;To generate synthetic data for audio, we can apply noise injection, time shifting, and pitch and speed changes. &lt;a href="https://github.com/numpy/numpy"&gt;numpy&lt;/a&gt; provides an easy way to handle noise injection and time shifting, while &lt;a href="https://github.com/librosa/librosa"&gt;librosa&lt;/a&gt; (library for Recognition and Organization of Speech and Audio) helps to manipulate pitch and speed with just one line of code.&lt;/p&gt;

&lt;h4&gt;
  
  
  Noise Injection
&lt;/h4&gt;

&lt;p&gt;It simply adds some random values to the data by using numpy.&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np

def manipulate(data, noise\_factor):
 noise = np.random.randn(len(data))
 augmented\_data = data + noise\_factor \* noise
 # Cast back to same data type
 augmented\_data = augmented\_data.astype(type(data[0]))
 return augmented\_data
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--6sjTXR_o--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/604/1%2A43FyaaFTlz6qqHcZBy8RAw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6sjTXR_o--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/604/1%2A43FyaaFTlz6qqHcZBy8RAw.png" alt=""&gt;&lt;/a&gt;Comparison between original and noise voice&lt;/p&gt;

&lt;h4&gt;
  
  
  Shifting Time
&lt;/h4&gt;

&lt;p&gt;The idea of time shifting is very simple. It just shifts the audio to the left or right by a random number of seconds. If the audio is shifted to the left (fast forward) by x seconds, the first x seconds are marked as 0 (i.e. silence). If the audio is shifted to the right (rewind) by x seconds, the last x seconds are marked as 0 (i.e. silence).&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np

def manipulate(data, sampling\_rate, shift\_max, shift\_direction):
 shift = np.random.randint(sampling\_rate \* shift\_max)
 if shift\_direction == 'right':
 shift = -shift
 elif self.shift\_direction == 'both':
 direction = np.random.randint(0, 2)
 if direction == 1:
 shift = -shift

augmented\_data = np.roll(data, shift)
 # Set to silence for heading/ tailing
 if shift \&amp;gt; 0:
 augmented\_data[:shift] = 0
 else:
 augmented\_data[shift:] = 0
 return augmented\_data
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5P3-d6Dm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/604/1%2AqAwU-YJYltf3nXkeF0qO8A.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5P3-d6Dm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/604/1%2AqAwU-YJYltf3nXkeF0qO8A.png" alt=""&gt;&lt;/a&gt;Comparison between original and shifted voice&lt;/p&gt;

&lt;h4&gt;
  
  
  Changing Pitch
&lt;/h4&gt;

&lt;p&gt;This augmentation is a wrapper of a &lt;a href="https://librosa.github.io/librosa/generated/librosa.effects.pitch_shift.html"&gt;librosa&lt;/a&gt; function. It changes the pitch randomly.&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import librosa

def manipulate(data, sampling\_rate, pitch\_factor):
 return librosa.effects.pitch\_shift(data, sampling\_rate, pitch\_factor)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--T66pGbc5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/604/1%2Aw5c8k9poONgevgYCvmQiuQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--T66pGbc5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/604/1%2Aw5c8k9poONgevgYCvmQiuQ.png" alt=""&gt;&lt;/a&gt;Comparison between original and changed pitch voice&lt;/p&gt;

&lt;h4&gt;
  
  
  Changing Speed
&lt;/h4&gt;

&lt;p&gt;Same as changing pitch, this augmentation is performed by a &lt;a href="https://librosa.github.io/librosa/generated/librosa.effects.time_stretch.html"&gt;librosa&lt;/a&gt; function. It stretches the time series by a fixed rate.&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import librosa

def manipulate(data, speed\_factor):
 return librosa.effects.time\_stretch(data, speed\_factor)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Qb3A0_VW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/604/1%2AbEEmdjVd_ddPuEKpD_dPeQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Qb3A0_VW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/604/1%2AbEEmdjVd_ddPuEKpD_dPeQ.png" alt=""&gt;&lt;/a&gt;Comparison between original and changed speed voice&lt;/p&gt;

&lt;h3&gt;
  
  
  Take Away
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The above 4 methods are implemented in the &lt;a href="https://github.com/makcedward/nlpaug"&gt;nlpaug&lt;/a&gt; package (≥ 0.0.3). You can generate augmented data within a few lines of code.&lt;/li&gt;
&lt;li&gt;Data augmentation cannot replace real training data. It just helps to generate synthetic data to make the model better.&lt;/li&gt;
&lt;li&gt;Do not blindly generate synthetic data. You have to understand your data pattern and select an appropriate way to increase the training data volume.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  About Me
&lt;/h3&gt;

&lt;p&gt;I am a Data Scientist in the Bay Area, focusing on the state of the art in Data Science and Artificial Intelligence, especially NLP and platform-related topics. Feel free to connect with &lt;a href="https://makcedward.github.io/"&gt;me&lt;/a&gt; on &lt;a href="https://www.linkedin.com/in/edwardma1026"&gt;LinkedIn&lt;/a&gt; or follow me on &lt;a href="https://github.com/makcedward"&gt;Github&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Extension Reading
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/CVxTz/audio-data-augmentation"&gt;Audio Data Augmentation in Kaggle competition&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://librosa.github.io/"&gt;librosa&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://towardsdatascience.com/data-augmentation-in-nlp-2801a34dfc28"&gt;Data Augmentation in NLP&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://towardsdatascience.com/data-augmentation-library-for-text-9661736b13ff"&gt;Data Augmentation for Text&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/makcedward/data-augmentation-for-speech-recognition-3279-temp-slug-8408894"&gt;Data Augmentation for Spectrogram&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




</description>
      <category>speechrecognition</category>
      <category>machinelearning</category>
      <category>ai</category>
      <category>datascience</category>
    </item>
    <item>
      <title>3 subword algorithms help to improve your NLP model performance</title>
      <dc:creator>Edward Ma</dc:creator>
      <pubDate>Sun, 19 May 2019 01:37:13 +0000</pubDate>
      <link>https://dev.to/makcedward/3-subword-algorithms-help-to-improve-your-nlp-model-performance-333n</link>
      <guid>https://dev.to/makcedward/3-subword-algorithms-help-to-improve-your-nlp-model-performance-333n</guid>
      <description>&lt;h4&gt;
  
  
  Introduction to subword
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yx3riG2Q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/0%2AOzrGnF_f8bj3nqAf" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yx3riG2Q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/0%2AOzrGnF_f8bj3nqAf" alt=""&gt;&lt;/a&gt;Photo by &lt;a href="https://unsplash.com/@makcedward?utm_source=medium&amp;amp;utm_medium=referral"&gt;Edward Ma&lt;/a&gt; on &lt;a href="https://unsplash.com?utm_source=medium&amp;amp;utm_medium=referral"&gt;Unsplash&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://towardsdatascience.com/3-silver-bullets-of-word-embedding-in-nlp-10fa8f50cc5a"&gt;Classic word representation&lt;/a&gt; cannot handle unseen word or rare word well. &lt;a href="https://towardsdatascience.com/besides-word-embedding-why-you-need-to-know-character-embedding-6096a34a3b10"&gt;Character embeddings&lt;/a&gt; is one of the solution to overcome out-of-vocabulary (OOV). However, it may too fine-grained any missing some important information. Subword is in between word and character. It is not too fine-grained while able to handle unseen word and rare word.&lt;/p&gt;

&lt;p&gt;For example, we can split “subword” into “sub” and “word”. In other words, we use two vectors (i.e. “sub” and “word”) to represent “subword”. You may argue that this takes more resources to compute, but the reality is that it has a smaller footprint compared to a word-level representation.&lt;/p&gt;

&lt;p&gt;This story will discuss about &lt;a href="https://arxiv.org/pdf/1808.06226.pdf"&gt;SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing&lt;/a&gt; (Kudo et al., 2018) and further discussing about different subword algorithms. The following are will be covered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Byte Pair Encoding (BPE)&lt;/li&gt;
&lt;li&gt;WordPiece&lt;/li&gt;
&lt;li&gt;Unigram Language Model&lt;/li&gt;
&lt;li&gt;SentencePiece&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;&lt;em&gt;Byte Pair Encoding (BPE)&lt;/em&gt;&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Sennrich et al. (2016) proposed to use Byte Pair Encoding (BPE) to build a subword dictionary. Radford et al. adopted BPE to construct subword vectors to build &lt;a href="https://towardsdatascience.com/too-powerful-nlp-model-generative-pre-training-2-4cc6afb6655"&gt;GPT-2&lt;/a&gt; in 2019.&lt;/p&gt;

&lt;h4&gt;
  
  
  Algorithm
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Prepare large enough training data (i.e. a corpus)&lt;/li&gt;
&lt;li&gt;Define a desired subword vocabulary size&lt;/li&gt;
&lt;li&gt;Split each word into a sequence of characters and append the suffix “&amp;lt;/w&amp;gt;” to the end of the word, together with the word frequency, so the basic unit at this stage is the character. For example, if the frequency of “low” is 5, we rephrase it to “l o w &amp;lt;/w&amp;gt;”: 5&lt;/li&gt;
&lt;li&gt;Generate a new subword from the highest-frequency pair of units.&lt;/li&gt;
&lt;li&gt;Repeat step 4 until reaching the subword vocabulary size defined in step 2 or until the next highest-frequency pair has a count of 1.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---hwO-9A7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/485/1%2A_bpIUb6YZr6DOMLAeSU2WA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---hwO-9A7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/485/1%2A_bpIUb6YZr6DOMLAeSU2WA.png" alt=""&gt;&lt;/a&gt;Algorithm of BPE (Sennrich et al., 2015)&lt;/p&gt;

&lt;h4&gt;
  
  
  Example
&lt;/h4&gt;

&lt;p&gt;Taking “low: 5”, “lower: 2”, “newest: 6” and “widest: 3” as an example, the highest-frequency subword pair is e and s, because we get 6 counts from newest and 3 counts from widest. A new subword (es) is then formed, and it becomes a candidate in the next iteration.&lt;/p&gt;

&lt;p&gt;In the second iteration, the next highest-frequency subword pair is es (generated in the previous iteration) and t, because again we get 6 counts from newest and 3 counts from widest.&lt;/p&gt;

&lt;p&gt;Keep iterating until the desired vocabulary size is reached or the next highest-frequency pair has a count of 1.&lt;/p&gt;
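The toy example above can be reproduced with a short Python sketch of the BPE merge loop. The frequencies are the ones from the example; the underscore is used here as the end-of-word marker.

```python
import collections

def get_stats(vocab):
    # Count the frequency of each adjacent symbol pair across the corpus.
    pairs = collections.defaultdict(int)
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_vocab(pair, vocab):
    # Replace every occurrence of the chosen pair with the merged symbol.
    merged = {}
    for word, freq in vocab.items():
        symbols = word.split()
        out, i = [], 0
        while i != len(symbols):
            if i + 1 != len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[' '.join(out)] = freq
    return merged

# "_" marks end-of-word; frequencies follow the example above.
vocab = {'l o w _': 5, 'l o w e r _': 2, 'n e w e s t _': 6, 'w i d e s t _': 3}
merges = []
for step in range(2):
    stats = get_stats(vocab)
    best = max(stats, key=stats.get)   # highest-frequency pair wins
    vocab = merge_vocab(best, vocab)
    merges.append(best)
# merges is now [('e', 's'), ('es', 't')], matching the two iterations above.
```

Running more iterations keeps growing the vocabulary until the stopping condition from step 5 of the algorithm is met.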

&lt;h3&gt;
  
  
  WordPiece
&lt;/h3&gt;

&lt;p&gt;WordPiece is another word segmentation algorithm, similar to BPE. Schuster and Nakajima introduced WordPiece while solving a Japanese and Korean voice search problem in 2012. The difference from BPE is that a new subword is formed by maximizing likelihood rather than by taking the next highest-frequency pair.&lt;/p&gt;

&lt;h4&gt;
  
  
  Algorithm
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Prepare large enough training data (i.e. a corpus)&lt;/li&gt;
&lt;li&gt;Define a desired subword vocabulary size&lt;/li&gt;
&lt;li&gt;Split each word into a sequence of characters&lt;/li&gt;
&lt;li&gt;Build a language model based on the data from step 3&lt;/li&gt;
&lt;li&gt;Choose the new word unit, out of all possible ones, that most increases the likelihood on the training data when added to the model.&lt;/li&gt;
&lt;li&gt;Repeat step 5 until reaching the subword vocabulary size defined in step 2 or until the likelihood increase falls below a certain threshold.&lt;/li&gt;
&lt;/ol&gt;
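A common way to approximate the WordPiece selection rule in step 5 is to score a candidate pair (a, b) by freq(ab) / (freq(a) * freq(b)), the likelihood gain of the merge under a unigram model. The sketch below applies this scoring to the same toy corpus as the BPE example; it is a simplified illustration, and real WordPiece implementations differ in details.

```python
import collections

def score_pairs(vocab):
    # Score adjacent pairs by freq(ab) / (freq(a) * freq(b)):
    # a rough likelihood-gain criterion, unlike BPE's raw pair frequency.
    pair_freq = collections.defaultdict(int)
    sym_freq = collections.defaultdict(int)
    for word, freq in vocab.items():
        symbols = word.split()
        for sym in symbols:
            sym_freq[sym] += freq
        for a, b in zip(symbols, symbols[1:]):
            pair_freq[(a, b)] += freq
    return {pair: freq / (sym_freq[pair[0]] * sym_freq[pair[1]])
            for pair, freq in pair_freq.items()}

vocab = {'l o w': 5, 'l o w e r': 2, 'n e w e s t': 6, 'w i d e s t': 3}
scores = score_pairs(vocab)
best = max(scores, key=scores.get)
# best is ('i', 'd'): rare symbols that almost always co-occur win,
# even though ('e', 's') has a higher raw count.
```

This is exactly the behavioural difference from BPE: frequency alone no longer decides the merge.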

&lt;h3&gt;
  
  
  Unigram Language Model
&lt;/h3&gt;

&lt;p&gt;Kudo introduced the unigram language model as another algorithm for subword segmentation. One of its assumptions is that all subword occurrences are independent, and a subword sequence is produced by the product of subword occurrence probabilities. Both WordPiece and the Unigram Language Model leverage a language model to build the subword vocabulary.&lt;/p&gt;

&lt;h4&gt;
  
  
  Algorithm
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Prepare large enough training data (i.e. a corpus)&lt;/li&gt;
&lt;li&gt;Define a desired subword vocabulary size&lt;/li&gt;
&lt;li&gt;Optimize the probability of word occurrence given the word sequence.&lt;/li&gt;
&lt;li&gt;Compute the loss of each subword&lt;/li&gt;
&lt;li&gt;Sort the symbols by loss and keep the top X% of words (e.g. X can be 80). To avoid out-of-vocabulary issues, character-level units are recommended to be included as a subset of the subwords.&lt;/li&gt;
&lt;li&gt;Repeat steps 3–5 until reaching the subword vocabulary size defined in step 2 or until there is no change in step 5.&lt;/li&gt;
&lt;/ol&gt;
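Once subword probabilities are available, segmenting a word means picking the subword sequence whose probability product is highest. Here is a minimal Viterbi-style sketch over an assumed toy vocabulary; the probabilities are made up for illustration, and single characters are kept in the vocabulary as the algorithm recommends.

```python
import math

# Assumed toy unigram probabilities; characters stay in the vocabulary
# so that every string remains segmentable.
logp = {'sub': math.log(0.1), 'word': math.log(0.1), 'subword': math.log(0.002)}
for ch in 'subword':
    logp.setdefault(ch, math.log(0.001))

def segment(text):
    # best[i] = best log-probability of text[:i]; back[i] = split point.
    n = len(text)
    best = [0.0] + [-math.inf] * n
    back = [0] * (n + 1)
    for end in range(1, n + 1):
        candidates = [(best[start] + logp[text[start:end]], start)
                      for start in range(end) if text[start:end] in logp]
        if candidates:
            best[end], back[end] = max(candidates)
    pieces, i = [], n
    while i:
        pieces.append(text[back[i]:i])
        i = back[i]
    return list(reversed(pieces))

# 'sub' + 'word' beats keeping 'subword' whole or splitting into characters.
pieces = segment('subword')
```

In the real algorithm, the loss of a subword in step 4 is how much the total corpus log-likelihood drops when that subword is removed and its occurrences are re-segmented this way.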

&lt;h3&gt;
  
  
  SentencePiece
&lt;/h3&gt;

&lt;p&gt;So, is there any existing library we can leverage for our text processing? Kudo and Richardson implemented the &lt;a href="https://github.com/google/sentencepiece"&gt;SentencePiece&lt;/a&gt; library. You have to train your tokenizer on your own data so that you can encode and decode your data for downstream tasks.&lt;/p&gt;

&lt;p&gt;First of all, prepare a plain text file containing your data and then trigger the following API to train the model:&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import sentencepiece as spm
spm.SentencePieceTrainer.Train('--input=test/botchan.txt --model_prefix=m --vocab_size=1000')
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;It is super fast, and you can load the model with:&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sp = spm.SentencePieceProcessor()
sp.Load("m.model")
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;To encode your text, you just need to&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sp.EncodeAsIds("This is a test")
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;For more examples and usages, you can access this &lt;a href="https://github.com/google/sentencepiece/blob/master/python/README.md"&gt;repo&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Take Away
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Subword balances vocabulary size and footprint. The extreme case is using only 26 tokens (i.e. characters) to represent all English words. 16k or 32k subwords is a recommended vocabulary size for good results.&lt;/li&gt;
&lt;li&gt;Words in many Asian languages cannot be separated by spaces, so the initial vocabulary is much larger than for English. You may need to prepare over 10k initial words to kick-start the word segmentation. In their research, Schuster and Nakajima propose to use 22k words and 11k words for Japanese and Korean respectively.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Like to learn?
&lt;/h3&gt;

&lt;p&gt;I am a Data Scientist in the Bay Area, focusing on the state of the art in Data Science and Artificial Intelligence, especially NLP and platform-related topics. Feel free to connect with &lt;a href="https://makcedward.github.io/"&gt;me&lt;/a&gt; on &lt;a href="https://www.linkedin.com/in/edwardma1026"&gt;LinkedIn&lt;/a&gt; or follow me on &lt;a href="http://medium.com/@makcedward/"&gt;Medium&lt;/a&gt; or &lt;a href="https://github.com/makcedward"&gt;Github&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Extension Reading
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://towardsdatascience.com/3-silver-bullets-of-word-embedding-in-nlp-10fa8f50cc5a"&gt;Classic word representation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://towardsdatascience.com/besides-word-embedding-why-you-need-to-know-character-embedding-6096a34a3b10"&gt;Character embeddings&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://towardsdatascience.com/too-powerful-nlp-model-generative-pre-training-2-4cc6afb6655"&gt;Too powerful NLP model (GPT-2)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/google/sentencepiece"&gt;SentencePiece GIT repo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Reference
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;T. Kudo and J. Richardson. &lt;a href="https://arxiv.org/pdf/1808.06226.pdf"&gt;SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing&lt;/a&gt;. 2018&lt;/li&gt;
&lt;li&gt;R. Sennrich, B. Haddow and A. Birch. &lt;a href="http://aclweb.org/anthology/P16-1162"&gt;Neural Machine Translation of Rare Words with Subword Units&lt;/a&gt;. 2015&lt;/li&gt;
&lt;li&gt;M. Schuster and K. Nakajima. &lt;a href="https://storage.googleapis.com/pub-tools-public-publication-data/pdf/37842.pdf"&gt;Japanese and Korean Voice Search&lt;/a&gt;. 2012&lt;/li&gt;
&lt;li&gt;Taku Kudo. &lt;a href="https://arxiv.org/pdf/1804.10959.pdf"&gt;Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates&lt;/a&gt;. 2018&lt;/li&gt;
&lt;/ul&gt;




</description>
      <category>datascience</category>
      <category>naturallanguageproce</category>
      <category>tokenization</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>How do they apply BERT in the clinical domain?</title>
      <dc:creator>Edward Ma</dc:creator>
      <pubDate>Mon, 06 May 2019 13:31:03 +0000</pubDate>
      <link>https://dev.to/makcedward/how-do-they-apply-bert-in-the-clinical-domain-1ipd</link>
      <guid>https://dev.to/makcedward/how-do-they-apply-bert-in-the-clinical-domain-1ipd</guid>
      <description>&lt;h4&gt;
  
  
  BERT in clinical domain
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--BLtLvP2_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/0%2AYtLXZbfYDflzxBJw" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--BLtLvP2_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/0%2AYtLXZbfYDflzxBJw" alt=""&gt;&lt;/a&gt;Photo by &lt;a href="https://unsplash.com/@makcedward?utm_source=medium&amp;amp;utm_medium=referral"&gt;Edward Ma&lt;/a&gt; on &lt;a href="https://unsplash.com?utm_source=medium&amp;amp;utm_medium=referral"&gt;Unsplash&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Contextual word embeddings have been proven to dramatically improve NLP model performance via &lt;a href="https://towardsdatascience.com/elmo-helps-to-further-improve-your-word-embeddings-c6ed2c9df95f"&gt;ELMo&lt;/a&gt; (Peters et al., 2018), &lt;a href="https://towardsdatascience.com/how-bert-leverage-attention-mechanism-and-transformer-to-learn-word-contextual-relations-5bbee1b6dbdb"&gt;BERT&lt;/a&gt; (Devlin et al., 2018) and &lt;a href="https://towardsdatascience.com/too-powerful-nlp-model-generative-pre-training-2-4cc6afb6655"&gt;GPT-2&lt;/a&gt; (Radford et al., 2019). Much research aims to fine-tune the BERT model on domain-specific data. &lt;a href="https://towardsdatascience.com/how-to-apply-bert-in-scientific-domain-2d9db0480bd9"&gt;BioBERT and SciBERT&lt;/a&gt; were introduced last time. I would like to continue on this topic, as two more studies fine-tune the BERT model and apply it in the clinical domain.&lt;/p&gt;

&lt;p&gt;This story will discuss about &lt;a href="https://arxiv.org/pdf/1904.03323.pdf"&gt;Publicly Available Clinical BERT Embeddings&lt;/a&gt; (Alsentzer et al., 2019) and &lt;a href="https://arxiv.org/pdf/1904.05342.pdf"&gt;ClinicalBert: Modeling Clinical Notes and Predicting Hospital Readmission&lt;/a&gt; (Huang et al., 2019) while it will go through BERT detail but focusing how researchers applying it in clinical domain. In case, you want to understand more about BERT, you may visit this &lt;a href="https://towardsdatascience.com/how-bert-leverage-attention-mechanism-and-transformer-to-learn-word-contextual-relations-5bbee1b6dbdb"&gt;story&lt;/a&gt;.The following are will be covered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Building clinical specific BERT resource&lt;/li&gt;
&lt;li&gt;Application for ClinicalBERT&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Building clinical specific BERT resource
&lt;/h3&gt;

&lt;p&gt;Alsentzer et al. applied 2 million notes from the &lt;a href="https://mimic.physionet.org/gettingstarted/dbsetup/"&gt;MIMIC-III v1.4 database&lt;/a&gt; (Johnson et al., 2016). There are 15 note types in total, and Alsentzer et al. aggregate them into either a non-Discharge Summary type or a Discharge Summary type. The discharge summary data is designed for downstream task training/fine-tuning.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bJ1DWGiI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/439/1%2A9V7c5J8UR-N5sLTLQ7rTYA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bJ1DWGiI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/439/1%2A9V7c5J8UR-N5sLTLQ7rTYA.png" alt=""&gt;&lt;/a&gt;Distribution of note type MIMIC-III v1.4 (Alsentzer et al., 2019)&lt;/p&gt;

&lt;p&gt;Given those data, &lt;a href="https://arxiv.org/pdf/1902.07669.pdf"&gt;ScispaCy&lt;/a&gt; is leveraged to tokenize articles into sentences. Those sentences are passed to BERT-Base (the original BERT base model) and &lt;a href="https://github.com/naver/biobert-pretrained"&gt;BioBERT&lt;/a&gt; respectively for additional pre-training.&lt;/p&gt;

&lt;p&gt;Clinical BERT is built on BERT-Base, while Clinical BioBERT is based on BioBERT. Once the contextual word embeddings are trained, a single linear-layer classification model is trained for tackling named-entity recognition (NER), de-identification (de-ID) or sentiment classification tasks.&lt;/p&gt;

&lt;p&gt;These models achieve a better result on MedNLI compared to the original BERT model. Meanwhile, you may notice that there is no improvement for i2b2 2006 and i2b2 2014, which are de-ID tasks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9BE4SR7d--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/904/1%2AAKULrfnaGFRyqtBfEOCeOA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9BE4SR7d--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/904/1%2AAKULrfnaGFRyqtBfEOCeOA.png" alt=""&gt;&lt;/a&gt;Performance comparison among different models in MedNLI and i2b2 data set (Alsentzer et al., 2019))&lt;/p&gt;

&lt;h3&gt;
  
  
  Application for ClinicalBERT
&lt;/h3&gt;

&lt;p&gt;At the same time, Huang et al. also focus on clinical notes. However, the major objective of Huang et al.’s research is building a prediction model by leveraging a good clinical text representation. Huang et al. note that a lower readmission rate is good for patients, for example by saving money.&lt;/p&gt;

&lt;p&gt;Same as Alsentzer et al., the MIMIC-III dataset (Johnson et al., 2016) is used for evaluation. Following standard BERT practice, contextual word embeddings are trained by masked token prediction and next sentence prediction. In short, masked token prediction masks a token randomly and uses the surrounding words to predict the masked token. Next sentence prediction is a binary classifier; the output of this model classifies whether the second sentence is the next sentence of the first sentence or not.&lt;/p&gt;
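The masked-token idea can be sketched in a few lines. This simplified version just replaces a random subset of tokens with a [MASK] symbol and records the answers as training targets; real BERT additionally applies an 80/10/10 replacement scheme, which is omitted here.

```python
import random

def mask_tokens(tokens, mask_ratio=0.15, seed=42):
    # Pick roughly mask_ratio of the positions (at least one) and hide them.
    rng = random.Random(seed)
    k = max(1, round(len(tokens) * mask_ratio))
    positions = set(rng.sample(range(len(tokens)), k))
    masked = ['[MASK]' if i in positions else tok
              for i, tok in enumerate(tokens)]
    labels = {i: tokens[i] for i in positions}  # targets the model must predict
    return masked, labels

note = 'patient was discharged home in stable condition'.split()
masked, labels = mask_tokens(note)
```

The model is then trained to recover each entry of labels from the masked sequence, which is what forces it to learn clinical context.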

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Pcmej0ip--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/662/1%2Au2H0R8HmpuyXFBGHWJW4yQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Pcmej0ip--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/662/1%2Au2H0R8HmpuyXFBGHWJW4yQ.png" alt=""&gt;&lt;/a&gt;Training tasks of ClincialBERT (Huang et al., 2019)&lt;/p&gt;

&lt;p&gt;After having pre-trained contextual word embeddings, a fine-tuning process is applied to readmission prediction. It is a binary classification model that predicts whether a patient needs to be readmitted within the next 30 days.&lt;/p&gt;

&lt;p&gt;One of the BERT model’s limitations is that the maximum token length is 512. A long clinical note is split into multiple parts, and each part is predicted separately. Once all sub-parts are predicted, a final probability is aggregated. Due to concerns about using the maximum or the mean alone, Huang et al. combine both of them for a more accurate result.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--iJJPeaQp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/429/1%2AfArd8urwJFW3V4wv9_LVMw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--iJJPeaQp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/429/1%2AfArd8urwJFW3V4wv9_LVMw.png" alt=""&gt;&lt;/a&gt;Scalable radmission prediction formula (Huang et al., 2019)&lt;/p&gt;

&lt;p&gt;Finally, the experiment results demonstrate that a fine-tuned ClinicalBERT is better than classical models.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_ZieH_qz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/804/1%2Av2ZzgyAvrHHBUNVIeQ1GiQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_ZieH_qz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/804/1%2Av2ZzgyAvrHHBUNVIeQ1GiQ.png" alt=""&gt;&lt;/a&gt;Performance comparison among models (Huang et al., 2019)&lt;/p&gt;

&lt;h3&gt;
  
  
  Take Away
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Alsentzer et al. use a single-layer classification model to evaluate results. It may be a good start, with the expectation that the BERT model is able to learn the content, but evaluating other advanced model architectures may provide a more comprehensive experiment result.&lt;/li&gt;
&lt;li&gt;For long clinical notes, Huang et al. use a mathematical trick to work around the length limit. It may not be able to capture the content of very long clinical notes, so a better way to tackle long inputs may need further thought.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Like to learn?
&lt;/h3&gt;

&lt;p&gt;I am a Data Scientist in the Bay Area, focusing on the state of the art in Data Science and Artificial Intelligence, especially NLP and platform-related topics. Feel free to connect with &lt;a href="https://makcedward.github.io/"&gt;me&lt;/a&gt; on &lt;a href="https://www.linkedin.com/in/edwardma1026"&gt;LinkedIn&lt;/a&gt; or follow me on &lt;a href="http://medium.com/@makcedward/"&gt;Medium&lt;/a&gt; or &lt;a href="https://github.com/makcedward"&gt;Github&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Extension Reading
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/EmilyAlsentzer/clinicalBERT"&gt;Clinical BERT Embeddings GIT repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/kexinhuang12345/clinicalBERT"&gt;ClinicalBERT&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://mimic.physionet.org/gettingstarted/dbsetup/"&gt;MIMIC-III v1.4 database&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://towardsdatascience.com/how-to-apply-bert-in-scientific-domain-2d9db0480bd9"&gt;BioBERT and SciBERT&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Reference
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;E. Alsentzer, J. R. Murphy, W. Boag, W. H. Weng, D. Jin, T. Naumann and M. B. A. McDermott. &lt;a href="https://arxiv.org/pdf/1904.03323.pdf"&gt;Publicly Available Clinical BERT Embeddings&lt;/a&gt;. 2019&lt;/li&gt;
&lt;li&gt;K. Huang, J. Altosaar and R. Ranganath. &lt;a href="https://arxiv.org/pdf/1904.05342.pdf"&gt;ClinicalBert: Modeling Clinical Notes and Predicting Hospital Readmission&lt;/a&gt;. 2019&lt;/li&gt;
&lt;/ul&gt;




</description>
      <category>nlp</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Data Augmentation for Speech Recognition</title>
      <dc:creator>Edward Ma</dc:creator>
      <pubDate>Wed, 01 May 2019 13:33:53 +0000</pubDate>
      <link>https://dev.to/makcedward/data-augmentation-for-speech-recognition-bfc</link>
      <guid>https://dev.to/makcedward/data-augmentation-for-speech-recognition-bfc</guid>
      <description>&lt;h4&gt;
  
  
  Automatic Speech Recognition (ASR)
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2AiD9ioERFLy_BIX4f" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2AiD9ioERFLy_BIX4f"&gt;&lt;/a&gt;Photo by &lt;a href="https://unsplash.com/@makcedward?utm_source=medium&amp;amp;utm_medium=referral" rel="noopener noreferrer"&gt;Edward Ma&lt;/a&gt; on &lt;a href="https://unsplash.com?utm_source=medium&amp;amp;utm_medium=referral" rel="noopener noreferrer"&gt;Unsplash&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The objective of speech recognition is to convert audio to text, and the technology is applied widely in our lives. &lt;a href="https://en.wikipedia.org/wiki/Google_Assistant" rel="noopener noreferrer"&gt;Google Assistant&lt;/a&gt; and &lt;a href="https://en.wikipedia.org/wiki/Amazon_Alexa" rel="noopener noreferrer"&gt;Amazon Alexa&lt;/a&gt; are examples that take our voice as input and convert it to text to understand our intention.&lt;/p&gt;

&lt;p&gt;As with other NLP problems, one critical challenge is the lack of an adequate volume of training data, which leads to overfitting and difficulty handling unseen data. The &lt;a href="https://ai.google/research/teams/brain/" rel="noopener noreferrer"&gt;Google Brain&lt;/a&gt; team, together with an AI Resident, tackled this problem by introducing several data augmentation methods for speech recognition. This story discusses &lt;a href="https://arxiv.org/pdf/1904.08779.pdf" rel="noopener noreferrer"&gt;SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition&lt;/a&gt; (Park et al., 2019), and the following will be covered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data&lt;/li&gt;
&lt;li&gt;Architecture&lt;/li&gt;
&lt;li&gt;Experiment&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Data
&lt;/h3&gt;

&lt;p&gt;To process the data, the waveform audio is converted to a spectrogram, which is fed to the neural network to generate output. Traditionally, data augmentation is applied to the waveform; Park et al. take another approach and manipulate the spectrogram instead.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F640%2F0%2ADMFTlxD5-3rhmcks.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F640%2F0%2ADMFTlxD5-3rhmcks.png"&gt;&lt;/a&gt;Waveform audio to spectrogram (&lt;a href="https://ai.googleblog.com/2019/04/specaugment-new-data-augmentation.html" rel="noopener noreferrer"&gt;Google Brain&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;Given a spectrogram, you can view it as an image whose x-axis is time and whose y-axis is frequency.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F0%2Ad3wpu740AzHNtsof.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F0%2Ad3wpu740AzHNtsof.png"&gt;&lt;/a&gt;Spectrogram representation (&lt;a href="https://librosa.github.io/librosa/generated/librosa.feature.melspectrogram.html" rel="noopener noreferrer"&gt;librosa&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;Intuitively, this improves training speed: the augmentation is applied directly to the spectrogram, so no extra waveform-to-spectrogram transformation is needed for each augmented sample.&lt;/p&gt;

&lt;p&gt;Park et al. introduced SpecAugment for data augmentation in speech recognition. There are three basic operations: time warping, frequency masking and time masking. In their experiments, they combine these operations into four policies: LibriSpeech basic (LB), LibriSpeech double (LD), Switchboard mild (SM) and Switchboard strong (SS).&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;&lt;em&gt;Time Warping&lt;/em&gt;&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;A random point along the time axis is selected and warped either left or right by a distance w, chosen from a uniform distribution from 0 to the time warp parameter W.&lt;/p&gt;

&lt;h4&gt;
  
  
  Frequency Masking
&lt;/h4&gt;

&lt;p&gt;f consecutive frequency channels [f0, f0 + f) are masked. f is chosen from a uniform distribution from 0 to the frequency mask parameter F, and f0 is chosen from [0, ν − f), where ν is the number of frequency channels.&lt;/p&gt;

&lt;h4&gt;
  
  
  Time Masking
&lt;/h4&gt;

&lt;p&gt;t consecutive time steps [t0, t0 + t) are masked. t is chosen from a uniform distribution from 0 to the time mask parameter T, and t0 is chosen from [0, τ − t), where τ is the number of time steps.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F504%2F1%2Axq6oahbJzFI9HdwY1fdWJw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F504%2F1%2Axq6oahbJzFI9HdwY1fdWJw.png"&gt;&lt;/a&gt;From top to bottom, the figures depict the log mel spectrogram of the base input with no augmentation, time warp, frequency masking and time masking applied. (Park et al., 2019)&lt;/p&gt;
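&lt;p&gt;The two masking operations above can be sketched in a few lines of NumPy. The function names are assumptions of this sketch, as is masking with zero (masking with the mean value of the spectrogram would work the same way).&lt;/p&gt;

```python
import numpy as np

# Sketch of SpecAugment's masking operations on a spectrogram of shape
# (num_freq_channels, num_time_steps). Parameter names F and T follow
# the paper; the implementation itself is a plain NumPy illustration.

def frequency_mask(spec, F, rng):
    """Mask f consecutive frequency channels, f ~ U(0, F)."""
    v = spec.shape[0]                  # number of frequency channels
    f = rng.integers(0, F + 1)
    f0 = rng.integers(0, v - f + 1)
    out = spec.copy()
    out[f0:f0 + f, :] = 0.0
    return out

def time_mask(spec, T, rng):
    """Mask t consecutive time steps, t ~ U(0, T)."""
    tau = spec.shape[1]                # number of time steps
    t = rng.integers(0, T + 1)
    t0 = rng.integers(0, tau - t + 1)
    out = spec.copy()
    out[:, t0:t0 + t] = 0.0
    return out
```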

&lt;h4&gt;
  
  
  Combination of basic augmentation policy
&lt;/h4&gt;

&lt;p&gt;By combining the frequency masking and time masking policies, four new augmentation policies are introduced. The symbols denote:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;W: Time Warping Parameter&lt;/li&gt;
&lt;li&gt;F: Frequency Masking Parameter&lt;/li&gt;
&lt;li&gt;mF: Number of frequency masks applied&lt;/li&gt;
&lt;li&gt;T: Time Masking Parameter&lt;/li&gt;
&lt;li&gt;mT: Number of time masks applied&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F521%2F1%2AxfOBzbams8Z27HLVvTRvFA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F521%2F1%2AxfOBzbams8Z27HLVvTRvFA.png"&gt;&lt;/a&gt;Configuration for LB, LD, SM and SS (Park et al., 2019)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F540%2F1%2APFcBVKIa4zWtcnwBjvy2NA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F540%2F1%2APFcBVKIa4zWtcnwBjvy2NA.png"&gt;&lt;/a&gt;From top to bottom, the figures depict the log mel spectrogram of the base input with policies None, LB and LD applied. (Park et al., 2019)&lt;/p&gt;

&lt;h3&gt;
  
  
  Architecture
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Listen, Attend and Spell (LAS) Network Architecture
&lt;/h4&gt;

&lt;p&gt;Park et al. use the LAS network architecture to compare performance with and without data augmentation. It includes 2 layers of Convolutional Neural Network (CNN), attention and stacked bi-directional LSTMs. Since the focus of the paper is data augmentation and the model only serves to measure its impact, you can deep dive into LAS from &lt;a href="https://arxiv.org/pdf/1508.01211.pdf" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Learning Rate Schedules
&lt;/h4&gt;

&lt;p&gt;The learning rate schedule turns out to be a critical factor for model performance. Similar to &lt;a href="https://towardsdatascience.com/multi-task-learning-in-language-model-for-text-classification-c3acc1fedd89" rel="noopener noreferrer"&gt;Slanted triangular learning rates (STLR)&lt;/a&gt;, a non-static learning rate is applied: it ramps up, decays exponentially until it reaches 1/100 of its maximum value, and is then kept constant beyond that point. The parameters are denoted:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;sr: Step at which the ramp-up (from zero learning rate) is complete&lt;/li&gt;
&lt;li&gt;si: Step at which the exponential decay starts&lt;/li&gt;
&lt;li&gt;sf: Step at which the exponential decay stops&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Uniform label smoothing is also applied during training: the correct class label is assigned confidence 0.9, while the remaining probability mass is distributed uniformly over the other labels. One more parameter is denoted:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;snoise: Step at which variational weight noise is introduced&lt;/li&gt;
&lt;/ul&gt;
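&lt;p&gt;Uniform label smoothing as described above is a one-liner; this sketch builds the smoothed target distribution (the function name is an assumption of this sketch).&lt;/p&gt;

```python
# Sketch of uniform label smoothing: the correct class receives
# confidence 0.9, and the remaining 0.1 probability mass is spread
# uniformly over the other classes.

def smooth_labels(correct_idx, num_classes, confidence=0.9):
    rest = (1.0 - confidence) / (num_classes - 1)
    return [confidence if i == correct_idx else rest
            for i in range(num_classes)]
```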

&lt;p&gt;In the later experiments, three standard learning rate schedules are defined:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;B(asic): (sr, snoise, si, sf ) = (0.5k, 10k, 20k, 80k)&lt;/li&gt;
&lt;li&gt;D(ouble): (sr, snoise, si, sf ) = (1k, 20k, 40k, 160k)&lt;/li&gt;
&lt;li&gt;L(ong): (sr, snoise, si, sf ) = (1k, 20k, 140k, 320k)&lt;/li&gt;
&lt;/ol&gt;
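&lt;p&gt;The ramp-up / hold / decay behaviour described above can be sketched as follows. The linear ramp-up and the exact interpolation of the exponential decay are assumptions of this sketch; only the end points (peak value, decay to 1/100 of the peak at sf, constant afterwards) come from the text.&lt;/p&gt;

```python
import math

# Sketch of the schedule: linear warm-up to the peak until step sr,
# constant until si, exponential decay that reaches peak/100 at step sf,
# then constant at peak/100.

def lr_at(step, peak, sr, si, sf):
    if step < sr:                      # linear ramp-up from zero
        return peak * step / sr
    if step < si:                      # hold at the peak
        return peak
    if step < sf:                      # exponential decay toward peak/100
        frac = (step - si) / (sf - si)
        return peak * math.exp(math.log(0.01) * frac)
    return peak * 0.01                 # constant beyond sf
```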

&lt;h4&gt;
  
  
  Language Models (LM)
&lt;/h4&gt;

&lt;p&gt;An LM is applied to further boost model performance. In general, an LM is designed to predict the next token given the sequence of previous tokens; once a new token is predicted, it is treated as a "previous token" when predicting the following ones. This approach is applied in many modern NLP models such as &lt;a href="https://towardsdatascience.com/too-powerful-nlp-model-generative-pre-training-2-4cc6afb6655" rel="noopener noreferrer"&gt;GPT-2&lt;/a&gt;, while &lt;a href="https://towardsdatascience.com/how-bert-leverage-attention-mechanism-and-transformer-to-learn-word-contextual-relations-5bbee1b6dbdb" rel="noopener noreferrer"&gt;BERT&lt;/a&gt; trains on a masked variant of the task.&lt;/p&gt;

&lt;h3&gt;
  
  
  Experiment
&lt;/h3&gt;

&lt;p&gt;Model performance is measured by &lt;a href="https://en.wikipedia.org/wiki/Word_error_rate" rel="noopener noreferrer"&gt;Word Error Rate&lt;/a&gt; (WER).&lt;/p&gt;
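&lt;p&gt;WER is the word-level edit distance (substitutions, insertions and deletions) between the hypothesis and the reference, divided by the number of reference words. A minimal sketch:&lt;/p&gt;

```python
# Sketch of Word Error Rate: Levenshtein distance over words divided
# by the number of reference words.

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                   # delete all of ref[:i]
    for j in range(len(hyp) + 1):
        dp[0][j] = j                   # insert all of hyp[:j]
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)
```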

&lt;p&gt;In the figure below, "Sch" denotes the learning rate schedule while "Pol" denotes the augmentation policy. We can see that LAS with 6 LSTM layers and a 1280-dimensional embedding achieves the best result.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F490%2F1%2AgAVZttVQlypTtT6Cbncqzw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F490%2F1%2AgAVZttVQlypTtT6Cbncqzw.png"&gt;&lt;/a&gt;Evaluation of LibriSpeech (Park et al., 2019)&lt;/p&gt;

&lt;p&gt;LAS-6–1280 with SpecAugment achieves the best result compared both to other models and to LAS without data augmentation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F491%2F1%2A74sHvUtwGvuN04nCZ9tfRg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F491%2F1%2A74sHvUtwGvuN04nCZ9tfRg.png"&gt;&lt;/a&gt;Comparing SpecAugment method in LibriSpeech 960h (Park et al., 2019)&lt;/p&gt;

&lt;p&gt;On Switchboard 300h, LAS-4–1024 is used as the benchmark. We can see that SpecAugment further boosts model performance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F485%2F1%2ATeKA8tmEsyNxBu-E0m9FbA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F485%2F1%2ATeKA8tmEsyNxBu-E0m9FbA.png"&gt;&lt;/a&gt;Comparing SpecAugment method in Switchboard 300h (Park et al., 2019)&lt;/p&gt;

&lt;h3&gt;
  
  
  Take Away
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Time warping did not improve model performance much; if resources are limited, this operation can be discarded.&lt;/li&gt;
&lt;li&gt;Label smoothing can make training unstable.&lt;/li&gt;
&lt;li&gt;Data augmentation converts an over-fitting problem into an under-fitting problem. From the figures below, you can see that the model without augmentation (None) performs nearly perfectly on the training set but shows no comparable result on the other datasets.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F445%2F1%2A210d_HNZ72-BAA0VUmCxyA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F445%2F1%2A210d_HNZ72-BAA0VUmCxyA.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;To facilitate data augmentation for speech recognition, &lt;a href="https://github.com/makcedward/nlpaug" rel="noopener noreferrer"&gt;nlpaug&lt;/a&gt; supports SpecAugment methods now.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  About Me
&lt;/h3&gt;

&lt;p&gt;I am a Data Scientist in the Bay Area, focusing on the state of the art in Data Science and Artificial Intelligence, especially NLP and platforms. Feel free to connect with &lt;a href="https://makcedward.github.io/" rel="noopener noreferrer"&gt;me&lt;/a&gt; on &lt;a href="https://www.linkedin.com/in/edwardma1026" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or follow me on &lt;a href="http://medium.com/@makcedward/" rel="noopener noreferrer"&gt;Medium&lt;/a&gt; or &lt;a href="https://github.com/makcedward" rel="noopener noreferrer"&gt;Github&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Extension Reading
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://towardsdatascience.com/data-augmentation-in-nlp-2801a34dfc28" rel="noopener noreferrer"&gt;Data Augmentation in NLP&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://towardsdatascience.com/data-augmentation-library-for-text-9661736b13ff" rel="noopener noreferrer"&gt;Data Augmentation for Text&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://towardsdatascience.com/data-augmentation-for-audio-76912b01fdf6" rel="noopener noreferrer"&gt;Data Augmentation for Audio&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ai.googleblog.com/2019/04/specaugment-new-data-augmentation.html" rel="noopener noreferrer"&gt;Official release of SpecAugment from Google&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://towardsdatascience.com/multi-task-learning-in-language-model-for-text-classification-c3acc1fedd89" rel="noopener noreferrer"&gt;Slanted triangular learning rates (STLR)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://towardsdatascience.com/how-bert-leverage-attention-mechanism-and-transformer-to-learn-word-contextual-relations-5bbee1b6dbdb" rel="noopener noreferrer"&gt;Bidirectional Encoder Representations from Transformers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://towardsdatascience.com/too-powerful-nlp-model-generative-pre-training-2-4cc6afb6655" rel="noopener noreferrer"&gt;Generative Pre-Training 2&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Reference
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;D. S. Park, W. Chan, Y. Zhang, C. C. Chiu, B. Zoph, E. D. Cubuk and Q. V. Le. &lt;a href="https://arxiv.org/pdf/1904.08779.pdf" rel="noopener noreferrer"&gt;SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition&lt;/a&gt;. 2019&lt;/li&gt;
&lt;li&gt;W. Chan, N. Jaitly, Q. V. Le and O. Vinyals. &lt;a href="https://arxiv.org/pdf/1508.01211.pdf" rel="noopener noreferrer"&gt;Listen, Attend and Spell&lt;/a&gt;. 2015&lt;/li&gt;
&lt;/ul&gt;




</description>
      <category>datascience</category>
      <category>speechrecognition</category>
      <category>ai</category>
      <category>nlp</category>
    </item>
  </channel>
</rss>
