<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jesús Seijas</title>
    <description>The latest articles on DEV Community by Jesús Seijas (@jesusseijassp).</description>
    <link>https://dev.to/jesusseijassp</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F473908%2Fc6fae116-0433-4496-b4c4-96dd3f2d5e4a.jpeg</url>
      <title>DEV Community: Jesús Seijas</title>
      <link>https://dev.to/jesusseijassp</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jesusseijassp"/>
    <language>en</language>
    <item>
      <title>Getting started with NLP.js</title>
      <dc:creator>Jesús Seijas</dc:creator>
      <pubDate>Wed, 23 Sep 2020 13:19:14 +0000</pubDate>
      <link>https://dev.to/jesusseijassp/getting-started-with-nlp-js-4l1p</link>
      <guid>https://dev.to/jesusseijassp/getting-started-with-nlp-js-4l1p</guid>
      <description>&lt;p&gt;Ever wanted to build a chatbot and encountered some blockers along the way relating to data privacy or supported languages? Do you wish to reduce chatbot response time or run them without an active data connection? &lt;/p&gt;

&lt;p&gt;If that’s the case or if you’re just curious and want to learn more, give &lt;a href="https://github.com/axa-group/nlp.js" rel="noopener noreferrer"&gt;NLP.js&lt;/a&gt; a try.&lt;/p&gt;

&lt;h2&gt;
  
  
  Natural Language Processing &amp;amp; NLP.js
&lt;/h2&gt;

&lt;p&gt;Natural Language Processing or NLP is a field combining linguistics and computing, as well as artificial intelligence. Correctly understanding natural language is critical for virtual assistants, chatbots, voice assistants, and a wide range of applications based on a voice or text interface with a machine.&lt;br&gt;
These applications typically include a Natural Language Processor whose purpose is to extract the interactions and intention, as well as related information and metadata, from a piece of plain natural language and translate them into something a machine can process. &lt;/p&gt;

&lt;p&gt;NLP.js is an open-source set of more than 70 libraries that can run fully on-premise and covers the three main areas of NLP: natural language understanding, language generation, and named entity recognition. Its key differentiator is a better user experience through improved response time, additional language support and, &lt;a href="https://github.com/axa-group/nlp.js/blob/master/docs/v3/benchmarking.md" rel="noopener noreferrer"&gt;according to some benchmarks&lt;/a&gt;, improved accuracy, while giving you more control over data privacy and security.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why have an NLP library?
&lt;/h2&gt;

&lt;p&gt;With existing NLPs, it isn’t easy to understand how each sentence is processed or why a specific output results. This black-box effect, where you can’t see why the chatbot answered in a specific way or dig into the source of a problem, frustrates chatbot managers. &lt;br&gt;
Having the NLP as an open-source library provides more visibility into and understanding of the low-level natural language processing. It enables technical people to better understand how the conversation is processed and to manage language-specific strategies that achieve the expected accuracy level. Even though a specific strategy per country isn’t mandatory, it’s highly recommended when you target high-performance chatbots in languages other than the most commonly used ones.&lt;/p&gt;

&lt;h2&gt;
  
  
  The main features of NLP.js
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Language support
&lt;/h3&gt;

&lt;p&gt;NLP.js supports up to 104 different languages with the use of &lt;a href="https://github.com/google-research/bert" rel="noopener noreferrer"&gt;BERT embeddings&lt;/a&gt;. Without BERT, it natively supports 41 languages.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Stemmers
&lt;/h3&gt;

&lt;p&gt;NLP.js implements stemmers to both improve accuracy and require fewer training utterances to achieve the same result. It drastically reduces the manpower and computing power needed to train the NLP.&lt;/p&gt;

&lt;p&gt;Stemmers are algorithms used to calculate the stem (root) of words. For example, words such as &lt;strong&gt;‘developed’&lt;/strong&gt;, &lt;strong&gt;‘developer’&lt;/strong&gt;, &lt;strong&gt;‘developing’&lt;/strong&gt;, &lt;strong&gt;‘development’&lt;/strong&gt;, and &lt;strong&gt;‘developers’&lt;/strong&gt;, are all classified as having the same stem - &lt;strong&gt;‘develop’&lt;/strong&gt;. This is important because when preparing sentences to be trained or classified by an NLP, we usually tend to split those sentences into features. Some NLPs use a tokenizer to divide them into words, but the problem with this approach is that you may need to train the NLP with more sentences to include the different inflections of the language.&lt;/p&gt;

&lt;p&gt;Consider the example where you train the NLP with the sentence &lt;strong&gt;‘who’s your developer?’&lt;/strong&gt; with the word &lt;strong&gt;‘developer’&lt;/strong&gt; as the intent, and then, someone asks the question: &lt;strong&gt;‘who developed you?’&lt;/strong&gt;. Without a stemmer, the words &lt;strong&gt;‘developer’&lt;/strong&gt; and &lt;strong&gt;‘developed’&lt;/strong&gt; wouldn't be recognized as being similar, as they aren't identified with the same token. This issue is even more pronounced in highly inflected languages like Spanish or Indonesian, where the same word can be inflected to indicate gender or, in the case of verbs, tense, mood, and person for example.&lt;/p&gt;
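
&lt;p&gt;To make the idea concrete, here is a minimal sketch of a suffix-stripping stemmer in plain JavaScript. It is a toy written for this example, not the stemmer NLP.js ships: the suffix list and the &lt;code&gt;naiveStem&lt;/code&gt; and &lt;code&gt;stemTokens&lt;/code&gt; helpers are hypothetical, and real stemmers apply far more careful, language-specific rules.&lt;/p&gt;

```javascript
// Toy suffix-stripping stemmer: an illustration only, NOT the algorithm NLP.js uses.
// The suffix list is a hypothetical, hand-picked set for this example.
const SUFFIXES = ['ments', 'ment', 'ers', 'ing', 'ed', 'er', 's'];

function naiveStem(word) {
  const lower = word.toLowerCase();
  // Longer suffixes are listed first; keep at least three characters of stem.
  const suffix = SUFFIXES.find(
    (s) => (lower.endsWith(s) ? lower.length - s.length >= 3 : false)
  );
  return suffix ? lower.slice(0, lower.length - suffix.length) : lower;
}

// Split a sentence into lowercase word tokens, then stem each token.
function stemTokens(sentence) {
  const tokens = sentence.toLowerCase().match(/[a-z]+/g) || [];
  return tokens.map(naiveStem);
}

const trained = stemTokens("who's your developer?");
const asked = stemTokens('who developed you?');

// Features shared by both sentences after stemming.
const shared = asked.filter((token) => trained.includes(token));
console.log(shared);
```

&lt;p&gt;With plain word tokens, 'developer' and 'developed' would be two unrelated features. After stemming, both sentences share the feature 'develop', which is what lets a classifier relate them.&lt;/p&gt;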

&lt;h3&gt;
  
  
  3.  Open questions
&lt;/h3&gt;

&lt;p&gt;As a result of the integration with BERT, you can have open questions over texts using NLP.js. This means that instead of training the NLP with sentences and intents, you only have to provide a text to BERT and you could then ask any question over the text. The NLP.js BERT integration makes it possible to have an unsupervised classification where you don’t have to provide the intents.&lt;/p&gt;

&lt;p&gt;Below, you can see an example where the text provided to the chatbot is information about Harry Potter, with some open questions subsequently asked over text:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fvf9syf85l1bw5s3pxxbh.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fvf9syf85l1bw5s3pxxbh.gif" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Entity extraction
&lt;/h3&gt;

&lt;p&gt;NLP.js enables entity extraction at several levels. &lt;a href="https://github.com/axa-group/nlp.js/tree/master/examples/06-huge-ner" rel="noopener noreferrer"&gt;It includes an optimized named entity extraction that can search and compare millions of possibilities in milliseconds.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It also has golden entity extraction to identify numbers, emails, phone numbers, measures, URLs, currency, etc. Identifying a number is quite simple when the figure is written in digits, such as ‘541’, but it isn’t so obvious that ‘five hundred and forty-one’ corresponds to the same number. Recognizing currencies and measurements written out in words &lt;a href="https://github.com/axa-group/nlp.js/blob/master/docs/v4/language-support.md" rel="noopener noreferrer"&gt;is possible for up to 44 languages in NLP.js&lt;/a&gt;.&lt;/p&gt;
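
&lt;p&gt;As a rough illustration of why written-out numbers need dedicated handling, the sketch below parses English number words in plain JavaScript. It is a toy for this article, not the extractor NLP.js uses: &lt;code&gt;wordsToNumber&lt;/code&gt; is a hypothetical helper, it only handles English, and it only covers values below one million.&lt;/p&gt;

```javascript
// Toy parser for English number words: an illustration only,
// NOT the extractor NLP.js uses. Handles values below one million.
const UNITS = new Map(Object.entries({
  zero: 0, one: 1, two: 2, three: 3, four: 4, five: 5, six: 6, seven: 7,
  eight: 8, nine: 9, ten: 10, eleven: 11, twelve: 12, thirteen: 13,
  fourteen: 14, fifteen: 15, sixteen: 16, seventeen: 17, eighteen: 18,
  nineteen: 19, twenty: 20, thirty: 30, forty: 40, fifty: 50, sixty: 60,
  seventy: 70, eighty: 80, ninety: 90,
}));

function wordsToNumber(text) {
  // Split on whitespace and hyphens; drop empty pieces and the connective "and".
  const words = text
    .toLowerCase()
    .split(/[\s-]+/)
    .filter((word) => !['', 'and'].includes(word));

  let total = 0;   // completed thousands
  let current = 0; // group currently being built (0 to 999)
  for (const word of words) {
    if (UNITS.has(word)) {
      current += UNITS.get(word);
    } else if (word === 'hundred') {
      current = (current || 1) * 100; // a bare "hundred" counts as 100
    } else if (word === 'thousand') {
      total += (current || 1) * 1000; // a bare "thousand" counts as 1000
      current = 0;
    } else {
      throw new Error(`Unknown number word: ${word}`);
    }
  }
  return total + current;
}

console.log(wordsToNumber('five hundred and forty-one')); // 541
```

&lt;p&gt;Even this small sketch needs a vocabulary table and grouping rules for a single language, which hints at the work involved in supporting dozens of them.&lt;/p&gt;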

&lt;h2&gt;
  
  
  NLP.js helps to optimize the user experience
&lt;/h2&gt;

&lt;p&gt;Data privacy, security, and response time are key pillars for improving user experience and the overall conversational system.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data privacy
&lt;/h3&gt;

&lt;p&gt;Most of the NLP market leaders are cloud-based solutions, meaning that all the data is being processed in the cloud and, in some cases, managed outside of the target customer platform. In principle, cloud data processing isn’t a big issue when aiming to meet the data privacy needs and requirements of most countries. However, it can still be a showstopper in certain regions, such as Germany, Singapore, or Turkey…&lt;/p&gt;

&lt;h3&gt;
  
  
  Security
&lt;/h3&gt;

&lt;p&gt;Packaging the NLP as a library allows the overall solution to be deployed fully on-premise if required. Furthermore, NLP.js can be executed directly on a smartphone without needing a data connection. With the current trends of globalization and making everything more and more connected, it's important to keep the door open to fully on-premise solutions to maintain control over data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Response time
&lt;/h3&gt;

&lt;p&gt;Removing the need for cloud connectivity brings a significant improvement in latency and performance, even though any API call still has some inherent latency. That remaining latency can be avoided by embedding NLP.js as a library. In terms of benchmarking, this faster performance would mark a significant difference from other solutions on the market.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running NLP.js locally (example)
&lt;/h2&gt;

&lt;p&gt;First, you'll need Node.js installed on your computer. If you don’t have it yet, you can get it &lt;a href="https://nodejs.org/en/download/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Then, create a folder for your project, init a new node project and install these NLP.js dependencies: &lt;code&gt;basic&lt;/code&gt;, &lt;code&gt;express-api-server&lt;/code&gt; and &lt;code&gt;directline-connector&lt;/code&gt;. &lt;code&gt;basic&lt;/code&gt; installs the packages needed to run NLP.js, &lt;code&gt;express-api-server&lt;/code&gt; provides an API server using &lt;code&gt;express&lt;/code&gt; and the frontend for the chatbot, and &lt;code&gt;directline-connector&lt;/code&gt; provides an API for the chatbot like the Microsoft Directline one.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="nb"&gt;mkdir &lt;/span&gt;chatbot
&lt;span class="nb"&gt;cd &lt;/span&gt;chatbot
npm init
npm i @nlpjs/basic @nlpjs/express-api-server @nlpjs/directline-connector


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Now you'll need a corpus: the knowledge data for your chatbot, organized into intents, with the sentences to train and the answers for each intent. You can access an example corpus in English &lt;a href="https://github.com/axa-group/nlp.js/blob/master/examples/03-qna-pipelines/corpus.json" rel="noopener noreferrer"&gt;here&lt;/a&gt; or as the &lt;a href="https://raw.githubusercontent.com/axa-group/nlp.js/master/examples/03-qna-pipelines/corpus.json" rel="noopener noreferrer"&gt;raw file&lt;/a&gt;. Download it and put it inside your project folder.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

curl &lt;span class="nt"&gt;-O&lt;/span&gt; https://raw.githubusercontent.com/axa-group/nlp.js/master/examples/03-qna-pipelines/corpus.json


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
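
&lt;p&gt;If you'd rather start from a tiny corpus of your own, the sketch below shows the general shape. It assumes the field names used by the example corpus (&lt;code&gt;name&lt;/code&gt;, &lt;code&gt;locale&lt;/code&gt;, &lt;code&gt;data&lt;/code&gt;, &lt;code&gt;intent&lt;/code&gt;, &lt;code&gt;utterances&lt;/code&gt;, &lt;code&gt;answers&lt;/code&gt;), so check them against the downloaded file. The intents and sentences here are made up for illustration.&lt;/p&gt;

```javascript
// Sketch of a tiny corpus. Field names are assumed to match the example
// corpus.json; the intents, utterances and answers are invented.
const corpus = {
  name: 'Tiny corpus',
  locale: 'en-US',
  data: [
    {
      intent: 'greetings.hello',
      utterances: ['hello', 'hi there', 'good morning'],
      answers: ['Hello!', 'Hi, how can I help?'],
    },
    {
      intent: 'agent.developer',
      utterances: ["who's your developer?", 'who built you'],
      answers: ['I was built with NLP.js.'],
    },
  ],
};

// Basic sanity check before training: every intent needs utterances.
const empty = corpus.data.filter((item) => item.utterances.length === 0);

// Print the JSON you would save as corpus.json.
console.log(JSON.stringify(corpus, null, 2));
console.log(`Intents without utterances: ${empty.length}`);
```

&lt;p&gt;Saving the printed JSON as &lt;em&gt;corpus.json&lt;/em&gt; in the project folder would give you a file to experiment with in place of the downloaded one.&lt;/p&gt;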

&lt;p&gt;Create a file called &lt;em&gt;conf.json&lt;/em&gt;. This is the configuration file that tells NLP.js which plugins to include and how to configure each one. Put the following content in the &lt;em&gt;conf.json&lt;/em&gt; file to run this example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"settings"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"nlp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"corpora"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"./corpus.json"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"api-server"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"port"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"serveBot"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"use"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Basic"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ExpressApiServer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DirectlineConnector"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;


&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;use&lt;/code&gt; part lists the names of the plugins to include, and the &lt;code&gt;settings&lt;/code&gt; part holds the configuration of each plugin. In this case we're telling the NLP to load the corpora, here the &lt;strong&gt;corpus.json&lt;/strong&gt; file we downloaded before. We're also telling the API server to start on port 3000, and we set &lt;code&gt;serveBot&lt;/code&gt; to true as we want the frontend of the bot to be served automatically.&lt;/p&gt;

&lt;p&gt;Now that we have the configuration, let’s create an &lt;strong&gt;index.js&lt;/strong&gt; file with the code to get it running:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;dockStart&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@nlpjs/basic&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;dock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;dockStart&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;nlp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;dock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;nlp&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;nlp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;train&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;})();&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With &lt;code&gt;const dock = await dockStart()&lt;/code&gt; we're telling NLP.js to initialize, load the &lt;strong&gt;conf.json&lt;/strong&gt; file, load the plugins it defines and start them with the defined configuration. It returns a dock instance that holds a container with all the loaded plugins. Then &lt;code&gt;const nlp = dock.get('nlp')&lt;/code&gt; is where we retrieve the NLP plugin from the dock container. This NLP instance already contains the corpus that we defined in the configuration, but it isn’t trained yet, so we have to train it with &lt;code&gt;await nlp.train()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;And that's everything we need. We can now start the application:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

node &lt;span class="nb"&gt;.&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;And navigate to &lt;a href="http://localhost:3000" rel="noopener noreferrer"&gt;http://localhost:3000&lt;/a&gt; to see the webchat and talk with the chatbot.&lt;/p&gt;

&lt;h2&gt;
  
  
  Online demo
&lt;/h2&gt;

&lt;p&gt;If you prefer to play with an online demo, you can 'Remix' the code on &lt;a href="https://glitch.com/edit/#!/remix/nlpjs" rel="noopener noreferrer"&gt;Glitch&lt;/a&gt;, meaning you’ll be able to run the demo, as well as make your modifications to the code and play with it. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://glitch.com/edit/#!/remix/nlpjs" rel="noopener noreferrer"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fxa2quuf0xyuj8n139a6t.png" alt="Remix on Glitch"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fnc5kcsfcd9a7pe007iok.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fnc5kcsfcd9a7pe007iok.gif" alt="Remix Demo"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For more information, you can access the &lt;a href="https://github.com/axa-group/nlp.js/blob/master/docs/v4/quickstart.md" rel="noopener noreferrer"&gt;full tutorial&lt;/a&gt; and some additional &lt;a href="https://github.com/jesus-seijas-sp/nlpjs-examples/tree/master/01.quickstart" rel="noopener noreferrer"&gt;code snippets&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The value of open source
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://tom.preston-werner.com/2011/11/22/open-source-everything.html" rel="noopener noreferrer"&gt; According to &lt;em&gt;Tom Preston-Werner&lt;/em&gt; - cofounder of GitHub&lt;/a&gt;: &lt;em&gt;"Smart people like to hang out with other smart people. Smart developers like to hang out with smart code. When you open source useful code, you attract talent".&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In our ambition to become a tech-led company, sharing relevant open-source projects and libraries is an excellent way to showcase our technology to the world, extend our collaboration beyond our company walls, and expand our ways of connecting with additional talent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NLP.js&lt;/strong&gt; is an excellent candidate for AXA’s open-source program. It doesn't contain anything specific to the AXA core business, it’s generic enough and easy to reuse, and we believe it provides a perfect opportunity to engage with and contribute back to the open source community.&lt;/p&gt;

&lt;p&gt;Among other uses and publications, it has already been used at the &lt;a href="https://conversations2019.files.wordpress.com/2020/01/conversations_2019_paper_8-preprint.pdf" rel="noopener noreferrer"&gt;University of Goettingen&lt;/a&gt; and presented at the &lt;a href="https://youtu.be/f9G4D_916nY?t=4318" rel="noopener noreferrer"&gt;Colombia 4.0 AI conference in 2019&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to learn more about AXA’s open source program and technology, please contact: &lt;em&gt;&lt;a href="mailto:opensource@axa.com"&gt;opensource@axa.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>chatbots</category>
      <category>node</category>
      <category>nlp</category>
    </item>
  </channel>
</rss>
