<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: AcademicAI</title>
    <description>The latest articles on DEV Community by AcademicAI (@academicai).</description>
    <link>https://dev.to/academicai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F4380%2F8d944024-6f5a-4288-b7ae-4802006673cc.png</url>
      <title>DEV Community: AcademicAI</title>
      <link>https://dev.to/academicai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/academicai"/>
    <language>en</language>
    <item>
      <title>🌳 Utilizando Neural edit-tree lemmatization no português</title>
      <dc:creator>Jessica Cardoso</dc:creator>
      <pubDate>Sat, 26 Mar 2022 23:31:54 +0000</pubDate>
      <link>https://dev.to/academicai/utilizando-neural-edit-tree-lemmatization-para-o-portugues-nme</link>
      <guid>https://dev.to/academicai/utilizando-neural-edit-tree-lemmatization-para-o-portugues-nme</guid>
      <description>&lt;p&gt;Os desenvolvedores do spaCy lançaram um modelo de lematização experimental que conseguiu aumentar a acurácia de lematização do português de 0.75 para 0.97. As versões tradicionais dos lematizadores do spaCy são baseadas em &lt;em&gt;lookup tables&lt;/em&gt; e &lt;em&gt;rule sets&lt;/em&gt; &lt;a href="https://explosion.ai/blog/edit-tree-lemmatizer"&gt;[1]&lt;/a&gt;. Os autores disponibilizaram um template de projeto do &lt;em&gt;spaCy&lt;/em&gt; para facilitar a edição dos parâmetros e configurações de treinamento.&lt;/p&gt;

&lt;p&gt;Abordamos aqui duas formas de utilizar o edit tree lemmatizer: Por meio do modelo pré-treinado &lt;em&gt;transformer&lt;/em&gt; e treinando a pipeline do zero com componentes específicos.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Abordagem &lt;em&gt;Transformer&lt;/em&gt;
&lt;/h2&gt;

&lt;p&gt;A primeira abordagem já tem tudo pronto para utilização. A desvantagem dela é que o o modelo transformer ocupa bastante espaço na memória, e o tempo de inferência na cpu pode ser bastante lento, dependendo dos requisitos.&lt;/p&gt;

&lt;p&gt;Para utilizar basta seguir os seguintes comandos:&lt;/p&gt;

&lt;p&gt;Instalar as bibliotecas necessárias com o modelo&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;https://huggingface.co/explosion/pt_udv25_portuguesebosque_trf/resolve/main/pt_udv25_portuguesebosque_trf-any-py3-none-any.whl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Agora é só carregar o modelo e utilizar&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;spacy&lt;/span&gt;
&lt;span class="n"&gt;nlp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;spacy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pt_udv25_portuguesebosque_trf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# processa o texto
&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;nlp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meu gato não é como os outros e não come ração&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;original:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lematizado:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lemma_&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Como ainda está em uma versão experimental, alguns erros podem acontecer. Em um deles é necessário fixar a versão thinc com o comando&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;thinc&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="mf"&gt;8.0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  2. Treinamento da pipeline
&lt;/h2&gt;

&lt;p&gt;Nessa parte do tutorial, faremos uso do template do &lt;em&gt;edit_tree_lemmatizer&lt;/em&gt; que está disponível &lt;a href="https://github.com/explosion/spacy-experimental"&gt;aqui&lt;/a&gt;. No exemplo, foi configurado para o alemão, nós modificaremos para treinar um modelo em português.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.1 Pré requisito
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Ambiente virtual do python&lt;/li&gt;
&lt;li&gt;Biblioteca do &lt;a href="https://spacy.io/usage"&gt;spaCy&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2.2 Configurando projeto
&lt;/h3&gt;

&lt;p&gt;Primeiro começamos trazendo o projeto do &lt;em&gt;Github&lt;/em&gt; para a nossa máquina através do comando &lt;a href="https://spacy.io/api/cli#project-clone"&gt;&lt;code&gt;spacy project clone&lt;/code&gt;&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; spacy project clone projects/edit_tree_lemmatizer &lt;span class="nt"&gt;--repo&lt;/span&gt; https://github.com/explosion/spacy-experimental
&lt;span class="nb"&gt;cd &lt;/span&gt;edit_tree_lemmatizer
python &lt;span class="nt"&gt;-m&lt;/span&gt; pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Vamos mudar o &lt;em&gt;dataset&lt;/em&gt;  utilizado de &lt;em&gt;Dutch Alpino treebank&lt;/em&gt; para um &lt;em&gt;treebank&lt;/em&gt; em nosso idioma. Para isso editamos o arquivo project.yml.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;vars&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;lang&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pt"&lt;/span&gt;
  &lt;span class="na"&gt;treebank&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;UD_Portuguese-Bosque"&lt;/span&gt;
  &lt;span class="na"&gt;train_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pt_bosque-ud-train"&lt;/span&gt;
  &lt;span class="na"&gt;dev_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pt_bosque-ud-dev"&lt;/span&gt;
  &lt;span class="na"&gt;test_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pt_bosque-ud-test"&lt;/span&gt;
  &lt;span class="na"&gt;gpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;-1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Além de alterar essa parte do arquivo, também precisamos adicionar o parâmetro &lt;code&gt;--merge-subtokens&lt;/code&gt; ao final do comando &lt;a href="https://spacy.io/api/cli#convert"&gt;&lt;code&gt;spacy convert&lt;/code&gt;&lt;/a&gt; na seção &lt;code&gt;preprocess&lt;/code&gt; do nosso &lt;em&gt;yaml&lt;/em&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;commands&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;preprocess"&lt;/span&gt;
    &lt;span class="na"&gt;help&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Convert&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;spaCy&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;format"&lt;/span&gt;
    &lt;span class="na"&gt;script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mkdir&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-p&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;corpus/${vars.treebank}"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-m&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;spacy&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;convert&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;assets/${vars.treebank}/${vars.train_name}.conllu&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;corpus/${vars.treebank}/&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;--n-sents&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;10&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;--merge-subtokens"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mv&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;corpus/${vars.treebank}/${vars.train_name}.spacy&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;corpus/${vars.treebank}/train.spacy"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-m&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;spacy&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;convert&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;assets/${vars.treebank}/${vars.dev_name}.conllu&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;corpus/${vars.treebank}/&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;--n-sents&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;10&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;--merge-subtokens"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mv&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;corpus/${vars.treebank}/${vars.dev_name}.spacy&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;corpus/${vars.treebank}/dev.spacy"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-m&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;spacy&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;convert&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;assets/${vars.treebank}/${vars.test_name}.conllu&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;corpus/${vars.treebank}/&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;--n-sents&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;10&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;--merge-subtokens"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mv&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;corpus/${vars.treebank}/${vars.test_name}.spacy&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;corpus/${vars.treebank}/test.spacy"&lt;/span&gt;
    &lt;span class="na"&gt;deps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assets/${vars.treebank}/"&lt;/span&gt;
    &lt;span class="na"&gt;outputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;corpus/${vars.treebank}/train.spacy"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;corpus/${vars.treebank}/dev.spacy"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;corpus/${vars.treebank}/test.spacy"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Feito isso, vamos baixar o &lt;em&gt;dataset&lt;/em&gt; definido acima, converter no formato do spaCy e criar o arquivo de configurações.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; spacy project assets
python &lt;span class="nt"&gt;-m&lt;/span&gt; spacy project run preprocess
python &lt;span class="nt"&gt;-m&lt;/span&gt; spacy project run create-config
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Os dados do repositório UD_Portuguese-Bosque estão na pasta &lt;em&gt;assets&lt;/em&gt;, já o arquivo de configuração usado no treinamento está na pasta &lt;em&gt;config&lt;/em&gt;. Nesse tutorial, não modificamos o &lt;em&gt;config.cfg&lt;/em&gt;, mas é possível ajustar os parâmetros do treinamento.&lt;/p&gt;

&lt;p&gt;Treinamos o modelo com utilizando a configuração padrão através do comando abaixo.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; spacy project run train
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Para avaliar o modelo, apenas trocamos a palavra &lt;em&gt;train&lt;/em&gt; por &lt;em&gt;evaluate&lt;/em&gt;,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; spacy project run evaluate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Conseguimos alcançar na avaliação com a configuração padrão, as seguintes métricas:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;TOK     99.98
LEMMA   96.20
SPEED   21155
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2.3 Usando o modelo treinado
&lt;/h3&gt;

&lt;p&gt;O modelo treinado é gerado na pasta &lt;em&gt;training&lt;/em&gt; e pode ser carregado em uma &lt;em&gt;pipeline&lt;/em&gt; do &lt;em&gt;spaCy&lt;/em&gt;, para isso basta apenas carregar o modelo treinado no &lt;em&gt;script&lt;/em&gt; a seguir.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;spacy&lt;/span&gt;

&lt;span class="c1"&gt;# cria o objeto nlp ao carregar o lematizador
&lt;/span&gt;&lt;span class="n"&gt;nlp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;spacy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;edit_tree_lemmatizer/training/UD_Portuguese-Bosque/model-best&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# processa o texto
&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;nlp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meu gato não é como os outros e não come ração&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;original:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lematizado:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lemma_&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nesse segundo caso, não editamos os valores no &lt;em&gt;config.cfg&lt;/em&gt;, então a acurácia será menor que o informado no blog explosion e o modelo com &lt;em&gt;transformers&lt;/em&gt;. &lt;/p&gt;

</description>
      <category>spacy</category>
      <category>português</category>
      <category>lematizador</category>
      <category>nlp</category>
    </item>
  </channel>
</rss>
