<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Condé Nast Italy</title>
    <description>The latest articles on DEV Community by Condé Nast Italy (@condenastitaly).</description>
    <link>https://dev.to/condenastitaly</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F420455%2F572c6877-6639-4610-8fb0-ba7c23c46b23.png</url>
      <title>DEV Community: Condé Nast Italy</title>
      <link>https://dev.to/condenastitaly</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/condenastitaly"/>
    <language>en</language>
    <item>
      <title>When Food meets AI: the Smart Recipe Project</title>
      <dc:creator>Condé Nast Italy</dc:creator>
      <pubDate>Fri, 07 Aug 2020 10:05:49 +0000</pubDate>
      <link>https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-b3g</link>
      <guid>https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-b3g</guid>
      <description>&lt;h1&gt;
  
  
  Part 3. FoodGraph: Loading data and Querying the graph with SPARQL
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vEn8HCUn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/12kth3n5kah1hgrmy0hl.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vEn8HCUn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/12kth3n5kah1hgrmy0hl.jpg" alt="Alt Text" width="880" height="550"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Did you ever try a Maritozzo?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In the previous post, we converted the recipe data, stored in JSON files, into RDF triples. In this post, we show you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how we loaded this data on &lt;a href="https://aws.amazon.com/neptune/"&gt;Amazon Neptune&lt;/a&gt;;&lt;/li&gt;
&lt;li&gt;how we integrated the output of the extractor and classifier systems in FoodGraph;&lt;/li&gt;
&lt;li&gt;how we can query the graph to extract useful and connected information.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To query the graph, we use &lt;a href="https://www.w3.org/TR/rdf-sparql-query/"&gt;SPARQL&lt;/a&gt;. SPARQL is an RDF query language, namely a semantic query language for databases, able to retrieve and manipulate data stored or viewed in the RDF format.&lt;/p&gt;

&lt;h2&gt;
  
  
  Loading data on Amazon Neptune
&lt;/h2&gt;

&lt;p&gt;We followed the &lt;a href="https://aws.amazon.com/neptune/developer-resources/"&gt;documented procedure&lt;/a&gt; to load the RDF triples on the Amazon Neptune service.&lt;br&gt;
We used Amazon Simple Storage Service (Amazon S3): first we created an S3 bucket, then we uploaded the data to it. In this first phase, we loaded the RDF data that builds the first level of the graph (see the previous article).&lt;/p&gt;
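Concretely, a Neptune bulk load is started with an HTTP POST to the cluster's loader endpoint. A sketch of the request body follows (the bucket name, IAM role ARN, and region are placeholders, not our actual values):

```json
{
  "source": "s3://my-recipe-bucket/triples/",
  "format": "turtle",
  "iamRoleArn": "arn:aws:iam::123456789012:role/NeptuneLoadFromS3",
  "region": "us-east-1",
  "failOnError": "FALSE"
}
```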

&lt;p&gt;If we want to add only a few recipes at a time, we can alternatively use the SPARQL INSERT DATA statement:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yElg-2q4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/k4b7beukmtjunnrsjm3v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yElg-2q4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/k4b7beukmtjunnrsjm3v.png" alt="Alt Text" width="880" height="224"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Integrating the extractor and classifier services within the graph
&lt;/h2&gt;

&lt;p&gt;Once the recipes had been loaded, we checked whether any recipes had not yet been processed by the extractor and classifier services. This means checking which recipes do not have: &lt;br&gt;
i) food entity chunks extracted (the bnodes in the graph, see the previous article); &lt;br&gt;
ii) ingredients classified. &lt;/p&gt;

&lt;p&gt;This is the SPARQL query that checks whether bnodes exist in the graph (through the FILTER NOT EXISTS statement), which is equivalent to saying “return all the recipes without bnodes”:  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ATXPChk---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/xeg6ulusngn2cwyamn0o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ATXPChk---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/xeg6ulusngn2cwyamn0o.png" alt="Alt Text" width="880" height="350"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Extracting knowledge from the graph via SPARQL
&lt;/h2&gt;

&lt;p&gt;Now the graph is on Amazon Neptune. Let’s have fun with these connections, extracting knowledge from the graph: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FvBJsf3a--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/s478sowc3sj8qfcme72b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FvBJsf3a--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/s478sowc3sj8qfcme72b.png" alt="Alt Text" width="880" height="372"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With the above query we interrogate the graph to find out 1) whether there are recipes containing the ingredient “butter” and 2) which recipes they are. The WHERE statement navigates the graph following the pattern described in the triples to arrive at the query result. In this case, the output is the ids of the recipes that contain the ingredient “butter”.&lt;br&gt;
We can also query the graph to return recipes containing more than one ingredient, or all the recipes containing some ingredients and not others:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--a5yJVeIf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/9f1z17q9e1udjyfg4u1j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--a5yJVeIf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/9f1z17q9e1udjyfg4u1j.png" alt="Alt Text" width="880" height="466"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Smart Recipe Project: what has been done, what can be done
&lt;/h2&gt;

&lt;p&gt;With this last article, we conclude our overview of the main stages of the Smart Recipe Project, an innovative project involving, on one side, the global company &lt;a href="https://www.condenast.it"&gt;Condé Nast&lt;/a&gt; and, on the other, the IT company &lt;a href="https://www.res-group.com/en/"&gt;RES&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;We have in mind some interesting applications for the resources we developed under the Smart Recipe Project, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;personalization of content, personalized recipe search, newsletters;&lt;/li&gt;
&lt;li&gt;recommendation systems for food items, recipes, and menus, which integrate, where needed, dietary restrictions;&lt;/li&gt;
&lt;li&gt;virtual assistants, able to guide you in planning and cooking meals;&lt;/li&gt;
&lt;li&gt;smart cooking devices, and much more.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As always, &lt;a href="https://medium.com/@condenastitaly/when-food-meets-ai-the-smart-recipe-project-2bf0d20aaf79"&gt;go on Medium&lt;/a&gt; to read the complete article.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;When Food meets AI: the Smart Recipe Project&lt;/strong&gt;&lt;br&gt;
a series of 6 amazing articles&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Table of contents&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-mg2"&gt;Part 1: Cleaning and manipulating food data&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-29bf"&gt;Part 1: A smart method for tagging your datasets&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-2d6e"&gt;Part 2: NER for all tastes: extracting information from cooking recipes&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-21pg"&gt;Part 2: Neither fish nor fowl? Classify it with the Smart Ingredient Classifier&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-27pm"&gt;Part 3: FoodGraph: a graph database to connect recipes and food data&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-b3g"&gt;Part 3. FoodGraph: Loading data and Querying the graph with SPARQL&lt;/a&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>neptune</category>
      <category>sparql</category>
    </item>
    <item>
      <title>When Food meets AI: the Smart Recipe Project</title>
      <dc:creator>Condé Nast Italy</dc:creator>
      <pubDate>Mon, 03 Aug 2020 11:20:00 +0000</pubDate>
      <link>https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-27pm</link>
      <guid>https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-27pm</guid>
      <description>&lt;h1&gt;
  
  
  Part 3: FoodGraph: a graph database to connect recipes and food data
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ZJ2mioZk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/jxgpn8kk4txg8zbgd8xj.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ZJ2mioZk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/jxgpn8kk4txg8zbgd8xj.jpg" alt="Cuttlefish"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Delicious Cuttlefish&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After enriching data and developing ML and DL models to extract and classify elements from recipes, we moved a step further in the Smart Recipe Project, connecting the data and the outputs of the services (see the previous posts) in a graph database architecture. The goal is the creation of a knowledge base, named FoodGraph, where the different pieces of recipe information are connected to form a deep net of knowledge.&lt;/p&gt;

&lt;p&gt;In this two-section post:&lt;br&gt;
(SECTION 1): we give you some insights into the concepts and technologies used in designing a graph database;&lt;/p&gt;

&lt;p&gt;(SECTION 2): we show you our method for converting JSON files, containing the recipe data, into RDF triples, the data model we chose for constructing the graph.&lt;/p&gt;
&lt;h2&gt;
  
  
  Graph database: key concepts
&lt;/h2&gt;

&lt;p&gt;Graph databases are a NoSQL way to store and process data and the relationships among it, where relationships are as important as the data itself. In contrast to other approaches, graph databases are designed from the start to incorporate relationships, since they store connections alongside the data in the model.&lt;/p&gt;

&lt;p&gt;The building blocks of a graph database are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Nodes (or vertices) → the constructs standing for real-world entities participating in relationships.&lt;/li&gt;
&lt;li&gt;Edges (or links) → they represent connections and relationships among nodes and express the properties existing between the entities.&lt;/li&gt;
&lt;/ul&gt;
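As a toy illustration (plain Python, not a real graph database), these building blocks can be thought of as a set of (node, relationship, node) connections that you traverse by pattern:

```python
# A tiny in-memory "graph": each edge is (node, relationship, node).
edges = [
    ("recipe_1", "hasIngredient", "butter"),
    ("recipe_1", "hasIngredient", "flour"),
    ("butter", "belongsTo", "dairy"),
]

def neighbors(graph, node, relationship):
    """Follow edges of a given type starting from a node."""
    return [o for s, p, o in graph if s == node and p == relationship]

print(neighbors(edges, "recipe_1", "hasIngredient"))  # ['butter', 'flour']
```

A real graph database indexes these connections so traversals stay fast at scale, but the mental model is the same.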

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--s1vZfCFM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/gl4y6goqz8e9aa2wx05k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--s1vZfCFM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/gl4y6goqz8e9aa2wx05k.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  &lt;a href="https://www.w3.org/RDF/"&gt;RDF&lt;/a&gt;​:​ a data model to build the graph database.
&lt;/h2&gt;

&lt;p&gt;RDF stands for Resource Description Framework and is a data model that describes the semantics, or meaning, of information. The core structure of an RDF model is a set of triples, each consisting of a subject, a predicate, and an object, which together form an RDF graph, or triple store. Each RDF statement states a single thing about its subject by linking it to an object by means of a predicate, the property.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&amp;lt;http://www.w3.org/TR/rdf-syntax-grammar&amp;gt; &amp;lt;http://purl.org/dc/elements/1.1/title&amp;gt; "RDF/XML Syntax Specification" .&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;In the example above, the triple states “The technical report on RDF syntax and grammar has&lt;br&gt;
the title RDF/XML Syntax Specification.”&lt;/p&gt;
&lt;h3&gt;
  
  
  Ontologies and Vocabularies.
&lt;/h3&gt;

&lt;p&gt;An ontology represents a formal description of a knowledge domain as a set of concepts and the relationships that hold between them. To enable such a description, we need to formally specify components such as individuals (instances of objects), classes, attributes, and relations, as well as restrictions, rules, and axioms. As a result, ontologies not only provide a sharable and reusable knowledge representation but can also add new knowledge about the domain and help with data integration when data comes from different datasets.&lt;/p&gt;
&lt;h3&gt;
  
  
  Logic and Inferences.
&lt;/h3&gt;

&lt;p&gt;Another important component of linked data is the possibility to perform inferences (or reasoning) on data through rules defined with the data itself. Inference means that automatic procedures performed by inference engines (or “reasoners”) can generate new relationships based on the data and some additional information in the form of an ontology. Thus the database can be used not only to retrieve information but also to deduce new information from facts in the data.&lt;/p&gt;
&lt;h3&gt;
  
  
  SPARQL.
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.w3.org/TR/rdf-sparql-query/"&gt;SPARQL&lt;/a&gt; ​is an RDF query language, namely a semantic query language for databases, able to retrieve and manipulate data stored in RDF format. The results of SPARQL queries return the resources for all triples that match the specified patterns and can be result sets or RDF graphs.&lt;/p&gt;
&lt;h3&gt;
  
  
  Amazon Neptune.
&lt;/h3&gt;

&lt;p&gt;Amazon Neptune is a graph database service that simplifies the construction and the integration of applications working with highly connected datasets. Its engine is able to store billions of relationships which can be speedily navigated and queried.&lt;/p&gt;
&lt;h2&gt;
  
  
  Convert JSON file to RDF
&lt;/h2&gt;

&lt;p&gt;The first step in building the graph database consists of converting the JSON files, containing the recipe data, into RDF triples.&lt;br&gt;
With a few lines of code, we extracted data from the JSON file (using the Python library &lt;a href="https://docs.python.org/3/library/json.html"&gt;json&lt;/a&gt;) and converted it into RDF triples (in &lt;a href="https://www.w3.org/TR/turtle/"&gt;Turtle&lt;/a&gt; format), manually writing the RDF structure. This approach fits our task well, since the number of data types to convert is relatively small.&lt;br&gt;
The procedure to build the RDF triples consists, in general, of three steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prefix declaration&lt;/strong&gt; → The prefixes identify the ontologies/vocabularies describing the properties, classes, entities, and attributes used to build the graph. These elements can be referenced in a triple via a full URI or via a namespace prefix. In Turtle format, the prefixes are introduced by a “@” and stand at the beginning of the Turtle document.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data extraction&lt;/strong&gt; → Using the Python library json, we extract the data contained in the JSON array. This data represents the nodes of the RDF graph.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Writing RDF triples&lt;/strong&gt; → Using the extracted data and the declared ontologies, we manually write the RDF triples to a Turtle file. This data will then be loaded on Amazon Neptune.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is, for example, the JSON file containing 1) the output of the extractor service (see the previous post) and 2) other technical information about the NER model within the service:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--sICTE5oI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/7f7aaimqamvivlfgpy4k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--sICTE5oI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/7f7aaimqamvivlfgpy4k.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the code we used to convert the JSON to RDF triples:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;uuid&lt;/span&gt;

&lt;span class="n"&gt;lang_dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s"&gt;"it"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"italian"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"wiki:Q652"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="s"&gt;"en"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"english"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"wiki:Q1860"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lev2_rdfgraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json_array&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lang_dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"path.ttl"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'w'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'utf-8'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;lev2_rdfgraph&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;#prefix declaration
&lt;/span&gt;        &lt;span class="n"&gt;lev2_rdfgraph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"""@prefix recipe:&amp;lt;http//www.example.com/recipe/&amp;gt;.
        @prefix example:&amp;lt;http://www.example.com/&amp;gt;.
        @prefix schema:&amp;lt;https://schema.org/&amp;gt;.
        @prefix rdf:&amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt;.
        @prefix xs:&amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt;.
        @prefix dcterms:&amp;lt;http://purl.org/dc/terms/&amp;gt;.
        @prefix wiki:&amp;lt;http://wikidata.org/wiki/&amp;gt;.”””)   

#data extraction
        for js in json_array:
            id_recipe = js['id']
            model_date = js['info_services']['model_date']
            language = js['language']
        #write rdf triples              
        rdf_file.write("recipe:"+id_recipe+"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;schema:dateModified&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;" + str(model_date)+".&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;")
        #data extraction
       for chunk in js['intervals']:
           ingr, ingr_id  = "", ""
           for token in chunk['ingr']:
                ingr = str(ingr+token+" ")
                ingr_id = str(uuid.uuid4().hex()
                bnode_name = str(uuid.uuid4())
                #write rdf triples                
                lev2_rdfgraph.write("recipe:"+id_recipe+"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;schema:material&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;_:"+bnode_name+".&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;")                   
                if 'unit' in chunk.keys():
                    lev2_rdfgraph.write(“_:”+bnode_name+”&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;schema:materialExtent&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;\”+str(chunk[“unit”])+”\”.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;”)
                if “value” in chunk.keys():   
                lev2_rdfgraph.write("_:"+bnode_name+"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;rdf:value&lt;/span&gt;&lt;span class="se"&gt;\n\"&lt;/span&gt;&lt;span class="s"&gt;"+str(chunk['value]')+“\”.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;)
           lev2_rdfgraph.write("_:"+bnode_name+"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;schema:recipeIngredient&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;example:"+ingr_id+".&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;")                                                 
           lev2_rdfgraph.write("example:"+ingred_id+"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;xs:string&lt;/span&gt;&lt;span class="se"&gt;\n\"&lt;/span&gt;&lt;span class="s"&gt;+ingr.rstrip()+"&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;")
     lev2_rdfgraph.write("recipe:" + id_recipe + "&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;" + "dcterms:language" + "&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;" +   lang_dict[language][1] + ".&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;"                                                                                       
     lev2_rdfgraph.write(lang_dict[language][1] + '&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;xs:string&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"' + lang_dict[language][0] + '".&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;'

lev2_rdfgraph(json_array, lang_dict)


&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;This is a graphic visualization of this piece of the graph. The nodes represent the subjects and the objects of the graph, while the edges represent the predicates (for clarity, in the figure the properties are in their extended form and not called via prefix).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5DI744Y5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/w580nen2bydunhnk0l8t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5DI744Y5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/w580nen2bydunhnk0l8t.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;FoodGraph is a three-level-deep graph. &lt;a href="https://medium.com/@condenastitaly/when-food-meets-ai-the-smart-recipe-project-eea259f53ed2"&gt;Discover the other levels of knowledge by reading the Medium article&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;When Food meets AI: the Smart Recipe Project&lt;/strong&gt;&lt;br&gt;
a series of 6 amazing articles&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Table of contents&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-mg2"&gt;Part 1: Cleaning and manipulating food data&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-29bf"&gt;Part 1: A smart method for tagging your datasets&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-2d6e"&gt;Part 2: NER for all tastes: extracting information from cooking recipes&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-21pg"&gt;Part 2: Neither fish nor fowl? Classify it with the Smart Ingredient Classifier&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-27pm"&gt;Part 3: FoodGraph: a graph database to connect recipes and food data&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-b3g"&gt;Part 3. FoodGraph: Loading data and Querying the graph with SPARQL&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>When food meets AI: the Smart Recipe Project</title>
      <dc:creator>Condé Nast Italy</dc:creator>
      <pubDate>Mon, 27 Jul 2020 06:35:35 +0000</pubDate>
      <link>https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-21pg</link>
      <guid>https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-21pg</guid>
      <description>&lt;h1&gt;
  
  
  Part 2. Neither fish nor fowl? Classify it with the Smart Ingredient Classifier
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--cfDHdxFP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/p8mmrmathp81btcyjdyc.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--cfDHdxFP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/p8mmrmathp81btcyjdyc.jpg" alt="Musssel soup" width="880" height="440"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Mussel soup&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In the previous article, we extracted food entities (ingredients, quantities, and units of measurement) from recipes. In this post, we classify each ingredient's taxonomic class using the &lt;a href="https://arxiv.org/abs/1810.04805"&gt;BERT&lt;/a&gt; model. In plain words, this means classifying Emmental as a cheese, an orange as a fruit, peas as vegetables, and so on for each ingredient in the recipes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--sJ624p3L--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/sncd6jfcx7u78pgar5p6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--sJ624p3L--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/sncd6jfcx7u78pgar5p6.png" alt="Alt Text" width="880" height="883"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  BERT in five points
&lt;/h2&gt;

&lt;p&gt;Since its release in late 2018, BERT has changed for the better the way we approach NLP tasks, solving many challenging problems in the field.&lt;br&gt;
One of the main problems in NLP is the lack of training data. To cope with this lack, the idea is to exploit a large amount of unannotated data to train general-purpose &lt;a href="https://en.wikipedia.org/wiki/Language_model"&gt;language representation models&lt;/a&gt;, a process known as pre-training, and then fine-tune these models on a smaller task-specific dataset.&lt;br&gt;
Though this technique is not new (see &lt;a href="https://towardsdatascience.com/introduction-to-word-embedding-and-word2vec-652d0c2060fa"&gt;word2vec&lt;/a&gt; and &lt;a href="https://nlp.stanford.edu/projects/glove/"&gt;GloVe&lt;/a&gt; embeddings), we can say BERT exploits it better. Why? Let’s find out in five points:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It is built on a &lt;a href="https://medium.com/inside-machine-learning/what-is-a-transformer-d07dd1fbec04"&gt;​Transformer&lt;/a&gt; architecture, a powerful state-of-the-art architecture, which applies an attention mechanism to understand relationships between tokens in a sentence.&lt;/li&gt;
&lt;li&gt;It is deeply bidirectional since it takes into account the left and right contexts at the same time.&lt;/li&gt;
&lt;li&gt;BERT is pre-trained on a large corpus of unlabeled text, which allows it to pick up a deeper understanding of how language works.&lt;/li&gt;
&lt;li&gt;BERT can be fine-tuned for different tasks by adding a few additional output layers. &lt;/li&gt;
&lt;li&gt;BERT is trained to perform two tasks:
&lt;ul&gt;
&lt;li&gt;Masked Language Modelling: BERT has to predict randomly masked words.&lt;/li&gt;
&lt;li&gt;Next sentence prediction: BERT tries to predict the next sentence in a sequence of sentences.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;

&lt;h2&gt;
  
  
  The Smart Recipe Project Taxonomy
&lt;/h2&gt;

&lt;p&gt;To carry out the task, we designed a taxonomy, a model of classification for defining macro-categories and classifying the ingredients within them:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--mbTwHCvk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/ki0hjielgwkmx16a45ag.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--mbTwHCvk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/ki0hjielgwkmx16a45ag.png" alt="Alt Text" width="880" height="983"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Such categorization is then used to tag the dataset on which the model is trained.&lt;/p&gt;

&lt;h2&gt;
  
  
  BERT for ingredient taxonomic classification
&lt;/h2&gt;

&lt;p&gt;For our task (ingredient taxonomic classification), the pre-trained BERT models perform very well. We chose the &lt;a href="https://huggingface.co/bert-base-multilingual-cased"&gt;bert-base-multilingual-cased&lt;/a&gt; model and divided the classifier into two modules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;A training module&lt;/strong&gt;. We used &lt;a href="https://huggingface.co/transformers/model_doc/bert.html"&gt;BertForSequenceClassification&lt;/a&gt;, a basic BERT model with a single linear classification layer on top. Both the pre-trained model and the untrained layer were trained on our data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;An applying module&lt;/strong&gt;. The applier takes the trained model and uses it to determine the taxonomic class of the ingredient in the recipe.&lt;br&gt;
&lt;a href="https://medium.com/@condenastitaly/when-food-meets-ai-the-smart-recipe-project-6aa28b30e248?sk=201511e1d46f4b26e8fc24e0a21a6f7f"&gt;You can find a more detailed version of the post on Medium.&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
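A hypothetical sketch of the two modules, using the Hugging Face transformers library (the taxonomy labels and function names here are illustrative, and the fine-tuning loop on our labeled data is omitted):

```python
# Illustrative taxonomy labels; the real taxonomy is larger (see the figure above).
TAXONOMY = ["cheese", "fruit", "vegetable", "meat", "fish"]
label2id = {label: i for i, label in enumerate(TAXONOMY)}

def build_classifier(num_labels):
    """Training module: pre-trained BERT body + one untrained linear layer on top."""
    from transformers import BertForSequenceClassification, BertTokenizer
    tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-multilingual-cased", num_labels=num_labels
    )
    return tokenizer, model  # both are then fine-tuned on the tagged dataset

def classify(ingredient, tokenizer, model):
    """Applying module: run the fine-tuned model on one ingredient string."""
    import torch
    inputs = tokenizer(ingredient, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return TAXONOMY[int(logits.argmax(dim=-1))]
```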




&lt;p&gt;&lt;strong&gt;When Food meets AI: the Smart Recipe Project&lt;/strong&gt;&lt;br&gt;
a series of 6 amazing articles&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Table of contents&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-mg2"&gt;Part 1: Cleaning and manipulating food data&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-29bf"&gt;Part 1: A smart method for tagging your datasets&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-2d6e"&gt;Part 2: NER for all tastes: extracting information from cooking recipes&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-21pg"&gt;Part 2: Neither fish nor fowl? Classify it with the Smart Ingredient Classifier&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-27pm"&gt;Part 3: FoodGraph: a graph database to connect recipes and food data&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-b3g"&gt;Part 3. FoodGraph: Loading data and Querying the graph with SPARQL&lt;/a&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>python</category>
      <category>recipes</category>
    </item>
    <item>
      <title>When Food meets AI: the Smart Recipe Project</title>
      <dc:creator>Condé Nast Italy</dc:creator>
      <pubDate>Mon, 20 Jul 2020 08:11:59 +0000</pubDate>
      <link>https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-2d6e</link>
      <guid>https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-2d6e</guid>
      <description>&lt;h2&gt;
  
  
  Part 2. NER for all tastes: extracting information from cooking recipes
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ePdbG0n9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/hobax8otfu2ct9yeci7a.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ePdbG0n9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/hobax8otfu2ct9yeci7a.jpg" alt="Alt Text" width="880" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the previous articles, we constructed two labeled datasets to train machine learning models and develop systems able to interpret cooking recipes.&lt;/p&gt;

&lt;p&gt;This post dives into the extractor system, a system able to extract ingredients, quantities, time of preparation, and other useful information from recipes. To develop the service, we tried different Named Entity Recognition (NER) approaches.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hold on! What is NER?
&lt;/h3&gt;

&lt;p&gt;NER is a two-step process consisting of a) identifying entities (a token or a group of tokens) in documents and b) categorizing them into predetermined categories such as Person, City, Company... For our task, we created our own categories: INGREDIENT, QUANTIFIER and UNIT.&lt;/p&gt;

&lt;p&gt;NER is a very useful NLP application for grouping and categorizing large amounts of data that share similarities and relevance. As such, it can be applied to many business use cases, like &lt;em&gt;Human resources&lt;/em&gt;, &lt;em&gt;Customer support&lt;/em&gt;, &lt;em&gt;Search and recommendation engines&lt;/em&gt;, &lt;em&gt;Content classification&lt;/em&gt;, and much more.&lt;/p&gt;
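&lt;p&gt;To make the two steps concrete, here is a toy dictionary-and-regex tagger for a recipe line. It is a deliberately naive illustration, not one of the trained models, and its lexicons are made up:&lt;/p&gt;

```python
import re

# Toy two-step NER: 1) identify candidate entities, 2) categorize them.
# These tiny lexicons are illustrative, not the project's real resources.
INGREDIENTS = {"flour", "sugar", "eggs", "butter"}
UNITS = {"g", "oz", "cups", "tbsp"}

def toy_ner(sentence):
    entities = []
    for token in sentence.lower().replace(",", "").split():
        if token in INGREDIENTS:
            entities.append((token, "INGREDIENT"))
        elif token in UNITS:
            entities.append((token, "UNIT"))
        elif re.fullmatch(r"\d+([/.]\d+)?", token):  # 2, 200, 3/4, 1.5
            entities.append((token, "QUANTIFIER"))
    return entities

print(toy_ner("Mix 200 g flour with 2 eggs"))
# [('200', 'QUANTIFIER'), ('g', 'UNIT'), ('flour', 'INGREDIENT'),
#  ('2', 'QUANTIFIER'), ('eggs', 'INGREDIENT')]
```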

&lt;h3&gt;
  
  
  NER for the Smart Recipe Project
&lt;/h3&gt;

&lt;p&gt;For the Smart Recipe Project, we trained four models: a CRF model, a BiLSTM model, a combination of the previous two (BiLSTM-CRF), and the Flair NLP NER model.&lt;/p&gt;

&lt;h3&gt;
  
  
  CRF model
&lt;/h3&gt;

&lt;p&gt;Linear-chain Conditional Random Fields (&lt;a href="https://medium.com/ml2vec/overview-of-conditional-random-fields-68a2a20fa541"&gt;CRFs&lt;/a&gt;) are a very popular way to model sequence prediction. CRFs are &lt;a href="https://medium.com/@mlengineer/generative-and-discriminative-models-af5637a66a3%23:~:text=In%20General,%20A%20Discriminative%20model,actual%20distribution%20of%20each%20class.&amp;amp;text=A%20Generative%20Model%20%E2%80%8Clearns%20the,the%20help%20of%20Bayes%20Theorem."&gt;discriminative models&lt;/a&gt; able to solve some shortcomings of their &lt;a href="https://medium.com/@mlengineer/generative-and-discriminative-models-af5637a66a3%23:~:text=In%20General,%20A%20Discriminative%20model,actual%20distribution%20of%20each%20class.&amp;amp;text=A%20Generative%20Model%20%E2%80%8Clearns%20the,the%20help%20of%20Bayes%20Theorem."&gt;generative&lt;/a&gt; counterparts. Indeed, while an HMM models the output on the &lt;a href="https://medium.com/@mlengineer/joint-probability-vs-conditional-probability-fa2d47d95c4a"&gt;joint probability&lt;/a&gt; distribution, a CRF computes it on the &lt;a href="https://medium.com/@mlengineer/joint-probability-vs-conditional-probability-fa2d47d95c4a"&gt;conditional probability&lt;/a&gt; distribution.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Put simply: while a generative classifier tries to learn how the data was generated, a discriminative one models the target directly from the observed data.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In addition to this, CRFs take into account the features of the current and previous labels in sequence. This increases the amount of information the model can rely on to make a good prediction.&lt;/p&gt;
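&lt;p&gt;A minimal sketch of what such features can look like for a recipe token (the feature template below is illustrative, in the style of linear-chain CRF toolkits, not the exact features we used):&lt;/p&gt;

```python
def token_features(tokens, i):
    # Illustrative feature template: features of the current token plus
    # context from the previous token, as linear-chain CRFs exploit.
    token = tokens[i]
    feats = {
        "word.lower": token.lower(),
        "word.isdigit": token.isdigit(),
        "word.istitle": token.istitle(),
    }
    if i > 0:
        feats["prev.word.lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True  # beginning-of-sequence marker
    return feats

tokens = "Add 2 cups flour".split()
print(token_features(tokens, 1))
# {'word.lower': '2', 'word.isdigit': True, 'word.istitle': False,
#  'prev.word.lower': 'add'}
```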

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--K349pVcm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/jaflrpqxlw3tlpxbygar.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--K349pVcm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/jaflrpqxlw3tlpxbygar.png" alt="Fig.1 CRF Network" width="880" height="357"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Fig.1 CRF Network&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;For the task, we used the &lt;a href="https://nlp.stanford.edu/software/CRF-NER.html"&gt;Stanford NER algorithm&lt;/a&gt;, which is an implementation of a CRF classifier. This model outperforms the others in accuracy, though it cannot exploit the context of forward labels (a pivotal feature for sequential tasks like NER) and requires extra feature engineering.&lt;/p&gt;

&lt;h3&gt;
  
  
  BiLSTM with character embeddings
&lt;/h3&gt;

&lt;p&gt;Going neural... we trained a &lt;a href="https://colah.github.io/posts/2015-08-Understanding-LSTMs/"&gt;Long Short-Term Memory (LSTM)&lt;/a&gt; model. LSTM networks are a type of Recurrent Neural Network (RNN) in which the hidden-layer updates are replaced by purpose-built memory cells. As a result, they are better at finding and exploiting long-range dependencies in the data.&lt;/p&gt;

&lt;p&gt;To benefit from both past and future context, we used a bidirectional LSTM model (BiLSTM), which processes the text in two directions: both forward (left to right) and backward (right to left). This allows the model to uncover more patterns as the amount of input information increases.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--uso6ysp9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/yftym38rakrrakv88529.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--uso6ysp9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/yftym38rakrrakv88529.png" alt="BiLSTM architecture" width="880" height="765"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Fig.2 BiLSTM architecture&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Moreover, we incorporated character-based word representations as input to the model. Character-level representations exploit explicit sub-word-level information, infer features for unseen words, and share information about morpheme-level regularities.&lt;/p&gt;

&lt;h3&gt;
  
  
  NER Flair NLP
&lt;/h3&gt;

&lt;p&gt;This model belongs to the &lt;a href="https://github.com/flairNLP/flair"&gt;Flair&lt;/a&gt; NLP library, developed and open-sourced by &lt;a href="https://research.zalando.com/"&gt;Zalando Research&lt;/a&gt;. The strength of the model lies in a) the use of state-of-the-art character, word and contextual string embeddings (like &lt;a href="https://nlp.stanford.edu/projects/glove/"&gt;GloVe&lt;/a&gt;, &lt;a href="https://arxiv.org/abs/1810.04805"&gt;BERT&lt;/a&gt;, &lt;a href="https://arxiv.org/pdf/1802.05365.pdf"&gt;ELMo&lt;/a&gt;...), and b) the ease with which these embeddings can be combined.&lt;/p&gt;

&lt;p&gt;In particular, &lt;a href="https://www.aclweb.org/anthology/C18-1139/"&gt;contextual string embeddings&lt;/a&gt; help to contextualize words, producing different embeddings for polysemous words (same word, different meanings):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--KEOVQ7L7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/jdzl773vmu6xgk0vbroh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KEOVQ7L7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/jdzl773vmu6xgk0vbroh.png" alt="Context String Embedding network" width="880" height="452"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Fig.3 Context String Embedding network&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  BiLSTM-CRF
&lt;/h3&gt;

&lt;p&gt;Last but not least, we tried a hybrid approach: we added a CRF layer to a BiLSTM model. The advantages (well explained &lt;a href="https://arxiv.org/pdf/1508.01991.pdf"&gt;here&lt;/a&gt;) of such a combination are that the model can efficiently use both 1) past and future input features, thanks to the bidirectional LSTM component, and 2) sentence-level tag information, thanks to the CRF layer, which imposes additional constraints on the final output.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ovzpsX1x--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/cubohnsakmp8085pzixj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ovzpsX1x--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/cubohnsakmp8085pzixj.png" alt="BiLSTM-CRF: general architecture" width="880" height="386"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Fig. 4 BiLSTM-CRF: general architecture&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What about performance?
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://medium.com/@condenastitaly/when-food-meets-ai-the-smart-recipe-project-8dd1f5e727b5"&gt;Read the complete article on Medium&lt;/a&gt; to discover that and more about this step of the Smart Recipe Project.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;When Food meets AI: the Smart Recipe Project&lt;/strong&gt;&lt;br&gt;
a series of 6 amazing articles&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Table of content&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-mg2"&gt;Part 1: Cleaning and manipulating food data&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-29bf"&gt;Part 1: A smart method for tagging your datasets&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-2d6e"&gt;Part 2: NER for all tastes: extracting information from cooking recipes&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-21pg"&gt;Part 2: Neither fish nor fowl? Classify it with the Smart Ingredient Classifier&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-27pm"&gt;Part 3: FoodGraph: a graph database to connect recipes and food data&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-b3g"&gt;Part 3. FoodGraph: Loading data and Querying the graph with SPARQL&lt;/a&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>python</category>
      <category>recipes</category>
    </item>
    <item>
      <title>When Food meets AI: the Smart Recipe Project</title>
      <dc:creator>Condé Nast Italy</dc:creator>
      <pubDate>Mon, 13 Jul 2020 07:25:28 +0000</pubDate>
      <link>https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-29bf</link>
      <guid>https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-29bf</guid>
      <description>&lt;h3&gt;
  
  
  Part 1: A smart method for tagging your datasets
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Foeccqcsxpk9sxacmes9o.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Foeccqcsxpk9sxacmes9o.jpg" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;A classic Tiramisù&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Raise your hand if you have never come across the “lack of data” problem while working on ML projects.&lt;/p&gt;

&lt;p&gt;The unavailability or scarcity of training data is indeed one of the most serious challenges in ML and specifically in NLP. A problem that gets harder when the data you need has to be labeled. When no other &lt;a href="https://www.kdnuggets.com/2019/06/5-ways-lack-data-machine-learning.html" rel="noopener noreferrer"&gt;shortcut&lt;/a&gt; works for you, the only alternative is to tag your data... At this point, we imagine the enthusiasm on your face! &lt;/p&gt;

&lt;p&gt;But don’t let that put you off! Read the post and discover how we dramatically reduced the time and cost of the tagging process. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DISCLAIMER&lt;/strong&gt;: &lt;br&gt;
We worked within the food context, but the approach can be easily extended to many different cases.&lt;/p&gt;
&lt;h3&gt;
  
  
  But first… What to tag?
&lt;/h3&gt;

&lt;p&gt;The entities we want to tag are:&lt;br&gt;
&lt;strong&gt;INGREDIENT&lt;/strong&gt;: apples, cheese, yogurt, hot peppers…&lt;br&gt;
&lt;strong&gt;QUANTIFIER&lt;/strong&gt;: one, 2, ¾, a couple of….&lt;br&gt;
&lt;strong&gt;UNIT&lt;/strong&gt; of measurement: oz, g, lb, liter, cups, tbsp... &lt;/p&gt;

&lt;p&gt;We used a variant of the &lt;a href="https://en.wikipedia.org/wiki/Inside%E2%80%93outside%E2%80%93beginning_(tagging)" rel="noopener noreferrer"&gt;IOB schema&lt;/a&gt; to tag the entities, where B-, I- tags indicate the beginning and intermediate positions of entities. O is the default tag.&lt;/p&gt;
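&lt;p&gt;As a toy illustration of the scheme (not our tagging pipeline): a multi-token entity receives a B- tag on its first token and I- tags on the following ones, while all other tokens stay O.&lt;/p&gt;

```python
def iob_tags(tokens, entity, label):
    # Tag one known multi-token entity (given lowercased) with B-/I- prefixes;
    # every other token keeps the default O tag. Toy illustration only.
    tags = ["O"] * len(tokens)
    n = len(entity)
    for i in range(len(tokens) - n + 1):
        if [t.lower() for t in tokens[i:i + n]] == entity:
            tags[i] = "B-" + label
            for j in range(i + 1, i + n):
                tags[j] = "I-" + label
    return tags

tokens = "Chop two hot peppers finely".split()
print(iob_tags(tokens, ["hot", "peppers"], "INGREDIENT"))
# ['O', 'O', 'B-INGREDIENT', 'I-INGREDIENT', 'O']
```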
&lt;h3&gt;
  
  
  Let’s tag!
&lt;/h3&gt;

&lt;p&gt;We sped up the ingredient tagging process with TagINGR, a semi-automatic tool which works by:&lt;br&gt;
    1. matching items in the recipes against those in a list of ingredients;&lt;br&gt;
    2. adding the INGREDIENT tag when an item appears both in the list and in the recipe.&lt;/p&gt;
&lt;h3&gt;
  
  
  Here is the code:
&lt;/h3&gt;

&lt;p&gt;In part 1, the recipe_tagger function tokenizes the ingredient list and initializes some variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;recipe_tagger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lang&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;desc_ingr_list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;recipe&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="c1"&gt;# Part 1
&lt;/span&gt;    &lt;span class="n"&gt;tokenized_ingr_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;tokenize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lang&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;el&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;desc_ingr_list&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ingr_token_list&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tokenized_ingr_list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;ingr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tag_ingr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ingr_token&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ingr_token_list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ingr_token&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;ingr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ingr_tag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\t[A-Z]+\tO\n&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; B-INGREDIENT&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                     &lt;span class="n"&gt;ingr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ingr_tag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ingr&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\t[A-Z]+\tO\n&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ingr_tag&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; I-INGREDIENT&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In part 2, it tags the ingredients:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ingr_tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;      &lt;span class="c1"&gt;#Part 2
&lt;/span&gt;                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ingr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;recipe&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\t[NN][A-Z]*\tO&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ingr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;recipe&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;group&lt;/span&gt;&lt;span class="p"&gt;()):&lt;/span&gt;
                    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ingr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ingr_tag&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ingr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;recipe&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ingr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ingr_tag&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;recipe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\n(.*)\t.*\t(.*)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\\&lt;/span&gt;&lt;span class="s"&gt;1 &lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;recipe&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;recipe&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What about other entities?
&lt;/h3&gt;

&lt;p&gt;Once the ingredients are tagged, we can easily tag quantities and units. We first identified some entity patterns and then tagged them using a set of regexes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fa5sgn0h12b18350xc7k8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fa5sgn0h12b18350xc7k8.png" alt="entities patterns"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;All very well, but… how did we build the list? What assures us it is complete? What does NN mean in the code? These and other questions are answered in the &lt;a href="https://medium.com/@condenastitaly/when-food-meets-ai-the-smart-recipe-project-12bfaa3acfd7" rel="noopener noreferrer"&gt;Medium article&lt;/a&gt;. Go read it!&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;When Food meets AI: the Smart Recipe Project&lt;/strong&gt;&lt;br&gt;
a series of 6 amazing articles&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Table of content&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-mg2"&gt;Part 1: Cleaning and manipulating food data&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-29bf"&gt;Part 1: A smart method for tagging your datasets&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-2d6e"&gt;Part 2: NER for all tastes: extracting information from cooking recipes&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-21pg"&gt;Part 2: Neither fish nor fowl? Classify it with the Smart Ingredient Classifier&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-27pm"&gt;Part 3: FoodGraph: a graph database to connect recipes and food data&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-b3g"&gt;Part 3. FoodGraph: Loading data and Querying the graph with SPARQL&lt;/a&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>ai</category>
      <category>food</category>
    </item>
    <item>
      <title>When Food meets AI: the Smart Recipe Project</title>
      <dc:creator>Condé Nast Italy</dc:creator>
      <pubDate>Thu, 02 Jul 2020 10:37:09 +0000</pubDate>
      <link>https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-mg2</link>
      <guid>https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-mg2</guid>
      <description>&lt;h3&gt;
  
  
  Part 1: Cleaning and manipulating food data
&lt;/h3&gt;

&lt;p&gt;Cooking recipes, videos, and photos are everywhere on the web, which is today the greatest archive of food-related content.&lt;br&gt;
But what happens when this big amount of data meets Artificial Intelligence? In the Smart Recipe Project, we answered that question by developing systems able to interpret and extract information from food recipes.&lt;br&gt;
Are you wondering how?&lt;/p&gt;
&lt;h3&gt;
  
  
  The project step-by-step:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;using NLP techniques, we enriched data, labeling entities and adding entity-specific information;&lt;/li&gt;
&lt;li&gt;exploiting state-of-the-art ML and DL models, we developed services able to automatically extract information from recipes;&lt;/li&gt;
&lt;li&gt;adopting the Amazon Neptune technology, we built graph databases to store and navigate relationships among data.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;But first... we collected and cleaned the data.&lt;/p&gt;
&lt;h3&gt;
  
  
  Data Extraction
&lt;/h3&gt;

&lt;p&gt;Using Python and its text-manipulation libraries, we extracted recipes from TSV databases:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;data_extractor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;df_ingredients&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;df_steps&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;list_cell&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
   &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cell&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt; 
      &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cell&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s"&gt;'nan'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
         &lt;span class="n"&gt;list_cell&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cell&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; 
      &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
         &lt;span class="n"&gt;list_cell&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;df_ingredients&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;df_steps&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt; 
   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;list_cell&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
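&lt;p&gt;To make the fallback explicit: when a recipe's full text is missing (NaN), we rebuild it by joining its ingredients and steps. A minimal sketch with plain Python lists (the sample data below is hypothetical, not taken from our dataset):&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;import math

content = ["Pasta with tomato sauce. Boil, drain, serve.", math.nan]
ingredients = ["pasta, tomato", "flour, water"]
steps = ["boil; drain; serve", "mix; knead; bake"]

texts = []
for n, cell in enumerate(content):
    if isinstance(cell, str):
        texts.append((n, cell))
    else:
        # NaN cell: rebuild the text from ingredients + steps
        texts.append((n, ingredients[n] + '\n' + steps[n]))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;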



&lt;h3&gt;
  
  
  Data Cleaning
&lt;/h3&gt;

&lt;p&gt;Then we cleaned the data with a couple of regular expressions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;import re

def clean_recipe(recipe, regex_list):
   recipe = recipe.lower()
   for (pattern, replacement) in regex_list:
      recipe = re.sub(pattern, replacement, recipe)
   return recipe
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
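&lt;p&gt;Here regex_list is a sequence of (pattern, replacement) pairs applied in order. The pairs below are illustrative, not the exact list we used:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;import re

regex_list = [
    (r'\d+\s*(g|kg|ml|cl)\b', ''),   # strip quantities such as "250 g"
    (r'\([^)]*\)', ''),              # drop parenthetical notes
    (r'\s{2,}', ' '),                # collapse repeated whitespace
]

recipe = "Mix 250 g flour (sifted) with  water.".lower()
for pattern, replacement in regex_list:
    recipe = re.sub(pattern, replacement, recipe)

print(recipe)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;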



&lt;h3&gt;
  
  
  Data Preprocessing

&lt;p&gt;Finally, we 1) tokenized and 2) POS-tagged the data with NLTK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;import nltk

def tokenize(recipe):
   sentences = nltk.sent_tokenize(recipe, language="english")
   # word-level tokenization, one token list per sentence
   tokens = [nltk.word_tokenize(sentence, language="english") for sentence in sentences]
   return tokens
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;





&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;def pos_tagger(recipe):
   tokenized_text = tokenize(clean_recipe(recipe, regex_list))
   tagged_tokens = [[str(tag_token[0]).lower() + "\t" + str(tag_token[1])
                     for tag_token in nltk.pos_tag(tokens)]
                    for tokens in tokenized_text]
   return tagged_tokens
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Curious about the output? Go on &lt;a href="https://medium.com/@condenastitaly/when-food-meets-ai-the-smart-recipe-project-2cf5ecc8c2be?sk=bfd64dbe34693083897a18b1ae7c07e0"&gt;Medium to read the complete article&lt;/a&gt; and find out more about the most appetizing stages of our work.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;When Food meets AI: the Smart Recipe Project&lt;/strong&gt;&lt;br&gt;
a series of 6 amazing articles&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Table of contents&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-mg2"&gt;Part 1: Cleaning and manipulating food data&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-29bf"&gt;Part 1: A smart method for tagging your datasets&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-2d6e"&gt;Part 2: NER for all tastes: extracting information from cooking recipes&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-21pg"&gt;Part 2: Neither fish nor fowl? Classify it with the Smart Ingredient Classifier&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-27pm"&gt;Part 3: FoodGraph: a graph database to connect recipes and food data&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/condenastitaly/when-food-meets-ai-the-smart-recipe-project-b3g"&gt;Part 3. FoodGraph: Loading data and Querying the graph with SPARQL&lt;/a&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>artificialintelligence</category>
      <category>food</category>
      <category>python</category>
    </item>
  </channel>
</rss>
