<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: crowintelligence</title>
    <description>The latest articles on DEV Community by crowintelligence (@crowintelligence).</description>
    <link>https://dev.to/crowintelligence</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F379928%2F9d1c9d1b-d654-4852-8a80-936d82156d81.jpg</url>
      <title>DEV Community: crowintelligence</title>
      <link>https://dev.to/crowintelligence</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/crowintelligence"/>
    <language>en</language>
    <item>
      <title>Graph Theory and Network Science for Natural Language Processing – Part 2, Databases and Analytics Engines </title>
      <dc:creator>crowintelligence</dc:creator>
      <pubDate>Tue, 30 Jun 2020 10:20:01 +0000</pubDate>
      <link>https://dev.to/crowintelligence/graph-theory-and-network-science-for-natural-language-processing-part-2-databases-and-analytics-engines-1942</link>
      <guid>https://dev.to/crowintelligence/graph-theory-and-network-science-for-natural-language-processing-part-2-databases-and-analytics-engines-1942</guid>
      <description>&lt;p&gt;From keyword extraction to knowledge graphs, graph and network science offer a good framework to deal with natural language. We love using graph-based methods in our work so much, like&lt;a href="https://crowintelligence.org/2020/03/27/what-if-you-need-more-labeled-data-label-spreading-and-propagation/"&gt; generating more labeled data&lt;/a&gt;, &lt;a href="https://crowintelligence.org/2020/03/20/from-babbling-to-talking-visualizing-language-acquisition/"&gt;visualizing language acquisition&lt;/a&gt; and &lt;a href="https://crowintelligence.org/2020/04/03/the-marriage-of-artificial-intelligence-and-art/"&gt;shedding light on hidden biases in language&lt;/a&gt;, that we decided to start a series on the topic. &lt;a href="https://crowintelligence.org/2020/06/22/graph-theory-and-network-science-for-natural-language-processing-part-1/"&gt;The first part&lt;/a&gt; explored the theoretical background of network science and dealt with graphs using Python. This part focuses on graph processing frameworks and graph databases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why do we need graph databases and frameworks?
&lt;/h2&gt;

&lt;p&gt;The question may seem naive to everyone but newbies. Keep in mind that at some point:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your data doesn’t fit into your computer’s memory.&lt;/li&gt;
&lt;li&gt;Data processing lasts for ages even if you use parallelization techniques.&lt;/li&gt;
&lt;li&gt;Working with csv, json, parquet or any other file format becomes too complicated.&lt;/li&gt;
&lt;li&gt;You must manage your data, because it is changing over time.&lt;/li&gt;
&lt;li&gt;You need to process your data frequently to answer various questions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As we mentioned in the first part of the series, NetworkX is not good at handling large networks, i.e. ones with more than about 100,000 nodes, though it really depends on the structure of the network. If you work with a large dataset, you need two tools: one for processing it (e.g. to compute centrality measures or find clusters) and another for storing it and running analytic queries on it (e.g. finding the shortest path between two nodes, or listing all nodes that can be reached from a given node within five or fewer steps).&lt;/p&gt;
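&lt;p&gt;To make that second kind of query concrete, here is a minimal, dependency-free Python sketch of the "all nodes within five or fewer steps" question. The toy graph is made up for illustration; a real graph database would answer this with a traversal query instead of hand-written BFS.&lt;/p&gt;

```python
def nodes_within_k_steps(adjacency, start, k):
    """Collect every node reachable from `start` in at most k steps (BFS)."""
    seen = {start}
    frontier = [start]
    for _ in range(k):          # expand the frontier k times
        next_frontier = []
        for node in frontier:
            for neighbor in adjacency.get(node, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    seen.discard(start)
    return seen

# A toy graph: node mapped to its neighbors
graph = {
    "alice": ["bob"],
    "bob": ["carol"],
    "carol": ["dave"],
    "dave": ["eve"],
    "eve": ["frank"],
    "frank": [],
}

reachable = nodes_within_k_steps(graph, "alice", 5)
```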

&lt;h2&gt;
  
  
  Graph Databases
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--KMsBo0-s--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/crowintelligence.org/wp-content/uploads/2020/06/Gremlin_programming_language.png%3Fw%3D800%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KMsBo0-s--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/crowintelligence.org/wp-content/uploads/2020/06/Gremlin_programming_language.png%3Fw%3D800%26ssl%3D1" alt=""&gt;&lt;/a&gt;Source: &lt;a href="https://en.wikipedia.org/wiki/Gremlin_(query_language)#/media/File:Gremlin_(programming_language).png"&gt;https://en.wikipedia.org/wiki/Gremlin_(query_language)#/media/File:Gremlin_(programming_language).png&lt;/a&gt;The landscape of graph databases is huge and complicated. Read &lt;a href="https://graphaware.com/graphaware/2020/02/17/graph-technology-landscape-2020.html"&gt;this post&lt;/a&gt; if you want to get a systematic overview of it. We have a very opinionated position on graph databases: we like open source and open standards, so we favor graph databases that support the &lt;a href="http://tinkerpop.apache.org/gremlin.html"&gt;Gremlin&lt;/a&gt; graph traversal machine and language. The Gremlin language supports host language embedding, so you can use it from your own language in a very idiomatic way.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--cAByrRnI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/crowintelligence.org/wp-content/uploads/2020/06/Bechberger-GD-MEAP-HI.jpg%3Fw%3D800%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--cAByrRnI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/crowintelligence.org/wp-content/uploads/2020/06/Bechberger-GD-MEAP-HI.jpg%3Fw%3D800%26ssl%3D1" alt=""&gt;&lt;/a&gt;Source: &lt;a href="https://images.manning.com/book/b/7825565-46a5-4846-b899-a0dfb64e54bb/Bechberger-GD-MEAP-HI.png"&gt;https://images.manning.com/book/b/7825565-46a5-4846-b899-a0dfb64e54bb/Bechberger-GD-MEAP-HI.png&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you want to learn more about what graph databases offer, how to model your data and what kinds of queries can be run on such databases, read &lt;em&gt;Graph Databases in Action&lt;/em&gt; by Bechberger and Perryman – it’s freely available on &lt;a href="https://livebook.manning.com/book/graph-databases-in-action/welcome/v-9/"&gt;its website&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--DLelGuYX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/crowintelligence.org/wp-content/uploads/2020/06/janusgraph.png%3Fw%3D800%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--DLelGuYX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/crowintelligence.org/wp-content/uploads/2020/06/janusgraph.png%3Fw%3D800%26ssl%3D1" alt=""&gt;&lt;/a&gt;Source: &lt;a href="https://janusgraph.org/img/janusgraph.png"&gt;https://janusgraph.org/img/janusgraph.png&lt;/a&gt;There are countless graph databases, but we especially love &lt;a href="https://janusgraph.org/"&gt;JanusGraph&lt;/a&gt;. It is 100% open source and, in our experience, it works fine, though it is not perfect.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bRfc3jIW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/crowintelligence.org/wp-content/uploads/2020/06/neo4j.png%3Ffit%3D800%252C322%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bRfc3jIW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/crowintelligence.org/wp-content/uploads/2020/06/neo4j.png%3Ffit%3D800%252C322%26ssl%3D1" alt=""&gt;&lt;/a&gt;Source: &lt;a href="https://en.wikipedia.org/wiki/Neo4j#/media/File:Neo4j-2015-logo.png"&gt;https://en.wikipedia.org/wiki/Neo4j#/media/File:Neo4j-2015-logo.png&lt;/a&gt;&lt;a href="https://neo4j.com/"&gt;Neo4j&lt;/a&gt; is probably the most comprehensive and most advanced graph database, and it is widely used in the industry. We think it is superior to the others, but it is not fully open. Of course, you can use its community edition for learning and testing. It also supports Gremlin, so it is a good choice to work with.&lt;/p&gt;

&lt;h2&gt;
  
  
  Graph Frameworks – really it’s just Spark
&lt;/h2&gt;

&lt;p&gt;All graph processing frameworks build on &lt;a href="https://dl.acm.org/doi/pdf/10.1145/1807167.1807184"&gt;a paper from Google&lt;/a&gt; that describes its internal system for large-scale graph processing. The system is called Pregel, after the river of Königsberg, and yes, this is the river with those seven bridges.&lt;/p&gt;
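&lt;p&gt;The core idea of Pregel is vertex-centric computation in supersteps: each vertex sends messages along its edges, then updates its own value from the messages it received. The following is a drastically simplified, single-machine Python sketch of that model (the graph and parameter values are made up; a real Pregel job runs distributed across many workers), using PageRank as the classic example:&lt;/p&gt;

```python
def pregel_pagerank(out_edges, supersteps=20, damping=0.85):
    """Vertex-centric PageRank in the Pregel style: in every superstep each
    vertex emits its rank split over its out-edges, then recomputes its
    rank from the incoming messages."""
    n = len(out_edges)
    rank = {v: 1.0 / n for v in out_edges}
    for _ in range(supersteps):
        # message-passing phase: each vertex sends rank/outdegree to neighbors
        inbox = {v: 0.0 for v in out_edges}
        for v, targets in out_edges.items():
            share = rank[v] / max(len(targets), 1)
            for t in targets:
                inbox[t] += share
        # compute phase: each vertex updates its value from its inbox
        rank = {v: (1 - damping) / n + damping * inbox[v] for v in out_edges}
    return rank

# A tiny made-up graph: vertex mapped to the vertices it links to
edges = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pregel_pagerank(edges)
```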

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--pUVEwhxU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/crowintelligence.org/wp-content/uploads/2020/06/Apache_Spark_logo.svg_.png%3Ffit%3D800%252C416%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--pUVEwhxU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/crowintelligence.org/wp-content/uploads/2020/06/Apache_Spark_logo.svg_.png%3Ffit%3D800%252C416%26ssl%3D1" alt=""&gt;&lt;/a&gt;Source: &lt;a href="https://upload.wikimedia.org/wikipedia/commons/f/f3/Apache_Spark_logo.svg"&gt;https://upload.wikimedia.org/wikipedia/commons/f/f3/Apache_Spark_logo.svg&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are three major frameworks for graph processing: &lt;a href="https://hadoop.apache.org/"&gt;Apache Hadoop&lt;/a&gt;, &lt;a href="https://giraph.apache.org/"&gt;Apache Giraph&lt;/a&gt; and &lt;a href="https://spark.apache.org/"&gt;Apache Spark&lt;/a&gt;. Apache Giraph is the only one built solely for graph processing. Sadly, it is neither actively maintained nor well documented. Hadoop and Spark are big data analytics engines with graph processing capabilities. These days Spark seems to be the more popular of the two, at least among data scientists.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--E9fbgSB5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/crowintelligence.org/wp-content/uploads/2020/06/graphx_logo.png%3Fw%3D800%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--E9fbgSB5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/crowintelligence.org/wp-content/uploads/2020/06/graphx_logo.png%3Fw%3D800%26ssl%3D1" alt=""&gt;&lt;/a&gt;Source: &lt;a href="https://spark.apache.org/docs/latest/img/graphx_logo.png"&gt;https://spark.apache.org/docs/latest/img/graphx_logo.png&lt;/a&gt;&lt;a href="https://spark.apache.org/graphx/"&gt;GraphX&lt;/a&gt; is the graph and parallel computing API of Spark. Although it is far from being a perfect tool, it is widely used in the industry, very robust, and well supported by documentation and a large user base.&lt;/p&gt;

&lt;h2&gt;
  
  
  OK, but how are these things used in NLP/ML?
&lt;/h2&gt;

&lt;p&gt;Deep learning is the sexiest thing on earth these days, but it needs lots of data. Google uses its Pregel system to feed its algorithms in a semi-supervised way. &lt;a href="https://arxiv.org/pdf/1512.01752.pdf"&gt;This paper&lt;/a&gt; explains how Pregel powers a kind of &lt;a href="https://crowintelligence.org/2020/03/27/what-if-you-need-more-labeled-data-label-spreading-and-propagation/"&gt;label spreading&lt;/a&gt; method to boost training data. Such a system was used to train the Smart Reply feature of Gmail, and it helped to improve Google’s sentiment analyzer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--af2Svl6x--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/crowintelligence.org/wp-content/uploads/2020/06/1_64AZ80NoAO8wH1RVGToSKg.png%3Ffit%3D800%252C271%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--af2Svl6x--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/crowintelligence.org/wp-content/uploads/2020/06/1_64AZ80NoAO8wH1RVGToSKg.png%3Ffit%3D800%252C271%26ssl%3D1" alt=""&gt;&lt;/a&gt;Source: &lt;a href="https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcT46T0yTqDOiXRqTY3Fts9LRYwcBKIgAZ29UQ&amp;amp;usqp=CAU"&gt;https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcT46T0yTqDOiXRqTY3Fts9LRYwcBKIgAZ29UQ&amp;amp;usqp=CAU&lt;/a&gt;Graph databases can be used for various tasks, but Knowledge Graphs are the most well-known examples. Historically, Google developed its Knowledge Graph service to enhance its search results with factual information on the basis of Freebase, a semantic database. Now the name of the service is a synonym for semantic databases. Building knowledge graphs is a very common NLP task in the industry. E.g. by using a named entity recognizer you can build a very simple one based on the co-occurrence of entities, or you can go a step further and use relation mining to determine the type of the connection between the co-occurring entities. Read &lt;a href="https://www.analyticsvidhya.com/blog/2019/10/how-to-build-knowledge-graph-text-using-spacy/"&gt;this post&lt;/a&gt; to see a simple example of building a knowledge graph from unstructured text. The knowledge graph is usually stored in a graph database. Graph analytics is used to enrich the data with centrality measures, cluster memberships, and other metrics. Graph analytics also helps to filter out unwanted data points.&lt;/p&gt;
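&lt;p&gt;The co-occurrence step mentioned above can be sketched in a few lines of pure Python. Here we assume a named entity recognizer has already been run, so the input is simply the list of entities found in each sentence; the entity names below are made up for illustration, and in practice the resulting weighted edges would be loaded into a graph database.&lt;/p&gt;

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(entity_sentences):
    """Count how often each pair of entities appears in the same sentence.
    Each input item is the list of entities an NER system found in one
    sentence; the result maps sorted entity pairs to edge weights."""
    weights = Counter()
    for entities in entity_sentences:
        # deduplicate within a sentence, sort so (a, b) and (b, a) merge
        for a, b in combinations(sorted(set(entities)), 2):
            weights[(a, b)] += 1
    return weights

# Hypothetical NER output for three sentences
sentences = [
    ["Google", "Freebase"],
    ["Google", "Freebase", "Knowledge Graph"],
    ["Google", "Knowledge Graph"],
]
edges = cooccurrence_edges(sentences)
```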

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LaEGtvEH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/crowintelligence.org/wp-content/uploads/2020/06/OReilly-Graph-Algorithms_v2_ol1.jpg%3Fw%3D800%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LaEGtvEH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/crowintelligence.org/wp-content/uploads/2020/06/OReilly-Graph-Algorithms_v2_ol1.jpg%3Fw%3D800%26ssl%3D1" alt=""&gt;&lt;/a&gt;Source: &lt;a href="https://dist.neo4j.com/wp-content/uploads/20190326120839/OReilly-Graph-Algorithms_v2_ol1.jpg"&gt;https://dist.neo4j.com/wp-content/uploads/20190326120839/OReilly-Graph-Algorithms_v2_ol1.jpg&lt;/a&gt;&lt;em&gt;Graph Algorithms: Practical Examples in Apache Spark and Neo4j&lt;/em&gt; by Needham and Hodler is full of great examples of using graph analytics and graph databases. You can download it for free after filling out a form &lt;a href="https://neo4j.com/graph-algorithms-book/"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;No one works alone in the real world. Data engineers tend to provide data scientists with the necessary infrastructure. So you don’t have to become an expert in graph databases and processing frameworks, but you should know enough to work with your peers and communicate with them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s coming up next?
&lt;/h2&gt;

&lt;p&gt;If you are interested in this topic, we have good news. Alessandro Negro of GraphAware, author of &lt;a href="https://www.manning.com/books/graph-powered-machine-learning"&gt;Graph-Powered Machine Learning&lt;/a&gt;, will speak about &lt;em&gt;Using Knowledge Graphs to predict customer needs, improve product quality and save costs&lt;/em&gt; at our upcoming meetup. He will also present a demo, &lt;em&gt;Fighting corona virus with Knowledge Graph and Hume&lt;/em&gt;. &lt;a href="https://www.meetup.com/Hungarian-nlp/events/271201765/"&gt;Register here&lt;/a&gt; to attend the online event, or watch the recorded talk later on our &lt;a href="https://www.youtube.com/channel/UCPDpTte5_zC9IX-iv8UnNqA"&gt;YouTube channel&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In the third part of this blog series we will introduce the open source tools to visualize smallish and large graphs. Stay tuned!&lt;/p&gt;

&lt;h2&gt;
  
  
  Subscribe to our newsletter
&lt;/h2&gt;

&lt;p&gt;Get highlights on NLP, AI, and applied cognitive science straight into your inbox.&lt;/p&gt;


&lt;p&gt;&lt;a href="https://tinyletter.com"&gt;powered by TinyLetter&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="http://creativecommons.org/licenses/by-nc-sa/4.0/"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XPRdnNRf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/i.creativecommons.org/l/by-nc-sa/4.0/88x31.png%3Fw%3D800%26ssl%3D1" alt="Creative Commons License"&gt;&lt;/a&gt;&lt;br&gt;
This work is licensed under a &lt;a href="http://creativecommons.org/licenses/by-nc-sa/4.0/"&gt;Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>networkscience</category>
      <category>graphtheory</category>
      <category>spark</category>
      <category>janusgraph</category>
    </item>
    <item>
      <title>Graph Theory and Network Science for Natural Language Processing - Part 1 </title>
      <dc:creator>crowintelligence</dc:creator>
      <pubDate>Tue, 30 Jun 2020 10:18:48 +0000</pubDate>
      <link>https://dev.to/crowintelligence/graph-theory-and-network-science-for-natural-language-processing-part-1-32i0</link>
      <guid>https://dev.to/crowintelligence/graph-theory-and-network-science-for-natural-language-processing-part-1-32i0</guid>
      <description>&lt;p&gt;From keyword extraction to knowledge graphs, graph and network science offer a good framework to deal with natural language. We love using graph-based methods in our work, like &lt;a href="https://crowintelligence.org/2020/03/27/what-if-you-need-more-labeled-data-label-spreading-and-propagation/"&gt;generating more labeled data&lt;/a&gt;, &lt;a href="https://crowintelligence.org/2020/03/20/from-babbling-to-talking-visualizing-language-acquisition/"&gt;visualizing language acquisition&lt;/a&gt; and &lt;a href="https://crowintelligence.org/2020/04/03/the-marriage-of-artificial-intelligence-and-art/"&gt;shedding light on hidden biases in language&lt;/a&gt;. This series gives you tips on how to get started with graph and network theory, which Python tools to use, where to look for graph databases and how to visualize networks, finally we offer a few resources on Graph Neural Networks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Graph Theory and Network Science
&lt;/h2&gt;

&lt;p&gt;First of all, one might ask what the difference is between Graph Theory and Network Science. We argue that there is no sharp boundary between the two fields. It seems that NLP practitioners tend to prefer graphs to networks, while cognitive scientists and AI researchers tend to have the reverse preference. We’ll be sloppy and use the two terms interchangeably here. But for the sake of those who stick to the separation of the two fields, let’s see how Wikipedia defines them:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Network science is an academic field which studies complex networks such as telecommunication networks, computer networks, biological networks, cognitive and semantic networks, and social networks, considering distinct elements or actors represented by nodes (or vertices) and the connections between the elements or actors as links (or edges). “&lt;/p&gt;

&lt;p&gt;Wikipedia: &lt;a href="https://en.wikipedia.org/wiki/Network_science"&gt;Network Science&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;“In mathematics, graph theory is the study of &lt;em&gt;graphs&lt;/em&gt;, which are mathematical structures used to model pairwise relations between objects. “&lt;/p&gt;

&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Graph_theory"&gt;Wikipedia: Graph theory&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Graphs and networks are versatile fields on their own. Here we focus on the very basics of the theory behind them. For the practical parts, we only deal with resources available to Pythonistas.&lt;/p&gt;

&lt;h2&gt;
  
  
  Theoretical Background
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LpwbRT9F--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/crowintelligence.org/wp-content/uploads/2020/06/Truedeau-1.jpg%3Ffit%3D705%252C1024%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LpwbRT9F--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/crowintelligence.org/wp-content/uploads/2020/06/Truedeau-1.jpg%3Ffit%3D705%252C1024%26ssl%3D1" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Richard J. Trudeau’s &lt;em&gt;Introduction to Graph Theory&lt;/em&gt; is a short, cheap, and accessible introduction into the field. It is a math classic from Dover, containing just enough material to get started with graphs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--BjTSozFE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/crowintelligence.org/wp-content/uploads/2020/06/barabasi.png%3Fw%3D800%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--BjTSozFE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/crowintelligence.org/wp-content/uploads/2020/06/barabasi.png%3Fw%3D800%26ssl%3D1" alt=""&gt;&lt;/a&gt;Barabási, Network Science&lt;/p&gt;

&lt;p&gt;&lt;a href="http://networksciencebook.com/"&gt;&lt;em&gt;Network Science&lt;/em&gt;&lt;/a&gt; by Albert-László Barabási is a comprehensive, freely available textbook. It can be used as a reference work to look up the gritty nitty details of network theory from time to time. Don’t be scared by the long chapters of the book. To understand graph-based NLP, you don’t need the second half of it (from chapter 6).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--34-ud3Dw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/crowintelligence.org/wp-content/uploads/2020/06/graphnlp.jpg%3Ffit%3D708%252C1024%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--34-ud3Dw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/crowintelligence.org/wp-content/uploads/2020/06/graphnlp.jpg%3Ffit%3D708%252C1024%26ssl%3D1" alt=""&gt;&lt;/a&gt;&lt;em&gt;Graph-Based Natural Language Processing and Information Retrieval&lt;/em&gt; by Mihalcea and Radev is a short (less than 190 pages) yet comprehensive book. The authors are top-tier researchers in their field; their TextRank algorithm is one of the best unsupervised algorithms for keyword extraction and extractive summarization. The book gives you a comprehensive overview of graph-based methods in NLP. You can use it as a textbook as well as a reference work. Its first and second chapters (which are devoted to Graph Theory and Graph Based Algorithms respectively) are not suitable for complete beginners. Instead, we recommend Trudeau’s and Barabási’s books to learn the basics of graph theory and network science. If you want to learn more about graph algorithms, read &lt;a href="https://crowintelligence.org/2020/03/09/so-you-wanna-learn-algorithms/"&gt;our post on resources to learn the basics of algorithms&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Python way
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xGRKhYn3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/crowintelligence.org/wp-content/uploads/2020/06/networkx_logo.jpg%3Fw%3D800%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xGRKhYn3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/crowintelligence.org/wp-content/uploads/2020/06/networkx_logo.jpg%3Fw%3D800%26ssl%3D1" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Although there is a plethora of network packages, &lt;a href="https://networkx.github.io/"&gt;NetworkX&lt;/a&gt; stands out as one of the most comprehensive Python packages, and it has an active group of maintainers. It is awesome for small and medium-sized networks of up to about 100,000 nodes. Check out &lt;a href="https://www.timlrx.com/2020/05/10/benchmark-of-popular-graph-network-packages-v2/"&gt;this post&lt;/a&gt; benchmarking all major graph libraries to select the one that best suits your needs. Unless you have a very specific problem, we strongly recommend using NetworkX. If your network is too large, you should use a graph processing framework to analyze it. You’ll also need a graph database to store it and run analytic queries on it. We’ll cover these topics in the second part of this series. Now let’s turn back to Python tools.&lt;/p&gt;
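&lt;p&gt;To give a feel for NetworkX, here is a tiny sketch on a made-up word co-occurrence graph: build the graph, find the best-connected term via degree centrality, and run a shortest-path query. The node names are invented for illustration.&lt;/p&gt;

```python
import networkx as nx

# A toy word co-occurrence network (made-up data)
g = nx.Graph()
g.add_edges_from([
    ("graph", "theory"),
    ("graph", "network"),
    ("graph", "database"),
    ("network", "science"),
])

centrality = nx.degree_centrality(g)       # normalized degree for every node
hub = max(centrality, key=centrality.get)  # the best-connected term
path = nx.shortest_path(g, "theory", "science")
```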

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0GYNNgfZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/crowintelligence.org/wp-content/uploads/2020/06/sna.png%3Fw%3D800%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0GYNNgfZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/crowintelligence.org/wp-content/uploads/2020/06/sna.png%3Fw%3D800%26ssl%3D1" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Social Network Analysis for Startups&lt;/em&gt; by Tsvetovat and Kouznetsov is a fantastic book despite its misleading title. It is a practical introduction to graph theory/network science and social &lt;a href="https://en.wikipedia.org/wiki/Social_network_analysis"&gt;network analysis&lt;/a&gt; using Python. The chapters follow each other in a logical manner, the examples are really good, and the explanations are superb. The only problem with this book is its age: having been published in 2011, it requires you to adapt the example code to present-day versions of Python, matplotlib, NetworkX and other tools.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--KC1udZCY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/crowintelligence.org/wp-content/uploads/2020/06/51flQ-bF8L._SX415_BO1204203200_.jpg%3Fw%3D800%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KC1udZCY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/crowintelligence.org/wp-content/uploads/2020/06/51flQ-bF8L._SX415_BO1204203200_.jpg%3Fw%3D800%26ssl%3D1" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Complex Network Analysis in Python&lt;/em&gt; by Zinoviev is a more recent title. It uses NetworkX to teach network science in a pragmatic manner. The first part deals with the basics, the second is devoted to classic explicit networks. The third and fourth parts are rare gems: they deal with creating networks based on co-occurrence and similarity, topics which are hardly found in other sources! The last part is devoted to directed networks, though sadly it contains only one chapter.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--MWJtS5-f--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/crowintelligence.org/wp-content/uploads/2020/06/Negro-GP-MEAP-HI.jpg%3Fw%3D800%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--MWJtS5-f--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/crowintelligence.org/wp-content/uploads/2020/06/Negro-GP-MEAP-HI.jpg%3Fw%3D800%26ssl%3D1" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you are interested in graph-based methods in machine learning in general, &lt;em&gt;Graph-Powered Machine Learning&lt;/em&gt; by Alessandro Negro is the best resource to use. It is freely available &lt;a href="https://livebook.manning.com/book/graph-powered-machine-learning/welcome/v-4/"&gt;here&lt;/a&gt;. By the way, Alessandro will speak at our meetup soon. &lt;a href="https://www.meetup.com/Hungarian-nlp/events/271201765/"&gt;Register here&lt;/a&gt; to attend the online event, or you can watch the recorded talk later on our &lt;a href="https://www.youtube.com/channel/UCPDpTte5_zC9IX-iv8UnNqA"&gt;YouTube channel&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s coming up next?
&lt;/h2&gt;

&lt;p&gt;We hope you enjoyed our journey in the world of graphs and networks. In the next part, we will collect the best resources on graph analytics frameworks and graph databases. The third part will be devoted to visualization. Stay tuned!&lt;/p&gt;

&lt;h2&gt;
  
  
  Subscribe to our newsletter
&lt;/h2&gt;

&lt;p&gt;Get highlights on NLP, AI, and applied cognitive science straight into your inbox.&lt;/p&gt;


&lt;p&gt;&lt;a href="https://tinyletter.com"&gt;powered by TinyLetter&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="http://creativecommons.org/licenses/by-nc-sa/4.0/"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XPRdnNRf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/i.creativecommons.org/l/by-nc-sa/4.0/88x31.png%3Fw%3D800%26ssl%3D1" alt="Creative Commons License"&gt;&lt;/a&gt;&lt;br&gt;
This work is licensed under a &lt;a href="http://creativecommons.org/licenses/by-nc-sa/4.0/"&gt;Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>networkscience</category>
      <category>graphtheory</category>
      <category>python</category>
      <category>datascience</category>
    </item>
    <item>
      <title>How to fuel your data-driven business with text data? – Part 2, Strategies and Tools</title>
      <dc:creator>crowintelligence</dc:creator>
      <pubDate>Tue, 30 Jun 2020 10:16:37 +0000</pubDate>
      <link>https://dev.to/crowintelligence/how-to-fuel-your-data-driven-business-with-text-data-part-2-strategies-and-tools-3hb4</link>
      <guid>https://dev.to/crowintelligence/how-to-fuel-your-data-driven-business-with-text-data-part-2-strategies-and-tools-3hb4</guid>
      <description>&lt;p&gt;If data is the new oil, then getting and enriching data is like fracking and refining it, at least in the case of textual data. Our &lt;a href="https://crowintelligence.org/2020/06/11/how-to-fuel-your-data-driven-business-with-text-data/"&gt;previous post&lt;/a&gt; introduced the basic idea of data gathering and annotation. Now we help you with the strategies and tools you can employ to fuel your algorithms.&lt;/p&gt;

&lt;p&gt;Both data gathering and annotation are complex enterprises. Think carefully about who you trust to carry out these tasks and which tools you employ. Let’s see our tips on data gathering and annotation strategies and tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data gathering options
&lt;/h2&gt;

&lt;h3&gt;
  
  
  In-house solution
&lt;/h3&gt;

&lt;p&gt;As we mentioned in the &lt;a href="https://crowintelligence.org/2020/06/11/how-to-fuel-your-data-driven-business-with-text-data/"&gt;first part &lt;/a&gt;of this series, data gathering should be thought of as a process. That’s why most of our clients want to build their in-house capabilities. This way they can be flexible and react very fast to changes in the requirements. If one goes for the in-house solution, there are plenty of tools to use. Our favorite one is &lt;a href="https://scrapy.org/"&gt;Scrapy&lt;/a&gt;, the lingua franca of scraping and crawling the web. It is a mature and well-maintained Python framework with excellent documentation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jlG7G6sP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/crowintelligence.org/wp-content/uploads/2020/05/scrapy.jpg%3Ffit%3D800%252C321%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jlG7G6sP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/crowintelligence.org/wp-content/uploads/2020/05/scrapy.jpg%3Ffit%3D800%252C321%26ssl%3D1" alt="" width="800" height="321"&gt;&lt;/a&gt;Source: &lt;a href="https://miro.medium.com/max/1200/1*YJNS0JVl7RsVDTmORGZ6xA.png"&gt;https://miro.medium.com/max/1200/1*YJNS0JVl7RsVDTmORGZ6xA.png&lt;/a&gt;You can learn the basics of Scrapy and web scraping within a short time. A few minutes of googling will provide you with excellent tutorials. Our favorite resource is Mitchell’s &lt;em&gt;Web Scraping with Python&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RDWXR_x7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/crowintelligence.org/wp-content/uploads/2020/05/lrg.jpg%3Fw%3D800%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RDWXR_x7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/crowintelligence.org/wp-content/uploads/2020/05/lrg.jpg%3Fw%3D800%26ssl%3D1" alt="" width="500" height="656"&gt;&lt;/a&gt;Source: &lt;a href="https://covers.oreillystatic.com/images/0636920078067/lrg.jpg"&gt;https://covers.oreillystatic.com/images/0636920078067/lrg.jpg&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If your company is not a Python shop and/or you are interested in other technologies, take a look at &lt;a href="http://nutch.apache.org/"&gt;Apache Nutch&lt;/a&gt; and &lt;a href="https://github.com/USCDataScience/sparkler"&gt;sparkler&lt;/a&gt;, which is a Spark-based crawler.&lt;/p&gt;

&lt;p&gt;No matter which tool you use, you’ll have to manage your scrapers and the infrastructure around them. Your DevOps team should be prepared for this! You can also go for cloud solutions, e.g. Scrapinghub’s Scrapy Cloud.&lt;/p&gt;

&lt;h3&gt;
  
  
  Outsourcing
&lt;/h3&gt;

&lt;p&gt;Web scraping and crawling seem like easy tasks. Google has been doing them for ages! That’s only partly true: Google can do it by employing an army of developers and running probably the largest hardware infrastructure in the world. We have learned from our own mistakes that scraping is not that simple. We’ve already discussed the problem of modern JavaScript frameworks and locked sites. There are sites that ban a particular IP address after a certain number of requests, so it is a good tactic to &lt;a href="https://www.scrapehero.com/how-to-rotate-proxies-and-ip-addresses-using-python-3/"&gt;rotate your IP address&lt;/a&gt;. Sites change constantly, so scrapers must be maintained if you need up-to-date data.&lt;/p&gt;
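&lt;p&gt;The linked article covers proxy rotation in detail; as a rough sketch, it boils down to cycling through a pool and attaching the next proxy to each request. The proxy addresses below are placeholders and the &lt;code&gt;fetch&lt;/code&gt; helper is hypothetical; it only shows where a real HTTP call would go.&lt;/p&gt;

```python
from itertools import cycle

# Hypothetical pool of proxy addresses; in practice these come from a
# paid proxy provider or your own fleet.
PROXIES = ["http://10.0.0.1:8080", "http://10.0.0.2:8080", "http://10.0.0.3:8080"]
_pool = cycle(PROXIES)

def next_proxy():
    """Return the next proxy in round-robin order."""
    return next(_pool)

def fetch(url):
    # Sketch only: with the `requests` library this would be
    # requests.get(url, proxies={"http": proxy, "https": proxy}).
    proxy = next_proxy()
    return f"GET {url} via {proxy}"

print(fetch("https://example.com/page1"))  # via 10.0.0.1
print(fetch("https://example.com/page2"))  # via 10.0.0.2
```

&lt;p&gt;Production setups also retire proxies that get banned and add per-proxy rate limiting, but the round-robin core stays the same.&lt;/p&gt;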

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--K0qDRh5f--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/crowintelligence.org/wp-content/uploads/2020/06/Mistakes-to-avoid-when-hiring-freelancers-1.jpg%3Ffit%3D800%252C480%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--K0qDRh5f--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/crowintelligence.org/wp-content/uploads/2020/06/Mistakes-to-avoid-when-hiring-freelancers-1.jpg%3Ffit%3D800%252C480%26ssl%3D1" alt="" width="800" height="480"&gt;&lt;/a&gt;Source: &lt;a href="https://commons.wikimedia.org/wiki/File:Mistakes-to-avoid-when-hiring-freelancers.jpg"&gt;https://commons.wikimedia.org/wiki/File:Mistakes-to-avoid-when-hiring-freelancers.jpg&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you want to get your data from many sources, and you want to update it on a regular basis, you need to manage your scrapers. There are companies specialized in exactly such tasks! Scraping and crawling are highly specialized skills, and most companies don’t need employees with such skills all the time. Chances are high that the easiest way to collect your data is not to compete with these specialized companies for developers, but to become their client. Of course, there are plenty of firms offering similar solutions. To find the most suitable one, don’t forget that Google is your friend!&lt;/p&gt;

&lt;h3&gt;
  
  
  Crowdsourcing, employing an army of developers
&lt;/h3&gt;

&lt;p&gt;As another option, you can look for a specialist on big freelancer sites who can write or update a specific scraper for you. This is the crowdsourcing solution. By splitting up data gathering into small tasks, you can dramatically reduce your costs. You can group the sites into work packages, or treat one site as one job, and post them on freelancer sites. However, this option means more administrative work for you. You need to manage your contractors and constantly check the quality of their work. It also presupposes a robust architecture for managing and deploying the scrapers/crawlers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Annotation options
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The importance of annotation
&lt;/h3&gt;

&lt;p&gt;Why do we need annotation? The industry usually uses &lt;a href="https://machinelearningmastery.com/supervised-and-unsupervised-machine-learning-algorithms/"&gt;supervised&lt;/a&gt; algorithms, which need labeled or annotated data. Raw data has to be cleaned and labeled before it can fuel any training algorithm. When you read about &lt;a href="https://www.infoworld.com/article/3228245/the-80-20-data-science-dilemma.html"&gt;the 80% rule in data science&lt;/a&gt;, the articles usually tell you that 80% of each project is spent on collecting, cleaning and reshaping data. In the case of projects involving textual data, this underestimates the effort: in our experience, even more time is needed to get your data right and annotated. We would say that &lt;strong&gt;90-95% of the time&lt;/strong&gt; should be devoted to gathering, cleaning, transforming and annotating your data. Sometimes even more.&lt;/p&gt;

&lt;p&gt;Regarding textual data, annotation can be carried out at different levels. A label can be given either to the whole text (e.g. its genre, like criminal news), to each sentence (e.g. whether the sentence expresses positive or negative sentiment), or to words/phrases (e.g. &lt;a href="https://en.wikipedia.org/wiki/Named_entity"&gt;Named Entities&lt;/a&gt; like names of persons, firms, institutions, etc.). The more data you have, the better your chances are to build a good model on it.&lt;/p&gt;
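&lt;p&gt;The three levels differ mainly in what the label attaches to. Here is a sketch of how one sentence might carry all three kinds of annotation; the field names are illustrative, not any particular tool’s actual schema.&lt;/p&gt;

```python
# Illustrative record showing the three annotation levels for one text.
# Field names are made up for the sketch; real tools (doccano, Prodigy, ...)
# each have their own export schema.
record = {
    "text": "Apple opened a new store in Berlin.",
    # Document-level label, e.g. the genre of the whole text.
    "doc_label": "business_news",
    # Sentence-level label, e.g. sentiment of each sentence.
    "sentence_labels": [{"sentence": 0, "label": "neutral"}],
    # Span-level labels: character offsets plus an entity type.
    "spans": [
        {"start": 0, "end": 5, "label": "ORG"},    # "Apple"
        {"start": 28, "end": 34, "label": "LOC"},  # "Berlin"
    ],
}

# Span offsets can be validated against the text itself.
for span in record["spans"]:
    print(record["text"][span["start"]:span["end"]], span["label"])
```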

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---ishdU-p--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/crowintelligence.org/wp-content/uploads/2020/06/zyxoairzm1z31.jpg%3Fw%3D800%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---ishdU-p--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/crowintelligence.org/wp-content/uploads/2020/06/zyxoairzm1z31.jpg%3Fw%3D800%26ssl%3D1" alt="" width="640" height="853"&gt;&lt;/a&gt;Heavily annotated text!&lt;br&gt;
Source: &lt;a href="https://www.reddit.com/r/step1/comments/dx6f8t/mistake_for_those_who_recently_started_preparing/"&gt;https://www.reddit.com/r/step1/comments/dx6f8t/mistake_for_those_who_recently_started_preparing/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Good annotation software makes it possible to upload texts in raw format, manage annotators, and define the annotation scheme, i.e. what kind of labels annotators can assign to texts or words. Annotators should be prepared for their tasks, which means they need some training and a guideline at hand during their work. It is a good quality assurance practice to have every item, or at least a certain percentage of the whole corpus, annotated by at least three annotators and measure their &lt;a href="https://en.wikipedia.org/wiki/Inter-rater_reliability"&gt;agreement&lt;/a&gt;. One can easily think that annotation is a tedious and very time consuming task – and it is! However, thanks to recent advances in the field of &lt;a href="https://www.datacamp.com/community/tutorials/active-learning"&gt;active learning&lt;/a&gt;, the costs and time horizon of annotation tasks can be dramatically reduced. (Read more on this topic in &lt;a href="https://crowintelligence.org/2020/06/04/active-learning-for-natural-language-processing/#more-1399"&gt;Robert Munro&lt;/a&gt;‘s book, &lt;a href="https://www.manning.com/books/human-in-the-loop-machine-learning"&gt;Human-in-the-Loop Machine Learning&lt;/a&gt;.) When planning your annotation strategy, keep all these issues in mind, no matter whether you build up an in-house solution, run your annotation tasks on crowdsourcing sites, or hire a specialist company.&lt;/p&gt;
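&lt;p&gt;For two annotators, agreement is commonly quantified with Cohen’s kappa: observed agreement corrected for the agreement expected by chance. The hand-rolled version below is a sketch; in practice you would reach for &lt;code&gt;sklearn.metrics.cohen_kappa_score&lt;/code&gt;, or a multi-annotator measure such as Krippendorff’s alpha for three or more annotators.&lt;/p&gt;

```python
from collections import Counter

def cohen_kappa(a, b):
    """Chance-corrected agreement between two annotators' label lists."""
    assert len(a) == len(b) and a
    n = len(a)
    # Observed agreement: fraction of items both annotators labeled the same.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement by chance, from each annotator's label distribution.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[lab] * cb[lab] for lab in ca) / (n * n)
    if p_e == 1.0:  # both annotators used a single identical label
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Perfect agreement -> 1.0; perfect disagreement on balanced labels -> -1.0
print(cohen_kappa(["pos", "pos", "neg", "neg"], ["pos", "pos", "neg", "neg"]))  # 1.0
print(cohen_kappa(["pos", "neg", "pos", "neg"], ["neg", "pos", "neg", "pos"]))  # -1.0
```

&lt;p&gt;A kappa well below your target (conventions vary by task, but many teams aim above roughly 0.7) is a signal to revise the guideline or retrain the annotators before scaling up.&lt;/p&gt;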

&lt;h3&gt;
  
  
  In-house solution
&lt;/h3&gt;

&lt;p&gt;If you’d like to keep the data annotation task within your organization, you’ll need a good annotation tool. You can find free, open source tools like &lt;a href="https://github.com/doccano/doccano"&gt;doccano&lt;/a&gt;. It doesn’t support active learning out of the box, so integrating it with an active learning library is a good task for your Python developers. The creators of spaCy made Prodigy, an annotation tool that supports active learning. It’s not free, but it is reasonably priced.&lt;/p&gt;
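&lt;p&gt;doccano exports annotations as JSON Lines, and a small loader like the one below is typically the first piece of that glue code. The exact field names (&lt;code&gt;text&lt;/code&gt;, &lt;code&gt;labels&lt;/code&gt;) vary by doccano version and project type, so treat them as assumptions and adjust to your own export.&lt;/p&gt;

```python
import json

# Sample lines in the shape of a doccano text-classification export.
# Field names vary by doccano version/project type; adjust to your export.
export = """\
{"text": "Great product, would buy again.", "labels": ["positive"]}
{"text": "Arrived broken and late.", "labels": ["negative"]}
"""

def load_annotations(lines):
    """Parse a JSONL export into (text, labels) pairs."""
    pairs = []
    for line in lines.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        pairs.append((record["text"], record["labels"]))
    return pairs

data = load_annotations(export)
print(data[0])  # ('Great product, would buy again.', ['positive'])
```

&lt;p&gt;From here, feeding the pairs into a classifier and pushing the most uncertain unlabeled items back into doccano gives you a basic active learning loop.&lt;/p&gt;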

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--nuNhXFDC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://i2.wp.com/crowintelligence.org/wp-content/uploads/2020/05/doccano.gif%3Fw%3D800%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--nuNhXFDC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://i2.wp.com/crowintelligence.org/wp-content/uploads/2020/05/doccano.gif%3Fw%3D800%26ssl%3D1" alt="" width="800" height="545"&gt;&lt;/a&gt;Source: &lt;a href="https://raw.githubusercontent.com/doccano/doccano/master/docs/images/demo/demo.gif"&gt;https://raw.githubusercontent.com/doccano/doccano/master/docs/images/demo/demo.gif&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now you have data and an annotation tool, so you are ready to plan your annotation task. Read &lt;em&gt;Natural Language Annotation for Machine Learning&lt;/em&gt; by Pustejovsky and Stubbs to learn more about it. Keep in mind, annotation is not a black art, but you need experience to plan and execute it correctly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--taiMCSew--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/crowintelligence.org/wp-content/uploads/2020/05/nallangannot.jpg%3Fw%3D800%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--taiMCSew--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/crowintelligence.org/wp-content/uploads/2020/05/nallangannot.jpg%3Fw%3D800%26ssl%3D1" alt="" width="383" height="499"&gt;&lt;/a&gt;Source: &lt;a href="https://images-na.ssl-images-amazon.com/images/I/51n62wukauL._SX381_BO1,204,203,200_.jpg"&gt;https://images-na.ssl-images-amazon.com/images/I/51n62wukauL._SX381_BO1,204,203,200_.jpg&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Crowdsourcing
&lt;/h3&gt;

&lt;p&gt;If building in-house competencies is not a viable option, it’s worth considering crowdsourcing. You still need someone who describes the tasks, manages the annotation process and takes care of quality issues, but you don’t have to deal much with the annotators themselves. Tools like Amazon’s Mechanical Turk allow you to slice tasks into small micro-tasks and present them to remote workers via a platform. You don’t have to deal with hiring workers and putting them on your payroll, since the crowdsourcing site manages these tasks. Usually, you can set some sort of experience threshold, so you can select among applicants on the basis of their expertise. It is good practice to provide workers with clear instructions and a trial task before accepting their application.&lt;/p&gt;

&lt;p&gt;Crowdsourcing can be extremely fast, and if it is done wisely, the results can be of good quality for a relatively low price. However, the more complex the task, the harder it is to find good workers. Crowdsourcing also raises ethical and methodological questions both for &lt;a href="https://blogs.lse.ac.uk/impactofsocialsciences/2017/04/05/crowdsourcing-raises-methodological-and-ethical-questions-for-academia/"&gt;academia&lt;/a&gt; and for the &lt;a href="https://www.zdnet.com/article/crowdsourcing-faces-ethical-legal-risks/"&gt;industry&lt;/a&gt;, and it can raise privacy issues too.&lt;/p&gt;

&lt;h3&gt;
  
  
  Outsourcing
&lt;/h3&gt;

&lt;p&gt;There are data annotation companies that offer solutions to the problems of crowdsourcing. Such companies employ (permanently or for a limited time) lots of annotators, so their people are well trained, precise and paid better than workers on crowdsourcing sites. They can also help in planning the annotation task. Moreover, such companies are aware of the legal environment, like GDPR. Completely outsourcing the annotation task to a company may seem expensive; however, sometimes it is the best way to get data. The market of such companies is huge, and it is relatively easy to find one: you can go for a global provider (Appen, for example) or look for local companies in your region.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---qdYee9X--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/crowintelligence.org/wp-content/uploads/2020/06/Crowdsourcing.png%3Fw%3D800%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---qdYee9X--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/crowintelligence.org/wp-content/uploads/2020/06/Crowdsourcing.png%3Fw%3D800%26ssl%3D1" alt="" width="800" height="565"&gt;&lt;/a&gt;Source: &lt;a href="https://upload.wikimedia.org/wikipedia/commons/7/72/Crowdsourcing.png"&gt;https://upload.wikimedia.org/wikipedia/commons/7/72/Crowdsourcing.png&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Do you need help? Hire Us!
&lt;/h2&gt;

&lt;p&gt;Considering such options can be daunting. Don’t panic! Contact us, and we’ll help you to make the right decision so your algorithms will be fueled by the finest oil.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;p&gt;The header image was downloaded from the following link: &lt;a href="https://www.flickr.com/photos/sfupamr/14601885300"&gt;https://www.flickr.com/photos/sfupamr/14601885300&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Subscribe to our newsletter
&lt;/h2&gt;

&lt;p&gt;Get highlights on NLP, AI, and applied cognitive science straight into your inbox.&lt;/p&gt;

&lt;p&gt;Enter your email address&lt;/p&gt;

&lt;p&gt;&lt;a href="https://tinyletter.com"&gt;powered by TinyLetter&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="http://creativecommons.org/licenses/by-nc-sa/4.0/"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XPRdnNRf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/i.creativecommons.org/l/by-nc-sa/4.0/88x31.png%3Fw%3D800%26ssl%3D1" alt="Creative Commons License" width="88" height="31"&gt;&lt;/a&gt;&lt;br&gt;
This work is licensed under a &lt;a href="http://creativecommons.org/licenses/by-nc-sa/4.0/"&gt;Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>naturallanguageprocessing</category>
      <category>machinelearning</category>
      <category>ai</category>
    </item>
    <item>
      <title>How to fuel your data-driven business with text data? – Part 1, Data gathering and annotation</title>
      <dc:creator>crowintelligence</dc:creator>
      <pubDate>Tue, 30 Jun 2020 10:14:37 +0000</pubDate>
      <link>https://dev.to/crowintelligence/how-to-fuel-your-data-driven-business-with-text-data-part-1-data-gathering-and-annotation-51d8</link>
      <guid>https://dev.to/crowintelligence/how-to-fuel-your-data-driven-business-with-text-data-part-1-data-gathering-and-annotation-51d8</guid>
      <description>&lt;p&gt;If data is the new oil, then getting and enriching your own data is like fracking and refining it, at least in the case of textual data. This post gives you an overall picture on how to think about gathering and labeling data. You also get some tips on what kind of business questions should be considered.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Data Science Hierarchy of Needs
&lt;/h2&gt;

&lt;p&gt;These days more and more people try to build a so-called &lt;a href="https://bradfordcross.com/vertical-ai-startups-solving-industry-specific-problems-by-combining-ai-and-subject-matter-expertise/"&gt;vertical AI startup/solution&lt;/a&gt;. These endeavors intend to solve industry-specific problems by combining AI and subject matter expertise. They have four distinct features: 1) they are full-stack products, 2) they rely on subject matter expertise, 3) they are built on top of a proprietary dataset, and 4) AI delivers the core value. Our experience suggests that the third point – getting the right proprietary dataset – is the hardest and most decisive factor in every data-driven project, whether it is an intra- or entrepreneurial endeavor.&lt;/p&gt;

&lt;p&gt;Most people take data for granted. We get news about the newest deep learning algorithms every day. We live in the era of big data. We hear (at least those who work in the tech field) about new machine learning/artificial intelligence startups every day. So it must be easy to get data!&lt;/p&gt;

&lt;p&gt;On the one hand, yes, there are awesome data repositories, like the &lt;a href="https://archive.ics.uci.edu/ml/index.php"&gt;UCI Machine Learning Repository&lt;/a&gt;. Governments are opening up and publishing their data via their own platforms or via tools like &lt;a href="https://ckan.org/"&gt;CKAN&lt;/a&gt;. But keep in mind, your competitors can access this data too!&lt;/p&gt;

&lt;p&gt;On the other hand, you have to get your own, domain-specific dataset, and annotate it to train your model(s)! Deep learning and other fancy ML algorithms are just the tip of the iceberg. There are plenty of things to do underneath. If you can’t get the underlying levels right, even the sexiest new deep learning algorithm will perform badly on your specific problem. Again, you can start by combining open datasets, but your competitors are doing the same thing. If you want to deliver real value that differs from your competitors’ (i.e. better or more precise), you have to build and annotate your own dataset. The popular data science hierarchy of needs pyramid should look as follows.&lt;/p&gt;

&lt;p&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---krA-nm9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/crowintelligence.org/wp-content/uploads/2020/05/datascience_hierarch_of_needs-1.jpg%3Ffit%3D800%252C359%26ssl%3D1" alt="" width="800" height="359"&gt;Source of the original picture: &lt;a href="https://miro.medium.com/max/3760/1*jmk4Q2GAeUM_eqUtMh99oQ.png"&gt;https://miro.medium.com/max/3760/1*jmk4Q2GAeUM_eqUtMh99oQ.png&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Separate your tasks
&lt;/h2&gt;

&lt;p&gt;Harvesting and annotating data are two separate tasks done by two different groups. Data collection is often carried out by traditional software engineers or by the data infrastructure team, while annotation is often led (and sometimes even done) by Data Scientists/Analysts. A good product manager keeps his or her hands on the data and involves every stakeholder in the process. A PM should keep reminding everyone that getting and annotating data is a process, so you should constantly check the quality and scope of your raw and annotated data. The performance of the model you built using the data should also be monitored. You can use evaluation metrics and even user feedback to plan further data gathering and annotation tasks, which will help you build even better models.&lt;/p&gt;

&lt;p&gt;Before you consider various options to gather and label data, keep in mind that you should build your initial dataset AND a pipeline/process that will help you train better and better models. Choosing a solution at one phase doesn’t mean that you cannot move to another one at a later phase. But note that transitioning from outsourcing to in-house scraping and labeling can be hard and very costly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Your options
&lt;/h2&gt;

&lt;p&gt;In theory, you have an idea for a product, and you need a special-purpose dataset to train its magical AI part. Before you think over your options, you have to answer a few questions. What kind of data do you need in order to train a model? How can you get the data? Should you clean up the raw data before annotation? How much data should be annotated for the first model(s)? What does it mean to make a representative dataset in your case? You probably won’t get final answers at first, but don’t be afraid: a rough idea is enough initially.&lt;/p&gt;

&lt;p&gt;As a next step you should consider your options of data gathering and annotation, like&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;building in-house competency&lt;/li&gt;
&lt;li&gt;crowdsourcing&lt;/li&gt;
&lt;li&gt;outsourcing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Pp6wUKvn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/crowintelligence.org/wp-content/uploads/2020/06/traffic-sign-3015228_960_720.png%3Fw%3D800%26ssl%3D1" alt="" width="720" height="720"&gt;Source: &lt;a href="https://cdn.pixabay.com/photo/2017/12/12/17/59/traffic-sign-3015228_960_720.png"&gt;https://cdn.pixabay.com/photo/2017/12/12/17/59/traffic-sign-3015228_960_720.png&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Your constraints
&lt;/h2&gt;

&lt;p&gt;You should know about your constraints like&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;budget&lt;/li&gt;
&lt;li&gt;time&lt;/li&gt;
&lt;li&gt;law&lt;/li&gt;
&lt;li&gt;ethics&lt;/li&gt;
&lt;li&gt;technology&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you know your data sources, check them! Are they plain text or HTML? If they are websites, do you have to log in to these sites? Do they use modern JavaScript frameworks, like React? Do these sites/texts contain sensitive information about humans? If you have to scrape a site, check its &lt;a href="https://support.google.com/webmasters/answer/6062608?hl=en"&gt;robots.txt&lt;/a&gt; to learn what the owners let you scrape! Different regions have different laws regulating the scraping and storing of publicly available data, and re-using data gained from scraping is often regulated by law too. Although it can be pretty expensive, ask your lawyer first!&lt;/p&gt;
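&lt;p&gt;Checking robots.txt can even be automated in your pipeline: Python’s standard library ships &lt;code&gt;urllib.robotparser&lt;/code&gt; for exactly this. The rules below are a made-up example; normally you would point &lt;code&gt;set_url&lt;/code&gt; at the live file and call &lt;code&gt;read()&lt;/code&gt; instead of parsing a string.&lt;/p&gt;

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt; in a real pipeline you would do
#   rp.set_url("https://example.com/robots.txt"); rp.read()
rules = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "https://example.com/private/page"))  # False
print(rp.can_fetch("*", "https://example.com/blog/post"))     # True
```

&lt;p&gt;Note that robots.txt expresses the owner’s wishes, not the law; respecting it is necessary but not sufficient for staying compliant.&lt;/p&gt;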

&lt;p&gt;Keep in mind that if something is legal, it is not necessarily ethical. Your project should be legal AND ethical. It is hard to define what ethical means. Ideally, your colleagues follow the ethical regulations and guidelines published by professional bodies and governments in your region. If not, ask them to do so! Also, the team should agree that the goal of the project is in accordance with the members’ ethical norms. Scraping sites that require login is a shady part of the business. Imagine that your colleague thinks it is actually stealing data and harming the privacy of the users of that site. Will such a colleague build the best scraper for the task? Presumably, no. So, even if you have nothing against scraping data from certain sources, accept the fact that someone may think it is not acceptable, even if it is legal.&lt;/p&gt;

&lt;p&gt;Furthermore, getting data from the web is not as easy as it sounds. For example, sites built with modern JavaScript frameworks require a so-called pre-renderer, like Selenium, which drives a real browser so that the site actually renders its content before you scrape it.&lt;/p&gt;
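&lt;p&gt;A quick way to spot such sites: the raw HTML of a typical React app contains little more than an empty root element, with the content filled in by JavaScript afterwards. The heuristic below is a rough sketch on hardcoded stand-in pages; when it fires, plain HTTP fetching won’t do and you need Selenium or a similar browser-driving tool.&lt;/p&gt;

```python
import re

def looks_js_rendered(html):
    """Rough heuristic: an 'empty' page body suggests a client-rendered app."""
    # Strip scripts, then see whether any visible text remains.
    body = re.sub(r"(?s)<script.*?</script>", "", html)
    text = re.sub(r"<[^>]+>", " ", body)
    return len(text.split()) < 5  # almost no server-rendered text

# Stand-ins for fetched pages (real code would download these over HTTP).
react_app = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'
static_page = "<html><body><h1>News</h1><p>Server-rendered article text goes here.</p></body></html>"

print(looks_js_rendered(react_app))    # True  -> needs a pre-renderer
print(looks_js_rendered(static_page))  # False -> plain HTTP fetch is enough
```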

&lt;p&gt;Last but not least, you have budgetary and time constraints too. The more ready-made a solution is, the more expensive it is, but usually the less time it requires to deliver the data. In-house solutions require hiring permanent and temporary workers. Finding the right people takes time. You can employ juniors who are willing to learn a new field, but again, this takes time. If you have enough money, start by outsourcing the tasks to reliable partners; later you can build up your own capabilities. If you are very short of money, bring data scraping in-house and crowdsource the annotation. Otherwise, read on and consider the tools and options you have.&lt;/p&gt;

&lt;p&gt;That’s all for now. If you’d like to learn more about tools used for data gathering and annotation, stay tuned. The second part of this series will come soon!&lt;/p&gt;

&lt;h2&gt;
  
  
  Hire us
&lt;/h2&gt;

&lt;p&gt;If you face any issues during data gathering and annotation, don’t hesitate to contact us at &lt;a href="mailto:crowintelligence@gmail.com"&gt;crowintelligence@gmail.com&lt;/a&gt;&lt;/p&gt;


&lt;p&gt;&lt;a href="http://creativecommons.org/licenses/by-nc-sa/4.0/"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XPRdnNRf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/i.creativecommons.org/l/by-nc-sa/4.0/88x31.png%3Fw%3D800%26ssl%3D1" alt="Creative Commons License" width="88" height="31"&gt;&lt;/a&gt;&lt;br&gt;
This work is licensed under a &lt;a href="http://creativecommons.org/licenses/by-nc-sa/4.0/"&gt;Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>naturallanguageprocessing</category>
      <category>machinelearning</category>
      <category>ai</category>
    </item>
    <item>
      <title>Spark NLP: State of the art natural language processing at scale</title>
      <dc:creator>crowintelligence</dc:creator>
      <pubDate>Tue, 30 Jun 2020 10:13:01 +0000</pubDate>
      <link>https://dev.to/crowintelligence/spark-nlp-state-of-the-art-natural-language-processing-at-scale-173g</link>
      <guid>https://dev.to/crowintelligence/spark-nlp-state-of-the-art-natural-language-processing-at-scale-173g</guid>
      <description>&lt;p&gt;Natural language processing is a key component in many data science systems that must understand or reason about text. Common use cases include question answering, paraphrasing or summarization, sentiment analysis, natural language BI, language modeling, and disambiguation. This talk introduces the Spark NLP library – the most widely used NLP library in the enterprise, thanks to implementing production-grade, trainable, and scalable versions of state-of-the-art deep learning &amp;amp; transfer learning NLP research, as a permissive open-source library backed by a highly active community and team.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/WxPARvMtkK8"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Spark NLP natively extends the Spark ML pipeline APIs, enabling zero-copy, distributed, unified pipelines that leverage all of Spark’s built-in optimizations. Benchmarks and design best practices for building NLP, ML and DL pipelines will be shared. The library implements core NLP algorithms including lemmatization, part of speech tagging, dependency parsing, named entity recognition, spell checking and sentiment detection. The talk will demonstrate using these algorithms to solve commonly used tasks, using Python notebooks that will be made publicly available after the talk.&lt;/p&gt;

&lt;p&gt;Bio: David Talby is a chief technology officer at John Snow Labs, helping fast-growing companies apply big data and data science techniques to solve real-world problems in healthcare &amp;amp; life science. Previously, he was with Microsoft, where he led business operations for Bing Shopping in the US and Europe, and before that at Amazon in Seattle and in the UK, where he built and ran distributed teams that helped scale global financial systems. David holds a PhD in computer science and master’s degrees in both computer science and business administration.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can find David’s slides here &lt;a href="https://drive.google.com/file/d/1JY69DNcoBPkGlNTd2HyWvnUmDorkvH6f/view?usp=sharing"&gt;https://drive.google.com/file/d/1JY69DNcoBPkGlNTd2HyWvnUmDorkvH6f/view?usp=sharing&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Spark NLP homepage:
&lt;a href="https://nlp.johnsnowlabs.com/"&gt;https://nlp.johnsnowlabs.com/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Public notebooks about the open-source library:
&lt;a href="https://github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/tutorials/colab"&gt;https://github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/tutorials/colab&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="http://creativecommons.org/licenses/by-nc-sa/4.0/"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XPRdnNRf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/i.creativecommons.org/l/by-nc-sa/4.0/88x31.png%3Fw%3D800%26ssl%3D1" alt="Creative Commons License"&gt;&lt;/a&gt;&lt;br&gt;
This work is licensed under a &lt;a href="http://creativecommons.org/licenses/by-nc-sa/4.0/"&gt;Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>spark</category>
      <category>naturallanguageprocessing</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Active Learning for Natural Language Processing</title>
      <dc:creator>crowintelligence</dc:creator>
      <pubDate>Tue, 30 Jun 2020 10:10:17 +0000</pubDate>
      <link>https://dev.to/crowintelligence/active-learning-for-natural-language-processing-a0d</link>
      <guid>https://dev.to/crowintelligence/active-learning-for-natural-language-processing-a0d</guid>
      <description>&lt;p&gt;More than 90% of machine learning applications improve with human feedback. For example, a model that classifying news articles into pre-defined topics has been trained on 1000s of examples where humans have manually annotated the topics. However, if there are tens of millions of news articles, it might not be feasible to manually annotate even 1% of them. If we only sample randomly, we will mostly get popular topics like “politics” that the machine learning model can already identify accurately. So, we need to be smarter about how we sample. This talk is about “Active Learning”, the process of deciding what raw data is the most optimal for human review, covering: Uncertainty Sampling; Diversity Sampling; and some advanced methods like Active Transfer Learning.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/13dyBvIAa1E"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Robert Munro has worked as a leader at several Silicon Valley machine learning companies and also led AWS’s first Natural Language Processing and Machine Translation solutions. Robert is the author of Human-in-the-Loop Machine Learning, covering practical methods for Active Learning, Transfer Learning, and Annotation. Robert organizes Bay Area NLP, the world’s largest community of Language Technology professionals. Robert is also a disaster responder and is currently helping with the response to COVID-19.&lt;/p&gt;

&lt;p&gt;The slides are available on &lt;a href="https://docs.google.com/presentation/d/1lgZRRcJR2V0Ih8LDw4ezCIyqA8-dJBltwKIgF5rWV4M/edit?usp=sharing"&gt;this link&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://creativecommons.org/licenses/by-nc-sa/4.0/"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XPRdnNRf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/i.creativecommons.org/l/by-nc-sa/4.0/88x31.png%3Fw%3D800%26ssl%3D1" alt="Creative Commons License"&gt;&lt;/a&gt;&lt;br&gt;
This work is licensed under a &lt;a href="http://creativecommons.org/licenses/by-nc-sa/4.0/"&gt;Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>activelearning</category>
      <category>ai</category>
    </item>
    <item>
      <title>Growth Hacking with NLP and Sentiment Analysis </title>
      <dc:creator>crowintelligence</dc:creator>
      <pubDate>Tue, 02 Jun 2020 13:39:25 +0000</pubDate>
      <link>https://dev.to/crowintelligence/growth-hacking-with-nlp-and-sentiment-analysis-1d</link>
      <guid>https://dev.to/crowintelligence/growth-hacking-with-nlp-and-sentiment-analysis-1d</guid>
      <description>&lt;p&gt;We have spent the past months developing a course, &lt;em&gt;&lt;a href="https://www.manning.com/liveproject/growth-hacking-with-nlp-and-sentiment-analysis"&gt;Growth Hacking with NLP and Sentiment Analysis&lt;/a&gt;&lt;/em&gt;. We loved working with Manning, and now we are excited to start mentoring our students. Join us if you’d like to learn about applied sentiment analysis using Python and libraries like simpletransformers and scikit-learn.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--84gACC5P--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/crowintelligence.org/wp-content/uploads/2020/06/screencapture-manning-liveproject-growth-hacking-with-nlp-and-sentiment-analysis-2020-06-02-09_45_06.png%3Ffit%3D800%252C670%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--84gACC5P--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/crowintelligence.org/wp-content/uploads/2020/06/screencapture-manning-liveproject-growth-hacking-with-nlp-and-sentiment-analysis-2020-06-02-09_45_06.png%3Ffit%3D800%252C670%26ssl%3D1" alt=""&gt;&lt;/a&gt;&lt;a href="http://creativecommons.org/licenses/by-nc-sa/4.0/"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XPRdnNRf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/i.creativecommons.org/l/by-nc-sa/4.0/88x31.png%3Fw%3D800%26ssl%3D1" alt="Creative Commons License"&gt;&lt;/a&gt;&lt;br&gt;
This work is licensed under a &lt;a href="http://creativecommons.org/licenses/by-nc-sa/4.0/"&gt;Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>naturallanguageprocessing</category>
      <category>sentimentanalysis</category>
      <category>python</category>
    </item>
    <item>
      <title>Corpus Linguistics - the theoretical minimum </title>
      <dc:creator>crowintelligence</dc:creator>
      <pubDate>Fri, 22 May 2020 11:49:23 +0000</pubDate>
      <link>https://dev.to/crowintelligence/corpus-linguistics-the-theoretical-minimum-38jo</link>
      <guid>https://dev.to/crowintelligence/corpus-linguistics-the-theoretical-minimum-38jo</guid>
      <description>&lt;p&gt;Corpus Linguistics is a neglected field of linguistics. Linguists tend to think that it cannot offer much beyond some methodological tools to support their ideas, yet they are quick to blame it when it contradicts their results. Corpus Linguistics is often considered the historic predecessor of Natural Language Processing from the pre-Big Data era. In this post, we claim that Corpus Linguistics offers a unique perspective on language and provides experts with a theoretical and practical framework for analyzing linguistic data. Keep reading for the best Corpus Linguistics resources!&lt;/p&gt;

&lt;h2&gt;
  
  
  The Corpus MOOC
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---EWUuyeb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/crowintelligence.org/wp-content/uploads/2020/05/corpus_mooc.jpg%3Ffit%3D800%252C281%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---EWUuyeb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/crowintelligence.org/wp-content/uploads/2020/05/corpus_mooc.jpg%3Ffit%3D800%252C281%26ssl%3D1" alt=""&gt;&lt;/a&gt;Lancaster University is the epicenter of Corpus Linguistics and you can take their superb &lt;a href="https://www.futurelearn.com/courses/corpus-linguistics"&gt;Corpus Linguistics: Method, Analysis, Interpretation&lt;/a&gt; MOOC course on FutureLearn for free! This is the easiest way to get into Corpus Linguistics. It is strongly recommended even for professional NLP and text/content analysis experts, since it gives a different perspective on linguistic data than other disciplines do.&lt;/p&gt;

&lt;p&gt;Take a look at the &lt;a href="http://cass.lancs.ac.uk/"&gt;ESRC Centre for Corpus Approaches to Social Science (CASS)&lt;/a&gt; website to get an idea of how corpus methods can be applied to content analysis. If you are a student, consider applying to the &lt;a href="http://wp.lancs.ac.uk/corpussummerschools/"&gt;Lancaster Summer Schools in Corpus Linguistics&lt;/a&gt;, which have a reputation for giving students a fantastic experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  Books on the theory and methodology of Corpus Linguistics
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--IfDnjqNL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/crowintelligence.org/wp-content/uploads/2020/05/cl_book.jpg%3Fw%3D800%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--IfDnjqNL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/crowintelligence.org/wp-content/uploads/2020/05/cl_book.jpg%3Fw%3D800%26ssl%3D1" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Corpus Linguistics&lt;/em&gt; by Tony McEnery and Andrew Hardie is a perfect introduction to the field. OK, it is not the most exciting book on earth, because it has to deal with questions of data sources and ethics. It shines when it describes use cases in neo-Firthian/functional and cognitive linguistics – but don’t be afraid of those very technical terms! This is a textbook, so it explains everything that you need to know about the topics.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bavN5BI_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/crowintelligence.org/wp-content/uploads/2020/05/statforcl.jpg%3Fw%3D800%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bavN5BI_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/crowintelligence.org/wp-content/uploads/2020/05/statforcl.jpg%3Fw%3D800%26ssl%3D1" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Oakes’s &lt;em&gt;Statistics for Corpus Linguistics&lt;/em&gt; is our favorite book in the field. We first used it as a textbook during our studies in the early 2000s, and we have often opened it as a reference ever since.&lt;/p&gt;

&lt;h2&gt;
  
  
  Software tools for the non-programmers
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--y-7A3FSy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/crowintelligence.org/wp-content/uploads/2020/05/maxresdefault.jpg%3Ffit%3D800%252C450%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--y-7A3FSy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/crowintelligence.org/wp-content/uploads/2020/05/maxresdefault.jpg%3Ffit%3D800%252C450%26ssl%3D1" alt=""&gt;&lt;/a&gt;Source: &lt;a href="https://i.ytimg.com/vi/ryYKHbPQof8/maxresdefault.jpg"&gt;https://i.ytimg.com/vi/ryYKHbPQof8/maxresdefault.jpg&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For years, Laurence Anthony’s &lt;a href="https://www.laurenceanthony.net/software/antconc/"&gt;AntConc&lt;/a&gt; was the one and only free and comprehensive corpus analysis toolkit for non-programmers. The &lt;a href="https://www.youtube.com/user/AntlabJPN"&gt;accompanying YouTube tutorials&lt;/a&gt; are the best resources for learning how to use it in practice. We’ve been using AntConc for years now; although its user interface is spartan, we have learned to love it, since we haven’t found a better tool yet.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XFV3xnaU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/crowintelligence.org/wp-content/uploads/2020/05/lancbox.jpeg%3Fw%3D800%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XFV3xnaU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/crowintelligence.org/wp-content/uploads/2020/05/lancbox.jpeg%3Fw%3D800%26ssl%3D1" alt=""&gt;&lt;/a&gt;Source: &lt;a href="https://img.scoop.it/t8KfHWF_eh_GfK8O-7kfojl72eJkfbmt4t8yenImKBVvK0kTmF0xjctABnaLJIm9"&gt;https://img.scoop.it/t8KfHWF_eh_GfK8O-7kfojl72eJkfbmt4t8yenImKBVvK0kTmF0xjctABnaLJIm9&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="http://corpora.lancs.ac.uk/lancsbox/"&gt;#LancsBox&lt;/a&gt;: Lancaster University corpus toolbox is “a new-generation software package for the analysis of language data and corpora developed at Lancaster University ” Developed by the best corpus linguistics research center, #LancsBox seems to be the heir apparent to AntConc. Its user interface is more user-friendly and its functionality is more versatile. We esp. love its collocation network visualization capabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Programming for Corpus Linguistics
&lt;/h2&gt;

&lt;p&gt;God knows why, but corpus linguists prefer the R programming language, so here we list the best resources for learning R and corpus linguistics hand in hand.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--UDBHu0PU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/crowintelligence.org/wp-content/uploads/2020/05/baayen.jpg%3Ffit%3D721%252C1024%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--UDBHu0PU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/crowintelligence.org/wp-content/uploads/2020/05/baayen.jpg%3Ffit%3D721%252C1024%26ssl%3D1" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;R. Harald Baayen is one of the early pioneers of quantitative linguistics. His &lt;em&gt;Analyzing Linguistic Data&lt;/em&gt; is an excellent introduction to corpus/quantitative methods and to programming with R. The book came out in 2008 and shows its age now, so we don’t recommend it to complete beginners in R.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--HnMCBxBX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/crowintelligence.org/wp-content/uploads/2020/05/gries01.jpg%3Fw%3D800%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--HnMCBxBX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/crowintelligence.org/wp-content/uploads/2020/05/gries01.jpg%3Fw%3D800%26ssl%3D1" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you read only one book on corpus linguistics, and you are not afraid of coding, Gries’s &lt;em&gt;Quantitative Corpus Linguistics with R&lt;/em&gt; should be that book. Gries is an exceptional teacher who wrote a pedagogically brilliant textbook. It helps you acquire the skills needed to analyze linguistic data in a step-by-step fashion, providing the reader with lucid explanations at every stage. Read our interview with Gries from 2010 &lt;a href="http://szamitogepesnyelveszet.blogspot.com/2010/11/on-computational-corpus-linguistics.html"&gt;on our previous blog&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--A5vzJsS9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/crowintelligence.org/wp-content/uploads/2020/05/gries02.jpg%3Fw%3D800%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--A5vzJsS9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/crowintelligence.org/wp-content/uploads/2020/05/gries02.jpg%3Fw%3D800%26ssl%3D1" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Written in the same vein as &lt;em&gt;Quantitative Corpus Linguistics&lt;/em&gt;, &lt;em&gt;Statistics for Linguistics with R&lt;/em&gt; introduces the main statistical methods and their use in linguistics. Just like Baayen’s book, this one covers topics of corpus and quantitative linguistics. Although it is a masterpiece, we only recommend it to those who have a strong interest in linguistics.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The header image was generated for the meetup on visualizing linguistic data. If you speak Hungarian, you can read more about it &lt;a href="https://www.nyest.hu/hirek/egy-kep-tobbet-mond-ezer-szonal"&gt;here&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Each book cover was downloaded from Amazon via Google Image Search.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>corpuslinguistics</category>
      <category>statistics</category>
      <category>rstats</category>
      <category>naturallanguageprocessing</category>
    </item>
    <item>
      <title>Getting started with Statistics</title>
      <dc:creator>crowintelligence</dc:creator>
      <pubDate>Thu, 07 May 2020 15:16:21 +0000</pubDate>
      <link>https://dev.to/crowintelligence/getting-started-with-statistics-2m8g</link>
      <guid>https://dev.to/crowintelligence/getting-started-with-statistics-2m8g</guid>
      <description>&lt;p&gt;&lt;em&gt;“I keep saying the sexy job in the next ten years will be statisticians. People think I’m joking, but who would’ve guessed that computer engineers would’ve been the sexy job of the 1990s?”&lt;/em&gt;, said Hal Varian, chief economist at Google, in 2009. These days machine learning and artificial intelligence are the sexiest fields, but their practitioners should be undercover statisticians. If you are looking for an intro to stats, this is a must-read post for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Warm-up readings
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi2.wp.com%2Fcrowintelligence.org%2Fwp-content%2Fuploads%2F2020%2F02%2FIMG_20200208_163342.jpg%3Fresize%3D800%252C600%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi2.wp.com%2Fcrowintelligence.org%2Fwp-content%2Fuploads%2F2020%2F02%2FIMG_20200208_163342.jpg%3Fresize%3D800%252C600%26ssl%3D1"&gt;&lt;/a&gt;Charles Wheelan’s &lt;strong&gt;Naked Statistics&lt;/strong&gt; is the most entertaining book about statistics, which doesn’t use any equation, but explains the main concepts through real-world examples. It is absolutely beginner-friendly and provides you just with the first steps in your journey towards mastering statistics.&lt;/p&gt;

&lt;p&gt;David Salsburg’s &lt;strong&gt;The Lady Tasting Tea&lt;/strong&gt; is the best book on the history of statistics. Salsburg tells the story of the field’s development and of modern scientific thinking without using heavy math. If you study stats, you will soon come across the names of Pearson, Spearman and others. You’ll find out who Student was, why he developed his t-test, and how computers superseded statistical tables and calculations on paper.&lt;/p&gt;

&lt;h2&gt;
  
  
  Learning by doing
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fcrowintelligence.org%2Fwp-content%2Fuploads%2F2020%2F04%2Fhf_statistics-rotated.jpg%3Ffit%3D800%252C912%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fcrowintelligence.org%2Fwp-content%2Fuploads%2F2020%2F04%2Fhf_statistics-rotated.jpg%3Ffit%3D800%252C912%26ssl%3D1"&gt;&lt;/a&gt;The Head First series by O’Reilly is using a unique approach to teaching that is based on the cognitive science of learning. This learning method involves lots of activities, pictures, and the explanation of the same concept several times from different angles. We love the series, especially &lt;strong&gt;Head First Statistics&lt;/strong&gt; by Dawn Griffiths. If you do the exercises of the book, and not just read it, you will have a solid foundation of the very basics of statistics.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fcrowintelligence.org%2Fwp-content%2Fuploads%2F2020%2F04%2Fhf_data_analysis.jpg%3Ffit%3D800%252C994%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fcrowintelligence.org%2Fwp-content%2Fuploads%2F2020%2F04%2Fhf_data_analysis.jpg%3Ffit%3D800%252C994%26ssl%3D1"&gt;&lt;/a&gt;Do you want to get some experience of how data analysts work? Milton’s &lt;strong&gt;Head First Data Analysis&lt;/strong&gt; is the best resource for you! You’ll learn about how to use a spreadsheet to analyze data, how to clean messy real-world data, and how to put your statistical knowledge into practice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Think Python and Stats
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fcrowintelligence.org%2Fwp-content%2Fuploads%2F2020%2F04%2Fthink_stats_comp.png%3Fw%3D800%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fcrowintelligence.org%2Fwp-content%2Fuploads%2F2020%2F04%2Fthink_stats_comp.png%3Fw%3D800%26ssl%3D1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Allen B. Downey publishes high quality open books on computer science, statistics and complexity. &lt;strong&gt;&lt;a href="https://greenteapress.com/thinkstats/" rel="noopener noreferrer"&gt;Think Stats&lt;/a&gt;&lt;/strong&gt; is an excellent book written for programmers. You can get the most from it if you’re a confident intermediate pythonista and you’ve already mastered the basics of statistics. Having worked through the book, you will be ready to use advanced statistical Python modules.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fcrowintelligence.org%2Fwp-content%2Fuploads%2F2020%2F04%2Fstatsmodel.jpg%3Fw%3D800%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fcrowintelligence.org%2Fwp-content%2Fuploads%2F2020%2F04%2Fstatsmodel.jpg%3Fw%3D800%26ssl%3D1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Although Python has a built-in statistics module, it is only convenient for the most basic tasks. If you are into classical statistics, the &lt;a href="https://www.statsmodels.org/" rel="noopener noreferrer"&gt;statsmodels&lt;/a&gt; module is made for you.&lt;/p&gt;
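&lt;p&gt;For a sense of where that convenience ends: the built-in module handles descriptive basics in a couple of lines, and not much more.&lt;/p&gt;

```python
# The standard-library statistics module covers descriptive basics;
# regression, hypothesis testing, etc. are statsmodels territory.
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

print(statistics.mean(data))    # 5
print(statistics.median(data))  # 4.5
print(statistics.pstdev(data))  # population standard deviation: 2.0
```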

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fcrowintelligence.org%2Fwp-content%2Fuploads%2F2020%2F04%2Flogo.png%3Fw%3D800%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fcrowintelligence.org%2Fwp-content%2Fuploads%2F2020%2F04%2Flogo.png%3Fw%3D800%26ssl%3D1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi2.wp.com%2Fcrowintelligence.org%2Fwp-content%2Fuploads%2F2020%2F04%2Fscikit-learn-logo-small.png%3Fw%3D800%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi2.wp.com%2Fcrowintelligence.org%2Fwp-content%2Fuploads%2F2020%2F04%2Fscikit-learn-logo-small.png%3Fw%3D800%26ssl%3D1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.scipy.org/" rel="noopener noreferrer"&gt;SciPy&lt;/a&gt; and &lt;a href="https://scikit-learn.org/stable/" rel="noopener noreferrer"&gt;scikit-learn&lt;/a&gt; provides you a plethora of statistical and machine learning algorithms.&lt;/p&gt;

&lt;h2&gt;
  
  
  Advanced topics
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://crowintelligence.org/2020/02/28/math-for-machine-learning-and-artificial-intelligence/" rel="noopener noreferrer"&gt;Math for Machine Learning and Artificial Intelligence&lt;/a&gt;&lt;/strong&gt; : in our previous post, we gave you some advice on learning higher math for ML, AI, and Statistics&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://crowintelligence.org/2020/02/21/getting-started-with-sql/" rel="noopener noreferrer"&gt;Getting Started with SQL&lt;/a&gt;&lt;/strong&gt; : if you are serious about data analysis, you should learn the basics of (relational) databases. You can learn from our post where to start your journey.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Sources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The header image was downloaded from xkcd. Its source can be found &lt;a href="https://xkcd.com/552/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;The Think Stats cover image was downloaded from &lt;a href="https://greenteapress.com/thinkstats/think_stats_comp.png" rel="noopener noreferrer"&gt;this link&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="http://creativecommons.org/licenses/by-nc-sa/4.0/" rel="noopener noreferrer"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi2.wp.com%2Fi.creativecommons.org%2Fl%2Fby-nc-sa%2F4.0%2F88x31.png%3Fw%3D800%26ssl%3D1" alt="Creative Commons License"&gt;&lt;/a&gt;&lt;br&gt;
This work is licensed under a &lt;a href="http://creativecommons.org/licenses/by-nc-sa/4.0/" rel="noopener noreferrer"&gt;Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>python</category>
      <category>statistics</category>
      <category>machinelearning</category>
      <category>ai</category>
    </item>
    <item>
      <title>How to treat your robot?</title>
      <dc:creator>crowintelligence</dc:creator>
      <pubDate>Thu, 07 May 2020 15:14:01 +0000</pubDate>
      <link>https://dev.to/crowintelligence/how-to-treat-your-robot-15p8</link>
      <guid>https://dev.to/crowintelligence/how-to-treat-your-robot-15p8</guid>
      <description>
&lt;p&gt;Children and women had no rights for a long time in human history. Universal suffrage and women’s rights were unimaginable for centuries before the modern era. These days, most developed countries protect the rights of animals and the (living) environment to some extent. Technological development raises the question of whether we should give rights to machines. Should we stop beating our robots?&lt;/p&gt;

&lt;p&gt;Is it possible that robots and creatures with artificial intelligence will acquire rights? Will overworked carmaker robots establish their union one day? Shall abolitionists help sex robots? Whom do we blame when a robot harms a worker in a factory? Will an artificial intelligence go to jail, and if so, will it be incarcerated with human inmates? These questions seem impractical and like science fiction these days, but remember that children’s, women’s and animal rights were once not topics of public discourse either. So let’s look, one by one, at the features that make something a subject of moral consideration.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Moral agency and patiency&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The morality of machine intelligence can be approached from two distinct directions. 1) When we wonder whether machines can be held liable for their acts, can set their own goals, and are capable of deliberate and conscious actions, we raise issues of moral agency. 2) When we speculate about whether machines can be used as sex toys or can be beaten, we inquire whether they are mere artifacts or entities that we should take care of. This consideration is called moral patiency.&lt;/p&gt;

&lt;p&gt;As the title of this post suggests, we deal with moral patiency in detail, but this does not mean that moral agency is excluded from our argumentation, since we assume that moral agency entails patiency. More precisely, patiency is a necessary condition of agency. We don’t argue for this position here, but take it as a premise.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Sentience and patiency&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;As part of the Cartesian tradition, Western culture thought of animals as machines till the 1970s. The treatment of animals has radically changed since then, thanks to activists and books like &lt;em&gt;Animal Liberation&lt;/em&gt; by Peter Singer. According to Singer, one can be the subject of moral considerations if it is a sentient being, or to put it simply, if it can suffer. If we accept this point of view, we have to examine if machines and artificial minds can be sentient beings.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--WBdF9CeD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/crowintelligence.org/wp-content/uploads/2020/04/animalliberation-rotated.jpg%3Ffit%3D768%252C1024%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--WBdF9CeD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/crowintelligence.org/wp-content/uploads/2020/04/animalliberation-rotated.jpg%3Ffit%3D768%252C1024%26ssl%3D1" alt="" width="768" height="1024"&gt;&lt;/a&gt;No robot or artificial intelligence can feel anything. At the moment, the technology is far from producing a sentient machine. However there are lots of projects aiming to develop some sort of digital or robotic companion. The most well-known ones are chatbots for customer relations, chatbots for therapeutic use, supporting robots for the elderly, and robots as sex toys, just to mention a few. These projects don’t aim to build a fully autonomous general artificial intelligence, but to create reliable and useful tools that can be used in social interactions.&lt;/p&gt;

&lt;p&gt;Human-Computer Interaction researchers illustrate the companion machines of the future with the analogy of working and companion dogs. Guide dogs are generally very smart and are trained to excel at helping humans move freely; in this way, they are similar to companion machines. Moreover, &lt;a href="https://www.sciencedirect.com/science/article/pii/S0747563217306234"&gt;this study&lt;/a&gt; from the &lt;a href="https://familydogproject.elte.hu/"&gt;Family Dog Project&lt;/a&gt; argues that qualities of companion dogs, such as faithfulness, kindness, and smartness, should be implemented in companion robots to help humans accept machines. In this way, we may ascribe similar attributes, feelings, and emotional states to robots as to dogs.&lt;/p&gt;

&lt;p&gt;The projection of these qualities raises an important question. If machines exhibit feelings and emotions, must they actually be in an emotional state? Or, to take it even further, can robots be in an emotional state identical to that of humans?&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The problem of other minds&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Maybe it sounds like a stupid question at first, but why do we attribute sentience to animals? Is it just another form of anthropomorphism, or do they really have feelings? And how does one know that another human being shares the same feelings and emotions? On what basis can one attribute a rational mind to others? Philosophers of mind call this “the problem of other minds”.&lt;/p&gt;

&lt;p&gt;According to Wittgenstein, this is a linguistic question. If I hit my finger while driving a nail, I cry out loud and say “awwwww!”, because I learned this behavior from my environment. My parents and all the adults around me did the same when I was a child, so I learned to do it too. I learned what to say when I feel terrible physical pain, just as I learned to say “Hello” to my neighbors when I meet them. All these things constitute a language game, or a way of life, and they are social by their very nature. I cannot feel pain without expressing it. I cannot feel anything if I cannot name it. Hence language is a precondition of other minds. This is how Wittgenstein’s argument goes. Consequently, the condition of emotional states and mental activity is speaking.&lt;/p&gt;

&lt;p&gt;Our everyday experience contradicts the view described above. We do attribute mental and emotional states to animals, although they cannot speak. We even speak about physical objects as if they were persons, e.g. “Why does my computer not want to work?” Philosophers call this the “intentional strategy”, which is a funny name for attributing mental states. If something behaves like an intentional agent, the best way to deal with it is to assume that it really is intentional.&lt;/p&gt;

&lt;p&gt;But what can we know about the mental states of other creatures? Can we imagine what it is like to be a bat? More precisely, can we put ourselves in the place of a bat? What would it be like to navigate using only our ears? Some philosophers of mind think that echolocation cannot be imagined and we cannot know what it is like to be a bat, since being a bat or being a human comes with different qualia, i.e. distinct ways of perceiving and experiencing the world around us.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--eSUdsKay--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/crowintelligence.org/wp-content/uploads/2020/04/milyen-lehet-denevernek-lenni.jpg%3Fw%3D800%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--eSUdsKay--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/crowintelligence.org/wp-content/uploads/2020/04/milyen-lehet-denevernek-lenni.jpg%3Fw%3D800%26ssl%3D1" alt="" width="410" height="244"&gt;&lt;/a&gt;Source: Hans Holbein / Wikimedia Commons / Public Domain&lt;/p&gt;

&lt;p&gt;If we want to attribute sentience to animals and machines, we need something more than the intentional strategy. We have to identify similar behavioral patterns that animals share and we have to find their physiological structure. Some behavioral patterns are produced by very similar physiological structures, while others are not, but are functionally very similar. If a behavioral pattern can be “implemented” by various organic structures, it can be implemented by inorganic ones as well. Using the philosophers’ terminology, if functionalism works, we can build sentient machines.&lt;/p&gt;

&lt;p&gt;One of the first lessons of robotics came from phenomenology and cognitive science. The minds of autonomous biological agents do not end at their skulls. Humans and animals have bodies, and they sense the world through their organs. Also, they do not just passively navigate their environment, but actively use it to extend their minds for various tasks; for example, they use landmarks for navigation. So human and animal cognition is embodied and extended at the same time. These embodied and extended minds created the abstract space of morality, or more exactly, they are constantly creating morality.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Rights and obligations&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Although there is still much to do, animal rights are established and codified in almost all developed countries around the globe. The most common argument for laws protecting the rights of animals is that animals, or at least vertebrates, are sentient beings. Although rights are granted to animals, they are not exercised by them. In the case of minors and animals, it is the caretaker and the public who act on their behalf and exercise their rights. Also, animals are aware neither of their rights nor of the moral consequences of their acts.&lt;/p&gt;

&lt;p&gt;Let’s study the case of a dog which bit a postman. No one would blame the dog for its act, but its owner would be in big trouble. On the one hand, he’d be charged with causing harm to the postman, on the other hand with treating his dog badly, which might have caused its aggressive behavior. But who should be blamed when an intelligent machine does harm? Its owner, its manufacturer or the programmer who trained it? What shall we do with such a machine? Can we simply switch it off or would it count as an execution?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--wU3Dhq5C--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/crowintelligence.org/wp-content/uploads/2020/04/actroid-der-feltetelezhetoen-meg-nem-erez-semmit.jpg%3Fw%3D800%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--wU3Dhq5C--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/crowintelligence.org/wp-content/uploads/2020/04/actroid-der-feltetelezhetoen-meg-nem-erez-semmit.jpg%3Fw%3D800%26ssl%3D1" alt="" width="307" height="410"&gt;&lt;/a&gt;Source: Wikimedia Commons / Gnsin – Gnsin / CC BY-SA 3.0&lt;/p&gt;

&lt;p&gt;During the course of history, minors, women and minorities were treated as sentient beings with limited rationality. As a result, they were deprived of the rights that adults, mostly privileged and rich men, had. They also had specific obligations, e.g. to follow the orders of the head of the household, who was usually an adult man, and they were subjected to the orders of those above them in the societal hierarchy.&lt;/p&gt;

&lt;p&gt;Machines have no obligations, since they are not living beings, but they are built to handle and execute various tasks. If you hire a gardener, she has an obligation to trim your lawn, but the lawnmower has no obligation, although it was built to trim the lawn. Likewise, horses and companion dogs have no obligations, but they are kept for various tasks by their owners. If a lawnmower doesn’t work, its owner can throw it away. If a horse is sick, or it doesn’t want to jump over fences all day, its owner cannot simply throw it away. How about a sentient machine? What if a sex robot becomes sentient one day and has negative feelings when someone uses it? What if fashion changes and its model becomes outdated? Can its owner throw it away?&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;It’s not about the future, it’s about the present of humanity&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;If you happen to think the issues we raised are neither reasonable nor practical, it’s high time to shed light on why philosophizing about beating robots matters. When we consider the moral acceptability of beating a robot, we are thinking not only about the moral status of robots, but about that of ourselves. What kind of traits do we want to cultivate in ourselves?&lt;/p&gt;

&lt;p&gt;The questions of ethics are perennial, although there are no exact, timeless answers to them. The recent surge of Artificial Intelligence has made us chew over these problems again and again, as technology is evolving rapidly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Peter Singer: Animal Liberation, Harper Perennial Modern Classics, 2009&lt;/li&gt;
&lt;li&gt;Ludwig Wittgenstein: Philosophical Investigations, John Wiley and Sons, 2016&lt;/li&gt;
&lt;li&gt;Thomas Nagel: What Is It Like to Be a Bat? In: Thomas Nagel: Mortal Questions, Cambridge University Press, 2003&lt;/li&gt;
&lt;li&gt;Paul M. Churchland: Matter and Consciousness, MIT Press, 1998&lt;/li&gt;
&lt;li&gt;Hursthouse, Rosalind and Pettigrove, Glen, “Virtue Ethics”, The Stanford Encyclopedia of Philosophy (Winter 2018 Edition), Edward N. Zalta (ed.), &lt;a href="https://plato.stanford.edu/archives/win2018/entries/ethics-virtue/"&gt;https://plato.stanford.edu/archives/win2018/entries/ethics-virtue/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="http://creativecommons.org/licenses/by-nc-sa/4.0/"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XPRdnNRf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/i.creativecommons.org/l/by-nc-sa/4.0/88x31.png%3Fw%3D800%26ssl%3D1" alt="Creative Commons License" width="88" height="31"&gt;&lt;/a&gt;&lt;br&gt;
This work is licensed under a &lt;a href="http://creativecommons.org/licenses/by-nc-sa/4.0/"&gt;Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>ethics</category>
      <category>philosophy</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Getting started with SQL </title>
      <dc:creator>crowintelligence</dc:creator>
      <pubDate>Tue, 05 May 2020 17:18:46 +0000</pubDate>
      <link>https://dev.to/crowintelligence/getting-started-with-sql-5af6</link>
      <guid>https://dev.to/crowintelligence/getting-started-with-sql-5af6</guid>
      <description>&lt;p&gt;description: SQL and databases are among the most needed data science skills, it is #3 right afrer Python and R according to this empirical study. However, the need for a database isn’t obvious for the beginner programmer at first.At some point the aspiring data scientist will grow out the world of csvs and plain text files. Using databases becomes handy, when someone starts building Rest APIs or one has to connect to a remote SQL server full of gigabytes of valuable data. Here are our tips to get started with SQL and how to use it the Pythonic way.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://crowintelligence.org/2020/02/21/getting-started-with-sql/"&gt;canonical_url&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;SQL and databases are among the most needed data science skills. According to a recent &lt;a href="https://www.kaggle.com/discdiver/the-most-in-demand-skills-for-data-scientists"&gt;study&lt;/a&gt;, SQL is the third most demanded skill, right after Python and R. Surprisingly, a beginner programmer can happily live without a database for a long time. However, at some point the aspiring data scientist will outgrow the world of CSVs and plain text files. Databases come in handy when you start building REST APIs or have to connect to a remote SQL server full of gigabytes of valuable data. Here are our tips on getting started with SQL and using it in a Pythonic way.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Which SQL implementation should I use?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;SQL is a standard; its latest revision came out in 2016. There are many closed and open source vendors who built their own implementations of the standard. Each extends it with its own flavor, but the differences are minor (at least for a beginner). We encourage you to use &lt;a href="https://mariadb.org/"&gt;MariaDB&lt;/a&gt;, unless you have a good reason to ignore it (e.g. your company uses MySQL at work, or you are learning about databases with Postgres at school).&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Should I install it?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Absolutely not, you shouldn’t install it on your computer! Use the official Docker image of your preferred SQL implementation. If you don’t use Docker yet, invest some time into learning its basics. &lt;a href="https://www.tutorialspoint.com/docker/index.htm"&gt;This tutorial&lt;/a&gt; helps you install Docker and start a container on your machine (the first twelve lessons, up to “Docker – Containers and Shells”, are enough at first). Don’t simply start your Docker image; attach a volume to it, since this is the way to preserve (i.e. save) your databases. Your effort turns out to be a bonus, as knowing some Docker is a very valuable data science skill!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--f_pwe7nQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/crowintelligence.org/wp-content/uploads/2020/02/1920px-Docker_container_engine_logo.svg_.png%3Ffit%3D800%252C190%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--f_pwe7nQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/crowintelligence.org/wp-content/uploads/2020/02/1920px-Docker_container_engine_logo.svg_.png%3Ffit%3D800%252C190%26ssl%3D1" alt=""&gt;&lt;/a&gt;We strongly recommend you to start the &lt;a href="https://en.wikipedia.org/wiki/PhpMyAdmin"&gt;phpMyAdmin&lt;/a&gt;, the free administration tool for SQL, Docker image along with your SQL implementation. phpMyAdmin provides a simple and intuitive interface to manage your databases and execute various SQL statements.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--aS_mgnr6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/crowintelligence.org/wp-content/uploads/2020/02/PhpMyAdmin_logo.png%3Fw%3D800%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--aS_mgnr6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/crowintelligence.org/wp-content/uploads/2020/02/PhpMyAdmin_logo.png%3Fw%3D800%26ssl%3D1" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://hackernoon.com/mariadb-phpmyadmin-docker-running-local-database-ok9q36ji"&gt;This short tutorial&lt;/a&gt; helps you to set up MariaDB and phpMyAdmin and persisting your databases using a &lt;a href="https://docs.docker.com/compose/"&gt;docker-compose&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The pythonic way&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;SQL is a kind of programming language (actually, a so-called non-procedural programming language) and it is very different from Python. The easiest way to start using SQL in your Python projects is the pymysql package, which lets you easily connect to your database. On top of that, you can write SQL statements as simple strings and pass them to a function that sends them to the database engine for execution.&lt;/p&gt;
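&lt;p&gt;A minimal sketch of that pattern: pymysql implements Python’s DB-API 2.0, so the example below, written with the standard-library sqlite3 module (also DB-API 2.0) so it runs without a database server, carries over to pymysql almost unchanged. With pymysql you would call pymysql.connect(host=..., user=..., ...) with your own credentials and use %s instead of ? as the placeholder.&lt;/p&gt;

```python
# Hypothetical sketch: SQL statements as plain strings handed to execute().
# sqlite3 stands in for pymysql here so the snippet is self-contained.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
cur.executemany("INSERT INTO users (name) VALUES (?)", [("Alice",), ("Bob",)])
conn.commit()

cur.execute("SELECT name FROM users ORDER BY name")
names = [row[0] for row in cur.fetchall()]
print(names)  # ['Alice', 'Bob']
conn.close()
```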


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
&lt;br&gt;
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--QdQT3OUb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/crowintelligence.org/wp-content/uploads/2020/02/sqlalchemy-logo.png%3Fw%3D800%26ssl%3D1" alt=""&gt;

&lt;p&gt;Using string variables to store your SQL statements isn’t pythonic. Although you can use f-strings to substitute your Python variables into your expressions, this method becomes very tedious, especially when you are working with complex statements. SQLAlchemy is the de facto standard way to use SQL in Python programs. It comes in two flavors, namely Core and ORM (which stands for object-relational mapping). ORM is very advanced, so chances are high that you won’t need it as a data scientist. Core provides you with the ability to use SQL statements as methods, so you can even chain them together. Also, you can use strings as SQL statements, aka “textual SQL”. Using SQLAlchemy Core makes your code more pythonic and readable, which means more maintainable code. To soften the switch from pymysql to SQLAlchemy, you can start with Core’s textual SQL and gradually transition to Core objects and their methods. &lt;a href="https://docs.sqlalchemy.org/en/13/core/tutorial.html"&gt;This part&lt;/a&gt; of the official documentation of the toolkit is a pretty nice intro to using Core.&lt;/p&gt;
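&lt;p&gt;A minimal SQLAlchemy Core sketch (assuming SQLAlchemy 1.4 or newer) showing both styles, Core objects and textual SQL. It uses an in-memory SQLite engine so it runs anywhere; for MariaDB you would swap the URL for something like "mysql+pymysql://user:password@host/dbname" (a placeholder, not a real connection string).&lt;/p&gt;

```python
# SQLAlchemy Core: define a table, insert rows, then query it twice,
# once with chained Core methods and once with "textual SQL".
from sqlalchemy import (
    Column, Integer, MetaData, String, Table, create_engine, select, text,
)

engine = create_engine("sqlite://")  # in-memory database
metadata = MetaData()
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)),
)
metadata.create_all(engine)

with engine.begin() as conn:  # begin() commits automatically on success
    conn.execute(users.insert(), [{"name": "Alice"}, {"name": "Bob"}])

with engine.connect() as conn:
    # Core objects and methods, chained together...
    core_rows = conn.execute(select(users.c.name).order_by(users.c.name)).fetchall()
    # ...and the same query as textual SQL
    text_rows = conn.execute(text("SELECT name FROM users ORDER BY name")).fetchall()

names = [row[0] for row in core_rows]
print(names)  # ['Alice', 'Bob']
```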


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;h3&gt;
  
  
  &lt;strong&gt;DataFrames and SQL tables – How to integrate all this into your workflow?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;You can easily make a pandas DataFrame from an SQL table and vice versa. &lt;a href="https://hackersandslackers.com/connecting-pandas-to-a-sql-database-with-sqlalchemy/"&gt;This short tutorial&lt;/a&gt; shows you how easy it is to achieve this.&lt;/p&gt;
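&lt;p&gt;A short sketch of that round trip. The table name "results" and the data are made-up examples, and sqlite3 stands in for a real server; with MariaDB you would pass an SQLAlchemy engine instead of the sqlite3 connection.&lt;/p&gt;

```python
# pandas round trip: DataFrame -> SQL table -> DataFrame.
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
df = pd.DataFrame({"name": ["Alice", "Bob"], "score": [85, 92]})
df.to_sql("results", conn, index=False)  # DataFrame -> SQL table

back = pd.read_sql_query(  # SQL table -> DataFrame
    "SELECT name, score FROM results ORDER BY score", conn
)
print(back["name"].tolist())  # ['Alice', 'Bob']
conn.close()
```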

&lt;h3&gt;
  
  
  &lt;strong&gt;Resources&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Although there are plenty of tutorials on the net, and we linked some of them in this post, we strongly recommend the following two books.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="http://shop.oreilly.com/product/9780596007270.do"&gt;Learning SQL, 2nd edition&lt;/a&gt;&lt;/strong&gt; by Alan Beaulieu: This title is a short, practice oriented intro into SQL. It is language and implementation agnostic and despite its age it is superb.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="http://shop.oreilly.com/product/0636920035800.do"&gt;Essential SQLAlchemy, 2nd edition&lt;/a&gt;&lt;/strong&gt; by Myers and Copeland: SQLAlchemy has got an extensive and very usable documentation, but it lacks user-friendly tutorials. This book is the only comprehensive intro into SQLAlchemy, as per our best knowledge.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Image sources&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Header image &lt;a href="https://cdn.pixabay.com/photo/2017/06/12/04/21/database-2394312_960_720.jpg"&gt;https://cdn.pixabay.com/photo/2017/06/12/04/21/database-2394312_960_720.jpg&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;phpMyAdmin logo &lt;a href="https://upload.wikimedia.org/wikipedia/commons/thumb/4/4f/PhpMyAdmin_logo.svg/115px-PhpMyAdmin_logo.svg.png"&gt;https://upload.wikimedia.org/wikipedia/commons/thumb/4/4f/PhpMyAdmin_logo.svg/115px-PhpMyAdmin_logo.svg.png&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;MariaDB logo &lt;a href="https://upload.wikimedia.org/wikipedia/commons/c/c9/MariaDB_Logo.png"&gt;https://upload.wikimedia.org/wikipedia/commons/c/c9/MariaDB_Logo.png&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;SQLAlchemy logo &lt;a href="https://quintagroup.com/cms/python/images/sqlalchemy-logo.png"&gt;https://quintagroup.com/cms/python/images/sqlalchemy-logo.png&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="http://creativecommons.org/licenses/by-nc-sa/4.0/"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XPRdnNRf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/i.creativecommons.org/l/by-nc-sa/4.0/88x31.png%3Fw%3D800%26ssl%3D1" alt="Creative Commons License"&gt;&lt;/a&gt;&lt;br&gt;
This work is licensed under a &lt;a href="http://creativecommons.org/licenses/by-nc-sa/4.0/"&gt;Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>unsupervised</category>
    </item>
  </channel>
</rss>
