<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kyle</title>
    <description>The latest articles on DEV Community by Kyle (@stockholmux).</description>
    <link>https://dev.to/stockholmux</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F205788%2Ffe53406a-b0ad-416e-9d06-ab26c4a52e53.jpg</url>
      <title>DEV Community: Kyle</title>
      <link>https://dev.to/stockholmux</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/stockholmux"/>
    <language>en</language>
    <item>
      <title>A brief introduction to Piped Processing Language in Open Distro for Elasticsearch</title>
      <dc:creator>Kyle</dc:creator>
      <pubDate>Mon, 02 Nov 2020 16:08:24 +0000</pubDate>
      <link>https://dev.to/stockholmux/a-brief-introduction-to-piped-processing-language-in-open-distro-for-elasticsearch-1a9g</link>
      <guid>https://dev.to/stockholmux/a-brief-introduction-to-piped-processing-language-in-open-distro-for-elasticsearch-1a9g</guid>
      <description>&lt;p&gt;In &lt;a href="https://opendistro.github.io/for-elasticsearch/blog/odfe-updates/2020/10/odfe-1.11.0-released/" rel="noopener noreferrer"&gt;Open Distro for Elasticsearch 1.11.0&lt;/a&gt;, a new query language was introduced - Piped Processing Language (PPL). PPL provides a different way of thinking about data and complements the existing query languages (SQL and the Query DSL) in Open Distro.&lt;/p&gt;

&lt;p&gt;The basis of PPL is the concept of pipes from UNIX. You take the output of one operation and feed it into another operation. Mentally, I think of this as a factory where a widget is manufactured step-by-step: the output of one machine just leads to the next machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;p&gt;First, let's add some documents to an index. There is nothing new here. I'm using cURL and the &lt;a href="https://opendistro.github.io/for-elasticsearch-docs/docs/elasticsearch/rest-api-reference/#bulk" rel="noopener noreferrer"&gt;bulk API&lt;/a&gt; to add 4 documents with information about vintage computers (because why not?). &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fqhz01z79qmvt35xq2jzk.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fqhz01z79qmvt35xq2jzk.gif" alt="Unix prompt - adding documents - animated gif"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you are familiar with Elasticsearch but not Open Distro, you might notice a few extra arguments on cURL. These are due to built-in security features of Open Distro for Elasticsearch. I'm running &lt;a href="https://opendistro.github.io/for-elasticsearch-docs/docs/install/docker/#start-a-cluster" rel="noopener noreferrer"&gt;a Docker Open Distro cluster locally&lt;/a&gt; and out-of-the-box this comes with a self-signed cert, so I'm using &lt;code&gt;-k&lt;/code&gt; to skip certificate verification. The other argument is &lt;code&gt;--user&lt;/code&gt;, which supplies credentials because Open Distro has built-in fine-grained access control.&lt;/p&gt;

&lt;h2&gt;
  
  
  Simple query
&lt;/h2&gt;

&lt;p&gt;Now that we have a tiny data set, let's do a very basic query. This will pipe two operations together. The first operation is to set the index with the 'source' command, &lt;code&gt;source=vin-computers&lt;/code&gt;. Think of this as making the entire index available to the pipeline. Next, we will take that entire index and remove anything but two fields - &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;CPU&lt;/code&gt; - using the 'fields' command, &lt;code&gt;fields name, CPU&lt;/code&gt;. These two operations are joined by a pipe character, &lt;code&gt;|&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F6r6s4dcqtg3f48902pq2.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F6r6s4dcqtg3f48902pq2.gif" alt="Unix prompt - running a simple query in Piped Processing Language"&gt;&lt;/a&gt;&lt;/p&gt;
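
&lt;p&gt;In case the recording is hard to read, here is a minimal sketch of the same query written as a Kibana Dev Tools request rather than cURL. It assumes the PPL endpoint at &lt;code&gt;_opendistro/_ppl&lt;/code&gt; takes a JSON body with a single &lt;code&gt;query&lt;/code&gt; string, so treat it as an illustration rather than a transcript of the recording.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;POST _opendistro/_ppl
{
  "query": "source=vin-computers | fields name, CPU"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;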

&lt;h2&gt;
  
  
  A tad more complexity
&lt;/h2&gt;

&lt;p&gt;We can take our existing query and add a filtering clause through the &lt;code&gt;where&lt;/code&gt; command. The command is followed by a boolean expression - in this case a comparison. The comparison is built with a field on the left and the value on the right with an &lt;code&gt;=&lt;/code&gt; between.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fdfset6amdm4hyqz5o290.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fdfset6amdm4hyqz5o290.gif" alt="Unix prompt - running a more complex query in Piped Processing Language"&gt;&lt;/a&gt;&lt;/p&gt;
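
&lt;p&gt;As a rough sketch, the filtered query might look like this as a Dev Tools request. The request body shape is an assumption on my part, and the inner double quotes around the value are escaped because the whole query travels inside a JSON string.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;POST _opendistro/_ppl
{
  "query": "source=vin-computers | fields name, CPU | where CPU=\"MOS6502\""
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;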

&lt;p&gt;At this point, it looks a tad like SQL. PPL isn't, however, as structured as SQL. So, you can actually invert the order of the last two commands and get the same result:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;source=vin-computers | where CPU="MOS6502" | fields name,CPU&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This doesn't execute in exactly the same way, and I wouldn't vouch for its efficiency, but in an analytical situation it's often more about getting the result in a way that works for your thinking process than about staying within a particular performance envelope. If you want to understand what is going on behind the scenes, you can run the same query but append &lt;code&gt;_explain&lt;/code&gt; to the endpoint (e.g. &lt;code&gt;https://localhost:9200/_opendistro/_ppl/_explain&lt;/code&gt;).&lt;/p&gt;
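
&lt;p&gt;A hedged sketch of that explain call in Dev Tools form, assuming the &lt;code&gt;_explain&lt;/code&gt; endpoint accepts the same JSON body as the regular PPL endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;POST _opendistro/_ppl/_explain
{
  "query": "source=vin-computers | where CPU=\"MOS6502\" | fields name,CPU"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Running it for both orderings of the commands is one way to compare what each version asks Elasticsearch to do.&lt;/p&gt;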

&lt;h2&gt;
  
  
  Wrap up
&lt;/h2&gt;

&lt;p&gt;This trivial example is probably not what PPL will be used for in the real world, but I hope it explains the basic mechanics of the query language. Knowing what you know now, imagine going from a very broad set of documents to more and more narrow sets just by adding additional commands piped together. Attempting to build the same type of query with the Query DSL or SQL would probably lead to concentrating more on the syntax of the queries than on refining the result.&lt;/p&gt;

&lt;p&gt;You can find out more over at the &lt;a href="https://opendistro.github.io/for-elasticsearch-docs/docs/ppl/" rel="noopener noreferrer"&gt;Open Distro documentation&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>elasticsearch</category>
      <category>database</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Using fine grained access control for search</title>
      <dc:creator>Kyle</dc:creator>
      <pubDate>Mon, 26 Oct 2020 18:59:03 +0000</pubDate>
      <link>https://dev.to/stockholmux/using-fine-grained-access-control-for-search-54ko</link>
      <guid>https://dev.to/stockholmux/using-fine-grained-access-control-for-search-54ko</guid>
      <description>&lt;p&gt;&lt;a href="https://opendistro.github.io/for-elasticsearch/"&gt;Open Distro for Elasticsearch&lt;/a&gt; has extensive access control capabilities built right in. Of course, access control can prevent access to sensitive information, but it can also help you build applications that depend on Open Distro for Elasticsearch for search. Let’s explore this a little more.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--AkEJtpNq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/s49auo3dmejltc33z52l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--AkEJtpNq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/s49auo3dmejltc33z52l.png" alt="User A and User B access the same document and get different results"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Say your application uses standard HTTP endpoints to serve data to end users. So, a web user loads a webpage, the client-side JavaScript fires off an HTTP request to the application endpoint for a specific type of data, the data is returned to the client-side JavaScript, and the browser renders it on screen for the user. Your server code is doing some minor handling of the HTTP route and parameters, making a request to Elasticsearch based on this, and manipulating the returned results to make them easily usable by the client-side JavaScript. Pretty typical stuff.&lt;/p&gt;

&lt;p&gt;When your application becomes sufficiently complex, you’ll probably find the code is generally repetitious. You’ll be writing almost the same methods over and over - listen for an HTTP request, process it, ask Elasticsearch, return results - yet with small, subtle differences. Not only is this boring to write, it also results in a larger, harder-to-maintain codebase. So, how does &lt;em&gt;access control&lt;/em&gt; fit into all this?&lt;/p&gt;

&lt;p&gt;By varying the Open Distro for Elasticsearch user for different types of requests, you can change the data returned. In this scenario you have documents that represent items - these documents can contain not only public-facing fields (price, colour, material, description) but also private fields like cost, suppliers, and performance. Your employees may need to see all the information from behind a special login while customers should only see the public information. Of course, you could write very similar server code for each, vary the &lt;code&gt;_source&lt;/code&gt; in the Query DSL and reimplement loads of logic. Alternatively, you can let Open Distro do this work for you and gain the ability to centrally control field availability.&lt;/p&gt;

&lt;p&gt;Let’s take this example - by running this query in Kibana Dev Tools (&lt;code&gt;&amp;lt;yourhost&amp;gt;:&amp;lt;yourport&amp;gt;/app/dev_tools#/console&lt;/code&gt;) we will add a single item to the index &lt;code&gt;ecommerce-items&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PUT ecommerce-items/_doc/brown-mug
{
    "title": "Brown Mug",
    "description": "This mug features a large handle and a marbled, brown ceramic design.",
    "price" : 5.99,
    "private" : {
        "cost" : 1,
        "supplier" : "Mug World"
     }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No matter what happens, we don’t want our customers to be able to see anything in the &lt;code&gt;private&lt;/code&gt; object. We’ll use Open Distro’s built-in access control functionality to make this happen.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating a user &amp;amp; role
&lt;/h2&gt;

&lt;p&gt;First up in Kibana we need to create a user. This user will be used by the application for serving public information, so we’ll call it &lt;code&gt;public-item-user&lt;/code&gt;. First, go to the &lt;strong&gt;Security&lt;/strong&gt; item from the side menu then click &lt;strong&gt;Internal users&lt;/strong&gt; in the secondary menu. Now, click on the &lt;strong&gt;Create internal user&lt;/strong&gt; button. This will take you to &lt;code&gt;&amp;lt;yourhost&amp;gt;:&amp;lt;yourport&amp;gt;/app/opendistro_security#/users/create&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Under the heading “Credentials” enter the username of &lt;code&gt;public-item-user&lt;/code&gt; and a password twice for validation. Then click the &lt;strong&gt;Create&lt;/strong&gt; button. At this point we have created a user but it can’t do anything in Open Distro. To enable it to be useful, we’ll have to create and assign a role.&lt;/p&gt;
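
&lt;p&gt;If you’d rather script this step than click through Kibana, a rough equivalent using the security plugin’s REST API from Dev Tools might look like the following. This is a sketch, not part of the walkthrough above: the password is a placeholder, and it assumes you are logged in as a user allowed to call the security API.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PUT _opendistro/_security/api/internalusers/public-item-user
{
  "password": "&amp;lt;a-strong-password&amp;gt;"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;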

&lt;p&gt;From the side menu, go back to &lt;strong&gt;Security&lt;/strong&gt; and then &lt;strong&gt;Roles&lt;/strong&gt; from the secondary menu, then click the &lt;strong&gt;Create role&lt;/strong&gt; button. You should be at &lt;code&gt;&amp;lt;yourhost&amp;gt;:&amp;lt;yourport&amp;gt;/app/opendistro_security#/roles/create&lt;/code&gt;. Under the “Name” heading enter &lt;code&gt;public-item-viewer&lt;/code&gt; into the “Name” text box. Then under the heading “Index Permissions” enter &lt;code&gt;ecommerce-items&lt;/code&gt; in the “Index” field and &lt;code&gt;indices:data/read/search&lt;/code&gt; in the “Index Permissions” field. Finally, under the heading “Field level security” go to the “Exclude” field and enter &lt;code&gt;private&lt;/code&gt;. When you’ve done all this, go to the bottom and click the &lt;strong&gt;Create&lt;/strong&gt; button.&lt;/p&gt;
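
&lt;p&gt;For reference, here is a sketch of roughly the same role expressed through the security REST API instead of the Kibana form. It assumes the role format with &lt;code&gt;index_permissions&lt;/code&gt;, where a leading &lt;code&gt;~&lt;/code&gt; in &lt;code&gt;fls&lt;/code&gt; marks a field as excluded rather than included.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PUT _opendistro/_security/api/roles/public-item-viewer
{
  "index_permissions": [
    {
      "index_patterns": ["ecommerce-items"],
      "fls": ["~private"],
      "allowed_actions": ["indices:data/read/search"]
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;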

&lt;p&gt;Before proceeding, let’s take a moment to examine why &lt;code&gt;indices:data/read/search&lt;/code&gt; is the right permission for this situation. It grants users with this role the ability to search documents but not to manipulate them. If you scroll through the possibilities in the “Index Permissions” box you might notice the entries &lt;code&gt;crud&lt;/code&gt; and &lt;code&gt;search&lt;/code&gt;, which are &lt;em&gt;permission groups&lt;/em&gt;. Permission groups bundle typically used permissions together as a shorthand; you could actually use these here, but you would also be granting far more permission than needed, so we’ll stick with a highly specific permission that can do very little (more on that later).&lt;/p&gt;

&lt;p&gt;At this point you have a user, &lt;code&gt;public-item-user&lt;/code&gt;, and a role, &lt;code&gt;public-item-viewer&lt;/code&gt;, but they aren’t connected. In Open Distro, connecting them is called &lt;em&gt;mapping&lt;/em&gt;. To map a role to a user, go back to &lt;strong&gt;Security&lt;/strong&gt;, select &lt;strong&gt;Roles&lt;/strong&gt;, find our role (&lt;code&gt;public-item-viewer&lt;/code&gt;) and click on it (you may need to use the search box). It should bring you to &lt;code&gt;&amp;lt;yourhost&amp;gt;:&amp;lt;yourport&amp;gt;/app/opendistro_security#/roles/view/public-item-viewer&lt;/code&gt;. Now go to the tab &lt;strong&gt;Mapped Users&lt;/strong&gt; - the list will be empty, so click &lt;strong&gt;Map users&lt;/strong&gt;. At this point you’ll be at the URL &lt;code&gt;&amp;lt;yourhost&amp;gt;:&amp;lt;yourport&amp;gt;/app/opendistro_security#/roles/edit/public-item-viewer/mapuser&lt;/code&gt;. Under the heading "Internal users" go to the "Internal users" field and select &lt;code&gt;public-item-user&lt;/code&gt;. Then click &lt;strong&gt;Map&lt;/strong&gt;.&lt;/p&gt;
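
&lt;p&gt;The mapping step could also be scripted. A minimal sketch, assuming the security plugin’s role mapping endpoint and that both the user and the role already exist:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PUT _opendistro/_security/api/rolesmapping/public-item-viewer
{
  "users": ["public-item-user"]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;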

&lt;p&gt;Now your user (&lt;code&gt;public-item-user&lt;/code&gt;) has all the powers provided by the role &lt;code&gt;public-item-viewer&lt;/code&gt;. Let’s test it out by going to Kibana in a private or incognito window. This trick will allow you to be logged in as two users at the same time. Log in as &lt;code&gt;public-item-user&lt;/code&gt;, go to Dev Tools (&lt;code&gt;&amp;lt;yourhost&amp;gt;:&amp;lt;yourport&amp;gt;/app/dev_tools#/console&lt;/code&gt;) and enter the following query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET /ecommerce-items/_search
{
  "query": {
    "match" : {
      "title": "brown"
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result will show:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    ...,
    "hits" : [
        {
            "_index" : "ecommerce-items",
            "_type" : "_doc",
            "_id" : "brown-mug",
            "_score" : 0.13353139,
            "_source" : {
                "title" : "Brown Mug",
                "description" : "This mug features a large handle and a marbled, brown ceramic design.",
                "price" : 5.99
            }
        }
    ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice that the &lt;code&gt;private&lt;/code&gt; object is gone. Our user and role have filtered this data right out.&lt;/p&gt;

&lt;p&gt;One additional note here - we set up these two roles to &lt;em&gt;only&lt;/em&gt; have access to this particular index. That means that if, by some mistake or happenstance, the credentials for these users are leaked or an application bug somehow allows for passing in other indices, you’re covered. No other index managed by Open Distro would be at risk of disclosure because access is denied when querying. You can think of this as an implementation of the principle of least privilege.&lt;/p&gt;

&lt;p&gt;To set up a route for the employee section of your application, create a role called &lt;code&gt;private-item-viewer&lt;/code&gt; that has permissions to see all information. Repeat the process above, but skip the step where you exclude &lt;code&gt;private&lt;/code&gt; in the “Field level security” section. Then create a user called &lt;code&gt;private-item-user&lt;/code&gt; with the role of &lt;code&gt;private-item-viewer&lt;/code&gt;. All of the requests from the backend for employees would use this user, giving them access to all the fields in the object &lt;code&gt;private&lt;/code&gt;.&lt;/p&gt;
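
&lt;p&gt;Sketched through the security REST API, the employee-side role and mapping might look like this. As before, this is an assumption-laden illustration rather than the exact steps above: it presumes &lt;code&gt;private-item-user&lt;/code&gt; has already been created, and it simply leaves out the &lt;code&gt;fls&lt;/code&gt; entry so no fields are excluded.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PUT _opendistro/_security/api/roles/private-item-viewer
{
  "index_permissions": [
    {
      "index_patterns": ["ecommerce-items"],
      "allowed_actions": ["indices:data/read/search"]
    }
  ]
}

PUT _opendistro/_security/api/rolesmapping/private-item-viewer
{
  "users": ["private-item-user"]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;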

&lt;h2&gt;
  
  
  Overview of the application, users / roles, &amp;amp; search
&lt;/h2&gt;

&lt;p&gt;As far as your application is concerned, you only need a minimal abstraction to provide different routes with different information. The authenticated route for employees needing to see the private information could reuse the same query preparation, request, serializing, deserializing and rendering code as the public route. Later, when your data changes and you want to include or exclude other parts of the document, you don’t have to modify your code; just alter the roles and the application will serve only the fields you specify.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--AW5vayNM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/192cu6fpxxwciz6vi619.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--AW5vayNM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/192cu6fpxxwciz6vi619.png" alt="Diagram showing credentials and data flow"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrap up
&lt;/h2&gt;

&lt;p&gt;By using the access control features of Open Distro for Elasticsearch you can reduce the amount of code needed to implement an application that serves differing data from the same index depending on the situation. The pattern outlined above moves the responsibility of data visibility from the application to the search engine. In this way your application can be simpler and you gain centralized, code-free control at the data level. &lt;/p&gt;

&lt;p&gt;This pattern can be generalized to many other situations too. Say you have documents that are sensitive to two different groups and neither can see the whole document. Additionally, you have another group that needs to see the whole document, perhaps for security or compliance reasons. You can use the same pattern of users, roles and permissions to provide differing visibility of the data with the same queries. But that is a story for another time - visit the &lt;a href="https://opendistro.github.io/for-elasticsearch/blog/"&gt;Open Distro for Elasticsearch Blog&lt;/a&gt; to find out more and keep up-to-date.&lt;/p&gt;

</description>
      <category>security</category>
      <category>database</category>
      <category>elasticsearch</category>
    </item>
  </channel>
</rss>
