<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Darren Fuller</title>
    <description>The latest articles on DEV Community by Darren Fuller (@dazfuller).</description>
    <link>https://dev.to/dazfuller</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F396290%2F45c2c3dd-c045-4d08-8021-59480a43bdef.png</url>
      <title>DEV Community: Darren Fuller</title>
      <link>https://dev.to/dazfuller</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dazfuller"/>
    <language>en</language>
    <item>
      <title>Managing LLM prompts in DotNet with DotPrompt</title>
      <dc:creator>Darren Fuller</dc:creator>
      <pubDate>Sun, 27 Oct 2024 10:51:24 +0000</pubDate>
      <link>https://dev.to/dazfuller/managing-llm-prompts-in-dotnet-with-dotprompt-51e</link>
      <guid>https://dev.to/dazfuller/managing-llm-prompts-in-dotnet-with-dotprompt-51e</guid>
      <description>&lt;p&gt;We've been working on some internal tooling lately which makes use of AI to automate a number of tasks and to help reduce the time to production for customer use cases. While building this one of the things that became increasingly annoying was that, every time we needed to modify the system or user prompt we would have to rebuild and re-package, deploy, and then test again. While not massively time-consuming, it's a pain and a few minutes each time adds up over a day.&lt;/p&gt;

&lt;p&gt;So, easy solution: let's move the prompts to an external file so we can edit them without needing to rebuild. But...&lt;/p&gt;

&lt;p&gt;What if we could make them template driven as well? And what if we could include configuration information, load them up-front and refer to them by name, and have thread-safe access by default (because this app is very Task-oriented)?&lt;/p&gt;

&lt;p&gt;And so we built &lt;a href="https://github.com/elastacloud/DotPrompt" rel="noopener noreferrer"&gt;DotPrompt&lt;/a&gt;, published it on GitHub, pushed the packages out to &lt;a href="https://www.nuget.org/packages/DotPrompt/" rel="noopener noreferrer"&gt;NuGet&lt;/a&gt;, and made it available under an MIT license.&lt;/p&gt;

&lt;p&gt;I started piecing it together, but first wanted to see if someone else had done this already. It turns out there are a couple of efforts out there, like &lt;a href="https://firebase.google.com/docs/genkit/dotprompt" rel="noopener noreferrer"&gt;Firebase Genkit Dotprompt&lt;/a&gt;, but it's not for the DotNet ecosystem. There were a couple of others as well, but they were for Python.&lt;/p&gt;

&lt;p&gt;Nobody ever thinks of the DotNet devs 😭&lt;/p&gt;

&lt;p&gt;One of the things I didn't want to do was to come up with a new format and parser for this. JSON would have been an obvious choice, but it's not exactly user-friendly for writing multi-line strings. XML? No, screw that. So what about YAML? Well, it's structured, has great support for multi-line strings, and is pretty user-friendly to read and write.&lt;/p&gt;

&lt;p&gt;So, what does a prompt file look like? Well, it looks like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Example&lt;/span&gt;
&lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;outputFormat&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;text&lt;/span&gt;
  &lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.9&lt;/span&gt;
  &lt;span class="na"&gt;maxTokens&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;500&lt;/span&gt;
  &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;parameters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;topic&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;string&lt;/span&gt;
      &lt;span class="s"&gt;style?&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;string&lt;/span&gt;
    &lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;topic&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;social media&lt;/span&gt;
&lt;span class="na"&gt;prompts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;system&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;You are a helpful research assistant who will provide descriptive responses for a given topic and how it impacts society&lt;/span&gt;
  &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;Explain the impact of {{ topic }} on how we engage with technology as a society&lt;/span&gt;
    &lt;span class="s"&gt;{% if style -%}&lt;/span&gt;
    &lt;span class="s"&gt;Can you answer in the style of a {{ style }}&lt;/span&gt;
    &lt;span class="s"&gt;{% endif -%}&lt;/span&gt;
&lt;span class="na"&gt;fewShots&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;What is Bluetooth&lt;/span&gt;
    &lt;span class="na"&gt;response&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Bluetooth is a short-range wireless technology standard that is used for exchanging data between fixed and mobile devices over short distances and building personal area networks.&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;How does machine learning differ from traditional programming?&lt;/span&gt;
    &lt;span class="na"&gt;response&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Machine learning allows algorithms to learn from data and improve over time without being explicitly programmed.&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Can you provide an example of AI in everyday life?&lt;/span&gt;
    &lt;span class="na"&gt;response&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AI is used in virtual assistants like Siri and Alexa, which understand and respond to voice commands.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We have our configuration, and a name we can refer to it by later. We've got parameters (with optional items denoted by the question mark &lt;code&gt;?&lt;/code&gt;), default values, the system prompt, the user prompt, and some few-shot prompts.&lt;/p&gt;

&lt;p&gt;You might notice that the prompt has template instructions in there. To create this we used the &lt;a href="https://github.com/sebastienros/fluid/" rel="noopener noreferrer"&gt;Fluid&lt;/a&gt; library, which is based on the &lt;a href="https://shopify.github.io/liquid/" rel="noopener noreferrer"&gt;Liquid&lt;/a&gt; template language from Shopify. It's got some great features and helps make prompt generation pretty powerful.&lt;/p&gt;

&lt;p&gt;The library expects the prompt files (all with the file extension of &lt;code&gt;.prompt&lt;/code&gt;) to be in a folder called &lt;code&gt;prompts&lt;/code&gt; in the current working directory. This means you can run your app from different directories and use different versions of the prompt files, useful if you have different tenants to target.&lt;/p&gt;
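The folder convention above can be sketched from the command line; this is a minimal illustration only, and the file name and contents of example.prompt are hypothetical:

```shell
# DotPrompt looks for *.prompt files in a "prompts" folder under the
# current working directory, so the layout is simply:
mkdir -p prompts
printf 'name: Example\n' > prompts/example.prompt
ls prompts    # -> example.prompt
```

Running the app from a different working directory, each with its own prompts folder, is how you would swap in per-tenant prompt files.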

&lt;p&gt;So, how do you use it? Well, after installing it using NuGet you can either load a prompt file directly (see the repo), or use the prompt manager like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;promptManager&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;PromptManager&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;promptFile&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;promptManager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetPromptFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"example"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And now &lt;code&gt;promptFile&lt;/code&gt; has all the information needed to generate the prompts. To generate the system prompt or user prompt you pass in a dictionary of values to fill in the template (or an empty dictionary, or null if there aren't any).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;systemPrompt&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;promptFile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetSystemPrompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;userPrompt&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;promptFile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetUserPrompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;Dictionary&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;object&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"topic"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"bluetooth"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"style"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"used car salesman"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And we've now got our generated prompts 🎉&lt;/p&gt;

&lt;p&gt;We even built out some extension methods for OpenAI, as we're big users of Azure OpenAI. So you don't have to generate the prompts and wire them up yourself; you can swap it all for a method call.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;promptValues&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;Dictionary&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;object&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"topic"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"bluetooth"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"style"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"used car salesman"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;completion&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CompleteChatAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;promptFile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ToOpenAiChatMessages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;promptValues&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;promptFile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ToOpenAiChatCompletionOptions&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So a full example would look something like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;System.ClientModel&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;Azure.AI.OpenAI&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;DotPrompt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;DotPrompt.Extensions.OpenAi&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;openAiClient&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Uri&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"https://endpoint"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;ApiKeyCredential&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"abc123"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openAiClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetChatClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;promptManager&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;PromptManager&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;promptFile&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;promptManager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetPromptFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"example"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;promptValues&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;Dictionary&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;object&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"topic"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"bluetooth"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"style"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"used car salesman"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;completion&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CompleteChatAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;promptFile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ToOpenAiChatMessages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;promptValues&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;promptFile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ToOpenAiChatCompletionOptions&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The call to &lt;code&gt;ToOpenAiChatMessages&lt;/code&gt; would also include the few-shot prompts if they were present in the prompt file.&lt;/p&gt;

&lt;p&gt;And that's it: the call to the LLM is done, we can change the parameter values on each call, and we can change the prompts without re-compiling. We integrated this into our internal app and it made the whole process of tweaking so much easier. And we'll continue to dog-food the library.&lt;/p&gt;

&lt;p&gt;We did build in a few more things as well, because why not. It also has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ability to parse a prompt file from a stream&lt;/li&gt;
&lt;li&gt;Parameter value validation at runtime&lt;/li&gt;
&lt;li&gt;Interfaces to allow the file store and the prompt manager to be mocked out or used in dependency injection scenarios&lt;/li&gt;
&lt;li&gt;A pluggable file store: because the file store has an interface and the prompt manager takes an instance of it, you can build your own stores if you want to hold prompts in something like a database (see the repo for an example)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We should probably come up with a logo for it as well, right now it's the boring NuGet default.&lt;/p&gt;

&lt;p&gt;Is there something you'd like to see it do? If so then feel free to raise it over on GitHub.&lt;/p&gt;

</description>
      <category>dotnet</category>
      <category>promptengineering</category>
      <category>genai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Kotlin and Azure Functions - Automating the deployment</title>
      <dc:creator>Darren Fuller</dc:creator>
      <pubDate>Sat, 16 Mar 2024 10:09:45 +0000</pubDate>
      <link>https://dev.to/dazfuller/kotlin-and-azure-functions-automating-the-deployment-3npe</link>
      <guid>https://dev.to/dazfuller/kotlin-and-azure-functions-automating-the-deployment-3npe</guid>
      <description>&lt;p&gt;Recently I needed to write an Azure Function app that uses the &lt;a href="https://poi.apache.org" rel="noopener noreferrer"&gt;Apache POI&lt;/a&gt; library for getting the text from Microsoft Word 94 documents (and yes, I am fully aware that the year is currently 2024, but some people still have 30 year old documents kicking around!).&lt;/p&gt;

&lt;p&gt;Being somewhat allergic to coding in Java (this is a personal thing, if you like Java then good for you) I decided to try out writing the code using &lt;a href="https://kotlinlang.org" rel="noopener noreferrer"&gt;Kotlin&lt;/a&gt; from &lt;a href="https://www.jetbrains.com" rel="noopener noreferrer"&gt;JetBrains&lt;/a&gt; instead. I'm already using &lt;a href="https://www.jetbrains.com/idea/" rel="noopener noreferrer"&gt;IntelliJ&lt;/a&gt; as I work with Apache Spark using Scala, so the tooling was already there and ready to go for this.&lt;/p&gt;

&lt;p&gt;Microsoft, unexpectedly, have an &lt;a href="https://learn.microsoft.com/azure/azure-functions/functions-create-maven-kotlin-intellij" rel="noopener noreferrer"&gt;article&lt;/a&gt; about creating Azure Function apps using Kotlin, so this was a great start and, pretty quickly, I had a version of my function app ready to try out.&lt;/p&gt;

&lt;h2&gt;Debugging&lt;/h2&gt;

&lt;p&gt;Unfortunately this isn't as straightforward as just running a debug configuration (it would be nice if it was), but it's honestly not too bad either. The Microsoft article talks about how to do it, and once you get into the habit of running the app and then running a configuration to attach the debugger, it just starts to flow.&lt;/p&gt;

&lt;p&gt;So, running locally I have an Azure Function app, written in Kotlin, using the Apache POI library, extracting text from a Word 94 document (it kills me a little bit every time I write that part). So I opened up &lt;a href="https://httpie.io/desktop" rel="noopener noreferrer"&gt;HTTPie&lt;/a&gt; and started firing sample documents at it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9s7352nmahdfdlt09dvg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9s7352nmahdfdlt09dvg.png" alt="IntelliJ running and debugging the code"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fug65rahebvebfr1fcquc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fug65rahebvebfr1fcquc.png" alt="API being called from HTTPie"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It works great, so next I want to deploy this thing, and that's where the smooth ride hits a few bumps.&lt;/p&gt;

&lt;h2&gt;Deployment&lt;/h2&gt;

&lt;p&gt;The article tells you to just run the &lt;code&gt;azure-functions:deploy&lt;/code&gt; Maven target. It somewhat skips over the fact that you should probably first open up the &lt;code&gt;pom.xml&lt;/code&gt; file and do things like change the resource group name, function app name, and the app region. These might seem trivial, but if you have issues with data leaving certain regions then they become really important.&lt;/p&gt;

&lt;p&gt;So, first up, change the &lt;code&gt;pom.xml&lt;/code&gt; file, specifically these values in the &lt;code&gt;properties&lt;/code&gt; section.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;functionAppName&lt;/li&gt;
&lt;li&gt;functionAppRegion&lt;/li&gt;
&lt;li&gt;functionResourceGroup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Having done that I tried to run the Maven target again and... it failed.&lt;/p&gt;

&lt;p&gt;It failed complaining that it couldn't read the &lt;code&gt;azureProfile.json&lt;/code&gt; or &lt;code&gt;accessTokens.json&lt;/code&gt; files, and that I should check I was logged in using the Azure CLI.&lt;/p&gt;

&lt;p&gt;This is weird, because I had just created the resource group and a few other things for my deployment using the CLI. But I logged in again, tried again, and still got the same error!&lt;/p&gt;

&lt;p&gt;A bit of searching around the web and it turns out that the Azure CLI had been updated a while ago, changing how it handles tokens, but the Maven archetype hadn't been updated to reflect this! So, what next?&lt;/p&gt;

&lt;p&gt;Well, most deployments to Azure Functions work by packaging up the function app, producing a zip, and then pushing this up. There are other ways, but this works well and is pretty robust. But it's all hidden away in the target, so how can we replicate what it does? It turns out we can do this fairly easily.&lt;/p&gt;

&lt;p&gt;There are a few steps we want to follow.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Clean everything up&lt;/li&gt;
&lt;li&gt;Package the app&lt;/li&gt;
&lt;li&gt;Create the zip&lt;/li&gt;
&lt;li&gt;Deploy the zip&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Cleaning things up is easy: we can just run &lt;code&gt;mvn clean&lt;/code&gt; and we're done.&lt;/p&gt;

&lt;p&gt;For packaging we need to run a couple of things. First is the &lt;code&gt;mvn package&lt;/code&gt; target to build our application; next is the &lt;code&gt;mvn azure-functions:package&lt;/code&gt; target, which creates the directory structure we need to be able to publish the app (this is the really important step).&lt;/p&gt;

&lt;p&gt;Once both of these have been executed, if we look under the &lt;code&gt;target&lt;/code&gt; directory we can see a new &lt;code&gt;azure-functions&lt;/code&gt; directory which contains everything we need: all the &lt;code&gt;jar&lt;/code&gt; files, the &lt;code&gt;host.json&lt;/code&gt; file, and more.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flm60yw0nq072wb2vltk9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flm60yw0nq072wb2vltk9.png" alt="Contents of the target directory after packaging"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, we're just creating a zip to deploy, so in the command line we move to the &lt;code&gt;target/azure-functions/&amp;lt;function app name&amp;gt;&lt;/code&gt; folder and run the following.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; zip &lt;span class="nt"&gt;-r&lt;/span&gt; ../../../deploy.zip &lt;span class="k"&gt;*&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;That will create a &lt;code&gt;deploy.zip&lt;/code&gt; file in the root of the source directory with all the contents of the packaged function app directory within it.&lt;/p&gt;

&lt;p&gt;Then, we publish the zip to our function app (you need to create that first by the way, as a Java 8 function app) with the following command.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; az functionapp deployment &lt;span class="nb"&gt;source &lt;/span&gt;config-zip &lt;span class="nt"&gt;-g&lt;/span&gt; &amp;lt;resource group name&amp;gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &amp;lt;&lt;span class="k"&gt;function &lt;/span&gt;app name&amp;gt; &lt;span class="nt"&gt;--src&lt;/span&gt; ./deploy.zip


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;And that's it, the function app is deployed. You can then grab the URL from the &lt;a href="https://portal.azure.com" rel="noopener noreferrer"&gt;Azure Portal&lt;/a&gt; and change the URL in HTTPie, send the document, and get the response.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxtytt5x6ptwdbw0716cb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxtytt5x6ptwdbw0716cb.png" alt="HTTPie showing response from deployed application"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What I ended up doing was putting all of the steps into a Makefile so I could just run &lt;code&gt;make deploy&lt;/code&gt; whenever I wanted to push a new version.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight make"&gt;&lt;code&gt;

&lt;span class="nl"&gt;clean&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
    mvn clean

&lt;span class="nl"&gt;package&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;clean&lt;/span&gt;
    mvn package
    mvn azure-functions:package

&lt;span class="nl"&gt;deploy&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;package&lt;/span&gt;
    &lt;span class="nb"&gt;cd&lt;/span&gt; ./target/azure-functions/&amp;lt;app name&amp;gt;&lt;span class="p"&gt;;&lt;/span&gt; zip &lt;span class="nt"&gt;-r&lt;/span&gt; ../../../deploy.zip &lt;span class="k"&gt;*&lt;/span&gt;
    az functionapp deployment &lt;span class="nb"&gt;source &lt;/span&gt;config-zip &lt;span class="nt"&gt;-g&lt;/span&gt; &amp;lt;resource group name&amp;gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &amp;lt;func app name&amp;gt; &lt;span class="nt"&gt;--src&lt;/span&gt; ./deploy.zip
    &lt;span class="nb"&gt;rm &lt;/span&gt;deploy.zip


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Putting in the dependencies meant that I only needed to execute the deploy step, instead of remembering to clean and package as well.&lt;/p&gt;

&lt;h2&gt;Wrapping up&lt;/h2&gt;

&lt;p&gt;Pretty quickly I ended up with a function app deployed, written in Kotlin, which could read and extract text from Word 94 (sob sob sob) files. Other than working out the deployment process it went really smoothly, and it runs incredibly well.&lt;/p&gt;

&lt;p&gt;I'm probably not quite ready to give up C# and dotnet and use this for everything, but it's a nice option for when a Java-based solution is the better fit.&lt;/p&gt;

</description>
      <category>kotlin</category>
      <category>azurefunctions</category>
      <category>deploy</category>
      <category>azure</category>
    </item>
    <item>
      <title>Querying SQL from Databricks without PyODBC</title>
      <dc:creator>Darren Fuller</dc:creator>
      <pubDate>Sat, 06 May 2023 10:00:55 +0000</pubDate>
      <link>https://dev.to/dazfuller/querying-sql-from-databricks-without-pyodbc-109f</link>
      <guid>https://dev.to/dazfuller/querying-sql-from-databricks-without-pyodbc-109f</guid>
      <description>&lt;p&gt;Okay, so this is probably a bit of a niche post but still.&lt;/p&gt;

&lt;p&gt;Something I see a lot of is people asking how to do things like run arbitrary SQL statements on SQL databases from Databricks, or how to execute stored procedures. Often this leads to the same follow-ups on how to install PyODBC along with the Unix drivers, adding Microsoft's packages repo, accepting EULAs, etc...&lt;/p&gt;

&lt;p&gt;It works, don't get me wrong, but it's a faff, especially if you don't have permission to do those things. Plus Microsoft's packages repo has had... issues in the past, and suddenly production jobs fail because they can't connect. I've &lt;a href="https://dev.to/dazfuller/databricks-and-pyodbc-avoiding-another-ms-repo-outage-4oea"&gt;posted&lt;/a&gt; about this before, so there are ways around it, but still, it's a faff.&lt;/p&gt;

&lt;p&gt;So, what if you want to connect to SQL and installing PyODBC isn't an option?&lt;/p&gt;

&lt;h2&gt;
  
  
  Using what's already available
&lt;/h2&gt;

&lt;p&gt;Yep, you can do this using only what is already available, and pretty easily, all thanks to Py4J. Py4J is important because Spark runs on the JVM, so how do all of those PySpark calls get executed? They call the Java/Scala methods under the hood through Py4J. And this works for more than just Spark's own classes.&lt;/p&gt;

&lt;p&gt;To make things like the Azure SQL connector work, Databricks ships with the Microsoft SQL JDBC library (along with others, such as PostgreSQL), and we can access these from Python.&lt;/p&gt;

&lt;p&gt;To keep things as secure as possible I'm going to show how to do this using a service principal for access, but it works just as well with SQL-based authentication.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;p&gt;So first I'm going to assume some things.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You have a SQL database you can connect to&lt;/li&gt;
&lt;li&gt;A service principal (or SQL account) exists which has permissions&lt;/li&gt;
&lt;li&gt;You have access to Databricks&lt;/li&gt;
&lt;li&gt;The credentials are in a secret scope (if not then why not!)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The first thing we need is a reference to &lt;code&gt;SQLServerDataSource&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;SQLServerDataSource&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;spark&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_sc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_gateway&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jvm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;microsoft&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sqlserver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jdbc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SQLServerDataSource&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And that's the magic line which lets us access something in the JVM from Python. So we now have a Python variable which references this object. Now we can use it.&lt;br&gt;
&lt;/p&gt;
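&lt;p&gt;If you find yourself reaching for several JVM classes, the same lookup can be wrapped in a tiny helper. This is just a sketch of the pattern above, and remember that &lt;code&gt;_sc&lt;/code&gt; and &lt;code&gt;_gateway&lt;/code&gt; are internal PySpark attributes, not supported API:&lt;/p&gt;

```python
def jvm_ref(spark, dotted_name):
    """Walk Spark's Py4J gateway to resolve a JVM class or package reference.

    e.g. jvm_ref(spark, "com.microsoft.sqlserver.jdbc.SQLServerDataSource")
    Note: _sc and _gateway are internal PySpark attributes (illustrative only).
    """
    ref = spark._sc._gateway.jvm
    for part in dotted_name.split("."):
        ref = getattr(ref, part)
    return ref
```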

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;client_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dbutils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;secrets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;secret_scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;secret name&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;client_secret&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dbutils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;secrets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;secret_scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;secret name&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;datasource&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SQLServerDataSource&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;datasource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setServerName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;sql_server&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.database.windows.net&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;datasource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setDatabaseName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;database_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;datasource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setAuthentication&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ActiveDirectoryServicePrincipal&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;datasource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setAADSecurePrincipalId&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;datasource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setAADSecurePrincipalSecret&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client_secret&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here we are getting the service principal's application ID and client secret from our secret scope. Then we're creating an instance of &lt;code&gt;SQLServerDataSource&lt;/code&gt; and configuring it to connect to our database using AAD-based authentication (for other options see the &lt;a href="https://learn.microsoft.com/sql/connect/jdbc/setting-the-connection-properties"&gt;connection string settings&lt;/a&gt; documentation).&lt;/p&gt;

&lt;p&gt;Now we're ready to execute something, so let's do a simple query to get the list of users.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;connection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datasource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getConnection&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;statement&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createStatement&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;statement&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;executeQuery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;SELECT name FROM sysusers&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;oops&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, we get a connection from the data source, get a statement object, execute a query, and then iterate on the results.&lt;/p&gt;

&lt;p&gt;And that's pretty much it!&lt;/p&gt;
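&lt;p&gt;One thing worth tidying up: the snippet above swallows every error and never closes the statement or connection. A small wrapper keeps the cleanup in one place (a sketch rather than anything official; the &lt;code&gt;fetch_column&lt;/code&gt; name is my own, but the &lt;code&gt;getConnection&lt;/code&gt;/&lt;code&gt;createStatement&lt;/code&gt;/&lt;code&gt;executeQuery&lt;/code&gt; calls are standard JDBC):&lt;/p&gt;

```python
def fetch_column(datasource, query, column):
    """Run a query through a JDBC data source and return one column as a list.

    Uses try/finally so the JDBC statement and connection are always closed,
    even when the query fails, and lets errors surface to the caller.
    """
    connection = datasource.getConnection()
    try:
        statement = connection.createStatement()
        try:
            results = statement.executeQuery(query)
            values = []
            while results.next():
                values.append(results.getString(column))
            return values
        finally:
            statement.close()
    finally:
        connection.close()
```

&lt;p&gt;So &lt;code&gt;fetch_column(datasource, 'SELECT name FROM sysusers', 'name')&lt;/code&gt; replaces the loop above, without the bare &lt;code&gt;except&lt;/code&gt;.&lt;/p&gt;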

&lt;p&gt;There are other methods which allow you to prepare statements that take parameters in a secure way, and you can use the &lt;code&gt;execute&lt;/code&gt; method instead if you're not expecting results (such as when calling a stored procedure). Or maybe you want to make sure a database schema exists before writing your DataFrame to SQL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;statement&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;IF NOT EXISTS (SELECT 1 FROM sys.schemas WHERE [name] = &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;MyCoolSchema&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;) BEGIN EXEC(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;CREATE SCHEMA [MyCoolSchema]&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;) END&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are samples of how to do this in the &lt;a href="https://learn.microsoft.com/sql/connect/jdbc/data-source-sample"&gt;data source sample&lt;/a&gt; documentation.&lt;/p&gt;
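&lt;p&gt;For the parameterised case, the standard JDBC &lt;code&gt;prepareStatement&lt;/code&gt; and &lt;code&gt;setString&lt;/code&gt; calls work through Py4J too. A sketch (the wrapper name is my own; note JDBC parameter indexes start at 1):&lt;/p&gt;

```python
def user_exists(connection, username):
    """Check for a user with a parameterised JDBC query.

    The value is bound as a parameter rather than concatenated into the SQL
    text, so there's no injection risk.
    """
    ps = connection.prepareStatement("SELECT 1 FROM sysusers WHERE name = ?")
    try:
        ps.setString(1, username)  # JDBC parameters are 1-indexed
        results = ps.executeQuery()
        return results.next()      # True if at least one row came back
    finally:
        ps.close()
```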

&lt;p&gt;But we haven't had to install any new drivers or packages to make this happen.&lt;/p&gt;

&lt;h2&gt;
  
  
  Any issues?
&lt;/h2&gt;

&lt;p&gt;Yes. Don't go crazy with this. Spark is a Big Data platform for processing huge amounts of data; it's not intended for making lots of small queries, and doing so can reduce connection availability for Spark's own JDBC operations. But as with anything, if you use the tools the way they're intended to be used then you shouldn't have any issues.&lt;/p&gt;

</description>
      <category>spark</category>
      <category>databricks</category>
      <category>python</category>
      <category>database</category>
    </item>
    <item>
      <title>Azure Digital Twins - Fluent API v3.1 - MATCH the things!</title>
      <dc:creator>Darren Fuller</dc:creator>
      <pubDate>Sun, 09 Oct 2022 11:07:13 +0000</pubDate>
      <link>https://dev.to/dazfuller/azure-digital-twins-fluent-api-v31-match-the-things-4ndj</link>
      <guid>https://dev.to/dazfuller/azure-digital-twins-fluent-api-v31-match-the-things-4ndj</guid>
      <description>&lt;p&gt;Back in June of this year I &lt;a href="https://dev.to/dazfuller/azure-digital-twins-building-a-fluent-api-in-net-6g3"&gt;posted&lt;/a&gt; about the Fluent API we built at &lt;a href="https://intelligentspaces.io"&gt;Intelligent Spaces&lt;/a&gt; for Azure Digital Twins.&lt;/p&gt;

&lt;p&gt;Since then we've been busy making extensive use of the API ourselves and looking at making improvements. This last week we've finished implementing one of the larger improvements to the API since introducing the Fluent API itself. And that's bringing in the &lt;a href="https://learn.microsoft.com/azure/digital-twins/reference-query-clause-match"&gt;MATCH&lt;/a&gt; clause.&lt;/p&gt;

&lt;h2&gt;
  
  
  The MATCH clause
&lt;/h2&gt;

&lt;p&gt;Most of the time when writing queries against Azure Digital Twins you'll find yourself using the JOIN clause, which is pretty easy to use. You start with a twin on the left, and say you want to connect another twin on the right based on a relationship type, such as the following.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;building&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;level&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
    &lt;span class="n"&gt;digitaltwins&lt;/span&gt; &lt;span class="n"&gt;building&lt;/span&gt;
    &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="k"&gt;level&lt;/span&gt; &lt;span class="n"&gt;RELATED&lt;/span&gt; &lt;span class="n"&gt;building&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isPartOf&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;
    &lt;span class="n"&gt;building&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;dtId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'myBuilding'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These work great but have a few limitations. You can't join more than 5 deep, there are no OUTER JOIN semantics, and you can only query on the twins. So if there's something about the relationship (other than its type) that you want to filter on, you're going to have to do it in client code.&lt;/p&gt;

&lt;p&gt;So, let's say we have a graph where we want to get from an owning organisation down to the assets inside rooms, but we also want the assets' Building Information Modelling (BIM) data. In our ontology we have the following.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F88obai3r0bfkskjiq1dm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F88obai3r0bfkskjiq1dm.png" alt="Ontology structure showing query from organisation to asset and BIM properties" width="800" height="337"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are a couple of things here which are going to stop us from using the JOIN clause, the primary one being that the query goes 6 deep and we have a limit of 5. But we can write this using a MATCH clause pretty easily, as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;asset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bimproperties&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
    &lt;span class="n"&gt;digitaltwins&lt;/span&gt;
&lt;span class="k"&gt;MATCH&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;company&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;owns&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;building&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;isPartOf&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;level&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;isPartOf&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;room&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;servedBy&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;asset&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;hasBimProperties&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bimproperties&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;
    &lt;span class="n"&gt;company&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;dtId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'myOrg'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I say easily, but you can see that we've got arrows in there, relationships prefixed with colons, and it gets even more fun when we introduce hops &lt;code&gt;-[query_variable:relationship_name*min_hops..max_hops]-&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This gets complex pretty quickly, and it becomes easy to really mess things up. Miss that &lt;code&gt;:&lt;/code&gt; and suddenly our relationship name becomes a query variable, and our join goes wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Fluent API
&lt;/h2&gt;

&lt;p&gt;The MATCH clause provides a really powerful way to navigate the digital twin graph, but its syntax is not natural SQL and, as we've just seen, it's pretty easy to get wrong. This was one of the primary reasons we wanted to bring it into our Fluent API. To do this we introduced a new method called &lt;code&gt;Match&lt;/code&gt;, which brings in all of the functionality of the &lt;code&gt;MATCH&lt;/code&gt; clause. Importantly, the API also performs run-time validation of what has been specified; you can't mix the &lt;code&gt;JOIN&lt;/code&gt; clause with the &lt;code&gt;MATCH&lt;/code&gt; clause, so it guards against this.&lt;/p&gt;

&lt;p&gt;We have also introduced the &lt;code&gt;BaseGraphInstanceFluentRelationship&lt;/code&gt; type, which can be used in one of two ways. The first is directly: you create a new instance and specify the relationship name. The second is to derive new relationship types from it and set the relationship name in the default constructor.&lt;/p&gt;

&lt;p&gt;Using it directly means that, where there are no existing C# models describing the relationships, you don't have to create entirely new ones just to use the API. But these instances are pretty simple, so you can't do much with them other than specify the relationship name.&lt;/p&gt;

&lt;p&gt;If you derive and create new types, then you can use these with the &lt;code&gt;Where&lt;/code&gt; methods to also filter your queries based on relationship properties, and that gets a lot more powerful.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;
&lt;span class="c1"&gt;// Used directly&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;owns&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;BaseGraphInstanceFluentRelationship&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"owns"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// Or as an override&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Owns&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BaseGraphInstanceFluentRelationship&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;JsonPropertyName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"start"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="n"&gt;DateTime&lt;/span&gt; &lt;span class="n"&gt;StartOfLease&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;JsonPropertyName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"end"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="n"&gt;DateTime&lt;/span&gt; &lt;span class="n"&gt;EndOfLease&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;Owns&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;RelationshipName&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"owns"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using the &lt;code&gt;JsonPropertyName&lt;/code&gt; attribute allows us to map POCO properties to field names in the graph.&lt;/p&gt;

&lt;p&gt;So, let's rebuild the &lt;code&gt;MATCH&lt;/code&gt; query above using the Fluent API.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="n"&gt;IEnumerable&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;(&lt;/span&gt;&lt;span class="n"&gt;Room&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Asset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BimProperties&lt;/span&gt;&lt;span class="p"&gt;)&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;FluentGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithFluent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;From&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Company&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;ModelVerificationType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IsOfModel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Match&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Building&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Owns&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Match&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Room&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;IsPartOf&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;modelVerificationType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ModelVerificationType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IsOfModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;minHops&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Match&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Asset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ServedBy&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;modelVerificationType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ModelVerificationType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IsOfModel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Match&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;BimProperties&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;HasBimProperties&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;direction&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MatchDirection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RightToLeft&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WhereId&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Company&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"myOrg"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Project&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Room&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Asset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BimProperties&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can see we're starting the same way with the &lt;code&gt;From&lt;/code&gt; method, but then jump into using &lt;code&gt;Match&lt;/code&gt; which, with derived types, takes two type arguments: the first is the model we're connecting to, and the second is the relationship type.&lt;/p&gt;

&lt;p&gt;The full method signature allows us to provide a query variable, a minimum number of hops, a maximum number, a join direction, and a model verification type. There are also overloads allowing us to specify multiple relationship types, as the &lt;code&gt;MATCH&lt;/code&gt; clause lets us union different relationship names such as &lt;code&gt;-[:owns|leases|rents]-&lt;/code&gt; (that syntax just gets more complex, right?).&lt;/p&gt;

&lt;p&gt;The Fluent API code above builds and executes the following Digital Twins query.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;room&lt;/span&gt;
    &lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;asset&lt;/span&gt;
    &lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bimproperties&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
    &lt;span class="n"&gt;digitaltwins&lt;/span&gt;
&lt;span class="k"&gt;MATCH&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;company&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;owns&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;building&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;isPartOf&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;room&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;servedBy&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;asset&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;hasBimProperties&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bimproperties&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;
    &lt;span class="n"&gt;IS_OF_MODEL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;company&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'dtmi:digitaltwins:rec_3_3:agents:Company;1'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;IS_OF_MODEL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;room&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'dtmi:digitaltwins:rec_3_3:core:Room;1'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;IS_OF_MODEL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;asset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'dtmi:digitaltwins:rec_3_3:core:Asset;1'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;company&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;dtId&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'myOrg'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There's a slight change here compared to the original: we're setting a minimum number of hops from building to room. Because we've not specified a maximum, it's treated as an exact number of hops, so "there are exactly 2 hops from building to room".&lt;/p&gt;

&lt;p&gt;But we've now got type safety, compile-time checking, and run-time validation of the query. If we specify right-to-left as the join direction then that's what we get, and there is always a &lt;code&gt;:&lt;/code&gt; before the relationship name.&lt;/p&gt;

&lt;p&gt;One of the things we can also do (for matches which &lt;a href="https://learn.microsoft.com/azure/digital-twins/reference-query-clause-match#assign-query-variable-to-relationship-and-specify-relationship-properties"&gt;don't include a hop&lt;/a&gt;) is specify a query variable and then use it in a &lt;code&gt;WHERE&lt;/code&gt; clause.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Change the building match statement as follows&lt;/span&gt;
&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Match&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Building&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Owns&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"o"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// Add the WHERE clause&lt;/span&gt;
&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;WhereClause&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Comparison&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Owns&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DateTime&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;
        &lt;span class="s"&gt;"o"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StartOfLease&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;DateTime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UtcNow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddDays&lt;/span&gt;&lt;span class="p"&gt;(-&lt;/span&gt;&lt;span class="m"&gt;7&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; 
        &lt;span class="n"&gt;ComparisonOperatorType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GreaterThan&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here we specify the query variable as &lt;code&gt;o&lt;/code&gt; and then use it in the &lt;code&gt;WHERE&lt;/code&gt; clause to filter to only those buildings whose lease started within the last 7 days. This generates the relevant parts of the Digital Twin query like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Match&lt;/span&gt;
&lt;span class="k"&gt;MATCH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;company&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;owns&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;building&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;-- Where&lt;/span&gt;
&lt;span class="n"&gt;company&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;dtId&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'myOrg'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;start&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'2022-10-02T00:00:00.0000000Z'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can see that the Fluent API has converted the POCO property name to &lt;code&gt;start&lt;/code&gt; because of the &lt;code&gt;JsonPropertyName&lt;/code&gt; attribute, and has generated a correctly formatted date-time value. In the &lt;code&gt;Where&lt;/code&gt; method, because we declared the property as a &lt;code&gt;DateTime&lt;/code&gt; we can only select &lt;code&gt;DateTime&lt;/code&gt; properties, and can only provide a &lt;code&gt;DateTime&lt;/code&gt; value for comparison; anything else is a compile-time error, so the mistake never makes it to deployment.&lt;/p&gt;
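
&lt;p&gt;That timestamp layout matches .NET's round-trip ("o") format specifier. As a small standalone check (an illustrative snippet, not part of the SDK):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;// DateTime.UtcNow.Date keeps Kind=Utc, so the "o" specifier
// emits the trailing 'Z' seen in the generated query.
var cutoff = DateTime.UtcNow.Date.AddDays(-7);
Console.WriteLine($"'{cutoff:o}'");  // e.g. '2022-10-02T00:00:00.0000000Z'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;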

&lt;h2&gt;
  
  
  Current state
&lt;/h2&gt;

&lt;p&gt;With the introduction of the &lt;code&gt;Match&lt;/code&gt; clause we can now use the Fluent API for all methods of querying the Digital Twin graph (at the time of writing), in a way which is interface driven and so lets us unit test code without connecting to a Digital Twin instance, with greater protection against misspellings, queries that update as the twin model version changes, and better compile-time safety.&lt;/p&gt;

&lt;p&gt;If you want to try it out yourself, we have our SDK published as a NuGet package over at &lt;a href="https://www.nuget.org/packages/Elastacloud.DigiTwin.SDK"&gt;nuget.org&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Don't forget to check out our other releases, such as &lt;a href="https://www.nuget.org/packages/AdxUtilities"&gt;AdxUtilities&lt;/a&gt;, or our Azure Digital Twin &lt;a href="https://github.com/intelligentspaces/spark-digitaltwin-reader"&gt;connector for Spark&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Or, come and check us out directly if you want to see what we can do for your spaces using Azure Digital Twins, over at our &lt;a href="https://intelligentspaces.io/"&gt;Intelligent Spaces&lt;/a&gt; website.&lt;/p&gt;

</description>
      <category>azure</category>
      <category>digitaltwins</category>
      <category>csharp</category>
      <category>dotnet</category>
    </item>
    <item>
      <title>Azure Digital Twins - Building a Fluent API in .NET</title>
      <dc:creator>Darren Fuller</dc:creator>
      <pubDate>Sat, 25 Jun 2022 09:05:53 +0000</pubDate>
      <link>https://dev.to/dazfuller/azure-digital-twins-building-a-fluent-api-in-net-6g3</link>
      <guid>https://dev.to/dazfuller/azure-digital-twins-building-a-fluent-api-in-net-6g3</guid>
      <description>&lt;p&gt;Azure Digital Twins is an amazing platform service which allows you to model virtual representations of a real world space. In &lt;a href="https://intelligentspaces.io"&gt;Intelligent Spaces&lt;/a&gt; we pull in data from BIM sources to create a digital twin and combine that with real world context from IoT sensors to optimize spaces for business use.&lt;/p&gt;

&lt;p&gt;Digital Twins has a &lt;a href="https://docs.microsoft.com/rest/api/digital-twins/dataplane/query/querytwins"&gt;Query API&lt;/a&gt; through which we can query the twin using the &lt;a href="https://docs.microsoft.com/azure/digital-twins/reference-query-clause-select"&gt;Digital Twin Query Language&lt;/a&gt;, which provides a SQL-like language for querying what is, underneath, a graph database. Microsoft also provides &lt;a href="https://docs.microsoft.com/dotnet/api/overview/azure/digitaltwins/client?view=azure-dotnet&amp;amp;preserve-view=true"&gt;SDKs&lt;/a&gt; we can use instead of having to use the Query API directly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating queries
&lt;/h2&gt;

&lt;p&gt;So, what does a typical query look like? Well, let's say we want to query our digital twin to find all rooms in a building which have a maximum occupancy of 10 or greater, or whose name contains the word "meeting". But we also want to make sure that the "room" twins we get back are definitely rooms.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;building&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;level&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;room&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;digitaltwins&lt;/span&gt; &lt;span class="n"&gt;building&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="k"&gt;level&lt;/span&gt; &lt;span class="n"&gt;RELATED&lt;/span&gt; &lt;span class="n"&gt;building&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isPartOf&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;room&lt;/span&gt; &lt;span class="n"&gt;RELATED&lt;/span&gt; &lt;span class="k"&gt;level&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isPartOf&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;building&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;dtId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'HeadOffice'&lt;/span&gt;
&lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;IS_OF_MODEL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;room&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'dtmi:digitaltwins:rec_3_3:core:Room;1'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;room&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;maximumOccupancy&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
    &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;CONTAINS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;room&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'meeting'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hand-writing this means that we're likely to make a mistake. Plus, if we update the model version we have to remember to change the model information everywhere we've used it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Fluent API
&lt;/h2&gt;

&lt;p&gt;To help improve on this situation we decided to create a Fluent API. This way we could have an easy-to-assemble query builder and compile-time safety. But we also wanted to use the POCOs we built to hold the output to bring in the model information. That way, if we update the version later we only have to change it in one place. So, what does this actually look like (including some POCO objects to work with)?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;BuildingModel&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BaseGraphInstanceFluentModel&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;TwinModelType&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s"&gt;"dtmi:digitaltwins:rec_3_3:core:Building;1"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LevelModel&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BaseGraphInstanceFluentModel&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;TwinModelType&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s"&gt;"dtmi:digitaltwins:rec_3_3:core:Level;1"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RoomModel&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BaseGraphInstanceFluentModel&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;TwinModelType&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s"&gt;"dtmi:digitaltwins:rec_3_3:core:Room;1"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;JsonPropertyName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;Name&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;JsonPropertyName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"maximumOccupancy"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;MaximumOccupancy&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithFluent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;From&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;BuildingModel&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Join&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;LevelModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BuildingModel&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"isPartOf"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Join&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;RoomModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LevelModel&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"isPartOf"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ModelVerificationType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IsOfModel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WhereId&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;BuildingModel&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"HeadOffice"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;WhereClause&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Or&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;WhereClause&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Comparison&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;RoomModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MaximumOccupancy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ComparisonOperatorType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GreaterThanOrEqual&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;WhereClause&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Function&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;RoomModel&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"meeting"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FunctionType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Contains&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Project&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;BuildingModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LevelModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RoomModel&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This produces the following Digital Twin SQL statement.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;buildingmodel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;levelmodel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;roommodel&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;digitaltwins&lt;/span&gt; &lt;span class="n"&gt;buildingmodel&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;levelmodel&lt;/span&gt; &lt;span class="n"&gt;RELATED&lt;/span&gt; &lt;span class="n"&gt;buildingmodel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isPartOf&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;roommodel&lt;/span&gt; &lt;span class="n"&gt;RELATED&lt;/span&gt; &lt;span class="n"&gt;levelmodel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isPartOf&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;IS_OF_MODEL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;roommodel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'dtmi:digitaltwins:rec_3_3:core:Room;1'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;buildingmodel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;dtId&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'HeadOffice'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;roommodel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;maximumOccupancy&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
    &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;CONTAINS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;roommodel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'meeting'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We have 3 POCOs in this example which inherit from a &lt;code&gt;BaseGraphInstanceFluentModel&lt;/code&gt; class. That base class gives us a number of features and some common properties covering things like &lt;code&gt;$dtId&lt;/code&gt;, &lt;code&gt;$eTag&lt;/code&gt;, and &lt;code&gt;$metadata&lt;/code&gt;. It also takes care of generating the model aliases. The &lt;code&gt;JsonPropertyName&lt;/code&gt; attributes in the POCO classes are used to map the C# properties to the properties in the digital twins; the Fluent API uses these to generate the correct property names for the query.&lt;/p&gt;

&lt;p&gt;We also included a few helper methods such as &lt;code&gt;WhereId&lt;/code&gt; as this is such a common query piece that it didn't make sense to keep writing it out in full every time.&lt;/p&gt;

&lt;p&gt;As you can see, the API makes extensive use of generics and property accessors. This way, when we say we want to query a property we get that property, and if we say we're doing a comparison on an integer then it's a compile-time error to use a non-integer property or value.&lt;/p&gt;
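
&lt;p&gt;As a rough sketch of how an API like this can map a property accessor back to a twin property name (our guess at the general technique, not the SDK's actual implementation), the expression tree gives you the member, and the member carries the &lt;code&gt;JsonPropertyName&lt;/code&gt; attribute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Linq.Expressions;
using System.Reflection;
using System.Text.Json.Serialization;

public static class PropertyNameResolver
{
    // Resolves an accessor like r =&amp;gt; r.MaximumOccupancy to "maximumOccupancy"
    // by reading the JsonPropertyName attribute from the accessed member,
    // falling back to the C# property name when no attribute is present.
    public static string Resolve&amp;lt;TModel, TValue&amp;gt;(Expression&amp;lt;Func&amp;lt;TModel, TValue&amp;gt;&amp;gt; accessor)
    {
        var member = (MemberExpression)accessor.Body;
        var attribute = member.Member.GetCustomAttribute&amp;lt;JsonPropertyNameAttribute&amp;gt;();
        return attribute?.Name ?? member.Member.Name;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For the &lt;code&gt;RoomModel&lt;/code&gt; above, this would turn &lt;code&gt;r =&amp;gt; r.MaximumOccupancy&lt;/code&gt; into &lt;code&gt;maximumOccupancy&lt;/code&gt;.&lt;/p&gt;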

&lt;p&gt;The final part of the builder is the &lt;code&gt;Project&lt;/code&gt; method. This is essentially the &lt;code&gt;SELECT&lt;/code&gt; part of the query. In this case we're selecting all of the models we've used in the builder, but it can also be a subset of them. But we also wanted a little more.&lt;/p&gt;

&lt;h2&gt;
  
  
  Projecting to a different type
&lt;/h2&gt;

&lt;p&gt;One of the things we often do with the output is to take only certain properties, so we added a version which lets us project an output to a different POCO.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ExampleOutput&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;BuildingId&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;LevelId&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;RoomId&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;RoomName&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;   
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;MaximumOccupancy&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithFluent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;From&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;BuildingModel&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Join&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;LevelModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BuildingModel&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"isPartOf"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Join&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;RoomModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LevelModel&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"isPartOf"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ModelVerificationType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IsOfModel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WhereId&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;BuildingModel&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"HeadOffice"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;WhereClause&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Or&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;WhereClause&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Comparison&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;RoomModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MaximumOccupancy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ComparisonOperatorType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GreaterThanOrEqual&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;WhereClause&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Function&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;RoomModel&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"meeting"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FunctionType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Contains&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Project&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;BuildingModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LevelModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RoomModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ExampleOutput&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;((&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;ExampleOutput&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;BuildingId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ExternalId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;LevelId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ExternalId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;RoomId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ExternalId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;RoomName&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;MaximumOccupancy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MaximumOccupancy&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This time we get a different Digital Twin query statement generated which looks like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;buildingmodel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;dtId&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;BuildingId&lt;/span&gt;
    &lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;levelmodel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;dtId&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;LevelId&lt;/span&gt;
    &lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;roommodel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;dtId&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;RoomId&lt;/span&gt;
    &lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;roommodel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;RoomName&lt;/span&gt;
    &lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;roommodel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;maximumOccupancy&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;MaximumOccupancy&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;digitaltwins&lt;/span&gt; &lt;span class="n"&gt;buildingmodel&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;levelmodel&lt;/span&gt; &lt;span class="n"&gt;RELATED&lt;/span&gt; &lt;span class="n"&gt;buildingmodel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isPartOf&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;roommodel&lt;/span&gt; &lt;span class="n"&gt;RELATED&lt;/span&gt; &lt;span class="n"&gt;levelmodel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isPartOf&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;IS_OF_MODEL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;roommodel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'dtmi:digitaltwins:rec_3_3:core:Room;1'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;buildingmodel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;dtId&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'HeadOffice'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;roommodel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;maximumOccupancy&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
    &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;CONTAINS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;roommodel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'meeting'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This time, instead of querying for the full models, we're returning just the properties we want and aliasing them so that we can match them to the new output type. Again, if we try to project a string field to an integer property then it's a compile-time error.&lt;/p&gt;

&lt;h2&gt;
  
  
  Generic types for when not everything is modelled
&lt;/h2&gt;

&lt;p&gt;Last, but not least, we added the capability to have "generic" types as well. These are models you can use when you want to join from one twin to another where you only care about the id, but don't have a full POCO defined. So if we didn't have a "level" model (weird, I know), then we could do something like this instead.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;level&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;GenericGraphInstanceModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"level"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithFluent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;From&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;BuildingModel&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"isPartOf"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Join&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;RoomModel&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"isPartOf"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ModelVerificationType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IsOfModel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WhereId&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;BuildingModel&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"HeadOffice"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;WhereClause&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Or&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;WhereClause&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Comparison&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;RoomModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MaximumOccupancy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ComparisonOperatorType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GreaterThanOrEqual&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;WhereClause&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Function&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;RoomModel&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"meeting"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FunctionType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Contains&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Project&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;BuildingModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RoomModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;GenericGraphInstanceModel&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And from this code we get the following query.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;buildingmodel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;roommodel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;level&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;digitaltwins&lt;/span&gt; &lt;span class="n"&gt;buildingmodel&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="k"&gt;level&lt;/span&gt; &lt;span class="n"&gt;RELATED&lt;/span&gt; &lt;span class="n"&gt;buildingmodel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isPartOf&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;roommodel&lt;/span&gt; &lt;span class="n"&gt;RELATED&lt;/span&gt; &lt;span class="k"&gt;level&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isPartOf&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;IS_OF_MODEL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;roommodel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'dtmi:digitaltwins:rec_3_3:core:Room;1'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;buildingmodel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;dtId&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'HeadOffice'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;roommodel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;maximumOccupancy&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
    &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;CONTAINS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;roommodel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'meeting'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, it's almost exactly the same as the original. But this gives us the flexibility to use intermediate twins without having to model everything.&lt;/p&gt;

&lt;p&gt;The results of all of these queries are &lt;code&gt;Task&amp;lt;IEnumerable&amp;lt;&amp;gt;&amp;gt;&lt;/code&gt; instances. In the case where we're projecting Building, Level, and Room we get a &lt;code&gt;Task&amp;lt;IEnumerable&amp;lt;(BuildingModel, LevelModel, RoomModel)&amp;gt;&amp;gt;&lt;/code&gt; result, giving us typed access to all of the values returned. And it's naturally a &lt;code&gt;Task&lt;/code&gt; because the &lt;code&gt;Project&lt;/code&gt; method is asynchronous.&lt;/p&gt;
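As a sketch of consuming that result, we just await the task and deconstruct each tuple. Note this is illustrative only: `query` stands in for any fluent chain ending in a `Project` call, and the `Name` properties on the building and level models are assumptions.

```csharp
// Sketch only: 'query' is any fluent chain ending in a Project call that
// yields Task<IEnumerable<(BuildingModel, LevelModel, RoomModel)>>.
IEnumerable<(BuildingModel Building, LevelModel Level, RoomModel Room)> rows = await query;

foreach (var (building, level, room) in rows)
{
    // Typed access to each projected twin; the Name properties are assumed here
    Console.WriteLine($"{building.Name} / {level.Name} / {room.Name}");
}
```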

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;This is a pretty quick wrap-up of the functionality. Built into the Fluent API is the capability to use all of the functions and operators available in the language specification, as well as the ability to perform exact matches with the &lt;code&gt;IS_OF_MODEL&lt;/code&gt; function.&lt;/p&gt;

&lt;p&gt;This has been rolled out in our applications now, giving us more assurance over the queries we're executing, knowing that we haven't accidentally misspelt an alias or function name, and removing lots of code where we're simply creating strings to put into queries. And the API itself is extensively unit tested to give us further confidence still.&lt;/p&gt;

&lt;p&gt;Coming up next is looking at bringing relationship information into the API to remove the manual &lt;code&gt;"isPartOf"&lt;/code&gt; aspect of the query builder.&lt;/p&gt;

&lt;p&gt;We've also made the package available as a &lt;a href="https://www.nuget.org/packages/Elastacloud.DigiTwin.SDK"&gt;NuGet package&lt;/a&gt; if you want to try it out yourself.&lt;/p&gt;

</description>
      <category>azure</category>
      <category>digitaltwins</category>
      <category>dotnet</category>
      <category>csharp</category>
    </item>
    <item>
      <title>Nu Shell and Databricks</title>
      <dc:creator>Darren Fuller</dc:creator>
      <pubDate>Sat, 15 Jan 2022 12:59:57 +0000</pubDate>
      <link>https://dev.to/dazfuller/nu-shell-and-databricks-4eof</link>
      <guid>https://dev.to/dazfuller/nu-shell-and-databricks-4eof</guid>
      <description>&lt;p&gt;I'm a big fan of the command line. It's often something that can seem daunting to people at first, but with a little time and patience you can often speed up many tasks just by knowing some useful commands and how to chain them together.&lt;/p&gt;

&lt;p&gt;Most of the time I'm in PowerShell which, thanks to &lt;a href="https://github.com/powershell/powershell"&gt;PowerShell Core&lt;/a&gt;, is now cross-platform and incredibly powerful. But I'm finding myself using &lt;a href="https://www.nushell.sh/"&gt;Nu&lt;/a&gt; more and more. In both shells I also use the &lt;a href="https://docs.databricks.com/dev-tools/cli/index.html"&gt;Databricks CLI&lt;/a&gt; a lot. Want to check the status of jobs? Use the CLI. Want to upload and download data? Use the CLI. And so on.&lt;/p&gt;

&lt;p&gt;Whilst the Databricks CLI is useful, there are times when I want a little more power over it. For example, using the CLI to find a Databricks runtime version which is under Long Term Support (LTS) and is &lt;a href="https://databricks.com/product/photon"&gt;Photon&lt;/a&gt; enabled. I could do this with the Databricks CLI and some &lt;a href="https://stedolan.github.io/jq/"&gt;jq&lt;/a&gt;, but I'm also lazy and wanted something that's a bit easier to query, displays more nicely, and is easier to output to something like CSV afterwards.&lt;/p&gt;

&lt;p&gt;Well, I can get all of that from Nushell. The only downside is that it takes quite a few commands to get the data into the right shape to make querying it easy. So, instead, let's do the tedious bits once and save them as a &lt;a href="https://www.nushell.sh/book/configuration.html"&gt;command alias&lt;/a&gt;. Let's fire up Nushell and give it a go.&lt;/p&gt;

&lt;p&gt;First up, let's find our config file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; config path
C:&lt;span class="se"&gt;\U&lt;/span&gt;sers&lt;span class="se"&gt;\D&lt;/span&gt;arrenFuller&lt;span class="se"&gt;\A&lt;/span&gt;ppData&lt;span class="se"&gt;\R&lt;/span&gt;oaming&lt;span class="se"&gt;\n&lt;/span&gt;ushell&lt;span class="se"&gt;\n&lt;/span&gt;u&lt;span class="se"&gt;\c&lt;/span&gt;onfig&lt;span class="se"&gt;\c&lt;/span&gt;onfig.toml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yours will look different to this, but this is the file we need to add our command aliases to.&lt;/p&gt;

&lt;p&gt;Now, let's work out what our command looks like. I want to create a command that calls the Databricks CLI for the runtime versions and adds some useful information, such as whether it's an LTS version. So what does that look like?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;  databricks clusters spark-versions 
    | from json 
    | get versions 
    | insert isLTS &lt;span class="o"&gt;{&lt;/span&gt; get name | str contains &lt;span class="s2"&gt;"LTS"&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt; 
    | insert isML &lt;span class="o"&gt;{&lt;/span&gt; get name | str contains &lt;span class="s2"&gt;"ML"&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt; 
    | insert photonEnabled &lt;span class="o"&gt;{&lt;/span&gt; get name | str contains &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s2"&gt;"Photon"&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt;
    | insert details &lt;span class="o"&gt;{&lt;/span&gt; get name | parse &lt;span class="s2"&gt;"{runtime} (includes Apache Spark {spark},{remainder}"&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt;
    | insert runtime &lt;span class="o"&gt;{&lt;/span&gt; get details.runtime &lt;span class="o"&gt;}&lt;/span&gt; 
    | insert spark &lt;span class="o"&gt;{&lt;/span&gt; get details.spark &lt;span class="o"&gt;}&lt;/span&gt;
    | reject details
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I've put that over multiple lines to make it easier to read, but if you want to run it you'll need to have it all on the same line, like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; databricks clusters spark-versions | from json | get versions | insert isLTS &lt;span class="o"&gt;{&lt;/span&gt; get name | str contains &lt;span class="s2"&gt;"LTS"&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt; | insert isML &lt;span class="o"&gt;{&lt;/span&gt; get name | str contains &lt;span class="s2"&gt;"ML"&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt; | insert photonEnabled &lt;span class="o"&gt;{&lt;/span&gt; get name | str contains &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s2"&gt;"Photon"&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt; | insert details &lt;span class="o"&gt;{&lt;/span&gt; get name | parse &lt;span class="s2"&gt;"{runtime} (includes Apache Spark {spark},{remainder}"&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt; | insert runtime &lt;span class="o"&gt;{&lt;/span&gt; get details.runtime &lt;span class="o"&gt;}&lt;/span&gt; | insert spark &lt;span class="o"&gt;{&lt;/span&gt; get details.spark &lt;span class="o"&gt;}&lt;/span&gt; | reject details
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So what's it doing? Let's break it down a bit.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;databricks clusters spark-versions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Runs the Databricks CLI to get the available runtime information&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Parses the response from JSON as a table&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;get versions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Gets the "version" part of the response object&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;insert isLTS { get name | str contains "LTS" }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Adds a new "isLTS" column by looking for the term "LTS" in the runtime name&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;insert isML { get name | str contains "ML" }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Adds a new "isML" column by looking for the term "ML" in the runtime name&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;insert photonEnabled { get name | str contains -i "Photon"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Adds a new "photonEnabled" column by doing a case-insensitive search for "Photon" in the runtime name&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;insert details { get name | parse "{runtime} (includes Apache Spark {spark},{remainder}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Adds a new "details" column by parsing the name name and extracting key information (in curly braces)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;insert runtime { get details.runtime }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Adds a new "runtime" column by getting the runtime information from the details column&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;insert spark { get details.spark }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Adds a new "spark" column by getting the spark version information from the details column&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;reject detail
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Removes the "details" column&lt;/p&gt;

&lt;p&gt;That's a lot of commands to run each time, so let's instead save this as a command alias in our config file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="py"&gt;startup&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s"&gt;"alias dbx-runtimes = ( databricks clusters spark-versions | from json | get versions | insert isLTS { get name | str contains &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;LTS&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt; } | insert isML { get name | str contains &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;ML&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt; } | insert photonEnabled { get name | str contains -i &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;Photon&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt; } | insert details { get name | parse &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;{runtime} (includes Apache Spark {spark},{remainder}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt; } | insert runtime { get details.runtime } | insert spark { get details.spark } | reject details )"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here I've aliased the command with the name &lt;code&gt;dbx-runtimes&lt;/code&gt;. I've also had to escape the double-quotation marks. But now that we have this we can run all of the above by simply calling the alias.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; dbx-runtimes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;────┬──────────────────────────────────┬────────────────────────────────────────────────────────────────────┬───────┬───────┬───────────────┬────────────────────────────┬───────
 #  │               key                │                                name                                │ isLTS │ isML  │ photonEnabled │          runtime           │ spark
────┼──────────────────────────────────┼────────────────────────────────────────────────────────────────────┼───────┼───────┼───────────────┼────────────────────────────┼───────
  0 │ 6.4.x-esr-scala2.11              │ 6.4 Extended Support (includes Apache Spark 2.4.5, Scala 2.11)     │ false │ false │ false         │ 6.4 Extended Support       │ 2.4.5
  1 │ 7.3.x-cpu-ml-scala2.12           │ 7.3 LTS ML (includes Apache Spark 3.0.1, Scala 2.12)               │ true  │ true  │ false         │ 7.3 LTS ML                 │ 3.0.1
  2 │ 7.3.x-hls-scala2.12              │ 7.3 LTS Genomics (includes Apache Spark 3.0.1, Scala 2.12)         │ true  │ false │ false         │ 7.3 LTS Genomics           │ 3.0.1
  3 │ 10.2.x-gpu-ml-scala2.12          │ 10.2 ML (includes Apache Spark 3.2.0, GPU, Scala 2.12)             │ false │ true  │ false         │ 10.2 ML                    │ 3.2.0
  4 │ 7.3.x-gpu-ml-scala2.12           │ 7.3 LTS ML (includes Apache Spark 3.0.1, GPU, Scala 2.12)          │ true  │ true  │ false         │ 7.3 LTS ML                 │ 3.0.1
  5 │ 8.4.x-photon-scala2.12           │ 8.4 Photon (includes Apache Spark 3.1.2, Scala 2.12)               │ false │ false │ true          │ 8.4 Photon                 │ 3.1.2
  6 │ 10.1.x-photon-scala2.12          │ 10.1 Photon (includes Apache Spark 3.2.0, Scala 2.12)              │ false │ false │ true          │ 10.1 Photon                │ 3.2.0
  7 │ 9.1.x-photon-scala2.12           │ 9.1 LTS Photon (includes Apache Spark 3.1.2, Scala 2.12)           │ true  │ false │ true          │ 9.1 LTS Photon             │ 3.1.2
  8 │ 10.2.x-photon-scala2.12          │ 10.2 Photon (includes Apache Spark 3.2.0, Scala 2.12)              │ false │ false │ true          │ 10.2 Photon                │ 3.2.0
  9 │ 8.3.x-scala2.12                  │ 8.3 (includes Apache Spark 3.1.1, Scala 2.12)                      │ false │ false │ false         │ 8.3                        │ 3.1.1
 10 │ 9.0.x-photon-scala2.12           │ 9.0 Photon (includes Apache Spark 3.1.2, Scala 2.12)               │ false │ false │ true          │ 9.0 Photon                 │ 3.1.2
 11 │ 8.4.x-cpu-ml-scala2.12           │ 8.4 ML (includes Apache Spark 3.1.2, Scala 2.12)                   │ false │ true  │ false         │ 8.4 ML                     │ 3.1.2
 12 │ 10.1.x-gpu-ml-scala2.12          │ 10.1 ML (includes Apache Spark 3.2.0, GPU, Scala 2.12)             │ false │ true  │ false         │ 10.1 ML                    │ 3.2.0
 13 │ 9.1.x-scala2.12                  │ 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12)                  │ true  │ false │ false         │ 9.1 LTS                    │ 3.1.2
 14 │ 10.0.x-cpu-ml-scala2.12          │ 10.0 ML (includes Apache Spark 3.2.0, Scala 2.12)                  │ false │ true  │ false         │ 10.0 ML                    │ 3.2.0
 15 │ 9.0.x-gpu-ml-scala2.12           │ 9.0 ML (includes Apache Spark 3.1.2, GPU, Scala 2.12)              │ false │ true  │ false         │ 9.0 ML                     │ 3.1.2
 16 │ 9.0.x-scala2.12                  │ 9.0 (includes Apache Spark 3.1.2, Scala 2.12)                      │ false │ false │ false         │ 9.0                        │ 3.1.2
 17 │ 8.3.x-cpu-ml-scala2.12           │ 8.3 ML (includes Apache Spark 3.1.1, Scala 2.12)                   │ false │ true  │ false         │ 8.3 ML                     │ 3.1.1
 18 │ 10.1.x-cpu-ml-scala2.12          │ 10.1 ML (includes Apache Spark 3.2.0, Scala 2.12)                  │ false │ true  │ false         │ 10.1 ML                    │ 3.2.0
 19 │ 10.0.x-scala2.12                 │ 10.0 (includes Apache Spark 3.2.0, Scala 2.12)                     │ false │ false │ false         │ 10.0                       │ 3.2.0
 20 │ apache-spark-2.4.x-esr-scala2.11 │ Light 2.4 Extended Support (includes Apache Spark 2.4, Scala 2.11) │ false │ false │ false         │ Light 2.4 Extended Support │ 2.4
 21 │ 10.1.x-scala2.12                 │ 10.1 (includes Apache Spark 3.2.0, Scala 2.12)                     │ false │ false │ false         │ 10.1                       │ 3.2.0
 22 │ 9.1.x-cpu-ml-scala2.12           │ 9.1 LTS ML (includes Apache Spark 3.1.2, Scala 2.12)               │ true  │ true  │ false         │ 9.1 LTS ML                 │ 3.1.2
 23 │ 10.2.x-scala2.12                 │ 10.2 (includes Apache Spark 3.2.0, Scala 2.12)                     │ false │ false │ false         │ 10.2                       │ 3.2.0
 24 │ 10.2.x-cpu-ml-scala2.12          │ 10.2 ML (includes Apache Spark 3.2.0, Scala 2.12)                  │ false │ true  │ false         │ 10.2 ML                    │ 3.2.0
 25 │ 8.3.x-photon-scala2.12           │ 8.3 Photon (includes Apache Spark 3.1.1, Scala 2.12)               │ false │ false │ true          │ 8.3 Photon                 │ 3.1.1
 26 │ 10.0.x-photon-scala2.12          │ 10.0 Photon (includes Apache Spark 3.2.0, Scala 2.12)              │ false │ false │ true          │ 10.0 Photon                │ 3.2.0
 27 │ 10.0.x-gpu-ml-scala2.12          │ 10.0 ML (includes Apache Spark 3.2.0, GPU, Scala 2.12)             │ false │ true  │ false         │ 10.0 ML                    │ 3.2.0
 28 │ 8.4.x-scala2.12                  │ 8.4 (includes Apache Spark 3.1.2, Scala 2.12)                      │ false │ false │ false         │ 8.4                        │ 3.1.2
 29 │ 9.1.x-gpu-ml-scala2.12           │ 9.1 LTS ML (includes Apache Spark 3.1.2, GPU, Scala 2.12)          │ true  │ true  │ false         │ 9.1 LTS ML                 │ 3.1.2
 30 │ apache-spark-2.4.x-scala2.11     │ Light 2.4 (includes Apache Spark 2.4, Scala 2.11)                  │ false │ false │ false         │ Light 2.4                  │ 2.4
 31 │ 7.3.x-scala2.12                  │ 7.3 LTS (includes Apache Spark 3.0.1, Scala 2.12)                  │ true  │ false │ false         │ 7.3 LTS                    │ 3.0.1
 32 │ 8.4.x-gpu-ml-scala2.12           │ 8.4 ML (includes Apache Spark 3.1.2, GPU, Scala 2.12)              │ false │ true  │ false         │ 8.4 ML                     │ 3.1.2
 33 │ 9.0.x-cpu-ml-scala2.12           │ 9.0 ML (includes Apache Spark 3.1.2, Scala 2.12)                   │ false │ true  │ false         │ 9.0 ML                     │ 3.1.2
 34 │ 8.3.x-gpu-ml-scala2.12           │ 8.3 ML (includes Apache Spark 3.1.1, GPU, Scala 2.12)              │ false │ true  │ false         │ 8.3 ML                     │ 3.1.1
────┴──────────────────────────────────┴────────────────────────────────────────────────────────────────────┴───────┴───────┴───────────────┴────────────────────────────┴───────
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your output might look different depending on when you run the command.&lt;/p&gt;

&lt;p&gt;But from this we can now start adding in some filters to get to the records we want. So if I want to find all of the runtimes which are Long Term Support but aren't ML instances I can do the following.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; dbx-runtimes | where isLTS | where isML &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="nv"&gt;$false&lt;/span&gt; | sort-by key
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;───┬────────────────────────┬────────────────────────────────────────────────────────────┬───────┬───────┬───────────────┬──────────────────┬───────
 # │          key           │                            name                            │ isLTS │ isML  │ photonEnabled │     runtime      │ spark
───┼────────────────────────┼────────────────────────────────────────────────────────────┼───────┼───────┼───────────────┼──────────────────┼───────
 0 │ 7.3.x-hls-scala2.12    │ 7.3 LTS Genomics (includes Apache Spark 3.0.1, Scala 2.12) │ true  │ false │ false         │ 7.3 LTS Genomics │ 3.0.1
 1 │ 7.3.x-scala2.12        │ 7.3 LTS (includes Apache Spark 3.0.1, Scala 2.12)          │ true  │ false │ false         │ 7.3 LTS          │ 3.0.1
 2 │ 9.1.x-photon-scala2.12 │ 9.1 LTS Photon (includes Apache Spark 3.1.2, Scala 2.12)   │ true  │ false │ true          │ 9.1 LTS Photon   │ 3.1.2
 3 │ 9.1.x-scala2.12        │ 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12)          │ true  │ false │ false         │ 9.1 LTS          │ 3.1.2
───┴────────────────────────┴────────────────────────────────────────────────────────────┴───────┴───────┴───────────────┴──────────────────┴───────
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A lot simpler to read, and very easy to work with now. And if I want to save the results I could just add &lt;code&gt;| save runtimes.csv&lt;/code&gt; and I'll have a CSV file with the same data in it.&lt;/p&gt;

&lt;p&gt;I've done the same with the Databricks cluster node types as well. That one is a lot less complex than the above, but it makes being able to query for the information a lot simpler. And with Nushell providing great features for filtering, displaying, and getting data, it's a smooth and easy workflow.&lt;/p&gt;
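As a sketch of what the node types version might look like, shown over multiple lines for readability like the earlier command. The `list-node-types` subcommand is part of the Databricks CLI, but the column names here (`node_type_id`, `memory_mb`, `num_cores`) are assumptions from memory, so check them against your CLI's actual output first.

```nu
# Sketch: list node types and keep just a few useful columns
databricks clusters list-node-types
    | from json
    | get node_types
    | select node_type_id memory_mb num_cores
```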

</description>
      <category>nushell</category>
      <category>databricks</category>
      <category>commandline</category>
      <category>nu</category>
    </item>
    <item>
      <title>Azure Bicep - Deploy Function Apps with KeyVault references</title>
      <dc:creator>Darren Fuller</dc:creator>
      <pubDate>Fri, 31 Dec 2021 15:55:00 +0000</pubDate>
      <link>https://dev.to/dazfuller/azure-bicep-deploy-function-apps-with-keyvault-references-36o1</link>
      <guid>https://dev.to/dazfuller/azure-bicep-deploy-function-apps-with-keyvault-references-36o1</guid>
      <description>&lt;p&gt;One of the things you often yourself doing when starting a new project, module, or component, is having to define the infrastructure that it is going to run on. In the world of Azure often what you find is that many will just drop into the portal and create the items that they need directly, this obviously isn't good as a repeatable pattern but works for throw-away tests. Others might be braver and fire up the console to use the &lt;a href="https://docs.microsoft.com/cli/azure/" rel="noopener noreferrer"&gt;Azure CLI&lt;/a&gt; tool, much better for repeating steps if you remember to script it.&lt;/p&gt;

&lt;p&gt;Better yet are those who take that little extra time and generate the environment using templates. Repeatable, consistent, and they give the option of parametrising them so that you can change the scale of components as they're deployed to different environments.&lt;/p&gt;

&lt;p&gt;What I want to do in this post is show doing exactly this, by deploying the following components, but also by setting up managed identities, adding RBAC permissions in, and hooking them up all from the same template. By putting in that little bit extra you can go from zero to fully configured platform in minutes, in a consistent and repeatable way.&lt;/p&gt;

&lt;p&gt;I'm going to do this using &lt;a href="https://docs.microsoft.com/azure/azure-resource-manager/bicep" rel="noopener noreferrer"&gt;Bicep&lt;/a&gt;, which is Microsoft's DSL for deploying resources to Azure. Don't be fooled by the 0.4 version number; this is a more than capable language which should take centre stage in your deployment scripts. Unlike ARM templates before it, Bicep also supports modules, allowing you to break apart your templates without needing a public end-point to bring them all back together again. But for this post we're only going to need a single template.&lt;/p&gt;

&lt;h2&gt;
  
  
  The infrastructure
&lt;/h2&gt;

&lt;p&gt;So what are we deploying? Well for this post the core of the platform is going to be the Azure Function app, but it's going to need some support. We're also going to assume that the platform is going to collect data from an external service, so the secret we want will be an API key passed in to the template. So the full platform will be:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn1hk8r5en2zgtum99k0x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn1hk8r5en2zgtum99k0x.png" alt="Overview of Infrastructure"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Azure Function App with System Assigned managed identity and app settings for:

&lt;ul&gt;
&lt;li&gt;API Key from KeyVault using KeyVault references&lt;/li&gt;
&lt;li&gt;Storage account name&lt;/li&gt;
&lt;li&gt;Container name&lt;/li&gt;
&lt;li&gt;Key from Application Insights&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Azure Storage Account with container for future use

&lt;ul&gt;
&lt;li&gt;Also with data contributor permissions assigned to the Function App at the storage account level&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Azure KeyVault

&lt;ul&gt;
&lt;li&gt;Deployed using RBAC for authentication instead of using Access Control Lists&lt;/li&gt;
&lt;li&gt;With secret for the API key&lt;/li&gt;
&lt;li&gt;Secret User permissions assigned to the Function App&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Application Insights&lt;/li&gt;

&lt;li&gt;Azure Log Analytics to back the Application Insights instance&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;And whilst we're at it, let's make sure we add some basic security steps, like allowing HTTPS-only traffic and setting TLS 1.2 as the minimum permitted TLS version.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating the template
&lt;/h2&gt;

&lt;p&gt;The best way to get started with Azure Bicep is to use &lt;a href="https://code.visualstudio.com" rel="noopener noreferrer"&gt;Visual Studio Code&lt;/a&gt; with the Bicep extension. The extension's language server support, giving us highlighting, template outlines, auto-complete and more, is the real nice-to-have feature.&lt;/p&gt;

&lt;p&gt;The template is available in full on &lt;a href="https://gist.github.com/dazfuller/7a61bf3dbf7be1b977c9fa54e7cdf17d" rel="noopener noreferrer"&gt;Github&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Parameters
&lt;/h3&gt;

&lt;p&gt;For this demo template I want to make 3 things configurable.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The prefix for all of the resources (but defaulted to "demo")&lt;/li&gt;
&lt;li&gt;The Storage Account SKU (defaulted to Standard LRS)&lt;/li&gt;
&lt;li&gt;The API key for the external service&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkc36ilt7d65ge4bz8g1p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkc36ilt7d65ge4bz8g1p.png" alt="Template parameters"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this we define the prefix as a string, restrict its length to a minimum of 3 and a maximum of 6 characters, and default it to "demo". We give the storage account SKU a set of permitted values and default it to "Standard_LRS". Finally, we define the API key as a string, but annotate it as a secure parameter. Making this a secure parameter means we won't accidentally leak the value to anyone who can see the template deployment history.&lt;/p&gt;
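As a sketch, the parameter declarations described above might look like the following in Bicep. The prefix constraints and defaults come from the text; the exact set of allowed SKU values is an assumption, the real list is in the linked template.

```bicep
@minLength(3)
@maxLength(6)
param prefix string = 'demo'

// The allowed values shown here are illustrative; the full template defines the complete set
@allowed([
  'Standard_LRS'
  'Standard_GRS'
  'Standard_ZRS'
])
param storageSkuName string = 'Standard_LRS'

// Secure parameters are masked in the deployment history
@secure()
param apiKey string
```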

&lt;h3&gt;
  
  
  Variables
&lt;/h3&gt;

&lt;p&gt;Next up are the variables. The bulk of these are defining things like locations and names. I'm using a naming convention of &lt;code&gt;&amp;lt;prefix&amp;gt;-&amp;lt;service name&amp;gt;-&amp;lt;unique id&amp;gt;&lt;/code&gt;. This lets multiple instances be deployed without (hopefully) hitting any naming conflict issues. The only variation in this is the storage account name as we can't use hyphens, only alpha-numeric characters.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fin45hrw9u8p6yg3issui.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fin45hrw9u8p6yg3issui.png" alt="Template variables"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;storageConnectionString&lt;/code&gt; is used later on in the template and is only defined here so I don't repeat myself later.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;storageBlobDataContributorRole&lt;/code&gt; and &lt;code&gt;keyVaultSecretsUserRole&lt;/code&gt; variables hold the role definition values for the built-in &lt;code&gt;Storage Blob Data Contributor&lt;/code&gt; and &lt;code&gt;Key Vault Secrets User&lt;/code&gt; roles, which you can find in the &lt;a href="https://docs.microsoft.com/azure/role-based-access-control/built-in-roles" rel="noopener noreferrer"&gt;Azure documentation&lt;/a&gt;.&lt;/p&gt;
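To illustrate, the variables might be declared along these lines. The names and exact values here are assumptions (the real ones are in the Gist); the role definition GUIDs are the well-known built-in role ids from the Azure documentation.

```bicep
var location = resourceGroup().location
var uniqueId = uniqueString(resourceGroup().id)

// <prefix>-<service name>-<unique id>; storage drops the hyphens as they aren't permitted
var storageAccountName = '${prefix}data${uniqueId}'
var functionAppName = '${prefix}-func-${uniqueId}'
var keyVaultName = '${prefix}-kv-${uniqueId}'

// Full resource ids for the built-in roles, used later in the role assignments
var storageBlobDataContributorRole = subscriptionResourceId('Microsoft.Authorization/roleDefinitions', 'ba92f5b4-2d11-453d-a403-e96b0029c9fe')
var keyVaultSecretsUserRole = subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '4633458b-17de-408a-b874-0445c86b69e6')
```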

&lt;h3&gt;
  
  
  Resources
&lt;/h3&gt;

&lt;p&gt;Right, on to the fun stuff. We've got our parameters and variables, now it's time to put them to good use.&lt;/p&gt;

&lt;h4&gt;
  
  
  Storage account
&lt;/h4&gt;

&lt;p&gt;Azure Storage is typically the beating heart of any platform, and so it's normally the first resource I start with.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

resource storage 'Microsoft.Storage/storageAccounts@2021-06-01' = {
  name: storageAccountName
  location: location
  sku: {
    name: storageSkuName
  }
  kind: 'StorageV2'
  properties: {
    supportsHttpsTrafficOnly: true
    minimumTlsVersion: 'TLS1_2'
    encryption: {
      keySource: 'Microsoft.Storage'
      services: {
        blob: {
          enabled: true
        }
        file: {
          enabled: true
        }
        queue: {
          enabled: true
        }
        table: {
          enabled: true
        }
      }
    }
  }

  resource blobService 'blobServices' = {
    name: 'default'

    resource content 'containers' = {
      name: 'content'
    }
  }
}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Out of the box here we're setting the minimum TLS version to 1.2 and enabling HTTPS-only traffic. There really should be very few reasons why these are not in every template we produce. We're also turning on encryption of data in all storage services by default.&lt;/p&gt;

&lt;p&gt;We also have a couple of nested resources, which is how we create our containers. In this case I'm creating a container called "content".&lt;/p&gt;

&lt;h4&gt;
  
  
  Log Analytics and Application Insights
&lt;/h4&gt;

&lt;p&gt;Next up we want to deploy an Application Insights instance which will be used by our Azure Function app. It's a couple of years away at the time of writing, but Microsoft will be requiring that all instances are backed by a Log Analytics workspace. This makes a huge amount of sense really as it gives a single place to monitor your application and generate alerts based on health conditions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

resource logAnalytics 'Microsoft.OperationalInsights/workspaces@2021-06-01' = {
  name: logAnalyticsName
  location: location
  properties: {
    sku: {
      name: 'PerGB2018'
    }
    retentionInDays: 30
    features: {
      enableLogAccessUsingOnlyResourcePermissions: true
    }
    workspaceCapping: {
      dailyQuotaGb: 1
    }
    publicNetworkAccessForIngestion: 'Enabled'
    publicNetworkAccessForQuery: 'Enabled'
  }
}

resource appInsights 'Microsoft.Insights/components@2020-02-02' = {
  name: appInsightsName
  location: location
  kind: 'web'
  properties: {
    Application_Type: 'web'
    publicNetworkAccessForIngestion: 'Enabled'
    publicNetworkAccessForQuery: 'Enabled'
    WorkspaceResourceId: logAnalytics.id
  }
}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If you're familiar with ARM templates you might notice the lack of &lt;code&gt;dependsOn&lt;/code&gt; entries. This is because in Bicep, using a resource reference to get a value (e.g. &lt;code&gt;logAnalytics.id&lt;/code&gt;) adds an automatic dependency for us. The use of the &lt;code&gt;id&lt;/code&gt; here sets the Log Analytics workspace as the backing store for the Application Insights instance. This is a pretty simple setup of the services so I won't dive in any further at this point.&lt;/p&gt;

&lt;h4&gt;
  
  
  KeyVault
&lt;/h4&gt;

&lt;p&gt;Okay, next up, KeyVault. This is core to accessing secrets on the platform in a secure manner. One of the things we're going to do here is enable RBAC permissions instead of using Access Control Lists (ACLs). Normally when using ACLs we would give dependent services Get and List permissions over secrets; with RBAC we can instead use the &lt;code&gt;Key Vault Secrets User&lt;/code&gt; built-in role. This makes the management of secrets a lot easier and means we can view permissions in a single place.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

resource kv 'Microsoft.KeyVault/vaults@2021-06-01-preview' = {
  name: keyVaultName
  location: location
  properties: {
    sku: {
      family: 'A'
      name: 'standard'
    }
    tenantId: subscription().tenantId
    enableRbacAuthorization: true
    enabledForDeployment: false
    enabledForDiskEncryption: true
    enabledForTemplateDeployment: false
  }

  resource storageNameSecret 'secrets' = {
    name: 'ExternalServiceApiKey'
    properties: {
      contentType: 'text/plain'
      value: apiKey
    }
  }
}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;enableRbacAuthorization&lt;/code&gt; setting is the key to enabling RBAC permissions; otherwise we're setting up a fairly standard instance. We also create a new secret as a nested resource using our &lt;em&gt;secure&lt;/em&gt; API key parameter. Maybe pick a better secret name than I have, but it works for a demo.&lt;/p&gt;

&lt;h4&gt;
  
  
  Function app
&lt;/h4&gt;

&lt;p&gt;Last up for resources is the Azure Function application.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

resource plan 'Microsoft.Web/serverfarms@2021-02-01' = {
  name: appServicePlanName
  location: location
  kind: 'functionapp'
  sku: {
    name: 'Y1'
  }
  properties: {
  }
}

resource funcApp 'Microsoft.Web/sites@2021-02-01' = {
  name: functionAppName
  location: location
  kind: 'functionapp'
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    httpsOnly: true
    serverFarmId: plan.id
    siteConfig: {
      ftpsState: 'Disabled'
      minTlsVersion: '1.2'
      netFrameworkVersion: 'v6.0'
      appSettings: [
        {
          name: 'AzureWebJobsStorage'
          value: storageConnectionString
        }
        {
          name: 'WEBSITE_CONTENTAZUREFILECONNECTIONSTRING'
          value: storageConnectionString
        }
        {
          name: 'APPINSIGHTS_INSTRUMENTATIONKEY'
          value: appInsights.properties.InstrumentationKey
        }
        {
          name: 'FUNCTIONS_WORKER_RUNTIME'
          value: 'dotnet'
        }
        {
          name: 'FUNCTIONS_EXTENSION_VERSION'
          value: '~4'
        }
        {
          name: 'ApiKey'
          value: '@Microsoft.KeyVault(VaultName=${kv.name};SecretName=${kv::storageNameSecret.name})'
        }
        {
          name: 'ContentStorageAccount'
          value: storage.name
        }
        {
          name: 'ContentContainer'
          value: storage::blobService::content.name
        }
      ]
    }
  }
}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This is deployed under a consumption plan (mostly to keep the template a bit simpler), but it's the &lt;code&gt;appSettings&lt;/code&gt; where things start to get more interesting.&lt;/p&gt;

&lt;p&gt;In here we're creating the &lt;code&gt;AzureWebJobsStorage&lt;/code&gt; and &lt;code&gt;WEBSITE_CONTENTAZUREFILECONNECTIONSTRING&lt;/code&gt; settings using the connection string we defined earlier in the variables section.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;APPINSIGHTS_INSTRUMENTATIONKEY&lt;/code&gt; setting is being set using the instrumentation key of the yet-to-be-deployed Application Insights resource which we're accessing just using dot notation.&lt;/p&gt;

&lt;p&gt;For &lt;code&gt;ApiKey&lt;/code&gt; we're creating our KeyVault reference. We could do this using the URL format, but I'm using this format as I think it's easier to read. I'm also not pinning the secret version as I want to get the latest value each time. Because the secret itself is a nested resource we have to use the double-colon notation &lt;code&gt;::&lt;/code&gt; to access its values.&lt;/p&gt;
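For comparison, the URI-based form of the same app setting would look something like this (a sketch, assuming the secret's `secretUri` output property; omitting the version again means the latest value is always picked up):

```bicep
{
  name: 'ApiKey'
  value: '@Microsoft.KeyVault(SecretUri=${kv::storageNameSecret.properties.secretUri})'
}
```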

&lt;h4&gt;
  
  
  Setting up permissions
&lt;/h4&gt;

&lt;p&gt;Having the KeyVault reference is great, but if we deploy this now it will fail. This is because the Function App doesn't have permissions to read secrets, so let's set that up as well.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

resource storageFunctionAppPermissions 'Microsoft.Authorization/roleAssignments@2020-04-01-preview' = {
  name: guid(storage.id, funcApp.name, storageBlobDataContributorRole)
  scope: storage
  properties: {
    principalId: funcApp.identity.principalId
    principalType: 'ServicePrincipal'
    roleDefinitionId: storageBlobDataContributorRole
  }
}

resource kvFunctionAppPermissions 'Microsoft.Authorization/roleAssignments@2020-04-01-preview' = {
  name: guid(kv.id, funcApp.name, keyVaultSecretsUserRole)
  scope: kv
  properties: {
    principalId: funcApp.identity.principalId
    principalType: 'ServicePrincipal'
    roleDefinitionId: keyVaultSecretsUserRole
  }
}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;There's a lot happening here, so let's break it down a bit.&lt;/p&gt;

&lt;p&gt;We have 2 &lt;code&gt;roleAssignments&lt;/code&gt; resources here. The first is assigning permissions to the storage account so that later on (if we want) the function app can read and write data to the storage account. This isn't needed for our example, but I added it for demonstration purposes. The second role assignment assigns secret user permissions to the Azure Function app.&lt;/p&gt;

&lt;p&gt;Inside both we're creating a name by generating a GUID. This has to be something which is known at deployment time, so we can't use values like the principal id as that's only known after the resource has deployed. Which values you provide is largely up to you, but I like the format of &lt;code&gt;target/source/role&lt;/code&gt; as it makes sense when reading it back later.&lt;/p&gt;

&lt;p&gt;In the properties we have to say which principal needs access, what role we're giving it, and that its type is &lt;code&gt;ServicePrincipal&lt;/code&gt; (for our managed identities). This last point is important: without it the deployment will work but might occasionally throw a &lt;code&gt;PrincipalNotFound&lt;/code&gt; error, as the identity might not have been created before we try assigning permissions. Adding this property makes the deployment wait for the identity to be created first.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deploying
&lt;/h2&gt;

&lt;p&gt;This is the easy part of the process. Assuming you've installed the Azure CLI and have Bicep installed, you can simply run the following (changing the location and resource group name to suit you).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

az group create &lt;span class="nt"&gt;--name&lt;/span&gt; func-app-demo &lt;span class="nt"&gt;--location&lt;/span&gt; northeurope

az deployment group create &lt;span class="nt"&gt;--resource-group&lt;/span&gt; func-app-demo &lt;span class="nt"&gt;--template-file&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt;&lt;span class="nt"&gt;-app&lt;/span&gt;.bicep &lt;span class="nt"&gt;--parameters&lt;/span&gt; &lt;span class="nv"&gt;apiKey&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;abc123


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The above command skips the storage account SKU and resource prefix so the defaults are used.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdaq7q3scdqhcavyxn35a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdaq7q3scdqhcavyxn35a.png" alt="Deployed resources"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can check that permissions have been assigned by looking in the access control section of the KeyVault instance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F13bp8p7jo1vds4z2uru4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F13bp8p7jo1vds4z2uru4.png" alt="Assigned permissions"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And we can see the KeyVault reference in the Azure Function configuration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feu6ycpgv2xjemif4gqzg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feu6ycpgv2xjemif4gqzg.png" alt="KeyVault reference"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And that's it. We can now deploy our functions and make use of the app settings already created. And when we're ready we can take this template and deploy out to all of our other environments.&lt;/p&gt;

&lt;p&gt;If you're intrigued, you can always see what the ARM equivalent of the template would look like by running the following.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

az bicep build &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt;&lt;span class="nt"&gt;-app&lt;/span&gt;.bicep


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This will generate a &lt;code&gt;function-app.json&lt;/code&gt; ARM template file in which you can see all of the dependencies added in, and what shortcuts like &lt;code&gt;listKeys()&lt;/code&gt; have done for us. I'm sure you'll agree that the Bicep is much nicer to read and maintain.&lt;/p&gt;

</description>
      <category>azure</category>
      <category>bicep</category>
      <category>infrastructure</category>
    </item>
    <item>
      <title>Databricks and PyODBC - Avoiding another MS repo outage</title>
      <dc:creator>Darren Fuller</dc:creator>
      <pubDate>Sat, 10 Jul 2021 10:21:32 +0000</pubDate>
      <link>https://dev.to/dazfuller/databricks-and-pyodbc-avoiding-another-ms-repo-outage-4oea</link>
      <guid>https://dev.to/dazfuller/databricks-and-pyodbc-avoiding-another-ms-repo-outage-4oea</guid>
      <description>&lt;p&gt;Not so long ago Microsoft suffered an outage with their package repo over at &lt;a href="https://packages.microsoft.com"&gt;https://packages.microsoft.com&lt;/a&gt;. The short version is that it looks as though the &lt;code&gt;pools&lt;/code&gt; directory was removed, forcing Microsoft to synchronize everything back out. Of course the outage started pushing traffic to other end-points which added bottlenecks and it was just a lot of pain for around 26 hours.&lt;/p&gt;

&lt;p&gt;Whilst this affected a lot of devs for a variety of projects, the one I was more affected by was the impact on Databricks environments which were pulling in the Linux ODBC driver for Microsoft SQL Server. The standard init script for doing this relies on installing the packages at the point the cluster is created. If the repo isn't available then the init script takes longer to run, fails, but the cluster still runs; it just breaks anything which needs the driver.&lt;/p&gt;

&lt;p&gt;I wanted to come up with a way to try and prevent this happening again. Whilst another outage is unlikely, adding some resiliency is never a bad thing.&lt;/p&gt;

&lt;p&gt;So I created a Databricks notebook which downloads the package and its dependencies from the Microsoft repo to the cluster. It downloads each required package and, once they have all completed successfully, swaps out the current set of files for the new ones. If any of the files can't be downloaded for any reason then the existing files are left intact. The init script uses these downloaded files to install the driver instead. In the event of another outage the local files are not replaced and the cluster will start up as normal with the previously downloaded packages.&lt;/p&gt;

&lt;p&gt;Each time the packages are successfully downloaded the script updates the init script, just in case any more packages have been added or the file locations have been changed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Download packages
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;previous&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;shutil&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rmtree&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;previous&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;shutil&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;move&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;previous&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;shutil&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;move&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Remove any newly downloaded files
&lt;/span&gt;    &lt;span class="n"&gt;shutil&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rmtree&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;next&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
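As a self-contained sketch of that swap logic (the directory names and function name here are illustrative, not the notebook's exact layout):

```python
import shutil
from pathlib import Path


def swap_in_new_packages(base: Path) -> None:
    """Replace the 'current' package set with a freshly downloaded one,
    keeping the old set in 'previous' as a fallback. On any failure the
    partial download is discarded and the existing files are left intact."""
    current, previous, new = base / "current", base / "previous", base / "new"
    try:
        # Drop the old fallback, then rotate current -> previous, new -> current
        if previous.exists():
            shutil.rmtree(previous)
        shutil.move(str(current), str(previous))
        shutil.move(str(new), str(current))
    except OSError:
        # Something went wrong mid-swap: remove the new download, keep what we had
        shutil.rmtree(new, ignore_errors=True)
```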



&lt;p&gt;The notebook is available on &lt;a href="https://github.com/dazfuller/databricks/tree/main/mssql-odbc-driver"&gt;GitHub&lt;/a&gt; as both an &lt;code&gt;ipynb&lt;/code&gt; and &lt;code&gt;dbc&lt;/code&gt; file. Hopefully someone else finds it useful.&lt;/p&gt;

</description>
      <category>databricks</category>
      <category>spark</category>
      <category>pyodbc</category>
    </item>
    <item>
      <title>Scratching an itch with Spark</title>
      <dc:creator>Darren Fuller</dc:creator>
      <pubDate>Sun, 27 Jun 2021 10:59:18 +0000</pubDate>
      <link>https://dev.to/dazfuller/scratching-an-itch-with-spark-207f</link>
      <guid>https://dev.to/dazfuller/scratching-an-itch-with-spark-207f</guid>
      <description>&lt;p&gt;One of the things I love about Spark is how easy it makes working with various data sources. Once you've got the data out of the source and into a Dataset or DataFrame then you can manipulate that data without thought to the original format. Most of the times this is in CSV, Parquet, JSON, or a few others. But occasionally (or quite a lot of the time it seems if you're me) people will have the bulk of their data in Excel.&lt;/p&gt;

&lt;p&gt;I'm not going to go into why Excel is a bad format for data storage and distribution, you can find plenty of reasons on-line, or you've likely encountered it yourself (Excel interpreting dates anyone?).&lt;/p&gt;

&lt;p&gt;I've been working on a custom data source for Spark recently using the newer DatasourceV2 APIs, which make it easier to write data sources without having to consider RDDs, whilst bringing along a lot of new functionality like predicate push-down.&lt;/p&gt;

&lt;p&gt;At the moment I'm still working through a few kinks, and ensuring that I've got as much code and branch coverage as I can, at which point I'll do a deeper dive into the how as I can't see a lot about how to write your own data sources on-line or off-line. But at the moment the data source supports the following features.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Handling Excel 97-2003, 2010, and OOXML files (thanks to Apache POI)&lt;/li&gt;
&lt;li&gt;Multi-line headers&lt;/li&gt;
&lt;li&gt;Reading data from multiple sheets matching a given regular expression&lt;/li&gt;
&lt;li&gt;Handle glob patterns for reading multiple files&lt;/li&gt;
&lt;li&gt;Outputting the sheet name in the result set (set via an option)&lt;/li&gt;
&lt;li&gt;Schema inference&lt;/li&gt;
&lt;li&gt;Cleaning up column names (removing invalid characters)&lt;/li&gt;
&lt;li&gt;Handle merged cells (repeats data to all cells in the merged region)&lt;/li&gt;
&lt;li&gt;Formula evaluation (for those supported by Apache POI)&lt;/li&gt;
&lt;li&gt;Long name, short name, and helper method for loading data&lt;/li&gt;
&lt;li&gt;Works in Scala, PySpark, and Spark SQL&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F78d0i6i5g2itjg3x9anu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F78d0i6i5g2itjg3x9anu.png" alt="Querying Excel using Spark-SQL" width="800" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This screenshot is querying a sample workbook I put together which contains formula data, I wanted to do this to make sure that the data was extracted as expected. And thanks to the Apache POI library, most formulae people use should be covered.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5q6qcb1nwcdr5y5mt63g.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5q6qcb1nwcdr5y5mt63g.jpg" alt="The Excel source data" width="740" height="511"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Code coverage is looking pretty good at the moment, but I want to make sure all of the key parts of the data source are covered, as there's a few "quirks" of Excel which I need to make sure are working as expected.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frmsdwy1ywi2b4f09xd3r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frmsdwy1ywi2b4f09xd3r.png" alt="Current code coverage" width="800" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Hopefully I can jump onto the deeper dive soon, just in case there's anyone else out there crazy enough to want to write their own data source.&lt;/p&gt;

</description>
      <category>scala</category>
      <category>spark</category>
      <category>excel</category>
    </item>
    <item>
      <title>Unit testing your PySpark library</title>
      <dc:creator>Darren Fuller</dc:creator>
      <pubDate>Sun, 28 Mar 2021 10:39:49 +0000</pubDate>
      <link>https://dev.to/dazfuller/unit-testing-your-pyspark-library-e5j</link>
      <guid>https://dev.to/dazfuller/unit-testing-your-pyspark-library-e5j</guid>
      <description>&lt;p&gt;In software development we often unit test our code (hopefully). And code written for &lt;a href="https://spark.apache.org/" rel="noopener noreferrer"&gt;Spark&lt;/a&gt; is no different. So here I want to run through an example of building a small library using &lt;a href="https://spark.apache.org/docs/latest/api/python/index.html" rel="noopener noreferrer"&gt;PySpark&lt;/a&gt; and unit testing it. I'm using &lt;a href="https://code.visualstudio.com/" rel="noopener noreferrer"&gt;Visual Studio Code&lt;/a&gt; as my editor here, mostly because I think it's brilliant, but other editors are available.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building the demo library
&lt;/h2&gt;

&lt;p&gt;I really wanted to avoid the usual "let's add two numbers together" example and use something a little closer to what you might come across day-to-day. So, I'm going to create a library with some methods around creating a date dimension.&lt;/p&gt;

&lt;p&gt;A date dimension is something you will tend to come across in any Data Warehouse or Data LakeHouse environment where you're working with dates. And really that's pretty much everywhere! This kind of dimension is typically created to contain information about the days of a year which users might want to slice-and-dice data with later on. If you can join your fact tables to a date dimension on a date key, then viewing all data for a given month, quarter, last week, or over weekends gets much easier.&lt;/p&gt;

&lt;p&gt;I'm going to stick with a fairly simple date dimension for brevity, so it will just include the key, basic information about the day, month, and year, and the ISO week number.&lt;/p&gt;

&lt;p&gt;The ISO week number is the kicker here, and it's better explained over on &lt;a href="https://en.wikipedia.org/wiki/ISO_week_date" rel="noopener noreferrer"&gt;Wikipedia&lt;/a&gt; but it gets interesting because, contrary to popular belief, there aren't always 52 weeks in a year (dates are always so much fun). This isn't something for which a function in Spark exists, but fortunately this is a piece of functionality we can get straight out of python, as the datetime class has an &lt;a href="https://docs.python.org/3/library/datetime.html#datetime.date.isocalendar" rel="noopener noreferrer"&gt;isocalendar&lt;/a&gt; function which gives us the ISO values back as a tuple.&lt;/p&gt;
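As a quick illustration of that tuple (a minimal standalone example, not part of the library being built here):

```python
from datetime import date

# isocalendar() returns (ISO year, ISO week number, ISO weekday).
# 1 Jan 2021 is a Friday and belongs to week 53 of ISO year 2020,
# since ISO week 1 of 2021 only starts on Monday 4 Jan.
iso_year, iso_week, iso_weekday = date(2021, 1, 1).isocalendar()
print(iso_year, iso_week, iso_weekday)  # 2020 53 5
```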

&lt;p&gt;But... We can't just call Python code directly from PySpark (well, we can do, but we shouldn't because it's really bad for performance). So we're going to need to create a user defined function.&lt;/p&gt;

&lt;p&gt;PySpark has gotten a lot easier to work with in recent years, and that's especially true of UDFs with the arrival of &lt;a href="https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.functions.pandas_udf.html" rel="noopener noreferrer"&gt;Pandas UDFs&lt;/a&gt;. If you want to know why then I suggest you read the docs as it's a bit long winded for here, but it means we can write User Defined Functions in Python which are significantly more performant. Still not quite as good as Scala, but when you think about what it needs to do under the hood, they're very impressive.&lt;/p&gt;

&lt;h2&gt;
  
  
  The ISO week number function
&lt;/h2&gt;

&lt;p&gt;Right, let's dive into the code for the User Defined Function then.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_iso_weeknumber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Union&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nf"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isocalendar&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;


&lt;span class="nd"&gt;@pandas_udf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;IntegerType&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;iso_week_of_year&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Series&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Series&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;get_iso_weeknumber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isnull&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NA&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;I've deliberately broken it into 2 functions here, and I probably could have used some better names, but meh. The reason for breaking it down is testing. Where possible we want to test custom logic outside of Spark so we don't need to spin up a new session; this means we can write a lot of tests and execute them quickly to check everything's working, and that's the purpose of the &lt;code&gt;get_iso_weeknumber&lt;/code&gt; function.&lt;/p&gt;

&lt;p&gt;In the &lt;code&gt;iso_week_of_year&lt;/code&gt; function we're getting a Pandas Series as our input, and we're returning another Pandas Series. This is a Series-to-Series function and the output must be the same length as the input. The code in this function is deliberately brief as we want to keep as much logic out of here as possible; it should really just care about applying the &lt;code&gt;get_iso_weeknumber&lt;/code&gt; function to the input. And that's all it's doing, as long as the input value isn't null.&lt;/p&gt;
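&lt;p&gt;Because it's just pandas underneath, the Series-to-Series behaviour can be sanity-checked without Spark at all. This is only a sketch (it assumes pandas is installed, and uses a trimmed-down copy of the function above):&lt;/p&gt;

```python
from datetime import date

import pandas as pd


def get_iso_weeknumber(dt):
    # Trimmed-down copy of the function from above, for illustration only
    if type(dt) is not date:
        return None
    return dt.isocalendar()[1]


# Same length in as out, with nulls passed through as pd.NA
s = pd.Series([date(2021, 1, 1), None, date(2021, 1, 4)])
result = s.apply(lambda x: get_iso_weeknumber(x) if not pd.isnull(x) else pd.NA)

print(len(result) == len(s))  # True
```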

&lt;h2&gt;
  
  
  The date dimension
&lt;/h2&gt;

&lt;p&gt;Now that we're able to create our ISO week number we can create a date dimension on demand. There's a few different ways of doing this, but my preferred method is using the &lt;code&gt;SparkSession.range&lt;/code&gt; function using UNIX timestamps.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_date_dimension&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;spark&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SparkSession&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;spark&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="nf"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;spark&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;SparkSession&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A valid SparkSession instance must be provided&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Start date must be a datetime.datetime object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;End date must be a datetime.datetime object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Start date must be before the end date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tzinfo&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;combine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;date&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;tzinfo&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tzinfo&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;combine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;date&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;tzinfo&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;timedelta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;days&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;spark&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;24&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
             &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withColumn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;to_date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;from_unixtime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
             &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withColumn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;date_format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;yyyyMMdd&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;cast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;int&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
             &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withColumn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;day&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;dayofmonth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
             &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withColumn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;day_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;date_format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EEEE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
             &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withColumn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;day_short_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;date_format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EEE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
             &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withColumn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;month&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;month&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
             &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withColumn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;month_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;date_format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MMMM&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
             &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withColumn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;month_short_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;date_format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MMM&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
             &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withColumn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;year&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;year&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
             &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withColumn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;week_number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;iso_week_of_year&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;There's a lot of setup in this function, just to make sure we're giving all the right information to create the dimension, but the actual generation is pretty straightforward. You can see on the last line that we're using our &lt;code&gt;iso_week_of_year&lt;/code&gt; UDF to get the week number, calling it just like any other Spark function.&lt;/p&gt;

&lt;p&gt;Using the &lt;code&gt;SparkSession.range&lt;/code&gt; function with a step of 86,400 means we're incrementing by 1 day at a time, regardless of any time component on the values we pass into the function in the first place.&lt;/p&gt;
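&lt;p&gt;The arithmetic behind that &lt;code&gt;range&lt;/code&gt; call can be sketched in plain Python. This is just an illustration of the row generation, assuming UTC timestamps as in the function above:&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone

start = datetime(2021, 1, 1, tzinfo=timezone.utc)
end = datetime(2021, 1, 10, tzinfo=timezone.utc) + timedelta(days=1)  # range's end is exclusive
step = 24 * 60 * 60  # 86,400 seconds, i.e. 1 day

ids = list(range(int(start.timestamp()), int(end.timestamp()), step))

print(len(ids))  # 10 rows, one per day from the 1st to the 10th inclusive
print(ids[0])    # 1609459200, midnight on 2021-01-01 UTC
```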

&lt;p&gt;Running this gives us a table looking like the following:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

&amp;gt;&amp;gt;&amp;gt; start = datetime(2021, 1, 1)
&amp;gt;&amp;gt;&amp;gt; end = datetime(2021, 1, 10)
&amp;gt;&amp;gt;&amp;gt; create_date_dimension(spark, start, end).show()
+----------+----------+--------+---+---------+--------------+-----+----------+----------------+----+-----------+
|        id|      date|date_key|day| day_name|day_short_name|month|month_name|month_short_name|year|week_number|
+----------+----------+--------+---+---------+--------------+-----+----------+----------------+----+-----------+
|1609459200|2021-01-01|20210101|  1|   Friday|           Fri|    1|   January|             Jan|2021|         53|
|1609545600|2021-01-02|20210102|  2| Saturday|           Sat|    1|   January|             Jan|2021|         53|
|1609632000|2021-01-03|20210103|  3|   Sunday|           Sun|    1|   January|             Jan|2021|         53|
|1609718400|2021-01-04|20210104|  4|   Monday|           Mon|    1|   January|             Jan|2021|          1|
|1609804800|2021-01-05|20210105|  5|  Tuesday|           Tue|    1|   January|             Jan|2021|          1|
|1609891200|2021-01-06|20210106|  6|Wednesday|           Wed|    1|   January|             Jan|2021|          1|
|1609977600|2021-01-07|20210107|  7| Thursday|           Thu|    1|   January|             Jan|2021|          1|
|1610064000|2021-01-08|20210108|  8|   Friday|           Fri|    1|   January|             Jan|2021|          1|
|1610150400|2021-01-09|20210109|  9| Saturday|           Sat|    1|   January|             Jan|2021|          1|
|1610236800|2021-01-10|20210110| 10|   Sunday|           Sun|    1|   January|             Jan|2021|          1|
+----------+----------+--------+---+---------+--------------+-----+----------+----------------+----+-----------+


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;You can see that the first few days of 2021 were part of week 53, and week 1 didn't actually start until the 4th January.&lt;/p&gt;

&lt;h2&gt;
  
  
  Unit testing the code
&lt;/h2&gt;

&lt;p&gt;But the title of this post is about unit testing the code, so how is that done?&lt;/p&gt;

&lt;p&gt;Well, we've seen that the ISO week number functions are broken out, so unit testing &lt;code&gt;get_iso_weeknumber&lt;/code&gt; is pretty straightforward and it can be tested just like any other Python code.&lt;/p&gt;
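&lt;p&gt;A test case for it might look something like the following (the test names and values are just illustrative, using the week 53 behaviour from earlier):&lt;/p&gt;

```python
import unittest
from datetime import date, datetime
from typing import Union


def get_iso_weeknumber(dt: Union[date, datetime]) -> int:
    # Copy of the function from earlier, repeated to keep the example self-contained
    if type(dt) is not datetime and type(dt) is not date:
        return None
    return dt.isocalendar()[1]


class IsoWeekNumberTests(unittest.TestCase):
    def test_start_of_2021_is_week_53(self):
        self.assertEqual(53, get_iso_weeknumber(date(2021, 1, 1)))

    def test_week_1_starts_on_the_4th(self):
        self.assertEqual(1, get_iso_weeknumber(datetime(2021, 1, 4, 12, 30)))

    def test_invalid_input_returns_none(self):
        self.assertIsNone(get_iso_weeknumber("2021-01-01"))
```

&lt;p&gt;No Spark session is needed here, so tests like these run in milliseconds.&lt;/p&gt;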

&lt;p&gt;Whilst we can test the &lt;code&gt;iso_week_of_year&lt;/code&gt; function, we lose the ability to capture any coverage information because of the way Spark calls the function. If you're using Coverage.py then you can add patterns to your &lt;code&gt;.coveragerc&lt;/code&gt; file to exclude these functions from your coverage results, like the following.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

[report]
exclude_lines =
    def .* -&amp;gt; pd.Series


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;We're going to test our &lt;code&gt;create_date_dimension&lt;/code&gt; function which will call the &lt;code&gt;iso_week_of_year&lt;/code&gt; function.&lt;/p&gt;

&lt;p&gt;We'll need a Spark Session to test the code with, and the most obvious way of doing that would be like this (I'm using the Python &lt;a href="https://docs.python.org/3/library/unittest.html" rel="noopener noreferrer"&gt;unittest&lt;/a&gt; library here).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DateDimensionTests&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;unittest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TestCase&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_simple_creation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;spark&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SparkSession&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;appName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unittest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;master&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;local[*]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;getOrCreate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_date_dimension&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;spark&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2021&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2021&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;But there's a problem with this. For every single test we're going to be creating a brand new Spark Session, which is going to get expensive. And if we need to change any of the spark configuration, we're going to need to do it in every single test!&lt;/p&gt;

&lt;p&gt;So what's a better way? Well, we can create our own test class based on &lt;code&gt;unittest.TestCase&lt;/code&gt; which does all of the setup for us.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PySparkTestCase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;unittest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TestCase&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

    &lt;span class="nd"&gt;@classmethod&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;setUpClass&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;conf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SparkConf&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;conf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setAppName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;demo-unittests&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;conf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setMaster&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;local[*]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;conf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;spark.sql.session.timezone&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;UTC&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;spark&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SparkSession&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conf&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;conf&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;getOrCreate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="nd"&gt;@classmethod&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tearDownClass&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;spark&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tearDown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;spark&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;catalog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clearCache&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;assert_dataframes_equal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;actual&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;display_differences&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DataFrame&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;actual&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subtract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;diff_count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;diff_count&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;display_differences&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assertEqual&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;diff_count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unpersist&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;What have we got now? Well, we have our &lt;code&gt;setUpClass&lt;/code&gt; method, which does the work of setting up our Spark Session for us, and our &lt;code&gt;tearDownClass&lt;/code&gt; method, which stops the Spark Session once we're done.&lt;/p&gt;

&lt;p&gt;We've also added a &lt;code&gt;tearDown&lt;/code&gt; method which runs after each test; in this case it clears out the Spark cache, just in case a test has left cached data behind. And there's a new method for comparing data frames, which abstracts away a lot of the repetitive code we'd otherwise write in our unit tests.&lt;/p&gt;

&lt;p&gt;One of the other things that's different in this setup is the line &lt;code&gt;conf.set("spark.sql.session.timezone", "UTC")&lt;/code&gt;. Timestamps are evil. It's that simple. A date is sort of okay: it represents a point in the calendar, so dates are nice to work with. But the range we're creating uses seconds from the epoch, which puts us in the world of timestamps, and that world has Daylight Saving Time and timezones! And how those behave varies based on your OS of choice.&lt;/p&gt;

&lt;p&gt;Running the following on my Windows 10 laptop using Python 3 gives an error of &lt;code&gt;OSError: [Errno 22] Invalid argument&lt;/code&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;
&lt;span class="nf"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1066&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;But running it in Ubuntu 20.04 on WSL gives a result of &lt;code&gt;-28502755125.0&lt;/code&gt;. Now, let's run it again in both, but with timezone information provided.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timezone&lt;/span&gt;
&lt;span class="nf"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1066&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tzinfo&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utc&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Well, now they both give an answer of &lt;code&gt;-28502755200.0&lt;/code&gt;. So, setting the Spark Session timezone to UTC means it's going to handle timestamps consistently, regardless of where in the world I am or which OS I'm running on.&lt;/p&gt;
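
&lt;p&gt;The same consistency can be seen with nothing but the standard library: once the UTC offset is explicit, two different local spellings of the same instant agree on their epoch value (this is just an illustrative check, not part of the project's code).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

from datetime import datetime, timezone, timedelta

# Two spellings of the same instant, with explicit offsets
a = datetime(2021, 1, 1, tzinfo=timezone.utc)
b = datetime(2020, 12, 31, 23, 0, tzinfo=timezone(timedelta(hours=-1)))

print(a.timestamp() == b.timestamp())  # True; both are 1609459200.0

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;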

&lt;h3&gt;
  
  
  Writing the tests themselves
&lt;/h3&gt;

&lt;p&gt;Now that we have our test case class, we can write some unit tests. By putting a number of them under the same test class, they all share the same Spark Session, which means we don't incur the performance penalty of constantly re-creating it.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DateDimensionTests&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PySparkTestCase&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_simple_creation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;expected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;spark&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_input_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;simple_date_dim.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;header&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;DateDimensionTests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_dimension_schema&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_date_dimension&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;spark&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2021&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2021&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assert_dataframes_equal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;display_differences&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_simple_range_week53&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;week_53_day_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nf"&gt;create_date_dimension&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;spark&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2020&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2021&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;31&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;week_number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;53&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assertEqual&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;week_53_day_count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;These can now be executed using the Visual Studio Code test runner...&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftwfrjxpet5r4hm355rv1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftwfrjxpet5r4hm355rv1.jpg" alt="Tests executed using the VSCode test runner"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;... or from the command line.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

python &lt;span class="nt"&gt;-m&lt;/span&gt; unittest discover &lt;span class="nt"&gt;-v&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; ./tests &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"test_*.py"&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;They're still not as fast as tests which don't require a Spark Session, so, where possible, make sure business logic is tested as plain Python. But if you do need a Spark Session, then this is definitely a way to go.&lt;/p&gt;
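
&lt;p&gt;As an example of that, the ISO week rule which the week-53 test leans on can be checked with nothing but the standard library. This is just a sketch (the helper name here is mine, not part of the project's code):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

from datetime import date, timedelta

def iso_week_number(d):
    # isocalendar() returns (ISO year, ISO week, ISO weekday)
    return d.isocalendar()[1]

# Same range as test_simple_range_week53: 2020-12-01 to 2021-01-31
days = [date(2020, 12, 1) + timedelta(days=i) for i in range(62)]
week_53_days = [d for d in days if iso_week_number(d) == 53]

print(len(week_53_days))  # 7, running from 2020-12-28 to 2021-01-03

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;No Spark Session, so it runs in milliseconds; the Spark-based test is then only needed to confirm that the data frame itself reflects the same rule.&lt;/p&gt;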

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Phew. So this is my first post to &lt;a href="https://dev.to/"&gt;Dev.to&lt;/a&gt;. I've been sat reading content from others for a while and thought it was time I contributed something of my own. Hopefully it helps someone out, or gives someone a few ideas for doing something else.&lt;/p&gt;

&lt;p&gt;If you can think of anything which could be improved, or you have a different way of doing things, then let me know; things only improve with collaboration.&lt;/p&gt;

&lt;p&gt;If you want to have a look at the code, or use it in any way (it's MIT licensed), then you can find it over on &lt;a href="https://github.com/dazfuller/example-pyspark-unittesting" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>python</category>
      <category>spark</category>
      <category>testing</category>
      <category>pyspark</category>
    </item>
  </channel>
</rss>
