<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Daniel Bray</title>
    <description>The latest articles on DEV Community by Daniel Bray (@danielbraysonalake).</description>
    <link>https://dev.to/danielbraysonalake</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F475136%2Ff8c27062-56f7-44ec-805f-20a721231729.jpeg</url>
      <title>DEV Community: Daniel Bray</title>
      <link>https://dev.to/danielbraysonalake</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/danielbraysonalake"/>
    <language>en</language>
    <item>
      <title>Using Shotgun to Find and Limit Indirect Dependencies</title>
      <dc:creator>Daniel Bray</dc:creator>
      <pubDate>Tue, 26 Oct 2021 08:25:16 +0000</pubDate>
      <link>https://dev.to/sonalake/using-shotgun-to-find-and-limit-indirect-dependencies-mdo</link>
      <guid>https://dev.to/sonalake/using-shotgun-to-find-and-limit-indirect-dependencies-mdo</guid>
      <description>&lt;p&gt;On a well-run project, over time, novelty tends to zero, and all good software, if it’s being used at all, will eventually go into maintenance.&lt;/p&gt;

&lt;p&gt;At the beginning of a project, when designs are still coming together, commits are likely to be in many different components of the application. As the application moves into maintenance, then – if the &lt;a href="https://en.wikipedia.org/wiki/SOLID"&gt;&lt;strong&gt;SOLID&lt;/strong&gt;&lt;/a&gt; principles have been applied – one would expect commits to impact smaller and smaller sections of the codebase. This would indicate that issue fixes and the like don’t require lots of small changes all over the codebase.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/sonalake/shotgun"&gt;&lt;strong&gt;Shotgun&lt;/strong&gt;&lt;/a&gt; (and its related &lt;a href="https://github.com/sonalake/shotgun-gradle-plugin"&gt;&lt;strong&gt;gradle plugin&lt;/strong&gt;&lt;/a&gt;) is a new tool that can identify overly complex and interdependent elements of your code base that other tools can’t.&lt;/p&gt;

&lt;h2&gt;
  
  
  What does this give me that a cyclomatic complexity rating doesn’t?
&lt;/h2&gt;

&lt;p&gt;We already have tools for calculating the complexity of code (e.g. &lt;a href="https://en.wikipedia.org/wiki/Cyclomatic_complexity"&gt;&lt;strong&gt;Cyclomatic Complexity&lt;/strong&gt;&lt;/a&gt;) but they rely on finding the elements of code that talk directly to each other. Shotgun, however, reports on which elements of code are updated at the same time, measuring how coherent your commits are within a code hierarchy.&lt;/p&gt;

&lt;p&gt;For example, if you had an event-based architecture, where services interact through a queue, then changing one service might require changes in the events being sent, and so on down to the receiving services. A cyclomatic complexity rating wouldn’t take account of this, since the services don’t directly interact with each other. Shotgun, however, will report it when these elements are updated at the same time over and over again.&lt;/p&gt;

&lt;p&gt;The idea is that – if we’ve been doing our job right – then over time the complexity of commits is getting smaller. If it’s not, then it’s likely that “small” commits are touching too many files because the code is overly interdependent.&lt;/p&gt;

&lt;p&gt;The report looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zXW5go03--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://sonalake.com/wp-content/uploads/2021/10/shotgun3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zXW5go03--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://sonalake.com/wp-content/uploads/2021/10/shotgun3.png" alt="Shotgun complexity"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  A heatmap showing the complexity of each day’s commits.&lt;/li&gt;
&lt;li&gt;  A list of active commit sets: these are sets of files that are regularly committed in one go. These file sets have a high interdependence.&lt;/li&gt;
&lt;li&gt;  A list of active files: these are single files that are being updated a lot.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can also click on any given day and view the details of the commits for that day.&lt;/p&gt;

&lt;h2&gt;
  
  
  What do I do with this information?
&lt;/h2&gt;

&lt;p&gt;Shotgun tells you if every change you’re making is a big change in lots of different components.&lt;/p&gt;

&lt;p&gt;At the beginning of a project this is normal – you’re just figuring things out. If, after a while, most commits are touching elements all over the codebase, then you need to start thinking about refactoring.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  The active commit sets will point to groups of elements that are committed together a lot. These are highly coherent. If the sets are too big then there’s probably a need to abstract some common behaviours to reduce the interdependence.&lt;/li&gt;
&lt;li&gt;  It’s also worth checking the larger commits for the same issues.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How is shotgun coherency calculated?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The aim is to derive a score for each commit that is lower if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  The commit is limited to a small number of files.&lt;/li&gt;
&lt;li&gt;  The commit is limited to files in the same package.&lt;/li&gt;
&lt;li&gt;  The commit is limited to files in the same package hierarchy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The actual score is calculated as follows.&lt;/p&gt;

&lt;p&gt;Imagine a commit like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--IGbVzOJL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://sonalake.com/wp-content/uploads/2021/10/shotgun1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--IGbVzOJL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://sonalake.com/wp-content/uploads/2021/10/shotgun1.png" alt="Commit"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The process is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Ignore any merge commits – we don’t want to risk double-counting, or be at the mercy of whether or not the merge was fast-forwarded.&lt;/li&gt;
&lt;li&gt;  Ignore any files that were deleted – removing code doesn’t add to the complexity of the application.&lt;/li&gt;
&lt;li&gt;  Split the files up into different source sets, e.g.

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;src/main/java&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;src/main/resources&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;src/test/java&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;src/test/resources&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;  Build a set of simple directed graphs, where the vertices are the directories and files in the commit.&lt;/li&gt;
&lt;li&gt;  Prune out the roots of these graphs so we’re left with only the common root of the commit. In this example we’d be removing “&lt;code&gt;com.sonalake&lt;/code&gt;”.&lt;/li&gt;
&lt;li&gt;  Finally, add up all the edges of the graphs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the shotgun coherency score.&lt;/p&gt;

&lt;p&gt;Where there are multiple commits on a given day, the median score is used for that day.&lt;/p&gt;

&lt;p&gt;Some examples:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A single file commit | Score: 1&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--uBcypXND--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://sonalake.com/wp-content/uploads/2021/10/shotgun2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--uBcypXND--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://sonalake.com/wp-content/uploads/2021/10/shotgun2.png" alt="A single file commit"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Two files in the same directory | Score: 2&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--qEugw-gD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://sonalake.com/wp-content/uploads/2021/10/shotgun4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--qEugw-gD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://sonalake.com/wp-content/uploads/2021/10/shotgun4.png" alt="Two files in the same directory"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Two files in the same hierarchy | Score: 3&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--D2DUspVO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://sonalake.com/wp-content/uploads/2021/10/shotgun6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--D2DUspVO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://sonalake.com/wp-content/uploads/2021/10/shotgun6.png" alt="Two files in the same hierarchy"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Two files in parallel directories | Score: 4&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--VCAD1LCc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://sonalake.com/wp-content/uploads/2021/10/shotgun5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--VCAD1LCc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://sonalake.com/wp-content/uploads/2021/10/shotgun5.png" alt="Two files in parallel directories"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How can I use it?
&lt;/h2&gt;

&lt;p&gt;There is a basic library that comes with a command line – &lt;a href="https://github.com/sonalake/shotgun"&gt;&lt;strong&gt;shotgun&lt;/strong&gt;&lt;/a&gt; – where you can find full details of the configuration parameters. In short, you can define things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  The different source sets.&lt;/li&gt;
&lt;li&gt;  How small a commit must be before it is excluded from the home page.&lt;/li&gt;
&lt;li&gt;  The size of the heatmap buckets.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The easiest way to use the tool in a project is to use the &lt;a href="https://github.com/sonalake/shotgun-gradle-plugin/"&gt;&lt;strong&gt;shotgun-gradle-plugin&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Just drop this in as a plugin in your gradle configuration (full details &lt;a href="https://plugins.gradle.org/plugin/com.sonalake.shotgun-gradle-plugin"&gt;&lt;strong&gt;here&lt;/strong&gt;&lt;/a&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;plugins&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sonalake&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;shotgun&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;gradle&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;plugin&lt;/span&gt;&lt;span class="err"&gt;'&lt;/span&gt; &lt;span class="n"&gt;version&lt;/span&gt; &lt;span class="s"&gt;"1.0.0"&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And configure it appropriately for your project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;shotgun&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;inputDirectory&lt;/span&gt;            &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"$projectDir"&lt;/span&gt;
  &lt;span class="n"&gt;outputFile&lt;/span&gt;                &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;".shotgun/report.htm"&lt;/span&gt;
  &lt;span class="n"&gt;sourceSets&lt;/span&gt;                &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"src/main/java"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                                &lt;span class="s"&gt;"src/main/resources"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                                &lt;span class="s"&gt;"src/main/webapp"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                                &lt;span class="s"&gt;"src/test/java"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                                &lt;span class="s"&gt;"src/test/resources"&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
  &lt;span class="n"&gt;minimumCommitInterest&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt;   &lt;span class="mi"&gt;3&lt;/span&gt;
  &lt;span class="n"&gt;topCommitValueForFileSets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;   &lt;span class="mi"&gt;10&lt;/span&gt;
  &lt;span class="n"&gt;topCommitValueForFiles&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt;   &lt;span class="mi"&gt;40&lt;/span&gt;
  &lt;span class="n"&gt;legendLevels&lt;/span&gt;              &lt;span class="o"&gt;=&lt;/span&gt;   &lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why is it called shotgun?
&lt;/h2&gt;

&lt;p&gt;The purpose of this app is to spot when changes are consistently being applied in a scattergun approach over the entire codebase.&lt;/p&gt;

&lt;p&gt;Also because of &lt;a href="https://www.youtube.com/watch?v=lc7I9NLPt9A"&gt;&lt;strong&gt;this&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Can I contribute to this project?
&lt;/h2&gt;

&lt;p&gt;We hope this tool is useful to everyone, so we made it public, along with some other tools, libraries and examples, in our &lt;a href="https://github.com/sonalake"&gt;&lt;strong&gt;Sonalake github project&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you think there are improvements to make, please fork the project and submit them, and we’d be delighted to review and merge them.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Constraint Programming: Solving Sudoku with Choco Solver library</title>
      <dc:creator>Daniel Bray</dc:creator>
      <pubDate>Tue, 27 Apr 2021 09:51:20 +0000</pubDate>
      <link>https://dev.to/sonalake/constraint-programming-solving-sudoku-with-choco-solver-library-3mbj</link>
      <guid>https://dev.to/sonalake/constraint-programming-solving-sudoku-with-choco-solver-library-3mbj</guid>
      <description>&lt;h2&gt;
  
  
  Why solve sudoku?
&lt;/h2&gt;

&lt;p&gt;Enterprise application development is, for the most part, solving one of these types of problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;em&gt;“Let me create, read, update and delete these things”&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;em&gt;“Do this same task to as many things as possible, as quickly as possible”&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;em&gt;“What’s the best way to allocate the resources I have to do these tasks, if whatever does task X, can’t be used to do task Y?”&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This last problem type is a &lt;a href="https://en.wikipedia.org/wiki/Graph_coloring"&gt;&lt;strong&gt;graph colouring&lt;/strong&gt;&lt;/a&gt; problem, and the nature of these is that solving one of them is much the same as solving another.&lt;/p&gt;

&lt;p&gt;Sudoku is one of these types of problems, but it has very simple rules, so it’s a nice playground to try out different ways to solve graph colouring problems. This post outlines a solution using &lt;a href="https://en.wikipedia.org/wiki/Constraint_programming"&gt;&lt;strong&gt;constraint programming&lt;/strong&gt;&lt;/a&gt; with &lt;a href="https://choco-solver.org/"&gt;&lt;strong&gt;choco solver&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is constraint programming?
&lt;/h2&gt;

&lt;p&gt;Constraint programming is a paradigm for programming that can be a little unusual the first time you come to it, since it’s completely different to imperative programming.&lt;/p&gt;

&lt;p&gt;In short: you tell the program what problem needs to be solved, but not how to solve that problem.&lt;/p&gt;

&lt;p&gt;What this means in practical terms is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  First you define your variables.

&lt;ul&gt;
&lt;li&gt;  &lt;em&gt;These are my tasks&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;em&gt;These are my workers&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;  Then you define the domains in which these variables exist. For example:

&lt;ul&gt;
&lt;li&gt;  &lt;em&gt;This variable has to have the value of 3&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;em&gt;This variable can have any value between 1 and 42&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;em&gt;This variable is a set of between 4 and 10 numbers, all taken from a domain running from 1 to 99.&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;  Then you define the constraints for these variables, for example:

&lt;ul&gt;
&lt;li&gt;  &lt;em&gt;If one variable has a value, the other variable must have a different value.&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;em&gt;One variable must be the sum/max/min of a few other variables.&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;em&gt;One set must be a complement of another set.&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;  If you’re looking for any solution, then you’re done. If you’re looking for the best solution, then you need to define a set of cost variables that you aim to minimise or maximise. For example:

&lt;ul&gt;
&lt;li&gt;  &lt;em&gt;Find a solution that minimises the “cost of doing business” variable.&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;em&gt;Find a solution that maximises the “how many messages are being transmitted” variable.&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;  Finally, you give this to the framework to solve, and it will use different AI approaches to find solutions to the problem you’ve defined.&lt;/li&gt;
&lt;/ul&gt;
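To make the steps above concrete, here is a toy version, hand-rolled with brute force purely for illustration (a real solver like Choco searches this space far more cleverly): two variables with domains of 1 to 9, two constraints, and a cost variable to maximise.

```java
public class ToyConstraintDemo {
    // Variables x and y; domains 1..9; constraints x != y and x + y == 10;
    // cost variable to maximise: x * y. Returns {x, y, cost}.
    public static int[] solve() {
        int bestX = 0, bestY = 0, bestCost = -1;
        for (int x = 1; x != 10; x++) {          // domain of x
            for (int y = 1; y != 10; y++) {      // domain of y
                if (x == y || x + y != 10) {     // constraints
                    continue;
                }
                if (x * y > bestCost) {          // maximise the cost variable
                    bestCost = x * y;
                    bestX = x;
                    bestY = y;
                }
            }
        }
        return new int[] {bestX, bestY, bestCost};
    }
}
```

The best feasible assignment here is 4 and 6: they differ, sum to 10, and their product (24) beats every other feasible pair.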

&lt;h2&gt;
  
  
  Solving Sudoku in Choco Solver
&lt;/h2&gt;

&lt;p&gt;So, to make these ideas more concrete, we’ll use them to solve a simple problem.&lt;/p&gt;

&lt;p&gt;For this example, we will choose the &lt;a href="https://puzzling.stackexchange.com/questions/252/how-do-i-solve-the-worlds-hardest-sudoku"&gt;&lt;strong&gt;world’s hardest sudoku problem&lt;/strong&gt;&lt;/a&gt;. You can find a full example of this in &lt;a href="https://github.com/sonalake/chocosolver-samples"&gt;&lt;strong&gt;Sudoku.java&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--HOImluMD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://sonalake.com/wp-content/uploads/2021/04/sudoku.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--HOImluMD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://sonalake.com/wp-content/uploads/2021/04/sudoku.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you don’t know &lt;a href="https://en.wikipedia.org/wiki/Sudoku"&gt;&lt;strong&gt;Sudoku&lt;/strong&gt;&lt;/a&gt;, the rules are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  The grid is a 9 x 9 area of squares.&lt;/li&gt;
&lt;li&gt;  Each square must contain a single number, from 1 to 9.&lt;/li&gt;
&lt;li&gt;  The same number can’t appear in the same row twice.&lt;/li&gt;
&lt;li&gt;  The same number can’t appear in the same column twice.&lt;/li&gt;
&lt;li&gt;  The grid is broken down into 9 distinct sub-grids of 9 squares each. The same number can’t appear in the same sub-grid twice.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s it. If you’re wondering what this has to do with resource usage, you could imagine the numbers represent available channels in a cell tower, and the squares represent the messages that need to be sent. The solution to this problem will tell you how you could send out these messages in an evenly distributed manner, without getting any resource contention on the channels.&lt;/p&gt;
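Those three “no repeats” rules can be stated as a plain validity check, independent of any solver. This is a hypothetical helper for illustration (using 0 to mark an empty square), not code from the sample repository:

```java
public class SudokuRules {
    // Check the three Sudoku constraints on a 9x9 grid; 0 marks an empty square.
    public static boolean isValid(int[][] grid) {
        for (int i = 0; i != 9; i++) {
            boolean[] row = new boolean[10];
            boolean[] col = new boolean[10];
            boolean[] box = new boolean[10];
            for (int j = 0; j != 9; j++) {
                int r = grid[i][j];                                      // i-th row
                int c = grid[j][i];                                      // i-th column
                int b = grid[(i / 3) * 3 + j / 3][(i % 3) * 3 + j % 3];  // i-th 3x3 sub-grid
                if (r != 0 && row[r]) return false;
                if (c != 0 && col[c]) return false;
                if (b != 0 && box[b]) return false;
                if (r != 0) row[r] = true;
                if (c != 0) col[c] = true;
                if (b != 0) box[b] = true;
            }
        }
        return true;
    }
}
```

A constraint solver does essentially the reverse: instead of checking a finished grid, it searches for values that make every one of these checks pass.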

&lt;h3&gt;
  
  
  Getting started with Choco Solver
&lt;/h3&gt;

&lt;p&gt;The complete code for this is available here: &lt;a href="https://github.com/sonalake/chocosolver-samples"&gt;&lt;strong&gt;Sudoku&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Before we do anything else, we must first create an empty &lt;a href="https://choco-solver.org/docs/modeling/"&gt;&lt;strong&gt;model&lt;/strong&gt;&lt;/a&gt; for the game. This is as simple as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Model&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Model&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"sudoku"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This model is where new variables, constraints and optimizations are created.&lt;/p&gt;

&lt;h3&gt;
  
  
  Defining variables and domains
&lt;/h3&gt;

&lt;p&gt;For the sudoku problem, there are 81 variables, one for each square in our grid.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;IntVar&lt;/span&gt;&lt;span class="o"&gt;[][]&lt;/span&gt; &lt;span class="n"&gt;grid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;IntVar&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="o"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="o"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These come in one of two flavours:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  If the value is not known at the start, then we need to define it with a domain of possible values; in our case, these are values from 1 to 9:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;grid&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="o"&gt;][&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;intVar&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;  If the value is known at the start, then we define it as a simple constant that can’t change. This is, in effect, saying that the variable has a domain of a single value.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;grid&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="o"&gt;][&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;intVar&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Define constraints
&lt;/h3&gt;

&lt;p&gt;Once we have the variables, we need to set up their constraints: we have 9 rows, columns, and sub-squares where the values must all be different. We use the &lt;a href="https://choco-solver.org/docs/modeling/intconstraints/"&gt;&lt;strong&gt;allDifferent&lt;/strong&gt;&lt;/a&gt; constraint for this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;allDifferent&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;getCellsInRow&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;grid&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;)).&lt;/span&gt;&lt;span class="na"&gt;post&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
  &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;allDifferent&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;getCellsInColumn&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;grid&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;)).&lt;/span&gt;&lt;span class="na"&gt;post&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
  &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;allDifferent&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;getCellsInSquare&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;grid&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;)).&lt;/span&gt;&lt;span class="na"&gt;post&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Solve it
&lt;/h3&gt;

&lt;p&gt;Finally, we solve it as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Solver&lt;/span&gt; &lt;span class="n"&gt;solver&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getSolver&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;solver&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;solve&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Choco solver will keep looking for solutions until it gives up: because it has exhausted them all, because it thinks it won’t find anything better, or because you’ve told it to stop after doing enough work.&lt;/p&gt;

&lt;p&gt;Once this is done, you can get at the value that choco solver found for each cell as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;grid&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="o"&gt;][&lt;/span&gt;&lt;span class="n"&gt;column&lt;/span&gt;&lt;span class="o"&gt;].&lt;/span&gt;&lt;span class="na"&gt;getValue&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And that’s it: the world’s hardest sudoku is solved in under a second.&lt;/p&gt;

&lt;h2&gt;
  
  
  So what else can you do with choco solver?
&lt;/h2&gt;

&lt;p&gt;Whatever you want it to do, so long as you can turn it into a combinatorial constraint problem.&lt;/p&gt;

&lt;p&gt;For example, here’s a more complex graph colouring program: &lt;a href="https://github.com/sonalake/chocosolver-samples"&gt;&lt;strong&gt;GraphColouring.java&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What it’s doing is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Given a graph: &lt;em&gt;imagine these are tasks that can’t be done at the same time.&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;  Given you have N possible colours to choose from: &lt;em&gt;imagine these are workers that are available to do these tasks.&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;  Given that no colour can be used more than M times: &lt;em&gt;imagine a worker can only do so many things in a day.&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;  Given you want to use the least number of colours: &lt;em&gt;use the fewest workers required to do this work.&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;  Colour the graph in: &lt;em&gt;suggest a roster for the tasks and workers that uses the least number of people without overworking them.&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;
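Those rules translate directly into a feasibility check. Here is a sketch with illustrative names (not taken from GraphColouring.java): given an adjacency matrix of clashing tasks, a candidate colouring, and a per-colour usage cap, is the roster valid?

```java
public class ColouringCheck {
    // Feasibility check for the roster rules above: adjacent vertices
    // (tasks that clash) must get different colours (workers), and no
    // colour may be used more than maxUses times. Colours are 0..n-1.
    public static boolean isFeasible(boolean[][] adj, int[] colour, int maxUses) {
        int n = colour.length;
        int[] uses = new int[n + 1];
        for (int i = 0; i != n; i++) {
            uses[colour[i]]++;
            if (uses[colour[i]] > maxUses) {
                return false; // this worker is overworked
            }
            for (int j = i + 1; j != n; j++) {
                if (adj[i][j] && colour[i] == colour[j]) {
                    return false; // clashing tasks share a worker
                }
            }
        }
        return true;
    }
}
```

A constraint solver would then search the space of colourings for one that passes this check while minimising the number of distinct colours used.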

&lt;h2&gt;
  
  
  Where do we go now?
&lt;/h2&gt;

&lt;p&gt;This post refers to two simple examples of what is possible, but really most combinatorial problems can be solved using this approach.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  For a more formal and complete description of constraint programming, check out &lt;a href="https://arxiv.org/pdf/cs/0602027.pdf"&gt;&lt;strong&gt;Explaining Constraint Programming&lt;/strong&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  For a nice explanation of how to formalise actual problems, check out lectures 7, 8 and 9 of this AI &lt;a href="https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-034-artificial-intelligence-fall-2010/lecture-videos/"&gt;&lt;strong&gt;lecture series&lt;/strong&gt;&lt;/a&gt; from MIT (actually, all of this series is worth watching)

&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-034-artificial-intelligence-fall-2010/lecture-videos/lecture-7-constraints-interpreting-line-drawings"&gt;&lt;strong&gt;Lecture 7: Constraints: Interpreting Line Drawings&lt;/strong&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-034-artificial-intelligence-fall-2010/lecture-videos/lecture-8-constraints-search-domain-reduction"&gt;&lt;strong&gt;Lecture 8: Constraints: Search, Domain Reduction&lt;/strong&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-034-artificial-intelligence-fall-2010/lecture-videos/lecture-9-constraints-visual-object-recognition"&gt;&lt;strong&gt;Lecture 9: Constraints: Visual Object Recognition&lt;/strong&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>java</category>
      <category>programming</category>
    </item>
    <item>
      <title>Part 4: Hypothesis Testing of frequency-based samples</title>
      <dc:creator>Daniel Bray</dc:creator>
      <pubDate>Tue, 16 Feb 2021 15:59:19 +0000</pubDate>
      <link>https://dev.to/sonalake/part-4-hypothesis-testing-of-frequency-based-samples-48oi</link>
      <guid>https://dev.to/sonalake/part-4-hypothesis-testing-of-frequency-based-samples-48oi</guid>
      <description>&lt;p&gt;In &lt;a href="https://dev.to/sonalake/an-introduction-to-hypothesis-testing-41ne"&gt;&lt;strong&gt;part one of this series&lt;/strong&gt;&lt;/a&gt;, we introduced the idea of hypothesis testing, along with a full description of the different elements that go into using these tools. It ended with a cheat-sheet to help you choose which test to use based on the kind of data you’re testing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/sonalake/part-2-hypothesis-testing-of-proportion-based-samples-1ik2"&gt;&lt;strong&gt;Part two&lt;/strong&gt;&lt;/a&gt; outlined some code samples for how to perform z-tests on proportion-based samples.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/sonalake/part-3-hypothesis-testing-of-mean-based-samples-5c16"&gt;&lt;strong&gt;Part three&lt;/strong&gt;&lt;/a&gt; outlined some code samples for how to perform t-tests on mean-based samples.&lt;/p&gt;

&lt;p&gt;This post will now go into more detail for &lt;strong&gt;frequency-based&lt;/strong&gt; samples.&lt;/p&gt;


&lt;p&gt;If any of these terms – &lt;em&gt;Null Hypothesis, Alternative Hypothesis, p-value&lt;/em&gt; – are new to you, then I’d suggest reviewing the first part of this series before carrying on with this one.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What is a frequency-based sample?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;In these cases we’re interested in checking frequencies, e.g. I’m expecting my result set to have a given distribution: does it?&lt;/p&gt;

&lt;p&gt;Are the differences between the distributions of two samples big enough that we should take notice? Are the differences between the distributions of variables in a single sample big enough to indicate that the variables might depend on each other?&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Requirements for the quality of the sample&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;For these tests the following sampling rules are required:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;

&lt;tbody&gt;

&lt;tr&gt;

&lt;td&gt;&lt;strong&gt;Random&lt;/strong&gt;&lt;/td&gt;

&lt;td&gt;The sample must be a random sample from the entire population&lt;/td&gt;

&lt;/tr&gt;

&lt;tr&gt;

&lt;td&gt;&lt;strong&gt;Normal&lt;/strong&gt;&lt;/td&gt;

&lt;td&gt;The expected counts must be “big enough” – for these tests a good rule of thumb is that, given the sample size, every category’s expected count is at least 5:
&lt;ul&gt;
&lt;li&gt;For example: suppose a network was sized to carry 5% real-time traffic and 95% best-effort traffic; a sample of size 50 would then have an expected count of only 2.5 real-time messages – less than 5 – so the sample would be rejected as not being big enough&lt;/li&gt;
&lt;/ul&gt;
&lt;/td&gt;

&lt;/tr&gt;

&lt;tr&gt;

&lt;td&gt;&lt;strong&gt;Independent&lt;/strong&gt;&lt;/td&gt;

&lt;td&gt;The sample must be independent – for these tests a good rule of thumb is that the sample size be less than 10% of the total population.&lt;/td&gt;

&lt;/tr&gt;

&lt;/tbody&gt;

&lt;/table&gt;&lt;/div&gt;
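&lt;p&gt;The rule of thumb for chi-squared tests – every category’s expected count should be at least 5 – is easy to check up front. A minimal sketch (the proportions and sample sizes are illustrative):&lt;/p&gt;

```python
def expected_counts_ok(expected_proportions, sample_size, minimum=5):
    """Chi-squared rule of thumb: every category's expected count >= minimum."""
    return all(sample_size * p >= minimum for p in expected_proportions)

proportions = [0.05, 0.10, 0.15, 0.70]

print(expected_counts_ok(proportions, 650))  # 650 * 0.05 = 32.5, fine
print(expected_counts_ok(proportions, 50))   # 50 * 0.05 = 2.5, too small
```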

&lt;h2&gt;
  
  
  &lt;strong&gt;Tests for frequency-based samples&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;All of these code samples are available in &lt;a href="https://bitbucket.org/sonalake/blog-hypothesis-testing"&gt;&lt;strong&gt;this git repository&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Chi-squared goodness-of-fit&lt;/strong&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Compare the counts for some variables in a sample to an expected distribution&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In this test we have an expected distribution of data across a category, and we want to check if the sample matches that.&lt;/p&gt;

&lt;p&gt;For example, suppose a network was sized to carry traffic with the expected distribution below, and a sample of 650 messages observed the following counts:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;

&lt;tbody&gt;

&lt;tr&gt;

&lt;td&gt;&lt;strong&gt;Class of Service&lt;/strong&gt;&lt;/td&gt;

&lt;td&gt;&lt;strong&gt;Expected Distribution&lt;/strong&gt;&lt;/td&gt;

&lt;td&gt;&lt;strong&gt;Observed Count in sample (size 650)&lt;/strong&gt;&lt;/td&gt;

&lt;/tr&gt;

&lt;tr&gt;

&lt;td&gt;A&lt;/td&gt;

&lt;td&gt;5%&lt;/td&gt;

&lt;td&gt;27&lt;/td&gt;

&lt;/tr&gt;

&lt;tr&gt;

&lt;td&gt;B&lt;/td&gt;

&lt;td&gt;10%&lt;/td&gt;

&lt;td&gt;73&lt;/td&gt;

&lt;/tr&gt;

&lt;tr&gt;

&lt;td&gt;C&lt;/td&gt;

&lt;td&gt;15%&lt;/td&gt;

&lt;td&gt;82&lt;/td&gt;

&lt;/tr&gt;

&lt;tr&gt;

&lt;td&gt;D&lt;/td&gt;

&lt;td&gt;70%&lt;/td&gt;

&lt;td&gt;468&lt;/td&gt;

&lt;/tr&gt;

&lt;/tbody&gt;

&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Given a null hypothesis that the distribution is as expected, the following Python code derives the probability of seeing counts at least this far from that distribution, assuming the null hypothesis is true.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;scipy.stats&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;chisquare&lt;/span&gt;

&lt;span class="c1"&gt;# can we assume anything from our sample
&lt;/span&gt;&lt;span class="n"&gt;significance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt;

&lt;span class="c1"&gt;# what do we expect to see in proportions?
&lt;/span&gt;&lt;span class="n"&gt;expected_proportions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[.&lt;/span&gt;&lt;span class="mi"&gt;05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# what counts did we see in our sample?
&lt;/span&gt;&lt;span class="n"&gt;observed_counts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;27&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;73&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;82&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;468&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;########################
# how big was our sample
&lt;/span&gt;&lt;span class="n"&gt;sample_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;observed_counts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# we derive our comparison counts here for  our expected proportions, based on the sample size
&lt;/span&gt;&lt;span class="n"&gt;expected_counts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample_size&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;expected_proportions&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Get the stat data
&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chi_stat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chisquare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;observed_counts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expected_counts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# report
&lt;/span&gt;&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'chi_stat: %0.5f, p_value: %0.5f'&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chi_stat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_value&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p_value&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;significance&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
   &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Fail to reject the null hypothesis - we have nothing else to say"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
   &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Reject the null hypothesis - suggest the alternative hypothesis is true"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Chi-squared (homogeneity)&lt;/strong&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Compare the counts for some variables between two samples&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In this case the test is similar to the goodness-of-fit test (above), but rather than deriving expected counts from an expected distribution, it compares two sets of sampled counts to see if their frequencies differ enough to suggest that the underlying populations have different distributions.&lt;/p&gt;

&lt;p&gt;This is, in effect, the same code as above – only in this case both sets of counts come from samples, so we compare the two observed counts directly rather than deriving expected counts from a distribution.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;scipy.stats&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;chisquare&lt;/span&gt;

&lt;span class="c1"&gt;# can we assume anything from our sample
&lt;/span&gt;&lt;span class="n"&gt;significance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt;

&lt;span class="c1"&gt;# what counts did we see in our samples?
&lt;/span&gt;&lt;span class="n"&gt;observed_counts_A&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;65&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;97&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;450&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;observed_counts_B&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;27&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;73&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;82&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;468&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;########################
&lt;/span&gt;
&lt;span class="c1"&gt;# Get the stat data
&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chi_stat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chisquare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;observed_counts_A&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;observed_counts_B&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# report
&lt;/span&gt;&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'chi_stat: %0.5f, p_value: %0.5f'&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chi_stat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_value&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p_value&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;significance&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
   &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Fail to reject the null hypothesis - we have nothing else to say"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
   &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Reject the null hypothesis - suggest the alternative hypothesis is true"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
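&lt;p&gt;One caveat: &lt;code&gt;chisquare&lt;/code&gt; expects the two sets of counts to have (roughly) the same total, and recent versions of scipy may raise an error when they don’t, as here (644 vs 650). A more robust route – a sketch, not part of the original post – is to stack the two samples into a contingency table and use &lt;code&gt;chi2_contingency&lt;/code&gt;, which estimates the expected counts from the row and column totals and so handles unequal sample sizes:&lt;/p&gt;

```python
from scipy.stats import chi2_contingency
import numpy as np

significance = 0.05

# rows: the two samples; columns: the four classes of service
counts = np.array([
    [32, 65, 97, 450],   # sample A
    [27, 73, 82, 468],   # sample B
])

# chi2_contingency derives the expected counts from the row/column totals,
# so the two samples do not need identical sizes
chi_stat, p_value, dof, expected = chi2_contingency(counts)
print('chi_stat: %0.5f, p_value: %0.5f, dof: %d' % (chi_stat, p_value, dof))

if p_value > significance:
    print("Fail to reject the null hypothesis - we have nothing else to say")
else:
    print("Reject the null hypothesis - suggest the alternative hypothesis is true")
```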



&lt;h3&gt;
  
  
  &lt;strong&gt;Chi-squared (independence)&lt;/strong&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Check a single sample to see if two discrete variables are independent&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In this case you have a sample from a population, over two discrete variables, and you want to tell if these two discrete variables have some kind of relationship – or if they are independent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; this is for &lt;em&gt;discrete&lt;/em&gt; variables (i.e. categories). If you wanted to check if numeric variables are independent you’d want to consider using something like a linear regression.&lt;/p&gt;

&lt;p&gt;Suppose we had a pivot to see how people from different area types (town/country) voted for three different political parties.&lt;/p&gt;

&lt;p&gt;The question we are asking is whether there is likely to be a connection between these two variables (i.e. do town and country voters have a strong preference for a given party?).&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;

&lt;tbody&gt;

&lt;tr&gt;

&lt;td&gt;&lt;/td&gt;

&lt;td&gt;&lt;em&gt;Party&lt;/em&gt;&lt;/td&gt;

&lt;td&gt;&lt;/td&gt;

&lt;td&gt;&lt;/td&gt;

&lt;/tr&gt;

&lt;tr&gt;

&lt;td&gt;&lt;/td&gt;

&lt;td&gt;&lt;strong&gt;Cocktail Party&lt;/strong&gt;&lt;/td&gt;

&lt;td&gt;&lt;strong&gt;Garden Party&lt;/strong&gt;&lt;/td&gt;

&lt;td&gt;&lt;strong&gt;Mouse Party&lt;/strong&gt;&lt;/td&gt;

&lt;/tr&gt;

&lt;tr&gt;

&lt;td&gt;&lt;em&gt;Voter Type&lt;/em&gt;&lt;/td&gt;

&lt;td&gt;&lt;/td&gt;

&lt;td&gt;&lt;/td&gt;

&lt;td&gt;&lt;/td&gt;

&lt;/tr&gt;

&lt;tr&gt;

&lt;td&gt;&lt;strong&gt;Town&lt;/strong&gt;&lt;/td&gt;

&lt;td&gt;200&lt;/td&gt;

&lt;td&gt;150&lt;/td&gt;

&lt;td&gt;50&lt;/td&gt;

&lt;/tr&gt;

&lt;tr&gt;

&lt;td&gt;&lt;strong&gt;Country&lt;/strong&gt;&lt;/td&gt;

&lt;td&gt;250&lt;/td&gt;

&lt;td&gt;300&lt;/td&gt;

&lt;td&gt;50&lt;/td&gt;

&lt;/tr&gt;

&lt;/tbody&gt;

&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The python code to check this is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;scipy.stats&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;chi2_contingency&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="c1"&gt;# can we assume anything from our sample
&lt;/span&gt;&lt;span class="n"&gt;significance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt;

&lt;span class="n"&gt;pivot&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="c1"&gt;# town votes
&lt;/span&gt;  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;150&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="c1"&gt;# country votes
&lt;/span&gt;  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;250&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="c1"&gt;########################
# Get the stat data
&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chi_stat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;degrees_of_freedom&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chi2_contingency&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pivot&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# report
&lt;/span&gt;&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'chi_stat: %0.5f, p_value: %0.5f'&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chi_stat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_value&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p_value&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;significance&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Fail to reject the null hypothesis - we have nothing else to say"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Reject the null hypothesis - suggest the alternative hypothesis is true"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
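&lt;p&gt;For intuition, the &lt;code&gt;expected&lt;/code&gt; array returned by &lt;code&gt;chi2_contingency&lt;/code&gt; is what the table would look like if the two variables really were independent: each cell is (row total × column total) / grand total. A quick sketch with the voting table above:&lt;/p&gt;

```python
import numpy as np

pivot = np.array([
    [200, 150, 50],   # town votes
    [250, 300, 50],   # country votes
])

row_totals = pivot.sum(axis=1, keepdims=True)   # [[400], [600]]
col_totals = pivot.sum(axis=0, keepdims=True)   # [[450, 450, 100]]
grand_total = pivot.sum()                       # 1000

# under independence, each cell is row_total * col_total / grand_total
expected = row_totals * col_totals / grand_total
print(expected)
```

&lt;p&gt;The chi-squared statistic then measures how far the observed table sits from this independent one.&lt;/p&gt;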



&lt;h2&gt;
  
  
  &lt;strong&gt;Where do we go next?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Thank you for reading the final part of our introduction to hypothesis testing. I hope you found it a useful introduction to the world of statistical analysis. If you would like to look deeper into this field, I’d suggest the following.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  I’ve not touched on issues of &lt;a href="https://en.wikipedia.org/wiki/Power_of_a_test"&gt;&lt;strong&gt;power&lt;/strong&gt;&lt;/a&gt; or &lt;a href="https://en.wikipedia.org/wiki/Effect_size"&gt;&lt;strong&gt;effect size&lt;/strong&gt;&lt;/a&gt; in this series. For that I would direct you to Robert Coe’s always worth reading: &lt;a href="https://www.leeds.ac.uk/educol/documents/00002182.htm"&gt;&lt;strong&gt;It’s the effect size, stupid: what effect size is and why it is important&lt;/strong&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  If you have more complex types of data to examine, then I’d suggest reading more into

&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://en.wikipedia.org/wiki/Analysis_of_variance"&gt;&lt;strong&gt;Analysis Of Variance&lt;/strong&gt;&lt;/a&gt; – for when you have means in more than two sets of groups to compare, and using multiple t-sets would waste your power.&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://en.wikipedia.org/wiki/Linear_regression"&gt;&lt;strong&gt;Linear Regression&lt;/strong&gt;&lt;/a&gt; – for when you want to predict the value of one continuous variable, based on the values of some other continuous value, or just want to see if different continuous variables are, in fact, related.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;  If our previous post – &lt;a href="https://sonalake.com/latest/quantitative-analysis-is-as-subjective-as-qualitative-analysis/"&gt;&lt;strong&gt;Quantitative analysis is as subjective as qualitative analysis&lt;/strong&gt;&lt;/a&gt; – is making you doubt whether you can trust stats at all, then check out how &lt;a href="https://en.wikipedia.org/wiki/Meta-analysis"&gt;&lt;strong&gt;meta-analysis&lt;/strong&gt;&lt;/a&gt; can be used to combine the results of multiple different analyses, and produce a single overall measure of whether the underlying tests show a significant effect.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you would like to know more or have any suggestions, please don’t hesitate to reach out to us!&lt;/p&gt;




&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;PART I: &lt;a href="https://dev.to/sonalake/an-introduction-to-hypothesis-testing-41ne"&gt;An Introduction to Hypothesis Testing&lt;/a&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;PART II: &lt;a href="https://dev.to/sonalake/part-2-hypothesis-testing-of-proportion-based-samples-1ik2"&gt;Hypothesis Testing of proportion-based samples&lt;/a&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;PART III: &lt;a href="https://dev.to/sonalake/part-3-hypothesis-testing-of-mean-based-samples-5c16"&gt;Hypothesis Testing of mean-based samples&lt;/a&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>datascience</category>
      <category>python</category>
    </item>
    <item>
      <title>Part 3: Hypothesis Testing of mean-based samples</title>
      <dc:creator>Daniel Bray</dc:creator>
      <pubDate>Wed, 10 Feb 2021 11:44:52 +0000</pubDate>
      <link>https://dev.to/sonalake/part-3-hypothesis-testing-of-mean-based-samples-5c16</link>
      <guid>https://dev.to/sonalake/part-3-hypothesis-testing-of-mean-based-samples-5c16</guid>
      <description>&lt;p&gt;In &lt;a href="https://dev.to/sonalake/an-introduction-to-hypothesis-testing-41ne"&gt;&lt;strong&gt;part one of this series&lt;/strong&gt;&lt;/a&gt;, we introduced the idea of hypothesis testing, along with a full description of the different elements that go into using these tools. It ended with a cheat-sheet to help you choose which test to use based on the kind of data you’re testing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/sonalake/part-2-hypothesis-testing-of-proportion-based-samples-1ik2"&gt;&lt;strong&gt;Part two&lt;/strong&gt;&lt;/a&gt; outlined some code samples for how to perform z-tests on proportion-based samples.&lt;/p&gt;

&lt;p&gt;This post will now go into more detail for &lt;strong&gt;mean-based&lt;/strong&gt; samples.&lt;/p&gt;


&lt;p&gt;If any of the terms – &lt;em&gt;Null Hypothesis&lt;/em&gt;, &lt;em&gt;Alternative Hypothesis&lt;/em&gt;, &lt;em&gt;p-value&lt;/em&gt; – are new to you, then &lt;a href="https://dev.to/sonalake/an-introduction-to-hypothesis-testing-41ne"&gt;&lt;strong&gt;I’d suggest reviewing the first part of this series&lt;/strong&gt;&lt;/a&gt; before carrying on with this one.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is a mean-based sample?
&lt;/h2&gt;

&lt;p&gt;In these cases we’re interested in checking the arithmetic mean of some samples. This could be checking if the sample’s mean matches some expected value, or comparing two samples from two different populations, or comparing two samples from the same population, taken before and after some intervention.&lt;/p&gt;

&lt;h2&gt;
  
  
  Requirements for the quality of the sample
&lt;/h2&gt;

&lt;p&gt;For these tests the following sampling rules are required:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;

&lt;tbody&gt;

&lt;tr&gt;

&lt;td&gt;&lt;strong&gt;Random&lt;/strong&gt;&lt;/td&gt;

&lt;td&gt;The sample must be a random sample from the entire population&lt;/td&gt;

&lt;/tr&gt;

&lt;tr&gt;

&lt;td&gt;&lt;strong&gt;Normal&lt;/strong&gt;&lt;/td&gt;

&lt;td&gt;The sample must be normal – for these tests either:
&lt;ul&gt;
&lt;li&gt;The underlying population must be normal – this can be tricky, as a population might normally be normal, only to be non-normal the day you sample it 😉&lt;/li&gt;
&lt;li&gt;If you can’t assume the underlying population is normal, then you should use a sample size of at least 30 (as per the central limit theorem)&lt;/li&gt;
&lt;/ul&gt;
&lt;/td&gt;

&lt;/tr&gt;

&lt;tr&gt;

&lt;td&gt;&lt;strong&gt;Independent&lt;/strong&gt;&lt;/td&gt;

&lt;td&gt;The sample must be independent – for these tests a good rule of thumb is that the sample size be less than 10% of the total population.&lt;/td&gt;

&lt;/tr&gt;

&lt;/tbody&gt;

&lt;/table&gt;&lt;/div&gt;
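&lt;p&gt;The n ≥ 30 rule of thumb comes from the central limit theorem: whatever shape the population has, the means of samples of that size are approximately normally distributed around the population mean, with standard deviation σ/√n. A quick simulation sketch using a deliberately non-normal (exponential) population:&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

# a skewed, clearly non-normal population: exponential with mean 1, std 1
sample_size = 30
sample_means = rng.exponential(scale=1.0, size=(2000, sample_size)).mean(axis=1)

# the sample means cluster around the population mean (1.0),
# with standard deviation close to sigma / sqrt(n) = 1 / sqrt(30) ~= 0.183
print('mean of sample means: %0.3f' % sample_means.mean())
print('std of sample means:  %0.3f' % sample_means.std())
```

&lt;p&gt;This is what lets the t-tests below treat the sample mean as (approximately) normally distributed even when the population isn’t.&lt;/p&gt;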

&lt;h2&gt;
  
  
  Tests for mean-based samples
&lt;/h2&gt;

&lt;p&gt;All of these code samples are available in &lt;a href="https://bitbucket.org/sonalake/blog-hypothesis-testing"&gt;&lt;strong&gt;this git repository&lt;/strong&gt;&lt;/a&gt;. They use the common &lt;a href="https://www.statsmodels.org/stable/index.html"&gt;&lt;strong&gt;statsmodels&lt;/strong&gt;&lt;/a&gt; library to perform the tests.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1-sample t-test&lt;/strong&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Compare the mean of a sample to an expected value&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here we have a sample – defined by a mean – and we want to see if we can make some assertion about whether the overall mean of the underlying population is greater than, less than, or different to some expected mean.&lt;/p&gt;

&lt;p&gt;So, in this example, suppose we want to sample a call centre to check if the average call time is more than 2 minutes.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Our null hypothesis is: &lt;em&gt;the mean call time is exactly 2 minutes&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;  Our alternative hypothesis is: &lt;em&gt;the mean call time is more than 2 minutes&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;  From one population we sampled 500 calls, and found a mean call time of 121 seconds, with a standard deviation of 50 seconds&lt;/li&gt;
&lt;li&gt;  We use a 1-sample t-test to check if the sample allows us to accept or reject the null hypothesis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To calculate the p-value in Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;scipy.stats&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;truncnorm&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;statsmodels.stats.weightstats&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DescrStatsW&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt;
&lt;span class="c1"&gt;# can we assume anything from our sample?
&lt;/span&gt;&lt;span class="n"&gt;significance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.025&lt;/span&gt;
&lt;span class="c1"&gt;# we're checking if calls can be resolved in over 2 minutes
# so Ho == 120 seconds
&lt;/span&gt;&lt;span class="n"&gt;null_hypothesis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;120&lt;/span&gt;
&lt;span class="c1"&gt;# Normally, in the real world, you would process an entire sample (i.e. sample_a)
# But for this test, we'll generate a sample from this shape, where:
# - min/max is the range of available options
# - sample mean/dev are used to define the normal distribution
# - size is how large the sample will be
&lt;/span&gt;&lt;span class="nb"&gt;min&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;max&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample_mean_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample_dev_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample_size_a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;121&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;########################
# here - for our test - we're generating a random string of durations to be our sample
# these are in a normal distribution between min/max, normalised around the mean
&lt;/span&gt;&lt;span class="n"&gt;sample_a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;truncnorm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;min&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;sample_mean_a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;sample_dev_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;max&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;sample_mean_a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;sample_dev_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sample_mean_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;scale&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sample_dev_a&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;rvs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample_size_a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Get the stat data
&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t_stat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;degree_of_freedom&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample_a&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;ttest_mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;null_hypothesis&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'larger'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# report
&lt;/span&gt;&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'t_stat: %0.3f, p_value: %0.3f'&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t_stat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_value&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p_value&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;significance&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Fail to reject the null hypothesis - we have nothing else to say"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Reject the null hypothesis - suggest the alternative hypothesis is true"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;2-sample independent t-test&lt;/strong&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Compare the mean of the samples from 2 different populations&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here we have two samples – taken from two different populations – defined by a mean – and we want to see if we can make an assertion about whether the overall mean of one of the underlying populations is greater than / less than / different to the other.&lt;/p&gt;

&lt;p&gt;So, in this example, suppose we want to compare two different call centres to see how their call times relate to each other.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  We have two samples – A and B: our null hypothesis is: &lt;em&gt;the means from the two populations are the same&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;  Our alternative hypothesis is: &lt;em&gt;the mean from population A &amp;gt; the mean from population B&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;  From one population we sampled 500 calls, and found a mean call time of 121 seconds, with a standard deviation of 56 seconds.&lt;/li&gt;
&lt;li&gt;  From the other population we sampled 500 calls, and found a mean call time of 125 seconds, with a standard deviation of 16 seconds.&lt;/li&gt;
&lt;li&gt;  We use a 2-sample independent t-test to check if the samples allow us to accept or reject the null hypothesis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To calculate the p-value in Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;scipy.stats&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;truncnorm&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;statsmodels.stats.weightstats&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ttest_ind&lt;/span&gt;
&lt;span class="c1"&gt;# can we assume anything from our sample?
&lt;/span&gt;&lt;span class="n"&gt;significance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.025&lt;/span&gt;
&lt;span class="c1"&gt;# we're checking if calls can be resolved in over 2 minutes
# so Ho == 120 seconds
&lt;/span&gt;&lt;span class="n"&gt;null_hypothesis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;120&lt;/span&gt;
&lt;span class="c1"&gt;# Normally, in the real world, you would process an entire sample (i.e. sample_a)
# But for this test, we'll generate a sample from this shape, wherE:
# - min/max is the range of available options
# - sample mean/dev are used to define the normal distribution
# - size is how large the sample will be
&lt;/span&gt;&lt;span class="nb"&gt;min&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;max&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;sample_mean_v1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample_dev_v1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample_size_v1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;121&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;56&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;sample_mean_v2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample_dev_v2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample_size_v2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;125&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;########################
# here - for our test - we're generating a random string of durations to be our sample
# these are in a normal distribution between min/max, normalised around the mean
&lt;/span&gt;&lt;span class="n"&gt;sample_v1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;truncnorm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;min&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;sample_mean_v1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;sample_dev_v1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;max&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;sample_mean_v1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;sample_dev_v1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sample_mean_v1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="n"&gt;scale&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sample_dev_v1&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;rvs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample_size_v1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;sample_v2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;truncnorm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;min&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;sample_mean_v2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;sample_dev_v2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;max&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;sample_mean_v2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;sample_dev_v2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sample_mean_v2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="n"&gt;scale&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sample_dev_v2&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;rvs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample_size_v2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Get the stat data
# note that we're comparing V2 to V1 - so the sample we expect to be larger goes first here
&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t_stat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;degree_of_freedom&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ttest_ind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample_v2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample_v1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;alternative&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'larger'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# report
&lt;/span&gt;&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'t_stat: %0.3f, p_value: %0.3f'&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t_stat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_value&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p_value&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;significance&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Fail to reject the null hypothesis - we have nothing else to say"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Reject the null hypothesis - suggest the alternative hypothesis is true"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;2-sample paired t-test&lt;/strong&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Compare the mean of two samples from the same population&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here we have two samples – taken from the &lt;strong&gt;same&lt;/strong&gt; population – defined by a mean – and we want to see if we can make an assertion about whether the population mean at the time of the second sample is greater than / less than / different to what it was at the time of the first.&lt;/p&gt;

&lt;p&gt;So, in this example, suppose we have made some code change and it looks like it has slowed things down, and so we want to sample the performance from before and after the change, to see if things have really slowed down.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  We have two samples – A (before the change) and B (after it): our null hypothesis is: &lt;em&gt;the population mean is the same in both&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;  Our alternative hypothesis is: &lt;em&gt;the mean from sample B &amp;gt; the mean from sample A&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;  Before the change, we sampled 500 events from the population, and found a mean processing time of 121 milliseconds, with a standard deviation of 56 milliseconds.&lt;/li&gt;
&lt;li&gt;  After the change, we sampled 500 events from the population, and found a mean processing time of 128 milliseconds, with a standard deviation of 16 milliseconds.&lt;/li&gt;
&lt;li&gt;  We use a 2-sample paired t-test to check if the samples allow us to accept or reject the null hypothesis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;NOTE: in this case it is assumed that the same elements have been sampled multiple times. So, this is, in effect, a 1-sample t-test on the differences between the two samples with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Null hypothesis: difference is 0&lt;/li&gt;
&lt;li&gt;  Alternative hypothesis: difference is greater than 0&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To calculate the p-value in Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;scipy.stats&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;truncnorm&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;statsmodels.stats.weightstats&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DescrStatsW&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt;
&lt;span class="c1"&gt;# can we assume anything from our sample?
&lt;/span&gt;&lt;span class="n"&gt;significance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt;
&lt;span class="c1"&gt;# we're checking if calls can be resolved in over 2 minutes
# so Ho == 120 seconds
&lt;/span&gt;&lt;span class="n"&gt;null_hypothesis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;120&lt;/span&gt;
&lt;span class="c1"&gt;# Normally, in the real world, you would process an entire sample (i.e. sample_a)
# But for this test, we'll generate a sample from this shape, wherE:
# - min/max is the range of available options
# - sample mean/dev are used to define the normal distribution
# - size is how large the sample will be
&lt;/span&gt;&lt;span class="nb"&gt;min&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;max&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;sample_mean_v1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample_dev_v1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample_size_v1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;121&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;56&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;sample_mean_v2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample_dev_v2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample_size_v2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;125&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;########################
# here - for our test - we're generating a random string of durations to be our sample
# these are in a normal distribution between min/max, normalised around the mean
&lt;/span&gt;&lt;span class="n"&gt;sample_v1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;truncnorm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;min&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;sample_mean_v1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;sample_dev_v1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;max&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;sample_mean_v1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;sample_dev_v1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sample_mean_v1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;scale&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sample_dev_v1&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;rvs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample_size_v1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;sample_v2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;truncnorm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;min&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;sample_mean_v2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;sample_dev_v2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;max&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;sample_mean_v2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;sample_dev_v2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sample_mean_v2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;scale&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sample_dev_v2&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;rvs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample_size_v2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Get the stat data
# note that this is, in effect, a sample t-test on the differences
# we want to see if v2 is slower than V1 so we get the differences and check the probability that they
# are larger than the null hypothesis here (of the default = 0.0)
&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t_stat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;degree_of_freedom&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample_v2&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;sample_v1&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;ttest_mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alternative&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'larger'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# report
&lt;/span&gt;&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'t_stat: %0.5f, p_value: %0.5f'&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t_stat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_value&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p_value&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;significance&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Fail to reject the null hypothesis - we have nothing else to say"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Reject the null hypothesis - suggest the alternative hypothesis is true"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
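The note above, that a paired test is effectively a 1-sample t-test on the differences, is easy to verify. The sketch below uses scipy's ttest_rel and ttest_1samp (an alternative to the statsmodels call used here, and assuming scipy 1.6+ for the alternative keyword) on synthetic before/after timings:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# synthetic paired timings: the same 500 events measured before and after a change
before = rng.normal(121, 56, 500)
after = rng.normal(128, 16, 500)

# 2-sample paired t-test: is "after" slower than "before"?
t_paired, p_paired = stats.ttest_rel(after, before, alternative='greater')

# the same question as a 1-sample t-test on the differences against 0
t_diff, p_diff = stats.ttest_1samp(after - before, 0, alternative='greater')

print('paired: t=%0.5f p=%0.5f' % (t_paired, p_paired))
print('1-samp: t=%0.5f p=%0.5f' % (t_diff, p_diff))
```

Both calls produce identical statistics and p-values, which is why the statsmodels code above can simply feed the differences to ttest_mean.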



&lt;p&gt;In the next post I will focus on testing of frequency-based samples.&lt;/p&gt;




&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;PART I: &lt;a href="https://dev.to/sonalake/an-introduction-to-hypothesis-testing-41ne"&gt;An Introduction to Hypothesis Testing&lt;/a&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;PART II: &lt;a href="https://dev.to/sonalake/part-2-hypothesis-testing-of-proportion-based-samples-1ik2"&gt;Hypothesis Testing of proportion-based samples&lt;/a&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;PART IV: &lt;a href="https://dev.to/sonalake/part-4-hypothesis-testing-of-frequency-based-samples-48oi"&gt;Hypothesis Testing of frequency-based samples&lt;/a&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>datascience</category>
      <category>python</category>
    </item>
    <item>
      <title>Part 2: Hypothesis Testing of proportion-based samples</title>
      <dc:creator>Daniel Bray</dc:creator>
      <pubDate>Wed, 03 Feb 2021 14:26:09 +0000</pubDate>
      <link>https://dev.to/sonalake/part-2-hypothesis-testing-of-proportion-based-samples-1ik2</link>
      <guid>https://dev.to/sonalake/part-2-hypothesis-testing-of-proportion-based-samples-1ik2</guid>
      <description>&lt;p&gt;In &lt;a href="https://dev.to/sonalake/an-introduction-to-hypothesis-testing-41ne"&gt;&lt;strong&gt;part one of this series&lt;/strong&gt;&lt;/a&gt;, I introduced the concept of hypothesis testing, and described the different elements that go into using the various tests. It ended with a cheat-sheet to help you choose which test to use based on the kind of data you’re testing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--sBrmv9ut--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://sonalake.com/wp-content/uploads/2019/12/hypothesis-testing.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--sBrmv9ut--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://sonalake.com/wp-content/uploads/2019/12/hypothesis-testing.png" alt="hypothesis-testing" width="880" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this second post I will go into more detail on &lt;strong&gt;proportion-based&lt;/strong&gt; samples.&lt;/p&gt;

&lt;p&gt;If any of the terms &lt;em&gt;Null Hypothesis&lt;/em&gt;, &lt;em&gt;Alternative Hypothesis&lt;/em&gt;, &lt;em&gt;p-value&lt;/em&gt; are new to you, I’d suggest reviewing the first part of this series before moving on.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What is a proportion-based sample?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;In these cases we’re interested in checking proportions. For example, 17% of a sample matches some profile, and the rest does not. This could be a test comparing a single sample against some expected value, or comparing two different samples.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; These tests are only valid when there are &lt;strong&gt;only two&lt;/strong&gt; possible options; and if the probability of one option is &lt;em&gt;&lt;strong&gt;p&lt;/strong&gt;&lt;/em&gt;, then the probability of the other must be &lt;em&gt;&lt;strong&gt;(1 – p)&lt;/strong&gt;&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Requirements for the quality of the sample&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;For these tests the following sampling rules are required:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;

&lt;tbody&gt;

&lt;tr&gt;

&lt;td&gt;&lt;strong&gt;Random&lt;/strong&gt;&lt;/td&gt;

&lt;td&gt;The sample must be a random sample from the entire population&lt;/td&gt;

&lt;/tr&gt;

&lt;tr&gt;

&lt;td&gt;&lt;strong&gt;Normal&lt;/strong&gt;&lt;/td&gt;

&lt;td&gt;The sample must reflect the distribution of the underlying population. For these tests a good rule of thumb is that:
&lt;ul&gt;
&lt;li&gt;Given a sample size of &lt;strong&gt;n&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Given a sample proportion of &lt;strong&gt;p&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Then both &lt;strong&gt;np&lt;/strong&gt; and &lt;strong&gt;n(1-p)&lt;/strong&gt; must be at least &lt;strong&gt;10&lt;/strong&gt;
&lt;/li&gt;

&lt;/ul&gt;

&lt;em&gt;For example: if a sample finds that 80% of issues were resolved in 5 days, and 20% were not, then that sample must have at least 10 issues resolved within 5 days, and at least 10 issues resolved in more than 5 days.&lt;/em&gt;
&lt;/td&gt;

&lt;/tr&gt;

&lt;tr&gt;

&lt;td&gt;&lt;strong&gt;Independent&lt;/strong&gt;&lt;/td&gt;

&lt;td&gt;The sample must be independent – for these tests, a good rule of thumb is that the sample size is less than 10% of the total population.&lt;/td&gt;

&lt;/tr&gt;

&lt;/tbody&gt;

&lt;/table&gt;&lt;/div&gt;
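The "Normal" rule of thumb in the table can be captured in a few lines. This is a hypothetical helper written for illustration, not part of statsmodels:

```python
def proportion_sample_is_normal_enough(n: int, p: float) -> bool:
    """Rule of thumb: both n*p and n*(1-p) must be at least 10."""
    return n * p >= 10 and n * (1 - p) >= 10

# 80% of 500 issues resolved in 5 days: 400 and 100 either way, both fine
print(proportion_sample_is_normal_enough(500, 0.80))  # True
# 95% of 40 issues: n*(1-p) is only 2, so the sample is too small
print(proportion_sample_is_normal_enough(40, 0.95))   # False
```

Running a check like this before a proportion test is cheap insurance against drawing conclusions from a sample that is too lopsided to justify the normal approximation.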

&lt;h2&gt;
  
  
  &lt;strong&gt;Code Samples for Proportion-based Tests&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Note that all of these code samples are &lt;a href="https://github.com/sonalake/blog-hypothesis-testing"&gt;&lt;strong&gt;available on Github&lt;/strong&gt;&lt;/a&gt;. They use the popular &lt;a href="https://www.statsmodels.org/stable/index.html"&gt;&lt;strong&gt;statsmodels&lt;/strong&gt;&lt;/a&gt; library to perform the tests.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1-sample z-test&lt;/strong&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Compare the proportion in a sample to an expected value&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here we have a sample and we want to see if some proportion of that sample is greater than / less than / different to some expected test value.&lt;/p&gt;

&lt;p&gt;In this example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  We expect more than 80% of the tests to pass, so our null hypothesis is: &lt;em&gt;80% of the tests pass&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;  Our alternative hypothesis is: &lt;em&gt;more than 80% of the tests pass&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;  We sampled 500 tests, and found 410 passed&lt;/li&gt;
&lt;li&gt;  We use a 1-sample z-test to check if the sample allows us to accept or reject the null hypothesis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To calculate the p-value in Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;statsmodels.stats.proportion&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;proportions_ztest&lt;/span&gt;

&lt;span class="c1"&gt;# can we assume anything from our sample
&lt;/span&gt;&lt;span class="n"&gt;significance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt;

&lt;span class="c1"&gt;# our sample - 82% are good
&lt;/span&gt;&lt;span class="n"&gt;sample_success&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;410&lt;/span&gt;
&lt;span class="n"&gt;sample_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;

&lt;span class="c1"&gt;# our Ho is  80%
&lt;/span&gt;&lt;span class="n"&gt;null_hypothesis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.80&lt;/span&gt;

&lt;span class="c1"&gt;# check our sample against Ho for Ha &amp;gt; Ho
# for Ha &amp;lt; Ho use alternative='smaller'
# for Ha != Ho use alternative='two-sided'
&lt;/span&gt;&lt;span class="n"&gt;stat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;proportions_ztest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sample_success&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nobs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sample_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;null_hypothesis&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;alternative&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'larger'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# report
&lt;/span&gt;&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'z_stat: %0.3f, p_value: %0.3f'&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_value&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p_value&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;significance&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
   &lt;span class="k"&gt;print&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Fail to reject the null hypothesis - we have nothing else to say"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
   &lt;span class="k"&gt;print&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Reject the null hypothesis - suggest the alternative hypothesis is true"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
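For intuition, the z statistic above can also be computed by hand. One detail worth knowing: by default (prop_var=False), statsmodels builds the standard error from the sample proportion rather than the hypothesised one, so this sketch does the same:

```python
from math import erf, sqrt

sample_success, sample_size = 410, 500
null_hypothesis = 0.80

p_hat = sample_success / sample_size                 # 0.82
# standard error built from the sample proportion (statsmodels' default)
se = sqrt(p_hat * (1 - p_hat) / sample_size)
z_stat = (p_hat - null_hypothesis) / se
# one-sided ("larger") p-value from the standard normal CDF
p_value = 1 - 0.5 * (1 + erf(z_stat / sqrt(2)))

print('z_stat: %0.3f, p_value: %0.3f' % (z_stat, p_value))
```

This reproduces the library's result for the numbers above, and makes it clear why a 2% excess over the 80% target is not enough, at this sample size, to clear a 5% significance bar.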



&lt;h3&gt;
  
  
  &lt;strong&gt;2-sample z-test&lt;/strong&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Compare the proportions between 2 samples&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here we have two samples, defined by a proportion, and we want to see if we can make an assertion about whether the overall proportions of one of the underlying populations is greater than / less than / different to the other.&lt;/p&gt;

&lt;p&gt;In this example, we want to compare two different populations to see how their tests relate to each other:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  We have two samples – A and B. Our null hypothesis is that &lt;em&gt;the proportions from the two populations are the same&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;  Our alternative hypothesis is that &lt;em&gt;the proportions from the two populations are different&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;  From one population we sampled 500 tests and found 410 passed&lt;/li&gt;
&lt;li&gt;  From the other population, we sampled 400 tests and found 379 passed&lt;/li&gt;
&lt;li&gt;  We use a 2-sample z-test to check if the sample allows us to accept or reject the null hypothesis&lt;/li&gt;
&lt;/ul&gt;
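It can help to see the arithmetic a two-proportion z-test performs: the two samples are pooled to estimate a common proportion under the null hypothesis. Here is a hand-rolled sketch using the numbers from the bullets; I believe this mirrors what proportions_ztest computes with its defaults, but treat that as an assumption worth checking against the statsmodels docs:

```python
from math import erf, sqrt

# the two samples from the bullets above
sample_success_a, sample_size_a = 410, 500
sample_success_b, sample_size_b = 379, 400

p_a = sample_success_a / sample_size_a               # 0.82
p_b = sample_success_b / sample_size_b               # ~0.95
# pool the samples to estimate the common proportion under Ho
pooled = (sample_success_a + sample_success_b) / (sample_size_a + sample_size_b)
se = sqrt(pooled * (1 - pooled) * (1 / sample_size_a + 1 / sample_size_b))
z_stat = (p_a - p_b) / se
# two-sided p-value from the standard normal CDF
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z_stat) / sqrt(2))))

print('z_stat: %0.3f, p_value: %0.3f' % (z_stat, p_value))
```

A gap of roughly 13 percentage points on samples this large produces a z statistic far out in the tail, so the null hypothesis is very comfortably rejected.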

&lt;p&gt;To calculate the p-value in Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;statsmodels.stats.proportion&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;proportions_ztest&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="c1"&gt;# can we assume anything from our sample
&lt;/span&gt;&lt;span class="n"&gt;significance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.025&lt;/span&gt;

&lt;span class="c1"&gt;# our samples - 82% are good in one, and ~79% are good in the other
# note - the samples do not need to be the same size
&lt;/span&gt;&lt;span class="n"&gt;sample_success_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample_size_a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;410&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;sample_success_b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample_size_b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;379&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# check our sample against Ho for Ha != Ho
&lt;/span&gt;&lt;span class="n"&gt;successes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;sample_success_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample_success_b&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;samples&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;sample_size_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample_size_b&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# note, no need for a Ho value here - it's derived from the other parameters
&lt;/span&gt;&lt;span class="n"&gt;stat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;proportions_ztest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;successes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nobs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;samples&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="n"&gt;alternative&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'two-sided'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# report
&lt;/span&gt;&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'z_stat: %0.3f, p_value: %0.3f'&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_value&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p_value&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;significance&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
   &lt;span class="k"&gt;print&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Fail to reject the null hypothesis - we have nothing else to say"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
   &lt;span class="k"&gt;print&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Reject the null hypothesis - suggest the alternative hypothesis is true"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the next post I will focus on hypothesis testing of mean-based samples.&lt;/p&gt;




&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;PART I: &lt;a href="https://dev.to/sonalake/an-introduction-to-hypothesis-testing-41ne"&gt;An Introduction to Hypothesis Testing&lt;/a&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;PART III: &lt;a href="https://dev.to/sonalake/part-3-hypothesis-testing-of-mean-based-samples-5c16"&gt;Hypothesis Testing of mean-based samples&lt;/a&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;PART IV: &lt;a href="https://dev.to/sonalake/part-4-hypothesis-testing-of-frequency-based-samples-48oi"&gt;Hypothesis Testing of frequency-based samples&lt;/a&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>datascience</category>
      <category>python</category>
    </item>
    <item>
      <title>An Introduction to Hypothesis Testing</title>
      <dc:creator>Daniel Bray</dc:creator>
      <pubDate>Thu, 21 Jan 2021 22:11:28 +0000</pubDate>
      <link>https://dev.to/sonalake/an-introduction-to-hypothesis-testing-41ne</link>
      <guid>https://dev.to/sonalake/an-introduction-to-hypothesis-testing-41ne</guid>
      <description>&lt;p&gt;As part of the ongoing development of our &lt;a href="https://sonalake.com/solutions/visual-analytics/"&gt;&lt;strong&gt;VisiMetrix&lt;/strong&gt;&lt;/a&gt; platform we are faced with the need to make decisions about how best to analyse massive datasets. We want to help users make decisions when looking at data. Sometimes though it’s too expensive to check all the data or it’s so complicated that it’s easy to make an incorrect assumption and be led away in the wrong direction. &lt;/p&gt;

&lt;p&gt;In cases like this, hypothesis testing can help by providing a degree of confidence that either our observations are real, or the changes we’ve made have, in fact, made a difference. &lt;/p&gt;

&lt;p&gt;In cases where a complete examination of the underlying data set is impossible – perhaps all the data is not yet available, or it is simply too expensive to process – we have found the following statistical tests to be very helpful.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;1-Sample Z-Test&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;VisiMetrix monitors large telecom networks, and in some cases, its data will suggest that new software or hardware elements should be added to the network to improve overall performance. Since changing telecom networks is costly, we need to determine whether this change would be worthwhile by verifying that a sizeable proportion of the underlying traffic matches a well-defined profile. Unfortunately, checking such vast quantities of data is extremely compute- and time-intensive. &lt;/p&gt;

&lt;p&gt;In cases like this, a test known as the &lt;a href="https://www.statisticshowto.datasciencecentral.com/one-sample-z-test/"&gt;&lt;strong&gt;1-sample Z-test&lt;/strong&gt;&lt;/a&gt; can be applied to a sample of the data to determine if the network infrastructure change is, in fact, worth implementing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ps3_KO_Z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://sonalake.com/wp-content/uploads/2019/12/hypothesis-testing-1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ps3_KO_Z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://sonalake.com/wp-content/uploads/2019/12/hypothesis-testing-1.jpg" alt="hypothesis-testing-1" width="880" height="293"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;2-Sample Paired T-Test&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;When VisiMetrix draws the attention of a telco’s operations team to a history of PDP creation (user connectivity) errors, they will often apply a configuration change to their underlying network to correct this. However, since things like PDP creation errors are, for the most part, rare, it can be a challenge to validate that a configuration change has, in fact, corrected connection failures for real end-customers. &lt;/p&gt;

&lt;p&gt;In cases like this, a &lt;a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5579465/"&gt;&lt;strong&gt;2-sample paired t-test&lt;/strong&gt;&lt;/a&gt; can be applied to samples taken before and after the configuration changes to confirm that any reduction in errors was, in fact, real, and not just a random artefact of the data. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--N_rAipHg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://sonalake.com/wp-content/uploads/2019/12/hypothesis-testing-2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--N_rAipHg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://sonalake.com/wp-content/uploads/2019/12/hypothesis-testing-2.jpg" alt="hypothesis-testing-2" width="880" height="372"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Chi-Square Goodness of Fit&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;When a telco is planning new hardware deployments, they can use information from their monitoring infrastructure to understand the pre-upgrade state of the network. Looking beyond that, they have to make some assumptions about traffic patterns as far as 2-3 years in the future:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Expected future event volumes&lt;/li&gt;
&lt;li&gt; Expected distribution for each event type&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;They will use these predicted volumes to dimension new hardware, network, and other infrastructure. Once deployed, it is critical to validate these sizing assumptions early on. The challenge, however, is that the traffic soon after an upgrade will be nowhere near the upper limit of what was sized, so it would be difficult to tell whether or not the upgrades will be able to support the predicted traffic volumes in the coming years. &lt;/p&gt;

&lt;p&gt;The challenge here is to validate the dimensioning assumptions in advance of peak traffic. Using the fact that the &lt;em&gt;proportions&lt;/em&gt; of event types should not differ significantly pre- and post-upgrade, we can apply a &lt;a href="http://www.stat.yale.edu/Courses/1997-98/101/chigf.htm"&gt;&lt;strong&gt;Chi-Square Goodness of Fit&lt;/strong&gt;&lt;/a&gt; test to the initially limited production data to confirm that the observed distribution is as expected.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yNnWnvnu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://sonalake.com/wp-content/uploads/2019/12/hypothesis-testing-chi.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yNnWnvnu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://sonalake.com/wp-content/uploads/2019/12/hypothesis-testing-chi.jpg" alt="hypothesis-testing-chi" width="880" height="360"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once we know this, we can be confident that the deployed hardware will support the eventual load. This test is performed regularly, to catch any changes in user behaviour over time that might affect the proportions. &lt;/p&gt;
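&lt;p&gt;A sketch of this test in Python with SciPy, using illustrative event counts; the expected frequencies are simply the pre-upgrade proportions scaled to the sample total:&lt;/p&gt;

```python
from scipy import stats

# illustrative post-upgrade sample of 1,000 events across three event types
observed = [480, 310, 210]

# pre-upgrade proportions were 50% / 30% / 20%, scaled to the sample size
# (both arrays must sum to the same total)
expected = [p * sum(observed) for p in (0.5, 0.3, 0.2)]

stat, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print('chi2: %0.3f, p_value: %0.3f' % (stat, p_value))

if p_value > 0.05:
    print("Fail to reject - the event mix matches the sizing assumptions")
else:
    print("Reject - the event mix differs from what the hardware was sized for")
```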

&lt;p&gt;The purpose of this series of blog posts is to provide an introduction to hypothesis testing, and the types of problems to which it can be applied. At the end of this post, I will present a cheat sheet that will help you decide when to use which type of test. The following posts will go into more depth for each test, and provide a code sample for how to calculate it. &lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Hypothesis Testing&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Hypothesis testing is a statistical method that can be used to make decisions about a data set without having to examine every element in that dataset. For example, imagine you have a software system that processes billions of events per hour. Events are grouped into transactions of, say, hundreds of events. Your product owner has identified a candidate product feature that could provide real customer value but only if at least 80% of the transactions over the last 12 months contain events that match a given set of criteria (profile). &lt;/p&gt;

&lt;p&gt;Now we have a problem. It would take weeks to process 12 months of events. Why are we bothering to take a sample? Because we want to make a decision, and checking every element in the set might be too difficult (billions of events), or just impossible (testing food means destroying it).&lt;/p&gt;

&lt;p&gt;The issues then become:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;There is something we want to know about the entire population, but we can’t interrogate all of it. &lt;/li&gt;
&lt;li&gt;We sample the population and learn something about that sample, but since it’s only a sample, we can’t be sure that it is, in fact, representative of the entire population.&lt;/li&gt;
&lt;li&gt;Finally, what – if anything – can we guess about the population, given what we’ve learnt about the sample? &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This can all get very heavy, very quickly, so I’ll give a quick example of what a hypothesis test does. In this example we have a data set that’s so large we can’t process all of it to get an answer, so we have to sample it, and then check what conclusions we can deduce from this sample. &lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Suppose your software application is processing billions of transactions per hour.&lt;/li&gt;
&lt;li&gt;Your product owner has asked you to implement some new way to process these transactions, but it’s only a worthwhile feature to implement if &lt;strong&gt;at least 80%&lt;/strong&gt; of the transactions – over the whole of the last year – match a given profile. &lt;/li&gt;
&lt;li&gt;Now suppose that a check to see if a given transaction fits this profile was so expensive to calculate that it would take weeks to check all of them. &lt;/li&gt;
&lt;li&gt;So, instead, you sample just 1,000 transactions and find out that &lt;strong&gt;82% of the sampled transactions&lt;/strong&gt; have the required profile. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What can we say about all these billions of transactions, given what we have learnt about just this sample of 1,000? This is where the &lt;em&gt;null hypothesis&lt;/em&gt; and &lt;em&gt;alternative hypothesis&lt;/em&gt; come into play.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Null and Alternative Hypothesis&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A hypothesis test starts with making two hypotheses: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The null hypothesis – in general, this is a &lt;em&gt;“suppose there’s nothing to see here”&lt;/em&gt; case. &lt;/li&gt;
&lt;li&gt;The alternative hypothesis – this is what we’re checking for. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The test works by assuming the null hypothesis is true and then checking how likely it is that our sample would occur under that hypothesis. &lt;/p&gt;

&lt;p&gt;If it’s not likely enough, then we can suggest the alternative hypothesis is true. &lt;/p&gt;

&lt;p&gt;Before taking the sample a &lt;em&gt;significance&lt;/em&gt; level is selected. By convention this is 5% – but be advised, this is only a convention, and you must choose this with care. Later on, you will be making a judgement based on a derived probability by comparing it to this significance, so it’s important to consider the significance level before taking the sample. &lt;/p&gt;

&lt;p&gt;Technically, this makes this kind of hypothesis test a &lt;em&gt;significance&lt;/em&gt; test – we’re not proving anything. We are only deciding that, on the balance of probabilities, given how much risk we’re willing to take, we’re happy to accept that something is likely enough to be true. &lt;/p&gt;

&lt;p&gt;Does that sound vague? It should. There are reasons to be very careful about the kinds of assumptions you should be willing to make based on the results of these tests. In short, these tests aren’t about &lt;em&gt;certainty&lt;/em&gt;, they’re about &lt;em&gt;confidence&lt;/em&gt;. &lt;/p&gt;

&lt;p&gt;In our example, we would start by assuming this null hypothesis is true:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Exactly 80% of the transactions match the profile&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Or in more formal language: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;p(profile) = 0.8&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;What we want to do now is imagine the following. Note, we don’t actually have to &lt;em&gt;do&lt;/em&gt; the following; it’s just here to explain why this all works.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9AxOPFYR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://sonalake.com/wp-content/uploads/2019/12/hypothesis-testing-4.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9AxOPFYR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://sonalake.com/wp-content/uploads/2019/12/hypothesis-testing-4.jpg" alt="hypothesis-testing-4" width="880" height="486"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Imagine what would happen if we were to take lots of samples from a population where the proportion was exactly 80% &lt;/li&gt;
&lt;li&gt;Each sample we took would have a different proportion; but we’d expect most of them to be near enough to the “real” one of 80% &lt;/li&gt;
&lt;li&gt;If we count how many of each proportion we get, the result is a histogram where the “real” proportion has the highest bar. &lt;/li&gt;
&lt;li&gt;Eventually, if we were to take more and more samples, this would tend towards a normal curve, centred around 80%.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now we have a curve – for a fictional population that matches our null hypothesis – with which we can compare our sample.&lt;/p&gt;
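&lt;p&gt;This thought experiment is easy to simulate. A quick sketch with NumPy, re-using the sample size of 1,000 from the running example (the number of repeated samples is arbitrary):&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(42)

# take 10,000 samples of 1,000 transactions each from a population
# where exactly 80% of transactions match the profile
sample_proportions = rng.binomial(n=1000, p=0.8, size=10_000) / 1000

# the proportions cluster around the "real" 80%, in a roughly normal curve
# whose spread is sqrt(0.8 * 0.2 / 1000), i.e. about 0.0126
print('mean: %0.4f, std: %0.4f' % (sample_proportions.mean(),
                                   sample_proportions.std()))
```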

&lt;h2&gt;
  
  
  &lt;strong&gt;Sample and Compare to Null Hypothesis&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;So, how do we check our sample against this null hypothesis curve? First, we define our alternative hypothesis – i.e. this is the thing we’re trying to prove. For the kinds of tests we’re talking about here, this &lt;em&gt;must&lt;/em&gt; be related to the null hypothesis – i.e. it must compare the same terms, just with a different operator. &lt;/p&gt;

&lt;p&gt;In our example, because we have the null hypothesis:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Exactly&lt;/strong&gt; 80% of the transactions match the profile &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We would consider this as our alternative hypothesis:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;More than&lt;/strong&gt; 80% of the transactions match the profile &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Or in more formal language:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;p(profile) &amp;gt; 0.8&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Finally, we compare our sample proportion (in our example this was &lt;strong&gt;82%&lt;/strong&gt;) to the curve for the &lt;strong&gt;null hypothesis&lt;/strong&gt;, and we figure out how likely it is that this sample could have come from a population where the proportion was, in fact, exactly 80%. &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5nTj39rh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://sonalake.com/wp-content/uploads/2019/12/hypothesis-testing-5.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5nTj39rh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://sonalake.com/wp-content/uploads/2019/12/hypothesis-testing-5.jpg" alt="" width="880" height="624"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;In our example, since we’re checking how likely it is that our real population proportion is &lt;em&gt;&lt;strong&gt;greater&lt;/strong&gt;&lt;/em&gt; than 80% (our assumed null hypothesis population proportion), we are, in effect, comparing: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The area under this curve to the right of where our sample result is. &lt;/li&gt;
&lt;li&gt;To the total area under this curve. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This fraction is the probability of how likely it is that our sample came from a population that had a proportion that matched our null hypothesis.&lt;/p&gt;
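&lt;p&gt;For the running example, this fraction can be computed directly from the normal curve. A sketch, using the null-hypothesis proportion of 80% for the standard error and the 82%-of-1,000 sample:&lt;/p&gt;

```python
import math
from scipy.stats import norm

p0 = 0.80     # null hypothesis proportion
p_hat = 0.82  # observed sample proportion
n = 1000      # sample size

# spread of the sampling distribution under the null hypothesis
se = math.sqrt(p0 * (1 - p0) / n)

# how many standard errors the sample sits above the null proportion
z = (p_hat - p0) / se

# area under the curve to the right of the sample, as a fraction of the total
p_value = 1 - norm.cdf(z)
print('z: %0.3f, p_value: %0.3f' % (z, p_value))
```

This is the textbook calculation; in practice a library routine does the same arithmetic for you.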

&lt;h2&gt;
  
  
  Drawing conclusions about the sample
&lt;/h2&gt;

&lt;p&gt;All of the tests that follow derive a result called a p-value. These values are often misunderstood. This misunderstanding can lead the tester to make certain assumptions about the underlying population that cannot be justified. &lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;p-value&lt;/strong&gt; is the probability that the &lt;strong&gt;sample&lt;/strong&gt; result &lt;em&gt;could&lt;/em&gt; have occurred if the null hypothesis were true.&lt;/p&gt;

&lt;p&gt;So, a p-value has no meaning outside of the given sample, cannot be related to any other sample or p-value, and doesn’t give an indication of how &lt;strong&gt;accurate&lt;/strong&gt; the sample value is. So, in our example, had we calculated a p-value of &lt;strong&gt;4%&lt;/strong&gt;, the following significance levels would have caused us to draw the following conclusions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;

&lt;thead&gt;

&lt;tr&gt;

&lt;th&gt;Significance&lt;/th&gt;

&lt;th&gt;Conclusions&lt;/th&gt;

&lt;/tr&gt;

&lt;/thead&gt;

&lt;tbody&gt;

&lt;tr&gt;

&lt;td&gt;&lt;strong&gt;5%&lt;/strong&gt;&lt;/td&gt;

&lt;td&gt;
&lt;ul&gt;
&lt;li&gt;The p-value of 4% is less than the significance of 5%.&lt;/li&gt;
&lt;li&gt;So, the probability of this sample coming from a population with the values assumed by the null hypothesis is &lt;strong&gt;not significant&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;So, we &lt;strong&gt;can&lt;/strong&gt; reject the null hypothesis, which suggests the alternative hypothesis.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; this doesn’t prove the alternative hypothesis; only that we can feel a degree of confidence that more than 80% of the transactions match our profile.&lt;/li&gt;
&lt;li&gt;We &lt;strong&gt;cannot&lt;/strong&gt; say &lt;em&gt;anything&lt;/em&gt; else about the actual value of the proportion of the underlying population – i.e. we can’t say that it’s likely to be 82%, or even close to 82%.&lt;/li&gt;
&lt;/ul&gt;
&lt;/td&gt;

&lt;/tr&gt;

&lt;tr&gt;

&lt;td&gt;&lt;strong&gt;1%&lt;/strong&gt;&lt;/td&gt;

&lt;td&gt;
&lt;ul&gt;
&lt;li&gt;The p-value of 4% is greater than (or equal to) the significance of 1%.&lt;/li&gt;
&lt;li&gt;So, the probability of this sample coming from a population with the values assumed by the null hypothesis is &lt;strong&gt;significant&lt;/strong&gt;.&lt;/li&gt; 
&lt;li&gt;We &lt;strong&gt;cannot&lt;/strong&gt; reject the null hypothesis, i.e. we can’t feel confident that the sample came from a population different from the one assumed by the null hypothesis.&lt;/li&gt;
&lt;li&gt;We &lt;strong&gt;cannot&lt;/strong&gt; say &lt;em&gt;anything&lt;/em&gt; else about the actual value of the proportion of the underlying population – i.e. we can’t say that it’s likely to be less than 80%.&lt;/li&gt;
&lt;/ul&gt;
&lt;/td&gt;

&lt;/tr&gt;

&lt;/tbody&gt;

&lt;/table&gt;&lt;/div&gt;
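&lt;p&gt;The two rows above boil down to a simple comparison, which can be sketched as:&lt;/p&gt;

```python
p_value = 0.04  # the example p-value from the table above

# the same p-value leads to opposite conclusions at different significances
for significance in (0.05, 0.01):
    if p_value < significance:
        verdict = "reject the null hypothesis (suggests the alternative)"
    else:
        verdict = "fail to reject the null hypothesis"
    print('significance %.0f%%: %s' % (significance * 100, verdict))
```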

&lt;h2&gt;
  
  
  &lt;strong&gt;What Next?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The above example is for a test comparing &lt;em&gt;proportions&lt;/em&gt;, but a different test would be required depending on what it was that you were comparing. The figure below offers a guide as to which test to apply depending on the nature of the data, and the observations you’re looking to make.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--sBrmv9ut--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://sonalake.com/wp-content/uploads/2019/12/hypothesis-testing.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--sBrmv9ut--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://sonalake.com/wp-content/uploads/2019/12/hypothesis-testing.png" alt="hypothesis-testing" width="880" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The rest of this series of blog posts will explain – with examples – when each of these different test types is applicable and will include sample code for each of them.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;PART I: &lt;a href="https://dev.to/sonalake/an-introduction-to-hypothesis-testing-41ne"&gt;An Introduction to Hypothesis Testing&lt;/a&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;PART II: &lt;a href="https://dev.to/sonalake/part-2-hypothesis-testing-of-proportion-based-samples-1ik2"&gt;Hypothesis Testing of proportion-based samples&lt;/a&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;PART III: &lt;a href="https://dev.to/sonalake/part-3-hypothesis-testing-of-mean-based-samples-5c16"&gt;Hypothesis Testing of mean-based samples&lt;/a&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;PART IV: &lt;a href="https://dev.to/sonalake/part-4-hypothesis-testing-of-frequency-based-samples-48oi"&gt;Hypothesis Testing of frequency-based samples&lt;/a&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>datascience</category>
      <category>python</category>
    </item>
    <item>
      <title>The Code Speaks for Itself – Generating API docs for Spring Applications</title>
      <dc:creator>Daniel Bray</dc:creator>
      <pubDate>Fri, 25 Sep 2020 11:18:09 +0000</pubDate>
      <link>https://dev.to/sonalake/the-code-speaks-for-itself-generating-api-docs-for-spring-applications-432d</link>
      <guid>https://dev.to/sonalake/the-code-speaks-for-itself-generating-api-docs-for-spring-applications-432d</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;Introduction&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Programming is mostly about communication, and one of the most time-consuming parts of this aspect of development is the communication of how service APIs function. If this is done poorly, then the documents can get out of date, or be so vague that the developers will spend too much time answering questions about how their API works.&lt;/p&gt;

&lt;p&gt;This post outlines a process we at Sonalake use to automate the creation of REST API documentation. It’s done in such a way that it won’t require too much in the way of manual effort once it’s started, because most of the documentation detail will come from work you’re already doing to test the service. We have provided a working example of this in the &lt;a href="https://github.com/sonalake/sonalake-autodoc-example"&gt;&lt;strong&gt;sonalake-autodoc-example&lt;/strong&gt;&lt;/a&gt; project.&lt;/p&gt;

&lt;p&gt;What drove the creation of this process was the aim to provide a good developer experience (DX) to our own developers, and our clients and partners, by delivering good documentation that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Describes what, specifically, is in the API&lt;/li&gt;
&lt;li&gt; Provides examples of how to use the API&lt;/li&gt;
&lt;li&gt; Contains a changelog for how the API has evolved between versions&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Tools like &lt;a href="https://swagger.io"&gt;&lt;strong&gt;Swagger&lt;/strong&gt;&lt;/a&gt; do a great job on automating documentation for point 1, but when it comes to points 2 and 3, these types of documentation are generally written manually (or more likely, not written at all).&lt;/p&gt;

&lt;p&gt;We have found that generating documentation from the source allows significant portions of the API documentation to be produced automatically. By generating documented examples from unit tests, we can ensure that these examples always align with the reality of the application.&lt;/p&gt;

&lt;p&gt;It also allows developers to keep documentation up to date without having to leave the development environment, and lets documents be released and published the same way as any other development artifact.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How do we do this?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Some parts of the documentation are written manually by the developers in &lt;a href="https://asciidoc.org/"&gt;&lt;strong&gt;AsciiDoc&lt;/strong&gt;&lt;/a&gt;. These parts of the documentation are not expected to change much between releases, and are limited to things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Introducing what the API is&lt;/li&gt;
&lt;li&gt;  Describing how to authenticate&lt;/li&gt;
&lt;li&gt;  Outlining a generic set of use case steps, without any actual code samples (the code samples will be auto-generated during the build, using the data passed to unit tests).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The rest of the process will generate the following sections, also in &lt;a href="https://asciidoc.org/"&gt;&lt;strong&gt;AsciiDoc&lt;/strong&gt;&lt;/a&gt; format.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://swagger.io"&gt;&lt;strong&gt;Swagger&lt;/strong&gt;&lt;/a&gt; documentation concerning the paths and entities&lt;/li&gt;
&lt;li&gt;  Code samples for the use case steps, generated from the unit tests&lt;/li&gt;
&lt;li&gt;  Changelog history of differences between published versions of the swagger.json&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Finally, the AsciiDoc files are collated and published in a single PDF file using &lt;a href="https://asciidoctor.org/docs/asciidoctor-pdf/"&gt;&lt;strong&gt;Asciidoctor PDF&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;At a high level, the main steps are as follows:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;Comment&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;

&lt;td&gt; &lt;strong&gt;Define Theme&lt;/strong&gt; &lt;/td&gt;

&lt;td&gt;The theme in the above project is a simple, clean layout, suitable for rendering most documents, and contains the standard document tracking elements such as document versions.

This uses the standard &lt;a href="https://github.com/asciidoctor/asciidoctor-pdf/blob/v1.5.0.beta.7/docs/theming-guide.adoc" rel="noopener noreferrer"&gt;&lt;strong&gt;AsciiDoctor-PDF&lt;/strong&gt;&lt;/a&gt; theme configurations.

&lt;/td&gt;

&lt;/tr&gt;

&lt;tr&gt;

&lt;td&gt;&lt;strong&gt;Generate Example Code Snippets&lt;/strong&gt;&lt;/td&gt;

&lt;td&gt;Use &lt;a href="https://github.com/spring-projects/spring-restdocs" rel="noopener noreferrer"&gt;&lt;strong&gt;spring-restdocs&lt;/strong&gt;&lt;/a&gt; to document the inputs/outputs for REST queries by writing unit tests that exercise the APIs. We’ll embed these snippets in the final documentation later on.&lt;/td&gt;

&lt;/tr&gt;

&lt;tr&gt;

&lt;td&gt;&lt;strong&gt;Generate swagger.json&lt;/strong&gt;&lt;/td&gt;

&lt;td&gt;Use a SpringBootTest to spin up the app in-memory and pull down the swagger.json to a local directory.  
You can use the test from the previous step to do this.&lt;/td&gt;

&lt;/tr&gt;

&lt;tr&gt;

&lt;td&gt;&lt;strong&gt;Generate Changelog&lt;/strong&gt;&lt;/td&gt;

&lt;td&gt;Use Sonalake’s &lt;a href="https://plugins.gradle.org/plugin/com.sonalake.swagger-changelog" rel="noopener noreferrer"&gt;&lt;strong&gt;swagger-changelog&lt;/strong&gt;&lt;/a&gt; plugin to parse any previously published API specs, compare them to the current dev version, and produce a changelog in AsciiDoc format.&lt;/td&gt;

&lt;/tr&gt;

&lt;tr&gt;

&lt;td&gt;&lt;strong&gt;Author Hand-written Content&lt;/strong&gt;&lt;/td&gt;

&lt;td&gt;A document containing:

*   Hand-written content that won’t change too often. For example, an introduction.
*   A code examples document of simple text, referencing the generated snippets.
*   A single framing document that links to both the hand-written and generated content.

&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We have developed a sample project to showcase all of these steps: sonalake-autodoc-example. This is a very simple Spring Boot application with a trivial REST API exposing two GET methods. The rest of the project is solely dedicated to automating the documentation. Let’s walk through it.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Define Theme&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The main tool for the AsciiDoctor-to-PDF generation is &lt;a href="https://github.com/asciidoctor/asciidoctor-pdf/blob/v1.5.0.beta.7/docs/theming-guide.adoc"&gt;&lt;strong&gt;AsciiDoctor-PDF&lt;/strong&gt;&lt;/a&gt; and it comes with a full set of theming options. The &lt;a href="https://github.com/sonalake/sonalake-autodoc-example/blob/develop/src/docs/asciidoc/theme/simple-theme.yml"&gt;&lt;strong&gt;simple-theme.yml&lt;/strong&gt;&lt;/a&gt; sample provides a simple, clean, professional layout that you can probably re-use by just changing the logo image.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Generate Example Code Snippets&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This part of the pipeline generates snippets in AsciiDoc format from unit tests. The output contains examples of REST calls, with request bodies and responses that will always be accurate for the current version of the code base.&lt;/p&gt;

&lt;p&gt;In the sample project this all happens in &lt;a href="https://github.com/sonalake/sonalake-autodoc-example/blob/develop/src/test/java/com/sonalake/autodoc/api/BaseWebTest.java"&gt;&lt;strong&gt;BaseWebTest&lt;/strong&gt;&lt;/a&gt;. It takes advantage of &lt;a href="https://github.com/spring-projects/spring-restdocs"&gt;&lt;strong&gt;spring-restdocs&lt;/strong&gt;&lt;/a&gt; and acts as a base class for all other web-based unit tests.&lt;/p&gt;

&lt;p&gt;API calls would be tested in the normal way:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;mockMvc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;perform&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/api/endpoint-a"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;contentType&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;MediaType&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;APPLICATION_JSON&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;accept&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;MediaType&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;APPLICATION_JSON&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;characterEncoding&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;StandardCharsets&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;UTF_8&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;andExpect&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;isOk&lt;/span&gt;&lt;span class="o"&gt;()).&lt;/span&gt;&lt;span class="na"&gt;andReturn&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A unit test of the form above generates the following snippets under&lt;br&gt;&lt;br&gt;
&lt;code&gt;build/generated-snippets/${test-class-name}/${test-method-name}&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;curl-request.adoc&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;http-request.adoc&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;http-response.adoc&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;httpie-request.adoc&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;request-body.adoc&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;response-body.adoc&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
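The directory names above are kebab-cased versions of the test class and method names. As an illustration of that naming convention (this is a sketch of the idea, not spring-restdocs’ actual code), the conversion looks roughly like this:

```java
public class SnippetPaths {
    // Convert a Java identifier like "EndpointATest" to the kebab-case
    // form used in the snippet directory names, e.g. "endpoint-a-test".
    public static String toKebab(String name) {
        return name
            .replaceAll("([A-Z]+)([A-Z][a-z])", "$1-$2") // split an acronym from the next word
            .replaceAll("([a-z0-9])([A-Z])", "$1-$2")    // split lowercase from uppercase
            .toLowerCase();
    }

    public static void main(String[] args) {
        // A test method EndpointATest#testGetValue would produce snippets under:
        System.out.println("build/generated-snippets/"
            + toKebab("EndpointATest") + "/" + toKebab("testGetValue"));
    }
}
```

So a single test method yields a predictable, stable directory that the hand-written documents can include from.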

&lt;p&gt;For example, an &lt;code&gt;http-request&lt;/code&gt; snippet for a POST might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"nowrap"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
&lt;span class="o"&gt;----&lt;/span&gt;
&lt;span class="no"&gt;POST&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="no"&gt;HTTP&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mf"&gt;1.1&lt;/span&gt;
&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nl"&gt;Type:&lt;/span&gt; &lt;span class="n"&gt;application&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;&lt;span class="n"&gt;charset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="no"&gt;UTF&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;
&lt;span class="nl"&gt;Accept:&lt;/span&gt; &lt;span class="n"&gt;application&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nl"&gt;Length:&lt;/span&gt; &lt;span class="mi"&gt;52&lt;/span&gt;
&lt;span class="nl"&gt;Host:&lt;/span&gt; &lt;span class="n"&gt;autodoc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sonalake&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;com&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="s"&gt;"fieldA"&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"sample A"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
  &lt;span class="s"&gt;"fieldB"&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"sample B"&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;----&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These files can be referenced in your examples documents, with the result that examples will always be up-to-date.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Generate swagger.json&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This part of the pipeline generates an up-to-date view of the REST paths and entities in AsciiDoc format: first by generating a swagger.json, and then by translating this into AsciiDoc.&lt;/p&gt;

&lt;p&gt;The sample project contains a single test, &lt;a href="https://github.com/sonalake/sonalake-autodoc-example/blob/develop/src/test/java/com/sonalake/autodoc/GenerateDocumentationTest.java"&gt;&lt;strong&gt;GenerateDocumentationTest.java&lt;/strong&gt;&lt;/a&gt;, that starts up the application as &lt;code&gt;@SpringBootTest&lt;/code&gt; in the &lt;code&gt;test&lt;/code&gt; profile, and pulls down the &lt;code&gt;swagger.json&lt;/code&gt; generated by the &lt;a href="https://github.com/sonalake/sonalake-autodoc-example/blob/develop/src/main/java/com/sonalake/autodoc/config/SwaggerConfig.java"&gt;&lt;strong&gt;SwaggerConfig.java&lt;/strong&gt;&lt;/a&gt;. It then runs swagger.json through the &lt;a href="https://github.com/Swagger2Markup/swagger2markup-gradle-plugin"&gt;&lt;strong&gt;swagger2markup-gradle-plugin&lt;/strong&gt;&lt;/a&gt; to convert it to AsciiDoc format.&lt;/p&gt;
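Stripped of the Spring wiring, the heart of that test is simply: fetch the JSON the running application serves, and write it where the build expects it. A minimal sketch of that step (the JSON content here is a stand-in for the real response body):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SaveSwagger {
    // Write the fetched spec to <outputDir>/swagger.json, creating the
    // directory if needed, and return the path that was written.
    public static Path save(String swaggerJson, Path outputDir) {
        try {
            Files.createDirectories(outputDir);
            Path target = outputDir.resolve("swagger.json");
            Files.writeString(target, swaggerJson);
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // In the real test this string comes from a GET against the
        // swagger endpoint exposed by SwaggerConfig; this is a placeholder.
        String swaggerJson = "{\"swagger\": \"2.0\", \"paths\": {}}";
        System.out.println("Wrote " + save(swaggerJson, Path.of("build", "swagger")));
    }
}
```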

&lt;p&gt;This produces the following sets of files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Overview.adoc&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Contains some metadata from application.yml, such as title text and version information, for inclusion on the main page.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  security.adoc&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A simple page describing how authentication, for example via HTTP headers, should be configured.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  paths.adoc&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A list of all the REST calls the application will accept, and the responses it will return.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  definitions.adoc&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A list of all the entities the application will accept and respond with.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Make Documentation Easier to Follow&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://swagger.io/docs/specification/grouping-operations-with-tags/"&gt;&lt;strong&gt;Tags&lt;/strong&gt;&lt;/a&gt; are an optional, but useful, tool for collecting related endpoints together, even when they are implemented in different classes. By default, Swagger will name the resources after their controller classes, but tags allow you to give them a different name.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Api&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"Section A"&lt;/span&gt;&lt;span class="o"&gt;})&lt;/span&gt;
&lt;span class="nd"&gt;@Description&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Some operations in section A"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ControllerA1&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  &lt;strong&gt;Generate Changelog&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The last part of the automated process creates a changelog. It assumes that previously released versions of the Swagger spec are published to Nexus. All of this configuration is contained in &lt;a href="https://github.com/sonalake/sonalake-autodoc-example/blob/develop/build.gradle"&gt;&lt;strong&gt;build.gradle&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Publish Swagger Spec as a Nexus Artifact using the maven-publish and maven-publish-auth plugins.&lt;/li&gt;
&lt;li&gt; Generate changelog from Nexus history using the Sonalake swagger-changelog Gradle plugin.
The plugin will retrieve any previously published RELEASE versions of the Swagger spec, and will produce the following:

&lt;ul&gt;
&lt;li&gt;  A file of the form &lt;code&gt;change-log-0.0.1-0.0.2-SNAPSHOT.adoc&lt;/code&gt; for each version&lt;/li&gt;
&lt;li&gt;  An index file, &lt;code&gt;change-log.adoc&lt;/code&gt;, listing all versions&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
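Conceptually, each changelog file is just a diff between two versions of the spec: what one version declares that the other doesn’t. A toy illustration of the idea, comparing only path names (the real plugin compares far more than this):

```java
import java.util.Set;
import java.util.TreeSet;

public class PathDiff {
    // Report which REST paths were added and removed between two spec versions.
    public static String describe(Set<String> oldPaths, Set<String> newPaths) {
        Set<String> added = new TreeSet<>(newPaths);
        added.removeAll(oldPaths);
        Set<String> removed = new TreeSet<>(oldPaths);
        removed.removeAll(newPaths);
        return "added=" + added + " removed=" + removed;
    }

    public static void main(String[] args) {
        System.out.println(describe(
            Set.of("/api/endpoint-a"),
            Set.of("/api/endpoint-a", "/api/endpoint-b")));
        // added=[/api/endpoint-b] removed=[]
    }
}
```

Because each released spec is an immutable Nexus artifact, the same diff can be regenerated for any pair of versions at any time.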

&lt;h2&gt;
  
  
  &lt;strong&gt;Author Hand-written Content&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Writing the following documents will round out the process.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  introduction.adoc – a simple one- or two-paragraph description of what the application is for&lt;/li&gt;
&lt;li&gt;  security.adoc – a quick description of how to authenticate, and what roles, if any, exist in the application. Note that, if you want to, you can easily write other tests that print out a list of such roles in AsciiDoc format, and include the result in this file.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A special case of a hand-written document is the &lt;a href="https://github.com/sonalake/sonalake-autodoc-example/blob/develop/src/docs/asciidoc/examples.adoc"&gt;&lt;strong&gt;examples.adoc&lt;/strong&gt;&lt;/a&gt;, where a high-level description of the overall flow of REST calls is written. For example: to on-board a new user, you need to call X, then Y, and then Z. However, this document would not include any actual REST calls or parameters. Rather, it would refer to the results of the unit tests that you have written to test these endpoints.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="o"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;examplesscheme&lt;/span&gt;&lt;span class="o"&gt;]]&lt;/span&gt;
&lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="nc"&gt;Examples&lt;/span&gt;
&lt;span class="nc"&gt;What&lt;/span&gt; &lt;span class="n"&gt;follows&lt;/span&gt; &lt;span class="n"&gt;are&lt;/span&gt; &lt;span class="n"&gt;some&lt;/span&gt; &lt;span class="n"&gt;examples&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="no"&gt;API&lt;/span&gt; &lt;span class="n"&gt;usage&lt;/span&gt;
&lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nc"&gt;Endpoint&lt;/span&gt; &lt;span class="no"&gt;A&lt;/span&gt;
&lt;span class="no"&gt;A&lt;/span&gt; &lt;span class="n"&gt;get&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;
&lt;span class="nl"&gt;include:&lt;/span&gt;&lt;span class="o"&gt;:{&lt;/span&gt;&lt;span class="n"&gt;snippets&lt;/span&gt;&lt;span class="o"&gt;}/&lt;/span&gt;&lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;adoc&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt;
&lt;span class="nc"&gt;Returns&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;
&lt;span class="nl"&gt;include:&lt;/span&gt;&lt;span class="o"&gt;:{&lt;/span&gt;&lt;span class="n"&gt;snippets&lt;/span&gt;&lt;span class="o"&gt;}/&lt;/span&gt;&lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;adoc&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since the overall flow of your application isn’t likely to change – even if the URLs and request/responses change – this document will remain relatively unchanged over time. The only thing you are likely to have to update are your unit tests, but you’d be doing that anyway. Right?&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Tying it All Together&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;All of this is tied together in &lt;a href="https://github.com/sonalake/sonalake-autodoc-example/blob/develop/build.gradle"&gt;&lt;strong&gt;build.gradle&lt;/strong&gt;&lt;/a&gt;. First, it dictates where in the build directory the files are written.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;ext&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="n"&gt;asciiDocOutputDir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"${buildDir}/asciidoc/generated"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="n"&gt;swaggerOutputDir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"${buildDir}/swagger"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="n"&gt;snippetsOutputDir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"${buildDir}/generated-snippets"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The following tells Gradle to pass system properties down to the test tool, so that the documentation-generating test knows the current document version.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;test&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="n"&gt;systemProperties&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;properties&lt;/span&gt;
 &lt;span class="n"&gt;systemProperty&lt;/span&gt; &lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="n"&gt;sg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;api&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;version&lt;/span&gt;

 &lt;span class="nf"&gt;useJUnitPlatform&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
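On the test side, the version then arrives as an ordinary system property. A trivial sketch of how the documentation test can read it (the property name matches the build.gradle above; the fallback value is our own choice for runs outside the build):

```java
public class ApiVersion {
    // Read the version pushed down by Gradle; fall back to a marker
    // value when the test is run outside the build.
    public static String get() {
        return System.getProperty("sg.api.version", "unversioned");
    }

    public static void main(String[] args) {
        System.out.println("Documenting API version " + ApiVersion.get());
    }
}
```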



&lt;p&gt;Then use &lt;a href="http://swagger2markup.github.io/swagger2markup/1.3.1/"&gt;&lt;strong&gt;swagger2markup&lt;/strong&gt;&lt;/a&gt; to convert the swagger.json into AsciiDoc format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;convertSwagger2markup&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="n"&gt;dependsOn&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;
 &lt;span class="n"&gt;swaggerInput&lt;/span&gt; &lt;span class="s"&gt;"${swaggerOutputDir}/swagger.json"&lt;/span&gt;
 &lt;span class="n"&gt;outputDir&lt;/span&gt; &lt;span class="n"&gt;asciiDocOutputDir&lt;/span&gt;
 &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;
   &lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="n"&gt;swagger2markup&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;pathsGroupedBy&lt;/span&gt;&lt;span class="err"&gt;'&lt;/span&gt;                          &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="no"&gt;TAGS&lt;/span&gt;&lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
   &lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="n"&gt;swagger2markup&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;extensions&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;springRestDocs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;snippetBaseUri&lt;/span&gt;&lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;snippetsOutputDir&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getAbsolutePath&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;]&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, the following tells the swagger-changelog plugin where to pull version information from, and where to write the diff files.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;swaggerChangeLog&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="n"&gt;groupId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"${rootProject.group}"&lt;/span&gt;
 &lt;span class="n"&gt;artifactId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"${rootProject.name}-API"&lt;/span&gt;

 &lt;span class="c1"&gt;// where to find the nexus repo&lt;/span&gt;
 &lt;span class="n"&gt;nexusHome&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="nl"&gt;http:&lt;/span&gt;&lt;span class="c1"&gt;//atlanta.sonalake.corp:8081/nexus'&lt;/span&gt;

 &lt;span class="c1"&gt;// where to store the changelog&lt;/span&gt;
 &lt;span class="n"&gt;targetdir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"${buildDir}/asciidoc/generated/changelog"&lt;/span&gt;

 &lt;span class="c1"&gt;// if we’re building a snapshot version, then include it as the&lt;/span&gt;
 &lt;span class="c1"&gt;// end of the changelog&lt;/span&gt;
 &lt;span class="n"&gt;snapshotVersionFile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"${buildDir}/swagger/swagger.json"&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, this is where the &lt;a href="https://asciidoctor.org/docs/asciidoctor-pdf/"&gt;&lt;strong&gt;AsciiDoctor-PDF&lt;/strong&gt;&lt;/a&gt; Gradle plugin takes all the AsciiDoc files we have created, and converts them into a PDF.&lt;/p&gt;

&lt;p&gt;Note that, with the &lt;code&gt;baseDirFollowsSourceDir&lt;/code&gt; setting, all paths are relative to the main index file. This means that references within the AsciiDoc file structure don’t need to worry about where they sit on the file system.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// create a PDF from the asciidoc&lt;/span&gt;
&lt;span class="n"&gt;asciidoctorPdf&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="n"&gt;dependsOn&lt;/span&gt; &lt;span class="n"&gt;convertSwagger2markup&lt;/span&gt;
 &lt;span class="n"&gt;dependsOn&lt;/span&gt; &lt;span class="n"&gt;generateChangeLog&lt;/span&gt;

 &lt;span class="nf"&gt;baseDirFollowsSourceDir&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;

 &lt;span class="n"&gt;sources&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
   &lt;span class="n"&gt;include&lt;/span&gt; &lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;guide&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;adoc&lt;/span&gt;&lt;span class="err"&gt;'&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;
 &lt;span class="n"&gt;attributes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;
   &lt;span class="n"&gt;doctype&lt;/span&gt;        &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="n"&gt;book&lt;/span&gt;&lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;toc&lt;/span&gt;            &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="n"&gt;left&lt;/span&gt;&lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;toclevels&lt;/span&gt;      &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="sc"&gt;'3'&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;numbered&lt;/span&gt;       &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="err"&gt;''&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;sectlinks&lt;/span&gt;      &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="err"&gt;''&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;sectanchors&lt;/span&gt;    &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="err"&gt;''&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;hardbreaks&lt;/span&gt;     &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="err"&gt;''&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;generated&lt;/span&gt;      &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="o"&gt;../../../&lt;/span&gt;&lt;span class="n"&gt;build&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;asciidoc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;generated&lt;/span&gt;&lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;resources&lt;/span&gt;      &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="o"&gt;../../../&lt;/span&gt;&lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;resources&lt;/span&gt;&lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;snippets&lt;/span&gt;       &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="o"&gt;../../../&lt;/span&gt;&lt;span class="n"&gt;build&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;generated&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;snippets&lt;/span&gt;&lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;changes&lt;/span&gt;        &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="o"&gt;../../../&lt;/span&gt;&lt;span class="n"&gt;build&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;asciidoc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;generated&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;changelog&lt;/span&gt;&lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;imagesdir&lt;/span&gt;      &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="n"&gt;theme&lt;/span&gt;&lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
   &lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="n"&gt;pdf&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;stylesdir&lt;/span&gt;&lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="n"&gt;theme&lt;/span&gt;&lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
   &lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="n"&gt;pdf&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;style&lt;/span&gt;&lt;span class="err"&gt;'&lt;/span&gt;    &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="n"&gt;simple&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;theme&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;yml&lt;/span&gt;&lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;revnumber&lt;/span&gt;      &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;version&lt;/span&gt;
 &lt;span class="o"&gt;]&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
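The attributes at the bottom of that block are how the hand-written documents reach the generated content: each one is, in effect, a path variable that the include:: directives can reference. A simplified sketch of the substitution (real AsciiDoc attribute resolution handles considerably more than this):

```java
import java.util.Map;

public class Attributes {
    // Resolve {name} references the way attributes are used in
    // include:: directives; a simplified stand-in for AsciiDoc's
    // real attribute resolution.
    public static String resolve(String text, Map<String, String> attrs) {
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            text = text.replace("{" + e.getKey() + "}", e.getValue());
        }
        return text;
    }

    public static void main(String[] args) {
        Map<String, String> attrs =
            Map.of("snippets", "../../../build/generated-snippets");
        System.out.println(resolve(
            "include::{snippets}/endpoint-a-test/test-get-value/http-request.adoc[]",
            attrs));
    }
}
```

This is why the hand-written documents never hard-code build paths: moving the generated output only requires changing the attribute values in build.gradle.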



&lt;p&gt;That’s it. You can drop the code from the sample project into any Spring Boot project in about an hour, and produce clean, professional documents. We hope you find it as useful as we do!&lt;/p&gt;

</description>
      <category>java</category>
      <category>api</category>
      <category>rest</category>
    </item>
  </channel>
</rss>
