<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: r2c</title>
    <description>The latest articles on DEV Community by r2c (@r2c).</description>
    <link>https://dev.to/r2c</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F1847%2F848298ae-97cf-4c28-b4cf-c524dbe68c26.jpg</url>
      <title>DEV Community: r2c</title>
      <link>https://dev.to/r2c</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/r2c"/>
    <language>en</language>
    <item>
      <title>Introducing Semgrep and r2c</title>
      <dc:creator>Pablo Estrada</dc:creator>
      <pubDate>Thu, 29 Oct 2020 17:19:32 +0000</pubDate>
      <link>https://dev.to/r2c/introducing-semgrep-and-r2c-mho</link>
      <guid>https://dev.to/r2c/introducing-semgrep-and-r2c-mho</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--u0W0AUVi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/qrx9ly4jrhciyimmn0h0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--u0W0AUVi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/qrx9ly4jrhciyimmn0h0.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This post is by &lt;a href="https://twitter.com/0xine"&gt;Isaac Evans&lt;/a&gt;, CEO and co-founder of &lt;a href="https://r2c.dev"&gt;r2c&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Free, fast, &lt;a href="https://github.com/returntocorp/semgrep"&gt;open-source&lt;/a&gt;, offline, customizable. These are not often words that describe code scanning tools, and that's a shame.&lt;/p&gt;

&lt;p&gt;We founded r2c to bring world-class security tools to developers based on our conviction that software will run the most exciting parts of the future: everything from medical equipment to robots to autonomous cars. The security process should not be the foe but rather the enabler of rapid software development. If developers lack tooling that is easy to set up and understand—or if a developer has to convince their manager to spend a few million dollars on advanced security tools each time they change jobs, the future is bleak.&lt;/p&gt;

&lt;p&gt;Before founding r2c, we worked on security and developer tools for large companies and governments. It was eye-opening to see that despite massive budgets, their security programs were generally a generation or more behind the tech giants. When it came to security tools for developers, most teams were jaded about scanning code for vulnerabilities; they hated the tools they had to use and usually ignored them beyond doing the minimum necessary to satisfy a compliance checkbox.&lt;/p&gt;

&lt;p&gt;What about code scanning at places like Facebook, Apple, Amazon, Netflix, and Google? They don't generally use traditional commercial security tools which ask "how can we find every bug?" Instead, they focus on custom tooling that can build guardrails for developers. This doesn't require million-dollar tools, PhDs in program analysis, or days of compute time. It looks much more like unit tests for security.&lt;/p&gt;

&lt;p&gt;We believe there is a gap between traditional compliance tools and simple linters that's ripe for a new approach, and we were fortunate to find partners from Redpoint Ventures and Sequoia Capital who agreed. With them, we raised a $13M Series A round of funding to build a security tool that developers might actually love. We've been working on it quietly for a while now, and we're finally ready to announce it to the world!&lt;/p&gt;

&lt;h2&gt;
  
  
  Semgrep
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://semgrep.dev/"&gt;Semgrep&lt;/a&gt;, our open-source product, is specifically designed for eradicating bug classes.&lt;br&gt;
Developers and security engineers can say "this is the safe pattern we always use (e.g., for parsing XML)", write a rule in a few minutes, and enforce it on every editor save, commit, and pull request.&lt;/p&gt;

&lt;p&gt;Semgrep is ideal for building security guardrails: start by using frameworks designed with security in mind, then automatically flag code that strays from the &lt;a href="https://semgrep.dev/explore"&gt;secure-by-default path&lt;/a&gt;. This is an approach used by &lt;a href="https://landing.google.com/sre/resources/foundationsandprinciples/srs-book/"&gt;Google&lt;/a&gt;, &lt;a href="https://about.fb.com/news/2019/01/designing-security-for-billions/"&gt;Facebook&lt;/a&gt;, &lt;a href="https://homes.cs.washington.edu/~mernst/pubs/continuous-compliance-ase2020.pdf"&gt;Amazon&lt;/a&gt;, Dropbox, Stripe, &lt;a href="https://medium.com/@NetflixTechBlog/scaling-appsec-at-netflix-6a13d7ab6043"&gt;Netflix&lt;/a&gt;, and others—a topic &lt;a href="https://events.bizzabo.com/OWASPGlobalAppSec/agenda/session/315858"&gt;Clint Gibler and I presented on at Global AppSec 2020&lt;/a&gt;. This approach increases developer productivity, reduces attack surface, minimizes the areas for human inspection and audit, and allows the security team to scalably protect code written by thousands of developers.&lt;/p&gt;

&lt;p&gt;The idea behind Semgrep is simple: it feels like a regular search (grep) but is syntax-aware. You can &lt;a href="https://semgrep.dev/learn"&gt;learn Semgrep&lt;/a&gt; in a few minutes! And Semgrep can be used for &lt;a href="https://semgrep.dev/explore"&gt;more than just security&lt;/a&gt; issues: performance, internationalization, or just annoyances &lt;a href="https://r2c.dev/blog/2020/fixing-leaky-logs-how-to-find-a-bug-and-ensure-it-never-returns"&gt;committed by accident&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--cnWanLG9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://r2c.dev/static/730ca8541e43ed871d2edb7c02226b0c/bde6a/semgrep-foo-small.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--cnWanLG9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://r2c.dev/static/730ca8541e43ed871d2edb7c02226b0c/bde6a/semgrep-foo-small.png" alt="Semgrep pattern example"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;$ semgrep -e 'foo(1)'&lt;/code&gt; matches all equivalent variations. &lt;a href="https://semgrep.dev/s/ievans:python-exec"&gt;See a live example of matching &lt;em&gt;exec&lt;/em&gt; calls&lt;/a&gt;.&lt;/p&gt;
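&lt;p&gt;A rough illustration of why a syntax-aware search can treat differently formatted code as the same: once code is parsed, spacing, line breaks, and comments disappear. The Python sketch below uses only the standard &lt;code&gt;ast&lt;/code&gt; module (not Semgrep) to show that three layouts of a call to a hypothetical &lt;code&gt;foo&lt;/code&gt; produce an identical syntax tree. It says nothing about the other normalizations Semgrep itself may perform.&lt;/p&gt;

```python
import ast

# Three spellings of the same call: extra spaces, a line break, and a comment.
variants = [
    "foo(1)",
    "foo( 1 )",
    "foo(\n    1  # the argument\n)",
]


def normalized(src: str) -> str:
    """Return a layout-independent dump of the expression's syntax tree."""
    # ast.dump omits line/column attributes by default, so only structure remains.
    return ast.dump(ast.parse(src, mode="eval"))


dumps = {normalized(src) for src in variants}
print(len(dumps))  # 1: every variant has the same syntax tree
```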

&lt;h2&gt;
  
  
  What's Next?
&lt;/h2&gt;

&lt;p&gt;Semgrep started as an open-source project at Facebook, and we're lucky to have its original author, Yoann Padioleau, on our team at r2c. Since we released the first post-Facebook version (0.4) earlier this year, we've released 25 new versions, added support for 8 new languages, reworked the parsers so we could collaborate with GitHub on &lt;a href="https://tree-sitter.github.io/"&gt;tree-sitter&lt;/a&gt;, been joined by thousands of enthusiastic GitHub followers, and seen over 100K pulls of the Semgrep Docker image.&lt;/p&gt;

&lt;p&gt;Our roadmap contains more program analysis features to support the sorts of secure-by-default enforcement that large technology companies are already leveraging so heavily (constant propagation, taint tracking, and more), as well as support for many more languages.&lt;/p&gt;

&lt;h2&gt;
  
  
  Batteries Included
&lt;/h2&gt;

&lt;p&gt;Along with this release of Semgrep, we're announcing the availability of &lt;a href="https://semgrep.dev/"&gt;Semgrep Community&lt;/a&gt;, a free, hosted service for managing Semgrep CI, as well as Semgrep Teams, a paid service that adds enterprise-oriented features for managing Semgrep. Both of these offerings provide SaaS infrastructure for operating a modern AppSec program. They enable central definition of code standards for your projects and show results where you already work: GitHub, GitLab, Slack, Jira, VS Code, and more.&lt;/p&gt;

&lt;p&gt;We're also excited that &lt;a href="https://semgrep.dev/explore"&gt;Semgrep Registry&lt;/a&gt; already has 900+ rules written by r2c and the community, which you can start running on your project right now! Or if you like to DIY, &lt;a href="https://semgrep.dev/editor"&gt;try writing your own&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>security</category>
      <category>startup</category>
    </item>
    <item>
      <title>Pain-free Custom Linting: Why I moved from ESLint and Bandit to Semgrep</title>
      <dc:creator>Ulzii</dc:creator>
      <pubDate>Fri, 15 May 2020 21:41:48 +0000</pubDate>
      <link>https://dev.to/r2c/serenading-semgrep-why-i-moved-to-semgrep-for-all-my-code-analysis-3eig</link>
      <guid>https://dev.to/r2c/serenading-semgrep-why-i-moved-to-semgrep-for-all-my-code-analysis-3eig</guid>
      <description>&lt;p&gt;&lt;strong&gt;&lt;em&gt;tldr&lt;/em&gt;&lt;/strong&gt;: &lt;a href="https://semgrep.dev/"&gt;Semgrep&lt;/a&gt; is an analysis tool that is easy to learn and easy to prototype rules with, and can be adopted across languages.&lt;/p&gt;

&lt;p&gt;For anyone who is looking to write a rule or sophisticated analysis using a free analysis tool, I wanted to share my experience of writing AST-based visitor rules in contrast to Semgrep rules.&lt;/p&gt;

&lt;p&gt;Having written multiple Flake8 rules for Python 3 and an ESLint plugin, and having poked at Go-AST, I have become familiar with how many AST-based analysis engines and frameworks work. After writing about 10 AST-based visitors, I was struck by how unintuitive rule writing is, regardless of whether it’s in Go, Python, or JavaScript. In contrast, I have written 40-50 rules in Semgrep in a matter of two months, and I am still amazed at how easy it is to write rules with it.&lt;/p&gt;

&lt;p&gt;For full disclosure, I work at r2c, which open-sourced Semgrep and actively develops it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Writing code to analyze code != Writing code
&lt;/h2&gt;

&lt;p&gt;When starting with an analysis, one usually has to program an AST-based visitor. If you’re not familiar with what an AST is or what a visitor means, feel free to check out this &lt;a href="https://medium.com/basecs/leveling-up-ones-parsing-game-with-asts-d7a6fc2400ff"&gt;excellent blog post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;After writing a few visitors, it becomes obvious that the way I write my program is very different from the way I write program analysis for it. When I write a visitor, I am essentially writing a graph algorithm that visits the nodes of the AST and runs some logic at each one.&lt;/p&gt;

&lt;p&gt;One of the core advantages for me in writing analysis with Semgrep is that I don’t have to be in that mental model of graph algorithms.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I can actually reason about my analysis in the way I write my code.&lt;br&gt;
To clarify the difference in mental models, consider writing an analysis that matches an assignment like &lt;code&gt;my_var = myvar()&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In a typical AST-based analysis, I’ll write a function that visits each assignment node in the program’s AST and programmatically decides when to fire the rule.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;visit_Assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AST&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="n"&gt;link&lt;/span&gt; &lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Logic of Flake8 rule goes here
&lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;





&lt;div class="highlight"&gt;&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;exports&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;create&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="s2"&gt;`VariableDeclarator`&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// Logic of the ESLint rule goes here&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;With Semgrep, I write my analysis in the way I would write my code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my_var = $Y
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Given this similarity between the mental models for writing code and for writing analysis of it, Semgrep is an easy-to-learn rule-writing engine that is well suited to quick prototyping.&lt;/p&gt;
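&lt;p&gt;To make the contrast concrete, here is a complete, runnable version of the visitor approach for the &lt;code&gt;my_var = $Y&lt;/code&gt; case. It is written as a plain &lt;code&gt;ast.NodeVisitor&lt;/code&gt; rather than a real Flake8 plugin, and the class name and sample source are hypothetical; it needs Python 3.9+ for &lt;code&gt;ast.unparse&lt;/code&gt;. Nearly everything except the &lt;code&gt;if&lt;/code&gt; inside &lt;code&gt;visit_Assign&lt;/code&gt; is plumbing that the one-line pattern does not need.&lt;/p&gt;

```python
import ast


class MyVarAssignVisitor(ast.NodeVisitor):
    """Flags assignments of the form `my_var = ...` (cf. the pattern `my_var = $Y`)."""

    def __init__(self) -> None:
        self.findings = []  # (line number, source text of the right-hand side)

    def visit_Assign(self, node: ast.Assign) -> None:
        # An Assign node may have several targets (a = b = ...); check each one.
        for target in node.targets:
            if isinstance(target, ast.Name) and target.id == "my_var":
                # ast.unparse renders the right-hand side, i.e. what $Y would bind to.
                self.findings.append((node.lineno, ast.unparse(node.value)))
        # Keep walking so nested nodes are still visited.
        self.generic_visit(node)


source = """
my_var = myvar()
other = 1
def f():
    my_var = compute(2)
"""

visitor = MyVarAssignVisitor()
visitor.visit(ast.parse(source))
print(visitor.findings)  # [(2, 'myvar()'), (5, 'compute(2)')]
```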

&lt;h2&gt;
  
  
  Overview of Semgrep
&lt;/h2&gt;

&lt;p&gt;Without diving into the details, the core design decisions made in Semgrep are as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Metavariables (such as &lt;code&gt;$X&lt;/code&gt;): placeholders that match any expression or identifier and must bind to the same code everywhere they appear within a pattern.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;...&lt;/code&gt; (ellipsis) operator: it matches any sequence of arguments, statements, or other elements, so I don’t have to sweat the details of a particular code pattern. This means even my simple rules can match very complex code blocks. Hence, &lt;strong&gt;less is more&lt;/strong&gt; when writing Semgrep rules.&lt;/li&gt;
&lt;li&gt;Smart matching: Semgrep uses different pattern matchers depending on the code pattern I write. If I target functions with a pattern like &lt;code&gt;def $FOO(...): ...&lt;/code&gt;, it matches function definitions. If I write a statement pattern like &lt;code&gt;$FOO = exec(...)&lt;/code&gt;, it matches only statements.&lt;/li&gt;
&lt;/ul&gt;
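&lt;p&gt;A toy matcher can make the metavariable and ellipsis ideas concrete. This is a minimal sketch over Python's &lt;code&gt;ast&lt;/code&gt; module, not how Semgrep is implemented: &lt;code&gt;$X&lt;/code&gt; is rewritten to a placeholder identifier so the pattern parses, the first occurrence of a metavariable binds to whatever subtree it meets, later occurrences must match that same subtree, and a lone &lt;code&gt;...&lt;/code&gt; argument matches any argument list. All function names here are hypothetical.&lt;/p&gt;

```python
import ast
import re

MV_PREFIX = "__mv_"  # hypothetical placeholder prefix for metavariables


def parse_pattern(pattern: str) -> ast.expr:
    """Parse a Semgrep-style expression pattern by rewriting $X to a legal identifier."""
    return ast.parse(re.sub(r"\$([A-Z_]+)", MV_PREFIX + r"\1", pattern), mode="eval").body


def match(p, t, env: dict) -> bool:
    """Structurally compare pattern node p with target node t, recording bindings in env."""
    if not isinstance(p, ast.AST):
        return type(p) is type(t) and p == t  # plain values: identifiers, constants, None
    if isinstance(p, ast.Name) and p.id.startswith(MV_PREFIX):
        if not isinstance(t, ast.expr):
            return False
        name = "$" + p.id[len(MV_PREFIX):]
        if name in env:
            # Later occurrences must be the same code as the first binding.
            return ast.dump(env[name]) == ast.dump(t)
        env[name] = t
        return True
    if (
        isinstance(p, ast.Call)
        and isinstance(t, ast.Call)
        and len(p.args) == 1
        and isinstance(p.args[0], ast.Constant)
        and p.args[0].value is Ellipsis
        and not p.keywords
    ):
        # f(...): a lone ellipsis argument matches any arguments.
        return match(p.func, t.func, env)
    if type(p) is not type(t):
        return False
    for field in p._fields:
        pv, tv = getattr(p, field, None), getattr(t, field, None)
        if isinstance(pv, list):
            if not isinstance(tv, list) or len(pv) != len(tv):
                return False
            if not all(match(a, b, env) for a, b in zip(pv, tv)):
                return False
        elif not match(pv, tv, env):
            return False
    return True


def find(pattern: str, source: str) -> list:
    """Return (matched code, bindings) for every expression in source that matches pattern."""
    pat = parse_pattern(pattern)
    results = []
    for node in ast.walk(ast.parse(source)):
        env: dict = {}
        if isinstance(node, ast.expr) and match(pat, node, env):
            results.append((ast.unparse(node), {k: ast.unparse(v) for k, v in env.items()}))
    return results


source = """
if a + 1 == a + 1:
    pass
if a == b:
    pass
exec(cmd, globals())
"""

print(find("$X == $X", source))   # [('a + 1 == a + 1', {'$X': 'a + 1'})]
print(find("exec(...)", source))  # [('exec(cmd, globals())', {})]
```

&lt;p&gt;Because matching happens on syntax trees, &lt;code&gt;$X == $X&lt;/code&gt; finds &lt;code&gt;a + 1 == a + 1&lt;/code&gt; but not &lt;code&gt;a == b&lt;/code&gt;. Real Semgrep also handles statements, ellipses in statement sequences, and many more languages.&lt;/p&gt;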

&lt;p&gt;Personally speaking, Semgrep lives up to the mantra of “learn once, write anywhere,” as I can very easily adapt my analysis for other languages. It’s worth noting that the core of the Semgrep engine was written at Facebook, a company known for the “learn once, write anywhere” mantra of React and React Native.&lt;/p&gt;

&lt;h2&gt;
  
  
  Semgrep vs AST based analysis frameworks
&lt;/h2&gt;

&lt;p&gt;r2c previously talked about how hardcoded password checks are a &lt;a href="https://blog.r2c.dev/2019/three-things-your-linter-shouldnt-tell-you/"&gt;common and noisy rule&lt;/a&gt;. While most rules optimize for completeness, we find that precision is just as important, if not more so.&lt;/p&gt;

&lt;p&gt;For the sake of argument, let’s say I were to write a rule to detect hardcoded passwords in Semgrep and compare the ease of development with other AST-based analysis frameworks.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rule written in&lt;/th&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Link&lt;/th&gt;
&lt;th&gt;Lines of code&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;Bandit&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/PyCQA/bandit/blob/master/bandit/plugins/general_hardcoded_password.py"&gt;B105&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;144&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Go&lt;/td&gt;
&lt;td&gt;Gosec&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/securego/gosec/blob/master/rules/hardcoded_credentials.go"&gt;G101&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;119&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;YAML&lt;/td&gt;
&lt;td&gt;Semgrep&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/returntocorp/semgrep-rules/blob/36c589195f54e7bd88ad3e71aed1d566a84bcb3c/go/gosec/hardcoded_credentials/hardcoded_credentials.yaml"&gt;Protoype&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Because I don’t have to write boilerplate code at all, the analysis written in Semgrep is significantly shorter: roughly 7 to 9 times fewer lines than the Gosec and Bandit versions in the table above. In addition, abstractions like metavariables and the ellipsis operator replace code I would otherwise have to write in other frameworks. And unlike other frameworks, because Semgrep’s matching engine works out which kinds of nodes to visit from the pattern itself, I don’t have to list the node types explicitly. Given all of this, it’s easy to iterate and reduce false positive rates extremely quickly.&lt;/p&gt;
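&lt;p&gt;For a feel of the AST-based side, here is a hypothetical and heavily simplified Python check for hardcoded credentials. It is neither Bandit's B105 logic nor the linked Semgrep rule; the name regex and placeholder list are assumptions. Each precision tweak (only string literals count, empty or placeholder values are skipped) is the sort of change that has to be coded by hand in a visitor.&lt;/p&gt;

```python
import ast
import re

# Hypothetical heuristic: variable names that suggest a credential.
SUSPICIOUS_NAME = re.compile(r"passw(or)?d|secret|token", re.IGNORECASE)
# Precision tweak: values that are obviously not real secrets.
PLACEHOLDERS = {"", "changeme", "xxx"}


def hardcoded_credentials(source: str) -> list:
    """Line numbers of `name = "literal"` assignments whose name looks like a credential."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        value = node.value
        # Only string literals count; os.environ[...] or a function call is fine.
        if not (isinstance(value, ast.Constant) and isinstance(value.value, str)):
            continue
        if value.value.lower() in PLACEHOLDERS:
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and SUSPICIOUS_NAME.search(target.id):
                hits.append(node.lineno)
    return sorted(hits)


sample = '''db_password = "hunter2"
api_token = os.environ["API_TOKEN"]
password = ""
passenger_count = "4"
'''

print(hardcoded_credentials(sample))  # [1]
```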

&lt;p&gt;Lastly, simply by changing the target language of my rule, I can adapt this Go rule for Python or JavaScript. In contrast, if you were to adapt a Bandit rule for JavaScript, you would most likely have to rewrite it from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Semgrep vs grep-based tools
&lt;/h2&gt;

&lt;p&gt;Grep-based tools like Ripgrep have been used extensively in code analysis. However, because grep tools are unaware of code structure, analysis built on them is prone to false positives.&lt;/p&gt;

&lt;p&gt;For example, if I simply want to find instances of sensitive function calls like &lt;code&gt;exec(...)&lt;/code&gt;, the Semgrep pattern &lt;code&gt;exec(...)&lt;/code&gt; matches &lt;code&gt;exec()&lt;/code&gt; called with any arguments or across multiple lines, but not the string "exec" in comments or hard-coded strings, because Semgrep is aware of the code structure.&lt;/p&gt;

&lt;p&gt;Writing grep patterns that only fire inside function calls would be complicated at best and impossible at worst.&lt;/p&gt;
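&lt;p&gt;The difference is easy to see with a small Python sketch that runs a line-based regular expression and a standard-library &lt;code&gt;ast&lt;/code&gt; walk over the same hypothetical snippet. This is an illustration of the general idea, not Semgrep's implementation: the regex fires on a comment and a string as well as the real call, while the AST walk reports only the call, even though it spans several lines.&lt;/p&gt;

```python
import ast
import re

source = '''
# TODO: never call exec(user_input) here
message = "exec(...) is disabled"
exec(
    compiled_code
)
'''

# Line-based regex search: finds the text "exec(" wherever it appears.
regex_hits = [
    n for n, line in enumerate(source.splitlines(), start=1) if re.search(r"\bexec\(", line)
]

# Syntax-aware search: only real calls to a function named exec.
ast_hits = [
    node.lineno
    for node in ast.walk(ast.parse(source))
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and node.func.id == "exec"
]

print(regex_hits)  # [2, 3, 4]
print(ast_hits)    # [4]
```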

&lt;h2&gt;
  
  
  Semgrep niceties
&lt;/h2&gt;

&lt;p&gt;Beyond pattern matching, Semgrep offers a robust set of features for more complex analysis. In my experience, these features let me build solid static analysis in less than half the time it takes with other static analysis tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  Types
&lt;/h3&gt;

&lt;p&gt;For any metavariable I use, I’m able to further hone my analysis with type hints. Currently, I can use &lt;code&gt;int&lt;/code&gt;, &lt;code&gt;float&lt;/code&gt;, and &lt;code&gt;string&lt;/code&gt; literals, as well as formatted strings.&lt;/p&gt;

&lt;p&gt;For example, the pattern &lt;code&gt;time.sleep($X: float)&lt;/code&gt; only fires on calls like &lt;code&gt;time.sleep(0.5)&lt;/code&gt;, not on &lt;code&gt;time.sleep(foo())&lt;/code&gt;.&lt;/p&gt;
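&lt;p&gt;For comparison, approximating that typed pattern by hand takes a chain of node checks. The sketch below is a hypothetical helper built on Python's &lt;code&gt;ast&lt;/code&gt; module that, like the literal-only typing described above, flags &lt;code&gt;time.sleep&lt;/code&gt; only when its single argument is a float literal. It does not follow imports such as &lt;code&gt;from time import sleep&lt;/code&gt;.&lt;/p&gt;

```python
import ast


def sleep_with_float_literal(source: str) -> list:
    """Line numbers where time.sleep() is called with a single float literal."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "sleep"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "time"
            and len(node.args) == 1
            and isinstance(node.args[0], ast.Constant)
            and type(node.args[0].value) is float  # int literals and calls are skipped
        ):
            hits.append(node.lineno)
    return hits


sample = """import time
time.sleep(0.5)
time.sleep(foo())
time.sleep(2)
"""

print(sleep_with_float_literal(sample))  # [2]
```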

&lt;h3&gt;
  
  
  Module path
&lt;/h3&gt;

&lt;p&gt;Another advantage of Semgrep is that it’s smart about module paths, such that I can target the specific object I care about in my analysis.&lt;/p&gt;

&lt;p&gt;For example, when I was writing a rule to target &lt;a href="https://docs.djangoproject.com/en/3.0/ref/request-response/#httpresponse-subclasses"&gt;&lt;code&gt;HttpResponse&lt;/code&gt;&lt;/a&gt; in the Django framework, I needed it not to fire on uses of the standard library’s &lt;a href="https://docs.python.org/3/library/http.client.html#http.client.HTTPResponse"&gt;&lt;code&gt;HTTPResponse&lt;/code&gt;&lt;/a&gt;. Semgrep’s module resolution makes this very easy with a pattern like:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;return django.http.HttpResponse(...)&lt;/code&gt;&lt;/p&gt;
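&lt;p&gt;Module resolution boils down to remembering what each imported name refers to and expanding it at call sites. The following is a minimal sketch of that idea using Python's &lt;code&gt;ast&lt;/code&gt; module, not Semgrep's implementation; the helper names are hypothetical, and it ignores scoping, reassignment, and star imports.&lt;/p&gt;

```python
import ast
from typing import Dict, List, Optional


def dotted_name(node: ast.expr) -> Optional[str]:
    """Turn a Name/Attribute chain such as dh.HttpResponse into "dh.HttpResponse"."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if not isinstance(node, ast.Name):
        return None  # e.g. a call on a subscript or on another call's result
    parts.append(node.id)
    return ".".join(reversed(parts))


def imported_call_targets(source: str) -> List[str]:
    """Fully qualified names of calls whose callee comes from an import."""
    tree = ast.parse(source)
    aliases: Dict[str, str] = {}
    # Pass 1: map each local name bound by an import to the module path it stands for.
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.asname:
                    aliases[alias.asname] = alias.name
                else:
                    top = alias.name.split(".")[0]  # `import a.b` binds only `a`
                    aliases[top] = top
        elif isinstance(node, ast.ImportFrom) and node.module:
            for alias in node.names:
                aliases[alias.asname or alias.name] = f"{node.module}.{alias.name}"
    # Pass 2: expand the first component of each call target through that map.
    targets = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name is None:
                continue
            head, _, rest = name.partition(".")
            if head in aliases:
                targets.append(aliases[head] + ("." + rest if rest else ""))
    return targets


sample = """
from django.http import HttpResponse
from http.client import HTTPResponse as RawResponse
import django.http as dh

def view(request):
    a = HttpResponse("ok")
    b = RawResponse(sock)
    return dh.HttpResponse("done")
"""

for target in imported_call_targets(sample):
    print(target)
```

&lt;p&gt;Both &lt;code&gt;HttpResponse("ok")&lt;/code&gt; and &lt;code&gt;dh.HttpResponse("done")&lt;/code&gt; resolve to &lt;code&gt;django.http.HttpResponse&lt;/code&gt;, while the aliased standard-library class resolves to &lt;code&gt;http.client.HTTPResponse&lt;/code&gt;, so a rule keyed on the Django name can ignore it.&lt;/p&gt;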

&lt;h3&gt;
  
  
  Custom post-analysis filtering
&lt;/h3&gt;

&lt;p&gt;Another feature I like about Semgrep is that, after the pattern match, I can further narrow my results based on captured metavariables. This is very useful for analyses that apply allowlist or denylist logic to strings or other literal values.&lt;/p&gt;

&lt;p&gt;The following example rule uses post-analysis filtering to flag RSA keys generated with fewer than 2048 bits.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;patterns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;pattern-either&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;pattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;rsa.GenerateKey(..., $BITS)&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;pattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;rsa.GenerateMultiPrimeKey(..., $BITS)&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;pattern-where-python&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;int(vars['$BITS']) &amp;lt; 2048&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
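&lt;p&gt;Conceptually, the &lt;code&gt;pattern-where-python&lt;/code&gt; step is a predicate evaluated over the captured metavariables of each match. The sketch below shows that idea in plain Python with hypothetical, hard-coded match results instead of a real Semgrep run. It also skips bindings that are not integer literals, where a bare &lt;code&gt;int()&lt;/code&gt; call would raise.&lt;/p&gt;

```python
from typing import Dict, List

# Hypothetical match results: one dict of captured metavariable text per match,
# e.g. from rsa.GenerateKey(rand.Reader, 1024) in some Go code.
matches: List[Dict[str, str]] = [
    {"$BITS": "1024"},
    {"$BITS": "4096"},
    {"$BITS": "keySize"},  # a variable, not a literal
]


def weak_key(bindings: Dict[str, str], minimum: int = 2048) -> bool:
    """Keep a match only if $BITS is an integer literal smaller than `minimum`."""
    try:
        bits = int(bindings["$BITS"])
    except ValueError:
        return False  # not a literal: cannot be judged from the text alone
    return minimum > bits


findings = [m for m in matches if weak_key(m)]
print(findings)  # [{'$BITS': '1024'}]
```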



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Having written analyses in Flake8, ESLint, and Semgrep, I find the time Semgrep saves me very significant. I see no obvious degradation in the quality of analysis I can write, and the features built into Semgrep only amplify what I can express with simple patterns. As a bonus, prototyping rules against real code using &lt;a href="http://semgrep.live/"&gt;semgrep.live&lt;/a&gt; is very robust and functions like an IDE, which is a much better experience than &lt;a href="https://astexplorer.net/"&gt;https://astexplorer.net/&lt;/a&gt; or &lt;a href="https://python-ast-explorer.com/"&gt;https://python-ast-explorer.com/&lt;/a&gt;. Overall, with my bias disclosed above, I don’t want to go back to writing AST-based visitors now that I’ve found Semgrep.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Preventing SQL injection: a Django author's perspective</title>
      <dc:creator>Pablo Estrada</dc:creator>
      <pubDate>Tue, 12 May 2020 21:42:44 +0000</pubDate>
      <link>https://dev.to/r2c/preventing-sql-injection-a-django-author-s-perspective-2okm</link>
      <guid>https://dev.to/r2c/preventing-sql-injection-a-django-author-s-perspective-2okm</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a guest post co-authored by &lt;a href="https://jacobian.org/"&gt;Jacob Kaplan-Moss&lt;/a&gt;, co-creator of Django, and Grayson Hardaway.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s SQL Injection?
&lt;/h2&gt;

&lt;p&gt;SQL Injection (SQLi) is one of the most dangerous classes of web vulnerabilities. Thankfully, it’s becoming increasingly rare — thanks mostly to increasing use of database abstraction layers like Django’s ORM — but where it occurs it can be devastating.&lt;/p&gt;

&lt;p&gt;SQLi happens when code incorrectly constructs SQL queries that contain user input. For example, imagine writing a search function without knowing about SQLi:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GET&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'q'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;sql&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;f"SELECT * FROM some_table WHERE title LIKE '%&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;%';"&lt;/span&gt;

    &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Can you spot the problem? Notice that the query comes from the browser: &lt;code&gt;request.GET['q']&lt;/code&gt;. Think about what might happen if that query contains a single quote. What happens when the SQL string is constructed?&lt;/p&gt;

&lt;p&gt;Consider what happens if an attacker searches for &lt;code&gt;' OR 'a' LIKE 'a&lt;/code&gt;. In this case the constructed SQL would become:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;some_table&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'%'&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="s1"&gt;'a'&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'a%'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;So that’s bad; now we’re returning the entire contents of the table. This could be a data breach, or it could overwhelm your database server.&lt;/p&gt;

&lt;p&gt;But it gets worse; imagine now that the attacker searches for &lt;code&gt;'; DELETE FROM some_table; --&lt;/code&gt;. Now the constructed SQL becomes the following (the trailing &lt;code&gt;--&lt;/code&gt; turns the leftover &lt;code&gt;%';&lt;/code&gt; into a comment):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;some_table&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'%'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;some_table&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;--%';&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Uh oh.&lt;/p&gt;
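&lt;p&gt;You can reproduce the table-dumping attack locally. This is a minimal sketch using Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module and an in-memory database rather than Django; the table contents and the &lt;code&gt;search_unsafe&lt;/code&gt; helper are hypothetical. A search for &lt;code&gt;zzz&lt;/code&gt; matches nothing, but appending an always-true &lt;code&gt;OR&lt;/code&gt; clause returns every row.&lt;/p&gt;

```python
import sqlite3

# Hypothetical in-memory table standing in for some_table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (title TEXT)")
conn.executemany("INSERT INTO some_table VALUES (?)", [("Django tips",), ("Secret plans",)])


def search_unsafe(query: str) -> list:
    # Vulnerable on purpose: user input is pasted straight into the SQL text.
    sql = f"SELECT * FROM some_table WHERE title LIKE '%{query}%';"
    return conn.execute(sql).fetchall()


print(search_unsafe("tips"))                 # [('Django tips',)]
print(search_unsafe("zzz"))                  # []
print(search_unsafe("zzz' OR 'a' LIKE 'a"))  # both rows: the injected OR clause is always true
```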

&lt;h2&gt;
  
  
  General concepts for preventing SQLi
&lt;/h2&gt;

&lt;p&gt;We’ll get to Django specifics shortly, but first it’s important to really understand the fundamental rules of preventing SQL injection:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Never&lt;/strong&gt; trust any data submitted by the user&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Always&lt;/strong&gt; use "parameterized statements" when directly constructing SQL queries&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Anything that comes from the user could be maliciously constructed. Even things that seem safe, like browser headers (e.g., the user agent, &lt;code&gt;request.META['HTTP_USER_AGENT']&lt;/code&gt; in Django), are trivial to tamper with, either directly in the browser or with tools like &lt;a href="https://portswigger.net/burp"&gt;Burp&lt;/a&gt; or &lt;a href="https://www.charlesproxy.com/"&gt;Charles&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Practically, in Django this means nearly anything that hangs off the &lt;a href="https://docs.djangoproject.com/en/3.0/ref/request-response/#httprequest-objects"&gt;HttpRequest object&lt;/a&gt;, i.e., the &lt;code&gt;request&lt;/code&gt; parameter that’s passed as the first argument to view functions. Though there are some exceptions, it’s probably best to consider anything on &lt;code&gt;request&lt;/code&gt; as fundamentally untrustworthy.&lt;/p&gt;

&lt;p&gt;However, just because some piece of data isn’t attached to &lt;code&gt;request&lt;/code&gt; &lt;em&gt;right now&lt;/em&gt; doesn't mean that you can trust it. For example, consider something like an image caption. You might access it through an API that doesn’t mention a request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
&lt;span class="n"&gt;sql&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;f"""SELECT * FROM images WHERE similarity(caption, '&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;caption&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;') &amp;gt; 0.5;
...
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;But if that image caption was previously entered by a user…it’s still dangerous. So this brings us around to the second rule: always use &lt;em&gt;parameterized statements&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Parameterized statements are a mechanism for passing dynamic parameters separately from the SQL query. They’re either interpreted directly by the database or safely escaped before being added to the query. Almost every database client on the planet supports parameterized statements, and if yours doesn’t, find a different one.&lt;/p&gt;

&lt;p&gt;Here’s what the search function from above would look like with parameterized statements:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s"&gt;"SELECT * FROM some_table WHERE title LIKE '%?%'"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GET&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'q'&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Notice the &lt;code&gt;?&lt;/code&gt; in the SQL string, and the second parameter to &lt;code&gt;execute&lt;/code&gt;. This second argument is the parameter list; items in this list are &lt;em&gt;safely&lt;/em&gt; injected into the query to replace the question marks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.python.org/dev/peps/pep-0249/"&gt;PEP-249, the Python database API standard&lt;/a&gt; requires parameterized statements, though different libraries may use different syntax for the placeholders (%-style parameters, &lt;code&gt;:named&lt;/code&gt; parameters, numeric parameters, etc.).&lt;/p&gt;

&lt;p&gt;You can use code analysis tools to check for SQL injections. &lt;a href="https://bento.dev"&gt;Bento&lt;/a&gt; is one such tool, with several checks for common SQL injection problems. Such checks catch many common errors, but it’s still best practice to use parameterized statements, along with the techniques below, to prevent this attack outright.&lt;/p&gt;

&lt;h2&gt;
  
  
  Preventing SQLi in Django
&lt;/h2&gt;

&lt;p&gt;Django’s ORM uses parameterized statements everywhere, so it is highly resistant to SQLi. Thus, if you’re using the ORM to make database queries you can be fairly confident that your app is safe.&lt;/p&gt;

&lt;p&gt;However, there are still a few cases where you need to watch for injection attacks: a small number of APIs are not 100% safe. Those APIs are where you should focus your auditing, and where your automated code analysis should focus its checks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Raw Queries&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Occasionally, the ORM isn’t expressive enough, and you need raw SQL. Before you reach for it, consider whether you can avoid it: for example, &lt;a href="https://dev.to/mblayman/understand-django-views-on-views-4f82"&gt;building a Django model on top of a database view&lt;/a&gt;, or &lt;a href="https://docs.djangoproject.com/en/3.0/topics/db/sql/#calling-stored-procedures"&gt;calling a stored procedure&lt;/a&gt;, can remove the need to embed raw SQL in your Python.&lt;/p&gt;
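&lt;p&gt;The database-view approach can be sketched as follows. This is a minimal, self-contained illustration using &lt;code&gt;sqlite3&lt;/code&gt; rather than Django, and the &lt;code&gt;orders&lt;/code&gt; table and &lt;code&gt;customer_totals&lt;/code&gt; view are hypothetical. The complex SQL lives once in a &lt;code&gt;CREATE VIEW&lt;/code&gt; statement, and application code queries the view like an ordinary table. In Django you would typically point a model at such a view with &lt;code&gt;managed = False&lt;/code&gt; and &lt;code&gt;db_table = "customer_totals"&lt;/code&gt; in its &lt;code&gt;Meta&lt;/code&gt;, so normal, parameterized ORM queries work against it.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL);
    INSERT INTO orders (customer, total)
        VALUES ('alice', 10.0), ('alice', 5.5), ('bob', 7.25);

    -- The complex SQL lives in the database as a view, not in Python strings.
    CREATE VIEW customer_totals AS
        SELECT customer, SUM(total) AS lifetime_total
        FROM orders
        GROUP BY customer;
    """
)


def lifetime_total(customer):
    # The application queries the view like an ordinary table,
    # still passing the value as a parameter.
    row = conn.execute(
        "SELECT lifetime_total FROM customer_totals WHERE customer = ?",
        [customer],
    ).fetchone()
    return row[0] if row else None


print(lifetime_total("alice"))  # 15.5
print(lifetime_total("carol"))  # None
```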

&lt;p&gt;But sometimes raw SQL is unavoidable. Django provides several APIs for it, and all of them are somewhat dangerous. In order of desirability, they are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;a href="https://docs.djangoproject.com/en/3.0/topics/db/sql/#performing-raw-sql-queries"&gt;Raw queries&lt;/a&gt;, for example:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="n"&gt;sql&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"... some complex SQL query here ..."&lt;/span&gt;
    &lt;span class="n"&gt;qs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MyModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;param1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;param2&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="c1"&gt;# ^ note the parameterized statements in the line above
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;The &lt;a href="https://docs.djangoproject.com/en/3.0/ref/models/expressions/#django.db.models.expressions.RawSQL"&gt;RawSQL annotation&lt;/a&gt;, for example:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;django.db.models.expressions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RawSQL&lt;/span&gt;

    &lt;span class="n"&gt;sql&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"... some complex subquery here ..."&lt;/span&gt;
    &lt;span class="n"&gt;qs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MyModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;annotate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;RawSQL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;param1&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
    &lt;span class="c1"&gt;# ^ note the parameterized statement in the line above
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;
&lt;a href="https://docs.djangoproject.com/en/3.0/topics/db/sql/#executing-custom-sql-directly"&gt;Use database cursors directly&lt;/a&gt;, for example:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;django.db&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;connection&lt;/span&gt;
    &lt;span class="n"&gt;sql&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"... some complex query here ..."&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;param1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="c1"&gt;# ^ again, note the parameterized statement in the line above
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;ol start="4"&gt;
&lt;li&gt;
&lt;strong&gt;AVOID&lt;/strong&gt;: &lt;code&gt;QuerySet.extra()&lt;/code&gt; (no example: this API is unsafe and is listed only for completeness).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To use these APIs safely:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Read the first part of this article and make sure you understand parameterized statements before proceeding.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Don’t use &lt;code&gt;extra()&lt;/code&gt;. It’s difficult (if not impossible) to use in a way that’s 100% safe, and should be considered deprecated.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Always pass a parameter list, even if it is empty. That is, you should write something like:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="n"&gt;sql&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;'SELECT * FROM something;'&lt;/span&gt;
    &lt;span class="n"&gt;qs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MyModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The empty list reminds you to add parameters to it later, and makes it easier for automated tools like Bento to flag potentially incorrect API usage.&lt;/p&gt;

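&lt;p&gt;The query itself should always be a static string, never one built with concatenation, f-strings, &lt;code&gt;%&lt;/code&gt;-formatting, or any other string processing. Dynamic string building is exactly how user input ends up inside the SQL, and a static string also makes it easier for automated tools to spot incorrect API usage.&lt;/p&gt;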
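&lt;p&gt;The difference between a concatenated query and a static, parameterized one can be demonstrated in a few lines. This is a minimal sketch using &lt;code&gt;sqlite3&lt;/code&gt; (placeholder &lt;code&gt;?&lt;/code&gt;) rather than Django's &lt;code&gt;raw()&lt;/code&gt; (placeholder &lt;code&gt;%s&lt;/code&gt;), so it runs without a project; the &lt;code&gt;users&lt;/code&gt; table and the function names are hypothetical.&lt;/p&gt;

```python
import sqlite3

# Hypothetical users table for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])


def find_user_unsafe(name):
    # UNSAFE: the query is built by concatenation, so the input becomes SQL.
    sql = "SELECT name FROM users WHERE name = '" + name + "' ORDER BY name"
    return [row[0] for row in conn.execute(sql)]


def find_user_safe(name):
    # SAFE: the query is a static string; the value is sent as a parameter.
    sql = "SELECT name FROM users WHERE name = ? ORDER BY name"
    return [row[0] for row in conn.execute(sql, [name])]


payload = "x' OR '1'='1"
print(find_user_unsafe(payload))  # ['alice', 'bob'] (every row leaks)
print(find_user_safe(payload))    # [] (treated as a literal name)
```

&lt;p&gt;With ordinary input both functions behave identically; only the hostile input reveals that the concatenated version lets the caller rewrite the &lt;code&gt;WHERE&lt;/code&gt; clause.&lt;/p&gt;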

&lt;h2&gt;
  
  
  Automatic Prevention
&lt;/h2&gt;

&lt;p&gt;It is good practice to use code analysis tools to catch preventable mistakes; to err is human, as the saying goes. Bento automatically checks Django code for SQL injection patterns. The following command checks your entire codebase at once for SQL injections caused by data taken from the request object.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pip3&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;bento&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;cli&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; \
  &lt;span class="n"&gt;bento&lt;/span&gt; &lt;span class="n"&gt;init&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; \
  &lt;span class="n"&gt;BENTO_REGISTRY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;r2c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;python&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;django&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;security&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;injection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt; &lt;span class="n"&gt;bento&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="n"&gt;semgrep&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="nb"&gt;all&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Better than checking your current code, however, is checking your future code! Bento is designed to run as a pre-commit hook or in continuous integration (CI) environments. It is diff-aware and checks only the changes in each commit, which keeps the workflow fast while still catching new problems. When you run &lt;code&gt;bento init&lt;/code&gt; on your project, it automatically sets itself up to check commits.&lt;/p&gt;

&lt;p&gt;This commit-based workflow is especially powerful for ensuring that certain patterns &lt;strong&gt;never&lt;/strong&gt; enter your codebase. To practically eliminate SQL injection, you should automatically verify that your code:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Always&lt;/strong&gt; uses parameterized queries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Never&lt;/strong&gt; uses &lt;code&gt;.extra()&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Bento can detect these patterns by using a different registry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;BENTO_REGISTRY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;r2c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;python&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;django&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;security&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;audit&lt;/span&gt; &lt;span class="n"&gt;bento&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="n"&gt;semgrep&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="nb"&gt;all&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;This rule set is much stricter: it flags many more findings, including code that is not actually vulnerable, so checking your whole codebase at once can be overwhelming. However, you can &lt;code&gt;archive&lt;/code&gt; existing findings with Bento, which suppresses them until you’re ready to deal with them. This lets you continuously check new code for these patterns without being buried in old findings.&lt;/p&gt;

&lt;p&gt;Under the hood, Bento is powered by &lt;a href="https://semgrep.dev"&gt;Semgrep&lt;/a&gt;, a tool for easily detecting and preventing bugs and anti-patterns in your codebase. It combines the convenience of grep with the precision of syntactic and semantic search. This gives it several advantages over plain grep, the most obvious being that Semgrep matches code structure rather than raw text, so it is not thwarted by line boundaries.&lt;/p&gt;

&lt;p&gt;Let’s say you wanted to detect the following SQL injection:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;search_term&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GET&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'search_term'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"SELECT * FROM table WHERE field="&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;search_term&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;This can be expressed in Semgrep like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;VAR&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GET&lt;/span&gt;&lt;span class="p"&gt;[...]&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;CUR&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"..."&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;VAR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Detecting a pattern like this in a commit-based workflow is invaluable because it keeps this particular form of SQL injection from ever entering your codebase! You can see it in action at &lt;a href="https://sgrep.live/0X5"&gt;https://sgrep.live/0X5&lt;/a&gt;.&lt;/p&gt;
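&lt;p&gt;To illustrate why syntax-aware matching is not thwarted by line boundaries, here is a toy, hypothetical checker built on Python's standard &lt;code&gt;ast&lt;/code&gt; module. It is not how Semgrep is implemented and is far less capable (no metavariables, no tracking of where &lt;code&gt;search_term&lt;/code&gt; came from); it only flags &lt;code&gt;execute(...)&lt;/code&gt; calls whose first argument is a string literal concatenated with something else. The sample reuses the vulnerable search function, with the call split across lines.&lt;/p&gt;

```python
import ast
import re


def find_concat_execute(source):
    """Return line numbers of X.execute("literal" + something) calls.

    A toy, hypothetical checker for illustration; not Semgrep's algorithm.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "execute"
            and node.args
        ):
            first = node.args[0]
            # Match a string literal on the left-hand side of a "+".
            if (
                isinstance(first, ast.BinOp)
                and isinstance(first.op, ast.Add)
                and isinstance(first.left, ast.Constant)
                and isinstance(first.left.value, str)
            ):
                hits.append(node.lineno)
    return sorted(hits)


# The source is only parsed, never executed, so db need not exist.
SAMPLE = '''
def search(request):
    search_term = request.GET["search_term"]
    cursor = db.cursor()
    cursor.execute(
        "SELECT * FROM table WHERE field="
        + search_term
    )
    cursor.execute("SELECT * FROM table WHERE field = %s", [search_term])
'''

# A single-line, grep-style regex misses the call split across lines.
GREP_LIKE = re.compile(r'execute\(".*"\s*\+')

print(GREP_LIKE.findall(SAMPLE))    # []
print(find_concat_execute(SAMPLE))  # [5]
```

&lt;p&gt;The parameterized call on the last line is correctly left alone, while the concatenated call is found even though &lt;code&gt;execute(&lt;/code&gt; and the &lt;code&gt;+&lt;/code&gt; sit on different lines.&lt;/p&gt;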

&lt;p&gt;&lt;strong&gt;Other ORMs&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you keep finding that Django’s ORM isn't expressive enough, you may want to experiment with replacing it with &lt;a href="https://www.sqlalchemy.org/"&gt;SQLAlchemy&lt;/a&gt;, a more powerful and expressive ORM. You’ll lose many of Django’s conveniences, like the admin, model forms, and model-based generic views, but you’ll gain a richer query API that is still safe.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Custom ORM additions&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Finally, there are a few potentially dangerous areas that may be unsafe even though you’re not writing raw SQL directly. Django allows the creation of &lt;a href="https://docs.djangoproject.com/en/3.0/ref/models/expressions/#creating-your-own-aggregate-functions"&gt;custom aggregates&lt;/a&gt; and &lt;a href="https://docs.djangoproject.com/en/3.0/ref/models/expressions/#writing-your-own-query-expressions"&gt;custom expressions&lt;/a&gt;; for example, a third-party library could provide APIs that make something like &lt;code&gt;Document.objects.filter(title__similar_to=other_title)&lt;/code&gt; work.&lt;/p&gt;

&lt;p&gt;The core parts of Django’s ORM (the built-in expressions, annotations, and aggregations) are mature and battle-hardened, and the odds of a SQLi in them are very, very low. But ORM additions, especially ones that you write yourself, can still be a source of risk.&lt;/p&gt;

&lt;p&gt;To mitigate the risk of injection from these advanced features, I suggest the following:&lt;/p&gt;

&lt;p&gt;First, be cautious about including custom expressions/aggregates from third-party apps, and audit those apps carefully. Is the app mature, stable, and maintained? Are you confident that any security issues would be promptly fixed and responsibly disclosed? And, of course, be sure to pin your dependencies so that newer, potentially less secure versions aren’t installed without your explicit direction.&lt;/p&gt;

&lt;p&gt;Similarly, be cautious about writing your own custom aggregates. Carefully read the beginning of this article, and &lt;a href="https://docs.djangoproject.com/en/3.0/ref/models/expressions/#avoiding-sql-injection"&gt;Django's documentation about avoiding SQL injection in custom expressions&lt;/a&gt;. As the documentation shows, if possible you should avoid doing any string interpolation in custom expressions. If you can’t, you'll need to escape any expression parameters yourself. This is tricky to do right, and will depend on the specifics of your database engine and Python wrapper API. Consult an expert before diving in here!&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;django.security.audit&lt;/code&gt; registry in Bento will detect custom ORM additions defined in your codebase, and you can use it to quickly audit third-party apps as well. The exploitation conditions are very nuanced, so if it flags anything in your project, be sure to consult that expert!&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;Django was designed to be resilient against SQL injection (and other common web vulnerabilities). Most common uses of Django will be automatically protected, so SQLi vulnerabilities in real-world Django apps are thankfully rare.&lt;/p&gt;

&lt;p&gt;However, when they occur, SQLi vulnerabilities are devastating. It’s well worth your time to audit your codebase to ensure you’re safe. Bento can help by flagging several common vulnerabilities. Now that you understand the concepts, and why certain errors are flagged, you should be better equipped to write safe code.&lt;/p&gt;

</description>
      <category>django</category>
      <category>security</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Our quest to make world-class security and bugfinding available to all developers, for free</title>
      <dc:creator>Pablo Estrada</dc:creator>
      <pubDate>Wed, 05 Feb 2020 17:40:37 +0000</pubDate>
      <link>https://dev.to/r2c/our-quest-to-make-world-class-security-and-bugfinding-available-to-all-developers-for-free-27np</link>
      <guid>https://dev.to/r2c/our-quest-to-make-world-class-security-and-bugfinding-available-to-all-developers-for-free-27np</guid>
      <description>&lt;p&gt;&lt;em&gt;by Isaac Evans, &lt;a href="https://r2c.dev/team"&gt;CEO and co-founder @ r2c&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This post was originally published on the &lt;a href="https://bento.dev/blog/"&gt;Bento blog&lt;/a&gt; in late December 2019.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why we’re building
&lt;/h2&gt;

&lt;p&gt;One thing we’ve learned at &lt;a href="https://r2c.dev/"&gt;r2c&lt;/a&gt; is that most Python or JavaScript developers have never heard of, let alone tried, the tools some developers use to find deep flaws in code, such as &lt;a href="http://www.codenomicon.com/"&gt;Codenomicon&lt;/a&gt;, which found &lt;a href="https://heartbleed.com/"&gt;Heartbleed&lt;/a&gt;, or Zoncolan at Facebook, which finds more &lt;a href="https://cacm.acm.org/magazines/2019/8/238344-scaling-static-analyses-at-facebook/fulltext"&gt;top-severity security issues&lt;/a&gt; than any human effort. Not only do these tools find severe issues, they also save time by pointing out &lt;a href="https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43322.pdf"&gt;hundreds of thousands&lt;/a&gt; of issues before humans do.&lt;/p&gt;

&lt;p&gt;We believe every developer deserves access to powerful tools, but most developers don’t know about them or can’t afford them. r2c’s mission is to make those tools available to people who want to find bugs, discover security problems, and save time, but who don’t work for a giant company that can throw nearly unlimited resources at these problems.&lt;/p&gt;

&lt;p&gt;That’s why we’re excited to release &lt;a href="https://bento.dev/"&gt;Bento&lt;/a&gt;! It’s a free and opinionated toolkit for easily adopting linters and program analysis in a codebase. It includes analysis we’ve written and packages fantastic community-created tools, all running offline (no code is ever shipped off your machine). Over the next few months we’ll release more novel checks and include existing tools; &lt;a href="http://eepurl.com/gDeFvL"&gt;subscribe for updates&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Some members of &lt;a href="https://r2c.dev/team"&gt;our team&lt;/a&gt; wrote early versions of these tools at places like Facebook. r2c started by building infrastructure to make it easy to run static analysis tools &lt;a href="https://app.r2c.dev/"&gt;at massive scale&lt;/a&gt; (see &lt;a href="https://www.usenix.org/conference/usenixsecurity19/presentation/zimmerman"&gt;our paper co-published at USENIX&lt;/a&gt;), but our goal has always been to take what we learned from scaling analysis and use it to benefit individual developers directly: &lt;a href="https://www.belfercenter.org/project/defending-digital-democracy"&gt;folks helping small teams writing voter registration systems for their city&lt;/a&gt;, non-profits who serve communities targeted by powerful hostile actors, &lt;a href="https://techcrunch.com/tag/data-breach/"&gt;startups who handle sensitive data about fellow humans&lt;/a&gt;, or developers who just want to automate away code review.&lt;/p&gt;

&lt;h2&gt;
  
  
  How can I get Bento now?
&lt;/h2&gt;

&lt;p&gt;Bento is in &lt;strong&gt;alpha&lt;/strong&gt;, but you can try it right away:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pip3 install bento-cli&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Here’s a short demo: &lt;a href="https://www.youtube.com/embed/rGwd1aEF8Yk"&gt;https://www.youtube.com/embed/rGwd1aEF8Yk&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A lot of love from &lt;a href="https://r2c.dev/team"&gt;our small team&lt;/a&gt; has gone into Bento. Please try it on your Python or JavaScript projects and send us feedback!&lt;/p&gt;

&lt;h2&gt;
  
  
  But is this just a glorified linter?
&lt;/h2&gt;

&lt;p&gt;Well yes, but actually, no. Bento is currently a union of curated AST-based lints, including new ones written by us, tuned to find bugs that matter. Our roadmap takes us far beyond AST-based linting, though: finding SQL injection through taint analysis, detecting dangerous dependency upgrades, and more.&lt;/p&gt;

&lt;p&gt;Linters have done a good job reaching developers and improving code consistency, especially style. But we want to surface deep issues and checks, and avoid arguing about &lt;a href="https://www.youtube.com/watch?v=cowtgmZuai0"&gt;spaces vs tabs&lt;/a&gt; in code review. Bento ships with configurations that are tuned on real-world data and that focus findings on correctness and security. They are based on using our &lt;a href="https://app.r2c.dev/"&gt;platform&lt;/a&gt; to analyze swathes of open-source repositories and see which checks developers turn on and off (see &lt;a href="https://dev.to"&gt;Three Things Your Linter Shouldn’t Tell You&lt;/a&gt;). Our opinion is that you should forget about style and use a deterministic, zero-config formatter (&lt;a href="https://pypi.org/project/black/"&gt;Black&lt;/a&gt; for Python or &lt;a href="https://prettier.io/"&gt;Prettier&lt;/a&gt; for JavaScript).&lt;/p&gt;

&lt;p&gt;Unlike other tools that try to measure code quality or concatenate linter output, we have skin in the analysis game: we’re already contributing back to the tools we include. We’re collaborating with a few linter authors already, and we would love to offer free compute resources on our platform for measuring check quality to anyone else who might be interested (&lt;a href="//mailto:hello@r2c.dev"&gt;hello@r2c.dev&lt;/a&gt;).&lt;/p&gt;

&lt;h2&gt;
  
  
  Here’s what’s coming next
&lt;/h2&gt;

&lt;p&gt;Our immediate focus is writing custom analysis tools to find security and other issues for users of the &lt;a href="https://www.palletsprojects.com/p/flask/"&gt;Flask&lt;/a&gt; web framework. If you or someone you know uses Flask and has ideas on what we might detect, &lt;a href="//mailto:hello@r2c.dev"&gt;send us a note&lt;/a&gt; or &lt;a href="https://github.com/returntocorp/bento/issues"&gt;open an issue&lt;/a&gt;!&lt;/p&gt;

&lt;h2&gt;
  
  
  Bento core values
&lt;/h2&gt;

&lt;p&gt;Our first releases are about making it easy to install, adopt, and get started before we ship everything on our roadmap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Find bugs that matter&lt;/strong&gt;&lt;br&gt;
Bento automatically enables and configures relevant analysis based on your dependencies and frameworks, and it never reports style-related issues. You won’t have to painstakingly configure your tooling; we did that already!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Go fast&lt;/strong&gt;&lt;br&gt;
No one should have to dig through thousands of linter results and fix them before they can start using a tool. Bento ships with a built-in archiving feature that lets you establish a baseline without fixing every existing issue at once, so you only see new problems entering the codebase.&lt;/p&gt;

&lt;p&gt;This philosophy also applies to setup: Bento auto-configures in about 30 seconds, it’s easy to install in a Docker container, and it can even install itself as a pre-commit hook automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Get better over time&lt;/strong&gt;&lt;br&gt;
Bento automatically tailors itself to your project by enabling checks that correspond to your language, framework, and dependencies. Over time, and based on community feedback, we’ll write and ship new checks that you can adopt automatically. &lt;a href="https://github.com/returntocorp/bento/issues"&gt;And we want your feedback&lt;/a&gt;!&lt;/p&gt;

</description>
      <category>security</category>
      <category>python</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
