Tim Udoma

Designing a Global Surveillance System for the NSA. A Look at Frostbite

On September 11, 2001, the deadliest terrorist attack in US history killed nearly 3,000 people and injured thousands more. This led to a cascade of events and proactive measures to eliminate even the slightest chance of a recurrence. One such measure was the enactment of the USA PATRIOT Act, which, although it permitted substantial incursions on privacy, further empowered the National Security Agency (NSA) and related intelligence agencies to prevent terrorism by cementing their freedom to engage in global surveillance.

Successful efforts have been made in the past to achieve global surveillance. In 2008, XKeyScore, a data mining and exploitation system, was developed to gather nearly everything a user does on the internet. Over a decade of rapid technological development has since brought an enormous increase in data and changes in human-computer interaction. In fact, a whole new environment, the Metaverse, now exists due to advancements in Augmented Reality (AR) and Virtual Reality (VR), blurring the barrier between digital and physical existence. Consequently, there is an urgent need for a system that not only extends XKeyScore and addresses its limitations but also adopts cutting-edge technologies to fulfill the NSA’s mission of gaining a decisive advantage for the nation. This new system, code-named Frostbite, would be an intelligence hive for the NSA (its custodian), the Federal Bureau of Investigation (FBI), and the Central Intelligence Agency (CIA).

Why Frostbite?

One limitation of XKeyScore is that it cannot retain data for long because of the sheer volume flowing in. In some locations, as much as 20 TB of information was received per day from the internet at a potential rate of 10 gigabits per second. To cope with this, the system was designed to store content data for at most five days and metadata for 30 days. Content data refers to the actual data contained in the resources gathered, for example, the text of an email or the voices in an audio recording. Metadata, on the other hand, refers to supplementary information describing a resource. For example, a call log contains metadata such as who was called, how long the call lasted, and the locations (based on cell towers) of the caller and receiver.
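
The figures above can be turned into a quick back-of-the-envelope calculation. The sketch below is only illustrative: it assumes decimal units (1 TB = 10^12 bytes), assumes the 10 gigabit link is the only ingest path, and treats the full 20 TB per day as content data, none of which is stated in the XKeyScore material.

```python
# Rough retention arithmetic using the figures quoted in the text.
# Assumptions (not from the source): decimal units, a single ingest link,
# and all 20 TB/day counted as content data.
TB = 10**12  # bytes in a decimal terabyte

DAILY_INGEST_BYTES = 20 * TB          # "as much as 20 TB ... per day"
LINK_RATE_BITS_PER_S = 10 * 10**9     # "10 gigabits per second"
CONTENT_RETENTION_DAYS = 5            # content kept for at most five days
SECONDS_PER_DAY = 24 * 60 * 60


def link_capacity_bytes_per_day(bits_per_second):
    """Maximum bytes a link can carry in one day at full utilization."""
    return bits_per_second / 8 * SECONDS_PER_DAY


def rolling_buffer_bytes(daily_bytes, retention_days):
    """Storage needed to hold `retention_days` worth of daily ingest."""
    return daily_bytes * retention_days


capacity = link_capacity_bytes_per_day(LINK_RATE_BITS_PER_S)
content_buffer = rolling_buffer_bytes(DAILY_INGEST_BYTES, CONTENT_RETENTION_DAYS)
utilization = DAILY_INGEST_BYTES / capacity

print(f"Link capacity per day: {capacity / TB:.0f} TB")
print(f"Five-day content buffer: {content_buffer / TB:.0f} TB")
print(f"Average link utilization: {utilization:.1%}")
```

Under these assumptions, a single site needs roughly 100 TB of rolling storage just to hold five days of content, which gives a sense of why retention windows were kept so short.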

Frostbite largely addresses this limitation through the use of a modern cloud architecture. Giants like Google, Microsoft, AWS, and Oracle will provide the necessary enterprise capabilities through the Joint Warfighting Cloud Capability (JWCC) project. Migrating to the cloud would allow the NSA to focus on its core objectives by offloading infrastructure concerns to specialist cloud providers, shifting spending from Capital Expenditure (CAPEX) to Operating Expenditure (OPEX), a more cost-effective approach. Additionally, the cloud would enhance the agility of field agents using edge devices, as data would not have to travel across continents to a central server but only to the nearest data center for processing and storage.

As a central intelligence hive, Frostbite will consolidate data previously held in disparate systems like DISHFIRE, Boundless Informant, FAIRVIEW, and XKeyScore into a single data lake of highly voluminous, varied, and near real-time data. This consolidation will include, but not be limited to, data from financial transactions, text messages, compromised computer networks, telephone records, emails, phone calls, drones, satellite images, and fiber-optic cables; basically every device on earth. As a result, the NSA will have a bird’s-eye view of whatever happens in the digital world in near real time.

Implementation Detail

The unprecedented scale of incoming data demands a distributed cluster architecture, one that can share workloads efficiently among commodity computers. Hadoop has long been an effective tool for this job. In combination with Hadoop, Spark, an in-memory computing engine, would run on Kubernetes clusters to handle processing needs.
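
To make the idea of sharing a workload among machines concrete, here is a minimal map-and-reduce sketch in plain Python. It does not use the Hadoop or Spark APIs; the record fields (`source`, `bytes`) and the three "workers" are hypothetical, and the partitions are processed one after another in a single process rather than on separate machines.

```python
from collections import Counter
from functools import reduce

# Hypothetical ingest records; field names are illustrative only.
records = [
    {"source": "email", "bytes": 1200},
    {"source": "sms", "bytes": 160},
    {"source": "email", "bytes": 800},
    {"source": "call", "bytes": 300},
    {"source": "sms", "bytes": 140},
    {"source": "email", "bytes": 500},
]


def partition(items, n_workers):
    """Split the records into n_workers chunks (round-robin)."""
    return [items[i::n_workers] for i in range(n_workers)]


def map_partition(chunk):
    """Map step: each worker totals bytes per source for its own chunk."""
    totals = Counter()
    for record in chunk:
        totals[record["source"]] += record["bytes"]
    return totals


def merge(left, right):
    """Reduce step: combine two partial results into one."""
    return left + right


# In a real cluster each chunk would be handled by a different node.
partials = [map_partition(chunk) for chunk in partition(records, 3)]
totals = reduce(merge, partials, Counter())

print(dict(totals))
```

The point of the pattern is that each worker only ever sees its own chunk, so adding more commodity machines lets the same job handle more data.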

Database

In a distributed environment, traditional relational databases are a poor fit because of the constraints described by the CAP (Consistency, Availability, Partition tolerance) theorem, which states that when a network partition occurs, a distributed system must give up either consistency or availability; it cannot guarantee all three properties at once, as shown in fig 1.0. To illustrate, imagine a store owner with three shops, each holding its own stock of pencils. To get the total number of pencils across the shops, the owner employs an assistant who visits each shop at noon and records the total in a logbook. However, because each shop makes sales throughout the day, its pencil count keeps changing, and the logbook will not reflect the latest figures until the assistant’s next noon visit. Conversely, if there were only one shop, there would be no need for this tedious process. The “tedious process” is what is known as eventual consistency and is fundamentally how many distributed databases work. While they cannot always provide an up-to-date record of events, they remain highly available and scalable, just as a collection of shops is more likely to have a pencil in stock than a single one.
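
The pencil analogy can be written out as a small simulation. This is only a toy model of eventual consistency: the `Shop` and `Logbook` classes and the stock numbers are made up for illustration and do not correspond to any particular database.

```python
class Shop:
    """One replica: it always knows its own current stock."""

    def __init__(self, name, pencils):
        self.name = name
        self.pencils = pencils

    def sell(self, count):
        if count > self.pencils:
            raise ValueError(f"{self.name} has only {self.pencils} pencils")
        self.pencils -= count


class Logbook:
    """The owner's view: only refreshed when the assistant visits."""

    def __init__(self, shops):
        self.shops = shops
        self.recorded_total = None

    def noon_visit(self):
        """Synchronize: read every shop and record the combined total."""
        self.recorded_total = sum(shop.pencils for shop in self.shops)


shops = [Shop("A", 10), Shop("B", 7), Shop("C", 5)]
logbook = Logbook(shops)

logbook.noon_visit()                 # logbook and shops agree
shops[0].sell(3)                     # sales happen after the visit
shops[2].sell(1)

stale_total = logbook.recorded_total             # what the owner still sees
actual_total = sum(shop.pencils for shop in shops)

logbook.noon_visit()                 # next visit: the views converge again
converged_total = logbook.recorded_total

print(stale_total, actual_total, converged_total)
```

Between visits the logbook is stale but every shop keeps selling, which is the trade-off the analogy describes: the system stays available, and the totals agree again only after the next synchronization.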


Fig 1.0 CAP theorem.

NoSQL databases, unlike relational databases, are better suited to handling data at the scale required by Frostbite. Cassandra is a NoSQL database that excels at writing data at high velocity and is, therefore, the preferred database for this system. Under the hood, it uses a commit log and writes sequentially to disk, which, in addition to improving its fault tolerance (Pries & Dunnigan, 2015), helps reduce wear on SSDs. Disk failures are expensive, as Uber recently experienced, and are typically avoided by addressing such seemingly subtle concerns. For reads, Cassandra uses caching to improve speed. This is a technique where recently used or frequently accessed data is kept in high-speed temporary storage for later access.
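
The two ideas in this paragraph, an append-only commit log for writes and a cache for reads, can be sketched in a few lines of Python. This is a deliberately simplified toy and not how Cassandra is actually implemented: the `TinyStore` class, its JSON log format, and the small LRU cache are all assumptions made for illustration.

```python
import json
import os
import tempfile
from collections import OrderedDict


class TinyStore:
    """Toy key-value store: append-only commit log plus an LRU read cache."""

    def __init__(self, log_path, cache_size=2):
        self.log_path = log_path
        self.cache_size = cache_size
        self.table = {}             # in-memory table rebuilt from the log
        self.cache = OrderedDict()  # least recently used entry comes first
        self.cache_hits = 0
        self._replay()

    def _replay(self):
        """Recover state after a restart by re-reading the commit log."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as log:
            for line in log:
                entry = json.loads(line)
                self.table[entry["key"]] = entry["value"]

    def put(self, key, value):
        """Append to the log first (sequential write), then update memory."""
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
        self.table[key] = value
        self.cache.pop(key, None)   # drop any stale cached copy

    def get(self, key):
        """Serve from the cache when possible, otherwise from the table."""
        if key in self.cache:
            self.cache.move_to_end(key)
            self.cache_hits += 1
            return self.cache[key]
        value = self.table[key]
        self.cache[key] = value
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict least recently used
        return value


log_path = os.path.join(tempfile.mkdtemp(), "commit.log")
store = TinyStore(log_path)
store.put("call:1", "metadata-a")
store.put("call:2", "metadata-b")

first_read = store.get("call:1")    # cache miss, fills the cache
second_read = store.get("call:1")   # cache hit

recovered = TinyStore(log_path)     # simulate a restart
recovered_value = recovered.get("call:2")
print(first_read, second_read, recovered_value, store.cache_hits)
```

Because every write is appended to the end of the log, nothing on disk is overwritten in place, and the log alone is enough to rebuild the in-memory state after a crash. In a real database the slow path behind the cache would be an on-disk read; here a dictionary stands in for it.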

Data Analysis

A key feature of Frostbite is its ability to perform predictive modeling on the data available. As a case study, imagine a field agent who has been authenticated and authorized by the system using Neuralink and an inconspicuous pair of AR/VR glasses. Through the glasses, data is sent to the system for real-time facial recognition and identification of other potential threats, which can then be dealt with before they materialize, as shown in fig 1.1.


Fig 1.1 Edge communication with the cloud (Chan et al., 2017).

This facial recognition would be achieved using OpenCV, together with algorithms drawn from image processing, human physiology, and pattern recognition. Furthermore, Spark provides MLlib, a rich suite of statistical and Machine Learning (ML) tools for further descriptive, diagnostic, and predictive analytics, such as summary statistics and clustering.
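
As a rough illustration of the kind of analytics mentioned here, the sketch below computes a summary statistic and runs a one-dimensional k-means clustering in plain Python. It does not use the Spark API; the call durations and starting centers are invented, and a production job would run the equivalent on a cluster.

```python
from statistics import mean

# Hypothetical call durations in seconds (illustrative data only).
durations = [30, 35, 40, 600, 620, 640]


def kmeans_1d(values, centers, iterations=10):
    """Basic k-means on scalars: assign to nearest center, then re-average."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)),
                          key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Keep the old center if a cluster ends up empty.
        centers = [mean(cluster) if cluster else centers[i]
                   for i, cluster in enumerate(clusters)]
    return centers, clusters


average_duration = mean(durations)
centers, clusters = kmeans_1d(durations, centers=[0, 1000])

print("mean duration:", average_duration)
print("cluster centers:", centers)
print("clusters:", clusters)
```

The overall mean sits between the two groups and describes neither, while the clustering separates short calls from long ones, which is why the two techniques are usually used together.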

Discovery

Discovery is a feature that powers the search and navigation of data combined from multiple sources. This is essential considering the sheer volume of data expected to be ingested into Frostbite. A discovery engine will, for example, enable an analyst to find all PDF documents on the internet that were created two weeks ago and mention terrorism (Document Cloud, 2022). Elasticsearch, which is built on Apache Lucene, would provide discovery by indexing both structured and unstructured data and allowing for fast and flexible retrieval.
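
Search engines built on Lucene rely on an inverted index, a mapping from each term to the documents that contain it. The sketch below shows the idea in plain Python and answers a query similar to the analyst example above. It is not the Elasticsearch API; the documents, field names, and dates are made up for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical document store; contents and dates are illustrative only.
documents = {
    1: {"type": "pdf", "created": date(2022, 3, 2),
        "text": "report on terrorism financing"},
    2: {"type": "pdf", "created": date(2022, 1, 15),
        "text": "older terrorism briefing"},
    3: {"type": "email", "created": date(2022, 3, 3),
        "text": "terrorism mentioned in an email"},
    4: {"type": "pdf", "created": date(2022, 3, 4),
        "text": "quarterly budget summary"},
}


def tokenize(text):
    """Very naive tokenizer: lowercase and split on whitespace."""
    return set(text.lower().split())


def build_index(docs):
    """Inverted index: term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in tokenize(doc["text"]):
            index[term].add(doc_id)
    return index


def search(index, docs, term, doc_type, created_from, created_to):
    """Look the term up in the index, then filter on structured fields."""
    candidates = index.get(term.lower(), set())
    return sorted(
        doc_id for doc_id in candidates
        if docs[doc_id]["type"] == doc_type
        and created_from <= docs[doc_id]["created"] <= created_to
    )


index = build_index(documents)
hits = search(index, documents, "terrorism", "pdf",
              date(2022, 3, 1), date(2022, 3, 7))
print(hits)
```

The term lookup never scans document text at query time, which is what keeps retrieval fast as the collection grows; the type and date checks then narrow the candidates using structured fields.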

Visualization

As humans, we are better able to discern patterns when we can see them. Computers, on the other hand, are very good at crunching numbers and can quickly find patterns within them. To better understand what the data reveals, it helps to translate it into a visual form (histogram, map, pie chart, etc.), which in turn supports faster decisions. ArcGIS, a geographic information system, will be used to visualize geospatial data on maps, as it supports real-time data and deep image analysis.

Conclusion

Desperate times require desperate measures, and the desperate times we live in today call for proactive monitoring. Frostbite, being at the core of national intelligence, promises far-reaching benefits.

The system will use state-of-the-art technology to continue to enable agile decision-making. Hadoop, Spark, and Cassandra will form the backbone of the system by providing a distributed landscape for data ingestion, processing, and storage. Furthermore, a discovery engine like Elasticsearch will give the needed flexibility for finding needles in a haystack, the needles being targets of interest; some we know, some we don’t. Finally, visualization in the form of maps, statistical charts, and color variations will unlock insights for executives at the top level and agents in the field alike, helping them make faster decisions.

If you learned something new from this article, please like and share
