DEV Community: OLABAYO BALOGUN

Steps to Follow for Container Security

OLABAYO BALOGUN — Sat, 07 May 2022 07:58:49 +0000

Since it hit the market, containerization has streamlined development, application scalability, and a ton of other tasks, so much so that development teams can easily view it as the gift that keeps on giving. Be that as it may, containerization, for all its benefits, has had the unintended consequence of creating application silos. Depending on the scale of your company’s digital operation, this can be a lot to handle, even with the help of Docker.

Container security has become a hot topic in the wake of several cybersecurity attacks, notable among which is the cryptojacking attack on Tesla, which was temporarily successful due to their insufficient container security.

Container security involves bulletproofing your deployment environment and resources and adopting Kubernetes security best practices to secure your Docker, host, and/or cloud deployment resources.

Important Steps For Container Security


Container security isn’t a “once and done” check you do when you’re about to deploy an application. Nor is it a periodic audit of your deployment environment to check for issues or free up space. Effective container security combines policy with real-time container monitoring, leveraging software and professional inspection to prevent external threats from slipping through the cracks.

Some of the best practices for container security are basic rule-of-thumb actions that do not require learning more than you already know. These are the steps we explore in this article.

It’s an open secret that a lot of organizations do not have a proper handle on how many containers they’re running. Massive corporations that are slowly transitioning to a microservices architecture will have a mix of containers and monolithic applications to handle. It’s very important to know the location of every container, as well as its responsibilities, in order to speed up the monitoring and management of your container security.

Creating a Deployment Control Policy

During pre-deployment processes, it’s important to develop policy-based deployment control rules that will be implemented in your deployment environment. By infusing your rules into your Kubernetes object’s properties, you make it easier for other tools that evaluate your deployment control to determine if your image is ready for deployment or if it needs a second look. Little things like this can have a snowball effect if things go wrong. This is why it’s best to err on the side of caution.
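As a rough illustration of a policy-based deployment check, the sketch below evaluates a pod-like specification against two rules. The function name `policy_violations` and the two rules are assumptions made for this example and do not come from any particular tool; the fields `image` and `securityContext.runAsNonRoot` follow the usual Kubernetes pod spec layout.

```python
from typing import Any


def policy_violations(pod_spec: dict[str, Any]) -> list[str]:
    """Return human-readable violations for a pod-like spec (hypothetical rules)."""
    violations = []
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        image = container.get("image", "")
        # Look only at the part after the last "/" so that a registry port
        # (for example "registry:5000/app") is not mistaken for a tag.
        tag = image.rsplit("/", 1)[-1].partition(":")[2]
        # Rule 1 (assumed): images must be pinned by digest or by a non-"latest" tag.
        if "@sha256:" not in image and tag in ("", "latest"):
            violations.append(f"{name}: image '{image}' is not pinned to a version")
        # Rule 2 (assumed): containers must declare that they run as non-root.
        security = container.get("securityContext", {})
        if not security.get("runAsNonRoot", False):
            violations.append(f"{name}: runAsNonRoot is not set")
    return violations
```

A deployment pipeline could refuse to proceed whenever the returned list is non-empty, which gives other tools a single yes/no signal to act on.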

Container Image Scanning

Starting with deployment, container image scanning has to rank as the first thing you need to do regardless of whether you’re deploying to a new operating system or one that already hosts other applications. Image scanning helps reduce the likelihood of deploying applications that are corrupted directly or are compromised as a result of the packages and other dependencies that they’ve been bundled with.

VPNs and MFA

While a lot of best practices dictate that you foolproof your containers against external threats, sometimes internal threats can be more damaging. The use of a VPN (virtual private network) and MFA (multi-factor authentication) for employees who have administrative access to your containers can reduce the likelihood of their user accounts falling into the wrong hands and being used maliciously. This measure is especially important now that working remotely has become the norm.

Using the Right OS

When deciding on operating systems for your containers, it’s best to leverage the OS that best aligns with the nature of your container. Reducing the number of unnecessary packages and resources in an environment will directly reduce the number of possible loopholes that can be exploited. The rule of thumb is that if your container doesn’t interact with a dependency resource, the resource shouldn’t be in your OS. The rise in infected packages justifies the need to trim the excess in your OS.

Container Management

There are times when it’s easy to forget why a microservice architecture is revolutionary when building applications. Knowing when to make a component of your application into a separate service can sometimes be challenging. A mistake at this point of development can result in a container that does too many things and needs to constantly be modified.

Containers must remain lightweight and focused on specific responsibilities in order to reduce the number of modifications they need. This in turn reduces the likelihood of weakening container security by mistakenly introducing resources that compromise the container.

Continuous Monitoring

Deploying your application and having your containers run without any challenges doesn’t mean it’s OK to kick back and relax. It’s important that you continuously monitor your environment and containers, as cybersecurity threats can spring out of nowhere. Moreover, because your applications interact with the internet and with users who have different motives, frequent inspections are needed to confirm that things are the way they should be or to resolve anomalies in your containers.

The use of monitoring tools can’t be overemphasized in container security. Procurement of container security tools can provide round-the-clock security for your containers and react faster to threats as they appear. Because the human ability to actively juggle multiple pieces of information is relatively finite, it’s more pragmatic to saddle container security tools with the bulk of the responsibility of enforcing policies and checking container images during and after deployment.

Conclusion

As the world pivots more applications from the monolithic architecture to microservice architecture, it’s important that cybersecurity is enforced not just in the application, but in the environment that helps the application interact with the broader internet. Protecting your digital infrastructure for the benefit of your clients and your organization must also include container security in order to guarantee holistic immunity to the attacks of hackers and other malignant actors.

There are tools that make the process of securing containers less tedious and it can be really handy to leverage them where necessary, depending on the size of your team and your budget.

Managing YAML Errors for Kubernetes Configuration

OLABAYO BALOGUN — Tue, 22 Feb 2022 06:10:04 +0000

The process of developing and scaling enterprise software solutions across the globe to serve netizens has evolved greatly over the last few decades. The dependence on hardware has rapidly reduced in favor of software optimizations that allow software engineers and development teams to scale up the utility they can derive from hardware installation by as much as is needed to get the job done.

Kubernetes is one such solution that has allowed us to significantly scale software solutions. However, to leverage its benefits, adequate knowledge of Kubernetes configuration is required to ensure that things don’t break in production. YAML is arguably the neatest tool that can be leveraged for providing configuration instructions when running and deploying applications using Kubernetes.

Despite the proven utility of YAML as a data serialization language and its ergonomics, there are unavoidable errors that occur using YAML in Kubernetes configuration. Learning how to avoid or manage these errors will greatly impact your development journey and the stability of your software solution using containerization. In this article, you’ll learn a number of tips for managing YAML errors for Kubernetes configuration.

5 Tips for Managing YAML Errors for Kubernetes Configuration

Keep Your YAML Structure Simple

The entire point of YAML is its simplicity, which is aimed at reducing convoluted code that isn’t manageable. Sometimes, this may mean using more common patterns like:

[Code example image not preserved]

Rather than patterns like:

[Code example image not preserved]

The latter can become harder to read if your list has nested objects or lists within them. All of these lean into the KISS (Keep It Simple Stupid) ideology that is prevalent in the field of software development.

Employ Consistent Conventions

This is arguably more important than the first tip. Code quality is made apparent when multiple professionals work on the codebase, but it looks like one person did it all. It is important to agree on a consistent writing style for your YAML configuration because as the size of your configuration grows, it may become much harder to spot errors in a configuration with multiple writing conventions.

Leverage Linters and YAML Formatters

Having linters can be incredibly helpful for proofreading your code on the go. Also, online YAML formatters can be equally helpful for providing insights on how to correct hard-to-spot errors that may be messing with your Kubernetes configurations. At other times, copying and pasting your YAML code in a new environment helps bring about a new perspective that will be instrumental in finding bugs. Finally, tools like Datree can save a lot of debugging time.
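To give a feel for what a linter looks for, here is a minimal sketch that flags a few common indentation problems in YAML text. It is not a real YAML parser and is no substitute for a proper linter; the function name `lint_yaml_text` and the two-space convention are assumptions made for this example. The tab check reflects the fact that YAML does not allow tabs for indentation.

```python
def lint_yaml_text(text: str, indent: int = 2) -> list[str]:
    """Flag tab indentation, off-convention indentation, and trailing whitespace."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.lstrip(" \t")
        # Skip blank lines and comment-only lines.
        if not stripped or stripped.startswith("#"):
            continue
        leading = line[: len(line) - len(stripped)]
        if "\t" in leading:
            problems.append(f"line {number}: tab used for indentation")
        elif len(leading) % indent != 0:
            problems.append(
                f"line {number}: indentation of {len(leading)} spaces "
                f"is not a multiple of {indent}"
            )
        if line != line.rstrip():
            problems.append(f"line {number}: trailing whitespace")
    return problems
```

Note that this simple check can raise false alarms inside multi-line block scalars, where unusual indentation may be legitimate.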

Leverage Multiple YAML Files

Trying to fit all of your configurations in one YAML file is a recipe for disaster because regardless of how easy YAML is on the eyes, debugging errors will be a lot tougher. Therefore, it is important to make use of multiple YAML files where possible to further decouple your Kubernetes configuration. As a rule of thumb, when trying to debug your YAML errors, it can be incredibly helpful to copy bits of the code into separate files and review them in bits rather than taking on the whole chunk of the YAML configuration in one file.
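One way to review a large configuration in bits is to split a multi-document YAML stream at its `---` separator lines. The sketch below does this with plain string handling; the function name `split_yaml_documents` is an assumption for this example, and the approach deliberately ignores edge cases such as a `---` line that appears inside a block scalar.

```python
def split_yaml_documents(text: str) -> list[str]:
    """Split a YAML stream into documents at lines that contain only '---'."""
    documents = []
    current = []
    for line in text.splitlines():
        if line.rstrip() == "---":
            # Close the current document if it has any non-blank content.
            if any(part.strip() for part in current):
                documents.append("\n".join(current).strip("\n"))
            current = []
        else:
            current.append(line)
    if any(part.strip() for part in current):
        documents.append("\n".join(current).strip("\n"))
    return documents
```

Each returned string can then be written to its own file or pasted into a formatter on its own, which narrows down where an error lives.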

Seek Help from Peers

There are times when your YAML script’s bug is a result of a much deeper problem. In such instances, it can be very helpful to escalate your error messages to more experienced developers who will save you time that would have been wasted running in circles. Bear in mind, however, that you need to mask or remove sensitive data when sharing your code with the public to avoid compromising your digital infrastructure. Platforms like Stack Overflow have a rich community of developers who have either been through what you’re going through or are able to debug your code better than you can.
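Masking can be partly automated before you paste a configuration anywhere public. The following is a minimal sketch that redacts values on lines whose key looks sensitive. The list of key fragments and the name `mask_sensitive_values` are assumptions for this example, and a pattern like this will miss secrets stored under innocuous key names, so a manual read-through is still needed.

```python
import re

# Keys containing any of these fragments are treated as sensitive (assumed list).
SENSITIVE_LINE = re.compile(
    r"^(\s*(?:-\s*)?[\w.-]*(?:password|secret|token|api_?key)[\w.-]*\s*:\s*)(\S.*)$",
    re.IGNORECASE,
)


def mask_sensitive_values(text: str) -> str:
    """Replace the value of every sensitive-looking 'key: value' line."""
    masked = []
    for line in text.splitlines():
        # Keep the key and its indentation, drop the value.
        masked.append(SENSITIVE_LINE.sub(r"\1<REDACTED>", line))
    return "\n".join(masked)
```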

Conclusion

Managing YAML errors during Kubernetes configuration can seem like looking for a needle in a haystack due to how many of the mistakes result directly from human error. More often than not, these errors can be hard to spot because, upon revision of the code, you’re more likely to see what you wanted to write rather than what you actually wrote. This challenge is further compounded by overly verbose YAML configuration files.

In this article, we discussed useful tips to ensure your YAML configuration files leverage the best aspect of YAML, which is its cleanliness. This will enable better configuration. You also learned a number of tips that drive home the need for keeping your configuration files small and decoupled to aid easy debugging and reduce the chances of something going wrong. The time spent following best practices is ultimately recovered in the form of an easily manageable digital infrastructure with minimal technical debt.

I hope you found this post enjoyable and easy to follow. If you have any queries or feedback, please feel free to leave a comment. Good luck!

Introduction to Data Engineering

OLABAYO BALOGUN — Thu, 23 Dec 2021 08:42:38 +0000

Introduction

The year is 2021, and the internet as we know it is transitioning into web 3.0, with web 4.0 becoming more than a concept note. Underneath all the pomp and fanfare, we produce a massive amount of data that needs to be refined and properly categorized before all of the massive digital infrastructures we leverage on a daily basis can work.

Data engineering is a practice where you design and build systems to collect, store, and analyze data at scale. A more relatable explanation is that data engineering is what is done to data that we receive from users (or alternative sources) to ensure that the said data is useful and can be relied upon to make decisions that may result in a product or service.

For many, the term data engineering seems like something that should fall under the purview of data scientists. However, the reality is that data engineers have entirely different responsibilities from what data scientists do. Data engineers create an enabling environment for data scientists to perform their analysis of data.

In the past (and present in some organizations that are behind the times), data scientists would complain about the data they needed to analyze. Most of the data they were working with required a lot of cleanup to be useful. By some accounts, these cleanups made up as much as 80% of their workload and delayed data analysis. Data engineering exists to get rid of this bottleneck.

To the untrained eye, having data scientists clean the data might not look like much, but for organizations in fast-moving industries, such as machine learning, it can be quite a nightmare for data scientists. Data engineering exists to improve the efficiency of data scientists by working with all stakeholders, including software engineers and database administrators.

Applications Of Data Engineering

The applications of data are inexhaustible. But, despite the work of software engineers and other well-meaning professionals, the data that makes its way from applications to databases is far from what data scientists need. While there are processes like input validation put in place by software engineers, a huge amount of data slips through the cracks and must be fixed.

Data Cleansing

If there’s anything this article should have sold you on, it is this: we need data engineers to clean up our data. For many, data is hard to conceptualize. We perceive data as just pictures, music, videos, and word documents floating around cyberspace and possibly on devices. Perceiving data in such a way prevents us from seeing the need for data cleanup.

Data cleansing or data cleaning is when you fix or remove incorrect, corrupted, inaccurately formatted, repeated, or incomplete data within a dataset. Data cleansing is different from data transformation (though some argue that data transformation is encompassed by data cleansing). Data transformation involves changing the format or the structure of data.
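To make the definition concrete, here is a small sketch that fixes inconsistent formatting, drops incomplete or invalid records, and removes repeats. The record fields (`email`, `age`) and the function name `clean_records` are made up for this illustration.

```python
def clean_records(records: list[dict]) -> list[dict]:
    """Normalize emails, drop incomplete or invalid rows, and remove duplicates."""
    seen_emails = set()
    cleaned = []
    for record in records:
        # Fix inaccurate formatting: trim whitespace and lowercase the email.
        email = str(record.get("email") or "").strip().lower()
        if not email or "@" not in email:
            continue  # incomplete or incorrect email
        try:
            age = int(record.get("age"))
        except (TypeError, ValueError):
            continue  # missing or corrupted age
        if email in seen_emails:
            continue  # repeated record
        seen_emails.add(email)
        cleaned.append({"email": email, "age": age})
    return cleaned
```

Given five raw rows where one is a differently formatted duplicate, one has an empty email, and one has a non-numeric age, only the two valid, unique rows survive.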

If data is dirty, then it is unreliable. The impact of unreliable data manifests in algorithms. A recent example of what can happen when algorithms are wrong is Zillow’s $569 million financial scandal, where the algorithm was blamed for the error. However, algorithms work hand in hand with data. So, a bad algorithm is often a function of bad data.

ETL

ETL is an acronym for extract, transform and load. ETL describes the process of taking data from one or more sources, manipulating it, and storing it somewhere else. The process of manipulation of this data changes some of the characters and characteristics of the data. ETL is where data warehousing is called upon.
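The three stages can be sketched in a few lines using only the Python standard library. In this illustration, the CSV column names (`name`, `amount_cents`), the `payments` table, and the function name `run_etl` are all hypothetical; a real pipeline would read from actual source systems and load into a real warehouse.

```python
import csv
import io
import sqlite3


def run_etl(csv_text: str, connection: sqlite3.Connection) -> int:
    """Extract rows from CSV text, transform them, and load them into SQLite."""
    # Extract: read the raw rows from the source.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Transform: tidy up names and convert cents to a decimal amount.
    transformed = [
        (row["name"].strip().title(), round(float(row["amount_cents"]) / 100, 2))
        for row in rows
    ]
    # Load: store the result in the destination table.
    connection.execute(
        "CREATE TABLE IF NOT EXISTS payments (name TEXT, amount REAL)"
    )
    connection.executemany(
        "INSERT INTO payments (name, amount) VALUES (?, ?)", transformed
    )
    connection.commit()
    return len(transformed)
```

Calling `run_etl` with an in-memory database (`sqlite3.connect(":memory:")`) is a convenient way to try the flow without touching any files.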

The amount of liquid capital required to store and manage data gave rise to service models like PaaS (Platform as a Service) and IaaS (Infrastructure as a Service), on which cloud data warehouses are commonly built. ETL has to occur because if a project is big enough, the data it requires will come from different sources, and these sources may have different storage conventions.

At other times, ETL is used to create databases in compliance with laws or when a company decides to change its cloud storage processes. On a small scale, this may not seem like much. However, when you’re dealing with thousands of databases holding millions of datasets, ETL becomes a project in itself.

Feature Engineering

Feature engineering describes the act of identifying and manipulating key component(s) of unprocessed data to create statistical models and predictive models through the use of machine learning. As a result of the amount of esoteric knowledge one needs to have about the dataset, data engineers are the best qualified to carry out feature engineering.
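As a simple illustration, the sketch below derives three model-ready features from a raw property listing. The field names (`price`, `sqft`, `year_built`, `garage_spaces`) and the function `engineer_features` are invented for this example and are not taken from any real dataset.

```python
from datetime import date


def engineer_features(listing: dict, today: date) -> dict:
    """Derive numeric features from the raw fields of one listing."""
    return {
        # Ratio feature: normalizes price by the size of the property.
        "price_per_sqft": listing["price"] / listing["sqft"],
        # Derived feature: age is usually more informative than the build year.
        "age_years": today.year - listing["year_built"],
        # Binary flag built from a count; a missing count is treated as zero.
        "has_garage": int(listing.get("garage_spaces", 0) > 0),
    }
```

Choosing which raw fields to combine, and how, is where knowledge of the dataset matters most.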

Modeling is at the heart of machine learning, artificial intelligence, and other disciplines that require strong predictive analysis. Feature engineering is a critical process that can determine how algorithms behave, and, if Zillow is anything to go by, mistakes at this level can prove costly if not quickly identified.

Conclusion

Data engineering remains indispensable for making sense of the noise we now know as data. The work that goes into clearing the rubble to create order among data sets is one that is well worth its weight in gold now that data is beginning to take on a larger-than-life persona. As we stabilize web 3.0 and try to chart our course for web 4.0, our need for solid footing on not just data, but refined and structured data, can very well be the determining factor in building the internet of the future.

5 Advantages of Data Versioning Technology for AI/ML

OLABAYO BALOGUN — Wed, 08 Dec 2021 18:30:05 +0000

Introduction

Anyone who has ever used a computer can attest to the immeasurable value of ctrl/cmd + z. The ability to undo changes is the greatest attribute of computers—a great tool that provides great possibilities. And its usability in data versioning has been a game-changer in coding.

Data versioning by itself represents some of the ideologies of time travel. One can go back and forth between different versions of data. In this article, we’re going to look at how data versioning can be advantageous for AI/ML projects.

Advantages of Data Versioning Tech for AI/ML

While data versioning has been around a lot longer than the phrase “data versioning”, it’s worth noting that the data versioning technologies built to give this process a more robust use case have a number of advantages that are either underutilized or ignored by AI/ML teams. Some of these “underneath your nose” perks are summarized below.

1) Organization: AI and ML work closely with big data. As a result, they deal with mind-boggling data sets. As data sets become more extensive, it becomes harder and harder to work with them without leveraging the organizational capacity of data versioning. The ability to work with data reduces as the project size increases. This eventually results in a tipping point where the project is unmanageable.

For context, imagine building Google’s search engine with binary language. It’s practically impossible. The use of high-level languages makes the project less herculean. In such massive projects involving multiple teams, data versioning is invaluable.

2) Debugging: The process of debugging in production, remotely or in development, is one that takes a toll on even the most experienced software engineers. Especially when a project has numerous data points, it can be hard to tell where things have gone wrong if there is an issue in a new update. Data versioning makes it easy to compare changes and review impacts.

This is especially important when an update doesn’t have the desired result. With multiple teams working on a project, mistakes during testing or during development can often remain undetected till the project is live. Data versioning is a lifesaver in this regard.
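Comparing two versions of a dataset can be sketched as follows, assuming each version is a dictionary that maps a record ID to its content. The function name `diff_versions` and this data layout are assumptions for illustration; real data versioning tools work on files and much larger volumes.

```python
def diff_versions(old: dict, new: dict) -> dict[str, list]:
    """Report which record IDs were added, removed, or changed between versions."""
    return {
        "added": sorted(key for key in new if key not in old),
        "removed": sorted(key for key in old if key not in new),
        "changed": sorted(
            key for key in new if key in old and new[key] != old[key]
        ),
    }
```

When an update misbehaves, a report like this narrows the search to the records that actually differ between the last good version and the current one.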

3) Reversion of Changes: Something that doesn’t get talked about enough is the impact of being able to revert changes made to a codebase or dataset. While there’s no conclusive evidence to support or adequately measure the impact of being able to revert data, it is a great tool for development teams.

Countless hours that would have been lost to debugging are saved due to the use of data versioning. It helps roll back changes when they have less than the desired effect. Data versioning doesn’t just benefit development teams. It also makes it easier to work with clients who are resistant to change.

Considering the amount of data and resources being expended on AI/ML development projects, the absence of reversion of changes would make collaboration even harder. A lot of manual copying and saving of projects would be needed to gain the benefits data versioning provides.

Carrying technical debt, which means managing a bad situation because of the sunk cost fallacy, and having to restart a project are two hard choices that can be avoided through data versioning. Data versioning technologies make reversion of changes synchronized, seamless, and less error-prone.
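The idea behind reverting can be shown with a tiny in-memory snapshot store that identifies each version by a hash of its content. The class name `DatasetHistory` and its methods are hypothetical and exist only for this sketch; production tools add storage, branching, and deduplication on top of the same basic idea.

```python
import hashlib
import json


class DatasetHistory:
    """Keep immutable snapshots of a dataset so any version can be restored."""

    def __init__(self) -> None:
        self._snapshots: dict[str, str] = {}  # version id -> serialized dataset

    def commit(self, dataset) -> str:
        # Serialize deterministically so identical data gets an identical id.
        payload = json.dumps(dataset, sort_keys=True)
        version = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
        self._snapshots[version] = payload
        return version

    def checkout(self, version: str):
        # Return a fresh copy, so later edits cannot alter the stored snapshot.
        return json.loads(self._snapshots[version])
```

Reverting then amounts to calling `checkout` with the ID of the last version that behaved as expected.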

4) Collaboration: Since AI and ML are still in their infancy, many hands are needed to help build them quickly. Data versioning technology enables teams to collaborate in real time, regardless of distance. This has accelerated the growth of the fields of AI and ML.

The pandemic had a huge impact on the way we work. The field of software development was able to rebound a lot quicker than other fields owing largely to data versioning. Data versioning technology quickly became the new confluence where work could be aligned, and it showed in the increased earning of the top tech companies.

5) Fluid Development: Features like CI/CD (continuous integration and continuous delivery) and extended forms of automated testing are only possible due to the advances that have been made in data versioning. Development is a lot sleeker with less downtime.

Before CI/CD, the rollout of new changes could only occur during off-peak hours and after spending a considerable amount of time informing clients of the impending change. These changes would then be gradually monitored over a period of time, with the development team on standby. Such practices are now a thing of the past.

Conclusion

AI and ML are still developing arms of software engineering. While there’s a lot we don’t know, it seems clear that data versioning will play a huge role in the evolution of AI/ML. As technology and trends change, data will remain the bedrock upon which AI and ML are built.

For these reasons, there’s an implicit and explicit burden placed on development teams to properly leverage data versioning technologies in line with global best practices. This helps ensure that projects that cost millions of dollars and draw on billions of data points don’t end up abandoned due to an increasing difficulty in managing and manipulating data.

Introduction to Typosquatting Attacks

OLABAYO BALOGUN — Tue, 05 Oct 2021 06:28:55 +0000

Introduction

The internet is a digital universe that shares a lot in common with a maze. It’s so easy to lose our way and tumble into a rabbit hole that feels like an abyss. From pop-up ads to other intrusive forms of advertising that redirect us to other platforms, netizens have to contend with a lot of assault to surf the web.

Notable among the many boobytraps on the internet is one called typosquatting. It is a situation where malignant actors and cutthroat marketers register domain names that closely resemble the domain names of organizations and popular entities, with the hope of ensnaring unsuspecting netizens who click a lookalike link or mistype the URL of their intended website.

Many times, typosquatting attacks can be used to redirect to other platforms (a competitor’s website, betting, ads, or pornographic platforms). However, there are times when typosquatting can be used for utterly malignant activities, one of which is automatically installing dangerous software on your device. This can happen by simply visiting these platforms; a strategy termed drive-by download.


Typosquatting is a cyber-attack strategy that is used in every sphere of human interaction. While the business use is more common, there are famous cases where it has been utilized in politics to defame other political candidates and divert potential political donors to the typosquatter’s own cause. This was famously done in the 2020 US presidential election.

Common Typosquatting Attack Strategies


Typosquatting is built on the Achilles heel of the human mind that makes us likely to generalize words to infer their meaning quickly, preserve brain power, and boost efficiency. A lot of research has gone into trying to understand why humans are prone to typographical errors. Hence, it’s safe to say typosquatting attacks will continue to have a disconcerting success rate.

Common methods of typosquatting are:

Omission: This is when a domain name similar to a popular domain name target is registered, albeit with one or more letters missing. One example is registering a domain name like “Gogle.com” in a bid to ensnare users who subconsciously assumed the URL to be “Google.com”.
Addition: This is when one or more extra characters are added when registering a domain name that looks like the target domain name of the typosquatting attack. One example is registering a domain name like “G0oogle.com” because it looks like “Google.com”.
Transposition: This is when a domain name intended to look like the target domain name is registered with the positions of the letters of the URL swapped around. One example is registering a domain name like “Googel.com” because it looks like “Google.com”.
Substitution: This is when a domain name has substituted characters to convince unsuspecting netizens that the two names are the same thing. For example, one can use “G00gle.com” to imitate “Google.com”. Hint: zeroes have been used in the first one. It may or may not be apparent, depending on the font you’re using.
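The four strategies above can be sketched as a small generator of lookalike names, which is also how a defender might list the variants worth registering or monitoring. The function name `typo_variants` and the short table of lookalike characters are assumptions for this illustration. The sketch applies only one edit at a time, and its addition rule only doubles an existing letter.

```python
def typo_variants(domain: str) -> dict[str, set[str]]:
    """Generate single-edit lookalikes of a domain, grouped by strategy."""
    name, _, suffix = domain.lower().partition(".")
    # Illustrative lookalike pairs only; attackers use many more.
    lookalikes = {"o": "0", "l": "1", "i": "1", "e": "3"}
    variants = {
        # Omission: drop one character.
        "omission": {name[:i] + name[i + 1:] for i in range(len(name))},
        # Addition: repeat one character.
        "addition": {name[:i] + name[i] + name[i:] for i in range(len(name))},
        # Transposition: swap two neighboring characters.
        "transposition": {
            name[:i] + name[i + 1] + name[i] + name[i + 2:]
            for i in range(len(name) - 1)
        },
        # Substitution: replace one character with a lookalike.
        "substitution": {
            name[:i] + lookalikes[char] + name[i + 1:]
            for i, char in enumerate(name)
            if char in lookalikes
        },
    }
    # Drop any variant identical to the original (e.g. swapping "oo").
    return {
        kind: {f"{candidate}.{suffix}" for candidate in names if candidate != name}
        for kind, names in variants.items()
    }
```

For `typo_variants("google.com")`, the omission set contains `gogle.com` and the transposition set contains `googel.com`, matching the examples in the list.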

All of the above tricks of typosquatting are a means to an end. Typosquatting is typically the pipeline for executing unethical endeavors like:

Impersonating financial organizations to illegally obtain the financial details of victims for the purpose of theft or money laundering.
E-commerce fraud, which can occur by getting unsuspecting victims to pay for products that don’t get delivered.
Hate pages aimed at demarketing the product or services of a business and defaming public figures.
Hacking unsuspecting users by facilitating a drive-by download of malware or ransomware.
Intrusive advertising, which is aimed at bombarding victims with adverts and redirecting them to an advertising page.
Redirecting victims to betting or pornographic platforms.

Reducing Risk Exposure To Typosquatting Attacks


Due to the ease with which typosquatting attacks can be executed, it may appear like all hope is lost. However, there are cybersecurity tips that can help end-users, business owners, and software engineers avoid these boobytraps. It is worth noting that there is a burden on software engineers and DevOps engineers to reduce the likelihood of success of typosquatting attacks.

Use browsers like Google Chrome that have policies that help ensure netizens can safely surf the internet.
DevOps engineers need to purchase domain names that are similar to the domain name of their client to redirect traffic from those domain names to the client’s main domain name and to forestall typosquatting attacks. For example, “www.google.co” redirects to “www.google.com” (a smart move by the Google team).
Netizens need to avoid clicking on links without reading through them carefully.
Pin your favorite websites to your browser homepage (Google Chrome, Opera Mini, and other browsers do this automatically).
Leveraging development tools: Software engineers typically leverage packages and libraries to build enterprise software solutions. Unfortunately, this can sometimes expose the developer and software solution to unintended typosquatting attacks.

Because a lot of packages and libraries consumed have names that sometimes defy normal language conventions, developers are less likely to carefully scrutinize resources before using them. This risk can have damaging consequences if things go wrong. As a result, many teams are beginning to rely on software solutions that help minimize this risk.
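A basic version of this kind of check can be sketched with the standard library’s `difflib`: flag any requested dependency that is not on a trusted list but is suspiciously close to a name that is. The function name `suspicious_dependencies`, the similarity cutoff of 0.8, and the trusted list are assumptions for this example; this is not how any specific security tool works.

```python
import difflib


def suspicious_dependencies(
    requested: list[str], trusted: list[str], cutoff: float = 0.8
) -> dict[str, str]:
    """Map each possibly typosquatted dependency to the trusted name it resembles."""
    trusted_names = set(trusted)
    findings = {}
    for name in requested:
        if name in trusted_names:
            continue  # exact match with a known-good package
        # Find the closest trusted name, if any is similar enough.
        matches = difflib.get_close_matches(name, trusted, n=1, cutoff=cutoff)
        if matches:
            findings[name] = matches[0]
    return findings
```

Running such a check before installing new dependencies gives a developer a prompt to stop and read the package name one more time.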

Tools like WS Diffend (a free tool for Ruby applications) help protect development teams and their solutions from falling prey to cyber threats that arise as a result of digital handshakes with third-party (and sometimes open source) dependencies. As a result of the number of dependencies that developers use, tools like WS Diffend are a must.

WS Diffend is particularly notable because it scans for risks in Ruby applications (with a JavaScript version nearing release) and offers robust, end-to-end cybersecurity that guards against typosquatting attacks, accidental injections, botnet code injections, viruses, package tampering, Ruby CVEs, dependency confusion, etc.

Conclusion

Typosquatting is one of those obvious threats that are still easy to fall for. It’s surprising how the simplest tricks work so well. We are all busy professionals who make thousands of decisions daily. As such, it’s common to make split-second decisions or skip out on reading things clearly before clicking or visiting websites.

There is an explicit burden placed on software engineers to help reduce the success rate of typosquatting by collaborating with cybersecurity experts and leveraging tools like WS Diffend in foolproofing software solutions.