DEV Community: STX Next

Python vs. Node.js: Comparing the Pros, Cons, and Use Cases

Adam Przewoźny — Sun, 29 May 2022 12:38:59 +0000

Originally written by Adam Stempniak and Jacek Mirowski

We love Python—that much is clear.

But is it a miracle drug to remedy all your software development challenges?

No, it’s not. And if I were you, I’d be suspicious of anyone telling you otherwise.

Whether it’s building software or doing just about anything in life, it’s rare that you can apply the same solution to every problem.

Each software project comes with its own unique set of needs and requirements. What works for one may not work for another at all. At STX Next, we use whatever tech stack fits a given project best.

That being said, comparisons are inevitable. After all, there are so many programming languages and frameworks to choose from that you can’t be faulted for wanting a little help in picking the one that is right for you.

Right alongside Python vs. Golang or Python vs. Java, one of the most popular queries we’ve seen lately is Node.js vs. Python. We’re gonna shed some light on that.

Read on for our in-depth look at Python and Node.js to learn their differences and similarities, strengths and weaknesses, and most importantly: which is better?

What is Python and what is Node.js?

Before we jump to specifics, we should clarify what it is we’re actually comparing.

Python is a programming language; Node.js is not. The language here is JavaScript, while Node.js is a runtime environment for JavaScript.

In practice, this key difference means that with Node.js, you use the same language, JavaScript, for both the frontend and the backend.

Without further ado, here’s a more detailed breakdown of both choices.

What are the advantages of Python?

Python is much friendlier to junior developers

Inexperienced Node.js developers can easily slow down development by making mistakes common to people who don’t fully understand some of the more advanced concepts and workings of JavaScript.

This may be attributed to the way JavaScript has advanced over the years. Concepts like object-oriented programming received meaningful traction only after plenty of far less elegant code snippets and tutorials had spread online.

Python, on the other hand, is very beginner-friendly, which is likely why it’s currently the most popular introductory language at U.S. universities. Most of the mistakes junior Python developers may make in some frameworks are more forgivable and less of a blocker moving forward.

Most Python frameworks don’t require a very high skill level from the developers

A good example of that is Django, which has a mature ecosystem and allows you to write quality code fast.

Python is more universal and versatile

While it can also be used for desktop apps, thanks to Electron, the main use case for Node.js is the web. The applications of Python, however, go far beyond the web.

Python is a major contender to become the leading programming language of data science. It’s a great asset to system administrators because it allows them to easily write small, one-off scripts, while also powering larger sysadmin tools like Ansible.

For a desktop example, look no further than Dropbox, one of the most widely used desktop apps written in Python.
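To illustrate the kind of small, one-off sysadmin script mentioned above, here is a minimal sketch that lists the largest files under a directory using only the standard library. The function name `largest_files` and its default limit are our own illustrative choices, not part of any existing tool.

```python
from pathlib import Path


def largest_files(root: str, limit: int = 5) -> list[tuple[int, Path]]:
    """Return up to `limit` (size_in_bytes, path) pairs under `root`, largest first."""
    found = []
    for path in Path(root).rglob("*"):
        try:
            # Only regular files count; rglob already walks subdirectories for us.
            if path.is_file():
                found.append((path.stat().st_size, path))
        except OSError:
            # A file may vanish or become unreadable mid-scan; just skip it.
            continue
    found.sort(key=lambda item: item[0], reverse=True)
    return found[:limit]
```

A throwaway call such as `largest_files("/var/log", limit=10)`, printed in a loop, is about all it takes to answer "what is filling up this disk?".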

Python is simpler to use, better covered, and better documented

This gives Python an edge over Node.js, even though both technologies are admittedly very fast to write in.

Node.js is built on JavaScript, a language with a long history that is predominantly used as a frontend tool. Therefore, JavaScript solutions found online may be inadequate for backend use or rely on browser interfaces, such as the DOM, that aren’t available in Node.js.

This cannot be overstated, since each JavaScript framework or runtime environment tends to have slight deviations in how it implements JavaScript. The situation has admittedly improved over the years, but it’s still a far cry from Python, where CPython, a single implementation, is used in the vast majority of Python projects.

What are the advantages of Node.js?

Node.js developers are more flexible

Because Node.js uses the same language as the frontend, every Node.js developer is a JavaScript developer and can, in principle, work on both sides of an application.

"Node.js is a perfect tool if you want to rapidly develop your application. That's because you can use the same language (JavaScript) to develop the backend and the frontend sides of the app. Therefore, one programmer can implement the whole feature easily on their own, without the need to know another language. This also implies a reduction in development costs."

David Solomon, Node.js developer at STX Next

This interdisciplinarity can come in handy when you least expect it, especially when there’s a fire you need to put out quickly.

But let’s leave that eventuality aside for now. The fact remains that if you have solid project understanding and the right people to build it, you can’t go wrong with JavaScript and Node.js.

"What I like about Node.js is that it uses the same language that I use on the frontend. This allows me to use the same libraries and tools (including TypeScript!) for both parts of my application, which results in a smaller tech stack. Additionally, I don't need to do almost any request data processing, as it is most often in JSON format, which is just JavaScript objects, ready to be used. That is what makes Node.js cooperation with frontend applications perfectly splendid."

Przemysław Lewandowski, Senior JavaScript developer at STX Next

The ecosystem of Node.js is less opinionated

Many Node.js packages are simple, single-purpose libraries and microframeworks, forcing developers to make more conscious decisions about what to use and when.

This makes Node.js more demanding and requires a higher level of advancement on the developers’ side than what it takes to write code in Python on top of Django, using built-in solutions for ORM, caching abstraction, and so on.

With Node.js, you can use JavaScript to code everything

Node.js allows you to code both the frontend and the backend using JavaScript. This means you don’t need that many different technologies, which in turn means you don’t need your developers to learn yet another implementation of the same programming paradigms.

At the stage of assembling the team, you often don’t know how many Python or JavaScript developers you’re going to need. Your needs may also differ from Sprint to Sprint (if you’re working in Scrum, that is), depending on your goals.

Using the same language for the whole project removes that risk from the equation. You can even share part of the code between the frontend and the backend, which is a huge benefit: you don’t waste time doing the same thing twice.

The Node.js community is large and JavaScript is one of the most dynamic and fastest-growing programming languages out there

JavaScript has been growing at an exponential rate, with over 500 new packages being produced every day. Python no longer has the advantage of numerous libraries and frameworks it used to enjoy for many years, as JavaScript has caught up to it on that front by now.

"What I like most about Node.js is that it's JavaScript. There is one language for the frontend and the backend. Another advantage of Node.js is that it's easy and popular, and it comes with plenty of packages, which actually also applies to Python."

Bartosz Marciniec, Node.js developer at STX Next

Alas, it’s not all sunshine and rainbows for JavaScript—but we’ll get to that in a second. First, an infographic!

Node.js vs. Python: an infographic

Okay, let’s take a breather for a quick recap before we go on with our comparison of Python and Node.js.

Here’s a visual summary of everything we’ve covered so far:
There, all caught up!

Let us now move on to discussing a particular issue that JavaScript—and, by extension, Node.js—is facing.

Problems with JavaScript and Node.js

JavaScript may be developing super fast now, but that wasn’t always the case.

In the beginning, the language was written haphazardly, and to this day it still struggles with backward-compatibility issues caused by its old versions.

JavaScript’s recent rise in popularity has brought with it another curious downside.

So many developers have turned to the language in such a short period of time that it’s pretty difficult to keep up with all the new updates and technical intricacies.

The rapid growth of JavaScript libraries brings documentation problems with it, which in turn lowers the quality of many Node.js projects. That is precisely why more skilled developers are necessary on the backend: handling it well requires more preparation beforehand.

Sadly, this is nothing new for JavaScript; it’s actually quite typical. If history has taught us anything, it’s that Python has always been more reliable.

What should you especially consider when comparing Node.js and Python?

Comparing technologies is always a challenge, and the question of Python vs. Node.js is no different.

Here are 3 main points, each focusing on a different side of the story, that will help you make an informed decision.

Trending technologies

Python is perfectly suited for trending technologies, especially machine learning.

Python is a major player in the world of data science and offers several tried and tested libraries that support ML.
It’s easier to find machine learning experts who are well versed in Python than JavaScript.
MicroPython—a lighter and smaller version of Python—can be run with less power and fewer resources, making it a perfect fit for IoT devices (JavaScript has an equivalent in the form of Espruino, but it’s significantly less popular).

On the other hand, Node.js is more comfortable to use for the Internet of Things when you compare the most popular IoT libraries for Python and JavaScript.

As always, the choice is yours, and it depends on what you’re trying to build.

Node.js allows you to use new technological trends earlier, though it comes with a risk: you may need to rewrite your entire project later. For long-term projects, Python is far less risky.

Why is that the case? Because the ecosystem of JavaScript seems like anarchy when compared to Python. Each JavaScript user is pulling in their own direction, in a sense, which results in substantial trend fluidity. Because of that, technologies like Isomorphic JavaScript or Meteor become outdated much quicker.

It’s different with Python. Significant changes are introduced slowly, sometimes incredibly so. What else would you call the decade-long support for Python 2.7?

This instability and unpredictability of JavaScript is precisely why Python is the safer choice.

Speed and performance

Here’s the thing about Node.js: it can’t do many things at the same time unless your code is written very well. Both Python and Node.js only pretend to do a whole lot of things at once, but Python applications tend to use a simpler, more traditional multiprocessing model rather than the more advanced asynchronous paradigm.

Use Node.js poorly, and you might easily end up with a slow, low-performance product. If your project involves a particularly time-consuming task, the app will put all its focus there while other functionality lags behind.

When your code is written correctly, it tells the operating system to perform a certain operation (such as reading a file or querying a database) and moves on to other work while the system handles it. If it’s written poorly, the app waits for the system to complete that task, doing nothing else in the meantime.

For the end user, this will seem like your software is slow. Such problems notoriously occur when a lot of people use your product at the same time.
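To make the difference concrete, here is a minimal Python sketch using the standard library’s asyncio module, which follows a single-threaded event-loop model similar to Node.js. `asyncio.sleep` stands in for a slow I/O call such as a database query; the handler names and the 0.2-second delay are arbitrary assumptions for illustration. The well-written handler hands the wait over to the event loop and moves on, while the poorly written one calls the blocking `time.sleep`, so the whole loop sits idle until it returns.

```python
import asyncio
import time
from collections.abc import Awaitable, Callable

DELAY = 0.2  # hypothetical duration of one slow I/O operation, in seconds


async def good_handler() -> None:
    # Yields control to the event loop while "waiting for the OS".
    await asyncio.sleep(DELAY)


async def bad_handler() -> None:
    # Blocks the only thread: no other request can make progress meanwhile.
    time.sleep(DELAY)


async def serve(handler: Callable[[], Awaitable[None]], requests: int) -> float:
    """Run `requests` simultaneous requests and return the elapsed wall time."""
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(requests)))
    return time.perf_counter() - start


def compare(requests: int = 5) -> tuple[float, float]:
    """Return (non_blocking_seconds, blocking_seconds) for the same workload."""
    good = asyncio.run(serve(good_handler, requests))
    bad = asyncio.run(serve(bad_handler, requests))
    return good, bad
```

With five simulated requests, the non-blocking version finishes in roughly one delay (about 0.2 s), while the blocking version needs roughly five (about 1 s), because every request waits for the one before it. That is the "slow software" your users would notice under load.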

A huge benefit of Python is that some of its frameworks are specifically designed to spare you the trouble. Granted, Django will also work slowly if written poorly, but it has built-in solutions to handle high load that make it easier to prevent that outcome. That is only one of many instances when Python puts fewer technical expectations on the developers.

The main difference is that Node.js is designed to use a small number of workers. This is why it may slow down easily when some of the workers hang. However, it may also perform much better due to not wasting time on context switching between them.

The opposite approach is to use many workers. In this case, when some of them hang, the users served by the rest of them don’t suffer. Recent Node.js releases also support worker threads for CPU-bound workloads; the feature started out as experimental but has since been marked stable.
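The many-workers approach can be sketched with the standard library’s `concurrent.futures`. This is a simplified illustration, not how any particular server works: Python web servers such as Gunicorn commonly run several worker processes, whereas this sketch uses a thread pool to stay short and portable. One hypothetical request hangs for a second, and we record the order in which the requests complete.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def handle_request(request_id: int, hang: bool = False) -> int:
    """Simulate serving one request; a 'hanging' request takes far longer."""
    time.sleep(1.0 if hang else 0.05)
    return request_id


def serve(workers: int) -> list[int]:
    """Submit one hanging request plus seven normal ones; return completion order."""
    completed = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(handle_request, 0, hang=True)]
        futures += [pool.submit(handle_request, i) for i in range(1, 8)]
        for future in as_completed(futures):
            completed.append(future.result())
    return completed
```

With `serve(workers=4)`, request 0 finishes last: the three free workers serve the other seven users in a fraction of a second while it hangs. With `serve(workers=1)`, everyone queues behind the hung request, which is the failure mode described above for a design with too few workers.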

"Node.js is really good for developing real-time applications. It’s also quite easy to learn, which makes it straightforward to become a full-stack developer thanks to JavaScript. Unfortunately, Node.js is single-threaded so we have some cases when it’s not advisable to use that environment. Besides, Node.js is slow when we compare it to, for instance, Go."

Kacper Małkowski, Node.js developer at STX Next

Team composition

Like we said before, each project is special and has its own needs. You need to understand those needs to build it successfully.

Truth is, Python is better suited for some projects and JavaScript for others. Let’s not forget that building a software product is a fluid process. You often end up adapting your tech stack as you go along—usually the frameworks, but sometimes the language, too.

However, the most important thing is your team composition.

Do you have good Python developers? Use Python!

Do you have good Node.js developers? Use Node.js!

Some team members work on one part of the project, some on the other. Sometimes team composition is decided upfront, but needs to be changed on the go, both on the frontend and the backend.

Who you have on your team dictates both the choice of language and the choice of frameworks, above everything else.

Naturally, the problem solves itself if you’re lucky enough to have full-stack developers who know both Python and JavaScript on your team. Those, however, are in short supply and it’s much more common to be working with people who are well versed in one or the other.

Is Python or Node.js better?

Now, that’s the million-dollar question!

Truth be told, the winner is… neither?

We do get that you were hoping for a short-and-sweet takeaway to help you make your choice, so here it goes:

If you only have a group of junior developers with little experience on hand, go with Python; if your team is more skilled and accomplished, choose Node.js.

But there’s more to it than that.

Even though Node.js favors seasoned players in the software engineering game, expert developers claim that it doesn’t actually offer them a whole lot in return. Yes, they need to call upon their expertise to use Node.js effectively, but it doesn’t really matter to them which of the two technologies they pick.

So in the grand scheme of things, you could say that Python wins, because it doesn’t discriminate against junior developers.

However…

Final thoughts on Node.js and Python

The bottom line is what the expert developers say: at the end of the day, it doesn't make that much of a difference whether you choose Python or Node.js.

Some Python libraries, like the standard library’s asyncio module, allow you to work in Python the same way you would in Node.js. It’s possible to make the experience really similar if you care about it enough.
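For a rough idea of what that looks like, here is a minimal asyncio TCP server that upper-cases and echoes one line per connection, loosely the Python counterpart of a Node.js `net.createServer` callback. It is only a sketch: the `handle_client` and `demo` names are our own, port 0 simply asks the OS for any free port, and a real service would add error handling and timeouts.

```python
import asyncio


async def handle_client(reader: asyncio.StreamReader,
                        writer: asyncio.StreamWriter) -> None:
    # Called once per connection, much like a Node.js connection callback.
    data = await reader.readline()      # non-blocking wait for one line of input
    writer.write(data.upper())          # echo it back in upper case
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def demo() -> bytes:
    """Start the server, send it one line as a client, and return the reply."""
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(b"hello from a client\n")
        await writer.drain()
        reply = await reader.readline()
        writer.close()
        await writer.wait_closed()
    return reply
```

Running `asyncio.run(demo())` returns `b'HELLO FROM A CLIENT\n'`. As in Node.js, the handler never blocks the loop, so many such connections can be in flight at once on a single thread.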

The journey may differ, but the destination can be very much the same. From a certain point of view, comparisons such as “Python vs. Node.js” are maybe just a little bit… pointless.

Why? Because it all comes down to your team.

The language and the frameworks you choose may not necessarily be better, but it won’t matter as long as you have the right people on the team.

Talk to your team members. Ask them questions.

How do they feel about the choice of language?

Would they prefer to work in Python or Node.js?

Don’t pay that much attention to tool selection; pay all the attention to team composition.

Your team is what makes or breaks your software project.

R vs. Python: What’s the Real Difference Between R and Python?

Adam Przewoźny — Sun, 22 May 2022 14:21:10 +0000

Originally written by Susan Johnson and Maciej Urbański

The swift proliferation of data into our lives has resulted in the rise of tools used to analyze and extract valuable insights from this information. Python and R are the two most popular programming languages used to dissect data. If you’re venturing on a new data science project, choosing between them can be challenging.

Both R and Python are state-of-the-art in terms of their orientation toward data science excellence, making it a tough decision to find the better option. If you used a Venn diagram to map the capabilities of the two languages, you would see a lot of overlap in the data-focused fields.

Nevertheless, Python and R have varying strengths and weaknesses. They also take a different approach to developing code and sharing results.

Learning about both Python and R is the best way to choose the right language for you. To help you do just that, we wrote this article. Below, we’ll discuss:

the differences and similarities of the two languages,
their advantages and disadvantages,
what the future has in store for them.

What is R? What is R used for?

Developed by Ross Ihaka and Robert Gentleman more than two decades ago, R is an open-source programming language and free software that possesses one of the richest ecosystems to perform statistical analysis and data visualization.

R features a broad catalog of statistical and graphical methods, including linear regression, time series, machine learning algorithms, statistical inference, and more. Additionally, it offers complex data models and sophisticated tools for data reporting.

R is popular among data science scholars and researchers, and there’s a library for almost every analysis you may wish to perform. In fact, the extensive array of libraries makes R the top choice for statistical analysis, particularly for specialized analytical work. Many multinational corporations (MNCs), such as Facebook, Uber, Airbnb, and Google, use R.

Data analysis with R is completed in a few short steps—programming, transforming, discovering, modeling, and then communicating the results. When it comes to communicating the findings, this is where R truly stands out. R has a fantastic range of tools that allows sharing the results in the form of a presentation or a document, making reporting both elegant and trivial.

Typically, R is used within RStudio—an integrated development environment (IDE) that simplifies statistical analysis, visualization, and reporting. But that’s not the only way to run R. For instance, R applications can be used directly and interactively on the web through Shiny.

What is Python? What is Python used for?

Python is an object-oriented, general-purpose, high-level programming language that was first released in 1991. It emphasizes code readability through its use of significant whitespace, meaning indentation defines code blocks. All in all, it was designed to be comparatively intuitive to write and understand, making Python an ideal language for those looking for quick development.

Some of the world’s largest organizations, including NASA, Netflix, Spotify, and Google, leverage Python in some form to power their services. According to the TIOBE index, Python ranks among the three most popular programming languages in the world, alongside Java and C. Various reasons contribute to this achievement, including Python’s ease of use, its simple syntax, thriving community, and most importantly, versatility.

Python can be used for various projects, from data analytics and visualization to artificial intelligence, language development, design, and web development.

Python is especially great for deploying machine learning at a large scale, thanks to libraries such as TensorFlow, scikit-learn, and Keras, which enable the creation of sophisticated data models that can be plugged directly into a production system.

Additionally, a lot of Python libraries support data science tasks, like the ones listed below:

Astropy—a library featuring functionalities that are ideal for use in astronomy
Biopython—a collection of non-commercial Python tools to represent biological sequences and sequence annotations
Bokeh—a Python interactive visualization library that helps create interactive plots, dashboards, and data applications quickly
DEAP—an evolutionary computation framework perfect for rapid prototyping and testing of ideas

(Looking for more examples of useful Python scientific libraries? Read all about them on our blog.)

The differences between R and Python

If you’re planning to choose either Python or R for your next software project, it’s essential that you know the different features of both languages so you can make an informed decision. Here are the primary differences between R and Python.

1. Learning curve

Generally, the ease of learning would primarily depend on your background.

R is quite hard for beginners to master due to its non-standardized code. The language looks clunky and awkward even to some experienced programmers. On the other hand, Python is easier and features a smoother learning curve, though statisticians often feel that this language focuses on seemingly unimportant things.

So, the right programming language for your data science project will be the one that appears closer to the way of thinking about data you’re used to.

For instance, if you prefer ease and time-efficiency over everything else, then Python might seem more appealing to you. The language demands less coding time, thanks to its English-like syntax.

It’s a running joke that the only thing pseudo-code needs to become a Python program is being saved in a .py file. This allows you to get your tasks done quickly, leaving you more time for the analysis itself. R, by contrast, takes longer to learn.
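To illustrate the joke, here is a short, self-contained function; the task (keeping the readings above a threshold and averaging them) and every name in it are invented for the example. Read aloud, it is close to how you would describe the steps in plain English.

```python
from statistics import mean


def summarize_high_readings(readings: list[float], threshold: float) -> dict:
    """Keep the readings above the threshold and report how many there are and their mean."""
    high_readings = [reading for reading in readings if reading > threshold]
    if not high_readings:
        return {"count": 0, "average": None}
    return {"count": len(high_readings), "average": mean(high_readings)}
```

For example, `summarize_high_readings([1.0, 5.0, 7.0, 3.0], threshold=4.0)` returns `{"count": 2, "average": 6.0}`.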

2. Popularity

Python and R are both popular. However, Python is used by a broader audience, and compared to Python, R is considered a niche programming language. As stated earlier, many organizations use Python for their production systems.

R, on the other hand, is generally used in academia and research. Though industry users favor Python, they are starting to consider R due to its prowess in data manipulation.

3. Packages

Both R and Python offer thousands of open-source packages you can readily use in your next project.

R offers CRAN (the Comprehensive R Archive Network), which often holds hundreds of alternative packages for a single task, but they are less standardized. As a result, their APIs and usage vary greatly, making them hard to learn and combine.

Additionally, the authors of highly specialized packages in R are often scientists and statisticians and not programmers. This means the outcome is simply a set of specialized tools designed for a specific purpose, such as DNA sequencing data analysis or even broadly defined statistical analysis.

However, R’s packages are less mix-and-match than Python’s. Currently, some attempts are being made to orchestrate suites of tools, like tidyverse, which gather packages working well together and using similar coding standards. When it comes to Python, its packages are more customizable and efficient, but they’re typically less specialized toward data analysis tasks.

Nevertheless, Python does feature some solid tools for data science like scikit-learn, Keras (ML), TensorFlow, pandas, NumPy (data manipulations), matplotlib, seaborn, and plotly (visualizations). R, on the other hand, has caret (ML), tidyverse (data manipulations), and ggplot2 (excellent for visualizations).

Furthermore, R has Shiny for rapid app deployment, while with Python, you will have to put in a bit more effort; the closest equivalent is Dash, a framework for building analytical web apps. Python also has better tools than R for integrating with databases.

In simple words, Python will be the ideal choice if you’re planning to build a full-fledged application, though both choices are good for a proof of concept. R comes with specialized packages for statistical purposes, and Python is not nearly as strong in this particular field. Additionally, R is very good at manipulating data from most popular data stores.

Another aspect worth mentioning here is maintainability. Python allows you to create, use, destroy, and duplicate a wild and vibrant menagerie of environments, each with different packages installed. With R, this is more of a challenge, only exacerbated by package incompatibilities.
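As an example of how lightweight this is, Python’s standard library ships the `venv` module, so an isolated environment can be created with `python -m venv <dir>` on the command line or programmatically, as in this minimal sketch. The `make_env` helper is our own illustrative name, and `with_pip=False` is passed only to keep the example fast and offline; in practice you would usually include pip so you can install packages into the environment.

```python
import venv
from pathlib import Path


def make_env(target: Path) -> Path:
    """Create an isolated virtual environment at `target`; return its config file."""
    # with_pip=False skips bootstrapping pip, keeping this demo quick and offline.
    venv.create(target, with_pip=False)
    # Every environment is described by a pyvenv.cfg file in its root directory.
    return target / "pyvenv.cfg"
```

Because an environment is just a directory, "destroying" it means deleting that directory, and "duplicating" it usually means recreating it from a recorded list of packages.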

Experts often use Jupyter Notebook, a popular tool for scripting, rapid exploration, and sketch-like iterative code development. It supports both R and Python kernels, but it’s worth mentioning that the tool itself originated in the Python ecosystem.

4. Visualizations

R was explicitly created for data analysis and visualization. Hence, its visualizations tend to be easier on the eyes, while Python’s many visualization libraries can make plotting complex. In R, ggplot2 makes customizing graphics far simpler and more intuitive than Matplotlib does in Python.

However, you can overcome this issue with Python using the Seaborn library that offers standard solutions. Seaborn can help you achieve similar plots to ggplot2 with relatively fewer lines of code.

Overall, there are disagreements about which programming language is better for creating plots efficiently, clearly, and intuitively. The ideal software for you will depend on your individual programming language preferences and experience. At the end of the day, you can leverage both Python and R to visualize data clearly, but Python is more suited for deep learning than data visualization.

5. Speed and performance

Python’s concise, high-level syntax makes it a strong choice if you’re planning to build critical applications fast. On the other hand, R often requires longer code for even simple processes, which significantly increases development time.

When it comes to execution speed, the difference between Python and R is minute. Both programming languages are capable of handling big data operations.

Though neither R nor Python is as fast as some compiled programming languages, both circumvent this issue by allowing C/C++-based extensions. Additionally, the communities of both languages have implemented data-managing libraries that leverage this feature.

This means data analysis in Python and R can be done at C-like speed without losing expressivity or dealing with memory management and other low-level programming concepts.

Python vs. R: Advantages and disadvantages

Both Python and R have pros and cons. A few of them are noticeable, while others can easily be missed.

Advantages of R

R is a comfortable and clear language for professional programmers, since it was mainly created for data analysis. Therefore, most specialists are familiar with how the language works.
Checking statistical hypotheses only takes a few lines of code with R, as many functions necessary for data analysis come as built-in language functions. (But remember that this does come at the cost of customizability.)
RStudio (IDE) and other essential data processing packages are easy to install.
R has a rich set of data structures, parameters, and operators covering many needs, from arrays and matrices to recursion and loops, as well as integration with other programming languages like Fortran, C, and C++.
R is primarily used for statistical computations. One of its primary highlights is a set of algorithms for machine learning engineers and consultants. In addition, it is used for classification, linear modeling, time series analysis, clustering, and more.
R offers an efficient package repository and an extensive array of ready-made tests for almost all types of data science and machine learning.
There are multiple quality packages for data visualization for various tasks. For example, users can build two-dimensional graphics and three-dimensional models.
Basic statistical methods are executed as standard functions that boost the development speed.
With R, you can find numerous additional packages for every taste—whether you want a package with data from Twitter or one for modeling pollution levels. Every day, more and more packages reach the market, and all of them are collected under a single roof: the special CRAN repository.

Disadvantages of R

Like any other programming language, R comes with a few disadvantages.

Typically, R offers low performance, though you’ll still be able to find packages in its ecosystem that allow a developer to improve the speed.
Compared to other programming languages, R is highly specialized, meaning skills in it can’t be as easily applied to other fields than data processing.
As most of the code in R is written by people who aren’t familiar with programming, the readability of quite a few programs is questionable. After all, not every user sticks to the guidelines of proper code design.
R is the perfect tool for statistics and standalone applications. However, it doesn’t work that well in areas where traditional general-purpose languages are used.
You can use the same functionalities of R in various ways, but the syntax for several tasks isn’t entirely obvious.
As there’s an extensive number of R libraries, the documentation for some of the less popular ones is incomplete.

Advantages of Python

Python is widely used for its simplicity, but that doesn’t mean it has low functionality.

Being a multipurpose language, Python is great for data processing. The language comes in handy there especially because it facilitates easy development of a data processing pipeline where the results are incorporated into web applications.
Programmers find Python particularly beneficial due to its interactivity that’s crucial for testing hypotheses interactively in data science.
Python is being actively developed. With every new version, performance and syntax keep improving. For instance, version 3.8 introduced the walrus operator (:=), which is quite the event for any language. In a language like C++, the rate of change is comparatively slower: changes must be approved by an ISO standards committee, and a new version of the standard is published only every three years. Python changes are proposed through PEPs (Python Enhancement Proposals) and often make it into the language within a single release cycle, which is one year. In simple words, this means Python is evolving faster than R.
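As a small illustration of both points, here is a sketch of a data processing step whose output is ready to hand to a web application as JSON, using the walrus operator (`:=`, Python 3.8+) to assign and test a value in a single expression. The CSV columns (`city`, `temp_c`), the sample rows, and the `average_by_city` name are invented for the example.

```python
import csv
import io
import json

SAMPLE_CSV = """city,temp_c
Warsaw,21.5
Poznan,
Krakow,23.0
Warsaw,19.5
"""


def average_by_city(csv_text: str) -> str:
    """Average temperature per city, serialized as JSON for a web frontend."""
    readings: dict[str, list[float]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Walrus operator: bind the stripped field and skip the row if it is empty.
        if not (raw := row["temp_c"].strip()):
            continue
        readings.setdefault(row["city"], []).append(float(raw))
    averages = {city: sum(values) / len(values) for city, values in readings.items()}
    return json.dumps(averages, sort_keys=True)
```

Here `average_by_city(SAMPLE_CSV)` returns `'{"Krakow": 23.0, "Warsaw": 20.5}'`; the row with a missing temperature is skipped, and the resulting string can be returned directly from a web endpoint.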

Disadvantages of Python

When it comes to choosing software for data analysis, visualization is a vital capability you should consider. However, while Python has an extensive list of visualization libraries, choosing one can be overwhelming. Furthermore, visualization in Python is often more complicated than in R, and the results are sometimes not entirely clear.
Python lacks alternatives for most R libraries, which makes statistical data analysis and/or R-to-Python conversion challenging.

The future of Python and R

As far as programming languages go, there’s no denying that Python is hot. Though it was created as a general-purpose scripting language, Python quickly evolved to be the most popular language for data science. Some even began to suggest that R is doomed and destined to eventually be replaced completely by Python.

However, while Python might appear to be consuming R, the R language is far from dead. Regardless of what the naysayers claim, R is making a furious comeback into the data science arena. The popularity indexes continue to show this programming language’s repeated resurgence and prove that it’s still a strong candidate to consider in data science projects.

Ever since its advent, R has consistently risen in popularity in the world of data science. From its #73 spot in December 2008, R became the 14th most popular language in August 2021 on the TIOBE index. On the other hand, Python took over the second position from Java this year, hitting an 11.86% popularity rating. Meanwhile, R had a popularity rating of 1.05%, a decrease of 1.75 percentage points from the previous year.

“Although R is still used by academics and data scientists, companies interested in data analytics are turning to Python for its scalability and ease of use,” wrote Nick Kolakowski, senior editor at Dice Insights. “Relying on usage by a handful of academics and nobody else might not be enough to keep R alive. That’s not viable,” he added.

Similarly, Martijn Theuwissen, the co-founder of DataCamp, admits that Python has momentum. However, he denies the assertion that R is dead or dying. According to him, “Reports of R’s decline are greatly exaggerated. If you look at the growth of R, it’s still growing. Based on what I observe, Python is growing faster.”

Many other data points also suggest that Python’s success over the years has come at the expense of R. Nevertheless, measuring the popularity of a language is extremely difficult. Almost every language has a natural lifecycle, and there is no foolproof way to pinpoint when it might end or to predict the exact future of any given language.

Summary and final thoughts on R vs. Python

Python and R are both high-level, open-source programming languages that are among the most popular for data science and statistics. That said, R tends to be the right fit for traditional statistical analysis, while Python is ideal for conventional data science applications.

Python is a simple, well-designed, and powerful general-purpose language that is widely used for web development. It is also highly efficient for data science projects.

Python is relatively easy to learn, as it focuses on simplicity. So, provided you have access to the right tools and libraries, the language can effortlessly take you from statistics to data science and beyond to a full-fledged production app. In fact, this is one of the most significant advantages of using Python.

Furthermore, unlike with JavaScript, the choice of frameworks in Python isn’t overwhelming, so you’ll be able to put together a practical and reliable toolbox without worrying that you aren’t using the latest tech.

On the other hand, R’s most significant advantage is its highly specialized packages, which can take you effortlessly through data manipulation pipelines, even if those pipelines are not very customizable. However, R was created for statistical computing, and newcomers often find the language hard to work with at first.

Even so, there are cases where you can combine both languages. For instance, you can call R from Python code through rpy2, which is particularly useful when you want to outsource computation to R.


Python vs. Java: Comparing the Pros, Cons, and Use Cases

Adam Przewoźny — Sat, 21 May 2022 22:57:37 +0000

Originally written by Adam Stempniak

Some battles seem unwinnable and have apparently been raging forever.

Star Wars vs. Star Trek. Cats vs. dogs. Apple vs. Samsung.

But when it comes to software development, different conflicts tend to come up:

Quality vs. time. Time vs. cost. Cost vs. quality.

These concerns play a key role in choosing the programming language for your project, which is one of the first major decisions you have to make.

As a Python software house, we are intimately familiar with the challenge of contrasting Python with other languages:

Python vs. Golang. Python vs. Node.js. Python vs. Java.

In this article, we’ll focus on the last one.

Granted, such comparisons aren’t as set in stone as they may appear; it’s usually a little more complicated than a cut-and-dried list of pros and cons.

For example, building an MVP in Java can take months, while Python gets you the same results in weeks. And yet, Java is still popular with big banks and fintechs. Are they justified in their choice?

There’s a lot more to the story. Each language has different use cases, and you should look at what matters most to you when you make your choice.

Without further ado, let’s break down in detail how Python compares to Java.

1. Interpreted vs. compiled and dynamic vs. static

The differences between Python and Java start at the most basic level.

Python is an interpreted language, while Java is a compiled language.

Interpreted languages convert human-readable code to machine-readable code on the go, as the program executes commands, making it easier to revise or debug.

Conversely, compiled languages must translate source code before run time (in Java’s case, into bytecode that the Java Virtual Machine then executes), making the code harder to revise or debug.

What’s more, Python is dynamically typed, while Java is statically typed.

Although how code is translated is a separate question from how types are checked, the two distinctions mirror each other: dynamic typing means checking types during run time, while static typing means checking types before execution.

The practical difference shows up in start-up and execution time. Generally speaking, Python launches faster and runs slower, while Java launches slower and runs faster.

It’s also a conflict between flexibility and reliability, respectively. Python and Java have opposing approaches to when errors are detected and how strict each language is about allowing variables to change types.
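A small sketch of what dynamic typing means in practice: Python accepts the function definition and lets a variable be rebound to a value of a different type, and only reports the problem when the offending line actually runs. The `total_length` function is a made-up example; in Java, the equivalent mistake would be rejected by the compiler before the program ever started.

```python
def total_length(items):
    # Types are checked only when this line runs, not when it is defined.
    return sum(len(item) for item in items)

data = ["ab", "cde"]
print(total_length(data))  # 5

data = 42  # rebinding a name to a value of another type is allowed
try:
    total_length(data)  # an int is not iterable; this surfaces only at run time
except TypeError as exc:
    print("TypeError at run time:", exc)
```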

2. Entry point: Python gets you started faster

One of my colleagues once said something that stuck with me:

“You can learn the basics of Python over a weekend and start coding.”

While slightly exaggerated, this statement isn’t far from the truth.

Python has a low barrier to entry and is very user-friendly, making it the perfect choice for junior developers and programming newcomers.

Getting started on Python quickly is only one side of the coin, though; it takes much more time to learn how to use it well.

Python can give you the same functionalities as Java, but only if your developers have more experience in it, which can be inconvenient in some cases.

With Java, there is a definite learning curve and a high barrier to entry. It takes a lot of time to start writing in it and get to know it well, and each API is a different story.

However, once your developers put in the work, you will end up with higher-quality code from day one. So the time Java demands is well spent, but it will take a while before you see the results.

In short, it takes weeks to get started on Python, and months to get started on Java.

3. Stability: Java is slightly more stable

Java always requires more code. The language is designed in such a way that everything needs to be defined from the get-go.

This means you obviously need more time to review code written in Java, because there’s simply more of it—not to mention fixing all the potential issues you may find. When it’s bad enough, you’re actually better off rewriting the whole thing from scratch, rather than burning time and money to debug it all.

But this code volume doesn’t necessarily have to be a bad thing.

Certainly, the more code you have, the more complicated it gets—but if you write it well, you get more robust and stable software that crashes less. It may not matter as much for smaller-scale projects, but it’s a very different story for large ones that process a lot of data of all kinds.

This perceived stability is why large companies see Java as a strong language that gives them order and security. Big players like banks or fintech businesses often settle on Java because of its seemingly superior stability, without seriously considering other options.

While Java may have traditionally been the go-to language for corporations, we should ask ourselves: can we really say that Java is the enterprise solution?

Not quite. Python is also well equipped to handle large-scale software products; otherwise, tech giants like Dropbox, Instagram, or Google wouldn’t have chosen it for their tech stack. Actually, there’s a particularly strong case to be made for using Python in fintech. All in all, it would be inaccurate to say that Python gives you an unstable product.

So why the preconception that Java is better for corporations?

Other than sheer code volume—which isn’t always an advantage in and of itself—Java is seen as enterprise-friendly because of the strong library support it enjoys.

Java offers plenty of libraries that help you perform various tasks common in enterprise applications. Examples include:

Drools (a Business Rule Engine),
Lucene (a search engine library),
Hadoop (a Big Data tool).

Libraries matter. It’s the same reason Python is AI/ML-friendly—more on that later.

The bottom line is that stability and performance depend on a plethora of factors, mostly involving your code environment and external support. That being said, the stability scale is slightly tipped in Java’s favor.

4. Speed: Python is perfect for building an MVP fast

Python is known for its speed of development: it’s famously easy to write in, which makes development really fast. If you’re pressed for time and looking to meet a deadline, you should go with Python.

Building an MVP with medium-quality code written in Python and refactoring it later is a perfectly valid approach. Sometimes you even start off with a mockup to see how your product is going to work before you decide on the actual language. Python is an ideal choice for that purpose.

Time is decidedly in favor of Python. It can take months to build an MVP with Java, while with Python you can get to that stage in a matter of weeks.

In line with this, Java projects usually take years—a year-long project is considered small—while for Python it is perfectly normal to have projects that require only months of work.

What’s more, Java usually requires larger development teams. Python demands fewer developers, and sometimes even one will suffice, helping you lower the total cost of your project.

For all of these reasons, Python is a great choice for startups. If developing an MVP as fast as possible is your top priority, Python won’t let you down.

5. Resources: Java requires a larger investment than Python

Another reason Java is considered the language of corporations is that development in it demands a large budget and a lot of time. It’s a sizable investment all around.

Python is more cost-effective than Java, which is why it’s often preferred for small- and medium-sized projects. For most use cases, it’s a perfect fit.

Mind you, just because Java is more stable and expensive doesn’t mean Python is unreliable or a lower-tier language. Far from it.

Writing some projects in Java can be overkill (form over substance, if you will), but large companies with resources to spare often choose it over other languages simply because it is the pricier solution, and thus better in their eyes.

The logic there is debatable, but it’s not like those corporations suffer for it.

If you have plenty of time and a generous budget at your disposal, there’s no reason not to go with Java. You will end up with a product of quality highly comparable to what Python would give you, though development will take longer.

6. Trending technologies: Python is the best choice for AI/ML

There are no two ways about it: Python has no equal when it comes to trending technologies.

Python’s straightforward design makes it a perfect fit for artificial intelligence, while its simplicity and clarity give it the necessary edge over other languages when it comes to designing the complex internal logic of machine learning.

Writing either in Java would require much more code, slowing down development and making it easier to lose sight of the task at hand.

But the main reason Python has been adopted as the go-to solution for trending technologies is the strong support it offers with its wide variety of ready-made libraries. Whatever technical novelty you’re after, there’s a Python tool out there to help you out.

The race for quicker implementation of AI or ML is only picking up speed, and there’s every indication that Python will not only stay in the leader’s seat, but also continue to become more popular and widely used.

7. Key takeaways

If your top priority is development speed, go with Python; if your top priority is stability over all else, go with Java.

Generally speaking, Java is better equipped to handle complex tasks. On the other hand, Python is clearer, easier, and simpler—to read, write, and modify.

Keep in mind that this is an oversimplification. Your choice should always depend on your individual needs, your budget, and the type of project you have in the works.

However, if current trends continue and Python’s popularity keeps growing, one thing is certain: Python is the future.

Does that mean we will soon see a day when one language emerges victorious, and the other fades into obscurity?

Unlikely.

What’s far more likely is that the battle will carry on for years to come, like so many others of the same kind.

It’s up to you to decide: which side are you on?

Python vs. JavaScript: Is It a Fair Comparison?

Adam Przewoźny — Sun, 15 May 2022 11:53:03 +0000

Originally written by Michał Słupski

When we talk about building a project with Python or JavaScript, we very rarely mean building every software component with one programming language.

That’s just not how modern software development works. If you want to build software that’s up to standards, make it before the deadline, or create an app that will handle millions of users, you’re usually going to end up using several languages, frameworks, tools, and APIs.

So if we want to compare Python vs. JavaScript, we should talk about building mission-critical components of your software with either language.

This is going to be our main theme for this article, and we’ll also talk about:

how Instagram became the biggest Python app in the world,
what the most natural applications of Python and JavaScript are,
how these two languages complement each other.

A quick introduction to Python and JavaScript

Before we get into the nitty-gritty, let’s go over a few basic facts about Python and JavaScript. I won’t bore you with irrelevant details. This is just a rundown of how these languages came to fame, and what’s unique about their current position in the world of programming.

How Python became one of the biggest programming languages

Python first came out in 1991. It was built as a general-purpose programming language, so it can be used to solve any problem that can be quantified and described in code.

The tech market has seen a big surge in Python’s popularity in recent years. It was already well established thanks to web development frameworks like Django and its strong presence in academia. Then it became the language of choice for machine learning and data processing, which further increased its popularity.

Thanks to the web development framework Django, Python is also quite popular on the web, although not as popular as JavaScript. In the 2020 Stack Overflow Developer Survey, JavaScript held the top position, used by 67.7% of the roughly 65,000 respondents, while Python was used by 44.1%. Among web frameworks, the JavaScript libraries and frameworks jQuery, React, and Angular took the top three spots, and Django came in 10th.

The cool thing about Python is that it’s used by many scientists and researchers. For people well-versed in the complexities of science, Python is an easy language to learn, even if they aren’t particularly tech-savvy. It’s very useful for fast prototyping, which makes it even more appealing for scientists.

That’s because Python is one of the simplest languages, though it’s easy to learn and hard to master. Even if you’re not a programmer but want to automate a simple process, like scraping data from a website or moving data from one program to another, it shouldn’t take you long to get Python to do the work for you.
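For a sense of scale, here is a minimal sketch of the “moving data from one program to another” case: converting a CSV export into JSON that another tool can read, using only the standard library. The column names and sample data are invented for illustration.

```python
import csv
import io
import json

def csv_to_json(csv_text):
    """Convert CSV text (header row first) into a JSON array of objects."""
    # DictReader uses the header row as the keys for every following row.
    rows = csv.DictReader(io.StringIO(csv_text))
    return json.dumps(list(rows), indent=2)

sample = "name,role\nAda,engineer\nGrace,admiral\n"
print(csv_to_json(sample))
```

In a real script, the CSV text would typically come from a file opened with `newline=""`, and the JSON would be written to a file or sent to an API.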

And if you put the time in and really master Python, you can use it to build a wide variety of software.

JavaScript’s long way to becoming a general programming language

JavaScript first came out in 1995. Web apps weren’t a thing back then, and the goal of JavaScript was to make the web into a real application platform.

JavaScript hasn’t had an easy life. It was hated by a lot of developers because of some of its design choices and poor marketing, and it was limited technologically by slow internet connections and low bandwidth. Plus, for a long time there was a problem with cross-browser compatibility, making it hard for developers to build sites that would work in all browsers.

For several years, its popularity grew at a fairly steady pace. One of the first libraries to smooth over cross-browser compatibility issues was jQuery, released in 2006, which made it easy to add interactivity to websites. The next major framework was AngularJS. It was later replaced by Angular 2+, which is still very popular in enterprise-scale solutions.

Around 2011/2012, all the major browsers of the time (Firefox, Chrome, Opera, and Safari) offered broadly consistent JavaScript support for the first time. But even now, support isn’t 100% uniform, because JavaScript regularly gets new features and browser developers have to keep working on supporting them.

In 2013, the Facebook engineering team released React, which quickly became popular, and played a big part in cementing JavaScript’s position as the web’s favorite workhorse.

Of course, this is an extremely simplified version of JavaScript’s history; the real version is much longer, and more complex. The main point is that a lot of things had to happen in order to bring JavaScript to where it is now.

At the moment, new versions of JavaScript are becoming more similar in design to a full-fledged general-purpose programming language.

When to use Python vs. JavaScript for mission-critical components

When a mission-critical component or system breaks down for too long, your whole project goes belly-up. This is the part where choosing the right technology really matters.

With low-priority systems, you can browse around, try different options, and optimize costs. When you try to do that with mission-critical systems, you might end up writing a death sentence for your project from the start.

For example, when you’re managing a mature photo- and video-sharing application with over 1 billion users worldwide, the servers that process the incredibly large amounts of content are mission-critical. The app that I’m thinking of is of course Instagram, or “the world’s largest Python site.”

As one of Instagram’s engineers put it, “Instagram Server is entirely Python-powered.” The Instagram server application is a “monolith, one big codebase of several million lines and a few thousand Django endpoints.” Every single photo, video, and like goes through the most popular Python web framework Django, as another Instagram engineer mentioned in a presentation.

Why does Instagram use Python to manage mission-critical servers?

Instagram uses a big chunk of servers at the massive Facebook-owned data centers. Engineers don’t just manage the looks of the app, how your feed works, or the content suggestion algorithms. They literally have to make sure that the CPUs of their servers don’t overheat.

That’s an extremely difficult task. Why did they choose Python as the main language?

The answer can be found on the Instagram developer blog: “We initially chose to use Python because of its reputation for simplicity and practicality, which aligns well with our philosophy of ‘do the simple thing first.’ But simplicity can come with a tradeoff: efficiency.”

Simplicity and practicality. Martin Fowler, a true software development guru with decades of experience, and author of several books, once wrote, “Any fool can write code that a computer can understand. Good programmers write code that humans can understand.”

Which means that even when your goal is to make sure that machines don’t overheat from serving billions of users every day, you don’t achieve that goal by being a better machine whisperer. You do that by writing code that other developers can easily understand, so they can quickly debug it if necessary, or build on top of it without wondering if they’ll break the system.

Python is perfect for this purpose, because of its readability, cleanliness, and ease of understanding.

Does Instagram use JavaScript?

Now we get to the interesting part. Even though Instagram engineers use Python for their whole server, Python isn’t responsible for how the interface looks. It stores and manages all the data, but the interface that you see on your smartphone is built with native programming languages, and a lot of help from JavaScript.

The mobile interfaces are built in Swift (iOS) and Java (Android), but the popular mobile frontend JavaScript framework, React Native, also plays a big part. Instagram engineers chose it because they wanted to have high developer velocity—which means they wanted to be able to add new features to both iOS and Android versions of their app as fast as possible.

React Native is exactly what they needed, because it allows engineers to use the same code to ship features to different systems. They can use JavaScript code to create native interface views on both systems.

They could’ve used another approach, like building separate interfaces in Swift and Java. But they chose the middle option, and went for React Native. Maintaining interfaces in Swift and Java, with support from React Native, allows Instagram developers to optimize costs and development time, making their life easier.

Python vs. JavaScript—which language has more uses?

Instagram’s example is a good benchmark for the current web and mobile industry. Many popular apps have a similar structure—Python on the backend and JavaScript on the frontend.

Even PayPal, which is completely different from Instagram, has a similar tech stack. In a very simplified statement, they use Python for managing data and JavaScript for their user interfaces.

The statement is simplified because if you were to get into the specifics of how they use different programming languages and tools, you’d quickly get overwhelmed with the complexity. Plus, they’re not as keen as Instagram on sharing details about their stack with the whole world.

One blog post I was able to find explains that PayPal engineers use Node.js for their middle-tier infrastructure, meaning web servers and their frontend, because it allows them to use only JavaScript to build their sites.

But again, this isn’t mission-critical. The mission-critical parts of PayPal are hidden under all of that, a lot of it coded in Python (and most likely several other languages), and taking care of security, stability, and data management.

JavaScript is not built for mission-critical systems. It started as a programming language for adding interactivity to websites, and even though it has grown into an incredibly useful tool, you could say it’s limited by design.

On the other hand, Python was designed as a general-purpose programming language. It is used far beyond web development. It’s strongly rooted in the academic community. While it can be used to build a great website, with Python you can also build neural networks for developing new drugs or AI technology that hides in the heart of apps like Uber.

So, ultimately, Python has more uses than JavaScript. But there are several areas where JavaScript reigns supreme, so much so that it would be silly to try and use Python for them.

Where does JavaScript win over Python?

JavaScript is a clear winner in the category of mobile development. There are some niche frameworks for mobile development with Python, like Kivy and PyQt, but they’re rarely used.

It would make more sense for a Python developer to learn JavaScript and use its most popular mobile development framework, React Native, to build an app.

Another area where JavaScript wins is frontend development. It has the best frameworks for building modern interfaces (React, Angular, Vue). With Node.js, developers can use JavaScript to also build the server side of their applications. Thanks to JAMstack (JavaScript + APIs + markup), developers can build super-fast, beautiful web apps within very short deadlines.

For a small/medium web and mobile development team on a budget and with tight deadlines, JavaScript is definitely the best option.

And, as the Instagram and PayPal examples show, when you combine Python with JavaScript, you can build amazing applications that dominate markets and revolutionize life for billions of people.

Is JavaScript better than Python in terms of performance?

JavaScript was built to be fast on the web. When you compare a Node.js web app to a Python app, the Node.js one is very likely to be faster.

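As Towards Data Science puts it, “Python is comparatively slower in performance as it processes requests in a single flow, unlike Node.js, where advanced multithreading is possible.” Strictly speaking, Node.js runs JavaScript on a single-threaded event loop by default; its edge in web workloads comes mainly from non-blocking, asynchronous I/O.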

There are ways to optimize Python’s performance by taking advantage of the fact that its reference implementation, CPython, is written in C. For example, NumPy comes with optimized C code that makes numerical Python code faster. Cython is both a superset of the Python language and a compiler that lets developers build fast C extension modules to speed up the execution of Python code.
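A minimal sketch of the NumPy approach, assuming NumPy is installed: the same sum of squares computed with a plain Python loop and with a vectorized call whose loop runs in NumPy’s compiled code. The function names are hypothetical, and actual speedups depend on data size and hardware, so measure with `timeit` rather than assuming a number.

```python
import timeit

import numpy as np

def sum_of_squares_loop(values):
    # Every iteration is executed by the Python interpreter.
    total = 0.0
    for v in values:
        total += v * v
    return total

def sum_of_squares_numpy(values):
    # np.dot runs the multiply-and-add loop in compiled code.
    arr = np.asarray(values, dtype=np.float64)
    return float(np.dot(arr, arr))

if __name__ == "__main__":
    data = [float(i) for i in range(100_000)]
    arr = np.asarray(data)
    print("loop: ", timeit.timeit(lambda: sum_of_squares_loop(data), number=10))
    print("numpy:", timeit.timeit(lambda: sum_of_squares_numpy(arr), number=10))
```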

Generally speaking, JavaScript works well in I/O intensive situations—which means apps like Facebook, where a lot of data comes in and out of the application in real time, and it’s crucial that the user doesn’t have to wait for anything.
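The I/O-bound case is where an event loop shines: while one request waits on the network, others can make progress on the same thread. As a rough illustration, here is that model sketched with Python’s standard `asyncio` module; `fake_request` is a hypothetical stand-in that uses `asyncio.sleep` in place of a real network call.

```python
import asyncio
import time

async def fake_request(name: str, delay: float) -> str:
    # Hypothetical stand-in for a network call: while this coroutine waits,
    # the single-threaded event loop is free to run the others.
    await asyncio.sleep(delay)
    return name

async def load_page():
    start = time.perf_counter()
    # Start three "requests" concurrently and wait for all of them.
    results = await asyncio.gather(
        fake_request("feed", 0.2),
        fake_request("profile", 0.2),
        fake_request("likes", 0.2),
    )
    return results, time.perf_counter() - start

if __name__ == "__main__":
    results, elapsed = asyncio.run(load_page())
    # Roughly 0.2 s in total rather than 0.6 s, because the waits overlap.
    print(results, f"{elapsed:.2f}s")
```

The same pattern does not help CPU-bound work, which is why heavy number crunching in Python is usually pushed into libraries with compiled code instead.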

Python works well in CPU-intensive situations—like a machine learning model that needs to crunch a huge amount of data to solve a specific problem. It’s also a good language for doing heavy computations using GPUs.

Then again, Instagram’s server is more of an I/O intensive situation, but it uses Python. It goes to show that if you know how to optimize Python, you can make it perform quickly.

What about the other side of performance: time-to-market? Both languages can be used to quickly build a simple MVP as long as it’s done by good developers. For complex programs, Python makes for a quicker time-to-market because it’s easy to read and easy to debug. Python fosters smooth collaboration.

With JavaScript, things can get really complicated, really fast, which can lead to longer development times. For this reason, a lot of companies have switched to TypeScript, which some developers would say is even easier to read and maintain than Python.

Machine learning with Python vs. JavaScript

Python is the main language of choice for machine learning developers. It makes a lot of sense. Machine learning is complicated and involves huge amounts of data. Python is a simple and readable language, so it makes life easier for developers by removing complexity, and it has always been the standard for data science.

The most popular ML frameworks—TensorFlow, scikit-learn, PyTorch—are mostly based on Python, and provide dedicated Python APIs which are the most popular way of using them. TensorFlow did release a JS version of the framework in 2018, and it allows developers to build machine learning models that work in the browser or in a Node.js server.

But that’s not enough to win over the ML world. Python is perfectly suited for machine learning, and it’s unlikely to be supplanted by another language in the near future.

The future of Python and JavaScript

Everything we talked about in this article leads to the conclusion that comparing Python and JavaScript isn’t really fair. These languages were designed with different goals in mind, which led to the differences in how they’re currently applied in software development.

And it’s exactly those differences that allow these technologies to perfectly supplement each other in the modern world of programming.

Will that change in the future? At the moment, JavaScript’s position as the most powerful web and mobile application development toolset seems very strong. More and more, it’s being turned into a general-purpose programming language, but it’s unclear if it could be a good substitute for a language like Python.

As for Python, it’s most likely going to continue to dominate the machine learning market, as well as academia, because of Python’s readability and ease of use, as well as its power for manipulating data.

In the end, the choice of your tech stack will always depend on the nature of your project, availability of programmers, and multiple other variables.

DevOps: How to Host a Simple Static Website on AWS S3

Adam Przewoźny — Sat, 16 Apr 2022 14:03:43 +0000

Originally written by Adam Stempniak and Adrian Dratwicki

If you’re thinking of hosting a website, you typically have two options:

buy a virtual server and manage it yourself,
use the services of a hosting company.

The latter solution is one of the simplest and most popular when it comes to setting up websites, since it makes them easy to manage and cheap to maintain.

Using a hosting company usually comes down to getting access to an FTP server or panel, where you can upload the files necessary for your site to work. The rest is handled by your hosting service provider.

But what if I told you there was a third way? An even better solution for static websites?

In this article, I’ll show you how to host your website on AWS S3. I’ll also tell you how using AWS S3 benefitted one of our clients.

When should you host your static website on AWS S3?

Not everyone knows that Amazon S3 offers a feature that allows you to host a static website, “static” being the key word here. The website should be simple, without much happening on the backend, since server-side scripts like PHP, JSP, or ASP.NET are not supported.

The best use case for this is a company or personal site serving as a business card, where your users will mostly look for your contact information.

However, if you need a CMS like WordPress, I suggest you look at companies that provide hosting solutions or use a dedicated service like AWS Lightsail.

What are the benefits of using AWS S3 static website hosting?

There are many reasons why you should host your website on AWS S3.

Among others, it’s because the service:

manages everything, so there’s no need for you to worry about underlying software like the web server or the operating system in general;
scales well for temporary high traffic load;
is cheap (in the Frankfurt region, the cost is $0.00043 per 1,000 GET requests);
has great integration with CloudFront;
is simple to operate, allowing you to set up your website in the blink of an eye.

What do you need to host a static website on AWS S3?

You need to meet certain requirements before hosting your website on S3.

Make sure you have the following:

registered domain (for the purposes of this article, let’s assume you own a domain named “example.com”);
access to the panel that allows you to manage the DNS records for your domain;
AWS account;
basic knowledge of operating the S3 service;
fully prepared website (I won’t be showing you how to build your own website in this article).

A 4-step guide to AWS S3 static website hosting

I will now walk you through the process of hosting your site on S3.

1. Creating a logging bucket

Let’s start by creating a bucket in S3. It will log the requests to your website. This isn’t mandatory, but I think there’s always value in collecting such information.

I suggest you call this bucket “example.com-logs.” Configure it by keeping the default settings for all the options, with the exception of “Manage system permissions.” Here, it’s necessary to grant write access to the S3 Log Delivery group.
Granting write access to the S3 Log Delivery group

If you intend to host more than one website on S3, you’ll be better off using a different, collective name for this bucket. Setting up the right prefixes will minimize the risk of you confusing logs from multiple sites—more on that later.
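If you’d rather script this step than click through the console, here is a minimal sketch using boto3, the AWS SDK for Python. The helper name `create_logging_bucket` is hypothetical, and the function expects an already configured client such as `boto3.client("s3", region_name="eu-central-1")`. Note that since 2023 new buckets have ACLs disabled by default, so the sketch explicitly requests an Object Ownership setting that keeps ACLs usable; AWS now recommends a bucket policy for log delivery, which this sketch does not cover.

```python
# Grantee URI that S3 uses for its Log Delivery group.
LOG_DELIVERY = "uri=http://acs.amazonaws.com/groups/s3/LogDelivery"

def create_logging_bucket(s3, bucket: str, region: str) -> None:
    """Create a bucket for access logs and let the S3 Log Delivery group write to it."""
    params = {
        "Bucket": bucket,
        # Keeps ACLs enabled; newer buckets default to ACLs being disabled.
        "ObjectOwnership": "BucketOwnerPreferred",
    }
    # us-east-1 is the default region and must not be passed as a LocationConstraint.
    if region != "us-east-1":
        params["CreateBucketConfiguration"] = {"LocationConstraint": region}
    s3.create_bucket(**params)

    # Grant headers replace the whole ACL, so keep full control for the owner too.
    owner_id = s3.get_bucket_acl(Bucket=bucket)["Owner"]["ID"]
    s3.put_bucket_acl(
        Bucket=bucket,
        GrantFullControl=f"id={owner_id}",
        GrantWrite=LOG_DELIVERY,
        GrantReadACP=LOG_DELIVERY,
    )
```

Usage would look like `create_logging_bucket(s3, "example.com-logs", "eu-central-1")`.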

2. Creating S3 buckets

Moving on, we’ll create 2 buckets in S3. We’ll put the files with your website’s code in one of them.

Your buckets must be named exactly after your domain, because S3 uses the requested host name to pick the bucket, so call the 1st one “example.com” and the other one “www.example.com.”

Let’s focus on the 1st bucket now.
Creating a bucket for your website

When configuring your bucket, I recommend turning on object versioning. It may be useful in case you wish to restore a previous version of an image or an entire website template.

The next move is turning on request logging for your site. Select the logging bucket you created in step 1 as the target bucket. The folder where the logs are placed will be the prefix.

If you’re going to be collecting logs from several buckets to this one, I suggest you set the domain name as the prefix. Thanks to this, you’ll know which bucket the logs originate from as you browse through them.

Right now, we’ll just use “logs” as the prefix, since we’re hosting only 1 site.
Bucket configuration
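The versioning and logging settings from this step map to two boto3 S3 client calls, sketched below in the same (method name, kwargs) form so it runs offline. The bucket names are this article's examples; the default "logs/" prefix is an assumption, and with several sites sharing one log bucket you would pass something like "example.com/" instead.

```python
def site_bucket_config_requests(bucket: str, log_bucket: str,
                                log_prefix: str = "logs/") -> list[tuple[str, dict]]:
    """Return boto3 S3 client calls that enable versioning and access logging."""
    return [
        # Keep earlier versions of every object so a file or template can be restored.
        ("put_bucket_versioning",
         {"Bucket": bucket, "VersioningConfiguration": {"Status": "Enabled"}}),
        # Deliver access logs for `bucket` into `log_bucket`, under `log_prefix`.
        ("put_bucket_logging",
         {"Bucket": bucket,
          "BucketLoggingStatus": {
              "LoggingEnabled": {"TargetBucket": log_bucket, "TargetPrefix": log_prefix}}}),
    ]


if __name__ == "__main__":
    for method, kwargs in site_bucket_config_requests("example.com", "example.com-logs"):
        print(method, kwargs)
```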

The next step is unlocking public ACLs for objects.

Doing so is crucial; otherwise, you won’t be able to share the objects publicly, and displaying your website in the browser will be impossible.
Unlocking public ACLs

Your 1st bucket is almost ready. Now you can upload the files with your site’s code to it.

As you upload the files, remember to make sure that they’re publicly accessible.
Granting public access to bucket objects
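Uploading a whole site by hand gets tedious, so here is a sketch that walks a local folder and prepares one boto3 `upload_file` call per file, preceded by the `put_public_access_block` call that corresponds to unlocking public ACLs. Two assumptions: the local folder layout mirrors the desired object keys, and keeping public bucket policies blocked is fine because this setup relies on object ACLs. Setting `ContentType` matters because otherwise S3 stores a generic type and browsers may download `index.html` instead of rendering it.

```python
import mimetypes
from pathlib import Path


def public_upload_requests(site_dir: str, bucket: str) -> list[tuple[str, dict]]:
    """Return boto3 S3 client calls that allow public ACLs and upload every file publicly."""
    requests: list[tuple[str, dict]] = [
        # Allow (and honor) public object ACLs; keep public bucket policies blocked.
        ("put_public_access_block",
         {"Bucket": bucket,
          "PublicAccessBlockConfiguration": {
              "BlockPublicAcls": False, "IgnorePublicAcls": False,
              "BlockPublicPolicy": True, "RestrictPublicBuckets": True}}),
    ]
    root = Path(site_dir)
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        key = path.relative_to(root).as_posix()  # e.g. "css/style.css"
        content_type, _ = mimetypes.guess_type(path.name)
        requests.append(("upload_file", {
            "Filename": str(path),
            "Bucket": bucket,
            "Key": key,
            "ExtraArgs": {
                "ACL": "public-read",  # the "publicly accessible" setting from the console
                "ContentType": content_type or "application/octet-stream",
            },
        }))
    return requests
```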

In the following step, you have the option to choose the Storage class. Leave it set to “standard.”

3. Enabling static website hosting

It’s time to enable the hosting service.

Go to your bucket’s Properties. You’ll find the “Static website hosting” tab there.
Bucket properties view

By default, the service will be disabled. Enable it.
Enabling static website hosting

You need to input the name of your website’s “core” file. Usually, it’s “index.html,” as is the case here.

If you have a file that should be displayed in the event of an error, it should also be defined here—and placed in your bucket beforehand, naturally. However, this only applies to errors belonging to the 4XX group.
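The index and error document settings correspond to a single `put_bucket_website` call. A minimal sketch, assuming boto3 and a hypothetical `error.html` that you have already uploaded:

```python
from typing import Optional


def website_request(bucket: str, index: str = "index.html",
                    error: Optional[str] = None) -> tuple[str, dict]:
    """Return the boto3 S3 client call that enables static website hosting."""
    config: dict = {"IndexDocument": {"Suffix": index}}
    if error is not None:
        # Served for 4XX errors; the object must already exist in the bucket.
        config["ErrorDocument"] = {"Key": error}
    return ("put_bucket_website", {"Bucket": bucket, "WebsiteConfiguration": config})


if __name__ == "__main__":
    print(website_request("example.com", error="error.html"))
```

S3 calls the index document a "suffix" because it is also appended to folder-style requests, so a request for /blog/ would serve blog/index.html.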

Additionally, it’s worth saving the bucket website endpoint displayed on this screen at this point.

If you’ve done all of that correctly, your website should display as intended once you enter the endpoint into the browser.
Checking if our site is accessible

Let’s create the 2nd bucket now: “www.example.com.”

You can leave the entire configuration set to default. What you care about the most here is redirection from “www.example.com” to “example.com.”

Once you create this bucket, go to its Properties and enable Static website hosting, just like before. This time around, though, you should also:

pick “Redirect requests” from the available options,
input the “example.com” domain in the “Target bucket or domain” field,
set “http” as the protocol.
Setting up redirection for “www.example.com”
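The redirect bucket uses the same `put_bucket_website` call with a different configuration. A minimal sketch, assuming boto3 and this article's domain names:

```python
def redirect_request(source_bucket: str, target_host: str,
                     protocol: str = "http") -> tuple[str, dict]:
    """Return the boto3 S3 client call that redirects every request to `target_host`."""
    return ("put_bucket_website", {
        "Bucket": source_bucket,
        # RedirectAllRequestsTo cannot be combined with index or error documents.
        "WebsiteConfiguration": {
            "RedirectAllRequestsTo": {"HostName": target_host, "Protocol": protocol}},
    })


if __name__ == "__main__":
    print(redirect_request("www.example.com", "example.com"))
```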

4. Setting up DNS records

Your website should now be online. However, it’s clear you don’t want to use this lengthy URL as the access point to the site.

In order to use the “example.com” domain, log into the panel where you can add new DNS records. In this instance, I use Cloudflare to manage DNS records, but regardless, what you need to do comes down to 2 things:

Removing existing records that point “example.com” and “www.example.com” to an IP address or other domains
Adding 2 CNAME records
“example.com” that will point to our “example.com” bucket endpoint
“www.example.com” that will point to our “www.example.com” bucket endpoint
DNS records settings for our website
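For reference, the sketch below derives the two CNAME targets from the bucket names. The endpoint format follows the S3 website endpoint pattern, but whether the region is joined with a dot or a dash depends on the region (eu-central-1 uses a dot, us-east-1 a dash), so compare the result with the endpoint you saved in step 3. The record fields (`type`, `name`, `content`) follow Cloudflare's DNS API naming and are an assumption for other providers. A CNAME at the zone apex (“example.com”) works on Cloudflare thanks to CNAME flattening, while many other DNS providers do not allow it.

```python
def website_endpoint(bucket: str, region: str, dash_style: bool = False) -> str:
    """S3 website endpoint host name for `bucket` (check it against the console)."""
    separator = "-" if dash_style else "."
    return f"{bucket}.s3-website{separator}{region}.amazonaws.com"


def cname_records(domain: str, region: str, dash_style: bool = False) -> list[dict]:
    """One CNAME per bucket: each host name points at the bucket of the same name."""
    return [
        {"type": "CNAME", "name": host,
         "content": website_endpoint(host, region, dash_style)}
        for host in (domain, f"www.{domain}")
    ]


if __name__ == "__main__":
    for record in cname_records("example.com", "eu-central-1"):
        print(record)
```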

You should wait a while for the DNS records to propagate. After that, you’ll be able to enter your website from either the “example.com” or “www.example.com” domain.

How one of our clients benefitted from AWS S3

At STX Next, we often use Amazon S3 in our projects for storing static or media files uploaded by the user and generated by the application. Static files are files such as images or CSS/JS files that aren’t generated on the go.

Very often, this is highly sensitive data, like PDFs containing confidential user information. Because S3 supports server/client-side encryption, the service is a really safe solution to store this kind of data.

One of our clients came to us with the intention of moving his entire infrastructure from a different hosting provider to AWS. The client had only one virtual server, where multiple services were installed.

One of these services was NGINX, which distributed the traffic between the application and the main company website. The client’s main site consisted only of some HTML, CSS, and JS files.

Through this website, a potential buyer could find the client’s contact information and learn about their services. In the process of moving the infrastructure to AWS, we decided to separate this website from the application services and move the database to RDS.

It was the most convenient and secure choice. One instance, one role. If we ever have to move the application to another instance, we won’t have to concern ourselves with additional components like website, database, etc.

To spin up a new instance, we’ll only have to run the script that will automatically install packages required for the application to start running, then simply deploy it to the server. That’s it.

Maintaining an additional EC2 instance for a single website is unnecessary, and we don’t expect any heavy traffic to the main website, but if it happens, AWS will handle it.

As you can see, static website hosting on S3 turned out to be the perfect solution for us, and I believe the same will be true for your business.

Summary

You now know how to create 2 S3 buckets for your website. The 1st bucket you’ve created allows you to store files required to run your website, while the 2nd one serves strictly redirection purposes.

You’ve set up DNS records, so that your visitors can now enter your website through a friendly domain. Your site is available to the world, and you don’t have to worry about underlying servers, software updates, scaling, or costs.

If you’d like to use the HTTPS protocol, which is highly recommended, you should take a look at the CloudFront service. Cloudflare lets you establish an SSL/TLS connection by routing traffic through its servers. However, that traffic is encrypted only between the client and the Cloudflare servers: S3 website endpoints don’t support HTTPS, so the connection from Cloudflare to S3 stays unencrypted. For the full SSL/TLS setup, CloudFront is required.

AWS Glue Studio Guide—How to Build Data Pipelines Without Writing Code

Adam Przewoźny — Mon, 11 Apr 2022 07:34:37 +0000

Originally written by Maksymilian Jaworski and Lidia Kurasińska

You’ve probably heard that creating ETL (extract, transform, load) pipelines, especially complex ones, is a complicated task. Various tools have been developed to make this process much easier, but most of them still require some knowledge of a programming language (for example, Python or R) often combined with an understanding of tools such as Spark.

AWS Glue itself became available in August 2017. In 2020, AWS added two more tools on top of it: AWS Glue Studio, a visual tool to create, run, and monitor Glue ETL jobs, and Glue DataBrew, a tool perfect for data and business analysts, since it facilitates data preparation and profiling.

AWS Glue Studio supports various types of data sources, such as S3, Glue Data Catalog, Amazon Redshift, RDS, MySQL, PostgreSQL, or even streaming services, including Kinesis and Kafka. Out of the box, it offers many transformations, for instance ApplyMapping, SelectFields, DropFields, Filter, FillMissingValues, and SparkSQL, among many others. We can save the results of our jobs to Amazon S3 and to tables defined in the AWS Glue Data Catalog.

Also, apparently, we can use it all without knowing Spark, as Glue Studio will generate Apache Spark code for us.

So, let’s see in practice what we can do with AWS Glue Studio. I promised myself that I would try not to write a single line of code when solving my case.

In this article, I used a slightly modified E-Commerce Data dataset.

AWS Glue Studio in practice

Let’s assume that you received the following task from your data analysts:

One system is uploading daily CSV files, which contain the following information: invoice_no, stock_code, description, quantity, unit_price, customer_id, country, invoice_date. Calculate the total number of sold items (sum of quantity) and the total purchase value (sum of quantity multiplied by unit_price for each item) per customer and day. Save the results in a CSV file separated by commas in an S3 bucket.
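Before building anything in Glue Studio, it helps to pin down what the result should be. The plain-Python sketch below computes the same totals for a small CSV; it assumes `invoice_date` values look like `2021-09-01 08:26:00` (a hypothetical format), uses made-up sample rows, and says nothing about how Glue executes the job.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def aggregate_sales(csv_text: str) -> list[dict]:
    """Total quantity and total value (quantity * unit_price) per customer and day."""
    totals: dict = defaultdict(lambda: {"total_quantity": 0, "total_sale": 0.0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumed timestamp format; keep only the date part.
        sale_date = datetime.strptime(row["invoice_date"], "%Y-%m-%d %H:%M:%S").date()
        quantity = int(row["quantity"])
        acc = totals[(row["customer_id"], sale_date.isoformat())]
        acc["total_quantity"] += quantity
        acc["total_sale"] += quantity * float(row["unit_price"])
    return [
        {"customer_id": customer, "sale_date": day,
         "total_quantity": t["total_quantity"], "total_sale": round(t["total_sale"], 2)}
        for (customer, day), t in sorted(totals.items())
    ]


SAMPLE = """invoice_no,stock_code,description,quantity,unit_price,customer_id,country,invoice_date
1001,A1,item a,6,2.55,17850,United Kingdom,2021-09-01 08:26:00
1001,A2,item b,6,3.39,17850,United Kingdom,2021-09-01 08:26:00
1002,B1,item c,2,1.85,13047,United Kingdom,2021-09-01 09:00:00
"""

if __name__ == "__main__":
    for line in aggregate_sales(SAMPLE):
        print(line)
```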

I know this sounds more like a job for a data analyst and some of you probably think this is a simple and boring task, but don’t worry, we will add some action later on.

My input files are located in this directory: aws-glue-demo-202109/inputs

Okay, let’s see how we can do this with AWS Glue Studio.

To access it, choose AWS Glue from the main AWS Management Console, then from the left panel (under ETL) click on AWS Glue Studio. Go to Jobs, and at the top you should see the Create job panel—it allows you to create new jobs in a few different ways: Visual with a source and target, Visual with a blank canvas, Spark script editor, and Python Shell script editor.
I selected Visual with a blank canvas. It should create a new, empty, untitled job.

Before we start building our ETL process, let’s go to the Job Details tab and discuss some important properties that determine how AWS Glue will run the job. Except for the job’s name, you can change those settings any time you want.

Glue version—it determines the versions of Apache Spark and Python that are available to the job. Note: some features might not be available for particular versions. For instance, at the time of writing, the Data Preview function does not work with Glue 3.0, the latest version.
Language—either Python or Scala. I will go with Python.
Worker type (G.1X or G.2X)—for the G.1X worker type, each worker maps to 1 DPU. For the G.2X, it’s 2 DPU for each worker. I chose G.1X as it has way more resources than I will actually need.
Job bookmark (Enable, Disable, Pause)—now this is a pretty important setting, especially since Enable is the default value. In a nutshell: when job bookmarks are enabled, once a job processes some data (e.g., an S3 file), it marks the data as processed and skips it in later runs. Bookmarks are tied to jobs, so different jobs can process the same file. I’m disabling it now, as I will probably have to process the same files over and over again during the development of my job. For an extended description of Job Bookmarks, please read Tracking Processed Data Using Job Bookmarks.
Number of retries—this is pretty self-explanatory. I’m switching this value from 3 to 0 for now, as I don’t want Glue to retry a run that fails just because I made a silly mistake, like using an empty file.
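To make the bookmark behavior concrete, here is a deliberately simplified toy model in plain Python. It is not the awsglue API and ignores details such as how Glue detects modified S3 objects; it only illustrates that bookmark state is kept per job, so a second job still sees every file, and that resetting a bookmark makes the job reprocess everything. The job and file names are hypothetical.

```python
class ToyJobBookmarks:
    """Conceptual model of job bookmarks: a per-job memory of processed inputs."""

    def __init__(self) -> None:
        self._processed: dict[str, set[str]] = {}

    def run(self, job: str, available: list[str]) -> list[str]:
        """Return the inputs this job has not processed yet and remember them."""
        seen = self._processed.setdefault(job, set())
        to_process = [key for key in available if key not in seen]
        seen.update(to_process)
        return to_process

    def reset(self, job: str) -> None:
        """Forget everything the job has processed (like resetting its bookmark)."""
        self._processed.pop(job, None)


if __name__ == "__main__":
    bookmarks = ToyJobBookmarks()
    print(bookmarks.run("daily-sales", ["20210901_data.csv"]))
    print(bookmarks.run("daily-sales", ["20210901_data.csv", "20210902_data.csv"]))
```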

I’m not going into the Advanced properties section, but keep in mind that this is where you can configure your S3 script path, Spark UI logs path, set maximum concurrency, disable metrics, etc.

Note: you have to set those settings for each created job, unless you’re cloning a job—then its copy has the same settings as the original one.

Now, let’s go back to the Visual section.

1. From Source, select the Amazon S3 node.
In Data source properties - S3, set:

S3 source type: S3 location—so you will access S3 files directly
S3 url: s3://aws-glue-demo-202109/inputs/
Recursive: true
Data format: CSV
Delimiter: Comma (,)

Note: you are limited to the following delimiters: Comma, Ctrl+A, Pipe, Semicolon, Tab.

Now, let’s go to the Output schema tab. Well, those data types don’t look right, so you will have to change them. Click the Edit button and, for each key, set the right data type (according to the information provided by the data analysts). Once it looks good, click Next and go to the next step.

2. Doing aggregations. Here lies the first “obstacle”: Glue Studio does not have a built-in transformation node for aggregations. The best solution is to use the Transform - Spark SQL node. You can name the node aggByCustomerByDate and, in the transform section, select Amazon S3 (the name of the parent node) as the input source and give it the alias sales, which you can use in your SQL code as the table name. In the Code block, you can put a simple SQL query that takes the customer_id column, gets the date from the invoice_date column, sums quantity and quantity*unit_price, and groups the results by customer and sale_date.

Let’s assume that it doesn’t break my “not-writing-a-line-of-code” rule.

Note: this editor does not verify syntax, so double-check your query before you run it.
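The query itself is not shown in the text, so here is a hypothetical reconstruction based on the description above, run against SQLite so it can be tried locally. Spark SQL would use `to_date(invoice_date)` where SQLite uses `date(invoice_date)`; the output column names match the ones listed for the result files later in this article.

```python
import sqlite3

# Hypothetical aggByCustomerByDate query; `sales` is the alias given to the input node.
# SQLite's date() stands in for Spark SQL's to_date().
AGG_QUERY = """
SELECT customer_id,
       date(invoice_date)         AS sale_date,
       SUM(quantity)              AS total_quantity,
       SUM(quantity * unit_price) AS total_sale
FROM sales
GROUP BY customer_id, date(invoice_date)
ORDER BY customer_id, sale_date
"""


def run_agg(rows: list[tuple]) -> list[tuple]:
    """Load (customer_id, invoice_date, quantity, unit_price) rows and run the query."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (customer_id TEXT, invoice_date TEXT,"
                " quantity INTEGER, unit_price REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    result = con.execute(AGG_QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(run_agg([("17850", "2021-09-01 08:26:00", 6, 2.55),
                   ("17850", "2021-09-01 09:10:00", 6, 3.39)]))
```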
3. Now that you are done with aggregations, save the result to S3. From the Target menu, choose Amazon S3 and pick CSV format. Decide if you want to apply any compression and set the target location to s3://aws-glue-demo-202109/output/1st-direct
4. The job is ready. In the upper-right corner, hit Save, then Run, and wait. You can go into the Runs tab and see the job progress, as well as links to logs and run metadata.

After a minute, you can see that the job has succeeded.
Now, go to the target S3 directory. You have 4 files here:
Each file contains customer_id, sale_date, total_quantity and total_sale.
Okay, that was quite quick and simple. But I’m not satisfied with this solution, and you might not be, either. True, I did calculations for all files, but what will happen tomorrow when the new file arrives? Like you, I don’t want to process all the data once again. We’re only interested in processing newly created files.

Besides, data analysts also have some further requests.

They remembered that sometimes quantity might be a negative value which indicates that an item was returned. They would like me to exclude those rows from the aggregations and store the list of returned items in a separate file.
If you took a good look at the last image, you should notice that the second row is missing a value for customer_id. Analysts would like to get rid of empty customers.
Analysts would like to link some customers’ data (first name, last name, address) to the file with aggregations.
Working on four separate files is a bit troubling—they would prefer to have one file only.

Let’s address my issue first. The way I see it, there are two possible approaches:

The aforementioned Job Bookmarks. As the doc says: “Job bookmarks are used to track the source data that has already been processed, preventing the reprocessing of old data. Job bookmarks can be used with JDBC data sources and some Amazon Simple Storage Service (Amazon S3) sources. Job bookmarks are tied to jobs. If you delete a job, then its job bookmark is also deleted. You can rewind your job bookmarks for your Glue Spark ETL jobs to any previous job run, which allows your job to reprocess the data. If you want to reprocess all the data using the same job, you can reset the job bookmark.”
Manually, or by using a function or some parameter, decide which files (day or range of days) should be processed.

The first solution seems really cool, but let’s say that I’m not 100% sure how the data is loaded. Perhaps files are overwritten every day? Or someone could suddenly start doing that? Or you’re going to modify your jobs often and re-run them for particular files, and you don’t want to have to remember to reset the job bookmark?

Either way, let’s say that for the sake of this article, you simply can’t or don’t want to rely on job bookmarks. What can you do now?

You could filter out the data in, for instance, Transform - Spark SQL node, but it doesn’t really solve the issue—the job will still be processing all files, you will just filter out the output or the data which goes into aggregations. You have to figure out how to divide those files “logically.”

Instead of working directly with S3 files, try to organize S3 files into databases and tables. Perhaps then you will be able to query the data more efficiently. Use the AWS Glue Crawler for this.

In a nutshell, AWS Glue can combine S3 files into tables that can be partitioned based on their paths. For example, if your files are organized as follows:

bucket1/year/month/day/file.csv

then AWS Glue can create one table from all files in bucket1, which will be partitioned by year, month, and day. The level of partition creation is also definable, and you can have, for example, a table for each separate day, month or year. You’ll find more details in this article on working with partitioned data in AWS Glue.

For now, there are two ideas for you to test:

Create a separate table for each day.
Create one table with partitions by year, month, and day.

But before you create databases and tables, you have to reorganize the structure of your S3 bucket from:

bucket/YYYYMMDD_data.csv

to:

bucket/year/month/day/data.csv

So, instead of:

aws-glue-demo-2021/inputs/20210901_data.csv
aws-glue-demo-2021/inputs/20210902_data.csv
aws-glue-demo-2021/inputs/20210903_data.csv
aws-glue-demo-2021/inputs/20210904_data.csv
aws-glue-demo-2021/inputs/20210905_data.csv

your files will be organized this way:

aws-glue-demo-2021/inputs/2021/09/01/data.csv
aws-glue-demo-2021/inputs/2021/09/02/data.csv
aws-glue-demo-2021/inputs/2021/09/03/data.csv
aws-glue-demo-2021/inputs/2021/09/04/data.csv
aws-glue-demo-2021/inputs/2021/09/05/data.csv
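S3 has no rename operation, so reorganizing means copying each object to its new key and deleting the old one. The sketch below only computes the new keys; the `hive_style` option produces `year=2021/month=09/day=01/` folders, which lets the crawler pick up partition names by itself. The regular expression assumes every file is named `YYYYMMDD_<name>`.

```python
import re

# Matches "<optional prefix/>YYYYMMDD_<file name>", e.g. "inputs/20210901_data.csv".
_FLAT_KEY = re.compile(
    r"^(?P<prefix>.*/)?(?P<y>\d{4})(?P<m>\d{2})(?P<d>\d{2})_(?P<name>[^/]+)$")


def partitioned_key(flat_key: str, hive_style: bool = False) -> str:
    """Map a flat, date-prefixed key to a year/month/day folder structure."""
    match = _FLAT_KEY.match(flat_key)
    if match is None:
        raise ValueError(f"unexpected key layout: {flat_key!r}")
    y, m, d = match["y"], match["m"], match["d"]
    folders = f"year={y}/month={m}/day={d}" if hive_style else f"{y}/{m}/{d}"
    return f"{match['prefix'] or ''}{folders}/{match['name']}"


if __name__ == "__main__":
    print(partitioned_key("inputs/20210901_data.csv"))  # inputs/2021/09/01/data.csv
```

With boto3, each move could then be `s3.copy_object(Bucket=bucket, Key=new_key, CopySource={"Bucket": bucket, "Key": old_key})` followed by `s3.delete_object(Bucket=bucket, Key=old_key)`.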

Once it’s done, you can start working with AWS Glue Crawler (which is also available from the AWS Glue Studio panel in the Glue Console tab.)
First, configure a crawler which will create a single table out of all the files.

Click on Add Crawler, then:

Name the Crawler get-sales-data-partitioned, and click Next.
Keep the Crawler source type on default settings (Crawler source type: Data stores & Crawl all folders), then click Next again.
Select S3 as the datastore and specify the path of s3://aws-glue-demo-202109/inputs and click Next.
No, you don’t want to add another data store.
Now, you can either choose an existing role or create a new one. Go ahead and create a new one.
For now, choose Frequency: Run on demand.
To store the Crawler output, create a database called sales_partitioned and select the created database from the drop-down menu. In configuration options, select Ignore the change and don’t update the table in the data catalog and Delete tables and partitions from the data catalog (I will explain why later on), and click Next.
Review your crawler and confirm.
For the crawler that will create separate tables, the process is pretty much the same; the only changes are in the following steps:
Step #1: Name it get-sales-data-partitioned-sep.
Step #2: Choose an already existing role (created for the previous crawler.)
Step #7: In Configure the crawler output, create a new database called sales_partitioned_sep. In Group behavior for S3 - Table level, enter 5. Why 5? Well, counting from the beginning: bucket is the first level, inputs the second, year the third, month the fourth, and day the fifth.

Next, review your crawler and save it. At this point, you should have two crawlers that will create two separate databases; run both and wait.
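The same two crawlers can be described with the parameters of the boto3 Glue client's `create_crawler`. The mapping below reflects my reading of the console options, so treat it as an assumption: "Ignore the change and don't update the table" corresponds to `UpdateBehavior="LOG"`, "Delete tables and partitions" to `DeleteBehavior="DELETE_FROM_DATABASE"`, and the table level of 5 to a `TableLevelConfiguration` grouping setting. The role name is hypothetical.

```python
import json
from typing import Optional


def crawler_kwargs(name: str, role: str, database: str, s3_path: str,
                   table_level: Optional[int] = None) -> dict:
    """Keyword arguments for boto3's glue.create_crawler (no schedule = run on demand)."""
    kwargs = {
        "Name": name,
        "Role": role,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Keep manual schema edits (e.g. renamed partition columns) and drop
        # tables/partitions whose data has disappeared.
        "SchemaChangePolicy": {"UpdateBehavior": "LOG",
                               "DeleteBehavior": "DELETE_FROM_DATABASE"},
    }
    if table_level is not None:
        # Path levels counted from the bucket: bucket=1, inputs=2, year=3, month=4, day=5.
        kwargs["Configuration"] = json.dumps(
            {"Version": 1.0, "Grouping": {"TableLevelConfiguration": table_level}})
    return kwargs


ROLE = "AWSGlueServiceRole-demo"  # hypothetical role name
PARTITIONED = crawler_kwargs("get-sales-data-partitioned", ROLE, "sales_partitioned",
                             "s3://aws-glue-demo-202109/inputs")
SEPARATE = crawler_kwargs("get-sales-data-partitioned-sep", ROLE, "sales_partitioned_sep",
                          "s3://aws-glue-demo-202109/inputs", table_level=5)
```

With a client from `boto3.client("glue")`, you would pass these as `create_crawler(**PARTITIONED)` and then start it with `start_crawler(Name="get-sales-data-partitioned")`.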

After a while, you can see that the get-sales-data-partitioned crawler created one table and get-sales-data-partitioned-sep created nine tables.
Why nine? Most likely because someone created new files in the meantime. You can find out by going into Databases, where you should see your databases:
In sales_partitioned you should see one table:
Whereas in sales_partitioned_sep, you should see nine tables.
The first thing that stands out is that one of the tables repeated in two months got an ugly suffix. But let’s go to the input_partitioned table. At the bottom you should see columns, which are your partitions:
How to figure out what’s behind partition_0, partition_1, and partition_2? Click on View Partitions: you can see that partition_0 is year, partition_1 is month, and partition_2 is day.
Click on Close Partitions, next Edit Schema, and rename these columns. Save it, and now your table should look like this:
Note: now it is important to explain why I previously checked Ignore the change and don’t update the table in the data catalog option during crawler creation. If I had not selected this option and run the crawler tomorrow, the column names I gave (year, month, day) would have been overwritten back to partition_0, partition_1, and partition_2.

Update! There is one thing that I missed while writing this article. If you name your S3 partitions the following way:

year=2021/month=08/day=30/data.csv

Glue Crawler will automatically pick up partition names, and you won't have to rename columns by yourself.
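The naming rule can be summarized in a few lines of Python. This is only an approximation of what the crawler does with the path below the table root, written to show why plain folders become partition_0, partition_1, and so on, while key=value folders keep their names.

```python
from pathlib import PurePosixPath


def partition_columns(relative_key: str) -> dict[str, str]:
    """Approximate the partition columns a crawler derives from a key.

    `relative_key` is the path below the table root, e.g. "2021/09/01/data.csv".
    """
    folders = PurePosixPath(relative_key).parts[:-1]  # drop the file name
    columns = {}
    for position, folder in enumerate(folders):
        if "=" in folder:  # Hive-style "year=2021": the name comes from the path
            name, value = folder.split("=", 1)
        else:  # plain "2021": fall back to a positional name
            name, value = f"partition_{position}", folder
        columns[name] = value
    return columns


if __name__ == "__main__":
    print(partition_columns("2021/09/01/data.csv"))
    print(partition_columns("year=2021/month=08/day=30/data.csv"))
```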

Now, back to your jobs—what are the options? On the left-hand side, you can see that I can select individual tables from the sales_partitioned_sep database. On the right-hand side, I have only one table to choose from, but I can send partition predicates.
Both approaches reduce the amount of initial data. However, there are a few things that I don’t like about the separate-tables approach:

the names of the tables do not clearly indicate (at least in my case) what data we are working on,
when I have to work on more than one day, I will have to add separate source nodes and create joins, whereas with the partitioned table, I will only have to modify the pushdown predicate.
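For the partitioned table, covering several days really is just a longer predicate. A small helper like the hypothetical one below can generate it; note the quoted, zero-padded values, since the partition values come from folder names such as "09".

```python
from datetime import date, timedelta


def day_predicate(day: date) -> str:
    """Predicate for one day; partition values are zero-padded strings like '09'."""
    return f"(year == '{day:%Y}' AND month == '{day:%m}' AND day == '{day:%d}')"


def range_predicate(start: date, end: date) -> str:
    """Predicate selecting every day from `start` to `end`, inclusive."""
    days = (start + timedelta(days=offset) for offset in range((end - start).days + 1))
    return " OR ".join(day_predicate(d) for d in days)


if __name__ == "__main__":
    print(range_predicate(date(2021, 9, 1), date(2021, 9, 3)))
```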

Okay, so I’m going with the partitioned table but, before we proceed, here are a few notes.

Now that we are working with a Data Catalog table instead of S3 files directly, note that the output schema has changed. We no longer have choice fields; the data types are now defined by the Glue Crawler.

So, how can you change this? There are three options:

you can edit schema types in the Glue Data Catalog (same way we changed the partition name),
you can change the types in the first step (like in the previous job),
you can add a separate transformation node that will change the types.

I chose the third option. Why? Firstly, because I want everyone to see that this step is taking place (that we are expecting some data types.) Secondly, because I want to show you another transformation.
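The setDatatypes node is presumably an ApplyMapping (Change Schema) transform, which takes (source column, source type, target column, target type) tuples. The plain-Python function below imitates that idea on a single row so you can see what the mapping does; it is an illustration, not the awsglue implementation, and the column types chosen here are assumptions.

```python
from typing import Callable, Optional

# Converters for the target types used below (an assumed subset of Glue types).
CASTS: dict[str, Callable[[str], object]] = {"string": str, "int": int, "double": float}

# (source column, source type, target column, target type), the tuple shape ApplyMapping uses.
MAPPINGS = [
    ("invoice_no", "string", "invoice_no", "string"),
    ("quantity", "string", "quantity", "int"),
    ("unit_price", "string", "unit_price", "double"),
    ("customer_id", "string", "customer_id", "string"),
    ("invoice_date", "string", "invoice_date", "string"),
]


def apply_mapping(row: dict, mappings=MAPPINGS) -> dict:
    """Rename/cast the mapped columns; columns without a mapping are dropped."""
    out: dict[str, Optional[object]] = {}
    for source, _source_type, target, target_type in mappings:
        value = row.get(source)
        # Treat missing or empty values as nulls instead of failing the cast.
        out[target] = None if value in (None, "") else CASTS[target_type](value)
    return out
```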

So, with partition predicate and extra step for applying data types, our job looks like this:
Now, back to the analyst issues.

Firstly, let’s address item returns. You want to filter out rows with a negative quantity and save them in a separate file. Add two Transform - Filter nodes after the setDatatypes node. The first one (getSales) takes records with quantity >= 0; the second one (getReturns) takes records with quantity < 0. After getSales, proceed with aggregations and then save the result to S3; after getReturns, just save it to S3.
Note: it seems that you cannot combine OR and AND operators in one Filter step. If you really need to do that, use a Spark SQL node and write a custom SQL query.
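The two Filter nodes split the same input by the sign of quantity, which the analysts use to mark returns. In plain Python, assuming rows have already been cast by the previous step, the split amounts to the sketch below; because the two conditions are complementary, every row lands in exactly one output.

```python
def split_sales_and_returns(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Mimic the getSales / getReturns Filter nodes (rows must have an int quantity)."""
    sales = [row for row in rows if row["quantity"] >= 0]    # getSales
    returns = [row for row in rows if row["quantity"] < 0]   # getReturns
    return sales, returns


if __name__ == "__main__":
    rows = [{"invoice_no": "1001", "quantity": 6}, {"invoice_no": "C1002", "quantity": -1}]
    sales, returns = split_sales_and_returns(rows)
    print(len(sales), len(returns))  # 1 1
```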

What’s next? Try to remove empty customers from the calculation. Now, this one is a bit tricky: you want to filter them out after you set data types using the Transform - Filter node, but this step is limited to “=, !=, <, >, <=, >=”, and using != with Null or an empty string does not work...
So maybe you could filter out before you set data types? Well, you can’t do that—before applying data types you can only filter columns for matching values using regex.
So, what’s left? Once again, you have to use a Spark SQL transformation to write a query which will filter out null values. The key function is isnotnull(column) = 1. I placed this step after applying data types and before filtering out returns.
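A hypothetical version of that Spark SQL step is sketched below, run through SQLite so it can be tested locally. SQLite has no isnotnull(), so the sketch uses the standard IS NOT NULL, which Spark SQL accepts as well; the extra `<> ''` condition is an assumption in case missing IDs arrive as empty strings rather than nulls.

```python
import sqlite3

# Keep only rows with a customer; `sales` is a hypothetical alias for the parent node.
FILTER_QUERY = """
SELECT *
FROM sales
WHERE customer_id IS NOT NULL AND customer_id <> ''
"""


def drop_empty_customers(rows: list[tuple]) -> list[tuple]:
    """Run FILTER_QUERY over (invoice_no, customer_id, quantity) rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (invoice_no TEXT, customer_id TEXT, quantity INTEGER)")
    con.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    result = con.execute(FILTER_QUERY).fetchall()
    con.close()
    return result
```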
Next, let’s handle the number of partitions of output files. And here I have to give up—it can’t be accomplished without using custom transformation (well, at least I didn’t figure out how to do that.) Is it tough?

Well, it depends. The documentation (AWS doc: transforms custom) is not very extensive and contains only one example. The most important thing to remember is that the Custom Transformation node only accepts glueContext and a DynamicFrameCollection as input and must also return a DynamicFrameCollection (a collection of DynamicFrames) as output.

What does it mean? It means that in order to perform any transformations, in the first step you have to choose which DynamicFrame you want to work on. If the transform has only one parent, there is no problem—you choose the first one and convert DynamicFrame to DataFrame:

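from awsglue.dynamicframe import DynamicFrame, DynamicFrameCollection

def reducePartitionNumber(glueContext, dfc) -> DynamicFrameCollection:
    # A custom transform receives a DynamicFrameCollection; take its first (here: only)
    # DynamicFrame and convert it to a Spark DataFrame
    df = dfc.select(list(dfc.keys())[0]).toDF()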

Then, you can do whatever Spark transformations you might want. In my case, I want to reduce the number of partitions created, so I do:

    # coalesce(1) merges all partitions into one, so Spark writes a single output file
    df_w_less_partitions = df.coalesce(1)

Then I convert the DataFrame back to a DynamicFrame and return it wrapped in a DynamicFrameCollection:

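    # Wrap the single-partition DataFrame back into a DynamicFrame named "one_part_df"...
    df_one_partition = DynamicFrame.fromDF(df_w_less_partitions, glueContext, "one_part_df")
    # ...and return it inside a collection, because the node must output a DynamicFrameCollection
    return DynamicFrameCollection({"CustomTransform0": df_one_partition}, glueContext)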

To sum up, the node should look like this:
But that’s not the end. According to the documentation and what you saw above, “A custom code transform returns a collection of DynamicFrames, even if there is only one DynamicFrame in the result set,” and, unfortunately, output nodes do not accept a DynamicFrameCollection as input, so you will have to add one more step that selects a specific DynamicFrame from the collection.

For this step you can use SelectFromCollection transform node which allows you to indicate which dataset you want to use. Since there is only one parent node here, there is only one dataset to choose from.

But if this node had several parents or a parent returning many DynamicFrames (for example SplitFields transformation, which splits dataframe into two separate dataframes), you would have a choice.
With that done, we can move on to the last task: adding customer information to the dataframe with aggregates. My analysts loaded the customers.csv file into the main folder aws-glue-demo-202109. The file contains the following data: id, name, surname, address.

In our job, we first add another source that will directly reference the CSV file contained in S3 bucket. I give this node the name Customers. AWS Glue Studio detects data formats, separators and defines data types by itself.
Then we will use the Transform - Join node to combine the data from the two sources. This type requires at least two parent nodes, so I’m adding aggByCustomers node as the second source.

Why am I joining here? Well, I’m afraid that joining after reducing the number of partitions (with the getReducedDynamicFrame node) will once again result in multiple partitions, and joining earlier (before the aggregation is performed) is less efficient, since the join would be performed on a larger number of rows.
Now you need to select the join type and declare join conditions. Since my left dataset is Customers node, right is aggByCustomers and I know that I don’t have data for all customers, I chose right join. I join those datasets using columns id and customer_id.
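Not every SQL engine supports RIGHT JOIN (older SQLite versions do not), so the sketch below expresses the same result as a LEFT JOIN with the inputs swapped: `customers RIGHT JOIN aggregates` keeps exactly the rows of `aggregates LEFT JOIN customers`. The table names mirror the nodes; the sample data in the test is made up.

```python
import sqlite3

# customers RIGHT JOIN aggregates == aggregates LEFT JOIN customers:
# every aggregate row is kept; customer columns are NULL when no match exists.
JOIN_QUERY = """
SELECT c.id, c.name, c.surname, c.address,
       a.customer_id, a.sale_date, a.total_quantity, a.total_sale
FROM aggregates AS a
LEFT JOIN customers AS c ON c.id = a.customer_id
ORDER BY a.customer_id, a.sale_date
"""


def join_customers(customers: list[tuple], aggregates: list[tuple]) -> list[tuple]:
    """Join (id, name, surname, address) rows onto (customer_id, sale_date, qty, sale) rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE customers (id TEXT, name TEXT, surname TEXT, address TEXT)")
    con.execute("CREATE TABLE aggregates (customer_id TEXT, sale_date TEXT,"
                " total_quantity INTEGER, total_sale REAL)")
    con.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", customers)
    con.executemany("INSERT INTO aggregates VALUES (?, ?, ?, ?)", aggregates)
    result = con.execute(JOIN_QUERY).fetchall()
    con.close()
    return result
```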
The last thing to do is to change the parent output of reducePartitionsNumber from aggByCustomers to joinAggregatesWithCustomers and you are good to go. Now the job should look like this:
Let’s run it and see what happens. Once the job is finished, go to S3 and...

...it’s working! You received only two files:
In the first one, you’ll have customers with their data (if found) and aggregates:
In the second one, you’ll have returned items:
Well, frankly speaking I do not like the order of the columns, because when join did not find the client’s data, it looks like this:
And I don’t see any other way to reorder them than another Spark SQL node, where I would just select the columns in a different order. But let’s leave it as-is. Optionally, I could also use a FillMissingValues transformer node and type, for example, “NOT_FOUND” to fill missing values and make the file more readable.
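For completeness, the reordering plus a fixed placeholder could look like the hypothetical query below, again run through SQLite. `COALESCE` exists in Spark SQL too, so the same SELECT should carry over to a Spark SQL node; `joined` is an assumed alias for the join node's output, and the `id` column is left out for brevity.

```python
import sqlite3

# Aggregates first, customer details after, with "NOT_FOUND" instead of NULL.
REORDER_QUERY = """
SELECT customer_id, sale_date, total_quantity, total_sale,
       COALESCE(name, 'NOT_FOUND')    AS name,
       COALESCE(surname, 'NOT_FOUND') AS surname,
       COALESCE(address, 'NOT_FOUND') AS address
FROM joined
"""


def reorder(rows: list[tuple]) -> list[tuple]:
    """Run REORDER_QUERY over (name, surname, address, customer_id, sale_date,
    total_quantity, total_sale) rows, i.e. the join output's column order."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE joined (name TEXT, surname TEXT, address TEXT, customer_id TEXT,"
                " sale_date TEXT, total_quantity INTEGER, total_sale REAL)")
    con.executemany("INSERT INTO joined VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    result = con.execute(REORDER_QUERY).fetchall()
    con.close()
    return result
```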

How to schedule an AWS Glue job?

At this point, you could say: “Okay, but I don’t want to edit and run my job manually every day.” There are two things that should be set up to automate the whole process.

Automate the crawler, so newly added files are visible as consecutive partitions. You don’t have to create a new crawler; you can edit an existing one. Choose a crawler, click Action -> Edit Crawler and hit Next until you reach step #5 (Create a scheduler for this crawler). Choose Daily instead of Run on demand and then set up the time. Go through the next steps and save the changes. If everything’s gone well, you should see that a scheduler appeared in the crawler:
The second thing is job automation. First, create a job scheduler, which is a relatively simple task. Go to the created job and click the Scheduler tab. You’ll immediately see the options to create a new scheduler: you can set up a scheduler name, choose its frequency (Hourly, Daily, Weekly, Monthly, Custom) and, optionally, add a description. I set up the job scheduler to run one hour after the crawler’s scheduler.

Now, it would be great if you did not have to manually change the date for which the aggregates are to be made. The easiest solution would be to modify the partition predicate in the source node, so it automatically extracts year, month, and day from the current date.

year == year(current_date) AND 
month == month(current_date) AND 
day == day(current_date)

Note: please keep in mind that the month and day partition values are not integers but zero-padded strings (01, 02, 03, etc.), whereas Spark’s month() and day() functions extract them as integers from a given date/timestamp/string. So, you would have to once again reorganize your bucket structure.
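One way around the integer issue, sketched below as an assumption rather than the method used in this article, is to build the predicate outside Spark with zero-padded strings, for example from a run date passed to the job as a custom parameter (see the next paragraph). The parameter handling itself is left out; without an explicit date, the sketch falls back to today's date in UTC.

```python
from datetime import date, datetime, timezone
from typing import Optional


def run_date(value: Optional[str] = None) -> date:
    """Use an explicit YYYY-MM-DD run date if given, else today's date in UTC."""
    if value:
        return date.fromisoformat(value)
    return datetime.now(timezone.utc).date()


def partition_predicate(day: date) -> str:
    """Zero-padded string comparisons that match folders like 2021/09/01."""
    return f"year == '{day:%Y}' AND month == '{day:%m}' AND day == '{day:%d}'"


if __name__ == "__main__":
    print(partition_predicate(run_date("2021-09-05")))
    # year == '2021' AND month == '09' AND day == '05'
```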

For those interested in more advanced solutions, please read about passing custom parameters to Glue Job (since AWS Glue Studio also allows you to pass up to 50 custom parameters) in this article on Calling AWS Glue APIs in Python—AWS Glue.
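Both schedules can also be set up with the boto3 Glue client, as sketched below in (method name, kwargs) form. AWS cron expressions have six fields (minutes, hours, day of month, month, day of week, year), run in UTC, and need a `?` in one of the two day fields. The job and trigger names and the 01:00 crawler time are hypothetical; as in the setup above, the job runs one hour after the crawler.

```python
def daily_cron(hour: int, minute: int = 0) -> str:
    """AWS cron expression for a daily run at hour:minute UTC."""
    return f"cron({minute} {hour} * * ? *)"


def schedule_requests(crawler: str, job: str, crawler_hour: int = 1) -> list[tuple[str, dict]]:
    """boto3 Glue client calls: schedule the crawler, then the job one hour later."""
    return [
        ("update_crawler", {"Name": crawler, "Schedule": daily_cron(crawler_hour)}),
        ("create_trigger", {
            "Name": f"{job}-daily",  # hypothetical trigger name
            "Type": "SCHEDULED",
            "Schedule": daily_cron((crawler_hour + 1) % 24),
            "Actions": [{"JobName": job}],
            "StartOnCreation": True,  # activate the trigger right away
        }),
    ]


if __name__ == "__main__":
    for method, kwargs in schedule_requests("get-sales-data-partitioned", "sales-aggregation"):
        print(method, kwargs)
```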

When should you use AWS Glue Studio?

You might be wondering when you should use AWS Glue Studio. Let’s focus on Glue Studio, not Glue as a whole service.

AWS Glue Studio is great if you want to:

quickly create ETL jobs that run regularly,
combine large amounts of data from many different sources,
and perform simple transformations such as rename or drop fields, join, split, or filter dataframe.

Since you can see and copy the code, a job prepared in Studio could be used as a starting point for a larger, more complex job in Glue.

And when is it not that useful? For instance, when you are looking for a job orchestrator that allows you to repeat your work from any step, use sensors to wait for a file, among other things, you will be better off with Step Functions or Airflow. If you are looking for a data preparation tool, DataBrew would be a better choice since it offers way more data transformation options.

Also, please keep in mind that AWS Glue Studio is not free. When you create a job, you have to declare the number of workers (a minimum of two) and choose one of the two possible worker types: G.1X or G.2X. In G.1X, each worker maps to one DPU and one executor, which launches with eight Spark cores and 10 GiB of memory. G.2X doubles those values. And you are charged an hourly rate based on the number of DPUs used to run your ETL jobs.
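To put the pricing model into numbers, here is a rough estimator. The $0.44 per DPU-hour default and the per-second billing with a 1-minute minimum (as applied to Glue 2.0 and later Spark jobs) are assumptions to check against the current pricing page for your region.

```python
def glue_job_cost(workers: int, worker_type: str, runtime_seconds: float,
                  price_per_dpu_hour: float = 0.44) -> float:
    """Approximate cost in USD of one Glue Spark job run."""
    dpus_per_worker = {"G.1X": 1, "G.2X": 2}[worker_type]
    billed_seconds = max(runtime_seconds, 60)  # assumed 1-minute minimum, then per second
    dpu_hours = workers * dpus_per_worker * billed_seconds / 3600
    return dpu_hours * price_per_dpu_hour


if __name__ == "__main__":
    # Two G.1X workers (the minimum) for a 10-minute run: 1/3 DPU-hour.
    print(round(glue_job_cost(2, "G.1X", 600), 4))  # 0.1467
```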

What are the pros and cons of AWS Glue Studio?

One of the main advantages of AWS Glue Studio is the fact that, except for just one step, I didn’t need to use any programming language to create the above-mentioned ETL job.

Whatever is not available out of the box can often be worked around with a Spark SQL query. Integrating AWS Glue Studio with S3 or the Data Catalog is extremely easy, and the same applies to job scheduling. Besides, let’s not forget that you can get data from streaming services like Kinesis or Kafka.

What’s more, in AWS Glue Studio we can monitor all the jobs in one view, and Job bookmarks is a very handy feature, too.

The main disadvantage is definitely the UI, which is a bit clumsy. It can hang and behave unpredictably at times. Since you can’t manually move nodes, larger jobs can become hard to read.

For users unfamiliar with Spark, the limited number of transforms might be a big pain in some cases. Also, the latest version does not support all features, such as Data Preview, which is something you will find out only when you try to run it.

Final thoughts on building complex data pipelines with AWS Glue Studio

Here is what you’ve accomplished by following the instructions in this article:

took files from both the AWS Glue Data Catalog and S3 directly,
applied data types,
removed unwanted data,
divided the data based on certain values,
did some joins and aggregations,
reduced the number of partitions,
saved the results to desired paths in an S3 bucket.

As you’ve seen in this tutorial, building complex data pipelines doesn’t need to be challenging. I hope you’ve enjoyed following the instructions and learnt a bit more about AWS Glue Studio.


In the article, I used a dataset provided by Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

How to Build a Good API That Won’t Embarrass You

Adam Przewoźny — Sun, 03 Apr 2022 17:06:02 +0000

Everyone and their puppy wants an API these days. APIs first gained popularity around 20 years ago. Roy Fielding introduced the term REST in his doctoral dissertation in the year 2000. It was the same year that Amazon, Salesforce and eBay introduced their APIs to developers around the world, forever changing the way that we build software.

Before REST, the principles in Roy Fielding’s dissertation were known as the “HTTP object model”, and you’ll see why that’s important soon.

As you read on, you’ll also see how to determine whether your API is mature, what the main qualities of a good API are, and why you should focus on adaptability when building APIs.

The basics of RESTful architecture

REST stands for Representational State Transfer, and it has long been the holy grail of APIs for services, first defined by Roy Fielding in his dissertation. It’s not the only way to build APIs, but it’s the kind of standard that even non-developers know about thanks to its popularity.

There are six key characteristics of RESTful software:

Client-Server architecture
Statelessness
Cacheability
Layered system
Code on demand (optional)
Uniform interface

But that’s too theoretical for daily usage. We want something more actionable, and that’s going to be the API maturity model.

The Richardson Maturity Model

Developed by Leonard Richardson, this model combines the principles of RESTful development into four easy-to-follow steps.

The higher you are in the model, the closer you get to the original idea of REST as defined by Roy Fielding.

Level 0: The swamp of POX

A level 0 API is a set of plain XML or JSON descriptions. In the introduction, I mentioned that before Fielding’s dissertation, RESTful principles were known as the “HTTP object model”.

That’s because the HTTP protocol is the most important part of RESTful development. REST revolves around the idea of using as many inherent properties of HTTP as possible.

At level 0, you don’t use any of that. You just build your own protocol and use it as a proprietary layer. This style is known as Remote Procedure Call (RPC), and it works well for exposing remote procedures and commands.

You usually have a single endpoint that you call to receive a bunch of XML data. The SOAP protocol is one example of this.
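A level 0 exchange can be sketched in Python like this: every operation is an XML envelope POSTed to one URL, and the operation name lives inside the body rather than in the URL or the HTTP verb. The endpoint, the GetBook operation, and its namespace are hypothetical, and the function only builds the request without sending it.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace and single endpoint, for illustration only.
SERVICE_NS = "http://example.com/bookstore"
ENDPOINT = "https://example.com/soap"

ET.register_namespace("soap", SOAP_ENV)


def build_soap_request(operation: str, **params: str) -> urllib.request.Request:
    """Wrap an operation call in a SOAP 1.1 envelope aimed at the one endpoint."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = value
    return urllib.request.Request(
        ENDPOINT,  # the same URL for every operation
        data=ET.tostring(envelope, encoding="utf-8"),
        method="POST",  # and the same verb, too
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": f"{SERVICE_NS}/{operation}",
        },
    )


req = build_soap_request("GetBook", bookId="1")
print(req.get_method(), req.full_url)
```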

Another good example is the Slack API. It is a bit more diverse and has several endpoints, but it’s still an RPC-style API. It exposes various functions of Slack without any added features in between. Its chat.postMessage method, for instance, lets you post a message to a specific channel.
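Here is a minimal Python sketch of that RPC-style call. The chat.postMessage URL, the bearer token header, and the channel and text JSON fields follow Slack’s public Web API, but the token is a placeholder and the request is only built, not sent; check Slack’s documentation for the scopes and options a real app needs.

```python
import json
import urllib.request

SLACK_URL = "https://slack.com/api/chat.postMessage"


def build_post_message(token: str, channel: str, text: str) -> urllib.request.Request:
    """RPC style: the method name is in the URL, the target channel travels in the body."""
    body = json.dumps({"channel": channel, "text": text}).encode("utf-8")
    return urllib.request.Request(
        SLACK_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )


# Placeholder token; urllib.request.urlopen(req) would actually send the request.
req = build_post_message("xoxb-your-token", "general", "Hello from the API!")
```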

Even though it’s a level 0 API according to Richardson’s model, it doesn’t mean it’s bad. As long as it’s usable and properly serves the business needs, it’s a great API.

Level 1: Resources

To build a level 1 API, you need to find the nouns in your system and expose them as resources under different URLs, such as /api/books or /api/profile.

/api/books will take me to the general book directory. /api/profile will take me to the profile of the author of those books (if there’s only one author). To get a specific instance of a resource, I add an ID (or another reference) to the URL, as in /api/books/1.

I can also nest the resources in the URLs, and show that they’re organized in a hierarchy.

Going back to the Slack example, a level 1 version would move the channel out of the request body and into the URL.

The URL changed; instead of /api/chat.postMessage, now we have /api/channels/general/messages.

The “channel” part of the information has been moved from the body to the URL. It literally says that using this API, you can expect a message to be posted to the general channel.
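A minimal Python sketch of such a level 1 request, assuming a hypothetical chat service at chat.example.com (this is not a real Slack endpoint). The request is still a POST: at level 1, only the URL carries more meaning.

```python
import json
import urllib.request
from urllib.parse import quote

# Hypothetical host: a level 1 redesign, not a real Slack endpoint.
BASE_URL = "https://chat.example.com/api"


def build_channel_message(token: str, channel: str, text: str) -> urllib.request.Request:
    """Level 1: the channel is a resource in the URL; the body carries only the message."""
    url = f"{BASE_URL}/channels/{quote(channel, safe='')}/messages"
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",  # level 1 still funnels everything through one verb
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )


req = build_channel_message("placeholder-token", "general", "Hello!")
print(req.full_url)  # https://chat.example.com/api/channels/general/messages
```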

Level 2: HTTP verbs

A level 2 API leverages HTTP verbs to add more meaning and intention. There are quite a few of these verbs; I’ll just use a fundamental subset: PUT / DELETE / GET / POST.

With these verbs, we expect different behaviors from the same URL depending on the verb used:

POST: create new data
PUT: update existing data
DELETE: remove data
GET: fetch a specific resource by its ID, or an entire collection

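Or, using the previous /api/books example: POST /api/books creates a new book, GET /api/books fetches the whole collection, GET /api/books/1 fetches a single book, PUT /api/books/1 updates it, and DELETE /api/books/1 removes it.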

What does “safe” and “idempotent” mean?

A “safe” method is one that will never change data. REST recommends that GET should only fetch data, so it’s the only safe method in the above set. No matter how many times you call a REST-based GET method, it should never change anything in the database. But safety isn’t inherent in the verb; it depends on how you implement it, so you need to make sure your GET handlers really are read-only. All other methods change data in different ways and can’t be called at random. In REST, GET is both safe and idempotent.

An “idempotent” method is one where repeating the same request many times has the same effect on the data as making it once. DELETE should be idempotent according to REST: if you delete a resource once and then call DELETE for the same resource a second time, it shouldn’t change anything, because the resource is already gone. Of the four verbs above, POST is the only non-idempotent one, so if you POST the same resource several times, you’ll get duplicates.
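The difference between safe, idempotent, and non-idempotent methods shows up clearly with a toy in-memory backend for /api/books. The BookStore class is hypothetical, with one method per verb; no HTTP is involved, only the effect each call has on the stored data.

```python
import itertools


class BookStore:
    """Hypothetical in-memory backend for /api/books, keyed by integer ID."""

    def __init__(self):
        self.books = {}
        self._ids = itertools.count(1)

    def get(self, book_id):           # GET /api/books/{id}: safe, only reads
        return self.books.get(book_id)

    def post(self, data):             # POST /api/books: every call creates a new ID
        book_id = next(self._ids)
        self.books[book_id] = dict(data)
        return book_id

    def put(self, book_id, data):     # PUT /api/books/{id}: replaces, same state every time
        self.books[book_id] = dict(data)

    def delete(self, book_id):        # DELETE /api/books/{id}: gone after the first call
        self.books.pop(book_id, None)


store = BookStore()
store.post({"title": "Dune"})
store.post({"title": "Dune"})                 # non-idempotent: a duplicate appears (ID 2)
store.put(1, {"title": "Dune Messiah"})
store.put(1, {"title": "Dune Messiah"})       # idempotent: repeating changes nothing
store.delete(2)
store.delete(2)                               # idempotent: the second call is a no-op
store.get(1)                                  # safe: reading leaves the data untouched
print(store.books)  # {1: {'title': 'Dune Messiah'}}
```

Idempotency concerns the server’s state, not the response: a real API may answer the second DELETE with 404 Not Found, and the method is still idempotent because nothing changed.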

Let’s revisit the Slack example, and see what it would look like if we used HTTP verbs in it to do more operations.

We could use POST to send a message to the general channel. We could fetch messages from the general channel with GET. We could delete messages with a specific ID with DELETE—which gets interesting because messages are not tied to specific channels, so I might want to design a separate API for removing messages. This example shows that it’s not always easy to design an API; there are plenty of options to choose and trade-offs to make.
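A sketch of that level 2 design as a Python route table: the same channel URL maps to different handlers depending on the verb, and deletion gets its own /api/messages/{id} URL. All paths and handler names are hypothetical, and a production router would distinguish 404 Not Found from 405 Method Not Allowed, which this one does not.

```python
import re

# Hypothetical level 2 routes for the chat example: (verb, URL pattern, handler name).
ROUTES = [
    ("POST", r"^/api/channels/(?P<channel>[\w-]+)/messages$", "post_message"),
    ("GET", r"^/api/channels/(?P<channel>[\w-]+)/messages$", "list_messages"),
    # Messages are not tied to channels, so deletion is addressed by message ID alone.
    ("DELETE", r"^/api/messages/(?P<message_id>\d+)$", "delete_message"),
]


def resolve(method: str, path: str):
    """Return (handler_name, url_params) for a request, or raise LookupError."""
    for route_method, pattern, handler in ROUTES:
        match = re.match(pattern, path)
        if match and route_method == method:
            return handler, match.groupdict()
    raise LookupError(f"No route for {method} {path}")


print(resolve("GET", "/api/channels/general/messages"))
```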

Level 3: HATEOAS

Remember text-only computer games, without any graphics? You just had a lot of text with descriptions of where you are, and what you can do next. To progress, you had to type your choice. That’s kind of what HATEOAS is.

HATEOAS stands for “Hypermedia as the Engine of Application State”.

When you have HATEOAS, whenever someone uses your API they can see other things they can do with it. HATEOAS answers the question, “Where can I go from here?”

But that’s not all. HATEOAS can also model data relationships. A book resource, for example, doesn’t need its authors nested in the URL; instead, we can include links to them, so anyone interested in the authors can follow those links and explore.
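A minimal sketch of a HATEOAS-style representation in Python, assuming a hypothetical api.example.com host. Authors are exposed as links rather than nested data, and the available next actions are listed as links too. The links/href/method layout is an arbitrary choice for illustration, not a standard format.

```python
BASE = "https://api.example.com"  # hypothetical host


def book_representation(book: dict) -> dict:
    """Represent a book with links to related resources and to possible next actions."""
    book_url = f"{BASE}/api/books/{book['id']}"
    return {
        "id": book["id"],
        "title": book["title"],
        "links": {
            "self": book_url,
            # The relationship is modelled as links; clients follow them only if interested.
            "authors": [f"{BASE}/api/authors/{author_id}" for author_id in book["author_ids"]],
            # "Where can I go from here?", like the options in a text adventure.
            "update": {"href": book_url, "method": "PUT"},
            "delete": {"href": book_url, "method": "DELETE"},
        },
    }


print(book_representation({"id": 1, "title": "Dune", "author_ids": [7]}))
```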

This is not as popular as other levels of the maturity model, but some developers use it. One example is Jira, whose search API returns responses with embedded links.

The response nests links to other resources you can explore, as well as a list of transitions for the issue. Their API is quite interesting because of the “expand” parameter, which lets you choose fields for which you’d rather get the full content than a link.

Another example of using HATEOAS is Artsy, whose API relies heavily on it. They also use the HAL (JSON Hypertext Application Language) specification, which imposes a special convention for structuring links. Pagination is one of the coolest examples of using HATEOAS.

You can provide links to the next, previous, first, and last pages, as well as any other pages you find necessary. This simplifies consuming the API, because you don’t need to add URL-parsing logic to your client or a way to append the page number. Your client simply follows links that arrive already structured.
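Pagination links in the HAL style can be generated with a few lines of Python. The sketch assumes 1-based page and per_page query parameters and a hypothetical collection URL; only the _links/href structure follows HAL’s convention.

```python
import math
from urllib.parse import urlencode

BASE = "https://api.example.com/api/books"  # hypothetical collection URL


def page_links(page: int, per_page: int, total: int) -> dict:
    """Build HAL-style "_links" for one page of a collection (pages are 1-based)."""
    last = max(1, math.ceil(total / per_page))

    def href(p: int) -> dict:
        return {"href": f"{BASE}?{urlencode({'page': p, 'per_page': per_page})}"}

    links = {"self": href(page), "first": href(1), "last": href(last)}
    if page > 1:
        links["prev"] = href(page - 1)
    if page < last:
        links["next"] = href(page + 1)
    return {"_links": links}


print(page_links(2, 10, 35)["_links"]["next"]["href"])
```

A client then just keeps requesting _links.next.href until that key disappears.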

What makes a good API

So much for Richardson’s model, but that’s not all that makes a good API. What are other important qualities?

Error/exception handling

One of the fundamental things I expect from an API that I consume is that there needs to be an obvious way to tell if there’s an error or an exception. I need to know if my request was processed or not.

Lo and behold, HTTP also has an easy way to do that: HTTP Status Codes.

The basic rules governing status codes are:

2xx is OK
3xx means your princess is in another castle—the resource you’re looking for is in another place
4xx means the client did something wrong
5xx means the server failed

At the very least, your API should provide 4xx and 5xx status codes. 5xx codes are sometimes generated automatically. For example, if a client sends an invalid request and flawed validation lets it through, the problem travels further down the code until it raises an exception, and the server returns a 5xx status code.
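One way to make errors obvious is to translate exceptions into status codes in a single place. In this Python sketch, ValidationError, NotFound, and get_book are hypothetical; anything unexpected falls through to 500, which is exactly what happens when invalid input slips past validation (calling get_book with None, for instance).

```python
from http import HTTPStatus


class ValidationError(Exception):
    """Hypothetical: raised when the client sends invalid input."""


class NotFound(Exception):
    """Hypothetical: raised when the requested resource does not exist."""


def to_response(handler, *args):
    """Call a handler and translate its outcome into (status code, JSON-ready body)."""
    try:
        return HTTPStatus.OK, handler(*args)
    except ValidationError as exc:          # 4xx: the client did something wrong
        return HTTPStatus.BAD_REQUEST, {"error": str(exc)}
    except NotFound as exc:
        return HTTPStatus.NOT_FOUND, {"error": str(exc)}
    except Exception:                       # 5xx: anything unexpected is the server's failure
        return HTTPStatus.INTERNAL_SERVER_ERROR, {"error": "internal server error"}


BOOKS = {1: {"title": "Dune"}}


def get_book(raw_id):
    if not raw_id.isdigit():                # flawed: assumes raw_id is always a string
        raise ValidationError("book id must be a number")
    book_id = int(raw_id)
    if book_id not in BOOKS:
        raise NotFound(f"book {book_id} not found")
    return BOOKS[book_id]


print(to_response(get_book, "abc"))
```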

If you want to commit to using specific status codes, you’ll find yourself wondering, “Which code is best for this case?” That question isn’t always easy to answer.

I recommend going to the RFC that specifies these status codes. It gives a fuller explanation than most other sources and tells you when each code is appropriate. There are also several friendlier resources online that will help you choose, like this HTTP status code guide from Mozilla.

Documentation

Great APIs have great documentation. The biggest problem with documentation is usually finding someone to update it as the API grows. One great option is self-updating documentation that isn’t detached from the code.

For example, comments aren’t connected to the code. When the code changes, the comments stay the same and become obsolete. They can be worse than no comments at all, because after a while they’ll be providing false information. Comments don’t update automatically, so developers need to remember to maintain them alongside the code.

Self-updating documentation tools solve this problem. A popular choice is Swagger, a toolset built around the OpenAPI Specification, which makes it easy to describe your API.

The cool part of Swagger is that it’s executable, so you can play around with the API and instantly see what it does and how it changes.

To make Swagger documentation update itself as the code changes, you need additional plugins and tools. In Python, there are plugins for most major frameworks. They generate descriptions of how API requests should be structured, and define what data comes in and what goes out.
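The idea behind self-updating docs can be sketched without any framework: register handlers with a decorator and derive an OpenAPI document from their signatures and docstrings, so the docs change whenever the code does. The route decorator and openapi_spec function are hypothetical and are not the API of any existing plugin; real plugins cover far more (request bodies, response schemas, authentication).

```python
import inspect

ROUTES = []  # filled in by the @route decorator

PY_TO_OPENAPI = {int: "integer", str: "string", float: "number", bool: "boolean"}


def route(method: str, path: str):
    """Hypothetical decorator: registers a handler so the docs are derived from code."""
    def decorator(func):
        ROUTES.append((method.lower(), path, func))
        return func
    return decorator


def openapi_spec(title: str, version: str) -> dict:
    """Build a minimal OpenAPI 3.0 document from the registered handlers."""
    paths = {}
    for method, path, func in ROUTES:
        params = [
            {
                "name": name,
                "in": "path" if f"{{{name}}}" in path else "query",
                "required": param.default is inspect.Parameter.empty,
                "schema": {"type": PY_TO_OPENAPI.get(param.annotation, "string")},
            }
            for name, param in inspect.signature(func).parameters.items()
        ]
        paths.setdefault(path, {})[method] = {
            "summary": inspect.getdoc(func) or "",
            "parameters": params,
            "responses": {"200": {"description": "OK"}},
        }
    return {"openapi": "3.0.3", "info": {"title": title, "version": version}, "paths": paths}


@route("GET", "/api/books/{book_id}")
def get_book(book_id: int):
    """Fetch a single book by its ID."""
    return {"id": book_id}


print(openapi_spec("Books API", "1.0"))
```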

What if you don’t want Swagger and prefer something simpler? A popular alternative is Slate, a tool for generating static API documentation that you can build and host at your own URL.

Something in between that’s also worth recommending is a combination of widdershins and api2html, which lets you generate Slate-like docs from a Swagger definition.

Cacheability

Cacheability may not be a big deal in some systems. You might not have a lot of data that can be cached, everything changes all the time, or maybe you don’t have a lot of traffic.

But in most cases, cacheability is crucial for good performance. It’s especially relevant to RESTful APIs because HTTP has built-in support for caching: HTTP headers such as Cache-Control and ETag let you control cache behaviour.

You might want to cache things on the client side, or in your application if you have a key-value store to keep data in. But HTTP gives you good caching essentially for free, so if it’s possible, don’t walk away from a free lunch.

Also, since caching is part of the HTTP spec, a lot of things that participate in HTTP will know how to cache things: browsers, which support caching natively, as well as other intermediary servers between you and the client.
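HTTP’s built-in caching can be sketched with an ETag and a conditional GET. In this Python sketch, the handler hashes the response body into an ETag, sets Cache-Control so browsers and intermediaries may reuse the response, and answers 304 Not Modified when the client’s If-None-Match value matches. The function name and the 60-second max-age are assumptions, and If-None-Match lists and wildcards are not handled.

```python
import hashlib
import json


def cached_get(resource: dict, if_none_match=None, max_age: int = 60):
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    body = json.dumps(resource, sort_keys=True).encode("utf-8")
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'  # ETags are quoted strings
    headers = {
        "ETag": etag,
        # Lets browsers and shared caches (proxies, CDNs) reuse the response for max_age seconds.
        "Cache-Control": f"public, max-age={max_age}",
    }
    if if_none_match == etag:
        return 304, headers, b""  # Not Modified: the client's cached copy is still valid
    return 200, headers, body


status, headers, body = cached_get({"title": "Dune"})
print(cached_get({"title": "Dune"}, headers["ETag"])[0])  # 304
```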

Evolutionary API design

The most important part of building APIs, and modern software in general, is adaptability. Without adaptability, development time slows down, and it becomes harder to ship features in a reasonable time, especially when you're facing deadlines.

“Software architecture” means different things in different contexts, but let’s adopt this definition for now:

Software architecture: the act/art of dodging decisions that prevent change in the future.

With that in mind, when you design your software and have to choose between options with similar benefits, you should always choose the one that’s more future-proof.

Good practices aren’t everything. Building the wrong thing in the right way is not what you want to do. It’s better to adopt a growth mindset and accept the fact that change is inevitable, especially if your project is going to continue growing.

To make your APIs more adaptable, one of the key things to do is to keep your API layers thin. The real complexity should be shifted down.

APIs shouldn’t dictate the implementation

Once you publish a public API, it’s done, it’s immutable, you can’t touch it. But what can you do if you have no other choice but to commit to a weirdly designed API?

You should always look for ways to simplify your implementation. Sometimes controlling your API’s response format with a special HTTP header is a leaner solution than building another API and calling it v2.
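A sketch of response-format selection through a header, in Python. It assumes hypothetical vendor media types in the Accept header (application/vnd.example.v1+json and application/vnd.example.v2+json); the underlying data stays the same and only its representation changes. The substring check is a simplification of proper Accept parsing.

```python
# Hypothetical vendor media types; choose your own naming scheme.
V1 = "application/vnd.example.v1+json"
V2 = "application/vnd.example.v2+json"


def render_book(book: dict, accept: str = V1) -> dict:
    """Pick the response format from the Accept header; the data layer is unchanged."""
    if V2 in accept:  # simplified check; real code should parse the header properly
        return {
            "title": book["title"],
            "author": {"first_name": book["first_name"], "last_name": book["last_name"]},
        }
    # v1 (the default) keeps the original flat format, so existing clients are unaffected
    return {"title": book["title"], "author": f"{book['first_name']} {book['last_name']}"}


BOOK = {"title": "Dune", "first_name": "Frank", "last_name": "Herbert"}
print(render_book(BOOK))           # flat v1 response
print(render_book(BOOK, V2))       # structured v2 response
```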

APIs are just another layer of abstraction. They shouldn’t dictate the implementation. There are several development patterns that you can apply in order to avoid this issue.

API gateway

This is a facade-like development pattern. If you break up a monolith into a bunch of microservices, and want to expose some functionalities to the world, you simply build an API gateway that acts like a facade.

It will provide a uniform interface for the different microservices (which may have different APIs, use different error formats, etc.).
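A toy API gateway in Python, assuming two hypothetical services that report errors in different shapes. The gateway acts as the facade: clients call one interface, and every error comes back in one uniform format.

```python
from dataclasses import dataclass


@dataclass
class Response:
    status: int
    body: dict


# Two hypothetical microservices with inconsistent error formats.
def orders_service(order_id: int) -> Response:
    if order_id != 1:
        return Response(404, {"err_msg": "no such order"})
    return Response(200, {"id": 1, "total": 42})


def users_service(user_id: int) -> Response:
    if user_id != 7:
        return Response(404, {"error": {"code": "USER_MISSING", "detail": "unknown user"}})
    return Response(200, {"id": 7, "name": "Ada"})


def normalize_error(resp: Response) -> Response:
    """Map each service's error body onto one uniform shape."""
    if resp.status < 400:
        return resp
    message = resp.body.get("err_msg") or resp.body.get("error", {}).get("detail", "unknown error")
    return Response(resp.status, {"error": {"status": resp.status, "message": message}})


class ApiGateway:
    """Facade: one public interface in front of several services."""

    def __init__(self):
        self.routes = {"orders": orders_service, "users": users_service}

    def get(self, resource: str, resource_id: int) -> Response:
        service = self.routes.get(resource)
        if service is None:
            return Response(404, {"error": {"status": 404, "message": f"unknown resource {resource}"}})
        return normalize_error(service(resource_id))


print(ApiGateway().get("users", 9))
```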

Backend for frontend

If you have to build one API to satisfy a bunch of different clients, it might be difficult. Decisions for one client will impact the functionality for others.

The backend-for-frontend pattern says: if different clients prefer different APIs (say, mobile apps that prefer GraphQL), just build a dedicated API for each of them.

This works only if your API is a layer of abstraction, and it’s thin. If it’s coupled to your database, or it’s too big, with too much logic, you won’t be able to do this.
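Here is a sketch of two thin backends for frontends in Python, both built on the same hypothetical load_article function. Because the adapters hold no business logic, adding another client means adding another small function, not reworking the layer underneath.

```python
# Shared domain logic: this is where the real complexity belongs.
def load_article(article_id: int) -> dict:
    """Hypothetical data access; a real system would query a database here."""
    return {
        "id": article_id,
        "title": "Why REST?",
        "body": "A long article body...",
        "author": {"id": 3, "name": "Ada", "bio": "Writes about APIs."},
        "comments": [{"id": 1, "text": "Great read"}],
    }


# Thin backend for the web client: the full page needs everything.
def web_article(article_id: int) -> dict:
    return load_article(article_id)


# Thin backend for the mobile client: a compact payload for small screens and slow networks.
def mobile_article(article_id: int) -> dict:
    article = load_article(article_id)
    return {
        "id": article["id"],
        "title": article["title"],
        "author_name": article["author"]["name"],
        "comment_count": len(article["comments"]),
    }


print(mobile_article(5))
```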

GraphQL vs. RESTful

There’s a lot of hype for GraphQL. It’s kind of the new kid on the block, but it has already gathered a lot of fans. So much so, that some developers claim that it will dethrone REST.

Even though GraphQL is much newer compared to the RESTful specification, they share a lot of similarities. The biggest downside of GraphQL is cacheability—it has to be implemented in the client or in the application. There are client-libraries out there that have caching capabilities built-in (like Apollo), but it’s still harder than using the almost-free cacheability provided by HTTP.

Technically, GraphQL is level 0 in terms of the Richardson model, but it has qualities of a good API. You might not be able to use several HTTP functionalities, but GraphQL is built to solve specific problems.

One killer use for GraphQL is aggregating different APIs, and exposing them as one GraphQL API.

GraphQL does wonders with underfetching and overfetching, which can be difficult to manage in REST APIs. Both are related to performance. If you underfetch, each call returns too little, so you have to make a lot of them. When you overfetch, your calls result in a bigger data transfer than necessary, which is a waste of bandwidth.
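Overfetching is easy to see by comparing payload sizes. This Python sketch uses a hypothetical user resource and a select_fields helper that returns only the requested fields, roughly what a GraphQL selection set achieves; it is not GraphQL itself.

```python
import json

USER = {  # hypothetical full resource, as a REST endpoint might return it
    "id": 7,
    "name": "Ada",
    "email": "ada@example.com",
    "bio": "Writes about APIs. " * 20,
    "followers": list(range(100)),
}


def select_fields(resource: dict, fields: list) -> dict:
    """Return only the requested fields, similar in spirit to a GraphQL selection set."""
    return {name: resource[name] for name in fields if name in resource}


full = json.dumps(USER).encode("utf-8")
slim = json.dumps(select_fields(USER, ["id", "name"])).encode("utf-8")
print(len(full), len(slim))  # a client that only needs id and name downloads far less
```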

The comparison of REST vs. GraphQL is a great segue into summarizing the most important qualities of a good API.

You need a clear representation for data—RESTful gives you that in the form of resources.

You need a way to show which operations are available—RESTful does that by combining resources with HTTP verbs.

There needs to be a way to confirm that there’s an error/exception—HTTP status codes do this, possibly with responses that explain them.

It’s nice to have discoverability and the ability to navigate: in RESTful, HATEOAS takes care of that.

It’s important to have great documentation—in this case executable, self-updating docs can take care of that, which goes beyond the RESTful spec.

Last but not least—great APIs should have cacheability, unless your specific case dictates that it’s not necessary.

The biggest difference between REST and GraphQL is the way they handle cacheability. When you build your API the REST way, you get HTTP cacheability essentially for free. If you choose GraphQL, you need to worry about adding a cache to your client or your application.

Step-by-Step: An Agile Product Validation Process Using Impact Mapping

Adam Przewoźny — Tue, 29 Mar 2022 19:23:03 +0000

While working on various IT projects, I realized that despite all the “lean” product development movements, many companies still encounter major issues with product validation. In other words, it’s hard for people to tell whether the product they develop is what the market needs.

To help you with your validation efforts, I’d like to give you an inside look into the day-by-day Agile product validation process for a live product that we have worked out with one of our clients.

What is product validation?

Let’s start by answering the fundamental question: what is product validation?

There are many approaches to what’s called “product validation.” Google the phrase and you will find that there is more than one way to define it. My short explanation would be that product validation is a process aimed at answering the following question: is my product something that people need? When you build a product, it’s good practice to verify its value each time you release a new version.

One of the most famous frameworks in this field was presented in The Lean Startup by Eric Ries (you can find it on our list of must-read books for CTOs). He suggests following these steps:

Build an experiment to test if people need what you are trying to sell them.
Measure the result of your experiment.
Learn your lesson and then iterate by building your next experiment.

What I like about this approach is that Eric reveals that in order to gain value out of the cycle, you should plan it in the reverse order. First you need to know what you want to learn about your users. Next, you should think about how you will measure it. It's only at the end that you should try to figure out how to build the fastest and cheapest experiment to gain that knowledge.

An inside look at our team’s product validation process

I personally think that product validation is one of the greatest challenges in the whole workflow. In the project I’m currently working on, we encountered the same challenge. However, we decided to structure a process that would support our validation. We don’t call our approach the lean startup framework, although ultimately it does include the 3 aforementioned steps: build, measure, learn.

What does our validation process look like?

1. The pre-development phase: “how to measure?”

a) The Product Owner (PO) adds their idea of validation to the user story.

(Not sure what a PO does? Read our article about a Product Owner’s responsibilities.) Every time our PO adds a new user story to the product backlog, they have to fill in the “how to measure?” field. This approach helps them analyze, prior to any further discussion, how to recognize whether the change we’re about to make is something our users need, and whether we have enough tools to validate the user story. This field in our user story template is the “planning” part of our framework.

b) The validation idea is discussed with other people during the weekly meeting.

The meeting should be attended by the Product Owner, UX designer, Scrum Master, and the client’s representative (optionally).

The way we run this session is very easy: the PO presents their idea for validation while the other participants discuss it and suggest improvements where applicable. This way, the PO receives a wide range of opinions and makes more informed decisions.

2. Development phase

No validation work is done during this stage. This is the “build” part of the lean startup framework.

3. Post-development phase: product validation

After the new product increment is deployed to production, our validation process starts. Some of it happens ad-hoc—our data analysts start producing stats. However, the main point of the process is a weekly meeting during which the Product Owner, UX designer, Scrum Master, and, optionally, the client’s representative, discuss what has been released to our “live” environment and what results it's generated. In other words, we validate whether the new product features had a positive, neutral, or negative impact on end users. For this purpose, we use Google Analytics, a heatmap tool, and other data analysis tools.

After we collect the necessary data and validate new features, we make decisions regarding future development plans. We use a Jira board to facilitate the meeting: we take the items in the “To verify” column and try to move them to “Verified,” which means that we have actually validated the feature. These are the “measure” and the “learn” parts of the lean startup framework.

We repeat these practices, and the cycle starts again with every development sprint.

What is an impact map and how do we use it?

Aside from Jira mentioned in the example above, we use one additional tool to visualize product validation status—the impact map.

Impact mapping is a strategic planning technique created by Gojko Adzic to help businesses achieve their goals.

(We were quite excited to see Gojko Adzic himself retweet this article soon after it was published.)

The idea is simple. First, you define a business goal, e.g. “Sell our product to a million clients within 3 months.” Gojko suggests thinking about impact mapping as if it were an exercise in navigating a map from point A to point B. In this case, point A would be where we are right now, and point B is the goal we would like to reach.

If we look at the map, there are usually many ways to reach the same goal. The idea is to hit the road and check if all the roads are passable. It might turn out that some are actually closed, or there may be roads that don’t exist just yet.

Using impact mapping terms, you are supposed to define 4 elements:

1. Goal

Decide what you would like to achieve from the business perspective. Maybe you’d like to reach a certain volume of sales in a given period of time, or maybe you’d prefer to focus on the number of new registered users. Whatever your goal is, try to define it as precisely as possible.

2. Actors

Think about various groups of people who can help you achieve the goal. These can be your segments of users, your employees or anyone else.

3. Impacts

At this stage, you need to think about the actions your actors can take to help you achieve the goal.

4. Deliverables

Try to define precise product features that will help actors make an impact.

If creating a deliverable drives actors to make an impact that supports our main goal, we validate this deliverable positively. Using that information, we then plan our future work. If a deliverable doesn’t create any positive change, we validate it negatively, which is also a significant part of our learning.

An example of what our impact map looks like

The central part of the impact map (dark blue) presents the main goal we'd set as a team. It can be any sort of a business goal, e.g. 100,000 new users registered in our application in the next 3 months. It should be something precise and easy to measure at the end of the specified period.

On the borders of our impact map, we presented the deliverables (yellow, red, black, and green) we would like to offer to the end users in order to support our main goal, e.g. a free gift for everyone who registers within the specified period. In the Agile environment, deliverables would be called user stories or product backlog items.

The colors of the deliverables carry additional meaning:

yellow—for all deliverables that were added to the impact map and are in the pre-development stage,
black—the deliverable is either in the development or post-development stage but prior to the validation stage,
green—the deliverable was validated positively,
red—the deliverable was validated negatively.

In our case, the impact map is updated weekly and overseen by the Scrum Master. However, the Product Owner or anyone else familiar with it can make sure the map stays up to date.

Based on the information we have visualized on the impact map, we make data-driven decisions about the future development of the product.
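Structurally, an impact map is a tree (goal, then actors, then impacts, then deliverables), with each deliverable’s color recording its validation status. The Python sketch below models that tree with hypothetical names and example data and counts deliverables per color, the kind of summary a team might review in its weekly meeting. The team described here works with a visual map; the code only illustrates its structure.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    """Deliverable colors from the map legend."""
    PRE_DEVELOPMENT = "yellow"
    IN_PROGRESS = "black"          # in development, or deployed but not yet validated
    VALIDATED_POSITIVE = "green"
    VALIDATED_NEGATIVE = "red"


@dataclass
class Deliverable:
    name: str
    status: Status = Status.PRE_DEVELOPMENT


@dataclass
class Impact:
    description: str
    deliverables: list = field(default_factory=list)


@dataclass
class Actor:
    name: str
    impacts: list = field(default_factory=list)


@dataclass
class ImpactMap:
    goal: str
    actors: list = field(default_factory=list)

    def deliverables(self):
        """Walk goal -> actors -> impacts -> deliverables."""
        for actor in self.actors:
            for impact in actor.impacts:
                yield from impact.deliverables

    def summary(self) -> dict:
        """Count deliverables per color."""
        counts = {status.value: 0 for status in Status}
        for deliverable in self.deliverables():
            counts[deliverable.status.value] += 1
        return counts


gift = Deliverable("Free gift for early sign-ups", Status.VALIDATED_POSITIVE)
referral = Deliverable("Referral link", Status.IN_PROGRESS)
imap = ImpactMap(
    goal="100,000 new registered users in 3 months",
    actors=[
        Actor("New visitors", [Impact("Sign up", [gift])]),
        Actor("Existing users", [Impact("Invite friends", [referral])]),
    ],
)
print(imap.summary())  # {'yellow': 0, 'black': 1, 'green': 1, 'red': 0}
```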

What are the benefits of impact mapping?

Impact mapping has a number of advantages your software house can greatly benefit from. Among others, impact mapping:

helps maximize customer value and minimize waste by making it easier for teams to focus on the objectives they’re working to reach;
visualizes the process in an accessible way for everyone involved;
increases collaboration within the team and unlocks creativity by facilitating the collective creation and testing of assumptions;
facilitates horizontal working, rather than top-to-bottom;
helps the team align their work with business objectives;
makes it easy for team members to spot challenges immediately and respond to them in a timely manner, as well as adapt to changing circumstances;
keeps the focus on people, their experiences and ideas.

Final thoughts

To recap, here are a few reasons why impact mapping for Agile product validation is definitely worth your time:

Creating an impact map helps the team remain focused on the business goals, i.e. the actions that actually create revenue.
It allows you to see the whole landscape of all possible routes toward your business goal—including faster, easier, cost-effective routes you may have missed.
Impact mapping opens you up to more creative decision-making; you can look at whole sections of the map and see which sections have been effective and which ones you should give more consideration to.
There's always a different section of the map you can focus on.
Thinking with the map helps you stay motivated.

Hopefully this post will help you organize your product validation process within your team. Good luck, and let us know how it goes!

Why Use React Native for Your Mobile App?

Adam Przewoźny — Sun, 27 Mar 2022 22:28:38 +0000

Originally written by Jakub Grajcar and Bartek Klukaczewski

What are your plans to capitalize on the growing mobile market?

According to statista.com, mobile apps were projected to generate $188.9 billion in global revenues via app stores and in-app advertising by 2020. That’s a huge opportunity for business.

It’s high time to start utilizing the mobile market. But users have already seen dozens of apps on their smartphones. What can you show them that’s impressive, works great and seamlessly integrates with their mobile OS?

One solution at your disposal is React Native.

This article will tell you what React Native is and why (or why not) you should consider it for your mobile project.

To dig deeper into the subject of React Native, I spoke with Bartosz Kazuła, one of our JavaScript developers. Before he joined STX Next, Bartosz was already a big fan of React Native, and used it to create features for a text-based browser game. Now, he’s using his expertise to solve the business problems of our clients.

What is React Native?

React Native is a framework that allows you to build native mobile apps using JavaScript. Normally, you’d need to program your mobile app using Java (for Android) and Swift/Obj-C (for iOS). React Native removes that requirement, leading to fully functional apps on both platforms in much less time and using just one coding language.

Is React Native an entirely new invention? Not exactly. The framework was developed only a few years ago, by a social media company you may have heard of…

Facebook is the company behind both ReactJS and React Native. In fact, Facebook first created React to build the social platform we all love to hate. After further development, Facebook released ReactJS for the web as open source.

But Facebook was still struggling with their mobile app. They needed to maintain two codebases: one for iOS, one for Android. Features implemented in Swift on iOS had to be implemented separately in Java on Android, leading to work duplication and asymmetrical apps.

React Native neatly solves that problem.

Coming on the heels of ReactJS, the purpose of React Native was to facilitate the creation of mobile apps. It’s simple: if you can code an app once in JavaScript and deploy it both to Android and iOS, your life gets a lot easier.

If you’ve ever used the official Facebook app on Android or iOS, you’ve seen React Native in action. (Same goes for the mobile Airbnb app.)

How popular is React Native in terms of market share?

React Native is rising in popularity as a convenient solution to build cross-platform mobile apps with less strain on your budget.

How popular is it, exactly?

React Native’s market share is especially telling when you consider the top apps in the App Store and Google Play Store right now. Among the top 500 apps in the US, 14.85% of installed apps are built with React Native.

In fact, in the category of top US apps, React Native is the third most popular framework, right after Kotlin and Android Architecture Components.

Why use React Native?

For a long time, React Native was considered to be commercially unviable. It wasn’t developed or supported enough to produce “native-like” apps.

But the times have changed. React Native is gaining popularity, gaining community support, and gaining more market share. It’s getting easier and easier to write brilliant apps using React Native—and the world is taking notice.

Tried and trusted

Facebook built React Native first and foremost to create a fantastic mobile app for their own social portal. More likely than not, you’ve used it on your phone by way of the Facebook mobile app. Does it feel like a native app? Sure it does.

But since React Native has gone open-source, more companies have decided to bet on it and create their mobile apps this way. Here are just a few examples.

Popular React Native apps

Facebook
Instagram
Skype
Tesla
Walmart
Discord
Bloomberg

Putting your app in such company isn’t the worst idea in the world.

One codebase for two platforms

With React Native, you create one codebase that works on both Android and iOS. And it doesn’t just “work”: your JavaScript drives genuinely native UI components rather than a web view. Specifically, React Native creates a bridge between your JavaScript components and their native Java/Swift counterparts.

Think about the implications for your software project. No need for two development teams for two platforms. No need to synchronize features and layouts. You simply develop faster and can get more out of your budget.

Use the language you already know

You need less specialized knowledge to create React Native apps. Chances are you already have someone on your team who can program in JavaScript, possibly even in ReactJS specifically. JS developers are simply easier to find than developers with skills in Java or Swift (see the Stack Overflow 2017 Developer Survey).
Even if your frontend developer has never used React Native, with some self-education they can get up to speed very quickly, especially if they’ve used React already. All you need is a little time googling to find out which web components correspond with which mobile components—and you’re set.

Growing all the time

React Native is under active development. Both Facebook and the massive community around React Native are constantly working on improvements for the framework. If you can’t solve a problem right now because React Native doesn’t have the solution, in a few months the situation might be different.

For example, Bartosz was recently faced with the task of implementing a “speedometer” component for one of his projects. His initial idea was to create such a component by himself. But first, he decided to check if someone from the community had the same need in the past. Lo and behold, he found an open license speedometer ready to use, saving him time.

Even Microsoft took notice of React Native and created its own version: React Native for Windows. It allows developers to more easily create apps for Windows 10, Windows 10 Mobile, and Xbox One.

Save even more time with a web app

If you have a ReactJS web app, fragments of the frontend code (such as business logic) can be shared between mobile and web, facilitating development even more.

What’s an example of a React Native app?

Bloomberg, the business and financial news provider, decided to use React Native to create its new consumer app after initial testing and prototyping.

Previously, Bloomberg engineers had to develop iOS and Android versions separately, without being able to share the code they created. Switching to React Native saved them time as they could unify their development capabilities and each developer could focus on one feature at a time. As a result, the team managed to create the new app in five months—about half the usual development time.

Using React Native also allowed the company to add a number of new, interactive features to the app, such as the ability to swipe a headline to share or bookmark an article. Users can also access live TV and on-demand event feeds.

The app allows users to customize it extensively, including personalizing content according to their interests and location, and monitoring their own personal portfolios.

React Native also sped up day-to-day development: when engineers change the code, the app reloads instantly instead of having to be recompiled.

According to a senior software engineer at Bloomberg who worked on the transition, “React Native is the best out there,” and the company would be using it again in other mobile apps.

What should you watch out for when using React Native?

As with any solution, React Native has its drawbacks. What should you be wary of?

React Native is not 100% native...

Despite what Facebook might try to tell you, React Native apps are not equivalent to true native apps written entirely in Swift/Obj-C or Java.

Let’s take the native Gmail app as an example. The Gmail app includes a worker that checks your inbox. Your Gmail account is synced with the whole mobile system, not just the app. Gmail also knows that if you’re on mobile data, not Wi-Fi, it should sync less often.

With React Native you’re not integrated so closely with the system—at least not out of the box. Your account will be stored in the app, not the system. It’s also harder to take into account WiFi versus mobile data. For some apps, this will not be a problem; for others, it might be game-changing.

...but you can make it as native as you need

When you choose to use React Native, you don’t have to use it exclusively. You can still use React Native for most of the app, and native code for the crucial pieces that need to integrate 100% with the native system.

One of our prospective clients recently argued against React Native, saying it’s still not the same as a true native app. Our counterargument was simple: it doesn’t have to be the same. With a little additional work, you can make it “as native as you need” by adding some Java/Swift code for specific cases and features.

If you use React Native, you have to use React

This might be obvious, but it still needs to be said as a disclaimer. When you opt for React Native, you will have to use ReactJS to create your app. React has competitors for a reason; it’s not always the best choice.

Fortunately, there are some competing solutions that fulfill a similar purpose to React Native without locking you down. One such competitor is Ionic. Their community isn’t as active, nor is it growing so quickly. But Ionic has one advantage: you can use different JS frameworks with it, including Angular, Vue.js and even React if you need some parts of it.

Badly written JavaScript or heavy calculations could hurt your performance

(Thanks to Adrian Warkocz for contributing a comment on Facebook that prompted this section.)

Using JavaScript in itself is a solution that has its drawbacks. For example, if the JavaScript code in your React Native app is poorly written, you will feel the difference more strongly than if your app was a pure native one.

However, this is a slight drawback that you won't feel if you have skilled React Native developers on your side.

Tim Mensch described the tradeoff very well in one of his Quora answers:

“Badly written JavaScript is slightly more likely to be slow than badly written native code. So if your developers are borderline, then you might be slightly better off spending nearly twice as much to develop two native apps. I prefer to hire strong developers, though.”

—Tim Mensch on Quora

JavaScript may also slow down performance if your app is very calculation-heavy. This is not a very common case for most mobile apps, but still a scenario you should consider.

Is React Native good for your project?

You’ve now discovered some of the benefits of React Native, but you’re still not sure if you should use it in your project? Here’s what it can help you achieve:

Save time and money

If you need to develop an app for both iOS and Android, React Native is the best tool out there. It lets you share about 95% of your codebase between the two platforms, saving you time and money. On top of that, React Native has a number of open-source libraries of pre-built components which can help you further speed up the development process.

Create great mobile apps

React Native is great for mobile apps. It provides a slick, smooth and responsive user interface, while significantly reducing load time. It’s also much faster and cheaper to build apps in React Native as opposed to building native ones, without the need to compromise on quality and functionality.

Use existing skills

Since React Native is based on JavaScript, your developers will not need long to get to grips with it. What’s more, it’s an open-source, community-driven framework, so if they ever need support, it’s widely available online.

Add third-party plugins

React Native also allows you to easily incorporate third-party plugins and APIs, including maps and payment systems.

Final thoughts

So what should you know about React Native? Here are the key takeaways:

If you’re using the Facebook or Instagram mobile app, you’re using React Native without even knowing it.
React Native apps are easy to write, saving time for developers and cutting costs for project managers.
React Native apps lower your development and maintenance costs, because you don’t have to deal with two separate codebases for iOS and Android.
Since React Native is just a wrapper for native components, there’s nothing stopping you from adding native Java or Swift code where you need it.
At the end of the day, you’re still coding in JavaScript. There’s no need to learn Swift/Java, or to add developers with such skills to your roster.
React Native is growing fast with no signs of stopping.

These are just some of the many benefits React Native can deliver.


Sources

https://www.statista.com/statistics/269025/worldwide-mobile-app-revenue-forecast/
https://facebook.github.io/react-native/
http://facebook.github.io/react-native/showcase.html
https://medium.com/react-native-development/a-brief-history-of-react-native-aae11f4ca39
https://github.com/Microsoft/react-native-windows
https://www.quora.com/Will-an-app-written-entirely-in-React-Native-be-much-slower-than-a-Native-App-What-are-the-differences

FastAPI vs. Flask: Comparing the Pros and Cons of Top Microframeworks for Building a REST API in Python

Adam Przewoźny — Sat, 26 Mar 2022 19:56:06 +0000

Originally written by Adam Stempniak and Daniel Różycki

Creating web applications such as REST APIs is the bread and butter of backend developers. Therefore, working with a web framework should be quick and easy.

Microframeworks such as Flask and FastAPI are a great start for small projects and MVPs, and even for large systems that need a REST API.

I wrote the same application, which creates, updates, retrieves, and deletes news items, in both of these frameworks. Here’s how FastAPI and Flask compare.

What is Flask? Why use it?

Flask is one of the most popular libraries for building web applications in Python. People who start their adventure with programming will easily find a lot of Flask tutorials and solutions to common problems.

It is lightweight (a “microframework”) and very well documented, with many extensions and a large community.

What is FastAPI? Why use it?

FastAPI ranks among the highest-performing Python web frameworks for building APIs out there and it’s being used more and more day by day.

Its emphasis on speed, not only in terms of the number of queries handled per second, but also the speed of development and its built-in data validation, makes it an ideal candidate for the backend side of our web application.

Data validation

Here’s where we can find the first significant difference between the two libraries.

Flask doesn’t come with any data validation tool. However, we can work around that by using extensions offered by the community, such as Flask-Marshmallow or Flask-Inputs.

The downside of this solution is that we have to rely on libraries that are developed separately from our main framework, meaning we can’t be 100% sure they will be compatible.

FastAPI, on the other hand, comes with the Pydantic library, which makes data validation much simpler and faster than writing it by hand. FastAPI is built on top of Pydantic, so we can be sure the two will stay compatible.

So, what does validation look like in each library, based on our simple API?

We create classes named NewsSchema / CreatorSchema that will be the base classes for validating our news and authors.

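# Flask
from dataclasses import dataclass, field

# BaseSchema is the project's own base dataclass (not shown); it provides helpers such as as_dict()


# CreatorSchema has to be defined first, because NewsSchema refers to it
@dataclass
class CreatorSchema(BaseSchema):
    first_name: str = ""
    last_name: str = ""


@dataclass
class NewsSchema(BaseSchema):
    title: str = ""
    content: str = ""
    # default_factory builds a fresh CreatorSchema per instance; a plain instance
    # default is rejected by dataclasses (Python 3.11+) as a mutable default
    creator: CreatorSchema = field(default_factory=CreatorSchema)


# FastAPI
from pydantic import BaseModel


class CreatorSchema(BaseModel):
    first_name: str = ""
    last_name: str = ""


class NewsSchema(BaseModel):
    title: str = ""
    content: str = ""
    creator: CreatorSchema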

Notice that FastAPI’s NewsSchema / CreatorSchema use BaseModel as a parent class. This is required because BaseModel comes from the Pydantic library and provides the functionality needed for data validation.

In Flask, however, we inherit from the BaseSchema class, which is a regular data class and contains several methods the inheriting classes will use or override.

In our case, we will only check whether the text we enter is within the character limit.
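Because every rule in this example is a length range, it’s worth getting that comparison right. Python chains comparisons, so MIN > len(x) < MAX means MIN > len(x) and len(x) < MAX, which only catches values that are too short. The minimal sketch below (the limits are arbitrary example values, not the article’s constants) contrasts that pattern with an inclusive range check, which matches how Pydantic’s min_length and max_length behave:

```python
# Example limits, chosen only for illustration
MIN_TITLE_LEN = 5
MAX_TITLE_LEN = 20


def chained_check_flags(title: str) -> bool:
    """Pitfall: this reads as 'MIN > len(title) and len(title) < MAX'."""
    return MIN_TITLE_LEN > len(title) < MAX_TITLE_LEN


def is_valid_length(text: str, min_len: int, max_len: int) -> bool:
    """Return True when min_len <= len(text) <= max_len (both limits inclusive)."""
    return min_len <= len(text) <= max_len


too_long = "x" * 50
print(chained_check_flags(too_long))  # False: the overlong title is NOT flagged
print(is_valid_length(too_long, MIN_TITLE_LEN, MAX_TITLE_LEN))  # False: correctly rejected
print(is_valid_length("Hello there", MIN_TITLE_LEN, MAX_TITLE_LEN))  # True
```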

The validation itself will take place in the NewsSchemaInput / CreatorSchemaInput classes:

# Flask
from dataclasses import dataclass, field

# The length constants (MIN_TITLE_LEN, MAX_TITLE_LEN, MIN_CONTENT_LEN, ...) and the custom
# ValidationError(message, errors) exception are defined elsewhere in the project.


@dataclass
class NewsSchemaInput(NewsSchema):
    _errors: dict = field(init=False, default_factory=dict)

    def _validate_title(self) -> None:
        # Inclusive range check; a chained "MIN > len < MAX" would let overlong titles through
        if not MIN_TITLE_LEN <= len(self.title) <= MAX_TITLE_LEN:
            self._errors[
                "title"
            ] = f"Title should be {MIN_TITLE_LEN}-{MAX_TITLE_LEN} characters long"

    def _validate_content(self) -> None:
        if len(self.content) < MIN_CONTENT_LEN:
            self._errors[
                "content"
            ] = f"Content should be minimum {MIN_CONTENT_LEN} characters long"

    def __post_init__(self) -> None:
        self._validate_content()
        self._validate_title()
        try:
            # The nested creator arrives as a dict, so build (and validate) it explicitly
            if not isinstance(self.creator, CreatorSchemaInput):
                self.creator = CreatorSchemaInput(**self.creator)
        except ValidationError as err:
            self._errors["creator"] = err.errors
        if self._errors:
            raise ValidationError(
                f"Validation failed on {type(self).__name__}", self._errors
            )


# Flask
@dataclass
class CreatorSchemaInput(CreatorSchema):
    _errors: dict = field(init=False, default_factory=dict)

    def _validate_first_name(self) -> None:
        if not FIRST_NAME_MIN_LEN <= len(self.first_name) <= FIRST_NAME_MAX_LEN:
            self._errors[
                "first_name"
            ] = f"First name should be {FIRST_NAME_MIN_LEN}-{FIRST_NAME_MAX_LEN} characters long"

    def _validate_last_name(self) -> None:
        if not LAST_NAME_MIN_LEN <= len(self.last_name) <= LAST_NAME_MAX_LEN:
            self._errors[
                "last_name"
            ] = f"Last name should be {LAST_NAME_MIN_LEN}-{LAST_NAME_MAX_LEN} characters long"

    def __post_init__(self) -> None:
        self._validate_first_name()
        self._validate_last_name()
        if self._errors:
            raise ValidationError(
                f"Validation failed on {type(self).__name__}", self._errors
            )

When we create a NewsSchemaInput or CreatorSchemaInput object, its __post_init__ method runs and performs the data validation (checking the text length). For every invalid field, we add an error to the _errors dictionary, and if any errors were found, we raise a ValidationError exception.

In the case of structures that are nested (CreatorSchemaInput), we have to create these objects manually. We do it after the NewsSchemaInput validation is done in the __post_init__ method.

The data checking itself is not a big problem—only adding new fields will be cumbersome, because we have to add a separate _validate method each time. In the case of a nested structure, we have to create an instance of this object and catch an exception.
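One way to avoid writing a separate _validate method for every new field is to store the limits in each field’s metadata and check all fields in a single __post_init__. This is an alternative sketch, not the article’s code: LengthCheckedSchema, length_field, FieldValidationError, and the 2–30 limits are hypothetical, and nested structures are not covered.

```python
from dataclasses import dataclass, field, fields
from typing import Any


class FieldValidationError(Exception):
    """Hypothetical exception carrying a dict of per-field error messages."""

    def __init__(self, errors: dict) -> None:
        super().__init__(f"Validation failed: {errors}")
        self.errors = errors


def length_field(min_len: int, max_len: int, default: str = "") -> Any:
    """Declare a str field whose inclusive length limits live in its metadata."""
    return field(default=default, metadata={"min_len": min_len, "max_len": max_len})


@dataclass
class LengthCheckedSchema:
    def __post_init__(self) -> None:
        errors = {}
        for f in fields(self):
            if "min_len" not in f.metadata:
                continue  # this field has no length rule
            value = getattr(self, f.name)
            lo, hi = f.metadata["min_len"], f.metadata["max_len"]
            if not lo <= len(value) <= hi:
                errors[f.name] = f"{f.name} should be {lo}-{hi} characters long"
        if errors:
            raise FieldValidationError(errors)


# Adding a validated field now takes one line instead of one method
@dataclass
class CreatorInput(LengthCheckedSchema):
    first_name: str = length_field(2, 30)
    last_name: str = length_field(2, 30)


try:
    CreatorInput(first_name="J", last_name="Doe")
except FieldValidationError as err:
    print(err.errors)  # {'first_name': 'first_name should be 2-30 characters long'}
```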

We can see that the classes that validate the incoming data become quite extensive—and that’s just for a few keys. We also need to add our own implementation of error handling, so that we can add nested error information in the API responses.

In FastAPI, it is much simpler and more enjoyable:

# FastAPI
from pydantic import Field


# CreatorSchemaInput comes first, because NewsSchemaInput refers to it
class CreatorSchemaInput(CreatorSchema):
    first_name: str = Field(
        title="First name of the creator",
        min_length=FIRST_NAME_MIN_LEN,
        max_length=FIRST_NAME_MAX_LEN,
        example="John",
    )
    last_name: str = Field(
        title="Last name of the creator",
        min_length=LAST_NAME_MIN_LEN,
        max_length=LAST_NAME_MAX_LEN,
        example="Doe",
    )


class NewsSchemaInput(NewsSchema):
    title: str = Field(
        title="Title of the News",
        max_length=MAX_TITLE_LEN,
        min_length=MIN_TITLE_LEN,
        example="Clickbait title",
    )
    content: str = Field(
        title="Content of the News",
        min_length=MIN_CONTENT_LEN,
        example="Lorem ipsum...",
    )
    creator: CreatorSchemaInput

By importing Field from Pydantic, we have access to simple rules that must be followed for user input to be valid. Data types are also validated on the basis of variable types, so if our first_name variable has the str type, we must pass text in the input (and act similarly for all built-in data types).

Without any extra code, Pydantic does a great job checking nested structures (CreatorSchemaInput in this case).
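To see roughly what Pydantic automates here, the toy sketch below uses only the standard library to check each value against its type hint and to recurse into nested dataclasses, collecting errors under nested keys. It is a simplification for illustration: real Pydantic also coerces compatible values, supports far more types, and reports errors in its own format. The names check_types, Creator, and News are hypothetical.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import get_type_hints


@dataclass
class Creator:
    first_name: str
    last_name: str


@dataclass
class News:
    title: str
    creator: Creator


def check_types(cls: type, data: dict) -> dict:
    """Compare `data` against the type hints of dataclass `cls`.

    Returns a (possibly nested) dict of error messages; an empty dict means valid.
    """
    errors: dict = {}
    hints = get_type_hints(cls)
    for f in fields(cls):
        expected = hints[f.name]
        if f.name not in data:
            errors[f.name] = "field required"
            continue
        value = data[f.name]
        if is_dataclass(expected):
            # Recurse into the nested structure and keep its errors under this key
            if not isinstance(value, dict):
                errors[f.name] = "expected an object"
            else:
                nested_errors = check_types(expected, value)
                if nested_errors:
                    errors[f.name] = nested_errors
        elif not isinstance(value, expected):
            errors[f.name] = f"expected {expected.__name__}, got {type(value).__name__}"
    return errors


payload = {"title": "Hi", "creator": {"first_name": "John", "last_name": 42}}
print(check_types(News, payload))  # {'creator': {'last_name': 'expected str, got int'}}
```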

All of this takes no more than a few lines of code!

In addition to max_length and min_length, we can also see two additional parameters: title and example. They are optional, but will be visible in the automatic documentation generated by FastAPI for us.

Outbound data serialization

Now that we know how to validate the data, we should think about how we want to return it.

A news item has not only content, a title, and an author, but also a unique number (id) and the dates it was created and last updated. To serialize the News domain model, we need a new class: NewsSchemaOutput.

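# Flask
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class NewsSchemaOutput(NewsSchema):
    id: int = 0
    # default_factory calls datetime.now for each instance; a plain datetime.now()
    # default would be evaluated only once, when the class is defined
    created_at: datetime = field(default_factory=datetime.now)
    updated_at: datetime = field(default_factory=datetime.now)

    def as_dict(self) -> dict:
        # The parent's as_dict() comes from the project's BaseSchema
        schema_as_dict = super().as_dict()
        schema_as_dict["created_at"] = int(self.created_at.timestamp())
        schema_as_dict["updated_at"] = int(self.updated_at.timestamp())
        return schema_as_dict


# FastAPI
from datetime import datetime

from pydantic import Field


class NewsSchemaOutput(NewsSchema):
    id: int = Field(example=26)
    created_at: datetime = Field(example=1614198897)
    updated_at: datetime = Field(example=1614198897)

    class Config:
        # Pydantic v1-style config: serialize datetimes as Unix timestamps
        json_encoders = {datetime: lambda dt: int(dt.timestamp())}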

The NewsSchemaOutput class is practically the same in both cases, the only difference being the parent class and the method of serialization to the dictionary (together with converting the datetime objects into timestamps).

In FastAPI, while using Pydantic, we have the option of adding a Config class, in which we have placed the json_encoders variable. It lets us serialize the data the way we need; in this case, we want to pass the date object as a timestamp. In Flask, however, we had to convert the values in the already created dictionary ourselves.
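Outside Pydantic, the same “datetime as Unix timestamp” rule can be expressed with the standard library’s json module by overriding JSONEncoder.default, which json.dumps calls for any object it can’t serialize on its own. A minimal sketch; TimestampEncoder is a hypothetical name, and the sample date corresponds to the example timestamp used in the FastAPI schema:

```python
import json
from datetime import datetime, timezone


class TimestampEncoder(json.JSONEncoder):
    """Serialize datetime objects as integer Unix timestamps."""

    def default(self, obj):
        if isinstance(obj, datetime):
            return int(obj.timestamp())
        # Defer to the base class, which raises TypeError for unsupported types
        return super().default(obj)


news = {
    "id": 26,
    "title": "Clickbait title",
    # A timezone-aware datetime makes the timestamp independent of the local timezone
    "created_at": datetime(2021, 2, 24, 20, 34, 57, tzinfo=timezone.utc),
}
print(json.dumps(news, cls=TimestampEncoder))
# {"id": 26, "title": "Clickbait title", "created_at": 1614198897}
```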

Creating views and defining data

Defining views in both libraries is very similar and uses a simple decorator on the view function. However, the ways of defining data validation and serialization differ.

 # Flask
   @news_router.route("/news", methods=["POST"])
   def add_news():
      db_repo = get_database_repo()
      news_schema = NewsSchemaInput(**request.get_json())
      news_dto = NewsDTO.from_news_schema(news_schema=news_schema)
      saved_news = db_repo.save_news(news_dto=news_dto)
      output_schema = NewsSchemaOutput.from_entity(news=saved_news).as_dict()
      return output_schema, HTTPStatus.CREATED

 # FastAPI
   @news_router.post(
      "/news",
      response_model=NewsSchemaOutput,
      summary="Create the news",
      status_code=status.HTTP_201_CREATED,
   )
   async def add_news(
      news_input: NewsSchemaInput,
      db_repo: DatabaseRepository = Depends(get_database_repo),
   ):
      """
      Create the news with following information:

      - **title**: Title of news
      - **content**: News content
      - **creator**: Creator of content
      """
      news_dto = NewsDTO.from_news_schema(news_schema=news_input)
      db_news = await db_repo.save_news(news_dto=news_dto)
      return db_news.as_dict()

At the very beginning, we have a decorator that specifies the path and the HTTP method to handle. Flask sets the method using the methods parameter, where we pass a list of supported methods, while FastAPI calls the post method on news_router.

The decorator FastAPI uses does more than set the HTTP path and method: it also serializes the data (response_model), describes the view in the automatic documentation (summary), defines the response status (status_code), and much more. Not all of its options are used in this example.
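To make that idea concrete, here is a toy, framework-free sketch of a decorator that, like FastAPI’s, both registers a view for a method and path and controls how the return value is serialized and which status code is sent. Router, its post method, and the serializer argument are hypothetical stand-ins, far simpler than FastAPI’s real APIRouter.

```python
from http import HTTPStatus
from typing import Callable, Dict, Optional, Tuple


class Router:
    """Toy router: maps (HTTP method, path) to wrapped view functions."""

    def __init__(self) -> None:
        self.routes: Dict[Tuple[str, str], Callable] = {}

    def post(
        self,
        path: str,
        status_code: int = HTTPStatus.OK,
        serializer: Optional[Callable] = None,
    ) -> Callable:
        def decorator(view: Callable) -> Callable:
            def wrapped(*args, **kwargs):
                result = view(*args, **kwargs)
                # Shape the return value, loosely like response_model does
                body = serializer(result) if serializer else result
                return body, status_code

            self.routes[("POST", path)] = wrapped
            return view  # the undecorated function stays directly callable

        return decorator


router = Router()


def public_fields(news: dict) -> dict:
    # Only expose the fields our "response model" allows
    return {"id": news["id"], "title": news["title"]}


@router.post("/news", status_code=HTTPStatus.CREATED, serializer=public_fields)
def add_news(payload: dict) -> dict:
    return {"id": 1, **payload}


body, status = router.routes[("POST", "/news")]({"title": "Hello", "internal_note": "x"})
print(body, int(status))  # {'id': 1, 'title': 'Hello'} 201
```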

It can be said that FastAPI not only defines the access path and method, but also describes the whole view in depth. But what’s really going on in this view? Let’s start with Flask!

The first thing we do is get the database repository for our function with: db_repo = get_database_repo()

In the next step, we validate the data submitted by the user, which are in the request object:

   news_schema = NewsSchemaInput(**request.get_json())

This line will raise a ValidationError exception if the input is invalid.

The exception will be caught in the errorhandler we created and Flask will return a reply with all errors that are in the _errors variable on NewsSchemaInput.

But hold on just a second! We haven’t yet discussed the errorhandler we supposedly created.

In both Flask and FastAPI, we can register our own handlers for exceptions raised in the views. They look like this:

 # Flask
   @app.errorhandler(ValidationError)
   def handle_validation_error(exc: ValidationError) -> Tuple[dict, int]:
      status_code = HTTPStatus.UNPROCESSABLE_ENTITY
      return {"detail": exc.errors}, status_code

# FastAPI
   @app.exception_handler(ValidationError)
   async def handle_validation_error(request: Request, exc: ValidationError):
      return JSONResponse(
         status_code=status.HTTP_422_UNPROCESSABLE_ENTITY,
         content={"detail": exc.errors()},
      )
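Both frameworks keep a mapping from exception classes to handler functions and walk the raised exception’s method resolution order (MRO), so the most specific registered handler wins and a handler for a base class also covers its subclasses. A toy sketch of that lookup; ExceptionHandlers and the ValidationError stand-in are hypothetical, and the real implementations also deal with HTTP errors, blueprints, and more:

```python
from typing import Callable, Dict, Type


class ExceptionHandlers:
    """Toy registry mapping exception classes to handler functions."""

    def __init__(self) -> None:
        self.handlers: Dict[Type[BaseException], Callable] = {}

    def register(self, exc_type: Type[BaseException]) -> Callable:
        def decorator(handler: Callable) -> Callable:
            self.handlers[exc_type] = handler
            return handler

        return decorator

    def handle(self, exc: BaseException):
        # Walk the MRO: the most specific registered class wins
        for cls in type(exc).__mro__:
            if cls in self.handlers:
                return self.handlers[cls](exc)
        raise exc  # no handler registered: propagate


handlers = ExceptionHandlers()


class ValidationError(Exception):
    """Stand-in for the project's custom exception with an `errors` dict."""

    def __init__(self, errors: dict) -> None:
        super().__init__("Validation failed")
        self.errors = errors


@handlers.register(ValidationError)
def handle_validation_error(exc: ValidationError):
    return {"detail": exc.errors}, 422


@handlers.register(Exception)
def handle_unexpected_error(exc: Exception):
    return {"detail": "Internal server error"}, 500


print(handlers.handle(ValidationError({"title": "Too short"})))  # ({'detail': {'title': 'Too short'}}, 422)
print(handlers.handle(KeyError("news_id")))  # ({'detail': 'Internal server error'}, 500)
```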

If the validation succeeds, we create a NewsDTO object that passes the necessary information to the database repository. The repository does its magic (saves the news item in the database) and returns the News domain object to us, which we then serialize with the NewsSchemaOutput class:

   news_dto = NewsDTO.from_news_schema(news_schema=news_schema)
   saved_news = db_repo.save_news(news_dto=news_dto)
   output_schema = NewsSchemaOutput.from_entity(news=saved_news).as_dict()

At the very end, we return NewsSchemaOutput as a dictionary, together with the response status:

   return output_schema, HTTPStatus.CREATED

Now, let’s take a look at FastAPI. This time, the view receives two parameters: news_input and db_repo.

In the first one, the input data validation happens before the execution of our view method, thanks to the news_input parameter.

You might be asking yourself: how does FastAPI know which class to use? It’s thanks to typing. The news_input parameter is annotated with the NewsSchemaInput type, so FastAPI passes the data we sent in the body of the POST request to this class. We don’t need to create a NewsSchemaInput instance ourselves, because the news_input parameter already contains validated data.
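The toy sketch below illustrates the idea with the standard library alone: inspect the view’s parameters, find the one annotated with a dataclass, and build it from the JSON request body before calling the view. FastAPI’s real logic is much more involved (it distinguishes path, query, and body parameters and validates through Pydantic); call_with_body and NewsInput are hypothetical names.

```python
import inspect
import json
from dataclasses import dataclass, is_dataclass
from typing import Callable, get_type_hints


@dataclass
class NewsInput:
    title: str
    content: str


def call_with_body(view: Callable, raw_body: str):
    """Parse the JSON body and pass it to the view parameter annotated with a dataclass."""
    payload = json.loads(raw_body)
    hints = get_type_hints(view)
    kwargs = {}
    for name in inspect.signature(view).parameters:
        annotation = hints.get(name)
        if annotation is not None and is_dataclass(annotation):
            # The annotation alone decides which class receives the body
            kwargs[name] = annotation(**payload)
    return view(**kwargs)


def add_news(news_input: NewsInput) -> str:
    return f"Saved: {news_input.title}"


print(call_with_body(add_news, '{"title": "Hello", "content": "Lorem ipsum"}'))  # Saved: Hello
```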

Regarding db_repo, it works similarly to Flask, except that here we’re using dependency injection. Depends declares a dependency that FastAPI resolves for us when the view is called, and which can be swapped for a different class or function, e.g. in tests. We’ll talk about dependency injection a bit later.

   async def add_news(
      news_input: NewsSchemaInput,
      db_repo: DatabaseRepository = Depends(get_database_repo),
   ):

Inside the view, we build the NewsDTO just like in Flask and save the news item in the database:

   db_news = await db_repo.save_news(news_dto=news_dto)

In Flask, we had to create an instance of the NewsSchemaOutput class to return the correct data. Same with the response status: it’s also returned using the return keyword.

FastAPI allows you to specify a class to serialize data using the response_model parameter in the decorator. All we need to do is to provide the correct structure that Pydantic will understand. The response status can also be set in the same place as response_model, using the status_code parameter.

Fetching news, path variables, and GET parameters

Just as when we create a news item, we define the view with a simple decorator. This time, however, we use the GET method.

   # Flask
   @news_router.route("/news/<int:news_id>", methods=["GET"])
   def get_news(news_id: int):
      db_repo = get_database_repo()
      news_from_db = db_repo.get_news(news_id=news_id)
      output_schema = NewsSchemaOutput.from_entity(news=news_from_db).as_dict()
      return output_schema

   # FastAPI
   @router.get(
      "/news/{news_id}",
      response_model=NewsSchemaOutput,
      summary="Get the news by ID",
      responses=NOT_FOUND_FOR_ID,
   )
   async def get_news(
      news_id: int, db_repo: DatabaseRepository = Depends(get_database_repo)
   ):
      """
      Get the news with passed ID
      """
      db_news = await db_repo.get_news(news_id=news_id)
      return db_news.as_dict()

To fetch the news item we’re interested in, we need to pass its id to our view. We do this with an address that includes the news_id parameter. In Flask, we specify its type using angle brackets with a converter and the name, i.e. <int:news_id>. Out of the box, Flask only understands a handful of converters, such as string, int, float, path, and uuid (custom ones can be registered, but that takes extra work).

FastAPI uses a convention similar to f-strings: the name of our variable goes in curly brackets, and its type is set in the parameters of the view function.

This is a more flexible solution, as the type is declared with ordinary Python type hints. You may also have noticed a new parameter that has appeared in the view decorator. This parameter is called responses, and we’ll come back to it when we discuss automatic documentation.
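A toy sketch of the FastAPI-style convention: turn {name} placeholders into a regular expression, then convert each captured string using the view function’s type hints. It is illustrative only; match_path and get_news here are hypothetical, FastAPI’s router works differently internally, and the sketch assumes the template’s literal parts contain no regex metacharacters.

```python
import re
from typing import Callable, Optional, get_type_hints


def match_path(template: str, path: str, view: Callable) -> Optional[dict]:
    """Return converted path parameters if `path` matches `template`, else None."""
    # "/news/{news_id}" -> "^/news/(?P<news_id>[^/]+)$"
    pattern = "^" + re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template) + "$"
    match = re.match(pattern, path)
    if match is None:
        return None
    hints = get_type_hints(view)
    params = {}
    for name, raw in match.groupdict().items():
        converter = hints.get(name, str)  # fall back to str when there is no hint
        params[name] = converter(raw)  # int("26") -> 26; int("abc") raises ValueError
    return params


def get_news(news_id: int) -> None:
    ...


print(match_path("/news/{news_id}", "/news/26", get_news))  # {'news_id': 26}
print(match_path("/news/{news_id}", "/users/26", get_news))  # None
```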

Filtering messages with GET parameters

When we want a more flexible solution, instead of creating a view that needs variables defined in the address, we use GET parameters. In this case, we need to return the news items that meet the criteria passed to us in the so-called query parameters. We have two parameters: id and created_at.

   # Flask
   @news_router.route("/news", methods=["GET"])
   def get_news_by_filter():
      db_repo = get_database_repo()
      ids = request.args.getlist("id", type=int)
      created_at = request.args.getlist("created_at", type=int)
      news_from_db = db_repo.get_news_by_filter(id=ids, created_at=created_at)
      return jsonify(
         [NewsSchemaOutput.from_entity(news=news).as_dict() for news in news_from_db]
      )

   # FastAPI
   @router.get(
      "/news",
      response_model=List[NewsSchemaOutput],
      summary="Get the news by filter",
      responses=NOT_FOUND_FOR_ID,
   )
   async def get_news_by_filter(
      id: Set[int] = Query(set()),
      created_at: Set[datetime] = Query(set()),
      db_repo: DatabaseRepository = Depends(get_database_repo),
   ):
      """
      Get the news with passed filters.

      - **id**: List of id to search for
      - **created_at**: List of date of creation timestamps
      """
      db_news = await db_repo.get_news_by_filter(id=id, created_at=created_at)
      return [news.as_dict() for news in db_news]

In Flask, the request object gives our view access to all the data sent with the request, including the query parameters.

This time, we’re interested in the id and created_at parameters. We also know that each of them can appear several times, so we use the getlist method of the special args dictionary (a MultiDict).

   ids = request.args.getlist("id", type=int)
   created_at = request.args.getlist("created_at", type=int)
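A query string can repeat a key (?id=1&id=2), which is why getlist exists. The standard library’s urllib.parse.parse_qs offers a similar multi-valued view; the sketch below also mimics the way Werkzeug’s getlist(..., type=int) drops values whose conversion raises ValueError. get_int_list is a hypothetical helper, not part of Flask.

```python
from typing import List
from urllib.parse import parse_qs


def get_int_list(query_string: str, key: str) -> List[int]:
    """Collect every value of `key` as an int, skipping values that aren't integers."""
    values = parse_qs(query_string).get(key, [])
    result = []
    for value in values:
        try:
            result.append(int(value))
        except ValueError:
            continue  # mirror getlist(type=int): drop unconvertible values
    return result


query = "id=1&id=2&id=oops&created_at=1614198897"
print(get_int_list(query, "id"))  # [1, 2]
print(get_int_list(query, "created_at"))  # [1614198897]
print(get_int_list(query, "missing"))  # []
```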

Then we send the extracted data to the database repository to get a list of News domain models, which we turn into a list of dictionaries from the NewsSchemaOutput class.

   news_from_db = db_repo.get_news_by_filter(id=ids, created_at=created_at)
   [NewsSchemaOutput.from_entity(news=news).as_dict() for news in news_from_db]

We must also remember that, in Flask versions before 2.2, a view can’t return a list directly. We need to call the jsonify function so that our endpoint returns a Response object with the list correctly serialized.

   return jsonify(
      [NewsSchemaOutput.from_entity(news=news).as_dict() for news in news_from_db]
   )

With FastAPI, the whole process looks quite similar to Flask. The difference is that we get the query parameters as function parameters, which is much more readable than calling request.args.getlist for each variable we need. For FastAPI to treat collection-typed parameters such as Set[int] as query parameters (rather than as part of the request body), we need to give them a Query(...) default.

How does FastAPI know that we want a specific data type if we haven’t specified it in curly brackets? From the type hints.

All we need to do is to add a type to our parameters, e.g. Set[int], and we can be sure that the variable will contain a set of integers only.
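To illustrate how a type hint such as Set[int] can drive conversion, here is a toy sketch using typing.get_origin and typing.get_args (Python 3.8+). It is not FastAPI’s implementation, and convert_query_values is a hypothetical name. Note how the set collapses duplicate IDs while a list keeps them:

```python
from typing import Any, List, Set, get_args, get_origin


def convert_query_values(raw_values: List[str], hint: Any) -> Any:
    """Convert raw query-string values into the collection described by `hint`."""
    origin = get_origin(hint)  # set for Set[int], list for List[int]
    (item_type,) = get_args(hint)  # int in both examples below
    return origin(item_type(value) for value in raw_values)


ids = ["3", "1", "3"]
print(convert_query_values(ids, Set[int]))   # {1, 3}
print(convert_query_values(ids, List[int]))  # [3, 1, 3]
```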

After the query parameters are validated, we fetch the News domain models from the database repository using the submitted criteria. Then we return a list of dictionaries, and the response_model in the decorator takes care of serializing the data correctly.

   db_news = await db_repo.get_news_by_filter(id=id, created_at=created_at)
   return [news.as_dict() for news in db_news]

Dependency injection

Dependency injection is a pattern in design and software architecture based on removing direct dependencies between components.

Sounds pretty complicated, right? Well, FastAPI was able to implement this pattern in a very simple way.

You may have noticed that each view has something like this in its function parameters:

   db_repo: DatabaseRepository = Depends(get_database_repo)

This is what we call dependency injection; in this case, we’re injecting the database repository. Depends can inject any callable, such as a function or a class. This is a good approach, as it helps you stick to the DRY (Don’t Repeat Yourself) principle: you don’t have to fetch the database repository yourself in every view, as we did in Flask:

   db_repo = get_database_repo()

Another advantage of Depends is that it makes it easy to substitute implementations in tests.

In Flask, to replace the return value from get_database_repo, we would have to mock this function every time we run tests.

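   from unittest import mock

   @mock.patch("path.to.dependency.get_database_repo")
   def test_some_view(db_repo_inject_mock):
      # FakeDatabaseRepository stands for our own (hypothetical) test implementation
      db_repo_inject_mock.return_value = FakeDatabaseRepository()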

Thanks to dependency injection in FastAPI, we can use…

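   # The key is the original dependency callable; FakeDatabaseRepository is a hypothetical test double
   app.dependency_overrides[get_database_repo] = lambda: FakeDatabaseRepository()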

…to replace the implementation when running the tests.
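The override mechanism is easy to picture with a toy resolver: dependencies are looked up in an overrides dictionary keyed by the original callable, and the real callable is used only when no override exists. This mirrors the idea behind app.dependency_overrides, but resolve, dependency_overrides, and the classes below are hypothetical, framework-free stand-ins:

```python
from typing import Callable, Dict


class DatabaseRepository:
    def get_news(self, news_id: int) -> str:
        return f"news {news_id} from the real database"


class FakeRepository:
    def get_news(self, news_id: int) -> str:
        return f"fake news {news_id}"


def get_database_repo() -> DatabaseRepository:
    return DatabaseRepository()


# Keyed by the original dependency callable, like FastAPI's dependency_overrides
dependency_overrides: Dict[Callable, Callable] = {}


def resolve(dependency: Callable):
    """Call the registered override if there is one, otherwise the original dependency."""
    return dependency_overrides.get(dependency, dependency)()


def get_news_view(news_id: int) -> str:
    db_repo = resolve(get_database_repo)
    return db_repo.get_news(news_id)


print(get_news_view(1))  # news 1 from the real database
dependency_overrides[get_database_repo] = FakeRepository  # e.g. in a test
print(get_news_view(1))  # fake news 1
dependency_overrides.clear()  # restore the real implementation
```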

Depends can also help you avoid repeating the same function parameters across many views. For more, take a look at the documentation.

Asynchronicity

Unfortunately, Flask is a WSGI framework and doesn’t support the ASGI interface. Since Flask 2.0, views can be declared with async def, but each request still occupies a worker for its whole duration, so long-running queries can block our application. This limits the number of users our REST API can handle at the same time.

As you may have noticed, the view functions in FastAPI start with async and each method calling on the database repository is preceded by the word await.

FastAPI is built on the ASGI interface and supports asynchronous views out of the box, although async isn’t mandatory: we can also write ordinary synchronous functions. Thanks to that, we can use non-blocking queries to databases or external services, which means our application can serve many more simultaneous users than the Flask version.
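A small standard-library experiment shows why non-blocking I/O matters. Here asyncio.sleep stands in for a database query (an assumption for illustration; no real database or framework is involved): three “queries” awaited concurrently finish in roughly the time of one, while blocking time.sleep calls run back to back.

```python
import asyncio
import time


async def fake_query(delay: float) -> str:
    # asyncio.sleep hands control back to the event loop, like a non-blocking DB driver
    await asyncio.sleep(delay)
    return "row"


async def run_concurrently(n: int, delay: float) -> float:
    start = time.perf_counter()
    await asyncio.gather(*(fake_query(delay) for _ in range(n)))
    return time.perf_counter() - start


def run_blocking(n: int, delay: float) -> float:
    start = time.perf_counter()
    for _ in range(n):
        time.sleep(delay)  # blocks the whole thread, like a synchronous query
    return time.perf_counter() - start


concurrent_time = asyncio.run(run_concurrently(3, 0.1))
blocking_time = run_blocking(3, 0.1)
print(f"concurrent: {concurrent_time:.2f}s, blocking: {blocking_time:.2f}s")
```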

In its documentation, FastAPI has a very well written example of using async and await. I highly recommend reading it!

And how about running a benchmark?

For this task, we will use Locust. It’s a free, open-source Python load testing tool. Our test will be based on adding 100 users to the pool of active connections every second, until we reach 2,000 users at the same time.
Flask
As we can see, the number of queries per second we can handle is around 633. That’s not bad, right? It could be better, though. The average response time is about 1,642 ms: waiting more than a second and a half to receive any data from the API is definitely too much. On top of that, 7% of the queries failed.
FastAPI
FastAPI did much better in this task. The number of queries that we can handle is about 1,150 per second (almost twice as much as in Flask), and the average waiting time for a response is only… 14 ms. All queries were correct and we didn’t spot any errors.

Automatic documentation

When creating a REST API, documentation is essential for a team of developers or users who want to use this interface to communicate with our application.

You can write it manually, e.g. in Confluence, a GitHub wiki, or any other documentation tool. However, there is a risk of human error, e.g. when someone forgets to update an endpoint’s URL or makes a typo.

The most common standards for creating such documentation are OpenAPI and JSON Schema.

Flask offers extensions, such as Flask-Swagger or Flasgger, which operate using the specification mentioned above. They require additional installation and knowledge of the format used by these standards.

Also, the specifications of the transferred data must be written manually; they are not derived from the validation classes or from the parameters that the views receive.

FastAPI automatically generates documentation that is fully compatible with OpenAPI and JSON Schema, based on Pydantic schemas, function parameters, and query (GET) parameters. The user interface is provided by Swagger UI and ReDoc.

This is a very interesting feature, as it doesn’t require any work from us (unless we want to embellish our documentation with details). All the rules for the required data are taken from the Pydantic schemas.

Documentation is available at host/docs (Swagger UI) and host/redoc (ReDoc) and looks like this:
Swagger UI

In Swagger UI, we also have access to all the schemas that we have defined in our application:

Notice that the information from the summary and title parameters of CreatorSchemaInput appears here.

How does FastAPI know what information to pass to the documentation? Let’s look at an example of fetching a news item:

   # FastAPI
   @router.get(
      "/news/{news_id}",
      response_model=NewsSchemaOutput,
      summary="Get the news by ID",
      responses=NOT_FOUND_FOR_ID,
   )
   async def get_news(
      news_id: int, db_repo: DatabaseRepository = Depends(get_database_repo)
   ):
      """
      Get the news with passed ID
      """
      db_news = await db_repo.get_news(news_id=news_id)
      return db_news.as_dict()

The following parameters of the decorator (together with the view’s signature) are taken into account when generating the documentation:

/news/{news_id}: combined with the news_id: int annotation, the documentation shows that the news_id parameter is required and must be an integer
summary: a short description displayed next to the endpoint
response_model: this response schema is automatically displayed in the documentation
responses: if our view returns status codes other than the ones documented by default (the success response and 422 for validation errors), we can add a special dictionary with the statuses and the returned data schema, like here:

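   from typing import Any, Dict, Union

   # Assumed definition of the alias used for the responses argument
   Response_Type = Dict[Union[int, str], Dict[str, Any]]

   NOT_FOUND_FOR_ID: Response_Type = {
      404: {
         "description": "News with given ID wasn't found",
         "content": {
            "application/json": {"example": {"detail": "News with id {id} don't exist"}}
         },
      }
   }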

Also, the docstring is taken into account and will be shown as additional information for the specific view.
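To illustrate the principle (this is not FastAPI’s actual implementation), here is a stdlib-only sketch that builds a small, OpenAPI-like description of a view from the same sources FastAPI reads: the parameter annotations and the docstring. The describe_view function, the PY_TO_JSON_TYPE mapping, and the lang query parameter are hypothetical; FastAPI and Pydantic produce a far more complete schema.

```python
import inspect
from typing import Any, Callable, Dict, get_type_hints

# Simplified mapping from Python annotations to JSON Schema types
PY_TO_JSON_TYPE = {int: "integer", float: "number", str: "string", bool: "boolean"}


def describe_view(path: str, view: Callable, summary: str) -> Dict[str, Any]:
    hints = get_type_hints(view)
    parameters = []
    for name, param in inspect.signature(view).parameters.items():
        annotation = hints.get(name)
        if annotation not in PY_TO_JSON_TYPE:
            continue  # skip injected dependencies and unsupported types
        parameters.append(
            {
                "name": name,
                # a name that appears in the path template is a path parameter
                "in": "path" if "{" + name + "}" in path else "query",
                "required": param.default is inspect.Parameter.empty,
                "schema": {"type": PY_TO_JSON_TYPE[annotation]},
            }
        )
    return {
        path: {
            "get": {
                "summary": summary,
                "description": inspect.getdoc(view) or "",
                "parameters": parameters,
            }
        }
    }


# Hypothetical view; lang is an invented optional query parameter
def get_news(news_id: int, lang: str = "en"):
    """Get the news with passed ID"""


print(describe_view("/news/{news_id}", get_news, "Get the news by ID"))
```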

Final thoughts on Flask and FastAPI

Thanks for reading my comparison of these two great libraries, based on a very simple CRUD REST API application.

On the one hand, we have the very popular Flask, which can’t be ignored; on the other, there is FastAPI, which wins the hearts of users with the number of built-in functionalities and asynchronicity.

So, which one is better? Personally, if I were to pick the framework for my next REST project, I’d certainly lean toward FastAPI.

Of course, you’re free to draw your own conclusions and choose differently. However, I hope you’ll at least try to give FastAPI a chance.

DevOps Tools Overview: Monitoring Cloud Infrastructure with CloudWatch and OpsGenie

Adam Przewoźny — Sat, 19 Mar 2022 14:01:28 +0000

Originally written by Lidia Kurasińska

For your business to perform at an optimal level, you need to take a proactive approach to in-depth monitoring and analysis of your key asset: the product you’re offering.

Server crashes and unexpected downtime mean frustrated users and lost revenue. Therefore, being able to detect any issues in your IT infrastructure before they escalate and monitor failure patterns will go a long way toward making sure you deliver seamless performance to your end users.

But infrastructure monitoring is not only about minimizing disruption. By providing you with in-depth insights about your product, it will enable you to better understand its day-to-day performance and make data-driven, long-term decisions about its future.

Based on our experience of helping clients monitor their infrastructure 24/7, we wrote this post to help you understand the importance of having reliable monitoring tools and mechanisms in place.

In our collaboration with our partners, we’ve mainly used Amazon CloudWatch and OpsGenie, so we’ll focus on these two tools. However, many of their underlying principles can be replicated with other monitoring services available on the market.

What is infrastructure monitoring?

Infrastructure monitoring refers to the process of collecting and reviewing data about your infrastructure’s status and performance.

Some of the monitored metrics include:

the load levels of, for instance, CPU or RAM;
the status of services that run on the server (e.g. the application or the database);
the number of errors that have occurred in certain services (e.g. the 5xx error code on the NGINX server).

The collected data can come from various sources: from the application itself to the computer that hosts it. Gathering this information is the basis of infrastructure monitoring, as it allows administrators to determine the server’s status and configure alerts that notify them of any unusual behavior.

By gathering a wealth of data, infrastructure monitoring tools give administrators the necessary insight to protect the business and plan ahead.

What are the benefits of infrastructure monitoring?

Continuous infrastructure monitoring helps you achieve the desired product performance, maximize efficiency, and save resources by detecting problems before they escalate and impact your business.

Below are some of the many reasons why you should invest in a reliable monitoring tool.

1. Respond to incidents fast

If an incident happens, you should be the first one to know about it. Having a clear view of your infrastructure is crucial if you want to be able to detect and resolve any problems before they spread and potentially damage your relationship with your users.

Regardless of the nature of the issue you’re facing, being able to respond to it as soon as it appears will go a long way toward protecting your business.

2. Get a better understanding of your infrastructure

Monitoring your infrastructure continuously and proactively gives you a clear picture of how it performs on a daily basis, and allows you to monitor failure patterns and pick up on any warning signs early on.

For instance, if your application suddenly starts to perform below expectations, visualizing the monitoring data might give you valuable insight into what has caused the bottleneck.

3. Make informed, data-driven decisions

Having clear, data-driven insights into the health of your infrastructure isn’t just crucial for understanding its ongoing performance. Most importantly, it allows you to make informed decisions about your long-term IT infrastructure strategy and investment plans.

4. Save time and money

If you want to keep your infrastructure-related costs under control, you can’t afford to go without the use of monitoring tools. They offer an easy way to find out how your server plans correspond to your actual needs.

Whether you’ve been underutilizing your existing cloud service or are about to require a larger and more expensive package, analyzing your monitoring data will help you manage your budget.

5. Give yourself peace of mind

Monitoring your infrastructure 24/7 simply gives you peace of mind. If anything goes wrong, you will be notified the very same moment and can get down to fixing the problem straight away.

Analyzing the monitoring data can also give you insight into the long-term trends you can expect in the future.

What are some of the tools used in infrastructure monitoring?

To get started with infrastructure monitoring, first you need to pick the right tool. Since different systems require different solutions, it’s worth taking a look around to make sure you choose the one with the most suitable features for your set of servers.

For the purpose of this article, we’ll broadly break them up into two categories: cloud-native and non-native.

1. Cloud-native solutions

If you use cloud services to host your infrastructure, sticking to your provider’s native solutions is usually the best option.

The native tools are a breeze to set up, since they come with your cloud account and are easy to maintain. You won’t need to worry about a separate server to host the monitoring tool, and for most basic metrics you won’t need to install additional agents on your servers.

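The most popular cloud services and their native monitoring tools are:
Amazon Web Services: Amazon CloudWatch
Google Cloud: Cloud Monitoring (formerly Stackdriver)
Microsoft Azure: Azure Monitor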

These tools usually provide some basic data at the hypervisor level (the hypervisor being the software layer responsible for server virtualization). However, it’s possible to push custom data using scripts or agents. AWS even offers its own agent, the CloudWatch agent, which can report RAM and disk usage.
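As a minimal sketch of pushing custom data, the snippet below measures disk usage with the standard library and shapes it into the MetricData format accepted by CloudWatch’s PutMetricData API. The metric name, namespace, and Host dimension are hypothetical choices. The upload itself uses boto3’s put_metric_data, which needs the boto3 package and AWS credentials, so boto3 is imported only inside push_metrics.

```python
import shutil
import socket
from typing import Any, Dict, List


def disk_usage_metric(path: str = "/") -> Dict[str, Any]:
    """Build one CloudWatch datapoint with the percentage of disk space used."""
    usage = shutil.disk_usage(path)
    used_percent = usage.used / usage.total * 100
    return {
        "MetricName": "DiskUsedPercent",  # hypothetical metric name
        "Dimensions": [{"Name": "Host", "Value": socket.gethostname()}],
        "Value": round(used_percent, 2),
        "Unit": "Percent",
    }


def push_metrics(metrics: List[Dict[str, Any]], namespace: str = "Custom/Servers") -> None:
    """Send datapoints to CloudWatch (requires boto3 and configured credentials)."""
    import boto3  # imported lazily so the rest of the module runs without boto3

    client = boto3.client("cloudwatch")
    client.put_metric_data(Namespace=namespace, MetricData=metrics)


if __name__ == "__main__":
    metric = disk_usage_metric()
    print(metric)
    # push_metrics([metric])  # uncomment on a machine with AWS credentials
```

Run periodically (for example from cron), this gives a basic custom metric; for production use, AWS’s own agent collects memory and disk metrics without any custom code.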

On top of monitoring, cloud services can also track API calls; on AWS, this is done with CloudTrail. This can give you insight into, for instance, the changes made to the infrastructure and the resources accessed by users.

2. Non-native solutions

Besides cloud-native solutions, you can always make use of non-native tools. They’re a good option when you need to monitor on-premises infrastructure.

Some non-native solutions require additional software (an agent) to be installed on each server that needs to be monitored. The agent then sends the data to the server running the monitoring tool, which processes the information. Some of the most popular ones include:

Prometheus + Alertmanager
Grafana (as data visualizer and for triggering alerts)
Nagios
Zabbix
Icinga/Icinga2
Graphite
Sensu

An overview of Amazon CloudWatch and OpsGenie

As you’ve seen above, there are plenty of infrastructure monitoring tools to choose from. However, for the purpose of giving you a general idea of how they work, we will focus on the examples of Amazon CloudWatch and OpsGenie.

We used these tools for one of our clients who had chosen AWS as a cloud provider for their infrastructure. Although we can’t identify who the client was for this particular project, you can always explore our portfolio for other client stories.

1. Amazon CloudWatch

Amazon CloudWatch is a native tool to monitor AWS resources, such as EC2, RDS, SQS, ElastiCache, SES, and others. It allows you to create dashboards to visualize metrics, which can include the amount of RAM used by the EC2 instance or the number of connections established to the RDS.

The Dashboard feature is very useful, as it can give you instant insight into the status of your infrastructure. It auto-refreshes, so you can display it on a TV screen in the developers’ room.
A sample dashboard in CloudWatch

However, the key functionality of CloudWatch is Alarms. It allows you to set up alarms for the metrics you want to monitor.

Although many of them will indeed have to be client-specific, you should also consider using basic AWS metrics such as the CPU and RAM usage of EC2 instances, Read/Write IOPS for DB instances, and the number of bounced emails from SES.

Usually, you need to adjust the alarms after the initial setup to avoid false positives.
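As a sketch of what an alarm definition contains, the parameters below describe a CPU alarm for a single EC2 instance in the shape expected by boto3’s put_metric_alarm. The instance ID, SNS topic ARN, and threshold values are placeholders to adapt. Requiring several consecutive breaching periods (EvaluationPeriods) is one common way to reduce the false positives mentioned above.

```python
from typing import Any, Dict


def cpu_alarm_params(instance_id: str, sns_topic_arn: str, threshold: float = 80.0) -> Dict[str, Any]:
    """Parameters for a CloudWatch alarm on an EC2 instance's average CPU usage."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,  # evaluate 5-minute averages
        "EvaluationPeriods": 3,  # fire only after 3 breaching periods in a row
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        # Notify the same SNS topic when the alarm fires and when it recovers
        "AlarmActions": [sns_topic_arn],
        "OKActions": [sns_topic_arn],
    }


if __name__ == "__main__":
    params = cpu_alarm_params(
        "i-0123456789abcdef0",  # placeholder instance ID
        "arn:aws:sns:eu-central-1:123456789012:alerts",  # placeholder topic ARN
    )
    print(params)
    # With boto3 and credentials available:
    # import boto3
    # boto3.client("cloudwatch").put_metric_alarm(**params)
```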

An alarm in CloudWatch can have one of these three statuses:

OK
INSUFFICIENT_DATA
ALARM
Alarms in Amazon CloudWatch
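CloudWatch’s API names the three alarm states OK, ALARM, and INSUFFICIENT_DATA. Here is a simplified model of how they follow from the data: with too few datapoints the alarm can’t decide, if every datapoint in the evaluation window breaches the threshold it fires, and otherwise it stays OK. This is an illustration only; CloudWatch additionally supports "M out of N" evaluation and configurable treatment of missing data.

```python
from typing import Optional, Sequence


def alarm_state(datapoints: Sequence[Optional[float]], threshold: float, evaluation_periods: int) -> str:
    """Simplified state for a 'greater than threshold' alarm."""
    # Look only at the most recent window; None marks a missing datapoint
    window = [p for p in datapoints[-evaluation_periods:] if p is not None]
    if len(window) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    if all(p > threshold for p in window):
        return "ALARM"
    return "OK"


print(alarm_state([50.0, 85.0, 90.0, 95.0], threshold=80.0, evaluation_periods=3))  # ALARM
print(alarm_state([85.0, 90.0, 40.0], threshold=80.0, evaluation_periods=3))  # OK
print(alarm_state([90.0, None], threshold=80.0, evaluation_periods=3))  # INSUFFICIENT_DATA
```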

Whenever an alarm changes its status, you need to identify what action to take. This leads us to the next tool we want to give you a general overview of: OpsGenie.

2. OpsGenie

OpsGenie is an incident management service that integrates with your monitoring tool to provide you with around-the-clock notifications on the status of your infrastructure. In our setup, it’s triggered by SNS, the AWS messaging service that can send notifications via email, webhook (HTTP/HTTPS), or text message.
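OpsGenie’s AWS integration processes incoming notifications for you, but it helps to know what they carry. When a CloudWatch alarm publishes to SNS, the SNS message body is a JSON string with fields such as AlarmName, NewStateValue, OldStateValue, and NewStateReason. The sketch below parses such a message and decides whether to open or close an alert; the route_alarm function, its return strings, and the example values are hypothetical and not part of the OpsGenie API.

```python
import json
from typing import Any, Dict


def route_alarm(sns_envelope: Dict[str, Any]) -> str:
    """Decide what to do with a CloudWatch alarm delivered through SNS."""
    # SNS wraps the CloudWatch payload as a JSON string in the "Message" field
    alarm = json.loads(sns_envelope["Message"])
    name = alarm["AlarmName"]
    new_state = alarm["NewStateValue"]
    if new_state == "ALARM":
        return f"open alert: {name} ({alarm.get('NewStateReason', 'no reason given')})"
    if new_state == "OK" and alarm.get("OldStateValue") == "ALARM":
        return f"close alert: {name}"
    return f"ignore: {name} is {new_state}"


example = {
    "Type": "Notification",
    "Message": json.dumps(
        {
            "AlarmName": "high-cpu-i-0123456789abcdef0",
            "NewStateValue": "ALARM",
            "OldStateValue": "OK",
            "NewStateReason": "Threshold Crossed",
        }
    ),
}
print(route_alarm(example))  # open alert: high-cpu-i-0123456789abcdef0 (Threshold Crossed)
```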

A tool like OpsGenie plays a crucial role when you want to stay ahead of disruptions and resolve any issues before they escalate. It will help your team build on-call schedules and will keep everyone informed on who’s accountable for any alerts that emerge.

The platform’s reporting and analytics tools will also help you get to the bottom of the alerts and analyze the team’s workload and performance.

OpsGenie simple statistics

One OpsGenie instance can work for multiple teams, and each team member can look up the schedule, revise it, or just go through past alarms.

The tool is available as a native app for both Android and iOS. We use it a great deal, because it supports push and email notifications as well as mobile phone calls.

The app allows you to manage the way you want to be notified about any alerts. For instance, we’ve customized it to receive the first notification as a push notification. If the notification isn’t acknowledged within three minutes, the app will make a phone call to the on-duty engineer. In case the engineer doesn’t act upon it, OpsGenie will escalate the process and alert other team members.
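The notification flow described above is essentially an escalation policy: a list of steps, each with a delay and a target. Here is a minimal sketch of that logic, assuming the three-minute phone-call delay from our setup and a hypothetical ten-minute delay before other team members are alerted (the text doesn’t specify that value). OpsGenie lets you configure this without code; the snippet only models the idea.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    after_minutes: int  # minutes since the alert was created
    channel: str
    target: str


# Modelled on the flow described above; the 10-minute team escalation is hypothetical
POLICY: List[Step] = [
    Step(0, "push notification", "on-duty engineer"),
    Step(3, "phone call", "on-duty engineer"),
    Step(10, "notification", "other team members"),
]


def fired_steps(minutes_open: float, acknowledged_at: Optional[float] = None, policy: List[Step] = POLICY) -> List[Step]:
    """Steps triggered so far; once the alert is acknowledged, no further steps fire."""
    cutoff = minutes_open if acknowledged_at is None else min(minutes_open, acknowledged_at)
    return [step for step in policy if step.after_minutes <= cutoff]


print([s.channel for s in fired_steps(5)])  # ['push notification', 'phone call']
print([s.channel for s in fired_steps(15, acknowledged_at=2)])  # ['push notification']
```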

OpsGenie Alerts panel

How to Audit the Quality of Your Python Code: A Step-by-Step Guide (Checklist Inside)

Adam Przewoźny — Sun, 06 Mar 2022 19:05:18 +0000

Originally written by Maciej Król

Building a software development project is a bit like a game of Jenga.

All elements create one perfect tower. Usually, it might be tweaked and worked on with no consequences. But if it has even one vulnerable place, a wrong move might ruin all the hard work.

Okay, so it’s not a perfect analogy. A software program requires much more work than a pile of wooden blocks and we don’t necessarily strip it of its parts, but rather add the next ones.

However, the “poke one and all will fall” metaphor still stands. If your project has any weak points, they might doom the entire construction.

It doesn’t matter how well written the rest of your code is if that one tool you used is outdated and might cause serious security breaches. And the more sensitive data your product is dealing with, the more careful you have to be.

A code audit is vital to ensure your product is of good quality, secure, and ready to launch.

In this article, you will find a detailed guide on what a code audit is, why you need it and how to perform it, step by step. As a Python-centered software house, we decided to focus on how to run an audit of Python-based code. However, you will find some of the tips and guidelines relevant regardless of your technology choice.

We will also provide you with a checklist and a sample report from an audit so that you can see what a well-prepared, comprehensive auditing process looks like. The exemplary audit is over 20 pages long and will serve as a fantastic point of reference for your future work! Download the checklist and report in PDF format below.

With our guide, you will be able to run a Python code audit yourself, and learn what you should expect from one.

Read on!

What is a code audit?

“Code audit is a comprehensive analysis of source code in a programming project with the intent of discovering bugs, security breaches or violations of programming conventions,” according to Wikipedia. I know that quoting Wikipedia in an article is like quoting Merriam-Webster during a wedding speech—but this time they got it so right that they deserve credit!

The intention of every code audit is ensuring that a given program is:

secure,
devoid of bugs and defects,
easy to maintain and extend,
up to date with the current standards,
in line with coding best practices.

Skip any of these, and you’re sacrificing the quality and security of your code, which may—and most probably will—have disastrous consequences. Poor documentation and tech debt might slow down or even halt your project; bugs and security breaches might cost you clients, reputation, and good user ratings. And that’s just the start.

With a code audit, you’ll be sure your code is secure, bug-free and ready for handover.

Code audit vs. code review

After reading the section above, you might think: okay, but everything you’ve just described can be achieved with the help of a code review, and we run these regularly!

It’s true that the terms might sometimes be used interchangeably, but there are a few subtle differences between them.

Code review is contained within one team—the developers review each other's code, and they focus only on one specific part.

A code audit, on the other hand, always concerns the whole project and is performed by a person outside of the team—be it other developers, or even an outside company.

While code reviews are useful and necessary, performing a code audit every once in a while makes a tremendous difference. Let me use another metaphor here: reviews are like checking different parts of your car for potential malfunctions. Of course, it’s necessary to see if the headlights operate correctly, if both wipers are fine, and if your brakes do their job… But unless you start the car, take a drive, and assess how everything works together, you won’t know how good the whole machine actually is.

As the unwritten rule goes, the more people see your code, the better. And the more you fix, the more faultlessly the project will perform in the long run.

When can you benefit from a code audit?

You might find yourself needing a code audit at many different stages of development and in different situations:

Before introducing your product to the market, to make extra sure the quality is impeccable and you won’t wake up the next day seeing a bunch of one-star reviews;
After inheriting legacy code, to help you plan your future work and assess the scope, cost and time-frame of the project;
Before you invest in a project, to verify whether it’s a good bet;
Whenever you feel your product is suboptimal. Perhaps the app’s lagging, or there are a few too many bugs to ignore? It’s never too late to check the code and apply changes.

The benefits of a code audit

A code audit serves many different purposes, and which ones matter most depends on where you stand.

From a developer’s point of view, you get the following advantages:

As mentioned above, the more people see your code, the better. If your product has undergone a comprehensive check-up, chances are any potential bugs and vulnerabilities will be found pre-release and you can fix them stress-free. Not to mention it will help you sleep soundly knowing that all the tools are up to date and follow current security best practices.
It’s extremely rare that the very same team of developers works on the same product from start to finish. Coworkers might change, sometimes an outsourced team or two might join the efforts, and the total number of developers might be scaled up or down. Additionally, every part of the source code is written by different people with different skills and competences.

That’s why it’s advisable to perform an audit each time you get source code that you haven’t worked with before (for example, we usually run an internal audit on the code we receive from a client before we start working on it). It will help you assess the scope of your work, the general quality, and maintainability.

An audit will help you avoid technical debt. Trust me, “that’s a problem for future me” is not a good approach when it comes to software development.

As a team leader, you’ll find that:

Performing a code audit yourself or at least participating in it will give you an overview of the whole project. Usually, team leaders don’t see the code on a daily basis, so an audit will help them get acquainted with the present state of the project, its structure, and its functionalities.

And from a strictly business perspective, you get the following advantages:

An audit helps prove that your program is ready to be launched and introduced to your clients and customers. Malfunctions or security breaches might potentially cost you a lot of money—and your reputation.
An up-to-date, fresh and technologically relevant project is more attractive for developers. High-quality code will attract high-quality talent!
Audited code helps simplify and streamline the development process, which in turn means work can progress faster with fewer blockers.

Python code audit—step-by-step guide + checklist

In this section, we will introduce a step-by-step process of how to run a Python code audit.

Each subsection details the crucial elements of the code audit. It will give you an idea of how to structure the document.

We also included tips on how to ensure you follow the best possible practices from the very beginning. You can apply them even before the audit!

To see what the end result should look like, consult our example provided in the free PDF below. It’s over 20 pages long and based on a real-life code audit we performed.

Let’s get started!

1. Code repository

In the beginning, it’s important to check whether the project uses a version control system that tracks and manages changes to the source code (Git, for example), and whether it’s well maintained.

TIP: Consider working according to the Gitflow Workflow, which “dictates what kind of branches to set up and how to merge them together.” Pay attention to consistent branch naming. If your product is particularly large, consider using Git tags as well. They make managing a larger project much easier.
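During an audit, branch naming can be checked automatically. The sketch below assumes the usual Gitflow names (main or master, develop, and the feature/, release/, and hotfix/ prefixes) plus lowercase branch suffixes; the exact pattern is a hypothetical team convention to adjust. The branch list could come from the output of git branch --format='%(refname:short)'.

```python
import re
from typing import Iterable, List

# Hypothetical convention based on Gitflow branch names
BRANCH_PATTERN = re.compile(
    r"^(main|master|develop|(feature|release|hotfix)/[a-z0-9._-]+)$"
)


def nonconforming_branches(branches: Iterable[str]) -> List[str]:
    """Return branch names that don't follow the agreed naming convention."""
    return [name for name in branches if not BRANCH_PATTERN.match(name.strip())]


print(nonconforming_branches(["main", "develop", "feature/login-form", "fix_stuff", "release/1.4.0"]))
# ['fix_stuff']
```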

2. Software architecture

a) Technology choices

The point of this section is to verify if the tech stack is the optimal choice for the project and if it’s internally compatible.

When you start verifying the technology choices, the first step should be to check whether all the components use LTS (long-term support) versions and whether they are up to date.

Then, it’s time to judge if all the components are well-tested and if they fit each other.

What does it mean in practice? For example, Django apps are paired with Postgres much more often than with other database engines, like MySQL. While the less popular choices are not necessarily technologically weaker, opting for them can make it considerably harder to find help with any potential problems.

These aspects need to be taken into account in order to assess the sustainability of the project.
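The up-to-date check described above can be partly scripted. The sketch below reads exact pins from a requirements file and compares them with a minimum supported version per package. The MINIMUMS values are hypothetical policy, and the simple parser only handles name==X.Y.Z pins with purely numeric versions; for real projects, pip list --outdated or the packaging library cover far more cases.

```python
from typing import Dict, List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '3.2.19' into (3, 2, 19); only numeric versions are supported."""
    return tuple(int(part) for part in version.split("."))


def parse_pins(requirements: str) -> Dict[str, str]:
    """Collect 'name==version' pins, ignoring comments and unpinned lines."""
    pins = {}
    for line in requirements.splitlines():
        line = line.split("#", 1)[0].strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def outdated(pins: Dict[str, str], minimums: Dict[str, str]) -> List[str]:
    """Packages pinned below the minimum version the team supports."""
    return sorted(
        name
        for name, version in pins.items()
        if name in minimums and parse_version(version) < parse_version(minimums[name])
    )


# Hypothetical policy, e.g. require at least the Django 3.2 LTS release line
MINIMUMS = {"django": "3.2", "psycopg2": "2.9"}

requirements = """
Django==3.1.14   # web framework
psycopg2==2.9.3
requests>=2.0
"""
print(outdated(parse_pins(requirements), MINIMUMS))  # ['django']
```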

b) Deployment configuration

It’s always worth checking which services are used to support the application. You should pay attention to the software used to serve it (e.g. uWSGI, Gunicorn, nginx) and the hosting method (whether it’s cloud or local).

TIP: There is no clear answer as to which method is right; each hosting type has its advantages and disadvantages. Everything depends on the type of project you’re working with.

However, I sincerely recommend cloud hosting. Not only will it help you save money (no need to care about the hardware, less maintenance, increased productivity), but you also gain much higher availability for the app. Many cloud providers guarantee 99.99% availability in their SLAs!
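To put availability figures into perspective, the arithmetic below converts an SLA percentage into the maximum downtime it allows per year, assuming a 365-day year. At 99.99%, that is about 52.6 minutes a year, compared with roughly 8.8 hours at 99.9%.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def max_downtime_minutes(availability_percent: float) -> float:
    """Yearly downtime allowed by an availability SLA such as 99.99."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for sla in (99.9, 99.95, 99.99):
    print(f"{sla}% -> {max_downtime_minutes(sla):.1f} minutes per year")
```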

The next step is to verify whether the project contains the files responsible for virtualizing or containerizing it (e.g. a Dockerfile).

TIP: I highly advise using Docker. It helps avoid a lot of potential problems and bugs during the development stage, as the development version runs in an environment identical to the production version.

Then, it’s time to check whether the README file contains all the necessary elements:

instructions for configuration,
instructions for installation,
a user’s manual,
a manifest file (with an attached list of files),
information on copyrights and licenses,
contact details for the distributors and developers,
known bugs and malfunctions,
problem solving section,
a changelog (for developers).

While reviewing the project directory, you should check whether it includes files responsible for continuous integration and continuous deployment (CI/CD).

TIP: Well-constructed CI/CD pipelines can greatly benefit your project. Not only do they make building the program more efficient, but they also run scripts that test the application and verify its validity during the build.
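Checking for containerization and CI/CD files is easy to automate at the start of an audit. The sketch below looks for a few common file names in a project directory; the list is an assumption covering Docker, Docker Compose, GitHub Actions, GitLab CI, Jenkins, and CircleCI, and should be extended for other tools.

```python
from pathlib import Path
from typing import Dict, List

# Common locations; adjust to the tools the project actually uses
EXPECTED_FILES: Dict[str, List[str]] = {
    "containerization": ["Dockerfile", "docker-compose.yml", "docker-compose.yaml", "compose.yaml"],
    "ci_cd": [".github/workflows", ".gitlab-ci.yml", "Jenkinsfile", ".circleci/config.yml"],
}


def audit_project_files(root: str) -> Dict[str, List[str]]:
    """Map each category to the matching files or directories found under root."""
    base = Path(root)
    return {
        category: [name for name in candidates if (base / name).exists()]
        for category, candidates in EXPECTED_FILES.items()
    }


if __name__ == "__main__":
    for category, found in audit_project_files(".").items():
        print(f"{category}: {', '.join(found) if found else 'MISSING'}")
```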

Check the project configuration and verify that it doesn’t contain any passwords that a third party could find.
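A rough first pass of this check can be automated. The sketch below flags lines where a secret-looking name is assigned a quoted literal; the regular expression and file patterns are heuristic assumptions, so expect both misses and false positives. Dedicated tools go further; Bandit, for instance, includes checks for hardcoded passwords.

```python
import re
from pathlib import Path
from typing import Iterator, Sequence, Tuple

# Heuristic: a secret-looking key assigned a quoted literal, e.g. DB_PASSWORD = "hunter2"
SUSPICIOUS = re.compile(
    r"""(password|passwd|secret|api[_-]?key|token)\w*\s*[:=]\s*["'][^"']+["']""",
    re.IGNORECASE,
)

DEFAULT_PATTERNS = ("*.py", ".env", "*.cfg", "*.ini", "*.yml", "*.yaml")


def find_hardcoded_secrets(root: str, patterns: Sequence[str] = DEFAULT_PATTERNS) -> Iterator[Tuple[str, int, str]]:
    """Yield (file, line number, line) for lines that look like hardcoded credentials."""
    for pattern in patterns:
        for path in Path(root).rglob(pattern):
            text = path.read_text(encoding="utf-8", errors="ignore")
            for number, line in enumerate(text.splitlines(), start=1):
                if SUSPICIOUS.search(line):
                    yield str(path), number, line.strip()
```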

TIP: It’s advisable to keep all logins and passwords necessary to run the application in environment variables, whether on the machine on which the application runs or in the tool responsible for CI/CD.
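Following this tip, settings code can read credentials from the environment and fail fast when one is missing, so no default password ever ends up in the repository. The require_env helper and the variable names (DATABASE_URL, SECRET_KEY) are hypothetical examples.

```python
import os
from typing import Mapping, Optional


class MissingSettingError(RuntimeError):
    """Raised when a required environment variable is absent or empty."""


def require_env(name: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Return a required setting from the environment, or raise a clear error."""
    env = os.environ if environ is None else environ
    value = env.get(name)
    if not value:
        raise MissingSettingError(f"Environment variable {name} is not set")
    return value


# Example settings module: nothing secret is hardcoded
# DATABASE_URL = require_env("DATABASE_URL")
# SECRET_KEY = require_env("SECRET_KEY")
```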

Check if there’s an Error Tracking System in place. One of the most popular ones is Sentry.

3. Coding best practices

This section will look different depending on the programming language and the packages/libraries you use.

With Python, you need to check carefully whether the code is compliant with the PEP 8 style guide and the PEP 257 docstring conventions.

The good news is, you don’t have to do it all manually. There are tools that might help you along the way.

a) Linters

b) Other standalone tools

Pylint—a source code, bug and quality checker for Python;
PyFlakes—another bug checker (it only checks for logical errors, not for style, but it works faster);
Pycodestyle—checks Python code against the style conventions in PEP 8;
Pydocstyle—checks compliance with Python docstring conventions;
Bandit—finds common security issues in Python code;
MyPy—static type checker for Python.
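As an example of the kind of check these tools perform, here is a small stdlib sketch in the spirit of Pydocstyle’s missing-docstring rules: it parses a module with ast and lists public functions and classes that have no docstring. It is a simplified illustration (it treats any name without a leading underscore as public), not a replacement for the real tool.

```python
import ast
from typing import List


def missing_docstrings(source: str) -> List[str]:
    """Names of public functions and classes in source that lack a docstring."""
    tree = ast.parse(source)
    missing = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # PEP 257 expects docstrings on public definitions; a leading underscore marks private ones
            if not node.name.startswith("_") and ast.get_docstring(node) is None:
                missing.append(node.name)
    return missing


code = '''
def get_news(news_id):
    """Return one news item."""

async def list_news():
    return []

class Repository:
    def _connect(self):
        pass
'''
print(missing_docstrings(code))  # ['list_news', 'Repository']
```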

c) Code analysis and formatting tools

Mccabe—a Python complexity checker;
Radon—a Python tool that computes various metrics from the source code;
Black—a Python code formatter;
Isort—a Python utility/library to sort imports;
Yapf—a Python formatter.
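To show what a complexity checker such as Mccabe measures, here is a simplified stdlib approximation of cyclomatic complexity: start at 1 for each function and add 1 for every branching construct, plus one for each extra operand of and/or. The set of counted node types is an assumption; the real Mccabe and Radon tools analyze the code more thoroughly and handle more cases.

```python
import ast
from typing import Dict

# Node types treated as decision points in this simplified count
BRANCH_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.ExceptHandler, ast.IfExp, ast.comprehension)


def complexity(source: str) -> Dict[str, int]:
    """Approximate cyclomatic complexity for each top-level function in source."""
    scores = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = 1
            for child in ast.walk(node):
                if isinstance(child, BRANCH_NODES):
                    score += 1
                elif isinstance(child, ast.BoolOp):
                    # "a and b and c" adds one path per extra operand
                    score += len(child.values) - 1
            scores[node.name] = score
    return scores


code = '''
def classify(n, strict):
    if n < 0:
        return "negative"
    if n == 0 and strict:
        return "zero"
    return "positive" if n > 100 else "small"
'''
print(complexity(code))  # {'classify': 5}
```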

Even though these tools can greatly automate and speed up your work, it’s still worth analyzing the code manually in order to find any potential:

bugs,
bottlenecks,
performance issues,
security vulnerabilities,
dangers connected with maintaining the application.

4. Tips for the future: how to ensure the quality of your code

Code audits can help improve your code and get rid of any existing issues.

But if upon running the code audit the list of things to improve feels too long, try getting familiar with a few good practices. While not all of them may be applied in every single team, here are a few that are worth taking into consideration:

Every piece of code should be reviewed by at least two developers;
Use Git hooks;
Decide on one specific formatter configuration for the whole team;
Share your knowledge! Both when it comes to technologies that you are proficient in and when it comes to tasks that you solved—it helps the team to adopt the same good practices;
Consider asking the team members to use the same code editor—it will help with standardization.
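For the Git hooks tip above: a hook is an executable script in .git/hooks, and a pre-commit hook that exits with a non-zero status aborts the commit. The sketch below is the core of a hypothetical Python pre-commit hook that runs a list of check commands and stops at the first failure. The black and isort commands in CHECKS are examples and assume those tools are installed; many teams use the pre-commit framework instead of hand-written hooks.

```python
import subprocess
import sys
from typing import List, Sequence

# Example checks; each assumes the tool is installed in the active environment
CHECKS: List[List[str]] = [
    ["black", "--check", "."],
    ["isort", "--check-only", "."],
]


def run_checks(commands: Sequence[Sequence[str]]) -> int:
    """Run each command in order; return the first non-zero exit code, or 0 if all pass."""
    for command in commands:
        try:
            result = subprocess.run(list(command))
        except FileNotFoundError:
            print(f"pre-commit: {command[0]} is not installed", file=sys.stderr)
            return 127  # conventional "command not found" exit status
        if result.returncode != 0:
            print(f"pre-commit: {' '.join(command)} failed", file=sys.stderr)
            return result.returncode
    return 0
```

Saved as .git/hooks/pre-commit with a #!/usr/bin/env python3 shebang, made executable, and ending with sys.exit(run_checks(CHECKS)), this would block any commit that fails the formatting or import-order checks.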

Final thoughts

We hope our guide on how to run a code audit will help you perform one on your own, or assess what a good audit document should look like.

If you want to get to work, we recommend you download our PDF—it consists of a checklist and a real-life audit example for reference.

And if you want to find out more about how to ensure the better quality of your code, why not check out the following articles: