Rachel Curioso :D

Posted on Oct 7, 2019 • Edited on Oct 15, 2019

Identifying the dirt in our code - formatting, objects and data structures and how to handle errors

#cleancode #practices #productivity #books

This post is the second in a series of 3 about clean code.

We will talk about code formating, object and data structure, and how to handle errors.

I know that some of those things may seem silly, but a lot of interesting things are discussed within these topics.

In formating I talk, among other things, about the best order for your functions and variables. When I talk about object and data structure, I talk about how you protect the inside of your object. And about errors, I talk about prepare your code to deal with problems when they happen.

In the first post I talked about names, functions and comments I advise you to read (: (Or not! You are not obliged! Rebel!)

The intention is to finish the series in the next post, talking about how to work with third-party code, how to organize your unit test and, finally, how to organize better your classes.

Let's Go

Formatting

When we look at a code that doesn't have good formatting, the first impression is that all code is not very good, and we end up not taking it seriously.

When the subject is formatting, we look not only at the indentation but also at how code blocks are organized.

We can divide the subject formatting into two parts: Vertical formatting and horizontal formatting.

Vertical Formatting

A newspaper story begins with a good headline that tells the reader what the story is about. The first paragraph is the equivalent of a synopsis, so you can get an idea of the tone of the story and as you read it, you'll discover more details.

A good code should be like that too. The module name should be clear so people know what it is about, soon after comes higher-level concepts along with algorithms, and as the code unfolds, there should come lower-level functions with the source code details.

Ideally, the code should be divided into modules, with a space between each module, and they should follow an order, for the code to tell a story.

First, the variables must be declared and then the dependent functions. The only thing that shouldn't follow this rule are functions that have a conceptual affinity.

In the example below, I show the order that functions should be called, and when I speak of functions that have a conceptual affinity, I speak of chocolate? and not_chocolate?

(People, focus on the order that the functions appear and not the names)

something
somewhere = some_place

def ice_cream
  banana(something_1)
  cheese(something_2)
end

def banana(something_3)
  return "something" if chocolate?(something_3)
  avocado(something_3)
end

def chocolate?(something_4)
  #some code
end

def chocolate?(something_4)
  #Another code
end

def avocado(something_5)
  #One more code
end

def cheese(something_6)
  #One more of another code
end

Still talking about the order in which things appear in code, remember that loop control variables are declared within the loop itself:

for(var i = 0; i > something; i++)

Horizontal

It's a good practice to avoid very long lines of code. Some people like to use a limit of 80 characters, others 100, and that's fine, but over 120 characters may be a sign of carelessness.

Another good practice is to group things that we don't want to separate and not separate things that we want to group together. It seems obvious to say so, but sometimes this is not what happens:

in the expression def something(args) we want that something and (args) to be together because they are the same thing. If we put a space between them, def something (args), we get an idea that they are different things.

In int something = another_something we are talking that something and another_something are different things, and when we write int something=another_something, we lose a little of this distinction.

Another important thing to note about horizontal formatting is how we align our variables.

Somethime we feel the need to align this way:

private Socket        socket;
private InputStream   input;
private OutputStream  output;
private Request       request;
private Response      response;
private boolean       hasError;
private long          requestParsingTimeLimit;
private long          requestedProgress;

But we have three problems when we do this:

We take away the intention of what matters. In the example above, we end up not looking for the type of the variable. We only look at your name
Usually, if we use the editor tools to format our code, it takes all these spaces, and we lost our formatting
If we realize that we should align the variables, it's a sign that we have too many variables and, probably, we can divide what we have in more pieces.

This the most appropriate way to align the variables:

private Socket socket;
private InputStream input;
private OutputStream output;
private Request request;
private Response Response;
private boolean hasError;
private long requestParsingTimeLimit;
private long requestedProgress;

Objects and data structure

If we like private variables, why we always write getters and setters in our code? How to handle access to our classes?

Data Abstraction

When we use a method or a class, we don't need to know how they are implemented. What interests us is their returns.

In the example below, we can see the difference between a class with public variables and an interface with various access policies.

public class Point(){
  public double x;
  public double y;
}

The interface with various access policies:

public interface Point(){
  double getX();
  double getY();
  void setCartesian (double x, double y);
  double getR();
  double getTheta();
  void setPolar(double r, double theta);
}

Notice that on the second example, we don't have any method like setX or setY. If we want to create a point, we need to pass both like a couple, using setCartesian.

The first example forces us to deal with each variable separately, even though they are a couple.

Hide the implementation it's not only a matter of hiding variables. It's a matter of abstraction. A class is not just a tangle of getters and setters. They manipulate data and that should be their essence.

Data Structure or Objects?

Objects hide behind abstractions, and its public functions serve to manipulate their data.

As a counterpoint, in data structures, everything is exposed.

class Square < Shape
  Point top_and_left
  side
end

class Circle < Shape
  Point center
  radius;
end

class Geometry
  PI = 3.141592

  def area(geometric_form)
    if geometric_form.is_a?(Square)
      # calculate the area of a square
    elsif geometric_form.is_a?(Circle)
      # calculate the area of a circle using PI
    end
  end
end

In the example above, we have a data structure with this problem:

Any new shape that we want to create, we need to go to the Geometry class and add one more if to the area function.

In this particular case, we could use inheritance to implement the area function in each object, not the Geometry class.

In Object Orientation, it's easy to create new classes without modifying functions, but on the other hand, it's difficult to add new functions in Object Orientated Programming (OOP), as this impacts all the child classes.

Occasionally, it's more important to add new functions than to create child classes. In such cases, don't be afraid to use procedural code.

The important is not to mix data structure with objects, because in that case, we would have the worst of both worlds. The result is structured that are hard to inherit, and difficult to modify existing functions.

The law of Demeter

The law of Demeter states that a module must not know the inside of the module it handles.

A method shouldn't invoke methods or objects of another class.

Looking at the example bellow:

final String outputDir = ctxt.getOptions().getScratchDir().getAbsolutePath();

If ctxt was an object, to access getAbsolutePath() it would need the getOptions() return in order to work. That is, the functions are dependent on each other in order to function.

We could solve this problem this way:

Options opts = ctxt.getOptions();
File scracthDir = opts.getScratchDir();
final String outputDir = stracthDir.getAbsolutePath();

However, we should be aware that if the example above were a data structure, it wouldn't be a problem.

first_name = full_name.to_downcase.split(" ").first

The first example given, we have an information access problem and we have a violation of Demeter's law. But in the example above, we are transforming a piece of information, and there is no problem with this.

Data transfer

When we talk about data transfer, we are referring to objects that have only one variable, but no method.

Some people like to leave all attributes private, and access them through getters and setters.

The only real advantage in this is to makes the code more object-oriented, and ready for possible growth.

public class Person{
  private String name;
  private int age;

  public Person (String name, int age){
    this.name = name;
    this.age = age;
  }

  public getName(){
    return name;
  }

  public getAge(){
    return age;
  }
}

Active Record

The active record works as direct database translations. Usually, they are Data Transfer Objects with some navigational methods like save or find.

It turns out that when we try to add business rules and make it more robust, we end up creating a hybrid structure, half object, half data structure that, as we saw above, is always bad.

The solution to this problem is to treat the Active Record as a data structure and create separate objects that have business rules and other internal data.

Error Handling

Dealing with error is important, but when error handling leaves logic obscure and confusing, it is being done wrong.

In the old days, people tried to check for all kinds of mistakes at once, and that brought many problems besides those mentioned above.

Another problem in this approach is that you need to think in all scenarios before writing code, which not only impairs readability, but it is easy to forget about handling the error later:

if ("scenery 01")
  #...
elsif ("scenery 02")
#(...)
elsif ("scenery 234334134134")
  #...
end

In the example above, it would be interesting to encapsulate the code with a more generic error handling, but clear enough that you can find the root of the problem. For example, if you have a class that fetches users on another system via HTTP, and you have an HTTP error, you don't want to have a generic HTTP exception, but an exception that says you couldn't find the user.

When we throw exceptions, we need to be careful to make the messages clear so that you, the developer, know how to identify the problem after it occurs.

Remember that not every error is an exception. Many errors can be treated as normal code flow. Well done logging is often better than throwing an exception.

One tip to writing error-handling code is to write your try-catch first.

Think that all that is inside the try-catch block should be independent and decoupled from the rest of the application, because if that piece of code goes wrong, probably we don't want to broke everything else.

Writing the try-catch block first helps to make it independent.

It is important to note that not all places need exception handling: those most in need are those where there is uncertainty about the outside world, such as network connections, database interactions, etc.

Another point that needs to be raised is knowing where in the stack you will want to leave the error. The higher above the stack and closer to the user, the better, because the farther from the interaction layer you leave, the harder it is to find the error. However, leaving it close to the user is different from popping the error for the user to see. Sometimes we don't want the user to know what went wrong, but we always need to know what the user did to make such an error happen.

This is a very long topic that would make a book

A preview about dealing with someone else's code

When we use third-party code, it is best not to treat its errors directly in your code.

The ideal is to encapsulate this lib in a class of its own and handle the error inside.

Null/Nil

When we work with null in our code, we create a difficult problem to debug.

When something goes wrong, it throws a generic exception, informing us that something wrong is not right. This exception is so generic that finding its root is very difficult.

Sometimes we need to check if something doesn't exist and we are tempted to verify if something == null. In this case, look to see if there are no other ways of checking. Probably there is a way.

If it's a list, we can verify if the list is empty. If it's a String, we can verify if it's blank and so on.

And, the most important, don't pass null as an argument. When we do this, the chances of receiving a null pointer exception (and equivalents) are much greater.

cover image by Jazmin Quaynor

🚀 pgai Vectorizer: SQLAlchemy and LiteLLM Make Vector Search Simple

We built pgai Vectorizer to simplify embedding management for AI applications—without needing a separate database or complex infrastructure. Since launch, developers have created over 3,000 vectorizers on Timescale Cloud, with many more self-hosted.

Read full post →

Top comments (1)

Patrick Decker • Oct 8 '19

Nice article, especially the view on error handling.
I think error messages are something where most codebases have still lots of room for improvement.
Most error messages just tell the user or developer what went wrong, but in most cases they could also tell how to fix the problem.

DEV Community