<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Grigor Bezirganyan</title>
    <description>The latest articles on DEV Community by Grigor Bezirganyan (@bezirganyan).</description>
    <link>https://dev.to/bezirganyan</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F51840%2Fe6c70e50-3bd1-4242-8f86-de70a598d98a.jpg</url>
      <title>DEV Community: Grigor Bezirganyan</title>
      <link>https://dev.to/bezirganyan</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/bezirganyan"/>
    <language>en</language>
    <item>
      <title>Uncertainty-Aware AI from Multimodal Data: A PyTorch Tutorial with LUMA Dataset</title>
      <dc:creator>Grigor Bezirganyan</dc:creator>
      <pubDate>Thu, 04 Jul 2024 12:03:36 +0000</pubDate>
      <link>https://dev.to/bezirganyan/uncertainty-aware-ai-from-multimodal-data-a-pytorch-tutorial-with-luma-dataset-2nil</link>
      <guid>https://dev.to/bezirganyan/uncertainty-aware-ai-from-multimodal-data-a-pytorch-tutorial-with-luma-dataset-2nil</guid>
      <description>&lt;p&gt;We perceive the world in a multimodal manner, combining information from our various senses — such as sight, hearing, smell, touch, and taste — to form a comprehensive understanding of our surroundings. To develop AI models capable of making decisions as well as, or better than, humans, it is essential for these models to also consider multimodal data. Furthermore, AI models must be aware of the confidence levels in their decisions, as incorrect decisions can lead to catastrophic outcomes. In this tutorial, we present a simple guide on how to use the &lt;a href="https://github.com/bezirganyan/LUMA"&gt;LUMA&lt;/a&gt; multimodal dataset to introduce varying levels of uncertainty in the data and estimate the model’s uncertainty.&lt;/p&gt;

&lt;h2&gt;
  Uncertainty Quantification
&lt;/h2&gt;

&lt;p&gt;Machine Learning and Deep Learning now drive a wide range of products and applications that we use daily, from image editing software to self-driving cars. These applications often process diverse types of information, including audio, images, text, and sensor data. To build Deep Learning models that perform well, it is crucial to integrate all these types of information during training. We refer to these various forms of data as “data modalities,” and the deep learning models that utilize them are known as Multimodal Deep Learning models.&lt;/p&gt;

&lt;p&gt;Similar to conventional deep learning models, Multimodal Deep Learning models also suffer from overconfidence. Overconfidence occurs when a model assigns excessively high probabilities to its predictions, even when they are incorrect. This can lead to catastrophic results. For example, a confidently wrong prediction in a self-driving car can lead to injury or death of the passengers, as &lt;a href="https://www.pcmag.com/news/ntsb-tesla-model-s-speeding-before-fatal-crash"&gt;happened in 2016&lt;/a&gt;. To avoid such scenarios, we need to understand how confident deep learning models really are in their predictions. Uncertainty Quantification (UQ) serves this purpose: it tries to quantify the uncertainties in the data and in the trained model.&lt;/p&gt;

&lt;p&gt;Bayesian statistics mostly distinguishes between two types of uncertainty: aleatoric and epistemic. &lt;strong&gt;Aleatoric&lt;/strong&gt; uncertainty refers to the uncertainty inherent in the data, which cannot be reduced by observing more data. For example, if we look at the image below, we can see that the two classes are mixed, and it is hard to infer what the label of a new point in the mixed regions should be. Adding more data will not make the classification easier.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgagkhnyqkrsoiqvyzjh9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgagkhnyqkrsoiqvyzjh9.png" width="720" height="277"&gt;&lt;/a&gt;&lt;br&gt;Aleatoric Uncertainty [left] and Epistemic Uncertainty [right]. Image retrieved from: &lt;a href="https://link.springer.com/article/10.1007/s10994-021-05946-3"&gt;https://link.springer.com/article/10.1007/s10994-021-05946-3&lt;/a&gt;
  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Epistemic uncertainty&lt;/strong&gt;, on the other hand, is the uncertainty of the model due to a lack of knowledge. For example, in the image above, we don’t have enough data points to confidently say which decision boundary is the best one. In contrast to aleatoric uncertainty, adding more data points here provides additional information and hence reduces the epistemic uncertainty.&lt;/p&gt;
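
&lt;p&gt;In practice, these two quantities are often estimated from a set of Monte Carlo predictions (e.g. from MC dropout or an ensemble): the entropy of the mean prediction gives the total uncertainty, the mean entropy of the individual predictions approximates the aleatoric part, and their difference (the mutual information) approximates the epistemic part. A minimal sketch of this standard decomposition (not part of the LUMA codebase) is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import torch

def entropy(p, eps=1e-12):
    # Shannon entropy along the class dimension
    return -(p * (p + eps).log()).sum(dim=-1)

def decompose_uncertainty(probs):
    # probs: [num_mc_samples, batch, num_classes], e.g. from MC dropout
    total = entropy(probs.mean(dim=0))      # predictive entropy
    aleatoric = entropy(probs).mean(dim=0)  # expected entropy per MC sample
    epistemic = total - aleatoric           # mutual information (BALD)
    return total, aleatoric, epistemic
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;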

&lt;p&gt;In Multimodal Deep Learning, the uncertainties of the individual modalities can interact in more complex ways. Modalities may carry complementary information, which can reduce the overall uncertainty, or conflicting information, which can increase it.&lt;/p&gt;

&lt;p&gt;In this blog post, we will explore different uncertainty scenarios and measure the corresponding uncertainties on the LUMA multimodal dataset¹.&lt;/p&gt;

&lt;h2&gt;
  LUMA Dataset
&lt;/h2&gt;

&lt;p&gt;We are going to use the &lt;a href="https://huggingface.co/datasets/bezirganyan/LUMA"&gt;LUMA dataset&lt;/a&gt;, which allows us to inject different types of noise into each of the modalities and observe the changes in uncertainties. The LUMA dataset comprises three modalities: audio, image, and text. The image modality contains small 32x32 images of different objects. The audio modality contains pronunciations of the labels of these objects, and the text modality contains text passages about the objects. In total there are 50 classes, 42 of which are designed for model training and testing, while the other 8 are provided as out-of-distribution data.&lt;/p&gt;

&lt;p&gt;First, we need to download and compile the dataset. For that, we need to go to our command line interface (bash in my case), and run the following command, which will clone the LUMA dataset compiler and noise injector:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/bezirganyan/LUMA.git 
&lt;span class="nb"&gt;cd &lt;/span&gt;LUMA
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, we need to install the dependencies by creating and activating a &lt;code&gt;conda&lt;/code&gt; environment (make sure you have Anaconda or Miniconda installed):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;conda &lt;span class="nb"&gt;env &lt;/span&gt;create &lt;span class="nt"&gt;-f&lt;/span&gt; environment.yml
conda activate luma_env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Having all the dependencies, we can download the dataset to the &lt;code&gt;data&lt;/code&gt; directory with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git lfs &lt;span class="nb"&gt;install
&lt;/span&gt;git clone https://huggingface.co/datasets/bezirganyan/LUMA data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, we can compile different dataset versions with different types and amounts of noise in each modality. To compile the default dataset (i.e. without additional noise), we need to run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python compile_dataset.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, the LUMA tool allows us to inject different types of noise.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sample Noise — This type of noise adds realistic noise to each of the modalities. For example, for the text modality, it can replace words with antonyms, add typos, spelling errors, etc. For the audio modality, it can add background conversations, typing noises, etc. And for the image modality, noises like blur, defocus, frost, etc., can be added.&lt;/li&gt;
&lt;li&gt;Label Noise — This type of noise randomly switches the labels of data samples to their closest classes, which increases the mixing between classes.&lt;/li&gt;
&lt;li&gt;Diversity — This controls how diverse the data points are. If we reduce the diversity, the data points become more concentrated in the latent space, which means the models have less information to work with.&lt;/li&gt;
&lt;li&gt;Out-of-distribution (OOD) samples — The LUMA dataset also provides OOD samples, i.e. samples that come from outside the training distribution. Ideally, an ML model should have high uncertainty on these kinds of samples, so that it doesn’t make a confidently wrong decision on a distribution it hasn’t seen before.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp3al00dqljbo0cdi1fqi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp3al00dqljbo0cdi1fqi.png" width="720" height="283"&gt;&lt;/a&gt;&lt;br&gt;Noise injection pipeline in LUMA Dataset
  &lt;/p&gt;

&lt;p&gt;Let’s inject these noises separately. To control the amount of noise, we can modify (or create) a configuration file in the &lt;code&gt;cfg&lt;/code&gt; folder. There are, however, some pre-configured options already available, which we will use. For &lt;strong&gt;sample noise&lt;/strong&gt;, we can make use of the pre-defined configuration file &lt;code&gt;cfg/noise_sample.yml&lt;/code&gt;. In particular, we can pay attention to these lines in the configuration for each modality:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;sample_noise&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;add_noise_train&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;True&lt;/span&gt;
  &lt;span class="na"&gt;add_noise_test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;True&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;They turn the sample noise on or off per modality. The lines immediately below control the noise parameters, and they differ for each modality. For audio they look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;  &lt;span class="na"&gt;sample_noise&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;add_noise_train&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;True&lt;/span&gt;
    &lt;span class="na"&gt;add_noise_test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;True&lt;/span&gt;
    &lt;span class="na"&gt;noisy_data_ratio&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
    &lt;span class="na"&gt;min_snr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
    &lt;span class="na"&gt;max_snr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
    &lt;span class="na"&gt;output_path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;data/noisy_audio&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;where we can control the noisy data ratio (0.0–1.0), minimum and maximum signal-to-noise ratio, and where to save the noisy audio files.&lt;/p&gt;

&lt;p&gt;For text, they look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;  &lt;span class="na"&gt;sample_noise&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;add_noise_train&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;True&lt;/span&gt;
    &lt;span class="na"&gt;add_noise_test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;True&lt;/span&gt;
    &lt;span class="na"&gt;noisy_data_ratio&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
    &lt;span class="na"&gt;noise_config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
      &lt;span class="na"&gt;KeyboardNoise&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;aug_char_min&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
        &lt;span class="na"&gt;aug_char_max&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
        &lt;span class="na"&gt;aug_word_min&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
        &lt;span class="na"&gt;aug_word_max&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt;
      &lt;span class="na"&gt;BackTranslationNoise&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;device&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cuda&lt;/span&gt; &lt;span class="c1"&gt;# cuda or cpu&lt;/span&gt;
    &lt;span class="s"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, you can specify noises from: &lt;code&gt;KeyboardNoise&lt;/code&gt;, &lt;code&gt;BackTranslationNoise&lt;/code&gt;, &lt;code&gt;SpellingNoise&lt;/code&gt;, &lt;code&gt;OCRNoise&lt;/code&gt;, &lt;code&gt;RandomCharNoise&lt;/code&gt;, &lt;code&gt;RandomWordNoise&lt;/code&gt;, &lt;code&gt;AntonymNoise&lt;/code&gt;. The parameters for each noise can be found &lt;a href="https://nlpaug.readthedocs.io/en/latest/"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Finally, for image modality, the configuration looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;    &lt;span class="na"&gt;sample_noise&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;add_noise_train&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;True&lt;/span&gt;
        &lt;span class="na"&gt;add_noise_test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;True&lt;/span&gt;
        &lt;span class="na"&gt;noisy_data_ratio&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
        &lt;span class="na"&gt;output_path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;data/noisy_images.pth&lt;/span&gt;
        &lt;span class="na"&gt;noise_config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;gaussian_noise&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;
          &lt;span class="na"&gt;shot_noise&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;
          &lt;span class="na"&gt;impulse_noise&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can choose noises from: &lt;code&gt;gaussian_noise&lt;/code&gt;, &lt;code&gt;shot_noise&lt;/code&gt;, &lt;code&gt;impulse_noise&lt;/code&gt;,&lt;br&gt;
&lt;code&gt;defocus_blur&lt;/code&gt;, &lt;code&gt;frosted_glass_blur&lt;/code&gt;, &lt;code&gt;motion_blur&lt;/code&gt;, &lt;code&gt;zoom_blur&lt;/code&gt;, &lt;code&gt;snow&lt;/code&gt;, &lt;code&gt;frost&lt;/code&gt;, &lt;code&gt;fog&lt;/code&gt;, &lt;code&gt;brightness&lt;/code&gt;, &lt;code&gt;contrast&lt;/code&gt;, &lt;code&gt;elastic&lt;/code&gt;, &lt;code&gt;pixelate&lt;/code&gt;, &lt;code&gt;jpeg_compression&lt;/code&gt;. For each noise, you can specify a severity parameter, which takes values from 1 to 5. Below you can see examples of the different noise types for images:&lt;/p&gt;


  &lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx8x9g33z6h4x81ndttya.png" width="511" height="359"&gt;Image noise types. Image retrieved from: https://arxiv.org/pdf/1903.12261

  


&lt;p&gt;Then, we can compile the dataset with sample noise with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python compile_dataset.py &lt;span class="nt"&gt;-c&lt;/span&gt; cfg/noise_sample.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can, of course, use any other configuration file.&lt;/p&gt;

&lt;p&gt;To add label noise, one only needs to change the &lt;code&gt;label_switch_prob&lt;/code&gt; parameter for each modality; see &lt;code&gt;cfg/noise_label.yml&lt;/code&gt; for an example. Finally, for diversity, one needs to change the compactness parameter: the higher the compactness value, the less diverse the data will be. An example of this can be seen in &lt;code&gt;cfg/noise_diversity.yml&lt;/code&gt;, and a rough sketch of both settings follows below.&lt;/p&gt;
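
&lt;p&gt;As a hypothetical illustration only (the bundled config files are authoritative for the exact schema and nesting), such entries could look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# hypothetical fragment -- consult cfg/noise_label.yml and cfg/noise_diversity.yml
label_switch_prob: 0.3  # probability of switching a sample's label to a close class
compactness: 2          # higher values concentrate points in latent space (less diversity)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;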

&lt;p&gt;The OOD data for each generation is saved in a separate file specified in the configuration file.&lt;/p&gt;

&lt;h2&gt;
  Loading the Dataset in PyTorch
&lt;/h2&gt;

&lt;p&gt;We can use the class from &lt;code&gt;dataset.py&lt;/code&gt; to load the dataset in PyTorch.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LUMADataset&lt;/span&gt;

&lt;span class="n"&gt;train_audio_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data/audio/datalist_train.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;train_text_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data/text_data_train.tsv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;train_image_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data/image_data_train.pickle&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;train_audio_data_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data/audio&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="n"&gt;train_dataset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LUMADataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train_image_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                            &lt;span class="n"&gt;train_audio_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                            &lt;span class="n"&gt;train_audio_data_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="n"&gt;train_text_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nevertheless, this will return raw text, audio, and images, which may not be very convenient to use in our models. Hence, we would like to process these samples and convert them to more convenient formats before feeding them to our models. For audio, we would like to convert the raw waveforms to mel-spectrograms. For that, we will define a transform as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;torchvision.transforms&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Compose&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;torchaudio.transforms&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MelSpectrogram&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PadCutToSizeAudioTransform&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__call__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;audio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;functional&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pad&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;audio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;audio&lt;/span&gt;

&lt;span class="n"&gt;audio_transform&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Compose&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nc"&gt;MelSpectrogram&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nc"&gt;PadCutToSizeAudioTransform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here we use the &lt;code&gt;MelSpectrogram&lt;/code&gt; transform, followed by a custom transform that pads or cuts the spectrograms to the same size for all samples.&lt;/p&gt;
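
&lt;p&gt;As a quick sanity check (illustrative only; this assumes 16 kHz mono audio, which is the default sample rate of &lt;code&gt;MelSpectrogram&lt;/code&gt;), we can apply the transform to a dummy waveform and inspect the output shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import torch

dummy = torch.randn(1, 16000)  # one second of fake mono audio at 16 kHz
spec = audio_transform(dummy)  # mel-spectrogram, padded/cut along time
print(spec.shape)              # torch.Size([1, 128, 128]) with default n_mels=128
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;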

&lt;p&gt;For the text data, we choose to use averaged BERT embeddings for training. To do that, we extract the text features into a file and then define a custom transform that loads the embeddings instead of the raw text:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;data_generation.text_processing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;extract_deep_text_features&lt;/span&gt;

&lt;span class="nf"&gt;extract_deep_text_features&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train_text_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text_features_train.npy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Text2FeatureTransform&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;features_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;features_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;features&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__call__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;text_transform&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;Text2FeatureTransform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text_features_train.npy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the image modality, we will normalize the images and convert them to tensors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;torchvision.transforms&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ToTensor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Normalize&lt;/span&gt;

&lt;span class="n"&gt;image_transform&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Compose&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="nc"&gt;ToTensor&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="nc"&gt;Normalize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.51&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.49&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.44&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
              &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.27&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.26&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.28&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, we will apply these transforms by passing them to the dataset class:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;train_dataset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LUMADataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train_image_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;train_audio_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;train_audio_data_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;train_text_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="n"&gt;text_transform&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;text_transform&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="n"&gt;audio_transform&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;audio_transform&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="n"&gt;image_transform&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_transform&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can load test and OOD data in a similar fashion. The final data loading procedure will be:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;torchaudio.transforms&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MelSpectrogram&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;torchvision.transforms&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Compose&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Normalize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ToTensor&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;data_generation.text_processing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;extract_deep_text_features&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LUMADataset&lt;/span&gt;

&lt;span class="n"&gt;train_audio_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data/audio/datalist_train.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;train_text_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data/text_data_train.tsv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;train_image_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data/image_data_train.pickle&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;audio_data_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data/audio&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="n"&gt;test_audio_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data/audio/datalist_test.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;test_text_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data/text_data_test.tsv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;test_image_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data/image_data_test.pickle&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="n"&gt;ood_audio_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data/audio/datalist_ood.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;ood_text_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data/text_data_ood.tsv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;ood_image_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data/image_data_ood.pickle&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PadCutToSizeAudioTransform&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__call__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;audio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;functional&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pad&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;audio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;audio&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Text2FeatureTransform&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;features_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;features_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;features&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__call__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;


&lt;span class="nf"&gt;extract_deep_text_features&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train_text_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text_features_train.npy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;extract_deep_text_features&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test_text_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text_features_test.npy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;extract_deep_text_features&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ood_text_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text_features_ood.npy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;image_transform&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Compose&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="nc"&gt;ToTensor&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="nc"&gt;Normalize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.51&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.49&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.44&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
              &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.27&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.26&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.28&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;text_transform_train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Text2FeatureTransform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text_features_train.npy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;text_transform_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Text2FeatureTransform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text_features_test.npy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;text_transform_ood&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Text2FeatureTransform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text_features_ood.npy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;audio_transform&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Compose&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nc"&gt;MelSpectrogram&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nc"&gt;PadCutToSizeAudioTransform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;

&lt;span class="n"&gt;train_dataset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LUMADataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train_image_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;train_audio_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;audio_data_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;train_text_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="n"&gt;text_transform&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;text_transform_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="n"&gt;audio_transform&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;audio_transform&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="n"&gt;image_transform&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_transform&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;test_dataset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LUMADataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test_image_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_audio_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;audio_data_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_text_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                           &lt;span class="n"&gt;text_transform&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;text_transform_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                           &lt;span class="n"&gt;audio_transform&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;audio_transform&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                           &lt;span class="n"&gt;image_transform&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_transform&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;ood_dataset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LUMADataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ood_image_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ood_audio_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;audio_data_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ood_text_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                          &lt;span class="n"&gt;text_transform&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;text_transform_ood&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                          &lt;span class="n"&gt;audio_transform&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;audio_transform&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                          &lt;span class="n"&gt;image_transform&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_transform&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
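
&lt;p&gt;With the datasets in place, we can wrap them in standard PyTorch &lt;code&gt;DataLoader&lt;/code&gt;s (a routine step shown here for completeness; the batch size is an arbitrary choice):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from torch.utils.data import DataLoader

train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=128, shuffle=False)
ood_loader = DataLoader(ood_dataset, batch_size=128, shuffle=False)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;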



&lt;h2&gt;
  Building a Multimodal UQ Model
&lt;/h2&gt;

&lt;p&gt;For building the multimodal UQ model, we are going to use a recent multimodal approach based on evidential learning. &lt;a href="https://arxiv.org/abs/1806.01768"&gt;Evidential deep learning&lt;/a&gt;³ is a method that enhances traditional deep learning models by not only making predictions but also providing a measure of uncertainty about those predictions. It leverages principles from Dempster-Shafer theory, a mathematical framework for evidence-based reasoning. This theory allows the model to combine different pieces of evidence to calculate degrees of belief, rather than a single deterministic output. Instead of just giving a single answer, evidential learning outputs a range of possible answers along with the confidence level in each.&lt;/p&gt;
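
&lt;p&gt;Concretely, an evidential classifier maps the network outputs to non-negative &lt;strong&gt;evidence&lt;/strong&gt; values, which parameterize a Dirichlet distribution over the class probabilities. A minimal sketch of this mapping (following Sensoy et al.; the choice of &lt;code&gt;softplus&lt;/code&gt; as the evidence function is one common option, not prescribed here) looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import torch
import torch.nn.functional as F

def dirichlet_opinion(logits):
    # Map raw outputs to non-negative evidence, then to Dirichlet parameters
    evidence = F.softplus(logits)
    alpha = evidence + 1.0
    strength = alpha.sum(dim=-1, keepdim=True)  # Dirichlet strength S
    belief = evidence / strength                # per-class belief masses
    uncertainty = logits.shape[-1] / strength   # vacuity K / S: high when evidence is low
    probs = alpha / strength                    # expected class probabilities
    return belief, uncertainty, probs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;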

&lt;p&gt;Following the ideas presented by &lt;a href="https://arxiv.org/pdf/2402.16897"&gt;Xu et al. (2024)&lt;/a&gt;, we are going to build evidential networks for each modality and combine them using their proposed conflictive opinion aggregation strategy (RCML⁴). The image classifier, hence, will look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ImageClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Module&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_classes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dropout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ImageClassifier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;image_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Sequential&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Conv2d&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ReLU&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MaxPool2d&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Dropout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dropout&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Conv2d&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ReLU&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MaxPool2d&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Dropout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dropout&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Flatten&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;classifier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_classes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;
        &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;image_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;classifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Similarly, the audio and text classifiers will be:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AudioClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Module&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_classes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dropout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AudioClassifier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;audio_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Sequential&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;  &lt;span class="c1"&gt;# from batch_size x 1 x 128 x 128 spectrogram
&lt;/span&gt;            &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Conv2d&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ReLU&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MaxPool2d&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Dropout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dropout&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Conv2d&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ReLU&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MaxPool2d&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Dropout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dropout&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Conv2d&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ReLU&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MaxPool2d&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Dropout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dropout&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Flatten&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;classifier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;14&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_classes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;
        &lt;span class="n"&gt;audio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;audio_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;classifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TextClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Module&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_classes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dropout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TextClassifier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Sequential&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;768&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ReLU&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Dropout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dropout&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ReLU&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Dropout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dropout&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;classifier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_classes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;
        &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;classifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With these unimodal classifiers in place, we combine them into a multimodal network:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MultimodalClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Module&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_classes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dropout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MultimodalClassifier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;image_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ImageClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_classes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dropout&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;audio_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AudioClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_classes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dropout&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TextClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_classes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dropout&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;image_outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;image_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;audio_outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;audio_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;text_outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;image_logits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;functional&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;softplus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_outputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;audio_logits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;functional&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;softplus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;audio_outputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;text_logits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;functional&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;softplus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text_outputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;logits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;image_logits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;audio_logits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text_logits&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;agg_logits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;image_logits&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;agg_logits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agg_logits&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;logits&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;agg_logits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_logits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;audio_logits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text_logits&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We use the softplus activation here because, in evidential networks, evidence values must be non-negative. The architecture is shown in the diagram below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff6hc85f7n9pewsvp814b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff6hc85f7n9pewsvp814b.png" width="720" height="522"&gt;&lt;/a&gt;&lt;br&gt;The Architecture of the Multimodal Classifier. It takes input from the 3 modalities, and provides the prediction based on the information form those modalities. The Fusion is performed based on the RCML approach discussed above.&lt;br&gt;

  &lt;/p&gt;
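
&lt;p&gt;Before wiring the model into a training loop, it can help to sanity-check the fusion network on dummy tensors. Below is a minimal sketch, assuming 3x32x32 RGB images, 1x128x128 spectrograms, and 768-dimensional text embeddings (the input shapes implied by the layer arithmetic above):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import torch

# Hypothetical smoke test, not part of the original pipeline
model = MultimodalClassifier(num_classes=42)
image = torch.randn(4, 3, 32, 32)    # batch of 4 RGB images
audio = torch.randn(4, 1, 128, 128)  # batch of 4 spectrograms
text = torch.randn(4, 768)           # batch of 4 text embeddings

agg, (img_e, aud_e, txt_e) = model((image, audio, text))
print(agg.shape)                 # torch.Size([4, 42]): fused evidences
print(bool((img_e &gt;= 0).all()))  # True: softplus keeps evidences non-negative
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;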

&lt;p&gt;To simplify training, we will use the &lt;a href="https://lightning.ai/docs/pytorch/stable/"&gt;PyTorch Lightning&lt;/a&gt; framework. For that, we wrap our model in a Lightning module:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pytorch_lightning&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pl&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;torchmetrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Accuracy&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;baselines.utils&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AvgTrustedLoss&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DirichletModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LightningModule&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_classes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dropout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DirichletModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;num_classes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;num_classes&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_classes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;num_classes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;monte_carlo&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dropout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dropout&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dirichlet&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;train_acc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Accuracy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;multiclass&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_classes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;num_classes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;val_acc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Accuracy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;multiclass&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_classes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;num_classes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;test_acc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Accuracy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;multiclass&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_classes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;num_classes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;criterion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AvgTrustedLoss&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_views&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;aleatoric_uncertainties&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;epistemic_uncertainties&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;training_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_idx&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;shared_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;train_loss&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;acc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;train_acc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;train_acc_step&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;acc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prog_bar&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;loss&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;shared_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;
        &lt;span class="n"&gt;output_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;self&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;loss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;criterion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validation_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_idx&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;shared_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;val_acc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;alphas&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="n"&gt;probs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;alphas&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;alphas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keepdim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;entropy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;num_classes&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;alphas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;alpha_0&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;alphas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keepdim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;aleatoric_uncertainty&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;probs&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;digamma&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alphas&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;digamma&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha_0&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;entropy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;aleatoric_uncertainty&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_idx&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;shared_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test_acc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;alphas&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="n"&gt;probs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;alphas&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;alphas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keepdim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;entropy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;num_classes&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;alphas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;alpha_0&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;alphas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keepdim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;aleatoric_uncertainty&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;probs&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;digamma&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alphas&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;digamma&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha_0&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;entropy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;aleatoric_uncertainty&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;training_epoch_end&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;train_acc&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;train_acc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;prog_bar&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;criterion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;annealing_step&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validation_epoch_end&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;val_acc&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;val_acc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;prog_bar&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;val_loss&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;detach&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;cpu&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;numpy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt; &lt;span class="n"&gt;prog_bar&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;val_entropy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;prog_bar&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;val_sigma&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;prog_bar&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_epoch_end&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;test_acc&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;test_acc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;prog_bar&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;test_entropy_epi&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;test_ale&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;aleatoric_uncertainties&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;detach&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;cpu&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;numpy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;epistemic_uncertainties&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;detach&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;cpu&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;numpy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;configure_optimizers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;optimizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;optim&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Adam&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;lr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1e-2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;scheduler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;optim&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lr_scheduler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ReduceLROnPlateau&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;min&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;factor&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.33&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;patience&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                                               &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;optimizer&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;lr_scheduler&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;scheduler&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;monitor&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;val_loss&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This module produces the class predictions and also computes the aleatoric and epistemic uncertainties.&lt;/p&gt;
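
&lt;p&gt;For reference, the uncertainty computations in the validation and test steps can be factored into a small standalone helper. This is just a sketch restating the formulas used above: the epistemic term is the Dirichlet vacuity (the number of classes divided by the Dirichlet strength), and the aleatoric term is the expected entropy of the predictive distribution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import torch

def dirichlet_uncertainties(evidences, num_classes):
    # evidences: non-negative tensor of shape (batch, num_classes)
    alphas = evidences + 1                      # Dirichlet parameters
    alpha_0 = alphas.sum(dim=-1, keepdim=True)  # Dirichlet strength S
    probs = alphas / alpha_0                    # expected class probabilities
    # epistemic uncertainty (vacuity): large when the total evidence is small
    epistemic = num_classes / alpha_0.squeeze(-1)
    # aleatoric uncertainty: expected entropy under the Dirichlet
    aleatoric = -torch.sum(
        probs * (torch.digamma(alphas + 1) - torch.digamma(alpha_0 + 1)),
        dim=-1)
    return aleatoric, epistemic
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;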

&lt;h2&gt;
  
  
  Training the Multimodal Model
&lt;/h2&gt;

&lt;p&gt;For training, we only need to define the dataloaders and hand the model to the PyTorch Lightning Trainer class.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;batch_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;128&lt;/span&gt;
&lt;span class="n"&gt;classes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;
&lt;span class="n"&gt;dropout_p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;
&lt;span class="n"&gt;train_dataset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;val_dataset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;random_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train_dataset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.8&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train_dataset&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
                                                                           &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train_dataset&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                                                                               &lt;span class="mf"&gt;0.8&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train_dataset&lt;/span&gt;&lt;span class="p"&gt;))])&lt;/span&gt;

&lt;span class="n"&gt;train_loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train_dataset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shuffle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;val_loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val_dataset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shuffle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;test_loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test_dataset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shuffle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ood_loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ood_dataset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shuffle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Now we can use the loaders to train a model
&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DirichletModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MultimodalClassifier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;classes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dropout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dropout_p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;trainer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Trainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                     &lt;span class="n"&gt;gpus&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_available&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                     &lt;span class="n"&gt;callbacks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;callbacks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;EarlyStopping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;monitor&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;val_loss&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;patience&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;min&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                                &lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;callbacks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ModelCheckpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;monitor&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;val_loss&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;min&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;save_last&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;train_loader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;val_loader&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Testing model&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_loader&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Test results:&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;callback_metrics&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;aleatoric_uncertainties&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;aleatoric_uncertainties&lt;/span&gt;
&lt;span class="n"&gt;epistemic_uncertainties&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;epistemic_uncertainties&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Testing OOD&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ood_loader&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;aleatoric_uncertainties_ood&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;aleatoric_uncertainties&lt;/span&gt;
&lt;span class="n"&gt;epistemic_uncertainties_ood&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;epistemic_uncertainties&lt;/span&gt;
&lt;span class="n"&gt;auc_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;roc_auc_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;concatenate&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;epistemic_uncertainties&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ones&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;epistemic_uncertainties_ood&lt;/span&gt;&lt;span class="p"&gt;))]),&lt;/span&gt;
    &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;concatenate&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;epistemic_uncertainties&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;epistemic_uncertainties_ood&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;AUC score: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;auc_score&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here we log the classification accuracy, the average uncertainty values, and the AUC score for OOD detection, i.e. how well the epistemic uncertainty separates in-distribution test samples from out-of-distribution ones.&lt;/p&gt;
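&lt;p&gt;For reference, these uncertainty values can be derived from the predicted Dirichlet parameters. Below is a minimal, self-contained sketch of the uncertainty mass from Sensoy et al. [3]; the &lt;code&gt;DirichletModel&lt;/code&gt; in this tutorial may use a slightly different decomposition, so treat it as illustrative rather than the exact implementation:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import torch

# Sketch of evidential uncertainty (Sensoy et al. [3]); illustrative only.
def dirichlet_uncertainty(alpha):
    """alpha: Dirichlet concentration parameters, shape (batch, num_classes)."""
    strength = alpha.sum(dim=-1, keepdim=True)        # S = sum of concentrations
    probs = alpha / strength                          # expected class probabilities
    vacuity = alpha.shape[-1] / strength.squeeze(-1)  # uncertainty mass u = K / S
    return probs, vacuity

probs, u = dirichlet_uncertainty(torch.tensor([[1.0, 1.0, 1.0],
                                               [20.0, 1.0, 1.0]]))
print(u)  # high (1.0) for the flat first row, low for the confident second row
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;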

&lt;p&gt;To train on the noisy versions of the dataset, we only need to point the data paths to the noisy data, for example as sketched below.&lt;/p&gt;
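&lt;p&gt;A minimal sketch with placeholder paths (the actual directory names depend on where you generated the noisy compilation with LUMA's tooling):&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from pathlib import Path

# Placeholder layout: assume the clean and noisy compilations live side by side.
USE_NOISY = True
data_root = Path('data/noisy' if USE_NOISY else 'data/clean')
print('Loading modalities from', data_root)
# pass data_root to the dataset construction code shown earlier
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;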

&lt;h2&gt;
  
  
  Training Results
&lt;/h2&gt;

&lt;p&gt;On the &lt;strong&gt;clean data&lt;/strong&gt; (without injecting additional noise), we get the following results:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4pi6mojlakyiwrjy66yq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4pi6mojlakyiwrjy66yq.png" alt="UQ Results" width="696" height="289"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As we can see, adding noise effectively raises the uncertainty metrics. Hence, an interesting research direction is to vary the noise levels and observe how the uncertainties change. It is essential not only to build DL models that are robust to such noise, but also to find UQ methods that can reliably indicate when the models are unsure about their predictions.&lt;/p&gt;
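&lt;p&gt;To make the OOD-detection metric concrete, here is a toy, self-contained illustration (synthetic numbers, not LUMA results): the further the noise pushes the OOD uncertainty distribution away from the in-distribution one, the higher the AUC, computed exactly as in the training script above.&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic uncertainties: OOD samples shifted upwards by increasing amounts.
for shift in [0.0, 0.5, 1.0, 2.0]:
    id_unc = rng.normal(1.0, 0.5, size=1000)           # in-distribution
    ood_unc = rng.normal(1.0 + shift, 0.5, size=1000)  # out-of-distribution
    labels = np.concatenate([np.zeros(1000), np.ones(1000)])
    scores = np.concatenate([id_unc, ood_unc])
    print(f'shift={shift:.1f}  AUC={roc_auc_score(labels, scores):.3f}')
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;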

&lt;h2&gt;
  
  
  Acknowledgements
&lt;/h2&gt;

&lt;p&gt;This blog post is based on the code and dataset of &lt;a href="https://github.com/bezirganyan/LUMA"&gt;LUMA&lt;/a&gt;, developed within the scope of my PhD thesis at Aix-Marseille University (AMU), CNRS, LIS. I would like to thank my PhD supervisors and paper co-authors &lt;strong&gt;Sana Sellami&lt;/strong&gt; (AMU, CNRS, LIS), &lt;strong&gt;Laure Berti-Équille&lt;/strong&gt; (IRD, ESPACE-DEV), and &lt;strong&gt;Sébastien Fournier&lt;/strong&gt; (AMU, CNRS, LIS).&lt;/p&gt;

&lt;p&gt;If you liked this post, please star &lt;a href="https://github.com/bezirganyan/LUMA"&gt;LUMA on GitHub&lt;/a&gt;. We will be happy to hear your thoughts, questions, or suggestions in the discussion below.&lt;/p&gt;




&lt;p&gt;[1] Bezirganyan, G., Sellami, S., Berti-Équille, L., &amp;amp; Fournier, S. (2024). LUMA: A Benchmark Dataset for Learning from Uncertain and Multimodal Data. arXiv:2406.09864. &lt;a href="http://arxiv.org/abs/2406.09864"&gt;http://arxiv.org/abs/2406.09864&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[2] Devlin, J., Chang, M., Lee, K., &amp;amp; Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. North American Chapter of the Association for Computational Linguistics.&lt;/p&gt;

&lt;p&gt;[3] Sensoy, M., Kandemir, M., &amp;amp; Kaplan, L.M. (2018). Evidential Deep Learning to Quantify Classification Uncertainty. arXiv:1806.01768.&lt;/p&gt;

&lt;p&gt;[4] Xu, C., Si, J., Guan, Z., Zhao, W., Wu, Y., &amp;amp; Gao, X. (2024). Reliable Conflictive Multi-View Learning. AAAI Conference on Artificial Intelligence.&lt;/p&gt;

</description>
      <category>deeplearning</category>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>python</category>
    </item>
    <item>
      <title>Editor Wars: VIM as a perfect Python IDE</title>
      <dc:creator>Grigor Bezirganyan</dc:creator>
      <pubDate>Sun, 08 Sep 2019 17:12:46 +0000</pubDate>
      <link>https://dev.to/bezirganyan/editor-wars-vim-as-a-perfect-python-ide-19ne</link>
      <guid>https://dev.to/bezirganyan/editor-wars-vim-as-a-perfect-python-ide-19ne</guid>
      <description>&lt;h2&gt;
  
  
  Backstory
&lt;/h2&gt;

&lt;p&gt;Nowadays, there are many text editors and IDEs available to developers, and choosing one is often a hard decision. Since most of them offer quite similar interfaces and functionality, it doesn’t really matter which editor a beginner uses. Nevertheless, for an intermediate or advanced developer, choosing the right editor can give a significant productivity boost.&lt;/p&gt;

&lt;p&gt;There are many modern and powerful editors and IDEs, such as the JetBrains IDEs, Visual Studio Code, Atom, etc. Nevertheless, I would like to concentrate on a relatively old text editor that can be as powerful as (or more powerful than :) ) the others, while giving you a better typing experience. If you haven’t guessed it, I am talking about &lt;a href="https://www.vim.org/"&gt;vim&lt;/a&gt;: a text editor released in 1991 that is still very popular. Although vim isn’t very beginner-friendly and isn’t as powerful out of the box as other IDEs, with several plug-ins and some configuration it can give you better performance than your standard IDE.&lt;/p&gt;

&lt;p&gt;In my previous &lt;a href="https://dev.to/bezirganyan/who-said-that-vim-cannot-compete-with-ides-51k4"&gt;post&lt;/a&gt; I talked about configuring VIM to compete with other IDEs. It was more C/C++ oriented, as at that time my main coding language was C++. A year ago, I changed my workplace and Python became my main programming language. Since almost everyone at my workplace was using PyCharm for Python development, I decided to give it a shot. I have to admit, PyCharm is a really good IDE and I enjoyed its smart functionality. Nevertheless, I missed my typing experience in VIM. Installing a vim layout for PyCharm didn’t really help either, so I decided to spend a day configuring my vim to have all the PyCharm functionality that I really needed.&lt;/p&gt;

&lt;p&gt;Now let’s go through some of the features I managed to bring to VIM that made me stop missing the PyCharm IDE.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note that the plugins mentioned here are not Python-specific, but since I have mostly tested them on Python, I cannot say for sure how they will work with other languages. If you have tested them, it would be interesting to hear about your experience in the comment section.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Configuration
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Code Completion
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--PDQD95cC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://thepracticaldev.s3.amazonaws.com/i/3o3iyl7z7eb2scudn6oe.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--PDQD95cC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://thepracticaldev.s3.amazonaws.com/i/3o3iyl7z7eb2scudn6oe.gif" alt="Code Completion" width="800" height="622"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In my previous post I talked about &lt;a href="https://github.com/ycm-core/YouCompleteMe"&gt;YouCompleteMe&lt;/a&gt;, which is an awesome open-source plug-in that offers very good code suggestions for many languages. It was one of the best plug-ins I found while I was developing in C++; nevertheless, for Python I found a better alternative. &lt;a href="https://kite.com/"&gt;Kite&lt;/a&gt; claims to use Machine Learning to offer useful code completion. Since Kite is closed-source, we cannot really be sure whether they actually use Machine Learning, but I can guarantee that you will like its code suggestions.&lt;/p&gt;

&lt;p&gt;One of the disadvantages of Kite is that it is closed-source; if you are a sworn open-source person, YouCompleteMe is still a pretty good option.&lt;/p&gt;

&lt;h3&gt;
  
  
  Error Detection
&lt;/h3&gt;

&lt;p&gt;For syntax checking and error detection I use the &lt;a href="https://github.com/dense-analysis/ale"&gt;ALE&lt;/a&gt; (Asynchronous Lint Engine) plug-in, which checks your code while you type. As the project describes it:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;ALE makes use of NeoVim and Vim 8 job control functions and timers to run linters on the contents of text buffers and return errors as text is changed in Vim. This allows for displaying warnings and errors in files being edited in Vim before files have been saved back to a filesystem&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Furthermore, after some configuration, ALE can check whether your Python code is PEP 8 compliant and fix it if it is not.&lt;/p&gt;

&lt;h3&gt;
  
  
  Navigation
&lt;/h3&gt;

&lt;p&gt;Fast and effective navigation across files is essential for fast development. One of the best-known navigation plug-ins for vim is &lt;a href="https://github.com/scrooloose/nerdtree"&gt;NERD Tree&lt;/a&gt;. Nevertheless, I found &lt;a href="https://github.com/tpope/vim-vinegar"&gt;vim-vinegar&lt;/a&gt; to be a better alternative: it offers a cleaner interface and better shortcuts for navigation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kLhZL6PP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://thepracticaldev.s3.amazonaws.com/i/yrie8a33ljtzkty3828j.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kLhZL6PP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://thepracticaldev.s3.amazonaws.com/i/yrie8a33ljtzkty3828j.gif" alt="vinegar" width="800" height="560"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/kien/ctrlp.vim"&gt;crtl-p&lt;/a&gt; is another useful plug-in for easy navigation in vim. It is a&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Full path fuzzy &lt;strong&gt;file&lt;/strong&gt;, &lt;strong&gt;buffer&lt;/strong&gt;, &lt;strong&gt;mru&lt;/strong&gt;, &lt;strong&gt;tag&lt;/strong&gt;, &lt;strong&gt;...&lt;/strong&gt; finder for Vim.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Written in pure Vimscript for MacVim, gVim and Vim 7.0+.&lt;/li&gt;
&lt;li&gt;Full support for Vim's regexp as search patterns.&lt;/li&gt;
&lt;li&gt;Built-in Most Recently Used (MRU) files monitoring.&lt;/li&gt;
&lt;li&gt;Built-in project's root finder.&lt;/li&gt;
&lt;li&gt;Open multiple files at once.&lt;/li&gt;
&lt;li&gt;Create new files and directories.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/kien/ctrlp.vim/tree/extensions"&gt;Extensible&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GnFFZlee--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://thepracticaldev.s3.amazonaws.com/i/vh8dzz7vtpx3kg9188ei.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GnFFZlee--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://thepracticaldev.s3.amazonaws.com/i/vh8dzz7vtpx3kg9188ei.gif" alt="ctrl-p" width="800" height="564"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  tmux
&lt;/h3&gt;

&lt;p&gt;Although &lt;a href="https://github.com/tmux/tmux"&gt;tmux&lt;/a&gt; is not a vim plug-in, it really improves the experience of coding in vim. Not only can I ssh into my machine anytime, from anywhere in the world, and get my session and layout back, it also takes my multitasking ability to a whole new level.&lt;br&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RirfZ_9m--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://thepracticaldev.s3.amazonaws.com/i/rzbh78n9brricx46jaq6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RirfZ_9m--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://thepracticaldev.s3.amazonaws.com/i/rzbh78n9brricx46jaq6.png" alt="tmux" width="800" height="421"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Other plugins?
&lt;/h2&gt;

&lt;p&gt;As I mentioned in my previous post, there is an awesome website full of various vim plugins, called vimawesome.com. You can find many more plugins there and bring your vim much closer to an actual IDE.&lt;/p&gt;

&lt;p&gt;You can comment about your vim configuration below; as always, I am waiting for your constructive criticism in the comment section.&lt;/p&gt;

</description>
      <category>vim</category>
      <category>python</category>
      <category>ide</category>
      <category>pycharm</category>
    </item>
    <item>
      <title>Who said that VIM cannot compete with IDEs?</title>
      <dc:creator>Grigor Bezirganyan</dc:creator>
      <pubDate>Fri, 29 Dec 2017 22:35:49 +0000</pubDate>
      <link>https://dev.to/bezirganyan/who-said-that-vim-cannot-compete-with-ides-51k4</link>
      <guid>https://dev.to/bezirganyan/who-said-that-vim-cannot-compete-with-ides-51k4</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqlp7p63apc110gez8jp1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqlp7p63apc110gez8jp1.jpg" alt="main"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every developer has a favorite text editor or IDE, and as long as developers exist, the Text Editor wars are not going anywhere. The two main forces of the so-called "Editor War" are Vim and Emacs. They are both great and can do a lot, but... they are not IDEs.&lt;/p&gt;

&lt;p&gt;I've heard of many developers who are in love with vim or Emacs but use IDEs like CLion, Code::Blocks, Visual Studio, etc., because of certain features.&lt;/p&gt;

&lt;p&gt;But wait! What if I told you that you can turn your vim or Emacs into an IDE? Yes, it will work, and in the majority of cases it will work better and faster than your traditional IDE; at least for C/C++ it does.&lt;/p&gt;

&lt;h2&gt;
  
  
  Configuring vim to become an IDE
&lt;/h2&gt;

&lt;p&gt;Well, the main features of an IDE are&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Code completion&lt;/li&gt;
&lt;li&gt;Error detection&lt;/li&gt;
&lt;li&gt;Debugging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Of course there are many more features that IDEs have, and it is possible to configure vim to provide most of them, but in this post I will only cover these three.&lt;/p&gt;

&lt;p&gt;So let's look at each feature and try to make it work in vim.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note:&lt;/em&gt; The configuration is for C/C++, but with light modifications and the right plugins it is possible to make these features work for your desired language.&lt;/p&gt;

&lt;h3&gt;
  
  
  Code completion
&lt;/h3&gt;

&lt;p&gt;Vim already has built-in code completion: all you have to do is type the starting letters and press &lt;code&gt;ctrl + n&lt;/code&gt;, and it will bring up suggestions from your code. But that's not enough. All IDEs have smarter code completion, which brings suggestions not only from the code you've already written, but also from the libraries you use.&lt;/p&gt;

&lt;p&gt;You can achieve such feature with a vim plugin called &lt;a href="https://valloric.github.io/YouCompleteMe/" rel="noopener noreferrer"&gt;&lt;strong&gt;YouCompleteMe&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;All you have to do is install the plugin and configure it for your desired language and libraries.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fobf8peh7d697oiw0z7pz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fobf8peh7d697oiw0z7pz.png" alt="Vim code completion"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Error Detection
&lt;/h3&gt;

&lt;p&gt;One of the things I like in Vim is that when you run the &lt;code&gt;make&lt;/code&gt; command from it, in case of errors it brings the cursor to the line where the first compile error occurred. However, IDEs do a lot more than that: they detect errors right after you type them.&lt;/p&gt;

&lt;p&gt;For vim, the same functionality is provided by the &lt;a href="https://valloric.github.io/YouCompleteMe/" rel="noopener noreferrer"&gt;&lt;strong&gt;YouCompleteMe&lt;/strong&gt;&lt;/a&gt; plugin, which we've already discussed above. When you press the &lt;code&gt;Esc&lt;/code&gt; key after typing, it will highlight the parts of the code causing compile errors.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgi2y2hn7dmp1ghafel64.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgi2y2hn7dmp1ghafel64.png" alt="Vim error"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Debugging
&lt;/h3&gt;

&lt;p&gt;When I was just starting to use Vim, to debug my programs I had to open another terminal, run &lt;code&gt;gdb&lt;/code&gt;, type &lt;code&gt;ref&lt;/code&gt;, and try to use its not-so-user-friendly interface, without syntax highlighting. When writing commands I couldn't even use my arrow keys. Well, that sucked.&lt;/p&gt;

&lt;p&gt;Then I discovered &lt;a href="https://github.com/vim-scripts/Conque-GDB" rel="noopener noreferrer"&gt;&lt;strong&gt;Conque-GDB&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Conque-GDB is a vim script that integrates GDB into your vim. Cool, right? What's even cooler is that it shows the breakpoints or the segmentation-fault location right on your code in vim. You can even use your arrow keys to move your cursor!&lt;/p&gt;

&lt;p&gt;I will be honest: I hated GDB even after using this plugin, until one day, during a programming olympiad, I had to use Visual Studio's debugger. Well, I couldn't even find where my program threw a SegFault. So what I did (I don't even know if I had the right to do so) was boot Linux from my USB drive and use Conque-GDB.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzqqwcl0ddcxhp0eskmr5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzqqwcl0ddcxhp0eskmr5.png" alt="Vim gdb"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Other plugins?
&lt;/h3&gt;

&lt;p&gt;There is an awesome website full of various vim plugins, called &lt;a href="//vimawesome.com"&gt;&lt;strong&gt;vimawesome.com&lt;/strong&gt;&lt;/a&gt;. You can find many more plugins there and bring your vim much closer to an actual IDE.&lt;/p&gt;

&lt;p&gt;You can comment about your vim configuration below. By the way, this is my first blog post, so I am waiting for your constructive criticism in the comment section.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://bezirganyan.github.io/blog/vim_ide/" rel="noopener noreferrer"&gt;&lt;em&gt;source&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>vim</category>
      <category>ide</category>
      <category>c</category>
      <category>texteditor</category>
    </item>
  </channel>
</rss>
