dacbreakpoint

Posted on May 22, 2022

Set up your private LanguageTool server

#privacy #languagetool #vscode

General

This year I started a new job at my first English-speaking company. As a senior software engineer, I have to write many emails and complex task and bug descriptions, and create presentations about various topics. Therefore, it is great to have an assistant that improves my grammar and spelling.

While looking at some great applications, I stumbled across LanguageTool, an open-source grammar, style and spell checker. To increase your privacy and remain the owner of your data, you can host your own instance.

This blog post will guide you through the setup and configuration.

LanguageTool and Grammarly

From time to time, I also write emails that contain sensitive information, and I became aware of how often these tools process this critical information. I got the uncomfortable feeling that this data could be intercepted or used in ways I did not want.

When using closed-source software, you have to trust the company, and as soon as they store the data on their servers, the data is out of your hands. When writing a blog post about LanguageTool, I have to be fair and mention Grammarly. From my personal experience, it is much more powerful, and the premium features "readability" and "fluency" are really good. The Privacy Policy from Grammarly says that they don't sell my "information".

LanguageTool comes with its own embedded HTTP server, so text can be sent to your own instance of LanguageTool. It is also possible to write your own application and use the HTTP API.
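As a rough illustration of what "your own application" could look like, here is a minimal Python sketch that posts text to the /v2/check endpoint using only the standard library. The function names build_check_request and check_text are made up for this example, and the host name is the placeholder used later in this post, so replace it with your own.

```python
import json
import urllib.parse
import urllib.request


def build_check_request(base_url: str, text: str, language: str = "en-US") -> urllib.request.Request:
    """Build a POST request for the /v2/check endpoint of a LanguageTool server."""
    # The endpoint expects form-encoded fields, the same ones a curl --data call sends.
    body = urllib.parse.urlencode({"language": language, "text": text}).encode("utf-8")
    url = base_url.rstrip("/") + "/v2/check"
    return urllib.request.Request(url, data=body, method="POST")


def check_text(base_url: str, text: str, language: str = "en-US") -> dict:
    """Send the text to the server and return the decoded JSON response."""
    request = build_check_request(base_url, text, language)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling check_text("https://languagetool.example.com", "a simple test") should return the parsed JSON answer of the server, assuming the instance is reachable under that name.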

Installation

Docker Compose

I like to host every application within a dedicated docker-compose file. When doing so, the HTTP reverse proxy traefik integrates very well with docker and enables dynamic configuration. It is worth taking a look!

Prerequisites

Infrastructure with docker, docker-compose and traefik
Approximately 20 GB of free storage
Prepare your n-gram datasets
- Download the n-gram datasets onto your server. I am using the de and en datasets for my daily work, which consume 18 GB in total! n-gram datasets are used to detect errors with words that are often confused. For more information, see here and here.

Ensure that you have the following folder structure:

├── languagetool # base folder
│   ├── docker-compose.yml # configuration file
│   └── ngrams
│       ├── de # downloaded n-gram data for de
│       │   ├── 1grams
│       │   ├── 2grams
│       │   └── 3grams
│       └── en
│           ├── 1grams
│           ├── 2grams
│           └── 3grams
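If you want to double-check the layout before starting the container, a small script can do it. The following is only a sketch: the helper name missing_ngram_folders is made up, and it assumes the de and en datasets plus the 1grams, 2grams and 3grams folders shown above.

```python
from pathlib import Path

# Sub-folders expected inside each language folder, as in the tree above.
EXPECTED_SUBFOLDERS = ("1grams", "2grams", "3grams")


def missing_ngram_folders(base: Path, languages=("de", "en")) -> list:
    """Return the expected n-gram folders that do not exist below base/ngrams."""
    missing = []
    for language in languages:
        for name in EXPECTED_SUBFOLDERS:
            folder = base / "ngrams" / language / name
            if not folder.is_dir():
                missing.append(folder)
    return missing
```

Running missing_ngram_folders(Path("languagetool")) from the parent directory returns an empty list when every folder is in place, and otherwise the paths that still need to be created or extracted.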

docker-compose.yml

Below you can see my personal docker-compose.yml, which you can use as a reference. For a more detailed description, see the description of the image erikvl87/languagetool.

version: "3"
services:
  languagetool:
    container_name: languagetool
    image: erikvl87/languagetool
    restart: unless-stopped
    environment:
      - langtool_languageModel=/ngrams  # OPTIONAL: Using ngrams data
      - Java_Xms=512m  # OPTIONAL: Setting a minimal Java heap size of 512 MiB
      - Java_Xmx=1g  # OPTIONAL: Setting a maximum Java heap size of 1 GiB
    volumes:
      - ./ngrams:/ngrams
    ports:
      - 8010
    labels:
      - traefik.enable=true
      - traefik.http.services.languagetool.loadbalancer.server.port=8010
      - traefik.http.routers.languagetool-http.rule=Host(`languagetool.example.com`)
      - traefik.http.routers.languagetool-http.entrypoints=web
      - traefik.http.routers.languagetool-http.middlewares=resecure@file
      - traefik.http.routers.languagetool-https.rule=Host(`languagetool.example.com`)
      - traefik.http.routers.languagetool-https.entrypoints=websecure
      - traefik.http.routers.languagetool-https.tls=true
      - traefik.http.routers.languagetool-https.tls.certresolver=le
      - traefik.docker.network=web
    networks:
      - web

networks:
  web:
    external: true

Once everything is up and running, you can check the container logs or send a test request like this:

curl --data "language=en-US&text=a simple test" https://languagetool.example.com/v2/check
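The server answers with JSON that contains a list of matches. To make such an answer easier to read, something like the following Python sketch could be used. The function name summarize_matches is made up, and the sample dictionary is hand-written in the shape of a response for illustration; it is not real output from my server.

```python
def summarize_matches(response: dict) -> list:
    """Turn the 'matches' list of a /v2/check response into readable lines."""
    lines = []
    for match in response.get("matches", []):
        # Each replacement is an object with a "value" field.
        suggestions = [r["value"] for r in match.get("replacements", [])]
        lines.append(
            f"offset {match['offset']}, length {match['length']}: "
            f"{match['message']} -> {', '.join(suggestions) or 'no suggestion'}"
        )
    return lines


# Hand-written sample in the shape of a response; not real server output.
sample = {
    "matches": [
        {
            "message": "This sentence does not start with an uppercase letter.",
            "offset": 0,
            "length": 1,
            "replacements": [{"value": "A"}],
        }
    ]
}

for line in summarize_matches(sample):
    print(line)
```

Only the fields message, offset, length and replacements are read here; a real response carries more details per match, such as the rule that triggered it.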

Using your own LanguageTool server

LanguageTool is well integrated into popular browsers and various tools. When editing markdown files, I mainly use it within my primary browser, Google Chrome, and within my source-code editor, vscode.

Browser-Plugin

When using the LanguageTool browser plugin, you need to navigate to the options menu and set up your own LanguageTool server there.

Visual Studio Code Extension

If you want LanguageTool support for your markdown files within vscode, you can use the LTeX extension.

It provides issue highlighting, supports comment checking for many programming languages (opt-in), and offers replacement suggestions via the quick-fixes dialog.

To use your own LanguageTool server, set the root URL of your server, for example https://languagetool.example.com, in the ltex.languageToolHttpServerUri setting.
