DEV Community

dacbreakpoint

Set up your private LanguageTool server

General

This year I started a new job at my first English-speaking company. As a senior software engineer, I have to write many emails and complex task and bug descriptions, and I create presentations on various topics. An assistant that improves my grammar and spelling is therefore a great help.

While looking at some great applications, I stumbled across LanguageTool, an open-source grammar, style and spell checker. To increase your privacy and remain the owner of your data, you can host your own instance.

This blog post will guide you through the setup and configuration.

LanguageTool and Grammarly

From time to time, I also write emails that contain sensitive information, and I became aware of how often these tools process such critical information. I got the uncomfortable feeling that this data could be intercepted or used in ways I did not want.

When using closed-source software, you have to trust the company: as soon as it stores the data on its servers, the data is out of your hands. Since I am writing a blog post about LanguageTool, I have to be fair and mention Grammarly. In my personal experience, Grammarly is much more powerful, and its premium features "readability" and "fluency" are really good. Grammarly's privacy policy states that they do not sell my "information".

LanguageTool comes with its own embedded HTTP server, so you can send text to your own LanguageTool instance. You can also write your own application that uses the HTTP API.
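As a sketch of what such an application could look like, here is a minimal client that uses only the Python standard library. It assumes the POST /v2/check endpoint (the same one the curl test request in this post uses) and the response fields matches, offset, length, message and replacements. The helper names build_check_request, check_text and summarize are hypothetical, and languagetool.example.com is only the placeholder domain from the docker-compose.yml below.

```python
import json
import urllib.parse
import urllib.request

# Placeholder domain from the docker-compose.yml in this post; use your own.
BASE_URL = "https://languagetool.example.com"


def build_check_request(text, language="en-US", base_url=BASE_URL):
    """Build a form-encoded POST request for the /v2/check endpoint."""
    body = urllib.parse.urlencode({"language": language, "text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/v2/check",
        data=body,  # a request with a body is sent as POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def check_text(text, language="en-US", base_url=BASE_URL, timeout=10.0):
    """Send text to the server and return the decoded JSON response."""
    request = build_check_request(text, language, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def summarize(result, text):
    """Turn the 'matches' of a /v2/check response into one line per issue."""
    lines = []
    for match in result.get("matches", []):
        start, length = match["offset"], match["length"]
        # Assumption: offsets line up with Python string indices, which holds
        # for text without characters outside the Basic Multilingual Plane.
        flagged = text[start:start + length]
        # Keep at most three suggested replacements per issue
        suggestions = [r["value"] for r in match.get("replacements", [])][:3]
        lines.append(f"{flagged!r}: {match['message']} -> {suggestions}")
    return lines


# Example usage against a running server (needs network access):
# sample = "a simple test"
# for line in summarize(check_text(sample), sample):
#     print(line)
```

Splitting request building from sending keeps the network call in one small function, so the rest can be reused, for example in a pre-commit hook that checks documentation files.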

Installation

Docker Compose

I like to host every application in a dedicated docker-compose file. With this approach, the HTTP reverse proxy Traefik integrates very well with Docker and lets you configure routing dynamically through container labels; it is worth taking a look!

Prerequisites

  • A host running Docker, docker-compose and Traefik
  • Approximately 20 GB of free storage
  • Prepare your n-gram datasets
    • Download the n-gram datasets onto your server. I use the de and en datasets for my daily work, which together consume 18 GB! LanguageTool uses n-gram data to detect errors with words that are easily confused, such as "their" and "there". For more information, see here and here.

Ensure that you have the following folder structure:

```
├── languagetool # base folder
│   ├── docker-compose.yml # configuration file
│   └── ngrams
│       ├── de # downloaded n-gram data for de
│       │   ├── 1grams
│       │   ├── 2grams
│       │   └── 3grams
│       └── en
│           ├── 1grams
│           ├── 2grams
│           └── 3grams
```
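If you want to double-check the layout before starting the container, a small Python sketch like the following lists whatever is missing. The helper name missing_paths is hypothetical, and the language list assumes the de and en datasets used here.

```python
from pathlib import Path

# Datasets and n-gram orders from the folder structure above; adjust
# LANGUAGES if you downloaded other datasets.
LANGUAGES = ("de", "en")
ORDERS = ("1grams", "2grams", "3grams")


def missing_paths(base, languages=LANGUAGES):
    """Return the expected files and folders that do not exist below base."""
    base = Path(base)
    expected = [base / "docker-compose.yml"]
    expected += [base / "ngrams" / lang / order for lang in languages for order in ORDERS]
    return [path for path in expected if not path.exists()]


# Example usage, run from the parent directory of the languagetool folder:
# for path in missing_paths("languagetool"):
#     print("missing:", path)
```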

docker-compose.yml

Below is my personal docker-compose.yml, which you can use as a reference. For a more detailed description of the available options, see the documentation of the erikvl87/languagetool image.

```yaml
version: "3"
services:
  languagetool:
    container_name: languagetool
    image: erikvl87/languagetool
    restart: unless-stopped
    environment:
      - langtool_languageModel=/ngrams  # OPTIONAL: Use the n-gram data
      - Java_Xms=512m  # OPTIONAL: Set the minimum Java heap size to 512 MiB
      - Java_Xmx=1g  # OPTIONAL: Set the maximum Java heap size to 1 GiB
    volumes:
      - ./ngrams:/ngrams
    ports:
      - 8010
    labels:
      - traefik.enable=true
      - traefik.http.services.languagetool.loadbalancer.server.port=8010
      - traefik.http.routers.languagetool-http.rule=Host(`languagetool.example.com`)
      - traefik.http.routers.languagetool-http.entrypoints=web
      - traefik.http.routers.languagetool-http.middlewares=resecure@file
      - traefik.http.routers.languagetool-https.rule=Host(`languagetool.example.com`)
      - traefik.http.routers.languagetool-https.entrypoints=websecure
      - traefik.http.routers.languagetool-https.tls=true
      - traefik.http.routers.languagetool-https.tls.certresolver=le
      - traefik.docker.network=web
    networks:
      - web

networks:
  web:
    external: true
```

Once everything is up and running, you can check the container logs or send a test request like this:

```shell
curl --data "language=en-US&text=a simple test" https://languagetool.example.com/v2/check
```
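The container may need a few seconds before it answers requests, so instead of rerunning curl by hand you could poll the server from a script. This minimal Python sketch assumes the GET /v2/languages endpoint, which lists the languages the server supports, and the placeholder domain from the docker-compose.yml; fetch_languages and wait_until_ready are hypothetical helper names.

```python
import json
import time
import urllib.request

# Placeholder domain from the docker-compose.yml; replace with your own.
BASE_URL = "https://languagetool.example.com"


def fetch_languages(base_url=BASE_URL, timeout=5.0):
    """Return the sorted language codes (e.g. 'en-US') listed by GET /v2/languages."""
    url = base_url.rstrip("/") + "/v2/languages"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return sorted({entry["longCode"] for entry in json.load(response)})


def wait_until_ready(base_url=BASE_URL, attempts=30, delay=2.0, fetch=fetch_languages):
    """Poll until the server answers; re-raise the last error after `attempts` tries."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch(base_url)
        except OSError:
            # URLError/HTTPError (e.g. a 502 from Traefik while the container
            # starts) and connection errors are all subclasses of OSError.
            if attempt == attempts:
                raise
            time.sleep(delay)
```

The fetch parameter is only there so the polling logic can be exercised without a running server.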

Using your own LanguageTool server

LanguageTool integrates well with popular browsers and various other tools. I mainly use it in my primary browser, Google Chrome, and in my source-code editor, VS Code, when editing Markdown files.

Browser Plugin

When using the LanguageTool browser plugin, open its options menu and set up your own LanguageTool server like this:

[Screenshot: LanguageTool browser plugin configuration]

Visual Studio Code Extension

If you want LanguageTool support for your Markdown files in VS Code, you can use the LTeX extension.

It provides issue highlighting, supports checking comments in many programming languages (opt-in), and offers replacement suggestions through the quick-fix dialog, which looks like this:

[Screenshot: VS Code replacement suggestion]

To use your own LanguageTool server, set its root URL in the ltex.languageToolHttpServerUri setting:

[Screenshot: LanguageTool server configuration in the VS Code extension]
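The setting can also be stored per workspace. As a sketch, the following Python snippet writes it to a workspace's .vscode/settings.json. The helper set_ltex_server is hypothetical, the URL is the placeholder domain from the docker-compose.yml, and it assumes an existing settings file is plain JSON (VS Code also tolerates comments there, which json.loads rejects).

```python
import json
from pathlib import Path

# Placeholder root URL of your server (without /v2/check).
SERVER_URL = "https://languagetool.example.com/"


def set_ltex_server(workspace, url=SERVER_URL):
    """Write ltex.languageToolHttpServerUri into <workspace>/.vscode/settings.json.

    Assumption: an existing settings.json contains plain JSON without comments.
    """
    settings_file = Path(workspace) / ".vscode" / "settings.json"
    if settings_file.exists():
        settings = json.loads(settings_file.read_text(encoding="utf-8"))
    else:
        settings = {}
    # Keep all other settings and only add or overwrite the LTeX server URL
    settings["ltex.languageToolHttpServerUri"] = url
    settings_file.parent.mkdir(parents=True, exist_ok=True)
    settings_file.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```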
