This year, I started a new job at my first English-speaking company. As a senior software engineer, I have to write many emails and complex task and bug descriptions, and create presentations on various topics. Therefore, it is great to have an assistant that improves my grammar and spelling.
While looking for good applications, I stumbled across LanguageTool, an open-source grammar, style and spell checker. To protect your privacy and stay in control of your data, you can host your own instance.
This blog post will guide you through the setup and configuration.
From time to time, I also write emails that contain sensitive information, and I became aware of how often these tools process such critical information. I had the uncomfortable feeling that this data could be intercepted or used in ways I did not want.
LanguageTool comes with its own embedded HTTP server, so you can send text to your own LanguageTool instance. You can also write your own application that uses the HTTP API.
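As a sketch of such an application, the following Python snippet posts text to the `/v2/check` endpoint as form data and applies the first suggested replacement of each reported match. It uses only the standard library and assumes a server at the placeholder URL `https://languagetool.example.com`. The helper names `build_check_request`, `check_text` and `apply_first_suggestions` are my own and not part of LanguageTool.

```python
import json
import urllib.parse
import urllib.request

# Placeholder base URL of a self-hosted instance (assumption).
BASE_URL = "https://languagetool.example.com"


def build_check_request(base_url: str, text: str, language: str = "en-US") -> urllib.request.Request:
    """Build a form-encoded POST request for the /v2/check endpoint."""
    data = urllib.parse.urlencode({"language": language, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v2/check",
        data=data,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def check_text(base_url: str, text: str, language: str = "en-US") -> dict:
    """Send the text to the server and return the decoded JSON response."""
    request = build_check_request(base_url, text, language)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def apply_first_suggestions(text: str, matches: list) -> str:
    """Apply the first replacement of every match that offers one.

    Matches are processed from the end of the text backwards so that the
    offsets of earlier matches stay valid. Assumptions: matches do not
    overlap, and offsets can be used as Python string indices (this may not
    hold for characters such as emoji).
    """
    for match in sorted(matches, key=lambda m: m["offset"], reverse=True):
        replacements = match.get("replacements") or []
        if not replacements:
            continue  # nothing to apply for this match
        start = match["offset"]
        end = start + match["length"]
        text = text[:start] + replacements[0]["value"] + text[end:]
    return text
```

For example, `apply_first_suggestions(text, check_text(BASE_URL, text)["matches"])` returns the text with the first suggestion of each match applied. Blindly taking the first suggestion is only a simplification; an editor integration would let the user pick one.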
I like to host every application with its own dedicated docker-compose file. In such a setup, the HTTP reverse proxy Traefik integrates very well with Docker and enables dynamic configuration via container labels. It is worth taking a look!
- Infrastructure with Docker, docker-compose and Traefik
- Approximately 20 GB of free storage
- Prepare your n-gram datasets
Ensure that you have the following folder structure:
```
├── languagetool              # base folder
│   ├── docker-compose.yml    # configuration file
│   └── ngrams
│       ├── de                # downloaded n-gram data for de
│       │   ├── 1grams
│       │   ├── 2grams
│       │   └── 3grams
│       └── en
│           ├── 1grams
│           ├── 2grams
│           └── 3grams
```
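Before starting the container, a quick check of this layout can save a restart. The following Python sketch lists every expected `1grams`, `2grams` and `3grams` directory that is missing below the `ngrams` folder. The function name `missing_ngram_dirs` and the default languages `de` and `en` are my own choices, taken from the folder structure above.

```python
from pathlib import Path


def missing_ngram_dirs(ngrams_root: str, languages=("de", "en")) -> list:
    """Return the expected n-gram directories that do not exist.

    Expects <ngrams_root>/<language>/{1grams,2grams,3grams}.
    """
    root = Path(ngrams_root)
    missing = []
    for lang in languages:
        for n in (1, 2, 3):
            directory = root / lang / f"{n}grams"
            if not directory.is_dir():
                missing.append(str(directory))
    return missing
```

Run from the base folder, `missing_ngram_dirs("ngrams")` should return an empty list once the downloaded data is unpacked in place.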
Below is my personal docker-compose.yml, which you can use as a reference. For a more detailed description, see the documentation of the Docker image erikvl87/languagetool.
```yaml
version: "3"

services:
  languagetool:
    container_name: languagetool
    image: erikvl87/languagetool
    restart: unless-stopped
    environment:
      - langtool_languageModel=/ngrams  # OPTIONAL: Using ngrams data
      - Java_Xms=512m                   # OPTIONAL: Setting a minimal Java heap size of 512 MiB
      - Java_Xmx=1g                     # OPTIONAL: Setting a maximum Java heap size of 1 GiB
    volumes:
      - ./ngrams:/ngrams
    ports:
      - 8010
    labels:
      - traefik.enable=true
      - traefik.http.services.languagetool.loadbalancer.server.port=8010
      - traefik.http.routers.languagetool-http.rule=Host(`languagetool.example.com`)
      - traefik.http.routers.languagetool-http.entrypoints=web
      - traefik.http.routers.languagetool-http.middlewares=resecure@file
      - traefik.http.routers.languagetool-https.rule=Host(`languagetool.example.com`)
      - traefik.http.routers.languagetool-https.entrypoints=websecure
      - traefik.http.routers.languagetool-https.tls=true
      - traefik.http.routers.languagetool-https.tls.certresolver=le
      - traefik.docker.network=web
    networks:
      - web

networks:
  web:
    external: true
```
Once everything is up and running, you can check the container logs or send a test request like this:
```shell
curl --data "language=en-US&text=a simple test" https://languagetool.example.com/v2/check
```
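The response is a JSON document whose `matches` array describes each issue with, among other fields, a `message`, an `offset` and `length` into the submitted text, a list of `replacements` and the triggering `rule`. To turn that into a readable report, for example in a small smoke-test script, a sketch like the following can help. `format_matches` is a hypothetical helper, and the sample response is hand-written to mirror that structure, not real server output.

```python
def format_matches(response: dict, text: str) -> list:
    """Render each match as: RULE_ID [start:end] "snippet": message -> suggestions."""
    lines = []
    for match in response.get("matches", []):
        start = match["offset"]
        end = start + match["length"]
        # Show at most three suggestions to keep the report short.
        suggestions = ", ".join(r["value"] for r in match.get("replacements", [])[:3])
        lines.append(
            f'{match["rule"]["id"]} [{start}:{end}] "{text[start:end]}": '
            f'{match["message"]} -> {suggestions or "(no suggestion)"}'
        )
    return lines


# Hand-written example of the response structure (not real server output).
sample_text = "This are a simple test."
sample_response = {
    "matches": [
        {
            "message": "Hypothetical message.",
            "offset": 5,
            "length": 3,
            "replacements": [{"value": "is"}],
            "rule": {"id": "HYPOTHETICAL_RULE"},
        }
    ]
}

for line in format_matches(sample_response, sample_text):
    print(line)  # HYPOTHETICAL_RULE [5:8] "are": Hypothetical message. -> is
```

If you save the curl output to a file, `json.load` turns it into the `response` dictionary expected here.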
LanguageTool integrates well with popular browsers and various other tools. When editing Markdown files, I mainly use it in my primary browser, Google Chrome, and in my source-code editor, VS Code.
When using the LanguageTool browser extension, navigate to its options menu and set up your own LanguageTool server like this:
If you want LanguageTool support for your Markdown files in VS Code, you can use the LTeX extension.
It provides issue highlighting, supports checking comments in many programming languages (opt-in), and lets you apply replacement suggestions via the quick-fix dialog, which looks like the following:
To use your own LanguageTool server, you have to configure its root URL within the