Setting up a pre-commit spell checker using husky and spellchecker-cli
A year after setting up this website, and after getting some views on my earlier posts, I came across feedback from a reader that brought a grave mistake to light.
The mistake was not that the reader had found a spelling error in one post, but that there were tens of such errors across different pages of the website. Luckily, the feedback made me realize that I needed some sort of spell check across all my pages. This led me to set up a pre-commit task that spell-checks the changed files in my Next.js project.
This post walks through that setup.
Starting with an empty project
For the purpose of this post, we will use a basic React app created with create-react-app. However, the setup below will work with most modern frontend frameworks, including Next.js. You can skip this section if you already have a working project that you want to integrate the spell checker into.
Let's start by creating the starter React project.
npx create-react-app spell-checker-example
cd spell-checker-example
We don't need it for this setup, but the dev server can now be started with:
npm start
Setting up the spell checker
Let's install the spellchecker-cli npm package, which abstracts away the logic behind spell checking. This package checks for spelling errors and basic grammar, and supports a bunch of customizations (see the package documentation).
npm i -D spellchecker-cli
The spell checking can now be run as a script from package.json.
// package.json
{
  ...
  "scripts": {
    ...
    // .spellcheckerrc.json is the config file respected by spellchecker-cli.
    "spell-check": "spellchecker --config .spellcheckerrc.json"
  }
}
As a next step, let's add our basic configuration for the spellchecker-cli.
touch .spellcheckerrc.json
// .spellcheckerrc.json
{
  "files": [
    "./src"
  ],
  "generateDictionary": true,
  "quiet": false
}
With this, the spell checker can now be run with:
npm run spell-check
Running spell-check finds all the spelling errors in the files listed in the files array. Because generateDictionary is enabled, all the flagged words from the run are written to dictionary.txt.
Our current setup has a few limitations:
- It has to be run manually.
- It can be slow, since it always checks every file in the src folder.
- It flags variable names and any other words that are not part of the English vocabulary.
Solving for the limitations
Limitation 1: It runs manually
Husky is a library that makes native git hooks easier to manage. We will use it to create a pre-commit hook that runs the spell checker before every commit. Depending on your use case, the same logic would work for pre-push or any other git hook.
husky-init is a one-time command to quickly set up husky.
npx husky-init && npm install
This should create a sample .husky/pre-commit file. Edit it to run the spell-check command before every commit.
#!/bin/sh
. "$(dirname "$0")/_/husky.sh"
npm run spell-check
That's about it. From now on, spell-check will run before every commit on the repository.
Limitation 2: Spell-check is slow and checks more files than needed
This could be solved by checking only the files that have changed and are staged. Luckily, the .husky/pre-commit file is just another shell script with access to the git environment.
We can get the list of staged files by using git diff --name-only --cached.
Let's use this command in the hook to override the hardcoded list of files in .spellcheckerrc.json.
#!/bin/sh
. "$(dirname "$0")/_/husky.sh"
changedFiles="$(git diff --name-only --cached)"
# The "--" tells npm to forward the flags that follow to the spell-check script.
npm run spell-check -- --files ${changedFiles}
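Two edge cases are worth guarding against in a hook like this: a commit where nothing is staged, and staged deletions whose files no longer exist on disk. Below is a minimal sketch of one way to handle both. The run_spell_check function and the SPELL_CHECK_CMD variable are hypothetical names introduced here for illustration, and the --diff-filter=ACMR flag (added, copied, modified, renamed) is my own addition rather than part of the original hook.

```shell
#!/bin/sh
# Hypothetical helper (not part of husky or spellchecker-cli).
# $1: whitespace-separated list of staged files.
# SPELL_CHECK_CMD is an assumed override that allows a dry run;
# it defaults to the npm script defined in package.json.
run_spell_check() {
  files="$1"
  if [ -z "$files" ]; then
    echo "No staged files to spell-check."
    return 0
  fi
  # Left unquoted on purpose so each file becomes its own argument;
  # this breaks on file names that contain spaces.
  ${SPELL_CHECK_CMD:-npm run spell-check --} --files $files
}

# In .husky/pre-commit this could be called as:
#   run_spell_check "$(git diff --name-only --cached --diff-filter=ACMR)"
```

Setting SPELL_CHECK_CMD=echo prints the arguments that would be passed instead of running the checker, which is handy when debugging the hook.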
This works as expected: the spell checker now only checks files that have been staged. However, it also checks images and any other staged non-text files. To get past that, one can append negated glob patterns for the assets/public folders to the list of files.
npm run spell-check -- --files ${changedFiles} "!public/**" "!**/*.scss" <other paths to public folders>
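An alternative to listing everything that should be excluded is to allow only the file types that should be checked. The sketch below filters the staged list by extension before it reaches the spell checker. The filter_text_files function is a hypothetical helper, and the extension list is an assumption that should be adjusted to wherever prose lives in your project.

```shell
#!/bin/sh
# Hypothetical helper: reads file names (one per line) on stdin and prints
# only those whose extension is on the allowlist below.
# "|| true" keeps the exit status at 0 when nothing matches, so an empty
# result does not abort the hook.
filter_text_files() {
  grep -E '\.(md|mdx|txt|js|jsx|ts|tsx)$' || true
}

# In .husky/pre-commit this could be used as:
#   changedFiles="$(git diff --name-only --cached | filter_text_files)"
```

An allowlist tends to age better than a denylist here, since a new asset type added to the repository is skipped by default instead of being spell-checked by accident.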
Limitation 3: Spell-check does not recognize non-English keywords
There are two ways of solving this:
First, the ignore option in .spellcheckerrc.json lets us pass an array of regexes; words matching them are skipped during spell checking.
// .spellcheckerrc.json
// example regex to match camelCased words
// variables are camelCased in Js
"ignore": [
"[a-z]+((\\d)|([A-Z0-9][a-z0-9]+))+"
]
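Before committing a regex to the config, it can help to try it against a few sample words. Here is a rough sketch using grep. The is_ignored_word function is a hypothetical helper, [0-9] stands in for \d because POSIX extended regexes do not support it, and the -x flag assumes that the pattern has to match the whole word, which may not be exactly how spellchecker-cli applies it.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the whole word matches the camelCase
# pattern from the config (with \d rewritten as [0-9] for grep -E).
is_ignored_word() {
  printf '%s\n' "$1" | grep -Eqx '[a-z]+(([0-9])|([A-Z0-9][a-z0-9]+))+'
}

for word in changedFiles generateDictionary file2 hello Hello; do
  if is_ignored_word "$word"; then
    echo "$word: ignored"
  else
    echo "$word: checked"
  fi
done
```

With these samples, changedFiles, generateDictionary and file2 are reported as ignored, while hello and Hello are still checked, so ordinary prose keeps going through the spell checker.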
Second, with "generateDictionary": true, every run of spell-check generates a dictionary.txt file containing the list of words that were flagged as spelling errors. The ideal flow is to fix the spelling errors listed in the file until a run passes cleanly. Any valid words that the spell checker should recognize can be added to an extra dictionary file.
// .spellcheckerrc.json
// Our project specific vocabulary/any other valid words
"dictionaries": [
"./dictionary-clean.txt"
]
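Moving valid words into the extra dictionary by hand gets tedious, so a small helper can do it. This is only a sketch: add_to_dictionary and DICTIONARY_FILE are hypothetical names, and it assumes the dictionary file is plain text with one word per line.

```shell
#!/bin/sh
# Hypothetical helper: adds the given words to the project dictionary and
# keeps the file sorted and free of duplicates.
# DICTIONARY_FILE is an assumed override; it defaults to the
# ./dictionary-clean.txt path used in the config.
add_to_dictionary() {
  dict="${DICTIONARY_FILE:-./dictionary-clean.txt}"
  # Nothing to add: leave the file untouched.
  [ "$#" -gt 0 ] || return 0
  touch "$dict"
  # Append one word per line, then sort and de-duplicate in place.
  printf '%s\n' "$@" >> "$dict"
  sort -u -o "$dict" "$dict"
}

# Example: add_to_dictionary husky spellchecker
```

Keeping the file sorted makes diffs of the dictionary easy to review, which matters once it lives in version control next to the content.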
With this, our spell checker is all ready to go.