7 free software tools for linguists

Today I’m giving a quick rundown of free software tools for linguistic research. There’s a bit of a preamble, but I’ll get there. Promise.

What is free software?

Easy. I know what free software is already, you might say. It’s software you don’t have to pay for - like Adobe Reader or Skype.

Somewhat unintuitively, no. In fact, neither of those applications qualifies as free software. Sure, they’re distributed free of charge (gratis), but their source code is owned by private companies. You aren’t free to take a peek at how they work, let alone change or improve them. There are conditions on their use. Both of these applications are, more correctly, ‘freeware.’

In contrast, free software is software any individual, community or business can use as they please [1]. No one can force you to pay a licence fee for your copy. No one’s going to stop you from looking at the source code and modifying it. You can make as many copies as you like and give them to whoever you want.

“Oh, you mean like open source software?”

Well, that’s almost right. All free software is open source - anyone can poke around and look at the code. But not all open source software is ‘free.’ For example, a company might release all the source code for a project, but restrict commercial use of it.

The precise definition of “free software” remains the subject of debate. There are a number of free software licenses in common use (e.g. the GPL and Apache License), though different organizations have their own criteria for what counts as ‘free.’ However, on basic principles, there is broad consensus.

The Free Software Foundation considers a program free software if the user has the freedom to:

  • Use the program as they wish, for whatever purpose
  • Study and change how the program works
  • Redistribute copies of the program
  • Redistribute modified copies of the program [2]

Other organizations have their own definitions, though these don’t differ dramatically.

Why does it matter?

For the same reason having an open academic community matters. We learn by looking at what others have done and building on it. Knowledge becomes more rigorous when we subject it to scrutiny. In the context of software, having more eyes on your code leads to better code, not just in the short term, but for the future as well. Even if you don’t have any programming experience, by using free software you’re creating an incentive for those who do to keep developing better free software applications.

What’s more, research shouldn’t be reliant on a handful of companies’ intellectual property. Unfortunately, this is exactly what’s happened in the realm of academic publishing. Using free software helps to create an ecosystem where people can build the tools they need, without depending on corporate benevolence. It also means time and effort aren’t wasted reinventing the wheel when a new application can build on the code of another.

The good news

If you’re doing any linguistic research nowadays, you’re probably already using multiple free software tools without even realizing. Go you!

List of tools

Below is a list of software useful for linguistic research. Most of the tools are pretty user-friendly, although the last couple are a bit more technical.

LibreOffice

LibreOffice is a full-featured office suite, similar to the proprietary Microsoft Office. It’s a derivative of OpenOffice (discontinued in 2011 [3]). Everything you’re used to doing in your current suite – writing assignments, making presentations, working on spreadsheets – is just as straightforward in LibreOffice.

An example document in LibreOffice Writer. Look kinda familiar?

Zotero

Zotero is a tool for managing references. Your reference lists are stored locally, but can easily be synced to a Zotero account for portability. Crucially, it can be integrated with your office suite (LibreOffice Writer and Microsoft Word are both supported) and your web browser (Firefox or Chrome).

Integrating Zotero with LibreOffice Writer allows you to use keyboard shortcuts to add citations.

Praat

Though you might not guess it from the hideous logo and late-90s webpage, Praat is one of the best tools for acoustic analysis out there. It’s ubiquitous in the world of linguistics, and rightly so.

The good ol’ Praat interface.

Elan

If you’re looking for a tool to analyse spoken corpora, Elan is the way to go. With a bit of practice, you’ll be tagging those sociolinguistic interviews in no time.

Elan can be used to annotate videos too. Image courtesy of The Language Archive.

Audacity

Manipulating audio files is essential for anyone doing speech analysis. Audacity is a powerful, no-nonsense application for recording, editing and converting these files.

Checking out an audio file in Audacity.

Natural Language Toolkit (NLTK)

Python is useful. Not just for linguists, but just about everyone. If you fall into the former group, NLTK makes it even usefuller. POS-tagging, concordances, frequency analysis, tokenization - you name the text analysis method and NLTK probably has a neat function or two for it.

Using Python to find instances of the word “grail” in Monty Python and the Holy Grail.
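To give a feel for what a concordance and a frequency count actually do, here’s a rough sketch using nothing but Python’s standard library. It is not NLTK code: the sample text is made up for illustration (it isn’t the film script), and the `tokenize` and `concordance` helpers are hypothetical names of my own. NLTK’s own tools (for instance `nltk.Text.concordance` and `nltk.FreqDist`) do the same kind of job far more robustly.

```python
import re
from collections import Counter

# Made-up sample text for illustration (not the actual film script).
SAMPLE = (
    "We seek the Grail. The Grail is not here. "
    "Go and tell your master that we seek the Holy Grail!"
)


def tokenize(text):
    """Split text into lowercase word tokens (a crude stand-in for a real tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def concordance(tokens, target, width=2):
    """Return each occurrence of `target` with up to `width` tokens of context per side."""
    lines = []
    for i, token in enumerate(tokens):
        if token == target:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            # Upper-case the hit so it stands out in its context.
            lines.append(" ".join(left + [token.upper()] + right))
    return lines


tokens = tokenize(SAMPLE)
hits = concordance(tokens, "grail")   # keyword-in-context lines
freq = Counter(tokens)                # token -> number of occurrences

for line in hits:
    print(line)
print(freq.most_common(3))
```

Swap `SAMPLE` for the contents of your own text file and you’ve got a bare-bones corpus search. Once you outgrow it, that’s the point where NLTK earns its keep.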

Reaper

This one’s a little specific. It’s also not particularly user-friendly, so being comfortable with the command line (and phonetic analysis) is crucial. Reaper is designed to help you make detailed analyses of f₀ and voicing in speech files. It’s particularly useful when

It ain’t pretty, but Reaper certainly does the job.


  1. For example, there is no end-user license agreement (EULA) dictating how you can/can’t use the software. If you want to use free software to start a chemical weapons business and commit acts of fraud and terrorism, that’s entirely up to you.

  2. One caveat here: some licenses (e.g. the GPL) prohibit the software, either in whole or in part, from being incorporated into proprietary software.

  3. Confusingly, the project was then picked up by the Apache Foundation and renamed Apache OpenOffice. This project remains in active development.