January 03, 2014
Extracting n-word phrases from large texts
This is a summary of resources posted on [Corpora-List] in early 2014.
CMU-Cambridge Statistical Language Modeling toolkit
Lawrence Anthony's AntConc
Software for extracting n-grams as well as non-consecutive patterns (skipgrams). It is written in C++ for speed and memory efficiency, but comes with a Python binding for use from Python scripts. It also includes a standalone command-line tool that performs this extraction directly, without any programming.
Posted by Maarten van Gompel
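As a rough illustration of what these tools compute, here is a minimal pure-Python sketch of contiguous n-gram counting and skipgram extraction. It is not the API of any of the tools listed above. The function names `ngrams`, `skipgrams` and `count_ngrams`, the `*` gap marker, and whitespace tokenization are all hypothetical choices made for this example. The skipgram definition is one common formulation and may differ from the one the software above uses. Here a skipgram keeps n tokens from a window of at most n + max_skip tokens, fixes the first and last tokens, and skips at least one token in between.

```python
from collections import Counter
from itertools import combinations


def ngrams(tokens, n):
    """Yield contiguous n-grams as tuples of tokens."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def skipgrams(tokens, n, max_skip):
    """Yield non-contiguous n-token patterns (hypothetical helper).

    Each pattern starts at position i, keeps n tokens from a window spanning
    at most n + max_skip tokens, and skips at least one token. Every skipped
    token is replaced by the placeholder '*' (an assumption for this sketch).
    """
    for i in range(len(tokens)):
        window_end = min(i + n + max_skip, len(tokens))
        # The first token is fixed at i; choose the remaining n - 1 positions.
        for rest in combinations(range(i + 1, window_end), n - 1):
            positions = (i,) + rest
            if positions[-1] - i + 1 == n:
                continue  # no gap: this is an ordinary n-gram, skip it
            kept = set(positions)
            yield tuple(tokens[j] if j in kept else "*"
                        for j in range(i, positions[-1] + 1))


def count_ngrams(lines, n):
    """Count n-grams over an iterable of lines (e.g. an open file).

    Lines are read one at a time, so the raw text is never loaded in full.
    N-grams do not cross line boundaries.
    """
    counts = Counter()
    for line in lines:
        counts.update(ngrams(line.split(), n))
    return counts


if __name__ == "__main__":
    tokens = "to be or not to be".split()
    print(Counter(ngrams(tokens, 2)).most_common(1))  # [(('to', 'be'), 2)]
    print(list(skipgrams(tokens, 2, 1)))
    # [('to', '*', 'or'), ('be', '*', 'not'), ('or', '*', 'to'), ('not', '*', 'be')]
```

To count n-grams in a large file, pass the open file object directly, for example `count_ngrams(open("corpus.txt", encoding="utf-8"), 3)`. The file name is a placeholder. This sketch still stores every distinct n-gram in a Python `Counter`, so memory grows with vocabulary size. That memory cost is what the C++ implementation described above is meant to address.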