January 12, 2009, 11:45 am
Fatten Up Your Corpus
By Jacob Harris

Ah, January! It’s that special time of year when marketers manipulate our resolution-shackled psyches to sell us all sorts of diet pills and exercise schemes. But if you’re a researcher in computational linguistics, natural language processing or machine learning, the last thing you want to do is slim down. As Google’s example has consistently shown,
more data usually beats better algorithms, and if you’re a researcher looking for a new motherlode of high-quality textual data (who also has a love of The New York Times’s writerly chops), where better to start than The New York Times Annotated Corpus?
...
Full article at:
http://open.blogs.nytimes.com/2009/01/12/fatten-up-your-corpus/