(Better, Faster,) Lighter NLTK

NLTK-Lite is a substantially simplified and streamlined version of NLTK (Natural Language Toolkit). NLTK is no longer supported.

NLTK-Lite is a new collection of lightweight NLP modules designed for maximum simplicity and efficiency. NLTK-Lite only covers the simple variants of standard data structures and tasks. Simplicity and efficiency are valued over generality and extensibility.

Key differences from NLTK:

  • requires Python 2.4
  • tokens are represented as strings, tuples, or trees
  • all tokenizers are iterators; large tasks produce output as early as possible
  • more emphasis on Python constructs instead of NLTK constructs
  • default pipeline processing paradigm leads to more transparent code
  • taggers incorporate backoff for smaller models and faster operation
  • shorter names (e.g. tokenizer.RegexpTokenizer() becomes tokenize.regexp())
  • tutorials are more easily maintained now with docutils and doctest
  • contributed software is more easily incorporated
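
To make a few of these points concrete, here is a minimal sketch in plain Python of the style the list describes: tokens as strings and tuples, a tokenizer that is an iterator, a pipeline of generators, and a tagger that backs off to a default. It is not NLTK-Lite's actual API. The names regexp_tokenize, lookup_tag and TABLE, the tag set, and the regular expression are all made up for illustration, and the code is written for a current Python rather than 2.4.

```python
import re

# Hypothetical names throughout; this is not NLTK-Lite's API.

def regexp_tokenize(text, pattern=r"\w+|[^\w\s]+"):
    """Yield tokens lazily as plain strings, one regexp match at a time."""
    for match in re.finditer(pattern, text):
        yield match.group()

def lookup_tag(tokens, table, backoff_tag="NN"):
    """Tag each token from a small lookup table.

    Unknown words back off to a default tag, so the table can stay small.
    Tagged tokens are plain (word, tag) tuples.
    """
    for token in tokens:
        yield (token, table.get(token.lower(), backoff_tag))

# A toy model; the tags are assumed for the example.
TABLE = {"the": "DT", "sat": "VBD", "on": "IN", ".": "."}

# The pipeline: each stage consumes the iterator produced by the previous one.
tagged = lookup_tag(regexp_tokenize("The cat sat on the mat."), TABLE)

# The first result is available before the rest of the text is processed.
first = next(tagged)
rest = list(tagged)
```

Because every stage is a generator, nothing is tokenized or tagged until a result is requested, which is one way to get the "output as early as possible" behaviour mentioned above.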

unrelated: Better, Faster, Lighter Java

Published by

bact


2 thoughts on “(Better, Faster,) Lighter NLTK”

  1. I used to think NLTK-Lite would be something like TLE-Lite :-P If it's like this, I'll have to start using it too :-) bact's blog really is packed with useful content 😉

  2. NLTK-Lite is much easier to learn and use. Now I'll finally get to play with NLP. (Why isn't there something this good freely available for Thai?)
