Tag: corpora

  • Using The Web For Linguistic Research

    Hmm.. Slashdot says: prostoalex writes “The Economist says linguists are gradually adopting the World Wide Web as a useful corpus for linguistic research. Google is used, among other resources, to research how the written language evolves and how some non-standard examples of usage become more or less acceptable (The Economist quotes the phrase ‘He far…

  • CLaRK for Corpus building

    Here's another one. It's XML-based; I don't know much about the details yet. Will read. It has at least one real-world application: BulTreeBank, the Bulgarian HPSG treebank.

  • Penn Treebank in XML format

    Vee, you may be interested in this one. No instant noodles here: there is still something to work out, but it shouldn’t be a difficult task. The TIGERRegistry PTB -> TIGER XML filter is included in TIGERSearch, a treebank explorer package. The TIGER API is a Java library for accessing corpora in TIGER XML format.
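    To give a rough feel for what a PTB -> XML conversion involves, here is a minimal sketch in Python. It is not the TIGERRegistry filter and it does not emit the real TIGER XML schema: the element names `node` and `leaf`, the attribute names, and the function `ptb_to_xml` are all made up for illustration. It also assumes a single, well-formed bracketed tree with no extra wrapping parentheses and no parentheses inside tokens.

```python
import re
import xml.etree.ElementTree as ET


def tokenize(text):
    # Split into "(", ")", and runs of characters that are neither
    # whitespace nor parentheses (labels and words).
    return re.findall(r"\(|\)|[^\s()]+", text)


def parse(tokens, pos=0):
    # Parse one bracketed constituent starting at tokens[pos].
    # Returns (element, index of the token after the closing ")").
    if tokens[pos] != "(":
        raise ValueError("expected '(' at token %d" % pos)
    label = tokens[pos + 1]
    pos += 2

    if tokens[pos] != "(":
        # Preterminal such as (NN cat): a POS tag followed by one word.
        word = tokens[pos]
        if tokens[pos + 1] != ")":
            raise ValueError("expected ')' after word %r" % word)
        # "leaf" is a hypothetical element name, not TIGER XML.
        return ET.Element("leaf", pos=label, word=word), pos + 2

    # Phrase such as (NP ...): a category label followed by children.
    elem = ET.Element("node", cat=label)  # hypothetical element name
    while tokens[pos] == "(":
        child, pos = parse(tokens, pos)
        elem.append(child)
    if tokens[pos] != ")":
        raise ValueError("expected ')' closing %s" % label)
    return elem, pos + 1


def ptb_to_xml(text):
    # Convert one bracketed tree into a nested XML element.
    tokens = tokenize(text)
    root, end = parse(tokens)
    if end != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return root


tree = ptb_to_xml("(S (NP (DT the) (NN cat)) (VP (VBZ sleeps)))")
print(ET.tostring(tree, encoding="unicode"))
```

    A real converter would also have to deal with traces, function tags, and the ID-based terminal/nonterminal layout that a treebank search tool expects, which is the part that still needs working out.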