-
Using The Web For Linguistic Research
Hmm.. Slashdot says: prostoalex writes “The Economist says linguists are gradually adopting the World Wide Web as a useful corpus for linguistic research. Google is used, among other resources, to research how the written language evolves and how some non-standard examples of usage become more or less acceptable (The Economist quotes the phrase ‘He far…
-
CLaRK for Corpus building
Here's another one, XML-based. I don’t know much about the details yet; will read. It has at least one real-world application: BulTreeBank, the Bulgarian HPSG treebank.
-
Penn Treebank in XML format
Vee, you may be interested in this one. No instant noodles here: there is still something to work out, but it shouldn’t be a difficult task. TIGERRegistry: a PTB -> TIGER XML filter, included in TIGERSearch, a treebank explorer package. TIGER API: a Java library for accessing corpora in TIGER XML format.
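To give a feel for the "work something out" part, here is a minimal Python sketch that turns one PTB-style bracketed tree into nested XML. It is only an illustration, not the TIGERRegistry filter: the function names (`tokenize`, `parse_tree`, `ptb_to_xml`) and the element and attribute names (`nt`, `t`, `cat`, `pos`, `word`) are made up for this example, and the nested output is not the real TIGER XML schema. It also assumes the tree has a labeled root and that every leaf has the form `(POS word)`.

```python
import xml.etree.ElementTree as ET


def tokenize(bracketed):
    """Split a bracketed tree string into parentheses and symbols."""
    return bracketed.replace("(", " ( ").replace(")", " ) ").split()


def parse_tree(tokens, pos=0):
    """Parse one '(LABEL ...)' subtree starting at tokens[pos].

    Returns the XML element and the index just past the closing ')'.
    Element and attribute names are hypothetical, not TIGER XML.
    """
    if tokens[pos] != "(":
        raise ValueError("expected '(' at token %d" % pos)
    label = tokens[pos + 1]
    pos += 2
    if tokens[pos] != "(":
        # Preterminal: (POS word)
        elem = ET.Element("t", {"pos": label, "word": tokens[pos]})
        pos += 1
    else:
        # Nonterminal: (CAT child child ...)
        elem = ET.Element("nt", {"cat": label})
        while tokens[pos] == "(":
            child, pos = parse_tree(tokens, pos)
            elem.append(child)
    if tokens[pos] != ")":
        raise ValueError("expected ')' at token %d" % pos)
    return elem, pos + 1


def ptb_to_xml(bracketed):
    """Convert a single bracketed tree to an XML string."""
    root, _ = parse_tree(tokenize(bracketed))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(ptb_to_xml("(S (NP (DT the) (NN cat)) (VP (VBD slept)))"))
```

Real Penn Treebank files also have traces, function tags and an unlabeled outer bracket, which this sketch does not handle.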