-
Open Source HTML Parsers in Java
-
Looking for Structures
Keywords: semi-structured text, unstructured text, structure recognition. Retrieving Hierarchical Text Structure from Typeset Scientific Articles – a Prerequisite for E-Science Text Mining; Indexing Real-World Data using Semi-Structured Documents; Inferring Structure Information from Typography; Dr. Rolf Brugger; Modeling Documents for Structure Recognition Using Generalized N-Grams; A DTD Extension for Document Structure Recognition; Jedi: Extracting and…
-
Invalid/Partial HTML Parsing
-
Island Grammars / Parsing
Water of uncertainty. Islands of certainty. Island Grammars and Island Parsing + Document Structure Parsing; What is a Topic Map? (Durusau & O’Donnell, 2002); Semantic Role Parsing: Adding Semantic Structure to Unstructured Text (Pradhan, 2003); Adding Structure to Unstructured Text (Maletic & Collard, 2005); Island Parsing and Bidirectional Charts (Stock, 1988) (CiteSeer); Generating Robust Parsers…
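The "water and islands" idea can be made concrete with a small hand-written scanner. This is a minimal sketch, not code from any of the papers above: the class `IslandParser` and its single island production (an anchor start tag whose first attribute is `href`) are hypothetical choices made for illustration. At each position the scanner tries the island rule; if that fails, it consumes one character of water and moves on, so malformed markup around the islands never causes an error.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical island-parser sketch: the only "island" is an anchor start tag
 * of the form <a href="...">, matched case-insensitively; every other
 * character is "water" and is skipped.
 */
public class IslandParser {
    private final String in;
    private int pos;

    public IslandParser(String input) {
        this.in = input;
    }

    /** Scans the whole input and returns the href value of every island found. */
    public List<String> parse() {
        List<String> links = new ArrayList<>();
        pos = 0;
        while (pos < in.length()) {
            int start = pos;
            String href = island();
            if (href != null) {
                links.add(href);   // island recognized: keep its content
            } else {
                pos = start + 1;   // no island here: consume one char of water
            }
        }
        return links;
    }

    /** island := "<a" WS+ "href" WS* "=" WS* '"' (any char except '"') '"' ; null on failure. */
    private String island() {
        if (!literal("<a") || !whitespace(true) || !literal("href")) return null;
        whitespace(false);
        if (!literal("=")) return null;
        whitespace(false);
        if (!literal("\"")) return null;
        int end = in.indexOf('"', pos);
        if (end < 0) return null;  // unterminated value: treat as water
        String value = in.substring(pos, end);
        pos = end + 1;
        return value;
    }

    /** Case-insensitively matches s at the current position and advances past it. */
    private boolean literal(String s) {
        if (in.regionMatches(true, pos, s, 0, s.length())) {
            pos += s.length();
            return true;
        }
        return false;
    }

    /** Skips whitespace; if required, at least one whitespace char must be present. */
    private boolean whitespace(boolean required) {
        int start = pos;
        while (pos < in.length() && Character.isWhitespace(in.charAt(pos))) pos++;
        return !required || pos > start;
    }

    public static void main(String[] args) {
        String broken = "<p>Intro <a href=\"one.html\">first <b>bold</p> junk"
                + " <A  HREF = \"two.html\"> and <a href=\"unterminated";
        System.out.println(new IslandParser(broken).parse()); // prints [one.html, two.html]
    }
}
```

Running `main` collects the two complete links and silently skips the unterminated one. A fuller island grammar would declare more island productions (for example, anchors with other attributes before `href`), typically in a grammar formalism rather than by hand.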
-
Parsing Parsing
Natural Language Parsing (course) @ Uni Heidelberg; The Program Transformation Wiki; ANTLR tutorial @ The University of Birmingham (+ many other Java-related tutorials). Parsing books by Dick Grune: Modern Compiler Design, Parsing Techniques – A Practical Guide, Parsing Techniques – 2nd Edition. Formalism / Tools: SDF – Modular Syntax Definition Formalism; TXL – The TXL…
-
ANTLR for Ruby
-
Piccolo SAX Parser
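Piccolo implements the standard SAX (and JAXP) interfaces, so application code is written against `org.xml.sax` callbacks rather than against the parser itself. The sketch below is a hypothetical illustration that uses the JDK's default JAXP SAX parser, not Piccolo; the class name `SaxLinkCollector` is made up. It streams through a well-formed document and records the `href` attribute of every `a` element as the start tags arrive.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical SAX handler that collects href values of <a> elements. */
public class SaxLinkCollector extends DefaultHandler {
    private final List<String> hrefs = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The default factory is not namespace-aware, so the tag name arrives in qName.
        if ("a".equals(qName)) {
            String href = atts.getValue("href");
            if (href != null) hrefs.add(href);
        }
    }

    /** Parses the given markup with whatever SAX parser JAXP selects. */
    public static List<String> collect(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        SaxLinkCollector handler = new SaxLinkCollector();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.hrefs;
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><body><a href=\"one.html\">1</a>"
                + "<p><a href=\"two.html\">2</a></p></body></html>";
        System.out.println(collect(xhtml)); // prints [one.html, two.html]
    }
}
```

Using Piccolo instead should only require making its factory the one returned by `SAXParserFactory.newInstance()` (see its documentation for how it registers). Any SAX parser stops with a `SAXParseException` on markup that is not well-formed, so this approach does not handle tag-soup HTML.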
-
Universal Feed Parser (Python)
-
Project Log Analyzer
-
FreeLing
An open source C++ library providing language analysis services such as tokenization, sentence splitting, morphological analysis, named entity and date/number/currency recognition, PoS tagging, and shallow parsing. The software is released under the LGPL. It is developed by the Natural Language Research Group at the Technical University of Catalonia, Spain.