Managing Gigabytes (Book)

Sometimes we need more than just ‘search’. We may want it ‘faster’, and many times we want it ‘smaller’ too.

(And when it comes to database/index size, the smaller one is probably also the faster one: there is less to look through.)

Managing Gigabytes: Compressing and Indexing Documents and Images by Ian H. Witten, Alistair Moffat, and Timothy C. Bell. (read reviews)

From the authors of the book comes MG, an open-source indexing and retrieval system for text, images, and textual images. (read more)
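
To make the "smaller is probably faster" remark a bit more concrete, here is a minimal sketch of one common way to shrink an inverted index: store the gaps between sorted document IDs instead of the IDs themselves, then write each gap with a variable-byte code. This is only an illustration under my own assumptions, not the way MG itself does it; the function names (`vbyte_encode`, `compress_postings`, and so on) and the sample posting list are made up for the example.

```python
def vbyte_encode(numbers):
    """Encode non-negative integers using 7 data bits per byte.

    The high bit is set on the last byte of each number, so small
    numbers take one byte and larger ones take more.
    """
    out = bytearray()
    for n in numbers:
        chunk = []
        while True:
            chunk.append(n & 0x7F)   # take the lowest 7 bits
            n >>= 7
            if n == 0:
                break
        chunk.reverse()              # most significant group first
        chunk[-1] |= 0x80            # flag the final byte of this number
        out.extend(chunk)
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n = [], 0
    for b in data:
        if b & 0x80:                 # final byte of the current number
            numbers.append((n << 7) | (b & 0x7F))
            n = 0
        else:
            n = (n << 7) | b
    return numbers


def to_gaps(doc_ids):
    """Turn sorted document IDs into differences between neighbours."""
    gaps, prev = [], 0
    for d in doc_ids:
        gaps.append(d - prev)
        prev = d
    return gaps


def from_gaps(gaps):
    """Rebuild the document IDs from their gaps."""
    ids, total = [], 0
    for g in gaps:
        total += g
        ids.append(total)
    return ids


def compress_postings(doc_ids):
    return vbyte_encode(to_gaps(doc_ids))


def decompress_postings(data):
    return from_gaps(vbyte_decode(data))


if __name__ == "__main__":
    postings = [3, 7, 154, 159, 202, 100000]   # hypothetical posting list
    packed = compress_postings(postings)
    print(len(packed), "bytes instead of", 4 * len(postings))
    print(decompress_postings(packed) == postings)
```

Because most gaps in a long posting list are small, most of them fit in a single byte, so the list takes less space on disk and there is less data to read at query time.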

Summarization for Search Engine

Speaking of document clustering/categorization/classification, another ‘approach’ to help users get through mountains of pages may be summarization.

That is, showing a summary instead of only the page title, URL, and the first few (often meaningless) paragraphs of the page.

Short summaries may help users decide which pages are what they want and which are not.

Besides clustering the retrieved documents so that users can easily go on to pick which ones to keep and which to skip, ... (read more)
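
As a rough illustration of the kind of short summary meant here, the sketch below picks the sentences of a page that share the most words with the user's query and shows those instead of the first paragraphs. It is a deliberately naive, assumption-laden example (regex sentence splitting, plain word overlap as the score, whitespace-delimited languages only), and the names `split_sentences` and `summarize` are hypothetical, not taken from any particular search engine.

```python
import re


def split_sentences(text):
    """Very rough sentence splitter: break after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, query, max_sentences=2):
    """Return up to max_sentences sentences that best match the query.

    Score = number of distinct query words found in the sentence.
    Selected sentences are returned in their original order.
    """
    terms = {t.lower() for t in re.findall(r"\w+", query)}
    sentences = split_sentences(text)

    scored = []
    for pos, sentence in enumerate(sentences):
        words = {w.lower() for w in re.findall(r"\w+", sentence)}
        scored.append((len(terms & words), pos))

    # Highest score first; earlier sentences win ties.
    best = sorted(scored, key=lambda item: (-item[0], item[1]))[:max_sentences]
    keep = sorted(pos for score, pos in best if score > 0)
    return " ".join(sentences[p] for p in keep)


if __name__ == "__main__":
    page = (
        "Welcome to my page. Index compression makes search faster. "
        "Contact me by email. Compression also saves disk space."
    )
    print(summarize(page, "index compression"))
```

With the sample page above, the greeting and contact lines are skipped and only the two sentences about compression are shown, which is the sort of hint that lets a user decide quickly whether the page is worth opening.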

