Tutorial session: so expensive! Should I go or not?
by Mirella Lapata
(slides for COM3110 Text Processing class, Department of Computer Science, University of Sheffield)
Briefly explains Google search, IR, issues in IR, indexing, the inverted file, the boolean model, the vector space model, TF/IDF, term weighting, evaluation, precision, recall, and F-measure.
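To make a few of those topics concrete, here is a toy sketch (not taken from the slides) of an inverted file, a boolean AND query, vector space ranking with TF/IDF weights, and precision/recall/F-measure. The three documents are made up for illustration, the tokenizer just splits on whitespace, and the weighting is raw tf × log(N/df), which is only one of several TF/IDF variants.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy collection, for illustration only.
docs = {
    1: "information retrieval with an inverted file",
    2: "boolean model and vector space model",
    3: "term weighting with tf idf in the vector space model",
}

def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()

# Inverted file: term -> {doc_id: term frequency in that document}.
index = defaultdict(dict)
for doc_id, text in docs.items():
    for term, tf in Counter(tokenize(text)).items():
        index[term][doc_id] = tf

def boolean_and(*terms):
    """Boolean model: ids of documents that contain every query term."""
    postings = [set(index.get(t, {})) for t in terms]
    return set.intersection(*postings) if postings else set()

N = len(docs)

def idf(term):
    df = len(index.get(term, {}))
    return math.log(N / df) if df else 0.0

def tfidf_vector(terms):
    # Weight = raw term frequency * log(N / document frequency).
    return {t: tf * idf(t) for t, tf in Counter(terms).items()}

doc_vectors = {d: tfidf_vector(tokenize(text)) for d, text in docs.items()}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(query):
    """Vector space model: doc ids sorted by cosine similarity to the query."""
    q = tfidf_vector(tokenize(query))
    scores = {d: cosine(q, v) for d, v in doc_vectors.items()}
    return sorted(scores, key=lambda d: -scores[d])

def precision_recall_f1(retrieved, relevant):
    """Precision, recall, and the balanced F-measure (F1)."""
    hits = len(set(retrieved) & set(relevant))
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

print(boolean_and("vector", "model"))        # {2, 3}
print(rank("tf idf weighting")[0])           # 3
print(precision_recall_f1([3, 1], [3]))      # precision 0.5, recall 1.0, F1 about 0.667
```

The boolean query only says yes or no per document, while the vector space ranking orders every document by similarity, which is why the two models are usually taught side by side.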
Sometimes it’s more than just ‘search’. We may want it ‘faster’, and many times we want it ‘smaller’.
(And when it comes to database/index size, the smaller one is probably the faster one: fewer things to look through.)
Managing Gigabytes: Compressing and Indexing Documents and Images by Ian H. Witten, Alistair Moffat, and Timothy C. Bell. (read reviews)
From the authors of the book, MG, an open-source indexing and retrieval system for text, images, and textual images.
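One classic way to make an inverted file smaller, of the kind Managing Gigabytes discusses, is to store the gaps between successive document numbers (d-gaps) instead of the numbers themselves, then encode each gap with a variable-length code such as the Elias gamma code. The sketch below uses Python strings of "0"/"1" for readability; a real system packs actual bits, and the exact bit convention for gamma codes varies between sources. The posting list is hypothetical.

```python
def to_gaps(doc_ids):
    """Sorted posting list -> d-gaps (first id, then successive differences)."""
    gaps, prev = [], 0
    for d in doc_ids:
        gaps.append(d - prev)
        prev = d
    return gaps

def from_gaps(gaps):
    """Inverse of to_gaps: running sum of the gaps."""
    ids, total = [], 0
    for g in gaps:
        total += g
        ids.append(total)
    return ids

def gamma_encode(n):
    """Elias gamma code for n >= 1: (len-1) zeros, then n in binary."""
    if n < 1:
        raise ValueError("gamma codes need n >= 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary

def gamma_decode_all(bits):
    """Decode a concatenation of gamma codes back into integers."""
    values, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values

postings = [3, 7, 8, 20]              # hypothetical doc ids for one term
gaps = to_gaps(postings)              # [3, 4, 1, 12]
encoded = "".join(gamma_encode(g) for g in gaps)
print(encoded, len(encoded))          # 16 bits, versus 128 for four 32-bit ints
print(from_gaps(gamma_decode_all(encoded)))  # [3, 7, 8, 20]
```

Small gaps get short codes, and frequent terms have many small gaps, so this is where most of the saving comes from.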
Instead of showing just the page title, URL, and the first few (often meaningless) paragraphs of the page, short summaries may help users decide which pages are what they want and which are what they don't want.
This comes on top of clustering the retrieved documents, which already makes it easy for users to narrow them down (deciding which ones to keep and which to drop).
PageRank is one of the algorithms used by the Google search engine.
If you want to know how PageRank works, this is the site.
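As a rough idea of the computation, here is a sketch of the textbook PageRank formulation solved by power iteration. It is not Google's production algorithm; the damping factor d = 0.85 is the commonly quoted value, the three-page link graph is made up, and letting pages with no out-links spread their rank evenly over all pages is just one common convention.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=200):
    """Power iteration for PageRank.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict of scores that sum to 1.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by pages with no out-links is shared by every page.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1 - d) / n + d * dangling / n for p in pages}
        # Each page passes d * its rank, split evenly over its out-links.
        for p, targets in links.items():
            if targets:
                share = d * rank[p] / len(targets)
                for q in targets:
                    new[q] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank

# Hypothetical web of three pages.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
scores = pagerank(web)
print(sorted(scores, key=scores.get, reverse=True))  # ['C', 'A', 'B']
```

C ends up on top because both other pages link to it, and A beats B because it receives all of C's (large) rank while B only gets half of A's.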
WhatWeWant is online, thanks to keng: a blog about search engines, information retrieval, and that kind of stuff.
My clothes are currently running around, playing catch with each other in the washing machine.
It’s almost 6am here in Edinburgh. Still very dark. At this time of year, the Sun rises at around 9am … and says bye-bye at around 4pm <gosh!> -_-“
Actually, I'm just too lazy to wake up. But tomorrow at 6pm I have to work at the Thai restaurant, and my ‘uniform’ hasn't been washed yet. (Today I guess I'll get back to my flat late; I have to finish the practical part of my assignment, due tomorrow at 5pm.)