2008-04-19

Crawling the deep Web

clipped from blog.wired.com

Google Spiders to Start Crawling The 'Deep' Web

Google recently announced it will soon begin indexing the so-called "deep" web, those pages hiding behind HTML forms and other inadvertently spider-blocking HTML elements. The move will potentially open up a whole new range of webpages that were previously invisible to the search engine.

clipped from en.wikipedia.org
Deep Web

The deep Web (or Deepnet, invisible Web or hidden Web) refers to World Wide Web content that is not part of the surface Web indexed by search engines. It is estimated that the deep Web is several orders of magnitude larger than the surface Web.[1]

Deep Web resources may be classified into one or more of the following categories:
Dynamic content
Unlinked content
Private Web
Contextual Web
Limited access content
Scripted content
Non-HTML/text content

Related:
Google Spiders to Start Crawling The 'Deep' Web | Compiler from Wired.com
Slashdot | Google Crawls The Deep Web
Deep Web - Wikipedia, the free encyclopedia
Official Google Webmaster Central Blog: Crawling through HTML forms