Friday 8 May 2009

'Human error' hits Google search

Users were warned that all search results were dangerous

Google's search service has been hit by technical problems, with users unable to access search results.

For a period on Saturday, all search results were flagged as potentially harmful, with users warned that the site "may harm your computer".

Users who clicked on their preferred search result were advised to pick another one.

Google attributed the fault to human error and said most users were affected for about 40 minutes.

"What happened? Very simply, human error," wrote Marissa Mayer, vice president, search products and user experience, on the Official Google Blog.

The internet search engine works with stopbadware.org to ascertain which sites install malicious software on people's computers and merit a warning.

Stopbadware.org investigates consumer complaints to decide which sites are dangerous.

The list of malevolent sites is regularly updated and handed to Google.

When Google updated the list on Saturday, it mistakenly flagged all sites as potentially dangerous.

"We will carefully investigate this incident and put more robust file checks in place to prevent it from happening again," Ms Mayer wrote.

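The article does not describe the list's file format or the checks Google planned to add. As a rough sketch of what a file check before publishing an update could look like, assuming a hypothetical format of one URL or URL prefix per line, a validator might reject any entry broad enough to match every site. The function name `validate_badware_list` and the `OVERBROAD` set are illustrative assumptions, not Google's or StopBadware's actual tooling.

```python
from urllib.parse import urlsplit

# Hypothetical patterns that would match every URL if treated as a prefix.
OVERBROAD = {"/", "*", "http://", "https://"}


def validate_badware_list(lines):
    """Split pattern lines into (accepted, rejected) entries.

    Assumed format: one URL or URL prefix per line; blank lines and
    lines starting with '#' are ignored.
    """
    accepted, rejected = [], []
    for raw in lines:
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue
        # Add a scheme when missing so urlsplit can find the host name.
        candidate = entry if "://" in entry else "http://" + entry
        host = urlsplit(candidate).hostname
        # Reject entries that are trivially broad or name no real host.
        if entry in OVERBROAD or not host or "." not in host:
            rejected.append(entry)
        else:
            accepted.append(entry)
    return accepted, rejected


accepted, rejected = validate_badware_list(
    ["# sample list", "badsite.example/download", "/", "http://evil.example"]
)
print(accepted)
print(rejected)
```

In a setup like this, a non-empty `rejected` list would block the update from going live until someone reviews it.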

About Google's search service

Google Custom Search and Custom Search Business Edition

Google uses the index it has built for its web search engine and limits results by domain name, host, and/or URL. When someone enters a query in the search form on your site, the Google server application receives the query, formats the results, and sends them back as either HTML or XML (XML in the business version) with links directly to the pages on your site.
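To give a feel for handling the XML form of the results, here is a minimal sketch that parses a small sample document with Python's standard library. The sample and its element names (`GSP`, `RES`, `R`, `U`, `T`, `S`) are assumptions made for illustration and should be checked against Google's own XML reference before use; `parse_results` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; the structure is an assumption, not captured output.
SAMPLE = """<GSP>
  <RES>
    <R N="1"><U>http://www.example.com/a.html</U><T>Page A</T><S>First snippet</S></R>
    <R N="2"><U>http://www.example.com/b.html</U><T>Page B</T><S>Second snippet</S></R>
  </RES>
</GSP>"""


def parse_results(xml_text):
    """Return a list of dicts, one per assumed <R> result element."""
    root = ET.fromstring(xml_text)
    results = []
    for r in root.iter("R"):
        results.append({
            "rank": int(r.get("N", "0")),
            "url": r.findtext("U", default=""),
            "title": r.findtext("T", default=""),
            "snippet": r.findtext("S", default=""),
        })
    return results


for hit in parse_results(SAMPLE):
    print(hit["rank"], hit["title"], hit["url"])
```

Once the results are plain dictionaries, any template or scripting layer can render them in the site's own design.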

Features

  • Finding Content
    • Can include multiple sites (unlimited pages in the non-business version)
    • Only pages already in the Google search index are available; there are no promises about additional indexing.
    • No access to pages secured by passwords or other access control.
    • Picks up new versions of pages only when the Google search index updates (no daily or weekly updating).
    • Powerful robot crawler can handle most kinds of links
  • Indexing
    • Handles file formats: HTML, XML, text, PostScript, RTF, PDF, Lotus, MacWrite, MS Word, Excel, and PowerPoint
    • Excellent character set and language recognition for best tokenization
    • Does not store the contents of meta tags or page properties.
  • Querying
    • Defaults to matching all words in the query, case-insensitively
    • Uses the Google query language, including the Internet Query Operators - (minus) and "" (quotes), along with OR and various field names and other parameters.
    • Optional Safe Search for eight languages (Dutch, English, French, German, Italian, Portuguese (Brazilian), Spanish, Traditional Chinese)
    • Light pluralization using an internal wordlist rather than stemming
  • Retrieval
    • Retrieves all matching pages (though the CSE doesn't say how many that is)
    • Shows spellchecker "did you mean?" suggestions for misspelled and mistyped words, but a suggested word may not match anything on a particular site or set of sites, so it can be a dead end.
    • Search results can have "Refinements", zones based on URLs which appear as links along the top of the results
    • Search Suggestions appear using the "subscriptions" mechanism, which is quite poorly documented
  • Relevance
    • Relevance ranking uses all the Google algorithms, including PageRank
    • Adjusting relevance weight can only be done via an XML "background label" and "boost" process
  • Results UI
    • Default looks like the Google web search results.
    • Can display interface in English, French, Spanish, German, Bulgarian, Chinese (Simplified and Traditional), Croatian, Czech, Danish, Dutch, Finnish, Greek, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Slovak, Swedish.
    • Hides duplicate pages based on snippet similarity
    • Page size and cache link seem to appear or not appear randomly
    • Basic results page customization: logo, text and link colors
    • Option to use JavaScript and show results in an iframe (not well documented)
    • Option to request XML results and use a scripting language or presentation program to show them.
  • Search Analytics and reports
    • Shows traffic by hour, day, week, month or "overall" (since installing the search service)
    • Shows most popular queries in the same time periods, with links to the queries and flags on no match (zero results) with details.
    • Note: report periods for low-traffic search installations may end the previous Saturday, even for daily and weekly reports.
  • Administration
    • All admin done via web
    • Option to allow "contributors" who can edit the URLs to be included or excluded, and annotate them with any refinement labels that you have created, but not otherwise change the search engine.

  • Business Edition (CSBE) features
    • No advertising
    • Google logo ("branding") not required
    • XML results option - allowing flexible display customization
    • Technical support by email, and for larger customers, an option for paid telephone support
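The default query behaviour listed under Querying (every word must match, case-insensitively, with - to exclude a word and quotes for an exact phrase) can be illustrated with a toy matcher. This is a simplified sketch of those semantics only: it ignores OR, field names, pluralization, and ranking, and it is not how Google evaluates queries. The function name `matches` is hypothetical.

```python
import shlex


def matches(query, text):
    """Return True if text satisfies a simplified AND / minus / phrase query."""
    haystack = text.lower()
    words = set(haystack.split())
    # shlex.split keeps a double-quoted phrase together as one token.
    # Caveat: it also treats apostrophes as quotes, so real input would
    # need a purpose-built tokenizer.
    for token in shlex.split(query.lower()):
        if token.startswith("-") and len(token) > 1:
            # Excluded word: fail if it appears anywhere in the text.
            if token[1:] in words:
                return False
        elif " " in token:
            # Quoted phrase: must appear as a contiguous substring.
            if token not in haystack:
                return False
        elif token not in words:
            # Plain word: every one is required (implicit AND).
            return False
    return True


print(matches('Google "custom search" -ads', "Google Custom Search for your site"))
print(matches("google -ads", "Google ads program"))
```

The first call matches because both the word and the phrase are present and "ads" is absent; the second fails on the excluded word.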

References:

BBC

SearchTools
