
Google and Google Scholar: Understanding Google

Autocomplete

Autocomplete predicts and displays search terms that may be similar to those you are typing.

These predictions are determined by Google's algorithm without any human intervention, and are based on various factors:

  • Other users' web search activities
  • The popularity of search terms
  • Your past search queries (if you are logged in to your Google Account)
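The way these factors combine can be illustrated with a small toy example. Google's actual autocomplete algorithm is not public, so the function name `suggest`, the sample query log, and the ranking rule (your own past queries first, then the most popular queries) are assumptions made purely for illustration.

```python
from collections import Counter

def suggest(prefix, query_log, user_history=(), limit=3):
    """Return up to `limit` logged queries that start with `prefix`.

    Hypothetical ranking rule: queries the user has typed before come
    first, then the most frequently logged queries, then alphabetical order.
    """
    prefix = prefix.lower().strip()
    counts = Counter(q.lower() for q in query_log)      # popularity of each query
    personal = {q.lower() for q in user_history}        # the user's own past queries
    candidates = [q for q in counts if q.startswith(prefix)]
    candidates.sort(key=lambda q: (q not in personal, -counts[q], q))
    return candidates[:limit]

# Made-up log of what other users searched for
query_log = [
    "google scholar", "google scholar", "google scholar",
    "google maps", "google maps",
    "google books",
    "good reads",
]

print(suggest("goo", query_log))
# ['google scholar', 'google maps', 'good reads']

print(suggest("goo", query_log, user_history=["google books"]))
# ['google books', 'google scholar', 'google maps']
```

The second call shows how being logged in could change the predictions: a query from your own history moves to the top even though it is less popular overall.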

How do search engines work?

Search engines do not search the World Wide Web directly. Rather, they search databases of webpages that have been harvested from the Internet by computer programs known as robots or spiders.

Spiders periodically crawl the web and index the text, links and other data on the webpages they visit. This information is then stored in the search engine's database. When you search using keywords, the search engine looks through this database rather than the live web, and any webpage whose stored content matches your search is retrieved.
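The idea of searching a stored database instead of the live web can be sketched with an inverted index, a common data structure that maps each word to the pages containing it. This is a minimal illustration, not how any particular search engine is built: the `pages` dictionary stands in for content a spider has already harvested, and the URLs and function names are made up.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical pages already collected by a spider
pages = {
    "https://example.org/a": "Spiders crawl the web and index text",
    "https://example.org/b": "Library catalogues list books and articles",
    "https://example.org/c": "Search engines index the web",
}

index = build_index(pages)
print(sorted(search(index, "index web")))
# ['https://example.org/a', 'https://example.org/c']
```

Note that `search` never visits a webpage: it only consults the index, so a page that was never crawled cannot appear in the results.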

What search engines do not find: the Invisible Web

Some content remains hidden from search engines:

  • Webpages that a search engine's policy excludes as being of little value or use
  • Webpages deliberately excluded by their owners
  • Sites that require registration and login
  • Content of searchable databases such as university library article databases and catalogues

This type of information, known as the invisible, hidden or deep web, is generally inaccessible to the robots or spiders that crawl the web and is therefore not included in search engine results.
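One common way for owners to exclude pages deliberately is a robots.txt file, which well-behaved spiders consult before crawling a site. The sketch below uses Python's standard `urllib.robotparser` module to show how a spider might check such rules. The robots.txt content, the spider name `ExampleBot` and the URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that asks all spiders to stay out of /private/
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())  # parse the rules without any network access

allowed_public = parser.can_fetch("ExampleBot", "https://example.org/index.html")
allowed_private = parser.can_fetch("ExampleBot", "https://example.org/private/report.html")

print(allowed_public)   # True
print(allowed_private)  # False
```

A spider that respects these rules never indexes the pages under /private/, so they cannot appear in search results even though they are publicly reachable.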

As the invisible web is estimated to be around 500 times larger than the visible or surface web, it is important to broaden your searches to other resources rather than rely on a single search engine.

Ranking results

The challenge for search engines is not only to find useful information, but also to order it in such a way that the most relevant content is displayed at the top of the results list. This is called relevance ranking.

Google uses a ranking algorithm called PageRank. PageRank analyses the number and quality of links to a webpage in an attempt to estimate its importance. The assumption is that the more links a page receives from other pages, and the more important those linking pages are, the more important it is. Note that PageRank does not evaluate the quality of the information on the page itself.
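The link-counting idea can be sketched as a simplified PageRank calculation. This is an illustration of the published principle, not Google's production system: the four-page link graph is made up, the damping factor of 0.85 is an assumed value, and the code assumes every linked page also appears as a key in `links`.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate page importance from a dict of page -> list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal importance

    for _ in range(iterations):
        # Every page receives a small baseline share
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page divides its importance among the pages it links to
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its importance evenly
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical web of four pages: C receives links from A, B and D
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

ranks = pagerank(links)
print(sorted(ranks, key=ranks.get, reverse=True))
# ['C', 'A', 'B', 'D']
```

Page C ranks highest because it receives the most links, and A ranks second because its single link comes from the important page C. Page D, which nothing links to, ranks lowest regardless of what it contains.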

Other factors that are considered include:

  • How often search terms appear in the text
  • Where the search terms appear, e.g. in the title or URL
  • The number of citations of a result
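The first two factors can be illustrated with a toy scoring function. The weights for title and URL matches are arbitrary assumptions, the sample pages are made up, and citation counts are left out for simplicity. Real search engines combine many more signals in ways that are not public.

```python
import re

def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def relevance_score(query, page, title_weight=3.0, url_weight=2.0):
    """Score a page by how often and where the query terms appear.

    The weights are hypothetical values chosen only for illustration.
    """
    body = tokenize(page["body"])
    title = tokenize(page["title"])
    url = tokenize(page["url"])
    score = 0.0
    for term in tokenize(query):
        score += body.count(term)   # how often the term appears in the text
        if term in title:
            score += title_weight   # bonus for appearing in the title
        if term in url:
            score += url_weight     # bonus for appearing in the URL
    return score

page1 = {
    "url": "https://example.org/pagerank",
    "title": "PageRank explained",
    "body": "PageRank counts links. PageRank is a ranking algorithm.",
}
page2 = {
    "url": "https://example.org/news",
    "title": "Weekly news",
    "body": "This week we mention PageRank once.",
}

print(relevance_score("pagerank", page1))  # 7.0
print(relevance_score("pagerank", page2))  # 1.0
```

The first page scores higher because the search term appears more often in its text and also in its title and URL.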