In simpler words, web scraping is a software technique for extracting data from websites. It is closely related to web indexing, in which information is gathered by a web crawler or bot. Used universally by search engines, web scraping has become a staple of search engine optimization and a major tool for programmatic web access today.
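To make the idea concrete, here is a minimal sketch of data extraction using only Python's standard library. The page content, the `product`/`price` class names, and the `ProductScraper` class are all hypothetical, invented for illustration; a real scraper would fetch live HTML over HTTP (for example with `urllib.request`) rather than parse a hard-coded string.

```python
from html.parser import HTMLParser

# Hypothetical sample page; in practice the HTML would be fetched
# over HTTP (e.g. with urllib.request.urlopen).
SAMPLE_HTML = """
<html><body>
  <h2 class="product">Widget</h2><span class="price">$9.99</span>
  <h2 class="product">Gadget</h2><span class="price">$24.50</span>
</body></html>
"""

class ProductScraper(HTMLParser):
    """Collects [name, price] pairs from the markup above."""
    def __init__(self):
        super().__init__()
        self._field = None      # which field the current text belongs to
        self.products = []      # list of [name, price] pairs

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls in ("product", "price"):
            self._field = cls

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "product":
            self.products.append([text, None])
        elif self._field == "price" and self.products:
            self.products[-1][1] = text
        self._field = None

scraper = ProductScraper()
scraper.feed(SAMPLE_HTML)
print(scraper.products)  # [['Widget', '$9.99'], ['Gadget', '$24.50']]
```

The same pattern, match elements of interest and pull out their text, underlies most scraping libraries; dedicated tools simply automate it at scale.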
At a basic level, there are a number of scraping tools to choose from for collecting and collating useful data from the web. Most can be installed quite conveniently on almost any machine, although setting them up on Mac or Linux systems can be a little more taxing and may call for professional help. One of the best things about these tools is that they follow links automatically, so you can crawl a site effectively without entering each URL by hand.
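The link-following behaviour described above can be sketched as a small breadth-first crawler. To keep the example self-contained, the "site" here is a hypothetical in-memory dictionary mapping URLs to HTML; a real crawler would fetch each page over HTTP and would also respect robots.txt and rate limits.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "site": URL -> HTML. A real crawler would
# download each page instead of reading this dict.
SITE = {
    "http://example.com/":  '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/">home</a> <a href="/b">B</a>',
    "http://example.com/b": '<a href="/a">A</a>',
}

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def crawl(start):
    """Breadth-first crawl: visit each reachable URL exactly once."""
    seen, queue = {start}, [start]
    while queue:
        url = queue.pop(0)
        parser = LinkExtractor()
        parser.feed(SITE.get(url, ""))
        for href in parser.links:
            absolute = urljoin(url, href)   # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return sorted(seen)

print(crawl("http://example.com/"))
```

The `seen` set is what lets the crawler discover new pages through links alone, which is exactly why manual URL entry is unnecessary.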
In fact, most common programming languages have open-source crawlers written in them; Nutch, Heritrix, WebSphinx and HarvestMan are among the best known.