Start spider URL (spider mode only)

In spider mode, you must specify the URL from which the indexer will start scanning. Typically, you would point this to the entrance page of your website (such as index.html), so that the spider can find the other pages on your website by following the links on each page, as a visitor would.

Also note that spider indexing mode automatically skips links to external websites, i.e. those outside the defined base URL (see below). This prevents the indexer from indexing pages outside the specified website.
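As a rough illustration of this rule (a sketch only, not the indexer's actual implementation), an internal link can be modelled as a URL that begins with the base URL. The function name and the example URLs below are hypothetical.

```python
def is_internal(url: str, base_url: str) -> bool:
    """Return True if the URL begins with the base URL (simple prefix test)."""
    return url.startswith(base_url)


# Hypothetical base URL and links, for illustration only.
base = "http://www.example.com/"
links = [
    "http://www.example.com/products/index.html",
    "http://www.other-site.example/page.html",
]

# Only links under the base URL would be followed by the spider.
internal_links = [link for link in links if is_internal(link, base)]
```

Here `internal_links` keeps only the first URL, because the second does not begin with the base URL.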

Advanced spider URL options: Clicking the "More" button brings up a window in which you can add more spider URLs or specify advanced spider crawling options. This is particularly helpful when indexing across multiple websites or domains.

[Screenshot: window for adding a spider URL]

Spidering options

For each spider URL in this list, you can specify one of the following options:

Index page and follow internal links (default) – will index the contents of the page and follow any internal links found (URLs beginning with the base URL).
Index page and follow internal and external links – will index the contents of the page and follow internal and external links, but only up to one level of external links (e.g. it will scan each external page linked from an internal page, but will not index external pages linked from other external pages).
Index single page only – will only index the contents of the specified page, and not follow any links found.
Follow links only – will only follow the links found on the specified page but will not index the content of the page itself. The spider will then index and follow the links found on the pages that are linked to from this page.
Follow all links on this page only – will follow the links found on the specified page and index the linked pages, but will not follow any further links. This indexes only one level of links; that is, only pages linked directly from this start point will be indexed.
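The differences between these options can be sketched with a small simulation over an in-memory link graph. This is a minimal model under stated assumptions, not the indexer's real behaviour: the `Mode` names, the `pages_indexed` function, and the example site are hypothetical. It also assumes that links on external pages are never followed, and that "Follow all links on this page only" skips indexing the start page itself and follows external as well as internal links from it.

```python
from collections import deque
from enum import Enum


class Mode(Enum):
    INDEX_AND_FOLLOW = "Index page and follow internal links"
    INDEX_AND_FOLLOW_EXTERNAL = "Index page and follow internal and external links"
    INDEX_SINGLE_PAGE = "Index single page only"
    FOLLOW_LINKS_ONLY = "Follow links only"
    FOLLOW_ONE_LEVEL = "Follow all links on this page only"


def pages_indexed(start, base_url, mode, site):
    """Return the URLs that would be indexed, in breadth-first visit order.

    `site` maps each URL to the list of links on that page; it stands in
    for actually fetching pages.
    """
    indexed = []
    seen = {start}
    queue = deque([(start, True)])  # (url, is_start_page)
    while queue:
        url, is_start = queue.popleft()

        # The two "follow" modes do not index the start page itself (assumption
        # for FOLLOW_ONE_LEVEL, stated in the text for FOLLOW_LINKS_ONLY).
        skip_indexing = is_start and mode in (Mode.FOLLOW_LINKS_ONLY, Mode.FOLLOW_ONE_LEVEL)
        if not skip_indexing:
            indexed.append(url)

        if mode is Mode.INDEX_SINGLE_PAGE:
            continue  # never follow links
        if mode is Mode.FOLLOW_ONE_LEVEL and not is_start:
            continue  # only the start page's links are followed
        if not url.startswith(base_url):
            continue  # assumption: links on external pages are not followed

        allow_external = mode in (Mode.INDEX_AND_FOLLOW_EXTERNAL, Mode.FOLLOW_ONE_LEVEL)
        for link in site.get(url, []):
            if link in seen:
                continue
            if link.startswith(base_url) or allow_external:
                seen.add(link)
                queue.append((link, False))
    return indexed


# Hypothetical site: a.example is the indexed site, b.example is external.
BASE = "http://a.example/"
SITE = {
    "http://a.example/index.html": ["http://a.example/p1.html", "http://b.example/x.html"],
    "http://a.example/p1.html": ["http://a.example/p2.html"],
    "http://a.example/p2.html": [],
    "http://b.example/x.html": ["http://b.example/y.html"],
    "http://b.example/y.html": [],
}

for mode in Mode:
    print(mode.value, "->", pages_indexed("http://a.example/index.html", BASE, mode, SITE))
```

In this model the external page y.html is never indexed under any option, because it is only reachable from another external page.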

You can also use this window to override the automatically determined base URL, if necessary.

 

Tip: You can specify multiple base URLs for each individual start point. This allows a start point to span multiple domains or sub-domains. For more information, see "Base URL".
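As a sketch of what multiple base URLs mean for link matching, a link can be treated as internal when it begins with any one of the base URLs. The function name and URLs are hypothetical, and the prefix test is an assumption based on the description of internal links as "URLs beginning with the base URL".

```python
def matches_any_base(url: str, base_urls: list[str]) -> bool:
    """Return True if the URL begins with at least one of the base URLs."""
    # str.startswith accepts a tuple of prefixes and tests each in turn.
    return url.startswith(tuple(base_urls))


# Hypothetical start point spanning a main domain and a sub-domain.
base_urls = ["http://www.example.com/", "http://support.example.com/"]

on_main_site = matches_any_base("http://www.example.com/about.html", base_urls)
on_sub_domain = matches_any_base("http://support.example.com/faq.html", base_urls)
elsewhere = matches_any_base("http://www.other-site.example/", base_urls)
```

Under this model, the first two links are followed as internal links and the third is skipped as external.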

Limit files for this start point

You can limit the number of files to index from this particular start point by checking this option. You can specify a global limit for all start points on the "Limits" tab of the Configuration window. Note that when both the global and the individual limits are set, both settings apply, so whichever limit is reached first (i.e. the lower of the two) will cause the indexer to stop indexing the current start point.
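The interaction between the two limits amounts to taking the lower of whichever limits are set. A minimal sketch, assuming an unset limit is represented as `None` (the function name and this representation are hypothetical):

```python
from typing import Optional


def effective_limit(global_limit: Optional[int],
                    start_point_limit: Optional[int]) -> Optional[int]:
    """Return the file limit that applies to a start point.

    None means "no limit set". When both limits are set, the lower one
    is reached first and therefore takes effect.
    """
    limits = [limit for limit in (global_limit, start_point_limit) if limit is not None]
    return min(limits) if limits else None


both_set = effective_limit(1000, 200)         # the individual limit is lower
only_global = effective_limit(1000, None)     # only the global limit is set
no_limits = effective_limit(None, None)       # neither limit is set
```

With these example values, `both_set` is 200, `only_global` is 1000, and `no_limits` is `None`.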

Weighting for this start point

This adjusts the score weighting for the pages indexed under this start point. It can be used to rank pages found from a particular start point or domain higher, or to treat them as more important, than pages from other start points. See "Weightings" for more information.

Import and export start points

You can also import additional URLs from, and export them to, a text file using the Import and Export buttons. See "Importing and Exporting additional start URLs" for more information.

The number of start points you can have in this list is limited only by the system resources available. However, the total number of pages indexed is still subject to the indexing limits (max. pages, max. unique words, etc.) specified on the "Limits" tab.