Thunderstone Search Appliance Manual

Extra URLs REX


Syntax: zero or more regular expressions (REX), separated by space or line break

Restricts the walk to fetch only URLs that match at least one of the specified regular expressions, when the Base URL also matches. An expression may match anywhere in the URL (hostname, path, or query).

If a Base URL is matched by an Extra URLs REX, then only URLs that match the Extra URLs REX will be walked on that host. If a Base URL does not match any Extra URLs REX, then it is walked as normal.

This setting is rarely used. It is most commonly combined with a hostname to fetch matching URLs on an additional host. Links to those pages must still be found during the walk for the pages to be indexed.
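The matching rule described above can be sketched roughly as follows. This is an illustration only, not the appliance's implementation: it uses Python's `re` module as a stand-in for REX (the two syntaxes differ), the function name `should_walk`, the sample URLs, and the sample expression are all hypothetical, and it assumes that a single matching Base URL is enough to restrict its host. Other inclusion/exclusion rules are not modeled.

```python
import re
from urllib.parse import urlsplit


def should_walk(url, base_urls, extra_setting):
    """Sketch of the Extra URLs REX restriction (hypothetical helper).

    extra_setting holds zero or more expressions separated by spaces or
    line breaks. Python regexes stand in for REX here.
    """
    patterns = [re.compile(p) for p in extra_setting.split()]

    def matches_any(candidate):
        # An expression may match anywhere in the URL.
        return any(p.search(candidate) for p in patterns)

    host = urlsplit(url).hostname
    for base in base_urls:
        # Assumption: the restriction applies to a host as soon as one
        # Base URL on that host matches an expression.
        if urlsplit(base).hostname == host and matches_any(base):
            return matches_any(url)
    # No Base URL on this host matched, so the host is walked as normal.
    return True


# Hypothetical sample data, not taken from the manual.
BASE_URLS = [
    "https://shop.example.com/list?region=eu",
    "https://docs.example.org/",
]
EXTRA = r"^https://shop\.example\.com/.*region=eu"

print(should_walk("https://shop.example.com/item?id=7&region=eu", BASE_URLS, EXTRA))
print(should_walk("https://shop.example.com/about", BASE_URLS, EXTRA))
print(should_walk("https://docs.example.org/guide", BASE_URLS, EXTRA))
```

With this sample data, the first Base URL matches the expression, so only matching URLs on that host pass the check, while the second host is unaffected. An empty setting leaves every URL unrestricted.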

For example, with the following Extra URLs REX:


(which matches a URL that begins with and contains supplierid=BigCo), and using the following Base URLs:

The Extra URLs REX matches the URL, so only pages with supplierid=BigCo will be walked, while all of will be walked (following other inclusion/exclusion rules).

Available from version 4.3.9.

See also Extra Domains.

Copyright © Thunderstone Software     Last updated: Dec 5 2019