Thunderstone Search Appliance Manual

Categories

Syntax: pairs of textual name and URL pattern. Additional input boxes appear as you fill in the ones provided.

The Search Appliance can create searchable sub-categories that appear in a drop-down box on the Search page. Enter the name of the category on the left and its corresponding URL pattern on the right. A URL pattern must match the entire URL (including the protocol), and may contain an asterisk (*) to indicate "anything" or a question mark (?) to indicate any single character. A category may have more than one pattern; separate multiple patterns with a space. Category names must not contain the pipe ("|") character, as it may be used to separate multiple categories in the category search parameter. A category should also not be named "Everything": the search interface already provides that option in the category selection box to search everything (i.e. any category), and it could be confused with a specific category of the same name.
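The matching rules above can be sketched in a few lines of Python. This is an illustration only, not the appliance's own implementation: the function names are hypothetical, and case-sensitive matching is an assumption, since the manual does not state how case is handled.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate one URL pattern into a compiled regular expression.

    '*' matches any run of characters (including none), '?' matches
    exactly one character, and everything else is taken literally.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def url_matches(patterns: str, url: str) -> bool:
    """Return True if url fully matches any of the space-separated patterns."""
    # fullmatch enforces the "must fully match the URL" rule, protocol included.
    return any(pattern_to_regex(p).fullmatch(url) for p in patterns.split())


def check_category_name(name: str) -> list[str]:
    """Return a list of problems with a proposed category name (empty if fine)."""
    problems = []
    if "|" in name:
        problems.append("name contains the pipe character")
    if name == "Everything":
        problems.append('name collides with the built-in "Everything" option')
    return problems
```

With this sketch, `url_matches("http://www.example.com/demos/*", "www.example.com/demos/tour.html")` is false because the URL lacks the protocol that the pattern spells out.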

The following table provides an example.

Category        URL Pattern
Demonstrations  http://www.example.com/demos/*
Manuals         http://www.example.com/manual/*
Books           http://www.example.com/a1/* http://example.com/b3/*

Table 3.2: Example Categories

This example creates a category named Demonstrations that searches only http://www.example.com/demos/ and any files under that directory, giving the user a more focused set of results. The same is true for Manuals. The Books category, however, includes pages from both the /a1 and /b3 directories. Users can then search within just one of these categories or across the entire database. A pattern should not be a single page unless you want a category containing just that page (e.g. http://www.example.com/manual/index.html or http://www.example.com/manual/ would generally be incorrect). It should typically be the prefix of a directory that has multiple pages within it, followed by an asterisk (*).

Note that URL patterns will not be used to determine categories if any Data from Field rule sets Category. Please see the Data from Field settings for more details.

For best search performance, categories that overlap one another (i.e. contain walked pages in common) should be avoided if possible. If overlapping categories are used, they should be listed with the most commonly searched first. Also, the CatnoLowest field should be selected as one of the Compound Index Fields; this is the default. These guidelines allow the Auto-detect mode to optimize the greatest number of searches for speed.
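One way to spot overlapping categories before entering them is to test the planned patterns against a sample list of URLs. The sketch below is a hypothetical offline check, not a feature of the appliance: `find_overlaps`, the "Admin Manual" category, and the sample URLs are all invented for illustration, and matching is assumed to be case-sensitive.

```python
import re
from itertools import combinations


def _matches(pattern: str, url: str) -> bool:
    """Full-URL match where '*' is any run of characters and '?' is one character."""
    regex = "".join(
        ".*" if c == "*" else "." if c == "?" else re.escape(c) for c in pattern
    )
    return re.fullmatch(regex, url, re.DOTALL) is not None


def find_overlaps(categories, urls):
    """Return {(name_a, name_b): [shared URLs]} for every overlapping pair.

    categories is a list of (name, space-separated patterns) in listing order.
    """
    members = {
        name: {u for u in urls if any(_matches(p, u) for p in patterns.split())}
        for name, patterns in categories
    }
    overlaps = {}
    for a, b in combinations(members, 2):
        shared = members[a] & members[b]
        if shared:
            overlaps[(a, b)] = sorted(shared)
    return overlaps


# Hypothetical setup: "Admin Manual" is a subset of "Manuals", so they overlap.
categories = [
    ("Manuals", "http://www.example.com/manual/*"),
    ("Admin Manual", "http://www.example.com/manual/admin/*"),
    ("Demonstrations", "http://www.example.com/demos/*"),
]
sample_urls = [
    "http://www.example.com/manual/index.html",
    "http://www.example.com/manual/admin/setup.html",
    "http://www.example.com/demos/tour.html",
]
print(find_overlaps(categories, sample_urls))
```

Any pair reported here shares at least one page. Such pairs are candidates for redefining so they no longer overlap or, failing that, for listing with the more commonly searched category first.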

Also note that changing, deleting or adding a Category and/or URL Pattern after a walk has been performed will trigger a recategorization. This procedure, which runs in the background, re-applies the category changes to the walked data. While it is faster than a full walk (pages do not need to be fetched and fully processed), it can nonetheless take some time, particularly for large walks. For best performance, wait for the recategorization to complete (it can be monitored on the Dashboard or Walk Status as a task) before starting another walk.


Copyright © Thunderstone Software     Last updated: May 24 2023