Syntax: select the Yes or No button
With this set to Yes, the Search Appliance first fetches
/robots.txt from each site being indexed and respects its
directives on which URL prefixes to skip. Turning this setting off is
generally not recommended. Supported directives in
Respect the meta tag called
robots. With this set to Yes,
the Search Appliance will process and respect the robot control information
within each retrieved HTML page.
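
As an illustration of what "robot control information" in a page usually
means, the following minimal Python sketch reads the robots meta tag from
retrieved HTML and decides whether the page may be indexed and its links
followed. It assumes the conventional noindex, nofollow and none values;
the names RobotsMetaParser and robots_meta_policy are hypothetical, and
this is not the Search Appliance's own implementation.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None when written without a value.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_meta_policy(html):
    """Return (may_index, may_follow) for one page (hypothetical helper)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives
    # Assumption: "none" is shorthand for "noindex, nofollow".
    may_index = not ({"noindex", "none"} & found)
    may_follow = not ({"nofollow", "none"} & found)
    return may_index, may_follow
```

A page with no robots meta tag yields (True, True) here, i.e. it is
treated as unrestricted.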
Whether to still put an (empty) entry - a placeholder - in the
html search table for URLs that are excluded via
<meta name="robots"> tags. Leaving a placeholder improves
refresh walks, as the URL can then have its own individual refresh
time like any other stored URL. Without a placeholder, the URL would
be fetched every time a link to it is found, because no record of its
recent fetch would be stored.
The downside to placeholders is that if the URL itself is searched
in queries - i.e. Url is part of Index Fields - then
the excluded URL might be found in results. Placeholders have empty
text fields (e.g. no body, meta, etc.) to avoid matches on text, but
the URL field must remain.
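
The trade-off described above can be sketched with a toy model in Python.
Everything in it is an assumption made for illustration: the table is a
plain dictionary, and REFRESH_SECONDS, should_fetch, record_excluded and
search_url_field are hypothetical names, not part of the Search Appliance.

```python
REFRESH_SECONDS = 3600  # hypothetical refresh interval for a stored URL


def should_fetch(table, url, now):
    """Fetch if the URL has no entry, or its entry is due for refresh."""
    entry = table.get(url)
    return entry is None or now - entry["fetched"] >= REFRESH_SECONDS


def record_excluded(table, url, now, placeholders):
    """Handle a page that its robots meta tag excludes from indexing."""
    if placeholders:
        # Text fields are empty, but the Url field and fetch time remain.
        table[url] = {"Url": url, "Body": "", "Meta": "", "fetched": now}
    # Without placeholders nothing is stored, so the fetch is forgotten.


def search_url_field(table, term):
    """Toy query that matches against the Url field only."""
    return [url for url, entry in table.items() if term in entry["Url"]]
```

With placeholders on, should_fetch returns False for a recently fetched
excluded URL, but search_url_field can still return it. With placeholders
off, the URL never appears in results, and should_fetch returns True each
time a link to it is found.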