This page describes how to tweak the ranking of the search results returned by your search engines.
- Overview
- Boosting Results with Keywords
- Changing Search Results with Labels
- Tagging Sites with Labels
- Modulating the Effects of Labels
Overview
Say that you've compiled a list of sites that you want your search engine to cover, but when you test out some queries, the search results do not quite match what you had in mind. The results that you think are most relevant to the query are not at the top of the page. Or perhaps you want to give preference to webpages from your favorite research institution or your own website. You can straighten that out by promoting or demoting results. Programmable Search Engine lets you tune results by three means: keywords, weighted labels, and scores. Keywords and weights are defined in the context file, while scores are defined in the annotations file.
- Keywords are a quick way of boosting certain webpages in your search results and getting more search results about a particular subject.
- Weighted labels tell Programmable Search Engine whether to exclude, promote, or demote a site. How much a site is promoted or demoted depends on the weights that you apply to the labels.
- Scores, which are applied to individual annotations, temper or reverse the influence of the weighted labels. They add another layer of granularity to the fine-tuning of the ranking.
Weights in labels and scores in annotations are the primary knobs and dials for changing the ranking of search results. Both have values that range from -1.0 to +1.0. You can promote and demote sites by turning the dials (increasing or decreasing values) with scores and weights.
You have strong influence over the ranking, but you do not have absolute control over the results. The promotion or demotion of results is a function of many parameters, including the relevancy of the webpage, the choice of keywords, the weight on the labels, the scores in the annotations, and so on.
Boosting Results with Keywords
Keywords are the quickest way to change results. Programmable Search Engine boosts webpages that include your keywords. It can also retrieve more search results about that subject. So if your search results seem paltry, try adding keywords. While Programmable Search Engine boosts webpages that contain those keywords, it does not demote or filter out webpages that don't contain the keywords.
Keywords are a way for you to apply the intent of your users to the search engine. For example, when users of the yoga search engine search for "mat", they are actually searching for "yoga mat", not "Miller Analogies Test" or "house mats". Think about the main focus of your search engine and the context of your users' search queries. In our search engine example, "yoga" would be an obvious keyword. Don't use keywords that are too broad or straddle too many categories. For example, "exercise" and "eastern practices" would retrieve many webpages that have nothing to do with yoga. The best keywords describe the content of the sites that your search engine covers.
Start with a single word, and see if you get the results that you want. If you don't get enough results, try using multiple keywords. You can also use phrases, which are series of words enclosed within quotation marks (for example, "yoga pose"), but single-word keywords are better. Without the quotation marks, Programmable Search Engine interprets yoga pose stretch as three keywords: "yoga", "pose", and "stretch".
Keywords are not independent from each other; they work together. So if you have the keywords "yoga" and "pose", webpages that contain "yoga" and webpages that contain "pose" get boosted, but webpages that contain both "yoga" and "pose" get boosted even more.
Example: Keywords
Let's compare search results for "mat" in two versions of a yoga programmable search engine.
Figure 1: Results for the search query "mat" from a search engine that does not use keywords.
Figure 2: Results for the search query "mat" from a search engine with the keyword "yoga".
In the version with the "yoga" keyword, webpages that contain the keyword are promoted in the results page.
Creating Keywords
You can create as many keywords as you want, as long as you don't exceed 100 characters. The easiest way to create keywords is through the Basics section of the Overview page in the Control Panel. You can use that tab to experiment, trying out different keywords and checking out their effects on the results page. If you don't like the results, you can easily remove a keyword and try another one.
If you want to create keywords in your context file, you can use the keywords attribute of the CustomSearchEngine element to define the keyword values. Separate keywords from each other using a single space. Enclose phrases in quotation marks; you can use either the punctuation mark (") or the character entity (&quot;).
<CustomSearchEngine keywords="asana &quot;yoga postures&quot;"> </CustomSearchEngine>
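To make the difference between separate keywords and a phrase concrete, the following sketch shows two alternative values for the keywords attribute. The keyword choices are illustrative only, and you would use one form or the other, not both. The second form uses the character entity so that the attribute value stays well-formed XML.

```xml
<!-- Alternative 1: three separate keywords (yoga, pose, stretch) -->
<CustomSearchEngine keywords="yoga pose stretch"> </CustomSearchEngine>

<!-- Alternative 2: one phrase (yoga pose) plus one keyword (stretch) -->
<CustomSearchEngine keywords="&quot;yoga pose&quot; stretch"> </CustomSearchEngine>
```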
Changing Search Results with Labels
The other way to change search results is with labels, which are the workhorses of search results ranking, determining how sites should be treated.
You can use two kinds of labels: search engine labels and refinement labels. Search engine labels determine which sites should be covered by the search engine. They are invisible to your users and run in the background; hence, their parent element is called BackgroundLabels. Refinement labels, on the other hand, are visible to your users and show up as links. Refinements are discussed in detail in the Refining Searches page. Most of this page focuses on search engine labels, although modes, weights, and scores operate in the same way in both search engine and refinement labels.
The following code shows the two kinds of labels in the context file:
<!--Search engine labels-->
<BackgroundLabels>
  <Label name="_include_" mode="FILTER"/>
  <Label name="_exclude_" mode="ELIMINATE"/>
</BackgroundLabels>
<!--Refinement label-->
<Facet>
  <FacetItem title="Lectures">
    <Label name="lectures" mode="BOOST" weight="0.8">
      <Rewrite>lecture OR lectures</Rewrite>
    </Label>
  </FacetItem>
</Facet>
When you first create a Programmable Search Engine using the Control Panel, Programmable Search Engine creates two search engine labels for you. The labels have modes, which determine how the sites should be treated. One of them is exclusive (mode="ELIMINATE"), and the other one is inclusive (mode="FILTER"). (You can change the mode for the inclusive label from "FILTER" to "BOOST" after creating the Programmable Search Engine.)
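As a rough sketch of that change, the fragment below shows the inclusive label switched from FILTER to BOOST. It assumes that the Control Panel generated the label names _include_ and _exclude_, as in the example above; the names in your own context file may differ.

```xml
<BackgroundLabels>
  <!-- Previously mode="FILTER"; with BOOST, tagged sites are promoted
       instead of being the only sites that are searched -->
  <Label name="_include_" mode="BOOST"/>
  <!-- Unchanged: sites tagged with this label are still excluded -->
  <Label name="_exclude_" mode="ELIMINATE"/>
</BackgroundLabels>
```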
Using Labels
To use search engine labels, do the following:
- In the context file, create or redefine search engine labels.
  - Define the label name. You can accept the name generated by the Control Panel, or you can define your own.
  - Define the mode.
  - Optional. Define the weights.
- In the annotations file, tag sites with labels.
Example: Context File with Labels
The following is a truncated example of a context file with search engine labels.
<CustomSearchEngine keywords="climate &quot;global warming&quot; &quot;greenhouse gases&quot;">
  <Title>RealClimate</Title>
  <Description>"Climate change"</Description>
  <Context>
    <BackgroundLabels>
      <Label name="_include_" mode="FILTER"/>
      <Label name="_exclude_" mode="ELIMINATE"/>
    </BackgroundLabels>
  </Context>
</CustomSearchEngine>
Defining the Mode of the Label
Whether a site is promoted, demoted, or excluded depends on the search engine label it is associated with. A search engine label can have the following modes:
Note: Follow the capitalization. Use uppercase letters for the modes.
Mode | Does the following... | Use this mode if... |
---|---|---|
ELIMINATE | Excludes sites tagged with this label from your search engine. | You want to exclude webpages that rank highly on Google search but are not that great for your audience. For example, if you are creating a search engine for the scientific study of hamsters, you would use labels with the ELIMINATE mode to exclude the sites that do not suit that audience. |
FILTER | Includes only sites tagged with this label, and excludes everything else. | You want the search engine to search only your site, affiliated sites, or sites that focus on a particular subject. Because the coverage of such search engines is restricted to a handful of sites, you can have more precise control over the ranking of the search results. Changing the order of the search results using weights is discussed in the next section. For example, if you want to create a search engine just for your website, have a single site tagged with a label that has the FILTER mode. |
BOOST | Includes all websites in your search engine, but promotes or demotes sites with this label. How much a site is promoted or demoted depends on the weight you assign to it. | You want a broad search engine that emphasizes some sites but does not exclude other sites altogether. For example, if you want to create a search engine with a wide coverage, but you are partial to your own website (the best website ever!), use labels with the BOOST mode. |
Creating Weighted Labels
Once you have labels that include, promote, or exclude sites, you can assign weights to the inclusive labels. Weights let you define how much a label should promote or demote a tagged site. The values for weights can range from -1.0 to +1.0, which gives you fairly refined control over sites. A positive weight in the label emphasizes sites tagged with it, while a negative weight de-emphasizes them.
The following code shows a weighted label:
<BackgroundLabels>
  <Label name="_include_" mode="FILTER" weight="0.65"/>
  <Label name="_exclude_" mode="ELIMINATE"/>
</BackgroundLabels>
Boost and filter labels that do not have defined weights, such as those generated by Programmable Search Engine, have a default weight of +0.7. So if you want to strengthen the generated label's ability to promote sites, change the value to something greater than +0.7. If you change the value to something lower than the default, you weaken the label's boosting effect on the ranking of the site. When you go the other way and assign a negative weight to the label, that label will demote or suppress a site. As you approach -1.0, it gets increasingly hard for sites to have a high ranking in the results. At -1.0, even a highly ranked site will have a hard time overcoming the strong demotion.
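As an illustration of overriding the default, the sketch below raises the weight of a generated inclusive label above +0.7. The value 0.9 is an arbitrary choice for illustration, and the sketch assumes that the label's mode has been set to BOOST.

```xml
<BackgroundLabels>
  <!-- Omitting the weight attribute would leave the default of +0.7;
       0.9 strengthens the promotion of tagged sites -->
  <Label name="_include_" mode="BOOST" weight="0.9"/>
  <Label name="_exclude_" mode="ELIMINATE"/>
</BackgroundLabels>
```

Replacing 0.9 with a lower positive value such as 0.3 would weaken the promotion, and a negative value such as -0.8 would turn the label into a demotion.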
The following table demonstrates how results are adjusted based on the mode and weight of a label.
Mode | Weight | Effect |
---|---|---|
BOOST | +1.0 | Gives the site a big promotion. However, it does not necessarily mean that the tagged site will be the top result at all times, nor that other sites will be excluded. It is not the same as setting the mode to FILTER. Results could still be shown even when none of them matches the label. And results that are significantly more relevant to the search query can still trump your heavily favored but irrelevant sites. If you feel strongly that the sites you tag with heavily weighted labels should be the top results to the exclusion of all other results, you should use a filter label instead of a boost label. |
BOOST | -1.0 | Gives the site a big demotion. This is not the same as setting the mode to ELIMINATE, because results that are deeply relevant might still be shown. The site will have an uphill battle to get a fairly high ranking, but it is not blocked out completely. |
BOOST | Undefined | If you do not define the weight (for example, <Label name="standard" mode="BOOST"/>), it has an implicit weight of +0.7. |
FILTER | +1.0 | Gives the selected site a big promotion. When the mode is set to FILTER, Programmable Search Engine will show only sites that match the label. So if none of your selected sites is relevant to the user query, no result will be displayed. |
FILTER | -1.0 | Effectively blocks the selected site from the results. It is as though you have tagged the site with an eliminate label. |
FILTER | Undefined | If you do not define the weight (for example, <Label name="standard" mode="FILTER"/>), it will have an implicit weight of +0.7. |
ELIMINATE | No weight | Blocks the site. Sites that match the label will not be shown. If all relevant results happen to have an eliminate label, you could have an empty results page. This is more likely to happen with filter-type search engines, not boost-type search engines. |
You can create multiple labels of varying weights, and apply them to sites as you see fit. For example, you might want to create a label that strongly promotes sites and another that mildly promotes sites. You can create as many weighted labels as you want, but after a certain point, they can become hard to manage. A better way to control the ranking of sites at a more granular level is through scores, which are discussed in the next section.
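As a minimal sketch of the multiple-label approach described above, the fragment below defines one label that strongly promotes sites and another that mildly promotes them. The label names and weight values are hypothetical.

```xml
<BackgroundLabels>
  <!-- Strong promotion for the sites you favor most -->
  <Label name="example_boost_strong" mode="BOOST" weight="0.9"/>
  <!-- Mild promotion for sites that are useful but less central -->
  <Label name="example_boost_mild" mode="BOOST" weight="0.3"/>
  <Label name="_exclude_" mode="ELIMINATE"/>
</BackgroundLabels>
```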
Tagging Sites with Labels
Once you have defined labels, you can start tagging sites with them. Each annotation can have multiple labels, which means that the same site can be used in other search engines and be ranked differently.
<Annotations>
  <Annotation about="webcast.berkeley.edu/*" score="1">
    <Label name="cse_university_boost_highest"/>
    <Label name="cse_bicycles_exclude"/>
    <Label name="cse_hamsters_filter"/>
  </Annotation>
</Annotations>
Modulating the Effects of Labels
Scores let you modulate the influence of labels. They can dampen or reverse the effects of the labels on specific sites. The score attribute of the Annotation element can have a value that ranges from -1.0 to +1.0. A score of 0 removes the influence of the label over the ranking of the site; a score of 1 applies the full influence; a score of -1 completely reverses the effects. Values between 0 and 1 or between -1 and 0 (for example, 0.55) are for fine-tuning the influence of the labels. If you do not assign a score to an annotation, Programmable Search Engine applies the full effect of the label to the site. It is as though you have assigned it a score of 1.
The following table demonstrates how scores can adjust the influence of labels:
Mode | Weight | Score | Effect |
---|---|---|---|
Any | Any | None | The same as giving the annotation a score of 1.0. The label is applied to the site in full. |
BOOST | +1.0 | -1.0 | The same as reversing the BOOST label and giving it a weight of -1.0. It aggressively demotes the site. |
BOOST | -1.0 | -1.0 | The same as reversing the BOOST label and giving it a weight of +1.0. It aggressively promotes the site. |
FILTER | +1.0 | -1.0 | The same as tagging the site with an ELIMINATE label. It completely excludes the site. |
FILTER | -1.0 | -1.0 | The same as reversing the FILTER label and giving it a weight of +1.0. It aggressively promotes the site. |
ELIMINATE | No weight | -1.0 | The same as converting the ELIMINATE label into a filter label with a score of +1.0. It aggressively promotes the site. |
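The neutral and reversed cases can be sketched in an annotations file as follows. The label name example_boost and the site patterns are hypothetical, and the sketch assumes that the context file defines example_boost as a BOOST label with a weight of +1.0.

```xml
<Annotations>
  <!-- No score attribute: treated as score="1", so the label applies in full -->
  <Annotation about="www.example.com/*">
    <Label name="example_boost"/>
  </Annotation>
  <!-- score="0": removes the label's influence on this site's ranking -->
  <Annotation about="news.example.com/*" score="0">
    <Label name="example_boost"/>
  </Annotation>
  <!-- score="-1": reverses the label, so this site is aggressively demoted -->
  <Annotation about="forum.example.com/*" score="-1">
    <Label name="example_boost"/>
  </Annotation>
</Annotations>
```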
Example: Code for Score
In the following example, we have three sites tagged with the same search engine label. However, the effects of the label are not uniform across the three different sites because each annotation has a different score, applying the label with different intensities.
<Annotations>
  <Annotation about="*.edu/*" score="0.0001">
    <Label name="vision_label"/>
  </Annotation>
  <Annotation about="*.ucsd.edu/*" score="0.7">
    <Label name="vision_label"/>
  </Annotation>
  <Annotation about="*.vision.ucsd.edu/*" score="1">
    <Label name="vision_label"/>
  </Annotation>
</Annotations>
Even though all three annotations have the vision_label tag, Programmable Search Engine treats them differently on account of their scores. Results from vision.ucsd.edu are heavily favored; those from ucsd.edu are moderately favored; and those from the .edu top-level domain are slightly favored over other sites.