This service genre has been
authored in order to support content discovery applications relying on
search functionality, and provides an overview of the essential
behaviours and design concerns involved.
|
| Domain(s) |
[ ] Learning & Teaching |
[ ] Research
[ ] Libraries |
[ ] Administration
[ ] IT Services |
[X] Common |
| Maturity |
[X] Immature |
[ ] Mature |
| Development Scale |
[X] Isolated |
[ ] Ubiquitous |
| Status |
[ ] Approved |
[ ] Placeholder
[ ] Unapproved |
[ ] Superseded
[ ] Withdrawn |
| Confidence Level |
[ ] High |
[ ] Medium |
[ ] Low |
|
-
e-Framework Service Genre Version: v1.0
|
| Version |
Date |
Author |
Description |
Organisation / Project |
v0.1
|
2007-10-15
|
Daniel Rehak
|
Initial Version hdl:FREDNA/819843C836A649E4AFE2B82186AB5D60 |
FRED
|
v0.11
|
2007-03-12
|
Nick Nicholas
|
Editorial: removed some redundancies
|
FRED |
v0.12
|
2007-03-12 |
Daniel Rehak |
Editorial
|
FRED |
v0.13
|
2007-03-13
|
Nick Nicholas |
Inserted definition of indexing
|
FRED |
v0.21
|
2007-03-26
|
Nick Nicholas
|
Addressed queries & rewording from Nigel Ward, added classification,
defined flow control and persistent results; "resource" is now changed
to "data source" |
FRED |
v0.25
|
2007-03-27
|
Daniel Rehak
|
Editorial. Review and comments. Structure section and diagram added.
|
FRED |
v0.26
|
2007-05-21 |
Nick Nicholas |
Finalised
|
FRED |
v1.0.0
|
2007-05-21 |
Daniel Rehak |
Submission to the e-Framework |
FRED |
|
|
Search is the process by which an application presents a query to a
data source and the data source responds by returning information
corresponding to objects in the collection that match the query
criterion. A service interface for the data source provides a mechanism
for the external agent (accessor) to send the query to the data source
to get the set of query results. The returned set of results is
generally made available to an external user or application. The
returned results may be the entire objects matching the query, but are
more typically information about the matching objects. This information
may include object locators, which allow the search service to be used
together with an obtain service that allows access to the full objects.
How the search service represents queries and how it matches the query
criterion are specified within service expressions that specialize this
service genre.
This is a general description of a search service genre, independent
of application end point, data source, query language, or underlying
communications protocols and service models. The service genre does not
include a mechanism to authenticate clients. The service genre may be
used in conjunction with authorization methods to control the return
and filtering of results. Such authorization may be included in a
service expression extending this service genre, or as part of a
service usage model that combines the authorization service genre with
the search service genre. If the query results only include object
identifiers and not object locators, the service genre can also be used
in a service usage model with resolution processes (including
appropriate copy provisions) to obtain an object given an object
identifier.
The service genre description that follows uses e-Framework service
genre description elements as of 2006-12-09, updated to include draft
e-Framework service classifications. Other terms, e.g., client,
provider, data source, are used as defined in the e-Framework.
Items are tagged and identified using names assigned by the FRED
project. Formal e-Framework names will be assigned by the e-Framework.
The search service genre
provides the mechanism to query a data source to find objects matching
the query. It is an example of a request-response process where a
single request may return a composite or multi-part response. It may be
part of a stateful workflow. (Stateful behaviour is defined by the
e-Framework Classification Scheme for SUMs.) The service genre may be
used to send a query to a single or to multiple data sources through
the single search service end point; if these data sources are hosted
as distinct entities (e.g., as separate repositories), the function
provided by the service genre is termed federated search.
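As a rough illustration of the federated case, the sketch below fans one query out to several data sources behind a single end point and concatenates the results. The names `DataSource`, `make_source` and `federated_search`, and the in-memory sources, are hypothetical and chosen only for this example; the service genre itself does not prescribe any API or merging strategy.

```python
from typing import Callable, Dict, List

# Hypothetical type: a data source is modelled as a function that takes a
# query string and returns a list of result records (plain dicts here).
DataSource = Callable[[str], List[dict]]


def make_source(name: str, titles: List[str]) -> DataSource:
    """Build a toy in-memory data source that matches on a title substring."""
    def search(query: str) -> List[dict]:
        return [
            {"source": name, "title": t}
            for t in titles
            if query.lower() in t.lower()
        ]
    return search


def federated_search(query: str, sources: Dict[str, DataSource]) -> List[dict]:
    """Send one query to every data source and concatenate the results."""
    results: List[dict] = []
    for name in sorted(sources):  # fixed order keeps the output deterministic
        results.extend(sources[name](query))
    return results


sources = {
    "repo_a": make_source("repo_a", ["Intro to Algebra", "Linear Algebra"]),
    "repo_b": make_source("repo_b", ["Algebraic Topology", "Organic Chemistry"]),
}
hits = federated_search("algebra", sources)
```

A real federation would typically also deduplicate, rank or merge results; those steps are left out here to keep the sketch small.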
The data source is assumed to be a collection of data objects, each of
which is discoverable through a set of attributes (keywords, labels,
object text) collectively denoted search terms. These search terms are
used in the matching or query process to determine the results set,
i.e., the objects that match the query are determined based on a
comparison between the objects and the search terms. The prime use of
this service genre is identifying one or more of the objects in the
collection that match the query.
This service genre focuses
on discovery of information from repositories, registries and other
similar data collections of discoverable content objects. Typically,
the data retrieved will be information about the matching objects in
the repository being searched (not the entire object). The search may
specify what type of information is to be returned (e.g., identifier,
metadata, summary, full object) for the objects in the results.
Searching is used to discover content objects and present information
about them to external applications and end users. The service genre
provides a discovery interface for any content repository or registry
that supports search, or that is part of a federation that supports
search. Combined with a service such as the obtain service, the service
genre provides the mechanism to retrieve the discovered object.
The
service genre may be specialized in service expressions to obtain
particular types of information about the objects in the results, to
specify query language or to specify communications, messaging and
transport protocols.
As defined, the search service genre
is not access controlled; i.e., any client may attempt to contact a
search service end point. There are no authentication controls. The
service end point for the data source is responsible for determining
what results it will return and from which clients it will accept
requests.
|
|
Query Functions. These functions are used to query the data source and obtain the results set.
Requests SHALL specify:
- A query defined in a query language.
Requests MAY specify:
- The query language being used in the request.
- The desired format or digital representation for the result, e.g., the schema of the objects in the results set.
- Filters
to be applied to the results set before it is returned to the client.
For example, the filter may specify: selection of parts of the result
set, formatting the result set, sorting the result set, grouping
similar results, ranking, merging. (Filters are defined in the
Structure element.)
Mechanisms MAY
exist to provide flow control and persistence of results so that large
results sets are returned in chunks or so that queries MAY request a
part of a results set.
The query functionality MAY be
split across multiple behaviours and requests. Some service expressions
that specialize the service genre MAY manage control information
separately from the query request while others MAY combine the control
and query request into a single behaviour. Separating these behaviours
implies that a query request is stateful.
In federated
search, the specification of which subset of data sources to query is
assumed to be a part of the query specification or the control data,
not a separate function.
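One possible way to picture the mandatory and optional parts of a query request is the sketch below. The class `SearchRequest` and all of its field names are assumptions made for illustration, as are the example values such as `"dublin-core-xml"` and `"sort:title"`; a service expression would define the real request format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SearchRequest:
    """Hypothetical request shape; field names are illustrative only."""
    query: str                              # SHALL: the query itself
    query_language: Optional[str] = None    # MAY: e.g. "CQL"
    result_format: Optional[str] = None     # MAY: schema of returned records
    filters: List[str] = field(default_factory=list)       # MAY: e.g. sorting
    data_sources: List[str] = field(default_factory=list)  # MAY: federated subset

    def __post_init__(self) -> None:
        # The query is the only mandatory element of a request.
        if not self.query.strip():
            raise ValueError("a search request SHALL specify a query")


# A minimal request carries only the query.
minimal = SearchRequest(query='title = "algebra"')

# A fuller request also names the query language, format, filters and sources.
detailed = SearchRequest(
    query='title = "algebra"',
    query_language="CQL",
    result_format="dublin-core-xml",
    filters=["sort:title"],
    data_sources=["repo_a"],
)
```

Here the subset of data sources travels inside the request, in line with the assumption that federated search does not need a separate function for it.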
Search Description Functions.
These functions are used to get descriptive information about the
capabilities of the service end point. This information enables a
client to successfully communicate with and obtain information about
the service end point that is providing the interface to the data
source. The information SHOULD be machine processible, so that a
service can formulate a query based on search description information
without human intervention. Description information gathered MAY
include:
- Description of the data source at the service end point.
- Description (machine processible) of communications and transport protocols supported.
  - Protocol information SHOULD include version numbers.
- Description of information that can be included in a query request
(e.g., search terms). Schema and digital representation format
information SHOULD include version numbers.
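The value of a machine-processible description can be sketched as follows: a client reads the advertised search terms and formulates a query without human intervention. The dictionary layout, the function `build_query`, and every value in `description` (including the version strings) are placeholders assumed for this example, not part of the service genre.

```python
from typing import Dict, List

# Hypothetical description of a service end point; all values are placeholders.
description: Dict[str, object] = {
    "data_source": "Example repository",
    "protocols": [{"name": "SRU", "version": "1.2"}],
    "query_languages": ["CQL"],
    "search_terms": ["title", "creator", "subject"],
    "result_formats": [{"name": "dublin-core-xml", "version": "1.1"}],
}


def build_query(desc: Dict[str, object], term: str, value: str) -> str:
    """Formulate a simple query, refusing terms the end point does not advertise."""
    allowed: List[str] = list(desc["search_terms"])  # type: ignore[arg-type]
    if term not in allowed:
        raise ValueError(f"search term {term!r} is not supported")
    return f'{term} = "{value}"'


query = build_query(description, "title", "algebra")
```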
No other functionality is defined. The functionality that is defined MAY be extended.
|
|
In the simplest case, an end user
(or a system acting on behalf of the end user) wishes to ascertain what
objects are available from the data source matching a given search
criterion. The requestor issues the request, and receives a response in
terms of metadata describing the matching objects. The requestor may
process the metadata further, to identify particular objects of
interest. This further processing may take the form of another search
(operating on the data source and constraining the previous query; or a
local search on the retrieved data). Alternatively, the results may be
presented to the end user to enable browsing through the search
results. The requestor may then use data from the retrieved metadata
descriptions (typically an identifier or locator) to trigger a distinct
service of obtaining the objects identified by the search. A typical
usage workflow is:
1. Make search request
2. Retrieve set of metadata for matching objects
3. (Optional) Identify objects of interest through metadata description
4. (Optional) Obtain objects of interest through identifiers and locators included with metadata description
This scenario may be enhanced as follows:
- The search returns the matching objects themselves, rather than metadata describing them. Step 4 is unnecessary.
- The
metadata descriptions retrieved are filtered, so that only specific
fields are returned. For instance, only the locators of the objects
concerned are returned, so that the requestor may proceed directly to
obtaining them. If enough information is present in the filtered
record, Step 3 is still possible.
- The complete set
of matching records is not retrieved in the one request, due to
processing or user constraints. The retrieval of matching records is
spread out over several request–response pairs, amounting to
repetitions of Steps 2–4; the result set persists over the series of
requests. Each response contains a resumption token, which is passed
back to the data source to retrieve the next subset of matching
records: for example, the response containing the 100th through 149th
matching result records (object descriptions) contains a resumption
token, which is used to request the next 50 records (the 150th through
199th matching result records). The resumption token enables flow
control.
- The requestor may not already know what
information is available from the provider, including what attributes
the data source may be queried by, what the possible formats for
results are, what the possible query languages are, etc. Preparatory to
making the search request, the requestor queries the data source for
metadata about its search interface, in order to formulate a search
request accurately.
|
|
As defined, the service genre may
request any digital representation available from the data source; it
does not define behaviour if the specified digital representation is
not available.
This service genre does not define behaviour when the data source requires authentication to permit search.
This service genre does not define behaviour when the data source requires authorization or access controls to permit search.
This
service genre does not define behaviour when the search service end
point attempts to apply a filter relying on information not available
from the data source. For instance, if a filter relies on authorization
policies or rules that are not communicated to the data source through
the query interface, the behaviour is not defined.
This service genre does not define behaviour if communications need to be secure.
|
|
The format and definitions for
requests and responses SHALL be defined by the service expressions that
specialize the service genre. Requests and behaviours SHALL meet the
following conditions:
- At least one request providing a query function SHALL be defined.
- The request SHALL process the defined query language.
- The query language MAY be profiled.
- The request SHOULD include mechanisms to specify or control the format or content of results sets.
- The request SHOULD include mechanisms to specify the part of the results set to be returned.
- Responses SHOULD include flow control.
- At least one request providing a search description function SHOULD be defined.
- The response SHALL return basic metadata about the target data source.
- The response SHOULD include protocol and result format information.
- Responses SHALL include error indicators or other needed control information.
- Indicators SHALL be available for the requests as a whole (e.g., malformed query).
- Separate indicators SHALL be used to describe the availability of the results.
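The last three conditions, an indicator for the request as a whole kept separate from an indicator for the availability of results, might be modelled as in the sketch below. The enum members, the `SearchResponse` class and the `respond` helper are all assumed names for illustration; service expressions would define the actual indicators.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class RequestStatus(Enum):
    """Indicator for the request as a whole (hypothetical values)."""
    OK = "ok"
    MALFORMED_QUERY = "malformed-query"


class ResultStatus(Enum):
    """Separate indicator for the availability of results (hypothetical values)."""
    COMPLETE = "complete"        # all matching records are in this response
    PARTIAL = "partial"          # more records can be requested
    UNAVAILABLE = "unavailable"  # e.g. a persisted result set has expired


@dataclass
class SearchResponse:
    request_status: RequestStatus
    result_status: Optional[ResultStatus] = None
    records: List[dict] = field(default_factory=list)


def respond(query: str, matches: List[dict], limit: int) -> SearchResponse:
    """Build a response for already-computed matches, at most `limit` records."""
    if not query.strip():
        # The request itself is unusable, so no result indicator is set.
        return SearchResponse(RequestStatus.MALFORMED_QUERY)
    status = ResultStatus.COMPLETE if len(matches) <= limit else ResultStatus.PARTIAL
    return SearchResponse(RequestStatus.OK, status, matches[:limit])
```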
|
|
The model for a client to
interact with a service implementation end point SHALL be defined by
the service expression that specialises the service genre.
|
|
A schematic of the service genre
is shown below. The service endpoint provides the interfaces for the
requests and responses for query functionality and search description
functionality. Internally the service implementation MAY include
various features (indexing, flow control, persistence layer, etc)
needed to provide the defined functionality. The service genre
interfaces with the data source holding the object collection and the
search description data.

If
the response to a search request is too large to fit in a single
response, then mechanisms are used to retrieve the response from the
data source over a series of requests, each request asking for another
subset of the set of matching objects. The set of objects matching the
query needs to persist on the data source over the duration of the
series of requests (persistent results). Moreover, the sequence of
requests requesting subsets of the set needs to be coordinated, so that
the entire set is ultimately retrieved (flow control). The data used to
coordinate the series of requests is termed control information.
Search
results may be passed through a filter before being passed back to the
user. A filter specifies a data transformation which is applied to the
response prepared to be delivered to the user. The user receives the
transformed version of the response, rather than the original response.
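Treating a filter as a data transformation applied to the prepared response can be sketched with plain functions. The helper names `select_fields`, `sort_by` and `apply_filters`, and the sample records, are hypothetical; they only illustrate that the user receives the transformed records rather than the originals.

```python
from typing import Callable, List

Record = dict
Filter = Callable[[List[Record]], List[Record]]


def select_fields(*names: str) -> Filter:
    """Filter that keeps only the named fields of each record."""
    return lambda records: [{k: r[k] for k in names if k in r} for r in records]


def sort_by(key: str) -> Filter:
    """Filter that sorts records by the given field."""
    return lambda records: sorted(records, key=lambda r: r[key])


def apply_filters(records: List[Record], filters: List[Filter]) -> List[Record]:
    """Apply each filter in order to the response prepared for the user."""
    for f in filters:
        records = f(records)
    return records


raw = [
    {"id": "b", "title": "Zoology", "size": 3},
    {"id": "a", "title": "Algebra", "size": 5},
]
# Sort first, then drop the fields the user did not ask for.
filtered = apply_filters(raw, [sort_by("title"), select_fields("id", "title")])
```

The order of filters matters in this sketch: sorting on a field has to happen before a later filter removes that field.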
|
|
None. No standards are directly
applicable to the service genre as a whole. The service expression that
specialises the service genre SHALL be defined in terms of standards:
- Service expressions SHALL specify applicable query language standards or query language standard profiles (e.g., CQL).
- Service expressions SHALL specify an applicable data model of what is
searchable (i.e., what the allowed search terms are; e.g., Dublin Core).
- Service expressions SHALL specify applicable query communications interfaces (e.g., SRW/SRU, Z39.50).
- Service expressions SHALL specify applicable data encoding and representations
for the returned objects (e.g., XML representation of LOM).
- Service expressions SHALL specify applicable communications, encoding and transport protocols.
|
|
The following design decisions apply to the service expressions that specialize the service genre.
Design:
- The service expression MAY include flow control
for managing results. The service expression SHALL define whether flow
control is supported, and limits on request size, results sets size and
result set persistence.
- The service expression MAY
include the specification of the communications protocol as part of its
definition (e.g., as in SRW) or it MAY layer the functions on top of
another defined communications protocol (e.g., using SQI as the
communications and control protocol).
- The service expression SHOULD clearly and cleanly separate service search description functions from query functions.
Consistency:
- The service implementation SHOULD ensure that all
discoverable objects and all defined data formats are discoverable and
included in the results set, i.e., if an object is in the data source,
it should be searchable; if a data format is specified, the result set
should be expressible in the defined format.
- The service implementation SHOULD ensure that, once an object is included in
a persistent result set, the object description remains available
from the data source in the data format specified throughout the
lifetime of the result set. If all objects included remain available
throughout, the result set is referred to as consistent.
- The service implementation is not required to ensure consistency of results across
stateful queries. For example, if a result set is large and it is
returned in chunks across multiple steps, an object that is part of the
result set may be deleted from the data source but this deletion may
not be reflected in the result set.
- The timing of
updates and transactions on the data source MAY impact search requests
in a way that would omit objects from the results set.
Performance:
- A service implementation SHALL be capable of handling simultaneous requests from different clients.
- A service implementation SHOULD implement an indexing scheme or equivalent method to permit efficient discovery.
- Load balancing SHOULD be implemented for large data source collections or those which are searched frequently.
|
|
Security and Privacy Considerations:
- Service implementations may be subject to denial-of-service attacks.
- Care should be taken to maintain privacy of any personal data or other records that may disclose usage patterns.
- There are no authorization or authentication controls. Care should be taken to maintain data privacy.
|
|
Actual: None
Potential:
The service genre could be used in a service usage model for federated
metadata repositories. A client would discover the existence of an
object through a federated metadata registry. The client could then use
the obtain service to retrieve the identified object from the source
repository.
Potential: The service genre could be part of a generic repository access and management service usage model.
Potential Service Expressions: Specializations of the service genre include:
- search data source—sru basic: Search a repository using SRU as defined (CQL, http).
- search data source—srw basic: Search a repository using SRW as defined (CQL, SOAP).
- search data source—srw ws: Search a repository using an SRW web service (CQL, WSDL over SOAP; based on WS-I).
- search
data source open search google ajax: Search a repository using Open
Search as defined with the Google query language used in the Google
AJAX API.
- search federation—sqi basic: Federated search using SQI.
|
- FRED Service Usage Model - Registry federation - Search (genre) is a part of the registry federation service usage model
(genre based) and is provided to discover objects from a registry for
an end user or end user application.
- FRED Service Usage Model: roap obtain - The roap obtain service usage model (genre based) is a combination of
four services used to access an object given an object identifier. The
identifier is obtained through a search request, so that a search
service genre is combined with the roap obtain service usage model to
retrieve an object discovered through search. A request, in the form of
an identifier, is resolved [R] (including multiple resolution based on
a FRBR model to obtain the appropriate work, expression, manifestation
or copy) yielding the source data source that manages the object. An
obtain accessor [O] is used to get the object, subject to access
control filtering [A]. The resulting object is encoded and packaged in
the requested format [P]. Search is used as a precursor to the roap
process to discover the object identifier. Selected object identifiers
are then passed through the roap workflow to obtain the object.
- UK HE Admissions Structured Personal Statement v1.2
|
|
Actual: None as documented.
Potential: roap plus search service.
|
|
None
|
The
words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT,
RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted
as described in [RFC 2119].
The Service Genre description uses
e-Framework Service Genre description elements as of 2006-12-09,
updated to include draft e-Framework service classifications. Other
terms, e.g., client, provider, and data source, are used as defined in the
e-Framework. |