Name
 
  • Name: Search
Rationale
 

This service genre has been authored in order to support content discovery applications relying on search functionality, and provides an overview of the essential behaviours and design concerns involved.

Version
 
  • e-Framework Service Genre Version: v1.0
Description
 

Search is the process by which an application presents a query to a data source and the data source responds by returning information corresponding to objects in the collection that match the query criterion. A service interface for the data source provides a mechanism for the external agent (accessor) to send the query to the data source to get the set of query results. The returned set of results is generally made available to an external user or application. The returned results may be the entire objects matching the query, but are more typically information about the matching objects. This information may include object locators, which allow the search service to be used together with an obtain service that provides access to the full objects. How the search service represents queries and how it matches the query criterion are specified within service expressions that specialize this service genre.

This is a general description of a search service genre, independent of application end point, data source, query language, or underlying communications protocols and service models. The service genre does not include a mechanism to authenticate clients. The service genre may be used in conjunction with authorization methods to control the return and filtering of results. Such authorization may be included in a service expression extending this service genre, or as part of a service usage model that combines the authorization service genre with the search service genre. If the query results only include object identifiers and not object locators, the service genre can also be used in a service usage model with resolution processes (including appropriate copy provisions) to obtain an object given an object identifier.

The service genre description that follows uses e-Framework service genre description elements as of 2006-12-09, updated to include draft e-Framework service classifications. Other terms, e.g., client, provider, data source, are used as defined in the e-Framework.

Items are tagged and identified using names assigned by the FRED project. Formal e-Framework names will be assigned by the e-Framework.

The search service genre provides the mechanism to query a data source to find objects matching the query. It is an example of a request-response process where a single request may return a composite or multi-part response. It may be part of a stateful workflow. (Stateful behaviour is defined by the e-Framework Classification Scheme for SUMs.) The service genre may be used to send a query to a single or to multiple data sources through the single search service end point; if these data sources are hosted as distinct entities (e.g., as separate repositories), the function provided by the service genre is termed federated search.

The data source is assumed to be a collection of data objects, each of which is discoverable through a set of attributes (keywords, labels, object text) collectively denoted search terms. These search terms are used in the matching or query process to determine the results set, i.e., the objects that match the query are determined based on a comparison between the objects and the search terms. The prime use of this service genre is identifying one or more of the objects in the collection that match the query.
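
The matching step described above can be illustrated with a minimal sketch. It assumes an in-memory collection and a plain keyword query in which every query word must appear among an object's search terms; the genre itself leaves the query language and matching rules to service expressions, and the names `DataObject` and `match` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """A discoverable object; the field names are illustrative assumptions."""
    identifier: str
    keywords: set[str] = field(default_factory=set)
    text: str = ""

    def search_terms(self) -> set[str]:
        # Search terms = keywords plus the words of the object text, lower-cased.
        return {k.lower() for k in self.keywords} | set(self.text.lower().split())


def match(collection: list[DataObject], query: str) -> list[str]:
    """Return identifiers of objects whose search terms contain every query word."""
    wanted = set(query.lower().split())
    return [obj.identifier for obj in collection if wanted <= obj.search_terms()]


collection = [
    DataObject("obj-1", {"Physics"}, "Introductory mechanics notes"),
    DataObject("obj-2", {"Chemistry"}, "Organic chemistry notes"),
]
results = match(collection, "notes physics")
```

Here the results set holds identifiers only, which corresponds to the typical case where information about matching objects is returned rather than the objects themselves.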

This service genre focuses on discovery of information from repositories, registries and other similar data collections of discoverable content objects. Typically, the data retrieved will be information about the matching objects in the repository being searched (not the entire object). The search may specify what type of information is to be returned (e.g., identifier, metadata, summary, full object) for the objects in the results. Searching is used to discover content objects and present information about them to external applications and end users. The service genre provides a discovery interface for any content repository or registry that supports search, or that is part of a federation that supports search. Combined with a service such as the obtain service, the service genre provides the mechanism to retrieve the discovered object.

The service genre may be specialized in service expressions to obtain particular types of information about the objects in the results, to specify query language or to specify communications, messaging and transport protocols.

As defined, the search service genre is not access controlled; i.e., any client may attempt to contact a search service end point. There are no authentication controls. The service end point for the data source is responsible for determining what results it will return and from which clients it will accept requests.

Functionality
 

Query Functions. These functions are used to query the data source and obtain the results set.

Requests SHALL specify:

  • A query defined in a query language.

Requests MAY specify:

  • The query language being used in the request.
  • The desired format or digital representation for the result, e.g., the schema of the objects in the results set.
  • Filters to be applied to the results set before it is returned to the client. For example, the filter may specify: selection of parts of the result set, formatting the result set, sorting the result set, grouping similar results, ranking, merging. (Filters are defined in the Structure element.)
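
As an illustration of the mandatory and optional request elements listed above, the following sketch models a request as a small data class. The class name, field names and example values are assumptions made for illustration; the genre does not define a concrete request format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SearchRequest:
    """Hypothetical request shape; the genre does not fix field names."""
    query: str                              # SHALL be present
    query_language: Optional[str] = None    # MAY be present, e.g. "CQL"
    result_format: Optional[str] = None     # MAY be present, e.g. a schema name
    filters: list[str] = field(default_factory=list)  # MAY be present

    def __post_init__(self) -> None:
        # Only the query is mandatory, so it is the only field validated here.
        if not self.query.strip():
            raise ValueError("a search request SHALL specify a query")


minimal = SearchRequest(query="title = dinosaur")
full = SearchRequest(
    query="title = dinosaur",
    query_language="CQL",
    result_format="dublin-core",
    filters=["sort:title", "fields:identifier,title"],
)
```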

Mechanisms MAY exist to provide flow control and persistence of results so that large results sets are returned in chunks or so that queries MAY request a part of a results set.

The query functionality MAY be split across multiple behaviours and requests. Some service expressions that specialize the service genre MAY manage control information separately from the query request while others MAY combine the control and query request into a single behaviour. Separating these behaviours implies that a query request is stateful.

In federated search, the specification of which subset of data sources to query is assumed to be a part of the query specification or the control data, not a separate function.
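
A rough sketch of this arrangement follows, assuming that the subset of data sources is passed as an optional part of the request. The source names, the record contents and the `federated_search` function are hypothetical.

```python
from typing import Callable, Optional

# Each data source is modelled as a function from a query to result records.
# The source names and record contents below are made up for illustration.
Source = Callable[[str], list[dict]]

SOURCES: dict[str, Source] = {
    "repo-a": lambda q: [{"id": "a:1", "title": f"{q} in A"}],
    "repo-b": lambda q: [{"id": "b:7", "title": f"{q} in B"}],
    "repo-c": lambda q: [],
}


def federated_search(query: str, targets: Optional[list[str]] = None) -> list[dict]:
    """Send one query to the selected sources and merge the results."""
    selected = targets if targets is not None else list(SOURCES)
    merged = []
    for name in selected:
        for record in SOURCES[name](query):
            # Tag each record with its origin so the client can tell sources apart.
            merged.append({**record, "source": name})
    return merged


everything = federated_search("fossils")
subset = federated_search("fossils", targets=["repo-b"])
```

The client still talks to a single end point; choosing `targets` is just another parameter of the query, not a separate function.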

Search Description Functions. These functions are used to get descriptive information about the capabilities of the service end point. This information enables a client to successfully communicate with and obtain information about the service end point that is providing the interface to the data source. The information SHOULD be machine processible, so that a service can formulate a query based on search description information without human intervention. Description information gathered MAY include:

  • Description of the data source at the service end point.
  • Description (machine processible) of communications and transport protocols supported.
    • Protocol information SHOULD include version numbers
  • Description of information that can be included in a query request (e.g., search terms). Schema and digital representation format information SHOULD include version numbers.
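
To show why machine-processible description information is useful, the sketch below lets a client formulate a request from a description record without human intervention. The record layout, its keys and its values are assumptions for illustration only, not a format defined by the genre.

```python
# A hypothetical search description record; keys and values are assumptions,
# not a format defined by the genre.
DESCRIPTION = {
    "data_source": "Example learning object repository",
    "protocols": [{"name": "SRU", "version": "1.2"}],
    "query_languages": [{"name": "CQL", "version": "1.2"}],
    "search_terms": ["title", "creator", "subject"],
    "result_formats": [{"name": "dublin-core", "version": "1.1"}],
}


def build_query(description: dict, term: str, value: str) -> dict:
    """Formulate a request using only what the description advertises."""
    if term not in description["search_terms"]:
        raise ValueError(f"search term {term!r} is not supported")
    language = description["query_languages"][0]
    return {
        "query": f'{term} = "{value}"',
        "query_language": f'{language["name"]} {language["version"]}',
        "result_format": description["result_formats"][0]["name"],
    }


request = build_query(DESCRIPTION, "title", "dinosaur")
```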

No other functionality is defined. The functionality that is defined MAY be extended.

Usage Scenarios
 

In the simplest case, an end user (or a system acting on behalf of the end user) wishes to ascertain what objects are available from the data source matching a given search criterion. The requestor issues the request, and receives a response in terms of metadata describing the matching objects. The requestor may process the metadata further, to identify particular objects of interest. This further processing may take the form of another search (either a new search on the data source that constrains the previous query, or a local search on the retrieved data). Alternatively, the results may be presented to the end user to enable browsing through the search results. The requestor may then use data from the retrieved metadata descriptions (typically an identifier or locator) to trigger a distinct service of obtaining the objects identified by the search. A typical usage workflow is:

  1. Make search request
  2. Retrieve set of metadata for matching objects
  3. (Optional) Identify objects of interest through metadata description
  4. (Optional) Obtain objects of interest through identifiers and locators included with metadata description
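
The four steps above can be sketched end to end as follows. The in-memory `METADATA` and `OBJECTS` tables and the `search` and `obtain` functions are hypothetical stand-ins for a search service and a separate obtain service.

```python
# Hypothetical in-memory stand-ins for a search service and an obtain service.
METADATA = [
    {"id": "obj-1", "title": "Intro to Geology", "locator": "repo://geo/1"},
    {"id": "obj-2", "title": "Advanced Geology", "locator": "repo://geo/2"},
    {"id": "obj-3", "title": "Intro to Botany", "locator": "repo://bot/3"},
]
OBJECTS = {
    "repo://geo/1": b"geology course pack",
    "repo://geo/2": b"advanced geology course pack",
    "repo://bot/3": b"botany course pack",
}


def search(query: str) -> list[dict]:
    """Steps 1-2: make the request and receive metadata for matching objects."""
    return [m for m in METADATA if query.lower() in m["title"].lower()]


def obtain(locator: str) -> bytes:
    """Step 4: a separate obtain service returns the full object."""
    return OBJECTS[locator]


records = search("geology")                                       # steps 1-2
chosen = [r for r in records if r["title"].startswith("Intro")]   # step 3 (optional)
objects = [obtain(r["locator"]) for r in chosen]                  # step 4 (optional)
```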

This scenario may be enhanced as follows:

  1. The search returns the matching objects themselves, rather than metadata describing them. Step 4 is unnecessary.
  2. The metadata descriptions retrieved are filtered, so that only specific fields are returned. For instance, only the locators of the objects concerned are returned, so that the requestor may proceed directly to obtaining them. If enough information is present in the filtered record, Step 3 is still possible.
  3. The complete set of matching records is not retrieved in the one request, due to processing or user constraints. The retrieval of matching records is spread out over several request–response pairs, amounting to repetitions of Steps 2–4; the result set persists over the series of requests. Each response contains a resumption token, which is passed back to the data source to retrieve the next subset of matching records: for example, the response containing the 100th through 149th matching result records (object descriptions) contains a resumption token, which is used to request the next 50 records (the 150th through 199th matching result records). The resumption token enables flow control.
  4. The requestor may not already know what information is available from the provider, including what attributes the data source may be queried by, what the possible formats for results are, what the possible query languages are, etc. Preparatory to making the search request, the requestor queries the data source for metadata about its search interface, in order to formulate a search request accurately.
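
The resumption-token variant (item 3 above) might look like the sketch below. It assumes the token is simply the offset of the next record encoded as a string; real service expressions typically treat tokens as opaque values, and the names `search_page` and `fetch_all` are hypothetical.

```python
from typing import Optional

MATCHES = [f"record-{n}" for n in range(1, 121)]  # 120 matching records
PAGE_SIZE = 50


def search_page(token: Optional[str] = None) -> dict:
    """Return one chunk of the persistent result set plus a resumption token."""
    start = int(token) if token else 0
    chunk = MATCHES[start:start + PAGE_SIZE]
    next_start = start + len(chunk)
    return {
        "records": chunk,
        # No token on the final chunk tells the client the set is exhausted.
        "resumption_token": str(next_start) if next_start < len(MATCHES) else None,
    }


def fetch_all() -> list[str]:
    """Client side: follow resumption tokens until none is returned."""
    collected, token = [], None
    while True:
        page = search_page(token)
        collected.extend(page["records"])
        token = page["resumption_token"]
        if token is None:
            return collected


all_records = fetch_all()
```
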
Applicability
 

As defined, the service genre may request any digital representation available from the data source; it does not define behaviour if the specified digital representation is not available.

This service genre does not define behaviour when the data source requires authentication to permit search.

This service genre does not define behaviour when the data source requires authorization or access controls to permit search.

This service genre does not define behaviour when the search service end point attempts to apply a filter relying on information not available from the data source. For instance, if a filter relies on authorization policies or rules that are not communicated to the data source through the query interface, the behaviour is not defined.

This service genre does not define behaviour if communications need to be secure.

Requests & Behaviours
 

The format and definitions for requests and responses SHALL BE defined by the service expressions that specialize the service genre. Requests and behaviours SHALL meet the following conditions:

  • At least one request providing a query function SHALL be defined.
    • The request SHALL process the defined query language.
    • The query language MAY be profiled.
    • The request SHOULD include mechanisms to specify or control the format or content of results sets.
    • The request SHOULD include mechanisms to specify the part of the results set to be returned.
    • Responses SHOULD include flow control.
  • At least one request providing a search description function SHOULD be defined.
    • The response SHALL return basic metadata about the target data source.
    • The response SHOULD include protocol and result format information.
  • Responses SHALL include error indicators or other needed control information.
    • Indicators SHALL be available for the requests as a whole (e.g., malformed query).
    • Separate indicators SHALL be used to describe the availability of the results.
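
One possible reading of the two indicator levels above is sketched here: one status for the request as a whole and a separate status for the availability of results. The `SearchResponse` class, the `handle` function and the status strings are assumptions, not values defined by the genre.

```python
from dataclasses import dataclass, field


@dataclass
class SearchResponse:
    """Hypothetical response envelope with two levels of indicators."""
    request_status: str                      # indicator for the request as a whole
    records: list[dict] = field(default_factory=list)
    results_status: str = "not-applicable"   # separate indicator for result availability


def handle(query: str, index: dict[str, list[dict]]) -> SearchResponse:
    # Request-level check: an empty query is treated as malformed in this sketch.
    if not query.strip():
        return SearchResponse(request_status="malformed-query")
    records = index.get(query.strip().lower(), [])
    # Results-level indicator: the request was fine, but results may be absent.
    status = "available" if records else "no-matches"
    return SearchResponse("ok", records, status)


INDEX = {"geology": [{"id": "obj-1"}]}
ok = handle("Geology", INDEX)
empty = handle("botany", INDEX)
bad = handle("   ", INDEX)
```
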
Use & Interactions
 

The model for a client to interact with a service implementation end point SHALL BE defined by the service expression that specialises the service genre.

Structure
 

A schematic of the service genre is shown below. The service endpoint provides the interfaces for the requests and responses for query functionality and search description functionality. Internally the service implementation MAY include various features (indexing, flow control, persistence layer, etc.) needed to provide the defined functionality. The service genre interfaces with the data source holding the object collection and the search description data.

 [Figure: Search001.gif, schematic of the search service genre]

If the response to a search request is too large to fit in a single response, then mechanisms are used to retrieve the response from the data source over a series of requests, each request asking for another subset of the set of matching objects. The set of objects matching the query needs to persist on the data source over the duration of the series of requests (persistent results). Moreover, the sequence of requests requesting subsets of the set needs to be coordinated, so that the entire set is ultimately retrieved (flow control). The data used to coordinate the series of requests is termed control information.

Search results may be passed through a filter before being passed back to the user. A filter specifies a data transformation which is applied to the response prepared to be delivered to the user. The user receives the transformed version of the response, rather than the original response.
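
Treating a filter as a data transformation can be sketched as a function from a result set to a result set, with several filters applied in sequence. The filter names below (`select_fields`, `sort_by`) are hypothetical examples of the selection and sorting filters mentioned earlier, not filters defined by the genre.

```python
from typing import Callable

Records = list[dict]
Filter = Callable[[Records], Records]


def select_fields(*names: str) -> Filter:
    """Filter that keeps only the named fields of each record."""
    return lambda records: [{k: r[k] for k in names if k in r} for r in records]


def sort_by(key: str) -> Filter:
    """Filter that sorts the result set by one field."""
    return lambda records: sorted(records, key=lambda r: r[key])


def apply_filters(records: Records, filters: list[Filter]) -> Records:
    # Filters are applied in order; the client only sees the final output.
    for f in filters:
        records = f(records)
    return records


raw = [
    {"id": "obj-2", "title": "Zoology", "locator": "repo://2"},
    {"id": "obj-1", "title": "Botany", "locator": "repo://1"},
]
filtered = apply_filters(raw, [sort_by("title"), select_fields("id", "locator")])
```

Order matters in this sketch: sorting by title has to happen before the title field is projected away.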

Applicable Standards
 

None. No standards are directly applicable to the service genre as a whole. The service expression that specialises the service genre SHALL BE defined in terms of standards:

  • Service expressions SHALL specify applicable query language standards or query language standard profiles (e.g., CQL).
  • Service expressions SHALL specify an applicable data model of what is searchable (i.e., what the allowed search terms are; e.g., Dublin Core).
  • Service expressions SHALL specify applicable query communications interfaces (e.g., SRW/SRU, Z39.50).
  • Service expressions SHALL specify applicable data encoding and representations for the returned objects (e.g., XML representation of LOM).
  • Service expressions SHALL specify applicable communications, encoding and transport protocols.
Design Decisions and Tradeoffs
 

The following design decisions apply to the service expressions that specialize the service genre.

Design:

  • The service expression MAY include flow control for managing results. The service expression SHALL define whether flow control is supported, and limits on request size, results sets size and result set persistence.
  • The service expression MAY include the specification of the communications protocol as part of its definition (e.g., as in SRW) or it MAY layer the functions on top of another defined communications protocol (e.g., using SQI as the communications and control protocol).
  • The service expression SHOULD clearly and cleanly separate service search description functions from query functions.

Consistency:

  • The service implementation SHOULD ensure that every object in the data source is discoverable and that every defined data format is available for the results set, i.e., if an object is in the data source, it should be searchable; if a data format is specified, the result set should be expressible in the defined format.
  • The service implementation SHOULD ensure that once an object is included in a persistent result set, the object description remains available from the data source in the data format specified throughout the lifetime of the result set. If all objects included remain available throughout, the result set is referred to as consistent.
  • The service implementation MAY NOT ensure consistency of results across stateful queries. For example, if a result set is large and it is returned in chunks across multiple steps, an object that is part of the result set may be deleted from the data source but this deletion may not be reflected in the result set.
  • The timing of updates and transactions on the data source MAY impact search requests in a way that would omit objects from the results set.
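
The notion of a consistent result set can be made concrete with a small check. This is only a sketch under the assumption that a persistent result set stores object identifiers captured at query time; `is_consistent` is a hypothetical helper.

```python
def is_consistent(result_set: list[str], data_source: dict[str, dict]) -> bool:
    """A result set is consistent if every object in it is still available."""
    return all(identifier in data_source for identifier in result_set)


data_source = {"obj-1": {"title": "A"}, "obj-2": {"title": "B"}}
result_set = list(data_source)       # persistent result set captured at query time

before = is_consistent(result_set, data_source)
del data_source["obj-2"]             # object deleted while chunks are still being served
after = is_consistent(result_set, data_source)
```

After the deletion the stored result set still lists `obj-2`, which is the situation the third bullet above allows an implementation to leave unresolved.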

Performance:

  • A service implementation SHALL be capable of handling simultaneous requests from different clients.
  • A service implementation SHOULD implement an indexing scheme or equivalent method to permit efficient discovery.
  • Load balancing SHOULD be implemented for large data source collections or those which are searched frequently.
Implementation Guidance and Dependencies
 

Security and Privacy Considerations:

  • Service implementations may be subject to denial-of-service attacks.
  • Care should be taken to maintain privacy of any personal data or other records that may disclose usage patterns.
  • There are no authorization or authentication controls. Care should be taken to maintain data privacy.
Known Uses
 

Actual: None

Potential: The service genre could be used in a service usage model for federated metadata repositories. A client would discover the existence of an object through a federated metadata registry. The client could then use the obtain service to retrieve the identified object from the source repository.

Potential: The service genre could be part of a generic repository access and management service usage model.

Potential Service Expressions: Specializations of the service genre include:

  • search data source—sru basic: Search a repository using SRU as defined (CQL, HTTP).
  • search data source—srw basic: Search a repository using SRW as defined (CQL, SOAP).
  • search data source—srw ws: Search a repository using an SRW web service (CQL, WSDL over SOAP; based on WS-I).
  • search data source open search google ajax: Search a repository using Open Search as defined with the Google query language used in the Google AJAX API.
  • search federation—sqi basic: Federated search using SQI.
Related Service Usage Models (SUMs)
 
  • FRED Service Usage Model - Registry federation - Search (genre) is a part of the registry federation service usage model (genre based) and is provided to discover objects from a registry for an end user or end user application.
  • FRED Service Usage Model: roap obtain - The roap obtain service usage model (genre based) is a combination of four services used to access an object given an object identifier. The identifier is obtained through a search request, so that a search service genre is combined with the roap obtain service usage model to retrieve an object discovered through search. A request, in the form of an identifier, is resolved [R] (including multiple resolution based on a FRBR model to obtain the appropriate work, expression, manifestation or copy) yielding the source data source that manages the object. An obtain accessor [O] is used to get the object, subject to access control filtering [A]. The resulting object is encoded and packaged in the requested format [P]. Search is used as a precursor to the roap process to discover the object identifier. Selected object identifiers are then passed through the roap workflow to obtain the object.
  • UK HE Admissions Structured Personal Statement v1.2
Related Service Patterns
 

Actual: None as documented.

Potential: roap plus search service.

Related CORE SUMs
 

None

Classification
 
Domain(s): [ ] Learning & Teaching  [ ] Research  [ ] Libraries  [ ] Administration  [ ] IT Services  [X] Common
Maturity: [X] Immature  [ ] Mature
Development Scale: [X] Isolated  [ ] Ubiquitous
Status: [ ] Approved  [ ] Placeholder  [ ] Unapproved  [ ] Superseded  [ ] Withdrawn
Confidence Level: [ ] High  [ ] Medium  [ ] Low
Version History
 
Version | Date | Author | Description | Organisation / Project
v0.1 | 2007-10-15 | Daniel Rehak | Initial Version | FRED
v0.11 | 2007-03-12 | Nick Nicholas | Editorial: removed some redundancies | FRED
v0.12 | 2007-03-12 | Daniel Rehak | Editorial | FRED
v0.13 | 2007-03-13 | Nick Nicholas | Inserted definition of indexing | FRED
v0.21 | 2007-03-26 | Nick Nicholas | Addressed queries & rewording from Nigel Ward, added classification, defined flow control and persistent results; "resource" is now changed to "data source" | FRED
v0.25 | 2007-03-27 | Daniel Rehak | Editorial. Review and comments. Structure section and diagram added. | FRED
v0.26 | 2007-05-21 | Nick Nicholas | Finalised | FRED
v1.0.0 | 2007-05-21 | Daniel Rehak | Submission to the e-Framework | FRED
Intellectual Property
 

© Copyright, e-Framework Partners 2007

Attribute this work as:
Search Genre, The e-Framework Partners, 2007. Derived from...

Attribution History:

  1. This document was derived from the Search Genre document, created by Nick Nicholas, Daniel Rehak, and Nigel Ward, and submitted as part of the Federated Repositories for Education (FRED) Project within the Australian ADL Partnership Laboratory. Copyright © University of Southern Queensland and University of Memphis, 2007

Last updated 14 October 2008

 
 
Unless otherwise noted material from the e-Framework website can be downloaded for your own use under a Creative Commons Attribution-ShareAlike 2.5 Australia License
 
Tuesday, February 09, 2010
Copyright e-Framework Partners 2006 - 2009
