- Name: Search - SRW Basic (FRED Project)
This service expression is a specialization of the search service genre Search Resource using SRW and CQL. SRW is a protocol for searching for resources held remotely, using the CQL query language, and Web Services as a transport. SRW queries allow both full-text searches (including proximity searches) and searches of specific fields of records. Results may be returned sorted. For searching specific fields and for sorting by specific fields, a record format containing those fields needs to be specified. There is no expectation that the record be stored permanently in that format: records can be transformed to the required format through a “utility schema” and indexed accordingly beforehand. This service expression profiles SRW to the requirements of the FRED project, which provides infrastructure for repository federation within the e-Learning space in Australia. This imposes the following requirements:
- Metadata schemata supported: Unqualified DC, IEEE LOM
- SRW queries are addressed to the federation registry, rather than to individual participating repositories
SRW (FRED Project) is intended to provide an SOA component for transacting searches on federation registries from clients, which FRED partners can incorporate into their deployments of federations.

- Author name(s): Dan Rehak and Nick Nicholas
- Organisation : Federated Repositories for Education (FRED)
SRW is a web-service-based protocol for searching for resources held remotely. It builds on several pre-existing standards:
- SRU is the HTTP/REST protocol for searching for remote resources. SRW is a SOAP skin imposed on SRU, so the detailed description of functionality is in the SRU specification.
- Z39.50 is the pre-Web standard for searching for remote resources, on which SRU is based. It is a more formal specification than SRU.
- CQL is the query language used by SRU and SRW (among others).
- ZeeRex is a schema for Z39.50 compliant providers to expose information about themselves: network connections, description, authority metadata, server configuration, available indexes (= searchable fields), and available XML (= metadata) schemas
The SRU, CQL, and ZeeRex functionality must be implemented in order to implement SRW. (Z39.50 may be treated as background knowledge.) Under SRW, clients submit search requests to a provider in SOAP. These search requests must contain a query in CQL. These queries consist of search clauses, and each search clause is either a search term on its own (in which case full-text search is triggered), or a search on a specific field (termed index in SRU/W). Searches on indexes specify the index, the search term, and a relation between the two; this may be (substring) equality (e.g. title = cat); but other relations are possible (e.g. title < cat). Indexes are specified as being available for search through a CQL Context Set. Context Sets are profiles of CQL, which allow additional relations and types to be added to CQL. CQL Context Sets are informal specifications of the functionality of additional relations and types, and the semantics of available indexes. Mapping these to the fields exposed to search in the data store is a matter for implementation, although a few approaches are suggested below. Consumers can additionally specify the format for records retrieved by the search (typically as a metadata schema). If the format is not specified, the provider returns records in its own default format. The specification takes the form of a URI. A number of record schemas are registered with SRU, although there is no requirement that a record schema be registered in order to be used. Consumers can also specify the sort order of results (as an XPath expression identifying the field to sort on within the result schema). They specify a starting record number and number of records to return, if multiple retrieval sessions are necessary. SRW allows persistent result sets, with a specified time to live, to support this functionality, and this service expression requires it. Objects are returned as XML records; these are typically object metadata, rather than the objects themselves. 
A “diagnostic” XML record is returned on error; such records have their own schema specified under SRW, and their own identifiers functioning as error numbers.

There are three request types in this service expression, all defined in SRW:
- SearchRetrieve: search for resources through a query in the CQL language, and retrieve them as XML records in the specified metadata schema. For instance, the query dc.title =/fuzzy kirkegård requests all records whose title under the Dublin Core Context Set (dc.title, i.e. the Dublin Core metadata title element) is a fuzzy match (=/fuzzy) for the search term kirkegård.
- Scan: browse a given index of search terms, from a given point onwards. For instance, the query dc.title = kirkegård, with the parameter maximumTerms=50, returns not all records with titles matching kirkegård, but the next 50 titles in the search index beginning with kirkegård (e.g. Kirkegård, Kirkegaard, Kierchegærd). The request may also return how many records match each retrieved search term. This request allows systems to be proactive in making possible queries available.
- Explain: request information about the SRW provider, such as the indexes and metadata schemas available for query, which is returned as a ZeeRex record.
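The difference between Scan and SearchRetrieve can be illustrated with a minimal sketch. It assumes a hypothetical in-memory index (`title_index`, mapping each term to its record count) and plain code-point ordering; a real provider would use its own index structure and collation rules.

```python
from bisect import bisect_left

def scan(index_terms, start_term, maximum_terms=50):
    """Return up to maximum_terms (term, record_count) pairs from the index,
    starting at the first term that sorts at or after start_term."""
    terms = sorted(index_terms)                # browse order (code-point order here)
    start = bisect_left(terms, start_term)     # position of the scan point
    return [(t, index_terms[t]) for t in terms[start:start + maximum_terms]]

# Hypothetical title index: term -> number of matching records
title_index = {"kirkegaard": 12, "kirkegård": 3, "kirk": 40, "kiln": 2, "kite": 7}

for term, count in scan(title_index, "kirkegaard", maximum_terms=3):
    print(term, count)
```

The result is a slice of the term list with counts, not a set of records, which is what lets a client offer candidate search terms before any search is run.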
In terms of the types of functionality specified by the Search Resource genre, SearchRetrieve and Scan are a Query function, and Explain is a Search Description function. | | Under FRED, SRW is used to process search requests to the federation registry from an end user. The resources being queried are metadata records harvested from the participating repositories, and the purpose of the search is to enable discovery of the content items described by the metadata, and ultimately accessioning those content items from their repositories: - Issue CQL query to registry
- Retrieve metadata records matching query
- Browse metadata records
- Identify relevant content item
- Obtain content item (through identifier/locator in the metadata)
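The first step of this sequence, issuing a CQL query to the registry, might be sketched as follows. This is only an illustration: the SRW namespace and element order are assumptions to be checked against srw-types.xsd, and the function name `build_search_request` is hypothetical. The default values mirror those fixed later in this service expression.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
SRW_NS = "http://www.loc.gov/zing/srw/"              # assumed SRW 1.1 namespace

def build_search_request(query, start_record=1, maximum_records=1000,
                         record_schema="DC"):
    """Wrap a CQL query in a SOAP envelope as a searchRetrieveRequest."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{SRW_NS}}}searchRetrieveRequest")
    fields = [
        ("version", "1.1"),
        ("query", query),
        ("startRecord", str(start_record)),
        ("maximumRecords", str(maximum_records)),
        ("recordPacking", "xml"),   # written as XML in this document; SRU spells it in lower case
        ("recordSchema", record_schema),
    ]
    for name, value in fields:
        ET.SubElement(request, f"{{{SRW_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="unicode")

print(build_search_request('dc.title = "cat"'))
```

The resulting string would be sent as the body of an HTTP POST to the registry's SRW endpoint; the transport call is left out because the endpoint address is deployment-specific.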
As the queries to metadata become more elaborate, more of the functionality of SRW is exploited. In the simplest case, no index or relation is specified. The query is then a full-text search of the metadata records. The next step is to allow specific indexes (metadata fields) to be queried, including queries with relations other than equality. The relations built into CQL include any (of the following words), all (of the following words), and range operations. Additional operations can be added through context sets, and profile-specific modifications of relations can also be added (e.g. =/fuzzy is the fuzzy version of = ). CQL itself has a predefined context set, including some of these extensions. Context sets are specified through a URI, and elements specific to a context set are indicated in queries through prefixing (e.g. dc.title); so context sets are parameterised in SRW queries, and not hardcoded. If a requestor is not already familiar with the metadata schema and associated conventions of the provider, the Explain request allows the requestor to determine what fields are available for query, and design their queries accordingly. A requester can also determine what metadata formats are available from the provider, and request a particular format for the results; this format need not match the format assumed by the CQL query. (e.g. a query may involve Dublin Core elements, but return its results in LOM). In a federation context, the metadata schemata and available indices are presumed to have been established through the federation profile, so the participating repository manager is unlikely to need to issue an Explain query once the repository has joined the federation. The search interface to indexed queries can be enhanced through using Scan requests to identify relevant search terms. 
For instance, given the first three letters of a search term the end user has typed in, the Scan request can work with Ajax to retrieve the next 50 matching search terms available in the index, along with how many matches each one has in the provider resource. It can then display these to the end user as a dynamically updated listing of selections.

This service expression is not restricted to any one metadata format, since metadata formats are parameterised through the use of Context Set prefixes (for requests) and the recordSchema parameter (for responses). Nonetheless, this service expression is written with only two metadata formats in mind: unqualified Dublin Core and IEEE LOM. To that end, support for querying and retrieving in LOM is described in this service expression; other metadata formats are not covered here. The Explain request SHALL be implemented for conformance to the SRU Base Profile (http://www.loc.gov/standards/sru/base-profile.html). Authentication and authorisation are not provided in this service expression. A context requiring A&A (including filtering search results according to authorisation) will require a derivative service expression. The service expression defines that XML shall be transmitted as plaintext, so it makes no provision for encryption or secure transmission (other than transmission over HTTPS). A distinct service expression or SUM is required for secure transmission.

The three requests of SRW are:
- SearchRequest. This request performs a search and retrieves response sets as XML records.
- Explain. This request asks for information about the SRW provider, which is returned as a ZeeRex v.2.0 XML record.
- Scan. Scan returns a list of index terms in an index on the provider, starting with the index term matching the CQL query.
Inputs and outputs for these requests are fully described in the e-Framework Description of Standard for SRW which FRED has submitted to the e-Framework. The following input and parameter values are fixed in this service expression: - SearchRequest input parameters.
- version: This service expression SHALL use the value 1.1 .
- maximumRecords: Consistent with the FRED Harvest OAI-PMH service expression, the default value for FRED SHALL be 1000.
- recordPacking: FRED SHALL use XML as the value.
- recordSchema: This service expression SHALL allow two values, DC (Unqualified Dublin Core) and FREDLOM (IEEE LOM). The default value SHALL be DC.
- resultSetTTL. For consistency with the FRED Harvest OAI-PMH service expression, this time to live shall be at least 10 minutes.
- sortKeys
- schema: As with recordSchema above, this service expression SHALL allow the values DC and FREDLOM, with the default DC.
Sorting may be applied to an existing result set still residing on the provider, as well as in a new query. In the former case, the result set is specified through the predefined cql.resultSetID index, as described under CQL and Use and Interactions. The service expression SHALL support sorting of stored result sets. - extraRequestData. This parameter SHALL NOT be used under this service expression.
Service implementation SHALL ignore the stylesheet parameter if present. This service expression SHALL NOT support XPath filtering through the recordXPath parameter. XPath support is not used by the FRED user community. - SearchRequest response parameters.
- version: This service expression SHALL use the value 1.1 .
- resultSetId: The identifier SHALL be a Handle.
- resultSetIdleTime: Under FRED, this time to live SHALL be 600 seconds.
- records:
- recordPacking: As with input, FRED SHALL use XML as the value.
- recordPosition: FRED SHALL populate this parameter.
- extraRecordData: This parameter SHALL NOT be returned under this service expression.
- extraResponseData: This parameter SHALL NOT be used under this service expression.
- Explain input parameters.
- version: This service expression SHALL use the value 1.1 .
- recordPacking: FRED SHALL use XML as the value.
- extraRequestData. This parameter SHALL NOT be used under this service expression.
- Explain response parameters.
- version: This service expression SHALL use the value 1.1 .
- extraResponseData: The XML response SHALL NOT include this element under this service expression. Service expression behaviour is undefined in queries with this parameter present.
- records:
- configInfo: For FRED, the following configurations SHALL be set:
<default type="numberOfRecords">1000</default>
<default type="contextSet">DC</default>
<default type="relation">=</default>
<default type="sortSchema">DC</default>
<default type="retrieveSchema">DC</default>
<default type="recordPacking">XML</default>
<setting type="maximumRecords">1000</setting>
<supports type="resultSets" />
<supports type="sort" />
<supports type="scan" />

That is to say, FRED SRW provides support for result sets (= flow control), sorting results, and the Scan request; its default context set is Dublin Core; its default schema for sorting and retrieval of records is also Dublin Core; its default relation is equality; XML records returned are embedded rather than URL-encoded; and the provider returns 1000 records at a time, which is also the maximum per request.

As required by the SRU base profile, the ServerInfo element SHALL be populated with host, port, and database values; the ServerInfo protocol value SHALL be SRW/U; and the version element SHALL have the value 1.1. IndexInfo must contain at least one entry for set (= CQL Context Set) and for index/map/name (the index in the context set which is being accessed for searching); the set attribute of the latter must be populated. The following illustrates the minimum IndexInfo XML fragment for this service expression (Dublin Core titles):

<indexInfo>
  <set identifier="info:srw/cql-context-set/1/dc-v1.1" name="dc"/>
  <index>
    <map><name set="dc">title</name></map>
  </index>
</indexInfo>

All supported indexes and schemata SHALL be listed in the ZeeRex record, for conformance with the SRU Base profile and for discovery of federations.
The ZeeRex record returned by Explain requests is exemplified below, for a database hosted at http://gondolin.hist.liv.ac.uk/freddb :

<serverInfo protocol="SRW/U" version="1.1" transport="https" wsdl="http://fred.na/serverdescription.wsdl">
  <host>gondolin.hist.liv.ac.uk</host>
  <port>210</port>
  <database numRecs="100003" lastUpdate="2007-03-06 11-02-21">freddb</database>
  <authentication>
    <user>azaroth</user>
    <password>squirrelfish</password>
  </authentication>
</serverInfo>
<databaseInfo>
  <title lang="en" primary="true">The Science Fiction Foundation Collection</title>
  <description lang="en" primary="true">A database containing bibliographic records describing the books and articles in the Science Fiction Foundation’s collection held at the University of Liverpool.</description>
  <author>Andy Sawyer</author>
  <contact>Rob Sanderson, azaroth@liv.ac.uk</contact>
  <extent>This database is complete</extent>
  <history>This database dates from 1863</history>
  <langUsage codes="en fr ru">The records are in English, French and Russian.</langUsage>
  <restrictions>This database is available only by subscription</restrictions>
  <subjects>
    <subject>cyborgs</subject>
    <subject>aliens</subject>
    <subject>latex</subject>
  </subjects>
  <implementation version="1.2" identifier="info-url:hdl/102/201">
    <title>Fredware server</title>
  </implementation>
  <links>
    <link type="www">http://gondolin.hist.liv.ac.uk/freddb/www</link>
    <link type="rss">http://gondolin.hist.liv.ac.uk/freddb/rss</link>
    <link type="oai">http://gondolin.hist.liv.ac.uk/freddb/oai</link>
  </links>
</databaseInfo>
<metaInfo>
  <dateModified>2002-03-29 19:00:00</dateModified>
  <aggregatedFrom>z39.50r://gondolin.hist.liv.ac.uk:210/IR-Explain---1?id=ghlau-1;esn=F;rs=XML</aggregatedFrom>
  <dateAggregated>2002-03-30 06:30:00</dateAggregated>
</metaInfo>
<indexInfo>
  <set identifier="info:srw/cql-context-set/1/dc-v1.1" name="dc"/>
  <set identifier="info-url:hdl/102/20101" name="fredlom"/>
  <index>
    <title lang="en">Book Title</title>
    <map><name set="dc" primary="true">title</name></map>
    <map><name set="fredlom">title</name></map>
  </index>
  <index>
    <title lang="en">Date of Publication</title>
    <map><name set="dc" primary="true">date</name></map>
    <map><name set="fredlom">date</name></map>
  </index>
</indexInfo>
<schemaInfo>
  <schema identifier="http://www.loc.gov/zing/srw/dcschema/v1.0/" location="http://www.loc.gov/zing/srw/dc.xsd" sort="false" retrieve="true">
    <title lang="en">Dublin Core</title>
  </schema>
</schemaInfo>
<configInfo>
  <default type="numberOfRecords">1</default>
  <setting type="maximumRecords">10</setting>
  <supports type="proximity"/>
  <supports type="relationModifier">stem</supports>
</configInfo>

- Scan input parameters.
- version: This service expression SHALL use the value 1.1 .
- maximumTerms: Consistent with the FRED Harvest OAI-PMH service expression, the default value SHALL be 1000.
- extraRequestData. This parameter SHALL NOT be used under this service expression.
Service implementation SHALL ignore the stylesheet parameter if present. - Scan response parameters.
- version: This service expression SHALL use the value 1.1 .
- extraResponseData. This parameter SHALL NOT be used under this service expression.
- terms.
- extraTermData. This parameter shall not be used under this service expression.
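The fixed SearchRequest input values listed above could be applied by a provider along the following lines. This is a sketch under stated assumptions: the function name `apply_fred_profile` and the wording of the problem messages are hypothetical, and a real provider would report problems as SRW diagnostic records rather than strings.

```python
# Values taken from the SearchRequest input parameters above.
FRED_DEFAULTS = {
    "version": "1.1",
    "maximumRecords": 1000,
    "recordPacking": "xml",   # written as XML in this document; SRU spells it in lower case
    "recordSchema": "DC",
}
ALLOWED_SCHEMAS = {"DC", "FREDLOM"}
REJECTED = {"extraRequestData", "recordXPath"}  # SHALL NOT be used / supported
IGNORED = {"stylesheet"}                        # SHALL be ignored if present

def apply_fred_profile(params):
    """Return (effective_parameters, problems) for a SearchRequest."""
    problems = [f"{name} is not supported" for name in sorted(params.keys() & REJECTED)]
    kept = {k: v for k, v in params.items() if k not in REJECTED | IGNORED}
    merged = {**FRED_DEFAULTS, **kept}          # request values override defaults
    if merged["version"] != "1.1":
        problems.append("version must be 1.1")
    if merged["recordSchema"] not in ALLOWED_SCHEMAS:
        problems.append(f"recordSchema {merged['recordSchema']} is not available")
    if int(merged["maximumRecords"]) > 1000:
        problems.append("maximumRecords exceeds 1000")
    return merged, problems

effective, problems = apply_fred_profile({"query": "cat", "stylesheet": "view.xsl"})
print(effective, problems)
```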
A CQL query, as used in SearchRetrieve and Scan requests, consists of a sequence of one or more search clauses, connected by Boolean operators, and optionally a series of prefix assignments to context set URIs (indicated by >). The prefix assignments introduce the context set namespaces into queries, and have forwards scope (modulo embedding). For instance:

>dc="info:srw/cql-context-set/1/dc-v1.1" dc.title = "cat" and >cql="info:srw/cql-context-set/1/cql-v1.1" dc.date cql.within "2002 2005"

Search clauses are either search terms on their own, or clauses of the form index relation searchTerm. In the latter, index is the index (search field) being targeted, and relation is the operator relating the search term to the index. Available indexes are defined in context sets. On searchTerm-only queries, see below. Relations may be modified by modifiers and qualifiers, expressed as relation/qualifier; e.g. all/fuzzy (all terms must be matched, but fuzzily), encloses/partial (allow for partial overlap between the search term and the index value). The relations predefined in CQL are limited to comparisons (=, >, <, >=, <=, <>) and proximity searches (A prox B: A is near B textually). Note that A = B in CQL means that A is a token in B, not that A is identical to B; the latter relation is expressed as A exact B, or A =/string B. (That is to say, the default qualifier of = is =/word, indicating that A is a word token within B.) As a result, the indexes used by CQL are assumed to be tokenized.

Context Sets define a range of relations and relation modifiers and qualifiers. This service expression SHALL support the CQL context set, which is the default context set for CQL queries, and the Australian Education CQL Context set (available from FRED: <link forthcoming>, with prefix fredlom). This service expression SHALL support CQL to Level 2 performance:
- parsing and supporting term-only queries
- parsing relational queries and Boolean combinations of query clauses
- supporting either relational or Boolean queries (this service expression SHALL support both)
- parsing any CQL query
- responding with appropriate diagnostic messages if any CQL feature is unsupported (diagnostic numbers 13–49; see below)
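The shape of CQL search clauses described above (a bare term, or index relation searchTerm with optional modifiers) can be sketched with a few string helpers. The helper names are hypothetical, and the quoting rule is a simplification: terms are treated as literal text, so masking characters are not given special handling.

```python
def quote_term(term):
    """Quote a search term if it is empty or contains whitespace or CQL punctuation."""
    if term == "" or any(c.isspace() or c in '()=<>/"' for c in term):
        escaped = term.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'
    return term

def clause(term, index=None, relation="=", modifiers=()):
    """Build a term-only clause, or an 'index relation term' clause."""
    if index is None:
        return quote_term(term)                 # term-only (full-text) query
    rel = "/".join([relation, *modifiers])      # e.g. '=' + 'fuzzy' -> '=/fuzzy'
    return f"{index} {rel} {quote_term(term)}"

def combine(operator, *clauses):
    """Join clauses with a Boolean operator, parenthesising each clause."""
    return f" {operator} ".join(f"({c})" for c in clauses)

print(clause("kirkegård", "dc.title", "=", ["fuzzy"]))
print(combine("and", clause("cat", "dc.title"), clause("2002 2005", "dc.date", "cql.within")))
```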
This service expression SHALL support all of the core functionality of CQL: prefix assignment, comparisons, and proximity searches. In the CQL context set, all additional indexes SHALL be supported. All relations SHALL be supported: scr (default relation, which SHALL be = ), exact, all, any, within, and encloses.

Commonly, searchTerm-only queries are interpreted as full-text searches of the record. This is a consequence of Dublin Core being used to expose fields for search; FRED shall support this expectation by making the fredlom.anyelement index available from the Australian Education context set. To elaborate: CQL interprets searchTerm-only queries as having an implicit relation of scr, an implicit index of cql.serverChoice, and a server default context set for that index. For FRED, the default index shall be anywhere. This means all indexes exposed for search by the provider, for all available metadata formats. In the case of FRED, that means all indexes exposed for Dublin Core, plus all indexes exposed for FRED LOM (including fredlom.anyelement, which includes the entire LOM record).

Queries involving within and encloses MUST be defined for numerical terms. They MAY also be defined for non-numerical terms. The service expression MAY support the CQL context set relation modifiers (stem, relevant, phonetic, fuzzy). If it does not, queries using the modifiers must return the appropriate diagnostic (error) record, info:srw/diagnostic/1/20 “Unsupported relation modifier”. The service expression SHALL support all CQL context set relation qualifiers. Masking of search terms SHALL be supported. The service expression SHALL support the CQL context set boolean modifiers, which apply to the prox relation: distance, unit, ordered, and unordered.

SRW errors SHALL be reported using the SRU diagnostic record schema. Diagnostics are identified by a URI (prefixed by info:srw/diagnostic/1 ), which corresponds to an error number, and an optional details string.
Registered error values are listed under http://www.loc.gov/standards/sru/diagnostics-list.html , and SHALL be used wherever applicable in preference to creating new error values. Errors 110 and 111 relate to stylesheets, which are SRU and not SRW functionality. | | SRW allows flow control, and this service expression SHALL support it. SRW implements flow control by keeping search result sets on the server, and issuing them identifiers. The predefined index cql.resultSetID is used to retrieve result sets stored on the provider through the identifier. If a query for dc.title = fred returns a record with the parameters resultSetId = hdl:201/102 and nextRecordPosition = 100, this means that the result set for the query has been kept on the server, and is to be resumed at the 100th record in the result set. The retrieval is resumed by issuing a SearchRetrieve request with the query parameter cql.resultSetID = hdl:201/102 , and startRecord = 100 . Failure to retrieve the specified result set triggers the error code info:srw/diagnostic/1/51 “Result set does not exist”. Responses for all requests SHALL include all applicable error codes. SRW errors SHALL be communicated in the SRW response, as “diagnostic” records, while SOAP errors SHALL be handled at the SOAP protocol level. (This is contrary to the SRW specification, which prefers using SRW as a default, “as the client must support the diagnostics, and toolkits may not handle SOAP faults as gracefully”.) | | All SRW requests are read-only, and do not affect the content of the provider. However, supporting flow control through result sets requires that queries through SRW be stateful: the provider must remember how many sets and items it has already presented to the client. Between queries, the set of records that the initial request specified may change, as new records are created, modified or deleted; the provider SHALL return an error indicating that the request should be reinitiated. 
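The flow control described above can be sketched from the client side. The `search` callable and the stand-in provider below are hypothetical: they only imitate the resultSetId and nextRecordPosition behaviour so that the resume loop can be shown without a live SRW endpoint.

```python
def fetch_all(search, query, page_size=100):
    """Retrieve a whole result set, resuming through the stored result set identifier."""
    response = search(query, 1, page_size)
    records = list(response["records"])
    while response.get("nextRecordPosition"):
        # Resume by querying the stored result set instead of repeating the search.
        resume_query = f'cql.resultSetID = "{response["resultSetId"]}"'
        response = search(resume_query, response["nextRecordPosition"], page_size)
        records.extend(response["records"])
    return records

def make_fake_provider(total):
    """Stand-in provider that keeps one result set in memory (illustration only)."""
    stored = {}
    def search(query, start_record, maximum_records):
        set_id = "hdl:201/102"
        if not query.startswith("cql.resultSetID"):
            stored[set_id] = [f"record-{i}" for i in range(1, total + 1)]
        elif set_id not in stored:
            raise LookupError("info:srw/diagnostic/1/51")  # Result set does not exist
        rows = stored[set_id]
        page = rows[start_record - 1:start_record - 1 + maximum_records]
        next_position = start_record + len(page)
        return {
            "records": page,
            "resultSetId": set_id,
            "nextRecordPosition": next_position if next_position <= len(rows) else None,
        }
    return search

records = fetch_all(make_fake_provider(250), "dc.title = fred", page_size=100)
print(len(records), records[0], records[-1])
```

A real client would also need to handle the case where the result set has expired or changed between requests and reissue the original query.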
SRW models resources as documents consisting of elements, with individual elements exposed for search as record fields through context sets. The record field is required to be tokenised, since CQL allows the operator = to be used to identify single words in the search term. The data model does not inherently support full text searches, but full text search can be modelled as a search in an XML record with a single search field consisting of the entire source record text (as proposed for fredlom.anyelement in the Australian Education Context Set), or else with the exposed search fields spanning the entire record (cql.anywhere, as occurs for Dublin Core). The data model does not inherently support any notion of element hierarchy: search results are not contingent on whether an element is embedded within another element.

SRW further presumes that the schema through which a resource is accessed can be transformed at will: so long as the provider supports both LOM and DC, for example, a query can be made assuming either schema for the resource, whatever schema the resource might be stored in (if it is statically stored). So the metadata schema is treated as a view of the resource, rather than as inherent to the resource or used in its underlying coding; the schema may be merely a utility schema, and may not even be used to render the retrieved record. So:
- Resource > Internal Representation > Individual Elements > Context Set Index > Search Field
This does not exhaust the transformations necessary for an SRW query. A query involving a search field abstracted from the resource needs to be transformed into a query specific to the resource, potentially with a different field name or internal structure. For instance, a single search field exposed to CQL may correspond to an aggregation of several fields in the resource's native record. The transformation MAY be mediated through a Z39.50 interface, which uses ordinal numbers to identify indexes on a resource; Z39.50 predates SRW, and is already deployed for several resource types.
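As an illustration of this last transformation, a single exposed index can be mapped onto several native fields. The field names in `INDEX_MAP` are invented for the example, and the matching rule is a simplified version of the word-token semantics of the = relation.

```python
# Hypothetical mapping from exposed CQL indexes to native record fields.
INDEX_MAP = {
    "dc.title": ["title", "alt_title"],
    "dc.creator": ["author", "editor"],
}

def matches(record, index, term):
    """True if term is a word token in any native field behind the exposed index."""
    if index not in INDEX_MAP:
        raise KeyError("info:srw/diagnostic/1/16")  # Unsupported index
    for field in INDEX_MAP[index]:
        tokens = str(record.get(field, "")).lower().split()
        if term.lower() in tokens:
            return True
    return False

record = {"title": "The Cat in the Hat", "alt_title": "Le chat chapeauté"}
print(matches(record, "dc.title", "cat"), matches(record, "dc.title", "chat"))
```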
| | The service expression interface SHALL conform to the FRED Profile for Core Service Standards [link to service genre] The versions of the relevant standards used are given under Applicable Standards. The service expression SHALL be implemented through SOAP HTTP binding, allowing the SOAP Request-Response message exchange pattern, which uses HTTP POST (SOAP v1.2 2 7.5). This XML response SHALL be encapsulated within the body of a SOAP envelope, which MAY exclude the XML declaration. SRW request and response structures are defined through the SRW Types schema, srw-types.xsd (also available at http://www.loc.gov/standards/sru/xml-files/). This schema in turn imports diagnostics.xsd , used for diagnostic records (= error messages), and xcql.xsd, used to express CQL in XML. SRW Explain responses follow the ZeeRex schema, available at http://explain.z3950.org/dtd/index.html (as a DTD and XML Schema). The commentary to the schema (http://explain.z3950.org/dtd/commentary.html ) outlines the modifications to ZeeRex specific to SRU/W. Records to be transferred in unqualified Dublin Core SHALL follow the registered SRU schema at http://www.loc.gov/standards/sru/dc-schema.xsd . The SRW identifier for Dublin Core schema, info:srw/schema/1/dc-v1.1, SHALL be used to identify this schema in SRW requests and responses. Records to be transferred in LOM SHALL follow the IEEE LOM schema as defined in IEEE 1484.12.3-2005. The schema is available at: http://ltsc.ieee.org/xsd/lomv1.0/lom.xsd. The LOM XML namespace, http://ltsc.ieee.org/xsd/LOM, SHALL be used to identify this schema in SRW requests and responses. CQL follows the definition in http://www.loc.gov/standards/sru/cql/index.html , and presupposes the CQL Context Set in http://www.loc.gov/standards/sru/cql/cql-context-set.html (which is the default context set). CQL Queries involving Dublin Core elements SHALL follow the CQL Dublin Core Context Set: http://www.loc.gov/standards/sru/cql/dc-context-set.html. 
(This is limited to a listing of the 15 elements of unqualified Dublin Core, all of which must be exposed for query.) CQL queries involving LOM elements SHALL follow the Australian Education CQL LOM Context Set [ref]. This context set is limited to exposing LOM elements as indexes; it serializes the classification taxonomy paths so that all intermediate stages in the taxonomy are also available as search terms.

Sort keys SHALL be serialized as specified for SRU. The five fields of a sort key are delimited by commas, and sort keys are delimited from each other by spaces; Boolean values are expressed as 0 or 1, and parameters containing quotes, commas, or spaces must be quoted. The following illustrations use XML representations of sortKeys for clarity, although no such XML representations actually appear in SRW:

<sortKeys>
  <path>title</path>
  <schema>info:srw/schema/1/dc-v1.1</schema>
  <ascending>true</ascending>
  <caseSensitive>insensitive</caseSensitive>
  <missingValue>highValue</missingValue>
</sortKeys>
<sortKeys>
  <path>date</path>
  <schema>info:srw/schema/1/dc-v1.1</schema>
  <ascending>false</ascending>
  <caseSensitive>insensitive</caseSensitive>
  <missingValue>highValue</missingValue>
</sortKeys>

The serialization omits the default values true (for ascending), insensitive, and highValue:

<sortKeys>title,info:srw/schema/1/dc-v1.1 date,info:srw/schema/1/dc-v1.1,0</sortKeys>

i.e. sort by title element under Dublin Core, or else date element under Dublin Core, descending.

<sortKeys>
  <path>/record/title</path>
  <schema>http://ltsc.ieee.org/xsd/LOM</schema>
  <ascending>false</ascending>
</sortKeys>
<sortKeys>
  <path>/record/rights/datafield[@tag="100"]/subfield[@code="a"]</path>
</sortKeys>

Serialisation:

<sortKeys>"/record/classification","http://ltsc.ieee.org/xsd/LOM",1 "/record/rights/datafield[@tag=\"100\"]/subfield[@code=\"a\"]"</sortKeys>

i.e.
sort by the /record/classification element of the metadata record under LOM; or else use the subfield element with attribute code=a of the datafield element with attribute tag=100 of the rights element under the record element, using the default schema for the provider (Dublin Core). SOAP error processing SHALL generate an error message for each SOAP error encountered, rather than allowing only one random message to be generated. A VersionMismatch SOAP fault SHALL provide an Upgrade SOAP header block, detailing what SOAP envelope versions are supported by the node (SOAP v.1.2 1 5.4.7). A decoding SOAP fault of subcode enc:MissingID, enc:DuplicateID, or enc:UntypedValue SHALL be generated under the conditions given in SOAP v.1.2 2 3.2. A RPC SOAP fault of code env:Receiver, env:DataEncodingUnknown or env:Sender SHALL be generated under the conditions given in SOAP v.1.2 4.4. The service expression SHALL accept URIs of at least 4000 bytes in length. A SOAP Fault code with value Sender (indicating malformed message), and a detail element explaining the malformed URI, SHALL be generated if the URI is too large to be processed. All child elements of the SOAP envelope SHALL be namespace-qualified (SOAP v1.2. 1 5.3.1) SOAP Active Intermediaries SHALL NOT be used in the context of this service expression (SOAP v1.2 1 2.7.3). A service implementation SHALL return an HTTP status 400 BAD Request when the request contains code injection or other malicious elements. SRW diagnostic records SHALL be generated in response to queries involving unavailable metadata formats. In particular: - Queries formulated using an unavailable metadata format SHALL trigger the appropriate error info:srw/diagnostic/1/15 “Unsupported context set” or info:srw/diagnostic/1/16 “Unsupported index”.
- Queries requesting records to be returned in an unavailable metadata format SHALL trigger an appropriate error: info:srw/diagnostic/1/66 “Unknown schema for retrieval”, info:srw/diagnostic/1/67 “Record not available in this schema”, info:srw/diagnostic/1/69 “Not authorized to send record in this schema”, info:srw/diagnostic/1/87 “Unsupported schema for sort”.
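The sort key serialization described earlier in this section (comma-delimited fields, space-delimited keys, 0/1 for Boolean values, quoting where needed) can be sketched as follows. The function names are hypothetical, and the sketch assumes that only trailing default values are dropped and that quotes inside a quoted field are escaped with a backslash, as in the examples above.

```python
# Defaults for the optional fields: schema, ascending, caseSensitive, missingValue
DEFAULTS = ("", True, False, "highValue")

def _field(value):
    """Serialise one field: Booleans as 1/0, quoting strings that need it."""
    if isinstance(value, bool):
        return "1" if value else "0"
    if any(c in value for c in '", '):
        return '"' + value.replace('"', '\\"') + '"'
    return value

def serialize_sort_key(path, schema="", ascending=True,
                       case_sensitive=False, missing_value="highValue"):
    optional = [schema, ascending, case_sensitive, missing_value]
    # Drop trailing fields that still hold their default value.
    while optional and optional[-1] == DEFAULTS[len(optional) - 1]:
        optional.pop()
    return ",".join(_field(v) for v in [path, *optional])

def serialize_sort_keys(keys):
    """Join several sort keys (dicts of keyword arguments) with spaces."""
    return " ".join(serialize_sort_key(**key) for key in keys)

print(serialize_sort_keys([
    {"path": "title", "schema": "info:srw/schema/1/dc-v1.1"},
    {"path": "date", "schema": "info:srw/schema/1/dc-v1.1", "ascending": False},
]))
```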
The following standards are involved:

Performance:
- While the data model allows on-the-fly transformation of records, those transformations SHOULD be anticipated by making the corresponding indexes available ahead of time; that is, by transforming the records into the available schemas beforehand and indexing the resulting search terms. If this is not practical, the implementer will need to determine ahead of time which transformations and searches will be most frequent, and SHOULD prioritise those for indexing.
- A service implementation SHALL be capable of handling simultaneous requests from different clients.
- Load balancing SHOULD be implemented for large repositories or those which are queried frequently (continuously).
- The IEEE reference schema for LOM is multilayered and considered cumbersome; see for example http://www.ostyn.com/standards/scorm/samples/simplerlomschemadoc.htm. However, to guarantee conformance with any LOM instance, including custom extensions, the IEEE schema SHALL be used.
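The first performance recommendation above (transform and index ahead of time, rather than per query) can be sketched as follows. This is only an illustration under stated assumptions: the native record layout, the two transformer functions, and the flattened field names such as `general.title` are hypothetical, and a real LOM or Dublin Core transformation would be considerably richer.

```python
from collections import defaultdict


def to_dc(native):
    """Hypothetical transformer from a native record to flat Dublin Core."""
    return {"title": native["name"], "creator": native["author"]}


def to_lom(native):
    """Hypothetical transformer to a flattened LOM-like field."""
    return {"general.title": native["name"]}


TRANSFORMERS = {"dc": to_dc, "lom": to_lom}


def build_indexes(records):
    """Transform each record into each schema once and index the terms."""
    index = defaultdict(set)  # (schema, field, term) -> set of record ids
    for record_id, native in records.items():
        for schema, transform in TRANSFORMERS.items():
            for field, value in transform(native).items():
                for term in value.lower().split():
                    index[(schema, field, term)].add(record_id)
    return index


def search(index, schema, field, term):
    """Look up a single term; no transformation happens at query time."""
    return sorted(index.get((schema, field, term.lower()), set()))
```

The cost of transformation is paid once at ingest, so a query against either schema becomes a dictionary lookup. Where indexing every schema is impractical, `TRANSFORMERS` could be restricted to the transformations expected to be most frequent.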
Interoperability: - Clients should be aware that URIs longer than 255 bytes may not be supported by intermediaries (caches, proxies).
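A client could check the length of a request URI against the two thresholds mentioned in this document before sending it: 255 bytes (the limit some intermediaries may impose) and 4000 bytes (the minimum a provider SHALL accept). The sketch below is illustrative only; the function name and the three labels it returns are hypothetical.

```python
INTERMEDIARY_SAFE_BYTES = 255   # intermediaries may not support longer URIs
PROVIDER_MINIMUM_BYTES = 4000   # providers SHALL accept at least this length


def classify_uri_length(uri):
    """Classify a URI by its encoded length in bytes (hypothetical labels)."""
    size = len(uri.encode("utf-8"))
    if size <= INTERMEDIARY_SAFE_BYTES:
        return "safe"
    if size <= PROVIDER_MINIMUM_BYTES:
        return "may-fail-at-intermediary"
    return "may-be-rejected-by-provider"
```

Length is measured in bytes after encoding, since a URI containing non-ASCII characters occupies more bytes than it has characters.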
Security: - A service implementation SHALL inspect all requests for possible code injection.
- A client SHOULD validate all XML results against appropriate schemas (SOAP v1.2 Part 2, C).
- SOAP messages SHOULD validate against both the minimum schema and the SOAP Encoding schema.
- Only well-defined SOAP header and body blocks should be processed (SOAP v1.2 Part 1, 7.1).
- Since SOAP intermediary nodes are men-in-the-middle, all messages must be properly authenticated with regard to all nodes along the message path (SOAP v1.2 Part 1, 7). See also SOAP v1.2 Part 2, A.2 for the HTTP binding; RFC 3023 on XML media types, section 10; and WS-I v1.2, section 6.
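One of the structural checks implied by this document, that all child elements of the SOAP envelope be namespace-qualified, can be sketched with the Python standard library. The function name and the wording of the reported problems are hypothetical; the namespace URI is the SOAP 1.2 envelope namespace. Restricting envelope children to Header and Body is an additional assumption of this sketch. This is not schema validation, and the standard-library parser is not hardened against maliciously constructed XML, so it does not replace the request inspection required above.

```python
import xml.etree.ElementTree as ET

SOAP12_NS = "http://www.w3.org/2003/05/soap-envelope"


def envelope_problems(xml_text):
    """Return a list of structural problems found in a SOAP 1.2 envelope."""
    root = ET.fromstring(xml_text)
    if root.tag != "{%s}Envelope" % SOAP12_NS:
        return ["root element is not a SOAP 1.2 Envelope: " + root.tag]
    allowed = {"{%s}Header" % SOAP12_NS, "{%s}Body" % SOAP12_NS}
    problems = []
    for child in root:
        # ElementTree writes qualified names as "{namespace}local".
        if not child.tag.startswith("{"):
            problems.append("child is not namespace-qualified: " + child.tag)
        elif child.tag not in allowed:
            problems.append("unexpected envelope child: " + child.tag)
    return problems
```

An empty list means the envelope passed these particular checks only; a receiver would still need to apply the remaining requirements in this section.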
| | Security Considerations: - A service implementation may be subject to denial-of-service attacks.
- A service implementation may log requests and results. The security of these logs should be maintained.
- There are no authorization or authentication controls. Care should be taken to maintain data privacy.
- Service implementations must respect flow control responses from repositories.
The Norwegian Z39.50 Interest Group (NorZIG) have an SRW/SRU profile which may be used for comparison: http://www.norzig.no/srw/norzig_profile.html
There is preliminary discussion of authentication for SRW (using tokens) under http://www.loc.gov/standards/sru/token.html
The Cheshire3 project has an implementation of SRW: http://srw.cheshire3.org/, http://cheshire3.sourceforge.net
The specification of indexes in CQL context sets is deliberately not machine-readable, in order not to constrain context sets to any one interface. Frequently the native format for metadata in a repository is Dublin Core, in which case explicit mapping is not needed (e.g. DSpace with the OCLC toolkit: http://pubserv.oclc.org/srw/DSpaceSRWConfiguration.html). Another alternative is mapping the CQL query to a Z39.50 query, and relying on existing Z39.50-based infrastructure to map that query to a query specific to the database (e.g. http://pubserv.oclc.org/srw/PearsSRWConfiguration.html). The YAZ toolkit, which supports both Z39.50 and SRW, does the latter (http://indexdata.dk/yaz/doc/).
The Cheshire3 project proposes using ZeeRex records, with a namespace enhancement specific to their project, to make such mapping explicit (http://cheshire3.sourceforge.net/docs/build_protocolMap.html). The following from their site illustrates this: it is a ZeeRex record allowing a Dublin Core title to be searched as both words and stems, in which the c3 namespace adds the Cheshire3 local database counterparts to the required elements: a local title index, distinct local indexes for title words and title stems, and a local transformer of the native record format to Dublin Core.
<explain xmlns="http://explain.z3950.org/dtd/2.0/" xmlns:c3="http://www.cheshire3.org/schemas/explain/">
  <serverInfo protocol="srw/u" version="1.1" transport="http">
    <host>myhostname.mydomain.com</host>
    <port>8080</port>
    <database>services/databasename</database>
  </serverInfo>
  <indexInfo>
    <set identifier="info:srw/cql-context-set/1/dc-v1.1" name="dc"/>
    <index c3:index="title-idx">
      <title>Title</title>
      <map><name set="dc">title</name></map>
      <configInfo>
        <supports type="relationModifier" c3:index="titleword-idx">word</supports>
        <supports type="relationModifier" c3:index="titlewordstem-idx">stem</supports>
      </configInfo>
    </index>
  </indexInfo>
  <schemaInfo>
    <schema identifier="info:srw/schema/1/dc-v1.1" name="dc" c3:transformer="dublinCoreTransformer">
      <title>Simple Dublin Core</title>
    </schema>
  </schemaInfo>
</explain>
Note that certain changes in the interface will be required by the next version of SRU/W. These are given in the e-framework Description of Standard. |
| Domain(s) | [X] Learning & Teaching | [ ] Research [ ] Libraries | [ ] Administration [ ] IT Services | [ ] Common |
| Maturity | [X] Immature | [ ] Mature |
| Development Status | [ ] Proposed | [X] Developmental | [ ] Prototype | [ ] Production |
| State Behaviour | [ ] Stateful | [ ] Stateless |
| Transactional Behaviour | [ ] Transactional and ACID | [X] Transactional but Not ACID | [ ] Non-Transactional |
| Batch Behaviour(s) | [X] Individual | [ ] Batch |
| Time-Constraint Behaviour | [ ] Hard Real Time | [ ] Soft Real Time | [X] None |
| Service End Point | [X] Provider | [ ] Requestor | [ ] Transcoder (both requests and provides) |
| Authentication / Authorisation Dependency | [ ] Auth-Dependent | [X] Auth-Independent |
| Protocol Binding(s) | [ ] Web Service [X] SOAP | [ ] REST [ ] HTTP | [ ] Other |
| Deployment Scale | [X] Isolated | [ ] Ubiquitous |
| Status | [ ] Approved | [ ] Placeholder [ ] Unapproved | [ ] Superseded [ ] Withdrawn |
| Confidence Level | [ ] High | [ ] Medium | [ ] Low |
|
| Version | Date | Author | Description | Organisation / Project |
| v1.0 | 2007-03-05 | NN | Initial Draft | FRED |
| v1.1 | 2007-03-12 | DR | Editorial, classifications, clarifications | FRED |
| v1.12 | 2007-03-13 | NN | Editorial | FRED |
| v1.13 | 2007-03-16 | NN | Dropped support for XPath | FRED |
| v1.14 | 2007-03-27 | DR | Editorial | FRED |
| v1.15 | 2007-04-11 | NN | Mentioned tokenization; editorial | FRED |
| v1.2 | 2007-10-02 | NN | Deleted content overlapping with efDOS | FRED |
|
| © Copyright, University of Southern Queensland 2007 and University of Memphis 2007. This work is created as part of the Federated Repositories for Education (FRED) Project within the Australian ADL Partnership Laboratory. The FRED project is sponsored by the Australian Commonwealth Department of Education, Science and Training under the Framework for Open Learning Programme. The Australian ADL Partnership Laboratory is supported by the University of Southern Queensland.
Attribute this work as: Search - SRW Basic (FRED Project) Service Expression, authored and submitted by Nick Nicholas and Daniel Rehak on behalf of the Federated Repositories for Education (FRED) Project within the Australian ADL Partnership Laboratory, © University of Southern Queensland and the University of Memphis, 2007.
Last updated 14 October 2008 |
|
|
| The words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in [RFC 2119]. |
|
|
|