URI syntax supported by TC
These calls to Uniform Resource Identifiers (URIs) parallel calls through the API. In time, however, we want them to become URI / LOD (Linked Open Data) calls. The system described here will change as we explore our options.
We assume this format for the URIs, based on standard URN syntax and employing the Kahn/Wilensky system (http://www.cnri.reston.va.us/k-w.html) of specifying a naming authority followed by the name:
- We express the naming authority as follows: urn:det:tc:usask:CTP2, representing community CTP2 at the usask instance of the det:tc scheme.
- We separate the naming authority from the name by '/'.
- The name itself is expressed as a sequence of hierarchical property=value pairs, separated by ':' and beginning at the top of the hierarchy (document or entity, respectively) for each resource.
Thus:
for documents: urn:det:tc:usask:CTP2/document=Hg Document Hg (the Hengwrt manuscript of the Canterbury Tales)
for entities (an act of communication): urn:det:tc:usask:CTP2/entity=GP The General Prologue of the Canterbury Tales
for texts, which are defined as an instance of an entity in a document: urn:det:tc:usask:CTP2/document=Hg:entity=GP The General Prologue of the Canterbury Tales in the Hengwrt manuscript.
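The assembly rules above can be sketched in a few lines of Python. This is only an illustration: the function name build_urn and its parameters are hypothetical and are not part of TC.

```python
def build_urn(community, pairs, instance="usask", scheme="det:tc"):
    """Join a naming authority and hierarchical property=value pairs into a URN.

    `pairs` is an ordered list of (property, value) tuples, top of the
    hierarchy first, e.g. [("document", "Hg"), ("entity", "GP")].
    """
    # Naming authority, e.g. urn:det:tc:usask:CTP2
    authority = f"urn:{scheme}:{instance}:{community}"
    # Name: property=value pairs joined by ':'
    name = ":".join(f"{prop}={value}" for prop, value in pairs)
    # Authority and name are separated by '/'
    return f"{authority}/{name}"


print(build_urn("CTP2", [("document", "Hg")]))
print(build_urn("CTP2", [("entity", "GP")]))
print(build_urn("CTP2", [("document", "Hg"), ("entity", "GP")]))
```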
In addition to identifying a particular page or entity we wish to request a particular aspect of this page/entity/text. We do this through a query string affixed to the resource identifier. This has two parts, 'type' and 'format', separated by &. The format specifier may be omitted when the result is a JSON object. Thus:
urn:det:tc:usask:CTP2/document=Hg?type=transcript&format=xml Will return the whole transcription of the Hengwrt manuscript (currently not implemented)
urn:det:tc:usask:CTP2/document=Hg:folio=1r?type=transcript&format=xml Will return the transcription of folio 1r of the Hengwrt manuscript in xml
urn:det:tc:usask:CTP2/document=Hg:folio=1r?type=transcript&format=html Will return the transcription of folio 1r of the Hengwrt manuscript in html, as a stand-alone web page suitable for embedding in an iframe. This is the same html as shown if you click on 'Preview' for any page
urn:det:tc:usask:CTP2/entity=GP:line=1:document=Hg?type=transcript&format=xml Returns the text of GP line 1 in the Hengwrt manuscript in xml (note: as there may be more than one version of the text in the manuscript, the text is returned in a series of JSON array elements, giving the page each text begins on)
urn:det:tc:usask:CTP2/entity=GP:line=1:document=Hg?type=CollEditor Returns a Collation Editor JSON object containing the text of GP line 1, ready to be collated by the Collation Editor
urn:det:tc:usask:CTP2/entity=GP:line=1?type=apparatus&format=xml/positive Returns the XML positive apparatus of GP line 1, as approved in the Collation Editor. Returns an empty <app> element if there is no approved collation for this line.
For CollateX, see https://collatex.net/. For the Collation Editor, see https://github.com/itsee-birmingham/collation_editor.
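To make the structure of these calls concrete, here is a minimal sketch that splits a call into its naming authority, its property=value pairs, and its query parameters. The function parse_urn is hypothetical and assumes only the syntax shown in the examples above; it is not how the TC server itself parses calls.

```python
def parse_urn(urn):
    """Split a TC URN call into naming authority, name pairs and query parameters."""
    # Everything after the first '?' is the query string (type, format).
    resource, _, query = urn.partition("?")
    # The first '/' separates the naming authority from the name.
    authority, _, name = resource.partition("/")
    # The name is a ':'-separated sequence of property=value pairs.
    pairs = [tuple(part.split("=", 1)) for part in name.split(":") if part]
    # The query string is a '&'-separated sequence of key=value items.
    params = dict(item.split("=", 1) for item in query.split("&") if item)
    return {"authority": authority, "pairs": pairs, "params": params}


call = "urn:det:tc:usask:CTP2/entity=GP:line=1?type=apparatus&format=xml/positive"
print(parse_urn(call))
```

Splitting on '?' before splitting on '/' matters here, because a format value such as xml/positive itself contains a '/'.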
Images in TC are always held in IIIF format, and may be accessed via IIIF calls:
urn:det:tc:usask:CTP2/document=Hg:folio=1r?type=IIIF&format=url For a given document page, returns the url which, when called, returns the json structure from which the IIIF viewer generates the image of this page. Note that as TC might have multiple images for a page, the urls are held in a JSON array, each element of the array being the url for one image. If there are no images for this page, the array is empty (as of January 2019, multiple images per page are not implemented in TC)
urn:det:tc:usask:CTP2/document=Hg:folio=1r?type=IIIF&format=json For a given document page, returns the json structure from which the IIIF viewer generates the image of this page. As above, because TC might have multiple images for a page, the json structures are held in an array. Note that images in TC may be held not on TC's own IIIF server but on another IIIF server. The calls above are still valid: they simply direct the call to that other server, and everything should work just as if the images were on TC's own IIIF image server.
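Because the result is always an array, and may be empty, a client should not assume that exactly one image exists. A minimal sketch of handling a type=IIIF&format=url response follows; the function name and the sample response bodies are hypothetical, for illustration only.

```python
import json


def first_image_url(response_text):
    """Return the first IIIF url from a type=IIIF&format=url response, or None."""
    urls = json.loads(response_text)
    if not isinstance(urls, list):
        raise ValueError("expected a JSON array of urls")
    # An empty array means there are no images for this page.
    return urls[0] if urls else None


# Hypothetical response bodies; example.org is a placeholder, not a TC server.
print(first_image_url('["https://example.org/iiif/Hg-1r/info.json"]'))
print(first_image_url("[]"))
```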
Wildcard calls
The wildcard * can be used in calls to specify every instance of a resource. You can use sequences of wildcard calls to construct hierarchical menus. For example: first retrieve a list of all the communities; choose a community and retrieve a list of all the documents in it; choose a document and retrieve a list of all the pages in it; then retrieve the transcription of a chosen page.
First, we can return information about the communities on the server, and the entities and documents they contain:
urn:det:tc:usask:*/?type=count Returns the number of public communities on this server, as "{count: xxx}" where xxx is the number of communities.
urn:det:tc:usask:*/?type=list Returns a JSON list of all public communities on this server
urn:det:tc:usask:CTP2/document=*?type=count Returns the number of documents in this community
urn:det:tc:usask:CTP2/document=*?type=list Returns a JSON list of all documents in this community
urn:det:tc:usask:CTP2/entity=*?type=count Returns the number of top-level entities in this community
urn:det:tc:usask:CTP2/entity=*?type=list Returns a JSON list of all top-level entities in this community
urn:det:tc:usask:CTP2/vmap=*?type=count Returns the number of variant maps in this community
urn:det:tc:usask:CTP2/vmap=*?type=list Returns a JSON list of all variant maps in this community
We can use this information to dig down into the documents, texts and entities in this community:
urn:det:tc:usask:CTP2/document=Hg:*=*?type=count Returns the number of folios in document Hg (document=Hg:pb=* would give the same)
urn:det:tc:usask:CTP2/document=Hg:*=*?type=list Returns a JSON list of all the folios in document Hg (document=Hg:pb=* would give the same)
urn:det:tc:usask:CTP2/entity=GP:*=*?type=count Returns the number of parts of entity GP (lines 1 to 856) (entity=GP:line=* would give the same)
urn:det:tc:usask:CTP2/entity=GP:*=*?type=list Returns a JSON list of all parts of entity GP (lines 1 to 856)
urn:det:tc:usask:CTP2/entity=GP:document=*?type=count Returns the number of documents containing texts of GP (the General Prologue of the Canterbury Tales)
urn:det:tc:usask:CTP2/entity=GP:document=*?type=list Returns a JSON list of all documents containing texts of GP
urn:det:tc:usask:CTP2/entity=GP:line=1:document=*?type=count Returns the number of documents containing line 1 of entity GP
urn:det:tc:usask:CTP2/entity=GP:line=1:document=*?type=list Returns a JSON list of all documents containing line 1 of entity GP
urn:det:tc:usask:CTP2/entity=GP:document=Hg:pb=*?type=count Returns the number of pages of document Hg containing entity GP
urn:det:tc:usask:CTP2/entity=GP:document=Hg:pb=*?type=list Returns a JSON list of all pages of document Hg containing entity GP
urn:det:tc:usask:CTP2/entity=GP:line=*?type=apparatus&format=xml/positive Returns the XML positive apparatus of all lines in GP, as approved in the Collation Editor. Prefaced by a list of lines which do not have an apparatus. You can return the information as a NEXUS structure ready for input to a phylogenetic analysis program.
urn:det:tc:usask:CTP2/entity=*:document=Hg:pb=1r?type=list Returns a list of all entities on this page. The "collateable" property indicates whether this entity is collateable (that is, a "terminal" entity which does not contain another entity)
urn:det:tc:usask:CTP2/vmap=WBP1 Returns the variant map WBP1 for this community.
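The menu-building sequence described at the start of this section (communities, then documents, then pages, then a transcription) can be sketched as a chain of calls. The helper names below are hypothetical; each one only builds the URN string for a step, following the wildcard calls listed above.

```python
AUTHORITY_PREFIX = "urn:det:tc:usask"  # scheme and instance used in this document


def list_communities():
    """URN listing all public communities on the server."""
    return f"{AUTHORITY_PREFIX}:*/?type=list"


def list_documents(community):
    """URN listing all documents in one community."""
    return f"{AUTHORITY_PREFIX}:{community}/document=*?type=list"


def list_pages(community, document):
    """URN listing all folios of one document."""
    return f"{AUTHORITY_PREFIX}:{community}/document={document}:*=*?type=list"


def page_transcript(community, document, folio, fmt="xml"):
    """URN returning the transcription of one folio in the given format."""
    return (f"{AUTHORITY_PREFIX}:{community}/document={document}"
            f":folio={folio}?type=transcript&format={fmt}")


# One pass down the hierarchy, with the user's choices filled in by hand.
for urn in (list_communities(),
            list_documents("CTP2"),
            list_pages("CTP2", "Hg"),
            page_transcript("CTP2", "Hg", "1r")):
    print(urn)
```

In a real menu, each choice would come from the JSON list returned by the previous call rather than being hard-coded.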
Calling the URN
In this implementation, simply add the URN to the URI server prefix, as follows:
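As a minimal sketch of this step, the following appends a URN, unchanged, to a server prefix. The actual prefix is not given in this section, so https://example.org/uri/ below is a placeholder and must be replaced with the real URI server prefix; the function name urn_to_url is likewise hypothetical.

```python
# Placeholder only: the real URI server prefix is not stated in this section.
SERVER_PREFIX = "https://example.org/uri/"


def urn_to_url(urn, prefix=SERVER_PREFIX):
    """Append the URN, unchanged, to the server prefix."""
    return prefix + urn


print(urn_to_url("urn:det:tc:usask:CTP2/document=Hg:folio=1r?type=transcript&format=xml"))
```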
Resource types
This implementation currently supports the following resource types:
- format=html: returns the html for a page of transcription
- format=xml: returns the xml for a page of transcription
- list=parts&format=json when used with the wild card *: returns the parts of a document (as pages, columns, lines) as a list of names, which can then be used in further calls to navigate whole documents/entities.
- document=Hg:page=*?list=parts&format=json returns a JSON list of the pages in the Hengwrt manuscript (1r, 1v, ...). These could be used in calls to document=Hg:folio=1r, etc., to return resources relating to that page
- ?type=IIIF&format=url: returns an array of urls referencing the IIIF server for images of one page
- ?type=IIIF&format=json: returns an array of JSON structures from which a IIIF viewer generates the images for one page
To come
- Calls to return IIIF manifests for whole documents, etc
- Etc...
Notes
- We do not use a url suffix "#html" to denote the data type of the resource, as suggested at https://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/, because the # suffix (in URL terms, a "fragment identifier") is not passed to the server.
- Following URN rules, the Namespace Identifier (here, 'det' for 'documents, entities and texts') is case-sensitive and lower case. The first two elements of the naming authority specification, 'tc' and 'usask', are lower case and case-sensitive, following URN community practice. The next element of the naming authority, here 'CTP2', must be the abbreviation for a public community on the server.
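The case rules in the last note can be checked on the client side before a call is made. The sketch below is hypothetical and covers only what can be tested locally: it cannot confirm that the final element names a public community on the server.

```python
def check_authority(authority):
    """Check the shape and case of a naming authority such as urn:det:tc:usask:CTP2."""
    parts = authority.split(":")
    # Expect exactly: urn, namespace identifier, scheme, instance, community.
    if len(parts) != 5 or parts[0].lower() != "urn":
        return False
    nid, scheme, instance, community = parts[1:]
    return (
        nid == "det"                       # lower case, case-sensitive
        and scheme == "tc"                 # lower case, case-sensitive
        and bool(instance)
        and instance == instance.lower()   # e.g. usask
        and bool(community)                # e.g. CTP2; existence is not checked here
    )


print(check_authority("urn:det:tc:usask:CTP2"))
print(check_authority("urn:DET:tc:usask:CTP2"))
```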