
Welcome to the sandbox version of Textual Communities 2.0 ("TC"), at https://textualcommunitiessandbox.org (do not use the previous address at http://textcomtest.usask.org). For the differences between the sandbox and production version (at https://textualcommunities.org) see the differences between the versions.

If you just want to see what TC can do: go to the Canterbury Tales project at https://textualcommunities.org/. If you want to learn the principles behind TC, read the launch statement given at the 2018 ADHO conference in Mexico City, at https://wiki.usask.ca/pages/viewpage.action?spaceKey=TC&title=Creating+and+Implementing+an+Ontology+of+Documents+and+Texts

Sample files

You can get the sample files used in this documentation at tcstart.zip (in zip form) or individually at www.sd-editions.com/tc.

Logging in

Here is what you see:


Press the inviting "Start" button, and you will be asked to log in by social media, or to create a log-in using your email address. If you do the latter, you will be sent an email to that address to confirm your registration. (Note: TC uses email addresses to uniquely identify each user.)

Creating or joining a community

When you first log in as a new user, the Start button has changed:



The "Create Community" button brings you to this screen:



The two compulsory fields, "Name" and "Abbreviation", are marked with *. Note the accessibility options: you can hide your community from everyone, or allow anyone to do anything, and many options in between.

Your first document: an XML file

Once you have a community, you need documents! The "Start" button at the centre of the screen has changed again:



Choose "Add Document" and you are offered three choices:


This time, select the "XML file" option. TC likes TEI! Here is a very simple example of a TEI/XML file, optimized for TC use:

<?xml version="1.0" ?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Fairfax</title></titleStmt>
      <publicationStmt>
        <p>Draft for Textual Communities site (spelling modernized)</p>
      </publicationStmt>
      <sourceDesc><p>Murray McGillivray</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <pb n="130r" facs="FF130R.JPG"/>
      <div n="Book of the Duchess">
        <lb/><head n="Title">The book of the Duchesse</head>
        <lb/><l n="1">I Have great wonder/ be this light</l>
        <lb/><l n="2">How that I live/ for day nor night</l>
        <lb/><l n="3">I may nat slepe/ wel nigh nought</l>
        <lb/><l n="4">I have so many/ an idel thought</l>
        <lb/><l n="5">Purely/ for default of sleep</l>
        <lb/><l n="6">That by my truthe/ I take no keep</l>
        <lb/><l n="7">Of no thing/ how it cometh or goth</l>
        <lb/><l n="8">Ne me is no thing/ leief nor loth</l>
        <lb/><l n="9">Al is y like good / to me</l>
        <lb/><l n="10">Joy or sorrow / where so it be</l>
      </div>
    </body>
  </text>
</TEI>

There are a few things to note about this file:
  • "Content" elements with "n" attributes (<l n="1">) are especially important to TC. TC uses these to identify all content sections. Thus: the first line is labelled by TC as "div=Book of the Duchess:l=1", and TC then uses this identifier to locate all versions of the first line in every document.
  • Note the explicit use of <lb/> elements to mark each new line in the document. TC uses the implicit hierarchy of page, column and line breaks (<pb/> <cb/> <lb/>) to construct a "text-tree" for each document, alongside the "text-tree" it creates for the hierarchy of <div> and <l> elements.
TC's understanding, that every text is composed of two distinct text-trees, one for the document (<pb/> <lb/> etc) and one for the act of communication represented in the document (<div>, <l> etc), is what separates TC from other systems for creating scholarly editions.
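To make the two text-trees concrete, here is a minimal Python sketch that reads a shortened copy of the sample file and gives every piece of text two addresses: one on the entity tree (in the "div=Book of the Duchess:l=1" form described above) and one on the document tree. This is an illustration only, not TC's own code. The function name two_trees, the dictionary keys, and the "pb=...:lb=..." form of the document address are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# A shortened copy of the Fairfax sample shown above.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
 <text><body>
  <pb n="130r" facs="FF130R.JPG"/>
  <div n="Book of the Duchess">
   <lb/><head n="Title">The book of the Duchesse</head>
   <lb/><l n="1">I Have great wonder/ be this light</l>
   <lb/><l n="2">How that I live/ for day nor night</l>
  </div>
 </body></text>
</TEI>"""


def two_trees(xml_text):
    """Return one record per leaf of text, addressed on both trees."""
    root = ET.fromstring(xml_text)
    body = root.find(f"{TEI_NS}text/{TEI_NS}body")
    leaves = []
    page, line_no = None, 0  # current position on the document tree

    def walk(elem, path):
        nonlocal page, line_no
        tag = elem.tag.replace(TEI_NS, "")
        if tag == "pb":            # new page: restart the line count
            page, line_no = elem.get("n"), 0
        elif tag == "lb":          # new line on the current page
            line_no += 1
        elif elem.get("n") is not None:
            # A content element with an "n" attribute extends the entity path.
            path = path + [f"{tag}={elem.get('n')}"]
            if len(elem) == 0 and (elem.text or "").strip():
                leaves.append({
                    "entity": ":".join(path),                   # entity tree
                    "document": f"pb={page}:lb={line_no}",      # hypothetical form
                    "text": elem.text.strip(),
                })
        for child in elem:
            walk(child, path)

    walk(body, [])
    return leaves


for leaf in two_trees(SAMPLE):
    print(leaf["entity"], "|", leaf["document"], "|", leaf["text"])
```

Each leaf is stored once but can be reached either by page and line (for display) or by entity (for finding the same line in other documents).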

Adding more documents, adding images

After selecting "XML file" you will get this dialogue:
Choose the file "Fairfax.xml" from the sample files (see above: it is in the zip file at tcstart.zip, or you can get it individually from Fairfax.xml). Give it the name "Ff" (or similar), and press "Load".
You will receive various encouraging messages, and the window should change to show you the sigil for this manuscript in the left hand pane:
Click on the arrow beside Ff to see the pages in Ff, and then click on the first page. Its transcription will now appear in the bottom-right pane:
Now, you can add an image to the page. You can do this in several ways:
  • Click on the "Add Image" button in the top-right pane, or the camera icon beside the page number "130r". You will get a box inviting you to choose an image file or drop it onto the dialogue. Choose FF130R.JPG from the sample files, or from FF130R.JPG.
  • You can load multiple images by putting them all in a folder, zipping the folder, and then clicking on the ZIP icon next to the manuscript name. Choose FairfaxImages.zip from the sample files (or here, FairfaxImages.zip).
  • You can load images from a IIIF manifest. See below.
In each case, you will see the image appear in the top-right pane. The red camera icon beside each page which now has an image will turn black. If you have all the images for the manuscript, the multiple image icon (two cameras above one another) will also turn black:
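If you have many manuscripts, the zipping step in the second option can be scripted. The following is a minimal Python sketch using only the standard library. The function name zip_images and the list of image suffixes are assumptions for this example: check which image formats and which archive layout TC accepts before relying on it.

```python
import zipfile
from pathlib import Path

# Assumed set of image suffixes; adjust to what your project actually uses.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def zip_images(folder, zip_path):
    """Zip every image file in `folder`, keeping the folder name in the archive."""
    folder = Path(folder)
    images = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for image in images:
            # Store each file as "<folder name>/<file name>", as if the folder
            # itself had been zipped.
            archive.write(image, arcname=f"{folder.name}/{image.name}")
    return [image.name for image in images]
```

For example, calling zip_images("FairfaxImages", "FairfaxImages.zip") on a local folder of page images would produce a zip file ready to be chosen from the ZIP icon.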
Play around with the other icons on this page. Try pressing the "Save", "Preview" and "Commit" buttons, to see what happens. (Note: "Commit" will write the page to the underlying database.)
Add another document by clicking on the + icon in the left hand pane. Again, choose the "XML file" option. This time, add "Bodley.xml" from the sample files (or from Bodley.xml), with the name Bd. The image for this page is at BD110V.JPG.

Adding a document from a IIIF manifest

One of the most exciting developments in manuscript studies over recent years is the rise of IIIF: the "International Image Interoperability Framework" (http://iiif.io). This has the promise of revolutionising how we look at manuscripts. Digital manuscript images have been around for over twenty-five years now: I actually wrote a book on the subject, as long ago as 1993: The Digitization of Primary Textual Sources (Office for Humanities Communication). However, for many years high-quality manuscript images were comparatively rare on the internet: to be found in boutique digitization projects, or as prototypes for something which never arrived. Several factors stood (it appeared) in the way of the mass digitization of manuscripts which I (and others) anticipated. One was the cost of the digitization process itself, in terms of special equipment for taking, handling, storing and distributing the images. Another was the high cost of specialist software systems for organizing and displaying the images. A third was the reluctance of libraries to allow high-quality manuscript images to go out on the web, free-to-all.

The first of these factors (the costs of physical capture and handling of the images) has been eliminated by the extraordinary advances of technology: even our phones now take higher-resolution images than the first digital cameras. The second and third factors are where IIIF has had an extraordinary impact. For the second: IIIF is all three of a standard, a set of software protocols, and the software itself. Thanks to a remarkable community of developers, archive and library staff and many others, IIIF allows institutions with few resources to organize their images, put them online, and have anyone view them. For the third: IIIF, like the web itself, distributes itself everywhere: in libraries, on desktop and tablet computers, on your mobile phone. From its foundation, the ethos of IIIF has been that of the web at its best: good things come from giving good things away. Accordingly, IIIF has open data at its heart. And not just at its heart: in its design. The core of IIIF is a "manifest": a highly-structured file which lists images together with the instructions for their viewing, in such a way that suitable software can read the file and show the images in your web browser just as the maker of the manifest intends. It does not matter where the server which holds the images is: if you have the manifest, you can see the images anywhere.

In the ideal world, there would be digital images of every manuscript, and a IIIF manifest for every set of manuscript images. You would simply point your browser at the manifest for any manuscript you want to see, and hey presto. We are not quite there yet!

However, TC aims to be as IIIF-compliant as it can be. All images you see on the TC site are actually held on a IIIF server, and TC uses manifests internally to manage and show the images. Later, TC will make these manifests available, so that you can import the images easily into your own website. In the meantime, TC allows you to add a document from a IIIF manifest. Thus:

  • Click on the + icon in the left panel to add a document
  • The Add Document dialogue appears, with the choice "IIIF":

     

  • Click on IIIF, and it brings you this dialogue:



  • So, if you follow the hint and go to https://www.e-codices.unifr.ch/en/searchresult/list/one/fmb/cb-0048 you will find this page, on the marvellous eCodices site:



  • Next to the IIIF Drag-n-drop logo you will see the manifest address: https://www.e-codices.unifr.ch/metadata/iiif/fmb-cb-0048/manifest.json
  • Drop or paste that address into the "Add document from IIIF manifest" dialogue, give the document a name, and press the  button

  • Watch magical things happen! The whole manuscript appears page-by-page in the left hand window, with superb high-resolution images. Just like that. Thus:


    So easy! All we need is for all the world's archives to adopt IIIF and allow everyone free access to their images with IIIF. If only...
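For the curious: a manifest is simply a JSON file, so you can inspect one yourself. The sketch below lists the page labels and image addresses in a tiny manifest, assuming the layout of the IIIF Presentation API version 2 (sequences, canvases, images). The sample manifest and its example.org addresses are invented for this illustration, and this is not how TC itself reads manifests.

```python
import json

# An invented two-page manifest in IIIF Presentation API 2 layout.
SAMPLE_MANIFEST = json.loads("""
{
  "@context": "http://iiif.io/api/presentation/2/context.json",
  "@id": "https://example.org/iiif/ms-1/manifest.json",
  "@type": "sc:Manifest",
  "label": "Example manuscript",
  "sequences": [{
    "@type": "sc:Sequence",
    "canvases": [
      {"@type": "sc:Canvas", "label": "1r",
       "images": [{"@type": "oa:Annotation",
                   "resource": {"@id": "https://example.org/iiif/ms-1_001r/full/full/0/default.jpg"}}]},
      {"@type": "sc:Canvas", "label": "1v",
       "images": [{"@type": "oa:Annotation",
                   "resource": {"@id": "https://example.org/iiif/ms-1_001v/full/full/0/default.jpg"}}]}
    ]
  }]
}
""")


def list_pages(manifest):
    """Return (page label, image address) pairs in manifest order."""
    pages = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                pages.append((canvas.get("label"), image["resource"]["@id"]))
    return pages


for label, url in list_pages(SAMPLE_MANIFEST):
    print(label, url)
```

To try this on a real manifest, you could fetch the manifest address with urllib.request.urlopen and pass the result of json.load to list_pages. Manifests following a different version of the Presentation API are laid out differently and would need a different function.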

Collation

The power of Textual Communities may be seen in the Collation system. At the top of the left panel, click the "Collation" tab:
In TC terms, an "entity" is a discrete segment of an act of communication: a line of poetry, a paragraph of prose. Click on the arrow beside "Book of the Duchess" to open up the entities (lines of poetry) within it:
(The order of these may vary.) Now, click on one of these lines. You will get this advice:
So, go to that menu:
Choose a base text (it does not matter which). Now, go back to click on line 1 in the collation. The right hand panel will change, to present the wonderful Collation Editor (developed by Dr Catherine Smith at the Institute for Textual Scholarship and Electronic Editing, University of Birmingham, UK, as part of the AHRC-funded project A Workspace for Collaborative Editing):
(You may need to make the window larger to see the menu at the bottom of the pane.) Spend some time playing with this. You can regularize variants (e.g. remove the variant wonder/wondir) by dropping one word on another:
After choosing "Save", you will see that both manuscripts now have the reading "wonder":
Play with the settings menu. You can change how the collation works from this menu:
You will see how the collation changes as these selections change.
This brief introduction gives only a glimpse of the power of the Collation Editor. Try the following, for example:
  1. Go back to one of the documents, change line 1, commit the change (this writes it to the database used by the collation), and return to the collation. You will see your change there.
  2. Now, for fun: go to the second page of Ff (130v) and have line 38 continue from the previous page onto this page and add something to it. Hint: change the "From previous page" value:
Then, commit this change and return to the collation. You will see that line 38 now includes this extra text, across the page break. You can view the XML for this page by clicking on the XML icon beside the manuscript name, to confirm that the line indeed continues across the page break:
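The idea behind collation and regularization can be sketched in a few lines of Python. This is emphatically not the Collation Editor's algorithm, which is far richer: it is a toy word-by-word comparison using Python's standard difflib module. The function names are made up for this example, and the Bd reading is invented to show the wonder/wondir variant (check the sample files for the real text).

```python
from difflib import SequenceMatcher


def tokenize(text, rules):
    """Split a line into lower-case words, applying regularization rules."""
    tokens = text.replace("/", " ").lower().split()
    return [rules.get(token, token) for token in tokens]


def collate(base, witness, rules=None):
    """Return (base reading, witness reading) pairs where the two texts differ."""
    rules = rules or {}
    a, b = tokenize(base, rules), tokenize(witness, rules)
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    variants = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            variants.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return variants


ff = "I Have great wonder/ be this light"   # line 1 in the Fairfax sample
bd = "I have great wondir be this light"    # invented reading, for illustration

print(collate(ff, bd))                          # one variant: wonder / wondir
print(collate(ff, bd, {"wondir": "wonder"}))    # the rule removes the variant
```

Dropping one word on another in the Collation Editor plays the role of the rules dictionary here: it tells the collation to treat the two spellings as the same reading.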

Other facilities

There is a great deal more in TC than this sketch shows. It is particularly rich in community management features, as follows:
  1. You can invite other people to become members of your community (click on the "Members" link when you have chosen your community, or on the "Member profile" item on the log-in menu) and follow the "Invite" link
  2. You can change the status of any member, assign them pages to transcribe, check the progress of the transcription, assign them someone to approve their transcripts (the "Members" link for each community you lead)
  3. You can permit other people to join your community without need of your approval, or require that anyone who wants to join must be approved by you ("Member profile" on the log-in menu)
Further, you can permit anyone to access pages, whole documents, or any part of the text of any document, and import it to their own website.

Copyright, etc.

We encourage anyone contributing materials to TC to make these available under the Creative Commons Attribution (CC BY) license. That is: no share-alike and no "non-commercial" restrictions. This means there should be no restrictions at all except requiring all subsequent users of the material to acknowledge your part in making it.
We require that all projects on the production version of TC make all transcripts held on TC available to all without restriction, as above. See "Can I set up a community on the production version".

Some interesting features of TC

Here, in no particular order, are some aspects of TC which make it unusual, even unique:
  • TC is built on an explicit ontology of texts, documents and works. Various of my publications describe this ontology (see https://www.academia.edu/12297061/Some_principles_for_the_making_of_collaborative_scholarly_editions_in_digital_form; https://www.academia.edu/9575974/The_Concept_of_the_Work_in_the_Digital_Age_published_version_; https://www.academia.edu/3233227/Towards_a_Theory_of_Digital_Editions; https://wiki.usask.ca/pages/viewpage.action?spaceKey=TC&title=Creating+and+Implementing+an+Ontology+of+Documents+and+Texts). Briefly: TC sees text as a collection of leaves, with all leaves present on two distinct trees, each of which conforms precisely to the "OHCO" (ordered hierarchy of content objects) model. One of the trees represents the document (codex/quires/pages/columns/lines). The other tree represents the act of communication ("entity") inscribed in the document: as Play/Acts/Scenes/Lines, or Poem/Stanzas/Lines, etc. Note that this is not simply a matter of "overlapping hierarchies", as usually characterized. It is actually two quite distinct trees: distinct to the point that branches and their leaves might appear in quite different orders on the two trees (as in the case of notes or alterations spanning the margins of multiple pages, etc.). Broadly, TC uses the 'document' tree to display the document page by page, line by line, and TC uses the 'entity' tree to locate units of text across multiple documents for collation.
  • XML and all the tools associated with it famously support "one text, one tree". (Long ago, XML's predecessor SGML did attempt to enable multiple trees in any one text through the CONCUR feature. I never did discover a useful implementation of CONCUR.) Over some twenty-five years, I have tried to manipulate the two hierarchies using a variety of tools (most prominently, the Anastasia publishing system). One problem was that for a long time I thought the problem was simply "overlapping hierarchies", and not the more demanding scenario of two distinct trees. Another problem was the inefficiency of XML tools. Accordingly, while TC uses XML as its standard input format, it creates the two distinct trees from the XML and then stores the two trees not as XML but as a series of JSON documents held in a MongoDB backend. In essence, the text is a collection of leaves stored in JSON fields, with each leaf also stored in distinct JSON documents representing the two trees. Over the last decade I have attempted to express this model with three different database systems: first, XML in the form of XML-DB; then SQL in a relational database (underlying the first version of TC, still to be seen at www.textualcommunities.usask.ca), and finally JSON. JSON wins. A key reason for the success of JSON was the requirement that we be able to edit pages in real time: that is, take out a chunk of each tree, rebuild both trees as needed and then reattach the leaves of text to each rebuilt tree, all while the editor watches. Doing this in real time is like gathering leaves in a howling gale. As a bonus, JSON (much more than XML) is the native language of web content, with an immense range of JavaScript/HTML tools available to process it.
  • Technically: TC is built in pure JavaScript, using node.js and npm tools (https://nodejs.org/en/; https://www.npmjs.com/), for both server and browser components. This makes maintenance, etc., far easier. TC also uses the Angular framework to provide all interface components (https://angularjs.org/; drawing on the Bootstrap and jQuery libraries). This architecture was designed by Xiaohan Zhang between 2012 (when we realized that the SQL solution would not work) and 2015. All code is freely available on GitHub, at https://github.com/DigitalResearchCentre/tc-new.
  • Theoretically: there is no limit to the number of trees structuring every text. TC supports two. Best of British luck to whoever wants to deal with more than two.
  • TC uses a IIIF server and viewer software (http://iiif.io/). We can import whole sets of IIIF images.
  • We would like to be obsolete very very soon. Someone please do this better than we did.
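To give a feel for the "leaves on two trees" model described in the first two points above, here is a small Python sketch of a line of verse that runs across a page break. Each leaf of text is stored once, and two separate sets of JSON-style records point at the leaves. All record layouts and field names here are hypothetical, chosen for illustration: they are not TC's actual MongoDB schema, and the leaf texts are placeholders.

```python
import json

# Each leaf of text is stored exactly once (placeholder wording).
leaves = {
    "t1": "[text of line 37]",
    "t2": "[first part of line 38]",
    "t3": "[rest of line 38]",
}

# Document tree: which leaves sit on which page (hypothetical field names).
document_tree = [
    {"node": "pb=130r", "leaves": ["t1", "t2"]},
    {"node": "pb=130v", "leaves": ["t3"]},
]

# Entity tree: which leaves make up which line of the poem.
entity_tree = [
    {"node": "div=Book of the Duchess:l=37", "leaves": ["t1"]},
    {"node": "div=Book of the Duchess:l=38", "leaves": ["t2", "t3"]},
]


def text_of(tree, node):
    """Reassemble the text attached to one node of either tree."""
    for record in tree:
        if record["node"] == node:
            return " ".join(leaves[leaf_id] for leaf_id in record["leaves"])
    raise KeyError(node)


print(json.dumps(entity_tree[1]))                           # one tree record as JSON
print(text_of(document_tree, "pb=130v"))                    # what the page shows
print(text_of(entity_tree, "div=Book of the Duchess:l=38")) # the whole line
```

Page 130v shows only the end of line 38, while the entity tree returns the whole line across the page break. Neither tree contains the other, which is the point of the model.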