For our first XQuery exercise we’ll be working with a special collection of Shakespeare’s plays coded in TEI that are part of our eXist XML database. Because the XML elements in this collection are coded in the TEI namespace, we need to begin by declaring that TEI is our default element namespace (otherwise our path expressions will not match the element nodes in the collection). Open eXide, open a new XQuery window, and paste in the following line, all the way to the semicolon, to establish that we are working in the TEI namespace:
declare default element namespace "http://www.tei-c.org/ns/1.0";
You can then access this collection:
collection('/db/apps/shakespeare/data/')
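To see why the namespace declaration matters, here is a minimal sketch that does not touch the Shakespeare collection at all. It builds one small TEI-namespaced element inline (the sample element and the `tei:` prefix are assumptions made only for this illustration) and deliberately leaves out the default element namespace declaration, so you can compare an unprefixed step with a prefixed one.

```xquery
xquery version "3.1";
(: Illustration only: the tei: prefix and the inline $sample element are
   made up for this sketch. No default element namespace is declared. :)
declare namespace tei = "http://www.tei-c.org/ns/1.0";

let $sample :=
    <TEI xmlns="http://www.tei-c.org/ns/1.0"><title>Macbeth</title></TEI>
return (
    (: unprefixed step: looks for a title in no namespace :)
    count($sample//title),
    (: prefixed step: looks for a title in the TEI namespace :)
    count($sample//tei:title)
)
```

Run in eXide, this should return 0 followed by 1: the unprefixed `title` step finds nothing because the element lives in the TEI namespace. Declaring TEI as the default element namespace, as in the line above, is what lets unprefixed steps match.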
As you work on this, it will help to refer to our XQuery tutorial page to look up how to access files in a collection and to see examples of queries. Write XQuery expressions for each of the following tasks using the eXide window in our eXist database, and test them by hitting the Eval button. Then paste your XQuery expressions into a text file, adding comments as needed. You will be submitting your text file to Courseweb.
Retrieve the main title of each play in the collection (look inside titleStmt). The simplest answer is a single XPath expression starting with the collection function and descending to the nodes you want. The output should look something like:
1 <title xmlns="http://www.tei-c.org/ns/1.0">Love's Labour's Lost</title>
2 <title xmlns="http://www.tei-c.org/ns/1.0">Macbeth</title>
3 <title xmlns="http://www.tei-c.org/ns/1.0">A Lover's Complaint</title>
4 <title xmlns="http://www.tei-c.org/ns/1.0">Pericles, Prince of Tyre</title>
5 <title xmlns="http://www.tei-c.org/ns/1.0">Cymbeline</title>
6 <title xmlns="http://www.tei-c.org/ns/1.0">Romeo and Juliet</title>
7 <title xmlns="http://www.tei-c.org/ns/1.0">All's Well That Ends Well</title>
...
Modify your expression to return only the text of the titles, using text() or data() or string(). Your output should look something like:
1 Love's Labour's Lost
2 Macbeth
3 A Lover's Complaint
4 Pericles, Prince of Tyre
5 Cymbeline
6 Romeo and Juliet
7 All's Well That Ends Well
Write an expression that returns the root element, TEI, of each play. Notice how you can page through the results using the arrows at the top of the return window in eXide. We want to be able to isolate specific plays with interesting features, and to do that we will write filters on the root element of each one.
Speeches in these plays are coded like this:
<sp who="ID"><speaker>Name</speaker> text of the speech</sp>
Write an expression that locates a play holding a speaker named Ferdinand. Which play is it? Record your expression.
Find the plays that have more than 58 distinct speakers. You’ll need the functions count() and distinct-values(), and you’ll need a construction involving a count(of something) greater than 58.
Starting from the collection, drill down to the <TEI> elements (you know there are 42 TEI root elements, one for each play), then filter them based on whether or not they contain more than 58 distinct speakers. You will need to tinker a little to make a filter that gets a count() of the distinct-values() either of the @who attribute on sp or of the contents of speaker elements (that is up to you; either sp/@who or speaker will work for our purposes). You want to find out whether that count() is greater than 58. Once you’re retrieving the three plays that meet that description, you can add path steps to retrieve just the main titles of those three plays.
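The general shape of a filter built on count() and distinct-values() can be sketched on a tiny made-up sequence. Everything here is hypothetical (the `play` elements, the `@n` labels, and the threshold of 2 are invented for illustration and are in no namespace); only the pattern carries over to the collection.

```xquery
xquery version "3.1";
(: Hypothetical miniature data, not the Shakespeare collection. :)
let $plays := (
    <play n="A"><sp who="x"/><sp who="y"/><sp who="x"/></play>,
    <play n="B"><sp who="x"/><sp who="y"/><sp who="z"/></play>
)
(: Keep only the plays whose count of distinct @who values exceeds 2,
   then step down to the label of each surviving play. :)
return $plays[count(distinct-values(.//sp/@who)) > 2]/@n/string()
```

Play A has three speeches but only two distinct speakers, so it is filtered out, and the query should return just B. Note that the predicate sits on the element you want to keep, and the extra path steps come after the predicate.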
Return those titles without the <title> tags. You can take the same approach that you did for the transition from question #1 to question #2.
The base-uri() function can be useful here. Try appending base-uri() to your XQuery expression and run it: What result do you see in the output window, and what is it telling you?
Do you see the forward slashes (/) in the preceding results of base-uri()? How could we remove the text that comes before the last slash in our output? We would use the tokenize() function (which you can look up in the w3schools list of XPath functions or in the Michael Kay book). That function breaks apart a string of text by dividing it at a particular regex pattern, and in this case the pattern is the forward slash. The tokenize() function returns tokens, the broken-off pieces of a string: each chunk before and after the regex you enter. To isolate just the piece we want, we can identify the pieces by their position in the sequence of tokens: is it the first, the second, the third, or the last one? To retrieve a token by position, after you run the tokenize function, you can add a predicate holding the position value: [1], [2], etc. To retrieve the last item in a series without knowing its numerical position, you can use the last() function (which you can read about in the same resources we mentioned above or in The Xpath functions we use the most). Note that nothing goes inside the parentheses in last(). With this information, then, how would you write your XQuery to return just the last part of the results of the base-uri() function, the part that appears after the last forward slash character? Record your expression.
When you have completed the assignment, copy and paste your expressions into a text file. Upload your text file containing your XQuery expressions to Courseweb.
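To get a feel for tokenize() and position predicates before applying them to base-uri(), you might try a minimal sketch like the one below. The date string and the hyphen pattern are assumptions chosen only for illustration; they have nothing to do with the collection.

```xquery
xquery version "3.1";
(: Hypothetical string, chosen only for illustration. :)
let $tokens := tokenize('2024-05-17', '-')
return (
    $tokens[1],       (: the first token :)
    $tokens[2],       (: the second token :)
    count($tokens)    (: how many tokens the split produced :)
)
```

This should return 2024, then 05, then 3. The same idea applies to any string and any separator pattern, and the predicate can also be written directly after the function call's closing parenthesis.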