Last modified: Thursday, 28-Feb-2019 00:08:30 UTC. Maintained by: Elisa E. Beshero-Bondar (eeb4 at psu.edu).

XQuery Exercise 1

For our first XQuery exercise we’ll be working with a special collection of Shakespeare’s plays, coded in TEI, that is part of our eXist XML database. Because the XML elements in this collection are coded in the TEI namespace, we need to begin by declaring that TEI is our default element namespace (otherwise we will be unable to access the element nodes in the collection). Open eXide, start a new XQuery window, and paste in the following line, all the way to the semicolon, to establish that we are working in the TEI namespace:

declare default element namespace "http://www.tei-c.org/ns/1.0";

You can then access this collection:

collection('/db/apps/shakespeare/data/')
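To see why the namespace declaration matters, here is a small sketch you can paste into a fresh eXide window. It does not touch the Shakespeare collection at all: the `$sample` element is a hypothetical stand-in built inline, and its structure is only an assumption about what a minimal TEI header looks like.

```xquery
xquery version "3.1";
declare default element namespace "http://www.tei-c.org/ns/1.0";

(: Hypothetical sample, not a file from the collection. Because of the
   declaration above, these constructed elements are in the TEI namespace. :)
let $sample :=
    <TEI>
        <teiHeader>
            <fileDesc>
                <titleStmt>
                    <title>Sample Play</title>
                </titleStmt>
            </fileDesc>
        </teiHeader>
    </TEI>
return (
    (: An unprefixed name test is resolved in the default (TEI) namespace,
       so this finds the one title element: the count is 1. :)
    count($sample//title),
    (: Asking for title elements that are in no namespace finds nothing:
       the count is 0. This is what an unprefixed name test would see
       without the declaration. :)
    count($sample//*:title[namespace-uri() = ''])
)
```

The second count is the situation you would be in without the declaration: the elements are there, but a plain name test cannot see them.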

As you work on this it will help you to refer to our XQuery tutorial page to look up how to access files in a collection and see examples of queries. Write XQuery expressions for each of the following tasks using the eXide window in our eXist database, and test them by hitting the Eval button. Then paste your XQuery expressions into a text file, adding comments as needed. You will be submitting your text file to Courseweb.

Find all of the main titles of each of the 42 Shakespeare plays in the collection, by stepping down the descendant axis from the collection. You will need to look at the TEI code of the collection first to see where the main titles are (hint: the play’s main title is coded near the top of the file in a special element called the titleStmt). The simplest answer is a single XPath expression starting with the collection function and descending to the nodes you want. The output should look something like:

1
<title xmlns="http://www.tei-c.org/ns/1.0">Love's Labour's Lost</title>
2
<title xmlns="http://www.tei-c.org/ns/1.0">Macbeth</title>
3
<title xmlns="http://www.tei-c.org/ns/1.0">A Lover's Complaint</title>
4
<title xmlns="http://www.tei-c.org/ns/1.0">Pericles, Prince of Tyre</title>
5
<title xmlns="http://www.tei-c.org/ns/1.0">Cymbeline</title>
6
<title xmlns="http://www.tei-c.org/ns/1.0">Romeo and Juliet</title>
7
<title xmlns="http://www.tei-c.org/ns/1.0">All's Well That Ends Well</title>
...

Modify your XPath above to return just the text of the titles, without the tags. You can do that by using text(), data(), or string(). Your output should look something like:
```
1
Love's Labour's Lost
2
Macbeth
3
A Lover's Complaint
4
Pericles, Prince of Tyre
5
Cymbeline
6
Romeo and Juliet
7
All's Well That Ends Well
```
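The three functions mentioned above often give results that look the same, but they are not identical. The sketch below uses a hypothetical title with a child element inside it (the `hi` markup is an assumption for illustration, not something taken from the collection) to show where they differ.

```xquery
xquery version "3.1";
declare default element namespace "http://www.tei-c.org/ns/1.0";

(: Hypothetical title that mixes text with a child element. :)
let $title := <title>The <hi>Winter's</hi> Tale</title>
return (
    $title/text(),      (: two text nodes: "The " and " Tale"; the text inside hi is skipped :)
    $title/string(),    (: one string: "The Winter's Tale" :)
    $title/data()       (: one untyped atomic value with the same characters as string() :)
)
```

When an element holds only plain text, all three return the same characters, which is why any of them will do for a simple title.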
Write an XPath expression that isolates the root element TEI of each play. Notice how you can page through the results using the arrows on top of the return window in eXide. We want to be able to isolate specific plays with interesting features, and to do that we will write filters on the root elements of each one.
Speeches are coded in the Shakespeare plays like this:
```
<sp who="ID"><speaker>Name</speaker> text of the speech</sp>
```
Write an expression that locates a play holding a speaker named Ferdinand. Which play is it? Record your expression.
Modify your expression to return only the main title of that play, and record your expression.
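The pattern in the last two tasks (filter a whole play by something deep inside it, then step to a different part of that play) can be sketched on invented data. Everything in `$corpus` below is hypothetical, including the `corpus` and `play` element names, and it is left in no namespace to keep the example short.

```xquery
xquery version "3.1";

(: Hypothetical data: two invented plays under one root element. :)
let $corpus :=
    <corpus>
        <play>
            <title>First Sample</title>
            <sp who="anna"><speaker>Anna</speaker> Good morrow.</sp>
            <sp who="bert"><speaker>Bert</speaker> And to you.</sp>
        </play>
        <play>
            <title>Second Sample</title>
            <sp who="cora"><speaker>Cora</speaker> Who goes there?</sp>
        </play>
    </corpus>
(: The predicate sits on play, so the result of the filter is still the
   whole play, and the next step can walk to its title. :)
return $corpus/play[.//speaker = 'Cora']/title/string()
```

This returns the string Second Sample. Because the predicate is attached to the containing element rather than to the speaker, you stay positioned on the play and can keep adding path steps after the filter.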
Now, let’s see if we can find three very special plays that each contain more than 58 unique (distinct) speakers! First, see if you can find the plays, and then return only their main titles (recalling the code you wrote previously). You will need to use count() and distinct-values(), and you’ll need a construction that tests whether a count() of something is greater than 58.
Starting from the collection, drill down to the <TEI> elements (you know there are 42 TEI root elements, one for each play), then filter them based on whether they contain more than 58 distinct speakers. You will need to tinker a little to build a filter that takes a count() of distinct-values(), either of the @who attributes on sp elements or of the contents of speaker elements (that is up to you; either sp/@who or speaker will work for our purposes), and then tests whether that count() is greater than 58. Once you’re retrieving the three plays that meet that description, you can add path steps to retrieve just the main titles of those three plays.
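Here is a minimal sketch of a count-of-distinct-values filter, run on invented data with a much smaller threshold. The `corpus` and `play` elements and the speaker IDs are all hypothetical; only the sp/@who pattern comes from the description above.

```xquery
xquery version "3.1";

(: Hypothetical data: two invented plays, in no namespace, under one root. :)
let $corpus :=
    <corpus>
        <play>
            <title>First Sample</title>
            <sp who="anna"><speaker>Anna</speaker> Good morrow.</sp>
            <sp who="bert"><speaker>Bert</speaker> And to you.</sp>
            <sp who="anna"><speaker>Anna</speaker> Farewell.</sp>
        </play>
        <play>
            <title>Second Sample</title>
            <sp who="cora"><speaker>Cora</speaker> Who goes there?</sp>
            <sp who="cora"><speaker>Cora</speaker> No answer?</sp>
        </play>
    </corpus>
(: distinct-values() collapses repeated IDs, count() counts what is left,
   and the comparison keeps only plays with more than one distinct speaker. :)
return $corpus/play[count(distinct-values(.//sp/@who)) > 1]/title
```

The first play has three speeches but two distinct speakers, so it passes the filter; the second has two speeches by one speaker, so it does not. Only the title element of First Sample comes back.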
- Modify your solution to the preceding question to return just the text of the three play titles, without the <title> tags. You can take the same approach that you did for the transition from question #1 to question #2.
- When retrieving a single file from a collection, the base-uri() function can be useful. Try appending base-uri() to your XQuery expression and run it: What result do you see in the output window, and what is it telling you?
- What if we wanted to return only the file name with its file extension, the part after the last forward slash (/) in the preceding results of base-uri()? How could we remove the part of the string that comes before it? We would use the tokenize() function (which you can look up in the w3schools list of XPath functions or in the Michael Kay book). That function breaks apart a string of text by dividing it at a particular regex pattern, and in this case the pattern is the forward slash. The tokenize() function returns tokens, or broken-off pieces of a string: each chunk before and after the regex you enter. In order to isolate just the piece we want, we can identify the pieces by their position in the sequence of broken pieces: is it the first token, the second, the third, or the last one? To retrieve a token by position, place a predicate holding the position value after the tokenize() call: [1] for the first token, [2] for the second, and so on. To retrieve the last item in a series without knowing its numerical position, you can use the last() function (which you can read about in the same resources we mentioned above or in The XPath functions we use the most). Note that nothing goes inside the parentheses in last(). With this information, then, how would you write your XQuery to return just the last part of the results of the base-uri() function, the part that appears after the last forward slash character? Record your expression.
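To watch tokenize() and positional predicates at work before applying them to real base-uri() output, you can try a sketch like this one. The path in `$uri` is a made-up string, not an actual file in our database.

```xquery
xquery version "3.1";

(: Hypothetical path string standing in for a base-uri() result. :)
let $uri := '/db/apps/example/data/sample-play.xml'
let $tokens := tokenize($uri, '/')
return (
    count($tokens),      (: 6, because the leading "/" produces an empty first token :)
    $tokens[2],          (: "db" :)
    $tokens[last()]      (: "sample-play.xml" :)
)
```

Notice that position 1 holds an empty string here, since the path begins with the separator. That is one reason last() is more dependable than counting positions by hand.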
FLWOR Statement or XPath expression?: Did you write your XQuery for the plays with more than 58 distinct speakers as one long XPath expression (from left to right)? Or did you write it up as a FLWOR statement? (Review our tutorial for details and examples on writing FLWOR statements using variables.) Whichever way you chose to write your XQuery in the previous steps, try the other way and see if you can duplicate your results. Record your XQuery expressions in your text file.
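As a rough illustration of how the two styles line up, the sketch below writes the same filter both ways over invented data and checks that the results match. The `corpus` and `play` elements, the speaker IDs, and the threshold of 1 are all hypothetical.

```xquery
xquery version "3.1";

(: Hypothetical data: two invented plays, in no namespace, under one root. :)
let $corpus :=
    <corpus>
        <play>
            <title>First Sample</title>
            <sp who="anna"><speaker>Anna</speaker> Good morrow.</sp>
            <sp who="bert"><speaker>Bert</speaker> And to you.</sp>
        </play>
        <play>
            <title>Second Sample</title>
            <sp who="cora"><speaker>Cora</speaker> Who goes there?</sp>
        </play>
    </corpus>

(: Style 1: a single path expression with a predicate. :)
let $xpath := $corpus/play[count(distinct-values(.//sp/@who)) > 1]/title/string()

(: Style 2: a FLWOR that names each intermediate step with a variable. :)
let $flwor :=
    for $play in $corpus/play
    let $speakers := distinct-values($play//sp/@who)
    where count($speakers) > 1
    return $play/title/string()

(: true() when both styles return the same sequence of strings. :)
return deep-equal($xpath, $flwor)
```

The predicate in the path expression and the where clause in the FLWOR do the same job. The FLWOR is longer, but the named variables make it easier to inspect an intermediate value (for example, by returning `$speakers`) while you are tinkering.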

When you have completed the assignment, copy and paste your expressions into a text file. Upload your text file containing your XQuery expressions to Courseweb.