Spring 2023 Syllabus (Schedule)
Classes meet M W F 12:20 - 1:10pm in Witkowski 109.
Syllabus: This contains a detailed explanation of course policies and the basis for grades.
Schedule: This link jumps to the day closest to today's date. Review the schedule as we get
started to get a sense of how this course will work on a daily basis.
All the Tools You Need As We Begin:
Download and install the following software on your own personal computer(s) on or
before the first day of class. These software tools are available in our campus computing
labs, too.
- <oXygen/>.
(You will probably have this installed from DIGIT 100 or 110.) The DIGIT program has purchased a site license for this software, which
is installed in Kochel 77, the Lilley Library computers, and Witkowski 109. The license also permits
students enrolled in the
course to install the software on their home computers (for course-related use
only). When installing this on your own computers, you will need the
license key, which we have posted on our course Announcements section of
Canvas.
- AntConc: (You may have this installed from DIGIT 100.)
Free corpus text analysis tool.
- Python and PyCharm Edu: Install Python version 3.8 or
higher on your computer, and install PyCharm Edu to assist in learning and
writing Python code with syntax checking. Follow the instructions and links from
PyCharm ( https://www.jetbrains.com/help/pycharm/quick-start-guide.html#meet ), paying attention to what you need for your own computer system.
Feel free to download and explore PyCharm Edu on your own before we start
working with it together: https://www.jetbrains.com/pycharm-edu/. Also, configure
Anaconda so it is available within PyCharm, following this guide: https://www.jetbrains.com/help/pycharm/conda-support-creating-conda-virtual-environment.html. (We will provide guidance on this in class.)
- Zoom: Make sure your Zoom installation is up-to-date, and you are ready to
connect. Sometimes we will record portions of class meetings and tutorial sessions for future reference to share over Zoom. Look for these in Canvas Announcements and use the Zoom menu option in Canvas to access these meetings.
- We will use GitHub for sharing code and for project management. Create an account (choose the free options) at https://github.com and install the GitHub client software for your operating system on your own computer. (We will explain how to use git and GitHub in our course.)
- We will use the Slack chat platform for discussion and for asking questions (see https://slack.com/help/articles/218080037-Getting-started-for-new-members). Download and install the Slack client, and configure your account to use your Penn State email address (the official address, which looks like xyz123@psu.edu, and not an alias based on your name that you may have set up), so you can join our Slack workspace: DIGIT-coders. When you receive an invitation to join this workspace, you should accept it.
- Later in the semester we may ask you to install a local copy of the eXist-db
XML database, which you can download from https://exist-db.org/.
- No coding experience? Don’t worry! Past students in this course
who had never seen anything like markup or XML code have designed projects (like these) and even spoken about them at academic conferences! You will learn to develop
your own digital tools and to manage digital projects as a team.
Class Web Resources:
Week 1 | Class topics | Do before class |
---|
M 1-09 |
Welcome! Intro to the course, and the era of AI, ChatGPT, and natural
language processing. Join class Hypothes.is annotation group. Class exercise with ChatGPT. |
.... |
W 1-11 |
Orientation to Natural Language Processing and how AI works. Voyant and AntConc review. |
- Join our DIGIT Slack Group and Hypothes.is group (if you did not already)
- ChatGPT and Git Review Exercise 1: Organize a directory in your GitHub repo (or create a new GitHub repo + directory for this):
Curate your experimental prompts and responses as text or XML files in your GitHub repo.
|
F 1-13 |
Class protocols for handling code files: GitHub and version-controlled file management. Making a branch on the textAnalysis-Hub. Review pulling, adding,
committing, and pushing.
|
|
Week 2 | Class topics | Do before class |
---|
M 1-16 |
Martin Luther King Day: No classes. |
.... |
W 1-18 |
Discussion of word embeddings, word math, and AI. How we play with mathy AI and NLP in digital arts and humanities.
Prep for Python unit: Orientation to PyCharm’s IDE and the PyCharm Edu tutorial
|
|
F 1-20 |
Working through the PyCharm Edu tutorial together. Manipulating strings with Python, and Pythonic data structures (lists, tuples, dictionaries).
|
- ChatGPT + Git Exercise 2: Push this assignment to your branch of the textAnalysis-Hub repository and issue a pull request assigned to Dr. B (ebeshero).
- PyCharm Edu tutorials: through Strings (submit evidence of completion via screen capture on Canvas).
|
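As a preview of the strings and data structures topic above, here is a minimal sketch in Python. The sample sentence and all variable names are invented for illustration and are not taken from the PyCharm Edu tutorial.

```python
# Toy sentence chosen for illustration; it is not from a course text.
sentence = "The quick brown fox jumps over the lazy dog and the cat"

# Strings: normalize case, then split on whitespace into a list of words.
words = sentence.lower().split()

# Lists are ordered and mutable: slice out the first three words.
first_three = words[:3]

# Tuples are ordered and immutable: pair a word with its length.
# max() returns the first word among those tied for the greatest length.
longest = max(words, key=len)
longest_pair = (longest, len(longest))

# Dictionaries map keys to values: count how often each word occurs.
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

print(first_three)
print(longest_pair)
print(counts["the"])
```

Try changing the sentence and predicting each printed value before you run the file.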
Week 3 | Class topics | Do before class |
---|
M 1-23 |
PyCharm Edu tutorial review. Python environments: PyCharm vs. Jupyter notebooks.
Introduce Google Colab (Jupyter) notebooks via the notebook cells in Tutorial: Exploring Gender Bias in Word Embedding.
|
- Read and annotate in our class Hypothes.is group:
- PyCharm Edu Community tutorials: Complete the Tutorial through the Condition expressions Unit (submit evidence of completion via screen capture on Canvas).
- Review git branching, catch up/fix git homeworks if necessary.
|
W 1-25 |
- Ethical issues and research paths with AI, NLP, word embeddings. Where do we find arts and humanities in computational processing?
Python: Working with libraries (modules and packages), saving and executing files.
- Initiate Five Days of Command Line GitHub Test.
|
|
F 1-27 |
Writing your own Python: Getting started with Natural Language Processing (NLP) with Python:
installations/imports: nltk, spaCy, gensim
|
- Finish PyCharm Edu Intro to Python tutorials: Classes and
objects, Modules and packages, File input and output. Submit evidence of completion via screen capture on Canvas.
|
Week 4 | Class topics | Do before class |
---|
M 1-30 |
- Git branching and pull requests. GitHub markdown.
- Python file imports and exports. Reading file collections.
|
- Python NLP Exercise 1
- (By the end of the day): Five Days of Git: Part 1: Record completion on Canvas as part of GitHub Test
|
W 2-01 |
spaCy's word similarity calculations. NLP and large language models, vs. customized, specialized modeling.
|
- Five Days of Git: Part 2: Record completion on Canvas as part of GitHub Test
|
F 2-03 |
Reviewing Python NLP word similarity.
|
- Python NLP Exercise 2: comparing a set of files using word vector data: (words similar to a word of interest)
- Five Days of Git: Part 3: Record completion on Canvas as part of GitHub Test
|
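To make the idea of word similarity concrete, the sketch below computes cosine similarity, a common measure for comparing word vectors, over a few made-up vectors. This is an illustration only: the three-dimensional vectors and the helper names `cosine_similarity` and `most_similar` are assumptions for this example, and it does not call spaCy, whose models supply much longer vectors learned from text.

```python
import math

# Hypothetical three-dimensional "word vectors" made up for illustration.
toy_vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
    "truck": [0.2, 0.1, 0.8],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, vectors):
    """Rank every other word by its similarity to the word of interest."""
    target = vectors[word]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = most_similar("cat", toy_vectors)
for other, score in ranking:
    print(f"{other}\t{score:.3f}")
```

A score near 1 means two vectors point in nearly the same direction. With these toy numbers, "dog" ranks closest to "cat".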
Week 5 | Class topics | Do before class |
---|
M 2-06 |
Review of GitHub Markdown. Introducing Python web scraping with Beautiful Soup.
Annotating vectors: the case for XML. Orientation to / Review of XML as document annotation and information architecture.
|
|
W 2-08 |
Python: Introducing Topic Modeling with nltk (LDA)
Discussion of XML recipe code. XML structure and annotation: working with attributes |
- Python NLP Exercise 3: Build a web scraper to collect text files.
- Five Days of Git: Part 5: Record completion on Canvas to finish the GitHub Test.
|
F 2-10 |
- Topic Modeling code and visualization
- Semester Project Possibilities: ideas, sources, teamwork expectations: discussion
|
Python NLP Exercise 4: Experiment with topic modeling (LDA)
|
Week 6 | Class topics | Do before class |
---|
M 2-13 |
Python Topic Modeling Visualization libraries, and NLP review/preview. Semester projects.
|
|
W 2-15
|
Semester project discussion. XML for document data structures. Data frames, containers, and cross-walking data formats.
See XML chapter from a Corpus Linguistics course
|
|
F 2-17 |
- XML discussion. Schema validation review for 110 students
- Form Project Teams (today or Monday 2/20)
|
- XML Exercise 1
- Project Milestone 1: Launch the project GitHub repo and invite your teammates and me to join (using Settings > Manage settings). Launch Slack channel for project and invite teammates and Dr. B. Post (in your Slack project thread or on GitHub) your available meeting times to help determine a regular meeting time for your group.
|
Week 7 | Class topics | Do before class |
---|
M 2-20 |
- Structuring and regularizing data from documents with markup.
- Introduce document analysis with Regular Expressions: the dot, the backslash, numbers (\d), repetition indicators, matching on lines, and autotagging. Greedy and non-greedy matching.
- Preview Intro to Regular Expressions
- Choosing a license for your project GitHub repo.
|
- XML and/or RNG Exercise 2
|
W 2-22 |
- Regular Expressions: Thinking (and writing) in markdown, algorithmically.
The fine art of
Looking Stuff Up in the Regex tutorials:
Character sets, symbols, capturing groups.
|
- XML Exercise 3
- Watch Regex Orientation Videos:
|
F 2-24 |
Reviewing and debugging regex search and replace. Character set matches. Writing clear markdown documentation.
|
|
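The regex ideas from this week (digits with `\d`, repetition indicators, greedy vs. non-greedy matching, and autotagging) can be tried out in Python's built-in `re` module. This is a minimal sketch: the sample line and the `<date>` element name are invented for illustration, and the replacement syntax shown is Python's, which may differ from the find-and-replace syntax in other tools.

```python
import re

# Invented sample line for illustration; not drawn from a course file.
line = 'She wrote "Hello" in 1815 and "Goodbye" in 1816.'

# \d matches one digit; {4} repeats it exactly four times.
years = re.findall(r"\d{4}", line)

# Greedy: .+ grabs as much as it can, so the match runs from the
# first quotation mark all the way to the last one on the line.
greedy = re.findall(r'".+"', line)

# Non-greedy: .+? stops at the earliest closing quotation mark.
non_greedy = re.findall(r'".+?"', line)

# Autotagging: wrap each four-digit year in a hypothetical <date> element.
# In Python's replacement syntax, \g<0> stands for the whole match.
tagged = re.sub(r"\d{4}", r"<date>\g<0></date>", line)

print(years)
print(greedy)
print(non_greedy)
print(tagged)
```

Compare the greedy and non-greedy results: the only difference in the pattern is the `?` after `.+`.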
Week 8 | Class topics | Do before class |
---|
M 2-27 |
Regex greedy and non-greedy matches.
Dr. B is guest-speaking at Pitt in David Birnbaum's Digital Humanities class on network analysis. |
|
W 3-01 |
- Regex debugging
- HTML review. Relationship to / difference from XML. Setting up docs/ directory for GitHub Pages.
- GitHub Project Management tools and markdown.
- Review of GitHub Pages, coordinating web work for project milestone
|
|
F 3-03 |
|
- Regex Exercise 4
- Project Milestone 2 due:
- Create a file directory structure for the project GitHub repo(s): Initiate the project website within the docs directory with an index.html page and some CSS. Consult with your team and Dr. B to decide on a place to work on the text files (in its own directory, or in a separate private repo?) and create that space. Create a directory for XML files. Begin populating those file directories (even with placeholder Readme.md files to describe what belongs where).
- Assemble the text files you want to work with on the project. As a team, work on document analysis to plan for how you want these to be marked for structure. What XML structure do you want to use to contain meaningful units of text data? Aim for a clear, simple structure that distinguishes the kind of info you want to be able to track.
|
M 3-06 - F 3-10 |
Spring Break |
Enjoy this week! |
Week 9 | Class topics | Do before class |
---|
M 3-13 |
- Launch Take-home Regex Test (due next Monday)
- Start XPath and XQuery in oXygen, and in eXist-db: simple functions and sequences. Exploring XML through child and descendant axes. Predicate
filters.
- Start XPath Orientation Exercise together
- Access newtfire eXist-db and find the eXide window.
|
|
W 3-15 |
- XPath predicates [ ] as filters.
- Awareness of sequences: An XPath sequence can be zero, one, or more results (either XML nodes or information about them)
- XQuery and writing a FLWOR over a sequence
|
|
F 3-17 |
- XQuery: Writing FLWOR statements and outputting HTML lists and
tables
- Outputting files and saving them to the eXist-db database for
previewing
- XQuery online and offline: in eXist and in <oXygen/>
|
|
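For a feel of child and descendant steps, predicate filters, and sequences of zero, one, or many results, here is a small sketch using Python's standard `xml.etree.ElementTree` module. It is a stand-in only: ElementTree supports just a limited subset of XPath, not the full XPath and XQuery we write in <oXygen/> and eXist-db, and the sample XML and its element names are invented for this example.

```python
import xml.etree.ElementTree as ET

# Invented sample document; element and attribute names are hypothetical.
xml_text = """
<collection>
  <letter year="1815"><sender>Ada</sender><place>London</place></letter>
  <letter year="1816"><sender>Ben</sender></letter>
  <letter year="1816"><sender>Ada</sender><place>Bath</place></letter>
</collection>
"""
root = ET.fromstring(xml_text)

# Child step: <letter> elements directly inside the root element.
all_letters = root.findall("letter")

# Descendant step: every <sender> at any depth below the root.
senders = root.findall(".//sender")

# Predicate on an attribute: keep only letters from 1816.
from_1816 = root.findall("letter[@year='1816']")

# Predicate on a child element: keep only letters that contain a <place>.
with_place = root.findall("letter[place]")

# A result may be empty: no letter has this year, so we get zero results.
from_1900 = root.findall("letter[@year='1900']")

# Like a "for" over a sequence: visit each result and pull out a value.
names_1816 = [letter.findtext("sender") for letter in from_1816]

print(len(all_letters), len(senders), len(with_place), len(from_1900))
print(names_1816)
```

Each `findall` call returns a list, which plays the role of an XPath sequence here: it can hold zero, one, or more results.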
Week 10 | Class topics | Do before class |
---|
M 3-20 |
- What you can count and measure with XPath in XQuery
- Saving and Accessing files in the Newtfire eXist-db: set up individual
and team project directories.
- Test logging in to newtfire eXist-db
|
|
W 3-22 |
- XQuery for statements: singling out each member of a sequence of XML nodes, as well as values off the tree.
- XQuery from eXist to Web: Writing HTML output from eXist-db
|
|
F 3-24 |
XQuery to HTML. Working with eXist-db outputs. |
- XQuery Exercise 4: Storing data in HTML
- End of the day: Project Milestone 4 due: exploring project files with XQuery
|
Week 11 | Class topics | Do before class |
---|
M 3-27 |
Revisit Python: Named Entity Recognition in Project XML Data |
|
W 3-29 |
Python Named Entity Recognition in Project XML Data continued. Saxon C library for reading XPath from Python.
Applying Regular Expressions for cleaning input. Outputting data to files, autotagging XML from Python |
- Python NER Exercise
- Extra Credit Opportunity: Attend the World-Wide Climate Justice Teach-In from 5:30 - 8pm in Reed Bldg., or come at least at 7pm for the Technology panel (where Dr. B is presenting).
|
F 3-31 |
Python Named Entity Recognition in Project XML Data: Refining input strings
|
|
Week 12 | Class topics | Do before class |
---|
M 4-03 |
Launch take-home Python and Documentation Test
Introducing Network Analysis for projects: Modeling relationships in project data.
XQuery prep for NLP and Network Analysis:
- oXygen XML Editor for writing XQuery on large collections.
- Output formats to save: the CSV / TSV file.
|
Work on Project Milestone 5
|
W 4-05 |
Network analysis: orientation to network statistics. Reading in data for Cytoscape and working with network visualization parameters
|
End of Day:
- XQuery to Network Analysis Exercise 1: Prepare a TSV from class today
- Project Milestone 5 due:
|
F 4-07 |
Network Analysis, continued: XQuery to Network Analysis: Refining the visualization, outputting SVG.
Selecting/querying the network,
creating sub-networks. Working with output files on your website.
|
Network Analysis Exercise 2
|
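The TSV (tab-separated values) format we prepare for network analysis is simply rows of text with a tab between columns. The sketch below writes a small edge list with Python's `csv` module. In class we produce this output with XQuery, so treat this as an illustration of the file format only. The edge data, the column names, and the helper `edges_to_tsv` are all hypothetical.

```python
import csv
import io

# Hypothetical edge list: (source node, kind of connection, target node).
edges = [
    ("Ada", "writes_to", "Ben"),
    ("Ben", "writes_to", "Ada"),
    ("Ada", "mentions", "London"),
]


def edges_to_tsv(rows):
    """Return the rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["source", "interaction", "target"])
    writer.writerows(rows)
    return buffer.getvalue()


tsv_text = edges_to_tsv(edges)
print(tsv_text)
```

To save the result as a file, write `tsv_text` out with `open("edges.tsv", "w", encoding="utf-8")`. When a network tool imports a table like this, you typically tell it which column holds the source node and which holds the target.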
Week 13 | Class topics | Do before class |
---|
M 4-10 |
Network Analysis: refining visualizations.
|
Complete take-home Python and Documentation Test (end of the day).
|
W 4-12 |
Catch-up / Project Workday |
|
F 4-14 |
XML that makes graphics: SVG (Scalable Vector Graphics). Drawing elements,
and screen grid coordinates.
|
- Project Milestone (due end of day)
|
Week 14 | Class topics | Do before class |
---|
M 4-17 |
Write XQuery to output SVG: Pulling data for visualizing, and plotting graphs using FLWOR statements. Namespace issues.
Writing variables for plotting, labelling, scaling, colors.
|
XQuery test due |
W 4-19 |
- XQuery to SVG Development
- Launch take-home XQuery test
|
|
F 4-21 |
XQuery to SVG: exploring possibilities with project data |
- Project Milestone due: Visualization and Documentation Development
|
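The plotting logic behind a simple SVG bar graph (scaling values, spacing bars, and working with a grid in which y increases downward) can be sketched in Python. In this course we write the equivalent in XQuery with FLWOR statements, so this is a swapped-in illustration rather than the class method. The counts, sizes, colors, and the function name `bar_chart_svg` are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical counts to plot; in a project these would come from a query.
counts = {"Ada": 5, "Ben": 3, "Cy": 8}


def bar_chart_svg(data, bar_width=40, gap=20, scale=10):
    """Build a minimal vertical bar chart and return it as an SVG string."""
    height = max(data.values()) * scale + 40
    width = len(data) * (bar_width + gap) + gap
    svg = ET.Element("svg", {
        "xmlns": "http://www.w3.org/2000/svg",
        "width": str(width),
        "height": str(height),
    })
    baseline = height - 20  # y position where the bottoms of the bars sit
    for i, (label, value) in enumerate(data.items()):
        x = gap + i * (bar_width + gap)
        bar_height = value * scale
        # y grows downward in SVG, so the top of a bar is baseline - height.
        ET.SubElement(svg, "rect", {
            "x": str(x),
            "y": str(baseline - bar_height),
            "width": str(bar_width),
            "height": str(bar_height),
            "fill": "steelblue",
        })
        text = ET.SubElement(svg, "text", {"x": str(x), "y": str(baseline + 15)})
        text.text = label
    return ET.tostring(svg, encoding="unicode")


svg_text = bar_chart_svg(counts)
print(svg_text)
```

Note the subtraction in the `y` attribute: because the origin is the top left corner of the screen, taller bars need smaller y values.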
Week 15 | Class topics | Do before class |
M 4-24 |
Putting it all together: Discussion, analysis, documentation, web work. Ethics in public-facing digital data representation. |
Project development sprint, prep for DIGIT Works presentation |
W 4-26 |
Ethical representation: What the data does not say, documenting what's missing. Thinking about user experience, range of audiences. |
Project development sprint, prep for DIGIT Works presentation |
F 4-28 |
Last Day! Project Milestone: Teams deliver DIGIT Works presentations |
Prep for presentations |
Finals Week: May 1 - 5 |
To Complete |
H 5-05
|
Semester projects due by 11:59pm
Finish developing projects, and send a post to me on GitHub and Canvas to indicate your team is finished.
|