Advanced Topics in Humanities Programming with Python
Digital humanists use digital methods to help them answer research questions in the humanities. The vast majority of digital humanists rely on existing tools to perform the necessary digital operations on their data. We find this, however, to be both severely limiting for the scholar and harmful for the discipline of digital humanities. In this course, we will explore one avenue to DH liberation and empowerment: writing your own tools. Participants will use the scripting language Python to explore their data with simple yet powerful scripts that harness the potential of existing algorithms and techniques. The first week will be spent learning to apply scikit-learn (sklearn), the most popular Python machine-learning package, to tasks of categorization, clustering, and recognition. The second week will focus on using the numpy package to represent large textual corpora as matrices, on extracting semantic information from these corpora, and on comparing this information between and among corpora.
The two parts of this workshop can be taken independently of each other, and both assume familiarity with Python or a similar object-oriented programming language. The course includes no introduction to the basics of Python such as object types, loops, and conditionals. Instead, participants will immediately start learning the theoretical basics of the methods and applying them by writing their own scripts for their own data.
Week 1: Machine Learning
Day 1: What is machine learning?
Day 2: Supervised techniques and data classification
Day 3: Unsupervised techniques and data clustering
Day 4: “Deep Learning” and data recognition
Day 5: Deeper Learning
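To give a flavor of the supervised classification covered in Week 1, here is a minimal sketch using sklearn. The toy texts, the labels "legal" and "poetry", and the choice of a bag-of-words naive Bayes model are assumptions made for illustration only; they are not taken from the course materials.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data, invented purely for illustration.
train_texts = [
    "the court ruled on the contract dispute",
    "the judge dismissed the appeal in court",
    "the moon rose over the silent sea",
    "soft winds sing over the sleeping sea",
]
train_labels = ["legal", "legal", "poetry", "poetry"]

# CountVectorizer turns each text into a vector of word counts;
# MultinomialNB learns which words are typical of each label.
classifier = make_pipeline(CountVectorizer(), MultinomialNB())
classifier.fit(train_texts, train_labels)

# Classify two unseen texts.
new_texts = ["the judge ruled on the appeal", "the silent moon"]
predictions = list(classifier.predict(new_texts))
print(predictions)
```

With real research data, the same pattern applies: a list of documents, a list of known categories, and a held-out set of documents on which to check the predictions.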
Week 2: Automatic Semantic Information Extraction
Day 1: Crash course in distributional semantics, matrices, and linear algebra
Day 2: Calculating and measuring co-occurrence
Day 3: Comparing words to each other
Day 4: Comparing corpora to each other
Day 5: Exploring the possibilities!
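The core idea of Week 2, representing a corpus as a co-occurrence matrix and comparing words by their rows, can be sketched in a few lines of numpy. The toy corpus, the window size of 2, and the helper names cooccurrence_matrix and cosine_similarity are hypothetical choices made for this illustration, not the course's own code.

```python
import numpy as np

# Toy corpus, invented purely for illustration.
corpus = [
    "the king rules the land",
    "the queen rules the land",
    "the farmer plows the field",
]

def cooccurrence_matrix(sentences, window=2):
    """Count how often each word appears within `window` tokens of another."""
    tokens = [s.split() for s in sentences]
    vocab = sorted({w for sent in tokens for w in sent})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)), dtype=np.int64)
    for sent in tokens:
        for pos, word in enumerate(sent):
            lo = max(0, pos - window)
            hi = min(len(sent), pos + window + 1)
            for ctx in range(lo, hi):
                if ctx != pos:  # a word is not its own context
                    counts[index[word], index[sent[ctx]]] += 1
    return vocab, counts

def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

vocab, counts = cooccurrence_matrix(corpus)
index = {w: i for i, w in enumerate(vocab)}

# Each row of `counts` is the distributional profile of one word.
sim_king_queen = cosine_similarity(counts[index["king"]], counts[index["queen"]])
sim_king_field = cosine_similarity(counts[index["king"]], counts[index["field"]])
print(sim_king_queen, sim_king_field)
```

In this toy corpus "king" and "queen" occur in identical contexts, so their rows are more similar to each other than the rows for "king" and "field". Comparing corpora follows the same logic, with one matrix built per corpus.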