Workshop on Insights from Negative Results in NLP
Thursday, May 26, 2022
8:45–9:00 Opening remarks
9:00–10:00 Invited talk: Barbara Plank (LMU Munich)
Off the Beaten Track: On Serendipity and Turning “Failures” into Signal [Video]
In this talk, I’ll first reflect upon the research process in current NLP and discuss how the principle of serendipity can play an important role in the design of research projects. In the second part, I will provide a series of examples to illustrate how something perceived as “noise” can yield research opportunities, including leveraging fortuitous data such as metadata for low-resource NLP and human disagreement in labelling; I will also present some puzzling results on an understudied detail of BERT.
10:00–10:30 Thematic Session 1: Improving Evaluation Practices
10:30–11:30 Coffee Break
11:30–12:00 Thematic Session 2: Transformers
- How Much Do Modifications to Transformer Language Models Affect Their Ability to Learn Linguistic Knowledge?
Simeng Sun, Brian Dillon and Mohit Iyyer [PDF], [Video]
- Pathologies of Pre-trained Language Models in Few-shot Fine-tuning
Hanjie Chen, Guoqing Zheng, Ahmed Hassan Awadallah and Yangfeng Ji [PDF], [Video]
- On Isotropy Calibration of Transformer Models
Yue Ding, Karolis Martinkus, Damian Pascual, Simon Clematide and Roger Wattenhofer [PDF], [Video]
12:00–12:30 Thematic Session 3: Towards Better Data
- Do Data-based Curricula Work?
Maxim K. Surkov, Vladislav D. Mosin and Ivan P. Yamshchikov [PDF]
- Clustering Examples in Multi-Dataset Benchmarks with Item Response Theory
Pedro Rodriguez, Phu Mon Htut, John P. Lalor and João Sedoc [PDF]
- On the Impact of Data Augmentation on Downstream Performance in Natural Language Processing
Itsuki Okimura, Machel Reid, Makoto Kawano and Yutaka Matsuo [PDF], [Video]
12:30–14:00 Lunch
14:00–15:00 Panel Discussion: How Bad are Annotation Disagreements, Really? [Video]
Panelists: Margot Mieskes (University of Applied Sciences, Darmstadt), Barbara Plank (LMU Munich), Massimo Poesio (Queen Mary University of London), Bonnie Webber (University of Edinburgh)
Moderator: Anna Rogers (University of Copenhagen)
15:00–15:30 Coffee Break
15:30–16:00 Thematic Session 4: Linguistically Informed Analysis
- Do Dependency Relations Help in the Task of Stance Detection?
Alessandra Teresa Cignarella, Cristina Bosco and Paolo Rosso [PDF], [Video]
- BPE beyond Word Boundary: How NOT to use Multi Word Expressions in Neural Machine Translation
Dipesh Kumar and Avijit Thawani [PDF], [Video]
- Challenges in including extra-linguistic context in pre-trained language models
Ionut Teodor Sorodoc, Laura Aina and Gemma Boleda [PDF], [Video]
16:00–17:00 Invited talk: Tal Linzen (NYU)
Sensitivity to Initial Weights in Out-of-distribution Generalization [Video]
The results of experiments that involve training neural networks can be sensitive to the networks’ initial weights. In this talk I will review work from my group and others showing that such sensitivity can be quite dramatic when the network is evaluated on its out-of-distribution generalization accuracy, as is typically the case with the challenge datasets popular in the “interpretability” community. In one experiment, when we fine-tuned BERT 100 times on the same dataset, in-distribution test set accuracy was reasonably stable, but out-of-distribution behavior differed qualitatively across runs. The recent MultiBERTs project, in which BERT was retrained 25 times, demonstrates that this variability persists across pretrained models as well. This variability makes it harder to interpret the results of a single fine-tuning run on a challenge dataset, and highlights a potentially underappreciated consequence of neural networks’ weak inductive biases.
17:00–18:00 Poster Session
- Evaluating the Practical Utility of Confidence-score based Techniques for Unsupervised Open-world Classification
Sopan Khosla, Rashmi Gangadharaiah [PDF], [Video]
- Extending the Scope of Out-of-Domain: Examining QA models in multiple subdomains
Chenyang Lyu, Jennifer Foster, Yvette Graham [PDF], [Video]
- What Do You Get When You Cross Beam Search with Nucleus Sampling?
Uri Shaham, Omer Levy [PDF], [Video]
- Cross-lingual Inflection as a Data Augmentation Method for Parsing
Alberto Muñoz-Ortiz, Carlos Gómez-Rodríguez, David Vilares [PDF], [Video]
- Is BERT Robust to Label Noise? A Study on Learning with Noisy Labels in Text Classification
Dawei Zhu, Michael Hedderich, Fangzhou Zhai, David Adelani, Dietrich Klakow [PDF], [Video]
- Ancestor-to-Creole Transfer is Not a Walk in the Park
Heather Lent, Emanuele Bugliarello, Anders Søgaard [PDF]
- What GPT Knows About Who is Who
Xiaohan Yang, Eduardo Peynetti, Vasco Meerman, Chris Tanner [PDF], [Video]
- Evaluating Biomedical Word Embeddings for Vocabulary Alignment at Scale in the UMLS Metathesaurus Using Siamese Networks
Goonmeet Bajaj, Vinh Nguyen, Thilini Wijesiriwardene, Hong Yung Yip, Vishesh Javangula, Amit Sheth, Srinivasan Parthasarathy, Olivier Bodenreider [PDF]
- Can Question Rewriting Help Conversational Question Answering?
Etsuko Ishii, Yan Xu, Samuel Cahyawijaya, Bryan Wilie [PDF], [Video]
- The Document Vectors Using Cosine Similarity Revisited
Zhang Bingyu, Nikolay Arefyev [PDF], [Video]
- Label Errors in BANKING77
Cecilia Ying, Stephen Thomas [PDF], [Video]
- An Empirical study to understand the Compositional Prowess of Neural Dialog Models
Vinayshekhar Kumar, Vaibhav Kumar, Mukul Bhutani, Alexander Rudnicky [PDF], [Video]
- Combining Extraction and Generation for Constructing Belief-Consequence Causal Links
Maria Alexeeva, Allegra A. Beal Cohen, Mihai Surdeanu [PDF], [Video]
- Pre-trained language models evaluating themselves - A comparative study
Philipp Koch, Matthias Aßenmacher, Christian Heumann [PDF], [Video]
18:00–18:10 Closing Remarks