WHITL?: Computational Linguistics


August 28, 2024

AI-Driven Hype in Classrooms: Navigating Ethical Issues - Presented!

During the summer of 2023, Lex Konnelly and Nathan Sanders presented on AI "hype" in classrooms to help instructors address issues bubbling to the surface as ChatGPT's range broadens with each question it is asked by some unsuspecting student.

Addressing ethical and pedagogical considerations for AI-driven text generation in classrooms, particularly of linguistics, they presented a foray into the ever-changing landscape evolving at a rate "faster than scholars can publish work on them" (Sanders).

Though some faculty members with whom the researchers partnered focused on ways they could ChatGPT-proof their assessments, others were interested in integrating such tools into their classwork.

Importantly, the researchers' approach is not punitive but constructive - an approach to merging AI tools with educational models which will benefit not only morale in the classroom, but student media and technology literacy in a world rapidly going wireless.

Perhaps today's students can benefit from learning how to hack tools such as chatbots to maximize their potential for learning.

Perhaps future integration of artificial intelligence brings with it the potential of a rapid decline or even total erasure of the capacity to learn hard skills.

Regardless, the researchers' position is that the fields of linguistics and artificial intelligence are necessarily intertwined.

Because tools like ChatGPT rely on large language models, students of both linguistics and computer science, or even artificial intelligence engineering, have much to gain by probing the threads linking their interests to each other, potentially by exploring something like the groundbreaking focus on Computational Linguistics offered by UofT.

Presented by Konnelly at the 2024 Annual Meeting of the Linguistic Society of America in January, this work will be published soon as a proceedings paper.

As Sanders and Konnelly will be the first to tell you, by the time this post goes live, this information may be obsolete.

We at WHITL don't see this as a reason not to comment, but as an opportunity to mark our ideas on an AI-timeline quickly extending into the future, and an exciting chance to engage with all students across UofT.

AI in the classroom poses all kinds of ethical questions for students and professors, and raises new questions every time it is used. This work gives us some interesting and thought-provoking ways of dealing with technology which passes the Turing Test daily, and (usually) with flying colours at that.

August 7, 2024

Linguistics Spotlight: Myrto Grigoroglou

WHITL Blog just got caught up with Myrto Grigoroglou’s many recent publications, coming out of 3 streams of research foci in just as many languages!

Speaking to Grigoroglou about her research over the past few months brought many incredible projects to light. A more comprehensive list can be found linked below.

Read along to discover the impressive work being done by the Assistant Professor and her team.

Communication/development of pragmatic abilities: How children and adults adjust language to the informational needs of listeners

In the field of linguistics and cognitive science, researchers are concerned with the different ways in which people describe events to a third party who cannot see them occurring.

Grigoroglou and her team probe this research, manipulating their experimental setup to get as much information out of the speaker as possible. Compiling this data, the team created a database comparing not only cross-linguistic data, but multi-modal signaling information.

For example, in Turkish, considered to be a more gestured language than, say, English, speakers were found to gesture much more when presented with a familiar listener.

Furthermore, the developmental component of this study, taking place in a naturalistic setting, is a novel one. Previously, studies mainly looked at the role of the addressee to examine how their involvement changed the speakers’ descriptions of events.

This was Grigoroglou's launching pad as well: they started with the assumption that visual perception of a listener would affect a speaker's informativeness. Though this was enough to increase responsiveness in adults, they didn’t see meaningful changes with children. Having a naïve listener, however, did affect communication.

In a step-wise approach, researchers manipulated the role of the listener, giving them more responsibility, a clear goal, and eventually interaction with the speaker. The most helpful listeners are, it turns out, naïve, familiar to the speaker, visible, and engaged.

Though interaction increases how informative the speaker will be, this goes both ways. Researchers found that even an assumption that the listener was distracted was enough to decrease the amount of information speakers were willing to give.

Being told by researchers that your listener is not paying attention, even if given visual or oral evidence to the contrary, is enough to shut down communication for most people.

Communication factors in use of spatial languages: Language to describe space

This branch of Grigoroglou's work studies language used to describe space, words such as in/on. The team realized that when making static descriptions, people don't use out/off as much as they do in motion descriptions. For example, instead of "the cat was off the rug," people might say "the cat was next to the rug" - a description more closely aligned with motion.

Existing semantic theories say that these PPs (prepositional phrases) are ambiguous, and that motion PPs lie between static and motion prepositions. Analysis in the field currently says that out/off are infelicitous, needing context in static descriptions.

Grigoroglou and her team offer an alternative account: these are negative prepositions and have a negative meaning regardless of whether they are used in motion or static descriptions.

For something to come out of or off of means that it was once in/on, and necessitates movement. Thus, in/out are complementary antonyms existing in a relationship of entailment.

However, one of the most exciting aspects of this interview was discovering that, due to the similar spatial patterns of Turkish, French, and Greek, Grigoroglou's research results have universal validity.

Acquisition of logical language/logical cognition: How children acquire conditionals

While working as a postdoctoral researcher studying the expression of hypothetical language in children aged 3-6, Grigoroglou was meeting in person with the participants and their parents to gauge how information is presented to familiar vs. non-familiar listeners. However, when the pandemic hit, this research was moved online.

Trying to connect through Zoom, a new issue presented itself: a misalignment between the perspectives of speaker and listener, posing a newer, more intense need for information. Online data collection in this field reflects entirely different results than would appear in person.

Now, Grigoroglou is collecting child data from Turkish studies. This labor-intensive project, involving hundreds of hours of coding, is being completed by graduate students in Türkiye.

This procedure for measuring gesturing in Turkish involves multiple steps. First, researchers segment speech into clauses. Next, they align the gestures with the speech segment using coding software called ELAN, which enables them to transcribe speech for clauses – for example, using 1 to code the presence of an instrument, and 0 to code its absence. Next, they look at and categorize gesture. This is partially done using a code book to standardize coding of otherwise hard-to-quantify movements.
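As a rough illustration of the clause-level coding described above, here is a minimal Python sketch. It assumes the clause and gesture annotations have already been exported from ELAN as start and end times in milliseconds. The data, field names, category label, and the overlap rule are all hypothetical and are not the team's actual code book or pipeline.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    start_ms: int
    end_ms: int
    text: str
    mentions_instrument: bool  # hypothetical hand-coded judgement


@dataclass
class Gesture:
    start_ms: int
    end_ms: int
    category: str  # hypothetical code-book label


def overlaps(clause, gesture):
    """Return True if the gesture interval overlaps the clause interval."""
    return gesture.start_ms < clause.end_ms and gesture.end_ms > clause.start_ms


def code_clauses(clauses, gestures):
    """Return one row per clause with 1/0 codes for speech and gesture."""
    rows = []
    for clause in clauses:
        aligned = [g.category for g in gestures if overlaps(clause, g)]
        rows.append({
            "text": clause.text,
            "instrument_in_speech": 1 if clause.mentions_instrument else 0,
            "gesture_present": 1 if aligned else 0,
            "gesture_categories": aligned,
        })
    return rows


# Invented example data, for illustration only.
clauses = [
    Clause(0, 1200, "she cut the bread", False),
    Clause(1200, 2600, "with a knife", True),
]
gestures = [Gesture(1300, 2100, "iconic")]

coded = code_clauses(clauses, gestures)
for row in coded:
    print(row)
```

Keeping speech and gesture as separate 1/0 columns per clause makes it straightforward to tabulate how often an instrument is expressed in speech, in gesture, or in both.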

A planned second study in English, taking place in person, will manipulate the listener's knowledge to see how that affects gesture.

Looking Forward

Myrto Grigoroglou and her team have much to look forward to as they prepare for the 2024-2025 academic year, and all that it will entail in the many fields of research in which they are conducting continuously impressive work.

To read more about her work, see the three pieces discussed in this article linked below, or her profile on the University of Toronto's Discover Research page.

Multimodal Description of Instrument Events in Turkish and English

Metaphor Comprehension in Preschoolers: A Pragmatic Skill

Logical language and the development of reasoning by the disjunctive syllogism

June 24, 2024

UTM Faculty Dr. Samantha Jackson and Derek Denis Publish their Research into Accent-Based Biases in the GTA

Postdoctoral Fellow of Language Studies at UTM, Dr. Samantha Jackson, and Associate Professor of Linguistics, Derek Denis, have recently published their work titled "What I say, or how I say it? Ethnic accents and hiring evaluations in the Greater Toronto Area".

Jackson’s work, focusing on sociolinguistics, investigates how immigrants to Canada speaking with an identifiably non-Canadian accent are perceived by prospective employers. She investigates strategies to reduce such workplace discrimination and target other societal problems.

Denis' interests follow variationist sociolinguistics (language change), and how the human language faculty allows for variation both within the individual’s grammar and the larger context of the society in which it exists.

During their study, they recorded 12 women giving 6 scripted answers to interview questions (3 good, 3 bad) and asked Human Resources students at universities and colleges in the GTA to rate the content of the responses, as well as the employability of each voice. The students were also asked which jobs, if any, they would recommend each speaker be interviewed for.

Jackson and Denis analyzed the results using conditional inference tree modeling and random forest analysis.

They found that the accent participants heard affected their ratings of the scripted responses: Canadian accents were rated above non-Canadian ones, with Chinese, Nigerian, and German accents the most disadvantaged. Speakers with these accents were least likely to be recommended for customer-facing and, importantly, higher-ranking jobs.

The work was presented at online conferences in 2021 and 2022, in Germany and in Vancouver, and a full thematic analysis of comments from the full study’s sample will be presented in June at the CLA (Canadian Linguistic Association) Conference, held in Ottawa. Watch out for WHITL’s coverage of that event, coming soon.

As for this publication, major recommendations from the report include adding language to the Ontario Human Rights Commission’s grounds for discrimination; the others can be found in the full published work. It will also be available in the June issue of Language.

Though linguistic protection is an idea covered in sections 15 (Equality Rights), and 23 (Minority language and educational rights) of the Canadian Charter, Jackson and Denis’ work puts a spotlight on the need for specific and targeted legislation to protect Canadians with non-Canadian accents in the workplace.

Real changes in public policy and legislation that emerge from projects like these are some of the most exciting developments we get to watch as they evolve. We look forward to seeing this work at the CLA Conference in June.

An important p.s.: Dr. Jackson will join the UofT Department of Linguistics in January 2025. We can't wait!

November 24, 2023

New paper by Prof. Barend Beekhuizen and Colleagues in Corpus Linguistics and Linguistic Theory

A new paper on the cross-linguistic variation of word meanings has been published in the journal Corpus Linguistics and Linguistic Theory by Prof. Barend Beekhuizen, Maya Blumenthal (MA Alum), Lee Jiang (PhD Student), and colleagues! The paper is entitled 'Truth be told: a corpus-based study of the cross-linguistic colexification of representational and (inter)subjective meanings'.

We've included the abstract below:

The study of crosslinguistic variation in word meaning often focuses on representational and concrete meanings. We argue other kinds of word meanings (e.g., abstract and (inter)subjective meanings) can be fruitfully studied in translation corpora, and present a quantitative procedure for doing so. We focus on the cross-linguistic patterns for lemmas pertaining to truth and reality (English true and real), as these abstract meanings have been found to frequently colexify with particular (inter)subjective meanings. Applying our method to a corpus of translated subtitles of TED talks, we show that (1) the abstract-representational meanings are colexified in patterned ways, that, however, are more complex than previously observed (some languages not splitting a ‘true’-like from ‘real’-like terms; many languages displaying further splits of representational meanings); (2) some non-representational meanings strongly colexify with representational meanings of ‘truth’ and ‘reality’, while others also often colexify with other fields.

Beekhuizen, B., Blumenthal, M., Jiang, L., Pyrtchenkov, A. & Savevska, J. (2023). Truth be told: a corpus-based study of the cross-linguistic colexification of representational and (inter)subjective meanings. Corpus Linguistics and Linguistic Theory. https://doi.org/10.1515/cllt-2021-0058

https://www.degruyter.com/document/doi/10.1515/cllt-2021-0058/html

March 2, 2023

TULCON16!!!

In case you have not heard, SLUGS is hosting TULCON16 this upcoming weekend!! With an incredible lineup of keynote speakers (including Noam Chomsky himself), this will be a jam-packed weekend of wonderful linguistic work!

Presentations cover a range of topics from speech disorders to the use of A.I. to theoretical syntax and everything in between! Below you can see the list of all UofT speakers presenting at the conference. Visit the TULCON16 website to get the extensive schedule with abstracts.

UofT Presentations:

- Dr. Regina Jokel (Faculty) "Language as a diagnostic tool"

- Ewan Dunbar (Department of French) "Are the Robots as smart as babies now?"

- Siyi Fan (Undergrad) & Shiyang Sun (Undergrad) "A variationist study of first-person-singular subject ellipsis in epistemic verb phrases of Heritage Cantonese"

- Tony (Juntao) Hu (Undergrad) "Secondary thematic role encoding in require vs. allow verbs"

- Patrick Kinshular (Undergrad) "Semantic constraints in Kirundi phonology"

- Mechelle Wu (Undergrad) "The citizens of everywhere and nowhere: A pilot study examining the linguistic behaviours of Third-Culture Kids (TCKs)"

- Naim Lim (Undergrad) "Acoustic study of word-initial liquids in Korean loanwords for English produced by Korean speakers"

- Hafza Nuh (Undergrad) "An analysis of English stop consonant perception and production in L1 Somali speakers and comparison with L1 English speakers: A study of /p/ and /b/"

- Laura Escobar (Undergrad) "The intonation of statements in the casual spontaneous speech of Tokyo Japanese"

If you are interested in attending, please register in advance. For those who would like to get involved with TULCON, SLUGS is accepting volunteers to help with organizational matters throughout the day.

We are SO excited for TULCON and hope you all can make it!

August 3, 2022

Newest Faculty Member: Shohini Bhattasali!

In the Fall, we will be welcoming a new faculty member to the Department of Language Studies at UTSC! Shohini Bhattasali will be joining us as a computational linguist! We had the great pleasure of sitting down with her for an interview. Keep reading to learn more about her!

What attracted you to the UofT linguistics department?

UofT has an incredible intellectual community and this is reflected through the research and the curriculum. I would love to help strengthen the computational linguistics program and I’m very excited to collaborate within Linguistics and with other departments (e.g. cognitive science, and information science). I also like how each campus has its unique identity but still makes up one cohesive whole.

Do you have any expectations regarding the department?

Everyone seems really welcoming and friendly. I am excited to see what everyone is working on and learn more about collaborative, interdisciplinary opportunities. The students at UofT seem very motivated and I’m excited to work with them and guide them along the way. I’m especially looking forward to working with students who want to incorporate computational modelling into their projects or are interested in the cognitive science of language and need guidance.

You have taught/assisted many courses ranging from computational linguistics to Hindi to writing. Which has been your favourite?

Definitely the linguistics courses! They line up with my interests much more. While I was a teaching assistant for linguistics courses, I got to design tutorials. This was a great teaching experience as I got to see how the students were able to apply the theories they were learning. The writing courses were also great because I was able to design a course from scratch for first-year students. It was very fulfilling to see the students' trajectories as they improved their academic writing skills. These courses were the most rewarding in terms of seeing students improve and gain confidence in their writing!

Do you notice any trends amongst your top students?

My top students are typically the ones who are engaged and ask questions in class. They are the ones who are not afraid to dive deeper into ongoing topics during class discussions. I know some students are shy and might be intimidated by speaking up in class, but they can still participate in tutorials and drop by during office hours. While it is hard to generalize, student engagement can often be an indicator of how they are doing. If they can relate their personal interests to the material, they will be more motivated and interested in learning. It is great to see students interested in what I am lecturing about and how it changes the way they see linguistics. Students coming from high school often don’t know much about linguistics so it's particularly enjoyable to observe the ah-ha moment where their interest is sparked and they figure out how linguistics isn’t centred around prescriptivism.

What has been your most memorable research project?

My dissertation was mainly based on a large-scale fMRI study. I had started grad school with an interest in computational linguistics and discovered neurolinguistics along the way. My advisor was starting a new cognitive neuroscience project and gave me an opportunity to be involved in this cross-linguistic fMRI study. He believes in experiential learning so it was a steep learning curve but I was involved in the experimental design, data collection, data analysis and then training other grad students and undergrad RAs. It was my first time working with neuroimaging data, but this experience really helped guide my research program. It took over a year to collect the brain data but the good thing with using continuous, naturalistic fMRI datasets is that it's not tailored to one research project and we can use it for many different research topics. I’m a big fan of naturalistic fMRI/EEG/MEG datasets for reusability and replicability purposes!

What are some of the issues you face in the field of computational linguistics?

In the last 10 years, the field has exploded and grown exponentially. It can be challenging to even define what “computational linguistics” is as the field is changing so quickly. Additionally, the line between natural language processing and computational linguistics is getting blurry. I personally see computational linguistics as a scientific study of language using computational tools, whereas natural language processing is more about engineering and building tools that are useful for language applications, e.g., Amazon Alexa (speech recognition) and Google Translate (machine translation).

Artificial intelligence and machine learning approaches have also become tremendously popular, but we need to be careful in applying these approaches blindly to neuroimaging data because there is still so much about the brain we don’t know. While we can use these new fancy tools to get good results on certain tasks, we cannot always rely on them to understand why we get the results we get. For example, a computational model like GPT-3 is very good at predicting the next word in a sentence, but we don’t fully know how the prediction is being generated. If we don’t fully understand the representations being learnt by these models, how can we use them to understand the representations that the brain is using? As scientists, we always critically think about the tools we use and this is just another tool we have at our disposal. Maybe in a few years, we will have a more in-depth understanding of these models, and we can leverage that to understand cognitive mechanisms behind language comprehension and production. I do use computational models in my work to operationalize and embody cognitive hypotheses but I always prefer using simple and interpretable models over these fancier, black-box models.

Do you have any hobbies / secret passions?

I love reading! I also like to bake since it’s a great way to destress while still feeling productive. Dance and music have played a large role in my life. Growing up in India, I trained as a classical Indian dancer (Odissi) for 15 years and then, I was on my college dance team too. I also love attending classical music concerts and dance performances. I’m looking forward to attending more of those in Toronto!

What are you most looking forward to about living in Toronto?

Toronto is a big diverse city which is exciting! I grew up in a large city too, but I have mostly lived in smaller, college towns during undergrad and grad school so I’m very happy to be moving to an urban area. I’ve also heard a lot of good things about Toronto’s multicultural food scene which makes sense given the large immigrant population. I also love visiting museums, discovering local bakeries, and finding new go-to coffee spots. It will be interesting to see what I will find in Toronto! I’m also looking forward to exploring more of Ontario and Canada in general since I’ve only visited Quebec City.

I will be going back and forth between the Scarborough and St. George campuses, and luckily for me I already have a few connections on all campuses which I’m excited about. Nathan Sanders (Faculty) was actually my undergrad thesis advisor so it’s such a small world moment to now be his colleague! One of my best friends from grad school is a faculty member in the iSchool (Shion Guha) and another friend is joining UTM Language Studies (Lingzi Zhuang, new faculty member). Overall, I am excited to join UofT and am looking forward to creating a lab at the intersection of computational linguistics and cognitive neuroscience, meeting the students and making more connections here!

We would like to thank Shohini for taking the time out of her busy schedule to be interviewed! We look forward to seeing her on campus in the Fall! Feel free to connect with her on Twitter if you have any questions or if you want to introduce yourself!

June 28, 2022

Reunited at DiGS 23!!


From right to left: Erin Hall, Ana Pérez-Leroux, Ailis Cournane

The Diachronic Generative Syntax (DiGS) conference, held at NYU, took place June 8-10. DiGS 23 had a special workshop on "Child learners in syntactic change: Theory and methods"! This workshop hosted presentations and discussions on topics combining theoretical syntax, child language acquisition, variationist sociolinguistics and computational modelling of language change.

Erin Hall (PhD Alum 2020, now assistant professor at California State University, San Bernardino) and Ana Pérez-Leroux (Faculty) presented their work entitled "Children take steps toward cyclic and non-cyclic diachronic changes". The pair examine the role of child language acquisition in systematic, cyclic processes of grammaticalization, and non-cyclic changes in progress. They propose that there are 2 different processes involved in child language acquisition.

Not only did this workshop allow for Hall and Pérez-Leroux to share their amazing work, but they also got to reunite with Ailis Cournane! Cournane is a PhD alum, class of 2015, and is now an assistant professor at NYU!

Overall, UofT faculty and alumni had a successful time at DiGS 23!

May 5, 2021

Research Groups: Friday, May 7

10:00 AM - 11:30 AM: Cognitive Science of Language Group
Yang Xu (faculty, Department of Computer Science): "Chaining and the growth of word meaning."

Natural language relies on a finite lexicon to express a potentially infinite range of meanings. This tension creates a funnel effect where meanings are compressed through a limited set of words. Prior work suggests that word meanings are structured for efficient compression. I describe recent developments that extend this work to investigate the cognitive mechanisms in the dynamic growth of word meaning through time. I first present work that synthesizes cognitive linguistic theories of chaining with classic models of categorization to predict the historical extension of numeral classifiers for emerging referents. I then present evidence that similar models of chaining predict children’s spatial word generalization. Our findings suggest that an exemplar-based model of chaining may underlie the general mechanisms in word meaning growth. I discuss applications of this work to natural language processing and implications for research in lexicon evolution.
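For readers unfamiliar with exemplar-based categorization, the general idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the model from the talk: the feature vectors, word labels, sensitivity parameter, and the exponential similarity function are all hypothetical choices made for the example.

```python
import math


def similarity(x, y, sensitivity=1.0):
    """Exponentially decaying similarity based on Euclidean distance."""
    return math.exp(-sensitivity * math.dist(x, y))


def exemplar_scores(new_item, categories):
    """Sum the similarity between the new item and every stored exemplar of each word."""
    return {
        word: sum(similarity(new_item, exemplar) for exemplar in exemplars)
        for word, exemplars in categories.items()
    }


def predict_word(new_item, categories):
    """Pick the word whose existing exemplars are, in total, most similar to the new item."""
    scores = exemplar_scores(new_item, categories)
    return max(scores, key=scores.get)


# Hypothetical 2-D feature vectors for referents each word already covers.
categories = {
    "word_a": [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3)],
    "word_b": [(1.0, 1.0), (0.9, 1.2)],
}

new_referent = (0.3, 0.2)
scores = exemplar_scores(new_referent, categories)
prediction = predict_word(new_referent, categories)
print(prediction, scores)
```

In this toy setup a word is extended to a new referent when that referent sits close to things the word already names, which is the intuition behind chaining.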

March 25, 2021

Research Groups: Friday, March 26

10:00 AM - 11:30 AM: Language Variation and Change Group
Group discussion led by Lisa Sullivan (Ph.D.) of a paper: D'Onofrio, Annette (forthcoming). Age-based perceptions of a reversing regional sound change. Journal of Phonetics, 86.

11:30 AM - 1:00 PM: Phonetics/Phonology Research Group
Marjorie Leduc (MA): "Vowel harmony in Karajá."

This talk will present a preliminary OT analysis of Karajá’s ATR harmony accounting for the regressive properties of the pattern as well as the icy target behaviour of the high vowels /ɪ/ and /ʊ/, which harmonize to [i] and [u], but then block harmony from proceeding past them. This contrasts with underlying /i/ and /u/, which are triggers of the harmony process, creating a distinction between underlying and derived vowel behaviour which can be difficult to deal with in surface-oriented frameworks like OT.

1:00 PM - 2:00 PM: Semantics Research Group
Laurestine Bradford (MA): "Using communicative need to predict colexification in CLICS-3."

There is a cross-linguistic tendency for the more complex systems of vocabulary to be the ones with the most communicative power (Kemp, Xu, and Regier, 2017). Recently, a new cross-linguistic and cross-domain tool, that can help test such generalizations about vocabulary, was published: the third edition of the Database of Cross-Linguistic Colexifications (CLICS3; Rzymski, Tresoldi, et al., 2019). Using this, we can ask to what extent communicative efficiency predicts amounts of colexification. That is, do languages colexify more words in domains that are less often needed? In my project, I attempt to compare the frequency of selected domains, in different languages' corpora, with the amount of domain-internal colexification attested in CLICS3. I will explain some of the theoretical and practical issues that have come up so far, as I attempt to quantify amounts of colexification and make use of potentially noisy data.
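To make "amount of domain-internal colexification" concrete, here is a small Python sketch. It assumes a toy word list mapping each language's word forms to the concepts they express, plus a hypothetical assignment of concepts to domains. None of this reflects the CLICS3 data format or the measure actually used in the project; all forms, languages, and domain labels are invented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy lexicons: language -> word form -> concepts it expresses.
lexicons = {
    "lang_a": {
        "form1": {"HAND", "ARM"},
        "form2": {"FOOT", "LEG"},
        "form3": {"TREE", "WOOD"},
    },
    "lang_b": {
        "form4": {"HAND"},
        "form5": {"ARM"},
        "form6": {"TREE", "WOOD"},
    },
}

# Hypothetical domain assignment for each concept.
domain_of = {
    "HAND": "body", "ARM": "body", "FOOT": "body", "LEG": "body",
    "TREE": "nature", "WOOD": "nature",
}


def domain_internal_colexifications(lexicons, domain_of):
    """Count, per domain, concept pairs sharing a word form (one count per language)."""
    counts = Counter()
    for forms in lexicons.values():
        for concepts in forms.values():
            for a, b in combinations(sorted(concepts), 2):
                if domain_of[a] == domain_of[b]:
                    counts[domain_of[a]] += 1
    return counts


counts = domain_internal_colexifications(lexicons, domain_of)
print(counts)
```

Counts like these could then be set against how often each domain comes up in corpora, though any real analysis would also need to handle the noisy data the abstract mentions.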

March 18, 2021

Research Groups: Friday, March 19

10:00 AM - 11:30 AM: Cognitive Science of Language Group
1. Guest speaker: Juliette Millet (Université Paris Diderot-Paris 7): "Inductive biases, pretraining and fine-tuning jointly account for brain responses to speech."

Our ability to comprehend speech remains, to date, unrivaled by deep learning models. This feat could result from the brain's ability to fine-tune generic sound representations for speech-specific processes. To test this hypothesis, we compare i) five types of deep neural networks to ii) human brain responses elicited by spoken sentences and recorded in 102 Dutch subjects using functional Magnetic Resonance Imaging (fMRI). Each network was either trained on an acoustic scene classification task, trained on a speech-to-text task (based on Bengali, English, or Dutch), or not trained. The similarity between each model and the brain is assessed by correlating their respective activations after an optimal linear projection. The differences in brain-similarity across networks revealed three main results. First, speech representations in the brain can be accounted for by random deep networks. Second, learning to classify acoustic scenes leads deep nets to increase their brain similarity. Third, learning to process phonetically-related speech inputs (i.e., Dutch vs English) leads deep nets to reach higher levels of brain-similarity than learning to process phonetically-distant speech inputs (i.e. Dutch vs Bengali). Together, these results suggest that the human brain fine-tunes its heavily-trained auditory hierarchy to learn to process speech.

2. Julia Watson (MSc., Department of Computer Science), Jai Aggarwal (MSc., Department of Computer Science), and Anna Kapron-King (MSc., Department of Computer Science): "Come together: Integrating perspective taking and perspectival expressions."

Conversational interaction involves integrating the perspectives of multiple interlocutors with varying knowledge and beliefs. An issue that has received little attention in cognitive modeling of pragmatics is how speakers deal with the choice of words like come that are inherently perspectival. How do such lexical perspectival items fit into a speaker's overall integration of conversational perspective? We present new experimental results on production of perspectival words, in which speakers have varying degrees of certainty about their addressee's perspective. We show that the Multiple Perspectives Model closely fits the empirical data, lending support to the hypothesis that use of perspectival words can be naturally accommodated as a type of conversational perspective taking.

1:00 PM - 2:30 PM: Fieldwork Group
Ana Tona Messina (Ph.D.): "A POS tagger for Nahuatl."

Nahuatl is the most widely spoken indigenous language in Mexico; it enjoys the attention of many academics and scholars in the country and abroad; and it has an active community of native researchers and young advocators. Yet, there aren't many language resources for Nahuatl speakers. In this talk, I will look at the practical issues that stand in the way of progress in this particular case, and I present the first part-of-speech tagger for Nahuatl (still under construction).

March 3, 2021

TULCON 14

The 14th Toronto Undergraduate Linguistics Conference (TULCON 14) is taking place online on March 6 and 7, hosted by our own SLUGS. Note that if you would like to attend (and are not already involved as a presenter and/or organizer), you will need to register here.

Undergraduates of ours who are presenting talks are:

Eloisa Cervantes (BA): "Variation of /ʎ/ in Toronto heritage speakers of Calabrian Italian: Support for the effect of language use."
Diana Gil Hamel (BA): "An-game nó an-ghame: Irish consonant mutations in English loanwords."
Anastasia Koutlemanis (BA): "Generational usage of 'Greeklish'."
Nathan Leung (BA): "Tonal assignment of loanwords in Medan Hokkien."
Anna Pyrtchenkov (BA), Maya Blumenthal (BA), and Lee Jiang (BA): "Truth be told: A corpus-based study of adjectives of truth and reality across languages."
Haili Su (BA): "Disyllabic contraction in Taiwan Mandarin: Modelling the complexity of variation with Optimality Theory."

December 2, 2020

Research Groups: Friday, December 4

Note that this week's meeting of the Semantics Research Group is cancelled.

10:00 AM - 11:30 AM: Language Variation and Change Research Group
Guest speaker: Suzanne Robillard (University of Ottawa): "Implicit norms and prestige forms: Linguistic cohesion of G2 French in Victoria."

11:30 AM - 1:00 PM: Phonetics/Phonology Research Group
Ewan Dunbar (faculty, Department of French): "Modelling early language acquisition from raw speech data."

The problem of language acquisition is key to the way questions are posed and answered in linguistics and in the cognitive sciences of language more broadly. And we now know quite a lot about the earliest stages of language acquisition, which, logically, show infants tuning into the signal, learning the sound inventory of the language and developing a small early lexicon between six and twelve months. What can recent advances in machine learning bring to the table? I will discuss how we have been able to take advantage of an interest from industry in applied problems in speech recognition, and channel the forces of modern machine learning towards cognitively interesting problems in early language acquisition. I will cover the small number of initial results that seem to come out of this line of research, which suggest that abstract phonet/emic categories are both critically important and somewhat overrated, depending on what facts need to be explained.

October 20, 2020

Research Groups: Friday, October 23

10:00 AM - 11:30 AM: Psycholinguistics Group
Ewan Dunbar (faculty): "The Zero Resource Speech Challenge 2021."

11:30 AM - 1:00 PM: Phonetics/Phonology Research Group
Group discussion of a recent paper: Durvasula, Karthik, and Adam Liter (2020). There is a simplicity bias when generalising from ambiguous data. Phonology, 37(2), 177-213.

1:00 PM - 2:00 PM: Fieldwork Group
Fiona Wilson (Ph.D.): "Quantitative analysis of negation in two Cree corpora."

October 8, 2020

Research Groups: Friday, October 9

10:00 AM - 11:30 AM: Psycholinguistics Group
Guest speaker: Jiangtian Li (University of Western Ontario): "On polysemy: a philosophical, psycholinguistic, computational approach."

Most words in natural languages are polysemous, that is, they have related but different meanings in different contexts. These polysemous meanings (senses) are marked by their structuredness, flexibility, productivity, and regularity. Previous theories have focused on some of these features but not all of them together. Thus, I propose a new theory of polysemy, which has two components. First, word meaning is actively modulated by broad contexts in a continuous fashion. Second, clustering arises from contextual modulations of a word and is then entrenched in our long-term memory to facilitate future production and processing. Hence, polysemous senses are entrenched clusters in contextual modulation of word meaning and a word is polysemous if and only if it has entrenched clustering in its contextual modulation. I argue that this theory explains all the features of polysemous senses. In order to demonstrate more thoroughly how clusters emerge from meaning modulation during processing and provide evidence for this new theory, I implement the theory by training a recurrent neural network (RNN) that learns distributional information through exposure to a large corpus of English. Clusters of contextually modulated meanings emerge from how the model processes individual words in sentences. This trained model is validated against a human-annotated corpus of polysemy, focusing on the gradedness and flexibility of polysemous sense individuation, a human-annotated corpus of regular polysemy, focusing on the regularity of polysemy, and behavioral findings of offline sense relatedness ratings and online sentence processing. Last, the implications of this new theory of polysemy for philosophy are discussed. I focus on the debate between semantic minimalism and semantic contextualism. I argue that the phenomenon of polysemy poses a severe challenge to semantic minimalism. No solution is foreseeable if the minimalist thesis is kept, and the existence of contextual modulation is denied within the literal truth condition of an utterance.

1:00 PM - 2:30 PM: Semantics Research Group
Guillaume Thomas (faculty) presenting on collaborative work with language consultant Germino Duarte: "Switch-Reference: Syntax and/or (discourse) semantics?"

September 14, 2020

Research Groups: Friday, September 18

Note that all groups are meeting online until otherwise indicated; see the emails from group administrators for links and for further details. Also note that subsequent meetings of the Fieldwork Group this semester will be in the afternoon time-slot instead (1 PM - 2:30 PM).

10:00 AM - 11:30 AM: Language Variation and Change Research Group
Jeremy Needle (postdoc): "Two computational studies of lexical knowledge in te reo Māori in NZ."

The two studies presented in this talk demonstrate our efforts with computational and experimental approaches to replicate and extend traditional formal descriptions of te reo Māori. In the first study, we compare wordlikeness ratings for words and non-words to gradient phonotactic scores based on subsets of the lexicon derived from spoken and written corpora. In addition to deriving a gradient probabilistic description of Māori phonotactics which extends prior phonological work, we find that non-Māori-speaking New Zealanders demonstrate wordlikeness knowledge of Māori which suggests form-only familiarity with about 2000 morphemes. The importance of morphology in the lexical model for this study spurred us toward the second study: a quantitative survey of morphological patterns in Māori which combines knowledge from expert informants with machine-learning morphological parsing models. Among our findings, we particularly note that our native-speaker informants do not appear sensitive to the same taxonomy of reduplication patterns that appear in traditional grammars.

11:30 AM - 1:00 PM: Fieldwork Group
Introductions and group discussion of developing elicitation materials.

1:00 PM - 2:30 PM: Semantics Research Group
1. Angelika Kiss (Ph.D.): "Question tags projecting sourcehood in Italian."

Question tags like isn't it or right? in English can serve the purpose of eliciting confirmation or acknowledgment from the addressee. In Italian, no?, o sbaglio? and vero? have such a function, but there is another tag in its inventory, eh?, which is subject to further restrictions. In addition to eliciting the addressee's acknowledgment/confirmation, eh? also conveys evidential meaning. When a tag question hosts eh?, the speaker conveys i) that the addressee is independently committed to the proposition conveyed by the anchor (p), and presupposes ii) that the speaker knows i) from a direct source. That is, a question like Buono, eh? 'It's tasty, EH?', is pronounced felicitously in a context where the speaker directly perceives an event where the addressee has direct evidence for the truth of p (i.e., that whatever the addressee is eating, the addressee finds it tasty). Acknowledging a tag question like Buono, eh? results in registering p as a projected independent commitment of the addressee on the scoreboard of Farkas and Roelofsen (2012).

2. Michela Ippolito (faculty): "Gestures and the semantics of non-canonical questions."

I argue that both the co-speech and pro-speech symbolic gesture MAT (mano a tulipano) used by native speakers of Italian characterizes non-canonical wh-questions. MAT can be executed with either a fast tempo contour or a slow tempo contour. Tempo is semantically significant: descriptively, a fast tempo characterizes a biased but information-seeking non-canonical question; a slow tempo characterizes a rhetorical non-canonical question. I argue that the fast contour is the default tempo of MAT and that it brings about a biased interpretation. Slowing down the movement occurs when the feature [slow] is added: the semantic contribution of this feature is to add the presupposition that the question is resolved in the conversational context. This results in generalizing the speaker's bias to all discourse participants. More generally, I aim to show that both modalities (speech and gesture) can be analyzed and modelled using the same linguistic tools and principles.

August 12, 2020

Goodbyes and hellos for 2020-21

The 2020-21 year approaches! We bid farewell to our students completing their MAs in our department; to Erin Hall (Ph.D. 2020), taking up a tenure-track position in linguistics and speech pathology at California State University, San Bernardino; and to Naomi Francis (MA 2014; recent faculty), beginning a postdoc in semantics at the University of Oslo. Also, best of luck to Susana Béjar as she begins a semester of research leave.

Welcome and/or welcome back to:

Curt Anderson (faculty), a semanticist beginning a contract faculty position at UTSC.
Tahohtharátye William Joseph Brant (faculty), who will be beginning a joint teaching-stream position in the Department of Linguistics and the Centre for Indigenous Studies starting next summer.
Emily Clare (postdoc; Ph.D. 2019), beginning a postdoctoral fellowship with Jessamyn Schertz (faculty) at UTM.
Ewan Dunbar (faculty; MA 2008, BA 2007), beginning a tenure-track position in the Department of French as an Assistant Professor with a focus on computational linguistics.
Myrto Grigoroglou (faculty), completing a postdoctoral fellowship in the Language and Learning Lab at OISE and joining the Department of Linguistics as a tenure-track faculty member in language acquisition and psycholinguistics.
Qandeel Hussain (postdoc), joining us as an Arts and Science Postdoctoral Fellow associated with the Phonetics Lab, working on sociophonetic variation in the speech of South Asian communities in the Toronto area.
Samantha Jackson (postdoc), beginning a U of T Provost's Postdoctoral Fellowship and working with Derek Denis at UTM.
Dave Kush (faculty), beginning a tenure-track position at UTSC in psycholinguistics starting in 2021.
Avery Ozburn (faculty; MA 2014), completing a postdoctoral fellowship at McGill University and joining UTM as a tenure-track faculty member in phonology.
Pedro Mateo Pedro (faculty), beginning a teaching-stream position in Indigenous languages and revitalization.
Ai Taniguchi (faculty), beginning a teaching-stream position in semantics at UTM.
Deem Waham (staff), returning from leave.

We also have 16 new graduate students: 11 in the MA program and 5 in the Ph.D. program. Welcome, all!

May 25, 2020

Symposium on Jackman Scholars-in-Residence project

For this year's Jackman Scholars-in-Residence program, Barend Beekhuizen (faculty) has guided a group of outstanding undergraduates - Mah Noor Amir, Maya Blumenthal, Li Jiang, Anna Pyrtchenkov, and Jana Savevska - on an intense 4-week computational project examining cross-linguistic variation in the translations of words such as true, real, actual, and right in a sample of languages (Urdu, Hindi, Hebrew, German, Mandarin, Greek, Russian, Spanish, Macedonian, and Bulgarian). At the conclusion of the project, the students will be presenting their findings on Thursday, May 28, from 11 AM to 12 PM, online. See the email for the Zoom link and come hear about what this powerhouse team of emerging researchers has been up to!

February 29, 2020

Guest speaker: Félix Desmeules-Trudel (University of Western Ontario)

The Department of French is hosting a talk by Félix Desmeules-Trudel, who is a postdoctoral fellow at the University of Western Ontario, having completed his Ph.D. in linguistics at the University of Ottawa in 2018 with a focus on perception, acquisition, and computational methods - especially when it comes to phonetics. His talk is taking place on Monday, March 2, at 3 PM, in 201 Odette Hall on St. Joseph Street: "Modéliser la dynamique du langage : Outils expérimentaux, statistiques, et computationnels" ("Modelling the dynamics of language: Experimental, statistical, and computational tools"). Note that the talk will be held in French.

(A substantial proportion of the population is able to use more than one language on an everyday basis, but most models of language processing are based on monolingual speakers. Moreover, some major theoretical frameworks consider cognitive representations to be relatively fixed over time. However, language is a dynamic phenomenon. With a view to examining this dynamic reality, I will present a series of projects using sophisticated experimental methods that will allow the modelling of real-time language processing in order to probe more deeply the interactions between phonetic variation and L2 learning. I pay particular attention to the variability and dynamics of nasal vowel production in French (measured according to changes in nasal airflow), to the real-time processing of these vowels in French as an L1 and as an L2 (eye-tracking), and to various linguistic and cognitive factors that influence processing in bilingual children and L2 lexical learning in adults. In conjunction with advanced statistical methods, the results suggest that phonetic features usually thought to be redundant in phono-lexical representations in the mind do contribute to improving linguistic processing. That said, the use of these phonetic features seems to be inextricably linked to speakers' linguistic background. Late bilinguals have perceptual patterns less precise than those of monolingual speakers. Ultimately, this research will allow me to adapt computational models of word recognition (jtrace) to L2 learners, and thus to acquire a more realistic understanding of how language processing and learning function, in tandem with phonetic, phonological, and lexical representations.)

February 22, 2020

Guest speaker: Ewan Dunbar (Université de Paris Diderot)

The Department of French is hosting a talk by Ewan Dunbar. After finishing our department's BA (2007) and MA (2008) programs, Ewan earned his Ph.D. from the University of Maryland in 2013. He is now a maître de conférences (Assistant Professor) at l'Université de Paris - Paris Diderot (Paris 7), where his research focuses on computational approaches to learning and perception, especially on the phonological level. His talk is taking place on Monday, February 24, at 3 PM, in 201 Odette Hall on St. Joseph Street: "Un jour, Google, tu deviendras un vrai garçon" ("Someday, Google, you'll be a real boy"). Note that the language of the talk is French.

(We increasingly speak to our computers, smartphones, and digital assistants. In many cases, these devices understand us perfectly. But it doesn't take long to realise that our devices don't perceive speech the same way human beings do: they can make offbeat errors even under relatively normal listening conditions. Understanding the processes and representations involved in human speech perception is one of the primary goals of phonetics and phonology. I will show how we have approached fundamental questions for speech sciences through the use of reverse-engineering methods, as we attempt to ensure that the technology underlying our digital assistants behaves exactly the same way as human speech perception does. As an example, I describe our initial progress towards developing a teaching tool able to suggest targeted interventions for improving pronunciation in a second language – an application that needs to model and predict the likely difficulties that speakers of a given L1 will have when learning a given L2. I present experimental results gathered in English and French and compare the behaviour of our current models with that of human participants. I show how this work is integrated into a larger research program of modelling human speech perception, and the implications of such models for the speech-based technology that we interact with more and more in daily life.)

January 21, 2020

Research Groups: Week of January 20-24

Thursday, January 23, 4:00 PM - 5:00 PM in SS 1078: Morphology Reading Group
Ross Godfrey (Ph.D.) leading a group discussion of: Chandlee, Jane (2017). Computational locality in morphological maps. Morphology, 27, 599–641.

Friday, January 24, 10:00 AM - 11:30 AM in SS 4043: Psycholinguistics Group

Jade Lei Yu (Ph.D., Department of Computer Science) and John Xu (Ph.D., Department of Computer Science), making two presentations: "How nouns surface as verbs: A generative framework for word class conversion" and "Prototype theory and meaning change in the semantic field of emotion."

Friday, January 24, 1:15 PM - 2:45 PM in SS 560A: Semantics Research Group

Heather Stephens (Ph.D.): "Yeah, no, that was implied: Targeting non-asserted propositions with propositional anaphors."

Most contemporary treatments of polarity particles agree that these expressions are in some sense anaphoric to propositions (e.g., Krifka 2015, Roelofsen and Farkas 2015). When two particles of opposing polarity are used in a single response, as in (1), several questions arise, including: exactly which propositions are the particles picking up? How can such responses be modelled? I will provide some thoughts in response to these questions, using the framework laid out by Roelofsen and Farkas (2015) as the point of departure:

(1) Dorothy: [We’ve got] to do this shopping Peter.
Peter: Yeah, no it’s alright nanna, we’ve got 5 minutes. (Burridge and Florey 2002).

Links