Tom Juzek,
Assistant Professor,
Department of Modern Languages and Linguistics,
Florida State University

"The Syntactic Acceptability Dataset as a resource for machine learning and linguistic analysis"

October 26, 2022, Schedule:

Nespresso & Teatime (417 DSL - Commons)
 
03:00 to 03:30 PM Eastern Time (US and Canada)

Colloquium - F2F (499 DSL) / Virtual (Zoom)
 
03:30 to 04:30 PM Eastern Time (US and Canada)

Meeting # 942 7359 5552

Abstract:

Linguistic datasets are popular in machine learning, particularly in the emerging field of few-shot learning (learning from limited data), because linguistic data is often complex and difficult to generalize from and thus poses a welcome challenge (Wang et al. 2020). In this talk, I will outline ongoing research on building a new dataset that is valuable to both the machine learning community and the linguistic community. The new dataset will be based on COLA (Corpus of Linguistic Acceptability; Warstadt et al. 2018), a popular dataset in machine learning. I will briefly introduce COLA, the challenges it poses, and the relevant linguistic distinction between acceptability and grammaticality. Further, I will motivate the need for new data of a different kind, outline its structure, and discuss its expected relevance to machine learning and linguistics.

Attachments:
theywilltalkdata.jpg (Advertisement, 334 kB)
Tom Juzek.jpg (Advertisement, 28 kB)
Dept. of Scientific Computing
Florida State University
400 Dirac Science Library
Tallahassee, FL 32306-4120
Phone: (850) 644-1010
admin@sc.fsu.edu