Insubordination studies often refer to some prosodic features of insubordinated constructions but, to date, no systematic research has been carried out on the effects of insubordination on prosody. This paper analyzes the prosody of both independent and semi-dependent clauses bearing subordination markers, using a corpus of 1,230 utterances. The data show that while subordinate clauses and elliptical clauses whose elided clause can be recovered show prosodic markings of continuation (rising boundary tones), insubordinated clauses do not.
In this paper, we present SFU ReviewSP-NEG, the first freely available, wide-coverage Spanish corpus annotated with negation. We describe the methodology applied in annotating the corpus, including the tagset, the linguistic criteria and the inter-annotator agreement tests. We also include a complete typology of negation patterns in Spanish. This typology has the advantage that it is easy to express in terms of a tagset for corpus annotation: the types are clearly defined, which avoids ambiguity in the annotation process, and they provide wide coverage (i.e.
Jiménez-Zafra, Salud María, M. Teresa Martín-Valdivia, Luis Alfonso Ureña-López, M. Antònia Martí, Mariona Taulé
Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics
This paper presents the main sources of disagreement found during the annotation of the Spanish SFU Review Corpus with negation (SFU ReviewSP-NEG). Negation detection is a challenge in most NLP tasks, so the availability of corpora annotated with this phenomenon is essential for advancing work in this area. A thorough analysis of the problems found during annotation could help in the study of this phenomenon.
The present study approaches the indexing of epistemicity and evidentiality from the point of view of register by analyzing a total of 30 oral and 30 written productions of two opinion reports (one dealing with a debatable issue and the other with a nondebatable issue) produced by 15 Catalan speakers. The main aim of the paper is to test the potential effects of register (i.e., oral vs. written discourse) and debatability (i.e., debatable vs. nondebatable issue) on the discourse marking of evidentiality and epistemicity.
Kovatchev, Venelin, Maria Salamó, M. Antònia Martí
Procesamiento del Lenguaje Natural, Vol. 57: 109-116
Distributional Semantic Models (DSMs) are growing in popularity in Computational Linguistics. DSMs use corpora of language use to automatically induce formal representations of word meaning. This article focuses on one application of DSMs: identifying groups of semantically related words. We compare two models for obtaining formal representations: a well-known approach (CLUTO) and a more recently introduced one (Word2Vec). We compare the two models with respect to the PoS coherence and the semantic relatedness of the words within the obtained groups.
J. Barnes, A. Brugos, S. Shattuck-Hufnagel, N. Veilleux (eds.): Proceedings of Speech Prosody 2016, pp. 888-982 (Boston, May 31-June 4, 2016)
Boston, Boston University
In this article, we focus on two languages within the Romance group (Catalan and Friulian) that have been reported to use intonation and pragmatic markers to different extents to mark epistemic meanings. A total of 15 speakers per language were asked to participate in a Discourse Completion Task designed to elicit statements with several degrees of speaker commitment.
Martí, M. Antònia, M. Teresa Martín-Valdivia, Mariona Taulé, Salud María Jiménez-Zafra, Montserrat Nofre, Laia Marsó
Procesamiento del Lenguaje Natural, Vol. 57: 41-48
This article presents the criteria applied in annotating the SFU ReviewSP-NEG corpus with negation, together with the corresponding linguistic typology. This typology has the advantage of being easy to express in terms of a tagset for corpus annotation, of having clearly delimited types, which avoids ambiguity in the annotation process, and of offering wide coverage, that is, it sufficed to resolve all the cases that arose. The corpus contains 400 reviews and 198,551 words.