Identifying Thesis and Conclusion Statements in
Student Essays to Scaffold Peer Review
Mohammad H. Falakmasir, Kevin D. Ashley, Christian D. Schunn, Diane J. Litman
Learning Research and Development Center,
Intelligent Systems Program, University of Pittsburgh
Abstract. Peer-reviewing is a recommended instructional technique to encourage good writing. Peer reviewers, however, may fail to identify key elements of an essay, such as thesis and conclusion statements, especially in high school writing. Our system identifies thesis and conclusion statements, or their absence, in students' essays in order to scaffold reviewer reflection. We show that computational linguistics and interactive machine learning have the potential to facilitate peer-review processes.

Keywords: Peer-review, high school writing instruction, discourse analysis, natural language processing, interactive machine learning
1 Introduction
Writing is essential to communication, learning, and problem solving. However, poor achievement in high school writing is a major deficiency in the US educational system [1]. There appears to be no single best approach to teaching writing; however, some practices have been shown to be more effective than others.

One of these practices, peer review of writing assignments, is a commonly recommended technique for improving writing skills, especially in large class settings. Peer review not only provides students with feedback; it also gives them the opportunity to read other students' essays and to improve their reflective and metacognitive skills. Several studies have found that providing feedback leads to improvement in the reviewer's own writing [2], especially when students provide constructive feedback [3] and put effort into the process [4].

While web-based peer-review systems solve logistical challenges of the review process, such as distributing documents, providing rubrics and review criteria, and supporting successive drafts, they are still far from optimal [5]. In particular, reviewers may not focus on the core aspects of the text being evaluated [6]. In argumentative writing, the thesis statement plays a pivotal role: it communicates the author's position and opinion about the essay prompt; it anchors the framework of the essay, serving as a hook that ties together the reasons and evidence presented; and it anticipates critiques and counterarguments [7]. The thesis statement thus has a major influence on the assessment of writing skills [8]. A conclusion reiterates the main idea and summarizes the entire argument of an essay. It may contain new information, such as self-reflections on the writer's position [7].
Since thesis and conclusion statements both play a critical role in the overall argument and share similar linguistic elements, in this paper we focus on automatically identifying these two core aspects.

Advances in computational linguistics enable systems to analyze large text corpora automatically and quickly. Shermis et al. [9] reviewed the features of the three most successful Automated Essay Evaluation (AEE) systems. These systems can analyze certain pedagogically significant aspects of essays as reliably as expert human graders. In particular, Burstein and Marcu [10] presented a machine learning model for detecting thesis and conclusion sentences in students' essays. They later extended their model into a discourse analysis system that became part of the ETS Criterion® software for online essay evaluation [11]. Their model uses lexical, syntactic, and rhetorical features and a complex classification framework to label discourse elements of an essay, such as introductory material, thesis statement, topic sentences, and conclusion. Writing Pal (W-Pal) [12], an intelligent tutoring system, uses another AEE methodology to offer writing strategy instruction, game-based essay writing practice, and formative feedback to high school writers; it uses the Coh-Metrix AEE [13] to analyze student essays and generate that feedback.

We hypothesize that AEE techniques can also improve computer-supported peer review by calling reviewers' attention to particular features of an essay (e.g., thesis or conclusion statements) that deserve comment. Our AEE model is designed to be used as part of the SWoRD peer-review system [14]. To the best of our knowledge, no one has used AEE techniques to support intelligent scaffolding of peer reviews. We believe that our system has the potential to combine the strengths of both web-based peer review and automated essay evaluation. With the ability to identify thesis statements, the system will scaffold reviewers' consideration of these issues by posing questions such as:

• SWoRD thinks [quoted text] is [pseudonym]'s thesis statement. Do you agree?
• SWoRD cannot find a thesis statement for [pseudonym]'s paper. Can you?
• Tell [pseudonym] to add a thesis statement. What thesis statement would you recommend?

Since the papers we assess are mainly first drafts of high school essays, often lacking in both style and structure, the peer-review context gives us a unique opportunity to evaluate and improve our model in practice. We plan to use the model in an interactive machine learning [15] framework. Since we use the model's results to scaffold peer review, its outputs will be evaluated first by the author of the paper and then by a number of peer reviewers. We can use these author and peer evaluations as feedback to improve the model, reducing the need for time-consuming post hoc manual text annotation. This advantage will enable the system to assess its performance in action and improve toward the desired behavior.
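As an illustration only, the sketch below shows one way such author and peer verdicts could be folded back into training labels for retraining; the function name and the majority-vote rule are hypothetical, not SWoRD's actual interface.

```python
# Hypothetical sketch of the interactive machine-learning loop described
# above: reviewer and author verdicts on a predicted label become new
# training labels, so the model improves without manual annotation.

def feedback_label(predicted, author_agrees, peer_votes):
    """Confirm or overturn a predicted sentence label ('thesis' vs 'other')
    by majority vote of the author plus the peer reviewers."""
    agreements = sum(peer_votes) + (1 if author_agrees else 0)
    confirmed = agreements > (len(peer_votes) + 1) / 2
    if confirmed:
        return predicted
    return "other" if predicted == "thesis" else "thesis"

# Example: the author and two of three peers agree with the prediction,
# so the label "thesis" is kept as a new training instance.
print(feedback_label("thesis", True, [True, True, False]))  # -> 'thesis'
```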

2 Methodology

It is important that reviewers attend to thesis statements: how well they are articulated and supported, and whether alternative interpretations/viewpoints are considered [16,

Positional Features: We used three positional features: the paragraph number, the sentence number within the paragraph, and the paragraph type (first, body, or last). We also used the same positional baseline as [11] in order to compare our results with their model. The positional baseline predicts every sentence in the first paragraph as a thesis statement and every sentence in the last paragraph as a conclusion sentence.

Sentence Level Features: We used a number of sentence-level features based on the syntactic, semantic, and dependency parses of the sentence. Based on our feature selection process, prepositional and gerund phrases are highly predictive of thesis and conclusion sentences. The number of adjectives and adverbs within a sentence is also highly correlated with the sentence being a thesis or conclusion statement. A set of frequent words was also predictive of thesis and conclusion sentences (e.g., "although", "even though", "because", "due to", "led to", "caused"), and we used the number of occurrences of these words in a sentence as a feature in our model.

Essay Level Features: We used essay-level features including the number of keywords among the most frequent words of the essay, the number of words overlapping with the assignment prompt, and a sentence importance score based on Rhetorical Structure Theory (RST) adapted from [19].

Table 2 shows the top 5 most predictive features for each category based on the Gini Coefficient [20] attribute selection method. This method considers the prior distribution of the classes, looks for the largest class in the training set (in our case, sentences that are not the thesis), and tries to isolate it from the other classes, which suits the nature of our classification task.
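To make the baseline concrete, here is a minimal sketch of the positional heuristic described above (illustrative Python; the actual experiments were run in RapidMiner):

```python
# Minimal sketch of the positional baseline: every sentence in the first
# paragraph is predicted to be a thesis statement, every sentence in the
# last paragraph a conclusion statement, and everything else "other".

def positional_baseline(paragraphs):
    """paragraphs: list of paragraphs, each a list of sentence strings.
    Returns one label per sentence, in reading order."""
    labels = []
    last = len(paragraphs) - 1
    for i, paragraph in enumerate(paragraphs):
        for _sentence in paragraph:
            if i == 0:
                labels.append("thesis")
            elif i == last:
                labels.append("conclusion")
            else:
                labels.append("other")
    return labels

# Example: a three-paragraph essay with two sentences per paragraph.
essay = [["s1", "s2"], ["s3", "s4"], ["s5", "s6"]]
print(positional_baseline(essay))
# ['thesis', 'thesis', 'other', 'other', 'conclusion', 'conclusion']
```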

Table 2. Top 5 most predictive features for each category, based on the Gini Coefficient.

Ranking   Thesis                    Conclusion
1         Last Sentence             Last Paragraph
2         First Paragraph           Keyword Overlap
3         Common Words              Common Words
4         Keyword Overlap           Number of Adjectives
5         Number of Noun Phrases    Number of Noun Phrases
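For illustration, a common formulation of Gini-based attribute weighting is sketched below; RapidMiner's built-in method may differ in detail, so treat this as an approximation of the idea rather than the exact procedure used.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a label array: 1 - sum_i p_i^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_gain(feature_values, labels, threshold):
    """Impurity reduction from splitting a numeric feature at `threshold`;
    features can be ranked by their best achievable reduction."""
    mask = feature_values <= threshold
    left, right = labels[mask], labels[~mask]
    if len(left) == 0 or len(right) == 0:
        return 0.0  # degenerate split: no information gained
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted

# Example: a binary "in first paragraph" feature vs. thesis/other labels.
feature = np.array([1, 1, 0, 0, 0, 0])
labels = np.array(["thesis", "thesis", "other", "other", "other", "other"])
print(gini_gain(feature, labels, 0.5))  # perfect split removes all impurity
```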

3 Results and Discussion

After a data cleaning and pre-processing step, we created feature vectors for all sentences in the training-set essays. Our target class had three labels: "thesis", "conclusion", and "other". We treated sentences rated 2 and 3 as thesis and conclusion statements and put those rated 1 (incomplete) into the "other" category. We evaluated our model at two levels, the sentence level and the essay level, and compared its performance against the positional baseline and the human-annotated data.

We used three classifiers in RapidMiner [21] to develop the sentence-level models: Naïve Bayes, Decision Tree, and Support Vector Machine (SVM). We used 10-fold essay-stratified cross-validation to evaluate the models at the sentence level. To evaluate them at the essay level, we aggregated the results of the sentence-level model to predict whether or not an essay contains a thesis/conclusion statement. Table 3 shows the performance of the three classifiers in terms of average Precision (P), Recall (R), and F-measure (F) across all 10 rounds of cross-validation. We use F, the harmonic mean of P and R (F = 2PR / (P + R)), as our main performance evaluation metric.
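As an illustration of this evaluation set-up, the sketch below re-creates essay-stratified cross-validation and the essay-level aggregation in scikit-learn; the paper's experiments were run in RapidMiner, and the data shapes here are placeholders.

```python
# Illustrative re-creation (not the authors' RapidMiner pipeline) of
# 10-fold cross-validation stratified by essay: all sentences of an essay
# fall into the same fold, so no essay leaks between training and testing.
import numpy as np
from sklearn.model_selection import GroupKFold, cross_val_predict
from sklearn.svm import SVC

# Placeholder data: one feature vector per sentence, a label per sentence,
# and the id of the essay each sentence belongs to (50 essays x 10 sentences).
X = np.random.rand(500, 12)
y = np.random.choice(["thesis", "conclusion", "other"], size=500)
essay_ids = np.repeat(np.arange(50), 10)

pred = cross_val_predict(SVC(), X, y, cv=GroupKFold(n_splits=10),
                         groups=essay_ids)

# Essay-level aggregation: an essay is predicted to contain a thesis
# statement if any of its sentences was labeled "thesis" (and likewise
# for conclusions).
essay_has_thesis = {e: bool(np.any(pred[essay_ids == e] == "thesis"))
                    for e in np.unique(essay_ids)}
```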

Table 3. Average performance of the 3 models and the positional baseline on the development set.

                      Thesis              Conclusion          Essay
Classifier            P     R     F       P     R     F       P     R     F
Positional Baseline   0.53  0.89  0.50    0.51  0.89  0.46    0.61  0.78  0.
Naïve Bayes           0.62  0.76  0.68    0.57  0.72  0.62    0.71  0.66  0.
Decision Tree         0.75  0.68  0.71    0.62  0.43  0.51    0.75  0.71  0.
SVM                   0.85  0.66  0.74    0.67  0.41  0.51    0.69  0.64  0.

To indicate how well the models generalize to new essays, we also evaluated them on an unseen test set; Table 4 shows the results.

Table 4. Average performance of the 3 models and the positional baseline on the unseen test set.

                      Thesis              Conclusion          Essay
Classifier            P     R     F       P     R     F       P     R     F
Positional Baseline   0.58  0.88  0.57    0.58  0.84  0.55    0.58  0.84  0.55
Naïve Bayes           0.70  0.79  0.74    0.65  0.69  0.67    0.63  0.65  0.
Decision Tree         0.82  0.84  0.83    0.49  0.75  0.59    0.75  0.73  0.
SVM                   0.82  0.65  0.72    0.60  0.54  0.56    0.62  0.58  0.
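For reference, the improvements cited in the discussion below can be read off Tables 3 and 4 as the best F per category minus the baseline F:

```latex
\begin{align*}
\text{Thesis, development:}     && 0.74\ (\text{SVM}) - 0.50 &= 0.24\\
\text{Thesis, test:}            && 0.83\ (\text{Decision Tree}) - 0.57 &= 0.26\\
\text{Conclusion, development:} && 0.62\ (\text{Na\"ive Bayes}) - 0.46 &= 0.16\\
\text{Conclusion, test:}        && 0.67\ (\text{Na\"ive Bayes}) - 0.55 &= 0.12
\end{align*}
```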

The results show that all three models outperform the positional baseline. While the SVM classifier had the best precision at the sentence level on both the development and test sets, the Decision Tree classifier achieved higher recall and better overall performance at the essay level. Since we do not use the same training and test sets as [11], it is not valid to compare the exact values reported for P, R, and F. However, because we use the same positional baseline, and the baseline's results can be taken as a rough estimate of the quality of the essays, we can compare the systems in terms of improvement over the baseline. In the thesis detection category, their highest reported improvement in F over the positional baseline is 0. while our best improvement is 0.24 on the development set and 0.26 on the unseen test set. In the conclusion detection category, their highest reported improvement is 0.23 while our best improvement is 0.16 on the development set and 0.12 on the unseen test set. Our performance in the conclusion category is generally lower because the essays in our training set are first drafts of writing assignments: students tend to spread the summary of their arguments across multiple sentences, and our current model works only at the sentence level.

In conclusion, our study shows that even with a relatively small corpus of essays, a computational linguistic model can identify core aspects of students' essays. Our first priority was to detect the presence or absence of thesis and conclusion statements in student essays, so as to provide instant feedback to authors upon submission. Our second priority was to identify the particular sentences, in order to direct reviewers' attention and focus some of their comments on how well the author has framed and supported his or her argument. Our next step is to embed the model in the SWoRD peer-review system and evaluate its impact on the quality of student reviews. The peer-review nature of SWoRD gives us a unique opportunity to benefit from both author and peer feedback in order to evaluate and refine the model while it is in use. We also plan to extend the model to detect other core elements of student essays, such as topic sentences and supporting materials, in order to provide further feedback and scaffolding.