Les Perelman’s robo-graded essay

June 10, 2012

The New York Times reports on a competition to develop the best automated essay grading software.  The idea is that students taking an essay test (such as the SAT) could have their essays graded by computer instead of a human teacher (or jointly, both by computer and human).  

One prominent example of this type of automated grading software is the Educational Testing Service’s (E.T.S.) e-Rater.  In fact, E.T.S. claims this grading software is used today, with human raters, to grade GRE and TOEFL examinations, and without human raters in various practice tests. 

I am deeply skeptical that electronic grading software can give meaningful scores, and I’m not the only one. Les Perelman of MIT has shown that it is possible to game this software by writing an essay that has a certain form, even though its content is nonsense. He submitted the following essay to demonstrate his point and received the best possible grade from E.T.S. The essay is hilarious:

Question: "The rising cost of a college education is the fault of students who demand that colleges offer students luxuries unheard of by earlier generations of college students — single dorm rooms, private bathrooms, gourmet meals, etc."

Discuss the extent to which you agree or disagree with this opinion. Support your views with specific reasons and examples from your own experience, observations, or reading.

In today’s society, college is ambiguous. We need it to live, but we also need it to love. Moreover, without college most of the world’s learning would be egregious. College, however, has myriad costs. One of the most important issues facing the world is how to reduce college costs. Some have argued that college costs are due to the luxuries students now expect. Others have argued that the costs are a result of athletics. In reality, high college costs are the result of excessive pay for teaching assistants.

I live in a luxury dorm. In reality, it costs no more than rat infested rooms at a Motel Six. The best minds of my generation were destroyed by madness, starving hysterical naked, and publishing obscene odes on the windows of the skull. Luxury dorms pay for themselves because they generate thousand and thousands of dollars of revenue. In the Middle Ages, the University of Paris grew because it provided comfortable accommodations for each of its students, large rooms with servants and legs of mutton. Although they are expensive, these rooms are necessary to learning. The second reason for the five-paragraph theme is that it makes you focus on a single topic. Some people start writing on the usual topic, like TV commercials, and they wind up all over the place, talking about where TV came from or capitalism or health foods or whatever. But with only five paragraphs and one topic you’re not tempted to get beyond your original idea, like commercials are a good source of information about products. You give your three examples, and zap! you’re done. This is another way the five-paragraph theme keeps you from thinking too much.

Teaching assistants are paid an excessive amount of money. The average teaching assistant makes six times as much money as college presidents. In addition, they often receive a plethora of extra benefits such as private jets, vacations in the south seas, a staring roles in motion pictures. Moreover, in the Dickens novel Great Expectation, Pip makes his fortune by being a teaching assistant. It doesn’t matter what the subject is, since there are three parts to everything you can think of. If you can’t think of more than two, you just have to think harder or come up with something that might fit. An example will often work, like the three causes of the Civil War or abortion or reasons why the ridiculous twenty-one-year-old limit for drinking alcohol should be abolished. A worse problem is when you wind up with more than three subtopics, since sometimes you want to talk about all of them.

There are three main reasons while Teaching Assistants receive such high remuneration. First, they have the most powerful union in the United States. Their union is greater than the Teamsters or Freemasons, although it is slightly smaller than the international secret society of the Jedi Knights. Second, most teaching assistants have political connections, from being children of judges and governors to being the brothers and sisters of kings and princes. In Heart of Darkness, Mr. Kurtz is a teaching assistant because of his connections, and he ruins all the universities that employ him. Finally, teaching assistants are able to exercise mind control over the rest of the university community. The last reason to write this way is the most important. Once you have it down, you can use it for practically anything. Does God exist? Well, you can say yes and give three reasons, or no and give three different reasons. It doesn’t really matter. You’re sure to get a good grade whatever you pick to put into the formula. And that’s the real reason for education, to get those good grades without thinking too much and using up too much time.

In conclusion, as Oscar Wilde said, "I can resist everything except temptation." Luxury dorms are not the problem. The problem is greedy teaching assistants. It gives me an organizational scheme that looks like an essay, it limits my focus to one topic and three subtopics so I don’t wander about thinking irrelevant thoughts, and it will be useful for whatever writing I do in any subject. I don’t know why some teachers seem to dislike it so much. They must have a different idea about education than I do.

By Les Perelman

One of Les Perelman’s most basic observations is that longer writing invariably receives better grades.  Big words are better than short words.  (“ ‘Egregious’ is better than ‘bad.’ ”)  And the ways that one can corrupt a grader are almost limitless  (“Mr. Perelman takes great pleasure in fooling e-Rater. He has written an essay, then randomly cut a sentence from the middle of each paragraph and has still gotten a 6.”)

E.T.S. responds:

E.T.S. officials say that Mr. Perelman’s test prep advice is too complex for most students to absorb; if they can, they’re using the higher level of thinking the test seeks to reward anyway. In other words, if they’re smart enough to master such sophisticated test prep, they deserve a 6[…] 

As for good writing being long writing, Mr. Deane [Principal Scientist at E.T.S.] said there was a correlation. Good writers have internalized the skills that give them better fluency, he said, enabling them to write more in a limited time.

If you are a potential graduate student who is taking the GRE examination, you may want to pay attention to Perelman’s advice. After all, your essay will be graded, at least in part, by a machine.

6 Comments
  1. June 11, 2012 7:45 am

    The earliest half-way successful prototype of what may be the GRE machine scored essay is ETS’s “Criterion® Online Writing Evaluation service.” In the college level ESL programs and writing courses I run, students and faculty members have for years been using this machine-scoring, machine-feedback-generating essay software. There are actually two coded programs running in the background: the one that produces the standardized score (on a 6 point range) and the other that gives the feedback on machine-found errors in categories of “organization / development” and “grammar” and “usage” and the like.

    The PC online version was developed from the ETS “Test of Written English,” which later became part of the computer-based-test version and then the internet-based-test version of the “Test of English as a Foreign Language.” The TWE is a 30-minute essay written by the test taker to be scored on a 12-point scale (0 to 6 in increments of 0.5) by two trained human readers. (The scoring guide is now published online with information about the paper-based-test version of TOEFL.) The Criterion® product was developed using hundreds or thousands of real essays and human-scored data, but as mentioned it yields scores on the 6-point scale only in whole-integer increments. Our faculty members find the scoring highly unreliable, and yes, they can trick it too, as can students. Thus, for measures of writing that really count, human raters are always brought in. The program is used as an initial guess of a student’s writing proficiency and as a self-teaching tool by the students. There are a number of GRE topics available for those who want to practice writing essays for that exam.

  2. June 11, 2012 1:17 pm

    Yes, I think they are all part of the same family of tools. While there may be valid uses for such programs, I do not think they should be used for real exams such as the TOEFL and GRE. Effectively, only a single human scorer (working under incredible time pressure) is determining important grades. In situations like that, I think it is likely that the human being is acting a little bit like a machine himself.

  3. Tom Anderson permalink
    May 10, 2013 10:04 am

    So basically, I can write this response and leave it at that. Or maybe I should keep on writing more?

Trackbacks

  1. The problems with MOOCs 1: Robo-essay grading | BLT
  2. Look Behind the Numbers « Ars Docendi
  3. Technology and communication: Wouldn’t it be cool if AI could respond to all your texts, tweets, and posts? | Meet the world
