The humanity of Google Translate, Perl, and a multi-lingual foodie app

September 14, 2011

For those of us who value the human touch, here are a few tidbits to remind us that the best programs and machines exist because of smart people. Here are quotations about Google Translate (GT), about the computer language Perl, and about a new app for anyone who loves new foods but may not yet have the human languages to order them.

something on the humanity of GT –

Translators don’t reinvent hot water every day. They behave more like GT – scanning their own memories in double-quick time for the most probable solution to the issue at hand. GT’s basic mode of operation is much more like professional translation than is the slow descent into the “great basement” of pure meaning that early mechanical translation developers imagined.

GT is also a splendidly cheeky response to one of the great myths of modern language studies. It was claimed, and for decades it was barely disputed, that what was so special about a natural language was that its underlying structure allowed an infinite number of different sentences to be generated by a finite set of words and rules.

A few wits pointed out that this was no different from a British motor car plant, capable of producing an infinite number of vehicles each one of which had something different wrong with it – but the objection didn’t make much impact outside Oxford.

GT deals with translation on the basis not that every sentence is different, but that anything submitted to it has probably been said before. Whatever a language may be in principle, in practice it is used most commonly to say the same things over and over again. There is a good reason for that. In the great basement that is the foundation of all human activities, including language behaviour, we find not anything as abstract as “pure meaning”, but common human needs and desires.

All languages serve those same needs, and serve them equally well. If we do say the same things over and over again, it is because we encounter the same needs, feel the same fears, desires and sensations at every turn. The skills of translators and the basic design of GT are, in their different ways, parallel reflections of our common humanity.
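The idea that "anything submitted to it has probably been said before" can be made concrete with a toy sketch. This is not how Google Translate is actually built; it only assumes a small memory of sentence pairs seen earlier, and the class name `PhraseMemory` and the sample pairs are made up for illustration.

```python
from collections import Counter, defaultdict


class PhraseMemory:
    """Toy translation memory: recalls how a sentence was translated before."""

    def __init__(self):
        # Maps each source sentence to a tally of the translations seen for it.
        self._seen = defaultdict(Counter)

    def remember(self, source, target):
        """Record one previously observed translation of `source`."""
        self._seen[source][target] += 1

    def translate(self, source):
        """Return the most frequently seen translation, or None if unseen."""
        candidates = self._seen.get(source)
        if not candidates:
            return None
        return candidates.most_common(1)[0][0]


# Hypothetical sample data: the same greeting, translated three times before.
memory = PhraseMemory()
memory.remember("good morning", "bonjour")
memory.remember("good morning", "bonjour")
memory.remember("good morning", "bon matin")

best = memory.translate("good morning")      # most probable earlier solution
unknown = memory.translate("never seen it")  # nothing in memory for this one
```

The sketch scans its own memory for the most probable earlier answer rather than descending toward "pure meaning", which is the contrast the quotation draws.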

something on GT as yielding a rather human outcome –

Kira Simon-Kennedy wrote to me from Beijing that she is chaperoning 30 French high school students on their first trip to China to learn Mandarin.

Yesterday afternoon, the French students were trying to decipher the following banner at a bus stop: “没有共产党, 没有新中国.” Most of the students have already taken a couple years of lessons, so they could be classed as having reached intermediate level. They got as far in their interpretation of the sign on the banner as “There is no collective __, there is no new China.” Not bad for intermediate level learners, but the banner remained a mystery to them, if only at the lexical level because they didn’t know what 共产党 meant. However, when Kira told the students that 共产党 meant Communist Party, they were all the more puzzled. “Are they allowed to say that (‘there is no Communist Party’)?” one student asked. “Isn’t that really dangerous to deny the existence of the Party in public?”

The students thought that someone had the nerve to buy a public ad to tell the world: “There is no Communist Party, there is no New China” — superficially that’s what the sign on the banner seemed to be saying. The close grammatical parallelism of the two clauses only made such an interpretation seem all the more certain.

How to break through the impasse of a sentence that seems relatively easy to understand, but yet remains incomprehensible (i.e., it is at odds with patent reality, viz., “there is a Communist Party, there is a New China” — quite the opposite of what the sentence seems to be saying, grammatically speaking, viz., “there is no Communist Party, there is no New China”)?

We have repeatedly been disappointed by translation software, but let’s run this problematic sentence through Google Translate and Baidu Fanyi to see if they can do any better than the intermediate level students from France.

Méiyǒu Gòngchǎndǎng, méiyǒu xīn Zhōngguó 没有共产党，没有新中国 (“If there were no Communist Party, there would be no New China”)
Google Translate: Without the Communist Party, No New China
Baidu Fanyi: Without the Communist Party, there’ll be no new China

I’m impressed.

Let’s see how they do with the comma removed: Méiyǒu Gòngchǎndǎng méiyǒu xīn Zhōngguó 没有共产党没有新中国 (“If there were no Communist Party, there would be no New China”)
Google Translate: No new China without the Communist Party
Baidu Fanyi: There’ll be no new China without the Communist Party

I’m really impressed!

something on the humanity of Perl –

I don’t want to talk to a stupid computer language. I want my computer language to understand the strings I type.

Perl is a postmodern language, and a lot of conservative folks feel like Postmodernism is a rather liberal notion. So it’s rather ironic that my views on Postmodernism were primarily informed by studying linguistics and translation as taught by missionaries, specifically, the Wycliffe Bible Translators. One of the things they hammered home is that there’s really no such thing as a primitive human language. By which they mean essentially that all human languages are Turing complete.

When you go out to so-called primitive tribes and analyze their languages, you find that structurally they’re just about as complex as any other human language. Basically, you can say pretty much anything in any human language, if you work at it long enough. Human languages are Turing complete, as it were.

Human languages therefore differ not so much in what you can say but in what you must say. In English, you are forced to differentiate singular from plural. In Japanese, you don’t have to distinguish singular from plural, but you do have to pick a specific level of politeness, taking into account not only your degree of respect for the person you’re talking to, but also your degree of respect for the person or thing you’re talking about.

So languages differ in what you’re forced to say. Obviously, if your language forces you to say something, you can’t be concise in that particular dimension using your language. Which brings us back to scripting.

How many ways are there for different scripting languages to be concise?

How many recipes for borscht are there in Russia?

Language designers have many degrees of freedom. I’d like to point out just a few of them.

something on the humanity of the multi-lingual foodie app –

Foodie culture has sent America’s culinary adventurers into the deepest regions of their local ethnic neighborhoods in search of new delicacies. Unfortunately for more open-minded eaters, they often find themselves confronted with unintelligible menus written in an intimidating foreign language.

A new app from Purdue University helps intrepid restaurant goers overcome that language barrier by not only translating the menu, but providing instructions about food allergies in a number of different dialects.

The user types the name of a desired dish into a prompt field in the graphical user interface. The text is translated, and the best possible translations are then listed, along with other information, including pictures and ingredients. The user can then browse the multimedia database to obtain more information about the dish or the ingredients. When appropriate, information and questions for the waiter are suggested.
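The workflow described above (type a dish name, get the best matches with ingredients, and have a question for the waiter suggested when appropriate) might be sketched roughly as follows. This is only an illustration, not the Purdue app's actual design: the `Dish` record, the three sample entries, and the `look_up` function are all assumptions, and fuzzy name matching with the standard library's `difflib` stands in for whatever translation step the app really uses.

```python
from dataclasses import dataclass, field
from difflib import get_close_matches


@dataclass
class Dish:
    name: str        # dish name in the user's language
    local_name: str  # name as it might appear on the menu
    ingredients: list = field(default_factory=list)


# Hypothetical mini database; a real app would hold far more (and pictures).
DISHES = [
    Dish("borscht", "борщ", ["beet", "cabbage", "beef"]),
    Dish("kung pao chicken", "宫保鸡丁", ["chicken", "peanut", "chili"]),
    Dish("pad thai", "ผัดไทย", ["rice noodles", "peanut", "egg", "shrimp"]),
]


def look_up(query, allergies=()):
    """Return up to three (dish, question) pairs for the typed dish name.

    `question` is a suggested question for the waiter when the dish lists an
    ingredient the user is allergic to, otherwise None. It is left in English
    here as a placeholder for a properly translated phrase.
    """
    by_name = {dish.name: dish for dish in DISHES}
    matches = get_close_matches(query.lower(), list(by_name), n=3, cutoff=0.6)
    results = []
    for name in matches:
        dish = by_name[name]
        risky = [a for a in allergies if a in dish.ingredients]
        question = None
        if risky:
            question = f"Does {dish.local_name} contain {', '.join(risky)}?"
        results.append((dish, question))
    return results


# A slightly misspelled query still finds the dish and flags the allergen.
found = look_up("pad tai", allergies=["peanut"])
```

Keeping ingredients on each record is what lets the allergy question be generated from the same lookup that lists the dish.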

Sources:

http://www.independent.co.uk/life-style/gadgets-and-tech/features/how-google-translate-works-2353594.html

http://languagelog.ldc.upenn.edu/nll/?p=3340

http://speakeristic.blogspot.com/2008/01/empowerment-of-vulgar.html

http://www.msnbc.msn.com/id/44492610/ns/technology_and_science-innovation/#.TnCmXezfTl4

(Now try reading this post, or this blog, in one of your other human languages, using GT. Here’s how it reads in one of mine.)

6 Comments

Theophrastus

September 14, 2011 3:57 pm

It seems as if you are surprised that Perl has “humanistic” qualities, but I’m not sure why you should be. Perl is not written for machines, but for humans. The authors of computer programs are usually human, and the question of designing a computer language that is easy for humans to use (or even more important: hard for humans to misuse) occupies a large part of the question of the philosophy of computer language design.

In fact, it is an awesome responsibility to write a de novo computer language, especially one that has no precedents. I am sure you are aware that there are many people who have designed their own artificial languages (such as Esperanto), but these almost always are based directly or indirectly on existing languages. Imagine that you were designing your own language (either for human-to-human communication, or as a set of instructions for computers.) Where would you begin with such an effort? Can you break out of Chomsky’s rules for human languages?

It is a humbling task. Any of us could quickly come up with a long list of small improvements to English (e.g., more consistent spelling, simplified grammar, etc.) But if we are to design a language completely from scratch, we are stymied in our search for where to begin.
J. K. Gayle

September 15, 2011 9:28 am

a long list of small improvements to English

I love how Larry Wall, the one who created Perl and allows an entire culture of people (a cult?) to continue to re-create the computer language, talks about English in relation to Perl:

Moving right along, I believe that learnability is a laudable goal, but frequently misplaced. The purpose of a language is not to help you learn the language, but to help you learn other things by using the language. We don’t water down English to make it easy to learn. We prefer English to remain a rich language, quirky, sloppy, and full of redundancy. Same for Perl.

A corollary to that is, while we don’t water down the language itself, we do allow people to speak subsets of the language. We don’t expect a five-year-old to speak with the same diction as a fifty-year-old. We don’t expect a native German speaker to use the same subset of English as a native Mandarin speaker. Similarly, we don’t look down on people for using subsets of Perl. There are certainly enough of them. You can write Perl programs that resemble sed, or awk, or C, or Lisp, or Python. This is Officially Okay in Perl culture. By way of contrast, try writing in the C subset of C++ and they’ll make a laughingstock of you.

I also believe that, while languages can have efficiencies and deficiencies, the languages themselves are essentially amoral. Language is not the level at which we should enforce ‘good thoughts’, if we want our language to be maximally useful. You can’t enforce morality by syntax. In English it is just as easy to say ‘bless you’ as it is to say ‘fuck you’. You may argue that in Perl it’s easier to use the verb “bless” because it’s built-in, but in actual fact, Perl lets you define ‘fuck’ any way you choose. You can also ‘goto hell’ if you like, which will of course work better if you’ve defined the label ‘hell’.

But seriously, many computer scientists have fallen into the trap of trying to define languages like George Orwell’s Newspeak, in which it is impossible to think bad thoughts. What they end up doing is killing the creativity of programming.

And since you ask, “Can you break out of Chomsky’s rules for human languages?,” I must reply that Chomsky’s rules are Chomsky’s rules. At least and most clearly in his earliest iterations of his theory of Language, Chomsky was platonic. And was aristotelian too. The ideal first, then the syllogism followed: Competence counted, in his binary, over and above Performance. The rules for human languages really merely counted as the rules for Language, for Competence with any given human language.

Seems to me that Wall resists such a narrow ideal of what Language must be and what languages really actually are. Wall, like Kenneth Pike, finds Chomskyian formalism fairly reductive. He’s after letting Perl conform more to and to evolve more into the richness, the quirkinesses, the sloppiness, and awfully redundant redundancies, of English.

Here’s another Wallian statement to illustrate this view:

In computer science, we do not value sloppiness. We do not value unpredictability. And we certainly don’t value redundancy. At least, not till we get on an airplane…

Yet these are the very attributes that allow evolution. The quirkiness of evolution allows a proto-insect to take a pair of heat control surfaces and turn them into wings. The sloppiness of evolution allows small mammals to exist for eons despite the presence of the obviously superior dinosaurs. The massive redundancy in our genetic code allows two copies of a gene to diverge and perform different tasks.

That works in Perl culture too. As you know, the slogan for Perl culture is, ‘There’s more than one way to do it.’

You no doubt know much better than I how that compares to various other computer languages and the cultures around their creators. Wall has said, “Obviously, both Perl and Linux owe a lot to Unix culture.” I would guess that Perl owes little to BASIC or to .NET or to Chomsky.
Theophrastus

September 15, 2011 10:17 am

Those are interesting points, but they are somewhat orthogonal to the concerns you and I are talking about.

First, note that the quotes about Wall give a great deal of insight into Wall’s personality.

Second, I do not know the full context of the quotes, but Perl has come into disfavor over the years for being an “everything plus the kitchen sink” language with many internal design inconsistencies. It is often contrasted, for example, with Python, which has a much cleaner and more logical design. Wall appears to have that criticism in the back of his mind as he defends the design of Perl.

You are correct in implying that few programming languages are designed completely from scratch — most build on predecessors. However, there are a few that are so startlingly original that they represent completely new ways of thinking about programming.

However, all of those programming languages are intended for people to use. Here is an analogy — a well made hammer is certainly a tool, and a tool primarily used for attaching other tools (e.g., nails) to structures (e.g., wood). But it is designed for people, with a balanced handle and grip designed to allow humans to take maximum advantage of it. If we were designing a hammer for a robot, we would probably not achieve the same design.

Regarding the question of originality in language design, we have seen natural languages that, in their linguistic diversity, also offer new insights into structuring communication. But what I have not seen is an artificial human-to-human language that even begins to approach the originality of some computer languages; or alternatively, the natural diversity of human languages. It seems somehow that a failure of imagination grips those who would design artificial languages.

BLT — Bible * Literature * Translation
