Thursday 16 October 2008

I fear I to be unable such a thing do, Dave.

Learning a language is like digging a moat for your sandcastle, as the tide inexorably rises. Or maybe like gardening. It isn't enough to say "There, I've done that bit - now I can move on" - you must constantly revisit and renew your earlier endeavours, or they will be washed away, overgrown, lost like tears in the rain...

I have a pretty good facility for languages, I think. I don't know why - a memory for detail and vocabulary, decent ability to pick up accents, or simply enough interest to make it stick - but whatever the reason, it's something I struggle with less than most. Some years ago, during a very brief and somewhat abortive relationship with a lovely South African girl, I couldn't help trying to pick up a bit of Afrikaans as a courtesy.
The accent wasn't difficult - light on the tip of the tongue, heavy on the pharynx - and the grammar was the simplest I'd ever encountered (except perhaps Chinese), so it was good fun to throw new phrases I'd learned into conversation, and have the occasional slow, stuttering conversation in her native tongue.

As you can imagine, the opportunities to reprise my conversational Afrikaans have been somewhat scarce since then. I didn't realise just how much of it I'd lost until someone offered to make me a cuppa tea. "Please", I wanted to respond, and perversely chose to do it in Afrikaans. Only... I couldn't remember the word!!
I mean, please, for goodness' sake! It's got to be one of the first ten words or phrases you learn in any language, and I was stumped. From having been able to understand and construct simple sentences, I suddenly had next-to-no vocabulary, just six years later.

The phrase I wanted (I remembered after a few moments) was "Asseblief" - roughly "if you please". And yet I had no problem recalling the phrase for "I only speak a little - it's a pretty language, but I never use it". Obviously this phrase was one for which I'd had more use...

Human memory, of course, works nothing like a database. There are no convenient boxes in which to store information. There is no empty Tweetalige Woordeboek (bilingual dictionary) waiting for you to indelibly inscribe it with every acquired translation.
Memory serves its purpose by retaining and reinforcing that which is used frequently, and slowly losing grip on that which is fleeting or trivial. The passage of memory from short-term, through its various stages, to long-term memory and (in the case of a skill like languages) into active process has been thoroughly researched by neuroscientists, linguists and tinkering hobbyist educational reformers for decades, and it all comes down to the three 'R's of learning:
  • Repetition
  • Redundancy
  • Repetition
(The above stolen from a Jhonen Vasquez comic about the spirit-crushing drudgery of state schooling, but I like it anyway.)

So it's about what you use, and how often you use it. You can even unlearn your native tongue through atrophy. I know of a man who moved from England to Germany in his early thirties. Now at 65, he is still in touch with his friends in England - but he finds he can only communicate, haltingly, over the phone. If he tries to write or email, he struggles with the English language. In a Firefox-esque feat, he now thinks in German, quite naturally, and struggles to do so in English.

I wonder: will I ever be that good at Japanese? If I work hard, and move over there someday then... well, why not?

The process of professional translation intrigues me; I find myself wondering, how does it work in their heads? Do they listen in one language, and then express it quite naturally in the other without any intervening explicit process? Or do they listen in one language, then switch their thinking to the other - donning a different thinking-cap, as it were - before trying to express the nebulous ideas and idiosyncrasies in a natural fashion? I'm quite certain that it's possible to "think" natively in more than one language...

Even then, translation is not a simple process. Grammar notwithstanding, even syntax can become confusing when expression is rendered in culturally-significant shades of meaning.

I recall hearing of an assembly in the European Parliament being brought to a standstill as, during a speech by the French representative, several of the English-speaking delegates burst into laughter. Having made an appeal for calm and rational consideration of the issues, he exclaimed that what the problem needed was "la sagesse Normande".
The English translators, quite faithfully, relayed the speech thus:
"What we need is Norman Wisdom!"

That's not the half of it though. Humans, with their inherent understanding of the ideas behind the words, can translate faithfully rather than accurately. Computer software has no such cognitive gifts at its disposal, and the results of even the most sophisticated attempts at translation are derided throughout the blogosphere.
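For the curious, here is a toy sketch (in Python) of roughly what the crudest possible translation software does with the French phrase from the anecdote above. The three-entry lexicon and the function name are my own assumptions for illustration, not how any real translator is built.

```python
# Toy French-to-English lexicon; the entries are illustrative assumptions,
# not taken from any real translation system.
LEXICON = {
    "la": "the",
    "sagesse": "wisdom",
    "normande": "Norman",
}


def word_for_word(sentence, lexicon):
    """Translate by looking up each word on its own, keeping source order."""
    words = sentence.lower().split()
    # Unknown words are passed through in brackets rather than guessed.
    return " ".join(lexicon.get(word, f"[{word}]") for word in words)


print(word_for_word("la sagesse Normande", LEXICON))  # the wisdom Norman
```

Every word is "right", yet the result is neither natural English nor aware that the phrase might raise a laugh. Lookup alone has no idea of word order, let alone of what the words mean to the audience.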

It's the same problem: a database can give you a word-for-word equivalent, but nothing cogent or intuitive - and even with simple words, cultural ignorance can lead to confusion. A generation or two ago, there was no distinct word for "green" in common use in Japanese! あお (ao) is taken to mean blue, but it was also used for green not so long ago, and some Japanese still use it as such. This sort of cultural knowledge is invaluable when trying to make sense of, for example, Natsume Sooseki's Ten Nights of Dream. It's easy to get stuck trying to understand the significance of the lily's blue stalk...

Does this mean that elderly Japanese people can't tell the difference between blue and green? No, of course not...
And yet, there is some truth in that statement, bizarre as it may sound. Not in an extreme sense, but studies have shown the importance of language to perception. According to research undertaken at Goldsmiths College (and almost certainly many other studies since), the range of words you have for different hues affects your ability to distinguish between them. If we had 20 words for subtly different shades of orange in the English language, we would perceive them as distinct colours, and would recall them as such without difficulty.

It all smacks of Derrida and Phenomenology, doesn't it...?

This ties in nicely with another study (thank god for New Scientist) investigating the way in which our infant brains adapt to perceive distinct sounds characteristic of our mother tongue. Through repeatedly hearing - and presumably expressing - certain ranges of sound and learning to interpret them as the same sound, we lose the ability to distinguish between the subtle variations. This is quite necessary, for the sake of efficiency in communication, but can be a hindrance when learning a new language.
The classic example is the Japanese l/r sound, which is neither one nor the other. Through careful and diligent study, one can relearn the distinctions lost in infancy, but it is difficult - the mind learns to perceive certain patterns in the chaotic landscape of reality, and convincing our brains to jump tracks in their well-worn neural grooves is hard work.

So how can there be any hope for computers? Is it possible, somewhere in the hypothetical space-opera future, for software to "understand" language in the same way that humans do? Derrida or Heidegger might argue that all of perceived reality is exactly that - perception only. Given that language is the exclusive realm of signifiers and symbols, one might suppose that computers - which deal only with symbols and signifiers - would be ideally suited to the task. Can one be "trained", in the manner of a human mind, to have intrinsic understanding of a concept? Can an artificial mind be kicked out of its paths of databases and into a more functional, fluid form of expression and translation?

Perhaps the answer lies in that last question. Functional programming languages (Haskell, Lisp) operate on a basis somewhere beyond the mechanical strictures of Structured languages (Pascal, Ada) or the deliberate and measured methods of Object Oriented Programming (Java, C++). My brother (the Dysfunctor - get off your arse and fix your blog, mate) could tell you a million times more than I could about this topic, but I have some very basic understanding. All things are functions - processes, if you will - and everything is signified rather than explicit. Sound familiar?

Artificial Intelligence (the emergent kind) and a computer really learning a language are in the same chapter of philosophy - the same page, even - because language, perception and intelligence are so closely linked. They're pretty much a blurry smear of concepts, as any drunk philosophy undergrad will rant. There's no point trying to tackle one without approaching the others, but if we come at it side-long, with a very long game-plan in mind, and functional programming as the tool (or the precursor to a better one), then who knows...?

Still more curious: if we created machines with the ability to learn and communicate, but didn't teach them anything, what language would emerge from their society? What could we learn from their linguistic development?

Before they wiped us all out, I mean.

1 comment:

Anonymous said...

You remind me of a wonderful thought I thunk a little while ago about computers and natural language.

Yahoo! Translator works the old-fashioned way. A bunch of linguists & programmers try to instruct a computer how to parse a sentence in one language, and spit out an equivalent sentence in the other language.

Google Translate is the new machine learning hotness. [Note: "machine learning" is the politically correct term for "AI". AI doesn't get research funding.] You just train their machine with bilingual parallel texts, and it does the rest itself.

I was expecting Google Translate to steal Yahoo! Translate's lunch money. This is partly because I'm a Google fanboy, but mainly because I've seen just how good Bayesian algorithms can be. E.g. Bayesian spam filters spot things that would never have occurred to their human masters. Who'd have thought that "sex" is not a spam word, but "sexy" is?
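To make that concrete, here is a minimal sketch (in Python) of how a Bayesian filter scores a single word. The message counts are invented for illustration and the function name is hypothetical; a real filter combines evidence from many words and learns its counts from actual mail.

```python
# Invented toy counts: how many spam and non-spam ("ham") messages
# contain each word. These numbers are assumptions for illustration only.
SPAM_TOTAL, HAM_TOTAL = 1000, 1000
SPAM_COUNTS = {"sex": 40, "sexy": 90}
HAM_COUNTS = {"sex": 45, "sexy": 3}


def spam_probability(word, prior_spam=0.5):
    """Return P(spam | word) by Bayes' rule, with add-one smoothing."""
    # Likelihoods P(word | spam) and P(word | ham), smoothed so that an
    # unseen word never produces a zero probability.
    p_word_spam = (SPAM_COUNTS.get(word, 0) + 1) / (SPAM_TOTAL + 2)
    p_word_ham = (HAM_COUNTS.get(word, 0) + 1) / (HAM_TOTAL + 2)
    numerator = p_word_spam * prior_spam
    return numerator / (numerator + p_word_ham * (1 - prior_spam))


for word in ("sex", "sexy"):
    print(word, round(spam_probability(word), 2))
# sex 0.47
# sexy 0.96
```

With these made-up counts, "sex" turns up about as often in ordinary mail as in spam, so it tells the filter almost nothing, while "sexy" is nearly a giveaway. Nobody wrote that rule down; it falls out of the counting.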

I was disappointed. Google Translate turned out to be no better, and no worse than Yahoo! Translate. Not that I'm knocking Google Translate's achievement, but still...

Then I started to wonder ... why did different approaches yield such similar results? That's not what normally happens. Usually, one approach is clearly dominant. That's when I thunk my thought.

My hunch is that both teams hit the same hard limit. The hard limit is this: you can't understand natural language until you understand the human condition. For instance, concepts of time, distance and motion are so deeply embedded in our language that we don't even notice their presence. In those last two sentences alone, I've used words like "hard limit" (physical analogy), "until" (using time to describe a logical dependency), "deeply embedded" (physical analogy) and "presence" (physical analogy). A machine that lives in cyberspace can never really understand a language so firmly rooted in meatspace.

Which is all just a long-winded way of saying, "You can't learn a language properly without learning the culture it inhabits."

Regarding my blog: all in good time, Pyro. I have a lot of things I need to get off my arse and fix. Please could you spell my name with a small "d". :-)