I put together this piece of software to help me learn Japanese vocabulary. The problem with most of the programmes I found was that they come with their own vocabulary, making it hard to adapt the software for use in the context of some pre-existing curriculum. More specifically: I was attending a taught language course. Every lecture in that course had a bit of vocabulary attached to it that students were expected to learn. Language learning software was therefore useless to me unless it allowed me to put in that vocabulary. So I decided to write a programme that would make this kind of thing as easy as possible.

Another bit of functionality I wanted was for the programme to handle character sets other than those of the user's native input locale. In my case, I wanted to input Kanji. Using a standard input method for Japanese would trivialize the task too much. Looking up the symbols in a standard character palette, on the other hand, would be far too tedious.

how this works

The following screenshot shows you the programme in action. To see how it works, think of it as a card game. You would have a deck of cards, each representing one word that you have to learn. In the screenshot you can see the status line shows "0/514". This means there are presently 514 cards in that deck.

The programme now picks a card from it at random. One side of the card represents a stimulus, the other a response. In the screenshot, you can see how it displays a Japanese word in two alternative spellings, the stimulus.

If your response matches the response on the other side of the card, in our case the English translation, then the computer removes that card from the deck. The status now goes to "1/514", representing the fact that the deck of cards still to be learned has 513 cards in it, and the deck of cards you have successfully learned has 1 card in it.

If the response is incorrect, then the correct answer is displayed, and the card is placed back into the deck. In addition, a copy of the card is made and also placed into the deck. The status would now go to "0/515", indicating that the deck of cards still to be learned now has 515 cards. This has two effects. First, each time you make a mistake, you will have to get the card right one additional time. It's not enough to get it right once to make up for getting it wrong a hundred times: you would have to get it right a hundred and one times. Second, the next time a card is picked at random, it is now about twice as probable that this card or its copy comes up. Thus, the programme will tend to keep bugging you about words that you get wrong frequently.
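
The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration of the card-deck idea, not SimpleVocab's actual code; the class name Deck and its methods are made up for this example.

```python
import random


class Deck:
    """Toy model of the card deck: wrong answers add a copy of the card."""

    def __init__(self, cards):
        self.to_learn = list(cards)  # cards still to be learned (copies allowed)
        self.learned = 0             # number of cards answered correctly

    def pick(self, rng=random):
        # Uniform choice over the list, so a card with a copy is twice as likely.
        return rng.choice(self.to_learn)

    def answer(self, card, correct):
        if correct:
            self.to_learn.remove(card)  # removes one copy only
            self.learned += 1
        else:
            self.to_learn.append(card)  # the card stays and a copy joins it

    def status(self):
        return f"{self.learned}/{self.learned + len(self.to_learn)}"


deck = Deck(f"card{i}" for i in range(514))
print(deck.status())                 # 0/514
deck.answer("card0", correct=False)
print(deck.status())                 # 0/515
deck.answer("card0", correct=True)
print(deck.status())                 # 1/515
```

After one mistake and one correct answer, a copy of "card0" is still in the deck, so it has to be answered correctly once more before it is gone.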

In order to enter, for example, a Japanese word using Kanji, you can use the character palette at the bottom of the window. Double-clicking a character will add it to the input box. It is guaranteed that this palette contains all the characters you need for the correct response. But it also contains some other characters, specifically characters selected at random from other cards in the deck, to make the selection a bit more difficult. This way, hopefully, a Japanese language learner would learn to recognize and distinguish between different Kanji in their particular vocabulary.
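
As a rough sketch of how such a palette could be assembled (in Python, for illustration only): take every character of the correct response and add a few characters drawn at random from other cards. The function name build_palette and the number of extra characters are assumptions; I am not describing the exact selection rule here.

```python
import random


def build_palette(answer, other_answers, extra=5, rng=random):
    """Return a shuffled list holding every character of `answer`
    plus up to `extra` distractor characters taken from other cards."""
    needed = set(answer) - {" "}
    # Candidate distractors: characters from other cards that are not needed.
    pool = sorted({ch for word in other_answers for ch in word} - needed - {" "})
    distractors = rng.sample(pool, min(extra, len(pool)))
    palette = sorted(needed) + distractors
    rng.shuffle(palette)
    return palette


palette = build_palette("携帯電話", ["日本語", "名前", "仕事", "電気"], extra=4)
print("".join(palette))  # eight characters in random order
```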

how to download, install, and use the programme

1. SimpleVocab is free software. In a nutshell: You're free to use this software and redistribute modifications, as long as you acknowledge our contributions and don't go around suing us for anything. But note that the licensing terms, in their legal smallprint, supersede and prevail over this paragraph.
2. Download the file SimpleVocab-0.99.1.zip and unzip to some appropriate location.
3. In that archive you will find a file named "SimpleVocab.jar". Run it with "java -jar SimpleVocab.jar". Make sure that the directory you extracted the zip file into is the current working directory when you run this.
  If these instructions make no sense to you, find someone who can help you with the installation, but don't bother us.
  If you're having trouble even though you're confident you've followed the above instructions correctly, please do contact us.
  You will have guessed this: Step 3, of course, requires that you have a Java 2 SE Runtime Environment installed. We've tested this with JRE 1.4.2 on MacOS X and with JRE 1.6.0 on Windows Vista. You can get an appropriate runtime environment for your platform here.

structure of the vocabulary database

This programme is really just the visual frontend to a vocabulary database that you have to create for yourself. The programme comes with some vocabulary so you have examples to look at. But probably you will want to input your own vocabulary. Here is how it works.

When SimpleVocab starts up, it looks at the data subdirectory of your SimpleVocab installation directory and looks for ".tsd" files. Each such file represents a test session definition. When you open the "New Session" menu in SimpleVocab, you can select a tsd file to load.

In that directory, you will also find ".voc" files. Such a file represents a vocabulary. Let's have a look at the file test.voc:

わたし; 私 : I
[お]なまえ; [お]名前 : name (honorary)
[お]しごと; [お]仕事 : occupation (honorary)
けいたい, けいたいでんわ; 携帯電話 : mobile phone, cell phone
でんき : electricity; electric light; electrical machinery, appliance
電気 : electricity; electric light
電機 : electrical machinery, appliance
にほんご; 日本語 : japanese [language]
[お]てあらい; [お]手洗い : toilet

This file obeys a particular format that you have to stick to when defining your own vocabulary. The main separator is the colon ":". This file might be used by an English speaker in order to build a passive vocabulary in Japanese. Each line represents one vocabulary item. The stimulus is on the left hand side of the colon, the response on the right hand side. In other words: Japanese words go on the left hand side, English words go on the right. In the above example, the programme might display でんき. The response "electricity" would be considered correct.

The next-level separator is the semicolon ";". Each item separated by a semicolon represents a synonym set in the respective language, and each synonym set is a list of words separated by commas ",". The distinction between the semicolon and the comma is a tricky concept. If you don't want to fuss with this, use commas all the time. But if you are a more sophisticated language learner, you will appreciate the distinction, so let me describe this in some more detail.
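
To make the three separator levels concrete, here is a small Python sketch that splits one line first at the colon, then at semicolons, then at commas. It is not the parser SimpleVocab uses; the function name parse_line is made up, and the sketch assumes that words themselves never contain a colon.

```python
def parse_line(line):
    """Split one vocabulary line into (stimulus, response).

    Each side becomes a list of synonym sets, and each synonym set
    is a list of words.
    """
    left, sep, right = line.partition(":")
    if not sep:
        raise ValueError(f"no colon in line: {line!r}")

    def side(text):
        # Semicolons separate synonym sets, commas separate words within a set.
        return [[w.strip() for w in group.split(",") if w.strip()]
                for group in text.split(";") if group.strip()]

    return side(left), side(right)


stimulus, response = parse_line(
    "けいたい, けいたいでんわ; 携帯電話 : mobile phone, cell phone")
print(stimulus)  # [['けいたい', 'けいたいでんわ'], ['携帯電話']]
print(response)  # [['mobile phone', 'cell phone']]
```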

word senses: semicolon and comma separated items

Linguistically, the problem is that there is no one-to-one correspondence between words in one language and words in another language. Say we start with a word "x" in English. Depending on the linguistic context in which "x" is used in an English sentence, the English word "x" denotes either the concept x1, x2, or x3. We write this as

x1; x2; x3 : x

The interpretation of this in the SimpleVocab engine is that the user is expected to recognize the English word "x" given either the concept x1, the concept x2, or the concept x3, independently. So each represents a different stimulus that expects the response "x". So the above vocabulary file is the same as the following.

x1 : x
x2 : x
x3 : x

We now translate each such concept into Japanese. But depending on the linguistic context in the Japanese sentence, we would translate concept x1, either as x11, or as x12, concept x2 either as x21 or as x22, and concept x3 always as x30. We write this as

x11, x12; x21, x22; x30 : x

Again, this is the same as writing the following.

x11, x12 : x
x21, x22 : x
x30 : x

Now, the interpretation of the comma in SimpleVocab is that the user always gets to see x11, x12 together as one stimulus. This is because the user might require them all to determine which English word is meant when a Japanese word is sense-ambiguous between different English translations.
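
The equivalence above amounts to splitting the left hand side at semicolons while leaving the commas alone. A short Python sketch of that expansion, with the hypothetical function name expand_stimuli:

```python
def expand_stimuli(line):
    """Turn 'a, b; c : x' into one (stimulus, response) card per
    semicolon-separated item on the left hand side."""
    left, _, right = line.partition(":")
    return [(item.strip(), right.strip())
            for item in left.split(";") if item.strip()]


cards = expand_stimuli("x11, x12; x21, x22; x30 : x")
print(cards)  # [('x11, x12', 'x'), ('x21, x22', 'x'), ('x30', 'x')]
```

The comma-separated words stay together inside each stimulus, which matches the rule that they are always displayed as one unit.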

What do the comma and the semicolon do on the right hand side, i.e. in the response? Linguistically it's the same story. Given the Japanese stimulus "y", this stimulus can translate, depending on the context in the Japanese sentence, into a number of different concepts y1, y2, and y3, which in turn map to a number of different English words, depending on the linguistic context of the English sentence. So we could have the following.

y : y11, y12; y21, y22; y30

But how does SimpleVocab deal with this linguistic issue when we have this kind of ambiguity in the response rather than the stimulus? A user is expected to know all of the different meanings of a given Japanese stimulus, i.e. all of y1, y2, and y3. But since the user cannot be expected to guess any particular linguistic context for the English sentence being translated into, it is enough if the user responds with either "y11" or "y12" in order to get the concept y1 right.
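
One way to read this rule: a response is complete when it contains at least one word from every semicolon-separated synonym set. The Python sketch below checks exactly that. It is an interpretation, not the programme's actual matching code; in particular, passing the user's answers in as a list is an assumption made for the example, and the name covers_all_senses is made up.

```python
def covers_all_senses(response_spec, answers):
    """Return True if `answers` contains at least one word from each
    semicolon-separated synonym set in `response_spec`."""
    given = {a.strip() for a in answers}
    senses = [{w.strip() for w in group.split(",")}
              for group in response_spec.split(";")]
    # Every sense must share at least one word with the given answers.
    return all(sense & given for sense in senses)


spec = "y11, y12; y21, y22; y30"
print(covers_all_senses(spec, ["y12", "y21", "y30"]))  # True
print(covers_all_senses(spec, ["y11", "y12"]))         # False
```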

There is some additional syntactic sugar to this file format. Material in square brackets is optional. So if we have "[to] swim" in English, it means the user can respond with either "to swim" or with "swim". Thus, in the response, "[to] swim" is equivalent to writing "swim, to swim". In the stimulus, the square brackets are simply displayed. Round brackets indicate comments, which are displayed when a word appears in the stimulus.
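
The square-bracket rule for responses can be sketched in Python as follows: every bracketed part is either kept or dropped, which yields the set of accepted spellings. The function name expand_optional is made up for this illustration, and the whitespace clean-up is my own choice.

```python
import re


def expand_optional(word):
    """Return every spelling obtained by keeping or dropping each
    [optional] part of `word`."""
    match = re.search(r"\[([^\]]*)\]", word)
    if match is None:
        return {" ".join(word.split())}  # tidy up leftover spaces
    before, after = word[:match.start()], word[match.end():]
    kept = expand_optional(before + match.group(1) + after)
    dropped = expand_optional(before + after)
    return kept | dropped


print(sorted(expand_optional("[to] swim")))            # ['swim', 'to swim']
print(sorted(expand_optional("japanese [language]")))  # ['japanese', 'japanese language']
```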

session definitions

When you've entered your own vocabulary into test.voc, have a look at test.tsd.

vocab = test.voc
leftdown = no
rightdown = no
addinvert = yes  

This file defines a session based on the vocabulary in test.voc. If, instead of a ".voc" file, we specified a directory here, then all the vocabulary files in that directory would be taken together into one session.
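
For illustration, a session definition like the one above could be read with a few lines of Python. This is a sketch under assumptions, not the programme's loader: the function names are made up, yes/no are mapped to booleans for convenience, and the vocab entry is assumed to be relative to the data directory.

```python
from pathlib import Path


def parse_session(text):
    """Parse 'key = value' lines into a dict; yes/no become booleans."""
    options = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        key, _, value = line.partition("=")
        value = value.strip()
        options[key.strip()] = {"yes": True, "no": False}.get(value, value)
    return options


def vocab_files(vocab, data_dir):
    """Return the vocabulary files a session refers to: all .voc files
    if `vocab` names a directory, otherwise just the one file."""
    path = Path(data_dir) / vocab
    if path.is_dir():
        return sorted(path.glob("*.voc"))
    return [path]


session = parse_session(
    "vocab = test.voc\nleftdown = no\nrightdown = no\naddinvert = yes  \n")
print(session)
# {'vocab': 'test.voc', 'leftdown': False, 'rightdown': False, 'addinvert': True}
```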

The addinvert option chooses whether you want to load vocab items together with their inverse. If this option is set to yes, then writing

x : y

is equivalent to writing

x : y
y : x

This is useful if you have a vocab file like the test.voc displayed above, but now want to use it to train Japanese both as an active and as a passive vocabulary, i.e. the programme would now give you Japanese words, asking you for English translations, and it would also give you English words, asking you for Japanese translations.
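
In terms of (stimulus, response) pairs, the effect of addinvert can be sketched in Python like this. The function name add_inverse is made up for the example.

```python
def add_inverse(cards):
    """Given (stimulus, response) pairs, return the pairs followed by
    their inverses, i.e. 'x : y' also yields 'y : x'."""
    return list(cards) + [(response, stimulus) for stimulus, response in cards]


print(add_inverse([("にほんご; 日本語", "japanese [language]")]))
# [('にほんご; 日本語', 'japanese [language]'), ('japanese [language]', 'にほんご; 日本語')]
```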

The options leftdown and rightdown choose whether to "downgrade" all semicolons in the vocabulary definition on the left or right hand side of the colon, respectively, to commas. This is useful to make an "easy" and a "hard" mode for your vocabulary learning sessions. If you'd like to take it easy first, you can downgrade the items, so that you get maximally informative stimuli and the programme accepts minimally informative responses, i.e. you would get all word senses as a stimulus, but only have to enter a single response.
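
Downgrading is a plain textual substitution on one side of the colon. A minimal Python sketch, with the hypothetical function name downgrade and keyword arguments standing in for the leftdown and rightdown options:

```python
def downgrade(line, left=False, right=False):
    """Replace semicolons by commas on the chosen side(s) of the colon."""
    stimulus, _, response = line.partition(":")
    if left:
        stimulus = stimulus.replace(";", ",")
    if right:
        response = response.replace(";", ",")
    return f"{stimulus.strip()} : {response.strip()}"


line = "でんき : electricity; electric light; electrical machinery, appliance"
print(downgrade(line, right=True))
# でんき : electricity, electric light, electrical machinery, appliance
```

With the right hand side downgraded, the three senses collapse into one synonym set, so any single one of the four English entries is enough as a response.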

further configuration

Also have a look at the file simplevocab.conf.

input.encoding = UTF-8
gui.font = Kozuka Mincho Std R-PLAIN-32

Here you can switch the input encoding for your vocabulary files, but please note that this has only ever been tested with UTF-8, and I see no good reason for using anything else, if you have a proper text editor.

You can also switch the font used for displaying text. This is useful for working with foreign character sets. For example, I found that the default fonts don't have very nice looking kana and kanji.
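
The gui.font value above has the shape "name-STYLE-size", which is also the form that Java's Font.decode accepts. Assuming that shape (the sketch below is Python and only illustrates how the three parts separate; the function name split_font_spec is made up), the value breaks down like this:

```python
def split_font_spec(spec):
    """Split 'name-STYLE-size' from the right, so that hyphens inside
    the font name itself are left alone."""
    name, style, size = spec.rsplit("-", 2)
    return name, style, int(size)


print(split_font_spec("Kozuka Mincho Std R-PLAIN-32"))
# ('Kozuka Mincho Std R', 'PLAIN', 32)
```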

(c) Copyright 2007 -- 2009