Hypothetical: Making a word game and needing to check word..

General discussion about LÖVE, Lua, game development, puns, and unicorns.
User avatar
Jasoco
Inner party member
Posts: 3725
Joined: Mon Jun 22, 2009 9:35 am
Location: Pennsylvania, USA
Contact:

Hypothetical: Making a word game and needing to check word..

Post by Jasoco »

Hypothetical question. Say you are making a word game. But you need to check whether the word is real or made up. Obviously you need a database of every single allowed word.

How exactly do you go about it?

Is there a database of words you can download? Is there a site you could connect to and check against? If you had a text file with every single word, how exactly would you go about it quickly? How many words are there even in the English language?

I mean, you could place every word in a table. But then checking them all would be extremely time consuming. You could place the word itself as the table key, then check if word["Hello"] ~= nil. But then again, either method ends up with a huge table. What about having the words in a text file then somehow parsing the file for the word.
The Second Edition of the 20-volume Oxford English Dictionary contains full entries for 171,476 words in current use, and 47,156 obsolete words. To this may be added around 9,500 derivative words included as subentries.
That's about 228,000 words.

I mean I would love to make a word game at some point, but checking legality of the word seems to be the main problem. Conundrum. Puzzle. Situation. And other various thesaurus words.
User avatar
ivan
Party member
Posts: 1911
Joined: Fri Mar 07, 2008 1:39 pm
Contact:

Re: Hypothetical: Making a word game and needing to check wo

Post by ivan »

Jasoco wrote:I mean, you could place every word in a table. But then checking them all would be extremely time consuming.
It would probably work if the table words are KEYS to the table.
Scanning a database from file would probably be too slow unless the database is indexed (you have an index table which stores important positions in the database file).
Jasoco wrote:That's about 228,000 words.
200 K words is probably overkill for a word puzzle game.
It would probably take while to load such a huge database.
I think 10-20 K would probably be enough.
User avatar
Robin
The Omniscient
Posts: 6506
Joined: Fri Feb 20, 2009 4:29 pm
Location: The Netherlands
Contact:

Re: Hypothetical: Making a word game and needing to check wo

Post by Robin »

Reminds me of http://prog21.dadgum.com/29.html

Anyway, it might be useful to just store the most commonly used 1K-10K words or something. I'm sure you can find a database of those. Tough luck for the people trying to use "dirigible" or something.
Help us help you: attach a .love.
User avatar
bartbes
Sex machine
Posts: 4946
Joined: Fri Aug 29, 2008 10:35 am
Location: The Netherlands
Contact:

Re: Hypothetical: Making a word game and needing to check wo

Post by bartbes »

Reminds me of OMG! Words!, a game TommyBrunn/Nevon made for a blog (OMG! Ubuntu!). He actually used the last article published on their site as a source for words.
User avatar
miko
Party member
Posts: 410
Joined: Fri Nov 26, 2010 2:25 pm
Location: PL

Re: Hypothetical: Making a word game and needing to check wo

Post by miko »

Jasoco wrote:Hypothetical question. Say you are making a word game. But you need to check whether the word is real or made up. Obviously you need a database of every single allowed word.

How exactly do you go about it?

Is there a database of words you can download?
Check out cracklib: http://cracklib.svn.sourceforge.net/vie ... lib/trunk/ for example: cracklib/dicts/cracklib-small
Jasoco wrote:Is there a site you could connect to and check against? If you had a text file with every single word, how exactly would you go about it quickly?
I would embed sqlite3 and use indexed table. That is possible even in love2d, but you will need sqlite3 binary and lua binding for every platform. I wish love2d had sqlite3 built-in.
Jasoco wrote: I mean, you could place every word in a table. But then checking them all would be extremely time consuming. You could place the word itself as the table key, then check if word["Hello"] ~= nil. But then again, either method ends up with a huge table. What about having the words in a text file then somehow parsing the file for the word.
If I was to do it in plain lua, I would made a table indexed by first letter of a word. So:
for words: "home hello house" the table would be:

Code: Select all

Words={h={e={l={l={o=true},
                       },
                    },
                o={m={e=true},
                     u={s={e=true}}
                   }
               }
           }
or something like that. Check out the attachment for working example.
Attachments
words.lua
(1.09 KiB) Downloaded 163 times
My lovely code lives at GitHub: http://github.com/miko/Love2d-samples
User avatar
Robin
The Omniscient
Posts: 6506
Joined: Fri Feb 20, 2009 4:29 pm
Location: The Netherlands
Contact:

Re: Hypothetical: Making a word game and needing to check wo

Post by Robin »

miko wrote:If I was to do it in plain lua, I would made a table indexed by first letter of a word. So:
for words: "home hello house" the table would be:
I don't think tries have good performance compared to hash tables in this case. If there are a lot of words that have a common prefix, it could save some memory (but there is also a lot of overhead for all those strings and tables) but lookup time would be worse (O(log n) instead of O(1), I believe).
Help us help you: attach a .love.
User avatar
kikito
Inner party member
Posts: 3153
Joined: Sat Oct 03, 2009 5:22 pm
Location: Madrid, Spain
Contact:

Re: Hypothetical: Making a word game and needing to check wo

Post by kikito »

Robin wrote:(but there is also a lot of overhead for all those strings and tables)
For the tables, yes. For the strings, not so much. A big table with all the possible words would probably use up more memory IMHO.
Robin wrote:but lookup time would be worse (O(log n) instead of O(1), I believe).
I don't think so. For a sufficiently large number of different words, hash collisions will appear. When two words have the same hash, Lua has a to do some extra work - it basically relies on linked lists. In a way, that's also O(log n). But O(log n) in C, not in Lua.

A particularity of the string hashing in lua is that only the first 31 characters are used to calculate the hash. So if you were hashing, say, lots of those long "chained" words used in German, beginning with the same 31 letters, you would have worst performance. The Lua trie would probably be worse.

John Resig has a very interesting post about tries in javascript.

Edit: Actually, John proposed another solution not even considered here: using a big string like this:

Code: Select all

local words = " hell hello helipad hellen "
And simply find "space your-word space" on it:

Code: Select all

words:find(" hell ") -- true (returns a number)
words:find(" world ") -- nil
Depending on your needs, this might also work.
When I write def I mean function.
User avatar
nevon
Commander of the Circuloids
Posts: 938
Joined: Thu Feb 14, 2008 8:25 pm
Location: Stockholm, Sweden
Contact:

Re: Hypothetical: Making a word game and needing to check wo

Post by nevon »

I don't know if you're actually wondering where to get such a list, or if you're only interested in how to check against it. But if you are in need of such a list, look no further than /usr/share/dict/<language>/ on any Linux system (quite possibly in OSX as well). There are a bunch of other databases available from here.
User avatar
Jasoco
Inner party member
Posts: 3725
Joined: Mon Jun 22, 2009 9:35 am
Location: Pennsylvania, USA
Contact:

Re: Hypothetical: Making a word game and needing to check wo

Post by Jasoco »

Well how would a computer version of Scrabble do it? In the pre-internet days that is. It would have a database of all allowed words. SQL would be nice in Löve actually.

I'm wondering everything about this whole idea. Ideally there'd be a function you'd call checkWord(word) that would return true or false depending on if it's real or not. How it does the checking is the question. How it does it fast with hardly any notice that it's checking (i.e. no huge delay) is the real goal.

What if you had 26 tables, one for every letter, and put all words beginning with that letter in it. Then you'd cut down on how big the table is. But if you use the word as the key, I guess it's not as important.

I never really did any tests to see how long it takes to load a text file with 200,000 lines in it into a table in Lua using the lines function. I guess you could make it only load it once and output a cache file with the table already properly formatted. Would it load a .lua file with a table of 200,000 entries faster than loading a txt file with 200,000 lines?

Obviously if you are downloading a database, you can exclude words like single letter words or if your game doesn't allow nouns, exclude all the nouns from the cache. Make it smaller and only consisting of words that are allowed.

Basically you just need a Word Service that takes a word you pass to it and returns whether or not it's real, as well as maybe some other information if it's needed like the definition if it's required, whether it's a noun, verb, adjective, etc...

I already looked at that link, nevon. A lot of them are out of date and don't work, and some are just confusing. I just want a single file with every word in it. But then what if a word is added to the dictionary? Can the file be redownloaded and replaced? Can Löve do the downloading and processing or would the file be shipped with the app. These are all curiosities.
Post Reply

Who is online

Users browsing this forum: Google [Bot] and 2 guests