A Garbled Story

Have you ever thought about making up your own language?

What would it sound like? How would you form words?

When I decided to give it a try, my first thought went to large language models - the backbone of software like ChatGPT and other text-based "AI" assistants. This led to the creation of two scripts: one that would turn a source text (or corpus) into a dataset, and another that could use the dataset to create words that seem to follow the rules of the source material's language.
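
To make the two-script idea concrete, here is a rough sketch of how such a pair might work. It assumes a character-level Markov chain, which is one common way to do this and not necessarily what the original scripts did. The names buildDataset and generateWord, the "^" and "$" boundary markers, and the default order of 2 are all made up for illustration.

```javascript
// Hypothetical sketch: a character-level Markov chain.
// This is an assumed technique, not the original scripts.

// Script one: turn a corpus into a dataset that records which
// character follows each run of `order` characters (order >= 1).
function buildDataset(corpus, order = 2) {
  const dataset = new Map();
  const words = corpus.toLowerCase().match(/[a-z]+/g) || [];
  for (const word of words) {
    // Pad with "^" (start) and "$" (end) so word boundaries are learned too.
    const padded = "^".repeat(order) + word + "$";
    for (let i = 0; i + order < padded.length; i++) {
      const context = padded.slice(i, i + order);
      const next = padded[i + order];
      if (!dataset.has(context)) dataset.set(context, []);
      dataset.get(context).push(next);
    }
  }
  return dataset;
}

// Script two: walk the dataset one character at a time to build a word.
// `random` can be swapped out to make the output repeatable.
function generateWord(dataset, order = 2, random = Math.random, maxLength = 12) {
  let context = "^".repeat(order);
  let word = "";
  while (word.length < maxLength) {
    const options = dataset.get(context);
    if (!options) break; // context never seen in the corpus
    const next = options[Math.floor(random() * options.length)];
    if (next === "$") break; // the model chose to end the word
    word += next;
    context = (context + next).slice(-order);
  }
  return word;
}
```

Calling generateWord(buildDataset(someText)) a few times should produce words that share letter patterns with someText. A larger corpus and a higher order tend to give results that look more like the source language, at the cost of reproducing real words more often.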

With a method to create plausible words down, I began to look for a way to create a grammar. This led me to visit the library, where I picked up some books on linguistics. While reading them, I learned that linguists had already "cracked the code" of English. There's no need for LLMs or fancy scripts - words in English are, in theory, created by a simple set of rules.

Realizing this, I recreated the word generating script using JavaScript, these rules, and some HTML. Since it's less than 10 KB, I was able to share it online where everyone can play with it for free.
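
As a rough illustration of what a rule-based generator can look like, the sketch below builds each syllable from an optional onset, a vowel nucleus, and an optional coda. That three-part syllable shape is standard in linguistics, but the specific letter lists and the names ONSETS, NUCLEI, CODAS, makeSyllable, and makeWord are assumptions made up for this example. They are not the rules or code used in the script described above.

```javascript
// Hypothetical rule set: these onset / nucleus / coda spellings are
// illustrative only and far smaller than a real description of English.
const ONSETS = ["", "b", "d", "f", "k", "l", "m", "n", "p", "r", "s", "t", "bl", "br", "st", "tr"];
const NUCLEI = ["a", "e", "i", "o", "u", "ee", "oo", "ai"];
const CODAS = ["", "d", "k", "l", "m", "n", "p", "t", "nd", "st"];

// Pick one entry from a list using the supplied random source.
function pick(list, random) {
  return list[Math.floor(random() * list.length)];
}

// One syllable = onset + nucleus + coda (onset and coda may be empty).
function makeSyllable(random = Math.random) {
  return pick(ONSETS, random) + pick(NUCLEI, random) + pick(CODAS, random);
}

// A word is a run of syllables joined together.
function makeWord(syllableCount = 2, random = Math.random) {
  let word = "";
  for (let i = 0; i < syllableCount; i++) {
    word += makeSyllable(random);
  }
  return word;
}
```

Because every piece comes from lists of common English spellings, a generator like this often lands on real words by chance. No corpus or dataset is needed, which is part of why this kind of script can stay very small.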

Amusingly, this new script is much more likely to generate actual words than the ones I'd made using large language models. Whether or not this is a good thing is a matter of opinion.

As for creating my own fictional language, that goal ended up getting lost in the shuffle. But if I ever want to try it again, I won't be starting from scratch!



A WFTID Website