Behind the Gibberish
The Gibberish Engine can use one of several methods to create strings of plausible words. With one exception, these methods use some actual logic to do their work.
Here's an explanation of each method.
English Morphology (aka the Word Form method)
Sounds in English come in two types: consonants and vowels. These are the building blocks of every English word, and surprisingly enough, there are only four ways to combine them: CCCV, CCV, CV, and VC.
While CV and VC can use any consonant or vowel sound, there are additional rules for CCCV and CCV.
To make a word with CCCV, the first consonant must be the 's' sound. Next comes a 'p', 't', 'c', or 'k' sound, and the final consonant is either an 'r' or 'l' sound. Examples of this pattern include "STRing" and "SPRing".
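The CCCV rule is small enough to sketch in a few lines. This is only an illustration of the rule as described, not the engine's actual code: the consonant slots come from the paragraph above, while the vowel list and the names makeCCCV and pick are assumptions made for the example.

```javascript
// Consonant slots for the CCCV pattern, taken from the description above.
const CCCV_FIRST = ['s'];
const CCCV_SECOND = ['p', 't', 'c', 'k'];
const CCCV_THIRD = ['r', 'l'];

// Assumption: a small illustrative vowel list, not the engine's real one.
const SAMPLE_VOWELS = ['a', 'e', 'i', 'o', 'u'];

// Pick one item from a list. The rng is injectable so results can be reproduced.
function pick(list, rng = Math.random) {
  return list[Math.floor(rng() * list.length)];
}

// Build one CCCV chunk: 's' + (p|t|c|k) + (r|l) + vowel.
function makeCCCV(rng = Math.random) {
  return pick(CCCV_FIRST, rng) + pick(CCCV_SECOND, rng) +
         pick(CCCV_THIRD, rng) + pick(SAMPLE_VOWELS, rng);
}

console.log(makeCCCV()); // e.g. "stra" or "spli"
```

Passing a fixed function as rng (for example, () => 0) makes the output predictable, which is handy when checking that the rules behave.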
CCV has more consonant options. Words with this pattern can begin with 'p', 't', 'c', 'k', 'b', 'g', 'f', 'th', 's', or 'ch'. Unlike in the CCCV pattern, the consonants that can appear in the second slot depend on the first consonant.
Once we have the base of the word figured out, it's time to look at word endings, prefixes, and suffixes.
Endings are made of one or more consonants from yet another list. They're optional, so we won't add them every time. Also, some vowels, like the 'oy' in 'toy' or 'soy', are expected to end the word, so we won't add an ending to any words that end with one of them.
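Here's a rough sketch of that ending logic. Both lists are made up for the example (only 'oy' comes from the text above), and addEnding and its chance parameter are hypothetical names rather than anything from the engine itself.

```javascript
// Assumption: illustrative lists. The engine's real lists are not shown here.
const SAMPLE_ENDINGS = ['t', 'nd', 'sk', 'm'];
const FINAL_VOWELS = ['oy']; // vowels that are expected to end a word

// Optionally append an ending. `chance` is a hypothetical probability knob.
function addEnding(base, chance = 0.5, rng = Math.random) {
  // Words that already end in a "final" vowel are left alone.
  if (FINAL_VOWELS.some((v) => base.endsWith(v))) return base;
  // Endings are optional, so skip them some of the time.
  if (rng() >= chance) return base;
  return base + SAMPLE_ENDINGS[Math.floor(rng() * SAMPLE_ENDINGS.length)];
}

console.log(addEnding('stroy')); // always "stroy", since 'oy' ends the word
console.log(addEnding('stra'));  // sometimes "stra", sometimes "strat", "strand", ...
```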
As for prefixes and suffixes, we'll just add them randomly.
Sprinkle in a few rules to clean up any unwanted letter combinations (such as 'ii'), and you have yourself a new word!
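One simple way to express clean-up rules like this is a list of pattern and replacement pairs applied in order. This is a sketch under that assumption: only the 'ii' case is mentioned above, and the choice to collapse it to a single 'i' (along with the names CLEANUP_RULES and cleanUp) is mine for illustration.

```javascript
// Assumption: each rule is a [pattern, replacement] pair.
// Only the 'ii' combination comes from the text; the fix applied is a guess.
const CLEANUP_RULES = [
  [/ii+/g, 'i'], // collapse runs of 'i' into a single 'i'
];

// Apply every rule in order and return the tidied word.
function cleanUp(word) {
  return CLEANUP_RULES.reduce(
    (w, [pattern, replacement]) => w.replace(pattern, replacement),
    word
  );
}

console.log(cleanUp('spriing')); // "spring"
```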
Frequency Analysis
Another method for creating new words is to study words that already exist, create some statistics, and let randomness try its luck. I've labelled this method the "Frequency Analysis" method, as it's based on how frequently certain letters come after a given letter.
In order to create the dataset for this method, I picked up a copy of the Open-Source English Dictionary, separated out the nouns, and then had a script I prepared track how often one letter was followed by another. These totals were turned into percentages that could then be used to generate the dataset itself.
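The counting step might look something like the sketch below. It is not the original script, just one way to do the same job; countFollowers and toPercentages are hypothetical names, and the four-word list stands in for the real dictionary.

```javascript
// Count how often each character is followed by each other character.
function countFollowers(words) {
  const counts = {};
  for (const word of words) {
    const w = word.toLowerCase();
    for (let i = 0; i < w.length - 1; i++) {
      const first = w[i];
      const next = w[i + 1];
      counts[first] = counts[first] || {};
      counts[first][next] = (counts[first][next] || 0) + 1;
    }
  }
  return counts;
}

// Turn the raw totals into whole-number percentages per first letter.
function toPercentages(counts) {
  const result = {};
  for (const [first, followers] of Object.entries(counts)) {
    const total = Object.values(followers).reduce((sum, n) => sum + n, 0);
    result[first] = {};
    for (const [next, n] of Object.entries(followers)) {
      result[first][next] = Math.round((n / total) * 100);
    }
  }
  return result;
}

// Tiny stand-in word list: 'n' is followed by 'd' three times and 't' once.
const pct = toPercentages(countFollowers(['hand', 'and', 'ant', 'band']));
console.log(pct.n); // { d: 75, t: 25 }
```

Because each percentage is rounded separately, the values for one letter may not add up to exactly 100. For picking letters at random that doesn't matter much.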
For example, if the letter 'd' follows 'n' 7% of the time, then the string for the letters coming after 'n' should contain 7 'd's. You can see the actual amounts if you view the source code of the generator's JavaScript file; it's the FreqLetters array.
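To show why repeating letters works as a weighting trick, here's a minimal sketch. The percentages are invented for the example (only the 7 for 'd' echoes the text) and are not the real FreqLetters data; buildFollowerString and nextLetter are hypothetical helper names.

```javascript
// Build a lookup string where each letter repeats once per percentage point.
function buildFollowerString(percentages) {
  return Object.entries(percentages)
    .map(([letter, pct]) => letter.repeat(pct))
    .join('');
}

// Picking a uniformly random character from that string is then
// automatically weighted by how often each letter appears in it.
function nextLetter(followerString, rng = Math.random) {
  return followerString[Math.floor(rng() * followerString.length)];
}

// Assumption: made-up numbers, shortened so the string is easy to read.
const afterN = buildFollowerString({ d: 7, t: 3 });
console.log(afterN);             // "dddddddttt"
console.log(nextLetter(afterN)); // 'd' about 70% of the time, 't' about 30%
```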
Common Pairings
This method is more haphazard than the Frequency Analysis method, but it works on the same general idea. Using the same word list, I had another script break every word into two-letter pairs and track which pairs are related. To make a new word, several of these pairs are slapped together from the lists of possible options. If the list comes to an unexpected end, so does the word being built.
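Here's one possible reading of that process as code. It rests on a guess: that two pairs are "related" when one directly follows the other in a word. The function names, the starting pair, and the maxPairs limit are all hypothetical, so treat this as a sketch rather than the script that was actually used.

```javascript
// Split a word into consecutive two-letter pairs: "planet" -> ["pl", "an", "et"].
// A leftover single letter at the end of an odd-length word is dropped.
function toPairs(word) {
  const pairs = [];
  for (let i = 0; i + 1 < word.length; i += 2) {
    pairs.push(word.slice(i, i + 2));
  }
  return pairs;
}

// Assumption: "related" means "this pair was followed by that pair in some word".
function buildPairTable(words) {
  const table = {};
  for (const word of words) {
    const pairs = toPairs(word.toLowerCase());
    for (let i = 0; i + 1 < pairs.length; i++) {
      table[pairs[i]] = table[pairs[i]] || [];
      table[pairs[i]].push(pairs[i + 1]);
    }
  }
  return table;
}

// Chain pairs together, stopping early if the current pair has no options.
function makePairWord(table, start, maxPairs, rng = Math.random) {
  let current = start;
  let word = start;
  for (let i = 1; i < maxPairs; i++) {
    const options = table[current];
    if (!options) break; // the list came to an end, so the word does too
    current = options[Math.floor(rng() * options.length)];
    word += current;
  }
  return word;
}

const table = buildPairTable(['planet', 'plants']);
console.log(makePairWord(table, 'pl', 5)); // "planet" or "plants"
```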
Alphabet Soup
The Alphabet Soup method is the exception I mentioned above. It creates new words by choosing a length for the word and then randomly selecting that many letters from the alphabet. Without any rules to fine-tune the process, this results in a big pile of 'x's, 'q's, and other uncommon letters.
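Since there are no rules involved, a sketch of this method is very short. The length range of 3 to 8 letters is an assumption made for the example, as is the name alphabetSoup.

```javascript
const ALPHABET = 'abcdefghijklmnopqrstuvwxyz';

// Pick a length, then pick that many letters with equal probability.
// Assumption: the default 3..8 length range is illustrative only.
function alphabetSoup(minLen = 3, maxLen = 8, rng = Math.random) {
  const length = minLen + Math.floor(rng() * (maxLen - minLen + 1));
  let word = '';
  for (let i = 0; i < length; i++) {
    word += ALPHABET[Math.floor(rng() * ALPHABET.length)];
  }
  return word;
}

console.log(alphabetSoup()); // e.g. "qxzvk"
```

Every letter has the same 1-in-26 chance here, which is why rare letters like 'x' and 'q' turn up far more often than they do in real English words.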