------------------------------------ TEXTO - Text steganography ------------------------------------ Texto is a rudimentary text steganography program which transforms uuencoded or pgp ascii-armoured ascii data into English sentences. This program was written to facilitate the exchange of binary data, especially encrypted data. Why is this necessary? People or programs may be reading your mail. Recent events in the US congress may _require_ Internet Service Providers to monitor incoming mail and determine whether or not it is "obscene" or lives up to particular parochial moral standards. Since they can't scan the contents of an encrypted message, and probably don't have time to manually look at each uuencoded message, such emails will probably go into the bit bucket. This program's output is hopefully close enough to normal English text that it will slip by any kind of automated scanning. Texto text files look like something between mad libs and bad poetry, (although they do sometimes contain deep cosmic truths) and should be close enough to normal english to get past simple-minded mail scanners and to entertain readers of talk.bizarre. Texto works just like a simple substitution cipher, each of the 64 ascii symbols used by pgp ascii armour or uuencode is replaced by an english word. Not all of the words in the resulting English are significant, only those nouns, verbs, adjectives, and adverbs used to fill in the preset sentence structures. Punctuation and "connecting" words (or any other words not in the dictionary) are ignored. The obvious main drawback to using this program: the resulting text is larger than the original data by a factor of 10. This is bad to the point of uselessness if you need to send a 5MB uuencoded file. What are some possible solutions to this problem? Using shorter words would yield only minimal improvement as most of the words are pretty short now, and you would still need the same number of english words. The best solution I can think of is to use more words, one for every 2 symbols instead of a one-to-one symbol to word mapping. This requires 4096 words for each part of speech, (finding that many adverbs will be a real challenge), but search speed shouldn't become a big factor when transforming text to data, since texto uses a hash table for the words and their lengths in order to minimize search times. The net result would probably be an average expansion by ~5x instead of ~10x, which is significant enough to warrant trying it. Changing the code will be easy, the hard part is typing in the dictionaries. Look for this feature in texto 2.0 coming Real Soon to a net near you. Since words are occasionally pluralized and/or gerundized (-ing), and they're not all regular verbs/nouns, there are plenty of strange spelling mistakes. While normally I despise misspelled words, they add a nice human touch to the repetitive text, and add to the feeling that who/whatever wrote the text was quite clearly out of his/her/its mind. Usage: ------ texto msgfile > engfile - Transforms the contents of msgfile into English text and places results in "engfile" msgfile must be a uuencoded or pgp ascii- armoured text file. texto -p engfile > pgpfile - Takes English text from engfile and produces OR a pgp ascii-armoured text file, which will texto -p engfile | pgp -f be readable by pgp if the original message file was. Alternatively, the output from texto can be piped directly into pgp. texto -u engfile > uufile - Takes English text from engfile and produces OR a uuencoded file, which will be readable by texto -u engfile | uudecode uuencode if the original message file was. Alternatively, the output from texto can be piped directly into uudecode. NOTE that uudecoding the results will always produce a file called "texto.out" mode 644, unless you redirect texto's output into a file and hand edit that file. Installation: ------------- This program has only been tested on IRIX 4.0.5, linux kernel 1.0.x, and Solaris 2.3. To build it, just type "make", on SGIs make it with the command "make sgi". If you're on a Solaris machine or any other machine whose uuencode uses spaces instead of ` characters, uncomment the "DEFINES" line in the makefile. Rolling your own: ----------------- The usually-correct English sentence structures are found in the file "structs", which is basically a file of mad lib-type "fill in the blank" sentences. Feel free to add your own, just be really really careful about not using words in the "words" file. You're safe if you use words that you see elsewhere in the "structs" file. Using varying "structs" files could at least annoy mail scanners. Using different "words" files as well should totally defeat them. The 64 verbs, 64 adjectives, 64 adverbs, 64 places, and 64 things which are used to fill in the blanks are in the "words" file. Again, feel free to add your own, but again, be careful. Don't use words that end in "s" or "ing" (they'll get chopped), don't use words that are already in there (you can double check with the command "sort words | uniq -d"). The order of the words in each section of the file is also significant, so for example rearranging the nouns will change the result. If you use a modified "words" file, the person on the other end of your communication must of course be using the same one, or the transformation will fail miserably. The "structs" file is totally irrelevant however, and can be modified to suit your taste or literary style, so long as it doesn't conflict with the "words" file as mentioned above. The structs file is not used in "decoding" text, so two people can still communicate whether or not they have the same "structs" file. BUGS ---- uuencoded files lose the mode and filename information, which is a bummer. Always writing to stdout may not be the best way to go. The text produced by texto'ing a uuencoded file can be _really_ repetitive. The 64-word dictionaries thing vs. the 4096-word ones, as mentioned above. Texto is a dorky name, but it sortof rhymes with stego. Please report any other bugs or fixes to kmaher@ucsd.edu LICENSE ------- Copying, modifications, improvements, etc. are highly encouraged, just let me know so I can incorporate them. All rites reversed. AUTHOR ------ Kevin Maher kmaher@ucsd.edu Underware Software Production Ltd. Inc. etc. "Covering your ass since 1981"