thunder
Deleted Member
Posts: 0
|
Post by thunder on May 12, 2010 7:03:56 GMT -5
I always enjoy and appreciate the Pandorabots Training Videos, particularly the historically significant ones that talk about how chatbots began. In his lectures, Dr. Wallace frequently refers to Zipf's Law, which states, in part, "...the most frequent word will occur approximately twice as often as the second most frequent word, which occurs twice as often as the fourth most frequent word, etc." I imagine that Zipf's Law had something to do with the creation of the Superbot offered by the A.L.I.C.E. A.I. Foundation, where the most frequently activated patterns and responses were revealed by mining conversation logs. In April of 2007, drgold posted a less formal analysis of the most common 1-, 2-, 3-, 4-, and 5-word phrases in some 100+ MB of IRC chats, sorted by frequency. What's interesting about this is that, while you'd expect the most often used words to be "the," "of," or "and," the patterns drgold found his bot being exposed to most often were chat speak words such as "lol," "hee," and "hmm." The first actual word, "song," appears fifth on the list, and much of what follows is more of the kind of shorthand used in chatrooms. While my bots have been placed in IRC channels (without my knowledge or consent), the basic results I see in my chatlogs from traffic to the HTML pages are, I would estimate, about the same as what drgold experienced. It's an invasion of trash language, and I don't think there's any way of stopping the general public from using it in chatrooms or in phone texting. However, because there are so many text speak words, and so many ways of misspelling a word, I attempt to discourage the use of most chatroom abbreviations... text speak... phone speak... or anything but language that I feel confident my bots can understand. 
And, in doing so, because visitors (I'm assuming younger people) often leave when they either can't spell correctly or can only communicate through chat speak (imagine what their school papers must look like), I feel that I'm seeing a higher level of conversation in the chatlogs. What does younz think, huh? brb
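For anyone who wants to run a similar count on their own logs, here is a minimal Python sketch of the idea behind drgold's analysis: tally every 1- to 5-word phrase and sort by frequency. It assumes plain-text logs with one message per line and splits on whitespace only; drgold's actual method isn't described in this thread. The `zipf_products` helper reflects the usual statement of Zipf's law (the item at rank r occurs roughly 1/r as often as the top item), so rank * count should stay roughly constant on a large enough log.

```python
from collections import Counter

def top_ngrams(lines, n, k=10):
    """Return the k most common n-word phrases across all lines, with counts."""
    counts = Counter()
    for line in lines:
        words = line.lower().split()  # naive whitespace tokenizer (an assumption)
        # Slide an n-word window across the line; lines shorter than n add nothing.
        counts.update(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return counts.most_common(k)

def zipf_products(ranked):
    """rank * count for each (phrase, count) pair; roughly flat under Zipf's law."""
    return [rank * count for rank, (_, count) in enumerate(ranked, start=1)]

if __name__ == "__main__":
    # Hypothetical stand-in for a chat log, one message per line.
    log = ["lol hi", "lol hmm", "hmm lol", "hee hee lol"]
    for n in range(1, 6):
        print(n, top_ngrams(log, n, k=3))
    print(zipf_products(top_ngrams(log, 1)))
```

On a toy log like this the products are noisy; the pattern only shows up over a large corpus.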
Post by Square Bear on May 12, 2010 7:10:39 GMT -5
While I don't encourage text speak with my bots, a lot of people are young and do use it. I try to get the bot to recognise it and point out how words should be spelled, usually to a barrage of abuse such as "brainiac", "spoff", "know-it-all" and so on. If it's particularly bad, like "dats kwl m8" or "im goin 2 skool 2mrrw", the bot will ask them, in a light-hearted manner, whether their keyboard is broken, whether they are dyslexic, whether they're using a Chinese spell checker, and so on.
If I were to bar text speak completely, I would have about a quarter of the visitors I get now.
Post by mendicott on May 13, 2010 8:46:06 GMT -5
Text speak (aka en.wikipedia.org/wiki/SMS_language) is now a fact of life, and there is no getting around it at this point. What seems to be needed is some kind of open source "text speak" corpus or API, similar to WordNet, that everyone can share, rather than each of us having to constantly reinvent the wheel. I do know that the Indian IM chatbot company www.alabot.com has succeeded in incorporating extensive "text speak" comprehension into their product. - Marcus Endicott twitter.com/mendicott
Post by thunder on May 13, 2010 10:59:35 GMT -5
Quote: Text speak is now a fact of life, and there is no getting around it at this point.
I don't object to the public using it in chatrooms or when phone texting. AIML accepts some text speak such as "lol," "brb," or "asl," and while I question the wisdom of permitting any of it, I understand the reasoning behind that decision, and I tolerate minimal use at that level of complexity. The problem is that people try using that form of communication with chatbots that are not ready for it, to the same extent they use it with their friends on the phone or in chatrooms. My concerns are these. First, there's just so much of it, and new shortcuts are being created and added to the lexicon all the time; there's no way to cover it all, or to keep up. Second, I think to accept it is to encourage its usage, the same way that accepting and having responses to misspelled patterns, or poorly formed grammar, encourages people to think that anything a person can type in any way (even incorrectly) should work. And then, when a bot doesn't understand, visitors feel justified in getting angry, becoming verbally abusive, or just leaving the conversation. I just read a chatlog with someone from a South American country who asked, "What you would like?" instead of "What would you like?" And because AIML doesn't observe question marks, the bot read it as a statement and responded in a way that left the visitor dissatisfied, and they ended the conversation. I could create a pattern and response so the bot is prepared for the question the next time it's asked that way, but how many ways are there to use grammar improperly... how many ways are there to misspell a word? How big will our update files be if each of us takes that path as a solution? We've all seen in our chatlogs where people (testing, no doubt) say "lol... or lolol... or lolololololololololololol." Where do you draw the line? Whose fault will it be when a bot doesn't understand that "dydat" means "Do you drive a truck?"
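One partial answer to "where do you draw the line?" for the lol/lolol/lolololol case is to normalize the input before it reaches the pattern matcher, so a single pattern covers every length. The Python sketch below assumes a hypothetical preprocessing step that runs before the AIML interpreter sees the input (not a standard AIML feature); it only handles this one family of repeats and does nothing for misspellings or grammar.

```python
import re

# Matches "lol", "lolol", "lolololol", ... as whole words, case-insensitively.
LAUGH = re.compile(r"\b(?:lo)+l\b", re.IGNORECASE)

def collapse_laughter(text):
    """Replace any whole-word run of alternating lo...l with a single 'lol'."""
    return LAUGH.sub("lol", text)

print(collapse_laughter("lolololololololololololol that was funny"))
# prints: lol that was funny
```

Words that merely start with "lol", such as "lolly", are left alone because the pattern requires a word boundary after the final "l".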
Post by drwallace on May 13, 2010 13:10:02 GMT -5
As Ive ofn z, r lang S lk a cloud. Parts of it drift awy n nu parts emerge. itz almst impssbL 2 taK a snapshot of lang, coz itz alw changiN. esp en, coz w'v no comittee or ultim8 authority 2 Dcide watz n n watz ot. d defn. of "English language" S alw gunA B Dcided democratically by its speakRs.
Post by thunder on May 13, 2010 15:56:19 GMT -5
Quote: As Ive ofn z, r lang S lk a cloud. Parts of it drift awy n nu parts emerge. itz almst impssbL 2 taK a snapshot of lang, coz itz alw changiN. esp en, coz w'v no comittee or ultim8 authority 2 Dcide watz n n watz ot. d defn. of "English language" S alw gunA B Dcided democratically by its speakRs.
I disagree... I think. But, I wonder how English teachers would feel about the notion of there being "no ultimate authority." Yes, slang changes, and language grows. Every so often there's a filler piece in the news about how the Oxford Dictionary has added several new words. However, other than a few differences between British and American English, I don't think there's a lot of flexibility in spelling or grammar. I view chat-speak, or text-speak, or whatever we're calling it today, as a language unto itself. I know that many of the words we use in everyday English actually came from other languages. But text-speak is formed by general agreement and has few, if any, rules. It isn't a spoken language, so it doesn't have the same subtle nuances as English, French, Spanish, or German. It doesn't even convey information well, and it sometimes leaves the person receiving the message wondering about its true content. It creates confusion unless the chatters are in agreement. Mixing chat-speak with English is like mixing numbers with vegetables. And the difficulty it presents for chatbots is evident in how often a given bot fails to understand input that includes chat/text-speak. If a word is misspelled or misused in a sentence or question, depending on its placement, a bot might or might not locate a pattern match. But if a visitor includes "AFK" or "AYT," or any of hundreds of similar shorthand terms and phrases, unless Square Bear has created a file listing them as patterns with appropriate replies, my bots will be ridiculed for not understanding and called stupid. My method is to not get drawn into the game from the start.
Post by Square Bear on May 13, 2010 16:48:01 GMT -5
Quote: As Ive ofn z, r lang S lk a cloud. Parts of it drift awy n nu parts emerge. itz almst impssbL 2 taK a snapshot of lang, coz itz alw changiN. esp en, coz w'v no comittee or ultim8 authority 2 Dcide watz n n watz ot. d defn. of "English language" S alw gunA B Dcided democratically by its speakRs.
Have you been reading my chatlogs? That sounds like something straight out of them! ;D
Quote: ...unless Square Bear has created a file listing them as patterns with appropriate replies...
I have a lot of them dotted around my update files. However, a member here (iztari) created an AIML file with a lot of text speak translations in this thread: knytetrypper.proboards.com/index.cgi?action=display&board=snippet&thread=493&page=1#2093
I don't believe his site is active any more, so I've copied the AIML file here: www.square-bear.co.uk/temp/abbreviations.aiml
The parts near the bottom of the file are text speak translations.
Post by Square Bear on May 13, 2010 18:00:40 GMT -5
Incidentally, anyone who does try to keep up with misspelled words and slang may be interested to know what a battle it is. Chatters are constantly finding new ways to say things. For example, here is a list of words that my chatters have used to mean "FAVORITE/FAVOURITE":
FAFORITE FAOURITE FAOURTIE FAOUVORITE FARORITE FARVOURITE FARVURITE FAUVORIT FAUVORITE FAUVOURITE FAV FAVAORITE FAVAOURITE FAVARAT FAVARUOT FAVE FAVEIROUTE FAVEIRTE FAVEORATE FAVEORIT FAVEORITE FAVEOURITE FAVERAT FAVERATE FAVERET FAVERIOT FAVERIOTE FAVERIOUT FAVERIT FAVERITE FAVEROT FAVEROTE FAVEROUT FAVEROUTE FAVERT FAVERTE FAVIORATE FAVIORIET FAVIORITE FAVIOROTE FAVIORTE FAVIOURET FAVIOURITE FAVIOUT FAVIOUTE FAVIRATE FAVIRET FAVIRITE FAVIROT FAVIROTE FAVIROUT FAVIROUTE FAVIRTE FAVIRTOT FAVOIRATE FAVOIRITE FAVOIROT FAVOIRTE FAVOITE FAVOIURITE FAVOOURITE FAVORAIT FAVORAITE FAVORATE FAVORETE FAVORIDE FAVORIT FAVORITE BEST FAVORIUTE FAVOROITE FAVOROTE FAVOROUTE FAVORT FAVORTIE FAVORTITE FAVORUITE FAVORUT FAVORUTE FAVOTIE FAVOTIRE FAVOUIRITE FAVOUITE FAVOURAT FAVOURATE FAVOURED FAVOURIE FAVOURIRITE FAVOURIT FAVOURITES FAVOURITTE FAVOURT FAVOURTE FAVOURTIE FAVOUT FAVOUTE FAVOYRITE FAVPORITE FAVRAT FAVRET FAVRETE FAVRIOT FAVRIOTE FAVRIOUT FAVRIOUTE FAVRITE FAVROIT FAVROITE FAVROT FAVROTE FAVROUIT FAVROUITE FAVROURITE FAVROUT FAVROUTE FAVRROTTE FAVUORITE FAVURITE FAVUROTE
And I get more every day! I have to <srai> all those to FAVORITE, and that's just one word. Unfortunately, if I turned away everyone who spoke like that, I would have a fraction of the chatlogs I have now. Keeping up with teen talk is a necessary evil.
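In code terms, one `<srai>` per variant is a lookup table. The Python sketch below shows that table with a few entries taken from the list above and from earlier posts, plus a fuzzy fallback that nobody in the thread proposed: the standard library's `difflib.get_close_matches`, which scores string similarity, so variants missing from the table (such as FAVRIOUTE) can still land on FAVORITE. The canonical vocabulary, the table entries, and the 0.7 cutoff are illustrative assumptions; fuzzy matching can also pull a genuine word onto the wrong target, so any cutoff would need checking against real chatlogs.

```python
import difflib

# Hypothetical canonical words that the bot's patterns are written against.
CANONICAL = ["FAVORITE", "BECAUSE", "TOMORROW", "SCHOOL"]

# Explicit variant table: the equivalent of one <srai> per known variant.
VARIANTS = {
    "FAV": "FAVORITE",
    "FAVE": "FAVORITE",
    "FAVOURITE": "FAVORITE",
    "FAVORIT": "FAVORITE",
    "COZ": "BECAUSE",
    "2MRRW": "TOMORROW",
    "SKOOL": "SCHOOL",
}

def normalize_word(word, cutoff=0.7):
    """Map a (possibly misspelled) word to a canonical form, or return it unchanged."""
    w = word.upper()
    if w in CANONICAL:
        return w
    if w in VARIANTS:
        return VARIANTS[w]
    # Fallback: closest canonical word by similarity ratio, if it clears the cutoff.
    matches = difflib.get_close_matches(w, CANONICAL, n=1, cutoff=cutoff)
    return matches[0] if matches else w

if __name__ == "__main__":
    for w in ["fav", "favrioute", "skool", "hello"]:
        print(w, "->", normalize_word(w))
```

The table stays authoritative for known variants; the fuzzy step only runs when the table has nothing, and unknown words fall through unchanged.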
Post by thunder on May 13, 2010 18:43:55 GMT -5
Quote: Incidentally, anyone who does try to keep up with misspelled words and slang may be interested to know what a battle it is.
Thank you. I don't turn anyone away, although at times I wish I knew how to have a bot end the conversation from its side. I correct people if and when I can, and I have the bots "suggest" that if people don't spell correctly, or don't use correct, full, and complete words, the bot may not be able to understand them. And let's face it, that's true. In 1996, I copied the following because I thought it was funny. Today, it also seems appropriate.
From: JPL-JD
Date: Sunday, June 09, 1996 5:54 PM
To: Writing General BBS
Subject: cri - stands for "Come Right In"
I will teach you some more things that stand for like... Ile or "I love everyone."
I am J.D. Lorson That stands for John Daniel Lorson.
Here are some things that stand for whatever you might want.
yaagf first letter of words that stands for you are a good friend Example: ______<name.
yatg or, "You are the greatest."
yagttwis stands for "you are greater than the world itself.
mkib - my keyboard is broke.
dydat stands for, Do you drive a truck?
wdyll - What do you look like?
wdyltd stands for, What do you like to do?
Ia___<ageyo stands for I am ___<age years old!
Iwtwwahp stands for I wish the world was a happier place!
Wtbmf stands for, Want to be my friend?
dylvg means. Do you like video games?
wdywtta - that stands for, What do you want to talk about?
dywtgta2pcr - that stands for, Do you want to go to a 2 person chat room?
wdywtbwygu - means, What do you want to be when you grow up?
wkowayi is, What kind of work are you in?
dykhtdt¿? - that stands for, Do you know how to do this ¿?
wkomdyl is, What kind of music do you like?
wtsdyltw - that stands for what t.v. show(s) do you like to watch?
wdyl stands for where do you live?
wsdyl stands for what sport(s) do you like?
a/s/n stands for, age/sex/name?
dyhap means, Do you have any pets?
dyltkofpyh - that stands for, Do you like the kind of pet you have?
dyhaf stands for, Do you have a ferret?
iyh3wwwtb stands for, If you had 3 wishes what would they be?
wdyltd is, What do you like to do?
hmpdyh is, How many pets do you have?
wbtcdyliyho stands for, What bassketball team do you like, if you have one?
hoay stands for, How old are you?
dyhw95 stands for do you have windows 95?
wiyln is asking, What is your last name?
dyhg stands for, Do you have glasses?
cyd is short for, Can you drive?
dyls stands for, Do you like Shaq?
wwtlmys means, What was the last movie you saw?
dyhabos is, Do you have any brothers or sisters?
dyltb means, Do you like to bikeride?
hmwcyl stands for, How much weight can you lift?
tyfrtbsy stands for, Thank you for reading this, bye, see ya.
Post by mendicott on May 13, 2010 19:55:24 GMT -5
Post by mendicott on May 13, 2010 22:17:36 GMT -5
Post by thunder on May 14, 2010 7:43:03 GMT -5
Interesting. I tried entering Dr. Wallace's posting... "As Ive ofn z, r lang S lk a cloud. Parts of it drift awy n nu parts emerge. itz almst impssbL 2 taK a snapshot of lang, coz itz alw changiN. esp en, coz w'v no comittee or ultim8 authority 2 Dcide watz n n watz ot. d defn. of "English language" S alw gunA B Dcided democratically by its speakRs."
I received the following translation:
as i have ofn z , are language s like a cloud. parts of it drift away and new parts emerge. it is almst impssbl to tak a snapshot of language , because it is alw changin. especially en , because with 'v no comittee or ultim8 authority to dcide whats and and whats ot. the defn. of " english language " s alw going to be dcided democratically by its speakrs.
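The output above looks consistent with plain token-by-token substitution: known abbreviations are expanded, and everything else ("ofn," "impssbl," "ultim8") passes through lowercased. A minimal Python sketch of that idea follows; the table is a small hypothetical sample, not the translator's actual data. It also reproduces the weakness visible above: a fixed table can't tell when "n" means "in" rather than "and," which is how "watz n n watz ot" came out as "whats and and whats ot."

```python
import re

# Illustrative text-speak table (hypothetical; a real one would be far larger).
TEXT_SPEAK = {
    "r": "are", "lang": "language", "lk": "like", "awy": "away",
    "n": "and", "nu": "new", "2": "to", "coz": "because",
    "itz": "it is", "esp": "especially", "d": "the", "watz": "whats",
}

def translate(text):
    """Expand known abbreviations word by word; leave unknown tokens as-is."""
    # Split into word tokens (apostrophes kept) and single punctuation marks.
    tokens = re.findall(r"[\w']+|[^\w\s]", text.lower())
    expanded = " ".join(TEXT_SPEAK.get(t, t) for t in tokens)
    # Re-attach punctuation that the tokenizer split off.
    return re.sub(r" ([.,!?;:])", r"\1", expanded)

print(translate("watz n n watz ot."))
# prints: whats and and whats ot.
```

Fixing the "n" case would need context (the surrounding words), which is exactly what a one-word-at-a-time table lacks.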
Post by drwallace on May 14, 2010 9:03:39 GMT -5
Quote: But, I wonder how English teachers would feel about the notion of there being "no ultimate authority."
In the sense that there is no equivalent of the Académie française, described as "acting as an official authority on the [French] language." The OED is one player in a pluralistic English-speaking society, English teachers are another, and the kids who chat and text are yet another force.
But Dave, confusion is a good thing. Confusion usually leads to understanding. It's quite difficult to remain in a state of confusion forever. ;D
One language that bears some similarity to English, in the sense of NUA (No Ultimate Authority) and MIU (Mixing It Up), is Japanese. Japanese has 4 writing systems:
1. Kanji - Chinese characters
2. Hiragana - phonetic characters for writing Japanese words
3. Katakana - phonetic characters for writing foreign words
4. Romaji - our western Roman characters and Arabic numerals
But there are all kinds of variations: writing Japanese words in Katakana, putting western words in Kanji, writing left to right, right to left, top to bottom, etc. etc., and of course they also have their own system of txt and IM abbreviations. What Japanese lacks is a huge population of non-native speakers also adding their own contributions to the language, as English has. 2 l8 to put Gne bk in the bttl.
Post by thunder on May 14, 2010 16:40:52 GMT -5
Let me first say that this is fascinating to me, and I appreciate your taking the time to respond.
Quote: But, I wonder how English teachers would feel about the notion of there being "no ultimate authority." In the sense that there is no equivalent of the Académie française, described as "acting as an official authority on the [French] language." The OED is one player in a pluralistic English-speaking society, English teachers are another, and the kids who chat and text are yet another force.
I understand your point. I don't dispute the observation that there's really nothing wrong with chat/text speak, except that users are unable to communicate in the conventional way when writing, and I suspect chat/text speak may have something to do with that. I acquiesce to the fact that SMS language serves a purpose and that there's no way to put the toothpaste back in the tube. Where I do disagree is with the idea that "texting" is just another (equal?) form of speaking English. Not long ago there was a TV commercial about a mother trying to communicate with her daughter, who only spoke in text shorthand. What made it so humorous was how it illustrated that it's a written-only language. When spoken, it sounded very strange. I suppose all the authority I require is the rules and guidelines we all learned in school, and the universal agreement among native English speakers. But my personal feelings about SMS language contributing to the demise of English were never the issue. I'm more concerned about the arduous task of trying to get chatbots to recognize it and respond, because of the problems previously listed.
Quote: But Dave, confusion is a good thing. Confusion usually leads to understanding. It's quite difficult to remain in a state of confusion forever. ;D
I'm not opposed to the kind of stimulation and information mining brought on by confusion, and you might not believe it, but I'm a big fan of chaos. However, when it comes to communication, I draw the line. 
I've always been interested in the phonetic alphabet used by the military (and others) in situations when voice transmission wasn't clear. I was intrigued when I learned (in the military) that the number 9 was pronounced "niner" because in certain situations "nine" could sound like "five," and that the digit 0 was always pronounced "zero," while the public frequently pronounces it as the letter "oh." There was a rule for Morse code on a "net": no one was permitted to go faster than the slowest person. In the same way, I think it's only common sense that the burden of understanding is on the sender, not the receiver. If my house is on fire and I call the fire department, and I really want them to come quickly, I'm not going to make it hard for them to figure out what I'm talking about. I'm going to be concise and specific. I find languages and everything about them interesting. I just don't think chat/text speak is worthy of all the work it takes to get bots to understand all of it, especially as it continues to grow. It seems to me that if someone wants to chat with my bots and have a "conversation," they should stick to actual words and natural language. If they want to text, they should go call a friend. 