1st - Tutor - 28 points
2nd - The Professor - 26 points
3rd - Izar - 23 points
4th - Father G.DOS - 13 points
5th - Buttonsvixen - 11 points
6th - Feelbetter - 7 points
I wish there had been more information regarding judging and who came up with the questions. For example:
4) Do you like cake?
This is a standard ALICE question already contained in the AIML files. In any event, since either "yes" or "no" is a valid answer, a coin flip would produce a correct answer every time, yet in one instance (Tutor) "Yes, I do" was rewarded with 2 points.
7) How many letters in "dog"?
Again, a bot (Tutor) was rewarded for not being able to answer, apparently just because its response included the words "how many": "Sorry, I can't tell you how many." (1 point)
8) Can a bird fly?
I understand, and agree with, the exclusion of "ALICE answers." Yet "Some of them can" is clearly a step ahead of "Tell me more about that." While I make a lot of changes to my bots' responses, some ALICE answers are better than anything I can come up with and really don't need any improvement.
I don't think "Some of them can" gives away an ALICE identity, nor is it wrong. It's certainly better than "I don't know." Note the misspelling in Tutor's response: "A penguin in an exception."
9) kan u reed dis?
Is it any surprise that no bot answered correctly?
10) Have you been in a contest before?
Tutor again: "Of course. I've traveled a lot." (1 point). The reply is closer to a non sequitur. Nothing in it relates to "contests"; the bot would have given the same answer if the question had been "Have you been in a tizzy before?" The reference to travel clearly indicates that the bot interpreted the question as asking about a location.
14) What is your favorite TV show?
A basic (unaltered) ALICE/Pandorabot would have answered "My favorite show is STAR TREK VOYAGER," yet The Professor was awarded 2 points for saying "My favourite TV show is Star Trek." Maybe it deserved 1 point simply because the word "VOYAGER" had been removed, but is it really worth 2 points, the same score Buttonsvixen, Father G.DOS, Izar, and Tutor got for giving a correct non-ALICE response unrelated to Star Trek?
I could go on, but you get the idea.
I don't mean to pick on Tutor; I think it's one of the best AIML bots around. I just question how the scores were arrived at. People in the Pandorabot community know the ESL bot English Tutor (as well as the ubiquitous CallMom phone app), and the results suggest to me that there may have been some leaning in that bot's favor.
Post by Square Bear on Apr 16, 2013 16:10:22 GMT -5
To be honest, I thought the 3 bots that were chosen were better than the remaining 3, and I can't really find much fault with the contest. I spoke to all of them independently and agree with the results.
4 - Tutor gave a correct non-ALICE response to the question and deserves 2 points.
7 - Tutor gave an answer that made sense but was not correct, so it should get a point for a "near miss."
8 - I think the point of the contest was to reward people who had taken the time to personalise their responses. Yes, I agree that a lot of ALICE responses are suitable without amendment, but the contest would be flooded with plain ALICE clones if the rule wasn't in place.
Izar gave 3 ALICE responses and still came in the top 3.
9 - I was surprised that not many understood "kan" for "can," but I think this is where we differ. I handle slang and text speak, whereas I know you don't like it.
10 - "Have you been in a contest before?" - "Of course." As I see it, the bot answered correctly to gain 2 points, then added a wrong comment and had a point deducted.
14 - The response had been changed from the ALICE response and was a correct answer to the question - 2 points.
The questions seem representative of this type of competition, and I couldn't see any favoritism. I would have expected most bots to perform well with these questions. However, I was surprised none of them knew what shape a ball was?! Seriously?
I assume nobody else has commented because they too felt the results were a fair representation of the entrants. Have you talked to each of them outside the contest? Spend 5 minutes with each and I think you'll come to the same conclusion.
I don't disagree with the results of the contest so much as with the way it was conducted and with the questions themselves. I'd like to have seen more thought put into it. Could the contest have been rushed to completion to meet the approaching Loebner deadline?
With regard to individual questions and answers, I suppose it depends on how each person views them. But in question No. 10, "Have you been in a contest before?", the heart of the matter clearly has to do with "contests."
A basic, unchanged Pandorabot (without going into all of the reductions) will answer, "I don't think I have been there. Where is it." That indicates to me that the question was matched as one about a location rather than about a contest. Tutor's reply, "Of course. I've traveled a lot," is somewhat similar in that respect.
One could probably also argue that a simple "Yes" or "No" would have sufficed and earned 2 points.