I had a new idea for solving this. It completely drops the self-learning aspect (unlike my first attempt, which could be trained off almost any chat source), but it can still be trained fairly easily.
I am going to borrow the idea from AIML and ChatScript, which are both based on question/answer pairs, but for each question match the answer will be built from Markov chains. This means that if the AI matches "Hello", it will look up all the chains that belong to that question, then randomly follow them to generate an answer related to "Hello".
A more practical example may be:
Code
Q: "I like the color red"
Markov chains learned earlier from these responses
-> "I do too", "So do I", "I don't like that one", "I don't prefer that one", "I don't prefer that shade"
A: I do I don't prefer that one (or something like that).
This lets the bot create unique responses on its own, based on earlier answers to the initial match. The downside is that you have to train the input/output matches by hand for a long while at first, since it is hard to come across chat logs where one person says a single line and another user responds with a single line.
I haven't tested such a system so I don't know how well it would work, and creating the database schema may prove difficult. I think the easiest way would be to create a new table for each input question to hold the specific Markov chains that respond to it, although this would take up a ton of disk space for a large database. Without that kind of separation, though, I can't think of a way to figure out which output responses belong to which input questions. I suck at SQL.
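Here is a rough, untested-in-anger sketch of the idea in Python, just to show the mechanics. Everything in it is my own naming for illustration (`build_chain`, `generate`, `reply`, the `learned` dict, using `None` as an end marker); matching is a plain exact dictionary lookup, not real AIML-style pattern matching.

```python
import random
from collections import defaultdict

END = None  # end-of-response marker (an assumption of this sketch)

def build_chain(responses):
    """Return (starting words, word -> list of words seen right after it)."""
    chain = defaultdict(list)
    starts = []
    for response in responses:
        words = response.split()
        if not words:
            continue
        starts.append(words[0])
        # Pair every word with its follower; the last word is followed by END.
        for current, following in zip(words, words[1:] + [END]):
            chain[current].append(following)
    return starts, chain

def generate(starts, chain, rng=random, max_words=12):
    """Randomly walk the chain from a starting word until END or max_words."""
    word = rng.choice(starts)
    output = []
    while word is not END and len(output) < max_words:
        output.append(word)
        word = rng.choice(chain[word])
    return " ".join(output)

# One chain per hand-trained question, built from its known responses.
learned = {
    "I like the color red": build_chain([
        "I do too",
        "So do I",
        "I don't like that one",
        "I don't prefer that one",
        "I don't prefer that shade",
    ]),
}

def reply(question, rng=random):
    """Exact-match the question, then generate an answer from its chain."""
    starts, chain = learned[question]
    return generate(starts, chain, rng)

print(reply("I like the color red"))
```

The `max_words` cap is there because "So do I" lets the walk loop from "do" back to "I", so without a limit a reply could wander for a while.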
I guess the above schema would look something like:
Code
CREATE TABLE Questions (input TEXT, output_table_name TEXT);
CREATE TABLE AdAf52Fa2AfA2AFaRa76Gsa (word TEXT, nextword TEXT);
-- ? is bound to the parsed user input
SELECT output_table_name FROM Questions WHERE input LIKE ?;
-- ? is bound to the current word; pick one follower at random
SELECT nextword FROM AdAf52Fa2AfA2AFaRa76Gsa WHERE word = ? ORDER BY random() LIMIT 1;
This would allow for each question to have a unique table with chains relating to its subject.
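For what it's worth, this is roughly how I picture driving that schema from Python's built-in `sqlite3` module. It is only a sketch under a few assumptions of mine: the helper names (`connect`, `add_response`, `answer`) are made up, table names are generated with `uuid` instead of typed by hand, and an empty string in `word`/`nextword` stands for the start and end of a response.

```python
import sqlite3
import uuid

def connect(path=":memory:"):
    """Open the database and make sure the Questions table exists."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS Questions "
               "(input TEXT PRIMARY KEY, output_table_name TEXT)")
    return db

def add_response(db, question, response):
    """Store one response's word pairs in the table owned by the question."""
    row = db.execute("SELECT output_table_name FROM Questions WHERE input = ?",
                     (question,)).fetchone()
    if row is None:
        # Table names cannot be bound with ?, so the name is generated here
        # and never taken from user input.
        table = "chain_" + uuid.uuid4().hex
        db.execute(f'CREATE TABLE "{table}" (word TEXT, nextword TEXT)')
        db.execute("INSERT INTO Questions VALUES (?, ?)", (question, table))
    else:
        table = row[0]
    words = [""] + response.split() + [""]  # "" marks start and end
    db.executemany(f'INSERT INTO "{table}" VALUES (?, ?)',
                   zip(words, words[1:]))

def answer(db, question, max_words=12):
    """Find the question's table, then follow random nextword rows."""
    row = db.execute("SELECT output_table_name FROM Questions WHERE input LIKE ?",
                     (question,)).fetchone()
    if row is None:
        return None
    table, word, out = row[0], "", []
    while len(out) < max_words:
        nxt = db.execute(f'SELECT nextword FROM "{table}" WHERE word = ? '
                         "ORDER BY random() LIMIT 1", (word,)).fetchone()
        if nxt is None or nxt[0] == "":
            break
        word = nxt[0]
        out.append(word)
    return " ".join(out)

db = connect()
for text in ["I do too", "So do I", "I don't like that one"]:
    add_response(db, "I like the color red", text)
print(answer(db, "I like the color red"))
```

Note that `LIKE` treats `%` and `_` in the user's input as wildcards, so the parsed input would probably need escaping before it is used this way.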
I'd love it if someone could figure out a better way to segregate the two and provide a better schema.
That's enough ranting about a subject no one cares about, carry on.
This post was edited by AbDuCt on Mar 14 2016 08:05pm