Chat AI - Topic

AbDuCt

Member

Posts: 13,425

Joined: Sep 29 2007

Gold: 0.00

Warn: 20%

#21

Mar 16 2016 07:42am

Quote (Ideophobe @ Mar 16 2016 09:33am)

no one here knows what you are working with. if you would post something that actually worked, instead of a bunch of pseudocode with nasty selects relating fields to entities, using aliases as datatype definitions and selecting fields that don't exist, maybe you could show how this system could ever possibly work

Well, that just proves you have no idea how Markov chains work in text prediction and random sentence generation from previously learned entries. If you did, you could easily connect what I've been saying with the idea.

I'm still waiting for those logic flaws. Those "aliases" are valid SQLite data types.

Edit:: you're basically saying "your idea is wrong and won't work until you prove me otherwise". Pretty funny. This isn't a complicated idea to grasp, even if you only briefly visit the Wikipedia article on Markov chains.

This post was edited by AbDuCt on Mar 16 2016 07:46am

Ideophobe

Member

Posts: 14,631

Joined: Sep 14 2006

Gold: 575.56

#22

Mar 16 2016 10:41am

Quote (AbDuCt @ Mar 16 2016 07:42am)

this entire post is just one big way to avoid working on anything. you've got nothing but a bad idea and you can't figure out why it can't be implemented
well, it's because you're skipping steps, you don't know sql, and you're trying to work backwards from an answer with logic that could possibly get you there, but won't necessarily

AS IS A RESERVED WORD

This post was edited by Ideophobe on Mar 16 2016 10:45am

AbDuCt

#23

Mar 16 2016 10:59am

Lmk if you want to see my beast queries for enumerating databases using bitwise math and precomputed lookup tables based on page indexes. They would make anything I posted here look clean as hell.

You just don't understand this post. Carte got it within his second post. I will make a proof of concept later today. I was just looking for a way to structure the database, but it is obvious that you wouldn't know where to begin.

To spell it out: I was asking how to structure the database with one table of pointers to a group of separate linked lists. In terms of this bot, how to take a user's input, look up a random starting word from the group belonging to that input, and then follow the chains.
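A minimal in-memory sketch of that structure, using plain Ruby hashes instead of SQLite: one pointer map from a (lowercased) input to its own group of chains, a random start word drawn from that group, and links followed until an end marker. The class and method names (`ChainGroups`, `train`, `respond`) and the `max_words` cap are hypothetical additions for illustration, not part of the original design.

```ruby
# Hypothetical in-memory stand-in for the proposed schema:
# a pointer map from an input to its own group of word chains.
class ChainGroups
  END_MARK = nil # successor recorded after the last word of an answer

  def initialize
    @pointers = {} # lowercased input => group id
    @groups = []   # group id => { starts: [first words], links: { word => [next words] } }
  end

  def train(input, answer)
    words = answer.split
    return if words.empty?

    id = (@pointers[input.downcase] ||= @groups.size)
    group = (@groups[id] ||= { starts: [], links: Hash.new { |h, k| h[k] = [] } })
    group[:starts] << words.first
    # Each word points at the word that followed it; duplicates act as weights
    (words + [END_MARK]).each_cons(2) { |word, nxt| group[:links][word] << nxt }
  end

  # Pick a random start word for this input, then follow random links to END_MARK
  def respond(input, max_words: 30, rng: Random.new)
    id = @pointers[input.downcase]
    return nil if id.nil?

    group = @groups[id]
    word = group[:starts].sample(random: rng)
    out = []
    while word && out.size < max_words
      out << word
      word = group[:links].fetch(word, []).sample(random: rng)
    end
    out.join(" ")
  end
end

brain = ChainGroups.new
brain.train("Hello", "hi there friend")
brain.train("Hello", "hi you")
puts brain.respond("hello")
```

Each input owns its own link table, so words learned for "Hello" never leak into answers for another input; the length cap only guards against cycles in larger training sets.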

Then again, Markov chains are too complex.

This post was edited by AbDuCt on Mar 16 2016 11:06am

Ideophobe

#24

Mar 16 2016 12:29pm

carte doesn't have a second post

i already told you you can't relate fields to entities

and your idea of chaining word by word is fundamentally flawed

Code

Q: "I like the color red"
Markov chains that were previously learned
-> "I do too", "So do I", "I don't like that one", "I don't prefer that one", "I don't prefer that shade"
A: I do I don't prefer that one (Or some shit like that).

select nextword from AdAf52Fa2AfA2AFaRa76Gsa where word = ? order by random() limit 1;

do you see what can happen with this?
"i do i do i do i do i do i do i do i do i do i do i do i do i do i do i do i do i do i dont like that one"

when do you end it without giving it actual paths to follow instead of letting it create paths from words
the more you train it the more unreliable it gets

and if you do give it actual paths to follow (which you can't do word by word, because you would end up creating tables of undefined field size), what is the point of any of this? you're back where you started: 1 table of input and output strings
prepare stmt from 'select output from responses where input = ? order by rand() limit 1';
set @a = 'I like the color red'; -- in practice @a would come from the user's input
execute stmt using @a;
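One way to look at "when do you end it" is to treat the end of a sentence as an absorbing state of the chain. The sketch below (hypothetical helper names; the corpus is just the five sample replies from the post, lowercased and split on spaces) counts word transitions and iterates T(s) = 1 + sum over t of p(s, t) * T(t), with T(:end) = 0, to get the expected number of transitions before the walk is absorbed.

```ruby
# Sample replies from the post above, lowercased; :end is the absorbing state
SAMPLES = ["i do too", "so do i", "i don't like that one",
           "i don't prefer that one", "i don't prefer that shade"].freeze

# Count word -> next-word transitions, ending every reply with :end
def transition_counts(samples)
  counts = Hash.new { |h, k| h[k] = Hash.new(0) }
  samples.each do |s|
    (s.split + [:end]).each_cons(2) { |a, b| counts[a][b] += 1 }
  end
  counts
end

# Expected transitions until :end, by repeatedly applying
# T(s) = 1 + sum_t p(s, t) * T(t); missing keys (including :end) default to 0.0
def expected_steps(counts, iterations: 200)
  t = Hash.new(0.0)
  iterations.times do
    counts.each do |state, nexts|
      total = nexts.values.sum.to_f
      t[state] = 1 + nexts.sum { |nxt, c| (c / total) * t[nxt] }
    end
  end
  t
end

steps = expected_steps(transition_counts(SAMPLES))
puts steps["i"].round(3) # prints 4.111
```

On this toy corpus one extra "i do" lap needs i -> do (1/5) and then do -> i (1/2), so each lap has probability 0.1 and a walk starting at "i" averages 37/9 (about 4.1) transitions. The repetition in the example is possible but each additional lap is ten times less likely; a larger corpus would need its own counts to say anything similar.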

This post was edited by Ideophobe on Mar 16 2016 12:30pm

AbDuCt

#25

Mar 16 2016 02:35pm

With your post above you have once again proven that you have no idea what Markov chains are. This is probably why you are having trouble grasping the subject.

AbDuCt

#26

Mar 16 2016 06:48pm

Quick PoC, no optimizing or debugging. Responses are not optimal and the tables are nasty. More than 3/4 of the responses were not given in the trained input. This means that it followed the chains (albeit not well) to generate unique output from what it had previously learned.
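The "more than 3/4 were not in the trained input" figure can be measured by comparing generated responses with the stored answers. A small sketch, assuming both are available as arrays of strings; the method name and the sample strings are illustrative only, since the actual training answers are not in the post.

```ruby
require "set"

# Share of generated responses that do not appear verbatim (ignoring case and
# surrounding whitespace) among the trained answers
def novelty_rate(generated, trained_answers)
  return 0.0 if generated.empty?

  seen = trained_answers.map { |a| a.strip.downcase }.to_set
  novel = generated.count { |g| !seen.include?(g.strip.downcase) }
  novel.fdiv(generated.size)
end

# Illustrative data only
trained = ["Hello abduct.", "Hi I've been waiting for you!"]
generated = ["Hello abduct.", "Hi I've been awhile", "Hello why are you.", "hello abduct."]
puts novelty_rate(generated, trained) # prints 0.5
```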

Also, the Inputs table is not correct: using a hash of the input like this prevents me from doing fuzzy searching of inputs, so that similar inputs could generate responses from the same table. So I will need to think of something that makes that more flexible.
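One simple stand-in for fuzzy input lookup is to compare word sets instead of exact hashes and reuse the chain table of the closest trained input. This sketch uses Jaccard similarity (shared words divided by distinct words); the helper names and the 0.5 threshold are arbitrary assumptions, and a real version might add stemming or edit distance.

```ruby
require "set"

# Lowercased set of words (letters, digits, underscores, apostrophes)
def word_set(text)
  text.downcase.scan(/[\w']+/).to_set
end

# Jaccard similarity: shared words divided by total distinct words
def jaccard(a, b)
  return 0.0 if a.empty? && b.empty?
  (a & b).size.fdiv((a | b).size)
end

# Return the trained input most similar to the user's input, or nil when
# nothing reaches the threshold
def closest_input(user_input, trained_inputs, threshold: 0.5)
  target = word_set(user_input)
  scored = trained_inputs.map { |t| [t, jaccard(target, word_set(t))] }
  best, score = scored.max_by { |_, s| s }
  best if score && score >= threshold
end

trained = ["Hello", "Hello there", "What is your name?"]
p closest_input("hello there!", trained)     # prints "Hello there"
p closest_input("what's your name", trained) # prints nil ("what's" != "what", 2/5 = 0.4)
```

The second call shows the limit of plain word overlap: contractions and rephrasings drop below the threshold, which is where a fuzzier comparison would have to come in.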

Output:

Code

AI>ruby brain2.rb
> Hello
"Hi I've missed you so long"

AI>ruby brain2.rb
> Hello
"Hi I've been here waiting forever"

AI>ruby brain2.rb
> Hello
"Hello why are you so long?"

AI>ruby brain2.rb
> Hello
"Hello it has been awhile"

AI>ruby brain2.rb
> Hello
"Hello abduct."

AI>ruby brain2.rb
> Hello
"Hi I've been waiting for you!"

AI>ruby brain2.rb
> Hello
"Hello abduct it has been waiting for you."

AI>ruby brain2.rb
> Hello
"Hello why are you."

AI>ruby brain2.rb
> Hello
"Hello it has taken you so long?"

AI>ruby brain2.rb
> Hello
"Hi I've been awhile"

AI>ruby brain2.rb
> Hello
"Hello abduct."

AI>ruby brain2.rb
> Hello
"Hi I've been awhile"

Code

require "sqlite3"
require "digest"

class Brain
  def initialize(brain)
    if !File.file?(brain)
      @brain = SQLite3::Database.new brain
      # Inputs maps each trained input to the name of its own chain table
      @brain.execute <<-SQL
        create table Inputs (
          Input text,
          Pointer text
        );
      SQL

      @brain.execute <<-SQL
        create index InputIndex on Inputs (Input, Pointer)
      SQL
    else
      @brain = SQLite3::Database.open brain
    end

    # Trade durability for speed; acceptable for a throwaway PoC
    @brain.execute("PRAGMA synchronous=OFF")
    @brain.execute("PRAGMA count_changes=OFF")
    @brain.execute("PRAGMA journal_mode=MEMORY")
    @brain.execute("PRAGMA temp_store=MEMORY")
  end

  # Split text into word/punctuation tokens, grouped into sentences ending in . ? or !
  def tokenizeInput(input)
    regex = /(\w+:\S+|[\w'-]+|[^\w\s][^\w]*[^\w\s]|[^\w\s]|\s+)/

    return [] if input.empty?

    tokens = []
    input.scan(regex) do |token|
      next if token.first =~ /\s/ # skip whitespace runs
      tokens.push token.first.strip
    end

    tokens.slice_after { |e| e =~ /\A[.?!]+\z/ }.to_a
  end

  # Store every (word, next word) link of the answer in the chain table owned by input
  def trainNewAnswer(input, groupTokens)
    # LIKE without wildcards acts as a case-insensitive equality match;
    # reuse the stored Pointer instead of re-hashing the (possibly differently cased) input
    tableName = @brain.get_first_value("select Pointer from Inputs where Input like ? limit 1", input)

    if tableName.nil?
      tableName = Digest::MD5.hexdigest(input)
      @brain.execute "create table `#{tableName}` (StartWord integer, Word text, NextWord text)"
      @brain.execute "create index `#{tableName}index` on `#{tableName}` (StartWord, Word, NextWord)"
      @brain.execute "insert into Inputs (Input, Pointer) values (?, ?)", [input, tableName]
    end

    groupTokens.each do |tokens|
      if tokens.count < 2
        puts "Training data not long enough"
        next
      end

      # "" marks the end of the sentence; StartWord = 1 flags its first link
      (tokens + [""]).each_cons(2).with_index do |(word, nextWord), i|
        startWord = i.zero? ? 1 : 0
        exists = @brain.get_first_value(
          "select count(*) from `#{tableName}` where StartWord = ? and Word = ? and NextWord = ?",
          startWord, word, nextWord)
        next if exists > 0
        @brain.execute "insert into `#{tableName}` (StartWord, Word, NextWord) values (?, ?, ?)",
                       [startWord, word, nextWord]
      end
    end
  end

  # Interactive training: read an input line, then the answer to learn for it
  def getQuestionAnswerPair
    print "> "
    input = $stdin.gets.strip
    puts "Answer"
    print "> "
    answer = $stdin.gets.strip

    trainNewAnswer(input, tokenizeInput(answer))
  end

  # Random first link from the input's table, then random next links until the "" end marker
  def generateResponse(input, maxWords = 50)
    tableName = @brain.get_first_value("select Pointer from Inputs where Input like ? limit 1", input)
    return "..." if tableName.nil? # nothing trained for this input

    response = @brain.get_first_row(
      "select Word, NextWord from `#{tableName}` where StartWord = 1 order by random() limit 1")
    return "..." if response.nil?

    # Cap the length so a cycle such as "i do i do ..." cannot run forever
    while response.last != "" && response.length < maxWords
      nextWord = @brain.get_first_value(
        "select NextWord from `#{tableName}` where StartWord = 0 and Word = ? order by random() limit 1",
        response.last)
      break if nextWord.nil?
      response.push nextWord
    end
    response.pop if response.last == ""

    # Join with spaces, then pull punctuation back onto the word before it
    response.join(" ").gsub(/ ([.?!,])/, '\1')
  end
end

b = Brain.new("brain2.db")

# Uncomment to add training pairs before generating
#b.getQuestionAnswerPair
#b.getQuestionAnswerPair
#b.getQuestionAnswerPair

print "> "
input = $stdin.gets.strip
p b.generateResponse input

This post was edited by AbDuCt on Mar 16 2016 06:56pm

Ideophobe

#27

Mar 16 2016 09:47pm

i just told you this is the exact problem that you would have trying to chain word by word

the word "you" ends the response 9 times out of 10 because there's a 90% chance that punctuation comes after the word "you"
as you give it more options you make it more random, unless you're going to try to think up strings that only share words where the paths intersect at cohesive points, and although that would be an interesting project, you're not going to do that
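The "90% chance that punctuation comes after you" argument is about the first-order transition probabilities, which are just normalized bigram counts. A minimal sketch of estimating P(next token | word); the helper name and the sentences are illustrative, not the actual training data.

```ruby
# Estimate P(next token | word) from bigram counts in a list of sentences
def next_token_distribution(sentences, word)
  counts = Hash.new(0)
  sentences.each do |s|
    tokens = s.scan(/[\w']+|[.?!]/) # words (with apostrophes) and sentence punctuation
    tokens.each_cons(2) { |a, b| counts[b] += 1 if a.casecmp?(word) }
  end
  total = counts.values.sum
  counts.transform_values { |c| c.fdiv(total) }
end

# Illustrative sentences only
sentences = ["Hi I've been waiting for you!", "Why are you so late?", "I missed you."]
p next_token_distribution(sentences, "you")
```

In a first-order chain this distribution is everything the model knows once it reaches "you": the chance of stopping there depends only on how often punctuation followed "you" in training, not on the words generated before it.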

no matter how much you want it to language processing will never be possible with basic 2d logic like markov chains

a working implementation of an absorbing markov chain in a chat bot would be making two bots talk to each other

Code

# each reply is a hard-coded state change
bot1 = "hello"
bot2 = nil
if bot1 == "hello"
  bot2 = "hi there"
  puts bot2
end
if bot2 == "hi there"
  bot1 = "are u flirting with me?"
  puts bot1
end
if bot1 == "are u flirting with me?"
  bot2 = "no i'm just trying to explain to you that markov chains are simple logic based on state changes and you can't use them to create anything dynamic"
  puts bot2
end

or directing it through multiple preset paths that make sense

Code

question = "hi"

response = []
if question == "hi"
  response << "hi there"
  choice = %w[who what when where why].sample
  case choice
  when "who"   then response << "who are you?"; choice = "end"
  when "what"  then response << "what's up?"; choice = "end"
  when "when"  then response << "when did you get here?"; choice = "end"
  when "why"   then response << "why are you talking to me?"; choice = %w[end dontend].sample
  when "where" then response << "where am i?" # no follow-up line on this path
  end
  if choice == "end"
    response << "\nwell? answer me! Now"
  elsif choice == "dontend"
    response << "\n NEVER FUCKING MIND DO WHATEVER YOU WANT WITH YOUR BROKEN BOT "
  end
  puts response.join(" ")
end

bayesian networks will work better
https://en.wikipedia.org/wiki/Bayesian_network
but

Quote

In 1990 while working at Stanford University on large bioinformatic applications, Greg Cooper proved that exact inference in Bayesian networks is NP-hard.[20] This result prompted a surge in research on approximation algorithms with the aim of developing a tractable approximation to probabilistic inference. In 1993, Paul Dagum and Michael Luby proved two surprising results on the complexity of approximation of probabilistic inference in Bayesian networks.[21] First, they proved that there is no tractable deterministic algorithm that can approximate probabilistic inference to within an absolute error ɛ < 1/2. Second, they proved that there is no tractable randomized algorithm that can approximate probabilistic inference to within an absolute error ɛ < 1/2 with confidence probability greater than 1/2.

This post was edited by Ideophobe on Mar 16 2016 10:15pm

carteblanche

Member

Posts: 32,925

Joined: Jul 23 2006

Gold: 3,804.50

#28

Mar 16 2016 11:12pm

my exposure to markov chains is limited to a small section from "Programming Pearls" i read years ago. if i remember, i'll try digging up my book and see what insight it offers. not sure where it is since i moved last year.

AbDuCt

#29

Mar 16 2016 11:19pm

Quote (Ideophobe @ Mar 16 2016 11:47pm)

Code

or directing it through multiple preset paths that make sense

Code

bayesian networks will work better
https://en.wikipedia.org/wiki/Bayesian_network
but

Wow, you actually linked a wiki article for once rather than just saying something is wrong.

Markov chains can be trained better; I was simply showing that the thing works, which you said it wouldn't.

For instance, if you count how many times a specific word pairing is used, you can do a weighted random choice. You can also increase the order of the chain to make more concise sentences. A first-order chain by design produces the most unique and random outputs. Not to mention I haven't even scored the responses or processed them through NLP libraries. If you generate a few thousand responses and score them, the results turn out better. I wouldn't be surprised if you didn't know that, though.
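Both ideas (counting word pairings for a weighted random choice, and raising the order of the chain) fit in one small structure. This is a sketch under assumptions: the class name `WeightedChain`, the `:start`/`:stop` padding symbols and the `max_words` cap are hypothetical, and it keeps everything in memory rather than in the SQLite tables from the PoC.

```ruby
# An order-n word chain that counts transitions and samples them by weight
class WeightedChain
  START = :start # padding before the first word
  STOP = :stop   # marks the end of a sentence

  def initialize(order = 2)
    @order = order
    # state (array of the previous n words) => { next word => count }
    @counts = Hash.new { |h, k| h[k] = Hash.new(0) }
  end

  def train(sentence)
    words = [START] * @order + sentence.split + [STOP]
    words.each_cons(@order + 1) do |window|
      @counts[window[0...-1]][window[-1]] += 1
    end
  end

  def generate(rng: Random.new, max_words: 50)
    state = [START] * @order
    out = []
    max_words.times do
      nxt = weighted_pick(@counts.fetch(state, {}), rng)
      break if nxt.nil? || nxt == STOP
      out << nxt
      state = state.drop(1) + [nxt]
    end
    out.join(" ")
  end

  private

  # Choose a key with probability count / total
  def weighted_pick(options, rng)
    total = options.values.sum
    return nil if total.zero?

    r = rng.rand(total) # integer in 0...total
    options.each do |word, count|
      return word if r < count
      r -= count
    end
    nil
  end
end

chain = WeightedChain.new(2)
chain.train("i do too")
chain.train("so do i")
puts chain.generate
```

With order 2 the state ["i", "do"] can only continue to "too", so an "i do i do ..." loop cannot appear on this data; the price is fewer novel sentences, which is the first-order versus higher-order trade-off mentioned above.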

Have you even read anything about Markov chains yet?

Edit:: not to mention that in my first or second post I said I might not even use Markov chains and that I am looking at other predictive models.

Edit2:: the real problem is not text generation and processing but subject awareness. That's where the whole database schema of sub-chains belonging to specific inputs came from.

This post was edited by AbDuCt on Mar 16 2016 11:28pm

Minkomonster

Member

Posts: 1,995

Joined: Jun 28 2006

Gold: 7.41

#30

Mar 17 2016 12:32am

Markov chains. Fun stuff. I don't have any papers, but I can at least give my input.

So, for Markov chains to work well they need to be trained. In a nutshell, they are basically a finite state machine with probabilistic transitions. The simplest way to do this is to have a large text corpus from which you can build the FSM; then, when the user types something, you can use natural language processing to parse the sentence into a set of keywords and select one of them to build the Markov chain from. But you already know this, and you have already gotten this far.

Your issue, as I understand it, is wanting varied responses. You don't just want to select the subject and go with that, because then the conversation doesn't seem natural; it will be very rigid. So a suggestion would be to create a Markov chain of statement -> response templates: read the input, determine which sentence structure the statement best fits, and then build a Markov chain response from that. Basically, train your bot not just with a list of words and their probable successors but also with simple sentence formats and their probable response formats. Then just let the Markov chain generate your response from that, retrain, and go again, instead of trying to force which route it takes based on a predefined set of response formats.
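A rough sketch of the statement -> response template idea, under loud assumptions: the "sentence format" of a line is reduced to one of three hypothetical tags chosen by hand-written rules (a real version would use a parser or NLP library), and the chain learns which reply format tends to follow which input format. Feeding the chosen format into the word-level chain is not shown.

```ruby
# Hypothetical coarse "sentence format" for a line of text
def classify(sentence)
  s = sentence.strip.downcase
  return :greeting if s.match?(/\A(hi|hello|hey)\b/)
  return :question if s.end_with?("?")
  :statement
end

# Markov chain over formats: statement format => { response format => count }
class TemplateChain
  def initialize
    @counts = Hash.new { |h, k| h[k] = Hash.new(0) }
  end

  def train(statement, response)
    @counts[classify(statement)][classify(response)] += 1
  end

  # Sample a response format for new input, weighted by how often it was seen
  def response_format(input, rng: Random.new)
    options = @counts.fetch(classify(input), {})
    return nil if options.empty?

    r = rng.rand(options.values.sum)
    options.each do |format, count|
      return format if r < count
      r -= count
    end
    nil
  end
end

chain = TemplateChain.new
chain.train("Hello", "Hi there!")
chain.train("Hello", "How are you?")
chain.train("What is your name?", "I am a bot.")
p chain.response_format("hey bot")    # prints :greeting or :question
p chain.response_format("I like red") # prints nil (no statement pairs trained yet)
```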

In theory? I don't know, sounds interesting. In practice? Well, you are attempting to design something that could pass a Turing test, so good luck with that. If you are looking for interesting reads, I would suggest the very paper that described the Turing test: Alan Turing's "Computing Machinery and Intelligence".
