d2jsp
Python Programming Assignment
Member
Posts: 18,969
Joined: Aug 16 2007
Gold: 16,089.87
Jan 31 2015 12:27pm
I'm writing a Python program that parses multiple basic text input files and generates a single inverted index output file.
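
For reference, an inverted index maps each term to the documents that contain it. Below is a minimal sketch, assuming each file has already been split into a list of terms. The helper names `build_index` and `write_index` and the `term: doc1 doc2` output format are hypothetical choices, not part of the assignment:

```python
from collections import defaultdict

def build_index(documents):
    """Map each term to the sorted list of document names containing it.

    `documents` is a hypothetical dict of {document name: list of terms}.
    """
    postings = defaultdict(set)  # term -> set of document names
    for name, terms in documents.items():
        for term in terms:
            postings[term].add(name)
    # Sort the terms and each posting list so the output is stable.
    return {term: sorted(names) for term, names in sorted(postings.items())}

def write_index(index, out_path):
    """Write one line per term, 'term: doc1 doc2 ...' (an assumed format)."""
    with open(out_path, "w") as out:
        for term, names in index.items():
            out.write(term + ": " + " ".join(names) + "\n")

docs = {"a.txt": ["cat", "dog"], "b.txt": ["dog", "fish"]}
print(build_index(docs))  # {'cat': ['a.txt'], 'dog': ['a.txt', 'b.txt'], 'fish': ['b.txt']}
```

Using a set per term means a word that appears many times in one file is only listed once for that file.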

I'm looking for some tips. Let me know if I'm doing these steps incorrectly or if anything needs to be added.

First thing: take in the arguments (the program name and the name of the folder containing the files). I'll probably need a loop to go through each file?
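
One way to read the folder name from the arguments and loop over its files is sketched below, assuming the folder is passed as a single argument (python progname.py foldername). The names `iter_files` and `main` are hypothetical, and the per-line parsing is left as a placeholder:

```python
import os
import sys

def iter_files(folder):
    """Yield the path of every regular file directly inside `folder`, sorted by name."""
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path):  # skip subdirectories
            yield path

def main(argv):
    # argv[0] is the program name; argv[1] is assumed to be the folder name.
    if len(argv) != 2:
        print("usage: python progname.py foldername")
        return False
    for path in iter_files(argv[1]):
        with open(path) as f:
            for line in f:
                pass  # hypothetical: tokenize `line` and add its terms to the index
    return True

if __name__ == "__main__":
    main(sys.argv)
```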

I need to sort the dictionary of terms and strip the terms.
I'll strip before sorting, so would it be best to strip line by line?
I'm assuming that once I parse the files I can just .sort them?
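
Stripping line by line and then sorting could look like the sketch below, assuming whitespace-separated terms. The helper `terms_in_line`, the punctuation set passed to `str.strip`, and lowercasing are illustrative assumptions. Note that a Python dict has no `.sort()` method; `sorted(d)` returns its keys as a new sorted list instead:

```python
def terms_in_line(line):
    """Split a line on whitespace and strip surrounding punctuation from each term."""
    terms = []
    for raw in line.split():
        # The punctuation set and the lowercasing are assumptions.
        term = raw.strip(".,;:!?\"'()").lower()
        if term:
            terms.append(term)
    return terms

counts = {}
for line in ["The cat sat.", "The dog, and the cat!"]:
    for term in terms_in_line(line):
        counts[term] = counts.get(term, 0) + 1

# sorted() on a dict iterates over its keys and returns them as a sorted list.
for term in sorted(counts):
    print(term, counts[term])
```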

I guess I'm not really sure where to go from here. Could you guys push me in the right direction?

Thanks :)
Member
Posts: 18,969
Joined: Aug 16 2007
Gold: 16,089.87
Jan 31 2015 03:18pm
When using glob.glob(path), can I somehow make the path a variable, or is there another way to do this?

I need it to go through the multiple files in the directory given by the user's input, for example "directory/*", which gives me all the files in that directory.

Terminal
python progname.pyc directoryname/*

How can I make "directoryname/*" a variable?

This post was edited by Trev on Jan 31 2015 03:21pm
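
glob.glob() accepts any string, so the pattern can be built from a variable. The sketch below assumes the user passes the directory name without the * (python progname.py directoryname) and adds the wildcard in Python with os.path.join:

```python
import glob
import os
import sys

# Assumed usage: python progname.py directoryname
# (no *, so the shell has nothing to expand).
directory = sys.argv[1] if len(sys.argv) > 1 else "."
pattern = os.path.join(directory, "*")  # e.g. "directoryname/*"
# The pattern is an ordinary string variable; keep only regular files.
paths = sorted(p for p in glob.glob(pattern) if os.path.isfile(p))
for path in paths:
    print(path)
```

Quoting the argument instead (python progname.py 'directoryname/*') also stops a Unix shell from expanding it, in which case sys.argv[1] can be passed to glob.glob() directly.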
Member
Posts: 32,925
Joined: Jul 23 2006
Gold: 3,804.50
Jan 31 2015 03:22pm
http://stackoverflow.com/questions/2186525/use-a-glob-to-find-files-recursively-in-python

Is that what you wanted? Or are you trying to use a wildcard? If that's the case, just use a regex on the list of files.
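
Filtering a list of file names with a regex might look like this. The file list and the `.txt` pattern are made-up examples:

```python
import re

# Hypothetical file list, e.g. the result of os.listdir() or glob.glob().
files = ["notes.txt", "data.csv", "chapter1.txt", "README"]
# Keep only names ending in ".txt"; the pattern is an example assumption.
txt_pattern = re.compile(r"\.txt$")
txt_files = [name for name in files if txt_pattern.search(name)]
print(txt_files)  # ['notes.txt', 'chapter1.txt']
```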
Member
Posts: 18,969
Joined: Aug 16 2007
Gold: 16,089.87
Jan 31 2015 03:25pm
Quote (carteblanche @ Jan 31 2015 03:22pm)
http://stackoverflow.com/questions/2186525/use-a-glob-to-find-files-recursively-in-python

is that what you wanted? or are you trying to use a wildcard? just use regex on the list of files if that's the case.


Sort of. I just need the variable to come from the terminal, and I'm not sure how to make it a variable with the *, because the * grabs all the files inside the folder.
Member
Posts: 18,969
Joined: Aug 16 2007
Gold: 16,089.87
Jan 31 2015 03:29pm
Essentially, when I use directoryname/*, argv[1] (which would be the directory name had there not been a *) is not directoryname/*; it's the first file in the directory. So I can't write Variable = argv[1], because that gives me a file name inside the directory instead of the directory itself from the user's terminal input.
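
This happens because a Unix shell such as bash expands the unquoted directoryname/* before Python starts, so sys.argv[1:] already holds the individual file paths. A minimal sketch that accepts either form, expanded files or a single directory name, is shown below; `input_paths` is a hypothetical helper:

```python
import os
import sys

def input_paths(args):
    """Return file paths from command-line arguments.

    Accepts files already expanded by the shell
    (python progname.py directoryname/*) or a single directory name
    (python progname.py directoryname).
    """
    if len(args) == 1 and os.path.isdir(args[0]):
        folder = args[0]
        return sorted(os.path.join(folder, n) for n in os.listdir(folder)
                      if os.path.isfile(os.path.join(folder, n)))
    # Otherwise treat each argument as a file path, skipping anything else.
    return [a for a in args if os.path.isfile(a)]

if __name__ == "__main__":
    for path in input_paths(sys.argv[1:]):
        print(path)
```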
Member
Posts: 62,215
Joined: Jun 3 2007
Gold: 9,039.20
Member
Posts: 18,969
Joined: Aug 16 2007
Gold: 16,089.87
Feb 1 2015 10:33pm
Quote (j0ltk0la @ Jan 31 2015 07:50pm)


Looking at this closely, I think this is what I need.

How would I strip everything but letters from each word? I need to strip commas, quotes, apostrophes, numbers, etc. out of the documents.
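
A regex substitution that deletes every non-letter is one way to do this. The sketch assumes ASCII letters only and lowercases the result; both are assumptions, and `clean_word` is a hypothetical helper:

```python
import re

NON_LETTERS = re.compile(r"[^A-Za-z]+")  # ASCII letters only (an assumption)

def clean_word(word):
    """Remove every character that is not an ASCII letter, then lowercase."""
    return NON_LETTERS.sub("", word).lower()

words = ['"Hello,', "don't", "42nd", "etc.", "!!!"]
cleaned = [w for w in (clean_word(x) for x in words) if w]
print(cleaned)  # ['hello', 'dont', 'nd', 'etc']
```

Replacing with "" joins the pieces ("don't" becomes "dont"); replacing with " " and splitting again would produce separate words instead.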
Member
Posts: 32,925
Joined: Jul 23 2006
Gold: 3,804.50
Feb 2 2015 04:37am
Quote (Trev @ Feb 1 2015 11:33pm)
Looking at this closely, I think this is what I need.

How would I strip all but letters for each word? Need to strip out from documents commas, quotes, apostrophes, numbers, ect..


Just do a regex replace?
Member
Posts: 62,215
Joined: Jun 3 2007
Gold: 9,039.20
Feb 2 2015 05:48am
Quote (carteblanche @ Feb 2 2015 04:37am)
just do a regex replace?


Yeah,

text = text.replace(character, replacement)  # str.replace returns a new string

or

import re
pattern = re.compile(r'super_contrived_pattern', re.IGNORECASE)  # placeholder pattern
text = pattern.sub(replacement, text)
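
str.replace handles one substring per call, so removing a whole set of characters means chaining calls. As an alternative sketch, not something suggested in the thread, str.maketrans with a third argument builds a table of characters to delete, and str.translate removes them all in one pass. Using string.punctuation plus string.digits as the set to delete is an assumption based on the characters listed earlier in the thread:

```python
import string

# Characters to delete: ASCII punctuation (commas, quotes, apostrophes, ...)
# plus the digits 0-9. This set is an assumption; adjust as needed.
DELETE_TABLE = str.maketrans("", "", string.punctuation + string.digits)

def strip_chars(text):
    """Return `text` with every character listed in DELETE_TABLE removed."""
    return text.translate(DELETE_TABLE)

print(strip_chars("It's 2015, \"etc.\""))
```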