d2jsp
Perl: Reading In File To Hash (please Help)
Member
Posts: 5,102
Joined: Jan 2 2010
Gold: 45,375.00
Aug 26 2021 02:17pm
File looks like:

ENSG00012 ENSMUSG0000164
ENSG00256 ENSMUSG0004544
ENSG00089 ENSMUSG0000553

Two columns, tab separated. The file has 27,000 nonempty lines.

# My Program:
my %humanmouse;
open my $fh, '<', '/location/of/file.txt' or die "Unable to open file: $!\n";
while (my $line = <$fh>) {
    chomp($line);
    my ($mouse, $human) = split /\s+/, $line;
    $humanmouse{$human} = $mouse;
}
close $fh;

print "$_ => $humanmouse{$_}\n" for keys %humanmouse;

When I print out the hash, I only get 20,312 entries instead of the 27,000 I expect. I've tried almost every combination and permutation of this code, and I can't get it to read in or print out all of the lines in the input file.
There are no errors with indenting, formatting, etc.
If you see a clear error and it fixes the problem, I can pay 100-500fg. If the solution is more intricate/complicated, I can pay up to 1000fg. If you want to work with me on the program, I can pay more.
I cannot share/upload the script or input file(s).
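A quick way to tell whether lines are being lost while reading or collapsed inside the hash is to count the lines read and compare that with the number of keys stored. The sketch below assumes the tab-separated layout described above and uses a small hypothetical three-line sample (read through an in-memory filehandle) instead of the real file; `$lines_read` and `$key_count` are illustrative names.

```perl
use strict;
use warnings;

# Hypothetical sample input (not the real file): the third line reuses the
# second-column value of the first line.
my $data = "ENSG00012\tENSMUSG0000164\n"
         . "ENSG00256\tENSMUSG0004544\n"
         . "ENSG00089\tENSMUSG0000164\n";

# Read from the string as if it were a file; swap in the real path to use it.
open my $fh, '<', \$data or die "Unable to open input: $!\n";

my %humanmouse;
my $lines_read = 0;
while (my $line = <$fh>) {
    chomp $line;
    next unless length $line;        # ignore blank lines
    $lines_read++;
    my ($mouse, $human) = split /\t/, $line;
    $humanmouse{$human} = $mouse;    # a repeated key silently replaces the old value
}
close $fh;

my $key_count = scalar keys %humanmouse;
printf "Lines read: %d, keys stored: %d, overwritten: %d\n",
    $lines_read, $key_count, $lines_read - $key_count;
```

On the sample this prints `Lines read: 3, keys stored: 2, overwritten: 1`. If the real file shows 27,000 lines read but only 20,312 keys, every line is being read and the difference comes from repeated keys.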
Member
Posts: 12,703
Joined: May 17 2013
Gold: 12,935.00
Aug 26 2021 04:16pm
your issue is that you have multiple lines with the same key, and since a hash can only hold one value per key, each later line overwrites the value stored by an earlier line with that key.

Instead of assigning the value directly, store an array (reference) for each key and push each value (in this case, $mouse) onto it.
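A minimal sketch of that "hash of arrays" approach, assuming the same tab-separated two-column layout and using a hypothetical in-memory sample instead of the real file. `push` on `@{ $humanmouse{$human} }` autovivifies an empty array reference the first time a key appears, so no `exists` check is needed.

```perl
use strict;
use warnings;

# Hypothetical sample input: two lines share the key ENSMUSG0000164.
my $data = "ENSG00012\tENSMUSG0000164\n"
         . "ENSG00256\tENSMUSG0004544\n"
         . "ENSG00089\tENSMUSG0000164\n";
open my $fh, '<', \$data or die "Unable to open input: $!\n";

my %humanmouse;
while (my $line = <$fh>) {
    chomp $line;
    next unless length $line;
    my ($mouse, $human) = split /\t/, $line;
    # push autovivifies an array ref on first sight of a key,
    # so every value from every line is kept.
    push @{ $humanmouse{$human} }, $mouse;
}
close $fh;

for my $key (sort keys %humanmouse) {
    print "$key => ", join(', ', @{ $humanmouse{$key} }), "\n";
}
```

Every value is now stored, so counting all elements across the arrays should match the number of nonempty lines.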
Member
Posts: 4,377
Joined: Jan 1 2018
Gold: 24,110.00
Sep 29 2021 06:56am
his issue is it's perl
Member
Posts: 836
Joined: Jun 21 2015
Gold: 3.89
Oct 3 2021 09:24am
your code is behaving normally. hash keys are unique, so chances are you have duplicate keys in your file (the same value appearing more than once in the column you use as the key).

in your loop, check for duplicates..

Code
my $dupe_count = 0;
while (my $line = <$fh>) {
    chomp($line);
    my ($mouse, $human) = split /\s+/, $line;
    if (exists $humanmouse{$human}) {
        print "Duplicate line: $line\n";
        $dupe_count++;
    }
    $humanmouse{$human} = $mouse;
}

close $fh;

print "Duplicates: $dupe_count\n" if $dupe_count;


or just compare the number of unique lines in your file with the total line count (uniq only collapses adjacent duplicates, so sort first):

Code
echo -n "Unique lines in file: "; sort -u /location/of/file.txt | wc -l
echo -n "Total lines in file: "; wc -l < /location/of/file.txt
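The line-count check compares whole lines, but two lines can differ in the first column and still share the second-column value, which collides in the hash just the same. A sketch that counts the key column directly, assuming the tab-separated layout and the second column as the key (as in the original loop), with a hypothetical in-memory sample:

```perl
use strict;
use warnings;

# Hypothetical sample input: all three lines are unique as whole lines,
# but two of them share the same second-column key.
my $data = "ENSG00012\tENSMUSG0000164\n"
         . "ENSG00256\tENSMUSG0004544\n"
         . "ENSG00089\tENSMUSG0000164\n";
open my $fh, '<', \$data or die "Unable to open input: $!\n";

my %count;
my $total = 0;
while (my $line = <$fh>) {
    chomp $line;
    next unless length $line;
    $total++;
    my (undef, $key) = split /\t/, $line;   # second column is what the hash is keyed on
    $count{$key}++;
}
close $fh;

# Keys seen more than once, in sorted order.
my @repeated = grep { $count{$_} > 1 } sort keys %count;
print "Total lines:   $total\n";
print "Distinct keys: ", scalar(keys %count), "\n";
print "$_ appears $count{$_} times\n" for @repeated;
```

Here `sort -u` would report 3 unique lines, while this reports 2 distinct keys, which is the number the hash will actually hold.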



if you want to keep multiple 'mouse' values for a 'human', check whether the key already exists and, if it does, store the $mouse values for $humanmouse{$human} in an array (reference) and push each new one onto it.

Code

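while (my $line = <$fh>) {
    chomp($line);
    my ($mouse, $human) = split /\s+/, $line;
    if (my $existing_mouse = $humanmouse{$human}) {
        if (ref $existing_mouse ne 'ARRAY') {
            # second match: replace the scalar with an array ref holding both values
            $humanmouse{$human} = [$existing_mouse, $mouse];
        }
        else {
            # third and later matches: append to the existing array
            push @{$humanmouse{$human}}, $mouse;
        }
    }
    else {
        # first match: store a plain scalar
        $humanmouse{$human} = $mouse;
    }
}

close $fh;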
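With that loop, some hash values are plain strings and others are array references, so the print line from the original post would show something like `ARRAY(0x...)` for keys with several matches. A sketch of printing both shapes the same way; the hash contents below are hypothetical sample data in the mixed shape the loop produces.

```perl
use strict;
use warnings;

# Hypothetical hash in the mixed shape built by the loop above:
# single matches stay plain scalars, repeated keys hold an array ref.
my %humanmouse = (
    ENSMUSG0000164 => ['ENSG00012', 'ENSG00089'],
    ENSMUSG0004544 => 'ENSG00256',
);

my @output;
for my $key (sort keys %humanmouse) {
    my $value = $humanmouse{$key};
    # Normalise to a list so both shapes print the same way.
    my @matches = ref $value eq 'ARRAY' ? @$value : ($value);
    push @output, "$key => " . join(', ', @matches);
}
print "$_\n" for @output;
```

Storing an array reference for every key (even single matches) avoids this `ref` check entirely, at the cost of always dereferencing when reading values.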


This post was edited by SLAMBOOZLED on Oct 3 2021 09:50am
Member
Posts: 13,452
Joined: Jan 14 2008
Gold: 116.00
Oct 5 2021 10:16am
Quote (dnf @ 29 Sep 2021 13:56)
his issue is it's perl


Hot take!

Quote (SLAMBOOZLED @ 3 Oct 2021 16:24)
your code is behaving normally. hash keys are unique so chances are.. you have duplicate lines in your file.

Seems like the issue to me. Couple of quick changes to the duplicate check version (you've got some errant characters):

Code

my $dupe_count = 0;
while (my $line = <$fh>) {
    chomp($line);
    my ($mouse, $human) = split /\s+/, $line;
    if (exists $humanmouse{$human}) {
        print "Duplicate line: ${line}\n";
        $dupe_count++;
    }
    $humanmouse{$human} = $mouse;
}

close $fh;

print "Duplicates: ${dupe_count}\n" if $dupe_count;
Member
Posts: 5,102
Joined: Jan 2 2010
Gold: 45,375.00
Oct 5 2021 06:04pm
Quote (dnf @ Sep 29 2021 05:56am)
his issue is it's perl


this fixed it. thank you guys
Member
Posts: 4,377
Joined: Jan 1 2018
Gold: 24,110.00
Oct 6 2021 03:10am
hahaha harmless fun boys, glad you solved your problem. gl!
Member
Posts: 781
Joined: Oct 7 2021
Gold: 2,230.50
Oct 8 2021 11:11pm
Quote (dnf @ Sep 29 2021 05:56am)
his issue is it's perl


lmao this is the correct answer