Open Source Software Notes
I have to make a lot of notes to myself about how to do stuff on the computer.
Linux/Kubuntu
to copy files quickly over the network with ssh:
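One common way to copy a directory quickly over ssh is to stream it as a single tar archive, which avoids per-file overhead when there are many small files. This is a sketch assuming GNU tar on both machines and a working OpenSSH login; `copy_over_ssh` and its arguments are hypothetical names.

```shell
# Copy SRC_DIR to DEST_DIR on a remote machine by piping a gzipped tar stream through ssh.
# copy_over_ssh is a hypothetical helper name; assumes GNU tar on both ends, and that
# DEST_DIR already exists on the remote side and contains no single quotes.
# Usage: copy_over_ssh ~/project user@example.org /scratch/backup
copy_over_ssh() {
  local src=$1 remote=$2 dest=$3
  tar -C "$(dirname "$src")" -czf - "$(basename "$src")" \
    | ssh "$remote" "tar -C '$dest' -xzf -"
}
```

On a fast local network, dropping the `z` flags on both sides may be quicker, since compression can become the bottleneck.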
to find the day of year of a particular date:
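With GNU date (standard on Linux), the `%j` format prints the day of the year, zero-padded to three digits. A minimal sketch; `day_of_year` is a hypothetical wrapper name:

```shell
# Print the day of the year (001-366) for any date string that GNU date -d understands.
# day_of_year is a hypothetical helper; BSD/macOS date uses different options.
day_of_year() {
  date -d "$1" +%j
}

# Example: day_of_year 2012-03-01   (2012 is a leap year, so this prints 061)
```

GNU date also accepts `%-j` if you want the number without leading zeros.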
to get the details (ls -l) on an arbitrary list of files, one path per line in filelist.txt:
while IFS= read -r file; do ls -l -- "$file"; done < filelist.txt
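If the list is long, GNU xargs can pass many names to a single `ls` call instead of starting one `ls` per file. A sketch assuming GNU findutils (`-d` and `-r` are GNU options); `list_details` is a hypothetical name:

```shell
# Run ls -l on every path listed (one per line) in LIST_FILE.
# -d '\n' splits only on newlines, so names containing spaces survive;
# -r skips running ls entirely when the list is empty.
list_details() {
  xargs -r -d '\n' ls -l -- < "$1"
}

# Example: list_details filelist.txt
```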
to quickly scan through a text file for a word, then use ‘n’ and ‘N’ to jump to the next and previous match:
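In `less`, the `-p` option opens a file at the first match of a pattern, after which ‘n’ and ‘N’ move between matches. A sketch assuming `less` is installed; `scan_for_word` is a hypothetical name:

```shell
# Open FILE in less, starting at the first match of PATTERN (a regular expression).
# Inside less: n repeats the search forward, N repeats it backward, q quits.
# scan_for_word is a hypothetical helper name.
scan_for_word() {
  less -p "$2" "$1"
}

# Example: scan_for_word notes.txt ssh
```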
to remove all of the blank lines in a text document:
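With GNU sed, deleting every line that is empty or contains only whitespace takes a single address. A sketch that edits the file in place (`-i` is a GNU sed option); `remove_blank_lines` is a hypothetical name:

```shell
# Delete blank lines (empty or whitespace-only) from FILE in place.
# Use '/^$/d' instead to remove only completely empty lines.
remove_blank_lines() {
  sed -i '/^[[:space:]]*$/d' "$1"
}

# Example: remove_blank_lines report.txt
```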
to add an extension (here .csv) to all files in a directory:
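A plain shell loop is enough here. This sketch renames every regular file in the current directory and skips subdirectories; `add_extension` is a hypothetical name, and the extension is passed without the dot:

```shell
# Append ".EXT" to the name of every regular file in the current directory.
# add_extension is a hypothetical helper; example: add_extension csv
add_extension() {
  local ext=$1 f
  for f in ./*; do
    [ -f "$f" ] || continue   # skip directories (and the unexpanded ./* in an empty dir)
    mv -- "$f" "$f.$ext"
  done
}
```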
Bioinformatics
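to prepend a header line “>name” (the file name without its extension) to every FASTA file in a directory:
for file in ./*.fasta; do
  base=${file##*/}               # strip the leading ./ path
  name=${base%.*}                # strip the .fasta extension
  sed -i "1i \>$name" "$file"    # GNU sed: insert the header above line 1
  echo "$name"
done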
to download complete genome sequences from the JGI Integrated Microbial Genomes (IMG) system, given a list of IMG taxon IDs (one per line in input.txt):
BASE="http://img.jgi.doe.gov/cgi-bin/pub/main.cgi?section=TaxonDetail&downloadTaxonFnaFile=1&_noHeader=1&taxon_oid="
while read -r i; do
  echo "$i"
  FILE="$i.fasta"
  URL="${BASE}${i}"
  wget "$URL" -O "$FILE"
done < input.txt
to find all of the EC numbers in [file], sort, de-replicate, count, and print them in order of decreasing frequency:
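One way to do this is a `grep -o | sort | uniq -c | sort` pipeline. The sketch below assumes the EC numbers appear with an "EC" prefix (as in `EC:1.1.1.1` or `EC 3.2.1.-`); the regular expression and the name `count_ec_numbers` are assumptions to adapt to your file.

```shell
# Extract EC numbers such as "EC:1.1.1.1" or "EC 3.2.1.-" from FILE, strip the prefix,
# count each distinct number, and list them most frequent first (ties alphabetically).
# ASSUMPTION: every EC number carries an "EC" prefix; adjust the regex if yours do not.
count_ec_numbers() {
  grep -oE 'EC[: ]?[0-9]+(\.([0-9]+|-)){3}' "$1" \
    | sed -E 's/^EC[: ]?//' \
    | sort | uniq -c | sort -k1,1nr -k2,2
}

# Example: count_ec_numbers annotations.txt
```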
ARB import filter to read full_name from a FASTA file. Save it to $ARBHOME/lib/import/
The FASTA header lines should have the form >[name][tab][full_name]
#Global settings:
KEYWIDTH 1
BEGIN ">??*"
MATCH ">*"
SRT "* *=*1:*\t*=*1"
WRITE "name"
MATCH ">*"
SRT "*\t*=*2"
WRITE "full_name"
SEQUENCEAFTER "*"
SEQUENCESRT ""
SEQUENCECOLUMN 0
SEQUENCEEND ">*"
# DONT_GEN_NAMES
CREATE_ACC_FROM_SEQUENCE
END "//"
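If your headers instead use the usual FASTA convention of a space between the ID and the description, a sketch like the following converts the first space of each header line into the tab the filter above expects. It assumes GNU sed and IDs that contain no spaces; `to_arb_headers` is a hypothetical name.

```shell
# Turn FASTA headers of the form ">name description" into ">name<TAB>description".
# Only the first space on each header line is replaced; sequence lines are untouched.
# \t in the replacement is a GNU sed extension.
# to_arb_headers is a hypothetical helper; example: to_arb_headers in.fasta > for_arb.fasta
to_arb_headers() {
  sed '/^>/ s/ /\t/' "$1"
}
```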
perl script to translate names in tree files or sequence files, given the file to convert and a 2-column, tab-separated translation table (old name, new name). It will probably need to be edited depending on the type of file. Save it as ‘myconvert.pl’, make it executable with ‘chmod +x myconvert.pl’, and run it as ‘./myconvert.pl [treefile] [translationfile]’; the converted file is printed to standard output.
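#!/usr/bin/perl
use strict;
use warnings;

my ($treefile, $translatefile) = @ARGV;   # newick-like tree; 2-column tab-separated table
die "usage: $0 treefile translationfile\n" unless defined $translatefile;

# read the translation table: old_name<TAB>new_name
my %namehash = ();
open(my $tfh, '<', $translatefile) or die "cannot open $translatefile: $!";
while (<$tfh>) {
    chomp;
    my @array = split(/\t/);          # split on tab
    next unless @array >= 2;          # skip blank or malformed lines
    $array[1] =~ s/[ \/\(\)']/_/g;    # replace bad chars with underscore
    $namehash{$array[0]} = $array[1];
}
close $tfh;

# try longer names first so that, e.g., "seq10" is not partly rewritten by the rule for "seq1"
my @names = sort { length($b) <=> length($a) } keys %namehash;

open(my $fh, '<', $treefile) or die "cannot open $treefile: $!";
while (<$fh>) {
    # chomp;          # uncomment to remove newlines
    # s/^[ \t]*//;    # uncomment to remove whitespace at the beginning of the line
    # s/['"]//g;      # uncomment to delete quotation marks
    foreach my $phyname (@names) {
        s/\Q$phyname\E/$namehash{$phyname}/;   # \Q..\E: match the name literally, not as a regex
    }
    print;
}
close $fh;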
LaTeX
to generate a clean one-page HTML output of a TeX document
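One tool that can do this is pandoc, whose `--standalone` option writes a complete single HTML page. This is an assumption rather than necessarily the tool these notes used, and pandoc understands only a subset of LaTeX (TeX4ht's `make4ht` copes with more packages). `tex_to_html` is a hypothetical name:

```shell
# Convert FILE.tex into a single standalone page FILE.html with pandoc.
# --mathjax renders equations in the browser with MathJax instead of pandoc's
# default Unicode approximation.
# tex_to_html is a hypothetical helper; example: tex_to_html paper.tex
tex_to_html() {
  local tex=$1
  pandoc --standalone --mathjax "$tex" -o "${tex%.tex}.html"
}
```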
to convert normal quotes into LaTeX quotes
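A sed substitution can turn each pair of straight double quotes on a line into LaTeX's ``...'' form. This sketch assumes each quotation opens and closes on the same line; `convert_quotes` is a hypothetical name, and the result goes to standard output so you can check it before overwriting the original.

```shell
# Replace each "quoted text" with ``quoted text'' and print the result.
# Quotes are paired left to right on each line; a leftover unmatched quote is left alone.
# convert_quotes is a hypothetical helper; example: convert_quotes in.tex > out.tex
convert_quotes() {
  sed -E "s/\"([^\"]*)\"/\`\`\\1''/g" "$1"
}
```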
to globally comment out/not run figures in LaTeX, put it at the end of the preamble
Engauge [Graph/Plot] Digitizer
Use this excellent program to convert an image of a graph into usable X/Y data points. It expects plots with only one Y value for each X value, so rotate images where that is not the case (e.g. P vs. depth profiles) by 90° before you import them. Plots with multiple colors are the easiest to digitize; for those, just use the ‘discretize’ options and turn off the ‘grid removal’ options. There are tutorials available at the Engauge site on SourceForge.