A month ago I wrote a post about the current progress of my Outreachy internship.
Reality once again ruined my plans :) Here is what I’ve done since June 15th:
My initial timeline included a couple of packages that ended up making me work on all of these packages.
The predictprotein package turned out to be a complex pipeline that uses a long list of other packages. There were also disulfinder and proftmb, both programs by RostLab.
Predictprotein raised a ton of errors, and some of them came from the packages it depends on. Since all the RostLab packages are connected (some of them are dependencies of others), I thought it would be good to write tests for them too. That’s why I decided to first write tests for all the packages in this directory, and only then move forward.
I decided to skip some of the RostLab packages, for example pp-popularity-contest, since it doesn’t do anything biomedically significant; it only sends usage reports to RostLab.
I also skipped pssh2 (because for now I couldn’t figure out how to get its sources) and libai-fann-perl (it moved to the Debian Perl Group). For the rest, I did my best to fix as many errors and write as many tests as I could.
While working on them, I learned about autopkgtest-pkg-perl, which helped me a lot.
Some of these packages were written in Fortran, and I was very grateful to my former scientific advisor for asking me to implement an old folding algorithm in Scala. Because of that, I already knew what Fortran code may look like (the algorithm’s parameters came with a README file containing small portions of Fortran code), and I wasn’t afraid of it :)
The number one scariest Fortran program in my personal scariness rating is profnet: this source package produces 8 binary packages, and I wrote one test for all of them.
The test takes the binary package name as a parameter, which is fine since all 8 binary packages have a similar structure.
Apparently, only 5 of the 8 packages work well with this test; the other 3 end with a segmentation fault. I think they need some additional fixes or parameters, but I haven’t yet figured out what’s wrong or which parameters I should provide to run them. That’s why the test for profnet is still incomplete.
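A parameterized test like this boils down to running the same command for each binary and telling an ordinary failure apart from a crash. Below is a minimal sketch in Python, assuming a POSIX system, where `subprocess` reports a process killed by a signal as a negative return code (so a segfault shows up as `-SIGSEGV`). This is not the actual profnet test; the helper names are hypothetical, and the binary names and arguments are whatever you pass in.

```python
import signal
import subprocess


def run_smoke_test(cmd: list[str], timeout: float = 60.0) -> str:
    """Run one command and classify the outcome.

    Returns "ok", "segfault", "missing", "timeout" or "failed (exit N)".
    """
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout)
    except FileNotFoundError:
        return "missing"  # the binary is not installed / not on PATH
    except subprocess.TimeoutExpired:
        return "timeout"
    if result.returncode == 0:
        return "ok"
    # On POSIX, death by signal N is reported as returncode == -N.
    if result.returncode == -signal.SIGSEGV:
        return "segfault"
    return f"failed (exit {result.returncode})"


def check_binaries(names: list[str], extra_args: list[str]) -> dict[str, str]:
    """Run the same smoke test for several binaries sharing one interface."""
    return {name: run_smoke_test([name, *extra_args]) for name in names}
```

Separating "segfault" from other non-zero exits makes it easy to see which binaries crash outright and which merely reject their input or parameters.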
I haven’t fixed profphd yet, since it requires an old version of Perl, and I don’t speak Perl well enough yet to fix it.
Predictprotein turned out to be the worst of all. It requires a ~30 GB database that has to be installed by hand. Even then, the database is outdated and causes an error, because it is an old version of a BLASTP database.
Predictprotein uses the blastpgp program (from the ncbi-blast+ package), and the latest version of this program fails on that database.
That made me think a lot about typical problems with bioinformatics software: the lack of standardization and of database versioning.
However, blastpgp works well with the latest BLASTP database from the NCBI website.
I tried to run it that way, and it works! But I ran it by hand, with a patched profphd version installed (not committed yet). And the database problem remains; I’ll probably try to make a smaller version of the NCBI database and use it to build a test suite for autopkgtest.
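One simple way to get a small test database is to keep only the first few records of a protein FASTA file and index that subset. Below is a minimal sketch in Python. It assumes you start from a protein FASTA file and that the `makeblastdb` tool from ncbi-blast+ is on `PATH`; the helper names are hypothetical. Whether the resulting database format suits the blastpgp call inside predictprotein is not verified here.

```python
from collections.abc import Iterable, Iterator
from pathlib import Path


def first_records(lines: Iterable[str], max_records: int) -> Iterator[str]:
    """Yield the lines belonging to the first `max_records` FASTA records."""
    seen = 0
    for line in lines:
        if line.startswith(">"):
            if seen == max_records:
                return  # stop before the (max_records + 1)-th header
            seen += 1
        if seen:  # skip anything that precedes the first header
            yield line


def write_subset(src: Path, dst: Path, max_records: int) -> None:
    """Write a smaller FASTA file containing the first records of src."""
    with src.open() as fin, dst.open("w") as fout:
        fout.writelines(first_records(fin, max_records))


def makeblastdb_cmd(fasta: Path, out_name: str) -> list[str]:
    """Build the command that indexes the subset as a protein database."""
    return ["makeblastdb", "-in", str(fasta), "-dbtype", "prot", "-out", out_name]
```

Taking the first records is the simplest choice. A test suite may need a subset that is guaranteed to contain hits for its query sequences instead.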
The metastudent and libgo-perl packages raised errors during the predictprotein run, so I had to write tests for these two and fix the errors.
The best thing is that I’ve almost finished those nasty RostLab packages! I hope that next week I can start working on tests for r-cran-bio3d or pymol.