Recently I have been tidying up data for my research projects at NUS. Dealing with a few TB of data in one day made me slightly paranoid about its integrity: where should the data be stored, which archiving and compression protocol should be used, which local/remote file transfer algorithms should be used, and even which medium should carry it, USB or Ethernet?
I am writing this post not as a guideline, but mainly for self-reference and, hopefully, as a prompt for discussion.
The boom in bioinformatics in recent years has been coupled with cheaper technologies and, consequently, a surge in the amount of data available. The rapid development of the field is itself an anti-establishment movement: even the most experienced bioinformaticians must spend a significant amount of time keeping up to date with the resources and toolkits.