In academia you are constantly sharing information and working on manuscripts. This often requires you to read, refine, edit, update and send back. This was simple with a piece of paper because there was only one piece of paper. When we went electronic we got multiple versions, so along came version control and the infamous “track changes”. This is fine if the document is processed in series, but that is not always time efficient, so we want to do things in parallel. But that is when it all falls down, or does it?
I guess email is the classic way to do this. We write a file, send it off to a colleague, and they read it, modify it and send it back to us. Simple, but storage inefficient. Why? Well, think about it: you generate a copy of the document and it goes out in your email, assuming it is less than 25MB, as most clients block files larger than this. Your colleague then gets this file in their email. Now if there are many collaborators, which is often the case, each one receives this file, and with X collaborators we are looking at X × 24.9MB of space being wasted. We each pull the file out of the email, store it, edit it and then send it back.
But arrrgh, no, wait: W. Bragg has just emailed me an updated version. So I now read that, add my changes to it and send it back. Damn it, M. Laue has just emailed his version and corrections. Hang on, Max’s version doesn’t have William’s edits. No wait, now comes an email from P. Ewald and it has radically changed. You are not even the person supposedly trying to wrangle all these copies; you just want to update your information. But now you have three new files plus the original one, and your email is 100MB more full, not to mention your hard disk. Plus each version has had the filename changed to that person’s name.
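To put rough numbers on that, here is a minimal Python sketch of the arithmetic. It assumes every attachment sits just under the 25MB limit at 24.9MB, and that everyone in a four-person team keeps every version they receive. The function names `wasted_mb` and `team_wasted_mb` are made up for this illustration.

```python
# Rough storage arithmetic for emailing a manuscript around.
# Assumption: every attachment is 24.9 MB (just under the 25 MB limit).
ATTACHMENT_MB = 24.9


def wasted_mb(copies, attachment_mb=ATTACHMENT_MB):
    """Space taken by `copies` copies of the same-sized attachment."""
    return round(copies * attachment_mb, 1)


def team_wasted_mb(people, versions, attachment_mb=ATTACHMENT_MB):
    """Space taken if each person keeps every version (an assumption)."""
    return wasted_mb(people * versions, attachment_mb)


# My mailbox: the original plus three new versions.
print(wasted_mb(4))          # 99.6

# Four people each holding all four versions.
print(team_wasted_mb(4, 4))  # 398.4
```

So my own mailbox grows by roughly 100MB, and the team as a whole is sitting on about four times that for a single manuscript.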
This seems very wasteful in time and effort. Plus someone now has to bring all these different changes together, normally the person who sent out the first email. But see how quickly the thing breaks down, because you are changing text which may already have been changed. So there needs to be a better solution, and surely there is?
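The "changing text which may already have been changed" problem can be made concrete. The sketch below is only an illustration using Python's standard `difflib` module: it compares two edited copies against the original and reports which original lines both people touched. The sample text and the helper names `changed_lines` and `conflicting_lines` are invented for the example, and pure insertions are ignored to keep it short.

```python
import difflib


def changed_lines(base, edited):
    """Indices of lines in `base` that `edited` replaced or deleted."""
    matcher = difflib.SequenceMatcher(a=base, b=edited, autojunk=False)
    touched = set()
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            touched.update(range(i1, i2))
    return touched


def conflicting_lines(base, version_a, version_b):
    """Base line indices that both edited versions changed."""
    return sorted(changed_lines(base, version_a) & changed_lines(base, version_b))


# Hypothetical manuscript, one sentence per line.
base = ["Title", "Intro sentence.", "Method sentence.", "Results sentence."]
bragg = ["Title", "Intro sentence, revised by Bragg.",
         "Method sentence, revised by Bragg.", "Results sentence."]
laue = ["Title", "Intro sentence.",
        "Method sentence, revised by Laue.",
        "Results sentence, revised by Laue."]

print(conflicting_lines(base, bragg, laue))  # [2]
```

Line 2 (the method sentence) was changed by both, so whoever merges the files has to pick a winner by hand. Every other change could be combined without argument.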
This is Microsoft’s cloud sharing solution. You find it in some universities. Personally I dislike the system with a passion. Why? Well, it is Microsoft based and focussed on Internet Explorer (IE). This means you can use your Chrome and Firefox (FF) browsers, but with greatly reduced functionality. Take uploading files, for example. In IE you can upload many files at once, but in FF or Chrome you can only do one at a time. Which is odd, since both browsers can do multiple file uploads just fine on many, many, many, many other websites. So Linux users have a massive disadvantage out of the gate - nothing new there when it comes to MS products.
Other than the reduced functionality, vocal does pretty much everything you would expect it to do. It has a complex folder hierarchy, blog, forum, wiki and many other template systems. You can build in task management, and in general it is all there. It can also handle working with groups made up of both internal and external parties. So you’ll be correct in thinking that it has version control included. All good so far, I guess. Except it breaks down just like the email system does when different file editors start changing the filenames. Just like email, you need to pull a local copy for working, but now you can flag that you are working on that copy so others see it as “checked-out”.
In my opinion it is a step up from email. It has many of the same drawbacks, but now the version control is centralised, which is good, and as long as people obey the filename convention and trust the version control, the process can work.
It is a pain to put files in and out and sometimes in Chrome the website does crazy, crazy things so if I had a choice I wouldn’t use it.
I recently joined DropBox under some strange delusion that I got 5GB free. This was wrong, BIG time WRONG. I got a crappy 2GB to use and then, after completing the task list, got an extra 250MB. Wow, how lucky am I?
I thought DropBox was going to be some saving grace of Cloud storage but to be honest I’m really disappointed with it. I am a Linux user so I was impressed to see that there was a DropBox Linux client. I was however not particularly happy after installing the rpm to find that it had added a yum.repo to my system and I still needed to download another package to make it work. In fact I don’t like how it installs into my home directory either. I’m not a big fan of software doing that.
But once installed I thought it would be good to test. My former colleague, a Mac user, is all about the DropBox and he shared a folder with me. Imagine my shock when, after he did this, 1/4 of my crappy 2.25GB suddenly vanished. “What?” I asked myself. Why would the action of someone else sharing their space with me reduce my space? I’d be pissed if I had paid for that space as well. I’m guessing it is some sort of anti-abuse policy so you cannot daisy chain 10s of free accounts together, but honestly it is rubbish and very annoying.
Not as annoying as trying to synchronise using the Linux client, which was just horrible to use. I didn’t want the entire share being brought down locally, so instead I opted for a few folders. To be honest I only wanted a few files, but there was no option for me to do that and as a result I had to get two folders. After it told me the two folders were going to take about 48 hours to sync, I gave up. I’d got part of one folder, so I thought it was safe to stop the sync, remove the folder from the sync menu and download the rest via the web. BUT no, wait: as soon as I unchecked the folders in the sync settings DropBox removed them from my local folder. Please, for the love of GOD, why?
So with the file sharing issues still in my head, the concept of a cloud storage facility makes sense. One that has a native interface to your file explorer is also good, especially for copying/transferring multiple files - such as images for a paper. This puts it one up on vocal. The fact that you lose your space when someone shares with you is a big NEGATIVE and one I’m not certain how to resolve. You could of course never share folders and instead just send each other links. This gets around the 25MB email limit as well. But again it means all parties hold copies of the work in progress, and perhaps multiple copies at that.
I’d rate DropBox somewhere above vocal, but with the caveat that it is not as simple as it may first appear to be in theory. Note that DropBox also has file version control built into its system. It is very different to that of vocal and not really a method of tracking changes.
Google Docs, Nay Drive
Google of course has an offering for us here in the form of Google Docs or, as it is now known, Google Drive (gdrive). Out of the door you get 15GB, thanks to Google taking the wise choice to merge your email, photo and gdrive space into a single number. 15GB for free. Did you notice that, DropBox? Cough, 2GB, cough, JOKE. But it doesn’t stop with more space. Just like DropBox you can share, and that works quite well if you are sharing with other Google people. It gets a bit messy otherwise, and to be honest if I use Chrome to download anything from my gdrive space it has a massive fit and stops working. Google doesn’t do Google, it would appear? FF works fine though.
If you have more than one Google account it can also be a pain, as you need to fully log out of one account and then log into the other to access gdrive. This is different to all the other products, which allow hot switching in the browser.
But that is not where the power of gdrive potentially lies. That would be the fact that you can simultaneously work on a document live and the document updates instantly in the cloud. In fact, if there are a few people working on the same file you get a message telling you who they are and where they are currently pointing with their mouse. Brilliant. BUT unfortunately most journals don’t let you submit papers in anything other than LaTeX or Word document format - something I personally do not think is ethical, as they should allow open format documents (odf) such as those from Libre/OpenOffice - so working online is going to work, but not for the final submission. Plus trying to get some of the more established academics to work on a live gdrive document in the ether often triggers cries of “no trust” and “NSA”. Not to mention you need to be online to see the live changes, which is no good if you are working on the document on the train.
There is also one final death knell to this and that is the People’s Republic of China. If you are collaborating internationally then unfortunately there is no access to gdrive there without a VPN, and so it is very hard to collaborate using this method with Chinese academics.
Git, SVN, CVS - LaTeX
Of course this problem is nothing new. Software programmers have long had to think of ways to work on code and bring multiple changes together. This has been achieved using CVS, SVN and Git, to name but a few. I have known a few PhD students who had Linux/Unix orientated supervisors who allowed them to work in an SVN repository with a LaTeX document. With this method the changes made to the file are trackable and reversible by either party at any time. The downside is that you really need to work in plain text, and as such you tend to use LaTeX. On the up side you are using LaTeX, so you are free from MS Office, and most decent journals will accept the article as well as provide templates.
In short it works but it is a little bit too extreme for the normal academic - I guess.
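Why does plain text matter so much here? Because changes can be compared line by line. The snippet below is not Git or SVN, just a small stand-in using Python's standard `difflib` module to show the kind of line-based diff those tools store and display. The two LaTeX drafts, the file names and the helper `show_changes` are all made up for the illustration.

```python
import difflib

# Two hypothetical revisions of the same LaTeX source, one line per entry.
draft_v1 = [
    r"\section{Introduction}",
    "X-ray diffraction reveals crystal structure.",
    "We report a new measurement.",
]
draft_v2 = [
    r"\section{Introduction}",
    "X-ray diffraction reveals the atomic structure of crystals.",
    "We report a new measurement.",
]


def show_changes(old, new, old_name="draft_v1.tex", new_name="draft_v2.tex"):
    """Return a unified diff between two revisions as a list of lines."""
    return list(difflib.unified_diff(
        old, new, fromfile=old_name, tofile=new_name, lineterm=""))


for line in show_changes(draft_v1, draft_v2):
    print(line)
```

Lines starting with `-` are what was removed and lines starting with `+` are what replaced them. Try doing that with two binary word processor files and you see why the programmers stick to plain text.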
What about references?
Of course with all the changes to your manuscript you need to build the references and keep them up to date. You could opt for an Endnote system. Again, being a Linux user I hate Endnote, and using it means all parties need to have Endnote. On the plus side it is quick if you want to reformat your references due to a rejected manuscript or a change in destination journal.
However, unless you store the Endnote file with the document or send it via email, it is not very centralised. This can be overcome using WebEndNote, which I have found to be significantly better than the actual Windows program. Again, Linux issues are many. But that is not the only method.
Many word processor packages come with their own referencing tools built in, but if you want something else you need look no further than Zotero. Zotero has been around for a while now and has matured from a FF plugin to being a standalone program, a FF plugin and a web system. Pretty cool. It means you can share “folders” and build references collaboratively online, and it plugs into MS Word and, importantly, all your lovely free Libre products as well.
If you are interested you can also attach the actual paper PDF files to each reference, making it easier for your collaborators to see why “you have put that reference there”! There is a downside though, in that you only get 300MB of sync/cloud space for free, but that is normally enough to do what you want. Plus they have chosen to use SQLite for the database backend, which makes it a little bit tricky to build onto it for your own purposes using a LAMP system. There are APIs to connect to their web services though, so it is not the end of the world.
I can’t leave this section without pointing out that LaTeX handles references pretty well too!
The reality of the matter is that being academics we don’t really have any “standards”. We use lots of different products depending on our own personal and in many cases institutional preferences. This means that we pretty much need to keep a toe in each of the main product lines to make sure that we cover all our bases.
However, simply uploading files to cloud servers is not the whole solution if we are trying to make a collaborative document where we are all aware of the changes in “real time”. It is a damn sight better than sending everything via email though.
So how about this - the next time you are working with a completely Western academic team, why not fire up a Google Docs document and work from that until the final submission version, and then pull that down in MS format for submission? It may be more productive than you think. Of course that is assuming you have no ethical issues with Google or the NSA or GCHQ reading the document. You never know, they may spot a typo here and there.
One last thing - at what point will publishers stop forcing academics to use Microsoft products to produce the final manuscript? When will I be able to download the “open document format” template for Angewandte Chemie rather than the Word or LaTeX one? When will free software take precedence over commercial in our new Open Access world? Perhaps when we see odf templates and open documents becoming the norm we will have an improved resource base of open source programs, allowing us all to collaborate towards producing open access publications in an efficient and environmentally friendly manner.
Think of the environment before printing this version, or the next version of the manuscript that is being sent to you right NOW.