When it comes to analytics software and languages, sooner or later you will have to decide which one to use. There are dozens of tools available, some more in demand than others. On the open source side, R and Python are the most popular choices among data scientists. On the proprietary side, you will find products like SPSS, MATLAB, Stata or SAS. Since it takes a lot of time to become familiar with any one of these tools, it is important to choose wisely. Should you go with proprietary or open source analytics software? This article will give you some ideas.
First, let’s take a look at the advantages of open source analytics software.
1. There is a huge user community out there with discussion boards. If you go to Stack Overflow, you will find an answer to nearly any question you might have about R or Python. If you use paid software with a smaller user base, you are less likely to find satisfactory support in the community.
2. Open source adapts to new trends much faster. Since so many people use it and contribute without financial motives, add-on libraries for new trends appear much sooner. If you are working in a field that does not promise much financial reward, chances are that paid software vendors will not adapt their products to it.
3. The financial advantage is obvious: the software is free. Beyond saving money, this freedom allows you to test several products and identify the one that best suits your needs. There is nothing worse than being stuck with a product that does not meet your demands simply because you have already made the financial commitment.
4. Open source is especially preferable for start-ups and new companies. If you browse start-up job openings, you will quite often find R and Python in demand. Therefore, if a start-up is your career destination of choice, I would recommend focusing on open source.
5. Interfaces to competing software are easy to find in open source. If you want to use SAS data in R, for example, you can import it with an add-on package like “foreign”, whose read.xport function reads SAS transport files.
Now let’s take a look at the advantages of paid software:
1. Some paid products offer excellent free tutorial materials, which make them easy to learn. Stata is a classic example of well-documented analytics software: the Stata homepage offers a variety of well-structured videos. SAS even has an academy where you can earn a certification that is well accepted in several industries. A SAS certificate is definitely a useful addition to your CV and can give you an edge on the job market. However, the costs can be significant.
2. These software companies have the funds to do serious lobbying, which gives them influence in some industries. In quality management and Six Sigma, for example, Minitab is a prominent and widely used tool. That simply means Minitab has gained influence in the field of quality management, so it would be wise to focus on that software if you want to go into quality management. The same goes for SAS and the pharma industry. As long as SAS remains the established standard for clinical trial submissions to the FDA (which requires study data in the SAS transport format), it will be the tool of choice in pharma. In general, depending on the industry you want to go into, check whether that industry has a strong bias toward one of the many software packages out there.
As with many things in life, such decisions can only be made on a case-by-case basis. Open source vs. paid is no exception to that rule, even though open source is like a religion for many people. Quite often, people learn one or two stats packages during college. Unfortunately, professors often choose software based on availability and the university's vendor contracts instead of making clever decisions based on future market demand and recent changes in tech. That leaves graduates in a weaker position on the job market than they would otherwise be. In Europe, I see many students using SPSS during college because their professors use it and the university IT department provides it for free. As soon as the students leave school, they no longer have access to SPSS, while whole industries might run on open source R or Python. In this case, you would basically need to retrain to have a chance at good jobs.
During your education, it is advisable to focus on tools that are used in many different fields, as is the case with free tools like Python and R. You will benefit more from these tools than from knowing the ins and outs of paid software like Minitab and SPSS, which you probably will not need in the years to come.