Talk:Comparison of statistical packages

From Wikipedia, the free encyclopedia
WikiProject Statistics (List-class, Low-importance)
This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as List-class on Wikipedia's content assessment scale.
This article has been rated as Low-importance on the importance scale.
WikiProject Mathematics (Start-class, Low-priority)
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Start-class on Wikipedia's content assessment scale.
This article has been rated as Low-priority on the project's priority scale.

LAD support

By definition, any package that supports quantile regression will support least absolute deviations (LAD) regression, which is just quantile regression at q=0.5. --189.125.124.24 (talk) 16:44, 23 November 2016 (UTC)
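
A minimal sketch of that equivalence, assuming an intercept-only model in plain Python. The data values and helper names are made up for illustration and are not taken from any package in the tables. At q=0.5 the quantile-regression check loss is half the absolute deviation, so both criteria should select the same fit.

```python
# Illustrative only: intercept-only model, made-up data, hypothetical helper names.

def check_loss(residual, q):
    """Quantile-regression check (pinball) loss for a single residual."""
    return q * residual if residual >= 0 else (q - 1) * residual


def total_check_loss(data, fit, q):
    """Sum of check losses when every observation is predicted by `fit`."""
    return sum(check_loss(y - fit, q) for y in data)


def total_abs_dev(data, fit):
    """Sum of absolute deviations, i.e. the LAD criterion."""
    return sum(abs(y - fit) for y in data)


data = [1.0, 2.0, 4.0, 7.0, 100.0]

# For an intercept-only model a minimiser of either criterion lies at a data
# point, so searching over the observations is enough for this sketch.
best_quantile_fit = min(data, key=lambda c: total_check_loss(data, c, 0.5))
best_lad_fit = min(data, key=lambda c: total_abs_dev(data, c))

print(best_quantile_fit, best_lad_fit)
```

Both searches land on the sample median, which is the expected LAD solution when there are no covariates. A real package would solve the general case with linear programming or a similar method, which this sketch does not attempt.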

RKWard Windows support

RKWard is actually fully usable on Windows by installing the KDE package. Source: http://sourceforge.net/apps/mediawiki/rkward/index.php?title=RKWard_on_Windows

So I think there should be a star specifying such support. -- Preceding unsigned comment added by 66.36.128.146 (talk) 14:25, 4 June 2011 (UTC)

Gretl and Stepwise

Can gretl do stepwise regression? I can find no mention of it inside the program or documentation, but I may be wrong - I'm quite new to it! tompagenet (talk) 22:02, 26 August 2008 (UTC)

Yes, it can. In the output window, choose Tests> Omit variables. Free Software Knight (talk) 08:31, 22 October 2008 (UTC)

External URLs

Most other software lists and comparisons do not have external URLs for every package. I think they should be removed from this article as well. --Karnesky 15:57, 6 September 2006 (UTC)

GLM

Why is GLM listed in both ANOVA and regression? --Karnesky 15:57, 6 September 2006 (UTC)

General linear model and Generalized linear model are both abbreviated GLM. Den fjättrade ankan 16:05, 6 September 2006 (UTC)

Both ANOVA and regression analyses are special cases of the GLM. MW (talk) 15:15, 24 December 2009 (UTC)
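
A small illustration of the special-case claim, assuming the general linear model and a made-up two-group data set (the helper name and the numbers are hypothetical). Regressing the response on a 0/1 group indicator by ordinary least squares should recover the quantities a one-way ANOVA compares: the intercept is the mean of the reference group and the slope is the difference between the two group means.

```python
# Illustrative only: two groups, made-up data, closed-form simple regression.

def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


group_a = [1.0, 2.0, 3.0]  # reference group, coded 0
group_b = [5.0, 6.0, 7.0]  # second group, coded 1

indicator = [0.0] * len(group_a) + [1.0] * len(group_b)
response = group_a + group_b

intercept, slope = ols_simple(indicator, response)

mean_a = sum(group_a) / len(group_a)
mean_b = sum(group_b) / len(group_b)

print(intercept, mean_a)        # intercept vs. reference-group mean
print(slope, mean_b - mean_a)   # slope vs. difference in group means
```

With more than two groups the same idea needs one indicator column per non-reference group and a multiple-regression solver, which is beyond this sketch.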

Procedure Comparisons

It seems pointless, and cumbersome, to list different capabilities for each package, like ANOVA, Regression, etc. First of all, there are just TONS of them: many, many more than could be listed in such a form. This makes the current listing utterly incomplete, and therefore very misleading. There are already links to each package's website, where such details are given. I'm for deleting these tables altogether. What are the thoughts out there? Ww.ellis 15:01, 29 November 2006 (UTC)

I disagree--other software comparisons have complicated sets of features. What features do you think are missing? Why don't you just add them? --Karnesky 17:06, 29 November 2006 (UTC)
I disagree too, I think the tables are useful. Den fjättrade ankan 21:58, 29 November 2006 (UTC)
I do agree: these kinds of tables are misleading from the very beginning, out of WP scope, always outdated, and non-NPOV by nature. I'm for deleting the whole entry. Jean R. Lobry 01:59, 6 January 2007 (UTC)
I disagree for now. The sheer principle of the table is to give an overview of the general capability and orientation of the statistics package, which partially fails, as you point out. But in order to gain an overview of the statistics packages, this kind of table is the only way to start to get a more balanced overview. The table should - in future - be split up according to the table headers' main sections, and the sub-headers provided in separate tables. It's unavoidable that comparing proprietary and open source software involves a kind of original research, so either move the table to this discussion page, or keep it. Said: Rursus 10:34, 25 May 2007 (UTC)

I find the comparisons helpful, especially to narrow down the packages which may be appropriate to the kind of analysis I have in mind. I also refer students to this page when they are looking for statistical software. However, some of the descriptions are out of date (for example neither the SalStat home page nor the program is being maintained). Other descriptions, as noted in other comments, are incomplete. This page needs to be better maintained. I would do it myself but I'm not sure how to insert a footnote to the author, Salmoni. MW (talk) 15:19, 24 December 2009 (UTC)

The comparisons are useful, albeit limited. An alternative or additional approach would be to list the number of procedures provided by each application within a given category. For example, rather than checking whether each application allows plotting pie charts, bar charts, scatter plots, and so on ad nauseam, the table could summarise that 'application A' provides 7 types of chart, 'application B' provides 53 types of chart, and so on. I wouldn't suggest removing data already in the table and replacing it with this right away, but it is at least worth considering when contemplating the addition of new categories/columns to the tables! --DIV (137.111.13.17 (talk) 12:21, 30 January 2019 (UTC))

Free

Let's change all the instances of "Free" in the Cost columns to "Gratis", lest anyone become confused with the licensing terms. Ogranut 03:15, 11 January 2007 (UTC)

I also disagree. This page has one important role - it shows which software is better. You pay more, you get more. "Free" is a simple term. Let's stick to it. Everybody understands "Free".

By the way, I haven't seen JMulti in the list... unfortunately I'm not competent enough to add it. ;) R.B.

Nope, there's free, and there's gratis. "Gratis" means there's a zero-bucks version for download somewhere - gratis is directed towards low-cost customers. Free means the source code is open for everyone to share, reprogram and redistribute - free is directed towards finger-itching programmers who want to improve the program. Mostly, free code can be obtained in a gratis version. See the preachings of our most revered prophet R. M. Stallman of the Emacs Church in Free Software Definition. Said: Rursus 09:09, 25 May 2007 (UTC)

Gratis and Free mean the same thing. Check the dictionary. Graemec2 (talk) 14:28, 28 April 2008 (UTC)

Red/Green should be avoided

In the cost field, coloring open source cells green and non-open source cells red should be avoided because they correlate to good and bad endorsements (green means go, red means stop). -- Preceding unsigned comment added by 199.253.16.1 (talk • contribs)

Rather than "open source," which places a value judgment on a particular set of licenses, why don't we say "source available?" Having source code available is a feature & a differentiator (particularly for mathematical software). I don't see how you can argue that we shouldn't color code this feature, but should color-code platform support or a particular ANOVA method. --Karnesky 19:31, 20 February 2007 (UTC)
Red means stop, or missing, or broken by a general tabular convention found at open source tables. Maybe cyan or blue for non-gratis software. It indicates the coolness (coldness?) of the market. Said: Rursus 09:13, 25 May 2007 (UTC)
This page does not exist in a vacuum. There are many other pages with these kinds of tables & they've all used red for "no" as opposed to anything else. Perhaps a new set of templates can be made for "free/open source" vs. "proprietary" (similar to the free (gratis)/nonfree templates). However, I think this page should follow conventions and consensus set forth by other pages. The current consensus is that green and red aren't being used to give value judgments. --Karnesky 14:07, 25 May 2007 (UTC)
The same complaint was voiced at Talk:Comparison_of_computer_algebra_systems. JonMcLoone (talk) 16:35, 28 April 2008 (UTC)
And, copying from that page: We had this discussion at Template talk:Yes. The consensus was that green means yes & red means no & that we aren't prescribing a value judgment. The 'but yes' and 'but no' templates were deleted for this very reason. --Karnesky (talk) 17:45, 28 April 2008 (UTC)

(backdent) As an economist, I just want to throw out there that paying less is good. It also totally changes the environment when requiring students to use a package. PDBailey (talk) 00:53, 9 January 2009 (UTC)

It doesn't automatically follow that commercial implies any difference to student use. E.g., in my world (Mathematica), student home use is included in the price of a site license. From the student's perspective Mathematica is free. Someone is paying, but not them. This is the same with free software: there is always someone paying (at least with their time), just not the end user. JonMcLoone (talk) 10:57, 9 January 2009 (UTC)
Red means no, green means yes? OK, so change the question: change the heading to "Closed source". All answers will then flip. Everyone happy then? --DIV (120.17.146.10 (talk) 03:34, 25 January 2019 (UTC))

Asterisks

There are asterisks after some programs' prices but there isn't an explanation for them anywhere.

Where is Octave?

As Zarahemlite (talk) said, where is octave? rolandog (talk) 15:14, 25 September 2012 (UTC)

Yeah, where is GNU Octave? If MATLAB (plus its Statistics Toolbox) is included, then GNU Octave (plus its Statistics Package, from Octave Forge) can be included too. --DIV (120.17.146.10 (talk) 02:30, 25 January 2019 (UTC))

Where is MATLAB?

I am shocked that all Matlab information is gone from this page. Is Matlab suddenly not a powerful statistical analysis package any more? -- Preceding unsigned comment added by 71.98.91.175 (talk) 11:09, 5 May 2009 (UTC)

I added it back. I agree with you, whoever removed it was wrong to do so. I had a quick look for prices, and could not find them. Clearly, like all the commercial packages, they try to lock students in with packages at vastly reduced rates. Toolboxes add extra complications too. I suspect UNIX versions might cost more than Windows. Hence I just put "depends on many things". Let someone else fill in the details if they want. But at least MATLAB is now back. It would be interesting to know who removed it and why. Perhaps it was an accident.
In addition, where is octave? Zarahemlite (talk) 17:03, 30 January 2010 (UTC)
Where is Design Expert? -134.84.166.40 (talk) 18:34, 26 February 2010 (UTC)
Shouldn't Weka be here as well?

BSD on the OS list

Having BSD on the list of supported platforms makes this look out of touch. Might as well have DOS and OS/2. I suggest we cut the column. Wordsoup (talk) 18:27, 7 March 2008 (UTC)

I don't understand this reference to DOS or OS/2. The BSD operating systems are architecturally modern (sometimes leading / sometimes lagging), actively maintained, with reliable release schedules, and excellent reputations for stability. While the BSD user base is not large enough to interest the commercial vendors (is that what you mean by "out of touch"?), the BSD progeny remain important within the open source community, even if they are not everyone's cup of tea. MaxEnt (talk) 01:32, 19 August 2009 (UTC)

Open source vs proprietary

Does anyone understand why Dataplot and SalStat are listed as "open source", while their license is listed as proprietary, and hence not "open source"? -ChristopherM (talk) 05:34, 3 April 2008 (UTC)

SalStat is GPLed and on SourceForge. Dataplot is public domain (by NIST). --Karnesky (talk) 05:49, 3 April 2008 (UTC)

DAP

Is anybody familiar enough with GNU DAP (Free software with at least some compatibility with SAS) to add it to the comparison matrices?

http://www.gnu.org/software/dap/dap.html

--CristoperB (talk) 19:37, 2 December 2008 (UTC)

memory model would be nice

It would be nice to list the memory model, i.e., whether the package uses the hard drive or system memory for storage of matrices / data sets. Those that use system memory rely on swap to deal with large data sets but are generally faster for small data sets and require more thought to go into their algorithms. PDBailey (talk) 22:04, 8 January 2009 (UTC)

Image copyright problem with File:Mainshot5.png

The image File:Mainshot5.png is used in this article under a claim of fair use, but it does not have an adequate explanation for why it meets the requirements for such images when used here. In particular, for each page the image is used on, it must have an explanation linking to that page which explains why it needs to be used on that page. Please check:

  • That there is a non-free use rationale on the image's description page for the use in this article.
  • That this article is linked to from the image description page.

The following images also have this problem:

This is an automated notice by FairuseBot. For assistance on the image use policy, see Wikipedia:Media copyright questions. --23:27, 8 February 2009 (UTC)

Eviews

Why does this page claim that Eviews costs 40 dollars, if on the Eviews page it costs 600 dollars? -- Preceding unsigned comment added by 92.245.195.116 (talk) 23:28, 11 February 2009 (UTC)

Inclusion criteria

I think we need to define some criteria for including a package in these tables, as they are becoming too large to be easy to use and the page becomes a sort of Directory. At the moment the lead just says "a number" of statistical packages, which is a bit woolly. I'd suggest that the main criterion should be the first criterion of the common selection criteria for inclusion in a stand-alone list, i.e. "Every entry meets the notability criteria for their own non-redirect articles in English Wikipedia. Red-linked entries are acceptable if the entry is verifiably a member of the listed group, and it is reasonable to expect an article could be forthcoming in the future." In addition, we could specify that the packages must be currently available (so e.g. not GLIM). Thoughts? Qwfp (talk) 11:42, 17 July 2010 (UTC)

Qwfp, I think that is a great idea. However, I think as long as the package is notable, it should be included. Obviously GLIM is hugely notable in the area of statistical computing since it changed the face of it. 018 (talk) 19:03, 13 October 2010 (UTC)
Another question is how much statistics-stuff should a package include to count as a statistics package ... I presume we wouldn't want to have all of Category:Data analysis software. Melcombe (talk) 15:26, 24 June 2011 (UTC)
This may not be such a philosophical question. Google Scholar searches for "review of econometric software" will bring up such articles published in the scholarly journals. I know "statistics" isn't just "econometrics", but you catch my drift. --189.125.124.24 (talk) 16:47, 23 November 2016 (UTC)

Why isn't Excel included? Or to put it another way, what are the inclusion criteria that exclude Excel? It would be interesting to see how it compares to the other statistical packages (and may be a useful tool to justify not using Excel). --NimbleThink (talk) 06:57, 13 February 2013 (UTC)

Sorting in "Latest Version" column of General Information table is broken

The sorting feature of the General Information table is broken for the "Latest Version" column because of a mix of alphabetical and numerical data. Since some entries contain MONTH information in addition to YEAR, the table sorts in alphabetical order; however, this breaks the functionality of the table, because it makes "August 2009" appear before "March 2010" ("A" obviously appearing in the alphabet before "M").

I think the best solution to this would be to change the format of data in the "Latest Version" column to numerical dates only, consistent across all entries, then sort by numerical order. -- Preceding unsigned comment added by 216.15.58.24 (talk) 01:29, 15 November 2010 (UTC)
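
A minimal sketch of the difference between the two orderings, in plain Python rather than wiki table markup. The labels below are hypothetical and chosen so that the alphabetical order is clearly not chronological; the key function is an assumption about how "Month YYYY" and bare "YYYY" entries might be normalised, not a description of how the table works.

```python
# Illustrative only: hypothetical "Latest version" labels and helper names.

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]
MONTH_NUMBER = {name: number for number, name in enumerate(MONTHS, start=1)}


def chronological_key(label):
    """Map 'Month YYYY' or 'YYYY' to a (year, month) tuple for sorting."""
    parts = label.split()
    year = int(parts[-1])
    # A bare year has no month; treat it as month 0 so it sorts first in its year.
    month = MONTH_NUMBER[parts[0]] if len(parts) == 2 else 0
    return (year, month)


labels = ["March 2010", "August 2010", "December 2009"]

alphabetical = sorted(labels)                          # plain string comparison
chronological = sorted(labels, key=chronological_key)  # numeric (year, month)

print(alphabetical)
print(chronological)
```

String comparison puts "August 2010" first because "A" sorts before "D" and "M", while the numeric key orders the same labels by year and then month.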

I've made that column unsortable for now as a quick fix. Changing the format of all the dates would take a while, and I'm not sure it's that useful to sort on "Latest version" so I'm not sure it's worth it. Qwfp (talk) 07:40, 15 November 2010 (UTC)
Even if they were all in the same format, I'm not sure I'd want to be able to sort on that column. The only situation I can think of is to remove the old ones, but that would be better handled by a sortable, "actively maintained" column. 018 (talk) 17:31, 15 November 2010 (UTC)

Weibull++

I think Weibull++ should be added here. 188.194.248.102 (talk) 17:27, 23 November 2011 (UTC)

Microfit

I think Microfit, developed by Bahram Pesaran (Wadhwani Asset Management) and M. Hashem Pesaran (University of Cambridge), should also be in the list. -- Preceding unsigned comment added by 175.156.156.100 (talk) 16:09, 15 December 2012 (UTC)

SIMCA P+

Umetrics SIMCA P+ should be included. -- Preceding unsigned comment added by 136.159.224.201 (talk) 23:12, 29 January 2013 (UTC)

StatsDirect

I'd never come across this until I saw it cited in PLOS Medicine as the software used in this article. My sense from a very unscientific Google sniff is that it is meant for those who find SPSS too challenging - but a knowledgeable appreciation would be helpful. -- Preceding unsigned comment added by Skeptic12 (talk • contribs) 21:45, 19 February 2013 (UTC)


Should BUGS/WinBUGS be here?

Yes, Bayesian packages including WinBUGS, OpenBUGS, JAGS and Stan should all be included.

Bayesian statistics has gone mainstream and penetrated popular culture:

  • McGrayne, Sharon Bertsch. (2011). The Theory That Would Not Die: How Bayes' Rule Cracked The Enigma Code, Hunted Down Russian Submarines, & Emerged Triumphant from Two Centuries of Controversy. [1]
  • Silver, Nate. (2012). The Signal and the Noise: Why so many predictions fail -- but some don't. [2]

Jim.Callahan,Orlando (talk) 11:52, 21 March 2013 (UTC)

The question is not so much "whether" but "how". Bayesian analyses are qualitatively different from other standard analyses in statistical packages. Thus there should be a way of indicating whether packages are capable of doing Bayesian versions of analyses for given questions. 81.98.35.149 (talk) 20:11, 21 March 2013 (UTC)
Agreed: Bayesian capabilities would be a useful addition to one of the existing tables (if not a brand new table). --DIV (120.17.146.10 (talk) 02:39, 25 January 2019 (UTC))
Not sure whether it is desirable to include all of those incarnations of Bayesian inference using Gibbs sampling (BUGS), or just have the one entry though. --DIV (137.111.13.17 (talk) 12:15, 30 January 2019 (UTC))

References

  1. ^ McGrayne, Sharon Bertsch. (2011). The Theory That Would Not Die: How Bayes' Rule Cracked The Enigma Code, Hunted Down Russian Submarines, & Emerged Triumphant from Two Centuries of Controversy. New Haven: Yale University Press. ISBN 9780300169690 (ISBN-10 0300169698); OCLC 670481486. The Theory That Would Not Die, at Google Books.
  2. ^ Silver, Nate. (2012). The Signal and the Noise: Why so many predictions fail -- but some don't. New York: Penguin. ISBN 978-1-59-420411-1.

missing: where is StatTools?

StatTools has been available commercially from Palisade Corp [1] since before 2000, yet it is not listed. -- Preceding unsigned comment added by Thomcat00 (talk • contribs) 16:40, 19 April 2013 (UTC)

References

  1. ^ www.palisade.com

OS X

Note that "Mac OS" is no longer current. Apple's current operating system for the Mac is called "OS X". Ricklaman (talk) 07:23, 18 August 2013 (UTC)

Omit the "Approx installed base (users)" column

There are no estimates from a neutral source about user bases, and at the moment the whole column is totally empty! It just makes the table look sparser! I think it should be deleted. -- Preceding unsigned comment added by Homo Ex Machina (talk • contribs) 20:34, 24 August 2013 (UTC)

Good point. Column removed. Qwfp (talk) 21:18, 24 August 2013 (UTC)

Price?

I've looked at the article, and I can't find any pricing information for the software. Yet there is a footnote and a short discussion on the limitations of pricing information at the beginning of the article. Why is that information there?

I am not arguing for the inclusion of pricing information; it's probably too volatile to maintain and has too many caveats, etc. (Different packages. Different geographical prices. Introductory pricing. Subscription vs. purchase. Etc.) However, if there's no pricing information in the article, shouldn't the pricing-related text also be deleted? Fredrik Coulter (talk) 13:46, 23 January 2015 (UTC)

Qtiplot missing

I'm surprised not to see Qtiplot in the list of statistical software. It is a widely used scientific package. Shouldn't we include it? :) Muriel (talk) 13:39, 29 May 2015 (UTC)

Request for changes to MATLAB rows in the tables

Hi. I work for MathWorks and I happened upon this page. To avoid any accusations of partiality, I'd like to request someone else make two changes to the MATLAB rows in two of the tables.

The first relates to the most recent release. The "General information" table currently lists release R2018b as the latest release, but release R2019a is now available as this MathWorks press release states.

https://www.mathworks.com/company/newsroom/mathworks-announces-release-2019a-of-matlab-and-simulink.html

The second is in the "Operating system support" table. MATLAB is listed as not having cloud support, but it does through the MATLAB Online product.

https://www.mathworks.com/products/matlab-online.html

One of the supported products available for use with MATLAB Online is Statistics and Machine Learning Toolbox, as you can see from the Supported Toolboxes and Limitations link near the end of that page. Because other rows in other of the tables on this page reference MATLAB + Statistics, I believe this qualifies it as a statistical package having cloud support.

Thanks. -- Preceding unsigned comment added by 2601:192:4C7F:E8B0:1186:7045:3EC5:1EB3 (talk) 03:15, 18 June 2019 (UTC)