Why MS Excel Is a Poor Choice for Data Projects

  • This article seems to be a bunch of random claims with no evidence.

    If you want to demonstrate that Excel is a poor choice, show how someone would do data analysis using Excel, then show how your alternative system, whatever it is, would do it "better", for whatever definition of better you want to use.

  • I'm surprised they didn't mention Looker as an alternative (I have zero association with the company). They also missed R. Yes, it is just a language, but its impressive growth and tooling make it a compelling alternative.

    I think the Snowplow Analytics guys (again, zero association) have better and arguably more transparent and honest documentation, particularly of the complete end-to-end process.

    One of the reasons people like Excel (besides the obvious ubiquity of the software) is that they don't have to put their data on some other service. Some even feel it is more secure (which is arguably not entirely true). That being said, I don't know if I would ever trust a company like the authors of this blog to host all my data collection and warehousing. They might not do anything bad with the data, but they sure have a lot of leverage over you once you rely on them completely.

    The other thing is that proprietary data visualization and calculation companies will come and go. I bet Excel will still be here 20 years from now. That is also why R and SQL are good things to learn.

  • Another issue is that there is a massive difference between casually knowing Excel formulas and being familiar with Power Query, Power Pivot, et cetera. I think someone well versed in DAX expressions and other parts of Excel could probably handle a few mid-sized projects. That said, if your dataset runs to gigabytes, you're obviously going to need a specialized database or custom programming.
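
As a rough sketch of the "specialized database" route, assuming Python's built-in sqlite3 and csv modules and a hypothetical CSV with `date` and `amount` columns, the rows can be streamed into a SQLite table and aggregated with SQL instead of worksheet formulas. The table name `sales` and both helper functions are illustrative, not from the article.

```python
import csv
import io
import sqlite3
from typing import List, TextIO, Tuple


def load_csv_into_sqlite(conn: sqlite3.Connection, csv_file: TextIO, table: str = "sales") -> None:
    """Stream rows from a CSV file object into a SQLite table.

    Assumes a hypothetical CSV with 'date' and 'amount' columns. The table
    name is interpolated into SQL, so it must come from trusted code.
    """
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (date TEXT, amount REAL)")
    reader = csv.DictReader(csv_file)
    # executemany consumes the generator lazily, so rows are inserted as they
    # are read rather than after the whole file has been loaded into memory.
    conn.executemany(
        f"INSERT INTO {table} (date, amount) VALUES (?, ?)",
        ((row["date"], float(row["amount"])) for row in reader),
    )
    conn.commit()


def daily_totals(conn: sqlite3.Connection, table: str = "sales") -> List[Tuple[str, float]]:
    """Sum amounts per day with SQL instead of worksheet formulas."""
    query = f"SELECT date, SUM(amount) FROM {table} GROUP BY date ORDER BY date"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    sample = io.StringIO(
        "date,amount\n"
        "2017-01-01,10.5\n"
        "2017-01-01,4.5\n"
        "2017-01-02,3.0\n"
    )
    # ":memory:" keeps the demo self-contained; pass a file path for real data.
    conn = sqlite3.connect(":memory:")
    load_csv_into_sqlite(conn, sample)
    print(daily_totals(conn))  # [('2017-01-01', 15.0), ('2017-01-02', 3.0)]
```

For a genuinely multi-gigabyte load you would point `sqlite3.connect` at a file on disk; the query side stays the same.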

  • Lost me at the opening sentence:

    > A business organization or enterprise always needs to adopt multi-directional approaches.

    Reads like a marketing class assignment from junior high.

  • > With MS Excel failing to offer optimum results and performance, organizations across the business landscape are in search of better tools and technologies to bring out insights from mere numbers. With the advanced data visualization tools mentioned here, they will have perfect opportunities to leverage the power of information and data.

    Who writes like this in 2017?

  • I used to believe that Excel was a bad choice and that users should use a real modeling language. I've changed my mind. Excel is the only tool I've seen that actually gets domain experts who aren't programmers to express their models in any kind of formal language. It deserves a lot more credit than it gets.

    A good rule of thumb for choosing between a spreadsheet and a programming language or database is to limit the data in a sheet to what you can realistically scan by panning and scrolling around it. Otherwise, organizing and analyzing that clump of data becomes inefficient and slow, which is bad in the long run.

    A fair limit is 1,000 rows in a sheet, which is enough for most utilitarian use cases (e.g. daily-aggregated data for a year), and certainly enough for bespoke models. I would not recommend using spreadsheets for Kaggle competitions, though.

  • This seems to be a theme of my comments lately, but I wonder why the writer doesn't mention SSAS, SSIS, and SSRS (SQL Server Analysis, Integration, and Reporting Services) as alternatives.

  • I once had a customer try to load a 20GB CSV file into Excel. Well, it didn't take long to crash, that's for sure.
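
For a file that size, the "custom programming" alternative is to stream it rather than open it. A minimal sketch, assuming Python's standard csv module and a hypothetical numeric column name passed by the caller: reading one row at a time keeps memory use roughly constant, so file size is limited by disk throughput rather than RAM.

```python
import csv
import io
from typing import TextIO


def streaming_summary(csv_file: TextIO, column: str) -> dict:
    """Compute count, sum, min, and max of one numeric column in a single pass.

    Only the running totals are kept in memory, never the whole file.
    'column' is a hypothetical column name supplied by the caller.
    """
    count = 0
    total = 0.0
    lo = float("inf")
    hi = float("-inf")
    for row in csv.DictReader(csv_file):
        value = float(row[column])
        count += 1
        total += value
        lo = min(lo, value)
        hi = max(hi, value)
    return {"count": count, "sum": total, "min": lo, "max": hi}


if __name__ == "__main__":
    # Real use (hypothetical path): with open("huge.csv", newline="") as f: ...
    sample = io.StringIO("id,value\n1,3\n2,7\n3,5\n")
    print(streaming_summary(sample, "value"))
    # {'count': 3, 'sum': 15.0, 'min': 3.0, 'max': 7.0}
```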

  • spam?

  • Here's a good example of a dataset that cannot be loaded into Excel:

    https://catalog.data.gov/dataset/crimes-2001-to-present-398a...

    > The dataset contains more than 65,000 records/rows of data and cannot be viewed in full in Microsoft Excel.
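
Whether a CSV fits on one worksheet can be checked before opening it. For reference, Excel 2007 and later caps a worksheet at 1,048,576 rows, and Excel 2003 and earlier at 65,536. The sketch below assumes Python's standard csv module; the helper names are hypothetical. It counts records with `csv.reader` rather than counting newlines, so quoted fields containing line breaks are not double-counted.

```python
import csv
import io
from typing import TextIO

# Worksheet row limits: Excel 2007 and later vs. Excel 2003 and earlier (.xls).
EXCEL_MAX_ROWS = 1_048_576
LEGACY_EXCEL_MAX_ROWS = 65_536


def count_csv_rows(csv_file: TextIO) -> int:
    """Count CSV records (header included) without loading the file into memory.

    csv.reader joins quoted fields that span lines, so each record is
    counted once even if it contains embedded newlines.
    """
    return sum(1 for _ in csv.reader(csv_file))


def fits_in_excel(csv_file: TextIO, max_rows: int = EXCEL_MAX_ROWS) -> bool:
    """Return True if every record, header included, fits on one worksheet."""
    return count_csv_rows(csv_file) <= max_rows


if __name__ == "__main__":
    # Tiny in-memory stand-in for a large export such as the dataset above.
    sample = io.StringIO('id,description\n1,"multi\nline"\n2,plain\n')
    print(count_csv_rows(sample))  # 3: header plus two data records
```

Passing `max_rows=LEGACY_EXCEL_MAX_ROWS` checks against the old .xls limit instead.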