Hacker News Clone

Ask HN: Premature optimization – when to use?

by nekopa on 8/11/2015, 2:47 PM with 5 comments

I know the old adage, but I think I have a use case where it is false, so I want to see what HN has to say about it.

First of all, this is a private project that I am working on, with me as the only user, so the whole product-market fit issue shouldn't be relevant*.

So I am working on a web scraping project, mainly as a way to apply what I have learned from the Learn Python the Hard Way course. So far so good; everything is working well. But when I deployed my first spider, it took an hour and returned ~20 million lines for one year of data (I want to get data going back at least 5 years).

So it's working the way I want, but I am wondering: should I start looking at my code to see if there are ways to speed it up?

Either way, it's not a problem, I have spare computers I can set to run the program for hours on end.

But should I waste electricity running a hack, or should I try to start optimizing now?

by TheCams on 8/11/2015, 5:00 PM
I think what could be considered an early optimization (but not a premature one) is the architectural design of your software.
I don't know how you built your application, but in this example, you could have made the design choice of a heavily multithreaded application that fetches and processes multiple pages at the same time.
It could be an early optimization that saves you a lot of refactoring later.
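A minimal sketch of that design choice, using the standard library's `concurrent.futures.ThreadPoolExecutor`. The `fetch_page` and `parse_page` functions are hypothetical stand-ins: `fetch_page` just sleeps to simulate network latency instead of making a real HTTP request, so the example runs offline. The point is only to show how threads overlap the time spent waiting on the network.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_page(url: str) -> str:
    """Hypothetical stand-in for a real HTTP fetch.

    Sleeps to simulate network latency, which is usually where a
    scraper spends most of its time.
    """
    time.sleep(0.1)
    return f"<html>{url}</html>"


def parse_page(html: str) -> int:
    """Hypothetical parser: here it just returns the page length."""
    return len(html)


def scrape_sequential(urls):
    # One request at a time: total time is roughly the sum of all latencies.
    return [parse_page(fetch_page(u)) for u in urls]


def scrape_concurrent(urls, max_workers=8):
    # Threads wait on I/O in parallel; pool.map() keeps results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pages = list(pool.map(fetch_page, urls))
    return [parse_page(p) for p in pages]


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(16)]

    start = time.perf_counter()
    seq = scrape_sequential(urls)
    print(f"sequential: {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    conc = scrape_concurrent(urls)
    print(f"concurrent: {time.perf_counter() - start:.2f}s")

    assert seq == conc
```

Threads help here because the work is I/O-bound; in CPython, CPU-heavy parsing would not speed up the same way because of the GIL, and `ProcessPoolExecutor` would be the usual alternative.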
by aburan28 on 8/11/2015, 3:09 PM
Find your bottleneck: see which function calls take the most time, and find the root cause of any perceived lack of performance. Just run python -m cProfile <yourscript.py> and you'll get that information.
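For reference, the same profiler can also be driven from inside a script with `cProfile.Profile` and `pstats.Stats`, which makes it easy to profile just one step of a scraper. This is a sketch with hypothetical functions: `dedupe_slow` is deliberately quadratic (list membership scans the whole list) so that it stands out in the report next to the set-based `dedupe_fast`. On the command line, `python -m cProfile -s cumulative yourscript.py` sorts the output the same way.

```python
import cProfile
import io
import pstats


def dedupe_slow(rows):
    # O(n^2): "r not in seen" scans the whole list on every iteration.
    seen = []
    for r in rows:
        if r not in seen:
            seen.append(r)
    return seen


def dedupe_fast(rows):
    # O(n): set membership checks take constant time on average.
    seen = set()
    out = []
    for r in rows:
        if r not in seen:
            seen.add(r)
            out.append(r)
    return out


def process(rows):
    # Hypothetical pipeline step that calls both versions.
    return dedupe_slow(rows), dedupe_fast(rows)


def profile_report(func, *args, sort_by="cumulative", limit=10):
    """Run func(*args) under cProfile and return the stats as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf).sort_stats(sort_by)
    stats.print_stats(limit)
    return buf.getvalue()


if __name__ == "__main__":
    rows = [i % 1000 for i in range(10000)]
    print(profile_report(process, rows))
```

In the printed table, the `cumtime` column for `dedupe_slow` should dwarf the one for `dedupe_fast`; that is the kind of hotspot worth fixing first.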
by dudul on 8/11/2015, 3:11 PM
In your case this is not premature optimization. You have written functioning code and used it to tackle your problem; you just saw (and proved) that it wasn't fast enough, so now you can optimize.
Premature optimization is when you try to write "clever" and "fast" code without first exercising it in real world scenario to actually see that it is not fast enough.
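To make "see that it is not fast enough" concrete, one option is to time a real run and compare it against what you can actually accept. The sketch below is illustrative only: `scrape_year` is a hypothetical stand-in that sleeps instead of scraping, the 1-second budget is arbitrary, and extrapolating linearly from one year to five assumes each year costs about the same.

```python
import time
from functools import wraps


def timed(func):
    """Store the duration of the most recent call on wrapper.last_seconds."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            wrapper.last_seconds = time.perf_counter() - start
    wrapper.last_seconds = None
    return wrapper


@timed
def scrape_year(year):
    # Hypothetical stand-in for the real spider; sleeps instead of scraping.
    time.sleep(0.05)
    return [f"{year}-row-{i}" for i in range(3)]


def needs_optimizing(elapsed_seconds, budget_seconds):
    # Optimize only when the measured (or estimated) time exceeds the budget.
    return elapsed_seconds > budget_seconds


if __name__ == "__main__":
    scrape_year(2014)
    per_year = scrape_year.last_seconds
    estimate = per_year * 5  # assumes every year takes about as long as this one
    print(f"one year: {per_year:.2f}s, five years (estimate): {estimate:.2f}s")
    print("optimize" if needs_optimizing(estimate, budget_seconds=1.0) else "good enough")
```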