AWS Data Pipeline
AWS is slowly becoming the Oracle of our generation, in the sense that it has found a way to lock startups and large companies into a software/services ecosystem that is really hard to stop using once you get started.
You start with regular open-source instances, but that's just the hook. Once you have EC2, it's really easy to get started with AWS 'magic' services like ElastiCache and RDS. It's easier than setting up a memcached cluster or MySQL, right? But once you get comfortable with those services, it's just so easy to keep going down that road and make your software reliant on proprietary services like SimpleDB, S3 and AWS Data Pipeline. And then you wake up at some point and find that you're 100% dependent on AWS.
By that point, if you're lucky your monthly AWS bill gets you an invite to speak at the next AWS conference. :-) You might even get a personal customer support rep that calls you when your servers go down.
A website/service cannot by definition be HA if it's reliant on one service or infrastructure provider. AWS has so many proprietary parts now that you really need to be careful which ones to use so that you don't wake up one day and realize that you're completely dependent on AWS.
I wouldn't touch this with a 30-foot pole, but if we really did need to use it, I would only use the features that I felt comfortable building internally at some future point if we chose to move off of AWS.
It's important to keep your software stack as flexible and open as possible, and for risk management you should plan on using (or at least having the option of using) multiple vendors and service providers.
A whole lot of glue-job VMs just became unnecessary.
Just this week I was looking for a better way to back up my RDS database to S3. I'm currently using mysqldump, but the RDS instance has grown extremely large, so that approach has become unwieldy. Hopefully this will help with that.
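For anyone stuck on the mysqldump route in the meantime, here is a rough sketch of the dump side: stream the output straight through gzip so the uncompressed dump never lands on disk, and give each backup a dated key. The function names, the `backups/<database>/` key layout, and the assumption that the password comes from an option file or `MYSQL_PWD` are all mine, not anything AWS prescribes. Uploading the resulting file (for example with `aws s3 cp`) is left out.

```python
import gzip
import shutil
import subprocess
from datetime import datetime, timezone


def build_dump_command(host, user, database):
    """Return a mysqldump argv list (hypothetical helper).

    The password is assumed to come from an option file or MYSQL_PWD,
    so it never appears on the command line.
    """
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={user}",
        "--single-transaction",  # consistent InnoDB snapshot without table locks
        "--quick",               # stream rows instead of buffering whole tables
        database,
    ]


def backup_key(database, now=None):
    """Build a dated object key; the backups/<database>/ layout is an assumption."""
    now = now or datetime.now(timezone.utc)
    return f"backups/{database}/{now.strftime('%Y-%m-%dT%H%M%SZ')}.sql.gz"


def dump_to_gzip(command, out_path):
    """Run the dump command and gzip its stdout to out_path as it arrives."""
    with subprocess.Popen(command, stdout=subprocess.PIPE) as proc:
        with gzip.open(out_path, "wb") as gz:
            shutil.copyfileobj(proc.stdout, gz)
    # Popen's context manager has waited for the process by this point.
    if proc.returncode != 0:
        raise RuntimeError(f"dump exited with status {proc.returncode}")
```

This only helps with disk usage and naming; a very large database will still take a long time to dump this way.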
It's a mainframe in the cloud.
Dear AWS, hire a designer. Thanks.
ETL-as-a-Service
You shouldn't really be trusting Amazon with your data warehouse or paying that much for the storage, but from a technical convenience standpoint AWS is probably the best solution for some of the horrid, inept little organizations that I have encountered.