Hacker News Clone

DigitalOcean not destroying droplets securely, data is completely recoverable

by nixgeek on 3/31/2014, 1:03 AM with 66 comments

by powera on 3/31/2014, 2:45 AM
I think the moral of the story is that if you are so concerned about IMMEDIATELY deleting your data that a 48-hour window in which you can recover it is unacceptable, you should definitely run your own servers.
by spindritf on 3/31/2014, 2:01 AM
Both retaining the IP and being able to recover a destroyed droplet are strictly features. It would be a problem if someone else could create a droplet and recover your data, not when you can do it.
by derefr on 3/31/2014, 1:43 AM
DigitalOcean instances run with a machine-shared storage pool (think EC2 ephemeral storage), which is why not securely erasing them was a problem.
The "destroyed instance" you see in the "spawn instance from a template" UI, on the other hand, is a snapshot of the destroyed instance, taken upon the instance's destruction. Snapshots are stored in a separate network-object-storage pool (think S3), and raw-reading your ephemeral storage won't turn up deleted snapshots.
Securely erasing an instance means erasing its data from shared ephemeral storage. It doesn't mean erasing any snapshots of it, because snapshots aren't located on shared ephemeral storage.
by zagi on 3/31/2014, 2:19 AM
Thanks for pointing this out. Nearly a year ago we implemented a backup mechanism that keeps a destroyed machine for 24 hours. It is enabled only for users with a valid paying account. We have sometimes used it to return a droplet to a customer who accidentally deleted it, and at other times, when an integrated third party had a security problem, it let us recover customer droplets.
We will take the necessary steps so that, if users enable the option to scrub all data permanently, we will not store this temporary image, and destroys will therefore be immediate and permanent.
by raiyu on 3/31/2014, 2:54 AM
Hi Folks,
Just wanted to clarify the issue for anyone who didn't have the time to read the full gist.
When we first started DigitalOcean, we occasionally received tickets from customers asking us to recover a droplet they had destroyed. Unfortunately, once a droplet was destroyed it was gone from the system and could not be recovered. To help our customers, we decided to take a temporary snapshot of the droplet after the destroy was issued, which would expire automatically. This way, if someone mistakenly destroyed a droplet, they could still recreate it.
This proved to be a lifesaver for many customers of DigitalOcean when a third party company that provided a provisioning service that integrated with DO, AWS, Rackspace, etc. was compromised and the attacker issued a delete to all customers and all instances. Because this mechanism was in place we were able to recover almost everyone's droplets.
We ran into an issue with securely scrubbing data, which was publicized on HN, and we immediately implemented a fix with a scrub flag. Unfortunately, we made a mistake and set the default to false. Most customers accept the default (I do the same thing myself, since I assume the default is the best course of action), and this led to the issue resurfacing. That was also posted to HN, and we immediately decided that the default behavior should be to scrub.
Before this, a customer who selected secure scrub was taking two actions, issuing a destroy and setting a flag, so it was safe to assume that they really wanted the data completely destroyed. However, once we reversed the default, using the secure-destroy flag as the indicator for whether to create a temporary snapshot would have meant that a default destroy no longer created one.
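To make the coupling concrete, here is a minimal Python sketch under stated assumptions: the function `takes_temporary_snapshot` and its `scrub` parameter are hypothetical names, not DigitalOcean's API. It models only the rule described above, where the secure-destroy flag doubles as the signal for skipping the temporary snapshot.

```python
def takes_temporary_snapshot(scrub: bool) -> bool:
    """Hypothetical rule: a secure (scrub) destroy skips the temporary snapshot."""
    return not scrub


# Old default: scrub unchecked, so a plain destroy still keeps a temporary snapshot.
OLD_DEFAULT_SCRUB = False
# New default: scrub checked, so a plain destroy would silently lose the snapshot.
NEW_DEFAULT_SCRUB = True

for label, scrub in [("old default", OLD_DEFAULT_SCRUB), ("new default", NEW_DEFAULT_SCRUB)]:
    print(f"{label}: scrub={scrub}, temporary snapshot={takes_temporary_snapshot(scrub)}")
```

Once the default flips to `scrub=True`, a user who simply clicks through loses the safety net, which is why the flag alone stopped being a good proxy for the user's intent.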
Since we implemented the temporary snapshot feature, 1154 droplets belonging to 752 different customers have been restored after a destroy.
That's 752 customers who were elated to find out that they could recover a droplet they had mistakenly destroyed, so this is clearly a very beneficial feature; each time one of those customers recovered a droplet, it was a huge win for them.
We assumed that, since the temporary snapshots are automatically destroyed, this would not be an issue. In fact, the control panel provides an additional feature to make the snapshot permanent; otherwise the snapshot is deleted.
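The lifecycle described above (a snapshot taken at destroy time and deleted automatically unless the customer makes it permanent) can be modeled roughly as follows. This is a minimal Python sketch under stated assumptions, not DigitalOcean's implementation: the class and function names are hypothetical, and the 24-hour window comes from zagi's comment in this thread rather than from this post.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Assumed retention window (24 hours, per zagi's comment above).
RETENTION = timedelta(hours=24)


@dataclass
class TemporarySnapshot:
    droplet_name: str
    created_at: datetime
    permanent: bool = False

    @property
    def expires_at(self) -> Optional[datetime]:
        # Permanent snapshots never expire.
        return None if self.permanent else self.created_at + RETENTION

    def make_permanent(self) -> None:
        # Models the control-panel option to keep the snapshot.
        self.permanent = True

    def is_expired(self, now: datetime) -> bool:
        exp = self.expires_at
        return exp is not None and now >= exp


def purge_expired(snapshots: List[TemporarySnapshot], now: datetime) -> List[TemporarySnapshot]:
    """Return the snapshots that survive a cleanup pass at time `now`."""
    return [s for s in snapshots if not s.is_expired(now)]


destroyed_at = datetime(2014, 3, 31, 1, 0, tzinfo=timezone.utc)
kept = TemporarySnapshot("web-1", destroyed_at)
kept.make_permanent()
expiring = TemporarySnapshot("web-2", destroyed_at)

later = destroyed_at + timedelta(hours=25)
survivors = purge_expired([kept, expiring], later)
print([s.droplet_name for s in survivors])  # ['web-1']
```

A periodic cleanup job calling something like `purge_expired` is one simple way to enforce the expiry; the `permanent` flag is what lets the "keep this snapshot" option opt a snapshot out of that job.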
I think the issue brought up here is definitely worth a discussion, and we take security very seriously. Since the prior HN post about changing our default behavior, we have been working behind the scenes to make secure destroy the default, so that the scrub flag can be removed entirely and all destroys, however they are issued, are secure.
That behind the scenes work is almost entirely done so this discussion of the temporary snapshot is great because it allows us to revisit this issue once again.
We have not had any other customer complaints that, during a secure destroy, the droplet, its backups, and its snapshots were not immediately destroyed. So it was great to engage in a conversation with the customer to understand how they expected these commands to function.
We'll be engaging with engineering tomorrow to see if it's time for us to begin to phase out the scrub data flag and instead perhaps open up a new flag which would create a temporary snapshot.
For the UX/UI of the control panel, we would make the default behavior of a destroy create a temporary snapshot, and we would need to discuss whether the API should behave the same way.
Often API customers are creating and destroying many servers so it may be safe to assume that they do not want a temporary snapshot, though having default behaviors differ between the control panel and the API is generally not a good idea.
I think this highlights an issue that all startups deal with: as the product grows and matures and features are added, there are often unintended cascading consequences.
In this case we have done our best to do right by the largest number of customers to ensure that data is safely and securely destroyed while still providing a default behavior that would protect customers against accidental destroys whether they be self-initiated or otherwise.
If anyone has any questions about this issue or anything else, please always feel free to email me directly: my first name, Moisey, at DO (expand that) . com.
Thanks,
Moisey
Cofounder, DigitalOcean
by unclesaamm on 3/31/2014, 3:17 AM
This is a total non-issue. OP was wrong in his interpretation of the recoverable droplet, then pursued an argument with the same high-pitched rhetoric about what amounts to a UI issue. Sorry, not news.
by bluesix on 3/31/2014, 4:48 AM
Surely this whole mess can be fixed with a 5-minute UI change: reword the copy and add a checkbox to take or not take a snapshot, checked (take a snapshot) by default.
by michaelmior on 3/31/2014, 4:23 AM
Kudos to raiyu and nixgeek for one of the most civilized discussions among disagreeing hackers I've seen in quite some time! :)
by eof on 3/31/2014, 4:33 AM
The notion of having 'secure' data on someone else's hardware is just a bit silly.
I think the OP here definitely points at something, but primarily that the `scrubbing` checkbox is essentially a placebo button.
Getting a little meta: a 'no matter what delete this in a fully 100% absolutely totally unrecoverable forever fashion' checkbox is just begging for a generic law enforcement ping which DO would be forced to provide covertly.
I appreciate OP's side, and I definitely see DO's viewpoint of customer happiness >> accurate UI, but the lesson here is definitely to own your own necessarily-secure data.
by unreal37 on 3/31/2014, 12:47 PM
You have to hand it to Digital Ocean for actually listening to customers, explaining themselves thoroughly, and taking the issue to the community (HN on several occasions) for discussion. Totally civil all around. Thanks to nixgeek for raising the potential issue, and thanks to raiyu for engaging in a meaningful back-and-forth discussion with everyone.
The issue itself? I have accidentally "terminated" a few AWS instances that I instantly wished I hadn't, and so I can see the benefit of it sticking around for 24 hours. This would have saved me a few times if I was using DO instead.
by ivan_gammel on 3/31/2014, 11:59 AM
Apparently, this is a UX issue. A checkbox is not the right way to let customers make this decision.
I can suggest a couple of solutions. Option 1: Explicitly state during the signup process that, for recoverability purposes, all customer data is removed N hours after the request. At this point DO may lose some customers who have the wrong expectations of the service, but the interface will be simpler.
Option 2: Replace the checkbox with an additional confirmation page that asks the customer about the data removal strategy ("trash can" or "scrub"). There should be no default selection here. Additional safety measures can be implemented to avoid accidental selection of "scrub": confirmation by email, an SMS security code, or some other "two-factor" approval.

by osteele on 3/31/2014, 1:25 PM

The behavior sounds well considered, but the website doesn't describe it to the user.

How about if the Destroy dialog read something like:

  This is irreversible. We will destroy your droplet and all associated backups immediately. We will keep a snapshot that you can use to recover your droplet; you can disable this below.

  [v] Scrub data - [etc.]

  [v] Temporary snapshot - this will keep a snapshot that you can use to recreate your droplet. This snapshot will be destroyed in 24 hours.

and the Select Image list showed something like:

  Destroyed Droplets

  chef.nl-haa1.infr.as f… — automatically deleted at 2014-04-01 09:25 UTC

by lazyant on 3/31/2014, 2:45 PM
In my experience, the people who want to restore a destroyed instance outnumber the ones who want it scrubbed right away by 10 to 1 or so (if not more). So at another ISP we would decommission the instance (it could not be allocated to another customer) and leave a grace period, after which it would actually be scrubbed. If a user wanted it scrubbed immediately, they could send a ticket and we would do it right away (this was noted in the "power down" email to the user); we saw very few of those.
by sigil on 3/31/2014, 2:06 AM
Question for fellow paranoid HNers: what do you use to decommission a server? Do you run shred(1) on all "interesting" files? Do you write over the block device itself with random data?
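One common answer to this question is a shred(1)-style overwrite. The Python sketch below is an illustration, not a vetted tool: the helper name `overwrite_and_delete` is hypothetical, and it only overwrites a file's current logical contents. On journaling or copy-on-write filesystems, thin-provisioned virtual disks, SSDs with wear leveling, or when the provider snapshots the disk at destroy time as discussed in this thread, older copies of the data can survive an in-place overwrite.

```python
import os
import tempfile


def overwrite_and_delete(path: str, passes: int = 1, chunk_size: int = 1 << 20) -> None:
    """Overwrite a regular file in place with random bytes, then unlink it.

    Hypothetical helper: only the file's current logical blocks are touched;
    filesystem journals, copy-on-write extents, and provider-side snapshots
    are out of its reach.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(os.urandom(n))
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # force this pass out to the device before the next
    os.remove(path)


if __name__ == "__main__":
    # Usage: create a throwaway file with "sensitive" content, then destroy it.
    fd, path = tempfile.mkstemp()
    os.write(fd, b"database password: hunter2\n")
    os.close(fd)
    overwrite_and_delete(path, passes=3)
    print("removed:", not os.path.exists(path))
```

Writing random data over the raw block device follows the same idea at a lower level and avoids filesystem-level leftovers, but it still cannot reach copies the provider has already taken.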
by pearjuice on 3/31/2014, 9:49 AM
Earlier, while using DigitalOcean, I also noticed that .bash_history contained a wget of a script hosted on a DigitalOcean employee's website, which had all kinds of cleanup instructions.
by Demiurge on 3/31/2014, 1:59 AM
Does anyone else feel like hiring Moisey immediately?
by bakhy on 3/31/2014, 9:17 AM
This headline is misleading clickbait. The customer recovered only his or her own droplet, not someone else's, and the DigitalOcean explanation is perfectly reasonable. They should perhaps only improve their UX so it doesn't surprise users with this.
All in all, someone is venting frustrations.
by solomone on 3/31/2014, 7:53 AM
This headline is a bit sensational, given that the OP was incorrect in their assumption. It should probably be changed to: DigitalOcean leaves your droplet around for 24 hours after you destroy it. If you care, destroy your own data.
by kakashi19 on 3/31/2014, 2:27 PM
destroy: put an end to the existence of (something) by damaging or attacking it: the room had been destroyed by fire.
--- Oxford Dictionary
by good_guy on 3/31/2014, 2:12 AM
Previous discussion, https://news.ycombinator.com/item?id=6983097
by nixgeek on 3/31/2014, 1:44 AM
Also putting the word out via Twitter:
https://twitter.com/nixgeek/status/450438984574193665
I think awareness is key with these types of issues; infrastructure providers are very opaque beasts, and the underlying platform behaviour varies with each of them.
Knowing that you may need to erase sensitive data yourself before initiating the destroy, so that it is not captured in the snapshot, is probably half the battle.