Deep interpolation – denoising data by removing independent noise

  • Interesting. When I was in high school, I wrote some software to denoise datasets with a genetic algorithm (it's what we had before deep learning was a thing ;). I showed it to my math teachers and they were horrified; they told me that you cannot change data once it's been collected.

    I realize now that their take is kind of wrong -- pretty much every instrument in the world performs filtering on what it samples before it presents you with a number. Saying you can't filter data kind of misses the point, because almost every real-world system is bandwidth limited and acts as a low-pass filter. But sometimes there are different noise sources, and you can probably write an algorithm to attack them without being "inaccurate". (For example, ever measure very low voltages at relatively high frequency? You'll see a nice 60Hz component in there, because you are surrounded by a 60Hz electric field. That's noise, not data. Apparently averaging through multiple power line cycles is fine, but writing a program to do the same thing is wrong. Seems weird to me. There's a rough sketch of that line-cycle averaging at the end of this comment.)

    Anyway, I'm still bitter about it. (That, and how in 4th grade I got a C in science because we had to make a flipbook about earthquakes that had to be titled "quakin' shakin' earth" and I spelled "quakin'" and "shakin'" wrong. Still don't know how to spell either of those contractions, or what flipbooks have to do with science. They don't come up in real life much. But I sure am bitter, 26 years later.)
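    (Since I brought it up: a toy version of that line-cycle averaging, with made-up sample rates and amplitudes -- the point is just that averaging over an integer number of mains cycles cancels the 60Hz pickup.)

      # Toy sketch: reject 60 Hz mains pickup by averaging over an integer
      # number of power-line cycles. All numbers here are invented.
      import numpy as np

      fs = 6000.0                                          # assumed sample rate, Hz
      mains_hz = 60.0
      t = np.arange(0, 0.5, 1 / fs)

      signal = 1e-3 * np.ones_like(t)                      # the "real" low voltage
      pickup = 5e-3 * np.sin(2 * np.pi * mains_hz * t)     # 60 Hz field pickup
      noisy = signal + pickup

      # Average over exactly one power-line cycle: the sine integrates to ~zero,
      # so only the underlying signal survives.
      n_cycle = int(round(fs / mains_hz))
      estimate = noisy[:n_cycle].mean()
      print(estimate)                                      # ~1e-3; the 60 Hz term cancels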

  • Isn’t this just a specific application of an autoencoder though? Because random noise isn’t learnable, it gets filtered out if you teach the network to compress/decompress a sample with itself as the target.

    https://blog.keras.io/building-autoencoders-in-keras.html

    With that said, it’s amazing how effective this concept is for cleaning up scientific data. Many have also used variational autoencoders to take it a step further and cluster data by latent-space features. I’ve used it myself to uncover various groups of behaviors in time series from other cellular processes. (A rough sketch of the basic train-on-itself setup is below.)
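    (Minimal sketch of that plain autoencoder idea, along the lines of the Keras post above -- toy data and layer sizes are invented, and this isn't the DeepInterpolation code itself, just the concept that the bottleneck keeps structure and drops the unlearnable noise.)

      # Toy denoising-by-compression sketch (all data and sizes made up).
      import numpy as np
      from tensorflow import keras
      from tensorflow.keras import layers

      # clean sinusoids of random frequency plus independent Gaussian noise
      t = np.linspace(0, 1, 128)
      clean = np.sin(2 * np.pi * np.random.uniform(1, 5, size=(1000, 1)) * t)
      noisy = clean + 0.3 * np.random.randn(*clean.shape)

      model = keras.Sequential([
          layers.Input(shape=(128,)),
          layers.Dense(64, activation="relu"),
          layers.Dense(16, activation="relu"),   # bottleneck
          layers.Dense(64, activation="relu"),
          layers.Dense(128),
      ])
      model.compile(optimizer="adam", loss="mse")

      # the sample is its own target; what the network can't learn (the noise)
      # is what gets dropped in the reconstruction
      model.fit(noisy, noisy, epochs=10, batch_size=32, verbose=0)
      denoised = model.predict(noisy[:5])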

  • The difficulty here is that there's an implicit assumption: 'noise' is defined to be whatever isn't learned by the network.

    Now in some cases that may indeed be actual process noise! But there are many fully predictable data series that are difficult for deep networks to learn, and in those cases actual signal will be silently removed.

  • Regarding the license:

    > Allen Institute Software License – This software license is the 2-clause BSD license plus a third clause that prohibits redistribution and use for commercial purposes without further permission.

    I hadn't seen this before. If I am interpreting it correctly, it is basically saying "Look but don't touch, except you can run it for verification/reproducibility purposes". Right?

    I suspect that even a non-profit running this code or a derivative of it on their own servers would run afoul of the additional restrictions.

    This means that a researcher who uses any of this code or its derivatives in their own research is planting a huge downstream landmine in their own code.

  • I think this is the related paper: Removing independent noise in systems neuroscience data using DeepInterpolation

    https://www.biorxiv.org/content/10.1101/2020.10.15.341602v1....

  • As someone who has worked on both interpolation and image compression in a geoscientific context, I'd say the issue is specifically what the model for the signal is. If the model is accurate enough, then yes, removing independent noise is of great value. In my worldview, the big problems are what the physics behind the signal is, and where precisely it changes from one characteristic class to another. Those are the hard problems.

  • Allen Institute Software License – This software license is the 2-clause BSD license plus a third clause that prohibits redistribution and use for commercial purposes without further permission.

    That's a big pity.

  • https://github.com/AllenInstitute/deepinterpolation/blob/725...

    Is it really that hard to use relative paths? This kind of messy programming is why it's so hard to reproduce so many scientific research papers.
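    (Assuming the linked line hard-codes an absolute path -- I haven't dug through the file -- the usual fix is to resolve everything relative to the script or repo root. File and directory names below are made up.)

      # Hypothetical sketch: locate data relative to the repo instead of
      # hard-coding someone's home directory.
      from pathlib import Path

      REPO_ROOT = Path(__file__).resolve().parent        # wherever this script lives
      DATA_DIR = REPO_ROOT / "sample_data"                # assumed layout

      movie_path = DATA_DIR / "ophys_movie.h5"            # hypothetical file name
      print(movie_path)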

  • Just use a simple linear non-causal filter, i.e. low-pass filter forward in time and then backwards in time. Do this for any desired bandwidth and iterate as many times as you want. No need for neural networks here.
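    (That forward-then-backward pass is what scipy's filtfilt does; a toy sketch with an invented signal, sample rate, and cutoff:)

      # Zero-phase (non-causal) low-pass: run the filter forward, then backward.
      import numpy as np
      from scipy.signal import butter, filtfilt

      fs = 1000.0                                   # sample rate, Hz
      t = np.arange(0, 1, 1 / fs)
      clean = np.sin(2 * np.pi * 5 * t)             # 5 Hz "signal"
      noisy = clean + 0.5 * np.random.randn(t.size)

      b, a = butter(4, 20 / (fs / 2))               # 4th-order low-pass, 20 Hz cutoff
      smoothed = filtfilt(b, a, noisy)              # forward + backward pass, zero phase lag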

  • Allen Institute produces many nice things.

    One thing I don't quite understand about this project is what kind of noise it removes. Any kind of noise from any kind of dataset?

    I wish there were a more beginner-friendly explanation in the README.

  • Is it possible that it is learning a Kalman filter indirectly?
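    (For context, the kind of recursive update a Kalman filter performs -- a 1D toy version with invented noise variances, just to show what the network would effectively have to approximate:)

      # Toy 1D Kalman filter for a slowly varying state: blends the running
      # estimate with each new measurement, weighted by their variances.
      def kalman_1d(measurements, q=1e-4, r=0.25):
          """q: process-noise variance, r: measurement-noise variance (both invented)."""
          x, p = measurements[0], 1.0          # state estimate and its variance
          estimates = []
          for z in measurements:
              p = p + q                        # predict: uncertainty grows a bit
              k = p / (p + r)                  # Kalman gain
              x = x + k * (z - x)              # update toward the measurement
              p = (1 - k) * p
              estimates.append(x)
          return estimates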

  • Interesting development by the Allen Brain Institute.