Fwaf – Machine Learning Driven Web Application Firewall

  • It seems like what it actually "learned" is no better than banning some keywords, for example:

      >>> p=lambda x:lgs.predict(vectorizer.transform([x]))
      >>> p("/product.php?name=etc")
      array([1])
      >>> p("/login.php?name=rfoo&pass=hehe")
      array([1])
      >>> p("/download.php?file=/root/.bashrc")
      array([0])
      >>> p("/example/test/q=" + lorem + "<script>alert(1)</script>") # len(lorem) = 4488
      array([0])
    
    (FYI 1 means malicious and 0 means clean)

  • It doesn't look like he did any cross-validation, hence the suspiciously high accuracy. Always keep a hold-out set to test against, for example:
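
    A minimal sketch, assuming a scikit-learn TF-IDF + logistic-regression setup like the one probed above (the queries/labels arrays and the n-gram range here are hypothetical, not the author's actual code):

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score, train_test_split
      from sklearn.pipeline import make_pipeline

      # queries: list of URL strings, labels: 1 = malicious, 0 = clean (hypothetical data)
      X_train, X_test, y_train, y_test = train_test_split(
          queries, labels, test_size=0.2, stratify=labels, random_state=42)

      model = make_pipeline(TfidfVectorizer(analyzer="char", ngram_range=(1, 3)),
                            LogisticRegression())

      # cross-validate on the training split only ...
      print(cross_val_score(model, X_train, y_train, cv=5))

      # ... then report the score on the untouched hold-out set
      model.fit(X_train, y_train)
      print(model.score(X_test, y_test))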

  • In case someone is having difficulty with the link, here is an alternative:

    https://web.archive.org/web/20170514081124/http://fsecurify....

    Apologies for the inconvenience.

    Thanks

  • Like others have said, you might be overfitting your training data here: your model is just memorising the examples you give it and would fail if somebody slightly varied a payload (e.g. by inserting some whitespace).

    Another thing to keep in mind is that an accuracy of 99% doesn't mean much in an imbalanced problem like yours (many more clean queries than malicious ones).

    What you should show instead is precision (of the queries labeled malicious, how many are actually malicious?) and recall (of the malicious queries in the dataset, how many did your model label as malicious?), for example:
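
    A sketch with scikit-learn's metrics (y_true / y_pred are hypothetical ground-truth labels and model predictions; note that on a 99:1 clean-to-malicious split, a model that flags nothing still scores 99% accuracy with 0% recall):

      from sklearn.metrics import classification_report, precision_score, recall_score

      # y_true: ground-truth labels, y_pred: model predictions (1 = malicious, 0 = clean)
      print(precision_score(y_true, y_pred))  # of the queries flagged malicious, how many really are
      print(recall_score(y_true, y_pred))     # of the truly malicious queries, how many got flagged
      print(classification_report(y_true, y_pred, target_names=["clean", "malicious"]))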

  • The similarities in name _and_ logo to F-Secure are a little bothersome.

  • Fun, will look into this! Can anyone point to other datasets?

  • The website is a bit slow, but the page does load. If you run into any problems, please wait a minute and the page will load.

  • Why use trigrams (n = 3)?
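
    For what it's worth, a sketch of how one could check that choice empirically, assuming a scikit-learn pipeline like the hypothetical one above (again, not the author's actual code): grid-search over the n-gram range and compare.

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import GridSearchCV
      from sklearn.pipeline import Pipeline

      pipe = Pipeline([("tfidf", TfidfVectorizer(analyzer="char")),
                       ("clf", LogisticRegression())])

      # compare unigrams, bigrams, and trigrams instead of fixing n = 3 up front
      grid = GridSearchCV(pipe,
                          {"tfidf__ngram_range": [(1, 1), (1, 2), (1, 3), (3, 3)]},
                          scoring="f1", cv=5)
      grid.fit(queries, labels)  # hypothetical lists of query strings and 0/1 labels
      print(grid.best_params_, grid.best_score_)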

  • Woo, saving this. I'm planning a similar project once I'm done with my studies.