50 terms most predictive of a submission making it to the front page
Results are based on stories submitted in 2015.
Overall ROC AUC score: 0.634 (front page: 60.00%, not front page: 84.70%)
0.480619587060 pdf
0.400951486155 yc
0.357299773818 c
0.345074611885 2013
0.312349086474 2014
0.282104533149 2012
0.267570843408 language
0.261375936103 2011
0.241650745481 go
0.219169898626 haskell
0.211257263299 show hn
0.205623463348 rust
0.199571502981 i
0.195598572648 programming
0.195274408452 lisp
0.186177768937 2010
0.178721616659 a
0.167683265531 theory
0.164806775214 fast
0.163041352661 linux
0.160414928257 2009
0.159121703294 released
0.158462774218 yc w15
0.157937056948 0
0.155722711980 w15
0.153727530604 memory
0.152325392129 openbsd
0.140109894354 compiler
0.139065804971 an
0.135471235636 open-source
0.131606742813 deep
0.131399688219 unix
0.131269232599 gnu
0.131168543633 kernel
0.125287242638 show
0.124609324199 firefox
0.124602305508 os
0.123616770730 who
0.120643387605 computer
0.117908824287 the
0.115902369932 modern
0.114483061102 hn what
0.112337592986 ocaml
0.111354775352 programmer
0.109343194868 postgresql
0.108654447345 math
0.107721823851 2008
0.106274292219 in go
0.106071633308 2006
0.104932567748 perl
This list includes ocaml, postgresql and perl but not java, javascript, python, C#, or c++.
How sure are you of your data and your results?
Because imo those are pretty remarkable results if they're accurate and the lack of discussion of them in this thread is itself remarkable...
The year's coefficient is misleading. When an old article gets to the front page, the title is usually changed from "The Title" to "The Title (year)" by the mods. So the submissions of old articles that are successful get the year number but the unsuccessful ones don't get it. (It would be interesting to repeat this with the original title.)
(There are some exceptions: sometimes the submitter adds the year to the title.)
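One rough way to test this without access to the original titles is to strip a trailing year tag before building the features, so the year can't leak the outcome. A minimal sketch, assuming the tag looks like " (2013)" or " [2013]" at the end of the title; `strip_year_suffix` and the sample titles are made up for illustration, and this only approximates the original titles since it also removes years the submitter added:

```python
import re

# Trailing " (2013)" or " [2013]" style tag. The exact format is an assumption.
YEAR_SUFFIX = re.compile(r"\s*[\(\[](?:19|20)\d{2}[\)\]]\s*$")

def strip_year_suffix(title: str) -> str:
    """Remove a trailing bracketed year; years elsewhere in the title are kept."""
    return YEAR_SUFFIX.sub("", title)

# Made-up titles, not real data.
titles = ["The Title (2013)", "Show HN: My app", "Unix in 1979 was fun"]
cleaned = [strip_year_suffix(t) for t in titles]
print(cleaned)
```

If the year terms drop out of the top 50 after retraining on the cleaned titles, that would support the explanation above.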
Also, 100 terms most predictive of a submission not making it to the front page:
-0.335386489547 startup
-0.331723905544 2015
-0.321593118669 app
-0.306937335575 your
-0.305739531214 how to
-0.275438550569 this
-0.261565592652 business
-0.252649164518 product
-0.250614203448 mobile
-0.236041160710 marketing
-0.227196421746 top
-0.208139598304 with
-0.206031814574 5
-0.203087091676 ios
-0.202457032685 design
-0.201021718651 watch
-0.200267475193 startups
-0.197466134506 ask
-0.196357335391 or
-0.192562683469 10
-0.191253124976 best
-0.190867070325 ask hn
-0.187721441778 cloud
-0.187394374070 android
-0.186461809237 smart
-0.184024063073 you
-0.183827664018 tips
-0.182653896122 growth
-0.181372850037 for
-0.178198606780 could
-0.162472422631 blog
-0.162207059285 java
-0.160644447613 development
-0.159487418681 social
-0.157294135483 should
-0.156980003088 bitcoin
-0.150609220130 iphone
-0.148979317953 tech
-0.148345714371 testing
-0.147454333035 change
-0.145491827860 list
-0.145485290331 to
-0.144015642286 3
-0.143708682318 robot
-0.142186986230 tools
-0.140812013948 twitter
-0.140696100278 rails
-0.140548788801 software
-0.138527298008 future
-0.138172121531 good
-0.138015521103 internet
-0.137281744329 facebook
-0.136342150691 security
-0.134144777413 content
-0.133091842596 awesome
-0.133049592053 angularjs
-0.133019163138 create
-0.131147662198 meet
-0.128568740027 live
-0.125766592272 wordpress
-0.125681496867 star
-0.125433963958 here's
-0.124980970020 test
-0.123513256155 day
-0.123227292738 podcast
-0.123085547655 feedback
-0.122558159240 uber
-0.122365526765 bill
-0.121846476127 things
-0.121766619177 online
-0.121674711692 entrepreneurs
-0.121271063379 vr
-0.120835224059 devops
-0.120704156113 website
-0.120668008266 resources
-0.119873591378 tutorial
-0.119600975052 6
-0.119263351612 most
-0.118987167145 api
-0.118767754130 apps
-0.118683692890 digital
-0.116745925093 will
-0.116477896000 data
-0.116317401689 needs
-0.116223838757 need
-0.115050697065 market
-0.114878154258 3d
-0.114105916526 more
-0.111918004178 help
-0.111764422735 apple
-0.111326594562 new
-0.110914386417 year
-0.110475338587 customer
-0.109564041456 technology
-0.109468606136 iot
-0.109381535069 application
-0.109146062602 4
-0.108483540034 solution
-0.108171407112 music
-0.107249340464 drone
that's pretty neat. you could build a model using conditional probabilities to generate fake HN submissions. both good and bad ones.
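For what it's worth, the simplest version of that idea is a bigram (Markov chain) model: record which words follow which, then sample each next word conditional on the previous one. A minimal sketch under those assumptions; `build_bigram_model`, `generate_title`, and the two training titles are made up for illustration and are not from the analysis above:

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"

def build_bigram_model(titles):
    """Map each word to the list of words observed right after it."""
    followers = defaultdict(list)
    for title in titles:
        words = [START] + title.lower().split() + [END]
        for prev, nxt in zip(words, words[1:]):
            followers[prev].append(nxt)
    return followers

def generate_title(followers, rng, max_words=12):
    """Sample one word at a time from the empirical P(next | previous)."""
    word, out = START, []
    while len(out) < max_words:
        # Duplicates in the list make frequent followers more likely.
        word = rng.choice(followers[word])
        if word == END:
            break
        out.append(word)
    return " ".join(out)

# Made-up example titles, not real data.
front_page = ["a fast lisp compiler in go", "a modern unix kernel in rust"]
model = build_bigram_model(front_page)
print(generate_title(model, random.Random(0)))
```

To get "good" and "bad" fakes you would train one model on front-page titles and a second on the rest, then sample from each.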