Show HN: Page.REST – An API to fetch details from a web page as JSON
For absolutely no cost you can use http://proc.link, which returns oEmbed info for ANY URL.
General example: http://api.proc.link/oembed?url=http%3A%2F%2Fpage.rest
Youtube: http://api.proc.link/oembed?url=https%3A%2F%2Fwww.youtube.co...
Facebook: http://api.proc.link/oembed?url=https%3A%2F%2Fwww.facebook.c...
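To illustrate the pattern in those example links: the target URL is percent-encoded and passed as a `url` query parameter. A minimal sketch, assuming the endpoint returns JSON as described (the function names here are made up for illustration):

```python
import json
import urllib.parse
import urllib.request

def build_oembed_url(target_url, endpoint="http://api.proc.link/oembed"):
    """Percent-encode the target URL into the endpoint's `url` parameter."""
    return endpoint + "?" + urllib.parse.urlencode({"url": target_url})

def fetch_oembed(target_url):
    """Fetch and decode the oEmbed JSON for a page (assumes a JSON response)."""
    with urllib.request.urlopen(build_oembed_url(target_url)) as resp:
        return json.load(resp)
```

For example, `build_oembed_url("http://page.rest")` reproduces the "General example" link above.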
If I have to know the elements' selectors, why should I prefer this service over using an HTML parser?
I would make the $5 price and token validity much larger, like "4rem" or something. I was looking at the CC input field and thinking, "Seriously? How much will you charge?"
Not unlike YQL - https://developer.yahoo.com/yql/
Hmm, this is interesting. Reminds me of: https://wrapapi.com
Looks interesting. I wonder what kind of market this app might serve. For larger apps, I would worry about support. 5 dollars per year tells me that the developer is doing this as a hobby. For small side projects, I can see tinkerers building this themselves.
I did something similar a loooong time ago. Granted not as sexy.
I wrote a follow-up blog post about what I learnt from shipping Page.REST: http://www.laktek.com/what-i-learned-from-building-pagerest/
Added support for OpenGraph extraction https://www.page.rest/#open-graph
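For anyone curious what OpenGraph extraction involves under the hood: it amounts to collecting the `og:*` `<meta>` tags from the page head. A rough sketch with Python's stdlib parser (not the service's actual implementation):

```python
from html.parser import HTMLParser

class OpenGraphParser(HTMLParser):
    """Collect og:* properties from <meta property="og:..." content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property", "")
        if prop.startswith("og:") and "content" in attrs:
            self.og[prop] = attrs["content"]

def extract_open_graph(html):
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.og
```

Non-OpenGraph `<meta>` tags (e.g. `name="description"`) are simply ignored.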
Is there a way to await JS frontends (i.e. Angular/Ember etc.) initialising before the scrape via a selector?
Someone could really abuse this service; I don't see any mention of API limits.
Zapier it.
How do you handle sites that have scraper prevention? Such as captcha, IP throttling, etc.
I'd pay $100-$500 per month for a service that could reliably scrape some particularly difficult sites. That said, I'd need the service to be able to handle ~100 req/s in bursts and 2-4 req/s on average.
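That traffic profile (low sustained rate, large bursts) is exactly what a token bucket models. A generic sketch using the commenter's numbers, with an injectable clock so it can be tested deterministically; none of this reflects how the actual service limits requests:

```python
import time

class TokenBucket:
    """Allow short bursts up to `capacity` while refilling at `rate` tokens/sec."""

    def __init__(self, rate, capacity, now=time.monotonic):
        self.rate = rate          # sustained requests per second (e.g. 4)
        self.capacity = capacity  # maximum burst size (e.g. 100)
        self.tokens = capacity    # start full so an initial burst is allowed
        self.now = now
        self.last = now()

    def allow(self):
        """Consume one token if available; return whether the request may proceed."""
        t = self.now()
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `rate=4, capacity=100`, a cold bucket admits a 100-request burst, then throttles to 4 requests per second.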