Load HTML
Scraping Data from HTML Pages
Loads an HTML page and returns the result as a map.
This procedure provides a convenient API for selecting page elements using DOM, CSS, and jQuery-like methods. It relies on the jsoup library.
CALL apoc.load.html(url, {name: <css/dom query>, name2: <css/dom query>}, {config}) YIELD value
The result is a map in which each name is bound to the list of DOM elements matched by its query, i.e.
{name: <list of elements>, name2: <list of elements>}
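As a minimal sketch of how a map of this shape can be consumed, the query below builds a literal map by hand and unwinds one of its lists into rows. The element keys tagName and text and the sample values are illustrative assumptions, not output of the procedure.

```cypher
// Hypothetical stand-in for the map yielded by apoc.load.html
WITH {h2: [{tagName: "h2", text: "First heading"},
           {tagName: "h2", text: "Second heading"}]} AS value
// Turn the list stored under the h2 name into one row per element
UNWIND value.h2 AS element
RETURN element.tagName AS tag, element.text AS text
```

The same UNWIND pattern applies to any name passed in the query map, since each name holds its own list.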
Config
The config parameter is optional; its default value is an empty map.
The charset option defaults to UTF-8.
The base URI defaults to "" and is used to resolve relative paths.
Example with real data
The examples below use the Wikipedia home page.
CALL apoc.load.html("https://en.wikipedia.org/",{metadata:"meta", h2:"h2"})
You will get this result:
[Image: apoc.load.htmlall]
CALL apoc.load.html("https://en.wikipedia.org/",{links:"link"})
You will get this result:
[Image: apoc.load.htmllinks]
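To work with individual link elements rather than the whole map, the call can be followed by UNWIND. This is a sketch only: it assumes each element map exposes an attributes key holding the element's HTML attributes, which should be checked against the actual result before relying on it.

```cypher
CALL apoc.load.html("https://en.wikipedia.org/", {links: "link"}) YIELD value
// One row per matched <link> element
UNWIND value.links AS link
// attributes is an assumed key; a missing key or attribute yields null
RETURN link.attributes.rel AS rel, link.attributes.href AS href
```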
CALL apoc.load.html("https://en.wikipedia.org/",{metadata:"meta", h2:"h2"}, {charset: "UTF-8"})
You will get this result:
[Image: apoc.load.htmlconfig]