Load HTML
This is the APOC Extended documentation. APOC Extended is not supported by Neo4j. For the officially supported APOC Core, go to the APOC Core page.
Scraping data from HTML pages.
apoc.load.html: loads an HTML page and returns the result as a map.
This procedure provides a convenient API for querying an HTML page with DOM, CSS, and jQuery-like selector methods. It relies on the jsoup library.
CALL apoc.load.html(url, {name: <css/dom query>, name2: <css/dom query>}, {config}) YIELD value
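The procedure yields a single map with one entry per query name. Each value is the list of DOM elements matched by that query, and each element is itself represented as a map. The result therefore has this shape: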
{name: <list of elements>, name2: <list of elements>}
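Because every key of the query map becomes a key of the yielded map, the matches can be inspected generically. The query below is a sketch that counts how many elements each named query matched; it reuses the Wikipedia URL from the examples on this page, and the `matched` column name is only illustrative.

```cypher
// Count how many elements each named query matched.
CALL apoc.load.html("https://en.wikipedia.org/", {metadata: "meta", h2: "h2"})
YIELD value
// One row per query name (here: metadata and h2).
UNWIND keys(value) AS name
// value[name] is the list of element maps returned for that query.
RETURN name, size(value[name]) AS matched
```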
Config
The config parameter is optional; its default value is an empty map.
| Name | Default | Description |
| charset | UTF-8 | Character set used to read the page |
| baseUri | "" | Base URI used to resolve relative paths |
| htmlString | false | If true, the first parameter is an HTML string instead of a URL |
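With the `htmlString` option set to true, the first argument is parsed as HTML markup rather than fetched as a URL, which is handy for testing selectors without network access. This is a minimal sketch; the inline HTML is a made-up example.

```cypher
// htmlString: true makes the first argument an HTML string rather than a URL.
// The markup below is a hypothetical example document.
CALL apoc.load.html(
  "<html><body><h2>First</h2><h2>Second</h2></body></html>",
  {h2: "h2"},
  {htmlString: true}
)
YIELD value
// value.h2 holds the list of matched <h2> elements.
RETURN value.h2 AS headings
```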
Example with real data
The examples below use the Wikipedia home page.
CALL apoc.load.html("https://en.wikipedia.org/",{metadata:"meta", h2:"h2"})
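The yielded map contains a `metadata` key holding the list of `<meta>` elements on the page and an `h2` key holding the list of `<h2>` headings.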
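To work with individual matches, unwind the list stored under one key. This sketch assumes that each element map exposes its tag name under `tagName` and its text content under `text`; verify the exact keys against the result your APOC version returns.

```cypher
// Return the text of every <h2> heading on the page.
// Assumption: each element map carries `tagName` and `text` keys.
CALL apoc.load.html("https://en.wikipedia.org/", {h2: "h2"})
YIELD value
UNWIND value.h2 AS heading
RETURN heading.tagName AS tag, heading.text AS text
```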
CALL apoc.load.html("https://en.wikipedia.org/",{links:"link"})
The yielded map contains a `links` key holding the list of `<link>` elements on the page.
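Since the queries are CSS selectors evaluated by jsoup, an attribute selector can narrow the match. The sketch below keeps only stylesheet links and reads their `href`; it assumes element attributes are returned as a nested `attributes` map, and the `stylesheets` key name is illustrative.

```cypher
// Select only <link rel="stylesheet"> elements with a CSS attribute selector.
// Assumption: attributes are returned under an `attributes` map key.
CALL apoc.load.html("https://en.wikipedia.org/", {stylesheets: "link[rel=stylesheet]"})
YIELD value
UNWIND value.stylesheets AS link
RETURN link.attributes.href AS href
```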
CALL apoc.load.html("https://en.wikipedia.org/",{metadata:"meta", h2:"h2"}, {charset: "UTF-8"})
This returns the same map as the first example, because UTF-8 is already the default charset.