| Title: | Parse html files containing search engine results |
|---|---|
| Description: | Parse search engine results which have been scraped with the 'WebBot' browser extension <https://github.com/gesiscss/WebBot>. |
| Authors: | David Schoch [aut, cre] (ORCID: <https://orcid.org/0000-0003-2952-4812>), Chung-hong Chan [aut] (ORCID: <https://orcid.org/0000-0002-6232-7530>) |
| Maintainer: | David Schoch <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0.9000 |
| Built: | 2026-05-16 06:25:22 UTC |
| Source: | https://github.com/schochastics/webbotparseR |
Convert a data URI to an image in the correct format and save it to a file.
base64_to_img(data_uri, slug)
| data_uri | character, base64 image string as returned by parse_search_results |
|---|---|
| slug | character, name of the file to export the image to, without extension |
nothing, called for side effects
## Not run:
data_uri <- paste0(
  "data:image/png;base64,",
  base64enc::base64encode(system.file("logo.png", package = "webbotparseR"))
)
base64_to_img(data_uri, "logo")
## End(Not run)
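To make the conversion step concrete, here is a minimal sketch of how a data URI could be decoded and written to disk. It is not the package's implementation: `data_uri_to_file` is a hypothetical helper, and it assumes the extension can be taken directly from the MIME subtype (which would be too simple for subtypes such as `svg+xml`). It relies on `base64enc::base64decode()`.

```r
# Hypothetical helper for illustration only; not part of webbotparseR.
data_uri_to_file <- function(data_uri, slug) {
  # Split "data:image/<subtype>;base64,<payload>" into subtype and payload
  pattern <- "^data:image/([A-Za-z0-9.+-]+);base64,(.*)$"
  parts <- regmatches(data_uri, regexec(pattern, data_uri))[[1]]
  if (length(parts) != 3) {
    stop("not a base64-encoded image data URI")
  }

  # Assumption: the MIME subtype doubles as the file extension
  out_file <- paste0(slug, ".", parts[2])

  # Decode the payload to raw bytes and write them unchanged
  writeBin(base64enc::base64decode(parts[3]), out_file)
  invisible(out_file)
}
```

As with `base64_to_img`, the `slug` is passed without an extension; the sketch appends one derived from the URI header and returns the resulting path invisibly.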
Parse metadata from search engine results
parse_metadata(path)
| path | character, a path to a file that contains search results |
|---|---|
a tibble of parsed search engine results
parse_metadata("www.google.com_climate change_text_2023-03-16_08_16_11.html")
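The example file name appears to follow the pattern `<engine>_<query>_<type>_<date>_<time>.html`. Whether `parse_metadata` reads its metadata from the file name or from the file contents is not stated here, so the following is only a sketch of how such a name could be split into fields. `split_result_filename` and its column names are hypothetical, and the pattern is inferred from this single example.

```r
# Hypothetical helper for illustration only; not part of webbotparseR.
# Assumes names like "<engine>_<query>_<type>_<YYYY-MM-DD>_<HH_MM_SS>.html".
split_result_filename <- function(path) {
  stem <- sub("\\.html$", "", basename(path))
  pattern <- "^(.*?)_(.*)_([^_]+)_(\\d{4}-\\d{2}-\\d{2})_(\\d{2}_\\d{2}_\\d{2})$"
  m <- regmatches(stem, regexec(pattern, stem, perl = TRUE))[[1]]
  if (length(m) != 6) {
    stop("file name does not match the assumed pattern: ", stem)
  }

  # m[1] is the full match; m[2] to m[6] are the captured fields
  data.frame(
    engine = m[2],
    query = m[3],
    type = m[4],
    date = m[5],
    time = gsub("_", ":", m[6], fixed = TRUE),
    stringsAsFactors = FALSE
  )
}

split_result_filename(
  "www.google.com_climate change_text_2023-03-16_08_16_11.html"
)
```

The query field is matched greedily, so queries that themselves contain underscores stay intact as long as the trailing type, date, and time parts are present.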
Parse search engine results
parse_search_results(path, engine, selectors = "latest")
| path | character, either a path to a file that contains search results or a path to a directory containing search engine result files |
|---|---|
| engine | character. |
| selectors | either character or a |
a tibble of parsed search engine results
search_html <- system.file(
  "www.google.com_climatechange_text_2023-03-16_08_16_11.html",
  package = "webbotparseR"
)
parse_search_results(search_html, engine = "google text", selectors = "ver1")
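Because `path` may point to either a single file or a directory, a caller sometimes wants to know which files would be involved before parsing. The sketch below shows one way to resolve such a path using base R only. `collect_result_files` is a hypothetical helper, and the assumption that result files end in `.html` or `.htm` is ours, not the package's.

```r
# Hypothetical helper for illustration only; not part of webbotparseR.
collect_result_files <- function(path) {
  if (dir.exists(path)) {
    # Directory: keep only files that look like saved HTML pages (assumption)
    list.files(
      path,
      pattern = "\\.html?$",
      full.names = TRUE,
      ignore.case = TRUE
    )
  } else if (file.exists(path)) {
    # Single file: return it unchanged
    path
  } else {
    stop("path does not exist: ", path)
  }
}
```

The returned vector could then be inspected, filtered, or passed one file at a time to `parse_search_results(path, engine, selectors = "latest")`.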