Title: | Parse html files containing search engine results |
---|---|
Description: | Parse search engine results which have been scraped with the 'WebBot' browser extension <https://github.com/gesiscss/WebBot>. |
Authors: | David Schoch [aut, cre], Chung-hong Chan [aut] |
Maintainer: | David Schoch <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.0.9000 |
Built: | 2024-11-12 06:27:46 UTC |
Source: | https://github.com/schochastics/webbotparseR |
Convert a base64 data URI to an image in the correct format and save it to a file.
base64_to_img(data_uri, slug)
data_uri | character, base64 image string as returned by parse_search_results |
slug | character, name of the file to export the image to, WITHOUT extension |
nothing, called for its side effect of writing the image file
## Not run:
data_uri <- paste0(
  "data:image/png;base64,",
  base64enc::base64encode(system.file("logo.png", package = "webbotparseR"))
)
base64_to_img(data_uri, "logo")
## End(Not run)
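To illustrate what converting a data URI to an image file involves, the following is a minimal sketch, not the package's implementation. It assumes the URI has the form `data:image/<type>;base64,<payload>` and that the file extension can be taken from the `<type>` part. `data_uri_to_file` is a hypothetical helper name; the only non-base call is `base64enc::base64decode`.

```r
# Hypothetical helper for illustration; base64_to_img() may work differently.
data_uri_to_file <- function(data_uri, slug) {
  # Capture the image subtype and the base64 payload from the data URI
  pattern <- "^data:image/([A-Za-z0-9.+-]+);base64,(.*)$"
  parts <- regmatches(data_uri, regexec(pattern, data_uri))[[1]]
  if (length(parts) != 3) {
    stop("not a base64-encoded image data URI")
  }

  # Assumption: the subtype doubles as the extension ("svg+xml" -> "svg")
  ext <- sub("\\+.*$", "", parts[2])
  out <- paste0(slug, ".", ext)

  # Decode the payload to raw bytes and write them unchanged to disk
  writeBin(base64enc::base64decode(parts[3]), out)
  invisible(out)
}
```

Because the extension is derived from the URI itself, the caller passes `slug` without an extension, which matches the argument description above.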
Parse metadata from search engine results
parse_metadata(path)
path | character. A path to a file that contains search results |
a tibble of parsed search engine results
parse_metadata("www.google.com_climate change_text_2023-03-16_08_16_11.html")
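The example file name appears to encode the engine domain, the query, the result type and a timestamp, separated by underscores. Assuming that layout (it is inferred from the example only and not confirmed by the package), a base R sketch for splitting such a name could look like this. `parse_result_filename` is a hypothetical helper and is not necessarily how `parse_metadata` obtains its output.

```r
# Hypothetical helper; the file name layout is an assumption:
# <engine>_<query>_<type>_<YYYY-MM-DD>_<HH>_<MM>_<SS>.html
parse_result_filename <- function(path) {
  stem <- sub("\\.html$", "", basename(path))
  pattern <- paste0(
    "^([^_]+)_(.+)_([^_]+)_",
    "([0-9]{4}-[0-9]{2}-[0-9]{2})_([0-9]{2})_([0-9]{2})_([0-9]{2})$"
  )
  m <- regmatches(stem, regexec(pattern, stem))[[1]]
  if (length(m) == 0) {
    stop("file name does not match the assumed layout: ", stem)
  }

  # Assumption: the timestamp is interpreted as UTC
  stamp <- as.POSIXct(
    paste0(m[5], " ", m[6], ":", m[7], ":", m[8]),
    tz = "UTC"
  )
  data.frame(engine = m[2], query = m[3], type = m[4], date = stamp)
}

parse_result_filename(
  "www.google.com_climate change_text_2023-03-16_08_16_11.html"
)
```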
Parse search engine results
parse_search_results(path, engine, selectors = "latest")
path | character. Either a path to a file that contains search results or a path to a directory containing search engine result files |
engine |
character. |
selectors |
either character or a |
a tibble of parsed search engine results
search_html <- system.file(
  "www.google.com_climatechange_text_2023-03-16_08_16_11.html",
  package = "webbotparseR"
)
parse_search_results(search_html, engine = "google text", selectors = "ver1")
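Since `path` may point to either a single file or a directory, a caller-side pattern for handling both cases might look like the sketch below. It is an illustration under assumptions, not the package's internal logic: `parse_path` and its `parse_one` argument are hypothetical names, and result files are assumed to end in `.html`.

```r
# Hypothetical wrapper: apply a single-file parser to a file or a directory.
parse_path <- function(path, parse_one, pattern = "\\.html$") {
  files <- if (dir.exists(path)) {
    # Directory: collect all matching result files (sorted alphabetically)
    list.files(path, pattern = pattern, full.names = TRUE)
  } else if (file.exists(path)) {
    # Single file: parse just this one
    path
  } else {
    stop("path does not exist: ", path)
  }

  # Parse each file and stack the per-file tables row-wise
  results <- lapply(files, parse_one)
  do.call(rbind, results)
}
```

With the package installed, passing `function(f) webbotparseR::parse_search_results(f, engine = "google text")` as `parse_one` would apply the parser to each file in turn. Taking the parser as an argument keeps the sketch independent of any particular engine or selector version.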