Package 'webbotparseR'

Title: Parse HTML files containing search engine results
Description: Parse search engine results which have been scraped with the 'WebBot' browser extension <https://github.com/gesiscss/WebBot>.
Authors: David Schoch [aut, cre], Chung-hong Chan [aut]
Maintainer: David Schoch <[email protected]>
License: MIT + file LICENSE
Version: 0.1.0.9000
Built: 2024-11-12 06:27:46 UTC
Source: https://github.com/schochastics/webbotparseR

Help Index


Image data URI to file

Description

Convert a data URI to an image in the correct format and save it to a file.

Usage

base64_to_img(data_uri, slug)

Arguments

data_uri

character, base64 image string as returned by parse_search_results

slug

character, name of the file to export the image to, WITHOUT extension

Value

nothing, called for side effects

Examples

## Not run: 
data_uri <- paste0(
    "data:image/png;base64,",
    base64enc::base64encode(system.file("logo.png", package = "webbotparseR"))
)
base64_to_img(data_uri, "logo")

## End(Not run)
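
The conversion described above can be sketched in a few lines. This is a minimal illustration, not the package's implementation: the helper name data_uri_to_file is hypothetical, it assumes a URI of the form "data:image/<type>;base64,<payload>", and it uses the image subtype as the file extension as-is (so "jpeg" stays "jpeg"). It relies on the 'base64enc' package already used in the example above.

```r
# Hypothetical helper, NOT the package's code: split a base64 image data URI
# into media subtype and payload, then write the decoded bytes to
# "<slug>.<subtype>".
data_uri_to_file <- function(data_uri, slug) {
    # Capture the image subtype (e.g. "png") and the base64 payload
    m <- regmatches(
        data_uri,
        regexec("^data:image/([A-Za-z0-9.+-]+);base64,(.*)$", data_uri)
    )[[1]]
    if (length(m) != 3) stop("not a base64 image data URI")
    out <- paste0(slug, ".", m[2])
    # Decode the payload to raw bytes and write them unchanged
    writeBin(base64enc::base64decode(m[3]), out)
    invisible(out)
}

# Round trip with a few arbitrary bytes instead of a real image
raw_in <- as.raw(c(0x89, 0x50, 0x4e, 0x47))
uri <- paste0("data:image/png;base64,", base64enc::base64encode(raw_in))
out <- data_uri_to_file(uri, file.path(tempdir(), "demo"))
raw_out <- readBin(out, "raw", n = 10)
```

Like base64_to_img, the sketch takes the slug without an extension and derives the extension from the data URI itself.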

Parse metadata from search engine results

Description

Parse metadata from search engine results

Usage

parse_metadata(path)

Arguments

path

character. a path to a file that contains search results

Value

a tibble of parsed search engine results

Examples

parse_metadata("www.google.com_climate change_text_2023-03-16_08_16_11.html")
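
The example file name appears to encode the search engine, the query, the result type, and a timestamp. The following base R sketch splits such a name into these parts. It is an illustration only: the helper split_result_filename is hypothetical, the pattern <engine>_<query>_<type>_<date>_<HH>_<MM>_<SS>.html is inferred from the example file name alone, and parse_metadata may work differently and return other fields.

```r
# Hypothetical helper: split a WebBot-style file name into its parts.
# Assumed pattern (inferred from the example file name only):
#   <engine>_<query>_<type>_<date>_<HH>_<MM>_<SS>.html
split_result_filename <- function(path) {
    stem <- sub("\\.html$", "", basename(path))
    parts <- strsplit(stem, "_", fixed = TRUE)[[1]]
    n <- length(parts)
    if (n < 7) stop("unexpected file name: ", path)
    data.frame(
        engine = parts[1],
        # the query may itself contain underscores, so rejoin the middle parts
        query = paste(parts[2:(n - 5)], collapse = "_"),
        type = parts[n - 4],
        date = as.Date(parts[n - 3]),
        time = paste(parts[(n - 2):n], collapse = ":"),
        stringsAsFactors = FALSE
    )
}

res <- split_result_filename(
    "www.google.com_climate change_text_2023-03-16_08_16_11.html"
)
```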

Parse search engine results

Description

Parse search engine results

Usage

parse_search_results(path, engine, selectors = "latest")

Arguments

path

character. either a path to a file that contains search results or a path to a directory containing search engine result files

engine

character. the search engine and result type to parse, e.g. "google text"

selectors

either a character or a webbot_selectors S3 object. If character, it gives the selectors version; valid choices are those listed in selectors_versions, or "latest" (use the latest version). You can also supply your own webbot_selectors object.

Value

a tibble of parsed search engine results

Examples

search_html <- system.file(
    "www.google.com_climatechange_text_2023-03-16_08_16_11.html",
    package = "webbotparseR"
)

parse_search_results(search_html, engine = "google text", selectors = "ver1")
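
Since path may also point to a directory, the same call can be made on a folder of result files. The sketch below uses only the arguments documented above; the directory name webbot_results is an arbitrary choice for illustration, and the code does nothing if the package or the bundled example file is not available.

```r
# Sketch only: runs the parser if webbotparseR is installed and the bundled
# example file can be found; otherwise `results` stays NULL.
results <- NULL
results_dir <- file.path(tempdir(), "webbot_results")
dir.create(results_dir, showWarnings = FALSE)

if (requireNamespace("webbotparseR", quietly = TRUE)) {
    search_html <- system.file(
        "www.google.com_climatechange_text_2023-03-16_08_16_11.html",
        package = "webbotparseR"
    )
    if (nzchar(search_html)) {
        # copy the example into a directory of its own ...
        file.copy(search_html, results_dir, overwrite = TRUE)
        # ... and pass the directory instead of a single file
        results <- webbotparseR::parse_search_results(
            results_dir,
            engine = "google text",
            selectors = "latest"
        )
    }
}
```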