Last data update: 2014.03.03

html2text    R Documentation

Identifies the text of an html string

Description

Processes an HTML string to find its main text, leaving out markup such as tags and scripts. The output is a list that contains the extracted text.

Usage

html2text(html, session=getCurlHandle())

Arguments

html

A character string containing valid HTML code.

session

The CURLHandle object that holds the request options and processes the command. For curlMultiPerform, this is an object of class MultiCURLHandle.

Value

A list containing the main text extracted from the HTML.

Author(s)

Ryan Elmore

References

http://www.datasciencetoolkit.org/developerdocs#html2text

See Also

curlPerform, getCurlHandle, dynCurlReader

Examples

## Not run: 
html <- paste0(
  '<html><head><title>MyTitle</title></head><body>',
  '<script type="text/javascript">something();</script>',
  '<div>Some actual text</div></body></html>'
)
html2text(html)
## End(Not run)
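
The session argument can also be supplied explicitly, which may be useful when several strings are processed in a row. The following is only a sketch: extract_texts is a hypothetical helper that is not part of the package, and it assumes that html2text is exported by the RDSTK package, that RCurl is installed, and that the Data Science Toolkit server is reachable.

```r
# Hypothetical helper (not part of the package): apply html2text() to several
# HTML strings while sharing one curl handle across all requests.
# Assumption: html2text() is exported by the RDSTK package.
extract_texts <- function(pages, session = RCurl::getCurlHandle()) {
  # The default handle is created once, on first use, and then reused.
  lapply(pages, function(page) RDSTK::html2text(page, session = session))
}

# Not run (needs network access to the Data Science Toolkit server):
if (FALSE) {
  pages <- c(
    '<html><body><div>First page text</div></body></html>',
    '<html><body><script>ignored();</script><p>Second page text</p></body></html>'
  )
  results <- extract_texts(pages)
  # Inspect the returned list rather than assuming its element names.
  str(results[[1]])
}
```

Using str() on the result avoids relying on any particular element name in the returned list, which this page does not document.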
