Title: | Handling Methods for Naver News Text Crawling |
---|---|
Description: | Provides functions to get Korean text samples from news articles on Naver, a popular news portal service <https://news.naver.com/> in Korea. |
Authors: | Chanyub Park [aut, cre] |
Maintainer: | Chanyub Park <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.8.4 |
Built: | 2024-11-21 03:50:07 UTC |
Source: | https://github.com/forkonlp/N2H4 |
Get all comments from the provided news article URL on Naver
getAllComment(turl)
turl |
character. News article URL on 'Naver', such as <https://n.news.naver.com/mnews/article/023/0003712918>. A URL that is not on the naver.com domain generates an error. |
Works like getComment, but finds and extracts all comments from the given URL.
A tibble (see tibble::tibble-package).
## Not run:
getAllComment("https://n.news.naver.com/mnews/article/214/0001195110")
## End(Not run)
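Because a URL outside the naver.com domain generates an error, it can help to filter a batch of URLs before calling getAllComment. The following is a minimal sketch in base R; `is_naver_url` is a hypothetical helper, not part of N2H4, and its host check is a simplification that ignores unusual forms such as URLs with embedded user names.

```r
# Hypothetical helper (not part of N2H4): TRUE when the URL's host is
# naver.com or one of its subdomains.
is_naver_url <- function(turl) {
  # Drop the scheme, then keep everything before the first "/", "?", "#" or ":".
  host <- sub("^[a-zA-Z][a-zA-Z0-9+.-]*://", "", turl)
  host <- sub("[/?#:].*$", "", host)
  grepl("(^|\\.)naver\\.com$", tolower(host))
}

urls <- c(
  "https://n.news.naver.com/mnews/article/023/0003712918",
  "https://example.com/news/1"
)
ok <- is_naver_url(urls)

# Only the URLs that pass the check would be handed to the package
# (network call, shown for context only):
# for (u in urls[ok]) getAllComment(u)
```

The check runs offline, so a mistyped or foreign URL is caught before any request is made.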
Get All Comment History
getAllCommentHistory(turl, commentNo)
turl |
character. News article URL on 'Naver', such as <https://n.news.naver.com/mnews/article/001/0009205077?sid=102>. A URL that is not on the naver.com domain generates an error. |
commentNo |
Number of the parent comment. |
A tibble (see tibble::tibble-package).
## Not run:
getAllComment("https://n.news.naver.com/mnews/article/214/0001195110?sid=103")
## End(Not run)
News Category
getCategory(fresh = FALSE)
fresh |
If TRUE, get the data online. Default is FALSE, which uses the cached built-in data. |
Get Naver news comments. To get only the comment data, use a command like the following: getComment(url)$result$commentList[[1]]
getComment(turl, count = 10, type = c("df", "list"))
turl |
News article URL, such as <https://n.news.naver.com/mnews/article/023/0003712918>. |
count |
Number of comments to get. Default is 10. Use "all" to get all comments. |
type |
Return type, either "df" or "list". Default is "df". The "df" type returns only part of the data, not all of it. |
A tibble (see tibble::tibble-package).
## Not run:
getComment("https://n.news.naver.com/mnews/article/421/0002484966?sid=100")
## End(Not run)
Get Naver news comments on user histories.
getCommentHistory(turl, commentNo, count = 10, type = c("df", "list"))
turl |
character. News article URL on 'Naver', such as <https://n.news.naver.com/mnews/article/001/0009205077?sid=102>. A URL that is not on the naver.com domain generates an error. |
commentNo |
Number of the parent comment. |
count |
Number of comments to get. Default is 10. Use "all" to get all comments. |
type |
Return type, either "df" or "list". Default is "df". The "df" type returns only part of the data, not all of it. |
A tibble (see tibble::tibble-package).
## Not run:
cno <- getComment("https://n.news.naver.com/mnews/article/421/0002484966?sid=100")
getCommentHistory("https://n.news.naver.com/mnews/article/421/0002484966?sid=100", cno$commentNo[1])
## End(Not run)
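The example above looks up the history for a single parent comment. A possible way to repeat the call for several comment numbers is sketched below. `collect_histories` and `stub_fetch` are hypothetical names introduced for illustration; the fetch function is passed in as an argument so the sketch runs offline with a stub, and the stub's output columns are placeholders, not the real return shape.

```r
# Hypothetical helper (not part of N2H4): call `fetch` once per parent
# comment number and return the results as a list, in input order.
collect_histories <- function(turl, comment_nos, fetch) {
  lapply(comment_nos, function(no) fetch(turl, no))
}

# Offline stand-in for getCommentHistory; its columns are placeholders.
stub_fetch <- function(turl, commentNo) {
  data.frame(parent = commentNo, url = turl)
}

article <- "https://n.news.naver.com/mnews/article/421/0002484966?sid=100"
res <- collect_histories(article, c(11, 22), stub_fetch)

# With the package loaded, the real function could be passed instead
# (network call, shown for context only):
# res <- collect_histories(article, cno$commentNo, getCommentHistory)
```

Passing the fetch function as an argument keeps the looping logic separate from the network access, which also makes it easy to add delays or error handling around each call.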
Get Naver news content from links.
getContent( turl, col = c("url", "original_url", "section", "datetime", "edittime", "press", "title", "body") )
turl |
A Naver news link. |
col |
Columns to get from the news article. Default is all. |
A tibble (see tibble::tibble-package).
## Not run:
getContent("https://n.news.naver.com/mnews/article/214/0001195110?sid=103")
## End(Not run)
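The `col` argument accepts a subset of the eight names listed in the usage. As a minimal sketch, a requested subset can be checked against that documented set before the call; `check_cols` is a hypothetical helper, not part of N2H4, and how getContent itself reacts to unknown names is not covered here.

```r
# Column names documented for the `col` argument of getContent.
content_cols <- c("url", "original_url", "section", "datetime",
                  "edittime", "press", "title", "body")

# Hypothetical helper (not part of N2H4): stop early on names that are
# outside the documented set, otherwise return the selection unchanged.
check_cols <- function(col) {
  bad <- setdiff(col, content_cols)
  if (length(bad) > 0) {
    stop("Unknown column(s): ", paste(bad, collapse = ", "))
  }
  col
}

wanted <- check_cols(c("title", "body"))

# The validated selection would then be passed on
# (network call, shown for context only):
# getContent("https://n.news.naver.com/mnews/article/214/0001195110?sid=103", col = wanted)
```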
Get the current Naver news main category names and ids.
getMainCategory()
A tibble (see tibble::tibble-package).
## Not run:
getMainCategory()
## End(Not run)
Get the current Naver news sub category names and URLs.
getSubCategory(sid1 = 100)
sid1 |
Main category id in the Naver news URL. Only one value is possible. Default is 100, which means Politics. |
A tibble (see tibble::tibble-package).
## Not run:
getSubCategory(100)
getSubCategory(100, FALSE)
## End(Not run)