| Title: | Handling Methods for Naver News Text Crawling |
|---|---|
| Description: | Provides functions to get Korean text samples from news articles on Naver, a popular news portal service <https://news.naver.com/> in Korea. |
| Authors: | Chanyub Park [aut, cre] (ORCID: <https://orcid.org/0000-0001-6474-2570>) |
| Maintainer: | Chanyub Park <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.8.4 |
| Built: | 2026-06-11 08:00:29 UTC |
| Source: | https://github.com/forkonlp/N2H4 |
Get all comments from the provided news article URL on Naver
getAllComment(turl)

| Argument | Description |
|---|---|
| turl | character. News article on 'Naver' such as <https://n.news.naver.com/mnews/article/023/0003712918>. A news article URL that is not on the naver.com domain will generate an error. |

Works like getComment, but finds and extracts all comments from the given URL.
Returns a tibble (see tibble::tibble-package).
## Not run:
getAllComment("https://n.news.naver.com/mnews/article/214/0001195110")
## End(Not run)
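Because a URL outside the naver.com domain generates an error, it can help to check the host before calling getAllComment. The following is a minimal sketch in base R. `is_naver_url` is a hypothetical helper, not part of N2H4, and the package's own check may differ.

```r
# Hypothetical helper, not part of N2H4.
# Returns TRUE when the URL's host is naver.com or one of its subdomains.
# Assumes the URL starts with a scheme such as "https://".
is_naver_url <- function(turl) {
  # Keep only the host part: text between "scheme://" and the next / ? # or :
  host <- sub("^[A-Za-z]+://([^/?#:]+).*$", "\\1", turl)
  # Match "naver.com" itself or anything ending in ".naver.com"
  grepl("(^|\\.)naver\\.com$", tolower(host))
}

is_naver_url("https://n.news.naver.com/mnews/article/023/0003712918")  # TRUE
is_naver_url("https://example.com/mnews/article/023/0003712918")       # FALSE
```

A URL without a scheme is reported as FALSE by this sketch, which errs on the side of rejecting input.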
Get All Comment History
getAllCommentHistory(turl, commentNo)

| Argument | Description |
|---|---|
| turl | character. News article on 'Naver' such as <https://n.news.naver.com/mnews/article/001/0009205077?sid=102>. A news article URL that is not on the naver.com domain will generate an error. |
| commentNo | Parent comment number. |

Returns a tibble (see tibble::tibble-package).
## Not run:
getAllComment("https://n.news.naver.com/mnews/article/214/0001195110?sid=103")
## End(Not run)
News Category
getCategory(fresh = FALSE)
news_category_get(fresh = FALSE)

| Argument | Description |
|---|---|
| fresh | If TRUE, get the data from online. Default is FALSE, which uses cached built-in data. |

Use 'N2H4_CACHE' to control cached data usage when 'fresh = FALSE'. Truthy values: '1', 'true', 'yes', 'on'. Falsy values: '0', 'false', 'no', 'off'.
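The truthy and falsy values listed above can be sketched as a small parser. This is only an illustration: `parse_cache_flag` is a hypothetical helper, not part of N2H4, and two things are assumptions rather than documented behavior: that 'N2H4_CACHE' is read as an environment variable, and that matching ignores case and surrounding whitespace.

```r
# Hypothetical helper, not part of N2H4.
# Maps a single flag string to TRUE, FALSE, or NA (unrecognized value).
parse_cache_flag <- function(x) {
  x <- tolower(trimws(x))  # assumption: case and whitespace are ignored
  if (x %in% c("1", "true", "yes", "on")) return(TRUE)
  if (x %in% c("0", "false", "no", "off")) return(FALSE)
  NA
}

# Assumption: the flag is set as an environment variable
Sys.setenv(N2H4_CACHE = "off")
parse_cache_flag(Sys.getenv("N2H4_CACHE"))  # FALSE
```

Returning NA for anything outside the two lists keeps an unset or mistyped flag distinguishable from an explicit choice.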
Get Naver news comments. To get only the comment data, use a command like the following: getComment(url)$result$commentList[[1]]
getComment(turl, count = 10, type = c("df", "list"))
news_comment(turl, count = 10, type = c("df", "list"))

| Argument | Description |
|---|---|
| turl | News article URL on 'Naver', like <https://n.news.naver.com/mnews/article/023/0003712918>. |
| count | Number of comments to get. Default is 10. Use "all" to get all comments. |
| type | Return type, "df" or "list". Default is "df". "df" returns only part of the data, not all of it. |

Returns a tibble (see tibble::tibble-package).
## Not run:
getComment("https://n.news.naver.com/mnews/article/421/0002484966?sid=100")
## End(Not run)
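The `type = c("df", "list")` default in the usage above follows a common base R idiom in which `match.arg()` picks the first choice when the argument is omitted. Whether getComment uses `match.arg()` internally is an assumption. `pick_type` below is a hypothetical stand-in that only illustrates the idiom.

```r
# Hypothetical stand-in with the same `type` signature as getComment().
# match.arg() returns the first choice when `type` is not supplied
# and raises an error for a value that is not among the choices.
pick_type <- function(type = c("df", "list")) {
  match.arg(type)
}

pick_type()        # "df"
pick_type("list")  # "list"
```

Under this idiom, passing anything other than "df" or "list" stops with an error instead of silently falling back to the default.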
Get Naver news comments from user histories.
getCommentHistory(turl, commentNo, count = 10, type = c("df", "list"))
news_comment_history(turl, commentNo, count = 10, type = c("df", "list"))

| Argument | Description |
|---|---|
| turl | character. News article on 'Naver' such as <https://n.news.naver.com/mnews/article/001/0009205077?sid=102>. A news article URL that is not on the naver.com domain will generate an error. |
| commentNo | Parent comment number. |
| count | Number of comments to get. Default is 10. Use "all" to get all comments. |
| type | Return type, "df" or "list". Default is "df". "df" returns only part of the data, not all of it. |

Returns a tibble (see tibble::tibble-package).
## Not run:
cno <- getComment("https://n.news.naver.com/mnews/article/421/0002484966?sid=100")
getCommentHistory("https://n.news.naver.com/mnews/article/421/0002484966?sid=100", cno$commentNo[1])
## End(Not run)
Get Naver news content from links.
getContent(
  turl,
  col = c("url", "original_url", "section", "datetime", "edittime", "press", "reporter", "title", "body")
)
news_content(
  turl,
  col = c("url", "original_url", "section", "datetime", "edittime", "press", "reporter", "title", "body")
)

| Argument | Description |
|---|---|
| turl | Naver news link. |
| col | Columns to get from the news article. Default is all. |

Returns a tibble (see tibble::tibble-package).
## Not run:
getContent("https://n.news.naver.com/mnews/article/214/0001195110?sid=103")
## End(Not run)
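When only some columns are wanted from getContent, a subset of the default `col` vector can be passed. The sketch below checks a requested subset against the names shown in the usage before any request is made. `check_cols` is a hypothetical helper, not part of N2H4, and how the package itself reacts to an unknown column name is not covered here.

```r
# Column names copied from the default `col` in the getContent() usage
content_cols <- c("url", "original_url", "section", "datetime", "edittime",
                  "press", "reporter", "title", "body")

# Hypothetical helper, not part of N2H4.
# Stops on names outside content_cols, otherwise returns `col` unchanged.
check_cols <- function(col) {
  unknown <- setdiff(col, content_cols)
  if (length(unknown) > 0) {
    stop("Unknown column(s): ", paste(unknown, collapse = ", "))
  }
  col
}

wanted <- check_cols(c("title", "body"))

# With the package installed and network access (not run here):
# getContent("https://n.news.naver.com/mnews/article/214/0001195110?sid=103", col = wanted)
```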
Get the current Naver news main category names and ids.
getMainCategory()
Returns a tibble (see tibble::tibble-package).
## Not run:
getMainCategory()
## End(Not run)
Get the current Naver news sub category names and urls.
getSubCategory(sid1 = 100)

| Argument | Description |
|---|---|
| sid1 | Main category id in the Naver news url. Only one value is allowed. Default is 100, which means Politics. |

Returns a tibble (see tibble::tibble-package).
## Not run:
getSubCategory(100)
getSubCategory(100, FALSE)
## End(Not run)
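Because `sid1` accepts only one value, collecting sub categories for several main categories means calling the function once per id. The sketch below shows one way to do that in base R. `fetch_each` is a hypothetical helper, not part of N2H4, and `stub` with its `sub_cate_name` column is a made-up stand-in for getSubCategory so that the example runs offline.

```r
# Hypothetical helper, not part of N2H4.
# Calls `fetch` once per id and stacks the results, recording in
# `requested_sid1` which id each row came from.
fetch_each <- function(ids, fetch) {
  parts <- lapply(ids, function(id) {
    out <- as.data.frame(fetch(id))
    out$requested_sid1 <- rep(id, nrow(out))
    out
  })
  do.call(rbind, parts)
}

# Made-up stand-in for getSubCategory(); the column name is illustrative only
stub <- function(sid1) data.frame(sub_cate_name = paste0("sub-", sid1))

res <- fetch_each(c(100, 101), stub)

# With the package installed and network access (not run here),
# using ids taken from getMainCategory():
# fetch_each(ids, getSubCategory)
```

Passing the fetch function as an argument keeps the loop independent of the network, so it can be tried with a stub first.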