Package 'DNH4'

Title: Crawling for Daum News Text
Description: Provides utilities to collect Korean text samples from news articles on Daum, a popular news portal service in Korea.
Authors: Chanyub Park [aut, cre]
Maintainer: Chanyub Park <[email protected]>
License: MIT + file LICENSE
Version: 0.1.12
Built: 2024-11-08 02:56:22 UTC
Source: https://github.com/forkonlp/DNH4

Help Index


Get All Comment

Description

Get all comments on a Daum news article.

Usage

getAllComment(turl, sort = c("RECOMMEND", "LATEST"))

Arguments

turl

Daum news article URL, like 'http://v.media.daum.net/v/20161117210603961'.

sort

Sort order of the comments: "RECOMMEND" or "LATEST". Default is "RECOMMEND".

Value

A tibble.
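The sort default is written as a character vector whose first element is "RECOMMEND". A common R idiom, and presumably the one behind this signature, is to resolve such a default with match.arg(), which returns the first choice when the argument is left unset. The helper resolve_sort() below is hypothetical and only illustrates that idiom; it is not taken from the DNH4 source.

```r
# Hypothetical helper illustrating the match.arg() idiom behind a
# sort = c("RECOMMEND", "LATEST") default; not DNH4's own code.
resolve_sort <- function(sort = c("RECOMMEND", "LATEST")) {
  # With the untouched default vector, match.arg() returns its first
  # element; otherwise it checks the value against the allowed choices.
  match.arg(sort)
}

resolve_sort()          # returns "RECOMMEND"
resolve_sort("LATEST")  # returns "LATEST"
```

Because match.arg() also validates its input, a value outside the choices, such as "OLDEST", raises an error instead of being silently ignored.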


Get Comment

Description

Get a page of comments on a Daum news article, selected with limit and offset.

Usage

getComment(
  turl,
  limit = 10,
  offset = 0,
  parentId = 0,
  sort = c("RECOMMEND", "LATEST"),
  type = c("df", "list")
)

Arguments

turl

Daum news article URL, like 'http://v.media.daum.net/v/20161117210603961'.

limit

Number of comments to return. Default is 10.

offset

Position at which to start, i.e. the number of comments to skip. Default is 0.

parentId

Parent comment ID. Default is 0, which presumably selects top-level comments rather than replies.

sort

Sort order of the comments: "RECOMMEND" or "LATEST". Default is "RECOMMEND".

type

Return type: "df" (a tibble, the default) or "list". The "df" conversion may sometimes emit warning messages.

Value

A tibble.
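Since getComment() returns at most limit comments starting at offset, a longer thread can be read page by page by advancing offset until a short or empty page comes back. The sketch below runs that loop offline: fake_get_comment() is a hypothetical stand-in with the same limit/offset arguments that serves 23 made-up comments, and collect_comments() is likewise hypothetical. Passing DNH4::getComment as fetch assumes it accepts turl, limit, and offset as documented above.

```r
# Hypothetical offline stand-in for DNH4::getComment(): serves rows
# offset + 1 through offset + limit from a pool of 23 fake comments.
fake_get_comment <- function(turl, limit = 10, offset = 0) {
  pool <- data.frame(id = 1:23, content = paste("comment", 1:23))
  keep <- pool$id > offset & pool$id <= offset + limit
  pool[keep, , drop = FALSE]
}

# Hypothetical helper: page through comments with limit/offset until
# the fetcher returns fewer than `limit` rows.
collect_comments <- function(turl, fetch = fake_get_comment, limit = 10) {
  pages <- list()
  offset <- 0
  repeat {
    page <- fetch(turl, limit = limit, offset = offset)
    if (nrow(page) > 0) pages[[length(pages) + 1]] <- page
    if (nrow(page) < limit) break  # short or empty page: nothing left
    offset <- offset + limit       # advance to the next page
  }
  do.call(rbind, pages)
}

all_comments <- collect_comments("http://v.media.daum.net/v/20161117210603961")
nrow(all_comments)  # 23
```

Stopping on a short page saves one request compared with waiting for an empty page; the empty-page check still covers a total that is an exact multiple of limit.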


Get Content

Description

Get the content of a Daum news article from its link.

Usage

getContent(turl = url)

Arguments

turl

Daum news article link.

Value

A tibble with columns url, datetime, press, title, and content.
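getContent() takes a single link, so collecting several articles means calling it in a loop, and one failed request should not abort the whole batch. The sketch below wraps each call in tryCatch() and binds the successful rows. fetch_contents() and the offline stand-in fake_get_content() are hypothetical, and the links are placeholders; using DNH4::getContent as fetch assumes it returns one row per link with the columns listed above.

```r
# Hypothetical offline stand-in for DNH4::getContent(): returns one row
# with the documented columns, or fails for links containing "bad".
fake_get_content <- function(turl) {
  if (grepl("bad", turl, fixed = TRUE)) stop("request failed: ", turl)
  data.frame(url = turl, datetime = NA_character_, press = "press",
             title = "title", content = "content")
}

# Hypothetical helper: fetch each link, skip failures with a warning,
# and bind the successful rows into one data frame.
fetch_contents <- function(urls, fetch = fake_get_content) {
  rows <- lapply(urls, function(u) {
    tryCatch(fetch(u), error = function(e) {
      warning("skipping ", u, ": ", conditionMessage(e), call. = FALSE)
      NULL
    })
  })
  rows <- Filter(Negate(is.null), rows)  # drop the failed links
  do.call(rbind, rows)
}

links <- c("placeholder-1", "placeholder-bad", "placeholder-3")
res <- suppressWarnings(fetch_contents(links))
res$url  # "placeholder-1" "placeholder-3"
```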


Get News Main Categories

Description

Get the latest list of Daum news main category names and ids.

Usage

getMainCategory(fresh = FALSE)

Arguments

fresh

If TRUE, fetch the data from the internet. Default is FALSE, which returns the cached data.

Value

A data.frame with character columns cate_name and url.

Examples

getMainCategory()
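The fresh argument switches between a cached result and a new download. A minimal sketch of that pattern, assuming an in-memory cache (DNH4 may instead ship precomputed data; its mechanism is not documented here), keeps the last result in an environment and refetches only when fresh = TRUE or nothing is cached yet. get_categories_cached(), the cache environment, and fake_download() with its placeholder rows are all hypothetical.

```r
# Hypothetical in-memory cache illustrating a `fresh` flag; DNH4's own
# caching may differ.
.category_cache <- new.env()

get_categories_cached <- function(download, fresh = FALSE) {
  if (fresh || !exists("value", envir = .category_cache, inherits = FALSE)) {
    # Cache miss or forced refresh: call the (hypothetical) downloader.
    assign("value", download(), envir = .category_cache)
  }
  get("value", envir = .category_cache, inherits = FALSE)
}

# Hypothetical downloader that counts how often it is called.
calls <- 0
fake_download <- function() {
  calls <<- calls + 1
  data.frame(cate_name = c("cat-a", "cat-b"), url = c("url-a", "url-b"))
}

get_categories_cached(fake_download)                # downloads
get_categories_cached(fake_download)                # served from cache
get_categories_cached(fake_download, fresh = TRUE)  # downloads again
calls  # 2
```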

Get Max Page Number

Description

Get the number of the last page in a Daum news article list.

Usage

getMaxPageNum(turl = url)

Arguments

turl

Target URL built from the breakingnews path, a category path, and optionally a date passed as regDate, for example 'https://news.daum.net/breakingnews/politics/administration?regDate=20220305'.

Value

A numeric value: the maximum page number.
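The example turl passes the date as regDate in YYYYMMDD form (20220305). To cover a date range, one could generate one such URL per day and then call getMaxPageNum() on each. regdate_urls() is a hypothetical helper; only the regDate format is taken from the example above.

```r
# Hypothetical helper: one list URL per day, with regDate in YYYYMMDD form
# as in '...breakingnews/politics/administration?regDate=20220305'.
regdate_urls <- function(base, from, to) {
  days <- seq(as.Date(from), as.Date(to), by = "day")
  paste0(base, "?regDate=", format(days, "%Y%m%d"))
}

urls <- regdate_urls("https://news.daum.net/breakingnews/politics/administration",
                     "2022-03-04", "2022-03-06")
urls
# Each element could then be passed on, e.g. sapply(urls, DNH4::getMaxPageNum)
```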


Get News Sub Categories

Description

Get the latest list of Daum news subcategory names and URLs.

Usage

getSubCategory(categoryUrl = "society", fresh = FALSE)

Arguments

categoryUrl

Main category URL segment in Daum news, such as "society" or "politics". Only one value is accepted. Default is "society".

fresh

If TRUE, fetch the data from the internet. Default is FALSE, which returns the cached data.

Value

A data.frame with character columns sub_cate_name and url.

Examples

getSubCategory()
getSubCategory("politics")
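Because categoryUrl accepts only one value, subcategories for several main categories have to be requested one at a time and combined. In the sketch below, sub_categories_for() is a hypothetical helper and fake_get_sub_category() an offline stand-in returning the documented columns (sub_cate_name, url) with placeholder values; passing DNH4::getSubCategory as fetch is an assumption, not something the package documents.

```r
# Hypothetical offline stand-in for DNH4::getSubCategory(): returns the
# documented columns with placeholder values.
fake_get_sub_category <- function(categoryUrl, fresh = FALSE) {
  data.frame(sub_cate_name = paste0(categoryUrl, "-sub", 1:2),
             url = paste0(categoryUrl, "/placeholder", 1:2))
}

# Hypothetical helper: call the fetcher once per main category and tag
# each row with the category it came from.
sub_categories_for <- function(categories, fetch = fake_get_sub_category) {
  parts <- lapply(categories, function(main) {
    df <- fetch(main)
    df$category <- rep(main, nrow(df))
    df
  })
  do.call(rbind, parts)
}

subs <- sub_categories_for(c("society", "politics"))
subs$category  # "society" "society" "politics" "politics"
```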

Get Url List

Description

Get Daum news titles and links from a target list URL.

Usage

getUrlList(turl = url)

Arguments

turl

Target Daum news list page URL.

Value

A tibble with columns news_title and news_links.
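When getUrlList() is run over several list pages, the same article can appear more than once, for example if the list shifts between requests. A sketch, assuming each call returns the documented news_title and news_links columns, binds the pages and keeps the first occurrence of each link. combine_url_lists() and the sample rows are hypothetical.

```r
# Hypothetical helper: bind several getUrlList()-style results and drop
# repeated links, keeping the first occurrence of each.
combine_url_lists <- function(pages) {
  all_rows <- do.call(rbind, pages)
  all_rows[!duplicated(all_rows$news_links), , drop = FALSE]
}

# Placeholder rows standing in for two pages of results.
page1 <- data.frame(news_title = c("A", "B"), news_links = c("link-a", "link-b"))
page2 <- data.frame(news_title = c("B", "C"), news_links = c("link-b", "link-c"))

combined <- combine_url_lists(list(page1, page2))
combined$news_links  # "link-a" "link-b" "link-c"
```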