eyes.crawler
eyes crawler module
eyes.crawler.ptt
PTT crawler module
- eyes.crawler.ptt.get_post_id(url: str) → str
Get post id by url
- Parameters
url (str) – post url
- Returns
post id
- Return type
str
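To illustrate what a post id could look like, the sketch below derives one from a URL by taking the last path segment and dropping a trailing `.html`. This rests on an assumption about how PTT post URLs are laid out and does not describe how `get_post_id` is actually implemented; `sketch_post_id` and the example URL are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def sketch_post_id(url: str) -> str:
    """Hypothetical stand-in: last path segment without a trailing .html."""
    filename = PurePosixPath(urlparse(url).path).name
    suffix = ".html"
    if filename.endswith(suffix):
        return filename[: -len(suffix)]
    return filename


# Assumed URL shape, used only for illustration.
example_url = "https://www.ptt.cc/bbs/Gossiping/M.1600000000.A.ABC.html"
post_id = sketch_post_id(example_url)
```

With the real module, the equivalent call would be `eyes.crawler.ptt.get_post_id(example_url)`; the exact id format it returns is not stated in this reference.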
- eyes.crawler.ptt.crawl_post(url: str, board: str) → Optional[eyes.data.ptt.PttPost]
Crawl a PTT post into a PttPost
- Parameters
url (str) – a post url
board (str) – board name
- Returns
PTT post data container, or None if the request returns a 404
- Return type
Optional[PttPost]
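Because `crawl_post` returns `None` for a 404, callers need to handle the missing case. The following is a minimal sketch of one way to do that. `SketchPost`, `fake_crawl`, and `collect_posts` are hypothetical stand-ins so the example runs without network access; in real use the `crawl` argument would presumably be `eyes.crawler.ptt.crawl_post`.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class SketchPost:
    """Hypothetical placeholder for a crawled post."""
    url: str
    board: str


def collect_posts(
    urls: Iterable[str],
    board: str,
    crawl: Callable[[str, str], Optional[SketchPost]],
) -> List[SketchPost]:
    """Crawl each URL and keep only the posts that were found."""
    posts = []
    for url in urls:
        post = crawl(url, board)
        if post is None:  # the documented 404 case
            continue
        posts.append(post)
    return posts


def fake_crawl(url: str, board: str) -> Optional[SketchPost]:
    """Stand-in crawler: pretends URLs ending in 'missing' return a 404."""
    if url.endswith("missing"):
        return None
    return SketchPost(url=url, board=board)


urls = ["https://example.invalid/a", "https://example.invalid/missing"]
found = collect_posts(urls, "SomeBoard", fake_crawl)
```

Passing the crawler in as a parameter keeps the None handling testable without touching the network.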
- eyes.crawler.ptt.get_next_url(dom: lxml.etree.Element) → str
Get next page url
- Parameters
dom (etree.Element) – current page DOM
- Returns
next page url
- Return type
str
- eyes.crawler.ptt.crawl_post_urls(board: str, n_days: Optional[int] = None) → Iterator[str]
Crawl the URLs of a board's latest posts, optionally limited to the last n_days days
- Parameters
board (str) – board name
n_days (Optional[int]) – only posts created within the last n_days days are included. If n_days is None, no date limit is applied.
- Returns
an iterator of PTT post URLs
- Return type
Iterator[str]
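The `n_days` behaviour described above can be sketched as a lazy filter over (URL, creation time) pairs. This is only an illustration under stated assumptions: posts are assumed to arrive newest first, and `sketch_filter_recent` and the sample data are hypothetical, not part of `eyes`.

```python
from datetime import datetime, timedelta
from typing import Iterable, Iterator, Optional, Tuple


def sketch_filter_recent(
    posts: Iterable[Tuple[str, datetime]],
    n_days: Optional[int] = None,
    now: Optional[datetime] = None,
) -> Iterator[str]:
    """Yield post URLs, stopping at the first post older than the cutoff."""
    now = now or datetime.now()
    # n_days=None means no date limit, mirroring the documented behaviour.
    cutoff = None if n_days is None else now - timedelta(days=n_days)
    for url, created_at in posts:
        if cutoff is not None and created_at < cutoff:
            break  # assumes newest-first order, so later posts are older too
        yield url


now = datetime(2024, 1, 10)
posts = [
    ("https://example.invalid/post-3", datetime(2024, 1, 9)),
    ("https://example.invalid/post-2", datetime(2024, 1, 5)),
    ("https://example.invalid/post-1", datetime(2023, 12, 1)),
]
recent = list(sketch_filter_recent(posts, n_days=7, now=now))
everything = list(sketch_filter_recent(posts, n_days=None, now=now))
```

Returning an iterator lets a caller stop early without crawling further pages, which may be why the function is typed as `Iterator[str]` rather than a list.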
- eyes.crawler.ptt.crawl_board_list(top_n: Optional[int] = None) → Iterator[eyes.data.ptt.PttBoard]
Crawl PTT board list
- Parameters
top_n (Optional[int]) – top N boards.
- Returns
board iterator
- Return type
Iterator[PttBoard]
eyes.crawler.dcard
Dcard crawler module
- eyes.crawler.dcard.crawl_post(post_id: int) → eyes.data.dcard.DcardPost
Crawl a Dcard post
- Parameters
post_id (int) – post id
- Returns
Dcard post
- Return type
DcardPost
- eyes.crawler.dcard.crawl_post_ids(forum_id: str, n_days: Optional[int] = None) → Iterator[int]
Crawl Dcard post ids
- Parameters
forum_id (str) – forum id
n_days (Optional[int]) – only posts created within the last n_days days are included. If n_days is None, no date limit is applied.
- Returns
post id iterator
- Return type
Iterator[int]
- eyes.crawler.dcard.crawl_board_list(top_n: Optional[int] = None) → Iterator[eyes.data.dcard.DcardBoard]
Crawl Dcard forum list
- Parameters
top_n (Optional[int]) – max number of boards
- Returns
board iterator
- Return type
Iterator[DcardBoard]
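As a rough illustration of how a `top_n` cap can be applied to a lazily produced sequence, the sketch below uses `itertools.islice`. It assumes that `top_n=None` means "no cap", which this reference does not state explicitly. `sketch_board_names` and its placeholder data are hypothetical.

```python
from itertools import islice
from typing import Iterator, Optional


def sketch_board_names(top_n: Optional[int] = None) -> Iterator[str]:
    """Hypothetical stand-in that yields board names, capped at top_n."""
    all_boards = (f"board-{i}" for i in range(1, 6))  # placeholder data
    # islice with a stop of None yields every item, so None acts as "no cap".
    return islice(all_boards, top_n)


top_two = list(sketch_board_names(top_n=2))
all_names = list(sketch_board_names())
```

The same pattern works on the caller's side: wrapping any of the iterators returned by this package in `islice` limits how many items are consumed.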
eyes.crawler.entity
Eyes entity crawler module
- eyes.crawler.entity.crawl_wiki_entity(url: str, label: eyes.type.Label) → Optional[eyes.data.Entity]
Crawl wiki entity
- Parameters
url (str) – wiki entity url
label (Label) – entity label
- Returns
entity
- Return type
Optional[Entity]
- eyes.crawler.entity.crawl_wiki_entity_urls(category_url: str) → Iterator[str]
Crawl wiki entity urls by category url
- Parameters
category_url (str) – wiki category URL
- Returns
entity URL iterator
- Return type
Iterator[str]
eyes.crawler.utils
Crawler utils
- eyes.crawler.utils.get_dom(resp: requests.models.Response) → lxml.etree.Element
Transform response to etree.Element
- Parameters
resp (requests.Response) – response
- Returns
DOM
- Return type
etree.Element