eyes.crawler

eyes crawler module

eyes.crawler.ptt

PTT crawler module

eyes.crawler.ptt.get_post_id(url: str) → str

Get post id by url

Parameters

url (str) – post url

Returns

post id

Return type

str
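
As an illustration of what get_post_id might do, the sketch below takes the last path segment of the URL and drops its extension. The URL layout (https://www.ptt.cc/bbs/<board>/<post id>.html) and the helper name sketch_post_id are assumptions made for this example, not the library's actual implementation.

```python
from posixpath import basename, splitext
from urllib.parse import urlparse


def sketch_post_id(url: str) -> str:
    """Hypothetical stand-in for get_post_id: file name of the URL path, minus its extension."""
    path = urlparse(url).path  # the path only, without query string or fragment
    stem, _ext = splitext(basename(path))  # splits at the last dot, so ".html" is removed
    return stem


example_id = sketch_post_id("https://www.ptt.cc/bbs/Gossiping/M.1599999999.A.123.html")
```

Parsing the URL first, rather than splitting the raw string, keeps a trailing query string from leaking into the id.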

eyes.crawler.ptt.crawl_post(url: str, board: str) → Optional[eyes.data.ptt.PttPost]

Crawl a ptt post into a PttPost

Parameters
  • url (str) – a post url

  • board (str) – board name

Returns

PTT post data container, or None if the post URL returns 404

Return type

Optional[PttPost]
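
Because crawl_post can return None, callers need to skip missing posts. The sketch below shows one way to do that. The helper crawl_existing and the stand-in fake_crawl are hypothetical names introduced for this example; the crawler is passed in as an argument so the snippet runs without network access.

```python
from typing import Callable, Iterable, Iterator, Optional, TypeVar

Post = TypeVar("Post")


def crawl_existing(
    urls: Iterable[str],
    crawl: Callable[[str], Optional[Post]],
) -> Iterator[Post]:
    """Yield only the posts that were found, skipping None (404) results."""
    for url in urls:
        post = crawl(url)
        if post is not None:
            yield post


# Stand-in crawler for demonstration only; a real caller might pass
# functools.partial(crawl_post, board="Gossiping") instead.
def fake_crawl(url: str) -> Optional[dict]:
    return None if url.endswith("deleted.html") else {"url": url}


found = list(crawl_existing(["a.html", "deleted.html", "b.html"], fake_crawl))
```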

eyes.crawler.ptt.get_next_url(dom: lxml.etree.Element) → str

Get next page url

Parameters

dom (etree.Element) – current page DOM

Returns

next page url

Return type

str
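
A rough sketch of how a next page URL could be read from a board index DOM follows. Several things here are assumptions rather than facts from this reference: that the "next" page for the crawler is the older page, that its link text contains 上頁, and that links are relative to https://www.ptt.cc. The standard library's xml.etree.ElementTree stands in for lxml so the example has no third-party dependency; the iter, text and get calls used here are also available on lxml elements.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urljoin

BASE_URL = "https://www.ptt.cc"  # assumed site root
PREVIOUS_LABEL = "上頁"  # assumed text of the paging link


def sketch_next_url(dom: ET.Element) -> Optional[str]:
    """Hypothetical stand-in for get_next_url: absolute URL of the paging link, if any."""
    for anchor in dom.iter("a"):
        href = anchor.get("href")
        if href and PREVIOUS_LABEL in (anchor.text or ""):
            return urljoin(BASE_URL, href)  # resolve the relative href
    return None


# A tiny, well-formed fragment made up for the example.
page = ET.fromstring(
    '<div><a href="/bbs/Stock/index1.html">最舊</a>'
    '<a href="/bbs/Stock/index4999.html">‹ 上頁</a></div>'
)
next_url = sketch_next_url(page)
```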

eyes.crawler.ptt.crawl_post_urls(board: str, n_days: Optional[int] = None) → Iterator[str]

Crawl the latest post urls of a board

Parameters
  • board (str) – board name

  • n_days (Optional[int]) – only crawl posts created within the last n_days days. If n_days is None, no date limit is applied.

Returns

an iterator of PTT post urls

Return type

Iterator[str]
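
The n_days behaviour can be pictured as a date cutoff applied to a lazy stream. The sketch below assumes entries arrive newest first as (url, created_at) pairs; the helper name recent_urls, the now parameter and the sample data are hypothetical and only illustrate the idea.

```python
from datetime import datetime, timedelta
from typing import Iterable, Iterator, Optional, Tuple


def recent_urls(
    entries: Iterable[Tuple[str, datetime]],
    n_days: Optional[int] = None,
    now: Optional[datetime] = None,
) -> Iterator[str]:
    """Yield URLs lazily from newest-first (url, created_at) pairs.

    Stops at the first entry older than n_days; with n_days=None nothing is filtered.
    """
    now = now or datetime.now()
    cutoff = None if n_days is None else now - timedelta(days=n_days)
    for url, created_at in entries:
        if cutoff is not None and created_at < cutoff:
            break  # entries are newest first, so the rest are older too
        yield url


now = datetime(2021, 1, 10)
entries = [
    ("post-3", datetime(2021, 1, 9)),
    ("post-2", datetime(2021, 1, 5)),
    ("post-1", datetime(2020, 12, 1)),
]
last_week = list(recent_urls(entries, n_days=7, now=now))
everything = list(recent_urls(entries, now=now))
```

Returning an iterator rather than a list lets the caller stop early without fetching further pages.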

eyes.crawler.ptt.crawl_board_list(top_n: Optional[int] = None) → Iterator[eyes.data.ptt.PttBoard]

Crawl PTT board list

Parameters

top_n (Optional[int]) – top N boards

Returns

board iterator

Return type

Iterator[PttBoard]

eyes.crawler.dcard

Dcard Crawler module

eyes.crawler.dcard.crawl_post(post_id: int) → eyes.data.dcard.DcardPost

Crawl a dcard post

Parameters

post_id (int) – post id

Returns

dcard post

Return type

DcardPost

eyes.crawler.dcard.crawl_post_ids(forum_id: str, n_days: Optional[int] = None) → Iterator[int]

Crawl dcard post ids

Parameters
  • forum_id (str) – forum id

  • n_days (Optional[int]) – only crawl posts created within the last n_days days. If n_days is None, no date limit is applied.

Returns

post id iterator

Return type

Iterator[int]

eyes.crawler.dcard.crawl_board_list(top_n: Optional[int] = None) → Iterator[eyes.data.dcard.DcardBoard]

Crawl Dcard forum list

Parameters

top_n (Optional[int]) – max number of boards

Returns

board iterator

Return type

Iterator[DcardBoard]
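
One common way to cap an iterator with an optional limit such as top_n is itertools.islice. The sketch below is illustrative only: take_top and the sample names are hypothetical, and treating top_n=None as "no limit" is an assumption rather than documented behaviour.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def take_top(items: Iterable[T], top_n: Optional[int] = None) -> Iterator[T]:
    """Yield at most top_n items; islice treats a stop of None as unbounded."""
    return islice(items, top_n)


boards = ["forum-a", "forum-b", "forum-c", "forum-d"]
top_two = list(take_top(boards, top_n=2))
all_boards = list(take_top(boards))
```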

eyes.crawler.entity

Eyes entity crawler module

eyes.crawler.entity.crawl_wiki_entity(url: str, label: eyes.type.Label) → Optional[eyes.data.Entity]

Crawl wiki entity

Parameters
  • url (str) – wiki entity url

  • label (Label) – entity label

Returns

entity

Return type

Optional[Entity]

eyes.crawler.entity.crawl_wiki_entity_urls(category_url: str) → Iterator[str]

Crawl wiki entity urls by category url

Parameters

category_url (str) – wiki category url

Returns

entity url iterator

Return type

Iterator[str]

eyes.crawler.utils

Crawler utils

eyes.crawler.utils.get_dom(resp: requests.models.Response) → lxml.etree.Element

Transform response to etree.Element

Parameters

resp (requests.Response) – response

Returns

DOM

Return type

etree.Element