eyes.ml
Eyes ML module
eyes.ml.spacy
Eyes ML spaCy module
- eyes.ml.spacy.transform_ptt_post_to_spacy(post: eyes.db.ptt.PttPost, nlp: spacy.language.Language, disable: Iterable[str] = ['tok2vec']) → eyes.data.spacy.SpacyPttPost
Transform a PTT post into its spaCy doc binary form
- Parameters
post (ptt.PttPost) – PTT post
nlp (Language) – spaCy language model
disable (Iterable[str]) – names of pipeline components to disable
- Returns
spacy.SpacyPttPost
- eyes.ml.spacy.binary_to_doc(binary: bytes, nlp: spacy.language.Language) → spacy.tokens.doc.Doc
Transform bytes into a spaCy doc
- Parameters
binary (bytes) – serialized spaCy doc
nlp (Language) – spaCy language model
- Returns
spaCy doc
- Return type
Doc
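A serialized doc can generally be restored with spaCy's `Doc.from_bytes`. The following is a minimal sketch of that round trip, assuming the bytes were produced by `Doc.to_bytes`. The helper name `binary_to_doc_sketch` and the blank English pipeline are illustrative assumptions; the actual `binary_to_doc` may work differently.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc


def binary_to_doc_sketch(binary: bytes, nlp: Language) -> Doc:
    """Rebuild a Doc from bytes, sharing the vocab of the given pipeline.

    Hypothetical stand-in for eyes.ml.spacy.binary_to_doc.
    """
    return Doc(nlp.vocab).from_bytes(binary)


# A blank pipeline (tokenizer only) avoids downloading a trained model.
nlp = spacy.blank("en")
original = nlp("Eyes turns posts into docs")
binary = original.to_bytes()

restored = binary_to_doc_sketch(binary, nlp)
restored_tokens = [token.text for token in restored]
original_tokens = [token.text for token in original]
```

Creating the empty `Doc` with `nlp.vocab` keeps the restored doc tied to the same vocabulary as the pipeline that will consume it.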
- eyes.ml.spacy.transform_ptt_comment(comment: eyes.data.spacy.SpacyPttComment, nlp: spacy.language.Language) → Dict
Transform a PTT comment into a decoded dictionary
- Parameters
comment (spacy.SpacyPttComment) – comment in spaCy binary form
nlp (Language) – spaCy language model
- Returns
decoded dictionary
- Return type
Dict
- eyes.ml.spacy.transform_ptt_post(post: eyes.data.spacy.SpacyPttPost, nlp: spacy.language.Language) → Dict
Transform a PTT post into a decoded dictionary
- Parameters
post (spacy.SpacyPttPost) – post in spaCy binary form
nlp (Language) – spaCy language model
- Returns
decoded dictionary
- Return type
Dict
- eyes.ml.spacy.build_docs(nlp: spacy.language.Language, sess: sqlalchemy.orm.session.Session, limit: int = 100000, batch_size: int = 32) → Iterable[spacy.tokens.doc.Doc]
Build spaCy docs
- Parameters
nlp (Language) – spaCy language model
sess (Session) – SQLAlchemy session
limit (int) – maximum number of docs
batch_size (int) – batch size
- Returns
spaCy docs
- Return type
Iterable[Doc]
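The signature suggests that `limit` caps the total number of docs while `batch_size` controls how many items are handled at once (spaCy's `nlp.pipe` accepts a `batch_size` argument for the same purpose). The sketch below only illustrates how the two parameters could interact on a plain iterable; `take_in_batches` is a hypothetical helper, not part of `eyes`, and it makes no claim about how `build_docs` is actually implemented.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def take_in_batches(
    items: Iterable[T], limit: int = 100000, batch_size: int = 32
) -> Iterator[List[T]]:
    """Yield lists of at most batch_size items, stopping after limit items."""
    capped = islice(items, limit)  # never read more than `limit` items
    while True:
        batch = list(islice(capped, batch_size))
        if not batch:
            return
        yield batch


# 70 available rows, capped at 50, processed 32 at a time.
batches = list(take_in_batches(range(70), limit=50, batch_size=32))
batch_sizes = [len(batch) for batch in batches]
```

Because the cap is applied lazily, rows beyond `limit` are never read from the underlying source.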
eyes.ml.lf
Eyes label functions module
- eyes.ml.lf.build_tries(entities: List[eyes.data.Entity]) → Dict[str, skweak.gazetteers.Trie]
Build gazetteer tries
- Parameters
entities (List[Entity]) – entities to build the tries from
- Returns
tries used by the gazetteer
- Return type
Dict[str, Trie]
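To make the `Dict[str, Trie]` return shape concrete, here is a rough sketch that groups entities into one trie per string key. Everything in it is an assumption for illustration: the `Entity` dataclass with `name` and `label` fields, the use of the label as the dictionary key, whitespace tokenization, and `SimpleTrie`, which is a plain-Python stand-in rather than `skweak.gazetteers.Trie`.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entity:
    """Hypothetical stand-in for eyes.data.Entity."""
    name: str
    label: str


class SimpleTrie:
    """Minimal token trie; a stand-in for skweak.gazetteers.Trie."""

    def __init__(self) -> None:
        self.root: dict = {}

    def add(self, tokens: List[str]) -> None:
        node = self.root
        for token in tokens:
            node = node.setdefault(token, {})
        node[None] = True  # marks the end of a complete entry

    def __contains__(self, tokens: List[str]) -> bool:
        node = self.root
        for token in tokens:
            if token not in node:
                return False
            node = node[token]
        return None in node


def build_tries_sketch(entities: List[Entity]) -> Dict[str, SimpleTrie]:
    """Group entity names into one trie per label (assumed keying)."""
    tries: Dict[str, SimpleTrie] = {}
    for entity in entities:
        trie = tries.setdefault(entity.label, SimpleTrie())
        trie.add(entity.name.split())  # whitespace tokens, for illustration
    return tries


entities = [
    Entity("Taipei City", "LOC"),
    Entity("Tainan", "LOC"),
    Entity("TSMC", "ORG"),
]
tries = build_tries_sketch(entities)
```

Lookups are token sequences, so `["Taipei", "City"] in tries["LOC"]` holds while the bare prefix `["Taipei"]` does not match.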