eyes.ml

Eyes ML module

eyes.ml.spacy

Eyes ML spaCy module

eyes.ml.spacy.transform_ptt_post_to_spacy(post: eyes.db.ptt.PttPost, nlp: spacy.language.Language, disable: Iterable[str] = ['tok2vec']) → eyes.data.spacy.SpacyPttPost

Transform a PTT post into a serialized spaCy doc binary

Parameters
  • post (ptt.PttPost) – PTT post

  • nlp (Language) – spaCy language model

  • disable (Iterable[str]) – names of pipeline components to disable

Returns

spacy.SpacyPttPost

eyes.ml.spacy.binary_to_doc(binary: bytes, nlp: spacy.language.Language) → spacy.tokens.doc.Doc

Transform serialized bytes into a spaCy doc

Parameters
  • binary (bytes) – serialized spaCy doc bytes

  • nlp (Language) – spaCy language model

Returns

spaCy doc

Return type

Doc

eyes.ml.spacy.transform_ptt_comment(comment: eyes.data.spacy.SpacyPttComment, nlp: spacy.language.Language) → Dict

Transform a PTT comment into a decoded dictionary

Parameters
  • comment (SpacyPttComment) – spaCy-encoded PTT comment

  • nlp (Language) – spaCy language model

Returns

decoded dictionary

Return type

Dict

eyes.ml.spacy.transform_ptt_post(post: eyes.data.spacy.SpacyPttPost, nlp: spacy.language.Language) → Dict

Transform a PTT post into a decoded dictionary

Parameters
  • post (SpacyPttPost) – spaCy-encoded PTT post

  • nlp (Language) – spaCy language model

Returns

decoded dictionary

Return type

Dict

eyes.ml.spacy.build_docs(nlp: spacy.language.Language, sess: sqlalchemy.orm.session.Session, limit: int = 100000, batch_size: int = 32) → Iterable[spacy.tokens.doc.Doc]

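Build spaCy docs

Parameters
  • nlp (Language) – spaCy language model

  • sess (Session) – SQLAlchemy session

  • limit (int) – maximum number of docs (default 100000)

  • batch_size (int) – batch size (default 32)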

Returns

spaCy docs

Return type

Iterable[Doc]
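
How limit and batch_size interact is not documented here. One plausible reading is that at most limit texts are read and then processed in groups of batch_size (spaCy's Language.pipe accepts a batch_size argument, but whether build_docs forwards it is an assumption). The sketch below illustrates only that batching pattern with plain strings; iter_batches is a hypothetical helper, not part of eyes.ml.spacy.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def iter_batches(texts: Iterable[str], limit: int, batch_size: int) -> Iterator[List[str]]:
    """Yield lists of at most batch_size texts, stopping after limit texts in total.

    Hypothetical helper: it only mirrors the limit/batch_size parameters
    of build_docs and does not touch a database or a spaCy pipeline.
    """
    # Cap the total number of items first, then slice the capped stream into batches.
    capped = islice(texts, limit)
    while True:
        batch = list(islice(capped, batch_size))
        if not batch:
            return
        yield batch


# Ten fake posts, capped at seven, in batches of three.
batches = list(iter_batches((f"post {i}" for i in range(10)), limit=7, batch_size=3))
print([len(b) for b in batches])  # [3, 3, 1]
```

Capping before batching keeps the last batch short instead of overshooting the limit.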

eyes.ml.lf

Eyes label functions module

eyes.ml.lf.build_tries(entities: List[eyes.data.Entity]) → Dict[str, skweak.gazetteers.Trie]

Build gazetteer tries

Parameters

entities (List[Entity]) – entities

Returns

tries used in gazetteer

Return type

Dict[str, Trie]
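
To illustrate the Dict[str, Trie] return shape, here is a minimal sketch of grouping entity names into one trie per key. It makes several assumptions that the documentation above does not confirm: entities are given as (label, name) pairs, the dictionary is keyed by label, and names are tokenized on whitespace. SimpleTrie and build_simple_tries are hypothetical stand-ins and do not reproduce the skweak.gazetteers.Trie API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


class SimpleTrie:
    """Hypothetical token-level trie (not skweak.gazetteers.Trie)."""

    def __init__(self) -> None:
        self.children: Dict[str, "SimpleTrie"] = {}
        self.is_end = False  # True if a complete entity name ends at this node

    def add(self, tokens: List[str]) -> None:
        """Insert one tokenized entity name."""
        node = self
        for token in tokens:
            node = node.children.setdefault(token, SimpleTrie())
        node.is_end = True

    def __contains__(self, tokens: List[str]) -> bool:
        """Return True only for complete names, not for prefixes."""
        node = self
        for token in tokens:
            if token not in node.children:
                return False
            node = node.children[token]
        return node.is_end


def build_simple_tries(entities: Iterable[Tuple[str, str]]) -> Dict[str, SimpleTrie]:
    """Group (label, name) pairs into one trie per label (assumed keying)."""
    tries: Dict[str, SimpleTrie] = defaultdict(SimpleTrie)
    for label, name in entities:
        tries[label].add(name.split())  # whitespace tokenization is an assumption
    return dict(tries)


tries = build_simple_tries([("ORG", "Acme Corp"), ("PERSON", "Ada Lovelace")])
print(["Acme", "Corp"] in tries["ORG"])  # True
print(["Acme"] in tries["ORG"])          # False: only a prefix of a stored name
```

Storing token sequences rather than raw strings lets a gazetteer match multi-token names against a tokenized document one token at a time.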

class eyes.ml.lf.NERAnnotator(sess: sqlalchemy.orm.session.Session)

NER annotator for spaCy documents

add_all()

Add all annotators

add_gazetteers()

Add gazetteers to the annotator