Machine learning algorithms

Tags: machine-learning

Data engines

[computer-visions]

NLP

  • Natural Language Toolkit (NLTK) is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum.
  • Levenshtein is a C extension module for fast computation of Levenshtein (edit) distance and edit sequence manipulation, string similarity, approximate median strings and string averaging in general, and string sequence and set similarity.
  • python-string-similarity is a library implementing different string similarity and distance measures. A dozen algorithms (including Levenshtein edit distance and its siblings, Jaro-Winkler, Longest Common Subsequence, cosine similarity, etc.) are currently implemented.
  • open-llms a list of open LLMs available for commercial use.
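The edit distance that the Levenshtein and python-string-similarity entries refer to can be sketched in pure Python. This is a minimal illustration of the classic dynamic-programming formulation, not the API of either library; the function names `levenshtein` and `normalized_similarity` are assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b; insert, delete, substitute each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row to save memory

    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep on a match)
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Map the distance to [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
    print(normalized_similarity("flaw", "lawn"))  # 0.5
```

The pure Python version runs in O(len(a) * len(b)) time, which is the main reason a C extension is attractive for large batches of comparisons.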

[graphs]

[knowledge-graphs]

[evolution-methods]

[reinforcement-learning]

Boosting

Recommendation systems

Preprocessing

  • scikit-lego custom scikit-learn transformers, metrics and models

Mining

  • siuba is a tool for concise, flexible data-analysis over multiple data sources. It currently supports pandas DataFrames and SQL tables.

Experiment pipelines and deployments

  • sacred is a tool to help you configure, organize, log and reproduce experiments
  • kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code
  • MLflow is an open source platform for managing the end-to-end machine learning lifecycle
  • AutoDeploy allows configuration-based MLOps
  • spaCy train, manage, package, and deploy NLP model pipelines
  • MLflow Machine Learning Lifecycle Platform
  • Kubeflow the cloud-native platform for machine learning operations: pipelines, training and deployment
  • RayWorkflow provides a simple, universal API for building distributed applications, docs
  • seldon-core converts your ML models (Tensorflow, Pytorch, H2o, etc.) or language wrappers (Python, Java, etc.) into production REST/GRPC microservices.
  • [jina] A framework for working with ML models
  • [docarray] is a library for nested, unstructured data in transit, including text, image, audio, video, 3D mesh, etc.

Neural networks

Spiking NN

  • SnnTorch Deep and online learning with spiking neural networks in Python

Ready engines and models

  • EasyOCR Ready-to-use OCR with 80+ supported languages and all popular writing scripts including: Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
  • Face Recognition Recognize and manipulate faces from Python or from the command line with the world’s simplest face recognition library.
  • Weights and Biases Developer tools for machine learning, with a GUI

Datasets