Machine learning algorithms
Tags: machine-learning
- ml_monorepo Super-monorepo for machine learning and algorithmic trading
- CS50’s Introduction to Computer Science
Data engines
[computer-visions]
NLP
- Natural Language Toolkit NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum.
- Levenshtein a C extension module for fast computation of Levenshtein (edit) distance and edit sequence manipulation, string similarity, approximate median strings, and generally string averaging, string sequence and set similarity.
- python-string-similarity a library implementing different string similarity and distance measures. A dozen algorithms (including Levenshtein edit distance and siblings, Jaro-Winkler, Longest Common Subsequence, cosine similarity, etc.) are currently implemented.
- open-llms a list of open LLMs available for commercial use.
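
To make the string-distance entries above concrete, here is a minimal pure-Python sketch of the Levenshtein (edit) distance. It is not the API of either library listed above: the function names `levenshtein` and `similarity` are hypothetical, and the normalization by the longer string's length is just one common convention, assumed here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (free on a match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Map the distance to [0, 1]; 1.0 means identical (assumed convention)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

The dedicated libraries are the better choice for real workloads; this row-by-row version only shows what is being computed.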
[graphs]
[knowledge-graphs]
[evolution-methods]
[reinforcement-learning]
Boosting
Recommendation systems
Preprocessing
- scikit-lego custom scikit-learn transformers, metrics and models
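
As a rough illustration of what a "custom transformer" means here, the sketch below follows the scikit-learn `fit` / `transform` convention (`fit` returns `self`, learned attributes end in an underscore) without importing scikit-learn or scikit-lego. The class `ColumnStandardizer` is hypothetical and is not part of either library; a real transformer would normally subclass scikit-learn's `BaseEstimator` and `TransformerMixin`.

```python
class ColumnStandardizer:
    """Hypothetical transformer: scale each column to zero mean, unit variance."""

    def fit(self, X, y=None):
        # X is a list of equal-length numeric rows; y is accepted but ignored.
        n = len(X)
        columns = list(zip(*X))
        self.means_ = [sum(col) / n for col in columns]
        # A constant column has std 0.0; fall back to 1.0 to avoid dividing by zero.
        self.stds_ = [
            (sum((v - m) ** 2 for v in col) / n) ** 0.5 or 1.0
            for col, m in zip(columns, self.means_)
        ]
        return self

    def transform(self, X):
        # Apply the statistics learned in fit to every row.
        return [
            [(v - m) / s for v, m, s in zip(row, self.means_, self.stds_)]
            for row in X
        ]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)
```

Keeping the learned state in `fit` and applying it in `transform` is what lets such a step be fitted on training data and reused unchanged on new data.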
Mining
- siuba is a tool for concise, flexible data-analysis over multiple data sources. It currently supports pandas DataFrames and SQL tables.
Experiment pipelines and deployments
- sacred is a tool to help you configure, organize, log and reproduce experiments
- kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code
- MLflow is an open source platform for managing the end-to-end machine learning lifecycle
- AutoDeploy allows configuration-based MLOps
- spacy train, manage, package, and deploy NLP model pipelines
- MLFlow Machine Learning Lifecycle Platform
- KubeFlow the cloud-native platform for machine learning operations - pipelines, training and deployment
- RayWorkflow provides a simple, universal API for building distributed applications, docs
- seldon-core converts your ML models (Tensorflow, Pytorch, H2o, etc.) or language wrappers (Python, Java, etc.) into production REST/GRPC microservices.
- [jina] a framework for working with ML models
- [docarray] is a library for nested, unstructured data in transit, including text, image, audio, video, 3D mesh, etc.
Neural networks
Spiking NN
- SnnTorch Deep and online learning with spiking neural networks in Python
Ready engines and models
- EasyOCR Ready-to-use OCR with 80+ supported languages and all popular writing scripts including: Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
- Face Recognition Recognize and manipulate faces from Python or from the command line with the world’s simplest face recognition library.
- Weights and Biases Developer tools for machine learning, with a GUI
Datasets
- Recipe1M+: A Dataset for Learning Cross-Modal Embeddings for Cooking Recipes and Food Images
- Russian National Corpus (Национальный корпус русского языка), commercial license
- pencorpora.org Open Corpus (OpenCorpora)