Hugging Face Transformers has been built by, with, and for the community. Reaching 100k stars on GitHub is a testament to ML’s reach and the community’s will to innovate and contribute. To celebrate, we highlight 100 incredible projects in the vicinity of transformers.
This page features an impressive array of projects built upon the foundation of Transformers. Transformers is more than just a platform for utilizing pre-trained models; it represents a vibrant community of projects centered around it and the Hugging Face Hub. Our aim with Transformers is to empower a diverse group of individuals including developers, researchers, educators, students, and engineers to bring their dream projects to fruition.
In this compilation, we highlight a range of influential projects that have advanced the field. As we celebrate the community reaching 100k stars, we feature 100 such projects. The list remains open to contributions: if there’s a project you believe merits inclusion but isn’t listed, we invite you to submit a PR to add it.
gpt4all
gpt4all is an ecosystem of open-source chatbots trained on large collections of clean assistant data, including code, stories, and dialogue. The project provides open-source large language models, based on models such as LLaMA and GPT-J, trained for assistant-style interaction.
Keywords: Open-source, LLaMA, GPT-J, instruction, assistant
recommenders
This repository serves as a resource for building recommendation systems, offering a collection of examples and best practices encapsulated in Jupyter notebooks. It covers various critical aspects necessary for developing effective recommendation systems: data preparation, modeling, evaluation, model selection and optimization, and operationalization.
Keywords: Recommender systems, AzureML
lama-cleaner
An image inpainting tool powered by Stable Diffusion, lama-cleaner enables users to remove unwanted objects, defects, or individuals from photos, or to modify and replace elements within images.
Keywords: inpainting, SD, Stable Diffusion
flair
FLAIR is a dynamic PyTorch NLP framework, supporting a variety of crucial tasks such as NER, sentiment analysis, part-of-speech tagging, and generating text and document embeddings.
Keywords: NLP, text embedding, document embedding, biomedical, NER, PoS, sentiment-analysis
mindsdb
MindsDB is an accessible, low-code ML platform that seamlessly integrates various ML frameworks into the data stack through “AI Tables”. This integration simplifies the incorporation of AI into applications, making it accessible to a broad spectrum of developers.
Keywords: Database, low-code, AI table
langchain
LangChain is designed to aid in the development of applications that combine LLMs with other knowledge sources. The library lets you chain calls together, so that the output of one step feeds the next across a sequence of tools.
Keywords: LLMs, Large Language Models, Agents, Chains
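The idea of chaining can be illustrated without any library at all. The following is a minimal sketch in plain Python, not LangChain's API: the `chain` helper and the `retrieve`, `build_prompt`, and `fake_llm` steps are hypothetical names invented for this example, and `fake_llm` is a stand-in that simply echoes its context rather than calling a model.

```python
from functools import reduce


def chain(*steps):
    """Compose steps left to right: each step receives the previous step's output."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


# Hypothetical steps, for illustration only.
def retrieve(question):
    """Look up context for the question in a tiny in-memory knowledge source."""
    knowledge = {"capital of france": "Paris is the capital of France."}
    key = question.lower().rstrip("?")
    return {"question": question, "context": knowledge.get(key, "")}


def build_prompt(state):
    """Turn the retrieved context and the question into a single prompt string."""
    return f"Context: {state['context']}\nQuestion: {state['question']}"


def fake_llm(prompt):
    """Stand-in for a model call: returns the context line of the prompt."""
    first_line = prompt.splitlines()[0]
    return first_line[len("Context: "):]


pipeline = chain(retrieve, build_prompt, fake_llm)
answer = pipeline("Capital of France?")
print(answer)
```

Each step only needs to agree with its neighbours on what it receives and returns, which is what makes it possible to swap a retrieval step or a model call without touching the rest of the sequence.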
LlamaIndex
LlamaIndex is a project offering a central interface to connect your LLMs with external data. It provides a variety of indices and retrieval mechanisms for different LLM tasks, enabling knowledge-augmented results.
Keywords: LLMs, Large Language Models, Data Retrieval, Indices, Knowledge Augmentation
ParlAI
ParlAI is a Python framework dedicated to the sharing, training, and testing of dialogue models, ranging from open-domain chitchat to task-oriented dialogue and visual question answering. It provides more than 100 datasets under a unified API, along with a large collection of pretrained models, a set of agents, and several integrations.
Keywords: Dialogue, Chatbots, VQA, Datasets, Agents
sentence-transformers
This framework offers a straightforward approach to computing dense vector representations for sentences, paragraphs, and images. The models, based on transformer networks like BERT, RoBERTa, and XLM-RoBERTa, deliver state-of-the-art performance in various tasks. They embed text in a vector space where similar content is closely aligned, facilitating efficient discovery using cosine similarity.
Keywords: Dense vector representations, Text embeddings, Sentence embeddings
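To make the cosine-similarity lookup concrete, here is a minimal sketch in plain Python. It does not use the sentence-transformers API; the three-dimensional vectors are made-up stand-ins for real embeddings (which typically have hundreds of dimensions), and `cosine_similarity` and `most_similar` are helper names introduced only for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, corpus):
    """Return (index, score) of the corpus vector with the highest cosine similarity."""
    scores = [cosine_similarity(query, vector) for vector in corpus]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Made-up "embeddings" for illustration; a real model would produce these from text.
corpus = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
query = [0.9, 0.1, 0.0]

best_index, best_score = most_similar(query, corpus)
print(best_index, round(best_score, 3))
```

Because cosine similarity depends only on direction, not magnitude, two texts can be judged similar even when their embedding vectors differ in length.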
ludwig
Ludwig is a declarative machine learning framework that simplifies the definition of machine learning pipelines through a straightforward, data-driven configuration system. It caters to a wide range of AI tasks and provides training, prediction, and evaluation scripts, as well as a programmatic API.
Keywords: Declarative, Data-driven, ML Framework
InvokeAI
InvokeAI is a tool for professionals, artists, and enthusiasts, serving as an engine for Stable Diffusion models. It makes these models available through both a CLI and a WebUI.
Keywords: Stable-Diffusion, WebUI, CLI
PaddleNLP
PaddleNLP is an efficient and powerful NLP library, particularly tailored for the Chinese language. It supports multiple pre-trained models and a wide array of NLP tasks, from research to industrial applications.
Keywords: NLP, Chinese, Research, Industry
stanza
Developed by the Stanford NLP Group, this official Python library supports running various sophisticated natural language processing tools for over 60 languages and facilitates access to Stanford CoreNLP software from Python.
Keywords: NLP, Multilingual, CoreNLP
DeepPavlov
DeepPavlov is an open-source conversational AI library designed for developing production-ready chatbots and complex conversational systems, as well as for research in NLP and dialog systems.
Keywords: Conversational, Chatbot, Dialog
alpaca-lora
Alpaca-lora provides code to reproduce Stanford’s Alpaca results using low-rank adaptation (LoRA). The repository includes scripts for training (fine-tuning) and generation.
Keywords: LoRA, Parameter-efficient fine-tuning
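As a rough illustration of what low-rank adaptation does, the sketch below applies the LoRA update W' = W + (alpha / r) · B·A to a tiny weight matrix in plain Python. This is not code from the alpaca-lora repository; the helper names (`matmul`, `lora_effective_weight`) and the toy numbers are assumptions made for the example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, b, a, alpha, r):
    """Return W + (alpha / r) * (B @ A). W is left unmodified (it stays frozen)."""
    delta = matmul(b, a)
    scale = alpha / r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy example: a frozen 3x3 weight and a rank-1 adapter (r = 1).
w = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
b = [[1.0], [0.0], [2.0]]   # shape 3 x r, trainable
a = [[0.5, 0.0, 1.0]]       # shape r x 3, trainable

w_eff = lora_effective_weight(w, b, a, alpha=2.0, r=1)

# The adapter trains 3*r + r*3 values instead of all 3*3 entries of W.
trainable = len(b) * len(b[0]) + len(a) * len(a[0])
print(w_eff, trainable)
```

At this toy size the saving is small (6 trainable values instead of 9), but for a d × d weight the adapter has 2·d·r parameters against d², which is what makes the approach parameter-efficient when r is much smaller than d.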
imagen-pytorch
An open-source PyTorch implementation of Imagen, Google’s closed-source text-to-image neural network, which was reported to outperform DALL-E 2 in text-to-image synthesis.
Keywords: Imagen, Text-to-image
adapter-transformers
Adapter-transformers extends Hugging Face’s Transformers library by integrating adapters into state-of-the-art language models through AdapterHub, a repository of pre-trained adapter modules. It serves as a drop-in replacement for transformers and is regularly updated to track the latest developments in that library.
Keywords: Adapters, LoRA, Parameter-efficient fine-tuning, Hub
NeMo
NVIDIA’s NeMo is a conversational AI toolkit designed for researchers focusing on automatic speech recognition (ASR), text-to-speech synthesis (TTS), large language models (LLMs), and NLP. Its primary goal is to facilitate research by making it easier to repurpose existing work (code and pretrained models) and create new models.
Keywords: Conversational, ASR, TTS, LLMs, NLP
Runhouse
Runhouse enables the transmission of code and data to any compute or data infrastructure, all in Python, allowing seamless interaction with these resources from your existing code and environment. Think of it as an extension to your Python interpreter, empowering it to utilize remote machines or manage remote data.
Keywords: MLOps, Infrastructure, Data storage, Modeling
MONAI
MONAI, a PyTorch-based, open-source framework, is focused on deep learning in healthcare imaging and is a part of the PyTorch Ecosystem. Its goals include fostering a collaborative community among academic, industrial, and clinical researchers, creating cutting-edge end-to-end training workflows for healthcare imaging, and providing researchers with an optimized and standardized approach to developing and evaluating deep learning models.
Keywords: Healthcare imaging, Training, Evaluation
simpletransformers
Simple Transformers enables rapid training and evaluation of Transformer models, requiring only three lines of code for initialization, training, and evaluation. It supports a diverse range of NLP tasks.
Keywords: Framework, simplicity, NLP
JARVIS
JARVIS strives to integrate LLMs like GPT-4 with the broader open-source ML community, leveraging up to 60 downstream models to execute tasks identified by the LLM.
Keywords: LLM, Agents, HF Hub
transformers.js
transformers.js is a JavaScript library designed to run models from transformers directly within the browser.
Keywords: Transformers, JavaScript, browser
bumblebee
Bumblebee offers pre-trained Neural Network models atop Axon, a neural networks library for the Elixir language. It integrates with 🤗 Models, enabling effortless download and execution of Machine Learning tasks with minimal code.
Keywords: Elixir, Axon
argilla
Argilla is an open-source platform providing advanced NLP labeling, monitoring, and workspaces. It is compatible with various open-source ecosystems like Hugging Face, Stanza, FLAIR, and more.
Keywords: NLP, Labeling, Monitoring, Workspaces
haystack
Haystack is an open-source NLP framework enabling interaction with data using Transformer models and LLMs. It offers production-ready tools for quickly building complex decision-making, question answering, semantic search, text generation applications, and more.
Keywords: NLP, Framework, LLM
spaCy
spaCy is a library for advanced Natural Language Processing in Python and Cython, built on the latest research and designed from the outset for real-world applications. It supports transformer models through the separate spacy-transformers package.
Keywords: NLP, Framework
speechbrain
SpeechBrain is an all-in-one conversational AI toolkit based on PyTorch. Its objective is to create a flexible, user-friendly toolkit for developing state-of-the-art speech technologies, encompassing speech recognition, speaker recognition, speech enhancement, speech separation, language identification, multi-microphone signal processing, and more.
Keywords: Conversational, Speech
skorch
Skorch is a scikit-learn compatible neural network library that wraps PyTorch. It supports models from transformers and tokenizers from tokenizers.
Keywords: Scikit-Learn, PyTorch
bertviz
BertViz is an interactive tool for visualizing attention in Transformer language models such as BERT, GPT2, or T5. It runs inside a Jupyter or Colab notebook through a simple Python API and is compatible with most Hugging Face models.
Keywords: Visualization, Transformers
mesh-transformer-jax

