Technical Dashboard - Validation and Metrics
First model specialized in anti-LGBTQIA+ hate speech in Brazilian Portuguese
1. BERT (Google, 2018)
BERT (Bidirectional Encoder Representations from Transformers) was created by Google as a general-purpose language model. It revolutionized natural language processing by reading context in both directions, but the original model was trained on English text and does not capture the specifics of Brazilian Portuguese.
2. Tupi-BERT (FpOliveira, Brazil — now a partner of Código Não Binário)
Brazilian researchers adapted BERT for Brazilian Portuguese, creating Tupi-BERT. This model was pre-trained specifically on Brazilian Portuguese and received a "warm start" in hate speech detection: it had already learned to recognize some hate speech before our fine-tuning. This makes it better suited for content moderation tasks in Portuguese.
3. TibyrIA v2.1 (Código Não Binário)
We fine-tuned Tupi-BERT on our unique anti-LGBTQIA+ hate speech dataset: 1,891 comments manually annotated by our community, with Focal Loss used to handle class imbalance. The result is a model that not only understands Brazilian Portuguese but also recognizes the nuances of LGBT+ phobia (transphobia, lesbophobia, homophobia, biphobia) and its intersections with racism, misogyny, and other power structures.
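To illustrate why Focal Loss helps with class imbalance, here is a minimal sketch of the binary form in plain Python. It is not the project's training code: the function names are hypothetical, and the defaults `gamma=2.0` and `alpha=0.25` are the commonly cited values from the original Focal Loss paper, not settings confirmed for TibyrIA.

```python
import math


def cross_entropy(p, y):
    """Standard binary cross-entropy for one example.

    p: predicted probability of the positive (hate) class, 0 < p < 1.
    y: true label, 1 (hate) or 0 (not hate).
    """
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(p_t)


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one example.

    gamma and alpha are assumed defaults, not the project's values.
    The factor (1 - p_t) ** gamma shrinks the loss of examples the model
    already classifies confidently, so hard examples dominate training.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha  # per-class weighting
    return alpha_t * (1.0 - p_t) ** gamma * cross_entropy(p, y)


# An easy positive (p = 0.95) contributes far less than a hard one (p = 0.30).
easy = binary_focal_loss(0.95, 1)
hard = binary_focal_loss(0.30, 1)
print(f"easy example loss: {easy:.6f}")
print(f"hard example loss: {hard:.6f}")
```

With `gamma=0` and `alpha=1` the function reduces to ordinary cross-entropy, which is a quick way to check an implementation.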
Why this architecture?
Small language models (SLMs) like Tupi-BERT (~110M parameters, 12 layers) are better suited for community use: they require less infrastructure, run faster, can be hosted on your own servers, and do not depend on Big Tech. TibyrIA shows that specialized, contextualized models can outperform generic giants on specific tasks.
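The "~110M parameters, 12 layers" figure can be reproduced with a rough count for a BERT-base-sized encoder. The sketch below assumes the standard BERT-base dimensions (hidden size 768, feed-forward size 3072, 512 positions) and the original English BERT vocabulary of 30,522 tokens. Tupi-BERT's actual vocabulary size may differ, which would shift the total by a few million.

```python
def bert_encoder_params(vocab_size=30522, hidden=768, layers=12,
                        intermediate=3072, max_positions=512, type_vocab=2):
    """Approximate parameter count of a BERT-style encoder.

    All defaults are assumed BERT-base values, not figures published
    for Tupi-BERT or TibyrIA.
    """
    layer_norm = 2 * hidden  # scale + bias

    # Token, position and segment embeddings, followed by a LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm

    # Self-attention: query, key, value and output projections (weights + biases).
    attention = 4 * (hidden * hidden + hidden) + layer_norm

    # Feed-forward block: expand to `intermediate`, project back to `hidden`.
    feed_forward = ((hidden * intermediate + intermediate)
                    + (intermediate * hidden + hidden)
                    + layer_norm)

    # Pooler applied to the first token's representation.
    pooler = hidden * hidden + hidden

    return embeddings + layers * (attention + feed_forward) + pooler


total = bert_encoder_params()
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")
```

About 85M of these parameters sit in the 12 Transformer layers. Most of the rest is the embedding table, which is why the vocabulary size matters for the total.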
Three full iterations with validated improvement: recall rose from 5% to 92.86%
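Recall is the share of actual hate comments that the model catches. The sketch below shows the calculation. The counts are hypothetical, chosen only because they reproduce the reported percentages, and are not the project's confusion matrix.

```python
def recall(true_positives, false_negatives):
    """Recall = TP / (TP + FN): the fraction of real positives that were detected."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("recall is undefined when there are no positive examples")
    return true_positives / total


# Hypothetical counts, for illustration only.
early = recall(true_positives=1, false_negatives=19)   # 1 of 20 hate comments caught
later = recall(true_positives=13, false_negatives=1)   # 13 of 14 hate comments caught
print(f"{early:.2%} -> {later:.2%}")
```

For a moderation tool, low recall means hateful comments slip through unflagged, which is why this metric is the one tracked across iterations.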
First dataset specialized in anti-LGBTQIA+ hate in Brazilian Portuguese
Collected from Instagram, TikTok and YouTube between May and August 2024
1,891 comments manually annotated (15.6% of the total collected)
Since October 2025 (5 months), our AI solutions have been live on huggingface.co/Veronyka.
| Resource | Statistic | Link |
|---|---|---|
| Dataset base-dados-odio-lgbtqia | 514 downloads | Veronyka/base-dados-odio-lgbtqia |
| Model tybyria-v2.1 | 297 downloads | Veronyka/tybyria-v2.1 |
| Space radar-social-lgbtqia-v2.1 | 443 views | radar-social-lgbtqia-v2.1 |
| Space radar-legislativo-lgbtqia-v2.1 | 301 views | radar-legislativo-lgbtqia-v2.1 |
| Space radar-social-lgbtqia-v2 | 63 views | radar-social-lgbtqia-v2 |
| Space radar-social-lgbtqia-v1 | 448 views | radar-social-lgbtqia-v1 |
| radar-social-lgbtqia | 66,585 views (see note) | Veronyka/radar-social-lgbtqia |
Note: The radar-social-lgbtqia figure may include automated traffic and system requests, not just unique users. Overall, the numbers indicate sustained public presence and recurring use of the tools.
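For readers who want to try the published model, the following is a hedged sketch of loading it with the Hugging Face `transformers` library. It assumes the `Veronyka/tybyria-v2.1` repository is loadable as a standard text-classification checkpoint, which the text above does not confirm. The helper names, the `hate_label` parameter and the 0.5 threshold are hypothetical; check the model card for the actual label names.

```python
def load_classifier(model_id="Veronyka/tybyria-v2.1"):
    """Build a text-classification pipeline for the given Hub model.

    Assumes the repository holds a standard sequence-classification
    checkpoint. Requires `pip install transformers torch`.
    """
    from transformers import pipeline  # imported lazily: heavy dependency
    return pipeline("text-classification", model=model_id)


def indices_to_review(predictions, hate_label, threshold=0.5):
    """Return positions of comments predicted as hate above the threshold.

    predictions: list of {"label": str, "score": float} dicts, the shape
                 returned by a text-classification pipeline.
    hate_label:  name of the hate class (hypothetical; see the model card).
    """
    return [
        i for i, pred in enumerate(predictions)
        if pred["label"] == hate_label and pred["score"] >= threshold
    ]


# Example usage (downloads the model on first run):
#   classifier = load_classifier()
#   predictions = classifier(["primeiro comentário", "segundo comentário"])
#   flagged = indices_to_review(predictions, hate_label="<label from model card>")
```

Keeping the threshold as a parameter lets a community tune the balance between missed hate speech and false alarms for its own moderation workflow.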
London: Classes using the tool
São Paulo: Classes using the tool
Turkey: Conversations in progress
Netherlands: Conversations in progress