TibyrIA v2.1

Technical Dashboard - Validation and Metrics

First model specialized in anti-LGBTQIA+ hate speech in Brazilian Portuguese

Recall
92.86%
Detects 93 out of every 100 hate cases
Accuracy
80.95%
Share of all comments classified correctly
F1-Score
72.63%
Precision/recall balance
Precision
59.63%
Share of comments flagged as hate that actually are
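The four headline numbers are internally consistent: F1 is the harmonic mean of precision and recall, so it can be recomputed from the other two cards. A minimal check, using the values as reported on this dashboard:

```python
# Recompute F1 from the reported precision and recall.
precision = 0.5963  # share of flagged comments that are truly hate
recall = 0.9286     # share of true hate cases the model catches

f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
print(f"{f1:.2%}")  # ~72.6%, matching the reported 72.63% up to input rounding
```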

🔬 Model evolution: From BERT to TibyrIA

The technical journey

1. BERT (Google, 2018)
BERT (Bidirectional Encoder Representations from Transformers) was created by Google as a general-purpose language model, trained on English. It revolutionized natural language processing by understanding bidirectional context, but was developed for English and does not capture Brazilian Portuguese specifics.

2. Tupi-BERT (FpOliveira, Brazil — now a partner of Código Não Binário)
Brazilian researchers adapted BERT for Brazilian Portuguese, creating Tupi-BERT. This model was pre-trained specifically on Brazilian Portuguese and received a "warm-start" in hate speech detection — i.e., it already had some knowledge of hate speech before our fine-tuning. This makes it better suited for content moderation tasks in Portuguese.

3. TibyrIA v2.1 (Código Não Binário)
We performed specialized fine-tuning of Tupi-BERT on our unique anti-LGBTQIA+ hate dataset, training the model on 1,891 comments manually annotated by our community and using Focal Loss to handle class imbalance. The result is a model that not only understands Brazilian Portuguese but specifically recognizes the nuances of LGBT+ phobia (transphobia, lesbophobia, homophobia, biphobia) in its intersections with racism, misogyny, and other power structures.
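Focal Loss addresses class imbalance by down-weighting easy, already-well-classified examples so that hard cases (typically the minority hate class) dominate the gradient. A pure-Python sketch of the binary form with the stated α=0.75, γ=2.0; the convention that α weights the positive class is an assumption about the exact setup used:

```python
import math

def binary_focal_loss(p, y, alpha=0.75, gamma=2.0):
    """Binary focal loss for a single example (illustrative sketch).

    p: predicted probability of the positive (hate) class
    y: true label, 1 = hate, 0 = not hate
    alpha weights the positive class (convention assumed here);
    gamma down-weights easy examples. With alpha=1.0 and gamma=0.0
    this reduces to plain cross-entropy on the positive class.
    """
    p_t = p if y == 1 else 1.0 - p            # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confidently correct example contributes almost nothing to the loss...
easy = binary_focal_loss(0.95, 1)
# ...while a misclassified hate example keeps a large loss, so training
# focuses on the rare, hard cases instead of the abundant easy ones.
hard = binary_focal_loss(0.10, 1)
print(easy, hard)  # hard >> easy
```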

Why this architecture?
Small language models (SLMs) like Tupi-BERT (~110M parameters, 12 layers) are better suited to community use: they require less infrastructure, run faster, can be hosted on your own servers, and do not depend on Big Tech. TibyrIA shows that small, specialized, contextualized models can outperform generic giants on specific tasks.

📈 Model evolution

v1: 5% recall (first iteration)
v2: 67% recall (significant improvement)
v2.1: 92.86% recall (current validated version)

⚡ Development in 1 month

3 full iterations with validated improvement: from 5% to 92.86% recall

📊 Performance metrics

Decision threshold
0.30 (tuned for maximum recall)
Training base
1,891 manually annotated comments
Validation
Full dataset: 12,102 comments
Annotation rate
1,891 of 12,102 comments annotated (15.6%)
Interpretation
With 92.86% recall, about 7 out of every 100 hate cases may be missed
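Lowering the decision threshold from the default 0.50 to 0.30 is what pushes recall up: any comment whose predicted hate probability exceeds 0.30 is flagged, so borderline cases are caught at the cost of more false positives (which is why precision sits at 59.63%). A toy illustration; the scores below are invented for demonstration:

```python
# Toy (invented) scores: predicted hate probability and true label (1 = hate).
predictions = [(0.92, 1), (0.41, 1), (0.35, 1), (0.22, 1),
               (0.45, 0), (0.33, 0), (0.10, 0), (0.05, 0)]

def recall_at(threshold):
    """Share of true hate cases flagged at a given probability cutoff."""
    tp = sum(1 for p, y in predictions if y == 1 and p >= threshold)
    positives = sum(1 for _, y in predictions if y == 1)
    return tp / positives

print(recall_at(0.50))  # 0.25 -- only the obvious case is caught
print(recall_at(0.30))  # 0.75 -- the lower cutoff recovers borderline hate
```

Note that at 0.30 two non-hate comments (0.45, 0.33) are also flagged, mirroring the precision/recall trade-off in the dashboard metrics.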

📚 Dataset

12,102 comments collected

First dataset specialized in anti-LGBTQIA+ hate in Brazilian Portuguese

Collected from Instagram, TikTok and YouTube between May–August 2024

1,891 comments manually annotated (15.6% of total)

Instagram
2,098 comments (17.3%)
TikTok
6,271 comments (51.8%)
YouTube
3,733 comments (30.9%)
License
CC BY-NC-SA 4.0 (open and replicable)
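The per-platform counts and the annotation rate above can be checked against the 12,102 total with simple arithmetic:

```python
# Per-platform counts as reported on this page.
total = 2098 + 6271 + 3733      # Instagram + TikTok + YouTube
annotated = 1891

print(total)                              # 12102
print(round(2098 / total * 100, 1))       # 17.3 (Instagram)
print(round(6271 / total * 100, 1))       # 51.8 (TikTok)
print(round(3733 / total * 100, 1))       # 30.8 (YouTube; the page shows 30.9)
print(round(annotated / total * 100, 1))  # 15.6 (annotation rate)
```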

⚙️ Technical architecture

Base model
Tupi-BERT (FpOliveira/tupi-bert-base-portuguese-cased)
Architecture
BERT-base (~110M parameters, 12 layers)
Fine-tuning
4 epochs, Learning Rate 1e-5, Focal Loss (α=0.75, γ=2.0)
Training
MacBook M2, training in 2–4 hours
Deploy
Hugging Face Spaces (ZeroGPU), inference <100ms
Sovereignty
Brazilian SLM, CC-BY-NC-SA-4.0 license, no Big Tech dependency

🔗 Links and repositories

🌍 Impact and expansion

Presence on Hugging Face

Since October 2025 (5 months), our AI solutions have been live on huggingface.co/Veronyka.

| Resource | Statistic | Link |
| --- | --- | --- |
| Dataset base-dados-odio-lgbtqia | 514 downloads | Veronyka/base-dados-odio-lgbtqia |
| Model tybyria-v2.1 | 297 downloads | Veronyka/tybyria-v2.1 |
| Space radar-social-lgbtqia-v2.1 | 443 views | radar-social-lgbtqia-v2.1 |
| Space radar-legislativo-lgbtqia-v2.1 | 301 views | radar-legislativo-lgbtqia-v2.1 |
| Space radar-social-lgbtqia-v2 | 63 views | radar-social-lgbtqia-v2 |
| Space radar-social-lgbtqia-v1 | 448 views | radar-social-lgbtqia-v1 |
| radar-social-lgbtqia | 66,585 views (see note) | Veronyka/radar-social-lgbtqia |

Note: The radar-social-lgbtqia figure may include automated traffic and system requests, not just unique users. Overall, the numbers indicate sustained public presence and recurring use of the tools.

Real-world use and expansion

London: Classes using the tool

São Paulo: Classes using the tool

Turkey: Conversations in progress

Netherlands: Conversations in progress