reddit-playbooks

r/MachineLearning


Beginners -> /r/mlquestions or /r/learnmachinelearning, AGI -> /r/singularity, career advice -> /r/cscareerquestions, datasets -> /r/datasets

  • Subscribers: 3M
  • Posts/day: 7.3
  • Subreddit age: 16.7y
  • Top post (past week): 295
  • Top post (past month): 625
  • Top post (past year): 1,605

r/MachineLearning Community Analysis

1. Data Sources & Methodology

  • Subreddit: r/MachineLearning (3,036,680 subscribers)
  • Total unique posts analyzed: 329 (after deduplication across 15 raw JSON files)
  • Date collected: 2026-04-10
  • Score range: 0 to 8,544
  • Median score: 219
  • Top 10 threshold: 3,510
  • Top 25 threshold: 2,619
  • Top 50 threshold: 1,847
  • Top 100 threshold: 1,238
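The dedup-and-threshold step above is mechanical. A minimal sketch, with toy data standing in for the 15 raw JSON files (the field names "id" and "score" are assumptions, not the actual schema):

```python
from statistics import median

# Toy stand-in for the raw JSON dumps: each file is a list of post dicts.
# Field names ("id", "score") are assumed, not the real export schema.
raw_files = [
    [{"id": "a1", "score": 8544}, {"id": "a2", "score": 219}],
    [{"id": "a2", "score": 219}, {"id": "a3", "score": 1609}],  # a2 is a duplicate
]

# Deduplicate on post id across files.
posts = {}
for batch in raw_files:
    for post in batch:
        posts[post["id"]] = post

# Sort scores descending; the "top-N threshold" is the Nth-highest score.
scores = sorted((p["score"] for p in posts.values()), reverse=True)
stats = {
    "unique_posts": len(scores),
    "median_score": median(scores),
    "top_n_threshold": {n: scores[n - 1] for n in (1, 2, 3)},
}
```

On the real 329-post dataset the same recipe yields the median (219) and top-10/25/50/100 thresholds listed above.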

Period breakdown:

Period   | Posts | Score Range | Median | Notes
All-time | 100   | 1,238-8,544 | 1,847  | Historical GAN/video demos from 2020-2023, classic drama (Siraj, LeCun), founding-era AMAs
Year     | 100   | 179-1,609   | 246    | Dominated by ICLR/NeurIPS review drama, LLM-era malaise, paper withdrawals, slop complaints
Month    | 100   | 20-415      | 52     | ICML 2026 rebuttal anxiety, arXiv independence news, TurboQuant threads
Week     | 63    | 0-223       | 8      | Raw pulse: ICML rebuttal questions, low-traction Projects, dead-on-arrival self-promos

Cross-subreddit score calibration: r/MachineLearning peaks at ~8,544 -- comparable to r/ClaudeAI (~8,084) but with a much lower all-time top-25 threshold (~2,619 vs. r/ClaudeAI ~3,000+). Despite being a 3M-subscriber giant (5x r/LocalLLaMA, 10x r/learnmachinelearning), scores are surprisingly compressed. A score of 300 is a good hit, 800+ is strong, 1,500+ is memorable, and 2,500+ is canonical. The median (219) is barely higher than r/learnpython's (~194) despite 15x the subscribers -- because the community heavily downvotes low-effort content and mods aggressively remove posts. Notably, the entire 2025-2026 dataset has NO post above 1,609 ([D] Got burned by an Apple ICLR paper). The 5,000+ viral hits are all 2019-2023 GAN/diffusion demo videos. The glory days of viral ML Twitter-bait are gone.

This is a content strategy guide, not a sociological study.


2. Subreddit Character

r/MachineLearning is the Hacker News of ML research -- a jury of skeptical, credentialed practitioners that has spent the last two years watching its own field collapse into slop and is now deeply, publicly grieving. It is not r/LocalLLaMA (enthusiasts tinkering with GGUF quants), not r/learnmachinelearning (students begging for roadmaps), and not r/OpenAI (consumers of Sam Altman takes). This is where people who've shipped NeurIPS papers, reviewed for ICLR, or actually trained a model from scratch hang out -- and they are tired.

Community identity: Graduate students, industry research scientists, professors, and senior MLEs. Comment sections name-drop OpenReview forum IDs, cite specific reviewer numbers, and dissect hyperparameter choices. The rules explicitly banish beginners ("Beginners -> /r/mlquestions or /r/learnmachinelearning"). The submit page warns: "Do NOT submit questions which are easily googled... Posts without appropriate tag (e.g: [R], [D], [P], [N])... Posts which lack technical detail." This is enforced ruthlessly by mods.

Product launches: Hostile to anything with a commercial smell. Rule 2 says "No Self-Promotion" and Rule 3 says "No Marketing Campaigns (SEO)" with "perpetually banned with all past posts and comments purged." The only "Shameless Self Promo" flair in the whole dataset appears exactly once ([P] The easiest way to process and tag video data, 1,693). Successful Project posts are either (a) visual demos of novel research (StyleGAN, diffusion, pose tracking) or (b) free open-source tools from known contributors (jsonathan's debuggers, OpenAssistant, cuML). A SaaS pitch, a "we just launched" announcement, or any post with a pricing page will be nuked within minutes.

Humor: Exists but must be wrapped in technical substance. [P] I trained a GAN to generate photorealistic fake penises (2,362) and [P] I trained a recurrent neural network trained to draw dick doodles (1,785) both worked because the method sections are legitimate. Pure shitposting dies. The community rewards researcher humor, not meme humor.

Technical level: The highest of any ML sub. Top posts casually reference Jacobians, MAP-Elites, surrogate gradients, KV caches, Hexagon NPU INT8 quantization, Fourier synthesis activation functions. Posts that use buzzwords without substance get eviscerated in comments (see: [R] The Gamechanger of Performer Attention Mechanism, 241 score but 0.85 ratio -- the title hype triggered the immune response).

Key cultural values (ranked):

  1. Reproducibility and code release -- The single loudest grievance across the 2024-2025 corpus. [D] Papers with no code (203), [D] Published paper uses hardcoded seed and collapsed model (288), [D] Got burned by an Apple ICLR paper (1,609). Publishing a paper without runnable code is treated as near-fraud.
  2. Anti-slop -- The community has a visceral, organized hatred of LLM-generated content. Can we stop these LLM posts and replies? [D] (258, 0.95), [D] Alarming amount of schizoid people being validated by LLMs (328). Posts suspected of being ChatGPT-written get flagged in comments and downvoted.
  3. Open science / anti-corporate -- [D] Our community must get serious about opposing OpenAI (3,060), [D] Does anybody else despise OpenAI? (1,538, 0.87 ratio -- controversial but loud). The Ian Goodfellow return-to-office post (1,847) got 202 comments because it was framed as a principled stand.
  4. Credentialism with a chip on its shoulder -- The community reveres real researchers (hardmaru, programmerChilli, Yann LeCun) but actively resents "top labs" gatekeeping. [D] I don't really trust papers out of "Top Labs" anymore (1,699), [D] Can we stop glazing big labs and universities? (302).
  5. Peer-review cynicism -- Nearly every 2025-2026 Discussion post is about conferences: NeurIPS 2025 Reviews mega-thread (237, 912 comments), NeurIPS 2025 Decisions (201, 1,008 comments), ICLR 2026 Reviews (188, 839 comments), ICML 2026 Review (125, 633 comments). These are the community's grief-processing rituals.

Enforcement mechanisms: The most aggressive of any ML sub. Rule 5 explicitly bans "No arXiv Links without Body Text" -- you cannot just post a paper link, you must write commentary. Rule 6 bans "No Low-Effort, Beginner Questions." The required flair system [R]/[D]/[P]/[N] is mandatory and posts without it are removed. A mod recently publicly quit (So long r/MachineLearning, 1,320) because API changes hurt their moderation workflow, and the quality visibly dropped -- the community itself now posts [D] Am I the only one noticing a drop in quality for this sub? (226). Community self-policing is aggressive: in comment sections, users will demand code links, cite reviewer numbers, and call out methodological flaws with receipts.

Mandatory posting format: Every title MUST be prefixed with a bracketed tag:

  • [R] = Research (paper/result)
  • [D] = Discussion (opinion, question, meta)
  • [P] = Project (code/demo/tool)
  • [N] = News (announcement, external link)

Combos like [R][P] or [D][R] are accepted. Posts without tags get removed or, if they slip through, get visibly lower engagement. The tag is load-bearing: a tool post tagged [D] will confuse readers; a controversy tagged [P] will look spammy. Choose deliberately.
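Because the tag is load-bearing, it is worth linting before you post. A minimal sketch (the regex and helper names are mine, not any official tooling):

```python
import re

# Accepts [R], [D], [P], [N], the long-form [Project], and combos like [R][P].
# A few historical posts put the tag at the end of the title, so this searches
# the whole string rather than requiring a prefix.
TAG_RE = re.compile(r"\[(?:R|D|P|N|Project)\]")

def find_tags(title: str) -> list[str]:
    """Return every bracketed tag found in the title, in order."""
    return TAG_RE.findall(title)

def has_valid_tag(title: str) -> bool:
    """True if the title carries at least one mandatory tag."""
    return bool(TAG_RE.search(title))
```

For example, `has_valid_tag("[R][P] We compress any BF16 model")` passes, while an untagged "roadmap" title fails and would be removal bait.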

How this differs from related subs: r/LocalLLaMA is for running models; r/MachineLearning is for understanding them. r/learnmachinelearning rewards sincerity and learning-in-public; r/MachineLearning punishes it. r/artificial and r/singularity are for speculation; r/MachineLearning demands citations. If you wouldn't publish something on arXiv or discuss it at a lab meeting, it doesn't belong here.


3. The All-Time Leaderboard

Median of full dataset: 219. Top-25 threshold: 2,619. Top-10 threshold: 3,510.

#  | Score | Flair       | Ratio | Comments | Format  | Title
1  | 8,544 | Project     | 0.99  | 198      | VIDEO   | [Project] From books to presentations in 10s with AR + ML
2  | 6,378 | Discussion  | 0.99  | 136      | VIDEO   | [D] 1993 Yann LeCun demo of world's first ConvNet for text
3  | 4,924 | Discussion  | 0.96  | 236      | IMAGE   | [D] This AI reveals time politicians stare at their phone
4  | 4,896 | Research    | 0.97  | 109      | VIDEO   | [R] First Order Motion Model applied to animate paintings
5  | 4,807 | News        | 0.97  | 230      | VIDEO   | [N] AI can turn old photos into moving Images
6  | 4,728 | Discussion  | 0.98  | 212      | IMAGE   | [D] Types of Machine Learning Papers (meme infographic)
7  | 4,272 | Project     | 0.95  | 168      | VIDEO   | [P] Robot that punishes me if I procrastinate
8  | 3,924 | Discussion  | 0.95  | 566      | TEXT    | [D] The ML community has a toxicity problem
9  | 3,725 | Project     | 0.99  | 174      | VIDEO   | [Project] Lucid Sonic Dreams: GAN art synced to music
10 | 3,510 | Project     | 0.98  | 111      | VIDEO   | [P] Using oil portraits + First Order Model to animate paintings
11 | 3,457 | Discussion  | 0.99  | 75       | VIDEO   | [D] CNN Visualization made with Unity 3D
12 | 3,272 | Project     | 0.99  | 65       | VIDEO   | [P] RL agent air-dribbles in Rocket League clone
13 | 3,079 | Research    | 0.93  | 212      | VIDEO   | [R] Speech-to-speech translation for unwritten language
14 | 3,060 | Discussion  | 0.95  | 448      | TEXT    | [D] Our community must get serious about opposing OpenAI
15 | 2,904 | Project     | 0.97  | 112      | IMAGE   | [P] CLI tool that explains errors using ChatGPT
16 | 2,870 | Project     | 0.98  | 60       | VIDEO   | [P] Draw/write with hand + webcam using DL
17 | 2,842 | Research    | 0.99  | 102      | VIDEO   | [R] Consistent Video Depth Estimation (SIGGRAPH 2020)
18 | 2,809 | Research    | 0.99  | 146      | VIDEO   | [R] RIFE: 15FPS to 60FPS frame interpolation
19 | 2,801 | Research    | 0.98  | 104      | IMAGE   | [R] Wolfenstein/Doom upscaled to realistic faces with PULSE
20 | 2,794 | Project     | 0.97  | 249      | GALLERY | [P] Anti-clickbait YouTube summaries via Instruct GPT
21 | 2,737 | Project     | 0.97  | 72       | IMAGE   | [P] AI Twitter bot draws people's dream jobs
22 | 2,683 | Discussion  | 0.97  | 311      | TEXT    | [D] A Super Harsh Guide to Machine Learning
23 | 2,650 | Discussion  | 0.97  | 92       | IMAGE   | [D] Types of Machine Learning Papers (reposted meme)
24 | 2,620 | Discussion  | 0.96  | 406      | IMAGE   | [D] An example of ML bias on Popular (locked post)
25 | 2,619 | Discusssion | 0.96  | 217      | TEXT    | Should r/ML join the reddit blackout?

Key observations: 21 of the top 25 are visual (VIDEO/IMAGE/GALLERY). The 4 TEXT posts are all meta-controversies: toxicity, opposing OpenAI, the Super Harsh Guide, and the blackout vote. There is zero "tool launch" content in the top 25 in the traditional sense -- the closest thing (command-line error tool, #15) is by jsonathan, a repeat creator, and it's a lightweight dev utility. Note the misspelled flair "Discusssion" appears twice in the dataset, including the sticky blackout post (#25).


4. Content Type Dominance at Scale

Flair                | Top 25 | Top 50 | Top 100 | All Posts | Avg Score (All) | Avg Ratio | Best Post
Project              | 9      | 21     | 41      | 90        | 1,061           | 0.90      | [Project] Books→presentations AR+ML (8,544)
Discussion           | 9      | 13     | 24      | 134       | 541             | 0.89      | [D] 1993 Yann LeCun ConvNet demo (6,378)
Research             | 5      | 13     | 27      | 80        | 758             | 0.89      | [R] First Order Motion Model (4,896)
News                 | 1      | 2      | 6       | 19        | 926             | 0.96      | [N] AI turns old photos to video (4,807)
Discusssion (typo)   | 1      | 1      | 2       | 2         | 1,963           | 0.96      | Reddit blackout vote (2,619)
(no flair)           | 0      | 0      | 1       | 3         | 1,070           | 0.96      | [Project] Texthero (1,475)
Shameless Self Promo | 0      | 1      | 1       | 1         | 1,693           | 0.97      | [P] Video data tagging (1,693)

Surprising finding #1: Discussion is the largest flair (134 posts, 41% of the dataset) but has the LOWEST avg score (541) and ties Research for the lowest avg ratio (0.89). This is a flair that dominates by volume but performs poorly per-post. The 2025-2026 year is flooded with angsty [D] threads about peer review, and most of them score 50-250. Only viral [D] posts hit big, and they're almost always visual memes or meta-drama.

Surprising finding #2: Project posts have the highest avg score (1,061) and the highest score ceiling (8,544), but also the second-highest volume (90 posts). Project isn't just the "launch flair" -- it's the flair that rewards high-production visual demonstrations of novel ML capabilities. 41 of the top 100 posts are [P].

Surprising finding #3: News posts have the best ratio (0.96) and punch above their weight -- 19 posts, avg 926. News is a low-volume, low-friction, steady-performer flair. If you have something factual to share (a release, a leak, a departure), News is safer than Discussion.

The typo "Discusssion" is the highest-performing single flair (avg 1,963) -- but only because both posts happened to be sticky mod announcements. Don't read into it.


5. Content Archetypes That Work

Archetype 1: The Viral Research Demo Video (score ceiling: 8,544)

  • Score range: 1,500-8,544
  • Examples:
    • [Project] From books to presentations in 10s with AR + ML (8,544) -- cyrildiagne
    • [D] 1993 Yann LeCun ConvNet for text recognition (6,378)
    • [R] First Order Motion Model applied to animate paintings (4,896)
    • [R] RIFE: 15FPS to 60FPS frame interpolation (2,809)
    • [Project] Lucid Sonic Dreams: GAN art synced to music (3,725)
  • The pattern: A 15-60 second hosted video (v.redd.it) showing a specific, legible ML capability doing something concrete. Always has a crossposts count in the double digits (these go viral outside the sub). No talking head. The demo IS the post -- the selftext is empty.
  • Why it matters: This is the ONLY archetype that breaks 5,000. If your ceiling is "viral," this is your format. But note: none of these posts are from 2024-2026. The era of pure-demo virality is over. The community has shifted toward drama and meta.

Archetype 2: The Meta-Drama / Community Soul-Search (score ceiling: 3,924)

  • Score range: 1,300-3,924
  • Examples:
    • [D] The machine learning community has a toxicity problem (3,924, 566 comments)
    • [D] Our community must get serious about opposing OpenAI (3,060, 448 comments)
    • [D] A Super Harsh Guide to Machine Learning (2,683, 311 comments)
    • [D] Siraj has a new paper: 'The Neural Qubit'. It's plagiarised (2,574, 451 comments)
    • [D] Got burned by an Apple ICLR paper (1,609) -- the top-scoring post of 2025
  • The pattern: A long-form TEXT post (500-2000 words) written in the first person, making a principled argument about the state of the field. Structured with numbered grievances or bold headers. Names names. Provides receipts (links, screenshots, openreview IDs). Comments section explodes with C/U ratios above 0.15.
  • Why it matters: This is the ONLY archetype still reliably breaking 1,500 in the post-2024 era. If you want current-day visibility, this is the play -- but you must have real evidence, not just vibes. The Apple ICLR whistleblower post worked because the author documented the bug in GitHub issues first.

Archetype 3: The Tool From A Known Contributor (score ceiling: 2,904)

  • Score range: 200-2,904
  • Examples:
    • [P] CLI tool that explains errors using ChatGPT (2,904) -- jsonathan
    • [P] I built a chatbot that lets you talk to any Github repository (1,699) -- jsonathan
    • [P] AppleNeuralHash2ONNX: Reverse-Engineered Apple NeuralHash (1,743) -- first reverse engineering
    • [P] OpenAssistant: World's largest open-source ChatGPT replication (1,276) -- ykilcher
    • [P] OpenEvolve: Open Source AlphaEvolve (216) -- asankhs
  • The pattern: Shipped open-source code (GitHub link in comments, not title -- the mod-safe way). Either (a) a reproduction of something famous, (b) a reverse-engineering of something proprietary, or (c) a minimal dev tool. jsonathan has 6 top-100 posts with an avg of 1,548 -- the canonical "known tool builder" profile.
  • Why it matters: You won't break 2,000 as a first-time poster. But repeat contributors establish reputation; once you have it, your tool posts get a visible uplift. Note that none of these include SaaS, pricing, or "sign up for waitlist" language. They are all self-hostable/MIT-licensed.

Archetype 4: The "Types of ML Papers" Infographic Meme (score ceiling: 4,728)

  • Score range: 2,413-4,728
  • Examples:
    • [D] Types of Machine Learning Papers (4,728) -- TheInsaneApp
    • [D] Types of Machine Learning Papers (2,650) -- repost by another user
    • [D] Types of Machine Learning Papers (2,413) -- yet another repost
    • [D] This AI reveals politicians staring at phones (4,924)
    • [D] Convolution Neural Network Visualization (Unity 3D) (3,457)
  • The pattern: A single-panel image (i.redd.it) that is either (a) a self-aware meme about ML research culture or (b) a polished visualization of an ML concept. Comments read "saving this" and "finally someone said it." Exactly the same "Types of ML Papers" image has been reposted 3 times and scored 4,728 / 2,650 / 2,413 respectively -- the community karma-farms its own memes.
  • Why it matters: Fastest path to 2,000+ if you have design skills. But know that the community sees through reposts at this point. Original meme content has a much higher ceiling.

Archetype 5: The Credible Whistleblower (score ceiling: 1,609)

  • Score range: 200-1,609
  • Examples:
    • [D] Got burned by an Apple ICLR paper (1,609, 104 comments)
    • [D] Published paper uses hardcoded seed and collapsed model (288, 65 comments)
    • [D] Tsinghua ICLR paper withdrawn due to AI citations (363, 66 comments)
    • [D] ICML: every paper in my review batch contains prompt-injection text (457, 90 comments)
    • [D] 100 Hallucinated Citations Found in 51 NeurIPS 2025 Papers (397, 79 comments)
  • The pattern: First-person narrative: "I tried to reproduce X, found bug Y, here's the GitHub issue, here's the OpenReview link." Evidence-rich, not ranty. Always names the paper. Often triggers a secondary news cycle (paper withdrawal, author response).
  • Why it matters: The single most respected archetype post-2024. If you're a researcher or reviewer, this is your highest-leverage play. The Apple ICLR post is THE top post of 2025 -- showing that the community's attention has fully rotated from demos to accountability.

Archetype 6: The Serious Technical Deep-Dive Post (score ceiling: ~450)

  • Score range: 50-454
  • Examples:
    • [R] LLMs are Locally Linear Mappings (244, 45 comments) -- jamesvoltage, detached Jacobians
    • [R][P] We compress any BF16 model to ~70% size LOSSLESS (200, 27 comments) -- DF11
    • [N] cuML zero code change (scikit-learn on GPU) (454, 25 comments)
    • [R] The Resurrection of the ReLU (235, 63 comments) -- SUGAR surrogate gradients
    • [R] Analysis of 350+ ML competitions in 2025 (221) -- mlcontests.com
  • The pattern: A detailed technical post (500+ words) describing a novel method, a detailed benchmark, or a carefully-documented practical finding. Always includes arXiv link, GitHub repo, and concrete numbers. The author engages heavily in comments, answering technical questions.
  • Why it matters: Your ceiling as an independent researcher is ~400-500. Do not expect viral. Expect a small-but-surgical audience that reads your paper and cites it. The distribution value is reputation-building, not visibility.

6. Format Analysis

Format  | Top 25 | Top 50 | Top 100 | All | % Top 25 | % Top 100 | % All
VIDEO   | 13     | 27     | 50      | 50  | 52%      | 50%       | 15%
IMAGE   | 7      | 13     | 22      | 39  | 28%      | 22%       | 12%
TEXT    | 4      | 6      | 21      | 207 | 16%      | 21%       | 63%
GALLERY | 1      | 2      | 2       | 14  | 4%       | 2%        | 4%
LINK    | 0      | 2      | 5       | 19  | 0%       | 5%        | 6%
GIF     | 0      | 0      | 0       | 0   | 0%       | 0%        | 0%

The visual cliff is stark: 84% of the top 25 and 74% of the top 100 are visual (VIDEO+IMAGE+GALLERY), yet only 31% of the full dataset is. Every VIDEO post in the top 50 is from 2020-2023. The 2024-2026 period is almost entirely TEXT -- the community has stopped upvoting demo videos at scale. If you post a video today, your ceiling is probably ~500, not 5,000.

What format to use for what:

  • Novel ML result / paper → TEXT (with arXiv link + GitHub in body, never the title). Discussion flair if you want engagement, Research if you want to be taken seriously. Video was the winning format 2020-2023 but has been overtaken by detailed text.
  • Reproducibility drama / whistleblower → TEXT. Must be first-person, must have receipts, must link OpenReview/GitHub.
  • Tool / library launch → TEXT with flair [P]. Link in comments, not title. Visual preview is optional but helps; don't lead with it. Include benchmarks table.
  • Conference review rant → TEXT. Short is fine. Let comments do the work.
  • Infographic / meme → IMAGE. Only works if actually original and designed.
  • Paper summary / news → TEXT or LINK with [N] flair. LINK is safer than you'd think for News -- ratio averages 0.96.

What makes a good (historical) demo video: Looking at the 2020-2023 video hits:

  1. Length: 10-30 seconds. Almost every top video post is short. No intros, no outros.
  2. Hosted on v.redd.it, not YouTube. YouTube links consistently underperform; Reddit's native player inflates engagement.
  3. Show the transformation, not the training. "Photo → animation," not "loss curve → accuracy plot." The payoff is the before/after contrast.
  4. No audio narration. Silent demos outperform narrated ones. Music is optional.
  5. Caption the capability in the title, not the method. "Turn old photos into moving images" > "Applied Motion Model with Latent Warping."

Gallery format: Used in only 14 posts, typically 4-10 images showing before/after comparisons, training curves, or a multi-pane example grid. Best gallery post: [P] I'm using Instruct GPT to show anti-clickbait summaries (2,794).


7. Flair/Category Strategy

Raw performance ranking:

  1. Project (avg 1,061) -- highest ceiling (8,544), reliable for visual demos
  2. News (avg 926) -- highest ratio (0.96), low friction, factual
  3. Research (avg 758) -- reliable mid-performer if you have an actual paper
  4. Discussion (avg 541) -- highest volume, lowest avg, high variance

Distribution utility ranking (for someone trying to get reach):

  1. Discussion -- Despite its low avg, this is your best vehicle if you have a credible grievance or meta-observation. 8 of the 25 biggest current-era posts are [D]. The C/U ratio is 0.59 (highest of all flairs) -- meaning Discussion posts generate real conversations.
  2. Project -- Best for shipping code or a demo. Avg C/U is 0.17 (passive upvotes, not discussion).
  3. Research -- Best for establishing credibility. Avg C/U is 0.38 (moderate engagement).
  4. News -- Best if you have a factual announcement and want safety (high ratio).

Title-prefix tag conventions:

  • [R] Research -- you have a paper, method, or result
  • [D] Discussion -- opinion, question, meta-commentary (do NOT use for tool launches)
  • [P] Project -- code, demo, tool (always include GitHub/repo link in body)
  • [N] News -- announcement, external event, release
  • [Project] also acceptable (long-form spelling, slightly archaic)
  • Combos: [R][P], [D][R], [N][P] are fine. Mods are more tolerant than strict.

Ironic flair use: The "Discusssion" typo appears on two sticky mod posts (blackout, reddit API). Some users tag career rants as [R] -- those get removed. Tag-mismatch is a fast removal trigger.

Pricing / commercial language hierarchy (most to least community-friendly):

  1. Free, open-source, MIT/Apache -- Fully welcomed. The only safe category.
  2. Open-weights models with self-hosted inference -- Welcome if accompanied by real eval numbers.
  3. Freemium with API -- Tolerated if the paper/method is the focus and the API is secondary.
  4. "Try our demo" links -- Acceptable only if no login required and the method is novel.
  5. Paid products / subscriptions -- Dead on arrival. Will be removed.
  6. "Sign up for waitlist" / "join our Discord" -- Perma-ban territory.

8. Title Engineering

Deconstructing the top 10:

  1. "[Project] From books to presentations in 10s with AR + ML" (8,544) -- Specific time ("10s"), concrete input→output ("books → presentations"), specific tech stack ("AR + ML"). You can visualize the demo before clicking.
  2. "[D] A Demo from 1993 of 32-year-old Yann LeCun showing off the World's first Convolutional Network for Text Recognition" (6,378) -- Celebrity name + historic framing + "World's first." Leans on tribal reverence.
  3. "[D] This AI reveals how much time politicians stare at their phone at work" (4,924) -- Political lightning rod + "AI reveals" framing. This is tabloid-coded but worked because of the target.
  4. "[R] First Order Motion Model applied to animate paintings" (4,896) -- Concrete method name + surprising application. The "applied to paintings" twist creates the upvote trigger.
  5. "[N] AI can turn old photos into moving Images" (4,807) -- Nostalgia + capability demonstration. Zero jargon.
  6. "[D] Types of Machine Learning Papers" (4,728) -- Meta-humor. Title tells you exactly what the image contains.
  7. "I made a robot that punishes me if it detects that if I am procrastinating on my assignments [P]" (4,272) -- First-person confessional + absurd outcome. Humor wrapped in ML.
  8. "[D] The machine learning community has a toxicity problem" (3,924) -- Grievance framing + specific community target. Provokes immediate agree/disagree split.
  9. "[Project] NEW PYTHON PACKAGE: Sync GAN Art to Music with 'Lucid Sonic Dreams'! (Link in Comments)" (3,725) -- ALL CAPS + named deliverable + "(Link in comments)" (mod-safe pattern).
  10. "[P] Using oil portraits and First Order Model to bring the paintings back to life" (3,510) -- Method + evocative outcome ("bring paintings back to life").

Title formulas that work:

Formula 1: The Surprising Application

  • "[R] First Order Motion Model applied to animate paintings" (4,896)
  • "[R] Wolfenstein and Doom Guy upscaled into realistic faces with PULSE" (2,801)
  • "[R] Speech-to-speech translation for a real-world unwritten language" (3,079)
  • Template: [Tag] [Known method] applied to [unexpected domain]

Formula 2: The Grievance Manifesto

  • "[D] The machine learning community has a toxicity problem" (3,924)
  • "[D] I don't really trust papers out of 'Top Labs' anymore" (1,699)
  • "[D] The current and future state of AI/ML is shockingly demoralizing" (1,492)
  • "[D] Why can't you guys comment your fucking code?" (1,658)
  • Template: [D] [Systemic complaint] [about the field]

Formula 3: The Whistleblower Reveal

  • "[D] Got burned by an Apple ICLR paper — it was withdrawn after my Public Comment" (1,609)
  • "[D] Siraj has a new paper: 'The Neural Qubit'. It's plagiarised" (2,574)
  • "[D] ICML: every paper in my review batch contains prompt-injection text" (457)
  • Template: [D] [Specific paper/actor] [specific wrongdoing]

Formula 4: The First-Person Build Confessional

  • "I made a robot that punishes me if it detects that if I am procrastinating on my assignments [P]" (4,272)
  • "[P] I trained a GAN to generate photorealistic fake penises" (2,362)
  • "[P] I built a chatbot that lets you talk to any Github repository" (1,699)
  • Template: [P] I [built/trained/made] [absurd but legitimate artifact]

Formula 5: The "Link in Comments" Handshake

  • "[N] AI can turn old photos into moving Images / Link is given in the comments" (4,807)
  • "[R] WHIRL algorithm... (link in comments)" (1,750)
  • "[P] Pose Animator: SVG animation tool... (links in comments)" (1,674)
  • Template: [Tag] [Description] (link in comments) -- the mod-safe way to share external links

Formula 6: The Known-Method-With-Numbers

  • "[R] RIFE: 15FPS to 60FPS Video frame interpolation" (2,809)
  • "[R] SIMPLERECON — 3D Reconstruction — 73ms per frame!" (1,421)
  • "[R][P] We compress any BF16 model to ~70% size LOSSLESS" (200)
  • Template: [R] [Method name]: [specific metric improvement]

Title anti-patterns (community-specific):

  1. No raw arXiv dumps: Rule 5 explicitly bans this. [R] TriAttention: Efficient KV Cache Compression (10 score, 0.78 ratio) died because the post was a thin wrapper around a link.
  2. No "How to learn ML in 2026": The community ships these to r/learnmachinelearning. Any "roadmap" post gets removed or buried.
  3. No vague questions: [D] Best websites for pytorch/numpy interviews (8 score, 0.66 ratio). The community tells you to google it.
  4. No hype adjectives without substance: [R] The Gamechanger of Performer Attention Mechanism (241 score but 0.85 ratio). "Gamechanger" triggered the bullshit detector.
  5. No vendor branding in titles: [P] My DC-GAN works better then ever! (292, but 0.94 ratio -- barely). Posts that sound like commercial launches get immediate skepticism.
  6. No "I used ChatGPT to..." content: [P] I Gave Claude Code 9.5 Years of Health Data (230, 0.89 -- the friction is visible). The community sees "I used Claude" and assumes low effort.
  7. No benchmark numbers without methodology: [R] 94.42% on BANKING77 Official Test Split (0 score, 0.25 ratio). The community suspects test leakage.

9. Engagement Patterns

C/U ratios by flair:

Flair                | Avg C/U | Interpretation
Discussion           | 0.59    | Discussion-generating -- people reply to each other, not just vote
Research             | 0.38    | Moderate discussion -- researchers ask technical questions
Discusssion (typo)   | 0.34    | Same as Discussion
Project              | 0.17    | Passive upvoting -- people star the repo, move on
News                 | 0.16    | Passive -- "cool, saved"
(no flair)           | 0.11    | Very passive
Shameless Self Promo | 0.03    | Almost no engagement
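One plausible way to compute these per-flair C/U figures is aggregate comments over aggregate upvotes per flair; the report may instead average per-post ratios. A sketch of the aggregate definition, with toy data in place of the real 329-post dataset:

```python
from collections import defaultdict

# Toy posts; real data would carry the same "flair", "score", "comments" fields.
posts = [
    {"flair": "Discussion", "score": 201, "comments": 1008},
    {"flair": "Discussion", "score": 3060, "comments": 448},
    {"flair": "Project", "score": 8544, "comments": 198},
]

# flair -> [total comments, total upvotes]
totals = defaultdict(lambda: [0, 0])
for p in posts:
    totals[p["flair"]][0] += p["comments"]
    totals[p["flair"]][1] += p["score"]

cu_ratio = {flair: comments / upvotes for flair, (comments, upvotes) in totals.items()}
```

Even on this toy sample the pattern in the table reproduces: Discussion posts generate far more comments per upvote than Project posts.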

If your goal is VISIBILITY (broadest reach): Post a [Project] with a VIDEO (historical ceiling) or a [D] meta-complaint (current-era ceiling). Both can reach 2,000+ upvotes in the right conditions.

If your goal is RELATIONSHIPS and discussion: Post a [D] long-form grievance or a [R] with a paper and open questions. These are where DMs come from; these are where collaborators reach out.

If your goal is REPUTATION-BUILDING for long-term distribution: Post technical [R] and [P] content consistently. Build a known username like jsonathan (6 top-100 posts, avg 1,548) or Illustrious_Row_9971 (12 top-100 posts, avg 1,954). After ~5 quality posts, your ceiling rises.

Highest-discussion topics (most comments relative to score):

  1. Conference review mega-threads: [D] - NeurIPS 2025 Decisions (201 score / 1,008 comments, C/U=5.0), [D] ICLR 2026 Paper Reviews (188 / 839, C/U=4.5), [D] - NeurIPS'2025 Reviews (237 / 912, C/U=3.8), [D] ICML 2026 Review Discussion (125 / 633, C/U=5.1). These are seasonal grief-processing threads that generate low scores but enormous comment volumes.
  2. "OpenAI hate" threads: [D] Does anybody else despise OpenAI? (1,538 / 429, C/U=0.28). Controversial but generates huge debate.
  3. Career pity: [D] Ph.D. from top Europe university, 10 NeurIPS papers, 0 Big Tech interviews (472 / 154, C/U=0.33)
  4. Whistleblower threads: Every "I found a fraud" post generates 60-200 comments even at moderate scores.
  5. Meta-critique threads: [D] Am I the only one noticing a drop in quality for this sub? (226 / 79, C/U=0.35)

10. What Gets Downvoted

Notable low-ratio posts (≤ 0.85):

Title                                                                 | Score | Ratio
[D] Has industry effectively killed off academic ML research in 2026? | 173   | 0.82
[D] Has "AI research lab" become completely meaningless as a term?    | 73    | 0.80
[D] How do ML engineers view vibe coding?                             | 53    | 0.83
[P] Vibecoded on a home PC: neural chess engine                       | 81    | 0.72
[R] The Gamechanger of Performer Attention Mechanism                  | 241   | 0.85
[R] Controlled experiment: LLM agent with CS papers for HP search     | 51    | 0.82
[D] Why I abandoned YOLO for safety-critical plant ID                 | 37    | 0.74
[P] Weight Norm Clipping Accelerates Grokking 18-66×                  | 62    | 0.84
[D] - 1M tokens/second serving Qwen 3.5 27B on B200 GPUs              | 22    | 0.77

Ratio tiers:

  • Above 0.94 (safe): 176 posts (54%). Universally well-received -- technical content with evidence, factual news, genuine research.
  • 0.85-0.94 (friction): 100 posts (30%). Net positive but with visible pushback. Meta-complaints about the field, controversial opinions, posts that name-check LLMs without being about LLMs.
  • Below 0.85 (controversial/hostile): 53 posts (16%). The bottom of the barrel -- failed launches, vague questions, suspected slop, LLM-flavored posts, and career complaints that come across as entitled.
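The tiering above is a simple bucketing. As a sketch (the handling of posts sitting exactly at 0.85 or 0.94 is my assumption, since the report doesn't specify boundary behavior):

```python
def ratio_tier(upvote_ratio: float) -> str:
    """Bucket a post into the report's three tiers by upvote ratio."""
    if upvote_ratio > 0.94:
        return "safe"          # above 0.94: universally well-received
    if upvote_ratio >= 0.85:
        return "friction"      # 0.85-0.94: net positive, visible pushback
    return "controversial"     # below 0.85: hostile reception
```

For example, `ratio_tier(0.72)` returns `"controversial"` (the vibecoded chess engine), while `ratio_tier(0.99)` returns `"safe"`.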

Anti-patterns (community-specific):

  1. The "Vibecoded" Tell -- Any post that uses the word "vibecoded" or admits to "I asked ChatGPT to..." gets a visible ratio drop. [P] Vibecoded on a home PC: building a ~2700 Elo browser-playable neural chess engine (81 score, 0.72 ratio) -- the highest-friction post in the current-era dataset. The community interprets "vibecoded" as "I don't understand my own code." Avoid this word.

  2. The Hype Adjective Trigger -- "Gamechanger," "revolutionary," "state of the art," "breakthrough" in the title without extraordinary evidence. The community has seen hundreds of these and is immunized. [R] The Gamechanger of Performer Attention Mechanism (241, 0.85) is textbook.

  3. The Unverified Benchmark Post -- Posting a benchmark number without methodology, code, or reproducibility is treated as suspected fraud. [R] 94.42% on BANKING77 Official Test Split (0, 0.25) -- the community assumes test-set leakage and downvotes on sight.

  4. The "I'm Tired" Career Rant Without Specifics -- Career complaints without technical substance get cold-shouldered. [D] ICML 2026 am I cooked? (21, 0.78), [D] Rejected a Solid Offer Waiting for My 'Dream Job' (197, 0.86) -- even the successful ones show friction.

  5. The LLM Slop Suspicion -- Posts that read like they were written by ChatGPT -- overly structured, bullet-heavy, emoji-laden ("🚀"), full of generic phrases -- trigger mass downvoting regardless of their actual origin. The community has developed a hair-trigger for this. Legitimate content can survive the tell ([N] cuML zero code change 🚀 still scored 454 at a 0.99 ratio), but only when it comes from a known entity like NVIDIA.

  6. The Self-Promo Waitlist -- Any post that ends with "sign up for the waitlist" or "follow us for updates" is dead on arrival. The dataset has zero successful examples.

  7. The "Is ACL / NeurIPS / venue X a scam" Meta-Complaint -- [D]NLP conferences look like a scam.. (267, 0.92), [D] Bad Industry research gets cited (266, 0.94). These work at a modest level but rarely break 500 because the community sees them as repetitive grievance-posting.

No formal blacklist, but informal memory is long:

There is no published hall-of-shame, but moderator decisions and community memory are aggressive. Siraj Raval drama persisted for months (1,285 / 1,357 / 2,574 scores across multiple posts). Usernames that push self-promo get called out in comments. The best defense is to post as a person with a visible history (GitHub, prior comments, arXiv), not as a throwaway.


11. The Distribution Playbook

Phase 1: Pre-launch (build presence)

  1. Read the rules. Seriously. Rule 2 (self-promo) and Rule 3 (marketing) will get you banned. Rule 5 (no bare arXiv) will get you removed.
  2. Lurk for 2 weeks. Read the 2025-2026 top 100 posts, not the 2020 greatest hits. The community has changed.
  3. Comment on 10+ technical threads. Answer questions, cite papers, demonstrate depth. The community can and will check your username history, and it matters.
  4. Ensure your artifact is shippable. GitHub repo with a README. Runnable code. Reproducible results. Benchmark numbers against a real baseline.
  5. Do NOT post a tool that's behind a signup form. Kill the login wall first, or don't post.

Phase 2: Launch day

  1. Choose the right flair. Default to [P] for tools, [R] for papers, [D] for opinion, [N] for announcements. When in doubt, [P] + [R] combo is safe.
  2. Write the title with Formula 1, 4, or 6 (Surprising Application / First-Person Build / Method-with-Numbers). Avoid hype adjectives.
  3. Post body structure:
    • 1 paragraph: what it is and why it matters
    • 1 section: how it works (method, with real technical detail)
    • 1 section: results with a table
    • 1 section: honest limitations ("batch size 1 latency is ~40% slower")
    • Links: GitHub (always), paper (if research), demo (optional)
  4. No marketing language. No "revolutionary." No "state of the art" unless you have the numbers. No "sign up."
  5. Timing: The dataset doesn't show strong timing effects, but posts from 12:00-17:00 UTC on Tuesdays-Thursdays dominate the top 100. Weekends are slower.
  6. Length: Text posts that work are 300-1500 words. Shorter gets dismissed as low-effort; longer gets TL;DR'd.

Phase 3: First 24-48 hours

  1. Engage every technical question in the first 4 hours. This is the single highest-leverage activity. Comment C/U ratios are strongest for posts where the OP is visibly present.
  2. Acknowledge limitations. When someone says "what about X edge case," reply with a specific honest answer. The community rewards humility and punishes defensiveness.
  3. Pre-emptively answer the "Is this just LLM slop?" question. If your post is technical, lead with a concrete method section. If your post is opinion, lead with personal first-person context.
  4. Watch the ratio at the 4-hour mark. If ratio < 0.90, the post is in trouble -- expect mod attention. If ratio > 0.95, you're safe to engage and expand.
  5. Don't cross-post. The community notices. Post once, engage there, leave it alone.

Phase 4: Ongoing presence

  1. Follow up with progress posts every 4-8 weeks. Don't do weekly "update" spam.
  2. Participate in other people's threads. The community's credibility graph is author-based -- if you're only present in your own posts, you're seen as a self-promoter.
  3. Build a second-post reputation. Your first [P] might score 200; your fifth, with a now-known GitHub handle, can score 1,500+ for the same category of work.
  4. Avoid the LLM content trap. Do not post anything that looks like a ChatGPT summary. If you use LLM assistance for writing, heavily edit to remove the telltale patterns (bullet structure, emoji, "let's dive in," "in conclusion").

Community-specific comment strategy templates:

  • "Is this vibe-coded?" → "No. The core [component] is hand-written PyTorch/Jax. I did use Claude for [specific task like docstrings or test generation]. Here's the commit history showing the development: [link]."
  • "Where's the code?" → "GitHub link is in the post body: [link]. Reproducibility script is in /examples/benchmark.py."
  • "Doesn't [X existing tool] already do this?" → "Yes, and [X] is great. The specific difference is [concrete technical delta]. Here's a comparison table: [table]. If [X] works for you, use it."
  • "Did you test on [hard benchmark]?" → "Not yet -- my compute budget was ~$[X] on [GPU]. I'd love to see results on [benchmark] if anyone has compute. Here's how to reproduce: [link]."
  • "Is this peer-reviewed?" → "No, this is an open-source [tool/method] release. I'm planning to submit to [venue] once I have [additional experiments]."

Stealth distribution tactics:

  1. Answer technical questions on other people's threads with "I ran into this recently while building [your tool] -- found that [technical insight]." Link only if explicitly asked. This builds name recognition without triggering self-promo filters.
  2. Participate in conference mega-threads (NeurIPS reviews, ICLR discussion). These have 500-1000+ comments and are where repeat names get noticed. Mention your work only when directly relevant.
  3. Be a helpful citation -- when someone asks "has anyone implemented X?" and you have, link your repo. This is accepted distribution.
  4. Post methodology critiques of bad papers. If done with receipts, these go viral (see Archetype 5). Your name gets associated with rigor.

Score-tier calibration:

  • Tool launches in the post-2024 era: realistic ceiling is 500-1,500. 2,000+ requires a known username or a genuinely novel result. Do not expect 5,000.
  • Research papers: 200-800 is typical. 1,500+ only for whistleblower angles or major lab announcements.
  • Discussion / meta posts: 200-1,000 is typical. 1,500+ requires a credible grievance with receipts.
  • Visual demos: Ceiling has collapsed. Historical 5,000+ videos from 2020-2023 would likely score 500-1,500 today.

Post-publication measurement:

  • First 1 hour, < 50 upvotes: Normal for technical posts. Don't panic.
  • First 4 hours, ratio < 0.88: The post has friction. Review the comments -- address pushback directly.
  • First 4 hours, ratio < 0.80: The post is actively disliked. Usually means hype adjectives, missing code link, or slop suspicion. Sometimes salvageable with edits; often not.
  • First 12 hours, < 100 upvotes on [P] or [R]: Post is dead. The ML news cycle moves fast; it will not recover.
  • First 24 hours, 300+ upvotes and ratio > 0.94: Strong hit. Engage every comment over the next 48 hours. This is when reputation is built.
  • 500+ upvotes: The comment section becomes the most valuable asset -- likely collaborators and critics worth responding to.
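The checkpoints above amount to a small triage procedure. A hedged sketch with the thresholds copied from the bullets (the function name, parameters, and return strings are my own, not an official metric):

```python
def post_health(hours: float, score: int, ratio: float, flair: str = "[P]") -> str:
    """Triage a post's trajectory using the measurement heuristics above."""
    if hours <= 1 and score < 50:
        return "normal -- don't panic"
    if hours <= 4:
        if ratio < 0.80:
            return "actively disliked -- check for hype words, missing code, slop tells"
        if ratio < 0.88:
            return "friction -- address pushback in comments directly"
    if hours >= 12 and score < 100 and flair in ("[P]", "[R]"):
        return "dead -- the news cycle has moved on"
    if hours >= 24 and score >= 300 and ratio > 0.94:
        return "strong hit -- engage every comment for 48 hours"
    return "in progress -- keep engaging"

print(post_health(hours=4, score=120, ratio=0.75))
```

Note the ordering: the ratio checks only fire inside the first 4 hours, so an older post with lingering friction is judged on score, not ratio.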

12. Applying This to Any Project

Pre-launch checklist (12 items):

  1. Artifact is on GitHub, public, with a working README
  2. No login wall, no waitlist, no signup form on any demo link
  3. Benchmark numbers are real and reproducible (provide the script)
  4. Title uses one of the 6 formulas; no hype adjectives
  5. Title has a proper bracket tag: [R], [D], [P], or [N]
  6. Post body is 300-1500 words with a limitations section
  7. Post is NOT a bare arXiv link (Rule 5 violation)
  8. Code link is in the body, not the title
  9. You have a visible comment history in the sub (not a throwaway)
  10. No "vibe coded," "revolutionary," "game changer" language
  11. No emoji in title, minimal emoji in body
  12. You have 4 hours of availability after posting to respond to comments
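Several checklist items (4, 5, 10, 11, and the length bound from item 6) are mechanically checkable. A toy linter sketch -- the word list, regex ranges, and function name are illustrative choices of mine, not an official tool:

```python
import re

# Banned phrases from checklist item 10, plus common spelling variants.
HYPE_WORDS = ("revolutionary", "game changer", "gamechanger", "vibe coded", "vibecoded")
# Rough emoji detector covering common symbol/pictograph blocks.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

def lint_draft(title: str, body: str) -> list[str]:
    """Return checklist violations found in a draft post."""
    problems = []
    if not re.match(r"\[(R|D|P|N)\]", title):
        problems.append("title missing [R]/[D]/[P]/[N] tag")
    lowered = (title + " " + body).lower()
    for word in HYPE_WORDS:
        if word in lowered:
            problems.append(f"banned phrase: {word!r}")
    if EMOJI_RE.search(title):
        problems.append("emoji in title")
    words = len(body.split())
    if not 300 <= words <= 1500:
        problems.append(f"body is {words} words (target 300-1500)")
    return problems

print(lint_draft("[P] FastTok: 2x faster BPE tokenization", "short draft body"))
# → ['body is 3 words (target 300-1500)']
```

The items a script cannot check -- reproducibility, comment history, 4 hours of availability -- are exactly the ones the community weighs most heavily.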

Scenario-based launch guides:

Scenario A: You built a free, open-source tool

  • Formula: [P] I built [specific tool name] for [specific ML task] or [P] [Tool name]: [specific capability with number]
  • Flair: [P]
  • Body structure: What it is (1 para) → method/architecture (1 section) → benchmark table → comparison to existing tools → GitHub link → honest limitations
  • Key risk: Comments section dissecting your baseline comparison. Mitigation: pre-empt with "I compared against [X, Y, Z] using [exact protocol]; here's the reproducibility script."

Scenario B: You have a research paper to share

  • Formula: [R] [Method name]: [surprising result] or [R] [Known method] applied to [unexpected domain]
  • Flair: [R]
  • Body structure: TL;DR (1 para) → the contribution (1 para) → key experimental results (table) → arXiv link → GitHub link → what you'd like feedback on
  • Key risk: Commenters will review it like program committee members, calling out any missing ablations, unfair comparisons, or unreleased code. Mitigation: release the code first. Then post.

Scenario C: You found a bug/fraud in a published paper

  • Formula: [D] [Specific paper/venue] [specific problem I found]
  • Flair: [D]
  • Body structure: Backstory (1 para) → what I tried to reproduce → what I found → the evidence (screenshots, GitHub issues, OpenReview links) → author response if any → call to action
  • Key risk: Authors or their supporters will counter-attack. Mitigation: file a GitHub issue FIRST, give the authors a chance to respond, then post. The post lands harder when you can show "I gave them 6 days to respond."

Scenario D: You want to discuss a meta-issue about the field

  • Formula: [D] [Systemic observation or grievance]
  • Flair: [D]
  • Body structure: The observation → specific examples → why it matters → what you'd like to discuss
  • Key risk: Gets dismissed as "another conference rant." Mitigation: bring specific data (numbers of submissions, actual review scores, dated examples). The difference between 500 and 1,500 is whether you have receipts.

Scenario E: Your project was built with AI assistance

  • Formula: Lead with the technical contribution, not the tooling. NEVER use the word "vibecoded."
  • Flair: [P]
  • Body structure: Same as Scenario A, but with an explicit "How I used LLMs in development" paragraph that is honest about what LLMs wrote vs. what you wrote. The community rewards transparency; it punishes concealment.
  • Key risk: the 0.72 ratio that sank the "Vibecoded neural chess engine" post. Mitigation: frame it as "I wrote the training loop and novel components; I used Claude Code for [specific tasks]."

Cross-posting guidance (reframing for other subs):

  • On r/MachineLearning: Lead with method, benchmarks, and reproducibility.
  • On r/LocalLLaMA: Lead with model size, quantization, and inference speed on consumer hardware.
  • On r/learnmachinelearning: Lead with the learning journey and educational value; frame as "I learned X by building Y."
  • On r/programming: Lead with the software engineering aspects -- architecture, language, pipeline.
  • On r/OpenAI or r/ClaudeAI: Lead with the AI tool integration and prompt engineering learnings.
  • On r/sideproject or r/buildinpublic: Lead with the product story -- user pain, iteration, shipping.

Critical distinction: r/MachineLearning's reader is not r/sideproject's reader. If you're posting the same tool to both, the r/MachineLearning version must strip all product-marketing language, emphasize technical novelty over utility, and include benchmark numbers that would never appear in a r/sideproject post. Do NOT cross-post literally; re-author each version.


Final calibration: If you are a first-time poster launching an open-source ML tool in 2026, expect a realistic ceiling of 300-800 upvotes. If you're a known contributor with prior posts, 1,000-2,000 is possible. If you have a credible whistleblower story about a top-lab paper, 1,500+ is achievable. If you're hoping for 5,000, you need to either (a) time-travel to 2021 with a GAN demo video or (b) have something genuinely unprecedented. The community has become quieter, more skeptical, and more demanding -- but its attention, when earned, is still the most valuable in all of ML Reddit.