r/MachineLearning Community Analysis
1. Data Sources & Methodology
- Subreddit: r/MachineLearning (3,036,680 subscribers)
- Total unique posts analyzed: 329 (after deduplication across 15 raw JSON files)
- Date collected: 2026-04-10
- Score range: 0 to 8,544
- Median score: 219
- Top 10 threshold: 3,510
- Top 25 threshold: 2,619
- Top 50 threshold: 1,847
- Top 100 threshold: 1,238
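The deduplication and threshold numbers above are straightforward to reproduce. A minimal sketch, assuming hypothetical raw JSON dumps where each post carries Reddit's `id` and `score` fields (the file layout and the reading of "top-N threshold" as the Nth-highest score are assumptions, not the actual pipeline):

```python
import json
from glob import glob
from statistics import median

def load_posts(pattern="raw/*.json"):
    """Deduplicate posts across overlapping dump files, keyed on post id."""
    posts = {}
    for path in sorted(glob(pattern)):
        with open(path) as f:
            for post in json.load(f):
                posts[post["id"]] = post  # later dumps overwrite earlier copies
    return list(posts.values())

def thresholds(scores, ranks=(10, 25, 50, 100)):
    """Top-N threshold = score of the Nth-highest post."""
    ordered = sorted(scores, reverse=True)
    return {n: ordered[n - 1] for n in ranks if n <= len(ordered)}

# posts = load_posts()
# scores = [p["score"] for p in posts]
# print(len(posts), median(scores), thresholds(scores))
```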
Period breakdown:
| Period | Posts | Score Range | Median | Notes |
|---|---|---|---|---|
| All-time | 100 | 1,238-8,544 | 1,847 | Historical GAN/video demos from 2020-2023, classic drama (Siraj, LeCun), founding-era AMAs |
| Year | 100 | 179-1,609 | 246 | Dominated by ICLR/NeurIPS review drama, LLM-era malaise, paper withdrawals, slop complaints |
| Month | 100 | 20-415 | 52 | ICML 2026 rebuttal anxiety, arXiv independence news, TurboQuant threads |
| Week | 63 | 0-223 | 8 | Raw pulse: ICML rebuttal questions, low-traction Projects, dead-on-arrival self-promos |
Cross-subreddit score calibration: r/MachineLearning peaks at ~8,544 -- comparable to r/ClaudeAI (~8,084) but with a much lower all-time top-25 threshold (~2,619 vs. r/ClaudeAI ~3,000+). Despite being a 3M-subscriber giant (5x r/LocalLLaMA, 10x r/learnmachinelearning), scores are surprisingly compressed. A score of 300 is a good hit, 800+ is strong, 1,500+ is memorable, and 2,500+ is canonical. The median (219) is barely higher than r/learnpython's (~194) despite 15x the subscribers -- because the community heavily downvotes low-effort content and mods aggressively remove posts. Notably, the entire 2025-2026 dataset has NO post above 1,609 ([D] Got burned by an Apple ICLR paper). The 5,000+ viral hits are all 2019-2023 GAN/diffusion demo videos. The glory days of viral ML Twitter-bait are gone.
This is a content strategy guide, not a sociological study.
2. Subreddit Character
r/MachineLearning is the Hacker News of ML research -- a jury of skeptical, credentialed practitioners that has spent the last two years watching its own field collapse into slop and is now deeply, publicly grieving. It is not r/LocalLLaMA (enthusiasts tinkering with GGUF quants), not r/learnmachinelearning (students begging for roadmaps), and not r/OpenAI (consumers of Sam Altman takes). This is where people who've shipped NeurIPS papers, reviewed for ICLR, or actually trained a model from scratch hang out -- and they are tired.
Community identity: Graduate students, industry research scientists, professors, and senior MLEs. Comment sections name-drop OpenReview forum IDs, cite specific reviewer numbers, and dissect hyperparameter choices. The rules explicitly banish beginners ("Beginners -> /r/mlquestions or /r/learnmachinelearning"). The submit page warns: "Do NOT submit questions which are easily googled... Posts without appropriate tag (e.g: [R], [D], [P], [N])... Posts which lack technical detail." This is enforced ruthlessly by mods.
Product launches: Hostile to anything with a commercial smell. Rule 2 says "No Self-Promotion" and Rule 3 says "No Marketing Campaigns (SEO)" with "perpetually banned with all past posts and comments purged." The only "Shameless Self Promo" flair in the whole dataset appears exactly once ([P] The easiest way to process and tag video data, 1,693). Successful Project posts are either (a) visual demos of novel research (StyleGAN, diffusion, pose tracking) or (b) free open-source tools from known contributors (jsonathan's debuggers, OpenAssistant, cuML). A SaaS pitch, a "we just launched" announcement, or any post with a pricing page will be nuked within minutes.
Humor: Exists but must be wrapped in technical substance. [P] I trained a GAN to generate photorealistic fake penises (2,362) and [P] I trained a recurrent neural network trained to draw dick doodles (1,785) both worked because the method sections are legitimate. Pure shitposting dies. The community rewards researcher humor, not meme humor.
Technical level: The highest of any ML sub. Top posts casually reference Jacobians, MAP-Elites, surrogate gradients, KV caches, Hexagon NPU INT8 quantization, Fourier synthesis activation functions. Posts that use buzzwords without substance get eviscerated in comments (see: [R] The Gamechanger of Performer Attention Mechanism, 241 score but 0.85 ratio -- the title hype triggered the immune response).
Key cultural values (ranked):
- Reproducibility and code release -- the single loudest grievance across the 2024-2025 corpus. [D] Papers with no code (203), [D] Published paper uses hardcoded seed and collapsed model (288), [D] Got burned by an Apple ICLR paper (1,609). Publishing a paper without runnable code is treated as near-fraud.
- Anti-slop -- the community has a visceral, organized hatred of LLM-generated content. Can we stop these LLM posts and replies? [D] (258, 0.95), [D] Alarming amount of schizoid people being validated by LLMs (328). Posts suspected of being ChatGPT-written get flagged in comments and downvoted.
- Open science / anti-corporate -- [D] Our community must get serious about opposing OpenAI (3,060), [D] Does anybody else despise OpenAI? (1,538, 0.87 ratio -- controversial but loud). The Ian Goodfellow return-to-office post (1,847) got 202 comments because it was framed as a principled stand.
- Credentialism with a chip on its shoulder -- the community reveres real researchers (hardmaru, programmerChilli, Yann LeCun) but actively resents "top labs" gatekeeping. [D] I don't really trust papers out of "Top Labs" anymore (1,699), [D] Can we stop glazing big labs and universities? (302).
- Peer-review cynicism -- nearly every 2025-2026 Discussion post is about conferences: NeurIPS 2025 Reviews (237, 912 comments), NeurIPS 2025 Decisions (201, 1,008 comments), ICLR 2026 Reviews (188, 839 comments), ICML 2026 Review (125, 633 comments). These are the community's grief-processing rituals.
Enforcement mechanisms: The most aggressive of any ML sub. Rule 5 explicitly bans "No arXiv Links without Body Text" -- you cannot just post a paper link, you must write commentary. Rule 6 bans "No Low-Effort, Beginner Questions." The required flair system [R]/[D]/[P]/[N] is mandatory and posts without it are removed. A mod recently publicly quit (So long r/MachineLearning, 1,320) because API changes hurt their moderation workflow, and the quality visibly dropped -- the community itself now posts [D] Am I the only one noticing a drop in quality for this sub? (226). Community self-policing is aggressive: in comment sections, users will demand code links, cite reviewer numbers, and call out methodological flaws with receipts.
Mandatory posting format: Every title MUST be prefixed with a bracketed tag:
- [R] = Research (paper/result)
- [D] = Discussion (opinion, question, meta)
- [P] = Project (code/demo/tool)
- [N] = News (announcement, external link)
Combos like [R][P] or [D][R] are accepted. Posts without tags get removed or, if they slip through, get visibly lower engagement. The tag is load-bearing: a tool post tagged [D] will confuse readers; a controversy tagged [P] will look spammy. Choose deliberately.
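Because the tag is load-bearing, it is worth checking mechanically before posting. A throwaway sketch; the regex and helper are illustrative, not any official sub tooling:

```python
import re

# Accepts leading [R]/[D]/[P]/[N], long-form [Project]/[Research],
# and combos like "[R][P]".
TAG_RE = re.compile(r"^\s*((?:\[(?:R|D|P|N|Project|Research)\]\s*)+)",
                    re.IGNORECASE)

def extract_tags(title):
    """Return a title's leading tags, or [] if it is untagged."""
    m = TAG_RE.match(title)
    return re.findall(r"\[([^\]]+)\]", m.group(1)) if m else []

# extract_tags("[R][P] We compress any BF16 model")  -> ["R", "P"]
# extract_tags("My model is great")                  -> []
```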
How this differs from related subs: r/LocalLLaMA is for running models; r/MachineLearning is for understanding them. r/learnmachinelearning rewards sincerity and learning-in-public; r/MachineLearning punishes it. r/artificial and r/singularity are for speculation; r/MachineLearning demands citations. If you wouldn't publish something on arXiv or discuss it at a lab meeting, it doesn't belong here.
3. The All-Time Leaderboard
Median of full dataset: 219. Top-25 threshold: 2,619. Top-10 threshold: 3,510.
| # | Score | Flair | Ratio | Comments | Format | Title |
|---|---|---|---|---|---|---|
| 1 | 8,544 | Project | 0.99 | 198 | VIDEO | [Project] From books to presentations in 10s with AR + ML |
| 2 | 6,378 | Discussion | 0.99 | 136 | VIDEO | [D] 1993 Yann LeCun demo of world's first ConvNet for text |
| 3 | 4,924 | Discussion | 0.96 | 236 | IMAGE | [D] This AI reveals time politicians stare at their phone |
| 4 | 4,896 | Research | 0.97 | 109 | VIDEO | [R] First Order Motion Model applied to animate paintings |
| 5 | 4,807 | News | 0.97 | 230 | VIDEO | [N] AI can turn old photos into moving Images |
| 6 | 4,728 | Discussion | 0.98 | 212 | IMAGE | [D] Types of Machine Learning Papers (meme infographic) |
| 7 | 4,272 | Project | 0.95 | 168 | VIDEO | [P] Robot that punishes me if I procrastinate |
| 8 | 3,924 | Discussion | 0.95 | 566 | TEXT | [D] The ML community has a toxicity problem |
| 9 | 3,725 | Project | 0.99 | 174 | VIDEO | [Project] Lucid Sonic Dreams: GAN art synced to music |
| 10 | 3,510 | Project | 0.98 | 111 | VIDEO | [P] Using oil portraits + First Order Model to animate paintings |
| 11 | 3,457 | Discussion | 0.99 | 75 | VIDEO | [D] CNN Visualization made with Unity 3D |
| 12 | 3,272 | Project | 0.99 | 65 | VIDEO | [P] RL agent air-dribbles in Rocket League clone |
| 13 | 3,079 | Research | 0.93 | 212 | VIDEO | [R] Speech-to-speech translation for unwritten language |
| 14 | 3,060 | Discussion | 0.95 | 448 | TEXT | [D] Our community must get serious about opposing OpenAI |
| 15 | 2,904 | Project | 0.97 | 112 | IMAGE | [P] CLI tool that explains errors using ChatGPT |
| 16 | 2,870 | Project | 0.98 | 60 | VIDEO | [P] Draw/write with hand + webcam using DL |
| 17 | 2,842 | Research | 0.99 | 102 | VIDEO | [R] Consistent Video Depth Estimation (SIGGRAPH 2020) |
| 18 | 2,809 | Research | 0.99 | 146 | VIDEO | [R] RIFE: 15FPS to 60FPS frame interpolation |
| 19 | 2,801 | Research | 0.98 | 104 | IMAGE | [R] Wolfenstein/Doom upscaled to realistic faces with PULSE |
| 20 | 2,794 | Project | 0.97 | 249 | GALLERY | [P] Anti-clickbait YouTube summaries via Instruct GPT |
| 21 | 2,737 | Project | 0.97 | 72 | IMAGE | [P] AI Twitter bot draws people's dream jobs |
| 22 | 2,683 | Discussion | 0.97 | 311 | TEXT | [D] A Super Harsh Guide to Machine Learning |
| 23 | 2,650 | Discussion | 0.97 | 92 | IMAGE | [D] Types of Machine Learning Papers (reposted meme) |
| 24 | 2,620 | Discussion | 0.96 | 406 | IMAGE | [D] An example of ML bias on Popular (locked post) |
| 25 | 2,619 | Discusssion | 0.96 | 217 | TEXT | Should r/ML join the reddit blackout? |
Key observations: 21 of the top 25 are visual (VIDEO/IMAGE/GALLERY). The 4 TEXT posts are all meta-controversies: toxicity, opposing OpenAI, the Super Harsh Guide, and the blackout vote. There is zero "tool launch" content in the top 25 in the traditional sense -- the closest thing (command-line error tool #15) is by jsonathan, a repeat creator, and it's a lightweight dev utility. Note the misspelled flair "Discusssion" appears twice in the dataset, including the sticky blackout post (#25).
4. Content Type Dominance at Scale
| Flair | Top 25 | Top 50 | Top 100 | All Posts | Avg Score (All) | Avg Ratio | Best Post |
|---|---|---|---|---|---|---|---|
| Project | 9 | 21 | 41 | 90 | 1,061 | 0.90 | [Project] Books→presentations AR+ML (8,544) |
| Discussion | 9 | 13 | 24 | 134 | 541 | 0.89 | [D] 1993 Yann LeCun ConvNet demo (6,378) |
| Research | 5 | 13 | 27 | 80 | 758 | 0.89 | [R] First Order Motion Model (4,896) |
| News | 1 | 2 | 6 | 19 | 926 | 0.96 | [N] AI turns old photos to video (4,807) |
| Discusssion (typo) | 1 | 1 | 2 | 2 | 1,963 | 0.96 | Reddit blackout vote (2,619) |
| (no flair) | 0 | 0 | 1 | 3 | 1,070 | 0.96 | [Project] Texthero (1,475) |
| Shameless Self Promo | 0 | 1 | 1 | 1 | 1,693 | 0.97 | [P] Video data tagging (1,693) |
Surprising finding #1: Discussion is the largest flair (134 posts, 41% of the dataset) but has the LOWEST avg score (541) and the LOWEST avg ratio (0.89). This is a flair that dominates by volume but performs poorly per-post. The 2025-2026 year is flooded with angsty [D] threads about peer review, and most of them score 50-250. Only viral [D] posts hit big, and they're almost always visual memes or meta-drama.
Surprising finding #2: Project posts have the highest avg score (1,061) and the highest score ceiling (8,544), but also the second-highest volume (90 posts). Project isn't just the "launch flair" -- it's the flair that rewards high-production visual demonstrations of novel ML capabilities. 41 of the top 100 posts are [P].
Surprising finding #3: News posts have the best ratio (0.96) and punch above their weight -- 19 posts, avg 926. News is a low-volume, low-friction, steady-performer flair. If you have something factual to share (a release, a leak, a departure), News is safer than Discussion.
The typo "Discusssion" is the highest-performing single flair (avg 1,963) -- but only because both posts happened to be sticky mod announcements. Don't read into it.
5. Content Archetypes That Work
Archetype 1: The Viral Research Demo Video (score ceiling: 8,544)
- Score range: 1,500-8,544
- Examples:
  - [Project] From books to presentations in 10s with AR + ML (8,544) -- cyrildiagne
  - [D] 1993 Yann LeCun ConvNet for text recognition (6,378)
  - [R] First Order Motion Model applied to animate paintings (4,896)
  - [Project] Lucid Sonic Dreams: GAN art synced to music (3,725)
  - [R] RIFE: 15FPS to 60FPS frame interpolation (2,809)
- The pattern: A 15-60 second hosted video (v.redd.it) showing a specific, legible ML capability doing something concrete. Always has a crosspost count in the double digits (these go viral outside the sub). No talking head. The demo IS the post -- the selftext is empty.
- Why it matters: This is the ONLY archetype that breaks 5,000. If your ceiling is "viral," this is your format. But note: none of these posts are from 2024-2026. The era of pure-demo virality is over. The community has shifted toward drama and meta.
Archetype 2: The Meta-Drama / Community Soul-Search (score ceiling: 3,924)
- Score range: 1,300-3,924
- Examples:
  - [D] The machine learning community has a toxicity problem (3,924, 566 comments)
  - [D] Our community must get serious about opposing OpenAI (3,060, 448 comments)
  - [D] A Super Harsh Guide to Machine Learning (2,683, 311 comments)
  - [D] Siraj has a new paper: 'The Neural Qubit'. It's plagiarised (2,574, 451 comments)
  - [D] Got burned by an Apple ICLR paper (1,609) -- the top-scoring post of 2025
- The pattern: A long-form TEXT post (500-2000 words) written in the first person, making a principled argument about the state of the field. Structured with numbered grievances or bold headers. Names names. Provides receipts (links, screenshots, openreview IDs). Comments section explodes with C/U ratios above 0.15.
- Why it matters: This is the ONLY archetype still reliably breaking 1,500 in the post-2024 era. If you want current-day visibility, this is the play -- but you must have real evidence, not just vibes. The Apple ICLR whistleblower post worked because the author documented the bug in GitHub issues first.
Archetype 3: The Tool From A Known Contributor (score ceiling: 2,904)
- Score range: 200-2,904
- Examples:
  - [P] CLI tool that explains errors using ChatGPT (2,904) -- jsonathan
  - [P] I built a chatbot that lets you talk to any Github repository (1,699) -- jsonathan
  - [P] AppleNeuralHash2ONNX: Reverse-Engineered Apple NeuralHash (1,743) -- first reverse engineering
  - [P] OpenAssistant: World's largest open-source ChatGPT replication (1,276) -- ykilcher
  - [P] OpenEvolve: Open Source AlphaEvolve (216) -- asankhs
- The pattern: Shipped open-source code (GitHub link in comments, not title -- the mod-safe way). Either (a) a reproduction of something famous, (b) a reverse-engineering of something proprietary, or (c) a minimal dev tool. jsonathan has 6 top-100 posts with an avg of 1,548 -- the canonical "known tool builder" profile.
- Why it matters: You won't break 2,000 as a first-time poster, but repeat contributors build reputation, and once you have it your tool posts get a visible uplift. Note that none of these include SaaS, pricing, or "sign up for waitlist" language. They are all self-hostable/MIT-licensed.
Archetype 4: The "Types of ML Papers" Infographic Meme (score ceiling: 4,728)
- Score range: 2,413-4,728
- Examples:
  - [D] Types of Machine Learning Papers (4,728) -- TheInsaneApp
  - [D] Types of Machine Learning Papers (2,650) -- repost by another user
  - [D] Types of Machine Learning Papers (2,413) -- yet another repost
  - [D] This AI reveals politicians staring at phones (4,924)
  - [D] Convolution Neural Network Visualization (Unity 3D) (3,457)
- The pattern: A single-panel image (i.redd.it) that is either (a) a self-aware meme about ML research culture or (b) a polished visualization of an ML concept. Comments read "saving this" and "finally someone said it." Exactly the same "Types of ML Papers" image has been reposted 3 times and scored 4,728 / 2,650 / 2,413 respectively -- the community karma-farms its own memes.
- Why it matters: Fastest path to 2,000+ if you have design skills. But know that the community sees through reposts at this point. Original meme content has a much higher ceiling.
Archetype 5: The Credible Whistleblower (score ceiling: 1,609)
- Score range: 200-1,609
- Examples:
  - [D] Got burned by an Apple ICLR paper (1,609, 104 comments)
  - [D] Published paper uses hardcoded seed and collapsed model (288, 65 comments)
  - [D] Tsinghua ICLR paper withdrawn due to AI citations (363, 66 comments)
  - [D] ICML: every paper in my review batch contains prompt-injection text (457, 90 comments)
  - [D] 100 Hallucinated Citations Found in 51 NeurIPS 2025 Papers (397, 79 comments)
- The pattern: First-person narrative: "I tried to reproduce X, found bug Y, here's the GitHub issue, here's the OpenReview link." Evidence-rich, not ranty. Always names the paper. Often triggers a secondary news cycle (paper withdrawal, author response).
- Why it matters: The single most respected archetype post-2024. If you're a researcher or reviewer, this is your highest-leverage play. The Apple ICLR post is THE top post of 2025 -- showing that the community's attention has fully rotated from demos to accountability.
Archetype 6: The Serious Technical Deep-Dive Post (score ceiling: ~450)
- Score range: 50-454
- Examples:
  - [R] LLMs are Locally Linear Mappings (244, 45 comments) -- jamesvoltage, detached Jacobians
  - [R][P] We compress any BF16 model to ~70% size LOSSLESS (200, 27 comments) -- DF11
  - [N] cuML zero code change (scikit-learn on GPU) (454, 25 comments)
  - [R] The Resurrection of the ReLU (235, 63 comments) -- SUGAR surrogate gradients
  - [R] Analysis of 350+ ML competitions in 2025 (221) -- mlcontests.com
- The pattern: A detailed technical post (500+ words) describing a novel method, a detailed benchmark, or a carefully-documented practical finding. Always includes arXiv link, GitHub repo, and concrete numbers. The author engages heavily in comments, answering technical questions.
- Why it matters: Your ceiling as an independent researcher is ~400-500. Do not expect viral. Expect a small-but-surgical audience that reads your paper and cites it. The distribution value is reputation-building, not visibility.
6. Format Analysis
| Format | Top 25 | Top 50 | Top 100 | All | % Top 25 | % Top 100 | % All |
|---|---|---|---|---|---|---|---|
| VIDEO | 13 | 27 | 50 | 50 | 52% | 50% | 15% |
| IMAGE | 7 | 13 | 22 | 39 | 28% | 22% | 12% |
| TEXT | 4 | 6 | 21 | 207 | 16% | 21% | 63% |
| GALLERY | 1 | 2 | 2 | 14 | 4% | 2% | 4% |
| LINK | 0 | 2 | 5 | 19 | 0% | 5% | 6% |
| GIF | 0 | 0 | 0 | 0 | 0% | 0% | 0% |
The visual cliff is stark: 84% of the top 25 and 74% of the top 100 are visual (VIDEO+IMAGE+GALLERY), yet only 31% of the full dataset is. All 50 VIDEO posts in the dataset are from 2020-2023; the 2024-2026 period is almost entirely TEXT -- the community has stopped upvoting demo videos at scale. If you post a video today, your ceiling is probably ~500, not 5,000.
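The visual-share percentages reduce to a one-pass calculation. A sketch assuming each post record carries hypothetical `format` and `score` fields:

```python
VISUAL = {"VIDEO", "IMAGE", "GALLERY"}

def visual_share(posts, top_n=None):
    """Fraction of posts (optionally only the top-N by score) in a visual format."""
    ranked = sorted(posts, key=lambda p: p["score"], reverse=True)
    pool = ranked[:top_n] if top_n else ranked
    return sum(p["format"] in VISUAL for p in pool) / len(pool)

# With the full dataset loaded:
# visual_share(posts, top_n=25)   # ~0.84 per the table above
# visual_share(posts)             # ~0.31
```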
What format to use for what:
- Novel ML result / paper → TEXT (with arXiv link + GitHub in body, never the title). Discussion flair if you want engagement, Research if you want to be taken seriously. Video was the winning format 2020-2023 but has been overtaken by detailed text.
- Reproducibility drama / whistleblower → TEXT. Must be first-person, must have receipts, must link OpenReview/GitHub.
- Tool / library launch → TEXT with flair [P]. Link in comments, not title. Visual preview is optional but helps; don't lead with it. Include benchmarks table.
- Conference review rant → TEXT. Short is fine. Let comments do the work.
- Infographic / meme → IMAGE. Only works if actually original and designed.
- Paper summary / news → TEXT or LINK with [N] flair. LINK is safer than you'd think for News -- ratio averages 0.96.
What makes a good (historical) demo video: Looking at the 2020-2023 video hits:
- Length: 10-30 seconds. Almost every top video post is short. No intros, no outros.
- Hosted on v.redd.it, not YouTube. YouTube links consistently underperform; Reddit's native player inflates engagement.
- Show the transformation, not the training. "Photo → animation" not "loss curve → accuracy plot." The payload is in the final frame contrast.
- No audio narration. Silent demos outperform narrated ones. Music is optional.
- Caption the capability in the title, not the method. "Turn old photos into moving images" > "Applied Motion Model with Latent Warping."
Gallery format: Used in only 14 posts, typically 4-10 images showing before/after comparisons, training curves, or a multi-pane example grid. Best gallery post: [P] I'm using Instruct GPT to show anti-clickbait summaries (2,794).
7. Flair/Category Strategy
Raw performance ranking:
- Project (avg 1,061) -- highest ceiling (8,544), reliable for visual demos
- News (avg 926) -- highest ratio (0.96), low friction, factual
- Research (avg 758) -- reliable mid-performer if you have an actual paper
- Discussion (avg 541) -- highest volume, lowest avg, high variance
Distribution utility ranking (for someone trying to get reach):
- Discussion -- Despite its low avg, this is your best vehicle if you have a credible grievance or meta-observation. 8 of the 25 biggest current-era posts are [D]. The C/U ratio is 0.59 (highest of all flairs) -- meaning Discussion posts generate real conversations.
- Project -- Best for shipping code or a demo. Avg C/U is 0.17 (passive upvotes, not discussion).
- Research -- Best for establishing credibility. Avg C/U is 0.38 (moderate engagement).
- News -- Best if you have a factual announcement and want safety (high ratio).
Title-prefix tag conventions:
- [R] Research -- you have a paper, method, or result
- [D] Discussion -- opinion, question, meta-commentary (do NOT use for tool launches)
- [P] Project -- code, demo, tool (always include GitHub/repo link in body)
- [N] News -- announcement, external event, release
- [Project] is also acceptable (long-form spelling, slightly archaic)
- Combos: [R][P], [D][R], [N][P] are fine. Mods are more tolerant than strict.
Ironic flair use: The "Discusssion" typo appears on two sticky mod posts (blackout, reddit API). Some users tag career rants as [R] -- those get removed. Tag-mismatch is a fast removal trigger.
Pricing / commercial language hierarchy (most to least community-friendly):
- Free, open-source, MIT/Apache -- fully welcomed. The only safe category.
- Open-weights models with self-hosted inference -- welcome if accompanied by real eval numbers.
- Freemium with API -- tolerated if the paper/method is the focus and the API is secondary.
- "Try our demo" links -- acceptable only if no login required and the method is novel.
- Paid products / subscriptions -- dead on arrival. Will be removed.
- "Sign up for waitlist" / "join our Discord" -- perma-ban territory.
8. Title Engineering
Deconstructing the top 10:
- "[Project] From books to presentations in 10s with AR + ML" (8,544) -- Specific time ("10s"), concrete input→output ("books → presentations"), specific tech stack ("AR + ML"). You can visualize the demo before clicking.
- "[D] A Demo from 1993 of 32-year-old Yann LeCun showing off the World's first Convolutional Network for Text Recognition" (6,378) -- Celebrity name + historic framing + "World's first." Leans on tribal reverence.
- "[D] This AI reveals how much time politicians stare at their phone at work" (4,924) -- Political lightning rod + "AI reveals" framing. This is tabloid-coded but worked because of the target.
- "[R] First Order Motion Model applied to animate paintings" (4,896) -- Concrete method name + surprising application. The "applied to paintings" twist creates the upvote trigger.
- "[N] AI can turn old photos into moving Images" (4,807) -- Nostalgia + capability demonstration. Zero jargon.
- "[D] Types of Machine Learning Papers" (4,728) -- Meta-humor. Title tells you exactly what the image contains.
- "I made a robot that punishes me if it detects that if I am procrastinating on my assignments [P]" (4,272) -- First-person confessional + absurd outcome. Humor wrapped in ML.
- "[D] The machine learning community has a toxicity problem" (3,924) -- Grievance framing + specific community target. Provokes immediate agree/disagree split.
- "[Project] NEW PYTHON PACKAGE: Sync GAN Art to Music with 'Lucid Sonic Dreams'! (Link in Comments)" (3,725) -- ALL CAPS + named deliverable + "(Link in comments)" (mod-safe pattern).
- "[P] Using oil portraits and First Order Model to bring the paintings back to life" (3,510) -- Method + evocative outcome ("bring paintings back to life").
Title formulas that work:
Formula 1: The Surprising Application
- "[R] First Order Motion Model applied to animate paintings" (4,896)
- "[R] Wolfenstein and Doom Guy upscaled into realistic faces with PULSE" (2,801)
- "[R] Speech-to-speech translation for a real-world unwritten language" (3,079)
- Template:
[Tag] [Known method] applied to [unexpected domain]
Formula 2: The Grievance Manifesto
- "[D] The machine learning community has a toxicity problem" (3,924)
- "[D] I don't really trust papers out of 'Top Labs' anymore" (1,699)
- "[D] The current and future state of AI/ML is shockingly demoralizing" (1,492)
- "[D] Why can't you guys comment your fucking code?" (1,658)
- Template:
[D] [Systemic complaint] [about the field]
Formula 3: The Whistleblower Reveal
- "[D] Got burned by an Apple ICLR paper — it was withdrawn after my Public Comment" (1,609)
- "[D] Siraj has a new paper: 'The Neural Qubit'. It's plagiarised" (2,574)
- "[D] ICML: every paper in my review batch contains prompt-injection text" (457)
- Template:
[D] [Specific paper/actor] [specific wrongdoing]
Formula 4: The First-Person Build Confessional
- "I made a robot that punishes me if it detects that if I am procrastinating on my assignments [P]" (4,272)
- "[P] I trained a GAN to generate photorealistic fake penises" (2,362)
- "[P] I built a chatbot that lets you talk to any Github repository" (1,699)
- Template:
[P] I [built/trained/made] [absurd but legitimate artifact]
Formula 5: The "Link in Comments" Handshake
- "[N] AI can turn old photos into moving Images / Link is given in the comments" (4,807)
- "[R] WHIRL algorithm... (link in comments)" (1,750)
- "[P] Pose Animator: SVG animation tool... (links in comments)" (1,674)
- Template:
[Tag] [Description] (link in comments) -- the mod-safe way to share external links
Formula 6: The Known-Method-With-Numbers
- "[R] RIFE: 15FPS to 60FPS Video frame interpolation" (2,809)
- "[R] SIMPLERECON — 3D Reconstruction — 73ms per frame!" (1,421)
- "[R][P] We compress any BF16 model to ~70% size LOSSLESS" (200)
- Template:
[R] [Method name]: [specific metric improvement]
Title anti-patterns (community-specific):
- No raw arXiv dumps: Rule 5 explicitly bans this. [R] TriAttention: Efficient KV Cache Compression (10 score, 0.78 ratio) died because the post was a thin wrapper around a link.
- No "How to learn ML in 2026": the community ships these to r/learnmachinelearning. Any "roadmap" post gets removed or buried.
- No vague questions: [D] Best websites for pytorch/numpy interviews (8 score, 0.66 ratio). The community tells you to google it.
- No hype adjectives without substance: [R] The Gamechanger of Performer Attention Mechanism (241 score but 0.85 ratio). "Gamechanger" triggered the bullshit detector.
- No vendor branding in titles: [P] My DC-GAN works better then ever! (292, but 0.94 ratio -- barely). Posts that sound like commercial launches get immediate skepticism.
- No "I used ChatGPT to..." content: [P] I Gave Claude Code 9.5 Years of Health Data (230, 0.89 -- the friction is visible). The community sees "I used Claude" and assumes low effort.
- No benchmark numbers without methodology: [R] 94.42% on BANKING77 Official Test Split (0 score, 0.25 ratio). The community suspects test leakage.
9. Engagement Patterns
C/U ratios by flair:
| Flair | Avg C/U | Interpretation |
|---|---|---|
| Discussion | 0.59 | Discussion-generating -- people reply to each other, not just vote |
| Research | 0.38 | Moderate discussion -- researchers ask technical questions |
| Discusssion (typo) | 0.34 | Behaves like Discussion (only 2 sticky posts) |
| Project | 0.17 | Passive upvoting -- people star the repo, move on |
| News | 0.16 | Passive -- "cool, saved" |
| (no flair) | 0.11 | Very passive |
| Shameless Self Promo | 0.03 | Almost no engagement |
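The C/U figures are just comments divided by score, averaged per flair. A sketch over hypothetical post dicts with `flair`, `score`, and `num_comments` keys:

```python
from collections import defaultdict

def cu_by_flair(posts):
    """Average comments-per-upvote (C/U) for each flair."""
    buckets = defaultdict(list)
    for p in posts:
        if p["score"] > 0:  # skip zero-score posts to avoid division by zero
            buckets[p["flair"]].append(p["num_comments"] / p["score"])
    return {flair: sum(r) / len(r) for flair, r in buckets.items()}

posts = [
    {"flair": "Discussion", "score": 201, "num_comments": 1008},  # review mega-thread
    {"flair": "Project", "score": 2904, "num_comments": 112},     # tool demo
]
# cu_by_flair(posts) -> Discussion ~5.0, Project ~0.04
```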
If your goal is VISIBILITY (broadest reach): Post a [Project] with a VIDEO (historical ceiling) or a [D] meta-complaint (current-era ceiling). Both can reach 2,000+ upvotes in the right conditions.
If your goal is RELATIONSHIPS and discussion: Post a [D] long-form grievance or a [R] with a paper and open questions. These are where DMs come from; these are where collaborators reach out.
If your goal is REPUTATION-BUILDING for long-term distribution: Post technical [R] and [P] content consistently. Build a known username like jsonathan (6 top-100 posts, avg 1,548) or Illustrious_Row_9971 (12 top-100 posts, avg 1,954). After ~5 quality posts, your ceiling rises.
Highest-discussion topics (most comments relative to score):
- Conference review mega-threads: [D] NeurIPS 2025 Decisions (201 / 1,008 comments, C/U=5.0), [D] ICLR 2026 Paper Reviews (188 / 839, C/U=4.5), [D] NeurIPS 2025 Reviews (237 / 912, C/U=3.8), [D] ICML 2026 Review Discussion (125 / 633, C/U=5.1). These are seasonal grief-processing threads that generate low scores but enormous comment volumes.
- "OpenAI hate" threads: [D] Does anybody else despise OpenAI? (1,538 / 429, C/U=0.28). Controversial but generates huge debate.
- Career pity: [D] Ph.D. from top Europe university, 10 NeurIPS papers, 0 Big Tech interviews (472 / 154, C/U=0.33).
- Whistleblower threads: every "I found a fraud" post generates 60-200 comments even at moderate scores.
- Meta-critique threads: [D] Am I the only one noticing a drop in quality for this sub? (226 / 79, C/U=0.35).
10. What Gets Downvoted
Notable low-ratio posts (< 0.85):
| Title | Score | Ratio |
|---|---|---|
| [D] Has industry effectively killed off academic ML research in 2026? | 173 | 0.82 |
| [D] Has "AI research lab" become completely meaningless as a term? | 73 | 0.80 |
| [D] How do ML engineers view vibe coding? | 53 | 0.83 |
| [P] Vibecoded on a home PC: neural chess engine | 81 | 0.72 |
| [R] The Gamechanger of Performer Attention Mechanism | 241 | 0.85 |
| [R] Controlled experiment: LLM agent with CS papers for HP search | 51 | 0.82 |
| [D] Why I abandoned YOLO for safety-critical plant ID | 37 | 0.74 |
| [P] Weight Norm Clipping Accelerates Grokking 18-66× | 62 | 0.84 |
| [D] - 1M tokens/second serving Qwen 3.5 27B on B200 GPUs | 22 | 0.77 |
Ratio tiers:
- Above 0.94 (safe): 176 posts (54%). Universally well-received -- technical content with evidence, factual news, genuine research.
- 0.85-0.94 (friction): 100 posts (30%). Net positive but with visible pushback. Meta-complaints about the field, controversial opinions, posts that name-check LLMs without being about LLMs.
- Below 0.85 (controversial/hostile): 53 posts (16%). The bottom of the barrel -- failed launches, vague questions, suspected slop, LLM-flavored posts, and career complaints that come across as entitled.
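The three tiers amount to a trivial classifier over Reddit's upvote ratio; the boundary handling (0.94 and 0.85 both falling in the friction band) is my reading of the ranges above:

```python
def ratio_tier(upvote_ratio):
    """Bucket a post by upvote ratio into the three tiers above."""
    if upvote_ratio > 0.94:
        return "safe"
    if upvote_ratio >= 0.85:
        return "friction"
    return "controversial"

# ratio_tier(0.99) -> "safe"; ratio_tier(0.72) -> "controversial"
```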
Anti-patterns (community-specific):
- The "Vibecoded" Tell -- any post that uses the word "vibecoded" or admits "I asked ChatGPT to..." takes a visible ratio hit. [P] Vibecoded on a home PC: building a ~2700 Elo browser-playable neural chess engine (81 score, 0.72 ratio) is the highest-friction post in the current-era dataset. The community interprets "vibecoded" as "I don't understand my own code." Avoid the word.
- The Hype Adjective Trigger -- "gamechanger," "revolutionary," "state of the art," "breakthrough" in the title without extraordinary evidence. The community has seen hundreds of these and is immunized. [R] The Gamechanger of Performer Attention Mechanism (241, 0.85) is textbook.
- The Unverified Benchmark Post -- a benchmark number without methodology, code, or reproducibility is treated as suspected fraud. [R] 94.42% on BANKING77 Official Test Split (0, 0.25): the community assumes test-set leakage and downvotes on sight.
- The "I'm Tired" Career Rant Without Specifics -- career complaints without technical substance get cold-shouldered. [D] ICML 2026 am I cooked? (21, 0.78), [D] Rejected a Solid Offer Waiting for My 'Dream Job' (197, 0.86) -- even the successful ones show friction.
- The LLM Slop Suspicion -- posts that read as if written by ChatGPT (overly structured, bullet-heavy, emoji-laden "🚀", generic phrasing) trigger mass downvoting regardless of actual origin. The community has developed a hair-trigger for this. The rare exception: [N] cuML zero code change 🚀 still got 454 (0.99), but only because NVIDIA is a known entity.
- The Self-Promo Waitlist -- any post that ends with "sign up for the waitlist" or "follow us for updates" is dead on arrival. The dataset has zero successful examples.
- The "Is ACL / NeurIPS / venue X a scam" Meta-Complaint -- [D] NLP conferences look like a scam.. (267, 0.92), [D] Bad Industry research gets cited (266, 0.94). These work at a modest level but rarely break 500 because the community sees them as repetitive grievance-posting.
No formal blacklist, but informal memory is long:
There is no published hall-of-shame, but moderation is aggressive and community memory is long. Siraj Raval drama persisted for months (1,285 / 1,357 / 2,574 scores across multiple posts). Usernames that push self-promo get called out in comments. The best defense is to post as a person with a visible history (GitHub, prior comments, arXiv), not as a throwaway.
11. The Distribution Playbook
Phase 1: Pre-launch (build presence)
- Read the rules. Seriously. Rule 2 (self-promo) and Rule 3 (marketing) will get you banned. Rule 5 (no bare arXiv) will get you removed.
- Lurk for 2 weeks. Read the 2025-2026 top 100 posts, not the 2020 greatest hits. The community has changed.
- Comment on 10+ technical threads. Answer questions, cite papers, demonstrate depth. The community can check your username history and this matters.
- Ensure your artifact is shippable. GitHub repo with a README. Runnable code. Reproducible results. Benchmark numbers against a real baseline.
- Do NOT post a tool that's behind a signup form. Kill the login wall first, or don't post.
Phase 2: Launch day
- Choose the right flair. Default to [P] for tools, [R] for papers, [D] for opinion, [N] for announcements. When in doubt, [P] + [R] combo is safe.
- Write the title with Formula 1, 4, or 6 (Surprising Application / First-Person Build / Method-with-Numbers). Avoid hype adjectives.
- Post body structure:
- 1 paragraph: what it is and why it matters
- 1 section: how it works (method, with real technical detail)
- 1 section: results with a table
- 1 section: honest limitations ("batch size 1 latency is ~40% slower")
- Links: GitHub (always), paper (if research), demo (optional)
- No marketing language. No "revolutionary." No "state of the art" unless you have the numbers. No "sign up."
- Timing: The dataset doesn't show strong timing effects, but posts from 12:00-17:00 UTC on Tuesdays-Thursdays dominate the top 100. Weekends are slower.
- Length: Text posts that work are 300-1500 words. Shorter gets dismissed as low-effort; longer gets TL;DR'd.
Phase 3: First 24-48 hours
- Engage every technical question in the first 4 hours. This is the single highest-leverage activity. Comment C/U ratios are strongest for posts where the OP is visibly present.
- Acknowledge limitations. When someone says "what about X edge case," reply with a specific honest answer. The community rewards humility and punishes defensiveness.
- Pre-emptively answer the "Is this just LLM slop?" question. If your post is technical, lead with a concrete method section. If your post is opinion, lead with personal first-person context.
- Watch the ratio at the 4-hour mark. If ratio < 0.90, the post is in trouble -- expect mod attention. If ratio > 0.95, you're safe to engage and expand.
- Don't cross-post. The community notices. Post once, engage there, leave it alone.
Phase 4: Ongoing presence
- Follow up with progress posts every 4-8 weeks. Don't do weekly "update" spam.
- Participate in other people's threads. The community's credibility graph is author-based -- if you're only present in your own posts, you're seen as a self-promoter.
- Build a second-post reputation. Your first [P] might score 200; your fifth, with a now-known GitHub handle, can score 1,500+ for the same category of work.
- Avoid the LLM content trap. Do not post anything that looks like a ChatGPT summary. If you use LLM assistance for writing, heavily edit to remove the telltale patterns (bullet structure, emoji, "let's dive in," "in conclusion").
Community-specific comment strategy templates:
- "Is this vibe-coded?" → "No. The core [component] is hand-written PyTorch/JAX. I did use Claude for [specific task like docstrings or test generation]. Here's the commit history showing the development: [link]."
- "Where's the code?" → "GitHub link is in the post body: [link]. Reproducibility script is in /examples/benchmark.py."
- "Doesn't [X existing tool] already do this?" → "Yes, and [X] is great. The specific difference is [concrete technical delta]. Here's a comparison table: [table]. If [X] works for you, use it."
- "Did you test on [hard benchmark]?" → "Not yet -- my compute budget was ~$[X] on [GPU]. I'd love to see results on [benchmark] if anyone has compute. Here's how to reproduce: [link]."
- "Is this peer-reviewed?" → "No, this is an open-source [tool/method] release. I'm planning to submit to [venue] once I have [additional experiments]."
Stealth distribution tactics:
- Answer technical questions on other people's threads with "I ran into this recently while building [your tool] -- found that [technical insight]." Link only if explicitly asked. This builds name recognition without triggering self-promo filters.
- Participate in conference mega-threads (NeurIPS reviews, ICLR discussion). These have 500-1000+ comments and are where repeat names get noticed. Mention your work only when directly relevant.
- Be a helpful citation -- when someone asks "has anyone implemented X?" and you have, link your repo. This is accepted distribution.
- Post methodology critiques of bad papers. If done with receipts, these go viral (see Archetype 5). Your name gets associated with rigor.
Score-tier calibration:
- Tool launches in the post-2024 era: realistic ceiling is 500-1,500. 2,000+ requires a known username or a genuinely novel result. Do not expect 5,000.
- Research papers: 200-800 is typical. 1,500+ only for whistleblower angles or major lab announcements.
- Discussion / meta posts: 200-1,000 is typical. 1,500+ requires a credible grievance with receipts.
- Visual demos: Ceiling has collapsed. Historical 5,000+ videos from 2020-2023 would likely score 500-1,500 today.
Post-publication measurement:
- First 1 hour, < 50 upvotes: Normal for technical posts. Don't panic.
- First 4 hours, ratio < 0.88: The post has friction. Review the comments -- address pushback directly.
- First 4 hours, ratio < 0.80: The post is actively disliked. Usually means hype adjectives, missing code link, or slop suspicion. Sometimes salvageable with edits; often not.
- First 12 hours, < 100 upvotes on [P] or [R]: Post is dead. The ML news cycle moves fast; it will not recover.
- First 24 hours, 300+ upvotes and ratio > 0.94: Strong hit. Engage every comment over the next 48 hours. This is when reputation is built.
- 500+ upvotes: The comment section becomes the most valuable asset -- likely collaborators and critics worth responding to.
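The measurement milestones above can be folded into one diagnostic function. This is a hypothetical sketch of the guide's thresholds, not real monitoring tooling; the verdict strings are illustrative.

```python
def post_health(hours: float, score: int, ratio: float,
                flair: str = "[P]") -> str:
    """Map elapsed hours, score, and upvote ratio to the verdicts
    described in the measurement milestones above."""
    if hours <= 1:
        return "too early -- < 50 upvotes is normal, don't panic"
    if hours <= 4:
        if ratio < 0.80:
            return "actively disliked -- check for hype adjectives, missing code, slop tells"
        if ratio < 0.88:
            return "friction -- address pushback in comments directly"
        return "on track"
    if hours <= 12:
        if flair in ("[P]", "[R]") and score < 100:
            return "dead -- the ML news cycle has moved on"
        return "on track"
    if score >= 300 and ratio > 0.94:
        return "strong hit -- engage every comment for the next 48 hours"
    return "on track"
```

For instance, a [P] post sitting at 80 upvotes after 10 hours is dead by these thresholds, while 350 upvotes at a 0.96 ratio after a day is a strong hit.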
12. Applying This to Any Project
Pre-launch checklist (12 items):
- Artifact is on GitHub, public, with a working README
- No login wall, no waitlist, no signup form on any demo link
- Benchmark numbers are real and reproducible (provide the script)
- Title uses one of the 6 formulas; no hype adjectives
- Title has a proper bracket tag: [R], [D], [P], or [N]
- Post body is 300-1500 words with a limitations section
- Post is NOT a bare arXiv link (Rule 5 violation)
- Code link is in the body, not the title
- You have a visible comment history in the sub (not a throwaway)
- No "vibe coded," "revolutionary," "game changer" language
- No emoji in title, minimal emoji in body
- You have 4 hours of availability after posting to respond to comments
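Several checklist items are mechanically checkable before you post. A minimal title linter might look like the sketch below; the word list and the emoji heuristic are illustrative assumptions, not an exhaustive ruleset.

```python
import re

# Red-flag vocabulary from the checklist above (illustrative, not exhaustive)
HYPE_WORDS = {"revolutionary", "gamechanger", "game changer", "game-changer",
              "breakthrough", "vibecoded", "vibe coded", "vibe-coded"}
TAG_RE = re.compile(r"^\[(R|D|P|N)\]")

def lint_title(title: str) -> list[str]:
    """Flag mechanically checkable checklist violations in a post title."""
    problems = []
    if not TAG_RE.match(title):
        problems.append("missing [R]/[D]/[P]/[N] bracket tag")
    lowered = title.lower()
    for word in HYPE_WORDS:
        if word in lowered:
            problems.append(f"hype/red-flag word: {word!r}")
    # Crude emoji check: flags code points above the symbol range
    if any(ord(ch) > 0x2600 for ch in title):
        problems.append("emoji in title")
    return problems
```

A clean title like "[P] TurboTok: 2x faster BPE tokenization" (hypothetical) returns no problems; "[R] The Gamechanger of Performer Attention Mechanism" gets flagged, matching its 0.85 ratio in the dataset.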
Scenario-based launch guides:
Scenario A: You built a free, open-source tool
- Formula: [P] I built [specific tool name] for [specific ML task] or [P] [Tool name]: [specific capability with number]
- Flair: [P]
- Body structure: What it is (1 para) → method/architecture (1 section) → benchmark table → comparison to existing tools → GitHub link → honest limitations
- Key risk: Comments section dissecting your baseline comparison. Mitigation: pre-empt with "I compared against [X, Y, Z] using [exact protocol]; here's the reproducibility script."
Scenario B: You have a research paper to share
- Formula: [R] [Method name]: [surprising result] or [R] [Known method] applied to [unexpected domain]
- Flair: [R]
- Body structure: TL;DR (1 para) → the contribution (1 para) → key experimental results (table) → arXiv link → GitHub link → what you'd like feedback on
- Key risk: Reviewers will call out any missing ablations, unfair comparisons, or unreleased code. Mitigation: release the code first. Then post.
Scenario C: You found a bug/fraud in a published paper
- Formula: [D] [Specific paper/venue] [specific problem I found]
- Flair: [D]
- Body structure: Backstory (1 para) → what I tried to reproduce → what I found → the evidence (screenshots, GitHub issues, OpenReview links) → author response if any → call to action
- Key risk: Authors or their supporters will counter-attack. Mitigation: file a GitHub issue FIRST, let them have a chance to respond, then post. The post lands harder when you can show "I gave them 6 days to respond."
Scenario D: You want to discuss a meta-issue about the field
- Formula: [D] [Systemic observation or grievance]
- Flair: [D]
- Body structure: The observation → specific examples → why it matters → what you'd like to discuss
- Key risk: Gets dismissed as "another conference rant." Mitigation: bring specific data (numbers of submissions, actual review scores, dated examples). The difference between 500 and 1,500 is whether you have receipts.
Scenario E: Your project was built with AI assistance
- Formula: Lead with the technical contribution, not the tooling. NEVER use the word "vibecoded."
- Flair: [P]
- Body structure: Same as Scenario A, but with an explicit "How I used LLMs in development" paragraph that is honest about what LLMs wrote vs. what you wrote. The community rewards transparency; it punishes concealment.
- Key risk: The 0.72 ratio of the "Vibecoded neural chess engine" post. Mitigation: frame as "I wrote the training loop and novel components; I used Claude Code for [specific tasks]."
Cross-posting guidance (reframing for other subs):
- On r/MachineLearning: Lead with method, benchmarks, and reproducibility.
- On r/LocalLLaMA: Lead with model size, quantization, and inference speed on consumer hardware.
- On r/learnmachinelearning: Lead with the learning journey and educational value; frame as "I learned X by building Y."
- On r/programming: Lead with the software engineering aspects -- architecture, language, pipeline.
- On r/OpenAI or r/ClaudeAI: Lead with the AI tool integration and prompt engineering learnings.
- On r/sideproject or r/buildinpublic: Lead with the product story -- user pain, iteration, shipping.
Critical distinction: r/MachineLearning's reader is not r/sideproject's reader. If you're posting the same tool to both, the r/MachineLearning version must strip all product-marketing language, emphasize technical novelty over utility, and include benchmark numbers that would never appear in a r/sideproject post. Do NOT cross-post literally; re-author each version.
Final calibration: If you are a first-time poster launching an open-source ML tool in 2026, expect a realistic ceiling of 300-800 upvotes. If you're a known contributor with prior posts, 1,000-2,000 is possible. If you have a credible whistleblower story about a top-lab paper, 1,500+ is achievable. If you're hoping for 5,000, you need to either (a) time-travel to 2021 with a GAN demo video or (b) have something genuinely unprecedented. The community has become quieter, more skeptical, and more demanding -- but its attention, when earned, is still the most valuable in all of ML Reddit.