Hot data science related posts every hour. Chat: https://telegram.me/r_channels Contacts: @lgyanf
Trying to build computer vision to track ultimate frisbee players… what tools should I use?
https://redd.it/1k0bi9b
@datascientology
I built a biomedical GNN + LLM pipeline (XplainMD) for explainable multi-link prediction
https://redd.it/1jvuz3b
@datascientology
ML Data Linguist Interview - Coding
Hello all, first post here. I'm having a second set of interviews next week for an Amazon ML Data Linguist position after having a successful first phone interview last week. I'll start right away with the problem: I do not know how to code. I made that very clear in the first phone interview but I was still passed on to this next set of interviews, so I must have done/said something right. Anyway, I've done research into how these interviews typically go, and how much knowledge of each section one should have to prepare for these interviews, but I'm just psyching myself out and not feeling very prepared at all.
My question in its simplest form would be: is it possible to get this position with my lack of coding knowledge/skills?
I figured this subreddit would be filled with people with that expertise and wanted to ask advice from professionals, some of whom might be employed in the very position I'm applying for. I really value this opportunity in terms of both my career and my life and can only hope it goes well from here on out. Thanks!
/r/LanguageTechnology
https://redd.it/1jpruz4
D Why is table extraction still not solved by modern multimodal models?
There is a lot of hype around multimodal models, such as Qwen 2.5 VL or Omni, GOT, SmolDocling, etc. I would like to know if others have had a similar experience in practice: while they can do impressive things, they still struggle with table extraction in cases that are straightforward for humans.
Attached is a simple example. All I need is a reconstruction of the table as a flat CSV, preserving all empty cells correctly. Which open-source model is able to do that?
https://preview.redd.it/krox7ytlhvre1.png?width=1650&format=png&auto=webp&s=5daa7f68f4acc55f4bdac3b2defa21b9ebfae0d9
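For anyone who wants to test this themselves, here is a minimal prompting sketch along the lines of the Qwen2.5-VL model card (assumes transformers and qwen-vl-utils are installed; the CSV prompt wording is my own, not a known-good recipe):
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# load the 7B instruct variant and its processor
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct", torch_dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")

# ask for a flat CSV, keeping every empty cell as an empty field
messages = [{"role": "user", "content": [
    {"type": "image", "image": "table.png"},
    {"type": "text", "text": "Reconstruct this table as flat CSV. Keep every empty cell as an empty field."}]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
images, videos = process_vision_info(messages)
inputs = processor(text=[text], images=images, videos=videos,
                   padding=True, return_tensors="pt").to(model.device)

# generate, then strip the prompt tokens before decoding
out = model.generate(**inputs, max_new_tokens=1024)
print(processor.batch_decode(out[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0])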
/r/MachineLearning
https://redd.it/1jnjfaq
NVidia DGX Spark preorders open, just preordered!
/r/deeplearning
https://redd.it/1jeic2l
Simple Tool for Annotating Temporal Events in Videos with Custom Categories
Hey guys, I built TAAT (Temporal Action Annotation Toolkit), a web-based tool for annotating time-based events in videos. It’s super simple: upload a video, create custom categories like “Human Actions” with subcategories (e.g., “Run,” “Jump”) or “Soccer Events” (e.g., “Foul,” “Goal”), then add timestamps with details. Exports to JSON, has shortcuts (Space to pause, Enter to annotate), and timeline markers for quick navigation.
Main use cases:
Building datasets for temporal action recognition.
Any project needing custom event labels fast.
It’s Python + Flask, uses Video.js for playback, and it’s free on GitHub here. Thought this might be helpful for anyone working on video understanding.
https://preview.redd.it/yq6t83tjw3oe1.png?width=1366&format=png&auto=webp&s=c10f5593df671b0c93d539bfe6419770157f28de
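For a sense of what an export might look like, a hypothetical sketch (the field names here are illustrative, not the tool's actual schema; check the repo for that):
import json

annotation = {  # hypothetical structure, not TAAT's real schema
    "video": "match_01.mp4",
    "events": [
        {"category": "Soccer Events", "subcategory": "Goal",
         "timestamp": 1284.6, "details": "header from a corner"},
    ],
}
print(json.dumps(annotation, indent=2))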
/r/computervision
https://redd.it/1j8ybrx
Should I fork and maintain YOLOX and keep it Apache License for everyone?
Latest update was 2022... It is now broken on Google Colab... mmdetection is a pain to install and support. I feel like there's an opportunity to make sure we don't have to use freakin' Ultralytics... who are trying to make money on open-source research.
10 YESes and I'll repackage it and keep it up to date...
LMK!
/r/computervision
https://redd.it/1izuh6k
YOLOv12: Algorithm, Inference and Custom Data Training
https://youtu.be/1YZDsZL_VyI
/r/computervision
https://redd.it/1itnedo
RT-DETRv2: Is it possible to use it on Smartphones for realtime Object Detection + Tracking?
Any help or hint appreciated.
For a research project I want to create an app (Android preferred) for realtime object detection and tracking. It is about detecting persons, categorized into adults and children. I need to train on my own dataset.
I know this is possible with Yolo/ultralytics.
However, I have to use open source with an Apache or MIT license only.
I am thinking about using the promising RT-DETR model (small version), but I'm struggling to convert the model into the right format (such as tflite) to be able to use it on a smartphone.
Is this even possible? Couldn't find any project in this context.
Plan B would be using MediaPipe and its pretrained efficient model, fine-tuning it with my custom data.
Open for a completely different approach.
So what do you recommend?
Any roadmaps to follow are appreciated.
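One plausible export path, sketched below under the assumption that the Hugging Face RT-DETR port (RTDetrForObjectDetection) loads your checkpoint; the ONNX→TF→TFLite hop via onnx-tf often needs op-level workarounds in practice, so treat this as a starting point rather than a working recipe:
import onnx
import torch
import tensorflow as tf
from onnx_tf.backend import prepare
from transformers import RTDetrForObjectDetection

# 1) PyTorch -> ONNX (assumes a 640x640 input; adjust to your training size)
model = RTDetrForObjectDetection.from_pretrained("PekingU/rtdetr_r50vd").eval()
dummy = torch.randn(1, 3, 640, 640)
torch.onnx.export(model, dummy, "rtdetr.onnx", opset_version=17, input_names=["images"])

# 2) ONNX -> TensorFlow SavedModel
prepare(onnx.load("rtdetr.onnx")).export_graph("rtdetr_saved_model")

# 3) SavedModel -> TFLite for on-device inference
converter = tf.lite.TFLiteConverter.from_saved_model("rtdetr_saved_model")
open("rtdetr.tflite", "wb").write(converter.convert())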
/r/computervision
https://redd.it/1iqunlw
D What happened to SSMs and linear attentions?
Can someone who is up to date with this area of research summarize the current state of SSMs and softmax-attention alternatives? Are they used in customer-facing models yet, or still in research? Does their promise only appear in benchmarks on paper? Or have hardware accelerators optimized attention so thoroughly that SSMs or linear-attention alternatives only provide marginal gains that don't justify their added complexity?
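For readers new to the topic, the core trick in linear attention is reassociating the attention product; a minimal non-causal sketch in the spirit of Katharopoulos et al. (2020), with shapes (batch, seq, dim):
import torch
import torch.nn.functional as F

def linear_attention(q, k, v):
    # positive feature map replacing softmax
    q, k = F.elu(q) + 1, F.elu(k) + 1
    # reassociate: Q @ (K^T V) costs O(N d^2) instead of (Q K^T) @ V at O(N^2 d)
    kv = torch.einsum("nsd,nse->nde", k, v)
    z = torch.einsum("nsd,nd->ns", q, k.sum(dim=1)) + 1e-6  # per-query normalizer
    return torch.einsum("nsd,nde->nse", q, kv) / z.unsqueeze(-1)
Causal variants replace the sums with prefix sums, which is where the connection to SSM-style recurrences comes from.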
/r/MachineLearning
https://redd.it/1in9y30
D Monthly Who's Hiring and Who wants to be Hired?
For Job Postings please use this template
>Hiring: [Location], Salary: [], [Remote | Relocation], [Full Time | Contract | Part Time] and [Brief overview, what you're looking for]
For Those looking for jobs please use this template
>Want to be Hired: [Location], Salary Expectation: [], [Remote | Relocation], [Full Time | Contract | Part Time], Resume: [Link to resume] and [Brief overview, what you're looking for]
Please remember that this community is geared towards those with experience.
/r/MachineLearning
https://redd.it/1ie5qoh
[D] Which software tools do researchers use to make neural net architectures like this?
/r/MachineLearning
https://redd.it/1ig6k3l
D The scale vs. intelligence trade-off in retrieval augmented generation
Retrieval Augmented Generation (RAG) has been huge in the past year or two as a way to supplement LLMs with knowledge of a particular set of documents or the world in general. I've personally worked with most flavors of RAG quite extensively, and there are some fundamental limitations to the two core approaches (long-context and embeddings) on which almost all flavors of RAG are built. I am planning on writing a longer and more comprehensive piece on this, but I wanted to put some of my thoughts here first to get some feedback and see if there are any perspectives I might be missing.
Long-context models (e.g. Gemini), designed to process extensive amounts of text within a single context window, face a critical bottleneck in the form of training-data scarcity. As context lengths increase, the availability of high-quality training data diminishes rapidly. This matters because of the neural scaling laws, which have been remarkably robust for LLMs so far. There is a great video explaining them here. One important implication is that if you run out of human-generated training data, the reasoning capabilities of your model are bottlenecked no matter how many other resources or tricks you throw at the problem. This paper provides some nice empirical support for this idea: across all of the "long-context" models, reasoning capabilities decrease dramatically as context length increases.
A graph I generated based on one of the main tables in the paper showing how reasoning capabilities degrade as context length increases.
Embeddings-based RAG has much better scalability but suffers from some pretty serious issues with high-level reasoning tasks. Here is a small list from this paper:
https://preview.redd.it/huig4ipulufe1.png?width=967&format=png&auto=webp&s=62743d60ba1c9162c9e1bf5ff6d05af20d577868
The authors also have a nice statement towards the beginning of the paper as to the core reason why:
>This structural limitation is particularly problematic when dealing with documents that require deep understanding and contextual interpretation, such as a complex book.
Often there will not only be an important internal structure to each document, but also an important meta-structure across documents (think of scientific papers that cite specific portions of other scientific papers). There are tricks like using knowledge graphs that try to get around some of these issues, but they can only do so much when the fundamental method shreds any structure the documents might have had before any of the secondary steps even begin.
The scalability limitations of long-context and the reasoning limitations of embeddings lead to an important trade-off for anyone building a RAG system. Long-context models excel in creativity and complex reasoning but are limited to small document sets due to training data constraints. Conversely, embeddings-based approaches can handle vast corpora but function more like enhanced search engines with minimal reasoning abilities. For many tasks this trade-off is fine, as the task already fits well on one side or the other. Many other tasks, however, are simply not achievable with SoTA RAG methods, because they require both large amounts of documents and advanced reasoning over those documents.
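To make the structure-shredding point concrete, here is a minimal embeddings-RAG retriever (a sketch assuming sentence-transformers; the model name and chunk size are arbitrary choices). The very first step, fixed-size chunking, discards sections, citations, and cross-document links before retrieval even starts:
import numpy as np
from sentence_transformers import SentenceTransformer

def chunk(text, size=512):
    # naive fixed-size chunking: any internal or cross-document structure is lost here
    return [text[i:i + size] for i in range(0, len(text), size)]

model = SentenceTransformer("all-MiniLM-L6-v2")
docs = ["first document text ...", "second document text ..."]
chunks = [c for d in docs for c in chunk(d)]
emb = model.encode(chunks, normalize_embeddings=True)

def retrieve(query, k=5):
    # cosine similarity via dot product on normalized vectors
    q = model.encode([query], normalize_embeddings=True)
    return [chunks[i] for i in np.argsort(-(emb @ q.T).ravel())[:k]]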
/r/MachineLearning
https://redd.it/1ick63j
Feb 4 - Best of NeurIPS Virtual Event
[Register for the virtual event.](https://voxel51.com/computer-vision-events/best-of-neurips-feb-4-2025/)
I have added a second date to the Best of NeurIPS virtual series that highlights some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you.
Talks will include:
* [No "Zero-Shot" Without Exponential Data](https://arxiv.org/abs/2404.04125) \- Vishaal Udandarao at University of Tuebingen
* [Understanding Bias in Large-Scale Visual Datasets](https://arxiv.org/abs/2412.01876) \- Boya Zeng at University of Pennsylvania
* [Map It Anywhere: Empowering BEV Map Prediction using Large-scale Public Datasets ](https://arxiv.org/abs/2407.08726)\- Cherie Ho, Omar Alama, and Jiaye Zou at Carnegie Mellon University
/r/computervision
https://redd.it/1i8f9y1
System Design resources for building great CV products
Hi all,
It seems like there are many resources on system design for regular developer roles. However, I'm wondering if there are any good books/resources that can help one get better at designing systems around computer vision. I'm specifically interested in building scalable CV systems that involve DL inference. Please give your inputs.
Also, what is typically asked in a system design interview for CV-based roles? Thanks in advance.
/r/computervision
https://redd.it/1i35ysu
have some unused compute, giving it away for free!
I have 4 A100s, waiting to go brrrr 🔥 ..... I have some unused compute, so if anyone has a passion project and the only hindrance is compute, hmu, let's get you rolling.
just ask yourself these questions first:
- can your experiment show some preliminary signals in, let's say, 100 hours of A100 time?
- is this something new, or a recreation of some known results? (i would prefer the former)
- how is this going to make the world a better place?
i don't expect you to write more than 2 lines for each of them.
/r/deeplearning
https://redd.it/1jypq8n
Interspeech 2025 Author Review Phase (April 4th)
Just a heads-up that the Author Review phase for Interspeech 2025 starts April 4th!!!
Wishing the best to everyone!
Share your experiences or thoughts below — how are your reviews looking? Any surprises?
Let’s support each other through this final stretch!
/r/LanguageTechnology
https://redd.it/1jrh6q8
Part 2: Fork and Maintenance of YOLOX - An Update!
Hi all!
After my post regarding YOLOX: https://www.reddit.com/r/computervision/comments/1izuh6k/should_i_fork_and_maintain_yolox_and_keep_it/ a few folks and I have decided to do it!
Here it is: https://github.com/pixeltable/pixeltable-yolox.
I've already engaged with a couple of people from the previous thread who reached out over DMs. If you'd like to get involved, my DMs are open, and you can directly submit an issue, comment, or start a discussion on the repo.
So far, it contains the following changes to the base YOLOX repo:
* `pip install`able with all versions of Python (3.9+)
* New `YoloxProcessor` class to simplify inference
* Refactored CLI for training and evaluation
* Improved test coverage
The following are planned:
* CI with regular testing and updates
* Typed for use with mypy
This fork will be maintained for the foreseeable future under the Apache-2.0 license.
Install:
pip install pixeltable-yolox
Inference:
import requests
from PIL import Image
from yolox.models import Yolox, YoloxProcessor

# grab a test image
url = "https://raw.githubusercontent.com/pixeltable/pixeltable-yolox/main/tests/data/000000000001.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# load pretrained weights and the matching pre/post-processor
model = Yolox.from_pretrained("yolox_s")
processor = YoloxProcessor("yolox_s")

# preprocess, run the forward pass, and decode detections
tensor = processor([image])
output = model(tensor)
result = processor.postprocess([image], output)
See more in the repo!
/r/computervision
https://redd.it/1jp0o48
Advice on career change
Hi, I’m about to finish my PhD in Linguistics and would like to transition into industry, but I don’t know how realistic it would be with my background.
My Linguistics MA was mostly theoretical. My PhD includes corpus and experimental data, and I’ve learnt to do regression analysis with R to analyse my results. Overall, my background is still pretty formal/theoretical, apart from the data collection and analysis side of it. I also did a 3-month internship in a corpus team, it involved tagging and finding linguistic patterns, but there was no coding involved.
I feel some years ago companies were more interested in hiring linguists (I know linguists who got recruited by Apple or Google), but nowadays it seems you need to come from computer science, machine learning, or data science.
What would you advise me to do if I want to transition into industry after the PhD?
/r/LanguageTechnology
https://redd.it/1jhxxdo
D Self-Promotion Thread
Please post your personal projects, startups, product placements, collaboration needs, blogs etc.
Please mention the payment and pricing requirements for products and services.
Please do not post link shorteners, link aggregator websites, or auto-subscribe links.
--
Any abuse of trust will lead to bans.
Encourage others who create new posts for questions to post here instead!
Thread will stay alive until the next one, so keep posting after the date in the title.
--
Meta: This is an experiment. If the community doesn't like this, we will cancel it. This is to encourage those in the community to promote their work without spamming the main threads.
/r/MachineLearning
https://redd.it/1j1hc0o
I Just Open-Sourced the Viral Squish Effect! (see comments for workflow & details)
/r/deeplearning
https://redd.it/1j7ny3l
Real world applications of 3D Reconstruction and Vision
With the rapid growth of 3D reconstruction and 3D vision technologies, I'm very interested in learning about their practical applications across different industries. What business solutions are currently utilizing these techniques effectively? I'm also curious where you imagine these technologies might lead us in the future.
I'd appreciate hearing about real-world implementation examples, emerging use cases, and speculative future applications.
/r/computervision
https://redd.it/1iy2jcn
I suck at programming and I feel so bad
I failed an introductory programming exam (Python) at university and honestly, it made me feel really stupid and inadequate.
I come from a BA in pure linguistics in Germany and I had taken a programming course on Codecademy last year ( still during my BA), but after that, I hadn’t touched Python at all.
Plus, the course in my MSc was terrible; after covering functions it focused almost entirely on regex, which I had never worked with before.
On top of that, I had a lot of other exams to prepare for, so I barely studied and did very little practice. I do enjoy programming—I’ve gone over the “theory” multiple times—but I struggle to remember concepts and apply critical thinking when trying to solve problems. I lack hands-on experience. If you asked me to write even the simplest program, I wouldn’t know where to start.
I mean, at the exam I couldn't even recall how to reverse a string or how to merge two dictionaries…
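For reference, both of those tasks are one-liners in Python:
s = "hello"
reversed_s = s[::-1]            # "olleh": slicing with step -1 reverses the string
merged = {"a": 1} | {"b": 2}    # {'a': 1, 'b': 2}: dict union operator (Python 3.9+)
merged_old = {**{"a": 1}, **{"b": 2}}  # equivalent on older versions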
I had problems saving a file in Visual Studio Code on a different laptop.
I felt so dumb and not suited for this path.
Meanwhile, most of my colleagues were just great at programming and did fine on the exam.
It feels like I’m just memorizing code rather than truly understanding how to use it.
This whole experience has been pretty discouraging because I know how important programming skills are in this field—especially when there are people with computer science degrees who have been coding since high school.
So now I don’t know where to start. As I said, I’ve read the theory multiple times (how to join dictionaries, what functions are and how they work, etc.), but if you give me a concrete problem to solve, even a very simple one, I don’t know where to start.
That said, I’m currently taking an NLP and ML course at university, which requires basic programming knowledge. So I was thinking of following a hands-on NLP course that also covers regex. That way, I could improve my programming skills while reinforcing what I’m studying now.
Or would it be better to start from the basics of Python again, maybe going through tutorials once more and focusing on practice?
/r/LanguageTechnology
https://redd.it/1isgphw
NAACL 2025 Decision
The wait is almost over, and I can't contain my excitement for the NAACL 2025 final notifications!
Wishing the best of luck to everyone who submitted their work! Let’s hope for some great news!!!!!
/r/LanguageTechnology
https://redd.it/1i6sbwy
Deepseek’s AI model is ‘the best work’ out of China but the hype is ‘exaggerated,’ Google Deepmind CEO says
https://www.cnbc.com/2025/02/09/deepseeks-ai-model-the-best-work-out-of-china-google-deepmind-ceo.html
/r/deeplearning
https://redd.it/1ilw5bu
huawei's ascend 910c chip matches nvidia's h100. there will be 1.4 million of them by december. don't think banned countries and open source can't reach agi first.
recently the world was reminded about sam altman having said "it’s totally hopeless to compete with us on training foundation models." he was obviously trying to scare off the competition. with deepseek r1, his ploy was exposed as just hot air.
you've probably also heard billionaire-owned news companies say that china is at least a few years behind the united states in ai chip development. they say that because of this, china and open source can't reach agi first. well, don't believe that self-serving ploy either.
huawei's 910c reportedly matches nvidia's h100 in performance. having been tested by baidu and bytedance, huawei will make 1.4 million of them in 2025. 910c chips sell for about $28,000 each, based on reports of an order of 70,000 valued at $2 billion. that's about what nvidia charges for its h100s.
why is this such awesome news for ai and for the world? because the many companies in china and dozens of other countries that the us bans from buying nvidia's top chips are no longer at a disadvantage. they, and open source developers, will soon have powerful enough gpus to build top-ranking foundation ai models distilled from r1 at a very low cost that they can afford. and keep in mind that r1 already comes in at number 3 on the chatbot arena leaderboard:
https://lmarena.ai/?leaderboard
if an open source developer gets to agi first, this will of course be much better for the world than if one of the ai giants beats them there. so don't believe anyone who tells you that china, or some other banned country, or open source, can't get to agi first. deepseek r1 has now made that both very possible and very affordable.
/r/deeplearning
https://redd.it/1ihecl0
D Why does the DeepSeek student model (7B parameters) perform slightly better than the teacher model (671B parameters)?
This is the biggest part of the paper that I am not understanding - knowledge distillation to match the original teacher model's distribution makes sense, but how is it beating the original teacher model?
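For context, classic logit-matching distillation looks roughly like the sketch below (per Hinton et al., 2015); note the R1 paper's distilled models are reportedly produced by plain SFT on teacher-generated samples rather than by logit matching:
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, T=2.0):
    # KL divergence between temperature-softened distributions, scaled by T^2
    return F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
A student can beat its teacher on a narrow benchmark when the teacher's curated samples act as a cleaner, task-focused training signal, though that says little about general capability.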
/r/MachineLearning
https://redd.it/1ie46nq
D Monthly Who's Hiring and Who wants to be Hired?
For Job Postings please use this template
>Hiring: [Location], Salary: [], [Remote | Relocation], [Full Time | Contract | Part Time] and [Brief overview, what you're looking for]
For Those looking for jobs please use this template
>Want to be Hired: [Location], Salary Expectation: [], [Remote | Relocation], [Full Time | Contract | Part Time], Resume: [Link to resume] and [Brief overview, what you're looking for]
Please remember that this community is geared towards those with experience.
/r/MachineLearning
https://redd.it/1hq5o1z
The Great ChatGPT o1 pro Downgrade Nobody’s Talking About
Let’s talk about what’s happening with OpenAI’s $200/month o1 pro tier, because this is getting ridiculous.
Remember when you first got access? The performance was incredible. Complex analysis, long documents, detailed code review - it handled everything brilliantly. Worth every penny of that $200/month premium.
Fast forward to now:
* Can’t handle long documents anymore
* Loses context after a few exchanges
* Code review capability is a shadow of what it was
* Complex tasks fail constantly
And here’s the kicker: OpenAI never published specifications, disabled their own token counting tool for o1 pro, and provided no way to verify anything. Convenient, right?
Think about what’s happening here:
* Launch an amazing service
* Get businesses hooked and dependent
* Quietly degrade performance
* Keep charging premium prices
* Make it impossible to prove anything changed
We’re paying TEN TIMES the regular ChatGPT Plus price ($200 vs $20), and they can apparently just degrade the service whenever they want, without notice, without acknowledgment, without any way to verify what we’re actually getting.
This isn’t just about lost productivity or wasted money. This is about a premium service being quietly downgraded while maintaining premium pricing. It’s about a company that expects us to pay $200/month for a black box that keeps getting smaller.
What used to take 1 hour now takes 4. What used to work smoothly now requires constant babysitting. Projects are delayed, costs are skyrocketing, and we’re still paying the same premium price for what feels like regular ChatGPT with a fancy badge.
The most alarming part? OpenAI clearly knows about these changes. They’re not accidental. They’re just counting on the fact that without official specifications or metrics, nobody can prove anything.
This needs to stop.
If you’re experiencing the same issues, make some noise. Share this post. Let them know we notice what’s happening. We shouldn’t have to waste our time documenting their downgrades while paying premium prices for degraded service.
OpenAI: if you need to reduce capabilities, fine. But be transparent about it and adjust pricing accordingly. This silent downgrade while maintaining premium pricing isn’t just wrong - it’s potentially fraudulent.
/r/LanguageTechnology
https://redd.it/1i56cmx
Guide to Making the Best Self Driving Dataset
https://medium.com/voxel51/how-to-make-the-best-self-driving-dataset-c2170cb47bff
/r/computervision
https://redd.it/1i1cki1