Apple Depth Pro
I was really excited to read about Apple's Depth Pro model, especially seeing the examples of fine detail and how it compared favorably to DepthProV2 (which I think is already amazing), along with the massive speed gains over these other models. In reality, though, I've found it incredibly inconsistent, often completely wrong, and exactly the same speed as DepthProV2. Have other people had similar experiences? There's not a great deal in the way of settings, so I don't think I can be doing much wrong, but perhaps the original images aren't high enough quality?
For example, it often gets pin-sharp detail on lines and on some areas of someone's coat or clothes, but I often see a "halo" around a subject that simply isn't at a different depth from the background. I am mainly interested in using it for stereoscopic imagery, and converting depth map + 2D image to a stereoscopic image reveals massive holes and areas that are completely inconsistent or wrong. Perhaps the model is mainly designed for other purposes, such as robotics or image detection? Even viewed simply as a depth map, however, I can see I'm not getting results comparable to the original authors'.
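To illustrate why depth errors become so visible in stereo conversion, here is a minimal sketch of forward-warping a single image row into a second view. It is only an assumption about how a typical depth map + 2D image converter works, not what any particular tool does; `warp_scanline`, `nearness` and `max_disparity` are made-up names for this example.

```python
def warp_scanline(colors, nearness, max_disparity):
    """Forward-warp one image row into a right-eye view.

    colors: pixel values for the row.
    nearness: floats in [0, 1], 1.0 = closest to the camera
              (an assumed normalisation of the depth map).
    max_disparity: shift in pixels applied to the nearest pixels.
    Returns the warped row; None marks holes no source pixel reached.
    """
    width = len(colors)
    out = [None] * width
    best = [-1.0] * width  # nearness of the pixel currently stored at each target
    for x in range(width):
        shift = round(nearness[x] * max_disparity)
        target = x - shift  # nearer pixels move further left in the right-eye view
        if 0 <= target < width and nearness[x] > best[target]:
            out[target] = colors[x]  # the nearer pixel wins when two land together
            best[target] = nearness[x]
    return out


def hole_positions(row):
    """Indices that received no pixel and would need inpainting."""
    return [i for i, value in enumerate(row) if value is None]


# Background ("b") at nearness 0.0 with a foreground subject ("F") at 1.0.
colors = list("bbbFFFbbbb")
nearness = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
right_view = warp_scanline(colors, nearness, max_disparity=2)
print(right_view)
print(hole_positions(right_view))  # → [4, 5]
```

The gap opens right next to the subject's edge, where the depth changes abruptly. If the depth map has a halo there, the background pixels inside the halo get shifted along with the subject, which would be one possible explanation for the artifacts described above.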
I'd be interested to hear how other people are finding it!
/r/computervision
https://redd.it/1fxl2jd
D Monthly Who's Hiring and Who wants to be Hired?
For Job Postings please use this template
>Hiring: [Location], Salary:[], [Remote | Relocation], [Full Time | Contract | Part Time] and [Brief overview, what you're looking for]
For Those looking for jobs please use this template
>Want to be Hired: [Location], Salary Expectation:[], [Remote | Relocation], [Full Time | Contract | Part Time] Resume: [Link to resume] and [Brief overview, what you're looking for]
Please remember that this community is geared towards those with experience.
/r/MachineLearning
https://redd.it/1ftdkmb
D Fellow ML Practitioners, who do you go to when you are stuck on an ML problem?
Btw, not posting in the "Simple Questions Thread" because I believe even someone with formal ML knowledge may benefit from this.
I'm curious to know how you get new ideas and validate them if you are stuck on something you haven't worked on before. I'm in a similar boat, and while my team at work has experts in other fields, there's no senior MLE as such.
It doesn't have to be a person, I'm keen to know any sources you refer to as well.
/r/MachineLearning
https://redd.it/1fqb1t1
D Monthly Who's Hiring and Who wants to be Hired?
For Job Postings please use this template
>Hiring: [Location], Salary:[], [Remote | Relocation], [Full Time | Contract | Part Time] and [Brief overview, what you're looking for]
For Those looking for jobs please use this template
>Want to be Hired: [Location], Salary Expectation:[], [Remote | Relocation], [Full Time | Contract | Part Time] Resume: [Link to resume] and [Brief overview, what you're looking for]
Please remember that this community is geared towards those with experience.
/r/MachineLearning
https://redd.it/1f5cy0v
D Self-Promotion Thread
Please post your personal projects, startups, product placements, collaboration needs, blogs etc.
Please mention the payment and pricing requirements for products and services.
Please do not post link shorteners, link aggregator websites, or auto-subscribe links.
--
Any abuse of trust will lead to bans.
Encourage others who create new posts for questions to post here instead!
Thread will stay alive until next one so keep posting after the date in the title.
--
Meta: This is an experiment. If the community doesn't like this, we will cancel it. This is to encourage those in the community to promote their work by not spamming the main threads.
/r/MachineLearning
https://redd.it/1f63rhf
Yolov8 free alternatives
I'm currently using Yolov8 for some object detection and classification tasks. Overall, I like the accuracy and speed, but it comes with license restrictions. What are some free alternatives that offer both detection and classification?
/r/computervision
https://redd.it/1eyet6m
Project I Created the Definitive AUTOMATIC Shiny Hunter for Pokémon BDSP
Hey everyone! I am Dinones! I coded a Python program using object detection that lets my computer hunt for shiny Pokémon on my physical Nintendo Switch while I sleep. So far, I’ve automatically caught shiny Pokémon like Giratina, Dialga or Azelf, Rotom, Drifloon, all three starters, and more in Pokémon BDSP. Curious to see how it works? Check it out! The program is available for everyone! Obviously, for free; I'm just a student who likes to program this stuff in his free time :)
The games run on a Nintendo Switch (not emulated, a real one). The program gets the output images using a capture card, then processes them to detect whether the Pokémon is shiny or not (OpenCV). Finally, it emulates the Joy-Cons over Bluetooth (NXBT) and controls the Switch. It also works on a Raspberry Pi!
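For readers curious what the detection step could look like, here is a rough sketch of one possible approach: compare the average colour of the sprite area against the normal palette. This is an assumption for illustration only and not necessarily how the project does it. It uses plain Python lists instead of OpenCV so it runs without dependencies, and `mean_color`, `looks_shiny`, the region and the threshold are all hypothetical.

```python
def mean_color(frame, region):
    """Average (R, G, B) over a rectangular region of a frame.

    frame: list of rows, each row a list of (r, g, b) tuples.
    region: (top, left, bottom, right), with bottom/right exclusive.
    """
    top, left, bottom, right = region
    pixels = [frame[y][x] for y in range(top, bottom) for x in range(left, right)]
    count = len(pixels)
    return tuple(sum(p[c] for p in pixels) / count for c in range(3))


def color_distance(a, b):
    """Euclidean distance between two RGB colours."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def looks_shiny(frame, region, normal_color, threshold=40.0):
    """Flag the frame when the sprite region is far from the normal palette."""
    return color_distance(mean_color(frame, region), normal_color) > threshold


def solid_frame(color, height=4, width=4):
    """Tiny synthetic frame standing in for a capture-card image."""
    return [[color] * width for _ in range(height)]


normal_frame = solid_frame((200, 60, 60))
odd_frame = solid_frame((60, 200, 200))
region = (1, 1, 3, 3)
print(looks_shiny(normal_frame, region, normal_color=(200, 60, 60)))  # → False
print(looks_shiny(odd_frame, region, normal_color=(200, 60, 60)))     # → True
```

On real capture-card frames the threshold would need tuning against lighting and compression noise.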
I don't make money with this, I just feel my project can be interesting to a lot of people.
📽️ Youtube: https://www.youtube.com/watch?v=84czUOAvNyk
🤖 Github: https://github.com/Dinones/Nintendo-Switch-Pokemon-Shiny-Hunter
https://preview.redd.it/7jbe6fdxrijd1.png?width=1920&format=png&auto=webp&s=626c801925fb0769f59e62ece09f0e00b18b828e
https://preview.redd.it/2h2alqcxrijd1.png?width=1920&format=png&auto=webp&s=fddd11c5c04c58268bbaf0e8bca0fd7081a7f775
/r/MachineLearning
https://redd.it/1evp3wz
Convince me to learn C++ for computer vision.
PLEASE READ THE PARAGRAPHS BELOW
Hi everyone. I am currently in the last year of my master's and have good knowledge of image processing/CV as well as deep learning and machine learning. I plan to pursue a career in computer vision (I currently have a job in this field). I have some C++ knowledge and am still learning, but I have never come across an application that required me to code in C++. Everything is accessible from Python nowadays, and I know those tools are written in C/C++ with Python as just a wrapper. I would really like your opinions on the use cases of C/C++ in practical computer vision applications, for example CUDA memory management.
/r/computervision
https://redd.it/1epl53b
RPC — A New Way to Build Language Models
Article: jpmag7/rpc-language-modeling-by-relevant-precedence-compression-3d09bb4f23e6
One of the reasons I really like software engineering in general is that anyone can do almost anything with just a computer. But when it comes to AI, and specifically LLMs, you need a ton of resources and money to do anything interesting by yourself.
So recently I've been trying to find a way to build language models with far less training data and far less compute. RPC is my closest attempt at that. It compresses the prompt into a vector representation and then performs a search in a vector database to find the most appropriate next token. It works remarkably well.
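As a rough illustration of the general idea (compress the context into a vector, then look up the nearest stored context to pick the next token), here is a toy sketch. It is not the RPC method from the article. The recency-weighted bag-of-tokens `embed` and the brute-force search standing in for a vector database are assumptions made only for this example.

```python
import math


def embed(context, vocab):
    """Compress a context into a unit vector (toy: recency-weighted token counts)."""
    vec = [0.0] * len(vocab)
    for age, token in enumerate(reversed(context)):
        if token in vocab:  # unknown tokens are simply ignored
            vec[vocab[token]] += 0.5 ** age  # the most recent token weighs most
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(tokens, context_size):
    """Store one (context vector, next token) pair per corpus position."""
    vocab = {tok: i for i, tok in enumerate(dict.fromkeys(tokens))}
    store = []
    for i in range(1, len(tokens)):
        context = tokens[max(0, i - context_size):i]
        store.append((embed(context, vocab), tokens[i]))
    return vocab, store


def predict_next(prompt, vocab, store, context_size):
    """Return the token that followed the most similar stored context."""
    query = embed(prompt[-context_size:], vocab)
    best = max(store, key=lambda entry: sum(q * v for q, v in zip(query, entry[0])))
    return best[1]


corpus = "the cat sat on the mat and the dog sat on the rug".split()
vocab, store = build_index(corpus, context_size=3)
print(predict_next("the cat".split(), vocab, store, context_size=3))  # → sat
print(predict_next("dog sat".split(), vocab, store, context_size=3))  # → on
```

Nothing is trained here: "learning" amounts to inserting more (vector, token) pairs, so the quality depends entirely on how good the compression step is.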
I haven't had the time to properly evaluate and test it yet. That's why I'm sharing this with the community, in the hope that someone will give some feedback or even try to replicate it. I'd love for you to take a look at the article and share some thoughts here.
/r/deeplearning
https://redd.it/1ehp00w
How do researchers come up with these ideas?
Hi everyone. I have a question that has been nagging at me for a while now, and I was hoping you could help. How do CV researchers come up with their ideas? I have read over 100 CV papers (not many, I know), but every single time I ask myself: how? How is this justified? For example, in object detection I read YOLOv6, and all I saw was that they experimented with many configurations with little to no insight. The same goes for most other papers. Yes, I can understand why focal loss or ArcFace might help the learning procedure, but I cannot understand how traversing the feature pyramid top to bottom, bottom to top, bidirectionally, and so on might help when no proper justification is provided. Where is the intuition? In one paper, the authors stated that they fuse only the top layers of the feature pyramid together and the bottom layers together, and it works. Why? How? I am really confused, especially since I started working on my thesis, which is about object detection.
/r/computervision
https://redd.it/1e8k928
Transfer Learning vs. Fine-tuning vs. Multitask Learning vs. Federated Learning
/r/deeplearning
https://redd.it/1e31jgt
N The 2024 Nobel Prize in Chemistry goes to the people behind Google DeepMind's AlphaFold. One half to David Baker and the other half jointly to Demis Hassabis and John M. Jumper.
Announcement: https://twitter.com/NobelPrize/status/1843951197960777760
/r/MachineLearning
https://redd.it/1fznxyr
R Were RNNs All We Needed?
https://arxiv.org/abs/2410.01201
The authors (including Y. Bengio) propose simplified versions of LSTM and GRU that allow parallel training, and show strong results on some benchmarks.
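For anyone wondering why the simplification allows parallel training, here is a scalar sketch of the minimal GRU recurrence as I understand it from the paper: h_t = (1 - z_t) * h_{t-1} + z_t * c_t, where the gate z_t and the candidate c_t depend only on the input x_t. The scalar weights `w_z` and `w_h` are hypothetical stand-ins for the linear layers, and `unrolled` evaluates the closed form directly instead of using a parallel scan.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gates(xs, w_z, w_h):
    """Gate and candidate use only the current input, never h_{t-1}."""
    zs = [sigmoid(w_z * x) for x in xs]
    candidates = [w_h * x for x in xs]
    return zs, candidates


def sequential(zs, candidates, h0=0.0):
    """Step-by-step recurrence, the way an RNN is usually run."""
    h, out = h0, []
    for z, c in zip(zs, candidates):
        h = (1.0 - z) * h + z * c
        out.append(h)
    return out


def unrolled(zs, candidates, h0=0.0):
    """Closed form: each h_t is computed without any other hidden state.

    With a_j = 1 - z_j and b_k = z_k * c_k:
    h_t = (a_0 ... a_t) * h0 + sum_k (a_{k+1} ... a_t) * b_k
    """
    out = []
    for t in range(len(zs)):
        total = h0 * math.prod(1.0 - z for z in zs[:t + 1])
        for k in range(t + 1):
            weight = math.prod(1.0 - z for z in zs[k + 1:t + 1])
            total += weight * zs[k] * candidates[k]
        out.append(total)
    return out


xs = [0.5, -1.0, 2.0, 0.1]
zs, candidates = gates(xs, w_z=0.8, w_h=1.5)
step_by_step = sequential(zs, candidates, h0=0.3)
closed_form = unrolled(zs, candidates, h0=0.3)
print(all(abs(a - b) < 1e-9 for a, b in zip(step_by_step, closed_form)))  # → True
```

Because the recurrence is linear in h once the gates stop depending on h_{t-1}, all time steps can be computed together. The direct evaluation above costs O(T²) and is only meant to show that independence.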
/r/MachineLearning
https://redd.it/1fvg7qr
How long does it take for you to read and understand a typical paper?
It takes me quite a long time to fully understand a typical computer vision paper. I usually need to revisit sections multiple times and research different topics to absorb everything.
I’m curious—how long does it take for others? Does your experience in computer vision or related fields affect how quickly you grasp these papers? Share how you approach them and how long it takes you!
/r/computervision
https://redd.it/1fsclsh
Deep learning developers, what are you doing?
Hello all,
I've been a software developer on computer vision applications for the last 5-6 years (my entire career). I've never used deep learning algorithms in any application, but now that I've started a new company I'm seeing potential uses in my area, so I've read some books, learned the basics of the theory, and developed my first deep learning application for object detection.
As an entrepreneur, I'm looking back on what I did for that application from a technical point of view, and honestly I'm a little disappointed. All I did was choose a model, train it, and use it in my application; that's all. It was pretty easy. I didn't need any crazy ideas for the application, and the training part was a little time-consuming, but in general the work was pretty simple.
I really want to know more about this world. I'm so excited and I see opportunity everywhere, but I have one question: what does a deep learning developer do at work? What are the hundreds of companies and startups doing when they develop applications with deep learning?
I don't think many companies develop their own models (which I understand is far more complex and time-consuming than what I've done), so what else are they doing?
I'm pretty sure I'm missing something very important, but I can't really understand what! Please help me understand!
/r/computervision
https://redd.it/1fnfhec
D Self-Promotion Thread
Please post your personal projects, startups, product placements, collaboration needs, blogs etc.
Please mention the payment and pricing requirements for products and services.
Please do not post link shorteners, link aggregator websites, or auto-subscribe links.
--
Any abuse of trust will lead to bans.
Encourage others who create new posts for questions to post here instead!
Thread will stay alive until next one so keep posting after the date in the title.
--
Meta: This is an experiment. If the community doesn't like this, we will cancel it. This is to encourage those in the community to promote their work by not spamming the main threads.
/r/MachineLearning
https://redd.it/1fh23n3
The fact that Sony only gives out sensor documentation under an NDA makes me hate them so much.
People resort to reverse engineering for fucks sake: https://github.com/Hermann-SW/imx708regsannotated
Sony: "Oh you want to check if it's possible to enable HDR before you buy? Haha go fuck yourself! We want you to waste time calling a salesperson, signing an NDA, telling us everything about your application (which might need another NDA), and then maybe we'll give you some documentation if we deem you worthy"
Fuck companies that put documentation behind sales reps.
I mean seriously, why is it so fucking hard to find an embeddable/industrial camera that supports HDR? Arducam and Basler are just as bad. They use sensors which Sony claims to have built in HDR, but do these companies fucking tell you how to enable it? Nope! Which means it might not be possible at all, and you won't know until you buy it.
/r/computervision
https://redd.it/1f9qljk
Best data labeling tools (covering all modalities, industries and team sizes)
I have been working on several computer vision projects over the last couple of years and found labeling to be the biggest bottleneck.
Based on my experience so far and the different tools I have explored, I made a list of tools for each category.
If you fall into any of these categories, I'm sure one of these tools will fit the bill.
Top tools for computer vision tasks with a limited budget or low-scale requirements:
1. Labellerr
2. Roboflow
3. Supervisely
4. Scale Rapid (used to be)
5. Clarifai
Top open source tool
1. CVAT
2. Labelme
3. Labelimg
Best open source dicom tool
1. 3D Slicer
Top RLHF service and tools (which include global resources)
1. iMerit
2. Scale AI
Best tool for segmentation
1. Segments AI
2. Superannotate
3. Labellerr
Top providers of a tool with a manual team
1. Labellerr (They have a good team and an easy UI with a solid QC mechanism)
2. iMerit (Their manual team is good but the tool has some issues)
3. Appen (For bigger projects)
Top tool for text labeling
1. Kili
2. V7
3. UBIAI
/r/computervision
https://redd.it/1f4bhv5
D Self-Promotion Thread
Please post your personal projects, startups, product placements, collaboration needs, blogs etc.
Please mention the payment and pricing requirements for products and services.
Please do not post link shorteners, link aggregator websites, or auto-subscribe links.
--
Any abuse of trust will lead to bans.
Encourage others who create new posts for questions to post here instead!
Thread will stay alive until next one so keep posting after the date in the title.
--
Meta: This is an experiment. If the community doesn't like this, we will cancel it. This is to encourage those in the community to promote their work by not spamming the main threads.
/r/MachineLearning
https://redd.it/1euyfi6
D Monthly Who's Hiring and Who wants to be Hired?
For Job Postings please use this template
>Hiring: [Location], Salary:[], [Remote | Relocation], [Full Time | Contract | Part Time] and [Brief overview, what you're looking for]
For Those looking for jobs please use this template
>Want to be Hired: [Location], Salary Expectation:[], [Remote | Relocation], [Full Time | Contract | Part Time] Resume: [Link to resume] and [Brief overview, what you're looking for]
Please remember that this community is geared towards those with experience.
/r/MachineLearning
https://redd.it/1egc1um
I wish this “AI is one step from sentience” thing would stop
The amount of YouTube videos I’ve seen showing a flowchart representation of a neural network next to human neurons and using it to prove AI is capable of human thought...
I could just as easily put all the input nodes next to the output, have them point left instead of right, and it would still be accurate.
Really wish this AI doomsaying would stop using this method to play on the fears of the general public. Let’s be honest, deep learning is no more a human process than JavaScript if/then statements are. It’s just a more convoluted process with far more astounding outcomes.
/r/deeplearning
https://redd.it/1els27c
D Simple Questions Thread
Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead!
Thread will stay alive until next one so keep posting after the date in the title.
Thanks to everyone for answering questions in the previous thread!
/r/MachineLearning
https://redd.it/1ee9dra
RAG evaluation framework
Hi,
I am looking for some good resources for RAG evaluation.
/r/LanguageTechnology
https://redd.it/1eawoog
D AI Study Group: Any willing study partners to form a group to learn architectures, implement them, discuss them, and build some application-level projects?
Basically, I am interested in learning and discussing architectures and implementing them and doing some projects. I prefer to make a group where we get productive, share our learnings, teach each other and have some accountability.
Rather than experts, I would love to connect with those who are intermediate with ML and DL architectures, and are willing to explain and implement things they are interested in. Any country, any age.
If anyone is willing to, please feel free to DM or comment.
Do mention your expertise level and your areas of interest!
/r/MachineLearning
https://redd.it/1e59xvu
D Self-Promotion Thread
Please post your personal projects, startups, product placements, collaboration needs, blogs etc.
Please mention the payment and pricing requirements for products and services.
Please do not post link shorteners, link aggregator websites, or auto-subscribe links.
--
Any abuse of trust will lead to bans.
Encourage others who create new posts for questions to post here instead!
Thread will stay alive until next one so keep posting after the date in the title.
--
Meta: This is an experiment. If the community doesn't like this, we will cancel it. This is to encourage those in the community to promote their work by not spamming the main threads.
/r/MachineLearning
https://redd.it/1dx5tpo