How much time do AI/ML engineers spend coding?
I have been learning ML for 6 months, but I haven't done any serious big project, only small ones like next-word prediction, sentiment analysis, etc. I have a question about ML and DL: in a company, how much time do AI and ML engineers spend on coding, and what do they spend most of their time on?
/r/deeplearning
https://redd.it/1bkb95p
Stuck in a loop while learning ML/deep learning, and I feel whatever I've done could be replaced by basic ChatGPT lol
A little bit about myself,
I love data. Since childhood I have loved collecting, storing, and playing with data, especially once I learnt Excel. Hence I really feel this field suits me better than web dev etc.
The thrill of training a model is unparalleled by anything else I've ever done.
Coming to my current situation: I have basic exposure to deep learning from a couple of internships where I used transfer learning to train a model to classify a specific type of object.
I used ChatGPT and Google to guide me and trained the model to roughly 90% accuracy. Another project of mine involved predicting image-improvement algorithm parameters (DenseNet, SqueezeNet; random forest, SVM) from features like skewness and contrast; I used a random forest and the outputs were satisfactory.
Again I used ChatGPT and just trained it. It felt very rudimentary, as I didn't know how to improve the model by myself.
I wouldn't say I learnt nothing, but at the same time I still feel like a noob in machine learning.
Btw, I have placements in a few months, so I'm even more worried: should I continue working on ML, or just build basic MERN projects and get placed first?
Now the thing is, I don't have much of an idea about neural networks or any of the maths behind them.
I built an NLP project, Twitter sentiment analysis (I know, a very basic YouTube project). I understood the entire process and loved the new learning, but I don't know ANYTHING about LSTMs.
I'm learning the basics of ML now, like regression, KNN, etc., but I don't know the maths behind them.
How should a guy like me proceed to learn machine learning and gradually move towards LLMs and other topics?
This isn't a normal roadmap request, because I have already worked on stuff, but I'm still clueless.
Can someone please guide me on how to proceed from here? And is the stuff I've done good enough for a fresher in ML, or should I improve in some other direction?
Edit: I'm 20 years old and a newbie in this field.
Edit: I have taken subjects like algebra, calculus, and probability at my university, so I know the maths basics. It's just that I'm only now learning the maths specific to ML.
/r/deeplearning
https://redd.it/1b8thzw
How do I start with RAG?
I want to learn RAG. Are there any resources out there for learning RAG from basic to advanced, starting with simple exercises and gradually increasing the difficulty?
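Not a resource list, but a good first exercise is hand-rolling the retrieval half yourself. Here's a minimal sketch assuming the sentence-transformers package; the model name and documents are illustrative, not from any particular tutorial:

```python
from sentence_transformers import SentenceTransformer, util

docs = [
    "RAG combines a retriever with a generator.",
    "The retriever finds passages relevant to the query.",
    "The generator conditions its answer on the retrieved passages.",
]
query = "What does the retriever do in RAG?"

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model
doc_emb = model.encode(docs, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)

scores = util.cos_sim(query_emb, doc_emb)[0]   # cosine similarity to each doc
top = scores.argsort(descending=True)[:2]      # keep the 2 closest passages
context = "\n".join(docs[int(i)] for i in top)

# The retrieved context is prepended to the question and sent to any LLM.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```

Once that feels comfortable, swapping in a real vector store and an actual LLM call is mostly plumbing, which is where frameworks like LangChain or LlamaIndex come in.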
/r/LanguageTechnology
https://redd.it/1b3ns1z
[R] The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
https://arxiv.org/abs/2402.17764
Abstract
>Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}. It matches the full-precision (i.e., FP16 or BF16) Transformer LLM with the same model size and training tokens in terms of both perplexity and end-task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption. More profoundly, the 1.58-bit LLM defines a new scaling law and recipe for training new generations of LLMs that are both high-performance and cost-effective. Furthermore, it enables a new computation paradigm and opens the door for designing specific hardware optimized for 1-bit LLMs.
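For intuition, the quantizer behind the headline is tiny. A minimal sketch of the absmean ternary quantization the paper describes, applied post hoc here for illustration (the actual model trains with it in the loop):

```python
import torch

def absmean_ternary(w: torch.Tensor, eps: float = 1e-5):
    """RoundClip(w / mean(|w|), -1, 1): every weight becomes -1, 0, or 1."""
    scale = w.abs().mean().clamp(min=eps)    # per-tensor absmean scale
    w_q = (w / scale).round().clamp(-1, 1)   # ternary weights
    return w_q, scale                        # dequantize as w_q * scale

w = torch.randn(4, 4)
w_q, scale = absmean_ternary(w)
print(w_q)           # ternary matrix in {-1, 0, 1}
print(w_q * scale)   # coarse approximation of w
```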
/r/MachineLearning
https://redd.it/1b22izk
Switch from Classic Computer vision to Deep Learning
I develop classic computer vision algorithms in C++ and want to switch to the deep learning area.
For now, I see these options:
Plan #1:
1. Join a small HW/mobile startup
2. Integrate/optimize neural networks (this usually requires C++)
3. Gradually move to NN finetuning.
This may be possible if NN training and integration are handled by the same team; that's why the company should be small.
Plan #2:
1. Grind LeetCode
2. Apply to FAANG-like companies for an SE-in-test role supporting AI/scientific infrastructure
3. Try to switch from testing utils to NN finetuning.
Plan #3:
1. Learn DL theory to an excellent level
2. Get an intern/junior position in big tech
3. Hope to get a better offer for a mid-level position from them
My skills:
C++: 5 YoE as a main programming language
Python: 2 YoE for auxiliary utils
DL: 1 year of commercial experience, utilized a few networks as black boxes
DL theory: a few online courses
Math: good level
Extra: 5 YoE in Classic Computer Vision
So, I have a couple of questions:
1. What do you think about these plans?
2. Can you share any relevant experience?
3. Do you have other ideas on how to switch to DL given my experience?
4. Should I do pet projects so I can discuss them when an interviewer asks about my DL experience?
I would also be happy to hear your stories if you have made a similar transition.
Thanks in advance!
/r/computervision
https://redd.it/1avnksk
[R] [P] 10 times faster LLM evaluation with Bayesian optimization
Recently I've been working on making LLM evaluations fast by using Bayesian optimization to select a sensible subset.
Bayesian optimization is used because it's good at trading off exploration and exploitation of an expensive black box (here, the LLM).
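As a rough illustration of that loop (a toy example of mine, not the project's code), a Gaussian-process surrogate plus expected improvement picks which expensive evaluation to run next:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expensive_black_box(x):
    # Stand-in for one costly LLM evaluation at "configuration" x.
    return np.sin(3 * x) + 0.1 * x**2

rng = np.random.default_rng(0)
candidates = np.linspace(-3, 3, 200).reshape(-1, 1)  # pool of points to choose from
X = rng.uniform(-3, 3, size=(3, 1))                  # a few random warm-up evaluations
y = np.array([expensive_black_box(float(x[0])) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(15):
    gp.fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    best = y.min()
    # Expected improvement trades off exploitation (low mu) and exploration (high sigma).
    z = (best - mu) / np.maximum(sigma, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = candidates[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, expensive_black_box(float(x_next[0])))

print("best point found:", X[np.argmin(y)], "value:", y.min())
```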
Project link
I would love to hear your thoughts and suggestions on this!
/r/MachineLearning
https://redd.it/1apv97t
Is overfitting always a bad thing?
As I understand it, overfitting occurs when a model learns noise in the training data, so that it performs better on the training data than on the validation data. Overfitting is bad because overfit models do not generalize well to unseen data, so we use early stopping to prevent it.
Now, I am training a CNN for image classification. At first, until training accuracy reaches 95%, validation accuracy follows the same trend, so up to this point there is no overfitting. But as training accuracy goes from 95% to 99%, validation accuracy only moves from 95% to 96%. By definition this is overfitting, yet the validation performance is still improving. Is this kind of overfitting also considered bad?
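For reference, early stopping is typically a patience counter on the validation metric, keeping the best checkpoint rather than the last. A minimal self-contained sketch (the accuracy numbers are simulated, not from your run):

```python
patience = 3          # epochs to wait for a new best before stopping
best_val_acc = 0.0
stale_epochs = 0
best_state = None

# Simulated per-epoch validation accuracies, standing in for evaluate(model, val_loader).
val_accuracies = [0.80, 0.88, 0.93, 0.95, 0.955, 0.96, 0.958, 0.957, 0.959]

for epoch, val_acc in enumerate(val_accuracies):
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        stale_epochs = 0
        best_state = f"checkpoint@epoch{epoch}"  # stand-in for saving model weights
    else:
        stale_epochs += 1
        if stale_epochs >= patience:
            print(f"early stop at epoch {epoch}; best val acc {best_val_acc:.3f}")
            break
```

Note that with a patience window, the regime you describe (validation still creeping up) would not trigger a stop; only a sustained plateau or decline would.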
/r/deeplearning
https://redd.it/1alyiic
[D] Why do current LLMs work well in discrete space but not in continuous space?
One interesting observation is that LMs are trained to predict tokens over a categorical distribution, and then a sampling algorithm discretizes the distribution to produce an output. If we try this in a continuous domain, e.g., predicting pixels directly with an L2 loss, it doesn't work: the output gets very blurry. It seems that the discretization via sampling is crucial to make things work during inference. Recent papers like GIVT can model the output as a Gaussian mixture instead of a categorical distribution, but sampling is still necessary to make it work.
I'm sure this isn't a new observation; are there any resources out there that can help explain why this is the case?
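One standard piece of the explanation: an L2-optimal predictor outputs the conditional mean, and for a multimodal target the mean lies between the modes, which reads as blur in pixel space. A toy illustration (my example, not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)
# Bimodal "pixel values": half the pixels are dark (~0.1), half bright (~0.9).
targets = np.concatenate([rng.normal(0.1, 0.02, 500), rng.normal(0.9, 0.02, 500)])

# Find the single constant prediction that minimizes the L2 loss.
candidates = np.linspace(0, 1, 201)
l2 = [np.mean((targets - c) ** 2) for c in candidates]
best = candidates[np.argmin(l2)]

print(f"L2-optimal prediction: {best:.2f}")  # ~0.50: neither dark nor bright
print(f"mean of targets:       {targets.mean():.2f}")
# A categorical (or mixture) head plus sampling commits to one mode instead.
```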
/r/MachineLearning
https://redd.it/18vfj7k
[D] Do we really know how token probability leads to reasoning? For example, when we give GPT-4 a riddle and it solves it using non-intuitive logic, how is that happening?
GPT-4 can solve the very basic riddle/question below with ease.
Example riddle: You have a cup and a ball. You place the ball on the table and place the cup over the ball. You then place the cup on the kitchen counter. Where is the ball?
Answer: It's still on the original table, of course.
How does a probability engine arrive at that reasoning?
/r/MachineLearning
https://redd.it/18q9ucf
[D] What are 2023's top innovations in ML/AI outside of LLM stuff?
What really caught your eye so far this year? Both high-profile applications and research innovations that may shape the field for decades to come.
/r/MachineLearning
https://redd.it/18hnh8p
PubMedBERT Embeddings - Semantic search and retrieval-augmented generation for medical literature
https://huggingface.co/NeuML/pubmedbert-base-embeddings
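A quick usage sketch, assuming the model loads via the sentence-transformers package (the model card publishes it in that format); the example sentences are mine:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("NeuML/pubmedbert-base-embeddings")

docs = [
    "Metformin is a first-line treatment for type 2 diabetes.",
    "The trial evaluated statin therapy for cardiovascular risk.",
]
query_emb = model.encode("diabetes medication", convert_to_tensor=True)
doc_emb = model.encode(docs, convert_to_tensor=True)

# Cosine similarity: the metformin sentence should score higher.
print(util.cos_sim(query_emb, doc_emb))
```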
/r/LanguageTechnology
https://redd.it/17y4a5s
How do you keep up with advancements in LLMs?
I would like to know how everyone stays (almost) up to date with advancements in LLMs. Do you follow survey papers, blogs, newsletters, channels, etc.?
/r/LanguageTechnology
https://redd.it/18akb71
Converting multi-view images to a 3D model
Hi, I'm looking for a solution to convert multi-view images that look like this:
https://preview.redd.it/wbquazk95i4c1.png?width=1762&format=png&auto=webp&s=5a9ded99bc0eb8891e73f7519317a7aeae572044
into 3D models (obj, glb, etc.)
I'm pretty sure I can do this with NeRF, but I'm fairly new to the field (coming from a classic software engineering background), so I'm looking for suggestions and advice on this topic.
I've also been pointed to COLMAP for this purpose, but from what I understand COLMAP is a different kind of tool, meant to reconstruct 3D geometry from multiple camera poses/pictures.
I'd also welcome any existing projects that could save me some time. Thanks!
/r/computervision
https://redd.it/18bftod
We've programmed our DIY smartwatch to take the wheel and steer the Space Rover around 🚀🌌
/r/computervision
https://redd.it/186q043
Do you regret getting into computer vision?
If so, why, and what CS specialization would you have chosen instead? Or maybe a completely different major?
If not, what do you like the most about your job?
/r/computervision
https://redd.it/1bcwawr
Can old people learn and get hired?
I am 71, with an all-but-dissertation PhD background in math and several years of teaching experience (up through calculus and probability/statistics). My programming skills are modest but improving. I have taken a number of machine learning and deep learning courses on Coursera and done quite well. Is it possible for me to get a bachelor's or master's degree in computer science or data analytics online and then get a job with an AI company?
If not, what are the best ways to make a positive impact on the field?
I am not in this for the big bucks, as I am comfortably retired, but rather to show that it can be done.
/r/deeplearning
https://redd.it/1b7mf2m
Why is ViT more commonly used than SWIN?
I'm still reading around, but almost every computer vision paper I read uses ViT as its backbone instead of Swin or other similar architectures. Why?
The ViT paper had to pre-train on the 303M-image JFT dataset to beat earlier convolutional models on ImageNet, whereas Swin achieves better performance without that extra pre-training. I imagine Swin would achieve comparable, if not higher, performance on ImageNet if it were pre-trained the same way, though admittedly I haven't seen any work validating this idea.
Is this just a case of ViT being first, so now everyone uses it as a default, or is there another reason?
/r/computervision
https://redd.it/1b3be51
[D] Simple Questions Thread
Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead!
The thread will stay alive until the next one, so keep posting after the date in the title.
Thanks to everyone for answering questions in the previous thread!
/r/MachineLearning
https://redd.it/1aob7zi
Transfer Learning vs. Fine-tuning vs. Multitask Learning vs. Federated Learning
/r/deeplearning
https://redd.it/1au2avn
[D] [R] Is there a proof of convergence for any transformer model?
I'm interested in convergence proofs for NLP problems in general, and transformers in particular. Can anyone point me to a paper that proves an NLP machine learning model converges to anything at all?
I see some validation in the practice of NLP, but I'm struggling to intuit a target to converge to when trying to prove that a transformer converges to anything at all.
/r/MachineLearning
https://redd.it/1aol1tp
How to get out of the loneliness of a research career
Hi guys, I'm an MS student doing research under the guidance of a new AP at my university. He's a good person, very capable, and he always encourages us to follow our own ideas. However, since he is a new AP, our lab has only 2 MS students (including me) and no PhD students.
I feel overwhelmed by the loneliness. When I run into problems, such as formula derivations and other research questions, I can't find a friend to ask. My only way of getting input from other people is my weekly meeting with my professor.
I tried talking with other PhD students at my university, but they usually don't work in my field, i.e., unsupervised learning and world models. One telling moment: one day I deployed some Docker apps to our lab server just for fun. I wanted to celebrate, but couldn't find anyone to talk to.
I'm determined to pursue a PhD, but I can't stand the loneliness and pressure (from both research and coursework, since I'm an MS student). My professor's guidance is fairly hands-off, because he doesn't want to be too rigid and hopes to inspire our enthusiasm for research, whereas I need to publish a paper before my PhD application to be competitive in the applicant pool. However, even if I get into a PhD program at a big school, I'm afraid I'll keep repeating this lonely life for 5 years and spend my life before 30 as an unhappy person :(
Sorry for the cliché, I just can't hold it in.
/r/deeplearning
https://redd.it/1ak1hmn
[D] 3 years doing ML, no success yet. Is it common?
I've been working in ML research for 1.5 years now, more specifically in medical imaging, and before that as a DL engineer building a facial recognition pipeline. Despite a good understanding and all my focus, I have yet to build a good enough system or model for many of the use cases I've worked on.
For the last 4 months I've been exploring learning from noisy labels. I worked on 3 techniques and spent considerable time integrating the loaders, but results were poor, even worse than the baseline.
Before that, I made a failed attempt at system identification using a hybrid adaptive algorithm scheme; the approach failed, though I did write a technical report on it.
On the other hand, I do participate in online competitions. Vanilla methods get me into the top 10-20%, but when I try to improve on them, I always fail. None of my methods work well, which is super frustrating despite all my efforts.
I'm not trying to build a state-of-the-art model, but I at least expect to beat previous baselines or produce work of some significance.
/r/MachineLearning
https://redd.it/1aeq9pz
Open Source Multicamera Calibration in a GUI: pyxy3d
/r/computervision
https://redd.it/18rcwy4
[D] Deep dive into the MMLU ("Are you smarter than an LLM?")
After all the hubbub around the MMLU (for example, my article), I thought I would make an interface for seeing how humans do against even a middle-of-the-pack LLM. It's called Are You Smarter Than An LLM?
It presents you with random questions from the MMLU and compares your answers to the LLM's. Click the "what is this" button at the bottom for more details on how it works.
Feedback appreciated!
/r/MachineLearning
https://redd.it/18ntia7
KL divergence
Why do we use KL divergence in VAEs? It seems to me that we could just minimize the cross-entropy between our model's distribution and the Gaussian (or whatever distribution we choose for the latent space). Is it simply because KL divergence has a minimum of 0 and we already know the entropy of the Gaussian? What I'm really getting at is: how do we choose between KL divergence and cross-entropy as a loss function if minimizing one always minimizes the other?
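For reference, the two objectives differ exactly by the entropy of q:

```latex
D_{\mathrm{KL}}(q \,\|\, p)
  = \mathbb{E}_{q}\left[\log \frac{q(z)}{p(z)}\right]
  = \underbrace{-\,\mathbb{E}_{q}[\log p(z)]}_{H(q,\,p)\ \text{(cross-entropy)}}
  \; - \; \underbrace{\left(-\,\mathbb{E}_{q}[\log q(z)]\right)}_{H(q)\ \text{(entropy of }q\text{)}}
```

When q is fixed (e.g., the empirical label distribution in classification), H(q) is a constant, so minimizing cross-entropy and minimizing KL give the same minimizer. In a VAE the KL term is between the learned encoder posterior q(z|x) and the prior p(z), so H(q) is not constant: minimizing cross-entropy alone would collapse the encoder onto the mode of the prior, while the -H(q) term rewards keeping q(z|x) spread out. So minimizing one does not always minimize the other once q itself is being optimized.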
/r/deeplearning
https://redd.it/18k82sv
3D Pose Estimation of Two Interacting Hands from a Monocular Event Camera
/r/computervision
https://redd.it/18fvkv5
[D] Thoughts on Mamba?
I ran Karpathy's nanoGPT on his TinyShakespeare dataset, replacing self-attention with Mamba, and within 5 minutes it started spitting out the following:
https://preview.redd.it/4r96tp6lxx4c1.png?width=836&format=png&auto=webp&s=10f2f61cd4cea96f4f903cb2070835fc5d1df951
https://preview.redd.it/32ler5vnxx4c1.png?width=622&format=png&auto=webp&s=dd00e53f43dd0afa058758a987901ee6789d2258
https://preview.redd.it/sc96i4xoxx4c1.png?width=678&format=png&auto=webp&s=94d2ed279054363d3ed2b6beed65be89468582b0
So much faster than self-attention, and so much smoother, running at 6 epochs per second. I'm honestly gobsmacked.
https://colab.research.google.com/drive/1g9qpeVcFa0ca0cnhmqusO4RZtQdh9umY?usp=sharing
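For anyone wanting to try the same swap, a rough sketch of a drop-in replacement block, assuming the mamba-ssm package; the hyperparameters are illustrative defaults, not the settings from the Colab above:

```python
import torch
from mamba_ssm import Mamba  # pip install mamba-ssm (CUDA-only kernels)

class MambaBlock(torch.nn.Module):
    """Stands in for nanoGPT's attention block: pre-norm + residual around a Mamba mixer."""
    def __init__(self, d_model: int):
        super().__init__()
        self.norm = torch.nn.LayerNorm(d_model)
        self.mixer = Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2)

    def forward(self, x):                      # x: (batch, seq_len, d_model)
        return x + self.mixer(self.norm(x))    # causal sequence mixing, no attention

block = MambaBlock(d_model=384).cuda()         # the kernels require a CUDA device
y = block(torch.randn(2, 256, 384, device="cuda"))
print(y.shape)  # torch.Size([2, 256, 384])
```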
Some loss graphs:
Multihead attention without truncation (x is iterations in 10s, and y is loss)
Multihead attention with truncation (x is iterations in 10s, and y is loss)
Mamba loss graph (x is iterations in 10s, and y is loss)
/r/MachineLearning
https://redd.it/18d65bz
After two years of self-study, my first independent paper: Cross-Axis Transformer with 2D Rotary Embeddings
https://arxiv.org/pdf/2311.07184v1.pdf
/r/computervision
https://redd.it/189bo8i
A good modern textbook to get me up to speed on NLP in Python?
Hey everyone,
I have an MS in Statistics, but the focus was not on NLP; it was more classical models with a little machine learning. I'm not sure what's hip in NLP circles, but I don't want to go down a bunch of rabbit holes trying to find out. Any suggestions?
/r/LanguageTechnology
https://redd.it/17xa49p