datascientology | Education

Telegram channel datascientology - Data Scientology


Hot data science related posts every hour. Chat: https://telegram.me/r_channels Contacts: @lgyanf


Data Scientology

Machine Learning Books that emphasize MATH?

Hi all! So far, the best machine learning book that I've come across is ISLP (Introduction to Statistical Learning in Python/R). There is also a book by Dr. Manel Martinez-Ramon that is set to publish in October that I've been eagerly waiting for (took his class, failed it massively, still think he is one of the coolest dudes ever). In the meantime, I'm looking for any books that REALLY help consolidate the mathematical learning into a single resource as best as possible, with references for further reading when necessary. Has anyone come across a deep learning book that is LESS concerned with programming and MORE concerned with the mathematical structures behind the deep learning processes? (ISLP is a great machine learning resource but only has one chapter on deep learning...)

/r/deeplearning
https://redd.it/1cx8ilz


Data Scientology

[R] Our new classification algorithm outperforms CatBoost, XGBoost, LightGBM on five benchmark datasets, on accuracy and response time


Hi All!

We're happy to share LinearBoost, our latest development in machine learning classification algorithms. LinearBoost is based on boosting a linear classifier to significantly enhance performance. Our testing shows it outperforms traditional GBDT algorithms in terms of accuracy and response time across five well-known datasets.
The key to LinearBoost's enhanced performance lies in its approach at each estimator stage. Unlike decision trees used in GBDTs, which select features sequentially, LinearBoost utilizes a linear classifier as its building block, considering all available features simultaneously. This comprehensive feature integration allows for more robust decision-making processes at every step.
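To make "boosting a linear classifier" concrete, here is a minimal sketch in plain Python. It is NOT the LinearBoost algorithm from the repo: it assumes classic AdaBoost reweighting and swaps in a weighted class-mean (nearest-centroid) hyperplane as the linear base learner, and all function names are made up for illustration. Labels are assumed to be +1 / -1.

```python
import math

def fit_weighted_centroid(X, y, w):
    """Linear base learner (illustrative choice): the hyperplane halfway
    between the weighted means of the two classes."""
    dim = len(X[0])
    mean = {}
    for label in (+1, -1):
        total = sum(wi for wi, yi in zip(w, y) if yi == label)
        mean[label] = [
            sum(wi * xi[d] for wi, xi, yi in zip(w, X, y) if yi == label) / total
            for d in range(dim)
        ]
    coef = [p - n for p, n in zip(mean[+1], mean[-1])]
    midpoint = [(p + n) / 2 for p, n in zip(mean[+1], mean[-1])]
    bias = -sum(c * m for c, m in zip(coef, midpoint))
    return coef, bias

def linear_predict(model, x):
    """Every feature enters the decision at once through the dot product."""
    coef, bias = model
    return 1 if sum(c * v for c, v in zip(coef, x)) + bias >= 0 else -1

def boost_linear(X, y, rounds=10):
    """AdaBoost loop: refit the linear learner on reweighted samples."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        model = fit_weighted_centroid(X, y, w)
        preds = [linear_predict(model, x) for x in X]
        err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
        if err >= 0.5:          # no better than chance: stop adding learners
            break
        err = max(err, 1e-10)   # avoid log(0) when the learner is perfect
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, model))
        if err <= 1e-10:        # training set already separated
            break
        # Up-weight the samples this learner got wrong, then renormalise.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def boost_predict(ensemble, x):
    """Weighted vote of the linear learners."""
    score = sum(alpha * linear_predict(model, x) for alpha, model in ensemble)
    return 1 if score >= 0 else -1

if __name__ == "__main__":
    X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
    y = [1, 1, 1, -1, -1, -1]
    ensemble = boost_linear(X, y)
    print([boost_predict(ensemble, x) for x in X])
```

Each round still uses every feature, but the weighted vote over several hyperplanes is no longer a single linear boundary, which is where any gain over one linear model would have to come from.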

We believe LinearBoost can be a valuable tool for both academic research and real-world applications. Check out our results and code in our GitHub repo: https://github.com/LinearBoost/linearboost-classifier . The algorithm is in its infancy and has certain limitations, as reported in the GitHub repo, and we plan to address them in future work.

We'd love to get your feedback and suggestions for further improvements, as the algorithm is still in its early stages!

/r/MachineLearning
https://redd.it/1cqv5y4


Data Scientology

[D] ICLR Outstanding Paper Awards. Congratulations!

**Vision Transformers Need Registers**
Timothée Darcet, Maxime Oquab, Julien Mairal, Piotr Bojanowski

Abstract: Transformers have recently emerged as a powerful tool for learning visual representations. In this paper, we identify and characterize artifacts in feature maps of both supervised and self-supervised ViT networks. The artifacts correspond to high-norm tokens appearing during inference primarily in low-informative background areas of images, that are repurposed for internal computations. We propose a simple yet effective solution based on providing additional tokens to the input sequence of the Vision Transformer to fill that role. We show that this solution fixes that problem entirely for both supervised and self-supervised models, sets a new state of the art for self-supervised visual models on dense visual prediction tasks, enables object discovery methods with larger models, and most importantly leads to smoother feature maps and attention maps for downstream visual processing.

**Generalization in diffusion models arises from geometry-adaptive harmonic representations**
Zahra Kadkhodaie, Florentin Guth, Eero P Simoncelli, Stéphane Mallat

Abstract: Deep neural networks (DNNs) trained for image denoising are able to generate high-quality samples with score-based reverse diffusion algorithms. These impressive capabilities seem to imply an escape from the curse of dimensionality, but recent reports of memorization of the training set raise the question of whether these networks are learning the “true” continuous density of the data. Here, we show that two DNNs trained on non-overlapping subsets of a dataset learn nearly the same score function, and thus the same density, when the number of training images is large enough. In this regime of strong generalization, diffusion-generated images are distinct from the training set, and are of high visual quality, suggesting that the inductive biases of the DNNs are well-aligned with the data density. We analyze the learned denoising functions and show that the inductive biases give rise to a shrinkage operation in a basis adapted to the underlying image. Examination of these bases reveals oscillating harmonic structures along contours and in homogeneous regions. We demonstrate that trained denoisers are inductively biased towards these geometry-adaptive harmonic bases since they arise not only when the network is trained on photographic images, but also when it is trained on image classes supported on low-dimensional manifolds for which the harmonic basis is suboptimal. Finally, we show that when trained on regular image classes for which the optimal basis is known to be geometry-adaptive and harmonic, the denoising performance of the networks is near-optimal.

**Learning Interactive Real-World Simulators**
Sherry Yang, Yilun Du, Seyed Kamyar Seyed Ghasemipour, Jonathan Tompson, Leslie Pack Kaelbling, Dale Schuurmans, Pieter Abbeel

Abstract: Generative models trained on internet data have revolutionized how text, image, and video content can be created. Perhaps the next milestone for generative models is to simulate realistic experience in response to actions taken by humans, robots, and other interactive agents. Applications of a real-world simulator range from controllable content creation in games and movies, to training embodied agents purely in simulation that can be directly deployed in the real world. We explore the possibility of learning a universal simulator (UniSim) of real-world interaction through generative modeling. We first make the important observation that natural datasets available for learning a real-world simulator are often rich along different axes (e.g., abundant objects in image data, densely sampled actions in robotics data, and diverse movements in navigation data). With careful orchestration of diverse datasets, each providing

Read more…

Data Scientology

Which NLP master's programs in Europe are more CS-leaning?

I'm (hopefully) going to finish my bachelor's degree in Computational Linguistics and English Studies in Germany (FAU Erlangen-Nürnberg, to be precise) next year, and I'm starting to look into master's programs. As much as I love linguistics, thinking about job prospects I want to do a program that is much heavier on the computer science aspects than the linguistic ones. I sadly haven't been able to take any math courses, and I doubt I'd be able to finish the ones you would have in a normal CS degree before finishing my studies. I do, however, have programming experience in Python and Java, and I've also worked with neural networks before.

I'd like to stay in Europe and I also can't afford places like Edinburgh with those absurd tuition fees (seriously, 31k? who can afford that?). I know Stuttgart is supposed to be good, Heidelberg too, although I don't know how cs-heavy that is considering it's a master of arts. I've also heard about this European Erasmus Mundus LCT Program, although I wonder how likely it would be to get a scholarship for that. Also I'd be a little worried about having to find housing twice in 2 years.

tl;dr

Looking for a CS-heavy NLP master's in Europe (or something else that I could get into with basically no mathematical experience and that enables me to work with machine learning etc. later) that also won't require me to sell a kidney to afford it.

/r/LanguageTechnology
https://redd.it/1cje2kc


Data Scientology

[D] Simple Questions Thread

Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead!

Thread will stay alive until next one so keep posting after the date in the title.

Thanks to everyone for answering questions in the previous thread!

/r/MachineLearning
https://redd.it/1c9jy4b


Data Scientology

Computer vision on an MCU, and I got this fan that follows my every single move! No more manual adjustments or stagnant air!!!

/r/computervision
https://redd.it/1cco0m6


Data Scientology

[D] What comes first in research, math or algorithm?

I'm learning the math behind diffusion right now (DDPM, score-based, and other approaches). I'm wondering how exactly researchers came up with the idea.

Does inventing new approaches go something like this?
1. We want to make a better image generator.
2. Oh, the data will never be enough...
3. Let's multiply the data by adding some noise corruption.
4. This works well; what if we make a denoising network?
5. What if we make a network that makes an image from pure noise?
6. That doesn't work; what if we did smaller denoising steps?
7. This works! Now, let's create some theory on why it works.
8. Write the paper.

Or something like this?
1. We want to make a better image generator.
2. We know "nonequilibrium thermodynamics" really well and want to try applying it somehow
3. We somehow come up with an algorithm that relies on math from that theory
4. It works!
5. We write the paper.

Which comes first usually? Math or Algorithm?
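For anyone following along, here is a small sketch of the piece of math both storylines end up at: the DDPM forward (noising) process, which has the closed form x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps with abar_t the running product of (1 - beta_s). The linear schedule from 1e-4 to 0.02 over 1000 steps is the one used in the original DDPM paper; the function names and the toy 4-element "image" are just illustrative assumptions.

```python
import math
import random

def linear_beta_schedule(steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_1..beta_T, linearly spaced."""
    return [beta_start + (beta_end - beta_start) * t / (steps - 1)
            for t in range(steps)]

def cumulative_alphas(betas):
    """abar_t = prod_{s<=t} (1 - beta_s): fraction of signal variance left."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out

def noise_sample(x0, t, alpha_bars, rng):
    """Jump straight from x_0 to x_t in one shot (no step-by-step loop).

    Returns (x_t, eps); eps is the target a denoising network is usually
    trained to predict from x_t and t.
    """
    abar = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(abar) * v + math.sqrt(1.0 - abar) * e
           for v, e in zip(x0, eps)]
    return x_t, eps

if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    x0 = [0.5, -0.25, 1.0, 0.0]  # toy stand-in for image pixels
    for t in (0, 250, 999):
        x_t, _ = noise_sample(x0, t, alpha_bars, rng)
        print(t, round(alpha_bars[t], 5), [round(v, 3) for v in x_t])
```

The "smaller denoising steps" idea from the first list shows up here as the many small betas: each step removes only a little signal, so each reverse step only has to undo a little noise.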

/r/MachineLearning
https://redd.it/1c64jw0


Data Scientology

It happens

/r/deeplearning
https://redd.it/1bw6tno


Data Scientology

[D] What's more impressive in an ML portfolio: implementing a paper or creating a good project?

Hey guys, what would hiring managers prefer to see in your experience: great implementations of papers or great practical projects? I know both have their pros and cons. But what do managers here on Reddit like to see when going through repos? Would one be better than the other for judging a candidate's skills?

/r/MachineLearning
https://redd.it/1bsezcf


Data Scientology

What are the most interesting CVPR papers this year?

I’ve seen some people promoting their papers, but as a reader what caught your attention from CVPR?

/r/computervision
https://redd.it/1bmz7n3


Data Scientology

Do you regret getting into computer vision?

If so, why and what CS specialization would you have chosen? Or maybe a completely different major?

If not, what do you like the most about your job?

/r/computervision
https://redd.it/1bcwawr


Data Scientology

Can old people learn and get hired?

I am 71, with an all-but-dissertation PhD math background and several years of teaching experience (up through Calculus and Prob/Stat). My programming skills are modest but improving. I have taken a number of machine learning and deep learning courses on Coursera and done quite well. Is it possible for me to get a bachelor’s or master’s degree in computer science or data analytics online and then get a job with an AI company?

If not, what are the best ways to make a positive impact on the field?

I am not in this for the big bucks, as I am comfortably retired, but rather to show that it can be done.

/r/deeplearning
https://redd.it/1b7mf2m


Data Scientology

Why is ViT more commonly used than SWIN?

I'm still reading around, but almost every computer vision paper I read uses ViT as its backbone instead of SWIN or other similar architectures. Why?

The ViT paper had to pre-train its model on the 303M-image JFT dataset to beat earlier convolutional models on ImageNet, whereas SWIN achieves better performance without any pre-training. I imagine SWIN would achieve comparable, if not higher, performance on ImageNet if it were pre-trained the same way, though admittedly I haven't seen any work to validate this idea.

Is this just a case of ViT being first, so now everyone uses it as a default, or is there another reason?

/r/computervision
https://redd.it/1b3be51


Data Scientology

[D] Simple Questions Thread

Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead!

Thread will stay alive until next one so keep posting after the date in the title.

Thanks to everyone for answering questions in the previous thread!

/r/MachineLearning
https://redd.it/1aob7zi


Data Scientology

Transfer Learning vs. Fine-tuning vs. Multitask Learning vs. Federated Learning

/r/deeplearning
https://redd.it/1au2avn


Data Scientology

[D] GPT-4o is "natively" multi-modal: what does this actually mean?

What are your best guesses on how it works (training and architecture) vs. the typical VL formula of pretrained vision encoder + pretrained LLM -> fine-tune with multimodal tasks?

E.g. is it fully mixed-modality pre-training of the entire system? Does the model embed all modalities into a shared space for prediction? Does the system "self-select" the modality of output tokens (can it flexibly choose to output audio vs. text based on input tokens), or is this user-specified?

/r/MachineLearning
https://redd.it/1crzdhd


Data Scientology

a different aspect of the overall experience, UniSim can emulate how humans and agents interact with the world by simulating the visual outcome of both high-level instructions such as “open the drawer” and low-level controls such as “move by x,y” from otherwise static scenes and objects. There are numerous use cases for such a real-world simulator. As an example, we use UniSim to train both high-level vision-language planners and low-level reinforcement learning policies, each of which exhibit zero-shot real-world transfer after training purely in a learned real-world simulator. We also show that other types of intelligence such as video captioning models can benefit from training with simulated experience in UniSim, opening up even wider applications.

**Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors**
Ido Amos, Jonathan Berant, Ankit Gupta

Abstract: Modeling long-range dependencies across sequences is a longstanding goal in machine learning and has led to architectures, such as state space models, that dramatically outperform Transformers on long sequences. However, these impressive empirical gains have been by and large demonstrated on benchmarks (e.g. Long Range Arena), where models are randomly initialized and trained to predict a target label from an input sequence. In this work, we show that random initialization leads to gross overestimation of the differences between architectures and that pretraining with standard denoising objectives, using only the downstream task data, leads to dramatic gains across multiple architectures and to very small gaps between Transformers and state space models (SSMs). In stark contrast to prior works, we find vanilla Transformers to match the performance of S4 on Long Range Arena when properly pretrained, and we improve the best reported results of SSMs on the PathX-256 task by 20 absolute points. Subsequently, we analyze the utility of previously-proposed structured parameterizations for SSMs and show they become mostly redundant in the presence of data-driven initialization obtained through pretraining. Our work shows that, when evaluating different architectures on supervised tasks, incorporation of data-driven priors via pretraining is essential for reliable performance estimation, and can be done efficiently.

/r/MachineLearning
https://redd.it/1co4kfw


Data Scientology

What are the best websites to find state-of-the-art (SOTA) deep learning models at the moment?

Hey everyone, sometimes when I want to explore the best state-of-the-art (SOTA) object detection or classification models, I find myself confused about which models are currently considered the best and freely available. I'm wondering what the best websites are to find the most recent news, as deep learning research is making overwhelming progress and it's hard to keep track.

/r/deeplearning
https://redd.it/1cm9cjm


Data Scientology

Multilabel text classification on unlabeled data

I'm curious what you all think about this approach to do text classification.

I have a bunch of texts varying between 20 and 2000+ words long, each covering varying topics. I'd like to tag them with a fixed set of labels (about 8), e.g. "finance", "tech".

This set of data isn't labelled.

Thus my idea is to perform zero-shot classification with an LLM, treating each label as a binary classification problem.

That is, I explain to the LLM what the "finance" topic means and ask it to reply with "yes" or "no" depending on whether the text is about this topic. If every label returns "no", I'll label the text as "others".

For validation we are thinking of manually labelling a very small sample (just 2 people are working on this) to see how well it works.
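A rough sketch of the setup described above, in case it helps the discussion. Everything named here is an assumption for illustration: `ask_llm` is a hypothetical callable (prompt in, reply text out) that you would wire to whatever LLM you use, the label definitions are placeholders, and the prompt wording is untested.

```python
from typing import Callable, Dict, List, Tuple

# Placeholder definitions; the real set would have about 8 labels.
LABEL_DEFINITIONS: Dict[str, str] = {
    "finance": "earnings, dividends, debt, budgets and other money matters",
    "tech": "software, hardware, IT systems and engineering topics",
}

def build_prompt(label: str, definition: str, text: str) -> str:
    """One yes/no question per (label, document) pair."""
    return (
        f"Topic: {label}\n"
        f"Definition: {definition}\n"
        "Does the text below discuss this topic? Answer yes or no.\n"
        f"Text:\n{text}"
    )

def classify(text: str, ask_llm: Callable[[str], str],
             labels: Dict[str, str] = LABEL_DEFINITIONS) -> List[str]:
    """Run one binary check per label; fall back to 'others' if all say no."""
    tags = []
    for label, definition in labels.items():
        reply = ask_llm(build_prompt(label, definition, text))
        if reply.strip().lower().startswith("yes"):
            tags.append(label)
    return tags or ["others"]

def precision_recall(predicted: List[List[str]], gold: List[List[str]],
                     label: str) -> Tuple[float, float]:
    """Per-label precision and recall on the small hand-labelled sample."""
    tp = sum(label in p and label in g for p, g in zip(predicted, gold))
    fp = sum(label in p and label not in g for p, g in zip(predicted, gold))
    fn = sum(label not in p and label in g for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

if __name__ == "__main__":
    def keyword_stub(prompt: str) -> str:
        """Stand-in for a real LLM so the sketch runs offline."""
        topic = prompt.split("Topic: ")[1].split("\n")[0]
        body = prompt.split("Text:\n")[1].lower()
        return "Yes" if {"finance": "dividend", "tech": "software"}[topic] in body else "No"

    print(classify("The board approved a higher dividend.", keyword_stub))
```

Scoring each label separately on the hand-labelled sample shows which label definitions need rewording, which a single overall accuracy number would hide.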

Does this methodology make sense?

edit:

For more information, the texts are human-made transcripts of shareholder meetings. I'm not sure whether something like a newspaper dataset could be used as a proxy dataset to train a classifier.

/r/LanguageTechnology
https://redd.it/1chew9t


Data Scientology

A visual deep dive into Uber's ML system to solve the billion dollar problem of predicting ETAs.

TL;DR: Uber follows a two-layer approach. They use traditional graph algorithms like Dijkstra, followed by learned embeddings and a lightweight self-attention neural network, to reliably predict the estimated time of arrival (ETA).

How Uber uses ML to predict ETAs

https://preview.redd.it/cg6r82se67xc1.png?width=1358&format=png&auto=webp&s=4ac9e946b30d858721b842f0f4407dfa6c50ce3e
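A toy sketch of the two-layer idea from the TL;DR: a shortest-path travel time from Dijkstra, then a learned correction on top. Treating the second layer as an additive residual is an assumption on my part (the post only says the network comes after the graph algorithm), and the `residual_model` callable is a hypothetical stand-in for the embedding plus self-attention network, not Uber's code.

```python
import heapq

def dijkstra_eta(graph, source, target):
    """Layer 1: shortest travel time in seconds over a road graph.

    graph maps node -> list of (neighbour, seconds) edges.
    Returns float('inf') if target is unreachable.
    """
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for neighbour, seconds in graph.get(node, []):
            candidate = dist + seconds
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return float("inf")

def corrected_eta(routing_eta, features, residual_model):
    """Layer 2 (assumed form): routing ETA plus a learned correction."""
    return routing_eta + residual_model(features)

if __name__ == "__main__":
    roads = {"A": [("B", 60), ("C", 200)], "B": [("C", 90)], "C": []}
    base = dijkstra_eta(roads, "A", "C")
    # Hypothetical hand-written "model": add 30 s during rush hour.
    print(corrected_eta(base, {"rush_hour": 1}, lambda f: 30.0 * f["rush_hour"]))
```

The split keeps the graph search responsible for the route itself, so the learned part only has to model what the map cannot see (traffic, pickup delays and so on).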



/r/deeplearning
https://redd.it/1cf3apc


Data Scientology

Should I learn a low-level language like C or C++? If yes, which one?

Ever since I got interested in AI and ML, I have always used Python. My grip on Python is pretty good now.

Recently I have been wondering whether I should learn a low-level language. I already know one high-level language.

Do you think it will be beneficial to learn another language like C or C++ in the context of AI? Will learning a low-level language help me in AI in some way?

If I should learn one, which one should I choose?

Thanks

/r/deeplearning
https://redd.it/1c9iqsh


Data Scientology

Best Computer Vision Framework in 2024?

Hi everyone,

I'm about to start my internship in Computer Vision, and I'd like to brush up on the basics since I've been primarily focused on NLP recently. For Computer Vision, I've only worked with PyTorch, but after doing some reading, I've realized that TensorFlow is a good alternative, but I am not sure.

So, would it make sense for me to learn TensorFlow now, or should I stick with PyTorch? If there are any other frameworks that you'd recommend, please feel free to share.

/r/computervision
https://redd.it/1c1jucz


Data Scientology

Are there any benefits of using two Nvidia RTX 4090 in a single computer?

Hey everyone! I'm diving into my PhD focusing on deep learning, and I've got a chance to get two RTX 4090s from my faculty. However, I've learned that the 4090s don't support SLI or NVLink, suggesting that communication between the cards might not be very efficient. I'm pondering whether it's worth using two 4090s together, or if it might be overkill. My toolkit includes Python, TensorFlow, Keras, and occasionally Matlab for deep learning tasks. I mainly work with convolutional neural networks for audio classification. A larger VRAM pool would be beneficial, but I'm guessing this won't improve with a second GPU. At least, I could train models faster, right? I could also opt for two 3090s, but since one 4090 seems to outpace two 3090s speed-wise, that option seems less appealing. What do you guys think?
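On the "faster but no bigger VRAM pool" guess: that matches how plain data parallelism works, and a tiny simulation shows why. Each device keeps a full copy of the model, handles half of the batch, and the gradients are averaged before the update. The sketch below fakes the two GPUs with ordinary function calls on a one-parameter model, so it only illustrates the arithmetic; in TensorFlow the real mechanism for this would be `tf.distribute.MirroredStrategy`.

```python
def grad_mse(w, batch):
    """Gradient of mean squared error for the toy model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def single_device_step(w, batch, lr=0.01):
    """Baseline: one device processes the whole batch."""
    return w - lr * grad_mse(w, batch)

def data_parallel_step(w, batch, n_devices=2, lr=0.01):
    """Simulated data-parallel update.

    Every 'device' holds the same w (a full model replica, so memory per
    device is not reduced) and sees only its own shard of the batch.
    Assumes len(batch) is divisible by n_devices; leftovers are dropped.
    """
    shard_size = len(batch) // n_devices
    shards = [batch[i * shard_size:(i + 1) * shard_size]
              for i in range(n_devices)]
    # On real hardware these run concurrently, one shard per GPU.
    grads = [grad_mse(w, shard) for shard in shards]
    # All-reduce: average the gradients (over PCIe when there is no NVLink).
    avg_grad = sum(grads) / n_devices
    return w - lr * avg_grad

if __name__ == "__main__":
    batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
    print(single_device_step(0.0, batch), data_parallel_step(0.0, batch))
```

With equal-sized shards the averaged gradient equals the full-batch gradient, so the update is the same as on one card; the only cost without NVLink is that the averaging step goes over PCIe.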

/r/deeplearning
https://redd.it/1bubc3j


Data Scientology

Should I learn NLP with scikit-learn or with transformers along with PyTorch?



/r/LanguageTechnology
https://redd.it/1bp9v7t


Data Scientology

How much time do AI/ML engineers spend coding?

I have been learning ML for 6 months, but I haven't done any serious big project. I have only done small projects like next-word prediction, sentiment analysis, etc. I have a question about ML and DL: how much of their time at a company do AI and ML engineers spend on coding, and what do they spend most of their time on?

/r/deeplearning
https://redd.it/1bkb95p


Data Scientology

Stuck in a loop while Learning ML/Deep Learning and I feel whatever I have done can be replaced by a basic chatgpt lol

A little bit about myself:
I love data. Since childhood I have loved collecting, storing, and playing with data, especially once I learnt Excel. Hence I really feel this field suits me more than web dev etc.
The thrill of training a model is unparalleled by anything else I've ever done.

Coming to my current situation: I have basic exposure to deep learning from a couple of internships, where I used transfer learning to train a model to classify a specific type of object.
I used ChatGPT and Google to guide me and trained the model to an accuracy of approximately 90%. Another project of mine involved predicting the parameters of an image-improvement algorithm (DenseNet, SqueezeNet; random forest, SVM) from features like skewness, contrast, etc. I used random forest, and the outputs were satisfactory.
Again I used ChatGPT and just trained; it seemed very rudimentary, as I didn't know how to improve the model by myself.
I wouldn't say I learnt nothing, but at the same time I still feel like a noob in machine learning.
Btw, I have placements in a few months, so I'm even more worried about whether I should continue to work on ML or just make basic MERN projects and get placed first.

Now the thing is, I don't have much idea about neural networks or the maths behind them.
I built an NLP project, Twitter sentiment analysis (I know, a very basic YouTube project). I understood the entire process and loved the new learning, but I don't know ANYTHING about LSTMs.

I'm learning the basics of ML now, like regression, KNN, etc., but I don't know the maths behind them.

How should a guy like me proceed to learn machine learning and gradually move towards LLMs and other topics?
This isn't a normal roadmap question, because I have already worked on stuff but I'm still clueless.
Can someone please guide me on how to proceed from here, and tell me whether what I've done is good enough for a fresher in ML or whether I should improve in some other direction?

Edit: I'm 20 years old. And a newbie in this field.

Edit: I have taken subjects like Algebra, Calculus, Probability in my uni. So I know the maths basics. It's just that I'm learning the maths related to ML now.

/r/deeplearning
https://redd.it/1b8thzw


Data Scientology

How do I start with RAG?

I want to learn RAG. Are there any resources out there where I can learn RAG from basic to advanced? Something that starts with simple exercises and gradually increases the difficulty.

/r/LanguageTechnology
https://redd.it/1b3ns1z


Data Scientology

[R] The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

https://arxiv.org/abs/2402.17764

Abstract

>Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}. It matches the full-precision (i.e., FP16 or BF16) Transformer LLM with the same model size and training tokens in terms of both perplexity and end-task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption. More profoundly, the 1.58-bit LLM defines a new scaling law and recipe for training new generations of LLMs that are both high-performance and cost-effective. Furthermore, it enables a new computation paradigm and opens the door for designing specific hardware optimized for 1-bit LLMs.
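Two small things the abstract leaves implicit, sketched in plain Python. First, "1.58 bits" is log2(3), the information content of one ternary weight. Second, a way to get ternary weights: the sketch below follows the absmean scheme as I understand it from the paper (scale by the mean absolute weight, round, clip to [-1, 1]); treat the function name, the epsilon value, and the flat-list weight format as illustrative assumptions rather than the authors' code.

```python
import math

# Three possible values per weight -> log2(3) bits, roughly 1.58.
BITS_PER_TERNARY_WEIGHT = math.log2(3)

def absmean_ternary(weights, eps=1e-8):
    """Quantise a flat list of float weights to {-1, 0, 1}.

    gamma is the mean absolute weight; each weight is divided by gamma,
    rounded to the nearest integer, then clipped to [-1, 1].
    Returns (ternary_weights, gamma); gamma is kept so outputs can be
    rescaled afterwards.
    """
    gamma = sum(abs(w) for w in weights) / len(weights)
    scale = gamma + eps  # eps guards against an all-zero weight list
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, gamma

if __name__ == "__main__":
    print(round(BITS_PER_TERNARY_WEIGHT, 3))
    print(absmean_ternary([0.9, -0.05, 0.4, -1.2]))
```

With weights restricted to -1, 0 and 1, a matrix-vector product needs only additions and subtractions, which is the source of the latency and energy claims in the abstract.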

/r/MachineLearning
https://redd.it/1b22izk


Data Scientology

Switch from Classic Computer vision to Deep Learning

I develop classic Computer Vision algorithms in C++ and want to switch to the Deep Learning area.
For now, I see these options:
Plan #1:

1. Join a small HW/mobile startup
2. Integrate/optimize neural networks (this usually requires C++)
3. Gradually move to NN finetuning.
This may be possible if NN training and integration is made by the same team. That's why the company should be small.

Plan #2:

1. Learn leetcode
2. Apply to FAANG-like companies for a software-engineer-in-test role supporting AI/scientific infrastructure
3. Try to switch from testing utils to NN finetuning.

Plan #3:

1. Learn DL theory to an excellent level
2. Join an intern/junior position in big tech
3. Hope to get a better offer for a mid-level position from them


My skills:

C++: 5 YoE as a main programming language
Python: 2 YoE for auxiliary utils
DL: 1 year of commercial experience, utilized a few networks as black boxes
DL theory: a few online courses
Math: good level
Extra: 5 YoE in Classic Computer Vision

So, I have a couple of questions:

1. What do you think about these plans?
2. Maybe, you can share some relevant experience?
3. Do you have other ideas on how to switch to DL with my experience?
4. Should I do pet projects so I can talk about them when an interviewer asks about my DL experience?

I would also be happy to hear your stories if you did similar transfers.

Thanks in advance!

/r/computervision
https://redd.it/1avnksk


Data Scientology

[R] [P] 10 times faster LLM evaluation with Bayesian optimization

Recently I've been working on making LLM evaluations fast by using Bayesian optimization to select a sensible subset.

Bayesian optimization is used because it's good at balancing exploration and exploitation when querying an expensive black box (here, the LLM).

Project link

I would love to hear your thoughts and suggestions on this!

/r/MachineLearning
https://redd.it/1apv97t
