Why is everyone asking about C++?
When I look at the job posts I see a lot of C++ requirements for AI/DL/ML related jobs.
I assume this is to create optimized models. However, when I checked online I couldn't find any specific benefit of using C/C++ over Python.
When do they plan to use C/C++, and for what? I checked some benchmark comparisons and the results are very similar too. Furthermore, can't we use Cython instead of C/C++ anyway?
Do you have any ideas about this?
/r/deeplearning
https://redd.it/16zw8ct
“Decoder-only” Transformer models still have an encoder…right? Otherwise how do they “understand” a prompt?
The original transformer model consisted of both encoder and decoder stages. Since that time, people have created encoder-only models, like BERT, which have no decoder at all and so function well as base models for downstream NLP tasks that require rich representations.
Now we also have lots of “decoder-only” models, such as GPT-*. These models perform well at creative text generation (though I don’t quite understand how or why).
But in many (all?) use cases of text generation, you start with a prompt. For example, the user could ask a question, or describe what they want the model to do, and the model generates a corresponding response.
If the model’s architecture is truly decoder-only, by what mechanism does it consume the prompt text? It seems like that should be the role of the encoder, to embed the prompt into a representation the model can work with and thereby prime the model to generate the right response?
So yeah, do “decoder-only” models actually have encoders? If so, how are these encoders different from say BERT’s encoder, and why are they called “decoder-only”? If not, then how do the models get access to the prompt?
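For reference, here is a minimal sketch of the causal (left-to-right) attention pattern used by decoder-only Transformers such as GPT. It assumes the usual setup in which the prompt tokens simply occupy the first positions of the sequence; `causal_mask` and `visible_positions` are illustrative helper names, not part of any library.

```python
def causal_mask(seq_len):
    """Return a seq_len x seq_len grid where mask[i][j] is True
    if position i is allowed to attend to position j (that is, j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def visible_positions(mask, position):
    """List the positions that `position` can attend to."""
    return [j for j, allowed in enumerate(mask[position]) if allowed]


# Hypothetical example: 3 prompt tokens followed by 2 generated tokens.
prompt_len, generated_len = 3, 2
mask = causal_mask(prompt_len + generated_len)

# The first generated token (index 3) attends to every prompt token
# (indices 0-2) and to itself, but not to anything that comes later.
print(visible_positions(mask, prompt_len))
```

Seen this way, the prompt is read by the same stack of layers that produces the reply; the difference from a BERT-style encoder is that attention is restricted to earlier positions instead of running in both directions.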
/r/LanguageTechnology
https://redd.it/16nl811
Is running an open sourced LLM in the cloud via GPU generally cheaper than running a closed sourced LLM?
Assuming the same cloud service, is running an open-source LLM in the cloud on a GPU generally cheaper than running a closed-source LLM? (i.e., do we pay a premium for a closed-source LLM compared to just running anything in the cloud on a GPU?)
One example I am thinking of is running Llama 2 13b GPTQ in Microsoft Azure vs. GPT-3.5 Turbo.
I understand there are a lot of parameters to consider (such as choosing which GPU to use in Microsoft Azure etc.), but I am really looking at what’s the cheapest way to run Llama 2 13b GPTQ or a performance-equivalent closed sourced LLM.
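One way to frame the comparison is to convert a GPU's hourly price into a cost per 1,000 generated tokens, which can then be put next to a per-token API price. This is a minimal sketch: the hourly price and throughput below are placeholders, not real Azure or OpenAI figures, and the function assumes the GPU is busy the whole time.

```python
def self_hosted_cost_per_1k_tokens(gpu_hourly_usd, tokens_per_second):
    """Cost in USD to generate 1,000 tokens on a rented GPU at full utilisation."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_usd / tokens_per_hour * 1000


# Placeholder numbers purely for illustration (NOT measured or quoted prices).
example_cost = self_hosted_cost_per_1k_tokens(gpu_hourly_usd=3.60, tokens_per_second=100)
print(f"{example_cost:.4f} USD per 1K tokens")
```

Any idle time on the rented GPU raises the effective cost per token, so the utilisation assumption matters as much as the hourly price.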
/r/LanguageTechnology
https://redd.it/16p6ceo
Why use ONNX with Triton Inference Server? Why use ONNX in general?
Since Triton supports TensorFlow directly and PyTorch via TorchScript, I was wondering why you would want to convert your model to ONNX. Is it simply to use TensorRT?
Also just wanted to know why use ONNX in general? What are the main advantages?
/r/computervision
https://redd.it/16ogz45
iPhone 15 Stereo Imaging
In yesterday’s keynote event Apple released the iPhone 15 Pro Max. Apparently you can now take 3D images (only available on the iPhone 15 Pro). It uses two of its camera lenses to take two images from slightly different angles and performs stereo imaging to obtain depth.
So I’m sitting here thinking: every iPhone can do that, right? I’m looking at my iPhone 11 Pro Max and thinking about writing an iOS program that can use two lenses to take a “3D image.”
Sounds like a doable project right? I did stereo imaging and depth estimation projects for one of my classes so I think I can take on the challenge.
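For reference, the geometry behind stereo depth can be sketched as follows. This assumes an idealised, rectified pinhole stereo pair, where depth equals focal length times baseline divided by disparity. The numbers are made up for illustration and are not iPhone specifications.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in metres of a point seen by a rectified stereo pair.

    focal_px:     focal length expressed in pixels
    baseline_m:   distance between the two camera centres, in metres
    disparity_px: horizontal shift of the point between the two images, in pixels
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px


# Hypothetical values: 1500 px focal length, 12 mm baseline, 9 px disparity.
print(depth_from_disparity(1500.0, 0.012, 9.0))  # roughly 2 metres
```

The lenses on a phone usually differ in focal length and field of view, so calibration and rectification would have to come before a formula like this applies.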
/r/computervision
https://redd.it/16ihrtk
[D] The ML Papers That Rocked Our World (2020-2023)
Hey everyone! 👋
I’ve been on a bit of a deep-dive lately, trying to catch up on all the awesome stuff that’s been happening in the ML space. It got me wondering, from 2020 to 2023, what have been the absolute must-read papers that shook the foundations and got everyone talking?
Whether it’s something that reinvented the wheel in your specific niche or just made waves industry-wide, I wanna hear about it!
I’m curious to see how different the responses will be, and hey, this might even become a go-to list for anyone looking to get the lowdown on the hottest trends and discoveries of the past few years.
Can’t wait to hear your thoughts!
# tl;dr
I decided to aggregate your best suggestions into categories for anyone interested in reading them without searching through the whole comment section in the future.
## Theoretical:
[Neural Networks are Decision Trees](https://arxiv.org/abs/2210.05189)
Cross-Validation Bias due to Unsupervised Preprocessing
[The Forward-Forward Algorithm: Some Preliminary Investigations](https://arxiv.org/abs/2212.13345)
LoRA: Low-Rank Adaptation of Large Language Models (included here as it has applications beyond LLMs)
[Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets](https://arxiv.org/abs/2201.02177)
## Image:
ViT related:
[An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT)](https://arxiv.org/abs/2010.11929)
Emerging Properties in Self-Supervised Vision Transformers
[Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877v2)
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
[A ConvNet for the 2020s (a CNN that implements several key components that contribute to the performance of Vision Transformers)](https://arxiv.org/abs/2201.03545)
(CLIP) Learning Transferable Visual Models From Natural Language Supervision
Diffusion related:
High-Resolution Image Synthesis with Latent Diffusion Models
[Denoising Diffusion Probabilistic Models (DDPM)](https://arxiv.org/abs/2006.11239)
Classifier-Free Diffusion Guidance
[Taming Transformers for High-Resolution Image Synthesis (VQGAN)](https://arxiv.org/abs/2012.09841)
Segment Anything (SAM)
[DINOv2: Learning Robust Visual Features without Supervision](https://arxiv.org/abs/2304.07193)
Bayesian Flow Networks
## NLP:
[Language Models are Few-Shot Learners (GPT-3)](https://arxiv.org/abs/2005.14165)
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
[Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)
Training Compute-Optimal Large Language Models (Chinchilla)
[The Flan Collection: Designing Data and Methods for Effective Instruction Tuning](https://arxiv.org/abs/2301.13688)
LLaMA: Open and Efficient Foundation Language Models
[Toolformer: Language Models Can Teach Themselves to Use Tools](https://arxiv.org/abs/2302.04761)
## 3D Rendering:
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
[Highly accurate protein structure prediction with AlphaFold](https://www.nature.com/articles/s41586-021-03819-2)
## Misc:
Human-level play in the game of Diplomacy by combining language models with strategic reasoning
For a well-made and maintained list of ML resources (not only the newest like here) you can check out
How do these nickname tools work?
Hey everyone! I recently came across this interesting nickname generator (it is not the only one). It gave me a surprisingly accurate "japanese viking" name, which piqued my curiosity. From a linguistic perspective, how might such a tool understand and combine linguistic elements to produce coherent and culturally relevant nicknames? Does it consider phonetics, morphology, or other linguistic rules? Would love to get your insights!
/r/LanguageTechnology
https://redd.it/16d781w
Coding LLaMA 2 from scratch in PyTorch, with step by step explanation of KV Cache, Grouped Query Attention, Rotary Positional Embedding, RMS Normalization, SwiGLU and much more!
https://www.youtube.com/watch?v=oM4VmoabDAI
/r/deeplearning
https://redd.it/168onwq
Do you really need strong math (and ML) knowledge to be an NLP engineer?
Let me explain a bit. I come from a humanities bachelor's degree background, but with a strong passion for linguistics. I wanted to specialize in computational linguistics, but gradually I also became very interested in NLP and jobs related to NLP. That being said, I hope the repressed computer engineers don't show up now lol
I'm about to start a master's degree called “Digital Humanities”, which is actually only about language technologies. The program includes various subjects like NLP, computational linguistics, data mining, programming, data analysis, etc. I know that the Machine Learning (ML) course is fundamental for NLP, but the university's ML course requires strong math foundations and is designed for those who have a bachelor's degree in computer science or computer engineering. So, I had thought about giving it up and instead taking the course called “Computational Intelligence and Deep Learning”, which focuses more on topics like fuzzy logic and especially artificial neural networks, RNNs, etc., without requiring initial math foundations.
And maybe adding also an Algorithms class (a good class but not too advanced) to have an additional foundation for NLP.
And then I might study ML on my own through private courses like the one from Stanford on platforms like Coursera.
Or would it be better for me to study the math part (linear algebra, integral and differential calculus, functions) and attempt the ML exam? Keep in mind that I've already taken a statistics course and enjoyed it, but honestly, I don't have that much motivation to study math extensively, especially because I might invest so much effort for nothing, since I might only find jobs like data linguist or computational linguist (given my background in humanistic informatics) where such strong math and ML knowledge is not necessary.
Certainly, my career goal in NLP isn't to research new algorithms and statistical models. I want to make more use of my linguistics knowledge in NLP, but not only to do annotation.
I've noticed there are many people working more as "NLP engineers" who directly apply algorithms: many practical NLP tasks can be accomplished using existing libraries and tools without delving deep into the underlying mathematical concepts. So obviously you need to know algorithms and deep learning, but you don't need to go too deep into math research, right?
Or would it be better for me to just give up and focus solely on computational linguistics?
/r/LanguageTechnology
https://redd.it/165epjv
Getting data from physical circular chart.
/r/computervision
https://redd.it/162xdyo
Is CV evolving beyond bounding boxes?
Hi all - We (a team of Stanford researchers) wrote a new blog post on "Video Analysis Beyond Bounding Boxes", collecting some of our thoughts on the direction the CV field is heading.
We're actively researching and developing in this space, so we would love to hear some feedback on this vision for the future of CV and video analysis.
/r/computervision
https://redd.it/15ydds0
Your Neural Network Doesn't Know What It Doesn't Know
Hi everyone,
I made a repo trying to collect every high-quality source for Out-of-distribution detection, ranging from articles and talks for beginners to research papers at top conferences. It also has a primer if you are not familiar with the topic. Check it out and give it a star to support me if you find it helpful. Thanks a lot ;)
https://github.com/continuousml
https://preview.redd.it/3dsy0ameoxhb1.png?width=868&format=png&auto=webp&s=4a0c016ab9ad6baeb603bedac1d798572fc41152
/r/computervision
https://redd.it/15q8mx0
Looking for good learning sources around generative AI, specifically LLM
Are there any good video content sources that explains all the concepts associated with generative AI (ex: RL, RLHF, transformer, etc) from the ground up in extremely simple language (using analogies/stories of things that would be familiar to say a 10-12 year old)? Also would prefer channels which explain the concepts in a sequential manner (so that easy to follow) and make short and crisp videos
If yes, could you kindly comment below with the suggestions. If not, could you comment whether something like that would be useful to you and ideally why also?
Big thanks in advance 🙏
/r/deeplearning
https://redd.it/15hdu5v
[D] Simple Questions Thread
Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead!
Thread will stay alive until next one so keep posting after the date in the title.
Thanks to everyone for answering questions in the previous thread!
/r/MachineLearning
https://redd.it/15dnok8
Stable validation curves on NLP project with BERT
/r/deeplearning
https://redd.it/16w3h3f
Meta Unfolds a 'Universe of AI' Across Instagram, Facebook, and WhatsApp
Meta has unveiled colossal AI updates peppered across its platform that would fundamentally alter user experiences on Instagram, Facebook, and WhatsApp, opening up a "universe of AI" solutions.
For the latest advancements in AI, look here first.
https://preview.redd.it/6od0fkjtp1rb1.png?width=2048&format=png&auto=webp&s=e424d2cd2e614728123005b10431c2c13e780871
Spearheading the AI Universe - Meta AI Chatbot
The “advanced conversational assistant” is set to enhance Messenger, WhatsApp, and Instagram services and will be incorporated into upcoming Ray-Ban Meta smart glasses and Quest 3.
Real-time information capabilities have been bolstered through a partnership with Microsoft Bing, and image generation is powered by a new model, Emu.
A Galaxy of AI Personalities
Meta rolled out 28 AIs in beta, featuring sterling personas such as Snoop Dogg, Tom Brady, Kendall Jenner, and Naomi Osaka, thus amplifying the interactivity quotient.
AI Studio - Empowering Businesses
The AI Studio Platform is equipped to enable businesses to build AI chatbots for messaging services on Facebook, Instagram, and Messenger.
Also, Meta will provide a sandbox tool in the upcoming year for users to experiment with creating their own AI.
Generative AI Stickers - A New Co-creating Experience
AI editing tools will allow users to edit images and co-create content with friends.
The tool uses Llama 2 and the new image generation model, Emu, to convert text prompts into stickers in seconds.
Ray-Ban Smart Glasses with Meta AI
The Ray-Ban smart glasses are equipped with Meta AI, allowing users to receive information, incite creativity, and manage the glasses using just their voice.
(source)
P.S. If you like this kind of analysis, I write a free newsletter with the latest and most impactful news in AI. Professionals from Google, Meta, and OpenAI read it daily.
/r/deeplearning
https://redd.it/16uow1h
How do Large Language Models compare to NLP toolkits for NLP tasks?
I need to do some NLP on text in a number of different languages (English, Spanish, Russian etc). I've experimented using spaCy, stanza and NLTK, as well as some LLMs like ChatGPT, Bard, LLaMa 2 and GPT-4, to do things like lemmatization and POS tagging.
In my experimentation, GPT-4 with adequate prompting outperformed everything else in every language. I wasn't able to spot any errors.
The other LLMs were more or less on par with NLP toolkits: LLMs were a bit more robust to imperfections in the input strings (typos, weird punctuation etc), but were more likely to make very simple mistakes too.
Have you guys tried to use LLMs for NLP?
Can you confirm my experimental results, or did you get a different outcome?
Is anyone trying to take advantage of the power of LLMs for these tasks? For instance, is anyone trying to extract NLP features from the insides of models like LLaMa 2?
/r/LanguageTechnology
https://redd.it/16gtrk4
[R] Unveiling theory of mind in large language models: A parallel to single neurons in the human brain - Harvard University 2023
Paper: https://arxiv.org/abs/2309.01660
Abstract:
>With their recent development, large language models (LLMs) have been found to exhibit a certain level of Theory of Mind (ToM), a complex cognitive capacity that is related to our conscious mind and that allows us to infer another's beliefs and perspective. While human ToM capabilities are believed to derive from the neural activity of a broadly interconnected brain network, including that of dorsal medial prefrontal cortex (dmPFC) neurons, the precise processes underlying LLM's capacity for ToM or their similarities with that of humans remains largely unknown. In this study, we drew inspiration from the dmPFC neurons subserving human ToM and employed a similar methodology to examine whether LLMs exhibit comparable characteristics. Surprisingly, our analysis revealed a striking resemblance between the two, as hidden embeddings (artificial neurons) within LLMs started to exhibit significant responsiveness to either true- or false-belief trials, suggesting their ability to represent another's perspective. These artificial embedding responses were closely correlated with the LLMs' performance during the ToM tasks, a property that was dependent on the size of the models. Further, the other's beliefs could be accurately decoded using the entire embeddings, indicating the presence of the embeddings' ToM capability at the population level. Together, our findings revealed an emergent property of LLMs' embeddings that modified their activities in response to ToM features, offering initial evidence of a parallel between the artificial model and neurons in the human brain.
https://preview.redd.it/2wduugp4svnb1.png?width=1098&format=png&auto=webp&s=d59878eec6a6570a15ac2a3f9d3485a3c140eb73
https://preview.redd.it/qkobarp4svnb1.png?width=1094&format=png&auto=webp&s=08c17207e282effc21149984e88e143f0878c154
https://preview.redd.it/qz9zydp4svnb1.png?width=1116&format=png&auto=webp&s=a08f4257235a60597ec9a85be3cd6c7df409d755
https://preview.redd.it/c0v4qmp4svnb1.png?width=1143&format=png&auto=webp&s=62c238c1bde2bce7e56de5e738ad2abce71d042d
/r/MachineLearning
https://redd.it/16h1tup
I've created a neural network library in c++ and trained image super resolution in it, the results are surprisingly good.
Hey.
To cut the story short, I've created a library in C++ from scratch using only the Eigen library (still writing most algorithms by hand because of terrible Eigen performance). Anyways I've been experimenting with image super resolution for the past 2 weeks, and I finally found the correct formula for creating a reasonably performing image upscaler.
I'm using a really small network with only 5 convolutional layers with really small kernel sizes (5 and 3) and a pixel shuffle layer at the end. The network is trained to correct the error of bicubic interpolation, rather than upscaling the image directly, and that's the reason why it might be performing so 'well', but you can be the judge of that...
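The pixel shuffle step mentioned above can be sketched in plain Python. This follows the channel ordering used by PyTorch's `PixelShuffle`; whether the library described here uses the same layout is an assumption, and `pixel_shuffle` is an illustrative function, not code from the repository.

```python
def pixel_shuffle(x, r):
    """Rearrange C*r*r channels of size H x W into C channels of size (H*r) x (W*r).

    x is a list of channels, each channel a list of rows (nested Python lists).
    Input channel c*r*r + i*r + j supplies the output pixel at sub-position
    (i, j) inside every r x r block of output channel c.
    """
    if len(x) % (r * r) != 0:
        raise ValueError("number of channels must be divisible by r*r")
    c_out = len(x) // (r * r)
    h, w = len(x[0]), len(x[0][0])
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]
                for row in range(h):
                    for col in range(w):
                        out[c][row * r + i][col * r + j] = src[row][col]
    return out


# Four 1x1 channels become one 2x2 channel when upscaling by r=2.
print(pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], 2))
```

In a residual setup like the one described, the output of this layer would be added to the bicubic-upscaled image, so the network only has to predict the correction.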
Here is an example of an image upscaled by the network:
2x Image upscaling
And of course my upscaled pup:
https://preview.redd.it/8frhiak2dwlb1.png?width=1918&format=png&auto=webp&s=05d647b176764dc34350fa9fa9db5b0d71bc38ab
The network mostly just reconstructs the edges in the image and doesn't really 'hallucinate' any new detail, so the results are quite pleasing. (It still outperforms FSR1 by a lot in my testing.) It should also be able to run in real time on a GPU if it were ported...
And here is a link to the tool: https://github.com/Panjaksli/BNN/tree/v1.0a
You can try it out, and tell me what you think. Thanks.
/r/deeplearning
https://redd.it/168b0p7
My master's research has beaten state-of-the-art [R]. I am not sure what to do about it [D].
Hello,
My research (Dissertation for MSc in AI) on applying LLMs to drug binding affinity prediction has beaten previous state-of-the-art in single sequence prediction tasks.
My method yields a correlation of 0.7079 for SMILES and 0.7007 for AA-pockets, which improves upon the previous state-of-the-art correlations of 0.485 and 0.501, respectively. The prior state-of-the-art is described and documented in the paper: "Improved Protein−Ligand Binding Affinity Prediction with Structure-Based Deep Fusion Inference" -> https://pubs.acs.org/doi/10.1021/acs.jcim.0c01306.
However, I don't really know what to do with this information. I did not have a supervisor who led this research and with whom I could discuss it. This is because my supervisor worked in a different field from ML (my university assigned supervisors semi-randomly, and I was given someone who focuses on string algorithms), so we agreed that I would go down my own path (for the past year), as I really wanted to undertake LLM research. Unfortunately, this means no one I know is knowledgeable in the field. My work is currently being marked (I submitted it 2 weeks ago) and I won't get any feedback until November.
My only idea so far is to put it on arXiv, but that's it really. As an MSc student I'm still pretty new to research, so I'm unsure what to do next. Any advice would be useful.
The GitHub to my work can be found here (still a bit of a WIP) https://github.com/jacobmcasey/large-language-models-for-protein-ligand-binding/tree/main
/r/MachineLearning
https://redd.it/169mdnf
Introducing Code Llama, a state-of-the-art large language model for coding
https://ai.meta.com/blog/code-llama-large-language-model-coding/
/r/deeplearning
https://redd.it/1605opp
Fast CV App: Cross Platform Computer Vision Using Multiprocessing
**Why is this relevant to computer vision?**
In my project I show that a pure python app that does 1080p 30fps on both Windows and Mac is possible. It's good for prototyping, for testing (especially if you can just go to a C variant and make it really fast) and I hope in the future, for making "serious" apps.
I'm sharing this because I have never seen anybody talk about using multiprocessing, data compression, and a pure python GUI packaged to windows/mac in the context of computer vision. This might be due to people on reddit/discord/stack exchange just not talking about it but I really do think that this information is just locked to the industry professionals.
This is probably because people don't need it if they have a team of people working on a qt frontend and have another team working on computer vision specifically.
I haven't seen anybody working on this information publicly. All the good stuff is closed source in big corporations:
* Example: Mediapipe's Slack channel requires a Google email: https://github.com/google/mediapipe/issues/779#issuecomment-1101212500
* I DEFINITELY do not have access to instagram filters or very specifically how they apply their filter processing. What I do know is that their more complex filters are not 30 fps at all on mobile phones.
* I can't recall off the top of my head other industry standard pose estimation apps that have open source code/documentation...
**What is my project?**
Here I show with Fast CV App that it is possible and that there is room for improvement. For example, I could "blit buffer" to a shared datatype instead of uploading the whole frame to shared memory, or even convert to YUV so that blit buffer on the kivy frontend is even faster, etc etc.
**How it works**
I gave up on threading because I just could not get mediapipe threading on 1080p frames to hit 30fps. As noted in the mediapipe docs, it actually drops frames to maintain framerate. I go one step further and actually analyze each frame. I do that by cheating and reading the future frames using opencv/ffmpeg, sending those future frames to a multiprocessing subprocess to analyze, then receiving the frames in kivy to display at the right time. This is where data compression kicks in, because inter-process communication was hell on this pipeline, taking up ~20-30ms, which basically negated the benefits of multiprocessing. This delay meant that instead of 3-4 subprocesses being sufficient, you needed to run ~6-8 subprocesses, which is just not ok. I was stumped on this problem for ~3 months until I realized I could use a compression library like blosc to make the 1080p frames I was sending and receiving go from 6MB to 3.8MB, spending ~5ms on IPC on a task that previously took ~20-30ms. In hindsight, I think this step is actually a basic solution and probably an industry standard, but none of the multiprocessing tutorials I found talked about compression, so I never thought about it.
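The compress-before-sending idea can be sketched with the standard library alone. Note the swap: this uses `zlib` as a stand-in for blosc so the example has no third-party dependencies, and the frame is synthetic, so it compresses far better than real video would. `pack_frame` and `unpack_frame` are illustrative names, not functions from Fast CV App.

```python
import zlib

WIDTH, HEIGHT, CHANNELS = 1920, 1080, 3  # one 1080p frame, 8 bits per channel


def make_test_frame():
    """Build a synthetic frame: a horizontal gradient repeated on every row."""
    row = bytes(x * 255 // (WIDTH - 1) for x in range(WIDTH) for _ in range(CHANNELS))
    return row * HEIGHT


def pack_frame(raw):
    """Compress raw frame bytes before handing them to another process."""
    # Level 1 favours speed over compression ratio.
    return zlib.compress(raw, 1)


def unpack_frame(packed):
    """Restore the original frame bytes on the receiving side."""
    return zlib.decompress(packed)


frame = make_test_frame()
packed = pack_frame(frame)
print(len(frame), len(packed))   # raw size vs. compressed size in bytes
assert unpack_frame(packed) == frame  # the round trip is lossless
```

The raw size here is 1920 x 1080 x 3 bytes, which is where the roughly 6MB per frame comes from. Whether compression pays off depends on the time spent compressing versus the IPC time saved, so it is worth measuring both on real frames.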
A couple tricks/hints:
* try/except blocks using a print(<error message here>, flush=True) were pretty good at catching silent errors from multiprocessing subprocesses
* start your multiprocessing code only AFTER an if __name__ == "__main__" check or a similar guard so that you don't infinitely spawn subprocesses.
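Both hints can be sketched together in a few lines. This is a minimal illustration under assumed names: `analyze` and `analyze_safely` are hypothetical, and the per-frame work is a placeholder that fails on purpose for one input.

```python
import multiprocessing as mp
import traceback


def analyze(frame_id):
    """Stand-in for per-frame work; fails on purpose for one input."""
    if frame_id == 2:
        raise ValueError(f"could not analyze frame {frame_id}")
    return frame_id * frame_id


def analyze_safely(frame_id):
    """Run analyze() and make any failure visible from inside the subprocess."""
    try:
        return analyze(frame_id)
    except Exception:
        # flush=True pushes the message out immediately, so it is not left
        # sitting in a buffer if the subprocess exits or is terminated.
        print(f"[worker] frame {frame_id} failed:\n{traceback.format_exc()}", flush=True)
        return None


def main():
    with mp.Pool(processes=2) as pool:
        print(pool.map(analyze_safely, range(4)))


if __name__ == "__main__":
    # Platforms that start workers by re-importing the main module (the
    # "spawn" start method, default on Windows and macOS) would otherwise
    # run the pool-creating code again in every child.
    main()
```

Returning `None` for a failed frame is just one possible convention; the point is that the worker reports the error itself instead of failing silently.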
**Fast CV App links**
Github link:
https://github.com/AccelQuasarDragon/FastCVApp
Multiprocessing/Threading Analysis Video:
https://youtu.be/7-UdBUSfafo
Getting Started:
https://youtu.be/YnhHaKEx7pY
Thanks for your time and have a great day, hope this helps even one person out. Good luck!
/r/computervision
https://redd.it/15wdp3o
OpenAI Notebooks which are really helpful.
The OpenAI cookbook is one of the most underrated and underused developer resources available today. Here are 7 notebooks you should know about:
1. Improve LLM reliability:
https://github.com/openai/openai-cookbook/blob/main/techniques_to_improve_reliability.md
2. Embedding long text inputs:
https://github.com/openai/openai-cookbook/blob/main/examples/Embedding_long_inputs.ipynb
3. Dynamic masks with DALLE:
https://github.com/openai/openai-cookbook/blob/main/examples/dalle/How_to_create_dynamic_masks_with_DALL-E_and_Segment_Anything.ipynb
4. Function calling to find places nearby:
https://github.com/openai/openai-cookbook/blob/main/examples/Function_calling_finding_nearby_places.ipynb
5. Visualize embeddings in 3D:
https://github.com/openai/openai-cookbook/blob/main/examples/Visualizing_embeddings_in_3D.ipynb
6. Pre and post-processing of Whisper transcripts:
https://github.com/openai/openai-cookbook/blob/main/examples/Whisper_processing_guide.ipynb
7. Search, Retrieval, and Chat:
https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_a_search_API.ipynb
Big thanks to the creators of these notebooks!
/r/deeplearning
https://redd.it/15rihgo
[D] How to stay on the cutting edge of applied ML/AI while doing my PhD?
A lot of my PhD work will be in using different types of ML/NN approaches to characterizing problems in my field. It's kind of weird, since for my undergrad I came from a more traditional science background where we research off papers that were written like 2-20 years ago. Since a lot of these architectures and whatever are updating so fast, I wanted to see if there's a good way to keep up with the latest information so my work wouldn't be outdated by the time I publish. Is there a general workflow that those of you in the field follow in regards to this?
/r/MachineLearning
https://redd.it/15lnt4g
resources to learn about training LLMs?
I'd like to train a mini-LLM on a CPU just to get some experience with LLM training. Do y'all have any resources/links to relevant tutorials? I've looked around myself, but I couldn't find too many in-depth tutorials. I'm also interested in building my own toy LLM from scratch, just for better understanding.
/r/deeplearning
https://redd.it/15j3ls5
[D] NeurIPS 2023 Paper Reviews
NeurIPS 2023 paper reviews are visible on OpenReview. See this tweet. I thought I would create a thread for us to discuss any issues, complaints, celebrations, or anything else.
There is so much noise in the reviews every year. Some good work that the authors are proud of might get a low score because of the noisy system, given that NeurIPS is growing so large these years. We should keep in mind that the work is still valuable no matter what the score is.
/r/MachineLearning
https://redd.it/15fo7td