datascientology | Education

Telegram channel datascientology - Data Scientology


Hot data science related posts every hour. Chat: https://telegram.me/r_channels Contacts: @lgyanf


Data Scientology

[D] Self-Promotion Thread

Please post your personal projects, startups, product placements, collaboration needs, blogs etc.

Please mention the payment and pricing requirements for products and services.

Please do not post link shorteners, link aggregator websites, or auto-subscribe links.

--

Any abuse of trust will lead to bans.

Encourage others who create new posts for questions to post here instead!

The thread will stay alive until the next one, so keep posting after the date in the title.

--

Meta: This is an experiment. If the community doesn't like this, we will cancel it. The goal is to encourage members of the community to promote their work without spamming the main threads.

/r/MachineLearning
https://redd.it/1htw7hw


Data Scientology

AutoEncoder Embedding vectors

I have a question about an autoencoder (AE) embedding vector (latent vector). Let's suppose the training set is FashionMNIST.

When we set the loss function, the only thing we specify about the AE training objective is to minimize the difference between the pixels of the input image and the output image. There is no instruction to "map similar items to similar embedding vectors" or to a "similar latent-space region".

But after training, it turns out that similar items are mapped to similar embedding vectors. How can this happen?

Is there any fundamental principle that can explain this phenomenon?

- e.g., because the gradients backpropagate in such a way that ~~~

/r/deeplearning
https://redd.it/1hr2jkm
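
As a quick illustration of the question, here is a minimal sketch (not the original poster's code) that trains a small fully connected autoencoder on FashionMNIST with a purely pixel-wise reconstruction loss and then checks how often an image's nearest neighbour in latent space shares its class:

```python
# Minimal sketch: pixel-wise AE on FashionMNIST, then inspect the latent space.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

class AE(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, dim))
        self.dec = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, 784), nn.Sigmoid())
    def forward(self, x):
        z = self.enc(x)
        return self.dec(z), z

train = datasets.FashionMNIST("data", train=True, download=True, transform=transforms.ToTensor())
loader = DataLoader(train, batch_size=256, shuffle=True)

model = AE().to(device)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(3):                                   # a few epochs are enough to see the effect
    for x, _ in loader:
        x = x.to(device)
        recon, _ = model(x)
        loss = nn.functional.mse_loss(recon, x.flatten(1))   # pixel-wise objective only
        opt.zero_grad(); loss.backward(); opt.step()

# Embed a batch and check nearest neighbours in latent space.
x, y = next(iter(DataLoader(train, batch_size=2048)))
with torch.no_grad():
    _, z = model(x.to(device))
d = torch.cdist(z, z)                                    # pairwise distances between embeddings
nn_idx = d.fill_diagonal_(float("inf")).argmin(dim=1)
print("nearest-neighbour label agreement:", (y == y[nn_idx.cpu()]).float().mean().item())
```

A common intuition: the bottleneck forces the encoder to compress, and because the decoder is a smooth map, nearby codes decode to similar-looking images; to reconstruct visually similar inputs well with only a few latent dimensions, the encoder therefore tends to place them near each other, even though the loss never mentions classes.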


Data Scientology

Why are flat local minima better than sharp local minima?

My goal is to understand how deep learning works. My initial assumptions were:

1. "As long as the loss value reaches 0, all is good; the model parameters are tuned to the training data."
2. "If the training-set loss and the test-set loss have a wide gap, then we have an overfitting issue."
3. "If we have an overfitting issue, throw in a regularization method such as label smoothing."

I don't know the reason behind overfitting.

Now I have read a paper called "Sharpness-Aware Minimization" (SAM). It shattered my assumptions. Now I assume that we should set the learning rate as small as possible and prevent exploding gradients at all costs.

PS: I don't know why exploding gradients are a bad thing if what matters is the lowest loss value. Will the parameters of a model trained with a technique that prevents exploding gradients differ from those of a model trained without that technique?

I searched around a bit and found this image.

PS: I don't know what generalization loss is. How is the generalization loss calculated? Does it use the same loss function but evaluated on the test set instead of the training set?

The image shows two minima, one sharp and one flat. At the sharp minimum there is a large gap relative to the generalization loss; at the flat minimum the gap is small.

Sharp and Flat Minimum

/r/deeplearning
https://redd.it/1hltl6r
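
For readers unfamiliar with the paper, the SAM idea can be sketched in a few lines of PyTorch. This is a simplified illustration, not the authors' official implementation: perturb the weights towards higher loss within a small radius rho, take the gradient there, then apply that gradient to the original weights.

```python
# Sketch of one Sharpness-Aware Minimization (SAM) step (simplified, unofficial).
import torch

def sam_step(model, loss_fn, x, y, base_opt, rho=0.05):
    """One SAM update; base_opt is any ordinary optimizer built over model.parameters()."""
    # First pass: gradient at the current weights.
    loss = loss_fn(model(x), y)
    loss.backward()
    grad_norm = torch.norm(torch.stack(
        [p.grad.norm() for p in model.parameters() if p.grad is not None]))

    # Climb to the (approximate) worst-case point inside a ball of radius rho.
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            eps.append(e)
    model.zero_grad()

    # Second pass: gradient at the perturbed weights.
    loss_fn(model(x), y).backward()

    # Restore the original weights, then step with the sharpness-aware gradient.
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)
    base_opt.step()
    base_opt.zero_grad()
    return loss.item()
```

On the terminology in the question: "generalization loss" in such figures typically just means the same loss evaluated on held-out (test) data rather than training data. A flat minimum is one where small weight perturbations barely change the loss, and empirically that tends to correlate with a smaller gap between training and test loss.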


Data Scientology

Best Computer Vision Books for Beginners to Advanced
https://codingvidya.com/best-computer-vision-books-for-beginners/

/r/computervision
https://redd.it/1hi8425


Data Scientology

What is an interesting/niche NLP task or benchmark dataset that you have seen or worked with?

With LLMs front and center, we're all familiar with tasks like NER, Summarization, and Question Answering.

Yet given the sheer volume of papers submitted to conferences like AACL, I'm sure there are a lot of new/niche tasks out there that don't get much attention. Through my personal project, I've been coming across things like metaphor detection and the cloze test (the latter is likely fairly well known among the CompLing folks).

It has left me wondering - what else is out there? Is there anything that you've encountered that doesn't get much attention?

/r/LanguageTechnology
https://redd.it/1he98sh


Data Scientology

Advice: Math for Deep Learning Book

Hello Everyone,

I want to learn more about the mathematics behind deep learning architectures.
I should mention that I have no university-level mathematical background (I studied medicine), but I have already built deep learning architectures (AE, CNN, GAN) and know the main concepts.
I realize that I need the mathematical reasoning and creativity to design new deep architectures for future medical papers.
Have you read a book on this subject that you would recommend? I have already seen these three books, but I don't know which is the best:

- Math for Deep Learning
- Math and Architectures for Deep Learning
- Essential Math for AI

Thank you very much for your advice

/r/deeplearning
https://redd.it/1h7x5lm


Data Scientology

Can NLP exist outside of AI?

I live in a Turkish-speaking country, and Turkish has a lot of suffixes with a lot of edge cases. As a school project I made an algorithm that can separate the suffixes from the base word. It can also add suffixes to a word. The algorithm relies solely on Turkish grammar rules and does not use AI. Does this count as NLP? If it does, it would be a significant advantage for the project.

/r/LanguageTechnology
https://redd.it/1h4diir
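
Rule-based morphological analysis of this kind predates statistical and neural methods and is generally considered NLP. As a toy illustration (not the poster's algorithm, which handles far more cases), here is a purely rule-based stripper for the Turkish plural suffix -ler/-lar using vowel harmony:

```python
# Toy sketch of rule-based Turkish suffix stripping (plural -ler/-lar only).
# Real Turkish morphology has many more suffixes and edge cases; this just shows
# that the approach needs no machine learning, only hand-written grammar rules.
FRONT_VOWELS = set("eiöü")
BACK_VOWELS = set("aıou")

def last_vowel(word):
    for ch in reversed(word):
        if ch in FRONT_VOWELS or ch in BACK_VOWELS:
            return ch
    return None

def strip_plural(word):
    """Return (stem, suffix) if the word ends in a harmonic plural suffix, else (word, None)."""
    for suffix, vowels in (("ler", FRONT_VOWELS), ("lar", BACK_VOWELS)):
        if word.endswith(suffix):
            stem = word[:-len(suffix)]
            if last_vowel(stem) in vowels:   # the suffix must harmonise with the stem's last vowel
                return stem, suffix
    return word, None

print(strip_plural("evler"))     # ('ev', 'ler')    -- "houses"
print(strip_plural("kitaplar"))  # ('kitap', 'lar') -- "books"
print(strip_plural("kalem"))     # ('kalem', None)  -- no plural suffix
```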


Data Scientology

From humanities to NLP

How impossible is it for a humanities student (specifically English) to get a job in the world of computational linguistics?



To give you some background: I graduated with a degree in English Studies in 2021 and since then I have not known how to fit my studies into a real job without having to be an English teacher. A year ago I found an approved UDIMA course (Universidad a Distancia de Madrid) on Natural Language Processing at a school aimed at humanistic profiles (philology, translation, editing, proofreading, etc.) to introduce them to the world of NLP. I understand that the course serves as a basis and that from there I would have to continue studying on my own. This course also gives the option of doing an internship at a company, so I could at least get some experience in the sector. The problem is that I am still trying to understand what Natural Language Processing is and why we need it, and from what I have seen there is a lot of statistics and mathematics, which I have never been good at. It is quite a leap, going from analyzing old texts to programming. I am 27 years old and I feel like I am running out of time. I do not know if this field is too saturated or if (especially in Spain) profiles like mine are needed: people with a humanities background who are training to acquire technical skills.



I ask for help from people who have followed a similar path to mine or directly from people who are working in this field and can share with me their opinion and perspective on all this.



Thank you very much in advance.

/r/LanguageTechnology
https://redd.it/1h12gyo


Data Scientology

Reverse Face Search Technology

I built a free tool that lets you search your face across the internet using Face Recognition Technology. Check it out and see what you discover.

Try FaceOnLive Free Face Search Online - instant & no signup required.

/r/computervision
https://redd.it/1gwhrn0


Data Scientology

[D] Paper Club: Nvidia Researcher Ethan He Presents Upcycling LLMs in MoE

Hey all,


Tomorrow, Nvidia researcher Ethan He will be doing a technical dive into his work: Upcycling LLMs in Mixture of Experts (MoE). Excited to get a peek behind the curtain and see what it is like to work on models at this scale at Nvidia.


If you’d like to join the community tomorrow at 10 AM PST, we’d love to have you. We do it live over Zoom, and anyone is welcome to join.

Here's the paper: https://arxiv.org/abs/2410.07524
Join us live: https://lu.ma/arxivdive-31

/r/MachineLearning
https://redd.it/1grjjlz


Data Scientology

[Dataset Request] Looking for Animal Behavior Detection Dataset with Bounding Boxes

/r/deeplearning
https://redd.it/1go0x9m


Data Scientology

Beware of Latitude.sh

Hello,

The server provider "Latitude.sh" is starting to gain traction in the AI deep learning industry. I have had a very negative experience with them, and I would like to share it with you all so you can be careful when getting your own servers.

Shortly after I signed up to the Latitude platform and verified my team (which requires you to deposit $100 in credits via crypto), my account was unverified and then instantly terminated. I contacted the support team and they said it was banned because "The account has been blocked due to suspicious" [sic], and they refused to provide any insight or a way to get the account unblocked. I politely asked their team for a refund of the $100 I had deposited in credits in order to sign up to the platform, but it was denied within two minutes of my asking, with a boilerplate "We have carefully reviewed your refund request" and a statement that they will not give me a refund. This is highly unacceptable treatment of a new client signing up to use the service for AI deep learning workloads, especially having to pay via crypto so that I am unable to charge back.

After talking to some other people who have attempted to use this service, they all had similar experiences and wish they had never touched this provider, which is clearly scamming people.

So, beware of Latitude: they will scam you out of $100 and any additional funds you deposit into your account. I also noticed a declined $500 charge on my card shortly after my termination (they additionally require you to add a card to your account for verification), which was denied by my card issuer for failing 3D Secure.

I absolutely do not recommend this provider to anyone looking to get servers for AI purposes, and I recommend using a more competent provider such as Hetzner or OVH.

Thank you.

/r/deeplearning
https://redd.it/1gkl3t9


Data Scientology

Control a gimbal (reCamera) using LLMs (locally deployed on an NVIDIA Jetson Orin)! Say "turn left at 40 degrees" and it works!

/r/computervision
https://redd.it/1gfhao0


Data Scientology

x.infer - Framework-agnostic computer vision inference.

I spent the past two weekends building x.infer, a Python package that lets you run computer vision inference using the framework of your choice. I hope x.infer makes it easier to experiment with new models without having to learn a new framework.

https://i.redd.it/f6nc4tzu5uwd1.gif

It currently supports models from transformers, Ultralytics, timm, and vLLM. Combined, this covers over 1,000 computer vision models. You can easily add your own model.


Repo - https://github.com/dnth/x.infer

Colab quickstart - https://colab.research.google.com/github/dnth/x.infer/blob/main/nbs/quickstart.ipynb

Why did I make this?

It's mostly just for fun. I wanted to practice some design-pattern principles I picked up in the past. The code is still messy, but it works.

Also, I enjoy playing around with new vision models, but not so much learning about the framework each one is written in.

I'm working on this during my free time. Contributions/feedback are more than welcome! Hope this also helps you (especially newcomers) to experiment and play around with new vision models.

/r/computervision
https://redd.it/1gbmuum


Data Scientology

CloudPeek: a lightweight, single-header, cross-platform C++ point cloud viewer

https://preview.redd.it/mkwbsg22fxvd1.png?width=1946&format=png&auto=webp&s=5bddf24571cf4ffe1df08fea6d8312e8e663164a

Introducing my latest project, CloudPeek: a lightweight, single-header, cross-platform C++ point cloud viewer designed for simplicity and efficiency, without relying on heavy external libraries like PCL or Open3D. It provides an intuitive way to visualize and interact with 3D point cloud data across multiple platforms. Whether you're working with LiDAR scans, photogrammetry, or other 3D datasets, CloudPeek delivers a minimalistic yet powerful tool for seamless exploration and analysis, all with just a single header file.

Find more about the project on GitHub official repo: CloudPeek

My contact: Linkedin


#PointCloud #3DVisualization #C++ #OpenGL #CrossPlatform #Lightweight #LiDAR #DataVisualization #Photogrammetry #SingleHeader #Graphics #OpenSource #PCD #CameraControls

/r/computervision
https://redd.it/1g81d9k


Data Scientology

Looking for a CV group

Hi All,

I am looking for folks who are in computer vision/ ML space who might be interested in forming a small group to do weekly paper readings. One of my favorite things in grad school was being able to keep up to date with SOTA in CV/ML using research group meetings where folks would do a short form presentation, followed by discussion. My work is closely related to 3D computer vision and CV deep learning but I am not up to date with the latest and the greatest.

Alternatively, if there are groups or discords already out there, I would be happy to join them.

/r/deeplearning
https://redd.it/1htukqk


Data Scientology

Looking for Good Cameras Under $350 for Autonomous Vehicles (Compatible with Jetson Nano)

Hi everyone,

I'm working on a project to build an autonomous vehicle that can detect lanes and navigate without a driver. For our last competition, we used a 720p Logitech webcam, and it performed decently overall. However, when the sun was directly overhead, we had a lot of issues with overexposure, and the camera input became almost unusable.

Since we are aiming for better performance in varying lighting conditions, we’re now looking for recommendations on cameras that would perform well for autonomous driving tasks like lane detection and obstacle recognition. Ideally, we're looking for something under $350 that can handle challenging environments (bright sunlight, low-light situations) without the overexposure problem we encountered.

It’s also important that the camera be compatible with the Jetson Nano, as that’s the platform we are using for our project.

If anyone here has worked on a similar project or has experience with cameras for autonomous vehicles, I’d love to hear your advice! What cameras have worked well for you? Are there specific features (like high dynamic range, wide field of view, etc.) that you’d recommend focusing on? Any tips for improving camera performance in harsh lighting conditions?

Thanks in advance for your help!

/r/computervision
https://redd.it/1hqeggo


Data Scientology

If you were to start from scratch, how would you delve into CL/NLP/LT?

Hello!

I graduated with a degree in Linguistics (lots of theoretical stuff) a few months ago and I would like to pursue a master's degree focusing on CL/NLP/LT in the upcoming year.

I was able to take a course on "computational methods" used in linguistics before graduating, which essentially introduced me to NLP practices/tools such as regex, transformers and LLMs. Although the course was very useful, it was designed to serve as an introduction and not teach us very advanced stuff. And since there is still quite a lot of time until the admissions to master's programs start, I am hoping to brush up on what might be most useful for someone wanting to pursue a master's degree in CL/NLP/LT or learn completely new things.

So, my question is this: Considering what you do -whether working in the industry or pursuing higher education- how would you delve into CL/NLP/LT if you were to wake up as a complete beginner in today's world? (Feel free to consider me a "newbie" when giving advice, some other beginners looking for help might find it more useful that way). What would your "road map" be when starting out?

Do you think it would be better to focus on computer science courses (I was thinking of Harvard's CS50) to build a solid background in CS first, learn how to code using Python or learn about statistics, algorithms, maths etc.?

I am hoping to dedicate around 15-20 hours every week to whatever I will be doing and just to clarify, I am not looking for a way to get a job in the industry without further education; so, I am not looking for ways to be an "expert". I am just wondering what you think would prepare me the best for a master's program in CL/NLP/LT.

I know there probably is no "best" way of doing it but I would appreciate any advice or insight. Thanks in advance!

/r/LanguageTechnology
https://redd.it/1hk338l


Data Scientology

[D] Best survey papers of 2024?

As an AI researcher who is starting out, I usually start by seeing survey papers related to a field, then creating a roadmap to further deep dive into my research topic. I am eager to see the sub's viewpoint of the best survey papers they came across in 2024.

/r/MachineLearning
https://redd.it/1hgwjqu


Data Scientology

[D] The winner of the NeurIPS 2024 Best Paper Award sabotaged the other teams

Presumably, the winner of the NeurIPS 2024 Best Paper Award (a researcher from ByteDance, the creators of TikTok) sabotaged the other teams to derail their research and redirect their resources to his own. Plus, he sat in on meetings debugging his colleagues' code, so he was always one step ahead. There is a call to withdraw his paper.

https://var-integrity-report.github.io/

I have not checked the facts myself, so if you can verify what is asserted, it would be nice to confirm whether this is true.

/r/MachineLearning
https://redd.it/1hctf36


Data Scientology

[D] Monthly Who's Hiring and Who Wants to be Hired?

For job postings, please use this template:

>Hiring: [Location], Salary: [], [Remote | Relocation], [Full Time | Contract | Part Time] and [Brief overview, what you're looking for]

For those looking for jobs, please use this template:

>Want to be Hired: [Location], Salary Expectation: [], [Remote | Relocation], [Full Time | Contract | Part Time], Resume: [Link to resume] and [Brief overview, what you're looking for]

Please remember that this community is geared towards those with experience.

/r/MachineLearning
https://redd.it/1h3u444


Data Scientology

PyTorch implementation of Levenberg-Marquardt training algorithm

Hi everyone,

In case anyone is interested, here’s a PyTorch implementation of the Levenberg-Marquardt (LM) algorithm that I’ve developed.

GitHub Repo: torch-levenberg-marquardt

A PyTorch implementation of the Levenberg-Marquardt (LM) optimization algorithm, supporting mini-batch training for both regression and classification problems. It leverages GPU acceleration and offers an extensible framework, supporting diverse loss functions and customizable damping strategies.

A TensorFlow implementation is also available: tf-levenberg-marquardt

# Installation

pip install torch-levenberg-marquardt

/r/deeplearning
https://redd.it/1h4m51a
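
For readers unfamiliar with Levenberg-Marquardt, the core update can be sketched in plain PyTorch. This is not the package's API, just an illustration of the damped Gauss-Newton step the method is built on, fitting y = a*exp(b*x) on synthetic data:

```python
# Sketch of the Levenberg-Marquardt update itself (not the torch-levenberg-marquardt API):
# repeatedly solve (J^T J + lam * diag(J^T J)) delta = -J^T r and adapt the damping lam.
import torch

torch.manual_seed(0)
x = torch.linspace(0.0, 2.0, 50)
y = 2.0 * torch.exp(1.5 * x) + 0.05 * torch.randn(50)   # noisy samples of y = 2*exp(1.5*x)

def residuals(params):
    a, b = params
    return a * torch.exp(b * x) - y                      # r(p) = f(x; p) - y

params = torch.tensor([1.0, 1.0])                        # initial guess for (a, b)
lam = 1e-2                                               # damping factor
for _ in range(100):
    r = residuals(params)
    J = torch.autograd.functional.jacobian(residuals, params)   # shape (50, 2)
    JTJ, JTr = J.T @ J, J.T @ r
    delta = torch.linalg.solve(JTJ + lam * torch.diag(torch.diag(JTJ)), -JTr)
    candidate = params + delta
    # Accept the step and shrink the damping if the loss improved, otherwise grow it.
    if residuals(candidate).square().sum() < r.square().sum():
        params, lam = candidate, lam * 0.5
    else:
        lam = lam * 2.0

print(params)   # should end up close to (2.0, 1.5)
```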


Data Scientology

Feature extraction

What is the best way to extract features of a detected object?

I have a YOLOv7 model trained to detect (relatively) small objects divided into 4 classes, and I need to track them through the frames from a camera. The idea is to track them by matching each detection's features against those from the last frame, with a threshold.

What is the best way to do this?
- Is there a way to get them directly from the YOLOv7 inference?
- If I train a classifier (ResNet) to get the features from the final layer, what is the best way to organise the data? Should I keep the same 4 classes I used to train the detection model, or should I organise them differently?

/r/computervision
https://redd.it/1gysea1
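
One common approach (a sketch, not the only answer): reuse an ImageNet-pretrained backbone from torchvision as an appearance-embedding extractor for the YOLO crops, then match detections across frames by cosine similarity:

```python
# Sketch: appearance embeddings for detected crops + greedy cosine-similarity matching.
import torch
import torch.nn as nn
from torchvision import models, transforms

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Identity()              # drop the classifier head; output is a 512-d vector
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(crops):
    """crops: list of PIL images cut out of the frame using the YOLOv7 boxes."""
    batch = torch.stack([preprocess(c) for c in crops])
    feats = backbone(batch)
    return nn.functional.normalize(feats, dim=1)         # unit-length embeddings

def match(prev_feats, curr_feats, threshold=0.7):
    """Greedily match current detections to previous ones by cosine similarity."""
    sim = curr_feats @ prev_feats.T                       # (num_curr, num_prev)
    matches = {}
    for i, row in enumerate(sim):
        j = int(row.argmax())
        if row[j] >= threshold:
            matches[i] = j                                # current detection i -> previous track j
    return matches
```

Dedicated trackers such as DeepSORT follow the same idea but use a re-identification network trained specifically for appearance matching, which usually works better than raw ImageNet features.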


Data Scientology

FLUX&OpenSora for Editing!

https://redd.it/1guq0yh
@datascientology


Data Scientology

CV Experts: what parts of your workflow have the worst usability?

I often hear that CV tools have a tough UX - even for industry professionals. While there are a lot of great tools available, the complexity of using them can be a barrier. If the learning curve were lower, CV could potentially be adopted more widely in sectors with lower tech expertise, like retail, agriculture, and small-scale manufacturing.

In your CV workflow, where do you find usability issues are the worst? Which part of the flow is the most challenging or frustrating to work with?

Thanks for sharing any insights!

/r/computervision
https://redd.it/1gpui26


Data Scientology

Ivy x Kornia: Now Supporting TensorFlow, JAX, and NumPy! 🚀

Hey r/computervision!

Just wanted to share something exciting for those of you working across multiple ML frameworks.

Ivy is a Python package that allows you to seamlessly convert ML models and code between frameworks like PyTorch, TensorFlow, JAX, and NumPy. With Ivy, you can take a model you’ve built in PyTorch and easily bring it over to TensorFlow without needing to rewrite everything. Great for experimenting, collaborating, or deploying across different setups!

On top of that, we’ve just partnered with Kornia, a popular differentiable computer vision library built on PyTorch, so now Kornia can also be used in TensorFlow, JAX, and NumPy. You can check it out in the latest Kornia release (v0.7.4) with the new methods:

`kornia.to_tensorflow()`
`kornia.to_jax()`
`kornia.to_numpy()`

These new methods leverage Ivy’s transpiler, letting you switch between frameworks seamlessly without rewriting your code. Whether you're prototyping in PyTorch, optimizing with JAX, or deploying with TensorFlow, it's all smoother now.

Give it a try and let us know what you think! You can check out Ivy and some demos here:

Ivy on GitHub
[Ivy Demos](https://www.docs.ivy.dev/demos/examples_and_demos.html)
Ivy Discord

Happy coding!

https://preview.redd.it/a7kawqkl6mzd1.jpg?width=1104&format=pjpg&auto=webp&s=d14253cdba9f0064229c0e3e78b5cf8ddf52f6c6

/r/computervision
https://redd.it/1gmbesd


Data Scientology

CL/NLP/LT Master's Programs in Europe

Hello! (TL;DR at the bottom)

I am quite new here since I stumbled upon the subreddit by chance while looking up information about a specific master's program.

I recently graduated with a bachelor's degree in (theoretical) Linguistics (phonology, morphology, syntax, semantics, sociolinguistics etc.) and I loved my major (graduated with almost a 3.9 GPA) but didn't want to rush into a master's program blindly without deciding what I would like to REALLY focus on or specialize in. I could always see myself continuing with theoretical linguistics stuff and eventually going down the 'academia' route; but realizing the network, time and luck one would need to have to secure a position in academia made me have doubts. I honestly can't stand the thought of having a PhD in linguistics just because I am passionate about the field, only to end up unemployed at the age of 30+, so I decided to venture into a different branch.

I have to be honest, I am not the most well-versed person out there when it comes to CL or NLP but I took a course focusing on computational methods in linguistics around a year ago, which fascinated me. Throughout the course, we looked at regex, text processing, n-gram language models, finite state automata etc. but besides the little bit of Python I learned for that course, I barely have any programming knowledge/experience (I also took a course focusing on data analysis with R but not sure how much that helps).

I am not pursuing any degree as of now; you can consider it something similar to a gap year. Since I want to look into CL/NLP/LT-specific programs, I think I can use my free time to gain some programming knowledge by the time the application periods start; I have at least 6-8 months, after all.

I want to apply to master's programs for the upcoming academic year (2025/2026) and I have already started researching. However, not long after I started, I realized that there were quite a few programs available and they all had different names, different program content and approaches to the area of LT(?). I was overwhelmed by the sheer number of options; so, I wanted to make this post to get some advice.

I would love to hear your advice/suggestions if anyone here has completed, is still doing or has knowledge about any CL/NLP/LT master's program that would be suitable for someone with a solid foundation in theoretical linguistics but not so much in CS, coding or maths. I am mainly interested in programs in Germany (I have already looked into a few there such as Stuttgart, Potsdam, Heidelberg etc. but I don't know what I should look for when deciding which programs to apply to) but feel free to chime in if you have anything to say about any program in Europe. What are the most important things to look for when choosing programs to apply to? Which programs do you think would prepare a student the best, considering the 'fluctuating' nature of the industry?

P.S.: I assume there are a lot of people from the US on the subreddit but I am not located anywhere near, so studying in the US isn't one of my options.

TL;DR: Which CL/NLP/LT master's programs in Europe would you recommend to someone with a strong background in Linguistics (preferably in Germany)?

/r/LanguageTechnology
https://redd.it/1gfrnux


Data Scientology

Is a Linguistics major, CS minor, and Stats minor enough to get into a CL/NLP masters program?

Obviously a CS major would be ideal, but since I'm a first year applying out of stream, there is a good chance I won't get into the CS major program. Also, the CS minor would still allow me to take an ML course, a CL course, and an NLP course in my third/fourth years. Considering everything, is this possible? Is there a different minor that would be better suited to CL/NLP than Stats?

/r/LanguageTechnology
https://redd.it/1gbgyve


Data Scientology

Is POS tagging (like with Viterbi HMM) still useful for anything in industry in 2024? Moreover, have you ever actually used any of the older NLP techniques in an industry context?

I have a background in a Computer Science + Linguistics BS and a couple of years of industry experience as an AI software engineer (mostly implementing LLMs with Python for chatbots/topic modeling/insights).

I'm currently doing a part time master's degree and in a class that's revisiting all the concepts that I learned in undergrad and never used in my career.

You know, Naive Bayes, Convolutional Neural Networks, HMMs/Viterbi, N-grams, Logistic Regression, etc.

I get that there is value in having "foundational knowledge" of how things used to be done, but the majority of my class is covering concepts that I learned and later forgot because I never used them in my career. And now I'm working full time in AI, taking an AI class to get better at my job, only to learn concepts that I already know I won't use.

From what I've read in the literature, and what I've experienced, system prompts and/or fine-tuned LLMs more or less beat traditional models at nearly all tasks. And even in cases where they don't, LLMs eliminate the huge hurdle in industry of finding the time/resources to make a quality training data set.



I won't pretend that I'm senior enough to know everything, or that I have enough experience to invalidate the relevance of PhDs with far more knowledge than me. So please, if anybody can make a point about how any of these techniques still matter, please let me know. It'd really help motivate me to learn them more in depth and maybe apply them to my work.



/r/LanguageTechnology
https://redd.it/1g8brrn
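
For readers who have not met it, Viterbi decoding for an HMM tagger fits in a few lines; the probabilities below are made up purely for illustration (a real tagger would estimate them from a treebank):

```python
# Toy sketch of Viterbi decoding for an HMM POS tagger (illustrative probabilities only).
import math

states = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    "DET":  {"DET": 0.05, "NOUN": 0.9,  "VERB": 0.05},
    "NOUN": {"DET": 0.1,  "NOUN": 0.3,  "VERB": 0.6},
    "VERB": {"DET": 0.5,  "NOUN": 0.4,  "VERB": 0.1},
}
emit_p = {
    "DET":  {"the": 0.9, "dog": 0.0, "barks": 0.0},
    "NOUN": {"the": 0.0, "dog": 0.8, "barks": 0.2},
    "VERB": {"the": 0.0, "dog": 0.1, "barks": 0.9},
}

def viterbi(words):
    # V[t][s] = best log-probability of any tag sequence ending in state s at position t
    V = [{s: math.log(start_p[s] * emit_p[s][words[0]] + 1e-12) for s in states}]
    back = [{}]
    for t in range(1, len(words)):
        V.append({}); back.append({})
        for s in states:
            scores = {p: V[t-1][p] + math.log(trans_p[p][s] + 1e-12) for p in states}
            best_prev = max(scores, key=scores.get)
            V[t][s] = scores[best_prev] + math.log(emit_p[s][words[t]] + 1e-12)
            back[t][s] = best_prev
    # Backtrace from the best final state.
    tags = [max(states, key=lambda s: V[-1][s])]
    for t in range(len(words) - 1, 0, -1):
        tags.append(back[t][tags[-1]])
    return list(reversed(tags))

print(viterbi(["the", "dog", "barks"]))   # expected: ['DET', 'NOUN', 'VERB']
```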


Data Scientology

[D] Self-Promotion Thread

Please post your personal projects, startups, product placements, collaboration needs, blogs etc.

Please mention the payment and pricing requirements for products and services.

Please do not post link shorteners, link aggregator websites, or auto-subscribe links.

--

Any abuse of trust will lead to bans.

Encourage others who create new posts for questions to post here instead!

The thread will stay alive until the next one, so keep posting after the date in the title.

--

Meta: This is an experiment. If the community doesn't like this, we will cancel it. The goal is to encourage members of the community to promote their work without spamming the main threads.

/r/MachineLearning
https://redd.it/1g2fmfw
