datascientology | Education

Telegram channel datascientology - Data Scientology

1073

Hot data science related posts every hour. Chat: https://telegram.me/r_channels


Data Scientology

Do you really need strong math (and ML) knowledge to be an NLP engineer?

Let me explain a bit. I come from a humanities bachelor's degree background, but with a strong passion for linguistics. I wanted to specialize in computational linguistics, but gradually I also became very interested in NLP and jobs related to NLP. That being said, I hope the repressed computer engineers don't show up now lol

I'm about to start a master's degree called “Digital Humanities”, which is actually only about language technologies. The program includes subjects such as NLP, computational linguistics, data mining, programming, and data analysis. I know that the Machine Learning (ML) course is fundamental for NLP, but the university's ML course requires strong math foundations and is designed for students with a bachelor's degree in computer science or computer engineering. So I had thought about skipping it and instead taking the course called “Computational Intelligence and Deep Learning”, which focuses more on topics like fuzzy logic and especially artificial neural networks, RNNs, etc., without requiring prior math foundations.
I might also add an Algorithms class (a good class, but not too advanced) to have an additional foundation for NLP.
Then I could study ML on my own through online courses, like the Stanford one on Coursera.

Or would it be better for me to study the math (linear algebra, integral and differential calculus, functions) and attempt the ML exam? Keep in mind that I've already taken a statistics course and enjoyed it, but honestly, I don't have much motivation to study math extensively, especially because I might invest all that effort for nothing: given my background in humanistic informatics, I might only find jobs like data linguist or computational linguist, where strong math and ML knowledge is not necessary.

Certainly, my career goal in NLP isn't to research new algorithms and statistical models. I want to use my linguistics knowledge more in NLP, but not only to do annotation.
I've noticed that many people work as "NLP engineers" who apply algorithms directly: many practical NLP tasks can be accomplished with existing libraries and tools, without delving deep into the underlying mathematical concepts. So obviously you need to know algorithms and deep learning, but you don't need to go too deep into the math research, right?

Or would it be better for me to just give up and focus solely on computational linguistics?

/r/LanguageTechnology
https://redd.it/165epjv


Data Scientology

Getting data from physical circular chart.

/r/computervision
https://redd.it/162xdyo


Data Scientology

Is CV evolving beyond bounding boxes?

Hi all - We (a team of Stanford researchers) wrote a new blog post, "Video Analysis Beyond Bounding Boxes", collecting some of our thoughts on the direction the CV field is heading.

We're actively researching and developing in this space, so we would love to hear feedback on this vision for the future of CV and video analysis.

/r/computervision
https://redd.it/15ydds0


Data Scientology

Vision transformers (ViT)

/r/deeplearning
https://redd.it/15rf0i8


Data Scientology

Your Neural Network Doesn't Know What It Doesn't Know

Hi everyone,

I made a repo trying to collect every high-quality source for Out-of-distribution detection, ranging from articles and talks for beginners to research papers at top conferences. It also has a primer if you are not familiar with the topic. Check it out and give it a star to support me if you find it helpful. Thanks a lot ;)

https://github.com/continuousml

https://preview.redd.it/3dsy0ameoxhb1.png?width=868&format=png&auto=webp&s=4a0c016ab9ad6baeb603bedac1d798572fc41152

/r/computervision
https://redd.it/15q8mx0


Data Scientology

Looking for good learning sources around generative AI, specifically LLM

Are there any good video content sources that explain all the concepts associated with generative AI (e.g., RL, RLHF, transformers) from the ground up in extremely simple language (using analogies or stories familiar to, say, a 10-12 year old)? I would also prefer channels that explain the concepts sequentially (so they are easy to follow) and make short, crisp videos.

If yes, could you kindly comment below with your suggestions? If not, could you comment on whether something like that would be useful to you, and ideally why?

Big thanks in advance 🙏

/r/deeplearning
https://redd.it/15hdu5v


Data Scientology

[D] Simple Questions Thread

Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead!

Thread will stay alive until next one so keep posting after the date in the title.

Thanks to everyone for answering questions in the previous thread!

/r/MachineLearning
https://redd.it/15dnok8


Data Scientology

Promptify 2.0: More Structured, More Powerful LLMs with Prompt-Optimization, Prompt-Engineering, and Structured Json Parsing with GPT-n Models! 🚀

Hello fellow coders and AI enthusiasts! First up, a huge thank you for making Promptify a hit with over **2.3k stars on GitHub**! 🌟 Back in 2022, we were the first to tackle the common challenge of uncontrolled, unstructured outputs from large language models like GPT-3, and your support has pushed us to keep improving. Today, we're thrilled to share some major updates that make Promptify even more powerful:

* **Unified Architecture 🧭**: Introducing Prompter, Model & Pipeline Solution
* **Detailed Output Logs 📔**: Comprehensive structured JSON format output within the log folder.
* **Wider Model Support 🤝:** Supporting models from OpenAI, Azure, Cohere, Anthropic, Huggingface and more - think of it as your universal language model adapter.
* **Robust Parser 🦸‍♂️**: Parser to handle incomplete or unstructured JSON outputs from any LLMs.
* **Ready-Made Jinja Templates 📝:** Jinja prompt templates for NER, Text Classification, QA, Relation-Extraction, Tabular data, etc.
* **Database Integration 🔗**: Direct Promptify-to-MongoDB integration is coming soon. Stay tuned!
* **Effortless Embedding Generation 🧬**: Generate embeddings from various LLMs effortlessly with the new update.

Check out the examples and take Promptify for a spin on GitHub. If you like what you see, we'd be honored if you gave us a star!

**Github**: [https://github.com/promptslab/Promptify](https://github.com/promptslab/Promptify)

Thank you again for your support - here's to more structured AI!

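from promptify import Prompter, OpenAI, Pipeline

sentence = "The patient is a 93-year-old female with a medical..."
api_key = "YOUR_OPENAI_API_KEY"  # placeholder; supply your own key
model = OpenAI(api_key)  # OpenAI-backed model wrapper
prompter = Prompter("ner.jinja")  # NER prompt template, as in the Promptify README
pipe = Pipeline(prompter, model)  # ties the prompt template to the model
result = pipe.fit(sentence, domain="medical", labels=None)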


Output

[ {"E": "93-year-old", "T": "Age"}, {"E": "chronic right hip pain", "T": "Medical Condition"}, {"E": "osteoporosis", "T": "Medical Condition"}, {"E": "hypertension", "T": "Medical Condition"}, {"E": "depression", "T": "Medical Condition"}, {"E": "chronic atrial fibrillation", "T": "Medical Condition"}, {"E": "severe nausea and vomiting", "T": "Symptom"}, {"E": "urinary tract infection", "T": "Medical Condition"}, {"Branch": "Internal Medicine", "Group": "Geriatrics"}, ]
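The sample output above ends with a trailing comma before the closing bracket, which strict JSON parsers reject. Below is a minimal sketch of lenient parsing using only the Python standard library. `parse_lenient` is a hypothetical helper written for illustration, not Promptify's own parser, and the input string is shortened from the output above.

```python
import json
import re

# Shortened copy of the sample output, including the trailing comma.
raw_output = (
    '[ {"E": "93-year-old", "T": "Age"}, '
    '{"E": "osteoporosis", "T": "Medical Condition"}, '
    '{"Branch": "Internal Medicine", "Group": "Geriatrics"}, ]'
)


def parse_lenient(text):
    """Drop commas that sit directly before a closing ] or }, then parse.

    Caveat: this simple regex would also alter such a pattern inside a
    string value, so it is only a rough illustration.
    """
    cleaned = re.sub(r",\s*([\]}])", r"\1", text)
    return json.loads(cleaned)


records = parse_lenient(raw_output)

# Entity records carry "E" (entity text) and "T" (type); the last record
# holds the branch/group classification instead.
entities = [(r["E"], r["T"]) for r in records if "E" in r]
print(entities)
```

Calling `json.loads(raw_output)` directly raises `json.JSONDecodeError`, which is why some cleanup step is needed before the result can be used as structured data.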


/r/LanguageTechnology
https://redd.it/15dfttb


Data Scientology

[D] Simple Questions Thread

Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead!

Thread will stay alive until next one so keep posting after the date in the title.

Thanks to everyone for answering questions in the previous thread!

/r/MachineLearning
https://redd.it/1518fj5


Data Scientology

How essential are strong math and statistics skills for NLP Engineers?

My initial belief was that math and stats would be vital for this field, but I'm seeing mixed information online. Ironically, Google Bard also stated that math and stats are not vital (though I can't help but think that this is inaccurate).

Can anyone confirm and give some feedback? What are the needed core skills?

/r/LanguageTechnology
https://redd.it/150sew0


Data Scientology

I Hit 700K Views in 3 Months with my Opensource Shorts automation framework, ShortGPT

/r/computervision
https://redd.it/150mzll


Data Scientology

A Comparison of Large Language Models (LLMs) in Biomedical Domain
https://provectus.com/blog/comparison-large-language-models-biomedical-domain/

/r/LanguageTechnology
https://redd.it/14x5cge


Data Scientology

[P] TomoSAM, a 3D Slicer extension using SAM to aid the segmentation of 3D data from tomography or other imaging techniques

We are a team at NASA working on modeling the material response of Thermal Protection Systems (TPS). We developed this tool to streamline the segmentation process of micro-tomography data, a necessary step before using the physics solvers within PuMA. However, we believe that TomoSAM is general enough to be useful in other fields, such as medical imaging. The release is fully open-source and you can find more information in the links below:

TomoSAM extension within 3D Slicer

🔗 Github: https://github.com/fsemerar/SlicerTomoSAM

🔗 YouTube tutorial: https://www.youtube.com/watch?v=4nXCYrvBSjk

🔗 Publication: https://arxiv.org/abs/2306.08609

🔬 TomoSAM combines the power of Segment Anything Model (SAM), a cutting-edge deep learning model, with the capabilities of 3D Slicer, a software platform useful for visualization and segmentation.

💡 SAM is a promptable deep learning model developed by Meta AI that can identify objects and generate image masks in a zero-shot manner, requiring only a few user clicks.

⚙️ This integration reduces the need for laborious manual segmentation processes, saving significant time and effort for researchers working with volumetric data.

📄 Our paper outlines the methodology and showcases the capabilities of TomoSAM.

TomoSAM's usage, architecture, and communication system

Feel free to reach out if you have any questions or comments! 🚀

/r/MachineLearning
https://redd.it/14sroe6


Data Scientology

Additional Resources

Hi everyone,

After an [extended blackout](https://old.reddit.com/r/MachineLearning/comments/146ue8q/rmachinelearning_is_joining_the_reddit_blackout), we've decided to reopen the sub since it became pretty clear that if we didn't then the [admins would likely replace us and just reopen](https://kbin.social/m/machinelearning/t/68966/r-MachineLearning-finally-received-a-warning-from-u-ModCodeOfConduct) anyway.

We know lots of you contacted us during the blackout trying to understand how to stay up to date with the latest ML research, news, and discussions. For that reason we are providing additional resources below that either exclusively focus on ml or often discuss ml:

* ~~[taggernews](http://www.taggernews.com/tags/ai/machine%20learning/) - an ml powered classifier for [hackernews](https://news.ycombinator.com/) posts tagged as ai/ml~~
* use [hackernews](https://news.ycombinator.com/) RSS feeds like this one to keep up with posted research https://hnrss.org/newest?q=arxiv+OR+cvpr+OR+aaai+OR+iclr+OR+icml+OR+neurips+OR+emnlp+OR+acl
* reddit has rss feeds for just about any link by just adding `.rss` to the end of the url, so you can follow r/machinelearning using https://reddit.com/r/machinelearning.rss or even follow all posts that link to arxiv using https://reddit.com/domain/arxiv.org.rss
* [lobste.rs/t/ai](https://lobste.rs/t/ai) - posts tagged as ai on [lobste.rs](https://lobste.rs/t/ai)
* [m/machinelearning](https://kbin.social/m/machinelearning/) - a nascent space for ml discussion on [kbin](https://kbin.social)
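The two feed tricks in the list above can be sketched in a few lines of Python. The helper names `hn_search_feed` and `reddit_feed` are made up for illustration. The URL patterns are the ones quoted in the list, and nothing is fetched over the network.

```python
from urllib.parse import urlencode


def hn_search_feed(keywords):
    """Build an hnrss.org feed URL that matches any of the given keywords."""
    query = " OR ".join(keywords)
    # urlencode turns spaces into '+', matching the URL style in the list above.
    return "https://hnrss.org/newest?" + urlencode({"q": query})


def reddit_feed(url):
    """Turn a Reddit listing URL into its RSS feed by appending '.rss'."""
    return url.rstrip("/") + ".rss"


print(hn_search_feed(["arxiv", "cvpr", "neurips"]))
print(reddit_feed("https://reddit.com/r/machinelearning"))
print(reddit_feed("https://reddit.com/domain/arxiv.org/"))
```

The resulting URLs can be pasted into any feed reader.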

You can also find a more thorough list of subs with additional resources here: https://sub.rehab

If you have additional resources you think would be useful, please comment below and we can add them to the list.

EDIT: removed taggernews since it's long defunct

/r/MachineLearning
https://redd.it/14ionyi


Data Scientology

DragGAN code is finally released! (Interactive Point-based Manipulation on the Generative Image Manifold)

https://reddit.com/link/14j92cv/video/dvf302fz0b8b1/player

https://github.com/XingangPan/DragGAN

https://vcai.mpi-inf.mpg.de/projects/DragGAN/

/r/deeplearning
https://redd.it/14j92cv


Data Scientology

Oh ok, cool.

/r/deeplearning
https://redd.it/1648zlm


Data Scientology

Introducing Code Llama, a state-of-the-art large language model for coding
https://ai.meta.com/blog/code-llama-large-language-model-coding/

/r/deeplearning
https://redd.it/1605opp


Data Scientology

Fast CV App: Cross Platform Computer Vision Using Multiprocessing

**Why is this relevant to computer vision?**

In my project I show that a pure Python app that runs 1080p at 30fps on both Windows and Mac is possible. It's good for prototyping, for testing (especially if you can later move to a C variant and make it really fast) and, I hope, in the future for making "serious" apps.

I'm sharing this because I have never seen anybody talk about using multiprocessing, data compression, and a pure Python GUI packaged for Windows/Mac in the context of computer vision. This might be because people on Reddit/Discord/Stack Exchange just don't talk about it, but I really do think this information is locked away with industry professionals.

This is probably because people don't need it if they have a team of people working on a qt frontend and have another team working on computer vision specifically.

I haven't seen anybody working on this information publicly. All the good stuff is closed source in big corporations:


* Example: Mediapipe's Slack channel requires a Google email: https://github.com/google/mediapipe/issues/779#issuecomment-1101212500
* I DEFINITELY do not have access to instagram filters or very specifically how they apply their filter processing. What I do know is that their more complex filters are not 30 fps at all on mobile phones.
* I can't recall off the top of my head other industry standard pose estimation apps that have open source code/documentation...

**What is my project?**

Here I show with Fast CV App that it is possible and that there is room for improvement. For example, I could "blit buffer" to a shared datatype instead of uploading the whole frame to shared memory, or even convert to YUV so that blit buffer on the kivy frontend is even faster, etc etc.

**How it works**

I gave up on threading because I just could not get mediapipe threading on 1080p frames to hit 30fps. As described in the mediapipe docs, it actually drops frames to maintain framerate. I go one step further and actually analyze each frame. I do that by cheating: I read future frames using opencv/ffmpeg, send them to a multiprocessing subprocess to analyze, then receive the frames in kivy to display at the right time. This is where data compression kicks in, because inter-process communication was hell on this pipeline, taking up ~20-30ms, which basically negated the benefits of multiprocessing. This delay meant that instead of 3-4 subprocesses being sufficient, you needed to run ~6-8 subprocesses, which is just not OK. I was stumped on this problem for ~3 months until I realized I could use a compression library like blosc to shrink the 1080p frames I was sending and receiving from 6MB to 3.8MB, spending ~5ms on IPC for a task that previously took ~20-30ms. In hindsight, I think this step is actually a basic solution and probably an industry standard, but none of the multiprocessing tutorials I found talked about compression, so I never thought about it.
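The compression step described above can be sketched roughly as follows. This is not the Fast CV App code. It swaps in the standard-library `zlib` for blosc so the example has no third-party dependencies, uses a synthetic frame instead of real video, and the names `make_fake_frame`, `pack_frame`, and `unpack_frame` are hypothetical.

```python
import zlib

FRAME_SHAPE = (1080, 1920, 3)  # height, width, channels; assumes 8-bit pixels


def make_fake_frame() -> bytes:
    """Build a synthetic 1080p frame as raw bytes (stand-in for a video frame)."""
    height, width, channels = FRAME_SHAPE
    row = bytes(x % 256 for x in range(width * channels))
    return row * height


def pack_frame(raw: bytes, level: int = 1) -> bytes:
    """Compress a frame before handing it to another process.

    A low compression level favors speed over size, which matters when the
    goal is to cut IPC time per frame.
    """
    return zlib.compress(raw, level)


def unpack_frame(packed: bytes) -> bytes:
    """Restore the raw frame bytes on the receiving side."""
    return zlib.decompress(packed)


if __name__ == "__main__":
    frame = make_fake_frame()
    packed = pack_frame(frame)
    assert unpack_frame(packed) == frame
    print(f"raw: {len(frame)} bytes, packed: {len(packed)} bytes")
```

A raw 1080p frame with three 8-bit channels is 1920 × 1080 × 3 bytes, which is the roughly 6MB figure mentioned above. Real frames compress far less than this repetitive synthetic one, so the ratio printed here says nothing about real-world gains.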

A couple tricks/hints:

* Wrapping code in try/except blocks that call print(<error message here>, flush=True) was pretty good at catching silent errors from multiprocessing subprocesses.

* Start your multiprocessing code AFTER an `if __name__ == "__main__":` check or a similar guard so that you don't infinitely spawn subprocesses.
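Both hints can be shown together in a small sketch. The `analyze` function is a hypothetical placeholder for the per-frame work, and the pool size is arbitrary. This illustrates the pattern only and is not code from the project.

```python
import multiprocessing as mp
import traceback


def analyze(frame_id: int) -> int:
    """Placeholder for per-frame analysis (hypothetical)."""
    if frame_id < 0:
        raise ValueError("negative frame id")
    return frame_id * 2


def safe_analyze(frame_id: int):
    """Run analyze() and print any error right away instead of failing silently."""
    try:
        return analyze(frame_id)
    except Exception:
        # flush=True pushes the message out immediately, even from a subprocess.
        print(f"worker failed on frame {frame_id}:\n{traceback.format_exc()}", flush=True)
        return None


if __name__ == "__main__":
    # Without this guard, platforms that start workers by re-importing the main
    # module (Windows, and macOS by default) would re-run the pool creation
    # in every child process.
    with mp.Pool(processes=2) as pool:
        print(pool.map(safe_analyze, [0, 1, -1, 2]))
```

Returning `None` for a failed frame is one possible choice. A real pipeline might instead skip the frame or pass the original through unmodified.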

**Fast CV App links**

Github link:

https://github.com/AccelQuasarDragon/FastCVApp

Multiprocessing/Threading Analysis Video:

https://youtu.be/7-UdBUSfafo

Getting Started:

https://youtu.be/YnhHaKEx7pY

Thanks for your time and have a great day, hope this helps even one person out. Good luck!

/r/computervision
https://redd.it/15wdp3o


Data Scientology

OpenAI Notebooks which are really helpful.

The OpenAI cookbook is one of the most underrated and underused developer resources available today. Here are 7 notebooks you should know about:

1. Improve LLM reliability:
https://github.com/openai/openai-cookbook/blob/main/techniques_to_improve_reliability.md
2. Embedding long text inputs:
https://github.com/openai/openai-cookbook/blob/main/examples/Embedding_long_inputs.ipynb
3. Dynamic masks with DALLE:
https://github.com/openai/openai-cookbook/blob/main/examples/dalle/How_to_create_dynamic_masks_with_DALL-E_and_Segment_Anything.ipynb
4. Function calling to find places nearby:
https://github.com/openai/openai-cookbook/blob/main/examples/Function_calling_finding_nearby_places.ipynb
5. Visualize embeddings in 3D:
https://github.com/openai/openai-cookbook/blob/main/examples/Visualizing_embeddings_in_3D.ipynb
6. Pre and post-processing of Whisper transcripts:
https://github.com/openai/openai-cookbook/blob/main/examples/Whisper_processing_guide.ipynb
7. Search, Retrieval, and Chat:
https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_a_search_API.ipynb

Big thanks to the creators of these notebooks!

/r/deeplearning
https://redd.it/15rihgo


Data Scientology

[D] How to stay on the cutting edge of applied ML/AI while doing my PhD?

A lot of my PhD work will be in using different types of ML/NN approaches to characterizing problems in my field. It's kind of weird, since for my undergrad I came from a more traditional science background where we research off papers that were written like 2-20 years ago. Since a lot of these architectures and whatever are updating so fast, I wanted to see if there's a good way to keep up with the latest information so my work wouldn't be outdated by the time I publish. Is there a general workflow that those of you in the field follow in regards to this?

/r/MachineLearning
https://redd.it/15lnt4g


Data Scientology

resources to learn about training LLMs?

I'd like to train a mini-LLM on a CPU just to get some experience with LLM training. Do y'all have any resources/links to relevant tutorials? I've looked around myself, but I couldn't find too many in-depth tutorials. I'm also interested in building my own toy LLM from scratch, just for better understanding.

/r/deeplearning
https://redd.it/15j3ls5


Data Scientology

[D] NeurIPS 2023 Paper Reviews

NeurIPS 2023 paper reviews are visible on OpenReview. See this tweet. I thought I'd create a thread for us to discuss any issues, complaints, celebrations, or anything else.

There is so much noise in the reviews every year. Some good work that the authors are proud of might get a low score because of the noisy system, given that NeurIPS is growing so large these years. We should keep in mind that the work is still valuable no matter what the score is.

/r/MachineLearning
https://redd.it/15fo7td


Data Scientology

Attention Is Off By One
https://www.evanmiller.org/attention-is-off-by-one.html

/r/deeplearning
https://redd.it/158xmbw


Data Scientology

YoloV8 Body Pose Estimation TensorRT C++ Tutorial (link in comments)

/r/computervision
https://redd.it/156v3e5


Data Scientology

Meta/Facebook just released Llama 2
https://huggingface.co/models?other=llama-2

/r/LanguageTechnology
https://redd.it/1533kuf


Data Scientology

Questions about Transformers

I just started reading about the Transformer model and have barely scratched the surface of the concept. For starters, I have the following 2 questions:

1. How are positional encodings incorporated into the Transformer model? I see that the positional encoding comes immediately after the word embedding, but I don't understand in which part of the network it is used.

2. For a given sentence, the query, key and value weight matrices all seem to have the length of the sentence as one of their dimensions. But the length of a sentence varies, so how do they handle this when subsequent sentences are passed in?
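For readers with the same two questions, a small sketch may help. It assumes the sinusoidal positional encoding from the original Transformer paper and a single query projection. `positional_encoding`, `queries`, and the toy weights `W_q` are illustrative names and values, not taken from any library. It shows two things: the positional encoding is added element-wise to the token embeddings before the first layer, and the projection matrix has shape (d_model, d_k), so it does not depend on sentence length. Only the resulting query matrix grows with the number of tokens.

```python
import math

D_MODEL, D_K = 4, 2  # toy sizes chosen for illustration


def positional_encoding(seq_len, d_model):
    """Sinusoidal encoding: sin on even dimensions, cos on odd dimensions."""
    pe = []
    for pos in range(seq_len):
        row = []
        for j in range(d_model):
            angle = pos / (10000 ** (2 * (j // 2) / d_model))
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


def matmul(a, b):
    """Plain-Python matrix product of a (n x m) and b (m x k)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Query projection: shape (D_MODEL, D_K), fixed regardless of sentence length.
W_q = [[0.1 * (r + c) for c in range(D_K)] for r in range(D_MODEL)]


def queries(token_embeddings):
    """Add positions to the embeddings, then project every token with W_q."""
    pe = positional_encoding(len(token_embeddings), D_MODEL)
    x = [[e + p for e, p in zip(e_row, p_row)]
         for e_row, p_row in zip(token_embeddings, pe)]
    return matmul(x, W_q)  # shape (seq_len, D_K)


short = queries([[0.0] * D_MODEL for _ in range(3)])
long = queries([[0.0] * D_MODEL for _ in range(5)])
print(len(short), len(short[0]))
print(len(long), len(long[0]))
```

The key and value projections work the same way, each with its own fixed-size weight matrix.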

/r/computervision
https://redd.it/14xutf2


Data Scientology

CoViz - A Neural Network Playground built with WebGPU🔥(Compute) and ReactFlow

/r/deeplearning
https://redd.it/14ri42r


Data Scientology

LLMOps.space - curated resources related to LLM & LLMOps

LLMOps space is a community for LLM enthusiasts, researchers, and practitioners. The community will focus on content, discussions, and events around topics related to deploying LLMs into production. 🚀

This includes-

✅ 50+ LLMOps companies
📅 Upcoming events
📚 Educational resources
👩‍💻 Open-source LLM modules
💰 Funding news

Check out the LLMOps community website-
http://llmops.space/

/r/deeplearning
https://redd.it/14qlpzi


Data Scientology

Realistic personal projects to demonstrate knowledge of 3D computer vision

I currently work as an ML engineer with a focus on computer vision. I'm interested in pursuing jobs related to photogrammetry/3D reconstruction/computer graphics and am looking for advice on how to land these kinds of jobs. I have a master's degree and, ideally, would not want to go back for a PhD.

I have picked up Multiple View Geometry by Hartley and Zisserman and plan on working through the book. However, I'm also interested in gaining more hands-on, practical experience in this area. What are some realistic projects I could work on that would showcase my knowledge of 3D vision?

/r/computervision
https://redd.it/14mciux


Data Scientology

Deepchecks' New Open Source is on Product Hunt, and Needs Your Help

Deepchecks’ new ML Monitoring Open Source tool has **just launched on Product Hunt** for 24 hours!!

We’d really appreciate your support at https://www.producthunt.com/all, look for the release by Deepchecks.

Please support us! Comment with feedback, or share with friends!

This Product Hunt comes right after our announcement about Deepchecks' $14M fundraise, and a bit more than a year after releasing our open source ML testing module. We hope that the consistent, significant additions to our repo show our commitment to benefiting the open source community.

Thank you!!

/r/deeplearning
https://redd.it/14chlv7
