datascientology | Education

Telegram channel datascientology - Data Scientology

Hot data science related posts every hour. Chat: https://telegram.me/r_channels Contacts: @lgyanf

Data Scientology

Promptify 2.0: More Structured, More Powerful LLMs with Prompt-Optimization, Prompt-Engineering, and Structured Json Parsing with GPT-n Models! 🚀

Hello fellow coders and AI enthusiasts!

First up, a huge thank you for making Promptify a hit with over **2.3k+ stars on GitHub**! 🌟 Back in 2022, we were the first to tackle the common challenge of uncontrolled, unstructured outputs from large language models like GPT-3, and your support has pushed us to keep improving.

Today, we're thrilled to share some major updates that make Promptify even more powerful:

* **Unified Architecture 🧭**: Introducing Prompter, Model & Pipeline Solution
* **Detailed Output Logs 📔**: Comprehensive structured JSON format output within the log folder.
* **Wider Model Support 🤝:** Supporting models from OpenAI, Azure, Cohere, Anthropic, Huggingface and more - think of it as your universal language model adapter.
* **Robust Parser 🦸‍♂️**: A parser that handles incomplete or unstructured JSON output from any LLM.
* **Ready-Made Jinja Templates 📝:** Jinja prompt templates for NER, Text Classification, QA, Relation-Extraction, Tabular data, etc.
* **Database Integration 🔗**: Direct Promptify-to-MongoDB integration is coming soon. Stay tuned!
* **Effortless Embedding Generation 🧬**: Generate embeddings from various LLMs effortlessly with the new update.
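To illustrate the kind of problem the "Robust Parser" bullet refers to, here is a minimal sketch of recovering a JSON array that an LLM cut off mid-answer or closed with a trailing comma. This is a hypothetical helper written for illustration only, not Promptify's parser; its strategy (strict parse first, otherwise cut back to the last complete top-level object and close the array) is an assumption.

```python
import json
from typing import Any


def parse_partial_json_list(raw: str) -> list[Any]:
    """Best-effort recovery of a possibly truncated JSON array (hypothetical helper)."""
    text = raw.strip()
    try:
        parsed = json.loads(text)  # well-formed output: nothing to repair
        return parsed if isinstance(parsed, list) else [parsed]
    except json.JSONDecodeError:
        pass

    start = text.find("[")
    if start == -1:
        raise ValueError("no JSON array found")

    depth = 0
    in_string = False
    escaped = False
    last_complete = -1  # index just past the last fully closed top-level {...}
    for i in range(start + 1, len(text)):
        ch = text[i]
        if in_string:
            # Braces inside string values must not change the nesting depth.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                last_complete = i + 1
    if last_complete == -1:
        return []  # no complete object survived
    # Drop the dangling tail (or trailing comma) and close the array.
    return json.loads(text[start:last_complete] + "]")


print(parse_partial_json_list('[{"E": "93-year-old", "T": "Age"}, {"E": "osteo'))
```

Cutting back to the last complete object discards the partial entity rather than guessing at its missing text, which keeps every returned item exactly as the model produced it.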

Check out the examples and take Promptify for a spin on GitHub. If you like what you see, we'd be honored if you gave us a star!

**Github**: [https://github.com/promptslab/Promptify](https://github.com/promptslab/Promptify)

Thank you again for your support - here's to more structured AI!

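from promptify import Prompter, OpenAI, Pipeline

api_key = "YOUR_OPENAI_API_KEY"  # placeholder: supply your own OpenAI key
sentence = "The patient is a 93-year-old female with a medical..."
model = OpenAI(api_key)
prompter = Prompter("ner.jinja")  # NER template; select a bundled template or provide a custom one
pipe = Pipeline(prompter, model)  # `pipe` must be defined before calling fit()
result = pipe.fit(sentence, domain="medical", labels=None)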


Output

[
  {"E": "93-year-old", "T": "Age"},
  {"E": "chronic right hip pain", "T": "Medical Condition"},
  {"E": "osteoporosis", "T": "Medical Condition"},
  {"E": "hypertension", "T": "Medical Condition"},
  {"E": "depression", "T": "Medical Condition"},
  {"E": "chronic atrial fibrillation", "T": "Medical Condition"},
  {"E": "severe nausea and vomiting", "T": "Symptom"},
  {"E": "urinary tract infection", "T": "Medical Condition"},
  {"Branch": "Internal Medicine", "Group": "Geriatrics"},
]

/r/LanguageTechnology
https://redd.it/15dfttb

Data Scientology

[D] Simple Questions Thread

Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead!

The thread will stay alive until the next one, so keep posting after the date in the title.

Thanks to everyone for answering questions in the previous thread!

/r/MachineLearning
https://redd.it/1518fj5

Data Scientology

How essential are strong math and statistics skills for NLP Engineers?

My initial belief was that math and stats would be extremely vital for this field, but I'm seeing mixed information online. Ironically, Google Bard also stated that math and stats are not vital (though I can't help but think that this is inaccurate).

Can anyone confirm and give some feedback? What are the needed core skills?

/r/LanguageTechnology
https://redd.it/150sew0

Data Scientology

I Hit 700K Views in 3 Months with my Open-Source Shorts Automation Framework, ShortGPT

/r/computervision
https://redd.it/150mzll

Data Scientology

A Comparison of Large Language Models (LLMs) in Biomedical Domain
https://provectus.com/blog/comparison-large-language-models-biomedical-domain/

/r/LanguageTechnology
https://redd.it/14x5cge

Data Scientology

[P] TomoSAM, a 3D Slicer extension using SAM to aid the segmentation of 3D data from tomography or other imaging techniques

We are a team at NASA working on modeling the material response of Thermal Protection Systems (TPS). We developed this tool to streamline the segmentation process of micro-tomography data, a necessary step before using the physics solvers within PuMA. However, we believe that TomoSAM is general enough to be useful in other fields, such as medical imaging. The release is fully open-source and you can find more information in the links below:

TomoSAM extension within 3D Slicer

🔗 Github: https://github.com/fsemerar/SlicerTomoSAM

🔗 YouTube tutorial: https://www.youtube.com/watch?v=4nXCYrvBSjk

🔗 Publication: https://arxiv.org/abs/2306.08609

🔬 TomoSAM combines the power of Segment Anything Model (SAM), a cutting-edge deep learning model, with the capabilities of 3D Slicer, a software platform useful for visualization and segmentation.

💡 SAM is a promptable deep learning model developed by Meta AI that can identify objects and generate image masks in a zero-shot manner, requiring only a few user clicks.

⚙️ This integration reduces the need for laborious manual segmentation processes, saving significant time and effort for researchers working with volumetric data.

📄 Our paper outlines the methodology and showcases the capabilities of TomoSAM.

TomoSAM's usage, architecture, and communication system

Feel free to reach out if you have any questions or comments! 🚀

/r/MachineLearning
https://redd.it/14sroe6

Data Scientology

Additional Resources

Hi everyone,

After an [extended blackout](https://old.reddit.com/r/MachineLearning/comments/146ue8q/rmachinelearning_is_joining_the_reddit_blackout), we've decided to reopen the sub since it became pretty clear that if we didn't then the [admins would likely replace us and just reopen](https://kbin.social/m/machinelearning/t/68966/r-MachineLearning-finally-received-a-warning-from-u-ModCodeOfConduct) anyway.

We know lots of you contacted us during the blackout trying to understand how to stay up to date with the latest ML research, news, and discussions. For that reason we are providing additional resources below that either exclusively focus on ml or often discuss ml:

* ~~[taggernews](http://www.taggernews.com/tags/ai/machine%20learning/) - an ml powered classifier for [hackernews](https://news.ycombinator.com/) posts tagged as ai/ml~~
* use [hackernews](https://news.ycombinator.com/) RSS feeds like this one to keep up with posted research https://hnrss.org/newest?q=arxiv+OR+cvpr+OR+aaai+OR+iclr+OR+icml+OR+neurips+OR+emnlp+OR+acl
* reddit has rss feeds for just about any link by just adding `.rss` to the end of the url, so you can follow r/machinelearning using https://reddit.com/r/machinelearning.rss or even follow all posts that link to arxiv using https://reddit.com/domain/arxiv.org.rss
* [lobste.rs/t/ai](https://lobste.rs/t/ai) - posts tagged as ai on [lobste.rs](https://lobste.rs/t/ai)
* [m/machinelearning](https://kbin.social/m/machinelearning/) - a nascent space for ml discussion on [kbin](https://kbin.social)
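The two feed tricks in the list above (appending `.rss` to a Reddit URL, and OR-ing search terms in an hnrss.org query) can be scripted. A minimal sketch; the helper names are hypothetical, and the `q` parameter format is inferred from the hnrss.org example URL in the list rather than from hnrss documentation.

```python
from urllib.parse import urlencode


def reddit_rss_url(url: str) -> str:
    """Turn a Reddit listing URL into its RSS feed by appending `.rss`."""
    return url.rstrip("/") + ".rss"


def hnrss_search_url(terms: list[str]) -> str:
    """Build an hnrss.org 'newest' feed URL matching any of the given terms."""
    # urlencode() quotes with quote_plus by default, so spaces become '+'.
    return "https://hnrss.org/newest?" + urlencode({"q": " OR ".join(terms)})


venues = ["arxiv", "cvpr", "aaai", "iclr", "icml", "neurips", "emnlp", "acl"]
print(hnrss_search_url(venues))
print(reddit_rss_url("https://reddit.com/r/machinelearning"))
print(reddit_rss_url("https://reddit.com/domain/arxiv.org"))
```

With the venue list above, the first call reproduces the hnrss.org URL given in the list, so adding or removing a conference is a one-word change.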

You can also find a more thorough list of subs with additional resources here: https://sub.rehab

If you have additional resources you think would be useful, please comment below and we can add them to the list.

EDIT: removed taggernews since it's long defunct

/r/MachineLearning
https://redd.it/14ionyi

Data Scientology

DragGAN code is finally released! (Interactive Point-based Manipulation on the Generative Image Manifold)

https://reddit.com/link/14j92cv/video/dvf302fz0b8b1/player

https://github.com/XingangPan/DragGAN

https://vcai.mpi-inf.mpg.de/projects/DragGAN/

/r/deeplearning
https://redd.it/14j92cv

Data Scientology

Stylize Animation using Temporalnet

/r/deeplearning
https://redd.it/147kh35

Data Scientology

[D] Simple Questions Thread

Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead!

The thread will stay alive until the next one, so keep posting after the date in the title.

Thanks to everyone for answering questions in the previous thread!

/r/MachineLearning
https://redd.it/140fmf3

Data Scientology

TokenMonster Ungreedy ~ 35% faster inference and 35% increased context-length for large language models (compared to tiktoken). Benchmarks included

**From the** [**GitHub**](https://github.com/alasdairforsythe/tokenmonster)**:**

TokenMonster is an ungreedy tokenizer and vocabulary builder, outperforming tiktoken by 35%. In fact, TokenMonster's smallest (24,000-token) vocabulary consistently uses fewer tokens than tiktoken's largest (100,256-token) vocabulary to tokenize the same text. Save the tokens! [See benchmark](https://github.com/alasdairforsythe/tokenmonster/blob/main/benchmark).

Given a text dataset, a vocabulary-size and a maximum-token-length, TokenMonster selects the tokens that optimally represent your dataset at that vocabulary size. It can do this at reasonable speed (within 24 hours) on server hardware, at a cost of around $8. [Prebuilt vocabularies](https://github.com/alasdairforsythe/tokenmonster#prebuilt-vocabularies) are provided, as well as tools to train your own vocabularies & native implementations in Go, Python & Javascript for tokenization and detokenization using the prebuilt or your own vocabularies.

You can [test TokenMonster in your browser here](https://bot.co/tokenmonster/), tokenizing live in native Javascript.

TokenMonster is a novel approach to tokenization with broad-ranging use potential, but its primary motivation is to increase the inference speed and context length of large language models. By selecting better tokens, text can be represented with 35% fewer tokens compared to other modern tokenizing methods, increasing the speed of inference and training, and the length of text that fits in context, by 35%. The code-optimized tokenizers do even better, [see for yourself](https://bot.co/tokenmonster/).

I also believe that TokenMonster vocabularies will improve the comprehension of Large Language Models. For more details see [The Philosophy of Tokenization](https://github.com/alasdairforsythe/tokenmonster#the-philosophy-of-tokenization).

Features

* Outperforms other tokenization algorithms ([benchmark](https://github.com/alasdairforsythe/tokenmonster/blob/main/benchmark))
* Longer text generation at faster speed
* Selects the optimal vocabulary
* Ungreedy
* Supports UTF-8, UTF-16 and binary
* Successfully identifies words, subwords, common phrases and figures of speech by itself
* Works with HTML tags, sequential spaces, tabs, etc. without wasting context
* Averages 5.5 characters per token
* No GPU needed

/r/LanguageTechnology
https://redd.it/140evta

Data Scientology

PlaNeRF: SVD Unsupervised 3D Plane Regularization for NeRF Large-Scale Scene Reconstruction

By: Fusang Wang, Arnaud Louys, Nathan Piasco, Moussab Bennehar, Luis Roldão, Dzmitry Tsishkou

tl;dr: SVD-based plane regularization + SSIM supervision

https://arxiv.org/pdf/2305.16914.pdf

>Neural Radiance Fields (NeRF) enable 3D scene reconstruction from 2D images and camera poses for Novel View Synthesis (NVS). Although NeRF can produce photorealistic results, it often suffers from overfitting to training views, leading to poor geometry reconstruction, especially in low-texture areas. This limitation restricts many important applications which require accurate geometry, such as extrapolated NVS, HD mapping and scene editing. To address this limitation, we propose a new method to improve NeRF’s 3D structure using only RGB images and semantic maps. Our approach introduces a novel plane regularization based on Singular Value Decomposition (SVD), that does not rely on any geometric prior. In addition, we leverage the Structural Similarity Index Measure (SSIM) in our loss design to properly initialize the volumetric representation of NeRF. Quantitative and qualitative results show that our method outperforms popular regularization approaches in accurate geometry reconstruction for large-scale outdoor scenes and achieves SoTA rendering quality on the KITTI-360 NVS benchmark.
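As background for the SVD-based plane regularization mentioned in the abstract, here is the standard SVD plane fit: centre a set of 3D points, and the right singular vector with the smallest singular value is the plane normal, while that singular value measures how far the points are from being coplanar. This sketch does not reproduce the paper's loss; it only shows the generic building block such a regularizer can rest on, on synthetic points.

```python
import numpy as np


def plane_fit_svd(points: np.ndarray) -> tuple[np.ndarray, float]:
    """Fit a plane to an (N, 3) point set with SVD.

    Returns the unit normal and the smallest singular value, which is 0 for
    perfectly coplanar points and grows as they deviate from a plane.
    """
    centered = points - points.mean(axis=0)
    # Rows of vt are principal directions; the last has the least variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[-1], float(singular_values[-1])


rng = np.random.default_rng(0)
xy = rng.uniform(-1, 1, size=(100, 2))
flat = np.column_stack([xy, np.zeros(100)])                         # points on z = 0
bumpy = flat + np.array([0.0, 0.0, 1.0]) * rng.normal(0, 0.05, size=(100, 1))  # noisy z

normal, flat_score = plane_fit_svd(flat)
_, bumpy_score = plane_fit_svd(bumpy)
print(normal, flat_score, bumpy_score)
```

Because the smallest singular value is zero exactly when the points lie on a plane, penalising it pushes points that should be planar (for example, pixels labelled as road or wall) towards a flat surface without assuming where that plane is.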

https://preview.redd.it/s9vwi4g0103b1.png?width=1408&format=png&auto=webp&v=enabled&s=2e6b4d2ecde7e451efcbe7ab549ca013caa64544

/r/computervision
https://redd.it/13vo6aw

Data Scientology

I made a free online text-to-speech tool as an implementation of Meta's Massively Multilingual Speech (MMS) – Supports 1144 Languages and Dialects!
https://www.mmstts.com

/r/LanguageTechnology
https://redd.it/13qxvtt

Data Scientology

(Pt. 3) Neural Networks Temporal Logic Verification with STL Net
https://youtube.com/watch?v=Jts45lJKiRI&feature=share

/r/deeplearning
https://redd.it/13o3f4c

Data Scientology

Domain specific chatbot. Semantic search isn't enough.

Hi guys, I'm struggling to find a reliable solution to this specific problem.

I have a huge dataset of chat conversations on several topics. I want to ask questions and retrieve information about these conversations, chatbot-style.

I have tried semantic search with ChatGPT to answer questions about these conversations. The problem is that semantic search only returns the most similar sentences and doesn't ‘read’ all the conversations. That's enough to answer very specific questions, but not generic ones. For example, if I ask “What are these people saying about person X?”, it returns only the top sentences (by semantic similarity), which don't tell the whole story. LLMs have a token limit, so I can’t send the whole dataset as context.
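The limitation described above comes from the retrieval step itself: semantic search ranks every message by similarity to the question and passes only the top k to the LLM, however many other messages are relevant. A minimal sketch of that step; the 2-D vectors are hypothetical stand-ins for real embedding-model output.

```python
import numpy as np


def top_k_by_cosine(query: np.ndarray, corpus: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k corpus rows most cosine-similar to the query."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q                    # cosine similarity of each row to the query
    return np.argsort(-scores)[:k]    # highest scores first, truncated to k


# Hypothetical message embeddings; only k of them ever reach the LLM.
corpus = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
query = np.array([1.0, 0.0])
print(top_k_by_cosine(query, corpus, k=2))
```

Whatever the dataset size, the context the LLM sees is capped at k messages, which is why broad questions that depend on many scattered messages get incomplete answers.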

Is there any approach to giving a reliable answer based on reading all the messages?

Any ideas on how to approach this problem?

/r/LanguageTechnology
https://redd.it/13grik5

Data Scientology

Attention Is Off By One
https://www.evanmiller.org/attention-is-off-by-one.html
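The linked article proposes replacing softmax in attention with a variant that adds 1 to the denominator, softmax_1(x)_i = exp(x_i) / (1 + Σ_j exp(x_j)), so an attention head can assign near-zero total weight instead of being forced to sum to 1. A minimal numpy sketch of that formula; the function name `softmax_one` and the max-shift for numerical stability are our own choices, not code from the article.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()


def softmax_one(x: np.ndarray) -> np.ndarray:
    """exp(x_i) / (1 + sum_j exp(x_j)), computed stably.

    Shifting by m = max(0, max(x)) keeps every exponent <= 0; after the same
    shift, the constant 1 in the denominator becomes exp(-m).
    """
    m = max(0.0, float(x.max()))
    e = np.exp(x - m)
    return e / (np.exp(-m) + e.sum())


scores = np.array([-10.0, -10.0, -10.0])  # a head with nothing worth attending to
print(softmax(scores))      # each entry 1/3: ordinary softmax must sum to 1
print(softmax_one(scores))  # every entry close to 0
```

For large scores the extra 1 is negligible and the two functions agree; the difference only shows when all scores are small, which is exactly the "attend to nothing" case the article targets.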

/r/deeplearning
https://redd.it/158xmbw

Data Scientology

YoloV8 Body Pose Estimation TensorRT C++ Tutorial (link in comments)

/r/computervision
https://redd.it/156v3e5

Data Scientology

Meta/Facebook just released Llama 2
https://huggingface.co/models?other=llama-2

/r/LanguageTechnology
https://redd.it/1533kuf

Data Scientology

Questions about Transformers

I just started reading about the Transformer model and have barely scratched the surface of the concept. For starters, I have the following two questions:

1. How are positional encodings incorporated into the transformer model? I see that the positional encoding comes immediately after the word embedding, but I don't understand in which part of the network it is used.

2. For a given sentence, the query, key, and value weight matrices all have the length of the sentence as one of their dimensions. But sentence length varies, so how do they handle this when subsequent sentences are passed in?
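For reference, here is a minimal numpy sketch of the standard formulation from the original Transformer paper ("Attention Is All You Need"): sinusoidal positional encodings are added element-wise to the token embeddings once, at the input to the first layer, and the query/key/value projection matrices have shape (d_model, d_k). The sketch assumes a single head, no masking, random weights and an even d_model; the embeddings are random stand-ins.

```python
import numpy as np


def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)); PE[pos, 2i+1] = cos(same). d_model must be even."""
    pos = np.arange(seq_len)[:, None]
    two_i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, two_i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product attention for x of shape (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v              # each (seq_len, d_k)
    scores = q @ k.T / np.sqrt(k.shape[1])           # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)    # row-wise softmax
    return weights @ v                               # (seq_len, d_k)


rng = np.random.default_rng(0)
d_model, d_k = 8, 4
# Projection weights are (d_model, d_k): neither dimension depends on sentence length.
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

for seq_len in (5, 9):
    embeddings = rng.normal(size=(seq_len, d_model))                   # stand-in token embeddings
    x = embeddings + sinusoidal_positional_encoding(seq_len, d_model)  # added once, at the input
    print(seq_len, self_attention(x, w_q, w_k, w_v).shape)
```

The same three weight matrices handle both sentence lengths; only the intermediate seq_len × seq_len score matrix changes size, and it is computed fresh for every input.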

/r/computervision
https://redd.it/14xutf2

Data Scientology

CoViz - A Neural Network Playground built with WebGPU🔥(Compute) and ReactFlow

/r/deeplearning
https://redd.it/14ri42r

Data Scientology

LLMOps.space - curated resources related to LLM & LLMOps

LLMOps space is a community for LLM enthusiasts, researchers, and practitioners. The community will focus on content, discussions, and events around topics related to deploying LLMs into production. 🚀

This includes:

✅ 50+ LLMOps companies
📅 Upcoming events
📚 Educational resources
👩‍💻 Open-source LLM modules
💰 Funding news

Check out the LLMOps community website:
http://llmops.space/

/r/deeplearning
https://redd.it/14qlpzi

Data Scientology

Realistic personal projects to demonstrate knowledge of 3D computer vision

I currently work as an ML engineer with a focus in computer vision. I'm interested in pursuing jobs related to photogrammetry/3D reconstruction/computer graphics and am looking for advice on how to land these kinds of jobs. I have a Masters Degree, and, ideally, would not want to go back for a PhD.

I have picked up Multiple View Geometry by Hartley and Zisserman and plan on working through the book. However, I'm also interested in gaining more hands-on/practical experience in this area. What are some realistic projects I could work on which would showcase my knowledge of 3D vision?

/r/computervision
https://redd.it/14mciux

Data Scientology

Deepchecks' New Open Source is on Product Hunt, and Needs Your Help

Deepchecks’ new ML Monitoring Open Source tool has **just launched on Product Hunt** for 24 hours!!

We’d really appreciate your support at https://www.producthunt.com/all; look for the release by Deepchecks.

Please support us! Comment with feedback, or share with friends!

This Product Hunt comes right after our announcement about Deepchecks' $14M fundraise, and a bit more than a year after releasing our open source ML testing module. We hope that the consistent, significant additions to our repo show our commitment to benefiting the open source community.

Thank you!!

/r/deeplearning
https://redd.it/14chlv7

Data Scientology

Crazy stylization with Temporalnet

/r/deeplearning
https://redd.it/146ralm

Data Scientology

How OpenAI’s Andrej Karpathy Made One of the Best Tutorials in Deep Learning

I'd like you to check out my review of Andrej Karpathy's amazing work explaining how GPT is built (0ssamaak0/how-open-ais-andrej-karpathy-made-one-of-the-best-tutorials-in-deep-learning-e6b6445a2d05)

GitHub Repo for code & more details

https://preview.redd.it/z204zwtzn44b1.png?width=720&format=png&auto=webp&v=enabled&s=58f7ff9cdbe418064d77162d71386f6037669e9f

/r/deeplearning
https://redd.it/141282u

Data Scientology

[D] Simple Questions Thread

Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead!

The thread will stay alive until the next one, so keep posting after the date in the title.

Thanks to everyone for answering questions in the previous thread!

/r/MachineLearning
https://redd.it/13nx7t0

Data Scientology

Research directions not related to optimization of Transformers (compute, memory, etc.)?

Honestly, I'm a bit fed up with the strong focus on squeezing the last bit of performance out of transformers. To lighten my mood, I wanted to ask the community whether they've come across something interesting or different in their area.
For example, I've found "Thinking Like Transformers" to be an enlightening, fresh take.

/r/LanguageTechnology
https://redd.it/13t2dzt

Data Scientology

Drag Your GAN

/r/deeplearning
https://redd.it/13q0k34

Data Scientology

Python tools to load, save, split, and convert computer vision datasets | link in comment

/r/computervision
https://redd.it/13kvq5n

Data Scientology

[D] Simple Questions Thread

Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead!

The thread will stay alive until the next one, so keep posting after the date in the title.

Thanks to everyone for answering questions in the previous thread!

/r/MachineLearning
https://redd.it/13as0ej
