Telegram channel datascientology - Data Scientology: Education - Telegram catalog

Hot data science related posts every hour. Chat: https://telegram.me/r_channels Contacts: @lgyanf

Data Scientology

15 December 2022 21:14

shoe color frequency

/r/dataisugly
https://redd.it/zml4e6

Data Scientology

15 December 2022 17:14

Are simulations done by data scientists or someone else?

Picture of a traffic simulation in Unreal Engine: https://preview.redd.it/qfep38a8rv5a1.png?width=800&format=png&auto=webp&v=enabled&s=8237dce33de4a01dc3f07787f7c88f902aef791f

I put together a blog post this morning (https://whiteowleducation.substack.com/p/why-are-simulations-the-future-of) that builds on a Reddit discussion from yesterday.

I am genuinely curious though. Are any of you using simulations for your day-to-day work?

I ask because I see reports of the following:

- Nvidia using simulations for weather prediction
- BMW using simulations to optimize factory layout
- Recent discoveries in nuclear fusion, and I have to believe that simulations were used to help set up those experiments
- I even see traffic jams being simulated in Unreal Engine

Traffic jam simulations are even photorealistic.

Long story short, it seems like people are doing simulations, but would this go under the "data science" job title, or is there a different profession that does this kind of work?

/r/datascience
https://redd.it/zlt5nt

Data Scientology

15 December 2022 14:14

[OC] Mean name length of World Cup players

/r/dataisbeautiful
https://redd.it/zm3jm0

Data Scientology

15 December 2022 10:14

Work finds it unbelievable that I’ve exceeded the 75GB storage limit…?

IT almost laughed at me on the phone when I said I’d need at least 200GB.

I asked the bloke how much storage his PC at home has, and he implied that most storage is taken up by programs, so 75GB is more than enough for files.

Nobody in this organisation (4000+ people) has ever exceeded 75GB. Wtf??

A typical CSV file is 1GB; how is this happening in such a large organisation?? My god.

Edit: this is OneDrive space. We’re unable to store things locally.

/r/datascience
https://redd.it/zm3gac

Data Scientology

15 December 2022 08:14

[OC] Over the last decade, Chile has risen to become the world's third-largest producer of cherries, only behind Turkey and the United States. 🍒

/r/dataisbeautiful
https://redd.it/zlzr8q

Data Scientology

15 December 2022 06:14

Religious affiliation in Iran, based on a 2020 survey by Gamaan Research.

/r/Infographics
https://redd.it/zlse2t

Data Scientology

15 December 2022 03:14

Found the image on Twitter. Posted by climate change deniers

/r/dataisugly
https://redd.it/zkqci5

Data Scientology

15 December 2022 01:14

[OC] The Most Valuable Companies In The World

/r/dataisbeautiful
https://redd.it/zly6c2

Data Scientology

14 December 2022 23:14

[P] Run and fine-tune BLOOM-176B at home using a peer-to-peer network

We made a library for inference/fine-tuning of open 175B+ language models (like BLOOM) using Colab or a desktop GPU. You join forces with other people over the Internet (BitTorrent-style), each running a small subset of the model's layers. Check out our Colab example!

The thing is, even though BLOOM's weights were publicly released, it was extremely difficult to run inference efficiently unless you had enough hardware to load the entire model into GPU memory (you need at least 3x A100 or 8x 3090 GPUs). For example, with offloading you can only reach ~10 sec/step for sequential (non-parallel) generation.
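A rough weights-only calculation shows where hardware figures like these can come from. This is a sketch under assumptions that are not stated in the post: about 176 billion parameters, 80 GB A100s, 24 GB RTX 3090s, 1 GB = 10^9 bytes, and no memory reserved for activations or caches. Under those assumptions, the "3x A100 or 8x 3090" figure matches 8-bit (int8) weights rather than 16-bit ones.

```python
import math

# Assumptions (not from the post): ~176e9 parameters, weights only
# (no activations, KV cache, or optimizer state), 1 GB = 1e9 bytes.
PARAMS = 176e9

BYTES_PER_PARAM = {"fp16/bf16": 2, "int8": 1}

# Assumed per-card memory; A100s also ship in a 40 GB variant.
GPU_MEMORY_GB = {"A100 (80 GB)": 80, "RTX 3090 (24 GB)": 24}


def weights_gb(params: float, bytes_per_param: int) -> float:
    """Size of the raw weights in gigabytes."""
    return params * bytes_per_param / 1e9


def gpus_needed(total_gb: float, gpu_gb: int) -> int:
    """Smallest number of GPUs whose combined memory can hold total_gb."""
    return math.ceil(total_gb / gpu_gb)


if __name__ == "__main__":
    for fmt, nbytes in BYTES_PER_PARAM.items():
        total = weights_gb(PARAMS, nbytes)
        counts = ", ".join(
            f"{gpus_needed(total, gb)}x {name}" for name, gb in GPU_MEMORY_GB.items()
        )
        print(f"{fmt}: {total:.0f} GB of weights -> {counts}")
```

Running it reports 3 A100s or 8 RTX 3090s for int8 weights, and 5 or 15 for 16-bit weights. Real deployments need headroom beyond this lower bound.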

A possible alternative is to use APIs, but they are paid and not always flexible (you can’t adopt new fine-tuning/sampling methods or inspect hidden states). So, Petals comes to the rescue!

This is how Petals works: some peers want to use a pretrained LM to solve various tasks with texts in natural or programming languages. They do it with the help of other peers, who hold subsets of the model's layers on their GPUs.
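The division of labour described above can be illustrated with a toy, in-process sketch. It is not the Petals API: `Peer`, `split_layers`, and `run_pipeline` are hypothetical names, the "layers" are trivial functions standing in for transformer blocks, and the network hop between peers is replaced by a plain function call.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Hidden = List[float]
Layer = Callable[[Hidden], Hidden]


@dataclass
class Peer:
    """A hypothetical server hosting model layers [start, end)."""
    name: str
    start: int
    end: int
    layers: Sequence[Layer]

    def forward(self, hidden: Hidden) -> Hidden:
        # Run only the layers this peer holds, in order.
        for layer in self.layers:
            hidden = layer(hidden)
        return hidden


def make_toy_layer(i: int) -> Layer:
    """Stand-in for a transformer block: adds i to every hidden value."""
    return lambda hidden: [x + i for x in hidden]


def split_layers(layers: Sequence[Layer], num_peers: int) -> List[Peer]:
    """Give each peer a contiguous, roughly equal slice of the layers."""
    n = len(layers)
    bounds = [round(k * n / num_peers) for k in range(num_peers + 1)]
    return [
        Peer(f"peer-{k}", bounds[k], bounds[k + 1], layers[bounds[k]:bounds[k + 1]])
        for k in range(num_peers)
    ]


def run_pipeline(peers: List[Peer], hidden: Hidden) -> Hidden:
    """Client side: send the hidden state through the peers in layer order."""
    for peer in sorted(peers, key=lambda p: p.start):
        hidden = peer.forward(hidden)  # a network round-trip in a real swarm
    return hidden


if __name__ == "__main__":
    layers = [make_toy_layer(i) for i in range(8)]
    peers = split_layers(layers, num_peers=3)
    print([(p.name, p.start, p.end) for p in peers])
    print(run_pipeline(peers, [0.0, 1.0]))  # same as running all 8 layers locally
```

The point of the design is that no single participant needs memory for the whole model; each peer only stores its slice, and the client only passes activations between them.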

More details:

Paper (with speed measurements): https://arxiv.org/abs/2209.01188
GitHub repo: https://github.com/bigscience-workshop/petals

What do you think of it?

/r/MachineLearning
https://redd.it/zl03b0

Data Scientology

14 December 2022 20:14

What does yellow signify?!

/r/dataisugly
https://redd.it/zlarj4

Data Scientology

14 December 2022 18:14

Relative Humidity readings from my basement after carrying out remedial works - love that trend [OC]

/r/dataisbeautiful
https://redd.it/zlniy0

Data Scientology

14 December 2022 15:14

Lying on the CV taken to the next level

I have someone on my team who is currently applying for one of the internal roles, a promotion two levels above her current level. I am on the interview panel but am not her referee, and therefore I have to remain unbiased and take the information presented in the CV as I would for an external applicant.

This person has no technical skills and no understanding of even simple concepts; she has just memorized a few things. She is, however, very interested in promotions and started asking about them 6 months into the role. She seems way more interested in promotions than in learning DS :(

Anyway, I have seen plenty of people add about 20% to their CV, overstate their role in a project, etc. This person has claimed that, as part of my team, she built 2 models that don't exist. She described the techniques used and claims she led the whole effort and that the models are now deployed (these are techniques I mentioned in team meetings, but I always said it would depend on the data. It turns out we didn't have enough good data, so it looks like these models will never be built. She is up to date on these developments). I am in a very large org and nobody really keeps track of new models etc.

On the basis of these lies, she was invited for an interview. Many people who are way more talented but were more honest were not. This really bothers me. I did mention it to my manager, who seemed uninterested and commented that I need to be building up junior DSs, not tearing them down :(

This is more of a vent than anything.

/r/datascience
https://redd.it/zlobg8

Data Scientology

14 December 2022 09:14

The United States as James K. Polk Wanted It [964 x 740]
http://i.imgur.com/pwXoy.jpg

/r/MapPorn
https://redd.it/zkxfos

Data Scientology

14 December 2022 06:14

[OC] geospatial distribution of different fast food chains in the USA (included some of your suggestions from my previous post)

/r/dataisbeautiful
https://redd.it/zl3bta

Data Scientology

14 December 2022 01:14

Statisticians who got their PhD and now work in industry, what is it like? [Q]

Curious how the transition to industry was after a PhD in statistics. Exciting? Frustrating? I’ve often heard both sides: with a PhD you get more lucrative data science roles, but it can also be frustrating because there’s no emphasis on statistical rigor in industry. What have your experiences been? Are any of you in startups? Founded your own startup? I’m just curious to see what kind of non-traditional placements people who got their PhD in statistics have ended up in.

/r/statistics
https://redd.it/zkzol0

Data Scientology

15 December 2022 18:14

The symmetry of the orbits of a double pendulum, dropped from rest, at every initial angle. [OC]

/r/mathpics
https://redd.it/zlh45q

Data Scientology

15 December 2022 15:14

[P] Image search with localization and open-vocabulary reranking.

TL;DR

Image search with open-vocabulary localization using both index-time and search-time methods.

Article (no paywall): https://medium.com/@jesse_894/image-search-with-localization-and-open-vocabulary-reranking-using-marqo-yolox-clip-and-owl-vit-9c636350bf66?source=friends_link&sk=b4e94d9d4095a2b8b60c5d1904a60825

Markdown: https://github.com/marqo-ai/marqo/blob/mainline/examples/ImageSearchLocalization/article.md

Code: https://github.com/marqo-ai/marqo/blob/mainline/examples/ImageSearchLocalization/index_all_data.py

I wanted to have a few options for getting localization into image search (at index time and at search time). I immediately thought of using a region proposal network (RPN) from Mask R-CNN to create patches that can also be indexed and searched (and add the localisation). I figured it might be somewhat class-agnostic. I did not want to use mmdetection or detectron2 because of their dependencies, and just getting the RPN was not worth it. I was encouraged by the PyTorch-native implementations of detection/segmentation models but ended up finding YOLOX the best.

I also implemented one based on the self-attention maps from DINO-trained ViTs. This worked pretty well when the attention maps were combined with some traditional computer vision to get bounding boxes. It seemed an OK compromise between domain specialization and location specificity. I did not try any saliency- or gradient-based methods, as I was not sure about their generalization and speed, respectively. I know LAVIS has an implementation of Grad-CAM, and it seems to work well in the plug'n'play VQA.

For the indexing I cropped the images based on the proposed bounding boxes. I did not test blending methods but feel this might be better as more context can be in the image. If anyone has a perspective on this I would love to hear it.
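The index-time step (crop each proposed box and store the crop together with its coordinates) can be sketched with NumPy. This is a hypothetical illustration, not the Marqo code linked above: the boxes are assumed to come from a detector such as YOLOX, `toy_embed` (mean colour) is a stand-in for a real image encoder like CLIP, and a brute-force cosine search replaces a proper vector index. Keeping the box with each entry is what lets a hit report where in the image the match is.

```python
import numpy as np


def crop_patches(image, boxes):
    """Crop each (x1, y1, x2, y2) pixel box out of an (H, W, C) image, clipped to bounds."""
    h, w = image.shape[:2]
    return [image[max(0, y1):min(h, y2), max(0, x1):min(w, x2)] for x1, y1, x2, y2 in boxes]


def toy_embed(patch):
    """Hypothetical stand-in for an image encoder: L2-normalised mean colour."""
    v = patch.reshape(-1, patch.shape[-1]).mean(axis=0)
    return v / (np.linalg.norm(v) + 1e-12)


def build_index(images, proposals):
    """Index every image as a whole plus one entry per proposed box.

    images: {image_id: (H, W, C) array}; proposals: {image_id: [box, ...]}
    """
    index = []
    for image_id, image in images.items():
        h, w = image.shape[:2]
        boxes = [(0, 0, w, h)] + list(proposals.get(image_id, []))
        for box, patch in zip(boxes, crop_patches(image, boxes)):
            index.append((image_id, box, toy_embed(patch)))
    return index


def search(index, query_vec, k=1):
    """Brute-force cosine search; returns (image_id, box) pairs, best first."""
    q = query_vec / np.linalg.norm(query_vec)
    ranked = sorted(index, key=lambda entry: float(entry[2] @ q), reverse=True)
    return [(image_id, box) for image_id, box, _ in ranked[:k]]


if __name__ == "__main__":
    img = np.zeros((100, 100, 3))
    img[..., 2] = 255.0                     # blue background
    img[20:40, 10:30] = [255.0, 0.0, 0.0]   # red square at x 10-30, y 20-40
    index = build_index({"img": img}, {"img": [(10, 20, 30, 40)]})
    print(search(index, np.array([1.0, 0.0, 0.0])))  # -> [('img', (10, 20, 30, 40))]
    print(search(index, np.array([0.0, 0.0, 1.0])))  # -> [('img', (0, 0, 100, 100))]
```

Indexing the whole image alongside its crops keeps a fallback entry for queries about the scene as a whole, while the crops win for queries about a single object.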

For localisation at search time I ended up using OWL-ViT. This worked really well. I did not try Detic or CLIPseg but would be interested to hear if anyone else has tried these?

/r/MachineLearning
https://redd.it/zmigt1

Data Scientology

15 December 2022 12:14

[OC] The Most Valuable Companies In The World

/r/Infographics
https://redd.it/zly6d0

Data Scientology

15 December 2022 09:14

Just a population density map with different colors :)

/r/dataisugly
https://redd.it/zkk8n2

Data Scientology

15 December 2022 07:14

Top-earning YouTube channel in every country

/r/MapPorn
https://redd.it/zlripw

Data Scientology

15 December 2022 05:14

Sun Tanning vs. Skin Whitening Google searches

/r/MapPorn
https://redd.it/zm2fl1

Data Scientology

15 December 2022 02:14

What's your opinion of medical marijuana, as a former patient? (Former medical marijuana patients)

(M/F)

Hello! I am a research student studying the opinions of former patients on medical marijuana. I would really appreciate it if you would take my survey. Thank you! https://forms.gle/V7mAacDWBAcBcNCW8

/r/SampleSize
https://redd.it/zlwj13

Data Scientology

15 December 2022 00:14

Dataset: 2,889 battles occurring within Japan during its Warring-States period, from 1467 to 1600.
https://www.tandfonline.com/doi/abs/10.1080/03050629.2023.2149514?journalCode=gini20

/r/datasets
https://redd.it/zlorxu

Data Scientology

14 December 2022 21:14

If you had to hire a Data Scientist based on just one question, what would it be?

Let's get creative 😀

/r/datascience
https://redd.it/zll3sq

Data Scientology

14 December 2022 19:14

America’s Beautiful Weather Zones by Mattie Lubchansky

/r/MapPorn
https://redd.it/zlqmpb

Data Scientology

14 December 2022 17:14

[P] Implemented Vision Transformers 🚀 from scratch using TensorFlow 2.x

Hello Everyone 👋,

I just implemented the paper "An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale", popularly known as the Vision Transformer paper. This paper uses a Transformer encoder for image recognition. It achieves state-of-the-art performance without using convolutional layers, provided there is a huge dataset and enough computational resources for pre-training.
Below I am sharing my implementation of this paper; please have a look and give it a 🌟 if you like it. This implementation provides easy-to-read code for understanding how the model works internally.

My implementation: GitHub Link
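The "16x16 words" in the paper's title refers to cutting the image into fixed-size patches that become the Transformer's input tokens. Below is a minimal NumPy sketch of that input stage, assuming a 224x224 RGB image and 16x16 patches (giving 196 patches of 16*16*3 = 768 values each). The helper names are hypothetical and not taken from the linked TensorFlow code, and the projection matrix, class token, and position embeddings are random placeholders for parameters the model would learn.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns shape (num_patches, patch_size * patch_size * C), patches ordered
    row by row (left to right, top to bottom).
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p, c): split each spatial axis into (grid index, offset in patch)
    grid = image.reshape(gh, patch_size, gw, patch_size, c)
    # (gh, gw, p, p, c): bring the two grid axes next to each other
    grid = grid.transpose(0, 2, 1, 3, 4)
    return grid.reshape(gh * gw, patch_size * patch_size * c)


def embed_patches(patches, w_proj, cls_token, pos_embed):
    """Linear patch projection, prepend a [class] token, add position embeddings."""
    tokens = patches @ w_proj                                      # (N, D)
    tokens = np.concatenate([cls_token[None, :], tokens], axis=0)  # (N + 1, D)
    return tokens + pos_embed                                      # (N + 1, D)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((224, 224, 3))
    patches = patchify(image)
    d_model = 64  # hypothetical small width for the demo
    w_proj = rng.normal(size=(patches.shape[1], d_model))
    cls_token = rng.normal(size=d_model)
    pos_embed = rng.normal(size=(patches.shape[0] + 1, d_model))
    tokens = embed_patches(patches, w_proj, cls_token, pos_embed)
    print(patches.shape, tokens.shape)  # (196, 768) (197, 64)
```

The resulting token sequence is what the standard Transformer encoder consumes; no convolution is involved, only a reshape and a linear projection.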

Thanks for your attention. 😀

/r/MachineLearning
https://redd.it/zloof9

Data Scientology

14 December 2022 13:14

[OC] It takes over 12,000L of water to produce one outfit (not including shoes or underwear), while the average person drinks 691L of water a year. That's over 18 years' worth of drinking water

/r/dataisbeautiful
https://redd.it/zln5zj
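The title's comparison is a single division. Taking the two stated figures at face value (exactly 12,000 L per outfit and 691 L drunk per year), a quick check gives about 17.4 years; reaching 18 full years needs at least 18 x 691 = 12,438 L, so the headline presumably relies on the true total being somewhat above 12,000 L. The function names below are illustrative only.

```python
def years_of_drinking_water(outfit_litres: float, litres_per_year: float) -> float:
    """How many years of drinking water one outfit's water footprint equals."""
    return outfit_litres / litres_per_year


def min_litres_for_years(years: int, litres_per_year: int) -> int:
    """Litres of water needed to cover `years` of drinking water."""
    return years * litres_per_year


if __name__ == "__main__":
    print(f"{years_of_drinking_water(12_000, 691):.1f} years")      # 17.4 years
    print(min_litres_for_years(18, 691), "L needed for 18 years")   # 12438 L ...
```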

Data Scientology

14 December 2022 07:14

[OC] Meat consumption

/r/dataisbeautiful
https://redd.it/zlgxjk

Data Scientology

14 December 2022 04:14

[OC] Prevalence of British and American Spelling Variants on Wikipedia

/r/dataisbeautiful
https://redd.it/zlc972

Data Scientology

14 December 2022 00:14

Tesla value as it relates to Twitter's purchase [OC]

/r/dataisbeautiful
https://redd.it/zl0t0n
