datascientology | Education

Telegram channel datascientology - Data Scientology

1234

Hot data science related posts every hour. Chat: https://telegram.me/r_channels Contacts: @lgyanf

Data Scientology

[Topic][Open] Open Discussion Thread: Anybody can post a general visualization question or start a fresh discussion!

Anybody can post a question or discussion related to data visualization in the monthly topical threads. Meta questions are fine too, but if you want a more direct line to the mods, click here.

If you have a general question you need answered, or a discussion you'd like to start, feel free to make a top-level comment.

Beginners are encouraged to ask basic questions, so please be patient responding to people who might not know as much as yourself.

---

To view all Open Discussion threads, click here.

To view all topical threads, click here.

Want to suggest a topic? Click here.

/r/dataisbeautiful
https://redd.it/yj6tsc

Data Scientology

Please fill out this survey about medical assisted death for a research paper in my college class (Everyone)
https://forms.gle/vrgFLeEbNXTC13YX9

/r/SampleSize
https://redd.it/ykmmca

Data Scientology

Peak map of Ireland

/r/MapPorn
https://redd.it/yk5l5d

Data Scientology

Relative PornHub searches in the UK.

/r/MapPorn
https://redd.it/ykimz7

Data Scientology

Broken McDonald's Ice cream machines worldwide
https://mcbroken.com/

/r/datasets
https://redd.it/yk0o85

Data Scientology

[D] Simple Questions Thread

Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead!

The thread will stay alive until the next one, so keep posting after the date in the title.

Thanks to everyone for answering questions in the previous thread!

/r/MachineLearning
https://redd.it/ybjvk5

Data Scientology

I tried bending the rules of Perler beads!

https://redd.it/y0c0uk
@datascientology

Data Scientology

Map that shows if you were to drill straight through the earth where you would pop out on the other side.

/r/MapPorn
https://redd.it/yk2sgi

Data Scientology

[Q] If you had 3-5 years to prep for a PhD in Stats, what would you do?

Given this time, what topics would you study, how would you research programs, what people would you reach out to, etc.?

My background: graduated with my BS in Stats this past spring, really enjoyed my upper-level electives (Bayesian, unsupervised learning, stochastic processes), and now I work at a Big Four firm.

My plan before the program: refresh and learn interesting/relevant topics (linear algebra, real analysis, data structures + algorithms), research many programs (I really like UWashington at the moment), and work 3-5 years to get experience and pocket cash.

My plan after the program: I'd love to get into a soccer (football) analytics role. I really enjoy reading published statistical papers related to soccer and would like to do much the same work (preferably for a club).

Any help/critique is appreciated even if it doesn't directly fit the background or timeline I'm working with. Also happy to explain more if it helps. Thanks!

/r/statistics
https://redd.it/yjt95c

Data Scientology

Travel times from London in 1914 (Source: RealLifeLore)

/r/MapPorn
https://redd.it/yjyqo6

Data Scientology

Request: Data sets of pharmaceutical drugs and which substances they have interactions with

I'm trying to find data sets of pharmaceutical drugs and the substances they interact with. For example, if I search for "ambien", I want to see a list of all the drugs that you shouldn't take with it. This might differ from country to country, so I want as many of these data sets as I can find.

/r/datasets
https://redd.it/yityxq

Data Scientology

[N] Meta AI | Evolutionary-scale prediction of atomic level protein structure with a language model

Paper: https://www.biorxiv.org/content/10.1101/2022.07.20.500902v2


Meta's Tweet: https://twitter.com/MetaAI/status/1587467591068459008

Abstract

>Artificial intelligence has the potential to open insight into the structure of proteins at the scale of evolution. It has only recently been possible to extend protein structure prediction to two hundred million cataloged proteins. Characterizing the structures of the exponentially growing billions of protein sequences revealed by large scale gene sequencing experiments would necessitate a breakthrough in the speed of folding. Here we show that direct inference of structure from primary sequence using a large language model enables an order of magnitude speed-up in high resolution structure prediction. Leveraging the insight that language models learn evolutionary patterns across millions of sequences, we train models up to 15B parameters, the largest language model of proteins to date. As the language models are scaled they learn information that enables prediction of the three-dimensional structure of a protein at the resolution of individual atoms. This results in prediction that is up to 60x faster than state-of-the-art while maintaining resolution and accuracy. Building on this, we present the ESM Metagenomic Atlas. This is the first large-scale structural characterization of metagenomic proteins, with more than 617 million structures. The atlas reveals more than 225 million high confidence predictions, including millions whose structures are novel in comparison with experimentally determined structures, giving an unprecedented view into the vast breadth and diversity of the structures of some of the least understood proteins on earth.

/r/MachineLearning
https://redd.it/yjdt78

Data Scientology

Countries that recognise Kosovo

/r/MapPorn
https://redd.it/yjcdz0

Data Scientology

Race of players in major professional team sports leagues

/r/dataisbeautiful
https://redd.it/yjgqvr

Data Scientology

Deforestation In The Amazon Has Increased Significantly Over the Past Decade [OC]

/r/dataisbeautiful
https://redd.it/yjctv6

Data Scientology

The cost of 1 gigabyte of mobile data in every country around the world

/r/Infographics
https://redd.it/yki6v5

Data Scientology

Sentiment analysis of customer support tickets

Hi folks

I was wondering if there are any free, pre-trained sentiment analysis tools (trained on typical customer support queries) that I can run some text through to get a general idea of how positive or negative it is. It’s not a whole lot of text, maybe several thousand paragraphs.

Thanks.

/r/datascience
https://redd.it/ykmpgt

Data Scientology

How many U.S. counties have a population greater than the state of Wyoming?

/r/MapPorn
https://redd.it/ykcd1q

Data Scientology

Riemann n-sphere

/r/mathpics
https://redd.it/y060t6

Data Scientology

[P] Implementation of MagicMix from ByteDance researchers: a new way to interpolate concepts with much more natural, geometric coherency (implemented with Stable Diffusion!)

Hi. Today I came across this interesting paper, https://arxiv.org/abs/2210.16056, which proposes an interesting method to combine the semantics of text and image in the diffusion process.

In short, this mixes "layout" with "content", however unlike style transfer,


>"...semantic mixing aims to fuse multiple semantics into one single object."

I was surprised by the examples they showed, so I wanted to try it but the code wasn't available. I've implemented the method myself, and I wanted to share it here!

https://github.com/cloneofsimo/magicmix

Layout of "realistic photo of a rabbit" with content of "tiger"

I hope my implementation helps anyone who is reading the paper!

Note: I'm not the author of the paper, and this is not an official implementation.

/r/MachineLearning
https://redd.it/ykiuq0

Data Scientology

Help hosting trillions of rows of new health insurance public price data

As of July 1st this year, all health insurers in the US were required to publish files on their websites listing the prices they have negotiated for every possible medical procedure with every doctor in the country. In total, this data set amounts to trillions of rows and hundreds of TB of data.

I'm interested in building out a collaborative effort to aggregate all this data, but the cost of hosting seems to be a huge problem. What's the cheapest effective way to host all this data so that it's publicly accessible?

/r/datascience
https://redd.it/yk9gye

Data Scientology

Religions of Canada, 2021 [OC]

/r/dataisbeautiful
https://redd.it/yk2pqh

Data Scientology

The beginning of national anthems.

/r/MapPorn
https://redd.it/yk1o3n

Data Scientology

[OC] Different types of government systems

/r/dataisbeautiful
https://redd.it/yk1028

Data Scientology

Hilbert Curve pumpkin carving

/r/mathpics
https://redd.it/yhuudg

Data Scientology

US Child Pedestrian Deaths by Day of the Year: 2006-2020 [OC]

/r/dataisbeautiful
https://redd.it/yjarqg

Data Scientology

[OC] Is Nuclear Energy Dangerous? A comparison of Chinese coal mining related fatalities to worldwide nuclear and radiation related fatalities

/r/dataisbeautiful
https://redd.it/yjovks

Data Scientology

Can you specialize in data cleaning?

Context: I'm studying Data Analytics right now through the Google certification, and once I land a job (any job 😂, not necessarily specific to data analysis), I intend to pursue a degree in Data Science.

I ran across Data Cleaning, as you'd expect, pretty early on. Everything about it sounds really interesting and like something that I'd enjoy. I looked into it more, but over and over again I kept reading that the work keeps being forced on data scientists who don't want anything to do with data cleaning.

So my question is, is it possible to specialize specifically in data cleaning? And if so, are there specific certifications or other relevant education that I should pursue to do that? Is data cleaning at risk of being automated?

/r/datascience
https://redd.it/yjkjlx

Data Scientology

[D] Pedagogy: Thoughts on this (old) blog post by Andrew Gelman on de-emphasizing the sampling distributions of the sample mean in intro Stats classes?

I'm teaching stats and have tried to come up with the most intuitive explanations of the sampling distribution of the sample mean, ran simulations with students, etc. to try to inculcate the idea, thinking it would build up and be useful moving into inference for regression and other topics later. In office hours, I found that many students still aren't getting it (which I don't blame them for, I didn't in intro stats either). Then I came across this post, and I don't know how I feel about it when thinking about how I would change my class in the next iteration. Curious what all you statisticians, educators, and stats practitioners think!

Edit: added link
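
For readers who have not seen the idea in action, the kind of classroom simulation mentioned in the post could look roughly like the sketch below. This is not the poster's code. The exponential population, the sample size of 30, the 2,000 repetitions, and the helper name `simulate_sample_means` are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate_sample_means(population, sample_size, num_samples, seed=0):
    """Draw repeated samples (with replacement) and return each sample's mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(num_samples)
    ]


# Hypothetical skewed population: 10,000 exponential draws with mean near 1.
_rng = random.Random(42)
population = [_rng.expovariate(1.0) for _ in range(10_000)]

# Each entry of `means` is the mean of one sample of 30 observations.
means = simulate_sample_means(population, sample_size=30, num_samples=2_000)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

# The sample means should center on the population mean, and their spread
# should be close to sigma / sqrt(n), even though the population is skewed.
print(f"population mean:         {pop_mean:.3f}")
print(f"mean of sample means:    {statistics.fmean(means):.3f}")
print(f"SD of sample means:      {statistics.stdev(means):.3f}")
print(f"theory, sigma / sqrt(n): {pop_sd / 30 ** 0.5:.3f}")
```

Plotting a histogram of `means` next to a histogram of `population` is one way to show students that the two distributions have very different shapes and spreads.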

/r/statistics
https://redd.it/yiuj4z

Data Scientology

World's biggest employers

/r/Infographics
https://redd.it/yj1kp5
