datascientology | Education

Telegram channel datascientology - Data Scientology


Hot data-science-related posts every hour. Chat: https://telegram.me/r_channels Contacts: @lgyanf


Data Scientology

Ancestry in Brazil

/r/MapPorn
https://redd.it/yhe9h5

Data Scientology

[Q] What is wrong with reading a Bayesian probability off a Frequentist CI?

I know the interpretation of a Frequentist 95% CI is that if we repeated the sampling and constructed an interval the same way infinitely many times, 95% of those intervals would contain the true population value.

Generally, many people ("wrongly") say that if they generate a 95% CI of (5, 10), there is a 95% chance that the population value is between 5 and 10. But how is this wrong? There is some large number X of possible CIs from samples of size n, 95% of which contain the true value, and this is one of those X intervals, so there is a 95% chance that it is one of the intervals that contain the true value. And if that is the case, there is a 95% chance that it contains the true value.

What I have also heard is that people say that by doing this I'm interpreting a Frequentist construct with a Bayesian probability, because to Frequentists it doesn't make sense to talk about the probability that this interval contains the true value: *it does or it doesn't*. But what is wrong with saying "I've created this Frequentist 95% CI, (5, 10), so there is a 95% chance (in the Bayesian rather than the Frequentist sense of probability) that the true population value is between 5 and 10"?

/r/statistics
https://redd.it/yhf0fw
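The Frequentist reading in the question can be checked with a short simulation. This is a minimal sketch, assuming normally distributed data with a known standard deviation, so that the textbook z-interval (sample mean ± 1.96·σ/√n) applies. The true mean of 7.5, σ = 2 and n = 30 are hypothetical values chosen only for illustration. The 95% describes the long-run hit rate of the procedure, while each individual interval either contains the true mean or does not.

```python
import math
import random
import statistics


def ci_coverage(true_mean=7.5, sigma=2.0, n=30, reps=10_000, z=1.96, seed=0):
    """Fraction of known-sigma z-intervals that contain true_mean.

    All parameter defaults are hypothetical illustration values.
    """
    rng = random.Random(seed)
    half_width = z * sigma / math.sqrt(n)  # same width for every interval
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        m = statistics.fmean(sample)
        # A single realized interval either covers the true mean (1) or not (0).
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / reps


print(f"empirical coverage: {ci_coverage():.3f}")
```

The 0/1 indicator inside the loop is the Frequentist point in code form: probability attaches to the interval-generating procedure, not to any one realized interval like (5, 10).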

Data Scientology

Countries with the most Olympic medals per capita

/r/MapPorn
https://redd.it/yhcsim

Data Scientology

[OC] Total Number of Births Since 1850

/r/dataisbeautiful
https://redd.it/yg8som

Data Scientology

How would I figure this out?

/r/mathpics
https://redd.it/ygp7ah

Data Scientology

[Question] R-squared: biased and invalid for small samples?

I've been running regressions with different samples, and I have the impression that the smaller the sample, the larger the R-squared. For instance, with n = 2, R-squared is always 100% (a fitted line passes through both points exactly). Is sample R-squared a biased estimator of population R-squared? Is R-squared invalid for small samples?

/r/statistics
https://redd.it/ygrv5i
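The small-sample effect described in the question can be reproduced with a simulation. This is a minimal sketch, assuming a simple regression with one predictor plus an intercept and a y that is generated independently of x, so the population R-squared is 0. Under those assumptions (normal data), the sample R-squared averages 1/(n - 1), so it is biased upward and the bias grows as n shrinks. Adjusted R-squared, the usual correction, averages about 0 in this setting. The sample sizes and replication count are arbitrary illustration choices.

```python
import random
import statistics


def r_squared(x, y):
    """R-squared of a simple OLS fit y ~ a + b*x (the squared Pearson correlation)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy * sxy / (sxx * syy)


def adjusted_r_squared(r2, n, k=1):
    """Adjusted R-squared for k predictors plus an intercept; needs n > k + 1."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def mean_r_squared_under_null(n, reps=5000, seed=0):
    """Average R-squared when y is pure noise unrelated to x (true R-squared = 0)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        total += r_squared(x, y)
    return total / reps


for n in (3, 5, 10, 30, 100):
    mean_r2 = mean_r_squared_under_null(n)
    # Adjusted R-squared is linear in R-squared, so its mean follows directly.
    print(f"n={n:>3}  mean R2={mean_r2:.3f}  1/(n-1)={1 / (n - 1):.3f}  "
          f"mean adj R2={adjusted_r_squared(mean_r2, n):+.3f}")
```

With n = 2 and one predictor, the adjustment divides by zero, which mirrors the fact that a two-point fit leaves no degrees of freedom for the residuals.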

Data Scientology

[OC] US colleges attract a lot of foreign students

/r/dataisbeautiful
https://redd.it/ygtrah

Data Scientology

[R] ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model with Knowledge-Enhanced Mixture-of-Denoising-Experts + Gradio Demo

https://redd.it/ygj11f
@datascientology

Data Scientology

data_irl

/r/data_irl
https://redd.it/ycbsb6

Data Scientology

Why I like Halloween

/r/funnycharts
https://redd.it/yewrzk

Data Scientology

An interesting visual jump

/r/dataisugly
https://redd.it/ygg7di

Data Scientology

UN vote to end US embargo against Cuba

/r/MapPorn
https://redd.it/ygjf47

Data Scientology

Difficulty transitioning between R and Python?

I’m using R in grad school, as required, and Python at work, due to availability. I’m reasonably comfortable with both, but switching back and forth within the same day is a bit rough. At what level of fluency will it get easier? Any suggestions? Just keep at it?

/r/datascience
https://redd.it/yg38or

Data Scientology

[OC] The average colour of each US state flag.

/r/dataisbeautiful
https://redd.it/yfqdjy

Data Scientology

Map of The 13 British Overseas Territories

/r/MapPorn
https://redd.it/yhedox

Data Scientology

The Stack: a 3 TB dataset of permissively licensed code in 30 languages
https://twitter.com/bigcodeproject/status/1585631176353796097?s=46&t=mLrACB0pej1c7ge2uX2vKg

/r/datasets
https://redd.it/yfhxnb

Data Scientology

Infinite Monkey Theorem [OC]

/r/mathpics
https://redd.it/ygjxgi

Data Scientology

Ancient data viz? I guess pictographs, cave art, and ethnomathematics count, but is there spiritual or indigenous knowledge of this kind? Please let me know if you know of any organizations, groups, articles, videos, or academic papers on this!

https://redd.it/yh8c6n
@datascientology

Data Scientology

[OC] Number of Costco vs Sam's Club Stores across US States

/r/dataisbeautiful
https://redd.it/ygnt2l

Data Scientology

[OC] Number of packs required to fill the FIFA World Cup Qatar 2022 Album (670 stickers without trading)

/r/dataisbeautiful
https://redd.it/ygrwhg
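The question behind this chart is the classic coupon collector problem: with N equally likely stickers, the expected number of individual stickers needed to see every one at least once is N·H_N, where H_N = 1 + 1/2 + ... + 1/N. This is a minimal sketch, not the chart author's method. It assumes every sticker is equally likely and drawn independently (duplicates inside a pack are allowed), and the pack size of 5 is an assumption; real packs may differ.

```python
import random


def expected_stickers(n_unique=670):
    """Coupon-collector expectation: n * H_n single stickers to collect all n."""
    return n_unique * sum(1 / i for i in range(1, n_unique + 1))


def simulate_packs(n_unique=670, pack_size=5, seed=0):
    """Open packs of uniformly random stickers (no trading) until all have appeared.

    pack_size=5 is an assumed value, not taken from the chart.
    """
    rng = random.Random(seed)
    seen = set()
    packs = 0
    while len(seen) < n_unique:
        packs += 1
        for _ in range(pack_size):
            seen.add(rng.randrange(n_unique))
    return packs


trials = [simulate_packs(seed=s) for s in range(200)]
print(f"formula: ~{expected_stickers() / 5:.0f} packs, "
      f"simulation mean: {sum(trials) / len(trials):.0f} packs")
```

Most of the cost comes from the last few missing stickers: once only one is left, each new sticker has just a 1/670 chance of being it, which is why trading makes such a large difference.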

Data Scientology

There is a lake in Finland that looks like Finland.

/r/MapPorn
https://redd.it/ygl4tf

Data Scientology

[OC] GOT and HOTD Episodes by IMDb User Ratings

/r/dataisbeautiful
https://redd.it/ygkmmu

Data Scientology

data_irl
https://imgur.com/gallery/y9rdpXZ

/r/data_irl
https://redd.it/yglpst

Data Scientology

The Top 100 Most Valuable Brands in 2022 (according to Brand Finance)

/r/Infographics
https://redd.it/ygpil4

Data Scientology

[OC] The average colour of each European country flag.

/r/dataisbeautiful
https://redd.it/ygjn50

Data Scientology

[OC] Contributing factors to price inflation: another representation of the chart Rep. Katie Porter showed, with a bonus example to demonstrate.
https://www.epi.org/blog/corporate-profits-have-contributed-disproportionately-to-inflation-how-should-policymakers-respond/

/r/dataisbeautiful
https://redd.it/yg6ccq

Data Scientology

[OC] How much time do men and women spend caring for children in the US?

/r/dataisbeautiful
https://redd.it/yfvmtd

Data Scientology

Electrical grids of Canada/USA

/r/MapPorn
https://redd.it/yg0dnu

Data Scientology

A critical reflection on Jupyter notebooks

In my experience, notebooks are a surprisingly controversial topic. I've seen everything from Databricks building tools for data scientists and data engineers that seemingly only run in notebooks (unless you install the notoriously buggy Databricks Connect) to people using the word "notebook" as a synonym for the antithesis of good programming habits.

Recently I've been listening to more talks about interactive vs. batch programming and reflecting on how I write code myself. Here's my own set of hot takes:

1. The name "notebook" explains what they are meant for: experimenting, prototyping, potentially automating reports with markdown, etc. Essentially, you use them to jot down ideas as you would on a piece of paper.
2. You should build systems/features/... with notebooks and not with regular scripts to save time. Treat your notebook as a debugger that is always on: writing code in notebooks is a great way to build code interactively and incrementally. If you have to sit around waiting for IO to load data out of your DB to train a model, re-running everything from scratch on each run doesn't make sense.
3. Notebooks DO encourage poor programming standards if you don't watch out. People repeat this like a buzzword without ever clarifying what they mean. The biggest issue here is the (ab)use of global variables and the fact that notebooks are typically self-contained units. Having a proper project structure and reusing code across your project is important. Ideally, you define building blocks as functions/classes somewhere in your project and call them from your notebooks; if you do, your notebook is equivalent to production code.
4. Using a notebook as a scratchpad and porting it to "production code" is faster than writing production code in a .py file from the start. This summarizes the preceding three points and is what I personally do. In my opinion, the overhead of porting a notebook to 4-5 files in a clear directory structure, with a main somewhere that runs them in the same order the notebook would, is simply less than building it that way from scratch.
5. Don't delete your scratchpads; keep them around as documentation for your streamlined production workflow. Why? Because, in my experience, running each block of a notebook gives you more freedom than fighting with your debugger if/when something does go wrong.

Sidenote: I think this is why people have trouble transitioning from R to Python, or from Spyder to another IDE/text editor. RStudio and Spyder are much closer to interactive programming because you can run your code line by line without losing your variables. This is how programming should be, but it is not how the vast majority of people learnt it, and people don't like change.

/r/datascience
https://redd.it/yfsxrn
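Points 3 and 4 of the post above (building blocks defined once in the project, plus a main that runs them in the same order as the notebook) can be sketched as a single Python module. This is a minimal illustration, not the poster's code. The module name `pipeline.py`, the functions `load_data`, `clean`, `summarize` and `main`, and the in-memory rows standing in for a database query are all hypothetical.

```python
"""pipeline.py: hypothetical module holding reusable building blocks.

A notebook would import these functions and call them cell by cell;
main() runs the same steps in the same order as a batch job.
"""
import statistics
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def load_data() -> List[Record]:
    """Hypothetical stand-in for an expensive DB query."""
    return [{"x": 1.0, "y": 2.1}, {"x": 2.0, "y": None}, {"x": 3.0, "y": 6.2}]


def clean(rows: List[Record]) -> List[Record]:
    """Drop rows with missing values; takes input explicitly instead of using notebook globals."""
    return [r for r in rows if all(v is not None for v in r.values())]


def summarize(rows: List[Record]) -> Dict[str, float]:
    """Return simple summary statistics for column y."""
    ys = [r["y"] for r in rows]
    return {"n": len(ys), "mean_y": statistics.fmean(ys)}


def main() -> Dict[str, float]:
    """Batch entry point: the same steps a notebook would run, in the same order."""
    rows = load_data()
    cleaned = clean(rows)
    report = summarize(cleaned)
    print(report)
    return report


if __name__ == "__main__":
    main()
```

In a notebook, each step would sit in its own cell (`rows = load_data()`, then `cleaned = clean(rows)`, and so on), so expensive intermediate results stay in memory while you experiment, and the batch entry point reuses exactly the same functions.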
