Q In business, it seems like we care much more often about Type II errors than Type I errors.
I often seem to encounter situations in business where Type I errors don't seem very important.
Say we're testing two images on our website and want to know which one causes more conversions, so I use a two-tailed test with the null hypothesis that there is no difference in conversion rates between the images. In this scenario, it seems like I shouldn't care much about making a Type I error. If the null hypothesis is true but I incorrectly conclude that it can be rejected and implement the "winner", there's actually no downside for my business: we don't lose any conversions, because the null hypothesis is true, so it never mattered which image I chose. Whichever one I picked, there would be no change in conversions. I suppose I miss the opportunity to realise that this test was a waste of time and perhaps to avoid running pointless tests in the future, but considering only this one test, if we assume the null is true, then my choice is unimportant and therefore so is the statistical significance of my results.
What does seem to matter far more is power. If there is a true difference between these alternatives, even a very small one, sometimes in business it's really important that I be able to detect it. If improving conversion by a few percentage points can earn my company millions of dollars, then I need to set up my test so that we can find those tiny effects.
So... is this correct? Am I missing something about the practical interpretation of significance here? If this is the case, why is it that so much time and literature and tooling focuses on the significance of results rather than the power?
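To make the power side concrete, here is a minimal sketch of the sample-size arithmetic, assuming statsmodels; the 5% baseline rate and 0.5-point lift are made-up numbers for illustration, not anything from the post:
```python
# Rough sketch: visitors per variant needed to detect a small conversion lift.
# Assumes statsmodels; the baseline and lift values below are purely illustrative.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.050   # hypothetical current conversion rate (5.0%)
lift = 0.055       # hypothetical rate under the new image (+0.5 percentage points)

effect = proportion_effectsize(lift, baseline)  # Cohen's h for two proportions
n_per_group = NormalIndPower().solve_power(
    effect_size=effect,
    alpha=0.05,             # Type I error rate
    power=0.80,             # 1 - Type II error rate
    alternative="two-sided",
)
print(f"~{n_per_group:,.0f} visitors per variant")
```
Tiny effects need very large samples, which is exactly why power, not just significance, ends up driving how a test like this is designed.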
/r/statistics
https://redd.it/yfubif
7M+ Venmo transactions scraped from the public API
Transactions scraped from the [Venmo](https://venmo.com/) public API by [Dan Salmon](https://danthesalmon.com/about/)
This data was collected during the following date ranges:
* July 2018 - September 2018
* October 2018
* January 2019 - February 2019
While there is no data for the amount transferred, it's interesting to look at most frequently occurring target (receiver) / actor (sender) pairs.
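For instance, a rough sketch of that pair counting, assuming the transactions have been exported to a CSV with (hypothetical) `actor` and `target` username columns; adapt the loading step to however you store the data:
```python
# Rough sketch: most frequent sender/receiver pairs.
# The column names 'actor' and 'target' are assumptions about an exported CSV,
# not the exact schema of the original data dump.
import pandas as pd

df = pd.read_csv("venmo_transactions.csv", usecols=["actor", "target"])
top_pairs = (
    df.groupby(["actor", "target"])
      .size()
      .sort_values(ascending=False)
      .head(20)
)
print(top_pairs)
```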
Source: [https://github.com/sa7mon/venmo-data](https://github.com/sa7mon/venmo-data)
\[Self promotion\] We've re-hosted the data on Gigasheet for exploration before downloading [https://app.gigasheet.com/spreadsheet/Venmo-Transactions-by-Dan-Salmon-github-com-sa7mon-venmo-data/56db56e2\_acb7\_4cc9\_9d7a\_ae308f5a2a06?public=true](https://app.gigasheet.com/spreadsheet/Venmo-Transactions-by-Dan-Salmon-github-com-sa7mon-venmo-data/56db56e2_acb7_4cc9_9d7a_ae308f5a2a06?public=true)
/r/datasets
https://redd.it/yfn8sz
kaggle is wild (・o・)
/r/datascience
https://redd.it/yfnbab
Russian tank T72-B3M, Obr. 2016
/r/Infographics
https://redd.it/yfm3of
Ethnic Map of Canada, 2021 [OC]
/r/dataisbeautiful
https://redd.it/yfn8s5
The 12 Largest US Cities in 2020
/r/MapPorn
https://redd.it/yf29cz
It’s pretty, but what does it mean?
/r/dataisugly
https://redd.it/ye0y6q
What's that humming sound? The World Hum Database
The World Hum Database has thousands of user-submitted reports of "The Hum", an "unusual unidentified low-frequency sound that scientists now call the Worldwide Hum."
https://thehum.info/
https://thehum.info/ewExternalFiles/march22dbase.csv
/r/datasets
https://redd.it/ye4rhc
[OC] How Meta made (or struggled to make) money in Q3 👇
/r/dataisbeautiful
https://redd.it/yeo103
Some of England's most relevant brands by county of origin.
/r/MapPorn
https://redd.it/yeow7f
[OC] Where do Democrats and Republicans stand on free speech and the internet?
/r/dataisbeautiful
https://redd.it/yeu2r4
List of large numbers up to TREE(3)
https://youtu.be/RYMjOwH_bWg
/r/mathpics
https://redd.it/y3qhlw
Europe: How willing would you be to help another country in a crisis?
/r/MapPorn
https://redd.it/yehe70
Drawing Europe from memory on a sequin pillow, a little stretched towards the north-east but not bad.
/r/MapPorn
https://redd.it/ydwta2
I made a family tree of the Olympian gods, featuring classical artworks
/r/Infographics
https://redd.it/ye1i3k
[OC] The absolute quality of Better Call Saul.
/r/dataisbeautiful
https://redd.it/yfphz4
Vietnamese diaspora worldwide as a share of local population.
/r/MapPorn
https://redd.it/yfjket
Ethnicities of Slovakia (Slovaks, Hungarians, Rusyns, Roma), based on the 2021 census
/r/MapPorn
https://redd.it/yfi514
An 'Undoing' of the *Culprit* Hard Unknot
/r/mathpics
https://redd.it/y0r3jb
New paper on Automatically Detecting Label Errors in Entity Recognition Data
Hi Redditors!
I think you guys will find this very useful. Anyone who uses entity recognition datasets has probably come across labels that are incorrect. Our newest research investigates automated methods to find sentences with mislabeled words in such datasets. Mislabeling is especially common in ML tasks like token classification, where labels must be chosen on a fine-grained basis. It is exhausting to get every single word labeled right!
We benchmarked a bunch of possible algorithms on real data (with actual label errors rather than synthetic errors often considered in academic studies) and identified one straightforward approach that can find mislabeled words with better precision/recall than others.
This algorithm is now available for you to run on your own text data in one line of open-source code. We ran this method on the famous CoNLL-2003 entity recognition dataset and found it has hundreds of label errors.
Blogpost: https://cleanlab.ai/blog/entity-recognition/
Paper: https://arxiv.org/abs/2210.03920
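For context, the open-source code is cleanlab (per the blog post above); here is a hedged sketch of what that call looks like for token classification, with the module path taken from my reading of the cleanlab 2.x docs and toy placeholder data rather than a real dataset:
```python
# Hedged sketch of flagging likely label errors in token-classification data.
# Module path and signature follow the cleanlab 2.x docs; labels and pred_probs
# below are toy placeholders, not real model output.
import numpy as np
from cleanlab.token_classification.filter import find_label_issues

# One list of per-token class labels per sentence...
labels = [[0, 0, 1], [0, 0, 0, 0]]   # second sentence: token 1 is deliberately mislabeled
# ...and one (num_tokens x num_classes) array of predicted probabilities per sentence.
pred_probs = [
    np.array([[0.90, 0.05, 0.05], [0.80, 0.10, 0.10], [0.10, 0.85, 0.05]]),
    np.array([[0.70, 0.20, 0.10], [0.10, 0.10, 0.80], [0.90, 0.05, 0.05], [0.85, 0.10, 0.05]]),
]

issues = find_label_issues(labels, pred_probs)
print(issues)   # (sentence index, token index) pairs flagged as likely label errors
```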
/r/datasets
https://redd.it/yewllw
All the metals we have mined this year.
/r/Infographics
https://redd.it/yeole2
U.S. Senators Ranked by Their Ability to Legislate in 2022 [OC]
https://redd.it/yewqtl
America's 3 deadliest drugs are legal. Underlying cause of death in America in 2015, by drug
/r/visualization
https://redd.it/yeq34i
Most common baby names in London, 2021
/r/MapPorn
https://redd.it/yetos9
Some Cute Little Sequences of Figures Beautifully Evincing the (Possibly Not Altogether Obvious @ First Glance) Topological Equivalence of Linked & Unlinked 'Handcuffs'
/r/mathpics
https://redd.it/y1j292
6 Things You May Not Know About Pumpkins
/r/Infographics
https://redd.it/yekklc
D Why can't we say "we are 95% sure"? Still don't follow this "misunderstanding" of confidence intervals.
If someone asks me "who is the actor in that film about blah blah" and I say "I'm 95% sure it's Tom Cruise", then what I mean is that for 95% of these situations where I feel this certain about something, I will be correct. Obviously he is already in the film or he isn't, since the film already happened.
I see confidence intervals the same way. Yes, the true value is already either inside the interval or not, but why can't we say we are 95% sure it lies in the interval (a, b), with the INTENDED MEANING being "95% of the time, our estimation procedure will produce an interval that contains the true parameter"? Like, what the hell else could "95% sure" mean for events that already happened?
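One way to make that intended meaning concrete is a coverage simulation: run the interval procedure many times on fresh samples and count how often the interval contains the true parameter. A small sketch, assuming a toy normal-mean setting with made-up numbers:
```python
# Coverage simulation: across repeated samples, roughly 95% of the t-intervals
# produced by this procedure contain the true mean. Toy example; numbers are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_mu, sigma, n, trials = 10.0, 2.0, 50, 10_000

covered = 0
for _ in range(trials):
    sample = rng.normal(true_mu, sigma, size=n)
    half_width = stats.t.ppf(0.975, df=n - 1) * sample.std(ddof=1) / np.sqrt(n)
    covered += (sample.mean() - half_width <= true_mu <= sample.mean() + half_width)

print(f"Coverage over {trials:,} intervals: {covered / trials:.3f}")  # ~0.95
```
That long-run frequency is exactly the "95% of the time our estimation procedure will contain the true parameter" reading described above.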
/r/statistics
https://redd.it/yeccnw
Data Science Book Club
I’ve been a data scientist for 3 years and love it. I have come across some essential textbooks and books that would supplement my knowledge and career. I’ve made a list elsewhere and was wondering if others would like to join me as I try to read and discuss these books. I can host it in discord and we can read 75 pages a week, meeting for an hour virtually to discuss the ideas within. Any takers?
/r/datascience
https://redd.it/ye8626
Islam in Canada, 2021 vs. 2011
https://redd.it/ye8nxg