[E] A nice chart of the most commonly used univariate distributions and their relationships
Published in The American Statistician (Leemis and McQueston, 2008). I've found this extremely helpful as a quick reference and often share it with my students.
https://imgur.com/a/h23Bgxc
The paper itself has all the density functions and additional useful tidbits about how distributions are related.
/r/statistics
https://redd.it/znyfj2
What is Apache Arrow? by Pandas Creator Wes McKinney
https://youtu.be/DTqGMRYcEt0
/r/bigdata
https://redd.it/zl19xa
Sexual Racism Experienced by Asian American LGBTQ+ Men in Online Dating (Asian-American & Pacific Islander men who have sex with men)
Open to all Asian American & Pacific Islander men who have sex with men (including transmasculine individuals and those who are nonbinary but use man/male as an identifier) and have experience with online dating.
Survey Link: https://umassboston.co1.qualtrics.com/jfe/form/SV_bpvn9N6m3FdAiPQ
This study is sponsored by the UMass Boston Department of Psychology and supervised by Dr. David Pantalone. Our LGBTQIA+ affirmative research lab is dedicated to advancing the health of LGBTQIA+ communities. The study has been approved by the UMass Boston Institutional Review Board (IRB#2021228). For any questions, please contact the principal investigator, Christopher Chiu, at cchiu.umbstudy@gmail.com.
/r/SampleSize
https://redd.it/zni7jo
If you've held Bitcoin for five years, you're now sitting on a negative return [OC]
/r/dataisbeautiful
https://redd.it/znpurr
[OC] The US leads the way by a mile in government space budgets, with the Artemis mission sending humans back to the moon for the first time in 50 years
/r/dataisbeautiful
https://redd.it/znb677
[OC] The U.S. spends one third of its tax revenue on its military
/r/dataisbeautiful
https://redd.it/zn2l2l
US States and Canadian Provinces' total sales tax.
/r/MapPorn
https://redd.it/zn72gr
How do you abbreviate cumulative in your feature names?
Not trolling, but I want to know how you all abbreviate "cumulative x" in a feature name. For example, cumulative streamed minutes for the past 7 days can be cum_streamed_min_L7. I feel uncomfortable putting that name in presentations.
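For concreteness, here is a minimal sketch of the kind of feature being named: a trailing 7-day total of streamed minutes. The function name `trailing_sum`, the sample values, and the key `cumul_streamed_min_l7` are all hypothetical, picked only for illustration; the key is one possible spelling, not an established convention.

```python
from collections import deque


def trailing_sum(values, window=7):
    """Return, for each position, the sum of the last `window` values seen so far."""
    out = []
    buf = deque()
    total = 0
    for v in values:
        buf.append(v)
        total += v
        # Drop the oldest value once the buffer exceeds the window size.
        if len(buf) > window:
            total -= buf.popleft()
        out.append(total)
    return out


if __name__ == "__main__":
    # Hypothetical daily streamed minutes for one user.
    daily_streamed_min = [30, 0, 45, 60, 15, 0, 90, 20, 10]
    features = {"cumul_streamed_min_l7": trailing_sum(daily_streamed_min, window=7)}
    print(features)
```

The first six entries cover fewer than 7 days; whether to keep those partial windows or mask them depends on the use case.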
/r/datascience
https://redd.it/zmuchu
[OC] Military expenditure (% of GDP) of the U.S. from 1993 to 2020
/r/dataisbeautiful
https://redd.it/zn2s5f
[D] Trying to find paper about n-grams in early transformer layers
I remember reading a paper a while back that showed early attention layers in a transformer could be replaced with a simpler mechanism since most heads only modeled small n-grams. I think they used some kind of pooling?
Wondering if anyone knows which paper that was and has any thoughts about it since then. Thanks!
/r/MachineLearning
https://redd.it/zmoxp7
Adult Obesity Rate vs. Median Household Income by State [OC]
/r/dataisbeautiful
https://redd.it/zo50vb
[OC] Saw this at the local vet: pets’ ages in human years
/r/dataisbeautiful
https://redd.it/zny2h4
[OC] Top 23* FIFA rankings over the last 25 years. Is there a correlation between the rankings and the World Cup finals? Also, did you know that Morocco is the lowest-ranked team to qualify for the semi-finals over the last 25 years (*ranked #23)? Link in the comments.
/r/dataisbeautiful
https://redd.it/zneh00
Child abuse in the U.S. - victims by perpetrator relationship 2020
https://www.statista.com/statistics/254893/child-abuse-in-the-us-by-perpetrator-relationship/
/r/dataisbeautiful
https://redd.it/znno50
So far 2022 has had the second widest range of daily average temperatures in the Central England temperature series. This shows the 5 years with the widest and narrowest range of temperatures in the series from 1772. [OC]
/r/dataisbeautiful
https://redd.it/znccpl
[P] XetHub: We scaled Git to support 1 TB repos
Thanks to everyone who replied to our [earlier post requesting pre-launch product feedback](https://www.reddit.com/r/mlops/comments/zd7hqy/feedback_requested_new_data_storage_tool_for/)! We’re excited to announce that we’ve now publicly launched [XetHub](https://xethub.com/?utm_source=reddit&utm_medium=organic&utm_campaign=xethub-intro&utm_content=link), a collaborative storage platform for data management.
I’ve been in the MLOps space for ~10 years, and data is still the hardest unsolved problem. Code is versioned using Git, data is stored somewhere else, and context often lives in a third location like Slack or GDocs.
This is why we built XetHub, a platform that enables teams to treat data like code, using Git.
Unlike Git LFS, XetHub doesn’t just store the files. It uses content-defined chunking and Merkle Trees to dedupe against everything in history, allowing small changes in large files to be stored compactly. Here’s how it works: [https://xethub.com/assets/docs/how-xet-deduplication-works](https://xethub.com/assets/docs/how-xet-deduplication-works)
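To make the idea of content-defined chunking concrete, here is a toy sketch. It is not XetHub's implementation: the window size, the mask, the use of BLAKE2b as the boundary hash, and the names `split_chunks` and `ChunkStore` are all assumptions chosen for illustration, and the Merkle tree layer is left out entirely. The point it shows is that chunk boundaries depend on nearby content rather than on byte offsets, so a small edit in a large file only produces a few new chunks.

```python
import hashlib
import random

WINDOW = 16            # bytes hashed to decide whether a boundary falls here (assumed)
MIN_CHUNK = 16         # avoid degenerate tiny chunks (assumed)
MASK = (1 << 6) - 1    # boundary when the low 6 hash bits are zero, about 1 in 64


def split_chunks(data: bytes) -> list[bytes]:
    """Cut `data` wherever a hash of the preceding WINDOW bytes matches MASK."""
    chunks = []
    start = 0
    for end in range(1, len(data) + 1):
        if end - start < MIN_CHUNK:
            continue
        window = data[end - WINDOW:end]
        digest = hashlib.blake2b(window, digest_size=4).digest()
        if int.from_bytes(digest, "big") & MASK == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last boundary
    return chunks


class ChunkStore:
    """Toy content-addressed store: each unique chunk is kept once, keyed by SHA-256."""

    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes):
        """Store `data`; return (ordered chunk keys, number of chunks newly stored)."""
        keys, new = [], 0
        for chunk in split_chunks(data):
            key = hashlib.sha256(chunk).hexdigest()
            if key not in self.blobs:
                self.blobs[key] = chunk
                new += 1
            keys.append(key)
        return keys, new

    def get(self, keys) -> bytes:
        """Reassemble a file from its ordered chunk keys."""
        return b"".join(self.blobs[k] for k in keys)


if __name__ == "__main__":
    rng = random.Random(0)
    v1 = bytes(rng.getrandbits(8) for _ in range(4096))
    v2 = v1[:2048] + b"small edit" + v1[2048:]  # insert a few bytes mid-file

    store = ChunkStore()
    keys1, new1 = store.put(v1)
    keys2, new2 = store.put(v2)
    print(f"v1: {len(keys1)} chunks, {new1} stored")
    print(f"v2: {len(keys2)} chunks, {new2} newly stored")
```

A production system would use a rolling hash (so each byte is processed in constant time) and much larger average chunk sizes; hashing the full window at every position, as done here, is only for readability.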
XetHub includes a GitHub-like web interface that provides automatic CSV summaries and allows custom visualizations using Vega. And we know how painful downloading a huge repository can get, so we built Git-Xet mount, which provides a user-mode filesystem view over the repo in seconds.
Today, XetHub works for 1 TB repositories, and we plan to scale to 100 TB in the next year. Our implementation is in Rust (client & cache + storage) and our web application is written in Go.
XetHub is available today for Linux & Mac (Windows coming soon) and we’d love for you to try it out!
More info here:
* [https://xetdata.com/blog/2022/12/13/introducing-xethub](https://xetdata.com/blog/2022/12/13/introducing-xethub)
* [https://xetdata.com/blog/2022/10/15/why-xetdata](https://xetdata.com/blog/2022/10/15/why-xetdata)
* Hacker News discussion (launched on Show HN at #1): [https://news.ycombinator.com/item?id=33969908](https://news.ycombinator.com/item?id=33969908)
https://preview.redd.it/t9tf3kt5i96a1.png?width=1740&format=png&auto=webp&s=184dd57d9f3d4e1dea94f8ab02211f663e214e84
/r/MachineLearning
https://redd.it/znfgap
[OC] How long is each US president's Wikipedia page?
/r/dataisbeautiful
https://redd.it/zms0td
Easy-to-build, high-end visualizations for Google Slides and Notion
Hello community,
For all those who, like me, were struggling to build visualizations in Google Slides or simply felt they lacked the high-quality charts they needed, Rollstack has created a simple and powerful charting and visualization tool for Google Slides and Notion.
Its users, including analytics, strategy, bizops, finance, marketing, and sales teams, especially enjoy the massive time gains when building charts, slides, and documents.
Here's a short product demo.
What is your current experience building charts in Google Slides and Notion? Let me know.
/r/visualization
https://redd.it/zmjfvp
[OC] Fast fashion companies add new items to their sites all the time. Shein is the worst, with 60,000 new items each month.
/r/dataisbeautiful
https://redd.it/zmiezz
WLB suddenly turned toxic
Everything was nice, and the WLB was good. Then the company got acquired by a bigger fish, and the WLB took a turn for the worse. I am currently working on a project where the manager expects us to work all day long. He himself works until 2-3 AM. I don't want to know why or how; it will always be a mystery.
He keeps saying this is critical and has a tight deadline.
I wish I could just say f### this criticality and these tight deadlines. I can't be working 12-14 hours every day and exhausting myself. My vision literally blurs and I get severe headaches after the 10th or 11th hour.
This has been going on for 3 weeks straight now. Every time, he says we need to pick up the pace and match his level of speed and commitment, and he literally asks us to be "robots" and keep working.
I did hear of past employees changing teams because of his way of working.
PS: There is a time difference of 6 hours, and he literally keeps messaging us on MS Teams for updates and asks us for calls and updates at midnight!!!
/r/datascience
https://redd.it/zmr3an
Areas in Europe under Arab control at one point or another
/r/MapPorn
https://redd.it/zmojg8