Hot data science related posts every hour. Chat: https://telegram.me/r_channels Contacts: @lgyanf
The United States as James K. Polk Wanted It [964 x 740]
http://i.imgur.com/pwXoy.jpg
/r/MapPorn
https://redd.it/zkxfos
[OC] geospatial distribution of different fast food chains in the USA (included some of your suggestions from my previous post)
/r/dataisbeautiful
https://redd.it/zl3bta
Statisticians who got their PhD and now work in industry, what is it like? [Q]
Curious how the transition to industry went after a PhD in statistics. Exciting? Frustrating? I’ve often heard both sides: with a PhD you get more lucrative data science roles, but it can also be frustrating because industry puts little emphasis on statistical rigor. What have been your experiences? Are any of you in startups? Have you founded your own? I’m just curious to see what kinds of non-traditional placements people with a PhD in statistics have ended up in.
/r/statistics
https://redd.it/zkzol0
[OC] Median home price in each U.S. State 2022.
/r/dataisbeautiful
https://redd.it/zl194o
When did women get voting rights? Portugal had a dictatorship in the 70s, but what happened in Switzerland for it to happen so late??
/r/MapPorn
https://redd.it/zkugqa
[OC] The most frequently mentioned books in posts on r/books
https://redd.it/zkwcsc
@datascientology
[OC] UK housing most unaffordable since Victorian times
/r/dataisbeautiful
https://redd.it/zktc6r
The US advises to stay away from the Middle East [OC]
/r/dataisbeautiful
https://redd.it/zk99sf
Can you recommend a Python textbook to replace "An Introduction to Statistical Learning with Applications in R", Witten, J. et al. [E]
I am migrating a course from R to Python, and am looking to replace this textbook with one that is as similar as possible, but uses Python as the application language.
There is a GitHub repository that converts all the R code from this book to Python, and that is very nice, but not quite as convenient as a new book.
/r/statistics
https://redd.it/zk8rbr
[OC] Visualising Pfizer's latest income statement. Pharmaceutical profit margins are notoriously higher than most other industries
/r/dataisbeautiful
https://redd.it/zju4mr
[OC] Average Sold Home Price in Canada, Q4 2022 in $USD/$CAD
/r/dataisbeautiful
https://redd.it/zk65kp
Yet another 2022 Wrapped: Chat Messages per Day [OC]
/r/dataisbeautiful
https://redd.it/zk785k
Benefits of Walking in Daily Life
/r/Infographics
https://redd.it/zjqygk
Programmatically create presentation slides with data visualisation graphs in Python
Hi all,
I am currently working on a project where I use Python’s data science libraries to generate graphs and various visualisations on data (e.g. using Pandas, Seaborn etc.). Ultimately, I’m looking to put all of these graphs and models into a PowerPoint-like presentation in a way that 1) the graphs are linked to a database, 2) the graphs get updated automatically if anything changes in the database, 3) I have a clean layout of text, pictures and models all together.
I am hence looking at tools that can help me achieve that. I see that Google Slides integrates with Python through the gslides library, but I haven’t found many examples of what it can generate. Jupyter Notebook is another option, but I’m not sure how a PowerPoint-style presentation can be created in it (so far I’ve only really used Jupyter Notebook for reporting purposes). Are there any tools I could look at?
Thanks, any help is much appreciated!
/r/datascience
https://redd.it/zjyleu
World Heritage Sites by Country
/r/MapPorn
https://redd.it/zjt920
[OC] Meat consumption
/r/dataisbeautiful
https://redd.it/zlgxjk
[OC] Prevalence of British and American Spelling Variants on Wikipedia
/r/dataisbeautiful
https://redd.it/zlc972
Tesla value as it relates to Twitter's purchase [OC]
/r/dataisbeautiful
https://redd.it/zl0t0n
'Time lapse' diagram of the motion of arms & string of an 'inswinger' ballista.
/r/mathpics
https://redd.it/zioaru
The Richest Billionaires in Each Country
/r/visualization
https://redd.it/ziopas
Red hair frequency in Europe
/r/MapPorn
https://redd.it/zks8gw
Air traffic control zones in the USA
/r/MapPorn
https://redd.it/zk8mng
[Discussion] Amazon's AutoML vs. open source statistical methods
>TL;DR: We paid $800 USD and spent 4 hours in the AWS Forecast console so you don't have to.
In this reproducible experiment, we compare Amazon Forecast and StatsForecast, an open-source Python library for statistical methods.
Since AWS Forecast specializes in demand forecasting, we selected the M5 competition dataset as a benchmark; the dataset contains 30,490 series of daily Walmart sales.
We found that Amazon Forecast is 60% less accurate and 669 times more expensive than running an open-source alternative in a simple cloud server.
We also provide a step-by-step guide to reproduce the results.
### Results
Amazon Forecast:
achieved 1.617 in error (measured in wRMSSE, the official evaluation metric used in the competition),
took 4.1 hours to run,
and cost 803.53 USD.
An ensemble of statistical methods trained on a c5d.24xlarge EC2 instance:
achieved 0.669 in error (wRMSSE),
took 14.5 minutes to run,
and cost only 1.2 USD.
For this data set, we show, therefore, that:
Amazon Forecast is 60% less accurate and 669 times more expensive than running an open-source alternative in a simple cloud server.
Classical methods outperform Machine Learning methods in terms of speed, accuracy, and cost.
Although using StatsForecast requires some basic knowledge of Python and cloud computing, the results are better for this dataset.
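As a quick sanity check on the headline numbers, the ratios can be recomputed from the figures quoted in the Results section. This is only a minimal sketch: the dictionary keys and the `ratios` helper are illustrative names, not part of the original experiment, and "error reduction" is assumed here to mean the relative wRMSSE gap measured against Amazon Forecast's score.

```python
# Figures quoted in the Results section of the post.
amazon = {"error": 1.617, "hours": 4.1, "cost_usd": 803.53}
stats = {"error": 0.669, "hours": 14.5 / 60, "cost_usd": 1.2}


def ratios(baseline, candidate):
    """Compare a baseline run against a candidate run (hypothetical helper)."""
    return {
        # How many times more the baseline costs.
        "cost_ratio": baseline["cost_usd"] / candidate["cost_usd"],
        # How many times longer the baseline takes.
        "time_ratio": baseline["hours"] / candidate["hours"],
        # How many times larger the baseline error is.
        "error_ratio": baseline["error"] / candidate["error"],
        # Relative error gap, measured against the baseline's error.
        "error_reduction": 1 - candidate["error"] / baseline["error"],
    }


summary = ratios(amazon, stats)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

With these inputs the cost ratio comes out just under 670, in line with the "669 times" claim, and the relative error gap is roughly 59%, which the post rounds to 60%.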
Table
https://preview.redd.it/vt9ru0149i5a1.png?width=1274&format=png&auto=webp&s=64e6d4519f5934d56d25d76d17a58e6d03d70512
/r/MachineLearning
https://redd.it/zk6h8q
(OC) Nine ways to divide Argentina
/r/MapPorn
https://redd.it/zk2a7n
[OC] Geospatial density of the biggest fast food chains in the USA
/r/dataisbeautiful
https://redd.it/zkercv
Countries with mandated paid maternity leave
/r/MapPorn
https://redd.it/zkbvoz
Are rule-based algorithms like PRISM/Ripper... competitive?
Hi!
Our professor at school likes Weka and makes us use it for training on the algorithms.
He also has slides on classification rules and talks about rule-based algorithms like PRISM, Ripper and DTNB and is asking us to use them on some dataset.
I'm not finding much information about these algorithms, so I was wondering whether they're outdated or no longer competitive. Have you ever used them in a professional setting?
Thanks.
/r/datascience
https://redd.it/zjvf8t
In which data science jobs/careers is the Agile/Scrum philosophy NOT used? Just wondering.
Hi everyone! I was just wondering: In which data science jobs/careers is the Agile/Scrum philosophy NOT used?
(Btw, I would appreciate it if we could please avoid becoming distracted by the pros and cons of Agile/Scrum. That is not my intent.
I am just curious which data science jobs/careers do NOT use it. Thanks!)
/r/datascience
https://redd.it/zj1i5g
[OC] I was bored this week so I made this map about our world's fisheries, today's fish consumption, what we consume and where it comes from!
/r/dataisbeautiful
https://redd.it/zk3hu8
3D graphs are helping me to visualise data across multiple dimensions
/r/datascience
https://redd.it/zjvnuw