Hot data science related posts every hour. Chat: https://telegram.me/r_channels Contacts: @lgyanf
U.S. Net Domestic Migration During Covid, July 2020 to July 2021
/r/MapPorn
https://redd.it/yo8dpk
European countries where Jews were allowed to exist in 1500 [1701 x 1600]
/r/MapPorn
https://redd.it/ynnvbp
[OC] How much has MrBeast spent on his YouTube videos? According to his video titles.
/r/dataisbeautiful
https://redd.it/ynq4fh
Will We Start Seeing "Full Stack Data Scientist" Job Titles? What Would be the tech stack, if so?
Weird observation and hypothetical discussion for you all.
I am surprised that I haven't seen the term "full stack" creep into data science job titles yet. I imagine there would be a big need for "full stack" data scientists, especially at small to medium-sized companies that don't want to build out big data teams.
I guess I would imagine a full stack DS to be someone who can do everything from engineering, to analysis, to machine learning. In 2022, a baseline tech stack might be:
- Python (PySpark, Pandas, scikit, a few plotting libs)
- SQL
- TensorFlow or PyTorch
- Knowledge of a cloud platform (AWS/GCP/Azure)
- Knowledge of Docker + Kubernetes for deployment
- Strong software engineering fundamentals
- Strong statistics / analytics knowledge
- Domain knowledge + presentation skills
/r/datascience
https://redd.it/yn9zui
Americans were asked to point to Iran on a map
/r/MapPorn
https://redd.it/ynnc23
[R] Reincarnating Reinforcement Learning (NeurIPS 2022) - Google Brain
/r/MachineLearning
https://redd.it/yng63w
[Q] Why is it more statistically accurate to round down if the preceding digit is even and round up when the preceding digit is odd?
I saw a comment about counting blood cells in another sub: basically, if your average is 52.5, you round down to 52, and when it’s 53.5, you round up to 54?
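The rule described here is usually called "round half to even". A minimal sketch, using only Python's standard `decimal` module, compares it with always rounding halves up on ten values ending in .5. The helper name `round_to_int` and the sample values are made up for illustration.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

def round_to_int(value, mode):
    """Round a decimal string to the nearest integer using the given mode."""
    return int(Decimal(value).quantize(Decimal("1"), rounding=mode))

# Illustrative sample: "50.5", "51.5", ..., "59.5" (every value is an exact tie).
halves = [f"{k}.5" for k in range(50, 60)]

true_mean = sum(Decimal(v) for v in halves) / len(halves)
half_up_mean = sum(round_to_int(v, ROUND_HALF_UP) for v in halves) / len(halves)
half_even_mean = sum(round_to_int(v, ROUND_HALF_EVEN) for v in halves) / len(halves)

print("true mean:      ", true_mean)       # mean of the unrounded values
print("half-up mean:   ", half_up_mean)    # every tie pushed upward
print("half-even mean: ", half_even_mean)  # ties alternate down and up
```

On this sample, always rounding up shifts the mean by half a unit, while rounding to even sends half of the ties down and half up, so the shift cancels out.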
/r/statistics
https://redd.it/ym3l8z
[OC] An Ironman Triathlon, by the minute
/r/dataisbeautiful
https://redd.it/ynh6rt
Looking for a dataset of words and sentences categorized by context.
Can you recommend a good dataset in which words and sentences are categorized, for example with "eating" and "walking" belonging to "activity", and "running" and "walking" belonging to "moving"?
Your help will be appreciated.
Any tips on where to find one and how to look for it would be appreciated, as this would mean a great deal to me.
/r/datasets
https://redd.it/ym6l7f
Countries that have Banned Asbestos Use
/r/MapPorn
https://redd.it/yn44ce
[OC] Apple is now worth more than Alphabet, Amazon and Meta put together
/r/dataisbeautiful
https://redd.it/ymz53n
Percentage of population “absolutely certain” God exists
/r/MapPorn
https://redd.it/yn1jrd
[OC] Japan's Retail Giants: Convenience Stores
/r/dataisbeautiful
https://redd.it/ymq5mj
[P] Finetuned Diffusion: multiple fine-tuned Stable Diffusion models, trained on different styles
https://redd.it/ymo07f
[OC] Election Results in the Weimar Republic / Germany (1919 to 1938)
/r/dataisbeautiful
https://redd.it/ynphp8
Backpack survey (Anyone 13+ who has bought a backpack before)
Can you please complete this survey, which will be used to help me design a backpack as close as possible to what my target demographic (teenagers to adults) wants and needs in a backpack?
It will ask you what you look for when purchasing a backpack and your opinions/ideas about backpacks.
Responses will be kept anonymous.
The design will be used for a class project.
https://forms.office.com/Pages/ResponsePage.aspx?id=ViObpySMIkm0IMbibQtAkaxezkD0HOxLuaghf1VqlNhUQldHWFA1QlBQOFYxU0ZQREtFMTAxSjUzUS4u
/r/SampleSize
https://redd.it/ynq3du
[OC] Breaking down revenue and profit sources for Goldman Sachs - the largest investment bank in the world
/r/dataisbeautiful
https://redd.it/ynn3gi
How would I prove this conjecture false with a counterexample?
/r/mathpics
https://redd.it/yle2i2
[D] What are your thoughts on PCA applied to both numerical and categorical (binary) variables, for dimensionality reduction purposes?
I googled this topic and it has clearly been discussed in different forums already, but the many mixed opinions about it leave me quite confused.
So, let's say we have a dataset with mixed variables. Do you think it's better to go directly to dimensionality reduction techniques that are designed for both numerical and categorical data, or can PCA still be considered a valid option (I know, PCA is first of all a feature extraction technique)? Obviously, for PCA we first have to binarize the categorical variables.
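To make the setup concrete, here is a minimal sketch, assuming NumPy is available and using a made-up dataset (the columns `age`, `income`, and `is_member` are hypothetical). The binary column is treated as a 0/1 numeric feature, every column is standardized, and PCA is computed through an SVD. It only illustrates the mechanics and does not settle whether PCA is the right choice for mixed data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical mixed dataset: two numeric columns and one binary column.
age = rng.normal(40, 10, n)
income = 1000 * age + rng.normal(0, 5000, n)
is_member = (rng.random(n) < 0.3).astype(float)  # categorical, already binarized to 0/1

X = np.column_stack([age, income, is_member])

# Standardize each column so that the large numeric scale of income
# does not dominate the components.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via SVD: rows of Vt are the principal directions,
# squared singular values are proportional to explained variance.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
explained_ratio = s**2 / np.sum(s**2)
scores = Z @ Vt.T  # coordinates of each row in the principal-component basis

print("explained variance ratio:", np.round(explained_ratio, 3))
print("loadings of first component:", np.round(Vt[0], 3))
```

Inspecting the loadings shows how much the binary column contributes to each component, which is one practical way to judge whether the result is interpretable.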
/r/statistics
https://redd.it/ym9g2a
How Greenland looks in ten different projections
/r/MapPorn
https://redd.it/yn7pbf
[Q] I'm trying to fit a "Buy Till You Die" LTV model using the lifetimes package in Python. Most of the resources I can find online first fit a model on a calibration dataset before fitting a model on the entire dataset, but they then throw away the results from the calibration dataset. Why?
What's the point of fitting a distribution on a calibration dataset if we're not going to use that same model on the entire dataset?
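For readers unfamiliar with the workflow being asked about, here is a rough sketch of the calibration/holdout pattern in plain Python. Everything in it is an assumption for illustration: the toy transactions, the dates, and especially `purchase_rate`, which is a trivial stand-in and not a BTYD model or anything from the lifetimes package.

```python
from datetime import date

# Hypothetical toy transaction log: (customer_id, purchase_date).
transactions = [
    ("a", date(2022, 1, 5)), ("a", date(2022, 2, 10)), ("a", date(2022, 5, 3)),
    ("b", date(2022, 1, 20)), ("b", date(2022, 4, 15)),
    ("c", date(2022, 3, 1)),
]
OBS_START, CUTOFF, OBS_END = date(2022, 1, 1), date(2022, 3, 31), date(2022, 6, 30)

def split_by_cutoff(rows, cutoff):
    """Split transactions into a calibration part and a holdout part."""
    calibration = [r for r in rows if r[1] <= cutoff]
    holdout = [r for r in rows if r[1] > cutoff]
    return calibration, holdout

def purchase_rate(rows, n_customers, n_days):
    """Stand-in 'model': average purchases per customer per day."""
    return len(rows) / (n_customers * n_days)

# Step 1: fit on the calibration period only.
calibration, holdout = split_by_cutoff(transactions, CUTOFF)
customers = {cid for cid, _ in calibration}
cal_rate = purchase_rate(calibration, len(customers), (CUTOFF - OBS_START).days)

# Step 2: compare the calibration fit's prediction with what happened in the holdout.
predicted_holdout = cal_rate * len(customers) * (OBS_END - CUTOFF).days
actual_holdout = len(holdout)
print(f"predicted holdout purchases: {predicted_holdout:.2f}, actual: {actual_holdout}")

# Step 3: refit on the full observation period for forward-looking use.
all_customers = {cid for cid, _ in transactions}
final_rate = purchase_rate(transactions, len(all_customers), (OBS_END - OBS_START).days)
print(f"rate refit on all data: {final_rate:.5f}")
```

In this pattern the calibration fit exists only to produce the comparison in step 2. Whether the linked tutorials follow exactly this reasoning is not something the sketch can confirm.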
https://towardsdatascience.com/buy-til-you-die-predict-customer-lifetime-value-in-python-9701bfd4ddc0
https://github.com/h3ik0th/clv/blob/main/BTYD_05.ipynb
https://archive.ph/NV3Ui#selection-2801.0-2801.6
/r/statistics
https://redd.it/ymt2o9
[Q] Why do Errors Not Need to be Normal in Logistic Regression?
I’m studying logistic regression, and one thing the professor is saying is that errors don’t need to be normally distributed or homoskedastic to be able to fit the model, get estimates and conduct inference.
I get that errors don’t need to be normally distributed to fit our parameters accurately - the CLT guarantees that they will be unbiased. But, if our distribution of X_i is very skewed, won’t this make our estimates of the standard error of B_i unreliable, since the standard error estimate assumes normality on both sides of the distribution but in reality, since X_i is skewed, our confidence intervals should be further out on the side with the skew and not balanced on both sides?
Examples of why it matters that our errors are normally distributed and homoskedastic in OLS but not in logistic regression would be helpful.
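For reference, the two model statements being compared can be written side by side. This is the standard textbook formulation, not something taken from the professor's notes, and the notation ($x_i$, $\beta$, $p_i$, $\varepsilon_i$, $\sigma^2$) is assumed here.

```latex
% Classical linear model: an additive error term, assumed normal with constant variance
Y_i = x_i^\top \beta + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)

% Logistic regression: no additive error term; the response itself is Bernoulli
Y_i \mid x_i \sim \mathrm{Bernoulli}(p_i), \qquad
\log\frac{p_i}{1 - p_i} = x_i^\top \beta

% The conditional mean and variance are both determined by p_i
\mathbb{E}[Y_i \mid x_i] = p_i, \qquad
\operatorname{Var}(Y_i \mid x_i) = p_i (1 - p_i)
```

Written this way, the variance of the response in the logistic model changes with $p_i$ by construction, so constant variance is not part of the model specification.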
/r/statistics
https://redd.it/yn9v9v
Armenia - Azerbaijan crisis, and the VERY complicated situation in the Caucasus after the war in 2020.
/r/MapPorn
https://redd.it/ympkh1
Poverty in South America
/r/MapPorn
https://redd.it/yn2df7
The most popular celebrity perfumes and colognes according to Google data
/r/Infographics
https://redd.it/ymhbem
[OC] 2022 Midterms So Far: Turnout Among Seniors (65+) Is 32.2%, Among Young Voters (18-29) Is 4.19%
/r/dataisbeautiful
https://redd.it/ymudah
[D] ICLR 2023 reviews are out. How was your experience?
Link: https://openreview.net/group?id=ICLR.cc/2023/Conference
A thread for ICLR '23 review related discussion.
1. What's your score?
2. Are you satisfied?
3. Other comments about the review process?
/r/MachineLearning
https://redd.it/ymctqy
[OC] How long headphones last and where they fail
/r/dataisbeautiful
https://redd.it/ymn7rs