[OC] Animated map showing crude oil tankers blocked from crossing Turkish waters
/r/dataisbeautiful
https://redd.it/zecan7
The 50 Most (& Least) Deadly Travel Destinations [OC]
/r/dataisbeautiful
https://redd.it/zeb3ys
[Q] What kind of stereotypes do you hear often when you tell people you are a statistician/data scientist?
I was wondering what kind of stereotypes people assign to you when you tell them about your study/job. I have found that people assume (because I am a data scientist) that I am also an expert on how computer hardware works and that I closely follow the latest trends in computer hardware. It's not that I am bad with computers or that I don't understand anything about them, but I really am not that handy with computers.
I studied psychology before this and if you tell people that you studied psych they always, and they aren't even joking, think you can read their minds and shit. Honestly people are so fucking dumb (me included).
Please feel free to vent and/or tell me about your own experiences with annoying people assuming all kinds of stuff about you.
/r/statistics
https://redd.it/ze2mtp
What are some cool projects/tasks you got to do as a data scientist/analyst?
I think it would serve as some motivation for those still unsure about this field.
/r/datascience
https://redd.it/ze7o1g
US College Enrollment Has Dropped by 3.1 Million Students Since 2012, But Annual Tuition Revenue is Up by $6.9 Billion
https://myelearningworld.com/college-enrollment-tuition-revenue-study/
/r/dataisbeautiful
https://redd.it/zea0mh
[OC] Visualizing the Visualizers. A look at how many Instagram followers our favorite data visualizers have. If there are any visualizers we should check out, please comment
/r/visualization
https://redd.it/zdqcqu
How to push back on client making poor infrastructure decisions?
I'm consulting for a biotech company that basically wants me to connect their clinical database and a couple of public databases to a new database and a dashboarding program, and to develop some basic metrics/ML capability. They want this done within 6 weeks.
I said fine; my plan is to stand up a bunch of Docker pipelines in Kubernetes connected to a cloud DB in Azure.
They gave me an 8-core Ubuntu VM with admin access and told me to use that. They refuse to give me any access keys or to create any specific roles or resource groups. I don't/can't manage the VM.
I had most of the pipelines and the dashboard working locally via Docker images. Now I've spent 2 days trying to move the VM's MySQL database to an attached drive that has weird permissions. I'm particularly worried that, without a modern infrastructure-as-code approach, I'll have to redo everything if the VM goes down.
Should I push back here or am I being picky?
/r/datascience
https://redd.it/zdgg10
Misleading Legend on Gas Price Map (dark blue states have gas less than $3; legend shows maximum value, not highest average)
/r/dataisugly
https://redd.it/zcnb5x
[C] I work in academia and am looking to change careers. What kind of jobs can/should I be looking at with the skill set I have?
I have worked in academia my whole life (well, aside from some pub work here and there while studying), so I really don't have much of an idea of the job market at all. However, I am a bit fed up with short-term contracts and academia in general, so I am thinking of exploring career options outside that world that can leverage the skills I have. In terms of general qualifications, I have a bachelor's degree, a master's degree and a doctorate. I work in macroecology now but worked in evolutionary biology before that. Most of my career has been spent applying statistical models to large (well, large for ecology and evobio) datasets to understand patterns of trait evolution and the patterns/drivers of biodiversity, both to explain the observed distribution of biodiversity and to predict how climate/land use change might impact it in the future.
That means I am intimately familiar with R for data handling, processing, visualisation and model fitting. I have worked with a bunch of different model structures, but mostly variations on linear regression (OLS regression, GLMs, logistic regression, hierarchical models) in both frequentist and Bayesian frameworks. I have also used a few different machine learning algorithms for species distribution modelling (SDM), specifically random forests, boosted regression trees, SVMs and Maxent (though I think Maxent might be SDM-specific). I have a working familiarity with Python but barely use it: I understand how to write the language (objects and data structures, for loops, writing and executing a script, etc.), but I mostly used it for editing and manipulating DNA sequences. I also have a rudimentary understanding of SQL but, again, have barely used it.
So I THINK I have a good grasp of statistics and modelling (compared to my peers, I would put myself in the higher percentiles for ability in this area), but I have no idea how that compares to an average applicant in the job market. Additionally, I don't have any real formal training in statistics/data science aside from undergrad courses aimed at biologists (mostly OLS regression, hypothesis testing and things like t-tests, chi-squared and ANOVA); I have more or less figured everything out myself or learned from colleagues/peers. I also don't have any experience with a lot of the common platforms that seem to be used in these sorts of roles (from a brief scan of job ads, things like SQL and Redshift come up a lot).
So what I am asking then, I suppose, is what sort of jobs and what sort of LEVEL of job should I be looking for if I want to try and leverage some of these skills into a different career? Or do I lack too many of the "industry standard" skills and experiences to really be thinking about this sort of thing?
Thanks all.
/r/statistics
https://redd.it/zd3ix2
[Topic] Open Discussion Thread: Anybody can post a general visualization question or start a fresh discussion!
Anybody can post a question related to data visualization or start a discussion in the monthly topical threads. Meta questions are fine too, but if you want a more direct line to the mods, click here.
If you have a general question you need answered, or a discussion you'd like to start, feel free to make a top-level comment.
Beginners are encouraged to ask basic questions, so please be patient responding to people who might not know as much as yourself.
/r/dataisbeautiful
https://redd.it/z9n9d5
Looking for gas price data by ZIP Code
As the subject says, I'm looking for historical gas price data by ZIP Code going back at least a year, and ideally 8 years. The more detailed, the better; daily data would be great.
/r/datasets
https://redd.it/zcpg7j
Politically Exposed Persons (PEPs) Data Set
This data comes from OpenSanctions.org: "A politically exposed person (PEP) is a person that has been entrusted with a prominent public function. PEPs include elected officials, members of government.
Integrating data about political actors is an essential step in making an open source due diligence database. However, it is a much more intricate task than collecting sanctions lists (of which there are only a few dozen), and fully addressing it will be the focus of a later stage of this project."
You can find the data on more than 197,000 entities here: https://www.opensanctions.org/datasets/peps/
I've re-hosted the "Targets as simplified CSV" with more than 170,000 people records for exploration:
https://app.gigasheet.com/spreadsheet/Politically-Exposed-Persons--PEPs---opensanctions-org/862d21cf_6eb3_44db_953b_33b9324527e6?public=true
/r/datasets
https://redd.it/z7y9ap
Most popular languages to learn (according to Duolingo in 2022)
https://redd.it/zedi24
@datascientology
[D] Stable Diffusion 1 vs 2: What you need to know
Hey everyone!
I wrote this quick summary of **Stable Diffusion 1 vs 2** to distill all the important points down into one spot for people who haven't had time to keep up. Just dropping it here for anyone interested!
https://preview.redd.it/v8r09ydu4b4a1.png?width=1151&format=png&auto=webp&s=b62ea88f08f66d8e686d06b8f3b465c3e1d778bc
/r/MachineLearning
https://redd.it/zebm1l
[Academic] Did you know Santa travels at the speed of the Sun? (All)
We are working on a website to track Santa's arrival time, and we could use some help making it a bit more flavourful.
If you're up for helping us out, please fill in this short form:
https://forms.gle/ucnKXGWMtaWT9wYu5
/r/SampleSize
https://redd.it/ze681l
I made a website that lets you launch an asteroid at Earth and see the effects [OC]
https://neal.fun/asteroid-launcher/
/r/dataisbeautiful
https://redd.it/zdd566
[P] Save your sklearn models securely using skops
Hello 👋🏼 I'm Merve, one of the core devs of this library called skops. In the latest release, we introduced a new serialization format for sklearn models that is more secure than pickle.
You can check this notebook out to see how to use it.
If you want to learn more, check out our docs.
We'd really appreciate it if you'd report any problems you run into by opening an issue on GitHub.
obligatory ML meme
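Why a pickle alternative matters: unpickling can run any callable the file's author chose, because an object's `__reduce__` tells `pickle` which function to call when "rebuilding" it. Below is a minimal, harmless stdlib sketch of that risk; the `Payload` class is hypothetical, and `eval("6 * 7")` stands in for something malicious. It illustrates the pickle problem only and is not skops' API.

```python
import pickle


class Payload:
    """Any class can tell pickle how to 'rebuild' it via __reduce__."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair; pickle.loads will call
        # eval("6 * 7"). A harmless expression stands in for something
        # malicious, such as a shell command.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading the bytes runs the stored call; no Payload object comes back.
restored = pickle.loads(blob)
print(restored)  # 42
```

The loaded value is the result of the attacker-chosen call rather than a `Payload` instance, which is why the Python docs warn against unpickling untrusted data; see the skops docs for how its format loads models instead.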
/r/MachineLearning
https://redd.it/zd3n8s
[R] The Forward-Forward Algorithm: Some Preliminary Investigations [Geoffrey Hinton]
Paper: https://www.cs.toronto.edu/~hinton/FFA13.pdf
Twitter summary: https://twitter.com/martin_gorner/status/1599755684941557761
Abstract:
> The aim of this paper is to introduce a new learning procedure for neural networks and to demonstrate that it works well enough on a few small problems to be worth serious investigation. The Forward-Forward algorithm replaces the forward and backward passes of backpropagation by two forward passes, one with positive (i.e. real) data and the other with negative data which could be generated by the network itself. Each layer has its own objective function which is simply to have high goodness for positive data and low goodness for negative data. The sum of the squared activities in a layer can be used as the goodness but there are many other possibilities, including minus the sum of the squared activities. If the positive and negative passes can be separated in time, the negative passes can be done offline, which makes the learning much simpler in the positive pass and allows video to be pipelined through the network without ever storing activities or stopping to propagate derivatives.
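The per-layer objective in the abstract (high goodness for positive data, low for negative, with goodness as the sum of squared activities) can be sketched for a single layer in plain Python. Assumptions not stated in the abstract: ReLU units, goodness turned into a probability with a logistic function and a threshold `THETA` (as we understand the full paper to do), hand-made negative vectors instead of network-generated ones, and simple per-example gradient steps. All names and toy data are hypothetical.

```python
import math
import random


def relu(v):
    return max(0.0, v)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(W, x):
    """Activities of one ReLU layer: y_j = relu(sum_k W[j][k] * x[k])."""
    return [relu(sum(w * xi for w, xi in zip(row, x))) for row in W]


def goodness(W, x):
    """Goodness = sum of squared activities in the layer."""
    return sum(y * y for y in forward(W, x))


def local_step(W, x, is_positive, theta, lr):
    """One layer-local gradient step; nothing is propagated to other layers.

    P(positive) = sigmoid(goodness - theta). Positive data is pushed toward
    high goodness, negative data toward low goodness.
    """
    y = forward(W, x)
    g = sum(v * v for v in y)
    p = sigmoid(g - theta)
    # dLoss/dgoodness for the logistic loss on this single example.
    dl_dg = -(1.0 - p) if is_positive else p
    for j, row in enumerate(W):
        # dg/dW[j][k] = 2 * y_j * x_k (zero when unit j is inactive).
        for k in range(len(row)):
            row[k] -= lr * dl_dg * 2.0 * y[j] * x[k]


rng = random.Random(0)
# Hypothetical toy setup: 3 units, 4 inputs, positive initial weights.
W = [[rng.uniform(0.1, 0.5) for _ in range(4)] for _ in range(3)]
positive = [1.0, 0.0, 1.0, 0.0]  # stands in for "real" data
negative = [0.0, 1.0, 0.0, 1.0]  # hand-made negative data
THETA, LR = 2.0, 0.1

for _ in range(200):
    local_step(W, positive, True, THETA, LR)
    local_step(W, negative, False, THETA, LR)

print(round(goodness(W, positive), 2), round(goodness(W, negative), 4))
```

Each update needs only the layer's own input and activities, so no derivatives flow backward through other layers. Stacking layers would also need each layer's output normalized before it feeds the next one (the paper describes this), so the next layer cannot simply read off the previous layer's goodness; this single-layer sketch omits that.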
/r/MachineLearning
https://redd.it/zdkpgb
Distribution of the Crested Porcupine, the largest porcupine species in the world
/r/MapPorn
https://redd.it/zdiwhq
[OC] The Highest Streaming Spotify Artist/Band From Each State
/r/MapPorn
https://redd.it/zcv8d6
The 10 countries with the largest populations that didn't make it into the World Cup this year
/r/MapPorn
https://redd.it/zd9iw0
[OC] The Highest Streaming Spotify Artist/Band From Each State
/r/Infographics
https://redd.it/zcv8bo
[E] Stats professor teams up with a Nameless Ghoul to play a parody of "Square Hammer" in class
My professor is a rock star and made a stats parody of Square Hammer by Ghost, then played it live in class.
Here is the link:
https://www.youtube.com/watch?v=hupRyzrFRrg
/r/statistics
https://redd.it/zcp2ax
[D] NeurIPS 2022 Outstanding Paper modified results significantly in the camera-ready version
The paper is "A Neural Corpus Indexer for Document Retrieval"
According to the revision history on OpenReview, this is how Table 1 read at the final revision of the rebuttal phase:
https://preview.redd.it/75ibpthipw3a1.png?width=720&format=png&auto=webp&s=fd5c6071db4eb3f47b8e41ded08aa253cbc07c4a
But in the camera-ready version, the results of the same experiments in Table 1 are clearly different from that earlier version, and the difference is huge:
https://preview.redd.it/quwdju9npw3a1.png?width=720&format=png&auto=webp&s=421c6b48803331945610b27e6acd649563614d32
/r/MachineLearning
https://redd.it/zcdw0k