Hot data science related posts every hour. Chat: https://telegram.me/r_channels Contacts: @lgyanf
America's car reliance: getting to work across 48 states mapped
/r/MapPorn
https://redd.it/zvyc67
[OC] - My 2022 Spending Breakdown - 25M - (Sankeymatic)
/r/dataisbeautiful
https://redd.it/zvpuvt
Introducing BastionLab - Collaborate with our simple privacy framework for data visualization!
📈 We’re thrilled to introduce BastionLab, our open-source and simple privacy framework for data science collaboration!
To see what plotting looks like when privacy issues are automatically handled for you, you can check our GitHub or directly go to our Visualization tutorial 📊
### Built for sensitive data collaboration
Collaboration between data owners and data scientists is a big challenge for highly regulated fields like health, finance, or advertising due to security and privacy issues. When collaborating remotely, data owners have to open their whole dataset, often through a Jupyter notebook. This overly broad access creates huge privacy gaps because too many operations are allowed, which lets data scientists extract information from the remote infrastructure (print the whole database, save the dataset in the model weights, etc.).
⚙️ BastionLab solves this problem by providing fine-grained access control. It guarantees data owners that data scientists can only perform privacy-friendly operations on their data and that only anonymized outputs are shared with them.
### How does BastionLab work?
BastionLab makes sure that the data owner’s remote data is never accessed directly by the data scientist. Three main elements ensure this:
- First, a ‘safe zone’ is defined by the data owner to filter the data scientist’s queries, which enforces control while allowing for interactivity.
- Second, expressivity is limited. This means that the type of operations that can be executed by the data scientists is restricted to avoid arbitrary code execution.
- Finally, the data scientist never accesses the dataset locally. They only manipulate a local object that contains metadata to interact with the remotely hosted dataset - and data owners can always see the calls made by that object.
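The three elements above can be illustrated with a toy sketch. This is not BastionLab's API: `OwnerServer`, `RemoteFrame`, `ALLOWED_OPS`, and `MIN_ROWS` are hypothetical names invented for this example, and the "safe zone" is reduced to a single minimum-row rule.

```python
from dataclasses import dataclass, field

# Hypothetical names throughout; this is NOT BastionLab's API.
ALLOWED_OPS = {"mean", "count"}  # limited expressivity: a fixed allowlist
MIN_ROWS = 3                     # toy "safe zone": no aggregates over too few rows


@dataclass
class OwnerServer:
    """Holds the real rows; only the data owner runs this side."""
    rows: list
    audit_log: list = field(default_factory=list)

    def execute(self, op, column):
        # Every call is recorded, so the owner can see what was requested.
        self.audit_log.append((op, column))
        if op not in ALLOWED_OPS:
            raise PermissionError(f"operation {op!r} is not allowed")
        values = [row[column] for row in self.rows]
        if len(values) < MIN_ROWS:
            raise PermissionError("too few rows for an aggregate result")
        if op == "count":
            return len(values)
        return sum(values) / len(values)


@dataclass
class RemoteFrame:
    """Local handle for the data scientist: metadata only, no rows."""
    server: OwnerServer
    columns: tuple

    def mean(self, column):
        return self.server.execute("mean", column)

    def count(self, column):
        return self.server.execute("count", column)

    def dump(self):
        # Asking for raw rows is forwarded, logged, and rejected by the server.
        return self.server.execute("dump", "*")
```

With a server holding three rows of `{"age": ...}` values, `RemoteFrame.mean("age")` returns only the aggregate, while `dump()` raises `PermissionError` and still leaves an entry in `audit_log`.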
### Ready to try?
If you like the project, drop a ⭐ on our GitHub! We’re open-source, so it’s a big help ^^
/r/visualization
https://redd.it/zspjml
[E] Is there a major difference between the job opportunities open to MS Stats and MS Biostats holders?
I basically have three options: enter the workforce with my B.A., get an MS in stats, or get an MS in biostats. My degree is in Econ & Math and I'm leaning towards working in finance on the analytics/DS side, or maybe as a quant dev (a completely different path).
The reason I'm leaning towards getting a master's is that I've read and seen salary data suggesting MS stats holders make significantly more than their bachelor's-holding counterparts, plus many of the more technical positions require an MS.
I don't really want to work in the biostats field; it's more of a backup if I don't get into the regular stats master's, as the foundations are the same. So I'm just wondering whether a master's in biostats is a deterrent for those wanting to go into finance/tech. Thanks.
/r/statistics
https://redd.it/zv20jh
Merry Christmas from Desmos
/r/mathpics
https://redd.it/zvd10y
[OC] This is the graph of my weight over this year. Picture (scan) of the graph is from Weight Diary app.
/r/dataisbeautiful
https://redd.it/zvjy2r
[D] Are reviewer blacklists actually implemented at ML conferences?
Are blacklists actually implemented in these conferences (ICML / ICLR / NeurIPS) given that the number of reviewers required grows every year?
Edit: Should've been clearer, sorry. By blacklist I mean a list of reviewers who are barred from reviewing because of their bad review quality in previous iterations of the conference.
/r/MachineLearning
https://redd.it/zuyy3j
Jupyter Server 2.0 is released
https://blog.jupyter.org/jupyter-server-2-0-is-released-121ac99e909a
/r/IPython
https://redd.it/zem28q
How Snowflakes are Formed
/r/Infographics
https://redd.it/zu9qp6
Trippy Inkpunk Style animation using Stable Diffusion [P]
/r/MachineLearning
https://redd.it/zvbjot
FIFA World Cup 2022 saw a surprising number of unlikely match outcomes. Here's a way to estimate how exceptional it was. I simulate an experiment in which I bet on the least likely outcome of each match. Surprisingly, this dead-simple strategy turns out to be profitable [OC]
/r/dataisbeautiful
https://redd.it/zv556w
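A minimal sketch of the kind of experiment the World Cup post above describes, assuming decimal bookmaker odds and a flat stake on whichever outcome has the longest odds. The function name `longshot_profit` and the three matches are made up for illustration; they are not the poster's code or data.

```python
def longshot_profit(matches, stake=1.0):
    """Net profit from always betting `stake` on the longest-odds outcome.

    `matches` is a list of (odds, result) pairs, where `odds` maps each
    outcome name to its decimal odds and `result` is the actual outcome.
    """
    total = 0.0
    for odds, result in matches:
        pick = max(odds, key=odds.get)  # highest decimal odds = least likely
        if pick == result:
            total += stake * (odds[pick] - 1.0)  # winnings minus the stake
        else:
            total -= stake  # the stake is lost
    return total


# Made-up illustrative odds, not real World Cup data.
matches = [
    ({"home": 1.5, "draw": 4.0, "away": 7.0}, "away"),   # longshot wins
    ({"home": 2.0, "draw": 3.0, "away": 4.0}, "home"),   # longshot loses
    ({"home": 1.2, "draw": 6.0, "away": 12.0}, "draw"),  # longshot loses
]
print(longshot_profit(matches))
```

A positive total over a real set of matches would mean upsets happened more often than the odds implied; with only a tournament's worth of games the result is very noisy.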
Here’s a playlist I use to keep inspired when I’m coding/developing/studying. Post yours as well if you also have one!
https://open.spotify.com/playlist/0SbaIICccsV3XXZVaY4o9a?si=85b1f5df776e469d
/r/bigdata
https://redd.it/zlwz17
What would you consider to be the most important practices for a newly assembled DS team?
I started as the first data scientist at an e-commerce start-up some months ago. Although my tasks are currently more on the data engineering side, it's a very exciting time. Another data scientist will be joining soon, and I'm trying to figure out which practices would most improve our teamwork. Whether it's about communication or about specific code practices (unit testing, etc.), what would you consider the most important points for newly assembled DS teams?
Thanks in advance!
/r/datascience
https://redd.it/zuvov7
Trinidad and Tobago’s Rig days show some correlation with the WTI Spot Oil Price.
/r/visualization
https://redd.it/zugmpn
[OC] Every High School Baseball Field Used in the State of West Virginia
/r/dataisbeautiful
https://redd.it/zvtef4
[OC] US city housing price changes since 1991
/r/dataisbeautiful
https://redd.it/zvt54h
Global Terrorism in the 21st Century: A Map of Attacks and Incidents
/r/MapPorn
https://redd.it/zvmz1a
Forest coverage map of Costa Rica over time
/r/MapPorn
https://redd.it/zvfc23
🎅12 Days of Christmas Workout
/r/Infographics
https://redd.it/zvj13s
Books to read after "The Visual Display of Quantitative Information"?
Just finished. Was a decent introduction, but I'm interested in a more rigorous treatment of data vis. Idk exactly what that looks like, but something in this direction:
- More cited research. I know there's research on human color perception for instance.
- More taxonomy of graphs (these are different ways to represent bivariate real data, this is how to represent a real dependent variable and a categorical independent variable). Almost to the point of a prescriptive flowchart thing.
/r/datascience
https://redd.it/zve9pd
[OC] Beer as a percentage of total calories
/r/dataisbeautiful
https://redd.it/zvddm2
EV Production in the U.S. by Brand
/r/visualization
https://redd.it/zv528r
[OC] State by State Housing Price Growth since 1975
/r/dataisbeautiful
https://redd.it/zvaijm
Where do I get the location of Elon Musk's private jet?
Given that he has banned the Twitter account that tracked his private jet, I wondered if anyone knew where the account got its data from.
/r/opendata
https://redd.it/zmloap
Map: These are the world’s least religious countries/Gallup research
/r/MapPorn
https://redd.it/zuygf5
[OC] Life Expectancy at Birth (1960-2019)
/r/dataisbeautiful
https://redd.it/zuxugl
What is this chart called and how can I plot one myself?
Can anyone help me name this type of scientific chart with lines overlaid indicating a second independent variable? Also would be helpful to know a program/library that is good at working with these.
Called a pressure density chart or volume-pressure-temp diagram. (related charts are flight envelope by weight). Thank you
https://preview.redd.it/8id3rqcuvr7a1.png?width=1288&format=png&auto=webp&s=eb4cc89bf34e01e310f72481201677883e85592a
https://preview.redd.it/20r7b340vr7a1.png?width=640&format=png&auto=webp&s=b086368511f1d3b3fe2278b53e4859ece013bf5a
/r/visualization
https://redd.it/zu1kar
[R][P] I made an app for Instant Image/Text to 3D using PointE from OpenAI
/r/MachineLearning
https://redd.it/zubg2u