If a model is trained on low-resolution images, how well can it be expected to generalize to high-resolution images at test time?
Lately, I have seen some examples of research using CIFAR and FER2013 (Facial Expression Recognition).
Both datasets have low-resolution images: 32x32 and 48x48 pixels, respectively.
It seems to me that most studies using these datasets report good performance on test sets that have similar resolutions and come from the same data pool. But I doubt that a model trained on low-resolution images will generalize well to different datasets with high resolution.
My questions are:
Does anyone have experience with training on low-resolution data and then testing on a different dataset with higher resolution?
Are there any studies that have addressed this question?
Thank you very much in advance for your input!
/r/computervision
https://redd.it/11bmpda
Real-Time-Object-Counting-on-Jetson-Nano
https://github.com/R-Mahmoudi/Real-Time-Object-Counting-on-Jetson-Nano
/r/deeplearning
https://redd.it/1174qgv
Open sourcing Rerun: A toolbox for visualizing Computer Vision
Today we're making the Rerun open source project public. Links to docs and repo on rerun.io
Rerun beta: Visualize Computer Vision
Rerun is now installable as
pip install rerun-sdk
for Python users and
cargo add rerun
for Rust users. C/C++ support is planned but not there yet.
Rerun is an SDK for logging data like images, tensors and point clouds, paired with an app that builds visualizations around that data. We built Rerun for computer vision and robotics developers. It makes it easy to debug, explore and understand internal state and data with minimal code. The point is to make it much easier to build computer vision and robotics solutions for the real world.
Rerun is in beta. It is already quite powerful and useful. A couple of great teams have been using it for several months as both their main internal debugging tool, and as a way to show off their systems to customers and investors. However, we're just getting started and have lots of exciting features in the pipeline.
We are also open for contributions now and are all looking forward to hearing your feedback!
Visualization of a sparse 3D reconstruction done with COLMAP
/r/computervision
https://redd.it/112w0br
[R] [N] Toolformer: Language Models Can Teach Themselves to Use Tools - paper by Meta AI Research
/r/MachineLearning
https://redd.it/110s8ui
[N] Getty Images Claims Stable Diffusion Has Stolen 12 Million Copyrighted Images, Demands $150,000 For Each Image
From Article:
Getty Images new lawsuit claims that Stability AI, the company behind Stable Diffusion's AI image generator, stole 12 million Getty images with their captions, metadata, and copyrights "without permission" to "train its Stable Diffusion algorithm."
The company has asked the court to order Stability AI to remove violating images from its website and pay $150,000 for each.
However, it would be difficult to prove all the violations. Getty submitted over 7,000 images, metadata, and copyright registration, used by Stable Diffusion.
/r/MachineLearning
https://redd.it/10w6g7n
Fine tuning mt5
How do I fine-tune an MT5 model for generating Bengali paraphrases? I have enough datasets but I can't find a working script to fine-tune an MT5 model.
/r/LanguageTechnology
https://redd.it/10rvura
Easily Build Your Own GPT from Scratch using AWS: A Comprehensive Guide for Domain Adaptation
🔥🤖Get ready to train your own GPT-2 model from scratch using AWS SageMaker!🤖🔥
This comprehensive guide will take you through the entire process of creating a custom-built GPT-2 model, tailored to your specific domain or industry. 💻
You'll learn how to acquire and prepare raw data, create custom vocabularies and tokenizers, pre-train large language models, and evaluate the performance of your custom model. 📈
Not only that, but you'll also delve into the intricacies of training a GPT-2 model to generate cohesive news articles related to the COVID-19 pandemic! 🦠
And the best part? It comes with 9 Jupyter notebooks and all the necessary Python scripts to help you get started right away! 🚀
You'll also gain a solid understanding of key concepts like generative AI, foundational models, language alignment, and prompt engineering with a focus on GPT. 💡 https://tinyurl.com/hvrjkm5r
/r/LanguageTechnology
https://redd.it/10ohy1m
The ChatGPT Cheat Sheet
😁 Happy to introduce one of the most comprehensive ChatGPT cheat sheets: a 30-page paper highlighting various prompts to manage ChatGPT for generating text. The document not only highlights what ChatGPT can generate but also how it can generate it! Here is the TOC:
1. NLP Tasks
2. Code
3. Structured Output Styles
4. Unstructured Output Styles
5. Media Types
6. Meta ChatGPT
7. Expert Prompting
Google Doc: https://drive.google.com/file/d/1OcHn2NWWnLGBCBLYsHg7xdOMVsehiuBK/view?usp=share_link
/r/LanguageTechnology
https://redd.it/10k67l1
DensePose From WiFi
By Jiaqi Geng, Dong Huang, Fernando De la Torre
https://arxiv.org/abs/2301.00250
>Advances in computer vision and machine learning techniques have led to significant development in 2D and 3D human pose estimation from RGB cameras, LiDAR, and radars. However, human pose estimation from images is adversely affected by occlusion and lighting, which are common in many scenarios of interest. Radar and LiDAR technologies, on the other hand, need specialized hardware that is expensive and power-intensive. Furthermore, placing these sensors in non-public areas raises significant privacy concerns. To address these limitations, recent research has explored the use of WiFi antennas (1D sensors) for body segmentation and key-point body detection. This paper further expands on the use of the WiFi signal in combination with deep learning architectures, commonly used in computer vision, to estimate dense human pose correspondence. We developed a deep neural network that maps the phase and amplitude of WiFi signals to UV coordinates within 24 human regions. The results of the study reveal that our model can estimate the dense pose of multiple subjects, with comparable performance to image-based approaches, by utilizing WiFi signals as the only input. This paves the way for low-cost, broadly accessible, and privacy-preserving algorithms for human sensing.
/r/computervision
https://redd.it/10eg0d6
Using computer vision to find shortest paths on cross stitching patterns (code in comments)
/r/computervision
https://redd.it/108emlz
Nvidia DeepStream 101: A beginner’s guide to real-time computer vision
https://chirag4798.medium.com/nvidia-deepstream-101-a-beginners-guide-to-real-time-computer-vision-afefcb5d7fba?source=friends_link&sk=b5bdfe8e2fb1b387ac3db8b8c08b5e7f
/r/computervision
https://redd.it/109fi7a
Laptop with GPU for Work vs Cloud, Best Practices
Hey guys, in my last job as an ML CV Engineer, we were given laptops with a dedicated GPU for our work and I hated it. I think there is no point explaining why crappy gaming laptops (even expensive ones) can be worse than some good-quality laptops without a GPU, especially if you care about portability. Of course, we had certain cloud solutions for model training, but these laptops were always justified as "something you can quickly check and debug things on before starting long training runs on the server".
Now I have a similar role in a new company. By default they offer similar kinds of GPU laptops for ML Engineers, but we managed to make a deal that I will have a machine without a GPU and see how it goes.
That got me thinking: how do you cope with cases when you need to quickly experiment with or debug your ongoing code changes in a GPU-intensive application? Do you connect to your cloud instances and do everything there, or maybe have a separate company server, or something else? I can hardly believe that a gaming laptop is the best solution we've come up with so far for ML CV Researchers/Engineers. I would be interested to read your takes on that.
/r/computervision
https://redd.it/10boise
text to 3d open source blender addon
An open-source pipeline setup to generate 3D models. It seems they didn't finish it, but it looks better using Point-E, with DMTet for the mesh and dream for the texture. Firework-Games-AI-Division/dmt-meshes (github.com)
UPDATE: was prompting an alien ship, found an alien inside the ship... shiiit
https://preview.redd.it/lipgio7o41ca1.jpg?width=813&format=pjpg&auto=webp&v=enabled&s=9de794ab03445954f272f89b1da8e7c2fa92fbdd
/r/computervision
https://redd.it/10b52ao
Introducing Visionner (Your image dataset toolkit)
Hi guys, my name is Charles, and I'm the creator of Visionner.
Visionner is an open-source Python package that helps you import, normalize, save and manage your custom image dataset for your computer vision task.
Why?
Because most of the time when we learn to create computer vision models, we just use TensorFlow or PyTorch built-in datasets, but in real-world projects we need to use custom datasets. And I was surprised to see that the difficult part is not choosing a model architecture but importing and normalizing my custom dataset so I can pass it to the neural network.
That is why I decided to automate this step with Visionner.
You can check the source code on my GitHub: https://github.com/charleslf2/Visionner
You can view some showcases on the Visionner webpage: https://charleslf2.github.io/Visionner/
Some outputs:
Import your image for any supervised computer vision tasks
Visualize the first 10 images of your dataset
Visualize your labels and save your custom dataset
/r/computervision
https://redd.it/10cet92
How good is the new YOLO? (YOLOv8)
A brief review of YOLOv8 capabilities, link is below: (No mailwall)
https://www.flyps.io/blog/a-new-yolo-is-here-yolov8
https://i.redd.it/zc8538mhgmba1.gif
/r/computervision
https://redd.it/10a0uvz
We used text-to-location models to find Twitter mentions of "Rihanna" and "Riri" during the Super Bowl
/r/deeplearning
https://redd.it/119zfxw
I am working on a salient feature extractor, to allow future farmers to collect training data about invasive weed species directly from their fields.
/r/computervision
https://redd.it/116yood
[R] Hitchhiker’s Guide to Super-Resolution: Introduction and Recent Advances
I'm glad to share with you our Open Access survey paper about image super-resolution:
https://ieeexplore.ieee.org/abstract/document/10041995
The goal of this work is to give an overview of the abundance of publications in image super-resolution, give an introduction for new researchers, and open thriving discussions as well as point to potential future directions to advance the field :)
/r/MachineLearning
https://redd.it/11287zf
Good replacement for Tensorflow's Object detection API
The TF Object Detection API has been deprecated for a while now, but I really liked the fact that it provided a standardized interface to train and test multiple model architectures. I was wondering if there is a popular alternative today?
I know the new big boy in object detection is YOLOv8, so maybe I should just switch to using that model and ecosystem instead.
Edit: Never mind, Ultralytics and yolov8 slaps, I will be using that from now on.
/r/computervision
https://redd.it/10uq4c5
How to visualize CNN feature maps?
I have been working with CNNs but can't figure out how to visualize feature maps between layers.
/r/deeplearning
https://redd.it/10q44ld
[D] Simple Questions Thread
Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead!
Thread will stay alive until next one so keep posting after the date in the title.
Thanks to everyone for answering questions in the previous thread!
/r/MachineLearning
https://redd.it/10cn8pw
[P] paper-hero: Yet Another Paper Search Tool
Hi guys, thanks for reading this post. I built a simplistic paper search tool that integrates ACL Anthology, arXiv API, and DBLP API.
Github address: [Spico197/paper-hero](https://github.com/Spico197/paper-hero)
**Motivation:** I'm majoring in NLP and I'd like to search for papers with "Event Extraction" in their titles in specific proceedings (e.g. ACL, EMNLP).
**Challenge:** There are lots of search tools and APIs, but few of them provide field-specific searches, like authors, titles, abstracts, and venues.
**Methodology:** I integrate ACL Anthology, arXiv API, and DBLP API, and provide a two-stage search toolkit, which first stores target papers via the official fuzzy search API, and then matches specific fields.
**Advantages:** This tool satisfies my need to stockpile papers and it can dump checklists in markdown format, or complete paper information in jsonl. AND and OR logics are supported in search queries.
**Limitations:** This tool is based on simple string matching, so you have to know some terminology in the target fields.
You are warmly welcome to have a try and feel free to drop me an issue!
from src.interfaces.aclanthology import AclanthologyPaperList
from src.utils import dump_paper_list_to_markdown_checklist

if __name__ == "__main__":
    # use `bash scripts/get_aclanthology.sh` to download and prepare anthology data first
    paper_list = AclanthologyPaperList("cache/aclanthology.json")
    ee_query = {
        "title": [
            # Any of the strings below is matched
            ["information extraction"],
            ["event", "extraction"],  # title must include `event` and `extraction`
            ["event", "argument", "extraction"],
            ["event", "detection"],
            ["event", "classification"],
            ["event", "tracking"],
            ["event", "relation", "extraction"],
        ],
        # Besides the title constraint, venue must also meet the needs
        "venue": [
            ["acl"],
            ["emnlp"],
            ["naacl"],
            ["coling"],
            ["findings"],
            ["tacl"],
            ["cl"],
        ],
    }
    ee_papers = paper_list.search(ee_query)
    dump_paper_list_to_markdown_checklist(ee_papers, "results/ee-paper-list.md")
[markdown checklist](https://preview.redd.it/myy4kbut15da1.png?width=2038&format=png&auto=webp&v=enabled&s=4fc3cacedd22bf6290bef3d94ec00bdfe16f61c7)
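The AND/OR semantics described in the query comments can be illustrated with a small standalone sketch. This is not paper-hero's actual implementation: the helper names `field_matches` and `paper_matches` and the sample papers are hypothetical, and matching is assumed to be case-insensitive substring matching, in line with the "simple string matching" limitation mentioned above.

```python
from typing import Dict, List

# A query maps a field name to a list of term groups (assumed structure,
# mirroring the `ee_query` dictionary in the post).
Query = Dict[str, List[List[str]]]


def field_matches(text: str, groups: List[List[str]]) -> bool:
    """OR over groups, AND over the terms inside one group."""
    lowered = text.lower()
    return any(all(term.lower() in lowered for term in group) for group in groups)


def paper_matches(paper: Dict[str, str], query: Query) -> bool:
    """Every field named in the query must match (AND across fields)."""
    return all(
        field_matches(paper.get(field, ""), groups)
        for field, groups in query.items()
    )


# Made-up sample records, for illustration only.
papers = [
    {"title": "Document-level Event Argument Extraction", "venue": "ACL"},
    {"title": "Event Detection with Graphs", "venue": "ICML"},
    {"title": "A Survey of Machine Translation", "venue": "EMNLP"},
]
query = {
    "title": [["event", "extraction"], ["event", "detection"]],
    "venue": [["acl"], ["emnlp"]],
}

hits = [p["title"] for p in papers if paper_matches(p, query)]
print(hits)
```

Because this sketch uses plain substring matching, a short term such as `acl` would also match a venue string like `naacl`; a real tool may need stricter venue matching.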
/r/MachineLearning
https://redd.it/10gp7rm
Automatic generation of image-segmentation mask pairs with StableDiffusion
/r/computervision
https://redd.it/107h6at
Train YOLOv8 ObjectDetection on Custom Dataset Tutorial
/r/computervision
https://redd.it/108616o
VizWiz Launches 4 AI Challenges to help blind/low vision community
Greetings!
We are pleased to announce the fourth annual VizWiz Grand Challenge workshop, which will be held in conjunction with CVPR 2023. The workshop is running 4 AI Challenges to drive the development of assistive technologies for people who are blind or low-vision. Please share this post with those who might be interested in participating.
This workshop is motivated in part by our observation that people who are blind have relied on (human-based) visual assistance services to learn about images and videos they capture for over a decade. We introduce visual question answering, few shot recognition, and object localization dataset challenges for the AI community to represent authentic use cases. A few more details:
· Friday, May 5: submissions of algorithm results due to the evaluation server
· Monday, June 19: results will be announced at the VizWiz Grand Challenge workshop at CVPR 2023
· Visual Question Answering (VQA) Challenge here
· VQA Answer Grounding Challenge here
· Few-Shot Object Recognition Challenge here
· Salient Object Detection Challenge here
We are looking forward to your participation in the Challenges this year!
/r/computervision
https://redd.it/10anp57
Other than "Multiple View Geometry in Computer Vision" by Hartley & Zisserman, what are the most essential books(!) for 3D Vision?
I love the book by Hartley & Zisserman and was wondering if there are other, similarly essential books for someone interested in getting into 3D Vision. Any suggestions?
/r/computervision
https://redd.it/10ashym
Photorealistic human image editing using attention with GANs
/r/computervision
https://redd.it/10bw49g
Computer Vision News, the magazine of the algorithm community - January 2023
Dear all,
Here is Computer Vision News of January 2023.
It includes reviews of 2 Best Paper Award winning research papers.
Read 44 pages about AI, Deep Learning, Computer Vision and more - with code!
Read online version for free (recommended)
PDF version
Free subscription on page 44.
Enjoy!
https://preview.redd.it/c0q3fax2k7ca1.jpg?width=400&format=pjpg&auto=webp&v=enabled&s=686185794db8bad40417f77399de94bd5edda595
/r/computervision
https://redd.it/10cjrad
[P] I built Adrenaline, a debugger that fixes errors and explains them with GPT-3
/r/MachineLearning
https://redd.it/106q6m9