DEV Community

[Comment from a deleted post]
 
Kevin

You definitely have to provide more information. Right now, based on your post and your responses, I am not even sure you know what you are asking for. Sometimes it sounds like you do, sometimes not.

This post is also tagged as JavaScript, which makes even less sense.

Some guiding questions:

  • What is your ultimate goal?
  • What kind of "program" do you envision? Command line, app, web?
  • What information are you looking to scrape?

Try taking it a step at a time. You have some kind of interface. You put in one or more URLs and press a button called "scrape". What do you expect the output format to be?

Allan N Jeremy

Agreed 100%!

Saad Alem

Done

Kevin • Edited

Are you looking to...

  1. Scan for image data within a webpage. Or...

  2. Take a screen capture of part or all of the page at the URL?

Also, clarify this part if you could:

'integrate it to "see" the webpage without downloading it'

Taken literally, this would never be possible: to inspect a page's content, a program has to fetch it first.

 
Saad Alem

Imagine it as a visual detector: if it finds the information, it crawls it; otherwise it just moves on to another URL.

I have an idea ^

 
Kevin

How do you provide the context for the "information" it is using to determine whether or not to crawl?

 
Saad Alem

I have built a classifier for "images I want to be crawled". The thing is, I need to deploy it on the web:

Website URL -> Detected -> Crawl

 
Kevin

This "classifier", is it actual image file data, or just contextual information about image data?

Can you provide a sample of this "classifier"?

 
Saad Alem

Image file data; otherwise it would be time-consuming to do it all at the same time on the web.

E.g., provide it with images of monkeys and it will search through websites; when it detects an image of a monkey, that image will be scraped.
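
The flow described here (page URL, detect, scrape) could be sketched roughly as below. This is only an illustration under assumptions: `find_matching_images`, `fetch_image`, and `is_wanted` are hypothetical names, `is_wanted` stands in for the classifier mentioned above, and `fetch_image` stands in for whatever downloads the image bytes. Note that each candidate image still has to be downloaded before the classifier can look at it.

```python
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urljoin


class ImageSrcParser(HTMLParser):
    """Collects the src attribute of every <img> tag in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_matching_images(
    page_url: str,
    html: str,
    fetch_image: Callable[[str], bytes],  # hypothetical: downloads image bytes
    is_wanted: Callable[[bytes], bool],   # hypothetical: the classifier
) -> list[str]:
    """Return absolute URLs of images on the page that the classifier accepts."""
    parser = ImageSrcParser()
    parser.feed(html)
    matches = []
    for src in parser.sources:
        # Resolve relative paths against the page URL.
        image_url = urljoin(page_url, src)
        if is_wanted(fetch_image(image_url)):
            matches.append(image_url)
    return matches
```

Passing the downloader and the classifier in as plain callables keeps the sketch independent of any particular HTTP client or model; pages with no accepted images simply return an empty list, which corresponds to "move on to another URL".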

 
Kevin • Edited

When it is scanning for an image, is it looking for that binary-exact image? Or is it looking for visually similar images?

Are you looking for dynamic web content or static web content?
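
To make the binary-exact versus visually-similar distinction concrete, here is a minimal sketch. It assumes each image has already been decoded and downscaled to a small grayscale grid (that step needs an imaging library and is not shown). The "average hash" used here is just one simple perceptual-hash technique chosen for illustration, not the classifier discussed in this thread, and all function names are hypothetical.

```python
import hashlib


def exact_fingerprint(image_bytes: bytes) -> str:
    """Binary-exact matching: identical only if every byte is identical."""
    return hashlib.sha256(image_bytes).hexdigest()


def average_hash(pixels: list[list[int]]) -> int:
    """Visual similarity: one bit per pixel, set when the pixel is
    brighter than the mean of the grid (grayscale values 0-255)."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits; a small distance suggests similar images."""
    return bin(a ^ b).count("1")
```

With this approach, a slightly brightened or re-encoded copy of an image gets a different `exact_fingerprint` but can still have an `average_hash` within a small Hamming distance of the original, which is why the answer to the question above changes the whole design.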