Unscraped Art
Searches open access art and museums for what you want. Builds an Excel spreadsheet of citations and info as you go. Free. Private. No sign-up.
Open source under the Apache 2.0 license.
Your collected images are saved here until you download them.
This small tool reads the spreadsheet and simply downloads every image on it into organized folders on your computer.
Step 1: Download this tool.
Step 2: Put it next to your spreadsheet and double-click.
Downloads come directly from museum servers using your own IP. This tool just reads the spreadsheet.
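For the curious, the idea behind the downloader can be sketched in a few lines of Python. This is not the shipped tool's code. It assumes the spreadsheet has been exported to CSV, and the column names `source`, `title`, and `image_url` are hypothetical stand-ins for whatever the real spreadsheet uses.

```python
import csv
import re
import urllib.request
from pathlib import Path


def safe_name(text: str) -> str:
    """Turn a title into a filesystem-friendly file name stem."""
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_")
    return cleaned[:80] or "untitled"


def plan_downloads(rows, out_dir: Path):
    """Map each row to (url, target path), one folder per source."""
    plan = []
    for i, row in enumerate(rows, start=1):
        url = (row.get("image_url") or "").strip()  # hypothetical column
        if not url:
            continue  # nothing to download for this row
        folder = out_dir / safe_name(row.get("source") or "unknown")
        suffix = Path(url.split("?")[0]).suffix or ".jpg"
        name = f"{i:04d}_{safe_name(row.get('title') or '')}{suffix}"
        plan.append((url, folder / name))
    return plan


def download_all(csv_path: str, out_dir: str = "images") -> None:
    """Read the CSV and fetch every image straight from its host."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        rows = list(csv.DictReader(fh))
    for url, target in plan_downloads(rows, Path(out_dir)):
        target.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, target)
```

Calling `download_all("citations.csv")` (a made-up file name) would create one folder per source and request each image directly from the museum's server, which is why the downloads use your own IP.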
Download for Windows
Download for Mac
Why not ChatGPT, Midjourney, Stable Diffusion, DALL-E, Adobe Firefly – just some opinions.
"Scraped" means taken without asking. Scraping uses "crawler bots": a company writes programs that crawl the internet and download every image they find. Those images are stored on private company servers owned by wealthy shareholders, perhaps 1% of the population, who are culturally homogeneous.
Images are fed into a training process to connect their "styles" to linguistic terms like "oil painting" or "Van Gogh" or whatever. When you type a prompt, it generates new pixels via language. The output looks like art because it was blended from art, but it is arguably not art. The people who made the images were not asked and were not paid.
The extraction is not small. The major image generators appear to have trained on billions of images. Their output now competes directly with the people it was built from. An illustrator can spend years developing a recognizable style, only to watch a machine trained on that style sell a cheap imitation to anyone with twenty dollars and a prompt box. Fair use law was written to protect critics, teachers, scholars, and satirists. Using it to defend industrial-scale extraction for private profit is a different thing. And unlike music sampling, there is no real audit trail here. No artist can trace their work into the output, demand credit, or seek compensation. The line back to the source was cut on purpose.
Dozens of federal copyright lawsuits are currently pending in U.S. courts against the major AI companies. The landscape includes Andersen v. Stability AI (proceeding to trial September 2026), Bartz v. Anthropic (settled for $1.5 billion in 2025), New York Times v. OpenAI (ongoing discovery), Getty Images v. Stability AI, Universal Music's $3.1 billion suit against Anthropic, and Disney's suit against Midjourney over copied characters. Every image you generate with these tools carries unresolved legal provenance questions.
As of 2026, U.S. courts have split on some issues: training on copyrighted books can be fair use (Bartz ruling), but storing pirated copies is not. The Supreme Court has also confirmed that purely AI-generated works cannot be copyrighted in the U.S. (Thaler v. Perlmutter, 2026). For anyone in education, publishing, nonprofits, or institutional work, it is wiser for now to use creative-commons and public-domain images with documented provenance.
The AI image generation market sits in the tens of billions annually. Midjourney has around 100 employees and reportedly generates hundreds of millions in revenue. DALL-E is part of OpenAI, valued at $157 billion after a 2024 funding round. These companies trained on billions of images scraped from the internet.
Apple's market capitalization sits above $3 trillion. Microsoft's is similar. Nvidia crossed $3 trillion in 2024. OpenAI alone, founded only in late 2015, was valued at $157 billion after a 2024 funding round. Analysts project the AI industry to add many trillions to global GDP by 2030.
The U.S. Bureau of Economic Analysis puts arts and cultural production — including commercial film, broadcasting, publishing, plus nonprofit museums, libraries, performing arts — at roughly $1.1 trillion a year in contribution to U.S. GDP. The nonprofit cultural sector (museums, libraries, archives, symphonies) is a smaller share of that figure. By comparison, the major AI companies' market valuations now exceed the annual GDP of every country on Earth except the largest two or three.
A handful of men now control more of the world's digital infrastructure than any government. One of them (Musk) operates roughly three-quarters of all active commercial satellites, the machinery behind GPS assistance, weather forecasting, and global communications.
The images that trained these systems came from everybody else: taxpayers, archives, working artists, and the accumulated visual record of the human species.
The data scraping was sudden, quick, global. The ownership of the output is private and concentrated inside a demographic so narrow that you are almost certainly not in it.
Yes, prompting DALL-E or Midjourney burns energy. So does driving a car, streaming video, or refrigerating strawberries in January.
People often talk about typing a prompt as if it were a singular ecological sin — but that framing ignores much larger, routine forms of damage. A single beef cheeseburger carries a much heavier material footprint than one image prompt because cows require land, feed, water, methane-producing digestion, slaughter, refrigeration, packaging, and trucking. The same goes for air travel, fast fashion, and endless consumer junk.
So be honest: AI image generation has an environmental cost, but it is usually being folded into a much bigger industrial mess. Moral panic about prompts can become a convenient way to avoid talking about capitalism, meat, energy, logistics, and scale.
Image generation runs through data centers, and data centers need power, water, land, cooling systems, and transmission infrastructure. When one town fights a facility over water use, noise, tax giveaways, or grid strain, the problem does not vanish. The company just looks for a place with weaker resistance, cheaper land, or poorer people who have less power to say no. The environmental issue is not just "AI uses electricity." The issue is where the burden gets dumped, who absorbs it, and how quickly wealthy firms can move extraction to communities with less political power.
Your phone or whatever you are reading this on is a daily, normalized object built on extraction, labor exploitation, surveillance, and energy use at massive scale. Most people will not throw it away because the device is now structurally tied to work, banking, maps, social life, medicine, and survival. The honest position is that we are all living inside systems of damage and dependency, and the real question is scale, necessity, and where the burden lands.
Humans are chameleons and thieves and always have been. Culture is made by borrowing, stealing, remixing, inheriting, misremembering, jamming, copying, and carrying things across borders, languages, and styles. There is no pure text with a single author – no "Homer."
We live by copying others – we are mimetic beings with arguably no stable sense of "self" in the first place. So who owns the art? Are there authors at all?
Museums know the issues better than anyone. Many "open access" archives are full of objects passed through black markets, taken by force under empire, looted in war, or stripped from ritual life. It's all warped into stable oddities in glass boxes for us to gaze at – our possession as "public domain." Many institutions are genuinely careful about provenance. Many are not.
Artists survive this economy by declaring authorship – to market a self and make a living. They think about "art" in terms of an industrial system for pricing and owning property, land, objects. But this industrial system is very weird and not normal for human history.
Most human societies through time and space have never treated culture as private commodities in the first place because they did not operate as markets. Not every culture divides the world into "artists," "owners," and "buyers" the way capitalism does. The modern art market is not some pure and innocent place.
But worse – to say that certain objects belong to a certain "nation" is nationalism. It divides the planet into invisible lines of us and them. But who gets to decide what constitutes a nation, anyway? The people in charge – the few up top, the council, government, chiefs, kings, usually men – they get to define art for everyone?
Then how is all art not simply propaganda? To push a piece of art into a particular "culture" reeks of exactly the imperial thinking that most creative commons work seeks to relieve.
This tool is not pretending to touch some pure, untouched archive – that does not exist anywhere. The point is narrower and more honest. It works with images that institutions have already released under open-access terms. That's not perfect, but it's better than the newer corporate move: scrape everything at industrial scale, sever it from all human context, and sell their own brains back to the hordes of dumb consumers. It is a problem and I offer no solutions.
If you use it like Google, it will return crappy results.
Museum records are written by professional curators. This is not Google. Pretend you are a scholar requesting something from a museum. Say what it's made of. Bronze sculpture, woodblock print, silk textile — museum records always describe the material. Use culture or period words. Artist names work great. Use words, not sentences.
As best you can (not perfectly) try 3–6 keywords, no sentences, in this rough pattern:
[culture/region] + [object type] + [material] + [date/period] + [artist/place if known]
Each word should be something that could plausibly appear in a catalog field.
When in doubt, try to hit at least three of these buckets:
Culture / place: Egyptian, Greek, Roman, Yoruba, Japanese, Chinese, Florentine, Venetian, Dutch, French, Mexican, Andean
Period / date: New Kingdom, Archaic, Classical, Hellenistic, Gothic, Renaissance, Baroque, Edo, Meiji, 18th century, 1920s
Object type: vase, krater, figure, amulet, stele, relief, portrait, landscape, altarpiece, mask, textile, banner, print, photograph
Material / technique: bronze, marble, limestone, terracotta, faience, gold, silver, ivory, silk, cotton, wool, oil, tempera, fresco, woodblock, lithograph, gelatin silver print
Name / place (if known): Hokusai, Rembrandt, Rodin, Picasso, Florentine, Venetian, Benin, Ife, Thebes, Luxor
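The bucket pattern above can be sketched as a small helper. This is only an illustration of the advice, not code from the tool. The function name `build_query` and the bucket names are made up for this example.

```python
# Bucket order follows the rough pattern:
# culture/region + object type + material + date/period + artist/place
BUCKETS = ("culture", "object_type", "material", "period", "name")


def build_query(**fields: str) -> str:
    """Join catalog-style keywords in the rough pattern from the guide."""
    unknown = set(fields) - set(BUCKETS)
    if unknown:
        raise ValueError(f"unknown bucket(s): {sorted(unknown)}")
    parts = [fields[b].strip() for b in BUCKETS if fields.get(b, "").strip()]
    if len(parts) < 3:
        raise ValueError("try to fill at least three buckets")
    query = " ".join(parts)
    if len(query.split()) > 6:
        raise ValueError("keep it to about 3-6 keywords")
    return query


print(build_query(culture="Japanese", object_type="print",
                  material="woodblock", period="Edo", name="Hokusai"))
```

The point of the checks is the same as the advice: at least three buckets, no more than about six words, and nothing that reads like a sentence.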
Describe what the thing is, not your reaction to it.
Swap adjectives for catalog fields:
"old" → century or period ("15th century," "New Kingdom")
"Asian" → specific culture/region ("Japanese," "Korean," "Chinese")
"weird" or "cool" → object type ("mask," "idol," "figurine")
If you only know one precise thing, anchor on that and add generic but catalog-friendly words around it.
Your keys are stored in your browser and pass through our server only to reach museum APIs. We do not log or store them.
An API key is a free library card. You register, you get a code, you paste it once. That's it.
| Source | How to Get the Key |
|---|---|
| Smithsonian | api.data.gov/signup |
| Europeana | pro.europeana.eu |
| Rijksmuseum | rijksmuseum.nl/rijksstudio |
| Harvard Art Museums | harvardartmuseums.org |
| DPLA | dp.la developers |

| Museum / Library | Key | What They Have |
|---|---|---|
| Metropolitan Museum of Art | None | 400,000+ works |
| Art Institute of Chicago | None | 50,000+ CC0 |
| Cleveland Museum of Art | None | 30,000+ CC0 |
| SMK — National Gallery of Denmark | None | European & Danish art |
| Wellcome Collection | None | Medical & scientific imagery |
| Princeton Art Museum | None | Greek/Roman, pre-Columbian |
| Wikimedia Commons | None | 100M+ files |
| National Archives (NARA) | None | US gov records, maps |
| Library of Congress | None | FSA photos, maps, prints |
| NASA | None | Space photography |
| Smithsonian Institution | Free key | 21 museums |
| Europeana | Free key | 50M+ European items |
| Rijksmuseum | Free key | 800,000+ Dutch masters |
| Harvard Art Museums | Free key | 250,000 objects |
| DPLA | Free key | Thousands of US libraries |
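
To show what a keyword search against one of the no-key sources can look like, here is a minimal Python sketch against the Metropolitan Museum's public Collection API. It is not necessarily how Unscraped Art talks to the Met. The endpoint and the `q` and `hasImages` parameters are taken from the Met's public API as I understand it, so check their documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

MET_BASE = "https://collectionapi.metmuseum.org/public/collection/v1"


def met_search_url(keywords: str) -> str:
    """Build a keyword search URL limited to records that have images."""
    params = urllib.parse.urlencode({"q": keywords, "hasImages": "true"})
    return f"{MET_BASE}/search?{params}"


def met_search(keywords: str, limit: int = 5) -> list[int]:
    """Return up to `limit` object IDs (needs network access)."""
    with urllib.request.urlopen(met_search_url(keywords), timeout=30) as resp:
        data = json.load(resp)
    # "objectIDs" can be null when nothing matches, hence the `or []`.
    return (data.get("objectIDs") or [])[:limit]
```

Sources that need a key work the same way in spirit, except the key travels along with the request.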
"Creative Commons" images have very sketchy origins. Museums have stolen many things from many people. Open access solves a legal problem – not a consent problem. Many museums and archives hold material that was taken under colonial conditions, cataloged under Western property law, and later released as "public domain" or CC0 by institutions that were never the rightful moral authorities over it in the first place, if anybody ever did.
That means a file can be legally open while still carrying living cultural tensions: sacred restrictions, seasonal restrictions, gendered restrictions, lineage restrictions, or community-use limits.
This tool only searches sources that have already chosen to publish material under open-access terms. It does not reach around community protocol systems or treat legal openness as proof of ethical permission. That line matters.
I don't call generated images "art" – we can debate that all day. Unless they involved extremely complex decisions or coding, I call them "decoration." I do use them when they assist accessibility or provocation toward positive social goals.
For example, I used DALL-E to create many complex images of animals with prompts extending beyond 30 pages each, in order to make George Orwell's Animal Farm more accessible to kids. Then I offer that as a product to educators and teachers.
To me, that is one acceptable use that becomes a kind of artistry and outweighs costs of "scraping." However, I also create an Open version of everything simply using stuff the community makes with pen and paper or creative commons images.
I am not anti-AI – I built this with AI for the social good. Yes, you can use AI for good things without loving its tech-bro billionaire corporate culture.
Effective April 20, 2026. Operated by Al Tarbet in Salem, Oregon. Questions: al@tarbet.design.
Unscraped Art is a search interface. It queries public APIs, returns public metadata, and displays links to publicly hosted images. It does not host, store, proxy, or redistribute any image or non-public data.
Every image carries the rights information provided by the source institution. A label of "CC0" or "Public Domain" reflects that institution's declaration, not ours. We do not verify or guarantee rights status. Before using any image commercially or republishing it, check the original source record directly. The museum or archive is always the authoritative source for rights.
We do not require accounts, collect names or emails, or store personal data. Your API keys are stored only in your browser (localStorage) and are sent as request headers when you search. Your keys pass through our server only to reach the museum APIs. They are not logged, stored, or retained on our side.
Ephemeral access logs may record the search topic and your IP address for operational reasons (debugging, rate-limiting). These logs are not retained long-term and are not linked to any identifying information.
Your collected images and API keys are stored in your browser's localStorage under the keys harvester_v1_keys and harvester_v1_basket. If you are on a shared or public-access computer, use the "Clear All" button in the API Keys modal and clear your browser data before leaving.
Under Oregon's OCPA, California's CCPA/CPRA, and the EU's GDPR, you have rights to access, correction, deletion, and opt-out of data sale where applicable. Unscraped Art retains nothing about you after a search completes. The museum APIs you query retain data per their own terms. For questions, email al@tarbet.design.
The Unscraped Art codebase is open source under the Apache 2.0 License. The website text is licensed CC BY 4.0. Images and metadata belong to their source institutions.
This tool does not search community-governed repositories. Legal openness is not the same as ethical consent. If your institution wants to be removed or change how we interact with your API, contact us and we will respond within 30 days.
In the event of a breach at Unscraped Art or its upstream providers (Replit, museum APIs) that affects search request data, a notice will be posted on unscraped.art within 72 hours of discovery.
API keys are like online library cards given out for free by museums. If you haven't yet, sign up for them. It takes about two minutes.
Your keys are not logged or stored on our side.
Search the source code for localStorage and x-smithsonian-key to see how keys are handled. The code is open source under Apache 2.0. (I wish life were long enough for me to sit around stealing people's museum keys.)
It will return crappy results if you do not!
Core rule: try 3–6 keywords, no sentences:
[culture/region] + [object type] + [material] + [date/period] + [artist/place if known]
We are searching over 15 museums and archives simultaneously. Each one has to respond with its own data. This can take a minute or two.
If the search fails or returns nothing, it usually means one of two things:
1. Too many requests from your IP address — wait about 15 minutes and try again.
2. Your search was too broad — try narrower, more specific keywords.
This is normal. Museum APIs have rate limits. The tool is working — just be patient with it.
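Why one slow or rate-limited museum does not have to sink the whole search can be sketched like this. It is a generic Python illustration of querying many sources in parallel, not the tool's actual code. `search_all` and the stand-in source functions are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def search_all(sources, query):
    """Run every source's search function in parallel.

    `sources` maps a display name to a callable that takes the query.
    Returns (results, errors) so one failing source cannot sink the rest.
    Each callable is expected to apply its own network timeout.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {pool.submit(fn, query): name for name, fn in sources.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # e.g. a rate limit or network error
                errors[name] = str(exc)
    return results, errors


# Stand-in sources: one answers, one is rate-limited.
def fake_ok(q):
    return [q.upper()]


def fake_limited(q):
    raise RuntimeError("429 Too Many Requests")


print(search_all({"Museum A": fake_ok, "Museum B": fake_limited}, "bronze"))
```

The total wait is set by the slowest source that still answers, which is why a search across many archives can take a minute or two.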
