Standalone python music scraper

kamaradski
Posts: 3
Joined: Thu Oct 22, 2015 9:37 pm

Standalone python music scraper

Post by kamaradski »

Hi all,

Since I got increasingly frustrated with the lack of music-oriented Fanart.tv plugins in Kodi\XBMC, and with the instability of the few existing plugins, I decided to throw together a quick script that iterates over your music collection and downloads cover.jpg & cdart.png from Fanart.tv.
What is this?
This script is a standalone Python scraper that reads the MusicBrainz tags of your MP3 and FLAC files and fetches the following files from the Fanart.tv database:
- Cover artwork
- CDart Artwork
Dependencies:
- Python
- Linux (might work under Windows too, but totally untested)
- Mutagen libs (https://bitbucket.org/lazka/mutagen)
- Your music collection must be tagged with at least "MusicBrainz Album ID"
- Each album must have its own folder
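
For anyone wondering what reading the "MusicBrainz Album ID" tag looks like, here is a minimal sketch. It is not the script's actual code. It assumes Mutagen's "easy" interface, where the tag is exposed under the key `musicbrainz_albumid` for both MP3 and FLAC; the function names are made up for illustration.

```python
from typing import Mapping, Optional, Sequence

# Assumed tag key, as exposed by mutagen's "easy" tag interface.
TAG_KEY = "musicbrainz_albumid"


def album_id_from_tags(tags: Mapping[str, Sequence[str]]) -> Optional[str]:
    """Return the first MusicBrainz Album ID in a tag mapping, or None."""
    values = tags.get(TAG_KEY) or []
    return values[0] if values else None


def read_album_id(path: str) -> Optional[str]:
    """Open an MP3 or FLAC file with mutagen and return its album ID, if any."""
    # Third-party dependency, imported here so the helper above works without it.
    import mutagen

    audio = mutagen.File(path, easy=True)  # None when the file type is not recognised
    if audio is None:
        return None
    return album_id_from_tags(audio)
```

A file without the tag simply yields `None`, which a caller could treat as "skip this album".
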
Current features & Limitations:
Project state: LIVE: Stable as of V1.5.2

Limitations:
- Support for MP3 & FLAC only
- Support for cover.jpg & cdart.png only
- Relies only on MusicBrainz tags
- Downloads only from Fanart.tv

Features:
Logging:
- Debug file-logging (Log every action to file in real-time)
- Log missing artwork to file (after completing script-run)
- Log downloaded artwork to file (after completing script-run)
- Log session statistics to file (after completing script-run)

Download:
- Download cover-art to cover.jpg
- Download CD-art to cdart.png (tries to match the specific disc number, falling back to the first result)
- Reduced API calls by skipping albums that already contain a full set of artwork
- Support for Fanart.tv user API keys
- Supports MusicBrainz tags, and MusicBrainz API
- Supports Fanart.tv API
- Supports FLAC & MP3
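
The disc-number matching for cdart.png mentioned above can be sketched roughly like this. It is only an illustration under assumptions: each CD-art entry is taken to be a dict with a `url` and a `disc` field (the disc number as a string), and `pick_cdart_url` is a hypothetical name, not a function from the script.

```python
from typing import Optional


def pick_cdart_url(cdart_entries: list, disc_number: Optional[int]) -> Optional[str]:
    """Prefer the entry whose disc matches; otherwise fall back to the first one."""
    if not cdart_entries:
        return None
    if disc_number is not None:
        for entry in cdart_entries:
            # "disc" is assumed to arrive as a string such as "1".
            if str(entry.get("disc", "")) == str(disc_number):
                return entry.get("url")
    # No disc number known, or no matching entry: use the first result.
    return cdart_entries[0].get("url")


entries = [
    {"url": "https://example.invalid/disc1.png", "disc": "1"},
    {"url": "https://example.invalid/disc2.png", "disc": "2"},
]
second_disc = pick_cdart_url(entries, 2)
fallback = pick_cdart_url(entries, 7)
```

Here `second_disc` is the disc-2 URL, while `fallback` is the first entry because no disc 7 exists.
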
I hope this makes downloading the content you want a lot easier (just like it did for me)

Readme: https://bitbucket.org/kamaradski/fanart ... r/overview
Download: https://bitbucket.org/kamaradski/fanart ... /downloads

KR
Kamaradski
Last edited by kamaradski on Sun Jan 03, 2016 10:18 pm, edited 2 times in total.
kamaradski
Posts: 3
Joined: Thu Oct 22, 2015 9:37 pm

Re: Standalone python music scraper

Post by kamaradski »

Some time has passed, and since I have added a lot of features, including FLAC support, I wanted to give a quick update on this topic.

The biggest updates are:
- Added support for FLAC
- Improved log to file functions
- Improved error-handling (still WIP)
- Increased overall stability


V1.4 
- Added summary log-file keeping track of missing & downloaded fanart
- Missing fanart no longer includes fanart downloaded in the current session
- Added FLAC support 
- Minor changes to the onscreen & logfile messages

V1.3
- Additional statistics added
- Improved error handling
- Added support for Client-API key
- Added & tweaked more ui messages
- Added some config options to the script (not yet working)
- Added more fanart.tv messages to the script header, to comply with their API rules

V1.2
- Added User-Agent headers to the API-call request, for easy identification\logging\reporting on their end
- Minor ui message tweaks
- Added basic statistic reporting (WIP)

V1.1
- Greatly improved user onscreen messages
- Greatly improved logging
- Generic code cleanup
- Improved error handling for the API-calls
- File header with credits

V1.0
- Stability tweaks

V0.0
- Initial release
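
As a side note on the V1.2 entry above: sending a User-Agent header with an API request can be done with the standard library alone. This is a minimal sketch, not the script's own code; the User-Agent string and the function name are placeholders.

```python
import urllib.request

# Hypothetical identifier; a real script would use its own name and version.
USER_AGENT = "example-music-scraper/0.1"


def build_request(url: str) -> urllib.request.Request:
    """Create a request object that identifies the client to the API."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


request = build_request("https://example.invalid/api")
```

The request would then be passed to `urllib.request.urlopen(request)`; nothing is sent until that call.
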
Kode
Site Admin
Posts: 353
Joined: Wed Dec 18, 2013 11:34 am

Re: Standalone python music scraper

Post by Kode »

Thanks for the update :)
vicmanpergar
VIP
Posts: 47
Joined: Mon Dec 23, 2013 6:56 pm

Re: Standalone python music scraper

Post by vicmanpergar »

Thank you.
It is basically musicbrainz, and fanart, right?
kamaradski
Posts: 3
Joined: Thu Oct 22, 2015 9:37 pm

Re: Standalone python music scraper

Post by kamaradski »

Hi Vicmanpergar,

Yes, you can read all about it in the readme on Bitbucket here: https://bitbucket.org/kamaradski/fanart ... craper/src

In particular:
How does it work on the inside?
- Create list of sub-folders
- Loop through these folders & create list of all files in these folders
- Check if any artwork exist in the list of files for this folder
- In case of missing artwork, check for MusicBrainz Album ID tag
- Request the release_group for this AlbumID from the MusicBrainz API
- Request the artwork download URLs from the Fanart.tv API
- Download the missing artwork to the current folder
- Continue looping through the folder list from step 2
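
The steps above can be sketched roughly as follows. This is a simplified illustration rather than the actual script: the function names are hypothetical, no network calls are made, and the two URL patterns (a MusicBrainz release lookup with `inc=release-groups`, and the Fanart.tv v3 music-albums endpoint) are assumptions to check against the current API documentation.

```python
import os
from urllib.parse import urlencode

ARTWORK = ("cover.jpg", "cdart.png")


def album_folders(root: str) -> list:
    """List the immediate sub-folders of root, one per album."""
    return sorted(
        os.path.join(root, entry)
        for entry in os.listdir(root)
        if os.path.isdir(os.path.join(root, entry))
    )


def missing_artwork(folder: str) -> list:
    """Return the artwork file names that are not yet present in a folder."""
    present = set(os.listdir(folder))
    return [name for name in ARTWORK if name not in present]


def folders_needing_artwork(root: str) -> dict:
    """Map each album folder to its missing artwork, skipping complete ones."""
    todo = {}
    for folder in album_folders(root):
        missing = missing_artwork(folder)
        if missing:  # complete folders cost no API calls at all
            todo[folder] = missing
    return todo


def musicbrainz_release_url(album_id: str) -> str:
    """Assumed lookup URL returning the release together with its release group."""
    query = urlencode({"inc": "release-groups", "fmt": "json"})
    return f"https://musicbrainz.org/ws/2/release/{album_id}?{query}"


def fanart_album_url(release_group_id: str, api_key: str) -> str:
    """Assumed Fanart.tv URL listing the artwork for a release group."""
    query = urlencode({"api_key": api_key})
    return f"https://webservice.fanart.tv/v3/music/albums/{release_group_id}?{query}"
```

Only the folders returned by `folders_needing_artwork` would go on to the tag lookup and the two API requests, which is where the saving in API calls comes from.
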
I will also use this post to announce version 1.6:


V1.6
- Improved API error handling for Fanart.tv API calls 
- Minor error-logging tweaks
- Minor display messages tweaks
- Fixed possible divide-by-zero exceptions upon statistic calculation 
- Added toggle switch for writing the downloaded-artwork log
- Split downloaded & missing artwork into 2 separate log-files
- New file-names for the log-files
- Optimized the initial file initialization process
V1.5.3
- Improved API error handling for MusicBrainz API calls
- Added API-errors to the debug-log
V1.5.2
- Small code-flow correction to improve file-type detection
- Improved socket error handling
- Library import clean-up
- Added toggle for writing missing artwork to file
- Changed project status to: STABLE
V1.5.1
- Code clean-up release, many new comments added
V1.5
- Fixed a bug where FLAC files without a disk-number would cause an exception
- Fixed a logging exception when Fanart.tv API is not available
- Fixed a logical error in the audio-type detection
- Fixed a bug with FLAC AlbumID detection
- Added debug-logging switch
- Added pre-defined function for debug-log writing, to allow code clean-up
- Generic code-clean-up
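
On the divide-by-zero fix listed under V1.6: a guard like the one below is one common way to keep the session statistics safe when no albums were processed. It is a sketch with a hypothetical function name, not the code from the script.

```python
def percentage(part: int, total: int) -> float:
    """Return part/total as a percentage, or 0.0 when total is zero."""
    if total == 0:
        return 0.0
    return 100.0 * part / total


empty_run = percentage(0, 0)
quarter = percentage(1, 4)
```
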
vicmanpergar
VIP
Posts: 47
Joined: Mon Dec 23, 2013 6:56 pm

Re: Standalone python music scraper

Post by vicmanpergar »

Cool...
Will check it out