Weekend Project: Spotify Album Image Search

Sanders Lauture

Nov 24, 2020 • 7 min read

I listen almost exclusively to full albums on Spotify. Sometimes I know the name of the album I want to listen to, but other times I only remember what the album cover looks like. This happened to me again last Friday. I wanted to listen to Somersault by Beach Fossils but I only remembered that it had a white album cover. I then wondered if there was a way to build something that solved my problem. Thanks to the hard work of people smarter than me, artificial intelligence, and APIs, I present Spotify Album Image Search (along with source code). Log in with your Spotify account, the website will process your saved Spotify albums using the Azure Computer Vision API, and then you can search through your library by color, object, or place, as well as by album or artist name.

The hardest part of the project would have been writing code to analyze album images, but luckily there are tons of image analysis APIs out there. I picked the Azure Computer Vision API because I work at Microsoft and I get $150 in Azure credit a month. The API can detect objects, tags, and dominant colors, or even create a caption describing an image. For example, if I send the API an image of the Because the Internet album cover, it will caption that image as "Donald Glover wearing a pink and white checkered shirt".

Because the Internet album cover aka Donald Glover wearing a pink and white checkered shirt

Next, knowing that the API would send the same result for the same input image, I figured it would be a good idea to save the results somewhere. Oh look, Azure Cosmos DB, and there's even a new serverless option which lets me further ignore the physical reality of server hardware.

Finally, I wanted to be able to search through the results the Computer Vision API returned. I could just do a simple string compare in either the front-end or back-end but a better solution would be a proper search service. You guessed it, Azure Cognitive Search, another offering of Apache Lucene as a service.

Putting It Together

The first part requires authenticating with Spotify to get the user's saved albums. Typical OAuth 2.0 shenanigans. Once redirected back to my website I need to use the access token to fetch the user's saved albums using the /me/albums API. I already have a small library that makes this easy. The API returns a list of saved albums that includes the album name, a list of artists for that album, and a list of album images. A thing to note is that every Spotify object, whether it be an album, an artist, a track, or a playlist, has a unique id. This is perfect for Cosmos DB lookups. If the album id exists in Cosmos DB that means the album was already processed by the service. If the album id does not exist I need to process it.
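The lookup described above can be sketched roughly as follows. This is a minimal illustration in Python, not the service's actual code: `split_saved_albums` is a hypothetical name, and the `processed_ids` set stands in for a Cosmos DB lookup by album id. The input mimics the `items` array returned by /me/albums, where each entry wraps an `album` object.

```python
from __future__ import annotations

from typing import Iterable


def split_saved_albums(
    saved_items: Iterable[dict], processed_ids: set[str]
) -> tuple[list[dict], list[dict]]:
    """Split saved albums into (already processed, still to process)."""
    known: list[dict] = []
    pending: list[dict] = []
    for item in saved_items:
        album = item["album"]  # each saved-album entry wraps the album object
        images = album.get("images", [])
        summary = {
            "id": album["id"],
            "name": album["name"],
            "artists": [artist["name"] for artist in album["artists"]],
            # Assumption: the first image is the one to analyze
            # (Spotify documents the list as ordered widest first).
            "image_url": images[0]["url"] if images else None,
        }
        # processed_ids is a stand-in for "does this id exist in Cosmos DB?"
        if summary["id"] in processed_ids:
            known.append(summary)
        else:
            pending.append(summary)
    return known, pending
```

Only the albums in the second list would need to go through image analysis; the first list can be returned to the front-end right away.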

The first part of processing is analyzing the album image using the Computer Vision API. The example tool on the Azure website only works on image links that have a .jpg, .jpeg, or .png file extension, and Spotify image links don't have extensions, for example https://i.scdn.co/image/ab67616d0000b273fce23dadb51975ebf2e9d75c. Thankfully the actual API works fine with Spotify image links. To use the API you need to specify which features you want analyzed. The four I chose were tags, description, color, and image type. Here's an example response:

{
    "Color": {
        "DominantColorForeground": "Brown",
        "DominantColorBackground": "Pink",
        "DominantColors": [
            "Pink",
            "Brown",
            "Red"
        ],
        "AccentColor": "A53526",
        "IsBWImg": false
    },
    "ImageType": {
        "ClipArtType": 0,
        "LineDrawingType": 0
    },
    "Tags": [
        {
            "Name": "person",
            "Confidence": 0.9999055862426758,
            "Hint": null
        },
        {
            "Name": "man",
            "Confidence": 0.994906485080719,
            "Hint": null
        },
        {
            "Name": "human face",
            "Confidence": 0.9926855564117432,
            "Hint": null
        },
        {
            "Name": "wall",
            "Confidence": 0.9863784909248352,
            "Hint": null
        },
        {
            "Name": "indoor",
            "Confidence": 0.8605086803436279,
            "Hint": null
        },
        {
            "Name": "wearing",
            "Confidence": 0.8425125479698181,
            "Hint": null
        },
        {
            "Name": "clothing",
            "Confidence": 0.8335848450660706,
            "Hint": null
        },
        {
            "Name": "shirt",
            "Confidence": 0.8327466249465942,
            "Hint": null
        },
        {
            "Name": "portrait",
            "Confidence": 0.7301548719406128,
            "Hint": null
        },
        {
            "Name": "human beard",
            "Confidence": 0.6403167843818665,
            "Hint": null
        },
        {
            "Name": "screenshot",
            "Confidence": 0.5777060985565186,
            "Hint": null
        },
        {
            "Name": "forehead",
            "Confidence": 0.5023394823074341,
            "Hint": null
        },
        {
            "Name": "staring",
            "Confidence": 0.26128333806991577,
            "Hint": null
        },
        {
            "Name": "male",
            "Confidence": 0.18879052996635437,
            "Hint": null
        }
    ],
    "Description": {
        "Tags": [
            "person",
            "man",
            "wall",
            "indoor",
            "wearing",
            "shirt",
            "staring",
            "male"
        ],
        "Captions": [
            {
                "Text": "Donald Glover wearing a pink and white checkered shirt",
                "Confidence": 0.44440755248069763
            }
        ]
    }
}

Notice that Description, which returns a Tags array, repeats many of the items in the main Tags array but there are some items in the main Tags array that aren't in Description. No idea why this is.
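For reference, a request asking for those four features can be sketched as below. This is an assumption-heavy illustration rather than what the service actually does: it targets the REST "Analyze Image" endpoint, the `v3.1` API version is a guess, and `build_analyze_request` is a hypothetical helper. It only builds the request so it can be handed to any HTTP client. Note also that the raw REST response uses camelCase field names; the PascalCase sample above looks like an SDK-serialized model.

```python
from urllib.parse import urlencode

# The four features chosen in this post.
VISUAL_FEATURES = ["Tags", "Description", "Color", "ImageType"]


def build_analyze_request(endpoint: str, key: str, image_url: str,
                          api_version: str = "v3.1") -> dict:
    """Describe the HTTP request for analyzing a remote image by URL."""
    # Keep commas unescaped so the feature list stays readable in the URL.
    query = urlencode({"visualFeatures": ",".join(VISUAL_FEATURES)}, safe=",")
    return {
        "method": "POST",
        "url": f"{endpoint.rstrip('/')}/vision/{api_version}/analyze?{query}",
        "headers": {
            "Ocp-Apim-Subscription-Key": key,  # resource key from the Azure portal
            "Content-Type": "application/json",
        },
        # The image is passed by URL, so Spotify's extension-less links are fine.
        "json": {"url": image_url},
    }
```

With the `requests` library, for example, the dictionary could be sent with `requests.request(req["method"], req["url"], headers=req["headers"], json=req["json"])`.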

Next I need to save that information to Cosmos DB. I created a data model object with basic album info and the results from the Computer Vision API. Cosmos DB items need an id field so I used the album URI (Spotify id: 4GNIhgEGXzWGAefgN5qjdU vs Spotify URI: spotify:album:4GNIhgEGXzWGAefgN5qjdU). Cosmos DB also requires a partition key so I simply used the album URI there as well. Note that I'm not saving albums with any user-identifying information. Albums are the same across different Spotify users' libraries. If someone else using the service has the same albums in their library, your library's processing time will be much faster because of the shared album ids.
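As a rough sketch of that data model, assuming a container whose partition key path is `/id` (the post only says the album URI is used for both), the item could be assembled like this. The function names and the `name`, `artists`, and `analysis` field names are hypothetical.

```python
def album_uri(album_id: str) -> str:
    """Turn a Spotify album id into its Spotify URI."""
    return f"spotify:album:{album_id}"


def album_id_from_uri(uri: str) -> str:
    """Extract the album id from a Spotify album URI."""
    prefix, kind, album_id = uri.split(":")  # ValueError if not three parts
    if (prefix, kind) != ("spotify", "album") or not album_id:
        raise ValueError(f"not a Spotify album URI: {uri!r}")
    return album_id


def build_album_item(album: dict, analysis: dict) -> dict:
    """Build the item to store; it carries no user-identifying fields."""
    return {
        # Assumption: the partition key path is /id, so the album URI
        # serves as both the item id and the partition key value.
        "id": album_uri(album["id"]),
        "name": album["name"],
        "artists": album["artists"],
        "analysis": analysis,  # Computer Vision result stored as-is
    }
```

Because nothing in the item refers to a user, the same stored item can serve every library that contains the album.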

Once saved in Cosmos DB I next want to create a search document in Cognitive Search. The search document is a simplified version of the DB item. Both tag arrays are merged and other unnecessary fields, like confidence values, are removed. Search documents require a key field and only certain characters are allowed so instead of using the album URI as the key I used the album id. Here's an example search document:

{
    "AlbumId": "4GNIhgEGXzWGAefgN5qjdU",
    "AlbumName": "Because the Internet",
    "ArtistName": "Childish Gambino",
    "Colors": [
        "Pink",
        "Brown",
        "Red"
    ],
    "Tags": [
        "person",
        "man",
        "human face",
        "wall",
        "indoor",
        "wearing",
        "clothing",
        "shirt",
        "portrait",
        "human beard",
        "screenshot",
        "forehead",
        "staring",
        "male"
    ],
    "Captions": [
        "Donald Glover wearing a pink and white checkered shirt"
    ]
}
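The simplification from analysis result to search document can be sketched as follows. This is a minimal Python illustration, not the service's code: `build_search_document` is a hypothetical name, and it assumes the analysis has the PascalCase shape of the example response shown earlier. The two tag arrays are merged in order with duplicates dropped, and confidence values are discarded.

```python
def build_search_document(album_id: str, album_name: str,
                          artist_name: str, analysis: dict) -> dict:
    """Flatten a Computer Vision analysis into a search document."""
    merged = [tag["Name"] for tag in analysis["Tags"]]
    merged += analysis["Description"]["Tags"]
    return {
        "AlbumId": album_id,  # the bare id is the key, not the URI with colons
        "AlbumName": album_name,
        "ArtistName": artist_name,
        "Colors": list(analysis["Color"]["DominantColors"]),
        # dict.fromkeys keeps first-seen order while removing duplicates
        "Tags": list(dict.fromkeys(merged)),
        "Captions": [c["Text"] for c in analysis["Description"]["Captions"]],
    }
```

Fed the example response from earlier, this yields the same Colors, Tags, and Captions arrays as the search document above, since every Description tag already appears in the main Tags array.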

And that's all the processing I need to do! Lastly I need to return some album info to the front-end so the front-end can display the user's albums.

Spotify albums returned by the process step

Now I need to enable searching on the front-end. When the user types in a search query like "white" or "outdoor" or "Donald Glover wearing a pink and white checkered shirt", the service should use the search API to return a list of results from their Spotify library that match that search query. This presents a problem since the album items in Cosmos DB don't have any user-identifying information. Neither Cosmos DB nor the search service knows which items belong to which user. I did, however, return the user's albums to the front-end. I can use that information to filter down the list of search results to just those in the user's library. I actually pass the list of the user's album ids to the service so I don't have to pass potentially hundreds of search results to the front-end to be filtered down. I don't, however, use the built-in search service filter. The Azure documentation specifically says this:

One of the limits on a filter expression is the maximum size limit of the request. The entire request, inclusive of the filter, can be a maximum of 16 MB for POST, or 8 KB for GET. There is also a limit on the number of clauses in your filter expression. A good rule of thumb is that if you have hundreds of clauses, you are at risk of running into the limit. We recommend designing your application in such a way that it does not generate filters of unbounded size.

Considering I currently have 418 albums in my Spotify library, I probably wouldn't be able to use the search service filter, so instead I filter the results myself in the service. I only return the album URIs because the front-end already has the information needed to display the albums and just needs to know which albums to display after a search.
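The service-side filtering can be sketched in a few lines. This is an illustrative Python version with hypothetical names (`filter_to_library`, `search_hits`); it assumes each hit is a search document keyed by `AlbumId`, as in the example document earlier.

```python
from typing import Iterable


def filter_to_library(search_hits: Iterable[dict],
                      user_album_ids: Iterable[str]) -> list[str]:
    """Keep only hits from the user's library and return their album URIs."""
    allowed = set(user_album_ids)  # set membership keeps each check O(1)
    return [
        f"spotify:album:{hit['AlbumId']}"
        for hit in search_hits  # preserves the ranking order of the hits
        if hit["AlbumId"] in allowed
    ]
```

Filtering after the query keeps the search request small no matter how large the library is, at the cost of fetching hits that are then thrown away.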

Search results for "Donald Glover wearing a pink and white checkered shirt"

After building a crude UI I'm done, right? Wrong. I wanted some safeguards in place to try to avoid abuse of my service APIs. The service APIs pretty much only need a Spotify access token in order to work. Anyone could request a Spotify access token externally and make direct calls to my service APIs. To avoid this I introduced a state value. Similar to how state is used in OAuth 2.0 to prevent cross-site request forgery, I use state to ensure the client calling my service is a client I authorized. After getting the access token from Spotify, but before returning to the client, I save a small item in Cosmos DB which contains a random state string along with the user's Spotify id. Next, when a client calls the process or search API, they're expected to pass their Spotify access token along with the state string I gave them. If their Spotify access token is invalid, that will fail authentication with Spotify. If their access token is valid but their state string doesn't exist in Cosmos DB, that will fail authentication. Finally, if both their access token and state string are valid but the user id stored with the state string doesn't match the user id of the Spotify access token, retrieved with Spotify's Get Current User's Profile API, authentication will fail. This all but ensures only authorized clients can call the process and search APIs.
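The three checks can be sketched as below. This is a simplified Python illustration with hypothetical names: `state_store` is a plain dictionary standing in for the Cosmos DB items, and `token_user_id` is assumed to be the id returned by Spotify's Get Current User's Profile API, or `None` if Spotify rejected the access token.

```python
from __future__ import annotations

import secrets


def issue_state(user_id: str, state_store: dict[str, str]) -> str:
    """Create a random state string and remember which user it belongs to."""
    state = secrets.token_urlsafe(32)  # unguessable, URL-safe random string
    state_store[state] = user_id
    return state


def is_authorized(state: str, token_user_id: str | None,
                  state_store: dict[str, str]) -> bool:
    """Apply the three checks described above, in order."""
    if token_user_id is None:  # 1. Spotify rejected the access token
        return False
    stored_user_id = state_store.get(state)
    if stored_user_id is None:  # 2. the service never issued this state
        return False
    return stored_user_id == token_user_id  # 3. state belongs to this user
```

A request then passes only if the state was issued by the service and is paired with the same Spotify user the access token resolves to.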

It's also mobile friendly. This was taken on a real iPhone!

After a little Bootstrap and simple JS I'm actually done. The website is available here if you want to try it out. It can take a few minutes to process your Spotify library for the first time, but once it's processed, subsequent uses should only take a few seconds. You can also tap on an album image to open it directly in the Spotify app. Yep, those Spotify URIs are actually URIs. Now when I forget the name of Somersault by Beach Fossils I can use this website to search for "white" and find the album among the 204 albums in my 418 album Spotify library...

Maybe I should just remember the name of albums I want to listen to.
