Amazon with it’s Amazon Web Service (AWS) is pretty cool. It gives you access to remote machine which you don’t have to maintain. Actually you don’t have to do anything other than use it. All machines come in different flavours, but what tastes better than free? Granted that it’s extremely limited, but surely we can squeeze something out of it. Right?
AWS instances, i.e. remote machines, differ in the amount of RAM, disc space, operating system, whether they have GPU access and so on. As you can expect free tier instance is pretty low on all measure values. To be more precise free tier instance is of t2.micro type, which is a general purpose burstable instance with a single CPU, 1 GiB memory and EBS data storage (default 4Gb storage).
What is this good for? Depending on the needs, this might be good for almost anything that doesn’t require whatever these instances are lacking. (Did I help?) Obviously. So it’s not so good for heavy computations, training machine learning models or storing data. First of all, it’s better to use for these some other services like S3, DynamoDB, Lex or general machine learning. However, in case of specific requirements, it’s always better just to rent(?) more powerful instance.
These cheap instances, in my option, are very good for few tasks. The main one is web scrapping. This is tedious task that requires small CPU bandwidth, but constant access to the internet. Moreover, we don’t really want to make many calls in small time period so there needs to be a delay between each download. That’s either because we would like to avoid being detect as a bot, or for simply politeness to the owner of the server (not clogging bandwidth).
Internet is full of examples of scrappers for different type of data. I’m adding my own to the collection with r-u-listening project. The core of the project is to allow for users to find similar music to their input. It is a bit more than recommender, but more on this project probably in the future. The scraper itself is more in two parts, i.e. crawler.py and scraper.py. The database that I’m using is FreeMusicArchive.org, which goes with slogan “It’s not just free music; it’s good music”. I do recommend it and once I have something valuable I’d like to share it with them.
Unfortunately these instances don’t come with big default memory and storage. By default they have only 4 Gb storage, which when downloading mp3 tracks will be enough for about 800 tracks (assuming about 5 Mb per track). Again, as always, it depends on the task, but for machine learning algorithms we go with The more, the merrier.
As mentioned before, free tier instances allow up to 32 Gb. To do so go to EC2 service in your AWS console. In the options tab (left side) find Elastic Block Store (EBS) and select Volumes. Then select your instance and Actions, and Modify Volume. Simple, right? In all honesty, like many things in the AWS.
I’ve been using AWS for a while. Even finished AWS general course, its essentials and 3 day onsite workshop on Architecting on AWS. All is pretty simple and consistent. I like it.