Exist – the personal dashboard

Let me share something I found on the internet: a service called Exist.io. They aren’t paying me for this ad, but I’m writing it anyway because the service reignited my push for the quantified self.

The premise is simple. Exist.io asks for your permission to access certain services and, on your behalf, collects, aggregates and presents the data as nice graphs. It’s a personal dashboard with metrics, making it easier to observe progress. They also provide means for adding your own metrics, such as a mood scale and a very limited tagging system, though based on their transparent roadmap that might change one day.

Yes, the privacy issue is huge, so you simply have to trust that they won’t abuse the permissions. However, based on their privacy policy and business model (monthly subscription), I’m less concerned about them than about most other services. Besides, not knowing where exactly these things go or how they are protected, I’m not going to give them access to sensitive data such as email or location. Feel free to steal my sleep time, coffee intake or step count (I might even publish these one day).

The dashboard works for me. It doesn’t give many meaningful insights, but I like that everything is in a single place and I can export it from there to my personal storage. I can also tell what is missing and how I’d like to collect the other information. It even made me start working on my own tools for collecting productivity time and forecasting mood changes and “life cycles”. It’s a great source of personal multivariate time series (clickbait), understanding of which will literally change your life.

Overall, a big thumbs up. If you’d like to know more about yourself, and especially to have solid numbers in a single place, go for it. And if you think about yourself in numbers, let me know. I’m planning to work on a few projects that I’d like to release to a broader audience in the “near” future.

Referral link that gives me $2: exist.io

Barren pages get some words raining

There has been a surprising shift in my approach to writing. Any external request to express or clarify something in written form used to evoke a passionate hatred. I could feel the amount of time that was going to be wasted for no good reason whatsoever. What others might complete in 5 minutes would take me more than an hour. The reason was that many people considered my expression style “weird” or “unusual”, and would often ask for clarifications phrased the way they wanted, just to make sure we were on the same page. Talking is easier because you quickly clarify these concerns and carry on. Writing, however, is a double pain. I’d spend hours on rewriting and polishing a couple of sentences, trying to clarify what might potentially be unclear, and then I’d spend a couple more on replying to feedback, which typically meant “unclear.” It’s a frustration spiral.

At times when I’d panic and plead for help, the most common advice was to read more books. Probably great advice, but it doesn’t work for me. As an (overly) active person, I have trouble sitting down and reading. My most productive reading happens when I’m walking in a dull room with white noise and without anyone around. As you might expect, this isn’t always possible, so my reading is slow. How can you write clearly if you don’t know the rules, and how do you know the rules if school grammar is different from common folks’ expectations?

The solution isn’t clear, but I think I’m approaching it. Not only is starting a new paragraph no longer a problem, but I’m also spending less time polishing expressed thoughts. A year ago I’d quickly run away from any writing task, but now I’m scribbling a couple of pages every day. Some pages are for myself: personal notes and thoughts to keep my life organized and properly archived. Others are for work. Even though, at the moment, I’m a Software Development Engineer, whom some might imagine solely writing code, I update documents daily, be that documentation, design explanations or idea pitches.

What’s the change? To be honest, I’m not sure and it might be anything. But, if I were to highlight a few potential reasons, three that come to mind are:

  1. “The Elements of Style” by Strunk & White. It took me a while to start reading this book, but the delay was mainly because “it is a book” and books are scary as they typically have more than 50 pages (too much). But, depending on the edition, it’s about 30 pages long and concisely explains the dos and don’ts of writing. Logical rules to follow; a simple guideline; a blessing!
  2. Grammarly. A service and a web application acting as a spell-checker on steroids. It detects grammar misuse, unclear sentences and word repetition, and has a built-in thesaurus. Each detection comes with a brief explanation of why it was triggered and how to solve the issue. Given the broad demand and lack of alternatives, there’s definitely more to come from Grammarly.
  3. Lower expectations. The quality bar for writing outside academia is significantly lower. This isn’t an insult. It’s actually great not to stress out about imperfect sentences and slightly ambiguous words. Words don’t carry that much liability, so people are encouraged to make mistakes that can be cleaned up in the process. The focus is more on the story progression and less on defending thoughts. A lower threshold allows realistically learning from experience.

This blog as a whole was intended to be scientific. Even when I wrote that it’d be less ‘scientific’ than before, I meant that it’d be less ‘precise’ in its choice of words. Given that I have already left academia and my ties with the scientific side are weakening, the intention needs to change. Expect more posts on general topics, but still within an engineering/technology theme. I’m still considering a separate blog dedicated to philosophy, so, who knows?

Project template

With more time on my hands, I started to retrospect on my previous projects and attempt to extract characteristics that lead either to success or failure. As one might expect, these characteristics change over time. Currently, I’m better off dividing work into chunks and dealing with one at a time, but a while back it was easier to just get to work and the immediate task would pop out. Going forward, my habits and motivation will definitely change again, so it’s difficult to impose a strict ‘successful project’ template. But we aren’t setting things in stone, and if needed we’ll change them.

Analysing notes from the last 10 years, I noticed there were a few common parts which, if defined properly, allowed me to succeed in a project or fail gracefully. I say “properly” because sometimes I tried to cheat myself, expecting that future me might forget and be fooled by the writing. However, the only thing I managed to forget was that the “forgetting” part doesn’t work. Even now I have a vivid memory of changing the expectations for each project, though I had to think a bit about why I did that. Hopefully I’m wiser now.

Now, whenever I’m thinking about starting a new project it goes through a few phases.

Project phases

Phase 0

Quickly write a sentence or two about the project. It’s usually a sentence on what the idea is and how I came to think about it. This typically allows me to let the thought go for a while and continue with whatever I was doing before. For such quick notes I’m using Google Keep (two taps to input a note), though I’m looking to escape Google. Once it’s written, I can let it mature for some time, typically until the next evening.

Phase 1

Note: If the project is super short or it doesn’t make sense to start it in a few weeks, I’ll skip over this phase or delay it.

Typically a day’s delay and a full night’s rest allow me to validate whether it makes sense to devote some time to the idea. In this phase I’ll open a text editor and try to write at least 100 words on how I see the idea growing and what it would mean to do it. Writing it out prevents me from cheating myself with the optimism of the moment. It’s a bit tedious to write about something that sounds awesome in my head, but the logic is: if I can’t be bothered to spend 5 minutes writing about the project, it’s unlikely I’m going to want to work on it.

Phase 2

Create a document to monitor the progress of the project and populate the top with the following template:

{Project Name}

Success: {The definition of successfully finished project.}
The **Why** statement: {Why am I doing this project?}
Expected length:
    Coarse: {Long/mid/short term}
    Fine: {XX weeks/months}
Health check: {Define period of time to check on the project.}

The goal is to check these properties every {health check} period and make sure they’re still valid. This won’t be successful unless the table is easy to read and short. The rest of the document should be devoted to progress updates. Depending on the project, this can be in the form of a timeline with timestamps or a living document. I’ve started using Notion (notion.so) for this; it’s a collaborative, realtime doc tool with plenty of keyboard shortcuts and markdown enabled.

Phase 3

This phase is about execution. Every {health check}, review the priority of the project and set aside some time for it. If it happens that for a few {health checks} you weren’t able to do anything related, go ahead and update the project. I like to keep track of ongoing projects on a special Trello board and create individual tasks on a “daily” board. I could’ve used Notion for this as well, but I find Trello awesome for day-to-day tasks, and it’s easy to accidentally check what’s going on when you’re a click away with a constant reminder.

It’s just a phase

There’s no point in cheating yourself that you’re going to do something when you don’t want to. Almost everyone I’ve talked to has experienced the inability to do anything, caused by mental overload from things that had to be done. I’m assuming that the driver for these projects wasn’t necessity but the choice to do something awesome in your spare time. If that’s the case, feel free to let some go. That’s difficult due to the sunk cost fallacy, but the goal of this template is to help rationalise the decision. Our values and motivation change, and so should our priorities. (It probably won’t be long before I feel stupid for ever writing this blog post.)

Two-way doors decisions

Working for a tech behemoth corp is significantly different from what I expected when joining. There is plenty of corpo bullshit, “vapor-ware” and ass-lickers who simply want to game the system to climb the ladder. But, obviously, that’s not all there is. I can’t tell how it is everywhere, but where I am, I must admit that I’m surrounded mainly by competent people and well-matured ideas.

One of the ideas/approaches we strive for is the two-way door decision. It’s a literal analogy to a two-way door which, in contrast to a one-way door, allows you to go through and, if you don’t like what you see, go back.

It’s obvious, isn’t it? When was the last time you saw a one-way door? Those things typically don’t make much sense, so you don’t see them around. Why would you want to go somewhere you don’t know much about and not be able to get back? The concept of doors and the actions on them (go, check, return) is easy to understand, but it can be extended to any activity.

We should strive to create circumstances where we can make a decision and, if we don’t like the outcome, come back. As a down-to-earth example, this is often what retailers allow us to do: we can buy things and, if we don’t like them, return them. Examples in software development include using feature flags to quickly turn off new features when they don’t do well, or a rollback mechanism for quickly reverting broken builds to the previous state.

A “backup plan” often comes up as a similar concept. To me, a backup is more like a pair of one-way doors. Going through the doors lets you run away from a place, but where you end up doesn’t have to be exactly the place you came from. This is not “worse”; it’s a different concept, and in some situations it will be the better one. Consider situations where you know that where you are is bad and you simply want to run away from it.


One-way doors: A -> B
Two-way doors: A -> B ( -> A)
Backup plan: A -> B ( -> C)

Toggling academia status to halted

There was a significant update to my title. Since the end of November, I am officially a PhD. The relief is immense. Obviously, life goes on and nothing has significantly changed on the outside, but I can see that my approach to things has lightened up and the “Yes, can do” attitude has returned. I’m open to new projects and ideas.

Surprisingly enough, just before submitting the final version, I started (again?) to recognise the greater contribution that the work has, and might have, given that the Machine Learning community is again gradually incorporating model-based approaches and going smaller on distance (calculus). Such progress opens up opportunities to apply my work to a broader area of interest.

When will this finish…

For the past few years, my life has been on hold. Yes, I go to work and do something there, but the majority of my time I still spend on the PhD. It’s such an existential trap. It’s been close to two years of trying to impress a single person who doesn’t really care. It’s been close to four years of trying to improve an idea I had and thought might work, because the previous 3 years gave no results.

When I started the PhD I was motivated, interested in everything and shaking with excitement that I’d be pushing humanity forward. Now, I just want to do the minimum required. In hindsight, I’ve wasted my life. Nothing good is coming from this. Hopefully, that is a “yet”. December is in or out and, at this stage, I don’t really care.

“Type ahead search” in Ubuntu 18.04+

Defaults in Ubuntu nautilus

In Ubuntu 18.04, typing in the Nautilus file manager triggers a deep search through all of its subdirectories. This is potentially a powerful and convenient tool but, unfortunately, not for me, as it doesn’t fit my workflow. I’m not much of a GUI fan and only use a graphical file manager when I need to do some quick drag-and-drops on bulk files. Typically, if I’m using Nautilus, I’m already in the directory where I need to be and just want to quickly jump to a file/dir by name. In short: yes, bring back the ye olde stuff.

Unfortunately, it seems that Nautilus removed the type-ahead feature a long time ago (2013), and it was Ubuntu that kept patching it back for its Unity. With 18.04 we’re back to Gnome, so no patches are, and won’t be(?), provided. Maybe at some point, when I have an SSD and Ubuntu improves indexing, I’ll want to use the deep search, but for now, let’s revert this.

Reverting options

To revert to the type-ahead search, there are two options:

  1. Use a different file manager. Apparently, Nemo and Caja are reasonable alternatives.
  2. Install a forked and improved version of Nautilus. Steps below are based on OMGUbuntu and AskUbuntu.


Installing the forked version means adding a PPA with the modified code. The PPA addition doesn’t modify the default one, so it’s easily revertible.

  • Add PPA: sudo add-apt-repository ppa:lubomir-brindza/nautilus-typeahead
  • Install/override nautilus: sudo apt full-upgrade
  • Restart Nautilus: nautilus -q

To remove the modified version, simply purge the PPA and update apt:

  • Make sure you have ppa purger: sudo apt install ppa-purge
  • Purge the ppa: sudo ppa-purge ppa:lubomir-brindza/nautilus-typeahead
  • Install/override nautilus: sudo apt full-upgrade
  • Restart Nautilus: nautilus -q

There you go. The world is better by being the same as before.

AWS EMR with medium-sized input and large output in AWS S3

The last post about AWS EMR and S3 resulted in a few people messaging me directly. To help others, let me add something about how I approach a specific class of problems.

As mentioned previously, when dealing with large amounts of data, some trade-offs need to be made. There isn’t a solution that fits all computation problems (obviously), but that doesn’t mean there aren’t better starting points.

In the case when the input data is relatively small, say less than a terabyte, and the processing is highly parallelizable and produces larger output, it helps to do everything locally on the cluster. If the input data is in S3, or we want to store the output to S3, we can copy the data with s3-dist-cp. It’s an extended version of dist-cp that understands AWS S3, so it’s rather safe. All EMR instances have it installed by default, making it easy to either execute it through a shell after ssh-ing onto the master or, preferably, to execute it as an EMR job step.

It’s reliable enough that, for a given set of problems, it was better to write a quick wrapper which converted a single step

spark-submit s3://bucket/path/to/script.py --src=s3://bucket/input/data --dest=s3://bucket/output/data

into three steps, download-process-upload, i.e.

s3-dist-cp --src=s3://bucket/input/data --dest=/hadoop/input/data
spark-submit s3://bucket/path/to/script.py --src=/hadoop/input/data --dest=/hadoop/output/data
s3-dist-cp --src=/hadoop/output/data --dest=s3://bucket/output/data

This is great when we have a large number of executors, definitely more than 200. But even then, experiment. Sometimes it’s better to reduce the number of executors and increase their workload.
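For illustration, here is a minimal sketch of what such a wrapper could look like (the function and step names are hypothetical, not my actual tool). It expands one spark-submit invocation into the three download-process-upload steps, in the shape that boto3’s add_job_flow_steps expects:

```python
def wrap_with_s3_dist_cp(script, src, dest,
                         local_in="/hadoop/input/data",
                         local_out="/hadoop/output/data"):
    """Expand one spark-submit step into download-process-upload EMR steps."""
    def step(name, args):
        # command-runner.jar runs an arbitrary command as an EMR step
        return {"Name": name,
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {"Jar": "command-runner.jar", "Args": args}}

    return [
        step("download", ["s3-dist-cp", "--src=" + src, "--dest=" + local_in]),
        step("process", ["spark-submit", script,
                         "--src=" + local_in, "--dest=" + local_out]),
        step("upload", ["s3-dist-cp", "--src=" + local_out, "--dest=" + dest]),
    ]

steps = wrap_with_s3_dist_cp("s3://bucket/path/to/script.py",
                             "s3://bucket/input/data",
                             "s3://bucket/output/data")
```

The returned list can then be handed to boto3’s EMR client via add_job_flow_steps, so the whole pipeline still runs as regular cluster steps.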

Tiny bit about AWS EMR on big data

One of the recent projects I’ve worked on involved processing billions of rows stored in AWS S3, terabytes of data in size. That was the biggest dataset I’ve worked with so far and, even though I don’t like the term, it broke through the big data barrier. Just handling the data with popular toolkits, such as scikit-learn or MXNet, created so many problems that it was easier to create our own solutions. But the biggest surprises came from the places least expected.

The first surprise came from AWS EMR. With such an amount of data, there is no other way than to use a large computation cluster, and EMR is quite easy to set up. The Web UI is rather nicely explained, and once you know what you want, you can use the CLI or SDK to do it programmatically. Cool, right? So what are the problems? There are plenty of things that simply don’t work as they should, or that work differently without it being mentioned anywhere. The number of one-click install applications that EMR supports is rather big, and you can see some explanation for each of them. However, nowhere is it mentioned that EMR Spark is a fork of Apache Spark and thus slightly different. It comes with different default settings, so the best practices for Apache Spark aren’t the same, and searching for EMR Spark just doesn’t return anything. It took me a while to find out that accessing S3 should be done through s3://, or possibly through s3n://, though that one is deprecated and slow. The docs also state that you shouldn’t use s3a://, which, in contrast, is the recommended way with Apache Spark. Oh, and while I’m on the subject of S3…

Another big surprise came from AWS S3 itself. Considering how global and popular the service is, I was surprised to learn that there are actual limitations on the connection. OK, I wasn’t surprised that there are any, but I thought they were much higher. According to the AWS S3 documentation on Request Rate and Performance Considerations, one shouldn’t exceed 100 PUT/LIST/DELETE or 300 GET requests per second. This is averaged over time, so occasional bursts are OK, but do it too often and S3 is going to throttle you. Why does this matter? When you are using Spark to save directly to S3, e.g.

    val orders = sparkContext.parallelize(1 to 1000000)
    orders.saveAsTextFile("s3://bucket/output/orders")

and you are working with hundreds of executors (processes) on small tasks (batches), then you are going to query S3 a lot. Moreover, by default Spark saves output to a temporary directory and, once the save is done, renames and moves everything. On a file system that’s an almost instantaneous operation, but on object storage such as S3 it is another big read and write, and it takes a significant amount of time. Yes, this can and should be avoided with proper configuration, or by repartitioning to a smaller number of partitions beforehand, but learning about it the hard way is not the nicest experience. Moreover, one would expect the integration between EMR and S3 to be smoother.
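When S3 does throttle, it replies with 503 Slow Down and expects the client to back off. The AWS SDKs handle this retry for you, but as a sketch of what is going on under the hood, here is the usual capped exponential backoff with full jitter (all constants here are illustrative, not AWS defaults):

```python
import random

def backoff_delays(retries, base=0.1, cap=20.0, seed=None):
    """Jittered, exponentially growing sleep times for retrying throttled requests."""
    rng = random.Random(seed)
    delays = []
    for attempt in range(retries):
        # "full jitter": pick uniformly between 0 and the capped exponential bound
        bound = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0, bound))
    return delays
```

The jitter matters: hundreds of executors retrying on the same fixed schedule would hit S3 in synchronized waves and get throttled all over again.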

Having said all of that, I need to highlight that working with Spark and Hadoop on AWS EMR is rather simple. It takes a long time to learn the nuances and the proper configuration per task, but once that’s done, life only gets better. One of the features I’m looking forward to is better integration with MXNet. Both MXNet and Tensorflow allow for nice utilization of CPU and GPU clusters, so it should be only a matter of time before EMR supports that out of the box, right? Hopefully. In the meantime, Spark + Hadoop + Ganglia + Zeppelin seems to be all I need for big data processing pipelines.


Although learning and book knowledge are the best, my personal relationship with reading is not the friendliest. Staying focused on a text is a huge struggle, and I often need to re-read sentences to actually take them in. That’s why I sometimes use text-to-speech (TTS) software or services.

A few years ago I discovered Ivona, a text-to-speech application that was far superior to any other TTS solution. It was able to quickly read out loud (and clearly) the text from my clipboard. Not only was it better than the others, but it also supported Polish, my language. Even though the default software wasn’t useful for my use cases (scientific papers have unusual formatting), it wasn’t that difficult to write a wrapper and GUI around Ivona. Unfortunately, it’s no longer supported and one cannot download the offline version.

Currently, Ivona is owned by Amazon, and its voices are accessible through the AWS Polly service. It’s a relatively cheap service, but one still has to have an internet connection, and it doesn’t come with any GUI. At least not officially.

I’ve written an application to use AWS Polly. It’s a simple graphical interface with some formatting options for the text, but it does its job. The AWS Polly GUI is accessible from my GitHub page. It runs on Python 3 with PyQt5.
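The core of such a tool is surprisingly little code. Below is a minimal sketch, not the app’s actual source: the chunking helper, the 3000-character limit, the voice and the file name are my assumptions (Polly caps the text size per request, and Jacek is one of the Polish voices), with the synthesis going through boto3’s synthesize_speech:

```python
def chunk_text(text, limit=3000):
    """Greedily pack sentences into pieces below Polly's per-request size cap."""
    sentences = [s + ". " for s in text.split(". ") if s]
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) > limit:
            chunks.append(current)
            current = ""
        current += s
    if current:
        chunks.append(current)
    return chunks

def speak_to_file(text, voice="Jacek", out_path="speech.mp3"):
    """Synthesize text chunk by chunk and concatenate the MP3 audio."""
    import boto3  # requires AWS credentials configured locally
    polly = boto3.client("polly")
    with open(out_path, "wb") as f:
        for chunk in chunk_text(text):
            resp = polly.synthesize_speech(Text=chunk, VoiceId=voice,
                                           OutputFormat="mp3")
            f.write(resp["AudioStream"].read())
```

A single sentence longer than the limit would still slip through this naive splitter; handling the odd formatting of papers is exactly where the real app does more work.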

Features are updated as needed, so if something might be helpful to you, feel free to contact me or create an issue ticket on the repository. I’m using this for my personal work, so I’m not planning to leave it on the side.