Stop reading and solve a real world ML problem

One of the best ways of learning comes from solving a real world problem. A machine learning colleague suggested I solve a real world problem that I was facing using machine learning. My mind was drawn to a problem on the road where I live. Many drivers drive too fast down the road in the evening. I wanted to raise awareness to the police about how fast these vehicles were going. 

I created a github repo with the work. Please take a look, and https://github.com/amplifiedengineering/opencv-radar. The following post is about the process I did to come to that solution.

My first approach was to find a LIDAR sensing device to measure the speed of the vehicles, and capture the license plate information through a camera. As I began researching LIDAR devices, I realized some of the less expensive models were custom built for a specific purpose, for example this devices’ spec sheet shows it does a single reading which is built for measuring the depth for drones to avoid ground collisions. That wouldn’t help me in my tracking, where I need to be able to measure multiple targets potentially (two cars going the same way, or two cars going different ways). 

As I was describing this to a friend who is into robotics, he suggested a different approach. Why not measure the speed using video? I liked this approach, as I could record video from my iPhone, without buying expensive single use hardware, and then run the calculation using software. As I started thinking about how to solve the speed calculation, it seemed straightforward: have a set of lines on the video that mark a well known distance. Then calculate the speed based upon the number of frames it takes for the vehicle to transit between the two lines. 

Speed (miles per hour): miles (distance in feet / distance in feet of a mile) by hour (total # of frames * frames per seconds * ( 1 / seconds per hour )

To be specific, I traded off real-time video processing for using post-processing. So, this approach wouldn’t work for all applications, but I was more interested in grabbing a video file from my phone, doing post processing and finding how many people were speeding. This way, I didn’t have to worry about the frame rate causing a queue to develop in the stream (which can be solved by sampling frames rather than processing each frame).

The first problem was detecting where the vehicle was in the image. I knew about the openCV library, so I started down a path on exploring this. I used a Haar classifier to detect the cars and experimented with multiple different hyperparameters to tune the detection of the cars and trade of the # of frames per second I could process using detectMultiScale API. Some of the key trade-offs here were the size of the video frame to be processed (the smaller the faster the processing, but the less ability to detect), the scaleFactor (API), and the minNeighbors (API). I played around with how much I would resize the greyscale image of the video frame before passing it into the detectMultiScale API. It was working “ok”, but I noticed that there was a bit more choppiness (jumpiness) in the object detection borders, so I looked for a way to improve it. 

I found this blog post informative. It uses Haar classifiers with background averaging to detect the differences in pixels and then creates bounding boxes on those detected contours differences from the background. This again improved the performance slightly. However, I was still seeing very unstable bounding boxes around the cars, which left me wanting a more stable detection mechanism.

Finally, as I was talking with another principal level developer about an unrelated topic, he was sharing about some work which he had done using OpenCV and Yolo. I decided to look up Yolo. I was impressed with the improved stability in object detection and similar performance. At first, I was a bit disappointed that the Yolo architecture requires a color image, I was previously using greyscale (with less channels used in inference I assumed it would be faster). After that I discovered that the architecture would also do all of the resizing for you when you did inference, and that the previous resizing I was doing was superfluous. Finally, I modified from using the v3 predict function to find the objects to the v8 track function

Figure 1: setup screen to verify the start / end lines for video

Figure 2: tracking cars using the YoloV8 track API function, with tracking trails

Hope you enjoyed the ride. A few potential improvements which I’m thinking of now, detecting the speed based upon well known sizes and distance (specifically I could use license plate size), and ability to collate together multiple views to tag a higher resolution view of the license plate.

Pragmatic & predictable notebook using Python environments

Going through the second half of the course of fastai, I was running into problems running the notebooks. I am relatively new to Python package management, and so some of the very cryptic error messages baffled me on how to resolve. 

An example

When trying to run the notebook #9 (may have failed earlier) from part 2, I got errors around NoneType object not being iterable when getting the data back from the dataloader. I was trying to ensure that I had the correct versions of libraries installed, so I went to the terminal window and tried a pip3 install of pytorch libraries, but then that would give this error:

× This environment is externally managed
╰─> To install Python packages system-wide, try brew install
    xyz, where xyz is the package you are trying to
    install.
    
    If you wish to install a Python library that isn't in Homebrew,
    use a virtual environment:
    
    python3 -m venv path/to/venv
    source path/to/venv/bin/activate
    python3 -m pip install xyz

First, I realized after much searching on the internet that the above error message is a passive way (if you wish to install a library) of indicating that you should set up a separate Python environment where you can have specific libraries installed.

Second, I have installed some newer dependencies than what the fastai course expected (torch version needs to be 2.0.1 and numpy needs to be less than 2). So, I needed to have those legacy versions installed to run the fast.ai code. (example python3.12 is installed on my machine which is incompatible with torch 2.0.1).

So, I was wrestling with how I can specify those specific versions in the Jupyter runtime. Python environments help resolve this issue. The premise of Python environments is that you can independently manage all of the library versions for a particular environment without destabilizing other Python libraries you may need access to. You can think of it as a container which allows you to install specific versions that are only available within that container.

However, I had never worked with Python environments before, so I had to learn that. They are pretty easy to do (step 2 is an example below). I was able to isolate all of the package versions using the Python environment so that I could independently run the notebooks without additional problems. The steps below allowed me to do that.

figure 1: how to Python env allows you to target specific libraries

Solution

Steps to get a running local notebook (Macbook) for fastai/course22p2 in the Github repo:

  1. Get Python 3.10
    • brew install python@3.10
  2. Create a virtual environment which is python3.10 based
    • In the terminal window run
      • python3.10 -m venv .fastai2
      • source ~/.fastai2/bin/activate
    • Note: this creates a virtual environment from which you can install pre-reqs which could be incompatible with the currently installed default python version. For example, pytorch 2.0.1 is incompatible with python3.12.
  3. Install the pre-reqs (in terminal using Python env)
    • pip3 install -Uqq git+https://github.com/fastai/course22p2
      1. (Installs all of the miniai components created in earlier notebooks)
    • pip3 install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 
    • pip3 install matplotlib
    • pip3 install jupyter
    • pip3 install numpy==1.23
  4. Run jupyter (in terminal using Python env)
  5. Launch colab in browser
  6. Run the cells in the notebook, voila!

I hope this helps someone save time from figuring out these dependencies in the future.

Prioritize or Perish: The Ultimate Guide to Time Management

The greatest superpower anyone has in this day and age is disciplining their attention and use of time. Time management is the skill that differentiates people who achieve their goals from those who don’t. The key skills to learn for great time management are identifying, prioritizing and executing tasks. Since this particular post is targeted towards employees newly entering the job market, I will focus on the last two prioritizing and executing. Once you have mastered the prioritizing and executing skills, moving on to identifying and exploring new potential priorities will make you unstoppable. The process of identifying tasks will be covered in a subsequent post. 

First, we’ll start with the Eisenhower matrix. This matrix is a great tool to help you understand and prioritize what to work on. 

   Urgent   Not Urgent 
  Important   1 – do   2 – schedule 
  Not important   3 – delegate   4 – delete 
Figure 1: Eisenhower Matrix

Every task should fit into one of the four quadrants:

  1. Is it important and urgent? Do it.
  2. Is it important, but not urgent? Schedule time to complete it.
  3. Is it not important, but urgent? Delegate it.
  4. Is it not urgent and not important? Delete it (don’t do it)

Some excellent books which gave some great nuggets on time management: 

  1. Seven Habits of Highly Effective people
    • Habit #3: Putting first things first means organizing and executing around your most important priorities. 
  2. Getting Stuff Done
    • Write a list of specific things you want to accomplish to get it out of your head. E.g. Write next blogpost draft on time management skill. 
  3. Checklist Manifesto
    • Checklists provide discipline, and discipline makes success possible. 
  4. Deep work
    • “The key to developing a deep work habit is to move beyond good intentions and add routines and rituals to your working life designed to minimize the amount of your limited willpower necessary to transition into and maintain a state of unbroken concentration” 

Each of these books circle around a theme of discipline around writing down and prioritizing what you should be focused on. I have synthesized the advice into three key skills to master. 

These key skills are: 

  1. writing everything down which you need to do (I.e. organizing or identifying tasks) 
  2. prioritizing tasks  
  3. executing tasks in prioritized order 

Make a list, prioritize, and execute. Repeat. 

1) Writing everything down  

If you aim at nothing, you will hit it every time.

Zig Ziglar

Let’s take an example, first write down everything that you need to prioritize over the next week. Here’s an example list of my tasks: 

0. Finish draft of time management with focus on prioritizing and executing 

  1. Drop off Leaf for repairs at auto-body shop (Mon) 
  2. Drop off Dad @ airport (Mon) 
  3. Clean-up
    • Clean desk 
    • Vacuum desk area 
  4. Training
  5. Network
    • Go through list of previous contacts (Google sheet) and schedule three informational interviews (follow-up on potential introductions) 
  6. Prepare for interviews (obfuscated company names to variables to avoid exposure)
    • CompanyX – interview Tuesday 
    • CompanyY – interview Thursday  
    • CompanyZ – interview Friday
      1. Each of these contain subtasks around looking up people I’m meeting with exploring their business, preparing questions for the interviews, etc. 
  7. Call Rightway pharmacy (re: compounding prescription re-imbursement) 
  8. Check stock market once this week 
  9. Read through Twitter feed for 30 mins max 
  10. Engage with a couple of posts on LinkedIn for 30 mins max 
  11. < Personal family / friend related priorities > 

The most important step in my week is cataloging what things need to be done. When I was younger, and had less responsibilities, it was easier to juggle all of these tasks mentally. However, as I grew in responsibilities (more kids, employees, job responsibilities) there was an inflection point where I realized the lack of an organized list was hurting my impact. I would focus on whatever I wanted to at the moment, and sometimes that was the wrong thing. Focusing on the wrong thing is easier to do when you don’t have a list. Alternatively, I would prioritize interrupt driven requests without weighing it against my other weekly priorities. At the end of those weeks, I would look back retrospectively and wonder where all my time and energy had been spent, and what impact I had to show for it.  

2) Prioritizing tasks 

The second most important step when making the list is to prioritize it according to the Eisenhower matrix. Here’s an example of bucketizing my items:  

  • Urgent & important – items 0, 1, 2, 6 
  • Not urgent & important – items 4 & 5 
  • Not important & urgent – item 3 & 7 
  • Not important & not urgent – item 8-10 

Note: I added items 8-10 as an example of areas where I personally struggle with setting boundaries.  

Note 2: item 11 is expanded on the longer list, and this particular week had many additional family responsibilities that aren’t normally present. 

Of course, just because you can identify some items as delegate in the matrix, doesn’t mean you have the means or have the time or emotional overhead of delegating the task. For example, task 3 (clean-up) although it’s categorized as “not important” and “urgent”, my wife and kids wouldn’t necessarily take kindly to me asking them to clean up the mess I made at the desk. So, what do I do with those tasks? I call these filler tasks. I fold these filler tasks into my week when I’m feeling less motivated to do a more thought or focus intensive task. Completing some of these tasks and checking them off can significantly boost my motivation and help defeat procrastination.  

Having a deadline associated with tasks, will give you the proper focus to ensure that you avoid doing non-important tasks. For example, I know my dad’s flight leaves on Monday morning, so I schedule time for “drive dad to the airport” on Monday morning in my calendar. I’m sure that my dad appreciates that I focused on dropping him off at the airport on Monday, rather than waiting to bring him to the airport on Tuesday. Additionally, having this context will help you to order similar tasks. For example, the interviews with the companies (X, Y, and Z) were actually on Tuesday, Thursday, and Friday, so I needed to focus on preparing for X first, since it was on Tuesday. Getting that order incorrect would have left me unprepared for the interview. 

People think focusing is about saying yes to the thing you’ve got to focus on. But that’s not what it means at all. It means saying no to the hundred other good ideas that there are. You have to pick carefully. I’m actually as proud of the things we haven’t done as the things I have done. Innovation is saying no to 1,000 things.

Steve Jobs

3) Executing tasks in prioritized order 

It can be helpful to have a strict ordering of the tasks on your list.  This will allow you to focus on only the next highest priority item on the list and avoid getting to the end of the week without a critically important task done. I am less rigid about this strict ordering, just because I know how I work best. I like autonomy to choose what I’m working on next instead of having a schedule dictate what that will be. Some days I will wake up and I have to coax myself more into getting into the zone of work. This may mean that I focus on getting a couple of the filler tasks done as a warm up to the more difficult tasks. This is the exact opposite advice which Cal Newport gives in Deep Work, where he suggests putting the most difficult task first in the day. However, I know what works well for me, and sometimes that is a gradual easing into the Deep Work.  

It’s important to focus on the priorities of the tasks when choosing what to work on next. 

Allowing less (or not) important (urgent or not urgent) items to choke out your ability to focus on important items will stifle impact. Distractions impact everyone in this era where every app and device is vying for our attention. Some of the most successful people that I know are ruthless in training their focus on executing tasks in the important row (see Eisenhower matrix). A negative example, a newer employee Bob would spend a lot of his time on his cellphone texting with friends during the day. While this isn’t a problem for some people, sending a one-off text, or doing it during lunch time. Bob would become absorbed in these conversations and spend hours on his phone, texting with friends, and it was impacting his ability to prioritize and get his work done. I noticed this lack of progress when he shared during daily stand-up. In private I asked him to measure where he was spending his time. This insight led to an important break-through (I’m spending too much time on not important / not urgent tasks, i.e. texting friends) for the person in time management and impact. *  

* Note: – An insight I learned when doing performance optimization in software, you have to measure where the time is being spent with a profiler, I.e. you can’t guess. You can try to guess where the software is spending time, but you will often be wrong. As an example, at Microsoft, I found logging boilerplate in a large service codebase which I worked on was doing stack trace calls to determine in which file it was making the request to log the filename. The stack trace call in C# is expensive because it uses reflection, so I just added the file name logic to be statically built. I would never have guessed that it was the stack trace call that was eating up so much of the compute without measuring it! 

Today, there are many ways to observe what you’re working on. The simplest is setting a fifteen minute timer which goes off throughout your day. When the timer goes off, immediately write down what you’re doing (even if it’s not work related, e.g. searching for a product on Amazon), and then go back to whatever you were doing. After a few days of observation, you might observe a pattern, like Bob, who discovered he was spending a LOT of time on his phone. (There is now tooling support for this measurement: Toggl

There are various reasons (ignorance, lack of discerning priorities and impact of tasks, being interrupt driven) for folks having trouble with time management. However, knowing what to do, when it should be done, and in what order they should be done will help provide the structure and the motivation to say no (especially to things outside of the urgent / important bucket). Giving yourself the permission to say “no”, to good, but not the best things will help you to be more effective and intentional in your impact. 

FAQs:  

What happens when a new request comes in from your boss? 

Part of the power of having a prioritized list that you are working towards is allowing that list to help you say no. After putting the task through the Eisenhower matrix, you can take the new task and measure to see where it compares to your other items. If it’s being asked of you by a manager, it might be worthwhile to have a trade-off discussion with them about what you planned to work on that might be impacted by the new request. If it doesn’t make the list, consider adding it to your next week’s priority list, or better yet seeing if it can be deleted. 

What about recurring items? 

I try to put all recurring items on my calendar. This avoids me having to think about them, and gives me greater confidence when scheduling meetings, etc. If I have to take my daughter to/from soccer practice, I put those commitments as a recurring meeting into the calendar to help me remember to do them, and then also keep the list to a manageable length. 

How do I improve my discipline in creating, prioritizing and executing against a list? 

Start off slowly, you don’t have to do it perfectly and capture all the context the first time. Create a list with a few items, prioritize and execute. Repeat this regularly over the next two months. Revisit what is working well versus what is not.  

Easy (best) instructions for enabling local AI development on mac.

Setting up SSH tunnel from mac to PC

Colab has security measures to prevent you from putting in any IP address in the Local Connections Settings dialog. These instructions mention that you can use gcloud command to point to a different machine using local port forwarding. However, it doesn’t show you how to do this for a mac to window set up (and maybe you don’t have gcloud installed). So, if you have a Windows PC with a GPU that you would like to run Python notebooks using colab on your mac (macbook), simply follow these instructions below:

  • On Windows PC (Win 10 or 11)
  • On Mac
    • Create ssh key (if you don’t already have a SSH public key which you use)
      • ssh-keygen -t rsa -b 4096 -C “<emailaddress>”
      • Should generate a public / private key pair in ~/.ssh (id_rsa and id_rsa.pub), the latter file is the public key
  • On Windows PC install
    • Copy ~/.ssh/id_rsa.pub (from Mac) to %ProgramData%\ssh\adminstrators_approved_keys (on Windows)
    • Set up permissions for administrators_approved_keys file
      • icacls administrators_authorized_keys /inheritance:r
      • icacls administrators_authorized_keys /grant SYSTEM:`(F`)
      • icacls administrators_authorized_keys /grant BUILTIN\Administrators:`(F`)
    • Restart openssh server through services or command line (net stop sshd & net start sshd)
  • Startup Jupyter on Windows
    • Type jupyter lab in a command prompt
    • Copy localhost connection string with token (starts with https://localhost::8888/lab?token=<token>)
  • On Mac
    • Start SSH tunnel
      • ssh -L 8888:localhost:8888 <windowsAccount>@192.168.1.xxx
    • Start up colab instance in browser
      • Click on “connect to a local runtime”
      • Paste in localhost connection string from above
        • Pro tip: I use google docs to share context across machines.
  • Troubleshoot
    • ssh -v or ssh -v -v -v (connection issues)
    • nc -z localhost 8888 || echo “no tunnel open” (test for liveness)