How to find the part of the code that implements a given feature of a program

In this article, I intend to give all the tricks I know for figuring out which part of the code is in charge of a given behavior of a program. This assumes that you don't already know the codebase and don't know where to look. I'll go from what I consider the easiest to the hardest.

The tools I know

Looking for the answers in the logs

Assuming your software has logs and you can access them, they may simply state the name of the function executed, or the file name and line number.
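To illustrate what such a log can look like, here is a minimal Python sketch using the standard logging module. The logger name "demo_app" and the function export_deck are hypothetical, made up for this example; the point is only that the format string can ask for the file name, line number and function name of the call site.

```python
import io
import logging

# Collect the log output in memory so it can be inspected below.
log_stream = io.StringIO()
handler = logging.StreamHandler(log_stream)
# filename, lineno and funcName are standard LogRecord attributes.
handler.setFormatter(
    logging.Formatter("%(filename)s:%(lineno)d in %(funcName)s: %(message)s")
)

# "demo_app" is a hypothetical logger name.
logger = logging.getLogger("demo_app")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False


def export_deck():
    # Hypothetical feature whose implementation we would like to locate.
    logger.info("export completed")


export_deck()
print(log_stream.getvalue(), end="")
```

If the software you are studying is configured like this, the log line itself tells you which function and which line to open.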

Analyzing the running app

This assumes that:

  • you can compile the code,
  • you can run the code (on the computer which compiles, an emulator, a connected device) and
  • you have a tool that analyzes the current state of the application.

Such a tool can be a profiler. With a graphical program, it can be a tool that explodes the view into parts and lets you see what each part of the view is. For a text interface, it can be a debugger connected to a program running on another thread, showing the current stack trace.
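As a rough illustration of the profiler idea, here is a minimal Python sketch using the standard cProfile and pstats modules. The functions import_cards and parse_card are hypothetical stand-ins for "the action I trigger in the program"; in a real program you would not know their names in advance, and the report is precisely what reveals them.

```python
import cProfile
import io
import pstats


def parse_card(line):
    # Hypothetical helper, unknown to us before profiling.
    return line.split(";")


def import_cards(lines):
    # Hypothetical entry point of the feature we are investigating.
    return [parse_card(line) for line in lines]


profiler = cProfile.Profile()
profiler.enable()
# Trigger only the action we want to understand while profiling is on.
import_cards(["front;back", "question;answer"])
profiler.disable()

# Write the report to a string; each row names a function and where it is defined.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats()
report = buffer.getvalue()
print(report)
```

Enabling the profiler just before the action and disabling it right after keeps the report short, so the functions you are looking for are not buried under start-up code.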

If you don't know how to use such tools, it may be really useful for you to learn; the search will be much slower the first time, but you'll save an incredible amount of time later. However, those tools are sometimes really hard to set up: for example, if you run microservices, if you run code in multiple languages, or if you have separate applications dealing with background and foreground work. In AnkiDroid, the code I know best, the main software is in Kotlin, but part of the backend is written in Rust and connected through protobuf messages, and part of the UI is displayed using webviews; I plainly admit I would not know how to profile that backend and those webviews.

Looking for strings

Strings can appear in the UI or in the logs. You can look for a string, or at least a string template, that seems to be used mainly in the part of the software you are interested in. Then you can find where this string is used in the code. This may not be exactly the part of the code you were looking for: a lot of the time, the string will lead you to the view, while you may be more interested in a model or a controller (assuming the MVC pattern). However, it may be the case that the controller is in the same folder as the view, or that you can at least find what creates this view and which controllers and models are used with it.
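Your IDE or grep will do this search for you, but the idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the file names, the string "Synchronization complete", the resource key sync_done and the function onSyncFinished are hypothetical. The sketch also shows a common situation where the search takes two hops: the visible string lives in a resource file, and you then search for its key to reach the code.

```python
import tempfile
from pathlib import Path


def find_string(root, needle, suffixes=(".kt", ".xml", ".py")):
    """Return (path, line number, line) for every line under root containing needle."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((path, number, line.strip()))
    return hits


# Build a tiny hypothetical source tree to search in.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "strings.xml").write_text(
        '<string name="sync_done">Synchronization complete</string>\n',
        encoding="utf-8",
    )
    (root / "SyncController.kt").write_text(
        "fun onSyncFinished() {\n    showMessage(R.string.sync_done)\n}\n",
        encoding="utf-8",
    )
    # First hop: the text seen in the UI leads to the resource file.
    first_hop = [(p.name, n) for p, n, _ in find_string(root, "Synchronization complete")]
    # Second hop: the resource key leads to the code that displays it.
    second_hop = [(p.name, n) for p, n, _ in find_string(root, "sync_done")]

print(first_hop)
print(second_hop)
```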

Asking someone who knows the codebase

The easiest method is to ask someone who knows the codebase. In my experience, if you are able to ask a clear question, such as "what function is in charge of the action triggered by this button of that view", people are keen to answer. Indeed, if someone knows the answer, it's very easy to transmit: you just give a function name, class name, or file name and line number.

However, I discovered that this fails in two opposite cases. If the original author has resigned, or the code repository is not really active anymore, everybody has forgotten the codebase. Conversely, if the codebase is extremely active, it may be hard to figure out who the relevant person is. Worse, when I made a few contributions to the CPython documentation and to some of its tools, I found it quite difficult to find people who could help me understand the CPython source code, because most of the volunteers who spent time helping people were there to help people learn Python; it was hard to convince them that I was already proficient in Python and that I was not there to write code in Python but to write code FOR Python. (Actually, some of the documentation tooling is in Python, so I was also there to write code in Python, but that was not relevant.)

Fuzzy searching

Most modern IDEs let you do a fuzzy search on class, file, and function names. Most of the time, names in the code are similar to the names shown in the English version of the running software. So you can just search and see what you get. That may require exploring multiple search results, because it's not always easy to figure out which one is relevant.
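IDEs each have their own matching algorithm, which I don't claim to reproduce here. Still, the principle can be sketched in Python with the standard difflib module: you type the words you see in the UI and get back the identifiers that look most like them. The identifier list and the query below are hypothetical.

```python
import difflib


def fuzzy_find(query, identifiers, limit=3, cutoff=0.4):
    """Return the identifiers most similar to query, best match first, ignoring case."""
    by_lower = {name.lower(): name for name in identifiers}
    matches = difflib.get_close_matches(
        query.lower(), list(by_lower), n=limit, cutoff=cutoff
    )
    return [by_lower[match] for match in matches]


# Hypothetical class names, as they might be collected from a codebase.
names = ["DownloadManager", "ImportDialog", "ExportDialog", "SettingsScreen"]

# The words seen on screen, typed as-is.
results = fuzzy_find("export dialog", names)
print(results)
```

As with an IDE, several candidates can come back, and you still have to open them to see which one is relevant.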

Adding breakpoints to library functions that must be used

Usually, one debugs with a debugger; however, the debugger is almost useless if you don't know the codebase, as you can't put a breakpoint on code you haven't found yet. However, you can also add breakpoints to libraries used by the code you want to debug. Let's say, for example, that you know that the action you want to track down emits a very generic log such as "download completed". You can't look at all the places that log "download completed", because your "download" function is called everywhere. However, you can put a conditional breakpoint on the logging function, one that triggers only if the message contains the word "download"[1]. You deactivate all breakpoints until you are just about to perform the action you want to understand. Then you reactivate the breakpoints and run the software until the log occurs. You can then look at the stack trace. If you didn't find the function you were looking for in the stack trace, you may need to resume the run, because "download" may be logged multiple times. Eventually, you'll find the method you want.
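Here is a minimal Python sketch of this trick, assuming the program logs through the standard logging module. Instead of stopping in a debugger, it records the call stack each time a message contains the keyword, so that the example can run unattended; in an interactive session, you could call breakpoint() at the marked line instead. The logger name and the functions download, refresh_media and check_for_updates are hypothetical.

```python
import logging
import traceback

captured_stacks = []


class StackOnKeyword(logging.Filter):
    """Record the call stack whenever a log message contains a keyword."""

    def __init__(self, keyword):
        super().__init__()
        self.keyword = keyword

    def filter(self, record):
        if self.keyword in record.getMessage():
            # In an interactive session, breakpoint() could be called here instead.
            stack = traceback.extract_stack()
            captured_stacks.append([frame.name for frame in stack])
        return True  # never suppress the log itself


# "demo_download" is a hypothetical logger name.
logger = logging.getLogger("demo_download")
logger.setLevel(logging.INFO)
logger.addHandler(logging.NullHandler())
logger.propagate = False
logger.addFilter(StackOnKeyword("download"))


def download(url):
    # Generic function called from everywhere in the hypothetical program.
    logger.info("download completed")


def refresh_media():
    # The caller we are actually trying to discover.
    download("https://example.invalid/media")


def check_for_updates():
    logger.info("checking version")  # no keyword, so nothing is recorded


refresh_media()
check_for_updates()
print(captured_stacks[0])
```

The recorded stack lists the function names from the outermost call down to the filter itself, so the caller you were looking for appears just before the generic download function.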

Beware that in multi-threaded software, the function you are looking for may not be in the stack trace, because it may be running on a different thread. If you're lucky, that thread is waiting for the end of the download, so if you look at the other threads' stack frames, you'll find the function you are looking for.
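A debugger usually lets you switch between threads once the program is paused. Outside of a debugger, the same information can be obtained in Python with sys._current_frames(), as in the minimal sketch below. The thread name "sync-thread" and the function sync_collection are hypothetical; the sketch plays the lucky case described above, where the interesting thread is simply waiting.

```python
import sys
import threading
import traceback


def stacks_by_thread():
    """Map each live thread's name to the function names on its stack, outermost first."""
    names = {thread.ident: thread.name for thread in threading.enumerate()}
    return {
        names.get(ident, str(ident)): [f.name for f in traceback.extract_stack(frame)]
        for ident, frame in sys._current_frames().items()
    }


download_done = threading.Event()
worker_started = threading.Event()


def sync_collection():
    # Hypothetical function we are looking for: it waits for the download to end.
    worker_started.set()
    download_done.wait(timeout=5)


worker = threading.Thread(target=sync_collection, name="sync-thread")
worker.start()
worker_started.wait(timeout=5)

# Imagine we are paused here, on the thread that logged "download".
snapshot = stacks_by_thread()

download_done.set()
worker.join()
print(snapshot["sync-thread"])
```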

I'm certainly forgetting things

It's almost certain there are tricks I don't know. I'll be glad if you can alleviate my ignorance.

Note

[1] At worst, you modify the log function to add a code path that handles messages containing "download" specifically, and you put the breakpoint in this code path.
