Summer, Stats and sleep (I wish)

6 minute read


Summer’21 was probably the most challenging summer my nineteen-year-old self has faced. I will not go on about how I have become stronger now; well, I could say I have become more of myself. Staying in your room for 14 months, living with yourself every day, does that to a person. I am not complaining. I have been more privileged than most people; nothing “major” happened to me personally, except maybe utter chaos. Anyway, this post is not a sob story, although I feel like I could not talk about this summer without first addressing the elephant in the room: we collectively experienced a pandemic, and we are all exhausted. Now that we have said that, the summer was not all that bad. Let me tell you about it.

It was probably in March, when I was still 18, aimlessly getting through the day. I’ll be honest: I haven’t been a conventional “coder”; I practically hated it in high school. Syntax has always been annoying to me. Somewhere at the beginning of my sophomore year, I tried a bit of everything: stats, “data science”, a bit of Python, random bits of applied maths here and there. I’m still not someone who writes actually good code, but Python is relatively simple, or it has been to me; mostly, I’ve just used it as a tool to “get things done”, although I’m guessing algorithms are important.

Anyway, it was March, and I was looking through NumFOCUS organizations for Google Summer of Code. I just wanted to work on something where I knew what I was coding. It’s difficult to explain this, but I’ll give an example: say you had a cool graph theory course at college, then you go home and decide to do a project out of it, one that people can use, and it’s something that you wanted to study anyway. I know, that’s nothing special; everyone wants to do things they like, right? So let’s say I got lucky. I was interested in graph theory, quantum computing and statistics, so I decided to have a look at three NumFOCUS sub-organizations: PyMC3, QuTiP and NetworkX. I wish I could do two projects; I’m still considering contributing to NetworkX outside of GSoC. If you’ve ever worked with graphs in Python, you would already know about it!! The project I really liked was pedagogical notebooks on graph algorithms. Now this is something I find super cool: I think anything that creates an intersection between the application and the theory of mathematical stuff is great. I mean, I would love to learn about the travelling salesman problem and simultaneously see how to implement it in a good library with great visualizations, and actually tinker with all the steps. Jupyter notebooks are great for that, and it turns out this is a part of what I actually ended up doing.

So, sometime at the end of March I decided to work on PyMC3. I cannot thank the people working there enough; everyone was insanely patient and kind. It’s probably one of the nicest things about open source: people helping you learn stuff without any expectations, just everyone trying to figure stuff out and help. I think the first PR in PyMC3 that I worked on was generating a figure. It was a simple task, but I ended up spending two days learning about color blindness, accessible visualizations and random things about matplotlib I had no idea about. After this, I mostly worked on a few Jupyter notebooks in the pymc3 and pymc-examples repositories. Since I was talking about pedagogical notebooks: PyMC3 is a great example of that, not exactly pedagogical, but really helpful how-to guides. Ravin, my mentor for the GSoC project, and Oriol have been the most helpful people around. If you’re reading this, thanks a ton!!

I spent March and April making small contributions, smiling for 0.5s when a PR was merged, and reading about Bayesian statistics to be able to understand what exactly I am coding. I’m writing all of this wrapped in a bedsheet with a heating bag to deal with cramps, so excuse me if my flow of thoughts is absolutely incoherent. This isn’t some sort of “how to do GSoC” blog; I would like to think of it as a journal entry for an exciting summer. For me, community bonding started from the first day I emailed the PyMC3 mentors about the projects I was interested in. The wonderful thing about open source communities is that you can start from anywhere: if you want to, you can definitely help with something, and there are people who want to help you. So after pestering Ravin and Oriol to review my proposal multiple times, I ended up submitting one for Extending Time series in PyMC3. This project tries to add forecasting models like ARIMA. I’m really excited to make fresh tutorials out of ARIMA and its applications once I’m done implementing it in PyMC3. Oh, yes, a few cool things you can look out for:

  • ArviZ: it’s a library for exploratory analysis of Bayesian models. It makes really cool visualizations and uses xarray for storing InferenceData, which again was just another cool thing I learnt only this summer.
  • If you want to understand more about actually implementing Bayesian models, check out Osvaldo Martin’s book, “Bayesian Analysis with Python: Introduction to Statistical Modeling and Probabilistic Programming Using PyMC3 and ArviZ”.
  • There’s another book on Bayesian modeling in Python, which will be out eventually. All the best to Osvaldo A. Martin, Ravin Kumar and Junpeng Lao for that!!
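
For anyone wondering what an ARIMA-style model even looks like: the “AR” (autoregressive) part just says that today’s value is a scaled copy of yesterday’s value plus some noise. Below is a minimal sketch of that idea in plain Python. It is not PyMC3 and not Bayesian, it only covers the simplest AR(1) case, and the function names are made up for illustration; it simulates a series, estimates the coefficient by least squares and makes a few point forecasts.

```python
import random


def simulate_ar1(n, phi, sigma=1.0, seed=0):
    """Simulate n points of an AR(1) series: y[t] = phi * y[t-1] + noise."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0.0, sigma))
    return y


def estimate_phi(y):
    """Least-squares estimate of phi from consecutive (y[t-1], y[t]) pairs."""
    num = sum(prev * cur for prev, cur in zip(y[:-1], y[1:]))
    den = sum(prev * prev for prev in y[:-1])
    return num / den


def forecast(y, phi, steps):
    """Point forecasts: each step ahead multiplies the last value by phi."""
    preds = []
    last = y[-1]
    for _ in range(steps):
        last = phi * last
        preds.append(last)
    return preds


# Hypothetical toy data: the true coefficient 0.7 is an arbitrary choice.
series = simulate_ar1(n=2000, phi=0.7)
phi_hat = estimate_phi(series)
preds = forecast(series, phi_hat, steps=3)
print(f"estimated phi: {phi_hat:.3f}")
print("next three point forecasts:", preds)
```

A Bayesian version would put a prior on phi and the noise scale and return a whole posterior instead of a single number, which is the kind of thing a probabilistic programming library is built for.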

These days I’m trying to get a grip on PyMC3’s documentation and planning the stuff I need to implement. All of this goes alongside my main task in community bonding: dropping random ideas/questions on the community channels!! Everyone has been extremely welcoming, and it is great to be a part of this community. Now I’ll go back and read about stats and get some sleep, while you can have a look at my actual proposal here!!