Developing with @regmicmahesh

Structuring Infrastructure as Code with a Layered Approach

Har Har Mahadev! — Tue, 30 Jan 2024 06:55:14 GMT

One of the problems I face frequently when writing Infrastructure as Code is how to structure it. To answer this question, I think there are a few indicators that decide how we want to structure it.

Some resources change less often than others. For example, your VPC may not change for the lifespan of the project, while the application resources may change every day. How can I structure the code so that changes in the application have no relationship with changes in the VPC?
Changes in some resources are significant. Resources such as IAM Roles, Accounts, and Billing may bring down the whole system. How do we make sure they're properly encapsulated, so that the code responsible for the application is guaranteed to have only read-only access to the IAM Roles?

Layered Approach

Then, I read the fantastic write-up I've included in the references section of this blog. We can think differently. What if we sort every resource into layers based on two principles:

Rate of Change: The resources with a similar rate of change should go in the same layer.
Encapsulation: A layer should encapsulate all the necessary resources which change with it.

If you think about these layers, they somewhat resemble the OSI model in computer networking. Each layer encapsulates its data and passes only the required data to the lower layer. The rate of change and the effect of change increase drastically as you go up the layer chain.

Layer Representation

This is the diagram of different layers as presented in the blog by Lee Briggs.

Layer	Name	Example Resources
0	Billing	AWS Organization/Azure Account/Google Cloud Account
1	Privilege	AWS Account/Azure Subscription/Google Cloud Project
2	Network	AWS VPC/Google Cloud VPC/Azure Virtual Network
3	Permissions	AWS IAM/Azure Managed Identity/Google Cloud Service Account
4	Data	AWS RDS/Azure Cosmos DB/Google Cloud SQL
5	Compute	AWS EC2/Azure Container Instances/GKE
6	Ingress	AWS ELB/Azure Load Balancer/Google Cloud Load Balancer
7	Application	Kubernetes Manifests/Azure Functions/ECS Tasks/Google Cloud Functions

We can leave out the first two layers of this table (Billing and Privilege). It's best to manage those layers manually, because the lifespan of those resources is generally equivalent to the lifespan of your whole IaC project. And there's always the chicken-and-egg problem: if the IaC tool stores its state in a resource that the same IaC can't manage, what manages that resource? The S3 bucket holding the state is one example. For this kind of resource, it's best to have some scripts that create it.

So, the approach that I'd like to take is the following.

Layer	Name	Example Resources
0	IAM	IAM Users, IAM Roles etc.
1	Network	AWS VPC / Security Groups / NAT / IGW / Route Tables / Subnets
2	Certificates	AWS Certificate Manager, Route53 Records for Certificates
3	Data	AWS RDS, AWS DocumentDB, DynamoDB
4	Assets	S3 Buckets, ECR Repositories
5	Ingress	AWS ELB, Target Groups, Cloudfront Distributions, DNS Records for Route53
6	Compute	ECS, EKS, EC2 Instances, Autoscaling Groups, Lambda Functions
7	Application	ECS Services, EKS Resource, Lambda Functions

If you prefer a tree view, this is how I've structured the codebase written using Pulumi SDK in Golang.

./layers/
  application
    main.go
    Pulumi.dev.yaml
    Pulumi.yaml
  assets
    main.go
    Pulumi.dev.yaml
    Pulumi.yaml
  certs
    main.go
    Pulumi.dev.yaml
    Pulumi.yaml
  compute
    main.go
    Pulumi.dev.yaml
    Pulumi.yaml
  data
    main.go
    Pulumi.dev.yaml
    Pulumi.yaml
  ingress
    main.go
    Pulumi.dev.yaml
    Pulumi.yaml
  network
    main.go
    Pulumi.dev.yaml
    Pulumi.yaml

Complete Project Structure

If you look at the complete project structure, it resembles something like this.

.
  assets
  base
    Dockerfile.infras-builder
    Makefile
  env
    override.mk
  go.mod
  go.sum
  internal
    file
      main.go
  layers
    application
      main.go
      Pulumi.dev.yaml
      Pulumi.yaml
    assets
      main.go
      Pulumi.dev.yaml
      Pulumi.yaml
    certs
      main.go
      Pulumi.dev.yaml
      Pulumi.yaml
    compute
      main.go
      Pulumi.dev.yaml
      Pulumi.yaml
    data
      main.go
      Pulumi.dev.yaml
      Pulumi.yaml
    ingress
      main.go
      Pulumi.dev.yaml
      Pulumi.yaml
    network
      main.go
      Pulumi.dev.yaml
      Pulumi.yaml
  main.go
  Makefile
  pkg
    acm-certificate
      main.go
    ecr-repository
      main.go
    ecs-cluster
      main.go
    ecs-service-v2
      main.go
      types.go
    label
      main.go
    load-balancer
      main.go
      target_group.go
    mongo-database
      main.go
    postgres-database
      main.go
    s3-cloudfront-website
      main.go
    security-group
      main.go
      rule.go
    ssm-parameters
      main.go
  policies
    ssm-parameter.access.json
  targets
    docker.mk
    go.mk
    pulumi.mk

Explanation

I've heavily used Makefile for this architecture.

targets

Everything that needs to be run from automation is grouped by the CLI it invokes. For example, dockerized commands go into docker.mk, and Go building/linting/vendoring commands go into go.mk.

pkg

This folder contains the Pulumi components that I've created to abstract all the resources required for one logical component. For example: the ecs-service-v2 creates EBS volumes, ECS task definitions, and ECS service itself.

layers

Every layer of the above diagram resides in its respective folder inside the layers folder. Each of those packages produces a binary, which is selected by passing LAYER_NAME to the make command so that it operates on a single layer at a time.
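As a rough sketch of how a wrapper could walk the layers one at a time (an illustration, not my actual Makefile): the layer order below follows the table above, restricted to the folders shown in the tree, and the make target name `up` is hypothetical. Only LAYER_NAME comes from the real setup.

```python
from typing import List, Optional

# Order follows the layer table, restricted to the folders under ./layers/.
LAYERS = ["network", "certs", "data", "assets", "ingress", "compute", "application"]


def make_commands(target: str, upto: Optional[str] = None) -> List[List[str]]:
    """Build one `make` invocation per layer, lowest layer first.

    `target` is a hypothetical make target name; LAYER_NAME is the variable
    that selects which layer the make command operates on.
    """
    layers = LAYERS if upto is None else LAYERS[: LAYERS.index(upto) + 1]
    return [["make", target, f"LAYER_NAME={layer}"] for layer in layers]


# Print the commands needed to bring up everything up to the data layer.
for cmd in make_commands("up", upto="data"):
    print(" ".join(cmd))
```

Keeping the order in one list means a slow-changing layer such as network never has to be touched when only the application layer is redeployed.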

env

This folder contains the environment variables required for the whole deployment to function. It also sets some handy variables like GOCACHE, GOPATH, and PULUMI_HOME to ensure I have a caching mechanism and a pretty fast build cycle in my local environment.

These variables are set directly when running the IaC on a CI/CD platform.

Reference

Structuring your Infrastructure as Code | lbr. (leebriggs.co.uk)

Hands-On: I wrote my auth logic with JS in Nginx; Will you?

Har Har Mahadev! — Sun, 17 Sep 2023 16:02:03 GMT

This may come as a surprise to you. How can someone use JavaScript in an nginx configuration? Fortunately, nginx supports a scripting language called njs, which is a subset of JavaScript (ECMAScript 5.1 in strict mode, with some later extensions). It can be used to further extend your routing logic, but don't get too excited, because the features you get with njs are very limited.

Lots of syntax doesn't work out of the box and I couldn't make it work with third-party npm modules such as jsonwebtoken etc. I couldn't even use esbuild to generate a single file build.

From the documentation of nginx:

njs is a subset of the JavaScript language that allows extending nginx functionality. njs is created in compliance with ECMAScript 5.1 (strict mode) with some ECMAScript 6 and later extensions. The compliance is still evolving.

I won't go through the installation process because it's easy to find in the nginx documentation. If you're also using Ubuntu, you'd have to add the following repositories to have the nginx packages available.

# NGINX Repository
deb http://nginx.org/packages/mainline/ubuntu/ jammy nginx
deb-src http://nginx.org/packages/mainline/ubuntu/ jammy nginx

Running Hello World

Now that we've installed the njs module, let's go ahead and write the JS code for a straight-up hello world.

@machine:/etc/nginx sudo cat > index.js << EOF
> var hello = function (r) {
>     r.return(200, "Hello world!");
> };
> export default { hello: hello };
> EOF
root@machine:/etc/nginx

This is a very simple handler, which will return the string "Hello world!" with status code 200.

Let's plug it into the nginx.conf file.

user  nginx;
worker_processes  auto;

load_module modules/ngx_http_js_module.so;

# nginx refuses to start without an events block, even an empty one.
events {}

http {
  js_import index.js;

  server {
      location / {
          js_content index.hello;
      }
  }
}

Now, when I send a GET request to localhost with curl, I get the expected response.

root@machine:/etc/nginx# curl localhost
Hello world!
root@machine:/etc/nginx#

This is already very cool. But, we can do even more.

Implementing Dummy Auth and User Service

Let's create two simple services; one auth service and one for consuming the auth service.

Auth Service: Code

This is a fairly simple API. The login handler takes a POST request with a username JSON key and generates a JWT. The verify handler verifies the JWT passed as a bearer token.
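The core of those two handlers can be sketched in a few lines. This is a minimal illustration and not the linked auth service: it assumes HS256 signing, and the secret, the function names, and the one-hour expiry are all hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"change-me"  # hypothetical signing key, for illustration only


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str) -> str:
    return _b64url(hmac.new(SECRET, signing_input.encode(), hashlib.sha256).digest())


def issue_token(username: str, ttl: int = 3600, now=None) -> str:
    """What a login handler would return for a given username."""
    now = int(time.time()) if now is None else now
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}, separators=(",", ":")).encode())
    claims = {"exp": now + ttl, "username": username}
    payload = _b64url(json.dumps(claims, separators=(",", ":")).encode())
    return f"{header}.{payload}.{_sign(header + '.' + payload)}"


def verify_token(token: str, now=None):
    """Return the claims if the signature and expiry are valid, else None."""
    now = int(time.time()) if now is None else now
    try:
        header, payload, signature = token.split(".")
    except ValueError:
        return None
    expected = _sign(header + "." + payload)
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    claims = json.loads(_b64url_decode(payload))
    return claims if claims.get("exp", 0) >= now else None
```

The signature is checked before the payload is parsed, so a tampered token is rejected without trusting any of its contents.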

Let's test whether our auth service is working as intended.

regmicmahesh@machine:/etc/nginx$ curl -s localhost:8080/login -XPOST -d '{"username":"mahesh"}'
{"token":"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2OTUyMjUzMjMsInVzZXJuYW1lIjoibWFoZXNoIn0.y7kOx9dScMwg0XjRj1AAa9xh3rZp0GzEzR0FC1f3x-w"}
regmicmahesh@machine:/etc/nginx$ curl -s localhost:8080/login -XPOST -d '{"username":"mahesh"}' | jq -r ".token"
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2OTUyMjUzMjYsInVzZXJuYW1lIjoibWFoZXNoIn0.XXkLCwspeYYy8kCONSSPve1T7AYX94GgzNB1rrdOygk
regmicmahesh@machine:/etc/nginx$ curl localhost:8080/verify -H 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2OTUyMjI5NTksInVzZXJuYW1lIjoibWFoZXNoIn0.GXnnmLuXRr8tu1Sx08DXcgs7N6au4huku-S62sR0JCU'
{"exp":1695222959,"username":"mahesh"}

User Service: Code

The user-service simply prints out the value of the X-User-Id header.
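A stand-in for such a service fits in a few lines of standard-library Python. This is a sketch and not the linked user-service code: the handler and function names are made up, while the port 8081 and the response text match the nginx config and the test run later in this post.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class UserHandler(BaseHTTPRequestHandler):
    """Echo back whatever identity the proxy put in X-User-Id."""

    def do_GET(self):
        user = self.headers.get("X-User-Id", "")
        body = f"Request Received From: {user}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; drop this override to get access logs


def make_server(port: int = 8081) -> HTTPServer:
    """Bind the service to localhost on the given port."""
    return HTTPServer(("127.0.0.1", port), UserHandler)


# To run it: make_server().serve_forever()
```

Note that the service trusts the header blindly, so it must only be reachable through the proxy that sets it.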

Now, we need to configure nginx so that every request first goes through the auth service, which decodes the token if possible, and the username is then passed to the user-service.

Writing Nginx Config

First of all, let's configure the /verify location.

location = /verify {
    internal;
    proxy_pass http://localhost:8080/verify;
}

This is okay enough. The auth-service is running on port 8080. This should pass the request to that endpoint whenever requested.

Now, we could simply use auth_request /verify. But where's the fun in that? We want to set the username in the header dynamically.

Let's write a simple njs script to send a request to this endpoint.

function verify(originalR) {
  function callback(r) {
    originalR.return(r.status);
  }
  originalR.subrequest("/verify", { method: "GET", body: "" }, callback);
}

export default { verify };

The NJS script above simply sends the request to /verify but doesn't send the request body. It sends only the headers.

Let's add one more block in nginx to use this js function.

location = /auth {
    internal;
    js_content index.verify;
}

Now, we need to use this /auth endpoint as the authentication endpoint and everything should be good. We just wrapped our actual verification endpoint with a proxy. Why? Because now we can add our JS logic to set the header.

Let's define a variable to hold the value of the username.

server {
    set $username "";
    # other code
}

Now, we can set this variable from the NJS script.

function verify(originalR) {
  function callback(r) {
    if (r.status === 200) {
      const body = JSON.parse(r.responseText);
      originalR.variables.username = body.username;
      originalR.return(200);
    } else {
      originalR.return(401);
    }
  }
  originalR.subrequest("/verify", { method: "GET", body: "" }, callback);
}

export default { verify };

Now, we can send this variable as a header to the actual user service.

Added to the location block, it looks as follows:

location / {
    auth_request /auth;
    proxy_set_header "X-User-Id" $username;
    proxy_pass http://localhost:8081;
}

Let's take it for a test-run. It works!

regmicmahesh@machine:/etc/nginx$ curl  localhost -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2OTUyMjI5NTksInVzZXJuYW1lIjoibWFoZXNoIn0.GXnnmLuXRr8tu1Sx08DXcgs7N6au4huku-S62sR0JCU"
Request Received From: mahesh
regmicmahesh@machine:/etc/nginx$

Thanks for reading my blog. I don't think this is the best way, but just wanted to share what's possible with NJS scripting.

Dev Blog 1; Scratching the Kube API Server to get TTY

Har Har Mahadev! — Sat, 16 Sep 2023 06:50:49 GMT

This is the first of multiple blogs on developing a pod troubleshooter that turned into a compiler as a service. What do you call it, CaaS? COMPaaS is cooler though, lol. I've been learning a lot of things on this journey, so I decided to share them with you all.

Let me start this blog with how it began. I wanted to make a super simple pod troubleshooter for developers. Developers were having a hard time running migrations, checking application logs, and so on. I thought, okay, I'll create a simple tool that does one simple thing: provide a live tty session inside a running pod.

Although I used Weavescope for the project's needs, I've always found it exciting to learn how these tools work and how I can develop one myself. So, I decided to build a live tty API for the pod myself.

REST API of Kube-API Server

Let me give you a little bit of background on what this kube-apiserver thing is.

Kube-API Server is one of the major services in the control plane which provides a REST API to change/configure the cluster's state. All the things (kubelets, controller manager) need to communicate with the API Server to perform any change in the cluster.

Now, it was clear. If I want a shell in the pod, I need to go through the kube-api server. But how? I wanted the REST API documentation or any reference on how I can send requests to the kube-API server.

Let me teach you a nice hack to find out what requests you're making when you use kubectl. You can change the verbosity to level 10 to get the exact curl request kubectl makes to the API Server.

kubectl get pods --v=10
...
curl -v -XGET -H "User-Agent: kubectl/v1.25.4 (linux/amd64) kubernetes/872a965" \
  'https://[redacted].com/api/v1/namespaces/default/pods?limit=500'
...

It's that simple to understand what request kubectl is making to fetch the list of pods in your cluster.

You can also get the OpenAPI schema of the REST API of Kubernetes.

curl --insecure -L "https://$API_URL/openapi/v2" \
    -H "Authorization: Bearer $TOKEN"

This will provide you with the complete OpenAPI schema of the kube-api server. You can then visualize it with tools like redoc, swagger, etc., and get a proper hold of the API.

REST Endpoint for pod/exec

Our goal is to extract the endpoint of exec-ing inside of the pod. You can either get it using the kubectl as I showed you or just explore through the schema. Either way, the endpoint is this.

${this.config.url}/api/v1/namespaces/${namespace}/pods/${podName}/exec?command=${cmd}&stdin=true&stdout=true&stderr=true&tty=true
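To make the shape of that URL concrete, here is a small sketch that assembles it. The function name and the example host are made up for illustration. One detail worth knowing is that a multi-word command is sent as a repeated `command` query parameter, one per argument, which is also what kubectl does.

```python
from typing import List
from urllib.parse import quote, urlencode


def build_exec_url(base_url: str, namespace: str, pod: str,
                   command: List[str], tty: bool = True) -> str:
    """Assemble the pod exec URL; each argv element becomes its own `command` parameter."""
    query = [("command", part) for part in command]
    query += [
        ("stdin", "true"),
        ("stdout", "true"),
        ("stderr", "true"),
        ("tty", "true" if tty else "false"),
    ]
    path = "/api/v1/namespaces/{}/pods/{}/exec".format(
        quote(namespace, safe=""), quote(pod, safe="")
    )
    return f"{base_url.rstrip('/')}{path}?{urlencode(query)}"


# kube.example.invalid is a placeholder host, not a real cluster.
print(build_exec_url("https://kube.example.invalid", "default", "nginx", ["sh", "-c", "ls /"]))
```

Letting `urlencode` do the escaping avoids hand-building query strings for commands that contain spaces or special characters.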

But the catch is that this endpoint doesn't work over plain HTTP; it requires upgrading the connection to a streaming protocol. You can't just fire an HTTP request with the command you want to execute and read the stdout from the response. The response to this endpoint is always a protocol upgrade, and when you connect with a WebSocket client, the stdout/stderr is sent as WebSocket messages.

curl -XPOST -H "X-Stream-Protocol-Version: v4.channel.k8s.io" \
  -H "X-Stream-Protocol-Version: v3.channel.k8s.io" \
  -H "X-Stream-Protocol-Version: v2.channel.k8s.io" \
  -H "X-Stream-Protocol-Version: channel.k8s.io" \
  -H "User-Agent: kubectl/v1.25.4 (linux/amd64) kubernetes/872a965" \
  "https://$API_URL/api/v1/namespaces/default/pods/nginx-deployment-cbdccf466-qjfvw/exec?command=python&container=nginx&stdin=true&stdout=true&tty=true"

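The response will contain the headers Connection: Upgrade and Upgrade: SPDY/3.1. This means the connection kubectl makes is upgraded to SPDY, which is a streaming protocol and not WebSocket. The same endpoint also accepts a WebSocket upgrade, and that is what we'll use.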

curl won't take you very far with WebSockets. We can use wscat, a tool with a curl-like interface, to send WebSocket requests.

If you're using wscat, make sure to specify the subprotocol (the equivalent of X-Stream-Protocol-Version) with the -s flag.

wscat -H "Authorization: Bearer $TOKEN" \
   -c "$URL" \
   -s "v4.channel.k8s.io"

Message Format from Kube-API Server.

The message you receive from kube-api server over WebSockets is in a specific format. The first byte contains the messageType and the remaining bytes are the actual message from the TTY.

There can be three types of messages: stdin, stdout, and stderr.

Message Type	Byte
stdin	0x00
stdout	0x01
stderr	0x02

In this way, you need to sort the messages by whether they belong to stdin, stdout, or stderr.
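The framing above is easy to handle in code. Here is a minimal sketch, with function names of my own choosing, that splits a received frame into its channel and payload and prefixes outgoing keystrokes with the stdin byte:

```python
from typing import Tuple

# Channel bytes from the table above.
CHANNELS = {0x00: "stdin", 0x01: "stdout", 0x02: "stderr"}


def split_frame(frame: bytes) -> Tuple[str, bytes]:
    """Return (channel name, payload) for one binary WebSocket message."""
    if not frame:
        raise ValueError("empty frame")
    channel = CHANNELS.get(frame[0], f"unknown({frame[0]})")
    return channel, frame[1:]


def encode_stdin(data: bytes) -> bytes:
    """Prefix data typed by the user with the stdin channel byte."""
    return bytes([0x00]) + data


channel, payload = split_frame(b"\x01root@pod:/# ")
print(channel, payload)
```

Stripping the first byte before writing to the terminal is what keeps the unprintable channel marker out of the rendered output.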

For example: if you establish a connection with websocket, wscat will print something like this:

Look at the first byte of the message received from the server. It's 0x01 because the pod sends it on stdout when you open a shell, and your terminal doesn't know how to render that byte.

Will continue in Part 2.

My bad; But it's too late to rewrite.

Har Har Mahadev! — Mon, 11 Sep 2023 14:32:10 GMT

I've spent around 5 years in the software space as a software engineer. In this journey, I started with the backend, jumped around between different full-stack technologies, and am currently settling down as a DevOps Engineer, occasionally handling difficult and complex parts of system development.

Throughout my career, I've worked with numerous frontend frameworks, backend frameworks and operational strategies and technologies. This brings back a lot of painful and happy memories. In this blog, I've written about my views from my experience which may or may not be correct.

One problem I've faced in the majority of projects is that, as they grow, it gets increasingly difficult to change things, observe things, and gain a better understanding of the system. When we don't have a good sense of the system, we can't properly architect a new change.

One recent example: I was leading the move of a system's auth module from a self-implemented backend to Google as the provider. When I started the project, it was really simple. I had implemented the auth module as a separate, testable entity that could be plugged into other modules and easily modified. As the project grew with other developers, they started plugging in middlewares, modifying attributes of the auth module, and injecting unneeded things into the auth state, just because it was designed to hold state across the request-response lifecycle.

Now, I thought to change the internal implementation of the auth module, and sweet hell Mary, everything was a mess. Our middleware broke, and authorization didn't work. Most of the modules that depended on the auth module broke.

Another example; I was leading the development of a data labeling system. We envisioned this system would be complex, so we chose to separate systems into 3 different components. This time, I had bootstrapped the projects and the code was in proper shape in the initial commit. But, soon after from Day 2, there were blockers. Separate components had separate developers working on them. The interaction between components was over HTTP, and communication issues between these components ate up a lot of the development time. By a lot, I mean the development time was extended significantly. Now, when I think about it, it boils down to "premature optimization is the root of all evil".

Another example: this was a streaming site built with a NodeJS framework that was claimed to be "Super Fast" at the time. I thought, okay, streaming is a tough job. Let's write this service with NodeJS and a custom ffmpeg wrapper in NodeJS. The super-fast framework was slow to develop with, and it had no community plugins. I wrote a lot of things myself that I could have got for free. If I'd sacrificed 1% of speed and used a slower framework, my productivity would have been 100x.

What did I learn from it?

I learned a few key things from it:

Developers often fail to understand the intention behind the code.
You don't need to think about scaling to 10 million when you're writing your first line of code.
There is no such thing as "It worked for X corp, it should surely work for us."
Don't go with "Yeah, I've worked with X framework between these choices, and I think it's a good fit."
Performance isn't always the top priority.

How do I do it now?

Now, I've seen enough failures to recognize what works. I'm confident enough to say that whatever you're building should have these characteristics:

It should have as little magic as it can.
DX is not a thing to ignore.
Most things should be automated.
Interfaces and boundaries should be thought of first.
The system should be observable.
There should be proper documents in place, such as ADRs (architecture decision records), to understand the "why" behind every decision.

In the beginning, I'll start with some BaaS (Firebase, Supabase, Strapi maybe) because it's not worth investing a lot of code and developer time when your (potential) clients and software don't even understand each other.

Most BaaS platforms make things observable: you can know which services are getting the most requests, which queries are taking the most time, when you are getting a lot of traffic, which part is the bottleneck in your system, etc. You also get other sweet things out of the box, such as authentication/authorization and data persistence layers, and best of all is the dx that comes with it.

After a certain period, you'll get a perfect understanding of which way your software is heading. You can slowly start separating the bottlenecks of the BaaS platform into separate services. You write your code this time, as serverless functions if possible for your service. And you plug it into your BaaS.

Now, you need to start finding out how that BaaS does certain things and how you'd go about doing them on your own: research the database performance, language, tooling, etc.

After a certain level of maturity, you'll realize your business logic is getting too complicated to tune your BaaS and you need to do it on your own. This is the moment you start slowly rewriting your backend and shifting your API into your backend, one service at a time.

At last, this is my personal experience. Let me know if you have any conflicting views or I misunderstood something, I'm learning every day and would be very happy to correct myself.

Is unpacking faster than indexing?

Har Har Mahadev! — Tue, 12 Apr 2022 09:24:34 GMT

This is a strange story.

One morning, I woke up and wrote a few lines of code. Then, there came an interesting situation. I needed to assign two elements of an array to two variables. There are many ways of achieving this in Python.

If you're a JavaScript developer, your instinct should say this is the way.

const [x,y] = items;

Suppose I've got an array [1, 2] and I need to assign its values to two variables, something like the following:

a = 1
b = 2

I did the natural thing I was used to doing, unpacking the variable.

a, b = items

This works like a charm. But suddenly I remembered another way. What if I index the array and grab the elements that way?

a, b = items[0], items[1]

Or even.

a = items[0]
b = items[1]

Now, I want to profile this code. I know this is over-engineering at max but let's see what exactly happens.

First, I wrote both implementations in Python and disassembled the code.

In [15]: def x():
    ...:     a , b = items
    ...:     return
    ...:

In [16]: def y():
    ...:     a = items[0]
    ...:     b = items[1]
    ...:     return
    ...:

In [17]: dis(x)
  2           0 LOAD_GLOBAL              0 (items)
              2 UNPACK_SEQUENCE          2
              4 STORE_FAST               0 (a)
              6 STORE_FAST               1 (b)

  3           8 LOAD_CONST               0 (None)
             10 RETURN_VALUE

In [18]: dis(y)
  2           0 LOAD_GLOBAL              0 (items)
              2 LOAD_CONST               1 (0)
              4 BINARY_SUBSCR
              6 STORE_FAST               0 (a)

  3           8 LOAD_GLOBAL              0 (items)
             10 LOAD_CONST               2 (1)
             12 BINARY_SUBSCR
             14 STORE_FAST               1 (b)

  4          16 LOAD_CONST               0 (None)
             18 RETURN_VALUE

In [19]:

Do you see the extra overhead when I index the value? First, it needs to load the global items, then load the constant (the index value), then perform the subscription, and then store the value in a. It repeats all of that for b. This is a long way of doing it.

Now look at the other code, which does the unpacking. A single UNPACK_SEQUENCE instruction produces all the values, and it's followed by two store instructions.

Even by intuition, I suppose the first function (x) will run faster.

Now, let's compare the execution time.

In [19]: %timeit x()
84.3 ns ± 3.59 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [20]: %timeit y()
108 ns ± 2.42 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

To no surprise, unpacking beats indexing by around 24 ns. This isn't the kind of optimization you should be thinking about when writing code. Write clean, readable, testable code.
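If you'd like to repeat the measurement outside IPython, here is a small sketch using the standard timeit module. The function names and the repeat counts are my own choices, and both functions return the values (unlike the versions above) so they can be checked against each other. Absolute numbers will differ between machines and Python versions.

```python
import timeit

items = [1, 2]


def unpack():
    a, b = items
    return a, b


def index():
    a = items[0]
    b = items[1]
    return a, b


def best_ns(func, number=200_000, repeat=5):
    """Best per-call time in nanoseconds over several repeats."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number * 1e9


print(f"unpack: {best_ns(unpack):.1f} ns per call")
print(f"index:  {best_ns(index):.1f} ns per call")
```

Taking the minimum over several repeats reduces noise from other processes, which matters when the difference is only a few nanoseconds.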

Don't abuse your codebase with lines like the following:

a ,= items
# use this
b = items[0]

I stopped using environment variables for config.

Har Har Mahadev! — Sun, 10 Apr 2022 06:03:29 GMT

Environment variables were the standard for passing configurations, keys, and deployment assets into a system for a long time. They're even recommended by the twelve-factor app guide.

No syntax, No composition

But the problem is that once your configuration gets complex, environment variables become very tough to manage. Currently, in one of my projects, the environment variable count is 53, and it's bound to increase substantially when we deploy our next release. If only there were composition, we could identify which variables are common to all stages and which are unique. What we're doing currently for this problem is namespacing the environment variables with underscores, for example MY_COMMERCE_DB_HOST_NAME. Even so, when we deploy another environment, it's very tough to identify which environment variable is used in which part of the project and how. The complexity and the lack of data types are only one part of the problem, and probably the less significant one.

Security Issues

Environment variables increase the attack surface of your application when sensitive values like SSH keys, database passwords, authentication credentials, and API keys are passed through them. Many loggers dump all your environment variables into the log file along with the stack trace when your application crashes, which silently writes your secrets into a plain-text log file. Also, environment variables supplied to a parent process are passed down to every child process by default. So if you spawn a third-party program as a subprocess, it has access to all of your secrets. In ephemeral deployment solutions like AWS Lambda and Azure Functions, it may not be much of an issue, because you can use your own master key to encrypt your environment variables, and the short container lifecycle significantly reduces the risk of leaking them. Still, it's doable and it is happening.
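The inheritance problem is easy to demonstrate. In this sketch, DB_PASSWORD and its value are made-up examples, and the child is just another Python process standing in for any third-party program:

```python
import os
import subprocess
import sys

# The child simply reports what it can see in its own environment.
CHILD = "import os; print(os.environ.get('DB_PASSWORD'))"


def child_sees(env) -> str:
    """Run a child Python process with `env` and return what it printed."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


parent_env = dict(os.environ, DB_PASSWORD="s3cret")  # hypothetical secret
scrubbed_env = {k: v for k, v in parent_env.items() if k != "DB_PASSWORD"}

print(child_sees(parent_env))    # the child inherits the secret
print(child_sees(scrubbed_env))  # the child no longer has it
```

Scrubbing works, but it relies on every call site remembering to do it, which is part of why I prefer keeping secrets out of the environment altogether.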

What is my approach?

Store your configuration file in YAML.

It's easy enough to create language and OS agnostic configuration files (INI, YAML, etc.) and it's usually straightforward to make files easy to change between deployments too - e.g. if a deployment is containerized, by mounting the file.

You can replace environment variables like these:

APP_ID=123
APP_HOST=domain.com
APP_LOGGER=logging-service
DB_HOST=domain.com
DB_PORT=5433
DB_USERNAME=mmm

with a YAML file like this:

app:
  id: 123
  host: domain.com
  logger: logging-service
db:
  host: domain.com
  port: 5433
  username: mmm

Now, this is significantly easier to parse and pass down the road to the places that need it, which also supports the dependency inversion principle. There are packages that can create schemas from YAML files, which will help you even further. You can also use YAML validators to add different validations. Now your configuration has its own syntax, real data types, and composition.
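As a sketch of what typed, composed configuration can look like, the classes below mirror the YAML above. The class and function names are made up, and the dict is written out by hand here; in a real project it would come from a YAML parser (for example PyYAML's yaml.safe_load, which is a third-party package).

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AppConfig:
    id: int
    host: str
    logger: str


@dataclass(frozen=True)
class DbConfig:
    host: str
    port: int
    username: str


@dataclass(frozen=True)
class Config:
    app: AppConfig
    db: DbConfig


def _build(cls, raw: dict):
    """Construct `cls` from a dict, rejecting missing keys and wrong types."""
    kwargs = {}
    for field in fields(cls):
        if field.name not in raw:
            raise ValueError(f"{cls.__name__}: missing key {field.name!r}")
        value = raw[field.name]
        if not isinstance(value, field.type):
            raise TypeError(f"{cls.__name__}.{field.name} must be {field.type.__name__}")
        kwargs[field.name] = value
    return cls(**kwargs)


def load_config(raw: dict) -> Config:
    return Config(app=_build(AppConfig, raw["app"]), db=_build(DbConfig, raw["db"]))


# The same structure a YAML parser would produce for the file above.
raw = {
    "app": {"id": 123, "host": "domain.com", "logger": "logging-service"},
    "db": {"host": "domain.com", "port": 5433, "username": "mmm"},
}
config = load_config(raw)
print(config.db.port)
```

Each part of the application can then receive only the section it needs, such as `config.db`, instead of reading global state.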

Store your secrets in an ephemeral file system.

Different containerization platforms have their own ways of passing down secrets through an ephemeral file system. If you're using Docker in swarm mode, you can take full advantage of Docker secrets. Kubernetes has its own mechanisms for passing secrets. One of the methods I personally suggest for passing secrets is HashiCorp Vault, which is very good for zero-trust environments.

Also, most cloud providers have some sort of secrets management service, like AWS Secrets Manager, which offers a variety of security mechanisms as a service.
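On the application side, reading a file-based secret is only a few lines. This sketch assumes the Docker swarm convention of mounting secrets under /run/secrets/<name>; the function name is made up, and on other platforms the directory would be whatever mount path you configured.

```python
from pathlib import Path


def read_secret(name: str, base_dir: Path = Path("/run/secrets")) -> str:
    """Read one secret from a mounted secrets directory."""
    if "/" in name or "\\" in name or name in {"", ".", ".."}:
        raise ValueError(f"invalid secret name: {name!r}")
    # Trailing newlines are common in secret files, so strip them.
    return (base_dir / name).read_text(encoding="utf-8").strip()


# Usage, assuming a secret named db_password has been mounted:
# password = read_secret("db_password")
```

Because the value is read on demand from a file, it never appears in the process environment and is not inherited by child processes.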

Should you use Kubernetes?

Har Har Mahadev! — Sun, 03 Apr 2022 10:24:02 GMT

No.

Hardening a Docker Image

Har Har Mahadev! — Wed, 23 Mar 2022 02:18:13 GMT

Hardening

Hardening means decreasing the potential for a system to be exploited by reducing its attack surface. It may range from adhering to policies like Least Privilege and Zero Trust to concrete steps like automatic security updates, firewalls, and rotating passwords.

To harden a system, the system operators should cut off unnecessary provisioned processes, services, and most other resources the system currently uses, unless they're essential for the system to function.

There are a million steps to take to prepare an absolutely stone-hardened system, but we'll look at some essential steps in hardening a Docker image, so that the next time someone tries to own your Docker container, they'll have a hard time.

Limit the build data with .dockerignore and avoid unnecessary copies.

When building a Docker image, it's common practice to copy a whole directory straight into the image. This increases the attack surface, because .env files, configuration folders, and log files may end up inside the image, which hands the attacker more information about the system from inside the container.

Even if there are too many files to copy without a wildcard copy, I'd suggest using a .dockerignore file. It behaves similarly to .gitignore and stops the files listed in it from being copied.

Use smaller images or scratch if possible.

Using a large base image that ships a thousand tools your application doesn't need is a great way of helping the attacker: those tools increase the attack surface, since any of them may have security issues.

Try to use a very small image. That said, don't create a nightmare for developers by using images like alpine when the application depends on very complicated libraries. Alpine builds everything against musl, and sometimes that's a nightmare.

My preferred Docker images are the slim builds of Debian. It's also a lot easier when you're using statically compiled languages like Go, where you can ship a container with only the binary. Think about the attack surface reduction when everything the filesystem contains is a single binary.

Downgrade to non-privileged User.

Getting rid of the root user in a Docker image is just a two-line configuration change. With just this simple change, we introduce a whole new problem for the potential attacker: privilege escalation.

Mount host file system as read-only.

Often, we need to mount host data into the container, for reading configuration or for many other reasons, so that it can function properly. If we mount the host file system with full read-write access, chances are a potential attacker will tamper with the mounted file system. To mitigate this to some extent, we should mount the file system in read-only mode. We should also reduce access to the host file system by mounting only the required folder or file.

Don't expose unnecessary ports.

In a development environment, it's fine to expose multiple ports, as the developer may need remote debugging in the container or extra development-only network access for whatever reason. But those ports should be strictly shut down when running with live users. It's a good idea to expose only the port that the main process inside the container listens on.

Never run in privileged mode

This requires little explanation. Running a container in privileged mode means it can do basically anything the host system can do. This is the worst option: you risk not only the container but also the host system.

Other Extra Suggestions

Enable Scan on Push in ECR.

The Elastic Container Registry (ECR) service provided by Amazon can scan your Docker images for security issues as soon as you push them. This lets you know about potential threats in the image and quickly update or patch the packages which have CVEs assigned to them.

some notes i took.

Har Har Mahadev! — Sat, 22 Jan 2022 06:25:32 GMT

The Single Responsibility rule states that a class should have a single purpose, and its methods should all be related to that purpose.
The Encapsulation rule states that a class's implementation details should be hidden from its clients as much as possible.
The Most Qualified Class rule states that work should be assigned to the class that knows best how to do it.
The Low Coupling rule states that the number of class dependencies should be minimized.
The rule of Transparency states that a client should be able to use an interface without needing to know the classes that implement that interface.
The Open/Closed rule states that programs should be structured so that they can be revised by creating new classes instead of modifying existing ones.
The Liskov Substitution Principle (LSP) specifies when it is meaningful for one interface to be a subtype of another. In particular, X should be a subtype of Y only if an object of type X can always be used wherever an object of type Y is expected.
The rule of Abstraction states that a class's dependencies should be as abstract as possible.
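
To make one of these rules concrete, here is a minimal sketch of the Open/Closed rule. The class names (Circle, Square, Rectangle) and the totalArea function are hypothetical, picked only for illustration: totalArea depends on nothing but the area() method, so a new shape is added by writing a new class instead of modifying existing code.

```javascript
// Hypothetical shapes; each one only promises an area() method.
class Circle {
  constructor(radius) {
    this.radius = radius;
  }
  area() {
    return Math.PI * this.radius ** 2;
  }
}

class Square {
  constructor(side) {
    this.side = side;
  }
  area() {
    return this.side ** 2;
  }
}

// Depends only on the area() contract, not on any concrete class.
function totalArea(shapes) {
  return shapes.reduce((sum, shape) => sum + shape.area(), 0);
}

// Extension: a new class is added, totalArea stays untouched.
class Rectangle {
  constructor(width, height) {
    this.width = width;
    this.height = height;
  }
  area() {
    return this.width * this.height;
  }
}

console.log(totalArea([new Square(2), new Rectangle(2, 3)]));
```

The same sketch also touches the rule of Transparency: totalArea never needs to know which classes implement area().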

Splitting a big tfstate into Multiple Environments with Remote Backend

Har Har Mahadev! — Mon, 20 Sep 2021 05:31:37 GMT

Splitting tfstate files into multiple environments is very useful when your project has grown, you have a very big tfstate file, and you need to monitor which resources are failing and which are deploying successfully.

It's hard to track resources specifically when, for example, you're running your own MongoDB in your dev cluster but want a managed service for your production environment.

Consider the file structure below. Currently, all your backend configuration is stored inside mono/provider.tf. Now your project is growing and you need to separate your tfstate into multiple environments.

For this example, we consider the s3 backend, but you may use any; the process is the same.

envs (~/devops/learningtf/envs)
    dev
        provider.tf
    mono
        provider.tf
    prod
        provider.tf

The code inside the mono/provider.tf is as follows.

mono/provider.tf

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
    }
  }

  backend "s3" {
    bucket         = "test-split-backend"
    key            = "dev"
    region         = "us-east-1"
    dynamodb_table = "dev-test-split-backend"
  }
}

# Configure the AWS Provider
provider "aws" {
  region = "us-east-1"
}

resource "aws_s3_bucket" "dev-bucket" {
  # Bucket names must be unique, so dev gets its own name.
  bucket = "test-split-dev"
  acl    = "private"
}

resource "aws_s3_bucket" "qa-bucket" {
  bucket = "test-split-qa"
  acl    = "private"
}

Consider these two buckets, dev-bucket and qa-bucket. Now you want to move these two resources into their own environments.

Let's make a scaffold for putting resources with provider and backend for dev environment.

dev/provider.tf

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
    }
  }

  backend "s3" {
    bucket         = "test-split-backend"
    key            = "dev-split/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "dev-test-split-backend-final"
  }
}

Now you may proceed to terraform apply (after terraform init) so that Terraform creates the state file and lock entries on the remote backend.

If you run terraform state list, the output should be empty as we haven't added any resources in this environment.

Now, the first step is to dump the tfstate file, which is very easy: just one command.

cd dev && terraform state pull > dev.tfstate

This will pull the current state of the dev environment, which is not exactly empty but contains no resources.

Now, you're ready to move the resource from mono into dev environment.

First of all identify which resource / module you want to move by looking at the tf files.

After identifying the resources, now move them into dev.tfstate using this simple command.

cd mono
terraform state mv -state-out=../dev/dev.tfstate aws_s3_bucket.dev-bucket aws_s3_bucket.dev-bucket

This removes the resource from mono's state and moves it into dev.tfstate, which is not yet synced with the dev remote tfstate.

Now, the last thing you need to do is push the dev tfstate into the remote server, which is as easy as the following command.

cd dev
terraform state push dev.tfstate

Now, if you run terraform state list in both directories, you should be able to see the difference: the resource no longer shows in mono, but it shows in dev.

You can confirm if there's any drift by doing terraform plan inside the dev environment.

Reduce the fear on reduce - Javascript.

Har Har Mahadev! — Mon, 24 May 2021 05:21:19 GMT

The reduce method available on arrays in JavaScript is one of the most powerful features for functional programming. It combines a sequence of elements using a binary operation. In general terms, when we reduce an array we produce a single value at the end, using a pure function and a few important concepts described below.

The image below explains the three main pillars JavaScript provides to support the functional programming paradigm.

Map is used to create a new array by applying a transformation to every item of an existing array, without removing any.

Filter is used to create a new array containing only the items of an existing array that satisfy a condition.

Reduce is the master of all: it covers a whole pipeline and can do map, filter and much more. But, as you see in the diagram, it is used to produce a single value at the end.

Before getting hands-on with reduce, we need to understand a few concepts first.

Accumulator

The accumulator stores the current value of our reduce pipeline. If we provide an initial value to the reduce function, the accumulator starts as that value; if not, the accumulator takes the first value in the array and the iteration starts from the second item. Not clear? Look at the examples below.

Let's start with one of the most basic reducing operations.

const arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const num = arr.reduce((acc, el) => acc + el);
console.log(num); // 55

Here, the reduce function takes 1 as initial value ( first item in the array ) because no initial value is provided. After taking the first item as initial value, the invocation of function starts with the second item in the array.

Initially, the accumulator stores the same value as the initial value.

So, the initial value is 1, the accumulator holds the value 1, and the current element is 2.

After running the callback we passed to reduce, its return value becomes the new value of the accumulator.

After one iteration.

So, the accumulator holds the value 3, and the current element is 3.

After second iteration.

So, the accumulator holds the value 6, and the current element is 4.

So, the accumulator is responsible for holding the value returned after every iteration, which is passed in again when the function is invoked for the next item.

Guess, the above example makes it crystal clear now.

Now, let's try to do the same thing, but providing an initial value.

const arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const num = arr.reduce((acc, el) => acc + el, 5);
console.log(num); // 60

Here, we pass another argument, 5, to the reduce method. Now reduce takes 5 as the initial value, and the iteration begins at the first item because it is no longer consumed as the initial value.

So, the initial value is 5, the accumulator holds the value 5, and the current element is 1.

After one iteration.

So, the accumulator holds the value 6, and the current element is 2.

After second iteration.

So, the accumulator holds the value 8, and the current element is 3.

Hope you understood the process.
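
If you'd rather watch the accumulator than take my word for it, here's a small sketch that records every step of the sum. The helper name traceSum is made up for this post; it accepts an optional initial value, so it covers both walkthroughs above.

```javascript
// Hypothetical helper: records the accumulator at every step of a sum.
// Pass an initial value as the second argument, or leave it out.
function traceSum(arr, ...initial) {
  const steps = [];
  const result = arr.reduce((acc, el, i) => {
    const returned = acc + el;
    steps.push({ index: i, acc, el, returned });
    return returned;
  }, ...initial);
  return { steps, result };
}

// Without an initial value the first step starts at index 1.
console.log(traceSum([1, 2, 3, 4]).steps);

// With an initial value of 5 the first step starts at index 0.
console.log(traceSum([1, 2, 3, 4], 5).steps);
```

Each recorded step shows the accumulator going in, the current element, and the value returned, which becomes the accumulator of the next step.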

Now, let's understand the whole syntax.

reduce((acc, el) => { ... } )
reduce((acc, el, i) => { ... } )
reduce((acc, el, i, arr) => { ... } )

We tried the first syntax already. The first parameter is the accumulated value, which initially equals initialValue (if we pass one) or the first item of the array.

The second syntax is nearly equivalent, but the callback takes a third parameter i, which is the index of the current element.

The third syntax additionally passes the whole array as the fourth parameter.

Keep in mind, only the first two are required. All others are optional parameters; use them if you need them.

Now that you have a foundational knowledge of reduce, let's see how it's a Swiss Army knife of functional programming.
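
As a quick sketch of when the optional parameters are handy, here is an average computed in a single reduce. The function name average is my own; it uses the index i and the array arr to divide only on the last step.

```javascript
// Sum as usual, then divide once the last index is reached.
const average = (numbers) =>
  numbers.reduce((acc, el, i, arr) => {
    const sum = acc + el;
    return i === arr.length - 1 ? sum / arr.length : sum;
  }, 0);

console.log(average([2, 4, 6, 8]));
```

Because an initial value of 0 is passed, an empty array simply returns 0 instead of throwing.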

Converting an array of number to array of strings.

For this operation map may be suitable and if you do it via map, you can do something like this...

const arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const num = arr.map(el => el.toString());
console.log(num); // ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10"]

This is the simpler approach. Let's try to approach this with reduce.

First let's think how we can solve this like a functional programmer.

The initial value for this operation should be a blank array, i.e. []. Then, on every reduce step, we convert the number into a string and add it to the array.

Pretty straightforward, right?

const arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const num = arr.reduce((acc, el) => [...acc, el.toString()], []);
console.log(num);

If you're confused by the part [...acc, el.toString()]: it spreads the values of the current acc array in place of ...acc. For example, if acc equals ["1", "2", "3"], then [...acc, "4"] becomes ["1", "2", "3", "4"]; the values inside acc are spread in place.

After the values currently held by acc, we append our current el.toString().

The flow can be explained as...

At first, acc = [] and current element = 1

At second, acc = ["1"] and current element = 2

At third, acc = ["1" , "2"] and current element = 3

Pretty straightforward, right?

If you're having issues with the spread operator, consider this.

const arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const num = arr.reduce((acc, el) => {
  acc.push(el.toString());
  return acc;
}, []);
console.log(num);

Now, what if only even numbers are required?

Let's try with filter and map.

const arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const num = arr
    .filter(el => el % 2 == 0)
    .map(el => el.toString());
console.log(num);

This does the job. Now, let's try implementing this with reduce.

So, the initial value will be []. Our callback function should return the accumulator unchanged if the element is odd; otherwise it adds the stringified element to the accumulator and returns it.

const arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const num = arr.reduce((acc, el) => {
  if (el % 2 == 0) acc.push(el.toString());
  return acc;
}, []);
console.log(num);

Now, this is getting interesting, right?

Let's try to produce the same result in reverse order. It's quite simple because we also have reduceRight, which reduces the array from right to left.

const arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const num = arr.reduceRight((acc, el) => {
  if (el % 2 == 0) acc.push(el.toString());
  return acc;
}, []);
console.log(num);

We can also do this though...

const arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const num = arr
    .filter(el => el % 2 == 0)
    .map(el => el.toString())
    .reverse();
console.log(num);

This is just the surface of the power of reduce. You can do much more based on your needs. If you've more examples, feel free to drop those in comments.
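
One more example in the same spirit, as a sketch: counting how often each item occurs. The name tally is made up for this post, and a Map is used as the accumulator so any string is safe as a key.

```javascript
// Count occurrences; the accumulator is a Map instead of a number or array.
const tally = (items) =>
  items.reduce((counts, item) => {
    counts.set(item, (counts.get(item) || 0) + 1);
    return counts;
  }, new Map());

console.log(tally(["go", "rust", "go", "js", "go"]));
```

The point is that the single value reduce produces doesn't have to be a number; it can be any structure you build up step by step.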

Further Reading: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/Reduce

Learning Django Middleware by exploring CSRF.

Har Har Mahadev! — Fri, 21 May 2021 11:15:52 GMT

As much as people assume I'm experienced in Django, I'm one of those guys who has never seriously thought about how Django does specific things, and instead relies on an "it works" kind of thinking. I simply write my own code if I don't want the hassle of Django generics and mixins, or I spend five seconds of my development time searching the docs (thanks for the wonderful examples).

Recently I had a software engineer interview, and one question that hit me was: you've done Django, but have you written your own CSRF (Cross Site Request Forgery) validator? I had written one once in Node when using EJS, but I had never thought about how Django does it... and I answered exactly that.

CSRF

Cross Site Request Forgery is when an attacker on another website can submit a form to my website.

Imagine you have a form which lets a user delete their account by sending a POST request to /deleteprofile.

Now, a third-party website can build the same form as yours, get better SEO, and run ad campaigns on Facebook, so that users land on a form that instead submits a deletion request to your backend.

This is a serious issue.

-Image from Portswigger

You need to validate that the request you've received originated from your own site and that you authorized it.

Read More About CSRF: https://portswigger.net/web-security/csrf

One general solution to this issue is a CSRF token. The token is given by your backend, as a cookie, to the webpage responsible for submitting the form.

The webpage then submits the data along with the CSRF token, and you validate that the submitted token is the same as the one issued by your backend.

This cuts off attackers trying to send requests to your system from another system, or trying to trick your users into submitting data unknowingly.
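
The issue-and-compare idea can be sketched in a few lines of Node. This is not Django's code and not the validator I wrote back then; the function names issueCsrfToken and isValidCsrfToken are hypothetical, and only Node's built-in crypto module is used.

```javascript
const crypto = require("crypto");

// Backend side: generate a random token to hand to the page as a cookie.
function issueCsrfToken() {
  return crypto.randomBytes(32).toString("hex");
}

// Backend side: compare the token from the cookie with the submitted one.
function isValidCsrfToken(cookieToken, submittedToken) {
  if (typeof cookieToken !== "string" || typeof submittedToken !== "string") {
    return false;
  }
  const a = Buffer.from(cookieToken);
  const b = Buffer.from(submittedToken);
  // timingSafeEqual throws on different lengths, so check that first.
  if (a.length !== b.length) return false;
  return crypto.timingSafeEqual(a, b);
}

const token = issueCsrfToken();
console.log(isValidCsrfToken(token, token)); // same token on both sides
console.log(isValidCsrfToken(token, "forged-by-another-site"));
```

A page on another origin can make the browser send the cookie, but it cannot read the cookie's value, so it has no way to put the matching token into the form.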

How Django handles security?

If you didn't know, Django comes with pretty much all the middleware required to secure your backend. If you want to be practical, just check the settings.py file in your recent Django project and find a list called MIDDLEWARE.

# other code...
MIDDLEWARE = [
    'django.middleware.security.SecurityMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',  # this is the one
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
]
# other code...

CsrfViewMiddleware

Let's get into the GitHub repository of Django and locate the source of this middleware (django/middleware/csrf.py).

A little Context on Django Middlewares.

In django, a middleware is just a request interceptor which has some hooks associated with it.

If you have zero experience, have a read here.

Some hooks are mentioned below...

process_view() is called just before Django calls the view.
process_request() is called on each request, before Django decides which view to execute.

Now, you're ready to go into this journey.

In Django, a CSRF token contains two parts: a mask and the actual (masked) token. Each part contains 32 characters consisting of letters and digits.

CSRF_SECRET_LENGTH = 32
CSRF_TOKEN_LENGTH = 2 * CSRF_SECRET_LENGTH
CSRF_ALLOWED_CHARS = string.ascii_letters + string.digits

There are a couple of helper methods defined there which help us create a token, mask a token, unmask a token and compare tokens. The one below fetches the token for the current request.

def _get_token(self, request):
    if settings.CSRF_USE_SESSIONS:
        pass  # session-based lookup, not important for this context
    else:
        try:
            cookie_token = request.COOKIES[settings.CSRF_COOKIE_NAME]
        except KeyError:
            return None

        csrf_token = _sanitize_token(cookie_token)
        if csrf_token != cookie_token:
            request.csrf_cookie_needs_reset = True
        return csrf_token

This method extracts the token from the cookies using the CSRF_COOKIE_NAME as defined in the settings.
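
To picture what "mask" and "unmask" mean for a two-part token like this, here is a rough sketch in JavaScript. It is my own reading of the scheme, not a copy of Django's helpers: the function names are hypothetical, and the assumption is that each character of the secret is shifted by the matching character of a random mask, within the 62 allowed characters.

```javascript
const crypto = require("crypto");

// Same alphabet as string.ascii_letters + string.digits in Python.
const ALLOWED =
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
const SECRET_LENGTH = 32;

// Random string over the allowed characters.
function randomString(length = SECRET_LENGTH) {
  let out = "";
  for (let i = 0; i < length; i++) {
    out += ALLOWED[crypto.randomInt(ALLOWED.length)];
  }
  return out;
}

// token = mask + (secret shifted by mask), 64 characters in total.
function maskSecret(secret) {
  const mask = randomString(secret.length);
  let cipher = "";
  for (let i = 0; i < secret.length; i++) {
    const sum = ALLOWED.indexOf(secret[i]) + ALLOWED.indexOf(mask[i]);
    cipher += ALLOWED[sum % ALLOWED.length];
  }
  return mask + cipher;
}

// Reverse the shift using the mask stored in the first half.
function unmaskToken(token) {
  const mask = token.slice(0, SECRET_LENGTH);
  const cipher = token.slice(SECRET_LENGTH);
  let secret = "";
  for (let i = 0; i < cipher.length; i++) {
    const diff = ALLOWED.indexOf(cipher[i]) - ALLOWED.indexOf(mask[i]);
    secret += ALLOWED[(diff + ALLOWED.length) % ALLOWED.length];
  }
  return secret;
}

const secret = randomString();
const first = maskSecret(secret);
const second = maskSecret(secret);
console.log(first !== second, unmaskToken(first) === unmaskToken(second));
```

Under this reading, two tokens for the same secret look different on the wire but unmask to the same value, which is why tokens are compared after unmasking instead of as raw strings.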

Now, let's go into the lifecycle.

def process_request(self, request):
    csrf_token = self._get_token(request)
    if csrf_token is not None:
        # Use same token next time.
        request.META['CSRF_COOKIE'] = csrf_token

This hook utilizes the above helper method to extract the token and save it in request.META.

Now let's get into process_view.

if getattr(request, 'csrf_processing_done', False):
    return None

This skips the checks if processing is already done.

if getattr(callback, 'csrf_exempt', False):
    return None

This skips the checks if csrf_exempt is set on the view.

if request.method not in ('GET', 'HEAD', 'OPTIONS', 'TRACE'):
    if getattr(request, '_dont_enforce_csrf_checks', False):
        return self._accept(request)

The CSRF token check is only done for unsafe methods such as PUT, PATCH, POST and DELETE. Also, it's worth noting that Django doesn't accept POST requests without CSRF tokens by default.

Then the check verifies whether the HTTP_REFERER header is valid or not.

request_csrf_token = ""
if request.method == "POST":
    try:
        request_csrf_token = request.POST.get('csrfmiddlewaretoken', '')
    except OSError:
        pass

Now, we get the token sent by the user in the request.

And then Django checks that token against the issued token.

request_csrf_token = _sanitize_token(request_csrf_token)
if not _compare_masked_tokens(request_csrf_token, csrf_token):
    return self._reject(request, REASON_BAD_TOKEN)

Now, I guess this also explains Django middleware to you. A middleware essentially captures a request in the middle and does something with it, and it's up to the middleware whether to forward it to the next middleware / view or just drop it.
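
That "forward or drop" decision can be sketched without any framework. The code below is plain JavaScript, not Django's implementation; runMiddlewares, csrfCheck and view are hypothetical names. Each middleware returns null to let the request continue, or a response object to stop the chain, much like the hooks above return None or a rejection.

```javascript
// Run each middleware in order; the first non-null result short-circuits.
function runMiddlewares(middlewares, view, request) {
  for (const middleware of middlewares) {
    const response = middleware(request);
    if (response !== null && response !== undefined) {
      return response; // dropped: the view is never called
    }
  }
  return view(request); // every middleware let the request through
}

// A toy CSRF check: unsafe methods must carry the issued token.
const csrfCheck = (request) => {
  const safe = ["GET", "HEAD", "OPTIONS", "TRACE"].includes(request.method);
  if (safe || request.submittedToken === request.cookieToken) return null;
  return { status: 403, body: "CSRF check failed" };
};

const view = () => ({ status: 200, body: "profile deleted" });

const handle = (request) => runMiddlewares([csrfCheck], view, request);

console.log(handle({ method: "POST", cookieToken: "abc", submittedToken: "abc" }));
console.log(handle({ method: "POST", cookieToken: "abc", submittedToken: "nope" }));
```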

Removing Sensitive Information from Git History

Har Har Mahadev! — Wed, 12 May 2021 07:45:19 GMT

We have all gone through this problem once in our development career. We have mistakenly pushed sensitive info to GitHub and we need to remove it. The sensitive info can be anything, like a prod .env file or passwords hard-coded in the code.

Luckily, it's pretty straightforward to remove that change from git history, both from your local repository and the remote repository.

For the demonstration, I've created a local git repository.

And populated it with some fake git commits to demonstrate what you may actually do.

$ git log --oneline
d5d9f70 (HEAD -> main) some more changes to main
c05631b Added Password
ac547a8 Added Main File to Project

Editing Commit

Here, I have used the commit messages to make clear that the commit with hash c05631b adds sensitive information to our git history. Now, we need to remove that.

If you're having a hard time finding which commit actually changed the password, you can use the following git command...

git log -p fileName

Now, run git rebase on the parent of this commit (c05631b^), so that the rebase covers our git history from that commit onwards.

$ git rebase -i c05631b^
pick c05631b Added Password
pick d5d9f70 some more changes to main
# Rebase ac547a8..d5d9f70 onto ac547a8 (2 commands)
#

Now, edit the text in the editor. We need to edit the commit which added password into the git history. So, I'll rewrite the first line as follows...

edit c05631b Added Password
pick d5d9f70 some more changes to main
# Rebase ac547a8..d5d9f70 onto ac547a8 (2 commands)

Note the first line: I've replaced pick with edit. After saving and exiting the text editor, git will stop right at that commit so I can amend it.

Now, edit your password file. I will remove the password from the .env file in this case.

And, I will add the changes into the commit using the given command.

$ git commit --amend -a -m "Removed Sensitive Content"

This will update both the message of that commit and the changed file.

Now, I will continue the rebase using the following command...

$ git rebase --continue

Now, git will try to update the history by replaying the later commits on top of our amended commit. We may also get some conflicts in the process.

Dropping Commit

This section is optional. If editing the commit is all you want, you can skip this section and go ahead to the Pushing To Github section.

Now, I will drop the commit in the same repository.

Use the git rebase command as before, but keep in mind the commit hash will have changed. So, grab your commit hash again using git log and rebase onto its parent.

git rebase -i ab5edff^

Now, change the pick line into drop which will actually remove the commit from git history.

Make sure you aren't dropping a commit which has other changes; it's generally a bad idea to drop a commit. Or, it was a bad idea to create it :D.

$ git rebase -i c05631b^
drop c05631b Removed Secure Content
pick d5d9f70 some more changes to main
# Rebase ac547a8..d5d9f70 onto ac547a8 (2 commands)
#

After you save and exit this file, git automatically removes your commit and updates the git history.

$ git log --oneline
9f865e5 (HEAD -> main) some more changes to main
ac547a8 Added Main File to Project

YAY!

Pushing To Github

If this repository is already pushed to GitHub, rewriting history that exists on the remote is always bad news, because it will cause conflicts with the git history of all your co-workers.

The command given below will do the job.

$ git push origin main --force

Ownership in Rust - ELI5

Har Har Mahadev! — Tue, 11 May 2021 16:06:04 GMT

Hello everyone! A fellow Rust learner here. It's been a couple of weeks since I started learning Rust. The most confusing part I found about Rust as a programming language is the concept of ownership and borrowing.

We are all used to passing a variable to a function and using it later, or just re-assigning it to multiple variables by making clones just because we can. But Rust follows a strict practice here.

In order to make sure your code doesn't have any data races, it restricts a value to be written from only one place at a time. This also solves the problem of dangling pointers: as Rust has no garbage collector, it uses the variable scope to free the memory automatically.

Ownership

In Rust, every value has a variable which is called its owner. And there can't be multiple owners at one time. When you initialize a new variable with a value, it's said to be the owner of the value.

let name = "mahesh";

Here, name is the owner of the string literal "mahesh".

let name = "mahesh";
let another = name;

When you use this variable in another variable's assignment, the value is copied into the other variable. Each variable holds its own copy, and both stay usable afterwards.

This is because values like integers, characters, booleans and string slices (&str) implement the Copy trait. They need very little memory and can be copied cheaply. In other words, copying them is efficient enough for a computer.

But this is not the case for types that don't implement Copy, such as structs (unless they derive it) or heap-backed types like String and Vec.

#[derive(Debug)]
struct Age {
    num: i32,
}
fn main() {
    let _name = Age { num: 30 };
    let _another = _name;
    println!("{:?}", _name);
}

In this case, the variable _name is a struct. This struct declaration doesn't implement the Copy trait. So, whenever you reassign the variable, its owner is changed. At the line let _another = _name;, the ownership of the newly created struct is shifted from _name to _another. Now, Rust won't let you access the variable _name because a value can only have one owner at a time.

But this is not the case if the struct derives Copy (#[derive(Clone, Copy)], since Copy requires Clone): then every re-assignment creates a brand new copy of the struct.

error[E0382]: borrow of moved value: `_name`
 --> src/main.rs:8:21
  |
6 |    let _name = Age { num: 30};
  |        ----- move occurs because `_name` has type `Age`, which does not implement the `Copy` trait
7 |    let _another = _name;
  |                   ----- value moved here
8 |    println!("{:?}", _name);
  |                     ^^^^^ value borrowed here after move

The Rust compiler's message is pretty much self-explanatory on this matter.

To avoid this, you have multiple options. One is to derive the Copy trait as mentioned before. Another option is to clone the variable. You can either write your own clone method or derive the predefined Clone trait and use it.

But these two are not always the right solution. Often the better solution is to use references.

Referencing

Rust allows you to create a read-only or mutable reference to a variable. This allows you to safely access the variable, but remember there can never be multiple mutable references to a variable at once. Rust uses the familiar & notation to create a reference to a variable.

let _name = Age { num: 30 };
let _another = &_name;
let _other = &_name;
println!("{:?}", _name);

The snippet shown above runs completely fine because in this case we are only creating references to the variable _name, and we're allowed to do that. Remember, this won't create multiple copies of the data, and since there isn't an actual value stored in _another, there won't be issues like freeing the same memory twice.

But this isn't the case when you try to create multiple mutable references.

let mut _name = Age { num: 30 };
let _another = &mut _name;
let _other = &mut _name;
println!("{:?} {:?}", _another, _other);

In this snippet you'll get an error as follows...

error[E0499]: cannot borrow `_name` as mutable more than once at a time
 --> src/main.rs:8:17
  |
7 |    let _another = &mut _name;
  |                   ---------- first mutable borrow occurs here
8 |    let _other = &mut _name;
  |                 ^^^^^^^^^^ second mutable borrow occurs here
9 |    println!("{:?} {:?}", _another, _other);
  |                          -------- first borrow later used here

This helps you avoid data races at compile time.

I'm just getting started with Rust, migrating from high-level languages. I would love to hear any suggestions and feedback. Cheers!

Efficient Background Processing in NodeJS with BullMQ — MP4 to HLS.

Har Har Mahadev! — Mon, 03 May 2021 11:31:56 GMT

In one of our recent projects, we had to design a scalable distributed system to process incoming videos and serve those videos efficiently to a large number of concurrent users. Like Netflix, but not as complex from an engineering perspective.

We had two major concerns while designing the system. The first was efficiently converting the video into HLS format, so that the user has a better experience and the video adapts to changes in bandwidth. The other was tracking watch time and recommending other videos to the user.

The second one was a lot easier because the developers in our team had prior experience with recommendation systems in Python and with websockets for tracking everything the user does.

Converting the video into HLS format was a major hassle, as it requires a lot of domain knowledge about video encoding and conversion.

Luckily, ffmpeg acted as a saviour for all of our issues. We could have used node-ffmpeg, but we went with native ffmpeg to keep it simple; as we're going distributed, we may also add an extra ffmpeg endpoint layer in AWS to handle video conversion in and out of AWS S3.

$ ffmpeg -i filename.mp4 -codec: copy -start_number 0 -hls_time 10 -hls_list_size 0 -f hls filename.m3u8

We used this command to convert a file from mp4 format to HLS.

Currently, we have a slightly longer script which re-encodes the video into multiple resolutions for a seamless and lag-free streaming experience.

To get this script executed, we needed a message broker which takes the incoming video and hands it to a consumer that converts it to HLS.

For that problem, we used bull to act as the message broker.

We first created a queue called VideoQueue which routes our messages to the consumer.

Then, we wrote a very simple consumer function which just encodes the raw video file. Keep in mind, if you're planning to do the same, you must validate all the inputs. Don't call native programs blindly, as the file may not be what you expect, and unsanitized input may lead to RCE.

This is our internal video converter which uses the shell script in order to convert the file and save it into the movies folder.

Bull is extremely easy to use and configure.

It just stores your message in Redis and the consumer pulls from it. We've had pretty good response times and conversion results with it so far. We're currently experimenting with other video conversion formats to serve traffic best under a given video bandwidth.
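
The original snippets aren't reproduced here, so as a stand-in, here is a minimal sketch of the "validate your inputs" advice. It is not our production converter: the names buildHlsArgs and convertToHls, the allow-list pattern and the folder names are all assumptions. It reuses the ffmpeg flags from the command above and calls ffmpeg through execFile, which takes the arguments as an array and does not start a shell.

```javascript
const path = require("path");
const { execFile } = require("child_process");

// Assumed allow-list: plain file names ending in .mp4, nothing else.
const SAFE_NAME = /^[A-Za-z0-9_-]+\.mp4$/;

// Build the ffmpeg argument list, or throw if the name looks suspicious.
function buildHlsArgs(uploadDir, inputName, outputDir) {
  if (!SAFE_NAME.test(inputName)) {
    throw new Error(`rejected input file name: ${inputName}`);
  }
  const base = path.basename(inputName, ".mp4");
  return [
    "-i", path.join(uploadDir, inputName),
    "-codec:", "copy",
    "-start_number", "0",
    "-hls_time", "10",
    "-hls_list_size", "0",
    "-f", "hls",
    path.join(outputDir, `${base}.m3u8`),
  ];
}

// A queue consumer could call this for every job it receives.
function convertToHls(uploadDir, inputName, outputDir, callback) {
  const args = buildHlsArgs(uploadDir, inputName, outputDir);
  // No shell is involved, so characters like ; or | are never interpreted.
  execFile("ffmpeg", args, callback);
}

console.log(buildHlsArgs("uploads", "clip.mp4", "movies").join(" "));
```

With an allow-list like this, a file name such as "a.mp4; rm -rf /" is rejected before ffmpeg is ever started.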

Thanks. Too busy these days. No time to write articles.

Clean Way to Work With JWT

Har Har Mahadev! — Wed, 03 Mar 2021 07:55:18 GMT

Working with JWT is a headache, especially when you're starting out building a stateless backend system.

Note: This is not an introduction to JWT (JSON Web Tokens) or a tutorial on implementing JWT from scratch.

In JWT, the token is the most important piece of information for identifying the user in the system. If an attacker is able to see, tamper with or steal the JWT by any means, they'll be able to perform identity theft, for example through XSS / CSRF attacks when the token is not secured properly.

Let's try to approach this problem with whatever comes to mind.

What if we tried to save the token in local storage and send it in header on every request?

There are many things wrong with this approach. One of them is that any JavaScript running in the browser has complete CRUD access to localStorage with a few lines of code. This makes it extremely vulnerable to XSS attacks.

If a potential attacker has access to local storage, he can do whatever he desires while the backend recognizes him as the user the JWT belongs to.

And the hacker will have access forever. Now, what will you do if the user changes their password?

Store it in database.

If you store the credentials in the database and validate whether the password has changed since the last token was issued, it'll help you a little bit. But that's not the point of JWT. The point of token-based authentication like JWT is to keep the authentication system as stateless as possible.

Let's change the thought a little bit. What if the token expires after a short time, and a new token is issued?

What? A user logs in every (short) time?

No, let's not do it that way. It's terrible UX to log in every time. You can't degrade the UX so badly just to make sure a hacker doesn't get access for a long time.

Let's use another token, called a refresh token, which stays alongside the access token but has a longer expiry time. This works, but how will you make sure the refresh token is used to refresh once the access token is done (expired)?

Expose two endpoints

This works, but not for long. A sensible hacker will keep requesting new JWTs forever and stretch his session as long as possible.

Keep in mind, our JWT storage problem is not solved yet.

Covering the expiry part

When you specify an expiry time in a JWT, it won't be a valid token after that time has passed.

Now the UX won't be that bad and the token will keep on refreshing every time, but how do we make sure the client / hacker can't reach that token?

Using Cookies

What kind of cookies? If you store the JWT in plain cookies, that's how 90% of XSS attacks escalate to massive impact.

Using HTTPOnly Cookies

HTTPOnly cookies are cookies which can't be accessed or read by client-side scripts, but are stored on the client and sent with every request. This means the cookie works sort of like a session, but in a completely stateless manner.

The above code issues two tokens which expire after 20 minutes and 7 days, while the cookies can't be read from the client.

This protects us from most attacks.
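
The original snippet isn't reproduced here, so as a rough stand-in, here is a self-contained sketch of the idea using only Node's built-in crypto module. Everything named here (signJwt, verifyJwt, tokenCookie, issueTokenCookies, the cookie names) is an assumption for illustration; in a real project you would more likely reach for a maintained JWT library and your framework's cookie helpers.

```javascript
const crypto = require("crypto");

const b64url = (value) => Buffer.from(value).toString("base64url");

const hmac = (data, secret) =>
  crypto.createHmac("sha256", secret).update(data).digest("base64url");

// Sign an HS256 token that expires after expiresInSeconds.
function signJwt(payload, secret, expiresInSeconds, now = Date.now()) {
  const header = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const exp = Math.floor(now / 1000) + expiresInSeconds;
  const body = b64url(JSON.stringify({ ...payload, exp }));
  return `${header}.${body}.${hmac(`${header}.${body}`, secret)}`;
}

// Return the payload if the signature matches and the token has not expired.
function verifyJwt(token, secret, now = Date.now()) {
  const [header, body, signature] = String(token).split(".");
  if (!header || !body || !signature) return null;
  const given = Buffer.from(signature);
  const expected = Buffer.from(hmac(`${header}.${body}`, secret));
  if (given.length !== expected.length) return null;
  if (!crypto.timingSafeEqual(given, expected)) return null;
  const payload = JSON.parse(Buffer.from(body, "base64url").toString("utf8"));
  return payload.exp * 1000 > now ? payload : null;
}

// Set-Cookie value that client-side scripts cannot read.
function tokenCookie(name, token, maxAgeSeconds) {
  return `${name}=${token}; Max-Age=${maxAgeSeconds}; Path=/; HttpOnly; Secure; SameSite=Strict`;
}

// Access token for 20 minutes, refresh token for 7 days.
function issueTokenCookies(userId, secret, now = Date.now()) {
  const accessTtl = 20 * 60;
  const refreshTtl = 7 * 24 * 60 * 60;
  const access = signJwt({ sub: userId, type: "access" }, secret, accessTtl, now);
  const refresh = signJwt({ sub: userId, type: "refresh" }, secret, refreshTtl, now);
  return [
    tokenCookie("access_token", access, accessTtl),
    tokenCookie("refresh_token", refresh, refreshTtl),
  ];
}

console.log(issueTokenCookies("user-1", "change-me"));
```

The two strings returned by issueTokenCookies are meant to be sent as Set-Cookie headers on the login response; the browser then attaches them to later requests without any script being able to read them.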

Little Help to Developers

This is how we can write a middleware that validates the access token and, if it has expired, tries to refresh it using the refresh token if there is one.

Make sure your CORS configuration allows the frontend to work with HTTPOnly cookies by doing the following

Keep in mind, origin can't be a wildcard when credentials is true.

You can log in this way

Hand-On — Containerizing Development Environment with Visual Studio Code

Har Har Mahadev! — Tue, 16 Feb 2021 06:39:40 GMT

The Problem

I have been having this problem for a few months, and I guess most of you are having the same issue.

Do you develop in multiple languages on your laptop and have trouble managing all the environments and dependencies of your projects?

This is one of the biggest burdens in my current development environment, especially with slow HDD read/write speeds.

Just now, I had issues installing the graphics library files for my development environment, as Pillow in Python needed one version and Node.js needed another. Updating either of them would break the other.

It would be a lot better if I could isolate my programming environments, but I can't run a whole virtual machine just to get isolation.

I mean, I don't need a new OS, I just want to isolate my development programs, tools and codebases. Another benefit is that if you stop working on, say, your Go projects, you can remove the whole development environment with just a few commands, and you're good to go.

Isn't that already convincing enough to try containerizing your development workflow?

Container

A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another.

A container ships as a piece of software but contains everything it depends upon. For example, if a script requires an example.so shared library and a whole g++ installation to run, a container can include all of those and still remain more lightweight than you'd expect!

This is all the magic of Linux namespaces and cgroups. The concept of process isolation was thought of long ago and built into Linux.

So, if you're running a virtual machine, it runs on top of a hypervisor such as VirtualBox, Hyper-V or VMware. Docker containers, on the other hand, run on the same host and share its kernel, but do not share the same PID list and are isolated from each other.

Keep in mind, Docker containers are also built to be secure and leave you with minimal security concerns if you configure them properly.

So, Docker is just a platform which is used to control containers effectively.

Or, quoting from the wiki:

Docker is a set of platform as a service products that use OS-level virtualization to deliver software in packages called containers.

You can research further about docker and containers.

So, when you move your whole development environment inside a Docker container, it's totally isolated from the host operating system, and it can still be managed as a whole from the host operating system via the docker CLI.

Solution

I won't cover installing Docker in this article, but you can look it up here. Installing Docker: https://docs.docker.com/get-docker/

So, in order to move your Visual Studio Code setup into a Docker container, the first thing you need is to choose a Docker image to run.

Keep in mind, a Docker image is the whole software shipped with its dependencies. Running an image creates an isolated container.

The most common ones to use are ubuntu (if you prefer a big community) or archlinux (if you prefer lightweight).

# If you want to run ubuntu, execute this after installing docker.
docker pull ubuntu:20.04
# If you want to run arch, execute this after installing docker.
docker pull archlinux

After you've pulled the image for your development environment, we'll use a volume so it'll be easier for you to share files to and from the container.

# For ubuntu,
docker run -d -it --name ubuntu-dev -v ~/my_files:/my_files ubuntu:20.04 /bin/sh # or /bin/bash
# For arch,
docker run -d -it --name arch-dev -v ~/my_files:/my_files archlinux /bin/sh # or /bin/bash

Now, you can check if your container is running using docker container ps

CONTAINER ID   IMAGE          COMMAND       CREATED          STATUS          PORTS     NAMES
84bc37ed215b   ubuntu:20.04   "/bin/bash"   54 minutes ago   Up 44 minutes             ubuntu-machine

The output should be similar to the one above.

Now, you can open VS Code and install the remote development extension pack.

Install this extension pack if you don't have it already. It allows you to set up VS Code remotely in a container.

Now, you'll see your container listed here. Just click on it and VS Code will prepare a development environment in the container and make it ready for coding.

Now, you can install all your compilers, interpreters and everything else with the package manager, from the shell VS Code provides.

When you need to remove the development environment, just remove the container and you're good to go :)

Automatic Type Conversion in Runtime | Python

Har Har Mahadev! — Wed, 02 Dec 2020 12:33:13 GMT

Hi. I am back with an exciting topic in python.

How many times have you come across functions which automatically convert your values to the required types at runtime?

For example, in Django, whenever you declare types in your route definitions (with path converters such as <int:pk> or <slug:slug>), the URL parameters are automatically parsed and converted before they reach your function.

urlpatterns = [
    path("/", someRandomView, name="home"),
    path("/", someRandomProfile, name="home_profile"),
]

So whenever you define your views, your data types are automatically converted to the required data type.

def someRandomView(slug):
    ...

def someRandomProfile(pk):
    ...

This is a huge convenience in your application: your values are automatically converted according to what the function, or its type hints, say it wants.

Similar behavior is available in FastAPI.

Let's define a simple function.

def getAge(age, name):
    print( f" {age} : {type(age)} , {name} : {type(name)} " )

You can see our types are defined nowhere in this piece of code. We can guess the author wants age to be an int and name to be a str (most probably). But Python doesn't work that way. You can call that function with any values you want.

It would be a lot better if your code automatically gave the developer some idea about data types.

Let's refactor that piece of code a little bit.

def getAge(age: int, name: str):
    print( f" {age} : {type(age)} , {name} : {type(name)} " )

Now, this code is a lot cleaner than the previous version. We still have the same problem, but the hints help a little bit. Your text editor may flag wrong parameter types if you're using a linter or running mypy.

But what we wanted to achieve is automatic type conversion at runtime. So, whenever you call the function, your arguments are automatically converted into the required data types.

Let's try writing a decorator to print what exactly we can extract out of the function definition.

from typing import Callable

def decorator(func: Callable):
    def inner_func(*args, **kwargs):
        # Show what was passed in and what the annotations say.
        print("kwargs:", kwargs)
        print("__annotations__ :", func.__annotations__)
        return func(*args, **kwargs)
    return inner_func

@decorator
def getAge(age: int, name: str):
    print( f" {age} : {type(age)} , {name} : {type(name)} " )

When we call the function using getAge(age=12, name="mahesh"), this is the output we get from the decorator.

kwargs: {'age': 12, 'name': 'mahesh'}
__annotations__ : {'age': <class 'int'>, 'name': <class 'str'>}

Now, this makes sense. Your annotations are stored on the function when it is defined, and they are not changed by the arguments you pass.

So, to achieve automatic type conversion on the function, we can write code that converts each argument to the type mentioned in the function annotations.

#!/usr/bin/env python
from typing import Callable

def decorator(function: Callable):
    def inner_function(*args, **kwargs):
        newKwargs = {}
        # Convert each keyword argument to the type named in its annotation.
        for argName, typeName in function.__annotations__.items():
            if typeName is int:
                newKwargs[argName] = int(kwargs[argName])
            elif typeName is str:
                newKwargs[argName] = str(kwargs[argName])
        return function(*args, **newKwargs)
    return inner_function

@decorator
def getAge(age: int, name: str):
    print( f" {age} : {type(age)} , {name} : {type(name)} " )

And boom, this works!

Whenever you call the function with keyword arguments, each argument is automatically converted to the required type, and if the conversion fails, it raises a ValueError.
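The decorator above only looks at keyword arguments. As a possible extension (a sketch, not part of the original post), `inspect.signature` from the standard library can bind positional arguments to their parameter names too. The names `convert_types` and `get_age`, and the choice to convert only `int`, `float` and `str`, are assumptions for this example.

```python
import functools
import inspect


def convert_types(function):
    """Convert arguments to their annotated type (int, float or str only)."""
    signature = inspect.signature(function)

    @functools.wraps(function)
    def wrapper(*args, **kwargs):
        # Map every positional and keyword argument to its parameter name.
        bound = signature.bind(*args, **kwargs)
        for name, value in list(bound.arguments.items()):
            annotation = signature.parameters[name].annotation
            if annotation in (int, float, str):
                bound.arguments[name] = annotation(value)
        return function(*bound.args, **bound.kwargs)

    return wrapper


@convert_types
def get_age(age: int, name: str):
    return age, name
```

With this version, `get_age("12", name=42)` converts both arguments, whether they are passed by position or by keyword. A value that can't be converted still raises a ValueError from the `int(...)` call.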

Then I researched this binding in React

Har Har Mahadev! — Tue, 27 Oct 2020 03:41:38 GMT

I've been learning JavaScript and the React library in depth for a couple of weeks now, and one thing that keeps bothering me is that event callbacks, or functions passed down to children (generally stateless), somehow magically need to be BOUND to the original class/component.

I asked a couple of people why this is needed and didn't get a satisfactory answer from most of them. Then I started to research what this binding thing in React is and why it is needed.

I read a couple of books and articles, and now that I'm confident enough about classes and objects in JavaScript, I can say I am pretty comfortable with all this.

First, let's clear up some terminology.

Prototypes: This is JavaScript's way of passing attributes/properties on to children. This is its concept of inheritance: whenever JavaScript searches for a method or property that is not found on the child itself, it follows the prototype chain and stops at the first match.

Call Stack: When a function is called, it's pushed onto the call stack, and the JavaScript execution model executes it, passing all the required arguments and properties.

Call Site: The actual place where you call your function is called the call site. By default, it is what gets bound to the called function as this.

Now, let's start with a basic class.

class C {
    constructor(name) { this.name = name }
    thisIsAFunction() {
        console.log(this.name);
    }
}

So, this is what it compiles down to when you convert it into ES5 code.

"use strict";
var C = /*#__PURE__*/function () {
  function C(name) {
    this.name = name;
  }
  var _proto = C.prototype;
  _proto.thisIsAFunction = function thisIsAFunction() {
    console.log(this.name);
  };
  return C;
}();

So, basically, a new function called C is created, its prototype is saved to another variable, and a method called thisIsAFunction is attached to that prototype. Now, when JavaScript goes in search of the method on an instance, it finds thisIsAFunction by following the prototype chain. At the end, the inner C is returned, and since the whole wrapper function is immediately invoked, the result is saved in the outer C.

Okay, let's look at it piece by piece. The this keyword is the main boogeyman in this code and in this article as a whole.

So, there are two things to take care of now, i.e. the call site and the call stack.

function C(name){
  console.log(this);
  console.log(this.name);
}

So, what do you expect the output to be?

Let's visualize the execution model followed by your JavaScript engine to run this piece of code. First, your interpreter executes whatever is in the global scope.

Next, whenever it sees a call to a function, the function is pushed onto the call stack with all its arguments, properties and so on. So, let's say the code is inside the execution of the C function.

The item on top of the call stack is C, and the place where the function is called is global (window or globalThis). Now, this does not evaluate to the function itself but to the place where the function is called, which is also known as the call site.

So, when I call this function, this is attached to the place of the call, i.e. the global scope, or it is undefined if you're in strict mode.

You can go on and try that piece of code, and you will see it print your window (or globalThis in a Node environment), and this.name will be undefined. Now, let's tweak the call a little bit.

var name = 'mahesh'
function C(name){
  console.log(this);
  console.log(this.name);
}
C();

In a browser, running as a non-strict script, it should print mahesh. Now recall what I said earlier: this is bound to the call site. So, your global variable exists on this. You can look at the this object and find name: "mahesh" there.

With that being said, if you modify this.name inside the function, the global variable will be modified too.

Now, let's see what this means when it's an object constructed from a class. This is also called new binding.

class TestClass {
  constructor() { this.name = "ram"; }
  testMethod() { console.log(this.name); }
}

Now, I create an object from this class using new binding and call the method.

const obj = new TestClass();
obj.testMethod();

This will print the expected result, ram.

But keep in mind, JavaScript doesn't bind this at compile time, i.e. testMethod doesn't know that its this should always point to the object once it's detached from its class.

const extractedMethod = obj.testMethod;
extractedMethod(); // TypeError: this is undefined here

So, with that being said, this is not attached to your class or object. It's resolved whenever the function is executed, i.e. in obj.testMethod(), obj behaves as this.

But this is not the case in most other modern programming languages, which confuses programmers coming from those languages.
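For contrast, here is how the same experiment goes in Python, taken as one example of those "other languages". The class mirrors TestClass above, and the names are just for illustration. In Python, reading a method off an instance gives you a bound method that already remembers its object.

```python
class TestClass:
    def __init__(self):
        self.name = "ram"

    def test_method(self):
        return self.name


obj = TestClass()

# Reading the method off the instance creates a *bound* method:
# it carries obj along with it, so no explicit bind step is needed.
extracted_method = obj.test_method

# Reading it off the class gives the plain function,
# and the instance has to be passed in explicitly.
plain_function = TestClass.test_method
```

Here `extracted_method()` still returns "ram", because the object travels with the method. In JavaScript you have to ask for that behavior explicitly.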

Now, we are near the end of this journey.

const extractedMethod = obj.testMethod;

You can do this, it's completely fine, but keep in mind your method is no longer attached to your object. It's sitting freely somewhere in the JavaScript virtual memory world.

const extractedMethodWithThis = obj.testMethod.bind(obj);

What this does is return a version of testMethod that is bound to always use obj as its context.

Even if you pass another object when calling it, the this context will never change from being obj.

// try this
extractedMethodWithThis.call(someOtherObject);

This won't change anything, because the method is bound to use obj as its context whenever it's called.

Now, let's put all this knowledge together. Consider what happens whenever you make a class component in React.

Your piece of code (the method) from your class is not called right on the object. It is extracted from the object and passed along.

// pseudocode for what happens to your handler
const yourComponentHandler = yourComponentObject.handleClick;
react.bindToClicks(yourComponentHandler)

Now, whenever your handler is called, it has lost its context, i.e. this no longer points to your object. So,

YOU BIND IT TO ALWAYS POINT TO YOUR OBJECT !

Trending Phishing Pages.

Har Har Mahadev! — Fri, 23 Oct 2020 15:12:54 GMT

Dashain time. Was just chilling in bed scrolling Facebook and chatting with some of my homies, trolling each other. One friend told me I should never go on a date to Dakshinkali because that place haunts couples. I don't know why I decided to share this lmao, but don't try that, there are some pretty horrifying stories behind my statement. Hahaha, jokes on you single bitches, I want to make you suffer. Aaah nvm, truth tho.

Heads up: 18+ material ahead.

So, getting to the topic. I wanted to talk about the phishing incidents happening recently. I won't say it, but y'all 21 yo fuckbois go to a phishing page to see sabita bhabi having fun with her debar and then complain your account got hacked?

Fuck off.

The real shit here is these phishing pages have such horny titles that even a guy with ejaculation disease or girl phobia won't think twice about whether to click or not. And recently, this phishing stuff got so much better. I just got one from a friend saying "Oh, Mahesh !!! :O :O". Damn, they really went one step further with this move.

Imagine your father, I mean, someone else's father, scrolling Facebook bored, and they see "Oh, Ramkumar :O :O See what these horny couples did when they were alone" with a creepy af thumbnail.

C'mon, if I become a father I'll surely click (if I wasn't techie enough), also when your gf doesn't get in the mood every time, so nyaaaaah. TOP TIER SECRET REVEALED !

This boring ass friend who never texted me for a decade texted me a phishing page, and I laughed at first seeing how legit the video thumbnail looks lmao. Then I decided, hmm, let's see what this shit really does.

Okay, homie. I'll log in, but on one condition: I'll make a new account. Quickly fired up a new account and logged in on the phishing page.

Wait first let me talk about the source code of the phishing page.

This is the page, looks pretty legit, aehi ?

Okay feed me your source code baby girl.

So these were the two JS files responsible for redirecting to a "single moms in my area" page after I log in on this.

I mean, I have phished a lot of people, but honestly, if the thumbnail is seductive and after login I get redirected to "horny mom in my area who doesn't wanna be in a relationship but just have sex", I will think oh okay, that's pretty legit. No idea about you.

Then, it was time for the real test of what these shits do.

Okay, after putting my pants in shape.

I made a new account and added my original account to it. Then I texted some shit between the two accounts as I have no friends :( . Then I found out this thing. I opened the phishing page and logged into my account. Then after 38 minutes this device popped up.

Then I got a mail.

All the Linux sessions are my devices, but who the heck uses Windows? Not me. Never me. So I thought, wait, do they really do this stuff manually? I was expecting some API stuff. Or maybe they use the fbchat module, but whatever, this is how your account gets hacked: tryna be extra horny or extra curious on the internet and putting your credentials everywhere.

Also my account email got changed :( and I'm pretty sure they do this stuff manually. Hope my email doesn't get spammed :(

Don't put your d*k and credentials on the internet without seeing the domain

I mean, the message I want to convey with this article is that phishing is increasing rapidly, and lastly I want to show you how to identify these pathetic bitches who spend whole days writing Facebook page clones and trying to grab some accounts, either for business or adware or whatever.

Check the URL. Facebook never grabs your password on a third-party site to log you into Facebook. If nothing else, just remember this: if the domain in your URL isn't facebook.com or doesn't end with .facebook.com, don't put your facebook.com credentials there.

They even sent a text to myself about my text hahaha. That was funny, but if it's your real account, it's pretty sad to get links to adult videos in your inbox.

x86–32bit function calls in assembly explained.

Har Har Mahadev! — Sun, 30 Aug 2020 05:09:10 GMT

Hello there. In this article I will explain how function calls work in assembly from the stack's point of view, i.e. how the stack grows and shrinks when functions are called and when they return.

The prerequisite for this article is:

int sub(int a, int b){
    return a - b;
}

int main(){
    sub(8,2);
    return 0x1337;
}

You must be able to understand this piece of code and explain what it does as a C programmer. If you know it in assembly too, that's cool, but understanding it as a C programmer is a must.

int main() {
00C21020  push        ebp
00C21021  mov         ebp,esp
00C21023  mov         ecx,offset _5B900AD5_example1@c (0C25000h)
00C21028  call        __CheckForDebuggerJustMyCode (0C21050h)
    sub(8,2);
00C2102D  push        2
00C2102F  push        8
00C21031  call        sub (0C21000h)
00C21036  add         esp,8
    return 0x1337;
00C21039  mov         eax,1337h
}
00C2103E  pop         ebp
00C2103F  ret

This is what the code above compiles down to at the assembly level. I'm talking about only the main function. sub is not included here.

So before we move on to analyzing this code, I want you to understand these things.

Registers

Registers are small storage areas which live inside the processor, i.e. they are volatile memory. There are 10 registers in a 32-bit x86 processor (including the flags register and the instruction pointer), though this may vary from processor to processor. And as the name suggests, all of these registers can store values up to 32 bits wide.

Some registers are explained below.

eip (extended instruction pointer): This register stores the address of the next instruction the processor will execute.
ebp (extended base pointer): This register stores the base address of the current stack frame, i.e. the address on the stack relative to which the currently executing function reads its data. All other memory is treated as undefined (well, technically it may be defined), and the function should not access it.
esp (extended stack pointer): This register points to the end of the stack, i.e. the last used memory address of the stack. Whenever a new item is pushed onto the stack, esp is automatically decremented to point to that item.

These are other common registers which are helpful to know. One more thing to know: the uses below are conventions. It's okay for a compiler not to follow them, but following them helps fellow devs (and us reverse engineers) understand what the code does.

eax (extended accumulator register): This register is generally used to store function return values, i.e. return 3; is equivalent to mov eax,3.
ecx (extended counter register): As the name suggests, it's used to count indexes in loops and string operations.

Okay, I mentioned the instruction pointer but not the flags register. Basically, after each operation these flags get either set or cleared based on the result. For example, there's a flag called SF (sign flag), which is 0 if the result of the operation is positive and 1 if it is negative. Each flag is a single bit (1 or 0) of the eflags register, and each bit has its own name.

This is confusing, so I attached one image below.

Some refreshers to assembly instructions.

mov

This instruction is used to move data from register to register, memory to register, or register to memory. Keep in mind, a memory to memory move can't be done.

mov eax,3 ; move 3 to register eax.
mov eax,ebx; move value from ebx to register eax
mov [eax], ebx; move value from ebx to memory address stored in the register eax.

But this is not allowed.

mov [eax] , [ebx] ; memory to memory data flow is not allowed.

So to copy from one memory location to another, you have to go memory -> register -> memory.

push

This is a very simple instruction which moves a value onto the stack and decrements the stack pointer to point to the new value.

push 3; push 3 onto the stack and decrement the stack pointer by 4 bytes, as integers are 4 bytes.
push ebp; push the value of ebp onto the stack and decrement the stack pointer, as above.

pop

This removes the last inserted element from the stack and saves it in a register, also incrementing esp by 4.

pop ebp; remove last value from stack and put it in the ebp register.

add / sub

Basically just addition and subtraction on register values or values in memory.

add eax, 3; add 3 to the current value of eax.
sub eax, 3; subtract 3 from the current value of eax.

call and ret

These instructions have a slightly more complicated job to do. call needs to jump into another piece of code, a function, but control must be able to continue from where it was when that other code block executes ret.

So first of all, the address of the next instruction to execute after the function call is pushed onto the stack, and then eip is pointed at the function's first instruction. Whenever the callee returns using the ret instruction, the top value is popped off the stack, and that value is the address of the next instruction to execute.
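The push / pop / call / ret mechanics described above can be modelled in a few lines of Python. This is only a toy sketch: the `Machine` class, the starting `esp` value of 0x1000 and the dictionary used as memory are assumptions for illustration. The instruction addresses are the ones from the disassembly of main shown earlier.

```python
class Machine:
    """Toy model of esp, eip and a stack that grows toward lower addresses."""

    def __init__(self):
        self.memory = {}       # address -> 32-bit value
        self.esp = 0x1000      # hypothetical initial stack pointer
        self.eip = 0

    def push(self, value):
        self.esp -= 4          # stack grows downward, 4 bytes per item
        self.memory[self.esp] = value

    def pop(self):
        value = self.memory.pop(self.esp)
        self.esp += 4
        return value

    def call(self, target, return_address):
        self.push(return_address)  # remember where to continue afterwards
        self.eip = target          # jump to the callee's first instruction

    def ret(self):
        self.eip = self.pop()      # continue right after the call


m = Machine()
m.push(2)                                       # push 2
m.push(8)                                       # push 8
m.call(0x00C21000, return_address=0x00C21036)   # call sub
esp_inside_sub = m.esp
m.ret()                                         # ret (from sub)
m.esp += 8                                      # add esp,8 cleans up both arguments
```

After the `ret`, `eip` holds 0x00C21036, the address of the `add esp,8` instruction that follows the call in main.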

Now we'll see some stack diagrams to understand how function calls work.

A stack frame is the part of the stack that is accessible to the current function.
First of all, I want you to know that except for global memory and dynamically allocated memory, all other variables are stored on the stack, i.e. local variables are always stored on the stack. Now, if this is our main function, the stack will look like this.

All the local variables you define in the function are pushed on top of the stack. Then there's something called caller-saved registers, which we won't discuss here. After that, the arguments are pushed onto the stack from right to left, just like this, in the code block above.

006B102D push 2
006B102F push 8
006B1031 call sub (06B1000h)

The hex value on the left is the instruction address, and the values on the right are the assembly instructions. As we called sub(8,2), you can see the arguments are pushed in right-to-left order.

If we look at the complete source code, we see this first.

int main() {
006B1020 push ebp
006B1021 mov ebp,esp

At this point, ebp is still the base pointer of whatever function called our main function. Our main function needs to save the caller's base pointer and start a new stack frame at the current stack pointer. That is, the caller's ebp is pushed onto the stack and ebp is set to esp, so that this function addresses its stack relative to the current stack pointer and doesn't modify the previous function's values. The saved ebp is popped at the end, so that the calling function gets its own stack frame back after this function has done its job.

Now we have a clear understanding of the main function. What the sub function does is essentially this

int sub(int a, int b) {
00051001 push ebp
00051002 mov ebp, esp
    return a - b;
0005100D mov eax,dword ptr [a]
00051010 sub eax,dword ptr [b]
}
00051013 pop ebp
00051014 ret

So it's creating its own stack frame by pushing the previous stack base onto the stack and pointing the base pointer at the new stack frame. As I said previously, eax is responsible for carrying the return value back to the previous function. You can see the mov and sub instructions just leave a-b in the eax register.

Then, at the end, ebp is popped off the stack so the caller gets its stack limits back, and control returns to the next instruction in the caller function.

Reference: IntroX86. Creator: Xeno Kovah (@XenoKovah). License: Creative Commons: Attribution, Share-Alike (opensecuritytraining.info)

Professional Assembly Language, Richard Blum.

Journey of nand2tetris — building own computer from principal gates

Har Har Mahadev! — Mon, 24 Aug 2020 14:12:04 GMT

Hey everyone. It's been a few weeks since I dug into this book. The book covers how a modern computer is built using just logic gates. Let me talk about the book's structure first. Basically, you build simple logic gates in the first chapter. In the next one you take the gates you built for granted and develop an ALU out of them. In the chapter after that you add memory, or sequential logic, to your computer, and so on. You keep abstracting away what you've learnt and only worry about usage, not implementation, in the upcoming chapters.

You may be wondering how this is possible to do at home (just like I did). Well, the authors have done a pretty good job for learners, as they provide all sorts of simulation tools so you can run and simulate every gate and every piece of logic on your computer. And they put projects in every chapter so you take away some knowledge through hands-on practice.

We are provided with an HDL (Hardware Description Language), a very simple language which lets you write the specification and implementation of sophisticated chips and simulate them, which is what modern engineers in this field do. It looks like this

CHIP Something {
    IN a, b;
    OUT out;
    PARTS:
    Xor(a=a, b=b, out=out1);
    ......
    And(a=out1, b=out1, out=out);
}

This is how they laid it out, c being a chapter level.

So, talking about what I've done so far: I have been following the book and also a video course on Coursera made by the authors of the book, which basically explains the material if you're a hardcore visual learner, kinda like me. [ all links at the bottom ]

So far I've completed all the elementary gates using the universal NAND gate. Then I made the demultiplexer and the negator, which were quite complex, as I had to frequently jump back into my digital logic book to understand them. Also, k-maps were a plus point for me, to quickly derive the boolean functions to implement using gates.

Moving on, I got into building the half adder and full adder, and also virtually implementing a subtractor, which is basically addition but in 2's complement form.
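To show the idea without spoiling the HDL projects, here is a small Python model (not HDL and not the course solutions) that builds everything from a single `nand` function and then subtracts by adding the 2's complement. The function names, the 8-bit width and the least-significant-bit-first lists are all choices made for this sketch.

```python
def nand(a, b):
    return 0 if (a and b) else 1


# Elementary gates, each built only from nand.
def not_(a):
    return nand(a, a)


def and_(a, b):
    return not_(nand(a, b))


def or_(a, b):
    return nand(not_(a), not_(b))


def xor(a, b):
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


def half_adder(a, b):
    return xor(a, b), and_(a, b)          # (sum, carry)


def full_adder(a, b, c):
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, c)
    return s2, or_(c1, c2)                # (sum, carry)


def add_bits(x, y, carry_in=0):
    """Ripple-carry add of two equal-width bit lists (least significant bit first)."""
    out, carry = [], carry_in
    for a, b in zip(x, y):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out                            # final carry is dropped (fixed width)


def subtract_bits(x, y):
    """x - y computed as x + (not y) + 1, i.e. adding the 2's complement of y."""
    return add_bits(x, [not_(b) for b in y], carry_in=1)


def to_bits(n, width=8):
    return [(n >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))
```

For example, `from_bits(subtract_bits(to_bits(9), to_bits(5)))` gives 4, and subtracting the other way round gives 252, which is -4 in 8-bit 2's complement.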

After that, I was able to build a full ALU which is capable of taking an instruction and computing the output. I never thought it would be that clear, although it is a very basic one.

So, if anyone is interested, here is my ALU's instruction set.

I decided not to upload any code, as interested learners may find it a spoiler. I've put the links below if you are interested.

Coursera : https://www.coursera.org/learn/build-a-computer/
Coursera part 2 : https://www.coursera.org/learn/nand2tetris2/
Book : C'mon, you can search for the PDF anywhere on the internet. You can also buy the book if you want to support the creators.

Walkthrough of Dropbox Security CTF — Android Problem

Har Har Mahadev! — Tue, 18 Aug 2020 09:31:13 GMT

Lately I am interested in reverse engineering as much as Achal Sharma is interested in Udip Shrestha. So, I completed the DIVA vulnerable app and many other Android reversing courses on YouTube. And I planned to try one real problem. Not a real one tho, it was a CTF challenge I found on GitHub and planned to try.

So, I downloaded the APK from the link provided above, quickly set up my Android device with adb and ran the app. The app greeted me with a screen which says guess the flag.

I tried blank inputs, tested very long inputs and got no luck. It always said "Nope, that's not it."

Then I quickly checked adb logcat to see if the app was leaking any logs which would give me the flag. But no luck. I had to decompile the app now and do static analysis.

I quickly fired up jadx and opened my APK there. This is the flag checking function I found.

This is too complicated, and I would need to spend 3 to 4 hours to find the flag by analyzing this code.

So basically, if you're not a Java developer, what this does is: whenever you type anything in that field and press the button, it checks if the string you provided equals the string returned by the function i(). And now our search is very narrow but deep.

In order to find the correct flag, we need to find the string returned by that function. So, if you started doing the maths on that, well, good luck with that.

Trust me, I tried.

Then I got an idea. If this application is comparing this string with my string, what if I can log this string out? So I decompiled my app using apktool.

apktool d FlagApp.apk

Now, my FlagApp folder had all the code decompiled into smali files, which can be analyzed and also modified. Then I got a plan. I will inject code just above the return statement to log out the string which is about to be returned by the i function. This is the instruction which logs out a string in smali.

invoke-static {v0, v1}, Landroid/util/Log;->v(Ljava/lang/String;Ljava/lang/String;)I

Here, v0 and v1 need to be two strings, which act as the log tag (title) and the log message. So, I need to put my string here and the application will log the flag out for me.

Here, you can see line 235 is converting my byte array into a string and saving it into the string instance v0. Now I need to log this v0 out. It doesn't matter to me if the log title and log message are the same.

So, I added the highlighted line, which basically logs the value in v0 out to the Android logger, which I can read from logcat.

After editing this piece of code, I recompiled and signed the app using these commands, which are pretty straightforward.

apktool b -d application -o unsigned.apk

Now the app is built, but I also need to sign it, so I created a keystore to sign my application with.

keytool -genkey -v -keystore key.keystore -alias alias_name -keyalg RSA -keysize 2048

After creating a keystore, time to sign my apk with it.

jarsigner -sigalg SHA1withRSA -digestalg SHA1 -keystore key.keystore unsigned.apk alias_name

Now my APK is signed, but before installing it we should zipalign it, so that its uncompressed data is aligned on 4-byte boundaries. (When signing with jarsigner, zipalign is run after signing.)

zipalign -v 4 unsigned.apk final.apk

Now, I am ready to install my app. And did it too :D

adb install final.apk

Now I opened the app and run adb logcat with grep in threadLoop because threadLoop occurs so frequently its impossible to see any other logs.

so I did this.

#/ adb shell130|OP486C:/ $ logcat | grep -v threadLoop

This prints all log output except the lines containing threadLoop. Now I opened the app, entered some random inputs and checked the logs.

See the last output: it's from my apk, as the log title and the log message are the same.
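If the log is saved to a file, the same filtering, plus spotting the line where the title equals the message, could be done in Python. This is a sketch that assumes logcat's threadtime layout (date, time, pid, tid, level, tag, then the message); other logcat formats would need a different pattern, and the function names are made up.

```python
import re

# Assumed layout: "08-01 12:00:00.123  4321  4321 V SomeTag: some message"
THREADTIME = re.compile(
    r"^\d\d-\d\d \d\d:\d\d:\d\d\.\d+\s+\d+\s+\d+\s+[VDIWEF]\s+(.*?)\s*: (.*)$"
)


def drop_noise(lines, noise="threadLoop"):
    """Same effect as `grep -v threadLoop`."""
    return [ln for ln in lines if noise not in ln]


def tag_equals_message(lines):
    """Return the messages of lines whose tag and message are identical."""
    hits = []
    for ln in lines:
        m = THREADTIME.match(ln)
        if m and m.group(1).strip() == m.group(2).strip():
            hits.append(m.group(2).strip())
    return hits
```

Running `tag_equals_message(drop_noise(lines))` on the captured lines would leave only the entries written by the injected call.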

I checked in the app whether it's the correct one, and yes :D That's the flag :P

how i made a ridiculously simple but fun bot — with fbchat

Har Har Mahadev! — Sat, 01 Aug 2020 06:26:44 GMT

Hello hahaha.

It's ridiculous that I am writing an article about a very simple bot which just texts randomly, yet so many people chatted with it and didn't notice hahaha. Some people got triggered though, but who cares xd. Pornhub wouldn't exist if its founder had run his startup idea past his family, right? hahaha.

So, I talk with a lot of girls (I mean, ofc) and when I get bored all I text is hora (roughly "is that so?") and haina hola ("probably not") all the time. Long ago I wrote a pyautogui program which captures new text and types eh ok aru vana ("oh ok, tell me more"). hahaha that was also so fun. I triggered many girls.

The idea was relatively simple: I would capture the chatting person's profile picture and, for every new message, check whether a new instance of it exists on screen (because Facebook renders the avatar next to the text). The code was something like this.

But this was too simple, and checking the image was too redundant and boring, as I needed to register every new photo and my code broke whenever someone changed their profile picture. So I checked the Facebook docs to see if they offer any messaging API.

They did offer one, but it was a Node.js SDK. I do write JavaScript, but I don't enjoy it, idc what you say. I just don't enjoy writing CLI scripts in Node.js. I prefer Python.

So, I found this amazing module called fbchat hahaha, with the best docs for a personal project out there. (Link at the bottom.)

I wrote a simple bot which just replied with hora and haina hola randomly.

Something like this, and running it was so easy: just initialize your client with your credentials, and write some code to keep your email and password out of the source, as you don't wanna expose them on your GitHub hahaha.
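A minimal sketch of such a bot, assuming the fbchat 1.x API (Client, onMessage, Message, send, listen). The original listing is not shown here, so the class name, the function names and the FB_EMAIL / FB_PASSWORD environment variables are all made up for illustration.

```python
import os
import random

# Short stand-in for the reply list.
REPLIES = ["hora", "haina hola"]


def pick_reply(words=None, rng=random):
    """Pick one reply at random."""
    return rng.choice(words or REPLIES)


def run_bot():
    # fbchat is third-party, so it is imported only when the bot really starts.
    from fbchat import Client
    from fbchat.models import Message

    class RandomReplyBot(Client):
        def onMessage(self, author_id, message_object, thread_id,
                      thread_type, **kwargs):
            if author_id == self.uid:
                return  # do not answer our own messages
            self.send(Message(text=pick_reply()),
                      thread_id=thread_id, thread_type=thread_type)

    # Credentials come from the environment, not from the source file.
    client = RandomReplyBot(os.environ["FB_EMAIL"], os.environ["FB_PASSWORD"])
    client.listen()
```

Calling `run_bot()` after exporting the two variables would log in and start listening.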

Then, I made a wordlist of like 200 words and it sent a random one from that list hahaha.

Let me add some screenshots.

hahahaha.

Then people said I should add some commands to this.

Then I first added -s- to start the bot and -x- to stop the bot. The mechanism now was: I would record the thread id, and if the message is one of those commands, it would save / remove their id from the thread id list, and the bot would reply only to those whose ids are on the thread list. Something like this:

Pretty straightforward, right? hahaha. And my send-text code was just checking if the incoming message belonged to a thread on my list.
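The start/stop mechanism can be sketched without any Facebook code at all. The class and method names below are hypothetical, and only the -s- and -x- commands come from the post.

```python
START, STOP = "-s-", "-x-"


class ThreadRegistry:
    """Keeps the ids of threads that have switched the bot on."""

    def __init__(self):
        self.active = set()

    def handle(self, thread_id, text):
        """Update the registry and return True if the bot should reply."""
        text = text.strip()
        if text == START:
            self.active.add(thread_id)
            return False
        if text == STOP:
            self.active.discard(thread_id)
            return False
        return thread_id in self.active
```

Inside the message handler, the bot would then send a random reply only when `handle(thread_id, text)` returns True.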

Then, I took it one step further.

Now the bot sends kanda (Nepali slang for porn) hahaha. The mechanism is that it just forwards a kanda video from one kanda group. I ditched this later as I don't wanna get into any legal trouble.

I added meme: when the message is -meme, it will scrape Reddit and send you a meme.
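One way the meme lookup might work is through Reddit's public JSON listing for a subreddit. This is an assumption, not necessarily how the bot scraped it: the subreddit name, the User-Agent string and the function names are made up, and Reddit may rate-limit or reject unauthenticated requests.

```python
import json
import random
import urllib.request

IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png", ".gif")


def fetch_listing(subreddit="memes", limit=50):
    """Download a subreddit's hot listing as a dict (needs network access)."""
    req = urllib.request.Request(
        f"https://www.reddit.com/r/{subreddit}/hot.json?limit={limit}",
        headers={"User-Agent": "chat-bot-sketch/0.1"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def image_urls(listing):
    """Collect the post URLs that point directly at an image."""
    urls = []
    for child in listing.get("data", {}).get("children", []):
        url = child.get("data", {}).get("url", "")
        if url.lower().endswith(IMAGE_SUFFIXES):
            urls.append(url)
    return urls


def random_meme_url(listing, rng=random):
    """Pick one image URL at random, or None if the listing has none."""
    urls = image_urls(listing)
    return rng.choice(urls) if urls else None
```

The bot would call `random_meme_url(fetch_listing())` and send the resulting URL to the thread.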

I also added joke: when the message is -joke, it sends you a random joke from one random API I found on the internet.

I also added voice messages: if you message something like -say {hello}, it will say hello in a voice message using Google TTS.

So the code was like this now..
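As a rough illustration only (not the bot's actual listing), the command parsing could look like the sketch below. The function names are hypothetical, and the gTTS package is an assumption for the "google tts" part.

```python
import re

# "-say {hello}" or "-say hello"; the braces are optional.
SAY_PATTERN = re.compile(r"^-say\s+\{?(.+?)\}?$")


def parse_command(text):
    """Return (command, argument), or (None, None) for ordinary chat text."""
    text = text.strip()
    if text in ("-meme", "-joke"):
        return text[1:], None
    m = SAY_PATTERN.match(text)
    if m:
        return "say", m.group(1).strip()
    return None, None


def synthesize(text, path="say.mp3"):
    """Write `text` as speech to an mp3 file (assumes the gTTS package)."""
    from gtts import gTTS  # third-party, imported only when needed
    gTTS(text=text, lang="en").save(path)
    return path
```

The message handler would call `parse_command` first and fall back to the random reply when it returns `(None, None)`.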

The code is dead simple, like 100 to 150 lines, but it was so fun.

Github: https://github.com/geekyarthurs/chat-bot

Docs of fbchat: https://fbchat.readthedocs.io/en/stable/api.html