MOHIT KUMAR
Technical Consultant and Trainer
"I am fascinated with Computer Science, mathematics, Universe and stuff. I like to understand things and the way they work under-the-hood. Occasionally, I like to explain the things that I understand using the first principle's approach. More formally I am a Researcher, Trainer, and a design consultant on the design of Artificial Intelligence(deep learning)based systems. Microarchitecture based optimizations are my specialization, more specifically P5(Intel) to Skylake(Intel). On the vector side, SIMD, GPUs(Nvidia). My micro-architecture knowledge enable me to see the complete stack. Close to 20 years in total experience, last 5 years I have been working on Optimizing Tensorflow and Models on Tensorflow on GPUs/CPUs. A general example of micro-optimization on Haswell microarchitecture."
Artificial Intelligence Research (DEEP-Breathe)
I feel that the barrier to entry for Deep Learning is very steep. Consider Natural Language Processing as an example: Neural Machine Translation uses concepts like LSTMs, bidirectional LSTMs, multi-layered LSTMs, attention, and so on. None of these is easy to understand on its own; imagine the plight of a student when they are all strung together into a Neural Machine Translation system or one of Google's BERT-based systems. I have seen Neural Machine Translation based systems grossly underperform simply because most of the hyperparameters were not understood at all.
DEEP-Breathe is a complete, pure Python implementation of the most complex of these models, including but not limited to a Neural Machine Translator.
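DEEP-Breathe itself is pure Python; purely to show how small one of these building blocks is once unpacked, here is a rough sketch of Luong-style dot-product attention (written in Java, with made-up names): score each encoder state against the current decoder state, softmax the scores, and take the weighted sum of the encoder states as the context vector.

```java
// Rough, illustrative sketch of dot-product ("Luong") attention.
// All names are made up; DEEP-Breathe implements this in pure Python.
public final class AttentionSketch {

    // decoderState: [d], encoderStates: [T][d]  ->  context vector: [d]
    static double[] attend(double[] decoderState, double[][] encoderStates) {
        int T = encoderStates.length;
        int d = decoderState.length;

        // 1. Alignment scores: dot product of the decoder state with each encoder state.
        double[] scores = new double[T];
        for (int t = 0; t < T; t++) {
            double s = 0.0;
            for (int k = 0; k < d; k++) {
                s += decoderState[k] * encoderStates[t][k];
            }
            scores[t] = s;
        }

        // 2. Softmax over the scores (subtract the max for numerical stability).
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) max = Math.max(max, s);
        double sum = 0.0;
        double[] weights = new double[T];
        for (int t = 0; t < T; t++) {
            weights[t] = Math.exp(scores[t] - max);
            sum += weights[t];
        }
        for (int t = 0; t < T; t++) weights[t] /= sum;

        // 3. Context vector: attention-weighted sum of the encoder states.
        double[] context = new double[d];
        for (int t = 0; t < T; t++) {
            for (int k = 0; k < d; k++) {
                context[k] += weights[t] * encoderStates[t][k];
            }
        }
        return context;
    }

    public static void main(String[] args) {
        double[][] enc = { {1, 0}, {0, 1}, {1, 1} };
        double[] dec = { 0.5, 0.5 };
        System.out.println(java.util.Arrays.toString(attend(dec, enc)));
    }
}
```

Bahdanau-style (additive) attention and the multi-head attention inside BERT compute the scores differently, but the score-normalize-mix shape is the same.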
Software
We provide training and consulting on AI, and more specifically on Deep Learning, and can work with your in-house team; we can also work in DevOps mode to design, develop, and maintain the AI solution. We deliver value by understanding your use cases and providing end-to-end strategy and implementation. It is very important to use open-source tools and stay away from proprietary AI platforms: AI is still a moving target, and by locking into an AI platform you risk wasting a lot of time and resources when new AI technologies appear. We only use open-source tools and help you build AI and data-science platforms which are platform-agnostic and easy to maintain and support. Having done extreme optimization (fine-tuning) and research on NMT/BERT/XLNet-based architectures, we are uniquely positioned to:
- install, configure, and optimize vanilla (Google's) NMT/BERT/XLNet on your on-premise or cloud hardware;
- install Stillwater's pretrained and more finely tuned NMT/BERT/XLNet on your on-premise or cloud hardware;
- build an AI platform for automated trading and operational optimization (automated L1/L2/L3 support) for an investment banking major, using the above technology stack.
Concurrency
My main interests are techniques for designing, implementing, and reasoning about multiprocessor algorithms, in particular concurrent data structures for multicore machines and the mathematical foundations of the computation models that govern their behavior.
My research these days is directed at the use of randomness and combinatorial techniques in concurrent algorithm and data structure design.
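For a concrete flavour of the kind of structure this covers, below is a minimal, illustrative sketch of the classic lock-free (Treiber) stack for the JVM, built around a single compare-and-swap on the top-of-stack reference; the names are mine and the code is a sketch, not production-grade (no backoff, no elimination).

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal sketch of the classic lock-free (Treiber) stack.
public final class TreiberStack<E> {

    private static final class Node<E> {
        final E item;
        Node<E> next;
        Node(E item) { this.item = item; }
    }

    private final AtomicReference<Node<E>> top = new AtomicReference<>();

    public void push(E item) {
        Node<E> newHead = new Node<>(item);
        Node<E> oldHead;
        do {
            oldHead = top.get();
            newHead.next = oldHead;
        } while (!top.compareAndSet(oldHead, newHead));  // lost the race: retry
    }

    public E pop() {
        Node<E> oldHead;
        Node<E> newHead;
        do {
            oldHead = top.get();
            if (oldHead == null) return null;            // empty stack
            newHead = oldHead.next;
        } while (!top.compareAndSet(oldHead, newHead));  // lost the race: retry
        return oldHead.item;
    }
}
```

The retry loop is the whole trick: a push or pop that loses the race simply re-reads the new top and tries again, so no thread ever blocks another.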
JVM Tuning and Optimization
Design, tuning, and optimization of JVM-based applications. More often than not, tuning is not purely a software job: it requires knowing the entire stack right down to the hardware, along with the tools that can pinpoint the real issue. Besides the usual profilers, the tools I specialize in are perf, SystemTap, DTrace, the Solaris Studio Performance Analyzer, JMH, JCStress, etc.
Moreover, most Java CPU profilers have little visibility into what is happening beyond the JVM, so using the right profiler, with minimal overhead, is key. PMCs (Performance Monitoring Counters) are special CPU registers that count low-level hardware events such as cycles, instructions, and cache misses; tools like perf and SystemTap can sample these events together with call stacks and turn the samples into a flame graph (figure 2) for inspecting which calls are the bottlenecks.
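For the in-JVM half of the picture, a JMH microbenchmark is the usual starting point. The skeleton below only shows the general shape of such a benchmark; the benchmarked loop and the class name are placeholders, not a real workload.

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

// Skeleton of a JMH microbenchmark; the measured code is a placeholder.
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5)        // let the JIT reach steady state before measuring
@Measurement(iterations = 5)
@Fork(1)
@State(Scope.Thread)
public class SumBenchmark {

    private double[] data;

    @Setup
    public void setup() {
        data = new double[1 << 16];
        java.util.Arrays.fill(data, 1.0);
    }

    @Benchmark
    public double sum() {
        double s = 0.0;
        for (double v : data) {
            s += v;
        }
        return s;              // returning the result keeps dead-code elimination away
    }
}
```

JMH forks a fresh JVM, warms the code up until the JIT has settled, and only then measures, which is precisely the overhead control that ad-hoc System.nanoTime() timing loops lack; perf with call-graph sampling and a flame graph, as described above, covers everything below the JVM.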
Ultra Low Latency Design and Architecture
With an intimate knowledge of the hardware, especially the CPU (mostly the x86 family) and the GPU, I specialize in the design and architecture of ultra-low-latency software.
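A recurring hardware-level concern in this space is false sharing: two unrelated hot fields that land on the same cache line force that line to bounce between cores on every write. The sketch below shows the classic manual-padding workaround on the JVM; the class is illustrative only, and since field layout is ultimately up to the JVM, production code often prefers superclass-based padding (as the LMAX Disruptor does) or the JDK-internal @Contended annotation.

```java
// Illustrative sketch: a single-writer counter padded so that its hot field
// does not share a cache line with neighbouring data (Disruptor-style padding).
public final class PaddedCounter {

    // Padding before the hot field (a cache line is typically 64 bytes).
    long p1, p2, p3, p4, p5, p6, p7;

    // The hot field: written by exactly one thread, read by others.
    private volatile long value;

    // Padding after the hot field.
    long q1, q2, q3, q4, q5, q6, q7;

    // Safe only under the single-writer principle: one thread does the
    // read-modify-write, everyone else only reads the volatile.
    public void increment() {
        value = value + 1;
    }

    public long get() {
        return value;
    }

    // Referencing the padding fields discourages the JIT from treating them as dead.
    long preventOptimisation() {
        return p1 + p2 + p3 + p4 + p5 + p6 + p7
             + q1 + q2 + q3 + q4 + q5 + q6 + q7;
    }
}
```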