Farshid PirahanSiah
Supporting Startups in Berlin
As part of my commitment to fostering innovation, I am offering one free on-site consultation session to startups in Berlin. This session focuses on computer vision and large language model (LLM) solutions, tailored to help startups kickstart their AI and machine learning projects.
How I Can Help:
- Computer Vision Solutions: From object detection to video analytics, I can guide startups in implementing state-of-the-art vision-based systems.
- LLM Applications: Whether it’s chatbots, document summarization, or multimodal AI, I provide actionable insights to integrate LLMs effectively into your workflows.
- Strategy and Roadmap: I help you identify potential AI-driven opportunities and create a roadmap for successful implementation.
Eligibility:
- Berlin-based startups.
- First session is on-site and free of charge.
Interested startups can reach out via LinkedIn
Let’s build the future of AI together!
About Me
I am an accomplished Research Engineer with over 7 years of experience, including a PhD in Computer Science and a Bachelor’s in Software Engineering. My career has been dedicated to the fields of Computer Vision, Machine Learning, and ML Operations. I have a proven track record of transforming complex algorithms into production-ready applications, utilizing cloud technologies and containerization, and optimizing ML pipelines and infrastructure in fast-paced environments.
With expertise in Machine Learning, IoT, Medical Imaging, and Robotics, I am proficient in designing image analysis algorithms and have made significant contributions to patents, books, and research papers. As a Technical Lead and Research Engineer, I have collaborated with stakeholders to define and achieve KPIs, managed cross-functional team projects, and deployed scalable AI solutions across cities and cloud platforms.
Experience
- 10+ years: PhD, R&D in Computer Vision, C++
- 7+ years: Machine Learning/Deep Learning, Python
- 5+ years: IoT, Model Optimization on Edge, Robotics, Medical Image Processing, Cloud Solutions (AWS)
- 3+ years: Technical Lead, Global Collaboration, Development Leadership
- 1+ years: LLM, Multimodal LLM, Vision-Based LLM, RAG, Langchain, OpenAI API
Core Skills
- Computer Vision and AI: Advanced expertise in image processing, deep learning model development, and application deployment.
- Project Management: Experienced in leading complex, cross-functional projects internationally.
- Technological Proficiency: Skilled in OpenCV, NumPy, Pandas, Matplotlib, PyTorch, TensorFlow, Docker, and AWS.
Technical Skills
- Languages: Python, C++, MATLAB
- Frameworks: TensorFlow, PyTorch, Keras
- Tools: Docker, Kubernetes, AWS, Git, Datadog, MLflow
- Operating Systems: Windows, MacOS, Linux
- Containerization: Docker
- Cloud Computing: Amazon Web Services (AWS)
- Hardware Architectures: x86, Apple Silicon, ARM, Jetson Nano, Raspberry Pi
- Others: CI/CD, GitHub Actions
Leadership and Collaboration
Strong teamwork and leadership capabilities have enabled me to lead international projects that exceed business expectations, driving growth and technological innovation in computer vision and machine learning.
Feel free to connect with me through LinkedIn or follow my updates on GitHub. I am always interested in discussing potential opportunities or collaborations in computer vision and AI-related projects.
Patents and Publications
Dr. Farshid Pirahansiah List of all publications
- Patents: A METHOD FOR AUGMENTING A PLURALITY OF FACE IMAGES - 2021
- The present invention relates to a method for increasing data for face analysis in video surveillance.
- WO2021060971A1
- Patents: A METHOD FOR DETECTING A MOVING VEHICLE - 2021
- The present invention relates to a method for detecting a moving vehicle.
- WO2021107761
- Patents: System and method for providing advertisement contents based on facial analysis - 2020
- Invented an algorithm, methods, and system for advanced facial attribute detection, leading to improvements in advertising systems.
- WO2020141969A2 WIPO (PCT)
- Book Chapter: Camera Calibration and Video Stabilization for Robot Localization, Springer, 2021.
- Authored over 16 publications in books, journals, and conferences globally.
-
My Google Scholar citation metrics are: Total Citations 141, h-index 7, i10-index 6
- My podcast
- LinkedIn Group over 55K members
- Facebook Group 15K
Professional Profiles and Networks
Coding Challenges and Competitions
Academic Contributions and Publications
Project Repositories and Code Sharing
Social Media and Community Engagement
Content Creation and Sharing
Learning and Development Platforms
Reading and Literature
List of My GitHub Repositories:
1. BI4CV
Business Intelligence (BI) tools for media content (Image/Video) - Generative AI Business Intelligence Computer Vision (BI4CV)
Repository Link: BI4CV
Description: This project is dedicated to revolutionizing how businesses utilize image and video data for insightful Business Intelligence (BI). Our tools are designed to enhance data storytelling through advanced visualizations, interactive dashboards, and comprehensive reports. Our system smartly selects the optimal visual representations based on the complexity of your dataset, making analytics accessible to all users.
Key Features:
- Advanced Visualization Tools
- Smart Dashboard Creation
- Anomaly Detection
- Local LLM Integration
- User-Friendly Interface
Update: 2024 - May
- Add 8 Blind/Referenceless Image Quality methods
- Add CI/CD Github Action
- Add Documents for the project plan
2. 3D Multi-Camera Calibration
- Repository Link: 3D Multi-Camera Calibration
- Overview: This project focuses on geometric camera calibration techniques to estimate the parameters of a lens and image sensor. Such calibration is crucial for applications in machine vision, robotics, navigation systems, and 3-D scene reconstruction.
- Research Links:
3. Advanced Programming with Modern C++ 23 for Image Processing
- Repository Link: C++ Image Processing
- Functionality: This repository contains advanced C++ code examples for image processing tasks. The main function,
int func_image_info(cv::Mat src, cv::Mat &dst)
, provides detailed information about an image including size, histogram, and more. - Additional Resources: YouTube tutorial on OpenCV
3. LLM
The collection of Python scripts provides a range of functionalities: one script automates logging into KaggleHub and setting up a pretrained Gemma model for chat simulations, another builds a GUI for real-time OpenCV function testing using PyQt5, while a third manages an asynchronous chat application with aiohttp. Additionally, there’s a script integrating machine learning models for data analysis using advanced libraries like langchain, another launching AI-powered chat applications, and one demonstrating interactions with natural language understanding models on HuggingFace using LLMware. There’s also a script using MLflow to manage the machine learning lifecycle and another detailing the local setup of Kubernetes via Terraform, showcasing infrastructure management and resource cleanup. These scripts employ a variety of technologies including Python, Gemma, PyQt5, OpenCV, aiohttp, asyncio, langchain, transformers, MLflow, Kubernetes, Terraform, and Docker, useful for tasks ranging from machine learning to software deployment. #MachineLearning #SoftwareDevelopment
- Source Code
- This script logs into KaggleHub, downloads a pretrained Gemma model and tokenizer, sets up the model, and enables interactive chat simulation. ## Libraries: os, sys, torch, kagglehub, gemma_pytorch
- Source Code
- This Python script uses PyQt5 to create a GUI application for real-time testing of OpenCV functions on images. ## Libraries: sys, PyQt5, cv2, numpy, screeninfo
- Source Code
- his Python script defines an asynchronous chat application that uses the aiohttp library to interact with a chat model API, handling concurrency with semaphores and maintaining conversation history. ## Libraries: asyncio, aiohttp, collections, json, re
- Source Code
- LLMOps & RAG: This Python script integrates various machine learning models and APIs to process financial data, interactively analyze text, images, and tables, and generate structured outputs. It employs libraries like langchain, transformers, and torch, alongside environmental variables for secure API key handling.
- Source Code
- Launch your chat app with OpenRouter’s AI! 🚀 Utilize asyncio and aiohttp for seamless conversations and manage interactions with a smart queue. Dive into the future of chat applications now!”
-
- This Python script demonstrates how to use the LLMware library to interact with various models hosted on HuggingFace for natural language understanding tasks. It performs a specific query about an invoice using a provided context and compares the response from the model with a pre-defined answer. The script uses time measurement to track model loading and processing times.
- Source Code
- This Python script uses MLflow, a platform for managing the machine learning lifecycle, including experimentation, reproducibility, and deployment. It demonstrates how to log parameters, metrics, and artifacts within an MLflow experiment. Specifically, it logs a parameter named “param1” with a value of 5, logs multiple values for a metric called “foo,” and records a markdown file as an artifact.
-
-This script outlines the setup and use of Kubernetes on a local machine using Terraform, along with tools like Docker and Kubernetes command-line interface (CLI) utilities, all managed through Homebrew on macOS. It demonstrates the installation of required software, setting up Kubernetes with Terraform, querying the Kubernetes cluster, and visualizing Terraform plans. Finally, it guides through cleaning up resources with Terraform. This sequence ensures a practical approach to infrastructure as code (IaC) development and testing in a controlled, local environment.
-llamafile
- LLaMA
- Ollama
- LLava
- OpenELM
Generative AI Development
On-Device Training & Edge Inference
- Optimization: On-device training
- Inference: Real-time LLMs
- Configuration: LLM tuning
- Resources: Power & speed balance
End-to-End Solutions
- Cloud-Based: AI pipeline development
- Deployment: Data to scaling
Multimodal LLMs
- Integration: Image & video processing
Large Vision Models (LVMs)
- Optimization: For edge computing
- Innovations: IoT responsiveness
Technical Expertise in Large Language Models (LLMs) and AI Development:
- Multimodal RAG Systems: Led the development of Retriever-Augmented Generation (RAG) applications integrating text, image, and structured data, enhancing multimodal interaction capabilities.
- Advanced AI Pipelines: Engineered end-to-end solutions for generative AI, leveraging cloud-based architectures to deploy scalable and efficient AI systems.
- Deep Learning Implementation: Proficient in implementing complex deep learning models, with extensive use of libraries such as PyTorch, OpenAI’s GPT models, and langchain for sophisticated text and image processing tasks.
- Data Handling and Processing: Experienced in manipulating large-scale datasets, implementing custom extraction and partition techniques for PDF data integration, utilizing Python’s robust libraries like PyPDF2 and pytesseract for OCR functionalities.
- Optimization Techniques: Applied advanced machine learning techniques including hyper-parameter tuning, quantization, and model compression to enhance performance and efficiency on target hardware platforms, particularly in edge computing scenarios.
- AI Model Deployment: Skilled in deploying AI models using Docker, managing environments with dependencies including langchain, unstructured, PyPDF2, and various OpenAI services, ensuring smooth transition from development to production.
- Research and Development: Authored comprehensive documentation and guides, effectively summarizing research findings and technical processes, demonstrated through detailed GitHub repositories and Jupyter notebooks.
- AI-Powered Summarization: Developed capabilities for summarizing diverse data elements (text, tables, images) using AI-driven approaches, significantly improving information accessibility and user engagement.
- Community Contribution and Collaboration: Actively engaged in community forums and collaborative projects, contributing to open-source projects and providing innovative solutions to complex problems in the AI space.
I am eager to bring my expertise to a dynamic team in Berlin, where I can contribute to groundbreaking projects and further advance the field of artificial intelligence.
My works on LLMs:
- Image Processing GPT
- GPTs: Computer Vision Developer
- Expert in Python, OpenCV for image processing and computer vision applications.
- MindMap about LLMs & LLMOps
- Code for chat app with OpenRouter’s AI! 🚀 Utilize asyncio and aiohttp for seamless conversations and manage interactions with a smart queue. Dive into the future of chat applications now!”
- fine-tune LLMs
- Microsoft AI Lab: RAG Workflow with Azure AI
- Lab Focus: Hands-on RAG workflow development using Azure AI Studio and Prompt Flow.
- Skills Acquired: Mastery in LLMOps, Azure AI Studio usage, and Prompt Flow integration.
- Tools Used: GitHub Codespaces, Visual Studio Code, Azure AI & ML Studio, Azure Portal.
- Outcome: Successfully developed and deployed “Contoso Chat”, enhancing skills in scalable AI solution development.
- Multimodal Conversational Interfaces with GPT and Vision AI
LLM and Generative AI Focused Courses
- Hands-On Generative AI with Diffusion Models: Building Real-World Applications
- Generative AI and LLMOps: Building Blocks and Applications
- Advanced AI: Transformers for Computer Vision
- Large Language Models (LLM) · Generative AI
- MLOps Tools: MLflow and Hugging Face
- GGUF-Quantization-of-any-LLM for IoT devices
- ollama pull llama3:8b
Tools and models I use for LLM
- OpenAI API
- langchain
- Hugging face
- Ollama
- LM Studio
- open-interpreter
- LLaVA: Large Language and Vision Assistant