Peyman Farahani

About Me

Swimming
Skiing
Santoor
Gaming
Movies

Experience

Mar 2024 - Present: Business Intelligence Developer, Riton
Jan 2023 - Aug 2024: Business Intelligence Developer, SnappTrip
Aug 2022 - Jan 2023: Commercial Data Analyst, SnappTrip
Aug 2020 - Apr 2021: Data Scientist & Business Process Analyst, Saipa

Activities

Director of Finance

Amirkabir University Of Technology Scientific Association Of Industrial Engineering

Member

Andeesheh Toastmasters Club

Band Leader

"Tanin" Music Band

My Skills

Software: PostgreSQL, SQL Server, SSIS, SSAS, Microsoft Excel, Microsoft Power BI, Tableau, Apache Airflow, Git, Microsoft Access, Weka, IBM SPSS & SPSS Modeler
Programming Language: Python, SQL, Spark, Bash

My Projects

May 2024

Person Detector Model

This project involves a Flask API designed to detect persons in images using the YOLOv8 model. The API processes image URLs submitted via POST requests and returns a JSON response indicating whether persons are detected

The project leverages the YOLOv8 model, integrated into a Flask-based web service to facilitate person detection. The API endpoint "/detect" accepts a JSON payload containing a list of image URLs. It uses YOLOv8 for inference, detecting persons in the images. The results are processed and returned as a JSON response, where each image URL is paired with a boolean indicating the presence of persons. The service is optimized to handle multiple images and provides real-time detection results, showcasing an efficient approach to object recognition in images.
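
As a rough illustration of the response shape described above, the sketch below maps each image URL to a boolean. It deliberately leaves out Flask and the YOLOv8 inference call: `build_response` and the per-image lists of class IDs are hypothetical stand-ins for the model output, and the only assumption about the model is that it uses the COCO label set, where class 0 is "person".

```python
from typing import Dict, Iterable

# Assumption: the detector uses COCO labels, where class ID 0 is "person".
PERSON_CLASS_ID = 0


def build_response(detections: Dict[str, Iterable[int]]) -> Dict[str, bool]:
    """Map each image URL to True if any detected class is a person.

    `detections` is a hypothetical stand-in for model output: for each URL,
    the class IDs of all objects found in that image.
    """
    return {
        url: any(class_id == PERSON_CLASS_ID for class_id in class_ids)
        for url, class_ids in detections.items()
    }
```

In a real service this dictionary would be what the "/detect" endpoint serializes to JSON.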

Feb 2024

Time Series Forecasting for Net Margin and RoomNights

This project focused on building a robust forecasting system for net margin and room-night sales using a blend of traditional time series models and machine learning techniques.

The approach began with extensive data preprocessing, including time-indexed data alignment and cleaning to ensure consistency. Exploratory analysis was conducted to identify trends, seasonality, and residual components using decomposition techniques. SARIMAX was selected as the primary model based on its ability to capture both trend and seasonal variations. In addition, Prophet and machine learning models were implemented to benchmark results. Comparative visualizations of predicted versus actual values demonstrated SARIMAX’s efficacy, while alternative models provided complementary insights. The forecasts were finalized, visualized, and exported to CSV files for further business analysis and decision-making.

Feb 2024

PostgreSQL Function and View DDL Extractor

This script automates the extraction and organization of Data Definition Language (DDL) source code for user-defined functions and views from a PostgreSQL database

The project involves two Python scripts that connect to a PostgreSQL database and retrieve the DDL for functions and views. The "extract_function_ddl.py" script extracts function definitions, organizes them into directories based on schemas, and saves them as .sql files, handling invalid characters in filenames and long names. Similarly, the "extract_view_ddl.py" script performs a parallel task for view definitions. Both scripts ensure proper organization of extracted DDL files and handle potential exceptions during the process, providing a robust solution for database documentation and analysis.
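
The file-naming step mentioned above (schema directories, invalid characters, long names) could look roughly like the sketch below. It is not taken from the original scripts: the `ddl_path` helper, the set of replaced characters, and the 100-character limit are all assumptions made for illustration.

```python
import re
from pathlib import Path

MAX_NAME_LENGTH = 100  # Assumed limit; the original scripts' limit is not stated.


def ddl_path(root: str, schema: str, object_name: str) -> Path:
    """Return root/schema/<sanitized name>.sql for a function or view."""
    # Replace characters that are invalid in file names on common systems.
    safe = re.sub(r'[<>:"/\\|?*\s]+', "_", object_name).strip("_")
    # Truncate overly long names; fall back to a placeholder if nothing is left.
    safe = safe[:MAX_NAME_LENGTH] or "unnamed"
    return Path(root) / schema / f"{safe}.sql"
```

For example, `ddl_path("out", "public", "calc total/amount?")` yields a path ending in `public/calc_total_amount.sql`.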

Jan 2024

Failed Tax Resending Automation

This project automates the process of identifying and resending failed tax records

The project involves a two-step process executed via PySpark and Apache Airflow. The "update_tax_failed_null_response.py" script utilizes PySpark to extract data from PostgreSQL databases, identify failed tax records, and store them in a dedicated table for further processing. The "tax_daily_resend_failed_null_response_dag.py" script, orchestrated by Apache Airflow, schedules the execution of the PySpark script and subsequently manages the resending of the failed records by interacting with the database. The automation ensures timely updates and accurate handling of failed tax records, improving the reliability and efficiency of tax data processing.

Nov 2023

Hotel Location Data Extraction and Reverse Geocoding

A Python project focused on extracting hotel location data from a PostgreSQL database and applying reverse geocoding to obtain address information

This project retrieves latitude and longitude data from a PostgreSQL database, leveraging SQL queries through the pandas library. It extracts geographical coordinates of hotels and then uses the OpenStreetMap API (Nominatim) to perform reverse geocoding, converting these coordinates into human-readable address formats. The final output is a structured dataset of hotel IDs paired with their respective latitudes, longitudes, and corresponding addresses, providing valuable location-based insights.
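
A minimal sketch of the reverse-geocoding call, using only the standard library rather than whatever client the project actually used. The helper names are hypothetical; the endpoint and the `lat`, `lon`, and `format` parameters are those of the public Nominatim reverse API, which also expects an identifying User-Agent and a low request rate.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

NOMINATIM_REVERSE = "https://nominatim.openstreetmap.org/reverse"


def reverse_url(lat: float, lon: float) -> str:
    """Build a Nominatim reverse-geocoding URL for one coordinate pair."""
    query = urlencode({"lat": lat, "lon": lon, "format": "jsonv2"})
    return f"{NOMINATIM_REVERSE}?{query}"


def reverse_geocode(lat: float, lon: float, user_agent: str) -> str:
    """Return the human-readable address for a coordinate (network call)."""
    request = Request(reverse_url(lat, lon), headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        return json.load(response).get("display_name", "")
```

Applied row by row to the hotel coordinates, `reverse_geocode` would fill the address column of the final dataset.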

Feb 2023

Cron Task Scheduler

A comprehensive toolset for scheduling tasks and identifying non-overlapping intervals between them within a specified timeframe

The project involves several Python scripts for managing cron job schedules and visualizing them. The "data_preprocessor.py" script consolidates input datasets into a unified output, categorizing jobs and renaming columns for consistency. "gantt_chart_generator.py" creates Gantt charts to visually represent job schedules using Plotly, displaying job timings over user-defined intervals. "cron_schedule_lookup.py" parses cron schedules to generate datetime lists for tasks within specified intervals and date ranges. Finally, "cron_task_scheduler.py" refines task scheduling by accounting for free intervals and optimizing crontab schedules to minimize unassigned times.
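
The idea of finding free (non-overlapping) intervals inside a timeframe can be sketched as a single sweep over the busy intervals sorted by start time. This is an illustration under assumed inputs, not the code of "cron_task_scheduler.py"; `free_intervals` and the tuple representation are hypothetical.

```python
from datetime import datetime
from typing import List, Tuple

Interval = Tuple[datetime, datetime]


def free_intervals(busy: List[Interval], start: datetime, end: datetime) -> List[Interval]:
    """Return the gaps inside [start, end] not covered by any busy interval."""
    gaps: List[Interval] = []
    cursor = start  # Earliest moment not yet known to be busy.
    for job_start, job_end in sorted(busy):
        if job_end <= cursor:
            continue  # Job ends before the uncovered region begins.
        if job_start >= end:
            break  # Sorted by start, so no later job falls inside the window.
        if job_start > cursor:
            gaps.append((cursor, job_start))
        cursor = max(cursor, job_end)
    if cursor < end:
        gaps.append((cursor, end))
    return gaps
```

Overlapping jobs are handled implicitly, because the cursor only ever moves forward to the latest end time seen so far.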

Feb 2023

Airflow Dataset Preprocessing

This project involves the extraction and preprocessing of Apache Airflow DAG data for detailed runtime analysis

This project consists of two main components: "airflow_dag_scraper.py" and "airflow_dataset_preprocessing.py". The "airflow_dag_scraper.py" script extracts DAG metadata from the Airflow UI using session-based authentication, generating a CSV file (airflow_dags.csv) containing DAG IDs and their schedules. The "airflow_dataset_preprocessing.py" script processes this data along with Airflow log files to compute runtime statistics, such as average and maximum runtimes, and failed run metrics. The processed results are consolidated into a summarized report saved as dags.csv. The project utilizes libraries like pandas and BeautifulSoup for data manipulation and web scraping, and it generates insights into DAG performance to facilitate further analysis.
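
The runtime statistics described above might be computed along these lines. The sketch uses plain Python instead of pandas, and the record shape `(dag_id, runtime_seconds, succeeded)` is an assumption, as is the choice to compute runtimes over successful runs only.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

# Hypothetical record shape: (dag_id, runtime in seconds, succeeded?)
Run = Tuple[str, float, bool]


def summarize_runs(runs: Iterable[Run]) -> Dict[str, dict]:
    """Compute average/max runtime of successful runs and failed-run counts."""
    durations: Dict[str, list] = defaultdict(list)
    failures: Dict[str, int] = defaultdict(int)
    for dag_id, seconds, succeeded in runs:
        values = durations[dag_id]  # Ensures DAGs with only failures appear.
        if succeeded:
            values.append(seconds)
        else:
            failures[dag_id] += 1
    return {
        dag_id: {
            "avg_runtime": mean(values) if values else None,
            "max_runtime": max(values) if values else None,
            "failed_runs": failures.get(dag_id, 0),
        }
        for dag_id, values in durations.items()
    }
```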

Jan 2023

HTML Table to Markdown Table Converter

This script converts HTML tables into Markdown tables while preserving text formatting and HTML tags where necessary

The project involved creating a Python script that uses the BeautifulSoup library to parse HTML table structures and transform them into Markdown format. The script processes both header and data rows, and prints HTML attributes that cannot be converted as they are.
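
To make the conversion concrete, here is a simplified sketch. It swaps BeautifulSoup for the standard library's `html.parser` so that it runs without dependencies, keeps only the text of each cell (inline tags are dropped rather than preserved), and treats the first row as the header. The names `TableParser` and `html_table_to_markdown` are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class TableParser(HTMLParser):
    """Collect the text of each table cell, row by row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td") and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def html_table_to_markdown(html: str) -> str:
    """Convert a simple HTML table to Markdown; the first row is the header."""
    parser = TableParser()
    parser.feed(html)
    # Escape pipes so cell text cannot break the Markdown column layout.
    rows = [[cell.strip().replace("|", "\\|") for cell in row] for row in parser.rows if row]
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    lines = ["| " + " | ".join(header) + " |", "|" + " --- |" * len(header)]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)
```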

Dec 2022

Cron Gantt

This project involved developing a script to visualize job schedules using a Gantt chart, with input data in a pandas DataFrame

The script processes job scheduling data from a pandas DataFrame containing job names, crontab schedules, categories, and durations. It converts crontab schedules into a series of start and end times, and aggregates these into a Gantt chart using Plotly. The chart visualizes job schedules within a user-defined interval, displaying job timings and categories clearly. The output is an interactive Gantt chart saved as an HTML file, which helps users analyze job schedules effectively and is displayed for immediate review.
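
Converting a crontab schedule into start and end times can be sketched as follows. This is a deliberately reduced illustration, not the project's parser: it reads only the minute and hour fields (day-of-month, month, and day-of-week are ignored), supports `*`, `*/n`, ranges, and comma lists, and the function names are hypothetical. The Plotly charting step is omitted.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def expand_field(field: str, low: int, high: int) -> List[int]:
    """Expand one cron field supporting '*', '*/n', 'a-b', 'a-b/n' and lists."""
    values = set()
    for part in field.split(","):
        base, _, step = part.partition("/")
        if base == "*":
            start, stop = low, high
        elif "-" in base:
            start, stop = (int(x) for x in base.split("-"))
        else:
            start = stop = int(base)
        values.update(range(start, stop + 1, int(step or 1)))
    return sorted(values)


def daily_runs(cron: str, day: datetime, duration_min: int) -> List[Tuple[datetime, datetime]]:
    """Return (start, end) pairs for one day, using only minute and hour fields."""
    minute_field, hour_field = cron.split()[:2]
    runs = []
    for hour in expand_field(hour_field, 0, 23):
        for minute in expand_field(minute_field, 0, 59):
            start = day.replace(hour=hour, minute=minute, second=0, microsecond=0)
            runs.append((start, start + timedelta(minutes=duration_min)))
    return runs
```

Each `(start, end)` pair corresponds to one bar on the Gantt chart for that job.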

Dec 2022

Proxy Extractor

A tool designed to retrieve free proxies from ProxyHub.Me and verify their connectivity status

This project involves the development of an asynchronous Python script that extracts free proxy lists from ProxyHub.Me across multiple pages using aiohttp for efficient HTTP requests. The script employs BeautifulSoup for parsing the HTML content and organizing the proxy data into a structured pandas DataFrame. It then checks the operational status of each proxy by attempting to access a test URL, logging proxies that are responsive and removing those that fail. The results are saved in a CSV file for easy access and future reference. This approach ensures up-to-date and functional proxy lists for use in various applications.
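
The concurrent status check can be sketched with plain `asyncio`. To keep the example dependency-free, the actual HTTP request is abstracted behind a `check` coroutine that the caller supplies (in the project this role is played by an aiohttp request to a test URL); `filter_working` and the concurrency limit of 20 are assumptions.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List


async def filter_working(
    proxies: Iterable[str],
    check: Callable[[str], Awaitable[bool]],
    max_concurrency: int = 20,
) -> List[str]:
    """Run `check` on every proxy concurrently and keep those that pass."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(proxy: str) -> bool:
        async with semaphore:
            try:
                return await check(proxy)
            except Exception:
                return False  # Treat any connection error as a dead proxy.

    proxy_list = list(proxies)
    results = await asyncio.gather(*(guarded(p) for p in proxy_list))
    return [proxy for proxy, ok in zip(proxy_list, results) if ok]
```

The semaphore caps how many checks run at once, so a long proxy list does not open hundreds of connections simultaneously.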

Dec 2022

Python Main Menu

A template to refer to when creating a main menu in Python scripts

The Python Main Menu project is a versatile template designed to create a user-friendly menu system for Python applications. It includes two primary functions: FeatureMenu and MainMenu. The FeatureMenu function displays a list of features with toggles for their activation status, while the MainMenu provides an entry point for navigating to different application sections or exiting. The menu options are controlled through user input, with clear feedback on invalid choices. This template demonstrates the use of modular functions to manage user interaction and application flow efficiently, offering a clear structure for expanding menu options and features in Python scripts.

Oct 2022

reCAPTCHA Solver

A script designed to automatically solve reCAPTCHA v2 audio challenges

This project involved creating an automated solution for solving reCAPTCHA v2 audio challenges using a combination of Selenium WebDriver and third-party services. The script accepts a URL and a reCAPTCHA site key, then interacts with the reCAPTCHA widget to trigger and retrieve the audio challenge.

Oct 2022

Hotel Guarantee Code Checker

A Python-based tool for ensuring accuracy and consistency in hotel guarantee codes by cross-referencing data from two different platforms

This project involves creating a Python script that accepts a specific date and scrapes guarantee code information from both the hotel panel and the admin panel. It compares the data to identify any discrepancies between the two platforms. The program generates a detailed .csv file highlighting these inconsistencies, helping hotel administrators ensure that guarantee codes are labeled correctly. The tool can also generate optional tables, providing additional insights depending on the user's needs. This project significantly reduced manual checking time, improved data accuracy, and contributed to more reliable hotel booking processes.

Oct 2022

SnappTrip International Flight Crawler

A simple program for crawling international flight data, allowing users to specify dates and destinations, and outputting the data as a CSV file

This project involves a Python-based web scraping tool designed to collect B2C and international flight information from Snapptrip.com. The program accepts dates as input and uses a customizable mapping file to specify destinations. It leverages asynchronous requests (using aiohttp) to efficiently scrape flight data, processes the raw data using Pandas for cleaning and formatting, and then exports the information into a CSV file. The tool's user interface is built using Tkinter, enabling users to select the directory containing the destination mappings. The final output is a neatly organized CSV file containing detailed flight information, including prices, departure and arrival times, and airline details.

Oct 2022

B2B Flight Crawler

This project involves a suite of crawlers designed to extract flight data from various B2B flight providers and a comparison tool to analyze and compare the prices

The project is composed of several modules, each tailored to crawl data from specific flight providers, including Snapptrip, Altrabo, and Parto. The crawlers, implemented using Python, leverage asynchronous processing to efficiently gather flight information. The data is then cleaned and formatted for consistency, and supplementary crawlers are used to handle any missing information. The final step involves a comparison module that merges the datasets from Snapptrip and Altrabo, removing duplicates and identifying the best prices across providers. The output is a comprehensive dataset that provides detailed price comparisons, helping stakeholders make informed decisions based on the most accurate and competitive flight pricing available.
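
The comparison step (merge, remove duplicates, keep the best price) can be illustrated with a small dictionary-based sketch. The record shape `(provider, flight_key, price)` and the `best_prices` helper are hypothetical; the real module works on the cleaned crawler datasets.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical record: (provider, flight key, price)
Offer = Tuple[str, str, float]


def best_prices(offers: Iterable[Offer]) -> Dict[str, Tuple[str, float]]:
    """For each flight key, keep the provider offering the lowest price."""
    best: Dict[str, Tuple[str, float]] = {}
    for provider, flight_key, price in offers:
        # Duplicates collapse here: only a strictly lower price replaces an entry.
        if flight_key not in best or price < best[flight_key][1]:
            best[flight_key] = (provider, price)
    return best
```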

Oct 2022

Free RoomNight Lookup

A program designed to retrieve and export detailed information about hotel bookings using booking codes

This project involves a tool that accepts hotel booking codes as input and scrapes relevant booking information from a designated website. The program logs in securely, processes the booking codes, and attempts to retrieve data from both B2C and B2B booking databases. The extracted data includes details such as hotel name, city, booking and checkout dates, room type, and pricing information. The processed information is then neatly organized and exported into a CSV file, providing an easy-to-use format for further analysis or record-keeping.

Aug 2021

Credit Card Fraud Detection Using Supervised Learning Methods

Seven supervised machine learning methods were tested on 2 million credit card transactions to find the best fraud detection model.

Abstract: Today, most e-commerce transactions are made through credit cards and online banking. With the spread of e-commerce and the advancement of technology, credit card transaction fraud has also increased, and many companies face huge losses every year. Detecting financial fraud has therefore become one of the most important tasks.

In this study, a data set of simulated credit card transactions from the Kaggle website was used. After preprocessing, the SMOTE method was applied to address the imbalance in the data set. Seven supervised machine learning methods were then used to detect fraudulent transactions: Gaussian Naïve Bayes, K-Nearest Neighbors, Support Vector Machines, Decision Trees, Random Forests, AdaBoost, and Bagging.

After their performance was evaluated using the Matthews Correlation Coefficient and the Area Under the ROC Curve, the Random Forest model, with average scores of 0.65 and 0.98 respectively, was selected as the best model for predicting fraudulent transactions.
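
For readers unfamiliar with the Matthews Correlation Coefficient, the sketch below computes it from confusion-matrix counts. It is a generic illustration of the metric, not the study's evaluation code, and the example counts in the note that follows are made up.

```python
from math import sqrt


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the score is 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0
```

The metric suits imbalanced data: a model that labels every one of 1,000 transactions as legitimate when 10 are fraudulent reaches 99% accuracy, yet `matthews_corrcoef(tp=0, tn=990, fp=0, fn=10)` returns 0.0.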

Feb 2021

Spotify Music Taste Discovery

A project focused on analyzing Spotify's "Discover Weekly" playlist to uncover personal music genre tendencies

I analyzed Spotify's "Discover Weekly" playlist to uncover personalized music preferences. By extracting and pre-processing the playlist data, I was able to generate insights into the user's musical inclinations. The final output was a visually engaging radar chart that clearly illustrated the dominant music genres, providing a comprehensive overview of individual music tastes.

Jul 2021

Strategic Planning for SAIPA

"Strategic Management and Planning" course project

We chose SAIPA because it has been an active and ambitious company for many years. Despite the shortcomings and difficulties of reviewing domestic industries, we emphasized domestic analysis and cases so that the audience would have a better sense of the brands, their positions, and the competition between them, as well as a clearer view of Iran's automotive landscape.

Jun 2021

Digikala Distribution Centers Dashboard

A dashboard was created for the largest e-commerce company in Iran, showing all of the company's logistics information for 2020.

The dashboard presents information on all of Digikala's distribution centers.

Jun 2021

Simulating a System of Three Part Making Machines Using Python

"Principles of Simulation" course project

The system consists of three separate machines working in a workshop. All events and the relationships between the machines were defined and coded in Python. The simulation covers ten days of 9 hours each.

May 2021

Business Plan: Planting & Cultivating Pitaya in South East of Iran

"Entrepreneurship and Business Planning" course project

Abstract: Drought and water shortages, together with a lack of capital in the agricultural sector, have left the agricultural lands of the south-eastern regions of Iran barren and driven farmers to migrate to the surrounding cities. Meanwhile, "Sistan and Baluchestan" province has tremendous potential for the development of the agricultural sector and the planting of tropical fruits.

In this project, among all tropical fruits, the Pitaya plant was selected as the most suitable for planting and cultivation in the agricultural lands around the central river of "Sistan and Baluchestan" province, owing to its abundant nutrients, resistance to dehydration, adaptation to hot and dry climates, and ability to grow in arid and semi-arid regions. Because it grows rapidly and, once the seedlings are fully grown after the first year, can bear fruit 5 to 6 times a year, the plant offers a return on investment in less than a year, which makes investing in it highly economical. The main idea of this project is to plant and cultivate Pitaya and to prepare processed products from its fruit, which is known as Dragon Fruit.

The distinctive goal and strategy of the project is to develop local participation and create employment for villagers, especially women. To achieve this goal, contracts and partnerships with farmers will contribute to the net profit of the project. The project intends to use a drip irrigation system, and this innovation in irrigation is considered one of its advantages.

Dec 2020

Financial Database

"Modeling and Database" course project

Designed a sample database in SQL Server for the "Finance Department of SAIPA" that stores sales and returns information. The sample data was imported from Microsoft Access.

Oct 2020

TSE Stocks Analysis

Tehran stock market analysis performed on all available stock tickers.

The data was obtained from the official TSE (Tehran Stock Exchange) website using a data scraping tool developed specifically for TSE ticker data. (An updated version of this dataset is published on my Kaggle profile; the link is provided below.) Using machine learning methods, each ticker's behavior over the following 7 days was predicted. This prediction yields a suggestion to buy, sell, or hold the stock. Dataset link: https://www.kaggle.com/paymanfara/tse-stocks-data

Sep 2020

Vehicle Specification Clustering

Using the k-means algorithm with the aid of PCA, a segmentation was performed based on the specifications of all cars available in Iran's market.

Due to changes in the price of cars in the Iranian market and also the removal of some brands from the market, a new classification and segmentation based on current information from the market and the vehicle specifications was done using machine learning methods.

Aug 2020

SAIPA Financial Dashboard

A dashboard was created for the largest automobile company in Iran, showing all of the company's financial information between 2015 and 2020.

During my internship at the "Center of Strategic Studies of SAIPA", I was assigned to complete one of the center's unfinished projects. This project was to design a management dashboard of financial information of SAIPA subsidiaries using Microsoft Power BI. The information in this dashboard includes the financial statements of SAIPA and its subsidiaries from 2015 to 2020, which were collected by the Finance Department. After the completion of the project, the results were presented to the senior management of the center during a working meeting.

Dec 2019

Control Charts for Saipa-X200 Auto Parts

Control charts for Saipa-X200 auto parts in 2019, using three control approaches

Using data from November 2019 provided by the "Department of Quality Control of SAIPA", quality control charts for Saipa-X200 auto parts were drawn in Microsoft Excel (using three different approaches), and their analysis is presented using Minitab.

Contact Me
