About
I always thought my dream…
Experience & Education
Google
Publications
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
NeurIPS 2024
Autonomous agents that accomplish complex computer tasks with minimal human interventions have the potential to transform human-computer interaction, significantly enhancing accessibility and productivity. However, existing benchmarks either lack an interactive environment or are limited to environments specific to certain applications or domains, failing to reflect the diverse and complex nature of real-world computer use, thereby limiting the scope of tasks and agent scalability. To address this issue, we introduce OSWorld, the first-of-its-kind scalable, real computer environment for multimodal agents, supporting task setup, execution-based evaluation, and interactive learning across various operating systems such as Ubuntu, Windows, and macOS. OSWorld can serve as a unified, integrated computer environment for assessing open-ended computer tasks that involve arbitrary applications. Building upon OSWorld, we create a benchmark of 369 computer tasks involving real web and desktop apps in open domains, OS file I/O, and workflows spanning multiple applications. Each task example is derived from real-world computer use cases and includes a detailed initial state setup configuration and a custom execution-based evaluation script for reliable, reproducible evaluation. Extensive evaluation of state-of-the-art LLM/VLM-based agents on OSWorld reveals significant deficiencies in their ability to serve as computer assistants. While humans can accomplish over 72.36% of the tasks, the best model achieves only 12.24% success, primarily struggling with GUI grounding and operational knowledge. Comprehensive analysis using OSWorld provides valuable insights for developing multimodal generalist agents that were not possible with previous benchmarks. Our code, environment, baseline models, and data are publicly available at https://os-world.github.io/.
OpenAgents: An Open Platform for Language Agents in the Wild
COLM 2024
Language agents show potential in being capable of utilizing natural language for varied and intricate tasks in diverse environments, particularly when built upon large language models (LLMs). Current language agent frameworks aim to facilitate the construction of proof-of-concept language agents while neglecting the non-expert user access to agents and paying little attention to application-level designs. We present OpenAgents, an open platform for using and hosting language agents in the wild of everyday life. OpenAgents includes three agents: (1) Data Agent for data analysis with Python/SQL and data tools; (2) Plugins Agent with 200+ daily API tools; (3) Web Agent for autonomous web browsing. OpenAgents enables general users to interact with agent functionalities through a web user interface optimized for swift responses and common failures while offering developers and researchers a seamless deployment experience on local setups, providing a foundation for crafting innovative language agents and facilitating real-world evaluations. We elucidate the challenges and opportunities, aspiring to set a foundation for future research and development of real-world language agents.
Projects
Better ChatGPT
Play and chat smarter with BetterChatGPT - an amazing open-source web app with a better UI for exploring OpenAI's ChatGPT API!
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Autonomous agents that accomplish complex computer tasks with minimal human interventions have the potential to transform human-computer interaction, significantly enhancing accessibility and productivity. However, existing benchmarks either lack an interactive environment or are limited to environments specific to certain applications or domains, failing to reflect the diverse and complex nature of real-world computer use, thereby limiting the scope of tasks and agent scalability. To address this issue, we introduce OSWorld, the first-of-its-kind scalable, real computer environment for multimodal agents, supporting task setup, execution-based evaluation, and interactive learning across various operating systems such as Ubuntu, Windows, and macOS. OSWorld can serve as a unified, integrated computer environment for assessing open-ended computer tasks that involve arbitrary applications. Building upon OSWorld, we create a benchmark of 369 computer tasks involving real web and desktop apps in open domains, OS file I/O, and workflows spanning multiple applications. Each task example is derived from real-world computer use cases and includes a detailed initial state setup configuration and a custom execution-based evaluation script for reliable, reproducible evaluation. Extensive evaluation of state-of-the-art LLM/VLM-based agents on OSWorld reveals significant deficiencies in their ability to serve as computer assistants. While humans can accomplish over 72.36% of the tasks, the best model achieves only 12.24% success, primarily struggling with GUI grounding and operational knowledge. Comprehensive analysis using OSWorld provides valuable insights for developing multimodal generalist agents that were not possible with previous benchmarks. Our code, environment, baseline models, and data are publicly available at https://os-world.github.io/.
XLang Agents / OpenAgents
XLang Agents are language model agents developed by our team that use a range of tools to enhance their capabilities and serve as user-centric intelligent agents. XLang Agents currently supports three agents, each focused on a different application scenario:
- Data Agent: This agent is skilled in data tools, allowing efficient data search, manipulation, and visualization. It excels in code execution for data-centric tasks.
- Plugins Agent: With over 200 third-party plugins, this agent addresses diverse daily life needs, aiding in various tasks.
- Web Agent: Utilizing a Chrome extension, this agent automates web navigation, streamlining browsing to find and access information.
- Robotic Agent: coming soon!
ZilKin
As a highly accomplished and select member of the APAC region, I was honored to be chosen as one of only 20 undergraduate students to participate in the prestigious ZilHive Student Practicum. This intensive mentorship program, focused on cutting-edge blockchain, Web3 development, and building on the advanced Zilliqa ecosystem, provided me with invaluable opportunities to further develop my skills and knowledge in this field.
Working in a highly skilled and effective team of 2, I played a key role in creating a revolutionary Scilla smart contracts deployment tool that empowers developers to effortlessly deploy their Scilla contracts via our innovative Contracts Wizard (Interactive Code Generator) or Automatic Contract Deployment. This ground-breaking tool has been instrumental in driving the development of the blockchain ecosystem and solidifying my reputation as a leading expert in the field.
After months of hard work, I am proud to present to you Zilkin, a cutting-edge blockchain application built on the advanced Zilliqa ecosystem. Leveraging the power of ReactJS, TypeScript, and TailwindCSS, my team and I have skillfully crafted an intuitive, user-friendly interface that seamlessly integrates with Scilla, a powerful smart contract language. The result is a revolutionary application that has set a new standard for blockchain technology, and is poised to disrupt the industry.
I invite you to explore the project's official website at https://zilkin.tjh.sg/ and view the source code on GitHub at https://github.com/xJQx/zilkin. The level of expertise, attention to detail, and technical proficiency that went into creating this project is evident in every aspect and I am confident that you will be impressed with the results. This project is a testament to my ability to deliver high-quality and innovative software solutions that push the boundaries of what is possible.
ByteVid - Hackathon 1st Place
- Developed for NTU MLDA Deep Learning Week Hackathon in 48 hours (1/10/2022 - 3/10/2022).
- Achieved 1st place out of 120 teams.
- Detailed explanation of our solution: https://me.tjh.sg/blog/bytevid
𝐏𝐫𝐨𝐣𝐞𝐜𝐭 𝐃𝐞𝐬𝐜𝐫𝐢𝐩𝐭𝐢𝐨𝐧:
Say goodbye to long and boring videos! 👋
Powered by the cutting-edge deep learning technologies in 2022, ByteVid transforms long, boring videos into fun byte-sized content.
Be it a one-hour lecture or a 30-minute Zoom meeting, ByteVid can transcribe and summarise the content, extract keywords, detect and extract important slides from the video, and translate the results into other languages.
𝐅𝐫𝐨𝐧𝐭𝐞𝐧𝐝
- React.js
- Tailwind CSS
- Deployed on GitHub Pages
𝐁𝐚𝐜𝐤𝐞𝐧𝐝
- Flask server
- Deployed on a GPU machine
- Relayed to an Internet-facing VPS
- Nginx reverse proxy
- Cloudflare protection
𝐃𝐞𝐞𝐩 𝐋𝐞𝐚𝐫𝐧𝐢𝐧𝐠
- Whisper: SOTA speech recognition (Sep 2022)
- YOLOv7: SOTA object detection (Jul 2022)
- KBIR-inspec: key phrase extraction (Dec 2021)
- Bert Extractive Summarizer: summarisation (Jun 2019)
- BlingFire: sentence extraction
- Baidu Translate API: translation
𝐓𝐨𝐨𝐥𝐬
- OpenCV
- youtube-dl
- ffmpeg
𝐏𝐫𝐨𝐣𝐞𝐜𝐭 𝐥𝐢𝐧𝐤𝐬:
- https://devpost.com/software/bytevid
- https://github.com/ayaka14732/ByteVid
- https://xjqx.github.io/ByteVidFrontend/
- https://github.com/ztjhz/ByteVidExtension
- https://github.com/ztjhz/yolov7-slides-extraction
Word Piece Tokenizer
- I developed a Python library that implements a modified, lightweight version of HuggingFace’s BERT Tokenizer in pure Python.
- On average, my tokenizer was 57% faster than HuggingFace's BERT Tokenizer.
- Developed for an in-browser Transformer Attention Visualiser: https://github.com/ayaka14732/TrAVis
AniFame
This project aims to maximize studios’ profits on the anime they produce by estimating an anime's mean rating and predicting its probability of success before production, giving studios the ability to fine-tune the anime before it is made.
𝗗𝗮𝘁𝗮 𝗰𝗼𝗹𝗹𝗲𝗰𝘁𝗶𝗼𝗻:
- MyAnimeList API
𝗗𝗮𝘁𝗮 𝗰𝗹𝗲𝗮𝗻𝗶𝗻𝗴 𝗮𝗻𝗱 𝗽𝗿𝗲𝗽𝗿𝗼𝗰𝗲𝘀𝘀𝗶𝗻𝗴:
- JSON conversion and manipulation
- Feature engineering and generation
- One-hot Encoding
𝗘𝗗𝗔 & 𝗩𝗶𝘀𝘂𝗮𝗹𝗶𝘇𝗮𝘁𝗶𝗼𝗻:
- Matplotlib, Seaborn
𝗥𝗲𝗴𝗿𝗲𝘀𝘀𝗶𝗼𝗻:
- Linear regression
- Lasso regression
- Ridge regression
𝗖𝗹𝗮𝘀𝘀𝗶𝗳𝗶𝗰𝗮𝘁𝗶𝗼𝗻:
- LinearSVC
- Decision Tree
- Random Forest
PokeApp
- Developed a full-stack web application with React.js, Redux, Django, and Django REST Framework; used Djoser and JWTAuthentication for authentication.
- Deployed the frontend and backend to AWS EC2, and the database to AWS RDS (PostgreSQL)
Maintenance Checklist Application
📑 𝗢𝘃𝗲𝗿𝘃𝗶𝗲𝘄 𝗼𝗳 𝗖𝗼𝗻𝘁𝗿𝗶𝗯𝘂𝘁𝗶𝗼𝗻𝘀:
- Led a team of 3 to create a 𝗠𝗮𝗶𝗻𝘁𝗲𝗻𝗮𝗻𝗰𝗲 𝗖𝗵𝗲𝗰𝗸𝗹𝗶𝘀𝘁 𝗮𝗽𝗽 through the entire 𝗮𝗽𝗽𝗹𝗶𝗰𝗮𝘁𝗶𝗼𝗻 𝗱𝗲𝘃𝗲𝗹𝗼𝗽𝗺𝗲𝗻𝘁 𝗹𝗶𝗳𝗲 𝗰𝘆𝗰𝗹𝗲
👨💻 𝗧𝗲𝗰𝗵𝗻𝗶𝗰𝗮𝗹 𝗖𝗼𝗻𝘁𝗿𝗶𝗯𝘂𝘁𝗶𝗼𝗻𝘀:
- Designed and 𝗼𝗽𝘁𝗶𝗺𝗶𝘀𝗲𝗱 𝗿𝗲𝗹𝗮𝘁𝗶𝗼𝗻𝗮𝗹 𝗱𝗮𝘁𝗮𝗯𝗮𝘀𝗲 to reduce redundancy
- Developed frontend application using 𝗣𝗼𝘄𝗲𝗿𝗔𝗽𝗽𝘀
- Performed rigorous testing to identify and fix bugs
📝 𝗢𝘁𝗵𝗲𝗿 𝗖𝗼𝗻𝘁𝗿𝗶𝗯𝘂𝘁𝗶𝗼𝗻𝘀:
- Published 𝗰𝗹𝗲𝗮𝗿 𝗮𝗻𝗱 𝗲𝗳𝗳𝗲𝗰𝘁𝗶𝘃𝗲 𝗱𝗼𝗰𝘂𝗺𝗲𝗻𝘁𝗮𝘁𝗶𝗼𝗻 for users and developers
- 𝗖𝗼𝗹𝗹𝗮𝗯𝗼𝗿𝗮𝘁𝗲𝗱 with engineers to enhance user experience
- Organized meetings with stakeholders
- Pitched final product to 10 engineers and HR staff
🥇 𝗥𝗲𝘀𝘂𝗹𝘁:
- Improved maintenance efficiency by 𝟮𝟱%
- Reduced paper wastage by 𝟭𝟱%
- Exceeded expectations and received outstanding commendation and monetary award
📚 𝗧𝗲𝗰𝗵 𝗦𝘁𝗮𝗰𝗸:
- PowerApps, PowerFx, PowerAutomate
NFT Web Blockchain Gaming Platform - Zendodo Mission Craft
👨💻 𝗖𝗼𝗻𝘁𝗿𝗶𝗯𝘂𝘁𝗶𝗼𝗻𝘀:
- Developed a blockchain NFT gaming web application (Mission Craft) in 𝗧𝘆𝗽𝗲𝘀𝗰𝗿𝗶𝗽𝘁, 𝗦𝗔𝗦𝗦 and 𝗡𝗲𝘅𝘁.𝗷𝘀 (𝗥𝗲𝗮𝗰𝘁)
- Optimized data retrieval from blockchain API
- Engineered reusable components to speed up development efficiency and improve code quality
- Optimized 𝘀𝘁𝗮𝘁𝗲 𝗺𝗮𝗻𝗮𝗴𝗲𝗺𝗲𝗻𝘁 using 𝘇𝘂𝘀𝘁𝗮𝗻𝗱 and 𝗿𝗲𝗮𝗰𝘁 𝘀𝘁𝗮𝘁𝗲 𝗵𝗼𝗼𝗸𝘀
📚 𝗧𝗲𝗰𝗵 𝗦𝘁𝗮𝗰𝗸:
- SASS, TypeScript
- Next.js (React), zustand
- Blockchain API (atomicassets API, dfuse API by EOS Nation)
Portfolio Website
- Developed a website using Next.js (React.js) to feature my projects and experience
- Stored data in MongoDB
Memories
- Developed a full-stack web application with MongoDB, Express.js, React.js and Node.js
- Designed API to create, edit, delete, like and view posts. (CRUD)
- Implemented login authentication
Food Demand Forecast
- MLDA Deep Learning Week Hackathon (16/10/2021 - 17/10/2021)
- Cleaned dataset with Python, Pandas and NumPy
- Modelled food demand using time series analysis with statsmodels
- Visualised data with Matplotlib and a JavaScript web app
- Predicted future food demand to optimize food inventory and reduce food wastage.
Personal Finance Portfolio Tracker
- Developed a portfolio tracker in Python with Pandas, NumPy, Matplotlib and Selenium
- Designed SQL (SQLite) database to store portfolio data
- Utilized JavaScript to visually display portfolio data in a graph.
- Reduced portfolio tracking time by 45%
NTU Class Schedule Generator
- Developed a JavaScript web application that automates the conversion of a text timetable to iCal format
Pathfinding Visualizer
- Developed a web application that visualizes pathfinding in React
- Implemented popular pathfinding algorithms in JavaScript
- Helps students better understand these algorithms
Honors & Awards
Indeed Women Coders Contest 2022 8th Place
Indeed
Achieved 8th place
MLDA Deep Learning Week Hackathon 1st Place
MLDA @ EEE and TikTok
Achieved 1st place out of 120 teams and 600+ people
SMU LIT Hackathon Champion and Most Innovative Award
SMU and Rajah & Tann Technologies
Over the weekend, I participated in the SMU Legal Innovation and Technology (LIT) hackathon with these awesome teammates: Martin Liu, Tsien Jin Ong, Vighnesh Ramasamy, Eugene Ho, and Xin Han Chen.
Our solution was to automate the process of extracting relevant economic data from financial statements and drafting excerpts of the MD&A section. With this application, we wanted to free up lawyers' time so that they can devote it to more critical work. We created a web application using Next.js (React) and TailwindCSS for our frontend, Flask for the backend, and MongoDB for the database.
It has been an amazing experience working with the team and I am grateful for each and every one of them. I am thankful for the hard work and dedication every one of them put in.
We are proud and humbled to have emerged as overall champions of the SMU LIT 2022 Hackathon and to have been awarded the Rajah & Tann Technologies Most Innovative Award.