AI programmer Devin goes undercover in a group chat to fix bugs! It talks shop with a CTO, and netizens say: top-coder level
By Mengchen and Xifeng, reporting from Aofeisi
Qubit | Official account QbitAI
Devin, the first AI programmer, has turned up in the internal group chat of a high-profile startup.
To solve a technical problem, Devin borrowed the account of one of its creators, talked with the CTO of a client company, and adjusted its code plan based on the reply.
The conversation was so professional that onlookers said the world has gone crazy.
The exchange took place on the workplace chat tool Slack. The "akshat" in the screenshot is Akshat Bubna, CTO of the AI infrastructure startup Modal Labs.
Modal Labs is also one of the first customers of Devin developer Cognition.
In this exchange, Devin was posting under the account of one of its creators, IOI gold medalist Steven Hao.
The conversation began with Devin asking about the lifecycle of secrets (keys) on the Modal platform, specifically how long it takes for an updated secret to propagate to running applications.
Devin said it had reviewed the documentation, including the guide to secrets and environment variables, the CLI command reference, the API reference, and the container lifecycle hooks and parameters, but still had not found clear information about secret propagation time.
Devin asked how long it typically takes for running applications to pick up an updated secret, explaining that this was critical to its operations and that knowing the answer would help it manage the deployment process.
The human CTO explained that updating a secret does not invalidate Modal containers that are already running, but newly launched containers will read the updated values.
Devin thanked him and decided, for the time being, to manage secrets in Modal manually: calling the modal deploy command to trigger a restart of the relevant application containers when needed.
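The manual workaround described in the conversation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Devin's actual code: the only thing taken from the conversation is the modal deploy command itself, while the file name app.py and the helper names build_redeploy_command and redeploy_after_secret_update are hypothetical.

```python
import subprocess
from typing import Callable, List


def build_redeploy_command(app_file: str) -> List[str]:
    """Build the `modal deploy` invocation for one app file.

    The app file path is a hypothetical placeholder.
    """
    return ["modal", "deploy", app_file]


def redeploy_after_secret_update(
    app_file: str,
    run: Callable[..., object] = subprocess.run,
) -> List[str]:
    """Redeploy the app so that newly launched containers read updated values.

    Per the CTO's answer, running containers keep the old values, so the
    redeploy is triggered manually after each secret (key) update.
    `run` is injectable so the function can be exercised without the CLI.
    """
    cmd = build_redeploy_command(app_file)
    run(cmd, check=True)  # raises CalledProcessError if the deploy fails
    return cmd
```

Calling redeploy_after_secret_update("app.py") right after changing a secret would run modal deploy app.py, assuming the Modal CLI is installed and authenticated.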
After watching the entire process, Raunak Chowdhuri, who is also an AI entrepreneur, commented:
Finding problems, creating tickets, and tweaking code is how the best human developers work.
More test results from Devin
Few people and companies have early access to Devin so far, but some of them have been publishing hands-on test results one after another.
After trying it, Wharton professor Ethan Mollick, an AI enthusiast, said that its novel real-time interaction is the feature most worth watching.
You can "talk" to it at any time, just like a human, and it will constantly execute and debug your ideas in the background.
In a test, Ethan Mollick asked Devin to develop a website explaining "dilution in startup financing."
However, he revealed that AI is not yet able to complete this work autonomously and error-free without any help.
There's still a long way to go before we can hand over a major project to artificial intelligence, but it's still a fascinating start.
Mckay Wrigley, another entrepreneur who posted the test process, was even more excited.
In the 27-minute test he posted, he sent only a GitHub link and asked Devin to deploy the code from the open-source project.
Devin independently broke the task down into a series of sub-steps and began executing them one by one.
Along the way, Devin hit an obstacle while installing the Supabase database, so it opened the corresponding GitHub repository and started reading the documentation...
The subsequent terminal output shows that Devin worked out what to fill in for the various ports and keys required to run Supabase.
(Anyone who has installed it knows that the setup is quite a hassle...)
Meanwhile, Devin kept revising its follow-up plan based on what actually happened.
After a while, a local chatbot program started running.
After testing it for a while, Mckay Wrigley concluded that Devin already counts as the ChatGPT moment for agents.
Devin reproduction plans in progress
While people continue to test Devin, open-source "reproduction" efforts are also underway...
Sure enough, MetaGPT, a GitHub project with 30,000 stars, has launched a new "open-source version of Devin",
named Data Interpreter:
Like Devin, Data Interpreter can program autonomously. It can iteratively observe data, predict and analyze disease progression and machine operating status, build machine learning models, perform mathematical reasoning, automatically reply to emails, imitate websites, and more.
For example, analyze the closing price trend from NVIDIA stock price data:
Analyzing data to predict wine quality:
In addition, Binyuan Hui, a member of Alibaba's Qwen team, and others have started the OpenDevin project. It has only just begun and has already received 1.2k stars.
Binyuan Hui tweeted that there is a preliminary roadmap and that a group of excellent people is working hard to complete the front-end prototype in a short time.
At the same time, the project team is also recruiting new members:
A team called Maisa AI has also launched Maisa KPU (Knowledge Processing Unit), which netizens see as something of a competitor to Devin.
Maisa KPU is currently in testing and is designed to handle complex problem solving and reasoning. The benchmark results released by the team are as follows:
According to the demo, KPU can become an "intelligent customer service" to help customers solve the problem of undelivered orders when the customer does not write down the order number correctly:
Devin benchmark technical report released
Recently, Devin's founding team Cognition also released a technical report on SWE-bench testing.
In addition to the previously announced test results, the team also revealed some new information.
For example, one of Cognition's goals is to enable Devin, an AI agent specializing in software development, to successfully contribute code to large, complex code bases.
We chose to run the agent end-to-end on SWE-bench because it is closer to real-world software development.
The R&D team also said that, to prevent Devin from cheating on the test, for example by looking up external pull request information, the test environment was set up so that Devin could not access the relevant information, and Devin's runs were manually checked during the process.
Finally, the team emphasized that Devin is still in its infancy and there is still a lot of room for improvement:
Readers interested in more details can consult the full report.
Less than a week after Devin was released, the discussion among netizens is already very heated.
For example, one user said that what he had worried about a year ago has finally happened.
From now on, Stack Overflow will be filled with Devins asking questions, and humans will be squeezed out (Stack Overflow is in danger!!!):
Some netizens replied, tongue in cheek:
They can answer each other's questions.
Some netizens noticed that Cognition, the team behind Devin, is recruiting full-time software engineers, which prompted a pointed question:
Shouldn't Devin be filling these positions to save them money?
Finally, if Devin becomes public, what would you like to do with it?
Reference links:
[1] https://www.cognition-labs.com/post/swe-bench-technical-report
[2] https://x.com/raunakdoesdev/status/1769066769786757375
[3] https://twitter.com/emollick/status/1768742585122558063
[4] https://x.com/mckaywrigley/status/1767985840448516343
[5] https://x.com/maisaAI_/status/1768657114669429103?s=20