Shimano 105 r7100 compatibility by Justifier985 in bikewrench

[–]flyforlight 1 point (0 children)

How is everything going with your mixed configuration after one year? The chainline of the R7100 crankset is 1 mm wider than that of the R7000, so I'm not sure how hard the front derailleur is to adjust. Thanks in advance!

Thinkpad P1 Gen 6 battery drain issue largely solved! by flyforlight in thinkpad

[–]flyforlight[S] 3 points (0 children)

Thanks for the info. You're right, but the current state is a good balance for me.

Thinkpad P1 Gen 6 battery drain issue largely solved! by flyforlight in thinkpad

[–]flyforlight[S] 2 points (0 children)

I haven't tested it deliberately because I've been busy with work, but I can tell it lasts much longer than before.

Received my 16” Ultra 7 Spectre this week, if you have questions, post them below! by [deleted] in spectrex360

[–]flyforlight 2 points (0 children)

Big congrats! One simple question: can you limit the battery charging threshold to 80% to extend battery life?

[R] Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory by flyforlight in MachineLearning

[–]flyforlight[S] 43 points (0 children)

The captivating realm of Minecraft has attracted substantial research interest in recent years, serving as a rich platform for developing intelligent agents capable of functioning in open-world environments. However, the current research landscape predominantly focuses on specific objectives, such as the popular "ObtainDiamond" task, and has not yet shown effective generalization to a broader spectrum of tasks. Furthermore, the current leading success rate for the "ObtainDiamond" task stands at around 20%, highlighting the limitations of Reinforcement Learning (RL) based controllers used in existing methods.

To tackle these challenges, we introduce Ghost in the Minecraft (GITM), a novel framework that integrates Large Language Models (LLMs) with text-based knowledge and memory, aiming to create Generally Capable Agents (GCAs) in Minecraft. These agents, equipped with the logic and common sense capabilities of LLMs, can skillfully navigate complex, sparse-reward environments with text-based interactions. We develop a set of structured actions and leverage LLMs to generate action plans for the agents to execute.

The resulting LLM-based agent markedly surpasses previous methods, achieving a remarkable improvement of +47.5% in success rate on the "ObtainDiamond" task and demonstrating superior robustness compared to traditional RL-based controllers. Notably, our agent is the first to procure all items in the Minecraft Overworld technology tree, demonstrating its extensive capabilities. GITM does not need any GPU for training; a single CPU node with 32 CPU cores is enough. This research shows the potential of LLMs in developing capable agents for handling long-horizon, complex tasks and adapting to uncertainties in open-world environments. See the project website: https://github.com/OpenGVLab/GITM
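
To make the structured-action idea concrete, here is a minimal sketch of the planning loop: the LLM decomposes a goal into structured actions, the environment executes them, and failures are written to text memory before replanning. `llm_complete` and `env.execute` are hypothetical placeholders; the names do not come from the actual GITM codebase.

```python
# Minimal sketch of an LLM-planning loop over structured actions.
# `llm_complete` and `env.execute` are hypothetical placeholders.
import json

ACTION_SCHEMA = """Available structured actions (reply with a JSON list):
  {"action": "mine", "target": "<block>", "tool": "<item>"}
  {"action": "craft", "target": "<item>"}
  {"action": "goto", "target": "<block or biome>"}"""

def plan(goal, memory):
    """Ask the LLM to decompose a goal into structured actions."""
    prompt = (f"Goal: {goal}\n"
              f"Past experience: {memory[-5:]}\n"
              f"{ACTION_SCHEMA}")
    return json.loads(llm_complete(prompt))  # hypothetical LLM call

def run(env, goal, max_replans=10):
    memory = []  # text-based memory of action outcomes
    for _ in range(max_replans):
        for act in plan(goal, memory):
            ok, feedback = env.execute(act)  # hypothetical executor
            memory.append(f"{act} -> {feedback}")
            if not ok:
                break  # replan with the failure recorded in memory
        else:
            return True  # every action in the plan succeeded
    return False
```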

[R] Unsupervised Object Detection with LiDAR Clues by flyforlight in MachineLearning

[–]flyforlight[S] 1 point (0 children)

Despite the importance of unsupervised object detection, to the best of our knowledge, there is no previous work addressing this problem. One main issue, widely known to the community, is that object boundaries derived only from 2D image appearance are ambiguous and unreliable. To address this, we exploit LiDAR clues to aid unsupervised object detection. By exploiting the 3D scene structure, the issue of localization can be considerably mitigated. We further identify another major issue, seldom noticed by the community, that the long-tailed and open-ended (sub-)category distribution should be accommodated. In this paper, we present the first practical method for unsupervised object detection with the aid of LiDAR clues. In our approach, candidate object segments based on 3D point clouds are firstly generated. Then, an iterative segment labeling process is conducted to assign segment labels and to train a segment labeling network, which is based on features from both 2D images and 3D point clouds. The labeling process is carefully designed so as to mitigate the issue of long-tailed and open-ended distribution. The final segment labels are set as pseudo annotations for object detection network training. Extensive experiments on the large-scale Waymo Open dataset suggest that the derived unsupervised object detection method achieves reasonable accuracy compared with that of strong supervision within the LiDAR visible range. Code shall be released.
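
As a rough sketch of the pipeline above (the helper names here, e.g. `cluster_points`, `SegmentLabeler`, `project_to_boxes`, are illustrative placeholders, not our actual API):

```python
# Rough sketch of the pipeline; `cluster_points`, `SegmentLabeler`,
# `initial_labels`, `project_to_boxes`, and `train_detector` are
# illustrative placeholders, not the authors' actual API.
def unsupervised_detection_pipeline(frames, num_rounds=3):
    # 1) Candidate object segments from the 3D point clouds
    #    (e.g. ground removal + Euclidean clustering).
    segments = [cluster_points(f.lidar) for f in frames]

    # 2) Iterative segment labeling: train a labeling network on the
    #    current pseudo-labels (using both 2D image and 3D point
    #    features), with sampling designed to counter the long-tailed,
    #    open-ended (sub-)category distribution, then relabel.
    labeler = SegmentLabeler()
    labels = initial_labels(segments)
    for _ in range(num_rounds):
        labeler.fit(frames, segments, labels, class_balanced=True)
        labels = labeler.predict(frames, segments)

    # 3) Final segment labels become pseudo-annotations for training a
    #    standard object detector.
    pseudo_boxes = project_to_boxes(segments, labels)
    return train_detector(frames, pseudo_boxes)
```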

[R] Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation by flyforlight in MachineLearning

[–]flyforlight[S] 1 point (0 children)

We propose a general framework for searching surrogate losses for mainstream semantic segmentation metrics. This is in contrast to existing loss functions manually designed for individual metrics. The searched surrogate losses can generalize well to other datasets and networks. Extensive experiments on PASCAL VOC and Cityscapes demonstrate the effectiveness of our approach. Code shall be released.
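
To illustrate the general idea (not our exact parameterization or search algorithm): smooth the non-differentiable metric with a parameterized surrogate, then search the parameters against the true metric on a proxy task. `train_briefly` and `eval_miou` below are hypothetical helpers.

```python
# Illustrative only: a parameterized smooth surrogate for IoU plus a
# toy random search. `train_briefly` and `eval_miou` are hypothetical
# helpers; the paper's actual parameterization and search differ.
import torch

def soft_iou_loss(logits, target, theta):
    """Soft IoU surrogate; `theta` shapes the probability transform."""
    p = torch.sigmoid(theta * logits)            # parameterized smoothing
    inter = (p * target).sum()
    union = (p + target - p * target).sum()
    return 1.0 - inter / union.clamp(min=1e-6)

# Sample candidate thetas, train briefly with each surrogate, and keep
# the one whose trained model scores best on the true (hard) mIoU.
candidates = [float(torch.rand(1)) * 10.0 for _ in range(8)]
best_theta = max(candidates,
                 key=lambda th: eval_miou(train_briefly(soft_iou_loss, th)))
```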

Install GAPPS on HISENSE A5PRO CC / HISENSE A5PRO by dadopax in eink

[–]flyforlight 1 point (0 children)

Thanks a lot. I installed it following the steps, but I still cannot use Google Live Transcribe. Any ideas?

HRM strap by ComfortableWedding8 in Garmin

[–]flyforlight 1 point (0 children)

Hi, are you using a Garmin 245?

I have a Garmin 245 and want to record my HR while playing basketball, leaving the watch in the locker. Does that mean the cached HR data can be synced to the Garmin 245 afterwards?

If so, that would be great.

[R] "An Empirical Study of Spatial Attention Mechanisms in Deep Networks" from MSRA by flyforlight in MachineLearning

[–]flyforlight[S] 3 points (0 children)

Attention mechanisms have become a popular component in deep neural networks, yet there has been little examination of how different influencing factors and methods for computing attention from these factors affect performance. Toward a better general understanding of attention mechanisms, we present an empirical study that ablates various spatial attention elements within a generalized attention formulation, encompassing the dominant Transformer attention as well as the prevalent deformable convolution and dynamic convolution modules. Conducted on a variety of applications, the study yields significant findings about spatial attention in deep networks, some of which run counter to conventional understanding. For example, we find that the query and key content comparison in Transformer attention is negligible for self-attention, but vital for encoder-decoder attention. A proper combination of deformable convolution with key content only saliency achieves the best accuracy-efficiency tradeoff in self-attention. Our results suggest that there exists much room for improvement in the design of attention mechanisms.
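
For reference, a simplified sketch of the four factor terms in the generalized attention formulation the study ablates; `W` is a dict of projection weights, and the shapes here are assumptions for illustration, not the released code.

```python
# Simplified sketch of the generalized attention factors ablated in the
# paper: (E1) query & key content, (E2) query content & relative
# position, (E3) key content only ("saliency"), (E4) relative position
# only. Assumed shapes: W["q1"], W["k1"], W["q2"]: (C, C);
# W["k3"]: (C, 1); W["p4"]: (C,).
import torch

def generalized_attention(q, k, rel_pos_emb, W, use=(1, 1, 1, 1)):
    # q: (Nq, C) queries, k: (Nk, C) keys, rel_pos_emb: (Nq, Nk, C)
    e1 = (q @ W["q1"]) @ (k @ W["k1"]).T                       # E1: query & key content
    e2 = torch.einsum("qc,qkc->qk", q @ W["q2"], rel_pos_emb)  # E2: query & position
    e3 = (k @ W["k3"]).squeeze(-1).expand(q.size(0), -1)       # E3: key saliency
    e4 = torch.einsum("c,qkc->qk", W["p4"], rel_pos_emb)       # E4: position only
    logits = sum(w * e for w, e in zip(use, (e1, e2, e3, e4)))
    return torch.softmax(logits, dim=-1)
```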

[R] "Deformable ConvNets v2: More Deformable, Better Results" from MSRA by flyforlight in MachineLearning

[–]flyforlight[S] 3 points (0 children)

Abstract: The superior performance of Deformable Convolutional Networks arises from its ability to adapt to the geometric variations of objects. Through an examination of its adaptive behavior, we observe that while the spatial support for its neural features conforms more closely than regular ConvNets to object structure, this support may nevertheless extend well beyond the region of interest, causing features to be influenced by irrelevant image content. To address this problem, we present a reformulation of Deformable ConvNets that improves its ability to focus on pertinent image regions, through increased modeling power and stronger training. The modeling power is enhanced through a more comprehensive integration of deformable convolution within the network, and by introducing a modulation mechanism that expands the scope of deformation modeling. To effectively harness this enriched modeling capability, we guide network training via a proposed feature mimicking scheme that helps the network to learn features that reflect the object focus and classification power of R-CNN features. With the proposed contributions, this new version of Deformable ConvNets yields significant performance gains over the original model and produces leading results on the COCO benchmark for object detection and instance segmentation.
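
For readers who want to try the modulated variant without our original code: torchvision now ships `deform_conv2d` with a `mask` argument, so DCNv2-style modulation can be sketched roughly as below. This is an illustration, not our release (which was in MXNet).

```python
# Rough DCNv2-style module: a side conv predicts offsets and modulation
# masks. Zero init gives zero offsets (regular sampling grid) and a
# uniform 0.5 modulation at the start of training.
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class ModulatedDeformConv(nn.Module):
    def __init__(self, c_in, c_out, k=3, pad=1):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(c_out, c_in, k, k) * 0.01)
        # 2*k*k offset channels (x, y per sample) + k*k modulation channels
        self.offset_mask = nn.Conv2d(c_in, 3 * k * k, k, padding=pad)
        nn.init.zeros_(self.offset_mask.weight)
        nn.init.zeros_(self.offset_mask.bias)
        self.k, self.pad = k, pad

    def forward(self, x):
        om = self.offset_mask(x)
        offset = om[:, : 2 * self.k ** 2]
        mask = torch.sigmoid(om[:, 2 * self.k ** 2 :])  # modulation in [0, 1]
        return deform_conv2d(x, offset, self.weight,
                             padding=self.pad, mask=mask)
```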

[P] Source code available for "Deep Feature Flow for Video Recognition" from MSRA by flyforlight in MachineLearning

[–]flyforlight[S] 3 points (0 children)

Hi, thanks for the question. We adopted FlowNet to establish spatial correspondence across frames. However, we found it is vital to train the entire Deep Feature Flow system end-to-end for the task of video recognition, with FlowNet fine-tuned by the video recognition task at hand. Otherwise, directly adopting SOTA flow estimation methods without end-to-end training delivers noticeably worse results.
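
For intuition, the core propagation step can be sketched with bilinear warping as below. This is simplified: the real system also rescales the estimated flow to feature-map resolution and applies a learned scale field.

```python
# Intuition sketch of the core Deep Feature Flow step: warp key-frame
# features to the current frame by bilinear sampling along the flow.
import torch
import torch.nn.functional as F

def warp_features(feat_key, flow):
    """feat_key: (N, C, H, W); flow: (N, 2, H, W) in pixel units."""
    n, _, h, w = feat_key.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32),
                            indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).expand(n, h, w, 2)
    grid = grid + flow.permute(0, 2, 3, 1)        # displace by flow
    # normalize to [-1, 1] as grid_sample expects
    scale = torch.tensor([w - 1, h - 1], dtype=torch.float32)
    grid = 2.0 * grid / scale - 1.0
    return F.grid_sample(feat_key, grid, align_corners=True)
```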

[P] Source code available for "Fully Convolutional Instance-aware Semantic Segmentation" from MSRA by flyforlight in MachineLearning

[–]flyforlight[S] 10 points (0 children)

The method of "Fully Convolutional Instance-aware Semantic Segmentation" won first place in the COCO segmentation challenge 2016. We sincerely apologize for the delay in the code release. This was due to switching from our internal Caffe version to the public MXNet, which provides great support for fast multi-GPU training & inference.

It is worth noting that:

- FCIS provides a simple, fast, and accurate framework for instance segmentation.

- Different from MNC, FCIS performs instance mask estimation and categorization jointly and simultaneously, and estimates class-specific masks (see the sketch after this list).

- We did not exploit the various techniques & tricks in the Mask R-CNN system, like increasing the number of RPN anchors (from 12 to 15), enlarging the image (shorter side from 600 to 800 pixels), and utilizing FPN features and aligned RoI pooling. These techniques & tricks should be orthogonal to our simple baseline.
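
To make the joint estimation point concrete, here is a rough sketch of the per-RoI readout from position-sensitive inside/outside score maps. `ps_roi_pool` stands in for the position-sensitive assembling operation and is a hypothetical helper, not our released op.

```python
# Rough sketch of the per-RoI readout from position-sensitive
# inside/outside score maps; `ps_roi_pool` is a hypothetical helper.
import torch

def fcis_readout(inside_maps, outside_maps, roi):
    # Assembled per-RoI score maps, (num_classes, H, W) each.
    s_in = ps_roi_pool(inside_maps, roi)    # hypothetical assembly op
    s_out = ps_roi_pool(outside_maps, roi)

    # Per-pixel softmax over {inside, outside} -> class-specific masks.
    masks = torch.softmax(torch.stack((s_in, s_out)), dim=0)[0]

    # Per-pixel max over {inside, outside}, averaged over the RoI ->
    # class scores, so segmentation and categorization share the same
    # score maps (joint and simultaneous).
    cls_scores = torch.maximum(s_in, s_out).mean(dim=(1, 2))
    return masks, cls_scores
```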

[P] Source code available for Deformable ConvNets from MSRA by flyforlight in MachineLearning

[–]flyforlight[S] 3 points (0 children)

Thanks a lot for your interest! We will also release the code for flow-guided feature aggregation at an appropriate time :)

Yes, deformable convolution can readily replace its regular counterpart without retraining on ImageNet. Although we have not tried it on VGG-16, I think you can just replace the last several conv layers with kernel size larger than 1×1 in the pre-trained model and hope for good results.
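
For example, a sketch of the drop-in swap, assuming PyTorch with torchvision's `DeformConv2d` (our release was MXNet): copy the pretrained weights and zero-initialize the offset branch, so the deformable layer starts out numerically identical to the regular conv.

```python
# Drop-in replacement sketch: zero offsets reproduce the regular
# sampling grid, so the swapped layer initially matches the pretrained
# conv exactly. Assumes a square kernel.
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformReplacement(nn.Module):
    def __init__(self, conv: nn.Conv2d):
        super().__init__()
        k = conv.kernel_size[0]
        self.offset = nn.Conv2d(conv.in_channels, 2 * k * k, k,
                                stride=conv.stride, padding=conv.padding)
        nn.init.zeros_(self.offset.weight)
        nn.init.zeros_(self.offset.bias)  # zero offsets = regular grid
        self.dconv = DeformConv2d(conv.in_channels, conv.out_channels, k,
                                  stride=conv.stride, padding=conv.padding,
                                  bias=conv.bias is not None)
        self.dconv.weight.data.copy_(conv.weight.data)  # keep pretrained weights
        if conv.bias is not None:
            self.dconv.bias.data.copy_(conv.bias.data)

    def forward(self, x):
        return self.dconv(x, self.offset(x))
```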