TLDR
I've listed 23 Python Hacktoberfest issues to help developers contribute to open-source projects.
For new open-source contributors, finding a good place to start can often be difficult. Sifting through repositories and identifying issues suitable for newcomers is time-consuming and frustrating.
To help you start contributing, I have curated 23 issues.
Let's check them out!
1) Question - add columns to output of bt.run()
Repo : https://github.com/kernc/backtesting.py
Question - add columns to output of bt.run() #1033
Hi,
Great package for backtesting! One question - is it possible to add additional columns to the output of the backtest (such as the indicator values) at the time of the trades? Alternatively, getting the indicator values output for each bar would also be helpful. I could hack it to do it but wondered if there was an easy way.
For example, adding additional columns for each trade with the value of each indicator?
Thanks so much for the great package,
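Here's a rough, untested sketch of one possible workaround, based on backtesting.py's documented outputs: `bt.run()` returns a stats series whose `_trades` DataFrame records each trade's entry bar, and `_strategy` holds the strategy instance with its indicator arrays, so the two can be joined manually.

```python
from backtesting import Backtest, Strategy
from backtesting.lib import crossover
from backtesting.test import GOOG, SMA  # sample data/indicator shipped with the package

class SmaCross(Strategy):
    def init(self):
        self.sma = self.I(SMA, self.data.Close, 10)

    def next(self):
        if crossover(self.data.Close, self.sma):
            self.buy()
        elif crossover(self.sma, self.data.Close):
            self.position.close()

bt = Backtest(GOOG, SmaCross)
stats = bt.run()

# Join the indicator value at each trade's entry bar onto the trades table.
trades = stats['_trades'].copy()
trades['sma_at_entry'] = [stats['_strategy'].sma[i] for i in trades['EntryBar']]
print(trades[['EntryBar', 'EntryPrice', 'sma_at_entry']])
```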
2) Parsing error on Duckdb QUALIFY
Repo : https://github.com/sqlfluff/sqlfluff
Parsing error on Duckdb QUALIFY #5051
This ticket requests several things, checked off as they are done:
- [x] Union all by name: https://duckdb.org/docs/sql/query_syntax/setops#union-all-by-name
- [ ] Pivot: https://duckdb.org/docs/sql/statements/pivot
- [ ] Exclude: https://duckdb.org/docs/sql/expressions/star.html#exclude-clause
- [ ] Replace: https://duckdb.org/docs/sql/functions/char.html
- [ ] Qualify: https://duckdb.org/docs/sql/query_syntax/qualify.html
- [X] // is integer division: https://duckdb.org/docs/sql/functions/numeric.html
Search before asking
- [X] I searched the issues and found no similar issues.
What Happened
This is a valid DuckDB query:
```sql
with
source as (
    select * from {{ source('danish_parliament', 'raw_aktoer_type') }}
    qualify row_number() over (partition by id order by opdateringsdato desc) = 1
),
renamed as (
    select
        id as actor_type_id,
        type as actor_type,
        opdateringsdato as updated_at
    from source
)
select * from renamed
```
I get an unparsable-section error using both the dbt and jinja templaters.
Expected Behaviour
The query should parse without returning an error.
Observed Behaviour
```
sqlfluff lint models\staging\stg_actor_types.sql
=== [dbt templater] Sorting Nodes...
20:35:22 [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:
- models.danish_democracy_data.intermediate
=== [dbt templater] Compiling dbt project...
=== [dbt templater] Project Compiled.
== [models\staging\stg_actor_types.sql] FAIL
L: 6 | P: 5 | PRS | Line 6, Position 5: Found unparsable section: 'qualify
                  | row_number() over (partition by ...'
```
How to reproduce
- Save the sample SQL into a file `example.sql`
- Use the sample `.sqlfluff` file
- Run `sqlfluff lint` against `example.sql`
Dialect
DuckDB
Version
sqlfluff, version 2.2.0
Configuration
.sqlfluff
```ini
[sqlfluff]
dialect = duckdb
templater = dbt
runaway_limit = 10
max_line_length = 80
indent_unit = space
exclude_rules = RF05

[sqlfluff:indentation]
tab_space_size = 4

[sqlfluff:layout:type:comma]
spacing_before = touch
line_position = trailing

[sqlfluff:rules:capitalisation.keywords]
capitalisation_policy = lower

[sqlfluff:rules:aliasing.table]
aliasing = explicit

[sqlfluff:rules:aliasing.column]
aliasing = explicit

[sqlfluff:rules:aliasing.expression]
allow_scalar = False

[sqlfluff:rules:capitalisation.identifiers]
extended_capitalisation_policy = lower

[sqlfluff:rules:capitalisation.functions]
capitalisation_policy = lower

[sqlfluff:rules:capitalisation.literals]
capitalisation_policy = lower

[sqlfluff:rules:ambiguous.column_references]  # Number in group by
group_by_and_order_by_style = implicit
```
.sqlfluffignore
```
reports
target
dbt_packages
macros
extract-load
.dbtenv
.vscode
logs
```
Are you willing to work on and submit a PR to address the issue?
- [X] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
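As a side note, the parse failure can likely be reproduced without dbt at all by passing already-rendered SQL to sqlfluff's simple Python API (a quick sketch; the query is the example above with the Jinja source reference replaced by a plain table name):

```python
import sqlfluff

SQL = """
with source as (
    select *
    from raw_aktoer_type
    qualify row_number() over (partition by id order by opdateringsdato desc) = 1
)
select * from source
"""

# Unparsable sections surface as PRS violations in the lint results.
for violation in sqlfluff.lint(SQL, dialect="duckdb"):
    print(violation)
```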
3) Specify Example Caching Directory
Repo : https://github.com/gradio-app/gradio
Specify Example Caching Directory #5781
- [x] I have searched to see if a similar issue already exists.
Is your feature request related to a problem? Please describe.
It would be great to be able to specify the directory Gradio uses to cache examples. This would allow developers to save the cache to persistent storage on HF Spaces.
Describe the solution you'd like
Support specifying the cache directory via an environment variable and a parameter on the Examples class.
Additional context
Add any other context or screenshots about the feature request here.
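To make the request concrete, here's a hypothetical sketch of how it might look; the `examples_cache_dir` parameter and `GRADIO_EXAMPLES_CACHE` variable are proposals, not existing Gradio API:

```python
import gradio as gr

def greet(name):
    return f"Hello {name}!"

demo = gr.Interface(
    fn=greet,
    inputs="text",
    outputs="text",
    examples=[["Alice"], ["Bob"]],
    cache_examples=True,  # existing flag
    # examples_cache_dir="/data/example_cache",  # proposed: persist on HF Spaces
)

# Alternatively (also a proposal): GRADIO_EXAMPLES_CACHE=/data/example_cache
demo.launch()
```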
4) Nits on the Getting Started page
Repo : https://github.com/gradio-app/gradio
Nits on the Getting Started page #5778
- [x] I have searched to see if a similar issue already exists.
There are too many separate concepts in the "Sharing Your App" section of the getting started guide.
My Recommendation:
- `Mounting within another...` should be under the `Sharing Your App` header.
- Pull `Authentication` into another section and include `Password Protected...` and `OAuth` in that section.
- `Accessing the Network Request` feels pretty random here. Maybe move it to `Key Features`?
- `Security` should be pulled out into its own section.
- `Api Usage` should be pulled out into its own section.
5) [Integration] Integrate Lindorm with MindsDB
Repo : https://github.com/mindsdb/mindsdb
[Integration] Integrate Lindorm with MindsDB #7887
The goal of this issue is to build a new handler that will connect MindsDB to Lindorm database from Alibaba.
Resources
To get started, please refer to the following resources:
- Building Data Handler docs: This documentation will guide you through the process of building a custom handler for MindsDB. It contains essential information on the handler structure, methods, and best practices.
- ChromaDB: This existing handler connects MindsDB with ChromaDB.
- You can find more examples of MindsDB's data handlers here.
Next Steps
- You can familiarize yourself with Lindorm.
- Create a new handler that connects MindsDB to Lindorm.
- Implement the core features mentioned above.
- Thoroughly test the integration to ensure proper functionality.
- Submit a pull request with the new Lindorm Integration handler.
The PR should include the implementation of the Lindorm handler and unit tests.
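For orientation, here's a rough, untested skeleton of what the new handler might look like, following the pattern of existing data handlers (the Lindorm client details are assumptions; see the Building Data Handler docs for the full interface):

```python
from mindsdb.integrations.libs.base import DatabaseHandler
from mindsdb.integrations.libs.response import HandlerStatusResponse as StatusResponse

class LindormHandler(DatabaseHandler):
    """Skeleton handler connecting MindsDB to a Lindorm instance."""

    name = 'lindorm'

    def __init__(self, name, connection_data=None, **kwargs):
        super().__init__(name)
        self.connection_data = connection_data or {}
        self.connection = None

    def connect(self):
        # Open a connection to Lindorm's SQL endpoint here (client library assumed).
        raise NotImplementedError

    def check_connection(self) -> StatusResponse:
        response = StatusResponse(False)
        try:
            self.connect()
            response.success = True
        except Exception as e:
            response.error_message = str(e)
        return response
```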
Additional rewards
Each data integration brings 25 points for SWAG and entry into the draw for:
- Razer Blade 16 Laptop
- MindsDB Swag
For more info, check out https://mindsdb.com/hacktoberfest/
6) FolderBase Dataset automatically resolves under current directory when data_dir is not specified
Repo : https://github.com/huggingface/datasets
FolderBase Dataset automatically resolves under current directory when data_dir is not specified #6152
Describe the bug
FolderBase Dataset automatically resolves under current directory when data_dir is not specified.
For example:
load_dataset("audiofolder")
takes a long time to resolve and collect data_files from the current directory. I think it should instead reach this line for error handling: https://github.com/huggingface/datasets/blob/cb8c5de5145c7e7eee65391cb7f4d92f0d565d62/src/datasets/packaged_modules/folder_based_builder/folder_based_builder.py#L58-L59
Steps to reproduce the bug
load_dataset("audiofolder")
Expected behavior
Error report
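For reference, the linked lines suggest a guard along these lines; a hedged sketch of the expected behavior (the `Config` stand-in and error message below are illustrative, not the library's actual code):

```python
class Config:  # minimal stand-in for the builder's config object
    data_dir = None
    data_files = None

config = Config()

# Fail fast instead of silently crawling the current working directory.
if not config.data_dir and not config.data_files:
    raise ValueError(
        "At least one of data_dir or data_files must be specified, but got "
        f"data_dir={config.data_dir} and data_files={config.data_files}"
    )
```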
Environment info
- datasets version: 2.14.4
- Platform: Linux-5.15.0-78-generic-x86_64-with-glibc2.17
- Python version: 3.8.15
- Huggingface_hub version: 0.16.4
- PyArrow version: 12.0.1
- Pandas version: 1.5.3
7) Create a Stimulus controller w-init to support initial classes
Repo : https://github.com/wagtail/wagtail
Create a Stimulus controller `w-init` to support initial classes #11071
Is your proposal related to a problem?
We have a small bit of JS that adds the class `ready` to the `body` element when the DOM has loaded. This is useful for a few things, such as styling elements as hidden until we know the JS is ready.
This pattern is used in a few places, such as the side panel JS, where we want the panels to be hidden until we know the logic of the JS is loaded.
I propose a simple Stimulus solution to this that's similar to `x-init` and `x-cloak` from Alpine.js, where we use a controller to trigger this behaviour.
Describe the solution you'd like
- Create a new controller `InitController` that adds initial classes and also removes some classes based on data attributes when the JS is ready.
- Use this new controller in `wagtail/admin/templates/wagtailadmin/skeleton.html`.
- Remove our jQuery approach to this in `client/src/entrypoints/admin/core.js`.
- The controller will be pretty simple but can be enhanced later for other usages, such as the side panel when that work is being done.
- Testing is a must; remember that Jest's `await jest.runAllTimersAsync();` is your friend. See `client/src/controllers/CloneController.test.js` for some examples of tests that use this.
- Nice to have: remove the initial-class code from the `RevealController` - this is actually not required and better served with this generic approach when it comes time.
About timing & usage
Stimulus will connect controllers progressively based on the order of the `data-controller` attribute, and all controllers connect async, but as a microtask. This is slightly before a setTimeout would fire (it's a bit nuanced).
This gives us the ability to order our controllers so that `w-init` will be last and hence will do its class changes last.
```html
<body data-controller="w-other w-init" data-w-init-ready-class="ready">
```
In the above code, `w-other` will connect first, `w-init` second.
Rough implementation
Here's a proposed implementation. It has ready and remove classes, plus the ability to configure the delay; by default there is no delay other than the microtask / Promise resolution.
```ts
import { Controller } from '@hotwired/stimulus';
import { debounce } from '../utils/debounce';

/**
 * Adds the ability for a controlled element to add or remove classes
 * when ready to be interacted with.
 *
 * @example
 * <div class="hide-me" data-controller="w-init" data-w-init-remove-class="hide-me" data-w-init-ready-class="loaded">
 *   When the DOM is ready, this div will have the class 'loaded' added and 'hide-me' removed.
 * </div>
 */
export class InitController extends Controller<HTMLElement> {
  static classes = ['ready', 'remove'];
  static values = {
    delay: { default: -1, type: Number },
  };

  declare readonly readyClasses: string[];
  declare readonly removeClasses: string[];
  declare delayValue: number;

  /**
   * Allow for a microtask delay to allow for other controllers to connect,
   * then trigger the ready method.
   */
  connect() {
    Promise.resolve().then(() => {
      this.ready();
    });
  }

  ready() {
    // if below zero (default) resolve immediately instead of using a setTimeout
    const delayValue = this.delayValue < 0 ? null : this.delayValue;
    // remember to call the result of `debounce` immediately, it returns a
    // promise that can be used to dispatch a helpful event & allow this
    // controller to remove itself
    debounce(() => {
      this.element.classList.add(...this.readyClasses);
      this.element.classList.remove(...this.removeClasses);
    }, delayValue)().then(() => {
      this.dispatch('ready', { cancelable: false });
      this.remove();
    });
  }

  // To avoid this controller hanging around when not needed, allow it to
  // remove itself when done
  remove() {
    const element = this.element;
    const controllerAttribute = this.application.schema.controllerAttribute;
    const controllers =
      element.getAttribute(controllerAttribute)?.split(' ') ?? [];
    const newControllers = controllers
      .filter((identifier) => identifier !== this.identifier)
      .join(' ');
    if (newControllers) {
      element.setAttribute(controllerAttribute, newControllers);
    } else {
      element.removeAttribute(controllerAttribute);
    }
  }
}
```
Describe alternatives you've considered
- We keep this logic ad-hoc in various bits of JS and avoid a generic solution.
Additional context
- This further enables us to move away from `core.js` being a grab-bag of ad-hoc jQuery.
- This enables additional prep for the Panels migration #10168.
- See Alpine.js `x-init` & `x-cloak` for inspiration on the usage: https://alpinejs.dev/directives/init & https://alpinejs.dev/directives/cloak
- See HTMX's approach to generic 'load' events: https://htmx.org/docs/#events
8) Add "How to Install Gradio" in the docs page or Quickstart guide
Repo : https://github.com/gradio-app/gradio
Add "How to Install Gradio" in the docs page or Quickstart guide #5444
- [X] I have searched to see if a similar issue already exists.
Is your feature request related to a problem? Please describe.
I wanted to install Gradio on my laptop (OS: Windows 10). As per the quickstart guide, it says to just run `pip install gradio`. I tried that, but after installing many packages, I got an error (screenshot attached).
Describe the solution you'd like
I would like to see a "How to install" guide on the guides page or the quickstart page, noting whether Windows users in particular face this issue or whether it is common to all platforms. I saw a solution for the same error on the Streamlit website, where there is a specific guide for installing Streamlit in a virtual environment. I searched the Gradio website for the same but couldn't find it. I really want to install Gradio and work on an ML project.
Additional context
I would like to share the screenshot of the Streamlit installation page if it's allowed to.
9) [Stripe] Implement INSERT, UPDATE, and DELETE for the Payment Intents table
Repo : https://github.com/mindsdb/mindsdb
[Stripe] Implement INSERT, UPDATE, and DELETE for the Payment Intents table #7874
MindsDB integrates with Stripe, so you can fetch data from Stripe and use it to make relevant predictions and forecasts with MindsDB.
Resources
Familiarize yourself with the existing Stripe integration: https://github.com/mindsdb/mindsdb/tree/staging/mindsdb/integrations/handlers/stripe_handler
Steps to Follow
The task is to extend this implementation to include the INSERT, UPDATE, and DELETE commands for the Payment Intents table.
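For a sense of shape, here's a rough, untested sketch of how an `insert` might slot into the existing table class; MindsDB API tables receive parsed SQL ASTs, but the base-class wiring and exact AST attribute names below are assumptions:

```python
from mindsdb.integrations.libs.api_handler import APITable
from mindsdb_sql.parser import ast

class PaymentIntentsTable(APITable):
    def insert(self, query: ast.Insert) -> None:
        # Assumed AST shape: query.columns are column nodes, query.values are rows.
        columns = [col.name for col in query.columns]
        for row in query.values:
            params = dict(zip(columns, row))
            # Assumed: the handler exposes an authenticated Stripe client.
            stripe = self.handler.connect()
            stripe.PaymentIntent.create(**params)
```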
Additional rewards
Each integration extension brings 15+ points for SWAG and entry into the draw for:
- Razer Blade 16 Laptop
- MindsDB Swag
For more info, check out https://mindsdb.com/hacktoberfest/
Please note that this issue is for first-time contributors to MindsDB
10) [REF-552] Reflex prints warning about Node binary not found
Repo : https://github.com/reflex-dev/reflex
[REF-552] Reflex prints warning about Node binary not found #1778
Running `reflex init` inside a fresh `python:3.11` Docker container prints the following warning twice:
```
Warning: The path to the Node binary could not be found. Please ensure that Node is properly installed and added to your system's PATH environment variable.
```
This doesn't really make sense, because Reflex then goes and bootstraps a version of Node, which it proceeds to use.
This warning should only be displayed if Node is not installed and Reflex is unable to bootstrap a version.
Present in 0.2.7a3
From SyncLinear.com | REF-552
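In pseudocode, the desired behavior is roughly the following (a sketch only; these function names are hypothetical stand-ins, not Reflex internals):

```python
import shutil

def system_node_available() -> bool:
    """Hypothetical check: is a Node binary already on PATH?"""
    return shutil.which("node") is not None

def bootstrap_bundled_node() -> bool:
    """Hypothetical stand-in for Reflex's own Node bootstrap step."""
    return True  # pretend the download succeeded

def ensure_node() -> None:
    if system_node_available():
        return
    if bootstrap_bundled_node():
        return  # bundled runtime will be used; no warning needed
    print(
        "Warning: no Node binary found and bootstrapping failed. "
        "Please install Node and add it to PATH."
    )

ensure_node()
```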
11) Enhancement: Addition of a "UserFeedback" tool
Repo : https://github.com/TransformerOptimus/SuperAGI
Enhancement: Addition of a "UserFeedback" tool #532
SuperAGI is built with a variety of tools that an agent can utilize based on its evaluation of the current situation. However, there is a gap when it comes to dynamic user interaction. This issue proposes the addition of a UserFeedback tool which the agent could invoke when it requires more information or guidance from the user.
In its current state, SuperAGI allows for different operational modes, such as "God Mode" for fully autonomous operation and "Restricted Mode", which seeks user permission before executing any tool. The UserFeedback tool would add a new dimension to this, allowing the agent to operate autonomously (like "God Mode") while also having the capability to actively seek user input at key milestones or decision points.
This would provide a more dynamic interaction model, enabling the agent to adapt to changing requirements or unexpected scenarios more effectively. It would also enhance the agent's ability to align with user expectations and objectives, making it a more versatile and useful tool for developers. The proposed UserFeedback tool could be integrated into the existing SuperAGI framework and made available for the agent to select as needed, just like the other tools currently at its disposal.
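For illustration, here's what such a tool might look like, following the shape of SuperAGI's existing tools (an untested sketch; the module path and base-class details are drawn from the public repo, and `input()` stands in for a real UI round trip):

```python
from pydantic import BaseModel, Field
from superagi.tools.base_tool import BaseTool

class UserFeedbackInput(BaseModel):
    question: str = Field(..., description="Question to ask the user")

class UserFeedbackTool(BaseTool):
    name: str = "UserFeedback"
    description: str = "Pause and ask the user for more information or guidance."
    args_schema: type[BaseModel] = UserFeedbackInput

    def _execute(self, question: str) -> str:
        # In a real integration this would surface the question in the UI and
        # block until the user responds; input() is a placeholder for that.
        return input(f"{question}\n> ")
```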
12) TypeError: Cannot read properties of undefined (reading 'split')
Repo : https://github.com/oppia/oppia
TypeError: Cannot read properties of undefined (reading 'split') #18836
This error occurred recently in production:
```
TypeError: Cannot read properties of undefined (reading 'split')
    at C.isLocationSetToNonStateEditorTab (router.service.ts:214:60)
    at Je.<anonymous> (exploration-editor-page.component.ts:311:31)
    at Generator.next (<anonymous>)
    at exploration_editor.e5d90c2ddba5e4a20ea3.bundle.js:3987:43001
```
Where did the error occur? This error occurred in the exploration editor page.
Frequency of occurrence: 2 times in 2 days.
Additional context
This error occurred when the user was viewing the URL `/create/XXX?story_url_fragment=YYY&topic_url_fragment=ZZZ&classroom_url_fragment=math&node_id=node_3#/`.
General instructions for contributors: in general, the procedure for fixing server errors should be the following:
- Analyze the code in the file where the error occurred and come up with a hypothesis for the reason.
- Based on your hypothesis, determine a list of steps that reliably reproduce the issue (or confirm any repro instructions that have been provided). For example, if your hypothesis is that the issue arises due to a delay in a response from the backend server, try to change the code so that the backend server always has a delay, and see if the error then shows up 100% of the time on your local machine.
- Explain your proposed fix, the logic behind it, and any other findings/context you have on this thread. You can also link to a debugging doc if you prefer.
- Get your approach validated by an Oppia team member.
- Make a PR that fixes the issue.
13) Auto-detect entered edition ID type
Repo : https://github.com/internetarchive/openlibrary
Auto-detect entered edition ID type #8424
When adding a new book, it would be great if the ID type were automatically selected when entering the ID.
Describe the problem that you'd like solved
Proposal & Constraints
I'm thinking of something like the author edit page, where entering an identifier with a recognizable pattern selects its type automatically.
There are only three ID types allowed here (ISBN-10, ISBN-13, and LCCN). This could also be extended to the edition edit page, although many more ID types are allowed there.
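As an illustration, the detection could come down to a few patterns like these (a sketch; the regexes, especially the LCCN one, are simplified assumptions):

```python
import re

def detect_id_type(value: str) -> str | None:
    """Guess the edition ID type from its shape."""
    v = value.replace("-", "").replace(" ", "").upper()
    if re.fullmatch(r"\d{9}[\dX]", v):
        return "isbn_10"
    if re.fullmatch(r"97[89]\d{10}", v):
        return "isbn_13"
    if re.fullmatch(r"[A-Z]{0,3}\d{8,10}", v):  # loose LCCN approximation
        return "lccn"
    return None

print(detect_id_type("978-0-14-044913-6"))  # isbn_13
```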
Additional context
Stakeholders
14) Telegram Toolkit [Feature Request]
Repo : https://github.com/TransformerOptimus/SuperAGI
Telegram Toolkit [Feature Request] #1196
Check for existing issues before proceeding.
- [X] I have searched the existing issues, and there is no existing issue for my problem
Where are you using SuperAGI?
MacOS
Which branch of SuperAGI are you using?
Main
Do you use OpenAI GPT-3.5 or GPT-4?
GPT-4(32k)
Which area covers your issue best?
Tools
Describe your issue.
This document outlines the features that can be added to the SuperAGI application to enable interaction with a Telegram account using a Telegram Toolkit. It provides the technical framework for integrating and operating various features in a Telegram bot with the SuperAGI application. The toolkit aims to offer functionalities such as message sending, receiving, managing chats, and more.
Requested Features:
1. Account Authentication
Description: Enables SuperAGI to authenticate the Telegram user account.
API Methods: `getMe`
Workflow: User inputs the Telegram API token. Validate the token by calling `getMe`.
2. Sending Messages
Description: Send text and multimedia messages to individual users or groups.
API Methods: `sendMessage`, `sendPhoto`, `sendAudio`, `sendVideo`
Workflow: Get the chat ID. Prepare the message or media. Call the appropriate API method.
3. Receiving Messages
Description: Receive and process incoming messages from individual users or groups.
API Methods: `getUpdates`, `setWebhook`
Workflow: Poll for new messages or set up a webhook. Process incoming message data.
4. Inline Keyboard and Custom Commands
Description: Customize the chat interface with buttons and slash commands.
API Methods: `sendMessage` with `InlineKeyboardMarkup`
Workflow: Create an inline keyboard object. Attach it to a message. Implement custom command functions.
5. File Handling
Description: Send, receive, and process files like documents, spreadsheets, etc.
API Methods: `sendDocument`, `getFile`
Workflow: Get the chat ID. Prepare the file. Call the appropriate API method.
6. Chat Analysis
Description: Analyze chat history and user behavior.
API Methods: `getChat`, `getChatMembers`
Workflow: Get chat history and members. Run analytics functions.
How to replicate your Issue?
No
Upload Error Log Content
No
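For a feel of the surface area involved, here's a minimal sketch of features 1 and 2 against the public Telegram Bot API (the token and chat ID are placeholders):

```python
import requests

TOKEN = "YOUR_BOT_TOKEN"  # placeholder
BASE = f"https://api.telegram.org/bot{TOKEN}"

# 1. Account authentication: validate the token with getMe.
me = requests.get(f"{BASE}/getMe").json()
assert me["ok"], "Invalid bot token"

# 2. Sending messages: send a text message to a chat.
requests.post(
    f"{BASE}/sendMessage",
    json={"chat_id": 123456789, "text": "Hello from SuperAGI!"},
)
```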
15) Fixed point intermediary type
Repo : https://github.com/SFTtech/openage
Fixed point intermediary type #1545
Required Skills: C++
Difficulty: Medium
We could use an optional intermediary type for fixed-point types. Intermediary types are used for temporary values in calculations where intermediary steps can cause the original values to overflow. Common scenarios where this can happen are multiplication and division. We take inspiration for this from the `fpm` library.
Intermediary types are only strictly necessary if the base type is small, e.g. `int16_t`. Therefore, their usage should be optional.
Tasks
- [ ] Add intermediary type as optional template parameter
- [ ] Use intermediary type in fixed-point calculations
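Although the task itself is C++, the underlying problem is easy to see in a few lines of Python/NumPy (an illustration, not openage code): multiplying two Q8 fixed-point values stored in int16 overflows, while widening to int32 for the intermediate product keeps the result exact.

```python
import numpy as np

SCALE = 2 ** 8                    # Q8 fixed point: 8 fractional bits
a = np.int16(50 * SCALE)          # 50.0
b = np.int16(3 * SCALE)           # 3.0

bad = (a * b) // SCALE            # int16 product wraps around (NumPy warns)
good = (np.int32(a) * np.int32(b)) // SCALE  # widen first, then rescale

print(bad)           # garbage due to overflow
print(good / SCALE)  # 150.0, the exact result
```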
Further Reading
16) Reflex Tuple class to fully support tuples
Repo : https://github.com/reflex-dev/reflex
Reflex Tuple class to fully support tuples #1371
We need a `ReflexTuple` class, just like `ReflexDict` and `ReflexList`.
For now, tuples need to be converted to lists to avoid render errors.
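As a small illustration of the gap, here's the kind of state declaration this would enable (a hypothetical sketch of the desired behavior alongside today's workaround):

```python
import reflex as rx

class PointState(rx.State):
    # Desired: native tuple support in state vars.
    point: tuple[int, int] = (1, 2)
    # Current workaround: declare as a list so rendering works.
    point_as_list: list[int] = [1, 2]
```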
17) Make Document Embedders enrich Documents instead of recreating them
Repo : https://github.com/deepset-ai/haystack
Make Document Embedders enrich Documents instead of recreating them #6107
Problem
When the Document Embedders were implemented, the Document class was frozen, so they recreate Documents just to add the embedding to them.
Solution
Now the Document class is unfrozen, so we should simply do something like `doc.embedding = emb`.
Components to refactor
- `OpenAIDocumentEmbedder`
- `SentenceTransformersDocumentEmbedder`
- DONE: `InstructorDocumentEmbedder` (in haystack-core-integrations)
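The refactor boils down to mutating in place rather than rebuilding, roughly like this (a sketch with dummy embeddings; the v2 `Document` import path may differ between preview releases):

```python
from haystack.dataclasses import Document

docs = [Document(content="first"), Document(content="second")]
embeddings = [[0.1, 0.2], [0.3, 0.4]]  # dummy vectors

for doc, emb in zip(docs, embeddings):
    doc.embedding = emb  # instead of rebuilding: Document(content=..., embedding=emb)
```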
18) Fixed point math constants
Repo : https://github.com/SFTtech/openage
Fixed point math constants #1544
Required Skills: C++
Difficulty: Easy
It would be nice to have common math constants like `pi` defined using our fixed-point implementation, so that they can be used in fixed-point calculations with deterministic outcomes. Floating-point constants are already implemented here.
Tasks:
- [ ] Define existing math constants also as fixed point
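Conceptually (shown in Python rather than C++), defining such a constant is a one-time scaling of the float value, after which all arithmetic stays integral and deterministic; the Q16.16 format below is just an example:

```python
import math

FRACTIONAL_BITS = 16
FIXED_PI = round(math.pi * (1 << FRACTIONAL_BITS))  # 205887 in Q16.16

# Converting back for display only; deterministic code keeps the integer form.
print(FIXED_PI / (1 << FRACTIONAL_BITS))  # 3.1415863037109375
```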
Further Reading
19) [BUG] Adding model to existing plugin, existing plugin instances disappear
Repo : https://github.com/django-cms/django-cms
[BUG] Adding model to existing plugin, existing plugin instances disappear #7476
Description
I have some simple plugins which are basically just templates to wrap child content, things like rows and columns. Usually they don't need a model. If the requirements of a plugin change and I do need a model for a plugin, adding a model will make all existing instances of a plugin disappear from the editor.
Steps to reproduce
```python
@plugin_pool.register_plugin
class ColumnPlugin(CMSPluginBase):
    name = _("Column")
    # model = models.Column  # <-- adding model to plugin that already exists
    render_template = "cmsplugin_column/cmsplugin_column.html"
    allow_children = True
    require_parent = True
    parent_classes = ["RowPlugin"]

    def render(self, context, instance, placeholder):
        context = super().render(context, instance, placeholder)
        return context
```
- create the plugin above, initially without a model
- create a couple of instances of the plugin on the page
- add the model to the plugin (uncomment the commented line above)
- the plugin instances will disappear from the page
Expected behaviour
I would expect that either the plugins show up, but without any settings, or that some error displays indicating that some plugins don't have corresponding model instances.
Actual behaviour
Plugins disappear without any warning or indication
Additional information (CMS/Python/Django versions)
django==3.2.14 django-cms==3.11.0
Do you want to help fix this issue?
Could this be solved via a manage.py command that creates default model instances for plugins that have none? I could help create such a command if I had some guidance on how to correctly query for this case.
- [ ] Yes, I want to help fix this issue and I will join #workgroup-pr-review on Slack to confirm with the community that a PR is welcome.
- [x] No, I only want to report the issue.
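To make the question above concrete, a first step might be a command that just detects the orphaned plugins (an untested sketch; `get_plugin_instance()` returning `None` for the instance is the assumed signal):

```python
from django.core.management.base import BaseCommand
from cms.models import CMSPlugin

class Command(BaseCommand):
    help = "Report plugins whose concrete model instance is missing."

    def handle(self, *args, **options):
        for plugin in CMSPlugin.objects.all():
            instance, _plugin_class = plugin.get_plugin_instance()
            if instance is None:
                self.stdout.write(
                    f"{plugin.plugin_type} (pk={plugin.pk}) has no model instance"
                )
```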
20) Link section for disabling file based logging to Databricks docs
Repo : https://github.com/kedro-org/kedro
Link section for disabling file based logging to Databricks docs #2978
Description
- The docs on logging are kind of hidden away. This page, with all the information about the logging setup, does not show up on the landing page for logging but is visible in the sidebar.
- It's also worth linking the section on disabling file-based logging from the deployment docs, e.g. the Databricks docs, and recommending disabling file-based logging for read-only Databricks repos. Not everyone starts with the `databricks-iris` starter.
- We had this as a note in the old docs: https://docs.kedro.org/en/0.18.10/deployment/databricks/databricks_workspace.html#databricks-notebooks-workflow
21) docs: replacement of exit command with creation of .kube directory and usage of newgrp
Repo : https://github.com/canonical/microk8s
docs: replacement of exit command with creation of .kube directory and usage of newgrp #3779
Summary
As discussed in the last community meeting, the tutorial section under this link advises, when you hit an error while installing MicroK8s, "To validate the changes you can exit the VM's shell and log in again." This is not good practice and can be replaced by using newgrp.
What Should Happen Instead?
Instead of using the exit command, a directory named `.kube` can be created if it doesn't exist in the current VM, and then the commands shown in the screenshot should be copied and pasted into the terminal, followed by `newgrp microk8s`.
Reproduction Steps
- Launch a new VM using `multipass launch --name <vm-name> --mem 4G --disk 40G`
- Shell into that VM using `multipass shell <vm-name>`
- Install MicroK8s in that VM using `sudo snap install microk8s --classic`
- Run `microk8s status --wait-ready`
Can you suggest a fix?
To overcome this error, using newgrp to pick up the new group membership is better than exiting and re-entering the shell.
Are you interested in contributing with a fix?
yes
22) [BUG] COCO export: Instance Segmentations OOB wrt their bounding box
Repo : https://github.com/voxel51/fiftyone
[BUG] COCO export: Instance Segmentations OOB wrt their bounding box #2847
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04
- Python version (`python --version`): 3.10.10
- FiftyOne version (`fiftyone --version`): 0.20.0
- FiftyOne installed from (pip or source): pip
Commands to reproduce
```python
import json
import os

import numpy as np
import fiftyone as fo
import fiftyone.zoo as foz

def check_ann(ann):
    bb = ann['bbox']
    segall = ann['segmentation']
    bbxyxy = bb.copy()
    bbxyxy[2] += bbxyxy[0]
    bbxyxy[3] += bbxyxy[1]
    for seg in segall:
        seg = np.array(seg, dtype=np.float64).reshape((len(seg) // 2), 2)
        orx0 = seg[:, 0] < bbxyxy[0]
        orx1 = seg[:, 0] > bbxyxy[2]
        ory0 = seg[:, 1] < bbxyxy[1]
        ory1 = seg[:, 1] > bbxyxy[3]
        if np.any(orx0):
            print(f'x0, {seg[orx0, 0]} against {bbxyxy[0]}')
        if np.any(orx1):
            print(f'x1, {seg[orx1, 0]} against {bbxyxy[2]}')
        if np.any(ory0):
            print(f'y0, {seg[ory0, 1]} against {bbxyxy[1]}')
        if np.any(ory1):
            print(f'y1, {seg[ory1, 1]} against {bbxyxy[3]}')
    return seg, bbxyxy

PATH = '/home/allen/tmp/export_oiv7'
ds = foz.load_zoo_dataset('open-images-v7', max_samples=15,
                          split='validation', label_types=['segmentations'])
ds.export(export_dir=PATH, dataset_type=fo.types.COCODetectionDataset)
ds2 = fo.Dataset.from_dir(dataset_dir=PATH, dataset_type=fo.types.COCODetectionDataset)
JSON = os.path.join(PATH, 'labels.json')
with open(JSON) as fh:
    j0 = json.load(fh)
for aidx, ann in enumerate(j0['annotations']):
    print(aidx)
    check_ann(ann)
```
Output:
```
0
x0, [752.5] against 753.651712
1
2
3
y0, [154.5 153.5] against 154.77248
4
5
y0, [350.5 351. ] against 351.99974399999996
6
x0, [266.5 266.5] against 267.327488
y0, [166.5 167.5] against 167.727725
7
y0, [86.5] against 87.560681
8
y0, [71.5 71.5] against 72.42499
9
y0, [468.5] against 469.25214600000004
10
x0, [102.5 103.5] against 103.999488
y0, [86.5 86.5] against 87.999744
... snip ...
```
Describe the problem
Exporting instance segmentations via COCODetectionDataset can result in segmentations that go out of bounds with respect to their bounding box by up to ~1.5px. This may cause an issue for some training pipelines and/or represent a small loss in accuracy.
I investigated one of the original COCO datasets downloaded from cocodataset.org and this issue was not present. So round-tripping through FO (importing and then re-exporting) likely causes the slight discrepancies.
What areas of FiftyOne does this bug affect?
- [ ] `App`: FiftyOne application issue
- [x] `Core`: Core Python library issue
- [ ] `Server`: FiftyOne server issue
Willingness to contribute
The FiftyOne Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the FiftyOne codebase?
- [ ] Yes. I can contribute a fix for this bug independently
- [ ] Yes. I would be willing to contribute a fix for this bug with guidance from the FiftyOne community
- [ ] No. I cannot contribute a bug fix at this time
23) MarkdownToTextDocument (v2)
Repo : https://github.com/deepset-ai/haystack
`MarkdownToTextDocument` (v2) #5669
This file converter converts each Markdown file into a text Document.
Draft I/O:
```python
@component
class MarkdownToTextDocument:
    @component.output_types(documents=List[Document])
    def run(self, paths: List[Union[str, Path]]):
        ...  # loads the content of the files
        return {"documents": documents}
```
In order to ease preprocessing, this converter should also extract metadata from the files while converting them into Documents.
Some information that can be gathered from the files may be:
- File name and file-related metadata
- Page numbers
- Headings and chapter information (implemented partially in v1 as well)
All this information should be "anchored" to the text by character index, if needed. For example, it may look like this:
```json
{
  "pages": [
    {"start": 0, "end": 2034},
    {"start": 2034, "end": 3987},
    ...
  ],
  "headings": [
    {"text": "Chapter 1", "position": 3, "level": 1},
    {"text": "Paragraph 1", "position": 203, "level": 2},
    {"text": "Paragraph 2", "position": 3203, "level": 2},
    {"text": "Chapter 2", "position": 10349, "level": 1},
    ...
  ]
}
```
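For the headings part, here's a tiny sketch of how such character-anchored metadata could be collected while reading a Markdown file (illustrative only, not the proposed implementation):

```python
import re

def extract_headings(text: str) -> list[dict]:
    """Collect ATX headings with their character positions and levels."""
    headings = []
    for match in re.finditer(r"^(#{1,6})\s+(.*)$", text, flags=re.MULTILINE):
        headings.append({
            "text": match.group(2).strip(),
            "position": match.start(),
            "level": len(match.group(1)),
        })
    return headings

print(extract_headings("# Chapter 1\nintro\n## Paragraph 1\nbody\n"))
```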
Thank you for taking the time to read through these issues. I hope you find them helpful. Make sure to contribute to open source this month!
Here is a little bit about me: I'm Shreya, and I'm building Firecamp.
Firecamp
An open-source Postman/Insomnia alternative that helps developers build APIs faster than ever!
I look forward to seeing you next week for another blog post.
Wishing you a great week ahead!